Fix canonicalization problems (part 3 of 3)

Ian Lurie

This is part 3 in a series about canonicalization issues. Part 1 defined canonicalization. Part 2 gave advice for tracking down canonical problems on your site. This article deals with fixing the problems you just found.

Now that you’ve found your canonicalization problems, you need to fix them.
I’ve got 5 solutions for ya:

1: Just fix it

The best way to fix canonicalization problems is to fix them.
If you link to your home page 4 different ways, pick one and make your links consistent.
If you added query strings like ?link=1234 all over your site so that you could track clicks, get rid of them. Use something like event tracking in Google Analytics, instead.
Got session IDs all over the place? Get rid of them, and use cookie variables.
Repair whatever it is that’s creating multiple URLs for one page of content.
This is hard work. Doing most things right involves hard work. The payoff, though, is that you don’t have to depend on weird, semi-supported tags like rel=canonical or huge webs of complex 301 redirects.
And if you really fix the problem, the fix scales: new pages and content will behave themselves, and you’ll have less work in the long run. Anything else is duct tape. Which, contrary to popular myth, won’t fix everything.
Good: Works forever. Makes your site well-coded. Builds good karma. Won’t fail when the search engines buy each other or change their minds about standards or whatever.
Bad: Requires higher thought.
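To make the home-page case concrete (example.com and the link forms here are made up for illustration), "just fix it" might look like this:

```html
<!-- Before: four different ways to link to the same home page -->
<a href="http://www.example.com/">Home</a>
<a href="http://example.com/">Home</a>
<a href="http://www.example.com/index.html">Home</a>
<a href="/index.html">Home</a>

<!-- After: pick one form and use it everywhere -->
<a href="/">Home</a>
```

Better yet, generate every internal link through one template function or helper, so the site can only ever emit one URL form per page.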

2: Robots META tag

You can use the robots META tag to hide all but one version of the guilty pages.
Say you’ve got a canonicalization problem that looks like this:

http://www.example.com/page.html
http://www.example.com/page.html?referrer=partner1
http://www.example.com/page.html?referrer=newsletter

…where all of those URLs go to the exact same page.
You can fix the problem by telling search engines to ignore the page at all but the first URL. Add this inside the <head> element:

<meta name="robots" content="noindex,nofollow">

Important: You need to use some kind of conditional logic to only show that robots tag when there’s a ‘referrer’ parameter in the URL. Here’s what it’d look like in plain English:
IF there’s a thing called “referrer” in the URL, then insert <meta name="robots" content="noindex,nofollow"> in the page.
And in PHP:

if (isset($_GET['referrer'])) {
    echo '<meta name="robots" content="noindex,nofollow">';
}
I’m at best a rookie PHP developer, so let me know if I screwed this up.

Without the conditional logic, you’ll hide every instance of the page, including the nice short one.
Good: Easy. Appeals to the spaghetti programmer in me.
Bad: Somehow, there’s always one case you miss. Next developer down the line will probably delete it, laughing at you the entire time. Only works on dynamic sites.

3: Use robots.txt

Continuing the example from above, you could use wildcard patterns in robots.txt to exclude all URLs that include “referrer” from the search engine index.
Something as simple as:

User-agent: *
Disallow: /*?referrer=

might do the trick.
Good: It’s so easy. One little line in the robots.txt file and you’re all set. Sweet!
Bad: If done wrong, may cause your site to fall into a black hole. Also, different search engines support robots.txt differently.
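If you’re also fighting session IDs from #1, the same wildcard pattern extends. A sketch (the parameter names are hypothetical — use whatever your site actually emits):

```
User-agent: *
Disallow: /*?referrer=
Disallow: /*&referrer=
Disallow: /*?sessionid=
Disallow: /*&sessionid=
```

The “&” lines catch the parameter when it appears after another one in the query string. Test your patterns in Google Webmaster Tools before deploying — one too-greedy rule can deindex your whole site.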

4: Use 301 redirects

If you have a case where the problem stems from inconsistent linking practices like:

http://www.example.com/
http://example.com/
http://www.example.com/index.html
http://example.com/index.html

…where all four URLs point at your home page, you can use a 301 redirect to fix it.
Set up a 301 redirect from each of the 3 URLs you don’t want indexed to the one that you do. When search engines visit your site, they’ll scoot over to the correct page and index that one.
They’ll even apply most of the link authority from the incorrect URLs to the correct one.
This is also your best bet if external sites are linking to the wrong home page URL.
Good: Easy (if you have server access). Approved by all search engines. Can also be done using a scripting language like PHP. Works for external links to your site, too.
Bad: Tedious. Requires server access (or a programmer). Done wrong, may create endless loops that turn your data center into a mushroom cloud.
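On Apache with mod_rewrite enabled, the home-page example might be handled in .htaccess roughly like this (www.example.com stands in for your canonical host — this is a sketch, not drop-in config):

```apache
RewriteEngine On

# example.com -> www.example.com (301, permanent)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# /index.html -> / (301, permanent)
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

In PHP, you could get the same effect with header('Location: http://www.example.com/', true, 301); followed by exit. Either way, check the result with a header-checking tool so you don’t ship a redirect loop.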

5: Webmaster tools

Google Webmaster Tools will let you exclude parameters in the toolset. Log into Google Webmaster Tools, then go to Settings and click ‘Adjust parameter settings’. Using the ‘referrer=’ example from #2, you’d tell Googlebot to ignore the ‘referrer’ parameter.
Voila. Googlebot will strip out those URL attributes.
You can also set your preferred domain (www versus non-www) on the same screen.
Good: It’s just forms and stuff.
Bad: May not prevent canonicalization problems. Depends on Google and whatever’s going on in its pointy little head. Not supported by Yahoo! or Bing.

What now?

Now you know what canonicalization is, how to find problems with it, and 5 possible solutions. Start by checking your site. If you find a problem, sit down with your development team (if you have one) and work out a solution. Start at #1 as the best solution and work your way down.
Oh, and a piece of advice: Don’t tell anyone there’s a 2-5. #1 is what you want. Use 2-5 when all hope is lost.
Back to part 2: How to detect canonical problems on your site. It’s easy-peasy!

Ian Lurie

Ian Lurie is the founder of Portent. He's been a digital marketer since the days of AOL and Compuserve (25 years, if you're counting). Ian writes regularly for the Portent Blog and has been published on AllThingsD, Smashing Magazine, and TechCrunch. Ian speaks at conferences around the world, including SearchLove, MozCon, Seattle Interactive Conference and ad:Tech. He has published several books about business and marketing: One Trick Ponies Get Shot, available on Kindle, The Web Marketing All-In-One Desk Reference for Dummies, and Conversation Marketing. Ian is now an independent consultant and continues to work with the Portent team, training the agency group on all things digital.



  1. Awesome post, Ian. This couldn’t come at a better time! I’m going through this whole issue with the site I’m working on right now.
    I have a question for ya: In regards to the Webmaster Tools fix: The site I work on currently has the Default.aspx/default.aspx issue for every URL. Can I specify in webmaster tools to ignore that? Would that help?

  2. @Dana Unfortunately, it won’t help with that. It only helps with query attributes like ?this=that and www versus non www.

  3. Hey Ian, thanks for this, well written and easily understood.
    I’m about to try to fix a few canonicalization issues in some sites and just wanted to make a note that for apache servers, the 303 solution which will cover most of the issues mentioned may be well executed through .htaccess

  4. @Andreas I’d use a 301 redirect, instead. I’m not 100% sure how search engines handle 303’s, but since that’s basically a ‘see other’ my guess is they won’t pass pagerank. Anyone else know?

