3 Reasons to use rel=canonical, and 4 reasons not to.

The major search engines just announced that they’re going to start supporting a new link attribute: rel="canonical".

The Google Webmaster Central blog has a fair amount of detail, so I won’t explain it all here.

A very brief lesson in canonicalization

Marcelo rightly pointed out a quick refresher might be in order. Here’s what a canonicalization problem is:

Say you have a big web site with lots of pages. On most pages, you link to one critical article at:
http://www.mysite.com/index.cfm?artid=123
But, on a few pages, you linked to the same page using a slightly different URL:
http://www.mysite.com/index.cfm?artid=123&this=that

That difference is a canonicalization problem: two different URLs, one page, and no single ‘canonical’ address for it.

You and I know perfectly well they’re the same page. But search engines do not. So they cheerfully crawl your site, splitting link authority and relevance between the two URLs and Ralph Naderizing (or Ross Peroing) your link building efforts.

Mixing home page links between www.mysite.com, mysite.com and www.mysite.com/index.cfm can have the same effect.
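Here’s a rough sketch, in Python, of what ‘same page, different URL’ looks like to a program. It is not how any search engine actually works. It assumes that artid is the only query parameter that changes the page, that www.mysite.com is the preferred host, and that /index.cfm with no article ID is the home page. The names SIGNIFICANT_PARAMS, PREFERRED_HOST, BARE_HOST and canonicalize are mine, made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these query parameters change what the page shows.
SIGNIFICANT_PARAMS = {"artid"}
# Assumption: the site prefers the www host over the bare domain.
PREFERRED_HOST = "www.mysite.com"
BARE_HOST = "mysite.com"


def canonicalize(url):
    """Collapse URL variants of the same page into one preferred form."""
    parts = urlsplit(url)

    # Fold the bare domain into the preferred www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # Drop query parameters that do not change the content.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key in SIGNIFICANT_PARAMS
    )

    # Assumption: /index.cfm with no article ID is just the home page.
    path = parts.path or "/"
    if path == "/index.cfm" and not kept:
        path = "/"

    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


article_links = [
    "http://www.mysite.com/index.cfm?artid=123",
    "http://www.mysite.com/index.cfm?artid=123&this=that",
]
home_links = [
    "http://www.mysite.com",
    "http://mysite.com",
    "http://www.mysite.com/index.cfm",
]

# Each group of variants collapses to a single URL.
print({canonicalize(u) for u in article_links})
print({canonicalize(u) for u in home_links})
```

A crawler that doesn’t know your rules can’t safely make those assumptions, which is why it treats each variant as its own page.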

It’s a major SEO bugaboo. My company has invested a lot of time and money in building a toolset just to sniff out these problems. So finding the problems isn’t that hard. Fixing them is another matter entirely, because:

When you go to your development team and ask them to fix canonicalization issues, they start sticking pins in voodoo dolls.
Other sites may link to you using canonically inconsistent URLs. It’s even harder to get them to change.
Tracking down all the link sources is a major pain.

So, the new rel=canonical tag could save us a lot of pain.

Or not. Read on to get my two sides of the argument: To use, or not to use.

Reasons not to use the new canonicalization standard

But, ever the skeptic, I have a few things I think you should consider before diving into rel=canonical:

Google doesn’t offer unconditional support. Google says they will “take your preference into account” and that “it is a hint they will honor strongly”. So rel=canonical is not a directive.
Standards? We don’t need no stinkin’ standards. To this day, Google, Yahoo! and Live each support rel=nofollow in different ways. Compared to rel=canonical, nofollow was a cinch. Implementation is bound to be messy, at least until Google grinds its competitors into goo.
You should’ve done it right the first time. You hear me. Why the hell do people still tack session IDs, random URL variables and other crap onto URLs in this day and age? I don’t know. But relying on someone else to fix the mess for you could be trouble.
You’re paranoid. If you’re afraid the Search Engines Are Out To Get You, then you may not want to hand over even more information about your site’s structure and your intentions.

But At Least We Have Options

That said, I’m pretty damned happy about rel=canonical. It gives me an option on sites whose canonicalization looks like a bowl of dried spaghetti.

So, use rel=canonical if:

You have no other choice. Maybe you inherited an awful CMS implementation. Maybe you don’t know how to code. Maybe your IT department just hates you. Rel=canonical lets you fix the problem with a single tag in the page’s <head>.
It’s impossible to track down all the incorrect link sources. If you have an old, established site with thousands of internal and external links all pointing at different versions of the same URL, then rel=canonical is a relatively painless fix that you can put in place on the target page. You no longer have to track down every link source.
You want to try it. It can’t hurt, right? (unless you’re paranoid – see above)
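For the curious, here’s roughly what that tag looks like, wrapped in a small Python helper so a template could print it. This is a minimal sketch: canonical_link_tag is a hypothetical function name of mine, and the URL is the example article from earlier. The tag belongs inside the <head> of every variant of the page and points at the one URL you prefer.

```python
from html import escape


def canonical_link_tag(preferred_url):
    """Build the <link> tag that goes in the <head> of each page variant."""
    # Escape the URL so characters like & are valid inside an HTML attribute.
    return '<link rel="canonical" href="{}" />'.format(
        escape(preferred_url, quote=True)
    )


# Hypothetical usage: every variant of the article emits the same tag.
print(canonical_link_tag("http://www.mysite.com/index.cfm?artid=123"))
```

Whether a given engine honors the tag is up to the engine, so treat it as a hint and not a guarantee.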

I’ll be testing out the new canonicalization tag in the coming weeks, and will let you know how it goes.

Comments

Hey, it would have been nice if you had a short paragraph explaining what the heck rel=canonical is for.
I know you live inside the SEO world, but I mostly get my SEO “news” from you. Unless you tell me what this is for, I’ll have to spend time figuring it out somewhere else.
Cheers!

Good point. I just added it above.

Marcelo, rel=canonical is designed to determine whether your monastery is kosher per Canon Law. Isn’t it obvious? 😉
Diane, born without a left brain

@Diane you’ve out-hebed me. I didn’t think that was possible.

That’s only because I’ve been listening to my CD of *Guys and Dolls* so much lately that I’ve turned into a female Nathan Detroit. You object? So…sue me, sue me, what can you do me?

Well Ian, I found this article when I was searching for what the hell rel=canonical is for. And as I see from most sources, rel=canonical is useful for duplicate content. But I personally prefer sculpting different pages with nofollow. But canonical seems pretty useful when you can’t keep track of nofollows 🙂

@Dena I’d be careful about nofollow – it’s changed quite a bit and doesn’t sculpt the way most SEOs thought. Nofollow ‘evaporates’ pagerank, instead of funneling it…

“Nofollow ‘evaporates’ pagerank, instead of funneling it…”
Ian, can you expand a little bit please? (I understand the sculpting/funnelling arguments.)
Recently Googlebot has been ‘cleverer’ and is following a lot more parameterised links on my website. The upshot is that the Google Webmaster ‘links to site’ pages list lots more of these internal URLs. They point to pages which are not significant to my SEO goals (e.g. URLs that record a shortlisted item), so I have just added rel=’nofollow’, if for no other reason than to get these URLs out of my Google Webmaster link listings.
Is this a mistake in the making and should I remove the nofollow?

@Jon Don’t use nofollow. It reduces the authority of the page on which you use it, as well as the target page. If you want to change your sitelinks, use the sitelinks tool in Google Webmaster Tools. If you want to remove links from your nav to improve pagerank funneling, you’ll want to actually rearrange the links.

If I don’t want search engines to follow some of my pages isn’t it good to list those pages under Disallow in the robots.txt?

I think some of the comments above have got things a bit off track.
I don’t see the rel= as having anything to do with “pagerank sculpting” and the like, and it’s not an alternative to “nofollow” as some have suggested in comments here.
Simply, if there are multiple URLs by which the exact same page can be accessed, it’s a note to tell Google “Whatever you think the URL of this page is, its REAL URL is this one. So every time you get to this page, treat it as if it’s this same URL”.
It’s useful if your site is built on a stupid program like osCommerce that puts a random string on the end of a URL. It helps if Google doesn’t think each random string is a different page, because otherwise it finds each one and thinks there are no incoming links.
Another case is where, for technical reasons, you can only get “friendly” URLs half working, so that if someone goes to the “friendly” URL it works, but if they go to the unfriendly version it doesn’t rewrite and shows that instead. The tag is good for telling Google that they are actually the same.
In the end, if every system was coded in a smart way, there would be no need for this tag, as each page would only be accessible via one URL. But in a less than ideal world on the web, including various legacy systems and some current ones that weren’t built for SEO, this type of thing is a really good bandaid solution. Still just a bandaid, but hey, at least it stops the bleeding!
Matt

