The major search engines just announced that they’re going to start supporting a new link attribute: rel="canonical".
The Google Webmaster Central blog has a fair amount of detail, so I won’t explain it all here.
A very brief lesson in canonicalization
Marcelo rightly pointed out a quick refresher might be in order. Here’s what a canonicalization problem is:
Say you have a big web site with lots of pages. On most pages, you link to one critical article at:
But on a few pages, you link to the same page using a slightly different URL:
The two URLs differ, but they deliver the same page. That’s a canonical difference.
You and I know perfectly well they’re the same page. But a search engine doesn’t. So it cheerfully crawls your site, splitting link authority and relevance between the two URLs and Ralph Nader-ing (or Ross Perot-ing) your link-building efforts.
Mixing home page links between www.mysite.com, mysite.com and www.mysite.com/index.cfm can have the same effect.
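To make "the same page" concrete, here’s a minimal Python sketch that collapses common variants (bare vs. www host, /index.cfm, session IDs, parameter order) onto one preferred URL. The rules are assumptions for illustration: the preferred host www.mysite.com, the default-document list and the session parameter names are all hypothetical, and your site’s rules will differ. Counting inbound links before and after collapsing shows how the link authority gets split.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rules for the example site; every site needs its own list.
PREFERRED_HOST = "www.mysite.com"
HOST_ALIASES = {"mysite.com", "www.mysite.com"}
DEFAULT_DOCUMENTS = {"index.cfm", "index.html", "index.htm"}
IGNORED_PARAMS = {"sessionid", "sid", "jsessionid"}


def canonicalize(url: str) -> str:
    """Collapse common variants of a URL onto one preferred form."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Treat /index.cfm (and similar default documents) as the directory itself.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Drop session IDs; sort the remaining parameters so their order doesn't matter.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(params), ""))


# Inbound links as a search engine might see them, before and after collapsing variants.
inbound = ["www.mysite.com", "mysite.com", "www.mysite.com/index.cfm", "www.mysite.com"]
print(Counter(inbound))                           # authority split across three URLs
print(Counter(canonicalize(u) for u in inbound))  # all four links land on one URL
```

The hard part in real life isn’t the code; it’s agreeing on which variant is "right" and getting every template and every linking site to use it.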
It’s a major SEO bugaboo. My company has invested a lot of time and money in building a toolset just to sniff out these problems, so finding them isn’t that hard. Fixing them is another matter entirely, because:
- When you go to your development team and ask them to fix canonicalization issues, they start sticking pins in voodoo dolls.
- Other sites may link to you using canonically inconsistent URLs. It’s even harder to get them to change.
- Tracking down all the link sources is a major pain.
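For links on your own site, tracking down the sources is mostly bookkeeping once you have a crawl. Here’s a minimal sketch assuming you already have (source page, href) pairs from a crawler and know which aliases point at your preferred URL; the LINKS data and the find_stray_links helper are hypothetical. It does nothing for external sites that link with the wrong URL; those still need outreach.

```python
from collections import defaultdict

# Hypothetical crawl output: (page that contains the link, href as written on that page)
LINKS = [
    ("http://www.mysite.com/about.html", "http://www.mysite.com/"),
    ("http://www.mysite.com/blog.html", "http://mysite.com/"),
    ("http://www.mysite.com/news.html", "http://www.mysite.com/index.cfm"),
    ("http://www.mysite.com/contact.html", "http://mysite.com/"),
]


def find_stray_links(links, preferred, aliases):
    """Map each non-preferred alias to the list of pages that link to it."""
    offenders = defaultdict(list)
    for source, href in links:
        if href in aliases and href != preferred:
            offenders[href].append(source)
    return dict(offenders)


report = find_stray_links(
    LINKS,
    preferred="http://www.mysite.com/",
    aliases={"http://mysite.com/", "http://www.mysite.com/index.cfm"},
)
for alias, sources in sorted(report.items()):
    print(alias, "<-", ", ".join(sources))
```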
So, the new rel=canonical tag could save us a lot of pain.
Or not. Read on for both sides of my argument: to use, or not to use.
Reasons not to use the new canonicalization standard
But, ever the skeptic, I have a few things I think you should consider before diving into rel=canonical:
- Google doesn’t offer unconditional support. Google says they will “take your preference into account” and that “it is a hint they will honor strongly”. So rel=canonical is not a directive.
- Standards? We don’t need no stinkin’ standards. To this day, Google, Yahoo! and Live each support rel=nofollow in different ways. Compared to rel=canonical, nofollow was a cinch. Implementation is bound to be messy, at least until Google grinds its competitors into goo.
- You should’ve done it right the first time. You heard me. Why the hell do people still tack session IDs, random URL variables and other crap onto URLs in this day and age? I don’t know. But relying on someone else to fix the mess for you could be trouble.
- You’re paranoid. If you’re afraid the Search Engines Are Out To Get You, then you may not want to hand over even more information about your site’s structure and your intentions.
But At Least We Have Options
That said, I’m pretty damned happy about rel=canonical. It gives me an option on sites whose canonicalization looks like a bowl of dried spaghetti.
So, use rel=canonical if:
- You have no other choice. Maybe you inherited an awful CMS implementation. Maybe you don’t know how to code. Maybe your IT department just hates you. Rel=canonical lets you fix the problem with a single <link> tag in the page’s <head>.
- It’s impossible to track down all the incorrect link sources. If you have an old, established site with thousands of internal and external links all using different canonical URLs, then rel=canonical is a relatively painless fix that you can put in place on the target page. You no longer have to track down every link source.
- You want to try it. It can’t hurt, right? (Unless you’re paranoid; see above.)
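For reference, the markup itself is one <link> element with rel="canonical" and an href pointing at the preferred URL, placed in the <head> of each duplicate page. The Python sketch below (the canonical_link_tag and CanonicalFinder names are hypothetical) builds that element and then reads it back with the standard library’s HTML parser, a handy sanity check when a CMS template is generating the tag. It only shows what the page declares; whether a search engine honors the hint is still up to the search engine.

```python
from html import escape
from html.parser import HTMLParser


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link> element that goes in the <head> of each duplicate page."""
    # quote=True escapes quotes and ampersands, so the URL stays inside href="...".
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    # A self-closing <link ... /> reaches this method via the default handle_startendtag.
    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            values = dict(attrs)
            rels = (values.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.canonical = values.get("href")  # entities already unescaped by the parser


page = f"<html><head>{canonical_link_tag('http://www.mysite.com/')}</head><body></body></html>"
finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # http://www.mysite.com/
```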
I’ll be testing out the new canonicalization tag in the coming weeks, and will let you know how it goes.