SEO worst practices: The content duplication toilet bowl of death
Ian Lurie Aug 17 2010
50% of SEO (search engine optimization, in case you live under a big rock, or you’ve never been to this blog before) is staying out of the way, staying out of trouble, and letting search engines find everything on your web site. It should be easy, but people seem to constantly create new ways to get in the way. Here’s one of my favorite examples: The exploding URL, AKA…
The Duplication Toilet Bowl of Death
There are lots of little problems that can generate duplicate URLs. But the worst is the Exploding URL, aka the Duplication Toilet Bowl of Death.
URL stands for ‘uniform resource locator’ – the unique address for any one page or file on your web site. It’s very, very important that you have one unique URL for each page – read the canonicalization series for the in-depth explanation, or read my Search Engine Land article on the same subject for the digest version.
Say you have a site that delivers different content to people who live in different cities. You let folks choose by clicking on a map. Once they click, you add a query string like “?city=seattle” on to the end of the URL.
Good so far.
But developers often take a shortcut here: they read whatever the current URL is and tack the new information onto the existing query string. So if I clicked Seattle, then came back and clicked Manhattan, I could end up with a query string like ?city=seattle&city=manhattan.
Don’t roll your eyes at me. This is from a real-life example.
Keep going and you end up with all sorts of fun stuff, like:
And, since you carry that information to every page on your site, you carry the duplicate content love all over the place, getting fun addresses like:
Right down the toilet bowl. Of DEATH.
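Here's a minimal Python sketch of that shortcut, just to show the mechanism. The function name `naive_add_city` and the example.com URL are hypothetical. The point is that each click appends another `city` parameter and never removes the old one.

```python
from urllib.parse import urlencode


def naive_add_city(current_url: str, city: str) -> str:
    """The buggy shortcut: tack the new parameter onto whatever URL we're on."""
    separator = "&" if "?" in current_url else "?"
    return current_url + separator + urlencode({"city": city})


url = "https://www.example.com/events"  # hypothetical starting page
for city in ["seattle", "manhattan", "seattle"]:
    url = naive_add_city(url, city)  # each click makes the URL longer
    print(url)
```

After three clicks the URL is `https://www.example.com/events?city=seattle&city=manhattan&city=seattle`, and every link on that page inherits it.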
Right. Like THAT’ll happen
You’ll probably say “But Ian, most people just click one city. What do I care if some OCD lunatic clicks every city on the map?”
You care because search engines are OCD lunatics. If you have a map with, oh, 50 cities on it, and a search engine follows the link for every city, you end up with over 2000 possible versions of every page on your site.
That’s a teeny, tiny duplicate content and canonicalization problem. If a visiting search engine spider has a crawl allotment of 500 URLs, it could potentially spend its entire allotment on one or two pages.
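If you want to check the arithmetic, here's a quick Python sketch. It assumes the shortcut described earlier: each city click tacks on one more `city` parameter, order matters, and repeats count (clicking Seattle twice gives yet another URL). `url_variants` and `CRAWL_BUDGET` are hypothetical names, and the 500-URL budget is the example figure from the text, not a real crawler setting.

```python
def url_variants(cities: int, max_clicks: int) -> int:
    """Count distinct query strings reachable by appending up to max_clicks city params.

    Order matters and repeats are allowed, because the naive append
    never checks what's already in the URL.
    """
    return sum(cities ** clicks for clicks in range(1, max_clicks + 1))


CRAWL_BUDGET = 500  # hypothetical crawl allotment, per the example above

for clicks in (1, 2, 3):
    variants = url_variants(50, clicks)
    print(f"{clicks} click(s): {variants} URLs per page, "
          f"{variants / CRAWL_BUDGET:.1f}x the crawl budget")
```

With 50 cities, up to two clicks already gives 50 + 2,500 = 2,550 variants per page, which squares with the "over 2000" figure. A third click pushes it past 127,000.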
Detection and fix
The easiest way to detect this problem is to study your log files, if possible. Grab the logs and grep for any URL that contains a query string, then look for the same parameter repeated over and over.
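If grep alone gets tedious, a short Python script can do the "same parameter repeated" check for you. This is a minimal sketch that assumes Apache/Nginx common or combined log format, where each line contains a quoted request like `"GET /events?city=seattle HTTP/1.1"`. The function name and sample lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Pulls the request target out of a quoted request line such as
# "GET /events?city=seattle HTTP/1.1" (common/combined log format assumed).
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def exploding_urls(log_lines):
    """Yield (target, repeated_names) for requests whose query string repeats a parameter."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a request line we recognize
        target = match.group(1)
        query = urlsplit(target).query
        if not query:
            continue  # no query string, nothing to explode
        counts = Counter(name for name, _ in parse_qsl(query, keep_blank_values=True))
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            yield target, repeated


# Made-up sample lines; 192.0.2.1 is a reserved documentation address.
sample = [
    '192.0.2.1 - - [17/Aug/2010:10:00:00 -0700] "GET /events?city=seattle HTTP/1.1" 200 512',
    '192.0.2.1 - - [17/Aug/2010:10:00:05 -0700] "GET /events?city=seattle&city=manhattan HTTP/1.1" 200 512',
]
found = list(exploding_urls(sample))
```

Against a real log you'd pass the open file instead of `sample` (for example a hypothetical `access.log`). Only the second sample line is flagged, because `city` appears twice.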
Another way is to run a site: search on Google (site:yourdomain.com, using your own domain).
Click through to the very last result. If you see a message saying Google has omitted some entries very similar to the ones already displayed, repeat the search with the omitted results included and check them for exploding URLs.
Note, though, that Google may be filtering out the duplicates. Reading your log file is the only sure thing.
The fix: Either set a cookie on the visitor’s machine that stores the most recently selected city, or remove the old attribute before adding the new one. Sounds simple, I know. Actually, it is simple. Ask your developer to fix it – they can probably do it in a couple of minutes.
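Here's a minimal Python sketch of the second fix (remove the old attribute before adding the new one), assuming the site builds its links from the current URL. `set_query_param` is a hypothetical helper name; your framework may already have an equivalent.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def set_query_param(url: str, name: str, value: str) -> str:
    """Return url with exactly one name=value, dropping any earlier values of name."""
    parts = urlsplit(url)
    # Keep every other parameter as-is, in its original order.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


url = "https://www.example.com/events"  # hypothetical starting page
for city in ["seattle", "manhattan", "seattle"]:
    url = set_query_param(url, "city", city)  # replaces, never appends a duplicate
```

However many times a visitor (or a spider) clicks, the URL ends up as `https://www.example.com/events?city=seattle`. The cookie approach goes further and keeps the city out of the URL entirely; this approach still gives one URL per city per page, but that number is finite instead of exploding.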
Other ways to head down the toilet bowl
There are lots of other ways to end up with exploding URLs:
- A “from” or “nav” attribute you add to URLs to track the page from which a visitor came;
- A faulty ‘email a friend’ link;
- A site that adds sessionIds to every page (arghhh);
- Using GET instead of POST in a form.
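For tracking and session parameters like the ones in that list, a common cleanup is to compute one canonical URL per page, for example to use in a `rel="canonical"` tag or a 301 redirect. Here's a minimal Python sketch. The parameter names in `NON_CONTENT_PARAMS` are hypothetical stand-ins for whatever your site actually uses, and `canonical_url` is an illustrative name, not a library function.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names; substitute the tracking/session parameters your site really uses.
NON_CONTENT_PARAMS = {"from", "nav", "sessionid"}


def canonical_url(url: str) -> str:
    """Drop parameters that don't change page content and collapse repeats to the last value."""
    parts = urlsplit(url)
    latest = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in NON_CONTENT_PARAMS:
            continue  # tracking or session data, not content (matched case-insensitively)
        latest[name] = value  # a later value overwrites an earlier one
    # Sort so ?a=1&b=2 and ?b=2&a=1 map to the same address.
    query = urlencode(sorted(latest.items()))
    return urlunsplit(parts._replace(query=query, fragment=""))
```

For example, `canonical_url("https://www.example.com/events?sessionId=abc123&city=seattle&from=home&city=manhattan")` returns `https://www.example.com/events?city=manhattan`: the session ID and "from" tracker are gone, and only the most recent city survives.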
The good news is, if you’ve got this problem, it’s easy to clear up, and your SEO numbers will probably skyrocket as soon as you do.
Ian Lurie is CEO and founder of Portent Inc. He's recorded training for Lynda.com, writes regularly for the Portent Blog and has been published on AllThingsD, Forbes.com and TechCrunch. Ian speaks at conferences around the world, including SearchLove, MozCon, SIC and ad:Tech. Follow him on Twitter at portentint. He also just published a book about strategy for services businesses: One Trick Ponies Get Shot, available on Kindle.