SEO worst practices: The content duplication toilet bowl of death

Ian Lurie Aug 17 2010

Not the Toilet bowl of death! omg!

50% of SEO (search engine optimization, in case you live under a big rock, or you’ve never been to this blog before) is staying out of the way, staying out of trouble, and letting search engines find everything on your web site. It should be easy, but people seem to constantly create new ways to get in the way. Here’s one of my favorite examples: The exploding URL, AKA…

The Duplication Toilet Bowl of Death

There are lots of little problems that can generate duplicate URLs. But the worst is the Exploding URL, aka the Duplication Toilet Bowl of Death.

URL stands for ‘uniform resource locator’ – the unique address for any one page or file on your web site. It’s very, very important that you have one unique URL for each page – read the canonicalization series for the in-depth explanation, or read my Search Engine Land article on the same subject for the digest version.

Say you have a site that delivers different content to people who live in different cities. You let folks choose by clicking on a map. Once they click, you add a query string like “?city=seattle” on to the end of the URL.

Good so far.

But developers will often use shortcuts for situations like this, where they read whatever the current URL is and then tack on the additional information to the existing query string. So, if I clicked Seattle, then came back and clicked Manhattan, I could end up with:
www.mysite.com?city=seattlecity=Manhattan

Don’t roll your eyes at me. This is from a real-life example.
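If you're wondering how code ends up doing that, here's a minimal sketch in Python. The function name `naive_add_city` is made up for illustration, and this is a guess at the kind of shortcut involved, not the actual code from the site in question.

```python
def naive_add_city(current_url: str, city: str) -> str:
    """Buggy on purpose: tack the new value onto whatever the URL already is."""
    if "?" in current_url:
        # No check for an existing city parameter, and no "&" separator either.
        return current_url + "city=" + city
    return current_url + "?city=" + city


url = "www.mysite.com"
for click in ["seattle", "Manhattan"]:
    url = naive_add_city(url, click)
    print(url)
# www.mysite.com?city=seattle
# www.mysite.com?city=seattlecity=Manhattan
```

Every click makes the URL longer, and every longer URL is a brand new address as far as a search engine is concerned.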

Keep going and you end up with all sorts of fun stuff, like:
www.mysite.com?city=seattlecity=Manhattancity=seattlecity=fargocity=troycity=buffalocity=bend
And, since you carry that information to every page on your site, you carry the duplicate content love all over the place, getting fun addresses like:
www.mysite.com/locations.html?city=seattlecity=Manhattancity=seattlecity=fargocity=troycity=buffalocity=bend
www.mysite.com/contact.html?city=seattlecity=Manhattancity=seattlecity=fargocity=troycity=buffalocity=bend
www.mysite.com/store.html?city=seattlecity=Manhattancity=seattlecity=fargocity=troycity=buffalocity=bend

Right down the toilet bowl. Of DEATH.

Right. Like THAT’ll happen

You’ll probably say “But Ian, most people just click one city. What do I care if some OCD lunatic clicks every city on the map?”

You care because search engines are OCD lunatics. If you have a map with, oh, 50 cities on it, and a search engine follows the link for every city, and then follows every city link again from each of those pages, you end up with over 2000 possible versions of every page on your site after just two clicks.
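Here's the back-of-the-envelope math as a quick Python sketch. It assumes each click picks a different city and that order matters (because the resulting URL strings differ). That's my assumption for the sake of the arithmetic, and `url_variants` is just an illustrative name.

```python
def url_variants(cities: int, clicks: int) -> int:
    """Count distinct query strings after exactly `clicks` city clicks.

    Assumes every click picks a different city, and that order matters
    because ?city=seattlecity=fargo and ?city=fargocity=seattle are
    different URLs.
    """
    total = 1
    for i in range(clicks):
        total *= cities - i
    return total


print(url_variants(50, 1))  # 50
print(url_variants(50, 2))  # 2450
print(url_variants(50, 3))  # 117600
```

Two clicks deep gets you past 2000. Three clicks deep gets you past 100,000, per page.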

That’s a teeny, tiny duplicate content and canonicalization problem. If a visiting search engine spider has a crawl allotment of 500 URLs, it could potentially spend its entire allotment on one or two pages.

Detection and fix

The easiest way to detect this problem is to study your log files, if possible. Grab the logs and grep for any URL that contains a query string.
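If grep alone gives you too much to wade through, a short script can narrow it down to URLs where the same parameter shows up more than once. This is a rough sketch, not a finished tool: it assumes your log lines contain the request in the usual `"GET /page?query HTTP/1.1"` form, and the function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Assumption: log lines look like ... "GET /page.html?city=seattle HTTP/1.1" ...
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def has_repeated_param(target: str) -> bool:
    """True if a parameter name repeats in the query string, with or without '&'."""
    query = urlsplit(target).query
    pairs = parse_qsl(query, keep_blank_values=True)
    names = Counter(name for name, _ in pairs)
    if any(count > 1 for count in names.values()):
        return True  # e.g. ?city=seattle&city=Manhattan
    # The no-ampersand case: the name shows up again inside its own value,
    # e.g. ?city=seattlecity=Manhattan
    return any(f"{name}=" in value for name, value in pairs)


def exploding_urls(log_lines):
    """Yield request targets from log lines that look like exploding URLs."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and has_repeated_param(match.group(1)):
            yield match.group(1)
```

To use it, open your log file and loop over `exploding_urls(log_file)`. It's a heuristic, so treat anything it flags as a lead to check by hand, not a verdict.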

Another way is to perform a site: search on Google, like this:
site:www.mysite.com

Click through to the very last result. If you see a message like this:
[Image: Google duplicate content message]

show the omitted results and check them for exploding URLs. Note, though, that Google may be filtering out the duplicates. Reading your log file is the only sure thing.

The fix: Either set a cookie on the visitor’s machine that stores the most recently selected city, or remove the old attribute before adding the new one. Sounds simple, I know. Actually, it is simple. Ask your developer to fix it – they can probably do it in a couple of minutes.
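For the second option, here's one way it might look in Python using the standard library's `urllib.parse`. The function name `set_city` is hypothetical, and your site's language and framework will differ, so take it as a sketch of the idea: parse the query string, throw out any old city value, then add the new one.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def set_city(current_url: str, city: str) -> str:
    """Return current_url with exactly one city parameter, set to `city`."""
    parts = urlsplit(current_url)
    # Keep every parameter except any existing city value.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "city"
    ]
    kept.append(("city", city))
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(set_city("http://www.mysite.com/store.html?city=seattle", "Manhattan"))
# http://www.mysite.com/store.html?city=Manhattan
```

No matter how many cities someone (or something) clicks, each page ends up with one city in its URL, and any other parameters on the URL are left alone.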

Other ways to head down the toilet bowl

There are lots of other ways to end up with exploding URLs:

  • A “from” or “nav” attribute you add to URLs to track the page from which a visitor came;
  • A faulty ‘email a friend’ link;
  • A site that adds sessionIds to every page (arghhh);
  • Using ‘Get’ instead of ‘Post’ in a form.

The good news is, if you’ve got this problem, it’s easy to clear up, and your SEO numbers will probably skyrocket as soon as you do.

3 Comments

  1. Bill

    You should separate the URL parameters with the ampersand – otherwise, search engines will actually see different parameters.
    ?city=seattlecity=Manhattan
    is parameter ‘city’ equals value ‘seattlecity=Manhattan’
    not two city parameters with different values

  2. Andrew

    Your examples don’t include a delimiter between the various (sadly identical) GET variables.
    The most common delimiter is the ampersand, and without a specified method of breaking up everything past the “?”, the web server will not know what values belong to what variables.
    Your example like this:
    http://www.mysite.com?city=seattlecity=Manhattan
    Should really be:
    http://www.mysite.com?city=seattle&city=Manhattan
    Otherwise I imagine the much more visible issue would be the fact that you were left with results for the city of “Seattlecity”.

  3. Ian

    @Andrew actually the developer in this case didn’t include the ampersand. It was tragic. But adding the ampersand will cause the same problem.