SEO Obviousness: Duplicate content sucks


Ian Lurie Oct 11 2010


Saying there’s no longer a duplicate content penalty is a semantic weasel routine that ignores one hard fact: Duplicate content will kill your rankings.

I’ve written a lot about canonicalization and the content duplication toilet bowl of death. Those articles cover the causes of duplication.

If you don’t know what I mean by ‘duplicate content’ or ‘canonicalization’, read those two articles first. This one will make a lot more sense.

It really helps to know why duplicate content is bad, though. So here are the SEO reasons you need to avoid duplicate content:

1: Wasted crawl budget

Search engines ‘crawl’ your web site using programs called spiders. When Google or Bing sends a spider to your site, it sets a crawl budget: a certain amount of time the spider will spend on your site, scampering from page to page.

The spider crawls every page it finds, regardless of duplication. Even if you use rel=canonical to tell visiting spiders to index a different URL, the spider still has to fetch the original URL first.
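For reference, rel=canonical is just a link tag in the page’s head. Here’s a minimal Python sketch, using only the standard library, that fetches a page and pulls out whatever canonical URL it declares. The URL is a placeholder, not a real example, and the rel match is kept deliberately simple:

    from html.parser import HTMLParser
    from urllib.request import urlopen

    class CanonicalFinder(HTMLParser):
        """Grabs the href of the first <link rel="canonical"> tag it sees."""
        def __init__(self):
            super().__init__()
            self.canonical = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            # Exact-match on rel for simplicity; real pages can be messier.
            if tag == "link" and attrs.get("rel") == "canonical" and self.canonical is None:
                self.canonical = attrs.get("href")

    # Placeholder URL; point this at one of your own suspect pages.
    html = urlopen("http://www.example.com/").read().decode("utf-8", "replace")

    finder = CanonicalFinder()
    finder.feed(html)
    print(finder.canonical)  # the URL the page asks search engines to index, or None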

That’s wasted crawl budget, any way you slice it. Say Googlebot (Google’s spider) spends 1 minute on your site, loading 1 page every 15 seconds: it crawls 4 pages. If 3 of those 4 pages are copies of the same page, it only crawled 2 unique pages. The other 2 fetches are at best wasted, at worst an even bigger SEO problem (see #2 for why).

Duplicate content burns crawl budget. Search engines crawl fewer unique pages of your site, leaving you with fewer indexed pages, and fewer shots at good rankings.
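You can get a rough read on this for your own site with a quick script. Here’s a minimal sketch, assuming the usual culprits (www vs. non-www hostnames, trailing slashes, and tracking parameters like session IDs); the crawl log and URLs are made up for illustration:

    from collections import Counter
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    # Assumed junk parameters that spawn duplicate URLs; tune for your site.
    TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium"}

    def normalize(url):
        """Collapse common duplicate-URL variants into one canonical form."""
        scheme, netloc, path, query, _ = urlsplit(url)
        netloc = netloc.lower().removeprefix("www.")
        path = path.rstrip("/") or "/"
        params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
        return urlunsplit((scheme, netloc, path, urlencode(sorted(params)), ""))

    # Hypothetical crawl log: four fetches, but only two real pages.
    crawled = [
        "http://www.example.com/widgets/",
        "http://example.com/widgets?sessionid=abc123",
        "http://example.com/widgets",
        "http://www.example.com/gadgets",
    ]

    counts = Counter(normalize(u) for u in crawled)
    for page, hits in counts.items():
        if hits > 1:
            print(f"{hits} crawled URLs collapse to {page} - wasted fetches")

In this made-up log, three of the four fetches turn out to be the same widgets page: exactly the 4-pages-crawled, 2-real-pages situation described above.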

2: Link dilution

Duplicate content is the Ralph Nader (or Ross Perot) of search engine optimization. In simplest terms, every link to your web site is a vote. Some votes are worth more than others, but they’re still votes.

If your site has 4 copies of a particular page, 4 different webmasters could each link to that same content at a different URL. You’ve just split the vote 4 ways, and rel=canonical isn’t going to fix it. Not 100%, anyway.

Duplicate content dilutes link authority.
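To make the vote-splitting concrete, here’s a back-of-the-envelope sketch; the link counts and URLs are invented:

    # Hypothetical: 40 inbound links, each worth one "vote".
    inbound_links = 40

    # One canonical URL collects every vote.
    one_url = {"/widgets": inbound_links}

    # Four duplicate URLs split the same votes evenly (the worst case).
    duplicates = {url: inbound_links // 4 for url in (
        "/widgets", "/widgets/", "/Widgets", "/widgets?ref=nav",
    )}

    print(max(one_url.values()))     # 40 votes behind a single page
    print(max(duplicates.values()))  # only 10 votes behind the best copy

Even if search engines reassemble some of that authority, the strongest copy starts the race with a quarter of the votes it should have.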

3: Indexing flip-flop

Finally, duplicate content leaves it up to search engines to guess which page to index and rank. Combine 3-4 copies of the same page at 3-4 different URLs with link dilution (see #2, above) and you’re kinda screwed. Search engines have no good way to know which of those pages should really show up in a search result.

If they rank the wrong page, and you then remove it, you lose your ranking. Or, as webmasters link to different copies of the page, various copies pop in and out of the rankings, never moving up.

Duplicate content sucks

Think carefully about how you’re linking within your web site. Don’t depend on SEO workarounds like rel=canonical to fix the problem later on. Address duplication pre-launch, pre-redesign. If it’s too late for that, address it right now. It’s the best SEO favor you can do for your site.
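If it is too late, the cleanest fix for an existing duplicate is usually a permanent (301) redirect to the URL you want indexed, rather than a canonical hint. A minimal sketch using Flask, with hypothetical routes:

    from flask import Flask, redirect

    app = Flask(__name__)

    @app.route("/widgets")
    def widgets():
        return "The one true widgets page."

    # Hypothetical duplicate URL left over from an old site structure.
    @app.route("/old-widgets")
    def old_widgets():
        # 301 (moved permanently) tells spiders to drop this URL and
        # credit its links to the canonical page instead.
        return redirect("/widgets", code=301)

A server-level rewrite rule does the same job; the point is that a permanent redirect takes the duplicate URL out of play entirely instead of leaving a hint and hoping.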

Tags: conversation marketing

3 Comments

  1. I’ve heard the idea of canonicalization bandied about but have never really heard any strong views on it – very much a case of ‘perhaps it’s true, perhaps it’s not’. It’s good to hear someone say it’s actually true, since I’ve suspected it might be for a while but haven’t had any serious motivation to do anything about it. The idea of a crawl budget is curious, too. At least it’s correctable with the appropriate redirect codes!

  2. Sasha

    Great post! You’ve explained this so well. And it all makes such simple sense! I didn’t know about the crawl budget of the spiders and I’ll definitely watch out for that on our own site.

  3. Excellent posting, Ian. Thank you for making SEO an easier concept to understand!

Comments are closed.