Saying there’s no longer a duplicate content penalty is a semantic weasel routine that ignores one hard fact: Duplicate content will kill your rankings.
If you don’t know what I mean by ‘duplicate content’ or ‘canonicalization’, read the two articles I linked to above first. This article will make a lot more sense afterward.
It really helps to know why duplicate content is bad, though. So here are the SEO reasons you need to avoid duplicate content:
1: Wasted crawl budget
Search engines ‘crawl’ your web site using programs called spiders. When Google or Bing sends a spider to your site, it sets a crawl budget: a certain amount of time the spider will spend on your site, scampering from page to page.
The spider crawls every page it finds, regardless of duplication. Even if you use rel=canonical to tell visiting spiders to use a different URL, the spider still has to load the initial URL before it can read that instruction.
That’s wasted crawl budget, any way you slice it. If Googlebot (Google’s spider) spends 1 minute on your site, loading 1 page every 15 seconds, it crawls 4 pages. But if 3 of those 4 pages are copies of the same content, it only crawled 2 ‘real’ pages: one copy of the duplicated page, plus the one page that isn’t a duplicate. The other two fetches are at best wasted, at worst an even bigger SEO problem (see #2 for why).
Duplicate content burns crawl budget. Search engines crawl fewer unique pages of your site, leaving you with fewer indexed pages, and fewer shots at good rankings.
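If you want to see the arithmetic, here’s a rough sketch of the example above. It’s a toy model, not how any search engine actually budgets a crawl: the function name, the URLs, and the idea of a fixed number of seconds per page are all my assumptions for illustration.

```python
def crawl_summary(budget_seconds, seconds_per_page, urls_to_content):
    """Return (pages_fetched, unique_pages, wasted_fetches) for one visit.

    Toy model: every fetch costs the same amount of time, and two URLs
    are duplicates when they return identical content.
    """
    pages_fetched = min(budget_seconds // seconds_per_page, len(urls_to_content))
    fetched = list(urls_to_content.items())[:pages_fetched]
    unique_pages = len({content for _url, content in fetched})
    return pages_fetched, unique_pages, pages_fetched - unique_pages


# Hypothetical site: three URLs serve the same page, one serves another.
site = {
    "/widgets": "widgets page",
    "/widgets/": "widgets page",
    "/widgets?ref=nav": "widgets page",
    "/about": "about page",
}

# 1 minute at 15 seconds per page: 4 fetches, 2 unique pages, 2 wasted.
print(crawl_summary(60, 15, site))  # (4, 2, 2)
```

Half the visit went to pages the spider had, in effect, already seen.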
2: Link dilution
Duplicate content is the Ralph Nader (or Ross Perot) of search engine optimization. In simplest terms, every link to your web site is a vote. Some votes are worth more than others, but they’re still votes.
If the same page lives at 4 different URLs on your site, 4 different webmasters could each link to that page at a different URL. You just split the vote 4 ways, and rel=canonical isn’t going to fix it. Not 100%, anyway.
Duplicate content dilutes link authority.
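Here’s the vote-splitting idea as a small sketch. It treats every link as exactly one equal vote, which is a deliberate simplification (some votes are worth more than others), and the URLs and function names are made up for the example.

```python
from collections import Counter


def votes_per_url(inbound_links):
    """Count how many inbound links point at each exact URL."""
    return Counter(inbound_links)


def strongest_share(inbound_links):
    """Fraction of all votes held by the single best-linked URL."""
    counts = votes_per_url(inbound_links)
    return max(counts.values()) / len(inbound_links)


# Four webmasters link to the same page, each at a different URL.
split = [
    "http://example.com/page",
    "http://www.example.com/page",
    "http://example.com/page/",
    "http://example.com/page?ref=home",
]

# The same four webmasters, if the page only existed at one URL.
consolidated = ["http://www.example.com/page"] * 4

print(strongest_share(split))         # 0.25
print(strongest_share(consolidated))  # 1.0
```

Same four links either way. In the split case, no single URL holds more than a quarter of them.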
3: Indexing flip-flop
Finally, duplicate content leaves it up to search engines to guess which page to index and rank. Combine 3-4 copies of the same page at 3-4 different URLs with link dilution (see #2, above) and you’re kinda screwed. Search engines have no good way to know which of those pages should really show up in a search result.
If they rank the wrong page, and you then remove it, you lose your ranking. Or, as webmasters link to different copies of the page, various copies pop in and out of the rankings, never moving up.
Duplicate content sucks
Think carefully about how you’re linking within your web site. Don’t depend on SEO workarounds like rel=canonical to fix the problem later on. Address duplication pre-launch, pre-redesign. If it’s too late for that, address it right now. It’s the best SEO favor you can do for your site.
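One way to start is to list the internal links on your site and look for URLs that differ only in trivial ways. The sketch below does that with Python’s standard urllib.parse module. The normalization rules (ignore ‘www.’, ignore trailing slashes, ignore a short list of tracking parameters) are my assumptions, and so is the TRACKING_PARAMS list. Adjust them to match how your own site really builds URLs.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that never change the page content.
TRACKING_PARAMS = {"ref", "utm_source", "utm_medium", "utm_campaign"}


def normalize(url):
    """Reduce a URL to one assumed 'preferred' form for comparison."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Map each normalized URL to its variants, keeping only real groups."""
    groups = defaultdict(set)
    for url in urls:
        groups[normalize(url)].add(url)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


internal_links = [
    "http://example.com/page",
    "http://www.example.com/page/",
    "http://example.com/page?ref=home",
    "http://example.com/about",
]

for preferred, variants in duplicate_groups(internal_links).items():
    print(preferred, "is linked", len(variants), "different ways:", variants)
```

Every group it reports is a place where your own site links to one page at more than one URL. Pick one URL per page and link to it the same way everywhere.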