Duplicate content sin #2: Default page linking


Ian Lurie Oct 21 2010

Last week I wrote about duplicate content sin #1: screwy pagination. Today I’ll explain a much simpler but bigger problem: the inconsistent default page link.

When I say ‘default page’, I mean whatever page you’d first see if you navigated to a folder on a web site.

So the default page for Conversation Marketing (the whole site) can be found at www.conversationmarketing.com/. That’s the root folder – the main folder housing my whole site.

The default page for all of this month’s posts can be found at http://www.conversationmarketing.com/2010/10/. That’s the sub-sub folder /10/, in the sub-folder 2010, in the root folder for www.conversationmarketing.com:

[Figure: Conversation Marketing folder structure]

You can also find the default page for Conversation Marketing at http://www.conversationmarketing.com/index.htm. And you can find the default page for this month’s posts at http://www.conversationmarketing.com/2010/10/index.htm.

Web servers are set up to deliver a default file, such as index.htm, automatically when a visitor requests a folder. That’s why you don’t have to add ‘index.htm’ to these addresses.
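To see why both addresses return the same page, here’s a simplified Python sketch of the lookup a web server does when it gets a folder request. The DEFAULT_FILES list and the resolve() function are hypothetical stand-ins: real servers read their default filenames from configuration (Apache’s DirectoryIndex directive, IIS’s default document list, and so on) and do a lot more than this.

```python
# Hypothetical default filenames, tried in order. A real server reads this
# list from its configuration.
DEFAULT_FILES = ["index.htm", "index.html", "default.aspx"]


def resolve(path, existing_files):
    """Return the file a server would deliver for a requested URL path."""
    if path.endswith("/"):
        # Folder request: deliver the first default file that exists.
        for name in DEFAULT_FILES:
            candidate = path + name
            if candidate in existing_files:
                return candidate
        return None  # no default page; a real server might 404 or list the folder
    # File request: deliver the file itself, if it exists.
    return path if path in existing_files else None


site = {"/index.htm", "/2010/10/index.htm"}
print(resolve("/", site))           # -> /index.htm
print(resolve("/index.htm", site))  # -> /index.htm
print(resolve("/2010/10/", site))   # -> /2010/10/index.htm
```

Two different requests, one file. That’s the whole problem in three lines of output.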

Inconsistency = Duplication

The problem arises when a developer or designer links to the same default page using different link styles in different places. For example, if your site has a ‘home’ link that points at ‘/index.htm’ or ‘default.aspx’ or whatever your default page is, you’ve created duplication:

  • Search engines and most people see your home page as www.yoursite.com. Most other sites link to you there, too.
  • But search engines crawling your site also see the link to www.yoursite.com/index.htm, and follow that link.
  • To a search engine, the ‘/index.htm’ page and the www.yoursite.com page are two separate pages with exactly the same content.
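Here’s a rough illustration of what a crawler ends up with. This Python sketch assumes you already have a crawl stored as a mapping of URL to page body, and it flags URLs whose bodies are byte-for-byte identical. It’s a simplification: search engines don’t publish exactly how they detect duplicates, and they catch near-duplicates too, not just exact copies.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose response bodies are byte-for-byte identical.

    pages maps URL -> response body (bytes), e.g. from a crawl of your site.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        groups[digest].append(url)
    # Only groups holding more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


home = b"<html><body>Welcome</body></html>"
crawl = {
    "http://www.yoursite.com/": home,
    "http://www.yoursite.com/index.htm": home,
    "http://www.yoursite.com/about/": b"<html><body>About</body></html>",
}
print(find_duplicates(crawl))
# -> [['http://www.yoursite.com/', 'http://www.yoursite.com/index.htm']]
```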

Voila. Duplication.

The same thing happens if you link inconsistently to subfolders in your site: for example, to /2010/10/ in one place and /2010/10/index.htm in another.

I won’t waste much time explaining what this does to your link profile. In short, it’s bad: links that should all point at one address get split between two.

The problem here is duplication. And, as we know, duplicate content sucks.

My rule

If you want to avoid this kind of problem, apply Ian’s Rule of Simplicity: Always use the shortest version of any default page’s address. That version should typically be www.yourdomain.com plus the folders, with no filename. For example, link to www.yourdomain.com/2010/10/, not www.yourdomain.com/2010/10/index.htm.
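If your links are generated in code, the rule is easy to automate. Here’s a minimal Python sketch, assuming your server’s default filenames are the ones in the hypothetical DEFAULT_FILES set: canonical_link() strips a trailing default filename and leaves everything else alone.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default filenames; use whatever your server actually serves.
DEFAULT_FILES = {"index.htm", "index.html", "default.aspx"}


def canonical_link(url):
    """Return the shortest address for a page: domain + folders, no default filename."""
    parts = urlsplit(url)
    folder, _, filename = parts.path.rpartition("/")
    if filename in DEFAULT_FILES:
        # Drop the default filename but keep the trailing slash on the folder.
        path = folder + "/"
    else:
        # Not a default page; only fill in "/" for a bare domain.
        path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(canonical_link("http://www.yourdomain.com/index.htm"))
# -> http://www.yourdomain.com/
print(canonical_link("http://www.yourdomain.com/2010/10/index.htm"))
# -> http://www.yourdomain.com/2010/10/
```

Links that already point at a folder, or at a regular file like /about/contact.htm, pass through unchanged.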

Do that, and you’ll eliminate one huge duplication problem. The best part: most of your default page links will be in your navigation. If your site was built by a relatively sane person, you can make one change to your site template and fix a site-wide duplication issue. Woo hoo!
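Before you touch the template, it helps to find the offending links. This Python sketch uses the standard library’s html.parser to list every <a href> in a chunk of HTML that ends in a default filename. The template snippet and the DEFAULT_FILES tuple are made-up examples; swap in your own navigation markup and your server’s default filenames.

```python
from html.parser import HTMLParser

# Hypothetical default filenames; adjust to match your server.
DEFAULT_FILES = ("index.htm", "index.html", "default.aspx")


class DefaultPageLinkFinder(HTMLParser):
    """Collect <a href> values that end in a default filename."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any fragment or query string before checking the filename.
        path = href.split("#", 1)[0].split("?", 1)[0]
        if path.rsplit("/", 1)[-1] in DEFAULT_FILES:
            self.offenders.append(href)


template = """
<nav>
  <a href="/index.htm">Home</a>
  <a href="/2010/10/">This month</a>
  <a href="/archive/default.aspx">Archive</a>
</nav>
"""
finder = DefaultPageLinkFinder()
finder.feed(template)
print(finder.offenders)  # -> ['/index.htm', '/archive/default.aspx']
```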

By the way, this is also considered a canonicalization problem. I’ll never stop ranting about canonicalization – you know that, right?

I’ve been writing up a storm this week, so no fancy conclusions or funny animal pictures. Bye.



3 Comments

  1. Hi Ian
    I recently upgraded our in-house CMS to make it more SEO-friendly out of the box.
    I am using the URL rewrite in IIS7 to take a fake page name (jit_default_794.Website_Design.html ) and interpret it into something like default.asp?id=794 for the CMS to serve content.
    I did this so that I could use the page name in the URL. Is this a waste of time, especially for pages in folders?

  2. Ian

    @Carl that’s a more complicated question. I’m a little afraid to try answering it here. First, it’d make more sense to create human-readable URLs with rewriting. Second, it depends on your folder structure, the purpose of your site and your business goals.

  3. Oops… I have a few internal links I need to check! Thanks Ian :-)
