10 SEO tips for publishers

Ian Lurie Jun 16 2010

I’m speaking at the AABP Summer Conference in 10 days, and am very excited about it. Online publishing and SEO for publishers are two of my favorite topics. My presentation will cover more/different stuff, but these 10 tips always come up when I’m working with a publisher.

Note that these are site-wide configuration tips for publishers, not their editorial teams. I’ll get to those tomorrow.

1: Set up Google, Bing and Yahoo! Webmaster Tools.

Yawn, I know, but you can get amazing data back from these tools, including a list of broken links from other sites. Redirect those and you get an instant burst of link love. In the example below, the two sites outlined in red incorrectly linked to Conversation Marketing. If I 301 redirect those links to the correct page, I’ll gain the ‘votes’ those sites are trying to send my way:

Bad links you can redirect in Google Webmaster Tools

2: Don’t archive content.

I know you want to, but don’t move old content into a separate ‘archives’ folder on your site. Say you move http://www.mybigsite.com/articles/article1234.html to http://www.mybigsite.com/archives/article1234.html.

You’ve just broken every link from other sites to that article. Even if you 301 redirect them all, you’ve diminished the link authority.

You’ve also forced search engines to dump the old index of that page and grab a new one, thereby reducing crawl depth of your site.

Leave articles where they were created. Let your navigation take care of moving older stuff out of folks’ way.

3: Set ‘expires’ headers

After a story has died down, and you’re not getting new comments, set article pages to deliver a code of ‘304’ (‘Not Modified’) to browsers and search engine spiders that already have a copy. That will tell visiting search engines to leave the page as-is, and move on to newer stuff.

You’ll get deeper crawls of your site, and more pages of your site will show up in search results. That means more traffic on more terms, and more pageviews, and therefore more advertising revenue. No down side.
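
Strictly speaking, a ‘304’ is the answer to a conditional request: the browser or spider sends an If-Modified-Since header, and the server replies ‘304 Not Modified’ if the page hasn’t changed since that date. Here’s a minimal Python sketch of that decision. The function name conditional_status and its arguments are made up for illustration, and your server or CMS will have its own way of wiring this in.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Pick 304 or 200 for a page last changed at `last_modified` (UTC-aware).

    `if_modified_since` is the raw request header value, or None if absent.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: just send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers
    return 200, headers


last_change = datetime(2010, 6, 16, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status(last_change, "Wed, 16 Jun 2010 12:00:00 GMT")[0])
print(conditional_status(last_change)[0])
```

A visitor who already has the current copy gets the 304 with no page body. Everyone else gets the full page with a 200.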

4: Set up GZIP compression on your server

You may not know this, but your server can compress everything it sends to visiting browsers – html files, style sheets, images and scripts – for those browsers to decompress on the other end. That’s called GZIP compression, and you can use it on just about any reasonably up to date web server.

By doing so, you can speed server performance. A lot. On this teeny little blog it took page loads from a relatively sluggish 7-10 seconds down to 3-4 seconds.

On your really big site, where you get lots more traffic, it can cut bandwidth use by 30%, speed page loads, and yes, help search engines crawl more pages of your site in less time.
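
To get a feel for what the server is doing, here’s a rough Python sketch using the standard gzip module. The gzip_body function is a made-up name for illustration. On a real site you’d switch compression on in the web server’s configuration rather than write this yourself.

```python
import gzip


def gzip_body(body, accept_encoding=""):
    """Compress `body` only if the browser said it can handle gzip."""
    if "gzip" not in accept_encoding.lower():
        return body, {}
    compressed = gzip.compress(body)
    return compressed, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}


# A repetitive chunk of HTML, standing in for a real article page.
html = b"<p>Online publishing and SEO for publishers.</p>\n" * 200
compressed, headers = gzip_body(html, "gzip, deflate")
print(len(html), "->", len(compressed), "bytes")
```

Text formats such as HTML, style sheets and scripts shrink the most. Formats that are already compressed, such as JPEG images, gain very little.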

5: Generate a complete XML sitemap

Don’t generate an XML sitemap that just shows the most recent 10-20 stories. Blech. You need every page on your site – every URL – in that sitemap!

That’s insane, of course. If you’re a small publication with, oh, 10,000 or so pages, you don’t want to regenerate the whole map every single time.

So instead, build your sitemap progressively: Generate multiple, smaller sitemaps and tie them together with a sitemap index. You can generate each of the smaller sitemaps from oldest to most recent. That way, you keep adding new stuff to the map, but keep the full map in place.
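
The progressive approach could look something like this in Python. It’s a sketch under assumptions: build_sitemaps, the file names and the chunk size of 1,000 are all invented for illustration. The sitemap protocol itself allows up to 50,000 URLs per file.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls, base_url, chunk_size=1000):
    """Split `urls` (oldest first) into small sitemaps plus one index.

    Returns a dict of {filename: xml_text}.
    """
    files = {}
    for start in range(0, len(urls), chunk_size):
        name = f"sitemap-{start // chunk_size + 1}.xml"
        chunk = urls[start:start + chunk_size]
        entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in chunk)
        files[name] = f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    # The index lists every small sitemap by its full URL.
    index_entries = "".join(
        f"  <sitemap><loc>{escape(base_url)}/{name}</loc></sitemap>\n" for name in files
    )
    files["sitemap-index.xml"] = (
        f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{index_entries}</sitemapindex>\n'
    )
    return files


urls = [f"http://www.mybigsite.com/articles/article{i}.html" for i in range(2500)]
files = build_sitemaps(urls, "http://www.mybigsite.com")
print(sorted(files))
```

Because the URLs go in oldest first, the full chunks never change. A new story only touches the last, partly filled sitemap and, when a new file starts, the index.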

6: Practice good image housekeeping

Put your images in a folder called something revolutionary, like ‘images’. Configure your content management system to give them sensible, human-readable names, like ‘ian-photo.jpg’, instead of ‘123asdf234q234asefs.jpg’. If you’re worried about keeping filenames unique, you can always have your CMS add a unique ID, so that you can have ‘ian-photo-61010.jpg’ and ‘ian-photo-61110.jpg’.

If your CMS folks tell you they can’t do that, either:

  1. They’re lying; or
  2. The CMS you’re using was written with sledgehammers and chisels and engraved into stone tablets. It will soon die, taking all your content with it anyway.

Either way, change is in order.
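
For the record, the naming rule is only a few lines of code. Here’s a Python sketch. The function image_filename and its arguments are invented for illustration, and a real CMS would pull the title and ID from its own database.

```python
import re


def image_filename(title, unique_id, ext="jpg"):
    """Turn an image title into a readable, unique file name."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "image"
    return f"{slug}-{unique_id}.{ext}"


print(image_filename("Ian Photo", 61010))  # ian-photo-61010.jpg
print(image_filename("Ian Photo", 61110))  # ian-photo-61110.jpg
```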

Also, put your images in your XML sitemap. This is a new thing, but Google just said they’d like image files in there.

Why do all this? Because images show up in search results. If you have the first picture of some famous person beaming over their new baby, you want that photo ranking, right? Especially since photos often occupy the top spots, and it’s a lot easier to rank for a photo than a text page:

elephant baby search

7: Set up the right redirects

Make sure everyone sees a single URL for your home page. You can ensure that by redirecting from ‘mybigsite.com’ to ‘www.mybigsite.com’, and ‘www.mybigsite.com/whateveryourhomepageis.html’ to ‘www.mybigsite.com’.

Use a 301 redirect, by the way.

And then do the same thing for all your category pages.

These redirects will keep everyone going to, and linking to, the same place. Search engines will see all of those links pointing at the same place, too, and voila: Canonical bliss.
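
Here’s a small Python sketch of those home page rules. The host and file name come from the example above, and canonical_redirect is a hypothetical helper. In practice you’d usually put these rules in your web server’s configuration.

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "mybigsite.com"
CANONICAL_HOST = "www.mybigsite.com"
HOME_ALIASES = {"/whateveryourhomepageis.html"}


def canonical_redirect(url):
    """Return (301, target) if `url` should redirect, or None if it is canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    new_host = CANONICAL_HOST if host == BARE_HOST else host
    new_path = "/" if path in HOME_ALIASES else path
    if (new_host, new_path) == (host, path):
        return None
    target = urlunsplit((parts.scheme, new_host, new_path, parts.query, parts.fragment))
    return 301, target


print(canonical_redirect("http://mybigsite.com/"))
print(canonical_redirect("http://www.mybigsite.com/whateveryourhomepageis.html"))
print(canonical_redirect("http://www.mybigsite.com/"))
```

The first two calls both point at http://www.mybigsite.com/ with a 301. The third is already canonical, so nothing happens. Category pages would get their own entries in the same way.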

8: When it’s a 404, deliver a 404

Make sure that your server sends a ‘404’ error code when someone visits a page that doesn’t exist.

Many large site developers set up their systems to reroute visitors to a ‘Sorry, page not found’ page, but have the system deliver a ‘200’ code instead. ‘200’ tells a web browser or search engine that all is well, and index this page, if you please. ‘404’ tells a web browser or search engine ‘OMG! This is all wrong! Forget it!’

If you deliver a 200 code, then search engines will index that ‘page not found’ page again, and again. And even again. That’ll get you hundreds, or thousands (the record in my experience was 103,000) of duplicate pages in the search index.

That, in turn, means search engine spiders waste time on your site. Instead of crawling juicy, real content, they crawl your fake page not found content.

Deliver the 404 code. Test it using Live HTTP Headers or some such.

You can read how to set up a 404 error page in my older post over here.
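
To make the difference concrete, here’s a tiny Python WSGI app that shows a friendly ‘page not found’ message but still sends the 404 code. It’s a sketch only: KNOWN_PAGES, app and status_for are hypothetical names, and a real site would look pages up in its CMS.

```python
# Hypothetical stand-in for the pages that really exist on the site.
KNOWN_PAGES = {
    "/": b"<h1>Home</h1>",
    "/articles/article1234.html": b"<h1>Article 1234</h1>",
}


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a real 404."""
    body = KNOWN_PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        status = "404 Not Found"  # the friendly page still carries the error code
        body = b"<h1>Sorry, page not found</h1>"
    else:
        status = "200 OK"
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(path):
    """Call the app directly and report the status line it sends."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response)
    return captured["status"]


print(status_for("/articles/article1234.html"))
print(status_for("/no-such-page.html"))
```

The same kind of check works against a live site: request a URL you know doesn’t exist and look at the status line, not at what the page says.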

9: Learn how 1st click free works

If you really, absolutely have to use a registration or pay wall, and want to force people to give you their information/money before they read your stuff, make sure your site complies with 1st click free.

1st click free lets visitors from search engines see the first page they land on without registration or payment. If you comply, then Google will crawl and index all of your content, even if it’s behind a registration/pay wall.

You can read up on 1st click free on the Google Webmaster Blog.

10: Be careful with third-party systems

Don’t publish comments to your site using a JavaScript-driven system. That hides the comments from search engines.

Use a third-party video host, by all means. But don’t use their default pop-up players. With those, the videos ‘live’ on pages hosted by the video host. Instead, embed the videos right in pages on your site. Then you can publish a video sitemap and get the ranking juice you deserve for creating all that video content.

All of these vendors of comment tools, ‘integrated’ social media tools and video hosting will wax rhapsodic about all the great SEO stuff they do for you. They. Are. Full. Of. Crap. Ask them one question:

“Will the comments/videos appear on a page that is under ‘www.insertyourdomainnamehere.com’?”

If they can actually stutter out ‘yes’ with a straight face, make them put it in writing, with a signature and the exact URLs at which they’ll show up.

insertyourdomainnamehere.vendordomain.com is not the same thing. Neither is video.insertyourdomainnamehere.com. Do not compromise on this, or you’ll be losing tons of great content, and helping the vendor compete with you.

If the content isn’t on your site, under your domain, then it doesn’t count.

These are the easy changes

After these changes, you get down to editorial stuff: Article-by-article changes and person-by-person coaching and training. You have to do it. But why not do the easy stuff, first? Make the 10 changes above and you’ll have a huge head start.

Or don’t, and end up like Gourmet Magazine.

10 Comments

  1. Pranav Shah

    Hi Ian,
    I have a question about the complete sitemap. I am in the process of building a site and most of the content is product details. One of my major concerns is that by providing a complete sitemap, am I not making it easy for a scraper program (e.g. wget) to steal all of my content?
    Thanks,
    Pranav Shah

  2. Good advice, Ian. :-)
    Regarding point 3:
    “You’ll get deeper crawls of your site, and more pages of your site will show up in search results.”
    I don’t follow you here though. How can setting up a 304 give this result?

  3. I would definitely add an 11th! Do not duplicate content. If content appears in different sections of your site (by category and by tag for example), use canonicalisation. Have a look at http://www.mattcutts.com/blog/seo-advice-url-canonicalization/ – direct from the horse’s mouth!
    Also duplicating content across sites is a definite no-no. Changing the odd word here and there won’t cut it either. Keep it unique and ya be fine!

  4. Ian

    @Per A ‘304’ tells a search spider that the page hasn’t changed. The spider responds by NOT crawling that page again, so it has more time to crawl the rest of the site. The 304’d page stays in the index, since it’s still there, but the spider doesn’t have to use any crawl allotment for it.

  5. Great tips! I never knew 404 pages mattered before, or that archiving is a bad idea. Makes sense I guess. The only SEO trick I know would be backlinks.
    I also agree about the duplicating content. Still I see some blog posts get re-posted in another blog. I know that doesn’t count in the human sense, but will it matter to search engines as well?

  6. Ian

    @Pranav If someone wants your content that badly, they’ll find other ways to get it. There are dozens of programs that would let me scrape your site for the content with or without a site map. So I’d say don’t worry about it.

  7. Ian

    @Megan It generally makes no difference. You might get backlinks from folks stealing your content like that, but they’ll likely be low-value. And you won’t suffer any penalties for it.

  8. Ian

    @Zadling Yes, that’s correct, as long as you have those modules installed. They come default with most Apache distributions, though – I’ve never had to install them separately. Add those lines and you’re good to go.

  9. Ben

    Great list for starters. I especially like that you included the 304 expires headers on old content. This is often missed and it’s so important for large publications.
