We’re all talking a lot about content and social media these days. But visibility still boils down to technical SEO: ensuring that search engines can easily find and categorize every page of your site. These are the top four checks I run first when I’m auditing a site:
1: Check server response codes
You can use our server response code checker for starters. Just make sure your server is delivering a 404 for a broken link, a 200 for a page that’s just fine, and a 301 for a redirect. You can learn about server response codes in this post I wrote a couple of years ago.
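If you’d rather spot-check codes yourself, here’s a minimal stdlib-only sketch. The `raw_status` helper is my own name, not anything from a particular tool; it deliberately uses `http.client` instead of `urllib.request` so that a 301 shows up as a 301 instead of being silently followed to its destination.

```python
import http.client
from urllib.parse import urlparse

def raw_status(url):
    """Fetch url and return the raw HTTP status code, without following redirects."""
    parts = urlparse(url)
    conn_class = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_class(parts.netloc)  # netloc may include a port, e.g. "example.com:8080"
    conn.request("GET", parts.path or "/")
    code = conn.getresponse().status
    conn.close()
    return code
```

Run it against a known-good page, a known-broken URL, and a known redirect; if you don’t see 200, 404, and 301 respectively, your server configuration needs a look.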
2: Seek and destroy duplicate content
Duplicate content hurts site quality and crawl efficiency. You need to get rid of it. We have our own crawler for testing this, but you can use Screaming Frog SEO Spider. Distilled has a fantastic article that includes detailed instructions on using Screaming Frog to find duplicates.
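If you want a feel for what those tools are doing under the hood, here’s a rough sketch of the basic exact-duplicate check: normalize each page’s text, hash it, and group URLs that hash the same. (Real crawlers are smarter about near-duplicates; the function name and the `pages` input are just assumptions for illustration.)

```python
import hashlib
from collections import defaultdict

def find_duplicates(pages):
    """Group URLs whose normalized body text is identical.

    pages: dict mapping URL -> page text (e.g. collected by your crawler).
    Returns a list of URL groups, one per set of exact duplicates."""
    groups = defaultdict(list)
    for url, text in pages.items():
        # Collapse whitespace and lowercase so trivial formatting differences don't hide dupes
        normalized = " ".join(text.split()).lower()
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]
```

Exact hashing only catches byte-for-byte duplicates after normalization; Screaming Frog and similar crawlers also flag duplicate titles and meta descriptions, which catch a lot of near-misses.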
3: Find unreachable pages
Site owners find all sorts of ways to make pages on their site vanish. They orphan pages; they break links; they remove all possible ways of reaching a specific page. You need to find those.
There’s no one fantastic way I know to do this. But some of the tricks I use are:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned, so chances are those pages will still show up in your log file.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
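The first trick above (logs vs. crawl) boils down to a set difference. Here’s a rough sketch, assuming combined-format access logs; the function names are mine, not from any particular tool:

```python
import re

# Pulls the request path out of a combined-format access log line,
# e.g. '1.2.3.4 - - [10/Oct/2023] "GET /about HTTP/1.1" 200 512'
LOG_PATTERN = re.compile(r'"(?:GET|POST) (\S+) HTTP')

def urls_from_log(lines):
    """Extract the set of request paths seen in the access log."""
    urls = set()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            urls.add(match.group(1))
    return urls

def orphaned(log_urls, crawl_urls):
    """Paths visitors reached (per the logs) that your crawler never found:
    prime candidates for orphaned or blocked pages."""
    return sorted(set(log_urls) - set(crawl_urls))
```

Anything `orphaned` returns is a page people are reaching that your crawler can’t, which is exactly the list you want to investigate.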
4: Look for spider traps
Content management systems like WordPress have lots of extra little snippets of code they use to schedule tasks, deliver content via AJAX, handle searches and generate navigation. That’s fine, but if a search bot starts beating the poop out of some of these snippets, they can suck the life out of your server.
Here’s an example: A few weeks ago, RandFish was kind enough to tweet about a post we’d written on this very blog. That same day, I had an article go live on TechCrunch. As a result, we got about 5x our normal traffic. No big deal.
Unless, of course, you’ve already got GoogleBot rattling around between a WordPress AJAX script and a database scheduler every 15 seconds or so. Then your server coughs, sputters and flips over on its back, waiting for a tummy rub. It also locks up so badly that no amount of cursing or talking nice will get it to let you log in and fix it, by the way. In case you were wondering. And I know you were.
When I looked at our log files for the last month, I found two URLs that GoogleBot kept hitting: wp-cron.php and admin-ajax.php.
‘Kept hitting’ means ‘latched onto like a leech at a blood bank’. GoogleBot hit these files 4-5 times per minute.
We disallowed them, and voila: No more crashing server.
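For reference, the fix was a couple of lines in robots.txt along these lines (exact paths depend on your install; on a standard WordPress setup, admin-ajax.php lives under /wp-admin/):

```
User-agent: *
Disallow: /wp-cron.php
Disallow: /wp-admin/admin-ajax.php
```

One caveat: if your theme relies on admin-ajax.php to load content visitors actually see, test before blocking it, since bots may need it to render those pages.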
That was a classic spider trap: Pages or scripts no bot should find, but did.
Check your log file BEFORE your site crashes and you can avoid our embarrassment.
These four tips are just for starters. You can check for broken links, work on site speed and clean up your code, for example. But once the really easy stuff is addressed, the four ideas above should keep you busy for a while.
What do you all look for in a technical site audit?