A Quick & Dirty Google Index Diagnostic
This article outlines a simple diagnostic technique to assess the health of your site architecture.
For all the current attention being paid to Content (caps intentional), our ability to generate organic traffic still depends heavily on that content being found and indexed by a search engine’s crawlers.
This is a simple way to diagnose how well your site’s indexed using the data available in Google Search Console (which I still call Webmaster Tools):
What you’ll need:
- Webmaster Tools access. If you don’t have it, ask anyone with admin privileges. If it’s not set up for your site, here are instructions.
- A good sitemap. Specifically, a sitemap that includes everything worth finding on your site and nothing extra. If that isn’t available, or doesn’t exist, that’s your new first step.
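For reference, a bare-bones sitemap looks something like the sketch below. The URLs and dates are placeholders; the only requirement for this diagnostic is that every `<loc>` entry is a page you actually want found, and nothing else.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page worth finding; nothing extra. -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2016-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/some-post/</loc>
  </url>
</urlset>
```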
The basic process is to find the answers to two questions:
- How much of my sitemap (the things worth finding) has Google found?
- How much else (garbage that weighs down our site) has also been found?
We can tell a lot by looking at these two answers together. Let’s get started.
How much of my sitemap has Google found?
Finding the answer to this is straightforward. Expand the ‘Crawl’ option in Webmaster Tools’ side menu and select ‘Sitemaps.’ Once there, make sure that the ‘All’ tab is selected.
Google self-reports the number of pages you have submitted via your sitemaps and the number of those pages actually in its index. Find the percentage of your site indexed by dividing the pages indexed by the pages submitted. I call that the Sitemap Ratio.
How much ‘else’ has Google found?
Knowing the number of pages indexed from our sitemap, we’ll compare that to all the pages Google has indexed from our site. Expand the ‘Google Index’ option in the side menu and select ‘Index Status.’
Divide the pages indexed from our sitemap by Google’s total index of our site. I call that the Index Ratio.
The two ratios side by side provide a visual representation of your site’s presence in Google’s index.
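The arithmetic behind the two ratios can be sketched in a few lines. The page counts below are made-up example values; substitute the numbers you read off the Sitemaps and Index Status pages.

```python
# Counts read manually from Webmaster Tools (made-up example values).
pages_submitted = 1200   # 'Submitted' count on the Sitemaps page
pages_indexed = 1080     # 'Indexed' count on the Sitemaps page
total_indexed = 4500     # 'Total indexed' on the Index Status page

# Sitemap Ratio: how much of what we submitted has Google indexed?
sitemap_ratio = pages_indexed / pages_submitted

# Index Ratio: how much of Google's index of our site is our sitemap?
index_ratio = pages_indexed / total_indexed

print(f"Sitemap Ratio: {sitemap_ratio:.0%}")  # 90%
print(f"Index Ratio:   {index_ratio:.0%}")    # 24%
```

With these example numbers, the sitemap is well indexed (90%) but makes up only 24% of what Google holds — the Scenario 2 pattern described below.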
Interpreting your results
I’ll cover some basic scenarios in this section, but understand that these things exist on a spectrum. A small problem on one site may be a “Holy Sh**!” moment on another: The impact of a site’s infrastructure problems scales with its size and revenue, since losing out on 100 pages worth $10/month each is a tad different from losing 100 pages worth $10,000/month each.
First, the baseline: the ‘perfect’ result. 100% of your sitemap is indexed, and your sitemap accounts for 100% of indexed pages. Google is finding everything you intend it to, and nothing you don’t. This is the unreachable ‘perfect’ end of the spectrum, where no further improvement is possible.
Scenario 1: High Sitemap Ratio, High Index Ratio
The majority of your sitemap is indexed, and the majority of what has been indexed is in your sitemap.
Most of your unique contributions to the web are accessible and open to organic traffic, and your site isn’t weighed down by excess pages. This is the ‘healthy’ result.
Scenario 2: High Sitemap Ratio, Low Index Ratio
Most or all of your sitemap has been indexed, but Google has reported finding many more pages as well.
While your important pages are indexed, bloat may be hindering their performance. Double check your sitemap to make sure it’s comprehensive. Assuming the sitemap is up to snuff, you may have stumbled on a spider trap, or one of many other issues related to the accidental duplication or creation of pages.
Scenario 3: Low Sitemap Ratio, High Index Ratio
Only a fraction of your sitemap has been indexed, but that fraction is most of what Google has found on your site.
Bottom line: Google is having difficulty crawling and/or indexing your site.
Double check your robots.txt and meta-robots tags for conflicts with your sitemap. If that checks out, then you’ll likely want to seek expert help, as crawling the site is the best way to diagnose where the breaks are, and how to resolve them.
Depending on the underlying issue, fixing this may help both robots and humans navigate your site. No matter the cause, helping the search engines find your content will absolutely help you compete for organic traffic with your existing pages & content.
Scenario 4: Low Sitemap Ratio, Low Index Ratio
As above, only a fraction of your sitemap has been indexed, but Google has also found a lot of other, unintended pages.
This result indicates that there are serious issues with your site architecture.
Double check your sitemap for both omissions and unnecessary pages, and check your robots.txt and meta-robots tags for conflicts with the sitemap. A problem like this usually stems from multiple issues at once.
In this situation, you’ll very likely require expert assistance. The causes are usually nerdy, technical things like spider traps and other crawler-specific issues. They’re hard to spot and harder to pick apart.
That said, in this scenario you’ve also got the most to gain by resolving the problems in your architecture. In this condition, infrastructure is such a limiting factor that removing it as a roadblock can have enormous effects on your site’s performance.
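The four scenarios above can be summarized as a small lookup on the two ratios. This is only a sketch: the 0.8 cutoff for “high” is my own illustrative threshold, not something from the diagnostic itself, which treats these results as a spectrum rather than hard buckets.

```python
def diagnose(sitemap_ratio, index_ratio, threshold=0.8):
    """Map the two ratios onto the four scenarios.

    The 0.8 'high' cutoff is an arbitrary illustrative threshold;
    in practice these results sit on a spectrum.
    """
    high_sitemap = sitemap_ratio >= threshold
    high_index = index_ratio >= threshold
    if high_sitemap and high_index:
        return "Scenario 1: healthy"
    if high_sitemap and not high_index:
        return "Scenario 2: likely index bloat (check for duplicates, spider traps)"
    if not high_sitemap and high_index:
        return "Scenario 3: crawling/indexing trouble (check robots.txt, meta robots)"
    return "Scenario 4: serious site architecture issues"

print(diagnose(0.95, 0.30))  # Scenario 2: likely index bloat ...
```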
For more on how infrastructure problems can affect your entire digital marketing program, our Marketing Stack Explainer is a great primer.
If you’d like to go a little geekier, we recently published a Field Guide to SEO Spider Traps, which can cause a lot of the surface level symptoms we’ve seen here.
I hope you found the results of the diagnostic either reassuring or illuminating, and that if you found something to be concerned about, you’ve at the very least got a solid idea of the next questions to ask.
If you have questions about this method, leave a comment below:
Thanks for this fantastic breakdown. I really appreciate the work you folks do; very helpful!
I’m seeing something odd with my new test site (purely a discussion forum) and wondering how it relates — I’m getting organic search traffic, but Webmaster Tools says I have 0 indexed pages. I launched the site about a week ago in part to specifically test indexing of a fresh site, but I would have expected to see indexing rise before clickthrough rate. A site: search for my domain returns nearly all my pages. Am I totally misunderstanding the purpose of indexing status?
Hey Aimee, thanks for the reply.
Webmaster Tools’ data only updates about once a week, so it is perfectly possible that between one update and the next your site has been found and indexed.
Hovering your mouse over the line graph in the Index Status section will give you the dates of the data points. If you still don’t see anything in the index status after the next update, make sure that WMT is set up for the right version of your site. http and https versions are considered different sites altogether in WMT, and will not share data like index status or search traffic.
Sometimes having a high sitemap/low index ratio is kind of inevitable, isn’t it? With things like ecommerce sites which end up having a ton of pages with duplicate content, such as search pages, etc. Those pages are necessary, too, of course, so you can’t just get rid of them, but it wouldn’t make sense to put all of them in your sitemap, either. So in cases like that, not having the “ideal” ratio would be okay… I think?