Making cats into turnips: Using Google Webmaster Tools query data


Ian Lurie Jan 4 2012

A long time ago, alchemists tried to turn lead into gold. They were pikers. Compared to digging keyword data out of Google’s lovely (not provided) statistic, lead into gold woulda been a cinch.

But I’m stubborn. So, I’ve pulled a bunch of keyword analytics data from Google Analytics, and the Search Queries report from Google Webmaster Tools. I figured, if the data matched up reasonably well, I’d be all set.

Riiiiggghhhhhhhhhhhht.

It turns out comparing Google Analytics keyword data and Google Webmaster Tools data isn’t like comparing apples and oranges. Those are at least both round and fruit-like. It’s more like comparing cats and turnips.

The formula that works sorta works

I started this process assuming that if:

  1. I knew the reported Google Analytics organic traffic rga for a phrase;
  2. I knew the Google Webmaster Tools query clicks gw for the same phrase;
  3. I knew the (not provided) count np;
  4. And I knew the total Google Webmaster Tools clicks gwt across all queries.

Then I could get at least a rough idea of the actual Google Analytics clicks aga, like this:

aga = (gw/gwt)*np

Then, I can use rga as a gut-check to ensure that Google Webmaster Tools isn’t lying to me.
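Here's a rough Python sketch of the formula and the gut check. It assumes you've already copied per-phrase numbers out of both tools by hand. The function names and sample figures are hypothetical, `np_count` stands in for np, and gwt is treated as total GWT clicks across all queries, which is what makes gw/gwt the phrase's share of Google search clicks. The gut check is one possible reading of "GWT isn't lying": a phrase shouldn't show more reported GA clicks than GWT clicks.

```python
def estimate_hidden_clicks(gw: float, gwt: float, np_count: float) -> float:
    """Estimate how many (not provided) visits belong to one phrase.

    gw:       GWT clicks for the phrase
    gwt:      total GWT clicks across all queries (assumption)
    np_count: (not provided) visits reported by GA
    """
    if gwt <= 0:
        raise ValueError("gwt must be positive")
    # The phrase's share of GWT clicks, applied to the (not provided) pool
    return (gw / gwt) * np_count


def gut_check(rga: float, gw: float) -> bool:
    """Hypothetical sanity check: GA's reported clicks for a phrase
    should not exceed GWT's clicks for it, since GA loses some
    clicks to (not provided) and GWT does not."""
    return rga <= gw


# Hypothetical numbers for a single phrase
gw, gwt, np_count, rga = 120, 4_000, 900, 95
aga = estimate_hidden_clicks(gw, gwt, np_count)
print(f"aga = {aga:.1f}, gut check passed: {gut_check(rga, gw)}")
# prints: aga = 27.0, gut check passed: True
```

When the gut check fails, the estimate isn't trustworthy for that phrase. That's exactly the situation the next section runs into.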

But, as I discovered, there’s a lot of faith involved. And guessing. And gnashing of teeth.

Lost clicks

The problem with this method: It only works when there are more Google Webmaster Tools clicks than Google Analytics clicks. This formula finds missing clicks in Google Analytics, given a higher number of GWT clicks. If you switch things around, it becomes a crap shoot. And this is a problem that pops up a lot.

Google Webmaster Tools shows less traffic than Google Analytics

Seriously, Google?

If the Google Webmaster Tools data were 339% higher than the Google Analytics data, the difference might come out of (not provided). But in this case, the GWT data is lower.

I found a lot of these: about 35% of my search data. Soooooo, where’d those clicks come from? Why is Google Webmaster Tools undercounting in so many cases? Or is Google Analytics overcounting?
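You can measure how often this happens once the two exports are matched up by query text. This is a minimal sketch built on two hypothetical dictionaries that map each phrase to its click count. It assumes GWT's '<10' rows have already been dropped, since they have no exact count. Phrases missing from either report are skipped rather than guessed at.

```python
def share_gwt_below_ga(ga_clicks: dict[str, int], gwt_clicks: dict[str, int]) -> float:
    """Fraction of phrases (present in both reports) where GWT shows fewer clicks than GA."""
    common = ga_clicks.keys() & gwt_clicks.keys()
    if not common:
        return 0.0
    below = sum(1 for phrase in common if gwt_clicks[phrase] < ga_clicks[phrase])
    return below / len(common)


# Hypothetical exports, not real data
ga = {"phrase a": 40, "phrase b": 23, "phrase c": 15}
gwt = {"phrase a": 55, "phrase b": 9, "phrase c": 20}
print(f"{share_gwt_below_ga(ga, gwt):.0%} of phrases have fewer GWT clicks than GA clicks")
# prints: 33% of phrases have fewer GWT clicks than GA clicks
```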

My palms got sweaty.

I try some theories. One sticks.

My first theory: Google Webmaster Tools somehow failed to track clicks on videos that weren’t on YouTube or Google. That didn’t work out: Lots of non-video rankings in my data had the exact same problem.

Actually, my first first theory involved puppy sacrifice, but I figured Google couldn’t hide that for long, so I ruled it out.


My second theory: When both clicks and impressions get below 10 for a specific listing, Google Webmaster Tools goes cattywhumpus. It just can’t provide accurate data on really low-clickthru listings. Instead, GWT shows an oh-so-helpful <10 measurement. That’s fine if you’re simply ignoring low-volume phrases. But, if you’re trying to analyze the long tail, it really screws things up.
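One way to see why '<10' hurts long-tail analysis is to treat each of those cells as a range, not a number. This sketch assumes GWT clicks come through as strings like '110' or '<10', and that a '<10' cell means anywhere from 0 to 9 clicks. Both functions are hypothetical helpers. They add up a phrase's listings into a lower and upper bound, and each extra '<10' listing widens the gap.

```python
def parse_clicks(cell: str) -> tuple[int, int]:
    """Return (low, high) bounds for one GWT clicks cell.

    Assumes a censored cell looks like '<10', meaning 0..9 clicks.
    """
    cell = cell.strip()
    if cell.startswith("<"):
        limit = int(cell[1:])
        return 0, limit - 1
    value = int(cell)
    return value, value


def phrase_click_bounds(cells: list[str]) -> tuple[int, int]:
    """Sum the bounds across every listing that got clicks for one phrase."""
    low = high = 0
    for cell in cells:
        cell_low, cell_high = parse_clicks(cell)
        low += cell_low
        high += cell_high
    return low, high


# Hypothetical: one exact listing vs. five '<10' listings
print(phrase_click_bounds(["110"]))      # (110, 110)
print(phrase_click_bounds(["<10"] * 5))  # (0, 45)
```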

I need to emphasize this isn’t a keyword thing. It’s a listing thing. Here’s an example: For the phrase ‘Google Analytics Tutorials’, I see 170 impressions and fewer than 10 clicks in Google Webmaster Tools’ query report. In Google Analytics, I see 23 clicks for the same time period:

Google Webmaster Tools vs. Google Analytics for 'tutorials'

Sigh. Can't we all just get along?

The problem isn’t the phrase. I have other phrases that get few impressions and clicks, but still show accurate data. They show that accurate data because most of the clicks come from a single search listing. So instead of 5 listings all getting <10 clicks, I see one listing that gets, say, 110 clicks:


Fewer URLs means more accurate data for low-volume terms

While that’s higher volume, it’s not exactly a high-traffic phrase. The difference is the traffic distribution: There’s more traffic coming from fewer search listings.
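You can boil traffic concentration down to a single number. This sketch uses a made-up metric that GWT doesn't report: the minimum share of a phrase's clicks that come from listings with exact counts. It assumes each '<10' listing could hold up to 9 clicks. A phrase whose clicks all land on one exactly counted listing scores 1.0. A phrase spread across nothing but '<10' listings scores 0.0.

```python
def exact_share(exact_clicks: list[int], censored_listings: int, max_censored: int = 9) -> float:
    """Worst-case fraction of a phrase's clicks that GWT counts exactly.

    exact_clicks:      click counts for listings GWT reports as numbers
    censored_listings: number of listings GWT reports as '<10'
    max_censored:      assumed upper bound per '<10' listing
    """
    known = sum(exact_clicks)
    worst_case_total = known + censored_listings * max_censored
    if worst_case_total == 0:
        return 0.0
    return known / worst_case_total


print(exact_share([110], 0))           # 1.0: concentrated on one listing
print(exact_share([], 5))              # 0.0: spread across five '<10' listings
print(round(exact_share([40], 2), 2))  # 0.69
```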

So, if you have a term that gets you lots of clicks via lots of different listings, you may be out of luck. In the example below, if the ‘<10’ listings averaged about 4-5 clicks each, that would explain the whole difference:

Webmaster tools data for a spread, long-tail term
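You can also run the "4-5 clicks per listing" reasoning backwards. Subtract the clicks GWT counts exactly from the GA clicks, then divide what's left by the number of '<10' listings. If the answer falls between 0 and 9, censoring alone could explain the gap. This is a minimal sketch with hypothetical inputs. It assumes GA's count for the phrase is accurate, and that a '<10' cell can hold up to 9 clicks.

```python
from typing import Optional


def implied_censored_average(ga_clicks: int, exact_gwt_clicks: int, censored_listings: int) -> Optional[float]:
    """Average clicks each '<10' listing would need to close the GA/GWT gap.

    Returns None when there are no censored listings to absorb the gap.
    """
    if censored_listings == 0:
        return None
    return (ga_clicks - exact_gwt_clicks) / censored_listings


def gap_explained_by_censoring(avg: Optional[float], max_censored: int = 9) -> bool:
    """True if the implied average fits inside the assumed 0..9 range of a '<10' cell."""
    return avg is not None and 0 <= avg <= max_censored


# Hypothetical phrase: GA shows 58 clicks; GWT shows 30 exact clicks plus six '<10' listings
avg = implied_censored_average(58, 30, 6)
print(f"{avg:.2f} clicks per '<10' listing; explained: {gap_explained_by_censoring(avg)}")
# prints: 4.67 clicks per '<10' listing; explained: True
```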

Despair, thou art Google Query Reports

What to do?

I’m no mathematician. And my brain’s been in sugar-stoned holiday mode for two weeks. So the first thing is, check my calculations. Make sure I’m not missing something totally obvious.

Next, understand that, at least until someone comes up with a better solution, you aren’t going to be able to get accurate Google Webmaster Tools data for mid- or long-tail terms that send clicks to your site at several different URLs.

There is a bright side, though: This explanation makes sense, and I couldn’t find any examples where it failed. So, you can at least use the formula to calculate search phrase traffic for phrases with higher traffic concentration among fewer listings. In my case, I recovered about 75% of the data I’d lost to (not provided), taking my data loss from 20% of keywords to about 5%.
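The recovery figures check out with simple arithmetic. If (not provided) hid 20% of keywords and the formula recovers about 75% of that, the remaining loss is 20% × 25% = 5%. This tiny sketch uses only those two numbers, and the function name is illustrative:

```python
def remaining_loss(initial_loss: float, recovered_fraction: float) -> float:
    """Share of keyword data still lost after recovering part of (not provided)."""
    if not 0 <= recovered_fraction <= 1:
        raise ValueError("recovered_fraction must be between 0 and 1")
    return initial_loss * (1 - recovered_fraction)


print(f"{remaining_loss(0.20, 0.75):.0%}")  # 5%
```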

What I’d ask for

If I could get anything from Google in the next 12 months, it’d be a Google Webmaster Tools search queries report that can handle wider query distribution among more listings.

Google? Any chance that’ll happen…?

Please?

If you want a more general, and accurate, look at the effects of (not provided) on your data, Avinash wrote a great how-to article. It’s not keyword data, but it’s great insight.


15 Comments

  1. Thanks Ian. Coming out of a bit of a sugar/food/drink coma myself, I’m not so sure I should be reading this just yet. So please accept my 2 cents for what it is, approx 1.25 cents (at best).

    1) The tools have two different purposes and two different intentions. Does one expect them to be more in concert? Yes, probably. On the other hand, to some extent, the contrast itself becomes a third perspective. Perhaps?

    2) In any case, I’m not so sure either should be taken as gospel. As I’m sure we’d agree both have exceptions, quirks, oddities, etc. At 50,000 feet they’re great. At 25,000 feet still very helpful. However, the closer you get to ground level – as you’ve noted – the more misleading they can be.

    Live without these tools? No way! But over-reliance and over-trusting them can get a bit sketchy, yes?

    • Ian Lurie

      All true. But in that case, should Google be steering unwitting business owners to the Webmaster Tools Queries report? I don’t think so.

  2. Thanks for the post! Although as the percentage of (not provided) visits increases, we are less able to get real insight into this :/

    From what I’ve seen, there seems to be a limit on the amount of data collected each day, specifically on the number of queries/landing pages. If you view this data in Google Analytics instead of Webmaster Tools (in the Traffic Sources->Search Engine Optimization->Queries report), you can access three months of data and see the data for one specific day. I’m getting no more than 1000 different queries/landing pages for any given day, while the number of keywords bringing traffic is over 20000 for that same day (the same goes for the landing pages report). I’m not 100% sure 1000 is the limit, because it’s always a number under that (900-something), but I think Google only shows specific data for the first 1000 queries/keywords…

    Does anyone else have more info about this?

    Thanks in advance!

    • Ian Lurie

      I’m hoping that one of these days, Google will make this data available via API. That would likely get rid of the 1000 query limit.

      Although for most purposes, the first 1000 queries are likely enough.

  3. Ian, the problem is in the Google Analytics attribution model. If you see 50 visits from a particular phrase in GA, only one visit could actually come from Google and 49 directly (or from an e-mail, a desktop Twitter app, etc.).

    • Ian Lurie

      Hi Marek – can you explain that a bit more?

      • The __utmz cookie expiration is set (by default) to 6 months. So, if a visitor comes to your site through a Google organic search, the __utmz cookie gets updated to reflect that the visitor came from Google organic using a keyword ‘x’ (or not provided). Then, if that visitor enters your site directly every day for the next six months, those visits count as Google organic visits from that keyword.

        So, having 100 visits from the keyword ‘x’ in GA doesn’t necessarily mean that there were 100 visits that came from Google using that keyword.

        Depending on the site, it’s usually a good idea to set the expiration of this cookie to a shorter period of time (3 days, for example), but that depends on how many days your visitors take to convert.

        • Ian Lurie

          Totally true. But when 25% of your keyword data just vanishes, it’s going to take a big bite out of all of your data. We can at least look at return visitor counts to try to estimate how much trouble the cookie length is causing, right?

        • Wow, I’ve often encountered this issue but couldn’t come up with such a detailed explanation. Indeed, I often look twice at GA referrer data due to this bug. I check the geographic location and how many of the referred visits were new visits, to make sure unique users were involved and not the same person showing up several times.

          This bug is IMHO the reason why your GA and GWT don’t match.

  4. Hey Ian, did you try and match organic clicks to image search instead of web? It has been my experience, time after time, that Google’s Image Search will usually account for most of the discrepancy in reported Organic Google Search traffic in Google Analytics. GWT may only report Web searches and not Image search volume.

    For instance, WebRanking.com gets a steady stream of traffic for the keywords “portland”, “downtown portland” and similar from Google’s image search, but I’ve never found us listed in the top 50 for most (including those main two) of these keywords in the web search.

    Yep. Just checked again and still the same. And there are those same keywords appearing, in smaller volume, listed under images.google.com under Search Engines in GA.

    Look into those and you might find listings that may account for your own discrepancies. Might:)

    • Ian Lurie

      Hi James – yep I tested that. It’s very inconsistent, but usually the missing searches are on the GA side for me, not the GWT side. So that only makes the discrepancies I’m finding worse.

  5. Hi Ian,

    I enjoyed your thought process on bridging data b/w GA and Webmaster Tools. I am going to take this into consideration as I expand out our “Not Provided” report in our Web Presence Software. I am the Co-founder and CTO at gShift Labs.

    The approach we took at gShift was to combine our clients’ ranking data for keyword phrases – including long-tail phrases. Because gShift knows the exact page that is ranking for a phrase, we are able to correlate that data with the client’s Google Analytics. When we look at the entry / landing pages for any “Not Provided” keyword, we reverse-engineer that to show what phrases that page is ranking for. And we show you what phrases are ranking in the top 10 in Google. Thus we can provide some clarity around Not Provided.

    I am going to think about how we can correlate the Google Webmaster data so that it can all work together to provide an even clearer understanding. Thanks for your post.

  6. That is true, Ian. As an author and businessman, I can relate to how you said, “It turns out comparing Google Analytics keyword data and Google Webmaster Tools data isn’t like comparing apples and oranges”. I hope more people discover your blog because you really know what you’re talking about. Can’t wait to read more from you!

  7. Thanks for the article Ian! We average around 17-18% for (not provided) data, so finding ways to recover lost data is pretty important. Going off of what Mark said above, Webmaster Tools is very inaccurate for low-traffic sites, especially when the majority of the queries have <10 clicks. :C

Comments are closed.