Making cats into turnips: Using Google Webmaster Tools query data
A long time ago, alchemists tried to turn lead into gold. They were pikers. Compared to digging keyword data out of Google’s lovely (not provided) statistic, lead into gold woulda been a cinch.
But I’m stubborn. So, I’ve pulled a bunch of keyword analytics data from Google Analytics, and the Search Queries report from Google Webmaster Tools. I figured, if the data matched up reasonably well, I’d be all set.
It turns out comparing Google Analytics keyword data and Google Webmaster Tools data isn’t like comparing apples and oranges. Those are at least both round and fruit-like. It’s more like comparing cats and turnips.
The formula that sorta works
I started this process assuming that if:
- I knew the reported Google Analytics organic traffic rga for a phrase;
- I knew the Google Webmaster Tools query clicks gw for the same phrase;
- I knew the (not provided) count np;
- And I knew the total Google Webmaster Tools clicks gwt for that phrase.
Then I could get at least a rough idea of the actual Google Analytics clicks aga, like this:
aga = (gw/gwt)*np
Then, I can use rga as a gut-check to ensure that Google Webmaster Tools isn’t lying to me.
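Here's a minimal Python sketch of that formula. The function names and sample numbers are made up for illustration, and the gut-check is just one reading of the idea: the estimate is only worth trusting when Google Webmaster Tools reports at least as many clicks as Google Analytics does.

```python
def estimate_actual_clicks(gw: float, gwt: float, np_count: float) -> float:
    """Apply aga = (gw / gwt) * np, using the post's variable names."""
    if gwt <= 0:
        raise ValueError("gwt (total GWT clicks) must be positive")
    return (gw / gwt) * np_count


def formula_applies(gw: float, rga: float) -> bool:
    """Gut-check (an assumption): GWT clicks should be at least the reported GA clicks."""
    return gw >= rga


# Hypothetical numbers, not taken from any real report.
aga = estimate_actual_clicks(gw=100, gwt=400, np_count=1000)
print(aga)                               # 250.0
print(formula_applies(gw=100, rga=60))   # True
```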
But, as I discovered, there’s a lot of faith involved. And guessing. And gnashing of teeth.
The problem with this method: It only works when there are more Google Webmaster Tools clicks than Google Analytics clicks. This formula finds missing clicks in Google Analytics, given a higher number of GWT clicks. If you switch things around, it becomes a crapshoot. And this is a problem that pops up a lot.
If the Google Webmaster Tools data were 339% higher than the Google Analytics data, the difference might come out of (not provided). But in this case, the GWT data is lower.
I found a lot of these: about 35% of my search data. Soooooo, where’d those clicks come from? Why is Google Webmaster Tools undercounting in so many cases? Or is Google Analytics overcounting?
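If you want to measure how often this happens in your own data, a rough sketch like the one below would do it. It assumes you've already exported both reports and joined them by phrase; the rows here are hypothetical placeholders, not my numbers.

```python
def undercounted(rows):
    """Return the phrases where GWT reports fewer clicks than GA."""
    return [phrase for phrase, gw, rga in rows if gw < rga]


# Hypothetical (phrase, GWT clicks, GA clicks) rows, for illustration only.
rows = [
    ("phrase a", 120, 80),
    ("phrase b", 9, 23),
    ("phrase c", 45, 45),
    ("phrase d", 300, 210),
]

flagged = undercounted(rows)
share = len(flagged) / len(rows)
print(flagged, share)  # ['phrase b'] 0.25
```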
My palms got sweaty.
I try some theories. One sticks.
My first theory: Google Webmaster Tools somehow failed to track clicks on videos that weren’t hosted on YouTube or Google. That didn’t work out: Lots of non-video rankings in my data had the exact same problem.
Actually, my first first theory involved puppy sacrifice, but I figured Google couldn’t hide that for long, so I ruled it out.
My second theory: When both clicks and impressions get below 10 for a specific listing, Google Webmaster Tools goes cattywhumpus. It just can’t provide accurate data on really low-clickthru listings. Instead, GWT shows an oh-so-helpful <10 measurement. That’s fine if you’re simply ignoring low-volume phrases. But, if you’re trying to analyze the long tail, it really screws things up.
I need to emphasize this isn’t a keyword thing. It’s a listing thing. Here’s an example: For the phrase ‘Google Analytics Tutorials’, I see 170 impressions and fewer than 10 clicks in Google Webmaster Tools’ query report. In Google Analytics, I see 23 clicks for the same time period:
The problem isn’t the phrase. I have other phrases that get few impressions and clicks, but still show accurate data. They show that accurate data because most of the clicks come from a single search listing. So instead of 5 listings all getting <10 clicks, I see one listing that gets, say, 110 clicks:
While that’s higher volume, it’s not exactly a high-traffic phrase. The difference is the traffic distribution: There’s more traffic coming from fewer search listings.
So, if you have a term that gets you lots of clicks via lots of different listings, you may be out of luck. In the example below, if the ‘<10’ listings averaged about 4-5, that would explain the whole difference:
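To put rough numbers on that idea, here's a small sketch that turns a set of reported listing clicks into a possible range. It assumes a ‘<10’ bucket can hide anything from 0 to 9 clicks, which is my guess at what the label means, and the five-listing input is hypothetical.

```python
def click_bounds(reported):
    """Return (low, high) total clicks for a mix of exact counts and '<10' buckets."""
    low = high = 0
    for value in reported:
        if value == "<10":
            # Assumption: the bucket covers 0 through 9 clicks.
            high += 9
        else:
            low += int(value)
            high += int(value)
    return low, high


# Hypothetical: five listings for one phrase, all reported as '<10'.
low, high = click_bounds(["<10"] * 5)
print(low, high)           # 0 45
print(low <= 23 <= high)   # True: 23 GA clicks would fit inside that range
```

Five listings averaging 4-5 clicks each would land at 20-25 total, comfortably inside those bounds, while GWT would still show ‘<10’ for every one of them.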
What to do?
I’m no mathematician. And my brain’s been in sugar-stoned holiday mode for two weeks. So the first thing is, check my calculations. Make sure I’m not missing something totally obvious.
Next, understand that, at least until someone comes up with a better solution, you aren’t going to be able to get accurate Google Webmaster Tools data for mid- or long-tail terms that send clicks to your site at several different URLs.
There is a bright side, though: This explanation makes sense, and I couldn’t find any examples where it failed. So, you can at least use the formula to calculate search phrase traffic for phrases with higher traffic concentration among fewer listings. In my case, I recovered about 75% of the data I’d lost to (not provided), taking my data loss from 20% of keywords to about 5%.
What I’d ask for
If I could get anything from Google in the next 12 months, it’d be a Google Webmaster Tools search queries report that can handle wider query distribution among more listings.
Google? Any chance that’ll happen…?
If you want a more general, and accurate, look at the effects of (not provided) on your data, Avinash wrote a great how-to article. It’s not keyword data, but it’s great insight.