Google Webmaster Tools Query Data is Worthless
The short version, if you want to skip me ranting like a lunatic: Google Webmaster Tools query data is, as far as I can tell, completely, 100% useless. It’s not a good ‘relative comparison.’ It’s so wrong that it might actually be a bad idea to use it at all.
Now, here’s the whole, tragic story:
I’m a pretty empirical guy. I like my numbers. They’re comforting.
So, when the realization dawns that data we all depend on is a big, fat chunk of steaming camel manure, I get a little…
This week, Google Webmaster Tools has me reaching for the Valium, after a little bit of math proved that its query data is said pile of recycled camel snacks.
How it happened
I built a tool that downloads Google Webmaster Tools web queries every night: A little bit of geeky goodness that made me smile, and let us archive query data for more than 90, 60 or 30 days, or 12 hours or whatever the next policy change brings.
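I won’t pretend this is the actual tool, but the archiving idea is simple enough to sketch. This minimal version assumes something else has already fetched one night’s query rows (the fetching step and the row keys "query", "clicks" and "impressions" are hypothetical). It stores each night under its own date, so history survives no matter how short Google’s rolling window gets:

```python
from datetime import date


def archive_snapshot(archive, snapshot_date, rows):
    """Merge one night's query rows into the archive, keyed by (date, query).

    archive: dict mapping (date, query) -> {"clicks": int, "impressions": int}
    rows: iterable of dicts with hypothetical keys "query", "clicks", "impressions"
    Re-running the same night overwrites that night's entries instead of duplicating them.
    """
    for row in rows:
        key = (snapshot_date, row["query"])
        archive[key] = {
            "clicks": int(row["clicks"]),
            "impressions": int(row["impressions"]),
        }
    return archive


def history(archive, query):
    """Return [(date, clicks), ...] for one query, oldest first."""
    return sorted((d, v["clicks"]) for (d, q), v in archive.items() if q == query)


if __name__ == "__main__":
    archive = {}
    # Hypothetical sample rows, not real client data.
    archive_snapshot(archive, date(2013, 9, 1), [{"query": "blue widgets", "clicks": "40", "impressions": "900"}])
    archive_snapshot(archive, date(2013, 9, 2), [{"query": "blue widgets", "clicks": 35, "impressions": 870}])
    print(history(archive, "blue widgets"))
```

Keying on (date, query) rather than appending rows means a re-run or a retry of the nightly job can’t double-count a day.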
Then I pulled all that data together and, in what may be the single stupidest idea ever, decided to compare the Webmaster Tools query data to analytics query data.
Here’s what I did:
- Picked 5 clients with overall organic traffic ranging from 3,000 visits/month up to 2,000,000 visits/month.
- Imported the Google Webmaster Tools (GWT – I’m sick of typing it) data into Excel.
- Dumped all terms with ‘<10’ clicks.
- Did the same with the analytics search data.
- Calculated the average ‘not provided’ impact for each client.
- Used that impact to attempt to adjust the GWT click numbers.
- Measured the percentage error by keyword.
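For the math-inclined, here’s roughly what those steps boil down to. This is a sketch under my own assumptions, not necessarily the exact spreadsheet formulas: the ‘not provided’ impact is taken as the share of all organic visits that analytics reports as "(not provided)", GWT clicks are scaled down by that share, and the error is measured against the analytics number. The function names are made up for illustration:

```python
from statistics import mean


def not_provided_share(analytics_visits):
    """Fraction of all organic visits whose keyword analytics hides as "(not provided)"."""
    total = sum(analytics_visits.values())
    return analytics_visits.get("(not provided)", 0) / total if total else 0.0


def percent_errors(gwt_clicks, analytics_visits, min_clicks=10):
    """Per-keyword absolute % error of adjusted GWT clicks vs. analytics visits.

    Assumption: the not-provided share is computed before dropping low-volume
    terms, and only keywords present in both sources are compared.
    """
    share = not_provided_share(analytics_visits)
    # Dump terms under the click threshold on both sides.
    gwt = {k: v for k, v in gwt_clicks.items() if v >= min_clicks}
    ga = {k: v for k, v in analytics_visits.items()
          if v >= min_clicks and k != "(not provided)"}
    errors = {}
    for keyword in gwt.keys() & ga.keys():
        # Scale GWT clicks down to the portion analytics can still attribute.
        adjusted = gwt[keyword] * (1 - share)
        errors[keyword] = abs(adjusted - ga[keyword]) / ga[keyword] * 100
    return errors


if __name__ == "__main__":
    # Hypothetical numbers, purely to show the arithmetic.
    gwt = {"blue widgets": 100, "red widgets": 50, "rare term": 5}
    ga = {"(not provided)": 100, "blue widgets": 60, "red widgets": 40}
    errors = percent_errors(gwt, ga)
    print(errors, mean(errors.values()))
```

With those made-up numbers half the visits are ‘not provided’, so 100 GWT clicks become 50 against 60 analytics visits (about a 16.7% error), and 50 become 25 against 40 (37.5%). The average of the per-keyword errors is the ‘average error’ figure.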
Oh. My. Gods. The best result was an average 40% error. The worst? A client using Omniture who had a whopping 149% average error. Here’s a histogram. It’s not pretty:
When I saw this result, I tried a few different things:
I pulled GWT query data that includes all queries, instead of just web queries. That made the result even worse. The average percent error rose: the best result was 45%, and the worst was over 170%.
Next, I checked data accuracy by date for a single domain. It turns out that GWT query data isn’t even consistently wrong. Over time, data accuracy for a single domain fluctuates wildly:
That’s one domain, measured every few days.
- There were just as many instances where GWT was too high as where it was too low. So this isn’t about GWT or the analytics tools consistently over- or under-counting. It’s random.
- I used clients in industries ranging from publishing to e-commerce.
- I was utterly sober while doing this math.
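If you want to run the ‘is it random?’ check and the single-domain, by-date check on your own data, a minimal sketch looks like this. It assumes you already have GWT clicks adjusted for ‘not provided’ and matching analytics visits, per keyword, for each snapshot date; the function names are hypothetical. Keeping the sign of the error shows whether GWT leans consistently high or low (a systematic bias you could correct for) or splits roughly evenly (noise you can’t):

```python
from datetime import date
from statistics import mean


def signed_errors(gwt_adjusted, analytics):
    """Per-keyword % error; positive = GWT too high, negative = GWT too low."""
    return {
        k: (gwt_adjusted[k] - analytics[k]) / analytics[k] * 100
        for k in gwt_adjusted.keys() & analytics.keys()
        if analytics[k] > 0
    }


def over_under_tally(errors):
    """Count keywords where GWT overshot vs. undershot analytics."""
    over = sum(1 for e in errors.values() if e > 0)
    under = sum(1 for e in errors.values() if e < 0)
    return over, under


def accuracy_by_date(snapshots):
    """snapshots: {date: (gwt_adjusted, analytics)} -> {date: mean absolute % error}."""
    result = {}
    for d, (gwt_adjusted, analytics) in sorted(snapshots.items()):
        errors = signed_errors(gwt_adjusted, analytics)
        if errors:  # skip dates with no overlapping keywords
            result[d] = mean(abs(e) for e in errors.values())
    return result


if __name__ == "__main__":
    # Hypothetical numbers, not client data.
    g = {"a": 120.0, "b": 80.0}
    a = {"a": 100, "b": 100}
    print(over_under_tally(signed_errors(g, a)))
    print(accuracy_by_date({date(2013, 9, 1): (g, a)}))
```

An over/under split near 50/50, plus per-date averages that jump around, is the pattern described above: no consistent direction, so no correction factor will fix it.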
OK, Google, or Avinash, or someone, answer me this:
- Am I just doing it wrong? Please tell me I’m just doing it wrong. Please?
- If I’m right, why even show this data to us? It has zero value as a relative measure. As an absolute measure it’s worth a bit less than an NSA privacy agreement.
- Again, if I’m right, where does this data come from? Inebriated gnomes? A bunch of Atari 2600s managed by chimps? Or are you rolling a pile of dice?
- If I measure a single keyword over time, am I going to see the same kind of randomness? I’m not sure I could take it, so I’ll refrain. Just curious.
With that, I’ll return to my room, sit down on the floor, grab my knees and rock gently to and fro while humming tunelessly in a minor key.