Ian Lurie // Mar 23 2010
This is a preview of one of the advanced topics in the soon-to-be-published Fat Free Guide to Internet Marketing. Read it and see what you’ll be getting.
I’ve occasionally written about log files, and why I love them. It’s not an entirely healthy relationship – they sit there like, well, logs, and I slave away, grooming them, combing out data and finding snarls where search engines got stuck. Do they appreciate it? Noooo. They’re the aloof cats of the online world.
But now and then, they give something back. This is one of those times. In this post, you’ll learn to use grep to crunch through web server log files and do some old-school analytics, such as finding all Googlebot hits on your site.
I’m a Linux/Mac OS X nerd. All the tools I use in this post are built into both. If you’re on Windows, you’ll need to install a tool like wingrep. If you want the full Linux command line in Windows, go for Cygwin. Not for the faint of heart, though.
You’ll need:
A computer with grep installed, one way or another.
A server log file.
A willingness to embrace the command line, if only for a few minutes.
Now, before we start, understand that grep is a command-line tool. That means leaving behind the nice, clickable interface we’ve all grown to love. It’s OK! The command line is your friend. It lets you do all sorts of magical stuff. It also won’t do unexpected things, like crash every other program on your computer, unless you really work at it. And it makes you seem more attractive to nearby geeks of the opposite (or same) sex. All important stuff.
Grep also happens to be the ultimate search tool if you’re trying to grab lines from a huge log file. That’s what we’re going to use it for.
Using grep is easy:
grep "pattern" filename
So, if you had a log file named ‘accesslog.log’, then
grep "Googlebot" accesslog.log
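would sift through the file and print every line that contains ‘Googlebot’. Note the quotes around the pattern. That’s how you type it, with plain straight quotes (") rather than the curly “smart” quotes a word processor inserts. The shell treats curly quotes as ordinary characters, so the search would come up empty.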
Add a redirect (a > followed by a file name) and your shell will put the results in a nice neat file for you:
grep "Googlebot" accesslog.log > googlebothits.log
The > googlebothits.log tells your computer “write the result of my grep search to a file called googlebothits.log”.
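If you want to try both commands before pointing them at a real log, here’s a minimal sketch. It assumes your server writes the Apache “combined” log format, and the three log lines are made up for illustration; accesslog.log and googlebothits.log are the file names used above.

```shell
#!/bin/sh
# Sketch: build a tiny sample log (hypothetical lines, assumed to be in
# Apache "combined" format), then search it with grep.
cat > accesslog.log <<'EOF'
66.249.66.1 - - [23/Mar/2010:10:00:01 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.10 - - [23/Mar/2010:10:00:05 -0700] "GET /blog/ HTTP/1.1" 200 8812 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_2)"
66.249.66.1 - - [23/Mar/2010:10:02:17 -0700] "GET /old-page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Print every line containing "Googlebot" to the screen.
grep "Googlebot" accesslog.log

# Same search, but the shell writes the matches to googlebothits.log.
# Careful: > overwrites googlebothits.log if it already exists.
grep "Googlebot" accesslog.log > googlebothits.log
```

Run it with sh from any scratch folder. The first grep prints the two Googlebot lines; the second leaves those same two lines in googlebothits.log and prints nothing.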
If you’re using a Windows tool, it may let you just click a button that says ‘write output to file’. Showoffs.
Also important: Grep is case-sensitive! See how I’m capitalizing the ‘G’ in ‘Googlebot’? That’s why.
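Here’s a quick way to see case sensitivity for yourself. It’s a sketch using two made-up log lines in a file arbitrarily named sample.log. It uses grep’s -c option, which prints a count of matching lines instead of the lines themselves, and also shows the -i option that switches case sensitivity off.

```shell
#!/bin/sh
# Sketch: grep matches case exactly unless you add -i.
# The two log lines are hypothetical; sample.log is an assumed file name.
printf '%s\n' \
  '66.249.66.1 - - [23/Mar/2010:10:00:01 -0700] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"' \
  '192.0.2.10 - - [23/Mar/2010:10:00:05 -0700] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0"' \
  > sample.log

# -c prints the number of matching lines instead of the lines themselves.
exact=$(grep -c "Googlebot" sample.log)
# grep exits with status 1 when nothing matches; "|| true" keeps a strict script going.
lower=$(grep -c "googlebot" sample.log || true)
# -i ignores case, so lowercase "googlebot" finds "Googlebot" again.
nocase=$(grep -ci "googlebot" sample.log)

echo "Googlebot: $exact, googlebot: $lower, googlebot with -i: $nocase"
# prints: Googlebot: 1, googlebot: 0, googlebot with -i: 1
```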
See the potential? Grep turns a log file into an instant database. You can search for:
And so on. Good stuff, in one little line of text.
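To give a feel for the “instant database” idea, here are a few one-line searches. This is a sketch that assumes the Apache combined log format, where the requested page sits inside the first pair of double quotes and the status code follows right after the closing quote. The log lines and the file name access.log are hypothetical.

```shell
#!/bin/sh
# Sketch of a few "database queries" against a log. Assumes Apache combined
# log format; the sample lines and the name access.log are hypothetical.
cat > access.log <<'EOF'
66.249.66.1 - - [23/Mar/2010:10:00:01 -0700] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
192.0.2.10 - - [23/Mar/2010:10:00:05 -0700] "GET /blog/ HTTP/1.1" 200 8812 "-" "Mozilla/5.0"
192.0.2.10 - - [23/Mar/2010:10:01:40 -0700] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
EOF

# Requests for one specific page. Including " HTTP" stops /blog/anything from matching.
grep '"GET /blog/ HTTP' access.log

# Requests that returned 404 Not Found: the status code follows the closing quote.
grep '" 404 ' access.log

# Every request from one IP address. -F treats the pattern as plain text,
# so the dots match only dots instead of "any character".
grep -F '192.0.2.10 ' access.log
```

Each of these can take a > filename on the end, just like the Googlebot search.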
Time to use grep.
Reminder: Grep is case-sensitive unless you add the -i option. I won’t go into that now. Just make sure you use capitals where necessary.
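Assuming your server log is named access.log (a hypothetical name, so substitute your own) and sits in your current directory, pulling the Googlebot hits can be sketched as one grep that writes googlebot_visits.txt, followed by a line count. So the script runs as-is, it first creates a small made-up sample log. Skip that part when you use a real log.

```shell
#!/bin/sh
# Sketch: pull Googlebot hits into googlebot_visits.txt.
# access.log is a hypothetical file name; this sample stands in for a real log.
cat > access.log <<'EOF'
66.249.66.1 - - [23/Mar/2010:09:58:44 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.10 - - [23/Mar/2010:10:00:05 -0700] "GET /blog/ HTTP/1.1" 200 8812 "-" "Mozilla/5.0"
66.249.66.1 - - [23/Mar/2010:10:02:17 -0700] "GET /blog/ HTTP/1.1" 200 8812 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Write every Googlebot line to googlebot_visits.txt (overwriting any old copy).
grep "Googlebot" access.log > googlebot_visits.txt

# Count the hits: wc -l prints how many lines it reads from standard input.
wc -l < googlebot_visits.txt
```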
See? No pain. You didn’t end the universe. Nor did you reduce your computer to a smoking heap.
Open googlebot_visits.txt in the tool of your choice. If it’s a small file, I use Excel. If it’s huge, I may use more grep commands, or import the whole thing into a database tool, so I can do the analysis. Here’s how my Googlebot report looks in Excel:
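For the “more grep commands” route, a sketch of a few follow-up searches on googlebot_visits.txt. It assumes that file holds Apache combined-format lines, and the four lines created here are made up so the script runs on its own. The pipe (|) sends one grep’s output into the next, and -v keeps only lines that do not match.

```shell
#!/bin/sh
# Sketch: narrowing a Googlebot report with more grep commands.
# Assumes Apache combined-format lines; this sample file is hypothetical.
cat > googlebot_visits.txt <<'EOF'
66.249.66.1 - - [23/Mar/2010:09:58:44 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1"
66.249.66.1 - - [23/Mar/2010:10:00:01 -0700] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
66.249.66.1 - - [23/Mar/2010:10:02:17 -0700] "GET /old-page.html HTTP/1.1" 404 512 "-" "Googlebot/2.1"
66.249.66.1 - - [23/Mar/2010:10:05:30 -0700] "GET /blog/ HTTP/1.1" 200 8812 "-" "Googlebot/2.1"
EOF

# Googlebot hits that ended in a 404 error.
grep '" 404 ' googlebot_visits.txt > googlebot_404s.txt

# Chain two greps: successful (200) fetches, minus requests for robots.txt.
grep '" 200 ' googlebot_visits.txt | grep -v "/robots.txt" > googlebot_pages.txt

# -c prints a count instead of lines: how often Googlebot asked for robots.txt.
grep -c "/robots.txt" googlebot_visits.txt
```

With this sample, googlebot_404s.txt ends up with the one /old-page.html line, googlebot_pages.txt with the / and /blog/ lines, and the last command prints 1.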
If this is gibberish, wait 24 hours. Tomorrow I’m going to write about how to read a log file.
You could use a commercial log analyzer, like Sawmill or Splunk. But those cost real money, and grep is free. Plus, all log analyzers make certain assumptions. If you’re a hardcore internet marketing nerd like me, you don’t like anyone making assumptions about anything. I want to see the raw log files and know I’m working with the real thing.
There’s another reason: once you work with something like grep, you get more comfortable using your computer to sift through mountains of data. That opens up options you may not have known you had. This post barely scratches the surface.
If this was too painful, a few alternatives are:
Ian Lurie is founder and CEO of Portent Inc., an internet marketing agency that has provided internet marketing, including PPC, SEO, social and analytics services, since 1995.