What is a server log file? What's in it? Why should I care?

Contents
Log File Contents
Breaking Down A Log Entry
Getting The Logs
What You Can Learn (And What You Can’t)
My Favorite Log Analysis Tools

If you already know what a log file is, and want to learn how to analyze them for SEO, check out Log File Analysis For SEO.

Server log files are a raw, unfiltered look at traffic to your site. They’re text files stored on your web server. Every time any browser or user-agent, Google included, requests any resource (pages, images, JavaScript files, whatever) from your server, the server adds a line to the log.

That makes log files giant piles of juicy data.

If you already know how they work and want to analyze them, read my post about log file analysis for SEO. If you know that, too, get a cup of coffee and take care of all those emails in your inbox. You don’t need to read this.

Log File Contents

Here’s a line from Portent’s server log. I edited it to simplify a bit:

11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] "GET /blog/page-address.htm HTTP/1.1" 200 182 "-" "Mozilla/5.0 Chrome/60.0.3112.113"

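If you want to know the basics, you can read about the Common Log Format. The example above also records the referrer and the user agent, an extension usually called the Combined Log Format.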

The last bit that starts with “Mozilla” is the user agent. It’s important if you’re analyzing the log file for SEO, or to see what software is accessing your site, or to troubleshoot a specific server problem. The user agent is the type of browser or other software that’s accessing your site. If Googlebot requests a resource, you’ll see a user agent string that includes “Googlebot.” If Bingbot hits your site, you’ll see a user agent string that includes “bingbot.”
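If you want to sort requests by who made them, a simple substring check on the user agent gets you most of the way. Here is a minimal Python sketch. The function name `classify_user_agent` and the two-entry bot list are my own assumptions for illustration, not a complete crawler registry. Keep in mind that anyone can fake a user agent string, so treat the label as a hint, not proof.

```python
def classify_user_agent(user_agent: str) -> str:
    """Return a rough label for a user agent string.

    Matching is case-insensitive, so "Googlebot" and "GoogleBot" both match.
    """
    ua = user_agent.lower()
    # Hypothetical, deliberately short list of crawler tokens.
    known_bots = {"googlebot": "Googlebot", "bingbot": "Bingbot"}
    for token, label in known_bots.items():
        if token in ua:
            return label
    return "other"


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(classify_user_agent(googlebot_ua))                        # Googlebot
print(classify_user_agent("Mozilla/5.0 Chrome/60.0.3112.113"))  # other
```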

Breaking Down A Log Entry

Here’s the example again:

11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] "GET /blog/page-address.htm HTTP/1.1" 200 182 "-" "Mozilla/5.0 Chrome/60.0.3112.113"

On December 11, 2018, someone using Google Chrome tried to load https://portent.com/blog/page-address.htm. The ‘200’ means the server found the file (yay!). Page-address.htm is teeny, weighing in at 182 bytes.

The IP address of the client (the software that requested the file) was 11.222.333.44. I put that last because, for many reasons, it’s not terribly helpful to us marketers.
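The same breakdown can be done in code. This is a rough Python sketch that splits one line into named fields with a regular expression. It assumes the layout of the example entry and plain ASCII quotes and hyphens, which is what a real log file contains. The pattern and the `parse_line` helper are mine, not something your server ships with, and lines that don't fit the pattern simply come back as `None`.

```python
import re
from datetime import datetime

# Field order follows the example entry: IP, two unused fields, timestamp,
# request, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Servers write "-" when no body was sent; treat that as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = (
    '11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] '
    '"GET /blog/page-address.htm HTTP/1.1" 200 182 "-" '
    '"Mozilla/5.0 Chrome/60.0.3112.113"'
)
entry = parse_line(SAMPLE)
print(entry["path"], entry["status"], entry["size"])
# /blog/page-address.htm 200 182
```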

Again: Every request from every user agent is a line in the log file. Every request. Every. Single. One.

Getting The Logs

That’s the rub. Some technical teams cling to log files, citing security concerns. Some site platforms hide log files so deep in their twisted innards that finding them requires an electronic colonoscopy.

But the log files are there. They’re not a security risk. The site developer can zip them up and send them to you. Buy beers, bring chocolate, do whatever you need to do to make friends. Then ask.

If the files are gigantic, ask for a snapshot. A few days or even a few hours is a good start.
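If someone hands you the whole pile anyway, you can cut your own snapshot. The sketch below keeps only the lines whose timestamp falls inside a time window. It assumes timestamps in the bracketed format from the example entry. The `snapshot` function and the sample lines are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

TIMESTAMP = re.compile(r'\[([^\]]+)\]')


def snapshot(lines, start, end):
    """Yield log lines with start <= timestamp < end, skipping unparseable ones."""
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        try:
            when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue
        if start <= when < end:
            yield line


# Hypothetical sample data: one request inside the window, one outside.
lines = [
    '11.222.333.44 - - [11/Dec/2018:11:01:28 -0600] "GET /a HTTP/1.1" 200 182 "-" "Mozilla/5.0"',
    '11.222.333.44 - - [11/Dec/2018:13:30:00 -0600] "GET /b HTTP/1.1" 200 182 "-" "Mozilla/5.0"',
]
start = datetime(2018, 12, 11, 11, 0, tzinfo=timezone(timedelta(hours=-6)))
end = start + timedelta(hours=1)
kept = list(snapshot(lines, start, end))
print(len(kept))  # 1
```

Because `snapshot` is a generator, you can pass it an open file handle instead of a list and it will read a gigantic log one line at a time.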

What You Can Learn (And What You Can’t)

Log files are data lasagna. They’re yummy. They’re substantial. And they’ll put you to sleep if you overindulge.

I use them to find:

Spider traps. Log files give you a great look at how search bots are crawling your site.
Spam content. If some hacker dumped a bunch of pages listing porn links on your site, any clicks to those pages appear in the log files.
Broken external links. Google eventually gives up crawling broken links from other sites. But people still click them. Track down those busted external links and reclaim some authority.
Incorrect server responses. Google Search Console can show some soft 404 errors, but the log file shows you all of them.
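
To make one of those concrete, here is a hedged Python sketch of the broken external links check: it counts 404 responses whose referrer is a page on some other site. It assumes log lines shaped like the example entry. The `broken_external_links` function, the sample lines, and the `example.org` referrer are invented for illustration, and the host comparison is exact, so `www.portent.com` and `portent.com` would count as different hosts.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Pull the path, status code, and referrer out of a log line.
REQUEST = re.compile(
    r'"(?:\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def broken_external_links(lines, own_host):
    """Count (path, referrer) pairs for 404s referred from other hosts."""
    hits = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None or match.group("status") != "404":
            continue
        referrer = match.group("referrer")
        host = urlparse(referrer).hostname  # None for an empty "-" referrer
        if host and host != own_host:
            hits[(match.group("path"), referrer)] += 1
    return hits


# Hypothetical sample data.
LINES = [
    '1.2.3.4 - - [11/Dec/2018:11:01:28 -0600] "GET /old-page HTTP/1.1" 404 0 "https://example.org/links" "Mozilla/5.0"',
    '1.2.3.4 - - [11/Dec/2018:11:02:00 -0600] "GET /old-page HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [11/Dec/2018:11:03:00 -0600] "GET /blog/ HTTP/1.1" 200 182 "https://example.org/links" "Mozilla/5.0"',
]
report = broken_external_links(LINES, own_host="portent.com")
for (path, referrer), count in report.most_common():
    print(count, path, referrer)
```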

You can’t use them to:

Get keyword data. Keyword data isn’t just an analytics software problem. “Not provided” means you can’t find search terms here, either.
Track user sessions (usually). Most user session tracking requires JavaScript. Use a tool like Google Analytics, instead.
Track individual users. In theory, you could track visits from Ian Lurie. But it would require much mind-numbing labor.
Track rendering times. Log files show requests for resources. They don’t track what happens after the request. If a page renders incorrectly or slowly, it won’t show up here.

You don’t need to use them to:

Track conversions. Conversion tracking in log files is like sitting on your tongue. Feasible, but not recommended.
Analyze geographic data. You can, but most analytics software shows location data and requires a lot less work.
Track click paths through your site. Again, possible, but you can get the data more easily from your analytics software.

My Favorite Log Analysis Tools

Screaming Frog Log File Analyser is my number one choice. It’s a great combination of power and usability, and you can merge log file data with crawls and other data sources.

Splunk is so powerful it terrifies me. But it’s great for managing giant log files in near-real-time.

Apache Log Viewer is free. It has a steeper learning curve than Screaming Frog, but, you know, free.

Log files don’t provide conversion data or session data. That kind of tracking requires cookies and a client-side analytics suite, like Google Analytics. They do provide a record of every website resource requested from every browser and user-agent. That makes them very powerful.

Ian, WTF?

I usually write metaphor-stuffed rants. This is more Wikipedia style because I write about log files all the time. This seems more efficient than adding a “what’s a log file?” section to every post.

Comments

miki says:

April 2, 2019 at 3:08 am

Hi Ian,

Thanks a lot for this article, you make it sound more understandable.
I have a question: you say here that you would not recommend tracking conversions through them, but I know, for example, that Flashtalking is using the ad server log files to track conversions and attribution. Do you see any limitation to that?

Thanks a mil!
1. Ian Lurie says:
  
  April 2, 2019 at 7:33 am
  
  Hi Miki,
  
  Unmodified log files won’t work for attribution/conversion tracking, because there’s no tracking across sessions. You can track requests for a conversion page, but that’s it.
  
  You could use software to inject additional session tracking information into the log file. I’m not sure why you’d do that, though. The result still wouldn’t be as good as typical analytics software (like Google Analytics).
Gina says:

December 8, 2020 at 8:07 pm

I understood this more than what’s in my text book. Thank you!!
Gautam Sehgal says:

December 24, 2020 at 6:44 am

Hi Ian,

I learned a lot about log files after reading your blog. Thanks for sharing the information. As someone who is fairly new to the concept of server log files, would it be safe to assume that log files can essentially provide the complete thought process of an end user visiting our website and the pattern in which they move from one resource page to other?

Thanks,
Gautam
1. Andy Schaff says:
  
  January 11, 2021 at 7:49 am
  
  Hello Gautam, I’m glad to hear you’ve learned from our resources! Access logs can certainly be used to trace a user’s journey. Keep in mind that there will be a lot of access log entries per IP address, including images, JavaScript, CSS, et cetera. So, it may not be the easiest to process. That being said, I know there is a lot of log analysis software that could be used. Essentially, this is what Google Analytics accomplishes. But nothing will be more accurate than what your server’s access logs provide as they are at the source. I hope this helps!