Apache Log Regex: a lightweight Ruby Apache log parser

This is turning out to be a really fruitful month for me. I completed a couple of long-standing tasks and finally had some time to get back to working on my Ruby gems. After the third version of my Ruby client for the delicious API, it is now the turn of Apache Log Regex.

ApacheLogRegex is a simple Ruby class for parsing Apache log files. It takes an Apache log format and generates a regular expression from it. That expression is then used to parse a line from a log file, and the result is returned as a Hash whose keys correspond to the fields defined in the log format.
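To make the idea concrete, here is a minimal sketch of how a format string can be turned into a regular expression with one capture group per field. It is a simplified illustration, not the library's actual implementation: the helper names format_to_regexp and parse_with are made up for this example, and only quoted fields, %t and plain space-free fields are handled.

```ruby
# Simplified stand-in for the technique, NOT the gem's real code.
# format_to_regexp and parse_with are hypothetical helper names.
def format_to_regexp(format)
  names = []
  pattern = format.split(' ').map do |element|
    # In an Apache LogFormat, quoted fields are written as \"...\"
    quoted = element.start_with?('\"') && element.end_with?('\"')
    name = quoted ? element[2...-2] : element
    names << name

    if quoted
      '"([^"]*)"'        # anything between double quotes
    elsif name.end_with?('t')
      '(\[[^\]]+\])'     # timestamp wrapped in square brackets
    else
      '(\S*)'            # any run of non-space characters
    end
  end.join(' ')

  [Regexp.new("\\A#{pattern}\\z"), names]
end

# Returns a Hash of field name => captured string, or nil when the line
# does not match the format.
def parse_with(format, line)
  regexp, names = format_to_regexp(format)
  match = regexp.match(line.chomp)
  match && names.zip(match.captures).to_h
end

format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
line = '87.18.183.252 - - [13/Aug/2008:00:50:49 -0700] "GET /blog/index.xml HTTP/1.1" 302 527 "-" "Feedreader 3.13 (Powered by Newsbrain)"'

fields = parse_with(format, line)
fields['%h']  # => "87.18.183.252"
fields['%>s'] # => "302"
```

The sketch ignores escaped quotes inside quoted fields and other edge cases, which is exactly the kind of detail a dedicated library is meant to take care of.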

Take, for example, the following Apache log entry.

87.18.183.252 - - [13/Aug/2008:00:50:49 -0700] "GET /blog/index.xml HTTP/1.1" 302 527 "-" "Feedreader 3.13 (Powered by Newsbrain)"

You can easily parse it with Apache Log Regex and extract only the information you need.

# This is the log line you want to parse
line = '87.18.183.252 - - [13/Aug/2008:00:50:49 -0700] "GET /blog/index.xml HTTP/1.1" 302 527 "-" "Feedreader 3.13 (Powered by Newsbrain)"'

# Define the log file format.
# This is the format set in your Apache configuration
# with the LogFormat directive
format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'

# Initialize the parser
parser = ApacheLogRegex.new(format)

# Get the log line as a Hash
parser.parse(line)
# => {"%r"=>"GET /blog/index.xml HTTP/1.1", "%h"=>"87.18.183.252", "%>s"=>"302", "%t"=>"[13/Aug/2008:00:50:49 -0700]", "%{User-Agent}i"=>"Feedreader 3.13 (Powered by Newsbrain)", "%u"=>"-", "%{Referer}i"=>"-", "%b"=>"527", "%l"=>"-"}
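All values in the returned Hash are strings, so you will often want to convert a few of them before using them. The following is a small sketch of one way to do that. The helper name typed_fields and the choice of output keys are my own assumptions for this example and are not part of the library.

```ruby
require 'time'

# Hypothetical helper: turns selected string fields of a parsed entry
# into Ruby objects. Not part of ApacheLogRegex.
def typed_fields(entry)
  http_method, path, protocol = entry['%r'].split(' ', 3)

  {
    host: entry['%h'],
    # Apache writes %t as [day/month/year:hour:minute:second zone]
    time: Time.strptime(entry['%t'], '[%d/%b/%Y:%H:%M:%S %z]'),
    status: Integer(entry['%>s'], 10),
    # %b is "-" when no bytes were sent
    bytes: entry['%b'] == '-' ? 0 : Integer(entry['%b'], 10),
    http_method: http_method,
    path: path,
    protocol: protocol
  }
end

entry = {
  '%h' => '87.18.183.252',
  '%t' => '[13/Aug/2008:00:50:49 -0700]',
  '%r' => 'GET /blog/index.xml HTTP/1.1',
  '%>s' => '302',
  '%b' => '527'
}

typed = typed_fields(entry)
typed[:status] # => 302
typed[:path]   # => "/blog/index.xml"
```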

If you want stricter error handling, use the parse! method instead. Where parse returns nil for a line that doesn't match the log format, parse! raises a ParseError exception.

common_log_format = '%h %l %u %t "%r" %>s %b'
parser = ApacheLogRegex.new(common_log_format)

# No exception
parser.parse(line) # => nil

# Raises an exception
parser.parse!(line) # => ParseError

Instead of parsing one line at a time, you can read an entire log file, feed each line to the parser and collect the results.

result = File.readlines('/var/apache/access.log').collect do |line|
  parser.parse(line)
end
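For large log files you may prefer not to load every line into memory, and to skip lines that don't match the format rather than collecting nil values. Here is a minimal sketch of that variation. The helper name each_log_entry is hypothetical, and it only assumes that the parser object responds to parse and returns a Hash or nil, as shown earlier.

```ruby
# Hypothetical helper, not part of ApacheLogRegex.
# Streams the file line by line, yields each parsed entry and
# returns the number of lines that did not match the format.
def each_log_entry(path, parser)
  skipped = 0

  File.foreach(path) do |line|
    # chomp is only a precaution against the trailing newline
    entry = parser.parse(line.chomp)
    if entry
      yield entry
    else
      skipped += 1
    end
  end

  skipped
end

# Possible usage, assuming the parser and format defined earlier:
#
#   parser = ApacheLogRegex.new(format)
#   skipped = each_log_entry('/var/apache/access.log', parser) do |entry|
#     puts entry['%h']
#   end
```

Returning the count of skipped lines is just one design choice; it gives a quick hint when the format string doesn't match the file being read.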

Apache Log Regex is a Ruby port of Peter Hickman's Apache::LogRegex 1.4 Perl module, available at http://cpan.uwinnipeg.ca/~peterhi/Apache-LogRegex.

You can install the library via RubyGems.

$ gem install apachelogregex

Feel free to email me with any questions or feedback. For the documentation and more details you can visit the ApacheLogRegex project page.