Logging external referers with Apache

The default Apache access log includes many useful details about every request to your website. Here is what a log entry looks like:

79.28.43.25 - - [25/Jan/2009:13:18:02 +0000] "GET /blog/2007/01/internet-explorer-7-in-italiano/ HTTP/1.1" 200 14487 "http://www.google.it/search?hl=it&q=aggiornamento+internet+explorer+&btnG=Cerca+con+Google&meta=&aq=f&oq=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

You can easily identify the client IP address, the request timestamp, the landing page and the referral, which in this example is a Google search results page.
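
For reference, the entry above has the shape of what Apache calls the combined log format. Assuming your server uses the stock definition (your own configuration may differ), it is declared roughly as follows, and each directive maps to one field of the entry.

```apache
# Stock "combined" format, as found in many default Apache configuration files.
# %h  client IP address            %l %u  identd and auth user (the two "-" fields)
# %t  request timestamp            %r     request line (method, path, protocol)
# %>s final status code            %b     response size in bytes
# %{Referer}i and %{User-Agent}i   request headers sent by the browser
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
```

The referral and the landing page discussed in the rest of this article come from %{Referer}i and from the path inside %r.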

Creating a custom referer log file

As a marketer or SEO, you will find the referral and the landing page really useful. Extracting them from the default Apache log file can be a little tricky and requires some parsing knowledge. For this reason, you may find it more convenient to write a custom log file that includes only those two details.

Let me show you how. You don't need to know much about Apache server management, but you must have access to your virtual host configuration: the CustomLog and LogFormat directives can't be used in an .htaccess file, only at server config or virtual host level. Write the following lines either in your main Apache configuration file or in your virtual host definition, depending on whether you want a referer log for all configured websites or just for a single virtual host.

In order to monitor incoming links, you need to define a custom log format using the LogFormat directive and give it a useful name, for example referer.

LogFormat "%{Referer}i %U" referer

Then ask Apache to generate a new log, passing it the name of the custom format.

CustomLog /path/to/folder/referer.log referer

You can specify as many CustomLog directives as you want; already configured logs will not be affected. In this case Apache writes two log entries for each request: the first one in the default format and the second one including only the referral string and the landing page.

Here's an example of a typical virtual host configuration.

<VirtualHost *:80>
  ServerName    example.com
  ServerAlias   www.example.com
  DocumentRoot  /var/www/example.com/public

  # many other directives ...

  LogFormat "%{Referer}i %U" referer
  CustomLog /var/www/example.com/logs/referer.log referer
</VirtualHost>

For each request to example.com, Apache will write an entry in the referer.log file containing the referer string followed by the landing page.

[Image: Referer Log]

Note: many stock Apache 2.x configuration files already define a LogFormat named referer with the format "%{Referer}i -> %U". You should use a different name for your own format to prevent conflicts.
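
As a minimal sketch, the two directives from the virtual host above could be written with a distinct format name. The name referer_landing is hypothetical and chosen only for illustration; any name not already defined in your configuration should work.

```apache
# referer_landing is an arbitrary, hypothetical name used to avoid
# clashing with a predefined "referer" format
LogFormat "%{Referer}i %U" referer_landing
CustomLog /var/www/example.com/logs/referer.log referer_landing
```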

Combining LogFormat and CustomLog in a single line

If you don't need a reusable LogFormat and don't care about giving it a name, you can create a custom log in one step.

CustomLog /var/www/example.com/logs/referer.log "%{Referer}i %U"

The line above is equivalent to the following two.

LogFormat "%{Referer}i %U" myformat
CustomLog /var/www/example.com/logs/referer.log myformat

Writing a CSV log file

You can customize the referer log by placing as many "%" directives as you wish in your log format. For example, the following format writes a CSV log file.

LogFormat "\"%{Referer}i\",\"%U\"" referer

Here's an example of the resulting log.

"http://www.google.com/search?q=keyword","/page.html"
"http://www.google.com/search?q=keyword","/new-page.html"

Log entries can be easily parsed or opened with CSV-compatible software such as OpenOffice or Excel.

[Image: Referer Excel]
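
To illustrate adding more "%" directives, here is a hedged sketch of a CSV format that also records the request time and the client address. The format name referer_csv and the log file name are assumptions made for this example, not part of the original setup.

```apache
# Hypothetical format name and file name.
# %t is the request time (written in square brackets), %h the client address.
LogFormat "\"%t\",\"%h\",\"%{Referer}i\",\"%U\"" referer_csv
CustomLog /var/www/example.com/logs/referer-csv.log referer_csv
```

Keep in mind that Apache escapes a double quote inside a logged header as \" rather than doubling it, so a strict CSV parser may need to be configured for backslash escapes.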

Logging external referers

Logging all referrals is expensive and not very useful for marketing analysis. It is probably a good idea to restrict the log to external referers only. Environment variables are what we need to do this.

SetEnvIfNoCase Referer (www\.)?example\.com INTERNAL_REFERRAL
LogFormat "\"%{Referer}i\",\"%U\"" referer
CustomLog /var/www/example.com/logs/referer.log referer env=!INTERNAL_REFERRAL

First we set an environment variable called INTERNAL_REFERRAL if the request comes with a referer string matching the current website's domain. Then we define the LogFormat as usual and enable the CustomLog only if the environment variable is not set, that is, only when the request comes from an external referral.
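
One caveat: the pattern above is not anchored, so it also matches any referer that merely contains the domain, such as a search URL with example.com in its query string. A stricter variant is sketched below, under the assumption that the site is served only from example.com and www.example.com over http or https. The second SetEnvIf line is optional and rests on the common idiom of matching a missing Referer header with an empty pattern; verify it against your Apache version before relying on it.

```apache
# Flag the request only when the referer's scheme and host are the site itself
# (assumed hosts: example.com and www.example.com, default ports)
SetEnvIfNoCase Referer "^https?://(www\.)?example\.com(/|$)" INTERNAL_REFERRAL

# Optional: also skip requests that carry no Referer header at all
# (direct visits, bookmarks), since they are not external referrals either
SetEnvIf Referer "^$" INTERNAL_REFERRAL

LogFormat "\"%{Referer}i\",\"%U\"" referer
CustomLog /var/www/example.com/logs/referer.log referer env=!INTERNAL_REFERRAL
```

With the anchored pattern, a visit arriving from a Google search for "example.com" is still logged, because the referer's host is Google and not your own site.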