New in Whois: improved caching

I'm working very hard to include some of the most important features in the new version of the Ruby Whois library.

I'm very happy to report that this week I closed issue #18, which introduces a completely new caching system for Whois::Answer::Parser.

Whois parsers currently extract a property only the very first time it is requested.

r = Whois.query "weppos.it"

# the property has never been requested before
# the value is computed, cached and returned
r.status
# => :ok

# the property has been requested before
# the value is returned without further computation
r.status
# => :ok

So far, so good.

Under the hood, the system creates a parser instance variable for every requested property.

r = Whois.query "weppos.it"

# get the first parser, because r.status
# relies on r.parser.parsers.first.status
p = r.parser.parsers.first

# value is not cached
p.instance_variable_get("@status")
# => nil

p.status
# => :ok

# value is cached
p.instance_variable_get("@status")
# => :ok

So far, quite good. However, this approach has a couple of drawbacks.

First, it creates an instance variable for every single property. Because of the large (and increasing) number of properties, each parser object ends up holding a large number of instance variables. This makes it hard, for instance, to sweep the cache, because you have to loop through all the instance variables and remove each one.
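To make the first drawback concrete, here is a minimal sketch of what sweeping a per-variable cache could look like. The class IvarCachedParser and its sweep_cache method are hypothetical names used only for illustration; they are not part of the Whois library.

```ruby
# Hypothetical parser that caches each property in its own instance variable.
class IvarCachedParser
  def status
    @status ||= :ok
  end

  def domain
    @domain ||= "weppos.it"
  end

  # Sweeping means walking every instance variable and removing it.
  # Note that this would also wipe any non-cache state kept in
  # instance variables.
  def sweep_cache
    instance_variables.each { |name| remove_instance_variable(name) }
  end
end

parser = IvarCachedParser.new
parser.status
parser.domain
parser.instance_variables
# => [:@status, :@domain]

parser.sweep_cache
parser.instance_variables
# => []
```

The sweep cannot tell cached properties apart from any other state the object stores, which is part of what makes this layout awkward.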

Second, there's a small inefficiency here. Because Ruby doesn't require instance variables to be declared, reading an undefined instance variable returns nil. But nil is also a legitimate value, and a property can be nil. With this implementation, there is no way to distinguish a property whose value is nil from one that hasn't been computed yet, so nil properties will never hit the cache.

def created_on
  @created_on ||= if very_expensive_scan(/created_on/)
    # ... set the value
  end
end

# calling #created_on several times will continue to perform
# the very_expensive_scan as long as the value is nil.
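The effect is easy to reproduce in isolation. The following sketch uses a hypothetical NilPropertyParser class with a scans counter (both made up for this example) to count how many times the expensive work runs when the property is nil.

```ruby
# Hypothetical parser whose created_on property is legitimately nil.
class NilPropertyParser
  attr_reader :scans

  def initialize
    @scans = 0
  end

  # Stand-in for an expensive scan that finds nothing.
  def very_expensive_scan
    @scans += 1
    nil
  end

  # ||= assigns only when the left side is nil or false,
  # so a nil result is computed again on every call.
  def created_on
    @created_on ||= very_expensive_scan
  end
end

parser = NilPropertyParser.new
parser.created_on
# => nil
parser.created_on
# => nil
parser.scans
# => 2
```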

The new approach uses a single instance variable called @cached_properties as the cache. The variable contains a Hash<:key => value>, where the key is the property name and the value is the cached result.

If the cache doesn't contain a key for the given property, then the method hasn't been executed yet. Cached nil properties return nil without further computation.

The method #cached_properties_fetch takes care of everything.

def created_on
  cached_properties_fetch(:created_on) do
    nil
  end
end

# value is cached and returned
created_on
# => nil

# the request hits the cache
created_on
# => nil
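For readers curious how such a method can tell "cached nil" from "not computed yet", here is one possible sketch. It is an assumption about the general technique (checking for the key with Hash#key? instead of checking the value), not the library's actual implementation. The HashCachedParser class and the scans counter are hypothetical.

```ruby
# Hypothetical parser that keeps every cached property in a single Hash.
class HashCachedParser
  attr_reader :scans

  def initialize
    @cached_properties = {}
    @scans = 0
  end

  def created_on
    cached_properties_fetch(:created_on) do
      @scans += 1 # stand-in for the expensive work
      nil
    end
  end

  private

  # Runs the block only when the key is missing from the cache.
  # The presence of the key, not the value, marks a property as computed,
  # so nil results are cached like any other value.
  def cached_properties_fetch(key)
    @cached_properties[key] = yield unless @cached_properties.key?(key)
    @cached_properties[key]
  end
end

parser = HashCachedParser.new
parser.created_on
# => nil
parser.created_on
# => nil
parser.scans
# => 1
```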

If you need to sweep the cache, reset @cached_properties to an empty Hash.
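As a small illustration, sweeping becomes a single assignment. The SweepableParser class and the cached? and sweep_cache helpers below are hypothetical names, assumed only for this sketch.

```ruby
# Hypothetical parser with a Hash-based cache.
class SweepableParser
  def initialize
    # Pretend two properties were already computed and cached.
    @cached_properties = { status: :ok, created_on: nil }
  end

  def cached?(property)
    @cached_properties.key?(property)
  end

  # One assignment clears every cached property at once
  # and leaves any other instance state untouched.
  def sweep_cache
    @cached_properties = {}
  end
end

parser = SweepableParser.new
parser.cached?(:created_on)
# => true

parser.sweep_cache
parser.cached?(:created_on)
# => false
```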