Have you ever found yourself looking for a solution to parse or validate a domain name? You probably spent several hours trying to find the most efficient and comprehensive regular expression, but the more examples you found, the more you realized that a definitive solution doesn't seem to exist.
And you are right. There is no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain, because the policies differ with each registry. The only method is to create a list of all top-level domains and the levels at which domains can be registered under each of them. This is the aim of the effective TLD list.
Here comes the Public Suffix List.
What is the Public Suffix List?
The Public Suffix List is a cross-vendor initiative to provide an accurate list of domain name suffixes.
The Public Suffix List is an initiative of the Mozilla Project, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers.
A "public suffix" is one under which Internet users can directly register names. Some examples of public suffixes are ".com", ".co.uk" and "pvt.k12.wy.us". The Public Suffix List is a list of all known public suffixes.
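To make the idea concrete, here is a minimal sketch of how a list-based lookup could work. It is not the library's implementation: SAMPLE_SUFFIXES is a hypothetical, hand-picked subset, both method names are made up for illustration, and the wildcard and exception rules of the real list are ignored.

```ruby
# Hypothetical subset of suffixes; the real list is far larger and also
# contains wildcard and exception rules that this sketch ignores.
SAMPLE_SUFFIXES = %w[com uk co.uk us pvt.k12.wy.us].freeze

# Returns the longest entry in the list that the name ends with,
# or nil when no entry matches.
def longest_public_suffix(name, suffixes = SAMPLE_SUFFIXES)
  labels = name.downcase.split(".")
  # Candidates go from longest to shortest, e.g.
  # "www.google.co.uk", "google.co.uk", "co.uk", "uk".
  (0...labels.size).each do |i|
    candidate = labels[i..-1].join(".")
    return candidate if suffixes.include?(candidate)
  end
  nil
end

# Returns the public suffix plus the one label to its left,
# or nil when the suffix is unknown or the name is itself a suffix.
def registrable_domain(name, suffixes = SAMPLE_SUFFIXES)
  suffix = longest_public_suffix(name, suffixes)
  return nil if suffix.nil?

  labels = name.downcase.split(".")
  suffix_size = suffix.split(".").size
  return nil if labels.size <= suffix_size

  labels.last(suffix_size + 1).join(".")
end

longest_public_suffix("www.google.co.uk") # => "co.uk"
registrable_domain("www.google.co.uk")    # => "google.co.uk"
registrable_domain("co.uk")               # => nil
```

The longest match wins, which is why "co.uk" is chosen over "uk" even though both appear in the sample list.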
Does it work with Ruby?
Yeah! Public Suffix Service is a Ruby domain name parser based on the Public Suffix List. To use it, you don't need to download the list or learn how it works. Just install the gem and you're done.
$ gem install public_suffix
Here are a few examples:
# parse a very standard domain name
domain = PublicSuffixService.parse("google.com")
domain.tld
# => "com"
domain.sld
# => "google"
domain.trd
# => nil
# parse a less standard domain name
domain = PublicSuffixService.parse("google.co.uk")
domain.tld
# => "co.uk"
domain.sld
# => "google"
domain.trd
# => nil
# it works with subdomains too
domain = PublicSuffixService.parse("www.google.co.uk")
domain.tld
# => "co.uk"
domain.sld
# => "google"
domain.trd
# => "www"
Domain validation
The Public Suffix Service library offers a quick way to validate a domain.
PublicSuffixService.valid?("google.com")
# => true
PublicSuffixService.valid?("www.google.com")
# => true
The main difference from regular-expression-based solutions is that this library actually validates the domain against a white/black list instead of running a soft check on the length of the TLD.
PublicSuffixService.valid?("google.xx")
# => false
PublicSuffixService.valid?("google.zip")
# => false
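The contrast can be sketched without the library. The snippet below is an illustration under stated assumptions: SOFT_DOMAIN_REGEX is a made-up example of a typical length-based pattern, KNOWN_SUFFIXES is a hypothetical two-entry stand-in for the real list, and neither helper method belongs to Public Suffix Service.

```ruby
# A typical "soft" check: one or more labels followed by a 2-6 letter TLD.
# Illustrative pattern only; not taken from any particular library.
SOFT_DOMAIN_REGEX = /\A([a-z0-9-]+\.)+[a-z]{2,6}\z/i

# Hypothetical stand-in for the real suffix list.
KNOWN_SUFFIXES = %w[com co.uk].freeze

# Accepts anything that merely looks like a domain name.
def regex_valid?(name)
  !!(name =~ SOFT_DOMAIN_REGEX)
end

# Accepts a name only when it ends with a known suffix and has
# at least one label to the left of that suffix.
def list_valid?(name, suffixes = KNOWN_SUFFIXES)
  labels = name.downcase.split(".")
  # Index of the first label of the longest matching suffix, if any.
  index = (0...labels.size).find do |i|
    suffixes.include?(labels[i..-1].join("."))
  end
  !index.nil? && index >= 1
end

regex_valid?("google.xx")  # => true, the pattern only checks the shape
list_valid?("google.xx")   # => false, "xx" is not in the list
list_valid?("google.com")  # => true
list_valid?("com")         # => false, a bare suffix is not a domain
```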
Domain transformation
The PublicSuffixService::Domain class provides a bunch of methods to validate and transform a domain name.
domain = PublicSuffixService.parse("www.google.com")
domain.domain?
# => true
domain.is_a_domain?
# => false
domain.is_a_subdomain?
# => true
domain.subdomain
# => "www.google.com"
domain.domain
# => "google.com"
Who uses the Public Suffix List?
The list is used by well-known browsers such as Google Chrome, Mozilla Firefox and Opera.
The Public Suffix Service Ruby library was created for RoboDomain and has been used in production since November 2009.