Understanding the WHOIS protocol

WHOIS is a query/response protocol widely used to query databases that hold information about Internet resources such as domain names and IP address allocations.

Created in the 1980s, WHOIS began as a service used by Internet operators to identify individuals or entities responsible for the operation of a network resource on the Internet. The WHOIS service has since evolved into a tool used for many purposes.

Nowadays the WHOIS protocol is primarily used by registrants and users to query domain registrar databases to obtain domain name information and check domain name availability.

Specification

The NICNAME/WHOIS protocol was first described in RFC 812 in 1982 by Ken Harrenstien and Vic White of the Network Information Center at SRI International, and subsequently updated 3 years later in RFC 954.

RFC 3912, published in 2004, is the latest and most significant update to the WHOIS protocol as of today. It renamed NICNAME/WHOIS to WHOIS and introduced several updates intended to remove information no longer applicable to the state of the Internet in 2004.

RFC 3912 contains the essence of the WHOIS protocol specification.

A WHOIS server listens on TCP port 43 for requests from WHOIS clients. The WHOIS client makes a text request to the WHOIS server, then the WHOIS server replies with text content. All requests are terminated with ASCII CR and then ASCII LF. The response might contain more than one line of text, so the presence of ASCII CR or ASCII LF characters does not indicate the end of the response. The WHOIS server closes its connection as soon as the output is finished. The closed TCP connection is the indication to the client that the response has been received.
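The exchange described above fits in a few lines of code. The following Python sketch is only an illustration, not a reference implementation: the function name whois_query, the 10-second timeout and the 4096-byte read size are my own choices.

```python
import socket


def whois_query(query: str, server: str, port: int = 43, timeout: float = 10.0) -> bytes:
    """Send a single WHOIS query and return the raw response bytes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The request is one line of text terminated by CR followed by LF.
        # Encoding the query as ASCII is an assumption of this sketch.
        sock.sendall(query.encode("ascii") + b"\r\n")

        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                # An empty read means the server closed the connection,
                # which is the only end-of-response signal in the protocol.
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A call such as whois_query("example.com", "whois.verisign-grs.com") should return the raw record, assuming that host is the server answering for .COM names. The function returns bytes on purpose, because the protocol does not say how the response is encoded.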

Whilst RFC 3912 provides (very little) information about how a WHOIS query should work, it says nothing at all about the content of a WHOIS server response. What about the syntax or the encoding?
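Since the encoding is left undefined, a client has to guess. One possible approach, sketched below in Python, is to try a list of candidate encodings in order. The helper name decode_whois and the choice of UTF-8 followed by Latin-1 are assumptions made for the example, not something the specification or any server mandates.

```python
def decode_whois(raw: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Decode a raw WHOIS response by trying candidate encodings in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # None of the candidates matched: keep the readable part and
    # substitute a replacement character for every undecodable byte.
    return raw.decode("ascii", errors="replace")
```

Latin-1 accepts any byte sequence, so with the default list the final fallback is never reached. It only matters when the caller passes a stricter list of encodings.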

Also, the WHOIS protocol lacks mechanisms for access control, integrity, and confidentiality.

As a result, over the years providers designed their own WHOIS server implementations. The proliferation of customized WHOIS protocols makes it almost impossible to create a single, standardized interface to query WHOIS servers and consume WHOIS responses.

Standardization

ICANN recently started a project intended to eventually produce a standardized replacement for the current WHOIS protocol. This is an extremely ambitious project that unfortunately has a high risk of failure, because the WHOIS protocol is so widely adopted.

Replacing all existing WHOIS servers would require an incredible effort, and this is one of the reasons why similar projects such as RWhois and Whois++ have failed in the past.

ICANN is probably the only organization capable of starting this kind of reorganization of the WHOIS protocol.

Current State

As of today, the WHOIS protocol is the de facto "standard" for querying domain name information. The decision whether to expose a public WHOIS interface is completely up to the TLD maintainer.

For instance, the .COM and .DE TLDs provide a public WHOIS interface, the .ES TLD provides a private WHOIS interface, while the .VA TLD doesn't provide any WHOIS interface. This means there's no way to get information about a .VA domain via WHOIS.

The WHOIS protocol is a TCP-based protocol designed to work on port 43. This makes it extremely difficult to perform a WHOIS query from a browser without relying on a server-side third-party tool. In fact, client-side JavaScript cannot open raw socket connections on port 43.
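To make the idea of a server-side tool concrete, here is a minimal sketch of an HTTP relay written with the Python standard library. It accepts a GET request from the browser, performs the port 43 exchange on the server and returns the text. Everything named here (make_relay, whois_lookup, the q parameter, the status codes) is hypothetical and chosen for the example. The WHOIS server address is supplied by the caller, and the response is assumed to be UTF-8.

```python
import socket
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def whois_lookup(query: str, address, timeout: float = 10.0) -> bytes:
    """Run the WHOIS exchange server-side and return the raw response."""
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):  # an empty read means the server hung up
            chunks.append(chunk)
    return b"".join(chunks)


def make_relay(whois_address):
    """Build a request handler that forwards ?q=... to the given WHOIS server."""

    class WhoisRelay(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            query = params.get("q", [""])[0].strip()
            # Reject empty, non-ASCII or multi-line input so that a caller
            # cannot smuggle additional lines into the WHOIS request.
            if not query or not query.isascii() or "\r" in query or "\n" in query:
                self.send_error(400, "missing or invalid q parameter")
                return
            try:
                raw = whois_lookup(query, whois_address)
            except OSError:
                self.send_error(502, "WHOIS server unreachable")
                return
            # Assumption: treat the response as UTF-8, replacing invalid bytes.
            body = raw.decode("utf-8", errors="replace").encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return WhoisRelay


def serve(whois_address, http_port: int = 8043) -> None:
    """Serve the relay on localhost until interrupted."""
    handler = make_relay(whois_address)
    ThreadingHTTPServer(("127.0.0.1", http_port), handler).serve_forever()
```

A page served from the same host could then fetch /?q=example.com with ordinary JavaScript. A real deployment would also need rate limiting and a way to pick the right WHOIS server for each TLD, both of which are left out of this sketch.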

Because of this, almost every hosting company provides a custom web-based WHOIS tool to query WHOIS information. Most of these tools are implemented using a WHOIS API service such as RoboWhois or DomainTools, or using a WHOIS client. There are WHOIS clients in almost every programming language.

DISCLAIMER: I'm the author of RoboWhois.

Also, it's not unusual to find a WHOIS form directly on registry websites. This is the case, for example, of Denic.de for the .DE TLD or Registro.it for the .IT TLD. In several cases, the web-based registry form provides private information normally hidden in the public WHOIS interface, such as contact details.

This is an example of a GoDaddy WHOIS response using the TCP WHOIS interface. Notice the link to the web-based WHOIS interface.

Please note: the registrant of the domain name is specified
in the "registrant" field.  In most cases, GoDaddy.com, LLC
is not the registrant of domain names listed in this database.

Registrant:
   Simone Carletti

   Registered through: GoDaddy.com, LLC (http://www.godaddy.com)
   Domain Name: WEPPOS.COM

   Domain servers in listed order:
      NS1.DREAMHOST.COM
      NS2.DREAMHOST.COM
      NS3.DREAMHOST.COM

   For complete domain details go to:
   http://who.godaddy.com/whoischeck.aspx?Domain=WEPPOS.COM
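Because every server formats its output differently, a parser has to be written against one specific layout. As an illustration, the Python sketch below extracts the domain name and the name servers from a response laid out like the sample above. The function name parse_listing and the returned dictionary shape are my own assumptions, and the code would not work unchanged on responses from other servers.

```python
def parse_listing(response: str) -> dict:
    """Extract the domain and name servers from a response that follows
    the layout of the sample above (other servers use other layouts)."""
    result = {"domain": None, "nameservers": []}
    in_nameservers = False
    for line in response.splitlines():
        text = line.strip()
        if text.startswith("Domain Name:"):
            # Keep everything after the first colon as the value.
            result["domain"] = text.split(":", 1)[1].strip()
        elif text == "Domain servers in listed order:":
            in_nameservers = True
        elif in_nameservers:
            if not text:
                in_nameservers = False  # a blank line ends the list
            else:
                result["nameservers"].append(text)
    return result
```

Fed the sample response, the function should report WEPPOS.COM as the domain and the three DREAMHOST.COM hosts as name servers. The registrant name is not extracted, since it sits on an unlabeled line and would need layout-specific handling of its own.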

There is also a plethora of web-based services that provide WHOIS information, normally mixed with other domain-related and networking details such as IP address, server location and reverse lookup. A large number of these services intentionally republish WHOIS responses as web pages in order to gain visitors from search engines and promote affiliations or products.

For example, here's the result page for whois expedia.

[Screenshot: search results for "whois expedia.com"]

Several web-based WHOIS interfaces have been accused of front running domains. The best way to avoid having your domain registered by one of these services is to run WHOIS queries using the socket-based WHOIS interface, use the TLD maintainer's web-based WHOIS service, or make sure the service you want to use contains explicit information about domain name front running in its privacy policy or terms of service.

Resources

Although WHOIS is a very simple protocol, the lack of a well-structured specification turns it into one of the most complex and unpredictable protocols to work with. It's impossible to cover all WHOIS topics in a single article, and I will probably publish more posts in the future.

The following resources can help you learn more about the WHOIS protocol, query a WHOIS server and consume WHOIS responses.

  • Wikipedia - the Wikipedia WHOIS article.
  • Whois - the most popular WHOIS command-line client.
  • Ruby Whois - a WHOIS client and parser in Ruby.
  • RoboWhois - a cloud-based web service that provides a RESTful API to access WHOIS records.