Validating the format of a URL with Rails

Validating the format of a URI is one of those problems that periodically arise when you validate model attributes in Rails.

There are tons of solutions available on the web, but 90% of them are based on complex regular expressions and often make custom (and perhaps too restrictive) assumptions. Here is a short list of the most common "mistakes":

  • Some validators don't support custom domain names such as http://simone.weppos, which is perfectly legal if the application runs behind a custom DNS service
  • Some validators don't support hostnames such as http://localhost
  • Some validators focus on specific URL patterns instead of supporting a common validation mechanism
  • Some validators don't understand that http://www.google.co.uk exists
  • Some validators rely on a TLD whitelist that quickly becomes outdated

I'm working on a project where I need to validate URLs quite often, and I decided to approach the problem from another point of view. I don't like reinventing the wheel, so I decided to take advantage of Ruby's URI library.
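As a quick sanity check (not part of the validator itself), you can feed URI.parse the URLs from the list above and look at what it extracts. This is only a sketch; `url_parts` is a hypothetical helper introduced for illustration.

```ruby
require 'uri'

# Hypothetical helper: returns the scheme and host that URI.parse extracts from +url+.
def url_parts(url)
  uri = URI.parse(url)
  [uri.scheme, uri.host]
end

# URLs from the list above that overly strict regex validators tend to reject.
%w(http://simone.weppos http://localhost http://www.google.co.uk).each do |url|
  scheme, host = url_parts(url)
  puts "#{url}: scheme=#{scheme.inspect}, host=#{host.inspect}"
end
```

All three parse into a scheme and a host, with no TLD list or hostname pattern involved.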

This is a super simple validator I wrote. It is based on URI.parse.

# Validates whether the value of the specified attribute matches the format of a URL,
# as defined by RFC 2396. See URI.parse for more information on URI decomposition and parsing.
#
# This method doesn't validate the existence of the domain, nor does it validate the domain itself.
#
# Allowed values include http://foo.bar, http://www.foo.bar and even http://foo.
# Please note that http://foo is a valid URL, as is http://localhost.
# It's up to you to extend the validation with additional constraints.
#
#   class Site < ActiveRecord::Base
#     validates_format_of_url :url, :on => :create
#     validates_format_of_url :ftp, :schemes => [:ftp, :http, :https]
#   end
#
# ==== Configurations
#
# * <tt>:schemes</tt> - An array of allowed schemes to match against (default is <tt>[:http, :https]</tt>)
# * <tt>:message</tt> - A custom error message (default is: "is invalid").
# * <tt>:allow_nil</tt> - If set to true, skips this validation if the attribute is +nil+ (default is +false+).
# * <tt>:allow_blank</tt> - If set to true, skips this validation if the attribute is blank (default is +false+).
# * <tt>:on</tt> - Specifies when this validation is active (default is <tt>:save</tt>, other options <tt>:create</tt>, <tt>:update</tt>).
# * <tt>:if</tt> - Specifies a method, proc or string to call to determine if the validation should
#   occur (e.g. <tt>:if => :allow_validation</tt>, or <tt>:if => Proc.new { |user| user.signup_step > 2 }</tt>).  The
#   method, proc or string should return or evaluate to a true or false value.
# * <tt>:unless</tt> - Specifies a method, proc or string to call to determine if the validation should
#   not occur (e.g. <tt>:unless => :skip_validation</tt>, or <tt>:unless => Proc.new { |user| user.signup_step <= 2 }</tt>).  The
#   method, proc or string should return or evaluate to a true or false value.
#
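def validates_format_of_url(*attr_names)
  require 'uri'

  # Default options; anything passed by the caller overrides them.
  configuration = { :on => :save, :schemes => %w(http https) }
  configuration.update(attr_names.extract_options!)

  # Accept symbols or strings (e.g. [:ftp, :http]) and compare them as strings.
  allowed_schemes = [*configuration[:schemes]].map(&:to_s)

  validates_each(attr_names, configuration) do |record, attr_name, value|
    begin
      uri = URI.parse(value)

      # Reject schemes that are not explicitly allowed, including a missing scheme.
      raise URI::InvalidURIError unless allowed_schemes.include?(uri.scheme)

      # Values such as "www.example.com" or "http:/example.com" parse without
      # errors but have no host, so treat them as invalid too.
      raise URI::InvalidURIError if [:scheme, :host].any? { |part| uri.send(part).blank? }
    rescue URI::InvalidURIError
      record.errors.add(attr_name, :invalid, :default => configuration[:message], :value => value)
    end
  end
end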
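If you want to experiment with the core rules outside Rails, the two checks can be reduced to a plain Ruby predicate. This is a sketch under the assumption that you only need a true/false answer; `valid_url?` is a hypothetical name, and `.to_s.empty?` stands in for ActiveSupport's `blank?`.

```ruby
require 'uri'

# Hypothetical plain-Ruby predicate mirroring the checks in validates_format_of_url:
# the scheme must be in the allowed list and the host must not be empty.
def valid_url?(value, schemes = %w(http https))
  uri = URI.parse(value)
  schemes.map(&:to_s).include?(uri.scheme) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  # Strings URI.parse cannot handle at all are simply invalid.
  false
end
```

For example, `valid_url?('http://localhost')` is true with the default schemes, while `valid_url?('www.godaddy.com')` is false because the string parses as a relative reference with no scheme and no host.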

The code is also available as a Gist.

I do have some unit tests, but they are specific to my application and I can't post them here. I encourage you to build your own.

I found shoulda to be particularly helpful in this situation.

class ModelTest < ActiveSupport::TestCase

  VALID_URLS = [
    'http://godaddy.com', 'http://www.godaddy.com',
    'https://godaddy.com', 'https://www.godaddy.com',
    'http://godaddy.host', 'https://godaddy.host',
  ]
  INVALID_URLS = [
    'ftp://godaddy.com',
    'www.godaddy.com', 'godaddy.com',
    'http:/godaddy.com',
  ]

  should_allow_values_for :myattr, *VALID_URLS
  should_not_allow_values_for :myattr, *INVALID_URLS

end

Please note that this validator does not claim to be perfect. As I explained in the documentation, it does not validate some requirements that might be mandatory for your specific application.
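For instance, if your application must reject single-label hosts such as http://foo or http://localhost, one possible extra constraint is to inspect the parsed host. This is only a sketch of one such rule; `multi_label_host?` is a hypothetical helper, and note that it also accepts IPv4 addresses like http://127.0.0.1, since they contain dots.

```ruby
require 'uri'

# Hypothetical extra constraint: the host must consist of at least two
# non-empty, dot-separated labels, so http://foo and http://localhost fail.
def multi_label_host?(value)
  host = URI.parse(value).host.to_s
  labels = host.split('.', -1) # -1 keeps empty labels from leading/trailing dots
  labels.size >= 2 && labels.none?(&:empty?)
rescue URI::InvalidURIError
  false
end
```

Inside the validator, a check like this could sit next to the blank-host test, adding the error whenever it returns false.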