Some members of Boston.rb had a great discussion at last night’s hackfest about library dependencies. In particular, we were discussing the problem of libraries (gems) that depend on other libraries and how RubyGems and Bundler both promote implementation dependency instead of interface dependency. “Interfaces? We don’t need no stinkin’ interfaces!” I hear you cry. It’s true: Ruby doesn’t have interfaces in the same sense Java does, but it certainly has them in the object-oriented sense. We generally call them APIs.
Take the following scenario: you’re building a web application that you will deploy in a Java container, so you’re running on JRuby. You want to integrate with Twitter, so you add the twitter gem, which in turn depends on the yajl-ruby gem for JSON parsing. Sadly, yajl-ruby is a C extension (bindings to the YAJL parser), and C extensions generally don’t run on JRuby. What you want to use is json-jruby, but you would have to fork the twitter gem to make it use that. The problem isn’t so much that yajl-ruby and json-jruby have different APIs (indeed, they don’t differ by much); it’s that the dependency is declared in the twitter gem’s .gemspec file, so even monkey-patching can’t save the day.
This is precisely why Erich Gamma (one of the Gang of Four) says you should “program to an interface, not an implementation.” (See this interview.)
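To make the scenario above concrete, here is a minimal sketch of how a gem pins an implementation. The gem name my-twitter-client, its version, and the version constraint are hypothetical and are not taken from the real twitter gem; only Gem::Specification and add_runtime_dependency are actual RubyGems API.

```ruby
require "rubygems"

# Hypothetical gemspec for a Twitter client. The name and versions are
# illustrative assumptions, not the real twitter gem's metadata.
spec = Gem::Specification.new do |gem|
  gem.name    = "my-twitter-client"
  gem.version = "0.1.0"
  gem.summary = "Example client that hard-wires its JSON parser"

  # The implementation is fixed here, at install time. An application
  # running on JRuby cannot substitute json-jruby without forking the gem.
  gem.add_runtime_dependency "yajl-ruby", ">= 0.7.0"
end

# List what every consumer of this gem is forced to install.
spec.runtime_dependencies.each do |dep|
  puts "#{dep.name} (#{dep.requirement})"
end
```

Because the dependency is resolved before any application code runs, nothing the application does at runtime can replace it.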
Interface Libraries
If twitter depended on an interface instead of an implementation, you would be free to use any implementation you wanted so long as it abided by the interface. But how, exactly, do you write an interface in Ruby?
RubyGems and provides
In a recent post, Marcin Kulic discusses this and other dependency problems in the Ruby/Gems landscape. His recommendation is that we add #provides to Gem::Specification and use it like so:
Gem::Specification.new do |gem|
  gem.name = "my-json-implementation"
  gem.version = "2.7.4"
  # Proposed API: #provides does not exist in RubyGems today.
  gem.provides "json", :version => "<= 1.2.0"
end
This is a fine idea in theory, but Nick Quaranto pointed out that modifying RubyGems is exceedingly difficult and raises backwards-compatibility problems. Additionally, there are naming issues, since the first implementation of a set of functionality (JSON, YAML, LDAP, Twitter API) usually takes the most obvious name for the interface. The json gem, for example, is itself an implementation.
Bridge Gems
Michael Bleigh, in writing OmniAuth, a Rack-based authentication provider, found he needed JSON parsing, but he didn’t want to depend on a specific implementation precisely because he didn’t want to dictate what the clients of his gem should use. To scratch this itch, he created multi_json, which uses the Bridge Pattern. multi_json declares only two methods: MultiJson.encode(object) and MultiJson.decode(json_string). It has a number of “engines” that it can use. If you want to write a JRuby-based web application that uses OmniAuth, simply declare your dependency on json-jruby and then set the engine appropriately.
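The bridge idea can be sketched in a few lines. This is not multi_json’s actual source: TinyJson, StdlibEngine, and PrettyEngine are hypothetical names, and both engines lean on Ruby’s bundled json library so that the example runs anywhere.

```ruby
require "json"

# TinyJson is a hypothetical bridge, not multi_json's real code. Callers
# use encode/decode, and the engine behind them can be swapped at runtime.
module TinyJson
  # Default engine, backed by Ruby's bundled json library.
  module StdlibEngine
    def self.encode(object)
      JSON.generate(object)
    end

    def self.decode(string)
      JSON.parse(string)
    end
  end

  class << self
    # Any object that responds to encode and decode can serve as an engine.
    attr_writer :engine

    def engine
      @engine ||= StdlibEngine
    end

    def encode(object)
      engine.encode(object)
    end

    def decode(string)
      engine.decode(string)
    end
  end
end

# A second engine that an application might plug in instead; this one
# pretty-prints. On JRuby, a json-jruby-backed engine would go here.
module PrettyEngine
  def self.encode(object)
    JSON.pretty_generate(object)
  end

  def self.decode(string)
    JSON.parse(string)
  end
end

puts TinyJson.encode({ "name" => "Boston.rb" })
TinyJson.engine = PrettyEngine
puts TinyJson.encode({ "name" => "Boston.rb" })
```

A library that depends only on TinyJson never names a parser, so the choice of engine stays with the application.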
Pre-definition and Post-definition
For functionality that already exists in several implementations — JSON encoding and decoding, LDAP access, &c. — the mission of an interface (either the “provides” version or the bridge gem version) is clear: extract out the most salient pieces of all the popular implementations and give them a common API. For functionality that does not yet exist — for example, the client API for a new web service — the interface builder is faced with a difficult decision: define the interface first or extract it later?
Pre-defining the interface has one key advantage: clients of the interface can easily depend on the interface rather than an implementation, so there won’t be a mad rush to change all the gems in the wild later. It also has a disadvantage: it reeks of premature decision making. Waiting to define the interface until several competing implementations have sprung up allows the interface writer to select only the most important aspects of the libraries.
Of course, in the real world, it is likely to be a mix. Perhaps you start with an implementation, try it out on a few projects, build a first version of an interface, and iterate on both. Interfaces, after all, can be versioned just like implementations, though we hope they don’t change too often.
Core Principles of an Interface
Above all, semantic versioning is key to building a great interface. Included in semantic versioning is proper use of deprecation. Any method, class, or module that is to be removed must first be deprecated in a minor version and can only be removed in a subsequent major version.
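As a rough sketch of such a deprecation cycle, the old method keeps working and emits a warning until the next major version. ExampleJson, its method names, and the version numbers in the comments are all assumptions made up for illustration.

```ruby
require "json"

# Hypothetical interface module showing a deprecation cycle. The method
# names and version numbers are assumptions for illustration only.
module ExampleJson
  # Current name, introduced in 1.1.0.
  def self.decode(string)
    JSON.parse(string)
  end

  # Old name, deprecated in 1.1.0 (a minor release). It keeps working and
  # warns on every call. It may only be deleted in 2.0.0.
  def self.parse(string)
    warn "[DEPRECATION] ExampleJson.parse is deprecated; use ExampleJson.decode instead."
    decode(string)
  end
end

# Existing callers still get a result, plus a warning on standard error.
p ExampleJson.parse('{"ok":true}')
```

Clients that upgrade within the 1.x series see the warning and have time to migrate before the method disappears.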
Secondly, interface writers should take input from the community on the API. Perhaps the easiest way to do this is to inspect the popular implementations in a category on Ruby Toolbox for commonalities.
And, of course, only build interfaces where they’re truly necessary. Out of the thousands and thousands of gems in the wild, very few provide such core functionality that they are used by libraries that are in turn used by applications. Let’s keep in mind the problem we’re trying to solve and not go interface-crazy.
How to Build an Interface Gem
What does it mean to make an interface gem? multi_json is certainly a good start at a JSON interface. It provides core functionality (just two methods, #encode and #decode) and makes it easy for the client to plug in new engines. The Boston.rb group decided that a truly great interface gem would have at least one additional feature, though: an easy way for implementers to test the compliance of their implementations. This will vary in practice, but it might be an RSpec macro:
require 'multi_json/spec_macros'

describe MyJSONImplementation do
  subject do
    MyJSONImplementation
  end

  it_should_implement_json(:version => 1.0)
end
Or it could be a module that gets mixed into a test class:
require 'test/unit'
require 'multi_json/test_support'

class MyJSONTest < Test::Unit::TestCase
  include MultiJson::TestCase::VersionOnePointOh

  def subject
    MyJSONImplementation
  end
end
In either case, the important point is that implementers have a dead-simple way to add a whole suite of compliance tests. One thing we discussed at the hackfest was creating an rspec_rfc gem that supports the RFC 2119 keywords SHOULD, SHOULD NOT, MUST, and MUST NOT, so implementers could distinguish non-compliance (fails at least one MUST or MUST NOT), conditional compliance (passes all MUSTs and MUST NOTs, but fails at least one SHOULD or SHOULD NOT), and compliance (passes all). The problem here is that RSpec already uses should to mean what RFCs call MUST.
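The three compliance levels can be sketched without RSpec at all. The ComplianceSuite class below is hypothetical (it is not the rspec_rfc gem, which was only an idea), and the two requirements are invented examples. MUST NOT and SHOULD NOT rules can be written as checks that return true when the forbidden behavior is absent.

```ruby
require "json"

# Hypothetical compliance checker, not an existing gem. Each requirement
# carries an RFC 2119 level, :must or :should, and a block that returns
# true when the implementation under test satisfies it.
class ComplianceSuite
  Requirement = Struct.new(:level, :description, :check)

  def initialize
    @requirements = []
  end

  def must(description, &check)
    @requirements << Requirement.new(:must, description, check)
  end

  def should(description, &check)
    @requirements << Requirement.new(:should, description, check)
  end

  # Returns :non_compliant, :conditionally_compliant, or :compliant.
  def evaluate(implementation)
    failed = @requirements.reject do |req|
      begin
        req.check.call(implementation)
      rescue StandardError
        false # an exception counts as a failed requirement
      end
    end
    return :non_compliant if failed.any? { |req| req.level == :must }
    return :conditionally_compliant if failed.any?
    :compliant
  end
end

suite = ComplianceSuite.new
suite.must("round-trip a Hash") do |impl|
  impl.decode(impl.encode({ "a" => 1 })) == { "a" => 1 }
end
suite.should("encode nil as null") do |impl|
  impl.encode([nil]) == "[null]"
end

# A stand-in implementation backed by Ruby's bundled json library.
module StdlibJson
  def self.encode(object)
    JSON.generate(object)
  end

  def self.decode(string)
    JSON.parse(string)
  end
end

puts suite.evaluate(StdlibJson)
```

Naming the methods must and should inside a dedicated class sidesteps the clash with RSpec’s own should, at the cost of not integrating with RSpec’s reporting.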
The other improvement that should be made to multi_json, and is important for all interfaces, is very thorough API documentation. multi_json only defines two methods, but the API contains more than method names. Parameter types, return types, possible exceptions, yielded block, yield params, and yield return values are all important. The current documentation for #encode and #decode looks like this:
# Decode a JSON string into Ruby.
#
# <b>Options</b>
#
# <tt>:symbolize_keys</tt> :: If true,
#   will use symbols instead of strings
#   for the keys.
def decode(string, options = {})
  ...

# Encodes a Ruby object as JSON.
def encode(object)
  ...
It should probably look like the following (and, yes, I will be submitting a patch):
# Decode a JSON String into a Ruby Object.
#
# @param [String] string the object in JSON representation
# @param [Hash, nil] options additional options
# @option options [true, false] :symbolize_keys if true,
#   will use Symbols instead of Strings for Hash keys
# @return [Object] a Ruby Object
# @since 0.0.1
def decode(string, options = {})
  ...

# Encodes a Ruby Object as a JSON String
#
# @param [Hash, Array, Numeric, String, Symbol, #to_json] object a
#   Ruby Object to be converted to JSON. Each object in
#   the graph must be a Hash, Array, Numeric, String, or Symbol
#   or declare a #to_json instance method.
# @raise [NoMethodError] if any object in the graph cannot be
#   translated into JSON.
# @return [String] the Ruby Object as JSON
# @since 0.0.1
def encode(object)
  ...
Ideal Candidates
Good candidates for interfaces tend to meet the following criteria:
- they provide functionality that is used by libraries more than by applications
- they have some external definition of correct behavior
- they have, or could have, a variety of implementations
The three categories that come most easily to mind are formats (JSON, YAML, Markdown), protocols and cryptographic algorithms (LDAP, OpenID, HMAC, AES), and API clients (Twitter, Flickr, Delicious).
(Later: not ten minutes after posting this, I came across Yehuda Katz’s Moneta, an interface for key/value stores. Like multi_json, moneta is a great example of a bridge gem, though it could use some of the same improvements that multi_json could, namely providing implementers a test suite and having thorough documentation on the API.)
On Stifling Innovation
One criticism of interfaces is that they stifle innovation. It’s harder for me to write a truly new way of using LDAP if I have to comply with an interface. This is certainly a valid concern. The mitigation strategy should be extreme care in writing and curating interfaces. Don’t write one unless there’s an obvious need, and when you do write one, require only as much as you truly need for basic functionality. People can always depend on a specific implementation if they need more than the basics.