Riffing on `interpose` implementations in Ruby

I very much enjoyed Brian Cobb’s step-by-step translation of the Clojure interpose function to Ruby. I agree that interpose would be a handy method to have around.

As a quick TL;DR: interpose is kind of like Array#join, except that it produces a sequence instead of a string.
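To make the comparison concrete, here is a small sketch. The helper name `interpose_demo` is made up for this illustration and is not the implementation developed in the rest of the article; it only shows the shape of the result.

```ruby
# Hypothetical stand-in, used only to show the intended result.
def interpose_demo(items, separator)
  # Put the separator in front of every item, then drop the leading one.
  items.flat_map { |item| [separator, item] }.drop(1)
end

joined     = %w[a b c].join(", ")             # => "a, b, c"
interposed = interpose_demo(%w[a b c], ", ")  # => ["a", ", ", "b", ", ", "c"]
```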

Brian’s solution works by building an Enumerator.

Returning an Enumerator when no block is given is an Enumerable convention, and it makes for very composable code.
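As an illustration of that composability, using only built-in methods: each_with_index returns an Enumerator when called without a block, so further calls can be chained onto it.

```ruby
letters = %w[a b c]

# No block given, so we get an Enumerator back instead of iterating right away.
enum = letters.each_with_index

# The Enumerator is itself Enumerable, so map can be chained onto it.
labels = enum.map { |letter, index| "#{index}:#{letter}" }
# labels => ["0:a", "1:b", "2:c"]
```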

The other characteristic of a conventional Enumerable method is that it yields elements to a block when called with one.
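The built-in each_cons is one example of the block form: given a block, it yields each group straight to it rather than returning an Enumerator.

```ruby
pairs = []

# With a block, each_cons yields every consecutive pair to it.
[1, 2, 3].each_cons(2) { |a, b| pairs << [a, b] }
# pairs => [[1, 2], [2, 3]]
```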

Brian’s solution doesn’t currently do this, but it would be easy to add.

This got me thinking about what it would look like to implement the same functionality based on a yielding idiom. Most Enumerable methods are implemented in terms of yield, only constructing an Enumerator if they are called without a block.

Let’s start with an empty Interpose module. Rather than globally adding it to Enumerable, we’ll make it an optional refinement for Enumerable. (Refining a module, as opposed to a class, only works under Ruby 2.4 or newer.)
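A minimal sketch of that starting point might look like the following. The exact layout of the original file is an assumption; only the module name and the use of a refinement come from the text.

```ruby
module Interpose
  refine Enumerable do
    # interpose will be defined here
  end
end

# A file opts in explicitly. Files that never say this are unaffected.
using Interpose
```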

The simplest requirement for interpose is that, given an empty sequence, it should yield nothing.

This one’s easy: all we need is an empty method.
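As a sketch, assuming the method already accepts (and ignores) its one argument at this stage:

```ruby
module Interpose
  refine Enumerable do
    # The parameter is accepted but not used yet (an assumption about this step).
    def interpose(_)
    end
  end
end

using Interpose

yielded = []
[].interpose(:sep) { |item| yielded << item }
# yielded => []
```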

Next, if we give it a sequence with only one element, we should get back the same sequence, since there are no pairs of items between which to interpose the separator.

Simply delegating to each is sufficient.
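One way to write that delegation, again assuming the separator argument is still unnamed at this step:

```ruby
module Interpose
  refine Enumerable do
    # Pass the caller's block straight through to each.
    def interpose(_, &block)
      each(&block)
    end
  end
end

using Interpose

yielded = []
[1].interpose(:sep) { |item| yielded << item }
# yielded => [1]
```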

A two-element sequence is where things start to get interesting.

Now we have to actually use the method’s one parameter, which means giving it a name: separator.

As for the implementation… well, we could try outputting the separator before each item.

But that doesn’t give us what we want.
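A sketch of that first attempt shows the problem: the output starts with a stray separator.

```ruby
module Interpose
  refine Enumerable do
    # Naive attempt: emit the separator before every item.
    def interpose(separator)
      each do |item|
        yield separator
        yield item
      end
    end
  end
end

using Interpose

yielded = []
[1, 2].interpose(:sep) { |item| yielded << item }
# yielded => [:sep, 1, :sep, 2]   (we wanted [1, :sep, 2])
```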

If only we could insert the separator before each item except when it’s the first item. Is there a way to tell which iteration we’re currently in? Yes, there is! We just need to switch from each to each_with_index, and then we can check to see if the current index is 0.
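Sketched out, the index check might look like this. The usage lines also try a longer list.

```ruby
module Interpose
  refine Enumerable do
    def interpose(separator)
      each_with_index do |item, index|
        # Skip the separator only in front of the very first item.
        yield separator unless index.zero?
        yield item
      end
    end
  end
end

using Interpose

two = []
[1, 2].interpose(:sep) { |item| two << item }
# two => [1, :sep, 2]

four = []
[1, 2, 3, 4].interpose(:sep) { |item| four << item }
# four => [1, :sep, 2, :sep, 3, :sep, 4]
```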

How about longer lists?

Our existing code handles this just fine with no further changes.

There’s one last requirement: when called with no block, the method should return an Enumerator that has the same behavior as the block form.

Ordinarily we could accomplish this using the following magic incantation at the top of the method:

This tells Ruby that when there’s no block supplied, it should take the current method (__callee__), construct an Enumerator around it, and return it.
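Here is how that idiom commonly looks in an ordinary (non-refinement) class. `Countdown` is a made-up class for illustration only.

```ruby
# Hypothetical class, used only to show the to_enum idiom.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    # With no block, wrap this very method in an Enumerator and return it.
    return to_enum(__callee__) unless block_given?

    @from.downto(1) { |n| yield n }
  end
end

enum = Countdown.new(3).each   # an Enumerator
numbers = enum.to_a            # => [3, 2, 1]
```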

But this results in a test error:

The problem here is that we’ve defined our Enumerable extension as a refinement. Refinements are strictly lexically scoped to the current file. When the Enumerator code gets around to actually sending the interpose message to the receiver, the send occurs over in Ruby’s implementation of Enumerator. Since that implementation is in a different file from our interpose.rb (and a compiled C file at that), the refinement is not in effect.
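The failure can be reproduced in a few lines. This is a sketch rather than the original test, and it assumes the error surfaces as a NoMethodError, which is what I would expect on the Ruby versions I know of.

```ruby
module Interpose
  refine Enumerable do
    def interpose(separator)
      return to_enum(__callee__, separator) unless block_given?

      each_with_index do |item, index|
        yield separator unless index.zero?
        yield item
      end
    end
  end
end

using Interpose

# Building the Enumerator succeeds, because nothing has been sent yet.
enum = [1, 2].interpose(:sep)

error = begin
  # Enumerator sends :interpose from its own code, where the refinement is inactive.
  enum.to_a
  nil
rescue NoMethodError => e
  e
end
```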

This is a feature, not a bug. The great virtue of refinements is that it is impossible for them to “leak” into contexts where they are not expected. In order for a message send to be “diverted” by a refinement, that message send must actually be visible in the context where the refinement is used. If you’re looking at some code, and you scroll up and don’t see a using declaration, you can be confident that no refinements are in effect.

This does mean, though, that we need a way to construct an Enumerator where the interpose message send occurs physically within the current file. Fortunately, this isn’t difficult. It’s just a little more verbose.

With the inner call to interpose now captured in a block inside this file, the refinement is active and our test passes.
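One way to write this, as a sketch: build the Enumerator by hand with Enumerator.new, and make the inner interpose call from a block that lives inside the refinement. The variable names are my own.

```ruby
module Interpose
  refine Enumerable do
    def interpose(separator)
      unless block_given?
        # The inner interpose send is written in this file, so the refinement applies.
        return Enumerator.new do |yielder|
          interpose(separator) { |item| yielder << item }
        end
      end

      each_with_index do |item, index|
        yield separator unless index.zero?
        yield item
      end
    end
  end
end

using Interpose

all_items = [1, 2, 3].interpose(:sep).to_a      # => [1, :sep, 2, :sep, 3]
first_two = [1, 2, 3].interpose(:sep).first(2)  # => [1, :sep]
```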

This feels like a big distraction before the “meat” of the interpose method. Let’s extract the Enumerator creation out into its own method.
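A possible shape for that extraction follows. The helper name `enum_for_interpose` is hypothetical; the original may well have named or structured it differently.

```ruby
module Interpose
  refine Enumerable do
    def interpose(separator)
      return enum_for_interpose(separator) unless block_given?

      each_with_index do |item, index|
        yield separator unless index.zero?
        yield item
      end
    end

    # Hypothetical helper: builds the Enumerator inside this file so that the
    # refined interpose is the method that actually gets called.
    def enum_for_interpose(separator)
      Enumerator.new do |yielder|
        interpose(separator) { |item| yielder << item }
      end
    end
  end
end

using Interpose

result = %w[a b c].interpose("-").to_a  # => ["a", "-", "b", "-", "c"]
```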

Again: the extra boilerplate for generating an Enumerator is only because we’re defining this code as a refinement. If it were an ordinary module method, returning an optional Enumerator would be a one-liner.

We’ve tested the block-less version of interpose with an empty sequence. We’d like to ensure that the generated Enumerator has the exact same semantics as the block form of interpose, in all scenarios.

There are three more tests for the block-form call. These all have a very similar shape to each other.

Let’s extract them into a shared example group and re-use them for both block-form and block-less versions of the call. We’ll start by abstracting the way interpose is called into a helper method called expect_transformation.

Then we’ll move these three abstract examples into a shared example group.

And then we’ll create a new spec context specifically for the block form of interpose. We move the “yields nothing when given nothing” spec into this context, along with our yield-oriented definition of expect_transformation.

Now we can construct a matching example group for the no-block-given scenario.

Note the separate, context-specific definition of expect_transformation. Instead of using RSpec block expectations, it converts the return value of interpose into an array and compares that to the expected value.
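The RSpec code itself is not reproduced here. As a rough plain-Ruby analogue of the same idea (not RSpec, and not the original spec file), one list of shared cases can be run against two different ways of calling interpose. The lambda names are invented for this sketch.

```ruby
module Interpose
  refine Enumerable do
    def interpose(separator)
      unless block_given?
        return Enumerator.new do |yielder|
          interpose(separator) { |item| yielder << item }
        end
      end

      each_with_index do |item, index|
        yield separator unless index.zero?
        yield item
      end
    end
  end
end

using Interpose

# Shared cases, each one [input, separator, expected output].
shared_cases = [
  [[1],          :sep, [1]],
  [[1, 2],       :sep, [1, :sep, 2]],
  [[1, 2, 3, 4], :sep, [1, :sep, 2, :sep, 3, :sep, 4]]
]

# Block form: collect whatever gets yielded.
block_form = lambda do |input, separator|
  yielded = []
  input.interpose(separator) { |item| yielded << item }
  yielded
end

# Block-less form: convert the returned Enumerator to an array.
enumerator_form = ->(input, separator) { input.interpose(separator).to_a }

failures = []
{ "block form" => block_form, "enumerator form" => enumerator_form }.each do |name, call|
  shared_cases.each do |input, separator, expected|
    actual = call.(input, separator)
    failures << [name, input, actual] unless actual == expected
  end
end
```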

We can now confirm that we have the expected semantics both when called with and without a block.

Choosing a yield-based implementation over an Enumerator-based implementation is more than just a matter of style. Constructing and using Ruby Enumerator objects can be comparatively heavyweight. We can see this if we benchmark the original Enumerator-based version against the yielding version. We’ll use the benchmark-ips gem (required as benchmark/ips) for the comparison.
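The benchmark script is not reproduced here. To get a rough feel for the comparison without installing a gem, here is a crude timing sketch that uses Process.clock_gettime in place of benchmark-ips. Both methods are simplified, non-refinement stand-ins with made-up names; in particular `interpose_external` is my own guess at an Enumerator-driven approach that relies on StopIteration, not Brian’s actual code.

```ruby
# Yielding stand-in: plain method, falls back to to_enum without a block.
def interpose_yielding(items, separator)
  return to_enum(__method__, items, separator) unless block_given?

  items.each_with_index do |item, index|
    yield separator unless index.zero?
    yield item
  end
end

# Enumerator-driven stand-in: walks an external enumerator with next/peek.
def interpose_external(items, separator)
  Enumerator.new do |yielder|
    source = items.each
    loop do
      yielder << source.next
      source.peek            # raises StopIteration at the end, which ends the loop
      yielder << separator
    end
  end
end

# Crude wall-clock timer: runs the block the given number of times.
def seconds_for(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

data = (1..100).to_a

timings = {
  "yielding, with block"    => seconds_for(200) { interpose_yielding(data, :sep) { |item| item } },
  "yielding, as Enumerator" => seconds_for(200) { interpose_yielding(data, :sep).to_a },
  "external Enumerator"     => seconds_for(200) { interpose_external(data, :sep).to_a }
}

timings.each { |label, secs| puts format("%-26s %.4fs", label, secs) }
```

Numbers from a loop like this are noisy and will vary by machine and Ruby version, so treat them as a sanity check rather than a measurement.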

Here are the results on my machine:

When called with a block, the yielding version is at a major advantage: about 10x faster. This reflects the fact that it doesn’t have to construct an Enumerator object at all, and can get straight to work.

But even when they are both returning Enumerators, the yielding version is still about 3x faster. I’m not sure, but I wonder if this is due in part to the fact that the Enumerator version is forced to use exceptions for flow control. Raising exceptions tends to introduce a lot of extra overhead.

I’m pretty happy with the end result of this refactoring. It expresses its semantics concisely, and offers a meaningful performance improvement over the original.
