Benchmarking Ruby Dispatch Strategies

Let’s say we’re dispatching events to listener objects. Events look like this:
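(A sketch; the exact shape of the event class doesn’t matter much for what follows, so assume something minimal: a symbolic name plus an argument list.)

```ruby
# Hypothetical stand-in for the events being dispatched: a symbolic name
# such as :accepted, plus whatever arguments travel with it.
Event = Struct.new(:name, :args)
```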

Events should be routed to different handler methods based on the name of the event. For instance, when an object is sent the #call message with an :accepted event:
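(Illustrated with the hypothetical Event struct above; listener stands in for any handler object.)

```ruby
listener.call(Event.new(:accepted, ["some payload"]))
```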

…then it should be dispatched to the #handle_accepted method, as if the listener had been sent this message:
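(Again a sketch; in these examples the handler method simply receives the event object itself.)

```ruby
listener.handle_accepted(event)   # event being the :accepted Event from above
```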

There are a few different ways we could handle this dispatching logic. To test them out, let’s first set up a little base class for testbed objects.
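Something like the following sketch will do; the class name and attribute name are placeholders rather than the original code.

```ruby
# Base class for all the testbed variants below. It only keeps a log of
# handled events; each subclass supplies its own handler methods and its
# own #call dispatching strategy.
class Testbed
  attr_reader :handled_events

  def initialize
    @handled_events = []
  end
end
```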

This gives us a place to record handled events, so we can verify that our test classes have the correct semantics.

Now let’s kick things off with a hardcoded version of the dispatching semantics. This version has a hand-coded case statement which simply switches on the event type and dispatches to the appropriate method.
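A sketch of that version, with three illustrative handlers (the original code isn’t reproduced here):

```ruby
class HardcodedTestbed < Testbed
  def handle_accepted(event); handled_events << event; end
  def handle_rejected(event); handled_events << event; end
  def handle_deferred(event); handled_events << event; end

  # Hand-written dispatch: switch on the event name and call the
  # matching handler method directly.
  def call(event)
    case event.name
    when :accepted then handle_accepted(event)
    when :rejected then handle_rejected(event)
    when :deferred then handle_deferred(event)
    end
  end
end
```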

If all we needed were a single listener, this would be sufficient. But the purpose of this exercise is to come up with a way to automatically dispatch to the correct handler method, without actually hand-coding a specialized #call method in every listener class. Ideally, we’d like to come up with a strategy we can bundle up into a reusable module.

Let’s start out simple. Here’s a version that does some string interpolation in order to infer the correct handler method, and then uses #__send__ (otherwise known as #send) to dispatch to that method. The send is only performed if the handler method exists.
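Roughly like this (again a sketch, reusing the same illustrative handlers):

```ruby
class NaiveSendTestbed < Testbed
  def handle_accepted(event); handled_events << event; end
  def handle_rejected(event); handled_events << event; end
  def handle_deferred(event); handled_events << event; end

  # Interpolate the handler name, then #__send__ to it, but only if a
  # method by that name actually exists.
  def call(event)
    handler = "handle_#{event.name}"
    __send__(handler, event) if respond_to?(handler)
  end
end
```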

We theorize that this strategy will be on the slow side. Every call will have to perform string interpolation, check if the method exists, and then do a dynamic message send. And dynamic sends can’t be optimized as tightly as hardcoded sends.

Let’s now turn our minds to how we might optimize this process.

Here’s one approach. In this version, we catch #method_added events at the class level. For each method added, we look to see if it’s an event handler method. If so, we add it to a class-wide lookup table that maps events to handler method names.

Then when an event is sent to an instance with #call, it simply looks up the corresponding method and does a dynamic send to it.
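Here is a sketch of that idea; the class name, the HANDLER_PATTERN constant, and the handler_table accessor are names I’ve made up for illustration:

```ruby
class SendTableTestbed < Testbed
  HANDLER_PATTERN = /\Ahandle_(.+)\z/

  def self.handler_table
    @handler_table ||= {}
  end

  # Whenever a method is added to this class, check whether it looks
  # like an event handler; if so, record its name in the class-level
  # table, keyed by event name.
  def self.method_added(method_name)
    super
    if (match = HANDLER_PATTERN.match(method_name.to_s))
      handler_table[match[1].to_sym] = method_name
    end
  end

  # Dispatch is now a hash lookup followed by a dynamic send.
  def call(event)
    handler = self.class.handler_table[event.name]
    __send__(handler, event) if handler
  end

  def handle_accepted(event); handled_events << event; end
  def handle_rejected(event); handled_events << event; end
  def handle_deferred(event); handled_events << event; end
end
```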

This approach avoids the need for string interpolation on each dispatch. It also avoids the need to explicitly check for the existence of a handler method.

Now let’s go for a more exotic approach. Here’s a version that, similarly to the last one, builds a lookup table as methods are added to the class. But in this case, instead of storing method names, it stores UnboundMethod objects obtained via #instance_method.

Then, when sent #call, it looks up the matching UnboundMethod, binds it to self, and calls it.
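A sketch along the same lines as the previous one, with the table now holding UnboundMethod objects:

```ruby
class BindTableTestbed < Testbed
  HANDLER_PATTERN = /\Ahandle_(.+)\z/

  def self.handler_table
    @handler_table ||= {}
  end

  # As handler methods are added, capture them as UnboundMethod objects
  # via #instance_method and store them keyed by event name.
  def self.method_added(method_name)
    super
    if (match = HANDLER_PATTERN.match(method_name.to_s))
      handler_table[match[1].to_sym] = instance_method(method_name)
    end
  end

  # Look up the unbound method, bind it to this instance, and call it.
  def call(event)
    handler = self.class.handler_table[event.name]
    handler.bind(self).call(event) if handler
  end

  def handle_accepted(event); handled_events << event; end
  def handle_rejected(event); handled_events << event; end
  def handle_deferred(event); handled_events << event; end
end
```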

In theory this could be even faster than the last strategy. Since an UnboundMethod is a direct reference to the chunk of compiled code that makes up a method, there is no #__send__ lookup overhead.

It’s time to get aggressive in our optimizations. Here’s a version that uses class_eval to dynamically regenerate the code of the #call method every time a new handler method is added. The code it generates is similar to our original, hardcoded example.
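A sketch of the idea (the regenerate_call helper is my own naming):

```ruby
class GeneratedCaseTestbed < Testbed
  HANDLER_PATTERN = /\Ahandle_(.+)\z/

  def self.handlers
    @handlers ||= {}
  end

  # Record each handler method as it appears, then rebuild #call so that
  # it contains one `when` branch per known handler.
  def self.method_added(method_name)
    super
    if (match = HANDLER_PATTERN.match(method_name.to_s))
      handlers[match[1].to_sym] = method_name
      regenerate_call
    end
  end

  def self.regenerate_call
    branches = handlers.map { |event_name, handler|
      "when :#{event_name} then #{handler}(event)"
    }.join("\n        ")
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def call(event)
        case event.name
        #{branches}
        end
      end
    RUBY
  end

  def handle_accepted(event); handled_events << event; end
  def handle_rejected(event); handled_events << event; end
  def handle_deferred(event); handled_events << event; end
end
```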

The final generated #call method will look something like this:
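(Modulo whitespace, given the generator sketched above.)

```ruby
def call(event)
  case event.name
  when :accepted then handle_accepted(event)
  when :rejected then handle_rejected(event)
  when :deferred then handle_deferred(event)
  end
end
```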

One last variation. We wonder if a case statement is the fastest way to write this generated dispatch code. After all, case statements do fancy “case matching” using the === operator. What if we instead built an if statement which compared symbols by identity?
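Only the code-generation step needs to change. The generated method might come out looking something like this, using #equal? for the identity comparison (call the class GeneratedIfTestbed for the purposes of the benchmark below):

```ruby
# Generated by a variant of the regenerate_call sketch above that emits
# an if/elsif chain instead of a case statement.
def call(event)
  name = event.name
  if name.equal?(:accepted)
    handle_accepted(event)
  elsif name.equal?(:rejected)
    handle_rejected(event)
  elsif name.equal?(:deferred)
    handle_deferred(event)
  end
end
```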

Perhaps that will be a little faster.

OK, let’s put all these different versions to the test.

We write a helper method that will create a new instance of one of the testbed classes, dispatch three events to it, dispatch a fourth, unhandled event to it, and then verify that the three handled events are logged.
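A sketch of such a helper, with made-up event names (the fourth, :exploded, has no handler anywhere):

```ruby
EVENTS = [
  Event.new(:accepted, []),
  Event.new(:rejected, []),
  Event.new(:deferred, []),
  Event.new(:exploded, [])   # deliberately unhandled
]

def exercise(testbed_class)
  testbed = testbed_class.new
  EVENTS.each { |event| testbed.call(event) }
  unless testbed.handled_events.size == 3
    raise "#{testbed_class} mishandled events: #{testbed.handled_events.inspect}"
  end
end
```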

Then we stick all of our testbeds in an array, and benchmark them against each other using benchmark-ips.
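Roughly like so, assuming the class and helper names from the sketches above (GeneratedIfTestbed being the if/elsif variant):

```ruby
require "benchmark/ips"

TESTBEDS = [
  HardcodedTestbed,
  NaiveSendTestbed,
  SendTableTestbed,
  BindTableTestbed,
  GeneratedCaseTestbed,
  GeneratedIfTestbed
]

Benchmark.ips do |x|
  TESTBEDS.each do |testbed_class|
    x.report(testbed_class.name) { exercise(testbed_class) }
  end
  x.compare!
end
```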

Let’s take a look at the results, shall we? Here’s what I see on my system with Ruby 2.2:

Not surprisingly, our hardcoded version is fastest. Neck and neck with it is our first code-generation version, the one that generates a case statement. In my experiments the code-generated version comes out ahead about half the time; I think any difference between the two is just statistical noise.

Interestingly, the generated code using if statements comes out a bit slower. Apparently Ruby case statements are quite well optimized.

Next up in the rankings is the “send table” version. We can see that we were right about it being faster to assemble a lookup table in advance than to string-interpolate a method name, check for its existence, and then #__send__ to it with each call.

The latter approach, our naive #__send__-based version, is over two times slower than the hardcoded and generated case-statement versions.

The big surprise (for me, anyway) is how slow the “bind table” version turns out to be. Evidently the overhead of binding a method object and then calling it is greater than the overhead of doing dynamic sends.

Overall, while 2x is a big enough difference to matter in some programs, it’s interesting that there are no order-of-magnitude gains or losses to be had here.

In case you’re curious, here are the results when running under Ruby 1.9 instead of 2.2.

The numbers are slower across the board. The relative speeds are mostly similar, although the “bind table” and “send” versions have switched places in the rankings.

As you might guess, I performed these benchmarks in order to get a concrete idea as to the most performant way to implement part of a library I’m working on. As usual with benchmarking, there were a few surprises, and I’m glad I took the trouble to test my hypotheses.

If you have different results on other versions of Ruby, suggestions on how to improve the benchmarks, or new variations on method dispatching, I’d love to see them in the comments.