Boolean Externalities

The other night I spent a long time trying to figure out why an acceptance test was failing. Eventually I tracked it down to the fact that a particular predicate method was returning false when I expected it to return true.

Ultimately I would find that the test failure pointed back to a legitimate oversight in my code. But I wasn’t there yet. First I had to work my way back to the source of the unexpected false.

Down the rabbit hole

The proximal source of my frustration was this method:

Simple enough method. It turned out that of the two possible reasons, the false return was coming from user.current_subscriber?, which looks like this:

This time, the offending value was stemming from the left side of the &&. Digging in, I found this:

I dutifully checked the return values, and followed the right-hand fork to this:

Following one step further down the rabbit hole led me here:

At last, I was at the end of the boolean chain. I hadn’t yet discovered why confirmed_at had not been set for the test user in question, but now I knew what I was looking for.
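To make the shape of that chain concrete, here is a minimal sketch of how such predicates might nest. It is an illustration only, not the application's actual code: `available_to_user?`, `free?`, `current_subscriber?`, and `confirmed_at` are the names mentioned in this post, while `fully_authenticated?`, `confirmed?`, `subscription_open?`, and the `Subscription` struct are assumed for the example.

```ruby
# Hypothetical sketch of a nested predicate chain. Only available_to_user?,
# free?, current_subscriber?, and confirmed_at come from the post; every other
# name here is an assumption made for illustration.
Subscription = Struct.new(:status) do
  def open?
    status == :open
  end
end

class User
  attr_reader :confirmed_at, :subscription

  def initialize(confirmed_at: nil, subscription: nil)
    @confirmed_at = confirmed_at
    @subscription = subscription
  end

  def current_subscriber?
    fully_authenticated? && subscription_open?
  end

  def fully_authenticated?
    confirmed?
  end

  def confirmed?
    !confirmed_at.nil?
  end

  def subscription_open?
    !subscription.nil? && subscription.open?
  end
end

class Episode
  def initialize(free: false)
    @free = free
  end

  def free?
    @free
  end

  def available_to_user?(user)
    free? || user.current_subscriber?
  end
end

# A user with an open subscription but no confirmation timestamp.
user = User.new(confirmed_at: nil, subscription: Subscription.new(:open))
episode = Episode.new(free: false)
episode.available_to_user?(user) # => false, with no hint that confirmed_at is the cause
```

Each method reads well on its own, yet the caller at the top sees only the final `false`.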

Debugger Blues

Following this trail was not easy, or particularly pleasant. I used a debugger to do it. Past experiences have left me gun-shy when it comes to debuggers; generally when I find myself contemplating using the debugger, I start questioning my recent life choices. Nonetheless they are sometimes a necessary evil.

Using the debugger meant first ensuring that I could catch the software at just the right moment to examine the values in question. Fortunately I was dealing with a deterministically failing test, not an occasional runtime error. Even more fortunately I was using RubyMine, which does a pretty decent job of taking the pain out of debugging in Ruby, even when working with the split client/server processes of a Capybara testing session.

Which is not to say that the debugging process was enjoyable. Thankfully this problem could be traced at a single point in execution, and didn’t require stepping. But I still had to methodically evaluate each predicate in turn to see which one(s) were returning unexpected values. In case you weren’t keeping count, that’s eight different methods.

Evaluate episode.available_to_user?. Click through to the method definition. Evaluate free?. Evaluate user.current_subscriber?. Click through to the method definition. Rinse. Repeat.

I could have done this without a debugger, of course. I could have annotated the methods with good ol’ print statements instead. This is always a little tricky with query methods, because you have to make sure to preserve the method’s return value. One annotated method might have looked something like this:
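As a rough sketch of that kind of annotation (the `User` class, its constructor, and the `fully_authenticated?` and `subscription_open?` helpers are assumptions for the example, not the real application code):

```ruby
# Hypothetical user class; only current_subscriber? is named in the post.
class User
  def initialize(authenticated:, subscription_open:)
    @authenticated = authenticated
    @subscription_open = subscription_open
  end

  def fully_authenticated?
    @authenticated
  end

  def subscription_open?
    @subscription_open
  end

  # The unannotated form would be the one-liner:
  #   fully_authenticated? && subscription_open?
  def current_subscriber?
    authenticated = fully_authenticated?
    warn "current_subscriber?: fully_authenticated? => #{authenticated.inspect}"
    subscription_is_open = subscription_open?
    warn "current_subscriber?: subscription_open? => #{subscription_is_open.inspect}"
    result = authenticated && subscription_is_open
    warn "current_subscriber? => #{result.inspect}"
    result # return the computed value, not the result of the last warn call
  end
end

User.new(authenticated: false, subscription_open: true).current_subscriber?
```

Here `warn` writes to standard error, which keeps the trace out of normal output. Note that capturing each operand in a local variable also evaluates both sides unconditionally, so the annotated version no longer short-circuits the way the original `&&` expression would.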

This is a huge alteration to the method. If I had annotated seven more methods like this, I would have had a complete picture of why that boolean return value was false. Of course, I probably also would have had a screen full of output, since the method is probably called more than once in a single test, and I would have had to figure out which output corresponded to the point in the test I actually cared about…

Why does this hurt?

Ordinarily when I have a painful development experience, I reflect and come away with some conclusions about how I could have written the code better in order to avoid the problem. But here I was left scratching my head.

Because this is good code by most conventional measures. Every method is short and meaningfully named. Each deals only with one object’s limited view of the world, and respects the encapsulation of other objects. The methods have no need of extra documentation, because their meaning is self-evident. What constitutes a “current subscriber”? Why, a fully authenticated account and an open subscription, of course. It’s self-evident.

Looking at this code, I can’t imagine wanting to structure it any other way, at least not while still remaining within the broad object-oriented paradigm. This code feels right.

And yet, it hurts. When I needed an answer, it took a long time and a lot of steps to get it. And I did not enjoy the process.

Information discarding

The fundamental issue is that while this code makes it very easy to ask “is a user allowed to view this episode?”, when the answer comes back “no”, there is no way to ask “well, why not?”.

One of the basic OO principles is information hiding: “I’ll answer your question, but you don’t need to know how I arrived at the answer.” Outside of the #current_subscriber? predicate I don’t want to be distracted by the details of how that answer is calculated. From a static point of view, this program does a great job at information hiding.

But from a dynamic perspective, this program isn’t just hiding information. It’s shredding it, burning it, and scattering the ashes. At every step in the chain of predicate methods, knowledge about the provenance of information is ruthlessly discarded in favor of cold, clean boolean yes-or-no responses.

And the problem is, as a developer sometimes I do need that information. And not necessarily only as a developer. One way to help users help themselves is to give them clues as to why something isn’t working as expected.

A frustrated user is staring at a disabled video player. Could that user make use of a hint, indicating that the reason they can’t watch a video may be related to the fact that their email address was never verified? Quite possibly. But with the system as it stands now, surfacing that information to the user would be a major project. It would require manually piecing together the chain of boolean consequences from start to finish.
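To give a sense of what that manual piecing-together might involve, here is a hedged sketch. The `User` and `Episode` structs, their fields (apart from `confirmed_at`), the `unavailability_reasons` helper, and the message strings are all hypothetical; this is not a proposed fix, just an illustration of the special-purpose code in question.

```ruby
# Hypothetical stand-ins; the field names are assumptions, apart from confirmed_at.
User = Struct.new(:confirmed_at, :subscription_open, keyword_init: true)
Episode = Struct.new(:free, keyword_init: true)

# Re-derives, by hand, the reasons an episode is unavailable to a user.
# Every branch duplicates a rule that already lives inside some predicate
# method, so this helper goes stale whenever one of those predicates changes.
def unavailability_reasons(episode, user)
  return [] if episode.free

  reasons = []
  reasons << "email address was never confirmed" if user.confirmed_at.nil?
  reasons << "subscription is not open" unless user.subscription_open
  reasons
end

user = User.new(confirmed_at: nil, subscription_open: true)
unavailability_reasons(Episode.new(free: false), user)
# => ["email address was never confirmed"]
```

The helper answers the "why not?" question only by restating the whole decision a second time, outside the objects that own it.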

Externalities

This all reminds me a bit of the concept of “externalities” in economics. According to some theories, prices in a market are an elegant way of dealing with the problem of limited information. No one can have an omniscient view of the entire market. But if individual buyers and sellers each make bargains based on their own limited sphere of knowledge—poor rainfall affecting crop yields; a fad diet increasing demand for bacon—in the end each product’s price will reduce all of those individual bits of knowledge into one essential number.

The problem is that sometimes the production of a good imposes a cost on someone who is neither the buyer nor the seller. If manufacturing a widget has a side effect of pumping mercury into a nearby river, that might not affect the producer or the consumer in a meaningful way. But it has a real cost for the people living near the factory, one which may never be reflected in an increased price for widgets. These so-called “externalities” throw a wrench into the works of market-based economies.

I see the information lost along the steps of the logic chain in the code above as a kind of “boolean externality”. In theory, all the important information has been rolled up into one convenient boolean value. In reality, vital knowledge has been lost along the way.

Looking back, I realize that I’ve run into debugging problems like this one a lot over the years. And I’ve encountered similar situations where I discovered a need to expose why a decision was made to the user, but I didn’t have a clean way of tracing that information without building some very brittle special-purpose code.

This bothers me a lot, because as I said before, there is nothing wrong with this code as far as I can tell. It exhibits good OO style. Yet it falls short in making important information accessible. And I can’t imagine any changes short of a sweeping architectural shift—and a lot of extra code—which would change the situation.

Where to from here?

Is there a better way? I’m half assuming that readers from the functional programming camp have been quietly giggling to themselves for several paragraphs already. I suspect that this is one of those things that are well addressed by the data-centric viewpoint advocated by folks like Rich Hickey. I’d love to hear from someone with more practical FP experience than I have, demonstrating exactly how this problem would be addressed (and/or wouldn’t exist in the first place) in e.g. idiomatic Clojure.

I’m also mindful of the Functional Reactive Programming model, in which values are explicitly traced from source to destination. I don’t know for sure, but I imagine that an FRP environment might take a lot of the pain out of this code.

What do you think? Is this a pain you’ve felt as well? Do you have ideas about how to solve it? Is there something obvious I’m missing? Are there paradigms or technologies that better address the problem of boolean externalities? I’d love to hear your thoughts on this in the comments or on your own blogs.

EDIT: Kevin Rutherford replied with a thought-provoking post about replacing booleans with a state model.