Some quick thoughts on input validation

The Hanami project is looking at upgrading and/or replacing their input validation system. I had some thoughts on this topic, but I don’t want to excessively clutter up their thread with discussion which may be out of scope.

The present system is vaguely Rails-ish, in that the API uses keyword arguments to specify constraints:

This has some obvious shortcomings, the most obvious being the lack of explicit ordering of validations.

They are considering changing to a system either built on top of, or at least inspired by, dry-validation:

This API has an appealing elegance. It’s as succinct as the keyword version, while offering explicit ordering. It’s also built on an attractively functional model of stateless predicates and combinators. This reminds me of a number of DSL ideas I’ve either built or noodled around with over the years.

Affordances

But there are also some questions that come into my mind when I see a DSL like this:

  • What is str??
  • Where does it come from? Will I find method definitions which correspond to these predicates?
  • Can I use predicate methods defined on the value itself?
  • …how, since there’s no obvious way to reference the value being validated?
  • What is the value of self inside the validation block?
  • How do I add new predicates?

Please do not go looking for the answers to these questions. And please do not write a comment about how that’s “all found in the documentation”. Because that’s not the point.

Here’s a Hanami view module, straight from the documentation:

Quick:

  • Where would I add another view, say, an “archive” page?
  • How would I add a new partial?
  • What is the value of self in the view code?
  • How could I define a set of views that are shared by multiple similar models, and “include” them into each model’s set of views?
  • How could I define a view that builds on an existing view and specializes it in a few ways?

If you know Ruby, the answers to all of these questions are instantly self-evident. You don’t need to learn a new way to define methods, inherit behavior, or compose functionality from shared modules. You can call upon all of your existing knowledge of the language. This code has affordances.
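To make those answers concrete, here is a minimal sketch in plain Ruby with no framework loaded. The names SharedViews, Articles, Index, Archive, page_title, and action_name are all hypothetical, invented for illustration; they are not taken from the Hanami documentation.

```ruby
# Hypothetical helpers shared by the views of several similar models.
module SharedViews
  def page_title
    # self is simply the view instance, so ordinary method calls work.
    "#{self.class.name.split('::').first} - #{action_name}"
  end
end

# Hypothetical view module: each view is an ordinary class.
module Articles
  class Index
    include SharedViews # sharing behavior is a plain include

    def action_name
      "index"
    end

    def render
      "<h1>#{page_title}</h1>"
    end
  end

  # Adding an "archive" page means adding another class.
  # Building on an existing view is plain inheritance.
  class Archive < Index
    def action_name
      "archive"
    end
  end
end
```

Nothing here needs explaining to someone who already knows how Ruby classes, modules, and inheritance work.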

This code, on the other hand, has few affordances:

Indeed, your intuition may actively mislead you in this case. For instance, those predicates that look like they must be defined as methods somewhere? In fact they are defined like this:

Consider the following totally made-up hand-wavey suggestion of a validation API:

This bears a strong resemblance to Test::Unit or MiniTest code, which is not a coincidence. I’m not saying that this is remotely the right API. But think about the following questions about it:

  • Are those predicate methods likely defined as real methods somewhere?
  • Where are they likely to be defined?
  • How do you think you would create a variant of the SignupValidation which builds on this one and adds a constraint that the name contain only ASCII characters?
  • How would you go about sharing common constraints or predicates between a few different validation classes?
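A minimal sketch of how those questions might answer themselves in a Test::Unit-style design. Everything here is an assumption made up for illustration: the Validation base class, the validate_ naming convention, the EmailConstraints module, and the filled? predicate do not come from any real library.

```ruby
# Hypothetical base class: every public method named validate_* is a check
# that returns an error message, or nil when the input passes.
class Validation
  def initialize(input)
    @input = input
  end

  # Builds a fresh list of messages on each call; no state is kept.
  # Checks run in alphabetical order of their method names.
  def errors
    checks = self.class.public_instance_methods.grep(/\Avalidate_/).sort
    checks.map { |check| public_send(check) }.compact
  end

  def valid?
    errors.empty?
  end

  private

  attr_reader :input

  # A predicate is an ordinary private method.
  def filled?(value)
    !value.to_s.strip.empty?
  end
end

# Constraints shared between validation classes live in a module.
module EmailConstraints
  def validate_email
    "email must contain @" unless input[:email].to_s.include?("@")
  end
end

class SignupValidation < Validation
  include EmailConstraints

  def validate_name
    "name must be filled" unless filled?(input[:name])
  end
end

# A variant that adds an ASCII-only constraint is an ordinary subclass.
class AsciiSignupValidation < SignupValidation
  def validate_name_ascii
    "name must be ASCII" unless input[:name].to_s.ascii_only?
  end
end
```

Calling SignupValidation.new({ name: "", email: "nope" }).errors returns both messages, and the subclass inherits every check from its parent without any special machinery.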

But Avdi! It’s soooo verbose! Well, yes, this represents sort of the opposite extreme of the DSL version. I can imagine some potential middle-ground compromises or syntax sugars, and you probably can as well.

It’s your problem

One of my questions about the validation code was: What is the value of self inside the validation block?

Here’s an answer I received:

AFAIK self inside the schema definition block is actually not all that important or useful. The definition is intended to be completely abstract, keeping the user focused on the predicate logic itself.

Seems reasonable, right? I’ve almost certainly given someone the same answer about similar code in the past.

Let me make some predictions about that “unimportant” self value, based on 15 years of experience in Ruby projects:

  1. Users will try to write “Ruby logic” inside those blocks, in combination with the “approved” predicate logic.
  2. Users will get themselves confused in this way, and in particular they will be taken by surprise when self is not what they expected it to be.
  3. They will clutter the project forums with questions, bug reports, and sometimes complaints when they get confused.
  4. Project maintainers will become annoyed, and will brusquely advise users to “RTFM”.
  5. No matter how many times this happens, like the restaurant owner who points angrily at the “watch your step” sign every single time a patron trips over the four-inch lip between rooms, the project owners will react by shaking their heads about why people can’t just read the damn documentation.

I’ve watched this play out over and over again. As often as not I’ve been the one shaking my head at the “dumb users”.

Here’s a hard truth of usability: you don’t get to decide what “makes sense”. In the context of Ruby, client programmers will always try to write “regular old Ruby code” inside “special” contexts. Sometimes they will even have good reasons. And if their intuitive grasp of what’s possible or sanctioned comes up wrong over and over again, it’s not their problem. It’s yours.
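The surprise in predictions 1 and 2 can be sketched in a few lines. This assumes a DSL that evaluates its block with instance_eval, which is one common way such blocks are implemented. SchemaBuilder, schema, key, and SignupForm are hypothetical names and do not describe how any particular library works.

```ruby
# Hypothetical DSL receiver that collects the keys it is told about.
class SchemaBuilder
  attr_reader :rules

  def initialize
    @rules = []
  end

  def key(name)
    @rules << name
  end
end

# Hypothetical DSL entry point.
def schema(&block)
  builder = SchemaBuilder.new
  builder.instance_eval(&block) # inside the block, self is now the builder
  builder
end

class SignupForm
  def required_fields
    [:name, :email]
  end

  # Looks like regular old Ruby, but raises NameError: inside the block
  # self is a SchemaBuilder, which has no required_fields method.
  def build_schema
    schema do
      required_fields.each { |field| key(field) }
    end
  end

  # Works, because the method is called while self is still the form
  # and the block only closes over a local variable.
  def build_schema_with_local
    fields = required_fields
    schema { fields.each { |field| key(field) } }
  end
end
```

The two methods differ by one line, and nothing at the call site hints at which one will fail.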

Validation is a Process

I would love to dive deep into the implementation of an intuitive, affordance-rich validation system. Unfortunately, I just don’t have the time right now. For anyone who is interested, though, here are some general design notes.

  • Any project having to do with user input validation should begin with a careful reading (maybe two or three readings) of Ward Cunningham’s CHECKS Pattern Language of Information Integrity. This tragically under-read paper is one of the most useful explorations of the validation problem space I’m aware of. It won’t give you a blueprint for building a validation system. But it will acquaint you with some of the less-obvious considerations that you should take into account.
  • One of the things you’ll realize after reading that paper is that a lot of the discussions of validation in the Ruby world are built on a false premise, that of primitive obsession. If input is not being consumed in the form of Whole Values and Exceptional Values, we lose many of the benefits of object-orientation right from the get-go.
  • Another of the non-obvious points the CHECKS paper draws out is that validation is a process. It may involve several back-and-forths with the user. There may be more than one “draft state” before input is fully validated. Some validations may require some time-consuming calculation or database queries, and these validations should not interrupt or hold up the user input process.

Let me expand on that last point. Consider the following features we might want to add to a mature and evolving web application:

  • The ability to suggest alternate usernames in response to one that is already taken.
  • A system that notes when a user appears to be struggling with a form, and offers to put them in touch with a support agent.
  • A slow, failure-prone check against an external address-validation service.
  • A memory of the first few, invalid attempts a user made at a field, so we can better understand why our users get confused and/or give up.

Most validation systems I’ve seen suffer from what I’ve come to think of as the transactional fallacy. They represent what is truly a process as a theoretically instantaneous transaction. One that requires no conversation, no back-and-forth, no history. A schema function that is applied to data and presents a result.

EDIT: You can see the detritus of the transactional fallacy even in the most vanilla Rails code. You see it in the oft-lamented fact that the model.valid? predicate is actually a data-mutating command. You can see it in the fact that models tote around their .errors attribute, the abandoned state of an implicit validation process that pretends to be a “stateless” transaction.

As a result of the transactional fallacy, as soon as it comes time to add some of these process-oriented features to an application, you’re on your own. In fact, it may feel like the framework actively fights you.

I’m not saying you can’t add these features. I’m saying the transactional validation system will give you no help; no suggestion as to where you might extend it in order to move in the direction you need. You’ll wind up building something completely new and probably a bit kludgey outside of what’s available, rather than on top of it.

A validation process that may accumulate history over time? That may need to request clarification? That may defer some constraint checking asynchronously until later? Validations which represent important knowledge about how your business is transacted, and therefore deserve to be represented as parts of your domain model in their own right? Possibly even including (gasp) persistent state of their own? These are all concepts which are entirely outside the language of most validation approaches.

I’m not saying that any of these features should be available in the box. But it should be intuitively obvious how to compose, extend, and build upon validations in exactly the same way we compose, extend, and build upon any other objects in our application.
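As a rough illustration of validation modeled as a process rather than a transaction, here is a minimal sketch of an ordinary object that remembers earlier attempts, suggests alternate usernames, and notices a struggling user. UsernameValidation, Attempt, and every method name are hypothetical, and the in-memory list of taken names stands in for whatever lookup a real application would use.

```ruby
# Hypothetical: one user's ongoing attempt to choose a username.
class UsernameValidation
  Attempt = Struct.new(:value, :problems)

  attr_reader :attempts

  def initialize(taken_names)
    @taken_names = taken_names
    @attempts    = []
  end

  # Records the attempt and returns it; earlier attempts stay as history.
  def submit(value)
    attempt = Attempt.new(value, problems_with(value))
    @attempts << attempt
    attempt
  end

  def accepted?
    !attempts.empty? && attempts.last.problems.empty?
  end

  # Offers free alternatives when the latest attempt was already taken.
  def suggestions
    last = attempts.last
    return [] unless last && last.problems.include?("is taken")

    (1..3).map { |n| "#{last.value}#{n}" }
          .reject { |name| @taken_names.include?(name) }
  end

  # Three or more failed attempts might justify offering a support agent.
  def struggling?
    attempts.count { |attempt| attempt.problems.any? } >= 3
  end

  private

  def problems_with(value)
    problems = []
    problems << "is blank" if value.strip.empty?
    problems << "is taken" if @taken_names.include?(value)
    problems
  end
end
```

Because this is a plain object, extending it with a subclass, a mixin, or persistence works the same way as it would for any other part of the domain model.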