Rake Part 7: MultiTask

If you're a Ruby programmer, you've almost certainly used Rake, the build utility created by the late Jim Weirich. But you might not realize just how powerful and flexible a tool it can be. I certainly didn't, until I decided to use it as the basis for Quarto, my e-book production toolchain.

This post is part of a series on Rake, starting with the basics and then moving on to advanced usage. It originated as a series of RubyTapas videos published to subscribers in August-September 2013. Each post begins with a video, followed by the script for those who prefer reading to viewing.

My hope in publishing these episodes for free is that more people will come to know and love the full power of this ubiquitous but under-appreciated tool. If you are grateful for Rake, please consider donating to the Weirich Fund in Jim's memory.

This miniseries on Rake is winding its way to a close. I hope that over the course of the last several videos you’ve come to appreciate the power of Rake as I have. But perhaps you’re still skeptical about the benefit of using Rake over plain old Ruby or shell scripts. If so, I think today might just change that impression. I want to show you an amazingly powerful capability that Rake gives us more or less for free.

Let’s say we’re putting an ebook together. We have a directory full of several hundred code listings which we’ve stripped out of the text in preparation to be turned into syntax-highlighted HTML using the pygmentize utility.

Here’s a little Rakefile to take care of this task. It defines a list of listings, a list of “highlights”, which are the HTML end products, and a task to produce all of them called highlight. Finally, it defines a rule to produce a .html file from a listing file by running pygmentize on it. We’ve also defined the default task to depend on the highlight task.
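The listing shown in the video isn't reproduced in this script, so here is a minimal sketch of a Rakefile along the lines just described. The listings directory name, the scheme of appending .html to each listing's full name, the SOURCE_FOR helper, and the exact pygmentize flags are all assumptions made for illustration; the original may well differ in these details.

```ruby
# Harmless inside a Rakefile; it also lets this file load as a plain Ruby script.
require "rake"

# Assumed layout: raw listings live in listings/ (hypothetical directory name).
# Previously generated .html files are excluded so they are not highlighted again.
LISTINGS = FileList["listings/*"].exclude(/\.html\z/)

# Each highlight is the listing's path with ".html" appended,
# e.g. listings/hello.rb -> listings/hello.rb.html
HIGHLIGHTS = LISTINGS.pathmap("%p.html")

task default: :highlight

# Building :highlight means building every highlighted file.
task highlight: HIGHLIGHTS

# Hypothetical helper: maps a target name back to its source listing
# by stripping the trailing ".html".
SOURCE_FOR = ->(html_name) { html_name.sub(/\.html\z/, "") }

# Rule: any .html file can be produced from its source listing.
rule ".html" => SOURCE_FOR do |t|
  sh "pygmentize", "-f", "html", "-o", t.name, t.source
end
```

Because the products are file tasks generated by a rule, Rake only rebuilds a highlight when its listing is newer than the existing .html file.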

Highlighting source code with pygmentize takes time. When we have a lot of source files, it takes a lot of time. If we run rake under the time command, it tells us that the process takes about 48 seconds.

Currently these highlighted files are being built one at a time. But this is 2013, and I have a computer with two physical cores and, through hyperthreading, four virtual cores. Why can’t we build more than one file at a time?

As it turns out, we can. And all we have to do is change one line of the Rakefile, from task to multitask.

This tells Rake that it can process the prerequisites of the :highlight task in parallel. Note that we make this change to the task which depends on the tasks we want to be parallelized, not to the parallelizable tasks themselves.
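To see the effect in isolation, here is a small sketch that needs no pygmentize at all. The unit_a through unit_d task names are hypothetical stand-ins for the per-file highlighting work, and each one simply sleeps to simulate a slow job. Invoking the task directly from Ruby, rather than from the rake command line, is also just a convenience for the demo.

```ruby
require "rake"

# Thread-safe record of finished work, since tasks may run on different threads.
FINISHED = Queue.new

# Hypothetical stand-ins for the per-file highlighting tasks.
UNITS = %w[unit_a unit_b unit_c unit_d]

UNITS.each do |name|
  task name do
    sleep 0.2          # simulate slow work
    FINISHED << name
  end
end

# The only difference from a plain `task`: prerequisites may run concurrently.
multitask highlight: UNITS

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
Rake::Task[:highlight].invoke
ELAPSED = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("ran %d units in %.2fs", FINISHED.size, ELAPSED)
```

With a plain task, the four sleeps would add up to roughly 0.8 seconds. With multitask, the total should land much closer to a single sleep, though the exact figure depends on the machine.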

We run rake again. We see some rather messy output, as Rake runs a few hundred pygmentize commands in parallel and they all talk to the same STDOUT.

A little over 25 seconds later, the build is done. With this one change, we’ve cut the processing time nearly in half!

If we want to fine-tune how many tasks are run in parallel, we can use the -j option to Rake to tell it the maximum number of tasks to run at once. I’ll specify 4, one for each virtual core.

Interestingly, this actually takes a little bit longer. I’m not sure why.

Earlier I said that all it takes is a one-line change to the code to parallelize execution, but that was a bit of a fib. In truth, we can tell Rake to run tasks in parallel with no changes to the code whatsoever. Let’s change the multitask back to a task. Then we’ll run rake with the -m option. This tells Rake to treat every task as if it is a multitask.

Again, we see distorted output. And when the dust settles, we once again see a total time of a little over 25 seconds.
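For the curious, the effect of -m can be approximated from plain Ruby. This sketch assumes that the flag works by setting always_multitask on Rake.application.options, which is an internal detail that may differ between Rake versions; on the command line, rake -m remains the supported route. The step and build task names are hypothetical.

```ruby
require "rake"

# Assumption: this is the option that `rake -m` switches on internally.
Rake.application.options.always_multitask = true

RAN = Queue.new  # thread-safe record of completed steps
STEPS = %w[step_one step_two step_three]  # hypothetical task names

STEPS.each do |name|
  task name do
    sleep 0.1
    RAN << name
  end
end

# A plain `task`, not a `multitask`; its prerequisites may still run
# concurrently because of the option set above.
task build: STEPS

Rake::Task[:build].invoke
```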

Rake’s parallelization is smart, too: if other tasks were dependent on the :highlight task, it would still wait until all the pygmentize processes finished before moving on to the next phase.
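Here is a small sketch of that ordering guarantee. The page tasks and the package task are hypothetical names; the pages finish at staggered times, and the downstream task records itself only after all of them are done.

```ruby
require "rake"

EVENTS = Queue.new  # thread-safe log of what ran, in completion order
PAGES = %w[page_one page_two page_three]  # hypothetical prerequisite names

PAGES.each_with_index do |name, i|
  task name do
    sleep 0.05 * (i + 1)  # make the prerequisites finish at different times
    EVENTS << name
  end
end

# Prerequisites of :highlight may run in parallel.
multitask highlight: PAGES

# A downstream task that needs every highlight to be finished first.
task package: :highlight do
  EVENTS << "package"
end

Rake::Task[:package].invoke

# Drain the log into an array for inspection.
ORDER = []
ORDER << EVENTS.pop until EVENTS.empty?
puts ORDER.inspect
```

The order of the three pages may vary from run to run, but "package" should always come last, because Rake waits for every prerequisite of the multitask before running anything that depends on it.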

So what do we gain from automating our builds with Rake? Not just an easy way to declare complex dependencies and rules for accomplishing tasks. Not just a set of convenience methods for file operations. Not just a handy command-line front-end. In addition to all that, we get parallelization of repetitive tasks for free. And that’s what I call happy hacking!

(This post is also available in Japanese from Nilquebe Blog.)

I hope you've enjoyed this episode/article on Rake. If you've learned something today, please consider "paying it forward" by donating to the Weirich Fund, to help carry on Jim's legacy of educating programmers. If you want to see more videos like this one, check out RubyTapas. If you want to learn more about Rake, check out my book-in-progress The Rake Field Manual.

P.S. Have you used Rake in a particularly interesting way? I want to hear from you.