Rake Part 2: File Lists

If you're a Ruby programmer, you've almost certainly used Rake, the build utility created by the late Jim Weirich. But you might not realize just how powerful and flexible a tool it can be. I certainly didn't, until I decided to use it as the basis for Quarto, my e-book production toolchain.

This post is part of a series on Rake, starting with the basics and then moving on to advanced usage. It originated as a series of RubyTapas videos published to subscribers in August-September 2013. Each post begins with a video, followed by the script for those who prefer reading to viewing.

My hope in publishing these episodes for free is that more people will come to know and love the full power of this ubiquitous but under-appreciated tool. If you are grateful for Rake, please consider donating to the Weirich Fund in Jim's memory.

In the last episode, we wrote this Rakefile. It automates building three Markdown files into HTML files.
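For readers following along without the video, here is a rough sketch of what such a Rakefile might look like. The chapter names ch1 through ch3 and the pandoc command are assumptions made for illustration; they are not spelled out in this script.

```ruby
require "rake" # already loaded when run via `rake`; lets the file load as plain Ruby too

task default: :html

# Hardcoded list of targets (assumed names).
task html: %w[ch1.html ch2.html ch3.html]

# Build any .html file from the .md file with the same base name.
# pandoc is an assumed converter; substitute whatever tool you use.
rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end
```

The hardcoded list on the html task is the part that has to be edited by hand every time a chapter is added.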

We really don’t want to have to edit this file every time we add a new file to process, though. Instead, we’d like to have the Rakefile automatically find files to be built.

To give us something to experiment with, I’ve set up a sample project directory. It contains four Markdown chapter files and one appendix file in a subdirectory, all of which should be built into HTML files. It also has some other stuff which we don’t want to build. There’s a ~ch1.md file which is some kind of temporary file left behind by an editor. And there’s a scratch directory, the contents of which should be ignored.

This project is under Git revision control. If we tell Git to list the files it knows about, we see a subset of the files from before. Notably missing is a file called temp.md, which has not been registered with Git and probably never will be. It too should be left out of the list of files to build.

In order to automatically discover just the files which should be built, we turn to Rake file lists. Let’s explore what file lists are, and what they are capable of.

To create a file list, we use the subscript operator on the Rake::FileList class, passing in a list of strings representing files.
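A minimal sketch of that call, with assumed file names:

```ruby
require "rake"

# File names are assumed; any list of path strings works here.
files = Rake::FileList["ch1.md", "ch2.md", "ch3.md"]

puts files
```

The result behaves much like an array of path strings, which is why it can be printed and iterated in the usual ways.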

So far this isn’t very exciting. But we’re just getting started. Rather than listing files individually, we can pass a FileList a shell glob pattern. Let’s give it the pattern *.md.
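To try the glob form without the sample project at hand, the sketch below runs it against a throwaway directory. The file names are assumed stand-ins for the top level of the sample project.

```ruby
require "rake"
require "tmpdir"
require "fileutils"

md_files = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Assumed stand-ins for the files at the top of the sample project.
    FileUtils.touch(%w[ch1.md ch2.md ch3.md ch4.markdown temp.md ~ch1.md])

    # The glob is expanded against the current directory when the list is read.
    Rake::FileList["*.md"].to_a
  end
end

puts md_files
```

Under these assumptions the list picks up temp.md and ~ch1.md, but not the .markdown chapter.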

Now we start to see the power of a FileList. But this isn’t quite the list of files we want. It contains some files we don’t care about, and it’s missing some files we do want.

We’ll address the missing files first. We add a *.markdown pattern to find files which use the long-form extension.

But we’re still missing the appendix file. To fix this, we change the glob patterns to match any level in the project directory tree.
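Here is a sketch of both patterns in their recursive form, again run against a throwaway directory. The layout, including the names subdir/appendix.md and scratch/test.md, is an assumption standing in for the sample project.

```ruby
require "rake"
require "tmpdir"
require "fileutils"

found = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Assumed layout standing in for the sample project.
    FileUtils.mkdir_p(%w[subdir scratch])
    FileUtils.touch(%w[
      ch1.md ch2.md ch3.md ch4.markdown temp.md ~ch1.md
      subdir/appendix.md scratch/test.md
    ])

    # "**/" matches zero or more directory levels, so top-level files still match.
    Rake::FileList["**/*.md", "**/*.markdown"].to_a
  end
end

puts found
```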

Now we’ve found all four chapters and the appendix, but we’ve picked up a lot of junk along the way. Let’s start winnowing down the list of files. For this, we’ll use exclusion patterns.

We start by ignoring files that begin with a ~ character.
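A minimal sketch of that exclusion. It uses an explicit list of assumed file names in place of the glob results, so it runs without the sample project on disk.

```ruby
require "rake"

# Explicit stand-in for what the glob patterns would find.
files = Rake::FileList["ch1.md", "~ch1.md", "temp.md", "subdir/appendix.md"]

# exclude accepts shell glob patterns; it updates the list and returns it.
files.exclude("~*")

puts files
```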

Next we’ll ignore files in the scratch directory. Just to demonstrate that it’s possible, we’ll use a regular expression for this exclusion instead of a shell glob.
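The regular expression form might look like the sketch below. The file names are assumed, and the exact pattern is one reasonable choice rather than the only one.

```ruby
require "rake"

# Explicit stand-in for what the glob patterns would find.
files = Rake::FileList["ch1.md", "scratch/test.md", "subdir/appendix.md"]

# Drop any path that starts with the scratch/ directory.
files.exclude(/^scratch\//)

puts files
```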

We’ve still got the file temp.md hanging around. As we saw before, this file isn’t registered with Git. We’d like to make an exclusion rule that says to ignore any non-Git-controlled file. To do this, we pass a block to .exclude. Inside, we put an incantation which will determine if Git is aware of the file.
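This script does not spell out the incantation. One plausible version asks git ls-files about each path and treats empty output as "not tracked". The helper name tracked_by_git? is hypothetical, as are the file names.

```ruby
require "rake"
require "open3"

# Hypothetical helper: true when `git ls-files` lists the path, meaning Git tracks it.
def tracked_by_git?(path)
  out, _err, status = Open3.capture3("git", "ls-files", "--", path)
  status.success? && !out.strip.empty?
rescue SystemCallError
  false # git could not be run at all
end

files = Rake::FileList["ch1.md", "temp.md"] # assumed names

# A file is dropped whenever the block returns true for it.
files.exclude { |f| !tracked_by_git?(f) }

puts files
```

Run outside a Git repository, this drops every file, since Git knows about none of them.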

This filters out the temp file, and finally we are left with the list of just the files we care about.

Next we update the code to make the FileList definition a little more self-contained. We change from the subscript shorthand to FileList.new, and pass a block to the constructor. The FileList will yield itself to this block, which means we can set up all of our exclusions inside the block.
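A sketch of the block form follows. To keep it runnable anywhere, it passes explicit, assumed file names where the real Rakefile would pass the two glob patterns, and it swaps the Git check for a hypothetical lookup table named untracked.

```ruby
require "rake"

# Stand-in for the Git check: paths we pretend Git does not track.
untracked = ["temp.md"]

source_files = Rake::FileList.new(
  "ch1.md", "ch2.md", "ch3.md", "ch4.markdown",
  "~ch1.md", "temp.md", "scratch/test.md", "subdir/appendix.md"
) do |fl|
  fl.exclude("~*")                          # editor temp files
  fl.exclude(/^scratch\//)                  # anything under scratch/
  fl.exclude { |f| untracked.include?(f) }  # stand-in for "Git doesn't know this file"
end

puts source_files
```

Everything that defines the list now lives in one expression, which makes it easy to assign to a constant at the top of a Rakefile.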

We need to make one more change to our list of files before we can return to our Rakefile. In the Rakefile what we needed was a list of the files to be built, not the source files that correspond to them. To convert our list of input files to a list of output files, we use the #ext method. We give it a .html file extension, and it returns a new list of files with all of the original Markdown extensions replaced with .html.
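A short sketch of the conversion, using assumed file names:

```ruby
require "rake"

sources = Rake::FileList["ch1.md", "ch4.markdown", "subdir/appendix.md"]

# ext returns a new list; the original list of sources is left untouched.
targets = sources.ext(".html")

puts targets
```

Both .md and .markdown are replaced, and directory prefixes are preserved.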

Now we’re ready to come back to our Rakefile. We replace our hardcoded list of target files with the FileList we just built.
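The wiring might look like the sketch below. The variable name source_files and the file names in it are assumptions; in the real Rakefile the list would come from the glob patterns and exclusions rather than explicit names.

```ruby
require "rake" # already loaded when run via `rake`

# Stand-in for the filtered list of source files.
source_files = Rake::FileList["ch1.md", "ch4.markdown", "subdir/appendix.md"]

task default: :html

# The targets are derived from the sources instead of being typed out by hand.
task html: source_files.ext(".html")
```

Because the prerequisites are computed from the source list, adding a new chapter file no longer requires touching the Rakefile.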

Since we are now supporting Markdown files with either a .md or .markdown extension, we have to make one more change to tell Rake it can build an HTML file for either one. For now, we’ll do this by simply duplicating the rule. In the future we’ll look at a way to avoid this duplication.
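A sketch of the duplicated rule. As before, pandoc is an assumed converter and not something this script specifies.

```ruby
require "rake" # already loaded when run via `rake`

# Build foo.html from foo.md ...
rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

# ... or from foo.markdown. The body is identical on purpose.
rule ".html" => ".markdown" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end
```

For a given target, Rake tries the rules in order and uses the first one whose source file actually exists.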

When we run rake, we can see that it builds all the right HTML files.

I think that’s enough Rake for today. Happy hacking!

I hope you've enjoyed this episode/article on Rake. If you've learned something today, please consider "paying it forward" by donating to the Weirich Fund, to help carry on Jim's legacy of educating programmers. If you want to see more videos like this one, check out RubyTapas. If you want to learn more about Rake, check out my book-in-progress The Rake Field Manual.

P.S. Have you used Rake in a particularly interesting way? I want to hear from you.