Rake Part 1: Files and Rules


We’re going to spend some time looking at Rake over the next few episodes. I hope you don’t mind.

Chances are you’ve used Rake at some point. If nothing else, you’ve probably run various Rake tasks associated with Rails projects. Perhaps you’ve written some Rakefiles of your own.

Chances are, though, that you’ve barely scratched the surface of Rake’s capabilities. That was certainly true of me until a few weeks ago. I’d written my share of Rakefiles and task files, sure, but I’d never really dug deeply into all that Rake can do. Now that I’ve spent some time really learning Rake, I’ve realized that it’s a tool of extraordinary power. I’d like to share some of what I’ve learned with you.

We’re going to start, though, with a review of Rake basics.

Let’s say we have a directory full of Markdown files we want to convert to HTML using the Pandoc tool. We could write a simple script to iterate over the files and convert them one by one.

%W[ch1.md ch2.md ch3.md].each do |md_file|
  html_file = File.basename(md_file, ".md") + ".html"
  system("pandoc -o #{html_file} #{md_file}")
end

But this script is going to remake every single one of the HTML files every time we run it, even if the source files haven’t changed. If the Markdown files are very large, this could mean a long wait.

Instead, let’s make a Rakefile and write a Rake task to generate the HTML. It starts similarly, by iterating over a list of input files and determining the corresponding HTML file. But then it starts to differ. We use Rake’s file method to declare that the html_file has a dependency on the Markdown file. Then, inside the block, we tell Rake how to get an HTML file from a Markdown file, using a shell command.

What we’ve written here is a rule, or actually three rules, each one telling Rake how to build a particular HTML file from a Markdown source file.

%W[ch1.md ch2.md ch3.md].each do |md_file|
  html_file = File.basename(md_file, ".md") + ".html"
  file html_file => md_file do
    sh "pandoc -o #{html_file} #{md_file}"
  end
end

This by itself is already a usable Rakefile. On the command line, we can tell rake to build one of the HTML files and it will oblige us. We can already see an advantage over our script: Rake shows us the command that it is executing.

$ rake ch1.html
pandoc -o ch1.html ch1.md

If we tell Rake to build the same file again, nothing happens. This is because Rake checks file modification times to see if the Markdown file has changed since the HTML file was created. Since it hasn’t, Rake knows that the HTML file doesn’t need to be rebuilt.

$ rake ch1.html
$
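
To make the timestamp check concrete, here is a minimal sketch of the kind of comparison involved. It is not Rake's actual code: needs_rebuild? is a hypothetical helper invented for illustration, and it assumes a single source file per target.

```ruby
require "tmpdir"

# Hypothetical helper, not part of Rake's API: a target needs rebuilding
# when it is missing or when its source was modified more recently.
def needs_rebuild?(target, source)
  return true unless File.exist?(target)

  File.mtime(source) > File.mtime(target)
end

# Try it in a scratch directory so no real files are touched.
Dir.mktmpdir do |dir|
  md   = File.join(dir, "ch1.md")
  html = File.join(dir, "ch1.html")

  File.write(md, "# Chapter 1\n")
  puts needs_rebuild?(html, md) # no HTML file yet

  File.write(html, "<h1>Chapter 1</h1>\n")
  File.utime(Time.now, Time.now - 60, md) # pretend the source is a minute older
  puts needs_rebuild?(html, md)

  File.utime(Time.now, Time.now + 60, md) # pretend the source was just edited
  puts needs_rebuild?(html, md)
end
```

The three calls cover the cases from the text: a missing target, an up-to-date target, and a target whose source has changed since it was built.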

If we then modify the file and run Rake again, it once again builds the HTML file.

$ rake ch1.html
pandoc -o ch1.html ch1.md

It’s nice that Rake is tracking when files need to be rebuilt. But specifying which file we want to be built is tedious. We’d prefer to simply have Rake rebuild any HTML files that are out of date.

To make that happen, we add a task to our Rakefile. We name it “html”, and give it a dependency on our three HTML files.

task :html => %W[ch1.html ch2.html ch3.html]

%W[ch1.md ch2.md ch3.md].each do |md_file|
  html_file = File.basename(md_file, ".md") + ".html"
  file html_file => md_file do
    sh "pandoc -o #{html_file} #{md_file}"
  end
end

This task has no code of its own. But when we tell Rake to build the “html” task, it follows the dependency to the HTML files. It knows how to build those files because of the rules we already wrote, so it proceeds to build them.

$ rake html
pandoc -o ch1.html ch1.md
pandoc -o ch2.html ch2.md
pandoc -o ch3.html ch3.md
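
One way to picture "following the dependency" is as a walk over a small graph of names. The sketch below is a toy model only, not how Rake is implemented: the prereqs hash and the visit_order helper are made up for illustration.

```ruby
# Toy model, not Rake's implementation: each name maps to its prerequisites.
prereqs = {
  "html"     => %w[ch1.html ch2.html ch3.html],
  "ch1.html" => %w[ch1.md],
  "ch2.html" => %w[ch2.md],
  "ch3.html" => %w[ch3.md]
}

# Hypothetical helper: list everything `name` depends on, prerequisites
# first, followed by `name` itself.
def visit_order(name, prereqs, visited = [])
  prereqs.fetch(name, []).each { |dep| visit_order(dep, prereqs, visited) }
  visited << name unless visited.include?(name)
  visited
end

puts visit_order("html", prereqs)
```

Each Markdown file is listed before the HTML file that depends on it, and the html task comes last, since it only has work to do once its prerequisites have been dealt with.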

If we then edit one of the Markdown files and re-run the Rake task, we can see that Rake only rebuilds the one that was updated.

$ rake html
pandoc -o ch2.html ch2.md

If we’re going to be running this command a lot, we can make it even more convenient by declaring a :default task with a dependency on our html task.

task :default => :html
task :html => %W[ch1.html ch2.html ch3.html]

%W[ch1.md ch2.md ch3.md].each do |md_file|
  html_file = File.basename(md_file, ".md") + ".html"
  file html_file => md_file do
    sh "pandoc -o #{html_file} #{md_file}"
  end
end

This allows us to rebuild our files by simply running rake with no arguments.

$ rm *.html
$ rake
pandoc -o ch1.html ch1.md
pandoc -o ch2.html ch2.md
pandoc -o ch3.html ch3.md

So far we’ve seen how to declare file rules and tasks. Now let’s learn how to write generic rules.

Our three file rules all share a common pattern: converting a “.md” file to a “.html” file. In fact, this pattern is so repetitive that we automated the generation of the rules using an each loop. Instead of writing an explicit loop, let’s teach Rake how to convert “.md” files to “.html” files, and let it work the rest out for itself.

We do this by declaring a rule whose name is the file extension .html. This rule’s dependency is on the file extension .md. We then open a block. This block will accept a block argument we’ll call t. We call it t because it will be bound to a Rake Task object.

Inside the block, we use the sh command to run a shell command. It starts out with the pandoc command as before. But for the output filename, we interpolate in the task’s name attribute. And for the input file, we use the task’s source attribute.

task :default => :html
task :html => %W[ch1.html ch2.html ch3.html]

rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

That’s it. When we remove our HTML files and run Rake again, we can see that it regenerates them as before.

$ rm *.html
$ rake
pandoc -o ch1.html ch1.md
pandoc -o ch2.html ch2.md
pandoc -o ch3.html ch3.md

So what just happened here? Since we specified no arguments, Rake executed the :default task, which has a dependency on the :html task. The :html task, in turn, depends on three .html files. Rake started with the first one, ch1.html, and looked to see if it existed. It found that it didn’t. So Rake then tried to find a way to build the file.

First it looked for any rules explicitly named ch1.html, but we removed all of those.

What it did find was our new rule. It saw that using the rule it could generate a .html file from a corresponding .md file. Applying the rule to ch1.html, it found that the corresponding file, ch1.md, existed. This meant that the rule was a match, so it went ahead and executed it. It then repeated the whole process for the remaining missing .html files.
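
The matching step described above can be sketched in plain Ruby. This is only an approximation of the idea, not Rake's real lookup code: rule_source is a hypothetical helper, and it handles just the simple "swap one extension for another" case.

```ruby
require "tmpdir"

# Hypothetical helper, not Rake's API: mimic how an extension rule such as
# ".html" => ".md" might pick a source file for a target.
def rule_source(target, target_ext, source_ext)
  return nil unless File.extname(target) == target_ext

  candidate = target.delete_suffix(target_ext) + source_ext
  File.exist?(candidate) ? candidate : nil
end

# Try it in a scratch directory that contains only ch1.md.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "ch1.md"), "# Chapter 1\n")

  p rule_source(File.join(dir, "ch1.html"), ".html", ".md")  # ch1.md exists: match
  p rule_source(File.join(dir, "ch9.html"), ".html", ".md")  # no ch9.md: no match
  p rule_source(File.join(dir, "notes.txt"), ".html", ".md") # wrong extension
end
```

Only the first call returns a path. The other two return nil, which corresponds to the rule not applying to that target.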

There is so much more I want to show you about Rake, but RubyTapas is all about one idea at a time, so I’ll save it for future episodes. Stay tuned, and happy hacking!

(This post is also available in Japanese at Nilquebe Blog.)

