Why does Amazon hate ebook authors?

In theory, Amazon has been a boon to ebook authors. They created the world’s first widely-accepted dedicated ebook reader hardware. Their Kindle Direct Publishing program makes it easy for self-published authors to get their stuff featured next to works from big-name publishers.

But on the technical side, Amazon’s exclusionary strategies and policies would be more at home in the 1990s. Some of their tactics would make a Windows 98-era Microsoft executive proud.

When Amazon first created the Kindle, they bought MobiPocket and chose the .mobi format for ebook interchange. The .mobi format was basically a very rudimentary subset of HTML packed inside a PRC file in a highly compressed form, suitable for installation on your PalmPilot device.

This decision made a lot of technical and business sense. MobiPocket had an established ebook ecosystem, probably the biggest of its time. And early Kindle devices had a lot in common with old Palm PDAs: a greyscale screen, coupled with limited processing power.

It also made things difficult for anyone writing technical ebooks. Early Kindle devices either couldn’t display monospaced text at all, or would display it in oddly unpredictable font sizes. There was no provision for horizontally scrolling a source-code listing that exceeded the page width. And forget any kind of source-code highlighting.

Many publishers of programming books resorted to publishing Kindle ebooks in which all the source code listings had been replaced with images in an attempt to match the formatting quality of the print books. This came with its own set of problems, but it was arguably the best of a very limited set of bad options.

Meanwhile, everyone involved in ebook publishing other than Amazon was working together to create a next-generation ebook standard which would give authors and publishers more formatting leeway (among many other improvements). EPUB2 was standardized in 2007. It was built on web standards like XHTML and a subset of CSS, and it gave authors far more flexibility to style their books. With EPUB 2, it was entirely possible to publish nicely formatted code listings in a programming ebook without resorting to images.

EPUB 3 became a standard in 2011. EPUB 3 was intended, among other things, to address limitations in EPUB 2 for non-textual content like comic books. This standard has been criticized by some for overreaching, particularly when it comes to its support for multimedia elements. However, the core of EPUB 3 is a solid step up from EPUB 2. It is built on HTML5, and makes good use of that format’s more semantic elements. It also incorporates support for CSS3, including print styles and media queries. This makes it much easier to build a single master stylesheet for an ebook which supports display on various devices and also styles the book for printing using a tool like PrinceXML. It also finally makes it possible to have things like auto-numbered chapters without support from a pre-processor of some kind.

If you buy an ebook from anyone other than Amazon, you’re buying an EPUB file. Apple and Google, in particular, have aggressively supported EPUB 3 in their iBooks and Google Play stores. For an ebook author, iBooks and the Google Play reader are a dream come true: I can write the book, style it as I please with modern CSS, and be confident that readers will see the text as I intended.

As Amazon evolved their Kindle technology and, in particular, rolled out the Kindle Fire series of tablets, they too saw the need for an updated format which would make full use of the extended capabilities of these new devices. Many of us in the ebook publishing industry hoped that they would adopt EPUB 3.

Instead, Amazon created something they call “Kindle Format 8”, or KF8. Technically, KF8 is a hybrid: it’s a valid .mobi ebook file, with all of the limitations of that format. But attached to it, in such a way that it will be ignored by legacy Mobi devices, is a file which is almost, but not quite, EPUB 3.

It’s understandable why they went this route. They wanted a single file format that would work on both older Kindles and newer ones. And if they had simply defined it as “.mobi with an EPUB3 attachment”, this would not have been a bad approach.

Unfortunately, that’s not what they did. Even though it is obvious from their documentation that this is more or less what their solution consists of, at no point do they actually say they support EPUB 3. And for good reason, because what they actually support in the “new hotness” section of a KF8 file is an undocumented subset of EPUB3.

There’s a little informal documentation, to be sure. But there is no formal specification for the format. And there are numerous “gotchas” that ebook authors must discover solely by trial and error.

An example of a “gotcha”: EPUB 3 specifies that readers may support any number of embedded font formats, but they must support OpenType and WOFF. KF8, as it turns out, supports TrueType and OpenType. So when generating a selection of ebook formats, an author must ensure that all embedded fonts are in OpenType, the only format on both lists and therefore the only one guaranteed to be supported by all readers.

This is one of the more basic incompatibilities an author is likely to stumble across. More vexing are the seemingly endless little differences in the CSS support between EPUB3 and the KF8 subset.

Amazon provides a Kindle emulator which attempts to preview files as they will appear on various Kindle devices. But in my experience it’s not terribly accurate, especially when it comes to formatting code listings. When trying to get formatting consistent across multiple Kindle and non-Kindle devices, the conscientious ebook author is usually reduced to painstakingly loading the file onto an array of devices, over and over again, until they stumble across markup and styling which is acceptable to all of them.

In order to help authors generate valid KF8, Amazon also provides a tool called kindlegen. You feed in an EPUB3 file, and out comes a KF8 file. Along the way it outputs various warnings or errors if it comes across elements of the EPUB file that aren’t supported by KF8. kindlegen is the only way to generate an Amazon-approved KF8 file locally.

This is nice, but unfortunately the tool is closed-source and encumbered by restrictive terms of service. In particular, it is not permitted to sell the output of kindlegen outside of the Amazon store.

There is one other tool which can generate KF8: the open-source Calibre ebook tools. Unfortunately, as anyone who has spent a significant amount of time messing with ebook publishing tools knows, Calibre has a lot of significant limitations and outright flaws in its output.

Nonetheless, until recently it was possible to use Calibre to generate a KF8 file, built entirely with open-source tools, that could be used both for direct sale and for upload to the Amazon store. The resulting ebook might not look quite as good as the EPUB version, but it was usually workable.

(BTW, quick technical note: the relevant command-line argument to ebook-convert is --mobi-file-type=both. This is insufficiently documented.)
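For anyone scripting this, here is a minimal sketch of wrapping that invocation. It assumes Calibre’s ebook-convert is installed and on the PATH, and that it takes the input file followed by the output file; the file names and the two helper functions are placeholders invented for the example. The only option used is the one quoted in the note above.

```python
import shutil
import subprocess


def calibre_mobi_command(epub_path, mobi_path):
    """Build the ebook-convert argument list for a combined Mobi/KF8 file."""
    return [
        "ebook-convert",
        epub_path,
        mobi_path,
        "--mobi-file-type=both",  # the flag quoted in the note above
    ]


def convert(epub_path, mobi_path):
    """Run the conversion if ebook-convert is on PATH.

    Returns True if the tool ran successfully, False if it is not installed.
    Raises subprocess.CalledProcessError if the tool runs but fails.
    """
    if shutil.which("ebook-convert") is None:
        return False
    subprocess.run(calibre_mobi_command(epub_path, mobi_path), check=True)
    return True


if __name__ == "__main__":
    # "book.epub" and "book.mobi" are placeholder file names.
    print(" ".join(calibre_mobi_command("book.epub", "book.mobi")))
```

Keeping the command construction separate from the code that runs it makes the flag easy to check in a test without Calibre installed.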

However, recently Amazon stopped accepting files generated using Calibre. If you want to submit a KF8 file to Amazon, it now has to be one generated by kindlegen.

So anyone building a comprehensive ebook toolchain that supports both direct sales and the Amazon store must support building .mobi files with both Calibre and kindlegen. And must somehow deal with the many little differences in the output of those two tools.
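In a build script, that constraint boils down to picking the converter by sales channel. The following is only a sketch of the idea: the Channel enum, the CONVERTER_FOR_CHANNEL table, the mobi_build_plan function, and the output naming scheme are all hypothetical, and nothing here actually invokes either tool.

```python
from enum import Enum


class Channel(Enum):
    DIRECT_SALE = "direct sale"
    KINDLE_STORE = "Kindle store"


# Which converter builds the .mobi for each channel, following the text:
# kindlegen output may not be sold outside the Amazon store, and Amazon
# no longer accepts Calibre-generated files.
CONVERTER_FOR_CHANNEL = {
    Channel.DIRECT_SALE: "calibre",
    Channel.KINDLE_STORE: "kindlegen",
}


def mobi_build_plan(book_stem):
    """Return (channel, converter, output file) tuples for one book.

    The "<stem>-<converter>.mobi" naming scheme is made up for this example;
    it just keeps the two builds from overwriting each other.
    """
    plan = []
    for channel, converter in CONVERTER_FOR_CHANNEL.items():
        output = f"{book_stem}-{converter}.mobi"
        plan.append((channel, converter, output))
    return plan


if __name__ == "__main__":
    for channel, converter, output in mobi_build_plan("mybook"):
        print(f"{channel.value}: build {output} with {converter}")
```

Writing the two builds to separate files also makes it harder to upload the wrong one to the wrong place.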

Someone is bound to say something about how you don’t need to generate files locally at all. You can just submit them directly to the Amazon store, in various formats.

For technical authoring, or really any kind of advanced authoring requiring careful control over formatting, this simply isn’t a viable option. The only non-KF8 uploadable file format that Amazon accepts which will give you total control over fonts (including embedding) and styles is EPUB. So you’ll still have to generate an EPUB file. And Amazon is then simply putting the EPUB file you give them through their version of kindlegen, with all the little quirks that we’ve already discussed. So it’s just like generating the file locally, only with a much longer and less convenient turnaround to check whether the final product looks right. Did I mention that Amazon’s online “previewer” gives an even less accurate picture of what the file will look like on an actual Kindle than the downloadable Kindle previewer?

So here’s my situation, as an author who wants to sell books both directly and on the Amazon Kindle store:

  • I need to generate PDF files. This is the easiest part.
  • I need to generate EPUB3 files for non-Kindle devices and readers. This isn’t too painful; the majority of EPUB users are using iBooks, and iBooks support for EPUB3 is generally superb. On the Android side, the Play reader app also has pretty great support for EPUB3. Other devices and software only support EPUB2, but EPUB3 is designed to remain compatible with EPUB2 readers. I know that people using EPUB2 readers won’t have as good an experience, but the book will at least be readable.
  • A lot of people expect to receive a .mobi file when buying a book from me directly, so I need to supply one. I have to generate it with Calibre, accepting the inevitable degradation in quality that entails (or work very hard on workarounds specific to that part of the toolchain). And I have to deal with users who are upset that they tried to mail the file to their Kindle and Amazon rejected it (they now have to transfer the file directly).
  • For sales on the Kindle Store, I need to generate a KF8 with kindlegen. I have to painstakingly craft my EPUB files so as not to conflict with the limitations in Amazon’s undocumented subset of EPUB3.

This is a nightmare, and it’s a nightmare which any author or publisher of technical ebooks will tell you is a big part of their life.

What would I like to see?

  • Ideally, I’d like to see Amazon join the rest of the e-publishing world and simply support EPUB3. I’d like for differences in Kindle rendering to be considered reportable bugs in their EPUB support, rather than KF8 “features”.
  • Failing that, I’d like to see an exhaustive specification of the KF8 format, so that authors and tool writers can do a better job of supporting it, and generate files that Amazon will accept.
  • I would love to see Amazon either open-source kindlegen, or put effort into helping authors of open-source tools like Pandoc and Calibre improve their KF8 support.

Do I expect any of this to actually happen? I don’t know. I’ve never really thought of Amazon as an enemy of Open Source and open standards. But in this case, they may see their strategy as being sufficiently in their own corporate interests to trump any concerns over the lack of openness it represents.

It’s not a new strategy, either. It’s called “embrace, extend, and extinguish”, and it was perfected by Microsoft back in the 90s. It involves fragmenting a market by seeming to adopt an open standard, and then adding your own closed, proprietary extensions.

In truth I don’t have high hopes that this will change anytime soon. Amazon is the 800-pound gorilla when it comes to ebooks, much like Microsoft was for operating systems and web browsers back in the 90s. They don’t have any incentive to change their approach.

Why write this, then? Mostly just to vent. And to have something to point people to when they complain about how Amazon rejected a .mobi file they bought from me. I’ll keep writing and selling ebooks, and I’ll keep augmenting Quarto to smooth over some of the worst pain points in technical authoring. And hopefully, in ten years I’ll run across this article again and shake my head about the “bad old days”, when ebook formats were still fragmented and painful.