Generate Kindle (MOBI) ebooks with your ASP.NET Web API

Recently, I’ve been working a little on an application that allows users to save, tag, and bookmark links for later reading – that kind of stuff. Obviously, Web API facilitates those types of apps really well, as data can be exposed in a multitude of formats. So I had this crazy idea – CLR to Kindle? Why not.

Unfortunately, the MOBI format (used by Kindle) is not that easy to support from C#, as to my knowledge there is no ready-made DLL port or SDK available. On the other hand, Amazon has created a proprietary command line tool called Kindlegen, which allows you to convert HTML into MOBI. We’ll use that – it’s a hacky solution but it sure is a lot of fun.

Kindlegen

Of course, to start off you need to have Kindlegen. You can get it from the Amazon website. It is very simple to use – it just takes the name of the HTML file as an argument and generates the MOBI file in the same folder.

Another useful link would be the Amazon Publishing Guidelines. It contains all kinds of information about how to format the HTML file in order for the generated ebook to be of the highest quality. I will not focus on that at all here, as that’s outside the scope of the article. In fact I’ll just use some HTML copied from this very blog, and as you’ll see Kindlegen works with pretty much anything (the result just might not be perfectly formatted).

Application

Our application will be a simple Web API application, based on the MVC4 template in VS2010. You should copy the Kindlegen tool to the root of the website, into a “kindlegen” folder.

My model is similar to what I used in other tutorials:

Url is a typical article type of entity I mentioned before, saved by the user. Notice it implements an IMobi interface, because that will be our contract for serializing to MOBI.

The IMobi interface defines only the two things needed for creating MOBI output – a unique ID (which we’ll use for naming the file) and an HTML representation of the CLR type – which will be flushed into the MOBI ebook.

Our HTMLRepresentation property getter on the Url class could take any shape or form – in my example it will compose some simple HTML out of the model’s properties such as Title, Description, Text, timestamps and so on. You might do that using simple string formatting/concatenation to build up the HTML structure, or use the Razor templating engine or any other templating solution you are happy with.
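To make this concrete, here is a minimal sketch of what the contract and the model might look like, using plain string formatting. Only `IMobi`, `Url`, `HTMLRepresentation` and the Title/Description/Text properties come from the text above; the names `UniqueId`, `UrlId`, `Address` and `CreatedAt` are assumptions made for illustration. It also assumes that `Text` already holds HTML markup, so it is inserted as-is.

```csharp
using System;
using System.Net;

// Contract for anything that can be turned into a MOBI ebook.
public interface IMobi
{
    // Used for naming the generated files (the property name is an assumption).
    string UniqueId { get; }

    // HTML that Kindlegen will convert into the ebook.
    string HTMLRepresentation { get; }
}

public class Url : IMobi
{
    public int UrlId { get; set; }            // hypothetical key property
    public string Address { get; set; }       // hypothetical: the saved link
    public string Title { get; set; }
    public string Description { get; set; }
    public string Text { get; set; }          // assumed to contain HTML already
    public DateTime CreatedAt { get; set; }   // hypothetical timestamp

    public string UniqueId
    {
        get { return "url-" + UrlId; }
    }

    public string HTMLRepresentation
    {
        get
        {
            // Plain-text fields are HTML-encoded; Text is inserted unchanged.
            return string.Format(
                "<html><head><meta charset=\"utf-8\" /><title>{0}</title></head>" +
                "<body><h1>{0}</h1><p><em>{1}</em></p>" +
                "<p>Saved on {2:yyyy-MM-dd}</p>{3}</body></html>",
                WebUtility.HtmlEncode(Title),
                WebUtility.HtmlEncode(Description),
                CreatedAt,
                Text);
        }
    }
}
```

A Razor template would keep the markup out of the model, but for a quick experiment a format string is probably enough.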

Notice that the IMobi interface can also be implemented on collections/aggregate types to create sets of articles rather than serializing just a single article.

Formatter

As with any customized returned type in Web API, we’ll use a MediaTypeFormatter.

The overview of the formatter:

So we have everything included here, except the actual serialization process (writing to the stream).

A few notes:
1. We never deserialize from MOBI, so CanReadType always returns false
2. We support the text/html media type and a QueryStringMapping because, if our controller is e.g. UrlController, we want users to be able to type api/url/1?format=mobi into the browser and get the file directly (without having to set any special content type headers).
3. For the same reason we set the “attachment” content disposition in the default headers
4. Obviously we only support types implementing IMobi
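The notes above could translate into a skeleton along these lines. This is a sketch under assumptions rather than the exact formatter: it targets the released Web API signatures (where `CanReadType` and `CanWriteType` are methods), the class name `MobiFormatter`, the `ebook.mobi` file name and the `application/octet-stream` content type are all illustrative choices, and `IMobi` is repeated only so that the snippet compiles on its own.

```csharp
using System;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;

// Repeated here so the sketch is self-contained (member names are assumptions).
public interface IMobi
{
    string UniqueId { get; }
    string HTMLRepresentation { get; }
}

public class MobiFormatter : MediaTypeFormatter
{
    public MobiFormatter()
    {
        // Note 2: text/html plus ?format=mobi lets a plain browser request select us.
        SupportedMediaTypes.Add(new MediaTypeHeaderValue("text/html"));
        MediaTypeMappings.Add(new QueryStringMapping("format", "mobi", "text/html"));
    }

    // Note 1: we never deserialize from MOBI.
    public override bool CanReadType(Type type)
    {
        return false;
    }

    // Note 4: only types implementing IMobi can be written.
    public override bool CanWriteType(Type type)
    {
        return typeof(IMobi).IsAssignableFrom(type);
    }

    // Note 3: make the browser offer the response as a file download.
    public override void SetDefaultContentHeaders(
        Type type, HttpContentHeaders headers, MediaTypeHeaderValue mediaType)
    {
        base.SetDefaultContentHeaders(type, headers, mediaType);
        headers.ContentType = new MediaTypeHeaderValue("application/octet-stream"); // assumption
        headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = "ebook.mobi" // hypothetical file name
        };
    }
}
```

The write side (`WriteToStreamAsync`) is deliberately left out here, matching the overview.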

Creating ebooks

OK, for the last piece, or show time, if you will, we will write the MOBI ebook to the response stream. We will use a little hack/trick that allows us to run command line tools (such as Kindlegen) from C# code.

So what happens here – step by step:
1. We cast the object to IMobi, then we get the path of the /kindlegen/ folder (which, if you remember, we copied to our web app).
2. We need an HTML file on disk in order to generate the MOBI (the command line tool takes a file name, so an in-memory representation is not enough) – so we check if the HTML file already exists on the disk, and if not, we write it.
3. Then we check if the MOBI file already exists (perhaps it was generated earlier?). If not, we start a new process and pass the name of our HTML file to it as an argument. If the process output does not contain the Error(kindlegen) character sequence, everything should be fine.
4. We grab the MOBI file from the disk and flush its Stream to the response stream
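The steps above can be sketched as a small helper that the formatter's `WriteToStreamAsync` would call. The class and method names (`KindlegenRunner`, `WriteMobi`, `EnsureHtmlFile`) are hypothetical, and the sketch assumes the executable is named `kindlegen.exe` and sits in the folder passed in. It takes the unique ID and the HTML as plain strings so that it does not depend on any other type.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class KindlegenRunner
{
    // Steps 2-4: make sure the HTML and MOBI files exist, then copy the MOBI to the output.
    public static void WriteMobi(string uniqueId, string html, string kindlegenFolder, Stream output)
    {
        string htmlPath = EnsureHtmlFile(uniqueId, html, kindlegenFolder);
        string mobiPath = Path.ChangeExtension(htmlPath, ".mobi");

        if (!File.Exists(mobiPath))
        {
            RunKindlegen(kindlegenFolder, htmlPath);
        }

        using (FileStream mobi = File.OpenRead(mobiPath))
        {
            mobi.CopyTo(output);
        }
    }

    // Step 2: write the HTML file only if it is not on disk yet.
    public static string EnsureHtmlFile(string uniqueId, string html, string folder)
    {
        string htmlPath = Path.Combine(folder, uniqueId + ".html");
        if (!File.Exists(htmlPath))
        {
            File.WriteAllText(htmlPath, html);
        }
        return htmlPath;
    }

    // Step 3: run Kindlegen on the HTML file and look for its error marker in the output.
    private static void RunKindlegen(string folder, string htmlPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = Path.Combine(folder, "kindlegen.exe"), // assumed executable name
            Arguments = "\"" + Path.GetFileName(htmlPath) + "\"",
            WorkingDirectory = folder,
            UseShellExecute = false,          // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            string log = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (log.Contains("Error(kindlegen)"))
            {
                throw new InvalidOperationException("Kindlegen failed: " + log);
            }
        }
    }
}
```

Inside the formatter, `WriteToStreamAsync` would then cast the value to IMobi (step 1), resolve the folder (for example with `HostingEnvironment.MapPath("~/kindlegen")`) and hand the response stream to `WriteMobi`. Keep in mind that the application pool identity needs write access to that folder.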

One note here is that this approach causes both the HTML and the MOBI file to be generated only once; all subsequent requests for the same model will result in returning the same files from the disk (think of them as “immutable”). There is nothing stopping you from doing otherwise and regenerating the file every time, or perhaps using a CRC to check if the HTML representation has changed (e.g. someone modified the text).
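As a rough illustration of the change-detection idea, the sketch below stores a hash of the HTML next to the generated files and reports when it no longer matches. It swaps the CRC for SHA-256, simply because that is available in the base class library; the class name `HtmlChangeDetector` and the idea of a separate hash file are assumptions, not part of the original solution.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class HtmlChangeDetector
{
    // Hex-encoded SHA-256 of the HTML (used here instead of a CRC).
    public static string ComputeHash(string html)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Returns true when no hash is stored yet or the stored hash differs,
    // and updates the stored hash in that case.
    public static bool HasChanged(string html, string hashPath)
    {
        string current = ComputeHash(html);
        if (File.Exists(hashPath) && File.ReadAllText(hashPath) == current)
        {
            return false;
        }
        File.WriteAllText(hashPath, current);
        return true;
    }
}
```

When `HasChanged` returns true, the cached HTML and MOBI files for that ID would be deleted before the normal generation steps run, so the ebook is rebuilt from the new content.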

Wiring up

The final step is to wire up the formatter:
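Registration comes down to adding the formatter to the global formatter collection at application start. A minimal sketch, assuming a hypothetical `FormatterConfig` helper that receives the formatter instance (so the snippet compiles without the formatter class itself):

```csharp
using System.Net.Http.Formatting;
using System.Web.Http;

public static class FormatterConfig
{
    // Adds the given formatter to the Web API configuration.
    public static void Register(HttpConfiguration config, MediaTypeFormatter mobiFormatter)
    {
        config.Formatters.Add(mobiFormatter);
    }
}
```

In `Application_Start` in Global.asax.cs this would be called with `GlobalConfiguration.Configuration` and a new instance of the MOBI formatter.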

If I now request: http://localhost:56660/api/url/1 (a normal API request), I get a predictable output:

But if I request: http://localhost:56660/api/url/1?format=mobi, I get a file download dialog:

I can download the file, and open it in Calibre, an excellent ebook management tool:

Finally, I can obviously send it to my Kindle:

Of course the formatting is not perfect, but that’s all subject to adjusting the output HTML.

Summary

Generating MOBI out of CLR types via Web API + Kindlegen was just one of the crazy ideas I had this weekend. I hope you enjoyed the article, because I had a lot of fun playing around with this. And now, Sunday Football – so see you next time!