Generate Kindle (MOBI) ebooks with your ASP.NET Web API - StrathWeb

September 16th, 2012

Generate Kindle (MOBI) ebooks with your ASP.NET Web API

Because it's so crazy, it might work

Recently, I’ve been working a little on an application that lets users save, tag and bookmark links for later reading – that kind of stuff. Obviously, Web API suits those types of apps really well, as data can be exposed in a multitude of formats. So I had this crazy idea – CLR to Kindle? Why not.

Unfortunately, the MOBI format (used by Kindle) is not that easy to support from C#, as to my knowledge there is no ready-made DLL port or SDK available. On the other hand, Amazon has created a proprietary command line tool called Kindlegen, which allows you to convert HTML into MOBI. We’ll use that – it’s a hacky solution but it sure is a lot of fun.

Kindlegen

Of course, to start off you need to have Kindlegen. You can get it from the Amazon website. It is very simple to use – it just takes the name of the HTML file as an argument and generates the MOBI file in the same folder.

Another useful link would be the Amazon Publishing Guidelines. They contain all kinds of information about how to format the HTML file in order for the generated ebook to be of the highest quality. I will not focus on that at all here, as that’s outside the scope of the article. In fact I’ll just use some HTML copied from this very blog, and as you’ll see Kindlegen works with pretty much anything (it just might not be perfectly formatted).

Application

Our application will be a simple Web API application, based on the MVC4 template in VS2010. You should copy the Kindlegen tool to the root of the website, into a “kindlegen” folder.

My model is similar to what I used in other tutorials:
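A minimal sketch of what such a model and its contract might look like. Only the names Url, IMobi, HTMLRepresentation, Title, Description and Text come from the article; MobiId, UrlId, Address, CreatedAt and the exact HTML layout are assumptions made for illustration.

```csharp
using System;
using System.Net;

// Contract for anything that can be serialized to MOBI.
// MobiId is a hypothetical name for the unique ID used to name the files.
public interface IMobi
{
    string MobiId { get; }

    // HTML that will later be handed over to Kindlegen.
    string HTMLRepresentation { get; }
}

public class Url : IMobi
{
    public int UrlId { get; set; }
    public string Address { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
    public string Text { get; set; }
    public DateTime CreatedAt { get; set; }

    public string MobiId
    {
        get { return "url-" + UrlId; }
    }

    public string HTMLRepresentation
    {
        get
        {
            // Title and Description are HTML-encoded; Text is assumed
            // to already contain HTML markup and is inserted as-is.
            return string.Format(
                "<html><head><title>{0}</title></head><body>" +
                "<h1>{0}</h1><p><em>{1}</em></p>" +
                "<p>Saved on {2:yyyy-MM-dd}</p>{3}</body></html>",
                WebUtility.HtmlEncode(Title),
                WebUtility.HtmlEncode(Description),
                CreatedAt,
                Text);
        }
    }
}
```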

Url is the typical article-type entity I mentioned before, saved by the user. Notice it implements an IMobi interface, because that will be our contract for serializing to MOBI.

The IMobi interface defines only the two things needed for creating MOBI output – a unique ID (which we’ll use for naming the file) and an HTML representation of the CLR type, which will be flushed into the MOBI ebook.

The HTMLRepresentation property getter on the Url class could take any shape or form – in my example it composes some simple HTML out of the model’s properties such as Title, Description, Text, timestamps and so on. You might build up the HTML structure using simple string formatting/concatenation, or use the Razor templating engine or any other templating solution you are happy with.

Notice that the IMobi interface can also be implemented on collections/aggregate types to create sets of articles rather than serializing just a single article.
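As a rough illustration of the aggregate case, here is a sketch of a collection type that implements the same contract and builds one HTML document out of several articles. UrlCollection, CollectionId, Name and MobiId are hypothetical names, and the Url class is trimmed down to the two properties the sketch needs.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Repeated here so the sketch compiles on its own; MobiId is an assumed name.
public interface IMobi
{
    string MobiId { get; }
    string HTMLRepresentation { get; }
}

// Trimmed-down article, just enough for this example.
public class Url
{
    public string Title { get; set; }
    public string Text { get; set; }
}

// Hypothetical aggregate: a named set of articles exposed as one ebook.
public class UrlCollection : IMobi
{
    public int CollectionId { get; set; }
    public string Name { get; set; }
    public List<Url> Urls { get; set; }

    public UrlCollection()
    {
        Urls = new List<Url>();
    }

    public string MobiId
    {
        get { return "collection-" + CollectionId; }
    }

    public string HTMLRepresentation
    {
        get
        {
            var html = new StringBuilder();
            html.AppendFormat("<html><head><title>{0}</title></head><body>",
                WebUtility.HtmlEncode(Name));

            // One heading per article, followed by its (already HTML) text.
            foreach (var url in Urls)
            {
                html.AppendFormat("<h1>{0}</h1>", WebUtility.HtmlEncode(url.Title));
                html.Append(url.Text);
            }

            html.Append("</body></html>");
            return html.ToString();
        }
    }
}
```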

Formatter

As with any custom return type in Web API, we’ll use a MediaTypeFormatter.

The overview of the formatter:
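A minimal sketch of such a formatter skeleton, written against the Web API MediaTypeFormatter base class. The class name MobiFormatter, the application/x-mobipocket-ebook media type, the fixed ebook.mobi file name and the MobiId member are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Repeated here so the sketch compiles on its own; MobiId is an assumed name.
public interface IMobi
{
    string MobiId { get; }
    string HTMLRepresentation { get; }
}

public class MobiFormatter : MediaTypeFormatter
{
    // Assumed media type for MOBI files.
    private const string MobiMediaType = "application/x-mobipocket-ebook";

    public MobiFormatter()
    {
        SupportedMediaTypes.Add(new MediaTypeHeaderValue(MobiMediaType));
        SupportedMediaTypes.Add(new MediaTypeHeaderValue("text/html"));

        // Lets a browser request the ebook with ?format=mobi.
        MediaTypeMappings.Add(new QueryStringMapping("format", "mobi", MobiMediaType));
    }

    // We never deserialize from MOBI.
    public override bool CanReadType(Type type)
    {
        return false;
    }

    // Only types implementing IMobi can be written.
    public override bool CanWriteType(Type type)
    {
        return typeof(IMobi).IsAssignableFrom(type);
    }

    public override void SetDefaultContentHeaders(
        Type type, HttpContentHeaders headers, MediaTypeHeaderValue mediaType)
    {
        base.SetDefaultContentHeaders(type, headers, mediaType);

        // Make the browser offer the response as a file download.
        headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = "ebook.mobi"
        };
    }

    public override Task WriteToStreamAsync(
        Type type, object value, Stream writeStream,
        HttpContent content, TransportContext transportContext)
    {
        // The actual serialization is the subject of the "Creating ebooks" section.
        throw new NotImplementedException();
    }
}
```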

So we have everything included here, except the actual serialization process (writing to the stream).

A few notes:
1. We never deserialize from MOBI, so CanReadType always returns false.
2. We support the text/html media type and a QueryStringMapping because, if our controller is e.g. UrlController, we want users to be able to type api/url/1?format=mobi into the browser and get the file directly (without having to use any proprietary Content-Type headers).
3. For the same reason, we set the “attachment” content disposition in the default headers.
4. Obviously, we only support types implementing IMobi.

Creating ebooks

OK, for the last piece, or show time if you will, we will write the MOBI ebook to the response stream. We will use a little hack/trick that allows us to run command line tools (such as Kindlegen) from C# code.
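A sketch of how that could look, using System.Diagnostics.Process to launch the tool. The names MobiWriter, WriteMobi, HtmlSnippet and MobiId, and the kindlegen.exe file name, are assumptions; the folder path is passed in rather than resolved from the web application.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Repeated here so the sketch compiles on its own; MobiId is an assumed name.
public interface IMobi
{
    string MobiId { get; }
    string HTMLRepresentation { get; }
}

// Throwaway implementation, used only to try the sketch out.
public class HtmlSnippet : IMobi
{
    public string MobiId { get; set; }
    public string HTMLRepresentation { get; set; }
}

public static class MobiWriter
{
    // kindlegenFolder is the folder holding the Kindlegen executable.
    public static void WriteMobi(object value, string kindlegenFolder, Stream output)
    {
        var mobi = (IMobi)value;
        string htmlPath = Path.Combine(kindlegenFolder, mobi.MobiId + ".html");
        string mobiPath = Path.Combine(kindlegenFolder, mobi.MobiId + ".mobi");

        // Kindlegen works on files, so the HTML has to exist on disk.
        if (!File.Exists(htmlPath))
        {
            File.WriteAllText(htmlPath, mobi.HTMLRepresentation);
        }

        // Only run the tool if the ebook has not been generated before.
        if (!File.Exists(mobiPath))
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = Path.Combine(kindlegenFolder, "kindlegen.exe"),
                Arguments = "\"" + htmlPath + "\"",
                WorkingDirectory = kindlegenFolder,
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true
            };

            using (var process = Process.Start(startInfo))
            {
                // Read the output before waiting, to avoid blocking on a full buffer.
                string log = process.StandardOutput.ReadToEnd();
                process.WaitForExit();

                if (log.Contains("Error(kindlegen)"))
                {
                    throw new InvalidOperationException("Kindlegen failed: " + log);
                }
            }
        }

        // Copy the generated ebook into the response stream.
        using (var file = File.OpenRead(mobiPath))
        {
            file.CopyTo(output);
        }
    }
}
```

Inside the formatter, WriteToStreamAsync could then wrap a call such as MobiWriter.WriteMobi(value, folder, writeStream) in a Task, with the folder resolved via something like HostingEnvironment.MapPath("~/kindlegen").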

So what happens here – step by step:
1. We cast the object to IMobi, then we get the path of the /kindlegen/ folder (which, if you remember, we copied to our web app).
2. We need an HTML file in order to generate the MOBI (the command line tool works on files, so an in-memory representation is not enough) – so we check if the HTML file already exists on the disk and, if not, we write it.
3. Then we check if the MOBI file already exists (perhaps it was generated earlier?). If not, we start a new process and pass the name of our HTML file to it as an argument. If the process output does not contain the Error(kindlegen) character sequence, everything should be fine.
4. We grab the MOBI file from the disk and flush its Stream to the response stream.

One note here: this approach causes both the HTML and the MOBI file to be generated only once, and all subsequent requests for the same model will return the same files from the disk (think of them as “immutable”). There is nothing stopping you from doing otherwise and regenerating the files every time, or perhaps using a CRC to check if the HTML representation has changed (e.g. someone modified the text).
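A sketch of the change-detection idea. Instead of a CRC it uses a SHA-256 hash, which ships with the framework; the MobiCache class, its method names and the .hash side file are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class MobiCache
{
    // Hex-encoded SHA-256 of the HTML (a stand-in for the CRC idea).
    public static string ComputeHash(string html)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(html));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns true if the HTML differs from what was last recorded.
    // In that case the stale HTML and MOBI files are deleted, so the
    // next request regenerates them, and the new hash is stored.
    public static bool InvalidateIfChanged(string folder, string id, string html)
    {
        string hashPath = Path.Combine(folder, id + ".hash");
        string current = ComputeHash(html);

        if (File.Exists(hashPath) && File.ReadAllText(hashPath) == current)
        {
            return false;
        }

        // File.Delete does nothing if the file is not there.
        File.Delete(Path.Combine(folder, id + ".html"));
        File.Delete(Path.Combine(folder, id + ".mobi"));
        File.WriteAllText(hashPath, current);
        return true;
    }
}
```

Calling InvalidateIfChanged before the existence checks would keep the "generate once" behaviour for unchanged articles while refreshing edited ones.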

Wiring up

The final step is to wire up the formatter:
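In a standard Web API project this could be done in Global.asax along the following lines. This is only a sketch: MobiFormatter is an assumed name for the formatter class described earlier (it is not defined in this fragment), and WebApiApplication is the class name the project template usually generates.

```csharp
using System.Web;
using System.Web.Http;

public class WebApiApplication : HttpApplication
{
    protected void Application_Start()
    {
        // Register the MOBI formatter alongside the default JSON/XML ones.
        // MobiFormatter is the custom MediaTypeFormatter (assumed name).
        GlobalConfiguration.Configuration.Formatters.Add(new MobiFormatter());
    }
}
```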

If I now request http://localhost:56660/api/url/1 (a normal API request), I get the predictable output.

But if I request http://localhost:56660/api/url/1?format=mobi, I get a file download dialogue.

I can download the file and open it in Calibre, an excellent ebook management tool.

Finally, I can obviously send it to my Kindle.

Of course the formatting is not perfect, but that can all be improved by adjusting the output HTML.

Summary

Generating MOBI out of CLR types via Web API + Kindlegen was just one of the crazy ideas I had this weekend. I hope you enjoyed the article, because I had a lot of fun playing around with this. And now, Sunday Football – so see you next time!

  • http://beletsky.net Alexander Beletsky

    Very, very nice hack. Had joy of reading this.

    • Filip W

      thank you sir, appreciate it

  • http://bobuva.blogspot.com Bob Uva

    I like it! I might even look at enhancing for some alternatives to using Pocket for saving to-be-read items.
    Thanks!

  • angel

    Hi..a little question a bit off topic…what font are you using?…it looks good…

  • http://typedstrong.com Jesse

    I skipped over this article for days thinking I had no use for what you were doing. I have to say I was quite wrong. This is an awesome little project. I wonder, though, if it could be extended somehow to use Razor or some other templating engine. That way the HTML isn’t hanging around in each instance of URL. Also, I had no idea about running command-line stuff in C#. Too cool.

    • Filip W

      Thanks a lot!

      It sure can be extended to work with Razor, I was actually planning to do that from the get go, but just didn’t have enough time and published the post as is. I’ll try to update it in the near future.

  • Raghav

    Hey Thanks Filip, You are getting some amazing ideas, so if want to create pdf in the same way as above, should i start consuming acrobat assemblies ?
    Also Can you share your solution on git ?

    Thanks!

    • Filip W

      Thanks man! Yeah, I was planning to write a blog about generating PDFs as well, so watch out in the coming weeks.
      I will post a code for this (MOBI) on github, as soon as I take it a step further. I want to incorporate using Razor templates instead of hardcoded HTML!

  • Nice

    Very nice post.

  • http://codebetter.com/glennblock Glenn Block

    Nice job Filip. Very cool idea!

    • Filip W

      Thanks! Some say cool, some say half-baked – the line is usually very blurry :)

  • http://www.facebook.com/srosseter Scott Rosseter

    by anychance does this render images with a http source?