BSON (Binary JSON) and how your Web API can be even faster - StrathWeb

July 22nd, 2012

Because a BSON media type formatter is very useful

I have recently been reading the wishlist at the Web API CodePlex repository and noticed that one of the most requested features (6th on the list) is support for the BSON (Binary JSON) media type.

Of course, all it takes to add BSON to your Web API is a media type formatter for it, and since JSON.NET already has great BSON support, writing one is actually quite easy.

Now, you might be asking: why do it in the first place? Isn’t JSON enough? Well, the main reason is performance. According to JSON.NET’s tests, BSON often produces output that is smaller than JSON (by up to 25%). It is also much quicker to encode and decode, because simple types are not converted to and from their string representation.

Let’s do it then.

About BSON

If you are still unsure about using BSON, you can find more information in the BSON specification and, of course, on Wikipedia. The format was popularized by MongoDB and is a great alternative to the binary, JSON and XML formats commonly used to pass data between systems.
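To make the specification a little more concrete, here is a minimal Python sketch (an illustration only, not part of the C# formatter) that encodes a dictionary as a BSON document. It covers just three element types from the spec: UTF-8 strings (0x02), embedded documents (0x03) and 32-bit integers (0x10). The function names are hypothetical.

```python
import struct


def _cstring(text: str) -> bytes:
    # element names are UTF-8 bytes terminated by a single 0x00
    return text.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a dict as a BSON document (str, int32 and nested dict only)."""
    body = b""
    for name, value in doc.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02: int32 byte count (including the 0x00), then the bytes
            body += b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            if not -2**31 <= value < 2**31:
                raise OverflowError("int64 (0x12) is not covered by this sketch")
            # 0x10: four little-endian bytes, no decimal digits involved
            body += b"\x10" + _cstring(name) + struct.pack("<i", value)
        elif isinstance(value, dict):
            # 0x03: an embedded document, itself length-prefixed
            body += b"\x03" + _cstring(name) + encode_document(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # the int32 length prefix counts itself (4 bytes) and the trailing 0x00
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_document({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The 22-byte result (0x16) is the well-known "hello world" example from the BSON specification; note how every string and document carries its length up front.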

The main advantage is that if, for example, you have an integer, it doesn’t get converted to a string and back to an integer when you serialize and deserialize your POCO.
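The difference is easy to see at the byte level. In this Python sketch (illustrative only; the field name Id is made up), JSON carries the integer as decimal text that has to be parsed, while a BSON int32 element carries a type tag, the name and four little-endian bytes that are read back with a single fixed-width unpack.

```python
import json
import struct

# JSON: the number travels as the text "123456" and must be parsed back
json_text = json.dumps({"Id": 123456})
json_value = json.loads(json_text)["Id"]

# BSON: one int32 element = type tag 0x10, cstring name, 4 little-endian bytes
element = b"\x10" + b"Id\x00" + struct.pack("<i", 123456)


def read_int32_element(buf: bytes, offset: int = 0):
    """Return (name, value) for the int32 element starting at `offset`."""
    if buf[offset] != 0x10:
        raise ValueError("expected the int32 type tag 0x10")
    end = buf.index(b"\x00", offset + 1)  # the name ends at the first 0x00
    name = buf[offset + 1:end].decode("utf-8")
    # no digit scanning: the value is always exactly 4 bytes after the name
    (value,) = struct.unpack_from("<i", buf, end + 1)
    return name, value


print(json_text, read_int32_element(element))
```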

Creating the formatter

So what we need is a new media type formatter. If you are unfamiliar with the concept, take a look at my previous articles about media type formatters and about content negotiation.

To create a media type formatter, we inherit from System.Net.Http.Formatting.MediaTypeFormatter.

We will have to override CanReadType, CanWriteType, ReadFromStreamAsync and WriteToStreamAsync. Please note that the signatures used here are those of Web API RC; in RTM they change slightly (an HttpContent is passed instead of HttpContentHeaders).

In this example we will follow the conventions of the Web API source code, so in addition to the necessary overrides, we expose some public members of the formatter the same way the default Web API JSON.NET formatter does.

Let’s start with the constructor, though.

We start off by telling the formatter to support the “application/bson” media type. We also initialize our private JSON.NET serializer settings for later use.

The default serialization settings are the same as those of the default Web API JSON.NET formatter. You can also supply your own through a public property.

Now let’s handle the CanReadType and CanWriteType methods.

Since JSON.NET can serialize any CLR type, both methods always return true, as long as a type (rather than null) is passed to the formatter.
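As a rough, language-neutral illustration of the members described above (constructor, settings and type checks), here is a Python analogue. It is not the Web API MediaTypeFormatter API: the class name, the settings fields and their defaults are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SerializerSettings:
    # hypothetical stand-ins for JSON.NET serializer settings
    ignore_missing_members: bool = True
    max_depth: Optional[int] = None


class BsonFormatterSketch:
    """Python analogue of the formatter's constructor and type checks."""

    def __init__(self, settings: Optional[SerializerSettings] = None):
        # like the C# constructor: advertise the application/bson media type
        self.supported_media_types: List[str] = ["application/bson"]
        # defaults are created here; callers may replace them via the attribute
        self.serializer_settings = settings or SerializerSettings()

    def can_read_type(self, target: Optional[type]) -> bool:
        # the serializer handles any type, so only a missing type is refused
        return target is not None

    def can_write_type(self, target: Optional[type]) -> bool:
        return target is not None


formatter = BsonFormatterSketch()
formatter.serializer_settings = SerializerSettings(max_depth=32)
print(formatter.can_write_type(dict), formatter.can_read_type(None))  # True False
```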

Finally, let’s deal with writing to the stream (serialization) and reading from the stream (deserialization).

The “write” code should be very straightforward. We use JSON.NET’s BsonWriter to serialize our object. The serialization runs synchronously, and a TaskCompletionSource is used to hand back an already-completed task, the same way Web API does it; there is no real advantage to switching threads for such simple operations.
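The synchronous-write-plus-completed-task pattern can be sketched in Python, with concurrent.futures.Future standing in for TaskCompletionSource. This is an analogy rather than the post’s code: the function name and the `serialize` parameter are hypothetical, and in this sketch a failure is stored on the future (the Python counterpart of tcs.SetException) instead of being raised from the call. The standard library documents direct creation of Future objects as intended for executors and tests, which is acceptable for a sketch.

```python
import io
from concurrent.futures import Future
from typing import Any, Callable


def write_to_stream(value: Any, stream: io.BufferedIOBase,
                    serialize: Callable[[Any], bytes]) -> Future:
    """Serialize on the calling thread and return an already-completed future."""
    done: Future = Future()
    try:
        stream.write(serialize(value))
        done.set_result(None)      # completed synchronously, no thread switch
    except Exception as exc:
        done.set_exception(exc)    # caller observes the error via the future
    return done


def empty_bson(_value: Any) -> bytes:
    # stand-in serializer: the smallest valid BSON document (5 bytes)
    return b"\x05\x00\x00\x00\x00"


buffer = io.BytesIO()
result = write_to_stream({}, buffer, empty_bson)
print(result.done(), buffer.getvalue())
```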

Now the deserialization:

While this method is also fairly simple, there are two things worth noting. First, we use IsAssignableFrom to determine whether the type being deserialized is a collection. BSON cannot automatically detect the type of the root container, so if the type is a collection, we treat the entire BSON document as a collection.
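The root-container limitation is easy to reproduce. In BSON, a top-level array has to be written as a document whose keys are "0", "1", "2" and so on, so the bytes alone cannot tell a list from an object. The Python sketch below (hypothetical helper names, int32 values only) lets the requested type decide, in the spirit of the IsAssignableFrom check; it is an illustration, not JSON.NET’s BsonReader.

```python
import struct
from collections.abc import Sequence


def encode_int_list_as_root(values) -> bytes:
    # a root "array" is really a document keyed "0", "1", "2", ...
    body = b"".join(
        b"\x10" + str(i).encode("ascii") + b"\x00" + struct.pack("<i", v)
        for i, v in enumerate(values)
    )
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def decode_int_document(buf: bytes) -> dict:
    """Decode a BSON document containing only int32 (0x10) elements."""
    (total,) = struct.unpack_from("<i", buf, 0)
    pos, out = 4, {}
    while pos < total - 1:  # the last byte is the document's 0x00 terminator
        if buf[pos] != 0x10:
            raise ValueError(f"unsupported type tag {buf[pos]:#x}")
        end = buf.index(b"\x00", pos + 1)
        name = buf[pos + 1:end].decode("utf-8")
        out[name] = struct.unpack_from("<i", buf, end + 1)[0]
        pos = end + 5  # skip the name terminator and the 4 value bytes
    return out


def read_root(buf: bytes, target: type):
    doc = decode_int_document(buf)
    # the bytes cannot say "array", so the requested type decides
    if issubclass(target, Sequence) and not issubclass(target, (str, bytes)):
        return target(doc[str(i)] for i in range(len(doc)))
    return doc


data = encode_int_list_as_root([7, 8, 9])
print(read_root(data, list))  # [7, 8, 9]
print(read_root(data, dict))  # {'0': 7, '1': 8, '2': 9}
```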

The second thing worth noting is the error handling. It is copied from the ASP.NET Web API source, and it is exactly how the default JSON.NET formatter handles errors. The goal is to mark every exception as handled; otherwise it may be rethrown at each level of recursion and overflow the CLR stack.
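To see why marking errors as handled matters, here is a small recursive Python sketch (the function, the schema format and the handler are hypothetical, not JSON.NET’s API). Each level catches a failing member and asks an error callback; when the callback reports the error as handled, the member is skipped and the exception stops there. When it does not, the same exception is rethrown and caught again at every enclosing level.

```python
from typing import Any, Callable, Dict, List, Tuple


def deserialize(node: Any, schema: Any, path: str,
                on_error: Callable[[str, Exception], bool]) -> Any:
    """Convert `node` using `schema`: a callable, or a dict of nested schemas."""
    if isinstance(schema, dict):
        result: Dict[str, Any] = {}
        for key, sub_schema in schema.items():
            child_path = f"{path}.{key}" if path else key
            try:
                result[key] = deserialize(node[key], sub_schema, child_path, on_error)
            except Exception as exc:
                if not on_error(child_path, exc):
                    raise  # unhandled: the parent level will catch it again
        return result
    return schema(node)


errors: List[Tuple[str, str]] = []


def log_and_handle(path: str, exc: Exception) -> bool:
    # record the problem (a real formatter would log it, e.g. to model state)
    # and report it as handled so it does not propagate further
    errors.append((path, type(exc).__name__))
    return True


value = deserialize({"Name": "x", "Inner": {"Count": "abc"}},
                    {"Name": str, "Inner": {"Count": int}}, "", log_and_handle)
print(value, errors)  # {'Name': 'x', 'Inner': {}} [('Inner.Count', 'ValueError')]
```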

Plugging it in

As usual, we need to plug it into our GlobalConfiguration.
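Registering the formatter adds it to the collection that content negotiation picks from. A heavily simplified Python sketch of that selection follows; the registry, the formatter names and the fallback rule are hypothetical simplifications, not Web API’s DefaultContentNegotiator, which applies more rules.

```python
from typing import List, Tuple

# hypothetical registry of (media type, formatter name); appending the BSON
# entry plays the role of adding the formatter to the configuration
formatters: List[Tuple[str, str]] = [
    ("application/json", "JsonFormatter"),
    ("application/xml", "XmlFormatter"),
]
formatters.append(("application/bson", "BsonFormatter"))


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        if media:
            ranges.append((media.lower(), q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def select_formatter(accept_header: str) -> str:
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for supported, name in formatters:
            if media in (supported, "*/*"):
                return name
    return formatters[0][1]  # no match: fall back to the first (JSON) entry


print(select_formatter("application/xml;q=0.5, application/bson"))  # BsonFormatter
```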

Consuming BSON serialized types

Now let’s see this in practice. I will be testing with a couple of simple console application methods.

In this test example I am using a simple MyType class which has one property only, a string Name.

It looks like this in Fiddler (binary response):

This returns the following output (the binary response is deserialized without any problems):

Of course, I can also post an item back using the BSON format.
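On the wire, posting in BSON just means sending the binary body with a Content-Type of application/bson. The Python sketch below only builds such a request with the standard library; the URL and the Name value are hypothetical, and a matching API would need to be running before the commented-out urlopen call could send it.

```python
import struct
import urllib.request


def bson_for_name(name: str) -> bytes:
    # BSON document {"Name": name}: a single 0x02 (UTF-8 string) element
    data = name.encode("utf-8") + b"\x00"
    body = b"\x02Name\x00" + struct.pack("<i", len(data)) + data
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


# hypothetical endpoint; the request is only built here, not sent
request = urllib.request.Request(
    "http://localhost:5000/api/values",
    data=bson_for_name("New item"),
    headers={"Content-Type": "application/bson", "Accept": "application/bson"},
    method="POST",
)
# urllib.request.urlopen(request) would send it to a running API
print(request.get_method(), len(request.data))
```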

If I now enumerate all the items again, I can see that the new one has been added.

Finally, let’s see whether performance really improves with BSON. JSON.NET actually ships with NUnit performance tests. If you run them, you’ll see that BSON produces the smallest output, usually outperforms JSON, and is on par with or faster than DataContractSerializer (DCS).

Summary and source code

As you can see, you can easily add BSON support to your Web API and benefit from its performance. Of course, it is not suitable for sending down to the browser, but wherever your client is capable of deserializing BSON (e.g. any .NET client with access to JSON.NET), you should definitely consider offering the BSON media type from your API.

Source code, as always, included.

source code (gist)

  • Bruno

    In WriteToStreamAsync and ReadFromStreamAsync methods shouldn’t you be calling tcs.SetException in case of exceptions thrown?

    Or is just throwing the exception the same thing?

  • http://beletsky.net Alexander Beletsky

    Nice post! Any proof it is *really* faster?

    • Filip W

      sure, I added a screenshot from performance tests that are included in JSON.NET source.

      You can run them yourself if you don’t believe :-)

      • http://beletsky.net Alexander Beletsky

        I intuitively understand that BSON should give better performance, but it would be great to see the actual performance boost of a Web API endpoint with and without BSON serialization, as the article title states.

        So my question is less about JSON.NET benchmarks and more about a Web API benchmark.

        Is it possible to do such measurement?

        • Filip W

          Sure we could time the end-to-end integration tests. But I don’t really see a reason for that to be honest.

          Take using JSON media type formatter:
          1. Action invoked
          2. CLR type returned
          3. JSON.NET formatter runs
          4. serialized data written to HTTP response
          5. HTTP response transported to the client

          Now, with BSON:
          1. Action invoked
          2. CLR type returned
          3. BSON formatter runs
          4. serialized data written to HTTP response
          5. HTTP response transported to the client

          Really, the only differences are in steps 3 (a different formatter) and 5 (the response size would be different). Since the serialization tests show BSON is on par or superior in both of these steps (not always, but in most cases), what’s the point in timing that again? :)

  • Karl Seguin

    BSON is faster to decode because fields are length prefixed with type information embedded in the format. This means that rather than having to scan character by character for tokens, you can read predefined chunks of bytes at a time and know what you are getting.

    However, it can also be larger than JSON because of this prefix and because of the explicit array indexes.

    One of the key benefits of BSON is that it allows for in-place-updates, something that isn’t useful in this case at all, but that costs you space.

    The correct tool to use for your case isn’t BSON, but MessagePack. First, it’s compatible with JSON (BSON isn’t). Secondly, it’s always smaller than JSON (as far as I know), and it’s also quick to encode/decode (not sure if it’s faster or slower than BSON).

    Right tool for the right job..

  • Pete

    Binary is clearly going to be smaller and most likely faster to process, but there are other considerations. A lot of the most popular and long-lived protocols such as SMTP, NNTP, HTTP and POP3 are text-based even though the speed difference was probably more important when they were being created; arguably, this is because plain text is both very easy to work with as a programmer (I can do basic troubleshooting of a server with nothing more than a copy of telnet) and also highly interoperable (even more so with the general acceptance of UTF-8 encoding).

    If there’s a large amount of data to be transferred then binary is likely to be a good choice; if it’s only a few kilobytes then I’d be more hesitant to discard text’s advantages. Of course, if it’s less than a Kb or so then it’s totally immaterial size-wise because it’ll all fit in a single packet anyway!

    • Filip W

      That’s a very valuable insight, and I definitely agree. Binary is not a silver bullet to solve all your problems.

      I think what’s most interesting in this case, and hopefully I managed to show that through this simple example, is how easily you can enable your Web API endpoint to serve binary (BSON) representation of your CLR types if the client requests such. And then it’s up to this client (consumer of the API) to determine whether BSON is something he wants to deal with or whether he’d go the more traditional JSON or XML route.

  • http://jvaras.com Jorge

    Excellent article. Quick question: how does BSON handle byte arrays?

    • Loran

      If I were a Teenage Mutant Ninja Turtle, now I’d say “Cowabunga, dude!”

  • Anonymous Coward

    I used such a prefixed format some time ago. The overhead gets huge if your objects are deeply nested, which is the case for almost any serious data structure.

    Compression on the wire, which is already built into the browser, does a much better job than binary JSON. Debugging the communication between two BSON endpoints would also be a lot harder than with JSON without a specialized tool, such as a Fiddler for BSON.

    Browser clients would have a hard time using BSON, whereas JSON is the most natural format for Javascript code.

    So all in all you get a small improvement in bandwidth consumption at the cost of several other, quite costly, inconveniences.

    The biggest problem of web apps and RPC calls isn’t bandwidth, it’s latency. If you need 10 ms to process a request but 30 ms or longer to send a packet across the Atlantic, a 25% decrease in size won’t really make a difference. If your requests and responses are typically under one kilobyte of JSON, reducing their size further won’t matter, since on most networks they’ll still take up an entire TCP packet, and latency will stay the same.

    Which is why I don’t think something like BSON-RPC is ever going to fly.

    • Filip W

      I think this is a very valid discussion to have, thanks a lot for this great comment. I have already mentioned that BSON is not a magic wand for your application.

      With that said, what Web API really is all about, is building a RESTful API with numerous media types support. If you could offer BSON as one of the content type options that the client can request from you, it only increases the flexibility and broadens the possibilities for consumption of your API. And it’s really up to the client to determine whether BSON, XML or any other media type you offer from your REST API is something he wants to go with. I second your debugging worry, indeed it is much harder to do that with binary content; but that would really have to be the burden of the client. You have to remember that APIs are not consumed just by the browsers, but by a wide variety of clients within different contexts. And there are valid use cases for anything.

      As a side note, the BSON support baked into JSON.NET is tremendous and performs unbelievably well, outperforming JSON in most, if not all, of the tests. There are also additional benefits, such as ease of accessing the response, as BSON is great to pull information out of.

  • Mabit

    This post is really remarkable, particularly because I was investigating this topic recently.

    Thanks a lot and great job.

  • http://msprogrammer.serviciipeweb.ro/ Andrei Ignat

    How do you send binary data from JavaScript?

  • http://www.domlia.com Domlia

    Nice work, Thanks for sharing.

  • http://www.dotnetjalps.com jalpesh vadgama

    Nice work!!