Which JSON library for Java can parse JSON files the fastest?

JSON is the accepted standard these days for transmitting data between servers and web applications, but like many things we’ve accepted, it’s easy to take it for granted and not put much further thought into it. We often don’t think about the JSON libraries we use, but there are some differences between them. With that in mind, we ran a benchmark test to see how fast four of the most popular JSON libraries for Java parse different sizes of files. Today, we’re sharing those results with you guys.

JSON is often used to transport and parse big files. This is a scenario that’s common in data processing applications running in Hadoop or Spark clusters. Given the size of these files, you can be looking at significant differences in parsing speed between libraries.

Small files come up all the time as incoming requests at high throughput, and parsing them happens quickly, so the differences in performance may not seem to be a big deal at first. But the differences add up, because you often need to parse lots of small files in rapid succession during times of heavy traffic. Microservices and distributed architectures often use JSON for transporting these kinds of files, as it’s the de facto format for web APIs.

Not all JSON libraries perform the same. Picking the right one for your environment can be critical. This benchmark can help you decide.

The JSON Libraries

For the benchmark tests, we looked at four major JSON libraries for Java: JSON.simple, GSON, Jackson, and JSONP. All of these libraries are widely used for JSON processing in a Java environment, and were chosen according to their popularity in GitHub projects. Here are the ones we tested:

  • Yidong Fang’s JSON.simple (https://github.com/fangyidong/json-simple). JSON.simple is a Java toolkit for encoding and decoding JSON text. It’s meant to be a lightweight and simple library that still performs at a high level.
  • Google’s GSON (https://github.com/google/gson). GSON is a Java library that converts Java Objects into JSON and vice versa. It provides the added benefit of full support for Java Generics, and it doesn’t require you to annotate your classes. Not needing to add annotations makes for simpler implementation and can even be a requirement if you don’t have access to your source code.
  • FasterXML’s Jackson Project (https://github.com/FasterXML/jackson). Jackson is a group of data processing tools highlighted by its streaming JSON parser and generator library. Designed for Java, it can also handle other non-JSON encodings. It’s the most popular JSON parser, according to our findings on GitHub usage.
  • Oracle’s JSONP (https://jsonp.java.net/). JSONP (JSON Processing) is a Java API for JSON processing, namely around consuming and producing streaming JSON text. It’s the open source reference implementation of JSR 353.

The Benchmark

We ran a benchmark test on the libraries for both big files and small files. The requirements (and therefore performance) for handling different file sizes are different, as are the environments in which the need to parse these files arises.

The benchmark tested two key scenarios: parsing speed for big files (190 MB) and parsing speed for small files (1 KB). The big files were taken from here: https://github.com/zeMirco/sf-city-lots-json. The small files were randomly generated from here: http://www.json-generator.com/.

For both big and small files, we ran each file 10 times per library. Given the size of the big file, we did 10 iterations per run for each library. Each small file was iterated 10,000 times per run for each library. For the small files test, we didn’t retain the files in memory between iterations and the test was run on a c3.large instance on AWS.
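The run-and-iteration structure described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: `ParseTimer`, `timeRuns`, and the stand-in parser are hypothetical names, and the sketch passes an in-memory string, whereas the benchmark did not keep the small files in memory between iterations.

```java
import java.util.function.Consumer;

// Hypothetical harness: the class and method names are illustrative only.
final class ParseTimer {

    // Performs `runs` timed runs of `iterations` parse calls each and
    // returns the elapsed wall-clock time of every run in milliseconds.
    static double[] timeRuns(Consumer<String> parser, String json, int runs, int iterations) {
        double[] millisPerRun = new double[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                parser.accept(json);
            }
            millisPerRun[r] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millisPerRun;
    }

    // Arithmetic mean of the per-run times.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        // Stand-in for a real library call: it only counts opening braces.
        Consumer<String> standInParser =
                s -> sink[0] += s.chars().filter(c -> c == '{').count();

        // Small-file shape from the text: 10 runs of 10,000 iterations each.
        double[] runs = timeRuns(standInParser, "{\"id\": 1, \"tags\": [\"a\", \"b\"]}", 10, 10_000);
        System.out.printf("average per run: %.3f ms%n", mean(runs));
    }
}
```

To time a real library, the stand-in lambda would be replaced with that library's parse call; the big-file shape would use 10 runs of 10 iterations instead.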

The results for the big file are shown in full below, but I’ve further averaged the results for the small files in the interest of space. To view the extended results, go here. If you want to view the source code for the small files or the libraries, go here.

Big File Results

table 1

Big differences here! Depending on the run, Jackson and JSON.simple traded fastest times, with Jackson edging out JSON.simple in aggregate. Looking at the average result across all the test runs, Jackson and JSON.simple come out well ahead on the big file, with JSONP a distant third and GSON far in last place.

Let’s put that in percentage terms. Jackson is the winner in average time across all the runs. Looking at the numbers from two different angles, here are the percentage results:

table 3

Those are big differences between the library speeds!
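One plausible reading of the "two different angles" is that the same pair of average times can be expressed relative to the faster library or relative to the slower one. The sketch below assumes that reading; the class name, method names, and the sample times are hypothetical and are not taken from the benchmark results.

```java
// Hypothetical helper for expressing one timing gap as two different percentages.
final class PercentCompare {

    // How much longer the slower time is, as a percentage of the faster time.
    static double percentSlower(double slowMs, double fastMs) {
        return (slowMs - fastMs) / fastMs * 100.0;
    }

    // How much time the faster library saves, as a percentage of the slower time.
    static double percentFaster(double fastMs, double slowMs) {
        return (slowMs - fastMs) / slowMs * 100.0;
    }

    public static void main(String[] args) {
        // Made-up average times, for illustration only.
        double fastMs = 100.0;
        double slowMs = 150.0;
        System.out.printf("slower library takes %.1f%% longer%n", percentSlower(slowMs, fastMs));
        System.out.printf("faster library saves %.1f%% of the time%n", percentFaster(fastMs, slowMs));
    }
}
```

The two figures describe the same gap, so it matters which baseline a percentage is quoted against.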

Takeaway: It was a photo finish, but Jackson is your winning library. JSON.simple is a nose behind and the other two are in the rearview mirror.

Small Files Results

The table above shows the average of 10 runs for each file, and the total average at the bottom. The tally for fastest library on number of files won is:

  • GSON – 14
  • JSONP – 5
  • Jackson – 1
  • JSON.simple – 0

That seems pretty telling. Looking at the average result for all the test runs across all the files, GSON is again the winner, with JSON.simple and JSONP taking a distinct second and third place, respectively. Jackson came in last. So despite not being the fastest on any single file, JSON.simple is the second fastest in aggregate. And despite being the fastest on a handful of files, JSONP is well in third place in aggregate.

It’s interesting to note that despite being the slowest library here, Jackson is very consistent across all the files, while the other three libraries are occasionally much faster than Jackson, but on some files end up running at about the same speed or even slightly slower.
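The gap between "files won" and aggregate average is easy to reproduce with a toy data set. The sketch below is only an illustration: the class name, the three anonymous libraries, and every number in it are made up, and it assumes one average time per file per library.

```java
// Hypothetical illustration of why a win tally and an aggregate mean can disagree.
final class WinsVsAverage {

    // timesByFile[file][library]; returns how many files each library was fastest on.
    static int[] countWins(double[][] timesByFile) {
        int[] wins = new int[timesByFile[0].length];
        for (double[] row : timesByFile) {
            int best = 0;
            for (int lib = 1; lib < row.length; lib++) {
                if (row[lib] < row[best]) {
                    best = lib;
                }
            }
            wins[best]++;
        }
        return wins;
    }

    // Mean time per library across all files.
    static double[] meanPerLibrary(double[][] timesByFile) {
        double[] means = new double[timesByFile[0].length];
        for (double[] row : timesByFile) {
            for (int lib = 0; lib < row.length; lib++) {
                means[lib] += row[lib];
            }
        }
        for (int lib = 0; lib < means.length; lib++) {
            means[lib] /= timesByFile.length;
        }
        return means;
    }

    public static void main(String[] args) {
        // Made-up times in ms for three unnamed libraries (columns) on four files (rows).
        double[][] times = {
            {1.0, 1.2, 3.0},
            {1.0, 1.2, 3.0},
            {1.0, 1.2, 3.0},
            {1.5, 1.2, 1.1},
        };
        System.out.println(java.util.Arrays.toString(countWins(times)));
        System.out.println(java.util.Arrays.toString(meanPerLibrary(times)));
    }
}
```

In this toy data the middle library never wins a file, yet its mean is second best, while the third library wins one file and still has the worst mean.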

Let’s put the numbers in percentage terms, again looking at the numbers from two different angles:

table 4

Compared to the big file tests, these are smaller differences, but still quite noticeable.

Takeaway: Bad luck for JSON.simple, as it again loses a close race, but GSON is your winner. JSONP is a clear third and Jackson brings up the rear.

Conclusion

Parsing speed isn’t the only consideration when choosing a JSON library, but it is an important one. Upon running this benchmark test, what we found was that there is no one library that blows the others away on parsing speed across all file sizes and all runs. The libraries that performed best for big files suffered for small files and vice versa.

Choosing which library to use on the merit of parsing speed comes down to your environment then.

  • If you have an environment that deals often or primarily with big JSON files, then Jackson is your library of interest. GSON struggles the most with big files.
  • If your environment primarily deals with lots of small JSON requests, such as in a microservices or distributed architecture setup, then GSON is your library of interest. Jackson struggles the most with small files.
  • If you end up having to often deal with both types of files, JSON.simple came in a very close 2nd place in both tests, making it a good workhorse for a variable environment. Neither Jackson nor GSON performs as well across multiple file sizes.

As far as parsing speed goes, JSONP doesn’t have much to recommend it in any scenario. It performs poorly for both big and small files compared to other available options. Fortunately, Java 9 is reportedly getting a native JSON implementation, which one would imagine is going to be an improvement over the reference implementation.

So there you have it. If you’re concerned about parsing speed for your JSON library, choose Jackson for big files, GSON for small files, and JSON.simple for handling both. Let me know if you have any thoughts on this benchmark in the comments.

Further reading:

5 Error Tracking Tools Java Developers Should Know

Josh does product marketing for Takipi. He's a big baseball fan and a small beer nerd.
  • http://ruedigermoeller.github.io/ Rüdiger Möller

    Should be common knowledge as of today that you need to warm up properly. In addition you are repeating parser initialization with each test (usually jackson ObjectMapper is created once and reused).

  • Aku Ankka

    Is there a link for actual tests? I tried following things linked to, but did not see one for test cases. As per Rüdiger’s comments, proper warm-up is a must, as well as reuse (especially for small input). But beyond that, what exactly is being tested? JSON as simple Lists, Maps, or bound to POJOs? Reads and writes, or just reads?

    • Aku Ankka

      Looks like the code is available here:

      https://github.com/terencetaih/aws-speed/tree/master/JsonProcess/src

      and the read-only tests are for reading JSON as tree (node) representation, native to each library in question.

      Content is directly read from a file, which adds some common overhead, could be read in memory first. But probably not a big deal as OS tends to cache that.

      Rüdiger’s comment is correct regarding Jackson, not sure if other implementations would benefit or need reuse — I think they are ok. But for Jackson, ObjectMapper reuse is a must since all caching and reuse is handled via mapper instance.

      • Tal Weiss

        For the big file scenario, this will probably run within the context of a job (e.g. Spark), so caching of the mapper for the most part will not play a role there. For smaller files, which is a scenario more common for a web service processing inbound requests, the mapper can be declared as static and final, as it does seem thread safe. The question in that scenario is how big of a change that will really make (especially with a large variance in the inbound data).

        • Aku Ankka

          You are right in that from performance perspective it will not make as much relative difference for large input, since startup cost is somewhat fixed, and ends up being smaller portion of the total time spent.

          But the reuse of an ObjectMapper via static instance, or a singleton provided via Dependency Injection (Guice), or, if you prefer, ThreadLocal (although there is no real benefit over static) is the best practice for Jackson, and one is never to create one-off ObjectMapper. Create-use-drop is an anti-pattern, unless it is truly single use (like command-line tool). So it is not as much an optimization as the proper way to do it.

          Hope this helps!

          • Tal Weiss

            Totally agree and great comment. Thanks!

          • http://ruedigermoeller.github.io/ Rüdiger Möller

            Reusing the mapper in Jackson makes a BIG difference for smaller files (various buffers are stored there and are reused behind the scenes). Also I’d like to point out you should use ThreadLocal for heavily multithreaded high load servers due to possible lock contention. For this test it does not matter
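The two points raised in this thread, warming up before measuring and building the parser once rather than per call, can be sketched without any particular library. Everything here is an assumption for illustration: `WarmedUpTimer` and `averageNanos` are hypothetical names, and `parserFactory` stands in for an expensive setup step such as constructing a Jackson ObjectMapper. A serious benchmark would more likely rely on a dedicated tool such as JMH, which is suggested elsewhere in the comments.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: build the parser once, warm up, then measure.
final class WarmedUpTimer {

    // Returns the average time per parse call in nanoseconds.
    static double averageNanos(Supplier<Function<String, Object>> parserFactory,
                               String json, int warmupIterations, int measuredIterations) {
        // The potentially expensive setup happens exactly once and is reused below.
        Function<String, Object> parser = parserFactory.get();

        // Untimed warm-up so the JIT has a chance to compile the hot path.
        for (int i = 0; i < warmupIterations; i++) {
            parser.apply(json);
        }

        // Timed section: only the parse calls themselves are measured.
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            parser.apply(json);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredIterations;
    }

    public static void main(String[] args) {
        // Stand-in "parser" that just trims the input; swap in a real library call.
        double avg = averageNanos(() -> String::trim, " {\"id\": 1} ", 10_000, 10_000);
        System.out.printf("average: %.1f ns per call%n", avg);
    }
}
```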

  • Mike

    How about adding Eclipse MOXY to the benchmark?

    • Aku Ankka

      Unfortunately MOXy does not have native JSON impl. At least it used to bundle Jettison, which is an XML API wrapper over org.json parser, and is both awkward to use and slow. Unless this has changed, there is very little reason to use MOXy for JSON, unlike for XML where it works well.

      Not that it couldn’t be included, if there is a stand-alone version of its JSON component. And if it is possible to use it in tree-binding style, instead of POJO databinding.

  • http://ruedigermoeller.github.io/ Rüdiger Möller

    a more properly engineered benchmark shows a different picture: http://ruedigermoeller.github.io/fast-serialization/json_bench.html

  • Михаил Бобруцков

    Hey guys, I work on my own json parser (and other tools https://github.com/wizzardo/Tools), here (https://goo.gl/cGXQMY) I compared it with gson, boon and jackson.. Source code you can find at https://github.com/wizzardo/json-benchmarks

    • Aku Ankka

      That looks better in many ways, and I like the explanation of both input data and styles of reading/writing (as Maps and POJOs — both common approaches but with different performance and usability characteristics).

      The only concern to me is that input and output are assumed to be ‘java.lang.String’. This may be relevant for some use cases, but in many other cases it is not: web frameworks typically expose byte streams for efficiency.
      So it would be good to explain bit on why specific setup was chosen. I mention this partly because performance difference between String input and InputStream is quite significant for case of Boon/Jackson comparison.

      • Михаил Бобруцков

        I wanted to benchmark parsing performance, so if we have String – we have char array, otherwise if we have Stream – we need to convert bytes to chars first

        I’ve slightly changed benchmark code to test streams (only for 2.5MB cities.json):
        Deserialization Score ops/s
        boon_string 22.04 ops/s
        boon_stream 19.427 ops/s
        jackson_string 23.978 ops/s
        jackson_stream 27.626 ops/s
        tools_string 40.349 ops/s
        tools_stream 31.954 ops/s

        Serialization Score ops/s
        boon_string 19.715 ops/s
        boon_stream 17.621 ops/s
        jackson_string 32.078 ops/s
        jackson_stream 36.152 ops/s
        tools_string 36.725 ops/s
        tools_stream 31.648 ops/s

        after this test, I see some ways to improve my code, thank you =)

        • Aku Ankka

          String vs streams is a tricky question, since not only do use cases vary, but so do decoders. Some JSON codecs just use JDK InputStreamReader, whereas others actually decode straight from byte sequence (Jackson does that, not sure about others).

          I do actually have couple of other suggestions, if you are interested?
          I would have filed an issue or PR, but looks like this is a fork, so those are not available via github.

          It seems like you are using POJO binding for GSON and Jackson deserialization (Map or List of POJOs), whereas Boon uses “untyped” approach (to Object, that is, Lists, Maps etc). This leads to bit of apples to oranges comparison.
          But you can easily change both Jackson and GSON to read into untyped as well; either give type of `java.lang.Object`, or, if you prefer, non-generic `List` or `Map` (depending on data). The reason I mention this is that there is actual performance difference there as well — especially for shorter content.

          Anyway, thank you for sharing the benchmark!

          • Михаил Бобруцков

            Original benchmark used maps and lists, I forked it and rewrote to pojo..

            public Object boon() {
                return JsonFactory.create().readValue(resource, List.class, type);
            }

            where ‘type’ depends on resource, did I make mistake somewhere?

            I think, I need to push this benchmark as independent project

          • Aku Ankka

            Ok. Maybe I misread the code there. I did notice that code differed from the original.

  • Michael Peterson

    Java benchmarks these days should be done using the JMH tool (http://java-performance.info/jmh/). Benchmarking on the JVM is notoriously difficult to get right, as Aleksey Shipilev explains here: https://vimeo.com/78900556.

    In addition, benchmarks should be accompanied by statistical analysis. I ran an ANOVA and t-test evaluation of your large JSON dataset run. The difference between JsonSimple and Jackson is not statistically significant, so you should not conclude that for that dataset Jackson is better. The conclusion should be that JsonSimple and Jackson are basically equivalent in performance for that data set and are both faster than the other competitors to a statistically significant level.
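As a rough illustration of the kind of check described above, the sketch below computes a two-sample t statistic and its degrees of freedom from two sets of run times. It assumes Welch's unequal-variance variant, which may not be the exact test that was run; the class name, method names, and sample data are hypothetical. The resulting statistic would still need to be compared against a t-distribution to obtain a p-value.

```java
// Hypothetical sketch of Welch's two-sample t-test on per-run timings.
final class WelchTTest {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
    static double tStatistic(double[] a, double[] b) {
        double standardError = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // Made-up run times in seconds for two unnamed libraries.
        double[] libraryA = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] libraryB = {2.0, 3.0, 4.0, 5.0, 6.0};
        System.out.printf("t = %.3f, df = %.1f%n",
                tStatistic(libraryA, libraryB), degreesOfFreedom(libraryA, libraryB));
    }
}
```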

    • Fabien Renaud

      Here you go: https://github.com/fabienrenaud/java-json-benchmark
      Tested with JMH, reuse of factories, no disk IO, small payload only.
      Serialization and deserialization evaluated independently for: Jackson, Gson, Genson, FastJson, org.json, jsonp.

      Let me know if I messed anything up 🙂

  • http://www.daily-dev-solutions.com Dor Ben Dov

    Interesting comparison.

  • Aku Ankka

    One more question: does JSON generator (http://www.json-generator.com/) really work? Right now it just seems to hang, not produce anything.

  • Matt Watson

    This is a nice list of JSON performance tips as well that could also be helpful: http://stackify.com/top-11-json-performance-usage-tips/

  • http://imgdj.com Iris Panabaker

    I love JSON and want to share tool which i just found http://jsonformatter.org

  • TatuSaloranta

    For what it is worth, running ‘small’ file (02.json), with 3 second warmup, using single Jackson ObjectMapper gives results like so:

    JsonpParser,471
    FangiongParser,405
    GsonParser,317
    JacksonParser,211

    which are much faster for all parsers, and results that are not very different from what I have seen from other tests.
    It would be great if the problems with the test were resolved, even if tests were not rewritten to use `jmh` or other solid performance test tools.

  • aroth

    Useful article, but would be interested in also seeing memory/gc statistics for each library. Currently using JSON.simple, but seeing really strange spikes in memory consumption, particularly when stringifying my json-objects.

  • Igor Spasić

    And no Jodd Json?

  • http://djakdekiel.pl/ djakdekiel

    I don’t know why but I prefer Jackson

  • mbonaci

    Would be nice to see nogit in this perf test

  • S.l. Kosik

    Hi, Josh. Could you please launch your test on an Android device? Or maybe share the source code so I can do it myself. Thanks in advance.