What are the most popular libraries Java developers use? 2017 edition

It feels like only yesterday we were scraping data from GitHub to discover the top Java libraries of 2016, and all of a sudden another year has passed. This year, we’re kicking the data crunch up a notch and bringing Google BigQuery into the mix to get more accurate results.

For this year’s data crunch, we’ve changed the methodology a bit, thanks to Google BigQuery. First, we pulled the top 1,000 Java repositories from GitHub by stars. Once we had the most popular Java projects on GitHub, we filtered out Android and focused only on the 477 pure Java projects that remained.

After filtering the projects, we counted the unique imports within each of them and summed the counts across all projects. A deeper walkthrough of the research process is available at the bottom of this post.

Without further ado, it’s time to see the winners and bloomers among 2017’s most popular Java libraries. Who will sit on the Java throne?

The top 20 Java libraries

Holding the same position as last year, JUnit is the most popular Java library on GitHub. It also takes 2nd place with the extended JUnit abstract Runner class, and you can even find it in 3rd place, with the older junit.framework.

Mockito, the open source testing framework, is now the 4th most popular Java library. Rounding out the top 5 is slf4j, the logging facade for Java. Its popularity emphasizes developers’ dependence on logging and shows the low usage of the standard java.util.logging library. We’ve recently taken a deeper look at the most common logging habits among Java developers and published the research as an extensive eBook. You can check it out right here.

The rise of Hamcrest, a framework that assists writing tests within JUnit and jMock, is another sign that developers need better testing environments.

Creating a better debugging environment

The top positions held by libraries aimed at producing better code underline the importance of testing. They also highlight the fact that production errors are one of the biggest pains developers have to face, so it’s no wonder they try to avoid them as much as they can. The need to solve this pain is one of the main reasons we built OverOps.

Debugging in production consists of sifting through log files, trying to reproduce the variable state that caused the error. OverOps provides engineers with the exact variable state behind any exception, logged error or warning. It lets you see the complete source code and variable state across the entire call stack of the error, even if it wasn’t printed to the log file.

We’d be happy to show you how it works. Click here to schedule some time for us to meet.

Top trends and noticeable libraries

Within the top 20 libraries we can see a representation for the popular Google Guava libraries, more uses of the JUnit framework and an increased use of javax libraries. We can also see that the most popular JSON library is Jackson.

At #20 we can see a new name popping up that we didn’t notice on last year’s top 20: org.w3c.dom, which provides the interfaces for manipulating the DOM (Document Object Model). Also, taking a broader look at the top 100 list, we can see that Spring has a wide representation, with the following 8 libraries:

#57 – org.springframework.beans.factory.annotation
#60 – org.springframework.context
#65 – org.springframework.context.annotation
#66 – org.springframework.stereotype
#68 – org.springframework.util
#81 – org.springframework.test.context.junit4
#85 – org.springframework.beans.factory
#91 – org.springframework.web.bind.annotation

Another trend we were able to detect is the wide use of Apache libraries:

#16 – org.apache.commons.io
#22 – org.apache.http
#24 – org.apache.commons.lang
#25 – org.apache.http.impl.client
#30 – org.apache.http.client
#33 – org.apache.http.client.methods
#34 – org.apache.log4j
#35 – org.apache.commons.codec.binary
#45 – org.apache.commons.lang3
#53 – org.apache.http.entity
#61 – org.apache.http.util
#64 – org.apache.commons.logging
#75 – org.apache.http.message
#88 – org.apache.zookeeper
#95 – org.apache.hadoop.conf
#98 – org.apache.http.client.config
#100 – org.apache.http.client.utils

One of the notable changes in the chart is the rise of AssertJ, a library that provides a fluent interface for writing assertions. This year it climbed to #50, which suggests that the most popular projects put a big emphasis on best practices such as testing. At the bottom of the spreadsheet we can find the scripting API javax.script and org.apache.http.client.utils, a builder for URI instances.

Feel free to explore the full top 100 Java libraries list right here.

How did we do it?

As we mentioned at the beginning of the post, this year we used Google BigQuery to crunch data from GitHub. We used GitHub’s API to pull the top 1,000 repositories, and extracted the Java libraries these repos use.

After filtering out Android, Arduino and deprecated repos, we were left with 259,885 Java source files. Then, we removed duplicate uses of the same library in the same repo, and ended up with 25,788 unique libraries.

How did we actually do it? With the kind help of Guy Castel from the OverOps R&D team, and some SQL queries. First, we wanted to create the top repositories table, called java_top_repos_filtered:

Now that we had the names of the top repositories, we pulled all of their content:

After we had the source files for each project, we wanted to pull all of their unique import statements. In the following query, we extracted the package name and made sure it was counted just once per project:
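As a rough illustration of that step, here is a minimal Python sketch. It is not the BigQuery SQL we ran. It assumes each repo's Java sources are already in memory as strings, and it uses a naming heuristic (package segments are lowercase, class names start with an uppercase letter) to cut an import down to its package. The names `IMPORT_RE`, `package_of` and `count_packages` are made up for this example.

```python
import re
from collections import Counter

# Captures the qualified name in lines such as
#   import org.junit.Test;
#   import org.junit.*;
#   import static org.mockito.Mockito.mock;
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([A-Za-z_][\w.]*)", re.MULTILINE)


def package_of(qualified_name):
    """Return the package part of an imported name.

    Heuristic (an assumption, not a Java rule): keep leading segments
    until the first one that is empty or starts with an uppercase letter.
    """
    parts = []
    for segment in qualified_name.split("."):
        if not segment or segment[0].isupper():
            break
        parts.append(segment)
    return ".".join(parts)


def count_packages(repos):
    """Count, for each package, the number of repos that import it.

    `repos` maps a repo name to a list of Java source strings.
    A package is counted at most once per repo, however often it appears.
    """
    counts = Counter()
    for sources in repos.values():
        seen_in_repo = set()
        for source in sources:
            for name in IMPORT_RE.findall(source):
                package = package_of(name)
                if package:
                    seen_in_repo.add(package)
        counts.update(seen_in_repo)
    return counts
```

Calling `count_packages` on a mapping of repo names to source strings returns a `Counter`, so `counts.most_common(20)` would give a top 20 in the spirit of the chart above.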

The final step was filtering the results again, making sure that no Android, Arduino, deprecated or standard Java libraries had slipped through our query-cracks:
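The package-level part of that cleanup can be sketched in Python as well. Again, this is an illustration and not the query itself: the prefix list below is an assumption chosen for the example (note that it drops `java.*` but keeps `javax.*`, which does appear in our top 100), and `is_excluded` and `top_libraries` are hypothetical helper names.

```python
# Assumed prefixes for this example only; the real filter may have differed.
EXCLUDED_PREFIXES = ("java", "android", "com.android")


def is_excluded(package, prefixes=EXCLUDED_PREFIXES):
    """True if the package equals an excluded prefix or sits below it."""
    return any(package == p or package.startswith(p + ".") for p in prefixes)


def top_libraries(counts, limit=100):
    """Drop excluded packages, then rank the rest by count (ties by name)."""
    kept = {pkg: n for pkg, n in counts.items() if not is_excluded(pkg)}
    ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Matching on `prefix + "."` rather than on the bare prefix is what keeps `javax.script` in the results while `java.util` is removed.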

And there you have it, the top Java libraries of 2017.

Final thoughts

Using Google BigQuery paid off, and we got a much more detailed overview of the libraries being used within the top GitHub projects.

The main conclusion is that most of the libraries that were popular in 2016 are still on top in 2017. The way we see it, this means that the developers, teams and/or companies behind these libraries are working hard at keeping them relevant and up-to-date.

It also means that if you’re planning on starting your own Java project, our spreadsheet could offer some good references to the libraries you should use.

Found other interesting libraries within our spreadsheet? We’d love to hear about them in the comments below.

I write about Java, Scala and everything in between. Lover of gadgets, apps, technology and tea.
  • bithead2

    thanks for writing this up

    • Henn Idan

      Thanks for the comment, glad you liked the post 🙂

  • http://www.leghrib.com/index/index.php Leghrib Badreddine

    thanks you 🙂

  • Christian Schwarz

    You write about top libraries but your chart contains package names? All 4 Junit entries belong to the very same lib, also all entries of google commons belong to guava.

    • Kisna

      Exactly, there should have been another graph that showed grouping by the actual library!

    • TatuSaloranta

      Besides this there are other examples of Java packages from same library being counted more than once: Jackson databind contributes 4 entries (plus 1 entry for its annotations and streaming-core packages). It is not a trivial problem to solve of course, but I wonder if other approaches (like scanning Maven poms like http://mvnrepository.com/ does) may have edge here.

      But even if this was manually reconstructed (most cases are easy enough to figure out) it’d be nice — however, it is not possible to simply add up counts since there’s obvious overlaps (one project is likely import classes from same set of related libraries).

  • Billy Fackwit

    meh. Oh, you forgot to mention the point of this? In fact it has little point. It doesn’t advise people how to do their job, or what to choose and why.

    It simply treats people as sheep and implies you should do like the majority do perhaps … as I said, meaningless