Github Java Libraries

What are the top Java libraries used by some of the most popular projects on Github? We analyzed 60,678 dependencies to find out.

We like backing up everything we say with data, which is why some people claim we’re not that fun at parties. Obviously, they’re going to the wrong parties. In this post we’ve looked into 60,678 import statements of 11,939 unique Java libraries that are used by the top 5,216 Java projects on Github – and extracted the top 100 to a list. Or, as we like to call it, a fun way to spend a rainy weekend.

There’s a tension between new rising technologies and the good ol’ tried and tested libraries that we all like to use. The new libraries and frameworks tend to generate more buzz, up to the point where it seems everybody is using them and you’re left behind the curve. This is often NOT the case, and this post brings the numbers to prove it.

Without Further Ado: The Top 20 Java Libraries

Github Top 20 Java Libraries

The Main Insights From the Top 100 Libraries List

The Unexpected

Hadoop Blows Spark Out of the Water – Hadoop comes in at #42 with no mention of Apache Spark in the top 100 list whatsoever. Apache Zookeeper made it to #75, helping maintain Hadoop clusters and keeping the elephants at bay.

Log4j is 2x More Popular Than Logback – Log4j, which is used in 16.76% of the projects we examined, clearly outruns Logback, which serves as the logging engine behind only 8.45% of the top projects.

SQL > MongoDB > PostgreSQL – The Java SQL connector came in at #27, MongoDB showed up at #87, and PostgreSQL barely made the list at #97.

ElasticSearch has the Most Justified Buzz Around a Java Library – ElasticSearch, the search server based on Apache Lucene (which made #90 in the list), the E in the ELK stack, and a personal favorite of ours, is the library with the most justified buzz we have on the list.

And… The Usual Suspects

JUnit is the Undisputed King of Java Libraries – With 3,345 entries, JUnit shows up in 64% of Github’s top Java projects. Followed by spring-test on the Spring front and testng, these are the top 3 Java testing libraries that we saw in the top 20 list.

SLF4J is the Most Popular Logging Library – Whether you’re using Log4j, Logback or any other logging engine, with 1,184 entries, over 22% of Github’s top Java projects are using slf4j as their logging facade.

14 Out of the Top 100 Libraries are Coming From the Spring Framework – The most popular framework among the top 100 libraries (even more than apache-commons which has 12 libraries in the top 100), with spring-context as its most popular library.

Google Guava Rocks the Charts as the #4 Most Popular Java Library – Guava has 815 entries, which make up 15.6% of Github’s top Java projects. We actually love using Guava here at Takipi as well and recently published a post about some of its useful yet lesser known features.

apache-commons is Really Common Coming in at #5 – With its top representative holding 659 import statements (12.63%) in Github’s top Java projects and 12 of its libraries in the top 100, apache-commons continues to justify its name.

Mockito is the Most Popular Java Mocking Framework – 559 entries (10.72%) show that mocking makes it big in Java, ranking as the 7th most popular library.

Developers Love Using joda-time – This comes as no surprise, but it’s interesting to see the joda-time library by Stephen Colebourne reach the 18th place.

5 More Entries Worth Mentioning

#65 – Bukkit – The only gaming library in the top 100 list, you guessed it right, Minecraft servers.
#66 – Jetty – Because Netty didn’t make it to the list.
#81 – PowerMock – A fresh entry to the top 100 list, which states that “it can be used to solve testing problems that are normally considered difficult or even impossible to test”.
#90 – Google Protobuf – A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
#100 – AssertJ – Rising in popularity over the last year and also included in the new version of Dropwizard, one of the popular new testing libraries, accepting migrations from FEST Assert.

Top 100 Libraries by Type

Github Top 100 Java Libraries by Type

To get a better sense of the types of libraries that gather the most attention from the Java community, we’ve plotted the top 100 by type and their number of uses in Github’s most popular Java projects.

These numbers, where do they come from?

Let’s add some context to the stats. For starters, we pulled the top 25,000 Java projects from Github by stars. In the second step, we extracted the ones that use either Maven or Ivy for dependency management to gain quick access to their pom.xml / ivy.xml dependencies, which left us with 5,216 projects. Now that we had thousands of XML dependency files on hand, it was time to get a beer. Once we ran out of beer, we crunched the data and got a total of 60,678 records of libraries in use, with 11,939 unique libraries among them. This means the average Github project in our dataset uses 11.6 external libraries. To make the analysis easier, we processed the stats for the top 100 libraries by the number of Github projects they appear in, and added some classification by type of library just for the heck of it.
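For the curious, here is a rough sketch of what the counting step could look like for the Maven side of things. It is not the script we actually ran: the class name DependencyCounter and its methods are made up for illustration, it ignores ivy.xml entirely, and it naively picks up every dependency element in a pom.xml (including ones under dependencyManagement or inside plugin definitions). It only uses the XML parser that ships with the JDK.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class DependencyCounter {

    /** Returns the distinct "groupId:artifactId" pairs declared in one pom.xml. */
    public static Set<String> librariesIn(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)));
            Set<String> libraries = new HashSet<>();
            // Simplification: matches every <dependency> element in the file.
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                String groupId = childText(dep, "groupId");
                String artifactId = childText(dep, "artifactId");
                if (groupId != null && artifactId != null) {
                    libraries.add(groupId + ":" + artifactId);
                }
            }
            return libraries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse pom.xml", e);
        }
    }

    // Reads a direct child only, so <exclusion> entries nested deeper are skipped.
    private static String childText(Element parent, String tagName) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equals(child.getNodeName())) {
                return child.getTextContent().trim();
            }
        }
        return null;
    }

    /** Counts, for each library, the number of projects that declare it. */
    public static Map<String, Integer> countProjectsPerLibrary(List<String> poms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String pom : poms) {
            // A library counts once per project, however often it is declared.
            for (String library : librariesIn(pom)) {
                counts.merge(library, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Average number of library records per project. */
    public static double averageLibrariesPerProject(int records, int projects) {
        return (double) records / projects;
    }

    public static void main(String[] args) {
        String pomA = "<project><dependencies>"
                + "<dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>"
                + "</dependencies></project>";
        String pomB = "<project><dependencies>"
                + "<dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>"
                + "<dependency><groupId>org.slf4j</groupId><artifactId>slf4j-api</artifactId></dependency>"
                + "</dependencies></project>";
        System.out.println(countProjectsPerLibrary(List.of(pomA, pomB)));
        // The dataset totals from this post: 60,678 records over 5,216 projects.
        System.out.println(String.format(Locale.ROOT, "%.1f",
                averageLibrariesPerProject(60678, 5216)));
    }
}
```

Counting each library once per project, rather than once per declaration, matches the way the top 100 is ranked by the number of projects a library appears in. The last line of main redoes the division behind the 11.6 figure.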

The raw data is available right here and you’re welcome to take a look and make sure we didn’t miss any interesting insights. Although the beer drinking phase was an essential part of this research, the numbers are accurate.

Further Reading

Another interesting analysis comes from apiwave, who looked into the top Java APIs used by number of classes at each client’s project. The analysis was inspired by a previous post we published in November 2014.

And what about the top tools Java developers use?

We’ve got you covered right here: The Top 15 Tools Java Developers Use After Major Releases

Seeing anything that we missed in the data? Please let us know in the comments section below.

This post is now in Spanish.

Takipi shows you when and why your code breaks in production. It detects caught and uncaught exceptions, HTTP and log errors, and gives you the code and variable state when they happened. Get actionable information, solve complex bugs in minutes. Installs in 5-min. Built for production.

Some kind of monster @ OverOps, GDG Haifa lead.
  • tekoyaki

    Would love to see this for other languages.

  • Emmanuel Bourg

    Nice, I would suggest merging the results for commons-lang and commons-lang3, these are just two versions of the same library even if the import statements are slightly different for technical reasons.

    • http://www.cowtowncoder.com/blog/blog.html Tatu Saloranta

      Similarly, Jackson 1.x and 2.x are ranked at #22 and #25, respectively, would be nice if there was a way to include proper coverage (may not be able to just add them up as both are technically possible to include).

      Another improvement would be to leave out platform APIs like servlet-api, which isn’t really a library but provided by runtime container.

      Still interesting to see these even with slight redundancy.

  • ryanlr

    A related article – Top 1000 classes from 10k open-source Java projects. http://www.programcreek.com/2014/09/top-100-classes-used-in-java-projects/

  • http://blog.mattnworb.com Matt Brown

    Most people using logback probably never import any of its classes and just use slf4j’s interface.

    • jfraney

      Agree.

      But things are muddy. It’s unclear whether they counted imports or maven dependencies. And if maven dependencies, it’s unclear if build-time vs run-time distinctions were important. Even that distinction cannot be accurate because projects don’t bother with the ‘runtime’ scope to identify a log framework (like log4j) as ONLY a RUNTIME option.

      log4j is at 17% and slf4j-log4j12 is at 12%. This implies that perhaps up to 12% of log4j use is really as a RUNTIME option. Nobody should have a build time dependency on slf4j-log4j12, as per the intention of the ‘facade’ pattern (the ‘f’ in slf4j). This could mean the build time dependencies, actually ‘import’ of log4j library, could be as low as 5%.

      Also, 2% are using log4j-over-slf4j which would push log4j as a RUNTIME downward.

      24% are using some slf4j- backend (slf4j-log4j12, logback, slf4j-jdk14, simple). This is close to those using slf4j-api (22%). Logback represents about 32% of those back-ends. That is a significant acceptance rate.

      It’s muddy.

  • kuldeep

    About JUnit! It has the advantage that when you create maven based project, it is automatically present as a dependency. Second, github projects are generally in experimental phase, where focus is more on functionality than logging/testing. So, when same projects evolve from experimental stage to a matured state, they may go for Mock based testing framework and some SLF4J based logging.

  • http://www.javadiscover.blogspot.com/ Anand Kumar

    You can visit my blog for most Java interview questions and programs – http://javadiscover.blogspot.com

  • Jason Rembert

    Very nice article. If you want to download these libraries you download it here: http://jar-download.com/