
The Pareto logging principle: 97% of logged error statements are caused by 3% of unique errors

We received a lot of feedback and questions following the latest data crunching post, where we showed that 97% of logged errors are caused by 10 unique errors. By popular demand, we’ll go a step deeper into the top exception types in the more than 1,000 applications that were included in this research.

Let’s roll.

(btw, this is our first post with a recommended soundtrack, check yo’ self)

Without Further Ado: The Top Exceptions by Types


To pull out the data, we crunched anonymized stats from over 1,000 applications monitored by OverOps’s error analysis micro-agent, and checked what the top 10 exception types were for each company. Then we combined all the data and came up with the overall top 10 list.

Every production environment is different: R&D teams use different 3rd party libraries and have custom exception types of their own. Looking at the bigger picture, though, the standard exceptions stand out and some interesting patterns become visible.


1. NullPointerException – 70% of Production Environments

Yes. The infamous NullPointerException is in at #1. Sir Charles Antony Richard Hoare, inventor of the null reference, was not mistaken when he said:

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965… This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years”.

With a top 10 spot in 70% of the production environments that we looked at, NPEs take first place. At OverOps, we actually have a special alert that lets us know whenever a new NullPointerException is introduced on our system, and you can set one up yourself.

OverOps’s NPE Monster

2. NumberFormatException – 55% of Production Environments

In at #2 is the NumberFormatException, which happens when you try to convert a string to a numeric value and the string is not formatted correctly. It extends IllegalArgumentException, which also makes an appearance here at #3.

One easy fix is to make sure that the input you’re passing to the parse method matches one of these regular expressions:

  1. For integer values: “-?\\d+”
  2. For float values: “-?\\d+\\.\\d+”
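Here's a minimal sketch of that pre-check in Java, using the two patterns above with the decimal point escaped so that it only matches a literal dot. The class and method names are hypothetical, made up for this example. The patterns are deliberately strict, and a string of digits can still be too large for an int, so the sketch keeps a catch block as a safety net.

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SafeNumbers {
    // Same patterns as in the list above; the dot is escaped in the float pattern.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");
    private static final Pattern FLOAT = Pattern.compile("-?\\d+\\.\\d+");

    static boolean looksLikeInteger(String s) {
        return s != null && INTEGER.matcher(s).matches();
    }

    static boolean looksLikeFloat(String s) {
        return s != null && FLOAT.matcher(s).matches();
    }

    // Returns an empty OptionalInt instead of throwing on malformed input.
    static OptionalInt parseIntOrEmpty(String s) {
        if (!looksLikeInteger(s)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // Digits only, but outside the int range (for example "99999999999").
            return OptionalInt.empty();
        }
    }
}
```

Calling `SafeNumbers.parseIntOrEmpty(userInput)` leaves it to the caller to decide what a missing value means, instead of letting the exception bubble up into the logs.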

3. IllegalArgumentException – 50% of Production Environments

Next up at #3, IllegalArgumentException, appearing in the top 10 exceptions in 50% of the production environments in this survey.

An IllegalArgumentException actually saves you from trouble. It’s thrown when a method is passed an illegal or inappropriate argument. For example, some method expects a value in a certain range or format and you’re calling it with something else. (A plain type mismatch, passing type Y where type X is expected, is normally caught by the compiler and only shows up at runtime in cases like reflective calls.) Once again, an error that’s caused by not checking what you’re sending out as input to other methods.
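As a small illustration, here is a sketch of a method that checks its own input and rejects a bad value right at the door, with a message that says what was expected. The class, the method and the 0 to 100 range are all assumptions made up for this example.

```java
// Hypothetical example class; the names and the range are illustrative only.
final class Percentages {
    // Rejects out-of-range input up front instead of letting it propagate.
    static double toFraction(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException(
                "percent must be between 0 and 100, got " + percent);
        }
        return percent / 100.0;
    }
}
```

The caller's side of the fix is the same check, done before the call, so the exception never gets thrown in the first place.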

IllegalArgumentException OverOps Monster

4. RuntimeException – 23% of Production Environments

Most of the exception types in the top 10 list are unchecked and extend RuntimeException; the checked ones are Exception itself, NoSuchMethodException, ParseException and InvocationTargetException. However, at #4 we’re facing a “pure” RuntimeException, which the Java language itself doesn’t actually throw. So what’s going on here?

There are 2 main use cases to explicitly throw a RuntimeException from your code:

  1. Throwing a new “generic” unchecked exception
  2. Rethrows:
    • “Wrapping” a general unchecked exception around another exception that extends RuntimeException
    • Making a checked exception unchecked
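The last use case, making a checked exception unchecked, can be sketched like this. Both methods are hypothetical stand-ins; the point is only the catch-and-rethrow shape, with the original exception passed along as the cause.

```java
import java.io.IOException;

// Hypothetical example; readConfig stands in for any method that declares a checked exception.
final class Rethrow {
    static String readConfig(String name) throws IOException {
        throw new IOException("cannot read " + name);
    }

    // Callers of this method need neither a throws clause nor a try/catch.
    static String readConfigUnchecked(String name) {
        try {
            return readConfig(name);
        } catch (IOException e) {
            // Keep the original as the cause so its stack trace is not lost.
            throw new RuntimeException("failed to load config " + name, e);
        }
    }
}
```

This is also one way a "pure" RuntimeException ends up in the logs: the interesting part is in its cause, not in the wrapper.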

One famous story around checked vs. unchecked and the last use case we described here comes from Amazon’s AWS SDK which ONLY throws unchecked exceptions and refuses to use checked exceptions.

 

OverOps RuntimeExceptionMonster

5. IllegalStateException – 22% of Production Environments

In at #5, featured in the top 10 exceptions in 22% of the more than 1,000 applications covered in this post, is the IllegalStateException.

An IllegalStateException is thrown when you’re trying to use a method at an inappropriate time, like… this scene with Ted and Robin in the first episode of How I Met Your Mother.

A more realistic Java example would be if you use URLConnection, try to do something that assumes you’re not connected yet, and get “IllegalStateException: Already connected”.
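To show the pattern without needing a network, here is a sketch of a made-up class with the same kind of guard. It is not URLConnection's actual code, just an assumption about what such a state check typically looks like.

```java
// Hypothetical class that mimics a "configure first, then connect" lifecycle.
final class FakeConnection {
    private boolean connected;
    private String header;

    // Only legal before connect() has been called.
    void setHeader(String value) {
        if (connected) {
            throw new IllegalStateException("Already connected");
        }
        header = value;
    }

    void connect() {
        connected = true;
    }

    String getHeader() {
        return header;
    }
}
```

Calling `setHeader` and then `connect` works; calling them in the other order throws, because the argument is fine but the timing isn't.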

6. NoSuchMethodException – 16% of Production Environments

Such Method, Much Confusion. 16% of the production environments in this data crunch had NoSuchMethodException in their top 10.

Since most of us don’t write code while drunk, at least during daytime, this doesn’t necessarily mean that we’re delirious enough to think we’re seeing something that’s not there. If that were the case, the compiler would have caught it way earlier in the process.

This exception is thrown when you’re trying to use a method that doesn’t exist, which happens when you’re using reflection and getting the method name from some variable, or when you’re building against one version of a class and using a different one in production (thanks @braxuss).
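Here's a minimal sketch of the reflection case, where the method name arrives as a string at runtime. The helper class is hypothetical; `Class.getMethod` is the standard reflection call that throws NoSuchMethodException when no matching public method exists.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper for looking up methods whose names are only known at runtime.
final class ReflectionLookup {
    // Looks up a public no-argument method by name; empty if it does not exist.
    static Optional<Method> findNoArgMethod(Class<?> type, String name) {
        try {
            return Optional.of(type.getMethod(name));
        } catch (NoSuchMethodException e) {
            // The compiler could not check this name, so the typo shows up here.
            return Optional.empty();
        }
    }
}
```

With this, `findNoArgMethod(String.class, "length")` finds the method, while a misspelled `"lenght"` comes back empty instead of blowing up.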

7. ClassCastException – 15% of Production Environments

A ClassCastException occurs when we’re trying to cast an object to a class of which it is not an instance. 15% of production environments have it in their top 10 exceptions, which is quite troublesome.

The rule is that you can’t cast an object to a different class which it doesn’t inherit from. Nature did it once, when no one was looking, and that’s how we got the… Java mouse-deer. Yep, that’s a real creature.
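A quick sketch of the difference between a guarded and an unguarded cast. The class and method names are made up for this example.

```java
// Hypothetical example class.
final class Casts {
    // Guarded: returns the length if value is a String, or -1 otherwise.
    static int lengthIfString(Object value) {
        if (value instanceof String) {
            return ((String) value).length();
        }
        return -1;
    }

    // Unguarded: throws ClassCastException when value is not a String.
    static int lengthUnchecked(Object value) {
        return ((String) value).length();
    }
}
```

Passing an Integer to `lengthUnchecked` compiles fine, since the parameter is an Object, and only fails at runtime. The `instanceof` check is what turns that failure into an ordinary branch.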

8. Exception – 15% of Production Environments

In at #8 is the mother of all exceptions, Exception, DUN DUN DUUUUN (grandmother is Throwable).

Java never throws plain Exceptions, so this is another case like RuntimeException where it must be… you, or 3rd party code, that throws it explicitly because:

  1. You need an exception and you’re just too lazy to specify what it actually is.
  2. Or… More specifically, you need a checked exception to be thrown for some reason

9. ParseException – 13% of Production Environments

Parsing errors strike again! Whenever we’re passing a string to parse into something else, and it’s not formatted the way it’s supposed to, we’re hit by a ParseException. Bummer.

It’s more common than you might have thought, with 13% of the production environments covered in this post featuring this exception in their top 10.

The solution is… yet again, check yo’ self.
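As one example of checking yo’ self, here is a sketch around date parsing with `java.text.SimpleDateFormat`, whose `parse` method declares ParseException. The helper class and the "yyyy-MM-dd" format are assumptions picked for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Optional;

// Hypothetical helper; the date format is an assumption for this example.
final class Dates {
    // Returns the parsed date, or empty if the text does not match yyyy-MM-dd.
    static Optional<Date> parseIsoDate(String text) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false); // do not silently roll over impossible dates
        try {
            return Optional.of(format.parse(text));
        } catch (ParseException e) {
            // e.getErrorOffset() tells where in the string parsing failed.
            return Optional.empty();
        }
    }
}
```

Whether an empty result should be logged, defaulted or reported back to the user is a decision for the calling code; the sketch only makes sure it is a decision and not a surprise.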

10. InvocationTargetException – 13% of Production Environments

Another exception that’s thrown at us from the world of Java Reflection is the InvocationTargetException. This one is actually a wrapper: if something goes wrong in an invoked method, that exception is wrapped in an InvocationTargetException.

To get the original exception, you’d have to use the getTargetException method (or getCause, which returns the same exception).

We see that 13% of production environments tested in this post had it in their list of top 10 exceptions, making it the second exception type here that’s directly related to Java’s reflection features.
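Here's a small sketch of the unwrapping, using a reflective call to `Integer.parseInt` as the method that fails. The helper class is hypothetical; `Method.invoke` and `InvocationTargetException` are the standard reflection API.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper that shows what is hiding inside the wrapper.
final class Unwrap {
    // Invokes Integer.parseInt reflectively and returns the exception it threw,
    // or null if the call succeeded.
    static Throwable failureOfParseInt(String input) {
        try {
            Method parseInt = Integer.class.getMethod("parseInt", String.class);
            parseInt.invoke(null, input); // null target: parseInt is static
            return null;
        } catch (InvocationTargetException e) {
            // The invoked method threw; the real exception is the cause.
            return e.getCause();
        } catch (ReflectiveOperationException e) {
            // Lookup or access failed, which is not expected for this method.
            throw new IllegalStateException(e);
        }
    }
}
```

For an input like "abc", the returned exception is the NumberFormatException from #2 on this list, which is the one worth logging.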

Final Thoughts

The world of Java exceptions is indeed quite colorful, and it’s amazing to see how much impact the top 10 exceptions have on our logs. 97% of all logged errors come from 10 unique exceptions.

Try OverOps and find out what the top 10 exceptions are in your own production environment. It only takes a few minutes to get started, and you’ll also get all the data you need in order to fix them: Source, Stack, State.

Some kind of monster @ OverOps, GDG Haifa lead.
  • http://ibragimov.by/ Ruslan Ibragimov
  • http://kokociel.blogspot.com.au kokociel

    :O hard-coding en_US as the locale for NumberFormat

  • Stijn de Witt

    I recommend fetching the logs from the production server at regular intervals, say at the beginning of each sprint, and committing to first fixing the logging before even trying to fix the underlying problem: instead of an ugly stack trace no one can read, make sure you log a readable message with a diagnosis. Of course, for many exceptions in this list it’s hard to do that in a generic way. But I assert that if you looked deeper, you’d find that all those NPEs, IllegalStateExceptions and whatnot are actually all coming from the same places (looking at you, Wildfly). If you keep doing that, I’d bet that within a couple of months your production logs would be transformed from long lists of stack traces to short, readable error messages that clearly inform what went wrong, why it probably happened and how to fix it. Because in a lot of cases these errors come from configuration errors that aren’t checked enough and make their way into the system so deep that at that point we have no clue what’s up anymore and end up with some cryptic generic exception.

    If you did that, the system administrators would be able to fix the config themselves. Vague errors would be reported to your support system far less. You wouldn’t have to spend those hours/days figuring out why at customer X these weird stack traces keep appearing in the logs. Because the log file would point the admin to the config he made the mistake in and he would just fix it.