Why can’t production logs help you find the real root cause of your errors?
Asking if you’re using log files to monitor your application is almost like asking… do you drink water. We all use logs, but HOW we use them is a whole different question.
In the following post we’ll take a deeper look into logs and see how they are used and what’s written to them. Let’s go.
Big shout out to Aviv Danziger from our R&D team for his huge help pulling and crunching the data for us.
[This blog post is included as chapter 3 of our free Guide to Java Logging in Production. Download the full eBook here.]
Our quest for answers requires a large amount of data, and that’s why we turned to Google BigQuery. A few months ago we used it for the first time to see how GitHub’s top Java projects use logs.
For our current post, we took the top 400,000 Java repositories on GitHub, ranked by number of stars they were given in 2016. Out of those repositories we filtered out Android, sample projects and simple testers, which left us with 15,797 repositories.
Then, we extracted the repositories that had over 100 logging statements, which left us with 1,463 repos to work on. Now, it’s time for the fun part of finding the answers for all those questions that kept us awake at night.
TL;DR: Main Takeaways
If you’re not into pie, column or bar charts and want to skip the main course and head straight for the dessert, here are the 5 key points we learned about logging and how it’s really done:
1. Logs don’t really have as much information as we think, even though they can add up to hundreds of GBs per day. Over 50% of logging statements carry no information about the variable state of the application.
2. In production, 64% of overall logging statements are deactivated
3. The logging statements that do reach production have 35% fewer variables than the average development-level logging statement
4. “This should never happen” always happens
5. There’s a better way to troubleshoot errors in production
1. How Many Logging Statements Actually Contain Variables?
The first thing we wanted to check is how many variables are sent out in each statement. We chose to slice the data on a scale from 0 variables up to 5 and above, in each repository. We then took the total count, and got a sense of the average breakdown over all of the projects in the research.
As you can see, the average Java project doesn’t log any variables in over 50% of its logging statements. We can also see that only 0.95% of logging statements send out 5 variables or more.
2. How Many Logging Statements Are Activated in Production?
Development and production environments differ for many reasons, and one of them is how they treat logging. In development, all log levels are activated. In production, however, only ERROR and WARN are activated. Let’s see what this breakdown looks like.
The chart shows that the average Java application has 35.5% unique logging statements that have the potential to be activated in production (ERROR, WARN), and 64.5% statements that are only activated in development (TRACE, INFO, DEBUG).
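To make the split concrete, here is a minimal Java sketch of how such a share could be computed from a list of log levels. It assumes, as the post does, that only WARN and ERROR count as production levels; the class and method names are hypothetical and are not taken from our research code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LevelShare {
    // Assumption taken from the post: only WARN and ERROR are active in production.
    private static final Set<String> PRODUCTION_LEVELS = Set.of("WARN", "ERROR");

    /** Returns the percentage of statements whose level is active in production. */
    public static double productionShare(List<String> levels) {
        if (levels.isEmpty()) {
            return 0.0;
        }
        long active = levels.stream()
                .filter(level -> PRODUCTION_LEVELS.contains(level.toUpperCase(Locale.ROOT)))
                .count();
        return 100.0 * active / levels.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, one entry per logging statement.
        List<String> levels = List.of("INFO", "DEBUG", "ERROR", "TRACE", "WARN", "INFO", "DEBUG", "INFO");
        System.out.println("Active in production (%): " + productionShare(levels));
    }
}
```

Everything that is not counted here (TRACE, DEBUG, INFO) falls into the development-only share.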
3. What’s the Average Number of Variables per Each Log Level?
So, not only do developers skimp on variables in their statements, the average Java application doesn’t send that many statements to production logs in the first place.
Now, we’ve decided to look at each log level individually and calculate the average number of variables in the corresponding statements.
The average shows that TRACE, DEBUG and INFO statements contain more variables than WARN and ERROR. “More” is a polite word, considering the average number of variables is 0.78 in the first three and 0.5 in the last two.
That means that production logging statements hold 35% fewer variables than development logging statements. In addition, as we’ve seen earlier, their overall number is also much lower.
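For anyone who wants to check the math, the relative drop can be worked out from the two rounded averages quoted above. With those rounded inputs the result lands just under 36%, so the 35% figure presumably comes from the unrounded averages (an assumption on our side here):

```latex
\[
\frac{0.78 - 0.50}{0.78} = \frac{0.28}{0.78} \approx 0.36
\]
```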
OverOps lets you see the variables behind any exception, logged error or warning, without relying on the information that was actually logged. You’ll be able to see the complete source code and variable state across the entire call stack of the event. Even if it wasn’t printed to the log file. OverOps also shows you the 250 DEBUG, TRACE and INFO level statements that were logged prior to the error, in production, even if they’re turned off and never reach the log file.
We’d be happy to show you how it works. Click here to schedule a demo.
4. This Should Never Happen
Since we already have information about all of those logging statements, we’ve decided to have a little fun. We found 58 mentions of “This should never happen”.
All we can say is that if it should never happen, at least have the decency to print out a variable or 2, so you’ll be able to see why it happened anyway 🙂
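Here is a small sketch of what that could look like in Java. It uses `java.util.logging` only so that it runs without extra dependencies; the order and state names are invented for illustration, and the same idea applies to whichever logging framework you use.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class OrderStateHandler {
    private static final Logger LOGGER = Logger.getLogger(OrderStateHandler.class.getName());

    /** Builds the message with the variable state included, so the log line explains itself. */
    public static String describeUnexpectedState(String orderId, String state) {
        return String.format(
                "This should never happen: order %s is in unexpected state '%s'", orderId, state);
    }

    /** Returns true for known states; logs the offending values and returns false otherwise. */
    public static boolean handle(String orderId, String state) {
        switch (state) {
            case "NEW":
            case "PAID":
            case "SHIPPED":
                return true;
            default:
                // Log the values that led here instead of a bare "should never happen".
                LOGGER.log(Level.SEVERE, describeUnexpectedState(orderId, state));
                return false;
        }
    }

    public static void main(String[] args) {
        handle("A-1001", "REFUNDED_TWICE");
    }
}
```

With the order ID and the state in the message, whoever reads the log at least knows which input triggered the "impossible" branch.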
How Did We Do It?
As we mentioned, to get this data we first had to filter out irrelevant Java repositories and focus on those which had over 100 logging statements, which left us with 1,463 repos that made the cut.
Then, we added some regex magic and pulled out all of the log lines:
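The exact expression we used is not reproduced in this text. As a rough illustration of the idea only, here is a hypothetical Java pattern that finds calls such as `logger.info(...)` or `LOG.error(...)` in a piece of source code. The receiver names (`log`, `logger`) and the level list are assumptions, and a pattern this simple will miss loggers with other names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogStatementFinder {
    // Hypothetical pattern: a receiver named log or logger (any case),
    // followed by a call to one of the five common level methods.
    private static final Pattern LOG_CALL = Pattern.compile(
            "\\b(?:log|logger)\\s*\\.\\s*(trace|debug|info|warn|error)\\s*\\(",
            Pattern.CASE_INSENSITIVE);

    /** Returns the upper-cased level of every logging call found in the source text. */
    public static List<String> findLevels(String source) {
        List<String> levels = new ArrayList<>();
        Matcher matcher = LOG_CALL.matcher(source);
        while (matcher.find()) {
            levels.add(matcher.group(1).toUpperCase(Locale.ROOT));
        }
        return levels;
    }

    public static void main(String[] args) {
        String source = "logger.info(\"Started\");\nLOG.error(\"Failed for {}\", id);";
        System.out.println(findLevels(source));
    }
}
```

The word boundary at the start keeps identifiers such as `catalog.info(` from being counted as logging calls.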
Now that we had the data, we started slicing it up. First, we extracted the number of variables per log level:
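The query itself is not shown in this text, so the following is only a sketch of one possible heuristic, not the method behind the numbers above. It takes the argument text of a logging call, strips the string literals and counts what is left as variables, treating a dotted chain like `e.getMessage` as one variable. The class name is hypothetical, and the heuristic will miscount keywords such as `null` or static calls such as `String.format`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableCounter {
    // A double-quoted Java string literal, including escaped characters.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");
    // An identifier or dotted chain (user.getName), not starting in the middle of a token.
    private static final Pattern IDENTIFIER_CHAIN = Pattern.compile(
            "(?<![\\w$])[A-Za-z_$][\\w$]*(?:\\s*\\.\\s*[A-Za-z_$][\\w$]*)*");

    /** Counts identifier chains in the argument text once string literals are removed. */
    public static int countVariables(String arguments) {
        String withoutLiterals = STRING_LITERAL.matcher(arguments).replaceAll(" ");
        Matcher matcher = IDENTIFIER_CHAIN.matcher(withoutLiterals);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Maps a count onto the tiers used in the post: 0, 1, 2, 3, 4 and "5 and above". */
    public static int tier(int variableCount) {
        return Math.min(variableCount, 5);
    }

    public static void main(String[] args) {
        System.out.println(countVariables("\"User {} logged in from {}\", userId, ip"));
    }
}
```

Grouping the resulting tiers by log level then gives a per-level breakdown for each repository.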
Then we calculated the average use of each tier across the repositories. That’s how we got each tier’s average percentage of a repository’s total statements.
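As a sketch of that last step, the snippet below turns each repository's tier counts into percentages and then takes the plain (unweighted) mean across repositories. Whether the original calculation weighted repositories equally is an assumption here, and the class name and sample numbers are made up; the raw data file remains the reference.

```java
import java.util.List;

public class TierAverages {
    private static final int TIERS = 6; // 0, 1, 2, 3, 4 and "5 and above"

    /**
     * Each int[] holds one repository's statement counts per tier.
     * Returns the unweighted mean of the per-repository percentages.
     */
    public static double[] averageShares(List<int[]> repoCounts) {
        double[] averages = new double[TIERS];
        int counted = 0;
        for (int[] counts : repoCounts) {
            int total = 0;
            for (int count : counts) {
                total += count;
            }
            if (total == 0) {
                continue; // skip repositories without any statements
            }
            for (int i = 0; i < TIERS; i++) {
                averages[i] += 100.0 * counts[i] / total;
            }
            counted++;
        }
        if (counted > 0) {
            for (int i = 0; i < TIERS; i++) {
                averages[i] /= counted;
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        double[] result = averageShares(List.of(
                new int[] {5, 5, 0, 0, 0, 0},
                new int[] {10, 0, 0, 0, 0, 0}));
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Averaging percentages per repository, rather than pooling all statements, keeps a handful of very large projects from dominating the result.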
You can check out the calculations in our raw data file.
We all use log files, but it seems that most of us take them for granted. With the numerous log management tools out there we forget to take control of our own code – and make it meaningful for us to understand, debug and fix.