The ultimate logging dictionary, or: what are the most common words we log?
Log files are the most common way to debug applications, and they can definitely lead us in the right direction when it comes to solving errors. However, most log files grow by million of messages each day, and it’s important to keep them as clear as possible, so you and your team will be able to understand what went down when an error was thrown.
Along with sending the variables, most of us add a description of our own. Since we’re avid fans of data crunches, we’ve decided to dive even deeper into logs, and dissect those log messages to see what you write to your logs. Can you guess what’s the most common word used in log files?
— OverOps (@overopshq) March 9, 2017
40,000 Projects, Thousands of Log Lines
During the past few months we’ve been on a quest to understand how GitHub’s top Java projects use logs. We looked at the top 400,000 repositories on GitHub, and seeked information.
We questioned whether standard Java logging is dead, enquired into the battle of parameterized logging vs string concatenations including if, why and when you should use each one and got an answer to the ultimate question – why production logs can’t help find the real root cause of errors.
Now that we have all of that information in our hands, it’s time to have a little fun. Which words developers use when logging? Are curse words as popular as we think they are? Do developers log in languages other than English? And are smiley faces a thing when it comes to logs?
Does the Length of Your Log Lines Matter?
The first answer we seeked out to find was how long are log messages. We already know how many variables are sent to the logs (and how they’re written), but this time we’re focused on the strings alone.
The average log line length, including the whole line and not just the message, with the call to the logger and the the log level, is 32 characters. But what do these characters say? What words do they represent?
To find this out, we created an index of the strings written to the log, counting the number of occurrences for each word. That got us a total of 139,079 words, and 3,648,131 occurrences. Now, we can answer the following question:
What are the Most Popular Words Written to the Log?
Coming in at number one, the most popular word found within the logs is… “to”. Not too existing, especially since it’s most commonly used as a preposition, for example: “This data should be sent to the log”. This log message would, hopefully, contain some relevant data and parameters.
Within the top 20 words we found written the logs, the 3 that popped up were “Error”, “Failed” and “Exception”, in both capitalized and all lower case versions. Breaking it down even further, there’s a total of 815 variations to the word error, 623 variations of the word fail and 1,052 variations of the word exception.
Since logs are meant to help us identify what happened, it makes sense to see the high repetition of these words. However, seeing as there are 9 variations to the word oops, it might be good practice to plan ahead when something “oops-worthy” happens, and not just add it to the logs.
And of course, we couldn’t help ourselves and wrote a haiku made solely from words found in our logging dictionary:
Connection not found
Request value exception
Failed and error
Want to make your logs better?
The strings in your log files are meant to help you understand what happened when a critical error was thrown, but more often than not, there’s only so much you can fit inside a log message. It can take hours and sometimes days trying to debug through log files, and instead of working on new features you waste time on fixing errors in previous deployments.
We’ve experienced these exact same problems in previous companies we worked at, and now it was time to build a solution that automates the debugging process. Developers do more daring things when they know there’s a safety net to protect them if production errors do occur.
OverOps shows you the variable state behind any exception, logged error or warning, without relying on the information that was actually logged. You can see the complete source code and variable state across the entire call stack of the error, even across microservices and machines.
OverOps also shows you the last 250 DEBUG, TRACE and INFO level statements that were logged prior to the error, in production, even if they were turned off and never reached the log file.
Discover the new way to debug errors in production. Watch a live demo of OverOps.
Logging in Foreign Languages
We don’t know about you, but when we think of log files, we visualize long lines of text that are meant to help us solve the riddle that is our application’s behavior. For us, that text is in English, but do developers prefer to log in their native tongue?
Out of the 803,869 log messages we checked, the most popular is English with over 70% of messages written in it. While it might rule the logs, it’s not the only language we found. Actually, we found 35 other languages along with English.
The second most popular language is French, but it only holds 4.37% of the log messages. There are a lot of other languages we found, from Norwegian (with 2.4% of log lines written in it), through Afrikaans (with a little over 1%), Tagalog, Romanian, Simplified Chinese and we even found a few lines in Bengali and Macedonian.
What Else Did We Find?
The security of the users is the top priority for every company. Or is it…? We decided to see if it’s true through the logs. Sure, these are your log files, but keeping personally identifiable information in them just seems wrong.
Among the examples we came across we were able to see that credit cards numbers, phone numbers, addresses and even passwords were saved as plain text to the log. Yikes. Here are a few examples:
…”validateCreditCardNumber – ” + creditCardNumber + …
…”Request for processing without filename: phoneNumber=(” + phoneNumber …
…”Password: ” + password …
On a brighter note, another interesting discovery we came across was the use of smiley faces. We found 11 happy smiley faces with a nose 🙂 and 4 sad faces with a nose 🙁 . We also came across a lot of happy/sad faces without a nose (77 sad 🙁 and 42 happy 🙂 ) – but most of them were used in their original form – colon and brackets, and not as an expression of joy or sadness.
Log files are very similar to… escape rooms. You find yourself locked (a critical error or exception were thrown), having little bits and pieces of clues (your log files), and you have to solve the big riddle on time, or you’ll lose (your users).
Log files are there to help us, but sometimes it seems like we forget that they should be meaningful enough for us to understand, debug and fix errors. If you relate to this, you should know that there’s a better way to use log files. Try it right now.