Java Performance Tuning


What’s going on under the hood of the JVM, and how does Garbage Collection affect Java performance?

The performance tuning world is a dangerous place: one JVM flag out of balance and things can quickly get hairy. For this reason, we’ve decided to turn to Haim Yadid, a Java performance tuning expert and the creator of mjprof, the monadic JVM profiler. In this post we’ll share some of his battle-tested insights and get a feel for how the internals of the JVM behave under stress.

On understanding what you’re up against

The JVM is subject to Garbage Collection pauses that vary in frequency and duration. During a pause everything stops, and all kinds of unexpected behavior comes into play. When facing a new tuning project, one of two things usually happens: either the company already knows it has a garbage collection problem, or it will soon find out that it has one. At this stage it is most likely experiencing pauses, unstable behavior where the JVM gets stuck, and a general deterioration in performance. The symptoms usually show up as slow response times, high CPU and memory utilization, or a system that acts normally most of the time but exhibits irregular behavior like extremely slow transactions and disconnections.

The main pitfall: Ignoring the outliers

The way this kind of behaviour can go overlooked without alerting anyone is through one common mistake: measuring the average transaction time and ignoring the outliers. This is where GC problems hide: while most of the time a system may behave normally, at other times its responsiveness will go south and cause a bad experience for many users. For example, a transaction that would normally take 100ms gets hit by a GC pause and suddenly takes several seconds or even a minute. On an eCommerce site this might go unnoticed by anyone but the user if the system’s maintainers only look at the average transaction time. Another problem that is easily overlooked is a hit to system throughput, say 20%, so the system doesn’t fulfil its potential. You may never know something went wrong, since you’re not looking at the right metrics. Often the root cause is low awareness of GC overhead combined with a focus on a single metric, the average response time, while ignoring the 99th percentile.
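To make this concrete, here’s a small sketch (the numbers are hypothetical, and it uses the nearest-rank percentile definition) showing how an average can bury exactly the GC-induced outliers that a 99th-percentile measurement exposes:

```java
import java.util.Arrays;

public class LatencyStats {
    // Nearest-rank percentile: smallest sample with at least p% of values at or below it.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMs = new long[100];
        Arrays.fill(latenciesMs, 100);   // normal transactions: 100ms each
        latenciesMs[98] = 30_000;        // two transactions caught behind a 30-second pause
        latenciesMs[99] = 30_000;

        double avg = Arrays.stream(latenciesMs).average().orElse(0);
        System.out.println("average = " + avg + "ms");                          // 698.0ms - looks tolerable
        System.out.println("p99     = " + percentile(latenciesMs, 99) + "ms");  // 30000ms - reveals the pause
    }
}
```

The 20% throughput hit is the same story: track throughput and tail percentiles side by side, not just the mean.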

Defining the performance requirements: Frequency and Duration

The main question here is this: what do you consider acceptable criteria for GC pause frequency and duration in your application? For example, a daily pause of 15 seconds might be acceptable, while the same pause once every 30 minutes would be an absolute disaster for the product. The requirements come from the domain of each system, where real-time and high-frequency trading systems have the strictest requirements.

Overall, seeing pauses of 15-17 seconds is not a rare thing. Some systems might even reach 40-50 second pauses, and Haim has also seen 5-minute pauses in a system with a large heap that ran batch processing jobs, so pause duration wasn’t a big factor there.


Stop The World and gather data: The importance of GC logs

The richest source of data on the state of garbage collection in a system based on a HotSpot JVM is the GC logs. If your JVM is not generating GC logs with timestamps, you’re missing out on a critical source of data for analyzing and solving pausing issues. This is true for development environments, staging, load testing and, most importantly, production. You can get data about all GC events in your system, whether they were completed concurrently or caused a stop-the-world pause: how long they took, how much CPU they consumed, and how much memory was freed. From this data, you can understand the frequency and duration of these pauses and their overhead, and move on to taking actions to reduce them.

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:mygclogfilename.gc

The minimal settings for GC log data collection
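Once the logs exist, each stop-the-world event shows up as a line ending with its duration. As a rough illustration (the exact line format varies between JVM versions and collectors, and the sample line below is made up for the sketch), the pause duration can be pulled out with a simple pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogPause {
    // A simplified, hypothetical -XX:+PrintGCDetails line; real lines carry more detail.
    static final String SAMPLE =
        "2015-05-26T14:45:37.987+0200: 151.126: [GC (Allocation Failure) "
      + "[PSYoungGen: 65536K->10748K(76288K)] 65536K->14900K(251392K), 0.0209631 secs]";

    // HotSpot reports each pause duration as "<seconds> secs]" at the end of the event.
    static final Pattern PAUSE = Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    static double pauseSeconds(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println("pause: " + pauseSeconds(SAMPLE) + "s");
    }
}
```

Summing these durations over a time window gives you the pause frequency, the distribution, and the total overhead, which is exactly what the analysis tools below automate.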

Looking at the metrics, 5% is usually the upper bound for acceptable GC overhead, while acceptable pause durations vary greatly from one application to another.
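A rough way to sanity-check that 5% figure from inside the JVM itself, without parsing logs, is the standard java.lang.management API, which exposes cumulative GC time per collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    static double overheadPct(long gcTimeMs, long uptimeMs) {
        return uptimeMs == 0 ? 0 : 100.0 * gcTimeMs / uptimeMs;
    }

    public static void main(String[] args) {
        long totalGcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            totalGcMs += gc.getCollectionTime(); // cumulative ms spent in this collector
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead so far: %.2f%%%n", overheadPct(totalGcMs, uptimeMs));
    }
}
```

The GC logs remain the richer source: the MXBeans give you totals, not the pause-time distribution.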

Two tools worth mentioning here for GC log analysis are the open source GCViewer that’s available on GitHub, and jClarity’s Censum.

Solution Strategies

Once we have the information we need, it’s time to examine possible causes and solutions. Every change you apply calls for a new test and another round of log collection to assess its effectiveness and determine whether it moved the needle and hit the requirements, preferably in production and under load. There are 4 main ways to address trouble caused by GC pauses: switching the garbage collector, tuning the flags controlling the Java heap, making code changes, and using alternative JVMs / collectors. Here’s a quick overview of the approaches to consider in the HotSpot realm and the type of problems they address:

1. Wrong Garbage Collector in play

Roughly speaking, the JVM has 4 garbage collectors and you can choose which one to use at startup. To learn more about each type you can check out the comparison right here. One common reason for GC issues is using the wrong collector for the type of application you’re developing. HotSpot’s default is the Parallel / Throughput collector, and often it’s not the best choice for your application. The act of choosing the right collector (via JVM flags) is a statement of your priorities for the system, and usually the first issue to consider. Generally, the CMS and G1 collectors, which are mostly concurrent, will cause less frequent pauses, although when a pause does come, its duration will probably be longer than one caused by the Parallel collector, since their fallback mechanism is single threaded (ouch). On the other hand, the Parallel collector will achieve higher throughput for the same heap size. Another guideline relates to predictability: if predictable performance is an important factor and the heap size isn’t large, the Parallel collector might be the answer. And if average response time / latency is your top priority, then CMS or G1 is most likely the answer.
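The choice itself is made with startup flags: -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC or -XX:+UseG1GC. To double-check which collector actually ended up in play, a small sketch like this lists the active collectors by name (the exact names printed vary by JVM version and chosen collector; the comment below shows typical examples):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichCollector {
    public static void main(String[] args) {
        // Prints e.g. "PS Scavenge" / "PS MarkSweep" for the Parallel collector,
        // or "G1 Young Generation" / "G1 Old Generation" when -XX:+UseG1GC is set.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}
```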

2. Tuning the Java heap

After choosing your preferred GC algorithm it is time to do some tuning. Measure (via the GC logs) the throughput and the pause time distribution, and if you are happy with them then you are done. If the GC overhead is high (and throughput is low), increasing the heap size will usually improve the situation. When it comes to solving long pauses of CMS or G1, the situation is more delicate. Apart from fragmentation, another reason for these is that the JVM can’t keep up with the rate at which objects are promoted from new gen to old gen, and then it needs to pause the application to fix it. The solution here is either starting the GC earlier or increasing the heap size.
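The heap size itself is controlled with the -Xms (initial) and -Xmx (maximum) flags. A quick sketch for checking where you currently stand relative to those limits (note that getMax() can return -1 when no limit is defined):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024 * 1024;
        System.out.println("used:      " + heap.getUsed() / mb + "MB");
        System.out.println("committed: " + heap.getCommitted() / mb + "MB (currently reserved from the OS)");
        System.out.println("max:       " + (heap.getMax() < 0 ? "undefined" : heap.getMax() / mb + "MB (-Xmx)"));
    }
}
```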

From experience, heap sizes usually range between 1GB and 8GB; bigger sizes are much more rare. Increasing the heap size beyond 8GB during a tuning process usually happens when you’re becoming desperate. A valid reason for larger heap sizes is the need for a large cache, but that can also be solved off heap.

Let’s go through another example that shows where tuning the promotion rate is necessary. Say the application needs 100MB to handle some request and the new gen size is 50MB. Objects that shouldn’t be in old gen will reach it in no time. Tuning the new gen and survivor spaces will be needed to contain this problem and make sure short-lived objects end their life in new gen. The main factors in play here are the heap size, the new to old gen ratio, the survivor space size and the max tenuring threshold: how many GC cycles it takes for an object to move to old gen.
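In HotSpot these knobs map to real flags: -Xmn or -XX:NewRatio for the new gen size, -XX:SurvivorRatio for the survivor spaces, and -XX:MaxTenuringThreshold for the promotion age. The back-of-the-envelope arithmetic behind the example above can be sketched as follows; this is a crude approximation that assumes any allocation overflowing the new gen mid-request survives a young collection and gets promoted:

```java
public class PromotionSketch {
    // Crude approximation: short-lived allocation that overflows the new gen
    // mid-request survives a young collection and leaks into old gen.
    static long prematurePromotionMb(long perRequestMb, long newGenMb) {
        return Math.max(0, perRequestMb - newGenMb);
    }

    public static void main(String[] args) {
        long newGenMb = 50;      // e.g. -Xmn50m (hypothetical setting)
        long perRequestMb = 100; // short-lived allocations per request
        System.out.println(prematurePromotionMb(perRequestMb, newGenMb)
                + "MB of short-lived data per request ends up in old gen"); // -> 50MB
    }
}
```

Growing the new gen (or raising the tenuring threshold) shrinks that number, at the cost of other trade-offs in the same budget.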

Another important factor we need to take into account is the “liveset” of the application, meaning the size of the objects that are retained in memory for long periods. An example of a liveset is an application cache that holds frequent DB query result sets. When tuning the JVM one needs to make sure that the “liveset” is conveniently accommodated in the old generation and that there is sufficient free memory in this region on top of that consumption. Failing to do so will cause severe damage to the JVM’s behavior, resulting in low throughput and frequent pauses.

3. Architecture and code changes

Some issues will force us to resort to code changes and possibly even architectural changes. One cause of trouble we can address here is fragmentation. Long pauses with the CMS collector can be caused by fragmentation in old gen. Every GC cycle frees chunks of memory from old gen and makes it look like swiss cheese, until a moment comes when the JVM just can’t handle it. This happens when the JVM needs to move objects from new gen that are bigger than these “holes”, and then it has to stop the application to resolve the issue. Applications with a big state that changes over time are bound to cause fragmentation. As the state changes, “old state” objects are released from the old generation while their replacement state is created in the new generation. When the replacement is eventually promoted to the old generation, it probably won’t fit into the freed gaps, and this causes fragmentation.

Architectural solutions to these kinds of problems include updating objects in place, moving the “state” to off-heap mechanisms, or splitting the process in two: the latency-sensitive critical path with its many short-lived allocations in one process, and the large state in another.
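The simplest off-heap mechanism that ships with the JDK is a direct ByteBuffer; production systems usually reach for dedicated off-heap libraries instead, but it illustrates the idea: memory the GC never scans, moves, or fragments.

```java
import java.nio.ByteBuffer;

public class OffHeapState {
    public static void main(String[] args) {
        // 64MB of state kept outside the Java heap: the GC does not scan or compact
        // this region, so it contributes neither to fragmentation nor to pause times.
        ByteBuffer state = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        state.putLong(0, 42L);                 // write at a fixed offset, in place
        System.out.println(state.getLong(0));  // -> 42
    }
}
```

The trade-off is manual layout and serialization: the GC can’t help you manage what it can’t see.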

4. Alternative JVMs and Garbage collectors

If pause time is critical to your application and the HotSpot JVM fails to deliver acceptable response times, there are two more possible options. The first is the Azul Zing JVM with the pauseless C4 garbage collector. To start using Zing you will need a relatively large machine, with heap sizes starting from 32GB. Another option which is still not mature enough, but may be worth a try if you like living on the edge, is the Shenandoah GC algorithm. It uses a technique known as the Brooks forwarding pointer, which results in ultra-low pauses with reasonable overhead.

Further reading: The leading GC experts

To gain more insight into Garbage Collection and the internals of the JVM, here are some of the most interesting people to follow in the space of GC:

Charlie Hunt, a member of Oracle’s Java Platform Group and the lead author of the Java Performance book.

Gil Tene, CTO and Co-founder of Azul Systems, the creators of the C4 pauseless garbage collector.

Kirk Pepperdine, performance tuning expert, jClarity CTO and co-founder.

Monica Beckwith, Java / JVM Performance Consultant.

Tony Printezis, JVM / GC Engineer at Twitter, former G1 tech lead at Oracle.

Jon Masamitsu, JVM Developer at Oracle.

Christine H. Flood and Roman Kennke, The developers of the Shenandoah GC algorithm.

Conclusion

Garbage Collection is one of the most fascinating topics in the JVM and we hope this post helped you get a better understanding of the various moving parts. Many thanks to Haim Yadid who agreed to share his experience with us! If you have any questions or would like to ask for clarifications, please let us know in the comments section below.

Takipi shows you when and why your code breaks in production. It detects caught and uncaught exceptions, HTTP and log errors, and gives you the code and variable state when they happened. Get actionable information, solve complex bugs in minutes. Installs in 5-min. Built for production.


Further reading:

Garbage Collectors – Serial vs. Parallel vs. CMS vs. G1 (and what’s new in Java 8) – read more

7 Things You Thought You Knew About Garbage Collection – and Are Totally Wrong – read more

Some kind of monster @ OverOps, GDG Haifa lead.
  • Yogendra N Joshi

    Can Takipi be used for IBM JDK as well? If yes, we should talk.

  • Moozaro Moozar

    Let’s assume that you have 50GB of RAM, you work with G1, and your application is a Java trading application (lots of short-lived objects) and the only application running on the machine (except the OS itself).
    Your memory footprint is only 1GB (including cache and everything).
    The main question is:
    How do you set the optimal value of -Xmx?
    Will you get better performance if you give all the memory to the application (assuming you leave a few GB to the OS)?

    On one hand, if you postpone the GC as much as you can, you eliminate the “stop the world” events, but on the other hand this is exactly the problem: you have lots of short-lived objects that are not in use any more by the application. Does more memory always mean better performance?
    Aren’t lots of small GCs better than one huge GC?
    Would appreciate your thoughts on how you would set -Xmx and -Xms in that situation.