
Most Java sampling profilers rely on “Java Virtual Machine Tool Interface (JVM TI)” to profile Java applications. However, there is an inherent limitation with JVM TI. JVM TI only allows collecting stack traces at a safepoint. Therefore, any sampling profilers using JVM TI suffer from the safepoint bias problem.
“Async Profiler” does not use JVMTI to get stack trace samples and therefore, it avoids the safepoint bias problem. Let’s understand this problem first.
Following are some examples of safepoints in a Java application.
Between every 2 bytecodes (interpreter mode)
Backedge of non-counted loops
Method exit
JNI call exit
Safepoint bias problem
Nitsan Wakart has talked about this safepoint bias problem in many places. His blog post, “Why (Most) Sampling Java Profilers Are F***ing Terrible” is perhaps the most comprehensive blog post describing this problem.
In summary, when sampling profilers get sample stack traces, the application threads are stopped to collect data and the threads are resumed. The problem here is that the application threads are stopped only at safepoints and therefore when profilers get stack trace samples at predefined intervals, the stack traces are retrieved only at next available safepoint poll location. Hence, the samples are biased towards safepoints and the profiler may report inaccurate data.
How can we avoid this safepoint bias problem?
There is an AsyncGetCallTrace method, which is an OpenJDK internal API call to facilitate non-safepoint collection of stack traces. There are some profilers using this AsyncGetCallTrace method to avoid the safepoint bias problem.
Nitsan also has written a great blog post on AsyncGetCallTrace method: “The Pros and Cons of AsyncGetCallTrace Profilers”
The “Honest Profiler” is one of the first Java sampling profilers without the safepoint bias problem. Honest Profiler has its own sampling agent that uses UNIX Operating System signals and AsyncGetCallTrace API in order to efficiently and accurately profile a Java application.
Java Flight Recorder also does not require threads to be at safe points in order for stacks to be sampled. However, it cannot get metadata for non-safepoint parts of the code without using the following flags: “-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints”
Introduction to Async Profiler
Now, you must have figured out that “Async Profiler” also avoids the safepoint bias problem by using AsyncGetCallTrace API.
Async Profiler is a low overhead sampling profiler for Java. It uses AsyncGetCallTrace and perf_events together to give a more holistic view of the application including both system code paths and Java code paths. Async profiler works with any JDK based on HotSpot JVM (which has AsyncGetCallTrace method) such as OpenJDK and Oracle JDK.
With perf_events, the system code paths of an application are profiled. Async Profiler nicely combines both system code paths from perf_events and Java code paths from AsyncGetCallTrace API.
It’s also recommended to use the following flags when profiling with Async Profiler: “-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints”
Async profiler supports several types of profiling. For example,
CPU profiling
Allocation profiling
Wall-clock profiling
Lock profiling
Another exciting feature in Async Profiler is the out-of-the-box Flame Graph support, which allows visualizing stack traces.
Flame Graph is just one of the output formats supported by Async Profiler. Interestingly, Async Profiler also can produce a Java Flight Recording with “Method Profiling Sample” events.
Following are output types supported by Async Profiler.
summary: Summary of execution profile
traces: List all unique stack trace samples in the descending order of sample count.
flat: List all top methods from stack trace samples.
collapsed: Folded output of stack traces compatible with Brendan Gregg’s flamegraph.pl script.
svg: Visualize all stack traces as a Flame Graph.
tree: Web (HTML) page showing all stack traces as a call tree.
jfr: Java Flight Recording with Method Profiling Sample events
The last three outputs (svg, tree, and jfr) should be used with the option to dump output to a file.
If the output file has *.svg, *.html or *.jfr extension, the output format will automatically be svg, tree or jfr respectively.
The summary, traces, and flat outputs can be combined. That is, in fact, the default output of Async Profiler.
Getting Started with Async Profiler
As I mentioned earlier, Async Profiler depends on perf_events. Following configurations should be done to capture kernel call stacks using perf_events from a non-root process.
Set /proc/sys/kernel/perf_event_paranoid to 1.
Set /proc/sys/kernel/kptr_restrict to 0.
The first setting will allow general users to profile the kernel. The second setting will disable restrictions placed on exposing kernel addresses.
You can directly write values to the mentioned files or use sysctl command to update values.
Latest Async Profiler can be downloaded from GitHub releases page or you can easily build the Async Profiler. Async Profiler is also included in IntelliJ IDEA.
Since I like to work with the latest changes, I built the Async Profiler from the source. Building the profiler was easy. I just exported JAVA_HOME and executed the make command.
Let’s profile some sample applications and see how each profile will look like.
I used sample applications I developed to demonstrate various performance issues. Source code is at https://github.com/chrishantha/sample-java-programs/
I will be using the svg output since it’s easier to visualize stack traces with Flame Graphs.
CPU Profiling
To profile the CPU, I used “highcpu” sample application. This application has several threads doing several CPU consuming tasks such as calculating hashes for some random UUIDs and doing some math operations.
I executed the application using the following command.
Here, we can see that the Flame Graph shows both system code paths and Java code paths.
Without Async Profiler, getting a similar Flame Graph was not straightforward. In order to generate “Java Mixed-Mode Flame Graphs”, the following steps had to be done.
Run the application with -XX:+PreserveFramePointer.
Profile application using perf record command and generate a perf.data file.
Generate Java symbol table using perf-map-agent to map Java code addresses to method names.
Use perf script command, get folded output, and generate Flame Graph.
This approach had various issues. For example, the -XX:+PreserveFramePointer flag, which is only available in JDK 8u60 and later will have a performance overhead. Sometimes, the frames will be missing due to method inlining.
Allocation Profiling
Async Profiler can also be used to profile the code allocating heap memory. It uses TLAB (Thread Local Allocation Buffer) callbacks, which can be used in production without much overhead. Java Flight Recorder also uses a similar approach.
Let’s look at an Allocation profile for the same highcpu sample application. Note that profiling event type is alloc.
Now, top edges show the methods allocating memory. Here, we can quickly see which code paths are allocating memory in the application.
Let’s look at another sample application named “allocations”. This application checks whether a number is prime or not in a finite loop.
Unlike highcpu application, this application terminates in a short time. Therefore, I profiled the allocations from the beginning by launching Async Profiler as an agent.
Wall-clock Profiling
Async Profiler can also sample all threads equally irrespective of each thread’s status (Running, Sleeping or Blocked) by using the wall event type.
Wall-clock profile is really useful to figure out what’s happening with threads over time. Wall-clock profiling can be used to troubleshoot issues in the application start-up time.
Since all threads are profiled regardless of thread status, Wall-clock profiler is most useful in per-thread mode.
Let’s see an example. Following is the wall-clock profile the highcpu application.
Async Profiler supports many kinds of events like CPU cycles, Linux performance counters, allocations, lock attempts, etc. This story demonstrated a few examples of Async Profiler using some sample Java applications and producing Flame Graphs.
It’s really easy to use Async Profiler especially because of its outputs are much simpler and does not require special applications to process the outputs.
I recommend you to try Async Profiler and make sure that you always profile your applications while developing (to avoid costly production issues in the future).