July 17, 2014

Performance Comparison of cglib, Javassist, JDK Proxy and Byte Buddy


Today we're comparing the Java code generation libraries Byte Buddy, cglib, Javassist, and JDK Proxy. Specifically, we're running them through a performance benchmark to see which code generation library is the fastest for Java applications. But first, let's get some background on this article series.

Our Series on Runtime Code Generation Libraries

In the first part of this series, we had a look at Java’s strong and static type system. We concluded that this type system allows for writing expressive and robust applications but limits the latitude of framework APIs to incorporate user types. We also saw how Java’s reflection API is not always the best choice for interacting with user types. To see this more clearly, we reasoned about the implementation of a simple security library where using the reflection API would break type safety while we could use code generation in order to retain user types.

In the second part of this series, we learned about different libraries for code generation and took a closer look at Byte Buddy, a library I wrote myself. We then used this library to provide a simple implementation of the security framework introduced in the first part.

In this last part, we want to compare the different libraries in a performance benchmark. If you have not yet read the previous parts, make sure to check them out before reading on. I promise, we won’t go further until you are back.

Code Generators: Byte Buddy vs cglib vs Javassist vs JDK Proxy

In the end, a shiny API will not be the only criterion for choosing the best code generation library. A library’s runtime performance might be an even more important factor, especially if the generated code takes up a crucial position within a running application. There exist numerous urban legends on the performance of different code generation libraries, but I've never found a proper benchmark to prove a claim in favor of any one specific technology.

Doing a micro-benchmark in Java is not an easy task. If you measure the execution time of a given code block, you do not normally know what it is that you are actually clocking. When Java code is executed, the just-in-time (JIT) compiler kicks in and, in the most extreme case, simply erases the measured code.
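To make this pitfall concrete, here is a minimal hand-rolled sketch of a timing loop. It is not how the numbers in this article were produced, and the class name `NaiveTimer` and its parameters are made up for illustration. The sketch warms up the code first and writes every result to a volatile field, which makes it harder for the JIT compiler to discard the measured call as dead code. Treat the output as a rough estimate at best.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, for illustration only.
final class NaiveTimer {
    // Every measured result is written here so that it counts as "used".
    static volatile int sink;

    /** Returns the average nanoseconds per call as a rough estimate. */
    static double nanosPerCall(IntSupplier task, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            sink = task.getAsInt(); // warm-up: lets the JIT compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.getAsInt(); // consume the result on every iteration
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        double ns = nanosPerCall(() -> Integer.bitCount(sink + 1), 100_000, 1_000_000);
        System.out.printf("~%.3f ns per call (rough estimate)%n", ns);
    }
}
```

Even with these precautions, loop overhead and timer granularity distort the result for very short operations, which is one reason to prefer a dedicated benchmarking library.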

However, over the last few years, several smart people have come up with ways to trick the JIT compiler and have implemented micro-benchmarking libraries based on these ideas. My personal favorite is the Java Microbenchmark Harness (JMH), a tool developed as part of the OpenJDK project.

Before diving into measuring the numbers themselves, it’s essential to answer one question: what is the purpose and the focus of the benchmark? Obviously, some task might be handled more efficiently by one library, while a different task would take more time.

Beyond that, a code generation library can always trade time for creating a runtime class against the time that it takes for invoking its methods once it's created. Keep all of this in mind while we discuss the subsequent numbers.

Speed Comparison of Byte Buddy, cglib, Javassist and JDK Proxy

With all of this in mind, let us first look at the raw numbers of a JMH benchmark that directly compares the runtime for different tasks. All numbers in the following table are listed in nanoseconds per operation, with the sample standard deviation given in parentheses.

| Task                                      | Byte Buddy      | cglib            | Javassist       | JDK proxy       |
|-------------------------------------------|-----------------|------------------|-----------------|-----------------|
| implement interface with stub methods     | 153.800 (0.394) | 804.000 (1.899)  | 706.878 (4.929) | 973.650 (1.624) |
| invoke a stub method                      | 0.001 (0.000)   | 0.002 (0.000)    | 0.009 (0.000)   | 0.005 (0.000)   |
| extend class with super method invocation | 172.126 (0.533) | 1480.525 (2.911) | 625.778 (1.954) | n/a             |
| invoke a super method                     | 0.002 (0.000)   | 0.019 (0.000)    | 0.027 (0.000)   | n/a             |

The first row shows the time a library needs to implement an interface with 18 different methods as no-operation stubs. The second row shows the time it takes to invoke one of these stub methods on an instance of the generated class.

In this measurement, Byte Buddy and cglib perform best because both libraries allow you to hardcode a fixed return value into the generated class, while Javassist and the JDK proxies only permit the registration of a suitable callback.
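The difference between the two styles can be sketched in plain Java. The hand-written `HardcodedGreeter` below only approximates what a generator that hardcodes return values might emit. The `Greeter` interface and all class names are hypothetical and not part of the benchmark. The callback variant uses the standard `java.lang.reflect.Proxy` API, where every call is routed through a single `InvocationHandler`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class StubStyles {
    // Hypothetical interface standing in for the benchmark's 18-method interface.
    interface Greeter {
        String greet();
    }

    // Hardcoded style: the fixed value lives directly in the method body.
    static final class HardcodedGreeter implements Greeter {
        @Override
        public String greet() {
            return "fixed";
        }
    }

    // Callback style: the JDK proxy forwards every call to one handler.
    static Greeter callbackStub() {
        // Sketch only: this handler answers "fixed" for any method and
        // does not treat Object methods such as hashCode() specially.
        InvocationHandler handler = (proxy, method, args) -> "fixed";
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter direct = new HardcodedGreeter();
        Greeter proxied = callbackStub();
        System.out.println(direct.greet());                         // fixed
        System.out.println(proxied.greet());                        // fixed
        System.out.println(Proxy.isProxyClass(proxied.getClass())); // true
    }
}
```

Both objects return the same value, but the proxied call takes a detour through the handler, including boxing of arguments into an `Object[]`, before it can produce its result.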

This allows us to draw a first, tentative conclusion: a more specialized implementation of a runtime class’s methods results in better runtime performance. This might sound more obvious than it actually is, since we could have hoped that the JIT compiler would even out the performance of both approaches for us.

What About Class Inheritance?

The third row of the table above shows the time required to extend a class with 18 methods. This time, instead of creating method stubs, every overridden method calls its super implementation.

You might have noticed that Byte Buddy is listed with two measurements, where the second number, set in italics, is significantly larger. The two numbers represent different approaches to implementing a super method invocation.

As mentioned last week, the JVM only permits a super method invocation from within the very same instance. Thus, the easiest way of invoking a super method is to hardcode the invocation into the intercepted method, which is what the first measurement does.

This approach is not very flexible, however, as it does not, for example, permit a conditional invocation. To overcome this limitation, Byte Buddy allows the creation of something akin to an inner class. We saw this approach in action in the previous part of this series, where we generated a proxy class that implemented the Callable interface.

For any invocation, an instance of this inner class is then injected into an interceptor method by using a corresponding annotation on one of its parameters. As you can see, creating such additional classes minimizes the runtime of invoking an exposed super method compared to the other libraries, which all follow a similar strategy.
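As a source-level analogy, the two approaches to a super method invocation could look roughly like the following. This is a sketch in ordinary Java, not the bytecode that Byte Buddy or any other library actually generates. `Service`, `guard` and the subclass names are hypothetical. Here a lambda plays the role of the generated Callable class, and a plain static method plays the role of the interceptor.

```java
import java.util.concurrent.Callable;

final class SuperCallDemo {
    // Hypothetical user class that a framework wants to subclass.
    static class Service {
        public String load(String key) {
            return "value-of-" + key;
        }
    }

    // Approach 1: the super call is hardcoded into the overriding method.
    static class HardcodedSubclass extends Service {
        @Override
        public String load(String key) {
            return super.load(key); // always invoked, no room for a condition
        }
    }

    // Hypothetical interceptor: decides at runtime whether to call super.
    static String guard(String key, Callable<String> superCall) throws Exception {
        return key.isEmpty() ? "denied" : superCall.call();
    }

    // Approach 2: the super call is wrapped in a Callable and handed over.
    static class DelegatingSubclass extends Service {
        @Override
        public String load(String key) {
            // The Callable is created inside the subclass, so it may use super.
            Callable<String> superCall = () -> super.load(key);
            try {
                return guard(key, superCall);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new HardcodedSubclass().load("a"));  // value-of-a
        System.out.println(new DelegatingSubclass().load("a")); // value-of-a
        System.out.println(new DelegatingSubclass().load(""));  // denied
    }
}
```

The second approach allows a conditional invocation, but it needs an extra object per call and an extra class behind it, which is where the additional class creation cost comes from.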

At the same time, creating a dedicated class per method adds overhead to the creation of the actual subclass. Both cglib and Javassist choose a middle ground here, which reduces the cost of class creation at the price of additional runtime overhead for each invocation of a super method.

Final Words: It's all About Increasing Performance

There is much more we could discuss from here but at the same time, this is a great time to complete this introductory digest on code generation. I hope that this overview helped you appreciate that code generation is nothing elitist; it's not just reserved for the big frameworks. With a helping library, code generation is a handy tool for implementing cross-cutting concerns along with beautiful APIs and without requiring explicit dependencies, even for small code bases.

Now that Java 8 is getting off the starting blocks, Java’s new Metaspace no longer imposes the same tight limits on the number of classes a Java application can load in its default configuration. With all this, there is nothing holding you back, so fire ahead. If you have any more questions, just drop a comment or hit me up on Twitter where I reside at @rafaelcodes.

And make sure to check out my after-hours love child Byte Buddy which you can find on bytebuddy.net and on GitHub!

If you want to know more about Java Bytecode, be sure to read the advanced guide to Java Bytecode by Anton Arhipov. It's a great place to get started.
