October 21, 2015

How to Improve JPA Performance


Developers often complain about the subpar performance of JPA. In this blog post I’m going to explain how to improve JPA performance by harnessing features in JPA 2.1 to avoid problems and boost performance.


3 Common JPA Performance Issues

If you take a close look at the performance issues in JPA, quite often you will find similar root causes. These can include:

  • using too many SQL queries to fetch the required entities from the database, the so-called n+1 query problem
  • updating entities one by one instead of using a single statement
  • doing data-heavy processing on the Java side, rather than the database side

Luckily, there is no need to suffer from these inefficiencies if you know what you’re doing. JPA has always offered ways to handle these kinds of issues, and JPA 2.1 introduced additional features that can be used to gain significant performance improvements.

By the way, if you want to learn more about the typical performance issues in Java projects, we have recently published an insightful report based on our performance survey findings.  Or if you’re looking for a JPA resource, here’s a great cheatsheet of JPA 2.1 features. Anyway, let’s get straight to fixing the performance issues found when using JPA.

1. Too Many SQL Queries

Performing too many SQL queries to fetch all required entities is, in my experience, the most common reason for performance issues.

Even the most innocent looking query, if implemented incorrectly, can trigger dozens or hundreds of SQL queries to the database. And it doesn’t even have to be an explicit query, as you will see in this section; a couple of incorrectly configured annotations are enough. So if you think this problem won’t affect you, think again.

Imagine the following piece of code in your project. What are your thoughts?


List<Author> authors = this.em.createQuery("SELECT a FROM Author a",
		Author.class).getResultList();

for (Author a : authors) {
	System.out.println("Author "
			+ a.getFirstName()
			+ " "
			+ a.getLastName()
			+ " wrote "
			+ a.getBooks()
					.stream()
					.map(b -> b.getTitle() + "("
							+ b.getReviews().size() + " reviews)")
					.collect(Collectors.joining(", ")));
}


The code snippet above prints the names of all authors, the titles of their books and the number of reviews for each book. It looks really simple. How many queries do you think are sent to the database? One? Maybe two (one for each type of entity)?

The right answer is: it depends on the number of authors in the database. If I use my small example database with only 11 authors and 6 books in it, this code triggers 12 queries: one to get all authors and 11 to get the books for each of the 11 authors. This issue is known as the n+1 query problem and it can easily occur with any library that you use for database access. Worse, the number of queries grows with the dataset, so the problem is exacerbated in production.
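The arithmetic behind those 12 queries is easy to reproduce without a database. The sketch below is a plain-Java simulation, not JPA code: `FakeDatabase`, `NPlusOneDemo` and their methods are hypothetical names that only count how many "queries" each access pattern would issue, assuming each author's books are loaded lazily with one extra query per author.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database; it only counts the queries it receives.
class FakeDatabase {
    private final int authorCount;
    private int queryCount = 0;

    FakeDatabase(int authorCount) {
        this.authorCount = authorCount;
    }

    // One query that returns the ids of all authors.
    List<Integer> selectAuthorIds() {
        queryCount++;
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= authorCount; id++) {
            ids.add(id);
        }
        return ids;
    }

    // One query per call: roughly what a lazy a.getBooks() amounts to.
    List<String> selectBooksOfAuthor(int authorId) {
        queryCount++;
        return new ArrayList<>(); // the book data itself is irrelevant for counting
    }

    // One query that returns the authors together with their books.
    List<Integer> selectAuthorIdsWithBooks() {
        List<Integer> ids = selectAuthorIds(); // counted once, inside selectAuthorIds()
        return ids;
    }

    int getQueryCount() {
        return queryCount;
    }
}

class NPlusOneDemo {
    // Lazy pattern: 1 query for the authors + 1 query per author for the books.
    static int queriesWithLazyLoading(int authorCount) {
        FakeDatabase db = new FakeDatabase(authorCount);
        for (int id : db.selectAuthorIds()) {
            db.selectBooksOfAuthor(id);
        }
        return db.getQueryCount();
    }

    // Single-fetch pattern: authors and books arrive in one round trip.
    static int queriesWithSingleFetch(int authorCount) {
        FakeDatabase db = new FakeDatabase(authorCount);
        db.selectAuthorIdsWithBooks();
        return db.getQueryCount();
    }

    public static void main(String[] args) {
        System.out.println("lazy loading: " + queriesWithLazyLoading(11)); // prints "lazy loading: 12"
        System.out.println("single fetch: " + queriesWithSingleFetch(11)); // prints "single fetch: 1"
    }
}
```

The lazy variant grows linearly with the number of authors, while the single-fetch variant stays at one query regardless of the dataset size.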

The good news is, we have multiple options to avoid this scenario by fetching all the required entities with one query. One of the newest and, from my point of view, the best way to solve this problem is to use a @NamedEntityGraph.

An entity graph specifies a graph of entities that shall be fetched from the database in a query independent way. That means you create a standalone definition of an entity graph and combine it with a query when you need it. The snippet below shows how to define a @NamedEntityGraph which fetches the books of a given author.


@Entity
@NamedEntityGraph(name = "graph.AuthorBooks", attributeNodes = @NamedAttributeNode("books"))
public class Author implements Serializable {
…
}


You can now provide this graph as a hint to the entity manager and get the authors and all their books in one query. As you have seen in the definition of the graph, I only provided the name of the property that contains the related entities. Therefore I use the @NamedEntityGraph as a loadgraph, so that all the other attributes are fetched with their defined fetch type, as follows:


EntityGraph<?> graph = this.em.getEntityGraph("graph.AuthorBooks");

List<Author> authors = this.em
		.createQuery("SELECT DISTINCT a FROM Author a", Author.class)
		.setHint("javax.persistence.loadgraph", graph).getResultList();


This example shows a very simple entity graph and you will probably be using more complex graphs in a real application. But this is not a problem. You can define more complex ones by defining multiple @NamedAttributeNodes and you can also use the @NamedSubgraph annotation to create a graph with multiple levels. You can find more information about @NamedEntityGraphs in this post explaining how to use Entity Graphs in detail.

For some use cases you might also need a more dynamic way to define the entity graph, e.g. based on some input parameters. In these cases it makes more sense to use a Java API to programmatically define the EntityGraph.


2. Updating Entities One by One

Updating entities one by one is another common reason for performance issues in JPA. As Java developers we are used to working with objects and to thinking in an object oriented way. While this is a good way to implement complex logic and applications, it is also a common cause of performance degradation when working with a database.

From an object oriented point of view it is perfectly acceptable to perform update and delete operations on the entities. But this is very inefficient when you have to update a huge set of entities. The persistence provider will create one update statement for each updated entity and send them to the database with the next flush operation.

SQL provides a more efficient way to do this. It allows you to construct an update statement that updates multiple entities at once. And you can do the same with the CriteriaUpdate and CriteriaDelete statements introduced in JPA 2.1.
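To make the difference concrete, here is a small plain-Java simulation, not JPA code. `FakeAuthorTable` and `BulkUpdateDemo` are hypothetical names; the table only counts the UPDATE statements it receives, assuming the persistence provider sends one statement per changed entity at flush time while a bulk statement covers all matching rows at once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory "author table" that counts the UPDATE statements it receives.
class FakeAuthorTable {
    private final Map<Long, String> firstNames = new LinkedHashMap<>();
    private int statementCount = 0;

    FakeAuthorTable(int rows) {
        for (long id = 1; id <= rows; id++) {
            firstNames.put(id, "Author" + id);
        }
    }

    // Models: UPDATE author SET first_name = ? WHERE id = ?
    void updateOne(long id, String newName) {
        statementCount++;
        firstNames.put(id, newName);
    }

    // Models: UPDATE author SET first_name = first_name || ? WHERE id >= ?
    int updateAllFrom(long minId, String suffix) {
        statementCount++;
        int affected = 0;
        for (Map.Entry<Long, String> row : firstNames.entrySet()) {
            if (row.getKey() >= minId) {
                row.setValue(row.getValue() + suffix);
                affected++;
            }
        }
        return affected;
    }

    String firstNameOf(long id) {
        return firstNames.get(id);
    }

    int getStatementCount() {
        return statementCount;
    }
}

class BulkUpdateDemo {
    // Entity-by-entity: one statement for every row with id >= minId.
    static FakeAuthorTable updateOneByOne(int rows, long minId) {
        FakeAuthorTable table = new FakeAuthorTable(rows);
        for (long id = minId; id <= rows; id++) {
            table.updateOne(id, table.firstNameOf(id) + " - updated");
        }
        return table;
    }

    // Bulk: a single statement updates every matching row.
    static FakeAuthorTable updateInBulk(int rows, long minId) {
        FakeAuthorTable table = new FakeAuthorTable(rows);
        table.updateAllFrom(minId, " - updated");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(updateOneByOne(11, 3).getStatementCount()); // prints 9
        System.out.println(updateInBulk(11, 3).getStatementCount());   // prints 1
    }
}
```

Both variants leave the table in the same state; only the number of statements sent differs.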

If you have used criteria queries before, you will feel very familiar with the new CriteriaUpdate and CriteriaDelete statements. The update and delete operations are created in nearly the same way as the criteria queries introduced in JPA 2.0.

As you can see in the following code snippet, you need to get a CriteriaBuilder from the entity manager and use it to create a CriteriaUpdate object. This is done in a similar way to the CriteriaQuery. The main differences are the set methods which are used to define the update operations.


CriteriaBuilder cb = this.em.getCriteriaBuilder();
// create update
CriteriaUpdate<Author> update = cb.createCriteriaUpdate(Author.class);
// set the root class
Root<Author> a = update.from(Author.class);
// set update and where clause
update.set(Author_.firstName, cb.concat(a.get(Author_.firstName), " - updated"));
update.where(cb.greaterThanOrEqualTo(a.get(Author_.id), 3L));

// perform update
Query q = this.em.createQuery(update);
q.executeUpdate();


For CriteriaDelete operations you just need to call the createCriteriaDelete method on the CriteriaBuilder to get a CriteriaDelete object and use it to define the FROM and WHERE parts of the query, similar to the previous example.

3. Processing Data in the Database

Another common source of performance problems is that we, as Java developers, tend to implement all the logic of our application in Java. Don’t get me wrong, there are lots of good reasons to do it this way. But there can also be good reasons to perform some part of the logic in the database and only send the result to the business tier.

There are multiple ways to perform logic in the database. You can do a lot of things with plain SQL and, if that is not enough, you can still call database-specific functions and stored procedures. Here I will have a closer look at stored procedures or, to be more precise, at the way you can call them.
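As a rough illustration of why this can pay off, the following plain-Java sketch compares summing values on the Java side with letting the database aggregate them. It is a simulation, not JPA code: `FakeReviewTable`, `DatabaseSideProcessingDemo` and the `rating` column are hypothetical, and the table only counts how many rows it would have to ship to the application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table of review ratings that counts the rows it sends to the client.
class FakeReviewTable {
    private final List<Integer> ratings = new ArrayList<>();
    private long rowsTransferred = 0;

    FakeReviewTable(int rows) {
        for (int i = 0; i < rows; i++) {
            ratings.add(1 + i % 5); // ratings cycle through 1..5
        }
    }

    // Models: SELECT rating FROM review (every row crosses the wire)
    List<Integer> selectAllRatings() {
        rowsTransferred += ratings.size();
        return new ArrayList<>(ratings);
    }

    // Models: SELECT SUM(rating) FROM review (a single result row)
    long selectSumOfRatings() {
        rowsTransferred += 1;
        long sum = 0;
        for (int rating : ratings) {
            sum += rating;
        }
        return sum;
    }

    long getRowsTransferred() {
        return rowsTransferred;
    }
}

class DatabaseSideProcessingDemo {
    // Java-side processing: fetch all rows, then aggregate in the application.
    static long sumInJava(FakeReviewTable table) {
        long sum = 0;
        for (int rating : table.selectAllRatings()) {
            sum += rating;
        }
        return sum;
    }

    // Database-side processing: only the aggregated result is transferred.
    static long sumInDatabase(FakeReviewTable table) {
        return table.selectSumOfRatings();
    }

    public static void main(String[] args) {
        FakeReviewTable javaSide = new FakeReviewTable(1000);
        FakeReviewTable dbSide = new FakeReviewTable(1000);
        System.out.println(sumInJava(javaSide) + " via " + javaSide.getRowsTransferred() + " rows");
        System.out.println(sumInDatabase(dbSide) + " via " + dbSide.getRowsTransferred() + " rows");
    }
}
```

Both approaches compute the same sum, but the Java-side variant transfers every row while the database-side variant transfers only one.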

There was no real support for it in JPA 2.0. Native queries were the only way you could call a stored procedure. This was changed with the introduction of @NamedStoredProcedureQuery and the more dynamic StoredProcedureQuery in JPA 2.1. In this post, I will focus on the annotation based definition of stored procedure calls with @NamedStoredProcedureQuery. I wrote more about the dynamic StoredProcedureQuery on my blog.

As you can see in the following code snippet, the definition of a @NamedStoredProcedureQuery is pretty straightforward. You need to define the name of the named query, the name of the stored procedure in the database as well as the input and output parameters. In this example, I’m calling the stored procedure calculate with the input parameters x and y. I expect the output parameter sum. Other supported parameter modes are INOUT for parameters which are used for input and output, and REF_CURSOR to retrieve result sets.


@NamedStoredProcedureQuery(
	name = "calculate",
	procedureName = "calculate",
	parameters = {
		@StoredProcedureParameter(mode = ParameterMode.IN, type = Double.class, name = "x"),
		@StoredProcedureParameter(mode = ParameterMode.IN, type = Double.class, name = "y"),
		@StoredProcedureParameter(mode = ParameterMode.OUT, type = Double.class, name = "sum") })


The @NamedStoredProcedureQuery is used in a similar way to @NamedQuery. You need to provide the name of the query to the createNamedStoredProcedureQuery method of the entity manager to get a StoredProcedureQuery object for this query. This can then be used to set the input parameters with the setParameter methods and to call the stored procedure with the execute method afterwards.


StoredProcedureQuery query = this.em.createNamedStoredProcedureQuery("calculate");
query.setParameter("x", 1.23d);
query.setParameter("y", 4.56d);
query.execute();
Double sum = (Double) query.getOutputParameterValue("sum");

 

Final Thoughts

JPA makes it very easy to store and retrieve data from a database. While this is great to get a project started quickly and to solve the vast majority of its requirements, it also makes it easy to implement a very inefficient persistence tier. Some of the most common problems include using too many queries to get the required data, updating entities one by one and implementing all of the logic within Java.

The JPA 2.1 specification introduced several new features to address these inefficiencies, like entity graphs, criteria update and stored procedure queries. 

Have any comments or a better way to make sure JPA performance is great? Share your recipes with me on Twitter: @thjanssen123!
