Improving the performance of the Spring-Petclinic sample application (part 2 of 5)

This is part 2 of our 5-part series on improving the performance of the Spring-petclinic application. You can find the first part here.

Let’s profile our application

The error from part 1 is quite clear: we fill up all the server memory until the application slows down and crashes.

It’s time to launch our profiler! Our two favorite tools are:

  • JProfiler, which is the most complete, but also a little expensive. This is the one we usually recommend to our clients.
  • YourKit, which is easier to use and less expensive. This is the one we use for Tatami, as they provide free licenses for Open Source projects. If you have never used a profiler before, we recommend that you start with YourKit.

We used YourKit for this profiling session, mainly because we find its screenshots look better.

As we can see from this first screenshot, we have found our first culprit: Dandelion (a tag library used to display nice-looking HTML tables) is using most of our memory.

[Screenshot 1: YourKit memory profile of Spring-petclinic]

Dandelion is a great project, but this version uses too much memory. As we would love to use Dandelion again, we filed a bug on the project’s website, and the project’s developers were very quick to resolve it!

So the next version of Dandelion no longer has this problem, and you can safely use it in high-volume applications.

Solving the memory issue with Dandelion

Of course we will upgrade to the next version of Dandelion, which will resolve this issue, but for the moment, as we need to move forward, we will replace it with a classic HTML table, which is then beautified using JavaScript. We used jQuery DataTables, which provides a similar, but pure-JavaScript, solution:

[Source code]
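The original snippet is not reproduced here, but the idea is simple enough to sketch. A minimal, hypothetical example (the table id, columns and JSP expressions are illustrative, not the exact Petclinic markup):

    <%-- Plain HTML table, rendered server-side by the JSP --%>
    <table id="vets">
        <thead>
            <tr><th>Name</th><th>Specialties</th></tr>
        </thead>
        <tbody>
            <c:forEach items="${vets.vetList}" var="vet">
                <tr>
                    <td><c:out value="${vet.firstName} ${vet.lastName}"/></td>
                    <td><c:out value="${vet.specialties}"/></td>
                </tr>
            </c:forEach>
        </tbody>
    </table>

    <script>
        // Enhance the static table on the client side: sorting, pagination
        // and filtering now happen in the browser, not in server memory.
        $(document).ready(function () {
            $('#vets').dataTable();
        });
    </script>

All the heavy lifting moves to the browser; the server only renders a plain table.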

We then ran our tests again: throughput now climbs to 560 req/sec and then drops again… The application now fails at 3000-4000 users. We have just pushed our memory limit further out, but as soon as the heap space fills up, the whole application starts to fail again.
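As a reminder, the stress test itself is unchanged from part 1. For readers following along, a typical non-GUI JMeter run looks like this (the test plan file name is hypothetical):

    # -n: headless mode, -t: test plan, -l: results file for later analysis
    jmeter -n -t petclinic-stress-test.jmx -l results.jtl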

This is already a big improvement, but it looks like we still have a memory problem. Let’s fire up YourKit again:

[Screenshot 2: YourKit memory profile of Spring-petclinic]

The heap memory is mostly used by “org.apache.catalina.session.StandardManager”, the Tomcat class that manages HTTP sessions. This means the HTTP sessions are using all the free heap space, until the JVM cannot handle connections anymore.
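If you do not have a profiler at hand, you can reach a similar diagnosis with the standard JDK tools; for instance, a live-object histogram (only the PID is yours to fill in):

    # Print a histogram of live heap objects, biggest consumers first
    jmap -histo:live <pid> | head -20

A profiler such as YourKit simply makes it much easier to see who is holding on to those objects.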

Going stateless

We have fallen into a classic pitfall of Web application design: storing stateful data on the server prevents the application from scaling up.

In our application, it is rather easy to become stateless:

[Source code]
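Again, the exact diff is not shown here, but a minimal sketch of the pattern helps. Assuming a Spring MVC controller that previously kept the Owner in the HTTP session via @SessionAttributes (the class and method names are illustrative, not the exact Petclinic code):

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    // Before: @SessionAttributes("owner") kept the Owner in the HTTP session
    // between requests. Removing it is what makes the controller stateless.
    public class OwnerController {

        private final ClinicService clinicService;

        @Autowired
        public OwnerController(ClinicService clinicService) {
            this.clinicService = clinicService;
        }

        @RequestMapping(value = "/owners/{ownerId}/edit", method = RequestMethod.GET)
        public String initUpdateForm(@PathVariable("ownerId") int ownerId, Model model) {
            // Reload the entity from the database on every request instead of
            // relying on a copy stored in the user's HTTP session.
            model.addAttribute("owner", this.clinicService.findOwnerById(ownerId));
            return "owners/createOrUpdateOwnerForm";
        }
    }

The trade-off is explicit: every request pays for a (cheap, cacheable) database read, and in exchange the server keeps no per-user state at all.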

For this project, going stateless is mostly a matter of reloading data from the database instead of using the HTTP session as a kind of cache. Of course, things are not always that easy, for instance when you manage “conversations” in a business application. Our goal is to lower the amount of data stored in the user’s HTTP session, as it is one of the main scalability issues we encounter: here we are rather lucky, as we can remove all of this data.

With those modifications done, let’s launch our stress test again: YourKit confirms that we no longer use the HTTP session, and we can now handle the load without any problem. The biggest object in memory is now “org.sonatype.aether.util.DefaultRepositoryCache”, which comes from Maven (remember that we launch the application with “mvn clean tomcat7:run”).

The application can now handle our 500 threads doing 10 loops for the first time. However, our results are not perfect: we can serve 532 req/sec, but we still have 0.70% HTTP errors.

This result is slightly lower than what we got during the previous step (we reached 560 req/sec before failing), as we now read more data from the database instead of using the HTTP session as a cache.

Tuning Tomcat

We have HTTP errors, distributed across all pages: this is a classic problem with Tomcat, which uses blocking I/O by default. Let’s use the new Tomcat NIO connector:

[Source code]
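The exact configuration is not reproduced here. Since we launch the application with “mvn clean tomcat7:run”, the change goes into the tomcat7-maven-plugin configuration; a sketch of what it should look like (double-check the protocol parameter against your plugin version):

    <plugin>
        <groupId>org.apache.tomcat.maven</groupId>
        <artifactId>tomcat7-maven-plugin</artifactId>
        <configuration>
            <!-- Use the non-blocking NIO connector instead of the
                 default blocking (BIO) HTTP/1.1 connector -->
            <protocol>org.apache.coyote.http11.Http11NioProtocol</protocol>
        </configuration>
    </plugin>

On a standalone Tomcat, the equivalent change is to set protocol="org.apache.coyote.http11.Http11NioProtocol" on the <Connector> element in conf/server.xml.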

(Many thanks to Olivier Lamy for this configuration, which was not explained in the official documentation!)

Now we have no HTTP errors at all, and we are able to handle 867 req/sec.

Conclusion of part 2

Now the application is starting to work! We can handle our 5000 users without any error at all, and the performance is rather good, at 867 req/sec.

In part 3, we will see that we can do even better.


You can find the other episodes of this series here: part 1, part 3, part 4 and part 5.

  • Massinissa

    Hi,

    Could you tell me more about the benefits brought by the NIO connector?

    I heard that, in some cases, the NIO connector is less performant than BIO.

    In most cases, increasing acceptCount (plus some tuning on the executor) makes the BIO connector run better.

    http://stackoverflow.com/questions/4260470/drawbacks-of-tomcat-http11nioprotocol

    • Julien Dubois (http://twitter.com/juliendubois)

      Yes, I’ve also seen that NIO is sometimes less performant.

      However, my problem here is scalability, not pure performance: with the NIO connector I have no problem at all handling a lot of users. With the BIO connector, I need one thread per request, which is quickly going to be a problem in this use case.

      It really depends on whether you expect a few users and want the “best” performance for them (BIO might be better), or a lot of users (NIO will probably always be better in that case).

      Anyway, as you can see here, I have better performance and better scalability: I go from 532 req/sec with a few errors to 867 req/sec with no errors. So my decision was in fact quite easy. It might change depending on the type of application you are building (if you have some huge SQL queries, for instance) and on the number of users you have: that’s why testing is really important.

      • Massinissa

        Your test proves the efficiency of the NIO connector in your case.

        But I think that both the NIO and BIO connectors create one thread per request (I can see it in a thread dump). I think the optimization is that in NIO, threads which are not handling a request are not in a blocked state and can perform other tasks, like parsing HTTP headers.

        I’m not sure, but in the brief comparison given by Tomcat you can see that some operations of the NIO connector are performed asynchronously.

        http://tomcat.apache.org/tomcat-7.0-doc/config/http.html#Connector_Comparison

        Another benefit of the NIO connector is polling, meaning the connector can accept sockets faster by dedicating one thread per processor to accepting sockets.

        On my Core 2 Duo configuration, I can see that I have two threads in charge of polling sockets.

        I’ll continue to look into the main benefits of the NIO connector, because it’s a bit tricky and under-documented.

        Thank you for your answer

        • Massinissa

          After much research, reading and benchmarking (to understand the real behavior), I think I have found the main difference between the NIO and BIO connectors.

          Contrary to most assumptions, both of these connectors create a “thread per request”.

          But the big difference is that with the NIO connector, a thread that has finished handling a request goes back to the thread pool, whereas a thread created by the BIO connector, after finishing a request, keeps waiting for subsequent requests on the same HTTP connection (in the case of keep-alive connections). These threads wait until the timeout (observed by another thread) is reached, and only then go back to the pool. That implies wasted CPU cycles and resources. In the case of the BIO connector, I prefer to call the scheme “thread-per-HTTP-connection” or “thread-per-HTTP-connection-until-timeout”. :)

          Finally, I agree with your statement that BIO will probably be better in the case of few concurrent clients or requests, and NIO better for strong concurrency requirements.