Thoughts on performance

Performance can make or break a piece of software; nobody puts up with an unresponsive client UI or a slow back-end server in today’s age of software abundance and choice. Despite this, performance is often overlooked until late in the release cycle and doesn’t get the attention it deserves. This might not be a big deal in a single release cycle, but after a few of them, you can end up with a slow-moving giant that nobody knows how to fix instead of the lean, fast machine you used to have. At this point, you either accept what you have or take the hit and go through the painful process of profiling, analyzing, fixing, and in some cases redesigning. How does software end up like this in the first place? I can think of four reasons.

First, it’s not easy to sell performance. New features, especially visual ones that people can see and play with, are often much easier to market and sell than subtle yet more important features like performance. Two more features in a release look better than, for example, a 20% increase in throughput, so performance is not treated as a proper feature but as something to check at the end of the release cycle. As a result, performance does not get the time and resources it needs.

Second, it’s not easy to reason about performance. You need to define which metrics are being measured in the name of performance, what qualifies as acceptable performance, and which use cases are performance-sensitive. This requires a thorough understanding of the software and the use cases around it. It’s also hard to get the scope of performance work right: it’s either too broad to implement or too narrow to produce anything useful.
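
To make the first two definitions concrete, here is a minimal Java sketch, assuming the metric is a latency percentile and that "acceptable" means staying under a fixed budget. The class and method names (LatencyBudget, percentile, meetsBudget), the nearest-rank percentile method, and the sample numbers are illustrative choices, not anything prescribed in this post.

```java
import java.util.Arrays;

/**
 * Illustrative sketch: a performance requirement expressed as a metric
 * (a latency percentile) plus an acceptance threshold (a budget in ms).
 */
public class LatencyBudget {

    /** Nearest-rank percentile of the samples, with p in (0, 100]. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Nearest-rank method: the 1-based rank is ceil(p * n / 100).
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** True if the p-th percentile latency is within the budget. */
    static boolean meetsBudget(long[] samplesMillis, double p, long maxMillis) {
        return percentile(samplesMillis, p) <= maxMillis;
    }

    public static void main(String[] args) {
        // Made-up latencies (ms) for one agreed-upon use case.
        long[] samples = {12, 15, 11, 40, 13, 14, 12, 90, 16, 13};
        System.out.println("p90 = " + percentile(samples, 90) + " ms");                 // p90 = 40 ms
        System.out.println("p90 within 50 ms budget: " + meetsBudget(samples, 90, 50)); // true
    }
}
```

Note that the same samples fail a 50 ms budget at p99 (the 90 ms outlier), which is exactly why the choice of metric has to be agreed upon up front.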

Third, performance work is hard, sometimes harder than implementing the software itself. It usually needs additional tools outside the software itself in order to write tests that simulate the agreed-upon use cases and track numbers for those cases. In most places, there is simply not enough time left over from feature development to build those tools. Even if you have all these tools, you need time to run the complicated performance scenarios, and if the numbers don’t look right, you need time to find out why; the cause can be anywhere in the code. You also need to do this all over again every release cycle, or find time to implement automated performance tests that track performance for you. This is a lot of work.
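
A minimal sketch of the kind of automated check described above, assuming a scenario can be wrapped in a Runnable and that a baseline time from a previous release has been stored somewhere. The names (PerfRegressionCheck, medianNanos, isRegression), the 10% tolerance, and the 2 ms baseline are hypothetical. A hand-rolled timing loop like this is easily fooled by JIT effects; a dedicated harness such as JMH is the more reliable choice for micro-benchmarks.

```java
import java.util.Arrays;

/**
 * Illustrative sketch of an automated performance check: time a scenario,
 * then compare the result against a baseline stored from a previous release.
 */
public class PerfRegressionCheck {

    /** Median wall-clock time of a scenario in nanoseconds, measured after warm-up runs. */
    static long medianNanos(Runnable scenario, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            scenario.run(); // give the JIT a chance to compile hot paths first
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            scenario.run();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2]; // median for odd run counts (upper median for even ones)
    }

    /** True if current is more than `tolerance` (0.10 = 10%) slower than the baseline. */
    static boolean isRegression(long baselineNanos, long currentNanos, double tolerance) {
        return currentNanos > baselineNanos * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one of the agreed-upon use cases.
        Runnable scenario = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException(); // keeps the work observable
            }
        };
        long baselineNanos = 2_000_000; // hypothetical baseline (2 ms) from a previous release
        long currentNanos = medianNanos(scenario, 20, 51);
        System.out.println("median = " + currentNanos + " ns, regression = "
                + isRegression(baselineNanos, currentNanos, 0.10));
    }
}
```

Using the median rather than the mean makes the check less sensitive to a single slow run caused by, say, a GC pause; a build could fail whenever isRegression returns true.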

Fourth, performance is usually not tightly integrated with overall feature development. When a new feature is being developed, Engineering, QA, and Product Management focus heavily on the new capabilities it brings, but much less on two questions: 1. How does this feature perform by itself? 2. How does this feature affect overall performance? Ignoring #1 means a new feature gets designed and developed without performance in mind; ignoring #2, which is worse, means the overall performance of the existing system degrades.

Despite all this, professional software developers have an obligation to design and implement performant software, whatever the realities of the workplace are. I think that, with some effort, performance can be saved and maintained across release cycles by following a few guidelines that I hope to share in a future post.

Performance and heap size

You have a machine with lots of memory, your Java/J2EE application is the only one running on it, and you want high throughput and low latency from your application. I can guarantee that at some point you’ll need to decide how much heap to allocate for your application (using the -Xms and -Xmx flags).
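
Before tuning, it helps to confirm what the JVM is actually using. This minimal sketch prints the figures exposed by java.lang.Runtime; the class name HeapInfo is just for illustration. maxMemory() roughly corresponds to -Xmx (it may be reported slightly lower, or as Long.MAX_VALUE if there is no limit), and totalMemory() is the heap currently committed, which starts near -Xms and changes when the JVM resizes the heap.

```java
/**
 * Minimal sketch: print the heap sizes the running JVM reports,
 * to confirm that -Xms/-Xmx took effect.
 */
public class HeapInfo {

    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (governed by -Xmx).
        System.out.println("max   = " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently committed (starts near -Xms, changes on resize).
        System.out.println("total = " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): unused part of the committed heap.
        System.out.println("free  = " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

For example, running `java -Xms1g -Xmx1g HeapInfo` should report max and total at or near 1024 MiB, while running it with no flags shows the platform defaults.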

You can start with no min/max heap size, let the JVM start with the default min heap for your platform, and let it adjust the heap size as needed. My observation is that resizing the heap is expensive and affects performance, especially if your application is running under load. You can choose a small min heap, so that at least the initial heap is chosen by you rather than the JVM, and a larger max heap, but this doesn’t buy you much: if your application needs more memory than the min heap, the JVM has to resize the heap again, and you have the same performance problem as before. You can choose a largish min heap (something more than your application will need) and an equal or larger max heap. This sounds like a good idea at first; a JVM with plenty of memory to work with should not cause any problems, right? Wrong. When more heap is allocated than the application needs, garbage collection (GC) seems to get lazy. Of course, GC can be tuned with other JVM options, but in general a large heap results in lazy GC, because the JVM can afford to be lazy about collecting in high-throughput scenarios. When GC is lazy, minor collections kick in less frequently and eventually turn into full collections. Anyone who has looked even briefly into GC details (using the -verbose:gc and -XX:+PrintGCDetails options) knows that a full GC takes a while and definitely affects the latency of your application while it runs.
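
Besides -verbose:gc, GC activity can be read from inside the application through the standard java.lang.management API. This sketch prints each collector's name, collection count, and accumulated collection time (in milliseconds) before and after a hypothetical allocation-heavy workload; the class name GcStats and the workload are illustrative. Collector names depend on the GC in use (for example, "PS Scavenge" and "PS MarkSweep" under -XX:+UseParallelGC), which lets you tell young-generation collections from old-generation ones.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch: per-collector GC counts and accumulated GC time,
 * read through the standard management API.
 */
public class GcStats {

    /** Total collections across all collectors; a count of -1 means "undefined" and is skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    static void print() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-20s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }

    public static void main(String[] args) {
        print();
        // Hypothetical workload: lots of short-lived garbage, a small fraction retained.
        List<byte[]> keep = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            if (i % 1000 == 0) {
                keep.add(chunk);
            }
        }
        print();
        System.out.println("retained chunks: " + keep.size());
    }
}
```

Running the same program with different -Xms/-Xmx values and comparing the counts and times is a quick way to check the "lazy GC" behavior described above on your own JVM.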

So, what can you do? Setting the min and max heap to what your application actually needs is an obvious first step. To find out how much memory your application really needs, run it through a profiler (e.g. YourKit, VisualVM). Once you have that value, set your min and max heap slightly higher than it. By doing that, you take JVM heap resizing out of the equation, and since the max heap is a reasonable value (around what the application needs), GC won’t get lazy and full GCs will be less frequent.
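
As a rough, in-process complement to a profiler, the sketch below sums the peak usage of the JVM's heap memory pools and turns it into equal -Xms/-Xmx values with some headroom. The 20% headroom, the class name HeapSizing, and the rounding up to whole MiB are assumptions for illustration. Because each pool reaches its peak at a different time, and peaks include garbage not yet collected, the sum overestimates what the application needs; it is only meaningful after the application has been driven through its real load scenarios.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical sketch: derive equal -Xms/-Xmx values from measured
 * peak heap usage plus a headroom factor.
 */
public class HeapSizing {

    /** Sum of peak "used" bytes across heap pools (a rough upper bound). */
    static long peakHeapUsedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    /** Measured bytes plus headroom (0.20 = 20%), rounded up to whole MiB, as equal min/max flags. */
    static String suggestFlags(long measuredBytes, double headroom) {
        long mib = 1024L * 1024L;
        long target = (long) Math.ceil(measuredBytes * (1.0 + headroom));
        long targetMiB = (target + mib - 1) / mib;
        return "-Xms" + targetMiB + "m -Xmx" + targetMiB + "m";
    }

    public static void main(String[] args) {
        // In a real run, call this after the application has gone through its load scenarios.
        long peak = peakHeapUsedBytes();
        System.out.println("peak heap used (approx.) = " + peak / (1024 * 1024) + " MiB");
        System.out.println("suggested: " + suggestFlags(peak, 0.20));
    }
}
```

Setting -Xms equal to -Xmx follows the advice above of taking heap resizing out of the equation; the headroom is what "slightly higher" turns into here.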

Of course, there are many other JVM options that affect performance: options that select the GC algorithm, such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, and -Xincgc; options that control how many threads GC uses, such as -XX:ParallelGCThreads; and options that control how much time GC may take, such as -XX:GCTimeRatio and -XX:MaxGCPauseMillis. These all come into play, and maybe we’ll cover them in another post.
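
Because it is easy to lose track of which of these options a given deployment actually runs with, this sketch reads the JVM's input arguments through RuntimeMXBean.getInputArguments() and keeps the ones that look GC-related. The filter (heap sizing flags, -verbose:gc, -Xincgc, and any -XX: option containing "GC") is a simple heuristic chosen for this example, not an official classification, and the class name GcFlags is hypothetical. Note also that some options listed above are gone from newer JDKs: -Xincgc was removed in JDK 9 and the CMS collector in JDK 14.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: list the GC-related options the current JVM was started with.
 */
public class GcFlags {

    /** Keep heap-sizing flags, -verbose:gc, -Xincgc, and -XX: options whose name contains "GC". */
    static List<String> gcRelated(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-Xms") || a.startsWith("-Xmx")
                        || a.equals("-verbose:gc") || a.equals("-Xincgc")
                        || (a.startsWith("-XX:") && a.contains("GC")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the args of main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC-related JVM arguments: " + gcRelated(jvmArgs));
    }
}
```

For example, `java -Xmx512m -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 GcFlags` should list those three arguments.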

LiveCycle Data Services Performance Brief

Are you interested in learning more about the performance characteristics of the messaging infrastructure provided by LCDS? If you have performance-related questions such as “How does the number of concurrent clients affect latency?” or “How does message size affect latency?”, or more fundamental questions like “Why should I care about NIO-based endpoints in LCDS?” or “What’s the latency overhead of using LCDS Edge Server?”, then check out the Adobe LiveCycle Data Services ES2 Performance Brief (pdf) on the LiveCycle Developer Center. Not only does it answer a number of questions like these, but it also comes with tools and instructions for reproducing those results yourself if you don’t believe us 🙂

Real-time systems with Java

I came across a six-part series of articles from IBM on the challenges of using Java to build systems that meet real-time performance requirements. The articles do a great job of outlining the challenges and also introduce IBM’s real-time VM, which tries to address these problems. A great read for anyone even remotely interested in real-time applications.

Real-time Java, Part 1: Using Java code to program real-time systems
Real-time Java, Part 2: Comparing compilation techniques
Real-time Java, Part 3: Threading and synchronization
Real-time Java, Part 4: Real-time garbage collection
Real-time Java, Part 5: Writing and deploying real-time Java applications
Real-time Java, Part 6: Simplifying real-time Java development