A Microoptimization Treat

Recently, I’ve been reading a lot of the blog posts and documentation around The Disruptor. This amazing piece of software is capable of handling an astonishingly high number of messages per second. At the core of the message-passing implementation is a data structure LMAX calls the Ring Buffer. While looking into the details of the ring buffer implementation, I found an article describing the use of the bitwise AND operator rather than the modulo operator to handle the end-wrapping (a trick that works only when the buffer size is a power of two). I was really intrigued by the performance bump that blog post demonstrated, so I thought I’d write a little test to verify it myself (you know, science and all that). I’ve got to say, the results are pretty amazing for such a small change.

It’s important to note that in order to get meaningful results, we need to perform each test with a cold start; otherwise the HotSpot VM will optimize our calls away to practically nothing.
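To make the trick concrete, here is a minimal sketch of the two wrapping strategies side by side. The class and method names (RingIndex, wrapMod, wrapAnd) are made up for illustration and are not taken from the Disruptor source. It assumes non-negative sequence numbers and, for the AND variant, a power-of-two buffer size.

```java
// Hypothetical helper for illustration; not taken from the Disruptor source.
final class RingIndex {
    private RingIndex() {}

    /** True when size is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int size) {
        return size > 0 && (size & (size - 1)) == 0;
    }

    /** Wraps a non-negative sequence number using the modulo operator. */
    static int wrapMod(long sequence, int size) {
        return (int) (sequence % size);
    }

    /** Wraps using a bit mask; only valid when size is a power of two. */
    static int wrapAnd(long sequence, int size) {
        if (!isPowerOfTwo(size)) {
            throw new IllegalArgumentException("size must be a power of two: " + size);
        }
        // size - 1 has all the low bits set, so the AND keeps only those bits.
        return (int) (sequence & (size - 1));
    }

    public static void main(String[] args) {
        int size = 8;
        // Print both results for sequences on either side of the wrap point.
        for (long seq = 6; seq < 11; seq++) {
            System.out.println(seq + " -> " + wrapMod(seq, size) + " / " + wrapAnd(seq, size));
        }
    }
}
```

With a size of 8, both methods map sequence 8 back to index 0 and sequence 9 to index 1. In a real ring buffer the power-of-two check would more likely happen once at construction time, with the mask stored in a field, rather than on every call as in this sketch.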

With that being said, here are the results from five runs on a quad-core AMD A8-3800 APU:

MOD Version

MOD: Writes completed in 4.4576917 seconds
MOD: Writes completed in 4.603921101 seconds
MOD: Writes completed in 4.894278753 seconds
MOD: Writes completed in 4.932036944 seconds
MOD: Writes completed in 5.015436034 seconds
Average: 4.78067 seconds

MOD: Reads completed in 3.798826493 seconds
MOD: Reads completed in 4.560272135 seconds
MOD: Reads completed in 3.765064306 seconds
MOD: Reads completed in 4.687657705 seconds
MOD: Reads completed in 3.818593922 seconds
Average: 4.12608 seconds

AND Version

AND: Writes completed in 2.096063647 seconds
AND: Writes completed in 2.100729785 seconds
AND: Writes completed in 2.095923685 seconds
AND: Writes completed in 2.087372368 seconds
AND: Writes completed in 2.090020781 seconds
Average: 2.09402 seconds

AND: Reads completed in 1.79915254 seconds
AND: Reads completed in 1.808045051 seconds
AND: Reads completed in 1.79790112 seconds
AND: Reads completed in 1.803007605 seconds
AND: Reads completed in 1.796324115 seconds
Average: 1.80089 seconds

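That’s a pretty big difference! Averaged over the five runs, the AND version is roughly 2.3 times faster for both writes and reads. Test code available here.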
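For readers who just want the shape of such a test, here is a rough sketch of how the write comparison might be structured. This is not the original test code: the class name, buffer size, and iteration count are all assumptions. The variant is picked from the command line so that each measurement can run in a fresh JVM, in keeping with the cold-start note above.

```java
// Hypothetical benchmark sketch; not the original test code.
final class WrapBenchmark {
    private WrapBenchmark() {}

    /** Writes `iterations` values, wrapping the index with modulo. */
    static long writeMod(long[] buffer, long iterations) {
        int size = buffer.length;
        for (long i = 0; i < iterations; i++) {
            buffer[(int) (i % size)] = i;
        }
        return checksum(buffer);
    }

    /** Same writes, wrapping with a mask; buffer length must be a power of two. */
    static long writeAnd(long[] buffer, long iterations) {
        int mask = buffer.length - 1;
        for (long i = 0; i < iterations; i++) {
            buffer[(int) (i & mask)] = i;
        }
        return checksum(buffer);
    }

    /** Sums the buffer so the writes have an observable result. */
    static long checksum(long[] buffer) {
        long sum = 0;
        for (long value : buffer) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        String mode = args.length > 0 ? args[0] : "AND";
        long iterations = 200_000_000L;  // assumed iteration count
        long[] buffer = new long[1024];  // assumed size, a power of two
        long start = System.nanoTime();
        long sum = mode.equals("MOD") ? writeMod(buffer, iterations)
                                      : writeAnd(buffer, iterations);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(mode + ": Writes completed in " + seconds
                + " seconds (checksum " + sum + ")");
    }
}
```

Invoking it once with MOD and once with AND as the argument gives each variant its own JVM. Printing the checksum keeps the JIT from discarding the loop as dead code. For anything beyond a quick sanity check, a dedicated harness such as JMH is a safer choice than hand-rolled timing.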

Please keep in mind that this is a micro-optimization, and it is only really beneficial when the array index wrapping sits in performance-critical code (for instance, The Disruptor’s Ring Buffer).

Enjoy!