How a 40-Line Fix Eliminated a 400x Performance Gap
Blog post from QuestDB
A recent OpenJDK commit substantially speeds up how the JVM measures per-thread user CPU time on Linux. The old implementation behind `ThreadMXBean.getCurrentThreadUserTime()` read a file from the `/proc` filesystem and parsed its text, which made it 30x-400x slower than a `clock_gettime()` call. The cost came from the multiple system calls involved and from kernel lock contention under concurrent load. The new implementation applies a bit-manipulation trick to the clock ID to reach a Linux-specific feature, so that `clock_gettime()` returns user CPU time directly. With the file I/O and parsing gone, latency drops by roughly 40x. The change is expected to ship in JDK 26 and benefits any code that calls `ThreadMXBean.getCurrentThreadUserTime()`. The post also makes a case for revisiting old assumptions baked into code, and shows what reading the kernel source can reveal beyond what POSIX specifies.
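To get a rough feel for the per-call cost on a given JDK, the API named above can be timed in a simple loop. The following is a minimal sketch, not the benchmark from the post: the class name `UserTimeProbe` and the method `averageCallCostNanos` are made up for illustration, and a hand-rolled loop like this is far less rigorous than a proper harness such as JMH. Results will vary with JDK version, kernel, and machine load.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of the JDK or of the original post.
final class UserTimeProbe {

    // Written after each loop so the results of the calls are observably used.
    static volatile long sink;

    /**
     * Returns the average wall-clock cost, in nanoseconds, of one
     * getCurrentThreadUserTime() call, or -1 if the JVM does not support
     * CPU time measurement for the current thread.
     */
    static long averageCallCostNanos(int calls) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }

        long acc = 0;

        // Warm-up pass so the timed loop mostly runs JIT-compiled code.
        for (int i = 0; i < calls; i++) {
            acc += bean.getCurrentThreadUserTime();
        }
        sink = acc;

        // Timed pass: total wall-clock time divided by the number of calls.
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += bean.getCurrentThreadUserTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;

        return elapsed / calls;
    }

    public static void main(String[] args) {
        long cost = averageCallCostNanos(100_000);
        System.out.println(cost < 0
                ? "thread CPU time is not supported on this JVM"
                : "approx. ns per getCurrentThreadUserTime() call: " + cost);
    }
}
```

Running the same class on a JDK with and without the change, on the same Linux machine, should give a coarse picture of the difference the post describes. Treat the numbers as indicative only.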