ScyllaDB’s Approach to Improving Performance for CPU-Bound Workloads
Blog post from ScyllaDB
ScyllaDB improved its handling of CPU-bound workloads by batching function calls, an approach inspired by the Staged Event-Driven Architecture (SEDA), which resulted in a notable increase in throughput and efficiency. The bottleneck was diagnosed with Flame Graphs and with hardware counters from the CPU's Performance Monitoring Unit (PMU), which pointed to front-end latency, particularly instruction cache misses. Introducing execution stages at key points in the request processing pipeline lets the same code run for many requests back to back, which improves instruction cache locality, raises instructions per cycle, and lowers the cache miss rate. As a result, the performance bottleneck shifted from the CPU front-end to other parts of the microarchitecture, without significantly affecting latency. The improvements so far are substantial, and further fine-tuning could yield additional gains, but such tuning must be done carefully to avoid increasing latency.
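
The batching idea behind execution stages can be illustrated with a small sketch. This is not ScyllaDB's implementation: the class `ExecutionStage`, the function `run_pipeline`, and the two toy stages ("parse" and "execute") are hypothetical names chosen for illustration. Python also cannot show the instruction cache effect itself. The sketch only shows the change in call order, where each stage runs for all queued requests before the next stage starts, instead of every request running through the whole pipeline on its own.

```python
from collections import deque


class ExecutionStage:
    """Hypothetical stage: queues calls to one function, runs them on flush."""

    def __init__(self, name, func, trace):
        self.name = name
        self.func = func
        self.trace = trace      # shared list recording the order of execution
        self.pending = deque()  # (argument, continuation) pairs awaiting a flush

    def submit(self, arg, then):
        # Defer the call instead of running it immediately.
        self.pending.append((arg, then))

    def flush(self):
        # Run every queued call consecutively, so the same code path is
        # executed repeatedly before control moves to a different one.
        batch, self.pending = self.pending, deque()
        for arg, then in batch:
            self.trace.append(self.name)
            then(self.func(arg))
        return len(batch)


def run_pipeline(requests):
    trace, results = [], []
    execute = ExecutionStage("execute", lambda tokens: len(tokens), trace)
    parse = ExecutionStage("parse", lambda text: text.split(), trace)

    for request in requests:
        # A parsed request is handed to the next stage, not executed inline.
        parse.submit(request, lambda tokens: execute.submit(tokens, results.append))

    # Drain the stages in pipeline order until no work is left.
    while parse.flush() + execute.flush() > 0:
        pass
    return trace, results


trace, results = run_pipeline(["get a", "put b c", "del d"])
print(trace)    # ['parse', 'parse', 'parse', 'execute', 'execute', 'execute']
print(results)  # [2, 3, 2]
```

Without stages, the trace would alternate between "parse" and "execute" for each request. The sketch also makes the latency trade-off visible: a request submitted to a stage waits until that stage is flushed, so larger batches mean longer waits.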