On Coordinated Omission
Blog post from ScyllaDB
Coordinated Omission (CO) is a measurement problem in performance benchmarking that produces misleadingly favorable results because outliers and request delays go unrecorded. The term was coined by Gil Tene. CO occurs when the measuring system unintentionally synchronizes with the system under test: while it waits on a slow response, it holds back the requests it was due to send, so the latency spike and the delayed or missed requests never show up in the results. The post illustrates this with an analogy in which delayed coffee runs cause a backlog, and it distinguishes open-model systems, such as web applications, from closed-model systems, such as assembly lines. It then outlines ways to mitigate CO: scheduling requests with or without a queue, and correcting or simulating latency. It stresses that a benchmark should set an explicit throughput target and an explicit number of worker threads. Finally, it gives practical guidance on testing with YCSB and ScyllaDB, noting that latency correction is needed for the reported numbers to reflect true system performance.
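
To make the idea of latency correction concrete, here is a minimal sketch in Python. It is not taken from the post: the function name `simulate`, the 10 ms schedule, and the service times are all illustrative assumptions. It models a single worker that is supposed to send one request every `interval_ms` but cannot send a new request until the previous one has completed. Uncorrected latency is measured from the moment the request was actually sent; corrected latency is measured from the moment it was scheduled to be sent.

```python
def simulate(service_times_ms, interval_ms):
    """Simulate one worker on a fixed request schedule (hypothetical model).

    Request i is *intended* to start at i * interval_ms, but the worker can
    only send it once the previous request has completed.
    Returns (uncorrected, corrected) latency lists in milliseconds.
    """
    uncorrected, corrected = [], []
    free_at = 0.0  # time at which the worker becomes available again
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms          # when the request should start
        actual = max(intended, free_at)     # when it really starts
        done = actual + service
        uncorrected.append(done - actual)   # what a naive benchmark records
        corrected.append(done - intended)   # includes time spent waiting to send
        free_at = done
    return uncorrected, corrected


# Ten requests, 1 ms each, except one 50 ms stall on the third request.
service = [1.0] * 10
service[2] = 50.0

uncorrected, corrected = simulate(service, interval_ms=10.0)

slow_uncorrected = sum(1 for x in uncorrected if x > 1.0)
slow_corrected = sum(1 for x in corrected if x > 1.0)
print("uncorrected:", uncorrected)
print("corrected:  ", corrected)
print("slow requests (uncorrected):", slow_uncorrected)
print("slow requests (corrected):  ", slow_corrected)
```

In this toy run the naive measurement reports a single slow request, while the corrected measurement also shows the requests that were delayed behind the stall. The same principle, measuring from the intended start time, is what latency correction in a real load generator is meant to capture.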