How we cut ClickHouse latency from 12s to 2s

Blog post from Mux

Post Details
Company: Mux
Date Published:
Author: Nidhi Kulkarni
Word Count: 2,516
Language: English
Hacker News Points: -
Summary

ClickHouse is known for handling large-scale data ingestion and aggregation efficiently, yet the Mux team hit a performance bottleneck when ingesting real-time data through Kafka, even though CPU utilization was only 60%. The issue turned out to be a trade-off between latency and throughput: latency was not properly measured at first, which led to delays in data appearing on real-time dashboards. Experimentation showed that the bottleneck was inefficient parsing of the ProtobufSingle format. By switching to a batched format and adjusting the Kafka flush interval, the team reduced ingestion latency from 12 seconds to 2-6 seconds while maintaining high throughput and manageable CPU usage. The changes show why both latency and throughput need to be monitored to avoid blind spots in performance metrics, and they offer lessons for others using ClickHouse’s Kafka Table Engine.
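The point about watching latency alongside throughput can be illustrated with a minimal Python sketch. It is not taken from the Mux post: the `Row` fields `event_time` and `inserted_time` and the `ingestion_stats` helper are hypothetical names, and the sketch assumes each row carries both the time the event was produced and the time it became queryable.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    event_time: float     # hypothetical: seconds when the event was produced
    inserted_time: float  # hypothetical: seconds when the row became queryable


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    index = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[index]


def ingestion_stats(rows):
    """Report throughput (rows/s) together with latency percentiles (s)."""
    if not rows:
        raise ValueError("need at least one row")
    latencies = sorted(r.inserted_time - r.event_time for r in rows)
    # Window runs from the first produced event to the last visible row.
    window = max(r.inserted_time for r in rows) - min(r.event_time for r in rows)
    throughput = len(rows) / window if window > 0 else float("inf")
    return {
        "throughput_rows_per_s": throughput,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p99_s": percentile(latencies, 99),
        "latency_max_s": latencies[-1],
    }


if __name__ == "__main__":
    # Nine rows arrive 2 s after the event; one straggler takes 12 s.
    sample = [Row(float(t), float(t) + 2.0) for t in range(9)]
    sample.append(Row(9.0, 21.0))
    print(ingestion_stats(sample))
```

Reporting the tail (p99, max) next to throughput is the design choice here: a pipeline can sustain a healthy row rate while individual rows still take many seconds to appear, which is the kind of blind spot the summary describes.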