A Quadrillion Rows Across Three Clouds: Scaling LogHouse
Blog post from ClickHouse
LogHouse, the internal logging platform for ClickHouse Cloud, now manages 431 PiB of uncompressed data across 1.59 quadrillion rows, a 23-fold increase over two years. It runs in 30+ regions on three cloud providers and peaks at 80 GiB/s and 190 million rows per second. This growth rests on a geosharding strategy: writes stay local to their region, so each region scales independently and cross-region transfer costs stay low. The platform uses Async Inserts to batch small writes efficiently, and a three-level table hierarchy built on Distributed tables lets users run low-latency, cross-region and cross-cloud queries without needing to know the underlying topology. For reliable delivery, even during outages, data is buffered persistently in S3. LogHouse continues to evolve, with ongoing work on reducing memory consumption, making async inserts more durable, improving telemetry without customer impact, and expanding coverage to more OpenTelemetry traces and metrics.
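To make the idea behind batching small writes concrete, here is a minimal Python sketch of the general technique: many tiny inserts are held in a buffer and written as one larger batch once a row threshold or a time limit is reached. This is an illustration only, not ClickHouse's implementation (ClickHouse enables the real feature through its `async_insert` setting). The `AsyncInsertBuffer` class, its `max_rows` and `max_wait_s` parameters, and the explicit `now` timestamps are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Row = dict


@dataclass
class AsyncInsertBuffer:
    """Hypothetical buffer that coalesces small inserts into larger batches."""

    flush: Callable[[List[Row]], None]   # called once per flushed batch
    max_rows: int = 1000                 # assumed size threshold
    max_wait_s: float = 0.2              # assumed time threshold
    _rows: List[Row] = field(default_factory=list)
    _first_at: Optional[float] = None    # arrival time of the oldest buffered row

    def insert(self, rows: List[Row], now: float) -> None:
        """Accept a small insert; flush if a threshold is crossed."""
        if not rows:
            return
        if not self._rows:
            self._first_at = now
        self._rows.extend(rows)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Periodic timer hook so stale data is flushed without new inserts."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._rows:
            return
        full = len(self._rows) >= self.max_rows
        stale = now - self._first_at >= self.max_wait_s
        if full or stale:
            batch, self._rows = self._rows, []
            self._first_at = None
            self.flush(batch)


# Usage: four single-row inserts end up as two batches instead of four writes.
parts: List[List[Row]] = []
buf = AsyncInsertBuffer(flush=parts.append, max_rows=3, max_wait_s=0.2)

buf.insert([{"msg": "a"}], now=0.0)
buf.insert([{"msg": "b"}], now=0.05)
buf.insert([{"msg": "c"}], now=0.1)   # size threshold reached: flush 3 rows
buf.insert([{"msg": "d"}], now=1.0)
buf.tick(now=1.1)                      # only 0.1 s old: nothing flushed yet
buf.tick(now=1.3)                      # older than max_wait_s: flush 1 row

print([len(p) for p in parts])         # [3, 1]
```

The trade-off this sketch exposes is the one the summary alludes to: rows sitting in the buffer have not yet been written, so a crash before a flush can lose them, which is why durability of async inserts is listed as ongoing work.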