
Log analytics using ClickHouse

What's this blog post about?

The post describes the challenges of maintaining logging pipelines at Cloudflare, a global network provider handling millions of HTTP requests per second. It highlights unpredictable log volume, semi-structured and contextual logs, and the write-heavy nature of centralized logging systems. The pipeline architecture consists of producers (applications), shippers, queues (Kafka), consumers (Logstash), and datastores (Elasticsearch). The post then covers the limitations Cloudflare encountered with its Elasticsearch clusters, including mapping explosion, lack of multi-tenancy support, cluster operational tasks, and garbage collection, and explains how these issues affect the pipeline's performance and cost-efficiency. The proposed solution is to use ClickHouse as an alternative datastore because of its columnar storage design, efficient indexing, compression capabilities, and linear scalability. The post also discusses optimizations for faster read/write throughput and better compression of log data: inserter scaling, batch size optimization, data modeling in ClickHouse, and data partitioning. It stresses the importance of primary key selection and introduces data skipping indexes as a way to improve query performance. The post concludes that using ClickHouse for logs has significantly improved CPU and memory consumption, storage efficiency, and query latency compared to Elasticsearch. It emphasizes that while both tools have their strengths, ClickHouse is particularly well suited to handling large volumes of log data and performing analytics tasks.

Company
Cloudflare

Date published
Sept. 2, 2022

Author(s)
Monika Singh, Pradeep Chhetri

Word count
2602

Hacker News points
38

Language
English


By Matt Makai. 2021-2024.