
How we built Network Analytics v2

What's this blog post about?

This post describes how Cloudflare engineers rebuilt Network Analytics, the company's network analytics system, from scratch, using ClickHouse for data storage and Apache Flink for stream processing. The new system is highly scalable and fast, enabling real-time analysis of billions of network traffic events per day. The previous implementation, built on SQLite and Apache Kafka, suffered from performance problems as the volume of data grew.

The team chose ClickHouse for its query speed, high-availability features, and efficient handling of large datasets, and Apache Flink for its fault tolerance, support for event-time semantics, and easy integration with other systems. They also used AggregatingMergeTree tables in ClickHouse for aggregated data analysis, pre-computing values such as minimums and maximums in a single pass over the data.

As a result, Network Analytics can now accurately identify different types of network attacks in near real time, such as DDoS attacks that target fixed destination IPs or ports, even when they span multiple days. The system is deployed in production and handles hundreds of terabytes of data per day across thousands of servers worldwide.
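To make the pre-aggregation idea concrete, here is a minimal Python sketch of mergeable minimum/maximum states. It is not ClickHouse's implementation and is not taken from the original post. The names `MinMaxState`, `aggregate`, and `merge_parts` are hypothetical, and the per-destination-IP keys are an assumed example. It only illustrates how each batch of events can be reduced in one pass to a small state, and how two such states can be combined later without rereading the raw events.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class MinMaxState:
    """Hypothetical partial aggregate: enough to answer min/max without raw rows."""
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Fold one event into the running state (single pass over the data).
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "MinMaxState") -> "MinMaxState":
        # Combine two partial states into a new one; neither input is modified.
        return MinMaxState(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def aggregate(events: Iterable[Tuple[str, float]]) -> Dict[str, MinMaxState]:
    """Reduce (key, value) events to one state per key in a single pass."""
    states: Dict[str, MinMaxState] = {}
    for key, value in events:
        states.setdefault(key, MinMaxState()).add(value)
    return states


def merge_parts(
    a: Dict[str, MinMaxState], b: Dict[str, MinMaxState]
) -> Dict[str, MinMaxState]:
    """Merge two independently aggregated batches, key by key."""
    merged = dict(a)
    for key, state in b.items():
        merged[key] = merged[key].merge(state) if key in merged else state
    return merged
```

In this sketch, two batches aggregated separately (for example, packet sizes keyed by destination IP) can be passed to `merge_parts` to get the overall minimum and maximum per key, which is the property that makes pre-aggregated storage useful for queries over long time ranges.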

Company
Cloudflare

Date published
May 2, 2023

Author(s)
Alex Forster, Clément Joly

Word count
2926

Hacker News points
2

Language
English


By Matt Makai. 2021-2024.