Company
Date Published
Author
Zohreh Karimi
Word count
2806
Language
English
Hacker News points
None

Summary

Confluent provides fine-grained operational visibility through its cloud services by utilizing Apache Druid, a column-based distributed database designed for real-time analytics, which helps manage the vast amounts of telemetry data from multi-tenant Apache Kafka clusters and other services across Azure, GCP, and AWS. The migration to Druid from a non-time-series NoSQL database was driven by the need for increased scalability, sub-second query latencies, and time-series data support, addressing challenges such as high-cardinality metrics and data ingestion. Druid's integration into Confluent's telemetry pipeline facilitates customer-facing solutions like monitoring dashboards, Confluent Cloud Metrics API, and internal operations such as cloud billing and Kafka cluster management. The architecture includes techniques like data tiering, query laning, and compaction to optimize performance, reduce costs, and handle high data volumes, while future plans focus on further automating deployments and expanding use cases.