Scaling Apache Druid for Real-Time Cloud Analytics at Confluent

Post Details

Company

Confluent

Date Published

Nov. 8, 2021

Author

Zohreh Karimi

Word Count

2,806

Company Posts That Month

4

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.confluent.io/blog/scaling-apache-druid-for-real-time-cloud-analytics-at-confluent

Summary

Confluent uses Apache Druid, a column-oriented distributed database designed for real-time analytics, to provide fine-grained operational visibility into its cloud services. Druid helps manage the large volume of telemetry data from multi-tenant Apache Kafka clusters and other services running across Azure, GCP, and AWS. Confluent migrated to Druid from a non-time-series NoSQL database because it needed greater scalability, sub-second query latencies, and time-series data support; the move also addressed challenges such as high-cardinality metrics and data ingestion. Within Confluent's telemetry pipeline, Druid supports customer-facing features such as monitoring dashboards and the Confluent Cloud Metrics API, as well as internal operations such as cloud billing and Kafka cluster management. The architecture uses data tiering, query laning, and compaction to optimize performance, reduce costs, and handle high data volumes. Future plans focus on further automating deployments and expanding use cases.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	9	960	327	109	+7%
Kubernetes	5	1,218	176	69	-9%
Observability	3	857	161	53	+17%
Data Pipeline	1	244	58	26	-42%
