Scaling Kafka at Honeycomb

Post Details

Company

Honeycomb

Date Published

Nov. 30, 2021

Author

Liz Fong-Jones

Word Count

2,924

Company Posts That Month

5

Language

English

Source URL

www.honeycomb.io/blog/scaling-kafka-observability-pipelines

Summary

Honeycomb uses Apache Kafka to buffer telemetry data in its observability pipeline, with a focus on durability, reliability, and ease of operation. Over the years, Honeycomb has moved its Kafka brokers from c5.xlarge instances to AWS Graviton2-powered instances and adopted Confluent's tiered storage to reduce cost and improve scalability. These changes were driven by the need to preserve a 24- to 48-hour data buffer and to recover quickly from system failures. Experiments with other configurations, including AWS gp3 EBS storage and Graviton2 instances, ran into unforeseen saturation and reliability problems that made the clusters hard to stabilize. Honeycomb eventually settled on im4gn.4xlarge instances for its Kafka clusters, which offered a balanced ratio of compute, storage, and network resources and supported the company's rapid growth while lowering total cost of ownership. Honeycomb emphasizes leveraging existing expertise and infrastructure to manage Kafka clusters effectively, and reports a significant reduction in cost per megabyte of data throughput despite a substantial increase in data volume.
