Best practices for scaling Apache Kafka
Blog post from New Relic
Apache Kafka is a powerful distributed streaming platform used by companies such as New Relic, Uber, and Square to build scalable, high-throughput, real-time streaming systems. It provides scalability, low latency, high throughput, fault tolerance, flexibility, and durability, which makes it well suited to real-time data processing applications. Although Kafka simplifies how data streams are handled, it can become complex to operate at scale, particularly when consumers cannot keep up with incoming data or when systems fail to scale with demand. To address these operational complexities, New Relic offers best practices for managing Kafka clusters, organized around partitions, consumers, producers, and brokers. These practices include understanding data rates to set retention appropriately, using random partitioning, upgrading consumer versions, configuring producer acknowledgments and retries, monitoring broker performance, and managing partition leadership and log compaction. Throughout, the guidance stresses monitoring and adjusting configurations to maintain performance and reliability. For further learning, New Relic suggests resources such as the Kafka documentation and Confluent's online talks, and offers a Kafka monitoring integration through its observability platform.
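As a rough illustration of two of the producer-side practices mentioned above (acknowledgments with retries, and key-less partitioning so records spread across partitions), here is a minimal Java sketch. The broker address and topic name are placeholders, not values from the post, and the exact settings a team chooses would depend on its durability and latency requirements.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResilientProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Require acknowledgment from all in-sync replicas before a write is considered successful.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Let the client retry transient failures instead of dropping records.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // Idempotence prevents duplicate records when retries occur.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Sending with a null key lets the default partitioner spread records across
            // partitions rather than pinning them to a single partition by key hash.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("events", null, "example payload"); // "events" is a placeholder topic
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // a real application would alert or dead-letter here
                } else {
                    System.out.printf("Wrote to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        }
    }
}

In this sketch, acks=all trades a little latency for durability, while the null key leaves partition assignment to the default partitioner; producers that need strict per-key ordering would instead set a meaningful key and accept the resulting partition affinity.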