Best practices for scaling Apache Kafka
Blog post from New Relic
Apache Kafka is a powerful distributed streaming platform used by companies such as New Relic, Uber, and Square to build scalable, high-throughput, real-time streaming systems. It provides scalability, low latency, high throughput, fault tolerance, flexibility, and durability, which makes it well suited to real-time data processing applications. Although Kafka simplifies how data streams are handled, it can become complex to operate at scale, particularly when consumers cannot keep up with incoming data or when systems fail to scale with demand. To address these operational complexities, New Relic offers best practices for managing Kafka clusters, organized around partitions, consumers, producers, and brokers. These practices include understanding data rates to set retention appropriately, using random partitioning, upgrading consumer versions, configuring producer acknowledgments and retries, monitoring broker performance, and managing partition leadership and log compaction. Throughout, the guidance stresses monitoring and adjusting configurations to maintain performance and reliability. For further learning, New Relic suggests resources such as the Kafka documentation and Confluent's online talks, and offers a Kafka monitoring integration through its observability platform.
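As a rough illustration of two of the producer-side practices mentioned above (acknowledgments with retries, and key-less partitioning so records spread across partitions), here is a minimal Java sketch. The broker address and topic name are placeholders, not values from the post, and the exact settings a team chooses would depend on its durability and latency requirements.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResilientProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Require acknowledgment from all in-sync replicas before a write is considered successful.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Let the client retry transient failures instead of dropping records.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // Idempotence prevents duplicate records when retries occur.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Sending with a null key lets the default partitioner spread records across
            // partitions rather than pinning them to a single partition by key hash.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("events", null, "example payload"); // "events" is a placeholder topic
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // a real application would alert or dead-letter here
                } else {
                    System.out.printf("Wrote to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        }
    }
}

In this sketch, acks=all trades a little latency for durability, while the null key leaves partition assignment to the default partitioner; producers that need strict per-key ordering would instead set a meaningful key and accept the resulting partition affinity.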