Apache Kafka is used by thousands of companies, including banks, financial exchanges, and tech companies, for critical market data systems and distributed databases. Ensuring correctness and performance in these demanding environments requires a set of practices that spans the software development lifecycle, from design to production. The goal of this blog post is to give insight into how Confluent and the Apache Kafka community handle testing and the other practices aimed at ensuring quality.

The broader trend in software is away from up-front design processes and towards a more agile approach, but for distributed systems a good design remains essential. Kafka therefore requires any major new feature or subsystem to come with a design document, called a Kafka Improvement Proposal (KIP), so that changes go through broad and open debate. The community also has a culture of deep and extensive code review that tries to proactively find correctness and performance issues.

Beyond review, a hierarchy of testing approaches is needed: unit tests, integration tests, and system tests, with the latter providing a good check of correctness in realistic environments. Confluent uses a framework called ducktape to create distributed environments, set up clusters, and introduce failures, making it easier to test and debug the system under realistic conditions; a minimal sketch of such a test appears at the end of this section.

Finally, the tight feedback loop between people running Kafka at scale and the engineers writing the code has long been an essential part of development, ensuring that the tools, configs, metrics, and practices for at-scale operation really work.
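To make the ducktape approach concrete, here is a minimal sketch of what a system test in this style can look like. The ZookeeperService and KafkaService helpers and their exact constructor arguments are based on the kafkatest package that ships in Kafka's tests/ directory and may differ across versions, so treat this as an illustration of the pattern rather than a definitive, copy-paste-ready test.

```python
# Sketch of a ducktape-style system test: spin up a small cluster,
# introduce a broker failure, and restart the failed node.
# Service names and arguments follow Kafka's kafkatest conventions
# and may vary between Kafka versions.
from ducktape.tests.test import Test
from ducktape.mark.resource import cluster

from kafkatest.services.zookeeper import ZookeeperService
from kafkatest.services.kafka import KafkaService


class BrokerBounceSketchTest(Test):
    """Start a 3-broker cluster, hard-kill one broker, then bring it back."""

    def __init__(self, test_context):
        super(BrokerBounceSketchTest, self).__init__(test_context=test_context)
        self.zk = ZookeeperService(test_context, num_nodes=1)
        self.kafka = KafkaService(
            test_context, num_nodes=3, zk=self.zk,
            topics={"test-topic": {"partitions": 1, "replication-factor": 3}})

    @cluster(num_nodes=4)
    def test_broker_failure(self):
        self.zk.start()
        self.kafka.start()

        # Introduce a failure: stop one broker without a clean shutdown.
        node = self.kafka.nodes[0]
        self.kafka.stop_node(node, clean_shutdown=False)

        # A real test would produce and consume around the failure and
        # assert that no acknowledged data is lost; here we simply restart
        # the node and let ducktape collect the logs for debugging.
        self.kafka.start_node(node)
```

The value of the framework is less in any single test and more in the harness itself: ducktape allocates the machines, deploys the services, drives the failure, and gathers logs from every node, so the same scenario can be run repeatedly across environments and releases.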