
Apache Kafka® simply explained

What's this blog post about?

Apache Kafka is a widely used event streaming platform: distributed, scalable, high-throughput, and low-latency, with a large ecosystem. It serves as a transport mechanism for messages across multiple systems or microservices. Being distributed, it runs on multiple servers and replicates data across them, so the system stays resilient when a server fails. Being scalable, it lets users add more servers as their system grows. The community and ecosystem include client libraries for many programming languages and a set of data connectors for integrating Kafka with existing external systems, so developers can start using Apache Kafka without reinventing the wheel.

Apache Kafka is commonly used where real-time events need to be processed, such as e-commerce projects that require immediate recommendations based on a user's latest purchases. It helps untangle data flows, simplifies handling of real-time data, and decouples subsystems. Rather than describing entities as static objects or aggregated facts stored in a database, Kafka describes them through continuously arriving events. This makes it possible to replay events, answer different kinds of questions about products and sales, and maintain an event-driven architecture.

Apache Kafka follows a push-pull model: producers create messages and push them into the cluster, while consumers pull, read, and process them. Producers and consumers can be written in different languages and run on different platforms, which keeps systems flexible and decoupled.

Data in Apache Kafka is organized into topics, named sequences of messages. A topic can be split into multiple partitions stored across multiple machines, enabling horizontal scaling. Replication happens at the partition level: partitions are copied across brokers, which ensures high availability and prevents data loss. Finally, Kafka connectors simplify connecting applications to Apache Kafka by integrating external data sources, such as databases and other tools, using pre-built connectors or custom ones.
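To make the push-pull model concrete, here is a minimal sketch of a producer and a consumer. It is illustrative only and not from the original post: it assumes the kafka-python client library, a broker reachable at localhost:9092, and a hypothetical topic named "purchases".

```python
# Minimal push-pull sketch (assumes kafka-python, a broker at localhost:9092,
# and an example topic called "purchases").
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer side: create a message and push it into the cluster.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("purchases", {"user_id": 42, "item": "headphones"})
producer.flush()

# Consumer side: pull messages from the same topic and process them.
# This loop blocks and keeps reading as new events arrive.
consumer = KafkaConsumer(
    "purchases",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(f"partition={message.partition} offset={message.offset} value={message.value}")
```

Because the producer and consumer only share the broker address and topic name, either side can be replaced or rewritten in another language without the other noticing, which is the decoupling the summary describes.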
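Similarly, the partitioning and replication described above are set when a topic is created. The sketch below is an assumption-laden illustration using kafka-python's admin client: the topic name, partition count, and replication factor are example values, and a replication factor of 2 requires at least two brokers in the cluster.

```python
# Illustrative topic creation (assumes kafka-python and a cluster with >= 2 brokers).
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Split the topic into three partitions for horizontal scaling, and keep two
# copies of each partition so a single broker failure loses no data.
admin.create_topics(
    new_topics=[NewTopic(name="purchases", num_partitions=3, replication_factor=2)]
)
```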

Company
Aiven

Date published
July 5, 2022

Author(s)
Olena Kutsenko

Word count
2400

Hacker News points
None found.

Language
English
