Company
Date Published
Author
Sanjana Kaundinya
Word count
4153
Language
English
Hacker News points
3

Summary

Apache Kafka is a distributed, real-time data streaming system that uses a pull-based replication model for durability and availability. It stores messages in topics, which are logical groups of one or more partitions, with each partition being an append-only log that guarantees message ordering within the partition. Kafka provides various replication topologies to support multi-geo deployments, including stretched clusters, connected clusters, read replica deployments, global write replication scenarios, fan-in and fan-out architectures, and mixed deployment strategies. These topologies offer different trade-offs between cost, business requirements, use cases, regulatory compliance, resilience to disasters, security, and fault tolerance. Choosing the right topology depends on factors such as data loss tolerance, consumer offset translation needs, clustering size, network latency, and security requirements. With various learning resources available, including Confluent Developer courses and talks, Apache Kafka provides a robust solution for building globally available systems that can handle high availability and disaster recovery use cases.