Apache Kafka's log compaction corrupts data. Here's how we fixed it
Blog post from Redpanda
The blog post describes a critical bug in Apache Kafka's log compaction that can leave broker replicas with inconsistent data. Compaction retains only the latest value for each key, uses tombstones to mark deletions, and removes transaction control batches once they expire. Problems arise when a broker falls behind: deleted or aborted data can reappear as committed, committed data can be hidden, or partitions can freeze. The root cause is a race condition between compaction and replication, in which a broker that misses critical markers ends up with inconsistent data. Redpanda Streaming introduces a coordinated compaction protocol that tracks two offsets, the Maximum Cleanly Compacted Offset (MCCO) and the Maximum Tombstone Removal Offset (MTRO), so that tombstones and transaction markers are removed only after all replicas are synchronized. The approach prioritizes data safety and still allows optimal cleanup decisions during prolonged node outages, so compaction does not compromise data integrity.
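The compaction rules summarized above can be illustrated with a toy model. This is a minimal sketch, not Kafka's or Redpanda's implementation: the `Record` type and both function names are hypothetical, and the rule that MTRO equals the smallest MCCO reported by any replica is an assumption made for illustration, since the summary does not state how MTRO is derived.

```python
from typing import Dict, List, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical log record; value=None marks a tombstone (deletion)."""
    offset: int
    key: str
    value: Optional[str]


def max_tombstone_removal_offset(mcco_by_replica: Dict[str, int]) -> int:
    """ASSUMPTION: tombstones are safe to drop only up to the offset that
    every replica has cleanly compacted, i.e. the minimum MCCO."""
    if not mcco_by_replica:
        raise ValueError("need at least one replica")
    return min(mcco_by_replica.values())


def compact(log: List[Record], mtro: int) -> List[Record]:
    """Keep the latest record per key. A tombstone is dropped only when
    its offset is at or below mtro; otherwise it is retained so that
    lagging replicas can still observe the deletion."""
    latest: Dict[str, Record] = {}
    for rec in log:
        latest[rec.key] = rec  # later records overwrite earlier ones
    kept = []
    for rec in sorted(latest.values(), key=lambda r: r.offset):
        if rec.value is None and rec.offset <= mtro:
            continue  # all replicas have compacted past this tombstone
        kept.append(rec)
    return kept


log = [
    Record(0, "a", "1"),
    Record(1, "b", "2"),
    Record(2, "a", None),  # tombstone deleting key "a"
    Record(3, "b", "3"),
]

# One replica has only compacted through offset 1, so the tombstone at
# offset 2 must be kept.
lagging = max_tombstone_removal_offset({"r1": 3, "r2": 1, "r3": 3})
kept_while_lagging = compact(log, lagging)

# Once every replica has caught up, the tombstone can be removed.
caught_up = max_tombstone_removal_offset({"r1": 3, "r2": 3, "r3": 3})
kept_when_caught_up = compact(log, caught_up)
```

In this model, dropping the tombstone while replica `r2` is still behind would let `r2` keep the old value for key `a` indefinitely, which is the kind of resurrected-data inconsistency the post attributes to uncoordinated compaction.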
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Real-time | 4 | 5,457 | 1,338 | 238 | -5% |