Company
Date Published
Author
Stanislav Kozlovski
Word count
5254
Language
English
Hacker News points
None

Summary

KIP-405 Tiered Storage in Kafka achieves near-local performance on remote data through prefetching and caching. The plugin reads from the object store with byte-ranged GET requests, so it can fetch individual chunks of a segment rather than the whole segment, and it fetches exactly one chunk per fetch request. Caching cuts round trips to the object store, and prefetching loads chunks that upcoming reads are expected to need into the cache ahead of time. The broker serves data from local disk first, and only one partition per fetch request is read from the remote store, a limitation tracked in KAFKA-14915. Local deletion is governed by the new local retention settings: a segment is removed from local disk under the local retention policy only after it has been successfully tiered. Orphaned segments are cleaned up once their max leader epoch falls out of the leader epoch checkpoint file. The offset index for a remote segment is fetched from the remote store and cached locally in a Caffeine W-TinyLFU cache.
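
To make the chunked, byte-ranged fetching concrete, here is a minimal sketch of how a byte position inside a remote segment could be mapped to the chunk that contains it, and how the chunks to prefetch next could be enumerated. It is an illustration only, not the plugin's code: the function names, the `CHUNK_SIZE` value, and the `prefetch_bytes` parameter are all assumptions made for this example. The only external fact it relies on is the standard HTTP `Range: bytes=start-end` header, whose bounds are inclusive.

```python
from typing import List, Tuple

# Assumed chunk size for illustration; the real value is a plugin setting.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_range(position: int, segment_size: int,
                chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Return the inclusive (start, end) byte range of the chunk holding `position`."""
    if not 0 <= position < segment_size:
        raise ValueError("position is outside the segment")
    start = (position // chunk_size) * chunk_size
    # The last chunk of a segment may be shorter than chunk_size.
    end = min(start + chunk_size, segment_size) - 1
    return start, end


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


def prefetch_ranges(position: int, segment_size: int, prefetch_bytes: int,
                    chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """List the chunks after the current one, covering up to `prefetch_bytes`."""
    _, current_end = chunk_range(position, segment_size, chunk_size)
    next_start = current_end + 1
    limit = min(segment_size, next_start + prefetch_bytes)
    ranges = []
    while next_start < limit:
        next_end = min(next_start + chunk_size, segment_size) - 1
        ranges.append((next_start, next_end))
        next_start = next_end + 1
    return ranges
```

In this sketch, a read at byte 120 of a 250-byte segment with 100-byte chunks resolves to the range 100-199, and the only chunk left to prefetch is the shorter final one, 200-249. Aligning every request to chunk boundaries is what lets a cache key on the chunk rather than on arbitrary byte offsets.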