Investigating Mysterious Kafka Broker I/O When Using Confluent Tiered Storage
Blog post from Honeycomb
Earlier this year, an upgrade from Confluent Platform 7.0.10 to 7.6.0 required converting tiered storage metadata files to a new format. The conversion process could not be parallelized, and because the files were large, it caused delays. A later, unexpected incident revealed unusually high read IOPS on one Kafka broker, which was traced back to these same metadata files. This led to the discovery that although Confluent's Tiered Storage feature offers "infinite retention," its metadata can become a scaling issue. Confluent support helped implement settings to clean up tiered storage metadata, which significantly reduced file sizes and improved broker start times. The investigation highlighted the importance of maintaining and updating infrastructure, including components that rarely cause issues, to prevent bottlenecks and inefficiencies.
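One low-tech way to spot this kind of growth is to list the largest files under a broker's data directory. The sketch below is a generic illustration and not the method used in the original investigation. The function name `largest_files` and the default path `/var/lib/kafka/data` are assumptions for the example, and it makes no claim about how Confluent names or lays out its tiered storage metadata files.

```python
import heapq
import os
import sys
from typing import List, Tuple


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # A live broker rolls and deletes files constantly;
                # skip anything that disappears during the scan.
                continue
    return heapq.nlargest(top_n, sizes)


if __name__ == "__main__":
    # Hypothetical default; pass the broker's real log directory as an argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/kafka/data"
    for size, path in largest_files(root):
        print(f"{size / (1024 * 1024):10.1f} MiB  {path}")
```

Running this periodically and comparing the output over time would show whether any files are growing without bound, which is the pattern described above.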