Rethinking Container Image Distribution to Eliminate Cold Starts
Blog post from Cerebrium
At Cerebrium, teams developing latency-sensitive AI systems, such as voice agents and real-time video avatars, encounter significant delays from slow container startup, caused primarily by lengthy image pull times. The issue stems from the tar+gzip layer format: tar was designed for sequential tape access in the 1970s, and a gzip stream cannot be read from an arbitrary offset, so every byte must be downloaded and unpacked before a container can start. This creates substantial bottlenecks, especially for large machine learning images that exceed 10 GB. The traditional OCI image format, while standardized, handles container startup poorly because it offers neither random access nor cross-layer deduplication.

To address these challenges, Cerebrium has implemented strategies such as lazy-loading, seekable archives, and chunk-based filesystems. These let a container start before the entire image is downloaded and fetch data on demand, which significantly reduces cold start times. The optimizations include splitting images into metadata indexes and data blobs and leveraging technologies like FUSE and EROFS. The result is a shorter time to first inference for AI applications and lower operational costs in high-demand environments.
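To make the idea of a metadata index plus on-demand data blobs concrete, here is a minimal sketch in Python. It is not Cerebrium's implementation: `ChunkStore`, `build_index`, and `LazyImage` are hypothetical names invented for illustration, an in-memory dictionary stands in for a remote registry, and the chunk size is tiny so the behavior is easy to follow. A real system would serve these reads through something like FUSE or EROFS rather than a Python method.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class ChunkStore:
    """Hypothetical stand-in for a remote blob store keyed by SHA-256 digest."""
    blobs: dict[str, bytes] = field(default_factory=dict)
    fetches: int = 0  # counts simulated network round trips

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical chunks hash to the same key, so they are stored once
        # even when they come from different files or layers.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        self.fetches += 1
        return self.blobs[digest]


def build_index(files: dict[str, bytes], store: ChunkStore,
                chunk_size: int) -> dict[str, list[str]]:
    """Split each file into fixed-size chunks; return path -> chunk digests."""
    index: dict[str, list[str]] = {}
    for path, data in files.items():
        index[path] = [
            store.put(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)
        ]
    return index


class LazyImage:
    """Serves byte ranges, fetching only the chunks a read touches."""

    def __init__(self, index: dict[str, list[str]], store: ChunkStore,
                 chunk_size: int) -> None:
        self.index = index
        self.store = store
        self.chunk_size = chunk_size
        self.cache: dict[str, bytes] = {}  # local chunk cache

    def read(self, path: str, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        digests = self.index[path]
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        buf = bytearray()
        for digest in digests[first:last + 1]:
            if digest not in self.cache:
                self.cache[digest] = self.store.get(digest)
            buf += self.cache[digest]
        start = offset - first * self.chunk_size
        return bytes(buf[start:start + length])


if __name__ == "__main__":
    # Two "layers" ship nearly identical libraries (made-up paths and data).
    files = {
        "/layer1/libfoo.so": b"AAAABBBBCCCC",
        "/layer2/libfoo.so": b"AAAABBBBDDDD",
    }
    store = ChunkStore()
    index = build_index(files, store, chunk_size=4)
    image = LazyImage(index, store, chunk_size=4)
    print(len(store.blobs))                      # unique chunks stored
    print(image.read("/layer1/libfoo.so", 4, 4))  # touches one chunk only
    print(store.fetches)                          # chunks fetched so far
```

In this toy setup the two files share their first two chunks, so the store holds four unique chunks instead of six, and a four-byte read pulls a single chunk rather than the whole image. The small index is all the container needs up front; the data arrives as reads demand it.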