The Hackathon Fix That Cut Our Storage Costs by 93%
Blog post from Cast AI
A project from Cast AI's internal hackathon led to a major overhaul of the company's Kubernetes cluster snapshot system, sharply reducing both storage use and snapshot processing time.

The existing system took a snapshot every 15 seconds. It generated more than a petabyte of data each month, much of it duplicated, which drove storage costs up.

The redesign, named Snapshots V2, combines three techniques: a custom binary format that supports selective loading, dictionary-based differential compression, and lazy loading with smart memory management. Together they cut storage space by 93%, saving more than $300,000 a year, and reduced snapshot processing time by 82%, all without disrupting existing operations.

The new system lowers storage and compute costs while keeping the high-frequency, detailed capture that Cast AI depends on for cost optimization, customer reporting, machine learning, and customer support.
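The post does not publish the Snapshots V2 format, so the following is only a rough illustration of why dictionary-based differential compression works so well on frequent snapshots. It is a swapped-in technique, not Cast AI's implementation, and it assumes snapshots are serialized as deterministic JSON. Each snapshot is compressed with Python's standard `zlib` module, using the previous snapshot as a preset dictionary (`zdict`). Bytes that have not changed since the last capture are encoded as short back-references and cost almost nothing. The helper names `encode_snapshot`, `compress_delta`, `decompress_delta`, and `make_snapshot` are hypothetical.

```python
import json
import zlib
from typing import Optional

# zlib's default window (wbits=15) is 32 KiB, so only that much of a preset
# dictionary can be referenced. We slice it identically on both sides.
ZLIB_WINDOW = 32 * 1024


def encode_snapshot(snapshot: dict) -> bytes:
    """Serialize deterministically so unchanged fields produce identical bytes."""
    return json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compress_delta(current: bytes, previous: Optional[bytes]) -> bytes:
    """Compress `current`, using `previous` (if any) as a zlib preset dictionary."""
    if previous is None:
        return zlib.compress(current, 9)  # first snapshot in a chain: no reference
    comp = zlib.compressobj(level=9, zdict=previous[-ZLIB_WINDOW:])
    return comp.compress(current) + comp.flush()


def decompress_delta(blob: bytes, previous: Optional[bytes]) -> bytes:
    """Invert compress_delta. The same previous snapshot must be supplied."""
    if previous is None:
        return zlib.decompress(blob)
    decomp = zlib.decompressobj(zdict=previous[-ZLIB_WINDOW:])
    return decomp.decompress(blob) + decomp.flush()


def make_snapshot(tick: int) -> dict:
    """Hypothetical toy cluster state: 200 pods, one of which changes per tick."""
    pods = [
        {"name": f"pod-{i}", "node": f"node-{i % 10}", "cpu_m": 250, "mem_mi": 512}
        for i in range(200)
    ]
    pods[tick % len(pods)]["cpu_m"] = 500
    return {"cluster": "demo", "tick": tick, "pods": pods}


if __name__ == "__main__":
    prev = encode_snapshot(make_snapshot(0))
    cur = encode_snapshot(make_snapshot(1))
    standalone = compress_delta(cur, None)
    delta = compress_delta(cur, prev)
    assert decompress_delta(delta, prev) == cur  # lossless round trip
    print(f"raw={len(cur)} standalone={len(standalone)} delta={len(delta)}")
```

When the script runs, the delta-compressed snapshot comes out much smaller than the same snapshot compressed on its own, because nearly all of its bytes already appear in the previous capture. A production system would likely need more than this sketch. zlib can reference only 32 KiB of history, so large cluster snapshots would call for a codec with a bigger window or trained dictionaries (Zstandard supports both). Periodic full snapshots would also help, so a reader never has to replay a long chain of deltas.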