Iceberg Snapshots Affect Storage, Not Performance

Blog post from Starburst

Post Details
Company: Starburst
Date Published:
Author: Lester Martin
Word Count: 859
Language: English
Hacker News Points: -
Summary

Apache Iceberg uses snapshots for version control, which enables features such as time-travel queries and rollback. Retaining snapshots does not slow queries, because a query reads only the files referenced by the current version unless it explicitly targets an older one. Retained snapshots do increase the table's storage footprint on the data lake, since every data file referenced by any retained version must be kept. Data lake storage is effectively unlimited but billed by volume, which makes strategies such as expiring older snapshots necessary to control cost. With Iceberg's Merge-on-Read strategy, updates and deletes are recorded in separate delete files instead of rewriting existing data files, so each change adds little storage beyond what is already in use. Compaction, by contrast, rewrites files, and storage grows until the snapshots that still reference the old files are expired. Regular maintenance, including snapshot expiration and orphan file cleanup, is therefore recommended to reclaim storage while keeping as much version history as is actually needed.
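The storage-versus-performance distinction can be illustrated with a small toy model. This is a sketch under stated assumptions, not Iceberg's API: the `Snapshot` class, the helper functions, the file names, and the file sizes are all hypothetical. It treats each snapshot as a set of referenced files, computes the billed footprint as the union of files across all retained snapshots, and computes the data a current-version query touches from the newest snapshot alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a table version: just the set of files it references."""
    snapshot_id: int
    files: frozenset


# Hypothetical file sizes in MB, chosen only for illustration.
FILE_SIZES_MB = {
    "a.parquet": 100,
    "b.parquet": 100,
    "b-deletes.parquet": 1,       # small delete file written by a merge-on-read delete
    "ab-compacted.parquet": 195,  # result of compacting a, b and the delete file
}


def footprint_mb(snapshots):
    """Storage billed: every file referenced by at least one retained snapshot."""
    referenced = set().union(*(s.files for s in snapshots))
    return sum(FILE_SIZES_MB[name] for name in referenced)


def scanned_mb(snapshots):
    """A query on the current version reads only the newest snapshot's files."""
    return sum(FILE_SIZES_MB[name] for name in snapshots[-1].files)


def expire_snapshots(snapshots, retain_last):
    """Keep only the newest `retain_last` snapshots (a simplified retention policy)."""
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    return snapshots[-retain_last:]


history = [
    # Initial load: two data files.
    Snapshot(1, frozenset({"a.parquet", "b.parquet"})),
    # Merge-on-read delete: existing files stay, one small delete file is added.
    Snapshot(2, frozenset({"a.parquet", "b.parquet", "b-deletes.parquet"})),
    # Compaction: the current version now points at a single rewritten file.
    Snapshot(3, frozenset({"ab-compacted.parquet"})),
]

print(footprint_mb(history[:2]))                   # 201: the delete added 1 MB
print(footprint_mb(history))                       # 396: old and compacted files coexist
print(scanned_mb(history))                         # 195: the query sees only the current version
print(footprint_mb(expire_snapshots(history, 1)))  # 195: old files become reclaimable
```

In this model the footprint nearly doubles after compaction while the amount scanned by a current-version query does not grow, and expiring the older snapshots brings the footprint back down. In a real deployment the cleanup is performed by the engine's maintenance commands (for example, the `expire_snapshots` and `remove_orphan_files` procedures offered by Spark and Trino for Iceberg tables) rather than by hand-written code like this.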