Iceberg v3: Getting Started
Blog post from Starburst
Apache Iceberg v3 is version 3 of the open-source Apache Iceberg table format. It aims to improve data governance, performance, and versatility in data lakes by introducing four key features: binary deletion vectors, richer data types, nanosecond-precision timestamps, and built-in row lineage. These enhancements allow for faster and more efficient data operations, particularly in high-throughput scenarios such as change data capture (CDC), and they add support for more complex data, including semi-structured data and geospatial analytics. Default values for new columns simplify schema evolution, reducing the need for complex ETL processes. The row lineage feature strengthens data governance by tracking a row's history, which supports compliance and auditing. Starburst Enterprise and Starburst Galaxy support Iceberg v3's capabilities, which provide improved query performance and expanded use cases for analytics on large datasets. Together, these additions give users tools to manage and analyze data with greater efficiency and reliability, and they strengthen Iceberg's position as a leading table format in the open data ecosystem.
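
To make the deletion-vector idea mentioned above more concrete, here is a minimal toy sketch in Python. It is only an illustration of the general concept (marking deleted row positions in a bitmap instead of rewriting the data file). The names `DataFile`, `DeletionVector`, and `scan` are hypothetical and are not part of any Iceberg or Starburst API, and the actual on-disk encoding defined by the Iceberg specification is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable data file (for example, Parquet)."""
    path: str
    rows: list


@dataclass
class DeletionVector:
    """Toy deletion vector: bit i is set when row position i is deleted.

    A plain Python int serves as the bitmap. This is an assumption made for
    illustration, not the encoding used by Iceberg.
    """
    bits: int = 0

    def delete(self, position: int) -> None:
        # Mark a row position as deleted without touching the data file.
        self.bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self.bits >> position) & 1 == 1

    def cardinality(self) -> int:
        # Number of row positions currently marked as deleted.
        return bin(self.bits).count("1")


def scan(data_file: DataFile, dv: DeletionVector) -> list:
    """Return the live rows, skipping positions marked in the deletion vector."""
    return [
        row
        for position, row in enumerate(data_file.rows)
        if not dv.is_deleted(position)
    ]


# Example: a CDC-style delete of the row at position 1.
data_file = DataFile(
    path="orders-0001.parquet",  # hypothetical file name
    rows=[{"id": 1}, {"id": 2}, {"id": 3}],
)
dv = DeletionVector()
dv.delete(1)
live_rows = scan(data_file, dv)
```

In this sketch the delete only flips one bit. The data file itself is never rewritten, and readers filter out deleted positions at scan time, which is the property that makes this approach attractive for frequent row-level changes.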