FAQ Apache Iceberg

Post Details

Company

Streamkap

Date Published

Aug. 15, 2025

Author

Oli Dinov

Word Count

609

Language

English

Hacker News Points

-

Source URL

streamkap.com/blog/faq-apache-iceberg

Summary

Apache Iceberg is an open table format that makes files in a data lake behave like traditional database tables. It adds ACID transactions, schema evolution, and time travel while the data stays in cloud object storage such as S3, ADLS, or GCS. It is cloud-native and vendor-neutral, and it integrates with tools such as Spark, Flink, Trino/Presto, and Kafka for both batch and streaming processing. Iceberg's metadata-driven architecture enables hidden partitioning and schema evolution without rewriting data, and it supports real-time data updates with low latency. Compaction improves query performance, and immutable snapshots make it possible to query historical table states. The format supports both Merge-on-Read and Copy-on-Write for row-level changes and preserves data integrity under concurrent writes, and its hierarchical metadata scales across different catalogs. This flexibility and breadth of integrations also let Iceberg handle change data capture and optimize streaming data pipelines.
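
The snapshot and concurrent-write behavior summarized above can be illustrated with a small toy model. This is a sketch in plain Python, not Iceberg's API: `ToyTable`, `Snapshot`, `commit_append`, and `files_as_of` are hypothetical names invented for illustration. It assumes a simplified rule in which a commit is rejected whenever the table has changed since the writer read it, whereas Iceberg itself retries commits against refreshed metadata and fails only on genuinely conflicting changes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable table version: an id plus the data files it contains."""
    snapshot_id: int
    data_files: Tuple[str, ...]


class ToyTable:
    """Toy stand-in for a table whose state is a chain of immutable snapshots."""

    def __init__(self) -> None:
        # Snapshot 0 is the empty table.
        self._snapshots: List[Snapshot] = [Snapshot(0, ())]

    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def commit_append(self, base_id: int, new_files: List[str]) -> Snapshot:
        # Simplified optimistic concurrency: the commit succeeds only if the
        # writer started from the snapshot that is still the latest one.
        if base_id != self.current().snapshot_id:
            raise RuntimeError(f"commit conflict: table changed since snapshot {base_id}")
        # Existing snapshots are never modified; a new one is added to the chain.
        snap = Snapshot(base_id + 1, self.current().data_files + tuple(new_files))
        self._snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id: int) -> Tuple[str, ...]:
        # "Time travel": read the file list of an older snapshot.
        for snap in self._snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(snapshot_id)


# Usage: two appends, then a read of the earlier table state.
table = ToyTable()
first = table.commit_append(0, ["a.parquet"])
second = table.commit_append(first.snapshot_id, ["b.parquet"])
earlier_files = table.files_as_of(first.snapshot_id)
```

Because earlier snapshots are never mutated, a reader pinned to `first` keeps seeing only `a.parquet` even after the second commit, which is the property that makes historical queries possible.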