Automating the “Icehouse” – Fully-managed Open Lakehouse Platform on Starburst Galaxy
Blog post from Starburst
Starburst has launched Icehouse, a fully managed lakehouse platform on Starburst Galaxy built on the open-source technologies Trino and Iceberg. It aims to simplify data ingestion, management, and querying for organizations. The platform also supports Delta Lake and Apache Hudi, but it emphasizes Iceberg as the preferred table format for open lakehouses; Iceberg has been adopted by major companies such as Netflix and Apple for analytics and AI/ML use cases. The Icehouse implementation automates data engineering tasks that are otherwise complex to build and operate, including data ingestion, quality checks, and schema changes. It offers exactly-once processing and real-time transformation of data from sources such as Apache Kafka into Iceberg tables stored in Amazon S3. Automated data maintenance and optimization improve table storage and query performance without manual intervention. Through Trino, users can run SQL queries on Iceberg tables for analytics, while the governance layer, Gravity, provides access controls and observability features. Starburst is currently offering the platform in private preview and positions it as a cost-effective alternative to proprietary systems, without vendor lock-in.
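To make the "exactly-once" claim concrete, here is a minimal sketch of one common way to get that guarantee when the source may redeliver records: store the last committed source offset together with the data and skip anything at or below it. This is not Starburst's implementation, which the post does not describe. The `Table` class and its `commit` method are hypothetical, in-memory stand-ins for an Iceberg table and an atomic commit.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """In-memory stand-in for a table (hypothetical, for illustration only)."""
    rows: list = field(default_factory=list)
    # Highest source offset committed so far, per source partition.
    committed: dict = field(default_factory=dict)

    def commit(self, partition, batch):
        """Append only records newer than the last committed offset.

        `batch` is a list of (offset, value) pairs. Returns the number of
        records actually written.
        """
        last = self.committed.get(partition, -1)
        fresh = [(off, val) for off, val in batch if off > last]
        if not fresh:
            return 0  # the whole batch was a replay; nothing to write
        # In a real system the data and the new offset would be written
        # in one atomic commit; here both updates happen in one call.
        self.rows.extend(val for _, val in fresh)
        self.committed[partition] = max(off for off, _ in fresh)
        return len(fresh)


table = Table()
batch = [(0, "a"), (1, "b"), (2, "c")]

first = table.commit(0, batch)                    # normal delivery
replayed = table.commit(0, batch)                 # redelivery after a simulated failure
overlap = table.commit(0, [(2, "c"), (3, "d")])   # partially overlapping batch
```

After these three calls each record appears in `table.rows` once, even though offsets 0 through 2 were delivered more than once. The sketch relies on offsets increasing within a partition, which is an assumption about the source rather than something stated in the post.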