
Data pipelines and data lakehouses

Blog post from Starburst

Post Details

Company: Starburst
Date Published:
Author: Evan Smith
Word Count: 2,122
Language: English
Hacker News Points: -
Summary

Data lakehouses evolved from traditional data lakes, gaining enhanced capabilities by integrating modern table formats like Apache Iceberg, Delta Lake, and Hudi, which enable advanced querying and data processing. The architecture typically centers on a three-part data pipeline made up of the Land, Structure, and Consume layers: data arrives in its raw state in the Land layer, is transformed and validated in the Structure layer, and finally becomes consumable in the Consume layer for use in business intelligence tools and data products. Ingestion can be batch or streaming, with technologies like Kafka, Flink, and Apache Spark handling these operations. SQL is central to constructing these layers, performing normalization, validation, enrichment, and technical transformation, with tools such as Starburst Galaxy and Starburst Enterprise providing integration and support throughout the pipeline. Data is consumed primarily through queries, business intelligence tools, and curated data products, enabling greater data visibility, discovery, and governance.
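To make the layered flow concrete, here is a minimal sketch of streaming ingestion into the Land layer using Flink SQL's Kafka connector; the topic name, broker address, and column schema are hypothetical placeholders, not details from the post.

```sql
-- Land layer (sketch): expose a Kafka topic as a streaming table in Flink SQL.
-- Topic, broker, and schema are illustrative placeholders.
CREATE TABLE orders_raw (
    order_id       STRING,
    customer_email STRING,
    total_amount   DOUBLE,
    order_ts       TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'broker:9092',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);
```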
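The Structure layer's SQL-driven normalization and validation could then look like the following Trino-style sketch, assuming a hypothetical Iceberg-backed catalog named lakehouse with land and structure schemas.

```sql
-- Structure layer (sketch): normalize and validate raw records into an
-- Iceberg table. Catalog, schema, and column names are illustrative.
CREATE TABLE lakehouse.structure.orders
WITH (format = 'PARQUET')
AS
SELECT
    CAST(order_id AS BIGINT)             AS order_id,
    lower(trim(customer_email))          AS customer_email,  -- normalization
    CAST(total_amount AS DECIMAL(12, 2)) AS total_amount,
    CAST(order_ts AS TIMESTAMP(6))       AS order_ts
FROM lakehouse.land.orders_raw
WHERE order_id IS NOT NULL               -- validation: drop malformed rows
  AND total_amount >= 0;
```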
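Finally, a Consume-layer data product can be as simple as a curated view over the structured table, ready for queries and BI tools; the names here are again illustrative.

```sql
-- Consume layer (sketch): a curated view that BI tools and ad hoc queries
-- can read directly as a data product.
CREATE VIEW lakehouse.consume.daily_revenue AS
SELECT
    date_trunc('day', order_ts) AS order_date,
    count(*)                    AS order_count,
    sum(total_amount)           AS total_revenue
FROM lakehouse.structure.orders
GROUP BY date_trunc('day', order_ts);
```

Publishing curated views like this is one lightweight way to expose data products while keeping validation and governance concentrated in the underlying Structure-layer tables.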