Building a lakehouse with dbt and Trino
Blog post from Starburst
The series walks through building data pipelines with dbt and Trino that process data directly from operational systems, and is aimed at analytics engineers and data engineers looking to improve their workflows. It emphasizes dbt's transformation workflow, which lets teams develop analytics code using software engineering best practices such as modularity and CI/CD, turning SQL proficiency into production-grade pipelines. Trino's ability to federate data from many sources while supporting modern table formats like Apache Iceberg makes data accessible without duplicating it. Iceberg brings ACID transactions to the lakehouse architecture, along with features such as partitioning, schema evolution, and time travel, which make querying and managing the data more efficient. Illustrated through the use case of a data-driven e-commerce business, the series offers tutorials on setting up dbt and Trino and on using incremental models to refresh data, and includes a GitHub repository for hands-on exploration.
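To make the incremental-model idea concrete, below is a minimal sketch of a dbt model targeting Trino's Iceberg connector. It assumes the dbt-trino adapter and its properties model config, an Iceberg-backed catalog, and hypothetical source and column names (ecommerce.orders, order_id, order_date); the actual project setup is what the series' tutorials cover.

```sql
-- models/marts/fct_orders.sql (hypothetical model name)
-- Incremental dbt model materialized as an Iceberg table through the dbt-trino adapter.
{{
    config(
        materialized = 'incremental',
        properties = {
            "format": "'PARQUET'",
            "partitioning": "ARRAY['month(order_date)']"
        }
    )
}}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('ecommerce', 'orders') }}

{% if is_incremental() %}
-- On incremental runs, append only rows newer than what the target table already holds
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```

On the first run dbt creates the Iceberg table; later runs append only the new slice of rows instead of rebuilding everything. Because the table is Iceberg, recent Trino versions can also query its historical snapshots directly (for example with a FOR VERSION AS OF clause), which is the time travel capability the series highlights.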