Building a lakehouse with dbt and Trino
Blog post from Starburst
The series walks through building data pipelines with dbt and Trino that process data directly from operational systems, and is aimed at analytics engineers and data engineers looking to improve their workflows. It emphasizes dbt's transformation workflow, which lets teams develop analytics code using software engineering best practices such as modularity and CI/CD, turning SQL proficiency into production-grade pipelines. Trino's ability to federate data from many sources while supporting modern table formats like Apache Iceberg makes data accessible without duplicating it. Iceberg brings ACID transactions to the lakehouse architecture, along with features such as partitioning, schema evolution, and time travel, which make querying and managing the data more efficient. Illustrated through the use case of a data-driven e-commerce business, the series offers tutorials on setting up dbt and Trino and on using incremental models to refresh data, and includes a GitHub repository for hands-on exploration.
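To make the incremental-model idea concrete, below is a minimal sketch of a dbt model targeting Trino's Iceberg connector. It assumes the dbt-trino adapter and its properties model config, an Iceberg-backed catalog, and hypothetical source and column names (ecommerce.orders, order_id, order_date); the actual project setup is what the series' tutorials cover.

```sql
-- models/marts/fct_orders.sql (hypothetical model name)
-- Incremental dbt model materialized as an Iceberg table through the dbt-trino adapter.
{{
    config(
        materialized = 'incremental',
        properties = {
            "format": "'PARQUET'",
            "partitioning": "ARRAY['month(order_date)']"
        }
    )
}}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('ecommerce', 'orders') }}

{% if is_incremental() %}
-- On incremental runs, append only rows newer than what the target table already holds
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```

On the first run dbt creates the Iceberg table; later runs append only the new slice of rows instead of rebuilding everything. Because the table is Iceberg, recent Trino versions can also query its historical snapshots directly (for example with a FOR VERSION AS OF clause), which is the time travel capability the series highlights.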