
First dbt-trino data pipeline

Blog post from Starburst

Post Details
Company: Starburst
Date Published: -
Author: Przemek Denkiewicz
Word Count: 2,451
Language: English
Hacker News Points: -
Summary

In this post, the author walks through integrating dbt with Trino to build a data pipeline that takes advantage of Trino's query federation across diverse data sources. He gives a step-by-step guide to setting up the dbt-trino adapter, configuring a connection profile, and building the first models. External data objects are declared as dbt sources, which centralizes the management of data locations and makes later changes easier.

The post highlights how Trino lets dbt take on extract and load (EL) duties alongside its traditional transformation (T) role, enabling data integration across systems without a separate ingestion process. The author explains how dbt implicitly constructs a directed acyclic graph (DAG) to manage the pipeline's steps, then turns to more advanced SQL, using window functions to sessionize event data. He emphasizes that the choice of materialization matters for performance and introduces dbt tests to verify the correctness of the sessionization logic.

Finally, the article covers dbt macros, drawing in particular on the dbt-utils and trino-utils packages, the use of dbt seeds to incorporate static data, and the potential of incremental models to refresh only new data and improve efficiency.
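The sessionization idea the post implements with SQL window functions can be sketched in plain Python. This is an illustrative sketch only: the 30-minute inactivity timeout, the event layout, and the function name are assumptions, not details taken from the original dbt models.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout; the post does not specify a value.
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(events):
    """Assign a session number per user: a new session starts when the
    gap between consecutive events exceeds SESSION_TIMEOUT. `events` is
    a list of (user_id, timestamp) tuples, sorted by user and time —
    the same ordering a PARTITION BY / ORDER BY window clause gives."""
    result = []
    prev_user, prev_ts, session = None, None, 0
    for user, ts in events:
        if user != prev_user:
            session = 0                      # first session for this user
        elif ts - prev_ts > SESSION_TIMEOUT:
            session += 1                     # long gap: start a new session
        result.append((user, ts, session))
        prev_user, prev_ts = user, ts
    return result

events = [
    ("alice", datetime(2023, 1, 1, 9, 0)),
    ("alice", datetime(2023, 1, 1, 9, 10)),
    ("alice", datetime(2023, 1, 1, 11, 0)),  # >30 min gap: new session
    ("bob",   datetime(2023, 1, 1, 9, 5)),
]
print(sessionize(events))
```

In SQL this per-user gap detection maps to a `LAG(...) OVER (PARTITION BY user ORDER BY ts)` comparison followed by a running `SUM` of the "new session" flags.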