Setting up Trino for dbt

Post Details

Company

Starburst

Date Published

Nov. 30, 2022

Author

Przemek Denkiewicz

Word Count

1,166

Company Posts That Month

14

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.starburst.io/blog/lakehouse-data-pipeline-dbt-trino/etl

Summary

Trino is a distributed SQL query engine for querying large datasets across heterogeneous data sources. It targets Online Analytical Processing (OLAP) workloads such as data warehousing and analytics and is not a general-purpose relational database. This guide, by Przemek Denkiewicz and Michiel De Smet of Starburst, walks through setting up Trino with dbt for lakehouse ETL, using Docker and Docker Compose to manage the multiple containers the setup needs. The components are Trino for executing distributed queries, PostgreSQL for a webshop database, MongoDB for clickstream data, and the Iceberg table format for the lakehouse, with the services configured and managed through a YAML file. The post highlights the MERGE statement, introduced in Trino version 393, which improves ETL/ELT operations, and notes that Starburst products such as Starburst Galaxy and Starburst Enterprise support dbt-trino, enabling incremental models and snapshot features.
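
To make the MERGE point concrete, here is a minimal sketch in Python of the kind of upsert statement that MERGE makes possible for incremental loads. It only builds the SQL text and does not connect to Trino. The helper name `build_merge_sql` and the table and column names are hypothetical and do not come from the original post, and the SQL that dbt-trino actually generates may differ.

```python
def build_merge_sql(target, source, key, columns):
    """Build a MERGE statement that upserts rows from `source` into `target`.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly into the SQL text, so pass trusted names only.
    """
    if key not in columns:
        raise ValueError("key must be one of the columns")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")

    # Matched rows are updated in place; unmatched rows are inserted.
    updates = ", ".join(f"{c} = s.{c}" for c in non_key)
    col_list = ", ".join(columns)
    values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )


if __name__ == "__main__":
    # Table and column names below are assumptions, not taken from the post.
    print(
        build_merge_sql(
            target="lakehouse.webshop.customers",
            source="staging.customers_delta",
            key="customer_id",
            columns=["customer_id", "email", "updated_at"],
        )
    )
```

A single statement of this shape covers both the update and the insert path, which is why MERGE suits incremental models, where new and changed rows arrive in the same batch.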

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Data Pipeline	3	484	117	47	+49%
