Building a SQL-based data pipeline with Trino & Starburst

Post Details

Company

Starburst

Date Published

Sept. 26, 2023

Author

Evan Smith

Word Count

637

Language

English

Hacker News Points

-

Source URL

www.starburst.io/blog/sql-based-data-pipeline

Summary

Starburst's tutorial series guides data engineers through building and managing SQL-based data pipelines on a modern data lake with Starburst Galaxy. The series is part of Starburst Academy's free course, which emphasizes the simplicity and efficiency of SQL over more complex alternatives such as Python UDFs. The tutorial builds a modern data lakehouse architecture with three layers (Land, Structure, and Consume) using Starburst Galaxy and SQL, with the BlueBikes dataset as a practical example. Participants download the dataset, create the Land layer to receive the raw data, transform that data in the Structure layer, and make it query-ready in the Consume layer. The series also covers integrating Starburst Galaxy with dbt Cloud, which automates the pipeline by running its tasks on a schedule.
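
The Land, Structure, and Consume flow described in the summary can be illustrated with a minimal sketch. It is not the tutorial's own code: it assumes an in-memory SQLite database as a stand-in for Starburst Galaxy, and the table names, column names, and cleaning rules (`land_trips`, `structure_trips`, `consume_station_stats`, `trip_duration`, `start_station`) are hypothetical rather than the actual BlueBikes schema. It only shows the general idea of each layer being derived from the previous one with plain SQL.

```python
import sqlite3


def build_pipeline(conn, raw_rows):
    """Run a toy three-layer pipeline and return the Consume-layer rows.

    raw_rows: iterable of (trip_duration, start_time, start_station) strings.
    All table and column names are hypothetical, chosen for illustration.
    """
    cur = conn.cursor()

    # Land layer: store the raw data as received, every column as text.
    cur.execute(
        "CREATE TABLE land_trips ("
        "trip_duration TEXT, start_time TEXT, start_station TEXT)"
    )
    cur.executemany("INSERT INTO land_trips VALUES (?, ?, ?)", raw_rows)

    # Structure layer: cast to proper types, tidy strings, and drop rows
    # whose duration is missing, non-numeric, or not positive.
    cur.execute(
        """
        CREATE TABLE structure_trips AS
        SELECT CAST(trip_duration AS INTEGER) AS trip_duration_s,
               start_time,
               TRIM(start_station) AS start_station
        FROM land_trips
        WHERE trip_duration <> ''
          AND CAST(trip_duration AS INTEGER) > 0
        """
    )

    # Consume layer: a query-ready aggregate for analysts and dashboards.
    cur.execute(
        """
        CREATE VIEW consume_station_stats AS
        SELECT start_station,
               COUNT(*) AS trips,
               AVG(trip_duration_s) AS avg_duration_s
        FROM structure_trips
        GROUP BY start_station
        """
    )
    conn.commit()

    return cur.execute(
        "SELECT start_station, trips, avg_duration_s "
        "FROM consume_station_stats ORDER BY start_station"
    ).fetchall()
```

Calling `build_pipeline(sqlite3.connect(":memory:"), rows)` with a handful of raw rows returns one aggregated row per station. In a real deployment the same three statements would be written in Trino SQL against lake tables, and a scheduler such as dbt Cloud would run them in order.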