Building a SQL-based data pipeline with Trino & Starburst
Blog post from Starburst
The Starburst Galaxy tutorial series guides data engineers through building and managing SQL-based data pipelines on a modern data lake. It is part of Starburst Academy's free course, which favors the simplicity and efficiency of SQL over more complex alternatives such as Python UDFs. The tutorial builds a modern data lakehouse architecture in three layers (Land, Structure, and Consume) using Starburst Galaxy and SQL, with the BlueBikes dataset as the practical example. Participants download the dataset, create the Land layer to receive the raw data, transform it in the Structure layer, and make it query-ready in the Consume layer. The series also covers integrating Starburst Galaxy with dbt Cloud so that pipeline tasks run automatically on a schedule.
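To make the Land, Structure, and Consume flow concrete, here is a minimal sketch of the idea, not the tutorial's actual code. It assumes Python's built-in sqlite3 module as a local stand-in for Starburst Galaxy and Trino, and every table name, column name, and sample row is hypothetical rather than taken from the BlueBikes dataset.

```python
import sqlite3

# SQLite stands in for Starburst Galaxy / Trino so the sketch runs anywhere.
# All table names, column names, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")

# Land layer: receive the raw data as-is, with every field stored as text.
conn.execute(
    "CREATE TABLE land_trips ("
    "trip_id TEXT, started_at TEXT, duration_sec TEXT, start_station TEXT)"
)
conn.executemany(
    "INSERT INTO land_trips VALUES (?, ?, ?, ?)",
    [
        ("1", "2023-05-01 08:00:00", "600", "MIT"),
        ("2", "2023-05-01 09:30:00", "1200", "MIT"),
        ("3", "2023-05-02 10:15:00", "", "Harvard"),  # raw row with no duration
        ("4", "2023-05-02 11:00:00", "300", "Harvard"),
    ],
)

# Structure layer: cast fields to proper types and drop rows that fail a
# basic quality check (here, a missing duration).
conn.execute(
    """
    CREATE TABLE structure_trips AS
    SELECT CAST(trip_id AS INTEGER)      AS trip_id,
           started_at,
           CAST(duration_sec AS INTEGER) AS duration_sec,
           start_station
    FROM land_trips
    WHERE duration_sec <> ''
    """
)

# Consume layer: a query-ready aggregate for analysts and dashboards.
conn.execute(
    """
    CREATE TABLE consume_station_stats AS
    SELECT start_station,
           COUNT(*)          AS trip_count,
           AVG(duration_sec) AS avg_duration_sec
    FROM structure_trips
    GROUP BY start_station
    """
)

station_stats = conn.execute(
    "SELECT start_station, trip_count, avg_duration_sec "
    "FROM consume_station_stats ORDER BY start_station"
).fetchall()
print(station_stats)
```

Each layer is just a SQL statement that reads from the layer before it, which is why a scheduler can rerun the whole chain without extra glue code. In a real Starburst Galaxy setup the same pattern would be written in Trino SQL against data lake tables, and the syntax and types would differ from this SQLite version.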