How to build data transformations with Python, Ibis, and Starburst Galaxy

Blog post from Starburst

Post Details
Company: Starburst
Author: Duy Huynh
Word Count: 1,108
Language: English
Summary

Combining Starburst Galaxy with Ibis gives data scientists an efficient workflow for building data-intensive applications. Galaxy connects cloud data sources to its optimized Trino clusters, which handle processing and analysis, and Ibis presents the results to end users through a pandas-like Python API. Because Ibis supports many analytical backends, the same interface can express complex data manipulations and computations across different engines.

Setting this up involves creating a Starburst Galaxy account, connecting a data lake catalog, and running schema discovery, which registers the datasets as queryable tables and can automatically pick up new files added to the data lake. The post demonstrates the integration with a tutorial based on NYC Taxi trip data, showing how to prepare, upload, and analyze the dataset using Trino's fast distributed SQL query engine behind Ibis's user-friendly interface.
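The query side of this workflow can be sketched in Python. This is a minimal sketch, not the post's own code. It assumes Ibis 9 or later installed with the Trino extra (`pip install 'ibis-framework[trino]'`), where `ibis.trino.connect` takes the password as `auth`, uses `database` for the Trino catalog, and forwards extra keyword arguments such as `http_scheme` to the `trino` client; check these against your installed version. The `GALAXY_*` environment variable names, the helper names, and the table name `nyc_taxi_trips` are hypothetical. The column names follow the public NYC TLC yellow-taxi schema.

```python
"""Sketch: query a Starburst Galaxy (Trino) cluster from Python with Ibis.

Assumes Ibis 9+ with the Trino extra: pip install 'ibis-framework[trino]'.
Environment variable names, helper names, and the table name are hypothetical.
"""
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class GalaxySettings:
    """Connection settings for one Galaxy cluster (values come from the Galaxy UI)."""

    host: str      # cluster hostname shown in Galaxy's connection info
    user: str      # Galaxy login; it may include a role suffix such as "/accountadmin"
    password: str
    catalog: str   # data lake catalog connected in Galaxy
    schema: str    # schema whose tables were registered by schema discovery
    port: int = 443  # Galaxy clusters are reached over HTTPS

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> GalaxySettings:
        """Read hypothetical GALAXY_* variables; raises KeyError if a required one is missing."""
        env = os.environ if env is None else env
        return cls(
            host=env["GALAXY_HOST"],
            user=env["GALAXY_USER"],
            password=env["GALAXY_PASSWORD"],
            catalog=env["GALAXY_CATALOG"],
            schema=env["GALAXY_SCHEMA"],
            port=int(env.get("GALAXY_PORT", "443")),
        )

    def connect_kwargs(self) -> dict[str, Any]:
        """Keyword arguments for ibis.trino.connect (Ibis 9+ parameter names assumed)."""
        return {
            "user": self.user,
            "auth": self.password,     # assumption: Ibis 9+ treats a string as a password
            "host": self.host,
            "port": self.port,
            "database": self.catalog,  # Ibis's Trino backend calls the catalog "database"
            "schema": self.schema,
            "http_scheme": "https",    # forwarded to the trino client; password auth needs HTTPS
        }


def connect(settings: GalaxySettings):
    """Open an Ibis connection; Ibis is imported lazily so the helpers above work without it."""
    import ibis

    return ibis.trino.connect(**settings.connect_kwargs())


def fares_by_passenger_count(trips):
    """Build a lazy Ibis expression: trip count and mean fare per passenger count."""
    from ibis import _

    return (
        trips.filter(_.trip_distance > 0)  # drop zero-distance records
        .group_by("passenger_count")
        .aggregate(trip_count=_.count(), avg_fare=_.fare_amount.mean())
        .order_by("passenger_count")
    )


def main() -> None:
    """Run the example against a live cluster (requires network access and credentials)."""
    con = connect(GalaxySettings.from_env())
    trips = con.table("nyc_taxi_trips")  # hypothetical table registered by schema discovery
    # Nothing executes until to_pandas(): Ibis compiles the expression to SQL that
    # runs on the Galaxy cluster and returns only the aggregated rows.
    print(fares_by_passenger_count(trips).to_pandas())
```

With the environment variables set, calling `main()` runs the aggregation on the cluster. Keeping the connection settings in a small dataclass separates credential handling from the Ibis expression, and because Ibis expressions are lazy, the heavy lifting stays in Trino while Python only receives the summarized result.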