From Python Projects to Dagster Pipelines

Company

Dagster

Date Published

April 14, 2023

Author

Elliot Gunn

Word count

2094

Language

English

Hacker News points

None

URL

dagster.io/blog/data-engineering-in-python

Summary

This guide is for beginners with a basic understanding of Python who want to start their first data engineering project. It uses Dagster, an open-source data orchestrator, and walks step by step through building a data pipeline: setting up the project's root directory, creating and activating a virtual environment, installing Dagster and scaffolding an initial project, declaring assets, and understanding how Dagster serializes the data that assets produce. It also introduces software-defined assets, which take a declarative approach: you describe the data assets that should exist and how each one is computed from its upstream dependencies, which makes the code easier to organize and maintain. The guide builds two example assets, hackernews_top_story_ids and hackernews_top_stories, and shows how to run the pipeline and materialize those assets from Dagster's user interface.