Orchestrating unstructured data pipeline with Dagster and dlt.

Company

dltHub

Date Published

Nov. 1, 2023

Author

Zaeem Athar

Word count

2042

Language

English

Hacker News points

None

URL

dlthub.com/blog/dlt-dagster

Summary

This post shows how to orchestrate unstructured data pipelines with Dagster and dlt. dlt is an open-source Python library that declaratively loads messy data sources into well-structured tables or datasets, using automatic schema inference and evolution. The pipeline is first created with a simple `dlt` command, then wrapped as a Dagster asset and resource so that Dagster can orchestrate it. The post also shows how to orchestrate dlt's MongoDB verified source with Dagster, using the `@multi_asset` decorator to create a separate asset for each collection in a database. The resulting data is loaded into BigQuery, demonstrating how dlt and Dagster can be combined to build robust and scalable data pipelines.