Company
Date Published
Author
Zaeem Athar
Word count
2042
Language
English
Hacker News points
None

Summary

This text describes how to orchestrate unstructured data pipelines with Dagster and dlt. dlt is an open-source Python library that declaratively loads messy data sources into well-structured tables or datasets through automatic schema inference and evolution. The pipeline is first created with a simple `dlt` command, then wrapped as a Dagster asset and resource so that Dagster can orchestrate it. The text also shows how to orchestrate dlt's verified MongoDB source with Dagster, using the `@multi_asset` decorator to create a separate asset for each collection in a database. The resulting data is loaded into BigQuery, which illustrates how dlt and Dagster can be combined to build robust and scalable data pipelines.