Introducing Dynamic Definitions for Flexible Asset Partitioning

Company

Dagster

Date Published

May 19, 2023

Author

Claire Lin

Word count

1258

Language

English

Hacker News points

None

URL

dagster.io/blog/dynamic-partitioning

Summary

Modern data pipelines apply the same computation to many different batches of data. When those batches are a large number of data assets drawn from different sources, tracing their lineage and knowing which ones are up to date becomes difficult. Dagster's dynamic partitioning addresses this by letting a single pipeline selectively process items from a data collection, instead of maintaining a separate parallel pipeline for each asset in the collection. Because partitions are defined declaratively, users can detect new files, reprocess corrupted data, run backfills, track execution progress, and simplify incremental updates. The collection is modeled as a single, dynamically partitioned asset, which gives granular control over the pipeline along with high-level observability of the data lineage. Large datasets can then be processed efficiently while the history of the collection stays in one condensed view.
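
The idea in the summary can be illustrated with a small, library-free sketch. This is not Dagster's API: the class `DynamicPartitionedAsset`, its methods, and the status strings are hypothetical names chosen for illustration. It only models the concept of one asset whose partition keys are registered at runtime, with per-partition status used for selective reprocessing, backfills, and progress tracking.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class DynamicPartitionedAsset:
    """Toy model (hypothetical, not Dagster) of one asset with runtime-added partitions."""

    name: str
    # Maps partition key -> "missing", "materialized", or "failed".
    status: Dict[str, str] = field(default_factory=dict)

    def add_partitions(self, keys: Iterable[str]) -> List[str]:
        """Register discovered keys (e.g. new files); return only the new ones."""
        new_keys = [k for k in keys if k not in self.status]
        for key in new_keys:
            self.status[key] = "missing"
        return new_keys

    def materialize(self, key: str, compute: Callable[[str], None]) -> bool:
        """Run the shared computation for a single partition and record the outcome."""
        if key not in self.status:
            raise KeyError(f"unknown partition: {key}")
        try:
            compute(key)
        except Exception:
            self.status[key] = "failed"
            return False
        self.status[key] = "materialized"
        return True

    def backfill(self, compute: Callable[[str], None]) -> Dict[str, bool]:
        """Process every partition that is not currently materialized."""
        pending = [k for k, s in self.status.items() if s != "materialized"]
        return {key: self.materialize(key, compute) for key in pending}

    def progress(self) -> Tuple[int, int]:
        """Return (materialized partitions, total partitions)."""
        done = sum(1 for s in self.status.values() if s == "materialized")
        return done, len(self.status)
```

In this sketch, a polling loop would call `add_partitions` with the files it sees, then `backfill` to process only the partitions that are missing or failed, so a corrupted file can be reprocessed on its own without rerunning the rest of the collection.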