Declarative Scheduling for Data Assets

Company

Dagster

Date Published

Dec. 8, 2022

Author

Sandy Ryza

Word count

1795

Language

English

Hacker News points

URL

dagster.io/blog/declarative-scheduling

Summary

Dagster 1.1 introduces a declarative, asset-based scheduling system that focuses on keeping data assets up to date rather than on running task workflows. Users specify how up to date each data asset should be, and Dagster schedules updates automatically based on data freshness and change detection, which reduces unnecessary computation. Imperative, workflow-based systems like Airflow can be cumbersome and inefficient by comparison; Dagster instead models each data asset as a function of its predecessors, which allows more flexible and efficient scheduling. Features such as asset freshness policies, granular versioning, and partitions give precise control over when data is updated and help keep data products current. Dagster still supports traditional workflow-based scheduling for those who prefer it, and it also provides tools for automatic asset materialization and for integrating business rules.
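
To make the freshness idea concrete, here is a minimal, library-free Python sketch of how a scheduler might decide what to recompute. It is not Dagster's API or its actual algorithm: `Asset`, `max_lag`, and `assets_to_refresh` are hypothetical names, and the rule used (refresh an asset and any of its ancestors that are older than the freshness window) is a simplifying assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Asset:
    """Hypothetical asset record; not a Dagster class."""
    name: str
    upstream: List[str]                      # names of the assets this one is derived from
    max_lag: Optional[timedelta] = None      # assumed freshness policy: maximum allowed age
    last_updated: Optional[datetime] = None  # None means never materialized


def assets_to_refresh(assets: Dict[str, Asset], now: datetime) -> List[str]:
    """Return the assets to recompute, upstream first.

    Only assets with a freshness policy trigger work. Each one pulls in
    itself and any ancestor last updated before `now - max_lag`.
    """
    needed = set()

    def require(name: str, cutoff: datetime) -> None:
        asset = assets[name]
        if asset.last_updated is not None and asset.last_updated >= cutoff:
            return  # fresh enough for this policy, so stop walking upstream
        needed.add(name)
        for parent in asset.upstream:
            require(parent, cutoff)

    for asset in assets.values():
        if asset.max_lag is not None:
            require(asset.name, now - asset.max_lag)

    # Emit the needed assets in dependency order (parents before children).
    ordered: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        for parent in assets[name].upstream:
            visit(parent)
        if name in needed:
            ordered.append(name)

    for name in assets:
        visit(name)
    return ordered


now = datetime(2022, 12, 8, 12, 0)
assets = {
    a.name: a
    for a in [
        Asset("raw_events", [], last_updated=datetime(2022, 12, 8, 11, 30)),
        Asset("cleaned_events", ["raw_events"], last_updated=datetime(2022, 12, 8, 9, 0)),
        Asset(
            "daily_report",
            ["cleaned_events"],
            max_lag=timedelta(minutes=60),
            last_updated=datetime(2022, 12, 8, 9, 5),
        ),
        # No policy and no downstream consumer with one, so it is never scheduled.
        Asset("audit_log", [], last_updated=datetime(2022, 12, 1, 0, 0)),
    ]
}

plan = assets_to_refresh(assets, now)
print(plan)  # ['cleaned_events', 'daily_report']
```

In this sketch only `daily_report` declares a policy, yet the plan also includes `cleaned_events` because the report cannot be fresh while built on stale inputs. `raw_events` is already recent enough and `audit_log` has no freshness requirement, so neither is recomputed, which is the kind of avoided work the summary describes.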