Company
Date Published
Author
Sandy Ryza
Word count
1795
Language
English
Hacker News points
24

Summary

This text introduces declarative, asset-based scheduling in Dagster, a data orchestration tool. It explains how this approach differs from traditional workflow-based orchestrators and how it can simplify data pipeline management. Instead of scheduling tasks, the system models each data asset as a function of its upstream assets and schedules work based on how up to date each asset needs to be. This avoids unnecessary recomputation and adds flexibility, especially in machine learning applications, where freshness requirements vary greatly from one asset to the next. The text also covers key concepts such as freshness policies, granular versioning, and partitions, which enable automatic materialization of assets based on business rules or on changes in upstream data or code.
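
The scheduling idea in the summary can be illustrated with a minimal sketch in plain Python. This is not Dagster's API: `Asset`, `max_lag_minutes`, `plan_materializations`, the asset names, and the timestamps are all hypothetical, and the logic is a simplified assumption of how a freshness requirement on a downstream asset could translate into work on its upstream assets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset: a name, its upstream assets, and an optional freshness rule."""
    name: str
    deps: Tuple[str, ...] = ()
    max_lag_minutes: Optional[float] = None  # None means no freshness requirement


def plan_materializations(
    assets: Dict[str, Asset], data_time: Dict[str, float], now: float
) -> List[str]:
    """Return the assets to recompute, upstream first.

    data_time[name] is the timestamp (in minutes) of the oldest source data
    that the asset currently reflects.
    """
    cutoff: Dict[str, float] = {}  # asset name -> data must be at least this recent

    def require(name: str, min_time: float) -> None:
        # Keep only the most demanding requirement and push it upstream.
        if min_time > cutoff.get(name, float("-inf")):
            cutoff[name] = min_time
            for dep in assets[name].deps:
                require(dep, min_time)

    for asset in assets.values():
        if asset.max_lag_minutes is not None:
            require(asset.name, now - asset.max_lag_minutes)

    # Depth-first topological order so upstream assets come before downstream ones.
    ordered: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in assets[name].deps:
            visit(dep)
        ordered.append(name)

    for name in assets:
        visit(name)

    return [
        name
        for name in ordered
        if name in cutoff and data_time.get(name, float("-inf")) < cutoff[name]
    ]


# Hypothetical graph: a dashboard that must lag by at most 1 hour and a
# model that must lag by at most 24 hours share the same upstream assets.
ASSETS = {
    a.name: a
    for a in [
        Asset("raw_events"),
        Asset("features", deps=("raw_events",)),
        Asset("dashboard", deps=("features",), max_lag_minutes=60),
        Asset("model", deps=("features",), max_lag_minutes=1440),
    ]
}
DATA_TIME = {"raw_events": 1900, "features": 1900, "dashboard": 1900, "model": 1000}

plan = plan_materializations(ASSETS, DATA_TIME, now=2000)
print(plan)  # ['raw_events', 'features', 'dashboard']
```

In this sketch the model is left alone at minute 2000 because its data is still within its 24-hour allowance, while the dashboard and everything it depends on are refreshed. No task schedule is written by hand; the plan follows from the declared freshness requirements.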