
Sample-Level Versioning for ML Pipelines with Dagster and Metaxy

Blog post from Dagster

Post Details
Company: Dagster
Date Published:
Author: Daniel Gafni
Word Count: 1,923
Language: English
Hacker News Points: -
Summary

Metaxy is an open-source framework for sample-level data versioning in multimodal data pipelines, designed to work with Dagster. Created by Daniel Gafni, an MLOps engineer at Anam, it connects orchestrators like Dagster with low-level processing engines so that individual samples can be tracked and processed precisely. Traditional data versioning systems often force recomputation of data that a change did not affect. Metaxy instead versions each data field individually, so partial updates are handled efficiently and computations that do not depend on a changed field are skipped. Its infrastructure-agnostic design supports a range of environments, cloud providers, and data engines. When integrated with Dagster, Metaxy handles row-level orchestration at sub-sample granularity, leaving users free to focus on data transformations; it can also run independently of Dagster. The framework builds on the open-source projects Narwhals and Ibis, which provide compatibility with a wide range of databases and data processing engines.
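
To make the idea of field-level versioning concrete, here is a minimal sketch in plain Python. It does not use Metaxy's API: the helper names (`field_version`, `stale_samples`), the field names (`audio`, `frames`), and the sample ids are all hypothetical, chosen only to illustrate how versioning each field separately lets a pipeline skip samples whose relevant inputs did not change.

```python
import hashlib


def field_version(code_version: str, upstream_versions: list[str]) -> str:
    """Hash a field's code version together with the versions of the inputs it reads."""
    h = hashlib.sha256()
    h.update(code_version.encode())
    # Sort so the result does not depend on the order the inputs are listed in.
    for v in sorted(upstream_versions):
        h.update(b"\x00" + v.encode())
    return h.hexdigest()[:12]


def stale_samples(upstream, stored, deps, code_version):
    """Return ids of samples whose downstream field must be recomputed.

    upstream:     sample_id -> {field_name: version}
    stored:       sample_id -> version recorded when the field was last computed
    deps:         names of the upstream fields the downstream field reads
    code_version: version of the code that produces the downstream field
    """
    stale = []
    for sample_id, fields in upstream.items():
        expected = field_version(
            code_version, [f"{name}={fields[name]}" for name in deps]
        )
        if stored.get(sample_id) != expected:
            stale.append(sample_id)
    return stale


# Hypothetical data: two video samples, each with an audio and a frames field.
upstream = {
    "video_1": {"audio": "a1", "frames": "f1"},
    "video_2": {"audio": "a1", "frames": "f1"},
}

# A downstream "transcript" field that reads only the audio field.
stored = {
    sample_id: field_version("v1", [f"audio={fields['audio']}"])
    for sample_id, fields in upstream.items()
}

upstream["video_1"]["frames"] = "f2"  # frames changed: irrelevant to the transcript
upstream["video_2"]["audio"] = "a2"   # audio changed: transcript is now stale

print(stale_samples(upstream, stored, ["audio"], "v1"))  # ['video_2']
```

In this sketch only `video_2` is recomputed, because the transcript depends on `audio` alone. A scheme that versioned whole samples would have flagged `video_1` as well.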