Home / Companies / Dagster / Blog / Post Details
Content Deep Dive

Thinking in Assets When Building Data Pipelines

Blog post from Dagster

Post Details
Company
Date Published
Author
Tim Castillo
Word Count
3,432
Company Posts That Month
10
Language
English
Hacker News Points
-
Post removed?
No
Summary

A data pipeline is a series of tasks that are orchestrated to collect, process, and transform raw data into a usable format. In this context, a data pipeline in Dagster refers to the creation of data assets, which are files or tables that contain specific data. The pipeline is designed by identifying what data assets need to be produced, breaking them down into more atomic assets, and repeating the process until reaching the source data. The pipeline can be implemented using Python code, with each asset specifying its dependencies. Dagster provides features such as auto-materialization policies, which define when an asset should be materialized, and asset checks, which verify data quality for each asset in the pipeline. The final state of the code includes all assets implemented, scheduled, documented, and some with data quality tests attached.

Trends Found in this Post
Trend Post Mentions Total Month Mentions Posts Companies MoM
Data Pipeline 15 348 132 56 -36%
LLM 2 2,401 292 122 -7%
Use This Data

Use this post, company, and trend context to find content marketing opportunities, perform competitive analysis, or address product feature gaps via the Plushcap MCP server or the Plushcap API.