Orchestrating batch processing pipelines with cron and make

Blog post from Snowplow

Post Details
Company: Snowplow
Author: Alex Dean
Word Count: 1,514
Language: English
Summary

This Snowplow blog post describes a simple way to orchestrate multi-stage ETL pipelines with the standard Unix tools make and cron, rather than with dedicated orchestration tools such as AWS Data Pipeline or Airflow. The author shows how to express the pipeline as a directed acyclic graph (DAG) in a Makefile, where each target is a task and its prerequisites are the tasks it depends on, and how to schedule the pipeline to run periodically with cron. The post argues that this approach is less complex and easier to troubleshoot, although it lacks the advanced features of dedicated orchestration tools. It also explains how to handle a failed job by modifying the Makefile so that the run resumes from the point of failure, which makes the approach practical for prototyping and for running batch processing jobs.
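
The approach summarized above can be sketched roughly as follows. This is an illustrative Makefile, not the one from the original post: the target names, the script names (`extract.sh`, `transform.sh`, `load.sh`), the `/opt/pipeline` directory, and the log path are all hypothetical placeholders.

```makefile
# Hypothetical three-stage pipeline. Each target is one task in the DAG,
# and its prerequisites are the tasks that must finish first.
# Script names and paths below are assumptions for illustration only.
.PHONY: done extract transform load

# Final target: requesting it pulls in the whole dependency chain.
done: load
	@echo "pipeline complete"

load: transform
	./load.sh

transform: extract
	./transform.sh

extract:
	./extract.sh

# Example crontab entry (assumed install directory and log file),
# running the whole pipeline every day at 03:00:
# 0 3 * * * cd /opt/pipeline && make done >> /var/log/pipeline.log 2>&1
```

Because make stops at the first recipe that exits with a non-zero status, a failing stage prevents the later stages from running. One way to resume under these assumptions: if `transform.sh` fails after `extract.sh` has succeeded, run a copy of the Makefile in which `transform: extract` is changed to `transform:`, so the extract step is not repeated.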