Orchestrating batch processing pipelines with cron and make
Blog post from Snowplow
The post describes a simple approach to orchestrating multi-stage ETL pipelines with the Unix tools make and cron, rather than heavier orchestration tools such as AWS Data Pipeline or Airflow. The author shows how to express the pipeline as a Directed Acyclic Graph (DAG) in a Makefile, where each task is a target and its dependencies are the prerequisites, and how to schedule the pipeline with cron for periodic execution. The post emphasizes the strengths of this approach, such as reduced complexity and easier troubleshooting, even though it lacks the advanced functionality of dedicated orchestration tools. It also covers how to handle job failures by modifying the Makefile so that the pipeline resumes from the point of failure, which makes the approach practical for prototyping and managing batch processing jobs.
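The dependency-ordered execution and resume-after-failure behaviour summarized above can be illustrated with a small Python sketch. This is not the post's Makefile. It only mimics how make resolves prerequisites before running a target. The function name `run_pipeline`, the stage names, and the `done` set used to record completed steps are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set


def run_pipeline(
    deps: Dict[str, List[str]],
    actions: Dict[str, Callable[[], None]],
    target: str,
    done: Optional[Set[str]] = None,
) -> List[str]:
    """Run `target` after its prerequisites, depth-first.

    `deps` maps each task to its prerequisites (the DAG).
    `done` records completed tasks and is updated in place, so passing the
    same set again after a failure resumes from the failed task.
    Returns the tasks executed during this call, in order.
    """
    done = set() if done is None else done
    executed: List[str] = []

    def visit(task: str, stack: FrozenSet[str]) -> None:
        if task in done:
            return  # already completed in this or an earlier run
        if task in stack:
            raise ValueError(f"dependency cycle at {task!r}")
        for dep in deps.get(task, []):
            visit(dep, stack | {task})
        actions[task]()  # an exception here stops the whole pipeline
        done.add(task)
        executed.append(task)

    visit(target, frozenset())
    return executed


if __name__ == "__main__":
    # Hypothetical four-stage pipeline; the stage names are illustrative only.
    deps = {"done": ["load"], "load": ["enrich"], "enrich": ["extract"], "extract": []}
    actions = {name: (lambda n=name: print("running", n)) for name in deps}
    completed: Set[str] = set()
    run_pipeline(deps, actions, "done", completed)
```

If a stage raises, the tasks finished so far remain in `completed`, and calling `run_pipeline` again with the same set skips them and continues from the failed stage. In the setup the post describes, cron would play the role of the periodic caller.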