Processing Paradigms: Stream vs Batch in the ML Era
Blog post from Airbyte
Batch and stream processing are two paradigms for efficiently ingesting and processing data. Batch processing takes a finite set of input data, runs a job over it, and produces output data. It is generally evaluated on throughput and data quality, but because results are available only once a job has run over the accumulated input, it can introduce significant latency into a system. Stream processing, by contrast, consumes inputs and produces outputs continuously, operating on "events" shortly after they occur. This design enables near-real-time data ingestion and processing.

When deciding between batch and stream processing pipelines, consider factors such as latency requirements and available resources. Both paradigms play a part in training, deploying, and maintaining high-quality ML models.
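To make the contrast concrete, here is a minimal Python sketch. It assumes a hypothetical `Event` record and uses a per-user running total as the "job"; `batch_job` and `stream_job` are illustrative names, not an Airbyte or library API. The batch version reads a finite list and returns a single result at the end. The streaming version consumes events one at a time and yields an updated result after each one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Event:
    # Hypothetical event shape: which user it belongs to and a numeric value.
    user_id: str
    value: float


def batch_job(events: List[Event]) -> Dict[str, float]:
    """Batch: take a finite input, run once over all of it, emit one output."""
    totals: Dict[str, float] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0.0) + e.value
    # Output exists only after the whole input has been processed.
    return totals


def stream_job(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Stream: consume events continuously, emit updated output per event."""
    totals: Dict[str, float] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0.0) + e.value
        # Yield a copy right away so downstream consumers see results
        # shortly after each event occurs, not after the input ends.
        yield dict(totals)


if __name__ == "__main__":
    events = [Event("a", 1.0), Event("b", 2.0), Event("a", 3.0)]
    print(batch_job(events))  # {'a': 4.0, 'b': 2.0}
    for snapshot in stream_job(iter(events)):
        print(snapshot)
    # {'a': 1.0}
    # {'a': 1.0, 'b': 2.0}
    # {'a': 4.0, 'b': 2.0}
```

With the same three events, the last streamed snapshot matches the batch result; what differs is when output becomes available. Because `stream_job` is a generator, it can also consume an unbounded source (for example, a generator reading from a message queue), whereas the batch version needs its whole, finite input up front.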