Processing Paradigms: Stream vs Batch in the ML Era

Post Details

Company

Airbyte

Date Published

Dec. 19, 2023

Author

Jacob Prall

Word Count

741

Language

English

Hacker News Points

-

Source URL

airbyte.com/blog/processing-paradigms-stream-vs-batch-in-the-ml-era

Summary

Batch and stream processing are two paradigms for handling data ingestion and processing efficiently. Batch processing takes a finite set of input data, runs a job on it, and produces output data. It is generally measured by throughput and data quality, but it can introduce significant latency into a system because results are only available once a job completes. Stream processing, by contrast, consumes inputs and produces outputs continuously, operating on "events" shortly after they occur, which allows for near-real-time data ingestion and processing. When choosing between batch and stream processing pipelines, consider factors such as latency requirements and available resources. Both paradigms play a part in training, deploying, and maintaining high-quality ML models.
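
As a rough illustration of the two paradigms, the sketch below computes the same statistic (a mean) both ways. It is a minimal example and not taken from the original post: the function names `batch_mean` and `stream_mean` and the sample `readings` are hypothetical, and a real pipeline would read from files, queues, or a message broker, not an in-memory list.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: take a finite input, run one job, produce one output."""
    if not values:
        raise ValueError("batch input must be non-empty")
    return sum(values) / len(values)


def stream_mean(events: Iterable[float]) -> Iterator[float]:
    """Stream: consume events one at a time and emit an updated result after each."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        # An output is available as soon as each event has been processed.
        yield total / count


# Hypothetical sample data, used only for illustration.
readings = [2.0, 4.0, 6.0]

# Batch: nothing is available until the whole input has been processed.
batch_result = batch_mean(readings)

# Stream: one intermediate result per event; the last matches the batch result.
stream_results = list(stream_mean(readings))
```

The batch function cannot return anything until it has seen all of its input, which is where the latency comes from. The streaming generator yields a usable value after every event, at the cost of keeping running state (`count` and `total`) between events.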