Stream Processing with IoT Data: Challenges, Best Practices, and Techniques

Company

Confluent

Date Published

June 4, 2020

Author

Wade Waldron, Victoria Xia, Jesse Yates

Word count

7295

Language

English

Hacker News points

None

URL

www.confluent.io/blog/stream-processing-iot-data-best-practices-and-techniques

Summary

The rise of IoT devices has made collecting, processing, and analyzing vast amounts of data increasingly important. Building infrastructure to handle this data is challenging because of variable connectivity, bursty data, and a long tail of firmware versions. Apache Kafka, Flink, and MongoDB are key technologies for addressing these challenges. The core piece is Apache Kafka, which lets the system scale horizontally without becoming harder to run or adding significant operational overhead: it provides resilient storage, native stream processing capabilities, and fast performance at high throughput. The system also needs to handle large messages, which can be done through parallelization, buffering, or diverting them to a "slow lane" topic. Additionally, metadata streams support powerful queries about the state of the fleet, such as determining the relative coverage for every device. Ultimately, the goal for teams building these tools and pipeline components should not be to manage the dataflows themselves, but to empower end users to build and manage pipelines.
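
The "slow lane" option for large messages can be illustrated with a minimal sketch. This is not the article's implementation: the topic names, the 1 MB threshold, and the helper functions `choose_topic` and `route` are hypothetical, and a real pipeline would pass the chosen topic to a Kafka producer rather than collect payloads in a dictionary.

```python
from typing import Dict, Iterable, List

# Hypothetical topic names and size threshold, chosen for illustration only.
FAST_TOPIC = "device-data"
SLOW_TOPIC = "device-data-slow-lane"
SIZE_THRESHOLD_BYTES = 1_000_000


def choose_topic(payload: bytes, threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Return the slow-lane topic for payloads larger than the threshold."""
    return SLOW_TOPIC if len(payload) > threshold else FAST_TOPIC


def route(
    payloads: Iterable[bytes], threshold: int = SIZE_THRESHOLD_BYTES
) -> Dict[str, List[bytes]]:
    """Group payloads by destination topic, keeping arrival order per topic."""
    routed: Dict[str, List[bytes]] = {FAST_TOPIC: [], SLOW_TOPIC: []}
    for payload in payloads:
        routed[choose_topic(payload, threshold)].append(payload)
    return routed
```

Keeping oversized payloads on a separate topic means the consumers of the main topic are not stalled behind them, while the slow lane can be processed with its own consumers and settings.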