Kinesis Streams vs. S3 Buckets: What’s the Best Choice for Your Snowplow Pipeline?

Blog post from Snowplow

Post Details
Company:
Date Published:
Author: -
Word Count: 836
Language: English
Hacker News Points: -
Summary

Snowplow pipelines on AWS use both Kinesis Streams and S3 Buckets, and each plays a distinct role in data processing and storage. Kinesis Streams handle real-time data streaming: they offer low latency and support multiple consumers, which makes them suitable for collecting raw events and for stream enrichment. S3 Buckets provide persistent storage for raw and enriched data, which supports batch processing and data lake integration and helps prevent data loss while enabling downstream processing. Oversized or malformed events in Kinesis are rerouted to a bad stream or captured in S3 for later analysis. Kinesis Firehose can optionally write directly to S3, although its buffering adds latency. The S3 Loader can run on the same instance as the collector for low data volumes, or on dedicated instances or containers for higher throughput. Used together, Kinesis Streams and S3 Buckets keep Snowplow pipelines robust, scalable, and fault-tolerant.
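
The two behaviours described above, rerouting oversized events to a bad stream and the latency that buffering adds before a write to S3, can be illustrated with a small sketch. This is not Snowplow or AWS code: `route_event`, `BufferedSink`, and the 1,000,000-byte limit are hypothetical names and values chosen for illustration, and the empty-payload check is only a stand-in for real event validation.

```python
from typing import List, Optional

# Assumed size limit, for illustration only; use the limit that
# applies to your own stream configuration.
MAX_RECORD_BYTES = 1_000_000


def route_event(payload: bytes, max_bytes: int = MAX_RECORD_BYTES) -> str:
    """Return "good" or "bad" depending on where an event should go.

    Empty payloads stand in for malformed events here; a real
    pipeline would validate the event contents instead.
    """
    if not payload or len(payload) > max_bytes:
        return "bad"
    return "good"


class BufferedSink:
    """Hypothetical sink that batches records before writing them out.

    A batch is flushed when it holds `max_records` records or when its
    oldest record is at least `max_age_seconds` old. Waiting for either
    condition is the source of the extra latency.
    """

    def __init__(self, max_records: int, max_age_seconds: float) -> None:
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self.flushed: List[List[bytes]] = []  # stands in for objects written to S3
        self._records: List[bytes] = []
        self._oldest: Optional[float] = None

    def add(self, record: bytes, now: float) -> None:
        """Buffer one record, then flush if a threshold is reached."""
        if self._oldest is None:
            self._oldest = now
        self._records.append(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        """Flush the current batch if it is full or old enough."""
        if not self._records or self._oldest is None:
            return
        too_many = len(self._records) >= self.max_records
        too_old = now - self._oldest >= self.max_age_seconds
        if too_many or too_old:
            self.flushed.append(list(self._records))
            self._records.clear()
            self._oldest = None
```

In this sketch a record added at time 0 to a sink with `max_age_seconds=60` does not appear in `flushed` until the batch fills up or `maybe_flush` is called at time 60 or later. That is the trade-off the summary attributes to buffering: fewer, larger writes at the cost of delay.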