Process More, Spend Less: A Year of Breakthrough Snowplow Pipeline Improvements
Over the past year, Snowplow has made significant advances across its core pipeline and customer data infrastructure, improving how event data is processed, stored, and turned into value. Through architectural optimizations alone, Enrich can now process up to 50% more events per CPU, and the Collector handles requests three times faster than previous versions.

Enrich can also filter out unwanted events before they travel any further, cutting infrastructure costs by eliminating processing of data you never needed in the first place (see the filtering sketch below).

Looking ahead, the upcoming Databricks Streaming Loader promises near-instantaneous data delivery, while new compression capabilities aim to cut infrastructure costs by up to 15% (illustrated below).

Security has been strengthened as well: critical vulnerabilities have been addressed and unsupported components phased out. And with Snowflake set to deprecate password authentication, Snowplow's streaming loader already supports the required key pair authentication (connection sketch below).

This ongoing evolution of Snowplow's data infrastructure is geared toward maximizing the value of behavioral data in real-time applications, AI-driven solutions, and customer analytics, with recent releases focused on reducing latency and operational costs, improving reliability, and keeping integrations secure.
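To make the filtering point concrete, here is a minimal sketch of the cost logic, not Snowplow's actual Enrich configuration: the predicate, the unwanted categories, and the filtering mechanism are illustrative placeholders (though `app_id` and `event_name` mirror fields in Snowplow's canonical event model). The point is that anything dropped at this stage never incurs downstream enrichment, streaming, or warehouse-loading cost.

```python
# Hypothetical early-filtering sketch: drop events the business never needs,
# so they incur no downstream enrichment, streaming, or loading cost.
UNWANTED_EVENT_NAMES = {"page_ping"}    # e.g. heartbeat noise (placeholder)
UNWANTED_APP_IDS = {"internal-test"}    # e.g. test traffic (placeholder)

def keep_event(event: dict) -> bool:
    """Return False for events the pipeline should discard."""
    return (
        event.get("event_name") not in UNWANTED_EVENT_NAMES
        and event.get("app_id") not in UNWANTED_APP_IDS
    )

incoming = [
    {"event_name": "page_view", "app_id": "web"},
    {"event_name": "page_ping", "app_id": "web"},
    {"event_name": "page_view", "app_id": "internal-test"},
]
downstream = [e for e in incoming if keep_event(e)]
print(f"{len(downstream)} of {len(incoming)} events continue downstream")
```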
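The compression savings follow from how repetitive event payloads are. The illustration below uses Python's standard gzip module, not necessarily the codec Snowplow's components use, but it shows why compressing data between pipeline stages shrinks network and storage bills:

```python
import gzip
import json

# Batches of behavioral events share keys and many repeated values,
# so they compress very well with a general-purpose codec like gzip.
events = [
    {"event_id": str(i), "app_id": "web", "event_name": "page_view",
     "page_url": "https://example.com/products"}
    for i in range(1000)
]
raw = json.dumps(events).encode("utf-8")
packed = gzip.compress(raw)
print(f"raw: {len(raw):,} bytes, gzipped: {len(packed):,} bytes "
      f"({100 * (1 - len(packed) / len(raw)):.0f}% smaller)")
```

Actual savings depend on batch size and payload shape; the "up to 15%" figure above refers to overall infrastructure cost, not raw byte counts.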
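On the Snowflake side, key pair authentication replaces a password with an RSA key pair: the public key is registered on the Snowflake user, and the client proves possession of the private key when connecting. The Snowplow streaming loader handles this in its own configuration, but the sketch below, following the documented snowflake-connector-python pattern, shows what a key pair connection looks like; the key file path, account, and object names are placeholders.

```python
from cryptography.hazmat.primitives import serialization
import snowflake.connector

# Load the loader user's RSA private key (placeholder path); the matching
# public key must already be registered on the Snowflake user.
with open("rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

# The connector expects the key as unencrypted DER-encoded PKCS#8 bytes.
private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# Authenticate with the key pair instead of a password.
conn = snowflake.connector.connect(
    account="my_account",           # placeholder
    user="SNOWPLOW_LOADER",         # placeholder
    private_key=private_key_der,
    warehouse="LOAD_WH",            # placeholder
    database="ANALYTICS",           # placeholder
)
```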