Make Big Data Small Again with Redshift ZSTD Compression
Blog post from Snowplow
Amazon Redshift's introduction of Zstandard (ZSTD) compression in 2017 gives Snowplow users a straightforward way to shrink large datasets. ZSTD, developed by Facebook, delivers better compression ratios and faster decompression than the older encodings Redshift supports, and it is particularly effective on large VARCHAR columns such as the JSON strings in Snowplow's atomic.events and derived tables.

There are two ways to apply ZSTD in Redshift: run ANALYZE COMPRESSION to get automated encoding recommendations, or perform a manual deep copy with explicit encodings for full control (both routes are sketched below). In either case, it is advisable not to compress SORTKEY columns, as doing so can hurt query performance.

Moving to ZSTD can yield significant disk space savings while maintaining query performance, but the migration needs careful planning and validation: pause the data pipeline while tables are rewritten, and confirm there is enough free disk space, since a deep copy temporarily holds two copies of the table.

On the tooling side, igluctl can generate DDL with ZSTD encodings, although Schema Guru does not yet emit ZSTD automatically, and Snowplow BDP users can call on the Snowplow team for help planning the migration.
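To see what Redshift itself recommends, point ANALYZE COMPRESSION at the table. A minimal sketch (the table name comes from the post; the COMPROWS sample size is an illustrative choice):

```sql
-- Ask Redshift to recommend an encoding per column by sampling rows.
-- Note: ANALYZE COMPRESSION takes an exclusive table lock, so avoid
-- running it against a table the pipeline is actively loading into.
ANALYZE COMPRESSION atomic.events COMPROWS 100000;
```

The output lists a suggested encoding and an estimated space reduction for each column; the columns where it reports ZSTD are the candidates for the deep copy below.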
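For full control, create a new table with explicit encodings, deep-copy the data into it, and swap the tables. The schema below is a hypothetical, heavily simplified stand-in for atomic.events (the real table has far more columns); the pattern to note is ENCODE ZSTD on the wide VARCHARs and ENCODE RAW on the SORTKEY:

```sql
-- Simplified illustrative schema; not the real atomic.events DDL.
CREATE TABLE atomic.events_new (
  event_id         CHAR(36)       ENCODE ZSTD,
  collector_tstamp TIMESTAMP      ENCODE RAW,   -- SORTKEY: leave uncompressed
  event_name       VARCHAR(1000)  ENCODE ZSTD,
  contexts         VARCHAR(65535) ENCODE ZSTD   -- large JSON string
)
DISTSTYLE KEY
DISTKEY (event_id)
SORTKEY (collector_tstamp);

-- Deep copy: rewriting the rows applies the new encodings.
INSERT INTO atomic.events_new (SELECT * FROM atomic.events);

-- Swap the tables, then drop the old copy once row counts match.
ALTER TABLE atomic.events RENAME TO events_old;
ALTER TABLE atomic.events_new RENAME TO events;
DROP TABLE atomic.events_old;
```

Run the swap while the pipeline is paused so no events land in the old table between the copy and the rename.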
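Because the deep copy holds both the old and new table until the final drop, it is worth checking free space per node before starting. One way to do this, using Redshift's stv_partitions system view (values are reported in 1 MB blocks):

```sql
-- Free disk space per node; used and capacity are in 1 MB blocks.
SELECT
  owner                              AS node,
  SUM(capacity) / 1024               AS capacity_gb,
  SUM(used) / 1024                   AS used_gb,
  (SUM(capacity) - SUM(used)) / 1024 AS free_gb
FROM stv_partitions
GROUP BY owner
ORDER BY node;
```

If free_gb on any node is smaller than that node's share of the table being copied, free up space or archive older data before starting the migration.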