How we solved RevenueCat’s biggest challenges on data ingestion into Snowflake
Blog post from RevenueCat
RevenueCat's data management strategy involves replicating production data to Snowflake for various analytical needs, using a pipeline built from Aurora Postgres, Debezium, Kafka, and S3 to capture and process data changes. After commercial replication tools for Snowflake proved a poor fit, they developed a custom model built on Snowflake's external tables and streams to manage data ingestion efficiently.

This system lets them track and consolidate changes, significantly reducing both ingestion latency and cost through a hybrid approach that combines continuous ingestion with a daily consolidation process. By creating "low-latency views" and investing in tooling, RevenueCat has improved efficiency and flexibility, enabling them to handle their extensive, update-heavy datasets more effectively.

Their approach highlights the value of Debezium and Kafka for cost-effective replication and of maintaining a queryable data lake for investigation and analysis.
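The consolidation step described above can be illustrated with a minimal sketch. This is not RevenueCat's actual code: it assumes a simplified, Debezium-style change-event shape (an `op` field of `"c"`/`"u"`/`"d"` plus an `after` row image) and collapses an ordered stream of events into the latest row state per primary key, which is conceptually what a daily consolidation over captured changes does.

```python
from typing import Any, Dict, List


def consolidate(events: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Collapse an ordered stream of change events into the latest
    row state per primary key (hypothetical simplified schema)."""
    state: Dict[Any, Dict[str, Any]] = {}
    for ev in events:  # events assumed ordered by commit time
        key = ev["id"]
        if ev["op"] == "d":  # delete: drop the row entirely
            state.pop(key, None)
        else:  # create ("c") or update ("u"): keep the latest image
            state[key] = ev["after"]
    return state


# Example: one row is created then updated; another is created then deleted.
events = [
    {"op": "c", "id": 1, "after": {"id": 1, "plan": "free"}},
    {"op": "u", "id": 1, "after": {"id": 1, "plan": "pro"}},
    {"op": "c", "id": 2, "after": {"id": 2, "plan": "free"}},
    {"op": "d", "id": 2, "after": None},
]
print(consolidate(events))  # {1: {'id': 1, 'plan': 'pro'}}
```

In the real pipeline this logic runs inside Snowflake over external tables and streams rather than in application code, but the key idea is the same: only the most recent change per key needs to survive consolidation, which is what makes the daily pass cheap relative to replaying every update.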