How we replicate a write-heavy Kvrocks dataset in real time
Blog post from RevenueCat
Over the past year, RevenueCat moved some of its subscriber data from PostgreSQL to Kvrocks to handle a high volume of writes more effectively and to avoid the performance problems and costs associated with PostgreSQL's write-ahead log. Kvrocks, built on RocksDB, is better suited to frequently changing data, and because it speaks the Redis protocol it integrates easily with existing application code.

Subscriber data still needed to be available for aggregation tasks in the data warehouse, so the team built a system that streams changes from Kvrocks to the warehouse in near real time. Using RocksDB's getUpdatesSince API, they track every change a node applies, and they use epochs as consistent checkpoints across multiple Kvrocks nodes.

The replication pipeline is implemented as a small Kotlin-based web service that ships batches of changes from Kvrocks to an S3 bucket, from which they are upserted into Snowflake. Although somewhat hacky, this approach has proven effective: end-to-end latency from a data change to its replication is about 10-15 seconds, and it provides a robust way to distribute Kvrocks data to other data stores.
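The epoch mechanism described above can be sketched in Kotlin. This is a hypothetical, in-memory simulation, not RevenueCat's actual service: the `NodeLog` class stands in for a Kvrocks node's RocksDB write log (the real system would call RocksDB's getUpdatesSince through each node), and the names `Change`, `NodeLog`, and `Replicator` are invented for illustration. The key idea it demonstrates is that writing an epoch marker to every node turns each node's own sequence numbers into a shared, consistent cut point.

```kotlin
// Hypothetical simulation of epoch-based change capture across Kvrocks nodes.
// In the real system, each node's changes come from RocksDB's getUpdatesSince;
// here an in-memory changelog stands in for that API.

data class Change(val seq: Long, val key: String, val value: String)

// Stand-in for one Kvrocks node's RocksDB write log.
class NodeLog(val name: String) {
    private val log = mutableListOf<Change>()
    private var nextSeq = 0L

    fun write(key: String, value: String): Long {
        val seq = nextSeq++
        log.add(Change(seq, key, value))
        return seq
    }

    // Analogue of getUpdatesSince: every change at or after `seq`.
    fun updatesSince(seq: Long): List<Change> = log.filter { it.seq >= seq }
}

// Writes an epoch marker to every node, then ships each node's changes
// between the previous marker and the new one: a consistent checkpoint.
class Replicator(private val nodes: List<NodeLog>) {
    private val nextStartSeq = mutableMapOf<String, Long>()

    fun runEpoch(epoch: Long): Map<String, List<Change>> {
        val batch = mutableMapOf<String, List<Change>>()
        for (node in nodes) {
            // The epoch marker is itself a write, so it appears in the
            // node's own log and gives every node a common cut point.
            val markerSeq = node.write("__epoch__", epoch.toString())
            val from = nextStartSeq[node.name] ?: 0L
            batch[node.name] = node.updatesSince(from).filter { it.seq < markerSeq }
            nextStartSeq[node.name] = markerSeq + 1
        }
        return batch // in the real pipeline this batch would be written to S3
    }
}
```

Each `runEpoch` call yields, per node, exactly the changes made since the previous epoch marker; repeated calls never re-ship a change, which is what makes the batches safe to upsert downstream.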