What Replo learned optimizing 100+ billion events in ClickHouse
Blog post from ClickHouse
Replo uses ClickHouse to power real-time, in-product analytics for Shopify merchants, letting them track live page interactions, offers, and A/B test outcomes. Replo's initial analytics pipeline suffered from problems such as inefficient data recomputation and redundant session-level metrics. To address them, the team moved to a more structured data model built on customer-specific namespaces and precomputed metrics. This reduced the computational load and improved query performance, allowing the system to process more than 100 billion events while keeping dashboards responsive and real-time analytics accurate. As requirements grew more complex, including fractional attribution, the architecture was refined again to work from real-time session events rather than scanning the full historical dataset. The result is an analytics infrastructure that is more scalable and reliable, with room for further performance optimization and future enhancements.
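The post does not publish Replo's schema or its attribution rule, so what follows is only a minimal Python sketch of the two ideas in the summary, under stated assumptions. Events are keyed by a hypothetical `customer_id` namespace and a `session_id`. New events are folded into precomputed per-session summaries instead of rescanning history. Each session's revenue is split equally across the variants it touched. Equal splitting is one common form of fractional attribution and is assumed here. The names `Event`, `SessionSummary`, `update_sessions`, and `fractional_attribution` are illustrative, not Replo's or ClickHouse's.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Optional


# Hypothetical event shape: the post does not describe Replo's actual schema.
@dataclass(frozen=True)
class Event:
    customer_id: str               # assumed per-merchant namespace key
    session_id: str
    kind: str                      # illustrative values: "offer_view", "purchase"
    variant: Optional[str] = None  # offer / A/B variant the event relates to
    revenue: float = 0.0           # only meaningful for "purchase" events


@dataclass
class SessionSummary:
    """Precomputed per-session metrics, so reports never re-read raw events."""
    variants: set[str] = field(default_factory=set)
    revenue: float = 0.0


def update_sessions(
    sessions: dict[tuple[str, str], SessionSummary],
    new_events: Iterable[Event],
) -> dict[tuple[str, str], SessionSummary]:
    """Fold only newly arrived events into the summaries they touch (in place).

    Sessions with no new events are left alone, so no pass over the full
    event history is needed.
    """
    for e in new_events:
        key = (e.customer_id, e.session_id)
        summary = sessions.setdefault(key, SessionSummary())
        if e.variant is not None:
            summary.variants.add(e.variant)
        if e.kind == "purchase":
            summary.revenue += e.revenue
    return sessions


def fractional_attribution(
    sessions: dict[tuple[str, str], SessionSummary],
) -> dict[tuple[str, str], float]:
    """Split each session's revenue equally across the variants it touched.

    Equal splitting is an assumed rule. Returns {(customer_id, variant): credit},
    so credit never leaks between customer namespaces.
    """
    credit: dict[tuple[str, str], float] = defaultdict(float)
    for (customer_id, _session_id), summary in sessions.items():
        if not summary.variants or summary.revenue == 0:
            continue  # nothing to attribute for this session
        share = summary.revenue / len(summary.variants)
        for variant in summary.variants:
            credit[(customer_id, variant)] += share
    return dict(credit)


if __name__ == "__main__":
    sessions = update_sessions({}, [
        Event("shop-a", "s1", "offer_view", variant="A"),
        Event("shop-a", "s1", "offer_view", variant="B"),
        Event("shop-a", "s1", "purchase", revenue=30.0),
        Event("shop-a", "s2", "offer_view", variant="A"),
        Event("shop-a", "s2", "purchase", revenue=10.0),
    ])
    credit = fractional_attribution(sessions)
    print(credit[("shop-a", "A")], credit[("shop-a", "B")])  # 25.0 15.0
```

In the demo, session `s1` touched both variants, so its 30.0 is split 15.0/15.0. Session `s2` touched only `A`, so `A` receives all 10.0. Attribution here reads the compact session summaries rather than raw events. That is the kind of saving the summary describes, though a production system would likely do this inside ClickHouse rather than in application code.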