How Wix Built AI-Driven Incident Response at Scale with ClickHouse and Wild Moose
Blog post from ClickHouse
Wix, a global platform for digital presence, runs a highly complex production environment with over 300 million users and 4,000+ microservices, which makes robust incident response essential. To make incident response faster and more accurate, Wix turned to AI-driven solutions, building on its existing infrastructure and zero-error culture. The company integrated ClickHouse, a high-performance columnar database, for efficient log management and real-time analytics, which proved essential for its AI-driven incident response system, Wild Moose. Wild Moose, an AI platform designed to mimic how engineers investigate incidents, automates incident response by processing large volumes of data quickly, enriching alerts, and reducing manual work. Engineers begin an investigation with context and probable root causes already in hand, which improves accuracy and reduces mean time to resolution (MTTR); that in turn boosts team morale and eases on-call stress. By combining AI, observability, and a strong data infrastructure, Wix has built a scalable, AI-driven reliability system that turns tribal knowledge into institutional knowledge, preserving its zero-error tolerance as the platform continues to scale.
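
To make the idea of an "enriched alert" concrete, here is a minimal sketch of what attaching log context to an alert might look like. It is an illustration only, not Wix's or Wild Moose's actual implementation: the `LogRecord` and `Alert` types, the `enrich_alert` function, the 15-minute window, and the in-memory log list (standing in for results queried from a log store such as ClickHouse) are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class LogRecord:
    # Hypothetical log shape; real log schemas will differ.
    timestamp: datetime
    service: str
    level: str
    message: str


@dataclass
class Alert:
    # Hypothetical alert shape; `context` holds the enrichment.
    service: str
    fired_at: datetime
    title: str
    context: dict = field(default_factory=dict)


def enrich_alert(alert, logs, window=timedelta(minutes=15), top_n=3):
    """Attach the most frequent recent error messages for the alerting service.

    `logs` is a plain list here; in practice it would be the result of a
    query against the log store (an assumption of this sketch).
    """
    start = alert.fired_at - window
    errors = [
        rec.message
        for rec in logs
        if rec.service == alert.service
        and rec.level == "ERROR"
        and start <= rec.timestamp <= alert.fired_at
    ]
    alert.context = {
        "window_minutes": window.total_seconds() / 60,
        "error_count": len(errors),
        # (message, count) pairs, most frequent first.
        "top_errors": Counter(errors).most_common(top_n),
    }
    return alert


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 12, 0)
    sample_logs = [
        LogRecord(fired - timedelta(minutes=1), "checkout", "ERROR", "db timeout"),
        LogRecord(fired - timedelta(minutes=2), "checkout", "ERROR", "db timeout"),
        LogRecord(fired - timedelta(minutes=3), "checkout", "ERROR", "null cart"),
        LogRecord(fired - timedelta(minutes=4), "search", "ERROR", "index missing"),
    ]
    enriched = enrich_alert(Alert("checkout", fired, "5xx spike"), sample_logs)
    print(enriched.context)
```

The point of the sketch is the workflow rather than the specifics: the on-call engineer receives the alert together with a summary of what was failing just before it fired, instead of starting the investigation from a bare notification.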