
DeepL's Journey with ClickHouse

Blog post from ClickHouse

Post Details
Company
Date Published
Author
Word Count
126
Language
English
Hacker News Points
-
Summary

DeepL uses ClickHouse as its central data warehouse, and it underpins analytics across many applications, including website and app analysis, provisioning of company metrics, and technical monitoring. DeepL first adopted ClickHouse in 2020 to build privacy-conscious analytics. Because ClickHouse deploys as a single binary, it was far easier to set up than alternatives such as Hadoop, which let DeepL build a Minimum Viable Product (MVP) quickly. The MVP consisted of an API for sending events, Kafka as the message broker, and Metabase for visualization, and it showed that ClickHouse could handle large data volumes efficiently and answer queries quickly.

To streamline operations, the team then focused on automation and introduced a single source of truth for events and table schemas, defined in protobuf. This setup made it possible to build the complex events and queries needed to understand how users interact with DeepL, surpassing what tools like Google Analytics offer while keeping DeepL in control of its data and preserving user privacy.

Over time, DeepL connected more data sources to the system, which supported a transition to data-driven development. The platform grew from a single node to a 3-shard, 3-replica cluster that processes about 500 million rows of raw data per day. This data infrastructure enabled further capabilities, such as an experimentation framework for A/B testing and a machine learning infrastructure for personalizing the user experience. Together, these allowed rapid iteration on frontend and backend changes and supported a cultural shift toward data-centric decision-making within DeepL.
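The summary mentions keeping events and table schemas in protobuf as a single source of truth. As a rough illustration of that idea, the sketch below renders a ClickHouse `CREATE TABLE` statement from one schema description so the table cannot drift from the event definition. It is a minimal sketch under stated assumptions, not DeepL's implementation: the `Field` class is a hypothetical, simplified stand-in for a parsed protobuf message (a real pipeline would read protobuf descriptors), the type mapping covers only common scalar types, and the table and column names in the usage example are invented for illustration. It also targets a plain `MergeTree` table; a sharded, replicated cluster would typically need replicated and distributed table variants instead.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed mapping from protobuf scalar type names to ClickHouse column types.
PROTO_TO_CLICKHOUSE = {
    "int32": "Int32",
    "int64": "Int64",
    "uint32": "UInt32",
    "uint64": "UInt64",
    "sint32": "Int32",
    "sint64": "Int64",
    "float": "Float32",
    "double": "Float64",
    "bool": "Bool",
    "string": "String",
    "bytes": "String",
}


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified stand-in for one field of a protobuf message."""
    name: str
    proto_type: str
    repeated: bool = False


def clickhouse_type(field: Field) -> str:
    """Map one field to a ClickHouse type; repeated fields become arrays."""
    base = PROTO_TO_CLICKHOUSE[field.proto_type]
    return f"Array({base})" if field.repeated else base


def create_table_ddl(table: str, fields: list[Field], order_by: list[str]) -> str:
    """Render a CREATE TABLE statement for a MergeTree table from the schema."""
    known = {f.name for f in fields}
    missing = [c for c in order_by if c not in known]
    if missing:
        # Fail early instead of emitting DDL that ClickHouse would reject.
        raise ValueError(f"ORDER BY columns not in schema: {missing}")
    columns = ",\n".join(f"    {f.name} {clickhouse_type(f)}" for f in fields)
    return (
        f"CREATE TABLE IF NOT EXISTS {table}\n"
        f"(\n{columns}\n)\n"
        f"ENGINE = MergeTree()\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


if __name__ == "__main__":
    # Hypothetical page-view event; names are illustrative only.
    page_view = [
        Field("timestamp_ms", "int64"),
        Field("session_id", "string"),
        Field("page", "string"),
        Field("experiment_ids", "uint32", repeated=True),
    ]
    print(create_table_ddl("events.page_view", page_view, ["page", "timestamp_ms"]))
```

Generating the DDL from the same definition that producers use to serialize events means a new field only has to be declared once; the choice of `ORDER BY` columns is left to the caller because it depends on the queries each table must serve.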