How Lyft Delivers Batch and Real-Time Analytics with ClickHouse Cloud
Blog post from ClickHouse
Lyft moved from Apache Druid to ClickHouse Cloud to handle more than 450 TB of data per day and serve hundreds of queries per second with simpler operations, combining batch ingestion from S3 with real-time streaming from Kafka and Kinesis. Moving from a self-managed setup to a managed service significantly reduced the operational burden and made room for continuous improvements, such as automated access control and the elimination of manual schema synchronization.

Motivated by the goal of connecting people, Lyft provided over 800 million rides to more than 40 million people in the U.S. and Canada last year. The company relies on fast, accurate insights to improve its services and inform business decisions, and it processes vast amounts of data for both historical analysis and real-time decision-making.

Engineers Jeana Choi and Ritesh Varyani discussed the move to ClickHouse Cloud, highlighting its performance, scalability, and simplified infrastructure. They cited a lower learning cost, built-in data deduplication, and lower operational costs as benefits. The transition had its challenges, including adapting to cloud-specific features and reconfiguring internal systems, but the result is a more scalable and maintainable system that efficiently processes terabyte-scale batch data.

Real-time analytics, which are crucial for daily operations, run on a streaming pipeline that uses Kafka and Kinesis, with Apache Flink as the processing layer. The pipeline uses Java reflection to deserialize messages dynamically, so protobuf schema definitions no longer have to be synchronized with ClickHouse by hand. Despite the challenges of the migration, Lyft continues to evolve its data stack, focusing on deeper integration within its data platform to improve scalability and operational efficiency.
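
The reflection-based idea mentioned for the streaming pipeline can be illustrated with a minimal Java sketch. This is not Lyft's implementation, which the post does not show. It assumes a hypothetical `RideEvent` class that stands in for a protobuf-generated message (real generated classes expose similar `getXxx()` accessors but also carry extra getters that would need filtering), and a hypothetical `ReflectiveRowMapper` helper that discovers those accessors at runtime and derives snake_case column names from them. The point is only that adding a field to the message adds a column to the row without any hand-maintained mapping.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event type standing in for a protobuf-generated class.
final class RideEvent {
    private final long rideId;
    private final String city;
    private final double fareUsd;

    RideEvent(long rideId, String city, double fareUsd) {
        this.rideId = rideId;
        this.city = city;
        this.fareUsd = fareUsd;
    }

    public long getRideId() { return rideId; }
    public String getCity() { return city; }
    public double getFareUsd() { return fareUsd; }
}

// Hypothetical helper: builds a column-name -> value row from any object's getters.
final class ReflectiveRowMapper {

    // Converts a property name such as "FareUsd" into "fare_usd".
    static String toColumnName(String property) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < property.length(); i++) {
            char c = property.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Discovers public, non-static, zero-argument getXxx() methods declared on
    // the message class and invokes each one to populate the row.
    static Map<String, Object> toRow(Object message) {
        Map<String, Object> row = new TreeMap<>(); // sorted keys give a stable column order
        for (Method m : message.getClass().getDeclaredMethods()) {
            String name = m.getName();
            boolean isGetter = name.startsWith("get")
                    && name.length() > 3
                    && m.getParameterCount() == 0
                    && Modifier.isPublic(m.getModifiers())
                    && !Modifier.isStatic(m.getModifiers());
            if (!isGetter) {
                continue;
            }
            try {
                row.put(toColumnName(name.substring(3)), m.invoke(message));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + name, e);
            }
        }
        return row;
    }
}
```

Calling `ReflectiveRowMapper.toRow(new RideEvent(42L, "Toronto", 12.5))` yields a map with the keys `city`, `fare_usd`, and `ride_id`. In a real pipeline the resulting row would still have to be matched against the target table's columns and types, which this sketch leaves out.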