Content Deep Dive

ClickPy at 2 Trillion rows: Scaling ingestion and fixing the past

Blog post from ClickHouse

Post Details
Company
Date Published
Author
Word Count
1,613
Language
English
Hacker News Points
-
Summary

ClickPy, a platform for tracking Python package download statistics, has surpassed 2 trillion rows in its main dataset, which covers activity in the Python ecosystem since 2011. The milestone shows that ClickHouse can manage high-volume analytical data with minimal maintenance. In response to this growth, the team revamped the data ingestion pipeline, replacing a custom script with ClickPipes to improve reliability and maintainability. The new pipeline was tested in a separate database so that ongoing operations were not affected, and the switch streamlined data ingestion and transformation. During the migration, discrepancies in the historical data were identified and corrected with ClickHouse's lightweight delete and update operations, which preserved data accuracy without disrupting current ingestion. These improvements, driven by community feedback, have made ClickPy more robust and better prepared for future growth, and new features such as chart export via Metabase make it more useful to users.