Company
Date Published
Author
Tom Schreiber
Word count
2916
Language
English
Hacker News points
None

Summary

Loading trillions of rows into ClickHouse is challenging because transient issues such as network glitches can interrupt a long-running data load, causing delays or outright failure. ClickHouse Cloud addresses this with ClickPipes, a managed integration solution with built-in support for continuous, fast, resilient, and scalable data ingestion from external systems such as Apache Kafka. For external data sources that ClickPipes does not support, ClickLoad is a script that loads large datasets incrementally and reliably over time, using object storage buckets and stateful orchestration of the data transfer with automatic retries. The script uses a queue-worker approach to parallelize the file load process, so that loading trillions of rows into ClickHouse tables scales and survives failures, with support for projections, materialized views, and partitioning keys.
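
The queue-worker approach with automatic retries can be illustrated with a minimal Python sketch. This is not ClickLoad's actual code: `run_load`, `load_file`, and the state labels are hypothetical names chosen for illustration, and the in-memory queue and state dictionary stand in for whatever persistent orchestration the real script uses. Each file is one task; a worker that hits a failure puts the task back on the queue until a retry budget is exhausted.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_load(files: Iterable[str],
             load_file: Callable[[str], None],
             num_workers: int = 4,
             max_retries: int = 3) -> Dict[str, str]:
    """Load each file with load_file (hypothetical callback), retrying failures.

    Returns a per-file state map: "done" or "failed".
    """
    tasks = queue.Queue()          # holds (file_name, attempt_number) pairs
    state: Dict[str, str] = {}     # illustrative stand-in for persisted state
    lock = threading.Lock()

    for name in files:
        state[name] = "scheduled"
        tasks.put((name, 0))

    def worker() -> None:
        while True:
            try:
                name, attempt = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to pick up
            try:
                load_file(name)
                outcome = "done"
            except Exception:
                if attempt < max_retries:
                    # Transient failure assumed: reschedule the same file.
                    tasks.put((name, attempt + 1))
                    outcome = "scheduled"
                else:
                    outcome = "failed"
            with lock:
                state[name] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

In practice, `load_file` would presumably insert a single object-storage file into the target table. Because a worker that reschedules a task keeps looping, a retried file is always picked up again, even if the other workers have already exited.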