Streaming with Change Data Capture to ClickHouse
Blog post from Streamkap
Streamkap has introduced a high-performance ClickHouse connector for streaming Change Data Capture (CDC) data into ClickHouse, a column-oriented real-time database known for fast analytical queries. The connector builds on Apache Kafka, Kafka Connect, Debezium, and Apache Flink, and is designed for high throughput and zero maintenance, which suits fast-paced environments. It supports both insert and upsert modes; upsert mode deduplicates rows using the ReplacingMergeTree engine. The connector also adds metadata to records, handles schema evolution and schema drift, applies data transformations, and supports semi-structured data. In performance tests it reached up to 85,000 CDC records per second in upsert mode. With automated optimizations for both bulk and streaming modes, it streams CDC data into ClickHouse in near real time and scales linearly, making it suitable for production pipelines.
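
To make the upsert-mode deduplication more concrete, here is a minimal Python sketch of what a ReplacingMergeTree merge does conceptually: for each key, only the row with the highest version survives, and on a tie the later-inserted row wins. This is an in-memory simulation, not Streamkap's connector code or a ClickHouse API. The function name `replacing_merge` and the column names `id` and `_version` are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replacing_merge(rows: List[Row], key: str = "id", version: str = "_version") -> List[Row]:
    """Simulate ReplacingMergeTree-style deduplication in memory.

    For each distinct key, keep the row with the highest version value.
    When versions tie, the row that arrived later replaces the earlier one.
    The column names are assumptions made for this sketch.
    """
    latest: Dict[Any, Row] = {}
    for row in rows:
        k = row[key]
        current = latest.get(k)
        # ">=" so that a later row with the same version replaces the earlier one
        if current is None or row[version] >= current[version]:
            latest[k] = row
    return list(latest.values())


# A small stream of CDC-style events: two inserts, then an update to id=1.
events = [
    {"id": 1, "name": "alice", "_version": 1},
    {"id": 2, "name": "bob", "_version": 1},
    {"id": 1, "name": "alice-updated", "_version": 2},
]

deduplicated = replacing_merge(events)
for row in deduplicated:
    print(row)
```

In ClickHouse itself, this collapsing happens during background merges, so duplicates can remain visible for a while after ingestion; queries that need the deduplicated view typically use the `FINAL` modifier or an equivalent aggregation.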