Adding JOIN support for parallel replicas on ClickHouse®

Blog post from Tinybird

Post Details
Company
Date Published
Author
Javi Santana
Word Count
2,350
Language
English
Hacker News Points
-
Summary

Tinybird is exploring parallel replicas, a feature introduced in the ClickHouse® 23.3 release in April, to speed up queries over large datasets on the columnar database. With parallel replicas, the execution of a single query is distributed across multiple servers that each hold a full copy of the data, combining the performance benefits of sharding with the fault tolerance of replication. The approach is most useful for complex queries over massive datasets. Such queries often involve JOINs, which are essential to Tinybird's real-time data platform. Parallel replicas initially lacked JOIN support, so Tinybird contributed a solution to the ClickHouse® community that enables INNER JOINs by broadcasting the joined table to the replicas. This significantly reduces execution time for complex queries: in one test, a JOIN over 64 billion rows dropped from 47 seconds to under 8 seconds with parallel replicas. The method does not always help simple queries, where the coordination overhead can outweigh the gain, but it shows promise for scaling ClickHouse® clusters to handle trillions of rows at sub-second latency, a potential leap in real-time data processing.
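
The broadcast idea described in the summary can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not ClickHouse® code or Tinybird's implementation: the function names (`hash_join`, `split_work`, `broadcast_join`), the round-robin split, and the sample tables are all hypothetical, and the "replicas" are simulated by a plain loop. It shows only the core property: if each replica joins a disjoint slice of the left table against a full copy of the right table, the merged partial results equal the single-node INNER JOIN.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows):
    """INNER JOIN two lists of (key, value) tuples on key using a hash table."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    # Emit one output row per matching (left, right) pair; unmatched rows are dropped.
    return [
        (key, left_value, right_value)
        for key, left_value in left_rows
        for right_value in index.get(key, [])
    ]


def split_work(rows, n_replicas):
    """Assign each simulated replica a disjoint slice of the left table (round-robin)."""
    return [rows[i::n_replicas] for i in range(n_replicas)]


def broadcast_join(left_rows, right_rows, n_replicas):
    """Simulate a broadcast JOIN: every replica gets the whole right table."""
    partials = [
        hash_join(chunk, right_rows)  # each replica joins only its own slice
        for chunk in split_work(left_rows, n_replicas)
    ]
    # The coordinator concatenates the partial results.
    return [row for part in partials for row in part]


# Hypothetical sample data: a larger "events" table and a small "users" table.
events = [(1, "click"), (2, "view"), (1, "view"), (3, "click"), (4, "view")]
users = [(1, "ana"), (2, "bob"), (3, "eve")]

single_node = hash_join(events, users)
parallel = broadcast_join(events, users, n_replicas=3)

# Row order may differ between the two plans, so compare as sorted lists.
assert sorted(single_node) == sorted(parallel)
```

The sketch also hints at where the overhead mentioned above comes from: the right table is copied to every replica and the partial results must be merged, so for small or simple queries that fixed cost can exceed the time saved.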