Company
Date Published
Author
Tom Schreiber
Word count
3240
Language
English
Hacker News points
None

Summary

The direct join is the fastest join algorithm in ClickHouse. It applies when the underlying storage for the right-hand side table supports low-latency key-value requests, and it beats all other ClickHouse join algorithms in execution time, especially with large right-hand side tables. The algorithm requires the right table to be backed by a dictionary, which serves key-value lookups in O(1) time. For the article's query, the direct join run using a dictionary with a flat memory layout is ~25 times faster than the hash join run and ~15 times faster than the parallel hash join run. Even when the dictionary's bytes_allocated is added to the peak memory consumption of the direct join run, the total remains lower than that of the hash algorithm runs.
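
To make the difference concrete, here is a minimal Python sketch of the idea rather than ClickHouse's actual implementation. It assumes an inner join on a single key with unique keys on the right side, and it uses a plain Python dict as a stand-in for a dictionary-backed table. The function names `hash_join` and `direct_join` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

LeftRow = Tuple[int, str]        # (join key, left payload)
RightRow = Tuple[int, str]       # (join key, right payload)
JoinedRow = Tuple[int, str, str]


def hash_join(left: Iterable[LeftRow], right: Iterable[RightRow]) -> List[JoinedRow]:
    """Inner join that first builds a hash table from the right rows."""
    # Build phase: every right-side row is read and inserted into a hash table.
    # This costs time and memory proportional to the size of the right table.
    table: Dict[int, str] = {}
    for key, value in right:
        table[key] = value  # assumption: right-side keys are unique

    # Probe phase: look up each left-side key in the freshly built table.
    return [(key, payload, table[key]) for key, payload in left if key in table]


def direct_join(left: Iterable[LeftRow], right_lookup: Dict[int, str]) -> List[JoinedRow]:
    """Inner join against a right side that already supports key-value lookups."""
    # No build phase: the right side (here a dict standing in for a dictionary)
    # is queried directly, one O(1) lookup per left-side row.
    return [
        (key, payload, right_lookup[key])
        for key, payload in left
        if key in right_lookup
    ]
```

Both functions return the same rows for the same input. The direct variant skips the build phase entirely, which is where the savings in time and per-query memory come from when the right-hand side table is large.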