Company
Date Published
Author
Maksim Kita
Word count
4060
Language
English
Hacker News points
3

Summary

Hash tables are the core data structure ClickHouse uses for aggregation and join operations. They provide constant average-case time for insert, lookup, and delete operations. The design of a hash table is critical for performance, and the right design depends on the workload: the type of the keys, the number of unique keys, and the load factor. ClickHouse uses a zero-cost C++ framework to generate a hash table tailored to each use case. The framework follows a policy-based design, allowing users to customize the hash function, allocator, cell, grower, and the hash table itself. It also includes features such as zero-value storage, custom resizing policies, and support for LRU caches. Specialized hash tables are available for various scenarios, including small tables, string keys, and two-level hash tables. The designs are optimized for cache locality, which makes them well suited to large-scale data aggregation and join operations.
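
The policy-based design described above can be illustrated with a minimal sketch. This is not ClickHouse's actual code: the names `MixHash`, `PowerOfTwoGrower`, `Cell`, and `SketchHashMap` are hypothetical, the allocator and cell are fixed for brevity (a `std::vector` of a plain struct), and only the hash function and grower are template policies. It assumes open addressing with linear probing, a key of 0 marking an empty cell, and a separate slot for the zero key as one way to realize zero-value storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash policy: a cheap 64-bit integer mixer.
struct MixHash {
    std::size_t operator()(std::uint64_t x) const {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x);
    }
};

// Hypothetical grower policy: power-of-two capacity, grow above 50% load.
struct PowerOfTwoGrower {
    std::size_t degree = 4;  // 2^4 = 16 cells initially
    std::size_t capacity() const { return std::size_t{1} << degree; }
    std::size_t place(std::size_t hash) const { return hash & (capacity() - 1); }
    std::size_t next(std::size_t pos) const { return (pos + 1) & (capacity() - 1); }
    bool overflow(std::size_t elems) const { return elems > capacity() / 2; }
    void grow() { ++degree; }
};

// Cell layout assumed here: key 0 means "empty", so no extra flag is stored.
struct Cell {
    std::uint64_t key = 0;
    std::uint64_t value = 0;
    bool empty() const { return key == 0; }
};

template <typename Hash, typename Grower>
class SketchHashMap {
public:
    SketchHashMap() : cells_(grower_.capacity()) {}

    // Returns a reference to the value for `key`, inserting 0 if absent.
    std::uint64_t & operator[](std::uint64_t key) {
        if (key == 0) {  // the zero key lives outside the cell array
            has_zero_ = true;
            return zero_value_;
        }
        if (grower_.overflow(size_ + 1)) resize();
        Cell & c = cells_[findSlot(cells_, grower_, key)];
        if (c.empty()) { c.key = key; ++size_; }
        return c.value;
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const std::uint64_t * find(std::uint64_t key) const {
        if (key == 0) return has_zero_ ? &zero_value_ : nullptr;
        const Cell & c = cells_[findSlot(cells_, grower_, key)];
        return c.empty() ? nullptr : &c.value;
    }

    std::size_t size() const { return size_ + (has_zero_ ? 1 : 0); }

private:
    // Linear probing: stop at the matching key or the first empty cell.
    static std::size_t findSlot(const std::vector<Cell> & cells, const Grower & g,
                                std::uint64_t key) {
        std::size_t pos = g.place(Hash{}(key));
        while (!cells[pos].empty() && cells[pos].key != key) pos = g.next(pos);
        return pos;
    }

    // Ask the grower for a larger capacity and reinsert every occupied cell.
    void resize() {
        Grower bigger = grower_;
        bigger.grow();
        assert(bigger.capacity() > grower_.capacity());
        std::vector<Cell> fresh(bigger.capacity());
        for (const Cell & c : cells_)
            if (!c.empty()) fresh[findSlot(fresh, bigger, c.key)] = c;
        cells_.swap(fresh);
        grower_ = bigger;
    }

    Grower grower_;
    std::vector<Cell> cells_;
    std::size_t size_ = 0;
    bool has_zero_ = false;
    std::uint64_t zero_value_ = 0;
};
```

Because the policies are template parameters, a declaration such as `SketchHashMap<MixHash, PowerOfTwoGrower> counts;` followed by `counts[key] += 1;` compiles to code specialized for that hash and growth strategy, with no virtual calls. Swapping in a different grower or hash changes behavior without touching the table itself.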