Hash tables in ClickHouse and C++ Zero-cost Abstractions
Blog post from ClickHouse
Hash tables are the core data structure behind aggregation and join operations in ClickHouse. They offer constant time on average for insert, lookup, and delete. The right design for a hash table depends on the workload: the key type, the number of unique keys, and the load factor all affect performance, so no single implementation is ideal everywhere. ClickHouse therefore uses a C++ framework built on zero-cost abstractions to generate a hash table tailored to each use case. The framework follows a policy-based design: the hash function, allocator, cell, grower, and the hash table itself are template parameters that can be customized independently and are resolved at compile time. It also includes features such as zero-value storage (the zero key is kept outside the main array so that zero can mark an empty cell), custom resizing policies, and LRU caches. Specialized variants cover particular scenarios, including small tables, string hash tables, and two-level hash tables. The framework is optimized for cache locality, which makes it well suited to large-scale aggregation and joins.
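To make the policy-based design concrete, here is a minimal sketch of an open-addressing hash map whose hash function and grower are template parameters, with the zero key stored outside the main array. It is not ClickHouse's code: the names `SketchHashMap`, `MixHash`, and `PowerOfTwoGrower`, the linear probing scheme, and the 50% load-factor threshold are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical grower policy: power-of-two capacity, grow above 50% load.
struct PowerOfTwoGrower {
    std::size_t degree = 4;  // capacity = 2^degree
    std::size_t capacity() const { return std::size_t{1} << degree; }
    std::size_t place(std::size_t hash) const { return hash & (capacity() - 1); }
    std::size_t next(std::size_t pos) const { return (pos + 1) & (capacity() - 1); }
    bool overflow(std::size_t elems) const { return elems > capacity() / 2; }
    void grow() { ++degree; }
};

// Hypothetical hash policy: 64-bit mixer in the style of MurmurHash3's finalizer.
struct MixHash {
    std::size_t operator()(std::uint64_t x) const {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return static_cast<std::size_t>(x);
    }
};

template <typename Mapped, typename Hash = MixHash, typename Grower = PowerOfTwoGrower>
class SketchHashMap {
    struct Cell { std::uint64_t key = 0; Mapped value{}; };  // key == 0 marks an empty cell

    Grower grower_;             // declared before cells_, which is sized from it
    Hash hash_;
    std::vector<Cell> cells_;
    std::size_t size_ = 0;      // number of occupied cells (zero key excluded)
    bool has_zero_ = false;     // zero-value storage: key 0 lives outside the array
    Mapped zero_value_{};

    // Linear probing: stop at the matching key or at the first empty cell.
    std::size_t find_slot(const std::vector<Cell>& cells, std::uint64_t key) const {
        assert(key != 0);
        std::size_t pos = grower_.place(hash_(key));
        while (cells[pos].key != 0 && cells[pos].key != key) pos = grower_.next(pos);
        return pos;
    }

    // Double the capacity and reinsert every occupied cell.
    void resize() {
        std::vector<Cell> old = std::move(cells_);
        grower_.grow();
        cells_.assign(grower_.capacity(), Cell{});
        for (auto& c : old)
            if (c.key != 0) cells_[find_slot(cells_, c.key)] = std::move(c);
    }

public:
    SketchHashMap() : cells_(grower_.capacity()) {}

    // Insert-or-lookup, as an aggregation loop would use it.
    Mapped& operator[](std::uint64_t key) {
        if (key == 0) { has_zero_ = true; return zero_value_; }
        std::size_t pos = find_slot(cells_, key);
        if (cells_[pos].key == 0) {
            cells_[pos].key = key;
            ++size_;
            if (grower_.overflow(size_)) { resize(); pos = find_slot(cells_, key); }
        }
        return cells_[pos].value;
    }

    // Returns nullptr when the key is absent.
    const Mapped* find(std::uint64_t key) const {
        if (key == 0) return has_zero_ ? &zero_value_ : nullptr;
        const Cell& c = cells_[find_slot(cells_, key)];
        return c.key == key ? &c.value : nullptr;
    }

    std::size_t size() const { return size_ + (has_zero_ ? 1 : 0); }
};
```

A GROUP BY style count could then be written as `SketchHashMap<std::uint64_t> counts; ++counts[key];` for each input key. Because the policies are template parameters, the compiler can inline the hash and probing logic, so swapping in a different hash or growth strategy adds no virtual-call overhead at run time.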