Parallelizing ClickHouse aggregation merge for fixed hash map
Blog post from ClickHouse
ClickHouse 25.11 speeds up aggregations on 8-bit and 16-bit keys by parallelizing the merge phase of FixedHashMap-based aggregations. The optimization, implemented by Jianfei Hu, was motivated by a discrepancy in GROUP BY query performance between key types such as UInt16 and UInt64. The existing mechanism stored the fixed hash map as a one-dimensional array, which hindered parallel processing during the merge, and the work also had to deal with concurrency and memory management.

The solution distributes the merge work among threads in a way that avoids race conditions. Early versions ran into two problems: memory corruption, and slower queries for trivial aggregate functions, where the overhead of a parallel merge outweighed its benefit. Hu addressed these by refining how memory is allocated and by extracting the min/max indices before the parallel merge, so that each thread iterates only over the range of the array that can contain data.
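
The idea of a race-free parallel merge over an array-backed map can be sketched as follows. This is a minimal illustration in Python, not ClickHouse's C++ implementation. It assumes a count-style aggregate, 8-bit keys, and a count of 0 meaning "slot empty". The names `SLOTS`, `occupied_range`, and `parallel_merge` are hypothetical, and splitting the array into contiguous slices is one possible way to partition the work; the summary above does not say which scheme ClickHouse uses.

```python
from concurrent.futures import ThreadPoolExecutor

SLOTS = 256  # one slot per possible UInt8 key; the key is the array index


def occupied_range(partials):
    """Return (lo, hi) of the occupied slots across all partial maps, or None.

    Assumption: a real engine would track these bounds while inserting;
    scanning here just keeps the sketch short.
    """
    used = [i for m in partials for i, v in enumerate(m) if v != 0]
    return (min(used), max(used)) if used else None


def parallel_merge(partials, num_threads=4):
    """Sum per-thread partial count arrays into one result array."""
    result = [0] * SLOTS
    bounds = occupied_range(partials)
    if bounds is None:
        return result  # nothing to merge
    lo, hi = bounds
    span = hi - lo + 1
    workers = max(1, min(num_threads, span))
    chunk = -(-span // workers)  # ceiling division

    def merge_slice(begin, end):
        # This worker is the only writer for slots [begin, end),
        # so no locking is needed on the shared result array.
        for m in partials:
            for i in range(begin, end):
                result[i] += m[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(merge_slice, begin, min(begin + chunk, hi + 1))
            for begin in range(lo, hi + 1, chunk)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return result
```

Calling `parallel_merge([a, b], num_threads=3)` on two 256-element lists sums them slot by slot, touching only the slots between the smallest and largest occupied index. For a cheap aggregate like this one, the cost of starting and coordinating workers can exceed the merge itself, which matches the slowdown for trivial functions described above.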