How ClickHouse became fast at joins
Blog post from ClickHouse
Over the past two years, ClickHouse has become 26× faster on join-heavy analytical workloads, measured on the TPC-H SF100 benchmark against version 22.4. The gain came from targeted engineering work aimed at making joins a core strength of the system. The first year delivered foundational improvements (faster parallel hash joins, smarter planning, and aggressive filter pushdown), producing a 4.4× speedup by version 25.4. The second year added correlated subqueries, lazy column replication, runtime filters, and statistics-based join reordering, which together contributed a further 6× speedup. As a result, ClickHouse now executes complex join queries more efficiently and at lower cost, and can compete with platforms such as Snowflake, Databricks, BigQuery, and Redshift. The company plans to keep optimizing join performance, including work on distributed joins to handle even larger workloads.
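The two yearly figures compound multiplicatively rather than add up: a 4.4× speedup followed by a further 6× speedup gives roughly the 26× overall figure. The following is a minimal sketch of that arithmetic in Python, using only the numbers quoted above; the helper name `combined_speedup` is hypothetical and does not come from the post.

```python
from math import prod


def combined_speedup(stage_speedups):
    """Successive speedups multiply: each stage shrinks the runtime
    that the previous stage left behind."""
    return prod(stage_speedups)


# Figures quoted in the summary; variable names are illustrative only.
year_one = 4.4  # speedup reached by version 25.4
year_two = 6.0  # further speedup from the second year of work

total = combined_speedup([year_one, year_two])
print(f"combined speedup: about {total:.0f}x")
```

Put differently, under these figures a query that took 26 seconds at the baseline would take about 1 second after both rounds of work.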