ClickHouse Cloud: Fast, Updatable Lookups with the Join Table Engine
Blog post from ClickHouse
In ClickHouse, dictionaries and Join tables are used to speed up joins in dimensional modeling, the Kimball-style approach of organizing data into fact and dimension tables. A dictionary holds dimension data in memory for direct key lookups, while a table using the Join table engine keeps a pre-built in-memory structure for one specific join type and also persists its data on disk so that it can be reloaded after a restart. In the open-source version the engine has drawbacks: the table is not distributed, and frequent updates are handled inefficiently. ClickHouse Cloud addresses these issues with a SharedJoin table backed by a MergeTree-family table, which enables efficient upserts, deduplication, and data compaction. This structure is especially useful for implementing Type 1 slowly changing dimensions, where new values overwrite old ones: used with an ANY LEFT join, it ensures that only the latest entry per key is kept in memory. The Cloud version thus supports scalable, high-performance data enrichment.
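To make the Type 1 behavior concrete, here is a minimal Python sketch of the semantics described above. It is a conceptual model only, not ClickHouse code and not how SharedJoin is implemented. The function names (`upsert_latest`, `any_left_join`), the column names, and the `updated_at` version column are all hypothetical and chosen for illustration.

```python
# Conceptual model only: an in-memory lookup that keeps the latest row per key
# (Type 1 slowly changing dimension) and enriches facts with ANY LEFT semantics.
# All names here are hypothetical; none of them is a ClickHouse API.

def upsert_latest(dimension, rows, key="id", version="updated_at"):
    """Insert rows into the lookup, overwriting older rows that share a key."""
    for row in rows:
        current = dimension.get(row[key])
        # A row replaces the stored one if it is at least as new.
        if current is None or row[version] >= current[version]:
            dimension[row[key]] = row
    return dimension


def any_left_join(facts, dimension, fact_key, defaults):
    """Keep every fact row and attach at most one matching dimension row.

    Unmatched facts receive the default values passed in `defaults`,
    mirroring a left join that fills missing columns with defaults.
    """
    enriched = []
    for fact in facts:
        match = dimension.get(fact[fact_key])
        attrs = {
            col: (match[col] if match is not None else default)
            for col, default in defaults.items()
        }
        enriched.append({**fact, **attrs})
    return enriched


# Usage: customer 1 moves from Berlin to Munich; only the latest row is kept.
customers = {}
upsert_latest(customers, [
    {"id": 1, "city": "Berlin", "updated_at": 1},
    {"id": 2, "city": "Paris", "updated_at": 1},
    {"id": 1, "city": "Munich", "updated_at": 2},
])

orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no matching customer
]
enriched = any_left_join(orders, customers, "customer_id", {"city": ""})
```

The lookup never grows beyond one row per key, which is the property that makes Type 1 dimensions a good fit for an in-memory join structure: history is discarded on write, so the join side stays small and every fact row matches at most one dimension row.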