Blocked Bloom Filters: Speeding Up Point Lookups in Tiger Postgres' Native Columnstore
Blog post from Tiger Data
The article examines how Bloom filters in TimescaleDB's columnstore speed up point lookups, particularly in large-scale time-series and analytics workloads. Traditional columnstores handle queries on unsorted fields poorly: to find a single value, they must decompress and scan every block, which becomes slow as tables grow.

Bloom filters address this by letting the database quickly determine that a value is definitely not in a batch, so most batches can be skipped without being decompressed. The approach is most effective for queries on non-temporal identifiers such as UUIDs or transaction IDs, where the article reports speedups of up to 100x.

The article also explains how Bloom filters work, including their space efficiency and their limitations: a filter can return false positives but never false negatives. TimescaleDB uses "blocked Bloom filters", which confine each value's bits to a single small block of the filter, so a membership check reads one block rather than bits scattered across the whole filter. Bloom filters only accelerate exact-match (equality) queries; they cannot help with range queries or not-equal searches. For the many common workloads built on point lookups, however, they deliver a substantial speedup without manual configuration, in line with TimescaleDB's goal of offering speed without sacrificing flexibility.
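To make the batch-skipping idea concrete, here is a minimal Python sketch of a generic blocked Bloom filter guarding compressed batches. It is not TimescaleDB's implementation. The block size (512 bits, one 64-byte cache line), the number of bit probes (k = 8), BLAKE2b hashing with double hashing inside the chosen block, and the sizing of about 32 keys per block are all assumptions chosen for illustration. The names `BlockedBloomFilter`, `build_batch`, and `point_lookup` are hypothetical.

```python
import hashlib
import uuid

BLOCK_BITS = 512  # assumed block size: one 64-byte cache line


class BlockedBloomFilter:
    """Minimal blocked Bloom filter: all k bits for a key live in one block."""

    def __init__(self, num_blocks: int, k: int = 8) -> None:
        self.num_blocks = num_blocks
        self.k = k
        # Each block is a Python int used as a BLOCK_BITS-wide bitset.
        self.blocks = [0] * num_blocks

    def _probe(self, key: str) -> tuple[int, int]:
        """Return (block index, bit mask) for key (assumed hashing scheme)."""
        digest = hashlib.blake2b(key.encode(), digest_size=24).digest()
        h1 = int.from_bytes(digest[0:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        h3 = int.from_bytes(digest[16:24], "little") | 1  # odd step -> distinct bits
        block = h1 % self.num_blocks
        mask = 0
        for i in range(self.k):
            # Double hashing, but restricted to positions inside one block.
            mask |= 1 << ((h2 + i * h3) % BLOCK_BITS)
        return block, mask

    def add(self, key: str) -> None:
        block, mask = self._probe(key)
        self.blocks[block] |= mask

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        block, mask = self._probe(key)
        return (self.blocks[block] & mask) == mask


def build_batch(rows: list[str]) -> tuple[list[str], BlockedBloomFilter]:
    """Hypothetical helper: pair a batch of values with its filter."""
    # Hypothetical sizing: about one 512-bit block per 32 keys.
    bloom = BlockedBloomFilter(num_blocks=max(1, len(rows) // 32))
    for row in rows:
        bloom.add(row)
    return rows, bloom


def point_lookup(batches, key: str) -> tuple[list[str], int]:
    """Hypothetical helper: return matching rows and batches scanned."""
    hits: list[str] = []
    scanned = 0
    for rows, bloom in batches:
        if not bloom.might_contain(key):
            continue  # skip: the filter proves key is not in this batch
        scanned += 1  # stands in for decompressing and scanning the batch
        hits.extend(r for r in rows if r == key)
    return hits, scanned


# Usage: 50 batches of 1,000 UUID identifiers (deterministic via uuid5).
batches = [
    build_batch([str(uuid.uuid5(uuid.NAMESPACE_OID, f"{b}-{i}")) for i in range(1000)])
    for b in range(50)
]
target = batches[42][0][17]
hits, scanned = point_lookup(batches, target)
print(f"found={hits == [target]}, batches scanned={scanned} of {len(batches)}")
```

Confining every probe for a key to one block trades a slightly higher false-positive rate for a single block read per check. A false positive only costs an unnecessary batch scan, never a missed row, which is why the filter is safe to consult before any decompression.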