Smarter bloom filters in Parquet: how filter folding speeds up point lookups in Logfire
Blog post from Pydantic
Logfire has made its Parquet bloom filters more efficient for point lookup queries, such as finding a specific trace or filtering spans by name, by contributing an enhancement to Apache Arrow. The technique, called "bloom filter folding," starts with a deliberately large filter and shrinks it after the data has been written, so the filter ends up sized to the data without losing effectiveness and without any need to estimate its size in advance. This addresses the difficulty of sizing filters for high-cardinality columns: the filter ends up neither oversized nor so saturated that it returns too many false positives, which yields faster queries and smaller files. Because the change has been integrated into Apache Arrow, it benefits every project that uses the Rust Parquet implementation, and other Parquet implementations, such as the Java one, could adopt it as well. As a result, Logfire can now run point lookups faster with minimal overhead, because folded bloom filters adjust automatically to the data's actual characteristics.
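The folding idea can be illustrated with a minimal sketch. It assumes a plain bit-array Bloom filter whose size is a power of two, not the split-block layout that Parquet specifies, and every name in it (`FoldableBloom`, `fold_to_fit`, `target_fpp`, `min_bits`) is hypothetical rather than part of the Arrow API. The point it shows is that OR-ing the upper half of the bit array onto the lower half keeps every lookup valid, because a position taken modulo the old size reduces to the same position modulo the halved size.

```python
import hashlib


class FoldableBloom:
    """Illustrative bit-array Bloom filter; size must be a power of two."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        if num_bits <= 0 or num_bits & (num_bits - 1):
            raise ValueError("num_bits must be a power of two")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Double hashing: derive k positions from two 64-bit hash halves.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

    def _folded_bits(self) -> bytearray:
        # OR the upper half onto the lower half: bit p lands on p % half.
        half = self.num_bits // 2
        return bytearray(a | b for a, b in zip(self.bits[:half], self.bits[half:]))

    def estimated_fpp(self) -> float:
        # Standard approximation: (fraction of set bits) ** num_hashes.
        return (sum(self.bits) / self.num_bits) ** self.num_hashes

    def fold_to_fit(self, target_fpp: float, min_bits: int = 64) -> None:
        # Keep halving while the folded filter would still meet the target.
        while self.num_bits > min_bits:
            folded = self._folded_bits()
            fpp = (sum(folded) / len(folded)) ** self.num_hashes
            if fpp > target_fpp:
                break
            self.bits = folded
            self.num_bits = len(folded)


if __name__ == "__main__":
    bloom = FoldableBloom(num_bits=1 << 16)  # start deliberately large
    for i in range(1000):
        bloom.add(f"trace-{i}".encode())
    bloom.fold_to_fit(target_fpp=0.01)  # shrink once the data is known
    print(bloom.num_bits, bloom.estimated_fpp())
```

Folding only ever merges bits, so it cannot introduce false negatives; the cost is a higher fill ratio, which is why the sketch stops before the estimated false-positive rate exceeds the target.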
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Observability | 1 | 4,496 | 812 | 176 | +40% |