
Smarter bloom filters in Parquet: how filter folding speeds up point lookups in Logfire

Blog post from Pydantic

Post Details
Company: Pydantic
Author: -
Word Count: 954
Company Posts That Month: 13
Language: English
Hacker News Points: -
Summary

Logfire has made its Parquet bloom filters more efficient for point lookup queries, such as finding a specific trace or filtering spans by name, by contributing an enhancement to Apache Arrow. The technique, called "bloom filter folding," starts with a large filter and shrinks it after the data has been written, so the filter fits the data without losing effectiveness and no size estimate is needed up front. This addresses the difficulty of sizing filters for high-cardinality columns: a filter that is too large wastes space, while one that is too small saturates and returns too many false positives. The result is faster queries and smaller files. Because the enhancement has been integrated into Apache Arrow, it benefits every project that uses the Rust Parquet implementation, and other Parquet implementations such as Java could adopt it. Logfire can now perform quicker point lookups with minimal overhead, using folded bloom filters that adjust automatically to the data's actual characteristics.
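
To make the folding idea concrete, here is a minimal sketch in Python. It is not the Apache Arrow implementation and does not use Parquet's actual bloom filter layout. It assumes a plain bit-array filter whose size is a power of two and whose bit positions are taken modulo that size, which is what makes halving safe. The names `FoldableBloom` and `fold_to_fit`, the SHA-256 based hashing, and the `max_fill` stopping rule are all hypothetical choices made for illustration.

```python
import hashlib


class FoldableBloom:
    """Plain Bloom filter with a power-of-two bit count (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        if num_bits < 2 or num_bits & (num_bits - 1):
            raise ValueError("num_bits must be a power of two")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: bytes):
        # Double hashing from one digest; an assumed scheme, not Parquet's.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

    def fill_ratio(self) -> float:
        return sum(self.bits) / self.num_bits


def fold_to_fit(bloom: FoldableBloom, max_fill: float = 0.5, min_bits: int = 64):
    """Halve the filter while the folded version stays at or below max_fill.

    Folding ORs the upper half onto the lower half. Because positions are
    computed modulo a power of two, x % (m // 2) equals (x % m) % (m // 2),
    so every bit set before the fold is still found afterwards and no
    false negatives are introduced.
    """
    while bloom.num_bits // 2 >= min_bits:
        half = bloom.num_bits // 2
        folded = [bloom.bits[i] or bloom.bits[i + half] for i in range(half)]
        if sum(folded) / half > max_fill:
            break  # one more fold would saturate the filter
        bloom.bits, bloom.num_bits = folded, half
    return bloom


# Start oversized, write the data, then shrink to fit what was actually seen.
bloom = FoldableBloom(1 << 16)
for i in range(100):
    bloom.add(f"trace-{i}".encode())
fold_to_fit(bloom)
print(bloom.num_bits, round(bloom.fill_ratio(), 3))
```

The stopping rule here is a simple fill-ratio cap. A real writer would more plausibly stop at a target false positive probability, but the shape of the loop is the same: size generously, write, then fold until the next fold would hurt.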

Trends Found in this Post
Trend          Post Mentions  Total Month Mentions  Posts  Companies  MoM
Observability  1              4,496                 812    176        +40%