From ts_rank to BM25. Introducing pg_textsearch: True BM25 Ranking and Hybrid Retrieval Inside Postgres
Blog post from Tiger Data
pg_textsearch is a new PostgreSQL extension for AI-native applications. It brings BM25 ranking to Postgres so that keyword search and vector search can run in a single database. The extension addresses the limitations of Postgres' native full-text search ranking (ts_rank) with three BM25 features: inverse document frequency weighting, term frequency saturation, and length normalization. This matters most for systems such as Retrieval-Augmented Generation (RAG) pipelines and chat agents, which depend on precise, contextually relevant retrieval. Because it runs inside Postgres, the extension supports hybrid search that combines the conceptual similarity of vector search with the precision of keyword matching. The preview release focuses on a memtable layer for fast in-memory operations; disk-based segments and advanced query optimizations are planned for future releases.
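
To make the three BM25 ingredients concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not pg_textsearch's implementation: the function name `bm25_scores`, the toy documents, the parameter values `k1=1.2` and `b=0.75` (common defaults in the literature), and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms (textbook BM25).

    Hypothetical helper for illustration only; k1 and b are assumed defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: documents longer than average are penalized.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Inverse document frequency: rarer terms carry more weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation: repeated terms give diminishing returns.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "postgres full text search".split(),
    "postgres postgres postgres postgres".split(),
    "vector search for rag".split(),
]
scores = bm25_scores(["postgres", "rag"], docs)
```

In this toy corpus the second document repeats "postgres" four times but scores well under four times the first document, which is the saturation effect. The third document ranks highest because "rag" appears in only one document and therefore has the largest IDF.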