Re-autoresearching MSMARCO BM25, on Vespa

Post Details

Company

Vespa

Date Published

May 29, 2026

Author

Andreas Eriksen

Word Count

2,338

Company Posts That Month

2

Language

English

Hacker News Points

-

Post removed?

No

Source URL

blog.vespa.ai/re-autoresearching-msmarco-bm25-on-vespa

Summary

Interest in the BM25 retrieval algorithm has surged: Google searches for it are rising, and OpenAI models frequently reference it in retrieval prompts. The post welcomes this renewed focus on lexical search, particularly in settings where dense embedding models struggle. An autoresearch experiment by Doug Turnbull demonstrated that a Python reranker could improve BM25 results; Vespa engineers set out to replicate it with their own twist, using only existing Vespa rank features. By applying techniques such as aggressive stopword filtering, proximity scoring, and early field matching, the Vespa approach substantially improved retrieval performance on the MSMARCO passage-ranking benchmark, with gains that held up particularly well when generalizing to larger datasets. The experiment points to further room for optimization in lexical search through a blend of manual tuning and machine learning, underscoring the enduring relevance of BM25 in information retrieval.
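For readers unfamiliar with the baseline, the following is a minimal plain-Python sketch of BM25 scoring combined with stopword filtering, one of the techniques named in the summary. It is not the Vespa implementation or the reranker from the post: the stopword list, the whitespace tokenizer, the function names, and the `k1`/`b` defaults are all assumptions chosen for illustration, and the IDF uses the common `ln(1 + (N - n + 0.5) / (n + 0.5))` variant.

```python
import math
from collections import Counter

# Hypothetical stopword list for illustration; not the list used in the post.
STOPWORDS = {"the", "a", "an", "of", "is", "in", "to", "what"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (k1 and b are assumed defaults)."""
    if not docs:
        return []
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(doc_tokens)
    avg_len = sum(len(tokens) for tokens in doc_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase the cat", "stock markets fell"]
scores = bm25_scores("what is a cat", docs)
```

With stopwords removed, the query reduces to the single term `cat`, so only the first two documents receive a non-zero score, and the shorter of the two ranks higher because of length normalization. Proximity scoring and early field matching, also mentioned in the summary, are not covered by this sketch.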

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	10	9,074	1,640	224	+53%
Vector Search	3	2,268	422	128	+30%
AI Coding Assistant	1	1,798	527	167	+21%
RAG	1	2,105	333	83	+124%
