Company: -
Date Published: -
Author: Frank Liu, Kenneth C. Enevoldsen, Roman Solomatin, Isaac Chung, Tom Aarsen, and Zoltán Fődi
Word count: 2833
Language: -
Hacker News points: None

Summary

The Retrieval Embedding Benchmark (RTEB) introduces a new standard for evaluating the retrieval accuracy of embedding models, addressing limitations of existing benchmarks. RTEB employs a hybrid strategy that combines open and private datasets to provide a fair and transparent measure of how well models generalize to unseen data: since models cannot have been tuned on the private sets, a large gap between open and private scores reveals overfitting rather than genuine retrieval ability, which in turn encourages more robust model development. The benchmark is designed around real-world applications, covering 20 languages and critical enterprise domains such as law, healthcare, code, and finance. Its datasets are large enough to be meaningful without making evaluation overly cumbersome, and it uses NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) as the default metric for assessing search-result quality. RTEB is multilingual and domain-specific, supports enterprise use cases, and invites community contributions to expand language coverage and dataset variety.
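For readers unfamiliar with the metric, the sketch below shows one common way NDCG@10 is computed for a single query; the relevance labels, variable names, and helper functions are illustrative assumptions, not taken from RTEB's evaluation code.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: graded relevance discounted by log2 of the rank.
    return sum(rel / math.log2(rank + 2)   # ranks are 0-based, hence +2
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of an ideal ranking (labels sorted best-first).
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels for one query, listed in the order
# the embedding model ranked the retrieved documents.
ranked_labels = [3, 2, 0, 1, 0, 0, 2, 0, 0, 0]
print(f"NDCG@10 = {ndcg_at_k(ranked_labels):.4f}")
```

In benchmarks of this kind, the per-query NDCG@10 values are typically averaged over all queries in a dataset to produce the reported score.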