Company: -
Date Published: -
Author: Frank Liu, Kenneth C. Enevoldsen, Roman Solomatin, Isaac Chung, Tom Aarsen, and Zoltán Fődi
Word count: 2833
Language: -
Hacker News points: None

Summary

The Retrieval Embedding Benchmark (RTEB) introduces a new standard for evaluating the retrieval accuracy of embedding models, addressing limitations of existing benchmarks. RTEB employs a hybrid strategy that combines open and private datasets to provide a fair and transparent measure of how well models generalize to unseen data: since models cannot have been tuned on the private sets, a large gap between open and private scores reveals overfitting rather than genuine retrieval ability, which in turn encourages more robust model development. The benchmark is designed around real-world applications, covering 20 languages and critical enterprise domains such as law, healthcare, code, and finance. Its datasets are large enough to be meaningful without making evaluation overly cumbersome, and it uses NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) as the default metric for assessing search-result quality. RTEB is multilingual and domain-specific, supports enterprise use cases, and invites community contributions to expand language coverage and dataset variety.
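For readers unfamiliar with the metric, the sketch below shows one common way NDCG@10 is computed for a single query; the relevance labels, variable names, and helper functions are illustrative assumptions, not taken from RTEB's evaluation code.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: graded relevance discounted by log2 of the rank.
    return sum(rel / math.log2(rank + 2)   # ranks are 0-based, hence +2
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of an ideal ranking (labels sorted best-first).
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels for one query, listed in the order
# the embedding model ranked the retrieved documents.
ranked_labels = [3, 2, 0, 1, 0, 0, 2, 0, 0, 0]
print(f"NDCG@10 = {ndcg_at_k(ranked_labels):.4f}")
```

In benchmarks of this kind, the per-query NDCG@10 values are typically averaged over all queries in a dataset to produce the reported score.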