Author
Yauhen Babakhin, Radek Osmulski, Ronay Ak, Gabriel de Souza Pereira Moreira, and Mengyao Xu

Summary

NVIDIA's Llama-Embed-Nemotron-8B is a text embedding model that has achieved top performance on the multilingual MTEB leaderboard, which spans tasks across 1,038 languages. Built by fine-tuning the Llama-3.1-8B foundation model, it addresses the shortcomings of traditional multilingual models through cross-lingual representation learning, producing consistent, high-fidelity embeddings. The model has 7.5 billion parameters and uses bi-directional self-attention for stronger semantic understanding. It employs a bi-encoder architecture trained with contrastive learning to optimize semantic search, using a mix of 16 million data pairs drawn from public and synthetic datasets. By generating unified embeddings across diverse languages, Llama-Embed-Nemotron-8B supports cross-language retrieval systems and semantic similarity tasks, making it a practical foundation for inclusive multilingual applications.
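The retrieval pattern the summary describes can be sketched in a few lines: a bi-encoder embeds the query and each document independently, and ranking reduces to cosine similarity between the resulting vectors. The toy 3-dimensional vectors below are placeholders standing in for real model output (actual Llama-Embed-Nemotron-8B embeddings are far higher-dimensional); this is an illustration of the scoring step, not the model's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy embeddings standing in for bi-encoder output (hypothetical values).
query = [0.9, 0.1, 0.0]
docs = [
    [0.0, 1.0, 0.0],   # unrelated document
    [0.8, 0.2, 0.1],   # semantically close to the query
    [0.1, 0.0, 1.0],   # unrelated document
]
print(rank_by_similarity(query, docs))  # → [1, 0, 2]
```

Because the encoder maps different languages into one shared vector space, the same ranking function works unchanged for cross-language retrieval: a query in one language scores documents embedded from any other.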