Baseten has introduced Baseten Embeddings Inference (BEI) for Blackwell GPUs, letting users run the latest open-source embedding models, such as Qwen3 Embedding, on NVIDIA's newest GPUs with optimal performance. The Qwen3 Embedding 8B model, known for its multilingual and reasoning capabilities, currently ranks first on the Massive Text Embedding Benchmark (MTEB) multilingual leaderboard with a mean task score of 70.58. Benchmarks show that BEI on B200 GPUs delivers significant performance advantages: it processes 1.5 times more tokens per second than the next-best solution and achieves 3.3 to 8.4 times higher throughput than other systems in both high and low query-throughput tests. For those interested in deploying Qwen3 Embedding 8B or optimizing AI workloads, Baseten offers a Model Library and additional resources, including a technical deep dive and documentation.
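
To make the deployment path concrete, below is a minimal sketch of how a BEI deployment could be queried once Qwen3 Embedding 8B is running, assuming an OpenAI-compatible embeddings endpoint. The base URL and model identifier are placeholders, not values confirmed by this announcement; substitute the endpoint and model name from your own Baseten deployment.

```python
# Sketch: querying a BEI deployment via an OpenAI-compatible embeddings API.
# The base_url and model id below are placeholders -- replace them with the
# values shown for your own Baseten deployment.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # Baseten API key from your account
    base_url="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",  # placeholder URL
)

response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",  # assumed model identifier; check your deployment's config
    input=[
        "Baseten Embeddings Inference runs open-source embedding models on Blackwell GPUs.",
        "Qwen3 Embedding 8B ranks first on the MTEB multilingual leaderboard.",
    ],
)

# Each item carries the embedding vector for the corresponding input string.
for item in response.data:
    print(f"embedding {item.index}: {len(item.embedding)} dimensions")
```

Because the endpoint follows the OpenAI embeddings format, existing client code can typically be pointed at the deployment by changing only the API key and base URL.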