Your client code matters: 12x higher embedding throughput with Python and Rust

Company

Baseten

Date Published

June 13, 2025

Author

Michael Feil

Word count

1280

Language

English

Hacker News points

None

URL

www.baseten.co/blog/your-client-code-matters-10x-higher-embedding-throughput-with-python-and-rust

Summary

The Baseten Performance Client is an open-source Python library that raises throughput for high-volume embedding workloads. It releases the Global Interpreter Lock (GIL) during network-bound work, so requests execute in true parallel and the client can use multiple CPU cores. Under heavy load this lowers latency, and at extreme scale it delivers a 12x speedup over the standard AsyncOpenAI client. The client is compatible with OpenAI and other inference providers, integrates easily into existing codebases, and supports both synchronous and asynchronous usage, which suits workloads such as embedding large datasets or serving thousands of embedding queries in parallel.
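The idea of overlapping network-bound requests can be illustrated with a minimal standard-library sketch. This is not the Baseten Performance Client's API: `fake_embed_batch`, `chunk`, and `embed_all` are hypothetical names, and the "request" is simulated with `time.sleep`, which releases the GIL while it waits much as a blocking socket read does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_embed_batch(batch):
    """Hypothetical stand-in for one HTTP embedding request (not a real API)."""
    time.sleep(0.05)  # blocking wait; the GIL is released while sleeping
    # Return one dummy 1-dimensional "embedding" per input text.
    return [[float(len(text))] for text in batch]


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_all(texts, batch_size=4, max_workers=8):
    """Send all batches concurrently and return embeddings in input order."""
    batches = chunk(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order the batches were submitted.
        results = list(pool.map(fake_embed_batch, batches))
    return [vector for batch in results for vector in batch]


if __name__ == "__main__":
    texts = ["x" * i for i in range(32)]
    start = time.perf_counter()
    vectors = embed_all(texts)
    elapsed = time.perf_counter() - start
    print(f"{len(vectors)} embeddings in {elapsed:.2f}s")
```

With 32 texts and a batch size of 4, the sketch issues 8 simulated requests that wait concurrently rather than one after another. A real client would replace `fake_embed_batch` with an actual HTTP call, and the speedup it achieves depends on the server, the network, and how much per-request work still holds the GIL.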