
Your client code matters: 12x higher embedding throughput with Python and Rust

Blog post from Baseten

Post Details
Company: Baseten
Author: Michael Feil
Word Count: 1,280
Language: English
Summary

The Baseten Performance Client is an open-source Python library, built on Rust, that raises throughput for high-volume embedding workloads. It releases the Global Interpreter Lock (GIL) while requests are waiting on the network, so requests execute in true parallel and the work spreads across multiple CPU cores. Under heavy load this lowers latency, and at extreme scale the client is 12x faster than the standard AsyncOpenAI client. The client is compatible with OpenAI and other inference providers, integrates into existing codebases, and supports both synchronous and asynchronous usage. Typical use cases include embedding large datasets and serving thousands of embedding queries in parallel.
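To make the idea of parallel request execution concrete, here is a minimal sketch in plain Python. It does not use the Performance Client or its API. `fake_embed_batch` is a hypothetical stand-in for one embedding request, and `embed_all`, `batch_size`, and `max_concurrent_requests` are illustrative names chosen for this example. The sketch relies on the fact that `time.sleep`, like blocking network I/O, releases the GIL, so several threads can wait at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_embed_batch(batch):
    """Hypothetical stand-in for one embedding request (not a real API)."""
    time.sleep(0.05)  # blocking wait; time.sleep releases the GIL
    # Return one toy "embedding" per text: a 1-d vector holding its length.
    return [[float(len(text))] for text in batch]


def embed_all(texts, batch_size=4, max_concurrent_requests=8):
    """Split texts into batches and send the batches concurrently."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        # map() yields results in input order, so embeddings line up with texts.
        results = pool.map(fake_embed_batch, batches)
        return [vector for batch_result in results for vector in batch_result]


texts = [f"document {i}" for i in range(32)]
start = time.perf_counter()
embeddings = embed_all(texts)
elapsed = time.perf_counter() - start
print(f"embedded {len(embeddings)} texts in {elapsed:.2f}s")
```

With 8 batches and 8 workers, the waits overlap instead of running back to back. The sketch only covers the waiting phase: any Python-level work around each request still contends for the GIL, which is the overhead the summary says the client avoids.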