| Title | Author(s) | Date | Word count |
| --- | --- | --- | --- |
| Driving model performance optimization: 2024 highlights | Pankaj Gupta | Jan 14, 2025 | 1530 |
| Private, secure DeepSeek-R1 in production in US & EU data centers | Amir Haghighat, Philip Kiely | Feb 11, 2025 | 1274 |
| Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud | Pankaj Gupta, Philip Kiely | Feb 11, 2025 | 1033 |
| Baseten Chains is now GA for production compound AI systems | Marius Killinger, Tyron Jung, Rachel Rapp | Feb 12, 2025 | 1123 |
| How multi-node inference works for massive LLMs like DeepSeek-R1 | Phil Howes, Philip Kiely | Feb 15, 2025 | 1303 |
| Announcing Baseten’s $75M Series C | Tuhin Srivastava | Feb 26, 2025 | 739 |
| How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM | Michael Feil, Philip Kiely | Mar 28, 2025 | 2035 |
| Introducing Baseten Embeddings Inference: The fastest embeddings solution available | Michael Feil, Rachel Rapp | Mar 28, 2025 | 782 |
| The best open-source embedding models | Philip Kiely | Apr 07, 2025 | 1254 |
| Building performant embedding workflows with Chroma and Baseten | Philip Kiely | Apr 11, 2025 | 570 |
| Accelerating inference with NVIDIA B200 GPUs | Philip Kiely | Apr 23, 2025 | 857 |
| Canopy Labs selects Baseten as preferred inference provider for Orpheus TTS models | Philip Kiely | May 07, 2025 | 1350 |
| Introducing Model APIs and Training | | May 24, 2025 | 525 |
| Introducing our new brand | | May 25, 2025 | 258 |
| Day zero benchmarks for Qwen 3 with SGLang on Baseten | Yineng Zhang | May 19, 2025 | 1303 |
| How Baseten multi-cloud capacity management (MCM) unifies deployments | Rachel Rapp | Jun 10, 2025 | 935 |
| Forward deployed engineering on the frontier of AI | Vlad Shulman | Jun 11, 2025 | 2108 |
| Your client code matters: 12x higher embedding throughput with Python and Rust | Michael Feil | Jun 13, 2025 | 1280 |