51 blog posts published by month since the start of 2025.

Posts year-to-date: 51 (54 posts by this month last year)
Average posts per month since 2025: 4.3
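
The 4.3 average is consistent with dividing the 51 posts by 12 months. The sketch below reproduces that arithmetic. It assumes the window runs from January through December 2025 and that the page rounds half-up to one decimal place; neither is stated by the source, and the function name `average_per_month` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def average_per_month(total_posts: int, months_elapsed: int) -> Decimal:
    """Mean posts per month, rounded half-up to one decimal place."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    mean = Decimal(total_posts) / Decimal(months_elapsed)
    # Decimal avoids the round-half-to-even behavior of the built-in round(),
    # which would turn 4.25 into 4.2 rather than 4.3.
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


posts_since_2025 = 51  # taken from the summary figures
months_elapsed = 12    # assumption: January through December 2025

print(average_per_month(posts_since_2025, months_elapsed))  # prints 4.3
```

The exact quotient is 4.25, so the displayed figure depends on the rounding rule; a different window length would also change the result.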

Post details (2025 to today)

| Title | Author | Date | Word count | HN points |
|---|---|---|---|---|
| Driving model performance optimization: 2024 highlights | Pankaj Gupta | Jan 14, 2025 | 1530 | - |
| Private, secure DeepSeek-R1 in production in US & EU data centers | Amir Haghighat, Philip Kiely | Feb 11, 2025 | 1274 | - |
| Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud | Pankaj Gupta, Philip Kiely | Feb 11, 2025 | 1033 | - |
| Baseten Chains is now GA for production compound AI systems | Marius Killinger, Tyron Jung, Rachel Rapp | Feb 12, 2025 | 1123 | - |
| How multi-node inference works for massive LLMs like DeepSeek-R1 | Phil Howes, Philip Kiely | Feb 15, 2025 | 1303 | - |
| Announcing Baseten’s $75M Series C | Tuhin Srivastava | Feb 26, 2025 | 739 | - |
| How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM | Michael Feil, Philip Kiely | Mar 28, 2025 | 2035 | - |
| Introducing Baseten Embeddings Inference: The fastest embeddings solution available | Michael Feil, Rachel Rapp | Mar 28, 2025 | 782 | - |
| The best open-source embedding models | Philip Kiely | Apr 07, 2025 | 1254 | - |
| Building performant embedding workflows with Chroma and Baseten | Philip Kiely | Apr 11, 2025 | 570 | - |
| Accelerating inference with NVIDIA B200 GPUs | Philip Kiely | Apr 23, 2025 | 857 | - |
| Canopy Labs selects Baseten as preferred inference provider for Orpheus TTS models | Philip Kiely | May 07, 2025 | 1350 | - |
| Introducing Model APIs and Training | - | May 24, 2025 | 525 | - |
| Introducing our new brand | - | May 25, 2025 | 258 | - |
| Day zero benchmarks for Qwen 3 with SGLang on Baseten | Yineng Zhang | May 19, 2025 | 1303 | - |
| How Baseten multi-cloud capacity management (MCM) unifies deployments | Rachel Rapp | Jun 10, 2025 | 935 | - |
| Forward deployed engineering on the frontier of AI | Vlad Shulman | Jun 11, 2025 | 2108 | - |
| Your client code matters: 12x higher embedding throughput with Python and Rust | Michael Feil | Jun 13, 2025 | 1280 | - |
| Understanding Voxtral vs. Whisper: Build a Voice-Controlled Smart Home App | Alex Ker 1 other | Jul 24, 2025 | 901 | - |
| Joey Zwicker joins Baseten as Head of FDE | Tuhin Srivastava | Aug 11, 2025 | 907 | - |
| Building reliable AI agents | Alex Ker | Jul 22, 2025 | 1105 | - |
| AI inference explained: The hidden process behind every prediction | Madison Kanna | Jul 01, 2025 | 1212 | - |
| Kimi K2 Explained: The 1 Trillion Parameter Model Redefining How to Build Agents | Alex Ker 1 other | Aug 05, 2025 | 748 | - |
| How we built BEI: high-throughput embedding, reranker, and classifier inference | Amir Haghighat 4 others | Jul 14, 2025 | 2111 | - |
| Zero to real-time text-to-speech: The complete Orpheus + WebSockets tutorial | Alex Ker | Aug 08, 2025 | 991 | - |
| Run Qwen3 Embedding on NVIDIA Blackwell GPUs | Amir Haghighat 4 others | Aug 04, 2025 | 345 | - |
| Zero to real-time transcription: The complete Whisper V3 streaming tutorial | Alex Ker | Aug 05, 2025 | 971 | - |
| How we built Multi-cloud Capacity Management (MCM) | William Lau 3 others | Jun 24, 2025 | 1914 | - |
| How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs | Amir Haghighat 4 others | Aug 07, 2025 | 938 | - |
| From Prompt to Production: Baseten Inference in Your IDE with Cline | Alex Ker | Aug 13, 2025 | 568 | - |
| How to fine-tune gpt-oss-120b with Baseten and Axolotl | Sanskriti Sharma 2 others | Aug 19, 2025 | 1083 | - |
| Welcoming Dannie Herzberg to Baseten | Tuhin Srivastava | Aug 27, 2025 | 286 | - |
| HTTP vs. WebSockets vs. gRPC for AI model inference | Madison Kanna | Aug 29, 2025 | 635 | - |
| How Baseten MCM, our cloud ecosystem partners, and NVIDIA drive fast, reliable inference at scale | Marylise Tauzia 2 others | Sep 03, 2025 | 583 | - |
| Announcing Baseten’s $150M Series D | Tuhin Srivastava | Sep 05, 2025 | 1069 | - |
| Building the future of AI infrastructure: Q&A with Baseten Co-founder Amir Haghighat | Madison Kanna | Sep 16, 2025 | 1268 | - |
| Making Zed fast: A conversation with Richard Feldman | Madison Kanna | Sep 24, 2025 | 1183 | - |
| Delivering GenAI solutions for healthcare with Baseten and Vultr | Philip Kiely | Oct 02, 2025 | 823 | - |
| Baseten brings AI video to life on Nebius | Mike Bilodeau | Oct 06, 2025 | 867 | - |
| Building AI Agents, Open Code And Open Source: A Conversation with Dax | Madison Kanna | Oct 10, 2025 | 2827 | - |
| From Sketch to 3D Model: Building a flower card generator with open source AI | Alex Ker | Oct 11, 2025 | 1457 | - |
| How Baseten achieved 2x faster inference with NVIDIA Dynamo | Abu Qader 2 others | Oct 17, 2025 | 904 | - |
| How we made the fastest GPT-OSS on NVIDIA GPUs 60% faster | Tri Dao 2 others | Oct 24, 2025 | 1188 | - |
| DeepSeek-OCR and the Unreasonable Usefulness of Compression | Alex Ker 1 other | Oct 24, 2025 | 988 | - |
| High-performance agents for financial services with NVIDIA Nemotron on Baseten | Philip Kiely | Oct 28, 2025 | 871 | - |
| Train AI Models When You Want. Deploy on Ultra Performant Infrastructure. Baseten Training Is GA. | Raymond Cano 1 other | Oct 30, 2025 | 922 | - |
| Tool Calling in Inference | Kenzie Amack 1 other | Nov 06, 2025 | 2368 | - |
| Kimi K2 Thinking at 140+ TPS on NVIDIA Blackwell | Abu Qader 2 others | Nov 12, 2025 | 1520 | - |
| Enterprise vision intelligence with Mistral AI and Baseten | Philip Kiely | Dec 02, 2025 | 735 | - |
| DeepSeek V3.2's path to GPT-5-level performance: sparse attention, RL at scale, and context reuse | Alex Ker | Dec 05, 2025 | 1298 | - |
| Parsed + Baseten: Building Models That Touch Grass | Mudith Jayasekara 3 others | Dec 11, 2025 | 1482 | - |