| Title | Author(s) | Date | Words | Reactions |
| --- | --- | --- | --- | --- |
| New in October: Find community with The DSC | Baseten | 2022-10-31 | 408 | -- |
| New in May 2022: Off-site but on-track | Baseten | 2022-05-26 | 432 | -- |
| Introducing Baseten Self-hosted | Anupreet Walia, Rachel Rapp | 2024-08-08 | 670 | -- |
| Four ML models that accelerate content creation | Philip Kiely | 2022-06-02 | 945 | -- |
| New in December 2021 | Emmiliese von Avis | 2022-01-07 | 494 | -- |
| Deploying and using Stable Diffusion XL 1.0 | Philip Kiely | 2023-07-26 | 286 | -- |
| How to serve your ComfyUI model behind an API endpoint | Het Trivedi, Philip Kiely | 2023-12-08 | 1,326 | -- |
| New in July: A seamless bridge from model development to deployment | Baseten | 2022-07-29 | 414 | -- |
| Baseten achieves SOC 2 Type II certification | Baseten | 2023-03-08 | 282 | -- |
| New in January 2023 | Baseten | 2023-01-31 | 538 | -- |
| AudioGen: deploy and build today! | Jesse Mostipak | 2023-08-04 | 340 | -- |
| Open source alternatives for machine learning models | Varun Shenoy, Philip Kiely | 2023-11-21 | 1,207 | -- |
| A guide to LLM inference and performance | Varun Shenoy, Philip Kiely | 2023-11-17 | 3,038 | 113 |
| New in July 2023 | Baseten | 2023-08-02 | 514 | -- |
| Three techniques to adapt LLMs for any use case | Philip Kiely | 2023-06-15 | 983 | -- |
| StartupML AMA: Nikhil Harithas | Derek Kim | 2022-08-09 | 1,774 | -- |
| New in June 2023 | Baseten | 2023-06-29 | 424 | -- |
| Build with OpenAI’s Whisper model in five minutes | Justin Yi | 2022-10-18 | 712 | -- |
| Go from machine learning models to full-stack applications | Tuhin Srivastava | 2022-05-03 | 1,026 | -- |
| How we achieved SOC 2 and HIPAA compliance as an early-stage company | Baseten | 2023-03-08 | 673 | -- |
| How to benchmark image generation models like Stable Diffusion XL | Philip Kiely | 2024-01-31 | 1,374 | -- |
| Comparing tokens per second across LLMs | Philip Kiely | 2024-05-09 | 769 | -- |
| What I learned from my AI startup’s internal hackathon | Julien Reiman | 2023-06-12 | 719 | -- |
| New in August: Deploy, deploy, deploy | Baseten | 2022-08-31 | 430 | -- |
| How latent consistency models work | Rachel Rapp | 2024-06-04 | 1,140 | -- |
| New in August 2023 | Baseten | 2023-08-31 | 591 | -- |
| Comparing NVIDIA GPUs for AI: T4 vs A10 | Philip Kiely | 2023-04-27 | 1,604 | -- |
| Unlocking the full power of NVIDIA H100 GPUs for ML inference with … | Pankaj Gupta, Philip Kiely | 2024-02-06 | 1,623 | -- |
| Deploy Falcon-40B on Baseten | Sid Shanker | 2023-06-09 | 794 | -- |
| New in February 2024 | Baseten | 2024-02-29 | 634 | -- |
| StartupML AMA: Daniel Whitenack | Derek Kim | 2022-08-30 | 1,706 | -- |
| How to choose the right instance size for your ML models | Philip Kiely | 2023-01-18 | 597 | -- |
| How to serve 10,000 fine-tuned LLMs from a single GPU | Pankaj Gupta, Philip Kiely | 2024-07-23 | 1,895 | -- |
| New in September 2023 | Baseten | 2023-09-29 | 605 | -- |
| Streaming real-time text to speech with XTTS V2 | Het Trivedi, Philip Kiely | 2024-04-18 | 1,318 | -- |
| Continuous vs dynamic batching for AI inference | Matt Howard, Philip Kiely | 2024-04-05 | 1,350 | -- |
| Models We Love: June 2023 | Baseten | 2023-07-06 | 1,498 | -- |
| High performance ML inference with NVIDIA TensorRT | Justin Yi, Philip Kiely | 2024-03-12 | 1,076 | -- |
| Why we built and open-sourced a model serving solution | Phil Howes | 2022-08-05 | 1,030 | -- |
| NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference | Philip Kiely | 2023-09-15 | 1,636 | -- |
| New in September: Increasing flexibility and robustness | Baseten | 2022-09-29 | 461 | -- |
| Baseten achieves SOC 2 Type 1 certification | Baseten | 2022-03-16 | 280 | -- |
| FP8: Efficient model inference with 8-bit floating point numbers | Pankaj Gupta, Philip Kiely | 2024-03-07 | 1,021 | 2 |
| Deployment and inference for open source text embedding models | Philip Kiely | 2023-11-02 | 1,706 | -- |
| The best open source large language model | Philip Kiely | 2024-02-09 | 1,920 | -- |
| New in January 2024 | Baseten | 2024-01-31 | 580 | -- |
| How to deploy Stable Diffusion using Truss | Abu Qader | 2022-09-01 | 1,038 | -- |
| Deploy open-source models in a couple clicks from Baseten’s model library | Emmiliese von Avis | 2023-06-08 | 888 | -- |
| Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation | Philip Kiely | 2023-12-13 | 1,075 | -- |
| Using fractional H100 GPUs for efficient model serving | Matt Howard, Vlad Shulman, Pankaj Gupta, Philip Kiely | 2024-03-28 | 1,086 | -- |
| Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s … | Philip Kiely | 2023-10-27 | 547 | -- |
| Accelerating model deployment: 100X faster dev loops with development deployments | Baseten | 2022-12-08 | 810 | -- |
| 40% faster Stable Diffusion XL inference with NVIDIA TensorRT | Pankaj Gupta, Justin Yi, Philip Kiely | 2024-02-22 | 2,403 | -- |
| New in June: Full-stack superpowers | Baseten | 2022-06-30 | 463 | -- |
| Ten reasons to join Baseten | Dustin Michaels, Philip Kiely | 2024-07-25 | 1,230 | -- |
| Why GPU utilization matters for model inference | Marius Killinger, Philip Kiely | 2024-02-20 | 816 | -- |
| New in March 2024 | Baseten | 2024-03-28 | 553 | -- |
| Build your own open-source ChatGPT with Llama 2 and Chainlit | Philip Kiely | 2023-08-23 | 1,061 | -- |
| Designing parental leave at an early stage startup | Paige Pauli | 2022-02-02 | 844 | -- |
| SDXL inference in under 2 seconds: the ultimate guide to Stable Diffusion … | Varun Shenoy, Philip Kiely | 2023-08-30 | 1,352 | -- |
| A checklist for switching to open source ML models | Philip Kiely | 2023-11-21 | 482 | -- |
| New in May 2023 | Baseten | 2023-06-02 | 384 | -- |
| Baseten announces HIPAA compliance | Baseten | 2023-03-28 | 167 | -- |
| Compound AI systems explained | Rachel Rapp | 2024-08-06 | 1,338 | -- |
| What I learned as a forward-deployed engineer working at an AI startup | Het Trivedi | 2024-05-31 | 1,353 | -- |
| Introducing Baseten Chains | Bola Malek, Marius Killinger, Sid Shanker, Rachel Rapp, Mike Bilodeau | 2024-06-27 | 1,132 | 9 |
| The benefits of globally distributed infrastructure for model serving | Phil Howes, Philip Kiely | 2024-03-01 | 603 | -- |
| Technical deep dive: Truss live reload | Pankaj Gupta | 2023-02-17 | 1,852 | -- |
| 33% faster LLM inference with FP8 quantization | Pankaj Gupta, Philip Kiely | 2024-03-14 | 1,876 | -- |
| Using asynchronous inference in production | Samiksha Pal, Helen Yang, Rachel Rapp | 2024-07-11 | 950 | -- |
| Introduction to quantizing ML models | Abu Qader, Philip Kiely | 2024-01-31 | 1,679 | 1 |
| Understanding NVIDIA’s Datacenter GPU line | Philip Kiely | 2023-05-23 | 708 | -- |
| New in April 2024 | Baseten | 2024-05-01 | 552 | -- |
| Benchmarking fast Mistral 7B inference | Abu Qader, Pankaj Gupta, Justin Yi, Philip Kiely | 2024-03-14 | 1,571 | -- |
| Comparing GPUs across architectures and tiers | Philip Kiely | 2023-05-22 | 765 | -- |
| SPC hackathon winners build with Llama 3.1 on Baseten | Philip Kiely | 2024-08-16 | 615 | -- |
| Understanding performance benchmarks for LLM inference | Philip Kiely | 2024-01-12 | 1,459 | -- |
| New in December 2023 | Baseten | 2023-12-27 | 553 | -- |
| Pinning ML model revisions for compatibility and security | Philip Kiely | 2023-11-09 | 564 | -- |
| Comparing few-step image generation models | Rachel Rapp | 2024-06-14 | 1,087 | -- |
| Choosing the right horizontal scaling setup for high-traffic models | Philip Kiely | 2023-01-19 | 628 | -- |
| Models We Love: July 2023 | Baseten | 2023-07-26 | 1,831 | -- |
| Faster Mixtral inference with TensorRT-LLM and quantization | Pankaj Gupta, Timur Abishev, Philip Kiely | 2023-12-22 | 1,467 | 2 |
| NVIDIA A10 vs A10G for ML model inference | Philip Kiely | 2023-11-28 | 1,056 | -- |
| Stable Video Diffusion now available | Sid Shanker, Varun Shenoy | 2023-11-22 | 324 | -- |
| Serving four million Riffusion requests in two days | Phil Howes | 2022-12-21 | 757 | -- |
| Announcing our Series A | Tuhin Srivastava | 2022-04-26 | 727 | -- |
| Create an API endpoint for an ML model | Philip Kiely | 2022-04-22 | 339 | -- |
| New in October 2023 | Baseten | 2023-10-31 | 497 | -- |
| Introducing automatic LLM optimization with TensorRT-LLM Engine Builder | Abu Qader, Philip Kiely | 2024-08-01 | 939 | 2 |
| New in March 2023 | Baseten | 2023-03-31 | 359 | -- |
| Deploying custom ComfyUI workflows as APIs | Het Trivedi, Rachel Rapp | 2024-07-25 | 1,144 | 1 |
| Deploy StableLM with Truss | Tuhin Srivastava | 2023-04-20 | 423 | -- |
| Build a chatbot with Llama 2 and LangChain | Philip Kiely | 2023-07-27 | 1,440 | -- |
| Model autoscaling features on Baseten | Jesse Mostipak | 2023-07-07 | 890 | -- |
| GPT vs Mistral: Migrate to open source LLMs seamlessly | Sid Shanker, Philip Kiely | 2023-11-22 | 879 | -- |
| New in May 2024 | Baseten | 2024-06-03 | 598 | -- |
| CI/CD for AI model deployments | Vlad Shulman, Samiksha Pal, Sid Shanker, Philip Kiely | 2024-04-30 | 914 | -- |
| Getting started with foundation models | Jesse Mostipak | 2023-06-06 | 1,226 | -- |
| How Baseten is using "docs as code" to build best-in-class documentation | Philip Kiely | 2022-03-09 | 1,014 | -- |
| AI infrastructure: build vs. buy | Baseten | 2023-07-28 | 1,040 | -- |
| Announcing our Series B | Tuhin Srivastava | 2024-03-04 | 629 | 2 |
| New in December 2022 | Baseten | 2022-12-23 | 554 | -- |
| Control plane vs workload plane in model serving infrastructure | Colin McGrath, Matt Howard, Philip Kiely | 2024-05-29 | 870 | -- |
| If You Build It, Devs will Come: How to Host an AI … | Julien Reiman | 2023-04-06 | 1,061 | -- |
| New in November 2023 | Baseten | 2023-11-30 | 419 | -- |
| Baseten Chains explained: building multi-component AI workflows at scale | Marius Killinger, Rachel Rapp | 2024-07-02 | 2,424 | -- |
| New in April 2023 | Baseten | 2023-04-30 | 510 | -- |
| How to double tokens per second for Llama 3 with Medusa | Abu Qader, Philip Kiely | 2024-08-20 | 1,462 | 2 |
| The best open-source image generation model | Philip Kiely | 2024-08-29 | 1,409 | -- |
| How to build function calling and JSON mode for open-source and fine-tuned … | Bryce Dubayah, Philip Kiely | 2024-09-12 | 1,339 | 1 |
| Introducing function calling and structured output for open-source and fine-tuned LLMs | Bryce Dubayah, Philip Kiely | 2024-09-12 | 604 | -- |
| Building high-performance compound AI applications with MongoDB Atlas and Baseten | Philip Kiely | 2024-09-17 | 1,425 | -- |
| Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a … | Mike Bilodeau, Rachel Rapp | 2024-09-26 | 688 | -- |
| Export your model inference metrics to your favorite observability tool | Helen Yang, Nicolas Gere-lamaysouette, Philip Kiely | 2024-10-05 | 493 | -- |
| Evaluating NVIDIA H200 GPUs for LLM inference | Pankaj Gupta, Philip Kiely | 2024-10-23 | 1,294 | -- |
| Introducing canary deployments on Baseten | Sid Shanker, Jonathan Rochette, Raymond Cano, Rachel Rapp | 2024-11-01 | 932 | -- |
| Create custom environments for deployments on Baseten | Samiksha Pal, Raymond Cano, Sid Shanker, Rachel Rapp | 2024-11-15 | 621 | -- |
| Introducing Custom Servers: Deploy production-ready model servers from Docker images | Tianshu Cheng, Bola Malek, Rachel Rapp | 2024-12-09 | 807 | -- |
| Generally Available: The fastest, most accurate, and cost-efficient Whisper transcription | William Gao, Derrick Yang, Tianshu Cheng, Rachel Rapp | 2024-12-12 | 1,145 | -- |
| A quick introduction to speculative decoding | Pankaj Gupta, Justin Yi, Philip Kiely | 2024-12-20 | 1,139 | -- |
| Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference | Justin Yi, Abu Qader, Bryce Dubayah, Rachel Rapp | 2024-12-20 | 904 | -- |
| How we built production-ready speculative decoding with TensorRT-LLM | Pankaj Gupta, Justin Yi, Philip Kiely | 2024-12-20 | 2,729 | -- |
| New observability features: activity logging, LLM metrics, and metrics dashboard customization | Suren Atoyan, Aaron Relph, Marius Killinger, Sid Shanker, Rachel Rapp | 2024-12-23 | 540 | -- |
| Driving model performance optimization: 2024 highlights | Pankaj Gupta | 2025-01-14 | 1,530 | -- |
| Private, secure DeepSeek-R1 in production in US & EU data centers | Amir Haghighat, Philip Kiely | 2025-02-11 | 1,274 | -- |
| Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud | Pankaj Gupta, Philip Kiely | 2025-02-11 | 1,033 | -- |
| Baseten Chains is now GA for production compound AI systems | Marius Killinger, Tyron Jung, Rachel Rapp | 2025-02-12 | 1,123 | -- |
| How multi-node inference works for massive LLMs like DeepSeek-R1 | Phil Howes, Philip Kiely | 2025-02-15 | 1,303 | -- |
| Announcing Baseten’s $75M Series C | Tuhin Srivastava | 2025-02-26 | 739 | -- |
| How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM | Michael Feil, Philip Kiely | 2025-03-28 | 2,035 | -- |
| Introducing Baseten Embeddings Inference: The fastest embeddings solution available | Michael Feil, Rachel Rapp | 2025-03-28 | 782 | -- |
| The best open-source embedding models | Philip Kiely | 2025-04-07 | 1,254 | -- |
| Building performant embedding workflows with Chroma and Baseten | Philip Kiely | 2025-04-11 | 570 | -- |
| Accelerating inference with NVIDIA B200 GPUs | Philip Kiely | 2025-04-23 | 857 | -- |
| Canopy Labs selects Baseten as preferred inference provider for Orpheus TTS models | Philip Kiely | 2025-05-07 | 1,350 | -- |
| Introducing Model APIs and Training | -- | 2025-05-24 | 525 | -- |
| Introducing our new brand | -- | 2025-05-25 | 258 | -- |
| Introducing Baseten Hybrid: control and flexibility in your cloud and ours | Phil Howes | 2024-10-28 | 691 | -- |
| Day zero benchmarks for Qwen 3 with SGLang on Baseten | Yineng Zhang | 2025-05-19 | 1,303 | -- |
| How Baseten multi-cloud capacity management (MCM) unifies deployments | Rachel Rapp | 2025-06-10 | 935 | -- |
| Forward deployed engineering on the frontier of AI | Vlad Shulman | 2025-06-11 | 2,108 | -- |
| Your client code matters: 12x higher embedding throughput with Python and Rust | Michael Feil | 2025-06-13 | 1,280 | -- |
| Understanding Voxtral vs. Whisper: Build a Voice-Controlled Smart Home App | Alex Ker, 1 other | 2025-07-24 | 901 | -- |
| Joey Zwicker joins Baseten as Head of FDE | Tuhin Srivastava | 2025-08-11 | 907 | -- |
| Building reliable AI agents | Alex Ker | 2025-07-22 | 1,105 | -- |
| AI inference explained: The hidden process behind every prediction | Madison Kanna | 2025-07-01 | 1,212 | -- |
| Kimi K2 Explained: The 1 Trillion Parameter Model Redefining How to Build … | Alex Ker, 1 other | 2025-08-05 | 748 | -- |
| How we built BEI: high-throughput embedding, reranker, and classifier inference | Amir Haghighat, 4 others | 2025-07-14 | 2,111 | -- |
| Zero to real-time text-to-speech: The complete Orpheus + WebSockets tutorial | Alex Ker | 2025-08-08 | 991 | -- |
| Run Qwen3 Embedding on NVIDIA Blackwell GPUs | Amir Haghighat, 4 others | 2025-08-04 | 345 | -- |
| Zero to real-time transcription: The complete Whisper V3 streaming tutorial | Alex Ker | 2025-08-05 | 971 | -- |
| How we built Multi-cloud Capacity Management (MCM) | William Lau, 3 others | 2025-06-24 | 1,914 | -- |
| How we run GPT OSS 120B at 500+ tokens per second on … | Amir Haghighat, 4 others | 2025-08-07 | 938 | -- |
| From Prompt to Production: Baseten Inference in Your IDE with Cline | Alex Ker | 2025-08-13 | 568 | -- |
| How to fine-tune gpt-oss-120b with Baseten and Axolotl | Sanskriti Sharma, 2 others | 2025-08-19 | 1,083 | -- |
| Welcoming Dannie Herzberg to Baseten | Tuhin Srivastava | 2025-08-27 | 286 | -- |
| HTTP vs. WebSockets vs. gRPC for AI model inference | Madison Kanna | 2025-08-29 | 635 | -- |
| How Baseten MCM, our cloud ecosystem partners, and NVIDIA drive fast, reliable … | Marylise Tauzia, 2 others | 2025-09-03 | 583 | -- |
| Announcing Baseten’s $150M Series D | Tuhin Srivastava | 2025-09-05 | 1,069 | -- |
| Building the future of AI infrastructure: Q&A with Baseten Co-founder Amir Haghighat | Madison Kanna | 2025-09-16 | 1,268 | -- |
| Making Zed fast: A conversation with Richard Feldman | Madison Kanna | 2025-09-24 | 1,183 | -- |
| Delivering GenAI solutions for healthcare with Baseten and Vultr | Philip Kiely | 2025-10-02 | 823 | -- |
| Baseten brings AI video to life on Nebius | Mike Bilodeau | 2025-10-06 | 867 | -- |
| Building AI Agents, Open Code And Open Source: A Conversation with Dax | Madison Kanna | 2025-10-10 | 2,827 | -- |
| From Sketch to 3D Model: Building a flower card generator with open … | Alex Ker | 2025-10-11 | 1,457 | -- |
| How Baseten achieved 2x faster inference with NVIDIA Dynamo | Abu Qader, 2 others | 2025-10-17 | 904 | -- |
| How we made the fastest GPT-OSS on NVIDIA GPUs 60% faster | Tri Dao, 2 others | 2025-10-24 | 1,188 | -- |
| DeepSeek-OCR and the Unreasonable Usefulness of Compression | Alex Ker, 1 other | 2025-10-24 | 988 | -- |
| High-performance agents for financial services with NVIDIA Nemotron on Baseten | Philip Kiely | 2025-10-28 | 871 | -- |
| Train AI Models When You Want. Deploy on Ultra Performant Infrastructure. Baseten … | Raymond Cano, 1 other | 2025-10-30 | 922 | -- |
| Tool Calling in Inference | Kenzie Amack, 1 other | 2025-11-06 | 2,368 | -- |
| Kimi K2 Thinking at 140+ TPS on NVIDIA Blackwell | Abu Qader, 2 others | 2025-11-12 | 1,520 | -- |
| Enterprise vision intelligence with Mistral AI and Baseten | Philip Kiely | 2025-12-02 | 735 | -- |
| DeepSeek V3.2's path to GPT-5-level performance: sparse attention, RL at scale, and … | Alex Ker | 2025-12-05 | 1,298 | -- |
| Parsed + Baseten: Building Models That Touch Grass | Mudith Jayasekara, 3 others | 2025-12-11 | 1,482 | -- |
| NVIDIA Nemotron 3 Nano: Build Agentic AI Applications on Baseten | Marylise Tauzia, 1 other | 2025-12-16 | 708 | -- |
| Baseten AI Wrapped: 3 trends to help you build better in 2026 | Amir Haghighat, 1 other | 2025-12-22 | 711 | -- |
| A Q&A From Inference To Training: The Inside Story Of Baseten’s Newest … | Madison Kanna | 2026-01-07 | 1,673 | -- |
| Building production AI for regulated industries with a leading digital insurer | Marylise Tauzia, 1 other | 2026-01-09 | 2,145 | -- |
| Purpose-built LLMs for dental note-taking | Marylise Tauzia, 1 other | 2026-01-09 | 1,980 | -- |
| Fine-tuning small open-source LLMs to outperform large closed-source models by 60% on … | Marylise Tauzia, 1 other | 2026-01-09 | 1,430 | -- |
| Production AI for non-technical knowledge workers: LangChain Agent Builder with GLM 4.7 … | Alex Ker | 2026-01-14 | 615 | -- |
| The fastest Whisper — with streaming and diarization | Tianshu Cheng, 4 others | 2026-01-19 | 935 | -- |
| Announcing Baseten's $300M Series E | Tuhin Srivastava, 3 others | 2026-01-24 | 358 | -- |
| Boosting MTP acceptance in TensorRT-LLM: +40% throughput | Mahmoud Hassan, 1 other | 2026-01-24 | 1,288 | -- |
| Wan 2.2 video generation in less than 60 seconds | Mahmoud Hassan, 1 other | 2026-01-24 | 1,252 | -- |
| Open-Sourcing Baseten’s Suffix Automaton MTP Accelerator | Mahmoud Hassan, 1 other | 2026-01-28 | 1,290 | -- |