Cerebrium Blog - Plushcap

Blog URL

www.cerebrium.ai/blog

Posts YTD

39 ↑ vs 17 last year

Avg Posts/Month

1.5 since 2022

Post Details

Title	Author	Published	Words	HN Pts
Building a Real-Time Shopping Assistant: Turn Live Video into Instant Purchases	Michael Louis	2024-08-14	2,435	--
Using Codestral to Summarize, Correct and Auto-Approve Pull Requests	Cerebrium Team	2024-06-15	2,400	--
Creating a realtime RAG voice agent	Cerebrium Team	2024-07-21	3,262	--
Introduction	Cerebrium Team	2024-04-09	1,163	1
Installing Python Packages via UV leads to 3.75x increase in build performance	--	2024-02-15	28	--
Getting better price-performance, latency, and availability on AWS Trn1/Inf2 instances	Cerebrium Team	2024-05-20	1,950	--
Creating an Executive Assistant using LangChain, LangSmith, Cerebrium and Cal.com	Michael Louis	2024-05-19	2,482	--
Running Llama 3 8B with TensorRT-LLM on Serverless GPUs	Michael Louis	2024-05-16	1,410	--
How to Build a Real-Time AI Avatar for Training and Coaching	Michael Louis	2024-09-17	2,529	--
Cerebrium supports HIPAA compliance: A guide for health applications	Kyle Gani	2024-09-30	1,208	--
Benchmarking vLLM, SGLang and TensorRT for Llama 3.1 API	Michael Louis	2024-10-10	643	--
An Alternative to OpenAI Realtime API for Voice Capabilities	Michael Louis	2024-10-14	1,359	7
ML apps at scale: ASGI support now available on Cerebrium	Kyle Gani	2024-10-28	452	--
Overcoming Transcription Challenges for Multilingual AI voice agents	Michael Louis	2024-12-19	1,275	--
Building a Real-time Coding Assistant	Kyle Gani	2025-02-20	3,114	--
Creating a realtime AI Commentator with Cerebrium, LiveKit and Cartesia	Michael Louis	2025-02-18	4,243	--
Deploying Ultravox on Cerebrium for Ultra-low Latency Voice Applications	Kyle Gani	2025-04-28	1,194	--
Orpheus TTS: How to Deploy Orpheus at Scale for Production Inference	Cerebrium Team	2025-08-29	1,756	--
How much does a H100 cost? Cost comparision	Cerebrium Team	2025-08-29	1,026	--
How to Deploy Machine Learning Models: A comprehensive Guide	Cerebrium Team	2025-08-29	997	--
5 Top Free Hosting Platforms for Python Apps	Cerebrium Team	2025-08-29	1,773	--
Top 5 Serverless GPU providers	Cerebrium Team	2025-08-29	1,055	--
Deploying a global scale, AI voice agent with 500ms latency.	Cerebrium Team	2025-06-25	1,765	--
Integrating PayPal's Model Context Protocol (MCP) into a Real-time Voice Agent	Cerebrium Team	2025-07-31	2,134	--
Alternatives to AWS, GCP and Azure for deploying AI models efficiently	Cerebrium Team	2025-05-26	1,137	--
Launch Week Day 3: Annoucing Multi-Region Deployments	Cerebrium Team	2025-07-10	583	--
Introducing Cerebrium run: The Fastest Way to Execute Cloud Code	Cerebrium Team	2025-07-09	718	--
How much does a H200 cost? 2025 Guide	Cerebrium Team	2025-08-29	906	--
Cerebrium Raises $8.5M led by Gradient to Scale the Leading High-Performance Serverless …	Cerebrium Team	2025-07-08	532	--
How Startups Can Cut AI Infrastructure Costs Without Compromising Performance	Cerebrium Team	2025-05-26	462	--
Faster Whisper Transcription: How to Maximize Performance for Real-Time Audio-to-Text	Cerebrium Team	2025-08-29	1,025	--
Deploying Sesame CSM: The Most Realistic Voice Model as an API	Cerebrium Team	2025-08-29	2,253	--
Deploying DeepSeek-R1: A Guide to a Serverless, High-Performaning OpenAI-Compatible Endpoint	Cerebrium Team	2025-08-29	1,229	--
Choosing the Right Serverless GPU Platform for Global Scale: What to Know …	Cerebrium Team	2025-10-15	2,402	--
The Shortcomings of Celery + Redis for ML Workloads and How Cerebrium …	Cerebrium Team	2025-10-27	1,739	--
Introduction New Regions: India & Stockholm	Cerebrium Team	2026-01-08	218	--
Cerebrium is now ISO 27001 Compliant	Cerebrium Team	2026-01-27	319	--
Why Serverless Compute Partners Are Now More Important Than Ever	Cerebrium Team	2026-03-02	1,918	--
The 1979 Design Choice Breaking Modern ML & How We Solved It	Cerebrium Team	2026-03-08	2,848	--
Rethinking Container Image Distribution to eliminate cold starts	Cerebrium Team	2026-03-08	3,004	--
Why Kubernetes Serving Breaks Down for Real-Time AI	Cerebrium Team	2026-03-24	2,679	--
Rethinking Container Image Distribution to eliminate cold starts	Cerebrium Team	2026-03-08	3,027	--
An Alternative to OpenAI Realtime API for Voice Capabilities	Cerebrium Team	2024-10-14	1,949	--
Overcoming Transcription Challenges for Multilingual AI voice agents	Cerebrium Team	2024-12-19	1,663	--
Using Codestral to Summarize, Correct and Auto-Approve Pull Requests	Cerebrium Team	2024-06-15	2,059	--
Achieving 83% Speed Improvements in Custom Container Images	Cerebrium Team	2026-03-31	1,512	--
Introducing Cerebrium run: The Fastest Way to Execute Cloud Code	Cerebrium Team	2025-07-09	653	--
Getting better price-performance, latency, and availability on AWS Trn1/Inf2 instances	Cerebrium Team	2024-05-20	1,796	--
Cerebrium supports HIPAA compliance: A guide for health applications	Cerebrium Team	2024-09-30	1,191	--
Deploying Ultravox on Cerebrium for Ultra-low Latency Voice Applications	Cerebrium Team	2025-04-28	1,672	--
How to Build a Real-Time AI Avatar for Training and Coaching	Cerebrium Team	2024-09-17	2,569	--
Creating a realtime AI Commentator with Cerebrium, LiveKit and Cartesia	Cerebrium Team	2025-02-18	4,060	--
Scaling AI Tutors: How Creatium Achieved 18x Faster Cold Starts with Cerebrium	Cerebrium Team	2026-04-04	592	--
ML apps at scale: ASGI support now available on Cerebrium	Cerebrium Team	2024-10-28	608	--
Running Llama 3 8B with TensorRT-LLM on Serverless GPUs	Cerebrium Team	2024-05-16	1,872	--
Launch Week Day 3: Annoucing Multi-Region Deployments	Cerebrium Team	2025-07-10	579	--
Cerebrium Raises $8.5M led by Gradient to Scale the Leading High-Performance Serverless …	Cerebrium Team	2025-07-08	520	--
Lelapa AI uses Cerebrium to Break Language Barriers	Cerebrium Team	2026-04-04	741	--
How Tavus Scaled Human-like AI Experiences with Cerebrium	Cerebrium Team	2026-04-04	537	--
Building a Real-Time Shopping Assistant: Turn Live Video into Instant Purchases	Cerebrium Team	2024-08-14	3,259	--
Benchmarking vLLM, SGLang and TensorRT for Llama 3.1 API	Cerebrium Team	2024-10-10	626	--
How DistilLabs is Delivering 50% Lower Inference Costs with Production-Grade Autoscaling on …	Cerebrium Team	2026-04-04	545	--
How bitHuman Scaled Digital Humans 10x Faster with Cerebrium	Cerebrium Team	2026-04-04	785	--
Building a Real-time Coding Assistant	Cerebrium Team	2025-02-20	2,850	--
Faster Whisper Transcription: How to Maximize Performance for Real-Time Audio-to-Text	Michael Louis	2026-05-20	1,017	--
Deploying Sesame CSM: The Most Realistic Voice Model as an API	Kyle Gani	2026-05-20	2,151	--
The Shortcomings of Celery + Redis for ML Workloads and How Cerebrium …	Michael Louis	2026-05-20	1,786	--
Orpheus TTS: How to Deploy Orpheus at Scale for Production Inference	Michael Louis	2026-05-20	1,664	--
Top 5 Serverless GPU providers	Michael Louis	2026-05-20	1,055	--
How to Deploy Machine Learning Models: A comprehensive Guide	Michael Louis	2026-05-20	932	--
How Startups Can Cut AI Infrastructure Costs Without Compromising Performance	Cerebrium Team	2026-05-20	462	--
How much does a H200 cost? 2025 Guide	Michael Louis	2026-05-20	906	--
How much does a H100 cost? Cost comparision	Michael Louis	2026-05-20	1,026	--
Deploying DeepSeek-R1: A Guide to a Serverless, High-Performaning OpenAI-Compatible Endpoint	Michael Louis	2026-05-20	988	--
Alternatives to AWS, GCP and Azure for deploying AI models efficiently	Michael Louis	2026-05-20	1,137	--
Choosing the Right Serverless GPU Platform for Global Scale: What to Know …	Akriti Keswani	2026-05-20	2,510	--
5 Top Free Hosting Platforms for Python Apps	Kyle Gani	2026-05-20	1,737	--
Creating a realtime RAG voice agent	Cerebrium Team	2026-05-26	2,899	--
Creating an Executive Assistant using LangChain, LangSmith, Cerebrium and Cal.com	Cerebrium Team	2026-05-26	3,359	--
Integrating PayPal's Model Context Protocol (MCP) into a Real-time Voice Agent	Michael Louis	2026-05-26	1,788	--
Introduction New Regions: India & Stockholm	Michael Louis	2026-01-08	218	--
Cerebrium is now ISO 27001 Compliant	Michael Louis	2026-01-27	319	--
Productionize your Comfy UI Workflow	Michael Louis	2024-04-09	1,201	--
Thalamus - Our Highly Available Distributed Router for Global Realtime AI Workloads	Wesley Robinson	2026-06-04	2,348	--
Reducing GPU Cold Starts with Memory Snapshots: Restoring CUDA Workloads in Seconds	Yaseen Hamdulay	2026-07-01	2,835	--
SOC 2 Type 2: When Production AI Requires More Than Fast GPUs	Connor Blier	2026-07-08	1,070	--
Cerebrium Achieves SOC 2 Type II Compliance for Secure Production AI Infrastructure	Connor Blier	2026-07-08	1,070	--
2026 GPU Buyer's Guide	Connor Blier	2026-07-13	887	--
A Low-Latency Architecture for Voice Agents with Live Web Retrieval	Michael Louis	2026-07-15	1,029	--
A Low-Latency Architecture for Voice Agents with Real-time Web Search	Michael Louis	2026-07-15	1,038	--

Plushcap, by Matt Makai. 2021-2026.