
What’s New for Serverless LLM Usage in RunPod (2025 Update)

Blog post from RunPod

Post Details
Company: RunPod
Author: Brendan McKeag
Word Count: 580
Language: English
Summary

The post highlights the advantages of serverless architecture for deploying large language models (LLMs): minimizing GPU costs and scaling seamlessly to absorb demand spikes. Recent updates include the ability to allocate up to 320 GB of GPU memory per worker, with further expansion available through support, and the introduction of SGLang, a framework tailored for structured generation and control flow, alongside vLLM, which is optimized for high-performance inference. The platform also features an improved model selection interface with direct Hugging Face integration, a streamlined deployment process with GitHub integration, and promised enhancements to further simplify and speed up serverless endpoint deployment.
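To make the deployment workflow concrete, here is a minimal sketch of submitting a prompt to a RunPod serverless endpoint over its HTTP API. This is an illustration, not the post's own code: the endpoint ID is a placeholder, and the `https://api.runpod.ai/v2/{id}/run` URL pattern and `{"input": {...}}` payload shape are assumptions that should be verified against the current RunPod serverless documentation.

```python
# Sketch of calling a RunPod serverless endpoint (e.g. a vLLM or SGLang
# worker). URL pattern and payload shape are assumptions; check the docs.
import json
import urllib.request

RUNPOD_API_BASE = "https://api.runpod.ai/v2"  # assumed base URL


def build_run_request(endpoint_id: str, prompt: str, api_key: str):
    """Build the URL, headers, and JSON body for a serverless /run call."""
    url = f"{RUNPOD_API_BASE}/{endpoint_id}/run"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Assumed input schema for a text-generation worker.
    body = json.dumps({"input": {"prompt": prompt, "max_tokens": 128}})
    return url, headers, body


def submit(endpoint_id: str, prompt: str, api_key: str) -> dict:
    """Submit the job; the API queues it and returns a job id and status."""
    url, headers, body = build_run_request(endpoint_id, prompt, api_key)
    req = urllib.request.Request(url, data=body.encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because serverless workers scale from zero, the `/run` call is asynchronous: it returns a job ID immediately, and the result is fetched once a worker has spun up and finished inference.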