What's New for Serverless LLM Usage in RunPod (2025 Update)
Blog post from RunPod
Serverless architecture is a natural fit for deploying large language models (LLMs): it minimizes GPU costs and scales seamlessly to absorb demand spikes. Recent updates to RunPod's serverless platform include GPU allocations of up to 320GB per worker, with further expansion available through support, and the introduction of SGLang, a framework tailored for structured generation and control flow, alongside vLLM, which is optimized for high-performance inference. The platform also features an improved model selection interface with direct Hugging Face integration and a streamlined deployment process with GitHub integration, with further enhancements promised to simplify and speed up serverless endpoint deployments.
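To make the deployment flow concrete, here is a minimal sketch of querying a deployed vLLM worker through RunPod's OpenAI-compatible route. The endpoint ID, API key environment variable, and model name are placeholders, and the exact request shape depends on how the worker was configured; treat this as illustrative rather than definitive.

```python
# Minimal sketch: querying a RunPod serverless vLLM endpoint via its
# OpenAI-compatible route. ENDPOINT_ID and the model name are placeholders;
# substitute values from your own deployment.
import os

from openai import OpenAI

ENDPOINT_ID = "your-endpoint-id"  # assumption: taken from your endpoint's dashboard page

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    # vLLM workers expose an OpenAI-compatible API under the endpoint's /openai/v1 path.
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
)

response = client.chat.completions.create(
    # assumption: the Hugging Face model this worker was deployed to serve
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize the benefits of serverless LLM inference."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

For workers that do not expose the OpenAI-compatible route, the same endpoint can be called directly with a POST to https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync carrying a JSON body of the form {"input": {...}}, where the input schema is defined by the worker's handler.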