Run Larger LLMs on Runpod Serverless Than Ever Before - Llama-3 70B (and beyond!)

Post Details

Company

RunPod

Date Published

June 6, 2024

Author

Brendan McKeag

Word Count

639

Company Posts That Month

2

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.runpod.io/blog/run-larger-llms-on-runpod-serverless-than-ever-before

Summary

Runpod's serverless offering now supports multiple GPUs per worker, which makes it practical to run larger large language models (LLMs). Users can assign two A100 or H100 GPUs, or up to ten 24GB or 48GB GPUs, to a worker. That is enough to run 70-billion-parameter models at full precision, or nearly any quantized model, using the vLLM Quick Deploy template. Setup involves creating a network volume to store the model, which reduces cold start times to approximately 600ms for models like Llama-3-70b. Serverless requires more initial setup than a pod, but it is cost-efficient because it bills only for active use, and it scales dynamically to handle concurrent requests, giving a smoother user experience than a fixed pod setup.
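
The GPU counts in the summary can be sanity-checked with rough arithmetic. The sketch below is illustrative only and rests on assumptions that are not in the post: "full precision" is taken to mean 16-bit weights (2 bytes per parameter), the A100/H100 cards are taken to be 80GB variants, and only the weights are counted. The KV cache, activations, and runtime overhead need additional memory, so real deployments may require more GPUs than these lower bounds. The helper names are hypothetical.

```python
import math

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

def min_gpus(num_params: float, bytes_per_param: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory covers the weights only.

    This is a lower bound: it ignores KV cache, activations, and overhead.
    """
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)

PARAMS_70B = 70e9  # 70 billion parameters

# Assumption: 16-bit weights use 2 bytes per parameter; 4-bit uses 0.5.
fp16_gb = weight_memory_gb(PARAMS_70B, 2)
int4_gb = weight_memory_gb(PARAMS_70B, 0.5)

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {int4_gb:.0f} GB")

# Assumption: 80GB for A100/H100; 48GB and 24GB are the sizes named in the summary.
for gpu_gb in (80, 48, 24):
    count = min_gpus(PARAMS_70B, 2, gpu_gb)
    print(f"{gpu_gb}GB GPUs needed for 16-bit weights: at least {count}")
```

Under these assumptions, 16-bit weights for a 70B model come to about 140GB, which fits within two 80GB cards (160GB combined) and within the ten-GPU limit for 24GB and 48GB cards, consistent with the summary's claim.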

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Serverless	7	555	121	71	-3%
LLM	5	2,718	331	130	+3%
Real-time	1	2,305	607	180	+15%
