What is Multi-Model Serving and How Does it Transform your ML Infrastructure?

Post Details

Company: Seldon
Date Published: Jan. 12, 2023
Author: Seldon
Word Count: 934
Company Posts That Month: 3
Language: English
Hacker News Points: -
Post removed? No
Source URL: www.seldon.io/what-is-multi-model-serving-and-how-does-it-transform-your-ml-infrastructure

Summary

Multi-model serving (MMS) runs multiple machine learning (ML) models on shared servers instead of giving each model its own, which shrinks the infrastructure footprint and saves cost and energy. It is particularly efficient when combined with the "Overcommit" functionality, which lets a server host more models than fit in its memory: a least-recently-used (LRU) cache keeps active models in memory and moves less-used ones to disk. In a traditional single-model serving setup, each model is deployed in a separate container, which often leads to inefficient resource allocation; overhead and cost grow as the number of models scales up. MMS addresses these issues by optimizing resource usage, improving CPU/GPU sharing, and eliminating cold-start delays, which occur when container images must be downloaded before a model can be deployed. Combining MMS with autoscaling and Overcommit enables intelligent resource management that accommodates fluctuating demand and offers significant savings in both infrastructure cost and energy consumption, which is critical in constrained environments such as edge device deployments.
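
The least-recently-used behaviour described in the summary can be illustrated with a small sketch. This is a toy model of the idea only, not Seldon's implementation: the class `OvercommitServer`, the `memory_slots` parameter, and the `register` and `infer` methods are all hypothetical names, and "disk" is simulated with a plain dictionary.

```python
from collections import OrderedDict


class OvercommitServer:
    """Toy server that registers more models than fit in memory (hypothetical API)."""

    def __init__(self, memory_slots):
        self.memory_slots = memory_slots  # assumed: number of models that fit in memory
        self.in_memory = OrderedDict()    # name -> model, least recently used first
        self.on_disk = {}                 # stand-in for models offloaded to disk

    def register(self, name, model):
        # New models start on "disk" and are loaded lazily on first request.
        self.on_disk[name] = model

    def _load(self, name):
        if name in self.in_memory:
            # Cache hit: mark the model as most recently used.
            self.in_memory.move_to_end(name)
            return self.in_memory[name]
        model = self.on_disk.pop(name)  # raises KeyError for unknown models
        if len(self.in_memory) >= self.memory_slots:
            # Memory is full: move the least recently used model back to "disk".
            evicted_name, evicted_model = self.in_memory.popitem(last=False)
            self.on_disk[evicted_name] = evicted_model
        self.in_memory[name] = model
        return model

    def infer(self, name, x):
        return self._load(name)(x)


# Three models share a server with room for two in memory.
server = OvercommitServer(memory_slots=2)
server.register("a", lambda x: x + 1)
server.register("b", lambda x: x * 2)
server.register("c", lambda x: x - 1)

server.infer("a", 1)
server.infer("b", 1)
server.infer("a", 1)  # "a" becomes the most recently used model
server.infer("c", 1)  # memory is full, so "b" is moved to disk
print(list(server.in_memory), list(server.on_disk))
```

In this sketch, a request for an offloaded model pays a reload cost but never fails, which is the trade-off that lets a server accept more models than its memory can hold at once.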

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	8	292	59	28	+7%
Kubernetes	2	1,398	143	60	+21%
TPUs	1	9	6	5	+800%
