New in llama.cpp: Model Management

Blog post from HuggingFace

Post Details
Company: HuggingFace
Author: Xuan-Son Nguyen and Victor Mustar
Word Count: 740
Company Posts That Month: 48
Summary

llama.cpp has added a router mode to its server that enables dynamic model management without server restarts, a feature inspired by Ollama-style model management. Users can load, unload, and switch between multiple models at runtime. The router uses a multi-process architecture, so if one model crashes the others keep running. The server auto-discovers models from caches or specified directories and loads them on demand, keeping up to four models loaded by default and evicting the least recently used (LRU) model when that limit is reached. Clients select a model through the "model" field of each request, and the server can be configured through command-line options or presets. A web UI is also available for switching models. Together these features make it easier to run A/B tests, serve multi-tenant deployments, and switch models during development without restarting the server. The community has responded positively and is discussing potential improvements and integrations on platforms such as GitHub.
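The on-demand loading and LRU eviction described above can be illustrated with a small sketch. This is not llama.cpp's implementation: the ModelRouter class, its methods, and the placeholder model names are hypothetical, and the _load method only stands in for starting a per-model process. The only details taken from the summary are selection by the request's "model" field and the default limit of four loaded models.

```python
from collections import OrderedDict


class ModelRouter:
    """Toy router (hypothetical): picks a model from the request's "model"
    field and keeps at most `max_models` loaded, evicting the least
    recently used one when a new model has to be loaded."""

    def __init__(self, max_models=4):
        self.max_models = max_models
        self.loaded = OrderedDict()  # model name -> handle, least recent first
        self.evicted = []            # eviction log, kept only for illustration

    def _load(self, name):
        # Assumption: stand-in for spawning a separate worker per model.
        return {"name": name}

    def route(self, request):
        name = request["model"]
        if name in self.loaded:
            # Already loaded: mark it as most recently used.
            self.loaded.move_to_end(name)
        else:
            if len(self.loaded) >= self.max_models:
                # At capacity: drop the least recently used model first.
                victim, _ = self.loaded.popitem(last=False)
                self.evicted.append(victim)
            self.loaded[name] = self._load(name)
        return self.loaded[name]


router = ModelRouter(max_models=4)
for name in ["a", "b", "c", "d", "a", "e"]:
    router.route({"model": name, "messages": []})

print(list(router.loaded))  # ['c', 'd', 'a', 'e']
print(router.evicted)       # ['b']
```

In this run, the repeated request for "a" refreshes its position, so "b" becomes the least recently used model and is the one evicted when "e" is requested.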

Trends Found in this Post
Trend   Post Mentions   Total Month Mentions   Posts   Companies   MoM
LLM     1               3,775                  638     202         -32%