Company: -
Date Published: -
Author: -
Word count: 672
Language: -
Hacker News points: None

Summary

Researchers from Stanford's Hazy Research lab have developed a method for offloading a substantial portion of large language model (LLM) workloads to consumer devices by having small on-device models collaborate with larger cloud-based models. The approach, detailed in a new paper with accompanying open-source code, aims to cut cloud costs with little or no loss in quality through two protocols. In Minion, the cloud model converses with a single local model until they converge on a solution, achieving a 30.4x reduction in cloud costs while retaining 87% of the cloud model's performance. In MinionS, the cloud model decomposes the task into smaller subtasks that the small on-device models solve in parallel, achieving a 5.7x cost reduction while retaining 97.9% of performance. The project, which pairs on-device models such as Llama 3.2 with cloud models such as GPT-4o, lets users explore both protocols through a demo app and provides instructions for setting up and running them programmatically in Python.
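To give a sense of what running the protocol programmatically looks like, here is a minimal Python sketch modeled on the project's open-source repository. The import paths, class names (OllamaClient, OpenAIClient, Minion), and the max_rounds parameter are assumptions drawn from the repo's published examples and may differ between versions; the task and context strings are placeholders.

```python
# Minimal sketch of the Minion protocol, assuming the interfaces published
# in the project's open-source repo. Names and parameters are illustrative
# and may not match every release.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

# Small on-device model, served locally (e.g. via Ollama).
local_client = OllamaClient(model_name="llama3.2")

# Large cloud model (expects an API key such as OPENAI_API_KEY in the env).
remote_client = OpenAIClient(model_name="gpt-4o")

# Minion: the cloud model converses with the single local model, which
# holds the long context on-device, until they converge on an answer.
minion = Minion(local_client, remote_client)

context = "..."  # placeholder: the long document the local model reads
task = "Summarize the key findings and list any open questions."

output = minion(task=task, context=[context], max_rounds=2)
print(output["final_answer"])  # assumed result key, per the repo's examples
```

The MinionS protocol is exposed through an analogous class that fans the task out into parallel subtasks for the local model before the cloud model aggregates the results; the trade-off is the one the paper quantifies, with Minion cutting costs more aggressively and MinionS recovering more of the cloud model's quality.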