
Minions: where local and cloud LLMs meet

Blog post from Ollama

Summary

Researchers from Stanford's Hazy Research lab have developed a method to offload a substantial portion of large language model (LLM) workloads to consumer devices by having small on-device models collaborate with larger cloud-based models. The approach, detailed in a new paper with accompanying open-source code, aims to cut cloud costs with little or no quality loss through two protocols. In the Minion protocol, a cloud model converses with a single local model to reach a solution, achieving a 30.4x cost reduction while retaining 87% of the cloud model's performance. MinionS instead breaks a task into smaller subtasks that small LLMs solve in parallel, achieving a 5.7x cost reduction while retaining 97.9% of performance. The project, which uses models such as Llama 3.2 and GPT-4o, lets users explore both protocols through a demo app and provides instructions for running them programmatically with Python.
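To make the MinionS division of labor concrete, here is a minimal Python sketch of the decompose-solve-aggregate loop. All function names are hypothetical stand-ins, not the project's actual API: in real use, the supervisor calls would go to a cloud model such as GPT-4o and the worker calls to a small local model such as Llama 3.2 served by Ollama.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the cloud "supervisor" model: it turns one
# large task over a long document into cheap per-chunk subtasks.
def cloud_decompose(task: str, chunks: list[str]) -> list[str]:
    return [f"{task} (chunk {i}): {c}" for i, c in enumerate(chunks)]

# Hypothetical stand-in for the small on-device "worker" model, which
# answers one narrow subtask at low (local) cost.
def local_solve(subtask: str) -> str:
    return f"answer[{subtask}]"

# Hypothetical stand-in for the supervisor synthesizing worker outputs
# into a single final answer.
def cloud_aggregate(answers: list[str]) -> str:
    return " | ".join(answers)

def minions_round(task: str, chunks: list[str]) -> str:
    subtasks = cloud_decompose(task, chunks)
    # Subtasks are independent, so the local model can run them in parallel.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(local_solve, subtasks))
    return cloud_aggregate(answers)

result = minions_round("summarize", ["doc part A", "doc part B"])
print(result)
```

The cost saving comes from the shape of this loop: the expensive cloud model sees only the short task description and the short worker answers, while the long document chunks are read only by the cheap local model.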