Stanford's Hazy Research lab introduced Minions, an open-source project that pairs small local models, such as Google's gemma3:4b, with frontier models in the cloud, such as GPT-4o, aiming to cut cloud inference costs by 5x-30x while retaining 98% of frontier-model accuracy. The Minions protocol keeps the raw context on the local device, which improves privacy because sensitive data never leaves it, although some information from the exchange still reaches the cloud.

To close that gap, the researchers developed a secure communication protocol built on NVIDIA's Hopper H100 GPUs that encrypts the exchange end to end, shielding it even from the cloud provider. In this setup, the local device communicates only in encrypted form with a secure GPU enclave, which processes the local LLM's messages with minimal latency overhead, enabling confidential LLM collaboration.

Users can explore the secure protocol through an interactive demo or programmatically in Python, and further technical details are available in the Hazy Research blog post.
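As a concrete starting point, the sketch below shows how such a local-cloud pairing can be wired up in Python. The module and class names (OllamaClient, OpenAIClient, Minion) and the max_rounds parameter follow the general shape of the open-source Minions repository but are assumptions here, not verbatim documentation; the secure variant is exposed through a similar interface that routes the remote calls through the H100 enclave. Consult the project README for the exact, current API.

```python
# Hedged sketch of the Minions local/cloud pairing; class and module names
# are assumptions modeled on the open-source repository and may differ
# from the current API.
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

# Small model served locally (e.g. via Ollama): the raw context stays
# on this machine and is never uploaded.
local_client = OllamaClient(model_name="gemma3:4b")

# Frontier model in the cloud: it only sees the short messages the
# protocol exchanges, not the full document (or, in the secure variant,
# only ciphertext that is decrypted inside the GPU enclave).
remote_client = OpenAIClient(model_name="gpt-4o")

# The Minion object orchestrates the back-and-forth between the two models.
minion = Minion(local_client, remote_client)

# Hypothetical long, sensitive document that should remain local.
context = """Internal quarterly report. Contains customer names,
contract values, and other details that must not leave this device."""

task = "Summarize the key findings and list any open action items."

# Run the protocol for a bounded number of local/remote rounds.
result = minion(task=task, context=[context], max_rounds=2)
print(result)
```

In this pattern, the expensive cloud model is consulted only for high-level guidance while the cheap local model does the heavy reading over the full context, which is where the reported 5x-30x cost reduction comes from.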