Home / Companies / Pulumi / Blog / Post Details
Content Deep Dive

Use Your Mac for AI Agents: Self-Host Gemma 4 12 B with Pulumi and Tailscale

Blog post from Pulumi

Post Details
Company
Date Published
Author
Pablo Seibelt
Word Count
1,609
Language
English
Hacker News Points
-
Summary

The text details a comprehensive guide on running the Gemma 4 12 B open-weight model locally on a modern Mac using llama.cpp for host-native inference, leveraging Apple Metal acceleration, and integrating with Kubernetes and Pulumi for infrastructure management. It highlights the benefits of local execution, such as keeping data within the local network, offline functionality, and zero token cost, contrasting it with common trade-offs associated with cloud-based AI services. The setup involves using tools like brew, docker, pulumi, and tailscale, with specific configuration steps for installing and running the model, including setting up a local Kubernetes cluster and deploying a chat interface using Open WebUI. It also explores how to expose the web interface securely via Tailscale, and provides instructions for configuring a local coding agent to interact with the model. The guide concludes with advanced options for Linux users with dedicated GPU resources and cleanup instructions for dismantling the setup.