Use Your Mac for AI Agents: Self-Host Gemma 4 12 B with Pulumi and Tailscale

Post Details

Company

Pulumi

Date Published

June 4, 2026

Author

Pablo Seibelt

Word Count

1,609

Company Posts That Month

15

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.pulumi.com/blog/self-host-gemma4-llama-cpp-k8s-tailscale-pulumi

Summary

The text details a comprehensive guide on running the Gemma 4 12 B open-weight model locally on a modern Mac using llama.cpp for host-native inference, leveraging Apple Metal acceleration, and integrating with Kubernetes and Pulumi for infrastructure management. It highlights the benefits of local execution, such as keeping data within the local network, offline functionality, and zero token cost, contrasting it with common trade-offs associated with cloud-based AI services. The setup involves using tools like brew, docker, pulumi, and tailscale, with specific configuration steps for installing and running the model, including setting up a local Kubernetes cluster and deploying a chat interface using Open WebUI. It also explores how to expose the web interface securely via Tailscale, and provides instructions for configuring a local coding agent to interact with the model. The guide concludes with advanced options for Linux users with dedicated GPU resources and cleanup instructions for dismantling the setup.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	17	6,237	1,165	246	-31%
Kubernetes	9	2,168	322	107	+10%
AI Agents	1	6,119	1,396	266	+24%

Use This Data

Use this post, company, and trend context to find content marketing opportunities, perform competitive analysis, or address product feature gaps via the Plushcap MCP server or the Plushcap API.