Company:
Date Published:
Author: Jiqing Feng, Matrix Yao, Ke Ding, and Ilyas Moutawwakil
Word count: 1374
Language: -
Hacker News points: None

Summary

Intel and Hugging Face have collaborated to demonstrate significant cost-efficiency and performance gains for large Mixture of Experts (MoE) models, such as OpenAI's GPT OSS, by upgrading to Google Cloud's C4 virtual machines powered by Intel Xeon 6 processors. The C4 VMs delivered a 1.7x improvement in total cost of ownership (TCO) and a 1.4x to 1.7x increase in throughput per vCPU compared with the previous-generation C3 VMs. These gains came from targeted optimizations, including directing expert execution so that redundant computation across the MoE layer is avoided. Benchmark tests, which measured steady-state decoding throughput across a range of batch sizes, confirmed that the new setup provides both higher throughput and lower latency, making large-scale MoE model inference more practical on general-purpose CPUs.
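The expert-execution optimization mentioned above can be illustrated with a minimal sketch: instead of running the MoE layer token by token, tokens routed to the same expert are grouped so each active expert's weights are applied once per batch, and inactive experts are skipped entirely. This is a simplified, hypothetical illustration of the general technique, not the actual GPT OSS or Intel/Hugging Face implementation; all names and shapes here are illustrative.

```python
# Hypothetical sketch of grouped MoE expert dispatch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

n_tokens, d_model, n_experts = 8, 4, 3
x = rng.standard_normal((n_tokens, d_model))

# Router: pick the top-1 expert per token (real MoE layers often use top-k).
router_logits = rng.standard_normal((n_tokens, n_experts))
assignment = router_logits.argmax(axis=1)

# One weight matrix per expert (square for simplicity).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

out = np.empty_like(x)
for e in range(n_experts):
    idx = np.nonzero(assignment == e)[0]
    if idx.size == 0:
        continue  # inactive experts are skipped, saving compute
    # One matmul covers every token routed to expert e,
    # instead of a separate matmul per token.
    out[idx] = x[idx] @ experts[e]
```

Grouping turns many small per-token matrix-vector products into a few larger matrix-matrix multiplies, which is exactly the shape of work that CPU matmul kernels (e.g. AMX on Xeon 6) execute most efficiently.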