Intel XPU Kernel Skill: LLM-driven Triton kernel optimization for the Hugging Face Kernel Hub

Post Details

Company

HuggingFace

Date Published

June 17, 2026

Author

Daniel Fleischer and Moshe Wasserblat

Word Count

2,201

Company Posts That Month

90

Language

-

Hacker News Points

-

Source URL

huggingface.co/blog/danf/intel-xpu-kernels-skill

Summary

Xe-Forge is an Intel project that optimizes Triton kernels for Intel Arc Pro GPUs by running a sequence of optimization stages driven by a large language model (LLM). Its process, called CoVeR (Chain-of-Verification-and-Refinement), is a loop that tests kernel candidates on the GPU and iterates on them to improve performance. To guide optimization, Xe-Forge draws on a knowledge base of Intel XPU-specific patterns, a kind of knowledge that is often underrepresented in LLM training data. On the Intel Arc Pro B70, Xe-Forge achieves significant speedups over existing PyTorch and Triton kernels, including hand-tuned ones. The xpu-kernels skill packages this optimization process as an Agent Skill, so a coding agent can run the optimization loop without requiring the entire project. The post reports that Xe-Forge is effective across various kernel configurations, particularly in memory-bound and compute-bound scenarios, and it emphasizes the importance of knowledge access when optimizing kernels for less-represented architectures such as Intel's XPU.
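The test-and-iterate loop described in the summary can be illustrated with a minimal Python sketch. This is not Xe-Forge's implementation: the names `optimize`, `propose`, and `measure` are hypothetical, the "kernels" are plain Python functions standing in for Triton kernels, and the latencies are made-up numbers standing in for on-GPU timing. It only shows the general shape of a verify-then-refine loop: check each candidate against a reference, time the ones that pass, keep the fastest, and feed the outcome back to the proposer.

```python
from typing import Callable, List, Optional, Tuple

Kernel = Callable[[List[float]], List[float]]


def matches(got: List[float], expected: List[float], tol: float) -> bool:
    """Correctness check: same length and element-wise agreement within tol."""
    return len(got) == len(expected) and all(
        abs(g - e) <= tol for g, e in zip(got, expected)
    )


def optimize(
    baseline: Kernel,
    propose: Callable[[Kernel, str], Optional[Kernel]],  # stand-in for the LLM stage
    measure: Callable[[Kernel], float],                  # stand-in for GPU timing
    inputs: List[float],
    max_iters: int = 4,
    tol: float = 1e-6,
) -> Tuple[Kernel, float]:
    """Keep the fastest candidate whose output matches the baseline's output."""
    expected = baseline(inputs)
    best, best_ms = baseline, measure(baseline)
    feedback = "baseline"
    for _ in range(max_iters):
        candidate = propose(best, feedback)
        if candidate is None:  # proposer has nothing more to try
            break
        if not matches(candidate(inputs), expected, tol):
            feedback = "incorrect output"  # rejected before any timing
            continue
        ms = measure(candidate)
        if ms < best_ms:
            best, best_ms = candidate, ms
            feedback = "improved"
        else:
            feedback = "no speedup"
    return best, best_ms


# Hypothetical stand-in kernels; each is meant to double its input.
def baseline_kernel(xs: List[float]) -> List[float]:
    return [x * 2.0 for x in xs]

def wrong_kernel(xs: List[float]) -> List[float]:
    return [x + 2.0 for x in xs]  # deliberately incorrect

def slow_kernel(xs: List[float]) -> List[float]:
    return [x + x for x in list(xs)]

def fast_kernel(xs: List[float]) -> List[float]:
    return [x + x for x in xs]

# Made-up latencies in milliseconds; a real loop would time runs on the device.
FAKE_LATENCY_MS = {
    baseline_kernel: 4.0, wrong_kernel: 0.5, slow_kernel: 5.0, fast_kernel: 1.0,
}
queue = [wrong_kernel, slow_kernel, fast_kernel]

def propose(best: Kernel, feedback: str) -> Optional[Kernel]:
    # A real proposer would use `best` and `feedback`; this one replays a fixed list.
    return queue.pop(0) if queue else None

best, best_ms = optimize(
    baseline_kernel, propose, lambda k: FAKE_LATENCY_MS[k], [1.0, 2.0, 3.0]
)
```

In this toy run the incorrect candidate is discarded even though its fake latency is the lowest, which is the reason for verifying before timing.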

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	8	5,172	1,006	220	-43%