
Easily Build and Share ROCm Kernels with Hugging Face

Blog post from Hugging Face

Post Details

Company: Hugging Face
Author: Abdennacer Badaoui, Daniel Huang, colorswind, and Zesen Liu
Word Count: 3,120
Summary

Custom kernels are essential for high-performance deep learning: they let GPU operations be tailored to specific workloads, such as image processing or tensor transformations. Compiling these kernels for different architectures and integrating them into PyTorch extensions can be challenging, but Hugging Face's kernel-builder and kernels libraries simplify the process by supporting multiple GPU backends, including ROCm for AMD GPUs. This guide covers creating, testing, and sharing ROCm-compatible kernels, using the RadeonFlow GEMM kernel as an example. That kernel is optimized for the AMD Instinct MI300X GPU and uses the low-precision FP8 format to increase throughput and reduce memory-bandwidth usage, while preserving accuracy through per-block scaling. The guide explains how to structure projects, configure build files, and register custom kernels as native PyTorch operators, leveraging tools like Nix for reproducible builds. Once built, these kernels can be shared on the Hugging Face Hub, making them readily available to the community.
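To make the per-block scaling idea concrete, here is a minimal pure-Python sketch. It is illustrative only and is not the RadeonFlow implementation (which runs in HIP on the GPU): each block of values gets its own scale derived from the block's maximum magnitude and the FP8 E4M3 representable maximum (~448), so even a narrow storage format keeps relative error small per block. The rounding step below is a crude stand-in for an actual FP8 cast.

```python
# Illustrative sketch of per-block scaling for low-precision storage.
# Assumption: FP8 E4M3 max finite magnitude ~= 448 (per the common spec);
# the rounding below only emulates precision loss, it is not a real FP8 cast.

FP8_E4M3_MAX = 448.0

def quantize_block(block):
    """Quantize one block of floats; return (quantized values, scale)."""
    amax = max(abs(v) for v in block) or 1.0
    # Choose the scale so the block's largest magnitude maps near the
    # format's maximum representable value.
    scale = amax / FP8_E4M3_MAX
    quantized = [round(v / scale) for v in block]  # crude stand-in for FP8
    return quantized, scale

def dequantize_block(quantized, scale):
    """Recover approximate original values using the stored block scale."""
    return [q * scale for q in quantized]

values = [0.5, -3.2, 100.0, 0.01]
q, s = quantize_block(values)
restored = dequantize_block(q, s)
# With per-block scaling, the worst-case absolute error within a block
# stays bounded by about half the block's scale.
max_err = max(abs(a - b) for a, b in zip(values, restored))
```

In a real FP8 GEMM, these per-block scales are carried alongside the quantized tiles and applied during accumulation, which is why accuracy holds up despite the format's limited dynamic range.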