Pallas for people who know JAX but not kernels yet
Blog post from Hugging Face
Pallas is an experimental extension to JAX for writing custom kernels on GPUs and TPUs. You keep working in Python with the JAX primitives you already know, but you need a deeper understanding of how memory is laid out and moved at the kernel level. Unlike standard JAX operations, a Pallas kernel reads and writes memory directly through Refs, which gives fine-grained control over each step of the computation. That control over memory and tiling is crucial for performance on modern accelerators such as NVIDIA GPUs and TPUs. Pallas lowers kernels to Mosaic on TPUs and to Mosaic GPU on newer NVIDIA GPUs; a secondary Triton backend for GPUs also exists but is less recommended. Pallas also introduces program instances and grids to organize parallel work: the grid defines how many instances to launch, and each instance is assigned the block of data it should handle. Debugging and optimizing a kernel usually means running it in the interpret and debug modes first to confirm that it is correct, which matters most when moving from interpreted to compiled execution on TPUs and GPUs.
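To make Refs, grids, and interpret mode concrete, here is a minimal sketch of a blocked elementwise add. It is an illustration, not code from the original post: the names `add_kernel` and `blocked_add` and the block size of 128 are assumptions, and because Pallas is experimental the exact `pallas_call` and `BlockSpec` arguments may differ between JAX versions.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl


def add_kernel(x_ref, y_ref, o_ref):
    # The arguments are Refs, not arrays: each program instance sees
    # only its own block and writes its result in place.
    o_ref[...] = x_ref[...] + y_ref[...]


def blocked_add(x, y, block=128, interpret=True):
    # Hypothetical helper: launches one program instance per block.
    n = x.shape[0]
    assert n % block == 0, "sketch assumes the length divides evenly"
    # Program instance i handles block i of every operand.
    spec = pl.BlockSpec(block_shape=(block,), index_map=lambda i: (i,))
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        grid=(n // block,),      # how many program instances to launch
        in_specs=[spec, spec],   # which block each instance reads
        out_specs=spec,          # which block each instance writes
        interpret=interpret,     # interpret mode: no accelerator needed
    )(x, y)


x = jnp.arange(512, dtype=jnp.float32)
y = jnp.ones(512, dtype=jnp.float32)
out = blocked_add(x, y)

# Compare against plain JAX before trying a compiled run.
assert bool(jnp.allclose(out, x + y))
```

Setting `interpret=False` asks Pallas to compile the kernel for the available backend instead. The block size chosen here is arbitrary and may need adjusting to fit the tiling constraints of the target hardware.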