Author: Robert Nishihara

Summary

The software stack for AI compute consists of three layers: the training and inference framework, the distributed compute engine, and the container orchestrator. Training and inference frameworks, such as PyTorch and vLLM, are designed for model parallelism and transformer-specific optimization. The distributed compute engine, such as Ray, handles scheduling, data movement, and failure handling. The container orchestrator, such as Kubernetes or SLURM, allocates resources and manages the lifecycle of containers. Companies including Pinterest, Uber, and Roblox use this stack to run AI workloads spanning training, inference, and batch processing. Post-training frameworks, such as VeRL, SkyRL, OpenRLHF, Open-Instruct, and NeMo-RL, are also built on this stack, often combining Ray, PyTorch, vLLM, and other technologies.