Achieving Up to 67% Cost Savings with Prefill-Decode Disaggregation Using Ray + vLLM on AMD MI325X

Post Details

Company

Anyscale

Date Published

June 12, 2026

Author

Kourosh Hakhamaneshi

Word Count

2,090

Language

English

Hacker News Points

-

Source URL

www.anyscale.com/blog/ray-vllm-prefill-decode-disaggregation-amd-mi325x-67-percent-savings

Summary

This blog post explores prefill-decode (PD) disaggregation with Ray Serve LLM and vLLM on AMD MI325X GPUs and reports up to 2.7x better "goodput," which the post translates into cost savings of up to 67%. PD disaggregation places the prefill and decode phases on dedicated GPUs, which eliminates interference between them and lets each phase run closer to its theoretical throughput. The benefits include consistent time per output token (TPOT) under load and savings that compound over long output sequences. The costs are operational: the KV cache must be transferred between the prefill and decode GPUs, and the prefill-to-decode (P:D) ratio must be tuned per workload. The post identifies where PD helps, namely TPOT- or end-to-end (E2E) latency-sensitive workloads, and where aggregated serving is preferable, especially when time to first token (TTFT) is the critical constraint. It also covers the use of RIXL for KV transfer on AMD MI325X and stresses matching the P:D ratio to workload demand to avoid the pitfalls of PD disaggregation.
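
The link between goodput and cost can be made concrete with a small sketch. It assumes a common definition of goodput (requests per second that meet both a TTFT and a TPOT latency target) and assumes both deployments use the same number of GPUs at the same hourly price; the summary states neither, so treat both as illustrative assumptions. The names `RequestStats`, `goodput`, and `cost_savings` are hypothetical and do not come from Ray or vLLM.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # time per output token, in seconds


def goodput(requests, duration_s, ttft_slo_s, tpot_slo_s):
    """Requests per second that met BOTH latency targets (assumed definition)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    ok = sum(
        1
        for r in requests
        if r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s
    )
    return ok / duration_s


def cost_savings(goodput_ratio):
    """Fractional cost reduction per target-meeting request.

    Assumes equal GPU count and price on both sides, so cost per good
    request scales as 1 / goodput and the saving is 1 - 1 / ratio.
    """
    if goodput_ratio <= 0:
        raise ValueError("goodput_ratio must be positive")
    return 1.0 - 1.0 / goodput_ratio


# Hypothetical measurements over a 2-second window, not taken from the post.
sample = [
    RequestStats(ttft_s=0.4, tpot_s=0.03),  # meets both targets
    RequestStats(ttft_s=0.5, tpot_s=0.12),  # misses the TPOT target
    RequestStats(ttft_s=1.5, tpot_s=0.04),  # misses the TTFT target
    RequestStats(ttft_s=0.3, tpot_s=0.05),  # meets both targets
]
sample_goodput = goodput(sample, duration_s=2.0, ttft_slo_s=1.0, tpot_slo_s=0.05)

print(f"goodput: {sample_goodput:.2f} req/s")
print(f"savings at 2.7x goodput: {cost_savings(2.7):.0%}")
print(f"savings at 3.0x goodput: {cost_savings(3.0):.0%}")
```

Under this simple model a 2.7x goodput ratio corresponds to a saving of about 63%, and a 3x ratio to about 67%. The summary does not state which comparison or cost accounting produces each headline figure, so the sketch only illustrates the general inverse relationship.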