The article is a practical guide to accurately timing individual operations in a computational graph, with a focus on machine learning models running on GPUs. Because GPU kernels execute asynchronously with respect to the host, naive wall-clock timing can measure only kernel launch overhead rather than kernel execution; the guide therefore covers host-device synchronization, CUDA events, warm-up iterations, fixed (locked) clocks, cache flushing, and sleep/CUDA graphs as techniques for accurate, repeatable measurements. The examples and tips are specific to PyTorch, but the principles apply to CUDA programming in general.
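The measurement loop implied by these techniques (warm up, run the operation to a synchronization point, repeat, and aggregate) can be sketched in framework-agnostic Python. The helper name `bench` and its parameters are illustrative, not from the article; on a GPU the callable would need to end with a synchronization call such as `torch.cuda.synchronize()`, or be timed with CUDA events instead of the host clock:

```python
import time
import statistics

def bench(fn, warmup=10, iters=50):
    """Time a callable with warm-up and repeated measurement.

    Note: on a GPU, fn must include a host-device synchronization
    (e.g. torch.cuda.synchronize() in PyTorch); otherwise the host
    timer only captures asynchronous kernel-launch overhead.
    """
    # Warm-up: amortize one-time costs (compilation, allocator
    # growth, cold instruction/data caches) before measuring.
    for _ in range(warmup):
        fn()
    # Repeat the measurement and report the median, which is
    # more robust to outliers than a single run or the mean.
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Usage: time a small CPU-side stand-in for an operation.
elapsed = bench(lambda: sum(i * i for i in range(10_000)))
print(f"median: {elapsed * 1e6:.1f} us")
```

The same skeleton carries over to CUDA-event timing: the `time.perf_counter()` calls are replaced by recording start/stop events on the stream and reading their elapsed time after synchronization.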