How to Accurately Time CUDA Kernels in PyTorch

Blog post from Speechmatics

Post Details
Company: Speechmatics
Author: Lawrence Atkins
Word Count: 1,075
Language: English
Summary

The article is a practical guide to accurately timing individual operations in a computational graph, particularly in machine learning models running on GPUs. It covers the techniques needed for accurate and repeatable measurements: host-device synchronization, CUDA events, warm-up steps, fixed GPU clocks, cache flushing, and using a GPU sleep or CUDA graphs to hide CPU launch overhead. Its examples and tips are specific to PyTorch, but the principles apply to CUDA programming in general.
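As a rough illustration of several of these techniques, here is a minimal PyTorch sketch, assuming a CUDA-capable GPU. It runs warm-up iterations, flushes the L2 cache before each measurement, brackets the operation with CUDA events, and synchronizes host and device before reading the timings. The helper name `time_cuda_op`, its default arguments, and the 256 MB flush buffer size are hypothetical choices for illustration, not taken from the article; fixed clocks and the sleep/CUDA-graph technique are not shown.

```python
import torch


def time_cuda_op(fn, *args, warmup=3, iters=10, flush_l2=True):
    """Return per-iteration GPU times (milliseconds) for fn(*args).

    Hypothetical helper: assumes a CUDA device; defaults are illustrative.
    """
    device = torch.device("cuda")
    # Assumed size: 256 MB is much larger than the L2 cache of typical GPUs,
    # so zeroing this buffer evicts data left over from the previous run.
    flush_buf = torch.empty(256 * 1024 * 1024, dtype=torch.int8, device=device)

    # Warm-up: absorb one-time costs such as context setup, lazy kernel
    # loading, and caching-allocator growth before anything is measured.
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()

    starts = [torch.cuda.Event(enable_timing=True) for _ in range(iters)]
    ends = [torch.cuda.Event(enable_timing=True) for _ in range(iters)]
    for i in range(iters):
        if flush_l2:
            flush_buf.zero_()  # enqueued before the start event, so not timed
        starts[i].record()  # timestamp taken on the GPU, not the host
        fn(*args)
        ends[i].record()

    # Kernel launches are asynchronous: wait for the GPU to finish
    # before reading the recorded events.
    torch.cuda.synchronize()
    return [s.elapsed_time(e) for s, e in zip(starts, ends)]
```

For example, with `a = torch.randn(4096, 4096, device="cuda")`, calling `time_cuda_op(torch.matmul, a, a)` returns a list of per-iteration times in milliseconds. Timing each iteration separately, rather than the whole loop, lets you report a median, which is less sensitive to outliers than a mean.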