Open-Source Low-Overhead NVIDIA CUDA PC Sampling

Blog post from Polar Signals

Post Details
Word Count: 1,781
Language: English
Summary

The CUDA Profiling Tools Interface (CUPTI) supports Program Counter (PC) sampling, which lets developers analyze the performance of CUDA programs at the instruction level: it shows where warps stall and why, so that code can be optimized accordingly. The feature has traditionally been used in developer tools such as NVIDIA Nsight, but a low-overhead continuous profiler now makes it practical to run in production with minimal performance impact.

PC sampling relies on dedicated hardware: at a configurable interval, the sampler in each streaming multiprocessor records the state of an active warp, capturing its PC offset and stall reason. Samples carry no timestamps or call stacks. To keep data collection efficient, the implementation uses a dynamic algorithm that periodically enables and disables PC sampling, while a shim library interfaces with CUPTI to manage the collected data and send it to a backend for analysis.

On the backend, the PC/stall-reason pairs are processed and symbolized to give detailed insight into GPU stalls, extending what continuous profiling tools such as Polar Signals can show. Users can therefore keep a comprehensive production profiling setup that also captures instruction-level GPU data for performance optimization.
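The summary says only that the profiler periodically enables and disables PC sampling; it does not describe the actual algorithm. The sketch below is a hypothetical illustration of one way such a duty cycle could work. The `DutyCycle` class, its parameters, and the rule that halves or doubles the sampling window against a sample budget are all assumptions made for this example, not the method from the post and not CUPTI settings.

```python
from dataclasses import dataclass


@dataclass
class DutyCycle:
    """Hypothetical controller that toggles PC sampling on and off.

    Sampling is enabled for `on_ms` out of every `period_ms`. After each
    enabled window, `on_ms` is adapted so the number of samples collected
    per window stays near `sample_budget` (an assumed tuning knob).
    """
    period_ms: int = 1000
    on_ms: int = 100
    min_on_ms: int = 10
    sample_budget: int = 10_000

    def enabled_at(self, t_ms: int) -> bool:
        """True if sampling should be enabled at wall-clock time t_ms."""
        return t_ms % self.period_ms < self.on_ms

    def adjust(self, samples_in_window: int) -> None:
        """Halve the on-window when over budget; double it when far under."""
        if samples_in_window > self.sample_budget:
            self.on_ms = max(self.min_on_ms, self.on_ms // 2)
        elif samples_in_window < self.sample_budget // 4:
            self.on_ms = min(self.period_ms, self.on_ms * 2)
        # Otherwise the window is left unchanged.

    @property
    def active_fraction(self) -> float:
        """Fraction of wall-clock time during which sampling is enabled."""
        return self.on_ms / self.period_ms


if __name__ == "__main__":
    dc = DutyCycle()
    print(dc.enabled_at(50), dc.enabled_at(500))  # True False
    dc.adjust(50_000)  # far over budget: window shrinks from 100 ms to 50 ms
    print(dc.on_ms, dc.active_fraction)
```

The point of a scheme like this is that the cost of sampling is only paid while it is enabled, and the adaptive window reacts to busy periods by collecting less. The real profiler may use entirely different signals and thresholds.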
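Because each sample is only a PC offset plus a stall reason, with no timestamp or call stack, a natural backend representation is a histogram of sample counts. The sketch below is a minimal illustration, assuming each already-symbolized sample arrives as a hypothetical `PCSample(function, pc_offset, stall_reason)` record; the field names, kernel names, and stall-reason strings are illustrative and are not CUPTI's or Polar Signals' actual data format.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PCSample(NamedTuple):
    """One hypothetical PC sample: where the warp was and why it stalled."""
    function: str      # kernel/function containing the PC (after symbolization)
    pc_offset: int     # PC offset within that function
    stall_reason: str  # e.g. "memory_dependency" (illustrative name)


def aggregate(samples: Iterable[PCSample]) -> Counter:
    """Count samples per (function, pc_offset, stall_reason).

    With no timestamps or stacks, the sample count is the only weight:
    more samples at a PC/stall pair means more sampled time spent there.
    """
    return Counter(samples)


def top_stalls(counts: Counter, n: int = 3) -> list[tuple[PCSample, int]]:
    """Return the n most frequently sampled PC/stall-reason pairs."""
    return counts.most_common(n)


def stall_breakdown(counts: Counter, function: str) -> dict[str, int]:
    """Total samples per stall reason for one function, across all its PCs."""
    totals: dict[str, int] = {}
    for sample, count in counts.items():
        if sample.function == function:
            totals[sample.stall_reason] = totals.get(sample.stall_reason, 0) + count
    return totals


if __name__ == "__main__":
    raw = [
        PCSample("matmul_kernel", 0x120, "memory_dependency"),
        PCSample("matmul_kernel", 0x120, "memory_dependency"),
        PCSample("matmul_kernel", 0x1A0, "execution_dependency"),
        PCSample("reduce_kernel", 0x040, "barrier"),
    ]
    counts = aggregate(raw)
    for sample, count in top_stalls(counts):
        print(f"{sample.function}+{sample.pc_offset:#x} {sample.stall_reason}: {count}")
    print(stall_breakdown(counts, "matmul_kernel"))
```

Keying on the full triple keeps per-instruction detail, while `stall_breakdown` shows how the same data rolls up into a per-kernel view of why warps were waiting.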