The open-source NVIDIA CUDA profiler supports continuous production monitoring by injecting a shared library into application processes, with no changes to the build process. On Kubernetes, the pattern uses an init container and a shared volume: the init container copies the profiler library onto the volume, and the main container sets the CUDA_INJECTION64_PATH environment variable to point at it. Any CUDA workload in the pod, whether a PyTorch training job, TensorFlow, JAX, or custom C++ code, then loads the library transparently at startup, and the profiler captures every kernel launch, memory transfer, and synchronization event.

Users can confirm the injection worked by checking the application logs or by inspecting the container's environment directly. Once running, the profiler reports CUDA function execution times, which feed directly into optimization decisions such as batch sizing and operator fusion.

Future blog posts will explore more detailed use cases and upcoming features. In the meantime, the documentation and the Discord community are the best places for further resources and support.
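As a concrete reference for the injection pattern described above, here is a minimal pod-spec sketch. The image names, library filename, and mount paths are placeholders for illustration, not the profiler's actual artifact names:

```yaml
# Hypothetical pod spec illustrating the init-container injection pattern.
# Image names, library names, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: train-with-profiler
spec:
  volumes:
    - name: profiler-lib
      emptyDir: {}                               # shared volume visible to both containers
  initContainers:
    - name: copy-profiler
      image: example.com/cuda-profiler:latest    # placeholder profiler image
      command: ["cp", "/profiler/libprofiler.so", "/shared/libprofiler.so"]
      volumeMounts:
        - name: profiler-lib
          mountPath: /shared
  containers:
    - name: trainer
      image: pytorch/pytorch:latest              # any CUDA workload works the same way
      command: ["python", "train.py"]
      env:
        - name: CUDA_INJECTION64_PATH            # CUDA loads this library in the process
          value: /shared/libprofiler.so
      volumeMounts:
        - name: profiler-lib
          mountPath: /shared
```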
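With a pod like this running, checking the environment directly is as simple as `kubectl exec train-with-profiler -- env | grep CUDA_INJECTION64_PATH`, which should print the shared-volume path set above; the application logs provide the second confirmation that the library was picked up.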