Deep Learning with Multiple GPUs on Rescale: Torch
Blog post from Rescale
The article introduces techniques for scaling deep neural network (DNN) training across multiple GPUs with the Torch machine learning library, which researchers favor for its flexibility and for open-source extensions where the latest deep learning advances often appear first. It covers two main parallelization strategies: Model Parallelism, in which different parts of a single network are placed on different GPUs, and Data Parallelism, in which a full copy of the network runs on each GPU and processes a different mini-batch. A simple Torch example demonstrates these concepts, and the article details how to use DataParallelTable to distribute mini-batches across multiple GPUs, as sketched below. It also references a practical implementation that trains Wide Residual Networks on CIFAR10, showing that a multi-GPU setup can significantly reduce training time. The discussion is part of a broader series on multi-GPU and multi-node scaling, with later posts promised to cover other libraries and configurations.
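
The DataParallelTable usage described in the post follows the standard pattern from Torch's cunn package. A minimal sketch of that pattern is shown below; the network layers, batch size, and GPU IDs are illustrative assumptions, not values taken from the article.

```lua
require 'nn'
require 'cunn'  -- provides CUDA-backed modules and nn.DataParallelTable

-- A small example network (layer sizes chosen only for illustration).
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 16, 5, 5))   -- 3x32x32 -> 16x28x28
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))     -- 16x28x28 -> 16x14x14
net:add(nn.View(16 * 14 * 14))                -- flatten per example
net:add(nn.Linear(16 * 14 * 14, 10))          -- 10-class output

-- Wrap the network in DataParallelTable, splitting each mini-batch
-- along dimension 1 (the batch dimension) across GPUs 1 and 2.
local dpt = nn.DataParallelTable(1)
dpt:add(net, {1, 2})
dpt:cuda()

-- The wrapped model behaves like any other nn module: each GPU runs a
-- replica on its share of the batch, and gradients are accumulated.
local input = torch.CudaTensor(64, 3, 32, 32):uniform()
local output = dpt:forward(input)
print(output:size())  -- 64 x 10
```

With this wrapper in place, the usual training loop (forward, criterion, backward, parameter update) is unchanged, which is what makes data parallelism a low-effort way to use additional GPUs.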