Azure Linux RDMA Setup Tips
Blog post from Rescale
Microsoft's Linux RDMA support on Azure makes it practical to run high-performance computing (HPC) workloads in the cloud, but setting it up remains difficult because documentation is limited. The available tutorial deploys virtual machines with the older Azure Service Management (ASM) model, whereas Microsoft recommends Azure Resource Manager (ARM), which provisions resources in parallel and so shortens startup time for larger clusters. The vanilla SLES VHD may be missing package repositories; re-adding them restores access to a wider range of packages. Updating the RDMA drivers may also be necessary because it resolves DAPL errors, even though outdated guidance advises against updating in certain regions. Installing the OSTC Extension can reboot the VM and drop any open SSH connection, so scripts that automate cluster deployment should account for the interruption. Rescale’s support team can help with further setup and tuning of HPC software on Azure so that clusters are used efficiently.
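One way an automation script might cope with the reboot that follows the extension install is to poll the VM's SSH port until it accepts connections again before continuing. The sketch below is only an illustration of that idea, not part of the original tutorial: the function name wait_for_ssh, the default port 22, and the timeout and interval values are all assumptions chosen for the example.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=600.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as the port accepts a connection, or False
    if `timeout` seconds pass without a successful connection.
    All defaults here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means the SSH port is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the VM is likely still rebooting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A deployment script could call wait_for_ssh(node_address) for each node after installing the extension and only then reopen its SSH session. Note that a successful probe shows only that the port is open; if the probe runs before the reboot has actually started, it will succeed immediately, so a script may want to pause briefly or confirm the disconnect first.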