Lightweight Azure InfiniBand Cluster Setup
Blog post from Rescale
Microsoft's Big Compute solution adds InfiniBand connectivity with RDMA support to its new A8 and A9 cloud instances, addressing the slow interconnects that have held cloud-based high-performance computing (HPC) back relative to on-premises clusters. The result is near bare-metal network performance, which makes these instances attractive to enterprises that need high-bandwidth, low-latency networks for tightly coupled simulations. Current limitations include the requirement that applications run on Windows and the fact that configuring an MPI cluster on Windows is more involved than on Linux, the platform most HPC practitioners are accustomed to. Microsoft provides tools such as HPC Pack to help with setup, but HPC Pack is itself perceived as complex, and applications must be recompiled against the MS-MPI libraries; both may be barriers to entry. Despite these challenges, the strong results in latency and bandwidth tests show the potential of Big Compute, and future support for Linux VMs could further increase its appeal in the HPC market.
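For readers unfamiliar with how latency and bandwidth tests of this kind are usually structured, here is a minimal ping-pong sketch. It is not the benchmark used in the post: it runs over a local socket pair from the Python standard library rather than MS-MPI over InfiniBand, so its numbers say nothing about A8/A9 performance. The function names (`ping_pong`, `echo_peer`, `recv_exact`) and the message sizes are illustrative assumptions.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, since recv may return short reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_peer(sock, nbytes, iterations):
    """Second endpoint: receive each message and send it straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong(nbytes, iterations=200):
    """Return (one_way_latency_seconds, bandwidth_bytes_per_second).

    Assumption: a local socket pair stands in for the real interconnect.
    """
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, nbytes, iterations))
    peer.start()
    payload = b"x" * nbytes
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            a.sendall(payload)      # ping
            recv_exact(a, nbytes)   # pong
        elapsed = time.perf_counter() - start
    finally:
        a.close()
        peer.join()
        b.close()
    # Each iteration is one round trip, i.e. two one-way transfers.
    one_way = elapsed / (2 * iterations)
    return one_way, nbytes / one_way


if __name__ == "__main__":
    latency, _ = ping_pong(1, iterations=100)          # tiny message: latency
    _, bandwidth = ping_pong(1 << 20, iterations=20)   # 1 MiB message: bandwidth
    print(f"one-way latency: {latency * 1e6:.1f} us")
    print(f"bandwidth: {bandwidth / 1e6:.1f} MB/s")
```

The pattern is the same one MPI ping-pong benchmarks follow: very small messages expose per-message latency, while large messages amortize that overhead and approach the link's sustained bandwidth.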