Hyperplane-16 InfiniBand Cluster Total Cost of Ownership Analysis

Blog post from Lambda

Post Details
Company: Lambda
Author: Stephen Balaban
Word Count: 1,329
Language: English
Summary

The Hyperplane-16 is a 10 kW deep learning training appliance from Lambda. It includes 16x NVIDIA Tesla V100 SXM3 GPUs, NVLink and NVSwitch for fast GPU-to-GPU communication within the server, and 8x 100 Gb/s InfiniBand cards for fast GPU-to-GPU communication across servers during distributed training. The total cost of ownership (TCO) calculator uses a default system price of $240,000. For a cluster of three Hyperplane-16 systems, the analysis shows a total upfront cost of $868,939 and annual operating costs of $114,578. The 5-year TCO is $1,441,829, which is the upfront cost plus five years of operating costs. The analysis is based on a priced bill of materials, with prices adjusted according to system specifications and component costs. The network topology, a spine-and-leaf configuration, is generated dynamically to accommodate the cluster's components.
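The figures in the summary can be checked with a few lines of arithmetic. The sketch below is not Lambda's TCO calculator; it assumes the simplest possible model (upfront cost plus a constant annual operating cost, with no discounting or depreciation), and the function and variable names are hypothetical. The node-share calculation additionally assumes that all three systems are priced at the $240,000 default, which the summary does not confirm.

```python
def total_cost_of_ownership(upfront_cost, annual_operating_cost, years):
    """Simple TCO model: upfront cost plus constant yearly operating cost.

    Assumption: no discounting, depreciation, or resale value.
    """
    return upfront_cost + annual_operating_cost * years


# Figures quoted in the summary for the 3x Hyperplane-16 cluster.
UPFRONT_COST = 868_939           # USD, total upfront cost
ANNUAL_OPERATING_COST = 114_578  # USD per year
YEARS = 5

tco = total_cost_of_ownership(UPFRONT_COST, ANNUAL_OPERATING_COST, YEARS)
print(f"5-year TCO: ${tco:,}")  # 5-year TCO: $1,441,829

# Assumption: every node is priced at the calculator's default of $240,000.
DEFAULT_SYSTEM_PRICE = 240_000
NODE_COUNT = 3
node_cost = DEFAULT_SYSTEM_PRICE * NODE_COUNT
other_upfront = UPFRONT_COST - node_cost  # everything that is not a node
print(f"Nodes at default price: ${node_cost:,}")
print(f"Remaining upfront cost: ${other_upfront:,}")
```

Under this model the quoted numbers are internally consistent: $868,939 + 5 × $114,578 = $1,441,829. If the default-price assumption holds, the nodes account for $720,000 of the upfront cost; what the remaining $148,939 covers is not broken down in the summary.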