Nomad 0.9 introduces device plugins, which allow physical hardware devices to be detected, fingerprinted, and made available to the Nomad scheduler. The NVIDIA GPU device plugin is one of the first plugins built on this feature, enabling users to schedule workloads that benefit from GPU acceleration on Nomad clusters. With device plugins, users can request specific devices and attach affinities and constraints to them, giving fine-grained control over where workloads are placed (see the example sketch below). This capability builds on Nomad's mission of running any application on any infrastructure, providing a production-ready path for GPU-accelerated workloads.

The integration with NVIDIA's TensorRT Inference Server enables users to deploy deep learning models as production services, addressing concerns around request routing, monitoring, parallelization, scalability, and cost. By pairing Nomad with the NVIDIA GPU device plugin, users can take deep learning models from training to production smoothly and efficiently.
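As a rough sketch of what this looks like in a job specification, the snippet below requests a single NVIDIA GPU and attaches a memory constraint and affinity to it. The job name, Docker image, and attribute values are illustrative, not taken from a specific deployment:

```hcl
job "gpu-example" {
  datacenters = ["dc1"]
  type        = "batch"

  group "gpu" {
    task "smi" {
      driver = "docker"

      config {
        # Illustrative CUDA-enabled image; any GPU workload would do.
        image   = "nvidia/cuda:9.0-base"
        command = "nvidia-smi"
      }

      resources {
        # Request one device fingerprinted by the NVIDIA GPU plugin.
        device "nvidia/gpu" {
          count = 1

          # Hard requirement: only place on GPUs with at least 2 GiB of memory.
          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "2 GiB"
          }

          # Soft preference: favor GPUs with 4 GiB or more.
          affinity {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "4 GiB"
            weight    = 50
          }
        }
      }
    }
  }
}
```

With a specification like this, the scheduler only places the task on nodes whose fingerprinted GPUs satisfy the constraint, and among eligible nodes it prefers those that also match the affinity.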