Modular natively supports dynamic shapes for AI workloads
Blog post from Modular
In the AI industry, infrastructure challenges often lead to a narrow focus on simple metrics like QPS, latency, and throughput, producing tools that excel in benchmarks but struggle in practical deployments. One critical yet frequently overlooked capability is support for dynamic shapes, which has a major impact on real-world performance and usability.

The Modular AI Engine fully supports dynamic shapes. On Intel CPUs running BERT over the GLUE dataset, it outperforms traditional compilers like XLA, achieving 5x faster compile times and 2x faster runtime. The engine supports a wide range of models from popular frameworks and combines the usability of dynamic execution with the performance of a compiler-based approach. Static compilers, by contrast, require cumbersome workarounds such as padding every sequence to a fixed length.

Hybrid approaches exist, but the Modular AI Engine's dynamic compiler delivers superior performance without the need for such mitigations. The result is a seamless experience that reduces cost and latency while simplifying the deployment process for AI developers.
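To make the padding workaround concrete, here is a minimal sketch (plain PyTorch, not Modular's API) contrasting the fixed-length padding that static-shape compilers force on variable-length inputs, such as BERT sequence lengths, with batching that simply tracks the actual input sizes. The `MAX_LEN` bucket size and the helper functions are illustrative assumptions, not part of any framework.

```python
import torch

MAX_LEN = 128  # assumed fixed bucket size required when shapes must be static

def pad_to_static(token_ids: list[list[int]], pad_id: int = 0) -> torch.Tensor:
    """Pad every sequence to MAX_LEN so the compiled graph sees one shape.
    Wasted compute grows with the gap between real and padded length."""
    batch = torch.full((len(token_ids), MAX_LEN), pad_id, dtype=torch.long)
    for i, ids in enumerate(token_ids):
        batch[i, : len(ids)] = torch.tensor(ids[:MAX_LEN], dtype=torch.long)
    return batch

def batch_dynamic(token_ids: list[list[int]], pad_id: int = 0) -> torch.Tensor:
    """With dynamic-shape support, pad only to the longest sequence in the
    batch, so compute scales with the actual input sizes."""
    longest = max(len(ids) for ids in token_ids)
    batch = torch.full((len(token_ids), longest), pad_id, dtype=torch.long)
    for i, ids in enumerate(token_ids):
        batch[i, : len(ids)] = torch.tensor(ids, dtype=torch.long)
    return batch

sequences = [[101, 2023, 2003, 102], [101, 2460, 102]]  # toy token ids
print(pad_to_static(sequences).shape)   # torch.Size([2, 128]) -- mostly padding
print(batch_dynamic(sequences).shape)   # torch.Size([2, 4])   -- sized to the data
```

With static shapes, every batch pays for the full 128-token bucket regardless of how short its sequences are; a dynamic-shape engine can execute on the shapes it is actually given, which is where the cost and latency savings come from.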