
GPUs are the most expensive resource in tech. We’re using them badly.

Blog post from Momento

Post Details
Company: Momento
Author: Allen Helton
Word Count: 886
Language: English
Summary

GPUs, originally designed to render video game graphics, are now essential and expensive components of AI infrastructure, yet architectural and operational constraints often leave them poorly utilized. Although GPUs excel at parallel processing, AI serving systems keep session-specific data, known as the KV cache, on the individual GPU that handled the session so that earlier computation does not have to be repeated. Requests are therefore routed back to the same GPU (sticky session routing), which leaves some GPUs overworked while others sit idle and wastes capacity across large fleets. The problem resembles those seen in other stateful systems such as distributed databases and CDNs, but it is complicated by the large size of the cached state and the high cost of GPU memory. Researchers and infrastructure engineers are beginning to address these inefficiencies, and solutions are likely to emerge as the two fields converge.
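
To make the imbalance concrete, here is a minimal sketch in Python of how sticky session routing can skew load. It is not taken from the post: the function names, the CRC32-based session hash, the fleet of four GPUs, and the workload mix are all assumptions chosen for illustration.

```python
import zlib


def sticky_route(session_id: str, num_gpus: int) -> int:
    """Pin a session to one GPU so its KV cache can be reused (assumed hash scheme)."""
    return zlib.crc32(session_id.encode()) % num_gpus


def sticky_loads(requests: list[str], num_gpus: int) -> list[int]:
    """Count requests per GPU when every session sticks to its assigned GPU."""
    loads = [0] * num_gpus
    for session_id in requests:
        loads[sticky_route(session_id, num_gpus)] += 1
    return loads


def round_robin_loads(requests: list[str], num_gpus: int) -> list[int]:
    """Count requests per GPU when sessions are ignored.

    This balances load but gives up KV cache reuse, so it is only a baseline.
    """
    loads = [0] * num_gpus
    for i, _ in enumerate(requests):
        loads[i % num_gpus] += 1
    return loads


def imbalance(loads: list[int]) -> float:
    """Ratio of the busiest GPU's load to the mean load (1.0 means perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical workload: one long-running session plus ten short ones.
requests = ["agent-1"] * 90 + [f"chat-{i}" for i in range(10)]

sticky = sticky_loads(requests, num_gpus=4)
balanced = round_robin_loads(requests, num_gpus=4)

print("sticky     :", sticky, "imbalance", round(imbalance(sticky), 2))
print("round robin:", balanced, "imbalance", round(imbalance(balanced), 2))
```

In this toy workload the GPU that owns the long session receives at least 90 of the 100 requests, while the round-robin baseline spreads them evenly at the cost of discarding the cached state. That trade-off between cache reuse and even utilization is the tension the summary describes.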