How AI Gateway runs on Fluid compute
Blog post from Vercel
AI Gateway is a Node.js service that connects applications to many AI models through a single interface, processing billions of tokens daily. It runs on Fluid compute, which lets a single instance handle many invocations concurrently instead of spinning up a separate serverless instance per request, improving both resource efficiency and cost.

Requests reach AI Gateway through Vercel's global delivery network: Anycast routing steers traffic to the nearest Point of Presence (PoP), keeping communication with AI providers low-latency and high-throughput. Because most of a gateway request is spent waiting on upstream model APIs, Active CPU pricing cuts costs further: full CPU rates apply only while the service is actively computing, and idle wall time is billed at a lower memory-only rate.

Vercel also provides native observability, with detailed real-time metrics on performance, provider health, and cost, so the service stays reliable and resilient as model APIs fluctuate. Together, this lets developers ship AI features quickly without managing provider connections or the underlying compute.
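The "single interface to many models" idea can be sketched as a small routing layer. This is an illustrative example, not Vercel's actual implementation: the provider table, endpoints, and `resolveModel` helper are all hypothetical.

```typescript
// Hypothetical sketch of unified model routing: a model ID like
// "openai/gpt-4o" is split into a provider (with its own endpoint and
// auth scheme) and a provider-specific model name. Endpoints shown are
// illustrative only.

type ProviderConfig = { baseUrl: string; authHeader: string };

const providers: Record<string, ProviderConfig> = {
  openai: { baseUrl: "https://api.openai.com/v1", authHeader: "Authorization" },
  anthropic: { baseUrl: "https://api.anthropic.com/v1", authHeader: "x-api-key" },
};

// Resolve a unified "provider/model" ID into its routing target.
function resolveModel(modelId: string): { provider: ProviderConfig; model: string } {
  const slash = modelId.indexOf("/");
  if (slash === -1) throw new Error(`Expected "provider/model", got "${modelId}"`);
  const providerName = modelId.slice(0, slash);
  const provider = providers[providerName];
  if (!provider) throw new Error(`Unknown provider: ${providerName}`);
  return { provider, model: modelId.slice(slash + 1) };
}
```

Callers see one consistent model namespace; only the gateway knows each provider's endpoint and authentication details.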