Building an AI Gateway on Fastly Compute
Blog post from Fastly
AI applications often fail for mundane reasons: a provider goes down, or a model name is hardcoded deep in the application, and these problems compound as AI workloads grow into complex multi-step processes. This post introduces a proof-of-concept Edge AI Gateway built on Fastly Compute that addresses these challenges by placing a policy-driven routing layer between applications and large language model (LLM) providers. The application sends a standard request; the gateway classifies it at the edge and selects the appropriate provider and model based on factors such as complexity and cost, without any changes to the application itself.

The system runs on Fastly's low-latency Compute platform, using WebAssembly for fast cold starts and secure sandboxing. A classification model, Mercury 2, makes the routing decision quickly, which saves cost and latency by reserving expensive models for requests that actually need them. Routing policies live in Fastly's KV Store, so they can be updated without redeploying the service, and provider credentials are managed securely in Fastly's Secret Store.

Although still a proof of concept, the gateway shows how multi-provider AI operations can be run more efficiently, with future capabilities including improved failover and caching.
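To make the classify-then-route flow concrete, here is a minimal sketch of how it might look in a Fastly Compute service written against the JavaScript/TypeScript SDK. The backend names (`classifier`, `openai`, `small-model-provider`), the endpoint URLs, the response shape of the classifier, and the `classifyPrompt` helper are assumptions for illustration, not the implementation from the original post.

```typescript
/// <reference types="@fastly/js-compute" />

// Hypothetical provider target; backend names must be configured on the Fastly service.
interface RouteTarget {
  backend: string; // Fastly backend name, e.g. "openai" (assumed)
  url: string;     // provider endpoint
  model: string;   // model to request
}

// Ask a small classifier model at the edge how "hard" the prompt is.
// The classifier backend and its response shape are assumptions for this sketch.
async function classifyPrompt(prompt: string): Promise<"simple" | "complex"> {
  const resp = await fetch("https://classifier.example.com/classify", {
    backend: "classifier",
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const { label } = (await resp.json()) as { label: "simple" | "complex" };
  return label;
}

async function handleRequest(event: FetchEvent): Promise<Response> {
  // The client sends a standard OpenAI-style chat completion request.
  const incoming = (await event.request.json()) as {
    messages: { role: string; content: string }[];
  };
  const prompt = incoming.messages.map((m) => m.content).join("\n");

  // Route simple prompts to a smaller, cheaper model; hard ones to a larger one.
  const label = await classifyPrompt(prompt);
  const target: RouteTarget =
    label === "simple"
      ? { backend: "small-model-provider", url: "https://small.example.com/v1/chat/completions", model: "small-model" }
      : { backend: "openai", url: "https://api.openai.com/v1/chat/completions", model: "gpt-4o" };

  // Forward the original request with the model chosen at the edge.
  return fetch(target.url, {
    backend: target.backend,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ ...incoming, model: target.model }),
  });
}

addEventListener("fetch", (event) => event.respondWith(handleRequest(event)));
```

Here the `backend` option on `fetch` is the Fastly Compute extension that directs the request to a named backend defined on the service; the client keeps sending one standard request format regardless of which provider ultimately serves it.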
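The same sketch can be extended to pull routing policy from the KV Store and credentials from the Secret Store, which is what allows policies to change without a redeploy. Again, the store names (`ai-gateway-policies`, `ai-gateway-secrets`), the keys, and the policy JSON shape are assumptions used only to illustrate the pattern.

```typescript
/// <reference types="@fastly/js-compute" />
import { KVStore } from "fastly:kv-store";
import { SecretStore } from "fastly:secret-store";

// Assumed shape of a routing policy document stored in the KV Store.
interface RoutingPolicy {
  routes: { label: string; backend: string; url: string; model: string }[];
  fallback: { backend: string; url: string; model: string };
}

// Load the current routing policy; editing the KV Store entry changes routing
// behaviour without redeploying the Compute service.
async function loadPolicy(): Promise<RoutingPolicy | null> {
  const store = new KVStore("ai-gateway-policies"); // store name is an assumption
  const entry = await store.get("routing-policy");
  return entry ? ((await entry.json()) as RoutingPolicy) : null;
}

// Fetch the provider API key from the Secret Store rather than baking it
// into code or configuration files.
async function providerApiKey(name: string): Promise<string> {
  const secrets = new SecretStore("ai-gateway-secrets"); // store name is an assumption
  const secret = await secrets.get(name);
  if (!secret) throw new Error(`missing secret: ${name}`);
  return secret.plaintext();
}

// Example use inside the handler from the previous sketch:
//   const policy = await loadPolicy();
//   const route = policy?.routes.find((r) => r.label === label) ?? policy?.fallback;
//   const apiKey = await providerApiKey(route.backend);
//   ...then send the provider request with an "authorization: Bearer <apiKey>" header.
```

Keeping the policy as data in the KV Store rather than as code is what makes the gateway "policy-driven": operators can reroute traffic between providers or models by updating a single KV entry, while the Secret Store keeps provider API keys out of the application and its configuration.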