The text discusses the limitations of traditional serverless computing for Large Language Model (LLM) workloads, which require sustained compute and long-running, continuous execution patterns that the one-invocation-per-instance model handles poorly. It introduces Fluid, a new compute model designed to address these challenges by prioritizing existing resources before spawning new ones, scaling within a function rather than only across instances, and dynamically reallocating compute where it is needed. This approach reduces cold-start overhead, enables efficient scaling, and ensures that every function invocation actively contributes to processing rather than sitting idle. Fluid compute also provides edge security via the Vercel Firewall, a secure instance architecture, and enhanced reliability and availability, making it well suited to AI workloads that demand both efficiency and security.
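The text does not show Fluid's actual API surface, but the core idea of scaling inside a function can be sketched conceptually: one warm instance serves many overlapping invocations, sharing setup cost and staying busy during slow I/O such as LLM calls. The following is a minimal, hypothetical Node.js illustration (the handler, the `fakeLlmCall` helper, and the port are all assumptions for demonstration, not Fluid's implementation):

```ts
// Hypothetical sketch: with in-function concurrency, a single warm instance
// handles many overlapping requests instead of spawning one instance each.
import http from "node:http";

// Per-instance state, initialized once and reused across concurrent
// invocations. In a traditional one-invocation-per-instance model, every
// request would pay this setup cost in its own cold instance.
const startedAt = Date.now();
let inFlight = 0;

// Stand-in for a slow LLM call. While one invocation awaits the response,
// the same instance keeps accepting and serving other requests.
async function fakeLlmCall(prompt: string): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 2_000));
  return `echo: ${prompt}`;
}

const server = http.createServer(async (req, res) => {
  inFlight += 1;
  const answer = await fakeLlmCall(req.url ?? "");
  res.end(
    JSON.stringify({
      answer,
      inFlight, // exceeds 1 under concurrent load: one instance, many invocations
      instanceAgeMs: Date.now() - startedAt, // grows across requests: the instance stays warm
    }),
  );
  inFlight -= 1;
});

server.listen(3000);
```

Firing several requests at this server at once shows `inFlight` climbing above 1 and `instanceAgeMs` growing, which is the behavior the text attributes to Fluid: existing capacity is reused before new instances are spawned, so idle wait time inside an invocation no longer blocks other work.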