Durable Execution: The Key to Harnessing AI Agents in Production
Blog post from Inngest
Durable execution, a programming model that ensures code runs to completion even in the face of failures, reached the early majority in 2025, driven by AI agent infrastructure needs and new offerings from AWS, Cloudflare, and Vercel. AI agents introduce multiple failure points, such as orchestration and human-in-the-loop (HITL) interactions, and benefit from durable execution's capabilities: automatic state persistence, retries, and workflow resumption. These features make agents production-ready by handling operations that are complex, probabilistic, compositional, and stateful. HITL patterns, crucial for AI agent oversight, align well with durable execution because workflows can pause and resume without losing state. Reliable tool calls depend on the ability to checkpoint between calls and maintain execution context across failures. As demand for interactive, user-facing AI agents grows, durable execution is evolving to support low-latency patterns that allow real-time conversational experiences. Innovations such as durable endpoints, optimistic execution, and edge-based execution are being integrated into durable execution engines to reduce latency and improve reliability.
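
To make checkpointing, retries, and pause-and-resume concrete, here is a minimal sketch in Python. It is a toy model, not Inngest's API or any vendor's: `DurableRun`, `step`, `WaitingForApproval`, and `agent_workflow` are hypothetical names, and a plain dictionary stands in for the durable store a real engine would provide.

```python
import json
from typing import Any, Callable, Dict


class WaitingForApproval(Exception):
    """Raised to pause the workflow until a human decision arrives (hypothetical)."""


class DurableRun:
    """Toy durable-execution context that records step results in a journal."""

    def __init__(self, journal: Dict[str, str], max_attempts: int = 3) -> None:
        self.journal = journal  # stands in for a durable store (assumption)
        self.max_attempts = max_attempts

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: a step that already completed returns its saved result.
        if name in self.journal:
            return json.loads(self.journal[name])
        last_error = None
        for _ in range(self.max_attempts):
            try:
                result = fn()
            except Exception as exc:  # retry on any failure in this sketch
                last_error = exc
                continue
            # Checkpoint the result before moving on to the next step.
            self.journal[name] = json.dumps(result)
            return result
        raise RuntimeError(f"step {name!r} failed") from last_error


def agent_workflow(run: DurableRun, calls: Dict[str, int],
                   approvals: Dict[str, bool]) -> Dict[str, Any]:
    def search() -> list:
        calls["search"] += 1
        return ["doc-1", "doc-2"]

    docs = run.step("search", search)

    def draft() -> str:
        calls["draft"] += 1
        if calls["draft"] == 1:
            raise ConnectionError("transient model error")
        return f"summary of {len(docs)} docs"

    text = run.step("draft", draft)

    # Human-in-the-loop: stop here until an approval has been recorded.
    if "review" not in approvals:
        raise WaitingForApproval("review")
    return {"text": text, "approved": approvals["review"]}


journal: Dict[str, str] = {}
calls = {"search": 0, "draft": 0}

try:
    agent_workflow(DurableRun(journal), calls, approvals={})
    paused = False
except WaitingForApproval:
    paused = True

# A new DurableRun over the same journal models a fresh process resuming.
result = agent_workflow(DurableRun(journal), calls, approvals={"review": True})
```

In this sketch the first run retries the failing `draft` step once and then pauses for review. The second run replays both steps from the journal, so neither tool call executes again, and the workflow finishes with the human decision attached. A production engine would persist the journal outside the process and deliver the approval as an event rather than as a function argument.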