Running AI workflows in production: what we learned when Gemini went down
Blog post from Kestra
The author walks through a Kestra flow that routes incoming GitHub issues to the correct product squad, using an internal ownership map kept in Notion. Only one step, classifying the issue, relies on an AI model; every other step in the flow is deterministic. To cope with high demand on, and outright failures of, models such as Gemini 3.1 Pro, the flow falls back to older model versions instead of failing. Three further practices keep it running: retry policies absorb transient failures, concurrency limits keep request volume under the provider's quota, and a single alert path reports every failure the same way. The author argues that the model should be treated as one component of a larger workflow rather than its centerpiece. Orchestration then absorbs the model's outages and quirks, which makes the whole system easier to operate and more reliable.
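The resilience pattern described above can be illustrated with a minimal Python sketch. This is not the author's Kestra flow. In Kestra, retries, concurrency limits and alerting are declared as task and flow properties rather than written by hand. The sketch only shows how the pieces fit together. Every name in it is hypothetical: `call_model`, `alert`, `TransientModelError`, the model names, the two-slot concurrency cap and `OWNERSHIP_MAP`, which stands in for the Notion ownership map.

```python
import threading
import time
from typing import Callable, Optional, Sequence


class TransientModelError(Exception):
    """Hypothetical: a failure worth retrying, e.g. an overloaded model endpoint."""


# Hypothetical concurrency limit: at most 2 model calls in flight at once,
# standing in for a flow-level limit that keeps usage under the provider quota.
MODEL_SLOTS = threading.BoundedSemaphore(2)

# Hypothetical stand-in for the Notion ownership map: label -> owning squad.
OWNERSHIP_MAP = {
    "billing": "payments-squad",
    "ui": "frontend-squad",
}


def classify_with_fallback(
    issue_text: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Try each model in order; retry transient errors with exponential backoff."""
    for model in models:
        for attempt in range(max_attempts):
            try:
                with MODEL_SLOTS:
                    return call_model(model, issue_text)
            except TransientModelError:
                # Back off before the next attempt, but not after the last one.
                if attempt < max_attempts - 1:
                    time.sleep(base_delay * 2 ** attempt)
        # Every attempt on this model failed: fall through to the next (older) model.
    return None


def route_issue(
    issue_text: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    alert: Callable[[str], None],
    base_delay: float = 0.5,
) -> Optional[str]:
    """Return the owning squad, or send one alert and return None on any failure."""
    try:
        label = classify_with_fallback(
            issue_text, models, call_model, base_delay=base_delay
        )
    except Exception:
        # Non-transient error: no retry, go straight to the alert path.
        label = None
    # Deterministic step: only labels present in the ownership map are accepted.
    squad = OWNERSHIP_MAP.get(label) if label is not None else None
    if squad is None:
        # Single alert path, whether the models failed or returned an unknown label.
        alert(f"Could not route issue (label={label!r}): {issue_text[:60]}")
    return squad
```

For example, if a fake `call_model` raises `TransientModelError` for `"primary-model"` and returns `"billing"` for `"fallback-model"`, then `route_issue("Invoice total is wrong", ["primary-model", "fallback-model"], fake, alerts.append, base_delay=0)` returns `"payments-squad"` without raising an alert. The model's answer is only trusted after it passes a deterministic lookup. An unexpected label is therefore handled the same way as an outage.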