Prompt Caching with Deep Agents
Blog post from LangChain
Prompt caching cuts token costs for AI agents running at scale: the provider stores a snapshot of the model's state after it processes a prompt and reuses that snapshot on later requests that share the same prefix. The Deep Agents harness applies prompt caching across model providers to minimize API costs and maximize cache reads. It sets explicit cache breakpoints automatically where the provider supports them and opts into implicit caching otherwise. Providers differ in their support for explicit breakpoints, configurable TTLs, and cache prewarming, but Deep Agents lets users switch providers without losing the cost savings. Real-world evaluations with claude-haiku-4-5, gpt-5.4-mini, and gemini-3.5-flash showed token cost reductions of 49% to 80%. As provider features evolve, Deep Agents will integrate new capabilities, and tools like LangSmith provide observability into API costs and caching efficiency to help optimize agents further.
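To make the cost arithmetic concrete, here is a minimal sketch of how cache reads lower input-token spend over a multi-turn agent loop. It is not taken from the post or from the Deep Agents codebase: the `Usage` class, the function names, the token counts, and the price multipliers (cache reads at 0.1x and cache writes at 1.25x the base input price) are all illustrative assumptions, and actual pricing varies by provider and model.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Input-token counts for one request (hypothetical shape, not a provider schema)."""
    uncached_input: int   # tokens billed at the base input price
    cache_read: int = 0   # tokens served from an existing cache entry
    cache_write: int = 0  # tokens written to the cache on this request


def input_cost(usage: Usage, base_price: float = 1.0,
               read_multiplier: float = 0.1,
               write_multiplier: float = 1.25) -> float:
    """Input cost of one request; the multipliers are assumed, not provider-quoted."""
    return base_price * (
        usage.uncached_input
        + usage.cache_read * read_multiplier
        + usage.cache_write * write_multiplier
    )


def savings(usages: list[Usage], **pricing: float) -> float:
    """Fraction of input cost saved versus billing every token at the base price."""
    base_price = pricing.get("base_price", 1.0)
    with_cache = sum(input_cost(u, **pricing) for u in usages)
    without_cache = sum(
        base_price * (u.uncached_input + u.cache_read + u.cache_write)
        for u in usages
    )
    return 1 - with_cache / without_cache


# Illustrative 10-turn loop: an 8,000-token static prefix (system prompt and
# tool definitions) is written to the cache on turn 1 and read on turns 2-10,
# while each turn also sends 500 tokens that are not cached.
turns = [Usage(uncached_input=500, cache_write=8000)] + [
    Usage(uncached_input=500, cache_read=8000) for _ in range(9)
]

print(f"input cost saved: {savings(turns):.1%}")
```

The sketch simplifies a real agent loop, where the conversation history grows each turn and the cached prefix can grow with it. It still shows why a long, stable prefix that is read many times dominates the savings, and why the first request costs slightly more than an uncached one under these assumed multipliers.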
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| AI Agents | 1 | 4,874 | 1,103 | 240 | -1% |
| Observability | 1 | 3,430 | 674 | 183 | +0% |