Reproducing Variance: Caching in Agentic LLM Pipelines

Post Details

Company

AI21 Labs

Date Published

May 13, 2026

Author

Guy Lichtenfeld, Software Engineer • Orchestration

Word Count

2,024

Company Posts That Month

1

Language

English

Hacker News Points

-

Source URL

www.ai21.com/blog/caching-in-agentic-llm-pipelines

Summary

Optimizing agentic workflows built on non-deterministic Large Language Models (LLMs) raises the challenge of designing a cache that accommodates both reproducibility and variance. Traditional caches assume deterministic outputs and fall short for LLMs, whose outputs vary from call to call, with the degree of variation influenced by parameters such as temperature. To address this, the post describes a cache key design that encodes the position of an LLM call within the pipeline rather than relying on execution order. Cache keys therefore stay consistent regardless of the order in which processes complete, giving precise control over when a result is reused and when it is recomputed. The approach supports experimental setups such as best-of-N sampling, where variance is necessary to ensure the accuracy and independence of results. With specific rules for cache key construction, the system supports efficient A/B testing and iteration: only the components that need it are re-run and cached results are reused elsewhere, making experimentation with agentic workflows more feasible and cost-effective.
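The position-based key idea can be illustrated with a minimal sketch. This is not the post's actual implementation: the summary does not give the key format, so the function `make_cache_key`, the `position` tuple, the `sample_index` field, and the `PositionalCache` class are all hypothetical names chosen for illustration.

```python
import hashlib
import json


def make_cache_key(position, prompt, params, sample_index=0):
    """Hypothetical key builder: hashes WHERE a call sits in the pipeline,
    never WHEN it ran, so completion order cannot change the key."""
    payload = {
        "position": list(position),    # path of the call, e.g. ("plan", "critic")
        "prompt": prompt,
        "params": params,              # sampling settings such as temperature
        "sample_index": sample_index,  # keeps best-of-N samples distinct
    }
    # sort_keys makes the serialization independent of dict insertion order.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class PositionalCache:
    """Hypothetical in-memory cache keyed by the positional hash above."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        # Reuse a stored result when the key matches; otherwise run the call.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()
        self._store[key] = value
        return value
```

Under these assumptions, two calls at the same position with the same prompt, parameters, and sample index share a key and reuse one result, while changing `sample_index` yields a different key, so each best-of-N sample is computed independently. Editing one pipeline component changes only the keys at that position, so the rest can still be served from the cache.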

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	24	9,074	1,640	224	+53%