Company
Date Published
Author
Sri Chavali, Elizabeth Hutton, Aparna Dhinakaran
Word count
1364
Language
English
Hacker News points
None

Summary

When large language models (LLMs) are used as evaluators, two design choices strongly influence the quality and transparency of their judgments: whether to require explanations and whether to use chain-of-thought (CoT) prompting. Explanations improve alignment with human judgments by reducing variance, exposing the factors behind each decision, and producing reusable data for retraining or improving models; whether the explanation comes before or after the label has little effect on accuracy, though it does affect how clearly the reasoning reads. CoT prompting, while widely adopted, shows mixed effectiveness: it helps most on tasks that require multi-step reasoning but adds complexity and cost on simpler ones. Modern reasoning models, which deliberate internally, often outperform base models and make explicit CoT prompting less necessary, though they trade this for higher latency and cost. The recommendation is therefore to include explanations in the evaluator's output so that decisions can be audited and the evaluation setup refined, with careful attention to prompt design, score definitions, and bias mitigation to keep evaluations reliable.
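To make the explanation-plus-label pattern concrete, below is a minimal sketch of an LLM-as-judge call that asks for a label first and an explanation after it. It assumes the OpenAI Python SDK; the model name, prompt wording, task, and score definitions are all illustrative, not the article's exact setup.

```python
# A minimal LLM-as-judge sketch: request a label plus an explanation so the
# judgment can be audited later. Assumes the OpenAI Python SDK; the model,
# prompt wording, and score definitions are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are evaluating whether a response is relevant to a question.

Question: {question}
Response: {response}

Score definitions:
- "relevant": the response directly addresses the question.
- "irrelevant": the response does not address the question.

First output exactly one label ("relevant" or "irrelevant") on its own line,
then a brief explanation of the decision factors behind your label."""


def judge(question: str, response: str) -> tuple[str, str]:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of judge model
        temperature=0,        # reduce variance across repeated judgments
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(
                    question=question, response=response
                ),
            }
        ],
    )
    text = completion.choices[0].message.content.strip()
    # Label on the first line, explanation on the lines after it.
    label, _, explanation = text.partition("\n")
    return label.strip(), explanation.strip()


label, explanation = judge(
    "What is chain-of-thought prompting?",
    "It asks the model to reason step by step before answering.",
)
print(label)        # e.g. "relevant"
print(explanation)  # auditable reasoning to log alongside the label
```

Putting the label first keeps parsing simple while still capturing the explanation; per the summary, the ordering has little effect on accuracy, so swapping the two is mainly a question of how clearly the reasoning reads and how you want to parse the output.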