LLM as a Judge prompts: templates, rubrics, and best practices

Post Details

Company

Galtea

Date Published

May 18, 2026

Author

-

Word Count

4,027

Company Posts That Month

5

Language

English

Hacker News Points

-

Source URL

galtea.ai/blog/llm-as-a-judge-prompts-templates-rubrics-and-best-practices

Summary

The text provides an in-depth guide to writing and optimizing Large Language Model (LLM) judge prompts: small programs that evaluate AI-generated content against specific criteria. A successful LLM-as-a-judge prompt has four essential parts: a criterion definition written in domain-specific vocabulary, a reasoning structure that evaluates the output claim by claim, a deterministic scoring rule, and explicit handling of edge cases. The guide stresses precise rubric design for accurate, reliable judgments and cautions against vague language or overly complex rationale structures, which can produce biased or inconsistent results.

It also covers common pitfalls in judge-prompt design, such as an implicit length preference or mixing generator instructions with judge instructions. For calibration, it recommends versioning prompts alongside gold sets so that any alignment regression can be tracked and attributed to a specific change. Finally, it advises against writing custom prompts when deterministic checks suffice or when calibrated, published prompts are available, and stresses that judge prompts are hypotheses that need rigorous testing and refinement before deployment.
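The four-part structure and the gold-set versioning practice summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not Galtea's template. The faithfulness criterion, the `claim: ... | verdict: ...` output format, the `NO_CLAIMS` sentinel, and every function name (`build_judge_prompt`, `prompt_version`, `score_from_verdicts`, `gold_agreement`) are hypothetical. The model call itself is omitted, so the functions operate on judge output strings. Here the scoring rule is applied in code rather than by the model, which is one way to keep it deterministic.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical four-part judge prompt (criterion, reasoning, scoring, edge cases).
CRITERION = (
    "Faithfulness: every factual claim in the ANSWER must be supported by "
    "the CONTEXT. Paraphrase is allowed; new facts are not."
)
REASONING = (
    "List each factual claim in the ANSWER on its own line as "
    "'claim: <text> | verdict: SUPPORTED' or 'claim: <text> | verdict: UNSUPPORTED'."
)
SCORING = "Do not output a score. The score is computed from your verdicts."
EDGE_CASES = (
    "If the ANSWER contains no factual claims (for example, a refusal), "
    "output the single line 'NO_CLAIMS'."
)


def build_judge_prompt(context: str, answer: str) -> str:
    """Assemble the four instruction parts, then the material to judge."""
    return (
        f"CRITERION\n{CRITERION}\n\n"
        f"REASONING\n{REASONING}\n\n"
        f"SCORING\n{SCORING}\n\n"
        f"EDGE CASES\n{EDGE_CASES}\n\n"
        f"CONTEXT:\n{context}\n\nANSWER:\n{answer}\n"
    )


def prompt_version() -> str:
    """Short content hash of the judge instructions, used as a version ID."""
    text = "\n".join([CRITERION, REASONING, SCORING, EDGE_CASES])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def score_from_verdicts(judge_output: str) -> Optional[float]:
    """Deterministic rule: fraction of claims marked SUPPORTED.

    Returns None for the NO_CLAIMS edge case; raises if nothing parses.
    """
    lines = [ln.strip() for ln in judge_output.splitlines() if ln.strip()]
    if lines == ["NO_CLAIMS"]:
        return None
    verdicts = [ln.rsplit("verdict:", 1)[1].strip() for ln in lines if "verdict:" in ln]
    if not verdicts:
        raise ValueError("judge output contains no parsable verdicts")
    return sum(v == "SUPPORTED" for v in verdicts) / len(verdicts)


def gold_agreement(gold: List[Tuple[str, bool]]) -> Dict[str, object]:
    """Agreement between judge pass/fail (all claims supported) and gold labels.

    The result is tagged with the prompt version so a drop in agreement can
    be attributed to a specific prompt change.
    """
    matches = 0
    scored = 0
    for judge_output, gold_pass in gold:
        score = score_from_verdicts(judge_output)
        if score is None:
            continue  # NO_CLAIMS items are excluded rather than counted as passes
        scored += 1
        matches += (score == 1.0) == gold_pass
    return {
        "prompt_version": prompt_version(),
        "n": scored,
        "agreement": matches / scored if scored else 0.0,
    }
```

For example, `score_from_verdicts("claim: A | verdict: SUPPORTED\nclaim: B | verdict: UNSUPPORTED")` returns `0.5`. Storing each `gold_agreement` result next to its `prompt_version` keeps a history in which any regression lines up with the instruction text that caused it.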

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM Change
LLM	13	9,074	1,640	224	+53%
RAG	5	2,105	333	83	+124%
AI Guardrails	2	216	116	52	-40%
AI Agents	1	4,942	1,264	250	+12%
AI Coding Assistant	1	1,798	527	167	+21%
