
LLM as a Judge prompts: templates, rubrics, and best practices | Galtea Blog

Blog post from Galtea

Post Details
Company: Galtea
Word Count: 4,027
Company Posts That Month: 5
Language: English
Summary

The post is an in-depth guide to writing and refining LLM-as-a-judge prompts, which it treats as small programs that evaluate AI-generated content against specific criteria. According to the guide, an effective judge prompt has four essential parts: a criterion definition written in domain-specific vocabulary, a reasoning structure that evaluates the output claim by claim, a deterministic scoring rule, and explicit handling of edge cases. It stresses precise rubric design as the basis for accurate, reliable judgments, and cautions that vague language or overly complex rationale structures can lead to biased or inconsistent results.

It also covers common pitfalls in judge-prompt design, such as an implicit preference for longer responses and mixing generator instructions with judge instructions. For calibration, it recommends practices such as versioning prompts alongside gold sets, so that any alignment regression can be tracked and attributed to a specific change. Finally, it advises against writing custom prompts when deterministic checks are sufficient or when calibrated, published prompts already exist, and argues that judge prompts should be treated as hypotheses that need rigorous testing and refinement before deployment.
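The four-part prompt structure and the gold-set calibration described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Galtea's template: the names `JudgePrompt`, `score_claims`, `agreement`, and `calibration_record`, the JSON verdict format, and the "fraction of supported claims" scoring rule are all hypothetical choices made here. What it shows is one way to split the work so that the judge model only emits claim-level verdicts, the score is computed by code, and every calibration result is stored with the prompt version that produced it.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Hypothetical container for the four parts named in the summary, plus a
# version tag so calibration results can be tied to one prompt revision.
@dataclass(frozen=True)
class JudgePrompt:
    version: str
    criterion: str   # criterion definition in domain-specific vocabulary
    reasoning: str   # claim-by-claim reasoning structure
    scoring: str     # output format that the deterministic rule will consume
    edge_cases: str  # explicit handling of edge cases

    def render(self, context: str, answer: str) -> str:
        """Assemble the judge prompt from judge-only sections (no generator text)."""
        sections = [
            ("CRITERION", self.criterion),
            ("PROCEDURE", self.reasoning),
            ("OUTPUT FORMAT", self.scoring),
            ("EDGE CASES", self.edge_cases),
            ("CONTEXT", context),
            ("ANSWER", answer),
        ]
        return "\n\n".join(f"{name}:\n{body}" for name, body in sections)


ALLOWED_VERDICTS = {"supported", "unsupported"}  # assumed verdict labels


def score_claims(judge_output: str) -> float | None:
    """Deterministic scoring rule applied in code, not by the judge model.

    Assumes the judge returns JSON such as
    [{"claim": "...", "verdict": "supported"}].
    Returns the fraction of supported claims, or None when the answer makes
    no claims, so that edge case is reported instead of silently scored.
    """
    verdicts = [item["verdict"] for item in json.loads(judge_output)]
    unknown = set(verdicts) - ALLOWED_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    if not verdicts:
        return None
    return verdicts.count("supported") / len(verdicts)


def agreement(judge_labels: list[bool], gold_labels: list[bool]) -> float:
    """Fraction of gold-set items where the judge matches the human label."""
    if not gold_labels or len(judge_labels) != len(gold_labels):
        raise ValueError("label lists must be non-empty and equal in length")
    matches = sum(j == g for j, g in zip(judge_labels, gold_labels))
    return matches / len(gold_labels)


def calibration_record(prompt: JudgePrompt, gold_set_id: str,
                       judge_labels: list[bool],
                       gold_labels: list[bool]) -> dict:
    """Store agreement together with the prompt version and gold-set id,
    so a later drop in agreement can be attributed to a specific change."""
    return {
        "prompt_version": prompt.version,
        "gold_set": gold_set_id,
        "agreement": agreement(judge_labels, gold_labels),
    }


if __name__ == "__main__":
    prompt = JudgePrompt(
        version="faithfulness-v1",
        criterion="An answer is faithful if every factual claim is entailed by CONTEXT.",
        reasoning="List each factual claim in ANSWER, then judge it against CONTEXT alone.",
        scoring='Return only JSON: [{"claim": "...", "verdict": "supported" | "unsupported"}].',
        edge_cases="If ANSWER makes no factual claims (e.g. a refusal), return [].",
    )
    print(prompt.render("Paris is the capital of France.", "Paris is in France."))
    raw = ('[{"claim": "Paris is in France", "verdict": "supported"},'
           ' {"claim": "Paris has 90M residents", "verdict": "unsupported"}]')
    print(score_claims(raw))  # 0.5
```

Keeping the arithmetic in `score_claims` rather than asking the model for a number is one way to make the scoring rule deterministic, and returning `None` for claim-free answers keeps that edge case visible in reports instead of folding it into a 0 or a 1.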

Trends Found in this Post
Trend                 Post Mentions  Total Month Mentions  Posts  Companies  MoM Change
LLM                   13             9,074                 1,640  224        +53%
RAG                   5              2,105                 333    83         +124%
AI Guardrails         2              216                   116    52         -40%
AI Agents             1              4,942                 1,264  250        +12%
AI Coding Assistant   1              1,798                 527    167        +21%