Braintrust Blog - Plushcap

Blog URL: www.braintrust.dev/blog

Posts YTD: 105 (↑ vs 13 last year)

Avg Posts/Month: 8.8 since 2026

Post Details

Title	Author	Published	Words	HN Points
Debugging Ralph Wiggum with Braintrust Logs	Jess Wang	2026-01-13	950	--
7 best LLM tracing tools for multi-agent AI systems (2026)	Braintrust Team	2026-01-13	2,494	--
AI observability tools: A buyer's guide to monitoring AI agents in production …	Braintrust Team	2026-01-14	4,005	--
Building observable AI agents with Temporal	Ethan Ruhe, Ornella Altunyan	2026-01-20	641	--
Testing if "bash is all you need"	Ankur Goyal	2026-01-22	857	--
Security is a choice: how Braintrust lets you decide where your AI …	--	2026-01-24	495	--
Langfuse alternatives: Top 5 competitors compared (2026)	Braintrust Team	2026-01-25	1,706	--
Arize AI alternatives: Top 5 Arize competitors compared (2026)	Braintrust Team	2026-01-25	1,682	--
5 best AI evaluation tools for AI systems in production (2026)	Braintrust Team	2026-01-25	2,081	--
5 best prompt engineering tools (and how to choose one in 2026)	Braintrust Team	2026-02-02	1,987	--
AI agent evaluation: A practical framework for testing multi-step agents (metrics, harnesses, …	Braintrust Team	2026-02-02	2,920	--
5 best AI agent observability tools for agent reliability in	Braintrust Team	2026-02-02	2,279	--
7 best prompt management tools in 2026 (tested and compared)	Braintrust Team	2026-02-02	2,045	--
What is LLM monitoring? (Quality, cost, latency, and drift in production)	Braintrust Team	2026-02-09	3,324	--
What is LLM observability? (Tracing, evals, and monitoring explained)	Braintrust Team	2026-02-09	3,118	--
What is LLM evaluation? A practical guide to evals, metrics, and regression …	Braintrust Team	2026-02-09	2,830	--
What is prompt management? Versioning, collaboration, and deployment for prompts	Braintrust Team	2026-02-09	2,452	--
The 5 pillars of AI model performance	Jess Wang	2026-02-12	3,186	--
Braintrust's series B: building the infrastructure for production AI	--	2026-02-17	728	--
What is prompt versioning? Best practices for iteration without breaking production	--	2026-02-19	3,207	--
What is eval-driven development: How to ship high-quality agents without guessing	--	2026-02-20	2,532	--
LLM monitoring vs LLM observability: What's the difference?	--	2026-02-20	2,599	--
What is prompt evaluation? How to test prompts with metrics and judges	--	2026-02-20	2,818	--
Trace keynote recap: See it, improve it, optimize it	--	2026-02-26	1,179	--
Automatically discover what matters in your production traces with Topics	--	2026-02-26	572	--
What is agent evaluation? How to test agents with tasks, simulations, and …	--	2026-02-28	2,222	--
What is an LLM-as-a-judge? When to use it (and when to use …	--	2026-02-28	3,008	--
What is agent observability? Tracing tool calls, memory, and multi-step reasoning	--	2026-02-28	2,116	--
What is RAG evaluation? Measuring retrieval quality and answer groundedness	--	2026-02-28	2,792	--
DeepEval alternatives (2026): Best tools for LLM evals, RAG, and agent testing	--	2026-03-02	2,687	--
7 best tools for debugging AI agents in production (2026)	--	2026-03-02	2,964	--
LangSmith alternatives (2026): Best tools for LLM tracing, evals, and prompt iteration	--	2026-03-03	1,893	--
Best Promptfoo alternatives in 2026: Open-source tools and SaaS	--	2026-03-04	2,594	--
How to build your first offline eval	--	2026-03-11	2,473	--
Supporting privacy and compliance for EU teams	--	2026-03-13	735	--
Braintrust vs Grafana for LLM observability: Logging vs evals	--	2026-03-13	2,100	--
Braintrust vs. Datadog for LLM observability: Logging vs. evals	--	2026-03-13	2,291	--
7 best prompt playgrounds for PMs in	--	2026-03-13	2,712	--
Logging vs. AI observability: Why logs alone aren't enough to monitor AI …	--	2026-03-13	2,471	--
Keep building with the Starter plan	--	2026-03-16	387	--
Evals for PMs: A practical guide to AI product quality	--	2026-03-18	2,224	--
What is AI observability?	--	2026-03-20	1,618	--
How to make requests to Gemini using the OpenAI SDK	--	2026-03-20	1,109	--
How to test AI models	--	2026-03-20	2,102	--
6 best LLM gateways for developers in	--	2026-03-20	1,787	--
How to make requests to Gemini using the Claude (Anthropic) SDK	--	2026-03-20	1,011	--
Evals are the new PRD	--	2026-03-28	1,518	--
How to make requests to Claude using the OpenAI SDK	--	2026-03-28	1,164	--
How to make requests to OpenAI using the Claude (Anthropic) SDK	--	2026-03-28	1,137	--
4 best LLM gateways for observability: tracing, cost attribution, and debuggability	--	2026-03-28	1,856	--
Best AI evals products for self-hosted / on-prem enterprise deployments (2026)	--	2026-03-28	2,595	--
8 best human-in-the-loop LLM evaluation platforms in	--	2026-04-04	3,230	--
Braintrust alternatives: What to consider (and why there's no true substitute)	--	2026-04-04	2,873	--
Braintrust CLI and MCP	--	2026-04-04	705	--
LLM-as-a-judge vs human-in-the-loop evals: When to use each	--	2026-04-04	3,333	--
The prompt optimization loop: How to improve prompts through iterative evaluation with …	--	2026-04-04	1,568	--
How Brainstore works: architecture for AI observability at scale	--	2026-04-07	2,051	--
Agentic eval development with the Braintrust CLI	--	2026-04-10	840	--
How to set up manual review workflows for AI agent traces	--	2026-04-10	1,949	--
LangSmith vs. Braintrust: Which AI evaluation platform is better?	--	2026-04-10	1,403	--
How to run human-in-the-loop evals for LLM apps	--	2026-04-10	1,671	--
Braintrust vs. Galileo AI: Which AI evaluation platform is better?	--	2026-04-10	1,397	--
How to prepare for AI compliance and governance	--	2026-04-14	993	--
Datadog LLM observability alternatives (2026): Better tools for AI quality	--	2026-04-21	2,241	--
PromptLayer alternatives for LLM evaluation teams (2026)	--	2026-04-21	2,175	--
Confident AI alternatives (2026): Best tools for LLM evaluation	--	2026-04-21	3,051	--
Best Galileo AI alternatives for LLM evaluation in	--	2026-04-21	1,925	--
Best Weights & Biases alternatives for LLM evaluation	--	2026-04-27	2,226	--
Braintrust vs. Confident AI: LLM evaluation platform comparison	--	2026-04-27	1,601	--
Braintrust vs. PromptLayer 2026: Prompt management vs. full AI quality platform	--	2026-04-27	1,454	--
7 best Grafana alternatives for LLM evaluation and AI quality	--	2026-04-27	2,223	--
How to earn stakeholder trust with evals and observability	--	2026-04-29	1,299	--
Best tools for tracking LLM costs in production (2026)	--	2026-04-30	2,038	--
How to reduce costs for LLMs using Braintrust	--	2026-04-30	2,104	--
Braintrust vs. Weights & Biases 2026: Which AI evaluation platform is better?	--	2026-04-30	1,312	--
Braintrust vs. Promptfoo: 2026 LLM evaluation comparison	--	2026-04-30	1,592	--
How AI observability helps lower LLM cost at scale	--	2026-05-07	2,424	--
Agent observability: The complete guide for	--	2026-05-07	2,483	--
Why your traces and evals belong in the same place	--	2026-05-11	603	--
How to evaluate multi-turn conversations	--	2026-05-15	2,216	--
LLM call observability: Tracing every request, response, and token in production	--	2026-05-17	3,860	--
Best tools for tracking LLM costs in production (2026)	--	2026-05-17	2,041	--
Best hallucination detection tools for LLM applications (2026): catch bad outputs before …	--	2026-05-21	3,056	--
Best RAG observability tools (2026): monitor retrieval and generation in production	--	2026-05-21	2,884	--
The six generations of AI agents and how to eval them	--	2026-05-22	5,533	--
How to improve your golden datasets with human review	--	2026-05-24	1,516	--
How to turn LLM production failures into regression tests	--	2026-06-01	3,035	--
What are Topics in Braintrust and how do they work? (2026)	--	2026-06-02	1,980	--
Best AI governance platforms for LLM applications (2026): Eval, audit, and enforce	--	2026-06-02	3,049	--
The easiest way to add LLM observability to your AI app (2026)	--	2026-06-02	3,146	--
How to design custom facets for AI agent traces (2026)	--	2026-06-02	2,925	--
Automate pattern discovery with Topics, now generally available	--	2026-06-02	750	--
AI observability is active observability	--	2026-06-02	336	--
How to track LLM costs (2026): A playbook for per-user, per-feature, and …	--	2026-06-03	2,845	--
How to track LLM token usage (2026): Prompt, completion, context window, and …	--	2026-06-03	2,435	--
How we made continuous trace intelligence possible at scale	--	2026-06-05	2,869	--
How to build continuous evaluation for AI agents with trace classifications (2026)	--	2026-06-10	1,911	--
What are AI hallucination evaluations? Metrics and methods that work in	--	2026-06-10	2,084	--
Best AI conversation analytics tools (2026): classify agent traffic at scale	--	2026-06-11	2,738	--
Best AI agent analytics tools (2026): see trends across every agent answer	--	2026-06-12	2,431	--
How to use Braintrust with any framework or provider	--	2026-06-17	1,640	--
How to discover hidden failure patterns in your AI agent's production traffic …	--	2026-06-18	2,016	--
How to mine AI agent production traffic for product roadmap signals (2026)	--	2026-06-18	2,235	--
How to analyze AI agent usage patterns to build eval datasets (2026)	--	2026-06-18	2,147	--
Best LLM routers and model routing platforms in	--	2026-06-18	2,657	--

Plushcap, by Matt Makai. 2021-2026.