Braintrust Blog - Plushcap

Blog URL

www.braintrust.dev/blog

Posts YTD

105 ↑ vs 13 last year

Avg Posts/Month

5.4 since 2024

[Chart: Monthly Post Volume]

Post Details

Title	Author	Published	Words	HN Points
What to do when a new AI model comes out	Ornella Altunyan	2024-12-04	459	1
Our approach to hybrid deployment	Ornella Altunyan	2025-01-08	586	--
How Notion develops world-class AI features	Ankur Goyal, Simon Last	2024-10-09	1,004	--
Eval feedback loops	Ankur Goyal	2024-04-17	1,002	--
Getting started with automated evaluations	Albert Zhang	2024-04-24	851	--
Copilot autocomplete in the Braintrust UI	Ankur Goyal	2024-09-05	524	--
Functions: flexible AI engineering primitives	Ornella Altunyan	2024-10-08	853	--
Support for Python tool functions	Ornella Altunyan	2024-11-13	285	--
Logging with attachments	Ornella Altunyan	2024-10-24	347	--
How Hostinger evaluates AI applications with Braintrust	Albert Zhang	2024-02-27	292	--
I ran an eval. Now what?	Albert Zhang, Ornella Altunyan	2024-10-17	1,041	--
How to improve your evaluations	Albert Zhang	2024-06-20	946	--
How Zapier builds production-ready AI products	Mike Knoop & Ankur Goyal	2024-05-30	1,161	2
Custom scoring functions in the Braintrust Playground	Ankur Goyal	2024-09-16	511	--
Building secure and scalable production apps with OpenAI’s Realtime API	Ornella Altunyan, Kevin Chen	2024-11-04	672	--
Announcing our $36 million Series A	Ankur Goyal	2024-10-08	476	--
Braintrust achieves SOC 2 Type II compliance	Ankur Goyal	2024-07-15	106	--
The top 10 most loved features of 2024	Ornella Altunyan	2024-12-31	433	--
Evaluating Gemini models for vision	Ornella Altunyan, Anirudh Baddepudi	2024-11-14	615	--
AI development loops	Taylor Laubach	2024-05-06	828	1
Braintrust selected to be in the Enterprise Tech 30	Ankur Goyal	2024-04-09	119	--
New monitor page for easy analytics	Ornella Altunyan	2024-12-18	250	--
Building a RAG app with MongoDB Atlas	Ornella Altunyan	2024-11-18	1,143	--
Evaluating agents	Ornella Altunyan	2025-01-22	2,161	1
How Loom auto-generates video titles	Ornella Altunyan, Matt Granmoe	2025-01-27	1,040	--
How Fintool generates millions of financial insights	Ornella Altunyan, Nicolas Bustamante	2025-01-31	738	--
Bedrock, Vertex AI, and universal structured outputs support	Ornella Altunyan	2025-02-11	385	--
Brainstore: the purpose-built database for the AI engineering era	Ankur Goyal	2025-03-03	1,692	5
Brainstore is now the default	Ankur Goyal	2025-03-31	616	--
Resilient observability by design	Ornella Altunyan, Sachin Padmanabhan	2025-04-03	767	--
Webinar recap: Eval best practices	Ornella Altunyan	2025-04-22	582	--
How Coursera builds next-generation learning tools	Ornella Altunyan, Winnie Tam, Sophie Gao	2025-05-12	1,110	--
Eval playgrounds for faster, focused iteration	Ornella Altunyan	2025-05-27	450	--
Experiments UI: Now 10x faster	Tara Nagar, Ornella Altunyan	2025-06-03	1,259	--
GPT-5 vs. Claude Opus 4.1	Ornella Altunyan, Wayde Gilliam, Sarah Zeng	2025-08-08	689	--
Braintrust is not an eval framework	Ankur Goyal	2025-07-14	1,276	--
The canonical agent architecture: A while loop with tools	Ankur Goyal	2025-08-07	891	--
Building with Grok	Wayde Gilliam	2025-07-11	681	--
Five hard-learned lessons about AI evals	Ankur Goyal	2025-07-17	903	--
How Graphite builds reliable AI code review at scale	Ornella Altunyan	2025-08-25	1,161	--
The rise of async programming	Ankur Goyal	2025-08-19	846	--
Systematic prompt engineering: From trial and error to data-driven optimization	Braintrust Team	2025-08-21	1,444	--
A/B testing can't keep up with AI	Mengying Li, Ankur Goyal	2025-09-03	732	--
AI observability: Why traditional monitoring falls short	Braintrust Team	2025-08-21	1,209	--
Testing different models with different prompts: A hands-on guide with Braintrust	Braintrust Team	2025-08-21	592	--
Testing different models with different prompts: A systematic approach to AI development	Braintrust Team	2025-08-21	1,381	--
The infrastructure behind AI development: Why testing and observability matter	Sarah Zeng	2025-08-21	1,015	--
The 4 best LLM evaluation platforms in 2025: Why Braintrust sets the …	Braintrust Team	2025-08-21	2,720	--
Integrating AI into production applications: Beyond the demo phase	Braintrust Team	2025-08-21	1,695	--
AI that knows your data	Ornella Altunyan	2025-09-13	447	--
10 best LLM evaluation tools with superior integrations in	Braintrust Team	2025-09-19	2,444	--
Why aspirational evals are critical when new AI models launch	Ornella Altunyan	2025-09-29	747	--
Top 10 LLM observability tools: Complete guide for	Braintrust Team	2025-10-02	4,372	--
Arize Phoenix vs. Braintrust: Which stack fits your LLM evaluation & observability …	Braintrust Team	2025-10-09	1,996	--
Measuring what matters: An intro to AI evals	Carlos Esteban	2025-10-10	1,693	--
How Dropbox automates evals for conversational AI	Ornella Altunyan	2025-10-15	1,544	--
Braintrust on the Vercel Marketplace	Ornella Altunyan	2025-10-16	567	--
The 4 best AI evals tools for running evaluations in your CI/CD …	Braintrust Team	2025-10-17	1,781	--
How Portola empowers subject matter experts to improve AI quality	Ornella Altunyan	2025-10-20	1,342	--
Braintrust Java SDK: AI observability and evals for the JVM	Andrew Kent	2025-10-23	495	--
The 5 best RAG evaluation tools in	Braintrust Team	2025-10-23	3,939	--
Customer stories - Braintrust blog - Braintrust	--	2025-10-25	281	--
Engineering - Braintrust blog - Braintrust	--	2025-10-25	136	--
Product - Braintrust blog - Braintrust	--	2025-10-25	489	--
Company - Braintrust blog - Braintrust	--	2025-10-25	263	--
Langfuse alternative: Braintrust vs. Langfuse for LLM observability	Braintrust Team	2025-10-27	952	--
How to eval: The Braintrust way	Braintrust Team	2025-10-27	2,179	--
Helicone alternative: Why Braintrust is the best pick	Braintrust Team	2025-10-28	4,313	--
LLM evaluation metrics: Full guide to LLM evals and key metrics	Braintrust Team	2025-10-28	2,490	--
The 5 best prompt versioning tools in	Braintrust Team	2025-10-28	4,592	--
RAG Evaluation Metrics: How to evaluate your RAG pipeline with Braintrust	Braintrust Team	2025-11-05	3,966	--
How to evaluate voice agents	Braintrust Team	2025-11-05	3,453	--
A/B testing for LLM prompts: A practical guide	Braintrust Team	2025-11-13	836	--
The 5 best prompt evaluation tools in	Braintrust Team	2025-11-17	4,112	--
The three pillars of AI observability	Ankur Goyal	2025-11-18	1,350	--
How to evaluate your agent with Gemini	Braintrust Team	2025-11-18	2,347	--
Turn production data into better AI with Loop	Ornella Altunyan	2025-11-24	760	--
Top 5 platforms for agent evals in	Braintrust Team	2024-11-24	2,353	--
How Retool uses Loop to turn production data into AI roadmap decisions	Ornella Altunyan	2025-11-28	1,536	--
Evals are a team sport: How we built Loop	Mengying Li, David Kim	2025-11-25	1,545	--
The 5 best LLMOps platforms in	Braintrust Team	2025-12-05	2,267	--
The 4 best LLM monitoring tools to understand how your AI agents …	Braintrust Team	2025-12-05	1,591	--
Top tools for evaluating voice agents in	Braintrust Team	2025-12-11	1,709	--
Brainstore makes AI observability at scale possible	Ornella Altunyan	2025-12-18	445	--
7 best AI observability platforms for LLMs in	Braintrust Team	2025-12-19	2,151	--
AI observability beyond Python and TypeScript	Ornella Altunyan	2025-12-22	179	--
Claude Code meets Braintrust	Morgane Palomares	2025-12-23	332	--
Debugging Ralph Wiggum with Braintrust Logs	Jess Wang	2026-01-13	950	--
7 best LLM tracing tools for multi-agent AI systems (2026)	Braintrust Team	2026-01-13	2,494	--
AI observability tools: A buyer's guide to monitoring AI agents in production …	Braintrust Team	2026-01-14	4,005	--
Building observable AI agents with Temporal	Ethan Ruhe, Ornella Altunyan	2026-01-20	641	--
Testing if "bash is all you need"	Ankur Goyal	2026-01-22	857	--
Security is a choice: how Braintrust lets you decide where your AI …	--	2026-01-24	495	--
Langfuse alternatives: Top 5 competitors compared (2026)	Braintrust Team	2026-01-25	1,706	--
Arize AI alternatives: Top 5 Arize competitors compared (2026)	Braintrust Team	2026-01-25	1,682	--
5 best AI evaluation tools for AI systems in production (2026)	Braintrust Team	2026-01-25	2,081	--
5 best prompt engineering tools (and how to choose one in 2026)	Braintrust Team	2026-02-02	1,987	--
AI agent evaluation: A practical framework for testing multi-step agents (metrics, harnesses, …	Braintrust Team	2026-02-02	2,920	--
5 best AI agent observability tools for agent reliability in	Braintrust Team	2026-02-02	2,279	--
7 best prompt management tools in 2026 (tested and compared)	Braintrust Team	2026-02-02	2,045	--
What is LLM monitoring? (Quality, cost, latency, and drift in production)	Braintrust Team	2026-02-09	3,324	--
What is LLM observability? (Tracing, evals, and monitoring explained)	Braintrust Team	2026-02-09	3,118	--
What is LLM evaluation? A practical guide to evals, metrics, and regression …	Braintrust Team	2026-02-09	2,830	--
What is prompt management? Versioning, collaboration, and deployment for prompts	Braintrust Team	2026-02-09	2,452	--
The 5 pillars of AI model performance	Jess Wang	2026-02-12	3,186	--
Braintrust's series B: building the infrastructure for production AI	--	2026-02-17	728	--
What is prompt versioning? Best practices for iteration without breaking production	--	2026-02-19	3,207	--
What is eval-driven development: How to ship high-quality agents without guessing	--	2026-02-20	2,532	--
LLM monitoring vs LLM observability: What's the difference?	--	2026-02-20	2,599	--
What is prompt evaluation? How to test prompts with metrics and judges	--	2026-02-20	2,818	--
Trace keynote recap: See it, improve it, optimize it	--	2026-02-26	1,179	--
Automatically discover what matters in your production traces with Topics	--	2026-02-26	572	--
What is agent evaluation? How to test agents with tasks, simulations, and …	--	2026-02-28	2,222	--
What is an LLM-as-a-judge? When to use it (and when to use …	--	2026-02-28	3,008	--
What is agent observability? Tracing tool calls, memory, and multi-step reasoning	--	2026-02-28	2,116	--
What is RAG evaluation? Measuring retrieval quality and answer groundedness	--	2026-02-28	2,792	--
DeepEval alternatives (2026): Best tools for LLM evals, RAG, and agent testing	--	2026-03-02	2,687	--
7 best tools for debugging AI agents in production (2026)	--	2026-03-02	2,964	--
LangSmith alternatives (2026): Best tools for LLM tracing, evals, and prompt iteration	--	2026-03-03	1,893	--
Best Promptfoo alternatives in 2026: Open-source tools and SaaS	--	2026-03-04	2,594	--
How to build your first offline eval	--	2026-03-11	2,473	--
Supporting privacy and compliance for EU teams	--	2026-03-13	735	--
Braintrust vs Grafana for LLM observability: Logging vs evals	--	2026-03-13	2,100	--
Braintrust vs. Datadog for LLM observability: Logging vs. evals	--	2026-03-13	2,291	--
7 best prompt playgrounds for PMs in	--	2026-03-13	2,712	--
Logging vs. AI observability: Why logs alone aren't enough to monitor AI …	--	2026-03-13	2,471	--
Keep building with the Starter plan	--	2026-03-16	387	--
Evals for PMs: A practical guide to AI product quality	--	2026-03-18	2,224	--
What is AI observability?	--	2026-03-20	1,618	--
How to make requests to Gemini using the OpenAI SDK	--	2026-03-20	1,109	--
How to test AI models	--	2026-03-20	2,102	--
6 best LLM gateways for developers in	--	2026-03-20	1,787	--
How to make requests to Gemini using the Claude (Anthropic) SDK	--	2026-03-20	1,011	--
Evals are the new PRD	--	2026-03-28	1,518	--
How to make requests to Claude using the OpenAI SDK	--	2026-03-28	1,164	--
How to make requests to OpenAI using the Claude (Anthropic) SDK	--	2026-03-28	1,137	--
4 best LLM gateways for observability: tracing, cost attribution, and debuggability	--	2026-03-28	1,856	--
Best AI evals products for self-hosted / on-prem enterprise deployments (2026)	--	2026-03-28	2,595	--
8 best human-in-the-loop LLM evaluation platforms in	--	2026-04-04	3,230	--
Braintrust alternatives: What to consider (and why there's no true substitute)	--	2026-04-04	2,873	--
Braintrust CLI and MCP	--	2026-04-04	705	--
LLM-as-a-judge vs human-in-the-loop evals: When to use each	--	2026-04-04	3,333	--
The prompt optimization loop: How to improve prompts through iterative evaluation with …	--	2026-04-04	1,568	--
How Brainstore works: architecture for AI observability at scale	--	2026-04-07	2,051	--
Agentic eval development with the Braintrust CLI	--	2026-04-10	840	--
How to set up manual review workflows for AI agent traces	--	2026-04-10	1,949	--
LangSmith vs. Braintrust: Which AI evaluation platform is better?	--	2026-04-10	1,403	--
How to run human-in-the-loop evals for LLM apps	--	2026-04-10	1,671	--
Braintrust vs. Galileo AI: Which AI evaluation platform is better?	--	2026-04-10	1,397	--
How to prepare for AI compliance and governance	--	2026-04-14	993	--
Datadog LLM observability alternatives (2026): Better tools for AI quality	--	2026-04-21	2,241	--
PromptLayer alternatives for LLM evaluation teams (2026)	--	2026-04-21	2,175	--
Confident AI alternatives (2026): Best tools for LLM evaluation	--	2026-04-21	3,051	--
Best Galileo AI alternatives for LLM evaluation in	--	2026-04-21	1,925	--
Best Weights & Biases alternatives for LLM evaluation	--	2026-04-27	2,226	--
Braintrust vs. Confident AI: LLM evaluation platform comparison	--	2026-04-27	1,601	--
Braintrust vs. PromptLayer 2026: Prompt management vs. full AI quality platform	--	2026-04-27	1,454	--
7 best Grafana alternatives for LLM evaluation and AI quality	--	2026-04-27	2,223	--
How to earn stakeholder trust with evals and observability	--	2026-04-29	1,299	--
Best tools for tracking LLM costs in production (2026)	--	2026-04-30	2,038	--
How to reduce costs for LLMs using Braintrust	--	2026-04-30	2,104	--
Braintrust vs. Weights & Biases 2026: Which AI evaluation platform is better?	--	2026-04-30	1,312	--
Braintrust vs. Promptfoo: 2026 LLM evaluation comparison	--	2026-04-30	1,592	--
How AI observability helps lower LLM cost at scale	--	2026-05-07	2,424	--
Agent observability: The complete guide for	--	2026-05-07	2,483	--
Why your traces and evals belong in the same place	--	2026-05-11	603	--
How to evaluate multi-turn conversations	--	2026-05-15	2,216	--
LLM call observability: Tracing every request, response, and token in production	--	2026-05-17	3,860	--
Best tools for tracking LLM costs in production (2026)	--	2026-05-17	2,041	--
Best hallucination detection tools for LLM applications (2026): catch bad outputs before …	--	2026-05-21	3,056	--
Best RAG observability tools (2026): monitor retrieval and generation in production	--	2026-05-21	2,884	--
The six generations of AI agents and how to eval them	--	2026-05-22	5,533	--
How to improve your golden datasets with human review	--	2026-05-24	1,516	--
How to turn LLM production failures into regression tests	--	2026-06-01	3,035	--
What are Topics in Braintrust and how do they work? (2026)	--	2026-06-02	1,980	--
Best AI governance platforms for LLM applications (2026): Eval, audit, and enforce	--	2026-06-02	3,049	--
The easiest way to add LLM observability to your AI app (2026)	--	2026-06-02	3,146	--
How to design custom facets for AI agent traces (2026)	--	2026-06-02	2,925	--
Automate pattern discovery with Topics, now generally available	--	2026-06-02	750	--
AI observability is active observability	--	2026-06-02	336	--
How to track LLM costs (2026): A playbook for per-user, per-feature, and …	--	2026-06-03	2,845	--
How to track LLM token usage (2026): Prompt, completion, context window, and …	--	2026-06-03	2,435	--
How we made continuous trace intelligence possible at scale	--	2026-06-05	2,869	--
How to build continuous evaluation for AI agents with trace classifications (2026)	--	2026-06-10	1,911	--
What are AI hallucination evaluations? Metrics and methods that work in	--	2026-06-10	2,084	--
Best AI conversation analytics tools (2026): classify agent traffic at scale	--	2026-06-11	2,738	--
Best AI agent analytics tools (2026): see trends across every agent answer	--	2026-06-12	2,431	--
How to use Braintrust with any framework or provider	--	2026-06-17	1,640	--
How to discover hidden failure patterns in your AI agent's production traffic …	--	2026-06-18	2,016	--
How to mine AI agent production traffic for product roadmap signals (2026)	--	2026-06-18	2,235	--
How to analyze AI agent usage patterns to build eval datasets (2026)	--	2026-06-18	2,147	--
Best LLM routers and model routing platforms in	--	2026-06-18	2,657	--

Plushcap, by Matt Makai. 2021-2026.