Confident AI Blog - Plushcap

Blog URL

www.confident-ai.com/blog

Posts YTD

22 ↑ vs 10 last year

Avg Posts/Month

1.5 since 2025

Post Details

Title	Author	Published	Words	HN Pts
OWASP Top 10 2025 for LLM Applications: What’s new? Risks, and Mitigation …	Kritin Vongthongsri	2025-01-19	3,590	--
The People's Choice of Top LLM Evaluation Tools in 2025	Jeffrey Ip	2025-01-18	1,829	--
LLM Guardrails: The Ultimate Guide to Safeguard LLM Systems	Jeffrey Ip	2025-01-26	3,024	--
LLM Agent Evaluation: Assessing Tool Use, Task Completion, Agentic Reasoning, and More	Kritin Vongthongsri	2025-01-31	2,702	--
How I Built Deterministic LLM Evaluation Metrics for DeepEval	Jeffrey Ip	2025-02-09	2,335	--
How I raised Confident AI's $2.2M seed round in 5 days	Jeffrey Ip	2025-03-20	1,962	4
Top LLM Evaluators for Testing LLM Systems at Scale	Jeffrey Ip	2025-04-22	3,227	--
The G-Eval Guide to LLM Evaluation: Simply Explained	Kritin Vongthongsri	2025-04-30	3,925	--
The Ultimate LLM Evaluation Playbook: Why It Didn't Work For You	Jeffrey Ip	2025-05-03	3,973	--
RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More	Jeffrey Ip	2025-06-04	2,552	--
LLM Arena-as-a-Judge: LLM-Evals for Comparison-Based Regression Testing	Deep	2025-08-30	2,299	--
Top LangSmith Alternatives and Competitors, Compared	Jeffrey Ip	2025-09-02	3,106	--
Confident AI vs OpenLayer: Head-to-Head Comparison	Jeffrey Ip	2025-08-29	2,460	--
AI Agent Evaluation: The Definitive Guide to Testing AI Agents	Jeffrey Ip	2025-10-08	5,729	--
The Step-By-Step Guide to MCP Evaluation	--	2025-12-30	3,042	--
Confident AI vs LangSmith: Head-to-Head Comparison	--	2026-01-06	2,719	--
Multi-Turn LLM Evaluation in 2026: What You Need to Know	--	2026-03-22	3,425	--
Announcing Launch Week Q1 '26! Day 1: Automated Error Analysis	--	2026-03-31	908	--
Launch Week Day 2 (2/5): Scheduled Evals	--	2026-04-01	855	--
Launch Week Day 3 (3/5): Auto-Ingest Traces into Datasets & Annotation Queues	--	2026-04-02	958	--
Launch Week Day 4 (4/5): Auto-Categorize Traces & Threads	--	2026-04-03	1,116	--
Your AI Agent Passed Evals. That’s the Problem.	--	2026-04-06	1,505	--
Launch Week Day 5 (5/5): Generate Datasets from Your Data Sources	--	2026-04-04	1,417	--
Three Ways AI Systems Fail Even When Evals Pass	--	2026-04-07	2,856	--
Human-in-the-Loop Workflows for AI Agent Evaluation: Complete Guide	--	2026-06-13	4,943	--
The Complete Guide to LLM Experimentation: Compare Prompts, Models, and Agents	--	2026-06-10	4,810	--
LLM Evaluation for Startups: The Complete Guide	--	2026-06-04	4,788	--
Human-in-the-Loop Workflows for AI Agent Evaluation: Complete Guide	--	2026-06-13	4,980	--
LLM Product Manager Workflows: A Complete Guide to AI Quality	--	2026-06-13	5,829	--
Introducing AI Governance: Standardized evals, policies, and controls	--	2026-06-22	1,091	--
Introducing AI Observability Workflows: Custom automations for every trace on the platform	--	2026-06-23	1,238	--
Introducing Annotation Forms: Capture any human feedback without leaving Confident AI	--	2026-06-24	1,089	--
AI Agent Observability: Everything You Need to Know in 2026	--	2026-06-25	5,805	--
Introducing Synthetic Data Generation Pipelines: Customize how you generate data	--	2026-06-25	664	--
Introducing Report Templates: Build the report your team actually reads	--	2026-06-26	664	--

Plushcap, by Matt Makai. 2021-2026.