Confident AI Blog

Blog URL

www.confident-ai.com/blog

Posts YTD

22 ↑ vs 10 last year

Avg Posts/Month

1.3 since 2023

Monthly Post Volume (chart not reproduced; start-year options: 2024, 2025, 2026)

Post Details

Title	Author	Published	Words	HN Points
The Comprehensive Guide to LLM Security	Kritin Vongthongsri	2024-08-19	2,366	1
Evaluating LLM Systems: Essential Metrics, Benchmarks, and Best Practices	Jeffrey Ip	2024-07-17	3,747	--
Why OpenAI Assistants is a Big Win for LLM Evaluation	Jeffrey Ip	2024-04-06	1,169	--
Become a Prompt Artist: Understanding the Midjourney LLM	Jeffrey Ip	2024-04-06	1,700	--
LLM Testing in 2024: Top Methods and Strategies	Jeffrey Ip	2024-06-24	1,958	1
A Step-By-Step Guide to Evaluating an LLM Text Summarization Task	Jeffrey Ip	2024-04-06	1,443	3
A Gentle Introduction to LLM Evaluation	Jeffrey Ip	2024-04-06	1,883	--
Generating synthetic data with LLMs - Part 1	Jeffrey Ip	2024-04-06	793	--
Building a customer support chatbot using GPT-3.5 and LlamaIndex	Jeffrey Ip	2024-04-06	1,329	--
Why we replaced Pinecone with PGVector	Jeffrey Ip	2024-04-06	1,016	3
Using LLMs for Synthetic Data Generation: The Definitive Guide	Kritin Vongthongsri	2024-06-11	1,744	1
An Introduction to LLM Red Teaming	Kritin Vongthongsri	2024-07-30	2,365	--
How to Build an LLM Evaluation Framework, from Scratch	Jeffrey Ip	2024-06-24	2,342	2
RAG Evaluation: The Definitive Guide to Unit Testing RAG in CI/CD	Jeffrey Ip	2024-04-14	1,722	4
LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide	Jeffrey Ip	2024-07-09	4,321	7
An Introduction to LLM Benchmarking	Jeffrey Ip	2024-07-17	2,911	--
How to build a PDF QA chatbot using OpenAI and ChromaDB	Jeffrey Ip	2024-04-06	1,275	--
The Ultimate Guide to Fine-Tune LLaMA 3, With LLM Evaluations	Jeffrey Ip	2024-04-19	1,691	--
What is Retrieval Augmented Generation (RAG)?	Jeffrey Ip	2024-04-06	1,200	1
LLM Benchmarks: Everything on MMLU, HellaSwag, BBH, and Beyond	Kritin Vongthongsri	2024-08-19	2,266	1
How to Evaluate LLM Applications: The Complete Guide	Jeffrey Ip	2024-04-06	2,312	--
Leveraging LLM-as-a-Judge for Automated and Scalable Evaluation	Jeffrey Ip	2024-09-24	2,508	--
LLM Chatbot Evaluation Explained: Top Metrics and Testing Techniques	Jeffrey Ip	2024-10-05	2,365	3
What is LLM Observability? - The Ultimate LLM Monitoring Guide	Kritin Vongthongsri	2024-10-30	2,694	--
The Comprehensive LLM Safety Guide: Navigate AI regulations and Best Practices for …	Kritin Vongthongsri	2024-11-03	2,342	--
How to Jailbreak LLMs One Step at a Time: Top Techniques and …	Kritin Vongthongsri	2024-10-30	2,206	--
OWASP Top 10 2025 for LLM Applications: What’s new? Risks, and Mitigation …	Kritin Vongthongsri	2025-01-19	3,590	--
The People's Choice of Top LLM Evaluation Tools in 2025	Jeffrey Ip	2025-01-18	1,829	--
LLM Guardrails: The Ultimate Guide to Safeguard LLM Systems	Jeffrey Ip	2025-01-26	3,024	--
LLM Agent Evaluation: Assessing Tool Use, Task Completion, Agentic Reasoning, and More	Kritin Vongthongsri	2025-01-31	2,702	--
How I Built Deterministic LLM Evaluation Metrics for DeepEval	Jeffrey Ip	2025-02-09	2,335	--
How I raised Confident AI's $2.2M seed round in 5 days	Jeffrey Ip	2025-03-20	1,962	4
Top LLM Evaluators for Testing LLM Systems at Scale	Jeffrey Ip	2025-04-22	3,227	--
The G-Eval Guide to LLM Evaluation: Simply Explained	Kritin Vongthongsri	2025-04-30	3,925	--
The Ultimate LLM Evaluation Playbook: Why It Didn't Work For You	Jeffrey Ip	2025-05-03	3,973	--
RAG Evaluation Metrics: Assessing Answer Relevancy, Faithfulness, Contextual Relevancy, And More	Jeffrey Ip	2025-06-04	2,552	--
LLM Arena-as-a-Judge: LLM-Evals for Comparison-Based Regression Testing	Deep	2025-08-30	2,299	--
Top LangSmith Alternatives and Competitors, Compared	Jeffrey Ip	2025-09-02	3,106	--
Confident AI vs OpenLayer: Head-to-Head Comparison	Jeffrey Ip	2025-08-29	2,460	--
AI Agent Evaluation: The Definitive Guide to Testing AI Agents	Jeffrey Ip	2025-10-08	5,729	--
The Step-By-Step Guide to MCP Evaluation	--	2025-12-30	3,042	--
Confident AI vs LangSmith: Head-to-Head Comparison	--	2026-01-06	2,719	--
Multi-Turn LLM Evaluation in 2026: What You Need to Know	--	2026-03-22	3,425	--
Announcing Launch Week Q1 '26! Day 1: Automated Error Analysis	--	2026-03-31	908	--
Launch Week Day 2 (2/5): Scheduled Evals	--	2026-04-01	855	--
Launch Week Day 3 (3/5): Auto-Ingest Traces into Datasets & Annotation Queues	--	2026-04-02	958	--
Launch Week Day 4 (4/5): Auto-Categorize Traces & Threads	--	2026-04-03	1,116	--
Your AI Agent Passed Evals. That’s the Problem.	--	2026-04-06	1,505	--
Launch Week Day 5 (5/5): Generate Datasets from Your Data Sources	--	2026-04-04	1,417	--
Three Ways AI Systems Fail Even When Evals Pass	--	2026-04-07	2,856	--
Human-in-the-Loop Workflows for AI Agent Evaluation: Complete Guide	--	2026-06-13	4,943	--
The Complete Guide to LLM Experimentation: Compare Prompts, Models, and Agents	--	2026-06-10	4,810	--
LLM Evaluation for Startups: The Complete Guide	--	2026-06-04	4,788	--
Human-in-the-Loop Workflows for AI Agent Evaluation: Complete Guide	--	2026-06-13	4,980	--
LLM Product Manager Workflows: A Complete Guide to AI Quality	--	2026-06-13	5,829	--
Introducing AI Governance: Standardized evals, policies, and controls	--	2026-06-22	1,091	--
Introducing AI Observability Workflows: Custom automations for every trace on the platform	--	2026-06-23	1,238	--
Introducing Annotation Forms: Capture any human feedback without leaving Confident AI	--	2026-06-24	1,089	--
AI Agent Observability: Everything You Need to Know in 2026	--	2026-06-25	5,805	--
Introducing Synthetic Data Generation Pipelines: Customize how you generate data	--	2026-06-25	664	--
Introducing Report Templates: Build the report your team actually reads	--	2026-06-26	664	--

Plushcap, by Matt Makai. 2021-2026.