Galileo Blog - Plushcap

Blog URL: galileo.ai/blog

Posts YTD: 93 ↓ vs 144 last year

Avg Posts/Month: 7.8 since 2026

[Chart: Monthly Post Volume, 2022 to 2026]

Post Details

Title	Author	Published	Words	HN Pts
Google's Agent2Agent Protocol Explained	Jackson Wells	2026-01-18	2,409	--
Context Engineering at Scale: How We Built Galileo Signals	Bipin Shetty	2026-01-21	2,378	--
MMLU Benchmark: Testing AI Language Models	John Weiler	2026-01-17	2,394	--
What Is Toolchaining?	Jackson Wells	2026-02-02	2,232	--
Best LLMOps Platforms for Scaling Generative AI	Jackson Wells	2026-02-02	2,550	--
DeepMind FACTS Framework 2026: LLM Factual Accuracy Guide	Jackson Wells	2026-02-02	2,289	--
What Is RAGChecker?	Pratik Bhavsar	2026-02-02	2,706	--
7 Best Agent Evaluation Frameworks	Pratik Bhavsar	2026-02-02	2,354	--
What Is Chain-of-Thought Prompting? A Guide to Improving LLM Reasoning	Pratik Bhavsar	2026-02-02	2,532	--
What Is BrowseComp? OpenAI's Agent Benchmark Reveals 2026 Gaps	Jackson Wells	2026-02-02	2,337	--
What Is PaperBench?	Conor Bronsdon	2026-02-02	2,803	--
6 Best LLM Monitoring Solutions for Enterprise	Jackson Wells	2026-02-14	2,341	--
Agent Evaluation Framework 2026: Metrics, Rubrics & Benchmarks	Pratik Bhavsar	2026-02-14	2,233	--
5 Best LLM Evaluation Tools for Enterprise Teams	Pratik Bhavsar	2026-02-14	2,713	--
6 Best AI Agent Monitoring Tools in 2026	Jackson Wells	2026-02-14	1,803	--
7 Best LLM Observability Tools for Debugging and Tracing	Jackson Wells	2026-02-14	2,537	--
The Case for Purpose-Built vs. General AI Observability Tools	Jackson Wells	2026-02-25	3,551	--
Best Braintrust Alternatives in 2026	Jackson Wells	2026-02-25	2,455	--
Are You Making These 7 LLM-as-a-Judge Mistakes?	Jackson Wells	2026-02-25	2,562	--
Building Continuous Agent Evaluation Pipelines	Pratik Bhavsar	2026-02-25	2,268	--
7 Best LLM Eval Platforms Compared	Jackson Wells	2026-02-25	2,159	--
9 Key Findings from the State of AI Evaluation Engineering Report	Jackson Wells	2026-02-25	2,584	--
5 Best Hallucination Detection Tools for LLM Applications	Jackson Wells	2026-02-25	2,773	--
Announcing Agent Control: The Open Source Control Plane for AI Agents	Yash Sheth	2026-03-11	1,500	--
Securing the Agentic Future: Cisco AI Defense Integrates with Agent Control	Yash Sheth	2026-03-11	798	--
5 Tools to Evaluate and Monitor Multi-Agent AI Systems	Pratik Bhavsar	2026-03-16	2,292	--
AI Incident Response: Detect, Triage & Learn Fast	Jackson Wells	2026-03-17	2,700	--
Why 93% of AI Teams Struggle with LLM-as-a-Judge and 8 Alternatives That …	Jackson Wells	2026-03-17	2,950	--
6 Best AI Drift Detection Tools	Jackson Wells	2026-03-17	2,213	--
GCache: Caching Without the Chaos	Lev Neiman	2026-03-16	1,747	--
What MT-Bench and Chatbot Arena Reveal About Most LLM Judges	Jackson Wells	2026-03-17	3,231	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-03-17	2,485	--
6 Best AI Drift Detection Tools in 2026	Jackson Wells	2026-03-17	2,205	--
8 Best AI Agent Debugging & Root Cause Analysis Tools	Jackson Wells	2026-03-17	2,303	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-03-17	2,145	--
8 Best AI Agent Guardrails Solutions in 2026	Jackson Wells	2026-03-17	2,378	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-03-17	2,150	--
OpenClaw: Sobering Lessons from an Agent Gone Rogue	Joyal Palackel	2026-03-19	2,312	--
7 Best RAG Debugging Tools for Production (2026)	Conor Bronsdon	2026-03-24	2,618	--
8 Best Small Language Models for AI Evaluation	Jackson Wells	2026-03-24	3,051	--
5 Best RAG Observability Tools Compared in 2026	Conor Bronsdon	2026-03-24	2,344	--
9 Best LLM Drift Monitoring Platforms in 2026	Jackson Wells	2026-03-24	3,290	--
5 Best AI Guardrails Platforms Compared in 2026	Jackson Wells	2026-03-24	2,065	--
Announcing Galileo Autotune: Your Evals Are Wrong 20% of the Time. Now …	Paul Lacey	2026-04-02	1,405	--
AI Incident Response Tools to Look For in 2026	Jackson Wells	2026-04-06	3,653	--
6 Best AI Agent Observability Platforms (2026)	Jackson Wells	2026-04-06	2,229	--
6 Best LangSmith Alternatives Compared (2026)	Jackson Wells	2026-04-06	2,478	--
8 Best AI Agent Evaluation Platforms in 2026	Jackson Wells	2026-04-13	2,766	--
9 Best Retrieval Quality Monitoring Tools	Jackson Wells	2026-04-13	2,406	--
8 Best AI Agent Governance Tools in 2026	Jackson Wells	2026-04-13	2,739	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-04-13	2,118	--
8 Best LLM Input Output Validation Tools	Jackson Wells	2026-04-13	2,774	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-04-19	2,579	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-04-19	2,249	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-04-19	2,730	--
Galileo AI: The AI Observability and Evaluation Platform	Jackson Wells	2026-04-19	2,539	--
From OWASP to Enterprise: Building a Central Control Plane for Agentic AI …	Pratik Bhavsar	2026-04-21	3,057	--
Scaling Judge Compute: The Next Frontier in AI Evaluation	Jackson Wells	2026-04-28	3,033	--
OWASP ASI01: Mapping Every Agent Goal Hijack Variant to Detection and Defense	Pratik Bhavsar	2026-04-28	2,579	--
The 70/40 Framework Elite Teams Use for AI Reliability	Jackson Wells	2026-04-28	2,363	--
Domain-Specific LLM Evaluation: Why Generic Rubrics Fall Short	Jackson Wells	2026-04-28	2,772	--
Why LLM Judges Disagree With Your Experts — and How to Fix …	Jackson Wells	2026-04-28	2,697	--
6 Best Langfuse Alternatives Compared in 2026	Jackson Wells	2026-05-01	2,948	--
What Is AI Agent Governance? A Practical Guide	Jackson Wells	2026-05-01	3,014	--
8 Best LLM Reliability Solutions for Production	Jackson Wells	2026-05-01	2,649	--
10 Best Low-Latency LLM Evaluation Tools in 2026	Jackson Wells	2026-05-01	3,280	--
OWASP ASI02: When AI Agents Weaponize Their Own Tools	Pratik Bhavsar	2026-05-11	3,501	--
Beyond Golden Datasets: Why Static Evals Miss Critical LLM Failures	Pratik Bhavsar	2026-05-15	2,323	--
AI Compliance Without Slowing Innovation: A Technical Leader's Playbook	Pratik Bhavsar	2026-05-15	2,958	--
AI Brittleness vs. Non-Determinism: The Real Reliability Problem	Pratik Bhavsar	2026-05-15	2,757	--
Expert-in-the-Loop Evaluation: Closing the SME Agreement Gap	Pratik Bhavsar	2026-05-15	2,460	--
How to Calibrate Your LLM Judge With Human Annotations	Pratik Bhavsar	2026-05-15	2,593	--
Future-Proofing Your AI Strategy: Navigating Regulatory Change	Pratik Bhavsar	2026-05-15	3,020	--
Instance-Specific Rubrics: The Next Frontier in LLM Evaluation	Pratik Bhavsar	2026-05-15	2,651	--
Fix AI like a professional eval engineer.	Pratik Bhavsar	2026-05-19	3,611	--
Luna Studio: Custom SLM Judges for Production AI Guardrails	Joyal Palackel	2026-05-20	2,490	--
How to Use Cursor Without Deleting Your GitHub Repos	Michael Branconier	2026-05-19	955	--
The 2026 Caching Playbook for Agents: Bigger Prompts, Smaller Bills.	Paul Lacey	2026-05-26	1,963	--
NIST AI Risk Management Framework in Practice	Jackson Wells	2026-06-09	2,585	--
Monitoring and Observability in Deployed AI	Jackson Wells	2026-06-08	2,609	--
AI-Powered Observability for Autonomous Agents	Jackson Wells	2026-06-09	2,626	--
AI Governance Failures and How to Prevent Them	Jackson Wells	2026-06-09	2,394	--
How to Discover Shadow Agents in Your Enterprise	Jackson Wells	2026-06-09	2,638	--
The Eval-to-Guardrail Lifecycle Explained	Jackson Wells	2026-06-09	2,660	--
Agent Telemetry and the New Observability Model for AI Agents	Jackson Wells	2026-06-09	2,472	--
How to Choose an AI Governance Platform	Jackson Wells	2026-06-09	2,798	--
AI Data Observability for Production Pipelines	Jackson Wells	2026-06-09	2,602	--
The AI Governance Maturity Model Explained	Jackson Wells	2026-06-09	2,511	--
AI Governance Tools Across the Stack	Jackson Wells	2026-06-09	2,908	--
The Hidden Cost of Sampling in Agent Observability	Jackson Wells	2026-06-09	2,771	--
Evaluation-Driven Development Across the ADLC	Jackson Wells	2026-06-09	2,624	--
AI Observability Trends Shaping 2026	Jackson Wells	2026-06-08	2,365	--

Plushcap, by Matt Makai. 2021-2026.