Testing if "bash is all you need"

Post Details

Company

Braintrust

Date Published

Jan. 22, 2026

Author

Ankur Goyal

Word Count

857

Language

English

Hacker News Points

-

Source URL

www.braintrust.dev/blog/bash-agent-evals

Summary

The post examines an ongoing debate in the AI community over the best abstraction for AI agents, comparing a filesystem-plus-bash interface with direct SQL queries for managing and querying structured data. Filesystems and bash offer a familiar interface because language models are trained extensively on code and terminal environments. In a recent evaluation, however, SQL outperformed bash, reaching 100% accuracy against bash's 53%, even though the bash agent generated sophisticated shell commands. A hybrid approach that combined both methods reached high accuracy by verifying its results, although at a higher token cost. The primary insight is that SQL is superior for structured data queries, whereas bash offers flexibility for exploration and verification. The experiment also showed the value of iterative evaluation and collaboration: debugging and refining tasks through detailed traces significantly improved both the tools and the benchmarks. Readers are invited to run their own benchmarks with the open-source evaluation harness, adapting it to their own datasets and questions.
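
To make the SQL-versus-filesystem contrast concrete, here is a minimal sketch of the hybrid idea: answer a structured question with SQL, then cross-check it by scanning files. Everything in it is an assumption for illustration, not taken from the original evaluation: the toy `RECORDS`, the `orders` schema, the question being asked, and the function names are all hypothetical, and the file scan is written in Python as a stand-in for a bash pipeline.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical toy records; not data from the original evaluation.
RECORDS = [
    {"id": 1, "customer": "acme", "status": "open", "amount": 120},
    {"id": 2, "customer": "acme", "status": "closed", "amount": 80},
    {"id": 3, "customer": "globex", "status": "open", "amount": 200},
    {"id": 4, "customer": "acme", "status": "open", "amount": 50},
]


def answer_with_sql(records):
    """Total open amount for 'acme', computed with one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, status TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:id, :customer, :status, :amount)", records
    )
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders "
        "WHERE customer = ? AND status = ?",
        ("acme", "open"),
    ).fetchone()
    conn.close()
    return total


def answer_with_files(records, root):
    """Same question, answered by writing one JSON file per record and scanning them.

    This loop stands in for a shell pipeline over files; the filtering and
    summing logic has to be spelled out by hand.
    """
    for record in records:
        (root / f"{record['id']}.json").write_text(json.dumps(record))
    total = 0
    for path in sorted(root.glob("*.json")):
        record = json.loads(path.read_text())
        if record["customer"] == "acme" and record["status"] == "open":
            total += record["amount"]
    return total


def hybrid_answer(records):
    """Return the SQL answer plus a flag saying whether the file scan agrees."""
    sql_total = answer_with_sql(records)
    with tempfile.TemporaryDirectory() as tmp:
        file_total = answer_with_files(records, Path(tmp))
    return sql_total, sql_total == file_total


if __name__ == "__main__":
    print(hybrid_answer(RECORDS))
```

The second pass is where the extra cost comes from: the hybrid path does the work twice, which loosely mirrors the higher token cost the summary mentions for the combined approach.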