LLMs Are New Database Users. Now We Need a Way to Measure Them: Meet text-to-sql-eval

Post Details

Company

Tiger Data

Date Published

Aug. 28, 2025

Author

Team Tiger Data

Word Count

1,353

Language

English

Source URL

www.tigerdata.com/blog/text-to-sql-eval-open-source

Summary

Tiger Data has open-sourced text-to-sql-eval, a tool for evaluating and improving text-to-SQL systems, with a focus on PostgreSQL. The post treats large language models (LLMs) as a new class of database user and argues that their success at querying databases needs to be measured. The tool measures accuracy, identifies where failures originate, and suggests improvements. It uses an LLM-as-a-judge for more human-like query evaluation, tracks performance over time, and offers three operational modes for debugging schema-retrieval and reasoning issues. It is extensible and can evaluate any LLM or text-to-SQL system. A companion repository generates natural language questions and corresponding SQL queries for a user's own database, which streamlines the creation of test datasets. Tiger Data already uses the suite internally for benchmarking, schema-specific performance evaluation, and tracking accuracy regressions, and invites the community to explore and contribute to its development.
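
To make "measuring accuracy" concrete, here is a minimal sketch of one common approach, execution accuracy: run the reference query and the model's query against the same database and compare the rows they return. This is an illustration only, not text-to-sql-eval's implementation or API, which the summary does not describe. The function names, the `orders` table, and the test cases are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        # Counter ignores row order but keeps duplicate rows distinct.
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, cases):
    """Fraction of (gold_sql, predicted_sql) pairs that return the same rows.

    Hypothetical helper: a prediction counts as correct only when both
    queries run and their result multisets are equal.
    """
    if not cases:
        return 0.0
    correct = 0
    for gold_sql, predicted_sql in cases:
        expected = run_query(conn, gold_sql)
        actual = run_query(conn, predicted_sql)
        if expected is not None and expected == actual:
            correct += 1
    return correct / len(cases)


# Hypothetical toy database and test cases (assumptions for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'ada', 10.0), (2, 'bob', 25.0), (3, 'ada', 5.0);
    """
)

cases = [
    # Same rows despite a different alias and ordering: counted as correct.
    (
        "SELECT customer, SUM(total) FROM orders GROUP BY customer",
        "SELECT customer, SUM(total) AS spend FROM orders "
        "GROUP BY customer ORDER BY customer",
    ),
    # Wrong filter, so the counts differ: counted as incorrect.
    (
        "SELECT COUNT(*) FROM orders WHERE total > 8",
        "SELECT COUNT(*) FROM orders WHERE total >= 25",
    ),
    # Wrong table name, so the query fails to run: counted as incorrect.
    (
        "SELECT id FROM orders WHERE customer = 'ada'",
        "SELECT id FROM order WHERE customer = 'ada'",
    ),
]

print(f"execution accuracy: {execution_accuracy(conn, cases):.2f}")
```

Comparing result sets is strict: a query that answers the question with an extra column or a different rounding would be marked wrong, which is one reason an LLM-as-a-judge step can give a more human-like verdict.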