A Benchmark for Evaluating NL2SQL++ Systems

Post Details

Company

Couchbase

Date Published

May 6, 2026

Author

Aayush Fabwani, Software Engineering Intern

Word Count

3,337

Company Posts That Month

6

Language

English

Hacker News Points

-

Source URL

www.couchbase.com/blog/a-benchmark-for-evaluating-nl2sql-systems

Summary

Couchbase has developed a benchmark for evaluating Natural Language to SQL++ (NL2SQL++) conversion by adapting BIRD, an NL2SQL benchmark originally designed for traditional SQL, to SQL++, the query language used for JSON documents. The work addresses the absence of publicly available NL2SQL++ benchmarks, with the aim of making querying more intuitive for users. The primary challenge with SQL++ is its schema flexibility, which complicates query generation for Large Language Models (LLMs). Couchbase built a two-pass pipeline to test and improve its Capella iQ service, which reached 77.8% accuracy in generating correct SQL++ queries. Getting there involved iteratively refining the methodology to handle SQL++ specifics, such as the use of the RAW keyword in subqueries and proper NULL handling. The outcome is a reusable open-source framework intended to help the community develop their own NL2SQL++ models, with resources available in Couchbase's GitHub repository.
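
The summary does not say how a generated query is judged correct. As a rough illustration only, the sketch below assumes an execution-match metric in the spirit of BIRD's execution accuracy: the generated and reference queries are both run, and their result rows are compared regardless of order. The function names (`normalize_row`, `results_match`, `execution_accuracy`) are hypothetical and not taken from the Couchbase framework. The unwrapping of single-field objects is likewise an assumption, added to show why RAW matters: a SQL++ `SELECT RAW` returns bare values, while a plain `SELECT` returns objects.

```python
import json
from collections import Counter


def normalize_row(row):
    """Return a canonical string for one result row.

    Assumption: a single-field object such as {"name": "a"} is treated as
    equivalent to the bare value "a", so that a SELECT RAW result and a
    plain SELECT result of the same column compare equal.
    """
    if isinstance(row, dict) and len(row) == 1:
        row = next(iter(row.values()))
    # sort_keys makes object field order irrelevant; JSON null stays "null".
    return json.dumps(row, sort_keys=True)


def results_match(predicted_rows, gold_rows):
    """Compare two result lists as multisets, ignoring row order."""
    predicted = Counter(normalize_row(r) for r in predicted_rows)
    gold = Counter(normalize_row(r) for r in gold_rows)
    return predicted == gold


def execution_accuracy(result_pairs):
    """Fraction of (predicted_rows, gold_rows) pairs whose results match."""
    pairs = list(result_pairs)
    if not pairs:
        return 0.0
    return sum(results_match(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ([{"name": "a"}, {"name": "b"}], ["b", "a"]),  # same values, different shape
        ([1, 1, 2], [1, 2]),                           # duplicate row differs
    ]
    print(execution_accuracy(pairs))
```

Comparing as multisets rather than sets is a deliberate choice in this sketch, so that a query returning duplicate rows is not counted as matching one that returns each row once. A real harness might relax or tighten that rule depending on whether the question implies DISTINCT or an ordering.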

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	17	9,074	1,640	224	+53%