A Benchmark for Evaluating NL2SQL++ Systems
Blog post from Couchbase
Couchbase has developed a benchmark for evaluating Natural Language to SQL++ (NL2SQL++) conversion. It adapts BIRD, an NL2SQL benchmark originally designed for traditional SQL, to SQL++, whose flexibility suits querying JSON documents. The work addresses the absence of publicly available NL2SQL++ benchmarks, with the broader goal of making querying more intuitive and powerful for users. The primary challenge with SQL++ lies in its schema flexibility: documents are not bound to a fixed schema, which complicates query generation for Large Language Models (LLMs). Couchbase built a two-pass pipeline to test and improve its Capella iQ service, which reached an accuracy of 77.8% in generating correct SQL++ queries. Getting there involved iteratively refining the methodology to handle SQL++ specifics, such as the use of the RAW keyword in subqueries and proper NULL handling. The outcome is a reusable open-source framework intended to help the community develop and evaluate their own NL2SQL++ models, with resources available in Couchbase's GitHub repository.
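
To make the evaluation idea concrete, here is a minimal sketch of how result sets from a reference SQL query and a generated SQL++ query might be compared. It is not Couchbase's pipeline: it assumes accuracy is scored by comparing query results, in the spirit of BIRD's execution accuracy, and the function names (`normalize_row`, `results_match`, `execution_accuracy`) are hypothetical. It also shows why RAW and NULL handling matter: a plain SQL++ `SELECT` returns JSON objects, `SELECT RAW` returns bare values, and a null must not be confused with an empty string or zero.

```python
from collections import Counter


def normalize_row(row):
    """Reduce one result row to a hashable, alias-independent form.

    Assumption: SQL rows arrive as tuples, SQL++ rows as dicts
    (plain SELECT) or bare values (SELECT RAW).
    """
    if isinstance(row, dict):
        values = list(row.values())   # drop projection aliases
    elif isinstance(row, (list, tuple)):
        values = list(row)
    else:
        values = [row]                # bare value, as from SELECT RAW
    # repr() keeps None (NULL) distinct from "" and 0, and makes
    # mixed-type rows sortable so column order does not matter.
    return tuple(sorted(repr(v) for v in values))


def results_match(expected_rows, actual_rows):
    """Compare two result sets as multisets, ignoring row order."""
    expected = Counter(normalize_row(r) for r in expected_rows)
    actual = Counter(normalize_row(r) for r in actual_rows)
    return expected == actual


def execution_accuracy(pairs):
    """Fraction of (expected, actual) result-set pairs that match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(results_match(e, a) for e, a in pairs) / len(pairs)
```

In this sketch, `results_match([("a",)], ["a"])` treats a SQL tuple and a `SELECT RAW` value as equivalent, while a row containing `None` never matches one containing an empty string. Ignoring column order within a row is a deliberately lenient choice; a stricter harness could compare values positionally instead.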