Why semantic layers make LLM analytics reliable: a paired benchmark across three frontier models
Blog post from Cube
The study tests whether a semantic layer, here a markdown document of business definitions, makes large language model (LLM) analytics more reliable by improving the accuracy of translating natural-language questions into SQL queries. The benchmark ran on the Cleaned Contoso Retail Dataset in ClickHouse and compared three frontier models (Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.4) under two conditions: schema only, and schema plus the semantic layer. Adding the semantic layer raised accuracy by 17 to 23 percentage points for every model. The improvements were statistically significant, and the choice of model mattered less once the semantic layer was included. The study concludes that the semantic layer is essential context for accurate analytics, which makes adopting one an architectural decision that matters more than model selection. The full research paper, benchmark, and dataset are available for further exploration.
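To make the "paired" part of the design concrete, here is a minimal sketch of how per-question results from the two conditions could be compared. The summary above does not say which significance test the study used, so the exact McNemar test below is an assumption chosen because it suits paired correct/incorrect outcomes. The function names and the toy result lists are hypothetical and are not taken from the benchmark.

```python
from math import comb


def paired_outcomes(schema_only, with_layer):
    """Count questions whose correctness changed between the two conditions.

    Both arguments are lists of booleans, one entry per question, in the
    same question order (that shared ordering is what makes the design paired).
    """
    if len(schema_only) != len(with_layer):
        raise ValueError("both conditions must cover the same questions")
    pairs = list(zip(schema_only, with_layer))
    gained = sum(1 for before, after in pairs if not before and after)
    lost = sum(1 for before, after in pairs if before and not after)
    return gained, lost


def accuracy_delta_pp(schema_only, with_layer):
    """Accuracy change in percentage points (with layer minus schema only)."""
    n = len(schema_only)
    return 100.0 * (sum(with_layer) - sum(schema_only)) / n


def mcnemar_exact_p(gained, lost):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    Under the null hypothesis, each discordant pair is equally likely to be
    a gain or a loss, so the smaller count follows Binomial(n, 0.5).
    """
    n = gained + lost
    if n == 0:
        return 1.0  # no question changed outcome, so there is no evidence either way
    k = min(gained, lost)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy results for one model on eight questions (not study data).
schema_only = [True, False, False, True, False, True, False, False]
with_layer = [True, True, True, True, False, True, True, False]

gained, lost = paired_outcomes(schema_only, with_layer)
print("accuracy change (pp):", accuracy_delta_pp(schema_only, with_layer))
print("gained:", gained, "lost:", lost)
print("exact McNemar p-value:", mcnemar_exact_p(gained, lost))
```

Because each question is answered under both conditions, only the questions whose outcome flips carry information about the effect of the semantic layer. That is why the test looks at the `gained` and `lost` counts rather than at the two overall accuracy figures.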