Benchmarking Using the Neo4j Text2Cypher (2024) Dataset

Blog post from Neo4j

Post Details
Company: Neo4j
Date Published: -
Author: Makbule Gulcin Ozsoy
Word Count: 573
Language: English
Hacker News Points: -
Summary

We explored how various fine-tuned and foundational LLM-based models perform at translating natural-language questions into Cypher queries, using the newly released Neo4j Text2Cypher (2024) dataset. We benchmarked four fine-tuned models and ten foundational models side by side, using two evaluation procedures: translation-based evaluation and execution-based evaluation. The closed foundational models, such as OpenAI's GPT and Google's Gemini, delivered the strongest overall performance, offering user-friendly APIs and reliable output, though they can be costly: they achieved a match ratio of about 30 percent in the execution-based evaluation and outperformed the previously fine-tuned models in the translation-based evaluation. The previously fine-tuned models haven't matched these giants yet, but they show real potential for improvement through further fine-tuning.
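
As a rough illustration of the two procedures, the sketch below scores a single prediction both ways: translation-based evaluation compares the generated Cypher text against the gold query, while execution-based evaluation runs both queries against a Neo4j instance and compares the returned rows. The connection details, the normalized exact-match metric, and the order-insensitive row comparison are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hedged sketch of the two evaluation procedures described above.
# Assumes a local Neo4j instance; the simple metrics here stand in for
# whatever scoring the benchmark itself uses.
from neo4j import GraphDatabase


def translation_match(predicted: str, gold: str) -> bool:
    """Translation-based check: compare normalized query text."""
    def normalize(q: str) -> str:
        return " ".join(q.lower().split())
    return normalize(predicted) == normalize(gold)


def execution_match(driver, predicted: str, gold: str) -> bool:
    """Execution-based check: run both queries and compare result rows."""
    with driver.session() as session:
        try:
            pred_rows = [r.data() for r in session.run(predicted)]
        except Exception:
            return False  # invalid or failing Cypher counts as a miss
        gold_rows = [r.data() for r in session.run(gold)]
    # Order-insensitive comparison of the returned rows.
    def key(rows):
        return sorted(map(str, rows))
    return key(pred_rows) == key(gold_rows)


if __name__ == "__main__":
    # Assumed local instance and credentials, for illustration only.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    predicted = "MATCH (m:Movie) RETURN m.title"
    gold = "MATCH (m:Movie) RETURN m.title AS title"
    print("translation match:", translation_match(predicted, gold))
    print("execution match:  ", execution_match(driver, predicted, gold))
    driver.close()
```

Note that the two metrics can disagree, as in the example above: the queries differ textually (an added alias) yet may return identical rows, which is why execution-based evaluation is the stricter test of whether a model's output actually answers the question.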