Benchmarking Using the Neo4j Text2Cypher (2024) Dataset

Blog post from Neo4j

Post Details
Company: Neo4j
Date Published: -
Author: Makbule Gulcin Ozsoy
Word Count: 573
Language: English
Hacker News Points: -
Summary

We explored how various fine-tuned and foundational LLM-based models perform at translating natural-language questions into Cypher queries, using the newly released Neo4j Text2Cypher (2024) dataset. We benchmarked four fine-tuned models and ten foundational models side by side, using two evaluation procedures: translation-based evaluation and execution-based evaluation. The closed foundational models, such as OpenAI's GPT and Google's Gemini, delivered the strongest overall performance, offering user-friendly APIs and reliable output, though they can be costly: they achieved a match ratio of about 30 percent in the execution-based evaluation and outperformed the previously fine-tuned models in the translation-based evaluation. The previously fine-tuned models haven't matched these giants yet, but they show real potential for improvement through further fine-tuning.
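
As a rough illustration of the two procedures, the sketch below scores a single prediction both ways: translation-based evaluation compares the generated Cypher text against the gold query, while execution-based evaluation runs both queries against a Neo4j instance and compares the returned rows. The connection details, the normalized exact-match metric, and the order-insensitive row comparison are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hedged sketch of the two evaluation procedures described above.
# Assumes a local Neo4j instance; the simple metrics here stand in for
# whatever scoring the benchmark itself uses.
from neo4j import GraphDatabase


def translation_match(predicted: str, gold: str) -> bool:
    """Translation-based check: compare normalized query text."""
    def normalize(q: str) -> str:
        return " ".join(q.lower().split())
    return normalize(predicted) == normalize(gold)


def execution_match(driver, predicted: str, gold: str) -> bool:
    """Execution-based check: run both queries and compare result rows."""
    with driver.session() as session:
        try:
            pred_rows = [r.data() for r in session.run(predicted)]
        except Exception:
            return False  # invalid or failing Cypher counts as a miss
        gold_rows = [r.data() for r in session.run(gold)]
    # Order-insensitive comparison of the returned rows.
    def key(rows):
        return sorted(map(str, rows))
    return key(pred_rows) == key(gold_rows)


if __name__ == "__main__":
    # Assumed local instance and credentials, for illustration only.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    predicted = "MATCH (m:Movie) RETURN m.title"
    gold = "MATCH (m:Movie) RETURN m.title AS title"
    print("translation match:", translation_match(predicted, gold))
    print("execution match:  ", execution_match(driver, predicted, gold))
    driver.close()
```

Note that the two metrics can disagree, as in the example above: the queries differ textually (an added alias) yet may return identical rows, which is why execution-based evaluation is the stricter test of whether a model's output actually answers the question.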