Text2Cypher: The Impact of Difficult Example Selection

Post Details

Company

Neo4j

Date Published

April 29, 2025

Author

Makbule Gulcin Ozsoy

Word Count

1,381

Language

English

Hacker News Points

-

Source URL

neo4j.com/blog/developer/text2cypher-impact-of-hard-example-selection

Summary

The Text2Cypher task involves translating natural language questions into Cypher queries. Researchers explored several hard-example selection techniques, including complexity-based, length-based, and Cypher-specific approaches, to make fine-tuning for this task more efficient. The analysis showed that training on a smaller, more targeted subset of the data, one that prioritizes more complex or challenging instances, can cut training time and cost by more than half without a drastic drop in performance. However, even the highest Google BLEU and Exact-Match scores obtained with these subsets remain below those achieved with the full dataset. The way the fine-tuned models converge suggests that increasing data diversity and tuning hyperparameters could further improve performance. Additionally, the behavior of the evaluation methods highlights the need to analyze how different data subsets affect the model's ability to generate accurate Cypher queries under execution-based evaluation.
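To make the idea of hard-example selection more concrete, the following is a minimal Python sketch of a length-based and a Cypher-specific selector. It is not the method from the post: the scoring functions, the keyword list, the `fraction` parameter, and the `question`/`cypher` field names are all assumptions chosen for illustration, since the summary does not say how difficulty was measured.

```python
import re

# Hypothetical keyword list: the summary does not say which Cypher
# features the Cypher-specific approach actually counts.
CYPHER_KEYWORDS = ("MATCH", "WHERE", "WITH", "RETURN", "UNWIND", "ORDER")


def length_score(example):
    """Length-based difficulty: longer Cypher queries are treated as harder."""
    return len(example["cypher"])


def cypher_score(example):
    """Cypher-specific difficulty: count occurrences of common clause keywords."""
    query = example["cypher"].upper()
    return sum(
        len(re.findall(r"\b" + keyword + r"\b", query))
        for keyword in CYPHER_KEYWORDS
    )


def select_hard_examples(examples, score_fn, fraction=0.5):
    """Keep the top `fraction` of examples ranked by `score_fn`, hardest first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(examples) * fraction))
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(examples, key=score_fn, reverse=True)
    return ranked[:keep]
```

Calling `select_hard_examples(train_set, cypher_score, fraction=0.5)` would keep roughly half of a training set, which matches the scale of reduction the summary describes. Whether a given heuristic preserves model quality is an empirical question that this sketch does not answer.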