Text2Cypher Across Languages: Evaluating Foundational Models Beyond English

Post Details

Company

Neo4j

Date Published

July 8, 2025

Author

Makbule Gulcin Ozsoy

Word Count

1,242

Language

English

Hacker News Points

-

Source URL

neo4j.com/blog/developer/text2cypher-across-languages

Summary

The blog post evaluates foundational large language models (LLMs) on the Text2Cypher task, which converts natural language questions into Cypher queries for Neo4j graph databases, with a focus on multilingual performance across English, Spanish, and Turkish. The authors released a multilingual test set and used it to compare model performance across the three languages. The models performed best in English, followed by Spanish and then Turkish, a gap the post attributes to differences in language resources and linguistic similarity. Translating the task prompts had minimal impact on performance. Schema elements remained in English throughout the evaluation, so the post suggests that future research explore fully localized setups and language-specific tuning to improve cross-lingual query generation. The findings are intended to encourage broader research on structured query generation and to contribute to the multilingual capabilities of LLMs.
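
The setup described in the summary can be pictured with a minimal Python sketch: the same question is posed in each of the three languages while the graph schema stays in English, and every language shares one reference Cypher query. The schema, the questions, the instruction text, and the `build_prompt` helper are all hypothetical illustrations; they are not taken from the released test set or from the post's actual prompts.

```python
# Hypothetical graph schema, kept in English as in the setup the summary describes.
SCHEMA = "(:Person {name: STRING})-[:ACTED_IN]->(:Movie {title: STRING})"

# Hypothetical question rendered in the three evaluated languages.
QUESTIONS = {
    "en": "Which movies did Tom Hanks act in?",
    "es": "¿En qué películas actuó Tom Hanks?",
    "tr": "Tom Hanks hangi filmlerde oynadı?",
}

# Hypothetical reference query: the expected Cypher is the same for every language.
REFERENCE_CYPHER = (
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title"
)

# Assumed instruction text; the post's real prompt wording is not given in the summary.
INSTRUCTION = (
    "Generate a Cypher query that answers the question, using only the schema below."
)


def build_prompt(question: str, schema: str = SCHEMA,
                 instruction: str = INSTRUCTION) -> str:
    """Combine the instruction, the English schema, and a question in any language."""
    return (
        f"{instruction}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question:\n{question}\n\n"
        "Cypher:"
    )


# One prompt per language; only the question text differs between them.
prompts = {lang: build_prompt(question) for lang, question in QUESTIONS.items()}

if __name__ == "__main__":
    for lang, prompt in prompts.items():
        print(f"--- {lang} ---\n{prompt}\n")
```

In a sketch like this, translating the prompt would mean swapping `INSTRUCTION` for a Spanish or Turkish version, while a fully localized setup would also translate the labels and property names in `SCHEMA`.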