Company
Date Published
Author
Davide Fantuzzi
Word count
945
Language
English
Hacker News points
None

Summary

The Neo4j Connector for Apache Spark is a rewrite of the earlier connector, built on Spark's DataSource API V2, which makes it usable from multiple languages. It was developed to provide an official library with ongoing support, replacing the old connector and custom "hacky" solutions. Development was challenging because of the lack of documentation, breaking changes between minor versions, and the different Scala versions supported by each Spark version. The connector maps nodes and relationships into tables, with columns for properties and IDs. Because a graph is schema-less, the team also had to extract a schema from it, handling cases where a property has mixed types across nodes. The connector uses the official Neo4j Java Driver and generates Cypher queries through the API. Examples are available in Scala, Python, R, and more, and support for Spark 3.0 and 3.1 is planned for release soon.
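
To make the node-to-table mapping and the schema extraction more concrete, here is a minimal sketch in plain Python. It is not the connector's code. The column names `<id>` and `<labels>`, the helper names `infer_schema` and `to_rows`, the type names, and the rule that a property with mixed types falls back to a string column are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical names for the columns that hold the node ID and labels.
ID_COL = "<id>"
LABELS_COL = "<labels>"


def type_name(value: Any) -> str:
    """Map a Python value to a simplified column type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(nodes: list[dict]) -> dict[str, str]:
    """Scan every node and build one column per property key.

    Assumption: when the same property has different types on
    different nodes, the column falls back to "string".
    """
    schema = {ID_COL: "long", LABELS_COL: "array<string>"}
    for node in nodes:
        for key, value in node["properties"].items():
            if value is None:
                continue
            seen = type_name(value)
            if key not in schema:
                schema[key] = seen
            elif schema[key] != seen:
                schema[key] = "string"
    return schema


def to_rows(nodes: list[dict], schema: dict[str, str]) -> list[dict]:
    """Flatten nodes into rows; missing properties become None."""
    rows = []
    for node in nodes:
        row = {ID_COL: node["id"], LABELS_COL: node["labels"]}
        for col, col_type in schema.items():
            if col in (ID_COL, LABELS_COL):
                continue
            value = node["properties"].get(col)
            if value is not None and col_type == "string":
                value = str(value)  # cast mixed-type values to string
            row[col] = value
        rows.append(row)
    return rows


# Three nodes with the same label; "age" is an int on one node,
# a string on another, and absent on the third.
nodes = [
    {"id": 0, "labels": ["Person"], "properties": {"name": "Alice", "age": 42}},
    {"id": 1, "labels": ["Person"], "properties": {"name": "Bob", "age": "unknown"}},
    {"id": 2, "labels": ["Person"], "properties": {"name": "Carol"}},
]

schema = infer_schema(nodes)
rows = to_rows(nodes, schema)
```

In this sketch the `age` column ends up typed as a string because its values disagree across nodes, and the node without an `age` property gets a null in that column. A real implementation would likely sample the graph rather than scan every node.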