Best Practices to Make (Very) Large Updates in Neo4j
Blog post from Neo4j
Large updates in Neo4j can run into performance problems, especially on big datasets. To optimize them, Fanghua Joshua Yu, a Neo4j Pre-Sales & Field Engineer, reviews strategies such as committing in small batches with PERIODIC COMMIT to keep memory usage bounded, and processing batches in parallel to improve throughput. He illustrates these techniques with a case study on the Stack Overflow dataset, roughly 31 million nodes and 78 million relationships, showing how Cypher tuning keeps update queries efficient. Yu also covers the role of hardware, such as using an SSD, and of monitoring system resources like heap memory usage and CPU threads. He highlights APOC procedures for iterative commits and parallel processing, which can significantly improve performance, and stresses the need to test and profile updates before running large changes against a production environment. The sketches below illustrate the main techniques.
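For the batched-commit strategy, here is a minimal sketch of a CSV import that commits every 10,000 rows instead of holding one huge transaction in memory. The file name, column names, and Person label are hypothetical, and note that newer Neo4j versions replace this clause with CALL { ... } IN TRANSACTIONS.

```cypher
// Minimal sketch: commit every 10,000 rows to bound transaction memory.
// people.csv, the id/name columns, and the Person label are illustrative.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name;
```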
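For updates over data already in the graph, the APOC approach the post highlights pairs an outer query that streams entities with an inner update statement committed per batch via apoc.periodic.iterate. A hedged sketch follows; the Question label and the processed property are illustrative names, not taken from the post.

```cypher
// apoc.periodic.iterate streams the first statement and applies the second
// in batches of batchSize, committing each batch in its own transaction.
// parallel: true runs batches concurrently; this is safest for node-local
// updates, since concurrent relationship writes can contend for locks.
CALL apoc.periodic.iterate(
  "MATCH (q:Question) WHERE q.processed IS NULL RETURN q",
  "SET q.processed = true",
  {batchSize: 10000, parallel: true}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```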
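Finally, for the advice to test and profile before touching production, prefixing a query with EXPLAIN shows the planner's execution plan without running the statement, while PROFILE executes it and reports actual rows and db hits per operator. A small sketch, with the User label and its properties as illustrative assumptions:

```cypher
// EXPLAIN inspects the plan for a large write without executing it,
// revealing missing indexes or full label scans before they hurt.
// Swapping EXPLAIN for PROFILE would run the update and report db hits.
EXPLAIN
MATCH (u:User {reputation: 0})
SET u.inactive = true;
```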