Neo4j Data Integration Pipeline Using Snakemake and Docker

Blog post from Neo4j

Post Details
Company: Neo4j
Date Published:
Author: Ben Elsworth, Marina Vabistsevits, Oliver Lloyd, Yi Liu & Tom Gaunt
Word Count: 1,140
Language: English
Hacker News Points: -
Summary

We have designed a Neo4j data integration pipeline to streamline our projects and make the entire build process accessible and transparent. The pipeline uses Snakemake rules to control and automate each step of the build, running checks on every dataset along the way. It can create a working graph from raw data, including datasets from various sources that require cleaning and quality control (QC) before incorporation. Features include predefined database schema creation, testing of new data, node merging, Neo4j import, remote server options, and setup instructions. Our goal is to provide a simple method for adding new data to a graph build, one that could also be used collaboratively.
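
To make the dataset checks and node merging described above more concrete, here is a minimal Python sketch of the kind of script a single build step might run. It is not the authors' implementation: the schema, the field names (`id`, `name`, `score`), and the helper names `check_rows` and `merge_nodes` are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical schema (an assumption, not from the pipeline itself):
# required properties for one node type and the type each must parse as.
NODE_SCHEMA = {"id": str, "name": str, "score": float}


def check_rows(rows, schema, id_field="id"):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen = set()
    for n, row in enumerate(rows, start=1):
        for field, ftype in schema.items():
            value = row.get(field)
            if value is None or value == "":
                problems.append(f"row {n}: missing {field}")
                continue
            try:
                ftype(value)  # check that the value parses as the expected type
            except ValueError:
                problems.append(f"row {n}: {field}={value!r} is not {ftype.__name__}")
        key = row.get(id_field)
        if key and key in seen:
            problems.append(f"row {n}: duplicate {id_field} {key!r}")
        seen.add(key)
    return problems


def merge_nodes(rows, id_field="id"):
    """Merge rows that share an id; later non-empty values only fill gaps."""
    merged = {}
    for row in rows:
        node = merged.setdefault(row[id_field], {})
        for field, value in row.items():
            if value and not node.get(field):
                node[field] = value
    return list(merged.values())


# Small in-memory dataset standing in for a raw source file.
raw = "id,name,score\nn1,alpha,0.5\nn2,,high\nn1,alpha,0.5\n"
rows = list(csv.DictReader(io.StringIO(raw)))

for problem in check_rows(rows, NODE_SCHEMA):
    print(problem)

nodes = merge_nodes(rows)
print(len(nodes), "nodes after merging")
```

In a Snakemake-driven build, a rule could run a check like this on each dataset and fail the step when the problem list is non-empty, so that only data passing QC reaches the Neo4j import stage. Reporting all problems at once, rather than stopping at the first, is a design choice that suits a contributor adding a new dataset.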