Neo4j Data Integration Pipeline Using Snakemake and Docker

Post Details

Company

Neo4j

Date Published

Jan. 7, 2021

Author

Ben Elsworth, Marina Vabistsevits, Oliver Lloyd, Yi Liu & Tom Gaunt

Word Count

1,140

Language

English

Hacker News Points

-

Source URL

neo4j.com/blog/healthcare/neo4j-data-integration-pipeline-using-snakemake-and-docker

Summary

We designed a Neo4j data integration pipeline to streamline our projects and to make the entire build process accessible and transparent. The pipeline uses Snakemake rules to control and automate each step of the build, running checks on each dataset along the way. It can create a working graph from raw data, and it handles datasets from a variety of sources that need cleaning and QC before they are incorporated. It also covers predefined database schema creation, testing of new data, node merging, Neo4j import, remote server options, and setup instructions. Our goal is to provide a simple method for adding new data to a graph build, one that could potentially be used collaboratively.
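To make the dataset checks and node merging a little more concrete, here is a minimal Python sketch of the kind of validation a build step might run before import. It is not the pipeline's actual code: the function names (`check_node_rows`, `merge_node_rows`), the column names, and the sample gene records are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical node file; the columns and values are illustrative only.
SAMPLE_NODES = """id,name
g1,BRCA1
g2,
g1,TP53
"""


def check_node_rows(rows, id_column, required_columns):
    """Return a list of problems found in an iterable of node rows (dicts)."""
    problems = []
    seen_ids = set()
    for row_no, row in enumerate(rows, start=1):
        # Every required column must hold a non-empty value.
        for col in required_columns:
            if not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: missing value for '{col}'")
        # Node identifiers must be unique within the dataset.
        node_id = (row.get(id_column) or "").strip()
        if node_id:
            if node_id in seen_ids:
                problems.append(f"row {row_no}: duplicate id '{node_id}'")
            seen_ids.add(node_id)
    return problems


def merge_node_rows(*sources, id_column):
    """Merge rows that share an id; later non-empty values overwrite earlier ones."""
    merged = {}
    for rows in sources:
        for row in rows:
            non_empty = {key: value for key, value in row.items() if value}
            merged.setdefault(row[id_column], {}).update(non_empty)
    return list(merged.values())


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))
```

In a Snakemake-driven build, a check like this could be called from a rule's script so that a dataset with problems stops the build before the import step. How the real pipeline wires its checks into rules is not described here, so treat that arrangement as an assumption.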