YouTube Transcripts Into Knowledge Graphs for RAG Applications

Blog post from Neo4j

Post Details
Company: Neo4j
Author: Alex Gilmore
Word Count: 1,770
Language: English
Summary

This blog post explores how to scrape YouTube video transcripts and load them into a knowledge graph for Retrieval Augmented Generation (RAG) applications. The project uses Google Cloud Platform, Neo4j, and LangChain to create documents from the transcripts, store them in a Neo4j graph database, and embed only the smaller child chunks of the text using spaCy embeddings. The process involves setting up services such as Google Cloud Storage and a Neo4j AuraDB instance, scraping transcripts from YouTube videos, splitting the transcripts into manageable parent and child chunks, loading them into the Neo4j graph database, and creating a vector index on the embedding property to enable vector search. The project demonstrates how to build a simple knowledge graph that can serve RAG applications; the next post in the series plans to build a basic RAG application on top of it.
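The chunking, loading, and indexing steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the post's implementation. It uses simple character-window splitting in place of LangChain's text splitters. The helper names (`split_text`, `parent_child_chunks`, `load_params`, `vector_index_statement`), the chunk sizes, the graph model (`Video`, `Parent`, `Child`, `HAS_PARENT`, `HAS_CHILD`), and the index name `child_embedding` are all hypothetical. The `embed` argument stands in for whatever embedding function is used (the post uses spaCy embeddings). The index statement assumes the `CREATE VECTOR INDEX` syntax of recent Neo4j 5.x releases.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParentChunk:
    """A larger slice of a transcript plus the smaller child chunks cut from it."""
    text: str
    children: list[str] = field(default_factory=list)


def split_text(text: str, size: int, overlap: int) -> list[str]:
    """Cut text into windows of at most `size` characters; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def parent_child_chunks(transcript: str, parent_size: int = 2000,
                        child_size: int = 400, overlap: int = 50) -> list[ParentChunk]:
    """Split a transcript into parent chunks, then split each parent into child chunks.

    The default sizes are hypothetical placeholders, not values from the post.
    """
    return [
        ParentChunk(parent, split_text(parent, child_size, overlap))
        for parent in split_text(transcript, parent_size, overlap)
    ]


# Hypothetical graph model: (:Video)-[:HAS_PARENT]->(:Parent)-[:HAS_CHILD]->(:Child).
# Only Child nodes carry an embedding property, mirroring "embed only the child chunks".
LOAD_PARENT = """
MERGE (v:Video {url: $video_url})
CREATE (v)-[:HAS_PARENT]->(p:Parent {text: $parent_text})
WITH p
UNWIND $children AS child
CREATE (p)-[:HAS_CHILD]->(:Child {text: child.text, embedding: child.embedding})
"""


def load_params(video_url: str, parent: ParentChunk,
                embed: Callable[[str], list[float]]) -> dict:
    """Build the parameter map for LOAD_PARENT, embedding only the child chunks."""
    return {
        "video_url": video_url,
        "parent_text": parent.text,
        "children": [{"text": c, "embedding": embed(c)} for c in parent.children],
    }


def vector_index_statement(dimensions: int, similarity: str = "cosine") -> str:
    """Cypher creating a vector index on Child.embedding (assumes Neo4j 5.x CREATE VECTOR INDEX)."""
    return (
        "CREATE VECTOR INDEX child_embedding IF NOT EXISTS "
        "FOR (c:Child) ON (c.embedding) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )
```

To run this against an AuraDB instance, each parameter map could be passed to `session.run(LOAD_PARENT, params)` from the official `neo4j` Python driver, with the index statement run once after loading. The vector dimension must match the output size of the chosen embedding model, so it is left as a parameter rather than hard-coded.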