Building Knowledge Graphs at Production Scale for GenAI

Blog post from DataStax

Post Details
Company: DataStax
Date Published: -
Author: -
Word Count: 615
Language: English
Hacker News Points: -
Summary

Knowledge graphs are increasingly used to improve the results of retrieval-augmented generation (RAG) applications, but most examples only show how to build one from a small number of documents. The typical approach extracts fine-grained, entity-centric information, which becomes too slow and too costly on large datasets. Content-centric knowledge graphs, such as GraphVectorStore, are an easier and more efficient alternative: instead of extracting entities, they store links between chunks of content. The article compares the two approaches on a subset of Wikipedia articles from the 2wikimultihop dataset. When loading large datasets, the content-centric approach is significantly faster and less expensive than the entity-centric one, and parallelism reduces processing time further. It also produces more accurate and relevant answers to questions posed over the loaded data. Overall, GraphVectorStore is presented as a practical way to build knowledge graphs at scale for RAG applications.
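
To make the idea of linking chunks concrete, here is a minimal plain-Python sketch. It does not use the GraphVectorStore API. The `Chunk` class, the rule that two chunks are linked when they share a tag, and the function names `build_edges` and `traverse` are all assumptions made for illustration, and the sample texts are invented placeholders rather than data from the article.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of document text plus its link tags (hypothetical schema)."""
    id: str
    text: str
    tags: set[str] = field(default_factory=set)


def build_edges(chunks):
    """Connect every pair of chunks that share at least one tag.

    No LLM call is needed here: tags are assumed to come from cheap
    signals such as hyperlinks or keywords already present in the content.
    """
    by_tag = defaultdict(list)
    for chunk in chunks:
        for tag in chunk.tags:
            by_tag[tag].append(chunk.id)

    edges = defaultdict(set)
    for ids in by_tag.values():
        for a in ids:
            for b in ids:
                if a != b:
                    edges[a].add(b)
    return edges


def traverse(edges, seeds, depth=1):
    """Breadth-first expansion from seed chunk ids, up to `depth` hops.

    In a real system the seeds would come from a vector similarity search;
    here they are passed in directly to keep the sketch self-contained.
    """
    seen = set(seeds)
    order = list(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue
        for neighbor in sorted(edges.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, hops + 1))
    return order


# Invented placeholder chunks, not taken from the article's dataset.
chunks = [
    Chunk("a", "Film X was directed by Person Y.", {"Film X", "Person Y"}),
    Chunk("b", "Person Y was born in City Z.", {"Person Y", "City Z"}),
    Chunk("c", "City Z is a port town.", {"City Z"}),
]
edges = build_edges(chunks)
print(traverse(edges, ["a"], depth=1))  # ['a', 'b']
print(traverse(edges, ["a"], depth=2))  # ['a', 'b', 'c']
```

A multi-hop question such as "where was the director of Film X born?" needs chunks `a` and `b` together. In this sketch the second chunk is reached by following a link from the first, with no per-entity extraction step at load time, which is the property the summary credits for the lower loading cost.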