Content Deep Dive
Wikipedia and Weaviate
Blog post from Weaviate
Post Details
Company
Date Published
Author
Bob van Luijt
Word Count
1,439
Company Posts That Month
Language
English
Hacker News Points
-
Summary
This article outlines how to run semantic search queries at large scale using a vector database. A backup of the complete English-language Wikipedia corpus, imported into Weaviate, is released as open source and can be reused for similar vector and semantic search solutions in other projects. The dataset contains 11,348,257 articles, 27,377,159 paragraphs, and 125,447,595 graph cross-references. The article provides step-by-step instructions for importing the data into Weaviate, creating a schema for semantic search, and querying the data with GraphQL. It also discusses strategies for bringing semantic search solutions to production, emphasizing scalability and the need for data, ML models, and a vector database.
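To make the GraphQL querying step more concrete, here is a minimal sketch of how a semantic search request might be assembled. It only builds the request body and sends nothing over the network. The class name `Paragraph`, the properties `title` and `content`, and the helper `build_near_text_query` are hypothetical, chosen for illustration; the summary does not state the dataset's actual schema names.

```python
import json


def build_near_text_query(class_name, concepts, properties, limit=5):
    """Return a GraphQL query string for a `Get` search filtered by `nearText`."""
    # json.dumps gives a double-quoted, escaped list of strings, which is
    # also accepted as a GraphQL list literal for plain strings.
    concepts_literal = json.dumps(list(concepts))
    fields = " ".join(properties)
    return (
        "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }"
        % (class_name, concepts_literal, limit, fields)
    )


# Hypothetical class and property names (assumptions, not taken from the post).
query = build_near_text_query(
    "Paragraph", ["Italian cuisine"], ["title", "content"], limit=3
)

# JSON body for an HTTP POST to a GraphQL endpoint.
payload = json.dumps({"query": query})
print(payload)
```

In a running Weaviate instance, a body like this would typically be posted to the `/v1/graphql` endpoint; the host and port depend on the deployment.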
Trends Found in this Post
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Vector Search | 4 | 178 | 35 | 26 | +117% |
| Kubernetes | 1 | 1,218 | 176 | 69 | -9% |