Wikipedia and Weaviate

Post Details

Company

Weaviate

Date Published

Nov. 25, 2021

Author

Bob van Luijt

Word Count

1,439

Company Posts That Month

1

Language

English

Hacker News Points

-

Source URL

weaviate.io/blog/semantic-search-with-wikipedia-and-weaviate

Summary

This article outlines how to run semantic search queries at large scale using a vector database. The complete English-language Wikipedia corpus has been open-sourced as a Weaviate backup, which can be reused for similar vector and semantic search solutions in other projects. The dataset contains 11,348,257 articles, 27,377,159 paragraphs, and 125,447,595 graph cross-references. The article provides step-by-step instructions on how to import the data into Weaviate, create a schema for semantic search, and query the data using GraphQL. It also discusses implementation strategies for bringing semantic search solutions to production, emphasizing scalability and the need for data, ML models, and a vector database.
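
To make the GraphQL querying step more concrete, here is a minimal sketch of what a semantic query against such a dataset might look like. It is not taken from the article. The class name `Paragraph`, the properties `title` and `content`, the search concept, and the local endpoint URL are all assumptions chosen for illustration, and the `nearText` filter only works when the Weaviate instance has a text vectorizer module enabled.

```python
import json
import urllib.request


def build_near_text_query(concepts, class_name="Paragraph",
                          fields=("title", "content"), limit=5):
    """Build a Weaviate-style GraphQL Get query with a nearText filter.

    class_name and fields are hypothetical; adjust them to the actual schema.
    """
    # A JSON array of strings is also a valid GraphQL list literal.
    concepts_literal = json.dumps(list(concepts))
    field_block = " ".join(fields)
    return (
        "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }"
        % (class_name, concepts_literal, limit, field_block)
    )


def run_query(query, endpoint="http://localhost:8080/v1/graphql"):
    """POST the query to a GraphQL endpoint and return the decoded JSON.

    The endpoint is an assumed local Weaviate instance.
    """
    payload = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Illustrative concept; any natural-language phrase could be used here.
query = build_near_text_query(["Italian food"])
```

Calling `run_query(query)` would send the request to a running instance. The query builder is kept separate so the generated GraphQL can be inspected or logged without a server.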

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Vector Search	4	178	35	26	+117%
Kubernetes	1	1,218	176	69	-9%