What Is Text Vectorization? Everything You Need to Know

Company

deepset

Date Published

Dec. 3, 2021

Author

Isabelle Nguyen

Word count

1624

Language

English

Hacker News points

None

URL

www.deepset.ai/blog/what-is-text-vectorization-in-nlp

Summary

Text vectorization is the process of representing words, sentences, or larger units of text as numerical vectors that machines can work with. The technique has a long history: traditional count-based methods such as bag-of-words and TF-IDF were later improved upon by Word2Vec embeddings, and the field advanced further with the Transformer-powered BERT language model, which can produce contextualized word vectors and account for unknown words. Modern semantic search systems use these techniques to improve document retrieval, and vector databases have emerged to store and search this vectorized data efficiently.
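
To make the count-based methods mentioned in the summary concrete, here is a minimal sketch in plain Python. It is not taken from the article: the three-sentence corpus, the whitespace tokenizer, and the helper names (`tokenize`, `bag_of_words`, `idf`, `tf_idf`) are all assumptions made for illustration, and the IDF used is the simple unsmoothed variant log(N / df), which differs from the smoothed formulas that many libraries apply.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]

# The vocabulary fixes the meaning of each vector dimension.
vocab = sorted({tok for doc in tokenized for tok in doc})

def bag_of_words(tokens):
    # One raw count per vocabulary term.
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def idf(term):
    # Unsmoothed inverse document frequency: log(N / df).
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)

def tf_idf(tokens):
    # Term frequency (count / document length) weighted by IDF.
    counts = Counter(tokens)
    return [(counts[term] / len(tokens)) * idf(term) for term in vocab]

bow_vectors = [bag_of_words(doc) for doc in tokenized]
tfidf_vectors = [tf_idf(doc) for doc in tokenized]
```

In this sketch, "the" has the highest raw count in the first document, but because it also appears in the second document its TF-IDF weight falls below that of "cat", which occurs in only one document. Both representations ignore word order and context, which is the limitation that embedding-based approaches such as Word2Vec and BERT address.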