Text vectorization is the technique of representing words, sentences, or larger units of text as numeric vectors that machines can work with. It has a long history: traditional count-based methods such as bag-of-words and TF-IDF were later improved upon by Word2Vec embeddings, and advanced further by the Transformer-based BERT language model, which produces contextualized word vectors and can handle out-of-vocabulary words. Modern semantic search systems build on these techniques to improve document retrieval, and vector databases have emerged to store and search vectorized data efficiently.
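To make the count-based end of this history concrete, here is a minimal sketch of bag-of-words and a simple (unsmoothed) TF-IDF variant in plain Python. The function names and the toy documents are illustrative, not from any particular library; production systems typically use an optimized implementation such as scikit-learn's `TfidfVectorizer`, which also applies smoothing and normalization.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Represent each document as a vector of raw term counts
    over a shared, sorted vocabulary."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    vectors = [[Counter(doc.lower().split())[w] for w in vocab] for doc in docs]
    return vocab, vectors

def tf_idf(docs):
    """Reweight raw counts so terms appearing in many documents
    contribute less to each vector."""
    vocab, counts = bag_of_words(docs)
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[i] > 0) for i in range(len(vocab))]
    # Simple inverse document frequency (real libraries add smoothing).
    idf = [math.log(n / d) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, vectors = tf_idf(docs)
```

In the toy corpus above, "the" occurs in every document, so its IDF weight collapses to 1 and it contributes no more than its raw count, while rarer words like "dog" are weighted up; this is exactly the discriminative signal that made TF-IDF a long-standing retrieval baseline.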