OpenAI text-embedding-3 Embedding Models: First Look
Blog post from Vectorize
OpenAI recently unveiled two new embedding models, text-embedding-3-large (v3 Large) and text-embedding-3-small (v3 Small). Both are notable for a configurable output size: v3 Large defaults to 3,072 dimensions and v3 Small to 1,536, but either can return shorter vectors. A co-founder of Vectorize tested both against OpenAI's previous model, Ada v2 (text-embedding-ada-002), using a dataset built from the Advanced Dungeons and Dragons 2nd edition rule books. The test split the dataset into chunks, embedded the chunks with each model, and compared context relevancy scores across different categories of questions; performance varied from one category to another.

v3 Large produced lower similarity scores, possibly because its larger vectors capture subtler differences between texts. v3 Small performed competitively despite having half as many dimensions as v3 Large, which makes it a cost-effective alternative in terms of vector storage and processing. The findings underscore that larger embeddings do not always mean better performance, and they highlight the importance of evaluating models on real-world data, a principle that informed the development of Vectorize.
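The storage trade-off and the similarity comparison described above can be made concrete with a few lines of code. This is an illustrative sketch only and does not reproduce the Vectorize experiment: it uses toy vectors instead of real API output, assumes embeddings are stored as raw float32 values with no index or metadata overhead, and the helper names (`shorten`, `cosine_similarity`, `float32_bytes`) are hypothetical. `shorten` follows the truncate-then-renormalize approach OpenAI describes for text-embedding-3 vectors; when calling the API, the same effect is normally requested through its `dimensions` parameter.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def shorten(vec, dims):
    """Keep the first `dims` components, then re-normalize to unit length.

    Hypothetical helper. Assumption: this mirrors the truncate-and-renormalize
    approach described for text-embedding-3 vectors; with the real API you
    would usually ask for the smaller size via the `dimensions` parameter.
    """
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    return l2_normalize(vec[:dims])


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def float32_bytes(dims, num_vectors=1):
    """Raw float32 storage in bytes (4 bytes per component), no index overhead."""
    return dims * 4 * num_vectors


if __name__ == "__main__":
    # Default sizes: v3 Large = 3,072 dimensions, v3 Small = 1,536.
    print("v3 Large, 1M vectors:", float32_bytes(3072, 1_000_000), "bytes")
    print("v3 Small, 1M vectors:", float32_bytes(1536, 1_000_000), "bytes")

    # Toy vectors standing in for a query embedding and a chunk embedding.
    query = [0.5, 0.4, 0.3, 0.2]
    chunk = [0.4, 0.5, 0.1, 0.3]
    print("full-length similarity:", cosine_similarity(query, chunk))
    print("shortened similarity:", cosine_similarity(shorten(query, 2), shorten(chunk, 2)))
```

At the default sizes, a v3 Large vector needs twice the raw storage of a v3 Small one (12,288 versus 6,144 bytes in float32), and every similarity computation touches twice as many components, which is where v3 Small's cost advantage comes from.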