Introducing reranking to Pinecone Inference to simplify building accurate AI
Blog post from Pinecone
Pinecone Inference has added reranking to its API. A reranker scores each candidate document by its semantic relevance to a query, so an application can keep only the highest-scoring documents. The feature is currently in public preview and supports the bge-reranker-v2-m3 model. It is intended to improve accuracy, reduce hallucinations, and lower the cost of running AI models. In vector retrieval systems such as RAG applications, the reranker filters the documents returned by retrieval before they reach the LLM, so the model processes fewer, more relevant documents and needs fewer computational resources. Pinecone reports that this can cut input token costs by up to 85% when used with models like GPT-4. Because data can now be embedded, managed, queried, and reranked through a single API, developers need fewer separate tools and less infrastructure to build AI applications. Reranking is free during the public preview until August 31, 2024; after that it will cost $0.002 per request.
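The retrieve-then-rerank flow described above can be sketched in a few lines. This is a minimal, self-contained illustration, not Pinecone's API: `toy_relevance` is a hypothetical stand-in for a real reranking model such as bge-reranker-v2-m3 (it uses simple word overlap instead of semantic scoring), and `count_tokens` is a crude whitespace count rather than a real LLM tokenizer. The `Doc` type, the sample documents, and the helper names are all assumptions made for the example. What it shows is the mechanism: score every retrieved candidate against the query, keep the `top_n` best, and send fewer input tokens to the LLM as a result.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    text: str


def toy_relevance(query: str, text: str) -> float:
    """Hypothetical stand-in for a reranking model such as bge-reranker-v2-m3.

    Returns the fraction of query terms that also appear in the document.
    A real reranker scores semantic relevance, not word overlap.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(query: str, docs: list[Doc], top_n: int) -> list[tuple[float, Doc]]:
    """Score every candidate, sort by descending score, and keep the top_n."""
    scored = [(toy_relevance(query, d.text), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def count_tokens(text: str) -> int:
    # Crude whitespace count; a real system would use the LLM's own tokenizer.
    return len(text.split())


def input_token_savings(all_docs: list[Doc], kept_docs: list[Doc]) -> float:
    """Fraction of prompt tokens saved by sending only the kept documents."""
    before = sum(count_tokens(d.text) for d in all_docs)
    after = sum(count_tokens(d.text) for d in kept_docs)
    return 1 - after / before


# Hypothetical candidates, as they might come back from a first-stage vector query.
query = "how does pinecone rerank documents"
candidates = [
    Doc("a", "pinecone inference can rerank documents by relevance to a query"),
    Doc("b", "the weather today is sunny with light wind"),
    Doc("c", "vector databases store embeddings for similarity search"),
    Doc("d", "rerank documents with a cross encoder model"),
]

top = rerank(query, candidates, top_n=2)
savings = input_token_savings(candidates, [d for _, d in top])

for score, doc in top:
    print(f"{doc.id}: {score:.2f}")
print(f"input tokens saved: {savings:.0%}")
```

Sorting the full candidate list is fine here because a reranker usually sees only the few dozen documents that retrieval already returned; for much larger lists, `heapq.nlargest` would avoid a full sort. In a real deployment, the scoring step would be a call to the hosted reranking model rather than a local function.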