RAG, or Retrieval-Augmented Generation, improves AI agent accuracy by grounding responses in an external knowledge base, so the model does not need to hold all domain knowledge in its context. Before retrieval, the system runs a query rewriting step that transforms vague user requests into precise search queries. This improves retrieval quality, but the extra model call adds latency to every request.

To address this, the team implemented a "model racing" approach: the rewrite request is sent to multiple models simultaneously, and the first valid response is used. This reduced median latency from 326ms to 155ms, roughly halving the latency the rewrite step adds to the RAG pipeline. Racing also makes the system more resilient: if one external model provider has an outage, another racer still answers, so the system keeps serving real-time conversational traffic over large knowledge bases without compromising performance.