Retrieval Augmented Generation (RAG) systems are gaining popularity because they deliver precise answers by combining internal documents, databases, and knowledge bases with Large Language Models (LLMs). They address two key limitations of LLMs, bounded context windows and lack of access to private data, by injecting relevant data snippets into model prompts, which improves accuracy and reduces hallucinations.

A RAG pipeline consists of a retriever and a generator. The retriever embeds the user query as a vector and searches an index for the most relevant passages; the generator then uses those passages as context to craft an accurate response.

Building a scalable RAG architecture is crucial for handling growing data volumes and query loads efficiently, and typically relies on techniques such as adaptive chunking, vector databases, and parallel processing. Ongoing monitoring and regular index updates keep the system responsive and accurate, while evaluation tools like Deepchecks help maintain quality by assessing both the retrieval and generation components.

Agentic RAG, a more advanced variant, introduces autonomous agents that handle complex queries through planning and tool invocation. It offers greater flexibility and reasoning capability than traditional RAG, which remains the better fit for straightforward, high-volume queries.
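The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the bag-of-words `embed` function is a stand-in for a real embedding model (e.g. a sentence-transformer), the in-memory list stands in for a vector database, and the document strings are invented examples.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real retriever would call a
    # trained embedding model and store vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Retriever: rank documents by similarity to the query, keep top-k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Generator side: inject the retrieved snippets into the LLM prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]
print(build_prompt("What is the refund policy?", docs))
```

The resulting prompt would be sent to an LLM, which generates the final answer grounded in the retrieved snippets rather than in its training data alone.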