Airflow in Action: How SAP Delivers Trusted AI for Enterprise Clients
Blog post from Astronomer
At the Airflow Summit, Sagar Sharma from SAP described how his team built a production-grade Retrieval-Augmented Generation (RAG) pipeline on Apache Airflow to support Joule for Consultants, SAP's AI copilot. The system processes over 5 million documents from more than 15 data sources and, by drawing on SAP-specific knowledge, gives consultants a 30% productivity boost and 40% faster ABAP code interpretation. The team chose Airflow over alternatives such as Prefect, Dagster, and Flyte for its fast implementation, DevOps compatibility, managed service options, and strong community momentum, all of which fit their Python-native infrastructure. The pipeline evolved from a single hard-coded Directed Acyclic Graph (Dag) into a more flexible architecture that is configured through Airflow Variables and split into separate, parallel pipelines for ETL and data injection. The result is a scalable system that serves both production workloads and AI/ML experimentation. It consists of six modular stages (raw data ingestion, preprocessing, chunking, metadata extraction, PII redaction, and vector DB injection), with custom operators tailored to AI-specific tasks.
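To make the six stages concrete, here is a minimal plain-Python sketch of how they might chain together. It is not SAP's implementation: every function name, the `CONFIG` dictionary (a stand-in for values that would live in Airflow Variables), the sample document, and the in-memory list that plays the role of the vector DB are assumptions made for illustration. In an Airflow deployment, each function would more likely be a task or custom operator in a Dag.

```python
import re

# Hypothetical settings; in Airflow these could be read from Airflow Variables.
CONFIG = {"source": "demo_source", "chunk_words": 4}

# Deliberately naive email pattern, used only to illustrate PII redaction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def ingest(config):
    """Stage 1: raw data ingestion (a hard-coded sample, not a real source)."""
    return ["  Contact  jane.doe@example.com about the   ABAP report ZSALES.  "]


def preprocess(texts):
    """Stage 2: collapse runs of whitespace and trim each document."""
    return [" ".join(t.split()) for t in texts]


def chunk(texts, chunk_words):
    """Stage 3: split each document into chunks of at most chunk_words words."""
    chunks = []
    for t in texts:
        words = t.split()
        for i in range(0, len(words), chunk_words):
            chunks.append(" ".join(words[i:i + chunk_words]))
    return chunks


def extract_metadata(chunks, config):
    """Stage 4: attach simple metadata (source, position, word count)."""
    return [
        {
            "text": c,
            "metadata": {
                "source": config["source"],
                "chunk_index": i,
                "word_count": len(c.split()),
            },
        }
        for i, c in enumerate(chunks)
    ]


def redact_pii(records):
    """Stage 5: replace anything that looks like an email address."""
    return [
        {**r, "text": EMAIL_RE.sub("[REDACTED_EMAIL]", r["text"])}
        for r in records
    ]


def inject(records, store):
    """Stage 6: write records to the store (a list standing in for a vector DB)."""
    store.extend(records)
    return store


def run_pipeline(config):
    """Run the six stages in order and return the populated store."""
    texts = preprocess(ingest(config))
    chunks = chunk(texts, config["chunk_words"])
    records = redact_pii(extract_metadata(chunks, config))
    return inject(records, [])
```

Calling `run_pipeline(CONFIG)` returns the list of redacted, metadata-tagged chunks. Keeping each stage as a small function with plain inputs and outputs is one way to get the modularity the talk describes: a stage can be swapped or re-parameterized for an experiment without touching the others. Chunking on word boundaries here also means an email address is never split across two chunks before redaction runs.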