Testing Before Loading: WAP and AWAP
Blog post from dltHub
In data engineering, pipelines tend toward one of two extremes: strict pipelines halt on minor upstream changes, while permissive ones accept almost anything and accumulate technical debt. The post introduces the Audit-Write-Audit-Publish (AWAP) framework as a middle path that avoids the problems of both. AWAP adds an audit step before the write in the Write-Audit-Publish (WAP) pattern, which creates a two-gate validation system. The first gate performs syntactic validation at the row level and stops malformed records before they can trigger schema mutations. The second gate performs semantic validation at the batch level and catches anomalies that would corrupt data integrity even when every individual row is well-formed. Because it separates recoverable drift from destructive anomalies, AWAP allows necessary schema evolution while keeping data reliable. Using practical examples such as the Street Survey System, the post shows how AWAP filters out untrustworthy data before it reaches production, which removes the need for extensive post-hoc corrections. The result is a structured approach that keeps the production environment stable and minimizes the risk of state corruption.
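The two gates can be illustrated with a minimal Python sketch, assuming in-memory lists stand in for the staging and production tables. The schema (`REQUIRED_FIELDS`), the survey field names, the 10% rejection threshold, and the functions `audit_row`, `audit_batch`, and `awap` are hypothetical illustrations, not dlt's API or the post's own code. As a further assumption, extra columns are treated as recoverable drift and pass the first gate, while missing fields or wrong types count as malformed.

```python
from typing import Any

# Hypothetical required schema for a street-survey record (illustrative only).
REQUIRED_FIELDS = {"survey_id": str, "street": str, "pedestrian_count": int}


def audit_row(row: dict[str, Any]) -> bool:
    """Gate 1 (syntactic, per row): required fields present with exact types.

    Extra, unknown columns are allowed here (treated as additive drift).
    """
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in row or type(row[name]) is not expected_type:
            return False
    return True


def audit_batch(rows: list[dict[str, Any]], n_rejected: int, n_total: int,
                max_reject_ratio: float) -> tuple[bool, str]:
    """Gate 2 (semantic, per batch): checks that only make sense on the whole batch."""
    if n_total == 0:
        return False, "empty batch"
    if n_rejected / n_total > max_reject_ratio:
        return False, "too many malformed rows"
    ids = [r["survey_id"] for r in rows]
    if len(ids) != len(set(ids)):
        return False, "duplicate survey_id"
    if any(r["pedestrian_count"] < 0 for r in rows):
        return False, "negative pedestrian_count"
    return True, "ok"


def awap(batch: list[dict[str, Any]], staging: list[dict[str, Any]],
         production: list[dict[str, Any]],
         max_reject_ratio: float = 0.1) -> tuple[bool, str]:
    # Audit 1: filter rows that would break or mutate the schema.
    accepted = [r for r in batch if audit_row(r)]
    n_rejected = len(batch) - len(accepted)
    # Write: land accepted rows in staging, never directly in production.
    staging.clear()
    staging.extend(accepted)
    # Audit 2: validate the staged batch as a whole.
    ok, reason = audit_batch(staging, n_rejected, len(batch), max_reject_ratio)
    if not ok:
        staging.clear()  # discard the whole batch; production is untouched
        return False, reason
    # Publish: promote the staged batch to production.
    production.extend(staging)
    staging.clear()
    return True, "published"


if __name__ == "__main__":
    staging, production = [], []
    good = [
        {"survey_id": "a1", "street": "Main St", "pedestrian_count": 42},
        {"survey_id": "a2", "street": "Elm St", "pedestrian_count": 7},
    ]
    print(awap(good, staging, production))  # (True, 'published')
    bad = [{"survey_id": "b1", "street": "Oak St", "pedestrian_count": -5}]
    print(awap(bad, staging, production))   # (False, 'negative pedestrian_count')
    print(len(production))                  # 2
```

Holding the batch in staging until the second audit passes is what lets a bad batch be discarded as a unit instead of being patched after publication.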