Row vs. Batch Contracts: Using AWAP to Prevent Schema Scars and State Corruption with dlt
Blog post from dltHub
This post examines a recurring tension in data engineering: strict ingestion pipelines break on every minor upstream change, while permissive pipelines let harmful data through, leaving schema scars and corrupted pipeline state. As a middle ground, it introduces AWAP (Audit-Write-Audit-Publish), a two-gate validation pattern that separates syntactic from semantic checks. The first gate audits individual rows before they are written to staging; the second audits the staged batch as a whole before it is published, so safe schema evolution passes through while destructive anomalies are blocked.

With AWAP, production tables remain a reliable source of truth despite unavoidable upstream drift, and engineers avoid both constant breakage and error-prone manual intervention. The pattern is illustrated with a practical street-survey example built with dlt, showing how AWAP filters out row-level and batch-level data issues alike.
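The two-gate idea can be sketched in plain Python, independent of any particular framework. This is an illustrative sketch only, not the post's dlt implementation: the survey field names, the quarantine list, and the batch-size heuristic are all assumptions made up for the example.

```python
# Gate 1 (syntactic, row-level): is each individual row well-formed?
def audit_row(row: dict) -> bool:
    # Hypothetical street-survey schema: integer age, yes/no answer.
    return isinstance(row.get("age"), int) and row.get("answer") in {"yes", "no"}


# Gate 2 (semantic, batch-level): does the batch as a whole look sane?
def audit_batch(staged: list[dict], expected_min_rows: int = 2) -> bool:
    # Example heuristic: a sudden collapse in row count suggests an
    # upstream failure that would corrupt downstream state if published.
    return len(staged) >= expected_min_rows


def awap_publish(raw: list[dict], production: list[dict]) -> tuple[list[dict], bool]:
    # Audit -> Write: only syntactically valid rows reach staging;
    # the rest are quarantined rather than silently dropped.
    staged = [r for r in raw if audit_row(r)]
    quarantined = [r for r in raw if not audit_row(r)]
    # Audit -> Publish: the batch is promoted only if it passes gate 2.
    if audit_batch(staged):
        production.extend(staged)
        return quarantined, True
    return quarantined, False


raw_batch = [
    {"respondent_id": 1, "age": 34, "answer": "yes"},
    {"respondent_id": 2, "age": "n/a", "answer": "no"},  # fails gate 1: age is not an int
    {"respondent_id": 3, "age": 29, "answer": "yes"},
]

production_table: list[dict] = []
bad_rows, published = awap_publish(raw_batch, production_table)
print(published, len(production_table), len(bad_rows))  # → True 2 1
```

Note the separation of concerns: gate 1 can evolve with the schema (new optional columns pass through untouched), while gate 2 guards invariants of the whole batch, which is exactly where destructive anomalies such as truncated extracts are caught.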