Soda Data Quality
Blog post from Soda
Precisely's 2025 State of Data Integrity survey highlights persistent data quality problems: 77% of organizations rate their data quality as average or worse, and 67% do not fully trust their data for decision-making. Respondents cite inadequate tools for automating data quality processes as the root cause, but the post argues that the real problem is a syntax gap rather than a tooling gap. Data quality checks have traditionally been written in Python because it is flexible and widely used among data engineers, but that limits participation to people with programming expertise. YAML, in contrast, offers a more inclusive, declarative way to define checks, so non-engineering stakeholders such as governance leads and analysts can take part directly in defining what "good data" means. Because YAML is easy to read and write, fits into CI/CD workflows, and is compatible with data contracts, it is particularly effective for standard data quality validations, while Python remains necessary for more complex checks. This hybrid model broadens organizational involvement in data quality management, so issues can be addressed collaboratively and efficiently.
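
As a rough illustration of the hybrid model, the Python sketch below keeps the standard checks as declarative data and leaves one complex rule as ordinary code. Everything in it is an assumption made for illustration: the check type names (`row_count_min`, `no_missing`, `no_duplicates`), the `orders` dataset, and its columns are hypothetical and are not Soda's actual check syntax. The specification is written as a Python literal so the example runs with the standard library only; in a YAML-based workflow the same structure would live in a `.yml` file that non-engineers can edit.

```python
# Hypothetical check specification (not Soda's real syntax). In a YAML-based
# workflow this structure would be loaded from a .yml file; it is a Python
# literal here so the example needs no third-party packages.
CHECKS = {
    "dataset": "orders",
    "checks": [
        {"type": "row_count_min", "value": 1},
        {"type": "no_missing", "column": "order_id"},
        {"type": "no_duplicates", "column": "order_id"},
    ],
}


def run_checks(spec, rows):
    """Evaluate each declared check against rows (a list of dicts).

    Returns a list of (check_type, column, passed) tuples.
    """
    results = []
    for check in spec["checks"]:
        kind = check["type"]
        if kind == "row_count_min":
            passed = len(rows) >= check["value"]
        elif kind == "no_missing":
            passed = all(row.get(check["column"]) is not None for row in rows)
        elif kind == "no_duplicates":
            values = [row.get(check["column"]) for row in rows]
            passed = len(values) == len(set(values))
        else:
            raise ValueError(f"unknown check type: {kind}")
        results.append((kind, check.get("column"), passed))
    return results


def amounts_match_line_items(rows):
    """Complex rule kept in Python: each order total must equal the sum of
    its line items, which is awkward to express as a simple declaration."""
    return all(row["amount"] == sum(row["line_items"]) for row in rows)


# Small in-memory sample; the second and third rows share an order_id.
ROWS = [
    {"order_id": 1, "amount": 30, "line_items": [10, 20]},
    {"order_id": 2, "amount": 15, "line_items": [15]},
    {"order_id": 2, "amount": 5, "line_items": [5]},
]

if __name__ == "__main__":
    for kind, column, passed in run_checks(CHECKS, ROWS):
        print(kind, column, "PASS" if passed else "FAIL")
    print(
        "amounts_match_line_items",
        "PASS" if amounts_match_line_items(ROWS) else "FAIL",
    )
```

In this sketch, adding or tightening a standard check means editing the declarative `CHECKS` structure, while only the cross-field rule requires someone who writes Python.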