How To Monitor Data Quality in a Databricks Unity Catalog
Blog post from Soda
Soda integrates with Databricks so that users can define, execute, and monitor Data Contracts directly within Databricks. The integration draws on Soda's end-to-end data quality capabilities, including profiling, monitoring, and AI-assisted features, to create executable rules that keep data consistent and reliable. Soda connects to Databricks through a SQL Warehouse or a Spark session, and users can define Data Contracts using templates and automated profiling tools. Results from data quality checks are stored in a diagnostics warehouse within Databricks, where users can investigate and visualize data quality issues. The platform supports security features such as SOC 2 compliance and integrates with various tools for communication and metadata management. Its Contract Copilot converts natural language descriptions into executable checks, which lets business users participate in data governance. Soda does not natively provide data lineage; instead, it integrates with partner tools to cover lineage, offering a comprehensive data quality solution for Databricks environments.
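
To make the idea of a Data Contract as a set of "executable rules" more concrete, here is a minimal, library-agnostic sketch in plain Python. It does not use Soda's actual API or contract syntax, and it runs against an in-memory list rather than a Databricks table. Every name in it (`Check`, `no_missing`, `unique`, `min_row_count`, `verify`, and the sample `orders` rows) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative only: this is NOT Soda's API. A "contract" here is simply a
# list of named checks, each of which returns True (pass) or False (fail).
Row = dict


@dataclass(frozen=True)
class Check:
    name: str
    predicate: Callable[[list], bool]


def no_missing(column: str) -> Check:
    """Passes when no row has None (or no value at all) in `column`."""
    return Check(
        f"no_missing({column})",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def unique(column: str) -> Check:
    """Passes when every value in `column` appears exactly once."""
    def _pred(rows: list) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))
    return Check(f"unique({column})", _pred)


def min_row_count(n: int) -> Check:
    """Passes when the dataset has at least `n` rows."""
    return Check(f"min_row_count({n})", lambda rows: len(rows) >= n)


def verify(rows: list, checks: list) -> dict:
    """Run every check and return a mapping of check name to pass/fail."""
    return {c.name: c.predicate(rows) for c in checks}


# Hypothetical sample data standing in for a table.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": None},  # missing customer_id
    {"order_id": 2, "customer_id": "c"},   # duplicate order_id
]

contract = [min_row_count(1), no_missing("customer_id"), unique("order_id")]
results = verify(orders, contract)
failed = [name for name, ok in results.items() if not ok]
print(failed)
```

In a real deployment the rows would come from a query against a SQL Warehouse or a Spark session, and the per-check results would be written to a results store for later investigation. The sketch only shows the shape of the idea: declarative checks, one verification pass, and a named pass/fail result per check.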