
Soda Cleanse: Agentic Data Cleansing, Contract-Driven

Blog post from Soda

Post Details
Company: Soda
Date Published:
Author: https://www.linkedin.com/in/lauren-de-bruyn/
Word Count: 1,252
Language: English
Summary

Soda Cleanse is a data cleansing tool that extends Soda's existing capabilities from automated detection to automated remediation. Specialized AI agents propose fixes for data issues, and human data stewards approve them before anything changes. The tool is built on Soda Cloud and the Diagnostics Warehouse. Its contract-driven approach puts detection and remediation in the same data contract, which keeps the two consistent and traceable. Soda Cleanse targets four main failure types: entity normalization, imputation, deduplication, and reconciliation. Each is handled by a specialized agent that proposes fixes based on the contract's rules and context. The tool emphasizes human governance: every data change requires a steward's sign-off, and all actions are recorded in a comprehensive audit trail. Soda Cleanse is currently available in a limited Private Preview. It aims to streamline governance by progressively learning from approval histories, moving toward a more autonomous system.
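
To make the propose-then-approve flow concrete, here is a minimal Python sketch. It is not Soda's API: every name in it (`FailureType`, `ProposedFix`, `CleanseSession`, `normalization_agent`) is hypothetical, and the "agent" is a simple case-and-whitespace rule standing in for an AI agent. It only illustrates the governance pattern the summary describes: an agent proposes a fix, the data stays unchanged until a steward signs off, and every step lands in an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class FailureType(Enum):
    # The four failure types named in the summary.
    ENTITY_NORMALIZATION = "entity_normalization"
    IMPUTATION = "imputation"
    DEDUPLICATION = "deduplication"
    RECONCILIATION = "reconciliation"


@dataclass
class ProposedFix:
    # Hypothetical record of one proposed change to one cell.
    row_id: int
    column: str
    old_value: object
    new_value: object
    failure_type: FailureType
    status: str = "pending"  # pending -> approved | rejected


@dataclass
class CleanseSession:
    # Hypothetical container: rows maps row_id -> {column: value}.
    rows: dict
    audit_log: list = field(default_factory=list)

    def _log(self, action, fix, actor):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "actor": actor,
            "row_id": fix.row_id,
            "column": fix.column,
            "old": fix.old_value,
            "new": fix.new_value,
        })

    def propose(self, fix, agent):
        # Proposals are logged but never touch the data.
        self._log("proposed", fix, agent)
        return fix

    def review(self, fix, steward, approve):
        # Only a steward's approval applies the change.
        if fix.status != "pending":
            raise ValueError("fix already reviewed")
        if approve:
            self.rows[fix.row_id][fix.column] = fix.new_value
            fix.status = "approved"
        else:
            fix.status = "rejected"
        self._log(fix.status, fix, steward)


def normalization_agent(session, column, allowed):
    # Stand-in for an agent: proposes mapping values that differ from an
    # allowed value only by case or surrounding whitespace.
    canonical = {v.lower(): v for v in allowed}
    fixes = []
    for row_id, row in session.rows.items():
        value = row[column]
        if not isinstance(value, str) or value in allowed:
            continue
        target = canonical.get(value.strip().lower())
        if target is not None:
            fix = ProposedFix(row_id, column, value, target,
                              FailureType.ENTITY_NORMALIZATION)
            fixes.append(session.propose(fix, "normalization_agent"))
    return fixes
```

In this sketch the list of allowed values plays the role of a contract rule: the same rule that would flag a value as invalid also tells the agent what a valid replacement looks like. Values the rule cannot map, such as an unrecognized abbreviation, produce no proposal and are left for a person to resolve.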