Company
Date Published
Author
Robert Yokota
Word count
2087
Language
English
Hacker News points
None

Summary

A data contract is a formal agreement between upstream and downstream components on the structure and semantics of data. It promotes data consistency, quality, and security, particularly in regulated industries. Combining data contracts with encryption on streaming workloads shifts data governance responsibilities left to the data producers, so consumers can trust the data stream. The post shows how Confluent Schema Registry and other tools protect personally identifiable information (PII) in scenarios such as healthcare: enforcing data quality, routing invalid data to a dead letter queue (DLQ), applying simple masking functions, and implementing client-side encryption. It walks through defining and validating schemas, setting up data quality rules, and using the Common Expression Language (CEL) for data transformations and masking. It also describes client-side field-level encryption (CSFLE), which uses envelope encryption to safeguard sensitive data, with local keys used as a testing measure. Overall, the post emphasizes the role of data contracts in securing data, ensuring compliance, and making data more reliable for consumers.
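
The flow of a quality rule, a DLQ, and a masking function can be illustrated with a minimal sketch in plain Python. This is not the Confluent Schema Registry API or a CEL expression. The `ssn` field, the `SSN_PATTERN` rule, the `mask_ssn` helper, and the `ContractRouter` class are all hypothetical names chosen for illustration, and the in-memory lists stand in for Kafka topics.

```python
import re
from dataclasses import dataclass, field

# Hypothetical data quality rule: the ssn field must look like 123-45-6789.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def mask_ssn(ssn: str) -> str:
    """Simple masking function: keep only the last four digits."""
    return "XXX-XX-" + ssn[-4:]


@dataclass
class ContractRouter:
    """Applies the rule on the producer side; lists stand in for topics."""
    valid: list = field(default_factory=list)
    dlq: list = field(default_factory=list)

    def handle(self, record: dict) -> None:
        ssn = record.get("ssn")
        # Records that fail the quality rule go to the dead letter queue unchanged.
        if not isinstance(ssn, str) or not SSN_PATTERN.match(ssn):
            self.dlq.append(record)
            return
        # Valid records are masked before consumers ever see them.
        self.valid.append({**record, "ssn": mask_ssn(ssn)})


router = ContractRouter()
router.handle({"name": "patient-a", "ssn": "123-45-6789"})
router.handle({"name": "patient-b", "ssn": "not-an-ssn"})
print(router.valid)
print(router.dlq)
```

Checking and masking at the producer is what "shifting left" means here: the consumer reads only records that already satisfy the contract, while rejected records remain available in the DLQ for inspection.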