
Exploring schema evolution with ontology-driven propagation

Blog post from dltHub

Post Details
Company: dltHub
Author: Aman Gupta, Data Engineer
Word Count: 1,998
Language: English
Hacker News Points: -
Summary

The blog post explores ontology-driven schema evolution in data engineering. It highlights how hard evolving schemas are to manage and the risk that newly added columns expose sensitive information such as PII. The proposed solution is to describe an access policy in plain English, convert it into a natural-language ontology, and use that ontology as a runtime policy that evaluates each column. Deterministic interpreters handle the clear cases and LLMs handle the ambiguous ones; together they decide which columns to include in an analytics view, so that only analytics-safe data is exposed. The post demonstrates that the ontology provides a consistent policy framework that adapts automatically to schema changes: because the policy is kept separate from the code, it can be updated without altering the underlying pipeline. It emphasizes maintaining an ontology for data safety, since the ontology remains valid as schemas change, and suggests the approach is most useful for complex data patterns that simple pattern matching cannot resolve.
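
The column-by-column evaluation described in the summary can be sketched roughly as follows. This is a minimal illustration, not the post's implementation. The `ONTOLOGY` dictionary, the function names, and the regex patterns are all hypothetical, and the natural-language ontology is reduced here to two pattern lists. The LLM step is represented by a pluggable callable (`resolve_ambiguous`), so no real model API is assumed.

```python
import re
from typing import Callable, List, Literal

Decision = Literal["include", "exclude", "ambiguous"]

# Hypothetical, simplified stand-in for the ontology: the post describes a
# natural-language policy, which is reduced here to two regex lists.
ONTOLOGY = {
    "exclude_patterns": [r"email", r"phone", r"ssn", r"(^|_)name($|_)", r"address"],
    "include_patterns": [r"(^|_)id$", r"_at$", r"amount", r"count", r"status"],
}


def deterministic_decision(column: str) -> Decision:
    """Resolve the clear cases by pattern matching; exclusion rules win."""
    name = column.lower()
    if any(re.search(p, name) for p in ONTOLOGY["exclude_patterns"]):
        return "exclude"
    if any(re.search(p, name) for p in ONTOLOGY["include_patterns"]):
        return "include"
    return "ambiguous"


def conservative_resolver(column: str) -> Decision:
    """Placeholder for the LLM step (assumption): excludes anything unclear."""
    return "exclude"


def select_columns(
    columns: List[str],
    resolve_ambiguous: Callable[[str], Decision] = conservative_resolver,
) -> List[str]:
    """Return the columns judged analytics-safe under the policy."""
    safe = []
    for col in columns:
        decision = deterministic_decision(col)
        if decision == "ambiguous":
            # Only columns the patterns cannot settle reach the resolver.
            decision = resolve_ambiguous(col)
        if decision == "include":
            safe.append(col)
    return safe


def build_view_sql(view: str, table: str, columns: List[str]) -> str:
    """Render a view definition that exposes only the selected columns."""
    if not columns:
        raise ValueError("no analytics-safe columns to expose")
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'CREATE OR REPLACE VIEW "{view}" AS SELECT {cols} FROM "{table}"'
```

When the schema gains a column, rerunning `select_columns` on the new column list and regenerating the view is all that changes. The policy itself stays untouched, which is the separation of policy from pipeline code that the summary describes. Defaulting ambiguous columns to exclusion is a design choice made for this sketch, not something the source specifies.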