Setting additionalProperties to True in Iglu JSON Schemas
Blog post from Snowplow
Defining event and context schemas in Iglu means balancing schema flexibility against data integrity. At the center of that trade-off is the JSON Schema keyword additionalProperties, which controls whether a JSON instance may contain properties that the schema does not declare. Setting it to true allows new properties to be added without a schema update, which reduces overhead in rapidly changing environments.

The flexibility has a cost. In Snowplow data processing, the extra properties pass validation and are retained in the raw JSON of the event, but they are not loaded into data warehouses such as Redshift. They are therefore missing from downstream data models and cannot be analyzed without custom processing. This also complicates data governance and quality management, because undeclared fields are neither documented nor type-checked by the schema.

Best practices are to use this flexibility selectively, to use structured metadata to maintain schema integrity, and to establish a versioning strategy for managing new properties. Ultimately, this approach can speed up data collection, but it requires careful management to avoid compromising data quality, especially in production-grade pipelines.
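To make the trade-off concrete, here is a minimal Python sketch, not Snowplow's or Iglu's actual code. It assumes a hypothetical com.example/button_click schema and models only two things: the top-level additionalProperties check (boolean form only), and a simplified loader that creates warehouse columns solely for declared properties. Real validation also checks types, required fields, and nested schemas.

```python
"""Sketch: how additionalProperties affects validation and what reaches the warehouse.

Assumptions (illustrative only, not Snowplow or Iglu code):
- the vendor, event name, and fields below are hypothetical;
- only top-level keys are checked, not types or nested schemas;
- "warehouse columns" are modeled as the declared properties only.
"""

from typing import Any

# A hypothetical Iglu-style self-describing schema (vendor/name are made up).
BUTTON_CLICK_SCHEMA: dict[str, Any] = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.example",
        "name": "button_click",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "button_id": {"type": "string"},
        "page": {"type": "string"},
    },
    "additionalProperties": True,
}


def undeclared_keys(schema: dict[str, Any], instance: dict[str, Any]) -> set[str]:
    """Return top-level keys in `instance` that the schema does not declare."""
    return set(instance) - set(schema.get("properties", {}))


def passes_additional_properties(schema: dict[str, Any], instance: dict[str, Any]) -> bool:
    """Apply only the boolean form of the additionalProperties rule.

    JSON Schema treats a missing additionalProperties as allowing extra keys,
    so only an explicit False rejects them. A schema-valued
    additionalProperties is not checked by this sketch.
    """
    if schema.get("additionalProperties", True) is False:
        return not undeclared_keys(schema, instance)
    return True


def warehouse_row(schema: dict[str, Any], instance: dict[str, Any]) -> dict[str, Any]:
    """Simplified loader: only declared properties become columns."""
    return {key: instance.get(key) for key in schema["properties"]}


event = {"button_id": "signup", "page": "/pricing", "experiment": "b"}

print(passes_additional_properties(BUTTON_CLICK_SCHEMA, event))  # True
print(undeclared_keys(BUTTON_CLICK_SCHEMA, event))  # {'experiment'}
print(warehouse_row(BUTTON_CLICK_SCHEMA, event))  # {'button_id': 'signup', 'page': '/pricing'}

# The same schema with additionalProperties set to False rejects the event.
strict = {**BUTTON_CLICK_SCHEMA, "additionalProperties": False}
print(passes_additional_properties(strict, event))  # False
```

With additionalProperties set to true, the hypothetical experiment field is accepted but never becomes a column. Setting it to false turns the same field into a validation failure, which is the data-integrity side of the trade-off.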