Company
Date Published
Author
Tom Seddon, Matt Mangia, Gil Friedlis
Word count
2314
Language
English
Hacker News points
None

Summary

The text discusses the implementation of a flexible, managed repository for Protobuf schemas in an event streaming platform built with Apache Kafka and MongoDB. The goal was to provide strong guarantees on data quality and schema evolution while supporting service decomposition work and analytical data needs. The authors investigated various encoding formats and initially favored Avro for its support for managing schema evolution with backwards and forwards compatibility, but discounted it due to its lack of cross-language support. They chose Protobuf instead, which was already in use on the platform.

To keep schema evolution safe, the team implemented unit tests that enforce rules on message structure and field numbers. They also created a centralized repository for schemas and introduced custom options in the Protobuf IDL to attach topic metadata and enforce the relationship between schemas and topics. Finally, they developed a Producer API that performs schema/topic validation before forwarding messages to Kafka, enforcing the relationship between producers, topics, and schemas. Together these mechanisms give the platform the data quality and schema evolution guarantees the team set out to achieve; illustrative sketches of each follow.
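A minimal sketch of the field-number guard, assuming Python, a checked-in JSON snapshot of each schema's field numbers, and illustrative file paths; none of these specifics come from the post, which only says such tests exist:

```python
# Hypothetical sketch: a unit test guarding Protobuf schema evolution by
# checking that existing field names never change number or disappear.
# PROTO_PATH and SNAPSHOT_PATH are assumed, illustrative locations.
import json
import re
import unittest
from pathlib import Path

PROTO_PATH = Path("schemas/orders/order_created.proto")
SNAPSHOT_PATH = Path("schemas/orders/order_created.lock.json")

# Matches lines such as "string order_id = 1;" or "repeated int32 ids = 2;".
FIELD_RE = re.compile(
    r"^\s*(?:optional|repeated)?\s*[\w.]+\s+(\w+)\s*=\s*(\d+)\s*[;\[]",
    re.MULTILINE,
)

def field_numbers(proto_source: str) -> dict:
    """Extract {field_name: field_number} pairs from .proto source text."""
    return {name: int(num) for name, num in FIELD_RE.findall(proto_source)}

class SchemaEvolutionTest(unittest.TestCase):
    def test_field_numbers_are_stable(self):
        current = field_numbers(PROTO_PATH.read_text())
        snapshot = json.loads(SNAPSHOT_PATH.read_text())
        for name, number in snapshot.items():
            # New fields may be added, but a field already in the snapshot
            # must keep both its name and its number.
            self.assertEqual(current.get(name), number,
                             f"field '{name}' changed number or was removed")

if __name__ == "__main__":
    unittest.main()
```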
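The custom options the summary mentions are declared in the Protobuf IDL itself, by extending google.protobuf.MessageOptions. In this sketch the TopicOptions message, the option name, and the field values are illustrative rather than the post's actual definitions:

```protobuf
// Hypothetical sketch of a custom option tying a schema to its Kafka topic.
syntax = "proto3";

package events;

import "google/protobuf/descriptor.proto";

// Topic metadata that every message schema must carry.
message TopicOptions {
  string name = 1;       // Kafka topic this schema is published to
  int32 partitions = 2;  // expected partition count
}

extend google.protobuf.MessageOptions {
  TopicOptions topic = 50001;  // field number from the org extension range
}

message OrderCreated {
  option (events.topic) = { name: "orders.order_created", partitions: 12 };

  string order_id = 1;
  string customer_id = 2;
}
```

Tooling can then read the option back from each message's descriptor, which is what lets a central repository or CI job verify that every schema declares the topic it belongs to.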
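And a rough Python sketch of the Producer API's gate, assuming the kafka-python client and a plain dict standing in for the central schema repository; the post's actual Producer API is a separate service and may work quite differently:

```python
# Hypothetical sketch: reject a message unless its Protobuf type is the one
# registered for the target topic, then forward it to Kafka.
from kafka import KafkaProducer  # kafka-python; an assumed client choice

# Stand-in for the central topic -> schema registry described in the post.
TOPIC_REGISTRY = {
    "orders.order_created": "events.OrderCreated",
}

class SchemaMismatchError(Exception):
    pass

class ValidatingProducer:
    def __init__(self, bootstrap_servers: str):
        self._producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

    def send(self, topic: str, message) -> None:
        # Every generated Protobuf message exposes its fully qualified
        # type name via message.DESCRIPTOR.full_name.
        schema_name = message.DESCRIPTOR.full_name
        if TOPIC_REGISTRY.get(topic) != schema_name:
            raise SchemaMismatchError(
                f"{schema_name} is not registered for topic {topic}")
        self._producer.send(topic, message.SerializeToString())
```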