Remote Read Replicas: Read-only topics in Tiered Storage
Blog post from Redpanda
Redpanda's Tiered Storage, introduced in 2021, reduces storage costs by offloading log segments to cloud object storage. It works with AWS S3, GCP Cloud Storage, and Azure Blob Storage, and with MinIO for development. The archived data is portable and self-sufficient because the bucket holds topic and partition manifests alongside the log segments. The v22.2 release builds on this with Remote Read Replicas (RRR): read-only topics on a separate cluster that serve archived data from the cloud, taking read load off the production cluster. Users can create a separate cluster for consumers, size it independently, and add consumers without affecting production performance. Internally, the ntp_archiver, running within the scheduler_service, manages uploads, downloads, and synchronization between the local and remote clusters. The same framework can support applications such as offline machine learning training and edge streaming CDNs, and a feature manager keeps clusters compatible across updates.
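The upload-and-sync flow described above can be pictured with a small in-memory model. This is only a conceptual sketch, not Redpanda's implementation: the class names (ObjectStore, OriginPartition, ReadReplicaPartition), the object key layout, and the manifest format are all hypothetical, chosen to show how a read-only replica might serve reads purely from segments and a manifest held in a bucket.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for a cloud bucket (S3, GCS, Azure Blob, MinIO)."""
    objects: dict = field(default_factory=dict)


@dataclass
class OriginPartition:
    """Partition on the production cluster that archives closed segments."""
    name: str
    store: ObjectStore
    next_offset: int = 0

    def upload_segment(self, records):
        records = list(records)
        if not records:
            return  # nothing to archive
        # Upload the segment first, then rewrite the manifest to list it.
        # The key layout here is an assumption made for illustration.
        segment_key = f"{self.name}/{self.next_offset}.log"
        self.store.objects[segment_key] = records
        manifest_key = f"{self.name}/manifest"
        manifest = self.store.objects.get(manifest_key, [])
        self.store.objects[manifest_key] = manifest + [(self.next_offset, segment_key)]
        self.next_offset += len(records)


@dataclass
class ReadReplicaPartition:
    """Read-only view on a separate cluster, backed only by the bucket."""
    name: str
    store: ObjectStore
    manifest: list = field(default_factory=list)

    def sync(self):
        # Refresh the local copy of the manifest from the bucket.
        self.manifest = list(self.store.objects.get(f"{self.name}/manifest", []))

    def read(self, offset):
        # Find the last known segment whose base offset is <= the requested one.
        for base, segment_key in reversed(self.manifest):
            if base <= offset:
                segment = self.store.objects[segment_key]
                index = offset - base
                if index < len(segment):
                    return segment[index]
                break
        raise KeyError(f"offset {offset} is not in the synced manifest")
```

In this model the replica never contacts the origin cluster: it sees new data only after the origin has uploaded a segment and the replica has called sync() again, which is the property that lets consumers be added without loading production.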