Disaster Recovery in 60 Seconds: A POC for Seamless Client Failover on Confluent Cloud
Blog post from Confluent
This proof of concept (POC) explores seamless client failover for Apache Kafka on Confluent Cloud, where disaster recovery is complicated by the need to keep both the recovery point objective (RPO) and the recovery time objective (RTO) low. Using Confluent Cloud Gateway, a self-managed Kafka protocol proxy, the POC orchestrates disaster recovery failover within 60 seconds by redirecting traffic from a failed active cluster to a passive one, with no client-side changes. It highlights the challenges of keeping service continuous when failover has to be handled on both the client side and the server side, and shows how a gateway addresses them. The POC combines the gateway with Cluster Linking for replication and supports failover and failback with a single click, with limitations such as single-region support and the need for manual operations. Confluent Cloud Gateway aims to simplify Kafka connectivity by giving clients a stable, intelligent entry point that automatically reroutes them during outages, reducing recovery times and improving continuity. An integrated client switchover feature is planned for the future.
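To make the routing idea concrete, here is a minimal sketch in Python of why clients need no changes during failover: they hold one stable gateway address, and only the gateway's upstream target moves. This is a toy model, not the Confluent Cloud Gateway implementation or its API. The class `GatewayRouter`, its methods, and all hostnames are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatewayRouter:
    """Toy model of a proxy that hides the live cluster from clients.

    All names here are hypothetical; this is not a real gateway API.
    """
    active: str                # assumed bootstrap address of the active cluster
    passive: str               # assumed bootstrap address of the passive cluster
    failed_over: bool = False  # False: route to active, True: route to passive

    def upstream(self) -> str:
        """Return the cluster that client traffic is currently routed to."""
        return self.passive if self.failed_over else self.active

    def failover(self) -> None:
        """Send traffic to the passive cluster (the 'single click')."""
        self.failed_over = True

    def failback(self) -> None:
        """Return traffic to the original active cluster."""
        self.failed_over = False


# The only address clients are configured with (hypothetical). It stays
# constant across failover and failback.
CLIENT_BOOTSTRAP = "gateway.example.com:9092"

router = GatewayRouter(
    active="active-cluster.example.com:9092",
    passive="passive-cluster.example.com:9092",
)

before = router.upstream()   # traffic goes to the active cluster
router.failover()
during = router.upstream()   # traffic now goes to the passive cluster
router.failback()
after = router.upstream()    # traffic is back on the active cluster
```

In this sketch `CLIENT_BOOTSTRAP` never changes while `router.upstream()` does, which is the property the POC relies on. A real proxy would also have to deal with in-flight connections, authentication, and replicated offsets, none of which is modeled here.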