Scaling Elasticsearch Across Data Centers With Kafka

Post Details

Company

Elastic

Date Published

Dec. 1, 2015

Author

Dara Gies

Word Count

1,090

Source URL

www.elastic.co/blog/scaling_elasticsearch_across_data_centers_with_kafka

Summary

Organizations often need to manage and replicate data across multiple regions because of local security, privacy, and performance requirements. This raises challenges around high availability, fault tolerance, and varying network conditions between sites. The post works through candidate architectures for a creative agency with offices in New York and London, where media assets are created locally but must be searchable from both data centers. A single Elasticsearch cluster stretched across both data centers is discouraged, because a network failure between the sites can leave the nodes unable to stay in sync. Independent Elasticsearch clusters joined by Tribe Nodes, or a single Kafka cluster shared by both sites, are alternatives, but each has drawbacks, such as added search latency and a dependency on the inter-site network.

The recommended architecture keeps an independent Elasticsearch cluster and an independent Kafka cluster in each data center and synchronizes data between them with Kafka MirrorMaker or Logstash. This replicates data reliably, lets each site keep operating on its own when the link between them fails, and supports disaster recovery because each location can serve as a backup for the other. Elasticsearch does not natively support consistent replication over high-latency networks, but combining it with Kafka provides a robust solution for this use case.
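To show why the per-site layout tolerates a broken link, here is a minimal in-memory simulation. It is an illustrative sketch, not the post's implementation. The Python lists stand in for each site's Kafka topics, a dict stands in for each local Elasticsearch index, `Mirror` plays the role of a MirrorMaker-style copy step, and `run_indexer` plays the role of a Logstash pipeline feeding the local cluster. All class, method, topic, and document names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """One site with its own (simulated) Kafka topics and Elasticsearch index."""
    name: str
    local_topic: list = field(default_factory=list)   # events produced at this site
    mirror_topic: list = field(default_factory=list)  # events copied from the other site
    index: dict = field(default_factory=dict)         # stand-in for the local ES index
    local_offset: int = 0   # indexer's read position in local_topic
    mirror_offset: int = 0  # indexer's read position in mirror_topic

    def produce(self, doc_id, body):
        # Applications write only to their own site's Kafka cluster.
        self.local_topic.append({"id": doc_id, "body": body, "origin": self.name})

    def run_indexer(self):
        # Stand-in for Logstash: drain new events from both topics into the local index.
        for event in self.local_topic[self.local_offset:]:
            self.index[event["id"]] = event
        self.local_offset = len(self.local_topic)
        for event in self.mirror_topic[self.mirror_offset:]:
            self.index[event["id"]] = event
        self.mirror_offset = len(self.mirror_topic)


class Mirror:
    """Stand-in for MirrorMaker: copies source.local_topic into target.mirror_topic."""

    def __init__(self, source, target):
        self.source, self.target, self.offset = source, target, 0

    def run(self, link_up=True):
        if not link_up:
            return  # WAN link down: nothing is copied, and the offset is kept for later
        pending = self.source.local_topic[self.offset:]
        self.target.mirror_topic.extend(pending)
        self.offset += len(pending)


ny, ldn = DataCenter("new-york"), DataCenter("london")
ny_to_ldn, ldn_to_ny = Mirror(ny, ldn), Mirror(ldn, ny)

ny.produce("asset-1", "logo.psd")
ldn.produce("asset-2", "banner.png")

# First pass simulates a WAN outage; second pass simulates the link coming back.
for link_up in (False, True):
    ny_to_ldn.run(link_up)
    ldn_to_ny.run(link_up)
    ny.run_indexer()
    ldn.run_indexer()
    print(link_up, sorted(ny.index), sorted(ldn.index))
```

During the simulated outage each site still indexes and serves its own assets. Once the link returns, each mirror resumes from its saved offset, so both indexes converge on the same set of documents. Keeping the offsets per consumer, rather than pushing writes directly across the WAN, is what lets each site run independently.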