Deduping HA Prometheus Samples in Cortex

Post Details

Company

Grafana Labs

Date Published

Oct. 3, 2019

Author

Callum Styan

Word Count

1,110

Language

English

Hacker News Points

-

Source URL

grafana.com/blog/deduping-ha-prometheus-samples-in-cortex

Summary

Cortex can ingest samples from high-availability (HA) Prometheus setups without storing duplicate data, which addresses the redundancy that arises when multiple Prometheus instances scrape the same targets. Multiple Prometheus replicas can write to the same Cortex instance as long as each sets external labels, specifically prom_ha_cluster and prom_ha_instance, that identify its cluster and replica so that the time series can be deduplicated before storage. Deduplication happens in the Cortex distributor, which uses a Compare and Swap (CAS) operation on a distributed key-value store such as Consul to handle election and failover of replicas, so that only one replica's data per cluster is stored at a time. This reduces storage costs and avoids client-visible errors: non-elected replicas still receive a successful status code for their remote write calls, so their logs do not fill with spurious errors. Future plans include optimizations that reduce interactions with the key-value store, making the handling of HA Prometheus data more efficient.
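The election and failover logic described in the summary can be illustrated with a minimal Python sketch. It is not Cortex's actual code (Cortex is written in Go). The names `InMemoryKV`, `Elected`, `Distributor`, and `accept` are hypothetical, the in-memory store only stands in for a CAS-capable store such as Consul, and the 30-second failover timeout is an assumed value rather than one taken from the post.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Elected:
    """Hypothetical record of which replica is currently elected for a cluster."""
    replica: str
    last_seen: float  # timestamp of the last accepted write, in seconds


class InMemoryKV:
    """Stand-in for a distributed KV store such as Consul; exposes get and CAS only."""

    def __init__(self) -> None:
        self._data: Dict[str, Elected] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Elected]:
        with self._lock:
            return self._data.get(key)

    def cas(self, key: str, expected: Optional[Elected], new: Elected) -> bool:
        """Store `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


class Distributor:
    """Decides, per remote write, whether a replica's samples should be stored."""

    def __init__(self, kv: InMemoryKV, failover_timeout: float = 30.0) -> None:
        self.kv = kv
        self.failover_timeout = failover_timeout  # assumed value, in seconds

    def accept(self, labels: Dict[str, str], now: float) -> bool:
        cluster = labels.get("prom_ha_cluster")
        replica = labels.get("prom_ha_instance")
        if cluster is None or replica is None:
            return True  # not an HA pair: store the samples as usual

        while True:
            current = self.kv.get(cluster)
            elected_is_other = current is not None and current.replica != replica
            if elected_is_other and now - current.last_seen <= self.failover_timeout:
                # Another replica is elected and still live: drop these samples.
                # The caller would still answer the remote write with a success code.
                return False
            # Either no replica is elected, this replica is the elected one,
            # or the elected replica has timed out: try to (re)claim the key.
            if self.kv.cas(cluster, current, Elected(replica, now)):
                return True
            # Lost a race with a concurrent writer: re-read and decide again.


if __name__ == "__main__":
    d = Distributor(InMemoryKV())
    a = {"prom_ha_cluster": "team1", "prom_ha_instance": "a"}
    b = {"prom_ha_cluster": "team1", "prom_ha_instance": "b"}
    print(d.accept(a, now=0.0))   # True: first writer is elected
    print(d.accept(b, now=1.0))   # False: deduplicated
    print(d.accept(b, now=40.0))  # True: "a" silent for more than 30 s, failover
```

In this sketch every remote write touches the key-value store, which is the kind of traffic the future optimizations mentioned above aim to reduce.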