Company
Date Published
Author
Jay Lim
Word count
3281
Language
English
Hacker News points
11

Summary

This blog post by Jay Lim describes the development of SQL Prober, a black-box monitoring tool for Managed CockroachDB, built during an internship at Cockroach Labs. The need for the tool arose after an incident showed that existing white-box monitoring, which relies on metrics drawn from the system's internals, was limited in its ability to detect failures and measure their impact. SQL Prober aims to reduce Mean Time to Detect (MTTD) and to accurately assess the uptime of customer clusters by emulating the customer experience, and it uses geo-partitioning to ensure that monitoring covers every node. The framework periodically runs probe tasks that verify each node is live and able to serve data, using replication zones to address the challenges of ensuring coverage and data integrity without affecting customer data. The post also explores potential future enhancements, such as improved metric analysis and the inclusion of write operations in monitoring.