Company
Date Published
Author
Umut Uzgur, Tim Nolet
Word count
258
Language
English
Hacker News points
None

Summary

Checks failed to schedule because of a downstream problem with AWS SNS in the us-west-1 region. Around 200 checks failed across all customers with an error message similar to "503: null". The root cause was slow responses, and eventually 503 errors, returned by calls to AWS SNS. No specific trigger was identified. Logging detected the elevated error rates, and the issue resolved itself without manual intervention.
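
The summary does not say how the logging surfaced the elevated error rate. As a rough illustration only, one common approach is to track the share of failed calls over a window of recent responses. The class name `ErrorRateWindow`, the window size, and the threshold below are all hypothetical and are not taken from the incident report.

```python
from collections import deque


class ErrorRateWindow:
    """Tracks the share of failed calls among the most recent `size` calls.

    Hypothetical helper for illustration; not the tooling used in the incident.
    """

    def __init__(self, size: int, threshold: float) -> None:
        # Oldest results drop out automatically once `size` is reached.
        self.results = deque(maxlen=size)  # True means the call failed
        self.threshold = threshold

    def record(self, status_code: int) -> None:
        # Treat any 5xx response (such as a 503) as a failure.
        self.results.append(status_code >= 500)

    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def is_elevated(self) -> bool:
        return self.error_rate() > self.threshold


# Assumed values: a window of 10 calls and a 20% alert threshold.
window = ErrorRateWindow(size=10, threshold=0.2)
for code in [200, 200, 503, 200, 503, 503, 200, 200, 200, 200]:
    window.record(code)

print(window.error_rate(), window.is_elevated())
```

With three 503 responses among the ten recorded calls, the rate exceeds the assumed 20% threshold and `is_elevated()` reports it. A count-based window is the simplest choice here; a time-based window would behave better when call volume varies.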