Content Deep Dive
Debugging war story: the mystery of NXDOMAIN
Blog post from Cloudflare
Post Details
Company: Cloudflare
Date Published:
Author: Ivan Babrou
Word Count: 1,621
Language: English
Hacker News Points: -
Summary
The blog post recounts a debugging effort on Cloudflare's Mesos-based cluster, which is used primarily to process logs and detect attacks. Engineers found that internal DNS queries were returning "no such host" (NXDOMAIN) errors for domains that did exist. Through extensive testing and analysis, they traced the problem to packet loss during DNS resolution attempts. The fix was to raise the number of retries configured in resolv.conf (on Linux with glibc, this setting is the `attempts` option), so that the resolver tolerates transient network problems and DNS resolution becomes more reliable.
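
To illustrate why more retries help against packet loss, here is a minimal sketch, not taken from the post. It assumes each attempt is lost independently with the same probability, which real networks only approximate, and that a lookup fails only when every attempt is lost. The function name `all_attempts_fail` and the 1% loss rate are hypothetical, chosen for illustration.

```python
def all_attempts_fail(loss_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` queries is lost.

    Assumption (not from the post): losses are independent and each
    attempt is lost with the same probability `loss_rate`.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Independent events: multiply the per-attempt loss probability.
    return loss_rate ** attempts


if __name__ == "__main__":
    # Hypothetical 1% per-attempt loss; compare a few retry counts.
    for attempts in (1, 2, 3, 5):
        print(attempts, all_attempts_fail(0.01, attempts))
```

Under these assumptions the failure probability shrinks geometrically with each extra attempt, which is why a small increase in the retry count can mask transient loss. The cost is a longer worst-case lookup time, since each attempt waits for its timeout before the next one is sent.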