Company
Date Published
Author
Darron Froese
Word count
645
Language
English
Hacker News points
None

Summary

Darron Froese, a Site Reliability Engineer, recently shared an article with the Datadog community. The author's team had been hosting their nonfiction sites on Rackspace Cloud instances, which had proven reliable for their workloads. However, one server had been experiencing issues, so the team took a closer look to identify the problem. They discovered that some Apache processes were using excessive amounts of memory, leading to potential database issues if those pages were missing. To gain more insight, the team used Datadog's DogStatsD to monitor the memory usage of their Apache processes. By analyzing this data, they were able to identify and address the problem: they set limits on process memory and implemented a script that sends the information to Datadog for further analysis. The team learned that keeping MinSpareServers and MaxSpareServers relatively low can help reduce overall memory usage, and that, given the right visibility into system performance, even seemingly minor issues can be shown to have significant repercussions.
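The summary does not reproduce the author's script, so the following is only a minimal sketch of how per-process Apache memory could be reported to DogStatsD. It assumes a Linux host where `ps -C` is available, a process name of `apache2`, and a local Datadog Agent listening for DogStatsD on UDP port 8125. The metric names (`apache.process.rss_kib.*`) and all function names are hypothetical, chosen for illustration.

```python
import socket
import subprocess


def parse_ps_rss(ps_output):
    """Parse headerless `ps` output of "PID RSS" pairs into {pid: rss_kib}."""
    rss_by_pid = {}
    for line in ps_output.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing the whole run.
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue
        rss_by_pid[int(parts[0])] = int(parts[1])
    return rss_by_pid


def gauge_datagram(metric, value, tags=None):
    """Build a DogStatsD gauge datagram: name:value|g|#tag1,tag2."""
    payload = f"{metric}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def report_apache_memory(process_name="apache2", host="127.0.0.1", port=8125):
    """Send count, total, and max resident memory of Apache processes.

    Assumptions: Linux procps `ps`, process name "apache2", and a local
    DogStatsD listener on UDP 8125. Metric names are hypothetical.
    """
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "pid=", "-o", "rss="],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when no process matches
    )
    rss_by_pid = parse_ps_rss(result.stdout)
    if not rss_by_pid:
        return 0

    values = list(rss_by_pid.values())
    metrics = {
        "apache.process.count": len(values),
        "apache.process.rss_kib.total": sum(values),
        "apache.process.rss_kib.max": max(values),
    }
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for name, value in metrics.items():
            # UDP is fire-and-forget; a missing agent does not raise here.
            sock.sendto(gauge_datagram(name, value), (host, port))
    finally:
        sock.close()
    return len(metrics)
```

Calling `report_apache_memory()` on a schedule (for example from cron) would give a time series of the largest and total Apache resident memory, which is the kind of visibility the summary describes when tuning MinSpareServers and MaxSpareServers. On distributions where the Apache binary is named `httpd`, pass `process_name="httpd"`.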