Company
Date Published
Author
Alexis Lê-Quôc
Word count
476
Language
English
Hacker News points
None

Summary

AWS runs multiple virtual instances on the same underlying hardware, which lowers compute costs. The trade-off is inconsistent performance: the hypervisor has to share physical CPU cycles among those instances, and the cycles an instance wanted but did not receive show up as "CPU steal" or "stolen CPU". To detect this, AWS users can track the system metric "system.cpu.stolen" in Datadog, which measures the percentage of cycles reclaimed by the hypervisor. Reading this metric alongside "system.cpu.idle" helps determine whether the steal comes from the instance reaching its own CPU quota or from other tenants on the same hardware requesting more cycles than are available. This visibility into AWS CPU utilization is available within minutes of signing up for Datadog.
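
To make the two percentages concrete, here is a minimal sketch that derives steal and idle percentages from two samples of the aggregate "cpu" line in Linux's /proc/stat. It assumes a Linux guest and the standard counter order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical, and whether the Datadog Agent computes "system.cpu.stolen" and "system.cpu.idle" in exactly this way is an assumption, not something the summary states.

```python
import time


def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_and_idle_pct(before, after):
    """Return (steal %, idle %) for the interval between two 'cpu' lines."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Only the first eight counters are summed: the guest counters that may
    # follow are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    idle = deltas[3]
    # Very old kernels do not report a steal counter; treat it as zero there.
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100 * steal / total, 100 * idle / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()


def sample_steal_and_idle(interval_seconds=1.0):
    """Hypothetical helper: sample /proc/stat twice and return (steal %, idle %)."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return steal_and_idle_pct(before, read_cpu_line())
```

On a Linux instance, calling sample_steal_and_idle() returns the two values for the last second. Any thresholds used to decide what counts as "high" steal or "low" idle would be a judgment call for each workload and are deliberately left out of the sketch.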