AWS has expanded its ecosystem to include a range of services that help manage and automate complex cloud environments. Monitoring these services means tracking key metrics that provide insight into performance, usage, and resource utilization. For EC2 instances, these include CPUUtilization, DiskReadBytes, and StatusCheckFailed. For Elastic Load Balancing, they include RequestCount, SurgeQueueLength, HTTPCode_ELB_5XX, Latency, HealthyHostCount, and UnHealthyHostCount. For RDS, they include FreeStorageSpace, DatabaseConnections, ReadLatency, WriteLatency, and DiskQueueDepth. For ElastiCache, they include CurrConnections, SetTypeCmds, GetTypeCmds, CacheHits, CacheMisses, Evictions, and SwapUsage. For ECS, they include the desired task count versus the running task count per service, along with MemoryUtilization and CPUUtilization. For EKS, they include node status, memory utilization, and disk utilization. For Lambda, they include ProvisionedConcurrencyInvocations, ProvisionedConcurrencyUtilization, Duration, Errors, Invocations, ConcurrentExecutions, Throttles, and Provisions.
Monitoring these metrics helps confirm that your AWS services are functioning properly. Adopting an automated, scalable AWS monitoring strategy also lets you keep tabs on your infrastructure even as hosts and services dynamically scale and update in real time.
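
As a rough illustration of automating this, the sketch below builds CloudWatch `GetMetricData` queries for whatever resources exist at the moment, so the same code keeps working as hosts come and go. The names `METRIC_CATALOG`, `build_queries`, and `fetch` are hypothetical helpers introduced here, and the catalog covers only three of the metrics above. It assumes `boto3` is installed and AWS credentials are configured for the `fetch` step, and it ignores pagination and the per-request query limit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical catalog: CloudWatch namespace and dimension name for a few
# of the metrics mentioned above. Extend it for the services you run.
METRIC_CATALOG = {
    "CPUUtilization": ("AWS/EC2", "InstanceId"),
    "FreeStorageSpace": ("AWS/RDS", "DBInstanceIdentifier"),
    "Throttles": ("AWS/Lambda", "FunctionName"),
}


def build_queries(metric_name, resource_ids, stat="Average", period=300):
    """Build one MetricDataQuery dict per resource for GetMetricData."""
    namespace, dimension = METRIC_CATALOG[metric_name]
    queries = []
    for index, resource_id in enumerate(resource_ids):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "Label": resource_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": dimension, "Value": resource_id}],
                },
                "Period": period,  # seconds per datapoint
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def fetch(metric_name, resource_ids, minutes=60):
    """Fetch recent datapoints; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_queries works without it

    client = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = client.get_metric_data(
        MetricDataQueries=build_queries(metric_name, resource_ids),
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
    )
    # Map each resource label to its list of returned values.
    return {r["Label"]: r["Values"] for r in response["MetricDataResults"]}
```

In practice the list of resource IDs would come from a discovery step (for example, listing running instances) that is rerun on a schedule, so newly launched hosts are picked up without code changes.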