The Datadog Agent aims to process large volumes of data quickly while using minimal CPU. The team traced the bottleneck to the metric context generation algorithm, which computes a unique key for every metric received by the DogStatsD server; profiling with the Go runtime's tooling pinpointed the sorting step inside it as the dominant cost.

To improve performance, they specialized the sorting algorithm based on the number of tags, used micro-benchmarks to compare candidate hash implementations, and implemented a custom hash set for deduplicating tags. The new design removed the need for sorting entirely while keeping deduplication efficient.

Validation through micro-benchmarks showed significant wins, with the new algorithm spending far less time computing metric contexts. The CPU flamegraph confirmed reduced time spent processing the Agent's network traffic, and dashboards showed a roughly 40% average decrease in CPU usage, with fewer cores needed to process the same number of metrics.

The team hopes this article demonstrates how beneficial performance optimizations can be, from simple ones to complex ones, and invites readers to join them in solving Agent performance problems.
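To make the starting point concrete, here is a minimal sketch of what a sort-based context key computation can look like. It is illustrative only: the Agent's real key covers more fields (such as the hostname) and uses its own hash implementation, while this sketch uses Go's standard `hash/fnv`. The point is that tags must be sorted and deduplicated before hashing so that the same logical context always produces the same key.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// contextKey computes a stable 64-bit key for a metric from its name and
// tags. Tags are sorted and deduplicated first so the same logical context
// hashes to the same key regardless of tag order. The sort on every metric
// is exactly the cost the optimizations described above eliminate.
func contextKey(name string, tags []string) uint64 {
	sorted := append([]string(nil), tags...) // don't mutate the caller's slice
	sort.Strings(sorted)

	h := fnv.New64a()
	h.Write([]byte(name))
	var prev string
	for i, t := range sorted {
		if i > 0 && t == prev {
			continue // skip duplicate tags
		}
		h.Write([]byte{0}) // separator so "a"+"bc" != "ab"+"c"
		h.Write([]byte(t))
		prev = t
	}
	return h.Sum64()
}

func main() {
	a := contextKey("http.requests", []string{"env:prod", "host:web-1"})
	b := contextKey("http.requests", []string{"host:web-1", "env:prod", "env:prod"})
	fmt.Println(a == b) // prints true: tag order and duplicates don't change the key
}
```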
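Specializing the sort on the number of tags can be sketched as below. A hand-rolled insertion sort avoids the comparison-interface overhead of the generic standard-library sort for the common case of a handful of tags; the threshold of 8 is an illustrative guess, not the Agent's tuned value.

```go
package main

import (
	"fmt"
	"sort"
)

// sortTags dispatches on slice length: tiny slices use insertion sort,
// which is cheap for a handful of elements, while larger slices fall back
// to the standard library's sort.
func sortTags(tags []string) {
	if len(tags) <= 8 { // assumed cutoff for illustration
		for i := 1; i < len(tags); i++ {
			for j := i; j > 0 && tags[j] < tags[j-1]; j-- {
				tags[j], tags[j-1] = tags[j-1], tags[j]
			}
		}
		return
	}
	sort.Strings(tags)
}

func main() {
	tags := []string{"env:prod", "az:us-east-1a", "host:web-1"}
	sortTags(tags)
	fmt.Println(tags) // [az:us-east-1a env:prod host:web-1]
}
```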
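Go's `testing` package makes the micro-benchmarking step straightforward. The sketch below pits the non-cryptographic FNV-1a against SHA-256 purely to illustrate the methodology, using `testing.Benchmark` so it runs as a plain program; the hash candidates the team actually evaluated may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash/fnv"
	"testing"
)

// A payload shaped roughly like a DogStatsD metric name plus tags.
var payload = []byte("http.requests|env:prod,host:web-1,service:api")

func benchFNV(b *testing.B) {
	for i := 0; i < b.N; i++ {
		h := fnv.New64a()
		h.Write(payload)
		_ = h.Sum64()
	}
}

func benchSHA256(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = sha256.Sum256(payload)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`
	// and reports iterations and ns/op.
	fmt.Println("fnv64a:", testing.Benchmark(benchFNV))
	fmt.Println("sha256:", testing.Benchmark(benchSHA256))
}
```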
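The sort-free design can be sketched as follows: insert each tag's hash into a small open-addressing hash set, which deduplicates in O(1) per tag, and fold the unique hashes into the key with a commutative operation (addition here) so the result is independent of tag order. This is an assumption-laden sketch of the general technique, not the Agent's actual implementation: a real version would size and grow the table, and handle more than 64 unique tags.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tagSet is a fixed-size open-addressing hash set of tag hashes.
// A zero slot means empty, which is fine for a sketch since FNV-1a of a
// real tag is effectively never zero.
type tagSet struct {
	slots [64]uint64
}

// insert returns true if h was not already present.
func (s *tagSet) insert(h uint64) bool {
	i := h & 63
	for s.slots[i] != 0 {
		if s.slots[i] == h {
			return false // duplicate tag: skip
		}
		i = (i + 1) & 63 // linear probing
	}
	s.slots[i] = h
	return true
}

func hashTag(t string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(t))
	return h.Sum64()
}

// contextKeyNoSort folds deduplicated tag hashes into the key with a
// commutative combiner instead of sorting the tags first.
func contextKeyNoSort(name string, tags []string) uint64 {
	var set tagSet
	key := hashTag(name)
	for _, t := range tags {
		if h := hashTag(t); set.insert(h) {
			key += h // commutative, so tag order no longer matters
		}
	}
	return key
}

func main() {
	a := contextKeyNoSort("http.requests", []string{"env:prod", "host:web-1"})
	b := contextKeyNoSort("http.requests", []string{"host:web-1", "env:prod", "env:prod"})
	fmt.Println(a == b) // prints true: deduplication and order-independence without a sort
}
```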