Company
Date Published
Author
AJ Stuyvenberg, Jordan González
Word count
2017
Language
English
Hacker News points
None

Summary

Datadog reengineered its AWS Lambda extension to deliver high-fidelity telemetry with minimal overhead, improving cold start performance by 82% and reducing memory usage by 40%. The team chose Rust for its memory safety, tiny binary size, and concurrency primitives. They designed the extension to minimize its impact on the function's handler code, on CPU time during the invoke phase, and on post-runtime duration. The new extension supports several flush strategies, such as sending data periodically or at the end of each invocation, so customers can choose the approach that best fits their workloads. By using a failover strategy and adding telemetry to track incompatible configurations, Datadog delivered the performance improvements seamlessly while migrating customers gradually. The team is now exploring ways to reduce long-tail p99 latency, including analyzing customer use cases, reducing memory allocations, and working with the AWS Lambda team.
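The choice between flush strategies can be pictured as a small decision function. The sketch below is a hypothetical illustration in Rust, the language the team chose. `FlushStrategy`, `should_flush`, and the 10-second interval are assumptions made for this example, not the extension's actual types or configuration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical flush strategies modeled on the two options described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FlushStrategy {
    /// Send buffered telemetry once each invocation finishes.
    End,
    /// Send buffered telemetry whenever `interval` has elapsed since the last flush.
    Periodically(Duration),
}

/// Returns true if buffered telemetry should be flushed at time `now`.
fn should_flush(
    strategy: FlushStrategy,
    invocation_finished: bool,
    last_flush: Instant,
    now: Instant,
) -> bool {
    match strategy {
        // End-of-invocation: wait until the handler has returned.
        FlushStrategy::End => invocation_finished,
        // Periodic: flush on a timer, even in the middle of an invocation.
        FlushStrategy::Periodically(interval) => now.duration_since(last_flush) >= interval,
    }
}

fn main() {
    let last_flush = Instant::now();
    let later = last_flush + Duration::from_secs(12);
    let periodic = FlushStrategy::Periodically(Duration::from_secs(10));

    // 12 s after the last flush, mid-invocation, the periodic strategy flushes...
    println!("periodic, mid-invocation: {}", should_flush(periodic, false, last_flush, later));
    // ...while the end-of-invocation strategy waits for the handler to return.
    println!("end, mid-invocation: {}", should_flush(FlushStrategy::End, false, last_flush, later));
    println!("end, after invocation: {}", should_flush(FlushStrategy::End, true, last_flush, later));
}
```

Flushing at the end of each invocation keeps network work out of the handler's CPU time but adds post-runtime duration to every invoke; periodic flushing amortizes that cost across invocations, which may suit high-throughput functions better. A production extension would likely combine these rules (for example, flushing periodically and also before shutdown).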