Hardening eBPF for runtime security: Lessons from Datadog Workload Protection
Blog post from Datadog
eBPF has changed observability, networking, and security by enabling deep kernel-level instrumentation. Deploying it at scale across diverse production environments, however, exposes many challenges. Over five years of running Workload Protection, a product built on eBPF, Datadog has learned several lessons about operating it reliably: handling edge cases across kernel versions, ensuring comprehensive syscall coverage, managing performance impact, and keeping data capture secure and consistent. Because eBPF can itself be misused as a rootkit, strict monitoring and auditing also matter. To mitigate these challenges, Datadog has developed strategies such as kernel version testing, centralized eBPF logic, and dynamic in-kernel filtering. Despite its limitations, eBPF holds significant promise for the future of systems engineering, though its adoption in managed or serverless environments remains uncertain. Datadog remains committed to using eBPF for a range of use cases while continuing to refine its approach to security and performance at scale.
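To make the idea of dynamic in-kernel filtering more concrete, here is a minimal user-space model in Python. It is not eBPF code and not Datadog's implementation: `FilterMap` stands in for a kernel-side map that user space can update at runtime, `Probe` stands in for a hook that consults the map before forwarding an event, and matching on the file basename is an arbitrary choice made for illustration. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FilterMap:
    """User-space stand-in for a kernel-side map of approved values."""
    approved: set = field(default_factory=set)

    def approve(self, basename: str) -> None:
        # Models user space adding an entry while the probe keeps running.
        self.approved.add(basename)

    def revoke(self, basename: str) -> None:
        # Models user space removing an entry while the probe keeps running.
        self.approved.discard(basename)


@dataclass
class Probe:
    """Models a hook that decides, per event, whether to forward it."""
    filters: FilterMap
    forwarded: list = field(default_factory=list)
    dropped: int = 0

    def on_open(self, path: str) -> bool:
        # Only events whose basename is in the map are forwarded;
        # everything else is counted and dropped at the source.
        basename = path.rsplit("/", 1)[-1]
        if basename in self.filters.approved:
            self.forwarded.append(path)
            return True
        self.dropped += 1
        return False
```

The point of the sketch is the ordering: the filter decision happens before any event is handed off, and the filter set can change without reloading the probe. In a real deployment that decision would run inside the kernel, which is where the performance benefit comes from.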