
Remediate issues autonomously with Bits Infrastructure Operations

Blog post from Datadog

Post Details
Authors: Jessica Hsiao, Eli Kalish, Ananth Vaidyanathan
Word Count: 1,556
Language: English
Summary

Bits Infrastructure Operations is a Datadog tool that autonomously detects, investigates, and remediates common infrastructure issues across hosts, Kubernetes, serverless functions, and network infrastructure. Its goal is to reduce the load on infrastructure teams by resolving issues such as disk saturation, pods stuck in CrashLoopBackOff, and expiring TLS certificates before they escalate into incidents.

The tool lets application engineers safely fix infrastructure issues that affect their services, while platform engineers keep control through guardrails. Guardrails define operational boundaries by specifying which remediation actions are safe for a given environment and resource type. A human-in-the-loop workflow lets teams approve high-priority fixes.

Bits Infrastructure Operations also helps prevent recurring issues: it learns from previously approved fixes and updates guardrails so that similar fixes can run autonomously in the future. It also extends into the pull request workflow, using real-time telemetry data to assess the potential impact of infrastructure-as-code changes and flag risky ones before they reach production. By taking over repetitive operational work, it frees platform teams to focus on systemic improvements to the reliability and performance of their infrastructure.
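To make the guardrail idea concrete, here is a minimal sketch of how guardrails scoped by environment and resource type might gate a proposed fix. This is not Datadog's API or implementation. `Guardrail`, `ProposedFix`, `evaluate`, `promote_after_approvals`, the action names, and the approval threshold are all hypothetical. The sketch assumes three outcomes: run autonomously, route to human approval, or deny. It also assumes that "learning from approved fixes" means promoting an action to autonomous after a fixed number of approvals.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Decision(Enum):
    AUTO_REMEDIATE = "auto_remediate"      # run the fix without waiting for a human
    REQUIRE_APPROVAL = "require_approval"  # route to human-in-the-loop review
    DENY = "deny"                          # outside the guardrails entirely


@dataclass(frozen=True)
class Guardrail:
    """Hypothetical guardrail scoped to one environment and resource type."""
    environment: str                 # e.g. "staging", "production"
    resource_type: str               # e.g. "host", "kubernetes_pod"
    allowed_actions: frozenset[str]  # actions permitted at all in this scope
    auto_actions: frozenset[str]     # subset that may run without approval


@dataclass(frozen=True)
class ProposedFix:
    environment: str
    resource_type: str
    action: str
    high_priority: bool = False


def evaluate(fix: ProposedFix, guardrails: list[Guardrail]) -> Decision:
    """Decide how a proposed fix may proceed under the given guardrails."""
    for g in guardrails:
        if (g.environment, g.resource_type) != (fix.environment, fix.resource_type):
            continue
        if fix.action not in g.allowed_actions:
            return Decision.DENY
        # In this sketch, high-priority fixes always go to a human.
        if fix.action in g.auto_actions and not fix.high_priority:
            return Decision.AUTO_REMEDIATE
        return Decision.REQUIRE_APPROVAL
    # No guardrail covers this scope, so the action is outside the boundaries.
    return Decision.DENY


def promote_after_approvals(
    guardrails: list[Guardrail], fix: ProposedFix, approvals: int, threshold: int = 3
) -> list[Guardrail]:
    """Hypothetical learning step: after `threshold` human approvals of the same
    action in the same scope, allow that action to run autonomously next time."""
    if approvals < threshold:
        return guardrails
    updated = []
    for g in guardrails:
        in_scope = (g.environment, g.resource_type) == (fix.environment, fix.resource_type)
        if in_scope and fix.action in g.allowed_actions:
            g = replace(g, auto_actions=g.auto_actions | {fix.action})
        updated.append(g)
    return updated


if __name__ == "__main__":
    # Disk saturation on a production host: cleaning temp files is pre-approved,
    # expanding the disk needs a human until it has been approved enough times.
    rails = [
        Guardrail("production", "host",
                  frozenset({"clean_tmp", "expand_disk"}), frozenset({"clean_tmp"})),
    ]
    disk_fix = ProposedFix("production", "host", "expand_disk")
    print(evaluate(disk_fix, rails))  # Decision.REQUIRE_APPROVAL
    rails = promote_after_approvals(rails, disk_fix, approvals=3)
    print(evaluate(disk_fix, rails))  # Decision.AUTO_REMEDIATE
```

Keeping guardrails immutable and returning an updated list from the promotion step makes each change to the operational boundaries an explicit, reviewable value. That fits the idea that platform engineers stay in control of what runs autonomously.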