Keeping it boring: the incident.io technology stack
Blog post from incident.io
The incident.io technology stack is deliberately simple and efficient, which has let the company grow its customer base significantly with only two platform engineers. The move from Heroku to Google Cloud Platform (GCP) was driven by the early engineers' familiarity with GCP and by its simpler abstractions, such as GKE Autopilot, which runs Kubernetes workloads without manual scaling or patching. Some workloads with specific requirements run on Google Compute Engine virtual machines instead. Databases are hosted on GCP's Cloud SQL, on the "Enterprise Plus" tier for high availability and reduced downtime. The event-driven architecture uses GCP Pub/Sub to queue asynchronous tasks. Argo CD deploys Kubernetes resources, and Buildkite runs CI/CD tasks. Terraform manages infrastructure, and monitoring runs on Grafana Cloud, with traces stored separately to keep costs down. Keeping the technology choices this simple lets the platform team focus on critical tasks and adapt to growing demand, though the company acknowledges it will need more platform engineers as it continues to scale.
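To make the "queue asynchronous tasks" idea concrete, here is a minimal in-memory sketch of the publish/subscribe pattern. It is an illustration only: it does not use the GCP Pub/Sub client library and is not incident.io's code. The `InMemoryBroker` class, the `incident.created` topic, and the `notify_on_call` handler are all hypothetical names chosen for the example.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a managed message queue (NOT the GCP Pub/Sub API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Publishing only enqueues the message; the publisher does not
        # wait for any handler to run.
        self._queue.append((topic, message))

    def drain(self) -> int:
        """Deliver queued messages; return the number of handler invocations."""
        delivered = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for handler in self._subscribers[topic]:
                handler(message)
                delivered += 1
        return delivered


broker = InMemoryBroker()
notifications: List[str] = []


def notify_on_call(event: dict) -> None:
    # Hypothetical asynchronous task triggered by an event.
    notifications.append(f"paging on-call for {event['incident_id']}")


broker.subscribe("incident.created", notify_on_call)
broker.publish("incident.created", {"incident_id": "INC-1"})

# Nothing has run yet: the work is queued, not executed inline.
pending_before_drain = list(notifications)

# A separate worker loop would normally do this step.
delivered = broker.drain()
```

The point of the pattern is the decoupling: the code that publishes an event returns immediately, and the follow-up work happens later in whatever process drains the queue. A managed service such as Pub/Sub supplies the durable queue and delivery, so the team does not have to operate that part itself.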