Delivering Reliability Through SRE Practices

Post Details

Company

Harness

Date Published

Nov. 17, 2020

Author

Harness

Word Count

783

Company Posts That Month

6

Language

English

Hacker News Points

-

Source URL

www.harness.io/blog/delivering-reliability-through-sre-practices

Summary

Site Reliability Engineering (SRE) strengthens continuous delivery by keeping software both innovative and reliable. Its core practices include on-call playbooks, canary deployments, and tracking health metrics such as mean time to restore and change failure rate. SRE stresses being available during incidents, running post-mortems to drive continuous improvement, and managing the people, processes, and technology behind software delivery. Release engineering defines how code reaches production: it minimizes risk, improves tempo, and automates steps so that delivery is repeatable, with canary deployments as one such approach. Reliability itself is managed by setting SLAs, monitoring performance, and enforcing error thresholds, which can block a production release when reliability standards are not met. Together, these practices aim to produce stable, agile, and valuable software and to sustain the continuous delivery lifecycle.
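As a rough illustration of the metrics and error thresholds mentioned in the summary, the sketch below computes change failure rate and mean time to restore from a list of deployments, and checks whether a release would be blocked by an exhausted error budget. The `Deployment` class, the function names, and the 99.9% availability target are hypothetical and chosen only for this example; they are not taken from the post or from any Harness product.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    failed: bool
    time_to_restore: Optional[timedelta] = None  # set only for failed deployments


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.failed)
    return failures / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average restore time across failed deployments, or None if there were none."""
    restores = [d.time_to_restore for d in deployments
                if d.failed and d.time_to_restore is not None]
    if not restores:
        return None
    return sum(restores, timedelta()) / len(restores)


def release_allowed(total_requests: int, failed_requests: int,
                    availability_target: float = 0.999) -> bool:
    """Return False when failures exceed the error budget implied by the target.

    The budget is the number of requests allowed to fail in the window,
    (1 - target) * total. The 0.999 default is an assumed example value.
    """
    error_budget = (1.0 - availability_target) * total_requests
    return failed_requests <= error_budget


history = [
    Deployment(failed=False),
    Deployment(failed=True, time_to_restore=timedelta(minutes=30)),
    Deployment(failed=False),
    Deployment(failed=True, time_to_restore=timedelta(minutes=90)),
]
```

With this sample history, two of four deployments failed, so `change_failure_rate(history)` is 0.5 and `mean_time_to_restore(history)` is one hour. A pipeline gate could call `release_allowed` before promoting a build and stop the release when it returns `False`, which is one simple way an error threshold can block a production release.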
