Establishing and Enabling a Center of Production Excellence
Blog post from Honeycomb
Software systems often operate in a "degraded mode," and keeping them stable takes a mix of technical, organizational, and human effort. Some organizations therefore treat resilience as an active practice rather than an inherent trait. The post proposes a Center of Production Excellence (CoPE), or Observability Guild, as a way to strengthen organizational resilience: a dedicated group that critically analyzes and improves internal processes. Like a safety department, this group needs a degree of authority and autonomy, and it needs diverse, experienced members from across the organization to evaluate practices and recommend changes effectively. The CoPE uses both passive and active tactics to improve observability practices, such as regular training, newsletters, and incident response planning, fostering a culture of continual learning and adaptability. Top-down support from management is crucial: granting the CoPE autonomy, empowering frontline expertise, and encouraging diverse perspectives to prevent knowledge silos. By nurturing adaptability and production excellence, organizations can respond better to the dynamic nature of software systems and improve their overall resilience.