💼 Operational Excellence

  • ID: /frameworks/aws-well-architected/operational-excellence

Description

Operational excellence (OE) is a commitment to build software correctly while consistently delivering a great customer experience. The operational excellence pillar contains best practices for organizing your team, designing your workload, operating it at scale, and evolving it over time.

Sub Sections

Section  Sub Sections  Internal Rules  Policies  Flags  Compliance
💼 Evolve  1  no data
 💼 Learn, share, and improve  9  no data
  💼 OPS11-BP01 Have a process for continuous improvement  no data
  💼 OPS11-BP02 Perform post-incident analysis  no data
  💼 OPS11-BP03 Implement feedback loops  no data
  💼 OPS11-BP04 Perform knowledge management  no data
  💼 OPS11-BP05 Define drivers for improvement  no data
  💼 OPS11-BP06 Validate insights  no data
  💼 OPS11-BP07 Perform operations metrics reviews  no data
  💼 OPS11-BP08 Document and share lessons learned  no data
  💼 OPS11-BP09 Allocate time to make improvements  no data
💼 Operate  3  no data
 💼 Responding to events  7  no data
  💼 OPS10-BP01 Use a process for event, incident, and problem management  no data
  💼 OPS10-BP02 Have a process per alert  no data
  💼 OPS10-BP03 Prioritize operational events based on business impact  no data
  💼 OPS10-BP04 Define escalation paths  no data
  💼 OPS10-BP05 Define a customer communication plan for service-impacting events  no data
  💼 OPS10-BP06 Communicate status through dashboards  no data
  💼 OPS10-BP07 Automate responses to events  no data
 💼 Understanding operational health  3  no data
  💼 OPS05-BP03 Use configuration management systems  no data
  💼 OPS09-BP01 Measure operations goals and KPIs with metrics  no data
  💼 OPS09-BP02 Communicate status and trends to ensure visibility into operation  no data
 💼 Utilizing workload observability  5  no data
  💼 OPS08-BP01 Analyze workload metrics  no data
  💼 OPS08-BP02 Analyze workload logs  no data
  💼 OPS08-BP03 Analyze workload traces  no data
  💼 OPS08-BP04 Create actionable alerts  no data
  💼 OPS08-BP05 Create dashboards  no data
💼 Organization  3  no data
 💼 Operating model  6  no data
  💼 OPS02-BP01 Resources have identified owners  no data
  💼 OPS02-BP02 Processes and procedures have identified owners  no data
  💼 OPS02-BP03 Operations activities have identified owners responsible for their performance  no data
  💼 OPS02-BP04 Mechanisms exist to manage responsibilities and ownership  no data
  💼 OPS02-BP05 Mechanisms exist to request additions, changes, and exceptions  no data
  💼 OPS02-BP06 Responsibilities between teams are predefined or negotiated  no data
 💼 Organization priorities  6  no data
  💼 OPS01-BP01 Evaluate external customer needs  no data
  💼 OPS01-BP02 Evaluate internal customer needs  no data
  💼 OPS01-BP03 Evaluate governance requirements  no data
  💼 OPS01-BP04 Evaluate compliance requirements  no data
  💼 OPS01-BP05 Evaluate threat landscape  no data
  💼 OPS01-BP06 Evaluate tradeoffs while managing benefits and risks  no data
 💼 Organizational culture  7  no data
  💼 OPS03-BP01 Provide executive sponsorship  no data
  💼 OPS03-BP02 Team members are empowered to take action when outcomes are at risk  no data
  💼 OPS03-BP03 Escalation is encouraged  no data
  💼 OPS03-BP04 Communications are timely, clear, and actionable  no data
  💼 OPS03-BP05 Experimentation is encouraged  no data
  💼 OPS03-BP06 Team members are encouraged to maintain and grow their skill sets  no data
  💼 OPS03-BP07 Resource teams appropriately  no data
💼 Prepare  4  3  no data
 💼 Design for operations  10  2  no data
  💼 OPS05-BP01 Use version control  no data
  💼 OPS05-BP02 Test and validate changes  no data
  💼 OPS05-BP03 Use configuration management systems  no data
  💼 OPS05-BP04 Use build and deployment management systems  no data
  💼 OPS05-BP05 Perform patch management  2  no data
  💼 OPS05-BP06 Share design standards  no data
  💼 OPS05-BP07 Implement practices to improve code quality  no data
  💼 OPS05-BP08 Use multiple environments  no data
  💼 OPS05-BP09 Make frequent, small, reversible changes  no data
  💼 OPS05-BP10 Fully automate integration and deployment  no data
 💼 Implement observability  5  1  no data
  💼 OPS04-BP01 Identify key performance indicators  no data
  💼 OPS04-BP02 Implement application telemetry  no data
  💼 OPS04-BP03 Implement user experience telemetry  no data
  💼 OPS04-BP04 Implement dependency telemetry  no data
  💼 OPS04-BP05 Implement distributed tracing  1  no data
 💼 Mitigate deployment risks  4  no data
  💼 OPS06-BP01 Plan for unsuccessful changes  no data
  💼 OPS06-BP02 Test deployments  no data
  💼 OPS06-BP03 Employ safe deployment strategies  no data
  💼 OPS06-BP04 Automate testing and rollback  no data
 💼 Operational readiness and change management  6  no data
  💼 OPS07-BP01 Ensure personnel capability  no data
  💼 OPS07-BP02 Ensure a consistent review of operational readiness  no data
  💼 OPS07-BP03 Use runbooks to perform procedures  no data
  💼 OPS07-BP04 Use playbooks to investigate issues  no data
  💼 OPS07-BP05 Make informed decisions to deploy systems and changes  no data
  💼 OPS07-BP06 Create support plans for production workloads  no data