💼 Operational Excellence

  • ID: /frameworks/aws-well-architected/ops

Description

Operational excellence (OE) is a commitment to build software correctly while consistently delivering a great customer experience. The operational excellence pillar contains best practices for organizing your team, designing your workload, operating it at scale, and evolving it over time.

Sub Sections

Section | Sub Sections | Internal Rules | Policies | Flags | Compliance
💼 Design for operations | 10 | 2 | no data
 💼 OPS05-BP01 Use version control | no data
 💼 OPS05-BP02 Test and validate changes | no data
 💼 OPS05-BP03 Use configuration management systems | no data
 💼 OPS05-BP04 Use build and deployment management systems | no data
 💼 OPS05-BP05 Perform patch management | 2 | no data
 💼 OPS05-BP06 Share design standards | no data
 💼 OPS05-BP07 Implement practices to improve code quality | no data
 💼 OPS05-BP08 Use multiple environments | no data
 💼 OPS05-BP09 Make frequent, small, reversible changes | no data
 💼 OPS05-BP10 Fully automate integration and deployment | no data
💼 Implement observability | 5 | 1 | no data
 💼 OPS04-BP01 Identify key performance indicators | no data
 💼 OPS04-BP02 Implement application telemetry | no data
 💼 OPS04-BP03 Implement user experience telemetry | no data
 💼 OPS04-BP04 Implement dependency telemetry | no data
 💼 OPS04-BP05 Implement distributed tracing | 1 | no data
💼 Learn, share, and improve | 9 | no data
 💼 OPS11-BP01 Have a process for continuous improvement | no data
 💼 OPS11-BP02 Perform post-incident analysis | no data
 💼 OPS11-BP03 Implement feedback loops | no data
 💼 OPS11-BP04 Perform knowledge management | no data
 💼 OPS11-BP05 Define drivers for improvement | no data
 💼 OPS11-BP06 Validate insights | no data
 💼 OPS11-BP07 Perform operations metrics reviews | no data
 💼 OPS11-BP08 Document and share lessons learned | no data
 💼 OPS11-BP09 Allocate time to make improvements | no data
💼 Mitigate deployment risks | 4 | no data
 💼 OPS06-BP01 Plan for unsuccessful changes | no data
 💼 OPS06-BP02 Test deployments | no data
 💼 OPS06-BP03 Employ safe deployment strategies | no data
 💼 OPS06-BP04 Automate testing and rollback | no data
💼 Operating model | 6 | no data
 💼 OPS02-BP01 Resources have identified owners | no data
 💼 OPS02-BP02 Processes and procedures have identified owners | no data
 💼 OPS02-BP03 Operations activities have identified owners responsible for their performance | no data
 💼 OPS02-BP04 Mechanisms exist to manage responsibilities and ownership | no data
 💼 OPS02-BP05 Mechanisms exist to request additions, changes, and exceptions | no data
 💼 OPS02-BP06 Responsibilities between teams are predefined or negotiated | no data
💼 Operational readiness and change management | 6 | no data
 💼 OPS07-BP01 Ensure personnel capability | no data
 💼 OPS07-BP02 Ensure a consistent review of operational readiness | no data
 💼 OPS07-BP03 Use runbooks to perform procedures | no data
 💼 OPS07-BP04 Use playbooks to investigate issues | no data
 💼 OPS07-BP05 Make informed decisions to deploy systems and changes | no data
 💼 OPS07-BP06 Create support plans for production workloads | no data
💼 Organization priorities | 6 | no data
 💼 OPS01-BP01 Evaluate external customer needs | no data
 💼 OPS01-BP02 Evaluate internal customer needs | no data
 💼 OPS01-BP03 Evaluate governance requirements | no data
 💼 OPS01-BP04 Evaluate compliance requirements | no data
 💼 OPS01-BP05 Evaluate threat landscape | no data
 💼 OPS01-BP06 Evaluate tradeoffs while managing benefits and risks | no data
💼 Organizational culture | 7 | no data
 💼 OPS03-BP01 Provide executive sponsorship | no data
 💼 OPS03-BP02 Team members are empowered to take action when outcomes are at risk | no data
 💼 OPS03-BP03 Escalation is encouraged | no data
 💼 OPS03-BP04 Communications are timely, clear, and actionable | no data
 💼 OPS03-BP05 Experimentation is encouraged | no data
 💼 OPS03-BP06 Team members are encouraged to maintain and grow their skill sets | no data
 💼 OPS03-BP07 Resource teams appropriately | no data
💼 Responding to events | 7 | no data
 💼 OPS10-BP01 Use a process for event, incident, and problem management | no data
 💼 OPS10-BP02 Have a process per alert | no data
 💼 OPS10-BP03 Prioritize operational events based on business impact | no data
 💼 OPS10-BP04 Define escalation paths | no data
 💼 OPS10-BP05 Define a customer communication plan for service-impacting events | no data
 💼 OPS10-BP06 Communicate status through dashboards | no data
 💼 OPS10-BP07 Automate responses to events | no data
💼 Understanding operational health | 3 | no data
 💼 OPS05-BP03 Use configuration management systems | no data
 💼 OPS09-BP01 Measure operations goals and KPIs with metrics | no data
 💼 OPS09-BP02 Communicate status and trends to ensure visibility into operation | no data
💼 Utilizing workload observability | 5 | no data
 💼 OPS08-BP01 Analyze workload metrics | no data
 💼 OPS08-BP02 Analyze workload logs | no data
 💼 OPS08-BP03 Analyze workload traces | no data
 💼 OPS08-BP04 Create actionable alerts | no data
 💼 OPS08-BP05 Create dashboards | no data