💼 Responding to events

  • ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10

Description

You should anticipate operational events, both planned (for example, sales promotions, deployments, and failure tests) and unplanned (for example, surges in utilization and component failures). Use your existing runbooks and playbooks so that your responses to alerts produce consistent results. Each defined alert should be owned by a role or team that is accountable for the response and for any escalations. You also need to know the business impact of each system component so that you can use it to prioritize response efforts when needed. After an event, perform a root cause analysis (RCA), and then either prevent the failure from recurring or document a workaround.
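The ownership and prioritization ideas above can be illustrated with a small alert registry. This is a minimal Python sketch, not an AWS API: the alert names, team names, runbook paths, and the four-tier `BusinessImpact` scale are all hypothetical. Each defined alert maps to an accountable owner, a runbook, and a business-impact tier, and incoming events are ordered so that the highest-impact ones are handled first.

```python
from dataclasses import dataclass
from enum import IntEnum


class BusinessImpact(IntEnum):
    """Hypothetical impact tiers; a lower value means a higher priority."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass(frozen=True)
class AlertDefinition:
    name: str
    owner: str     # role or team accountable for response and escalation
    runbook: str   # hypothetical runbook location
    impact: BusinessImpact


@dataclass(frozen=True)
class Event:
    alert_name: str
    component: str


# Hypothetical registry: every defined alert has an owner, a runbook, and an impact tier.
ALERTS = {
    a.name: a
    for a in [
        AlertDefinition("checkout-5xx-rate", "payments-oncall",
                        "runbooks/checkout-errors", BusinessImpact.CRITICAL),
        AlertDefinition("search-latency-p99", "search-team",
                        "runbooks/search-latency", BusinessImpact.HIGH),
        AlertDefinition("batch-report-delay", "analytics-team",
                        "runbooks/report-delay", BusinessImpact.LOW),
    ]
}


def triage(events):
    """Attach the owner and runbook to each event and order events by business impact.

    Raises KeyError for an event whose alert has no definition, i.e. no owner or runbook.
    """
    routed = []
    for event in events:
        definition = ALERTS.get(event.alert_name)
        if definition is None:
            raise KeyError(f"no process defined for alert {event.alert_name!r}")
        routed.append((definition.impact, event, definition))
    # list.sort is stable, so events with equal impact keep their arrival order.
    routed.sort(key=lambda item: item[0])
    return [(event, definition.owner, definition.runbook)
            for _, event, definition in routed]


if __name__ == "__main__":
    incoming = [
        Event("batch-report-delay", "reporting"),
        Event("checkout-5xx-rate", "api-gateway"),
        Event("search-latency-p99", "search"),
    ]
    for event, owner, runbook in triage(incoming):
        print(f"{event.alert_name}: page {owner}, follow {runbook}")
```

Failing loudly on an undefined alert is a deliberate choice in this sketch: it surfaces alerts that have no accountable owner or runbook. A real setup might instead route such events to a default on-call rotation and flag the gap for follow-up.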

Sub Sections

💼 OPS10-BP01 Use a process for event, incident, and problem management
💼 OPS10-BP02 Have a process per alert
💼 OPS10-BP03 Prioritize operational events based on business impact
💼 OPS10-BP04 Define escalation paths
💼 OPS10-BP05 Define a customer communication plan for service-impacting events
💼 OPS10-BP06 Communicate status through dashboards
💼 OPS10-BP07 Automate responses to events