💼 OPS10-BP04 Define escalation paths
- ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10/bp04
Description
Establish clear escalation paths within your incident response protocols to facilitate timely and effective action. This includes specifying prompts for escalation, detailing the escalation process, and pre-approving actions to expedite decision-making and reduce mean time to resolution (MTTR).
Desired outcome: A structured and efficient process that escalates incidents to the appropriate personnel, minimizing response times and impact.
Common anti-patterns
- Lack of clarity on recovery procedures leads to makeshift responses during critical incidents.
- Absence of defined permissions and ownership results in delays when urgent action is needed.
- Stakeholders and customers are not informed in line with expectations.
- Important decisions are delayed.
Benefits of establishing this best practice
- Streamlined incident response through predefined escalation procedures.
- Reduced downtime with pre-approved actions and clear ownership.
- Improved resource allocation and support-level adjustments according to incident severity.
- Improved communication to stakeholders and customers.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Properly defined escalation paths are crucial for rapid incident response. AWS Systems Manager Incident Manager supports the setup of structured escalation plans and on-call schedules, which alert the right personnel so that they are ready to act when incidents occur.
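As a rough illustration of how a staged escalation plan behaves, the following sketch models stages that each engage a set of targets and then wait a fixed time for acknowledgement before moving on. It is a simplified model, not Incident Manager's implementation: the `Stage` class, the `engaged_targets` function, the target names, and the durations are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One escalation stage (hypothetical model, not an AWS API type)."""
    targets: tuple[str, ...]   # contacts or on-call schedules to engage
    duration_minutes: int      # wait for acknowledgement before the next stage


def engaged_targets(stages: list[Stage], minutes_unacknowledged: int) -> list[str]:
    """Return every target engaged so far for an incident nobody has acknowledged."""
    engaged: list[str] = []
    stage_start = 0  # minute at which the current stage begins
    for stage in stages:
        if minutes_unacknowledged < stage_start:
            break  # this stage has not started yet
        engaged.extend(stage.targets)
        stage_start += stage.duration_minutes
    return engaged


# Hypothetical three-level path: primary, then secondary, then management.
PLAN = [
    Stage(targets=("primary-on-call",), duration_minutes=10),
    Stage(targets=("secondary-on-call",), duration_minutes=15),
    Stage(targets=("engineering-manager",), duration_minutes=0),
]

if __name__ == "__main__":
    for minutes in (0, 9, 10, 25):
        print(minutes, engaged_targets(PLAN, minutes))
```

Writing the path down as data like this makes it easy to review who is paged at each point in time before the same structure is configured in the incident management tool.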
Implementation steps
- Set up escalation prompts: Set up CloudWatch alarms to create an incident in AWS Systems Manager Incident Manager.
- Set up on-call schedules: Create on-call schedules in Incident Manager that align with your escalation paths. Equip on-call personnel with the necessary permissions and tools to act swiftly.
- Detail escalation procedures:
  - Determine specific conditions under which an incident should be escalated.
  - Create escalation plans in Incident Manager.
  - Escalation channels should consist of a contact or an on-call schedule.
  - Define the roles and responsibilities of the team at each escalation level.
- Pre-approve mitigation actions: Collaborate with decision-makers to pre-approve actions for anticipated scenarios. Use Systems Manager Automation runbooks integrated with Incident Manager to speed up incident resolution.
- Specify ownership: Clearly identify internal owners for each step of the escalation path.
- Detail third-party escalations:
  - Document third-party service-level agreements (SLAs), and align them with internal goals.
  - Set clear protocols for vendor communication during incidents.
  - Integrate vendor contacts into incident management tools for direct access.
  - Conduct regular drills that include third-party response scenarios.
  - Keep vendor escalation information well-documented and easily accessible.
- Train and rehearse escalation plans: Train your team on the escalation process and conduct regular incident response drills or game days. Enterprise Support customers can request an Incident Management Workshop.
- Continue to improve: Review the effectiveness of your escalation paths regularly. Update your processes based on lessons learned from incident post-mortems and continuous feedback.
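The first step above, a CloudWatch alarm that opens an incident, could be sketched as follows. This is a minimal example under stated assumptions: the helper `incident_alarm_params`, the alarm name, the threshold, and the response plan ARN are hypothetical placeholders, and the ARN should be copied from your own response plan rather than constructed by hand. The helper only builds the keyword arguments; the commented call at the end shows where they would be passed to boto3's `put_metric_alarm`, which requires AWS credentials and is therefore not executed here.

```python
def incident_alarm_params(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    threshold: float,
    response_plan_arn: str,
) -> dict:
    """Build put_metric_alarm arguments whose alarm action is a response plan."""
    if not response_plan_arn.startswith("arn:"):
        raise ValueError("response_plan_arn must be a full ARN")
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": 60,                # seconds per evaluation period
        "EvaluationPeriods": 3,      # breach must persist for 3 periods
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        # Entering the ALARM state triggers the response plan, which opens the incident.
        "AlarmActions": [response_plan_arn],
    }


# Placeholder values for illustration only.
PLAN_ARN = "arn:aws:ssm-incidents::111122223333:response-plan/checkout-outage"

params = incident_alarm_params(
    alarm_name="checkout-5xx-errors",
    namespace="AWS/ApplicationELB",
    metric_name="HTTPCode_Target_5XX_Count",
    threshold=50,
    response_plan_arn=PLAN_ARN,
)

# With credentials configured, the alarm could then be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

A real load balancer alarm would also need a `Dimensions` entry identifying the load balancer; it is omitted here to keep the sketch short.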
Level of effort for the implementation plan: Moderate