OPS10-BP04 Define escalation paths

  • ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10/bp04

Description

Establish clear escalation paths within your incident response protocols to facilitate timely and effective action. This includes specifying what triggers an escalation, detailing the escalation process, and pre-approving actions to speed up decision-making and reduce mean time to resolution (MTTR).

Desired outcome: A structured and efficient process that escalates incidents to the appropriate personnel, minimizing response times and impact.

Common anti-patterns

  • Lack of clarity on recovery procedures leads to makeshift responses during critical incidents.
  • Absence of defined permissions and ownership results in delays when urgent action is needed.
  • Stakeholders and customers are not informed in line with expectations.
  • Important decisions are delayed while responders wait for approval.

Benefits of establishing this best practice

  • Streamlined incident response through predefined escalation procedures.
  • Reduced downtime with pre-approved actions and clear ownership.
  • Better resource allocation, with support levels adjusted to incident severity.
  • Improved communication to stakeholders and customers.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Well-defined escalation paths are essential for rapid incident response. AWS Systems Manager Incident Manager lets you set up structured escalation plans and on-call schedules. These engage the right people so that they are ready to act when an incident occurs.
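
As a rough illustration of how an escalation plan is structured, the sketch below builds the request for a three-stage plan. It assumes the plan is created through the AWS SDK for Python (boto3) with `boto3.client("ssm-contacts").create_contact(**request)`. The aliases, account ID, Region, and ARNs are hypothetical placeholders. Check the field names against the current SSM Contacts API reference before use. Each stage pages its targets. If the page is not acknowledged within `DurationInMinutes`, the plan moves on to the next stage.

```python
"""Sketch of an Incident Manager escalation plan request (hypothetical values).

Assumed usage: boto3.client("ssm-contacts").create_contact(**request)
"""
from typing import Dict, List

ACCOUNT_ID = "111122223333"  # hypothetical account ID
REGION = "us-east-1"         # hypothetical Region


def contact_arn(alias: str) -> str:
    """Build an (assumed-format) SSM Contacts ARN for a contact or schedule alias."""
    return f"arn:aws:ssm-contacts:{REGION}:{ACCOUNT_ID}:contact/{alias}"


def stage(target_arns: List[str], duration_minutes: int) -> Dict:
    """One escalation stage: page every target, then wait duration_minutes
    before the plan moves on to the next stage."""
    return {
        "DurationInMinutes": duration_minutes,
        "Targets": [
            {"ContactTargetInfo": {"ContactId": arn, "IsEssential": False}}
            for arn in target_arns
        ],
    }


def escalation_plan_request(alias: str, display_name: str,
                            stages: List[Dict]) -> Dict:
    """Keyword arguments for create_contact with Type="ESCALATION"."""
    if not stages:
        raise ValueError("an escalation plan needs at least one stage")
    return {
        "Alias": alias,
        "DisplayName": display_name,
        "Type": "ESCALATION",
        "Plan": {"Stages": stages},
    }


# Stage 1: primary on-call schedule; stage 2: secondary on-call schedule;
# stage 3: the service owner's personal contact. All aliases are hypothetical.
request = escalation_plan_request(
    alias="payments-escalation",
    display_name="Payments escalation",
    stages=[
        stage([contact_arn("payments-primary-oncall")], 5),
        stage([contact_arn("payments-secondary-oncall")], 10),
        stage([contact_arn("payments-service-owner")], 15),
    ],
)

if __name__ == "__main__":
    import json
    print(json.dumps(request, indent=2))
```

The early stages page on-call schedules rather than named individuals. This lets the rotation decide who is engaged, and a specific owner is brought in only if the earlier stages do not respond.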

Implementation steps

  1. Set up escalation triggers: Configure Amazon CloudWatch alarms to start an incident in AWS Systems Manager Incident Manager, for example by setting an Incident Manager response plan as the alarm action.

  2. Set up on-call schedules: Create on-call schedules in Incident Manager that align with your escalation paths. Equip on-call personnel with the necessary permissions and tools to act swiftly.

  3. Detail escalation procedures:

    • Determine specific conditions under which an incident should be escalated.
    • Create escalation plans in Incident Manager.
    • In each stage of the plan, use a contact or an on-call schedule as the escalation channel.
    • Define the roles and responsibilities of the team at each escalation level.
  4. Pre-approve mitigation actions: Collaborate with decision-makers to pre-approve actions for anticipated scenarios. Use Systems Manager Automation runbooks integrated with Incident Manager to speed up incident resolution.

  5. Specify ownership: Clearly identify internal owners for each step of the escalation path.

  6. Detail third-party escalations:

    • Document third-party service-level agreements (SLAs), and align them with internal goals.
    • Set clear protocols for vendor communication during incidents.
    • Integrate vendor contacts into incident management tools for direct access.
    • Conduct regular drills that include third-party response scenarios.
    • Keep vendor escalation information well-documented and easily accessible.
  7. Train and rehearse escalation plans: Train your team on the escalation process and conduct regular incident response drills or game days. Enterprise Support customers can request an Incident Management Workshop.

  8. Continue to improve: Review the effectiveness of your escalation paths regularly. Update your processes based on lessons learned from incident post-mortems and continuous feedback.
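
Steps 1 and 4 can be wired together roughly as follows. An Incident Manager response plan ties together an incident template, the contacts or escalation plans to engage, and the runbooks to run. A CloudWatch alarm then uses that response plan as its alarm action, so breaching the threshold starts an incident. This minimal sketch only builds the request dictionaries. It assumes they are passed to boto3's `ssm-incidents` `create_response_plan` and `cloudwatch` `put_metric_alarm` calls. The runbook name, IAM role, load balancer, thresholds, and ARNs are all hypothetical placeholders.

```python
"""Sketch: a CloudWatch alarm that starts an Incident Manager incident whose
response plan engages an escalation plan and runs a pre-approved runbook.

Assumed usage (boto3):
    boto3.client("ssm-incidents").create_response_plan(**plan)
    boto3.client("cloudwatch").put_metric_alarm(**alarm)
Every name, ARN, and threshold below is a hypothetical placeholder.
"""
from typing import Dict


def response_plan_request(name: str, title: str, impact: int,
                          escalation_plan_arn: str, runbook: str,
                          automation_role_arn: str) -> Dict:
    """Keyword arguments for create_response_plan."""
    if not 1 <= impact <= 5:  # Incident Manager impact: 1 = critical ... 5 = no impact
        raise ValueError("impact must be between 1 and 5")
    return {
        "name": name,
        "incidentTemplate": {"title": title, "impact": impact},
        "engagements": [escalation_plan_arn],
        "actions": [{
            "ssmAutomation": {
                # The pre-approved mitigation runbook attached to the response plan.
                "documentName": runbook,
                "roleArn": automation_role_arn,
                "targetAccount": "RESPONSE_PLAN_OWNER_ACCOUNT",
            }
        }],
    }


def alarm_request(response_plan_arn: str, load_balancer: str) -> Dict:
    """Keyword arguments for put_metric_alarm: alarm on target 5XX errors and
    start an incident from the response plan when the alarm fires."""
    return {
        "AlarmName": "payments-alb-target-5xx",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 5,
        "Threshold": 50.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [response_plan_arn],
    }


plan = response_plan_request(
    name="payments-5xx",
    title="Payments API: elevated 5XX errors",
    impact=2,
    escalation_plan_arn="arn:aws:ssm-contacts:us-east-1:111122223333:contact/payments-escalation",
    runbook="Payments-RollbackLastDeployment",  # hypothetical custom runbook
    automation_role_arn="arn:aws:iam::111122223333:role/IncidentAutomationRole",
)

# Use the ARN of the response plan you created; this value is a placeholder.
alarm = alarm_request(
    response_plan_arn="<response-plan-arn>",
    load_balancer="app/payments-alb/0123456789abcdef",
)
```

Keeping the runbook in the response plan means the approved mitigation is defined before an incident starts. Responders do not have to negotiate it under pressure.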
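
For step 3, writing the escalation conditions down as data helps keep them unambiguous during an incident. The sketch below shows one hypothetical way to do that. Each impact level gets an acknowledgement deadline and a mitigation deadline, and missing either one is a reason to escalate. The levels use Incident Manager's 1 (critical) to 5 (no impact) scale. For simplicity, the incident's age stands in for the time since the first page. The thresholds are illustrative only, not recommendations.

```python
"""Hypothetical escalation criteria, keyed by incident impact (1 = critical)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criteria:
    ack_minutes: int       # the page must be acknowledged within this time
    mitigate_minutes: int  # the impact must be mitigated within this time


# Illustrative thresholds only; agree on real values with your decision-makers.
CRITERIA = {
    1: Criteria(ack_minutes=5, mitigate_minutes=30),
    2: Criteria(ack_minutes=10, mitigate_minutes=60),
    3: Criteria(ack_minutes=30, mitigate_minutes=240),
    4: Criteria(ack_minutes=60, mitigate_minutes=1440),
    5: Criteria(ack_minutes=240, mitigate_minutes=2880),
}


def escalation_reason(impact: int, minutes_open: int,
                      acknowledged: bool) -> Optional[str]:
    """Return why the incident should be escalated, or None if it should not."""
    criteria = CRITERIA[impact]
    if not acknowledged and minutes_open >= criteria.ack_minutes:
        return "page not acknowledged in time"
    if minutes_open >= criteria.mitigate_minutes:
        return "impact not mitigated in time"
    return None


if __name__ == "__main__":
    print(escalation_reason(impact=1, minutes_open=7, acknowledged=False))
    print(escalation_reason(impact=3, minutes_open=45, acknowledged=True))
```

With these values, an unacknowledged critical incident escalates after 5 minutes. An acknowledged impact-3 incident that is 45 minutes old does not escalate yet.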

Level of effort for the implementation plan: Moderate