OPS10-BP07 Automate responses to events

  • ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10/bp07

Description

Automating event responses is key to fast, consistent operational handling with fewer errors. Build streamlined processes and use tools that detect and respond to events automatically, which minimizes manual intervention and improves operational effectiveness.

Desired outcome

  • Fewer human errors and faster resolution times through automation.
  • Consistent and reliable operational event handling.
  • Enhanced operational efficiency and system reliability.

Common anti-patterns

  • Manual event handling leads to delays and errors.
  • Automation is overlooked in repetitive, critical tasks.
  • Repetitive, manual tasks lead to alert fatigue and missed critical issues.

Benefits of establishing this best practice

  • Accelerated event responses, reducing system downtime.
  • Reliable operations with automated and consistent event handling.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Incorporate automation to create efficient operational workflows and minimize manual interventions.

Implementation steps

  1. Identify automation opportunities: Determine repetitive tasks for automation, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing.

  2. Identify automation prompts:

    • Assess and define the specific conditions or metrics that should initiate automated responses, and attach those responses with Amazon CloudWatch alarm actions.
    • Use Amazon EventBridge to respond to events in AWS services, custom workloads, and SaaS applications.
    • Consider initiation events such as specific log entries, performance metric thresholds, or state changes in AWS resources.

  3. Implement event-driven automation:

    • Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks.
    • Create incidents in Incident Manager, which automatically gathers details about the involved AWS resources and adds them to the incident.
    • Proactively monitor quotas using Quota Monitor for AWS.
    • Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance.
    • Automate development pipelines with Amazon CodeCatalyst.
    • Smoke test or continually monitor endpoints and APIs using synthetic monitoring.

  4. Perform risk mitigation through automation:

    • Implement automated security responses to swiftly address risks.
    • Use AWS Systems Manager State Manager to reduce configuration drift.
    • Remediate noncompliant resources with AWS Config Rules.
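
The automation prompts in step 2 can be sketched as plain request parameters. The following is a minimal illustration, not a definitive implementation: it builds keyword arguments for a CloudWatch alarm with an alarm action and an EventBridge rule that matches EC2 instance state changes. The SNS topic ARN, account ID, alarm name, and rule name are hypothetical placeholders, and the `matches` helper is a simplified local check that covers only exact-value matching, not the full EventBridge pattern syntax.

```python
import json

# Hypothetical placeholder; substitute a real SNS topic ARN.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ops-alerts"


def cpu_alarm_params(instance_id, topic_arn, threshold=80.0):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when average CPU stays above the threshold for two
    consecutive 5-minute periods and then notifies the given topic.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


def ec2_stopped_pattern():
    """Event pattern matching EC2 instances that enter the stopped state."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }


def rule_params(name, pattern):
    """Build keyword arguments for EventBridge put_rule."""
    return {"Name": name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


def matches(pattern, event):
    """Simplified local check: every pattern key must be present in the
    event, and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

Assuming boto3 is installed and credentials are configured, these dictionaries could be passed as `boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params(...))` and `boto3.client("events").put_rule(**rule_params(...))`. Keeping the parameters in plain functions makes the trigger definitions easy to review and test before anything is created in an account.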

Level of effort for the implementation plan: High
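
As one illustration of the event-driven automation in step 3, the sketch below shows a handler that receives an EC2 state-change event and starts a Systems Manager Automation runbook. It is a sketch under stated assumptions: the function name and the choice to restart every stopped instance are hypothetical, `AWS-StartEC2Instance` is used as an example of an AWS-managed runbook, and the client is injected so the code runs without AWS access. In a real deployment the client would be `boto3.client("ssm")`, and the handler would typically run in a Lambda function targeted by an EventBridge rule.

```python
def handle_instance_stopped(event, ssm_client, runbook="AWS-StartEC2Instance"):
    """Start an Automation runbook for an EC2 instance that has stopped.

    Returns the automation execution ID, or None when the event is not
    one this handler remediates.
    """
    detail = event.get("detail", {})
    if event.get("source") != "aws.ec2" or detail.get("state") != "stopped":
        return None  # ignore unrelated events
    response = ssm_client.start_automation_execution(
        DocumentName=runbook,
        Parameters={"InstanceId": [detail["instance-id"]]},
    )
    return response["AutomationExecutionId"]


class FakeSsmClient:
    """Stand-in for boto3.client("ssm"); records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def start_automation_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"AutomationExecutionId": f"fake-execution-{len(self.calls)}"}


if __name__ == "__main__":
    fake = FakeSsmClient()
    stopped = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(handle_instance_stopped(stopped, fake))
    print(fake.calls)
```

Injecting the client keeps the decision logic (which events are remediated, which runbook is started) testable in isolation, which matters because an automated response that fires on the wrong event can cause its own incident.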
