💼 OPS10-BP07 Automate responses to events

  • ID: /frameworks/aws-well-architected/ops/10/07

Description

Automating event responses makes operational handling faster, more consistent, and less error-prone. Create streamlined processes and use tools that manage and respond to events automatically, minimizing manual intervention and improving operational effectiveness.

Desired outcome

  • Fewer human errors and faster resolution times through automation.
  • Consistent and reliable operational event handling.
  • Enhanced operational efficiency and system reliability.

Common anti-patterns

  • Manual event handling leads to delays and errors.
  • Automation is overlooked in repetitive, critical tasks.
  • Repetitive, manual tasks lead to alert fatigue and missed critical issues.

Benefits of establishing this best practice

  • Accelerated event responses, reducing system downtime.
  • Reliable operations with automated and consistent event handling.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Incorporate automation to create efficient operational workflows and minimize manual interventions.

Implementation steps

  1. Identify automation opportunities: Determine repetitive tasks for automation, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing.

  2. Identify automation prompts:

    • Assess and define specific conditions or metrics that initiate automated responses using Amazon CloudWatch alarm actions.
    • Use Amazon EventBridge to respond to events in AWS services, custom workloads, and SaaS applications.
    • Consider initiation events such as specific log entries, performance metrics thresholds, or state changes in AWS resources.
  3. Implement event-driven automation:

    • Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks.
    • Create incidents in Incident Manager, which automatically gathers and adds details about the involved AWS resources to the incident.
    • Proactively monitor quotas using Quota Monitor for AWS.
    • Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance.
    • Automate development pipelines with Amazon CodeCatalyst.
    • Smoke test or continually monitor endpoints and APIs using synthetic monitoring.
  4. Perform risk mitigation through automation:

    • Implement automated security responses to swiftly address risks.
    • Use AWS Systems Manager State Manager to reduce configuration drift.
    • Remediate noncompliant resources with AWS Config Rules.
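
The event-driven flow in steps 2 and 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the event pattern follows the JSON shape that EventBridge uses for EC2 instance state-change notifications, while the helper names `matches` and `plan_response` are hypothetical, and the matcher only handles exact-value matching (EventBridge itself supports more operators). The sketch makes no AWS calls. It only builds the arguments that could be passed to the Systems Manager `start_automation_execution` API, here assuming the AWS-owned `AWS-StartEC2Instance` runbook is the desired remediation.

```python
from typing import Any, Optional

# Event pattern in EventBridge's JSON shape: each leaf lists the accepted values.
STOPPED_INSTANCE_PATTERN: dict[str, Any] = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified exact-value matcher (hypothetical helper, not an AWS API)."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def plan_response(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return runbook arguments for a matching event, or None to take no action.

    The runbook choice is an assumption made for illustration.
    """
    if not matches(STOPPED_INSTANCE_PATTERN, event):
        return None
    return {
        "DocumentName": "AWS-StartEC2Instance",
        "Parameters": {"InstanceId": [event["detail"]["instance-id"]]},
    }


if __name__ == "__main__":
    sample_event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(plan_response(sample_event))
```

Keeping the matching and planning logic free of AWS calls means the decision can be unit tested before it is wired into a real target, such as a Lambda function invoked by an EventBridge rule.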

Level of effort for the implementation plan: High
