💼 OPS10-BP07 Automate responses to events
- ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10/bp07
Description
Automating event responses is key to fast, consistent, and error-free operational handling. Create streamlined processes and use tools to automatically manage and respond to events, minimizing manual intervention and enhancing operational effectiveness.
Desired outcome
- Reduced human errors and faster resolution times through automation.
- Consistent and reliable operational event handling.
- Enhanced operational efficiency and system reliability.
Common anti-patterns
- Manual event handling leads to delays and errors.
- Automation is overlooked for repetitive, critical tasks.
- Repetitive manual tasks lead to alert fatigue and missed critical issues.
Benefits of establishing this best practice
- Accelerated event responses, reducing system downtime.
- Reliable operations with automated and consistent event handling.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Incorporate automation to create efficient operational workflows and minimize manual interventions.
Implementation steps
- Identify automation opportunities: Determine which repetitive tasks to automate, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing.
- Identify automation prompts:
  - Assess and define the specific conditions or metrics that initiate automated responses, using Amazon CloudWatch alarm actions.
  - Use Amazon EventBridge to respond to events from AWS services, custom workloads, and SaaS applications.
  - Consider initiation events such as specific log entries, performance metric thresholds, or state changes in AWS resources.
- Implement event-driven automation:
  - Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks.
  - Create incidents in AWS Systems Manager Incident Manager, which automatically gathers and adds details about the involved AWS resources to each incident.
  - Proactively monitor quotas using Quota Monitor for AWS.
  - Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance.
  - Automate development pipelines with Amazon CodeCatalyst.
  - Smoke test or continually monitor endpoints and APIs using synthetic monitoring, such as Amazon CloudWatch Synthetics canaries.
- Perform risk mitigation through automation:
  - Implement automated security responses to address risks swiftly.
  - Use AWS Systems Manager State Manager to reduce configuration drift.
  - Remediate noncompliant resources with AWS Config rules and their remediation actions.
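The "Identify automation opportunities" step asks which repetitive tasks to automate first. One way to make that concrete is to estimate the monthly hands-on time (toil) each task costs and start with the largest. The Python sketch below is a hypothetical heuristic, not part of the AWS guidance. The `ManualTask` fields, the rework assumption, and the sample numbers are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class ManualTask:
    """A repetitive operational task considered for automation (hypothetical model)."""
    name: str
    runs_per_month: int     # how often operators perform the task by hand
    minutes_per_run: float  # hands-on time per execution
    rework_rate: float      # fraction of runs that had to be redone (0.0 to 1.0)


def monthly_toil_minutes(task: ManualTask) -> float:
    # Assumption: each failed run costs roughly one extra full execution.
    return task.runs_per_month * task.minutes_per_run * (1.0 + task.rework_rate)


def rank_candidates(tasks: list) -> list:
    """Return (name, monthly toil minutes) pairs, highest toil first."""
    scored = [(t.name, monthly_toil_minutes(t)) for t in tasks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    backlog = [
        ManualTask("restart hung service", 20, 15, 0.1),
        ManualTask("enrich incident ticket", 60, 5, 0.0),
        ManualTask("rotate test credentials", 2, 30, 0.5),
    ]
    for name, minutes in rank_candidates(backlog):
        print(f"{name}: {minutes:.0f} min/month")
```

Toil is only one input. A rarely run but high-impact task, such as a security response, may still deserve automation ahead of a frequent, low-risk one.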
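For the "Identify automation prompts" step, a CloudWatch alarm action can start a response with no human in the loop. The sketch below builds the keyword arguments for the CloudWatch `PutMetricAlarm` API (boto3 `put_metric_alarm`). With these settings, two consecutive failed EC2 system status checks trigger the built-in EC2 recover action. The instance ID, region, SNS topic ARN, and the one-minute, two-period thresholds are assumptions to tune. The client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Optional


def recover_alarm_params(instance_id: str, region: str,
                         notify_topic_arn: Optional[str] = None) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm that auto-recover an EC2 instance."""
    actions = [f"arn:aws:automate:{region}:ec2:recover"]  # built-in EC2 recover action
    if notify_topic_arn:
        actions.append(notify_topic_arn)  # also notify people about what the automation did
    return {
        "AlarmName": f"auto-recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # one-minute datapoints
        "EvaluationPeriods": 2,  # two consecutive failing minutes put the alarm in ALARM
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_alarm(cloudwatch_client, params: dict) -> None:
    # cloudwatch_client is expected to be boto3.client("cloudwatch"); any object with a
    # put_metric_alarm(**kwargs) method works, which keeps this testable offline.
    cloudwatch_client.put_metric_alarm(**params)
```

To create the alarm for real, pass `boto3.client("cloudwatch")` as the client. Check that the instance type supports EC2 instance recovery before relying on this action.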
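Amazon EventBridge rules are the other trigger named under "Identify automation prompts". An event pattern is JSON that selects events by field values. The sketch below does three things:

- defines a pattern for EC2 instance state changes,
- builds `PutRule` arguments, and
- includes a deliberately simplified local matcher for unit tests.

The rule name and the chosen states are assumptions. The matcher covers exact-value matching only, so EventBridge's own `TestEventPattern` API remains the authoritative check.

```python
import json

# Event pattern: match EC2 instances that enter the "stopped" or "terminated" state.
EC2_STATE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def rule_params(rule_name: str) -> dict:
    """Keyword arguments for EventBridge PutRule; EventPattern must be a JSON string."""
    return {"Name": rule_name, "EventPattern": json.dumps(EC2_STATE_PATTERN), "State": "ENABLED"}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified local check of EventBridge exact-value matching (for unit tests only).

    Every pattern key must exist in the event, a list means "any of these values",
    and nested objects recurse. Content filters (prefix, numeric, anything-but,
    exists) and array-valued event fields are NOT handled.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    sample = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(matches(EC2_STATE_PATTERN, sample))  # True
```

With boto3, `boto3.client("events").put_rule(**rule_params("ec2-stopped"))` creates the rule. A separate `put_targets` call then attaches the responder.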
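For the "Implement event-driven automation" step, one common pattern is an EventBridge rule whose target is an AWS Lambda function that starts a Systems Manager Automation runbook. The sketch below assumes the incoming event carries `detail.instance-id`, as EC2 state-change events do. It starts the AWS-managed `AWS-RestartEC2Instance` runbook through boto3 `start_automation_execution`. Which runbook to run, and whether a restart is the right remediation, are assumptions. Production use also needs an IAM role that is allowed to run the automation.

```python
from typing import Any


def remediate_instance(event: dict, ssm_client: Any) -> str:
    """Start the AWS-RestartEC2Instance runbook for the instance named in an event.

    ssm_client is expected to be boto3.client("ssm"). Returns the automation execution ID.
    """
    instance_id = event["detail"]["instance-id"]
    response = ssm_client.start_automation_execution(
        DocumentName="AWS-RestartEC2Instance",
        Parameters={"InstanceId": [instance_id]},  # runbook parameters are lists of strings
    )
    return response["AutomationExecutionId"]


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtimes; imported lazily for offline tests
    return {"executionId": remediate_instance(event, boto3.client("ssm"))}
```

EventBridge can also target an Automation runbook directly. The Lambda variant is shown because it leaves room for extra checks, such as tag filters or rate limits, before the automation acts.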
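Also under "Implement event-driven automation", AWS Auto Scaling adjusts capacity automatically. The sketch below builds arguments for the EC2 Auto Scaling `PutScalingPolicy` API using a target tracking policy on average CPU. It also gives a rough proportional estimate of where capacity moves: current capacity × metric ÷ target, rounded up and clamped to the group's limits. The group name and the 50% target are assumptions. The estimate is a simplification for intuition; the service's actual calculation also accounts for instance warm-up and its own rounding.

```python
import math


def target_tracking_policy(asg_name: str, target_cpu: float = 50.0) -> dict:
    """Keyword arguments for the EC2 Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": target_cpu,
        },
    }


def estimated_capacity(current: int, metric: float, target: float,
                       min_size: int, max_size: int) -> int:
    """Rough proportional estimate of the capacity target tracking moves toward."""
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))  # respect the group's min and max size


if __name__ == "__main__":
    # 4 instances at 80% CPU with a 50% target: 4 * 80 / 50 = 6.4, rounded up.
    print(estimated_capacity(4, 80.0, 50.0, min_size=2, max_size=10))  # 7
```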
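For the "Perform risk mitigation through automation" step, State Manager reduces configuration drift by re-applying a desired configuration on a schedule. The sketch below builds `CreateAssociation` arguments that keep the CloudWatch agent installed, using the AWS-managed `AWS-ConfigureAWSPackage` document. The `Environment` tag key, the daily schedule, and the choice of package are assumptions for illustration.

```python
def cloudwatch_agent_association(environment: str) -> dict:
    """Keyword arguments for Systems Manager CreateAssociation (State Manager)."""
    return {
        "Name": "AWS-ConfigureAWSPackage",  # AWS-managed Command document
        "AssociationName": f"cloudwatch-agent-{environment}",
        # Target every managed node carrying the (hypothetical) Environment tag value.
        "Targets": [{"Key": "tag:Environment", "Values": [environment]}],
        "Parameters": {"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
        # Re-apply daily so nodes that drift (for example, agent removed) are corrected.
        "ScheduleExpression": "rate(1 day)",
    }


def create_association(ssm_client, params: dict) -> str:
    # ssm_client is expected to be boto3.client("ssm"); returns the new association ID.
    response = ssm_client.create_association(**params)
    return response["AssociationDescription"]["AssociationId"]
```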
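AWS Config rules detect noncompliant resources, and a remediation configuration tells AWS Config which Systems Manager Automation runbook to run against them. The sketch below builds a `PutRemediationConfigurations` entry that passes the noncompliant bucket's resource ID into the runbook. It enables automatic remediation with bounded retries. The rule name, role ARN, and retry limits are assumptions. So are the runbook `AWS-DisableS3BucketPublicReadWrite` and its `S3BucketName` parameter; verify them against the runbook's documented parameters before use.

```python
def s3_public_read_remediation(rule_name: str, role_arn: str) -> dict:
    """One entry for AWS Config PutRemediationConfigurations (names are assumptions)."""
    return {
        "ConfigRuleName": rule_name,  # an existing AWS Config rule (hypothetical name)
        "TargetType": "SSM_DOCUMENT",
        "TargetId": "AWS-DisableS3BucketPublicReadWrite",  # assumed AWS-managed runbook
        "Parameters": {
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            # RESOURCE_ID tells AWS Config to pass the noncompliant resource's ID.
            "S3BucketName": {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
        "Automatic": True,             # remediate without waiting for a manual trigger
        "MaximumAutomaticAttempts": 3,  # give up after three failed attempts
        "RetryAttemptSeconds": 60,
    }


def apply_remediations(config_client, configurations: list) -> list:
    # config_client is expected to be boto3.client("config"); returns any failed batches.
    response = config_client.put_remediation_configurations(RemediationConfigurations=configurations)
    return response.get("FailedBatches", [])
```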
Level of effort for the implementation plan: High