
💼 OPS10-BP07 Automate responses to events

ID: /frameworks/aws-well-architected/operational-excellence/operate/ops10/bp07

Description

Automating event responses makes operational handling faster, more consistent, and less error-prone. Create streamlined processes and use tools that detect and respond to events automatically, minimizing manual intervention and improving operational effectiveness.

Desired outcome

Reduced human errors and faster resolution times through automation.
Consistent and reliable operational event handling.
Enhanced operational efficiency and system reliability.

Common anti-patterns

Manual event handling leads to delays and errors.
Automation is overlooked in repetitive, critical tasks.
Repetitive, manual tasks lead to alert fatigue and missed critical issues.

Benefits of establishing this best practice

Accelerated event responses, reducing system downtime.
Reliable operations with automated and consistent event handling.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Incorporate automation to create efficient operational workflows and minimize manual interventions.

Implementation steps

Identify automation opportunities: Determine repetitive tasks for automation, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing.
Identify automation prompts:
- Assess and define the specific conditions or metrics that should initiate automated responses, and attach those responses using Amazon CloudWatch alarm actions.
- Use Amazon EventBridge to respond to events in AWS services, custom workloads, and SaaS applications.
- Consider initiation events such as specific log entries, performance metrics thresholds, or state changes in AWS resources.
Implement event-driven automation:
- Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks.
- Create incidents in Incident Manager, which automatically gathers details about the involved AWS resources and adds them to the incident.
- Proactively monitor quotas using Quota Monitor for AWS.
- Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance.
- Automate development pipelines with Amazon CodeCatalyst.
- Smoke test or continually monitor endpoints and APIs using synthetic monitoring.
Perform risk mitigation through automation:
- Implement automated security responses to swiftly address risks.
- Use AWS Systems Manager State Manager to reduce configuration drift.
- Remediate noncompliant resources with AWS Config Rules.
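
As one possible illustration of the "identify automation prompts" step, the sketch below builds a CloudWatch alarm request that carries an alarm action, plus an EventBridge event pattern that matches the same alarm entering the ALARM state. It is a minimal sketch under stated assumptions, not a reference implementation: the function names, alarm name, threshold, instance ID, and ARNs are all hypothetical, and the `apply` function assumes boto3 is installed and AWS credentials are configured.

```python
import json


def build_alarm_request(instance_id: str, action_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm (values are illustrative)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                 # length of each evaluation window, in seconds
        "EvaluationPeriods": 2,        # consecutive breaching windows before ALARM
        "Threshold": 80.0,             # assumed threshold; tune per workload
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [action_arn],  # invoked when the alarm enters ALARM
    }


def build_alarm_event_pattern(alarm_name: str) -> str:
    """EventBridge event pattern matching this alarm's transition into ALARM."""
    pattern = {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {"alarmName": [alarm_name], "state": {"value": ["ALARM"]}},
    }
    return json.dumps(pattern)


def apply(alarm_request: dict, rule_name: str, event_pattern: str, target: dict) -> None:
    """Create the alarm and the rule in AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above run without it

    boto3.client("cloudwatch").put_metric_alarm(**alarm_request)
    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=event_pattern, State="ENABLED")
    # `target` is a caller-supplied dict with at least "Id" and "Arn".
    events.put_targets(Rule=rule_name, Targets=[target])


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    request = build_alarm_request(
        "i-0123456789abcdef0",
        "arn:aws:sns:us-east-1:111122223333:ops-alerts",
    )
    print(json.dumps(request, indent=2))
    print(build_alarm_event_pattern(request["AlarmName"]))
```

Keeping the request builders separate from the function that calls AWS means the alarm definition and event pattern can be reviewed and unit tested without touching an account. The rule's target could then be whichever response fits the workload, for example a Systems Manager Automation runbook.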

Level of effort for the implementation plan: High
