🛡️ Google GCE Instance is overutilized 🟢

Description

Identify Google GCE Instances operating at high utilization levels that may require scaling or resizing to maintain stable application performance. When a linked New Relic host is available, this policy also evaluates memory utilization. Otherwise, it falls back to CPU metrics only. An instance is considered overutilized if, over a 14-day period, its average CPU utilization exceeds 80% and its maximum CPU utilization exceeds 95%, with memory utilization above 80% when New Relic data is available.
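
The thresholds above can be sketched as a small shell check. This is only an illustration of the stated rule, not the policy's actual implementation: the function name `is_overutilized` and its argument order are hypothetical, and memory is assumed to be passed as a single percentage for the same 14-day window.

```shell
# Sketch of the overutilization rule described above (assumed helper, not the policy code).
# Arguments are percentages for the 14-day window:
#   $1 = average CPU, $2 = maximum CPU,
#   $3 = memory utilization (optional; omit when no New Relic data is linked)
# Exit status 0 means "overutilized", non-zero means "not overutilized".
is_overutilized() {
  awk -v avg="$1" -v max="$2" -v mem="${3:-}" 'BEGIN {
    # CPU must exceed both thresholds.
    cpu_hot = (avg + 0 > 80 && max + 0 > 95)
    # Memory is only checked when a value was supplied.
    mem_hot = (mem == "" || mem + 0 > 80)
    exit !(cpu_hot && mem_hot)
  }'
}
```

For example, `is_overutilized 85 97` succeeds on CPU data alone, while `is_overutilized 85 97 60` fails because the supplied memory value does not exceed 80%.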

Rationale

Overutilized instances can cause degraded response times, increased latency, and reduced workload stability. Identifying these instances helps ensure that compute capacity remains aligned with workload demand and supports timely scaling decisions.

Impact

Resizing or scaling an instance can increase infrastructure cost and may require a brief service interruption, depending on the workload and deployment model.

Audit

This policy evaluates a Google GCE Instance over the last 14 days using CPU metrics and optional New Relic memory metrics.

Remediation

Right-Size Overutilized Instances

Resize the instance to a larger machine type if the workload consistently exceeds the current capacity.

Using gcloud CLI
  1. Stop the instance (the machine type can only be changed while the instance is stopped):

    gcloud compute instances stop {{instance-name}} \
    --zone={{zone}}
  2. Change the machine type:

    gcloud compute instances set-machine-type {{instance-name}} \
    --machine-type={{new-machine-type}} \
    --zone={{zone}}
  3. Start the instance:

    gcloud compute instances start {{instance-name}} \
    --zone={{zone}}
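
After the restart, it may be worth confirming that the new machine type took effect. The sketch below wraps `gcloud compute instances describe` in a helper; the function name `show_machine_type` is hypothetical, and the output format is just one reasonable choice.

```shell
# Hypothetical helper: print the machine type and status of an instance.
# Arguments: $1 = instance name, $2 = zone.
show_machine_type() {
  gcloud compute instances describe "$1" \
    --zone="$2" \
    --format="value(machineType.basename(),status)"
}
```

Calling `show_machine_type {{instance-name}} {{zone}}` should print the machine type name followed by the instance status.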

Implement Autoscaling

For workloads with variable demand, consider moving the application to a managed instance group and configuring autoscaling policies based on CPU utilization.
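
As a minimal sketch, CPU-based autoscaling on an existing zonal managed instance group could be enabled as follows. The helper name `enable_cpu_autoscaling`, the 60% CPU target, and the 60-second cool-down period are illustrative assumptions, not values prescribed by this policy.

```shell
# Hypothetical helper: enable CPU-based autoscaling on a zonal managed instance group.
# Arguments: $1 = group name, $2 = zone, $3 = minimum replicas, $4 = maximum replicas.
# The 0.60 target and 60-second cool-down are example values only.
enable_cpu_autoscaling() {
  gcloud compute instance-groups managed set-autoscaling "$1" \
    --zone="$2" \
    --min-num-replicas="$3" \
    --max-num-replicas="$4" \
    --target-cpu-utilization=0.60 \
    --cool-down-period=60
}
```

The CPU target and replica limits should be tuned to the workload; a target well below the 80% average threshold used by this policy leaves headroom before new instances are needed.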

Considerations
  • Verify that the selected machine type is available in the instance zone.
  • Confirm that the workload and attached resources are compatible with the new machine type.
  • Schedule resizing during a maintenance window if the instance serves production traffic.
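
The first consideration can be checked before stopping the instance. The sketch below assumes a hypothetical helper name, `machine_type_available`, and relies on `gcloud compute machine-types describe` returning a non-zero exit status when the type does not exist in the zone.

```shell
# Hypothetical helper: succeed only if the machine type exists in the given zone.
# Arguments: $1 = machine type, $2 = zone.
machine_type_available() {
  gcloud compute machine-types describe "$1" \
    --zone="$2" \
    --format="value(name)"
}
```

Running `machine_type_available {{new-machine-type}} {{zone}}` prints the machine type name on success and fails otherwise, so it can gate the resize steps in a script.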

Linked Framework Sections

Section | Sub Sections | Internal Rules | Policies | Flags | Compliance
💼 Cloudaware Framework → 💼 Resource Right-Sizing | | | 18 | | no data
💼 Cloudaware Framework → 💼 Workload Efficiency | | | 24 | | no data