8822 docs tagged with "section"

💼 [APIGateway.4] API Gateway should be associated with a WAF Web ACL

AWS WAF is a web application firewall that helps protect web applications and APIs from attacks. It enables you to configure an ACL, which is a set of rules that allow, block, or count web requests based on customizable web security rules and conditions that you define. Ensure that your API Gateway stage is associated with an AWS WAF web ACL to help protect it from malicious attacks.
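As a rough sketch (not the documented remediation procedure), one way to associate a WAFv2 web ACL with an API Gateway stage using boto3 is shown below; the Region, API ID, stage name, and web ACL ARN are placeholder values:

```python
import boto3

# Hypothetical identifiers -- replace with your own values.
REGION = "us-east-1"
REST_API_ID = "a1b2c3d4e5"
STAGE_NAME = "prod"
WEB_ACL_ARN = "arn:aws:wafv2:us-east-1:111122223333:regional/webacl/my-web-acl/EXAMPLE-ID"

wafv2 = boto3.client("wafv2", region_name=REGION)

# API Gateway stages are addressed by this ARN form when associating a web ACL.
stage_arn = f"arn:aws:apigateway:{REGION}::/restapis/{REST_API_ID}/stages/{STAGE_NAME}"

# Attach the regional web ACL to the stage.
wafv2.associate_web_acl(WebACLArn=WEB_ACL_ARN, ResourceArn=stage_arn)
```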

💼 [APIGateway.5] API Gateway REST API cache data should be encrypted at rest

Encrypting data at rest reduces the risk of data stored on disk being accessed by a user not authenticated to AWS. It adds another set of access controls to limit unauthorized users' ability to access the data. For example, API permissions are required to decrypt the data before it can be read. API Gateway REST API caches should be encrypted at rest for an added layer of security.

💼 [AppSync.5] AWS AppSync GraphQL APIs should not be authenticated with API keys

An API key is a hard-coded value in your application that is generated by the AWS AppSync service when you create an unauthenticated GraphQL endpoint. If this API key is compromised, your endpoint is vulnerable to unintended access. Unless you are supporting a publicly accessible application or website, we don't recommend using an API key for authentication.

💼 [AppSync.6] AWS AppSync API caches should be encrypted in transit

Data in transit refers to data that moves from one location to another, such as between nodes in your cluster or between your cluster and your application. Data may move across the internet or within a private network. Encrypting data in transit reduces the risk that an unauthorized user can eavesdrop on network traffic.

💼 [AutoScaling.2] Amazon EC2 Auto Scaling group should cover multiple Availability Zones

An Auto Scaling group that doesn't span multiple AZs can't launch instances in another AZ to compensate if the configured single AZ becomes unavailable. However, an Auto Scaling group with a single Availability Zone may be preferred in some use cases, such as batch jobs or when inter-AZ transfer costs need to be kept to a minimum. In such cases, you can disable this control or suppress its findings.

💼 [Backup.1] AWS Backup recovery points should be encrypted at rest

An AWS Backup recovery point refers to a specific copy or snapshot of data that is created as part of a backup process. It represents a particular moment in time when the data was backed up and serves as a restore point in case the original data becomes lost, corrupted, or inaccessible. Encrypting the backup recovery points adds an extra layer of protection against unauthorized access. Encryption is a best practice to protect the confidentiality, integrity, and security of backup data.

💼 [CloudFront.13] CloudFront distributions should use origin access control

When using an S3 bucket as an origin for your CloudFront distribution, you can enable OAC. This permits access to the content in the bucket only through the specified CloudFront distribution, and prohibits access directly from the bucket or another distribution. Although CloudFront supports Origin Access Identity (OAI), OAC offers additional functionality, and distributions using OAI can migrate to OAC. While OAI provides a secure way to access S3 origins, it has limitations, such as lack of support for granular policy configurations and for HTTP/HTTPS requests that use the POST method in AWS Regions that require AWS Signature Version 4 (SigV4). OAI also doesn't support encryption with AWS Key Management Service. OAC is based on an AWS best practice of using IAM service principals to authenticate with S3 origins.

💼 [CloudFront.3] CloudFront distributions should require encryption in transit

HTTPS (TLS) can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic. Only encrypted connections over HTTPS (TLS) should be allowed. Encrypting data in transit can affect performance. You should test your application with this feature to understand the performance profile and the impact of TLS.

💼 [CloudFront.5] CloudFront distributions should have logging enabled

CloudFront access logs provide detailed information about every user request that CloudFront receives. Each log contains information such as the date and time the request was received, the IP address of the viewer that made the request, the source of the request, and the port number of the request from the viewer. These logs are useful for applications such as security and access audits and forensics investigation.

💼 [CloudFront.6] CloudFront distributions should have WAF enabled

AWS WAF is a web application firewall that helps protect web applications and APIs from attacks. It allows you to configure a set of rules, called a web access control list (web ACL), that allow, block, or count web requests based on customizable web security rules and conditions that you define. Ensure your CloudFront distribution is associated with an AWS WAF web ACL to help protect it from malicious attacks.

💼 [CloudFront.8] CloudFront distributions should use SNI to serve HTTPS requests

Server Name Indication (SNI) is an extension to the TLS protocol that is supported by browsers and clients released after 2010. If you configure CloudFront to serve HTTPS requests using SNI, CloudFront associates your alternate domain name with an IP address for each edge location. When a viewer submits an HTTPS request for your content, DNS routes the request to the IP address for the correct edge location. The IP address to your domain name is determined during the SSL/TLS handshake negotiation; the IP address isn't dedicated to your distribution.

💼 [CloudTrail.1] CloudTrail should be enabled and configured with at least one multi-Region trail that includes read and write management events

AWS CloudTrail records AWS API calls for your account and delivers log files to you. The recorded information includes the following:

- Identity of the API caller
- Time of the API call
- Source IP address of the API caller
- Request parameters
- Response elements returned by the AWS service

CloudTrail provides a history of AWS API calls for an account, including API calls made from the AWS Management Console, AWS SDKs, and command line tools. The history also includes API calls from higher-level AWS services such as AWS CloudFormation. The AWS API call history produced by CloudTrail enables security analysis, resource change tracking, and compliance auditing. Multi-Region trails also provide the following benefits:

- A multi-Region trail helps to detect unexpected activity occurring in otherwise unused Regions.
- A multi-Region trail ensures that global service event logging is enabled for a trail by default. Global service event logging records events generated by AWS global services.
- For a multi-Region trail, management events for all read and write operations ensure that CloudTrail records management operations on all resources in an AWS account.

By default, CloudTrail trails that are created using the AWS Management Console are multi-Region trails.
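As a hedged illustration, a multi-Region trail that records all read and write management events might be created with boto3 along these lines; the trail and bucket names are placeholders, and the bucket must already carry a CloudTrail bucket policy:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder names; the bucket must already grant CloudTrail write access.
TRAIL_NAME = "org-management-events"
BUCKET_NAME = "example-cloudtrail-logs-bucket"

# Create a multi-Region trail that also captures global service events.
cloudtrail.create_trail(
    Name=TRAIL_NAME,
    S3BucketName=BUCKET_NAME,
    IsMultiRegionTrail=True,
    IncludeGlobalServiceEvents=True,
)

# Record all read and write management events.
cloudtrail.put_event_selectors(
    TrailName=TRAIL_NAME,
    EventSelectors=[{"ReadWriteType": "All", "IncludeManagementEvents": True}],
)

# Start delivering log files.
cloudtrail.start_logging(Name=TRAIL_NAME)
```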

💼 [CloudTrail.2] CloudTrail should have encryption at-rest enabled

For an added layer of security for your sensitive CloudTrail log files, you should use server-side encryption with AWS KMS keys (SSE-KMS) for your CloudTrail log files for encryption at rest. Note that by default, the log files delivered by CloudTrail to your buckets are encrypted by Amazon server-side encryption with Amazon S3-managed encryption keys (SSE-S3).

💼 [CloudTrail.5] CloudTrail trails should be integrated with Amazon CloudWatch Logs

CloudTrail records AWS API calls that are made in a given account. The recorded information includes the following:

- The identity of the API caller
- The time of the API call
- The source IP address of the API caller
- The request parameters
- The response elements returned by the AWS service

CloudTrail uses Amazon S3 for log file storage and delivery. You can capture CloudTrail logs in a specified S3 bucket for long-term analysis. To perform real-time analysis, you can configure CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all Regions in an account, CloudTrail sends log files from all of those Regions to a CloudWatch Logs log group.

💼 [CodeBuild.1] CodeBuild Bitbucket source repository URLs should not contain sensitive credentials

Sign-in credentials shouldn't be stored or transmitted in clear text or appear in the source repository URL. Instead of personal access tokens or sign-in credentials, you should access your source provider in CodeBuild, and change your source repository URL to contain only the path to the Bitbucket repository location. Using personal access tokens or sign-in credentials could result in unintended data exposure or unauthorized access.

💼 [CodeBuild.3] CodeBuild S3 logs should be encrypted

Encryption of data at rest is a recommended best practice to add a layer of access management around your data. Encrypting the logs at rest reduces the risk that a user not authenticated by AWS will access the data stored on disk. It adds another set of access controls to limit the ability of unauthorized users to access the data.

💼 [Connect.2] Amazon Connect instances should have CloudWatch logging enabled

Amazon Connect flow logs provide real-time details about events in Amazon Connect flows. A flow defines the customer experience with an Amazon Connect contact center from start to finish. By default, when you create a new Amazon Connect instance, an Amazon CloudWatch log group is created automatically to store flow logs for the instance. Flow logs can help you analyze flows, find errors, and monitor operational metrics. You can also set up alerts for specific events that can occur in a flow.

💼 [DataFirehose.1] Firehose delivery streams should be encrypted at rest

Server-side encryption is a feature in Amazon Data Firehose delivery streams that automatically encrypts data before it's at rest by using a key created in AWS Key Management Service (AWS KMS). Data is encrypted before it's written to the Data Firehose stream storage layer, and decrypted after it's retrieved from storage. This allows you to comply with regulatory requirements and enhance the security of your data.

💼 [DataSync.1] DataSync tasks should have logging enabled

Audit logs track and monitor system activities. They provide a record of events that can help you detect security breaches, investigate incidents, and comply with regulations. Audit logs also enhance the overall accountability and transparency of your organization.

💼 [DMS.10] DMS endpoints for Neptune databases should have IAM authorization enabled

AWS Identity and Access Management (IAM) provides fine-grained access control across AWS. With IAM, you can specify who can access which services and resources, and under which conditions. With IAM policies, you manage permissions to your workforce and systems to ensure least-privilege permissions. By enabling IAM authorization on AWS DMS endpoints for Neptune databases, you can grant authorization privileges to IAM users by using a service role specified by the `ServiceAccessRoleARN` parameter.

💼 [DMS.11] DMS endpoints for MongoDB should have an authentication mechanism enabled

AWS Database Migration Service supports two authentication methods for MongoDB: **MONGODB-CR** for MongoDB version 2.x, and **SCRAM-SHA-1** for MongoDB version 3.x or later. These authentication methods are used to authenticate and encrypt MongoDB passwords if users want to use the passwords to access the databases. Authentication on AWS DMS endpoints ensures that only authorized users can access and modify the data being migrated between databases. Without proper authentication, unauthorized users may be able to gain access to sensitive data during the migration process. This can result in data breaches, data loss, or other security incidents.

💼 [DMS.12] DMS endpoints for Redis OSS should have TLS enabled

TLS provides end-to-end security when data is sent between applications or databases over the internet. When you configure SSL encryption for your DMS endpoint, it enables encrypted communication between the source and target databases during the migration process. This helps prevent eavesdropping and interception of sensitive data by malicious actors. Without SSL encryption, sensitive data may be accessed, resulting in data breaches, data loss, or other security incidents.

💼 [DMS.6] DMS replication instances should have automatic minor version upgrade enabled

DMS provides automatic minor version upgrade to each supported replication engine so that you can keep your replication instance up-to-date. Minor versions can introduce new software features, bug fixes, security patches, and performance improvements. By enabling automatic minor version upgrade on DMS replication instances, minor upgrades are applied automatically during the maintenance window or immediately if the **Apply changes immediately** option is chosen.

💼 [DMS.7] DMS replication tasks for the target database should have logging enabled

DMS uses Amazon CloudWatch to log information during the migration process. Using logging task settings, you can specify which component activities are logged and how much information is logged. You should specify logging for the following tasks:

- `TARGET_APPLY`: Data and data definition language (DDL) statements are applied to the target database.
- `TARGET_LOAD`: Data is loaded into the target database.

Logging plays a critical role in DMS replication tasks by enabling monitoring, troubleshooting, auditing, performance analysis, error detection, and recovery, as well as historical analysis and reporting. It helps ensure the successful replication of data between databases while maintaining data integrity and compliance with regulatory requirements. Logging levels other than `DEFAULT` are rarely needed for these components during troubleshooting. We recommend keeping the logging level as `DEFAULT` for these components unless specifically requested to change it by AWS Support. A minimal logging level of `DEFAULT` ensures that informational messages, warnings, and error messages are written to the logs.

💼 [DMS.8] DMS replication tasks for the source database should have logging enabled

DMS uses Amazon CloudWatch to log information during the migration process. Using logging task settings, you can specify which component activities are logged and how much information is logged. You should specify logging for the following tasks:

- `SOURCE_CAPTURE`: Ongoing replication or change data capture (CDC) data is captured from the source database or service, and passed to the SORTER service component.
- `SOURCE_UNLOAD`: Data is unloaded from the source database or service during full load.

Logging plays a critical role in DMS replication tasks by enabling monitoring, troubleshooting, auditing, performance analysis, error detection, and recovery, as well as historical analysis and reporting. It helps ensure the successful replication of data between databases while maintaining data integrity and compliance with regulatory requirements. Logging levels other than `DEFAULT` are rarely needed for these components during troubleshooting.

💼 [DMS.9] DMS endpoints should use SSL

SSL/TLS connections provide a layer of security by encrypting connections between DMS replication instances and your database. Using certificates provides an extra layer of security by validating that the connection is being made to the expected database. It does so by checking the server certificate that is automatically installed on all database instances that you provision. By enabling SSL connection on your DMS endpoints, you protect the confidentiality of the data during the migration.

💼 [DocumentDB.1] Amazon DocumentDB clusters should be encrypted at rest

Data at rest refers to any data that's stored in persistent, non-volatile storage for any duration. Encryption helps you protect the confidentiality of such data, reducing the risk that an unauthorized user gets access to it. Data in Amazon DocumentDB clusters should be encrypted at rest for an added layer of security. Amazon DocumentDB uses the 256-bit Advanced Encryption Standard (AES-256) to encrypt your data using encryption keys stored in AWS Key Management Service (AWS KMS).

💼 [DocumentDB.2] Amazon DocumentDB clusters should have an adequate backup retention period

Backups help you recover more quickly from a security incident and strengthen the resilience of your systems. By automating backups for your Amazon DocumentDB clusters, you'll be able to restore your systems to a point in time and minimize downtime and data loss. In Amazon DocumentDB, clusters have a default backup retention period of 1 day. This must be increased to a value between 7 and 35 days to pass this control.

💼 [DocumentDB.5] Amazon DocumentDB clusters should have deletion protection enabled

Enabling cluster deletion protection offers an additional layer of protection against accidental database deletion or deletion by an unauthorized user. An Amazon DocumentDB cluster can't be deleted while deletion protection is enabled. You must first disable deletion protection before a delete request can succeed. Deletion protection is enabled by default when you create a cluster in the Amazon DocumentDB console.

💼 [DynamoDB.1] DynamoDB tables should automatically scale capacity with demand

Scaling capacity with demand avoids throttling exceptions, which helps to maintain availability of your applications. DynamoDB tables that use on-demand capacity mode are limited only by the DynamoDB throughput default table quotas. To raise these quotas, you can file a support ticket with AWS Support. DynamoDB tables that use provisioned mode with auto scaling adjust the provisioned throughput capacity dynamically in response to traffic patterns.

💼 [DynamoDB.2] DynamoDB tables should have point-in-time recovery enabled

Backups help you to recover more quickly from a security incident. They also strengthen the resilience of your systems. DynamoDB point-in-time recovery automates backups for DynamoDB tables. It reduces the time to recover from accidental delete or write operations. DynamoDB tables that have PITR enabled can be restored to any point in time in the last 35 days.
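A minimal boto3 sketch for turning on point-in-time recovery for a table (the table name is a placeholder):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# "orders" is a placeholder table name.
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```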

💼 [DynamoDB.6] DynamoDB tables should have deletion protection enabled

You can protect a DynamoDB table from accidental deletion with the deletion protection property. Enabling this property for tables helps ensure that tables don't get accidentally deleted during regular table management operations by your administrators. This helps prevent disruption to your normal business operations.

💼 [DynamoDB.7] DynamoDB Accelerator clusters should be encrypted in transit

HTTPS (TLS) can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic. You should only allow encrypted connections over TLS to access DAX clusters. However, encrypting data in transit can affect performance. You should test your application with encryption turned on to understand the performance profile and the impact of TLS.

💼 [EC2.1] Amazon EBS snapshots should not be publicly restorable

EBS snapshots are used to back up the data on your EBS volumes to Amazon S3 at a specific point in time. You can use the snapshots to restore previous states of EBS volumes. It is rarely acceptable to share a snapshot with the public. Typically the decision to share a snapshot publicly was made in error or without a complete understanding of the implications.
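One way to check for and remove public restore permissions on a snapshot with boto3 might look like this; the snapshot ID is a placeholder:

```python
import boto3

ec2 = boto3.client("ec2")
SNAPSHOT_ID = "snap-0123456789abcdef0"  # placeholder

# Check whether the snapshot's create-volume permission is shared with "all" (public).
attrs = ec2.describe_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID, Attribute="createVolumePermission"
)
is_public = any(perm.get("Group") == "all" for perm in attrs["CreateVolumePermissions"])

if is_public:
    # Remove the public create-volume permission so the snapshot is no longer restorable by everyone.
    ec2.modify_snapshot_attribute(
        SnapshotId=SNAPSHOT_ID,
        Attribute="createVolumePermission",
        OperationType="remove",
        GroupNames=["all"],
    )
```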

💼 [EC2.10] Amazon EC2 should be configured to use VPC endpoints that are created for the Amazon EC2 service

To improve the security posture of your VPC, you can configure Amazon EC2 to use an interface VPC endpoint. Interface endpoints are powered by AWS PrivateLink, a technology that enables you to access Amazon EC2 API operations privately. It restricts all network traffic between your VPC and Amazon EC2 to the Amazon network. Because endpoints are supported within the same Region only, you cannot create an endpoint between a VPC and a service in a different Region. This prevents unintended Amazon EC2 API calls to other Regions.

💼 [EC2.171] EC2 VPN connections should have logging enabled

AWS Site-to-Site VPN logs provide you with deeper visibility into your Site-to-Site VPN deployments. With this feature, you have access to Site-to-Site VPN connection logs that provide details on IP Security (IPsec) tunnel establishment, Internet Key Exchange (IKE) negotiations, and dead peer detection (DPD) protocol messages. Site-to-Site VPN logs can be published to CloudWatch Logs.

💼 [EC2.172] EC2 VPC Block Public Access settings should block internet gateway traffic

Configuring the VPC BPA settings for your account in an AWS Region lets you block resources in VPCs and subnets that you own in that Region from reaching or being reached from the internet through internet gateways and egress-only internet gateways. If you need specific VPCs and subnets to be able to reach or be reachable from the internet, you can exclude them by configuring VPC BPA exclusions.

💼 [EC2.18] Security groups should only allow unrestricted incoming traffic for authorized ports

Security groups provide stateful filtering of ingress and egress network traffic to AWS. Security group rules should follow the principle of least privilege. Unrestricted access (IP address with a /0 suffix) increases the opportunity for malicious activity such as hacking, denial-of-service attacks, and loss of data. Unless a port is specifically allowed, the port should deny unrestricted access.

💼 [EC2.2] VPC default security groups should not allow inbound or outbound traffic

The rules for the default security group allow all outbound and inbound traffic from network interfaces (and their associated instances) that are assigned to the same security group. We recommend that you don't use the default security group. Because the default security group cannot be deleted, you should change the default security group rules setting to restrict inbound and outbound traffic. This prevents unintended traffic if the default security group is accidentally configured for resources such as EC2 instances.
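A hedged boto3 sketch that strips all inbound and outbound rules from a VPC's default security group (the VPC ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")
VPC_ID = "vpc-0123456789abcdef0"  # placeholder

# Find the default security group of the VPC.
groups = ec2.describe_security_groups(
    Filters=[
        {"Name": "vpc-id", "Values": [VPC_ID]},
        {"Name": "group-name", "Values": ["default"]},
    ]
)["SecurityGroups"]

for group in groups:
    # Remove every inbound and outbound rule so the group allows no traffic.
    if group["IpPermissions"]:
        ec2.revoke_security_group_ingress(
            GroupId=group["GroupId"], IpPermissions=group["IpPermissions"]
        )
    if group["IpPermissionsEgress"]:
        ec2.revoke_security_group_egress(
            GroupId=group["GroupId"], IpPermissions=group["IpPermissionsEgress"]
        )
```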

💼 [EC2.20] Both VPN tunnels for an AWS Site-to-Site VPN connection should be up

A VPN tunnel is an encrypted link where data can pass from the customer network to or from AWS within an AWS Site-to-Site VPN connection. Each VPN connection includes two VPN tunnels which you can simultaneously use for high availability. Ensuring that both VPN tunnels are up for a VPN connection is important for confirming a secure and highly available connection between an AWS VPC and your remote network.

💼 [EC2.24] Amazon EC2 paravirtual instance types should not be used

Linux Amazon Machine Images (AMIs) use one of two types of virtualization: paravirtual (PV) or hardware virtual machine (HVM). The main differences between PV and HVM AMIs are the way in which they boot and whether they can take advantage of special hardware extensions (CPU, network, and storage) for better performance. Historically, PV guests had better performance than HVM guests in many cases, but because of enhancements in HVM virtualization and the availability of PV drivers for HVM AMIs, this is no longer true.

💼 [EC2.3] Attached Amazon EBS volumes should be encrypted at-rest

For an added layer of security of your sensitive data in EBS volumes, you should enable EBS encryption at rest. Amazon EBS encryption offers a straightforward encryption solution for your EBS resources that doesn't require you to build, maintain, and secure your own key management infrastructure. It uses KMS keys when creating encrypted volumes and snapshots.

💼 [EC2.4] Stopped EC2 instances should be removed after a specified time period

When an EC2 instance has not run for a significant period of time, it creates a security risk because the instance is not being actively maintained (analyzed, patched, updated). If it is later launched, the lack of proper maintenance could result in unexpected issues in your AWS environment. To safely maintain an EC2 instance over time in an inactive state, start it periodically for maintenance and then stop it after maintenance. Ideally, this should be an automated process.

💼 [EC2.51] EC2 Client VPN endpoints should have client connection logging enabled

Client VPN endpoints allow remote clients to securely connect to resources in a Virtual Private Cloud (VPC) in AWS. Connection logs allow you to track user activity on the VPN endpoint and provide visibility. When you enable connection logging, you can specify the name of a log stream in the log group. If you don't specify a log stream, the Client VPN service creates one for you.

💼 [EC2.55] VPCs should be configured with an interface endpoint for ECR API

AWS PrivateLink enables customers to access services hosted on AWS in a highly available and scalable manner, while keeping all the network traffic within the AWS network. Service users can privately access services powered by PrivateLink from their VPC or their on-premises networks, without using public IPs, and without requiring traffic to traverse the internet.

💼 [EC2.6] VPC flow logging should be enabled in all VPCs

With the VPC Flow Logs feature, you can capture information about the IP address traffic going to and from network interfaces in your VPC. After you create a flow log, you can view and retrieve its data in CloudWatch Logs. To reduce cost, you can also send your flow logs to Amazon S3. By default, the record includes values for the different components of the IP address flow, including the source, destination, and protocol.
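As an illustrative sketch, flow logs for a VPC could be delivered to CloudWatch Logs with boto3 as follows; the VPC ID, log group name, and IAM role ARN are placeholders, and the role is assumed to allow the VPC Flow Logs service to write to CloudWatch Logs:

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder identifiers.
VPC_ID = "vpc-0123456789abcdef0"
LOG_GROUP = "/vpc/flow-logs"
DELIVERY_ROLE_ARN = "arn:aws:iam::111122223333:role/vpc-flow-logs-role"

# Capture accepted and rejected traffic for the VPC and send it to CloudWatch Logs.
ec2.create_flow_logs(
    ResourceIds=[VPC_ID],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="cloud-watch-logs",
    LogGroupName=LOG_GROUP,
    DeliverLogsPermissionArn=DELIVERY_ROLE_ARN,
)
```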

💼 [EC2.8] EC2 instances should use Instance Metadata Service Version 2 (IMDSv2)

You use instance metadata to configure or manage the running instance. The IMDS provides access to temporary, frequently rotated credentials. These credentials remove the need to hard code or distribute sensitive credentials to instances manually or programmatically. The IMDS is attached locally to every EC2 instance. It runs on a special "link local" IP address of 169.254.169.254. This IP address is only accessible by software that runs on the instance. Version 2 of the IMDS adds new protections for the following types of vulnerabilities, which could be used to try to access the IMDS:

- Open website application firewalls
- Open reverse proxies
- Server-side request forgery (SSRF) vulnerabilities
- Open Layer 3 firewalls and network address translation (NAT)
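A minimal boto3 sketch for requiring IMDSv2 on an existing instance (the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Require session tokens (IMDSv2) on an existing instance.
ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",  # placeholder
    HttpTokens="required",   # reject IMDSv1 requests that lack a session token
    HttpEndpoint="enabled",  # keep the metadata service itself reachable
)
```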

💼 [EC2.9] Amazon EC2 instances should not have a public IPv4 address

A public IPv4 address is an IP address that is reachable from the internet. If you launch your instance with a public IP address, then your EC2 instance is reachable from the internet. A private IPv4 address is an IP address that is not reachable from the internet. You can use private IPv4 addresses for communication between EC2 instances in the same VPC or in your connected private network.

💼 [ECR.2] ECR private repositories should have tag immutability configured

Amazon ECR Tag Immutability enables customers to rely on the descriptive tags of an image as a reliable mechanism to track and uniquely identify images. An immutable tag is static, which means each tag refers to a unique image. This improves reliability and scalability as the use of a static tag will always result in the same image being deployed. When configured, tag immutability prevents the tags from being overridden, which reduces the attack surface.
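A short boto3 sketch for switching a repository to immutable tags (the repository name is a placeholder):

```python
import boto3

ecr = boto3.client("ecr")

# "my-app" is a placeholder repository name.
ecr.put_image_tag_mutability(
    repositoryName="my-app",
    imageTagMutability="IMMUTABLE",
)
```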

💼 [ECS.10] ECS Fargate services should run on the latest Fargate platform version

AWS Fargate platform versions refer to a specific runtime environment for Fargate task infrastructure, which is a combination of kernel and container runtime versions. New platform versions are released as the runtime environment evolves. For example, a new version may be released for kernel or operating system updates, new features, bug fixes, or security updates. Security updates and patches are deployed automatically for your Fargate tasks. If a security issue is found that affects a platform version, AWS patches the platform version.

💼 [ECS.12] ECS clusters should use Container Insights

Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon ECS clusters. Use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices. CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects.

💼 [ECS.3] ECS task definitions should not share the host's process namespace

A process ID (PID) namespace provides separation between processes. It prevents system processes from being visible, and allows PIDs to be reused, including PID 1. If the host's PID namespace is shared with containers, it would allow containers to see all of the processes on the host system. This reduces the benefit of process level isolation between the host and the containers. These circumstances could lead to unauthorized access to processes on the host itself, including the ability to manipulate and terminate them. Customers shouldn't share the host's process namespace with containers running on it.

💼 [ECS.5] ECS containers should be limited to read-only access to root filesystems

If the `readonlyRootFilesystem` parameter is set to `true` in an Amazon ECS task definition, the ECS container is given read-only access to its root file system. This reduces security attack vectors because the container instance's root file system can't be tampered with or written to without explicit volume mounts that have read-write permissions for file system folders and directories. Enabling this option also adheres to the principle of least privilege.

💼 [ECS.9] ECS task definitions should have a logging configuration

Logging helps you maintain the reliability, availability, and performance of Amazon ECS. Collecting data from task definitions provides visibility, which can help you debug processes and find the root cause of errors. If you are using a logging solution that does not have to be defined in the ECS task definition (such as a third party logging solution), you can disable this control after ensuring that your logs are properly captured and delivered.

💼 [EFS.3] EFS access points should enforce a root directory

When you enforce a root directory, the NFS client using the access point uses the root directory configured on the access point instead of the file system's root directory. Enforcing a root directory for an access point helps restrict data access by ensuring that users of the access point can only reach files of the specified subdirectory.

💼 [EFS.4] EFS access points should enforce a user identity

Amazon EFS access points are application-specific entry points into an EFS file system that make it easier to manage application access to shared datasets. Access points can enforce a user identity, including the user's POSIX groups, for all file system requests that are made through the access point. Access points can also enforce a different root directory for the file system so that clients can only access data in the specified directory or its subdirectories.

💼 [EFS.6] EFS mount targets should not be associated with a public subnet

By default, an Amazon EFS file system is only accessible from the virtual private cloud (VPC) in which you created it. We recommend creating EFS mount targets in private subnets that are not accessible from the internet. This helps ensure that your file system is only accessible to authorized users and isn't vulnerable to unauthorized access or attacks.

💼 [EFS.7] EFS file systems should have automatic backups enabled

A data backup is a copy of your system, configuration, or application data that's stored separately from the original. Enabling regular backups helps you safeguard valuable data against unforeseen events like system failures, cyberattacks, or accidental deletions. Having a robust backup strategy also facilitates quicker recovery, business continuity, and peace of mind in the face of potential data loss.

💼 [EKS.1] EKS cluster endpoints should not be publicly accessible

When you create a new cluster, Amazon EKS creates an endpoint for the managed Kubernetes API server that you use to communicate with your cluster. By default, this API server endpoint is publicly available to the internet. Access to the API server is secured using a combination of AWS Identity and Access Management (IAM) and native Kubernetes Role Based Access Control (RBAC). By removing public access to the endpoint, you can avoid unintentional exposure and access to your cluster.

💼 [EKS.3] EKS clusters should use encrypted Kubernetes secrets

When you encrypt secrets, you can use AWS Key Management Service (AWS KMS) keys to provide envelope encryption of Kubernetes secrets stored in etcd for your cluster. This encryption is in addition to the EBS volume encryption that is enabled by default for all data (including secrets) that is stored in etcd as part of an EKS cluster. Using secrets encryption for your EKS cluster allows you to deploy a defense in depth strategy for Kubernetes applications by encrypting Kubernetes secrets with a KMS key that you define and manage.

💼 [EKS.8] EKS clusters should have audit logging enabled

EKS control plane logging provides audit and diagnostic logs directly from the EKS control plane to Amazon CloudWatch Logs in your account. You can select the log types you need, and logs are sent as log streams to a group for each EKS cluster in CloudWatch. Logging provides visibility into the access and performance of EKS clusters. By sending EKS control plane logs for your EKS clusters to CloudWatch Logs, you can record operations for audit and diagnostic purposes in a central location.
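As a rough sketch, audit logging can be enabled on an existing cluster with boto3 along these lines (the cluster name is a placeholder, and other log types can be added to the same list):

```python
import boto3

eks = boto3.client("eks")

# Enable the audit control plane log type for a cluster.
eks.update_cluster_config(
    name="demo-cluster",  # placeholder cluster name
    logging={
        "clusterLogging": [
            {"types": ["audit"], "enabled": True},
        ]
    },
)
```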

💼 [ElastiCache.1] ElastiCache (Valkey and Redis OSS) clusters should have automatic backups enabled

Amazon ElastiCache (Redis OSS) clusters can back up their data. You can use the backup to restore a cluster or seed a new cluster. The backup consists of the cluster's metadata, along with all of the data in the cluster. All backups are written to Amazon Simple Storage Service (Amazon S3), which provides durable storage. You can restore your data by creating a new Redis cluster and populating it with data from a backup.

💼 [ElastiCache.7] ElastiCache clusters should not use the default subnet group

When launching an ElastiCache cluster, a default subnet group is created if one doesn't exist already. The default group uses subnets from the default Virtual Private Cloud (VPC). We recommend using custom subnet groups that are more restrictive of the subnets that the cluster resides in, and the networking that the cluster inherits from the subnets.

💼 [ElasticBeanstalk.1] Elastic Beanstalk environments should have enhanced health reporting enabled

Elastic Beanstalk enhanced health reporting enables a more rapid response to changes in the health of the underlying infrastructure. These changes could result in a lack of availability of the application. Elastic Beanstalk enhanced health reporting provides a status descriptor to gauge the severity of the identified issues and identify possible causes to investigate. The Elastic Beanstalk health agent, included in supported Amazon Machine Images (AMIs), evaluates logs and metrics of environment EC2 instances.

💼 [ElasticBeanstalk.3] Elastic Beanstalk should stream logs to CloudWatch

CloudWatch helps you collect and monitor various metrics for your applications and infrastructure resources. You can also use CloudWatch to configure alarm actions based on specific metrics. We recommend integrating Elastic Beanstalk with CloudWatch to get increased visibility into your Elastic Beanstalk environment. Elastic Beanstalk logs include the eb-activity.log, access logs from the environment nginx or Apache proxy server, and logs that are specific to an environment.

💼 [ELB.1] Application Load Balancer should be configured to redirect all HTTP requests to HTTPS

Before you start to use your Application Load Balancer, you must add one or more listeners. A listener is a process that uses the configured protocol and port to check for connection requests. Listeners support both the HTTP and HTTPS protocols. You can use an HTTPS listener to offload the work of encryption and decryption to your load balancer. To enforce encryption in transit, you should use redirect actions with Application Load Balancers to redirect client HTTP requests to an HTTPS request on port 443.
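A hedged boto3 sketch that rewrites an existing HTTP listener's default action into an HTTP-to-HTTPS redirect (the listener ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARN of the existing HTTP (port 80) listener.
HTTP_LISTENER_ARN = (
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
    "listener/app/my-alb/0123456789abcdef/0123456789abcdef"
)

# Replace the listener's default action with a permanent redirect to HTTPS on port 443.
elbv2.modify_listener(
    ListenerArn=HTTP_LISTENER_ARN,
    DefaultActions=[
        {
            "Type": "redirect",
            "RedirectConfig": {
                "Protocol": "HTTPS",
                "Port": "443",
                "StatusCode": "HTTP_301",
            },
        }
    ],
)
```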

💼 [ELB.10] Classic Load Balancer should span multiple Availability Zones

A Classic Load Balancer can be set up to distribute incoming requests across Amazon EC2 instances in a single Availability Zone or multiple Availability Zones. A Classic Load Balancer that does not span multiple Availability Zones is unable to redirect traffic to targets in another Availability Zone if the sole configured Availability Zone becomes unavailable.

💼 [ELB.12] Application Load Balancer should be configured with defensive or strictest desync mitigation mode

HTTP Desync issues can lead to request smuggling and make applications vulnerable to request queue or cache poisoning. In turn, these vulnerabilities can lead to credential stuffing or execution of unauthorized commands. Application Load Balancers configured with defensive or strictest desync mitigation mode protect your application from security issues that may be caused by HTTP Desync.

💼 [ELB.13] Application, Network and Gateway Load Balancers should span multiple Availability Zones

Elastic Load Balancing automatically distributes your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. Elastic Load Balancing scales your load balancer as your incoming traffic changes over time. We recommend configuring at least two Availability Zones to ensure availability of services, because the load balancer can then direct traffic to another Availability Zone if one becomes unavailable. Configuring multiple Availability Zones helps eliminate a single point of failure for the application.

💼 [ELB.14] Classic Load Balancer should be configured with defensive or strictest desync mitigation mode

HTTP Desync issues can lead to request smuggling and make applications vulnerable to request queue or cache poisoning. In turn, these vulnerabilities can lead to credential hijacking or execution of unauthorized commands. Classic Load Balancers configured with defensive or strictest desync mitigation mode protect your application from security issues that may be caused by HTTP Desync.

💼 [ELB.17] Application and Network Load Balancers with listeners should use recommended security policies

Elastic Load Balancing uses an SSL negotiation configuration, known as a security policy, to negotiate connections between a client and a load balancer. The security policy specifies a combination of protocols and ciphers. The protocol establishes a secure connection between a client and a server. A cipher is an encryption algorithm that uses encryption keys to create a coded message. During the connection negotiation process, the client and the load balancer present a list of ciphers and protocols that they each support, in order of preference.

💼 [ELB.5] Application and Classic Load Balancers logging should be enabled

Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses. You can use these access logs to analyze traffic patterns and to troubleshoot issues.

💼 [ELB.9] Classic Load Balancers should have cross-zone load balancing enabled

When cross-zone load balancing is disabled, each load balancer node distributes traffic only across the registered targets in its Availability Zone. If the number of registered targets is not the same across the Availability Zones, traffic won't be distributed evenly, and the instances in one zone may end up overutilized compared to the instances in another zone. With cross-zone load balancing enabled, each load balancer node for your Classic Load Balancer distributes requests evenly across the registered instances in all enabled Availability Zones.

💼 [EMR.2] Amazon EMR block public access setting should be enabled

Amazon EMR block public access prevents you from launching a cluster in a public subnet if the cluster has a security configuration that allows inbound traffic from public IP addresses on a port. When a user from your AWS account launches a cluster, Amazon EMR checks the port rules in the security group for the cluster and compares them with your inbound traffic rules. If the security group has an inbound rule that opens ports to the public IP addresses IPv4 0.0.0.0/0 or IPv6 ::/0, and those ports aren't specified as exceptions for your account, Amazon EMR doesn't let the user create the cluster.

💼 [ES.1] Elasticsearch domains should have encryption at-rest enabled

For an added layer of security for your sensitive data in OpenSearch, you should configure your OpenSearch to be encrypted at rest. Elasticsearch domains offer encryption of data at rest. The feature uses AWS KMS to store and manage your encryption keys. To perform the encryption, it uses the Advanced Encryption Standard algorithm with 256-bit keys (AES-256).

💼 [ES.2] Elasticsearch domains should not be publicly accessible

Elasticsearch domains deployed within a VPC can communicate with VPC resources over the private AWS network, without the need to traverse the public internet. This configuration increases the security posture by limiting access to the data in transit. VPCs provide a number of network controls to secure access to Elasticsearch domains, including network ACL and security groups.

💼 [ES.3] Elasticsearch domains should encrypt data sent between nodes

HTTPS (TLS) can be used to help prevent potential attackers from eavesdropping on or manipulating network traffic using person-in-the-middle or similar attacks. Only encrypted connections over HTTPS (TLS) should be allowed. Enabling node-to-node encryption for Elasticsearch domains ensures that intra-cluster communications are encrypted in transit.

💼 [ES.7] Elasticsearch domains should be configured with at least three dedicated master nodes

An Elasticsearch domain requires at least three dedicated primary nodes for high availability and fault-tolerance. Dedicated primary node resources can be strained during data node blue/green deployments because there are additional nodes to manage. Deploying an Elasticsearch domain with at least three dedicated primary nodes ensures sufficient primary node resource capacity and cluster operations if a node fails.

💼 [ES.8] Connections to Elasticsearch domains should be encrypted using the latest TLS security policy

HTTPS (TLS) can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic. Only encrypted connections over HTTPS (TLS) should be allowed. Encrypting data in transit can affect performance. You should test your application with this feature to understand the performance profile and the impact of TLS. TLS 1.2 provides several security enhancements over previous versions of TLS.

💼 [FSx.3] FSx for OpenZFS file systems should be configured for Multi-AZ deployment

Amazon FSx for OpenZFS supports several deployment types for file systems: Multi-AZ (HA), Single-AZ (HA), and Single-AZ (non-HA). The deployment types offer different levels of availability and durability. Multi-AZ (HA) file systems are composed of a high-availability (HA) pair of file servers that are spread across two Availability Zones (AZs). We recommend using the Multi-AZ (HA) deployment type for most production workloads due to the high availability and durability model that it provides.

💼 [FSx.4] FSx for NetApp ONTAP file systems should be configured for Multi-AZ deployment

Amazon FSx for NetApp ONTAP supports several deployment types for file systems: Single-AZ 1, Single-AZ 2, Multi-AZ 1, and Multi-AZ 2. The deployment types offer different levels of availability and durability. We recommend using a Multi-AZ deployment type for most production workloads due to the high availability and durability model that Multi-AZ deployment types provide. Multi-AZ file systems support all the availability and durability features of Single-AZ file systems. In addition, they're designed to provide continuous availability to data even when an Availability Zone (AZ) is unavailable.

💼 [FSx.5] FSx for Windows File Server file systems should be configured for Multi-AZ deployment

Amazon FSx for Windows File Server supports two deployment types for file systems: Single-AZ and Multi-AZ. The deployment types offer different levels of availability and durability. Single-AZ file systems are composed of a single Windows file server instance and a set of storage volumes within a single Availability Zone (AZ). Multi-AZ file systems are composed of a high-availability cluster of Windows file servers spread across two Availability Zones.

💼 [GuardDuty.1] GuardDuty should be enabled

It is highly recommended that you enable GuardDuty in all supported AWS Regions. Doing so allows GuardDuty to generate findings about unauthorized or unusual activity, even in Regions that you do not actively use. This also allows GuardDuty to monitor CloudTrail events for global AWS services such as IAM.
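A minimal boto3 sketch that enables a GuardDuty detector in each Region of interest (the Region list is a placeholder; in an organization, delegated-administrator setup would be handled differently):

```python
import boto3

# GuardDuty is Regional, so a detector is needed in every Region you want covered.
for region in ["us-east-1", "eu-west-1"]:  # placeholder Region list
    guardduty = boto3.client("guardduty", region_name=region)
    # Create a detector only if the Region doesn't already have one.
    if not guardduty.list_detectors()["DetectorIds"]:
        guardduty.create_detector(Enable=True)
```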

💼 [GuardDuty.10] GuardDuty S3 Protection should be enabled

S3 Protection enables GuardDuty to monitor object-level API operations to identify potential security risks for data within your Amazon Simple Storage Service (Amazon S3) buckets. GuardDuty monitors threats against your S3 resources by analyzing AWS CloudTrail management events and CloudTrail S3 data events.

💼 [GuardDuty.11] GuardDuty Runtime Monitoring should be enabled

GuardDuty Runtime Monitoring observes and analyzes operating system-level, networking, and file events to help you detect potential threats in specific AWS workloads in your environment. It uses GuardDuty security agents that add visibility into runtime behavior, such as file access, process execution, command line arguments, and network connections. You can enable and manage the security agent for each type of resource that you want to monitor for potential threats, such as Amazon EKS clusters and Amazon EC2 instances.

💼 [GuardDuty.12] GuardDuty ECS Runtime Monitoring should be enabled

GuardDuty Runtime Monitoring observes and analyzes operating system-level, networking, and file events to help you detect potential threats in specific AWS workloads in your environment. It uses GuardDuty security agents that add visibility into runtime behavior, such as file access, process execution, command line arguments, and network connections. You can enable and manage the security agent for each type of resource that you want to monitor for potential threats. This includes Amazon ECS clusters on AWS Fargate.

💼 [GuardDuty.13] GuardDuty EC2 Runtime Monitoring should be enabled

GuardDuty Runtime Monitoring observes and analyzes operating system-level, networking, and file events to help you detect potential threats in specific AWS workloads in your environment. It uses GuardDuty security agents that add visibility into runtime behavior, such as file access, process execution, command line arguments, and network connections. You can enable and manage the security agent for each type of resource that you want to monitor for potential threats. This includes Amazon EC2 instances.

💼 [GuardDuty.5] GuardDuty EKS Audit Log Monitoring should be enabled

GuardDuty EKS Audit Log Monitoring helps you detect potentially suspicious activities in your Amazon Elastic Kubernetes Service (Amazon EKS) clusters. EKS Audit Log Monitoring uses Kubernetes audit logs to capture chronological activities from users, applications using the Kubernetes API, and the control plane.

💼 [GuardDuty.6] GuardDuty Lambda Protection should be enabled

GuardDuty Lambda Protection helps you identify potential security threats when an AWS Lambda function gets invoked. After you enable Lambda Protection, GuardDuty starts monitoring Lambda network activity logs associated with the Lambda functions in your AWS account. When a Lambda function gets invoked and GuardDuty identifies suspicious network traffic that indicates the presence of a potentially malicious piece of code in your Lambda function, GuardDuty generates a finding.

💼 [GuardDuty.8] GuardDuty Malware Protection for EC2 should be enabled

GuardDuty Malware Protection for EC2 helps you detect the potential presence of malware by scanning the Amazon Elastic Block Store (Amazon EBS) volumes that are attached to Amazon Elastic Compute Cloud (Amazon EC2) instances and container workloads. Malware Protection provides scan options where you can decide if you want to include or exclude specific EC2 instances and container workloads at the time of scanning. It also provides an option to retain the snapshots of EBS volumes attached to the EC2 instances or container workloads, in your GuardDuty accounts. The snapshots get retained only when malware is found and Malware Protection findings are generated.

💼 [GuardDuty.9] GuardDuty RDS Protection should be enabled

RDS Protection in GuardDuty analyzes and profiles RDS login activity for potential access threats to your Amazon Aurora databases (Aurora MySQL-Compatible Edition and Aurora PostgreSQL-Compatible Edition). This feature allows you to identify potentially suspicious login behavior. RDS Protection doesn't require additional infrastructure; it is designed so as not to affect the performance of your database instances. When RDS Protection detects a potentially suspicious or anomalous login attempt that indicates a threat to your database, GuardDuty generates a new finding with details about the potentially compromised database.

💼 [IAM.1] IAM policies should not allow full "*" administrative privileges

IAM policies define a set of privileges that are granted to users, groups, or roles. Following standard security advice, AWS recommends that you grant least privilege, which means to grant only the permissions that are required to perform a task. When you provide full administrative privileges instead of the minimum set of permissions that the user needs, you expose the resources to potentially unwanted actions. Instead of allowing full administrative privileges, determine what users need to do and then craft policies that let the users perform only those tasks. It is more secure to start with a minimum set of permissions and grant additional permissions as necessary. Do not start with permissions that are too lenient and then try to tighten them later.

💼 [IAM.2] IAM users should not have IAM policies attached

By default, IAM users, groups, and roles have no access to AWS resources. IAM policies grant privileges to users, groups, or roles. We recommend that you apply IAM policies directly to groups and roles but not to users. Assigning privileges at the group or role level reduces the complexity of access management as the number of users grows. Reducing access management complexity might in turn reduce the opportunity for a principal to inadvertently receive or retain excessive privileges.

💼 [IAM.21] IAM customer managed policies that you create should not allow wildcard actions for services

When you assign permissions to AWS services, it is important to scope the allowed IAM actions in your IAM policies. You should restrict IAM actions to only those actions that are needed. This helps you to provision least privilege permissions. Overly permissive policies might lead to privilege escalation if the policies are attached to an IAM principal that might not require the permission.

💼 [IAM.3] IAM users' access keys should be rotated every 90 days or less

Access keys consist of an access key ID and a secret access key. They are used to sign programmatic requests that you make to AWS. Users need their own access keys to make programmatic calls to AWS from the AWS CLI, Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the API operations for individual AWS services.
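As an illustrative check (not the official remediation), the following boto3 sketch lists active access keys older than 90 days; the threshold is adjustable:

```python
import boto3
from datetime import datetime, timezone

iam = boto3.client("iam")
MAX_AGE_DAYS = 90

# Flag active access keys older than the rotation threshold.
for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        keys = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
        for key in keys:
            age = (datetime.now(timezone.utc) - key["CreateDate"]).days
            if key["Status"] == "Active" and age > MAX_AGE_DAYS:
                print(f"{user['UserName']}: {key['AccessKeyId']} is {age} days old")
```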

💼 [Inspector.1] Amazon Inspector EC2 scanning should be enabled

Amazon Inspector EC2 scanning extracts metadata from your Amazon Elastic Compute Cloud (Amazon EC2) instance, and then compares this metadata against rules collected from security advisories to produce findings. Amazon Inspector scans instances for package vulnerabilities and network reachability issues.

💼 [Inspector.2] Amazon Inspector ECR scanning should be enabled

Amazon Inspector scans container images stored in Amazon Elastic Container Registry (Amazon ECR) for software vulnerabilities to generate package vulnerability findings. When you activate Amazon Inspector scans for Amazon ECR, you set Amazon Inspector as your preferred scanning service for your private registry. This replaces basic scanning, which is provided at no charge by Amazon ECR, with enhanced scanning, which is provided and billed through Amazon Inspector. Enhanced scanning gives you the benefit of vulnerability scanning for both operating system and programming language packages at the registry level. You can review findings discovered using enhanced scanning at the image level, for each layer of the image, on the Amazon ECR console. Additionally, you can review and work with these findings in other services not available for basic scanning findings, including AWS Security Hub and Amazon EventBridge.

💼 [Inspector.4] Amazon Inspector Lambda standard scanning should be enabled

Amazon Inspector Lambda standard scanning identifies software vulnerabilities in the application package dependencies you add to your AWS Lambda function code and layers. If Amazon Inspector detects a vulnerability in your Lambda function application package dependencies, Amazon Inspector produces a detailed Package Vulnerability type finding.

💼 [Kinesis.1] Kinesis streams should be encrypted at rest

Server-side encryption is a feature in Amazon Kinesis Data Streams that automatically encrypts data before it's at rest by using an AWS KMS key. Data is encrypted before it's written to the Kinesis stream storage layer, and decrypted after it's retrieved from storage. As a result, your data is encrypted at rest within the Amazon Kinesis Data Streams service.

💼 [Kinesis.3] Kinesis streams should have an adequate data retention period

In Kinesis Data Streams, a data stream is an ordered sequence of data records meant to be written to and read from in real time. Data records are stored in shards in your stream temporarily. The time period from when a record is added to when it is no longer accessible is called the retention period. After you decrease the retention period, Kinesis Data Streams almost immediately makes records older than the new retention period inaccessible.

💼 [KMS.1] IAM customer managed policies should not allow decryption actions on all KMS keys

With AWS KMS, you control who can use your KMS keys and gain access to your encrypted data. IAM policies define which actions an identity (user, group, or role) can perform on which resources. Following security best practices, AWS recommends that you allow least privilege. In other words, you should grant to identities only the `kms:Decrypt` or `kms:ReEncryptFrom` permissions and only for the keys that are required to perform a task. Otherwise, the user might use keys that are not appropriate for your data.

💼 [KMS.2] IAM principals should not have IAM inline policies that allow decryption actions on all KMS keys

With AWS KMS, you control who can use your KMS keys and gain access to your encrypted data. IAM policies define which actions an identity (user, group, or role) can perform on which resources. Following security best practices, AWS recommends that you allow least privilege. In other words, you should grant to identities only the permissions they need and only for keys that are required to perform a task. Otherwise, the user might use keys that are not appropriate for your data.

๐Ÿ’ผ [KMS.3] AWS KMS keys should not be deleted unintentionally

KMS keys cannot be recovered once deleted. Data encrypted under a KMS key is also permanently unrecoverable if the KMS key is deleted. If meaningful data has been encrypted under a KMS key scheduled for deletion, consider decrypting the data or re-encrypting the data under a new KMS key unless you are intentionally performing a *cryptographic erasure*. When a KMS key is scheduled for deletion, a mandatory waiting period is enforced to allow time to reverse the deletion, if it was scheduled in error. The default waiting period is 30 days, but it can be reduced to as short as 7 days when the KMS key is scheduled for deletion. During the waiting period, the scheduled deletion can be canceled and the KMS key will not be deleted.
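
A minimal sketch of the scheduling and cancellation flow, assuming the boto3 KMS client and a placeholder key ID:

```python
import boto3

kms = boto3.client("kms")
key_id = "EXAMPLE-KEY-ID"  # placeholder key ID or ARN

# Hypothetical sketch: scheduling deletion enforces a waiting period of 7-30 days.
kms.schedule_key_deletion(KeyId=key_id, PendingWindowInDays=30)

# If the deletion was scheduled in error, cancel it during the waiting period.
kms.cancel_key_deletion(KeyId=key_id)
# Canceling leaves the key in the Disabled state; re-enable it to use it again.
kms.enable_key(KeyId=key_id)
```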

๐Ÿ’ผ [KMS.5] KMS keys should not be publicly accessible

Implementing least privilege access is fundamental to reducing security risk and the impact of errors or malicious intent. If the key policy for an AWS KMS key allows access from external accounts, third parties might be able to encrypt and decrypt data by using the key. This could result in an internal or external threat exfiltrating data from AWS services that use the key.

๐Ÿ’ผ [Lambda.2] Lambda functions should use supported runtimes

Lambda runtimes are built around a combination of operating system, programming language, and software libraries that are subject to maintenance and security updates. When a runtime component is no longer supported for security updates, Lambda deprecates the runtime. Although you can't create new functions that use a deprecated runtime, existing functions remain available to process invocation events.
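
A sketch of flagging functions on deprecated runtimes, assuming the boto3 Lambda client; the set of deprecated runtimes below is illustrative and incomplete, so check the current Lambda deprecation schedule before relying on it:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical sketch: flag functions that still use runtimes known to be deprecated.
# This set is illustrative, not exhaustive.
DEPRECATED_RUNTIMES = {"python2.7", "nodejs12.x", "dotnetcore2.1", "ruby2.5"}

paginator = lambda_client.get_paginator("list_functions")
for page in paginator.paginate():
    for function in page["Functions"]:
        runtime = function.get("Runtime")  # container-image functions have no Runtime
        if runtime in DEPRECATED_RUNTIMES:
            print(f"{function['FunctionName']} uses deprecated runtime {runtime}")
```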

๐Ÿ’ผ [Lambda.5] VPC Lambda functions should operate in multiple Availability Zones

Deploying resources across multiple AZs is an AWS best practice to ensure high availability within your architecture. Availability is a core pillar in the confidentiality, integrity, and availability triad security model. All Lambda functions that connect to a VPC should have a multi-AZ deployment to ensure that a single zone of failure doesn't cause a total disruption of operations.
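
A minimal sketch of attaching a VPC-connected function to subnets in more than one Availability Zone, assuming the boto3 Lambda client; the function name, subnet IDs, and security group ID are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical sketch: give the function subnets in at least two Availability Zones
# so it keeps running if one zone becomes unavailable.
lambda_client.update_function_configuration(
    FunctionName="example-function",   # placeholder function name
    VpcConfig={
        "SubnetIds": [
            "subnet-aaaa1111",  # placeholder subnet in AZ a
            "subnet-bbbb2222",  # placeholder subnet in AZ b
        ],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder security group
    },
)
```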

๐Ÿ’ผ [Macie.1] Amazon Macie should be enabled

Amazon Macie discovers sensitive data using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks. Macie automatically and continually evaluates your Amazon Simple Storage Service (Amazon S3) buckets for security and access control, and generates findings to notify you of potential issues with the security or privacy of your Amazon S3 data. Macie also automates discovery and reporting of sensitive data, such as personally identifiable information (PII), to provide you with a better understanding of the data that you store in Amazon S3.

๐Ÿ’ผ [Macie.2] Macie automated sensitive data discovery should be enabled

Macie automates discovery and reporting of sensitive data, such as personally identifiable information (PII), in Amazon Simple Storage Service (Amazon S3) buckets. With automated sensitive data discovery, Macie continually evaluates your bucket inventory and uses sampling techniques to identify and select representative S3 objects from your buckets. Macie then analyzes the selected objects, inspecting them for sensitive data. As the analyses progress, Macie updates statistics, inventory data, and other information that it provides about your S3 data. Macie also generates findings to report sensitive data that it finds.

๐Ÿ’ผ [MSK.1] MSK clusters should be encrypted in transit among broker nodes

HTTPS offers an extra layer of security as it uses TLS to move data and can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic. By default, Amazon MSK encrypts data in transit with TLS. However, you can override this default at the time that you create the cluster.
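
A sketch of checking whether the default has been overridden on an existing cluster, assuming the boto3 `kafka` client and a placeholder cluster ARN:

```python
import boto3

kafka = boto3.client("kafka")

# Placeholder cluster ARN.
cluster_arn = "arn:aws:kafka:us-east-1:111122223333:cluster/example/EXAMPLE-UUID"

# Hypothetical sketch: verify in-transit encryption settings for broker and client traffic.
info = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]
in_transit = info["EncryptionInfo"]["EncryptionInTransit"]

if not in_transit.get("InCluster", False):
    print("Broker-to-broker traffic is NOT encrypted in transit")
if in_transit.get("ClientBroker") != "TLS":
    print(f"Client-broker encryption is set to {in_transit.get('ClientBroker')}")
```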

๐Ÿ’ผ [MSK.3] MSK Connect connectors should be encrypted in transit

Data in transit refers to data that moves from one location to another, such as between nodes in your cluster or between your cluster and your application. Data may move across the internet or within a private network. Encrypting data in transit reduces the risk that an unauthorized user can eavesdrop on network traffic.

๐Ÿ’ผ [Neptune.1] Neptune DB clusters should be encrypted at rest

Data at rest refers to any data that's stored in persistent, non-volatile storage for any duration. Encryption helps you protect the confidentiality of such data, reducing the risk that an unauthorized user can access it. Encrypting your Neptune DB clusters protects your data and metadata against unauthorized access. It also fulfills compliance requirements for data-at-rest encryption of production file systems.

๐Ÿ’ผ [Neptune.2] Neptune DB clusters should publish audit logs to CloudWatch Logs

Amazon Neptune and Amazon CloudWatch are integrated so that you can gather and analyze performance metrics. Neptune automatically sends metrics to CloudWatch and also supports CloudWatch Alarms. Audit logs are highly customizable. When you audit a database, each operation on the data can be monitored and logged to an audit trail, including information about which database cluster is accessed and how.

๐Ÿ’ผ [Neptune.6] Neptune DB cluster snapshots should be encrypted at rest

Data at rest refers to any data that's stored in persistent, non-volatile storage for any duration. Encryption helps you protect the confidentiality of such data, reducing the risk that an unauthorized user gets access to it. Data in Neptune DB cluster snapshots should be encrypted at rest for an added layer of security.

๐Ÿ’ผ [Neptune.8] Neptune DB clusters should be configured to copy tags to snapshots

Identification and inventory of your IT assets is a crucial aspect of governance and security. You should tag snapshots in the same way as their parent Neptune DB clusters. Copying tags ensures that the metadata for the DB snapshots matches that of the parent DB clusters, and that access policies for the DB snapshots also match those of the parent DB cluster.

๐Ÿ’ผ [NetworkFirewall.2] Network Firewall logging should be enabled

Logging helps you maintain the reliability, availability, and performance of your firewalls. In Network Firewall, logging gives you detailed information about network traffic, including the time that the stateful engine received a packet flow, detailed information about the packet flow, and any stateful rule action taken against the packet flow.

๐Ÿ’ผ [Opensearch.1] OpenSearch domains should have encryption at rest enabled

For an added layer of security for sensitive data, you should configure your OpenSearch Service domain to be encrypted at rest. When you configure encryption of data at rest, AWS KMS stores and manages your encryption keys. To perform the encryption, AWS KMS uses the Advanced Encryption Standard algorithm with 256-bit keys (AES-256).

๐Ÿ’ผ [Opensearch.2] OpenSearch domains should not be publicly accessible

OpenSearch domains deployed within a VPC can communicate with VPC resources over the private AWS network, without the need to traverse the public internet. This configuration increases the security posture by limiting access to the data in transit. VPCs provide a number of network controls to secure access to OpenSearch domains, including network ACLs and security groups.

๐Ÿ’ผ [Opensearch.3] OpenSearch domains should encrypt data sent between nodes

HTTPS (TLS) can be used to help prevent potential attackers from eavesdropping on or manipulating network traffic using person-in-the-middle or similar attacks. Only encrypted connections over HTTPS (TLS) should be allowed. Enabling node-to-node encryption for OpenSearch domains ensures that intra-cluster communications are encrypted in transit. There can be a performance penalty associated with this configuration. You should be aware of and test the performance trade-off before enabling this option.

๐Ÿ’ผ [Opensearch.8] Connections to OpenSearch domains should be encrypted using the latest TLS security policy

HTTPS (TLS) can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic. Only encrypted connections over HTTPS (TLS) should be allowed. Encrypting data in transit can affect performance. You should test your application with this feature to understand the performance profile and the impact of TLS. TLS 1.2 provides several security enhancements over previous versions of TLS.
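
A minimal sketch of enforcing HTTPS with a TLS 1.2 minimum on an existing domain, assuming the boto3 OpenSearch client; the domain name is a placeholder and the policy string is one of the published minimum-TLS-1.2 policies:

```python
import boto3

opensearch = boto3.client("opensearch")

# Hypothetical sketch: enforce HTTPS and a minimum of TLS 1.2 on an existing domain.
opensearch.update_domain_config(
    DomainName="example-domain",   # placeholder domain name
    DomainEndpointOptions={
        "EnforceHTTPS": True,
        "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07",
    },
)
```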

๐Ÿ’ผ [PCA.1] AWS Private CA root certificate authority should be disabled

With AWS Private CA, you can create a CA hierarchy that includes a root CA and subordinate CAs. You should minimize the use of the root CA for daily tasks, especially in production environments. The root CA should only be used to issue certificates for intermediate CAs. This allows the root CA to be stored out of harm's way while the intermediate CAs perform the daily task of issuing end-entity certificates.

๐Ÿ’ผ [RDS.1] RDS snapshot should be private

RDS snapshots are used to back up the data on your RDS instances at a specific point in time. They can be used to restore previous states of RDS instances. An RDS snapshot must not be public unless intended. If you share an unencrypted manual snapshot as public, this makes the snapshot available to all AWS accounts. This may result in unintended data exposure of your RDS instance.

๐Ÿ’ผ [RDS.13] RDS automatic minor version upgrades should be enabled

Enabling automatic minor version upgrades ensures that the latest minor version updates to the relational database management system (RDBMS) are installed. These upgrades might include security patches and bug fixes. Keeping up to date with patch installation is an important step in securing systems.

๐Ÿ’ผ [RDS.2] RDS DB Instances should prohibit public access, as determined by the PubliclyAccessible configuration

The `PubliclyAccessible` value in the RDS instance configuration indicates whether the DB instance is publicly accessible. When the DB instance is configured with `PubliclyAccessible` set to `true`, it is an Internet-facing instance with a publicly resolvable DNS name, which resolves to a public IP address. When the DB instance isn't publicly accessible, it is an internal instance with a DNS name that resolves to a private IP address.
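
A minimal sketch of removing public accessibility from an existing instance, assuming the boto3 RDS client and a placeholder instance identifier:

```python
import boto3

rds = boto3.client("rds")

# Hypothetical sketch: make a DB instance private.
rds.modify_db_instance(
    DBInstanceIdentifier="example-db-instance",  # placeholder identifier
    PubliclyAccessible=False,
    ApplyImmediately=True,  # apply now instead of waiting for the maintenance window
)
```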

๐Ÿ’ผ [RDS.23] RDS instances should not use a database engine default port

If you use a known port to deploy an RDS cluster or instance, an attacker can guess information about the cluster or instance. The attacker can use this information in conjunction with other information to connect to an RDS cluster or instance or gain additional information about your application. When you change the port, you must also update the existing connection strings that were used to connect to the old port. You should also check the security group of the DB instance to ensure that it includes an ingress rule that allows connectivity on the new port.

๐Ÿ’ผ [RDS.27] RDS DB clusters should be encrypted at rest

Data at rest refers to any data that's stored in persistent, non-volatile storage for any duration. Encryption helps you protect the confidentiality of such data, reducing the risk that an unauthorized user can access it. Encrypting your RDS DB clusters protects your data and metadata against unauthorized access. It also fulfills compliance requirements for data-at-rest encryption of production file systems.

๐Ÿ’ผ [RDS.3] RDS DB instances should have encryption at-rest enabled

For an added layer of security for your sensitive data in RDS DB instances, you should configure your RDS DB instances to be encrypted at rest. To encrypt your RDS DB instances and snapshots at rest, enable the encryption option for your RDS DB instances. Data that is encrypted at rest includes the underlying storage for DB instances, its automated backups, read replicas, and snapshots. RDS encrypted DB instances use the open standard AES-256 encryption algorithm to encrypt your data on the server that hosts your RDS DB instances. After your data is encrypted, Amazon RDS handles authentication of access and decryption of your data transparently with a minimal impact on performance. You do not need to modify your database client applications to use encryption.

๐Ÿ’ผ [RDS.34] Aurora MySQL DB clusters should publish audit logs to CloudWatch Logs

Audit logs capture a record of database activity, including login attempts, data modifications, schema changes, and other events that can be audited for security and compliance purposes. When you configure an Aurora MySQL DB cluster to publish audit logs to a log group in Amazon CloudWatch Logs, you can perform real-time analysis of the log data. CloudWatch Logs retains logs in highly durable storage. You can also create alarms and view metrics in CloudWatch.
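
A sketch of enabling the export, assuming the boto3 RDS client and a placeholder cluster identifier; note that the audit log itself must also be turned on in the cluster parameter group (for Aurora MySQL, through the Advanced Auditing parameters such as `server_audit_logging`):

```python
import boto3

rds = boto3.client("rds")

# Hypothetical sketch: publish the Aurora MySQL audit log to CloudWatch Logs.
rds.modify_db_cluster(
    DBClusterIdentifier="example-aurora-cluster",   # placeholder identifier
    CloudwatchLogsExportConfiguration={"EnableLogTypes": ["audit"]},
)
```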

๐Ÿ’ผ [RDS.35] RDS DB clusters should have automatic minor version upgrade enabled

RDS provides automatic minor version upgrade so that you can keep your Multi-AZ DB cluster up to date. Minor versions can introduce new software features, bug fixes, security patches, and performance improvements. By enabling automatic minor version upgrade on RDS database clusters, the cluster, along with the instances in the cluster, will receive automatic updates to the minor version when new versions are available. The updates are applied automatically during the maintenance window.

๐Ÿ’ผ [RDS.36] RDS for PostgreSQL DB instances should publish logs to CloudWatch Logs

Database logging provides detailed records of requests made to an RDS instance. PostgreSQL generates event logs that contain useful information for administrators. Publishing these logs to CloudWatch Logs centralizes log management and helps you perform real-time analysis of the log data. CloudWatch Logs retains logs in highly durable storage. You can also create alarms and view metrics in CloudWatch.

๐Ÿ’ผ [RDS.37] Aurora PostgreSQL DB clusters should publish logs to CloudWatch Logs

Database logging provides detailed records of requests made to an RDS cluster. Aurora PostgreSQL generates event logs that contain useful information for administrators. Publishing these logs to CloudWatch Logs centralizes log management and helps you perform real-time analysis of the log data. CloudWatch Logs retains logs in highly durable storage. You can also create alarms and view metrics in CloudWatch.

๐Ÿ’ผ [RDS.40] RDS for SQL Server DB instances should publish logs to CloudWatch Logs

Database logging provides detailed records of requests made to an Amazon RDS DB instance. Publishing logs to CloudWatch Logs centralizes log management and helps you perform real-time analysis of log data. CloudWatch Logs retains logs in highly durable storage. In addition, you can use it to create alarms for specific errors that can occur, such as frequent restarts that are recorded in an error log. Similarly, you can create alarms for errors or warnings that are recorded in SQL Server agent logs related to SQL agent jobs.

๐Ÿ’ผ [RDS.6] Enhanced monitoring should be configured for RDS DB instances

In Amazon RDS, Enhanced Monitoring enables a more rapid response to performance changes in underlying infrastructure. These performance changes could result in a lack of availability of the data. Enhanced Monitoring provides real-time metrics of the operating system that your RDS DB instance runs on. An agent is installed on the instance. The agent can obtain metrics more accurately than is possible from the hypervisor layer.

๐Ÿ’ผ [Redshift.1] Amazon Redshift clusters should prohibit public access

The `PubliclyAccessible` attribute of the Amazon Redshift cluster configuration indicates whether the cluster is publicly accessible. When the cluster is configured with `PubliclyAccessible` set to `true`, it is an Internet-facing instance that has a publicly resolvable DNS name, which resolves to a public IP address. When the cluster is not publicly accessible, it is an internal instance with a DNS name that resolves to a private IP address. Unless you intend for your cluster to be publicly accessible, the cluster should not be configured with `PubliclyAccessible` set to `true`.
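
A minimal sketch of making an existing cluster private, assuming the boto3 Redshift client and a placeholder cluster identifier:

```python
import boto3

redshift = boto3.client("redshift")

# Hypothetical sketch: make the cluster private so it resolves only to a private IP address.
redshift.modify_cluster(
    ClusterIdentifier="example-cluster",   # placeholder identifier
    PubliclyAccessible=False,
)
```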

๐Ÿ’ผ [Redshift.10] Redshift clusters should be encrypted at rest

In Amazon Redshift, you can turn on database encryption for your clusters to help protect data at rest. When you turn on encryption for a cluster, the data blocks and system metadata are encrypted for the cluster and its snapshots. Encryption of data at rest is a recommended best practice because it adds a layer of access management to your data. Encrypting Redshift clusters at rest reduces the risk that an unauthorized user can access the data stored on disk.

๐Ÿ’ผ [Redshift.7] Redshift clusters should use enhanced VPC routing

Enhanced VPC routing forces all `COPY` and `UNLOAD` traffic between the cluster and data repositories to go through your VPC. You can then use VPC features such as security groups and network access control lists to secure network traffic. You can also use VPC Flow Logs to monitor network traffic.
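
A minimal sketch of enabling enhanced VPC routing on an existing cluster, assuming the boto3 Redshift client and a placeholder cluster identifier:

```python
import boto3

redshift = boto3.client("redshift")

# Hypothetical sketch: force COPY and UNLOAD traffic through the VPC.
redshift.modify_cluster(
    ClusterIdentifier="example-cluster",   # placeholder identifier
    EnhancedVpcRouting=True,
)
```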

๐Ÿ’ผ [RedshiftServerless.1] Amazon Redshift Serverless workgroups should use enhanced VPC routing

If enhanced VPC routing is disabled for an Amazon Redshift Serverless workgroup, Amazon Redshift routes traffic through the internet, including traffic to other services within the AWS network. If you enable enhanced VPC routing for a workgroup, Amazon Redshift forces all `COPY` and `UNLOAD` traffic between your cluster and your data repositories through your virtual private cloud (VPC) based on the Amazon VPC service. With enhanced VPC routing, you can use standard VPC features to control the flow of data between your Amazon Redshift cluster and other resources. This includes features such as VPC security groups and endpoint policies, network access control lists (ACLs), and Domain Name System (DNS) servers. You can also use VPC flow logs to monitor `COPY` and `UNLOAD` traffic.

๐Ÿ’ผ [Route53.2] Route 53 public hosted zones should log DNS queries

Logging DNS queries for a Route 53 hosted zone addresses DNS security and compliance requirements and grants visibility. The logs include information such as the domain or subdomain that was queried, the date and time of the query, the DNS record type (for example, A or AAAA), and the DNS response code (for example, NoError or ServFail). When DNS query logging is enabled, Route 53 publishes the log files to Amazon CloudWatch Logs.
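
A minimal sketch of enabling query logging for a public hosted zone, assuming the boto3 Route 53 client; the hosted zone ID and log group ARN are placeholders, and the log group must be in us-east-1 with a resource policy that allows Route 53 to write to it:

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical sketch: enable DNS query logging for a public hosted zone.
route53.create_query_logging_config(
    HostedZoneId="Z0000000EXAMPLE",  # placeholder hosted zone ID
    CloudWatchLogsLogGroupArn=(
        # placeholder log group ARN (must be in us-east-1)
        "arn:aws:logs:us-east-1:111122223333:log-group:/aws/route53/example.com"
    ),
)
```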

๐Ÿ’ผ [S3.1] S3 general purpose buckets should have block public access settings enabled

Amazon S3 public access block is designed to provide controls across an entire AWS account or at the individual S3 bucket level to ensure that objects never have public access. Public access is granted to buckets and objects through access control lists (ACLs), bucket policies, or both. Unless you intend to have your S3 buckets be publicly accessible, you should configure the account level Amazon S3 Block Public Access feature.
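
A minimal sketch of turning on all four Block Public Access settings at the account level, assuming the boto3 S3 Control client and a placeholder account ID:

```python
import boto3

s3control = boto3.client("s3control")

# Hypothetical sketch: enable account-level S3 Block Public Access.
s3control.put_public_access_block(
    AccountId="111122223333",   # placeholder account ID
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```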

๐Ÿ’ผ [S3.19] S3 access points should have block public access settings enabled

The Amazon S3 Block Public Access feature helps you manage access to your S3 resources at three levels: the account, bucket, and access point levels. The settings at each level can be configured independently, allowing you to have different levels of public access restrictions for your data. The access point settings can't individually override the more restrictive settings at higher levels (account level or bucket assigned to the access point). Instead, the settings at the access point level are additive, meaning they complement and work alongside the settings at the other levels. Unless you intend an S3 access point to be publicly accessible, you should enable block public access settings.

๐Ÿ’ผ [S3.6] S3 general purpose bucket policies should restrict access to other AWS accounts

Implementing least privilege access is fundamental to reducing security risk and the impact of errors or malicious intent. If an S3 bucket policy allows access from external accounts, it could result in data exfiltration by an insider threat or an attacker. The `blacklistedactionpatterns` parameter allows for successful evaluation of the rule for S3 buckets. The parameter grants access to external accounts for action patterns that are not included in the `blacklistedactionpatterns` list.

๐Ÿ’ผ [SageMaker.1] Amazon SageMaker AI notebook instances should not have direct internet access

If you configure your SageMaker AI instance without a VPC, then by default direct internet access is enabled on your instance. You should configure your instance with a VPC and change the default setting to Disable - Access the internet through a VPC. To train or host models from a notebook, you need internet access. To enable internet access, your VPC must have either an interface endpoint (AWS PrivateLink) or a NAT gateway and a security group that allows outbound connections.

๐Ÿ’ผ [SageMaker.2] SageMaker AI notebook instances should be launched in a custom VPC

Subnets are a range of IP addresses within a VPC. We recommend keeping your resources inside a custom VPC whenever possible to ensure secure network protection of your infrastructure. An Amazon VPC is a virtual network dedicated to your AWS account. With an Amazon VPC, you can control the network access and internet connectivity of your SageMaker AI Studio and notebook instances.

๐Ÿ’ผ [SageMaker.5] SageMaker models should block inbound traffic

SageMaker AI training and deployed inference containers are internet-enabled by default. If you don't want SageMaker AI to provide external network access to your training or inference containers, you can enable network isolation. If you enable network isolation, the containers can't make any outbound network calls, even to other AWS services. Additionally, no AWS credentials are made available to the container runtime environment. Enabling network isolation helps prevent unintended access to your SageMaker AI resources from the internet.

๐Ÿ’ผ [SecretsManager.1] Secrets Manager secrets should have automatic rotation enabled

Secrets Manager helps you improve the security posture of your organization. Secrets include database credentials, passwords, and third-party API keys. You can use Secrets Manager to store secrets centrally, encrypt secrets automatically, control access to secrets, and rotate secrets safely and automatically. Secrets Manager can rotate secrets. You can use rotation to replace long-term secrets with short-term ones. Rotating your secrets limits how long an unauthorized user can use a compromised secret. For this reason, you should rotate your secrets frequently.
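
A minimal sketch of configuring automatic rotation for a secret, assuming the boto3 Secrets Manager client; the secret name and rotation Lambda function ARN are placeholders, and the 30-day interval is illustrative:

```python
import boto3

secretsmanager = boto3.client("secretsmanager")

# Hypothetical sketch: rotate the secret every 30 days using a rotation Lambda function.
secretsmanager.rotate_secret(
    SecretId="example/database-credentials",   # placeholder secret name
    RotationLambdaARN=(
        "arn:aws:lambda:us-east-1:111122223333:function:example-rotation-function"
    ),
    RotationRules={"AutomaticallyAfterDays": 30},
)
```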

๐Ÿ’ผ [SecretsManager.2] Secrets Manager secrets configured with automatic rotation should rotate successfully

Secrets Manager helps you improve the security posture of your organization. Secrets include database credentials, passwords, and third-party API keys. You can use Secrets Manager to store secrets centrally, encrypt secrets automatically, control access to secrets, and rotate secrets safely and automatically. Secrets Manager can rotate secrets. You can use rotation to replace long-term secrets with short-term ones. Rotating your secrets limits how long an unauthorized user can use a compromised secret. For this reason, you should rotate your secrets frequently. In addition to configuring secrets to rotate automatically, you should ensure that those secrets rotate successfully based on the rotation schedule.

๐Ÿ’ผ [SecretsManager.3] Remove unused Secrets Manager secrets

Deleting unused secrets is as important as rotating secrets. Unused secrets can be abused by their former users, who no longer need access to these secrets. Also, as more users get access to a secret, someone might have mishandled and leaked it to an unauthorized entity, which increases the risk of abuse. Deleting unused secrets helps revoke secret access from users who no longer need it. It also helps to reduce the cost of using Secrets Manager. Therefore, it is essential to routinely delete unused secrets.

๐Ÿ’ผ [SecretsManager.4] Secrets Manager secrets should be rotated within a specified number of days

Rotating secrets can help you to reduce the risk of an unauthorized use of your secrets in your AWS account. Examples include database credentials, passwords, third-party API keys, and even arbitrary text. If you do not change your secrets for a long period of time, the secrets are more likely to be compromised. As more users get access to a secret, it can become more likely that someone mishandled and leaked it to an unauthorized entity. Secrets can be leaked through logs and cache data. They can be shared for debugging purposes and not changed or revoked once the debugging completes. For all these reasons, secrets should be rotated frequently. You can configure automatic rotation for secrets in AWS Secrets Manager. With automatic rotation, you can replace long-term secrets with short-term ones, significantly reducing the risk of compromise.

๐Ÿ’ผ [SNS.4] SNS topic access policies should not allow public access

You use an SNS access policy with a particular topic to restrict who can work with that topic (for example, who can publish messages to it or who can subscribe to it). SNS policies can grant access to other AWS accounts, or to users within your own AWS account. A wildcard (*) in the `Principal` element of the topic policy, combined with no conditions that limit the policy, can result in data exfiltration, denial of service, or undesired injection of messages into your service by an attacker.
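
As a sketch of a policy scoped to a specific principal instead of the wildcard, assuming the boto3 SNS client; the topic ARN, account ID, and action list are placeholders:

```python
import json
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:111122223333:example-topic"  # placeholder topic ARN

# Hypothetical sketch: grant access to a specific account principal, not "*".
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
            "Action": ["sns:Publish", "sns:Subscribe"],
            "Resource": topic_arn,
        }
    ],
}

sns.set_topic_attributes(
    TopicArn=topic_arn,
    AttributeName="Policy",
    AttributeValue=json.dumps(policy),
)
```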

๐Ÿ’ผ [SQS.3] SQS queue access policies should not allow public access

An Amazon SQS access policy can allow public access to an SQS queue, which might allow an anonymous user or any authenticated AWS IAM identity to access the queue. SQS access policies typically provide this access by specifying the wildcard character (*) in the `Principal` element of the policy, not using proper conditions to restrict access to the queue, or both. If an SQS access policy allows public access, third parties might be able to perform tasks such as receive messages from the queue, send messages to the queue, or modify the access policy for the queue. This could result in events such as data exfiltration, a denial of service, or injection of messages into the queue by a threat actor.

๐Ÿ’ผ [SSM.1] Amazon EC2 instances should be managed by AWS Systems Manager

To help you to maintain security and compliance, Systems Manager scans your stopped and running managed instances. A managed instance is a machine that is configured for use with Systems Manager. Systems Manager then reports or takes corrective action on any policy violations that it detects. Systems Manager also helps you to configure and maintain your managed instances.

๐Ÿ’ผ [StepFunctions.1] Step Functions state machines should have logging turned on

Monitoring helps you maintain the reliability, availability, and performance of Step Functions. You should collect as much monitoring data as possible from the AWS services that you use so you can more easily debug multi-point failures. Having a logging configuration defined for your Step Functions state machines allows you to track execution history and results in Amazon CloudWatch Logs. Optionally, you can track only errors or fatal events.
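
A sketch of attaching a logging configuration to an existing state machine, assuming the boto3 Step Functions client; the state machine ARN and log group ARN are placeholders, and the `ERROR` level is an illustrative choice:

```python
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical sketch: send execution history events to CloudWatch Logs.
sfn.update_state_machine(
    stateMachineArn=(
        "arn:aws:states:us-east-1:111122223333:stateMachine:example"  # placeholder
    ),
    loggingConfiguration={
        "level": "ERROR",               # or "ALL" / "FATAL", depending on needs
        "includeExecutionData": False,
        "destinations": [
            {
                "cloudWatchLogsLogGroup": {
                    "logGroupArn": (
                        "arn:aws:logs:us-east-1:111122223333:"
                        "log-group:/aws/vendedlogs/states/example:*"  # placeholder
                    )
                }
            }
        ],
    },
)
```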

๐Ÿ’ผ [Transfer.2] Transfer Family servers should not use FTP protocol for endpoint connection

FTP (File Transfer Protocol) establishes the endpoint connection through unencrypted channels, leaving data sent over these channels vulnerable to interception. Using SFTP (SSH File Transfer Protocol), FTPS (File Transfer Protocol Secure), or AS2 (Applicability Statement 2) offers an extra layer of security by encrypting your data in transit and can be used to help prevent potential attackers from using person-in-the-middle or similar attacks to eavesdrop on or manipulate network traffic.
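
A minimal sketch of restricting an existing server to an encrypted protocol, assuming the boto3 Transfer Family client and a placeholder server ID:

```python
import boto3

transfer = boto3.client("transfer")

# Hypothetical sketch: allow only SFTP on the server, dropping unencrypted FTP.
transfer.update_server(
    ServerId="s-0123456789abcdef0",   # placeholder server ID
    Protocols=["SFTP"],
)
```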

๐Ÿ’ผ [Transfer.3] Transfer Family connectors should have logging enabled

Amazon CloudWatch is a monitoring and observability service that provides visibility into your AWS resources, including AWS Transfer Family resources. For Transfer Family, CloudWatch provides consolidated auditing and logging for workflow progress and results. This includes several metrics that Transfer Family defines for workflows. You can configure Transfer Family to automatically log connector events in CloudWatch. To do this, you specify a logging role for the connector. For the logging role, you create an IAM role and a resource-based IAM policy that defines the permissions for the role.

๐Ÿ’ผ [WAF.1] AWS WAF Classic Global Web ACL logging should be enabled

Logging is an important part of maintaining the reliability, availability, and performance of AWS WAF globally. It is a business and compliance requirement in many organizations, and allows you to troubleshoot application behavior. It also provides detailed information about the traffic that is analyzed by the web ACL that is attached to AWS WAF.

๐Ÿ’ผ [WAF.10] AWS WAF web ACLs should have at least one rule or rule group

A web ACL gives you fine-grained control over all of the HTTP(S) web requests that your protected resource responds to. A web ACL should contain a collection of rules and rule groups that inspect and control web requests. If a web ACL is empty, the web traffic can pass without being detected or acted upon by AWS WAF depending on the default action.

๐Ÿ’ผ [WAF.2] AWS WAF Classic Regional rules should have at least one condition

A WAF Regional rule can contain multiple conditions. The rule's conditions allow for traffic inspection and take a defined action (allow, block, or count). Without any conditions, the traffic passes without inspection. A WAF Regional rule with no conditions, but with a name or tag suggesting allow, block, or count, could lead to the wrong assumption that one of those actions is occurring.

๐Ÿ’ผ [WAF.3] AWS WAF Classic Regional rule groups should have at least one rule

A WAF Regional rule group can contain multiple rules. The rule's conditions allow for traffic inspection and take a defined action (allow, block, or count). Without any rules, the traffic passes without inspection. A WAF Regional rule group with no rules, but with a name or tag suggesting allow, block, or count, could lead to the wrong assumption that one of those actions is occurring.

๐Ÿ’ผ [WAF.6] AWS WAF Classic global rules should have at least one condition

A WAF global rule can contain multiple conditions. A rule's conditions allow for traffic inspection and take a defined action (allow, block, or count). Without any conditions, the traffic passes without inspection. A WAF global rule with no conditions, but with a name or tag suggesting allow, block, or count, could lead to the wrong assumption that one of those actions is occurring.

๐Ÿ’ผ [WAF.7] AWS WAF Classic global rule groups should have at least one rule

A WAF global rule group can contain multiple rules. The rule's conditions allow for traffic inspection and take a defined action (allow, block, or count). Without any rules, the traffic passes without inspection. A WAF global rule group with no rules, but with a name or tag suggesting allow, block, or count, could lead to the wrong assumption that one of those actions is occurring.

๐Ÿ’ผ 1 An APRA-regulated entity could benefit from developing a training and information security awareness program. This would typically communicate to personnel (staff, contractors and third parties) regarding information security practices, policies and other expectations as well as providing material to assist the Board and other governing bodies to execute their duties. Sound practice would involve tracking training undertaken and testing the understanding of relevant information security policies, both on commencement and periodically.

๐Ÿ’ผ 1 Control Plane Components

This section consists of security recommendations for the direct configuration of Kubernetes control plane processes. These recommendations may not be directly applicable for cluster operators in environments where these components are managed by a 3rd party.

๐Ÿ’ผ 1 Identity and Access Management

This section covers security recommendations to follow to set identity and access management policies on an Azure Subscription. Identity and Access Management policies are the first step towards a defense-in-depth approach to securing an Azure Cloud Platform environment. Most of the recommendations from this section are marked as "Not Scored" because of the lack of "Azure native CLI and API support" to perform the respective audits. However, from a security posture standpoint, these recommendations are important. According to the last communication with the Microsoft Support team regarding "Azure native CLI and API support", Microsoft teams are working to enhance "Microsoft graph API" to support all these "Azure AD" functionalities. Once we get this capability through "Microsoft Graph API", we will update the involved recommendations with the respective audit and remediation steps to make them scored.

๐Ÿ’ผ 1 Identity and Access Management

This section covers security recommendations to follow to set identity and access management policies on an Azure Subscription. Identity and Access Management policies are the first step towards a defense-in-depth approach to securing an Azure Cloud Platform environment. Most of the recommendations from this section are marked as "Not Scored" because of the lack of "Azure native CLI and API support" to perform the respective audits. However, from a security posture standpoint, these recommendations are important. According to the last communication with the Microsoft Support team regarding "Azure native CLI and API support", Microsoft teams are working to enhance "Microsoft graph API" to support all these "Azure AD" functionalities. Once we get this capability through "Microsoft Graph API", we will update the involved recommendations with the respective audit and remediation steps to make them scored.

๐Ÿ’ผ 1 Identity and Access Management

This section covers security recommendations to set identity and access management policies on an Azure Subscription. Identity and Access Management policies are the first step towards a defense-in-depth approach to securing an Azure Cloud Platform environment. Many of the recommendations from this section are marked as "Manual" while Azure CLI and PowerShell are being improved to support and perform the respective audits and remediation. From a security posture standpoint, these recommendations are still very important and should not be discounted because they are "Manual." As automation capability using Rest API is developed for this Benchmark, the related recommendations will be updated with the respective audit and remediation steps and changed to an "automated" assessment status. If any problems are encountered running Azure CLI or PowerShell methodologies, please refer to the Overview for this benchmark where you will find additional detail on permission and required cmdlets.

๐Ÿ’ผ 1 Identity and Access Management

This section covers security recommendations to set identity and access management policies on an Azure Subscription. Identity and Access Management policies are the first step towards a defense-in-depth approach to securing an Azure Cloud Platform environment. Many of the recommendations from this section are marked as "Manual" while the existing Azure CLI and Azure AD PowerShell support through the Azure AD Graph are being deprecated. It is now recommended to use the new Microsoft Graph in place of Azure AD Graph for PowerShell and API level access. From a security posture standpoint, these recommendations are still very important and should not be discounted because they are "Manual." As automation capability using Rest API is developed for this Benchmark, the related recommendations will be updated with the respective audit and remediation steps and changed to an "automated" assessment status. If any problems are encountered running Azure CLI or PowerShell methodologies, please refer to the Overview for this benchmark where you will find additional detail on permission and required cmdlets.

๐Ÿ’ผ 1 Identity and Access Management

This section covers security recommendations to set identity and access management policies on an Azure Subscription. Identity and Access Management policies are the first step towards a defense-in-depth approach to securing an Azure Cloud Platform environment. Many of the recommendations from this section are marked as "Manual" while the existing Azure CLI and Azure AD PowerShell support through the Azure AD Graph are being deprecated. It is now recommended to use the new Microsoft Graph in place of Azure AD Graph for PowerShell and API level access. From a security posture standpoint, these recommendations are still very important and should not be discounted because they are "Manual." As automation capability using Rest API is developed for this Benchmark, the related recommendations will be updated with the respective audit and remediation steps and changed to an "automated" assessment status. If any problems are encountered running Azure CLI or PowerShell methodologies, please refer to the Overview for this benchmark where you will find additional detail on permission and required cmdlets.

๐Ÿ’ผ 1.1 Ensure that 'Multi-Factor Auth Status' is 'Enabled' for all Privileged Users - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Enable multi-factor authentication for all roles, groups, and users that have write access or permissions to Azure resources. These include custom created objects or built-in roles such as Service Co-Administrators, Subscription Owners, and Contributors. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.1 Ensure that multi-factor authentication is enabled for all privileged users - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Enable multi-factor authentication for all user credentials who have write access to Azure resources. These include roles like Service Co-Administrators, Subscription Owners, and Contributors. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.1 Maintain current contact details

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details - Level 1 (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details - Level 1 (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details - Level 1 (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of the Acceptable Use Policy or indicative of a likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of the Acceptable Use Policy or indicative of a likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Maintain current contact details (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of the Acceptable Use Policy or indicative of a likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.1 Security Defaults

**IMPORTANT:** The Azure "Security Defaults" recommendations represent an entry-level set of recommendations which will be relevant to organizations and tenants that are either just starting to use Azure as an IaaS solution, or are only utilizing a bare minimum feature set such as the freely licensed tier of Azure Active Directory. Security Defaults recommendations are intended to ensure that these entry-level use cases are still capable of establishing a strong baseline of secure configuration. **If your subscription is licensed to use Azure AD Premium P1 or P2, it is strongly recommended that the "Security Defaults" section (this section and the recommendations therein) be bypassed in favor of the use of "Conditional Access."**

๐Ÿ’ผ 1.1 Security Defaults

**IMPORTANT:** The Azure "Security Defaults" recommendations represent an entry-level set of recommendations which will be relevant to organizations and tenants that are either just starting to use Azure as an IaaS solution, or are only utilizing a bare minimum feature set such as the freely licensed tier of Azure Active Directory. Security Defaults recommendations are intended to ensure that these entry-level use cases are still capable of establishing a strong baseline of secure configuration. **If your subscription is licensed to use Azure AD Premium P1 or P2, it is strongly recommended that the "Security Defaults" section (this section and the recommendations therein) be bypassed in favor of the use of "Conditional Access."**

๐Ÿ’ผ 1.1 Security Defaults

**IMPORTANT:** The Azure "Security Defaults" recommendations represent an entry-level set of recommendations which will be relevant to organizations and tenants that are either just starting to use Azure, or are only utilizing a bare minimum feature set such as the freely licensed tier of Microsoft Entra ID. Security Defaults recommendations are intended to ensure that these entry-level use cases are still capable of establishing a strong baseline of secure configuration. **If your subscription is licensed to use Microsoft Entra ID P1 or P2, it is strongly recommended that the "Security Defaults" section (this section and the recommendations therein) be bypassed in favor of the use of "Conditional Access."**

๐Ÿ’ผ 1.1.1 Ensure Security Defaults is enabled on Azure Active Directory - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Security defaults in Azure Active Directory (Azure AD) make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Microsoft is making security defaults available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You may turn on security defaults in the Azure portal.

๐Ÿ’ผ 1.1.1 Ensure Security Defaults is enabled on Azure Active Directory - Level 1 (Manual)

Security defaults in Azure Active Directory (Azure AD) make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Security defaults is available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You may turn on security defaults in the Azure portal.

๐Ÿ’ผ 1.1.1 Ensure Security Defaults is enabled on Microsoft Entra ID - Level 1 (Manual)

Security defaults in Microsoft Entra ID make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Security defaults is available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You may turn on security defaults in the Azure portal.

๐Ÿ’ผ 1.10 Do not create access keys during initial setup for IAM users with a console password (Manual)

AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM User credentials you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.10 Ensure KMS encryption keys are rotated within a period of 90 days

Google Cloud Key Management Service stores cryptographic keys in a hierarchical structure designed for useful and elegant access control management. The format for the rotation schedule depends on the client library that is used. For the gcloud command-line tool, the next rotation time must be in 'ISO' or 'RFC3339' format, and the rotation period must be in the form 'INTEGER[UNIT]', where units can be one of seconds (s), minutes (m), hours (h) or days (d).

๐Ÿ’ผ 1.10 Ensure KMS encryption keys are rotated within a period of 90 days - Level 1 (Automated)

Google Cloud Key Management Service stores cryptographic keys in a hierarchical structure designed for useful and elegant access control management. The format for the rotation schedule depends on the client library that is used. For the gcloud command-line tool, the next rotation time must be in `ISO` or `RFC3339` format, and the rotation period must be in the form `INTEGER[UNIT]`, where units can be one of seconds (s), minutes (m), hours (h) or days (d).

๐Ÿ’ผ 1.10 Ensure KMS Encryption Keys Are Rotated Within a Period of 90 Days - Level 1 (Automated)

Google Cloud Key Management Service stores cryptographic keys in a hierarchical structure designed for useful and elegant access control management. The format for the rotation schedule depends on the client library that is used. For the gcloud command-line tool, the next rotation time must be in `ISO` or `RFC3339` format, and the rotation period must be in the form `INTEGER[UNIT]`, where units can be one of seconds (s), minutes (m), hours (h) or days (d).

๐Ÿ’ผ 1.10 Ensure KMS Encryption Keys Are Rotated Within a Period of 90 Days - Level 1 (Automated)

Google Cloud Key Management Service stores cryptographic keys in a hierarchical structure designed for useful and elegant access control management. The format for the rotation schedule depends on the client library that is used. For the gcloud command-line tool, the next rotation time must be in ISO or RFC3339 format, and the rotation period must be in the form INTEGER[UNIT], where units can be one of seconds (s), minutes (m), hours (h) or days (d).

๐Ÿ’ผ 1.10 Ensure KMS Encryption Keys Are Rotated Within a Period of 90 Days - Level 1 (Automated)

Google Cloud Key Management Service stores cryptographic keys in a hierarchical structure designed for useful and elegant access control management. The format for the rotation schedule depends on the client library that is used. For the gcloud command-line tool, the next rotation time must be in `ISO` or `RFC3339` format, and the rotation period must be in the form `INTEGER[UNIT]`, where units can be one of seconds (s), minutes (m), hours (h) or days (d).

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password - Level 1 (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password - Level 1 (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password - Level 1 (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.10 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.

๐Ÿ’ผ 1.11 Do not create access keys during initial setup for IAM users with a console password (Manual)

AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM User credentials you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not create access keys during initial setup for IAM users with a console password (Manual)

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not setup access keys during initial user setup for all IAM users that have a console password

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not setup access keys during initial user setup for all IAM users that have a console password

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not setup access keys during initial user setup for all IAM users that have a console password - Level 1 (Automated)

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not setup access keys during initial user setup for all IAM users that have a console password - Level 1 (Manual)

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.

๐Ÿ’ผ 1.11 Do not setup access keys during initial user setup for all IAM users that have a console password - Level 1 (Manual)

The AWS console defaults to no check boxes selected when creating a new IAM user. When creating the IAM user credentials, you have to determine what type of access they require. Programmatic access: The IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell. In that case, create an access key (access key ID and a secret access key) for that user. AWS Management Console access: If the user needs to access the AWS Management Console, create a password for the user.
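
A minimal detection sketch, assuming Python with boto3 and read-only IAM access: it flags console users that hold active access keys which have never been used, a common sign that the keys were generated during initial user setup. The helper name is illustrative, not part of any benchmark tooling.

```python
# Hypothetical sketch: flag console users whose active access keys have never been used.
import boto3
from botocore.exceptions import ClientError

iam = boto3.client("iam")

def unused_keys_for_console_users():
    findings = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            name = user["UserName"]
            try:
                iam.get_login_profile(UserName=name)  # only users with a console password
            except ClientError as err:
                if err.response["Error"]["Code"] == "NoSuchEntity":
                    continue
                raise
            for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
                if key["Status"] != "Active":
                    continue
                last_used = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
                # A key that has never been used has no LastUsedDate in the response.
                if "LastUsedDate" not in last_used["AccessKeyLastUsed"]:
                    findings.append((name, key["AccessKeyId"]))
    return findings

if __name__ == "__main__":
    for name, key_id in unused_keys_for_console_users():
        print(f"{name}: active but never-used access key {key_id}")
```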

๐Ÿ’ผ 1.12 Ensure API Keys Only Exist for Active Services - Level 2 (Automated)

API Keys should only be used for services in cases where other authentication methods are unavailable. Unused keys with their permissions intact may still exist within a project. Keys are insecure because they can be viewed publicly, such as from within a browser, or they can be accessed on a device where the key resides. It is recommended to use the standard authentication flow instead.

๐Ÿ’ผ 1.12 Ensure API Keys Only Exist for Active Services - Level 2 (Automated)

API Keys should only be used for services in cases where other authentication methods are unavailable. Unused keys with their permissions intact may still exist within a project. Keys are insecure because they can be viewed publicly, such as from within a browser, or they can be accessed on a device where the key resides. It is recommended to use the standard authentication flow instead.

๐Ÿ’ผ 1.13 Ensure access keys are rotated every 90 days or less (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be rotated regularly.

๐Ÿ’ผ 1.13 Ensure API Keys Are Restricted To Use by Only Specified Hosts and Apps - Level 2 (Manual)

API Keys should only be used for services in cases where other authentication methods are unavailable. In this case, unrestricted keys are insecure because they can be viewed publicly, such as from within a browser, or they can be accessed on a device where the key resides. It is recommended to restrict API key usage to trusted hosts, HTTP referrers and apps. It is recommended to use the more secure standard authentication flow instead.

๐Ÿ’ผ 1.13 Ensure API Keys Are Restricted To Use by Only Specified Hosts and Apps - Level 2 (Manual)

API Keys should only be used for services in cases where other authentication methods are unavailable. In this case, unrestricted keys are insecure because they can be viewed publicly, such as from within a browser, or they can be accessed on a device where the key resides. It is recommended to restrict API key usage to trusted hosts, HTTP referrers and apps. It is recommended to use the more secure standard authentication flow instead.

๐Ÿ’ผ 1.13 Ensure MFA is enabled for the "root" account

The root account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for root accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less - Level 1 (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less - Level 1 (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less - Level 1 (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be rotated regularly.

๐Ÿ’ผ 1.14 Ensure access keys are rotated every 90 days or less (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be rotated regularly.
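
A hedged sketch of one way to audit key age, assuming Python with boto3 and permission to generate and read the IAM credential report; the 90-day threshold and helper name are illustrative.

```python
# Hypothetical sketch: parse the IAM credential report and flag access keys older than 90 days.
import csv
import io
import time
from datetime import datetime, timezone, timedelta

import boto3

iam = boto3.client("iam")
MAX_AGE = timedelta(days=90)

def stale_access_keys():
    # The report is generated asynchronously; poll until it is ready.
    while iam.generate_credential_report()["State"] != "COMPLETE":
        time.sleep(2)
    report = iam.get_credential_report()["Content"].decode("utf-8")
    now = datetime.now(timezone.utc)
    findings = []
    for row in csv.DictReader(io.StringIO(report)):
        for slot in ("1", "2"):
            if row[f"access_key_{slot}_active"] != "true":
                continue
            rotated = row[f"access_key_{slot}_last_rotated"]
            if rotated in ("N/A", "not_supported"):
                continue
            age = now - datetime.fromisoformat(rotated.replace("Z", "+00:00"))
            if age > MAX_AGE:
                findings.append((row["user"], slot, age.days))
    return findings

if __name__ == "__main__":
    for user, slot, days in stale_access_keys():
        print(f"{user}: access key {slot} last rotated {days} days ago")
```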

๐Ÿ’ผ 1.14 Ensure hardware MFA is enabled for the "root" account

The root account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the root account be protected with a hardware MFA.

๐Ÿ’ผ 1.14 Ensure IAM users receive permissions only through groups (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM policies that allow full "*:*" administrative privileges are not attached (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant least privilege, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that allow the users to perform only those tasks, instead of granting full administrative privileges.

๐Ÿ’ผ 1.15 Ensure IAM Users Receive Permissions Only Through Groups

IAM users are granted access to services, functions, and data through IAM policies. There are three ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM Users Receive Permissions Only Through Groups

IAM users are granted access to services, functions, and data through IAM policies. There are three ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM Users Receive Permissions Only Through Groups - Level 1 (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM Users Receive Permissions Only Through Groups - Level 1 (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM users receive permissions only through groups (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.

๐Ÿ’ผ 1.15 Ensure IAM users receive permissions only through groups (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.
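
One possible read-only check, assuming Python with boto3: it lists users that carry inline or directly attached policies, which this control recommends moving to group-level policies. The helper name is illustrative.

```python
# Hypothetical sketch: flag IAM users that have inline or directly attached policies.
import boto3

iam = boto3.client("iam")

def users_with_direct_policies():
    findings = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            name = user["UserName"]
            inline = iam.list_user_policies(UserName=name)["PolicyNames"]
            attached = iam.list_attached_user_policies(UserName=name)["AttachedPolicies"]
            if inline or attached:
                findings.append((name, inline, [p["PolicyName"] for p in attached]))
    return findings

if __name__ == "__main__":
    for name, inline, attached in users_with_direct_policies():
        print(f"{name}: inline={inline} attached={attached}")
```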

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached - Level 1 (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant _least privilege_, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that let the users perform _only_ those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant _least privilege_, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that let the users perform _only_ those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant _least privilege_, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that let the users perform _only_ those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached - Level 1 (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant _least privilege_, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that let the users perform _only_ those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached - Level 1 (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant *least privilege*, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that let the users perform *only* those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant least privilege, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that allow the users to perform only those tasks, instead of granting full administrative privileges.

๐Ÿ’ผ 1.16 Ensure IAM policies that allow full "*:*" administrative privileges are not attached (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant least privilege, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that allow the users to perform only those tasks, instead of granting full administrative privileges.
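
A rough sketch of how such policies might be detected, assuming Python with boto3 and read-only IAM access; it inspects attached customer-managed policies for an Allow statement with Action "*" on Resource "*". Helper names are illustrative, and the parsing is deliberately simplified (it ignores NotAction and Condition blocks, for example).

```python
# Hypothetical sketch: find attached customer-managed policies that allow "*" actions on "*" resources.
import json

import boto3

iam = boto3.client("iam")

def is_full_admin(document):
    statements = document.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions and "*" in resources:
            return True
    return False

def full_admin_policies():
    findings = []
    # Scope='Local' limits results to customer-managed policies; OnlyAttached skips unused ones.
    for page in iam.get_paginator("list_policies").paginate(Scope="Local", OnlyAttached=True):
        for policy in page["Policies"]:
            doc = iam.get_policy_version(
                PolicyArn=policy["Arn"], VersionId=policy["DefaultVersionId"]
            )["PolicyVersion"]["Document"]
            if isinstance(doc, str):  # boto3 normally decodes this to a dict already
                doc = json.loads(doc)
            if is_full_admin(doc):
                findings.append(policy["Arn"])
    return findings

if __name__ == "__main__":
    for arn in full_admin_policies():
        print(f"Full administrative policy attached: {arn}")
```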

๐Ÿ’ผ 1.17 Ensure that Dataproc Cluster is encrypted using Customer-Managed Encryption Key - Level 2 (Automated)

When you use Dataproc, cluster and job data is stored on Persistent Disks (PDs) associated with the Compute Engine VMs in your cluster and in a Cloud Storage staging bucket. This PD and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). The CMEK feature allows you to create, use, and revoke the key encryption key (KEK). Google still controls the data encryption key (DEK).

๐Ÿ’ผ 1.17 Ensure that Dataproc Cluster is encrypted using Customer-Managed Encryption Key - Level 2 (Automated)

When you use Dataproc, cluster and job data is stored on Persistent Disks (PDs) associated with the Compute Engine VMs in your cluster and in a Cloud Storage staging bucket. This PD and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). The CMEK feature allows you to create, use, and revoke the key encryption key (KEK). Google still controls the data encryption key (DEK).

๐Ÿ’ผ 1.17 Maintain current contact details

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organisation. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of Acceptable Use Policy or indicative of likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organisation; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.

๐Ÿ’ผ 1.18 Ensure that all expired SSL/TLS certificates stored in AWS IAM are removed (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use AWS Certificate Manager (ACM) or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all expired SSL/TLS certificates stored in AWS IAM are removed (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use AWS Certificate Manager (ACM) or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all expired SSL/TLS certificates stored in AWS IAM are removed (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use AWS Certificate Manager (ACM) or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all the expired SSL/TLS certificates stored in AWS IAM are removed

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use ACM or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all the expired SSL/TLS certificates stored in AWS IAM are removed

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use ACM or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all the expired SSL/TLS certificates stored in AWS IAM are removed - Level 1 (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use ACM or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all the expired SSL/TLS certificates stored in AWS IAM are removed - Level 1 (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use ACM or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.

๐Ÿ’ผ 1.19 Ensure that all the expired SSL/TLS certificates stored in AWS IAM are removed - Level 1 (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use ACM or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.
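
A small illustrative check, assuming Python with boto3 and `iam:ListServerCertificates` permission: it lists IAM server certificates whose expiration date is already in the past. The helper name is not from any benchmark tooling.

```python
# Hypothetical sketch: list IAM server certificates that have already expired.
from datetime import datetime, timezone

import boto3

iam = boto3.client("iam")

def expired_certificates():
    now = datetime.now(timezone.utc)
    findings = []
    for page in iam.get_paginator("list_server_certificates").paginate():
        for cert in page["ServerCertificateMetadataList"]:
            if cert["Expiration"] < now:
                findings.append((cert["ServerCertificateName"], cert["Expiration"]))
    return findings

if __name__ == "__main__":
    for name, expiration in expired_certificates():
        print(f"Expired IAM server certificate: {name} (expired {expiration:%Y-%m-%d})")
```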

๐Ÿ’ผ 1.19 Ensure that IAM External Access Analyzer is enabled for all regions (Automated)

Enable the IAM External Access Analyzer for all resources in each active AWS region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least privilege access. Access Analyzer analyzes only the policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.2 Conditional Access

For most Azure tenants, and certainly for organizations with a significant use of Azure Active Directory, Conditional Access policies are recommended and preferred. To use conditional access policies, a licensing plan is required, and **Security Defaults must be disabled**. Conditional Access requires one of the following plans: - Azure Active Directory Premium P1 or P2 - Microsoft 365 Business Premium - Microsoft 365 E3 or E5 - Enterprise Mobility & Security E3 or E5

๐Ÿ’ผ 1.2 Conditional Access

For most Azure tenants, and certainly for organizations with a significant use of Azure Active Directory, Conditional Access policies are recommended and preferred. To use conditional access policies, a licensing plan is required, and **Security Defaults must be disabled**. Conditional Access requires one of the following plans: - Azure Active Directory Premium P1 or P2 - Microsoft 365 Business Premium - Microsoft 365 E3 or E5 - Microsoft 365 F1, F3, F5 Security and F5 Security + Compliance - Enterprise Mobility & Security E3 or E5

๐Ÿ’ผ 1.2 Conditional Access

For most Azure tenants, and certainly for organizations with a significant use of Microsoft Entra ID, Conditional Access policies are recommended and preferred. To use conditional access policies, a licensing plan is required, and **Security Defaults must be disabled**. Conditional Access requires one of the following plans: - Microsoft Entra ID P1 or P2 - Microsoft 365 Business Premium - Microsoft 365 E3 or E5 - Microsoft 365 F1, F3, F5 Security and F5 Security + Compliance - Enterprise Mobility & Security E3 or E5

๐Ÿ’ผ 1.2 Prevent access to the administrative interface from the internet

Prevent access to the administrative interface (used to manage firewall configuration) from the internet, unless there is a clear and documented business need, and the interface is protected by one of the following controls: - multi-factor authentication (see MFA details below) - an IP allow list that limits access to a small range of trusted addresses combined with a properly managed password authentication approach

๐Ÿ’ผ 1.2.1 Ensure Trusted Locations Are Defined - Level 1 (Manual)

Azure Active Directory Conditional Access allows an organization to configure `Named locations` and configure whether those locations are trusted or untrusted. These settings provide organizations the means to specify Geographical locations for use in conditional access policies, or define actual IP addresses and IP ranges and whether or not those IP addresses and/or ranges are trusted by the organization.

๐Ÿ’ผ 1.2.1 Ensure Trusted Locations Are Defined - Level 1 (Manual)

Azure Active Directory Conditional Access allows an organization to configure Named locations and configure whether those locations are trusted or untrusted. These settings provide organizations the means to specify Geographical locations for use in conditional access policies, or define actual IP addresses and IP ranges and whether or not those IP addresses and/or ranges are trusted by the organization.

๐Ÿ’ผ 1.2.1 Ensure Trusted Locations Are Defined - Level 1 (Manual)

Microsoft Entra ID Conditional Access allows an organization to configure `Named locations` and configure whether those locations are trusted or untrusted. These settings provide organizations the means to specify Geographical locations for use in conditional access policies, or define actual IP addresses and IP ranges and whether or not those IP addresses and/or ranges are trusted by the organization.

๐Ÿ’ผ 1.2.2 Ensure that an exclusionary Geographic Access Policy is considered - Level 1 (Manual)

**CAUTION**: If these policies are created without first auditing and testing the result, misconfiguration can potentially lock out administrators or create undesired access issues. Conditional Access Policies can be used to block access from geographic locations that are deemed out-of-scope for your organization or application. The scope and variables for this policy should be carefully examined and defined.

๐Ÿ’ผ 1.2.2 Ensure that an exclusionary Geographic Access Policy is considered - Level 1 (Manual)

CAUTION: If these policies are created without first auditing and testing the result, misconfiguration can potentially lock out administrators or create undesired access issues. Conditional Access Policies can be used to block access from geographic locations that are deemed out-of-scope for your organization or application. The scope and variables for this policy should be carefully examined and defined.

๐Ÿ’ผ 1.2.2 Ensure that an exclusionary Geographic Access Policy is considered - Level 1 (Manual)

**CAUTION**: If these policies are created without first auditing and testing the result, misconfiguration can potentially lock out administrators or create undesired access issues. Conditional Access Policies can be used to block access from geographic locations that are deemed out-of-scope for your organization or application. The scope and variables for this policy should be carefully examined and defined.

๐Ÿ’ผ 1.20 Ensure that IAM Access analyzer is enabled for all regions

Enable IAM Access Analyzer for IAM policies regarding all resources in each region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least-privilege access. Access Analyzer analyzes only policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.20 Ensure that IAM Access analyzer is enabled for all regions - Level 1 (Automated)

Enable IAM Access Analyzer for IAM policies regarding all resources in each region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least-privilege access. Access Analyzer analyzes only policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.20 Ensure that IAM Access analyzer is enabled for all regions - Level 1 (Automated)

Enable IAM Access Analyzer for IAM policies regarding all resources in each active AWS region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least-privilege access. Access Analyzer analyzes only policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.20 Ensure that IAM Access analyzer is enabled for all regions - Level 1 (Automated)

Enable IAM Access Analyzer for IAM policies regarding all resources in each active AWS region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least-privilege access. Access Analyzer analyzes only policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.20 Ensure that IAM Access Analyzer is enabled for all regions (Automated)

Enable the IAM Access Analyzer for IAM policies regarding all resources in each active AWS region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least privilege access. Access Analyzer analyzes only the policies that are applied to resources in the same AWS Region.

๐Ÿ’ผ 1.20 Ensure that IAM Access Analyzer is enabled for all regions (Automated)

Enable the IAM Access Analyzer for IAM policies regarding all resources in each active AWS region. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least privilege access. Access Analyzer analyzes only the policies that are applied to resources in the same AWS Region.
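
A possible region-by-region check, assuming Python with boto3, permission to call EC2 `DescribeRegions`, and Access Analyzer list permissions; it reports enabled regions that lack an ACTIVE analyzer. Helper names are illustrative.

```python
# Hypothetical sketch: verify that every enabled region has at least one ACTIVE IAM Access Analyzer.
import boto3

def regions_without_analyzer():
    ec2 = boto3.client("ec2")
    regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
    findings = []
    for region in regions:
        client = boto3.client("accessanalyzer", region_name=region)
        analyzers = client.list_analyzers()["analyzers"]
        if not any(a["status"] == "ACTIVE" for a in analyzers):
            findings.append(region)
    return findings

if __name__ == "__main__":
    for region in regions_without_analyzer():
        print(f"No active IAM Access Analyzer in {region}")
```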

๐Ÿ’ผ 1.20 Ensure that S3 Buckets are configured with 'Block public access (bucket settings)'

Amazon S3 provides 'Block public access (bucket settings)' and 'Block public access (account settings)' to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, 'Block public access (bucket settings)' prevents an individual bucket, and its contained objects, from becoming publicly accessible. Similarly, 'Block public access (account settings)' prevents all buckets, and contained objects, from becoming publicly accessible across the entire account.
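
A minimal sketch of a bucket-level audit, assuming Python with boto3 and `s3:GetBucketPublicAccessBlock` permission; it reports buckets where any of the four Block Public Access settings is off or where no configuration exists at all. Helper names are illustrative, and the account-level setting would need a separate check via the `s3control` client.

```python
# Hypothetical sketch: report buckets where any of the four Block Public Access settings is disabled.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
SETTINGS = ("BlockPublicAcls", "IgnorePublicAcls", "BlockPublicPolicy", "RestrictPublicBuckets")

def buckets_without_block_public_access():
    findings = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            config = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                findings.append((name, "no Block Public Access configuration"))
                continue
            raise
        disabled = [s for s in SETTINGS if not config.get(s, False)]
        if disabled:
            findings.append((name, "disabled: " + ", ".join(disabled)))
    return findings

if __name__ == "__main__":
    for name, detail in buckets_without_block_public_access():
        print(f"{name}: {detail}")
```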

๐Ÿ’ผ 1.21 Ensure access to AWSCloudShellFullAccess is restricted (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.

๐Ÿ’ผ 1.21 Ensure Security Defaults is enabled on Azure Active Directory - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Security defaults in Azure Active Directory (Azure AD) make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Microsoft is making security defaults available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You turn on security defaults in the Azure portal. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.21 Ensure that IAM Access analyzer is enabled

Enable IAM Access Analyzer for IAM policies regarding all resources. IAM Access Analyzer is a technology introduced at AWS re:Invent 2019. After the Analyzer is enabled in IAM, scan results are displayed on the console showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether an unintended user is permitted, making it easier for administrators to monitor least-privilege access.

๐Ÿ’ผ 1.22 Ensure a Custom Role is Assigned Permissions for Administering Resource Locks - Level 2 (Manual | Not supported, no API/CLI available by Azure)

Resource locking is a powerful protection mechanism that can prevent inadvertent modification/deletion of resources within Azure subscriptions/Resource Groups and is a recommended NIST configuration. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.22 Ensure access to AWSCloudShellFullAccess is restricted - Level 1 (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.

๐Ÿ’ผ 1.22 Ensure access to AWSCloudShellFullAccess is restricted - Level 1 (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.

๐Ÿ’ผ 1.22 Ensure access to AWSCloudShellFullAccess is restricted (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.

๐Ÿ’ผ 1.22 Ensure access to AWSCloudShellFullAccess is restricted (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.
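
As an illustration, assuming Python with boto3 and read-only IAM access, the attachments of this managed policy can be enumerated directly; any users, groups, or roles returned would warrant review. The helper name is illustrative.

```python
# Hypothetical sketch: list IAM principals with the AWSCloudShellFullAccess managed policy attached.
import boto3

iam = boto3.client("iam")
POLICY_ARN = "arn:aws:iam::aws:policy/AWSCloudShellFullAccess"

def cloudshell_full_access_attachments():
    entities = iam.list_entities_for_policy(PolicyArn=POLICY_ARN)
    return {
        "users": [u["UserName"] for u in entities["PolicyUsers"]],
        "groups": [g["GroupName"] for g in entities["PolicyGroups"]],
        "roles": [r["RoleName"] for r in entities["PolicyRoles"]],
    }

if __name__ == "__main__":
    for kind, names in cloudshell_full_access_attachments().items():
        if names:
            print(f"AWSCloudShellFullAccess attached to {kind}: {', '.join(names)}")
```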

๐Ÿ’ผ 1.22 Ensure IAM policies that allow full "*:*" administrative privileges are not created

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant _least privilege_, that is, granting only the permissions required to perform a task. Determine what users need to do and then craft policies for them that let the users perform _only_ those tasks, instead of allowing full administrative privileges.

๐Ÿ’ผ 1.22 Ensure Security Defaults is enabled on Azure Active Directory - Level 1 (Automated | Not supported, no API/CLI available by Azure)

Security defaults in Azure Active Directory (Azure AD) make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Microsoft is making security defaults available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You turn on security defaults in the Azure portal. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.3 Ensure Access Review is Set Up for External Users in Azure AD Privileged Identity Management - Level 2 (Manual)

This recommendation extends guest access review by utilizing the Azure AD Privileged Identity Management feature provided in Azure AD Premium P2. Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their own work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization, while maintaining control over your own corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their own credentials to access your company's resources as a guest user.

๐Ÿ’ผ 1.3 Ensure guest users are reviewed on a monthly basis - Level 1 (Manual | Assessment requires a manual procedure. Hover over the title for the full description)

This recommendation assessment can be achieved only by a manual process using the cloud configuration rule 'Azure Active Directory should not include guest users' (IAM-044). This rule provides visibility into guest users. Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization while maintaining control over your corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their credentials to access your company's resources as a guest user. Guest users should be reviewed on a monthly basis to ensure that inactive and unneeded accounts are removed.

๐Ÿ’ผ 1.3 Ensure guest users are reviewed on a monthly basis - Level 1 (Manual | Assessment requires a manual procedure. Hover over the title for the full description)

This recommendation assessment can be achieved only by a manual process using the cloud configuration rule 'Azure Active Directory should not include guest users' (IAM-044). This rule provides visibility into guest users. Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization while maintaining control over your corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their credentials to access your company's resources as a guest user. Guest users should be reviewed on a monthly basis to ensure that inactive and unneeded accounts are removed.

๐Ÿ’ผ 1.4 Ensure access keys are rotated every 90 days or less

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be regularly rotated.

๐Ÿ’ผ 1.4 Ensure Access Review is Set Up for External Users in Azure AD Privileged Identity Management - Level 2 (Manual)

This recommendation extends guest access review by utilizing the Azure AD Privileged Identity Management feature provided in Azure AD Premium P2. Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their own work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization, while maintaining control over your own corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their own credentials to access your company's resources as a guest user.

๐Ÿ’ผ 1.4 Ensure Guest Users Are Reviewed on a Regular Basis - Level 1 (Manual | Assessment requires a manual procedure. Hover over the title for the full description)

This recommendation assessment can be achieved only by a manual process using the cloud configuration rule 'Azure Active Directory should not include guest users' (IAM-044). This rule provides visibility into guest users. Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their own work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization, while maintaining control over your own corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their own credentials to access your company's resources as a guest user. Guest users in every subscription should be reviewed on a regular basis to ensure that inactive and unneeded accounts are removed.

๐Ÿ’ผ 1.4 Ensure Guest Users Are Reviewed on a Regular Basis - Level 1 (Manual)

Microsoft Entra ID is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their own work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization, while maintaining control over your own corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their own credentials to access your company's resources as a guest user. Guest users in every subscription should be reviewed on a regular basis to ensure that inactive and unneeded accounts are removed.

๐Ÿ’ผ 1.4 Ensure MFA is enabled for the 'root' user account (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note**: When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is kept charged and secured, independent of any individual personal devices ("nonpersonal virtual MFA"). This lessens the risks of losing access to the MFA due to device loss, device trade-in, or if the individual owning the device is no longer employed at the company. Where an AWS Organization is using centralized root access, root credentials can be removed from member accounts. In that case it is neither possible nor necessary to configure root MFA in the member account.

๐Ÿ’ผ 1.5 Ensure Guest Users Are Reviewed on a Regular Basis - Level 1 (Manual)

Azure AD is extended to include Azure AD B2B collaboration, allowing you to invite people from outside your organization to be guest users in your cloud account and sign in with their own work, school, or social identities. Guest users allow you to share your company's applications and services with users from any other organization, while maintaining control over your own corporate data. Work with external partners, large or small, even if they don't have Azure AD or an IT department. A simple invitation and redemption process lets partners use their own credentials to access your company's resources as a guest user. Guest users in every subscription should be reviewed on a regular basis to ensure that inactive and unneeded accounts are removed.

๐Ÿ’ผ 1.5 Ensure hardware MFA is enabled for the 'root' user account (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA. Where an AWS Organization is using centralized root access, root credentials can be removed from member accounts. In that case it is neither possible nor necessary to configure root MFA in the member account.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account - Level 1 (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account - Level 1 (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account - Level 1 (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the 'root' user account (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device.

๐Ÿ’ผ 1.5 Ensure MFA is enabled for the "root user" account

The root user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note:** When virtual MFA is used for root accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is managed to be kept charged and secured independent of any individual personal devices. ("non-personal virtual MFA") This lessens the risks of losing access to the MFA due to device loss, device trade-in or if the individual owning the device is no longer employed at the company.
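
A short illustrative check, assuming Python with boto3: the IAM account summary exposes an `AccountMFAEnabled` flag that is 1 when the root user has MFA enabled.

```python
# Hypothetical sketch: confirm via the account summary that MFA is enabled for the root user.
import boto3

iam = boto3.client("iam")

summary = iam.get_account_summary()["SummaryMap"]
if summary.get("AccountMFAEnabled", 0) == 1:
    print("Root user MFA is enabled.")
else:
    print("FINDING: root user MFA is not enabled.")
```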

๐Ÿ’ผ 1.5 Ensure that Service Account has no Admin privileges

A service account is a special Google account that belongs to an application or a VM, instead of to an individual end-user. The application uses the service account to call the service's Google API so that users aren't directly involved. It's recommended not to use admin access for ServiceAccount.
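
As a hedged illustration, the sketch below lists project-level IAM bindings and flags service accounts holding broad admin-style roles; it assumes the google-cloud-resourcemanager library, and "my-project" plus the role heuristic are placeholders, not a definitive audit.

```python
# Minimal sketch: flag service accounts that hold broad admin roles on a GCP
# project's IAM policy. Assumes google-cloud-resourcemanager is installed and
# "my-project" is a placeholder project ID.
from google.cloud import resourcemanager_v3

ADMIN_ROLES = {"roles/owner", "roles/editor"}  # extend with */admin roles as needed

client = resourcemanager_v3.ProjectsClient()
policy = client.get_iam_policy(request={"resource": "projects/my-project"})

for binding in policy.bindings:
    if binding.role in ADMIN_ROLES or binding.role.lower().endswith("admin"):
        for member in binding.members:
            if member.startswith("serviceAccount:"):
                print(f"Review: {member} holds {binding.role}")
```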

๐Ÿ’ผ 1.5.1 Security controls are implemented on any computing devices, including company- and employee-owned devices, that connect to both untrusted networks and the CDE.

As follows: - Specific configuration settings are defined to prevent threats being introduced into the entity's network. - Security controls are actively running. - Security controls are not alterable by users of the computing devices unless specifically documented and authorized by management on a case-by-case basis for a limited period.

๐Ÿ’ผ 1.5.1 Security controls are implemented on any computing devices, including company- and employee-owned devices, that connect to both untrusted networks and the CDE.

As follows: - Specific configuration settings are defined to prevent threats being introduced into the entity's network. - Security controls are actively running. - Security controls are not alterable by users of the computing devices unless specifically documented and authorized by management on a case-by-case basis for a limited period.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account - Level 2 (Automated)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account - Level 2 (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account - Level 2 (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the 'root' user account (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA.

๐Ÿ’ผ 1.6 Ensure hardware MFA is enabled for the "root user" account

The root user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the root user account be protected with a hardware MFA.
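
As an illustrative sketch, the common audit logic can be approximated with boto3: if account-level MFA is enabled but no virtual MFA device is assigned to root, the root MFA is presumed to be hardware. The inference is an assumption, not an API-provided fact.

```python
# Minimal sketch: infer whether the root user's MFA is a hardware device.
# Account-level MFA is enabled, but no *virtual* MFA device is assigned to
# root, which implies a hardware MFA device.
import boto3

iam = boto3.client("iam")

mfa_enabled = iam.get_account_summary()["SummaryMap"].get("AccountMFAEnabled") == 1

paginator = iam.get_paginator("list_virtual_mfa_devices")
root_virtual_mfa = any(
    device["SerialNumber"].endswith(":mfa/root-account-mfa-device")
    for page in paginator.paginate(AssignmentStatus="Assigned")
    for device in page["VirtualMFADevices"]
)

if mfa_enabled and not root_virtual_mfa:
    print("PASS: root user appears to use a hardware MFA device.")
else:
    print("FAIL: root user MFA is missing or virtual.")
```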

๐Ÿ’ผ 1.6 Ensure that 'Number of days before users are asked to re-confirm their authentication information' is not set to '0' - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Ensure that the number of days before users are asked to re-confirm their authentication information is not set to 0. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.6 Ensure that 'Number of days before users are asked to re-confirm their authentication information' is not set to "0" - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Ensure that the number of days before users are asked to re-confirm their authentication information is not set to 0. Please note that according to the CIS Benchmark audit steps, at this point in time, there is no API/CLI mechanism available to programmatically conduct a security assessment for this recommendation.

๐Ÿ’ผ 1.6 Ensure that a Custom Bad Password List is set to 'Enforce' for your Organization - Level 1 (Manual)

Microsoft Azure provides a Global Banned Password policy that applies to Azure administrative and normal user accounts. This is not applied to user accounts that are synced from an on-premises Active Directory unless Microsoft Entra ID Connect is used and you enable EnforceCloudPasswordPolicyForPasswordSyncedUsers. Please see the list in the default values section for the specifics of this policy. To strengthen password security, it is also recommended to define a custom banned password list.

๐Ÿ’ผ 1.7 Ensure that a Custom Bad Password List is set to 'Enforce' for your Organization - Level 1 (Manual | Not supported, no API/CLI available by Azure)

Microsoft Azure creates a default bad password policy that is already applied to Azure administrative and normal user accounts. This is not applied to user accounts that are synced from an on-premises Active Directory unless Azure AD Connect is used and you enable EnforceCloudPasswordPolicyForPasswordSyncedUsers. Please see the list in the default values section for the specifics of this policy.

๐Ÿ’ผ 1.7 Ensure that a Custom Bad Password List is set to 'Enforce' for your Organization - Level 1 (Manual)

Microsoft Azure provides a Global Banned Password policy that applies to Azure administrative and normal user accounts. This is not applied to user accounts that are synced from an on-premises Active Directory unless Azure AD Connect is used and you enable EnforceCloudPasswordPolicyForPasswordSyncedUsers. Please see the list in the default values section for the specifics of this policy. To strengthen password security, it is also recommended to define a custom banned password list.

๐Ÿ’ผ 1.9 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.
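
As an illustrative check, the IAM credential report can be parsed to find console users without MFA; the sketch below assumes boto3 and suitable IAM permissions.

```python
# Minimal sketch: list IAM users that have a console password but no MFA,
# using the IAM credential report (CSV).
import csv
import io
import time

import boto3

iam = boto3.client("iam")

# Generate (or refresh) the credential report, then fetch it.
while iam.generate_credential_report()["State"] != "COMPLETE":
    time.sleep(2)
report = iam.get_credential_report()["Content"].decode("utf-8")

for row in csv.DictReader(io.StringIO(report)):
    if row["password_enabled"] == "true" and row["mfa_active"] == "false":
        print(f"FAIL: {row['user']} has a console password but no MFA.")
```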

๐Ÿ’ผ 10 APRA does not seek to impose restrictions on a Boardโ€™s ability to delegate information security roles and responsibilities to Board sub-committees, management committees or individuals. However, APRA expects that a Board would clearly outline how it expects to be engaged with respect to information security, including escalation of risks, issues and reporting. Refer to Attachment H for common examples of the types of information that the Board might find useful in this regard.

๐Ÿ’ผ 10.1 Ensure that Resource Locks are set for Mission-Critical Azure Resources - Level 2 (Manual)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 10.1 Ensure that Resource Locks are set for Mission-Critical Azure Resources - Level 2 (Manual)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 10.1 Ensure that Resource Locks are set for Mission-Critical Azure Resources - Level 2 (Manual)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 10.1 Ensure that Resource Locks are set for Mission-Critical Azure Resources (Manual)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.
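
As a hedged example of how such a lock could be applied programmatically, the sketch below uses the Azure Python management SDK; the subscription ID, resource group name, and lock name are placeholders, and the required packages (azure-mgmt-resource, azure-identity) are assumptions.

```python
# Minimal sketch: apply a CanNotDelete lock to a mission-critical resource
# group. "<subscription-id>" and "prod-critical-rg" are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.locks import ManagementLockClient

client = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")

client.management_locks.create_or_update_at_resource_group_level(
    resource_group_name="prod-critical-rg",
    lock_name="do-not-delete",
    parameters={"level": "CanNotDelete", "notes": "Mission-critical resources"},
)
```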

๐Ÿ’ผ 10.2 Azure Blob Storage

This section covers security best practice recommendations for Azure Blob Storage. Azure Blob Storage is a core storage service type for Azure Storage Accounts. Azure Data Lake services depend on the Azure Blob Service.

๐Ÿ’ผ 10.2.1 Ensure that soft delete for blobs on Azure Blob Storage storage accounts is Enabled (Automated)

Blobs in Azure storage accounts may contain sensitive or personal data, such as ePHI or financial information. Data that is erroneously modified or deleted by an application or a user can lead to data loss or unavailability. It is recommended that soft delete be enabled on Azure storage accounts with blob storage to allow for the preservation and recovery of data when blobs or blob snapshots are deleted.
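
As an illustrative audit sketch (assuming azure-mgmt-storage and azure-identity, with a placeholder subscription ID), the blob service properties expose both blob and container delete-retention policies, which is also relevant to the container soft delete recommendation later in this section.

```python
# Minimal sketch: report storage accounts where blob or container soft delete
# is not enabled. "<subscription-id>" is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    resource_group = account.id.split("/")[4]  # /subscriptions/{sub}/resourceGroups/{rg}/...
    props = client.blob_services.get_service_properties(resource_group, account.name)
    blob_sd = props.delete_retention_policy
    container_sd = props.container_delete_retention_policy
    if not (blob_sd and blob_sd.enabled):
        print(f"FAIL: blob soft delete disabled on {account.name}")
    if not (container_sd and container_sd.enabled):
        print(f"FAIL: container soft delete disabled on {account.name}")
```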

๐Ÿ’ผ 10.3 Storage Accounts

This section covers security best practice recommendations for Storage Accounts in Azure. The recommendations in this section apply to the Storage Account, but not to the Storage Services which may be running on that account. Use the Storage Account recommendations as a starting place for securing the account, then proceed to apply the recommendations from the storage services section(s) that are relevant to the storage services running on your account. Storage Accounts are a family of account types that support different Storage Services. The Storage Account types and their supported services follow: • Standard general-purpose v2 supported services: Blob Storage (including Data Lake Storage), Queue Storage, Table Storage, and Azure Files. • Premium block blobs supported services: Blob Storage (including Data Lake Storage) • Premium file shares supported services: Azure Files • Premium page blobs supported services: Page blobs only

๐Ÿ’ผ 10.3.10 Ensure Azure Resource Manager Delete locks are applied to Azure Storage Accounts (Manual)

Azure Resource Manager CannotDelete (Delete) locks can prevent users from accidentally or maliciously deleting a storage account. This feature ensures that while the Storage account can still be modified or used, deletion of the Storage account resource requires removal of the lock by a user with appropriate permissions. This feature is a protective control for the availability of data. By ensuring that a storage account or its parent resource group cannot be deleted without first removing the lock, the risk of data loss is reduced.
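
As a hedged verification sketch, the lock client can list locks applied at the storage account resource scope; locks inherited from the resource group or subscription would also satisfy the control but are not evaluated here, and all identifiers are placeholders.

```python
# Minimal sketch: check each storage account for a CannotDelete (Delete) lock
# applied directly at the resource scope. "<subscription-id>" is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.locks import ManagementLockClient
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"
credential = DefaultAzureCredential()
storage = StorageManagementClient(credential, subscription_id)
locks = ManagementLockClient(credential, subscription_id)

for account in storage.storage_accounts.list():
    account_locks = locks.management_locks.list_at_resource_level(
        resource_group_name=account.id.split("/")[4],
        resource_provider_namespace="Microsoft.Storage",
        parent_resource_path="",
        resource_type="storageAccounts",
        resource_name=account.name,
    )
    if not any(lock.level == "CanNotDelete" for lock in account_locks):
        print(f"Review: no delete lock found directly on {account.name}")
```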

๐Ÿ’ผ 10.3.11 Ensure Azure Resource Manager ReadOnly locks are considered for Azure Storage Accounts (Manual)

Adding an Azure Resource Manager ReadOnly lock can prevent users from accidentally or maliciously deleting a storage account, modifying its properties and containers, or creating access assignments. The lock must be removed before the storage account can be deleted or updated. It provides more protection than a CannotDelete (Delete) resource manager lock. This feature prevents POST operations on a storage account and containers to the Azure Resource Manager control plane, management.azure.com. Blocked operations include listKeys, which prevents clients from obtaining the account shared access keys. Microsoft does not recommend ReadOnly locks for storage accounts with Azure Files and Table service containers. The Azure Resource Manager REST API documentation (spec) provides information about the control plane POST operations for Microsoft.Storage resources.

๐Ÿ’ผ 10.3.2.1 Ensure Private Endpoints are used to access Storage Accounts (Automated)

Use private endpoints for your Azure Storage accounts to allow clients and services to securely access data over a network via an encrypted Private Link. To do this, the private endpoint uses an IP address from the VNet for each service. Network traffic between disparate services then traverses the VNet in encrypted form. The VNet can also link address spaces, extending your network so its resources can be reached, and can act as a tunnel through public networks to connect remote infrastructures together. This adds security by segmenting network traffic and preventing outside sources from accessing it.
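
As an illustrative sketch (assuming azure-mgmt-storage with a placeholder subscription ID), one way to audit this is to flag accounts with no approved private endpoint connection.

```python
# Minimal sketch: flag storage accounts with no approved private endpoint
# connection. "<subscription-id>" is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    connections = account.private_endpoint_connections or []
    approved = [
        c for c in connections
        if c.private_link_service_connection_state.status == "Approved"
    ]
    if not approved:
        print(f"FAIL: {account.name} has no approved private endpoint.")
```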

๐Ÿ’ผ 10.3.5 Ensure 'Allow Azure services on the trusted services list to access this storage account' is Enabled for Storage Account Access (Automated)

NOTE: This recommendation assumes that the 'Public network access' parameter is set to 'Enabled from selected virtual networks and IP addresses'. Please ensure the prerequisite recommendation has been implemented before proceeding: • Ensure Default Network Access Rule for Storage Accounts is Set to Deny. Some Azure services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Azure services to bypass the network rules. These services will then use strong authentication to access the storage account. If the 'Allow Azure services on the trusted services list to access this storage account' exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Data Box, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure File Sync, Azure HDInsight, Azure Import/Export, Azure Monitor, Azure Networking Services, and Azure Site Recovery (when registered in the subscription).
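
As a hedged sketch (azure-mgmt-storage assumed, placeholder subscription ID), this corresponds to checking that the network rule set's default action is Deny and that its bypass list includes AzureServices.

```python
# Minimal sketch: verify that each storage account denies traffic by default
# but allows trusted Azure services to bypass the network rules.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    rules = account.network_rule_set
    default_deny = rules is not None and rules.default_action == "Deny"
    trusted_bypass = rules is not None and "AzureServices" in (rules.bypass or "")
    if default_deny and not trusted_bypass:
        print(f"FAIL: {account.name} does not allow trusted Azure services.")
```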

๐Ÿ’ผ 10.3.6 Ensure Soft Delete is Enabled for Azure Containers and Blob Storage (Automated)

Azure Storage blobs may contain data such as ePHI or financial information, which can be confidential or personal. Data that is erroneously modified or deleted by an application or another storage account user can cause data loss or unavailability. It is recommended that both Azure containers with attached Blob Storage and standalone Blob Storage containers be made recoverable by enabling the soft delete configuration, so that data can be preserved and recovered when blobs or blob snapshots are deleted.

๐Ÿ’ผ 10.3.8 Ensure 'Cross Tenant Replication' is not enabled (Automated)

Cross Tenant Replication in Azure allows data to be replicated across multiple Azure tenants. While this feature can be beneficial for data sharing and availability, it also poses a significant security risk if not properly managed. Unauthorized data access, data leakage, and compliance violations are potential risks. Disabling Cross Tenant Replication ensures that data is not inadvertently replicated across different tenant boundaries without explicit authorization.
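
As an illustrative check (assuming a recent azure-mgmt-storage that exposes the allow_cross_tenant_replication property, with a placeholder subscription ID), accounts that do not explicitly disable the setting can be flagged for review.

```python
# Minimal sketch: list storage accounts where cross-tenant replication is not
# explicitly disabled. "<subscription-id>" is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    # None means the service default applies; only an explicit False passes cleanly.
    if account.allow_cross_tenant_replication is not False:
        print(f"Review: cross-tenant replication not disabled on {account.name}")
```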

๐Ÿ’ผ 10.3.9 Ensure that 'Allow Blob Anonymous Access' is set to 'Disabled' (Automated)

The Azure Storage setting 'Allow Blob Anonymous Access' (aka "allowBlobPublicAccess") controls whether anonymous access is allowed for blob data in a storage account. When this property is set to True, it enables public read access to blob data, which can be convenient for sharing data but may carry security risks. When set to False, it disallows public access to blob data, providing a more secure storage environment.
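
As a hedged sketch along the same lines (azure-mgmt-storage assumed, placeholder subscription ID), accounts where the property is not explicitly set to False can be flagged.

```python
# Minimal sketch: flag storage accounts where 'Allow Blob Anonymous Access'
# (allowBlobPublicAccess) is not explicitly disabled.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    if account.allow_blob_public_access is not False:
        print(f"FAIL: {account.name} allows (or defaults to) anonymous blob access.")
```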

๐Ÿ’ผ 10.4.1 The audit logs are reviewed at least once daily.

The following logs: - All security events. - Logs of all system components that store, process, or transmit CHD and/or SAD. - Logs of all critical system components. - Logs of all servers and system components that perform security functions (for example, network security controls, intrusion-detection systems/intrusion-prevention systems (IDS/IPS), authentication servers).

๐Ÿ’ผ 10.4.1 The audit logs are reviewed at least once daily.

The following logs: - All security events. - Logs of all system components that store, process, or transmit CHD and/or SAD. - Logs of all critical system components. - Logs of all servers and system components that perform security functions (for example, network security controls, intrusion-detection systems/intrusion-prevention systems (IDS/IPS), authentication servers).

๐Ÿ’ผ 10.6.1 Review security events and critical system component logs at least daily.

Including the following: - All security events - Logs of all system components that store, process, or transmit CHD and/or SAD - Logs of all critical system components - Logs of all servers and system components that perform security functions (for example, firewalls, intrusion-detection systems/intrusion-prevention systems (IDS/IPS), authentication servers, e-commerce redirection servers, etc.)

๐Ÿ’ผ 10.6.2 Systems are configured to the correct and consistent time.

As follows: - One or more designated time servers are in use. - Only the designated central time server(s) receives time from external sources. - Time received from external sources is based on International Atomic Time or Coordinated Universal Time (UTC). - The designated time server(s) accept time updates only from specific industry-accepted external sources. - Where there is more than one designated time server, the time servers peer with one another to keep accurate time. - Internal systems receive time information only from designated central time server(s).

๐Ÿ’ผ 10.6.2 Systems are configured to the correct and consistent time.

As follows: - One or more designated time servers are in use. - Only the designated central time server(s) receives time from external sources. - Time received from external sources is based on International Atomic Time or Coordinated Universal Time (UTC). - The designated time server(s) accept time updates only from specific industry-accepted external sources. - Where there is more than one designated time server, the time servers peer with one another to keep accurate time. - Internal systems receive time information only from designated central time server(s).

๐Ÿ’ผ 10.7.3 Failures of any critical security controls systems are responded to promptly.

Including the following: - Restoring security functions. - Identifying and documenting the duration (date and time from start to end) of the security failure. - Identifying and documenting the cause(s) of failure and documenting required remediation. - Identifying and addressing any security issues that arose during the failure. - Determining whether further actions are required as a result of the security failure. - Implementing controls to prevent the cause of failure from reoccurring. - Resuming monitoring of security controls.

๐Ÿ’ผ 10.7.3 Failures of any critical security controls systems are responded to promptly.

Including the following: - Restoring security functions. - Identifying and documenting the duration (date and time from start to end) of the security failure. - Identifying and documenting the cause(s) of failure and documenting required remediation. - Identifying and addressing any security issues that arose during the failure. - Determining whether further actions are required as a result of the security failure. - Implementing controls to prevent the cause of failure from reoccurring. - Resuming monitoring of security controls.

๐Ÿ’ผ 10.8.1 Respond to failures of any critical security controls in a timely manner.

**Additional requirement for service providers only.** Processes for responding to failures in security controls must include: - Restoring security functions - Identifying and documenting the duration (date and time start to end) of the security failure - Identifying and documenting cause(s) of failure, including root cause, and documenting remediation required to address root cause - Identifying and addressing any security issues that arose during the failure - Performing a risk assessment to determine whether further actions are required as a result of the security failure - Implementing controls to prevent cause of failure from reoccurring - Resuming monitoring of security controls

๐Ÿ’ผ 11.1 Implement processes to test for the presence of wireless access points (802.11), and detect and identify all authorized and unauthorized wireless access points on a quarterly basis.

Methods that may be used in the process include but are not limited to wireless network scans, physical/logical inspections of system components and infrastructure, network access control (NAC), or wireless IDS/IPS. Whichever methods are used, they must be sufficient to detect and identify both authorized and unauthorized devices.

๐Ÿ’ผ 11.2 Run internal and external network vulnerability scans at least quarterly and after any significant change in the network.

Multiple scan reports can be combined for the quarterly scan process to show that all systems were scanned and all applicable vulnerabilities have been addressed. Additional documentation may be required to verify non-remediated vulnerabilities are in the process of being addressed. For initial PCI DSS compliance, it is not required that four quarters of passing scans be completed if the assessor verifies 1) the most recent scan result was a passing scan, 2) the entity has documented policies and procedures requiring quarterly scanning, and 3) vulnerabilities noted in the scan results have been corrected as shown in a re-scan(s). For subsequent years after the initial PCI DSS review, four quarters of passing scans must have occurred.

๐Ÿ’ผ 11.2.2 Perform quarterly external vulnerability scans, via an Approved Scanning Vendor (ASV) approved by the Payment Card Industry Security Standards Council (PCI SSC).

Perform rescans as needed, until passing scans are achieved. Quarterly external vulnerability scans must be performed by an Approved Scanning Vendor (ASV), approved by the Payment Card Industry Security Standards Council (PCI SSC). Refer to the ASV Program Guide published on the PCI SSC website for scan customer responsibilities, scan preparation, etc.

๐Ÿ’ผ 11.3 Implement a methodology for penetration testing.

Includes the following: - Is based on industry-accepted penetration testing approaches (for example, NIST SP800-115) - Includes coverage for the entire CDE perimeter and critical systems - Includes testing from both inside and outside the network - Includes testing to validate any segmentation and scope-reduction controls - Defines application-layer penetration tests to include, at a minimum, the vulnerabilities listed in Requirement 6.5 - Defines network-layer penetration tests to include components that support network functions as well as operating systems - Includes review and consideration of threats and vulnerabilities experienced in the last 12 months - Specifies retention of penetration testing results and remediation activities results.

๐Ÿ’ผ 11.3.1 Internal vulnerability scans are performed.

As follows: - At least once every three months. - Vulnerabilities that are either high-risk or critical (according to the entity's vulnerability risk rankings defined at Requirement 6.3.1) are resolved. - Rescans are performed that confirm all high-risk and critical vulnerabilities (as noted above) have been resolved. - Scan tool is kept up to date with latest vulnerability information. - Scans are performed by qualified personnel and organizational independence of the tester exists.

๐Ÿ’ผ 11.3.1 Internal vulnerability scans are performed.

As follows: - At least once every three months. - High-risk and critical vulnerabilities (per the entity's vulnerability risk rankings defined at Requirement 6.3.1) are resolved. - Rescans are performed that confirm all high-risk and critical vulnerabilities (as noted above) have been resolved. - Scan tool is kept up to date with latest vulnerability information. - Scans are performed by qualified personnel and organizational independence of the tester exists.

๐Ÿ’ผ 11.3.2 External vulnerability scans are performed.

As follows: - At least once every three months. - By a PCI SSC Approved Scanning Vendor (ASV). - Vulnerabilities are resolved and ASV Program Guide requirements for a passing scan are met. - Rescans are performed as needed to confirm that vulnerabilities are resolved per the ASV Program Guide requirements for a passing scan.

๐Ÿ’ผ 11.3.2 External vulnerability scans are performed.

As follows: - At least once every three months. - By a PCI SSC Approved Scanning Vendor (ASV). - Vulnerabilities are resolved and ASV Program Guide requirements for a passing scan are met. - Rescans are performed as needed to confirm that vulnerabilities are resolved per the ASV Program Guide requirements for a passing scan.

๐Ÿ’ผ 11.4.1 A penetration testing methodology is defined, documented, and implemented by the entity.

Includes: - Industry-accepted penetration testing approaches. - Coverage for the entire CDE perimeter and critical systems. - Testing from both inside and outside the network. - Testing to validate any segmentation and scope-reduction controls. - Application-layer penetration testing to identify, at a minimum, the vulnerabilities listed in Requirement 6.2.4. - Network-layer penetration tests that encompass all components that support network functions as well as operating systems. - Review and consideration of threats and vulnerabilities experienced in the last 12 months. - Documented approach to assessing and addressing the risk posed by exploitable vulnerabilities and security weaknesses found during penetration testing. - Retention of penetration testing results and remediation activities results for at least 12 months.

๐Ÿ’ผ 11.4.1 A penetration testing methodology is defined, documented, and implemented by the entity.

Includes: - Industry-accepted penetration testing approaches. - Coverage for the entire CDE perimeter and critical systems. - Testing from both inside and outside the network. - Testing to validate any segmentation and scope-reduction controls. - Application-layer penetration testing to identify, at a minimum, the vulnerabilities listed in Requirement 6.2.4. - Network-layer penetration tests that encompass all components that support network functions as well as operating systems. - Review and consideration of threats and vulnerabilities experienced in the last 12 months. - Documented approach to assessing and addressing the risk posed by exploitable vulnerabilities and security weaknesses found during penetration testing. - Retention of penetration testing results and remediation activities results for at least 12 months.

๐Ÿ’ผ 11.4.2 Internal penetration testing is performed.

Including: - Per the entity's defined methodology, - At least once every 12 months - After any significant infrastructure or application upgrade or change - By a qualified internal resource or qualified external third-party - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.2 Internal penetration testing is performed.

Including: - Per the entity's defined methodology, - At least once every 12 months - After any significant infrastructure or application upgrade or change - By a qualified internal resource or qualified external third-party - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.3 External penetration testing is performed.

Including: - Per the entity's defined methodology - At least once every 12 months - After any significant infrastructure or application upgrade or change - By a qualified internal resource or qualified external third party - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.3 External penetration testing is performed.

Including: - Per the entity's defined methodology - At least once every 12 months - After any significant infrastructure or application upgrade or change - By a qualified internal resource or qualified external third party - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.5 If segmentation is used to isolate the CDE from other networks, penetration tests are performed on segmentation controls.

As follows: - At least once every 12 months and after any changes to segmentation controls/methods - Covering all segmentation controls/methods in use. - According to the entity's defined penetration testing methodology. - Confirming that the segmentation controls/methods are operational and effective, and isolate the CDE from all out-of-scope systems. - Confirming effectiveness of any use of isolation to separate systems with differing security levels (see Requirement 2.2.3). - Performed by a qualified internal resource or qualified external third party. - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.5 If segmentation is used to isolate the CDE from other networks, penetration tests are performed on segmentation controls.

As follows: - At least once every 12 months and after any changes to segmentation controls/methods - Covering all segmentation controls/methods in use. - According to the entity's defined penetration testing methodology. - Confirming that the segmentation controls/methods are operational and effective, and isolate the CDE from all out-of-scope systems. - Confirming effectiveness of any use of isolation to separate systems with differing security levels (see Requirement 2.2.3). - Performed by a qualified internal resource or qualified external third party. - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.6 If segmentation is used to isolate the CDE from other networks, penetration tests are performed on segmentation controls.

**Additional requirement for service providers only.** As follows: - At least once every six months and after any changes to segmentation controls/methods. - Covering all segmentation controls/methods in use. - According to the entity's defined penetration testing methodology. - Confirming that the segmentation controls/methods are operational and effective, and isolate the CDE from all out-of-scope systems. - Confirming effectiveness of any use of isolation to separate systems with differing security levels (see Requirement 2.2.3). - Performed by a qualified internal resource or qualified external third party. - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.4.6 If segmentation is used to isolate the CDE from other networks, penetration tests are performed on segmentation controls.

**Additional requirement for service providers only.** As follows: - At least once every six months and after any changes to segmentation controls/methods. - Covering all segmentation controls/methods in use. - According to the entity's defined penetration testing methodology. - Confirming that the segmentation controls/methods are operational and effective, and isolate the CDE from all out-of-scope systems. - Confirming effectiveness of any use of isolation to separate systems with differing security levels (see Requirement 2.2.3). - Performed by a qualified internal resource or qualified external third party. - Organizational independence of the tester exists (not required to be a QSA or ASV).

๐Ÿ’ผ 11.5 Deploy a change-detection mechanism to alert personnel to unauthorized modification of critical system files, configuration files, or content files.

Configure the software to perform critical file comparisons at least weekly. For change-detection purposes, critical files are usually those that do not regularly change, but the modification of which could indicate a system compromise or risk of compromise. Change-detection mechanisms such as file-integrity monitoring products usually come pre-configured with critical files for the related operating system. Other critical files, such as those for custom applications, must be evaluated and defined by the entity (that is, the merchant or service provider).

๐Ÿ’ผ 11.6.1 A change- and tamper-detection mechanism is deployed.

As follows: - To alert personnel to unauthorized modification (including indicators of compromise, changes, additions, and deletions) to the security-impacting HTTP headers and the script contents of payment pages as received by the consumer browser. - The mechanism is configured to evaluate the received HTTP header and payment page. - The mechanism functions are performed as follows: - At least weekly. OR - Periodically (at the frequency defined in the entity's targeted risk analysis, which is performed according to all elements specified in Requirement 12.3.1).

๐Ÿ’ผ 11.6.1 A change- and tamper-detection mechanism is deployed.

As follows: - To alert personnel to unauthorized modification (including indicators of compromise, changes, additions, and deletions) to the HTTP headers and the contents of payment pages as received by the consumer browser. - The mechanism is configured to evaluate the received HTTP header and payment page. - The mechanism functions are performed as follows: - At least once every seven days OR - Periodically (at the frequency defined in the entity's targeted risk analysis, which is performed according to all elements specified in Requirement 12.3.1).

๐Ÿ’ผ 12 Information security roles and responsibilities are typically located in separate business areas, as well as within the IT function itself and in third parties and related parties. This can result in issues such as a lack of ownership, unclear accountabilities, ineffective oversight and fragmentation of practices with respect to information security. APRA regulated entities could address these issues by maintaining clear delineation between the responsibilities of each area and implementing compensating measures. Compensating measures could include establishing a virtual security group comprised of individuals with information security roles and responsibilities.

๐Ÿ’ผ 12.10.1 An incident response plan exists and is ready to be activated in the event of a suspected or confirmed security incident.

The plan includes, but is not limited to: - Roles, responsibilities, and communication and contact strategies in the event of a suspected or confirmed security incident, including notification of payment brands and acquirers, at a minimum. - Incident response procedures with specific containment and mitigation activities for different types of incidents. - Business recovery and continuity procedures. - Data backup processes. - Analysis of legal requirements for reporting compromises. - Coverage and responses of all critical system components. - Reference or inclusion of incident response procedures from the payment brands.

๐Ÿ’ผ 12.10.1 An incident response plan exists and is ready to be activated in the event of a suspected or confirmed security incident.

The plan includes, but is not limited to: - Roles, responsibilities, and communication and contact strategies in the event of a suspected or confirmed security incident, including notification of payment brands and acquirers, at a minimum. - Incident response procedures with specific containment and mitigation activities for different types of incidents. - Business recovery and continuity procedures. - Data backup processes. - Analysis of legal requirements for reporting compromises. - Coverage and responses of all critical system components. - Reference or inclusion of incident response procedures from the payment brands.

๐Ÿ’ผ 12.10.1 Create the incident response plan to be implemented in the event of system breach.

Ensure the plan addresses the following, at a minimum: - Roles, responsibilities, and communication and contact strategies in the event of a compromise including notification of the payment brands, at a minimum - Specific incident response procedures - Business recovery and continuity procedures - Data backup processes - Analysis of legal requirements for reporting compromises - Coverage and responses of all critical system components - Reference or inclusion of incident response procedures from the payment brands.

๐Ÿ’ผ 12.10.5 The security incident response plan includes monitoring and responding to alerts from security monitoring systems.

Including but not limited to: - Intrusion-detection and intrusion-prevention systems. - Network security controls. - Change-detection mechanisms for critical files. - The change- and tamper-detection mechanism for payment pages. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Detection of unauthorized wireless access points.

๐Ÿ’ผ 12.10.5 The security incident response plan includes monitoring and responding to alerts from security monitoring systems.

Including but not limited to: - Intrusion-detection and intrusion-prevention systems. - Network security controls. - Change-detection mechanisms for critical files. - The change- and tamper-detection mechanism for payment pages. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Detection of unauthorized wireless access points.

๐Ÿ’ผ 12.10.7 Incident response procedures are in place, to be initiated upon the detection of stored PAN anywhere it is not expected.

Include: - Determining what to do if PAN is discovered outside the CDE, including its retrieval, secure deletion, and/or migration into the currently defined CDE, as applicable. - Identifying whether sensitive authentication data is stored with PAN. - Determining where the account data came from and how it ended up where it was not expected. - Remediating data leaks or process gaps that resulted in the account data being where it was not expected.

๐Ÿ’ผ 12.10.7 Incident response procedures are in place, to be initiated upon the detection of stored PAN anywhere it is not expected.

Include: - Determining what to do if PAN is discovered outside the CDE, including its retrieval, secure deletion, and/or migration into the currently defined CDE, as applicable. - Identifying whether sensitive authentication data is stored with PAN. - Determining where the account data came from and how it ended up where it was not expected. - Remediating data leaks or process gaps that resulted in the account data being where it was not expected.

๐Ÿ’ผ 12.2 Implement a risk-assessment process.

This process: - Is performed at least annually and upon significant changes to the environment (for example, acquisition, merger, relocation, etc.), - Identifies critical assets, threats, and vulnerabilities, and - Results in a formal, documented analysis of risk. Examples of risk-assessment methodologies include but are not limited to OCTAVE, ISO 27005 and NIST SP 800-30.

๐Ÿ’ผ 12.3.1 Each PCI DSS requirement that provides flexibility for how frequently it is performed is supported by a targeted risk analysis that is documented.

Includes: - Identification of the assets being protected. - Identification of the threat(s) that the requirement is protecting against. - Identification of factors that contribute to the likelihood and/or impact of a threat being realized. - Resulting analysis that determines, and includes justification for, how frequently the requirement must be performed to minimize the likelihood of the threat being realized. - Review of each targeted risk analysis at least once every 12 months to determine whether the results are still valid or if an updated risk analysis is needed. - Performance of updated risk analyses when needed, as determined by the annual review.

๐Ÿ’ผ 12.3.1 For each PCI DSS requirement that specifies completion of a targeted risk analysis, the analysis is documented.

Includes: - Identification of the assets being protected. - Identification of the threat(s) that the requirement is protecting against. - Identification of factors that contribute to the likelihood and/or impact of a threat being realized. - Resulting analysis that determines, and includes justification for, how the frequency or processes defined by the entity to meet the requirement minimize the likelihood and/or impact of the threat being realized. - Review of each targeted risk analysis at least once every 12 months to determine whether the results are still valid or if an updated risk analysis is needed. - Performance of updated risk analyses when needed, as determined by the annual review.

๐Ÿ’ผ 12.3.4 Hardware and software technologies in use are reviewed at least once every 12 months.

Including at least the following: - Analysis that the technologies continue to receive security fixes from vendors promptly. - Analysis that the technologies continue to support (and do not preclude) the entity's PCI DSS compliance. - Documentation of any industry announcements or trends related to a technology, such as when a vendor has announced "end of life" plans for a technology. - Documentation of a plan, approved by senior management, to remediate outdated technologies, including those for which vendors have announced "end of life" plans.

๐Ÿ’ผ 12.3.4 Hardware and software technologies in use are reviewed at least once every 12 months.

Including at least the following: - Analysis that the technologies continue to receive security fixes from vendors promptly. - Analysis that the technologies continue to support (and do not preclude) the entity's PCI DSS compliance. - Documentation of any industry announcements or trends related to a technology, such as when a vendor has announced "end of life" plans for a technology. - Documentation of a plan, approved by senior management, to remediate outdated technologies, including those for which vendors have announced "end of life" plans.

๐Ÿ’ผ 12.5.2 PCI DSS scope is documented and confirmed by the entity at least once every 12 months and upon significant change to the in-scope environment.

At a minimum, the scoping validation includes: - Identifying all data flows for the various payment stages (for example, authorization, capture, settlement, chargebacks, and refunds) and acceptance channels (for example, card-present, card-not-present, and e-commerce). - Updating all data-flow diagrams per Requirement 1.2.4. - Identifying all locations where account data is stored, processed, and transmitted, including but not limited to: 1) any locations outside of the currently defined CDE, 2) applications that process CHD, 3) transmissions between systems and networks, and 4) file backups. - Identifying all system components in the CDE, connected to the CDE, or that could impact security of the CDE. - Identifying all segmentation controls in use and the environment(s) from which the CDE is segmented, including justification for environments being out of scope. - Identifying all connections from third-party entities with access to the CDE. - Confirming that all identified data flows, account data, system components, segmentation controls, and connections from third parties with access to the CDE are included in scope.

๐Ÿ’ผ 12.5.2 PCI DSS scope is documented and confirmed by the entity at least once every 12 months and upon significant change to the in-scope environment.

At a minimum, the scoping validation includes: - Identifying all data flows for the various payment stages (for example, authorization, capture, settlement, chargebacks, and refunds) and acceptance channels (for example, card-present, card-not-present, and e-commerce). - Updating all data-flow diagrams per Requirement 1.2.4. - Identifying all locations where account data is stored, processed, and transmitted, including but not limited to: 1) any locations outside of the currently defined CDE, 2) applications that process CHD, 3) transmissions between systems and networks, and 4) file backups. - Identifying all system components in the CDE, connected to the CDE, or that could impact security of the CDE. - Identifying all segmentation controls in use and the environment(s) from which the CDE is segmented, including justification for environments being out of scope. - Identifying all connections from third-party entities with access to the CDE. - Confirming that all identified data flows, account data, system components, segmentation controls, and connections from third parties with access to the CDE are included in scope.

๐Ÿ’ผ 12.8.2 Maintain a written agreement that includes an acknowledgement that the service providers are responsible for the security of cardholder data the service providers possess or otherwise store, process or transmit on behalf of the customer, or to the extent that they could impact the security of the customer's cardholder data environment.

The exact wording of an acknowledgement will depend on the agreement between the two parties, the details of the service being provided, and the responsibilities assigned to each party. The acknowledgement does not have to include the exact wording provided in this requirement.

๐Ÿ’ผ 12.8.2 Written agreements with TPSPs are maintained.

As follows: - Written agreements are maintained with all TPSPs with which account data is shared or that could affect the security of the CDE. - Written agreements include acknowledgments from TPSPs that TPSPs are responsible for the security of account data the TPSPs possess or otherwise store, process, or transmit on behalf of the entity, or to the extent that the TPSP could impact the security of the entity's cardholder data and/or sensitive authentication data.

๐Ÿ’ผ 12.8.2 Written agreements with TPSPs are maintained.

As follows: - Written agreements are maintained with all TPSPs with which account data is shared or that could affect the security of the CDE. - Written agreements include acknowledgments from TPSPs that they are responsible for the security of account data the TPSPs possess or otherwise store, process, or transmit on behalf of the entity, or to the extent that they could impact the security of the entity's CDE.

๐Ÿ’ผ 12.9 Service providers acknowledge in writing to customers that they are responsible for the security of cardholder data the service provider possesses or otherwise stores, processes, or transmits on behalf of the customer, or to the extent that they could impact the security of the customer's cardholder data environment.

**Additional requirement for service providers only.** The exact wording of an acknowledgement will depend on the agreement between the two parties, the details of the service being provided, and the responsibilities assigned to each party. The acknowledgement does not have to include the exact wording provided in this requirement.

๐Ÿ’ผ 12.9.2 TPSPs support their customers' requests for information to meet Requirements 12.8.4 and 12.8.5.

**Additional requirement for service providers only.** By providing the following upon customer request: - PCI DSS compliance status information (Requirement 12.8.4). - Information about which PCI DSS requirements are the responsibility of the TPSP and which are the responsibility of the customer, including any shared responsibilities (Requirement 12.8.5), for any service the TPSP provides that meets a PCI DSS requirement(s) on behalf of customers or that can impact security of customers' cardholder data or sensitive authentication data.

๐Ÿ’ผ 12.9.2 TPSPs support their customers' requests for information to meet Requirements 12.8.4 and 12.8.5.

**Additional requirement for service providers only.** By providing the following upon customer request: - PCI DSS compliance status information for any service the TPSP performs on behalf of customers (Requirement 12.8.4). - Information about which PCI DSS requirements are the responsibility of the TPSP and which are the responsibility of the customer, including any shared responsibilities (Requirement 12.8.5).

๐Ÿ’ผ 13 The Board, governing bodies and individuals would typically define their information requirements (e.g. schedule, format, scope and content) to ensure they are provided with sufficient and timely information to effectively discharge their information security roles and responsibilities. Reporting to governing bodies would normally be supported by defined escalation paths and thresholds. An APRA-regulated entity could benefit from implementing processes for periodic review of audience relevance and fitness for use.

๐Ÿ’ผ 17 APRA-regulated entities often place reliance on information security capabilities of third parties and related parties to provide a targeted information security capability, or as part of a wider service-provision arrangement. Accordingly, entities would have a view as to the sufficiency of resources, skills and controls of third parties and related parties. This could be achieved through a combination of interviews, service reporting, control testing, certifications, attestations, referrals and independent assurance assessments. Any capability gaps identified would be addressed in a timely manner.

๐Ÿ’ผ 2 Identity

This section covers security best practice recommendations for products in the Azure Identity services category. Azure Product Category Page: <https://azure.microsoft.com/en-us/products/category/identity> Many of the recommendations from this section are marked as "Manual" while the existing Azure CLI and Azure AD PowerShell support through the Azure AD Graph is being deprecated. It is now recommended to use the new Microsoft Graph PowerShell module in place of Azure AD Graph for PowerShell and API-level access. From a security posture standpoint, these recommendations are still very important and should not be discounted because they are "Manual." As automation capability is developed for this Benchmark, the related recommendations will be updated with the respective audit and remediation steps and changed to an "automated" assessment status. If any problems are encountered running Azure CLI or PowerShell methodologies, please refer to the Introduction section of this Benchmark where you will find additional detail on permissions and required cmdlets.

๐Ÿ’ผ 2 Logging

This section contains recommendations for configuring AWS's account logging features.

๐Ÿ’ผ 2 Microsoft Defender

This section covers recommendations to consider for tenant-wide security policies and plans related to Microsoft Defender. Please note that because Microsoft Defender products require additional licensing, all Microsoft Defender plan recommendations in subsection 2.1 are assigned as "Level 2."

๐Ÿ’ผ 2 Microsoft Defender

This section covers recommendations to consider for tenant-wide security policies and plans related to Microsoft Defender. Please note that because Microsoft Defender products require additional licensing, all Microsoft Defender plan recommendations in subsection 2.1 are assigned as "Level 2."

๐Ÿ’ผ 2 Microsoft Defender for Cloud

This section covers recommendations to consider for tenant-wide security policies and plans related to Microsoft Defender. Please note that because Microsoft Defender products require additional licensing, all Microsoft Defender plan recommendations in subsection 2.1 are assigned as "Level 2."

๐Ÿ’ผ 2 Security Center

This section covers security recommendations to follow when setting various security policies on an Azure Subscription. A security policy defines the set of controls, which are recommended for resources within the specified Azure subscription. Please note that the majority of the recommendations mentioned in this section only produce an alert if a security violation is found. They do not actually enforce security settings by themselves. Alerts should be acted upon and remedied wherever possible.

๐Ÿ’ผ 2 Security Center

This section covers security recommendations to follow when setting various security policies on an Azure Subscription. A security policy defines the set of controls, which are recommended for resources within the specified Azure subscription. Please note that the majority of the recommendations mentioned in this section only produce an alert if a security violation is found. They do not actually enforce security settings by themselves. Alerts should be acted upon and remedied wherever possible.

๐Ÿ’ผ 2.1 Defender Plans

This subsection is dedicated to providing guidance on Microsoft Defender for Cloud product plans. This guidance is intended to ensure that - at a minimum - the protective measures offered by these plans are being considered. Organizations may find that they have existing products or services that provide the same utility as some Microsoft Defender for Cloud products. Security and Administrative personnel need to make the determination on their organization's behalf regarding which - if any - of these recommendations are relevant to their organization's needs. In consideration of the above, and because of the potential for increased cost and complexity, please be aware that all Defender Plan recommendations are profiled as "Level 2" recommendations.

๐Ÿ’ผ 2.1 Ensure CloudTrail is enabled in all regions

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).
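
As an illustrative sketch using boto3, one way to audit this is to look for at least one multi-Region trail that is actively logging all management events; trails configured with advanced event selectors are not evaluated by this simple check.

```python
# Minimal sketch: verify at least one multi-Region CloudTrail trail exists,
# is logging, and records all management events (basic event selectors only).
import boto3

cloudtrail = boto3.client("cloudtrail")

compliant = False
for trail in cloudtrail.describe_trails(includeShadowTrails=False)["trailList"]:
    if not trail.get("IsMultiRegionTrail"):
        continue
    status = cloudtrail.get_trail_status(Name=trail["TrailARN"])
    selectors = cloudtrail.get_event_selectors(TrailName=trail["TrailARN"])
    management_events = any(
        s.get("IncludeManagementEvents") and s.get("ReadWriteType") == "All"
        for s in selectors.get("EventSelectors", [])
    )
    if status["IsLogging"] and management_events:
        compliant = True

print("PASS" if compliant else "FAIL: no active multi-Region trail capturing all management events")
```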

๐Ÿ’ผ 2.1 Maintain current contact details (Manual)

Ensure contact email and telephone details for AWS accounts are current and map to more than one individual in your organization. An AWS account supports a number of contact details, and AWS will use these to contact the account owner if activity judged to be in breach of the Acceptable Use Policy or indicative of a likely security compromise is observed by the AWS Abuse team. Contact details should not be for a single individual, as circumstances may arise where that individual is unavailable. Email contact details should point to a mail alias which forwards email to multiple individuals within the organization; where feasible, phone contact details should point to a PABX hunt group or other call-forwarding system.
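
As a hedged sketch, the AWS Account API can surface the registered contacts for manual review; it assumes the boto3 "account" client with account:GetContactInformation and account:GetAlternateContact permissions and a configured default Region.

```python
# Minimal sketch: print the primary and SECURITY alternate contacts on the
# account so they can be reviewed for currency and shared ownership.
import boto3

account = boto3.client("account")

primary = account.get_contact_information()["ContactInformation"]
print("Primary contact:", primary.get("FullName"), primary.get("PhoneNumber"))

try:
    security = account.get_alternate_contact(AlternateContactType="SECURITY")["AlternateContact"]
    print("Security contact:", security.get("EmailAddress"), security.get("PhoneNumber"))
except account.exceptions.ResourceNotFoundException:
    print("Review: no alternate SECURITY contact is configured.")
```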

๐Ÿ’ผ 2.1 Microsoft Defender for Cloud

This subsection is dedicated to providing guidance on Microsoft Defender for Cloud product plans. This guidance is intended to ensure that - at a minimum - the protective measures offered by these plans are being considered. Organizations may find that they have existing products or services that provide the same utility as some Microsoft Defender for Cloud products. Security and Administrative personnel need to make the determination on their organization's behalf regarding which - if any - of these recommendations are relevant to their organization's needs. In consideration of the above, and because of the potential for increased cost and complexity, please be aware that all Defender Plan recommendations are profiled as "Level 2" recommendations.

๐Ÿ’ผ 2.1 Microsoft Defender for Cloud

This subsection is dedicated to providing guidance on Microsoft Defender for Cloud product plans. This guidance is intended to ensure that - at a minimum - the protective measures offered by these plans are being considered. Organizations may find that they have existing products or services that provide the same utility as some Microsoft Defender for Cloud products. Security and Administrative personnel need to make the determination on their organization's behalf regarding which - if any - of these recommendations are relevant to their organization's needs. In consideration of the above, and because of the potential for increased cost and complexity, please be aware that all Defender Plan recommendations are profiled as "Level 2" recommendations.

๐Ÿ’ผ 2.1 Security Defaults (Per-User MFA)

- If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, or F5, and EM&S E3 or E5 licenses) and **CAN** use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section. - If your organization is using the free tier of Entra ID (Office 365 E1, E3, or E5, and Microsoft 365 F1 or F3 licenses) and **CAN NOT** use Conditional Access, proceed with the Security Defaults guidance in this section, and ignore the recommendations in the Conditional Access section. Conditional Access is preferred, but Security Defaults (Per-User MFA) is recommended only if Conditional Access isn't available. Why is this **IMPORTANT**? The Azure "Security Defaults" recommendations represent an entry-level set of recommendations (such as Multi-Factor Authentication) which will be relevant to organizations and tenants that are either just starting to use Azure, or are only utilizing a bare minimum feature set, and rely on the free license tier of Microsoft Entra ID. Security Defaults recommendations are intended to ensure that these use cases are still capable of establishing a strong baseline of secure configuration. **If your subscription is licensed to use Microsoft Entra ID P1 or P2, it is strongly recommended that the "Security Defaults" section (this section and the recommendations therein) be bypassed in favor of the use of "Conditional Access."**

๐Ÿ’ผ 2.1.1 Ensure Security Defaults is enabled on Microsoft Entra ID (Manual)

**[IMPORTANT - Please read the section overview**: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, or F5, and EM&S E3 or E5 licenses) and **CAN** use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Security defaults in Microsoft Entra ID make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Security defaults is available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You may turn on security defaults in the Azure portal.

๐Ÿ’ผ 2.1.2 Ensure that 'Multi-Factor Auth Status' is 'Enabled' for all Privileged Users (Manual)

**[IMPORTANT - Please read the section overview**: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, or F5, and EM&S E3 or E5 licenses) and **CAN** use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Enable multi-factor authentication for all roles, groups, and users that have write access or permissions to Azure resources. These include custom created objects or built-in roles such as: - Service Co-Administrators - Subscription Owners - Contributors

๐Ÿ’ผ 2.1.21 Ensure that Microsoft Defender for Endpoint integration with Microsoft Defender for Cloud is selected - Level 2 (Manual)

This integration setting enables Microsoft Defender for Endpoint (formerly 'Advanced Threat Protection' or 'ATP' or 'WDATP' - see additional info) to communicate with Microsoft Defender for Cloud. **IMPORTANT:** When enabling integration between DfE & DfC it needs to be taken into account that this will have some side effects that may be undesirable. 1. For server 2019 & above if defender is installed (default for these server SKU's) this will trigger a deployment of the new unified agent and link to any of the extended configuration in the Defender portal. 2. If the new unified agent is required for server SKU's of Win 2016 or Linux and lower there is additional integration that needs to be switched on and agents need to be aligned.

๐Ÿ’ผ 2.1.22 Ensure that Microsoft Defender External Attack Surface Monitoring (EASM) is enabled - Level 2 (Manual)

An organization's attack surface is the collection of assets with a public network identifier or URI that an external threat actor can see or access from outside your cloud. It is the set of points on the boundary of a system, a system element, system component, or an environment where an attacker can try to enter, cause an effect on, or extract data from, that system, system element, system component, or environment. The larger the attack surface, the harder it is to protect. This tool can be configured to scan your organization's online infrastructure such as specified domains, hosts, CIDR blocks, and SSL certificates, and store them in an Inventory. Inventory items can be added, reviewed, approved, and removed, and may contain enrichments ("insights") and additional information collected from the tool's different scan engines and open-source intelligence sources. A Defender EASM workspace will generate an Inventory of publicly exposed assets by crawling and scanning the internet using _Seeds_ you provide when setting up the tool. Seeds can be FQDNs, IP CIDR blocks, and WHOIS records. Defender EASM will generate Insights within 24-48 hours after Seeds are provided, and these insights include vulnerability data (CVEs), ports and protocols, and weak or expired SSL certificates that could be used by an attacker for reconnaissance or exploitation. Results are classified High/Medium/Low and some of them include proposed mitigations.

๐Ÿ’ผ 2.1.22 Ensure that Microsoft Defender for Endpoint integration with Microsoft Defender for Cloud is selected - Level 2 (Manual)

This integration setting enables Microsoft Defender for Endpoint (formerly 'Advanced Threat Protection' or 'ATP' or 'WDATP' - see additional info) to communicate with Microsoft Defender for Cloud. IMPORTANT: When enabling integration between DfE & DfC it needs to be taken into account that this will have some side effects that may be undesirable. 1. For server 2019 & above if defender is installed (default for these server SKU's) this will trigger a deployment of the new unified agent and link to any of the extended configuration in the Defender portal. 2. If the new unified agent is required for server SKU's of Win 2016 or Linux and lower there is additional integration that needs to be switched on and agents need to be aligned.

๐Ÿ’ผ 2.1.3 Ensure That Microsoft Defender for Databases Is Set To 'On' - Level 2 (Manual)

Turning on Microsoft Defender for Databases enables threat detection for the instances running your database software. This provides threat intelligence, anomaly detection, and behavior analytics in the Azure Microsoft Defender for Cloud. Instead of being enabled on services like Platform as a Service (PaaS), this implementation will run within your instances as Infrastructure as a Service (IaaS) on the Operating Systems hosting your databases.

๐Ÿ’ผ 2.1.3 Ensure That Microsoft Defender for Databases Is Set To 'On' - Level 2 (Manual)

Turning on Microsoft Defender for Databases enables threat detection for the instances running your database software. This provides threat intelligence, anomaly detection, and behavior analytics in the Azure Microsoft Defender for Cloud. Instead of being enabled on services like Platform as a Service (PaaS), this implementation will run within your instances as Infrastructure as a Service (IaaS) on the Operating Systems hosting your databases.

๐Ÿ’ผ 2.1.4 Ensure that 'Allow users to remember multi-factor authentication on devices they trust' is Disabled (Manual)

**[IMPORTANT - Please read the section overview**: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, or F5, and EM&S E3 or E5 licenses) and **CAN** use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Do not allow users to remember multi-factor authentication on devices.

๐Ÿ’ผ 2.1.4 Ensure that S3 Buckets are configured with 'Block public access (bucket settings)' - Level 1 (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket, and its contained objects, from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets, and contained objects, from becoming publicly accessible across the entire account.
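
The check below is a minimal sketch rather than an authoritative audit script: using Python and boto3, it reports buckets whose four bucket-level Block Public Access settings are not all enabled, assuming credentials with permission to read each bucket's public access block configuration.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
SETTINGS = ("BlockPublicAcls", "IgnorePublicAcls",
            "BlockPublicPolicy", "RestrictPublicBuckets")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        if not all(cfg.get(setting) for setting in SETTINGS):
            print(f"{name}: Block public access is not fully enabled")
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            print(f"{name}: no bucket-level Block public access configuration")
        else:
            raise
```

The account-level settings can be checked separately through the `s3control` client; that is not shown here.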

๐Ÿ’ผ 2.1.4 Ensure that S3 Buckets are configured with 'Block public access (bucket settings)' - Level 1 (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket, and its contained objects, from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets, and contained objects, from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.4 Ensure that S3 is configured with 'Block Public Access' enabled (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket and its contained objects from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets and their contained objects from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.4 Ensure that S3 is configured with 'Block Public Access' enabled (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket and its contained objects from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets and their contained objects from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.4 Ensure that S3 is configured with 'Block Public Access' enabled (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket and its contained objects from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets and their contained objects from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.5 Ensure that S3 Buckets are configured with 'Block public access (bucket settings)'

Amazon S3 provides 'Block public access (bucket settings)' and 'Block public access (account settings)' to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, 'Block public access (bucket settings)' prevents an individual bucket, and its contained objects, from becoming publicly accessible. Similarly, 'Block public access (account settings)' prevents all buckets, and contained objects, from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.5 Ensure that S3 Buckets are configured with 'Block public access (bucket settings)' - Level 1 (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket, and its contained objects, from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets, and contained objects, from becoming publicly accessible across the entire account.

๐Ÿ’ผ 2.1.8 Ensure That Microsoft Defender for Containers Is Set To 'On' - Level 2 (Automated)

Turning on Microsoft Defender for Containers enables threat detection for Container Registries including Kubernetes, providing threat intelligence, anomaly detection, and behavior analytics in the Microsoft Defender for Cloud. The following services will be enabled for container instances: - Defender agent in Azure - Azure Policy for Kubernetes - Agentless discovery for Kubernetes - Agentless container vulnerability assessment

๐Ÿ’ผ 2.10 Do not create access keys during initial setup for IAM users with a console password (Manual)

The AWS console defaults to no checkboxes selected when creating a new IAM user. When creating IAM user credentials, you have to determine what type of access they require. Programmatic access: the IAM user might need to make API calls, use the AWS CLI, or use the Tools for Windows PowerShell; in that case, create an access key (access key ID and secret access key) for that user. AWS Management Console access: if the user needs to access the AWS Management Console, create a password for the user.
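
As an illustration only, the sketch below (Python/boto3, not from the benchmark) looks for users that have both a console password and an access key created at roughly the same time as the user; the 60-second window is an arbitrary heuristic, not an official threshold.

```python
import boto3
from botocore.exceptions import ClientError

iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        try:
            iam.get_login_profile(UserName=name)  # raises if no console password
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchEntity":
                continue
            raise
        for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
            delta = abs((key["CreateDate"] - user["CreateDate"]).total_seconds())
            if delta < 60:  # heuristic: key created during initial user setup
                print(f"{name}: access key {key['AccessKeyId']} created with the user")
```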

๐Ÿ’ผ 2.13 Ensure access keys are rotated every 90 days or less (Automated)

Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. AWS users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. It is recommended that all access keys be rotated regularly.
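
A minimal sketch of an age check, assuming boto3 and read-only IAM permissions; the 90-day threshold comes from the recommendation itself.

```python
import boto3
from datetime import datetime, timezone

iam = boto3.client("iam")
now = datetime.now(timezone.utc)

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        for key in iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]:
            age_days = (now - key["CreateDate"]).days
            if key["Status"] == "Active" and age_days > 90:
                print(f"{user['UserName']}: key {key['AccessKeyId']} is {age_days} days old")
```

The IAM credential report is another common way to gather the same information, including last-used dates, in a single call.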

๐Ÿ’ผ 2.13 Ensure Cloud Asset Inventory Is Enabled - Level 1 (Automated)

GCP Cloud Asset Inventory is a service that provides a historical view of GCP resources and IAM policies through a time-series database. The information recorded includes metadata on Google Cloud resources, metadata on policies set on Google Cloud projects or resources, and runtime information gathered within a Google Cloud resource.

๐Ÿ’ผ 2.13 Ensure Cloud Asset Inventory Is Enabled - Level 1 (Automated)

GCP Cloud Asset Inventory is a service that provides a historical view of GCP resources and IAM policies through a time-series database. The information recorded includes metadata on Google Cloud resources, metadata on policies set on Google Cloud projects or resources, and runtime information gathered within a Google Cloud resource. Cloud Asset Inventory Service (CAIS) API enablement is not required for operation of the service, but rather enables the mechanism for searching/exporting CAIS asset data directly.

๐Ÿ’ผ 2.13 Ensure Cloud Asset Inventory Is Enabled (Automated)

GCP Cloud Asset Inventory is a service that provides a historical view of GCP resources and IAM policies through a time-series database. The information recorded includes metadata on Google Cloud resources, metadata on policies set on Google Cloud projects or resources, and runtime information gathered within a Google Cloud resource.

๐Ÿ’ผ 2.14 Ensure IAM users receive permissions only through groups (Automated)

IAM users are granted access to services, functions, and data through IAM policies. There are four ways to define policies for a user: 1) Edit the user policy directly, also known as an inline or user policy; 2) attach a policy directly to a user; 3) add the user to an IAM group that has an attached policy; 4) add the user to an IAM group that has an inline policy. Only the third implementation is recommended.
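
To spot violations programmatically, one hedged approach (a Python/boto3 sketch, assuming IAM read permissions) is to flag any user that has inline or directly attached policies:

```python
import boto3

iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        inline = iam.list_user_policies(UserName=name)["PolicyNames"]
        attached = iam.list_attached_user_policies(UserName=name)["AttachedPolicies"]
        if inline or attached:
            print(f"{name}: {len(inline)} inline, {len(attached)} attached policies")
```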

๐Ÿ’ผ 2.15 Ensure 'Access Approval' is 'Enabled' - Level 2 (Automated)

GCP Access Approval enables you to require your organization's explicit approval whenever Google support tries to access your projects. You can then select users within your organization who can approve these requests by giving them a security role in IAM. All access requests display which Google employee requested them in an email or Pub/Sub message that you can choose to approve. This adds additional control and logging of who in your organization approved or denied these requests.

๐Ÿ’ผ 2.15 Ensure 'Access Approval' is 'Enabled' - Level 2 (Automated)

GCP Access Approval enables you to require your organization's explicit approval whenever Google support tries to access your projects. You can then select users within your organization who can approve these requests by giving them a security role in IAM. All access requests display which Google employee requested them in an email or Pub/Sub message that you can choose to approve. This adds additional control and logging of who in your organization approved or denied these requests.

๐Ÿ’ผ 2.15 Ensure 'Access Approval' is 'Enabled' - Level 2 (Automated)

GCP Access Approval enables you to require your organization's explicit approval whenever Google support tries to access your projects. You can then select users within your organization who can approve these requests by giving them a security role in IAM. All access requests display which Google employee requested them in an email or Pub/Sub message that you can choose to approve. This adds additional control and logging of who in your organization approved or denied these requests.

๐Ÿ’ผ 2.15 Ensure IAM policies that allow full "*:*" administrative privileges are not attached (Automated)

IAM policies are the means by which privileges are granted to users, groups, or roles. It is recommended and considered standard security advice to grant least privilege, that is, granting only the permissions required to perform a task. Determine what users need to do, and then craft policies for them that allow the users to perform only those tasks, instead of granting full administrative privileges.
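
A minimal detection sketch (Python/boto3, assuming IAM read access) that flags attached customer-managed policies allowing `"Action": "*"` on `"Resource": "*"`; boto3 returns the policy document already decoded into a dictionary.

```python
import boto3

iam = boto3.client("iam")

def as_list(value):
    # Policy elements may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]

for page in iam.get_paginator("list_policies").paginate(Scope="Local", OnlyAttached=True):
    for policy in page["Policies"]:
        document = iam.get_policy_version(
            PolicyArn=policy["Arn"], VersionId=policy["DefaultVersionId"]
        )["PolicyVersion"]["Document"]
        for stmt in as_list(document["Statement"]):
            if (stmt.get("Effect") == "Allow"
                    and "*" in as_list(stmt.get("Action", []))
                    and "*" in as_list(stmt.get("Resource", []))):
                print(f"Full administrative privileges: {policy['Arn']}")
```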

๐Ÿ’ผ 2.18 Ensure that all expired SSL/TLS certificates stored in AWS IAM are removed (Automated)

To enable HTTPS connections to your website or application in AWS, you need an SSL/TLS server certificate. You can use AWS Certificate Manager (ACM) or IAM to store and deploy server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM Console.
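
The following boto3 sketch (illustrative only, assuming IAM read permissions) lists IAM server certificates whose expiration date has already passed:

```python
import boto3
from datetime import datetime, timezone

iam = boto3.client("iam")
now = datetime.now(timezone.utc)

for page in iam.get_paginator("list_server_certificates").paginate():
    for cert in page["ServerCertificateMetadataList"]:
        if cert["Expiration"] < now:
            print(f"Expired: {cert['ServerCertificateName']} ({cert['Expiration']:%Y-%m-%d})")
```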

๐Ÿ’ผ 2.19 Ensure that IAM External Access Analyzer is enabled for all regions (Automated)

Enable IAM External Access Analyzer for all resources in each active AWS region. IAM Access Analyzer is a capability introduced at AWS re:Invent 2019. After the analyzer is enabled in IAM, scan results are displayed on the console, showing the accessible resources. Scans show resources that other accounts and federated users can access, such as KMS keys and IAM roles. The results allow you to determine whether unintended access is permitted, making it easier for administrators to monitor least-privilege access. Access Analyzer analyzes only the policies that are applied to resources in the same AWS Region.
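
Because the analyzer is regional, a check has to loop over regions. The sketch below (Python/boto3, assuming `ec2:DescribeRegions` and Access Analyzer read permissions) reports regions without an active account analyzer; an organization-level analyzer would need a slightly different check.

```python
import boto3

session = boto3.session.Session()
regions = [r["RegionName"] for r in session.client("ec2").describe_regions()["Regions"]]

for region in regions:
    analyzer = session.client("accessanalyzer", region_name=region)
    analyzers = analyzer.list_analyzers(type="ACCOUNT")["analyzers"]
    if not any(a["status"] == "ACTIVE" for a in analyzers):
        print(f"{region}: no active IAM Access Analyzer")
```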

๐Ÿ’ผ 2.2 Auto provisioning

Microsoft Defender for Cloud ingests data from agents, extensions, and integrations. Automatic provisioning assists with the deployment and maintenance of agents and extensions required on endpoints such as Azure Virtual Machines.

๐Ÿ’ผ 2.2 Conditional Access

For most Azure tenants, and certainly for organizations with a significant use of Microsoft Entra ID, Conditional Access policies are recommended and preferred. To use Conditional Access Policies, a licensing plan is required, and **Security Defaults must be disabled**. Because of the licensing requirement, all Conditional Access policies are assigned a profile of "Level 2." Conditional Access requires one of the following plans: - Microsoft Entra ID P1 or P2 - Microsoft 365 Business Premium - Microsoft 365 E3 or E5 - Microsoft 365 F1, F3, F5 Security and F5 Security + Compliance - Enterprise Mobility & Security E3 or E5

๐Ÿ’ผ 2.2 Develop configuration standards for all system components. Assure that these standards address all known security vulnerabilities and are consistent with industry-accepted system hardening standards.

Sources of industry-accepted system hardening standards may include, but are not limited to: - Center for Internet Security (CIS) - International Organization for Standardization (ISO) - SysAdmin Audit Network Security (SANS) Institute - National Institute of Standards Technology (NIST).

๐Ÿ’ผ 2.2 Ensure CloudTrail log file validation is enabled

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled on all CloudTrails.
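
A short check, sketched with boto3 (illustrative, assuming CloudTrail read access):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")
for trail in cloudtrail.describe_trails()["trailList"]:
    if not trail.get("LogFileValidationEnabled"):
        print(f"Log file validation disabled: {trail['Name']}")
```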

๐Ÿ’ผ 2.2.1 Configuration standards are developed, implemented, and maintained.

To the following: - Cover all system components. - Address all known security vulnerabilities. - Be consistent with industry-accepted system hardening standards or vendor hardening recommendations. - Be updated as new vulnerability issues are identified, as defined in Requirement 6.3.1. - Be applied when new systems are configured and verified as in place before or immediately after a system component is connected to a production environment.

๐Ÿ’ผ 2.2.1 Configuration standards are developed, implemented, and maintained.

To the following: - Cover all system components. - Address all known security vulnerabilities. - Be consistent with industry-accepted system hardening standards or vendor hardening recommendations. - Be updated as new vulnerability issues are identified, as defined in Requirement 6.3.1. - Be applied when new systems are configured and verified as in place before or immediately after a system component is connected to a production environment.

๐Ÿ’ผ 2.2.1 Ensure Trusted Locations Are Defined (Manual)

Microsoft Entra ID Conditional Access allows an organization to configure `Named locations` and configure whether those locations are trusted or untrusted. These settings provide organizations the means to specify Geographical locations for use in conditional access policies, or define actual IP addresses and IP ranges and whether or not those IP addresses and/or ranges are trusted by the organization.

๐Ÿ’ผ 2.2.2 Ensure that an exclusionary Geographic Access Policy is considered (Manual)

**CAUTION**: If these policies are created without first auditing and testing the result, misconfiguration can potentially lock out administrators or create undesired access issues. Conditional Access Policies can be used to block access from geographic locations that are deemed out-of-scope for your organization or application. The scope and variables for this policy should be carefully examined and defined.

💼 2.2.2.1 Shouldn't allow more than 10 guesses in 5 minutes

Where it is possible to configure, you should 'throttle' the rate of attempts, so that the length of time the user must wait between attempts increases with each unsuccessful attempt; you shouldn't allow more than 10 guesses in 5 minutes. Where the vendor doesn't allow you to configure the above, use the vendor's default setting.
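
As a rough illustration of the intent (not vendor guidance), the sketch below implements both controls for a hypothetical application login handler: a rolling five-minute window capped at 10 attempts, plus a per-account delay that grows after each unsuccessful attempt.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
MAX_ATTEMPTS = 10

_attempts = defaultdict(deque)     # account -> timestamps of recent attempts
_delay = defaultdict(lambda: 1)    # account -> seconds to wait after a failure
_next_allowed = defaultdict(float) # account -> earliest time a new attempt is allowed

def may_attempt(account: str, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    window = _attempts[account]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()           # drop attempts outside the 5-minute window
    return len(window) < MAX_ATTEMPTS and now >= _next_allowed[account]

def record_attempt(account: str, success: bool, now: float | None = None) -> None:
    now = time.time() if now is None else now
    _attempts[account].append(now)
    if success:
        _delay[account] = 1
        _next_allowed[account] = now
    else:
        _next_allowed[account] = now + _delay[account]
        _delay[account] *= 2       # wait grows with each unsuccessful attempt
```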

๐Ÿ’ผ 2.2.3 Primary functions requiring different security levels are managed.

As follows: - Only one primary function exists on a system component, OR - Primary functions with differing security levels that exist on the same system component are isolated from each other, OR - Primary functions with differing security levels on the same system component are all secured to the level required by the function with the highest security need.

๐Ÿ’ผ 2.2.3 Primary functions requiring different security levels are managed.

As follows: - Only one primary function exists on a system component, OR - Primary functions with differing security levels that exist on the same system component are isolated from each other, OR - Primary functions with differing security levels on the same system component are all secured to the level required by the function with the highest security need.

๐Ÿ’ผ 2.2.3 Technical controls must be used to manage the quality of credentials.

Technical controls must be used to manage the quality of credentials. If credentials are just to unlock a device, use a minimum password or PIN length of at least 6 characters. When the device unlocking credentials are also used for authentication, you must apply the full password requirements to the credentials described in 'user access controls'.

๐Ÿ’ผ 2.21 Ensure access to AWSCloudShellFullAccess is restricted (Manual)

AWS CloudShell is a convenient way of running CLI commands against AWS services; a managed IAM policy ('AWSCloudShellFullAccess') provides full access to CloudShell, which allows file upload and download capability between a user's local system and the CloudShell environment. Within the CloudShell environment, a user has sudo permissions and can access the internet. Therefore, it is feasible to install file transfer software, for example, and move data from CloudShell to external internet servers.
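
To see who currently holds the policy, a short boto3 sketch (assuming permission to call `ListEntitiesForPolicy`) can enumerate the attached users, groups, and roles:

```python
import boto3

iam = boto3.client("iam")
policy_arn = "arn:aws:iam::aws:policy/AWSCloudShellFullAccess"

entities = iam.list_entities_for_policy(PolicyArn=policy_arn)
for group in entities["PolicyGroups"]:
    print("Group:", group["GroupName"])
for user in entities["PolicyUsers"]:
    print("User:", user["UserName"])
for role in entities["PolicyRoles"]:
    print("Role:", role["RoleName"])
```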

๐Ÿ’ผ 2.22 Ensure that 'Require Multifactor Authentication to register or join devices with Microsoft Entra' is set to 'Yes' (Manual)

**NOTE**: This recommendation is only relevant if your subscription is using Per-User MFA. If your organization is licensed to use Conditional Access, the preferred method of requiring MFA to join devices to Entra ID is to use a Conditional Access policy (see additional information below for link). Joining or registering devices to Microsoft Entra ID should require multi-factor authentication.

๐Ÿ’ผ 2.26 Ensure fewer than 5 users have global administrator assignment (Manual)

This recommendation aims to maintain a balance between security and operational efficiency by ensuring that a minimum of 2 and a maximum of 4 users are assigned the Global Administrator role in Microsoft Entra ID. Having at least two Global Administrators ensures redundancy, while limiting the number to four reduces the risk of excessive privileged access.

๐Ÿ’ผ 2.3.1 Ensure that encryption is enabled for RDS Instances

Amazon RDS encrypted DB instances use the industry standard AES-256 encryption algorithm to encrypt your data on the server that hosts your Amazon RDS DB instances. After your data is encrypted, Amazon RDS handles authentication of access and decryption of your data transparently with a minimal impact on performance.

๐Ÿ’ผ 2.4 Ensure CloudTrail trails are integrated with CloudWatch Logs

AWS CloudTrail is a web service that records AWS API calls made in a given AWS account. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail uses Amazon S3 for log file storage and delivery, so log files are stored durably. In addition to capturing CloudTrail logs within a specified S3 bucket for long-term analysis, real-time analysis can be performed by configuring CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all regions in an account, CloudTrail sends log files from all those regions to a CloudWatch Logs log group. It is recommended that CloudTrail logs be sent to CloudWatch Logs. Note: The intent of this recommendation is to ensure AWS account activity is being captured, monitored, and appropriately alarmed on. CloudWatch Logs is a native way to accomplish this using AWS services but does not preclude the use of an alternate solution.
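
A minimal boto3 sketch (illustrative only) that flags trails with no CloudWatch Logs log group configured; a stricter audit would also verify that log delivery to the group is recent.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")
for trail in cloudtrail.describe_trails()["trailList"]:
    if not trail.get("CloudWatchLogsLogGroupArn"):
        print(f"Not integrated with CloudWatch Logs: {trail['Name']}")
```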

๐Ÿ’ผ 2.4 Ensure log metric filter and alerts exist for project ownership assignments/changes

In order to prevent unnecessary project ownership assignments to users/service-accounts and further misuses of projects and resources, all `roles/Owner` assignments should be monitored. Members (users/Service-Accounts) with a role assignment to primitive role `roles/Owner` are project owners. The project owner has all the privileges on the project the role belongs to. These are summarized below: - All viewer permissions on all GCP Services within the project - Permissions for actions that modify the state of all GCP services within the project - Manage roles and permissions for a project and all resources within the project - Set up billing for a project Granting the owner role to a member (user/Service-Account) will allow that member to modify the Identity and Access Management (IAM) policy. Therefore, grant the owner role only if the member has a legitimate purpose to manage the IAM policy. This is because the project IAM policy contains sensitive access control data. Having a minimal set of users allowed to manage IAM policy will simplify any auditing that may be necessary.

๐Ÿ’ผ 2.4 Ensure log metric filter and alerts exist for project ownership assignments/changes - Level 1 (Automated)

In order to prevent unnecessary project ownership assignments to users/service-accounts and further misuses of projects and resources, all `roles/Owner` assignments should be monitored. Members (users/Service-Accounts) with a role assignment to primitive role `roles/Owner` are project owners. The project owner has all the privileges on the project the role belongs to. These are summarized below: ``` - All viewer permissions on all GCP Services within the project - Permissions for actions that modify the state of all GCP services within the project - Manage roles and permissions for a project and all resources within the project - Set up billing for a project ``` Granting the owner role to a member (user/Service-Account) will allow that member to modify the Identity and Access Management (IAM) policy. Therefore, grant the owner role only if the member has a legitimate purpose to manage the IAM policy. This is because the project IAM policy contains sensitive access control data. Having a minimal set of users allowed to manage IAM policy will simplify any auditing that may be necessary.

๐Ÿ’ผ 2.4 Ensure Log Metric Filter and Alerts Exist for Project Ownership Assignments/Changes - Level 1 (Automated)

In order to prevent unnecessary project ownership assignments to users/service-accounts and further misuses of projects and resources, all `roles/Owner` assignments should be monitored. Members (users/Service-Accounts) with a role assignment to primitive role `roles/Owner` are project owners. The project owner has all the privileges on the project the role belongs to. These are summarized below: ``` - All viewer permissions on all GCP Services within the project - Permissions for actions that modify the state of all GCP services within the project - Manage roles and permissions for a project and all resources within the project - Set up billing for a project ``` Granting the owner role to a member (user/Service-Account) will allow that member to modify the Identity and Access Management (IAM) policy. Therefore, grant the owner role only if the member has a legitimate purpose to manage the IAM policy. This is because the project IAM policy contains sensitive access control data. Having a minimal set of users allowed to manage IAM policy will simplify any auditing that may be necessary.

๐Ÿ’ผ 2.4 Ensure Log Metric Filter and Alerts Exist for Project Ownership Assignments/Changes - Level 1 (Automated)

In order to prevent unnecessary project ownership assignments to users/service-accounts and further misuses of projects and resources, all roles/Owner assignments should be monitored. Members (users/Service-Accounts) with a role assignment to primitive role roles/Owner are project owners. The project owner has all the privileges on the project the role belongs to. These are summarized below: - All viewer permissions on all GCP Services within the project - Permissions for actions that modify the state of all GCP services within the project - Manage roles and permissions for a project and all resources within the project - Set up billing for a project Granting the owner role to a member (user/Service-Account) will allow that member to modify the Identity and Access Management (IAM) policy. Therefore, grant the owner role only if the member has a legitimate purpose to manage the IAM policy. This is because the project IAM policy contains sensitive access control data. Having a minimal set of users allowed to manage IAM policy will simplify any auditing that may be necessary.

๐Ÿ’ผ 2.4 Ensure Log Metric Filter and Alerts Exist for Project Ownership Assignments/Changes - Level 1 (Automated)

In order to prevent unnecessary project ownership assignments to users/service-accounts and further misuses of projects and resources, all `roles/Owner` assignments should be monitored. Members (users/Service-Accounts) with a role assignment to primitive role `roles/Owner` are project owners. The project owner has all the privileges on the project the role belongs to. These are summarized below: - All viewer permissions on all GCP Services within the project - Permissions for actions that modify the state of all GCP services within the project - Manage roles and permissions for a project and all resources within the project - Set up billing for a project Granting the owner role to a member (user/Service-Account) will allow that member to modify the Identity and Access Management (IAM) policy. Therefore, grant the owner role only if the member has a legitimate purpose to manage the IAM policy. This is because the project IAM policy contains sensitive access control data. Having a minimal set of users allowed to manage IAM policy will simplify any auditing that may be necessary.

๐Ÿ’ผ 2.4 Ensure MFA is enabled for the 'root' user account (Automated)

The 'root' user account is the most privileged user in an AWS account. Multi-factor Authentication (MFA) adds an extra layer of protection on top of a username and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their username and password as well as for an authentication code from their AWS MFA device. **Note**: When virtual MFA is used for 'root' accounts, it is recommended that the device used is NOT a personal device, but rather a dedicated mobile device (tablet or phone) that is kept charged and secured, independent of any individual personal devices ("nonpersonal virtual MFA"). This lessens the risks of losing access to the MFA due to device loss, device trade-in, or if the individual owning the device is no longer employed at the company. Where an AWS Organization is using centralized root access, root credentials can be removed from member accounts. In that case it is neither possible nor necessary to configure root MFA in the member account.
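
The account summary exposes this setting directly; the check below is a boto3 sketch assuming read access to IAM.

```python
import boto3

iam = boto3.client("iam")
summary = iam.get_account_summary()["SummaryMap"]
print("Root MFA enabled:", bool(summary.get("AccountMFAEnabled")))
```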

๐Ÿ’ผ 2.4 Integrations

Microsoft Defender for Cloud ingests data from agents, extensions, and integrations. Integration allows other Azure products to send and receive data with Microsoft Defender for Cloud.

๐Ÿ’ผ 2.4.2 Ensure that Microsoft Defender for Endpoint integration with Microsoft Defender for Cloud is selected - Level 2 (Manual)

This integration setting enables Microsoft Defender for Endpoint (formerly 'Advanced Threat Protection' or 'ATP' or 'WDATP' - see additional info) to communicate with Microsoft Defender for Cloud. **IMPORTANT:** When enabling integration between DfE & DfC it needs to be taken into account that this will have some side effects that may be undesirable. 1. For server 2019 & above if defender is installed (default for these server SKU's) this will trigger a deployment of the new unified agent and link to any of the extended configuration in the Defender portal. 2. If the new unified agent is required for server SKU's of Win 2016 or Linux and lower there is additional integration that needs to be switched on and agents need to be aligned.

๐Ÿ’ผ 2.5 Ensure AWS Config is enabled in all regions

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.
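
Because configuration recorders are regional, a check loops over enabled regions. This is a sketch only, assuming boto3 and AWS Config read permissions:

```python
import boto3

session = boto3.session.Session()
regions = [r["RegionName"] for r in session.client("ec2").describe_regions()["Regions"]]

for region in regions:
    config = session.client("config", region_name=region)
    statuses = config.describe_configuration_recorder_status()["ConfigurationRecordersStatus"]
    if not any(status.get("recording") for status in statuses):
        print(f"{region}: AWS Config recorder is absent or not recording")
```

A full audit would also confirm that the recorder captures all supported resource types, including global resources.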

๐Ÿ’ผ 2.5 Ensure hardware MFA is enabled for the 'root' user account (Manual)

The 'root' user account is the most privileged user in an AWS account. MFA adds an extra layer of protection on top of a user name and password. With MFA enabled, when a user signs in to an AWS website, they will be prompted for their user name and password as well as for an authentication code from their AWS MFA device. For Level 2, it is recommended that the 'root' user account be protected with a hardware MFA. Where an AWS Organization is using centralized root access, root credentials can be removed from member accounts. In that case it is neither possible nor necessary to configure root MFA in the member account.

๐Ÿ’ผ 2.5 Ensure that the log metric filter and alerts exist for Audit Configuration changes

Google Cloud Platform (GCP) services write audit log entries to the Admin Activity and Data Access logs to help answer the questions of "who did what, where, and when?" within GCP projects. Cloud audit logging records information that includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by GCP services. Cloud audit logging provides a history of GCP API calls for an account, including API calls made via the console, SDKs, command-line tools, and other GCP services.

๐Ÿ’ผ 2.5 Ensure that the log metric filter and alerts exist for Audit Configuration changes - Level 1 (Automated)

Google Cloud Platform (GCP) services write audit log entries to the Admin Activity and Data Access logs to help answer the questions of "who did what, where, and when?" within GCP projects. Cloud audit logging records information that includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by GCP services. Cloud audit logging provides a history of GCP API calls for an account, including API calls made via the console, SDKs, command-line tools, and other GCP services.

๐Ÿ’ผ 2.5 Ensure That the Log Metric Filter and Alerts Exist for Audit Configuration Changes - Level 1 (Automated)

Google Cloud Platform (GCP) services write audit log entries to the Admin Activity and Data Access logs to help answer the questions of "who did what, where, and when?" within GCP projects. Cloud audit logging records information that includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by GCP services. Cloud audit logging provides a history of GCP API calls for an account, including API calls made via the console, SDKs, command-line tools, and other GCP services.

๐Ÿ’ผ 2.5 Ensure That the Log Metric Filter and Alerts Exist for Audit Configuration Changes - Level 1 (Automated)

Google Cloud Platform (GCP) services write audit log entries to the Admin Activity and Data Access logs to help answer the questions of "who did what, where, and when?" within GCP projects. Cloud audit logging records information that includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by GCP services. Cloud audit logging provides a history of GCP API calls for an account, including API calls made via the console, SDKs, command-line tools, and other GCP services.

๐Ÿ’ผ 2.5 Ensure That the Log Metric Filter and Alerts Exist for Audit Configuration Changes - Level 1 (Automated)

Google Cloud Platform (GCP) services write audit log entries to the Admin Activity and Data Access logs to help answer the questions of "who did what, where, and when?" within GCP projects. Cloud audit logging records information that includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by GCP services. Cloud audit logging provides a history of GCP API calls for an account, including API calls made via the console, SDKs, command-line tools, and other GCP services.

๐Ÿ’ผ 2.6 Ensure S3 bucket access logging is enabled on the CloudTrail S3 bucket

S3 Bucket Access Logging generates a log that contains access records for each request made to your S3 bucket. An access log record contains details about the request, such as the request type, the resources specified in the request, and the time and date the request was processed. It is recommended that bucket access logging be enabled on the CloudTrail S3 bucket.
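
The check can be sketched with boto3 (illustrative; assumes read access to CloudTrail and the trail's S3 bucket): `get_bucket_logging` returns no `LoggingEnabled` key when server access logging is off.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")
s3 = boto3.client("s3")

for trail in cloudtrail.describe_trails()["trailList"]:
    bucket = trail.get("S3BucketName")
    if bucket and "LoggingEnabled" not in s3.get_bucket_logging(Bucket=bucket):
        print(f"Access logging disabled on CloudTrail bucket: {bucket}")
```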

๐Ÿ’ผ 2.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.
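
Trails encrypted with SSE-KMS carry a `KmsKeyId`; this boto3 sketch (illustrative only) flags those that do not.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")
for trail in cloudtrail.describe_trails()["trailList"]:
    if not trail.get("KmsKeyId"):
        print(f"Trail not using SSE-KMS: {trail['Name']}")
```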

๐Ÿ’ผ 2.8 Ensure rotation for customer created CMKs is enabled

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is the key material stored within KMS and tied to the key ID of the customer-created customer master key (CMK). It is the backing key that is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled.
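
A boto3 sketch of the rotation check, assuming KMS read permissions; it skips AWS-managed and asymmetric keys, since automatic rotation applies to customer-managed symmetric keys.

```python
import boto3

kms = boto3.client("kms")

for page in kms.get_paginator("list_keys").paginate():
    for key in page["Keys"]:
        meta = kms.describe_key(KeyId=key["KeyId"])["KeyMetadata"]
        if meta["KeyManager"] != "CUSTOMER" or meta["KeySpec"] != "SYMMETRIC_DEFAULT":
            continue
        if not kms.get_key_rotation_status(KeyId=key["KeyId"])["KeyRotationEnabled"]:
            print(f"Rotation disabled: {key['KeyId']}")
```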

๐Ÿ’ผ 2.8 Ensure that a Custom Bad Password List is set to 'Enforce' for your Organization (Manual)

Microsoft Azure provides a Global Banned Password policy that applies to Azure administrative and normal user accounts. This is not applied to user accounts that are synced from an on-premises Active Directory unless Microsoft Entra ID Connect is used and you enable EnforceCloudPasswordPolicyForPasswordSyncedUsers. Please see the list in default values on the specifics of this policy. To further strengthen password security, it is recommended to also define a custom banned password policy.

๐Ÿ’ผ 2.9 Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password (Automated)

Multi-Factor Authentication (MFA) adds an extra layer of authentication assurance beyond traditional credentials. With MFA enabled, when a user signs in to the AWS Console, they will be prompted for their user name and password as well as for an authentication code from their physical or virtual MFA token. It is recommended that MFA be enabled for all accounts that have a console password.
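
Sketch of the check with boto3 (assumes IAM read permissions): flag users that have a console password but no MFA device registered.

```python
import boto3
from botocore.exceptions import ClientError

iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        try:
            iam.get_login_profile(UserName=name)
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchEntity":
                continue  # user has no console password
            raise
        if not iam.list_mfa_devices(UserName=name)["MFADevices"]:
            print(f"{name}: console password without MFA")
```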

๐Ÿ’ผ 2.9 Ensure VPC flow logging is enabled in all VPCs

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.
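
A boto3 sketch for a single region (illustrative; a complete check would loop over all enabled regions): list VPCs that have no flow log attached at the VPC level.

```python
import boto3

ec2 = boto3.client("ec2")
vpc_ids = [vpc["VpcId"] for vpc in ec2.describe_vpcs()["Vpcs"]]
logged = {fl["ResourceId"] for fl in ec2.describe_flow_logs()["FlowLogs"]}

for vpc_id in vpc_ids:
    if vpc_id not in logged:
        print(f"No flow log configured for {vpc_id}")
```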

๐Ÿ’ผ 20 Under CPS 234, an APRA-regulated entity must actively maintain an information security capability with respect to changes in vulnerabilities and threats. Accordingly, an entity would typically adopt an adaptive and forward-looking approach to maintaining its information security capability, including ongoing investment in resources, skills and controls. This would commonly be achieved through the execution of an information security strategy which responds to the changing environment throughout the year. The strategy could be informed by existing and emerging information security vulnerabilities and threats, contemporary industry practices, information security incidents, both internal and external, and known information security issues. Oversight of execution of the strategy is normally the responsibility of the Board or a delegated governing body with representation from across the organisation.

๐Ÿ’ผ 23 An APRA-regulated entity could consider implementing processes that ensure compliance with its information security policy framework and regulatory requirements. This could include an exemption policy defining registration, authorisation and duration requirements. Exemptions are typically administered using a register detailing nature, rationale and expiry date. APRA envisages that an entity would review and assess the adequacy of compensating controls both initially and on an ongoing basis

๐Ÿ’ผ 25 An APRA-regulated entity would typically periodically evaluate the effectiveness and completeness of its information security policy framework through a review of incidents that have occurred as well as comparisons to peers and established control frameworks and standards. Adjustments would be made to the policy framework to ensure its continued effectiveness. This assessment would typically also be conducted in response to a material change to information assets or the business environment.

💼 27 Under CPS 234, all information assets must be classified by criticality and sensitivity. This includes infrastructure, ancillary systems such as environmental control systems and physical access control systems as well as information assets managed by third parties and related parties. Furthermore, APRA-regulated entities could benefit from considering the interrelationships between information assets, including identifying information assets which are not intrinsically critical or sensitive but could be used to compromise information assets which are critical or sensitive.

๐Ÿ’ผ 29 In order to identify and classify information assets, an APRA-regulated entity would benefit from maintaining a classification methodology that provides clarity as to what constitutes an information asset, granularity considerations and the method for rating criticality and sensitivity. The rating could take into account the impact of an information security compromise on an information asset. Notably, an information asset could be assessed as having a different rating from the perspective of its criticality and sensitivity.

๐Ÿ’ผ 3 Control Plane Configuration

This section contains recommendations for cluster-wide areas, such as authentication and logging. Unlike section 1, these recommendations should apply to all deployments.

๐Ÿ’ผ 3 Logging

This section contains recommendations for configuring AWS's account logging features.

๐Ÿ’ผ 3 Monitoring

For effectiveness and coverage of the recommended metric filters and alarms, the recommendations in Section 3 should be implemented on the multi-region CloudTrail referred to in 'Ensure CloudTrail is enabled in all regions'. This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section depend on the 'Ensure CloudTrail is enabled in all regions' and 'Ensure CloudTrail trails are integrated with CloudWatch Logs' recommendations in the "Logging" section. Additionally, step 3 of the remediation procedure for those recommendations provides guidance for establishing an email-based subscription ('--protocol email'); this is provided as an example and is not meant to suggest other protocols provide lesser value.

๐Ÿ’ผ 3 Security

This section covers security best practice recommendations for products in the Azure Security services category. Azure Product Category Page: <https://azure.microsoft.com/en-us/products/category/security>

๐Ÿ’ผ 3 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 3 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 3 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 3 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 3 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions - Level 1 (Automated)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions - Level 1 (Automated)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions - Level 1 (Automated)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions (Automated)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions (Automated)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Ensure CloudTrail is enabled in all regions (Manual)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).

๐Ÿ’ผ 3.1 Keep cardholder data storage to a minimum by implementing data retention and disposal policies.

Procedures and processes that include at least the following for all cardholder data (CHD) storage: - Limiting data storage amount and retention time to that which is required for legal, regulatory, and/or business requirements - Specific retention requirements for cardholder data - Processes for secure deletion of data when no longer needed - A quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention.

๐Ÿ’ผ 3.1 Microsoft Defender for Cloud

This subsection provides guidance on the use of Microsoft Defender for Cloud and associated product plans. This guidance is intended to ensure that - at a minimum - the protective measures offered by these plans are being considered. Organizations may find that they have existing products or services that provide the same utility as some Microsoft Defender for Cloud products. Security and Administrative personnel need to make the determination on their organization's behalf regarding which - if any - of these recommendations are relevant to their organization's needs. In consideration of the above, and because of the potential for increased cost and complexity, please be aware that all Microsoft Defender for Cloud and associated plan recommendations are profiled as "Level 2" recommendations.

๐Ÿ’ผ 3.1.1 Client certificate authentication should not be used for users (Not Scored)

Kubernetes provides the option to use client certificates for user authentication. However, as there is no way to revoke these certificates when a user leaves an organization or loses their credential, they are not suitable for this purpose. It is not possible to fully disable client certificate use within a cluster as it is used for component to component authentication.

๐Ÿ’ผ 3.1.1 Ensure that Azure Databricks is deployed in a customer-managed virtual network (VNet) (Automated)

Networking for Azure Databricks can be set up in a few different ways. Using a customer-managed Virtual Network (VNet) (also known as VNet Injection) ensures that compute clusters and control planes are securely isolated within the organization's network boundary. By default, Databricks creates a managed VNet, which provides limited control over network security policies, firewall configurations, and routing.

๐Ÿ’ผ 3.1.1 Microsoft Cloud Security Posture Management (CSPM)

Microsoft Defender for Cloud offers foundational and advanced Cloud Security Posture Management (CSPM) solutions to protect across multi-cloud and hybrid environments. Both solutions cover PaaS as well as IaaS. CSPM provides reporting functionality on security and regulatory frameworks including NIST 800 series, ISO 27001, PCI-DSS, CIS Benchmarks and Controls, and many more. CSPM also provides the ability to create your own custom framework, but this will require significant work. Regulatory standards are reported in a compliance dashboard which offers a summarized view against deployed standards and presents the ability to download compliance reports in various formats. CSPM has two types of implementations: 1. Foundational (Free): This implementation is free and enabled by default with a limited set of features including: - Continuous assessment of the security configuration of cloud resources - Security recommendations to fix misconfigurations and weaknesses - Secure score summarizing current overall security posture 2. Full CSPM (Paid): Full CSPM is a paid product offering additional functionality including: - Identity and role assignments discovery - Network exposure detection - Attack path analysis - Cloud security explorer for risk hunting - Agentless vulnerability scanning - Agentless secrets scanning - Governance rules to drive timely remediation and accountability - Regulatory compliance and industry best practices - Data-aware security posture - Agentless discovery for Kubernetes - Agentless container vulnerability assessment Before implementing full CSPM, it is recommended that a cost review be undertaken, particularly if your tenant is heavy on IaaS, and that the plan be matched to security requirements.

๐Ÿ’ผ 3.1.1.1 Ensure that Auto provisioning of 'Log Analytics agent for Azure VMs' is Set to 'On' (Automated)

Enable automatic provisioning of the monitoring agent to collect security data. **DEPRECATION PLANNED**: The Log Analytics Agent is slated for deprecation in August 2024. The Microsoft Defender for Endpoint agent, in tandem with new agentless capabilities, will provide replacement functionality. More detail is available here: <https://techcommunity.microsoft.com/t5/microsoft-defender-for-cloud/microsoft-defender-for-cloud-strategy-and-plan-towards-log/ba-p/3883341>.

๐Ÿ’ผ 3.1.15 Ensure that Microsoft Defender External Attack Surface Monitoring (EASM) is enabled (Manual)

An organization's attack surface is the collection of assets with a public network identifier or URI that an external threat actor can see or access from outside your cloud. It is the set of points on the boundary of a system, a system element, system component, or an environment where an attacker can try to enter, cause an effect on, or extract data from, that system, system element, system component, or environment. The larger the attack surface, the harder it is to protect. This tool can be configured to scan your organization's online infrastructure such as specified domains, hosts, CIDR blocks, and SSL certificates, and store them in an Inventory. Inventory items can be added, reviewed, approved, and removed, and may contain enrichments ("insights") and additional information collected from the tool's different scan engines and open-source intelligence sources. A Defender EASM workspace will generate an Inventory of publicly exposed assets by crawling and scanning the internet using Seeds you provide when setting up the tool. Seeds can be FQDNs, IP CIDR blocks, and WHOIS records. Defender EASM will generate Insights within 24-48 hours after Seeds are provided, and these insights include vulnerability data (CVEs), ports and protocols, and weak or expired SSL certificates that could be used by an attacker for reconnaissance or exploitation. Results are classified High/Medium/Low and some of them include proposed mitigations.

๐Ÿ’ผ 3.1.2 Defender Plan: APIs

Defender for APIs in Microsoft Defender for Cloud offers full lifecycle protection, detection, and response coverage for APIs. Defender for APIs helps you to gain visibility into business-critical APIs. You can investigate and improve your API security posture, prioritize vulnerability fixes, and quickly detect active real-time threats. Defender for APIs requires additional configuration in the Microsoft API portal. **Note**: There is a cost attached to using Defender for APIs.

๐Ÿ’ผ 3.1.3.3 Ensure that 'Endpoint protection' component status is set to 'On' (Manual)

The Endpoint protection component enables Microsoft Defender for Endpoint (formerly 'Advanced Threat Protection' or 'ATP' or 'WDATP' - see additional info) to communicate with Microsoft Defender for Cloud. **IMPORTANT**: When enabling integration between DfE & DfC, it needs to be taken into account that this will have some side effects that may be undesirable. 1. For Server 2019 and above, if Defender is installed (the default for these server SKUs), this will trigger a deployment of the new unified agent and link to any of the extended configuration in the Defender portal. 2. If the new unified agent is required for server SKUs of Windows 2016 and lower, or for Linux, there is additional integration that needs to be switched on and agents need to be aligned.

๐Ÿ’ผ 3.1.4 Ensure that S3 is configured with 'Block Public Access' enabled (Automated)

Amazon S3 provides `Block public access (bucket settings)` and `Block public access (account settings)` to help you manage public access to Amazon S3 resources. By default, S3 buckets and objects are created with public access disabled. However, an IAM principal with sufficient S3 permissions can enable public access at the bucket and/or object level. While enabled, `Block public access (bucket settings)` prevents an individual bucket and its contained objects from becoming publicly accessible. Similarly, `Block public access (account settings)` prevents all buckets and their contained objects from becoming publicly accessible across the entire account.
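
For illustration, a minimal boto3 sketch of enforcing account-level Block Public Access; the account ID is a placeholder and this is one possible remediation, not the benchmark's prescribed procedure.

```python
import boto3

ACCOUNT_ID = "111122223333"  # placeholder account ID

s3control = boto3.client("s3control")

# Turn on all four account-level Block Public Access settings.
s3control.put_public_access_block(
    AccountId=ACCOUNT_ID,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Read the configuration back to confirm it is fully enabled.
config = s3control.get_public_access_block(AccountId=ACCOUNT_ID)
print(config["PublicAccessBlockConfiguration"])
```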

๐Ÿ’ผ 3.1.4 Ensure that users and groups are synced from Microsoft Entra ID to Azure Databricks (Manual)

To ensure centralized identity and access management, users and groups from Microsoft Entra ID should be synchronized with Azure Databricks. This is achieved through SCIM provisioning, which automates the creation, update, and deactivation of users and groups in Databricks based on Entra ID assignments. Enabling this integration ensures that access controls in Databricks remain consistent with corporate identity governance policies, reducing the risk of orphaned accounts, stale permissions, and unauthorized access.

๐Ÿ’ผ 3.1.4.1 Ensure That Microsoft Defender for Containers Is Set To 'On' (Automated)

Turning on Microsoft Defender for Containers enables threat detection for Container Registries including Kubernetes, providing threat intelligence, anomaly detection, and behavior analytics in the Microsoft Defender for Cloud. The following services will be enabled for container instances: - Defender agent in Azure - Azure Policy for Kubernetes - Agentless discovery for Kubernetes - Agentless container vulnerability assessment

๐Ÿ’ผ 3.1.5 Ensure that Unity Catalog is configured for Azure Databricks (Manual)

Unity Catalog is a centralized governance model for managing and securing data in Azure Databricks. It provides fine-grained access control to databases, tables, and views using Microsoft Entra ID identities. Unity Catalog also enhances data lineage, audit logging, and compliance monitoring, making it a critical component for security and governance.

๐Ÿ’ผ 3.1.6 Ensure that usage is restricted and expiry is enforced for Databricks personal access tokens (Manual)

Databricks personal access tokens (PATs) provide API-based authentication for users and applications. By default, users can generate API tokens without expiration, leading to potential security risks if tokens are leaked, improperly stored, or not rotated regularly. To mitigate these risks, administrators should: - Restrict token creation to approved users and service principals. - Enforce expiration policies to prevent long-lived tokens. - Monitor token usage and revoke unused or compromised tokens.

๐Ÿ’ผ 3.1.7 Ensure that diagnostic log delivery is configured for Azure Databricks (Manual)

Azure Databricks Diagnostic Logging provides insights into system operations, user activities, and security events within a Databricks workspace. Enabling diagnostic logs helps organizations: - Detect security threats by logging access, job executions, and cluster activities. - Ensure compliance with industry regulations such as SOC 2, HIPAA, and GDPR. - Monitor operational performance and troubleshoot issues proactively.

๐Ÿ’ผ 3.10 Ensure a log metric filter and alarm exist for security group changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Security Groups are a stateful packet filter that controls ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established for changes to Security Groups.
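
As a sketch of one way this could be implemented with boto3, assuming CloudTrail already delivers to a CloudWatch Logs log group; the log group name, SNS topic ARN, metric namespace, and filter pattern below are illustrative assumptions.

```python
import boto3

# Placeholders: an existing CloudTrail log group and an SNS topic for alerts.
LOG_GROUP = "CloudTrail/DefaultLogGroup"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:security-alarms"

# Match the security-group change API calls recorded by CloudTrail.
FILTER_PATTERN = (
    "{ ($.eventName = AuthorizeSecurityGroupIngress) || "
    "($.eventName = AuthorizeSecurityGroupEgress) || "
    "($.eventName = RevokeSecurityGroupIngress) || "
    "($.eventName = RevokeSecurityGroupEgress) || "
    "($.eventName = CreateSecurityGroup) || ($.eventName = DeleteSecurityGroup) }"
)

logs = boto3.client("logs")
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="SecurityGroupChanges",
    filterPattern=FILTER_PATTERN,
    metricTransformations=[{
        "metricName": "SecurityGroupEventCount",
        "metricNamespace": "CISBenchmark",
        "metricValue": "1",
    }],
)

# Alarm whenever at least one matching event occurs in a 5-minute period.
cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="SecurityGroupChangesAlarm",
    MetricName="SecurityGroupEventCount",
    Namespace="CISBenchmark",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[SNS_TOPIC_ARN],
)
```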

๐Ÿ’ผ 3.10 Ensure Private Endpoints are used to access Storage Accounts - Level 1 (Automated)

Use private endpoints for your Azure Storage accounts to allow clients and services to securely access data located over a network via an encrypted Private Link. To do this, the private endpoint uses an IP address from the VNet for each service. Network traffic between disparate services then traverses the VNet securely over an encrypted connection. This VNet can also link address spaces, extending your network and providing access to resources on it. Similarly, it can be a tunnel through public networks to connect remote infrastructures together. This creates further security through segmenting network traffic and preventing outside sources from accessing it.

๐Ÿ’ผ 3.10 Ensure Private Endpoints are used to access Storage Accounts - Level 1 (Automated)

Use private endpoints for your Azure Storage accounts to allow clients and services to securely access data located over a network via an encrypted Private Link. To do this, the private endpoint uses an IP address from the VNet for each service. Network traffic between disparate services then traverses the VNet securely over an encrypted connection. This VNet can also link address spaces, extending your network and providing access to resources on it. Similarly, it can be a tunnel through public networks to connect remote infrastructures together. This creates further security through segmenting network traffic and preventing outside sources from accessing it.

๐Ÿ’ผ 3.10 Ensure Private Endpoints are used to access Storage Accounts - Level 1 (Manual)

Use private endpoints for your Azure Storage accounts to allow clients and services to securely access data located over a network via an encrypted Private Link. To do this, the private endpoint uses an IP address from the VNet for each service. Network traffic between disparate services then traverses the VNet securely over an encrypted connection. This VNet can also link address spaces, extending your network and providing access to resources on it. Similarly, it can be a tunnel through public networks to connect remote infrastructures together. This creates further security through segmenting network traffic and preventing outside sources from accessing it.

๐Ÿ’ผ 3.10 Ensure Storage logging is Enabled for Blob Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.10 Ensure Storage logging is enabled for Blob service for read, write, and delete requests - Level 2 (Manual)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.11 Ensure Soft Delete is Enabled for Azure Containers and Blob Storage - Level 1 (Automated)

Azure Storage blobs may contain data such as ePHI or financial information, which can be secret or personal. Data that is erroneously modified or deleted by an application or other storage account user will cause data loss or unavailability. It is recommended that both Azure Containers with attached Blob Storage and standalone containers with Blob Storage be made recoverable by enabling the **soft delete** configuration. This is to save and recover data when blobs or blob snapshots are deleted.

๐Ÿ’ผ 3.11 Ensure Soft Delete is Enabled for Azure Containers and Blob Storage - Level 1 (Automated)

Azure Storage blobs may contain data such as ePHI or financial information, which can be secret or personal. Data that is erroneously modified or deleted by an application or other storage account user will cause data loss or unavailability. It is recommended that both Azure Containers with attached Blob Storage and standalone containers with Blob Storage be made recoverable by enabling the soft delete configuration. This is to save and recover data when blobs or blob snapshots are deleted.

๐Ÿ’ผ 3.11 Ensure Soft Delete is Enabled for Azure Containers and Blob Storage - Level 1 (Automated)

Azure Storage blobs may contain data such as ePHI or financial information, which can be secret or personal. Data that is erroneously modified or deleted by an application or other storage account user will cause data loss or unavailability. It is recommended that both Azure Containers with attached Blob Storage and standalone containers with Blob Storage be made recoverable by enabling the **soft delete** configuration. This is to save and recover data when blobs or blob snapshots are deleted.
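
A minimal sketch of enabling blob soft delete with the azure-storage-blob v12 data-plane client; the account URL is a placeholder, and the keyword and class names reflect my understanding of that SDK version and may differ in others.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, RetentionPolicy

# Placeholder account URL; authenticate with Azure AD via DefaultAzureCredential.
client = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Enable blob soft delete with a 7-day retention window (assumed keyword name).
client.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=7)
)

# Read the service properties back to confirm the policy is in place.
print(client.get_service_properties().get("delete_retention_policy"))
```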

๐Ÿ’ผ 3.11 Ensure Storage Logging is Enabled for Table Service for 'Read', 'Write', and 'Delete' Requests - Level 2 (Automated)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages. Will be supported in the near future.

๐Ÿ’ผ 3.11 Ensure Storage logging is enabled for Table service for read, write, and delete requests - Level 2 (Manual)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages. Will be supported in the near future.

๐Ÿ’ผ 3.13 Ensure a log metric filter and alarm exist for route table changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 3.13 Ensure Storage logging is Enabled for Blob Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.13 Ensure Storage logging is Enabled for Blob Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.13 Ensure Storage logging is Enabled for Blob Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.
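
A minimal sketch of enabling classic Storage Analytics logging on the Blob service with azure-storage-blob v12; the connection string is a placeholder, and the class and keyword names are assumptions based on that SDK version.

```python
from azure.storage.blob import BlobServiceClient, BlobAnalyticsLogging, RetentionPolicy

# Placeholder connection string for the storage account.
client = BlobServiceClient.from_connection_string("<connection-string>")

# Log read, write, and delete requests and keep the logs for 90 days
# (class and keyword names assumed from the v12 SDK).
client.set_service_properties(
    analytics_logging=BlobAnalyticsLogging(
        version="1.0",
        read=True,
        write=True,
        delete=True,
        retention_policy=RetentionPolicy(enabled=True, days=90),
    )
)
```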

๐Ÿ’ผ 3.14 Ensure a log metric filter and alarm exist for VPC changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. It is possible to have more than one VPC within an account; in addition, it is also possible to create a peer connection between two VPCs, enabling network traffic to route between them. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 3.14 Ensure Storage Logging is Enabled for Table Service for 'Read', 'Write', and 'Delete' Requests - Level 2 (Automated)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.14 Ensure Storage Logging is Enabled for Table Service for 'Read', 'Write', and 'Delete' Requests - Level 2 (Automated)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.14 Ensure Storage Logging is Enabled for Table Service for 'Read', 'Write', and 'Delete' Requests - Level 2 (Automated)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 3.16 Ensure 'Cross Tenant Replication' is not enabled - Level 1 (Automated)

Cross Tenant Replication in Azure allows data to be replicated across multiple Azure tenants. While this feature can be beneficial for data sharing and availability, it also poses a significant security risk if not properly managed. Unauthorized data access, data leakage, and compliance violations are potential risks. Disabling Cross Tenant Replication ensures that data is not inadvertently replicated across different tenant boundaries without explicit authorization.

๐Ÿ’ผ 3.17 Ensure that 'Allow Blob Anonymous Access' is set to 'Disabled' - Level 1 (Automated)

The Azure Storage setting Allow Blob Anonymous Access (aka "allowBlobPublicAccess") controls whether anonymous access is allowed for blob data in a storage account. When this property is set to True, it enables public read access to blob data, which can be convenient for sharing data but may carry security risks. When set to False, it disallows public access to blob data, providing a more secure storage environment.
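
One possible way to remediate this with the azure-mgmt-storage SDK, sketched under the assumption that the StorageAccountUpdateParameters model exposes allow_blob_public_access; all resource names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountUpdateParameters

# Placeholder subscription, resource group, and account names.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Disallow anonymous (public) read access to blob data on the account
# (property name assumed from the SDK model).
client.storage_accounts.update(
    "<resource-group>",
    "<storage-account>",
    StorageAccountUpdateParameters(allow_blob_public_access=False),
)
```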

๐Ÿ’ผ 3.2 Ensure CloudTrail log file validation is enabled

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled on all CloudTrails.

๐Ÿ’ผ 3.2 Ensure CloudTrail log file validation is enabled

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled on all CloudTrails.

๐Ÿ’ผ 3.2 Ensure CloudTrail log file validation is enabled (Automated)

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or remained unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled for all CloudTrails.

๐Ÿ’ผ 3.2 Ensure CloudTrail log file validation is enabled (Automated)

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or remained unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled for all CloudTrails.

๐Ÿ’ผ 3.2 Ensure CloudTrail log file validation is enabled (Automated)

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or remained unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled for all CloudTrails.
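
As an illustrative boto3 sketch, trails without log file validation could be detected and fixed as follows; this is one possible automation, not the benchmark's audit text.

```python
import boto3

# Report trails without log file validation and enable it on each.
cloudtrail = boto3.client("cloudtrail")

for trail in cloudtrail.describe_trails()["trailList"]:
    if not trail.get("LogFileValidationEnabled"):
        print(f"Enabling log file validation on {trail['Name']}")
        cloudtrail.update_trail(Name=trail["TrailARN"], EnableLogFileValidation=True)
```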

๐Ÿ’ผ 3.2.1 Account data storage is kept to a minimum through implementation of data retention and disposal policies, procedures, and processes.

Include at least the following: - Coverage for all locations of stored account data. - Coverage for any sensitive authentication data (SAD) stored prior to completion of authorization. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Limiting data storage amount and retention time to that which is required for legal or regulatory, and/or business requirements. - Specific retention requirements for stored account data that defines length of retention period and includes a documented business justification. - Processes for secure deletion or rendering account data unrecoverable when no longer needed per the retention policy. - A process for verifying, at least once every three months, that stored account data exceeding the defined retention period has been securely deleted or rendered unrecoverable.

๐Ÿ’ผ 3.2.1 Account data storage is kept to a minimum through implementation of data retention and disposal policies, procedures, and processes.

Include at least the following: - Coverage for all locations of stored account data. - Coverage for any sensitive authentication data (SAD) stored prior to completion of authorization. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Limiting data storage amount and retention time to that which is required for legal or regulatory, and/or business requirements. - Specific retention requirements for stored account data that defines length of retention period and includes a documented business justification. - Processes for secure deletion or rendering account data unrecoverable when no longer needed per the retention policy. - A process for verifying, at least once every three months, that stored account data exceeding the defined retention period has been securely deleted or rendered unrecoverable.

๐Ÿ’ผ 3.2.1 Do not store the full contents of any track after authorization.

This data is alternatively called full track, track, track 1, track 2, and magnetic-stripe data. In the normal course of business, the following data elements from the magnetic stripe may need to be retained: - The cardholder's name - Primary account number (PAN) - Expiration date - Service code To minimize risk, store only these data elements as needed for business.

๐Ÿ’ผ 3.3 Ensure AWS Config is enabled in all regions - Level 2 (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.3 Ensure AWS Config is enabled in all regions (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration items (AWS resources), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.3 Ensure AWS Config is enabled in all regions (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration items (AWS resources), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.3 Ensure AWS Config is enabled in all regions (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration items (AWS resources), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.
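
A minimal boto3 sketch of how recorder coverage might be checked per region; the compliance criteria encoded here (all supported resource types plus global resources, recorder running) are an interpretation of the recommendation.

```python
import boto3

# Check every region for a configuration recorder that records all resource
# types (including global resources) and is currently recording.
ec2 = boto3.client("ec2")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    config = boto3.client("config", region_name=region)
    recorders = config.describe_configuration_recorders()["ConfigurationRecorders"]
    statuses = config.describe_configuration_recorder_status()["ConfigurationRecordersStatus"]
    records_all = any(
        r.get("recordingGroup", {}).get("allSupported")
        and r.get("recordingGroup", {}).get("includeGlobalResourceTypes")
        for r in recorders
    )
    recording = any(s.get("recording") for s in statuses)
    print(f"{region}: records_all_resources={records_all}, recording={recording}")
```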

๐Ÿ’ผ 3.3 Ensure Storage Logging is Enabled for Queue Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.3 Ensure Storage logging is enabled for Queue service for read, write, and delete requests

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.3 Ensure Storage logging is enabled for Queue service for read, write, and delete requests - Level 2 (Manual)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.3 Ensure that DNSSEC is enabled for Cloud DNS

Cloud Domain Name System (DNS) is a fast, reliable and cost-effective domain name system that powers millions of domains on the internet. Domain Name System Security Extensions (DNSSEC) in Cloud DNS enables domain owners to take easy steps to protect their domains against DNS hijacking and man-in-the-middle and other attacks.

๐Ÿ’ผ 3.3 Ensure that DNSSEC is enabled for Cloud DNS - Level 1 (Automated)

Cloud Domain Name System (DNS) is a fast, reliable and cost-effective domain name system that powers millions of domains on the internet. Domain Name System Security Extensions (DNSSEC) in Cloud DNS enables domain owners to take easy steps to protect their domains against DNS hijacking and man-in-the-middle and other attacks.

๐Ÿ’ผ 3.3 Ensure That DNSSEC Is Enabled for Cloud DNS - Level 1 (Automated)

Cloud Domain Name System (DNS) is a fast, reliable and cost-effective domain name system that powers millions of domains on the internet. Domain Name System Security Extensions (DNSSEC) in Cloud DNS enables domain owners to take easy steps to protect their domains against DNS hijacking and man-in-the-middle and other attacks.

๐Ÿ’ผ 3.3 Ensure That DNSSEC Is Enabled for Cloud DNS - Level 1 (Automated)

Cloud Domain Name System (DNS) is a fast, reliable and cost-effective domain name system that powers millions of domains on the internet. Domain Name System Security Extensions (DNSSEC) in Cloud DNS enables domain owners to take easy steps to protect their domains against DNS hijacking and man-in-the-middle and other attacks.

๐Ÿ’ผ 3.3 Ensure That DNSSEC Is Enabled for Cloud DNS - Level 1 (Automated)

Cloud Domain Name System (DNS) is a fast, reliable and cost-effective domain name system that powers millions of domains on the internet. Domain Name System Security Extensions (DNSSEC) in Cloud DNS enables domain owners to take easy steps to protect their domains against DNS hijacking and man-in-the-middle and other attacks.
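
For illustration, a sketch using the Cloud DNS v1 API via the Google API discovery client to turn DNSSEC on for public zones; Application Default Credentials and project discovery are assumed.

```python
from googleapiclient import discovery
import google.auth

# Use Application Default Credentials and the default project.
credentials, project_id = google.auth.default()
dns = discovery.build("dns", "v1", credentials=credentials)

# Enable DNSSEC on any public managed zone where it is not already on.
zones = dns.managedZones().list(project=project_id).execute().get("managedZones", [])
for zone in zones:
    state = zone.get("dnssecConfig", {}).get("state", "off")
    if zone.get("visibility", "public") == "public" and state != "on":
        print(f"Enabling DNSSEC on zone {zone['name']}")
        dns.managedZones().patch(
            project=project_id,
            managedZone=zone["name"],
            body={"dnssecConfig": {"state": "on"}},
        ).execute()
```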

๐Ÿ’ผ 3.3 Key Vault

This section covers security recommendations to follow for the configuration and use of Azure Key Vault.

๐Ÿ’ผ 3.3.5 Ensure the Key Vault is Recoverable (Automated)

The Key Vault contains object keys, secrets, and certificates. Accidental unavailability of a Key Vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the Key Vault objects. It is recommended the Key Vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data, including storage accounts, SQL databases, and/or dependent services provided by Key Vault objects (Keys, Secrets, Certificates) etc. This may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user. **NOTE**: In February 2025, Microsoft will enable soft-delete protection on all key vaults, and users will no longer be able to opt out of or turn off soft-delete. **WARNING**: A current limitation is that role assignments disappear when a Key Vault is deleted. All role assignments will need to be recreated after recovery.
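
A tentative sketch with the azure-mgmt-keyvault SDK; the VaultPatchParameters/VaultPatchProperties model and property names reflect my understanding of the SDK and may differ by version, and all resource names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.keyvault import KeyVaultManagementClient
from azure.mgmt.keyvault.models import VaultPatchParameters, VaultPatchProperties

# Placeholder subscription, resource group, and vault names.
client = KeyVaultManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Turn on soft delete and purge protection ("Do Not Purge") for the vault
# (model and property names assumed from the SDK).
client.vaults.update(
    resource_group_name="<resource-group>",
    vault_name="<key-vault-name>",
    parameters=VaultPatchParameters(
        properties=VaultPatchProperties(
            enable_soft_delete=True,
            enable_purge_protection=True,
        )
    ),
)
```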

๐Ÿ’ผ 3.3.6 Enable Role Based Access Control for Azure Key Vault (Automated)

The recommended way to access Key Vaults is to use the Azure Role-Based Access Control (RBAC) permissions model. Azure RBAC is an authorization system built on Azure Resource Manager that provides fine-grained access management of Azure resources. It allows users to manage Key, Secret, and Certificate permissions. It provides one place to manage all permissions across all key vaults.

๐Ÿ’ผ 3.4 All software on in-scope devices must be updated within 14 days of an update being released

All software on in-scope devices must be updated, including applying any manual configuration changes required to make the update effective, within 14 days* of an update being released, where: - the update fixes vulnerabilities described by the vendor as 'critical' or 'high risk' - the update addresses vulnerabilities with a CVSS v3 base score of 7 or above - there are no details of the level of vulnerabilities the update fixes provided by the vendor Please note: For optimum security we strongly recommend (but it's not mandatory) that all released updates are applied within 14 days of release.

๐Ÿ’ผ 3.4 Ensure CloudTrail trails are integrated with CloudWatch Logs

AWS CloudTrail is a web service that records AWS API calls made in a given AWS account. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail uses Amazon S3 for log file storage and delivery, so log files are stored durably. In addition to capturing CloudTrail logs within a specified S3 bucket for long-term analysis, real-time analysis can be performed by configuring CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all regions in an account, CloudTrail sends log files from all those regions to a CloudWatch Logs log group. It is recommended that CloudTrail logs be sent to CloudWatch Logs. Note: The intent of this recommendation is to ensure AWS account activity is being captured, monitored, and appropriately alarmed on. CloudWatch Logs is a native way to accomplish this using AWS services but does not preclude the use of an alternate solution.

๐Ÿ’ผ 3.4 Ensure CloudTrail trails are integrated with CloudWatch Logs

AWS CloudTrail is a web service that records AWS API calls made in a given AWS account. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail uses Amazon S3 for log file storage and delivery, so log files are stored durably. In addition to capturing CloudTrail logs within a specified S3 bucket for long-term analysis, real-time analysis can be performed by configuring CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all regions in an account, CloudTrail sends log files from all those regions to a CloudWatch Logs log group. It is recommended that CloudTrail logs be sent to CloudWatch Logs. Note: The intent of this recommendation is to ensure AWS account activity is being captured, monitored, and appropriately alarmed on. CloudWatch Logs is a native way to accomplish this using AWS services but does not preclude the use of an alternate solution.

๐Ÿ’ผ 3.4 Ensure CloudTrail trails are integrated with CloudWatch Logs - Level 1 (Automated)

AWS CloudTrail is a web service that records AWS API calls made in a given AWS account. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail uses Amazon S3 for log file storage and delivery, so log files are stored durably. In addition to capturing CloudTrail logs within a specified S3 bucket for long-term analysis, real-time analysis can be performed by configuring CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all regions in an account, CloudTrail sends log files from all those regions to a CloudWatch Logs log group. It is recommended that CloudTrail logs be sent to CloudWatch Logs. Note: The intent of this recommendation is to ensure AWS account activity is being captured, monitored, and appropriately alarmed on. CloudWatch Logs is a native way to accomplish this using AWS services but does not preclude the use of an alternate solution.

๐Ÿ’ผ 3.4 Ensure CloudTrail trails are integrated with CloudWatch Logs - Level 1 (Automated)

AWS CloudTrail is a web service that records AWS API calls made in a given AWS account. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail uses Amazon S3 for log file storage and delivery, so log files are stored durably. In addition to capturing CloudTrail logs within a specified S3 bucket for long-term analysis, real-time analysis can be performed by configuring CloudTrail to send logs to CloudWatch Logs. For a trail that is enabled in all regions in an account, CloudTrail sends log files from all those regions to a CloudWatch Logs log group. It is recommended that CloudTrail logs be sent to CloudWatch Logs. Note: The intent of this recommendation is to ensure AWS account activity is being captured, monitored, and appropriately alarmed on. CloudWatch Logs is a native way to accomplish this using AWS services but does not preclude the use of an alternate solution.
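
As one illustrative remediation with boto3: point an existing trail at a CloudWatch Logs log group and confirm recent delivery. The trail name, log group ARN, and IAM role ARN are placeholders; the role must allow CloudTrail to write to the log group.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Attach the trail to a CloudWatch Logs log group via a delivery role.
cloudtrail.update_trail(
    Name="management-events-trail",
    CloudWatchLogsLogGroupArn="arn:aws:logs:us-east-1:111122223333:log-group:CloudTrail/DefaultLogGroup:*",
    CloudWatchLogsRoleArn="arn:aws:iam::111122223333:role/CloudTrail_CloudWatchLogs_Role",
)

# Check when logs were last delivered to CloudWatch Logs.
status = cloudtrail.get_trail_status(Name="management-events-trail")
print(status.get("LatestCloudWatchLogsDeliveryTime"))
```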

๐Ÿ’ผ 3.4 Ensure That RSASHA1 Is Not Used for the Key-Signing Key in Cloud DNS DNSSEC - Level 1 (Automated)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for key signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.4 Ensure That RSASHA1 Is Not Used for the Key-Signing Key in Cloud DNS DNSSEC - Level 1 (Automated)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for key signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.4 Ensure That RSASHA1 Is Not Used for the Key-Signing Key in Cloud DNS DNSSEC - Level 1 (Manual)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for key signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.4 Render PAN unreadable anywhere it is stored.

Use any of the following approaches: - One-way hashes based on strong cryptography, (hash must be of the entire PAN) - Truncation (hashing cannot be used to replace the truncated segment of PAN) - Index tokens and pads (pads must be securely stored) - Strong cryptography with associated key-management processes and procedures. It is a relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of a PAN. Where hashed and truncated versions of the same PAN are present in an entity's environment, additional controls must be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.

๐Ÿ’ผ 3.5 Ensure AWS Config is enabled in all regions

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.5 Ensure AWS Config is enabled in all regions

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.5 Ensure AWS Config is enabled in all regions - Level 2 (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.5 Ensure AWS Config is enabled in all regions - Level 2 (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.

๐Ÿ’ผ 3.5 Ensure CloudTrail logs are encrypted at rest using KMS CMKs - Level 2 (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMK) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.5 Ensure CloudTrail logs are encrypted at rest using KMS CMKs (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMK) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.5 Ensure CloudTrail logs are encrypted at rest using KMS CMKs (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMK) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.5 Ensure CloudTrail logs are encrypted at rest using KMS CMKs (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMK) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.
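
A minimal boto3 sketch of applying SSE-KMS to trails that lack it; the KMS key ARN is a placeholder and its key policy is assumed to already grant CloudTrail the necessary permissions.

```python
import boto3

# Placeholder CMK; the key policy must allow CloudTrail to encrypt with it.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"

cloudtrail = boto3.client("cloudtrail")

# Enable SSE-KMS on every trail that is not already using a CMK.
for trail in cloudtrail.describe_trails()["trailList"]:
    if not trail.get("KmsKeyId"):
        print(f"Enabling SSE-KMS on {trail['Name']}")
        cloudtrail.update_trail(Name=trail["TrailARN"], KmsKeyId=KMS_KEY_ARN)
```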

๐Ÿ’ผ 3.5 Ensure Storage Logging is Enabled for Queue Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.5 Ensure Storage Logging is Enabled for Queue Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.5 Ensure Storage Logging is Enabled for Queue Service for 'Read', 'Write', and 'Delete' requests - Level 2 (Automated)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: Timing information such as start time, end-to-end latency, and server latency, authentication details, concurrency information, and the sizes of the request and response messages.

๐Ÿ’ผ 3.5 Ensure That RSASHA1 Is Not Used for the Zone-Signing Key in Cloud DNS DNSSEC - Level 1 (Automated)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for zone signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.5 Ensure That RSASHA1 Is Not Used for the Zone-Signing Key in Cloud DNS DNSSEC - Level 1 (Automated)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for zone signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.5 Ensure That RSASHA1 Is Not Used for the Zone-Signing Key in Cloud DNS DNSSEC - Level 1 (Manual)

NOTE: Currently, the SHA1 algorithm has been removed from general use by Google, and, if being used, needs to be whitelisted on a project basis by Google and will also, therefore, require a Google Cloud support contract. DNSSEC algorithm numbers in this registry may be used in CERT RRs. Zone signing (DNSSEC) and transaction security mechanisms (SIG(0) and TSIG) make use of particular subsets of these algorithms. The algorithm used for zone signing should be a recommended one and it should be strong.

๐Ÿ’ผ 3.5.1 PAN is rendered unreadable anywhere it is stored.

Using any of the following approaches: - One-way hashes based on strong cryptography of the entire PAN. - Truncation (hashing cannot be used to replace the truncated segment of PAN). - If hashed and truncated versions of the same PAN, or different truncation formats of the same PAN, are present in an environment, additional controls are in place such that the different versions cannot be correlated to reconstruct the original PAN. - Index tokens. - Strong cryptography with associated key-management processes and procedures.

๐Ÿ’ผ 3.5.1 PAN is rendered unreadable anywhere it is stored.

Using any of the following approaches: - One-way hashes based on strong cryptography of the entire PAN. - Truncation (hashing cannot be used to replace the truncated segment of PAN). - If hashed and truncated versions of the same PAN, or different truncation formats of the same PAN, are present in an environment, additional controls are in place such that the different versions cannot be correlated to reconstruct the original PAN. - Index tokens. - Strong cryptography with associated key-management processes and procedures.

๐Ÿ’ผ 3.5.1.3 If disk-level or partition-level encryption is used (rather than file-, column-, or field--level database encryption) to render PAN unreadable.

It is managed as follows: - Logical access is managed separately and independently of native operating system authentication and access control mechanisms. - Decryption keys are not associated with user accounts. - Authentication factors (passwords, passphrases, or cryptographic keys) that allow access to unencrypted data are stored securely.

๐Ÿ’ผ 3.5.1.3 If disk-level or partition-level encryption is used (rather than file-, column-, or field--level database encryption) to render PAN unreadable.

It is managed as follows: - Logical access is managed separately and independently of native operating system authentication and access control mechanisms. - Decryption keys are not associated with user accounts. - Authentication factors (passwords, passphrases, or cryptographic keys) that allow access to unencrypted data are stored securely.

๐Ÿ’ผ 3.5.3 Store secret and private keys used to encrypt/decrypt cardholder data in one (or more) of the described forms at all times.

The forms include: - Encrypted with a key-encrypting key that is at least as strong as the data-encrypting key, and that is stored separately from the data-encrypting key - Within a secure cryptographic device (such as a hardware (host) security module (HSM) or PTS-approved point-of-interaction device) - As at least two full-length key components or key shares, in accordance with an industry-accepted method It is not required that public keys be stored in one of these forms.

๐Ÿ’ผ 3.6 Ensure rotation for customer-created symmetric CMKs is enabled - Level 2 (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within KMS that is tied to the key ID of the customer-created customer master key (CMK). It is the backing key that is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.6 Ensure rotation for customer-created symmetric CMKs is enabled (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within the KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can occur transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.6 Ensure rotation for customer-created symmetric CMKs is enabled (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within the KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can occur transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.6 Ensure rotation for customer-created symmetric CMKs is enabled (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within the KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can occur transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.6 Ensure S3 bucket access logging is enabled on the CloudTrail S3 bucket

S3 Bucket Access Logging generates a log that contains access records for each request made to your S3 bucket. An access log record contains details about the request, such as the request type, the resources specified in the request, and the time and date the request was processed. It is recommended that bucket access logging be enabled on the CloudTrail S3 bucket.

๐Ÿ’ผ 3.6 Ensure S3 bucket access logging is enabled on the CloudTrail S3 bucket

S3 Bucket Access Logging generates a log that contains access records for each request made to your S3 bucket. An access log record contains details about the request, such as the request type, the resources specified in the request, and the time and date the request was processed. It is recommended that bucket access logging be enabled on the CloudTrail S3 bucket.
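
A minimal boto3 sketch of enabling server access logging on the CloudTrail bucket is shown below; both bucket names are placeholders, and the target bucket must separately grant S3 log delivery permission to write:

```
# Sketch: enable server access logging on the CloudTrail S3 bucket.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="my-cloudtrail-bucket",                   # placeholder source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-access-log-bucket",  # placeholder target bucket
            "TargetPrefix": "cloudtrail-bucket-logs/",
        }
    },
)
```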

๐Ÿ’ผ 3.6 Ensure that SSH access is restricted from the internet

GCP 'Firewall Rules' are specific to a 'VPC Network'. Each rule either 'allows' or 'denies' traffic when its conditions are met. Its conditions allow the user to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, only an 'IPv4' address or 'IPv4 block in CIDR' notation can be used. Generic '(0.0.0.0/0)' incoming traffic from the internet to a VPC or VM instance using 'SSH' on 'Port 22' can be avoided.

๐Ÿ’ผ 3.6 Ensure that SSH access is restricted from the internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow the user to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, only an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the internet to a VPC or VM instance using `SSH` on `Port 22` can be avoided.

๐Ÿ’ผ 3.6 Ensure That SSH Access Is Restricted From the Internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow the user to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, only an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the internet to a VPC or VM instance using `SSH` on `Port 22` can be avoided.

๐Ÿ’ผ 3.6 Ensure That SSH Access Is Restricted From the Internet - Level 2 (Automated)

GCP Firewall Rules are specific to a VPC Network. Each rule either allows or denies traffic when its conditions are met. Its conditions allow the user to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, only an IPv4 address or IPv4 block in CIDR notation can be used. Generic (0.0.0.0/0) incoming traffic from the internet to a VPC or VM instance using SSH on Port 22 can be avoided.

๐Ÿ’ผ 3.6 Ensure That SSH Access Is Restricted From the Internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow the user to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, only an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the internet to a VPC or VM instance using `SSH` on `Port 22` can be avoided.
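
A possible audit sketch using the Compute Engine API via google-api-python-client is shown below; the project ID is a placeholder, and the port check is simplified (it matches an explicit "22" or an all-ports rule, not port ranges):

```
# Sketch: flag ingress firewall rules that allow tcp/22 (SSH) from 0.0.0.0/0.
from googleapiclient import discovery

project = "my-project-id"  # placeholder
compute = discovery.build("compute", "v1")

request = compute.firewalls().list(project=project)
while request is not None:
    response = request.execute()
    for rule in response.get("items", []):
        if rule.get("direction") != "INGRESS" or rule.get("disabled"):
            continue
        if "0.0.0.0/0" not in rule.get("sourceRanges", []):
            continue
        for allowed in rule.get("allowed", []):
            ports = allowed.get("ports", [])  # an empty list means all ports
            if allowed.get("IPProtocol") in ("tcp", "all") and (not ports or "22" in ports):
                print(f"Overly permissive SSH rule: {rule['name']}")
    request = compute.firewalls().list_next(
        previous_request=request, previous_response=response
    )
```

The same pattern applies to the RDP control below by checking port 3389 instead of 22.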

๐Ÿ’ผ 3.6.1.1 A documented description of the cryptographic architecture is maintained.

**Additional requirement for service providers only.** That includes: - Details of all algorithms, protocols, and keys used for the protection of stored account data, including key strength and expiry date. - Preventing the use of the same cryptographic keys in production and test environments. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Description of the key usage for each key. - Inventory of any hardware security modules (HSMs), key management systems (KMS), and other secure cryptographic devices (SCDs) used for key management, including type and location of devices, as outlined in Requirement 12.3.4.

๐Ÿ’ผ 3.6.1.1 A documented description of the cryptographic architecture is maintained.

**Additional requirement for service providers only.** That includes: - Details of all algorithms, protocols, and keys used for the protection of stored account data, including key strength and expiry date. - Preventing the use of the same cryptographic keys in production and test environments. This bullet is a best practice until its effective date; refer to Applicability Notes below for details. - Description of the key usage for each key. - Inventory of any hardware security modules (HSMs), key management systems (KMS), and other secure cryptographic devices (SCDs) used for key management, including type and location of devices, as outlined in Requirement 12.3.4.

๐Ÿ’ผ 3.6.1.2 Secret and private keys used to encrypt/decrypt stored account data are stored in one (or more) of the described forms at all times.

The following forms: - Encrypted with a key-encrypting key that is at least as strong as the data-encrypting key, and that is stored separately from the data-encrypting key. - Within a secure cryptographic device (SCD), such as a hardware security module (HSM) or PTS-approved point-of-interaction device. - As at least two full-length key components or key shares, in accordance with an industry-accepted method.

๐Ÿ’ผ 3.6.1.2 Secret and private keys used to encrypt/decrypt stored account data are stored in one (or more) of the described forms at all times.

The following forms: - Encrypted with a key-encrypting key that is at least as strong as the data-encrypting key, and that is stored separately from the data-encrypting key. - Within a secure cryptographic device (SCD), such as a hardware security module (HSM) or PTS-approved point-of-interaction device. - As at least two full-length key components or key shares, in accordance with an industry-accepted method.

๐Ÿ’ผ 3.7 Ensure 'Trusted Microsoft Services' are Enabled for Storage Account Access - Level 2 (Automated)

Some Microsoft services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Microsoft services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Microsoft services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).

๐Ÿ’ผ 3.7 Ensure 'Trusted Microsoft Services' is enabled for Storage Account access - Level 2 (Manual)

Some Microsoft services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Microsoft services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Microsoft services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).

๐Ÿ’ผ 3.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs - Level 2 (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.

๐Ÿ’ผ 3.7 Ensure CloudTrail logs are encrypted at rest using KMS CMKs - Level 2 (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.
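
A minimal boto3 sketch of pointing an existing trail at a KMS key is shown below; the trail name and key ARN are placeholders, and the key policy must allow CloudTrail to use the key:

```
# Sketch: configure SSE-KMS on an existing CloudTrail trail.
import boto3

cloudtrail = boto3.client("cloudtrail")
cloudtrail.update_trail(
    Name="my-trail",                                                # placeholder
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/placeholder",  # placeholder key ARN
)
```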

๐Ÿ’ผ 3.7 Ensure that RDP access is restricted from the Internet

GCP 'Firewall Rules' are specific to a 'VPC Network'. Each rule either 'allows' or 'denies' traffic when its conditions are met. Its conditions allow users to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, an 'IPv4' address or 'IPv4 block in CIDR' notation can be used. Generic '(0.0.0.0/0)' incoming traffic from the Internet to a VPC or VM instance using 'RDP' on 'Port 3389' can be avoided.

๐Ÿ’ผ 3.7 Ensure that RDP access is restricted from the Internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow users to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the Internet to a VPC or VM instance using `RDP` on `Port 3389` can be avoided.

๐Ÿ’ผ 3.7 Ensure That RDP Access Is Restricted From the Internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow users to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the Internet to a VPC or VM instance using `RDP` on `Port 3389` can be avoided.

๐Ÿ’ผ 3.7 Ensure That RDP Access Is Restricted From the Internet - Level 2 (Automated)

GCP Firewall Rules are specific to a VPC Network. Each rule either allows or denies traffic when its conditions are met. Its conditions allow users to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, an IPv4 address or IPv4 block in CIDR notation can be used. Generic (0.0.0.0/0) incoming traffic from the Internet to a VPC or VM instance using RDP on Port 3389 can be avoided.

๐Ÿ’ผ 3.7 Ensure That RDP Access Is Restricted From the Internet - Level 2 (Automated)

GCP `Firewall Rules` are specific to a `VPC Network`. Each rule either `allows` or `denies` traffic when its conditions are met. Its conditions allow users to specify the type of traffic, such as ports and protocols, and the source or destination of the traffic, including IP addresses, subnets, and instances. Firewall rules are defined at the VPC network level and are specific to the network in which they are defined. The rules themselves cannot be shared among networks. Firewall rules only support IPv4 traffic. When specifying a source for an ingress rule or a destination for an egress rule by address, an `IPv4` address or `IPv4 block in CIDR` notation can be used. Generic `(0.0.0.0/0)` incoming traffic from the Internet to a VPC or VM instance using `RDP` on `Port 3389` can be avoided.

๐Ÿ’ผ 3.7 Ensure VPC flow logging is enabled in all VPCs (Automated)

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.

๐Ÿ’ผ 3.7 Ensure VPC flow logging is enabled in all VPCs (Automated)

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.

๐Ÿ’ผ 3.7 Ensure VPC flow logging is enabled in all VPCs (Automated)

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.
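
One possible boto3 sketch for enabling REJECT-traffic flow logs on every VPC in a region is shown below; the log group name and IAM role ARN are placeholders that must already exist:

```
# Sketch: enable VPC Flow Logs (REJECT traffic) for all VPCs in the region,
# delivered to a CloudWatch Logs group.
import boto3

ec2 = boto3.client("ec2")
vpc_ids = [v["VpcId"] for v in ec2.describe_vpcs()["Vpcs"]]

ec2.create_flow_logs(
    ResourceType="VPC",
    ResourceIds=vpc_ids,
    TrafficType="REJECT",
    LogDestinationType="cloud-watch-logs",
    LogGroupName="vpc-flow-logs",                                               # placeholder
    DeliverLogsPermissionArn="arn:aws:iam::111122223333:role/flow-logs-role",   # placeholder
)
```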

๐Ÿ’ผ 3.7.5 Key management policies procedures are implemented to include the retirement, replacement, or destruction of keys used to protect stored account data.

As deemed necessary when: - The key has reached the end of its defined cryptoperiod. - The integrity of the key has been weakened, including when personnel with knowledge of a cleartext key component leaves the company, or the role for which the key component was known. - The key is suspected of or known to be compromised. Retired or replaced keys are not used for encryption operations.

๐Ÿ’ผ 3.7.5 Key management policies procedures are implemented to include the retirement, replacement, or destruction of keys used to protect stored account data.

As deemed necessary when: - The key has reached the end of its defined cryptoperiod. - The integrity of the key has been weakened, including when personnel with knowledge of a cleartext key component leaves the company, or the role for which the key component was known. - The key is suspected of or known to be compromised. Retired or replaced keys are not used for encryption operations.

๐Ÿ’ผ 3.8 Ensure 'Trusted Microsoft Services' is enabled for Storage Account access

Some Microsoft services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Microsoft services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Microsoft services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).

๐Ÿ’ผ 3.8 Ensure rotation for customer created CMKs is enabled

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled.

๐Ÿ’ผ 3.8 Ensure rotation for customer created CMKs is enabled

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled.

๐Ÿ’ผ 3.8 Ensure rotation for customer created symmetric CMKs is enabled - Level 2 (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.8 Ensure rotation for customer created symmetric CMKs is enabled - Level 2 (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can take place transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.

๐Ÿ’ผ 3.8 Ensure soft delete is enabled for Azure Storage - Level 1 (Automated)

The Azure Storage blobs contain data like ePHI, financial, secret, or personal information. Data that is erroneously modified or accidentally deleted by an application or another storage account user can cause data loss or data unavailability. It is recommended that Azure Storage be made recoverable by enabling the **soft delete** configuration. This is to save and recover data when blobs or blob snapshots are deleted.

๐Ÿ’ผ 3.8 Ensure Soft Delete is Enabled for Azure Storage - Level 1 (Automated)

The Azure Storage blobs contain data like ePHI, financial, secret, or personal information. Data that is erroneously modified or accidentally deleted by an application or another storage account user can cause data loss or data unavailability. It is recommended that Azure Storage be made recoverable by enabling the **soft delete** configuration. This is to save and recover data when blobs or blob snapshots are deleted.

๐Ÿ’ผ 3.9 Ensure 'Allow Azure services on the trusted services list to access this storage account' is Enabled for Storage Account Access - Level 2 (Automated)

Some Azure services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Azure services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Azure services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).

๐Ÿ’ผ 3.9 Ensure 'Allow Azure services on the trusted services list to access this storage account' is Enabled for Storage Account Access - Level 2 (Automated)

Some Azure services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Azure services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Azure services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).

๐Ÿ’ผ 3.9 Ensure 'Allow Azure services on the trusted services list to access this storage account' is Enabled for Storage Account Access - Level 2 (Automated)

Some Azure services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Azure services to bypass the network rules. These services will then use strong authentication to access the storage account. If the Allow trusted Azure services exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Site Recovery, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure Networking, Azure Monitor, and Azure SQL Data Warehouse (when registered in the subscription).
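
A possible audit sketch, assuming the azure-mgmt-storage and azure-identity SDKs, is shown below; the subscription ID is a placeholder:

```
# Sketch: report storage accounts whose network rules do not bypass
# trusted Azure services.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

for account in client.storage_accounts.list():
    rules = account.network_rule_set
    # bypass is a comma-separated string such as "AzureServices" or "AzureServices, Logging"
    if rules is None or "AzureServices" not in str(rules.bypass or ""):
        print(f"Trusted services bypass not enabled: {account.name}")
```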

๐Ÿ’ผ 3.9 Ensure no HTTPS or SSL proxy load balancers permit SSL policies with weak cipher suites

Secure Sockets Layer (SSL) policies determine which Transport Layer Security (TLS) features clients are permitted to use when connecting to load balancers. To prevent usage of insecure features, SSL policies should use (a) at least TLS 1.2 with the MODERN profile; or (b) the RESTRICTED profile, because it effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version; or (c) a CUSTOM profile that does not support any of the following features: ``` TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA ```

๐Ÿ’ผ 3.9 Ensure no HTTPS or SSL proxy load balancers permit SSL policies with weak cipher suites - Level 1 (Manual)

Secure Sockets Layer (SSL) policies determine which Transport Layer Security (TLS) features clients are permitted to use when connecting to load balancers. To prevent usage of insecure features, SSL policies should use (a) at least TLS 1.2 with the MODERN profile; or (b) the RESTRICTED profile, because it effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version; or (c) a CUSTOM profile that does not support any of the following features: ``` TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA ```

๐Ÿ’ผ 3.9 Ensure No HTTPS or SSL Proxy Load Balancers Permit SSL Policies With Weak Cipher Suites - Level 1 (Manual)

Secure Sockets Layer (SSL) policies determine which Transport Layer Security (TLS) features clients are permitted to use when connecting to load balancers. To prevent usage of insecure features, SSL policies should use (a) at least TLS 1.2 with the MODERN profile; or (b) the RESTRICTED profile, because it effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version; or (c) a CUSTOM profile that does not support any of the following features: ``` TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA ```

๐Ÿ’ผ 3.9 Ensure No HTTPS or SSL Proxy Load Balancers Permit SSL Policies With Weak Cipher Suites - Level 1 (Manual)

Secure Sockets Layer (SSL) policies determine which Transport Layer Security (TLS) features clients are permitted to use when connecting to load balancers. To prevent usage of insecure features, SSL policies should use (a) at least TLS 1.2 with the MODERN profile; or (b) the RESTRICTED profile, because it effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version; or (c) a CUSTOM profile that does not support any of the following features: TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA

๐Ÿ’ผ 3.9 Ensure No HTTPS or SSL Proxy Load Balancers Permit SSL Policies With Weak Cipher Suites - Level 1 (Manual)

Secure Sockets Layer (SSL) policies determine which Transport Layer Security (TLS) features clients are permitted to use when connecting to load balancers. To prevent usage of insecure features, SSL policies should use (a) at least TLS 1.2 with the MODERN profile; or (b) the RESTRICTED profile, because it effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version; or (c) a CUSTOM profile that does not support any of the following features: ``` TLS_RSA_WITH_AES_128_GCM_SHA256 TLS_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_AES_128_CBC_SHA TLS_RSA_WITH_AES_256_CBC_SHA TLS_RSA_WITH_3DES_EDE_CBC_SHA ```
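
A possible review sketch using the Compute Engine API via google-api-python-client is shown below; the project ID is a placeholder, and the check is a simplification of the criteria above (RESTRICTED profile, or minimum TLS 1.2 with none of the listed cipher features):

```
# Sketch: flag SSL policies that may permit weak cipher suites or old TLS versions.
from googleapiclient import discovery

WEAK_FEATURES = {
    "TLS_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
}

project = "my-project-id"  # placeholder
compute = discovery.build("compute", "v1")
policies = compute.sslPolicies().list(project=project).execute().get("items", [])

for policy in policies:
    weak = WEAK_FEATURES & set(policy.get("enabledFeatures", []))
    if policy.get("profile") != "RESTRICTED" and (
        policy.get("minTlsVersion") != "TLS_1_2" or weak
    ):
        print(f"Review SSL policy {policy['name']}: "
              f"{sorted(weak) or 'minimum TLS version below 1.2'}")
```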

๐Ÿ’ผ 3.9 Ensure VPC flow logging is enabled in all VPCs

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.

๐Ÿ’ผ 3.9 Ensure VPC flow logging is enabled in all VPCs

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.

๐Ÿ’ผ 30 APRA-regulated entities record information assets in various ways, sometimes at a very granular level and sometimes at an aggregated level. For example, a system can be seen as an aggregation of the underlying components (such as applications, databases, operating systems, middleware and data sets) and treated as a single information asset for classification purposes. Alternatively, a regulated entity could choose to treat each ofthe underlying components as individual information assets in their own right. Ultimately, the level of granularity would be sufficient to determine the nature and strength of controls required to protect the information asset.

๐Ÿ’ผ 34 Under CPS 234, an APRA-regulated entity must have information security controls to protect its information assets commensurate with, amongst other things, the stage at which the information assets are within their life-cycle. This includes ensuring that information security controls remain effective at each stage of the life-cycle of the information asset and that there is formal allocation of responsibility and accountability for the information security of an information asset to an information asset owner. Typically, the information asset owner would be an individual located within the business function which is most dependent on the information asset

๐Ÿ’ผ 4 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

๐Ÿ’ผ 4 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

๐Ÿ’ผ 4 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

๐Ÿ’ผ 4 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

๐Ÿ’ผ 4 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

๐Ÿ’ผ 4 Monitoring

For effectiveness and coverage of the recommended metric filters and alarms, the recommendations in Section 3 should be implemented on the multi-region CloudTrail referred to in 'Ensure CloudTrail is enabled in all regions'. This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the 'Ensure CloudTrail is enabled in all regions' and 'Ensure CloudTrail trails are integrated with CloudWatch Logs' recommendations in the "Logging" section. Additionally, step 3 of the remediation procedure for those recommendations provides guidance for establishing an email-based subscription ('--protocol email'). This is provided as an example and is not meant to suggest that other protocols provide lesser value.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendation in the "Logging" section.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendation in the "Logging" section.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendation in the "Logging" section.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendations in the "Logging" section.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendations in the "Logging" section.

๐Ÿ’ผ 4 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendations in the "Logging" section.

๐Ÿ’ผ 4 Networking

This section contains recommendations for configuring security-related aspects of the default Virtual Private Cloud (VPC).

๐Ÿ’ผ 4 Policies

This section contains recommendations for various Kubernetes policies which are important to the security of the environment.

๐Ÿ’ผ 4 Storage Accounts

This section covers security recommendations to follow to set storage account policies on an Azure Subscription. An Azure storage account provides a unique namespace to store and access Azure Storage data objects.

๐Ÿ’ผ 4 Worker Nodes

This section consists of security recommendations for the components that run on Kubernetes worker nodes. Note that these components may also run on Kubernetes master nodes, so the recommendations in this section should be applied to master nodes as well as worker nodes where the master nodes make use of these components.

๐Ÿ’ผ 4.1 Ensure CloudTrail is enabled in all regions (Manual)

AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. CloudTrail provides a history of AWS API calls for an account, including API calls made via the Management Console, SDKs, command line tools, and higher-level AWS services (such as CloudFormation).
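
One possible boto3 sketch for verifying that a multi-Region trail exists, and creating one if not, is shown below; the trail and bucket names are placeholders, and the bucket must already carry a CloudTrail bucket policy:

```
# Sketch: ensure a multi-Region trail with log file validation exists.
import boto3

cloudtrail = boto3.client("cloudtrail")

trails = cloudtrail.describe_trails()["trailList"]
if not any(t.get("IsMultiRegionTrail") for t in trails):
    cloudtrail.create_trail(
        Name="org-trail",                     # placeholder
        S3BucketName="my-cloudtrail-bucket",  # placeholder
        IsMultiRegionTrail=True,
        EnableLogFileValidation=True,
    )
    cloudtrail.start_logging(Name="org-trail")
```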

๐Ÿ’ผ 4.1 Ensure unauthorized API calls are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.

๐Ÿ’ผ 4.1 Ensure unauthorized API calls are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.

๐Ÿ’ผ 4.1 Ensure unauthorized API calls are monitored (Automated)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.

๐Ÿ’ผ 4.1 Ensure unauthorized API calls are monitored (Automated)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.

๐Ÿ’ผ 4.1 Ensure unauthorized API calls are monitored (Automated)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.
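
A minimal boto3 sketch of the metric filter and alarm is shown below; the log group, namespace, and SNS topic ARN are placeholders, and the filter pattern follows the commonly used CIS guidance for unauthorized API calls:

```
# Sketch: metric filter + alarm for unauthorized API calls on the CloudWatch
# Logs group that receives CloudTrail events.
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",  # placeholder
    filterName="unauthorized-api-calls",
    filterPattern='{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
    metricTransformations=[{
        "metricName": "UnauthorizedAPICalls",
        "metricNamespace": "CISBenchmark",      # placeholder namespace
        "metricValue": "1",
    }],
)

cloudwatch.put_metric_alarm(
    AlarmName="unauthorized-api-calls",
    MetricName="UnauthorizedAPICalls",
    Namespace="CISBenchmark",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],  # placeholder
)
```

The same filter-plus-alarm pattern applies to the other monitoring controls in this section (security group, NACL, and network gateway changes), with only the filter pattern changing.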

๐Ÿ’ผ 4.1 SQL Server - Auditing

Auditing for Azure SQL Servers and SQL Databases tracks database events and writes them to an audit log in an Azure storage account, Log Analytics workspace, or Event Hubs. Auditing helps to maintain regulatory compliance, understand database activity, and gain insight into discrepancies and anomalies that could indicate business concerns or suspected security violations. Auditing enables and facilitates adherence to compliance standards, although it doesn't guarantee compliance. The default SQL Server auditing profile set for the SQL server is inherited by all the SQL databases that are part of the SQL server.

๐Ÿ’ผ 4.1 SQL Server - Auditing

Auditing for Azure SQL Servers and SQL Databases tracks database events and writes them to an audit log in an Azure storage account, Log Analytics workspace, or Event Hubs. Auditing helps to maintain regulatory compliance, understand database activity, and gain insight into discrepancies and anomalies that could indicate business concerns or suspected security violations. Auditing enables and facilitates adherence to compliance standards, although it doesn't guarantee compliance. The default SQL Server auditing profile set for the SQL server is inherited by all the SQL databases that are part of the SQL server.

๐Ÿ’ผ 4.1 SQL Server - Auditing

Auditing for Azure SQL Servers and SQL Databases tracks database events and writes them to an audit log in an Azure storage account, Log Analytics workspace, or Event Hubs. Auditing helps to maintain regulatory compliance, understand database activity, and gain insight into discrepancies and anomalies that could indicate business concerns or suspected security violations. Auditing enables and facilitates adherence to compliance standards, although it doesn't guarantee compliance. The default SQL Server auditing profile set for the SQL server is inherited by all the SQL databases that are part of the SQL server.

๐Ÿ’ผ 4.1 Use strong cryptography and security protocols to safeguard sensitive cardholder data during transmission over open, public networks.

Including the following: - Only trusted keys and certificates are accepted. - The protocol in use only supports secure versions or configurations. - The encryption strength is appropriate for the encryption methodology in use. Examples of open, public networks include but are not limited to: - The Internet - Wireless technologies, including 802.11 and Bluetooth - Cellular technologies, for example, Global System for Mobile communications (GSM), Code division multiple access (CDMA) - General Packet Radio Service (GPRS) - Satellite communications
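
As an illustration of the client-side requirements (not part of the requirement text), a minimal Python standard-library sketch that enforces TLS 1.2+ and certificate validation when transmitting data over a public network; the endpoint is a placeholder:

```
# Illustrative sketch: require TLS 1.2 or later and validate certificates.
import socket
import ssl

context = ssl.create_default_context()            # verifies certificates and hostnames
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions

with socket.create_connection(("example.com", 443)) as sock:           # placeholder endpoint
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version(), tls.cipher())
```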

๐Ÿ’ผ 4.1.10 Avoid non-default bindings to system:authenticated (Automated)

Avoid non-default `ClusterRoleBindings` and `RoleBindings` with the group `system:authenticated`, except the `ClusterRoleBindings` `system:basic-user`, `system:discovery`, and `system:public-info-viewer`. Google's approach to authentication is to make authenticating to Google Cloud and GKE as simple and secure as possible without adding complex configuration steps. The group `system:authenticated` includes all users with a Google account, which includes all Gmail accounts. Consider your authorization controls with this extended group scope when granting permissions. Thus, group `system:authenticated` is not recommended for non-default use.

๐Ÿ’ผ 4.1.2 Minimize access to secrets (Automated)

The Kubernetes API stores secrets, which may be service account tokens for the Kubernetes API or credentials used by workloads in the cluster. Access to these secrets should be restricted to the smallest possible group of users to reduce the risk of privilege escalation.

๐Ÿ’ผ 4.1.3 Ensure SQL server's Transparent Data Encryption (TDE) protector is encrypted with Customer-managed key - Level 2 (Automated)

Transparent Data Encryption (TDE) with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Azure Key Vault. The Azure Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

๐Ÿ’ผ 4.1.3 Ensure SQL server's Transparent Data Encryption (TDE) protector is encrypted with Customer-managed key - Level 2 (Automated)

Transparent Data Encryption (TDE) with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Azure Key Vault. The Azure Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

๐Ÿ’ผ 4.1.3 Ensure SQL server's Transparent Data Encryption (TDE) protector is encrypted with Customer-managed key - Level 2 (Automated)

Transparent Data Encryption (TDE) with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Azure Key Vault. The Azure Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

๐Ÿ’ผ 4.1.3 Minimize wildcard use in Roles and ClusterRoles (Automated)

Kubernetes Roles and ClusterRoles provide access to resources based on sets of objects and actions that can be taken on those objects. It is possible to set either of these to be the wildcard `*`, which matches all items. Use of wildcards is not optimal from a security perspective as it may allow for inadvertent access to be granted when new resources are added to the Kubernetes API either as CRDs or in later versions of the product.

๐Ÿ’ผ 4.10 Ensure a log metric filter and alarm exist for security group changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established for detecting changes to Security Groups.

๐Ÿ’ผ 4.10 Ensure a log metric filter and alarm exist for security group changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established for detecting changes to Security Groups.

๐Ÿ’ผ 4.10 Ensure security group changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established for detecting changes to Security Groups.

๐Ÿ’ผ 4.10 Ensure security group changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established for detecting changes to Security Groups.

๐Ÿ’ผ 4.10 Ensure security group changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established to detect changes to security groups.

๐Ÿ’ผ 4.10 Ensure security group changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established to detect changes to security groups.

๐Ÿ’ผ 4.10 Ensure security group changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established to detect changes to security groups.

๐Ÿ’ผ 4.10 Ensure Soft Delete is Enabled for Azure Containers and Blob Storage (Automated)

The Azure Storage blobs contain data like ePHI or Financial, which can be secret or personal. Data that is erroneously modified or deleted by an application or other storage account user will cause data loss or unavailability. It is recommended that both Azure Containers with attached Blob Storage and standalone containers with Blob Storage be made recoverable by enabling the **soft delete** configuration. This is to save and recover data when blobs or blob snapshots are deleted.
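
A possible sketch, assuming the azure-storage-blob and azure-identity SDKs, for enabling blob soft delete with a 7-day retention window; the account URL is a placeholder, and container soft delete is configured separately through the management plane:

```
# Sketch: enable blob soft delete on a storage account's blob service.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, RetentionPolicy

service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=7)
)
```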

๐Ÿ’ผ 4.10 Ensure SQL server's TDE protector is encrypted with BYOK (Use your own key)

TDE with BYOK support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with BYOK support for TDE, the DEK can be protected with an asymmetric key that is stored in the Key Vault. Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data, for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (BYOK).
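
A possible audit sketch, assuming the azure-mgmt-sql and azure-identity SDKs, for checking whether a server's TDE protector is customer-managed; the subscription ID, resource group, and server name are placeholders:

```
# Sketch: check whether the TDE protector is an Azure Key Vault (customer-managed) key.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

protector = client.encryption_protectors.get(
    resource_group_name="my-resource-group",   # placeholder
    server_name="my-sql-server",               # placeholder
    encryption_protector_name="current",
)
print("BYOK in use" if protector.server_key_type == "AzureKeyVault"
      else "Service-managed key")
```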

๐Ÿ’ผ 4.11 Ensure Network Access Control List (NACL) changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for any changes made to NACLs.

๐Ÿ’ผ 4.11 Ensure Network Access Control List (NACL) changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for any changes made to NACLs.

๐Ÿ’ผ 4.11 Ensure Network Access Control List (NACL) changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for any changes made to NACLs.

๐Ÿ’ผ 4.11 Ensure Network Access Control Lists (NACL) changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for changes made to NACLs.

๐Ÿ’ผ 4.11 Ensure Network Access Control Lists (NACL) changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for changes made to NACLs.

๐Ÿ’ผ 4.11 Ensure that Compute instances have Confidential Computing enabled - Level 2 (Automated)

Google Cloud encrypts data at-rest and in-transit, but customer data must be decrypted for processing. Confidential Computing is a breakthrough technology which encrypts data in-useโ€”while it is being processed. Confidential Computing environments keep data encrypted in memory and elsewhere outside the central processing unit (CPU). Confidential VMs leverage the Secure Encrypted Virtualization (SEV) feature of AMD EPYCโ„ข CPUs. Customer data will stay encrypted while it is used, indexed, queried, or trained on. Encryption keys are generated in hardware, per VM, and not exportable. Thanks to built-in hardware optimizations of both performance and security, there is no significant performance penalty to Confidential Computing workloads.

๐Ÿ’ผ 4.11 Ensure That Compute Instances Have Confidential Computing Enabled - Level 2 (Automated)

Google Cloud encrypts data at-rest and in-transit, but customer data must be decrypted for processing. Confidential Computing is a breakthrough technology which encrypts data in-useโ€”while it is being processed. Confidential Computing environments keep data encrypted in memory and elsewhere outside the central processing unit (CPU). Confidential VMs leverage the Secure Encrypted Virtualization (SEV) feature of AMD EPYCโ„ข CPUs. Customer data will stay encrypted while it is used, indexed, queried, or trained on. Encryption keys are generated in hardware, per VM, and not exportable. Thanks to built-in hardware optimizations of both performance and security, there is no significant performance penalty to Confidential Computing workloads.

๐Ÿ’ผ 4.11 Ensure That Compute Instances Have Confidential Computing Enabled - Level 2 (Automated)

Google Cloud encrypts data at-rest and in-transit, but customer data must be decrypted for processing. Confidential Computing is a breakthrough technology which encrypts data in-useโ€”while it is being processed. Confidential Computing environments keep data encrypted in memory and elsewhere outside the central processing unit (CPU). Confidential VMs leverage the Secure Encrypted Virtualization (SEV) feature of AMD EPYCโ„ข CPUs. Customer data will stay encrypted while it is used, indexed, queried, or trained on. Encryption keys are generated in hardware, per VM, and not exportable. Thanks to built-in hardware optimizations of both performance and security, there is no significant performance penalty to Confidential Computing workloads.

๐Ÿ’ผ 4.11 Ensure That Compute Instances Have Confidential Computing Enabled - Level 2 (Automated)

Google Cloud encrypts data at-rest and in-transit, but customer data must be decrypted for processing. Confidential Computing is a breakthrough technology which encrypts data in-useโ€”while it is being processed. Confidential Computing environments keep data encrypted in memory and elsewhere outside the central processing unit (CPU). Confidential VMs leverage the Secure Encrypted Virtualization (SEV) feature of AMD EPYCโ„ข CPUs. Customer data will stay encrypted while it is used, indexed, queried, or trained on. Encryption keys are generated in hardware, per VM, and not exportable. Thanks to built-in hardware optimizations of both performance and security, there is no significant performance penalty to Confidential Computing workloads.

๐Ÿ’ผ 4.12 Ensure changes to network gateways are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send/receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.

๐Ÿ’ผ 4.12 Ensure changes to network gateways are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send/receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.

๐Ÿ’ผ 4.12 Ensure changes to network gateways are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send and receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.
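
Reusing the NACL example above, only the filter pattern changes; a sketch of a pattern commonly used for gateway events is shown below.

```bash
# Sketch: filter pattern for network gateway changes, to be passed to put-metric-filter
FILTER_PATTERN='{ ($.eventName = CreateCustomerGateway) || ($.eventName = DeleteCustomerGateway) || ($.eventName = AttachInternetGateway) || ($.eventName = CreateInternetGateway) || ($.eventName = DeleteInternetGateway) || ($.eventName = DetachInternetGateway) }'
```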

๐Ÿ’ผ 4.12 Ensure changes to network gateways are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send and receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.

๐Ÿ’ผ 4.12 Ensure changes to network gateways are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send and receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.

๐Ÿ’ผ 4.12 Ensure Storage Logging is Enabled for Queue Service for 'Read', 'Write', and 'Delete' requests (Automated)

The Storage Queue service stores messages that may be read by any client who has access to the storage account. A queue can contain an unlimited number of messages, each of which can be up to 64KB in size using version 2011-08-18 or newer. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the queues. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.
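
A hedged Azure CLI sketch of enabling classic Storage Analytics logging for the Queue service follows; the storage account name and retention period are placeholders, and diagnostic settings are an alternative mechanism not shown here.

```bash
# Sketch: enable read/write/delete logging for the Queue service (account name is a placeholder)
az storage logging update \
  --account-name mystorageaccount \
  --services q \
  --log rwd \
  --retention 90
```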

๐Ÿ’ผ 4.12 Ensure the Latest Operating System Updates Are Installed On Your Virtual Machines in All Projects - Level 2 (Manual)

For the virtual machines where you manage the operating system in Infrastructure as a Service (IaaS), you are responsible for keeping these operating systems and programs up to date. There are multiple ways to manage updates yourself that would be difficult to fit into one recommendation. Check the CIS Benchmarks for each of your operating systems as well for potential solutions there. In this recommendation we will use a feature in Google Cloud, via its VM Manager API, to manage updates called Operating System Patch Management (referred to as OS Patch Management from here on). This may require installing the OS Config API if it is not already installed. Also, if you install custom operating systems, they may not functionally support the local OS Config agent required to gather operating system patch information and issue update commands. These update commands are the default Linux and Windows commands to install updates, such as yum or apt. This feature allows a central management service to issue those commands. OS Patch Management also does not host the updates itself, so your VMs will need to be public or be able to access the internet. This is not the only patch management solution available to your organization, and you should weigh your needs before committing to using it.
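
As a rough sketch of the prerequisite setup this recommendation relies on, the commands below enable the OS Config API and turn on VM Manager at the project level; the project ID is a placeholder, and agent installation details vary by image.

```bash
# Sketch: enable OS Config / VM Manager for a project (project ID is a placeholder)
gcloud services enable osconfig.googleapis.com --project my-project
gcloud compute project-info add-metadata \
  --project my-project \
  --metadata enable-osconfig=TRUE,enable-guest-attributes=TRUE
```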

๐Ÿ’ผ 4.12 Ensure the Latest Operating System Updates Are Installed On Your Virtual Machines in All Projects - Level 2 (Manual)

Google Cloud Virtual Machines have the ability via an OS Config agent API to periodically (about every 10 minutes) report OS inventory data. A patch compliance API periodically reads this data, and cross references metadata to determine if the latest updates are installed. This is not the only Patch Management solution available to your organization and you should weigh your needs before committing to using this method.

๐Ÿ’ผ 4.12 Ensure the Latest Operating System Updates Are Installed On Your Virtual Machines in All Projects - Level 2 (Manual)

Google Cloud Virtual Machines have the ability via an OS Config agent API to periodically (about every 10 minutes) report OS inventory data. A patch compliance API periodically reads this data, and cross references metadata to determine if the latest updates are installed. This is not the only Patch Management solution available to your organization and you should weigh your needs before committing to using this method.

๐Ÿ’ผ 4.13 Ensure a log metric filter and alarm exist for route table changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure a log metric filter and alarm exist for route table changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure route table changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure route table changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure route table changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.
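
As with the earlier examples, only the filter pattern differs; a sketch of a pattern commonly used for route table events is shown below.

```bash
# Sketch: filter pattern for route table changes, to be passed to put-metric-filter
FILTER_PATTERN='{ ($.eventName = CreateRoute) || ($.eventName = CreateRouteTable) || ($.eventName = ReplaceRoute) || ($.eventName = ReplaceRouteTableAssociation) || ($.eventName = DeleteRouteTable) || ($.eventName = DeleteRoute) || ($.eventName = DisassociateRouteTable) }'
```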

๐Ÿ’ผ 4.13 Ensure route table changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure route table changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

๐Ÿ’ผ 4.13 Ensure Storage logging is Enabled for Blob Service for 'Read', 'Write', and 'Delete' requests (Automated)

The Storage Blob service provides scalable, cost-efficient object storage in the cloud. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the blobs. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 4.14 Ensure a log metric filter and alarm exist for VPC changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. It is possible to have more than 1 VPC within an account, in addition it is also possible to create a peer connection between 2 VPCs enabling network traffic to route between VPCs. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure a log metric filter and alarm exist for VPC changes

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. It is possible to have more than 1 VPC within an account, in addition it is also possible to create a peer connection between 2 VPCs enabling network traffic to route between VPCs. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure a log metric filter and alarm exist for VPC changes - Level 1 (Automated)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs and establishing corresponding metric filters and alarms. It is possible to have more than 1 VPC within an account, in addition it is also possible to create a peer connection between 2 VPCs enabling network traffic to route between VPCs. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure Storage Logging is Enabled for Table Service for 'Read', 'Write', and 'Delete' Requests (Automated)

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design. Storage Logging happens server-side and allows details for both successful and failed requests to be recorded in the storage account. These logs allow users to see the details of read, write, and delete operations against the tables. Storage Logging log entries contain the following information about individual requests: timing information such as start time, end-to-end latency, and server latency; authentication details; concurrency information; and the sizes of the request and response messages.

๐Ÿ’ผ 4.14 Ensure VPC changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than 1 VPC within an account, in addition it is also possible to create a peer connection between 2 VPCs enabling network traffic to route between VPCs. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure VPC changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than 1 VPC within an account, in addition it is also possible to create a peer connection between 2 VPCs enabling network traffic to route between VPCs. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure VPC changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than one VPC within an account; additionally, it is also possible to create a peer connection between two VPCs, enabling network traffic to route between them. It is recommended that a metric filter and alarm be established for changes made to VPCs.
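
Again only the filter pattern changes; the sketch below covers the core VPC and VPC peering events (the full CIS pattern also lists the ClassicLink events).

```bash
# Sketch: filter pattern for VPC changes, to be passed to put-metric-filter
FILTER_PATTERN='{ ($.eventName = CreateVpc) || ($.eventName = DeleteVpc) || ($.eventName = ModifyVpcAttribute) || ($.eventName = AcceptVpcPeeringConnection) || ($.eventName = CreateVpcPeeringConnection) || ($.eventName = DeleteVpcPeeringConnection) || ($.eventName = RejectVpcPeeringConnection) }'
```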

๐Ÿ’ผ 4.14 Ensure VPC changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than one VPC within an account; additionally, it is also possible to create a peer connection between two VPCs, enabling network traffic to route between them. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.14 Ensure VPC changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than one VPC within an account; additionally, it is also possible to create a peer connection between two VPCs, enabling network traffic to route between them. It is recommended that a metric filter and alarm be established for changes made to VPCs.

๐Ÿ’ผ 4.15 Ensure AWS Organizations changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to AWS Organizations in the master AWS account.

๐Ÿ’ผ 4.15 Ensure AWS Organizations changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to AWS Organizations in the master AWS account.

๐Ÿ’ผ 4.15 Ensure AWS Organizations changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to AWS Organizations in the master AWS account.

๐Ÿ’ผ 4.16 Ensure 'Cross Tenant Replication' is not enabled (Automated)

Cross Tenant Replication in Azure allows data to be replicated across multiple Azure tenants. While this feature can be beneficial for data sharing and availability, it also poses a significant security risk if not properly managed. Unauthorized data access, data leakage, and compliance violations are potential risks. Disabling Cross Tenant Replication ensures that data is not inadvertently replicated across different tenant boundaries without explicit authorization.
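
A hedged Azure CLI sketch of disabling the setting on a single storage account follows; the account and resource group names are placeholders, and the flag assumes a recent Azure CLI version.

```bash
# Sketch: disable cross-tenant replication on a storage account (names are placeholders)
az storage account update \
  --name mystorageaccount \
  --resource-group my-rg \
  --allow-cross-tenant-replication false
```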

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled - Level 2 (Automated)

Security Hub collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled - Level 2 (Automated)

Security Hub collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled - Level 2 (Automated)

Security Hub collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled (Automated)

Security Hub collects security data from various AWS accounts, services, and supported third-party partner products, helping you analyze your security trends and identify the highest-priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from the AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled (Automated)

Security Hub collects security data from various AWS accounts, services, and supported third-party partner products, helping you analyze your security trends and identify the highest-priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from the AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.16 Ensure AWS Security Hub is enabled (Automated)

Security Hub collects security data from various AWS accounts, services, and supported third-party partner products, helping you analyze your security trends and identify the highest-priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from the AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.

๐Ÿ’ผ 4.17 Ensure that 'Allow Blob Anonymous Access' is set to 'Disabled' (Automated)

The Azure Storage setting 'Allow Blob Anonymous Access' (aka 'allowBlobPublicAccess') controls whether anonymous access is allowed for blob data in a storage account. When this property is set to True, it enables public read access to blob data, which can be convenient for sharing data but may carry security risks. When set to False, it disallows public access to blob data, providing a more secure storage environment.

๐Ÿ’ผ 4.2 Ensure CloudTrail log file validation is enabled (Automated)

CloudTrail log file validation creates a digitally signed digest file containing a hash of each log that CloudTrail writes to S3. These digest files can be used to determine whether a log file was changed, deleted, or remained unchanged after CloudTrail delivered the log. It is recommended that file validation be enabled for all CloudTrails.
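
A minimal AWS CLI sketch follows; the trail name, ARN, and start time are placeholders.

```bash
# Sketch: enable log file validation on a trail, then validate delivered logs
aws cloudtrail update-trail --name my-trail --enable-log-file-validation

aws cloudtrail validate-logs \
  --trail-arn arn:aws:cloudtrail:us-east-1:111122223333:trail/my-trail \
  --start-time 2024-01-01T00:00:00Z
```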

๐Ÿ’ผ 4.2 Ensure management console sign-in without MFA is monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).

๐Ÿ’ผ 4.2 Ensure management console sign-in without MFA is monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).

๐Ÿ’ผ 4.2 Ensure management console sign-in without MFA is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).
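
The same metric filter/alarm shape applies here; a sketch of the pattern commonly used to match console logins without MFA is shown below.

```bash
# Sketch: filter pattern for console sign-in without MFA, to be passed to put-metric-filter
FILTER_PATTERN='{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed != "Yes") }'
```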

๐Ÿ’ผ 4.2 Ensure management console sign-in without MFA is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).

๐Ÿ’ผ 4.2 Ensure management console sign-in without MFA is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).

๐Ÿ’ผ 4.2 Kubelet

This section contains recommendations for kubelet configuration. Kubelet settings may be configured using arguments on the running kubelet executable, or they may be taken from a Kubelet config file. If both are specified, the executable argument takes precedence. To find the Kubelet config file, run the following command: `ps -ef | grep kubelet | grep config`. If the `--config` argument is present, this gives the location of the Kubelet config file. This config file could be in JSON or YAML format depending on your distribution.
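
The sketch below automates that lookup; it is a best-effort shell snippet and assumes a single kubelet process whose `--config` flag, if present, points at a readable file.

```bash
# Sketch: extract the kubelet --config path from the running process and print the file
config_path=$(ps -ef | grep kubelet | grep -v grep | sed -n 's/.*--config[= ]\([^ ]*\).*/\1/p' | head -n1)
if [ -n "$config_path" ]; then
  sudo cat "$config_path"
else
  echo "kubelet appears to be configured via executable arguments only"
fi
```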

๐Ÿ’ผ 4.2 Pod Security Standards

Pod Security Standards (PSS) are recommendations for securing deployed workloads to reduce the risks of container breakout. There are a number of ways of implementing PSS, including the built-in Pod Security Admission controller, or external policy control systems which integrate with Kubernetes via validating and mutating webhooks.

๐Ÿ’ผ 4.2 SQL Server - Microsoft Defender for SQL

Microsoft Defender for SQL provides a layer of security which enables customers to detect and respond to potential threats as they occur through security alerts on anomalous activities. Users will receive an alert upon suspicious database activities, potential vulnerabilities, and SQL injection attacks, as well as anomalous database access patterns. SQL Server Threat Detection alerts provide details of suspicious activity and recommend action on how to investigate and mitigate the threat. Microsoft Defender for SQL may incur additional cost per SQL server.

๐Ÿ’ผ 4.2 SQL Server - Microsoft Defender for SQL

Microsoft Defender for SQL provides a layer of security which enables customers to detect and respond to potential threats as they occur through security alerts on anomalous activities. Users will receive an alert upon suspicious database activities, potential vulnerabilities, and SQL injection attacks, as well as anomalous database access patterns. SQL Server Threat Detection alerts provide details of suspicious activity and recommend action on how to investigate and mitigate the threat. Microsoft Defender for SQL may incur additional cost per SQL server.

๐Ÿ’ผ 4.2.1 Passwords are protected against brute-force password guessing

Passwords are protected against brute-force password guessing by implementing at least one of: - multi-factor authentication - 'throttling' the rate of attempts, so that the time the user must wait between attempts increases with each unsuccessful attempt – you shouldn’t allow more than 10 guesses in 5 minutes - locking accounts after no more than 10 unsuccessful attempts

๐Ÿ’ผ 4.2.1 Strong cryptography and security protocols are implemented to safeguard PAN during transmission over open, public networks.

As follows: - Only trusted keys and certificates are accepted. - Certificates used to safeguard PAN during transmission over open, public networks are confirmed as valid and are not expired or revoked. This bullet is a best practice until its effective date; refer to applicability notes below for details. - The protocol in use supports only secure versions or configurations and does not support fallback to, or use of insecure versions, algorithms, key sizes, or implementations. - The encryption strength is appropriate for the encryption methodology in use.

๐Ÿ’ผ 4.2.1 Strong cryptography and security protocols are implemented to safeguard PAN during transmission over open, public networks.

As follows: - Only trusted keys and certificates are accepted. - Certificates used to safeguard PAN during transmission over open, public networks are confirmed as valid and are not expired or revoked. This bullet is a best practice until its effective date; refer to applicability notes below for details. - The protocol in use supports only secure versions or configurations and does not support fallback to, or use of insecure versions, algorithms, key sizes, or implementations. - The encryption strength is appropriate for the encryption methodology in use.

๐Ÿ’ผ 4.2.2 Use technical controls to manage the quality of passwords.

Use technical controls to manage the quality of passwords. This will include one of the following: - Using multi-factor authentication - A minimum password length of at least 12 characters, with no maximum length restrictions - A minimum password length of at least 8 characters, with no maximum length restrictions, and automatic blocking of common passwords using a deny list.

๐Ÿ’ผ 4.2.3 Support users to choose unique passwords for their work accounts

Support users to choose unique passwords for their work accounts by: - educating people about avoiding common passwords, such as a pet's name, common keyboard patterns or passwords they have used elsewhere. This could include teaching people to use the password generator feature built into some password managers. - encouraging people to choose longer passwords by promoting the use of multiple words (a minimum of three) to create a password (such as the NCSCโ€™s guidance on using three random words) - providing usable secure storage for passwords (for example a password manager or secure locked cabinet) with clear information about how and when it can be used. - not enforcing regular password expiry - not enforcing password complexity requirements

๐Ÿ’ผ 4.2.4 The password element of the multi-factor authentication

As well as providing an extra layer of security for passwords that aren’t protected by the other technical controls, you should always use multi-factor authentication to give extra security to administrative accounts and to accounts that are accessible from the internet. The password element of the multi-factor authentication approach must have a password length of at least 8 characters, with no maximum length restrictions.

๐Ÿ’ผ 4.3 Ensure AWS Config is enabled in all regions (Automated)

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration items (AWS resources), relationships between configuration items (AWS resources), and any configuration changes between resources. It is recommended that AWS Config be enabled in all regions.
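
A hedged audit sketch that reports recorder status in every region follows; it assumes CLI credentials with the necessary read permissions.

```bash
# Sketch: report the AWS Config recorder status in each region
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  echo "== ${region}"
  aws configservice describe-configuration-recorder-status \
    --region "${region}" \
    --query 'ConfigurationRecordersStatus[].{Name:name,Recording:recording,LastStatus:lastStatus}' \
    --output table
done
```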

๐Ÿ’ผ 4.3 Ensure the default security group of every VPC restricts all traffic

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **NOTE:** When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly because it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering - discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

๐Ÿ’ผ 4.3 Ensure usage of 'root' account is monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect the unauthorized use, or attempts to use the root account.

๐Ÿ’ผ 4.3 Ensure usage of 'root' account is monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect the unauthorized use, or attempts to use the root account.

๐Ÿ’ผ 4.3 Ensure usage of the 'root' account is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect unauthorized use or attempts to use the root account.
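
The same metric filter/alarm shape applies; a sketch of the pattern commonly used to match 'root' account activity is shown below.

```bash
# Sketch: filter pattern for 'root' account usage, to be passed to put-metric-filter
FILTER_PATTERN='{ $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" }'
```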

๐Ÿ’ผ 4.3 Ensure usage of the 'root' account is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect unauthorized use or attempts to use the root account.

๐Ÿ’ผ 4.3 Ensure usage of the 'root' account is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect unauthorized use or attempts to use the root account.

๐Ÿ’ผ 4.4 Ensure IAM policy changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

๐Ÿ’ผ 4.4 Ensure IAM policy changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

๐Ÿ’ผ 4.4 Ensure IAM policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

๐Ÿ’ผ 4.4 Ensure IAM policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

๐Ÿ’ผ 4.4 Ensure IAM policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

๐Ÿ’ผ 4.4.2 Consider external secret storage (Manual)

Consider the use of an external secrets storage and management system instead of using Kubernetes Secrets directly, if more complex secret management is required. Ensure the solution requires authentication to access secrets, has auditing of access to and use of secrets, and encrypts secrets. Some solutions also make it easier to rotate secrets.

๐Ÿ’ผ 4.5 Cosmos DB

This section groups security best practices/recommendations for Azure Cosmos DB Database Servers.

๐Ÿ’ผ 4.5 Cosmos DB

This section groups security best practices/recommendations for Azure Cosmos DB Database Servers.

๐Ÿ’ผ 4.5 Cosmos DB

This section groups security best practices/recommendations for Azure Cosmos DB Database Servers.

๐Ÿ’ผ 4.5 Ensure 'Enable connecting to serial ports' is not enabled for VM Instance

Interacting with a serial port is often referred to as the serial console, which is similar to using a terminal window, in that input and output is entirely in text mode and there is no graphical interface or mouse support. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address. Therefore interactive serial console support should be disabled.

๐Ÿ’ผ 4.5 Ensure 'Enable connecting to serial ports' is not enabled for VM Instance - Level 1 (Automated)

Interacting with a serial port is often referred to as the serial console, which is similar to using a terminal window, in that input and output is entirely in text mode and there is no graphical interface or mouse support. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address. Therefore interactive serial console support should be disabled.

๐Ÿ’ผ 4.5 Ensure 'Enable Connecting to Serial Ports' Is Not Enabled for VM Instance - Level 1 (Automated)

Interacting with a serial port is often referred to as the serial console, which is similar to using a terminal window, in that input and output is entirely in text mode and there is no graphical interface or mouse support. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address. Therefore interactive serial console support should be disabled.
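
A gcloud sketch of disabling the interactive serial console follows; the instance name and zone are placeholders, and the same metadata key can also be set at the project level.

```bash
# Sketch: disable the interactive serial console (instance name and zone are placeholders)
gcloud compute instances add-metadata my-instance \
  --zone us-central1-a \
  --metadata serial-port-enable=false

# Optionally enforce the same default project-wide
gcloud compute project-info add-metadata --metadata serial-port-enable=false
```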

๐Ÿ’ผ 4.5 Ensure 'Enable Connecting to Serial Ports' Is Not Enabled for VM Instance - Level 1 (Automated)

Interacting with a serial port is often referred to as the serial console, which is similar to using a terminal window, in that input and output is entirely in text mode and there is no graphical interface or mouse support. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address. Therefore interactive serial console support should be disabled.

๐Ÿ’ผ 4.5 Ensure โ€˜Enable Connecting to Serial Portsโ€™ Is Not Enabled for VM Instance - Level 1 (Automated)

Interacting with a serial port is often referred to as the serial console, which is similar to using a terminal window, in that input and output is entirely in text mode and there is no graphical interface or mouse support. If you enable the interactive serial console on an instance, clients can attempt to connect to that instance from any IP address. Therefore interactive serial console support should be disabled.

๐Ÿ’ผ 4.5 Ensure CloudTrail configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be used to detect changes to CloudTrail's configurations.

๐Ÿ’ผ 4.5 Ensure CloudTrail configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be used to detect changes to CloudTrail's configurations.

๐Ÿ’ผ 4.5 Ensure CloudTrail configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be used to detect changes to CloudTrail's configurations.

๐Ÿ’ผ 4.5 Ensure CloudTrail logs are encrypted at rest using KMS CMKs (Automated)

AWS CloudTrail is a web service that records AWS API calls for an account and makes those logs available to users and resources in accordance with IAM policies. AWS Key Management Service (KMS) is a managed service that helps create and control the encryption keys used to encrypt account data, and uses Hardware Security Modules (HSMs) to protect the security of encryption keys. CloudTrail logs can be configured to leverage server-side encryption (SSE) and KMS customer-created master keys (CMKs) to further protect CloudTrail logs. It is recommended that CloudTrail be configured to use SSE-KMS.
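
A minimal AWS CLI sketch follows; the trail name and key ARN are placeholders, and the KMS key policy is assumed to already grant CloudTrail permission to use the key.

```bash
# Sketch: point an existing trail at a KMS key for SSE-KMS encryption (values are placeholders)
aws cloudtrail update-trail \
  --name my-trail \
  --kms-key-id arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
```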

๐Ÿ’ผ 4.5 Ensure SQL server's TDE protector is encrypted with Customer-managed key - Level 2 (Automated)

TDE with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Key Vault. Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data, for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

๐Ÿ’ผ 4.6 Ensure rotation for customer-created symmetric CMKs is enabled (Automated)

AWS Key Management Service (KMS) allows customers to rotate the backing key, which is key material stored within the KMS that is tied to the key ID of the customer-created customer master key (CMK). The backing key is used to perform cryptographic operations such as encryption and decryption. Automated key rotation currently retains all prior backing keys so that decryption of encrypted data can occur transparently. It is recommended that CMK key rotation be enabled for symmetric keys. Key rotation cannot be enabled for any asymmetric CMK.
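
A short AWS CLI sketch with a placeholder key ID:

```bash
# Sketch: enable and verify annual rotation for a symmetric customer-managed key
aws kms enable-key-rotation --key-id 1234abcd-12ab-34cd-56ef-1234567890ab
aws kms get-key-rotation-status --key-id 1234abcd-12ab-34cd-56ef-1234567890ab
```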

๐Ÿ’ผ 4.6 Ensure SQL server's TDE protector is encrypted with Customer-managed key - Level 2 (Automated)

TDE with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Key Vault. Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data, for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

๐Ÿ’ผ 4.6 Ensure that IP forwarding is not enabled on Instances

A Compute Engine instance cannot forward a packet unless the source IP address of the packet matches the IP address of the instance. Similarly, GCP won't deliver a packet whose destination IP address is different than the IP address of the instance receiving the packet. However, both capabilities are required if you want to use instances to help route packets. Forwarding of data packets should be disabled to prevent data loss or information disclosure.

๐Ÿ’ผ 4.6 Ensure that IP forwarding is not enabled on Instances - Level 1 (Automated)

A Compute Engine instance cannot forward a packet unless the source IP address of the packet matches the IP address of the instance. Similarly, GCP won't deliver a packet whose destination IP address is different than the IP address of the instance receiving the packet. However, both capabilities are required if you want to use instances to help route packets. Forwarding of data packets should be disabled to prevent data loss or information disclosure.

๐Ÿ’ผ 4.6 Ensure That IP Forwarding Is Not Enabled on Instances - Level 1 (Automated)

A Compute Engine instance cannot forward a packet unless the source IP address of the packet matches the IP address of the instance. Similarly, GCP won't deliver a packet whose destination IP address is different than the IP address of the instance receiving the packet. However, both capabilities are required if you want to use instances to help route packets. Forwarding of data packets should be disabled to prevent data loss or information disclosure.
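
An audit-only gcloud sketch follows; because `canIpForward` is set at instance creation time, remediation generally means recreating the instance rather than toggling a flag.

```bash
# Sketch: list instances that currently have IP forwarding enabled
gcloud compute instances list \
  --filter="canIpForward=true" \
  --format="table(name,zone,canIpForward)"
```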

๐Ÿ’ผ 4.6 Ensure That IP Forwarding Is Not Enabled on Instances - Level 1 (Automated)

A Compute Engine instance cannot forward a packet unless the source IP address of the packet matches the IP address of the instance. Similarly, GCP won't deliver a packet whose destination IP address is different than the IP address of the instance receiving the packet. However, both capabilities are required if you want to use instances to help route packets. Forwarding of data packets should be disabled to prevent data loss or information disclosure.

๐Ÿ’ผ 4.6 Ensure That IP Forwarding Is Not Enabled on Instances - Level 1 (Automated)

A Compute Engine instance cannot forward a packet unless the source IP address of the packet matches the IP address of the instance. Similarly, GCP won't deliver a packet whose destination IP address is different than the IP address of the instance receiving the packet. However, both capabilities are required if you want to use instances to help route packets. Forwarding of data packets should be disabled to prevent data loss or information disclosure.

๐Ÿ’ผ 4.6 General Policies

These policies relate to general cluster management topics, like namespace best practices and policies applied to pod objects in the cluster.

๐Ÿ’ผ 4.7 Ensure VM disks for critical VMs are encrypted with Customer-Supplied Encryption Keys (CSEK)

Customer-Supplied Encryption Keys (CSEK) are a feature in Google Cloud Storage and Google Compute Engine. If you supply your own encryption keys, Google uses your key to protect the Google-generated keys used to encrypt and decrypt your data. By default, Google Compute Engine encrypts all data at rest. Compute Engine handles and manages this encryption for you without any additional actions on your part. However, if you wanted to control and manage this encryption yourself, you can provide your own encryption keys.

๐Ÿ’ผ 4.7 Ensure VM disks for critical VMs are encrypted with Customer-Supplied Encryption Keys (CSEK) - Level 2 (Automated)

Customer-Supplied Encryption Keys (CSEK) are a feature in Google Cloud Storage and Google Compute Engine. If you supply your own encryption keys, Google uses your key to protect the Google-generated keys used to encrypt and decrypt your data. By default, Google Compute Engine encrypts all data at rest. Compute Engine handles and manages this encryption for you without any additional actions on your part. However, if you wanted to control and manage this encryption yourself, you can provide your own encryption keys.

๐Ÿ’ผ 4.7 Ensure VM Disks for Critical VMs Are Encrypted With Customer-Supplied Encryption Keys (CSEK) - Level 2 (Automated)

Customer-Supplied Encryption Keys (CSEK) are a feature in Google Cloud Storage and Google Compute Engine. If you supply your own encryption keys, Google uses your key to protect the Google-generated keys used to encrypt and decrypt your data. By default, Google Compute Engine encrypts all data at rest. Compute Engine handles and manages this encryption for you without any additional actions on your part. However, if you wanted to control and manage this encryption yourself, you can provide your own encryption keys.

๐Ÿ’ผ 4.7 Ensure VM Disks for Critical VMs Are Encrypted With Customer-Supplied Encryption Keys (CSEK) - Level 2 (Automated)

Customer-Supplied Encryption Keys (CSEK) are a feature in Google Cloud Storage and Google Compute Engine. If you supply your own encryption keys, Google uses your key to protect the Google-generated keys used to encrypt and decrypt your data. By default, Google Compute Engine encrypts all data at rest. Compute Engine handles and manages this encryption for you without any additional actions on your part. However, if you wanted to control and manage this encryption yourself, you can provide your own encryption keys.

๐Ÿ’ผ 4.7 Ensure VM Disks for Critical VMs Are Encrypted With Customer-Supplied Encryption Keys (CSEK) - Level 2 (Automated)

Customer-Supplied Encryption Keys (CSEK) are a feature in Google Cloud Storage and Google Compute Engine. If you supply your own encryption keys, Google uses your key to protect the Google-generated keys used to encrypt and decrypt your data. By default, Google Compute Engine encrypts all data at rest. Compute Engine handles and manages this encryption for you without any additional actions on your part. However, if you wanted to control and manage this encryption yourself, you can provide your own encryption keys.

๐Ÿ’ผ 4.7 Ensure VPC flow logging is enabled in all VPCs (Automated)

VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. After you've created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs. It is recommended that VPC Flow Logs be enabled for packet "Rejects" for VPCs.
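
A hedged AWS CLI sketch follows; the VPC ID, log group name, and IAM role ARN are placeholders, and the role is assumed to allow delivery to CloudWatch Logs.

```bash
# Sketch: enable REJECT-only flow logs for a VPC (IDs and ARN are placeholders)
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0abc1234 \
  --traffic-type REJECT \
  --log-group-name VPCFlowLogs \
  --deliver-logs-permission-arn arn:aws:iam::111122223333:role/flow-logs-delivery
```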

๐Ÿ’ผ 4.8 Ensure 'Allow Azure services on the trusted services list to access this storage account' is Enabled for Storage Account Access (Automated)

**NOTE**: This recommendation assumes that the `Public network access` parameter is set to `Enabled from selected virtual networks and IP addresses`. Please ensure the prerequisite recommendation has been implemented before proceeding: - Ensure Default Network Access Rule for Storage Accounts is Set to Deny. Some Azure services that interact with storage accounts operate from networks that can't be granted access through network rules. To help this type of service work as intended, allow the set of trusted Azure services to bypass the network rules. These services will then use strong authentication to access the storage account. If the `Allow Azure services on the trusted services list to access this storage account` exception is enabled, the following services are granted access to the storage account: Azure Backup, Azure Data Box, Azure DevTest Labs, Azure Event Grid, Azure Event Hubs, Azure File Sync, Azure HDInsight, Azure Import/Export, Azure Monitor, Azure Networking Services, and Azure Site Recovery (when registered in the subscription).

๐Ÿ’ผ 4.8 Ensure S3 bucket policy changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

๐Ÿ’ผ 4.8 Ensure S3 bucket policy changes are monitored - Level 1 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

๐Ÿ’ผ 4.8 Ensure S3 bucket policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

๐Ÿ’ผ 4.8 Ensure S3 bucket policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

๐Ÿ’ผ 4.8 Ensure S3 bucket policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

๐Ÿ’ผ 4.9 Ensure AWS Config configuration changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

๐Ÿ’ผ 4.9 Ensure AWS Config configuration changes are monitored - Level 2 (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs, or an external Security information and event management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

๐Ÿ’ผ 4.9 Ensure AWS Config configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

๐Ÿ’ผ 4.9 Ensure AWS Config configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

๐Ÿ’ผ 4.9 Ensure AWS Config configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

💼 4.9 Ensure Private Endpoints are used to access Storage Accounts (Automated)

Use private endpoints for your Azure Storage accounts to allow clients and services to securely access data located over a network via an encrypted Private Link. To do this, the private endpoint uses an IP address from the VNet for each service. Network traffic between disparate services traverses the VNet securely in encrypted form. This VNet can also link address spaces, extending your network and giving access to resources on it. Similarly, it can be a tunnel through public networks to connect remote infrastructures together. This creates further security by segmenting network traffic and preventing outside sources from accessing it.

💼 40 An important aspect of information asset life-cycle management involves minimising vulnerabilities and maintaining support. Information security exposures could arise from hardware and software which is outdated or has limited or no support (whether through a third party, a related party or in-house). Technology that is end-of-life, out-of-support or in extended support is typically less secure by design, has a dated security model and can take longer, or is unable, to be updated to address new threats.

💼 41 Maintaining information assets therefore necessitates a disciplined approach to information asset life-cycle management, including a comprehensive understanding of assets that support the business, as well as the potential impacts of an information security compromise of these assets. Maintenance of information assets can be facilitated through the monitoring of end-of-support dates, where available, and the active identification of systems, including those that are internally-developed and which are no longer invested in or are not secure by design. A technology refresh plan with committed resourcing can also facilitate the timely replacement of hardware and software.

💼 42 Where extended support arrangements are in place, it is important that there is a clear understanding of the nature and effectiveness of these arrangements. Additionally, while extended or custom support arrangements may partially mitigate risk, they are often costly, could provide a false sense of security and can further delay remediation of ageing technology. Furthermore, support agreements of this nature typically provide hot-fixes or patches for critical vulnerabilities only, and remain constrained by the dated security model and design limitations of the technology.

💼 45 An understanding of plausible worst case scenarios can help regulated entities identify and implement additional controls to prevent or reduce the impact of such scenarios. One example is malware that infects computers and encrypts data, both on the infected computer and any connected storage, including (corporate) networks and cloud storage. Such attacks reinforce the importance of protecting the backup environment in the event that the production environment is compromised. Common techniques to achieve this include network segmentation, highly restricted and segregated access controls and network traffic flow restrictions.

💼 5 Database Services

This section covers security recommendations to follow to set general database services policies on an Azure Subscription. Subsections will address specific database types.

💼 5 Logging and Monitoring

This section covers security recommendations to follow for logging and monitoring policies on an Azure Subscription. **Scoping: A necessary exercise for effective and efficient use of Logging and Monitoring** For recommendations contained in this section, it is crucial that your organization consider and settle on the scope of application for each recommendation individually. The scope of application cannot be realistically written in a generic prescriptive way within these recommendations, so a scoping exercise is strongly recommended. A scoping exercise will help you determine which resources are "in scope" and will receive partial or complete logging and monitoring treatment, and which resources are "out of scope" and will not receive any logging and monitoring treatment. Your objectives with the scoping exercise should be to: - Produce a clear classification of resources - Understand the control requirements of any relevant security or compliance frameworks - Ensure the appropriate personnel can detect and react to threats - Ensure relevant resources have a historical register for accountability and investigation - Minimize alert fatigue and cost Release Environments provide a helpful context for understanding scope from a DevOps perspective. For example: 1. Production Environment 1. Staging Environment 1. Testing Environment 1. Development Environment While resources considered in the scope of a Production Environment might have a full set of recommendations applied for logging and monitoring, other release environments might have a limited set of recommendations applied for the sake of accountability. The names of these environments and which resources are in the scope of each environment will vary from one organization to another.

💼 5 Managed services

This section consists of security recommendations for the direct configuration of Kubernetes managed service components, namely, Google Kubernetes Engine (GKE). These recommendations are directly applicable for features which exist only as part of a managed service.

💼 5 Monitoring

This section contains recommendations for configuring AWS to assist with monitoring and responding to account activities. Metric filter-related recommendations in this section are dependent on the `Ensure CloudTrail is enabled in all regions` and `Ensure CloudTrail trails are integrated with CloudWatch Logs` recommendations in the "Logging" section.

💼 5 Networking

This section contains recommendations for configuring security-related aspects of the default Virtual Private Cloud (VPC).

💼 5 Networking

This section contains recommendations for configuring security-related aspects of AWS Virtual Private Cloud (VPC).

💼 5 Networking

This section contains recommendations for configuring security-related aspects of AWS Virtual Private Cloud (VPC).

💼 5 Networking

This section contains recommendations for configuring security-related aspects of AWS Virtual Private Cloud (VPC).

💼 5 Networking

This section contains recommendations for configuring security-related aspects of AWS Virtual Private Cloud (VPC).

💼 5 Policies

This section contains recommendations for various Kubernetes policies which are important to the security of the environment.

💼 5 Storage

This section covers recommendations addressing storage on Google Cloud Platform.

💼 5 Storage

This section covers recommendations addressing storage on Google Cloud Platform.

💼 5 Storage

This section covers recommendations addressing storage on Google Cloud Platform.

💼 5 Storage

This section covers recommendations addressing storage on Google Cloud Platform.

💼 5 Storage

This section covers recommendations addressing storage on Google Cloud Platform.

💼 5 The strength of identification and authentication would typically be commensurate with the impact should an identity be falsified. Common techniques for increasing the strength of identification and authentication include the use of strong password techniques (i.e. length, complexity, re-use limitations and frequency of change), utilisation of cryptographic techniques and increasing the number and type of authentication factors used. Authentication factors include something an individual: a. knows - for example, user IDs and passwords; b. has - for example, a security token or other devices in the person's possession used for the generation of one-time passwords; c. is - for example, retinal scans, hand scans, signature scans, digital signature, voice scans or other biometrics.

💼 5.1 Anti-malware software

Anti-malware software (option for in-scope devices running Windows or macOS, including servers, desktop computers, and laptop computers). If you use anti-malware software to protect your device, it must be configured to:

💼 5.1 Azure SQL Database

This section covers security best practice recommendations for Azure SQL Database. Azure Product Page: <https://azure.microsoft.com/en-us/products/azure-sql/database/>

💼 5.1 Configuring Diagnostic Settings

The Azure Diagnostic Settings capture control/management activities performed on a subscription or Azure AD Tenant. By default, the Azure Portal retains activity logs only for 90 days. The Diagnostic Settings define the type of events that are stored or streamed and the outputs: storage account, log analytics workspace, event hub, and others. The Diagnostic Settings, if configured properly, can ensure that all logs are retained for a longer duration. This section has recommendations for correctly configuring the Diagnostic Settings so that all logs captured are retained for longer periods. **Azure Subscriptions** When configuring Diagnostic Settings, you may choose to export in one of four ways, and you need to ensure appropriate data retention for whichever you choose. The options are Log Analytics, Event Hub, Storage Account, and Partner Solutions. It is important to ensure you are aware of and have set retention as your organization sees fit. **Azure AD Logs** In order to retain sign-in logs, user account changes, application provisioning logs, or other logs that are visible only on the Tenant in Azure AD, separate Diagnostic Settings must be specified. **Deployment by Policy** Deploying Azure diagnostics should ideally be done by policy to ensure a consistent configuration; Microsoft provides a full set of policies for all diagnostic-capable resource types in their GitHub repository. If you choose to deploy by policy, it is best to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Azure Sentinel. Be aware that this has a cost attached to it. Future versions of the CIS Azure Foundations Benchmark will aim to cover the use of policy in greater detail.

💼 5.1 Configuring Diagnostic Settings

The Azure Diagnostic Settings capture control/management activities performed on a subscription or Azure AD Tenant. By default, the Azure Portal retains activity logs only for 90 days. The Diagnostic Settings define the type of events that are stored or streamed and the outputs: storage account, log analytics workspace, event hub, and others. The Diagnostic Settings, if configured properly, can ensure that all logs are retained for a longer duration. This section has recommendations for correctly configuring the Diagnostic Settings so that all logs captured are retained for longer periods. **Azure Subscriptions** When configuring Diagnostic Settings, you may choose to export in one of four ways, and you need to ensure appropriate data retention for whichever you choose. The options are Log Analytics, Event Hub, Storage Account, and Partner Solutions. It is important to ensure you are aware of and have set retention as your organization sees fit. **Azure AD Logs** In order to retain sign-in logs, user account changes, application provisioning logs, or other logs that are visible only on the Tenant in Azure AD, separate Diagnostic Settings must be specified. **Deployment by Policy** Deploying Azure diagnostics should ideally be done by policy to ensure a consistent configuration; Microsoft provides a full set of policies for all diagnostic-capable resource types in their GitHub repository. If you choose to deploy by policy, it is best to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Azure Sentinel. Be aware that this has a cost attached to it. Future versions of the CIS Azure Foundations Benchmark will aim to cover the use of policy in greater detail.

💼 5.1 Configuring Diagnostic Settings

The Azure Diagnostic Settings capture control/management activities performed on a subscription or Azure AD Tenant. By default, the Azure Portal retains activity logs only for 90 days. The Diagnostic Settings define the type of events that are stored or streamed and the outputs: storage account, log analytics workspace, event hub, and others. The Diagnostic Settings, if configured properly, can ensure that all logs are retained for a longer duration. This section has recommendations for correctly configuring the Diagnostic Settings so that all logs captured are retained for longer periods. **Azure Subscriptions** When configuring Diagnostic Settings, you may choose to export in one of four ways, and you need to ensure appropriate data retention for whichever you choose. The options are Log Analytics workspace, Event Hub, Storage Account, and Partner Solutions. It is important to ensure you are aware of and have set retention as your organization sees fit. **Azure AD Logs** In order to retain sign-in logs, user account changes, application provisioning logs, or other logs that are visible only on the Tenant in Azure AD, separate Diagnostic Settings must be specified. **Deployment by Policy** Deploying Azure diagnostics should ideally be done by policy to ensure a consistent configuration; Microsoft provides a full set of policies for all diagnostic-capable resource types in their GitHub repository. If you choose to deploy by policy, it is best to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Azure Sentinel. Be aware that this has a cost attached to it. Future versions of the CIS Azure Foundations Benchmark will aim to cover the use of policy in greater detail.

💼 5.1 Ensure unauthorized API calls are monitored (Automated)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for unauthorized API calls.
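
As a rough illustration of how such a metric filter can be wired up, the sketch below uses boto3 to attach a filter for unauthorized API calls to an existing CloudWatch Logs log group. It is not part of the benchmark text; the log group name, metric namespace, and exact filter pattern are assumptions to adapt to your own CloudTrail-to-CloudWatch configuration.

```python
# Sketch: create a CloudWatch Logs metric filter for unauthorized API calls.
# The log group name, metric namespace, and filter pattern are illustrative
# assumptions; adapt them to your own CloudTrail-to-CloudWatch setup.
import boto3

logs = boto3.client("logs")

UNAUTHORIZED_PATTERN = (
    '{ ($.errorCode = "*UnauthorizedOperation") || '
    '($.errorCode = "AccessDenied*") }'
)

logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",   # assumed log group fed by the trail
    filterName="unauthorized-api-calls",
    filterPattern=UNAUTHORIZED_PATTERN,
    metricTransformations=[{
        "metricName": "UnauthorizedAPICalls",
        "metricNamespace": "CISBenchmark",       # assumed namespace
        "metricValue": "1",
    }],
)
```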

💼 5.1 Policies for information security

Information security policy and topic-specific policies shall be defined, approved by management, published, communicated to and acknowledged by relevant personnel and relevant interested parties, and reviewed at planned intervals and if significant changes occur.

💼 5.1.2 Ensure Diagnostic Setting captures appropriate categories - Level 1 (Automated)

**Prerequisite**: A Diagnostic Setting must exist. If a Diagnostic Setting does not exist, the navigation and options within this recommendation will not be available. Please review the recommendation at the beginning of this subsection titled: "Ensure that a 'Diagnostic Setting' exists." The diagnostic setting should be configured to log the appropriate activities from the control/management plane.

💼 5.1.2 Ensure Diagnostic Setting captures appropriate categories - Level 1 (Automated)

Prerequisite: A Diagnostic Setting must exist. If a Diagnostic Setting does not exist, the navigation and options within this recommendation will not be available. Please review the recommendation at the beginning of this subsection titled: "Ensure that a 'Diagnostic Setting' exists." The diagnostic setting should be configured to log the appropriate activities from the control/management plane.

💼 5.1.2 Ensure Diagnostic Setting captures appropriate categories - Level 1 (Automated)

**Prerequisite**: A Diagnostic Setting must exist. If a Diagnostic Setting does not exist, the navigation and options within this recommendation will not be available. Please review the recommendation at the beginning of this subsection titled: "Ensure that a 'Diagnostic Setting' exists." The diagnostic setting should be configured to log the appropriate activities from the control/management plane.

💼 5.1.2 Minimize access to secrets (Not Scored)

The Kubernetes API stores secrets, which may be service account tokens for the Kubernetes API or credentials used by workloads in the cluster. Access to these secrets should be restricted to the smallest possible group of users to reduce the risk of privilege escalation.

💼 5.1.3 Ensure SQL server's Transparent Data Encryption (TDE) protector is encrypted with Customer-managed key (Automated)

Transparent Data Encryption (TDE) with Customer-managed key support provides increased transparency and control over the TDE Protector, increased security with an HSM-backed external service, and promotion of separation of duties. With TDE, data is encrypted at rest with a symmetric key (called the database encryption key) stored in the database or data warehouse distribution. To protect this data encryption key (DEK) in the past, only a certificate that the Azure SQL Service managed could be used. Now, with Customer-managed key support for TDE, the DEK can be protected with an asymmetric key that is stored in the Azure Key Vault. The Azure Key Vault is a highly available and scalable cloud-based key store which offers central key management, leverages FIPS 140-2 Level 2 validated hardware security modules (HSMs), and allows separation of management of keys and data for additional security. Based on business needs or criticality of data/databases hosted on a SQL server, it is recommended that the TDE protector is encrypted by a key that is managed by the data owner (Customer-managed key).

💼 5.1.3 Minimize wildcard use in Roles and ClusterRoles (Not Scored)

Kubernetes Roles and ClusterRoles provide access to resources based on sets of objects and actions that can be taken on those objects. It is possible to set either of these to be the wildcard `*` which matches all items. Use of wildcards is not optimal from a security perspective as it may allow for inadvertent access to be granted when new resources are added to the Kubernetes API either as CRDs or in later versions of the product.
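
As a read-only illustration, the sketch below uses the official Kubernetes Python client to flag ClusterRoles whose rules contain the `*` wildcard; it assumes a working kubeconfig and does not modify anything.

```python
# Sketch: flag ClusterRoles whose rules use the "*" wildcard for verbs or resources.
# Read-only; assumes a kubeconfig with permission to list ClusterRoles.
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

for role in rbac.list_cluster_role().items:
    for rule in role.rules or []:
        if "*" in (rule.verbs or []) or "*" in (rule.resources or []):
            print(
                f"ClusterRole {role.metadata.name} uses a wildcard: "
                f"verbs={rule.verbs} resources={rule.resources}"
            )
```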

💼 5.1.4 Minimize access to create pods (Not Scored)

The ability to create pods in a namespace can provide a number of opportunities for privilege escalation, such as assigning privileged service accounts to these pods or mounting hostPaths with access to sensitive data (unless Pod Security Policies are implemented to restrict this access). As such, access to create new pods should be restricted to the smallest possible group of users.

💼 5.10 Ensure security group changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Security groups are stateful packet filters that control ingress and egress traffic within a VPC. It is recommended that a metric filter and alarm be established to detect changes to security groups.

💼 5.10.1 Ensure Kubernetes Web UI is Disabled (Automated)

Note: The Kubernetes web UI (Dashboard) does not have admin access by default in GKE 1.7 and higher. The Kubernetes web UI is disabled by default in GKE 1.10 and higher. In GKE 1.15 and higher, the Kubernetes web UI add-on KubernetesDashboard is no longer supported as a managed add-on. The Kubernetes Web UI (Dashboard) has been a historical source of vulnerability and should only be deployed when necessary.

💼 5.11 Ensure Network Access Control List (NACL) changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. NACLs are used as a stateless packet filter to control ingress and egress traffic for subnets within a VPC. It is recommended that a metric filter and alarm be established for any changes made to NACLs.

💼 5.11 Return of assets

Personnel and other interested parties as appropriate shall return all the organization's assets in their possession upon change or termination of their employment, contract or agreement.

💼 5.12 Classification of information

Information shall be classified according to the information security needs of the organization based on confidentiality, integrity, availability and relevant interested party requirements.

💼 5.12 Ensure changes to network gateways are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Network gateways are required to send and receive traffic to a destination outside of a VPC. It is recommended that a metric filter and alarm be established for changes to network gateways.

💼 5.13 Ensure route table changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. Routing tables are used to route network traffic between subnets and to network gateways. It is recommended that a metric filter and alarm be established for changes to route tables.

💼 5.13 Labelling of information

An appropriate set of procedures for information labelling shall be developed and implemented in accordance with the information classification scheme adopted by the organization.

💼 5.14 Ensure VPC changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is possible to have more than one VPC within an account; additionally, it is also possible to create a peer connection between two VPCs, enabling network traffic to route between them. It is recommended that a metric filter and alarm be established for changes made to VPCs.

💼 5.14 Information transfer

Information transfer rules, procedures, or agreements shall be in place for all types of transfer facilities within the organization and between the organization and other parties.

💼 5.15 Access control

Rules to control physical and logical access to information and other associated assets shall be established and implemented based on business and information security requirements.

💼 5.15 Ensure AWS Organizations changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to AWS Organizations in the master AWS account.

💼 5.16 Ensure AWS Security Hub is enabled (Automated)

Security Hub collects security data from various AWS accounts, services, and supported third-party partner products, helping you analyze your security trends and identify the highest-priority security issues. When you enable Security Hub, it begins to consume, aggregate, organize, and prioritize findings from the AWS services that you have enabled, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie. You can also enable integrations with AWS partner security products.
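
For illustration, the sketch below uses boto3 to enable Security Hub with the default standards in a single Region and then lists what was enabled; the Region is an assumption, and the call is skipped if Security Hub is already on.

```python
# Sketch: enable Security Hub with the default standards in one Region.
# The Region is an assumption; adjust or loop over Regions as needed.
import boto3

securityhub = boto3.client("securityhub", region_name="us-east-1")  # assumed Region

try:
    securityhub.enable_security_hub(EnableDefaultStandards=True)
except securityhub.exceptions.ResourceConflictException:
    pass  # Security Hub is already enabled in this Region

for subscription in securityhub.get_enabled_standards()["StandardsSubscriptions"]:
    print(subscription["StandardsArn"], subscription["StandardsStatus"])
```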

💼 5.17 Authentication information

Allocation and management of authentication information shall be controlled by a management process, including advising personnel on appropriate handling of authentication information.

💼 5.18 Access rights

Access rights to information and other associated assets shall be provisioned, reviewed, modified and removed in accordance with the organization's topic-specific policy on and rules for access control.

💼 5.2 Application allow listing

Application allow listing (option for all in-scope devices). Only approved applications, restricted by code signing, are allowed to execute on devices. You must:

💼 5.2 Azure Database for PostgreSQL

This section covers security best practice recommendations for Azure PostgreSQL Database Servers. Azure Product Page: <https://azure.microsoft.com/en-us/products/postgresql/> **RETIREMENT of Azure PostgreSQL Single Server**: Azure PostgreSQL Single Server is slated for retirement by March 25, 2025. Azure PostgreSQL Flexible Server is the newer deployment standard and is unaffected. Please use these resources to consider and prepare for migration: - <https://learn.microsoft.com/en-us/azure/postgresql/single-server/whats-happening-to-postgresql-single-server> - <https://learn.microsoft.com/en-us/azure/postgresql/migrate/concepts-single-to-flexible>

💼 5.2 Ensure management console sign-in without MFA is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for console logins that are not protected by multi-factor authentication (MFA).
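
Assuming a metric filter for console sign-ins without MFA already publishes to a custom metric, the sketch below shows one way to raise the corresponding alarm with boto3 and route it to an SNS topic; the metric name, namespace, topic name, and e-mail endpoint are all illustrative assumptions.

```python
# Sketch: alarm on an assumed "ConsoleSigninWithoutMFA" metric and notify an SNS topic.
# Assumes the matching metric filter already publishes to this metric; the metric
# name, namespace, topic name, and e-mail address are illustrative.
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

topic_arn = sns.create_topic(Name="cis-monitoring-alarms")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="security@example.com")

cloudwatch.put_metric_alarm(
    AlarmName="console-signin-without-mfa",
    MetricName="ConsoleSigninWithoutMFA",   # assumed metric name
    Namespace="CISBenchmark",               # assumed namespace
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[topic_arn],
)
```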

💼 5.2 Monitoring using Activity Log Alerts

The recommendations provided in this section are intended to provide entry-level alerting for crucial activities on a tenant account. These recommended activities **should** be tuned to your needs. By default, each of these Activity Log Alerts tends to guide the reader to alerting at the "Subscription-wide" level which will capture and alert on rules triggered by all resources and resource groups contained within a subscription. This is not an ideal rule set for Alerting within larger and more complex organizations. While this section provides recommendations for the creation of **Activity Log Alerts** specifically, Microsoft Azure supports four different types of alerts: - Metric Alerts - Log Alerts - Activity Log Alerts - Smart Detection Alerts All Azure services (Microsoft provided or otherwise) that can generate alerts are assigned a "Resource provider namespace" when they are registered in an Azure tenant. The recommendations in this section are in no way exhaustive of the plethora of available "Providers" or "Resource Types." The Resource Providers that are registered in your Azure Tenant can be located in your Subscription. Each registered Provider in your environment **may** have available "Conditions" to raise alerts via Activity Log Alerts. These providers should be considered for inclusion in Activity Log Alert rules of your own making. To view the registered resource providers in your Subscription(s), use this guide: - https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/resource-providers-and-types If you wish to create custom alerting rules for Activity Log Alerts or other alert types, please refer to Microsoft documentation: - https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-new-alert-rule

💼 5.2 Monitoring using Activity Log Alerts

The recommendations provided in this section are intended to provide entry-level alerting for crucial activities on a tenant account. These recommended activities **should** be tuned to your needs. By default, each of these Activity Log Alerts tends to guide the reader to alerting at the "Subscription-wide" level which will capture and alert on rules triggered by all resources and resource groups contained within a subscription. This is not an ideal rule set for Alerting within larger and more complex organizations. While this section provides recommendations for the creation of **Activity Log Alerts** specifically, Microsoft Azure supports four different types of alerts: - Metric Alerts - Log Alerts - Activity Log Alerts - Smart Detection Alerts All Azure services (Microsoft provided or otherwise) that can generate alerts are assigned a "Resource provider namespace" when they are registered in an Azure tenant. The recommendations in this section are in no way exhaustive of the plethora of available "Providers" or "Resource Types." The Resource Providers that are registered in your Azure Tenant can be located in your Subscription. Each registered Provider in your environment **may** have available "Conditions" to raise alerts via Activity Log Alerts. These providers should be considered for inclusion in Activity Log Alert rules of your own making. To view the registered resource providers in your Subscription(s), use this guide: - https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/resource-providers-and-types If you wish to create custom alerting rules for Activity Log Alerts or other alert types, please refer to Microsoft documentation: - https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-new-alert-rule

💼 5.2 Monitoring using Activity Log Alerts

The recommendations provided in this section are intended to provide entry-level alerting for crucial activities on a tenant account. These recommended activities **should** be tuned to your needs. By default, each of these Activity Log Alerts tends to guide the reader to alerting at the "Subscription-wide" level which will capture and alert on rules triggered by all resources and resource groups contained within a subscription. This is not an ideal rule set for Alerting within larger and more complex organizations. While this section provides recommendations for the creation of **Activity Log Alerts** specifically, Microsoft Azure supports four different types of alerts: - Metric Alerts - Log Alerts - Activity Log Alerts - Smart Detection Alerts All Azure services (Microsoft provided or otherwise) that can generate alerts are assigned a "Resource provider namespace" when they are registered in an Azure tenant. The recommendations in this section are in no way exhaustive of the plethora of available "Providers" or "Resource Types." The Resource Providers that are registered in your Azure Tenant can be located in your Subscription. Each registered Provider in your environment **may** have available "Conditions" to raise alerts via Activity Log Alerts. These providers should be considered for inclusion in Activity Log Alert rules of your own making. To view the registered resource providers in your Subscription(s), use this guide: - https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/resource-providers-and-types If you wish to create custom alerting rules for Activity Log Alerts or other alert types, please refer to Microsoft documentation: - https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-new-alert-rule

💼 5.2 Pod Security Policies

A Pod Security Policy (PSP) is a cluster-level resource that controls security settings for pods. Your cluster may have multiple PSPs. You can query PSPs with the following command: `kubectl get psp`. PodSecurityPolicies are used in conjunction with the PodSecurityPolicy admission controller plugin.

💼 5.2.5 Ensure that Activity Log Alert exists for Create or Update Network Security Group (Network Security Group Rule) - Level 1 (Automated)

Create an activity log alert for the Create or Update Network Security Group Rule event. Note: The recommendation name reflects the original name from the CIS Azure v1.4.0 official benchmark. The original name contains a mistake in that it does not mention that this recommendation checks the "Security Group Rule" event. We added the suffix "(Network Security Group Rule)" to emphasize the actual purpose of the test.

💼 5.28 Collection of evidence

The organization shall establish and implement procedures for the identification, collection, acquisition and preservation of evidence related to information security events.

💼 5.3 Azure Database for MySQL

This section covers security best practice recommendations for Azure MySQL Database Servers. Azure Product Page: <https://azure.microsoft.com/en-us/products/mysql/> **RETIREMENT of Azure MySQL Single Server**: Azure MySQL Single Server is slated for retirement by September 16, 2024. Azure MySQL Flexible Server is the newer deployment standard and is unaffected. Please use these resources to consider and prepare for migration: - <https://learn.microsoft.com/en-us/azure/mysql/migrate/whats-happening-to-mysql-single-server> - <https://learn.microsoft.com/en-us/azure/mysql/migrate/how-to-decide-on-right-migration-tools>

💼 5.3 Ensure that anti-virus mechanisms are actively running and cannot be disabled or altered by users, unless specifically authorized by management on a case-by-case basis for a limited time period.

Anti-virus solutions may be temporarily disabled only if there is legitimate technical need, as authorized by management on a case-by-case basis. If anti-virus protection needs to be disabled for a specific purpose, it must be formally authorized. Additional security measures may also need to be implemented for the period of time during which anti-virus protection is not active.

💼 5.3 Ensure that Azure Monitor Resource Logging is Enabled for All Services that Support it - Level 1 (Manual)

Resource Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself; for example, reading or updating a secret from a Key Vault. Currently, 95 Azure resources support Azure Monitoring (See the more information section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps, and CosmosDB. The content of these logs varies by resource type. A number of back-end services were not configured to log and store Resource Logs for certain activities or for a sufficient length. It is crucial that monitoring is correctly configured to log all relevant activities and retain those logs for a sufficient length of time. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended.
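
As a rough, read-only sketch of how resource-level diagnostic settings can be inspected, the snippet below uses the azure-mgmt-monitor SDK to list the diagnostic settings attached to a single resource; the subscription ID and resource ID are placeholders, and the exact return type of `diagnostic_settings.list` varies between SDK versions.

```python
# Sketch: list the diagnostic settings attached to a single resource (read-only).
# The subscription ID and resource ID are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.KeyVault/vaults/<vault-name>"
)

monitor = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

settings = monitor.diagnostic_settings.list(resource_id)
settings = getattr(settings, "value", settings)  # older SDKs wrap results in a collection

for setting in settings:
    enabled_logs = [log.category for log in (setting.logs or []) if log.enabled]
    print(setting.name, enabled_logs)
```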

💼 5.3 Ensure that Diagnostic Logs Are Enabled for All Services that Support it. - Level 1 (Manual | Not supported, requires a manual assessment)

Diagnostic Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself, for example, getting a secret from a Key Vault. Currently, 32 Azure resources support Diagnostic Logging (See the references section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps and CosmosDB. The content of these logs varies by resource type. For example, Windows event system logs are a category of diagnostics logs for VMs, and blob, table, and queue logs are categories of diagnostics logs for storage accounts. A number of back-end services were not configured to log and store Diagnostic Logs for certain activities or for a sufficient length. It is crucial that logging systems are correctly configured to log all relevant activities and retain those logs for a sufficient length of time. By default, Diagnostic Logs are not enabled. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended. Note: The CIS Benchmark covers some specific Diagnostic Logs separately. ''' 3.3 - Ensure Storage logging is enabled for Queue service for read, write, and delete requests 6.4 Ensure that Network Security Group Flow Log retention period is 'greater than 90 days' '''

💼 5.3 Ensure that Diagnostic Logs are enabled for all services which support it. - Level 1 (Manual | Not supported, requires a manual assessment)

Diagnostic Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself, for example, getting a secret from a Key Vault. Currently, 32 Azure resources support Diagnostic Logging (See the references section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps and CosmosDB. The content of these logs varies by resource type. For example, Windows event system logs are a category of diagnostics logs for VMs, and blob, table, and queue logs are categories of diagnostics logs for storage accounts. A number of back-end services were not configured to log and store Diagnostic Logs for certain activities or for a sufficient length. It is crucial that logging systems are correctly configured to log all relevant activities and retain those logs for a sufficient length of time. By default, Diagnostic Logs are not enabled. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended. Note: The CIS Benchmark covers some specific Diagnostic Logs separately. ''' 3.3 - Ensure Storage logging is enabled for Queue service for read, write, and delete requests 6.4 Ensure that Network Security Group Flow Log retention period is 'greater than 90 days' ''' The assessment status was updated into "Manual" on the latest version of Azure CIS Benchmark v1.4.0.

💼 5.3 Ensure the default security group of every VPC restricts all traffic

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **NOTE:** When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly because it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering - discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.3 Ensure the default security group of every VPC restricts all traffic

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation.

💼 5.3 Ensure usage of the 'root' account is monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for 'root' login attempts to detect unauthorized use or attempts to use the root account.
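
The sketch below mirrors the earlier metric-filter example but with a filter pattern shaped for 'root' account usage; the pattern, log group, and namespace are illustrative and should be checked against the exact text of the benchmark you follow.

```python
# Sketch: metric filter for 'root' account usage. The pattern below follows the
# shape commonly published for this control but should be checked against your
# benchmark text; the log group and namespace are assumptions.
import boto3

logs = boto3.client("logs")

ROOT_USAGE_PATTERN = (
    '{ $.userIdentity.type = "Root" && '
    '$.userIdentity.invokedBy NOT EXISTS && '
    '$.eventType != "AwsServiceEvent" }'
)

logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",   # assumed log group fed by the trail
    filterName="root-account-usage",
    filterPattern=ROOT_USAGE_PATTERN,
    metricTransformations=[{
        "metricName": "RootAccountUsage",
        "metricNamespace": "CISBenchmark",       # assumed namespace
        "metricValue": "1",
    }],
)
```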

💼 5.3.1 Ensure Application Insights are Configured - Level 2 (Automated)

Application Insights within Azure act as an Application Performance Monitoring solution providing valuable data into how well an application performs and additional information when performing incident response. The types of log data collected include application metrics, telemetry data, and application trace logging data providing organizations with detailed information about application activity and application transactions. Both data sets help organizations adopt a proactive and retroactive means to handle security and performance related metrics within their modern applications.

💼 5.3.1 Ensure Application Insights are Configured - Level 2 (Automated)

Application Insights within Azure act as an Application Performance Monitoring solution providing valuable data into how well an application performs and additional information when performing incident response. The types of log data collected include application metrics, telemetry data, and application trace logging data providing organizations with detailed information about application activity and application transactions. Both data sets help organizations adopt a proactive and retroactive means to handle security and performance related metrics within their modern applications.

💼 5.4 Azure Cosmos DB

This section covers security best practice recommendations for Azure Cosmos DB Database Servers. Azure Product Page: <https://azure.microsoft.com/en-us/products/cosmos-db/>

💼 5.4 Ensure IAM policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes made to Identity and Access Management (IAM) policies.

💼 5.4 Ensure that Azure Monitor Resource Logging is Enabled for All Services that Support it - Level 1 (Manual)

Resource Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself; for example, reading or updating a secret from a Key Vault. Currently, 95 Azure resources support Azure Monitoring (See the more information section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps, and CosmosDB. The content of these logs varies by resource type. A number of back-end services were not configured to log and store Resource Logs for certain activities or for a sufficient length. It is crucial that monitoring is correctly configured to log all relevant activities and retain those logs for a sufficient length of time. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended.

💼 5.4 Ensure that Azure Monitor Resource Logging is Enabled for All Services that Support it - Level 1 (Manual)

Resource Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself; for example, reading or updating a secret from a Key Vault. Currently, 95 Azure resources support Azure Monitoring (See the more information section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps, and CosmosDB. The content of these logs varies by resource type. A number of back-end services were not configured to log and store Resource Logs for certain activities or for a sufficient length. It is crucial that monitoring is correctly configured to log all relevant activities and retain those logs for a sufficient length of time. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended.

💼 5.4 Ensure the default security group of every VPC restricts all traffic - Level 2 (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **NOTE:** When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly because it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering - discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.4 Ensure the default security group of every VPC restricts all traffic - Level 2 (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **NOTE:** When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly because it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering - discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.4 Ensure the default security group of every VPC restricts all traffic - Level 2 (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If you don't specify a security group when you launch an instance, the instance is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic. The default VPC in every region should have its default security group updated to comply. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **NOTE:** When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly because it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering - discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.4 Management responsibilities

Management shall require all personnel to apply information security in accordance with the established information security policy, topic-specific policies and procedures of the organization.

💼 5.4.2 Consider external secret storage (Not Scored)

Consider the use of an external secrets storage and management system, instead of using Kubernetes Secrets directly, if you have more complex secret management needs. Ensure the solution requires authentication to access secrets, has auditing of access to and use of secrets, and encrypts secrets. Some solutions also make it easier to rotate secrets.

💼 5.5 Ensure CloudTrail configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be used to detect changes to CloudTrail's configurations.

💼 5.5 Ensure that SKU Basic/Consumption is not used on artifacts that need to be monitored (Particularly for Production Workloads) - Level 2 (Automated)

The use of Basic or Free SKUs in Azure, whilst cost effective, has significant limitations in terms of what can be monitored and what support can be realized from Microsoft. Typically, these SKUs do not have a service SLA, and Microsoft will usually refuse to provide support for them. Consequently, Basic/Free SKUs should never be used for production workloads.

💼 5.5 Ensure that SKU Basic/Consumption is not used on artifacts that need to be monitored (Particularly for Production Workloads) - Level 2 (Manual)

The use of Basic or Free SKUs in Azure, whilst cost effective, has significant limitations in terms of what can be monitored and what support can be realized from Microsoft. Typically, these SKUs do not have a service SLA, and Microsoft may refuse to provide support for them. Consequently, Basic/Free SKUs should never be used for production workloads.

💼 5.5 Ensure the default security group of every VPC restricts all traffic (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If a security group is not specified when an instance is launched, it is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic, both inbound and outbound. The default VPC in every region should have its default security group updated to comply with the following: - No inbound rules. - No outbound rules. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **Note**: When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly, as it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering by discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.5 Ensure the default security group of every VPC restricts all traffic (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If a security group is not specified when an instance is launched, it is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic, both inbound and outbound. The default VPC in every region should have its default security group updated to comply with the following: - No inbound rules. - No outbound rules. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **Note**: When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly, as it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering by discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

💼 5.5 Ensure the default security group of every VPC restricts all traffic (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If a security group is not specified when an instance is launched, it is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic, both inbound and outbound. The default VPC in every region should have its default security group updated to comply with the following: - No inbound rules. - No outbound rules. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **Note**: When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly, as it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering by discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.
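
As a sketch of how remediation for this control can be scripted, the snippet below uses boto3 to remove every inbound and outbound rule from the default security groups in a Region; it is destructive by design, assumes default credentials and Region, and should only be run once you have confirmed nothing relies on those rules.

```python
# Sketch: strip every inbound and outbound rule from the default security groups
# in the current Region. Destructive: confirm nothing depends on these rules first.
import boto3

ec2 = boto3.client("ec2")

groups = ec2.describe_security_groups(
    Filters=[{"Name": "group-name", "Values": ["default"]}]
)["SecurityGroups"]

for group in groups:
    if group["IpPermissions"]:
        ec2.revoke_security_group_ingress(
            GroupId=group["GroupId"], IpPermissions=group["IpPermissions"]
        )
    if group["IpPermissionsEgress"]:
        ec2.revoke_security_group_egress(
            GroupId=group["GroupId"], IpPermissions=group["IpPermissionsEgress"]
        )
    print("Cleared default security group", group["GroupId"])
```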

💼 5.6 General Policies

These policies relate to general cluster management topics, like namespace best practices and policies applied to pod objects in the cluster.

💼 5.8 Ensure S3 bucket policy changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for changes to S3 bucket policies.

💼 5.9 Ensure AWS Config configuration changes are monitored (Manual)

Real-time monitoring of API calls can be achieved by directing CloudTrail Logs to CloudWatch Logs or an external Security Information and Event Management (SIEM) environment, and establishing corresponding metric filters and alarms. It is recommended that a metric filter and alarm be established for detecting changes to AWS Config's configurations.

💼 5.9 Storage

This section contains recommendations relating to security-related configurations for storage in GKE.

💼 53 Wholesale access to sensitive data (e.g. contents of customer databases or intellectual property that can be exploited for personal gain) would be highly restricted to reduce the risk exposure to significant data leakage events. Industry experience of actual data leakage incidents includes the unauthorised extraction of debit/credit card details, theft of personally identifiable information, loss of unencrypted backup media and the sale/trade or exploitation of customer identity data.

💼 6 Cloud SQL Database Services

This section covers security recommendations to follow to secure Cloud SQL database services. The recommendations in this section on setting up database flags are also present in the [CIS Oracle MySQL Community Server 5.7 Benchmarks](https://www.cisecurity.org/benchmark/oracle_mysql/) and in the [CIS PostgreSQL 12 Benchmarks](https://www.cisecurity.org/benchmark/postgresql/). We nevertheless include them here as well because the remediation instructions are different on Cloud SQL. Setting these flags requires superuser privileges and can only be configured through GCP controls. Learn more at: [https://cloud.google.com/sql/docs/postgres/users](https://cloud.google.com/sql/docs/postgres/users) and [https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags).

๐Ÿ’ผ 6 Cloud SQL Database Services

This section covers security recommendations to follow to secure Cloud SQL database services. The recommendations in this section on setting up database flags are also present in the [CIS Oracle MySQL Community Server 5.7 Benchmarks](https://www.cisecurity.org/benchmark/oracle_mysql/) and in the [CIS PostgreSQL 12 Benchmarks](https://www.cisecurity.org/benchmark/postgresql/). We nevertheless include them here as well because the remediation instructions are different on Cloud SQL. Setting these flags requires superuser privileges, and the flags can only be configured through GCP controls. Learn more at: [https://cloud.google.com/sql/docs/postgres/users](https://cloud.google.com/sql/docs/postgres/users) and [https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags).

๐Ÿ’ผ 6 Cloud SQL Database Services

This section covers security recommendations to follow to secure Cloud SQL database services. The recommendations in this section on setting up database flags are also present in the [CIS Oracle MySQL Community Server 5.7 Benchmarks](https://www.cisecurity.org/benchmark/oracle_mysql/) and in the [CIS PostgreSQL 12 Benchmarks](https://www.cisecurity.org/benchmark/postgresql/). We nevertheless include them here as well because the remediation instructions are different on Cloud SQL. Setting these flags requires superuser privileges, and the flags can only be configured through GCP controls. Learn more at: [https://cloud.google.com/sql/docs/postgres/users](https://cloud.google.com/sql/docs/postgres/users) and [https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags).

๐Ÿ’ผ 6 Cloud SQL Database Services

This section covers security recommendations to follow to secure Cloud SQL database services. The recommendations in this section on setting up database flags are also present in the [CIS Oracle MySQL Community Server 5.7 Benchmarks](https://www.cisecurity.org/benchmark/oracle_mysql/) and in the [CIS PostgreSQL 12 Benchmarks](https://www.cisecurity.org/benchmark/postgresql/). We nevertheless include them here as well because the remediation instructions are different on Cloud SQL. Setting these flags requires superuser privileges, and the flags can only be configured through GCP controls. Learn more at: [https://cloud.google.com/sql/docs/postgres/users](https://cloud.google.com/sql/docs/postgres/users) and [https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags).

๐Ÿ’ผ 6 Cloud SQL Database Services

This section covers security recommendations to follow to secure Cloud SQL database services. The recommendations in this section on setting up database flags are also present in the [CIS Oracle MySQL Community Server 5.7 Benchmarks](https://www.cisecurity.org/benchmark/oracle_mysql/) and in the [CIS PostgreSQL 12 Benchmarks](https://www.cisecurity.org/benchmark/postgresql/). We nevertheless include them here as well because the remediation instructions are different on Cloud SQL. Setting these flags requires superuser privileges, and the flags can only be configured through GCP controls. Learn more at: [https://cloud.google.com/sql/docs/postgres/users](https://cloud.google.com/sql/docs/postgres/users) and [https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags).

๐Ÿ’ผ 6 Logging and Monitoring

This section covers security recommendations to follow for logging and monitoring policies on an Azure Subscription. **Scoping: A necessary exercise for effective and efficient use of Logging and Monitoring** For recommendations contained in this section, it is crucial that your organization consider and settle on the scope of application for each recommendation individually. The scope of application cannot be realistically written in a generic prescriptive way within these recommendations, so a scoping exercise is strongly recommended. A scoping exercise will help you determine which resources are "in scope" and will receive partial or complete logging and monitoring treatment, and which resources are "out of scope" and will not receive any logging and monitoring treatment. Your objectives with the scoping exercise should be to: - Produce a clear classification of resources - Understand the control requirements of any relevant security or compliance frameworks - Ensure the appropriate personnel can detect and react to threats - Ensure relevant resources have a historical register for accountability and investigation - Minimize alert fatigue and cost Release Environments provide a helpful context for understanding scope from a DevOps perspective. For example: 1. Production Environment 2. Staging Environment 3. Testing Environment 4. Development Environment While resources considered in the scope of a Production Environment might have a full set of recommendations applied for logging and monitoring, other release environments might have a limited set of recommendations applied for the sake of accountability. The names of these environments and which resources are in the scope of each environment will vary from one organization to another.

๐Ÿ’ผ 6 Managed services

This section consists of security recommendations for the direct configuration of Kubernetes managed service components, namely, Google Kubernetes Engine (GKE). These recommendations are directly applicable to features that exist only as part of a managed service.

๐Ÿ’ผ 6 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 6 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 6 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 6 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 6 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 6.1 Configuring Diagnostic Settings

The Azure Diagnostic Settings capture control/management activities performed on a subscription or Azure AD Tenant. By default, the Azure Portal retains activity logs only for 90 days. The Diagnostic Settings define the type of events that are stored or streamed and the outputs: storage account, log analytics workspace, event hub, and others. The Diagnostic Settings, if configured properly, can ensure that all logs are retained for a longer duration. This section has recommendations for correctly configuring the Diagnostic Settings so that all logs captured are retained for longer periods. **Azure Subscriptions** When configuring Diagnostic Settings, you may choose to export logs in one of four ways, and for each you need to ensure appropriate data retention. The options are Log Analytics workspace, Event Hub, Storage Account, and Partner Solutions. It is important to be aware of this and to set retention as your organization sees fit. **Azure AD Logs** In order to retain sign-in logs, user account changes, application provisioning logs, or other logs that are visible only on the Tenant in Azure AD, separate Diagnostic settings must be specified. **Deployment by Policy** Deploying Azure diagnostics should ideally be done by policy to ensure a consistent configuration. Microsoft provides a full set of policies for all diagnostic-capable resource types in its GitHub repository. If you choose to deploy by policy, it is best to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Azure Sentinel. Be aware that this has a cost attached to it. Future versions of the CIS Azure Foundations Benchmark will aim to cover the use of policy in greater detail.

๐Ÿ’ผ 6.1 Establish a process to identify security vulnerabilities, using reputable outside sources for security vulnerability information, and assign a risk ranking to newly discovered security vulnerabilities.

Risk rankings should be based on industry best practices as well as consideration of potential impact. For example, criteria for ranking vulnerabilities may include consideration of the CVSS base score, and/or the classification by the vendor, and/or type of systems affected. Methods for evaluating vulnerabilities and assigning risk ratings will vary based on an organization's environment and risk-assessment strategy. Risk rankings should, at a minimum, identify all vulnerabilities considered to be a "high risk" to the environment. In addition to the risk ranking, vulnerabilities may be considered "critical" if they pose an imminent threat to the environment, impact critical systems, and/or would result in a potential compromise if not addressed. Examples of critical systems may include security systems, public-facing devices and systems, databases, and other systems that store, process, or transmit cardholder data.

๐Ÿ’ผ 6.1 Screening

Background verification checks on all candidates to become personnel shall be carried out prior to joining the organization and on an ongoing basis taking into consideration applicable laws, regulations and ethics and be proportional to the business requirements, the classification of the information to be accessed and the perceived risks.

๐Ÿ’ผ 6.1 Security Defaults (Per-User MFA)

IMPORTANT: READ BELOW BEFORE PROCEEDING! • If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, F5, or Business Premium, and EM&S E3 or E5 licenses) and CAN use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section. • If your organization is using the free tier of Entra ID (Office 365 E1, E3, or E5, and Microsoft 365 F1 or F3 licenses) and CAN NOT use Conditional Access, proceed with the Security Defaults guidance in this section, and ignore the recommendations in the Conditional Access section. Conditional Access is preferred, but Security Defaults (Per-User MFA) is recommended only if Conditional Access isn't available. Why is this IMPORTANT? The Azure "Security Defaults" recommendations represent an entry-level set of recommendations (such as Multi-Factor Authentication) which will be relevant to organizations and tenants that are either just starting to use Azure, or are only utilizing a bare minimum feature set, and rely on the free license tier of Microsoft Entra ID. Security Defaults recommendations are intended to ensure that these use cases are still capable of establishing a strong baseline of secure configuration. If your subscription is licensed to use Microsoft Entra ID P1 or P2, it is strongly recommended that the "Security Defaults" section (this section and the recommendations therein) be bypassed in favor of the use of "Conditional Access."

๐Ÿ’ผ 6.1.1 Ensure that 'security defaults' is enabled in Microsoft Entra ID (Manual)

[IMPORTANT - Please read the section overview: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, F5, or Business Premium, and EM&S E3 or E5 licenses) and CAN use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Security defaults in Microsoft Entra ID make it easier to be secure and help protect your organization. Security defaults contain preconfigured security settings for common attacks. Security defaults is available to everyone. The goal is to ensure that all organizations have a basic level of security enabled at no extra cost. You may turn on security defaults in the Azure portal.

๐Ÿ’ผ 6.1.2 Ensure Diagnostic Setting captures appropriate categories (Automated)

**Prerequisite**: A Diagnostic Setting must exist. If a Diagnostic Setting does not exist, the navigation and options within this recommendation will not be available. Please review the recommendation at the beginning of this subsection titled: "Ensure that a 'Diagnostic Setting' exists." The diagnostic setting should be configured to log the appropriate activities from the control/management plane.

๐Ÿ’ผ 6.1.2 Ensure that 'multifactor authentication' is 'enabled' for all users (Manual)

[IMPORTANT - Please read the section overview: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, F5, or Business Premium, and EM&S E3 or E5 licenses) and CAN use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Enable multifactor authentication for all users. Note: Since 2024, Azure has been rolling out mandatory multifactor authentication. For more information: • <https://azure.microsoft.com/en-us/blog/announcing-mandatory-multi-factor-authentication-for-azure-sign-in> • <https://learn.microsoft.com/en-us/entra/identity/authentication/concept-mandatory-multifactor-authentication>

๐Ÿ’ผ 6.1.3 Ensure that 'Allow users to remember multifactor authentication on devices they trust' is disabled (Manual)

[IMPORTANT - Please read the section overview: If your organization pays for Microsoft Entra ID licensing (included in Microsoft 365 E3, E5, F5, or Business Premium, and EM&S E3 or E5 licenses) and CAN use Conditional Access, ignore the recommendations in this section and proceed to the Conditional Access section.] Do not allow users to remember multi-factor authentication on devices.

๐Ÿ’ผ 6.2 Conditional Access

For most Azure tenants, and certainly for organizations with a significant use of Microsoft Entra ID, Conditional Access policies are recommended and preferred. To use Conditional Access Policies, a licensing plan is required, and Security Defaults must be disabled. Because of the licensing requirement, all Conditional Access policies are assigned a profile of "Level 2." Conditional Access requires one of the following plans: • Microsoft Entra ID P1 or P2 • Microsoft 365 Business Premium • Microsoft 365 E3 or E5 • Microsoft 365 F1, F3, F5 Security and F5 Security + Compliance • Enterprise Mobility & Security E3 or E5

๐Ÿ’ผ 6.2 Monitoring using Activity Log Alerts

The recommendations provided in this section are intended to provide entry-level alerting for crucial activities on a tenant account. These recommended activities should be tuned to your needs. By default, each of these Activity Log Alerts tends to guide the reader to alerting at the "Subscription-wide" level which will capture and alert on rules triggered by all resources and resource groups contained within a subscription. This is not an ideal rule set for Alerting within larger and more complex organizations. While this section provides recommendations for the creation of **Activity Log Alerts** specifically, Microsoft Azure supports four different types of alerts: - Metric Alerts - Log Alerts - Activity Log Alerts - Smart Detection Alerts All Azure services (Microsoft provided or otherwise) that can generate alerts are assigned a "Resource provider namespace" when they are registered in an Azure tenant. The recommendations in this section are in no way exhaustive of the plethora of available "Providers" or "Resource Types." The Resource Providers that are registered in your Azure Tenant can be located in your Subscription. Each registered Provider in your environment may have available "Conditions" to raise alerts via Activity Log Alerts. These providers should be considered for inclusion in Activity Log Alert rules of your own making. To view the registered resource providers in your Subscription(s), use this guide: - <https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/resource-providers-and-types> If you wish to create custom alerting rules for Activity Log Alerts or other alert types, please refer to Microsoft documentation: - <https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-create-new-alert-rule>

๐Ÿ’ผ 6.2.1 Bespoke and custom software are developed securely.

As follows: - Based on industry standards and/or best practices for secure development. - In accordance with PCI DSS (for example, secure authentication and logging). - Incorporating consideration of information security issues during each stage of the software development lifecycle.

๐Ÿ’ผ 6.2.1 Bespoke and custom software are developed securely.

As follows: - Based on industry standards and/or best practices for secure development. - In accordance with PCI DSS (for example, secure authentication and logging). - Incorporating consideration of information security issues during each stage of the software development lifecycle.

๐Ÿ’ผ 6.2.1 Ensure 'Log_error_verbosity' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'DEFAULT' or Stricter - Level 2 (Automated)

The `log_error_verbosity` flag controls the verbosity/details of messages logged. Valid values are: - `TERSE` - `DEFAULT` - `VERBOSE` `TERSE` excludes the logging of `DETAIL`, `HINT`, `QUERY`, and `CONTEXT` error information. `VERBOSE` output includes the `SQLSTATE` error code, source code file name, function name, and line number that generated the error. Ensure an appropriate value is set to 'DEFAULT' or stricter.
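For illustration, here is a minimal sketch using the Cloud SQL Admin API via google-api-python-client (Application Default Credentials assumed; the project and instance names are placeholders) that sets the flag; note that patching `databaseFlags` replaces the full flag list, so include any existing flags you want to keep.

```python
# Illustrative sketch: set log_error_verbosity to DEFAULT on a PostgreSQL instance.
# PROJECT and INSTANCE are placeholders; Application Default Credentials are assumed.
# Note: the databaseFlags list replaces any existing flags, so include flags you keep.
from googleapiclient import discovery

PROJECT = "my-project"     # placeholder
INSTANCE = "my-postgres"   # placeholder

sqladmin = discovery.build("sqladmin", "v1beta4")
body = {
    "settings": {
        "databaseFlags": [
            {"name": "log_error_verbosity", "value": "DEFAULT"},
        ]
    }
}
operation = sqladmin.instances().patch(project=PROJECT, instance=INSTANCE, body=body).execute()
print(operation.get("name"), operation.get("status"))  # long-running operation handle
```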

๐Ÿ’ผ 6.2.1 Ensure 'Log_error_verbosity' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'DEFAULT' or Stricter - Level 2 (Automated)

The `log_error_verbosity` flag controls the verbosity/details of messages logged. Valid values are: - `TERSE` - `DEFAULT` - `VERBOSE` `TERSE` excludes the logging of `DETAIL`, `HINT`, `QUERY`, and `CONTEXT` error information. `VERBOSE` output includes the `SQLSTATE` error code, source code file name, function name, and line number that generated the error. Ensure an appropriate value is set to 'DEFAULT' or stricter.

๐Ÿ’ผ 6.2.1 Ensure 'Log_error_verbosity' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'DEFAULT' or Stricter - Level 2 (Manual)

The `log_error_verbosity` flag controls the verbosity/details of messages logged. Valid values are: - `TERSE` - `DEFAULT` - `VERBOSE` `TERSE` excludes the logging of `DETAIL`, `HINT`, `QUERY`, and `CONTEXT` error information. `VERBOSE` output includes the `SQLSTATE` error code, source code file name, function name, and line number that generated the error. Ensure an appropriate value is set to 'DEFAULT' or stricter.

๐Ÿ’ผ 6.2.1 Ensure that 'trusted locations' are defined (Manual)

Microsoft Entra ID Conditional Access allows an organization to configure Named locations and configure whether those locations are trusted or untrusted. These settings provide organizations the means to specify Geographical locations for use in conditional access policies, or define actual IP addresses and IP ranges and whether or not those IP addresses and/or ranges are trusted by the organization.

๐Ÿ’ผ 6.2.13 Ensure that the 'log_min_messages' database flag for Cloud SQL PostgreSQL instance is set appropriately - Level 1 (Manual)

The `log_min_messages` flag defines the minimum message severity level that is considered as an error statement. Messages for error statements are logged with the SQL statement. Valid values include `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. Note: To effectively turn off logging failing statements, set this parameter to PANIC. ERROR is considered the best practice setting. Changes should only be made in accordance with the organization's logging policy.

๐Ÿ’ผ 6.2.14 Ensure 'log_min_error_statement' database flag for Cloud SQL PostgreSQL instance is set to 'Error' or stricter - Level 1 (Automated)

The `log_min_error_statement` flag defines the minimum message severity level that is considered an error statement. Messages for error statements are logged with the SQL statement. Valid values include `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. Ensure a value of `ERROR` or stricter is set. Will be supported in the near future.

๐Ÿ’ผ 6.2.15 Ensure that the 'log_temp_files' database flag for Cloud SQL PostgreSQL instance is set to '0' (on) - Level 1 (Automated)

PostgreSQL can create a temporary file for actions such as sorting, hashing and temporary query results when these operations exceed `work_mem`. The `log_temp_files` flag controls the logging of temporary file names and sizes when the files are deleted. Configuring `log_temp_files` to `0` causes all temporary file information to be logged, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. A value of `-1` disables temporary file information logging.

๐Ÿ’ผ 6.2.2 Ensure 'log_error_verbosity' database flag for Cloud SQL PostgreSQL instance is set to 'DEFAULT' or stricter - Level 2 (Manual | Not supported, requires a manual assessment)

The `log_error_verbosity` flag controls the verbosity/details of messages logged. Valid values are: - `TERSE` - `DEFAULT` - `VERBOSE` `TERSE` excludes the logging of `DETAIL`, `HINT`, `QUERY`, and `CONTEXT` error information. `VERBOSE` output includes the `SQLSTATE` error code, source code file name, function name, and line number that generated the error. Ensure an appropriate value is set to 'DEFAULT' or stricter.

๐Ÿ’ผ 6.2.2 Ensure that an exclusionary geographic Conditional Access policy is considered (Manual)

CAUTION: If these policies are created without first auditing and testing the result, misconfiguration can potentially lock out administrators or create undesired access issues. Conditional Access Policies can be used to block access from geographic locations that are deemed out-of-scope for your organization or application. The scope and variables for this policy should be carefully examined and defined.

๐Ÿ’ผ 6.2.4 Ensure 'Log_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set Appropriately - Level 1 (Manual)

The value of the `log_statement` flag determines the SQL statements that are logged. Valid values are: - `none` - `ddl` - `mod` - `all` The value `ddl` logs all data definition statements. The value `mod` logs all ddl statements, plus data-modifying statements. The statements are logged after a basic parsing is done and the statement type is determined, thus this does not log statements with errors. When using extended query protocol, logging occurs after an Execute message is received and values of the Bind parameters are included. A value of 'ddl' is recommended unless otherwise directed by your organization's logging policy.

๐Ÿ’ผ 6.2.4 Ensure 'Log_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set Appropriately - Level 2 (Automated)

The value of the log_statement flag determines the SQL statements that are logged. Valid values are: - none - ddl - mod - all The value ddl logs all data definition statements. The value mod logs all ddl statements, plus data-modifying statements. The statements are logged after a basic parsing is done and the statement type is determined, thus this does not log statements with errors. When using extended query protocol, logging occurs after an Execute message is received and values of the Bind parameters are included. A value of 'ddl' is recommended unless otherwise directed by your organization's logging policy.

๐Ÿ’ผ 6.2.4 Ensure 'Log\_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set Appropriately - Level 2 (Automated)

The value of the `log_statement` flag determines the SQL statements that are logged. Valid values are: - `none` - `ddl` - `mod` - `all` The value `ddl` logs all data definition statements. The value `mod` logs all ddl statements, plus data-modifying statements. The statements are logged after a basic parsing is done and the statement type is determined, thus this does not log statements with errors. When using extended query protocol, logging occurs after an Execute message is received and values of the Bind parameters are included. A value of 'ddl' is recommended unless otherwise directed by your organization's logging policy.

๐Ÿ’ผ 6.2.4 Ensure that a multifactor authentication policy exists for all users (Manual)

A Conditional Access policy can be enabled to ensure that users are required to use Multifactor Authentication (MFA) to log in. Note: Since 2024, Azure has been rolling out mandatory multifactor authentication. For more information: • <https://azure.microsoft.com/en-us/blog/announcing-mandatory-multi-factor-authentication-for-azure-sign-in> • <https://learn.microsoft.com/en-us/entra/identity/authentication/concept-mandatory-multifactor-authentication>

๐Ÿ’ผ 6.2.4 Software engineering techniques or other methods are defined and in use by software development personnel to prevent or mitigate common software attacks and related vulnerabilities in bespoke and custom software.

Including but not limited to the following: - Injection attacks, including SQL, LDAP, XPath, or other command, parameter, object, fault, or injection-type flaws. - Attacks on data and data structures, including attempts to manipulate buffers, pointers, input data, or shared data. - Attacks on cryptography usage, including attempts to exploit weak, insecure, or inappropriate cryptographic implementations, algorithms, cipher suites, or modes of operation. - Attacks on business logic, including attempts to abuse or bypass application features and functionalities through the manipulation of APIs, communication protocols and channels, client-side functionality, or other system/application functions and resources. This includes cross-site scripting (XSS) and cross-site request forgery (CSRF). - Attacks on access control mechanisms, including attempts to bypass or abuse identification, authentication, or authorization mechanisms, or attempts to exploit weaknesses in the implementation of such mechanisms. - Attacks via any "high-risk" vulnerabilities identified in the vulnerability identification process, as defined in Requirement 6.3.1.

๐Ÿ’ผ 6.2.4 Software engineering techniques or other methods are defined and in use by software development personnel to prevent or mitigate common software attacks and related vulnerabilities in bespoke and custom software.

Including but not limited to the following: - Injection attacks, including SQL, LDAP, XPath, or other command, parameter, object, fault, or injection-type flaws. - Attacks on data and data structures, including attempts to manipulate buffers, pointers, input data, or shared data. - Attacks on cryptography usage, including attempts to exploit weak, insecure, or inappropriate cryptographic implementations, algorithms, cipher suites, or modes of operation. - Attacks on business logic, including attempts to abuse or bypass application features and functionalities through the manipulation of APIs, communication protocols and channels, client-side functionality, or other system/application functions and resources. This includes cross-site scripting (XSS) and cross-site request forgery (CSRF). - Attacks on access control mechanisms, including attempts to bypass or abuse identification, authentication, or authorization mechanisms, or attempts to exploit weaknesses in the implementation of such mechanisms. - Attacks via any "high-risk" vulnerabilities identified in the vulnerability identification process, as defined in Requirement 6.3.1.

๐Ÿ’ผ 6.2.5 Ensure that the 'log_min_messages' database flag for Cloud SQL PostgreSQL instance is set appropriately

The 'log_min_messages' flag defines the minimum message severity level that is considered an error statement. Messages for error statements are logged with the SQL statement. Valid values include 'DEBUG5', 'DEBUG4', 'DEBUG3', 'DEBUG2', 'DEBUG1', 'INFO', 'NOTICE', 'WARNING', 'ERROR', 'LOG', 'FATAL', and 'PANIC'. Each severity level includes the subsequent levels mentioned above. Note: To effectively turn off logging failing statements, set this parameter to PANIC. ERROR is considered the best practice setting. Changes should only be made in accordance with the organization's logging policy.

๐Ÿ’ผ 6.2.5 Ensure that the 'Log_min_messages' Flag for a Cloud SQL PostgreSQL Instance is set at minimum to 'Warning' - Level 1 (Automated)

The log_min_messages flag defines the minimum message severity level that is considered as an error statement. Messages for error statements are logged with the SQL statement. Valid values include DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. Each severity level includes the subsequent levels mentioned above. ERROR is considered the best practice setting. Changes should only be made in accordance with the organization's logging policy.

๐Ÿ’ผ 6.2.5 Ensure that the 'Log_min_messages' Flag for a Cloud SQL PostgreSQL Instance is set at minimum to 'Warning' - Level 1 (Automated)

The `log_min_messages` flag defines the minimum message severity level that is considered as an error statement. Messages for error statements are logged with the SQL statement. Valid values include (from lowest to highest severity) `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. ERROR is considered the best practice setting. Changes should only be made in accordance with the organization's logging policy.

๐Ÿ’ผ 6.2.6 Ensure 'Log_min_error_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'Error' or Stricter - Level 1 (Automated)

The log_min_error_statement flag defines the minimum message severity level that is considered an error statement. Messages for error statements are logged with the SQL statement. Valid values include DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. Each severity level includes the subsequent levels mentioned above. Ensure a value of ERROR or stricter is set.

๐Ÿ’ผ 6.2.6 Ensure 'Log_min_error\_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'Error' or Stricter - Level 1 (Automated)

The `log_min_error_statement` flag defines the minimum message severity level that is considered an error statement. Messages for error statements are logged with the SQL statement. Valid values include (from lowest to highest severity) `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. Ensure a value of `ERROR` or stricter is set.

๐Ÿ’ผ 6.2.6 Ensure That the 'Log_min_messages' Database Flag for Cloud SQL PostgreSQL Instance Is Set to at least 'Warning' - Level 1 (Manual)

The `log_min_messages` flag defines the minimum message severity level that is considered as an error statement. Messages for error statements are logged with the SQL statement. Valid values include `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. ERROR is considered the best practice setting. Changes should only be made in accordance with the organization's logging policy.

๐Ÿ’ผ 6.2.6 Ensure that the 'log_temp_files' database flag for Cloud SQL PostgreSQL instance is set to '0' (on)

PostgreSQL can create a temporary file for actions such as sorting, hashing and temporary query results when these operations exceed 'work_mem'. The 'log_temp_files' flag controls the logging of temporary file names and sizes when the files are deleted. Configuring 'log_temp_files' to '0' causes all temporary file information to be logged, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. A value of '-1' disables temporary file information logging.

๐Ÿ’ผ 6.2.7 Ensure 'Log_min_error_statement' Database Flag for Cloud SQL PostgreSQL Instance Is Set to 'Error' or Stricter - Level 1 (Automated)

The `log_min_error_statement` flag defines the minimum message severity level that is considered an error statement. Messages for error statements are logged with the SQL statement. Valid values include `DEBUG5`, `DEBUG4`, `DEBUG3`, `DEBUG2`, `DEBUG1`, `INFO`, `NOTICE`, `WARNING`, `ERROR`, `LOG`, `FATAL`, and `PANIC`. Each severity level includes the subsequent levels mentioned above. Ensure a value of `ERROR` or stricter is set.

๐Ÿ’ผ 6.2.7 Ensure 'log_statement' database flag for Cloud SQL PostgreSQL instance is set appropriately - Level 1 (Manual | Not supported, requires a manual assessment)

The value of the `log_statement` flag determines the SQL statements that are logged. Valid values are: - `none` - `ddl` - `mod` - `all` The value `ddl` logs all data definition statements. The value `mod` logs all ddl statements, plus data-modifying statements. The statements are logged after a basic parsing is done and the statement type is determined, thus this does not log statements with errors. When using extended query protocol, logging occurs after an Execute message is received and values of the Bind parameters are included. A value of 'ddl' is recommended unless otherwise directed by your organization's logging policy.

๐Ÿ’ผ 6.2.8 Ensure 'log_hostname' database flag for Cloud SQL PostgreSQL instance is set appropriately - Level 1 (Automated | Roadmapped)

PostgreSQL logs only the IP address of the connecting hosts. The `log_hostname` flag controls the logging of `hostnames` in addition to the IP addresses logged. The performance hit is dependent on the configuration of the environment and the host name resolution setup. This parameter can only be set in the `postgresql.conf` file or on the server command line. Will be supported in the near future.

๐Ÿ’ผ 6.2.9 Ensure Instance IP assignment is set to private - Level 1 (Automated)

Instance addresses can be public IP or private IP. Public IP means that the instance is accessible through the public internet. In contrast, instances using only private IP are not accessible through the public internet, but are accessible through a Virtual Private Cloud (VPC). Limiting network access to your database will limit potential attacks.
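Here is a minimal sketch, assuming Application Default Credentials and a placeholder project ID, that lists Cloud SQL instances through the Cloud SQL Admin API and reports those still exposing a public (PRIMARY) address.

```python
# Illustrative sketch: list Cloud SQL instances that still have a public (PRIMARY) address.
# PROJECT is a placeholder; Application Default Credentials are assumed.
from googleapiclient import discovery

PROJECT = "my-project"  # placeholder

sqladmin = discovery.build("sqladmin", "v1beta4")
response = sqladmin.instances().list(project=PROJECT).execute()
for instance in response.get("items", []):
    public = [a for a in instance.get("ipAddresses", []) if a.get("type") == "PRIMARY"]
    if public:
        print(f"{instance['name']} exposes a public IP: {public[0].get('ipAddress')}")
```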

๐Ÿ’ผ 6.22 Ensure that 'Require Multifactor Authentication to register or join devices with Microsoft Entra' is set to 'Yes' (Manual)

NOTE: This recommendation is only relevant if your subscription is using Per-User MFA. If your organization is licensed to use Conditional Access, the preferred method of requiring MFA to join devices to Entra ID is to use a Conditional Access policy (see additional information below for link). Joining or registering devices to Microsoft Entra ID should require multi-factor authentication.

๐Ÿ’ผ 6.26 Ensure fewer than 5 users have global administrator assignment (Manual)

This recommendation aims to maintain a balance between security and operational efficiency by ensuring that a minimum of 2 and a maximum of 4 users are assigned the Global Administrator role in Microsoft Entra ID. Having at least two Global Administrators ensures redundancy, while limiting the number to four reduces the risk of excessive privileged access.
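As a hedged example, the following Python snippet queries Microsoft Graph with a pre-acquired access token (a placeholder here; obtaining it via MSAL and handling paging are out of scope) and counts the members of the Global Administrator directory role.

```python
# Illustrative sketch: count Global Administrator role members via Microsoft Graph.
# TOKEN is a placeholder access token with Directory.Read.All (or similar) permission;
# paging via @odata.nextLink is omitted for brevity.
import requests

TOKEN = "<access-token>"  # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
GRAPH = "https://graph.microsoft.com/v1.0"

roles = requests.get(f"{GRAPH}/directoryRoles", headers=HEADERS).json().get("value", [])
ga = next((r for r in roles if r.get("displayName") == "Global Administrator"), None)
if ga is None:
    raise SystemExit("Global Administrator role not found (has it been activated?)")

members = requests.get(f"{GRAPH}/directoryRoles/{ga['id']}/members", headers=HEADERS).json().get("value", [])
print(f"Global Administrator assignments: {len(members)}")
if not 2 <= len(members) <= 4:
    print("Outside the recommended range of 2-4 assignments.")
```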

๐Ÿ’ผ 6.3 Develop internal and external software applications securely.

As follows: - In accordance with PCI DSS (for example, secure authentication and logging) - Based on industry standards and/or best practices. - Incorporating information security throughout the software-development life cycle. This applies to all software developed internally as well as bespoke or custom software developed by a third party.

๐Ÿ’ผ 6.3 Ensure no SQL Databases allow ingress 0.0.0.0/0 (ANY IP)

Ensure that no SQL Databases allow ingress from 0.0.0.0/0 (ANY IP). Implementation note: CIS instructs failing any SQL database that has a firewall rule with a 'Start IP' of 0.0.0.0. This does not align with the rule title and description, which is to block any ingress traffic. Therefore, the assigned policy of this rule will fail for any SQL Server that has a firewall rule with 'Start IP' 0.0.0.0 and 'End IP' 255.255.255.255. This range covers any IP (0.0.0.0/0). In addition, the benchmark instructs verifying that 'Allow Azure services and resources to access this service' is disabled. This option has nothing to do with ANY IP. It simply checks whether the server allows traffic from Azure services (from any subscription). We decided to create a dedicated rule to check if this feature is enabled.
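A minimal sketch using the azure-mgmt-sql and azure-identity packages (the subscription, resource group, and server names are placeholders) that lists server firewall rules and flags the 0.0.0.0-255.255.255.255 range described above.

```python
# Illustrative sketch: flag Azure SQL server firewall rules that allow any IPv4 address.
# Subscription, resource group, and server names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "my-rg"                                   # placeholder
SERVER_NAME = "my-sql-server"                              # placeholder

client = SqlManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
for rule in client.firewall_rules.list_by_server(RESOURCE_GROUP, SERVER_NAME):
    if rule.start_ip_address == "0.0.0.0" and rule.end_ip_address == "255.255.255.255":
        print(f"Rule '{rule.name}' allows any IP (0.0.0.0/0)")
    # A rule spanning 0.0.0.0-0.0.0.0 (often named AllowAllWindowsAzureIps) may indicate
    # that 'Allow Azure services and resources to access this server' is enabled.
    elif rule.start_ip_address == "0.0.0.0" and rule.end_ip_address == "0.0.0.0":
        print(f"Rule '{rule.name}' allows traffic from Azure services")
```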

๐Ÿ’ผ 6.3 Ensure no SQL Databases allow ingress 0.0.0.0/0 (ANY IP) - Level 1 (Automated).

Ensure that no SQL Databases allow ingress from 0.0.0.0/0 (ANY IP). Implementation note: CIS instructs failing any SQL database that has a firewall rule with a 'Start IP' of 0.0.0.0. This does not align with the rule title and description, which is to block any ingress traffic. Therefore, the assigned policy of this rule will fail for any SQL Server that has a firewall rule with 'Start IP' 0.0.0.0 and 'End IP' 255.255.255.255. This range covers any IP (0.0.0.0/0). In addition, the benchmark instructs verifying that 'Allow Azure services and resources to access this service' is disabled. This option has nothing to do with ANY IP. It simply checks whether the server allows traffic from Azure services (from any subscription). We decided to create a dedicated rule to check if this feature is enabled.

๐Ÿ’ผ 6.3 Ensure no SQL Databases allow ingress 0.0.0.0/0 (ANY IP) - Level 1 (Automated).

Ensure that no SQL Databases allow ingress from 0.0.0.0/0 (ANY IP). Implementation note: CIS instructs failing any SQL database that has a firewall rule with a 'Start IP' of 0.0.0.0. This does not align with the rule title and description, which is to block any ingress traffic. Therefore, the assigned policy of this rule will fail for any SQL Server that has a firewall rule with 'Start IP' 0.0.0.0 and 'End IP' 255.255.255.255. This range covers any IP (0.0.0.0/0). In addition, the benchmark instructs verifying that 'Allow Azure services and resources to access this service' is disabled. This option has nothing to do with ANY IP. It simply checks whether the server allows traffic from Azure services (from any subscription). We decided to create a dedicated rule to check if this feature is enabled.

๐Ÿ’ผ 6.3 Information security awareness, education and training

Personnel of the organization and relevant interested parties shall receive appropriate information security awareness, education and training and regular updates of the organization's information security policy, topic-specific policies and procedures, as relevant for their job function.

๐Ÿ’ผ 6.3 Periodic Identity Reviews

Security Best Practices for Identity services should include operational reviews that periodically ensure the integrity and necessity of accounts and permissions. These operational practices should be performed regularly on a cadence that is based on your organization's policy or compliance requirements. NOTE: The recommendations in this section may not have a precise audit or remediation procedure because they may not be a configurable setting as much as they are an operative task that should be performed on a periodic basis.

๐Ÿ’ผ 6.3.1 Ensure Application Insights are Configured (Automated)

Application Insights within Azure act as an Application Performance Monitoring solution providing valuable data into how well an application performs and additional information when performing incident response. The types of log data collected include application metrics, telemetry data, and application trace logging data providing organizations with detailed information about application activity and application transactions. Both data sets help organizations adopt a proactive and retroactive means to handle security and performance related metrics within their modern applications.

๐Ÿ’ผ 6.3.1 Security vulnerabilities are identified and managed.

As follows: - New security vulnerabilities are identified using industry-recognized sources for security vulnerability information, including alerts from international and national computer emergency response teams (CERTs). - Vulnerabilities are assigned a risk ranking based on industry best practices and consideration of potential impact. - Risk rankings identify, at a minimum, all vulnerabilities considered to be a high-risk or critical to the environment. - Vulnerabilities for bespoke and custom, and third-party software (for example operating systems and databases) are covered.

๐Ÿ’ผ 6.3.1 Security vulnerabilities are identified and managed.

As follows: - New security vulnerabilities are identified using industry-recognized sources for security vulnerability information, including alerts from international and national computer emergency response teams (CERTs). - Vulnerabilities are assigned a risk ranking based on industry best practices and consideration of potential impact. - Risk rankings identify, at a minimum, all vulnerabilities considered to be a high-risk or critical to the environment. - Vulnerabilities for bespoke and custom, and third-party software (for example operating systems and databases) are covered.

๐Ÿ’ผ 6.3.2 Review custom code prior to release to production or customers in order to identify any potential coding vulnerability.

Include at least the following: - Code changes are reviewed by individuals other than the originating code author, and by individuals knowledgeable about code-review techniques and secure coding practices. - Code reviews ensure code is developed according to secure coding guidelines - Appropriate corrections are implemented prior to release. - Code-review results are reviewed and approved by management prior to release. This requirement for code reviews applies to all custom code (both internal and public-facing), as part of the system development life cycle. Code reviews can be conducted by knowledgeable internal personnel or third parties. Public-facing web applications are also subject to additional controls, to address ongoing threats and vulnerabilities after implementation, as defined at PCI DSS Requirement 6.6.

๐Ÿ’ผ 6.3.2 Ensure that the 'cross db ownership chaining' database flag for Cloud SQL SQL Server instance is set to 'off' - Level 1 (Automated)

It is recommended to set the `cross db ownership chaining` database flag for a Cloud SQL SQL Server instance to `off`. This flag is deprecated for all SQL Server versions in GCP. Going forward, you can't set its value to on. However, if you have this flag enabled, we strongly recommend that you either remove the flag from your database or set it to off. For cross-database access, use the [Microsoft tutorial for signing stored procedures with a certificate](https://learn.microsoft.com/en-us/sql/relational-databases/tutorial-signing-stored-procedures-with-a-certificate?view=sql-server-ver16).

๐Ÿ’ผ 6.3.3 All system components are protected from known vulnerabilities by installing applicable security patches/updates

As follows: - Critical or high-security patches/updates (identified according to the risk ranking process at Requirement 6.3.1) are installed within one month of release. - All other applicable security patches/updates are installed within an appropriate time frame as determined by the entity's assessment of the criticality of the risk to the environment as identified according to the risk ranking process at Requirement 6.3.1.

๐Ÿ’ผ 6.4 Disciplinary process

A disciplinary process shall be formalized and communicated to take actions against personnel and other relevant interested parties who have committed an information security policy violation.

๐Ÿ’ผ 6.4 Ensure that Azure Monitor Resource Logging is Enabled for All Services that Support it (Manual)

Resource Logs capture activity in the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself; for example, reading or updating a secret from a Key Vault. Currently, 95 Azure resources support Azure Monitoring (see the more information section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps, and CosmosDB. The content of these logs varies by resource type. A number of back-end services were not configured to log and store Resource Logs for certain activities or for a sufficient length of time. It is crucial that monitoring is correctly configured to log all relevant activities and retain those logs for a sufficient length of time. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended.

๐Ÿ’ผ 6.4.1 For public-facing web applications, new threats and vulnerabilities are addressed on an ongoing basis and these applications are protected against known attacks.

as follows: - Reviewing public-facing web applications via manual or automated application vulnerability security assessment tools or methods as follows: - At least once every 12 months and after significant changes. - By an entity that specializes in application security. - Including, at a minimum, all common software attacks in Requirement 6.2.4. - All vulnerabilities are ranked in accordance with requirement 6.3.1. - All vulnerabilities are corrected. - The application is re-evaluated after the corrections OR - Installing an automated technical solution(s) that continually detects and prevents web-based attacks as follows: - Installed in front of public-facing web applications to detect and prevent web-based attacks. - Actively running and up to date as applicable. - Generating audit logs. - Configured to either block web-based attacks or generate an alert that is immediately investigated.

๐Ÿ’ผ 6.4.1 For public-facing web applications, new threats and vulnerabilities are addressed on an ongoing basis and these applications are protected against known attacks.

as follows: - Reviewing public-facing web applications via manual or automated application vulnerability security assessment tools or methods as follows: - At least once every 12 months and after significant changes. - By an entity that specializes in application security. - Including, at a minimum, all common software attacks in Requirement 6.2.4. - All vulnerabilities are ranked in accordance with requirement 6.3.1. - All vulnerabilities are corrected. - The application is re-evaluated after the corrections OR - Installing an automated technical solution(s) that continually detects and prevents web-based attacks as follows: - Installed in front of public-facing web applications to detect and prevent web-based attacks. - Actively running and up to date as applicable. - Generating audit logs. - Configured to either block web-based attacks or generate an alert that is immediately investigated.

๐Ÿ’ผ 6.5 Address common coding vulnerabilities in software-development processes.

As follows: - Train developers at least annually in up-to-date secure coding techniques, including how to avoid common coding vulnerabilities. - Develop applications based on secure coding guidelines. The vulnerabilities listed at 6.5.1 through 6.5.10 were current with industry best practices when this version of PCI DSS was published. However, as industry best practices for vulnerability management are updated (for example, the OWASP Guide, SANS CWE Top 25, CERT Secure Coding, etc.), the current best practices must be used for these requirements.

๐Ÿ’ผ 6.5 Ensure the default security group of every VPC restricts all traffic (Automated)

A VPC comes with a default security group whose initial settings deny all inbound traffic, allow all outbound traffic, and allow all traffic between instances assigned to the security group. If a security group is not specified when an instance is launched, it is automatically assigned to this default security group. Security groups provide stateful filtering of ingress/egress network traffic to AWS resources. It is recommended that the default security group restrict all traffic, both inbound and outbound. The default VPC in every region should have its default security group updated to comply with the following: - No inbound rules. - No outbound rules. Any newly created VPCs will automatically contain a default security group that will need remediation to comply with this recommendation. **Note**: When implementing this recommendation, VPC flow logging is invaluable in determining the least privilege port access required by systems to work properly, as it can log all packet acceptances and rejections occurring under the current security groups. This dramatically reduces the primary barrier to least privilege engineering by discovering the minimum ports required by systems in the environment. Even if the VPC flow logging recommendation in this benchmark is not adopted as a permanent security measure, it should be used during any period of discovery and engineering for least privileged security groups.

๐Ÿ’ผ 6.5.1 Changes to all system components in the production environment are made according to established procedures.

That include: - Reason for, and description of, the change. - Documentation of security impact. - Documented change approval by authorized parties. - Testing to verify that the change does not adversely impact system security. - For bespoke and custom software changes, all updates are tested for compliance with Requirement 6.2.4 before being deployed into production. - Procedures to address failures and return to a secure state.

๐Ÿ’ผ 6.5.1 Changes to all system components in the production environment are made according to established procedures.

That include: - Reason for, and description of, the change. - Documentation of security impact. - Documented change approval by authorized parties. - Testing to verify that the change does not adversely impact system security. - For bespoke and custom software changes, all updates are tested for compliance with Requirement 6.2.4 before being deployed into production. - Procedures to address failures and return to a secure state.

๐Ÿ’ผ 6.6 For public-facing web applications, address new threats and vulnerabilities on an ongoing basis and ensure these applications are protected against known attacks.

By either of the following methods: - Reviewing public-facing web applications via manual or automated application vulnerability security assessment tools or methods, at least annually and after any changes This assessment is not the same as the vulnerability scans performed for Requirement 11.2. - Installing an automated technical solution that detects and prevents web-based attacks (for example, a web-application firewall) in front of public-facing web applications, to continually check all traffic.

๐Ÿ’ผ 6.7 Remote working

Security measures shall be implemented when personnel are working remotely to protect information accessed, processed or stored outside the organization's premises.

๐Ÿ’ผ 6.8 Ensure that a 'Custom banned password list' is set to 'Enforce' (Manual)

Microsoft Azure applies a default global banned password list to all user and admin accounts that are created and managed directly in Microsoft Entra ID. The Microsoft Entra password policy does not apply to user accounts that are synchronized from an on-premises Active Directory environment, unless Microsoft Entra ID Connect is used and EnforceCloudPasswordPolicyForPasswordSyncedUsers is enabled. Review the Default Value section for more detail on the password policy. For increased password security, a custom banned password list is recommended

๐Ÿ’ผ 6.9 Storage

This section contains recommendations relating to security-related configurations for storage in GKE.

๐Ÿ’ผ 63 Evaluation of the design of information security controls of third parties and related parties necessitates an understanding of the controls in place or planned. This can be maintained over time through a combination of interviews, surveys, control testing, certifications, contractual reviews, attestations and independent assurance assessments. Controls identified can then be compared to common industry controls and considered in light of controls within the regulated entity as well as the nature of the information assets involved. Any capability gaps identified would be addressed in a timely manner.

💼 66 Under CPS 234, an APRA-regulated entity is required to have robust mechanisms in place to detect and respond to actual or potential compromises of information security in a timely manner. The term 'potential' is used to highlight that information security incidents are commonly identified when an event occurs (e.g. unauthorised access notification, customer complaint) requiring further investigation in order to ascertain whether an actual security compromise has occurred.

๐Ÿ’ผ 67 Detection mechanisms typically include scanning, sensing and logging mechanisms which can be used to identify potential information security incidents. Monitoring processes could include the identification of unusual patterns of behaviour and logging that facilitates investigation and preserves forensic evidence. The strength and nature of monitoring controls would typically be commensurate with the impact of an information security incident. Monitoring processes would consider the broad set of events, ranging from the physical hardware layer to higher order business activities such as payments and changes to user access.

๐Ÿ’ผ 7 BigQuery

This section addresses Google Cloud Platform BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

๐Ÿ’ผ 7 BigQuery

This section addresses Google Cloud Platform BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

๐Ÿ’ผ 7 BigQuery

This section addresses Google Cloud Platform BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

๐Ÿ’ผ 7 BigQuery

This section addresses Google Cloud Platform BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

๐Ÿ’ผ 7 BigQuery

This section addresses Google Cloud Platform BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in.

๐Ÿ’ผ 7 Networking

This section covers security recommendations to follow in order to set networking policies on an Azure subscription.

๐Ÿ’ผ 7 Virtual Machines

This section covers security recommendations to follow in order to set virtual machine policies on an Azure subscription.

๐Ÿ’ผ 7 Virtual Machines

This section covers security recommendations to follow in order to set virtual machine policies on an Azure subscription.

๐Ÿ’ผ 7 Virtual Machines

This section covers security recommendations to follow for the configuration of Virtual Machines on an Azure subscription.

๐Ÿ’ผ 7 Virtual Machines

This section covers security recommendations to follow for the configuration of Virtual Machines on an Azure subscription.

๐Ÿ’ผ 7 Virtual Machines

This section covers security recommendations to follow for the configuration of Virtual Machines on an Azure subscription.

๐Ÿ’ผ 7.1 Ensure an Azure Bastion Host Exists - Level 2 (Automated)

The Azure Bastion service allows secure remote access to Azure Virtual Machines over the Internet without exposing remote access protocol ports and services directly to the Internet. The Azure Bastion service provides this access using TLS over 443/TCP, and subscribes to hardened configurations within an organization's Azure Active Directory service.
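As a rough illustration, here is a sketch using azure-mgmt-network (the subscription ID is a placeholder) that lists Bastion hosts in a subscription to confirm at least one exists.

```python
# Illustrative sketch: confirm that at least one Azure Bastion host exists.
# The subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
bastions = list(client.bastion_hosts.list())
if not bastions:
    print("No Azure Bastion host found in this subscription.")
for bastion in bastions:
    print(f"Bastion host: {bastion.name} ({bastion.location})")
```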

๐Ÿ’ผ 7.1 Ensure an Azure Bastion Host Exists - Level 2 (Automated)

The Azure Bastion service allows secure remote access to Azure Virtual Machines over the Internet without exposing remote access protocol ports and services directly to the Internet. The Azure Bastion service provides this access using TLS over 443/TCP, and subscribes to hardened configurations within an organization's Azure Active Directory service.

๐Ÿ’ผ 7.1 Logging and Monitoring

This section covers security recommendations to follow for logging and monitoring policies on an Azure Subscription. Scoping: A necessary exercise for effective and efficient use of Logging and Monitoring For recommendations contained in this section, it is crucial that your organization consider and settle on the scope of application for each recommendation individually. The scope of application cannot be realistically written in a generic prescriptive way within these recommendations, so a scoping exercise is strongly recommended. A scoping exercise will help you determine which resources are "in scope" and will receive partial or complete logging and monitoring treatment, and which resources are "out of scope" and will not receive any logging and monitoring treatment. Your objectives with the scoping exercise should be to: • Produce a clear classification of resources • Understand the control requirements of any relevant security or compliance frameworks • Ensure the appropriate personnel can detect and react to threats • Ensure relevant resources have a historical register for accountability and investigation • Minimize alert fatigue and cost Release Environments provide a helpful context for understanding scope from a DevOps perspective. For example: 1. Production Environment 2. Staging Environment 3. Testing Environment 4. Development Environment While resources considered in the scope of a Production Environment might have a full set of recommendations applied for logging and monitoring, other release environments might have a limited set of recommendations applied for the sake of accountability. The names of these environments and which resources are in the scope of each environment will vary from one organization to another.

๐Ÿ’ผ 7.1.1 Configuring Diagnostic Settings

The Azure Diagnostic Settings capture control/management activities performed on a subscription or Azure AD Tenant. By default, the Azure Portal retains activity logs only for 90 days. The Diagnostic Settings define the type of events that are stored or streamed and the outputs (storage account, log analytics workspace, event hub, and others). The Diagnostic Settings, if configured properly, can ensure that all logs are retained for a longer duration. This section has recommendations for correctly configuring the Diagnostic Settings so that all logs captured are retained for longer periods. Azure Subscriptions: When configuring Diagnostic Settings, you may choose to export in one of four ways, for each of which you need to ensure appropriate data retention. The options are Log Analytics workspace, Event Hub, Storage Account, and Partner Solutions. It is important to be aware of this and to set retention as your organization sees fit. Azure AD Logs: In order to retain sign-in logs, user account changes, application provisioning logs, or other logs that are visible only on the Tenant in Azure AD, separate Diagnostic Settings must be specified. Deployment by Policy: Deploying Azure diagnostics should ideally be done by policy to ensure a consistent configuration. Microsoft provides a full set of policies for all diagnostic-capable resource types in their GitHub repository. If you choose to deploy by policy, it is best to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Azure Sentinel. Be aware that this has a cost attached to it.
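
As an illustration only, the sketch below uses the `azure-mgmt-monitor` Python SDK to create a subscription-level Diagnostic Setting that routes Activity Log categories to a Log Analytics workspace. The subscription ID, workspace ID, setting name, and category list are placeholders, and the `subscription_diagnostic_settings` operations group is assumed to behave as in recent SDK versions; adjust the categories and retention to your organization's scoping decisions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

# Placeholder identifiers - substitute your own subscription and workspace.
subscription_id = "00000000-0000-0000-0000-000000000000"
workspace_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/logging-rg"
    "/providers/Microsoft.OperationalInsights/workspaces/central-logs"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Subscription-scoped Diagnostic Setting that streams Activity Log categories
# to the Log Analytics workspace above.
client.subscription_diagnostic_settings.create_or_update(
    name="subscription-activity-to-law",
    parameters={
        "workspace_id": workspace_id,
        "logs": [
            {"category": category, "enabled": True}
            for category in ("Administrative", "Security", "Policy", "Alert")
        ],
    },
)
```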

๐Ÿ’ผ 7.1.1.2 Ensure Diagnostic Setting captures appropriate categories (Automated)

Prerequisite: A Diagnostic Setting must exist. If a Diagnostic Setting does not exist, the navigation and options within this recommendation will not be available. Please review the recommendation at the beginning of this subsection titled: "Ensure that a 'Diagnostic Setting' exists." The diagnostic setting should be configured to log the appropriate activities from the control/management plane.

๐Ÿ’ผ 7.1.2 Monitoring using Activity Log Alerts

The recommendations provided in this section are intended to provide entry-level alerting for crucial activities on a tenant account. These recommended activities should be tuned to your needs. By default, each of these Activity Log Alerts tends to guide the reader to alerting at the "Subscription-wide" level which will capture and alert on rules triggered by all resources and resource groups contained within a subscription. This is not an ideal rule set for Alerting within larger and more complex organizations. While this section provides recommendations for the creation of Activity Log Alerts specifically, Microsoft Azure supports four different types of alerts: • Metric Alerts • Log Alerts • Activity Log Alerts • Smart Detection Alerts. All Azure services (Microsoft provided or otherwise) that can generate alerts are assigned a "Resource provider namespace" when they are registered in an Azure tenant. The recommendations in this section are in no way exhaustive of the plethora of available "Providers" or "Resource Types." The Resource Providers that are registered in your Azure Tenant can be located in your Subscription. Each registered Provider in your environment may have available "Conditions" to raise alerts via Activity Log Alerts. These providers should be considered for inclusion in Activity Log Alert rules of your own making.

๐Ÿ’ผ 7.1.3.1 Ensure Application Insights are Configured (Automated)

Application Insights within Azure act as an Application Performance Monitoring solution providing valuable data into how well an application performs and additional information when performing incident response. The types of log data collected include application metrics, telemetry data, and application trace logging data providing organizations with detailed information about application activity and application transactions. Both data sets help organizations adopt a proactive and retroactive means to handle security and performance related metrics within their modern applications.

๐Ÿ’ผ 7.1.4 Ensure that Azure Monitor Resource Logging is Enabled for All Services that Support it (Manual)

Resource Logs capture activity to the data access plane while the Activity log is a subscription-level log for the control plane. Resource-level diagnostic logs provide insight into operations that were performed within that resource itself; for example, reading or updating a secret from a Key Vault. Currently, 95 Azure resources support Azure Monitoring (see the more information section for a complete list), including Network Security Groups, Load Balancers, Key Vault, AD, Logic Apps, and CosmosDB. The content of these logs varies by resource type. A number of back-end services were not configured to log and store Resource Logs for certain activities or for a sufficient length of time. It is crucial that monitoring is correctly configured to log all relevant activities and retain those logs for a sufficient length of time. Given that the mean time to detection in an enterprise is 240 days, a minimum retention period of two years is recommended.

๐Ÿ’ผ 7.1.5 Ensure that SKU Basic/Consumption is not used on artifacts that need to be monitored (Particularly for Production Workloads) (Manual)

The use of Basic or Free SKUs in Azure, whilst cost-effective, has significant limitations in terms of what can be monitored and what support can be realized from Microsoft. Typically, these SKUs do not have a service SLA, and Microsoft may refuse to provide support for them. Consequently, Basic/Free SKUs should never be used for production workloads.

๐Ÿ’ผ 7.10 Storage media

Storage media shall be managed through their life cycle of acquisition, use, transportation and disposal in accordance with the organization's classification scheme and handling requirements.

๐Ÿ’ผ 7.2 Ensure that all BigQuery Tables are encrypted with Customer-managed encryption key (CMEK) - Level 2 (Automated | Roadmapped)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets. If CMEK is used, the CMEK is used to encrypt the data encryption keys instead of Google-managed encryption keys. Will be supported in the near future.

๐Ÿ’ผ 7.2 Ensure That All BigQuery Tables Are Encrypted With Customer-Managed Encryption Key (CMEK) - Level 2 (Automated)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets. If CMEK is used, the CMEK is used to encrypt the data encryption keys instead of Google-managed encryption keys.

๐Ÿ’ผ 7.2 Ensure That All BigQuery Tables Are Encrypted With Customer-Managed Encryption Key (CMEK) - Level 2 (Automated)

BigQuery by default encrypts data at rest by employing Envelope Encryption using Google-managed cryptographic keys. The data is encrypted using data encryption keys, and the data encryption keys themselves are further encrypted using key encryption keys. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets. If CMEK is used, the CMEK is used to encrypt the data encryption keys instead of Google-managed encryption keys.

๐Ÿ’ผ 7.2 Ensure That All BigQuery Tables Are Encrypted With Customer-Managed Encryption Key (CMEK) - Level 2 (Automated)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets. If CMEK is used, the CMEK is used to encrypt the data encryption keys instead of Google-managed encryption keys.
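
As a non-authoritative sketch using the `google-cloud-bigquery` Python client, the snippet below creates a table protected by a customer-managed Cloud KMS key; the project, dataset, table, schema, and key names are all placeholders.

```python
from google.cloud import bigquery

# Placeholder resource names.
kms_key_name = (
    "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
)
table_id = "my-project.my_dataset.my_table"

client = bigquery.Client()

table = bigquery.Table(table_id, schema=[bigquery.SchemaField("id", "STRING")])
# Attach the CMEK so BigQuery wraps the table's data encryption keys with it.
table.encryption_configuration = bigquery.EncryptionConfiguration(
    kms_key_name=kms_key_name
)
table = client.create_table(table)
print(table.encryption_configuration.kms_key_name)
```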

๐Ÿ’ผ 7.2 Ensure that Resource Locks are set for Mission-Critical Azure Resources (Manual)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.
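
A minimal sketch, assuming the `azure-mgmt-resource` Python SDK, that applies a `CanNotDelete` lock to a resource group; the subscription ID, group name, and lock name are placeholders, and `ReadOnly` is the stricter alternative level.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.locks import ManagementLockClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = ManagementLockClient(DefaultAzureCredential(), subscription_id)

# Apply a delete lock to everything in the resource group; use "ReadOnly"
# instead to also block modifications.
client.management_locks.create_or_update_at_resource_group_level(
    resource_group_name="mission-critical-rg",   # placeholder
    lock_name="do-not-delete",                   # placeholder
    parameters={
        "level": "CanNotDelete",
        "notes": "Protects mission-critical resources from accidental deletion.",
    },
)
```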

๐Ÿ’ผ 7.3 Ensure that a Default Customer-managed encryption key (CMEK) is specified for all BigQuery Data Sets - Level 2 (Automated | Roadmapped)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets. Will be supported in the near future.

๐Ÿ’ผ 7.3 Ensure That a Default Customer-Managed Encryption Key (CMEK) Is Specified for All BigQuery Data Sets - Level 2 (Automated)

BigQuery by default encrypts data at rest by employing Envelope Encryption using Google-managed cryptographic keys. The data is encrypted using data encryption keys, and the data encryption keys themselves are further encrypted using key encryption keys. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets.

๐Ÿ’ผ 7.3 Ensure That a Default Customer-Managed Encryption Key (CMEK) Is Specified for All BigQuery Data Sets - Level 2 (Automated)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets.

๐Ÿ’ผ 7.3 Ensure That a Default Customer-Managed Encryption Key (CMEK) Is Specified for All BigQuery Data Sets - Level 2 (Manual)

BigQuery by default encrypts data at rest by employing `Envelope Encryption` using Google-managed cryptographic keys. The data is encrypted using `data encryption keys`, and the data encryption keys themselves are further encrypted using `key encryption keys`. This is seamless and does not require any additional input from the user. However, if you want greater control, Customer-managed encryption keys (CMEK) can be used as the encryption key management solution for BigQuery Data Sets.
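
As an illustrative sketch with the `google-cloud-bigquery` Python client, the snippet below sets a default customer-managed key on a dataset so that new tables inherit it; the dataset and Cloud KMS key names are placeholders.

```python
from google.cloud import bigquery

# Placeholder Cloud KMS key and dataset.
kms_key_name = (
    "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
)

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")

# New tables created in this dataset will default to this CMEK.
dataset.default_encryption_configuration = bigquery.EncryptionConfiguration(
    kms_key_name=kms_key_name
)
client.update_dataset(dataset, ["default_encryption_configuration"])
```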

๐Ÿ’ผ 7.6 [Legacy] Ensure that VHDs are Encrypted - Level 2 (Manual)

**NOTE: This is a legacy recommendation. Managed Disks are encrypted by default and recommended for all new VM implementations.** VHD (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines. The blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Microsoft Defender will then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK. This should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 7.7 [Legacy] Ensure that VHDs are Encrypted - Level 2 (Manual)

NOTE: This is a legacy recommendation. Managed Disks are encrypted by default and recommended for all new VM implementations. VHD (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines. The blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Microsoft Defender will then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK. This should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 7.7 [Legacy] Ensure that VHDs are Encrypted - Level 2 (Manual)

**NOTE: This is a legacy recommendation. Managed Disks are encrypted by default and recommended for all new VM implementations.** VHD (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines. The blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Microsoft Defender will then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK. This should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 7.7 Ensure that VHD's are encrypted - Level 2 (Manual | Not supported, requires a manual assessment)

VHDs (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines; the blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Azure Defender (Security Centre) would then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK, and this should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 7.7 Ensure that VHD's are Encrypted - Level 2 (Manual | Not supported, requires a manual assessment)

VHDs (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines; the blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Azure Defender (Security Centre) would then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK, and this should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 7.9 Ensure Trusted Launch is enabled on Virtual Machines - Level 1 (Automated)

When **Secure Boot** and **vTPM** are enabled together, they provide a strong foundation for protecting your VM from boot attacks. For example, if an attacker attempts to replace the bootloader with a malicious version, Secure Boot will prevent the VM from booting. If the attacker is able to bypass Secure Boot and install a malicious bootloader, vTPM can be used to detect the intrusion and alert you.

๐Ÿ’ผ 8 Dataproc

Dataproc, a service within Google Cloud Platform (GCP), offers a fully managed and easy-to-use service for running Apache Spark and Apache Hadoop clusters. It simplifies the management of big data processing and analytics by handling the underlying infrastructure, allowing users to focus on data analysis rather than operational complexities. Dataproc is notable for its quick start-up and scaling capabilities, accommodating data loads from gigabytes to petabytes efficiently. It seamlessly integrates with other GCP services like BigQuery, Cloud Storage, and Cloud Bigtable, enhancing data processing and transfer capabilities. Additionally, its cost-effectiveness, with a pay-as-you-go model, makes it an attractive option for businesses seeking scalable and efficient big data solutions. [https://cloud.google.com/dataproc](https://cloud.google.com/dataproc?hl=en)

๐Ÿ’ผ 8 Key Vault

This section covers security recommendations to follow for the configuration and use of Azure Key Vault.

๐Ÿ’ผ 8 Key Vault

This section covers security recommendations to follow for the configuration and use of Azure Key Vault.

๐Ÿ’ผ 8 Key Vault

This section covers security recommendations to follow for the configuration and use of Azure Key Vault.

๐Ÿ’ผ 8 Virtual Machines

This section covers security recommendations to follow for the configuration of Virtual Machines on an Azure subscription.

๐Ÿ’ผ 8.1 Ensure an Azure Bastion Host Exists (Automated)

The Azure Bastion service allows secure remote access to Azure Virtual Machines over the Internet without exposing remote access protocol ports and services directly to the Internet. The Azure Bastion service provides this access using TLS over 443/TCP, and subscribes to hardened configurations within an organization's Azure Active Directory service.

๐Ÿ’ผ 8.1 Ensure that Dataproc Cluster is encrypted using Customer-Managed Encryption Key - Level 2 (Automated)

When you use Dataproc, cluster and job data is stored on Persistent Disks (PDs) associated with the Compute Engine VMs in your cluster and in a Cloud Storage staging bucket. This PD and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). The CMEK feature allows you to create, use, and revoke the key encryption key (KEK). Google still controls the data encryption key (DEK).
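
A hedged sketch using the `google-cloud-dataproc` Python client: the cluster request below sets `gce_pd_kms_key_name` so that Persistent Disks are protected with a customer-managed key. The project, region, cluster name, key, and machine sizes are placeholders, and the configuration is deliberately minimal.

```python
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder
kms_key = (
    "projects/my-project/locations/us-central1/"
    "keyRings/my-ring/cryptoKeys/my-key"
)

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "cmek-cluster",
    "config": {
        # Persistent Disk data encryption keys are wrapped with this CMEK.
        "encryption_config": {"gce_pd_kms_key_name": kms_key},
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # block until the cluster is created
```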

๐Ÿ’ผ 8.11 Data masking

Data masking shall be used in accordance with the organization's topic-specific policy on access control and other related topic-specific policies, and business requirements, taking applicable legislation into consideration.

๐Ÿ’ผ 8.11 Ensure Trusted Launch is enabled on Virtual Machines (Automated)

When **Secure Boot** and **vTPM** are enabled together, they provide a strong foundation for protecting your VM from boot attacks. For example, if an attacker attempts to replace the bootloader with a malicious version, Secure Boot will prevent the VM from booting. If the attacker is able to bypass Secure Boot and install a malicious bootloader, vTPM can be used to detect the intrusion and alert you.
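
An audit-style sketch, assuming the `azure-mgmt-compute` Python SDK, that flags VMs whose security profile does not have Trusted Launch with both Secure Boot and vTPM enabled; the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Flag VMs that do not have Trusted Launch fully enabled.
for vm in client.virtual_machines.list_all():
    profile = vm.security_profile
    trusted = (
        profile is not None
        and profile.security_type == "TrustedLaunch"
        and profile.uefi_settings is not None
        and bool(profile.uefi_settings.secure_boot_enabled)
        and bool(profile.uefi_settings.v_tpm_enabled)
    )
    if not trusted:
        print(f"{vm.name}: Trusted Launch is not fully enabled")
```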

๐Ÿ’ผ 8.13 Information backup

Backup copies of information, software and systems shall be maintained and regularly tested in accordance with the agreed topic-specific policy on backup.

๐Ÿ’ผ 8.15 Logging

Logs that record activities, exceptions, faults and other relevant events shall be produced, stored, protected and analysed.

๐Ÿ’ผ 8.16 Monitoring activities

Networks, systems and applications shall be monitored for anomalous behaviour and appropriate actions taken to evaluate potential information security incidents.

๐Ÿ’ผ 8.2 Ensure Virtual Machines are utilizing Managed Disks (Automated)

Migrate blob-based VHDs to Managed Disks on Virtual Machines to exploit the default features of this configuration. The features include: 1. Default Disk Encryption 2. Resilience, as Microsoft will manage the disk storage and move it if the underlying hardware becomes faulty 3. Reduction of costs over storage accounts
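
A minimal audit-style sketch using the `azure-mgmt-compute` Python SDK that reports VMs still using unmanaged (blob-based) VHDs; the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for vm in client.virtual_machines.list_all():
    os_disk = vm.storage_profile.os_disk
    # Managed Disk VMs carry a managed_disk reference; unmanaged VHDs expose vhd.uri.
    if os_disk.managed_disk is None:
        print(f"{vm.name} still uses an unmanaged VHD: {os_disk.vhd.uri}")
```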

๐Ÿ’ผ 8.2.2 Group, shared, or generic accounts, or other shared authentication credentials are only used when necessary on an exception basis.

Manage as follows: - ID use is prevented unless needed for an exceptional circumstance. - Use is limited to the time needed for the exceptional circumstance. - Business justification for use is documented. - Use is explicitly approved by management. - Individual user identity is confirmed before access to an account is granted. - Every action taken is attributable to an individual user.

๐Ÿ’ผ 8.2.2 Group, shared, or generic accounts, or other shared authentication credentials are only used when necessary on an exception basis.

Manage as follows: - Account use is prevented unless needed for an exceptional circumstance. - Use is limited to the time needed for the exceptional circumstance. - Business justification for use is documented. - Use is explicitly approved by management. - Individual user identity is confirmed before access to an account is granted. - Every action taken is attributable to an individual user.

๐Ÿ’ผ 8.3 Ensure that Resource Locks are set for mission critical Azure resources

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change, and they can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 8.3 Ensure that Resource Locks are set for mission critical Azure resources - Level 2 (Manual | Not supported, requires a manual assessment)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 8.4 Ensure the key vault is recoverable

The key vault contains object keys, secrets and certificates. Accidental unavailability of a key vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the key vault objects. It is recommended the key vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data including storage accounts, SQL databases, and/or dependent services provided by key vault objects (Keys, Secrets, Certificates) etc., as may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user.

๐Ÿ’ผ 8.4 Ensure the key vault is recoverable - Level 1 (Automated)

The key vault contains object keys, secrets and certificates. Accidental unavailability of a key vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the key vault objects. It is recommended the key vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data including storage accounts, SQL databases, and/or dependent services provided by key vault objects (Keys, Secrets, Certificates) etc., as may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user.

๐Ÿ’ผ 8.5 Ensure that Resource Locks are set for Mission Critical Azure Resources - Level 2 (Manual | Not supported, requires a manual assessment)

Resource Manager Locks provide a way for administrators to lock down Azure resources to prevent deletion of, or modifications to, a resource. These locks sit outside of the Role Based Access Controls (RBAC) hierarchy and, when applied, will place restrictions on the resource for all users. These locks are very useful when there is an important resource in a subscription that users should not be able to delete or change. Locks can help prevent accidental and malicious changes or deletion.

๐Ÿ’ผ 8.5 Ensure the Key Vault is Recoverable - Level 1 (Automated)

The Key Vault contains object keys, secrets, and certificates. Accidental unavailability of a Key Vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the Key Vault objects. It is recommended the Key Vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data, including storage accounts, SQL databases, and/or dependent services provided by Key Vault objects (Keys, Secrets, Certificates) etc. This may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user. WARNING: A current limitation of the soft-delete feature across all Azure services is role assignments disappearing when Key Vault is deleted. All role assignments will need to be recreated after recovery.

๐Ÿ’ผ 8.5 Ensure the Key Vault is Recoverable - Level 1 (Automated)

The Key Vault contains object keys, secrets, and certificates. Accidental unavailability of a Key Vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the Key Vault objects. It is recommended the Key Vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data, including storage accounts, SQL databases, and/or dependent services provided by Key Vault objects (Keys, Secrets, Certificates) etc. This may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user. WARNING: A current limitation of the soft-delete feature across all Azure services is role assignments disappearing when Key Vault is deleted. All role assignments will need to be recreated after recovery.

๐Ÿ’ผ 8.5 Ensure the Key Vault is Recoverable - Level 1 (Automated)

The Key Vault contains object keys, secrets, and certificates. Accidental unavailability of a Key Vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the Key Vault objects. It is recommended the Key Vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data, including storage accounts, SQL databases, and/or dependent services provided by Key Vault objects (Keys, Secrets, Certificates) etc. This may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user. WARNING: A current limitation is that role assignments disappear when a Key Vault is deleted. All role assignments will need to be recreated after recovery.

๐Ÿ’ผ 8.5 Secure authentication

Secure authentication technologies and procedures shall be implemented based on information access restrictions and the topic-specific policy on access control.

๐Ÿ’ผ 8.5.1 MFA systems are implemented.

As follows: - The MFA system is not susceptible to replay attacks. - MFA systems cannot be bypassed by any users, including administrative users unless specifically documented, and authorized by management on an exception basis, for a limited time period. - At least two different types of authentication factors are used. - Success of all authentication factors is required before access is granted.

๐Ÿ’ผ 8.5.1 MFA systems are implemented.

As follows: - The MFA system is not susceptible to replay attacks. - MFA systems cannot be bypassed by any users, including administrative users unless specifically documented, and authorized by management on an exception basis, for a limited time period. - At least two different types of authentication factors are used. - Success of all authentication factors is required before access is granted.

๐Ÿ’ผ 8.6 Ensure the key vault is recoverable - Level 1 (Automated)

The key vault contains object keys, secrets and certificates. Accidental unavailability of a key vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the key vault objects. It is recommended the key vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data including storage accounts, SQL databases, and/or dependent services provided by key vault objects (Keys, Secrets, Certificates) etc., as may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user.

๐Ÿ’ผ 8.6.1 If accounts used by systems or applications can be used for interactive login, they are managed.

As follows: - Interactive use is prevented unless needed for an exceptional circumstance. - Interactive use is limited to the time needed for the exceptional circumstance. - Business justification for interactive use is documented. - Interactive use is explicitly approved by management. - Individual user identity is confirmed before access to account is granted. - Every action taken is attributable to an individual user.

๐Ÿ’ผ 8.6.1 If accounts used by systems or applications can be used for interactive login, they are managed.

As follows: - Interactive use is prevented unless needed for an exceptional circumstance. - Interactive use is limited to the time needed for the exceptional circumstance. - Business justification for interactive use is documented. - Interactive use is explicitly approved by management. - Individual user identity is confirmed before access to account is granted. - Every action taken is attributable to an individual user.

๐Ÿ’ผ 8.6.3 Passwords/passphrases for any application and system accounts are protected against misuse.

As follows: - Passwords/passphrases are changed periodically (at the frequency defined in the entity's targeted risk analysis, which is performed according to all elements specified in Requirement 12.3.1) and upon suspicion or confirmation of compromise. - Passwords/passphrases are constructed with sufficient complexity appropriate for how frequently the entity changes the passwords/passphrases.

๐Ÿ’ผ 8.6.3 Passwords/passphrases for any application and system accounts are protected against misuse.

As follows: - Passwords/passphrases are changed periodically (at the frequency defined in the entity's targeted risk analysis, which is performed according to all elements specified in Requirement 12.3.1) and upon suspicion or confirmation of compromise. - Passwords/passphrases are constructed with sufficient complexity appropriate for how frequently the entity changes the passwords/passphrases.

๐Ÿ’ผ 8.7 All access to any database containing cardholder data is restricted.

As follows: - All user access to, user queries of, and user actions on databases are through programmatic methods. - Only database administrators have the ability to directly access or query databases. - Application IDs for database applications can only be used by the applications (and not by individual users or other non-application processes).

๐Ÿ’ผ 8.9 [Legacy] Ensure that VHDs are Encrypted (Manual)

**NOTE: This is a legacy recommendation. Managed Disks are encrypted by default and recommended for all new VM implementations.** VHD (Virtual Hard Disks) are stored in blob storage and are the old-style disks that were attached to Virtual Machines. The blob VHD was then leased to the VM. By default, storage accounts are not encrypted, and Microsoft Defender will then recommend that the OS disks should be encrypted. Storage accounts can be encrypted as a whole using PMK or CMK. This should be turned on for storage accounts containing VHDs.

๐Ÿ’ผ 8.9 Configuration management

Configurations, including security configurations, of hardware, software, services and networks shall be established, documented, implemented, monitored and reviewed.

๐Ÿ’ผ 82 Under CPS 234, an APRA-regulated entity must ensure that testing is conducted by appropriately skilled and functionally independent specialists. For an APRA-regulated entity to have confidence in the quality of testing, it is important that testers are sufficiently independent in order to provide a bias-free assessment of controls (i.e. unimpeded by a conflict of interest). This includes the use of testers who do not have operational responsibility for the controls being validated. The level of functional independence required would typically be determined by the nature and importance of the testing.

๐Ÿ’ผ 83 Internal audit is an important vehicle by which the Board can gain assurance that information security is maintained. This assurance would typically be achieved through the inclusion of information security within the APRA-regulated entity's internal audit plan. The Board could also choose to gain assurance through expert opinion or other means to complement the assurance provided by the internal audit function. This typically occurs where the required skills do not reside within the internal audit function or the area subject to audit pertains to third parties or related parties.

๐Ÿ’ผ 84 Under CPS 234, an APRA-regulated entity's internal audit function must review the design and operating effectiveness of information security controls. In APRA's view, an approach which achieves comprehensive assurance would involve an audit program which assesses all aspects of the information security control environment over time. The frequency at which areas to be audited are assessed would take into account the impact of an information security compromise and the ability to place reliance on other control testing undertaken. Additional assurance work may be triggered by changes to vulnerabilities and threats or material changes to IT assets.

๐Ÿ’ผ 9.1 Ensure 'HTTPS Only' is set to 'On' (Automated)

Azure App Service allows apps to run under both HTTP and HTTPS by default. Apps can be accessed by anyone using non-secure HTTP links by default. Non-secure HTTP requests can be restricted and all HTTP requests redirected to the secure HTTPS port. It is recommended to enforce HTTPS-only traffic.
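
As a hedged example with the `azure-mgmt-web` Python SDK, the snippet below flags apps that still allow plain HTTP; a possible remediation is left in comments. The subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = WebSiteManagementClient(DefaultAzureCredential(), subscription_id)

for site in client.web_apps.list():
    if not site.https_only:
        print(f"{site.name}: 'HTTPS Only' is off")
        # Possible remediation (resubmits the site definition with HTTPS enforced):
        # site.https_only = True
        # client.web_apps.begin_create_or_update(
        #     site.resource_group, site.name, site
        # ).result()
```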

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set on Azure App Service

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching the API app, or authenticate those that have tokens before they reach the API app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set on Azure App Service - Level 2 (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching the API app, or authenticate those that have tokens before they reach the API app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set up for apps in Azure App Service - Level 2 (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching the API app, or authenticate those that have tokens before they reach the API app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set up for apps in Azure App Service - Level 2 (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching the API app, or authenticate those that have tokens before they reach the API app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set up for apps in Azure App Service - Level 2 (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching a Web Application or authenticate those with tokens before they reach the app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Ensure App Service Authentication is set up for apps in Azure App Service - Level 2 (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching a Web Application or authenticate those with tokens before they reach the app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.

๐Ÿ’ผ 9.1 Microsoft Defender for Cloud

This subsection provides guidance on the use of Microsoft Defender for Cloud and associated product plans. This guidance is intended to ensure that, at a minimum, the protective measures offered by these plans are being considered. Organizations may find that they have existing products or services that provide the same utility as some Microsoft Defender for Cloud products. Security and Administrative personnel need to make the determination on their organization's behalf regarding which, if any, of these recommendations are relevant to their organization's needs. In consideration of the above, and because of the potential for increased cost and complexity, please be aware that all Microsoft Defender for Cloud and associated plan recommendations are profiled as "Level 2" recommendations.

๐Ÿ’ผ 9.1.1 Use either video cameras or access control mechanisms (or both) to monitor individual physical access to sensitive areas.

Review collected data and correlate with other entries. Store for at least three months, unless otherwise restricted by law. "Sensitive areas" refers to any data center, server room or any area that houses systems that store, process, or transmit cardholder data. This excludes public-facing areas where only point-of-sale terminals are present, such as the cashier areas in a retail store.

๐Ÿ’ผ 9.1.1 Microsoft Cloud Security Posture Management (CSPM)

Microsoft Defender for Cloud offers foundational and advanced Cloud Security Posture Management (CSPM) solutions to protect across multi-cloud and hybrid environments. Both solutions cover PaaS as well as IaaS. CSPM provides reporting functionality on security and regulatory frameworks including NIST 800 series, ISO 27001, PCI-DSS, CIS Benchmarks and Controls, and many more. CSPM also provides the ability to create your own custom framework, but this will require significant work. Regulatory standards are reported in a compliance dashboard which offers a summarized view against deployed standards and presents the ability to download compliance reports in various formats. CSPM has two types of implementations: 1. Foundational (Free): This implementation is free and enabled by default with a limited set of features including: • Continuous assessment of the security configuration of cloud resources • Security recommendations to fix misconfigurations and weaknesses • Secure score summarizing current overall security posture 2. Full CSPM (Paid): Full CSPM is a paid product offering additional functionality including: • Identity and role assignments discovery • Network exposure detection • Attack path analysis • Cloud security explorer for risk hunting • Agentless vulnerability scanning • Agentless secrets scanning • Governance rules to drive timely remediation and accountability • Regulatory compliance and industry best practices • Data-aware security posture • Agentless discovery for Kubernetes • Agentless container vulnerability assessment. For full CSPM, it is recommended that a cost review is undertaken prior to implementation, particularly if your tenant is heavy on IaaS, and matched to security requirements.

๐Ÿ’ผ 9.1.16 Ensure that Microsoft Defender External Attack Surface Monitoring (EASM) is enabled (Manual)

An organization's attack surface is the collection of assets with a public network identifier or URI that an external threat actor can see or access from outside your cloud. It is the set of points on the boundary of a system, a system element, system component, or an environment where an attacker can try to enter, cause an effect on, or extract data from, that system, system element, system component, or environment. The larger the attack surface, the harder it is to protect. This tool can be configured to scan your organization's online infrastructure such as specified domains, hosts, CIDR blocks, and SSL certificates, and store them in an Inventory. Inventory items can be added, reviewed, approved, and removed, and may contain enrichments ("insights") and additional information collected from the tool's different scan engines and open-source intelligence sources. A Defender EASM workspace will generate an Inventory of publicly exposed assets by crawling and scanning the internet using Seeds you provide when setting up the tool. Seeds can be FQDNs, IP CIDR blocks, and WHOIS records. Defender EASM will generate Insights within 24-48 hours after Seeds are provided, and these insights include vulnerability data (CVEs), ports and protocols, and weak or expired SSL certificates that could be used by an attacker for reconnaissance or exploitation. Results are classified High/Medium/Low and some of them include proposed mitigations.

๐Ÿ’ผ 9.1.2 Defender Plan: APIs

Defender for APIs in Microsoft Defender for Cloud offers full lifecycle protection, detection, and response coverage for APIs published in Azure API Management. Defender for APIs helps you to gain visibility into business-critical APIs. You can investigate and improve your API security posture, prioritize vulnerability fixes, and quickly detect active real-time threats. Defender for APIs requires additional configuration in the Microsoft API portal. Note: There is a cost attached to using Defender for API.

๐Ÿ’ผ 9.1.3.1 Ensure that Defender for Servers is set to 'On' (Automated)

The Defender for Servers plan in Microsoft Defender for Cloud reduces security risk by providing actionable recommendations to improve and remediate machine security posture. Defender for Servers also helps to protect machines against real-time security threats and attacks. Defender for Servers offers two paid plans: Plan 1 The following components are enabled by default: - Log Analytics agent (deprecated) - Endpoint protection Plan 1 also offers the following components, disabled by default: - Vulnerability assessment for machines - Guest Configuration agent (preview) Plan 2 The following components are enabled by default: - Log Analytics agent (deprecated) - Vulnerability assessment for machines - Endpoint protection - Agentless scanning for machines Plan 2 also offers the following components, disabled by default: - Guest Configuration agent (preview) - File Integrity Monitoring
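
A sketch, under the assumption that a recent `azure-mgmt-security` release is in use in which `SecurityCenter` takes only a credential and subscription ID and `pricings.update` accepts the plan name and a pricing body; the subscription ID is a placeholder, and the sub-plan value is shown only as a commented option.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.security import SecurityCenter

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = SecurityCenter(DefaultAzureCredential(), subscription_id)

# "Standard" enables the paid Defender for Servers plan; "Free" disables it.
client.pricings.update(
    pricing_name="VirtualMachines",
    pricing={"pricing_tier": "Standard"},  # optionally add "sub_plan": "P2" for Plan 2
)
```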

๐Ÿ’ผ 9.1.3.3 Ensure that 'Endpoint protection' component status is set to 'On' (Manual)

The Endpoint protection component enables Microsoft Defender for Endpoint (formerly 'Advanced Threat Protection' or 'ATP' or 'WDATP' - see additional info) to communicate with Microsoft Defender for Cloud. IMPORTANT: When enabling integration between DfE & DfC, it needs to be taken into account that this will have some side effects that may be undesirable. 1. For Server 2019 and above, if Defender is installed (the default for these server SKUs), this will trigger a deployment of the new unified agent and link to any of the extended configuration in the Defender portal. 2. If the new unified agent is required for server SKUs of Win 2016 or Linux and lower, there is additional integration that needs to be switched on, and agents need to be aligned.

๐Ÿ’ผ 9.1.4.1 Ensure That Microsoft Defender for Containers Is Set To 'On' (Automated)

Microsoft Defender for Containers helps improve, monitor, and maintain the security of containerized assets (including Kubernetes clusters, nodes, workloads, container registries, and images) across multi-cloud and on-premises environments. By default, when enabling the plan through the Azure Portal, Microsoft Defender for Containers automatically configures the following components: • Agentless scanning for machines • Defender sensor for runtime protection • Azure Policy for enforcing security best practices • K8S API access for monitoring and threat detection • Registry access for vulnerability assessment. Note: Microsoft Defender for Container Registries ('ContainerRegistry') is deprecated and has been replaced by Microsoft Defender for Containers ('Containers').

๐Ÿ’ผ 9.2 Ensure App Service Authentication is set up for apps in Azure App Service (Automated)

Azure App Service Authentication is a feature that can prevent anonymous HTTP requests from reaching a Web Application or authenticate those with tokens before they reach the app. If an anonymous request is received from a browser, App Service will redirect to a logon page. To handle the logon process, a choice from a set of identity providers can be made, or a custom authentication mechanism can be implemented.
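
As an illustrative check with the `azure-mgmt-web` Python SDK, the snippet below reads an app's authentication settings and reports whether App Service Authentication is enabled; the subscription ID, resource group, and app name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
resource_group = "my-rg"                                  # placeholder
app_name = "my-app"                                       # placeholder

client = WebSiteManagementClient(DefaultAzureCredential(), subscription_id)

auth = client.web_apps.get_auth_settings(resource_group, app_name)
print(f"{app_name}: App Service Authentication enabled = {bool(auth.enabled)}")
```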

๐Ÿ’ผ 9.3 Key Vault

This section covers security recommendations to follow for the configuration and use of Azure Key Vault.

๐Ÿ’ผ 9.3.10 Ensure that Azure Key Vault Managed HSM is used when required (Manual)

Azure Key Vault Managed HSM is a fully managed, highly available, single-tenant cloud service that safeguards cryptographic keys using FIPS 140-2 Level 3 validated HSMs. Note: This recommendation to use Managed HSM applies only to scenarios where specific regulatory and compliance requirements mandate the use of a dedicated hardware security module.

๐Ÿ’ผ 9.3.5 Ensure the Key Vault is Recoverable (Automated)

Key Vaults contain object keys, secrets, and certificates. Deletion of a Key Vault can cause immediate data loss or loss of security functions (authentication, validation, verification, non-repudiation, etc.) supported by the Key Vault objects. It is recommended the Key Vault be made recoverable by enabling the "Do Not Purge" and "Soft Delete" functions. This is in order to prevent loss of encrypted data, including storage accounts, SQL databases, and/or dependent services provided by Key Vault objects (Keys, Secrets, Certificates) etc. This may happen in the case of accidental deletion by a user or from disruptive activity by a malicious user. NOTE: In February 2025, Microsoft will enable soft-delete protection on all key vaults, and users will no longer be able to opt out of or turn off soft-delete. WARNING: A current limitation is that role assignments disappear when a Key Vault is deleted. All role assignments will need to be recreated after recovery.
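
A minimal audit sketch with the `azure-mgmt-keyvault` Python SDK that reports vaults missing soft delete or purge protection; the subscription ID is a placeholder. Note that purge protection, once enabled, cannot be turned off.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.keyvault import KeyVaultManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = KeyVaultManagementClient(DefaultAzureCredential(), subscription_id)

for vault in client.vaults.list_by_subscription():
    props = vault.properties
    soft_delete = bool(props.enable_soft_delete)
    purge_protection = bool(props.enable_purge_protection)
    if not (soft_delete and purge_protection):
        print(
            f"{vault.name}: soft delete={soft_delete}, "
            f"purge protection={purge_protection}"
        )
```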

๐Ÿ’ผ 9.3.7 Ensure that Public Network Access when using Private Endpoint is disabled (Automated)

When Private endpoint is configured on a Key Vault, connections from Azure resources within the same subnet will use its private IP address. However, network traffic from the public internet can still connect to the Key Vault's public endpoint (mykeyvault.vault.azure.net) using its public IP address unless Public network access is set to "Disabled". Setting Public network access to "Disabled" with a Private Endpoint will remove the Vault's public endpoint from Azure public DNS, reducing its exposure to the public internet. Network traffic will use the Vault private endpoint IP address for all requests (mykeyvault.vault.privatelink.azure.net).
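
A hedged sketch, assuming the `public_network_access` property exposed by recent `azure-mgmt-keyvault` API versions; the subscription ID, resource group, and vault name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.keyvault import KeyVaultManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
resource_group = "my-rg"                                  # placeholder
vault_name = "mykeyvault"                                 # placeholder

client = KeyVaultManagementClient(DefaultAzureCredential(), subscription_id)

# Disable the public endpoint; traffic then flows only through the
# private endpoint address (mykeyvault.vault.privatelink.azure.net).
client.vaults.update(
    resource_group_name=resource_group,
    vault_name=vault_name,
    parameters={"properties": {"public_network_access": "Disabled"}},
)
```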

๐Ÿ’ผ 9.4.1 Ensure an Azure Bastion Host Exists (Automated)

The Azure Bastion service allows secure remote access to Azure Virtual Machines over the Internet without exposing remote access protocol ports and services directly to the Internet. The Azure Bastion service provides this access using TLS over 443/TCP, and subscribes to hardened configurations within an organization's Azure Active Directory service.
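
An audit-style sketch, assuming the `azure-mgmt-network` Python SDK, that simply reports whether any Azure Bastion host exists in the subscription; the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

bastions = list(client.bastion_hosts.list())
if bastions:
    for bastion in bastions:
        print(f"Bastion host found: {bastion.name} ({bastion.location})")
else:
    print("No Azure Bastion host exists in this subscription")
```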

๐Ÿ’ผ 9.5.1.3 Training is provided for personnel in POI environments to be aware of attempted tampering or replacement of POI devices.

Includes: - Verifying the identity of any third-party persons claiming to be repair or maintenance personnel, before granting them access to modify or troubleshoot devices. - Procedures to ensure devices are not installed, replaced, or returned without verification. - Being aware of suspicious behavior around devices. - Reporting suspicious behavior and indications of device tampering or substitution to appropriate personnel.

๐Ÿ’ผ 9.5.1.3 Training is provided for personnel in POI environments to be aware of attempted tampering or replacement of POI devices.

Includes: - Verifying the identity of any third-party persons claiming to be repair or maintenance personnel, before granting them access to modify or troubleshoot devices. - Procedures to ensure devices are not installed, replaced, or returned without verification. - Being aware of suspicious behavior around devices. - Reporting suspicious behavior and indications of device tampering or substitution to appropriate personnel.

๐Ÿ’ผ 9.6 Ensure that 'Basic Authentication' is 'Disabled' (Manual)

Basic Authentication provides the ability to create identities and authentication for an App Service without a centralized Identity Provider. For a more effective, capable, and secure solution for Identity, Authentication, Authorization, and Accountability, a centralized Identity Provider such as Entra ID is strongly advised.
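
As a non-authoritative check, assuming the `get_scm_allowed` and `get_ftp_allowed` operations exposed by recent `azure-mgmt-web` versions (which report the basic-auth publishing-credential policies); the subscription ID, resource group, and app name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
resource_group = "my-rg"                                  # placeholder
app_name = "my-app"                                       # placeholder

client = WebSiteManagementClient(DefaultAzureCredential(), subscription_id)

scm = client.web_apps.get_scm_allowed(resource_group, app_name)
ftp = client.web_apps.get_ftp_allowed(resource_group, app_name)
print(
    f"{app_name}: SCM basic auth allowed={scm.allow}, "
    f"FTP basic auth allowed={ftp.allow}"
)
```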

๐Ÿ’ผ 9.9.3 Provide training for personnel to be aware of attempted tampering or replacement of devices.

Training should include the following: - Verify the identity of any third-party persons claiming to be repair or maintenance personnel, prior to granting them access to modify or troubleshoot devices. - Do not install, replace, or return devices without verification. - Be aware of suspicious behavior around devices (for example, attempts by unknown persons to unplug or open devices). - Report suspicious behavior and indications of device tampering or substitution to appropriate personnel (for example, to a manager or security officer).

๐Ÿ’ผ A.11.1 Secure areas

To prevent unauthorized physical access, damage and interference to the organization's information and information processing facilities.

๐Ÿ’ผ A.11.1.6 Delivery and loading areas

Access points such as delivery and loading areas and other points where unauthorized persons could enter the premises shall be controlled and, if possible, isolated from information processing facilities to avoid unauthorized access.

๐Ÿ’ผ A.11.2.3 Cabling security

Power and telecommunications cabling carrying data or supporting information services shall be protected from interception, interference or damage.

๐Ÿ’ผ A.12.1.2 Change management

Changes to the organization, business processes, information processing facilities and systems that affect information security shall be controlled.

๐Ÿ’ผ A.12.4.1 Event logging

Event logs recording user activities, exceptions, faults and information security events shall be produced, kept and regularly reviewed.

๐Ÿ’ผ A.12.6.1 Management of technical vulnerabilities

Information about technical vulnerabilities of information systems being used shall be obtained in a timely fashion, the organization's exposure to such vulnerabilities evaluated and appropriate measures taken to address the associated risk.

๐Ÿ’ผ A.13.1.2 Security of network services

Security mechanisms, service levels and management requirements of all network services shall be identified and included in network services agreements, whether these services are provided in-house or outsourced.

๐Ÿ’ผ A.15.2.2 Managing changes to supplier services

Changes to the provision of services by suppliers, including maintaining and improving existing information security policies, procedures and controls, shall be managed, taking account of the criticality of business information, systems and processes involved and re-assessment of risks.

๐Ÿ’ผ A.18.1.2 Intellectual property rights

Appropriate procedures shall be implemented to ensure compliance with legislative, regulatory and contractual requirements related to intellectual property rights and use of proprietary software products.

๐Ÿ’ผ A.18.1.3 Protection of records

Records shall be protected from loss, destruction, falsification, unauthorized access and unauthorized release, in accordance with legislatory, regulatory, contractual and business requirements.

๐Ÿ’ผ A.18.2.1 Independent review of information security

The organization's approach to managing information security and its implementation (i.e. control objectives, controls, policies, processes and procedures for information security) shall be reviewed independently at planned intervals or when significant changes occur.

๐Ÿ’ผ A.6.2.2 Teleworking

A policy and supporting security measures shall be implemented to protect information accessed, processed or stored at teleworking sites.

๐Ÿ’ผ A.7.1.1 Screening

Background verification checks on all candidates for employment shall be carried out in accordance with relevant laws, regulations and ethics and shall be proportional to the business requirements, the classification of the information to be accessed and the perceived risks.

๐Ÿ’ผ A.8.1.1 Inventory of assets

Assets associated with information and information processing facilities shall be identified and an inventory of these assets shall be drawn up and maintained.

๐Ÿ’ผ A.8.1.3 Acceptable use of assets

Rules for the acceptable use of information and of assets associated with information and information processing facilities shall be identified, documented and implemented.

๐Ÿ’ผ A.8.1.4 Return of assets

All employees and external party users shall return all of the organizational assets in their possession upon termination of their employment, contract or agreement.

๐Ÿ’ผ A.8.2.2 Labelling of information

An appropriate set of procedures for information labelling shall be developed and implemented in accordance with the information classification scheme adopted by the organization.

๐Ÿ’ผ A.8.2.3 Handling of assets

Procedures for handling assets shall be developed and implemented in accordance with the information classification scheme adopted by the organization.

๐Ÿ’ผ A1.1-1 Measures Current Usage

The use of the system components is measured to establish a baseline for capacity management and to use when evaluating the risk of impaired availability due to capacity constraints.

๐Ÿ’ผ A1.1-2 Forecasts Capacity

The expected average and peak use of system components is forecasted and compared to system capacity and associated tolerances. Forecasting considers capacity in the event of the failure of system components that constrain capacity.

๐Ÿ’ผ A1.2-1 Identifies Environmental Threats

As part of the risk assessment process, management identifies environmental threats that could impair the availability of the system, including threats resulting from adverse weather, failure of environmental control systems, electrical discharge, fire, and water.

๐Ÿ’ผ A1.2-5 Responds to Environmental Threat Events

Procedures are in place for responding to environmental threat events and for evaluating the effectiveness of those policies and procedures on a periodic basis. This includes automatic mitigation systems (for example, uninterruptible power system and generator back-up subsystem).

๐Ÿ’ผ A1.2-9 Addresses Offsite Storage

Back-up data is stored in a location at a distance from its principal storage location sufficient that the likelihood of a security or environmental threat event affecting both sets of data is reduced to an appropriate level.

๐Ÿ’ผ A1.3-1 Implements Business Continuity Plan Testing

Business continuity plan testing is performed on a periodic basis. The testing includes (1) development of testing scenarios based on threat likelihood and magnitude; (2) consideration of system components from across the entity that can impair the availability; (3) scenarios that consider the potential for the lack of availability of key personnel; and (4) revision of continuity plans and systems based on test results.

๐Ÿ’ผ AC-1 ACCESS CONTROL POLICY AND PROCEDURES

The organization: AC-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: AC-1a.1. An access control policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and AC-1a.2. Procedures to facilitate the implementation of the access control policy and associated access controls; and AC-1b. Reviews and updates the current: AC-1b.1. Access control policy [Assignment: organization-defined frequency]; and AC-1b.2. Access control procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ AC-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] access control policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the access control policy and the associated access controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the access control policy and procedures; and c. Review and update the current access control: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ AC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] access control policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the access control policy and the associated access controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the access control policy and procedures; and c. Review and update the current access control: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] access control policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the access control policy and the associated access controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the access control policy and procedures; and c. Review and update the current access control: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] access control policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the access control policy and the associated access controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the access control policy and procedures; and c. Review and update the current access control: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AC-10 CONCURRENT SESSION CONTROL

The information system limits the number of concurrent sessions for each [Assignment: organization-defined account and/or account type] to [Assignment: organization-defined number].

๐Ÿ’ผ AC-10 Concurrent Session Control (H)

Limit the number of concurrent sessions for each [Assignment: organization-defined account and/or account type] to [FedRAMP Assignment: three (3) sessions for privileged access and two (2) sessions for non-privileged access].

๐Ÿ’ผ AC-10 Concurrent Session Control (H)

Limit the number of concurrent sessions for each [Assignment: organization-defined account and/or account type] to [FedRAMP Assignment: three (3) sessions for privileged access and two (2) sessions for non-privileged access].
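
At the application layer, a ceiling like this is typically enforced by refusing to open a session once the account's limit is reached. The sketch below is a minimal, framework-agnostic illustration that mirrors the FedRAMP ceilings quoted above; the class and account names are hypothetical.

```python
# Sketch: refuse new sessions once a per-account ceiling is reached.
# Ceilings mirror the FedRAMP assignment above; names are hypothetical.
from collections import defaultdict

SESSION_LIMITS = {"privileged": 3, "non-privileged": 2}

class ConcurrentSessionLimiter:
    def __init__(self):
        self._active = defaultdict(set)   # account -> set of session IDs

    def open_session(self, account: str, account_type: str, session_id: str) -> bool:
        limit = SESSION_LIMITS[account_type]
        if len(self._active[account]) >= limit:
            return False                  # deny: ceiling reached
        self._active[account].add(session_id)
        return True

    def close_session(self, account: str, session_id: str) -> None:
        self._active[account].discard(session_id)

limiter = ConcurrentSessionLimiter()
assert limiter.open_session("admin-01", "privileged", "s1")
assert limiter.open_session("admin-01", "privileged", "s2")
assert limiter.open_session("admin-01", "privileged", "s3")
assert not limiter.open_session("admin-01", "privileged", "s4")  # fourth session denied
```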

๐Ÿ’ผ AC-11 Device Lock

a. Prevent further access to the system by [Selection (one or more): initiating a device lock after [Assignment: organization-defined time period] of inactivity; requiring the user to initiate a device lock before leaving the system unattended]; and b. Retain the device lock until the user reestablishes access using established identification and authentication procedures.

๐Ÿ’ผ AC-11 Device Lock (M)(H)

a. Prevent further access to the system by [Selection (one-or-more): initiating a device lock after [FedRAMP Assignment: fifteen (15) minutes] of inactivity; requiring the user to initiate a device lock before leaving the system unattended]; and b. Retain the device lock until the user re-establishes access using established identification and authentication procedures.

๐Ÿ’ผ AC-11 Device Lock (M)(H)

a. Prevent further access to the system by [Selection (one-or-more): initiating a device lock after [FedRAMP Assignment: fifteen (15) minutes] of inactivity; requiring the user to initiate a device lock before leaving the system unattended]; and b. Retain the device lock until the user re-establishes access using established identification and authentication procedures.

๐Ÿ’ผ AC-11 SESSION LOCK

The information system: AC-11a. Prevents further access to the system by initiating a session lock after [Assignment: organization-defined time period] of inactivity or upon receiving a request from a user; and AC-11b. Retains the session lock until the user reestablishes access using established identification and authentication procedures.

๐Ÿ’ผ AC-12 (1) USER-INITIATED LOGOUTS | MESSAGE DISPLAYS

The information system: AC-12 (1)(a) Provides a logout capability for user-initiated communications sessions whenever authentication is used to gain access to [Assignment: organization-defined information resources]; and AC-12 (1)(b) Displays an explicit logout message to users indicating the reliable termination of authenticated communications sessions.

๐Ÿ’ผ AC-12 SESSION TERMINATION

The information system automatically terminates a user session after [Assignment: organization-defined conditions or trigger events requiring session disconnect].
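
One common trigger event is inactivity. The sketch below is a minimal illustration of server-side termination after an assumed 15-minute inactivity trigger; the in-memory store and names are hypothetical, and a real system would also log each termination.

```python
# Sketch: terminate sessions automatically once an inactivity trigger fires.
# The 15-minute trigger and in-memory store are illustrative assumptions.
import time

INACTIVITY_TRIGGER_SECONDS = 15 * 60

class SessionStore:
    def __init__(self):
        self._last_seen = {}              # session ID -> last-activity timestamp

    def touch(self, session_id: str) -> None:
        self._last_seen[session_id] = time.time()

    def terminate_inactive(self) -> list[str]:
        now = time.time()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > INACTIVITY_TRIGGER_SECONDS]
        for sid in expired:
            del self._last_seen[sid]      # invalidate the session server-side
        return expired

store = SessionStore()
store.touch("session-abc")
# A background task would call store.terminate_inactive() periodically
# and record each terminated session for audit purposes.
```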

๐Ÿ’ผ AC-14 Permitted Actions Without Identification or Authentication

a. Identify [Assignment: organization-defined user actions] that can be performed on the system without identification or authentication consistent with organizational mission and business functions; and b. Document and provide supporting rationale in the security plan for the system, user actions not requiring identification or authentication.

๐Ÿ’ผ AC-14 PERMITTED ACTIONS WITHOUT IDENTIFICATION OR AUTHENTICATION

The organization: AC-14a. Identifies [Assignment: organization-defined user actions] that can be performed on the information system without identification or authentication consistent with organizational missions/business functions; and AC-14b. Documents and provides supporting rationale in the security plan for the information system, user actions not requiring identification or authentication.

๐Ÿ’ผ AC-14 Permitted Actions Without Identification or Authentication (L)(M)(H)

a. Identify [Assignment: organization-defined user actions] that can be performed on the system without identification or authentication consistent with organizational mission and business functions; and b. Document and provide supporting rationale in the security plan for the system, user actions not requiring identification or authentication.

๐Ÿ’ผ AC-14 Permitted Actions Without Identification or Authentication (L)(M)(H)

a. Identify [Assignment: organization-defined user actions] that can be performed on the system without identification or authentication consistent with organizational mission and business functions; and b. Document and provide supporting rationale in the security plan for the system, user actions not requiring identification or authentication.

๐Ÿ’ผ AC-14 Permitted Actions Without Identification or Authentication (L)(M)(H)

a. Identify [Assignment: organization-defined user actions] that can be performed on the system without identification or authentication consistent with organizational mission and business functions; and b. Document and provide supporting rationale in the security plan for the system, user actions not requiring identification or authentication.

๐Ÿ’ผ AC-16 (1) DYNAMIC ATTRIBUTE ASSOCIATION

The information system dynamically associates security attributes with [Assignment: organization-defined subjects and objects] in accordance with [Assignment: organization-defined security policies] as information is created and combined.

๐Ÿ’ผ AC-16 (5) ATTRIBUTE DISPLAYS FOR OUTPUT DEVICES

The information system displays security attributes in human-readable form on each object that the system transmits to output devices to identify [Assignment: organization-identified special dissemination, handling, or distribution instructions] using [Assignment: organization-identified human-readable, standard naming conventions].

๐Ÿ’ผ AC-16 (9) ATTRIBUTE REASSIGNMENT

The organization ensures that security attributes associated with information are reassigned only via re-grading mechanisms validated using [Assignment: organization-defined techniques or procedures].

๐Ÿ’ผ AC-16 Security and Privacy Attributes

a. Provide the means to associate [Assignment: organization-defined types of security and privacy attributes] with [Assignment: organization-defined security and privacy attribute values] for information in storage, in process, and/or in transmission; b. Ensure that the attribute associations are made and retained with the information; c. Establish the following permitted security and privacy attributes from the attributes defined in AC-16a for [Assignment: organization-defined systems]: [Assignment: organization-defined security and privacy attributes]; d. Determine the following permitted attribute values or ranges for each of the established attributes: [Assignment: organization-defined attribute values or ranges for established attributes]; e. Audit changes to attributes; and f. Review [Assignment: organization-defined security and privacy attributes] for applicability [Assignment: organization-defined frequency].

๐Ÿ’ผ AC-16 SECURITY ATTRIBUTES

The organization: AC-16a. Provides the means to associate [Assignment: organization-defined types of security attributes] having [Assignment: organization-defined security attribute values] with information in storage, in process, and/or in transmission; AC-16b. Ensures that the security attribute associations are made and retained with the information; AC-16c. Establishes the permitted [Assignment: organization-defined security attributes] for [Assignment: organization-defined information systems]; and AC-16d. Determines the permitted [Assignment: organization-defined values or ranges] for each of the established security attributes.
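
As one possible implementation pattern, security attributes can be bound to stored information as object metadata. The sketch below tags and re-reads an S3 object; the bucket, key, and attribute values are assumptions for illustration.

```python
# Sketch: bind security attributes to stored information via S3 object tags.
# The bucket, key, and attribute values are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_object_tagging(
    Bucket="example-records-bucket",          # hypothetical
    Key="reports/2024/q1-summary.csv",        # hypothetical
    Tagging={
        "TagSet": [
            {"Key": "classification", "Value": "confidential"},
            {"Key": "handling", "Value": "internal-only"},
        ]
    },
)

# Retrieve the attributes later so they remain associated with the object.
tags = s3.get_object_tagging(
    Bucket="example-records-bucket",
    Key="reports/2024/q1-summary.csv",
)["TagSet"]
print({t["Key"]: t["Value"] for t in tags})
```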

๐Ÿ’ผ AC-17 (4) PRIVILEGED COMMANDS | ACCESS

The organization: AC-17 (4)(a) Authorizes the execution of privileged commands and access to security-relevant information via remote access only for [Assignment: organization-defined needs]; and AC-17 (4)(b) Documents the rationale for such access in the security plan for the information system.

๐Ÿ’ผ AC-17 Remote Access

a. Establish and document usage restrictions, configuration/connection requirements, and implementation guidance for each type of remote access allowed; and b. Authorize each type of remote access to the system prior to allowing such connections.

๐Ÿ’ผ AC-17 REMOTE ACCESS

The organization: AC-17a. Establishes and documents usage restrictions, configuration/connection requirements, and implementation guidance for each type of remote access allowed; and AC-17b. Authorizes remote access to the information system prior to allowing such connections.

๐Ÿ’ผ AC-17 Remote Access (L)(M)(H)

a. Establish and document usage restrictions, configuration/connection requirements, and implementation guidance for each type of remote access allowed; and b. Authorize each type of remote access to the system prior to allowing such connections.

๐Ÿ’ผ AC-17 Remote Access (L)(M)(H)

a. Establish and document usage restrictions, configuration/connection requirements, and implementation guidance for each type of remote access allowed; and b. Authorize each type of remote access to the system prior to allowing such connections.

๐Ÿ’ผ AC-17 Remote Access (L)(M)(H)

a. Establish and document usage restrictions, configuration/connection requirements, and implementation guidance for each type of remote access allowed; and b. Authorize each type of remote access to the system prior to allowing such connections.

๐Ÿ’ผ AC-17(4) Privileged Commands and Access (M)(H)

(a) Authorize the execution of privileged commands and access to security-relevant information via remote access only in a format that provides assessable evidence and for the following needs: [Assignment: organization-defined needs]; and (b) Document the rationale for remote access in the security plan for the system.

๐Ÿ’ผ AC-17(4) Privileged Commands and Access (M)(H)

(a) Authorize the execution of privileged commands and access to security-relevant information via remote access only in a format that provides assessable evidence and for the following needs: [Assignment: organization-defined needs]; and (b) Document the rationale for remote access in the security plan for the system.

๐Ÿ’ผ AC-17(4) Remote Access | Privileged Commands and Access

(a) Authorize the execution of privileged commands and access to security-relevant information via remote access only in a format that provides assessable evidence and for the following needs: [Assignment: organization-defined needs]; and (b) Document the rationale for remote access in the security plan for the system.

๐Ÿ’ผ AC-18 Wireless Access

a. Establish configuration requirements, connection requirements, and implementation guidance for each type of wireless access; and b. Authorize each type of wireless access to the system prior to allowing such connections.

๐Ÿ’ผ AC-18 WIRELESS ACCESS

The organization: AC-18a. Establishes usage restrictions, configuration/connection requirements, and implementation guidance for wireless access; and AC-18b. Authorizes wireless access to the information system prior to allowing such connections.

๐Ÿ’ผ AC-18 Wireless Access (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for each type of wireless access; and b. Authorize each type of wireless access to the system prior to allowing such connections.

๐Ÿ’ผ AC-18 Wireless Access (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for each type of wireless access; and b. Authorize each type of wireless access to the system prior to allowing such connections.

๐Ÿ’ผ AC-18 Wireless Access (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for each type of wireless access; and b. Authorize each type of wireless access to the system prior to allowing such connections.

๐Ÿ’ผ AC-19 (4) RESTRICTIONS FOR CLASSIFIED INFORMATION

The organization: AC-19 (4)(a) Prohibits the use of unclassified mobile devices in facilities containing information systems processing, storing, or transmitting classified information unless specifically permitted by the authorizing official; and AC-19 (4)(b) Enforces the following restrictions on individuals permitted by the authorizing official to use unclassified mobile devices in facilities containing information systems processing, storing, or transmitting classified information: AC-19 (4)(b)(1) Connection of unclassified mobile devices to classified information systems is prohibited; AC-19 (4)(b)(2) Connection of unclassified mobile devices to unclassified information systems requires approval from the authorizing official; AC-19 (4)(b)(3) Use of internal or external modems or wireless interfaces within the unclassified mobile devices is prohibited; and AC-19 (4)(b)(4) Unclassified mobile devices and the information stored on those devices are subject to random reviews and inspections by [Assignment: organization-defined security officials], and if classified information is found, the incident handling policy is followed. AC-19 (4)(c) Restricts the connection of classified mobile devices to classified information systems in accordance with [Assignment: organization-defined security policies].

๐Ÿ’ผ AC-19 Access Control for Mobile Devices

a. Establish configuration requirements, connection requirements, and implementation guidance for organization-controlled mobile devices, to include when such devices are outside of controlled areas; and b. Authorize the connection of mobile devices to organizational systems.

๐Ÿ’ผ AC-19 ACCESS CONTROL FOR MOBILE DEVICES

The organization: AC-19a. Establishes usage restrictions, configuration requirements, connection requirements, and implementation guidance for organization-controlled mobile devices; and AC-19b. Authorizes the connection of mobile devices to organizational information systems.

๐Ÿ’ผ AC-19 Access Control for Mobile Devices (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for organization-controlled mobile devices, to include when such devices are outside of controlled areas; and b. Authorize the connection of mobile devices to organizational systems.

๐Ÿ’ผ AC-19 Access Control for Mobile Devices (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for organization-controlled mobile devices, to include when such devices are outside of controlled areas; and b. Authorize the connection of mobile devices to organizational systems.

๐Ÿ’ผ AC-19 Access Control for Mobile Devices (L)(M)(H)

a. Establish configuration requirements, connection requirements, and implementation guidance for organization-controlled mobile devices, to include when such devices are outside of controlled areas; and b. Authorize the connection of mobile devices to organizational systems.

๐Ÿ’ผ AC-19(4) Access Control for Mobile Devices | Restrictions for Classified Information

(a) Prohibit the use of unclassified mobile devices in facilities containing systems processing, storing, or transmitting classified information unless specifically permitted by the authorizing official; and (b) Enforce the following restrictions on individuals permitted by the authorizing official to use unclassified mobile devices in facilities containing systems processing, storing, or transmitting classified information: (1) Connection of unclassified mobile devices to classified systems is prohibited; (2) Connection of unclassified mobile devices to unclassified systems requires approval from the authorizing official; (3) Use of internal or external modems or wireless interfaces within the unclassified mobile devices is prohibited; and (4) Unclassified mobile devices and the information stored on those devices are subject to random reviews and inspections by [Assignment: organization-defined security officials], and if classified information is found, the incident handling policy is followed. (c) Restrict the connection of classified mobile devices to classified systems in accordance with [Assignment: organization-defined security policies].

๐Ÿ’ผ AC-2 (11) USAGE CONDITIONS

The information system enforces [Assignment: organization-defined circumstances and/or usage conditions] for [Assignment: organization-defined information system accounts].

๐Ÿ’ผ AC-2 (12) ACCOUNT MONITORING | ATYPICAL USAGE

The organization: AC-2 (12)(a) Monitors information system accounts for [Assignment: organization-defined atypical usage]; and AC-2 (12)(b) Reports atypical usage of information system accounts to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ AC-2 (4) AUTOMATED AUDIT ACTIONS

The information system automatically audits account creation, modification, enabling, disabling, and removal actions, and notifies [Assignment: organization-defined personnel or roles].
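
For illustration, account creation and modification events can be surfaced from an audit trail and routed to responsible personnel. The sketch below queries CloudTrail for a few IAM account-management event names over the last 24 hours and publishes a summary to an SNS topic; the topic ARN, event list, and lookback window are assumptions.

```python
# Sketch: surface recent IAM account-management events and notify a role.
# The SNS topic ARN and 24-hour lookback window are illustrative assumptions.
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")
sns = boto3.client("sns")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=24)
events_of_interest = ["CreateUser", "UpdateUser", "DeleteUser", "CreateAccessKey"]

findings = []
for name in events_of_interest:
    resp = cloudtrail.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": name}],
        StartTime=start,
        EndTime=end,
    )
    findings.extend(f"{e['EventTime']} {e['EventName']} by {e.get('Username', 'unknown')}"
                    for e in resp["Events"])

if findings:
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:111122223333:account-changes",  # hypothetical
        Subject="Account management events in the last 24 hours",
        Message="\n".join(findings),
    )
```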

๐Ÿ’ผ AC-2 (5) INACTIVITY LOGOUT

The organization requires that users log out when [Assignment: organization-defined time-period of expected inactivity or description of when to log out].

๐Ÿ’ผ AC-2 (7) ROLE-BASED SCHEMES

The organization: AC-2 (7)(a) Establishes and administers privileged user accounts in accordance with a role-based access scheme that organizes allowed information system access and privileges into roles; AC-2 (7)(b) Monitors privileged role assignments; and AC-2 (7)(c) Takes [Assignment: organization-defined actions] when privileged role assignments are no longer appropriate.

๐Ÿ’ผ AC-2 Account Management

a. Define and document the types of accounts allowed and specifically prohibited for use within the system; b. Assign account managers; c. Require [Assignment: organization-defined prerequisites and criteria] for group and role membership; d. Specify: 1. Authorized users of the system; 2. Group and role membership; and 3. Access authorizations (i.e., privileges) and [Assignment: organization-defined attributes (as required)] for each account; e. Require approvals by [Assignment: organization-defined personnel or roles] for requests to create accounts; f. Create, enable, modify, disable, and remove accounts in accordance with [Assignment: organization-defined policy, procedures, prerequisites, and criteria]; g. Monitor the use of accounts; h. Notify account managers and [Assignment: organization-defined personnel or roles] within: 1. [Assignment: organization-defined time period] when accounts are no longer required; 2. [Assignment: organization-defined time period] when users are terminated or transferred; and 3. [Assignment: organization-defined time period] when system usage or need-to-know changes for an individual; i. Authorize access to the system based on: 1. A valid access authorization; 2. Intended system usage; and 3. [Assignment: organization-defined attributes (as required)]; j. Review accounts for compliance with account management requirements [Assignment: organization-defined frequency]; k. Establish and implement a process for changing shared or group account authenticators (if deployed) when individuals are removed from the group; and l. Align account management processes with personnel termination and transfer processes.

๐Ÿ’ผ AC-2 ACCOUNT MANAGEMENT

The organization: AC-2a. Identifies and selects the following types of information system accounts to support organizational missions/business functions: [Assignment: organization-defined information system account types]; AC-2b. Assigns account managers for information system accounts; AC-2c. Establishes conditions for group and role membership; AC-2d. Specifies authorized users of the information system, group and role membership, and access authorizations (i.e., privileges) and other attributes (as required) for each account; AC-2e. Requires approvals by [Assignment: organization-defined personnel or roles] for requests to create information system accounts; AC-2f. Creates, enables, modifies, disables, and removes information system accounts in accordance with [Assignment: organization-defined procedures or conditions]; AC-2g. Monitors the use of information system accounts; AC-2h. Notifies account managers: AC-2h.1. When accounts are no longer required; AC-2h.2. When users are terminated or transferred; and AC-2h.3. When individual information system usage or need-to-know changes; AC-2i. Authorizes access to the information system based on: AC-2i.1. A valid access authorization; AC-2i.2. Intended system usage; and AC-2i.3. Other attributes as required by the organization or associated missions/business functions; AC-2j. Reviews accounts for compliance with account management requirements [Assignment: organization-defined frequency]; and AC-2k. Establishes a process for reissuing shared/group account credentials (if deployed) when individuals are removed from the group.

๐Ÿ’ผ AC-2 Account Management (L)(M)(H)

a. Define and document the types of accounts allowed and specifically prohibited for use within the system; b. Assign account managers; c. Require [Assignment: organization-defined prerequisites and criteria] for group and role membership; d. Specify: 1. Authorized users of the system; 2. Group and role membership; and 3. Access authorizations (i.e., privileges) and [Assignment: organization-defined attributes (as required)] for each account; e. Require approvals by [Assignment: organization-defined personnel or roles] for requests to create accounts; f. Create, enable, modify, disable, and remove accounts in accordance with [Assignment: organization-defined policy, procedures, prerequisites, and criteria]; g. Monitor the use of accounts; h. Notify account managers and [Assignment: organization-defined personnel or roles] within: 1. [FedRAMP Assignment: twenty-four (24) hours] when accounts are no longer required; 2. [FedRAMP Assignment: eight (8) hours] when users are terminated or transferred; and 3. [FedRAMP Assignment: eight (8) hours] when system usage or need-to-know changes for an individual; i. Authorize access to the system based on: 1. A valid access authorization; 2. Intended system usage; and 3. [Assignment: organization-defined attributes (as required)]; j. Review accounts for compliance with account management requirements [FedRAMP Assignment: quarterly for privileged access, annually for non-privileged access]; k. Establish and implement a process for changing shared or group account authenticators (if deployed) when individuals are removed from the group; and l. Align account management processes with personnel termination and transfer processes.

๐Ÿ’ผ AC-2 Account Management (L)(M)(H)

a. Define and document the types of accounts allowed and specifically prohibited for use within the system; b. Assign account managers; c. Require [Assignment: organization-defined prerequisites and criteria] for group and role membership; d. Specify: 1. Authorized users of the system; 2. Group and role membership; and 3. Access authorizations (i.e., privileges) and [Assignment: organization-defined attributes (as required)] for each account; e. Require approvals by [Assignment: organization-defined personnel or roles] for requests to create accounts; f. Create, enable, modify, disable, and remove accounts in accordance with [Assignment: organization-defined policy, procedures, prerequisites, and criteria]; g. Monitor the use of accounts; h. Notify account managers and [Assignment: organization-defined personnel or roles] within: 1. [FedRAMP Assignment: twenty-four (24) hours] when accounts are no longer required; 2. [FedRAMP Assignment: eight (8) hours] when users are terminated or transferred; and 3. [FedRAMP Assignment: eight (8) hours] when system usage or need-to-know changes for an individual; i. Authorize access to the system based on: 1. A valid access authorization; 2. Intended system usage; and 3. [Assignment: organization-defined attributes (as required)]; j. Review accounts for compliance with account management requirements [FedRAMP Assignment: quarterly for privileged access, annually for non-privileged access]; k. Establish and implement a process for changing shared or group account authenticators (if deployed) when individuals are removed from the group; and l. Align account management processes with personnel termination and transfer processes.

๐Ÿ’ผ AC-2 Account Management (L)(M)(H)

a. Define and document the types of accounts allowed and specifically prohibited for use within the system; b. Assign account managers; c. Require [Assignment: organization-defined prerequisites and criteria] for group and role membership; d. Specify: 1. Authorized users of the system; 2. Group and role membership; and 3. Access authorizations (i.e., privileges) and [Assignment: organization-defined attributes (as required)] for each account; e. Require approvals by [Assignment: organization-defined personnel or roles] for requests to create accounts; f. Create, enable, modify, disable, and remove accounts in accordance with [Assignment: organization-defined policy, procedures, prerequisites, and criteria]; g. Monitor the use of accounts; h. Notify account managers and [Assignment: organization-defined personnel or roles] within: 1. [FedRAMP Assignment: twenty-four (24) hours] when accounts are no longer required; 2. [FedRAMP Assignment: eight (8) hours] when users are terminated or transferred; and 3. [FedRAMP Assignment: eight (8) hours] when system usage or need-to-know changes for an individual; i. Authorize access to the system based on: 1. A valid access authorization; 2. Intended system usage; and 3. [Assignment: organization-defined attributes (as required)]; j. Review accounts for compliance with account management requirements [FedRAMP Assignment: quarterly for privileged access, annually for non-privileged access]; k. Establish and implement a process for changing shared or group account authenticators (if deployed) when individuals are removed from the group; and l. Align account management processes with personnel termination and transfer processes.

๐Ÿ’ผ AC-2(12) Account Monitoring for Atypical Usage (M)(H)

(a) Monitor system accounts for [Assignment: organization-defined atypical usage]; and (b) Report atypical usage of system accounts to [FedRAMP Assignment: at a minimum, the ISSO and/or similar role within the organization]. **AC-2 (12) Additional FedRAMP Requirements and Guidance**: **(a) Requirement**: Required for privileged accounts. **(b) Requirement**: Required for privileged accounts.

๐Ÿ’ผ AC-2(12) Account Monitoring for Atypical Usage (M)(H)

(a) Monitor system accounts for [Assignment: organization-defined atypical usage]; and (b) Report atypical usage of system accounts to [FedRAMP Assignment: at a minimum, the ISSO and/or similar role within the organization]. **AC-2 (12) Additional FedRAMP Requirements and Guidance**: **(a) Requirement**: Required for privileged accounts. **(b) Requirement**: Required for privileged accounts.

๐Ÿ’ผ AC-2(3) Account Management | Disable Accounts

Disable accounts within [Assignment: organization-defined time period] when the accounts: (a) Have expired; (b) Are no longer associated with a user or individual; (c) Are in violation of organizational policy; or (d) Have been inactive for [Assignment: organization-defined time period].

๐Ÿ’ผ AC-2(3) Disable Accounts (M)(H)

Disable accounts within [FedRAMP Assignment: twenty-four (24) hours for user accounts] when the accounts: (a) Have expired; (b) Are no longer associated with a user or individual; (c) Are in violation of organizational policy; or (d) Have been inactive for [FedRAMP Assignment: ninety (90) days (See additional requirements and guidance.)]. **AC-2 (3) Additional FedRAMP Requirements and Guidance:** **Guidance**: For DoD clouds, see DoD cloud website for specific DoD requirements that go above and beyond FedRAMP <https://public.cyber.mil/dccs/> **Requirement**: The service provider defines the time period for non-user accounts (e.g., accounts associated with devices). The time periods are approved and accepted by the JAB/AO. Where user management is a function of the service, reports of activity of consumer users shall be made available. **(d) Requirement**: The service provider defines the time period of inactivity for device identifiers.

๐Ÿ’ผ AC-2(3) Disable Accounts (M)(H)

Disable accounts within [FedRAMP Assignment: twenty-four (24) hours for user accounts] when the accounts: (a) Have expired; (b) Are no longer associated with a user or individual; (c) Are in violation of organizational policy; or (d) Have been inactive for [FedRAMP Assignment: ninety (90) days (See additional requirements and guidance.)]. **AC-2 (3) Additional FedRAMP Requirements and Guidance:** **Guidance**: For DoD clouds, see DoD cloud website for specific DoD requirements that go above and beyond FedRAMP <https://public.cyber.mil/dccs/> **Requirement**: The service provider defines the time period for non-user accounts (e.g., accounts associated with devices). The time periods are approved and accepted by the JAB/AO. Where user management is a function of the service, reports of activity of consumer users shall be made available. **(d) Requirement**: The service provider defines the time period of inactivity for device identifiers.
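
One common automation for the inactivity clause is to deactivate IAM access keys that have gone unused for the assigned period. The sketch below mirrors the 90-day FedRAMP assignment above; the dry-run flag is an assumption, and it covers access keys only, not console passwords or federated identities.

```python
# Sketch: deactivate IAM access keys unused for 90 days (mirrors the FedRAMP
# assignment above). The dry-run flag is an illustrative assumption.
from datetime import datetime, timedelta, timezone
import boto3

iam = boto3.client("iam")
CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)
DRY_RUN = True

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        for key in iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]:
            if key["Status"] != "Active":
                continue
            last = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
            used = last["AccessKeyLastUsed"].get("LastUsedDate", key["CreateDate"])
            if used < CUTOFF:
                print(f"Disabling {key['AccessKeyId']} for {user['UserName']} "
                      f"(last used {used:%Y-%m-%d})")
                if not DRY_RUN:
                    iam.update_access_key(
                        UserName=user["UserName"],
                        AccessKeyId=key["AccessKeyId"],
                        Status="Inactive",
                    )
```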

๐Ÿ’ผ AC-2(5) Inactivity Logout (M)(H)

Require that users log out when [FedRAMP Assignment: for privileged users, it is the end of a user's standard work period]. **AC-2 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: Should use a shorter timeframe than AC-12.

๐Ÿ’ผ AC-2(5) Inactivity Logout (M)(H)

Require that users log out when [FedRAMP Assignment: for privileged users, it is the end of a user's standard work period]. **AC-2 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: Should use a shorter timeframe than AC-12.

๐Ÿ’ผ AC-2(7) Account Management | Privileged User Accounts

(a) Establish and administer privileged user accounts in accordance with [Selection: a role-based access scheme; an attribute-based access scheme]; (b) Monitor privileged role or attribute assignments; (c) Monitor changes to roles or attributes; and (d) Revoke access when privileged role or attribute assignments are no longer appropriate.

๐Ÿ’ผ AC-2(7) Privileged User Accounts (M)(H)

(a) Establish and administer privileged user accounts in accordance with [Selection: a role-based access scheme; an attribute-based access scheme]; (b) Monitor privileged role or attribute assignments; (c) Monitor changes to roles or attributes; and (d) Revoke access when privileged role or attribute assignments are no longer appropriate.

๐Ÿ’ผ AC-2(7) Privileged User Accounts (M)(H)

(a) Establish and administer privileged user accounts in accordance with [Selection: a role-based access scheme; an attribute-based access scheme]; (b) Monitor privileged role or attribute assignments; (c) Monitor changes to roles or attributes; and (d) Revoke access when privileged role or attribute assignments are no longer appropriate.

๐Ÿ’ผ AC-2(9) Restrictions on Use of Shared and Group Accounts (M)(H)

Only permit the use of shared and group accounts that meet [FedRAMP Assignment: organization-defined need with justification statement that explains why such accounts are necessary]. **AC-2 (9) Additional FedRAMP Requirements and Guidance:** **Requirement**: Required if shared/group accounts are deployed.

๐Ÿ’ผ AC-2(9) Restrictions on Use of Shared and Group Accounts (M)(H)

Only permit the use of shared and group accounts that meet [FedRAMP Assignment: organization-defined need with justification statement that explains why such accounts are necessary]. **AC-2 (9) Additional FedRAMP Requirements and Guidance:** **Requirement**: Required if shared/group accounts are deployed.

๐Ÿ’ผ AC-20 (1) LIMITS ON AUTHORIZED USE

The organization permits authorized individuals to use an external information system to access the information system or to process, store, or transmit organization-controlled information only when the organization: AC-20 (1)(a) Verifies the implementation of required security controls on the external system as specified in the organization's information security policy and security plan; or AC-20 (1)(b) Retains approved information system connection or processing agreements with the organizational entity hosting the external information system.

๐Ÿ’ผ AC-20 USE OF EXTERNAL INFORMATION SYSTEMS

The organization establishes terms and conditions, consistent with any trust relationships established with other organizations owning, operating, and/or maintaining external information systems, allowing authorized individuals to: AC-20a. Access the information system from external information systems; and AC-20b. Process, store, or transmit organization-controlled information using external information systems.

๐Ÿ’ผ AC-20 Use of External Systems

a. [Selection (one or more): Establish [Assignment: organization-defined terms and conditions]; Identify [Assignment: organization-defined controls asserted to be implemented on external systems]], consistent with the trust relationships established with other organizations owning, operating, and/or maintaining external systems, allowing authorized individuals to: 1. Access the system from external systems; and 2. Process, store, or transmit organization-controlled information using external systems; or b. Prohibit the use of [Assignment: organizationally-defined types of external systems].

๐Ÿ’ผ AC-20 Use of External Systems (L)(M)(H)

a. [Selection (one-or-more): Establish [Assignment: organization-defined terms and conditions]; Identify [Assignment: organization-defined controls asserted to be implemented on external systems]], consistent with the trust relationships established with other organizations owning, operating, and/or maintaining external systems, allowing authorized individuals to: 1. Access the system from external systems; and 2. Process, store, or transmit organization-controlled information using external systems; or b. Prohibit the use of [Assignment: organizationally-defined types of external systems]. **AC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: The interrelated controls of AC-20, CA-3, and SA-9 should be differentiated as follows: - AC-20 describes system access to and from external systems. - CA-3 describes documentation of an agreement between the respective system owners when data is exchanged between the CSO and an external system. - SA-9 describes the responsibilities of external system owners. These responsibilities would typically be captured in the agreement required by CA-3.

๐Ÿ’ผ AC-20 Use of External Systems (L)(M)(H)

a. [Selection (one-or-more): Establish [Assignment: organization-defined terms and conditions]; Identify [Assignment: organization-defined controls asserted to be implemented on external systems]], consistent with the trust relationships established with other organizations owning, operating, and/or maintaining external systems, allowing authorized individuals to: 1. Access the system from external systems; and 2. Process, store, or transmit organization-controlled information using external systems; or b. Prohibit the use of [Assignment: organizationally-defined types of external systems]. **AC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: The interrelated controls of AC-20, CA-3, and SA-9 should be differentiated as follows: - AC-20 describes system access to and from external systems. - CA-3 describes documentation of an agreement between the respective system owners when data is exchanged between the CSO and an external system. - SA-9 describes the responsibilities of external system owners. These responsibilities would typically be captured in the agreement required by CA-3.

๐Ÿ’ผ AC-20 Use of External Systems (L)(M)(H)

a. [Selection (one-or-more): Establish [Assignment: organization-defined terms and conditions]; Identify [Assignment: organization-defined controls asserted to be implemented on external systems]], consistent with the trust relationships established with other organizations owning, operating, and/or maintaining external systems, allowing authorized individuals to: 1. Access the system from external systems; and 2. Process, store, or transmit organization-controlled information using external systems; or b. Prohibit the use of [Assignment: organizationally-defined types of external systems]. **AC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: The interrelated controls of AC-20, CA-3, and SA-9 should be differentiated as follows: - AC-20 describes system access to and from external systems. - CA-3 describes documentation of an agreement between the respective system owners when data is exchanged between the CSO and an external system. - SA-9 describes the responsibilities of external system owners. These responsibilities would typically be captured in the agreement required by CA-3.

๐Ÿ’ผ AC-20(1) Limits on Authorized Use (M)(H)

Permit authorized individuals to use an external system to access the system or to process, store, or transmit organization-controlled information only after: (a) Verification of the implementation of controls on the external system as specified in the organization's security and privacy policies and security and privacy plans; or (b) Retention of approved system connection or processing agreements with the organizational entity hosting the external system.

๐Ÿ’ผ AC-20(1) Limits on Authorized Use (M)(H)

Permit authorized individuals to use an external system to access the system or to process, store, or transmit organization-controlled information only after: (a) Verification of the implementation of controls on the external system as specified in the organization's security and privacy policies and security and privacy plans; or (b) Retention of approved system connection or processing agreements with the organizational entity hosting the external system.

๐Ÿ’ผ AC-20(1) Use of External Systems | Limits on Authorized Use

Permit authorized individuals to use an external system to access the system or to process, store, or transmit organization-controlled information only after: (a) Verification of the implementation of controls on the external system as specified in the organization's security and privacy policies and security and privacy plans; or (b) Retention of approved system connection or processing agreements with the organizational entity hosting the external system.

๐Ÿ’ผ AC-21 Information Sharing

a. Enable authorized users to determine whether access authorizations assigned to a sharing partner match the information's access and use restrictions for [Assignment: organization-defined information sharing circumstances where user discretion is required]; and b. Employ [Assignment: organization-defined automated mechanisms or manual processes] to assist users in making information sharing and collaboration decisions.

๐Ÿ’ผ AC-21 INFORMATION SHARING

The organization: AC-21a. Facilitates information sharing by enabling authorized users to determine whether access authorizations assigned to the sharing partner match the access restrictions on the information for [Assignment: organization-defined information sharing circumstances where user discretion is required]; and AC-21b. Employs [Assignment: organization-defined automated mechanisms or manual processes] to assist users in making information sharing/collaboration decisions.

๐Ÿ’ผ AC-21 Information Sharing (M)(H)

a. Enable authorized users to determine whether access authorizations assigned to a sharing partner match the information's access and use restrictions for [Assignment: organization-defined information sharing circumstances where user discretion is required]; and b. Employ [Assignment: organization-defined automated mechanisms or manual processes] to assist users in making information sharing and collaboration decisions.

๐Ÿ’ผ AC-21 Information Sharing (M)(H)

a. Enable authorized users to determine whether access authorizations assigned to a sharing partner match the information's access and use restrictions for [Assignment: organization-defined information sharing circumstances where user discretion is required]; and b. Employ [Assignment: organization-defined automated mechanisms or manual processes] to assist users in making information sharing and collaboration decisions.

๐Ÿ’ผ AC-22 Publicly Accessible Content

a. Designate individuals authorized to make information publicly accessible; b. Train authorized individuals to ensure that publicly accessible information does not contain nonpublic information; c. Review the proposed content of information prior to posting onto the publicly accessible system to ensure that nonpublic information is not included; and d. Review the content on the publicly accessible system for nonpublic information [Assignment: organization-defined frequency] and remove such information, if discovered.

๐Ÿ’ผ AC-22 PUBLICLY ACCESSIBLE CONTENT

The organization: AC-22a. Designates individuals authorized to post information onto a publicly accessible information system; AC-22b. Trains authorized individuals to ensure that publicly accessible information does not contain nonpublic information; AC-22c. Reviews the proposed content of information prior to posting onto the publicly accessible information system to ensure that nonpublic information is not included; and AC-22d. Reviews the content on the publicly accessible information system for nonpublic information [Assignment: organization-defined frequency] and removes such information, if discovered.

๐Ÿ’ผ AC-22 Publicly Accessible Content (L)(M)(H)

a. Designate individuals authorized to make information publicly accessible; b. Train authorized individuals to ensure that publicly accessible information does not contain nonpublic information; c. Review the proposed content of information prior to posting onto the publicly accessible system to ensure that nonpublic information is not included; and d. Review the content on the publicly accessible system for nonpublic information [FedRAMP Assignment: at least quarterly] and remove such information, if discovered.

๐Ÿ’ผ AC-22 Publicly Accessible Content (L)(M)(H)

a. Designate individuals authorized to make information publicly accessible; b. Train authorized individuals to ensure that publicly accessible information does not contain nonpublic information; c. Review the proposed content of information prior to posting onto the publicly accessible system to ensure that nonpublic information is not included; and d. Review the content on the publicly accessible system for nonpublic information [FedRAMP Assignment: at least quarterly] and remove such information, if discovered.

๐Ÿ’ผ AC-22 Publicly Accessible Content (L)(M)(H)

a. Designate individuals authorized to make information publicly accessible; b. Train authorized individuals to ensure that publicly accessible information does not contain nonpublic information; c. Review the proposed content of information prior to posting onto the publicly accessible system to ensure that nonpublic information is not included; and d. Review the content on the publicly accessible system for nonpublic information [FedRAMP Assignment: at least quarterly] and remove such information, if discovered.
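
The periodic review in item d can be partly automated with content scanning, which supplements rather than replaces human review. The sketch below checks text objects in a hypothetical publicly accessible S3 bucket for patterns that suggest nonpublic information; the bucket name and patterns are assumptions.

```python
# Sketch: flag objects in a public bucket that look like they contain nonpublic
# information. The bucket name and patterns are illustrative assumptions.
import re
import boto3

s3 = boto3.client("s3")
BUCKET = "example-public-site"                      # hypothetical
PATTERNS = {
    "SSN-like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal marking": re.compile(r"internal use only", re.IGNORECASE),
}

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith((".txt", ".html", ".csv")):
            continue                                 # skip binary content
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        text = body.decode("utf-8", errors="ignore")
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                print(f"Review {obj['Key']}: possible {label} content")
```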

๐Ÿ’ผ AC-23 Data Mining Protection

Employ [Assignment: organization-defined data mining prevention and detection techniques] for [Assignment: organization-defined data storage objects] to detect and protect against unauthorized data mining.

๐Ÿ’ผ AC-23 DATA MINING PROTECTION

The organization employs [Assignment: organization-defined data mining prevention and detection techniques] for [Assignment: organization-defined data storage objects] to adequately detect and protect against data mining.

๐Ÿ’ผ AC-24 (1) TRANSMIT ACCESS AUTHORIZATION INFORMATION

The information system transmits [Assignment: organization-defined access authorization information] using [Assignment: organization-defined security safeguards] to [Assignment: organization-defined information systems] that enforce access control decisions.

๐Ÿ’ผ AC-24 (2) NO USER OR PROCESS IDENTITY

The information system enforces access control decisions based on [Assignment: organization-defined security attributes] that do not include the identity of the user or process acting on behalf of the user.

๐Ÿ’ผ AC-24 Access Control Decisions

[Selection: Establish procedures; Implement mechanisms] to ensure [Assignment: organization-defined access control decisions] are applied to each access request prior to access enforcement.

๐Ÿ’ผ AC-24 ACCESS CONTROL DECISIONS

The organization establishes procedures to ensure [Assignment: organization-defined access control decisions] are applied to each access request prior to access enforcement.

๐Ÿ’ผ AC-25 Reference Monitor

Implement a reference monitor for [Assignment: organization-defined access control policies] that is tamperproof, always invoked, and small enough to be subject to analysis and testing, the completeness of which can be assured.

๐Ÿ’ผ AC-25 REFERENCE MONITOR

The information system implements a reference monitor for [Assignment: organization-defined access control policies] that is tamperproof, always invoked, and small enough to be subject to analysis and testing, the completeness of which can be assured.

๐Ÿ’ผ AC-3 (2) DUAL AUTHORIZATION

The information system enforces dual authorization for [Assignment: organization-defined privileged commands and/or other organization-defined actions].

๐Ÿ’ผ AC-3 (3) MANDATORY ACCESS CONTROL

The information system enforces [Assignment: organization-defined mandatory access control policy] over all subjects and objects where the policy: AC-3 (3)(a) Is uniformly enforced across all subjects and objects within the boundary of the information system; AC-3 (3)(b) Specifies that a subject that has been granted access to information is constrained from doing any of the following; AC-3 (3)(b)(1) Passing the information to unauthorized subjects or objects; AC-3 (3)(b)(2) Granting its privileges to other subjects; AC-3 (3)(b)(3) Changing one or more security attributes on subjects, objects, the information system, or information system components; AC-3 (3)(b)(4) Choosing the security attributes and attribute values to be associated with newly created or modified objects; or AC-3 (3)(b)(5) Changing the rules governing access control; and AC-3 (3)(c) Specifies that [Assignment: organization-defined subjects] may explicitly be granted [Assignment: organization-defined privileges (i.e., they are trusted subjects)] such that they are not limited by some or all of the above constraints.

๐Ÿ’ผ AC-3 (4) DISCRETIONARY ACCESS CONTROL

The information system enforces [Assignment: organization-defined discretionary access control policy] over defined subjects and objects where the policy specifies that a subject that has been granted access to information can do one or more of the following: AC-3 (4)(a) Pass the information to any other subjects or objects; AC-3 (4)(b) Grant its privileges to other subjects; AC-3 (4)(c) Change security attributes on subjects, objects, the information system, or the information system's components; AC-3 (4)(d) Choose the security attributes to be associated with newly created or revised objects; or AC-3 (4)(e) Change the rules governing access control.

๐Ÿ’ผ AC-3 (7) ROLE-BASED ACCESS CONTROL

The information system enforces a role-based access control policy over defined subjects and objects and controls access based upon [Assignment: organization-defined roles and users authorized to assume such roles].
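
A minimal sketch of this kind of enforcement point, with hypothetical roles, users, and permissions: access to the operation is granted only if one of the caller's roles carries the required permission.

```python
# Sketch: enforce a role-based access control decision before an operation.
# Roles, users, and permissions are hypothetical.
ROLE_PERMISSIONS = {
    "auditor": {"read:logs"},
    "operator": {"read:logs", "restart:service"},
    "admin": {"read:logs", "restart:service", "manage:accounts"},
}
USER_ROLES = {"alice": {"operator"}, "bob": {"auditor"}}

def is_authorized(user: str, permission: str) -> bool:
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

def restart_service(user: str, service: str) -> None:
    if not is_authorized(user, "restart:service"):
        raise PermissionError(f"{user} may not restart {service}")
    print(f"{user} restarted {service}")

restart_service("alice", "billing")     # allowed: operator role
restart_service("bob", "billing")       # raises PermissionError: auditor role
```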

๐Ÿ’ผ AC-3 (8) REVOCATION OF ACCESS AUTHORIZATIONS

The information system enforces the revocation of access authorizations resulting from changes to the security attributes of subjects and objects based on [Assignment: organization-defined rules governing the timing of revocations of access authorizations].

๐Ÿ’ผ AC-3 (9) CONTROLLED RELEASE

The information system does not release information outside of the established system boundary unless: AC-3 (9)(a) The receiving [Assignment: organization-defined information system or system component] provides [Assignment: organization-defined security safeguards]; and AC-3 (9)(b) [Assignment: organization-defined security safeguards] are used to validate the appropriateness of the information designated for release.

๐Ÿ’ผ AC-3 Access Enforcement

Enforce approved authorizations for logical access to information and system resources in accordance with applicable access control policies.

๐Ÿ’ผ AC-3 ACCESS ENFORCEMENT

The information system enforces approved authorizations for logical access to information and system resources in accordance with applicable access control policies.

๐Ÿ’ผ AC-3(12) Access Enforcement | Assert and Enforce Application Access

(a) Require applications to assert, as part of the installation process, the access needed to the following system applications and functions: [Assignment: organization-defined system applications and functions]; (b) Provide an enforcement mechanism to prevent unauthorized access; and (c) Approve access changes after initial installation of the application.

๐Ÿ’ผ AC-3(3) Access Enforcement | Mandatory Access Control

Enforce [Assignment: organization-defined mandatory access control policy] over the set of covered subjects and objects specified in the policy, and where the policy: (a) Is uniformly enforced across the covered subjects and objects within the system; (b) Specifies that a subject that has been granted access to information is constrained from doing any of the following; (1) Passing the information to unauthorized subjects or objects; (2) Granting its privileges to other subjects; (3) Changing one or more security attributes (specified by the policy) on subjects, objects, the system, or system components; (4) Choosing the security attributes and attribute values (specified by the policy) to be associated with newly created or modified objects; and (5) Changing the rules governing access control; and (c) Specifies that [Assignment: organization-defined subjects] may explicitly be granted [Assignment: organization-defined privileges] such that they are not limited by any defined subset (or all) of the above constraints.

๐Ÿ’ผ AC-3(4) Access Enforcement | Discretionary Access Control

Enforce [Assignment: organization-defined discretionary access control policy] over the set of covered subjects and objects specified in the policy, and where the policy specifies that a subject that has been granted access to information can do one or more of the following: (a) Pass the information to any other subjects or objects; (b) Grant its privileges to other subjects; (c) Change security attributes on subjects, objects, the system, or the system's components; (d) Choose the security attributes to be associated with newly created or revised objects; or (e) Change the rules governing access control.

๐Ÿ’ผ AC-3(9) Access Enforcement | Controlled Release

Release information outside of the system only if: (a) The receiving [Assignment: organization-defined system or system component] provides [Assignment: organization-defined controls]; and (b) [Assignment: organization-defined controls] are used to validate the appropriateness of the information designated for release.

๐Ÿ’ผ AC-4 (1) OBJECT SECURITY ATTRIBUTES

The information system uses [Assignment: organization-defined security attributes] associated with [Assignment: organization-defined information, source, and destination objects] to enforce [Assignment: organization-defined information flow control policies] as a basis for flow control decisions.

๐Ÿ’ผ AC-4 (12) DATA TYPE IDENTIFIERS

The information system, when transferring information between different security domains, uses [Assignment: organization-defined data type identifiers] to validate data essential for information flow decisions.

๐Ÿ’ผ AC-4 (14) SECURITY POLICY FILTER CONSTRAINTS

The information system, when transferring information between different security domains, implements [Assignment: organization-defined security policy filters] requiring fully enumerated formats that restrict data structure and content.

๐Ÿ’ผ AC-4 (15) DETECTION OF UNSANCTIONED INFORMATION

The information system, when transferring information between different security domains, examines the information for the presence of [Assignment: organization-defined unsanctioned information] and prohibits the transfer of such information in accordance with the [Assignment: organization-defined security policy].

๐Ÿ’ผ AC-4 (17) DOMAIN AUTHENTICATION

The information system uniquely identifies and authenticates source and destination points by [Selection (one or more): organization, system, application, individual] for information transfer.

๐Ÿ’ผ AC-4 (19) VALIDATION OF METADATA

The information system, when transferring information between different security domains, applies the same security policy filtering to metadata as it applies to data payloads.

๐Ÿ’ผ AC-4 (2) PROCESSING DOMAINS

The information system uses protected processing domains to enforce [Assignment: organization-defined information flow control policies] as a basis for flow control decisions.

๐Ÿ’ผ AC-4 (20) APPROVED SOLUTIONS

The organization employs [Assignment: organization-defined solutions in approved configurations] to control the flow of [Assignment: organization-defined information] across security domains.

๐Ÿ’ผ AC-4 (22) ACCESS ONLY

The information system provides access from a single device to computing platforms, applications, or data residing on multiple different security domains, while preventing any information flow between the different security domains.

๐Ÿ’ผ AC-4 (4) CONTENT CHECK ENCRYPTED INFORMATION

The information system prevents encrypted information from bypassing content-checking mechanisms by [Selection (one or more): decrypting the information; blocking the flow of the encrypted information; terminating communications sessions attempting to pass encrypted information; [Assignment: organization-defined procedure or method]].

๐Ÿ’ผ AC-4 (8) SECURITY POLICY FILTERS

The information system enforces information flow control using [Assignment: organization-defined security policy filters] as a basis for flow control decisions for [Assignment: organization-defined information flows].

๐Ÿ’ผ AC-4 (9) HUMAN REVIEWS

The information system enforces the use of human reviews for [Assignment: organization-defined information flows] under the following conditions: [Assignment: organization-defined conditions].

๐Ÿ’ผ AC-4 Information Flow Enforcement

Enforce approved authorizations for controlling the flow of information within the system and between connected systems based on [Assignment: organization-defined information flow control policies].
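
On AWS, one common way such approved authorizations are expressed is as narrowly scoped security group rules. The boto3 sketch below is illustrative only; the group ID, port, and CIDR range are placeholders, and a real deployment would derive them from the organization-defined flow control policy.

```python
# Sketch: express an approved information flow as a narrowly scoped
# security group ingress rule. GroupId and CidrIp are placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{
            "CidrIp": "10.0.1.0/24",  # approved source network only
            "Description": "Approved HTTPS flow from app subnet",
        }],
    }],
)
```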

๐Ÿ’ผ AC-4 INFORMATION FLOW ENFORCEMENT

The information system enforces approved authorizations for controlling the flow of information within the system and between interconnected systems based on [Assignment: organization-defined information flow control policies].

๐Ÿ’ผ AC-4(25) Information Flow Enforcement | Data Sanitization

When transferring information between different security domains, sanitize data to minimize [Selection (one or more): delivery of malicious content, command and control of malicious code, malicious code augmentation, and steganography encoded data; spillage of sensitive information] in accordance with [Assignment: organization-defined policy].

๐Ÿ’ผ AC-4(29) Information Flow Enforcement | Filter Orchestration Engines

When transferring information between different security domains, employ content filter orchestration engines to ensure that: (a) Content filtering mechanisms successfully complete execution without errors; and (b) Content filtering actions occur in the correct order and comply with [Assignment: organization-defined policy].

๐Ÿ’ผ AC-4(4) Flow Control of Encrypted Information (H)

Prevent encrypted information from bypassing [FedRAMP Assignment: intrusion detection mechanisms] by [Selection (one-or-more): decrypting the information; blocking the flow of the encrypted information; terminating communications sessions attempting to pass encrypted information]. **AC-4 (4) Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider must support Agency requirements to comply with [M-21-31](https://www.whitehouse.gov/wp-content/uploads/2021/08/M-21-31-Improving-the-Federal-Governments-Investigative-and-Remediation-Capabilities-Related-to-Cybersecurity-Incidents.pdf) and [M-22-09](https://www.whitehouse.gov/wp-content/uploads/2022/01/M-22-09.pdf).

๐Ÿ’ผ AC-4(4) Information Flow Enforcement | Flow Control of Encrypted Information

Prevent encrypted information from bypassing [Assignment: organization-defined information flow control mechanisms] by [Selection (one or more): decrypting the information; blocking the flow of the encrypted information; terminating communications sessions attempting to pass encrypted information; [Assignment: organization-defined procedure or method]].

๐Ÿ’ผ AC-4(8) Information Flow Enforcement | Security and Privacy Policy Filters

(a) Enforce information flow control using [Assignment: organization-defined security or privacy policy filters] as a basis for flow control decisions for [Assignment: organization-defined information flows]; and (b) [Selection (one or more): Block; Strip; Modify; Quarantine] data after a filter processing failure in accordance with [Assignment: organization-defined security or privacy policy].
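
A minimal Python sketch of the pattern, assuming the policy filters can be expressed as regular expressions and that the organization-defined failure action is to fail closed; the filter names, patterns, and quarantine behaviour are illustrative.

```python
# Sketch of policy-filter enforcement with an explicit failure action.
# Filter patterns and the quarantine behaviour are illustrative only.
import re

FILTERS = [
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),   # example PII filter
    ("secret-marking", re.compile(r"\bSECRET\b")),
]


def apply_filters(record):
    try:
        for name, pattern in FILTERS:
            if pattern.search(record):
                return f"blocked ({name})"
        return "released"
    except Exception:
        # Filter processing failure: fail closed (quarantine) per policy.
        return "quarantined"


print(apply_filters("order 42 shipped"))
print(apply_filters("applicant SSN 123-45-6789"))
```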

๐Ÿ’ผ AC-5 Separation of Duties

a. Identify and document [Assignment: organization-defined duties of individuals requiring separation]; and b. Define system access authorizations to support separation of duties.
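
As a sketch only, separation of duties can be checked mechanically once the conflicting duty pairs are documented; the duty names and assignments below are hypothetical.

```python
# Sketch: check role assignments against a documented separation-of-duties
# matrix. Duty pairs and assignments are illustrative.
CONFLICTING_DUTIES = {
    frozenset({"create-payment", "approve-payment"}),
    frozenset({"deploy-code", "approve-change"}),
}

assignments = {
    "alice": {"create-payment", "approve-payment"},  # conflicting duties
    "bob": {"deploy-code"},
}

for user, duties in assignments.items():
    for pair in CONFLICTING_DUTIES:
        if pair <= duties:
            print(f"{user}: conflicting duties {sorted(pair)} must be separated")
```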

๐Ÿ’ผ AC-5 SEPARATION OF DUTIES

The organization: AC-5a. Separates [Assignment: organization-defined duties of individuals]; AC-5b. Documents separation of duties of individuals; and AC-5c. Defines information system access authorizations to support separation of duties.

๐Ÿ’ผ AC-5 Separation of Duties (M)(H)

a. Identify and document [Assignment: organization-defined duties of individuals requiring separation]; and b. Define system access authorizations to support separation of duties. **AC-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: CSPs have the option to provide a separation of duties matrix as an attachment to the SSP.

๐Ÿ’ผ AC-5 Separation of Duties (M)(H)

a. Identify and document [Assignment: organization-defined duties of individuals requiring separation]; and b. Define system access authorizations to support separation of duties. **AC-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: CSPs have the option to provide a separation of duties matrix as an attachment to the SSP.

๐Ÿ’ผ AC-6 (3) NETWORK ACCESS TO PRIVILEGED COMMANDS

The organization authorizes network access to [Assignment: organization-defined privileged commands] only for [Assignment: organization-defined compelling operational needs] and documents the rationale for such access in the security plan for the information system.

๐Ÿ’ผ AC-6 (7) REVIEW OF USER PRIVILEGES

The organization: AC-6 (7)(a) Reviews [Assignment: organization-defined frequency] the privileges assigned to [Assignment: organization-defined roles or classes of users] to validate the need for such privileges; and AC-6 (7)(b) Reassigns or removes privileges, if necessary, to correctly reflect organizational mission/business needs.

๐Ÿ’ผ AC-6 Least Privilege

Employ the principle of least privilege, allowing only authorized accesses for users (or processes acting on behalf of users) that are necessary to accomplish assigned organizational tasks.
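
A hedged boto3 sketch of the principle: grant only the specific action on the specific resource a task requires, rather than a wildcard. The policy name, bucket, and path are placeholders.

```python
# Sketch: create a narrowly scoped IAM policy instead of a wildcard grant.
# Policy name, bucket, and prefix are placeholders.
import json

import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],                        # only what the task needs
        "Resource": "arn:aws:s3:::example-reports/app/*",  # only where it needs it
    }],
}

iam.create_policy(
    PolicyName="report-reader-least-privilege",
    PolicyDocument=json.dumps(policy_document),
)
```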

๐Ÿ’ผ AC-6 LEAST PRIVILEGE

The organization employs the principle of least privilege, allowing only authorized accesses for users (or processes acting on behalf of users) which are necessary to accomplish assigned tasks in accordance with organizational missions and business functions.

๐Ÿ’ผ AC-6 Least Privilege (M)(H)

Employ the principle of least privilege, allowing only authorized accesses for users (or processes acting on behalf of users) that are necessary to accomplish assigned organizational tasks.

๐Ÿ’ผ AC-6 Least Privilege (M)(H)

Employ the principle of least privilege, allowing only authorized accesses for users (or processes acting on behalf of users) that are necessary to accomplish assigned organizational tasks.

๐Ÿ’ผ AC-6(1) Authorize Access to Security Functions (M)(H)

Authorize access for [Assignment: organization-defined individuals or roles] to: (a) [Assignment: organization-defined security functions (deployed in hardware, software, and firmware)]; and (b) [Assignment: organization-defined security-relevant information].

๐Ÿ’ผ AC-6(1) Authorize Access to Security Functions (M)(H)

Authorize access for [Assignment: organization-defined individuals or roles] to: (a) [Assignment: organization-defined security functions (deployed in hardware, software, and firmware)]; and (b) [Assignment: organization-defined security-relevant information].

๐Ÿ’ผ AC-6(2) Non-privileged Access for Nonsecurity Functions (M)(H)

Require that users of system accounts (or roles) with access to [FedRAMP Assignment: all security functions] use non-privileged accounts or roles, when accessing nonsecurity functions. **AC-6 (2) Additional FedRAMP Requirements and Guidance:** **Guidance**: Examples of security functions include but are not limited to: establishing system accounts, configuring access authorizations (i.e., permissions, privileges), setting events to be audited, and setting intrusion detection parameters, system programming, system and security administration, other privileged functions.

๐Ÿ’ผ AC-6(2) Non-privileged Access for Nonsecurity Functions (M)(H)

Require that users of system accounts (or roles) with access to [FedRAMP Assignment: all security functions] use non-privileged accounts or roles, when accessing nonsecurity functions. **AC-6 (2) Additional FedRAMP Requirements and Guidance:** **Guidance**: Examples of security functions include but are not limited to: establishing system accounts, configuring access authorizations (i.e., permissions, privileges), setting events to be audited, and setting intrusion detection parameters, system programming, system and security administration, other privileged functions.

๐Ÿ’ผ AC-6(7) Least Privilege | Review of User Privileges

(a) Review [Assignment: organization-defined frequency] the privileges assigned to [Assignment: organization-defined roles or classes of users] to validate the need for such privileges; and (b) Reassign or remove privileges, if necessary, to correctly reflect organizational mission and business needs.
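
One possible supporting script for such a review, using boto3 to enumerate IAM users and their directly attached policies. It is read-only and illustrative; a complete review would also cover group and role memberships and inline policies.

```python
# Sketch: enumerate IAM users and their directly attached managed policies
# to support a periodic privilege review. Read-only.
import boto3

iam = boto3.client("iam")

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        name = user["UserName"]
        attached = iam.list_attached_user_policies(UserName=name)["AttachedPolicies"]
        policy_names = [p["PolicyName"] for p in attached]
        print(f"{name}: {policy_names or 'no directly attached policies'}")
```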

๐Ÿ’ผ AC-6(7) Review of User Privileges (M)(H)

(a) Review [FedRAMP Assignment: at a minimum, annually] the privileges assigned to [FedRAMP Assignment: all users with privileges] to validate the need for such privileges; and (b) Reassign or remove privileges, if necessary, to correctly reflect organizational mission and business needs.

๐Ÿ’ผ AC-6(7) Review of User Privileges (M)(H)

(a) Review [FedRAMP Assignment: at a minimum, annually] the privileges assigned to [FedRAMP Assignment: all users with privileges] to validate the need for such privileges; and (b) Reassign or remove privileges, if necessary, to correctly reflect organizational mission and business needs.

๐Ÿ’ผ AC-7 (2) PURGE | WIPE MOBILE DEVICE

The information system purges/wipes information from [Assignment: organization-defined mobile devices] based on [Assignment: organization-defined purging/wiping requirements/techniques] after [Assignment: organization-defined number] consecutive, unsuccessful device logon attempts.

๐Ÿ’ผ AC-7 Unsuccessful Logon Attempts

a. Enforce a limit of [Assignment: organization-defined number] consecutive invalid logon attempts by a user during a [Assignment: organization-defined time period]; and b. Automatically [Selection (one or more): lock the account or node for an [Assignment: organization-defined time period]; lock the account or node until released by an administrator; delay next logon prompt per [Assignment: organization-defined delay algorithm]; notify system administrator; take other [Assignment: organization-defined action]] when the maximum number of unsuccessful attempts is exceeded.
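
A minimal Python sketch of the AC-7 pattern, assuming a temporary lockout as the automatic action; the attempt limit and lockout period are placeholders for the organization-defined values.

```python
# Sketch: count consecutive invalid logon attempts per account and lock the
# account for a fixed period once the limit is reached. The limit of 5
# attempts and 15-minute lockout are placeholders for organization-defined values.
import time

MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 15 * 60

failures = {}      # account -> consecutive failure count
locked_until = {}  # account -> unlock timestamp


def record_login(account, success):
    now = time.time()
    if locked_until.get(account, 0) > now:
        return "locked"
    if success:
        failures[account] = 0
        return "ok"
    failures[account] = failures.get(account, 0) + 1
    if failures[account] >= MAX_ATTEMPTS:
        locked_until[account] = now + LOCKOUT_SECONDS
        return "locked"
    return "failed"


for _ in range(6):
    print(record_login("alice", success=False))
```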

๐Ÿ’ผ AC-7 UNSUCCESSFUL LOGON ATTEMPTS

The information system: AC-7a. Enforces a limit of [Assignment: organization-defined number] consecutive invalid logon attempts by a user during a [Assignment: organization-defined time period]; and AC-7b. Automatically [Selection: locks the account/node for an [Assignment: organization-defined time period]; locks the account/node until released by an administrator; delays next logon prompt according to [Assignment: organization-defined delay algorithm]] when the maximum number of unsuccessful attempts is exceeded.

๐Ÿ’ผ AC-7 Unsuccessful Logon Attempts (L)(M)(H)

a. Enforce a limit of [Assignment: organization-defined number] consecutive invalid logon attempts by a user during a [Assignment: organization-defined time period]; and b. Automatically [Selection (one-or-more): lock the account or node for an [Assignment: organization-defined time period]; lock the account or node until released by an administrator; delay next logon prompt per [Assignment: organization-defined delay algorithm]; notify system administrator; take other [Assignment: organization-defined action]] when the maximum number of unsuccessful attempts is exceeded. **AC-7 Additional FedRAMP Requirements and Guidance:** **Requirement**: In alignment with NIST SP 800-63B

๐Ÿ’ผ AC-7 Unsuccessful Logon Attempts (L)(M)(H)

a. Enforce a limit of [Assignment: organization-defined number] consecutive invalid logon attempts by a user during a [Assignment: organization-defined time period]; and b. Automatically [Selection (one-or-more): lock the account or node for an [Assignment: organization-defined time period]; lock the account or node until released by an administrator; delay next logon prompt per [Assignment: organization-defined delay algorithm]; notify system administrator; take other [Assignment: organization-defined action]] when the maximum number of unsuccessful attempts is exceeded. **AC-7 Additional FedRAMP Requirements and Guidance:** **Requirement**: In alignment with NIST SP 800-63B

๐Ÿ’ผ AC-7 Unsuccessful Logon Attempts (L)(M)(H)

a. Enforce a limit of [Assignment: organization-defined number] consecutive invalid logon attempts by a user during a [Assignment: organization-defined time period]; and b. Automatically [Selection (one-or-more): lock the account or node for an [Assignment: organization-defined time period]; lock the account or node until released by an administrator; delay next logon prompt per [Assignment: organization-defined delay algorithm]; notify system administrator; take other [Assignment: organization-defined action]] when the maximum number of unsuccessful attempts is exceeded. **AC-7 Additional FedRAMP Requirements and Guidance:** **Requirement**: In alignment with NIST SP 800-63B

๐Ÿ’ผ AC-7(4) Unsuccessful Logon Attempts | Use of Alternate Authentication Factor

(a) Allow the use of [Assignment: organization-defined authentication factors] that are different from the primary authentication factors after the number of organization-defined consecutive invalid logon attempts have been exceeded; and (b) Enforce a limit of [Assignment: organization-defined number] consecutive invalid logon attempts through use of the alternative factors by a user during a [Assignment: organization-defined time period].

๐Ÿ’ผ AC-8 System Use Notification

a. Display [Assignment: organization-defined system use notification message or banner] to users before granting access to the system that provides privacy and security notices consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines and state that: 1. Users are accessing a U.S. Government system; 2. System usage may be monitored, recorded, and subject to audit; 3. Unauthorized use of the system is prohibited and subject to criminal and civil penalties; and 4. Use of the system indicates consent to monitoring and recording; b. Retain the notification message or banner on the screen until users acknowledge the usage conditions and take explicit actions to log on to or further access the system; and c. For publicly accessible systems: 1. Display system use information [Assignment: organization-defined conditions], before granting further access to the publicly accessible system; 2. Display references, if any, to monitoring, recording, or auditing that are consistent with privacy accommodations for such systems that generally prohibit those activities; and 3. Include a description of the authorized uses of the system.
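
A minimal sketch of the pattern for an interactive system: display the banner and require an explicit action before continuing. The banner wording is a placeholder for the organization-approved notification text.

```python
# Sketch: display a system use notification and require explicit
# acknowledgement before granting further access. Banner text is a
# placeholder for the organization-approved wording.
BANNER = (
    "You are accessing a U.S. Government system. Usage may be monitored,\n"
    "recorded, and audited. Unauthorized use is prohibited and subject to\n"
    "criminal and civil penalties. Use of the system indicates consent to\n"
    "monitoring and recording."
)


def require_acknowledgement():
    print(BANNER)
    answer = input("Type ACCEPT to continue: ").strip()
    return answer == "ACCEPT"


if __name__ == "__main__":
    if not require_acknowledgement():
        raise SystemExit("Access denied: usage conditions not acknowledged.")
    print("Proceeding to logon...")
```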

๐Ÿ’ผ AC-8 SYSTEM USE NOTIFICATION

The information system: AC-8a. Displays to users [Assignment: organization-defined system use notification message or banner] before granting access to the system that provides privacy and security notices consistent with applicable federal laws, Executive Orders, directives, policies, regulations, standards, and guidance and states that: AC-8a.1. Users are accessing a U.S. Government information system; AC-8a.2. Information system usage may be monitored, recorded, and subject to audit; AC-8a.3. Unauthorized use of the information system is prohibited and subject to criminal and civil penalties; and AC-8a.4. Use of the information system indicates consent to monitoring and recording; AC-8b. Retains the notification message or banner on the screen until users acknowledge the usage conditions and take explicit actions to log on to or further access the information system; and AC-8c. For publicly accessible systems: AC-8c.1. Displays system use information [Assignment: organization-defined conditions], before granting further access; AC-8c.2. Displays references, if any, to monitoring, recording, or auditing that are consistent with privacy accommodations for such systems that generally prohibit those activities; and AC-8c.3. Includes a description of the authorized uses of the system.

๐Ÿ’ผ AC-8 System Use Notification (L)(M)(H)

a. Display [FedRAMP Assignment: see additional Requirements and Guidance] to users before granting access to the system that provides privacy and security notices consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines and state that: 1. Users are accessing a U.S. Government system; 2. System usage may be monitored, recorded, and subject to audit; 3. Unauthorized use of the system is prohibited and subject to criminal and civil penalties; and 4. Use of the system indicates consent to monitoring and recording; b. Retain the notification message or banner on the screen until users acknowledge the usage conditions and take explicit actions to log on to or further access the system; and c. For publicly accessible systems: 1. Display system use information [FedRAMP Assignment: see additional Requirements and Guidance], before granting further access to the publicly accessible system 2. Display references, if any, to monitoring, recording, or auditing that are consistent with privacy accommodations for such systems that generally prohibit those activities; and 3. Include a description of the authorized uses of the system. **AC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: If performed as part of a Configuration Baseline check, then the % of items requiring setting that are checked and that pass (or fail) check can be provided. **Requirement**: The service provider shall determine elements of the cloud environment that require the System Use Notification control. The elements of the cloud environment that require System Use Notification are approved and accepted by the JAB/AO. **Requirement**: The service provider shall determine how System Use Notification is going to be verified and provide appropriate periodicity of the check. The System Use Notification verification and periodicity are approved and accepted by the JAB/AO. **Requirement**: If not performed as part of a Configuration Baseline check, then there must be documented agreement on how to provide results of verification and the necessary periodicity of the verification by the service provider. The documented agreement on how to provide verification of the results are approved and accepted by the JAB/AO.

๐Ÿ’ผ AC-8 System Use Notification (L)(M)(H)

a. Display [FedRAMP Assignment: see additional Requirements and Guidance] to users before granting access to the system that provides privacy and security notices consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines and state that: 1. Users are accessing a U.S. Government system; 2. System usage may be monitored, recorded, and subject to audit; 3. Unauthorized use of the system is prohibited and subject to criminal and civil penalties; and 4. Use of the system indicates consent to monitoring and recording; b. Retain the notification message or banner on the screen until users acknowledge the usage conditions and take explicit actions to log on to or further access the system; and c. For publicly accessible systems: 1. Display system use information [FedRAMP Assignment: see additional Requirements and Guidance], before granting further access to the publicly accessible system 2. Display references, if any, to monitoring, recording, or auditing that are consistent with privacy accommodations for such systems that generally prohibit those activities; and 3. Include a description of the authorized uses of the system. **AC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: If performed as part of a Configuration Baseline check, then the % of items requiring setting that are checked and that pass (or fail) check can be provided. **Requirement**: The service provider shall determine elements of the cloud environment that require the System Use Notification control. The elements of the cloud environment that require System Use Notification are approved and accepted by the JAB/AO. **Requirement**: The service provider shall determine how System Use Notification is going to be verified and provide appropriate periodicity of the check. The System Use Notification verification and periodicity are approved and accepted by the JAB/AO. **Requirement**: If not performed as part of a Configuration Baseline check, then there must be documented agreement on how to provide results of verification and the necessary periodicity of the verification by the service provider. The documented agreement on how to provide verification of the results are approved and accepted by the JAB/AO.

๐Ÿ’ผ AC-8 System Use Notification (L)(M)(H)

a. Display [FedRAMP Assignment: see additional Requirements and Guidance] to users before granting access to the system that provides privacy and security notices consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines and state that: 1. Users are accessing a U.S. Government system; 2. System usage may be monitored, recorded, and subject to audit; 3. Unauthorized use of the system is prohibited and subject to criminal and civil penalties; and 4. Use of the system indicates consent to monitoring and recording; b. Retain the notification message or banner on the screen until users acknowledge the usage conditions and take explicit actions to log on to or further access the system; and c. For publicly accessible systems: 1. Display system use information [FedRAMP Assignment: see additional Requirements and Guidance], before granting further access to the publicly accessible system 2. Display references, if any, to monitoring, recording, or auditing that are consistent with privacy accommodations for such systems that generally prohibit those activities; and 3. Include a description of the authorized uses of the system. **AC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: If performed as part of a Configuration Baseline check, then the % of items requiring setting that are checked and that pass (or fail) check can be provided. **Requirement**: The service provider shall determine elements of the cloud environment that require the System Use Notification control. The elements of the cloud environment that require System Use Notification are approved and accepted by the JAB/AO. **Requirement**: The service provider shall determine how System Use Notification is going to be verified and provide appropriate periodicity of the check. The System Use Notification verification and periodicity are approved and accepted by the JAB/AO. **Requirement**: If not performed as part of a Configuration Baseline check, then there must be documented agreement on how to provide results of verification and the necessary periodicity of the verification by the service provider. The documented agreement on how to provide verification of the results are approved and accepted by the JAB/AO.

๐Ÿ’ผ AC-9 (1) UNSUCCESSFUL LOGONS

The information system notifies the user, upon successful logon/access, of the number of unsuccessful logon/access attempts since the last successful logon/access.

๐Ÿ’ผ AC-9 (3) NOTIFICATION OF ACCOUNT CHANGES

The information system notifies the user of changes to [Assignment: organization-defined security-related characteristics/parameters of the user’s account] during [Assignment: organization-defined time period].

๐Ÿ’ผ AC-9 (4) ADDITIONAL LOGON INFORMATION

The information system notifies the user, upon successful logon (access), of the following additional information: [Assignment: organization-defined information to be included in addition to the date and time of the last logon (access)].

๐Ÿ’ผ Alignment to demand

The way users and applications consume your workloads and other resources can help you identify improvements to meet sustainability goals. Scale infrastructure to continually match demand and verify that you use only the minimum resources required to support your users. Align service levels to customer needs. Position resources to limit the network required for users and applications to consume them. Remove unused assets. Provide your team members with devices that support their needs and minimize their sustainability impact.
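
As one illustration of scaling to match demand on AWS, the boto3 sketch below attaches a target tracking policy to an Auto Scaling group so capacity follows average CPU load; the group name and the 50% target are placeholders.

```python
# Sketch: keep capacity aligned to demand with a target tracking scaling
# policy so the group runs only the instances current load requires.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-web-asg",  # placeholder group name
    PolicyName="track-average-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,  # placeholder utilization target
    },
)
```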

๐Ÿ’ผ Architecture selection

The optimal solution for a particular workload varies, and solutions often combine multiple approaches. Well-Architected workloads use multiple solutions and enable different features to improve performance.

๐Ÿ’ผ Art. 1 Subject-matter and objectives

1. This Regulation lays down rules relating to the protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data. 2. This Regulation protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data. 3. The free movement of personal data within the Union shall be neither restricted nor prohibited for reasons connected with the protection of natural persons with regard to the processing of personal data.

๐Ÿ’ผ Art. 10 Processing of personal data relating to criminal convictions and offences

Processing of personal data relating to criminal convictions and offences or related security measures based on Article 6(1) shall be carried out only under the control of official authority or when the processing is authorised by Union or Member State law providing for appropriate safeguards for the rights and freedoms of data subjects. Any comprehensive register of criminal convictions shall be kept only under the control of official authority.

๐Ÿ’ผ Art. 11 Processing which does not require identification

1. If the purposes for which a controller processes personal data do not or do no longer require the identification of a data subject by the controller, the controller shall not be obliged to maintain, acquire or process additional information in order to identify the data subject for the sole purpose of complying with this Regulation. 2. Where, in cases referred to in paragraph 1 of this Article, the controller is able to demonstrate that it is not in a position to identify the data subject, the controller shall inform the data subject accordingly, if possible. In such cases, Articles 15 to 20 shall not apply except where the data subject, for the purpose of exercising his or her rights under those articles, provides additional information enabling his or her identification.

๐Ÿ’ผ Art. 12 Transparent information, communication and modalities for the exercise of the rights of the data subject

1. The controller shall take appropriate measures to provide any information referred to in Articles 13 and 14 and any communication under Articles 15 to 22 and 34 relating to processing to the data subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child. The information shall be provided in writing, or by other means, including, where appropriate, by electronic means. When requested by the data subject, the information may be provided orally, provided that the identity of the data subject is proven by other means. 2. The controller shall facilitate the exercise of data subject rights under Articles 15 to 22. In the cases referred to in Article 11(2), the controller shall not refuse to act on the request of the data subject for exercising his or her rights under Articles 15 to 22, unless the controller demonstrates that it is not in a position to identify the data subject. 3. The controller shall provide information on action taken on a request under Articles 15 to 22 to the data subject without undue delay and in any event within one month of receipt of the request. That period may be extended by two further months where necessary, taking into account the complexity and number of the requests. The controller shall inform the data subject of any such extension within one month of receipt of the request, together with the reasons for the delay. Where the data subject makes the request by electronic form means, the information shall be provided by electronic means where possible, unless otherwise requested by the data subject. 4. If the controller does not take action on the request of the data subject, the controller shall inform the data subject without delay and at the latest within one month of receipt of the request of the reasons for not taking action and on the possibility of lodging a complaint with a supervisory authority and seeking a judicial remedy. 5. Information provided under Articles 13 and 14 and any communication and any actions taken under Articles 15 to 22 and 34 shall be provided free of charge. Where requests from a data subject are manifestly unfounded or excessive, in particular because of their repetitive character, the controller may either: a. charge a reasonable fee taking into account the administrative costs of providing the information or communication or taking the action requested; or b. refuse to act on the request. 6. The controller shall bear the burden of demonstrating the manifestly unfounded or excessive character of the request. 7. Without prejudice to Article 11, where the controller has reasonable doubts concerning the identity of the natural person making the request referred to in Articles 15 to 21, the controller may request the provision of additional information necessary to confirm the identity of the data subject. 8. The information to be provided to data subjects pursuant to Articles 13 and 14 may be provided in combination with standardised icons in order to give in an easily visible, intelligible and clearly legible manner a meaningful overview of the intended processing. Where the icons are presented electronically they shall be machine-readable. 9. The Commission shall be empowered to adopt delegated acts in accordance with Article 92 for the purpose of determining the information to be presented by the icons and the procedures for providing standardised icons.

๐Ÿ’ผ Art. 13 Information to be provided where personal data are collected from the data subject

Where personal data relating to a data subject are collected from the data subject, the controller shall, at the time when personal data are obtained, provide the data subject with all of the following information: 1. the identity and the contact details of the controller and, where applicable, of the controller's representative; 2. the contact details of the data protection officer, where applicable; 3. the purposes of the processing for which the personal data are intended as well as the legal basis for the processing; 4. where the processing is based on point (f) of Article 6(1), the legitimate interests pursued by the controller or by a third party; 5. the recipients or categories of recipients of the personal data, if any; 6. where applicable, the fact that the controller intends to transfer personal data to a third country or international organisation and the existence or absence of an adequacy decision by the Commission, or in the case of transfers referred to in Article 46 or 47, or the second subparagraph of Article 49(1), reference to the appropriate or suitable safeguards and the means by which to obtain a copy of them or where they have been made available. In addition to the information referred to in paragraph 1, the controller shall, at the time when personal data are obtained, provide the data subject with the following further information necessary to ensure fair and transparent processing: 1. the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period; 2. the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject or to object to processing as well as the right to data portability; 3. where the processing is based on point (a) of Article 6(1) or point (a) of Article 9(2), the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal; 4. the right to lodge a complaint with a supervisory authority; 5. whether the provision of personal data is a statutory or contractual requirement, or a requirement necessary to enter into a contract, as well as whether the data subject is obliged to provide the personal data and of the possible consequences of failure to provide such data; 6. the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject. Where the controller intends to further process the personal data for a purpose other than that for which the personal data were collected, the controller shall provide the data subject prior to that further processing with information on that other purpose and with any relevant further information as referred to in paragraph 2. Paragraphs 1, 2 and 3 shall not apply where and insofar as the data subject already has the information.

๐Ÿ’ผ Art. 14 Information to be provided where personal data have not been obtained from the data subject

1. Where personal data have not been obtained from the data subject, the controller shall provide the data subject with the following information: a. the identity and the contact details of the controller and, where applicable, of the controller's representative; b. the contact details of the data protection officer, where applicable; c. the purposes of the processing for which the personal data are intended as well as the legal basis for the processing; d. the categories of personal data concerned; e. the recipients or categories of recipients of the personal data, if any; f. where applicable, that the controller intends to transfer personal data to a recipient in a third country or international organisation and the existence or absence of an adequacy decision by the Commission, or in the case of transfers referred to in Article 46 or 47, or the second subparagraph of Article 49(1), reference to the appropriate or suitable safeguards and the means to obtain a copy of them or where they have been made available. 2. In addition to the information referred to in paragraph 1, the controller shall provide the data subject with the following information necessary to ensure fair and transparent processing in respect of the data subject: a. the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period; b. where the processing is based on point (f) of Article 6(1), the legitimate interests pursued by the controller or by a third party; c. the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject and to object to processing as well as the right to data portability; d. where processing is based on point (a) of Article 6(1) or point (a) of Article 9(2), the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal; e. the right to lodge a complaint with a supervisory authority; f. from which source the personal data originate, and if applicable, whether it came from publicly accessible sources; g. the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject. 3. The controller shall provide the information referred to in paragraphs 1 and 2: a. within a reasonable period after obtaining the personal data, but at the latest within one month, having regard to the specific circumstances in which the personal data are processed; b. if the personal data are to be used for communication with the data subject, at the latest at the time of the first communication to that data subject; or c. if a disclosure to another recipient is envisaged, at the latest when the personal data are first disclosed. 4. Where the controller intends to further process the personal data for a purpose other than that for which the personal data were obtained, the controller shall provide the data subject prior to that further processing with information on that other purpose and with any relevant further information as referred to in paragraph 2. 5. Paragraphs 1 to 4 shall not apply where and insofar as: a. the data subject already has the information; b. 
the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, subject to the conditions and safeguards referred to in Article 89(1) or in so far as the obligation referred to in paragraph 1 of this Article is likely to render impossible or seriously impair the achievement of the objectives of that processing. In such cases the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available; c. obtaining or disclosure is expressly laid down by Union or Member State law to which the controller is subject and which provides appropriate measures to protect the data subject's legitimate interests; or d. where the personal data must remain confidential subject to an obligation of professional secrecy regulated by Union or Member State law, including a statutory obligation of secrecy.

๐Ÿ’ผ Art. 15 Right of access by the data subject

1. The data subject shall have the right to obtain from the controller confirmation as to whether or not personal data concerning him or her are being processed, and, where that is the case, access to the personal data and the following information: a. the purposes of the processing; b. the categories of personal data concerned; c. the recipients or categories of recipient to whom the personal data have been or will be disclosed, in particular recipients in third countries or international organisations; d. where possible, the envisaged period for which the personal data will be stored, or, if not possible, the criteria used to determine that period; e. the existence of the right to request from the controller rectification or erasure of personal data or restriction of processing of personal data concerning the data subject or to object to such processing; f. the right to lodge a complaint with a supervisory authority; g. where the personal data are not collected from the data subject, any available information as to their source; h. the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject. 2. Where personal data are transferred to a third country or to an international organisation, the data subject shall have the right to be informed of the appropriate safeguards pursuant to Article 46 relating to the transfer. 3. The controller shall provide a copy of the personal data undergoing processing. For any further copies requested by the data subject, the controller may charge a reasonable fee based on administrative costs. Where the data subject makes the request by electronic means, and unless otherwise requested by the data subject, the information shall be provided in a commonly used electronic form. 4. The right to obtain a copy referred to in paragraph 3 shall not adversely affect the rights and freedoms of others.

๐Ÿ’ผ Art. 16 Right to rectification

The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.

๐Ÿ’ผ Art. 17 Right to erasure (‘right to be forgotten’)

1. The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies: a. the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed; b. the data subject withdraws consent on which the processing is based according to point (a) of Article 6(1), or point (a) of Article 9(2), and where there is no other legal ground for the processing; c. the data subject objects to the processing pursuant to Article 21(1) and there are no overriding legitimate grounds for the processing, or the data subject objects to the processing pursuant to Article 21(2); d. the personal data have been unlawfully processed; e. the personal data have to be erased for compliance with a legal obligation in Union or Member State law to which the controller is subject; f. the personal data have been collected in relation to the offer of information society services referred to in Article 8(1). 2. Where the controller has made the personal data public and is obliged pursuant to paragraph 1 to erase the personal data, the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data. 3. Paragraphs 1 and 2 shall not apply to the extent that processing is necessary: a. for exercising the right of freedom of expression and information; b. for compliance with a legal obligation which requires processing by Union or Member State law to which the controller is subject or for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller; c. for reasons of public interest in the area of public health in accordance with points (h) and (i) of Article 9(2) as well as Article 9(3); d. for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing; or e. for the establishment, exercise or defence of legal claims.

๐Ÿ’ผ Art. 18 Right to restriction of processing

1. The data subject shall have the right to obtain from the controller restriction of processing where one of the following applies: a. the accuracy of the personal data is contested by the data subject, for a period enabling the controller to verify the accuracy of the personal data; b. the processing is unlawful and the data subject opposes the erasure of the personal data and requests the restriction of their use instead; c. the controller no longer needs the personal data for the purposes of the processing, but they are required by the data subject for the establishment, exercise or defence of legal claims; d. the data subject has objected to processing pursuant to Article 21(1) pending the verification whether the legitimate grounds of the controller override those of the data subject. 2. Where processing has been restricted under paragraph 1, such personal data shall, with the exception of storage, only be processed with the data subject's consent or for the establishment, exercise or defence of legal claims or for the protection of the rights of another natural or legal person or for reasons of important public interest of the Union or of a Member State. 3. A data subject who has obtained restriction of processing pursuant to paragraph 1 shall be informed by the controller before the restriction of processing is lifted.

๐Ÿ’ผ Art. 19 Notification obligation regarding rectification or erasure of personal data or restriction of processing

The controller shall communicate any rectification or erasure of personal data or restriction of processing carried out in accordance with Article 16, Article 17(1) and Article 18 to each recipient to whom the personal data have been disclosed, unless this proves impossible or involves disproportionate effort. The controller shall inform the data subject about those recipients if the data subject requests it.

๐Ÿ’ผ Art. 2 Material scope

1. This Regulation applies to the processing of personal data wholly or partly by automated means and to the processing other than by automated means of personal data which form part of a filing system or are intended to form part of a filing system. 2. This Regulation does not apply to the processing of personal data: a. in the course of an activity which falls outside the scope of Union law; b. by the Member States when carrying out activities which fall within the scope of Chapter 2 of Title V of the TEU; c. by a natural person in the course of a purely personal or household activity; d. by competent authorities for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security. 3. For the processing of personal data by the Union institutions, bodies, offices and agencies, Regulation (EC) No 45/2001 applies. Regulation (EC) No 45/2001 and other Union legal acts applicable to such processing of personal data shall be adapted to the principles and rules of this Regulation in accordance with Article 98. 4. This Regulation shall be without prejudice to the application of Directive 2000/31/EC, in particular of the liability rules of intermediary service providers in Articles 12 to 15 of that Directive.

๐Ÿ’ผ Art. 20 Right to data portability

1. The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided, where: a. the processing is based on consent pursuant to point (a) of Article 6(1) or point (a) of Article 9(2) or on a contract pursuant to point (b) of Article 6(1); and b. the processing is carried out by automated means. 2. In exercising his or her right to data portability pursuant to paragraph 1, the data subject shall have the right to have the personal data transmitted directly from one controller to another, where technically feasible. 3. The exercise of the right referred to in paragraph 1 of this Article shall be without prejudice to Article 17. That right shall not apply to processing necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller. 4. The right referred to in paragraph 1 shall not adversely affect the rights and freedoms of others.
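
A hedged Python sketch of what a portable export might look like: the data the subject has provided, serialized in a structured, commonly used, machine-readable format (JSON here). The record source and fields are illustrative.

```python
# Sketch: export the personal data a subject has provided in a structured,
# commonly used, machine-readable format. Records and fields are illustrative.
import json


def export_subject_data(records, subject_id):
    provided = [r for r in records if r.get("subject_id") == subject_id]
    return json.dumps({"subject_id": subject_id, "records": provided}, indent=2)


records = [
    {"subject_id": "u-1001", "email": "user@example.com", "newsletter": True},
    {"subject_id": "u-2002", "email": "other@example.com", "newsletter": False},
]
print(export_subject_data(records, "u-1001"))
```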

๐Ÿ’ผ Art. 21 Right to object

1. The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6(1), including profiling based on those provisions. The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment, exercise or defence of legal claims. 2. Where personal data are processed for direct marketing purposes, the data subject shall have the right to object at any time to processing of personal data concerning him or her for such marketing, which includes profiling to the extent that it is related to such direct marketing. 3. Where the data subject objects to processing for direct marketing purposes, the personal data shall no longer be processed for such purposes. 4. At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 and 2 shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information. 5. In the context of the use of information society services, and notwithstanding Directive 2002/58/EC, the data subject may exercise his or her right to object by automated means using technical specifications. 6. Where personal data are processed for scientific or historical research purposes or statistical purposes pursuant to Article 89(1), the data subject, on grounds relating to his or her particular situation, shall have the right to object to processing of personal data concerning him or her, unless the processing is necessary for the performance of a task carried out for reasons of public interest.

๐Ÿ’ผ Art. 22 Automated individual decision-making, including profiling

1. The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her. 2. Paragraph 1 shall not apply if the decision: a. is necessary for entering into, or performance of, a contract between the data subject and a data controller; b. is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject's rights and freedoms and legitimate interests; or c. is based on the data subject's explicit consent. 3. In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject's rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision. 4. Decisions referred to in paragraph 2 shall not be based on special categories of personal data referred to in Article 9(1), unless point (a) or (g) of Article 9(2) applies and suitable measures to safeguard the data subject's rights and freedoms and legitimate interests are in place.

๐Ÿ’ผ Art. 23 Restrictions

1. Union or Member State law to which the data controller or processor is subject may restrict by way of a legislative measure the scope of the obligations and rights provided for in Articles 12 to 22 and Article 34, as well as Article 5 in so far as its provisions correspond to the rights and obligations provided for in Articles 12 to 22, when such a restriction respects the essence of the fundamental rights and freedoms and is a necessary and proportionate measure in a democratic society to safeguard: a. national security; b. defence; c. public security; d. the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security; e. other important objectives of general public interest of the Union or of a Member State, in particular an important economic or financial interest of the Union or of a Member State, including monetary, budgetary and taxation a matters, public health and social security; f. the protection of judicial independence and judicial proceedings; g. the prevention, investigation, detection and prosecution of breaches of ethics for regulated professions; h. a monitoring, inspection or regulatory function connected, even occasionally, to the exercise of official authority in the cases referred to in points (a) to (e) and (g); i. the protection of the data subject or the rights and freedoms of others; j. the enforcement of civil law claims. 2. In particular, any legislative measure referred to in paragraph 1 shall contain specific provisions at least, where relevant, as to: a. the purposes of the processing or categories of processing; b. the categories of personal data; c. the scope of the restrictions introduced; d. the safeguards to prevent abuse or unlawful access or transfer; e. the specification of the controller or categories of controllers; f. the storage periods and the applicable safeguards taking into account the nature, scope and purposes of the processing or categories of processing; g. the risks to the rights and freedoms of data subjects; and h. the right of data subjects to be informed about the restriction, unless that may be prejudicial to the purpose of the restriction.

๐Ÿ’ผ Art. 24 Responsibility of the controller

1. Taking into account the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for the rights and freedoms of natural persons, the controller shall implement appropriate technical and organisational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation. Those measures shall be reviewed and updated where necessary. 2. Where proportionate in relation to processing activities, the measures referred to in paragraph 1 shall include the implementation of appropriate data protection policies by the controller. 3. Adherence to approved codes of conduct as referred to in Article 40 or approved certification mechanisms as referred to in Article 42 may be used as an element by which to demonstrate compliance with the obligations of the controller.

๐Ÿ’ผ Art. 25 Data protection by design and by default

1. Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects. 2. The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons. 3. An approved certification mechanism pursuant to Article 42 may be used as an element to demonstrate compliance with the requirements set out in paragraphs 1 and 2 of this Article.
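
As an illustration of the two techniques the article names, the Python sketch below pseudonymises an identifier with a keyed hash and drops fields not needed for the stated purpose; the key handling and field list are hypothetical, and real key management would sit outside the application.

```python
# Sketch of "by design and by default": pseudonymise the identifier and keep
# only the fields needed for the stated purpose. Key handling and the field
# list are illustrative; a real deployment would manage the key separately.
import hashlib
import hmac

PSEUDONYMISATION_KEY = b"replace-with-a-managed-secret"  # placeholder key
FIELDS_NEEDED_FOR_PURPOSE = {"country", "plan"}           # data minimisation


def pseudonymise(identifier):
    return hmac.new(PSEUDONYMISATION_KEY, identifier.encode(), hashlib.sha256).hexdigest()


def minimise(record):
    reduced = {k: v for k, v in record.items() if k in FIELDS_NEEDED_FOR_PURPOSE}
    reduced["subject_ref"] = pseudonymise(record["email"])
    return reduced


print(minimise({"email": "user@example.com", "country": "NL", "plan": "pro",
                "birthdate": "1990-01-01"}))
```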

๐Ÿ’ผ Art. 26 Joint controllers

1. Where two or more controllers jointly determine the purposes and means of processing, they shall be joint controllers. They shall in a transparent manner determine their respective responsibilities for compliance with the obligations under this Regulation, in particular as regards the exercising of the rights of the data subject and their respective duties to provide the information referred to in Articles 13 and 14, by means of an arrangement between them unless, and in so far as, the respective responsibilities of the controllers are determined by Union or Member State law to which the controllers are subject. The arrangement may designate a contact point for data subjects. 2. The arrangement referred to in paragraph 1 shall duly reflect the respective roles and relationships of the joint controllers vis-à-vis the data subjects. The essence of the arrangement shall be made available to the data subject. 3. Irrespective of the terms of the arrangement referred to in paragraph 1, the data subject may exercise his or her rights under this Regulation in respect of and against each of the controllers.

๐Ÿ’ผ Art. 27 Representatives of controllers or processors not established in the Union

1. Where Article 3(2) applies, the controller or the processor shall designate in writing a representative in the Union. 2. The obligation laid down in paragraph 1 of this Article shall not apply to: a. processing which is occasional, does not include, on a large scale, processing of special categories of data as referred to in Article 9(1) or processing of personal data relating to criminal convictions and offences referred to in Article 10, and is unlikely to result in a risk to the rights and freedoms of natural persons, taking into account the nature, context, scope and purposes of the processing; or b. a public authority or body. 3. The representative shall be established in one of the Member States where the data subjects, whose personal data are processed in relation to the offering of goods or services to them, or whose behaviour is monitored, are. 4. The representative shall be mandated by the controller or processor to be addressed in addition to or instead of the controller or the processor by, in particular, supervisory authorities and data subjects, on all issues related to processing, for the purposes of ensuring compliance with this Regulation. 5. The designation of a representative by the controller or processor shall be without prejudice to legal actions which could be initiated against the controller or the processor themselves.

๐Ÿ’ผ Art. 28 Processor

1. Where processing is to be carried out on behalf of a controller, the controller shall use only processors providing sufficient guarantees to implement appropriate technical and organisational measures in such a manner that processing will meet the requirements of this Regulation and ensure the protection of the rights of the data subject. 2. The processor shall not engage another processor without prior specific or general written authorisation of the controller. In the case of general written authorisation, the processor shall inform the controller of any intended changes concerning the addition or replacement of other processors, thereby giving the controller the opportunity to object to such changes. 3. Processing by a processor shall be governed by a contract or other legal act under Union or Member State law, that is binding on the processor with regard to the controller and that sets out the subject-matter and duration of the processing, the nature and purpose of the processing, the type of personal data and categories of data subjects and the obligations and rights of the controller. That contract or other legal act shall stipulate, in particular, that the processor: a. processes the personal data only on documented instructions from the controller, including with regard to transfers of personal data to a third country or an international organisation, unless required to do so by Union or Member State law to which the processor is subject; in such a case, the processor shall inform the controller of that legal requirement before processing, unless that law prohibits such information on important grounds of public interest; b. ensures that persons authorised to process the personal data have committed themselves to confidentiality or are under an appropriate statutory obligation of confidentiality; c. takes all measures required pursuant to Article 32; d. respects the conditions referred to in paragraphs 2 and 4 for engaging another processor; e. taking into account the nature of the processing, assists the controller by appropriate technical and organisational measures, insofar as this is possible, for the fulfilment of the controller's obligation to respond to requests for exercising the data subject's rights laid down in Chapter III; f. assists the controller in ensuring compliance with the obligations pursuant to Articles 32 to 36 taking into account the nature of processing and the information available to the processor; g. at the choice of the controller, deletes or returns all the personal data to the controller after the end of the provision of services relating to processing, and deletes existing copies unless Union or Member State law requires storage of the personal data; h. makes available to the controller all information necessary to demonstrate compliance with the obligations laid down in this Article and allow for and contribute to audits, including inspections, conducted by the controller or another auditor mandated by the controller. 4. With regard to point (h) of the first subparagraph, the processor shall immediately inform the controller if, in its opinion, an instruction infringes this Regulation or other Union or Member State data protection provisions. 5. 
Where a processor engages another processor for carrying out specific processing activities on behalf of the controller, the same data protection obligations as set out in the contract or other legal act between the controller and the processor as referred to in paragraph 3 shall be imposed on that other processor by way of a contract or other legal act under Union or Member State law, in particular providing sufficient guarantees to implement appropriate technical and organisational measures in such a manner that the processing will meet the requirements of this Regulation. Where that other processor fails to fulfil its data protection obligations, the initial processor shall remain fully liable to the controller for the performance of that other processor's obligations. 6. Adherence of a processor to an approved code of conduct as referred to in Article 40 or an approved certification mechanism as referred to in Article 42 may be used as an element by which to demonstrate sufficient guarantees as referred to in paragraphs 1 and 4 of this Article. 7. Without prejudice to an individual contract between the controller and the processor, the contract or the other legal act referred to in paragraphs 3 and 4 of this Article may be based, in whole or in part, on standard contractual clauses referred to in paragraphs 7 and 8 of this Article, including when they are part of a certification granted to the controller or processor pursuant to Articles 42 and 43. 8. The Commission may lay down standard contractual clauses for the matters referred to in paragraph 3 and 4 of this Article and in accordance with the examination procedure referred to in Article 93(2). 9. A supervisory authority may adopt standard contractual clauses for the matters referred to in paragraph 3 and 4 of this Article and in accordance with the consistency mechanism referred to in Article 63. 10. The contract or the other legal act referred to in paragraphs 3 and 4 shall be in writing, including in electronic form. 11. Without prejudice to Articles 82, 83 and 84, if a processor infringes this Regulation by determining the purposes and means of processing, the processor shall be considered to be a controller in respect of that processing.
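
As a reading aid only, the hypothetical checklist below maps the stipulations of paragraph 3, points (a) to (h), onto a small Python structure that a review team could tick off against a draft processing contract. It is not legal advice, and the field names are invented for this sketch.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProcessorContractTerms:
    """Hypothetical checklist of the Art. 28(3)(a)-(h) stipulations."""
    documented_instructions_only: bool = False      # point (a)
    confidentiality_commitments: bool = False       # point (b)
    article_32_security_measures: bool = False      # point (c)
    subprocessor_conditions: bool = False           # point (d)
    data_subject_request_assistance: bool = False   # point (e)
    articles_32_to_36_assistance: bool = False      # point (f)
    deletion_or_return_at_end: bool = False         # point (g)
    audit_and_information_rights: bool = False      # point (h)


def missing_terms(terms: ProcessorContractTerms) -> List[str]:
    """Return the stipulations not yet covered by the draft contract."""
    return [f.name for f in fields(terms) if not getattr(terms, f.name)]


if __name__ == "__main__":
    draft = ProcessorContractTerms(documented_instructions_only=True)
    print(missing_terms(draft))  # every point except (a) still needs a clause
```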

๐Ÿ’ผ Art. 3 Territorial scope

1. This Regulation applies to the processing of personal data in the context of the activities of an establishment of a controller or a processor in the Union, regardless of whether the processing takes place in the Union or not. 2. This Regulation applies to the processing of personal data of data subjects who are in the Union by a controller or processor not established in the Union, where the processing activities are related to: a. the offering of goods or services, irrespective of whether a payment of the data subject is required, to such data subjects in the Union; or b. the monitoring of their behaviour as far as their behaviour takes place within the Union. 3. This Regulation applies to the processing of personal data by a controller not established in the Union, but in a place where Member State law applies by virtue of public international law.

๐Ÿ’ผ Art. 30 Records of processing activities

1. Each controller and, where applicable, the controller's representative, shall maintain a record of processing activities under its responsibility. That record shall contain all of the following information: a. the name and contact details of the controller and, where applicable, the joint controller, the controller's representative and the data protection officer; b. the purposes of the processing; c. a description of the categories of data subjects and of the categories of personal data; d. the categories of recipients to whom the personal data have been or will be disclosed including recipients in third countries or international organisations; e. where applicable, transfers of personal data to a third country or an international organisation, including the identification of that third country or international organisation and, in the case of transfers referred to in the second subparagraph of Article 49(1), the documentation of suitable safeguards; f. where possible, the envisaged time limits for erasure of the different categories of data; g. where possible, a general description of the technical and organisational security measures referred to in Article 32(1). 2. Each processor and, where applicable, the processor's representative shall maintain a record of all categories of processing activities carried out on behalf of a controller, containing: a. the name and contact details of the processor or processors and of each controller on behalf of which the processor is acting, and, where applicable, of the controller's or the processor's representative, and the data protection officer; b. the categories of processing carried out on behalf of each controller; c. where applicable, transfers of personal data to a third country or an international organisation, including the identification of that third country or international organisation and, in the case of transfers referred to in the second subparagraph of Article 49(1), the documentation of suitable safeguards; d. where possible, a general description of the technical and organisational security measures referred to in Article 32(1). 3. The records referred to in paragraphs 1 and 2 shall be in writing, including in electronic form. 4. The controller or the processor and, where applicable, the controller's or the processor's representative, shall make the record available to the supervisory authority on request. 5. The obligations referred to in paragraphs 1 and 2 shall not apply to an enterprise or an organisation employing fewer than 250 persons unless the processing it carries out is likely to result in a risk to the rights and freedoms of data subjects, the processing is not occasional, or the processing includes special categories of data as referred to in Article 9(1) or personal data relating to criminal convictions and offences referred to in Article 10.
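
Paragraph 1 effectively defines a record schema. The hypothetical Python dataclass below mirrors points (a) to (g) of that list, purely to show how a controller's register entry might be structured; the field names and example values are invented, and the Regulation itself only requires that the information be recorded in writing, including in electronic form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessingActivityRecord:
    """Hypothetical controller record mirroring the fields of Art. 30(1)."""
    controller_name: str                         # (a) name and contact details
    controller_contact: str                      # (a)
    purposes: List[str]                          # (b) purposes of the processing
    data_subject_categories: List[str]           # (c) categories of data subjects
    personal_data_categories: List[str]          # (c) categories of personal data
    recipient_categories: List[str]              # (d) categories of recipients
    third_country_transfers: List[str] = field(default_factory=list)  # (e) where applicable
    erasure_time_limits: Optional[str] = None    # (f) where possible
    security_measures: Optional[str] = None      # (g) where possible, per Art. 32(1)
    dpo_contact: Optional[str] = None            # (a) where applicable


if __name__ == "__main__":
    record = ProcessingActivityRecord(
        controller_name="Example Ltd",
        controller_contact="privacy@example.com",
        purposes=["payroll administration"],
        data_subject_categories=["employees"],
        personal_data_categories=["contact details", "bank details"],
        recipient_categories=["payroll service provider"],
        erasure_time_limits="6 years after end of employment",
        security_measures="encryption at rest, role-based access control",
    )
    print(record.purposes)
```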

๐Ÿ’ผ Art. 32 Security of processing

1. Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate: a. the pseudonymisation and encryption of personal data; b. the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services; c. the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident; d. a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing. 2. In assessing the appropriate level of security account shall be taken in particular of the risks that are presented by processing, in particular from accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to personal data transmitted, stored or otherwise processed. 3. Adherence to an approved code of conduct as referred to in Article 40 or an approved certification mechanism as referred to in Article 42 may be used as an element by which to demonstrate compliance with the requirements set out in paragraph 1 of this Article. 4. The controller and processor shall take steps to ensure that any natural person acting under the authority of the controller or the processor who has access to personal data does not process them except on instructions from the controller, unless he or she is required to do so by Union or Member State law.
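
Pseudonymisation is one of the measures listed in paragraph 1(a). As a minimal sketch only, the standard-library example below replaces a direct identifier with a keyed HMAC token so that records remain linkable for their purpose while the key, the "additional information" of Article 4(5), is held separately. Key management and the other measures in paragraph 1 are deliberately out of scope, and nothing here is presented as what the Regulation requires.

```python
import hashlib
import hmac
import secrets

# The key is the "additional information" of Art. 4(5): it must be kept
# separately from the pseudonymised records and protected accordingly.
PSEUDONYMISATION_KEY = secrets.token_bytes(32)  # in practice, load from a key store


def pseudonymise(identifier: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Replace a direct identifier with a keyed, deterministic token.

    The same identifier always maps to the same token, so records can still
    be linked for the stated purpose, but the token cannot be attributed to
    a person without access to the separately held key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    token = pseudonymise("jane.doe@example.com")
    print(token[:16], "...")  # store the token, not the e-mail address
```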

๐Ÿ’ผ Art. 33 Notification of a personal data breach to the supervisory authority

1. In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority competent in accordance with Article 55, unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons. Where the notification to the supervisory authority is not made within 72 hours, it shall be accompanied by reasons for the delay. 2. The processor shall notify the controller without undue delay after becoming aware of a personal data breach. 3. The notification referred to in paragraph 1 shall at least: a. describe the nature of the personal data breach including where possible, the categories and approximate number of data subjects concerned and the categories and approximate number of personal data records concerned; b. communicate the name and contact details of the data protection officer or other contact point where more information can be obtained; c. describe the likely consequences of the personal data breach; d. describe the measures taken or proposed to be taken by the controller to address the personal data breach, including, where appropriate, measures to mitigate its possible adverse effects. 4. Where, and in so far as, it is not possible to provide the information at the same time, the information may be provided in phases without undue further delay. 5. The controller shall document any personal data breaches, comprising the facts relating to the personal data breach, its effects and the remedial action taken. That documentation shall enable the supervisory authority to verify compliance with this Article.
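
The 72-hour clock in paragraph 1 and the minimum content of paragraph 3 lend themselves to a small worked example. The sketch below, with invented names, computes the notification deadline from the moment of awareness and groups the paragraph 3 items into one structure; it is an illustration, not a notification tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BreachNotification:
    """Hypothetical structure for the minimum content of Art. 33(3)."""
    nature_of_breach: str                 # (a) nature of the breach
    approx_data_subjects: Optional[int]   # (a) where possible
    approx_records: Optional[int]         # (a) where possible
    dpo_contact: str                      # (b) contact point for more information
    likely_consequences: str              # (c)
    measures_taken_or_proposed: str       # (d)


def notification_deadline(awareness_time: datetime) -> datetime:
    """72 hours after the controller became aware of the breach (Art. 33(1))."""
    return awareness_time + timedelta(hours=72)


if __name__ == "__main__":
    became_aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
    print(notification_deadline(became_aware).isoformat())  # 2024-03-04T09:30:00+00:00
```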

๐Ÿ’ผ Art. 34 Communication of a personal data breach to the data subject

1. When the personal data breach is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall communicate the personal data breach to the data subject without undue delay. 2. The communication to the data subject referred to in paragraph 1 of this Article shall describe in clear and plain language the nature of the personal data breach and contain at least the information and measures referred to in points (b), (c) and (d) of Article 33(3). 3. The communication to the data subject referred to in paragraph 1 shall not be required if any of the following conditions are met: a. the controller has implemented appropriate technical and organisational protection measures, and those measures were applied to the personal data affected by the personal data breach, in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption; b. the controller has taken subsequent measures which ensure that the high risk to the rights and freedoms of data subjects referred to in paragraph 1 is no longer likely to materialise; c. it would involve disproportionate effort. In such a case, there shall instead be a public communication or similar measure whereby the data subjects are informed in an equally effective manner. 4. If the controller has not already communicated the personal data breach to the data subject, the supervisory authority, having considered the likelihood of the personal data breach resulting in a high risk, may require it to do so or may decide that any of the conditions referred to in paragraph 3 are met.

๐Ÿ’ผ Art. 35 Data protection impact assessment

1. Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data. A single assessment may address a set of similar processing operations that present similar high risks. 2. The controller shall seek the advice of the data protection officer, where designated, when carrying out a data protection impact assessment. 3. A data protection impact assessment referred to in paragraph 1 shall in particular be required in the case of: a. a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person; b. processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10; or c. a systematic monitoring of a publicly accessible area on a large scale. 4. The supervisory authority shall establish and make public a list of the kind of processing operations which are subject to the requirement for a data protection impact assessment pursuant to paragraph 1. The supervisory authority shall communicate those lists to the Board referred to in Article 68. 5. The supervisory authority may also establish and make public a list of the kind of processing operations for which no data protection impact assessment is required. The supervisory authority shall communicate those lists to the Board. 6. Prior to the adoption of the lists referred to in paragraphs 4 and 5, the competent supervisory authority shall apply the consistency mechanism referred to in Article 63 where such lists involve processing activities which are related to the offering of goods or services to data subjects or to the monitoring of their behaviour in several Member States, or may substantially affect the free movement of personal data within the Union. 7. The assessment shall contain at least: a. a systematic description of the envisaged processing operations and the purposes of the processing, including, where applicable, the legitimate interest pursued by the controller; b. an assessment of the necessity and proportionality of the processing operations in relation to the purposes; c. an assessment of the risks to the rights and freedoms of data subjects referred to in paragraph 1; and d. the measures envisaged to address the risks, including safeguards, security measures and mechanisms to ensure the protection of personal data and to demonstrate compliance with this Regulation taking into account the rights and legitimate interests of data subjects and other persons concerned. 8. Compliance with approved codes of conduct referred to in Article 40 by the relevant controllers or processors shall be taken into due account in assessing the impact of the processing operations performed by such controllers or processors, in particular for the purposes of a data protection impact assessment. 9. 
Where appropriate, the controller shall seek the views of data subjects or their representatives on the intended processing, without prejudice to the protection of commercial or public interests or the security of processing operations. 10. Where processing pursuant to point (c) or (e) of Article 6(1) has a legal basis in Union law or in the law of the Member State to which the controller is subject, that law regulates the specific processing operation or set of operations in question, and a data protection impact assessment has already been carried out as part of a general impact assessment in the context of the adoption of that legal basis, paragraphs 1 to 7 shall not apply unless Member States deem it to be necessary to carry out such an assessment prior to processing activities. 11. Where necessary, the controller shall carry out a review to assess if processing is performed in accordance with the data protection impact assessment at least when there is a change of the risk represented by processing operations.
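
Paragraph 3 lists three cases in which an assessment is always required. The hypothetical screening helper below encodes just that list; as the docstring notes, it is deliberately incomplete, because paragraph 1 and the supervisory authority's list under paragraph 4 can also trigger an assessment.

```python
from dataclasses import dataclass


@dataclass
class ProcessingDescription:
    """Hypothetical summary of a planned processing operation."""
    automated_decisions_with_legal_effects: bool   # Art. 35(3)(a)
    large_scale_special_categories: bool           # Art. 35(3)(b)
    systematic_public_monitoring: bool             # Art. 35(3)(c)


def dpia_required_under_art_35_3(p: ProcessingDescription) -> bool:
    """True if any of the cases listed in Art. 35(3) applies.

    Paragraph 3 is not exhaustive: a DPIA can also be required under
    paragraph 1 or under the supervisory authority's paragraph 4 list,
    so a False result here does not mean no assessment is needed.
    """
    return (
        p.automated_decisions_with_legal_effects
        or p.large_scale_special_categories
        or p.systematic_public_monitoring
    )


if __name__ == "__main__":
    plan = ProcessingDescription(False, True, False)
    print(dpia_required_under_art_35_3(plan))  # True
```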

๐Ÿ’ผ Art. 36 Prior consultation

1. The controller shall consult the supervisory authority prior to processing where a data protection impact assessment under Article 35 indicates that the processing would result in a high risk in the absence of measures taken by the controller to mitigate the risk. 2. Where the supervisory authority is of the opinion that the intended processing referred to in paragraph 1 would infringe this Regulation, in particular where the controller has insufficiently identified or mitigated the risk, the supervisory authority shall, within period of up to eight weeks of receipt of the request for consultation, provide written advice to the controller and, where applicable to the processor, and may use any of its powers referred to in Article 58. That period may be extended by six weeks, taking into account the complexity of the intended processing. The supervisory authority shall inform the controller and, where applicable, the processor, of any such extension within one month of receipt of the request for consultation together with the reasons for the delay. Those periods may be suspended until the supervisory authority has obtained information it has requested for the purposes of the consultation. 3. When consulting the supervisory authority pursuant to paragraph 1, the controller shall provide the supervisory authority with: a. where applicable, the respective responsibilities of the controller, joint controllers and processors involved in the processing, in particular for processing within a group of undertakings; b. the purposes and means of the intended processing; c. the measures and safeguards provided to protect the rights and freedoms of data subjects pursuant to this Regulation; d. where applicable, the contact details of the data protection officer; e. the data protection impact assessment provided for in Article 35; and f. any other information requested by the supervisory authority. 4. Member States shall consult the supervisory authority during the preparation of a proposal for a legislative measure to be adopted by a national parliament, or of a regulatory measure based on such a legislative measure, which relates to processing. 5. Notwithstanding paragraph 1, Member State law may require controllers to consult with, and obtain prior authorisation from, the supervisory authority in relation to processing by a controller for the performance of a task carried out by the controller in the public interest, including processing in relation to social protection and public health.
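
The timeline in paragraph 2 (eight weeks, extendable by a further six for complex processing) can be made concrete with a short calculation. The sketch below uses invented names and ignores the suspension of the period while requested information is outstanding.

```python
from datetime import date, timedelta


def advice_due_date(request_received: date, extended: bool = False) -> date:
    """Latest date for the supervisory authority's written advice (Art. 36(2)).

    Eight weeks from receipt of the consultation request, extendable by a
    further six weeks for complex processing; suspensions while the
    authority waits for requested information are not modelled here.
    """
    weeks = 8 + (6 if extended else 0)
    return request_received + timedelta(weeks=weeks)


if __name__ == "__main__":
    print(advice_due_date(date(2024, 1, 8)))                 # 2024-03-04
    print(advice_due_date(date(2024, 1, 8), extended=True))  # 2024-04-15
```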

๐Ÿ’ผ Art. 37 Designation of the data protection officer

1. The controller and the processor shall designate a data protection officer in any case where: a. the processing is carried out by a public authority or body, except for courts acting in their judicial capacity; b. the core activities of the controller or the processor consist of processing operations which, by virtue of their nature, their scope and/or their purposes, require regular and systematic monitoring of data subjects on a large scale; or c. the core activities of the controller or the processor consist of processing on a large scale of special categories of data pursuant to Article 9 or personal data relating to criminal convictions and offences referred to in Article 10. 2. A group of undertakings may appoint a single data protection officer provided that a data protection officer is easily accessible from each establishment. 3. Where the controller or the processor is a public authority or body, a single data protection officer may be designated for several such authorities or bodies, taking account of their organisational structure and size. 4. In cases other than those referred to in paragraph 1, the controller or processor or associations and other bodies representing categories of controllers or processors may or, where required by Union or Member State law shall, designate a data protection officer. The data protection officer may act for such associations and other bodies representing controllers or processors. 5. The data protection officer shall be designated on the basis of professional qualities and, in particular, expert knowledge of data protection law and practices and the ability to fulfil the tasks referred to in Article 39. 6. The data protection officer may be a staff member of the controller or processor, or fulfil the tasks on the basis of a service contract. 7. The controller or the processor shall publish the contact details of the data protection officer and communicate them to the supervisory authority.
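
Paragraph 1 sets out three mandatory designation cases. The small helper below, with hypothetical parameter names, expresses that test; as noted in the docstring, it is only a first screen, since courts acting in their judicial capacity are carved out of point (a) and Union or Member State law can require a data protection officer in further cases.

```python
def dpo_designation_required(
    is_public_authority: bool,
    core_activity_large_scale_monitoring: bool,
    core_activity_large_scale_special_categories: bool,
) -> bool:
    """True if any of the cases in Art. 37(1)(a)-(c) applies.

    Courts acting in their judicial capacity are excluded from point (a),
    and further designation duties may arise under Art. 37(4), so this
    check is a first screen rather than a complete determination.
    """
    return (
        is_public_authority
        or core_activity_large_scale_monitoring
        or core_activity_large_scale_special_categories
    )


if __name__ == "__main__":
    print(dpo_designation_required(False, True, False))  # True
```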

๐Ÿ’ผ Art. 38 Position of the data protection officer

1. The controller and the processor shall ensure that the data protection officer is involved, properly and in a timely manner, in all issues which relate to the protection of personal data. 2. The controller and processor shall support the data protection officer in performing the tasks referred to in Article 39 by providing resources necessary to carry out those tasks and access to personal data and processing operations, and to maintain his or her expert knowledge. 3. The controller and processor shall ensure that the data protection officer does not receive any instructions regarding the exercise of those tasks. He or she shall not be dismissed or penalised by the controller or the processor for performing his tasks. The data protection officer shall directly report to the highest management level of the controller or the processor. 4. Data subjects may contact the data protection officer with regard to all issues related to processing of their personal data and to the exercise of their rights under this Regulation. 5. The data protection officer shall be bound by secrecy or confidentiality concerning the performance of his or her tasks, in accordance with Union or Member State law. 6. The data protection officer may fulfil other tasks and duties. The controller or processor shall ensure that any such tasks and duties do not result in a conflict of interests.

๐Ÿ’ผ Art. 39 Tasks of the data protection officer

1. The data protection officer shall have at least the following tasks: a. to inform and advise the controller or the processor and the employees who carry out processing of their obligations pursuant to this Regulation and to other Union or Member State data protection provisions; b. to monitor compliance with this Regulation, with other Union or Member State data protection provisions and with the policies of the controller or processor in relation to the protection of personal data, including the assignment of responsibilities, awareness-raising and training of staff involved in processing operations, and the related audits; c. to provide advice where requested as regards the data protection impact assessment and monitor its performance pursuant to Article 35; d. to cooperate with the supervisory authority; e. to act as the contact point for the supervisory authority on issues relating to processing, including the prior consultation referred to in Article 36, and to consult, where appropriate, with regard to any other matter. 2. The data protection officer shall in the performance of his or her tasks have due regard to the risk associated with processing operations, taking into account the nature, scope, context and purposes of processing.

๐Ÿ’ผ Art. 4 Definitions

For the purposes of this Regulation: 1. 'personal data' means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person; 2. 'processing' means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction; 3. 'restriction of processing' means the marking of stored personal data with the aim of limiting their processing in the future; 4. 'profiling' means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person's performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements; 5. 'pseudonymisation' means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person; 6. 'filing system' means any structured set of personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis; 7. 'controller' means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law; 8. 'processor' means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller; 9. 'recipient' means a natural or legal person, public authority, agency or another body, to which the personal data are disclosed, whether a third party or not. However, public authorities which may receive personal data in the framework of a particular inquiry in accordance with Union or Member State law shall not be regarded as recipients; the processing of those data by those public authorities shall be in compliance with the applicable data protection rules according to the purposes of the processing; 10. 'third party' means a natural or legal person, public authority, agency or body other than the data subject, controller, processor and persons who, under the direct authority of the controller or processor, are authorised to process personal data; 11.
'consent' of the data subject means any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her; 12. 'personal data breach' means a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed; 13. 'genetic data' means personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question; 14. 'biometric data' means personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data; 15. 'data concerning health' means personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status; 16. 'main establishment' means: a. as regards a controller with establishments in more than one Member State, the place of its central administration in the Union, unless the decisions on the purposes and means of the processing of personal data are taken in another establishment of the controller in the Union and the latter establishment has the power to have such decisions implemented, in which case the establishment having taken such decisions is to be considered to be the main establishment; b. as regards a processor with establishments in more than one Member State, the place of its central administration in the Union, or, if the processor has no central administration in the Union, the establishment of the processor in the Union where the main processing activities in the context of the activities of an establishment of the processor take place to the extent that the processor is subject to specific obligations under this Regulation; 17. 'representative' means a natural or legal person established in the Union who, designated by the controller or processor in writing pursuant to Article 27, represents the controller or processor with regard to their respective obligations under this Regulation; 18. 'enterprise' means a natural or legal person engaged in an economic activity, irrespective of its legal form, including partnerships or associations regularly engaged in an economic activity; 19. 'group of undertakings' means a controlling undertaking and its controlled undertakings; 20. 'binding corporate rules' means personal data protection policies which are adhered to by a controller or processor established on the territory of a Member State for transfers or a set of transfers of personal data to a controller or processor in one or more third countries within a group of undertakings, or group of enterprises engaged in a joint economic activity; 21. 'supervisory authority' means an independent public authority which is established by a Member State pursuant to Article 51; 22. 'supervisory authority concerned' means a supervisory authority which is concerned by the processing of personal data because: a. 
the controller or processor is established on the territory of the Member State of that supervisory authority; b. data subjects residing in the Member State of that supervisory authority are substantially affected or likely to be substantially affected by the processing; or c. a complaint has been lodged with that supervisory authority; 23. 'cross-border processing' means either: a. processing of personal data which takes place in the context of the activities of establishments in more than one Member State of a controller or processor in the Union where the controller or processor is established in more than one Member State; or b. processing of personal data which takes place in the context of the activities of a single establishment of a controller or processor in the Union but which substantially affects or is likely to substantially affect data subjects in more than one Member State. 24. 'relevant and reasoned objection' means an objection to a draft decision as to whether there is an infringement of this Regulation, or whether envisaged action in relation to the controller or processor complies with this Regulation, which clearly demonstrates the significance of the risks posed by the draft decision as regards the fundamental rights and freedoms of data subjects and, where applicable, the free flow of personal data within the Union; 25. 'information society service' means a service as defined in point (b) of Article 1(1) of Directive (EU) 2015/1535 of the European Parliament and of the Council (_); 26. 'international organisation' means an organisation and its subordinate bodies governed by public international law, or any other body which is set up by, or on the basis of, an agreement between two or more countries.

๐Ÿ’ผ Art. 40 Codes of conduct

1. The Member States, the supervisory authorities, the Board and the Commission shall encourage the drawing up of codes of conduct intended to contribute to the proper application of this Regulation, taking account of the specific features of the various processing sectors and the specific needs of micro, small and medium-sized enterprises. 2. Associations and other bodies representing categories of controllers or processors may prepare codes of conduct, or amend or extend such codes, for the purpose of specifying the application of this Regulation, such as with regard to: a. fair and transparent processing; b. the legitimate interests pursued by controllers in specific contexts; c. the collection of personal data; d. the pseudonymisation of personal data; e. the information provided to the public and to data subjects; f. the exercise of the rights of data subjects; g. the information provided to, and the protection of, children, and the manner in which the consent of the holders of parental responsibility over children is to be obtained; h. the measures and procedures referred to in Articles 24 and 25 and the measures to ensure security of processing referred to in Article 32; i. the notification of personal data breaches to supervisory authorities and the communication of such personal data breaches to data subjects; j. the transfer of personal data to third countries or international organisations; or k. out-of-court proceedings and other dispute resolution procedures for resolving disputes between controllers and data subjects with regard to processing, without prejudice to the rights of data subjects pursuant to Articles 77 and 79. 3. In addition to adherence by controllers or processors subject to this Regulation, codes of conduct approved pursuant to paragraph 5 of this Article and having general validity pursuant to paragraph 9 of this Article may also be adhered to by controllers or processors that are not subject to this Regulation pursuant to Article 3 in order to provide appropriate safeguards within the framework of personal data transfers to third countries or international organisations under the terms referred to in point (e) of Article 46(2). Such controllers or processors shall make binding and enforceable commitments, via contractual or other legally binding instruments, to apply those appropriate safeguards including with regard to the rights of data subjects. 4. A code of conduct referred to in paragraph 2 of this Article shall contain mechanisms which enable the body referred to in Article 41(1) to carry out the mandatory monitoring of compliance with its provisions by the controllers or processors which undertake to apply it, without prejudice to the tasks and powers of supervisory authorities competent pursuant to Article 55 or 56. 5. Associations and other bodies referred to in paragraph 2 of this Article which intend to prepare a code of conduct or to amend or extend an existing code shall submit the draft code, amendment or extension to the supervisory authority which is competent pursuant to Article 55. The supervisory authority shall provide an opinion on whether the draft code, amendment or extension complies with this Regulation and shall approve that draft code, amendment or extension if it finds that it provides sufficient appropriate safeguards. 6. 
Where the draft code, or amendment or extension is approved in accordance with paragraph 5, and where the code of conduct concerned does not relate to processing activities in several Member States, the supervisory authority shall register and publish the code. 7. Where a draft code of conduct relates to processing activities in several Member States, the supervisory authority which is competent pursuant to Article 55 shall, before approving the draft code, amendment or extension, submit it in the procedure referred to in Article 63 to the Board which shall provide an opinion on whether the draft code, amendment or extension complies with this Regulation or, in the situation referred to in paragraph 3 of this Article, provides appropriate safeguards. 8. Where the opinion referred to in paragraph 7 confirms that the draft code, amendment or extension complies with this Regulation, or, in the situation referred to in paragraph 3, provides appropriate safeguards, the Board shall submit its opinion to the Commission. 9. The Commission may, by way of implementing acts, decide that the approved code of conduct, amendment or extension submitted to it pursuant to paragraph 8 of this Article have general validity within the Union. Those implementing acts shall be adopted in accordance with the examination procedure set out in Article 93(2). 10. The Commission shall ensure appropriate publicity for the approved codes which have been decided as having general validity in accordance with paragraph 9. 11. The Board shall collate all approved codes of conduct, amendments and extensions in a register and shall make them publicly available by way of appropriate means.

๐Ÿ’ผ Art. 41 Monitoring of approved codes of conduct

1. Without prejudice to the tasks and powers of the competent supervisory authority under Articles 57 and 58, the monitoring of compliance with a code of conduct pursuant to Article 40 may be carried out by a body which has an appropriate level of expertise in relation to the subject-matter of the code and is accredited for that purpose by the competent supervisory authority. 2. A body as referred to in paragraph 1 may be accredited to monitor compliance with a code of conduct where that body has: a. demonstrated its independence and expertise in relation to the subject-matter of the code to the satisfaction of the competent supervisory authority; b. established procedures which allow it to assess the eligibility of controllers and processors concerned to apply the code, to monitor their compliance with its provisions and to periodically review its operation; c. established procedures and structures to handle complaints about infringements of the code or the manner in which the code has been, or is being, implemented by a controller or processor, and to make those procedures and structures transparent to data subjects and the public; and d. demonstrated to the satisfaction of the competent supervisory authority that its tasks and duties do not result in a conflict of interests. 3. The competent supervisory authority shall submit the draft requirements for accreditation of a body as referred to in paragraph 1 of this Article to the Board pursuant to the consistency mechanism referred to in Article 63. 4. Without prejudice to the tasks and powers of the competent supervisory authority and the provisions of Chapter VIII, a body as referred to in paragraph 1 of this Article shall, subject to appropriate safeguards, take appropriate action in cases of infringement of the code by a controller or processor, including suspension or exclusion of the controller or processor concerned from the code. It shall inform the competent supervisory authority of such actions and the reasons for taking them. 5. The competent supervisory authority shall revoke the accreditation of a body as referred to in paragraph 1 if the requirements for accreditation are not, or are no longer, met or where actions taken by the body infringe this Regulation. 6. This Article shall not apply to processing carried out by public authorities and bodies.

๐Ÿ’ผ Art. 42 Certification

1. The Member States, the supervisory authorities, the Board and the Commission shall encourage, in particular at Union level, the establishment of data protection certification mechanisms and of data protection seals and marks, for the purpose of demonstrating compliance with this Regulation of processing operations by controllers and processors. The specific needs of micro, small and medium-sized enterprises shall be taken into account. 2. In addition to adherence by controllers or processors subject to this Regulation, data protection certification mechanisms, seals or marks approved pursuant to paragraph 5 of this Article may be established for the purpose of demonstrating the existence of appropriate safeguards provided by controllers or processors that are not subject to this Regulation pursuant to Article 3 within the framework of personal data transfers to third countries or international organisations under the terms referred to in point (f) of Article 46(2). Such controllers or processors shall make binding and enforceable commitments, via contractual or other legally binding instruments, to apply those appropriate safeguards, including with regard to the rights of data subjects. 3. The certification shall be voluntary and available via a process that is transparent. 4. A certification pursuant to this Article does not reduce the responsibility of the controller or the processor for compliance with this Regulation and is without prejudice to the tasks and powers of the supervisory authorities which are competent pursuant to Article 55 or 56. 5. A certification pursuant to this Article shall be issued by the certification bodies referred to in Article 43 or by the competent supervisory authority, on the basis of criteria approved by that competent supervisory authority pursuant to Article 58(3) or by the Board pursuant to Article 63. Where the criteria are approved by the Board, this may result in a common certification, the European Data Protection Seal. 6. The controller or processor which submits its processing to the certification mechanism shall provide the certification body referred to in Article 43, or where applicable, the competent supervisory authority, with all information and access to its processing activities which are necessary to conduct the certification procedure. 7. Certification shall be issued to a controller or processor for a maximum period of three years and may be renewed, under the same conditions, provided that the relevant criteria continue to be met. Certification shall be withdrawn, as applicable, by the certification bodies referred to in Article 43 or by the competent supervisory authority where the criteria for the certification are not or are no longer met. 8. The Board shall collate all certification mechanisms and data protection seals and marks in a register and shall make them publicly available by any appropriate means.

๐Ÿ’ผ Art. 43 Certification bodies

1. Without prejudice to the tasks and powers of the competent supervisory authority under Articles 57 and 58, certification bodies which have an appropriate level of expertise in relation to data protection shall, after informing the supervisory authority in order to allow it to exercise its powers pursuant to point (h) of Article 58(2) where necessary, issue and renew certification. Member States shall ensure that those certification bodies are accredited by one or both of the following: a. the supervisory authority which is competent pursuant to Article 55 or 56; b. the national accreditation body named in accordance with Regulation (EC) No 765/2008 of the European Parliament and of the Council in accordance with EN-ISO/IEC 17065/2012 and with the additional requirements established by the supervisory authority which is competent pursuant to Article 55 or 56. 2. Certification bodies referred to in paragraph 1 shall be accredited in accordance with that paragraph only where they have: a. demonstrated their independence and expertise in relation to the subject-matter of the certification to the satisfaction of the competent supervisory authority; b. undertaken to respect the criteria referred to in Article 42(5) and approved by the supervisory authority which is competent pursuant to Article 55 or 56 or by the Board pursuant to Article 63; c. established procedures for the issuing, periodic review and withdrawal of data protection certification, seals and marks; d. established procedures and structures to handle complaints about infringements of the certification or the manner in which the certification has been, or is being, implemented by the controller or processor, and to make those procedures and structures transparent to data subjects and the public; and e. demonstrated, to the satisfaction of the competent supervisory authority, that their tasks and duties do not result in a conflict of interests. 3. The accreditation of certification bodies as referred to in paragraphs 1 and 2 of this Article shall take place on the basis of requirements approved by the supervisory authority which is competent pursuant to Article 55 or 56 or by the Board pursuant to Article 63. In the case of accreditation pursuant to point (b) of paragraph 1 of this Article, those requirements shall complement those envisaged in Regulation (EC) No 765/2008 and the technical rules that describe the methods and procedures of the certification bodies. 4. The certification bodies referred to in paragraph 1 shall be responsible for the proper assessment leading to the certification or the withdrawal of such certification without prejudice to the responsibility of the controller or processor for compliance with this Regulation. The accreditation shall be issued for a maximum period of five years and may be renewed on the same conditions provided that the certification body meets the requirements set out in this Article. 5. The certification bodies referred to in paragraph 1 shall provide the competent supervisory authorities with the reasons for granting or withdrawing the requested certification. 6. The requirements referred to in paragraph 3 of this Article and the criteria referred to in Article 42(5) shall be made public by the supervisory authority in an easily accessible form. The supervisory authorities shall also transmit those requirements and criteria to the Board. 7.
Without prejudice to Chapter VIII, the competent supervisory authority or the national accreditation body shall revoke an accreditation of a certification body pursuant to paragraph 1 of this Article where the conditions for the accreditation are not, or are no longer, met or where actions taken by a certification body infringe this Regulation. 8. The Commission shall be empowered to adopt delegated acts in accordance with Article 92 for the purpose of specifying the requirements to be taken into account for the data protection certification mechanisms referred to in Article 42(1). 9. The Commission may adopt implementing acts laying down technical standards for certification mechanisms and data protection seals and marks, and mechanisms to promote and recognise those certification mechanisms, seals and marks. Those implementing acts shall be adopted in accordance with the examination procedure referred to in Article 93(2).

๐Ÿ’ผ Art. 44 General principle for transfers

Any transfer of personal data which are undergoing processing or are intended for processing after transfer to a third country or to an international organisation shall take place only if, subject to the other provisions of this Regulation, the conditions laid down in this Chapter are complied with by the controller and processor, including for onward transfers of personal data from the third country or an international organisation to another third country or to another international organisation. All provisions in this Chapter shall be applied in order to ensure that the level of protection of natural persons guaranteed by this Regulation is not undermined.

๐Ÿ’ผ Art. 45 Transfers on the basis of an adequacy decision

1. A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection. Such a transfer shall not require any specific authorisation. 2. When assessing the adequacy of the level of protection, the Commission shall, in particular, take account of the following elements: a. the rule of law, respect for human rights and fundamental freedoms, relevant legislation, both general and sectoral, including concerning public security, defence, national security and criminal law and the access of public authorities to personal data, as well as the implementation of such legislation, data protection rules, professional rules and security measures, including rules for the onward transfer of personal data to another third country or international organisation which are complied with in that country or international organisation, case-law, as well as effective and enforceable data subject rights and effective administrative and judicial redress for the data subjects whose personal data are being transferred; b. the existence and effective functioning of one or more independent supervisory authorities in the third country or to which an international organisation is subject, with responsibility for ensuring and enforcing compliance with the data protection rules, including adequate enforcement powers, for assisting and advising the data subjects in exercising their rights and for cooperation with the supervisory authorities of the Member States; and c. the international commitments the third country or international organisation concerned has entered into, or other obligations arising from legally binding conventions or instruments as well as from its participation in multilateral or regional systems, in particular in relation to the protection of personal data. 3. The Commission, after assessing the adequacy of the level of protection, may decide, by means of implementing act, that a third country, a territory or one or more specified sectors within a third country, or an international organisation ensures an adequate level of protection within the meaning of paragraph 2 of this Article. The implementing act shall provide for a mechanism for a periodic review, at least every four years, which shall take into account all relevant developments in the third country or international organisation. The implementing act shall specify its territorial and sectoral application and, where applicable, identify the supervisory authority or authorities referred to in point (b) of paragraph 2 of this Article. The implementing act shall be adopted in accordance with the examination procedure referred to in Article 93(2). 4. The Commission shall, on an ongoing basis, monitor developments in third countries and international organisations that could affect the functioning of decisions adopted pursuant to paragraph 3 of this Article and decisions adopted on the basis of Article 25(6) of Directive 95/46/EC. 5. 
The Commission shall, where available information reveals, in particular following the review referred to in paragraph 3 of this Article, that a third country, a territory or one or more specified sectors within a third country, or an international organisation no longer ensures an adequate level of protection within the meaning of paragraph 2 of this Article, to the extent necessary, repeal, amend or suspend the decision referred to in paragraph 3 of this Article by means of implementing acts without retro-active effect. Those implementing acts shall be adopted in accordance with the examination procedure referred to in Article 93(2). 6. On duly justified imperative grounds of urgency, the Commission shall adopt immediately applicable implementing acts in accordance with the procedure referred to in Article 93(3). 7. The Commission shall enter into consultations with the third country or international organisation with a view to remedying the situation giving rise to the decision made pursuant to paragraph 5. 8. A decision pursuant to paragraph 5 of this Article is without prejudice to transfers of personal data to the third country, a territory or one or more specified sectors within that third country, or the international organisation in question pursuant to Articles 46 to 49. 9. The Commission shall publish in the Official Journal of the European Union and on its website a list of the third countries, territories and specified sectors within a third country and international organisations for which it has decided that an adequate level of protection is or is no longer ensured. 10. Decisions adopted by the Commission on the basis of Article 25(6) of Directive 95/46/EC shall remain in force until amended, replaced or repealed by a Commission Decision adopted in accordance with paragraph 3 or 5 of this Article.

๐Ÿ’ผ Art. 46 Transfers subject to appropriate safeguards

1. In the absence of a decision pursuant to Article 45(3), a controller or processor may transfer personal data to a third country or an international organisation only if the controller or processor has provided appropriate safeguards, and on condition that enforceable data subject rights and effective legal remedies for data subjects are available. 2. The appropriate safeguards referred to in paragraph 1 may be provided for, without requiring any specific authorisation from a supervisory authority, by: a. a legally binding and enforceable instrument between public authorities or bodies; b. binding corporate rules in accordance with Article 47; c. standard data protection clauses adopted by the Commission in accordance with the examination procedure referred to in Article 93(2); d. standard data protection clauses adopted by a supervisory authority and approved by the Commission pursuant to the examination procedure referred to in Article 93(2); e. an approved code of conduct pursuant to Article 40 together with binding and enforceable commitments of the controller or processor in the third country to apply the appropriate safeguards, including as regards data subjectsโ€™ rights; or f. an approved certification mechanism pursuant to Article 42 together with binding and enforceable commitments of the controller or processor in the third country to apply the appropriate safeguards, including as regards data subjectsโ€™ rights. 3. Subject to the authorisation from the competent supervisory authority, the appropriate safeguards referred to in paragraph 1 may also be provided for, in particular, by: a. contractual clauses between the controller or processor and the controller, processor or the recipient of the personal data in the third country or international organisation; or b. provisions to be inserted into administrative arrangements between public authorities or bodies which include enforceable and effective data subject rights. 4. The supervisory authority shall apply the consistency mechanism referred to in Article 63 in the cases referred to in paragraph 3 of this Article. 5. Authorisations by a Member State or supervisory authority on the basis of Article 26(2) of Directive 95/46/EC shall remain valid until amended, replaced or repealed, if necessary, by that supervisory authority. Decisions adopted by the Commission on the basis of Article 26(4) of Directive 95/46/EC shall remain in force until amended, replaced or repealed, if necessary, by a Commission Decision adopted in accordance with paragraph 2 of this Article.

๐Ÿ’ผ Art. 47 Binding corporate rules

1. The competent supervisory authority shall approve binding corporate rules in accordance with the consistency mechanism set out in Article 63, provided that they: a. are legally binding and apply to and are enforced by every member concerned of the group of undertakings, or group of enterprises engaged in a joint economic activity, including their employees; b. expressly confer enforceable rights on data subjects with regard to the processing of their personal data; and c. fulfil the requirements laid down in paragraph 2. 2. The binding corporate rules referred to in paragraph 1 shall specify at least: a. the structure and contact details of the group of undertakings, or group of enterprises engaged in a joint economic activity and of each of its members; b. the data transfers or set of transfers, including the categories of personal data, the type of processing and its purposes, the type of data subjects affected and the identification of the third country or countries in question; c. their legally binding nature, both internally and externally; d. the application of the general data protection principles, in particular purpose limitation, data minimisation, limited storage periods, data quality, data protection by design and by default, legal basis for processing, processing of special categories of personal data, measures to ensure data security, and the requirements in respect of onward transfers to bodies not bound by the binding corporate rules; e. the rights of data subjects in regard to processing and the means to exercise those rights, including the right not to be subject to decisions based solely on automated processing, including profiling in accordance with Article 22, the right to lodge a complaint with the competent supervisory authority and before the competent courts of the Member States in accordance with Article 79, and to obtain redress and, where appropriate, compensation for a breach of the binding corporate rules; f. the acceptance by the controller or processor established on the territory of a Member State of liability for any breaches of the binding corporate rules by any member concerned not established in the Union; the controller or the processor shall be exempt from that liability, in whole or in part, only if it proves that that member is not responsible for the event giving rise to the damage; g. how the information on the binding corporate rules, in particular on the provisions referred to in points (d), (e) and (f) of this paragraph is provided to the data subjects in addition to Articles 13 and 14; h. the tasks of any data protection officer designated in accordance with Article 37 or any other person or entity in charge of the monitoring of compliance with the binding corporate rules within the group of undertakings, or group of enterprises engaged in a joint economic activity, as well as monitoring training and complaint-handling; i. the complaint procedures; j. the mechanisms within the group of undertakings, or group of enterprises engaged in a joint economic activity for ensuring the verification of compliance with the binding corporate rules. Such mechanisms shall include data protection audits and methods for ensuring corrective actions to protect the rights of the data subject. 
Results of such verification should be communicated to the person or entity referred to in point (h) and to the board of the controlling undertaking of a group of undertakings, or of the group of enterprises engaged in a joint economic activity, and should be available upon request to the competent supervisory authority; k. the mechanisms for reporting and recording changes to the rules and reporting those changes to the supervisory authority; l. the cooperation mechanism with the supervisory authority to ensure compliance by any member of the group of undertakings, or group of enterprises engaged in a joint economic activity, in particular by making available to the supervisory authority the results of verifications of the measures referred to in point (j); m. the mechanisms for reporting to the competent supervisory authority any legal requirements to which a member of the group of undertakings, or group of enterprises engaged in a joint economic activity is subject in a third country which are likely to have a substantial adverse effect on the guarantees provided by the binding corporate rules; and n. the appropriate data protection training to personnel having permanent or regular access to personal data. 3. The Commission may specify the format and procedures for the exchange of information between controllers, processors and supervisory authorities for binding corporate rules within the meaning of this Article. Those implementing acts shall be adopted in accordance with the examination procedure set out in Article 93(2).

๐Ÿ’ผ Art. 48 Transfers or disclosures not authorised by Union law

Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.

๐Ÿ’ผ Art. 49 Derogations for specific situations

1. In the absence of an adequacy decision pursuant to Article 45(3), or of appropriate safeguards pursuant to Article 46, including binding corporate rules, a transfer or a set of transfers of personal data to a third country or an international organisation shall take place only on one of the following conditions: a. the data subject has explicitly consented to the proposed transfer, after having been informed of the possible risks of such transfers for the data subject due to the absence of an adequacy decision and appropriate safeguards; b. the transfer is necessary for the performance of a contract between the data subject and the controller or the implementation of pre-contractual measures taken at the data subjectโ€™s request; c. the transfer is necessary for the conclusion or performance of a contract concluded in the interest of the data subject between the controller and another natural or legal person; d. the transfer is necessary for important reasons of public interest; e. the transfer is necessary for the establishment, exercise or defence of legal claims; f. the transfer is necessary in order to protect the vital interests of the data subject or of other persons, where the data subject is physically or legally incapable of giving consent; g. the transfer is made from a register which according to Union or Member State law is intended to provide information to the public and which is open to consultation either by the public in general or by any person who can demonstrate a legitimate interest, but only to the extent that the conditions laid down by Union or Member State law for consultation are fulfilled in the particular case. 2. Where a transfer could not be based on a provision in Article 45 or 46, including the provisions on binding corporate rules, and none of the derogations for a specific situation referred to in the first subparagraph of this paragraph is applicable, a transfer to a third country or an international organisation may take place only if the transfer is not repetitive, concerns only a limited number of data subjects, is necessary for the purposes of compelling legitimate interests pursued by the controller which are not overridden by the interests or rights and freedoms of the data subject, and the controller has assessed all the circumstances surrounding the data transfer and has on the basis of that assessment provided suitable safeguards with regard to the protection of personal data. The controller shall inform the supervisory authority of the transfer. The controller shall, in addition to providing the information referred to in Articles 13 and 14, inform the data subject of the transfer and on the compelling legitimate interests pursued. 3. A transfer pursuant to point (g) of the first subparagraph of paragraph 1 shall not involve the entirety of the personal data or entire categories of the personal data contained in the register. Where the register is intended for consultation by persons having a legitimate interest, the transfer shall be made only at the request of those persons or if they are to be the recipients. 4. Points (a), (b) and (c) of the first subparagraph of paragraph 1 and the second subparagraph thereof shall not apply to activities carried out by public authorities in the exercise of their public powers. 5. The public interest referred to in point (d) of the first subparagraph of paragraph 1 shall be recognised in Union law or in the law of the Member State to which the controller is subject. 6. 
In the absence of an adequacy decision, Union or Member State law may, for important reasons of public interest, expressly set limits to the transfer of specific categories of personal data to a third country or an international organisation. Member States shall notify such provisions to the Commission. 7. The controller or processor shall document the assessment as well as the suitable safeguards referred to in the second subparagraph of paragraph 1 of this Article in the records referred to in Article 30.
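Articles 45, 46 and 49 above set out the transfer grounds in a fixed order of preference: an Article 45 adequacy decision first, then Article 46 appropriate safeguards, then the Article 49 derogations, with the "compelling legitimate interests" fallback as a last resort. The sketch below is an illustrative, non-authoritative Python model of that ordering for teams building transfer inventories; the identifiers and the `TransferAssessment` structure are hypothetical shorthand, not terms taken from the Regulation.

```python
from dataclasses import dataclass, field

# Illustrative only: a simplified model of the ordering in Chapter V --
# Art. 45 adequacy decision, then Art. 46 appropriate safeguards, then the
# Art. 49 derogations, then the "compelling legitimate interests" fallback.
# The labels below are hypothetical shorthand, not terms of the Regulation.

ART_49_DEROGATIONS = {
    "explicit_consent",                   # point (a)
    "contract_with_data_subject",         # point (b)
    "contract_in_data_subject_interest",  # point (c)
    "important_public_interest",          # point (d)
    "legal_claims",                       # point (e)
    "vital_interests",                    # point (f)
    "public_register",                    # point (g)
}

@dataclass
class TransferAssessment:
    adequacy_decision: bool = False      # destination covered by an Art. 45(3) decision
    art46_safeguard: str | None = None   # e.g. "standard data protection clauses", "BCRs"
    derogations: set[str] = field(default_factory=set)

def transfer_basis(a: TransferAssessment) -> str:
    """Return which Chapter V ground might apply, checked in the order of the Regulation."""
    if a.adequacy_decision:
        return "Art. 45 adequacy decision (no specific authorisation required)"
    if a.art46_safeguard:
        return f"Art. 46 appropriate safeguards ({a.art46_safeguard})"
    applicable = sorted(a.derogations & ART_49_DEROGATIONS)
    if applicable:
        return f"Art. 49 derogation: {applicable[0]}"
    # Last resort: non-repetitive, limited number of data subjects, compelling
    # legitimate interests, suitable safeguards, documented in the Art. 30 records,
    # with the supervisory authority and the data subject informed (see Art. 49 above).
    return "Possibly the Art. 49 'compelling legitimate interests' fallback - document the assessment"

print(transfer_basis(TransferAssessment(art46_safeguard="standard data protection clauses")))
```

A check like this is only a triage aid over a transfer register; the legal assessment of each ground remains with the controller and, where required, the supervisory authority.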

๐Ÿ’ผ Art. 5 Principles relating to processing of personal data

1. Personal data shall be: a. processed lawfully, fairly and in a transparent manner in relation to the data subject ('lawfulness, fairness and transparency'); b. collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes ('purpose limitation'); c. adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation'); d. accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay ('accuracy'); e. kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject ('storage limitation'); f. processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures ('integrity and confidentiality'). 2. The controller shall be responsible for, and be able to demonstrate compliance with, paragraph 1 ('accountability').
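Most of these principles are organisational rather than technical, but "storage limitation" (point (e)) and "accountability" (paragraph 2) are commonly backed by tooling. The snippet below is a minimal, non-authoritative sketch of a retention sweep; the purposes, retention periods and field names are invented for illustration and are not prescribed by the Regulation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: one way to operationalise storage limitation (Art. 5(1)(e))
# while keeping evidence for accountability (Art. 5(2)). All names are hypothetical.
RETENTION = {
    "support_ticket": timedelta(days=365),
    "newsletter_signup": timedelta(days=730),
}

def records_due_for_erasure(records: list[dict], now: datetime | None = None) -> list[dict]:
    """Return records held longer than the retention period defined for their purpose."""
    now = now or datetime.now(timezone.utc)
    overdue = []
    for rec in records:
        limit = RETENTION.get(rec["purpose"])
        if limit and now - rec["collected_at"] > limit:
            overdue.append(rec)
    return overdue

records = [
    {"id": 1, "purpose": "support_ticket",
     "collected_at": datetime(2022, 1, 10, tzinfo=timezone.utc)},
]
for rec in records_due_for_erasure(records):
    # Erase or anonymise, and keep a log entry so compliance can be demonstrated.
    print(f"record {rec['id']} exceeded retention for '{rec['purpose']}' - erase or anonymise")
```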

๐Ÿ’ผ Art. 50 International cooperation for the protection of personal data

In relation to third countries and international organisations, the Commission and supervisory authorities shall take appropriate steps to: 1. develop international cooperation mechanisms to facilitate the effective enforcement of legislation for the protection of personal data; 2. provide international mutual assistance in the enforcement of legislation for the protection of personal data, including through notification, complaint referral, investigative assistance and information exchange, subject to appropriate safeguards for the protection of personal data and other fundamental rights and freedoms; 3. engage relevant stakeholders in discussion and activities aimed at furthering international cooperation in the enforcement of legislation for the protection of personal data; 4. promote the exchange and documentation of personal data protection legislation and practice, including on jurisdictional conflicts with third countries.

๐Ÿ’ผ Art. 51 Supervisory authority

1. Each Member State shall provide for one or more independent public authorities to be responsible for monitoring the application of this Regulation, in order to protect the fundamental rights and freedoms of natural persons in relation to processing and to facilitate the free flow of personal data within the Union (โ€˜supervisory authorityโ€™). 2. Each supervisory authority shall contribute to the consistent application of this Regulation throughout the Union. For that purpose, the supervisory authorities shall cooperate with each other and the Commission in accordance with Chapter VII. 3. Where more than one supervisory authority is established in a Member State, that Member State shall designate the supervisory authority which is to represent those authorities in the Board and shall set out the mechanism to ensure compliance by the other authorities with the rules relating to the consistency mechanism referred to in Article 63. 4. Each Member State shall notify to the Commission the provisions of its law which it adopts pursuant to this Chapter, by 25 May 2018 and, without delay, any subsequent amendment affecting them.

๐Ÿ’ผ Art. 52 Independence

1. Each supervisory authority shall act with complete independence in performing its tasks and exercising its powers in accordance with this Regulation. 2. The member or members of each supervisory authority shall, in the performance of their tasks and exercise of their powers in accordance with this Regulation, remain free from external influence, whether direct or indirect, and shall neither seek nor take instructions from anybody. 3. Member or members of each supervisory authority shall refrain from any action incompatible with their duties and shall not, during their term of office, engage in any incompatible occupation, whether gainful or not. 4. Each Member State shall ensure that each supervisory authority is provided with the human, technical and financial resources, premises and infrastructure necessary for the effective performance of its tasks and exercise of its powers, including those to be carried out in the context of mutual assistance, cooperation and participation in the Board. 5. Each Member State shall ensure that each supervisory authority chooses and has its own staff which shall be subject to the exclusive direction of the member or members of the supervisory authority concerned. 6. Each Member State shall ensure that each supervisory authority is subject to financial control which does not affect its independence and that it has separate, public annual budgets, which may be part of the overall state or national budget.

๐Ÿ’ผ Art. 53 General conditions for the members of the supervisory authority

1. Member States shall provide for each member of their supervisory authorities to be appointed by means of a transparent procedure by: - their parliament; - their government; - their head of State; or - an independent body entrusted with the appointment under Member State law. 2. Each member shall have the qualifications, experience and skills, in particular in the area of the protection of personal data, required to perform its duties and exercise its powers. 3. The duties of a member shall end in the event of the expiry of the term of office, resignation or compulsory retirement, in accordance with the law of the Member State concerned. 4. A member shall be dismissed only in cases of serious misconduct or if the member no longer fulfils the conditions required for the performance of the duties.

๐Ÿ’ผ Art. 54 Rules on the establishment of the supervisory authority

1. Each Member State shall provide by law for all of the following: a. the establishment of each supervisory authority; b. the qualifications and eligibility conditions required to be appointed as member of each supervisory authority; c. the rules and procedures for the appointment of the member or members of each supervisory authority; d. the duration of the term of the member or members of each supervisory authority of no less than four years, except for the first appointment after 24 May 2016, part of which may take place for a shorter period where that is necessary to protect the independence of the supervisory authority by means of a staggered appointment procedure; e. whether and, if so, for how many terms the member or members of each supervisory authority is eligible for reappointment; f. the conditions governing the obligations of the member or members and staff of each supervisory authority, prohibitions on actions, occupations and benefits incompatible therewith during and after the term of office and rules governing the cessation of employment. 2. The member or members and the staff of each supervisory authority shall, in accordance with Union or Member State law, be subject to a duty of professional secrecy both during and after their term of office, with regard to any confidential information which has come to their knowledge in the course of the performance of their tasks or exercise of their powers. During their term of office, that duty of professional secrecy shall in particular apply to reporting by natural persons of infringements of this Regulation.

๐Ÿ’ผ Art. 55 Competence

1. Each supervisory authority shall be competent for the performance of the tasks assigned to and the exercise of the powers conferred on it in accordance with this Regulation on the territory of its own Member State. 2. Where processing is carried out by public authorities or private bodies acting on the basis of point (c) or (e) of Article 6(1), the supervisory authority of the Member State concerned shall be competent. In such cases Article 56 does not apply. 3. Supervisory authorities shall not be competent to supervise processing operations of courts acting in their judicial capacity.

๐Ÿ’ผ Art. 56 Competence of the lead supervisory authority

1. Without prejudice to Article 55, the supervisory authority of the main establishment or of the single establishment of the controller or processor shall be competent to act as lead supervisory authority for the cross-border processing carried out by that controller or processor in accordance with the procedure provided in Article 60. 2. By derogation from paragraph 1, each supervisory authority shall be competent to handle a complaint lodged with it or a possible infringement of this Regulation, if the subject matter relates only to an establishment in its Member State or substantially affects data subjects only in its Member State. 3. In the cases referred to in paragraph 2 of this Article, the supervisory authority shall inform the lead supervisory authority without delay on that matter. Within a period of three weeks after being informed the lead supervisory authority shall decide whether or not it will handle the case in accordance with the procedure provided in Article 60, taking into account whether or not there is an establishment of the controller or processor in the Member State of which the supervisory authority informed it. 4. Where the lead supervisory authority decides to handle the case, the procedure provided in Article 60 shall apply. The supervisory authority which informed the lead supervisory authority may submit to the lead supervisory authority a draft for a decision. The lead supervisory authority shall take utmost account of that draft when preparing the draft decision referred to in Article 60(3). 5. Where the lead supervisory authority decides not to handle the case, the supervisory authority which informed the lead supervisory authority shall handle it according to Articles 61 and 62. 6. The lead supervisory authority shall be the sole interlocutor of the controller or processor for the cross-border processing carried out by that controller or processor.

๐Ÿ’ผ Art. 57 Tasks

1. Without prejudice to other tasks set out under this Regulation, each supervisory authority shall on its territory: a. monitor and enforce the application of this Regulation; b. promote public awareness and understanding of the risks, rules, safeguards and rights in relation to processing. Activities addressed specifically to children shall receive specific attention; c. advise, in accordance with Member State law, the national parliament, the government, and other institutions and bodies on legislative and administrative measures relating to the protection of natural personsโ€™ rights and freedoms with regard to processing; d. promote the awareness of controllers and processors of their obligations under this Regulation; e. upon request, provide information to any data subject concerning the exercise of their rights under this Regulation and, if appropriate, cooperate with the supervisory authorities in other Member States to that end; f. handle complaints lodged by a data subject, or by a body, organisation or association in accordance with Article 80, and investigate, to the extent appropriate, the subject matter of the complaint and inform the complainant of the progress and the outcome of the investigation within a reasonable period, in particular if further investigation or coordination with another supervisory authority is necessary; g. cooperate with, including sharing information and provide mutual assistance to, other supervisory authorities with a view to ensuring the consistency of application and enforcement of this Regulation; h. conduct investigations on the application of this Regulation, including on the basis of information received from another supervisory authority or other public authority; i. monitor relevant developments, insofar as they have an impact on the protection of personal data, in particular the development of information and communication technologies and commercial practices; j. adopt standard contractual clauses referred to in Article 28(8) and in point (d) of Article 46(2); k. establish and maintain a list in relation to the requirement for data protection impact assessment pursuant to Article 35(4); l. give advice on the processing operations referred to in Article 36(2); m. encourage the drawing up of codes of conduct pursuant to Article 40(1) and provide an opinion and approve such codes of conduct which provide sufficient safeguards, pursuant to Article 40(5); n. encourage the establishment of data protection certification mechanisms and of data protection seals and marks pursuant to Article 42(1), and approve the criteria of certification pursuant to Article 42(5); o. where applicable, carry out a periodic review of certifications issued in accordance with Article 42(7); p. draft and publish the requirements for accreditation of a body for monitoring codes of conduct pursuant to Article 41 and of a certification body pursuant to Article 43; q. conduct the accreditation of a body for monitoring codes of conduct pursuant to Article 41 and of a certification body pursuant to Article 43; r. authorise contractual clauses and provisions referred to in Article 46(3); s. approve binding corporate rules pursuant to Article 47; t. contribute to the activities of the Board; u. keep internal records of infringements of this Regulation and of measures taken in accordance with Article 58(2); and v. fulfil any other tasks related to the protection of personal data. 2. 
Each supervisory authority shall facilitate the submission of complaints referred to in point (f) of paragraph 1 by measures such as a complaint submission form which can also be completed electronically, without excluding other means of communication. 3. The performance of the tasks of each supervisory authority shall be free of charge for the data subject and, where applicable, for the data protection officer. 4. Where requests are manifestly unfounded or excessive, in particular because of their repetitive character, the supervisory authority may charge a reasonable fee based on administrative costs, or refuse to act on the request. The supervisory authority shall bear the burden of demonstrating the manifestly unfounded or excessive character of the request.

๐Ÿ’ผ Art. 58 Powers

1. Each supervisory authority shall have all of the following investigative powers: a. to order the controller and the processor, and, where applicable, the controller's or the processor's representative to provide any information it requires for the performance of its tasks; b. to carry out investigations in the form of data protection audits; c. to carry out a review on certifications issued pursuant to Article 42(7); d. to notify the controller or the processor of an alleged infringement of this Regulation; e. to obtain, from the controller and the processor, access to all personal data and to all information necessary for the performance of its tasks; f. to obtain access to any premises of the controller and the processor, including to any data processing equipment and means, in accordance with Union or Member State procedural law. 2. Each supervisory authority shall have all of the following corrective powers: a. to issue warnings to a controller or processor that intended processing operations are likely to infringe provisions of this Regulation; b. to issue reprimands to a controller or a processor where processing operations have infringed provisions of this Regulation; c. to order the controller or the processor to comply with the data subject's requests to exercise his or her rights pursuant to this Regulation; d. to order the controller or processor to bring processing operations into compliance with the provisions of this Regulation, where appropriate, in a specified manner and within a specified period; e. to order the controller to communicate a personal data breach to the data subject; f. to impose a temporary or definitive limitation including a ban on processing; g. to order the rectification or erasure of personal data or restriction of processing pursuant to Articles 16, 17 and 18 and the notification of such actions to recipients to whom the personal data have been disclosed pursuant to Article 17(2) and Article 19; h. to withdraw a certification or to order the certification body to withdraw a certification issued pursuant to Articles 42 and 43, or to order the certification body not to issue certification if the requirements for the certification are not or are no longer met; i. to impose an administrative fine pursuant to Article 83, in addition to, or instead of measures referred to in this paragraph, depending on the circumstances of each individual case; j. to order the suspension of data flows to a recipient in a third country or to an international organisation. 3. Each supervisory authority shall have all of the following authorisation and advisory powers: a. to advise the controller in accordance with the prior consultation procedure referred to in Article 36; b. to issue, on its own initiative or on request, opinions to the national parliament, the Member State government or, in accordance with Member State law, to other institutions and bodies as well as to the public on any issue related to the protection of personal data; c. to authorise processing referred to in Article 36(5), if the law of the Member State requires such prior authorisation; d. to issue an opinion and approve draft codes of conduct pursuant to Article 40(5); e. to accredit certification bodies pursuant to Article 43; f. to issue certifications and approve criteria of certification in accordance with Article 42(5); g. to adopt standard data protection clauses referred to in Article 28(8) and in point (d) of Article 46(2); h. 
to authorise contractual clauses referred to in point (a) of Article 46(3); i. to authorise administrative arrangements referred to in point (b) of Article 46(3); j. to approve binding corporate rules pursuant to Article 47. 4. The exercise of the powers conferred on the supervisory authority pursuant to this Article shall be subject to appropriate safeguards, including effective judicial remedy and due process, set out in Union and Member State law in accordance with the Charter. 5. Each Member State shall provide by law that its supervisory authority shall have the power to bring infringements of this Regulation to the attention of the judicial authorities and where appropriate, to commence or engage otherwise in legal proceedings, in order to enforce the provisions of this Regulation. 6. Each Member State may provide by law that its supervisory authority shall have additional powers to those referred to in paragraphs 1, 2 and 3. The exercise of those powers shall not impair the effective operation of Chapter VII.

๐Ÿ’ผ Art. 59 Activity reports

Each supervisory authority shall draw up an annual report on its activities, which may include a list of types of infringement notified and types of measures taken in accordance with Article 58(2). Those reports shall be transmitted to the national parliament, the government and other authorities as designated by Member State law. They shall be made available to the public, to the Commission and to the Board.

๐Ÿ’ผ Art. 6 Lawfulness of processing

1. Processing shall be lawful only if and to the extent that at least one of the following applies: a. the data subject has given consent to the processing of his or her personal data for one or more specific purposes; b. processing is necessary for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; c. processing is necessary for compliance with a legal obligation to which the controller is subject; d. processing is necessary in order to protect the vital interests of the data subject or of another natural person; e. processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller; f. processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child. 2. Point (f) of the first subparagraph shall not apply to processing carried out by public authorities in the performance of their tasks. 3. Member States may maintain or introduce more specific provisions to adapt the application of the rules of this Regulation with regard to processing for compliance with points (c) and (e) of paragraph 1 by determining more precisely specific requirements for the processing and other measures to ensure lawful and fair processing including for other specific processing situations as provided for in Chapter IX. 4. The basis for the processing referred to in point (c) and (e) of paragraph 1 shall be laid down by: a. Union law; or b. Member State law to which the controller is subject. 5. The purpose of the processing shall be determined in that legal basis or, as regards the processing referred to in point (e) of paragraph 1, shall be necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller. That legal basis may contain specific provisions to adapt the application of rules of this Regulation, inter alia: the general conditions governing the lawfulness of processing by the controller; the types of data which are subject to the processing; the data subjects concerned; the entities to, and the purposes for which, the personal data may be disclosed; the purpose limitation; storage periods; and processing operations and processing procedures, including measures to ensure lawful and fair processing such as those for other specific processing situations as provided for in Chapter IX. The Union or the Member State law shall meet an objective of public interest and be proportionate to the legitimate aim pursued. 6. Where the processing for a purpose other than that for which the personal data have been collected is not based on the data subject's consent or on a Union or Member State law which constitutes a necessary and proportionate measure in a democratic society to safeguard the objectives referred to in Article 23(1), the controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for which the personal data are initially collected, take into account, inter alia: a. any link between the purposes for which the personal data have been collected and the purposes of the intended further processing; b. 
the context in which the personal data have been collected, in particular regarding the relationship between data subjects and the controller; c. the nature of the personal data, in particular whether special categories of personal data are processed, pursuant to Article 9, or whether personal data related to criminal convictions and offences are processed, pursuant to Article 10; d. the possible consequences of the intended further processing for data subjects; e. the existence of appropriate safeguards, which may include encryption or pseudonymisation.
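Paragraph 1 is effectively a checklist: every processing purpose needs at least one of the bases in points (a) to (f), and point (f) is unavailable to public authorities acting in the performance of their tasks. The following is an illustrative sketch of that check for a records-of-processing tool; the shorthand labels are hypothetical, not defined terms of the Regulation.

```python
# Illustrative only: Art. 6(1) requires at least one lawful basis per processing
# purpose. The basis identifiers below are shorthand labels, nothing more.
LAWFUL_BASES = {
    "consent",               # point (a)
    "contract",              # point (b)
    "legal_obligation",      # point (c)
    "vital_interests",       # point (d)
    "public_task",           # point (e)
    "legitimate_interests",  # point (f) - not available to public authorities
                             #             performing their tasks
}

def has_lawful_basis(documented_bases: set[str], public_authority_task: bool = False) -> bool:
    """True if at least one recognised basis is documented for the purpose."""
    bases = set(documented_bases) & LAWFUL_BASES
    if public_authority_task:
        bases.discard("legitimate_interests")
    return bool(bases)

print(has_lawful_basis({"legitimate_interests"}, public_authority_task=True))  # False
print(has_lawful_basis({"contract"}))                                          # True
```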

๐Ÿ’ผ Art. 60 Cooperation between the lead supervisory authority and the other supervisory authorities concerned

1. The lead supervisory authority shall cooperate with the other supervisory authorities concerned in accordance with this Article in an endeavour to reach consensus. The lead supervisory authority and the supervisory authorities concerned shall exchange all relevant information with each other. 2. The lead supervisory authority may request at any time other supervisory authorities concerned to provide mutual assistance pursuant to Article 61 and may conduct joint operations pursuant to Article 62, in particular for carrying out investigations or for monitoring the implementation of a measure concerning a controller or processor established in another Member State. 3. The lead supervisory authority shall, without delay, communicate the relevant information on the matter to the other supervisory authorities concerned. It shall without delay submit a draft decision to the other supervisory authorities concerned for their opinion and take due account of their views. 4. Where any of the other supervisory authorities concerned within a period of four weeks after having been consulted in accordance with paragraph 3 of this Article, expresses a relevant and reasoned objection to the draft decision, the lead supervisory authority shall, if it does not follow the relevant and reasoned objection or is of the opinion that the objection is not relevant or reasoned, submit the matter to the consistency mechanism referred to in Article 63. 5. Where the lead supervisory authority intends to follow the relevant and reasoned objection made, it shall submit to the other supervisory authorities concerned a revised draft decision for their opinion. That revised draft decision shall be subject to the procedure referred to in paragraph 4 within a period of two weeks. 6. Where none of the other supervisory authorities concerned has objected to the draft decision submitted by the lead supervisory authority within the period referred to in paragraphs 4 and 5, the lead supervisory authority and the supervisory authorities concerned shall be deemed to be in agreement with that draft decision and shall be bound by it. 7. The lead supervisory authority shall adopt and notify the decision to the main establishment or single establishment of the controller or processor, as the case may be and inform the other supervisory authorities concerned and the Board of the decision in question, including a summary of the relevant facts and grounds. The supervisory authority with which a complaint has been lodged shall inform the complainant on the decision. 8. By derogation from paragraph 7, where a complaint is dismissed or rejected, the supervisory authority with which the complaint was lodged shall adopt the decision and notify it to the complainant and shall inform the controller thereof. 9. Where the lead supervisory authority and the supervisory authorities concerned agree to dismiss or reject parts of a complaint and to act on other parts of that complaint, a separate decision shall be adopted for each of those parts of the matter. 
The lead supervisory authority shall adopt the decision for the part concerning actions in relation to the controller, shall notify it to the main establishment or single establishment of the controller or processor on the territory of its Member State and shall inform the complainant thereof, while the supervisory authority of the complainant shall adopt the decision for the part concerning dismissal or rejection of that complaint, and shall notify it to that complainant and shall inform the controller or processor thereof. 10. After being notified of the decision of the lead supervisory authority pursuant to paragraphs 7 and 9, the controller or processor shall take the necessary measures to ensure compliance with the decision as regards processing activities in the context of all its establishments in the Union. The controller or processor shall notify the measures taken for complying with the decision to the lead supervisory authority, which shall inform the other supervisory authorities concerned. 11. Where, in exceptional circumstances, a supervisory authority concerned has reasons to consider that there is an urgent need to act in order to protect the interests of data subjects, the urgency procedure referred to in Article 66 shall apply. 12. The lead supervisory authority and the other supervisory authorities concerned shall supply the information required under this Article to each other by electronic means, using a standardised format.
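The cooperation procedure runs on short, fixed periods: four weeks for the other supervisory authorities concerned to raise a relevant and reasoned objection to a draft decision, and two weeks where a revised draft is circulated. A minimal sketch of that deadline arithmetic is shown below; how the periods are counted in practice (notification dates, extensions, working days) is a legal question the sketch does not answer.

```python
from datetime import date, timedelta

# Illustrative deadline arithmetic for the Art. 60 cooperation procedure:
# four weeks to object to a draft decision (paragraph 4), two weeks to object
# to a revised draft (paragraph 5). Not a method of counting prescribed anywhere.
def art60_objection_deadline(consulted_on: date, revised: bool = False) -> date:
    weeks = 2 if revised else 4
    return consulted_on + timedelta(weeks=weeks)

print(art60_objection_deadline(date(2024, 3, 1)))                # 2024-03-29
print(art60_objection_deadline(date(2024, 3, 1), revised=True))  # 2024-03-15
```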

๐Ÿ’ผ Art. 61 Mutual assistance

1. Supervisory authorities shall provide each other with relevant information and mutual assistance in order to implement and apply this Regulation in a consistent manner, and shall put in place measures for effective cooperation with one another. Mutual assistance shall cover, in particular, information requests and supervisory measures, such as requests to carry out prior authorisations and consultations, inspections and investigations. 2. Each supervisory authority shall take all appropriate measures required to reply to a request of another supervisory authority without undue delay and no later than one month after receiving the request. Such measures may include, in particular, the transmission of relevant information on the conduct of an investigation. 3. Requests for assistance shall contain all the necessary information, including the purpose of and reasons for the request. Information exchanged shall be used only for the purpose for which it was requested. 4. The requested supervisory authority shall not refuse to comply with the request unless: a. it is not competent for the subject-matter of the request or for the measures it is requested to execute; or b. compliance with the request would infringe this Regulation or Union or Member State law to which the supervisory authority receiving the request is subject. 5. The requested supervisory authority shall inform the requesting supervisory authority of the results or, as the case may be, of the progress of the measures taken in order to respond to the request. The requested supervisory authority shall provide reasons for any refusal to comply with a request pursuant to paragraph 4. 6. Requested supervisory authorities shall, as a rule, supply the information requested by other supervisory authorities by electronic means, using a standardised format. 7. Requested supervisory authorities shall not charge a fee for any action taken by them pursuant to a request for mutual assistance. Supervisory authorities may agree on rules to indemnify each other for specific expenditure arising from the provision of mutual assistance in exceptional circumstances. 8. Where a supervisory authority does not provide the information referred to in paragraph 5 of this Article within one month of receiving the request of another supervisory authority, the requesting supervisory authority may adopt a provisional measure on the territory of its Member State in accordance with Article 55(1). In that case, the urgent need to act under Article 66(1) shall be presumed to be met and require an urgent binding decision from the Board pursuant to Article 66(2). 9. The Commission may, by means of implementing acts, specify the format and procedures for mutual assistance referred to in this Article and the arrangements for the exchange of information by electronic means between supervisory authorities, and between supervisory authorities and the Board, in particular the standardised format referred to in paragraph 6 of this Article. Those implementing acts shall be adopted in accordance with the examination procedure referred to in Article 93(2).

๐Ÿ’ผ Art. 62 Joint operations of supervisory authorities

1. The supervisory authorities shall, where appropriate, conduct joint operations including joint investigations and joint enforcement measures in which members or staff of the supervisory authorities of other Member States are involved. 2. Where the controller or processor has establishments in several Member States or where a significant number of data subjects in more than one Member State are likely to be substantially affected by processing operations, a supervisory authority of each of those Member States shall have the right to participate in joint operations. The supervisory authority which is competent pursuant to Article 56(1) or (4) shall invite the supervisory authority of each of those Member States to take part in the joint operations and shall respond without delay to the request of a supervisory authority to participate. 3. A supervisory authority may, in accordance with Member State law, and with the seconding supervisory authority's authorisation, confer powers, including investigative powers on the seconding supervisory authority's members or staff involved in joint operations or, in so far as the law of the Member State of the host supervisory authority permits, allow the seconding supervisory authority's members or staff to exercise their investigative powers in accordance with the law of the Member State of the seconding supervisory authority. Such investigative powers may be exercised only under the guidance and in the presence of members or staff of the host supervisory authority. The seconding supervisory authority's members or staff shall be subject to the Member State law of the host supervisory authority. 4. Where, in accordance with paragraph 1, staff of a seconding supervisory authority operate in another Member State, the Member State of the host supervisory authority shall assume responsibility for their actions, including liability, for any damage caused by them during their operations, in accordance with the law of the Member State in whose territory they are operating. 5. The Member State in whose territory the damage was caused shall make good such damage under the conditions applicable to damage caused by its own staff. The Member State of the seconding supervisory authority whose staff has caused damage to any person in the territory of another Member State shall reimburse that other Member State in full any sums it has paid to the persons entitled on their behalf. 6. Without prejudice to the exercise of its rights vis-à-vis third parties and with the exception of paragraph 5, each Member State shall refrain, in the case provided for in paragraph 1, from requesting reimbursement from another Member State in relation to damage referred to in paragraph 4. 7. Where a joint operation is intended and a supervisory authority does not, within one month, comply with the obligation laid down in the second sentence of paragraph 2 of this Article, the other supervisory authorities may adopt a provisional measure on the territory of its Member State in accordance with Article 55. In that case, the urgent need to act under Article 66(1) shall be presumed to be met and require an opinion or an urgent binding decision from the Board pursuant to Article 66(2).

๐Ÿ’ผ Art. 63 Consistency mechanism

In order to contribute to the consistent application of this Regulation throughout the Union, the supervisory authorities shall cooperate with each other and, where relevant, with the Commission, through the consistency mechanism as set out in this Section.

๐Ÿ’ผ Art. 64 Opinion of the Board

1. The Board shall issue an opinion where a competent supervisory authority intends to adopt any of the measures below. To that end, the competent supervisory authority shall communicate the draft decision to the Board, when it: a. aims to adopt a list of the processing operations subject to the requirement for a data protection impact assessment pursuant to Article 35(4); b. concerns a matter pursuant to Article 40(7) whether a draft code of conduct or an amendment or extension to a code of conduct complies with this Regulation; c. aims to approve the requirements for accreditation of a body pursuant to Article 41(3), of a certification body pursuant to Article 43(3) or the criteria for certification referred to in Article 42(5); d. aims to determine standard data protection clauses referred to in point (d) of Article 46(2) and in Article 28(8); e. aims to authorise contractual clauses referred to in point (a) of Article 46(3); or f. aims to approve binding corporate rules within the meaning of Article 47. 2. Any supervisory authority, the Chair of the Board or the Commission may request that any matter of general application or producing effects in more than one Member State be examined by the Board with a view to obtaining an opinion, in particular where a competent supervisory authority does not comply with the obligations for mutual assistance in accordance with Article 61 or for joint operations in accordance with Article 62. 3. In the cases referred to in paragraphs 1 and 2, the Board shall issue an opinion on the matter submitted to it provided that it has not already issued an opinion on the same matter. That opinion shall be adopted within eight weeks by simple majority of the members of the Board. That period may be extended by a further six weeks, taking into account the complexity of the subject matter. Regarding the draft decision referred to in paragraph 1 circulated to the members of the Board in accordance with paragraph 5, a member which has not objected within a reasonable period indicated by the Chair, shall be deemed to be in agreement with the draft decision. 4. Supervisory authorities and the Commission shall, without undue delay, communicate by electronic means to the Board, using a standardised format any relevant information, including as the case may be a summary of the facts, the draft decision, the grounds which make the enactment of such measure necessary, and the views of other supervisory authorities concerned. 5. The Chair of the Board shall, without undue delay, inform by electronic means: a. the members of the Board and the Commission of any relevant information which has been communicated to it using a standardised format. The secretariat of the Board shall, where necessary, provide translations of relevant information; and b. the supervisory authority referred to, as the case may be, in paragraphs 1 and 2, and the Commission of the opinion and make it public. 6. The competent supervisory authority referred to in paragraph 1 shall not adopt its draft decision referred to in paragraph 1 within the period referred to in paragraph 3. 7. The competent supervisory authority referred to in paragraph 1 shall take utmost account of the opinion of the Board and shall, within two weeks after receiving the opinion, communicate to the Chair of the Board by electronic means whether it will maintain or amend its draft decision and, if any, the amended draft decision, using a standardised format. 8. 
Where the competent supervisory authority referred to in paragraph 1 informs the Chair of the Board within the period referred to in paragraph 7 of this Article that it does not intend to follow the opinion of the Board, in whole or in part, providing the relevant grounds, Article 65(1) shall apply.

๐Ÿ’ผ Art. 65 Dispute resolution by the Board

1. In order to ensure the correct and consistent application of this Regulation in individual cases, the Board shall adopt a binding decision in the following cases: a. where, in a case referred to in Article 60(4), a supervisory authority concerned has raised a relevant and reasoned objection to a draft decision of the lead supervisory authority and the lead supervisory authority has not followed the objection or has rejected such an objection as being not relevant or reasoned. The binding decision shall concern all the matters which are the subject of the relevant and reasoned objection, in particular whether there is an infringement of this Regulation; b. where there are conflicting views on which of the supervisory authorities concerned is competent for the main establishment; c. where a competent supervisory authority does not request the opinion of the Board in the cases referred to in Article 64(1), or does not follow the opinion of the Board issued under Article 64. In that case, any supervisory authority concerned or the Commission may communicate the matter to the Board. 2. The decision referred to in paragraph 1 shall be adopted within one month from the referral of the subject-matter by a two-thirds majority of the members of the Board. That period may be extended by a further month on account of the complexity of the subject-matter. The decision referred to in paragraph 1 shall be reasoned and addressed to the lead supervisory authority and all the supervisory authorities concerned and binding on them. 3. Where the Board has been unable to adopt a decision within the periods referred to in paragraph 2, it shall adopt its decision within two weeks following the expiration of the second month referred to in paragraph 2 by a simple majority of the members of the Board. Where the members of the Board are split, the decision shall be adopted by the vote of its Chair. 4. The supervisory authorities concerned shall not adopt a decision on the subject matter submitted to the Board under paragraph 1 during the periods referred to in paragraphs 2 and 3. 5. The Chair of the Board shall notify, without undue delay, the decision referred to in paragraph 1 to the supervisory authorities concerned. It shall inform the Commission thereof. The decision shall be published on the website of the Board without delay after the supervisory authority has notified the final decision referred to in paragraph 6. 6. The lead supervisory authority or, as the case may be, the supervisory authority with which the complaint has been lodged shall adopt its final decision on the basis of the decision referred to in paragraph 1 of this Article, without undue delay and at the latest by one month after the Board has notified its decision. The lead supervisory authority or, as the case may be, the supervisory authority with which the complaint has been lodged, shall inform the Board of the date when its final decision is notified respectively to the controller or the processor and to the data subject. The final decision of the supervisory authorities concerned shall be adopted under the terms of Article 60(7), (8) and (9). The final decision shall refer to the decision referred to in paragraph 1 of this Article and shall specify that the decision referred to in that paragraph will be published on the website of the Board in accordance with paragraph 5 of this Article. The final decision shall attach the decision referred to in paragraph 1 of this Article.

๐Ÿ’ผ Art. 66 Urgency procedure

1. In exceptional circumstances, where a supervisory authority concerned considers that there is an urgent need to act in order to protect the rights and freedoms of data subjects, it may, by way of derogation from the consistency mechanism referred to in Articles 63, 64 and 65 or the procedure referred to in Article 60, immediately adopt provisional measures intended to produce legal effects on its own territory with a specified period of validity which shall not exceed three months. The supervisory authority shall, without delay, communicate those measures and the reasons for adopting them to the other supervisory authorities concerned, to the Board and to the Commission. 2. Where a supervisory authority has taken a measure pursuant to paragraph 1 and considers that final measures need urgently be adopted, it may request an urgent opinion or an urgent binding decision from the Board, giving reasons for requesting such opinion or decision. 3. Any supervisory authority may request an urgent opinion or an urgent binding decision, as the case may be, from the Board where a competent supervisory authority has not taken an appropriate measure in a situation where there is an urgent need to act, in order to protect the rights and freedoms of data subjects, giving reasons for requesting such opinion or decision, including for the urgent need to act. 4. By derogation from Article 64(3) and Article 65(2), an urgent opinion or an urgent binding decision referred to in paragraphs 2 and 3 of this Article shall be adopted within two weeks by simple majority of the members of the Board.

๐Ÿ’ผ Art. 67 Exchange of information

The Commission may adopt implementing acts of general scope in order to specify the arrangements for the exchange of information by electronic means between supervisory authorities, and between supervisory authorities and the Board, in particular the standardised format referred to in Article 64.

๐Ÿ’ผ Art. 68 European Data Protection Board

1. The European Data Protection Board (the 'Board') is hereby established as a body of the Union and shall have legal personality. 2. The Board shall be represented by its Chair. 3. The Board shall be composed of the head of one supervisory authority of each Member State and of the European Data Protection Supervisor, or their respective representatives. 4. Where in a Member State more than one supervisory authority is responsible for monitoring the application of the provisions pursuant to this Regulation, a joint representative shall be appointed in accordance with that Member State's law. 5. The Commission shall have the right to participate in the activities and meetings of the Board without voting right. The Commission shall designate a representative. The Chair of the Board shall communicate to the Commission the activities of the Board. 6. In the cases referred to in Article 65, the European Data Protection Supervisor shall have voting rights only on decisions which concern principles and rules applicable to the Union institutions, bodies, offices and agencies which correspond in substance to those of this Regulation.

๐Ÿ’ผ Art. 69 Independence

1. The Board shall act independently when performing its tasks or exercising its powers pursuant to Articles 70 and 71. 2. Without prejudice to requests by the Commission referred to in Article 70(1) and (2), the Board shall, in the performance of its tasks or the exercise of its powers, neither seek nor take instructions from anybody.

๐Ÿ’ผ Art. 7 Conditions for consent

1. Where processing is based on consent, the controller shall be able to demonstrate that the data subject has consented to processing of his or her personal data. 2. If the data subject's consent is given in the context of a written declaration which also concerns other matters, the request for consent shall be presented in a manner which is clearly distinguishable from the other matters, in an intelligible and easily accessible form, using clear and plain language. Any part of such a declaration which constitutes an infringement of this Regulation shall not be binding. 3. The data subject shall have the right to withdraw his or her consent at any time. The withdrawal of consent shall not affect the lawfulness of processing based on consent before its withdrawal. Prior to giving consent, the data subject shall be informed thereof. It shall be as easy to withdraw as to give consent. 4. When assessing whether consent is freely given, utmost account shall be taken of whether, inter alia, the performance of a contract, including the provision of a service, is conditional on consent to the processing of personal data that is not necessary for the performance of that contract.

๐Ÿ’ผ Art. 70 Tasks of the Board

1. The Board shall ensure the consistent application of this Regulation. To that end, the Board shall, on its own initiative or, where relevant, at the request of the Commission, in particular: a. monitor and ensure the correct application of this Regulation in the cases provided for in Articles 64 and 65 without prejudice to the tasks of national supervisory authorities; b. advise the Commission on any issue related to the protection of personal data in the Union, including on any proposed amendment of this Regulation; c. advise the Commission on the format and procedures for the exchange of information between controllers, processors and supervisory authorities for binding corporate rules; d. issue guidelines, recommendations, and best practices on procedures for erasing links, copies or replications of personal data from publicly available communication services as referred to in Article 17(2); e. examine, on its own initiative, on request of one of its members or on request of the Commission, any question covering the application of this Regulation and issue guidelines, recommendations and best practices in order to encourage consistent application of this Regulation; f. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph for further specifying the criteria and conditions for decisions based on profiling pursuant to Article 22(2); g. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph for establishing the personal data breaches and determining the undue delay referred to in Article 33(1) and (2) and for the particular circumstances in which a controller or a processor is required to notify the personal data breach; h. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph as to the circumstances in which a personal data breach is likely to result in a high risk to the rights and freedoms of the natural persons referred to in Article 34(1). i. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph for the purpose of further specifying the criteria and requirements for personal data transfers based on binding corporate rules adhered to by controllers and binding corporate rules adhered to by processors and on further necessary requirements to ensure the protection of personal data of the data subjects concerned referred to in Article 47; j. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph for the purpose of further specifying the criteria and requirements for the personal data transfers on the basis of Article 49(1); k. draw up guidelines for supervisory authorities concerning the application of measures referred to in Article 58(1), (2) and (3) and the setting of administrative fines pursuant to Article 83; l. review the practical application of the guidelines, recommendations and best practices; m. issue guidelines, recommendations and best practices in accordance with point (e) of this paragraph for establishing common procedures for reporting by natural persons of infringements of this Regulation pursuant to Article 54(2); n. encourage the drawing-up of codes of conduct and the establishment of data protection certification mechanisms and data protection seals and marks pursuant to Articles 40 and 42; o. 
approve the criteria of certification pursuant to Article 42(5) and maintain a public register of certification mechanisms and data protection seals and marks pursuant to Article 42(8) and of the certified controllers or processors established in third countries pursuant to Article 42(7); p. approve the requirements referred to in Article 43(3) with a view to the accreditation of certification bodies referred to in Article 43; q. provide the Commission with an opinion on the certification requirements referred to in Article 43(8); r. provide the Commission with an opinion on the icons referred to in Article 12(7); s. provide the Commission with an opinion for the assessment of the adequacy of the level of protection in a third country or international organisation, including for the assessment whether a third country, a territory or one or more specified sectors within that third country, or an international organisation no longer ensures an adequate level of protection. To that end, the Commission shall provide the Board with all necessary documentation, including correspondence with the government of the third country, with regard to that third country, territory or specified sector, or with the international organisation. t. issue opinions on draft decisions of supervisory authorities pursuant to the consistency mechanism referred to in Article 64(1), on matters submitted pursuant to Article 64(2) and to issue binding decisions pursuant to Article 65, including in cases referred to in Article 66; u. promote the cooperation and the effective bilateral and multilateral exchange of information and best practices between the supervisory authorities; v. promote common training programmes and facilitate personnel exchanges between the supervisory authorities and, where appropriate, with the supervisory authorities of third countries or with international organisations; w. promote the exchange of knowledge and documentation on data protection legislation and practice with data protection supervisory authorities worldwide. x. issue opinions on codes of conduct drawn up at Union level pursuant to Article 40(9); and y. maintain a publicly accessible electronic register of decisions taken by supervisory authorities and courts on issues handled in the consistency mechanism. 2. Where the Commission requests advice from the Board, it may indicate a time limit, taking into account the urgency of the matter. 3. The Board shall forward its opinions, guidelines, recommendations, and best practices to the Commission and to the committee referred to in Article 93 and make them public. 4. The Board shall, where appropriate, consult interested parties and give them the opportunity to comment within a reasonable period. The Board shall, without prejudice to Article 76, make the results of the consultation procedure publicly available.

๐Ÿ’ผ Art. 71 Reports

1. The Board shall draw up an annual report regarding the protection of natural persons with regard to processing in the Union and, where relevant, in third countries and international organisations. The report shall be made public and be transmitted to the European Parliament, to the Council and to the Commission. 2. The annual report shall include a review of the practical application of the guidelines, recommendations and best practices referred to in point (l) of Article 70(1) as well as of the binding decisions referred to in Article 65.

๐Ÿ’ผ Art. 72 Procedure

1. The Board shall take decisions by a simple majority of its members, unless otherwise provided for in this Regulation. 2. The Board shall adopt its own rules of procedure by a two-thirds majority of its members and organise its own operational arrangements.

๐Ÿ’ผ Art. 73 Chair

1. The Board shall elect a chair and two deputy chairs from amongst its members by simple majority. 2. The term of office of the Chair and of the deputy chairs shall be five years and be renewable once.

๐Ÿ’ผ Art. 74 Tasks of the Chair

1. The Chair shall have the following tasks: a. to convene the meetings of the Board and prepare its agenda; b. to notify decisions adopted by the Board pursuant to Article 65 to the lead supervisory authority and the supervisory authorities concerned; c. to ensure the timely performance of the tasks of the Board, in particular in relation to the consistency mechanism referred to in Article 63. 2. The Board shall lay down the allocation of tasks between the Chair and the deputy chairs in its rules of procedure.

๐Ÿ’ผ Art. 75 Secretariat

1. The Board shall have a secretariat, which shall be provided by the European Data Protection Supervisor. 2. The secretariat shall perform its tasks exclusively under the instructions of the Chair of the Board. 3. The staff of the European Data Protection Supervisor involved in carrying out the tasks conferred on the Board by this Regulation shall be subject to separate reporting lines from the staff involved in carrying out tasks conferred on the European Data Protection Supervisor. 4. Where appropriate, the Board and the European Data Protection Supervisor shall establish and publish a Memorandum of Understanding implementing this Article, determining the terms of their cooperation, and applicable to the staff of the European Data Protection Supervisor involved in carrying out the tasks conferred on the Board by this Regulation. 5. The secretariat shall provide analytical, administrative and logistical support to the Board. 6. The secretariat shall be responsible in particular for: a. the day-to-day business of the Board; b. communication between the members of the Board, its Chair and the Commission; c. communication with other institutions and the public; d. the use of electronic means for the internal and external communication; e. the translation of relevant information; f. the preparation and follow-up of the meetings of the Board; g. the preparation, drafting and publication of opinions, decisions on the settlement of disputes between supervisory authorities and other texts adopted by the Board.

๐Ÿ’ผ Art. 76 Confidentiality

1. The discussions of the Board shall be confidential where the Board deems it necessary, as provided for in its rules of procedure. 2. Access to documents submitted to members of the Board, experts and representatives of third parties shall be governed by Regulation (EC) No 1049/2001 of the European Parliament and of the Council.

๐Ÿ’ผ Art. 77 Right to lodge a complaint with a supervisory authority

1. Without prejudice to any other administrative or judicial remedy, every data subject shall have the right to lodge a complaint with a supervisory authority, in particular in the Member State of his or her habitual residence, place of work or place of the alleged infringement if the data subject considers that the processing of personal data relating to him or her infringes this Regulation. 2. The supervisory authority with which the complaint has been lodged shall inform the complainant on the progress and the outcome of the complaint including the possibility of a judicial remedy pursuant to Article 78.

๐Ÿ’ผ Art. 78 Right to an effective judicial remedy against a supervisory authority

1. Without prejudice to any other administrative or non-judicial remedy, each natural or legal person shall have the right to an effective judicial remedy against a legally binding decision of a supervisory authority concerning them. 2. Without prejudice to any other administrative or non-judicial remedy, each data subject shall have the right to an effective judicial remedy where the supervisory authority which is competent pursuant to Articles 55 and 56 does not handle a complaint or does not inform the data subject within three months on the progress or outcome of the complaint lodged pursuant to Article 77. 3. Proceedings against a supervisory authority shall be brought before the courts of the Member State where the supervisory authority is established. 4. Where proceedings are brought against a decision of a supervisory authority which was preceded by an opinion or a decision of the Board in the consistency mechanism, the supervisory authority shall forward that opinion or decision to the court.

๐Ÿ’ผ Art. 79 Right to an effective judicial remedy against a controller or processor

1. Without prejudice to any available administrative or non-judicial remedy, including the right to lodge a complaint with a supervisory authority pursuant to Article 77, each data subject shall have the right to an effective judicial remedy where he or she considers that his or her rights under this Regulation have been infringed as a result of the processing of his or her personal data in non-compliance with this Regulation. 2. Proceedings against a controller or a processor shall be brought before the courts of the Member State where the controller or processor has an establishment. Alternatively, such proceedings may be brought before the courts of the Member State where the data subject has his or her habitual residence, unless the controller or processor is a public authority of a Member State acting in the exercise of its public powers.

๐Ÿ’ผ Art. 8 Conditions applicable to child's consent in relation to information society services

1. Where point (a) of Article 6(1) applies, in relation to the offer of information society services directly to a child, the processing of the personal data of a child shall be lawful where the child is at least 16 years old. Where the child is below the age of 16 years, such processing shall be lawful only if and to the extent that consent is given or authorised by the holder of parental responsibility over the child. 2. Member States may provide by law for a lower age for those purposes provided that such lower age is not below 13 years. 3. The controller shall make reasonable efforts to verify in such cases that consent is given or authorised by the holder of parental responsibility over the child, taking into consideration available technology. 4. Paragraph 1 shall not affect the general contract law of Member States such as the rules on the validity, formation or effect of a contract in relation to a child.
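
The age test in paragraphs 1 and 2 reduces to a small amount of arithmetic. The sketch below is illustrative only and assumes a hypothetical helper, `child_consent_is_lawful`, whose name and parameters are not taken from any official source; it simply encodes the 16-year default and the 13-year floor below which Member States may not go.

```python
# Illustrative sketch of the Art. 8 rule: consent by the child suffices at or above
# the applicable age limit (16 by default, which Member State law may lower, but not
# below 13); below that limit, authorisation by the holder of parental responsibility
# is needed. Function and parameter names are hypothetical.

def child_consent_is_lawful(age: int,
                            member_state_age_limit: int = 16,
                            parental_authorisation: bool = False) -> bool:
    """Return True if processing based on consent may be lawful under Art. 8(1)-(2)."""
    if not 13 <= member_state_age_limit <= 16:
        raise ValueError("a Member State limit must lie between 13 and 16 years")
    if age >= member_state_age_limit:
        return True  # the child can consent on their own
    return parental_authorisation  # otherwise parental consent or authorisation is required

# Example: a 14-year-old in a Member State that lowered the limit to 13
print(child_consent_is_lawful(age=14, member_state_age_limit=13))  # True
```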

๐Ÿ’ผ Art. 80 Representation of data subjects

1. The data subject shall have the right to mandate a not-for-profit body, organisation or association which has been properly constituted in accordance with the law of a Member State, has statutory objectives which are in the public interest, and is active in the field of the protection of data subjects' rights and freedoms with regard to the protection of their personal data to lodge the complaint on his or her behalf, to exercise the rights referred to in Articles 77, 78 and 79 on his or her behalf, and to exercise the right to receive compensation referred to in Article 82 on his or her behalf where provided for by Member State law. 2. Member States may provide that any body, organisation or association referred to in paragraph 1 of this Article, independently of a data subject's mandate, has the right to lodge, in that Member State, a complaint with the supervisory authority which is competent pursuant to Article 77 and to exercise the rights referred to in Articles 78 and 79 if it considers that the rights of a data subject under this Regulation have been infringed as a result of the processing.

๐Ÿ’ผ Art. 81 Suspension of proceedings

1. Where a competent court of a Member State has information on proceedings, concerning the same subject matter as regards processing by the same controller or processor, that are pending in a court in another Member State, it shall contact that court in the other Member State to confirm the existence of such proceedings. 2. Where proceedings concerning the same subject matter as regards processing of the same controller or processor are pending in a court in another Member State, any competent court other than the court first seized may suspend its proceedings. 3. Where those proceedings are pending at first instance, any court other than the court first seized may also, on the application of one of the parties, decline jurisdiction if the court first seized has jurisdiction over the actions in question and its law permits the consolidation thereof.

๐Ÿ’ผ Art. 82 Right to compensation and liability

1. Any person who has suffered material or non-material damage as a result of an infringement of this Regulation shall have the right to receive compensation from the controller or processor for the damage suffered. 2. Any controller involved in processing shall be liable for the damage caused by processing which infringes this Regulation. A processor shall be liable for the damage caused by processing only where it has not complied with obligations of this Regulation specifically directed to processors or where it has acted outside or contrary to lawful instructions of the controller. 3. A controller or processor shall be exempt from liability under paragraph 2 if it proves that it is not in any way responsible for the event giving rise to the damage. 4. Where more than one controller or processor, or both a controller and a processor, are involved in the same processing and where they are, under paragraphs 2 and 3, responsible for any damage caused by processing, each controller or processor shall be held liable for the entire damage in order to ensure effective compensation of the data subject. 5. Where a controller or processor has, in accordance with paragraph 4, paid full compensation for the damage suffered, that controller or processor shall be entitled to claim back from the other controllers or processors involved in the same processing that part of the compensation corresponding to their part of responsibility for the damage, in accordance with the conditions set out in paragraph 2. 6. Court proceedings for exercising the right to receive compensation shall be brought before the courts competent under the law of the Member State referred to in Article 79(2).
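
Paragraphs 4 and 5 combine joint liability towards the data subject with a later recovery step between the parties. The snippet below is a minimal numerical sketch under assumed example figures (the EUR 90 000 damage and the 70/30 split are invented for illustration); the Regulation itself prescribes no apportionment formula.

```python
# Numerical sketch of Art. 82(4)-(5) with invented example figures: every party
# involved is liable to the data subject for the entire damage, and a party that
# has paid full compensation may claim back the others' share of responsibility.

damage_eur = 90_000
responsibility_shares = {"controller_A": 0.7, "processor_B": 0.3}  # assumed split

# Paragraph 4: each party can be pursued for the full amount.
liable_for = {party: damage_eur for party in responsibility_shares}

# Paragraph 5: if controller_A pays in full, it may claim back processor_B's part.
recoverable_from_B = damage_eur * responsibility_shares["processor_B"]

print(liable_for)          # {'controller_A': 90000, 'processor_B': 90000}
print(recoverable_from_B)  # 27000.0
```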

๐Ÿ’ผ Art. 83 General conditions for imposing administrative fines

1. Each supervisory authority shall ensure that the imposition of administrative fines pursuant to this Article in respect of infringements of this Regulation referred to in paragraphs 4, 5 and 6 shall in each individual case be effective, proportionate and dissuasive. 2. Administrative fines shall, depending on the circumstances of each individual case, be imposed in addition to, or instead of, measures referred to in points (a) to (h) and (j) of Article 58(2). When deciding whether to impose an administrative fine and deciding on the amount of the administrative fine in each individual case due regard shall be given to the following: a. the nature, gravity and duration of the infringement taking into account the nature, scope or purpose of the processing concerned as well as the number of data subjects affected and the level of damage suffered by them; b. the intentional or negligent character of the infringement; c. any action taken by the controller or processor to mitigate the damage suffered by data subjects; d. the degree of responsibility of the controller or processor taking into account technical and organisational measures implemented by them pursuant to Articles 25 and 32; e. any relevant previous infringements by the controller or processor; f. the degree of cooperation with the supervisory authority, in order to remedy the infringement and mitigate the possible adverse effects of the infringement; g. the categories of personal data affected by the infringement; h. the manner in which the infringement became known to the supervisory authority, in particular whether, and if so to what extent, the controller or processor notified the infringement; i. where measures referred to in Article 58(2) have previously been ordered against the controller or processor concerned with regard to the same subject-matter, compliance with those measures; j. adherence to approved codes of conduct pursuant to Article 40 or approved certification mechanisms pursuant to Article 42; and k. any other aggravating or mitigating factor applicable to the circumstances of the case, such as financial benefits gained, or losses avoided, directly or indirectly, from the infringement. 3. If a controller or processor intentionally or negligently, for the same or linked processing operations, infringes several provisions of this Regulation, the total amount of the administrative fine shall not exceed the amount specified for the gravest infringement. 4. Infringements of the following provisions shall, in accordance with paragraph 2, be subject to administrative fines up to 10 000 000 EUR, or in the case of an undertaking, up to 2 % of the total worldwide annual turnover of the preceding financial year, whichever is higher: a. the obligations of the controller and the processor pursuant to Articles 8, 11, 25 to 39 and 42 and 43; b. the obligations of the certification body pursuant to Articles 42 and 43; c. the obligations of the monitoring body pursuant to Article 41(4). 5. Infringements of the following provisions shall, in accordance with paragraph 2, be subject to administrative fines up to 20 000 000 EUR, or in the case of an undertaking, up to 4 % of the total worldwide annual turnover of the preceding financial year, whichever is higher: a. the basic principles for processing, including conditions for consent, pursuant to Articles 5, 6, 7 and 9; b. the data subjects' rights pursuant to Articles 12 to 22; c.
the transfers of personal data to a recipient in a third country or an international organisation pursuant to Articles 44 to 49; d. any obligations pursuant to Member State law adopted under Chapter IX; e. non-compliance with an order or a temporary or definitive limitation on processing or the suspension of data flows by the supervisory authority pursuant to Article 58(2) or failure to provide access in violation of Article 58(1). 6. Non-compliance with an order by the supervisory authority as referred to in Article 58(2) shall, in accordance with paragraph 2 of this Article, be subject to administrative fines up to 20 000 000 EUR, or in the case of an undertaking, up to 4 % of the total worldwide annual turnover of the preceding financial year, whichever is higher. 7. Without prejudice to the corrective powers of supervisory authorities pursuant to Article 58(2), each Member State may lay down the rules on whether and to what extent administrative fines may be imposed on public authorities and bodies established in that Member State. 8. The exercise by the supervisory authority of its powers under this Article shall be subject to appropriate procedural safeguards in accordance with Union and Member State law, including effective judicial remedy and due process. 9. Where the legal system of the Member State does not provide for administrative fines, this Article may be applied in such a manner that the fine is initiated by the competent supervisory authority and imposed by competent national courts, while ensuring that those legal remedies are effective and have an equivalent effect to the administrative fines imposed by supervisory authorities. In any event, the fines imposed shall be effective, proportionate and dissuasive. Those Member States shall notify to the Commission the provisions of their laws which they adopt pursuant to this paragraph by 25 May 2018 and, without delay, any subsequent amendment law or amendment affecting them.
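
The fine ceilings in paragraphs 4 to 6 follow a simple "whichever is higher" rule for undertakings. The sketch below only illustrates that arithmetic; the helper name and inputs are assumptions, not part of any official calculation tool, and the actual fine within the cap still depends on the paragraph 2 criteria.

```python
# Arithmetic of the "whichever is higher" fine ceilings for undertakings:
# EUR 10 000 000 / 2 % of worldwide annual turnover under Art. 83(4), and
# EUR 20 000 000 / 4 % under Art. 83(5) and (6). The helper is hypothetical.

def administrative_fine_cap(worldwide_annual_turnover_eur: float,
                            severe_tier: bool) -> float:
    """Maximum fine for an undertaking: Art. 83(4) if severe_tier is False, 83(5)/(6) if True."""
    fixed_cap, turnover_share = (20_000_000, 0.04) if severe_tier else (10_000_000, 0.02)
    return max(fixed_cap, turnover_share * worldwide_annual_turnover_eur)

# Example: an undertaking with EUR 2 billion turnover infringing basic principles (Art. 83(5))
print(administrative_fine_cap(2_000_000_000, severe_tier=True))  # 80000000.0, the 4 % cap applies
```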

๐Ÿ’ผ Art. 84 Penalties

1. Member States shall lay down the rules on other penalties applicable to infringements of this Regulation in particular for infringements which are not subject to administrative fines pursuant to Article 83, and shall take all measures necessary to ensure that they are implemented. Such penalties shall be effective, proportionate and dissuasive. 2. Each Member State shall notify to the Commission the provisions of its law which it adopts pursuant to paragraph 1, by 25 May 2018 and, without delay, any subsequent amendment affecting them.

๐Ÿ’ผ Art. 85 Processing and freedom of expression and information

1. Member States shall by law reconcile the right to the protection of personal data pursuant to this Regulation with the right to freedom of expression and information, including processing for journalistic purposes and the purposes of academic, artistic or literary expression. 2. For processing carried out for journalistic purposes or the purpose of academic, artistic or literary expression, Member States shall provide for exemptions or derogations from Chapter II (principles), Chapter III (rights of the data subject), Chapter IV (controller and processor), Chapter V (transfer of personal data to third countries or international organisations), Chapter VI (independent supervisory authorities), Chapter VII (cooperation and consistency) and Chapter IX (specific data processing situations) if they are necessary to reconcile the right to the protection of personal data with the freedom of expression and information. 3. Each Member State shall notify to the Commission the provisions of its law which it has adopted pursuant to paragraph 2 and, without delay, any subsequent amendment law or amendment affecting them.

๐Ÿ’ผ Art. 86 Processing and public access to official documents

Personal data in official documents held by a public authority or a public body or a private body for the performance of a task carried out in the public interest may be disclosed by the authority or body in accordance with Union or Member State law to which the public authority or body is subject in order to reconcile public access to official documents with the right to the protection of personal data pursuant to this Regulation.

๐Ÿ’ผ Art. 87 Processing of the national identification number

Member States may further determine the specific conditions for the processing of a national identification number or any other identifier of general application. In that case the national identification number or any other identifier of general application shall be used only under appropriate safeguards for the rights and freedoms of the data subject pursuant to this Regulation.

๐Ÿ’ผ Art. 88 Processing in the context of employment

1. Member States may, by law or by collective agreements, provide for more specific rules to ensure the protection of the rights and freedoms in respect of the processing of employees' personal data in the employment context, in particular for the purposes of the recruitment, the performance of the contract of employment, including discharge of obligations laid down by law or by collective agreements, management, planning and organisation of work, equality and diversity in the workplace, health and safety at work, protection of employer's or customer's property and for the purposes of the exercise and enjoyment, on an individual or collective basis, of rights and benefits related to employment, and for the purpose of the termination of the employment relationship. 2. Those rules shall include suitable and specific measures to safeguard the data subject's human dignity, legitimate interests and fundamental rights, with particular regard to the transparency of processing, the transfer of personal data within a group of undertakings, or a group of enterprises engaged in a joint economic activity and monitoring systems at the work place. 3. Each Member State shall notify to the Commission those provisions of its law which it adopts pursuant to paragraph 1, by 25 May 2018 and, without delay, any subsequent amendment affecting them.

๐Ÿ’ผ Art. 89 Safeguards and derogations relating to processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes

1. Processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, shall be subject to appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject. Those safeguards shall ensure that technical and organisational measures are in place in particular in order to ensure respect for the principle of data minimisation. Those measures may include pseudonymisation provided that those purposes can be fulfilled in that manner. Where those purposes can be fulfilled by further processing which does not permit or no longer permits the identification of data subjects, those purposes shall be fulfilled in that manner. 2. Where personal data are processed for scientific or historical research purposes or statistical purposes, Union or Member State law may provide for derogations from the rights referred to in Articles 15, 16, 18 and 21 subject to the conditions and safeguards referred to in paragraph 1 of this Article in so far as such rights are likely to render impossible or seriously impair the achievement of the specific purposes, and such derogations are necessary for the fulfilment of those purposes. 3. Where personal data are processed for archiving purposes in the public interest, Union or Member State law may provide for derogations from the rights referred to in Articles 15, 16, 18, 19, 20 and 21 subject to the conditions and safeguards referred to in paragraph 1 of this Article in so far as such rights are likely to render impossible or seriously impair the achievement of the specific purposes, and such derogations are necessary for the fulfilment of those purposes. 4. Where processing referred to in paragraphs 2 and 3 serves at the same time another purpose, the derogations shall apply only to processing for the purposes referred to in those paragraphs.
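
Paragraph 1 names pseudonymisation as one possible technical measure. The sketch below shows one common way such a safeguard can be implemented, replacing a direct identifier with a keyed hash; the key handling, field names, and choice of HMAC-SHA-256 are assumptions for illustration, not requirements of the Article.

```python
# One possible pseudonymisation safeguard of the kind Art. 89(1) mentions: replace a
# direct identifier with a stable keyed hash so research or statistical processing can
# proceed without the raw identifier. Key handling and field names are assumptions.

import hashlib
import hmac

SECRET_KEY = b"keep-this-key-separate-from-the-research-data"  # assumed to be managed out of band

def pseudonymise(identifier: str) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA-256."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"national_id": "AB123456", "year_of_birth": 1984, "diagnosis_code": "E11"}
research_record = {**record, "national_id": pseudonymise(record["national_id"])}
print(research_record)  # the direct identifier is replaced by its pseudonym
```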

๐Ÿ’ผ Art. 9 Processing of special categories of personal data

1. Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited. 2. Paragraph 1 shall not apply if one of the following applies: a. the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject; b. processing is necessary for the purposes of carrying out the obligations and exercising specific rights of the controller or of the data subject in the field of employment and social security and social protection law in so far as it is authorised by Union or Member State law or a collective agreement pursuant to Member State law providing for appropriate safeguards for the fundamental rights and the interests of the data subject; c. processing is necessary to protect the vital interests of the data subject or of another natural person where the data subject is physically or legally incapable of giving consent; d. processing is carried out in the course of its legitimate activities with appropriate safeguards by a foundation, association or any other not-for-profit body with a political, philosophical, religious or trade union aim and on condition that the processing relates solely to the members or to former members of the body or to persons who have regular contact with it in connection with its purposes and that the personal data are not disclosed outside that body without the consent of the data subjects; e. processing relates to personal data which are manifestly made public by the data subject; f. processing is necessary for the establishment, exercise or defence of legal claims or whenever courts are acting in their judicial capacity; g. processing is necessary for reasons of substantial public interest, on the basis of Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject; h. processing is necessary for the purposes of preventive or occupational medicine, for the assessment of the working capacity of the employee, medical diagnosis, the provision of health or social care or treatment or the management of health or social care systems and services on the basis of Union or Member State law or pursuant to contract with a health professional and subject to the conditions and safeguards referred to in paragraph 3; i. processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices, on the basis of Union or Member State law which provides for suitable and specific measures to safeguard the rights and freedoms of the data subject, in particular professional secrecy; j.
processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject. 3. Personal data referred to in paragraph 1 may be processed for the purposes referred to in point (h) of paragraph 2 when those data are processed by or under the responsibility of a professional subject to the obligation of professional secrecy under Union or Member State law or rules established by national competent bodies or by another person also subject to an obligation of secrecy under Union or Member State law or rules established by national competent bodies. 4. Member States may maintain or introduce further conditions, including limitations, with regard to the processing of genetic data, biometric data or data concerning health.

๐Ÿ’ผ Art. 90 Obligations of secrecy

1. Member States may adopt specific rules to set out the powers of the supervisory authorities laid down in points (e) and (f) of Article 58(1) in relation to controllers or processors that are subject, under Union or Member State law or rules established by national competent bodies, to an obligation of professional secrecy or other equivalent obligations of secrecy where this is necessary and proportionate to reconcile the right of the protection of personal data with the obligation of secrecy. Those rules shall apply only with regard to personal data which the controller or processor has received as a result of or has obtained in an activity covered by that obligation of secrecy. 2. Each Member State shall notify to the Commission the rules adopted pursuant to paragraph 1, by 25 May 2018 and, without delay, any subsequent amendment affecting them.

๐Ÿ’ผ Art. 91 Existing data protection rules of churches and religious association

1. Where in a Member State, churches and religious associations or communities apply, at the time of entry into force of this Regulation, comprehensive rules relating to the protection of natural persons with regard to processing, such rules may continue to apply, provided that they are brought into line with this Regulation. 2. Churches and religious associations which apply comprehensive rules in accordance with paragraph 1 of this Article shall be subject to the supervision of an independent supervisory authority, which may be specific, provided that it fulfils the conditions laid down in Chapter VI of this Regulation.

๐Ÿ’ผ Art. 92 Exercise of the delegation

1. The power to adopt delegated acts is conferred on the Commission subject to the conditions laid down in this Article. 2. The delegation of power referred to in Article 12(8) and Article 43(8) shall be conferred on the Commission for an indeterminate period of time from 24 May 2016. 3. The delegation of power referred to in Article 12(8) and Article 43(8) may be revoked at any time by the European Parliament or by the Council. A decision of revocation shall put an end to the delegation of power specified in that decision. It shall take effect the day following that of its publication in the Official Journal of the European Union or at a later date specified therein. It shall not affect the validity of any delegated acts already in force. 4. As soon as it adopts a delegated act, the Commission shall notify it simultaneously to the European Parliament and to the Council. 5. A delegated act adopted pursuant to Article 12(8) and Article 43(8) shall enter into force only if no objection has been expressed by either the European Parliament or the Council within a period of three months of notification of that act to the European Parliament and the Council or if, before the expiry of that period, the European Parliament and the Council have both informed the Commission that they will not object. That period shall be extended by three months at the initiative of the European Parliament or of the Council.

๐Ÿ’ผ Art. 93 Committee procedure

1. The Commission shall be assisted by a committee. That committee shall be a committee within the meaning of Regulation (EU) No 182/2011. 2. Where reference is made to this paragraph, Article 5 of Regulation (EU) No 182/2011 shall apply. 3. Where reference is made to this paragraph, Article 8 of Regulation (EU) No 182/2011, in conjunction with Article 5 thereof, shall apply.

๐Ÿ’ผ Art. 94 Repeal of Directive 95/46/EC

1. Directive 95/46/EC is repealed with effect from 25 May 2018. 2. References to the repealed Directive shall be construed as references to this Regulation. References to the Working Party on the Protection of Individuals with regard to the Processing of Personal Data established by Article 29 of Directive 95/46/EC shall be construed as references to the European Data Protection Board established by this Regulation.

๐Ÿ’ผ Art. 95 Relationship with Directive 2002/58/EC

This Regulation shall not impose additional obligations on natural or legal persons in relation to processing in connection with the provision of publicly available electronic communications services in public communication networks in the Union in relation to matters for which they are subject to specific obligations with the same objective set out in Directive 2002/58/EC.

๐Ÿ’ผ Art. 96 Relationship with previously concluded Agreements

International agreements involving the transfer of personal data to third countries or international organisations which were concluded by Member States prior to 24 May 2016, and which comply with Union law as applicable prior to that date, shall remain in force until amended, replaced or revoked.

๐Ÿ’ผ Art. 97 Commission reports

1. By 25 May 2020 and every four years thereafter, the Commission shall submit a report on the evaluation and review of this Regulation to the European Parliament and to the Council. The reports shall be made public. 2. In the context of the evaluations and reviews referred to in paragraph 1, the Commission shall examine, in particular, the application and functioning of: a. Chapter V on the transfer of personal data to third countries or international organisations with particular regard to decisions adopted pursuant to Article 45(3) of this Regulation and decisions adopted on the basis of Article 25(6) of Directive 95/46/EC; b. Chapter VII on cooperation and consistency. 3. For the purpose of paragraph 1, the Commission may request information from Member States and supervisory authorities. 4. In carrying out the evaluations and reviews referred to in paragraphs 1 and 2, the Commission shall take into account the positions and findings of the European Parliament, of the Council, and of other relevant bodies or sources. 5. The Commission shall, if necessary, submit appropriate proposals to amend this Regulation, in particular taking into account of developments in information technology and in the light of the state of progress in the information society.

๐Ÿ’ผ Art. 98 Review of other Union legal acts on data protection

The Commission shall, if appropriate, submit legislative proposals with a view to amending other Union legal acts on the protection of personal data, in order to ensure uniform and consistent protection of natural persons with regard to processing. This shall in particular concern the rules relating to the protection of natural persons with regard to processing by Union institutions, bodies, offices and agencies and on the free movement of such data.

๐Ÿ’ผ Art. 99 Entry into force and application

1. This Regulation shall enter into force on the twentieth day following that of its publication in the Official Journal of the European Union. 2. It shall apply from 25 May 2018.

๐Ÿ’ผ Asset Management (ID.AM)

The data, personnel, devices, systems, and facilities that enable the organization to achieve business purposes are identified and managed consistent with their relative importance to organizational objectives and the organization's risk strategy.

๐Ÿ’ผ Asset Management (ID.AM)

Assets (e.g., data, hardware, software, systems, facilities, services, people) that enable the organization to achieve business purposes are identified and managed consistent with their relative importance to organizational objectives and the organization's risk strategy.

๐Ÿ’ผ AT-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] awareness and training policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the awareness and training policy and the associated awareness and training controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the awareness and training policy and procedures; and c. Review and update the current awareness and training: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ AT-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] awareness and training policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the awareness and training policy and the associated awareness and training controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the awareness and training policy and procedures; and c. Review and update the current awareness and training: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AT-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] awareness and training policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the awareness and training policy and the associated awareness and training controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the awareness and training policy and procedures; and c. Review and update the current awareness and training: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AT-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] awareness and training policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the awareness and training policy and the associated awareness and training controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the awareness and training policy and procedures; and c. Review and update the current awareness and training: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AT-1 SECURITY AWARENESS AND TRAINING POLICY AND PROCEDURES

The organization: AT-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: AT-1a.1. A security awareness and training policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and AT-1a.2. Procedures to facilitate the implementation of the security awareness and training policy and associated security awareness and training controls; and AT-1b. Reviews and updates the current: AT-1b.1. Security awareness and training policy [Assignment: organization-defined frequency]; and AT-1b.2. Security awareness and training procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ AT-2 Literacy Training and Awareness

a. Provide security and privacy literacy training to system users (including managers, senior executives, and contractors): 1. As part of initial training for new users and [Assignment: organization-defined frequency] thereafter; and 2. When required by system changes or following [Assignment: organization-defined events]; b. Employ the following techniques to increase the security and privacy awareness of system users [Assignment: organization-defined awareness techniques]; c. Update literacy training and awareness content [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and d. Incorporate lessons learned from internal or external security incidents or breaches into literacy training and awareness techniques.

๐Ÿ’ผ AT-2 Literacy Training and Awareness (L)(M)(H)

a. Provide security and privacy literacy training to system users (including managers, senior executives, and contractors): 1. As part of initial training for new users and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes or following [Assignment: organization-defined events]; b. Employ the following techniques to increase the security and privacy awareness of system users [Assignment: organization-defined awareness techniques]; c. Update literacy training and awareness content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and d. Incorporate lessons learned from internal or external security or privacy incidents into literacy training and awareness techniques.

๐Ÿ’ผ AT-2 Literacy Training and Awareness (L)(M)(H)

a. Provide security and privacy literacy training to system users (including managers, senior executives, and contractors): 1. As part of initial training for new users and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes or following [Assignment: organization-defined events]; b. Employ the following techniques to increase the security and privacy awareness of system users [Assignment: organization-defined awareness techniques]; c. Update literacy training and awareness content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and d. Incorporate lessons learned from internal or external security or privacy incidents into literacy training and awareness techniques.

๐Ÿ’ผ AT-2 Literacy Training and Awareness (L)(M)(H)

a. Provide security and privacy literacy training to system users (including managers, senior executives, and contractors): 1. As part of initial training for new users and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes or following [Assignment: organization-defined events]; b. Employ the following techniques to increase the security and privacy awareness of system users [Assignment: organization-defined awareness techniques]; c. Update literacy training and awareness content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and d. Incorporate lessons learned from internal or external security or privacy incidents into literacy training and awareness techniques.

๐Ÿ’ผ AT-2 SECURITY AWARENESS TRAINING

The organization provides basic security awareness training to information system users (including managers, senior executives, and contractors): AT-2a. As part of initial training for new users; AT-2b. When required by information system changes; and AT-2c. [Assignment: organization-defined frequency] thereafter.

๐Ÿ’ผ AT-3 (1) ENVIRONMENTAL CONTROLS

The organization provides [Assignment: organization-defined personnel or roles] with initial and [Assignment: organization-defined frequency] training in the employment and operation of environmental controls.

๐Ÿ’ผ AT-3 (2) PHYSICAL SECURITY CONTROLS

The organization provides [Assignment: organization-defined personnel or roles] with initial and [Assignment: organization-defined frequency] training in the employment and operation of physical security controls.

๐Ÿ’ผ AT-3 ROLE-BASED SECURITY TRAINING

The organization provides role-based security training to personnel with assigned security roles and responsibilities: AT-3a. Before authorizing access to the information system or performing assigned duties; AT-3b. When required by information system changes; and AT-3c. [Assignment: organization-defined frequency] thereafter.

๐Ÿ’ผ AT-3 Role-based Training

a. Provide role-based security and privacy training to personnel with the following roles and responsibilities: [Assignment: organization-defined roles and responsibilities]: 1. Before authorizing access to the system, information, or performing assigned duties, and [Assignment: organization-defined frequency] thereafter; and 2. When required by system changes; b. Update role-based training content [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and c. Incorporate lessons learned from internal or external security incidents or breaches into role-based training.

๐Ÿ’ผ AT-3 Role-based Training (L)(M)(H)

a. Provide role-based security and privacy training to personnel with the following roles and responsibilities: [Assignment: organization-defined roles and responsibilities]: 1. Before authorizing access to the system, information, or performing assigned duties, and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes; b. Update role-based training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and c. Incorporate lessons learned from internal or external security or privacy incidents into role-based training.

๐Ÿ’ผ AT-3 Role-based Training (L)(M)(H)

a. Provide role-based security and privacy training to personnel with the following roles and responsibilities: [Assignment: organization-defined roles and responsibilities]: 1. Before authorizing access to the system, information, or performing assigned duties, and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes; b. Update role-based training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and c. Incorporate lessons learned from internal or external security or privacy incidents into role-based training.

๐Ÿ’ผ AT-3 Role-based Training (L)(M)(H)

a. Provide role-based security and privacy training to personnel with the following roles and responsibilities: [Assignment: organization-defined roles and responsibilities]: 1. Before authorizing access to the system, information, or performing assigned duties, and [FedRAMP Assignment: at least annually] thereafter; and 2. When required by system changes; b. Update role-based training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]; and c. Incorporate lessons learned from internal or external security or privacy incidents into role-based training.

๐Ÿ’ผ AT-4 SECURITY TRAINING RECORDS

The organization: AT-4a. Documents and monitors individual information system security training activities including basic security awareness training and specific information system security training; and AT-4b. Retains individual training records for [Assignment: organization-defined time period].

๐Ÿ’ผ AT-4 Training Records

a. Document and monitor information security and privacy training activities, including security and privacy awareness training and specific role-based security and privacy training; and b. Retain individual training records for [Assignment: organization-defined time period].

๐Ÿ’ผ AT-4 Training Records (L)(M)(H)

a. Document and monitor information security and privacy training activities, including security and privacy awareness training and specific role-based security and privacy training; and b. Retain individual training records for [FedRAMP Assignment: at least one (1) year or one (1) year after completion of a specific training program].

๐Ÿ’ผ AT-4 Training Records (L)(M)(H)

a. Document and monitor information security and privacy training activities, including security and privacy awareness training and specific role-based security and privacy training; and b. Retain individual training records for [FedRAMP Assignment: at least one (1) year or one (1) year after completion of a specific training program].

๐Ÿ’ผ AT-4 Training Records (L)(M)(H)

a. Document and monitor information security and privacy training activities, including security and privacy awareness training and specific role-based security and privacy training; and b. Retain individual training records for [FedRAMP Assignment: at least one (1) year or one (1) year after completion of a specific training program].

๐Ÿ’ผ AT-6 Training Feedback

Provide feedback on organizational training results to the following personnel [Assignment: organization-defined frequency]: [Assignment: organization-defined personnel].

๐Ÿ’ผ AU-1 AUDIT AND ACCOUNTABILITY POLICY AND PROCEDURES

The organization: AU-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: AU-1a.1. An audit and accountability policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and AU-1a.2. Procedures to facilitate the implementation of the audit and accountability policy and associated audit and accountability controls; and AU-1b. Reviews and updates the current: AU-1b.1. Audit and accountability policy [Assignment: organization-defined frequency]; and AU-1b.2. Audit and accountability procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ AU-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] audit and accountability policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the audit and accountability policy and the associated audit and accountability controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the audit and accountability policy and procedures; and c. Review and update the current audit and accountability: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ AU-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] audit and accountability policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the audit and accountability policy and the associated audit and accountability controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the audit and accountability policy and procedures; and c. Review and update the current audit and accountability: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AU-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] audit and accountability policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the audit and accountability policy and the associated audit and accountability controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the audit and accountability policy and procedures; and c. Review and update the current audit and accountability: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AU-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] audit and accountability policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the audit and accountability policy and the associated audit and accountability controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the audit and accountability policy and procedures; and c. Review and update the current audit and accountability: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ AU-10 (1) ASSOCIATION OF IDENTITIES

The information system: AU-10 (1)(a) Binds the identity of the information producer with the information to [Assignment: organization-defined strength of binding]; and AU-10 (1)(b) Provides the means for authorized individuals to determine the identity of the producer of the information.

๐Ÿ’ผ AU-10 (3) CHAIN OF CUSTODY

The information system maintains reviewer/releaser identity and credentials within the established chain of custody for all information reviewed or released.

๐Ÿ’ผ AU-10 (4) VALIDATE BINDING OF INFORMATION REVIEWER IDENTITY

The information system: AU-10 (4)(a) Validates the binding of the information reviewer identity to the information at the transfer or release points prior to release/transfer between [Assignment: organization-defined security domains]; and AU-10 (4)(b) Performs [Assignment: organization-defined actions] in the event of a validation error.

๐Ÿ’ผ AU-10 Non-repudiation

Provide irrefutable evidence that an individual (or process acting on behalf of an individual) has performed [Assignment: organization-defined actions to be covered by non-repudiation].
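
One common way to provide the kind of evidence AU-10 asks for is to digitally sign each audit-relevant action at the point it is performed, so the producer cannot later deny it. The sketch below uses the third-party `cryptography` package with Ed25519 keys purely as an illustration; key generation, protection, and algorithm choice (including FIPS requirements) are organization-defined and not covered here.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Illustrative only: in practice the signing key would live in an HSM or KMS,
# not be generated ad hoc in application code.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

record = b'{"actor": "jdoe", "action": "approve", "object": "invoice-42"}'

# Sign the record when the covered action is performed.
signature = private_key.sign(record)

# Later, any reviewer holding the public key can attribute the record.
try:
    public_key.verify(signature, record)
    print("signature valid: record attributable to the key holder")
except InvalidSignature:
    print("signature invalid: record cannot be attributed")
```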

๐Ÿ’ผ AU-10 NON-REPUDIATION

The information system protects against an individual (or process acting on behalf of an individual) falsely denying having performed [Assignment: organization-defined actions to be covered by non-repudiation].

๐Ÿ’ผ AU-10 Non-repudiation (H)

Provide irrefutable evidence that an individual (or process acting on behalf of an individual) has performed [FedRAMP Assignment: minimum actions including the addition, modification, deletion, approval, sending, or receiving of data].

๐Ÿ’ผ AU-11 Audit Record Retention

Retain audit records for [Assignment: organization-defined time period consistent with records retention policy] to provide support for after-the-fact investigations of incidents and to meet regulatory and organizational information retention requirements.
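
A minimal sketch of enforcing a retention period over archived audit files, assuming records are stored as files whose modification time reflects when they were written; the directory path and the 365-day period stand in for the organization-defined values.

```python
import time
from pathlib import Path

RETENTION_DAYS = 365                        # placeholder for the defined period
AUDIT_DIR = Path("/var/log/audit-archive")  # hypothetical archive location

def expired(path: Path, now: float) -> bool:
    """True if the file's modification time is older than the retention period."""
    age_days = (now - path.stat().st_mtime) / 86400
    return age_days > RETENTION_DAYS

def past_retention(audit_dir: Path = AUDIT_DIR):
    """Return archived audit files that have exceeded the retention period."""
    now = time.time()
    return [p for p in sorted(audit_dir.glob("*.log")) if expired(p, now)]

if __name__ == "__main__":
    for p in past_retention():
        print("past retention:", p)
```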

๐Ÿ’ผ AU-11 AUDIT RECORD RETENTION

The organization retains audit records for [Assignment: organization-defined time period consistent with records retention policy] to provide support for after-the-fact investigations of security incidents and to meet regulatory and organizational information retention requirements.

๐Ÿ’ผ AU-11 Audit Record Retention (L)(M)(H)

Retain audit records for [FedRAMP Assignment: a time period in compliance with M-21-31] to provide support for after-the-fact investigations of incidents and to meet regulatory and organizational information retention requirements. **AU-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The service provider is encouraged to align with M-21-31 where possible. **Requirement**: The service provider retains audit records on-line for at least ninety (90) days and further preserves audit records off-line for a period that is in accordance with NARA requirements. **Requirement**: The service provider must support Agency requirements to comply with [M-21-31](https://www.whitehouse.gov/wp-content/uploads/2021/08/M-21-31-Improving-the-Federal-Governments-Investigative-and-Remediation-Capabilities-Related-to-Cybersecurity-Incidents.pdf).


๐Ÿ’ผ AU-11 Audit Record Retention (L)(M)(H)

Retain audit records for [FedRAMP Assignment: a time period in compliance with M-21-31] to provide support for after-the-fact investigations of incidents and to meet regulatory and organizational information retention requirements. **AU-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The service provider is encouraged to align with M-21-31 where possible. **Requirement**: The service provider retains audit records on-line for at least ninety (90) days and further preserves audit records off-line for a period that is in accordance with NARA requirements. **Requirement**: The service provider must support Agency requirements to comply with [M-21-31](https://www.whitehouse.gov/wp-content/uploads/2021/08/M-21-31-Improving-the-Federal-Governments-Investigative-and-Remediation-Capabilities-Related-to-Cybersecurity-Incidents.pdf).

๐Ÿ’ผ AU-11 Audit Record Retention (L)(M)(H)

Retain audit records for [FedRAMP Assignment: a time period in compliance with M-21-31] to provide support for after-the-fact investigations of incidents and to meet regulatory and organizational information retention requirements. **AU-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The service provider is encouraged to align with M-21-31 where possible. **Requirement**: The service provider retains audit records on-line for at least ninety (90) days and further preserves audit records off-line for a period that is in accordance with NARA requirements. **Requirement**: The service provider must support Agency requirements to comply with [M-21-31](https://www.whitehouse.gov/wp-content/uploads/2021/08/M-21-31-Improving-the-Federal-Governments-Investigative-and-Remediation-Capabilities-Related-to-Cybersecurity-Incidents.pdf).

๐Ÿ’ผ AU-12 (1) SYSTEM-WIDE | TIME-CORRELATED AUDIT TRAIL

The information system compiles audit records from [Assignment: organization-defined information system components] into a system-wide (logical or physical) audit trail that is time-correlated to within [Assignment: organization-defined level of tolerance for the relationship between time stamps of individual records in the audit trail].

๐Ÿ’ผ AU-12 (3) CHANGES BY AUTHORIZED INDIVIDUALS

The information system provides the capability for [Assignment: organization-defined individuals or roles] to change the auditing to be performed on [Assignment: organization-defined information system components] based on [Assignment: organization-defined selectable event criteria] within [Assignment: organization-defined time thresholds].

๐Ÿ’ผ AU-12 AUDIT GENERATION

The information system: AU-12a. Provides audit record generation capability for the auditable events defined in AU-2 a. at [Assignment: organization-defined information system components]; AU-12b. Allows [Assignment: organization-defined personnel or roles] to select which auditable events are to be audited by specific components of the information system; and AU-12c. Generates audit records for the events defined in AU-2 d. with the content defined in AU-3.

๐Ÿ’ผ AU-12 Audit Record Generation

a. Provide audit record generation capability for the event types the system is capable of auditing as defined in AU-2a on [Assignment: organization-defined system components]; b. Allow [Assignment: organization-defined personnel or roles] to select the event types that are to be logged by specific components of the system; and c. Generate audit records for the event types defined in AU-2c that include the audit record content defined in AU-3.

๐Ÿ’ผ AU-12 Audit Record Generation (L)(M)(H)

a. Provide audit record generation capability for the event types the system is capable of auditing as defined in AU-2a on [FedRAMP Assignment: all information system and network components where audit capability is deployed/available]; b. Allow [Assignment: organization-defined personnel or roles] to select the event types that are to be logged by specific components of the system; and c. Generate audit records for the event types defined in AU-2c that include the audit record content defined in AU-3.

๐Ÿ’ผ AU-12 Audit Record Generation (L)(M)(H)

a. Provide audit record generation capability for the event types the system is capable of auditing as defined in AU-2a on [FedRAMP Assignment: all information system and network components where audit capability is deployed/available]; b. Allow [Assignment: organization-defined personnel or roles] to select the event types that are to be logged by specific components of the system; and c. Generate audit records for the event types defined in AU-2c that include the audit record content defined in AU-3.

๐Ÿ’ผ AU-12 Audit Record Generation (L)(M)(H)

a. Provide audit record generation capability for the event types the system is capable of auditing as defined in AU-2a on [FedRAMP Assignment: all information system and network components where audit capability is deployed/available]; b. Allow [Assignment: organization-defined personnel or roles] to select the event types that are to be logged by specific components of the system; and c. Generate audit records for the event types defined in AU-2c that include the audit record content defined in AU-3.

๐Ÿ’ผ AU-12(1) System-wide and Time-correlated Audit Trail (H)

Compile audit records from [FedRAMP Assignment: all network, data storage, and computing devices] into a system-wide (logical or physical) audit trail that is time-correlated to within [Assignment: organization-defined level of tolerance for the relationship between time stamps of individual records in the audit trail].
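
A system-wide, time-correlated trail is typically built by normalizing component timestamps to a common reference and then merging the per-component streams in time order. The sketch below merges two illustrative streams with `heapq.merge`; the component names, events, and two-second tolerance are hypothetical placeholders for the assigned values.

```python
import heapq
from datetime import datetime, timedelta

# Illustrative per-component audit streams, already normalized to ISO 8601 UTC
# and time-ordered within each component (as most devices emit their own logs).
firewall = [("2024-05-01T10:00:01+00:00", "fw", "deny tcp/445"),
            ("2024-05-01T10:00:07+00:00", "fw", "permit tcp/443")]
database = [("2024-05-01T10:00:03+00:00", "db", "login alice"),
            ("2024-05-01T10:00:09+00:00", "db", "select audit_log")]

TOLERANCE = timedelta(seconds=2)  # organization-defined clock tolerance

def ts(record):
    """Parse the timestamp field of a record."""
    return datetime.fromisoformat(record[0])

# Merge the streams into one logical, system-wide, time-ordered trail.
trail = list(heapq.merge(firewall, database, key=ts))
for when, component, event in trail:
    print(when, component, event)
```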

๐Ÿ’ผ AU-12(3) Audit Record Generation | Changes by Authorized Individuals

Provide and implement the capability for [Assignment: organization-defined individuals or roles] to change the logging to be performed on [Assignment: organization-defined system components] based on [Assignment: organization-defined selectable event criteria] within [Assignment: organization-defined time thresholds].

๐Ÿ’ผ AU-12(3) Changes by Authorized Individuals (H)

Provide and implement the capability for [FedRAMP Assignment: service provider-defined individuals or roles with audit configuration responsibilities] to change the logging to be performed on [FedRAMP Assignment: all network, data storage, and computing devices] based on [Assignment: organization-defined selectable event criteria] within [Assignment: organization-defined time thresholds].

๐Ÿ’ผ AU-13 Monitoring for Information Disclosure

a. Monitor [Assignment: organization-defined open-source information and/or information sites] [Assignment: organization-defined frequency] for evidence of unauthorized disclosure of organizational information; and b. If an information disclosure is discovered: 1. Notify [Assignment: organization-defined personnel or roles]; and 2. Take the following additional actions: [Assignment: organization-defined additional actions].

๐Ÿ’ผ AU-13 MONITORING FOR INFORMATION DISCLOSURE

The organization monitors [Assignment: organization-defined open source information and/or information sites] [Assignment: organization-defined frequency] for evidence of unauthorized disclosure of organizational information.

๐Ÿ’ผ AU-14 Session Audit

a. Provide and implement the capability for [Assignment: organization-defined users or roles] to [Selection (one or more): record; view; hear; log] the content of a user session under [Assignment: organization-defined circumstances]; and b. Develop, integrate, and use session auditing activities in consultation with legal counsel and in accordance with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines.

๐Ÿ’ผ AU-14 SESSION AUDIT

The information system provides the capability for authorized users to select a user session to capture/record or view/hear.

๐Ÿ’ผ AU-15 ALTERNATE AUDIT CAPABILITY

The organization provides an alternate audit capability in the event of a failure in primary audit capability that provides [Assignment: organization-defined alternate audit functionality].

๐Ÿ’ผ AU-16 (2) SHARING OF AUDIT INFORMATION

The organization provides cross-organizational audit information to [Assignment: organization-defined organizations] based on [Assignment: organization-defined cross-organizational sharing agreements].

๐Ÿ’ผ AU-16 Cross-organizational Audit Logging

Employ [Assignment: organization-defined methods] for coordinating [Assignment: organization-defined audit information] among external organizations when audit information is transmitted across organizational boundaries.

๐Ÿ’ผ AU-16 CROSS-ORGANIZATIONAL AUDITING

The organization employs [Assignment: organization-defined methods] for coordinating [Assignment: organization-defined audit information] among external organizations when audit information is transmitted across organizational boundaries.

๐Ÿ’ผ AU-2 AUDIT EVENTS

The organization: AU-2a. Determines that the information system is capable of auditing the following events: [Assignment: organization-defined auditable events]; AU-2b. Coordinates the security audit function with other organizational entities requiring audit-related information to enhance mutual support and to help guide the selection of auditable events; AU-2c. Provides a rationale for why the auditable events are deemed to be adequate to support after-the-fact investigations of security incidents; and AU-2d. Determines that the following events are to be audited within the information system: [Assignment: organization-defined audited events (the subset of the auditable events defined in AU-2 a.) along with the frequency of (or situation requiring) auditing for each identified event].

๐Ÿ’ผ AU-2 Event Logging

a. Identify the types of events that the system is capable of logging in support of the audit function: [Assignment: organization-defined event types that the system is capable of logging]; b. Coordinate the event logging function with other organizational entities requiring audit-related information to guide and inform the selection criteria for events to be logged; c. Specify the following event types for logging within the system: [Assignment: organization-defined event types (subset of the event types defined in AU-2a.) along with the frequency of (or situation requiring) logging for each identified event type]; d. Provide a rationale for why the event types selected for logging are deemed to be adequate to support after-the-fact investigations of incidents; and e. Review and update the event types selected for logging [Assignment: organization-defined frequency].

๐Ÿ’ผ AU-2 Event Logging (L)(M)(H)

a. Identify the types of events that the system is capable of logging in support of the audit function: [FedRAMP Assignment: successful and unsuccessful account logon events, account management events, object access, policy change, privilege functions, process tracking, and system events. For Web applications: all administrator activity, authentication checks, authorization checks, data deletions, data access, data changes, and permission changes]; b. Coordinate the event logging function with other organizational entities requiring audit-related information to guide and inform the selection criteria for events to be logged; c. Specify the following event types for logging within the system: [FedRAMP Assignment: organization-defined subset of the auditable events defined in AU-2a to be audited continually for each identified event.]; d. Provide a rationale for why the event types selected for logging are deemed to be adequate to support after-the-fact investigations of incidents; and e. Review and update the event types selected for logging [FedRAMP Assignment: annually and whenever there is a change in the threat environment]. **AU-2 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Annually or whenever changes in the threat environment are communicated to the service provider by the JAB/AO. **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO.

๐Ÿ’ผ AU-2 Event Logging (L)(M)(H)

a. Identify the types of events that the system is capable of logging in support of the audit function: [FedRAMP Assignment: successful and unsuccessful account logon events, account management events, object access, policy change, privilege functions, process tracking, and system events. For Web applications: all administrator activity, authentication checks, authorization checks, data deletions, data access, data changes, and permission changes]; b. Coordinate the event logging function with other organizational entities requiring audit-related information to guide and inform the selection criteria for events to be logged; c. Specify the following event types for logging within the system: [FedRAMP Assignment: organization-defined subset of the auditable events defined in AU-2a to be audited continually for each identified event.]; d. Provide a rationale for why the event types selected for logging are deemed to be adequate to support after-the-fact investigations of incidents; and e. Review and update the event types selected for logging [FedRAMP Assignment: annually and whenever there is a change in the threat environment]. **AU-2 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Annually or whenever changes in the threat environment are communicated to the service provider by the JAB/AO. **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO.

๐Ÿ’ผ AU-2 Event Logging (L)(M)(H)

a. Identify the types of events that the system is capable of logging in support of the audit function: [FedRAMP Assignment: successful and unsuccessful account logon events, account management events, object access, policy change, privilege functions, process tracking, and system events. For Web applications: all administrator activity, authentication checks, authorization checks, data deletions, data access, data changes, and permission changes]; b. Coordinate the event logging function with other organizational entities requiring audit-related information to guide and inform the selection criteria for events to be logged; c. Specify the following event types for logging within the system: [FedRAMP Assignment: organization-defined subset of the auditable events defined in AU-2a to be audited continually for each identified event.]; d. Provide a rationale for why the event types selected for logging are deemed to be adequate to support after-the-fact investigations of incidents; and e. Review and update the event types selected for logging [FedRAMP Assignment: annually and whenever there is a change in the threat environment]. **AU-2 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Annually or whenever changes in the threat environment are communicated to the service provider by the JAB/AO. **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO.

๐Ÿ’ผ AU-3 Content of Audit Records

Ensure that audit records contain information that establishes the following: a. What type of event occurred; b. When the event occurred; c. Where the event occurred; d. Source of the event; e. Outcome of the event; and f. Identity of any individuals, subjects, or objects/entities associated with the event.
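
In practice, each of the six AU-3 elements maps to a field in a structured (for example, JSON) audit record. A minimal sketch, with the host name and field names chosen purely for illustration:

```python
import json
import uuid
from datetime import datetime, timezone

def build_audit_record(event_type: str, source: str, outcome: str,
                       subject: str, obj: str) -> str:
    """Assemble one audit record covering the six AU-3 elements."""
    record = {
        "id": str(uuid.uuid4()),
        "event_type": event_type,                             # a. what occurred
        "timestamp": datetime.now(timezone.utc).isoformat(),  # b. when
        "host": "app-server-01",                              # c. where (illustrative)
        "source": source,                                     # d. source of the event
        "outcome": outcome,                                   # e. outcome
        "subject": subject, "object": obj,                    # f. identities involved
    }
    return json.dumps(record)

print(build_audit_record("login", "10.0.0.5", "success", "alice", "console"))
```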

๐Ÿ’ผ AU-3 CONTENT OF AUDIT RECORDS

The information system generates audit records containing information that establishes what type of event occurred, when the event occurred, where the event occurred, the source of the event, the outcome of the event, and the identity of any individuals or subjects associated with the event.

๐Ÿ’ผ AU-3 Content of Audit Records (L)(M)(H)

Ensure that audit records contain information that establishes the following: a. What type of event occurred; b. When the event occurred; c. Where the event occurred; d. Source of the event; e. Outcome of the event; and f. Identity of any individuals, subjects, or objects/entities associated with the event.

๐Ÿ’ผ AU-3 Content of Audit Records (L)(M)(H)

Ensure that audit records contain information that establishes the following: a. What type of event occurred; b. When the event occurred; c. Where the event occurred; d. Source of the event; e. Outcome of the event; and f. Identity of any individuals, subjects, or objects/entities associated with the event.

๐Ÿ’ผ AU-3 Content of Audit Records (L)(M)(H)

Ensure that audit records contain information that establishes the following: a. What type of event occurred; b. When the event occurred; c. Where the event occurred; d. Source of the event; e. Outcome of the event; and f. Identity of any individuals, subjects, or objects/entities associated with the event.

๐Ÿ’ผ AU-3(1) Additional Audit Information (M)(H)

Generate audit records containing the following additional information: [FedRAMP Assignment: session, connection, transaction, or activity duration; for client-server transactions, the number of bytes received and bytes sent; additional informational messages to diagnose or identify the event; characteristics that describe or identify the object or resource being acted upon; individual identities of group account users; full-text of privileged commands]. **AU-3 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: For client-server transactions, the number of bytes sent and received gives bidirectional transfer information that can be helpful during an investigation or inquiry.

๐Ÿ’ผ AU-3(1) Additional Audit Information (M)(H)

Generate audit records containing the following additional information: [FedRAMP Assignment: session, connection, transaction, or activity duration; for client-server transactions, the number of bytes received and bytes sent; additional informational messages to diagnose or identify the event; characteristics that describe or identify the object or resource being acted upon; individual identities of group account users; full-text of privileged commands]. **AU-3 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: For client-server transactions, the number of bytes sent and received gives bidirectional transfer information that can be helpful during an investigation or inquiry.

๐Ÿ’ผ AU-5 (1) AUDIT STORAGE CAPACITY

The information system provides a warning to [Assignment: organization-defined personnel, roles, and/or locations] within [Assignment: organization-defined time period] when allocated audit record storage volume reaches [Assignment: organization-defined percentage] of repository maximum audit record storage capacity.

๐Ÿ’ผ AU-5 (2) REAL-TIME ALERTS

The information system provides an alert in [Assignment: organization-defined real-time period] to [Assignment: organization-defined personnel, roles, and/or locations] when the following audit failure events occur: [Assignment: organization-defined audit failure events requiring real-time alerts].

๐Ÿ’ผ AU-5 (4) SHUTDOWN ON FAILURE

The information system invokes a [Selection: full system shutdown; partial system shutdown; degraded operational mode with limited mission/business functionality available] in the event of [Assignment: organization-defined audit failures], unless an alternate audit capability exists.

๐Ÿ’ผ AU-5 Response to Audit Logging Process Failures

a. Alert [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period] in the event of an audit logging process failure; and b. Take the following additional actions: [Assignment: organization-defined additional actions].
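
One way to detect an audit logging process failure and alert on it is to hook the point where a log handler fails to write. The sketch below subclasses Python's `logging.FileHandler` and overrides `handleError`, which the logging framework calls when a record cannot be emitted; the alert channel and any additional actions are placeholders for the organization-defined ones.

```python
import logging

def alert_personnel(message: str) -> None:
    # Placeholder for the organization-defined alerting channel
    # (email, pager, ticketing system). Here we only print.
    print("ALERT:", message)

class FailureAwareHandler(logging.FileHandler):
    """File handler that raises an alert if an audit record cannot be written."""
    def handleError(self, record: logging.LogRecord) -> None:
        alert_personnel(f"audit logging failure while writing: {record.getMessage()}")
        # Additional organization-defined actions (switch to a fallback
        # destination, halt processing, etc.) would go here.

audit_log = logging.getLogger("audit")
audit_log.addHandler(FailureAwareHandler("/tmp/audit.log"))
audit_log.warning("example audit event")
```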

๐Ÿ’ผ AU-5 RESPONSE TO AUDIT PROCESSING FAILURES

The information system: AU-5a. Alerts [Assignment: organization-defined personnel or roles] in the event of an audit processing failure; and AU-5b. Takes the following additional actions: [Assignment: organization-defined actions to be taken (e.g., shut down information system, overwrite oldest audit records, stop generating audit records)].

๐Ÿ’ผ AU-5(1) Storage Capacity Warning (H)

Provide a warning to [Assignment: organization-defined personnel, roles, and/or locations] within [Assignment: organization-defined time period] when allocated audit log storage volume reaches [FedRAMP Assignment: 75%, or one month before expected negative impact] of repository maximum audit log storage capacity.
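
A minimal sketch of the 75% warning, assuming audit logs live on their own filesystem; the mount point is hypothetical and the warning would be routed to whatever personnel, roles, or locations the organization defines.

```python
import shutil

THRESHOLD = 0.75                 # FedRAMP High: warn at 75% of capacity
AUDIT_VOLUME = "/var/log/audit"  # hypothetical audit log mount point

def check_audit_storage(path: str = AUDIT_VOLUME) -> None:
    usage = shutil.disk_usage(path)
    fraction_used = usage.used / usage.total
    if fraction_used >= THRESHOLD:
        # Route to the organization-defined personnel/roles/locations.
        print(f"WARNING: audit storage at {fraction_used:.0%} of capacity")
    else:
        print(f"audit storage at {fraction_used:.0%} of capacity")

check_audit_storage("/")  # demo against the root filesystem
```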

๐Ÿ’ผ AU-5(2) Real-time Alerts (H)

Provide an alert within [FedRAMP Assignment: real-time] to [FedRAMP Assignment: service provider personnel with authority to address failed audit events] when the following audit failure events occur: [Assignment: organization-defined audit logging failure events requiring real-time alerts].

๐Ÿ’ผ AU-6 (1) PROCESS INTEGRATION

The organization employs automated mechanisms to integrate audit review, analysis, and reporting processes to support organizational processes for investigation and response to suspicious activities.

๐Ÿ’ผ AU-6 (10) AUDIT LEVEL ADJUSTMENT

The organization adjusts the level of audit review, analysis, and reporting within the information system when there is a change in risk based on law enforcement information, intelligence information, or other credible sources of information.

๐Ÿ’ผ AU-6 (5) INTEGRATION | SCANNING AND MONITORING CAPABILITIES

The organization integrates analysis of audit records with analysis of [Selection (one or more): vulnerability scanning information; performance data; information system monitoring information; [Assignment: organization-defined data/information collected from other sources]] to further enhance the ability to identify inappropriate or unusual activity.

๐Ÿ’ผ AU-6 (7) PERMITTED ACTIONS

The organization specifies the permitted actions for each [Selection (one or more): information system process; role; user] associated with the review, analysis, and reporting of audit information.

๐Ÿ’ผ AU-6 Audit Record Review, Analysis, and Reporting

a. Review and analyze system audit records [Assignment: organization-defined frequency] for indications of [Assignment: organization-defined inappropriate or unusual activity] and the potential impact of the inappropriate or unusual activity; b. Report findings to [Assignment: organization-defined personnel or roles]; and c. Adjust the level of audit record review, analysis, and reporting within the system when there is a change in risk based on law enforcement information, intelligence information, or other credible sources of information.
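
Review and analysis is usually at least partly automated. A minimal sketch that flags one organization-defined indicator of unusual activity, repeated failed logons; the records, threshold, and reporting channel are illustrative.

```python
from collections import Counter

# Parsed audit records (illustrative); real input would come from the AU-2 logs.
records = [
    {"event": "logon", "outcome": "failure", "user": "alice"},
    {"event": "logon", "outcome": "failure", "user": "alice"},
    {"event": "logon", "outcome": "failure", "user": "alice"},
    {"event": "logon", "outcome": "success", "user": "bob"},
]

FAILED_LOGON_THRESHOLD = 3  # organization-defined indicator of unusual activity

failures = Counter(r["user"] for r in records
                   if r["event"] == "logon" and r["outcome"] == "failure")

# Findings would be reported to the organization-defined personnel or roles (AU-6b).
for user, count in failures.items():
    if count >= FAILED_LOGON_THRESHOLD:
        print(f"finding: {user} had {count} failed logons")
```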

๐Ÿ’ผ AU-6 Audit Record Review, Analysis, and Reporting (L)(M)(H)

a. Review and analyze system audit records [FedRAMP Assignment: at least weekly] for indications of [Assignment: organization-defined inappropriate or unusual activity] and the potential impact of the inappropriate or unusual activity; b. Report findings to [Assignment: organization-defined personnel or roles]; and c. Adjust the level of audit record review, analysis, and reporting within the system when there is a change in risk based on law enforcement information, intelligence information, or other credible sources of information. **AU-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO. In multi-tenant environments, capability and means for providing review, analysis, and reporting to consumer for data pertaining to consumer shall be documented.

๐Ÿ’ผ AU-6 Audit Record Review, Analysis, and Reporting (L)(M)(H)

a. Review and analyze system audit records [FedRAMP Assignment: at least weekly] for indications of [Assignment: organization-defined inappropriate or unusual activity] and the potential impact of the inappropriate or unusual activity; b. Report findings to [Assignment: organization-defined personnel or roles]; and c. Adjust the level of audit record review, analysis, and reporting within the system when there is a change in risk based on law enforcement information, intelligence information, or other credible sources of information. **AU-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO. In multi-tenant environments, capability and means for providing review, analysis, and reporting to consumer for data pertaining to consumer shall be documented.

๐Ÿ’ผ AU-6 Audit Record Review, Analysis, and Reporting (L)(M)(H)

a. Review and analyze system audit records [FedRAMP Assignment: at least weekly] for indications of [Assignment: organization-defined inappropriate or unusual activity] and the potential impact of the inappropriate or unusual activity; b. Report findings to [Assignment: organization-defined personnel or roles]; and c. Adjust the level of audit record review, analysis, and reporting within the system when there is a change in risk based on law enforcement information, intelligence information, or other credible sources of information. **AU-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO. In multi-tenant environments, capability and means for providing review, analysis, and reporting to consumer for data pertaining to consumer shall be documented.

๐Ÿ’ผ AU-6 AUDIT REVIEW, ANALYSIS, AND REPORTING

The organization: AU-6a. Reviews and analyzes information system audit records [Assignment: organization-defined frequency] for indications of [Assignment: organization-defined inappropriate or unusual activity]; and AU-6b. Reports findings to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ AU-6(5) Integrated Analysis of Audit Records (H)

Integrate analysis of audit records with analysis of [FedRAMP Assignment: Selection (one-or-more): vulnerability scanning information; performance data; information system monitoring information; penetration test data; [Assignment: organization-defined data/information collected from other sources]] to further enhance the ability to identify inappropriate or unusual activity.

๐Ÿ’ผ AU-6(6) Correlation with Physical Monitoring (H)

Correlate information from audit records with information obtained from monitoring physical access to further enhance the ability to identify suspicious, inappropriate, unusual, or malevolent activity. **AU-6 (6) Additional FedRAMP Requirements and Guidance:** **Requirement**: Coordination between service provider and consumer shall be documented and accepted by the JAB/AO.

๐Ÿ’ผ AU-6(7) Permitted Actions (H)

Specify the permitted actions for each [FedRAMP Assignment: information system process; role; user] associated with the review, analysis, and reporting of audit record information.

๐Ÿ’ผ AU-7 (1) AUTOMATIC PROCESSING

The information system provides the capability to process audit records for events of interest based on [Assignment: organization-defined audit fields within audit records].

๐Ÿ’ผ AU-7 (2) AUTOMATIC SORT AND SEARCH

The information system provides the capability to sort and search audit records for events of interest based on the content of [Assignment: organization-defined audit fields within audit records].

๐Ÿ’ผ AU-7 Audit Record Reduction and Report Generation

Provide and implement an audit record reduction and report generation capability that: a. Supports on-demand audit record review, analysis, and reporting requirements and after-the-fact investigations of incidents; and b. Does not alter the original content or time ordering of audit records.

๐Ÿ’ผ AU-7 Audit Record Reduction and Report Generation (M)(H)

Provide and implement an audit record reduction and report generation capability that: a. Supports on-demand audit record review, analysis, and reporting requirements and after-the-fact investigations of incidents; and b. Does not alter the original content or time ordering of audit records.

๐Ÿ’ผ AU-7 Audit Record Reduction and Report Generation (M)(H)

Provide and implement an audit record reduction and report generation capability that: a. Supports on-demand audit record review, analysis, and reporting requirements and after-the-fact investigations of incidents; and b. Does not alter the original content or time ordering of audit records.

๐Ÿ’ผ AU-7 AUDIT REDUCTION AND REPORT GENERATION

The information system provides an audit reduction and report generation capability that: AU-7a. Supports on-demand audit review, analysis, and reporting requirements and after-the-fact investigations of security incidents; and AU-7b. Does not alter the original content or time ordering of audit records.

๐Ÿ’ผ AU-7(1) Automatic Processing (M)(H)

Provide and implement the capability to process, sort, and search audit records for events of interest based on the following content: [Assignment: organization-defined fields within audit records].
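
A sketch of processing, sorting, and searching audit records on selected fields while leaving the original records untouched (which also keeps AU-7b satisfied); the record fields are illustrative.

```python
records = [
    {"timestamp": "2024-05-01T10:00:03+00:00", "user": "alice", "event": "delete"},
    {"timestamp": "2024-05-01T09:59:57+00:00", "user": "bob",   "event": "read"},
    {"timestamp": "2024-05-01T10:00:01+00:00", "user": "alice", "event": "read"},
]

def search(records, **criteria):
    """Return records matching every given field, without altering the originals."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

# Sort a *copy* by timestamp so the original content and ordering are preserved.
time_ordered = sorted(records, key=lambda r: r["timestamp"])

print(search(records, user="alice", event="read"))
print(time_ordered)
```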

๐Ÿ’ผ AU-7(1) Automatic Processing (M)(H)

Provide and implement the capability to process, sort, and search audit records for events of interest based on the following content: [Assignment: organization-defined fields within audit records].

๐Ÿ’ผ AU-8 (1) SYNCHRONIZATION WITH AUTHORITATIVE TIME SOURCE

The information system: AU-8 (1)(a) Compares the internal information system clocks [Assignment: organization-defined frequency] with [Assignment: organization-defined authoritative time source]; and AU-8 (1)(b) Synchronizes the internal system clocks to the authoritative time source when the time difference is greater than [Assignment: organization-defined time period].
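
A sketch of the compare-and-resynchronize loop, assuming the third-party `ntplib` package for querying the authoritative source and `chronyd` as the local time service; the source, drift threshold, and resync command are placeholders for the organization-defined values.

```python
import subprocess

import ntplib  # third-party package (pip install ntplib); assumed available

AUTHORITATIVE_SOURCE = "pool.ntp.org"  # organization-defined time source
MAX_DRIFT_SECONDS = 1.0                # organization-defined time period

def check_clock_drift() -> None:
    response = ntplib.NTPClient().request(AUTHORITATIVE_SOURCE, version=3)
    drift = abs(response.offset)       # local clock offset from the source
    print(f"offset from {AUTHORITATIVE_SOURCE}: {drift:.3f}s")
    if drift > MAX_DRIFT_SECONDS:
        # In practice resynchronization is delegated to chronyd/ntpd; this
        # call is illustrative only and requires elevated privileges.
        subprocess.run(["chronyc", "makestep"], check=False)

check_clock_drift()
```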

๐Ÿ’ผ AU-8 Time Stamps

a. Use internal system clocks to generate time stamps for audit records; and b. Record time stamps for audit records that meet [Assignment: organization-defined granularity of time measurement] and that use Coordinated Universal Time, have a fixed local time offset from Coordinated Universal Time, or that include the local time offset as part of the time stamp.

๐Ÿ’ผ AU-8 TIME STAMPS

The information system: AU-8a. Uses internal system clocks to generate time stamps for audit records; and AU-8b. Records time stamps for audit records that can be mapped to Coordinated Universal Time (UTC) or Greenwich Mean Time (GMT) and meets [Assignment: organization-defined granularity of time measurement].

๐Ÿ’ผ AU-8 Time Stamps (L)(M)(H)

a. Use internal system clocks to generate time stamps for audit records; and b. Record time stamps for audit records that meet [FedRAMP Assignment: one second granularity of time measurement] and that use Coordinated Universal Time, have a fixed local time offset from Coordinated Universal Time, or that include the local time offset as part of the time stamp.
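
Meeting the one-second granularity and UTC (or fixed local offset) requirement is straightforward with timezone-aware timestamps from the system clock; a minimal sketch:

```python
from datetime import datetime

def audit_timestamp_utc() -> str:
    """UTC timestamp from the internal system clock, one-second granularity."""
    from datetime import timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

def audit_timestamp_local() -> str:
    """Local time with its UTC offset included in the time stamp."""
    return datetime.now().astimezone().isoformat(timespec="seconds")

print(audit_timestamp_utc())    # e.g. 2024-05-01T10:00:03+00:00
print(audit_timestamp_local())  # e.g. 2024-05-01T12:00:03+02:00
```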

๐Ÿ’ผ AU-8 Time Stamps (L)(M)(H)

a. Use internal system clocks to generate time stamps for audit records; and b. Record time stamps for audit records that meet [FedRAMP Assignment: one second granularity of time measurement] and that use Coordinated Universal Time, have a fixed local time offset from Coordinated Universal Time, or that include the local time offset as part of the time stamp.

๐Ÿ’ผ AU-8 Time Stamps (L)(M)(H)

a. Use internal system clocks to generate time stamps for audit records; and b. Record time stamps for audit records that meet [FedRAMP Assignment: one second granularity of time measurement] and that use Coordinated Universal Time, have a fixed local time offset from Coordinated Universal Time, or that include the local time offset as part of the time stamp.

๐Ÿ’ผ AU-9 Protection of Audit Information

a. Protect audit information and audit logging tools from unauthorized access, modification, and deletion; and b. Alert [Assignment: organization-defined personnel or roles] upon detection of unauthorized access, modification, or deletion of audit information.

๐Ÿ’ผ AU-9 Protection of Audit Information (L)(M)(H)

a. Protect audit information and audit logging tools from unauthorized access, modification, and deletion; and b. Alert [Assignment: organization-defined personnel or roles] upon detection of unauthorized access, modification, or deletion of audit information.

๐Ÿ’ผ AU-9 Protection of Audit Information (L)(M)(H)

a. Protect audit information and audit logging tools from unauthorized access, modification, and deletion; and b. Alert [Assignment: organization-defined personnel or roles] upon detection of unauthorized access, modification, or deletion of audit information.

๐Ÿ’ผ AU-9 Protection of Audit Information (L)(M)(H)

a. Protect audit information and audit logging tools from unauthorized access, modification, and deletion; and b. Alert [Assignment: organization-defined personnel or roles] upon detection of unauthorized access, modification, or deletion of audit information.

๐Ÿ’ผ AU-9(3) Cryptographic Protection (H)

Implement cryptographic mechanisms to protect the integrity of audit information and audit tools. **AU-9 (3) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that this enhancement requires the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or NSA approved cryptography (see SC-13).
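
An HMAC over each audit file is one way to detect tampering. A minimal sketch using the standard library, with the caveats that the hard-coded key is illustrative only and that FedRAMP requires FIPS-validated or NSA-approved cryptographic modules (see SC-13):

```python
import hashlib
import hmac
from pathlib import Path

# Illustrative only: the key must come from protected storage (KMS/HSM),
# never be hard-coded, and be handled by a FIPS-validated module.
INTEGRITY_KEY = b"replace-with-a-protected-key"

def seal(log_path: Path) -> str:
    """Compute an HMAC-SHA256 tag over an audit log file."""
    digest = hmac.new(INTEGRITY_KEY, log_path.read_bytes(), hashlib.sha256)
    return digest.hexdigest()

def verify(log_path: Path, expected_tag: str) -> bool:
    """Constant-time comparison of the stored tag against a fresh computation."""
    return hmac.compare_digest(seal(log_path), expected_tag)

log = Path("/tmp/audit.log")
log.write_text("example audit record\n")
tag = seal(log)
print("intact:", verify(log, tag))
```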

๐Ÿ’ผ Awareness and Training (PR.AT)

The organization's personnel and partners are provided cybersecurity awareness education and are trained to perform their cybersecurity-related duties and responsibilities consistent with related policies, procedures, and agreements.

๐Ÿ’ผ Back up data

Back up data, applications, and configuration to meet requirements for recovery time objectives (RTO) and recovery point objectives (RPO).
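
A simple way to monitor the RPO side of this requirement is to check the age of the newest backup against the objective; the backup location, file naming pattern, and 24-hour RPO below are placeholders.

```python
import time
from pathlib import Path

RPO_HOURS = 24                     # maximum tolerable data-loss window
BACKUP_DIR = Path("/backups/app")  # hypothetical backup destination

def rpo_met(backup_dir: Path = BACKUP_DIR, rpo_hours: int = RPO_HOURS) -> bool:
    """True if the newest backup is younger than the recovery point objective."""
    backups = list(backup_dir.glob("*.tar.gz"))
    if not backups:
        return False
    newest = max(p.stat().st_mtime for p in backups)
    return (time.time() - newest) <= rpo_hours * 3600

print("RPO satisfied" if rpo_met() else "RPO violated: take a backup")
```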

๐Ÿ’ผ Business Environment (ID.BE)

The organization's mission, objectives, stakeholders, and activities are understood and prioritized; this information is used to inform cybersecurity roles, responsibilities, and risk management decisions.

๐Ÿ’ผ c. selection and configuration – considerations when selecting and configuring vendor supplied software include due diligence as to the security testing conducted to identify vulnerabilities (either intended or deliberate); user access management capabilities (e.g. role based, support of segregation of duties); interface vulnerabilities; monitoring capabilities; encryption capabilities to protect sensitive data; ability to obtain and implement information security updates in a timely manner; compliance with the security policy framework; and configuration/implementation of the software which minimises the risk of a security compromise;

๐Ÿ’ผ CA-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] assessment, authorization, and monitoring policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the assessment, authorization, and monitoring policy and the associated assessment, authorization, and monitoring controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the assessment, authorization, and monitoring policy and procedures; and c. Review and update the current assessment, authorization, and monitoring: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ CA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] assessment, authorization, and monitoring policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the assessment, authorization, and monitoring policy and the associated assessment, authorization, and monitoring controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the assessment, authorization, and monitoring policy and procedures; and c. Review and update the current assessment, authorization, and monitoring: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] assessment, authorization, and monitoring policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the assessment, authorization, and monitoring policy and the associated assessment, authorization, and monitoring controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the assessment, authorization, and monitoring policy and procedures; and c. Review and update the current assessment, authorization, and monitoring: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] assessment, authorization, and monitoring policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the assessment, authorization, and monitoring policy and the associated assessment, authorization, and monitoring controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the assessment, authorization, and monitoring policy and procedures; and c. Review and update the current assessment, authorization, and monitoring: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CA-1 SECURITY ASSESSMENT AND AUTHORIZATION POLICY AND PROCEDURES

The organization: CA-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: CA-1a.1. A security assessment and authorization policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and CA-1a.2. Procedures to facilitate the implementation of the security assessment and authorization policy and associated security assessment and authorization controls; and CA-1b. Reviews and updates the current: CA-1b.1. Security assessment and authorization policy [Assignment: organization-defined frequency]; and CA-1b.2. Security assessment and authorization procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-2 (2) SPECIALIZED ASSESSMENTS

The organization includes as part of security control assessments, [Assignment: organization-defined frequency], [Selection: announced; unannounced], [Selection (one or more): in-depth monitoring; vulnerability scanning; malicious user testing; insider threat assessment; performance/load testing; [Assignment: organization-defined other forms of security assessment]].

๐Ÿ’ผ CA-2 (3) EXTERNAL ORGANIZATIONS

The organization accepts the results of an assessment of [Assignment: organization-defined information system] performed by [Assignment: organization-defined external organization] when the assessment meets [Assignment: organization-defined requirements].

๐Ÿ’ผ CA-2 Control Assessments

a. Select the appropriate assessor or assessment team for the type of assessment to be conducted; b. Develop a control assessment plan that describes the scope of the assessment including: 1. Controls and control enhancements under assessment; 2. Assessment procedures to be used to determine control effectiveness; and 3. Assessment environment, assessment team, and assessment roles and responsibilities; c. Ensure the control assessment plan is reviewed and approved by the authorizing official or designated representative prior to conducting the assessment; d. Assess the controls in the system and its environment of operation [Assignment: organization-defined frequency] to determine the extent to which the controls are implemented correctly, operating as intended, and producing the desired outcome with respect to meeting established security and privacy requirements; e. Produce a control assessment report that documents the results of the assessment; and f. Provide the results of the control assessment to [Assignment: organization-defined individuals or roles].

๐Ÿ’ผ CA-2 Control Assessments (L)(M)(H)

a. Select the appropriate assessor or assessment team for the type of assessment to be conducted; b. Develop a control assessment plan that describes the scope of the assessment including: 1. Controls and control enhancements under assessment; 2. Assessment procedures to be used to determine control effectiveness; and 3. Assessment environment, assessment team, and assessment roles and responsibilities; c. Ensure the control assessment plan is reviewed and approved by the authorizing official or designated representative prior to conducting the assessment; d. Assess the controls in the system and its environment of operation [FedRAMP Assignment: at least annually] to determine the extent to which the controls are implemented correctly, operating as intended, and producing the desired outcome with respect to meeting established security and privacy requirements; e. Produce a control assessment report that documents the results of the assessment; and f. Provide the results of the control assessment to [FedRAMP Assignment: individuals or roles to include FedRAMP PMO]. **CA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP Annual Assessment Guidance.

๐Ÿ’ผ CA-2 Control Assessments (L)(M)(H)

a. Select the appropriate assessor or assessment team for the type of assessment to be conducted; b. Develop a control assessment plan that describes the scope of the assessment including: 1. Controls and control enhancements under assessment; 2. Assessment procedures to be used to determine control effectiveness; and 3. Assessment environment, assessment team, and assessment roles and responsibilities; c. Ensure the control assessment plan is reviewed and approved by the authorizing official or designated representative prior to conducting the assessment; d. Assess the controls in the system and its environment of operation [FedRAMP Assignment: at least annually] to determine the extent to which the controls are implemented correctly, operating as intended, and producing the desired outcome with respect to meeting established security and privacy requirements; e. Produce a control assessment report that documents the results of the assessment; and f. Provide the results of the control assessment to [FedRAMP Assignment: individuals or roles to include FedRAMP PMO]. **CA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP Annual Assessment Guidance.

๐Ÿ’ผ CA-2 Control Assessments (L)(M)(H)

a. Select the appropriate assessor or assessment team for the type of assessment to be conducted; b. Develop a control assessment plan that describes the scope of the assessment including: 1. Controls and control enhancements under assessment; 2. Assessment procedures to be used to determine control effectiveness; and 3. Assessment environment, assessment team, and assessment roles and responsibilities; c. Ensure the control assessment plan is reviewed and approved by the authorizing official or designated representative prior to conducting the assessment; d. Assess the controls in the system and its environment of operation [FedRAMP Assignment: at least annually] to determine the extent to which the controls are implemented correctly, operating as intended, and producing the desired outcome with respect to meeting established security and privacy requirements; e. Produce a control assessment report that documents the results of the assessment; and f. Provide the results of the control assessment to [FedRAMP Assignment: individuals or roles to include FedRAMP PMO]. **CA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP Annual Assessment Guidance.

๐Ÿ’ผ CA-2 SECURITY ASSESSMENTS

The organization: CA-2a. Develops a security assessment plan that describes the scope of the assessment including: CA-2a.1. Security controls and control enhancements under assessment; CA-2a.2. Assessment procedures to be used to determine security control effectiveness; and CA-2a.3. Assessment environment, assessment team, and assessment roles and responsibilities; CA-2b. Assesses the security controls in the information system and its environment of operation [Assignment: organization-defined frequency] to determine the extent to which the controls are implemented correctly, operating as intended, and producing the desired outcome with respect to meeting established security requirements; CA-2c. Produces a security assessment report that documents the results of the assessment; and CA-2d. Provides the results of the security control assessment to [Assignment: organization-defined individuals or roles].

๐Ÿ’ผ CA-2(1) Independent Assessors (L)(M)(H)

Employ independent assessors or assessment teams to conduct control assessments. **CA-2 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB Authorization, must use an accredited 3PAO.

๐Ÿ’ผ CA-2(1) Independent Assessors (L)(M)(H)

Employ independent assessors or assessment teams to conduct control assessments. **CA-2 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB Authorization, must use an accredited 3PAO.

๐Ÿ’ผ CA-2(1) Independent Assessors (L)(M)(H)

Employ independent assessors or assessment teams to conduct control assessments. **CA-2 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB Authorization, must use an accredited 3PAO.

๐Ÿ’ผ CA-2(2) Control Assessments | Specialized Assessments

Include as part of control assessments, [Assignment: organization-defined frequency], [Selection: announced; unannounced], [Selection (one or more): in-depth monitoring; security instrumentation; automated security test cases; vulnerability scanning; malicious user testing; insider threat assessment; performance and load testing; data leakage or data loss assessment; [Assignment: organization-defined other forms of assessment]].

๐Ÿ’ผ CA-2(2) Specialized Assessments (H)

Include as part of control assessments [FedRAMP Assignment: at least annually], [Selection: announced; unannounced], [Selection (one-or-more): in-depth monitoring; security instrumentation; automated security test cases; vulnerability scanning; malicious user testing; insider threat assessment; performance and load testing; data leakage or data loss assessment; [Assignment: organization-defined other forms of assessment]]. **CA-2 (2) Additional FedRAMP Requirements and Guidance:** **Requirement**: To include 'announced', 'vulnerability scanning'.

๐Ÿ’ผ CA-3 Information Exchange

a. Approve and manage the exchange of information between the system and other systems using [Selection (one or more): interconnection security agreements; information exchange security agreements; memoranda of understanding or agreement; service level agreements; user agreements; nondisclosure agreements; [Assignment: organization-defined type of agreement]]; b. Document, as part of each exchange agreement, the interface characteristics, security and privacy requirements, controls, and responsibilities for each system, and the impact level of the information communicated; and c. Review and update the agreements [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-3 Information Exchange (L)(M)(H)

a. Approve and manage the exchange of information between the system and other systems using [Selection (one-or-more): interconnection security agreements; information exchange security agreements; memoranda of understanding or agreement; service level agreements; user agreements; nondisclosure agreements, [Assignment: organization-defined type of agreement]]; b. Document, as part of each exchange agreement, the interface characteristics, security and privacy requirements, controls, and responsibilities for each system, and the impact level of the information communicated; and c. Review and update the agreements [FedRAMP Assignment: at least annually and on input from JAB/AO].

๐Ÿ’ผ CA-3 Information Exchange (L)(M)(H)

a. Approve and manage the exchange of information between the system and other systems using [Selection (one-or-more): interconnection security agreements; information exchange security agreements; memoranda of understanding or agreement; service level agreements; user agreements; nondisclosure agreements, [Assignment: organization-defined type of agreement]]; b. Document, as part of each exchange agreement, the interface characteristics, security and privacy requirements, controls, and responsibilities for each system, and the impact level of the information communicated; and c. Review and update the agreements [FedRAMP Assignment: at least annually and on input from JAB/AO].

๐Ÿ’ผ CA-3 Information Exchange (L)(M)(H)

a. Approve and manage the exchange of information between the system and other systems using [Selection (one-or-more): interconnection security agreements; information exchange security agreements; memoranda of understanding or agreement; service level agreements; user agreements; nondisclosure agreements, [Assignment: organization-defined type of agreement]]; b. Document, as part of each exchange agreement, the interface characteristics, security and privacy requirements, controls, and responsibilities for each system, and the impact level of the information communicated; and c. Review and update the agreements [FedRAMP Assignment: at least annually and on input from JAB/AO].

๐Ÿ’ผ CA-3 SYSTEM INTERCONNECTIONS

The organization: CA-3a. Authorizes connections from the information system to other information systems through the use of Interconnection Security Agreements; CA-3b. Documents, for each interconnection, the interface characteristics, security requirements, and the nature of the information communicated; and CA-3c. Reviews and updates Interconnection Security Agreements [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-3(6) Transfer Authorizations (H)

Verify that individuals or systems transferring data between interconnecting systems have the requisite authorizations (i.e., write permissions or privileges) prior to accepting such data.

๐Ÿ’ผ CA-3(7) Information Exchange | Transitive Information Exchanges

(a) Identify transitive (downstream) information exchanges with other systems through the systems identified in CA-3a; and (b) Take measures to ensure that transitive (downstream) information exchanges cease when the controls on identified transitive (downstream) systems cannot be verified or validated.

๐Ÿ’ผ CA-5 Plan of Action and Milestones

a. Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and b. Update existing plan of action and milestones [Assignment: organization-defined frequency] based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities.

๐Ÿ’ผ CA-5 PLAN OF ACTION AND MILESTONES

The organization: CA-5a. Develops a plan of action and milestones for the information system to document the organization's planned remedial actions to correct weaknesses or deficiencies noted during the assessment of the security controls and to reduce or eliminate known vulnerabilities in the system; and CA-5b. Updates existing plan of action and milestones [Assignment: organization-defined frequency] based on the findings from security controls assessments, security impact analyses, and continuous monitoring activities.

๐Ÿ’ผ CA-5 Plan of Action and Milestones (L)(M)(H)

a. Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and b. Update existing plan of action and milestones [FedRAMP Assignment: at least monthly] based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities. **CA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP-POAM-Template. **Requirement**: POA&Ms must be provided at least monthly.

๐Ÿ’ผ CA-5 Plan of Action and Milestones (L)(M)(H)

a. Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and b. Update existing plan of action and milestones [FedRAMP Assignment: at least monthly] based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities. **CA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP-POAM-Template. **Requirement**: POA&Ms must be provided at least monthly.

๐Ÿ’ผ CA-5 Plan of Action and Milestones (L)(M)(H)

a. Develop a plan of action and milestones for the system to document the planned remediation actions of the organization to correct weaknesses or deficiencies noted during the assessment of the controls and to reduce or eliminate known vulnerabilities in the system; and b. Update existing plan of action and milestones [FedRAMP Assignment: at least monthly] based on the findings from control assessments, independent audits or reviews, and continuous monitoring activities. **CA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference FedRAMP-POAM-Template. **Requirement**: POA&Ms must be provided at least monthly.
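As a hedged illustration of part (b), the sketch below flags POA&M entries whose last update is older than the FedRAMP "at least monthly" cadence. The CSV layout (`poam.csv` with `id`, `weakness`, and ISO-format `last_updated` columns) is hypothetical, not a FedRAMP-mandated format.

```python
# Minimal sketch: flag POA&M entries not updated within the "at least monthly"
# cadence. The CSV column names are hypothetical.
import csv
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)  # "at least monthly" update cadence

def stale_poam_items(path: str) -> list[dict]:
    """Return POA&M rows whose last_updated date is older than MAX_AGE."""
    now = datetime.now(timezone.utc)
    stale = []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            updated = datetime.fromisoformat(row["last_updated"])
            if updated.tzinfo is None:
                updated = updated.replace(tzinfo=timezone.utc)
            if now - updated > MAX_AGE:
                stale.append(row)
    return stale

if __name__ == "__main__":
    for item in stale_poam_items("poam.csv"):
        print(f"POA&M {item['id']} overdue: last updated {item['last_updated']}")
```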

๐Ÿ’ผ CA-6 Authorization

a. Assign a senior official as the authorizing official for the system; b. Assign a senior official as the authorizing official for common controls available for inheritance by organizational systems; c. Ensure that the authorizing official for the system, before commencing operations: 1. Accepts the use of common controls inherited by the system; and 2. Authorizes the system to operate; d. Ensure that the authorizing official for common controls authorizes the use of those controls for inheritance by organizational systems; e. Update the authorizations [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-6 Authorization (L)(M)(H)

a. Assign a senior official as the authorizing official for the system; b. Assign a senior official as the authorizing official for common controls available for inheritance by organizational systems; c. Ensure that the authorizing official for the system, before commencing operations: 1. Accepts the use of common controls inherited by the system; and 2. Authorizes the system to operate; d. Ensure that the authorizing official for common controls authorizes the use of those controls for inheritance by organizational systems; e. Update the authorizations [FedRAMP Assignment: in accordance with OMB A-130 requirements or when a significant change occurs]. **CA-6 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F and according to FedRAMP Significant Change Policies and Procedures. The service provider describes the types of changes to the information system or the environment of operations that would impact the risk posture. The types of changes are approved and accepted by the JAB/AO.

๐Ÿ’ผ CA-6 Authorization (L)(M)(H)

a. Assign a senior official as the authorizing official for the system; b. Assign a senior official as the authorizing official for common controls available for inheritance by organizational systems; c. Ensure that the authorizing official for the system, before commencing operations: 1. Accepts the use of common controls inherited by the system; and 2. Authorizes the system to operate; d. Ensure that the authorizing official for common controls authorizes the use of those controls for inheritance by organizational systems; e. Update the authorizations [FedRAMP Assignment: in accordance with OMB A-130 requirements or when a significant change occurs]. **CA-6 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F and according to FedRAMP Significant Change Policies and Procedures. The service provider describes the types of changes to the information system or the environment of operations that would impact the risk posture. The types of changes are approved and accepted by the JAB/AO.

๐Ÿ’ผ CA-6 Authorization (L)(M)(H)

a. Assign a senior official as the authorizing official for the system; b. Assign a senior official as the authorizing official for common controls available for inheritance by organizational systems; c. Ensure that the authorizing official for the system, before commencing operations: 1. Accepts the use of common controls inherited by the system; and 2. Authorizes the system to operate; d. Ensure that the authorizing official for common controls authorizes the use of those controls for inheritance by organizational systems; e. Update the authorizations [FedRAMP Assignment: in accordance with OMB A-130 requirements or when a significant change occurs]. **CA-6 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F and according to FedRAMP Significant Change Policies and Procedures. The service provider describes the types of changes to the information system or the environment of operations that would impact the risk posture. The types of changes are approved and accepted by the JAB/AO.

๐Ÿ’ผ CA-6 SECURITY AUTHORIZATION

The organization: CA-6a. Assigns a senior-level executive or manager as the authorizing official for the information system; CA-6b. Ensures that the authorizing official authorizes the information system for processing before commencing operations; and CA-6c. Updates the security authorization [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-7 (1) INDEPENDENT ASSESSMENT

The organization employs assessors or assessment teams with [Assignment: organization-defined level of independence] to monitor the security controls in the information system on an ongoing basis.

๐Ÿ’ผ CA-7 (3) TREND ANALYSES

The organization employs trend analyses to determine if security control implementations, the frequency of continuous monitoring activities, and/or the types of activities used in the continuous monitoring process need to be modified based on empirical data.

๐Ÿ’ผ CA-7 Continuous Monitoring

Develop a system-level continuous monitoring strategy and implement continuous monitoring in accordance with the organization-level continuous monitoring strategy that includes: a. Establishing the following system-level metrics to be monitored: [Assignment: organization-defined system-level metrics]; b. Establishing [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessment of control effectiveness; c. Ongoing control assessments in accordance with the continuous monitoring strategy; d. Ongoing monitoring of system and organization-defined metrics in accordance with the continuous monitoring strategy; e. Correlation and analysis of information generated by control assessments and monitoring; f. Response actions to address results of the analysis of control assessment and monitoring information; and g. Reporting the security and privacy status of the system to [Assignment: organization-defined personnel or roles] [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-7 CONTINUOUS MONITORING

The organization develops a continuous monitoring strategy and implements a continuous monitoring program that includes: CA-7a. Establishment of [Assignment: organization-defined metrics] to be monitored; CA-7b. Establishment of [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessments supporting such monitoring; CA-7c. Ongoing security control assessments in accordance with the organizational continuous monitoring strategy; CA-7d. Ongoing security status monitoring of organization-defined metrics in accordance with the organizational continuous monitoring strategy; CA-7e. Correlation and analysis of security-related information generated by assessments and monitoring; CA-7f. Response actions to address results of the analysis of security-related information; and CA-7g. Reporting the security status of organization and the information system to [Assignment: organization-defined personnel or roles] [Assignment: organization-defined frequency].

๐Ÿ’ผ CA-7 Continuous Monitoring (L)(M)(H)

Develop a system-level continuous monitoring strategy and implement continuous monitoring in accordance with the organization-level continuous monitoring strategy that includes: a. Establishing the following system-level metrics to be monitored: [Assignment: organization-defined system-level metrics]; b. Establishing [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessment of control effectiveness; c. Ongoing control assessments in accordance with the continuous monitoring strategy; d. Ongoing monitoring of system and organization-defined metrics in accordance with the continuous monitoring strategy; e. Correlation and analysis of information generated by control assessments and monitoring; f. Response actions to address results of the analysis of control assessment and monitoring information; and g. Reporting the security and privacy status of the system to [FedRAMP Assignment: to include JAB/AO][Assignment: organization-defined frequency]. **CA-7 Additional FedRAMP Requirements and Guidance:** **Guidance**: FedRAMP does not provide a template for the Continuous Monitoring Plan. CSPs should reference the FedRAMP Continuous Monitoring Strategy Guide when developing the Continuous Monitoring Plan. **Requirement**: Operating System, Database, Web Application, Container, and Service Configuration Scans: at least monthly. All scans performed by Independent Assessor: at least annually. **Requirement**: CSOs with more than one agency ATO must implement a collaborative Continuous Monitoring (Con Mon) approach described in the FedRAMP Guide for Multi-Agency Continuous Monitoring. This requirement applies to CSOs authorized via the Agency path as each agency customer is responsible for performing Con Mon oversight. It does not apply to CSPs authorized via the JAB path because the JAB performs Con Mon oversight.

๐Ÿ’ผ CA-7 Continuous Monitoring (L)(M)(H)

Develop a system-level continuous monitoring strategy and implement continuous monitoring in accordance with the organization-level continuous monitoring strategy that includes: a. Establishing the following system-level metrics to be monitored: [Assignment: organization-defined system-level metrics]; b. Establishing [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessment of control effectiveness; c. Ongoing control assessments in accordance with the continuous monitoring strategy; d. Ongoing monitoring of system and organization-defined metrics in accordance with the continuous monitoring strategy; e. Correlation and analysis of information generated by control assessments and monitoring; f. Response actions to address results of the analysis of control assessment and monitoring information; and g. Reporting the security and privacy status of the system to [FedRAMP Assignment: to include JAB/AO][Assignment: organization-defined frequency]. **CA-7 Additional FedRAMP Requirements and Guidance:** **Guidance**: FedRAMP does not provide a template for the Continuous Monitoring Plan. CSPs should reference the FedRAMP Continuous Monitoring Strategy Guide when developing the Continuous Monitoring Plan. **Requirement**: Operating System, Database, Web Application, Container, and Service Configuration Scans: at least monthly. All scans performed by Independent Assessor: at least annually. **Requirement**: CSOs with more than one agency ATO must implement a collaborative Continuous Monitoring (Con Mon) approach described in the FedRAMP Guide for Multi-Agency Continuous Monitoring. This requirement applies to CSOs authorized via the Agency path as each agency customer is responsible for performing Con Mon oversight. It does not apply to CSPs authorized via the JAB path because the JAB performs Con Mon oversight.

๐Ÿ’ผ CA-7 Continuous Monitoring (L)(M)(H)

Develop a system-level continuous monitoring strategy and implement continuous monitoring in accordance with the organization-level continuous monitoring strategy that includes: a. Establishing the following system-level metrics to be monitored: [Assignment: organization-defined system-level metrics]; b. Establishing [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessment of control effectiveness; c. Ongoing control assessments in accordance with the continuous monitoring strategy; d. Ongoing monitoring of system and organization-defined metrics in accordance with the continuous monitoring strategy; e. Correlation and analysis of information generated by control assessments and monitoring; f. Response actions to address results of the analysis of control assessment and monitoring information; and g. Reporting the security and privacy status of the system to [FedRAMP Assignment: to include JAB/AO][Assignment: organization-defined frequency]. **CA-7 Additional FedRAMP Requirements and Guidance:** **Guidance**: FedRAMP does not provide a template for the Continuous Monitoring Plan. CSPs should reference the FedRAMP Continuous Monitoring Strategy Guide when developing the Continuous Monitoring Plan. **Requirement**: Operating System, Database, Web Application, Container, and Service Configuration Scans: at least monthly. All scans performed by Independent Assessor: at least annually. **Requirement**: CSOs with more than one agency ATO must implement a collaborative Continuous Monitoring (Con Mon) approach described in the FedRAMP Guide for Multi-Agency Continuous Monitoring. This requirement applies to CSOs authorized via the Agency path as each agency customer is responsible for performing Con Mon oversight. It does not apply to CSPs authorized via the JAB path because the JAB performs Con Mon oversight.
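One possible input to the status reporting in item (g) is a periodic summary of active findings from a monitoring service. The sketch below, which assumes AWS Security Hub and configured boto3 credentials, counts active findings by severity; the choice of data source is illustrative, not a FedRAMP requirement.

```python
# Minimal sketch: summarize active AWS Security Hub findings by severity as one
# possible input to periodic security-status reporting. Assumes boto3
# credentials/region are already configured.
import boto3
from collections import Counter

def summarize_active_findings() -> Counter:
    client = boto3.client("securityhub")
    counts: Counter = Counter()
    filters = {"RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}]}
    for page in client.get_paginator("get_findings").paginate(Filters=filters):
        for finding in page["Findings"]:
            counts[finding["Severity"]["Label"]] += 1
    return counts

if __name__ == "__main__":
    for severity, count in summarize_active_findings().most_common():
        print(f"{severity}: {count}")
```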

๐Ÿ’ผ CA-7(3) Continuous Monitoring | Trend Analyses

Employ trend analyses to determine if control implementations, the frequency of continuous monitoring activities, and the types of activities used in the continuous monitoring process need to be modified based on empirical data.

๐Ÿ’ผ CA-7(4) Risk Monitoring (L)(M)(H)

Ensure risk monitoring is an integral part of the continuous monitoring strategy that includes the following: (a) Effectiveness monitoring; (b) Compliance monitoring; and (c) Change monitoring.

๐Ÿ’ผ CA-7(4) Risk Monitoring (L)(M)(H)

Ensure risk monitoring is an integral part of the continuous monitoring strategy that includes the following: (a) Effectiveness monitoring; (b) Compliance monitoring; and (c) Change monitoring.

๐Ÿ’ผ CA-7(4) Risk Monitoring (L)(M)(H)

Ensure risk monitoring is an integral part of the continuous monitoring strategy that includes the following: (a) Effectiveness monitoring; (b) Compliance monitoring; and (c) Change monitoring.

๐Ÿ’ผ CA-8 (2) RED TEAM EXERCISES

The organization employs [Assignment: organization-defined red team exercises] to simulate attempts by adversaries to compromise organizational information systems in accordance with [Assignment: organization-defined rules of engagement].

๐Ÿ’ผ CA-8 Penetration Testing

Conduct penetration testing [Assignment: organization-defined frequency] on [Assignment: organization-defined systems or system components].

๐Ÿ’ผ CA-8 PENETRATION TESTING

The organization conducts penetration testing [Assignment: organization-defined frequency] on [Assignment: organization-defined information systems or system components].

๐Ÿ’ผ CA-8 Penetration Testing (L)(M)(H)

Conduct penetration testing [FedRAMP Assignment: at least annually] on [Assignment: organization-defined systems or system components]. **CA-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference the FedRAMP Penetration Test Guidance.

๐Ÿ’ผ CA-8 Penetration Testing (L)(M)(H)

Conduct penetration testing [FedRAMP Assignment: at least annually] on [Assignment: organization-defined systems or system components]. **CA-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference the FedRAMP Penetration Test Guidance.

๐Ÿ’ผ CA-8 Penetration Testing (L)(M)(H)

Conduct penetration testing [FedRAMP Assignment: at least annually] on [Assignment: organization-defined systems or system components]. **CA-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: Reference the FedRAMP Penetration Test Guidance.

๐Ÿ’ผ CA-8(2) Red Team Exercises (M)(H)

Employ the following red-team exercises to simulate attempts by adversaries to compromise organizational systems in accordance with applicable rules of engagement: [Assignment: organization-defined red team exercises]. **CA-8(2) Additional FedRAMP Requirements and Guidance:** **Guidance**: See the FedRAMP Documents page > Penetration Test Guidance <https://www.FedRAMP.gov/documents/>.

๐Ÿ’ผ CA-8(2) Red Team Exercises (M)(H)

Employ the following red-team exercises to simulate attempts by adversaries to compromise organizational systems in accordance with applicable rules of engagement: [Assignment: organization-defined red team exercises]. **CA-8(2) Additional FedRAMP Requirements and Guidance:** **Guidance**: See the FedRAMP Documents page > Penetration Test Guidance <https://www.FedRAMP.gov/documents/>.

๐Ÿ’ผ CA-9 Internal System Connections

a. Authorize internal connections of [Assignment: organization-defined system components or classes of components] to the system; b. Document, for each internal connection, the interface characteristics, security and privacy requirements, and the nature of the information communicated; c. Terminate internal system connections after [Assignment: organization-defined conditions]; and d. Review [Assignment: organization-defined frequency] the continued need for each internal connection.

๐Ÿ’ผ CA-9 INTERNAL SYSTEM CONNECTIONS

The organization: CA-9a. Authorizes internal connections of [Assignment: organization-defined information system components or classes of components] to the information system; and CA-9b. Documents, for each internal connection, the interface characteristics, security requirements, and the nature of the information communicated.

๐Ÿ’ผ CA-9 Internal System Connections (L)(M)(H)

a. Authorize internal connections of [Assignment: organization-defined system components or classes of components] to the system; b. Document, for each internal connection, the interface characteristics, security and privacy requirements, and the nature of the information communicated; c. Terminate internal system connections after [Assignment: organization-defined conditions]; and d. Review [FedRAMP Assignment: at least annually] the continued need for each internal connection.

๐Ÿ’ผ CA-9 Internal System Connections (L)(M)(H)

a. Authorize internal connections of [Assignment: organization-defined system components or classes of components] to the system; b. Document, for each internal connection, the interface characteristics, security and privacy requirements, and the nature of the information communicated; c. Terminate internal system connections after [Assignment: organization-defined conditions]; and d. Review [FedRAMP Assignment: at least annually] the continued need for each internal connection.

๐Ÿ’ผ CA-9 Internal System Connections (L)(M)(H)

a. Authorize internal connections of [Assignment: organization-defined system components or classes of components] to the system; b. Document, for each internal connection, the interface characteristics, security and privacy requirements, and the nature of the information communicated; c. Terminate internal system connections after [Assignment: organization-defined conditions]; and d. Review [FedRAMP Assignment: at least annually] the continued need for each internal connection.
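As one hedged example of supporting the annual review in part (d), the sketch below inventories security-group-to-security-group rules in an EC2/VPC environment, a common form of internal connection. It assumes configured boto3 credentials and does not claim to capture every kind of internal connection.

```python
# Minimal sketch: list security-group rules that reference other security
# groups as a starting point for reviewing internal connections.
import boto3

def internal_sg_references() -> list[tuple[str, str, str]]:
    """Return (source group, referenced group, protocol) tuples."""
    ec2 = boto3.client("ec2")
    refs = []
    for page in ec2.get_paginator("describe_security_groups").paginate():
        for group in page["SecurityGroups"]:
            for perm in group["IpPermissions"]:
                for pair in perm.get("UserIdGroupPairs", []):
                    refs.append((group["GroupId"], pair["GroupId"],
                                 perm.get("IpProtocol", "-1")))
    return refs

if __name__ == "__main__":
    for src, dst, proto in internal_sg_references():
        print(f"{src} -> {dst} ({proto})")
```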

๐Ÿ’ผ CC1.1-2 Sets the Tone at the Top

The board of directors and management, at all levels, demonstrate through their directives, actions, and behavior the importance of integrity and ethical values to support the functioning of the system of internal control.

๐Ÿ’ผ CC1.1-4 Establishes Standards of Conduct

The expectations of the board of directors and senior management concerning integrity and ethical values are defined in the entityโ€™s standards of conduct and understood at all levels of the entity and by outsourced service providers and business partners.

๐Ÿ’ผ CC1.2-2 Applies Relevant Expertise

The board of directors defines, maintains, and periodically evaluates the skills and expertise needed among its members to enable them to ask probing questions of senior management and take commensurate action.

๐Ÿ’ผ CC1.2-4 Supplements Board Expertise

The board of directors supplements its expertise relevant to security, availability, processing integrity, confidentiality, and privacy, as needed, through the use of a subcommittee or consultants.

๐Ÿ’ผ CC1.3-2 Establishes Reporting Lines

Management designs and evaluates lines of reporting for each entity structure to enable execution of authorities and responsibilities and flow of information to manage the activities of the entity.

๐Ÿ’ผ CC1.5-2 Establishes Performance Measures, Incentives, and Rewards

Management and the board of directors establish performance measures, incentives, and other rewards appropriate for responsibilities at all levels of the entity, reflecting appropriate dimensions of performance and expected standards of conduct, and considering the achievement of both short-term and longer-term objectives.

๐Ÿ’ผ CC1.5-4 Considers Excessive Pressures

Management and the board of directors evaluate and adjust pressures associated with the achievement of objectives as they assign responsibilities, develop performance measures, and evaluate performance.

๐Ÿ’ผ CC2.2-5 Communicates Responsibilities

Entity personnel with responsibility for designing, developing, implementing, operating, maintaining, or monitoring system controls receive communications about their responsibilities, including changes in their responsibilities, and have the information necessary to carry out those responsibilities.

๐Ÿ’ผ CC2.3-11 Communicates System Responsibilities

External users with responsibility for designing, developing, implementing, operating, maintaining, and monitoring system controls receive communications about their responsibilities and have the information necessary to carry out those responsibilities.

๐Ÿ’ผ CC2.3-2 Enables Inbound Communications

Open communication channels allow input from customers, consumers, suppliers, external auditors, regulators, financial analysts, and others, providing management and the board of directors with relevant information.

๐Ÿ’ผ CC3.2-9 Assesses the Significance of the Risks

The entity assesses the significance of the identified risks, including (1) determining the criticality of system components, including information assets, in achieving the objectives; (2) assessing the susceptibility of the identified vulnerabilities to the identified threats; (3) assessing the likelihood of the identified risks; (4) assessing the magnitude of the effect of potential risks to the achievement of the objectives; (5) considering the potential effects of unidentified threats and vulnerabilities on the assessed risks; (6) developing risk mitigation strategies to address the assessed risks; and (7) evaluating the appropriateness of residual risk (including whether to accept, reduce, or share such risks).

๐Ÿ’ผ CC3.3-3 Assesses Opportunities

The assessment of fraud risk considers opportunities for unauthorized acquisition, use, or disposal of assets, altering the entity's reporting records, or committing other inappropriate acts.

๐Ÿ’ผ CC3.4-2 Assesses Changes in the Business Model

The entity considers the potential impacts of new business lines, dramatically altered compositions of existing business lines, acquired or divested business operations on the system of internal control, rapid growth, changing reliance on foreign geographies, and new technologies.

๐Ÿ’ผ CC5.1-2 Considers Entity-Specific Factors

Management considers how the environment, complexity, nature, and scope of its operations, as well as the specific characteristics of its organization, affect the selection and development of control activities.

๐Ÿ’ผ CC6.1-3 Restricts Logical Access

Logical access to information assets, including hardware, data (at-rest, during processing, or in transmission), software, administrative authorities, mobile devices, output, and offline system components is restricted through the use of access control software and rule sets.

๐Ÿ’ผ CC6.1-6 Manages Points of Access

Points of access by outside entities and the types of data that flow through the points of access are identified, inventoried, and managed. The types of individuals and systems using each point of access are identified, documented, and managed.

๐Ÿ’ผ CC6.1-9 Manages Credentials for Infrastructure and Software

New internal and external infrastructure and software are registered, authorized, and documented prior to being granted access credentials and implemented on the network or access point. Credentials are removed and access is disabled when access is no longer required or the infrastructure and software are no longer in use.
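A hedged sketch of the "removed when no longer required" half of this criterion: flag IAM access keys that have not been used within a defined window so they can be disabled or deleted. The 90-day threshold is illustrative, and the code assumes configured boto3 credentials.

```python
# Minimal sketch: flag IAM access keys unused beyond a threshold so credentials
# can be disabled or removed when access is no longer required.
import boto3
from datetime import datetime, timedelta, timezone

THRESHOLD = timedelta(days=90)  # illustrative review window

def unused_access_keys() -> list[tuple[str, str]]:
    iam = boto3.client("iam")
    now = datetime.now(timezone.utc)
    flagged = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            keys = iam.list_access_keys(UserName=user["UserName"])
            for key in keys["AccessKeyMetadata"]:
                last = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
                used = last["AccessKeyLastUsed"].get("LastUsedDate")
                if used is None or now - used > THRESHOLD:
                    flagged.append((user["UserName"], key["AccessKeyId"]))
    return flagged

if __name__ == "__main__":
    for user, key_id in unused_access_keys():
        print(f"Review credential {key_id} for {user}: unused beyond threshold")
```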

๐Ÿ’ผ CC6.2-1 Creates Access Credentials to Protected Information Assets

The entity creates credentials for accessing protected information assets based on an authorization from the system's asset owner or authorized custodian. Authorization is required for the creation of all types of credentials of individuals (for example, employees, contractors, vendors, and business partner personnel), systems, and software.

๐Ÿ’ผ CC6.3-3 Uses Access Control Structures

The entity uses access control structures, such as role-based access controls, to restrict access to protected information assets, limit privileges, and support segregation of incompatible functions.
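The sketch below is a minimal, hypothetical illustration of such a structure: permissions attach to roles, users attach to roles, and incompatible duties (submitting versus approving a change) sit in separate roles. Role and permission names are invented for the example.

```python
# Minimal sketch of a role-based access control structure with segregation of
# incompatible functions. All names are hypothetical.
ROLE_PERMISSIONS = {
    "change_requester": {"change:submit", "change:view"},
    "change_approver": {"change:approve", "change:view"},
    "auditor": {"change:view", "log:read"},
}

USER_ROLES = {
    "alice": {"change_requester"},
    "bob": {"change_approver"},
    "carol": {"auditor"},
}

def is_allowed(user: str, permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

assert is_allowed("alice", "change:submit")
assert not is_allowed("alice", "change:approve")  # segregation of duties
```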

๐Ÿ’ผ CC6.3-4 Reviews Access Roles and Rules

The appropriateness of access roles and access rules is reviewed on a periodic basis for unnecessary and inappropriate individuals (for example, employees, contractors, vendors, business partner personnel) and inappropriate system or service accounts. Access roles and rules are modified, as appropriate.

๐Ÿ’ผ CC6.4-3 Recovers Physical Devices

Processes are in place to recover entity devices (for example, badges, laptops, and mobile devices) when an employee, contractor, vendor, or business partner no longer requires access.

๐Ÿ’ผ CC6.6-4 Implements Boundary Protection Systems

Boundary protection systems (for example, firewalls, demilitarized zones, and intrusion detection systems) are implemented to protect external access points from attempts and unauthorized access and are monitored to detect such attempts.

๐Ÿ’ผ CC7.1-5 Conducts Vulnerability Scans

The entity conducts vulnerability scans designed to identify potential vulnerabilities or misconfigurations on a periodic basis and after any significant change in the environment and takes action to remediate identified deficiencies on a timely basis.

๐Ÿ’ผ CC7.2-1 Implements Detection Policies, Procedures, and Tools

Detection policies and procedures are defined and implemented, and detection tools are implemented on infrastructure and software to identify anomalies in the operation or unusual activity on systems. Procedures may include (1) a defined governance process for security event detection and management that includes provision of resources; (2) use of intelligence sources to identify newly discovered threats and vulnerabilities; and (3) logging of unusual system activities.

๐Ÿ’ผ CC7.2-2 Designs Detection Measures

Detection measures are designed to identify anomalies that could result from actual or attempted (1) compromise of physical barriers; (2) unauthorized actions of authorized personnel; (3) use of compromised identification and authentication credentials; (4) unauthorized access from outside the system boundaries; (5) compromise of authorized external parties; and (6) implementation or connection of unauthorized hardware and software.

๐Ÿ’ผ CC7.4-11 Periodically Evaluates Incidents

Periodically, management reviews incidents related to security, availability, processing integrity, confidentiality, and privacy and identifies the need for system changes based on incident patterns and root causes.

๐Ÿ’ผ CC7.4-14 Application of Sanctions

The conduct of individuals and organizations operating under the authority of the entity and involved in the unauthorized use or disclosure of personal information is evaluated and, if appropriate, sanctioned in accordance with entity policies and legal and regulatory requirements.

๐Ÿ’ผ CC7.5-6 Implements Incident Recovery Plan Testing

Incident recovery plan testing is performed on a periodic basis. The testing includes (1) development of testing scenarios based on threat likelihood and magnitude; (2) consideration of relevant system components from across the entity that can impair availability; (3) scenarios that consider the potential for the lack of availability of key personnel; and (4) revision of continuity plans and systems based on test results.

๐Ÿ’ผ CC8.1-15 Considers System Resilience

The entity considers system resilience when designing its systems and tests resilience during development to help ensure the entity's ability to respond to, recover from, and resume operations through significant disruptions.

๐Ÿ’ผ CC8.1-18 Privacy by Design

The entity considers privacy requirements in the design of its systems and processes and limits the collection and processing of personal information to what is necessary for the identified purpose.

๐Ÿ’ผ CC8.1-4 Documents Changes

A process is in place to document system changes to support ongoing maintenance of the system and to support system users in performing their responsibilities.

๐Ÿ’ผ CC9.1-1 Considers Mitigation of Risks of Business Disruption

Risk mitigation activities include the development of planned policies, procedures, communications, and alternative processing solutions to respond to, mitigate, and recover from security events that disrupt business operations. Those policies and procedures include monitoring processes and information and communications to meet the entity's objectives during response, mitigation, and recovery efforts.

๐Ÿ’ผ CC9.2-2 Identifies Vulnerabilities

The entity evaluates vulnerabilities arising from vendor and business partner relationships, including third-party access to the entity's IT systems and connections with third-party networks.

๐Ÿ’ผ CM-1 CONFIGURATION MANAGEMENT POLICY AND PROCEDURES

The organization: CM-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: CM-1a.1. A configuration management policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and CM-1a.2. Procedures to facilitate the implementation of the configuration management policy and associated configuration management controls; and CM-1b. Reviews and updates the current: CM-1b.1. Configuration management policy [Assignment: organization-defined frequency]; and CM-1b.2. Configuration management procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] configuration management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the configuration management policy and the associated configuration management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the configuration management policy and procedures; and c. Review and update the current configuration management: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ CM-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] configuration management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the configuration management policy and the associated configuration management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the configuration management policy and procedures; and c. Review and update the current configuration management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CM-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] configuration management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the configuration management policy and the associated configuration management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the configuration management policy and procedures; and c. Review and update the current configuration management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CM-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] configuration management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the configuration management policy and the associated configuration management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the configuration management policy and procedures; and c. Review and update the current configuration management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CM-10 Software Usage Restrictions

a. Use software and associated documentation in accordance with contract agreements and copyright laws; b. Track the use of software and associated documentation protected by quantity licenses to control copying and distribution; and c. Control and document the use of peer-to-peer file sharing technology to ensure that this capability is not used for the unauthorized distribution, display, performance, or reproduction of copyrighted work.

๐Ÿ’ผ CM-10 SOFTWARE USAGE RESTRICTIONS

The organization: CM-10a. Uses software and associated documentation in accordance with contract agreements and copyright laws; CM-10b. Tracks the use of software and associated documentation protected by quantity licenses to control copying and distribution; and CM-10c. Controls and documents the use of peer-to-peer file sharing technology to ensure that this capability is not used for the unauthorized distribution, display, performance, or reproduction of copyrighted work.

๐Ÿ’ผ CM-10 Software Usage Restrictions (L)(M)(H)

a. Use software and associated documentation in accordance with contract agreements and copyright laws; b. Track the use of software and associated documentation protected by quantity licenses to control copying and distribution; and c. Control and document the use of peer-to-peer file sharing technology to ensure that this capability is not used for the unauthorized distribution, display, performance, or reproduction of copyrighted work.

๐Ÿ’ผ CM-10 Software Usage Restrictions (L)(M)(H)

a. Use software and associated documentation in accordance with contract agreements and copyright laws; b. Track the use of software and associated documentation protected by quantity licenses to control copying and distribution; and c. Control and document the use of peer-to-peer file sharing technology to ensure that this capability is not used for the unauthorized distribution, display, performance, or reproduction of copyrighted work.

๐Ÿ’ผ CM-10 Software Usage Restrictions (L)(M)(H)

a. Use software and associated documentation in accordance with contract agreements and copyright laws; b. Track the use of software and associated documentation protected by quantity licenses to control copying and distribution; and c. Control and document the use of peer-to-peer file sharing technology to ensure that this capability is not used for the unauthorized distribution, display, performance, or reproduction of copyrighted work.
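As a hedged illustration of part (b), the sketch below compares observed installs of quantity-licensed software against purchased seat counts. The entitlement and inventory data structures are hypothetical stand-ins for an organization's license and asset records.

```python
# Minimal sketch: report products whose deployed copies exceed the licensed
# quantity. Data structures are hypothetical.
ENTITLEMENTS = {"acme-editor": 25, "acme-db": 10}   # purchased seats
INSTALL_COUNTS = {"acme-editor": 27, "acme-db": 8}  # observed installs

def over_deployed(entitlements: dict, installs: dict) -> dict:
    """Return products whose install count exceeds the licensed quantity."""
    return {name: (installs[name], seats)
            for name, seats in entitlements.items()
            if installs.get(name, 0) > seats}

for product, (used, licensed) in over_deployed(ENTITLEMENTS, INSTALL_COUNTS).items():
    print(f"{product}: {used} installed, {licensed} licensed")
```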

๐Ÿ’ผ CM-11 User-installed Software

a. Establish [Assignment: organization-defined policies] governing the installation of software by users; b. Enforce software installation policies through the following methods: [Assignment: organization-defined methods]; and c. Monitor policy compliance [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-11 USER-INSTALLED SOFTWARE

The organization: CM-11a. Establishes [Assignment: organization-defined policies] governing the installation of software by users; CM-11b. Enforces software installation policies through [Assignment: organization-defined methods]; and CM-11c. Monitors policy compliance at [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-11 User-installed Software (L)(M)(H)

a. Establish [Assignment: organization-defined policies] governing the installation of software by users; b. Enforce software installation policies through the following methods: [Assignment: organization-defined methods]; and c. Monitor policy compliance [FedRAMP Assignment: Continuously (via CM-7 (5))].

๐Ÿ’ผ CM-11 User-installed Software (L)(M)(H)

a. Establish [Assignment: organization-defined policies] governing the installation of software by users; b. Enforce software installation policies through the following methods: [Assignment: organization-defined methods]; and c. Monitor policy compliance [FedRAMP Assignment: Continuously (via CM-7 (5))].

๐Ÿ’ผ CM-11 User-installed Software (L)(M)(H)

a. Establish [Assignment: organization-defined policies] governing the installation of software by users; b. Enforce software installation policies through the following methods: [Assignment: organization-defined methods]; and c. Monitor policy compliance [FedRAMP Assignment: Continuously (via CM-7 (5))].
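One narrow, hedged example of a monitoring method for part (c): compare the packages present in a Python environment against an approved allowlist. The allowlist contents are hypothetical; real enforcement would typically rely on endpoint management tooling.

```python
# Minimal sketch: report Python packages in the current environment that are
# not on an approved allowlist. Allowlist contents are hypothetical.
from importlib import metadata

APPROVED = {"pip", "setuptools", "wheel", "boto3", "botocore"}

def unapproved_packages() -> list[str]:
    installed = {dist.metadata["Name"].lower() for dist in metadata.distributions()}
    return sorted(installed - {name.lower() for name in APPROVED})

if __name__ == "__main__":
    for package in unapproved_packages():
        print(f"Not on allowlist: {package}")
```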

๐Ÿ’ผ CM-12 Information Location

a. Identify and document the location of [Assignment: organization-defined information] and the specific system components on which the information is processed and stored; b. Identify and document the users who have access to the system and system components where the information is processed and stored; and c. Document changes to the location (i.e., system or system components) where the information is processed and stored.

๐Ÿ’ผ CM-12 Information Location (M)(H)

a. Identify and document the location of [Assignment: organization-defined information] and the specific system components on which the information is processed and stored; b. Identify and document the users who have access to the system and system components where the information is processed and stored; and c. Document changes to the location (i.e., system or system components) where the information is processed and stored. **CM-12 Additional FedRAMP Requirements and Guidance:** **Requirement**: According to FedRAMP Authorization Boundary Guidance.

๐Ÿ’ผ CM-12 Information Location (M)(H)

a. Identify and document the location of [Assignment: organization-defined information] and the specific system components on which the information is processed and stored; b. Identify and document the users who have access to the system and system components where the information is processed and stored; and c. Document changes to the location (i.e., system or system components) where the information is processed and stored. **CM-12 Additional FedRAMP Requirements and Guidance:** **Requirement**: According to FedRAMP Authorization Boundary Guidance.

๐Ÿ’ผ CM-12(1) Automated Tools to Support Information Location (M)(H)

Use automated tools to identify [FedRAMP Assignment: Federal data and system data that must be protected at the High or Moderate impact levels] on [Assignment: organization-defined system components] to ensure controls are in place to protect organizational information and individual privacy. **CM-12 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: According to FedRAMP Authorization Boundary Guidance.

๐Ÿ’ผ CM-12(1) Automated Tools to Support Information Location (M)(H)

Use automated tools to identify [FedRAMP Assignment: Federal data and system data that must be protected at the High or Moderate impact levels] on [Assignment: organization-defined system components] to ensure controls are in place to protect organizational information and individual privacy. **CM-12 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: According to FedRAMP Authorization Boundary Guidance.
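As a hedged, simplified example of an automated aid for locating information that must be protected, the sketch below scans a directory tree for strings shaped like U.S. Social Security numbers. The pattern, path, and file-size cap are illustrative; production data discovery would use vetted tooling rather than a single regex.

```python
# Minimal sketch: locate files containing SSN-like strings as one example of
# automated support for identifying where protected information resides.
import re
from pathlib import Path

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MAX_BYTES = 1_000_000  # skip very large files in this simple sketch

def files_with_possible_ssns(root: str) -> list[Path]:
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.stat().st_size > MAX_BYTES:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        if SSN_PATTERN.search(text):
            hits.append(path)
    return hits

if __name__ == "__main__":
    for hit in files_with_possible_ssns("./data"):
        print(f"Possible protected data in: {hit}")
```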

๐Ÿ’ผ CM-14 Signed Components

Prevent the installation of [Assignment: organization-defined software and firmware components] without verification that the component has been digitally signed using a certificate that is recognized and approved by the organization.

๐Ÿ’ผ CM-14 Signed Components (H)

Prevent the installation of [Assignment: organization-defined software and firmware components] without verification that the component has been digitally signed using a certificate that is recognized and approved by the organization. **CM-14 Additional FedRAMP Requirements and Guidance:** **Guidance**: If digital signatures/certificates are unavailable, alternative cryptographic integrity checks (hashes, self-signed certs, etc.) can be utilized.
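Reflecting the guidance that hash checks may substitute when signatures or certificates are unavailable, the sketch below verifies a component's SHA-256 digest against an approved manifest before allowing installation. The manifest format (JSON of filename to hex digest) and file names are hypothetical.

```python
# Minimal sketch: allow installation only if a component's SHA-256 digest
# matches an approved manifest entry. Manifest format is hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_approved(component: Path, manifest_path: Path) -> bool:
    manifest = json.loads(manifest_path.read_text())
    expected = manifest.get(component.name)
    return expected is not None and sha256_of(component) == expected

if __name__ == "__main__":
    pkg = Path("agent-installer.bin")            # hypothetical component
    manifest = Path("approved-components.json")  # hypothetical manifest
    print("install allowed" if is_approved(pkg, manifest) else "blocked: not approved")
```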

๐Ÿ’ผ CM-2 (1) REVIEWS AND UPDATES

The organization reviews and updates the baseline configuration of the information system: CM-2 (1)(a) [Assignment: organization-defined frequency]; CM-2 (1)(b) When required due to [Assignment organization-defined circumstances]; and CM-2 (1)(c) As an integral part of information system component installations and upgrades.

๐Ÿ’ผ CM-2 (7) CONFIGURE SYSTEMS, COMPONENTS, OR DEVICES FOR HIGH-RISK AREAS

The organization: CM-2 (7)(a) Issues [Assignment: organization-defined information systems, system components, or devices] with [Assignment: organization-defined configurations] to individuals traveling to locations that the organization deems to be of significant risk; and CM-2 (7)(b) Applies [Assignment: organization-defined security safeguards] to the devices when the individuals return.

๐Ÿ’ผ CM-2 Baseline Configuration

a. Develop, document, and maintain under configuration control, a current baseline configuration of the system; and b. Review and update the baseline configuration of the system: 1. [Assignment: organization-defined frequency]; 2. When required due to [Assignment: organization-defined circumstances]; and 3. When system components are installed or upgraded.

๐Ÿ’ผ CM-2 Baseline Configuration (L)(M)(H)

a. Develop, document, and maintain under configuration control, a current baseline configuration of the system; and b. Review and update the baseline configuration of the system: 1. [FedRAMP Assignment: at least annually and when a significant change occurs]; 2. When required due to [FedRAMP Assignment: to include when directed by the JAB]; and 3. When system components are installed or upgraded. **CM-2 Additional FedRAMP Requirements and Guidance:** **(b) (1) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.

๐Ÿ’ผ CM-2 Baseline Configuration (L)(M)(H)

a. Develop, document, and maintain under configuration control, a current baseline configuration of the system; and b. Review and update the baseline configuration of the system: 1. [FedRAMP Assignment: at least annually and when a significant change occurs]; 2. When required due to [FedRAMP Assignment: to include when directed by the JAB]; and 3. When system components are installed or upgraded. **CM-2 Additional FedRAMP Requirements and Guidance:** **(b) (1) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.

๐Ÿ’ผ CM-2 Baseline Configuration (L)(M)(H)

a. Develop, document, and maintain under configuration control, a current baseline configuration of the system; and b. Review and update the baseline configuration of the system: 1. [FedRAMP Assignment: at least annually and when a significant change occurs]; 2. When required due to [FedRAMP Assignment: to include when directed by the JAB]; and 3. When system components are installed or upgraded. **CM-2 Additional FedRAMP Requirements and Guidance:** **(b) (1) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.
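As a hedged illustration of the review in part (b), the sketch below diffs a current configuration export against the documented baseline. Both files are assumed to be flat JSON objects mapping setting names to values; that format, and the file names, are hypothetical.

```python
# Minimal sketch: diff a current configuration export against the documented
# baseline to surface missing, unexpected, or drifted settings.
import json
from pathlib import Path

def diff_against_baseline(baseline_path: str, current_path: str) -> dict:
    baseline = json.loads(Path(baseline_path).read_text())
    current = json.loads(Path(current_path).read_text())
    return {
        "missing": sorted(set(baseline) - set(current)),
        "unexpected": sorted(set(current) - set(baseline)),
        "drifted": sorted(k for k in set(baseline) & set(current)
                          if baseline[k] != current[k]),
    }

if __name__ == "__main__":
    report = diff_against_baseline("baseline.json", "current.json")
    for category, keys in report.items():
        print(category, keys)
```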

๐Ÿ’ผ CM-2(7) Configure Systems and Components for High-risk Areas (M)(H)

(a) Issue [Assignment: organization-defined systems or system components] with [Assignment: organization-defined configurations] to individuals traveling to locations that the organization deems to be of significant risk; and (b) Apply the following controls to the systems or components when the individuals return from travel: [Assignment: organization-defined controls].

๐Ÿ’ผ CM-2(7) Configure Systems and Components for High-risk Areas (M)(H)

(a) Issue [Assignment: organization-defined systems or system components] with [Assignment: organization-defined configurations] to individuals traveling to locations that the organization deems to be of significant risk; and (b) Apply the following controls to the systems or components when the individuals return from travel: [Assignment: organization-defined controls].

๐Ÿ’ผ CM-3 (1) AUTOMATED DOCUMENT | NOTIFICATION | PROHIBITION OF CHANGES

The organization employs automated mechanisms to: CM-3 (1)(a) Document proposed changes to the information system; CM-3 (1)(b) Notify [Assignment: organization-defined approval authorities] of proposed changes to the information system and request change approval; CM-3 (1)(c) Highlight proposed changes to the information system that have not been approved or disapproved by [Assignment: organization-defined time period]; CM-3 (1)(d) Prohibit changes to the information system until designated approvals are received; CM-3 (1)(e) Document all changes to the information system; and CM-3 (1)(f) Notify [Assignment: organization-defined personnel] when approved changes to the information system are completed.

๐Ÿ’ผ CM-3 Configuration Change Control

a. Determine and document the types of changes to the system that are configuration-controlled; b. Review proposed configuration-controlled changes to the system and approve or disapprove such changes with explicit consideration for security and privacy impact analyses; c. Document configuration change decisions associated with the system; d. Implement approved configuration-controlled changes to the system; e. Retain records of configuration-controlled changes to the system for [Assignment: organization-defined time period]; f. Monitor and review activities associated with configuration-controlled changes to the system; and g. Coordinate and provide oversight for configuration change control activities through [Assignment: organization-defined configuration change control element] that convenes [Selection (one or more): [Assignment: organization-defined frequency]; when [Assignment: organization-defined configuration change conditions]].

๐Ÿ’ผ CM-3 CONFIGURATION CHANGE CONTROL

The organization: CM-3a. Determines the types of changes to the information system that are configuration-controlled; CM-3b. Reviews proposed configuration-controlled changes to the information system and approves or disapproves such changes with explicit consideration for security impact analyses; CM-3c. Documents configuration change decisions associated with the information system; CM-3d. Implements approved configuration-controlled changes to the information system; CM-3e. Retains records of configuration-controlled changes to the information system for [Assignment: organization-defined time period]; CM-3f. Audits and reviews activities associated with configuration-controlled changes to the information system; and CM-3g. Coordinates and provides oversight for configuration change control activities through [Assignment: organization-defined configuration change control element (e.g., committee, board)] that convenes [Selection (one or more): [Assignment: organization-defined frequency]; [Assignment: organization-defined configuration change conditions]].

๐Ÿ’ผ CM-3 Configuration Change Control (M)(H)

a. Determine and document the types of changes to the system that are configuration-controlled; b. Review proposed configuration-controlled changes to the system and approve or disapprove such changes with explicit consideration for security and privacy impact analyses; c. Document configuration change decisions associated with the system; d. Implement approved configuration-controlled changes to the system; e. Retain records of configuration-controlled changes to the system for [Assignment: organization-defined time period]; f. Monitor and review activities associated with configuration-controlled changes to the system; and g. Coordinate and provide oversight for configuration change control activities through [Assignment: organization-defined configuration change control element] that convenes [Selection (one-or-more): [Assignment: organization-defined frequency]; when [Assignment: organization-defined configuration change conditions]]. **CM-3 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: In accordance with record retention policies and procedures. **Requirement**: The service provider establishes a central means of communicating major changes to or developments in the information system or environment of operations that may affect its services to the federal government and associated service consumers (e.g., electronic bulletin board, web status page). The means of communication are approved and accepted by the JAB/AO.

๐Ÿ’ผ CM-3 Configuration Change Control (M)(H)

a. Determine and document the types of changes to the system that are configuration-controlled; b. Review proposed configuration-controlled changes to the system and approve or disapprove such changes with explicit consideration for security and privacy impact analyses; c. Document configuration change decisions associated with the system; d. Implement approved configuration-controlled changes to the system; e. Retain records of configuration-controlled changes to the system for [Assignment: organization-defined time period]; f. Monitor and review activities associated with configuration-controlled changes to the system; and g. Coordinate and provide oversight for configuration change control activities through [Assignment: organization-defined configuration change control element] that convenes [Selection (one-or-more): [Assignment: organization-defined frequency]; when [Assignment: organization-defined configuration change conditions]]. **CM-3 Additional FedRAMP Requirements and Guidance:** **(e) Guidance**: In accordance with record retention policies and procedures. **Requirement**: The service provider establishes a central means of communicating major changes to or developments in the information system or environment of operations that may affect its services to the federal government and associated service consumers (e.g., electronic bulletin board, web status page). The means of communication are approved and accepted by the JAB/AO.

๐Ÿ’ผ CM-3(1) Automated Documentation, Notification, and Prohibition of Changes (H)

Use [Assignment: organization-defined automated mechanisms] to: (a) Document proposed changes to the system; (b) Notify [Assignment: organization-defined approval authorities] of proposed changes to the system and request change approval; (c) Highlight proposed changes to the system that have not been approved or disapproved within [FedRAMP Assignment: organization agreed upon time period]; (d) Prohibit changes to the system until designated approvals are received; (e) Document all changes to the system; and (f) Notify [FedRAMP Assignment: organization defined configuration management approval authorities] when approved changes to the system are completed.

๐Ÿ’ผ CM-3(1) Configuration Change Control | Automated Documentation, Notification, and Prohibition of Changes

Use [Assignment: organization-defined automated mechanisms] to: (a) Document proposed changes to the system; (b) Notify [Assignment: organization-defined approval authorities] of proposed changes to the system and request change approval; (c) Highlight proposed changes to the system that have not been approved or disapproved within [Assignment: organization-defined time period]; (d) Prohibit changes to the system until designated approvals are received; (e) Document all changes to the system; and (f) Notify [Assignment: organization-defined personnel] when approved changes to the system are completed.

๐Ÿ’ผ CM-3(6) Cryptography Management (H)

Ensure that cryptographic mechanisms used to provide the following controls are under configuration management: [FedRAMP Assignment: All security safeguards that rely on cryptography].

๐Ÿ’ผ CM-4 (1) SEPARATE TEST ENVIRONMENTS

The organization analyzes changes to the information system in a separate test environment before implementation in an operational environment, looking for security impacts due to flaws, weaknesses, incompatibility, or intentional malice.

๐Ÿ’ผ CM-4 (2) VERIFICATION OF SECURITY FUNCTIONS

The organization, after the information system is changed, checks the security functions to verify that the functions are implemented correctly, operating as intended, and producing the desired outcome with regard to meeting the security requirements for the system.

๐Ÿ’ผ CM-4(1) Separate Test Environments (H)

Analyze changes to the system in a separate test environment before implementation in an operational environment, looking for security and privacy impacts due to flaws, weaknesses, incompatibility, or intentional malice.

๐Ÿ’ผ CM-4(2) Verification of Controls (M)(H)

After system changes, verify that the impacted controls are implemented correctly, operating as intended, and producing the desired outcome with regard to meeting the security and privacy requirements for the system.

๐Ÿ’ผ CM-4(2) Verification of Controls (M)(H)

After system changes, verify that the impacted controls are implemented correctly, operating as intended, and producing the desired outcome with regard to meeting the security and privacy requirements for the system.

๐Ÿ’ผ CM-5 (2) REVIEW SYSTEM CHANGES

The organization reviews information system changes [Assignment: organization-defined frequency] and [Assignment: organization-defined circumstances] to determine whether unauthorized changes have occurred.

๐Ÿ’ผ CM-5 (3) SIGNED COMPONENTS

The information system prevents the installation of [Assignment: organization-defined software and firmware components] without verification that the component has been digitally signed using a certificate that is recognized and approved by the organization.

๐Ÿ’ผ CM-5 (4) DUAL AUTHORIZATION

The organization enforces dual authorization for implementing changes to [Assignment: organization-defined information system components and system-level information].

๐Ÿ’ผ CM-5 (5) LIMIT PRODUCTION | OPERATIONAL PRIVILEGES

The organization: CM-5 (5)(a) Limits privileges to change information system components and system-related information within a production or operational environment; and CM-5 (5)(b) Reviews and reevaluates privileges [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-6 Configuration Settings

a. Establish and document configuration settings for components employed within the system that reflect the most restrictive mode consistent with operational requirements using [Assignment: organization-defined common secure configurations]; b. Implement the configuration settings; c. Identify, document, and approve any deviations from established configuration settings for [Assignment: organization-defined system components] based on [Assignment: organization-defined operational requirements]; and d. Monitor and control changes to the configuration settings in accordance with organizational policies and procedures.

๐Ÿ’ผ CM-6 CONFIGURATION SETTINGS

The organization: CM-6a. Establishes and documents configuration settings for information technology products employed within the information system using [Assignment: organization-defined security configuration checklists] that reflect the most restrictive mode consistent with operational requirements; CM-6b. Implements the configuration settings; CM-6c. Identifies, documents, and approves any deviations from established configuration settings for [Assignment: organization-defined information system components] based on [Assignment: organization-defined operational requirements]; and CM-6d. Monitors and controls changes to the configuration settings in accordance with organizational policies and procedures.

๐Ÿ’ผ CM-6 Configuration Settings (L)(M)(H)

a. Establish and document configuration settings for components employed within the system that reflect the most restrictive mode consistent with operational requirements using [Assignment: organization-defined common secure configurations]; b. Implement the configuration settings; c. Identify, document, and approve any deviations from established configuration settings for [Assignment: organization-defined system components] based on [Assignment: organization-defined operational requirements]; and d. Monitor and control changes to the configuration settings in accordance with organizational policies and procedures. **CM-6 Additional FedRAMP Requirements and Guidance:** **Guidance**: Compliance checks are used to evaluate configuration settings and provide general insight into the overall effectiveness of configuration management activities. CSPs and 3PAOs typically combine compliance check findings into a single CM-6 finding, which is acceptable. However, for initial assessments, annual assessments, and significant change requests, FedRAMP requires a clear understanding, on a per-control basis, where risks exist. Therefore, 3PAOs must also analyze compliance check findings as part of the controls assessment. Where a direct mapping exists, the 3PAO must document additional findings per control in the corresponding SAR Risk Exposure Table (RET), which are then documented in the CSP's Plan of Action and Milestones (POA&M). This will likely result in the details of individual control findings overlapping with those in the combined CM-6 finding, which is acceptable. During monthly continuous monitoring, new findings from CSP compliance checks may be combined into a single CM-6 POA&M item. CSPs are not required to map the findings to specific controls because controls are only assessed during initial assessments, annual assessments, and significant change requests. **(a) Requirement 1**: The service provider shall use the DoD STIGs to establish configuration settings; Center for Internet Security up to Level 2 (CIS Level 2) guidelines shall be used if STIGs are not available; Custom baselines shall be used if CIS is not available. **(a) Requirement 2**: The service provider shall ensure that checklists for configuration settings are Security Content Automation Protocol (SCAP) validated or SCAP compatible (if validated checklists are not available).

๐Ÿ’ผ CM-6 Configuration Settings (L)(M)(H)

a. Establish and document configuration settings for components employed within the system that reflect the most restrictive mode consistent with operational requirements using [Assignment: organization-defined common secure configurations]; b. Implement the configuration settings; c. Identify, document, and approve any deviations from established configuration settings for [Assignment: organization-defined system components] based on [Assignment: organization-defined operational requirements]; and d. Monitor and control changes to the configuration settings in accordance with organizational policies and procedures. **CM-6 Additional FedRAMP Requirements and Guidance:** **Guidance**: Compliance checks are used to evaluate configuration settings and provide general insight into the overall effectiveness of configuration management activities. CSPs and 3PAOs typically combine compliance check findings into a single CM-6 finding, which is acceptable. However, for initial assessments, annual assessments, and significant change requests, FedRAMP requires a clear understanding, on a per-control basis, where risks exist. Therefore, 3PAOs must also analyze compliance check findings as part of the controls assessment. Where a direct mapping exists, the 3PAO must document additional findings per control in the corresponding SAR Risk Exposure Table (RET), which are then documented in the CSP's Plan of Action and Milestones (POA&M). This will likely result in the details of individual control findings overlapping with those in the combined CM-6 finding, which is acceptable. During monthly continuous monitoring, new findings from CSP compliance checks may be combined into a single CM-6 POA&M item. CSPs are not required to map the findings to specific controls because controls are only assessed during initial assessments, annual assessments, and significant change requests. **(a) Requirement 1**: The service provider shall use the DoD STIGs to establish configuration settings; Center for Internet Security up to Level 2 (CIS Level 2) guidelines shall be used if STIGs are not available; Custom baselines shall be used if CIS is not available. **(a) Requirement 2**: The service provider shall ensure that checklists for configuration settings are Security Content Automation Protocol (SCAP) validated or SCAP compatible (if validated checklists are not available).

๐Ÿ’ผ CM-6 Configuration Settings (L)(M)(H)

a. Establish and document configuration settings for components employed within the system that reflect the most restrictive mode consistent with operational requirements using [Assignment: organization-defined common secure configurations]; b. Implement the configuration settings; c. Identify, document, and approve any deviations from established configuration settings for [Assignment: organization-defined system components] based on [Assignment: organization-defined operational requirements]; and d. Monitor and control changes to the configuration settings in accordance with organizational policies and procedures. **CM-6 Additional FedRAMP Requirements and Guidance:** **Guidance**: Compliance checks are used to evaluate configuration settings and provide general insight into the overall effectiveness of configuration management activities. CSPs and 3PAOs typically combine compliance check findings into a single CM-6 finding, which is acceptable. However, for initial assessments, annual assessments, and significant change requests, FedRAMP requires a clear understanding, on a per-control basis, where risks exist. Therefore, 3PAOs must also analyze compliance check findings as part of the controls assessment. Where a direct mapping exists, the 3PAO must document additional findings per control in the corresponding SAR Risk Exposure Table (RET), which are then documented in the CSP's Plan of Action and Milestones (POA&M). This will likely result in the details of individual control findings overlapping with those in the combined CM-6 finding, which is acceptable. During monthly continuous monitoring, new findings from CSP compliance checks may be combined into a single CM-6 POA&M item. CSPs are not required to map the findings to specific controls because controls are only assessed during initial assessments, annual assessments, and significant change requests. **(a) Requirement 1**: The service provider shall use the DoD STIGs to establish configuration settings; Center for Internet Security up to Level 2 (CIS Level 2) guidelines shall be used if STIGs are not available; Custom baselines shall be used if CIS is not available. **(a) Requirement 2**: The service provider shall ensure that checklists for configuration settings are Security Content Automation Protocol (SCAP) validated or SCAP compatible (if validated checklists are not available).

๐Ÿ’ผ CM-7 (1) PERIODIC REVIEW

The organization: CM-7 (1)(a) Reviews the information system [Assignment: organization-defined frequency] to identify unnecessary and/or nonsecure functions, ports, protocols, and services; and CM-7 (1)(b) Disables [Assignment: organization-defined functions, ports, protocols, and services within the information system deemed to be unnecessary and/or nonsecure].

๐Ÿ’ผ CM-7 (2) PREVENT PROGRAM EXECUTION

The information system prevents program execution in accordance with [Selection (one or more): [Assignment: organization-defined policies regarding software program usage and restrictions]; rules authorizing the terms and conditions of software program usage].

๐Ÿ’ผ CM-7 (4) UNAUTHORIZED SOFTWARE | BLACKLISTING

The organization: CM-7 (4)(a) Identifies [Assignment: organization-defined software programs not authorized to execute on the information system]; CM-7 (4)(b) Employs an allow-all, deny-by-exception policy to prohibit the execution of unauthorized software programs on the information system; and CM-7 (4)(c) Reviews and updates the list of unauthorized software programs [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-7 (5) AUTHORIZED SOFTWARE | WHITELISTING

The organization: CM-7 (5)(a) Identifies [Assignment: organization-defined software programs authorized to execute on the information system]; CM-7 (5)(b) Employs a deny-all, permit-by-exception policy to allow the execution of authorized software programs on the information system; and CM-7 (5)(c) Reviews and updates the list of authorized software programs [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-7 Least Functionality

a. Configure the system to provide only [Assignment: organization-defined mission essential capabilities]; and b. Prohibit or restrict the use of the following functions, ports, protocols, software, and/or services: [Assignment: organization-defined prohibited or restricted functions, system ports, protocols, software, and/or services].

๐Ÿ’ผ CM-7 LEAST FUNCTIONALITY

The organization: CM-7a. Configures the information system to provide only essential capabilities; and CM-7b. Prohibits or restricts the use of the following functions, ports, protocols, and/or services: [Assignment: organization-defined prohibited or restricted functions, ports, protocols, and/or services].

๐Ÿ’ผ CM-7 Least Functionality (L)(M)(H)

a. Configure the system to provide only [Assignment: organization-defined mission essential capabilities]; and b. Prohibit or restrict the use of the following functions, ports, protocols, software, and/or services: [Assignment: organization-defined prohibited or restricted functions, system ports, protocols, software, and/or services]. **CM-7 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider shall use Security guidelines (See CM-6) to establish list of prohibited or restricted functions, ports, protocols, and/or services or establishes its own list of prohibited or restricted functions, ports, protocols, and/or services if STIGs or CIS is not available.

๐Ÿ’ผ CM-7 Least Functionality (L)(M)(H)

a. Configure the system to provide only [Assignment: organization-defined mission essential capabilities]; and b. Prohibit or restrict the use of the following functions, ports, protocols, software, and/or services: [Assignment: organization-defined prohibited or restricted functions, system ports, protocols, software, and/or services]. **CM-7 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider shall use Security guidelines (See CM-6) to establish list of prohibited or restricted functions, ports, protocols, and/or services or establishes its own list of prohibited or restricted functions, ports, protocols, and/or services if STIGs or CIS is not available.

๐Ÿ’ผ CM-7 Least Functionality (L)(M)(H)

a. Configure the system to provide only [Assignment: organization-defined mission essential capabilities]; and b. Prohibit or restrict the use of the following functions, ports, protocols, software, and/or services: [Assignment: organization-defined prohibited or restricted functions, system ports, protocols, software, and/or services]. **CM-7 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider shall use Security guidelines (See CM-6) to establish list of prohibited or restricted functions, ports, protocols, and/or services or establishes its own list of prohibited or restricted functions, ports, protocols, and/or services if STIGs or CIS is not available.

๐Ÿ’ผ CM-7(1) Least Functionality | Periodic Review

(a) Review the system [Assignment: organization-defined frequency] to identify unnecessary and/or nonsecure functions, ports, protocols, software, and services; and (b) Disable or remove [Assignment: organization-defined functions, ports, protocols, software, and services within the system deemed to be unnecessary and/or nonsecure].

๐Ÿ’ผ CM-7(1) Periodic Review (M)(H)

(a) Review the system [FedRAMP Assignment: at least annually] to identify unnecessary and/or nonsecure functions, ports, protocols, software, and services; and (b) Disable or remove [Assignment: organization-defined functions, ports, protocols, software, and services within the system deemed to be unnecessary and/or nonsecure].

๐Ÿ’ผ CM-7(1) Periodic Review (M)(H)

(a) Review the system [FedRAMP Assignment: at least annually] to identify unnecessary and/or nonsecure functions, ports, protocols, software, and services; and (b) Disable or remove [Assignment: organization-defined functions, ports, protocols, software, and services within the system deemed to be unnecessary and/or nonsecure].
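In AWS environments, part of a CM-7(1) periodic review can be automated. The following is a minimal, hedged sketch (not mandated by the control) that lists security group ingress rules open to 0.0.0.0/0 so reviewers can flag unnecessary or nonsecure ports; which ports count as nonsecure remains an organization-defined decision, and a real review would also cover functions, protocols, software, and services.

```python
import boto3

ec2 = boto3.client("ec2")

def internet_open_rules():
    """Yield (group id, protocol, from port, to port) for ingress rules open to 0.0.0.0/0."""
    paginator = ec2.get_paginator("describe_security_groups")
    for page in paginator.paginate():
        for group in page["SecurityGroups"]:
            for permission in group["IpPermissions"]:
                if any(r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", [])):
                    yield (
                        group["GroupId"],
                        permission.get("IpProtocol"),
                        permission.get("FromPort"),
                        permission.get("ToPort"),
                    )

# Print candidates for the organization-defined review of nonsecure ports.
for group_id, protocol, from_port, to_port in internet_open_rules():
    print(f"{group_id}: {protocol} {from_port}-{to_port} open to the internet")
```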

๐Ÿ’ผ CM-7(2) Least Functionality | Prevent Program Execution

Prevent program execution in accordance with [Selection (one or more): [Assignment: organization-defined policies, rules of behavior, and/or access agreements regarding software program usage and restrictions]; rules authorizing the terms and conditions of software program usage].

๐Ÿ’ผ CM-7(2) Prevent Program Execution (M)(H)

Prevent program execution in accordance with [Selection (one-or-more): [Assignment: organization-defined policies, rules of behavior, and/or access agreements regarding software program usage and restrictions]; rules authorizing the terms and conditions of software program usage]. **CM-7 (2) Additional FedRAMP Requirements and Guidance:** **Guidance**: This control refers to software deployment by CSP personnel into the production environment. The control requires a policy that states conditions for deploying software. This control shall be implemented in a technical manner on the information system to only allow programs to run that adhere to the policy (i.e. allow-listing). This control is not to be based off of strictly written policy on what is allowed or not allowed to run.

๐Ÿ’ผ CM-7(2) Prevent Program Execution (M)(H)

Prevent program execution in accordance with [Selection (one-or-more): [Assignment: organization-defined policies, rules of behavior, and/or access agreements regarding software program usage and restrictions]; rules authorizing the terms and conditions of software program usage]. **CM-7 (2) Additional FedRAMP Requirements and Guidance:** **Guidance**: This control refers to software deployment by CSP personnel into the production environment. The control requires a policy that states conditions for deploying software. This control shall be implemented in a technical manner on the information system to only allow programs to run that adhere to the policy (i.e. allow-listing). This control is not to be based off of strictly written policy on what is allowed or not allowed to run.

๐Ÿ’ผ CM-7(5) Authorized Software โ€” Allow-by-exception (M)(H)

(a) Identify [Assignment: organization-defined software programs authorized to execute on the system]; (b) Employ a deny-all, permit-by-exception policy to allow the execution of authorized software programs on the system; and (c) Review and update the list of authorized software programs [FedRAMP Assignment: at least quarterly or when there is a change].

๐Ÿ’ผ CM-7(5) Authorized Software โ€” Allow-by-exception (M)(H)

(a) Identify [Assignment: organization-defined software programs authorized to execute on the system]; (b) Employ a deny-all, permit-by-exception policy to allow the execution of authorized software programs on the system; and (c) Review and update the list of authorized software programs [FedRAMP Assignment: at least quarterly or when there is a change].

๐Ÿ’ผ CM-8 (3) AUTOMATED UNAUTHORIZED COMPONENT DETECTION

The organization: CM-8 (3)(a) Employs automated mechanisms [Assignment: organization-defined frequency] to detect the presence of unauthorized hardware, software, and firmware components within the information system; and CM-8 (3)(b) Takes the following actions when unauthorized components are detected: [Selection (one or more): disables network access by such components; isolates the components; notifies [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ CM-8 (4) ACCOUNTABILITY INFORMATION

The organization includes in the information system component inventory information, a means for identifying by [Selection (one or more): name; position; role], individuals responsible/accountable for administering those components.

๐Ÿ’ผ CM-8 (9) ASSIGNMENT OF COMPONENTS TO SYSTEMS

The organization: CM-8 (9)(a) Assigns [Assignment: organization-defined acquired information system components] to an information system; and CM-8 (9)(b) Receives an acknowledgement from the information system owner of this assignment.

๐Ÿ’ผ CM-8 INFORMATION SYSTEM COMPONENT INVENTORY

The organization: CM-8a. Develops and documents an inventory of information system components that: CM-8a.1. Accurately reflects the current information system; CM-8a.2. Includes all components within the authorization boundary of the information system; CM-8a.3. Is at the level of granularity deemed necessary for tracking and reporting; and CM-8a.4. Includes [Assignment: organization-defined information deemed necessary to achieve effective information system component accountability]; and CM-8b. Reviews and updates the information system component inventory [Assignment: organization-defined frequency].

๐Ÿ’ผ CM-8 System Component Inventory

a. Develop and document an inventory of system components that: 1. Accurately reflects the system; 2. Includes all components within the system; 3. Does not include duplicate accounting of components or components assigned to any other system; 4. Is at the level of granularity deemed necessary for tracking and reporting; and 5. Includes the following information to achieve system component accountability: [Assignment: organization-defined information deemed necessary to achieve effective system component accountability]; and b. Review and update the system component inventory [Assignment: organization-defined frequency].
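In AWS deployments, a CM-8 component inventory is often seeded from AWS Config, which records supported resources inside the account boundary. The following is a minimal, hedged sketch that pulls resource counts and identifiers from AWS Config as one input to such an inventory; it assumes an AWS Config recorder is already enabled and covers only AWS-side components, not everything an organization-defined inventory may require.

```python
import boto3

config = boto3.client("config")

# Count recorded resources by type as a starting point for a component inventory.
counts = config.get_discovered_resource_counts()
for resource in counts["resourceCounts"]:
    print(f'{resource["resourceType"]}: {resource["count"]}')

# List individual components of one type, including identifiers for tracking and reporting.
paginator = config.get_paginator("list_discovered_resources")
for page in paginator.paginate(resourceType="AWS::EC2::Instance"):
    for item in page["resourceIdentifiers"]:
        print(item["resourceType"], item["resourceId"], item.get("resourceName"))
```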

๐Ÿ’ผ CM-8 System Component Inventory (L)(M)(H)

a. Develop and document an inventory of system components that: 1. Accurately reflects the system; 2. Includes all components within the system; 3. Does not include duplicate accounting of components or components assigned to any other system; 4. Is at the level of granularity deemed necessary for tracking and reporting; and 5. Includes the following information to achieve system component accountability: [Assignment: organization-defined information deemed necessary to achieve effective system component accountability]; and b. Review and update the system component inventory [FedRAMP Assignment: at least monthly]. **CM-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: must be provided at least monthly or when there is a change.

๐Ÿ’ผ CM-8 System Component Inventory (L)(M)(H)

a. Develop and document an inventory of system components that: 1. Accurately reflects the system; 2. Includes all components within the system; 3. Does not include duplicate accounting of components or components assigned to any other system; 4. Is at the level of granularity deemed necessary for tracking and reporting; and 5. Includes the following information to achieve system component accountability: [Assignment: organization-defined information deemed necessary to achieve effective system component accountability]; and b. Review and update the system component inventory [FedRAMP Assignment: at least monthly]. **CM-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: must be provided at least monthly or when there is a change.

๐Ÿ’ผ CM-8 System Component Inventory (L)(M)(H)

a. Develop and document an inventory of system components that: 1. Accurately reflects the system; 2. Includes all components within the system; 3. Does not include duplicate accounting of components or components assigned to any other system; 4. Is at the level of granularity deemed necessary for tracking and reporting; and 5. Includes the following information to achieve system component accountability: [Assignment: organization-defined information deemed necessary to achieve effective system component accountability]; and b. Review and update the system component inventory [FedRAMP Assignment: at least monthly]. **CM-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: must be provided at least monthly or when there is a change.

๐Ÿ’ผ CM-8(3) Automated Unauthorized Component Detection (M)(H)

(a) Detect the presence of unauthorized hardware, software, and firmware components within the system using [FedRAMP Assignment: automated mechanisms with a maximum five-minute delay in detection]; and [FedRAMP Assignment: continuously] (b) Take the following actions when unauthorized components are detected: [Selection (one-or-more): disable network access by such components; isolate the components; notify [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ CM-8(3) Automated Unauthorized Component Detection (M)(H)

(a) Detect the presence of unauthorized hardware, software, and firmware components within the system using [FedRAMP Assignment: automated mechanisms with a maximum five-minute delay in detection]; and [FedRAMP Assignment: continuously] (b) Take the following actions when unauthorized components are detected: [Selection (one-or-more): disable network access by such components; isolate the components; notify [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ CM-8(3) System Component Inventory | Automated Unauthorized Component Detection

(a) Detect the presence of unauthorized hardware, software, and firmware components within the system using [Assignment: organization-defined automated mechanisms] [Assignment: organization-defined frequency]; and (b) Take the following actions when unauthorized components are detected: [Selection (one or more): disable network access by such components; isolate the components; notify [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ CM-8(4) Accountability Information (H)

Include in the system component inventory information, a means for identifying by [FedRAMP Assignment: position and role], individuals responsible and accountable for administering those components.

๐Ÿ’ผ CM-9 Configuration Management Plan

Develop, document, and implement a configuration management plan for the system that: a. Addresses roles, responsibilities, and configuration management processes and procedures; b. Establishes a process for identifying configuration items throughout the system development life cycle and for managing the configuration of the configuration items; c. Defines the configuration items for the system and places the configuration items under configuration management; d. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; and e. Protects the configuration management plan from unauthorized disclosure and modification.

๐Ÿ’ผ CM-9 CONFIGURATION MANAGEMENT PLAN

The organization develops, documents, and implements a configuration management plan for the information system that: CM-9a. Addresses roles, responsibilities, and configuration management processes and procedures; CM-9b. Establishes a process for identifying configuration items throughout the system development life cycle and for managing the configuration of the configuration items; CM-9c. Defines the configuration items for the information system and places the configuration items under configuration management; and CM-9d. Protects the configuration management plan from unauthorized disclosure and modification.

๐Ÿ’ผ CM-9 Configuration Management Plan (M)(H)

Develop, document, and implement a configuration management plan for the system that: a. Addresses roles, responsibilities, and configuration management processes and procedures; b. Establishes a process for identifying configuration items throughout the system development life cycle and for managing the configuration of the configuration items; c. Defines the configuration items for the system and places the configuration items under configuration management; d. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; and e. Protects the configuration management plan from unauthorized disclosure and modification. **CM-9 Additional FedRAMP Requirements and Guidance:** **Guidance**: FedRAMP does not provide a template for the Configuration Management Plan. However, NIST SP 800-128, Guide for Security-Focused Configuration Management of Information Systems, provides guidelines for the implementation of CM controls as well as a sample CMP outline in Appendix D of the Guide.

๐Ÿ’ผ CM-9 Configuration Management Plan (M)(H)

Develop, document, and implement a configuration management plan for the system that: a. Addresses roles, responsibilities, and configuration management processes and procedures; b. Establishes a process for identifying configuration items throughout the system development life cycle and for managing the configuration of the configuration items; c. Defines the configuration items for the system and places the configuration items under configuration management; d. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; and e. Protects the configuration management plan from unauthorized disclosure and modification. **CM-9 Additional FedRAMP Requirements and Guidance:** **Guidance**: FedRAMP does not provide a template for the Configuration Management Plan. However, NIST SP 800-128, Guide for Security-Focused Configuration Management of Information Systems, provides guidelines for the implementation of CM controls as well as a sample CMP outline in Appendix D of the Guide.

๐Ÿ’ผ Communications (RC.CO)

Restoration activities are coordinated with internal and external parties (e.g. coordinating centers, Internet Service Providers, owners of attacking systems, victims, other CSIRTs, and vendors).

๐Ÿ’ผ Compute and hardware

The optimal compute choice for a particular workload can vary based on application design, usage patterns, and configuration settings. Architectures may use different compute choices for various components and allow different features to improve performance. Selecting the wrong compute choice for an architecture can lead to lower performance efficiency.

๐Ÿ’ผ COST02-BP01 Develop policies based on your organization requirements

Develop policies that define how resources are managed by your organization and inspect them periodically. Policies should cover the cost aspects of resources and workloads, including creation, modification, and decommissioning over a resource's lifetime.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Understanding your organization's costs and drivers is critical for managing your cost and usage effectively and identifying cost reduction opportunities. Organizations typically operate multiple workloads run by multiple teams. These teams can be in different organization units, each with its own revenue stream. The capability to attribute resource costs to the workloads, individual organizations, or product owners drives efficient usage behavior and helps reduce waste. Accurate cost and usage monitoring helps you understand how optimized a workload is, as well as how profitable organization units and products are. This knowledge allows for more informed decision making about where to allocate resources within your organization. Awareness of usage at all levels in the organization is key to driving change, as change in usage drives changes in cost. Consider taking a multi-faceted approach to becoming aware of your usage and expenditures.

The first step in performing governance is to use your organization's requirements to develop policies for your cloud usage. These policies define how your organization uses the cloud and how resources are managed. Policies should cover all aspects of resources and workloads that relate to cost or usage, including creation, modification, and decommissioning over a resource's lifetime. Verify that policies and procedures are followed and implemented for any change in a cloud environment. During your IT change management meetings, raise questions to find out the cost impact of planned changes, whether increasing or decreasing, the business justification, and the expected outcome.

Policies should be simple so that they are easily understood and can be implemented effectively throughout the organization. They also need to be easy to follow and interpret (so they are used) and specific (no misinterpretation between teams), and they need to be inspected periodically and updated as business conditions or priorities change; otherwise the policy becomes outdated. Start with broad, high-level policies, such as which geographic Region to use or times of the day that resources should be running. Gradually refine the policies for the various organizational units and workloads. Common policies include which services and features can be used (for example, lower performance storage in test and development environments), which types of resources can be used by different groups (for example, the largest size of resource in a development account is medium), and how long these resources will be in use (whether temporary, short term, or for a specific period of time).

### Policy example

The following is a sample policy you can review to create your own cloud governance policies, which focus on cost optimization. Make sure you adjust the policy based on your organization's requirements and your stakeholders' requests.

- **Policy name:** Define a clear policy name, such as Resource Optimization and Cost Reduction Policy.
- **Purpose:** Explain why this policy should be used and what the expected outcome is. The objective of this policy is to verify that there is a minimum cost required to deploy and run the desired workload to meet business requirements.
- **Scope:** Clearly define who should use this policy and when it should be used, such as the DevOps X Team using this policy in us-east Regions for the X environment (production or non-production).

### Policy statement

- Select us-east-1 or multiple us-east Regions based on your workload's environment and business requirement (development, user acceptance testing, pre-production, or production).
- Schedule Amazon EC2 and Amazon RDS instances to run between six in the morning and eight at night (Eastern Standard Time (EST)).
- Stop all unused Amazon EC2 instances after eight hours and unused Amazon RDS instances after 24 hours of inactivity.
- Terminate all unused Amazon EC2 instances after 24 hours of inactivity in non-production environments. Remind Amazon EC2 instance owners (based on tags) to review their stopped Amazon EC2 instances in production and inform them that their Amazon EC2 instances will be terminated within 72 hours if they are not in use.
- Use a generic instance family and size such as m5.large and then resize the instance based on CPU and memory utilization using AWS Compute Optimizer.
- Prioritize using auto scaling to dynamically adjust the number of running instances based on traffic.
- Use Spot Instances for non-critical workloads.
- Review capacity requirements to commit to Savings Plans or Reserved Instances for predictable workloads and inform the Cloud Financial Management Team.
- Use Amazon S3 lifecycle policies to move infrequently accessed data to cheaper storage tiers. If no retention policy is defined, use Amazon S3 Intelligent-Tiering to move objects to an archive tier automatically.
- Monitor resource utilization and set alarms to trigger scaling events using Amazon CloudWatch.
- For each AWS account, use AWS Budgets to set cost and usage budgets based on cost centers and business units. Using AWS Budgets to set cost and usage budgets for your account can help you stay on top of your spending and avoid unexpected bills, allowing you to better control your costs.

**Procedure:** Provide detailed procedures for implementing this policy or refer to other documents that describe how to implement each policy statement. This section should provide step-by-step instructions for carrying out the policy requirements. To implement this policy, you can use various third-party tools or AWS Config rules to check for compliance with the policy statement and trigger automated remediation actions using AWS Lambda functions (a sketch of such a remediation function follows the implementation steps below). You can also use AWS Organizations to enforce the policy. Additionally, you should regularly review your resource usage and adjust the policy as necessary to verify that it continues to meet your business needs.

### Implementation steps

1. Meet with stakeholders: To develop policies, ask stakeholders (cloud business office, engineers, or functional decision makers for policy enforcement) within your organization to specify their requirements and document them. Take an iterative approach by starting broadly and continually refining down to the smallest units at each step. Team members include those with direct interest in the workload, such as organization units or application owners, as well as supporting groups, such as security and finance teams.
2. Get confirmation: Make sure teams agree on the policies governing who can access and deploy to the AWS Cloud. Make sure they follow your organization's policies and confirm that their resource creations align with the agreed policies and procedures.
3. Create onboarding training sessions: Ask new organization members to complete onboarding training courses to build cost awareness and familiarity with organization requirements. They may assume different policies from their previous experience or not think of them at all.
4. Define locations for your workload: Define where your workload operates, including the country and the area within the country. This information is used for mapping to AWS Regions and Availability Zones.
5. Define and group services and resources: Define the services that the workloads require. For each service, specify the types, the size, and the number of resources required. Define groups for the resources by function, such as application servers or database storage. Resources can belong to multiple groups.
6. Define and group the users by function: Define the users that interact with the workload, focusing on what they do and how they use the workload, not on who they are or their position in the organization. Group similar users or functions together. You can use the AWS managed policies as a guide.
7. Define the actions: Using the locations, resources, and users identified previously, define the actions that are required by each to achieve the workload outcomes over its lifetime (development, operation, and decommission). Identify the actions based on the groups, not the individual elements in the groups, in each location. Start broadly with read or write, then refine down to specific actions for each service.
8. Define the review period: Workloads and organizational requirements can change over time. Define the workload review schedule to ensure it remains aligned with organizational priorities.
9. Document the policies: Verify that the policies that have been defined are accessible as required by your organization. These policies are used to implement, maintain, and audit access of your environments.
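As one illustration of the **Procedure** above, here is a minimal sketch (not the policy's prescribed implementation) of an AWS Lambda remediation function that stops non-production EC2 instances at the end of the approved run window. It assumes a hypothetical `Environment` tag with values `dev` and `test` and an Amazon EventBridge schedule that invokes the function at 8 PM EST; adapt the filters and schedule to your own policy.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical tag values that mark non-production workloads in this sketch.
NON_PROD_VALUES = ["dev", "test"]

def lambda_handler(event, context):
    """Stop running non-production EC2 instances outside the approved schedule.

    Intended to be invoked by an EventBridge schedule at the end of the
    policy's approved run window (8 PM EST in the sample policy above).
    """
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag:Environment", "Values": NON_PROD_VALUES},
        ]
    )

    # Collect every matching instance ID across all reservations and pages.
    instance_ids = [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]

    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

    return {"stopped": instance_ids}
```

A similar function, scheduled daily, could terminate non-production instances stopped for more than 24 hours, per the sample policy statement.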

๐Ÿ’ผ COST02-BP02 Implement goals and targets

Implement both cost and usage goals and targets for your workload. Goals provide direction to your organization on expected outcomes, and targets provide specific measurable outcomes to be achieved for your workloads.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Develop cost and usage goals and targets for your organization. As a growing organization on AWS, it is important to set and track goals for cost optimization. These goals or key performance indicators (KPIs) can include things like percent of spend on-demand or adoption of certain optimized services such as AWS Graviton instances or gp3 EBS volume types. Set measurable and achievable goals to help you measure efficiency improvements, which is important for your business operations.

Goals provide guidance and direction to your organization on expected outcomes. Targets provide specific, measurable outcomes to be achieved. In short, a goal is the direction you want to go, and a target is how far in that direction and when that goal should be achieved (use the guidance of specific, measurable, assignable, realistic, and timely, or SMART). An example of a goal is that platform usage should increase significantly, with only a minor (non-linear) increase in cost. An example target is a 20% increase in platform usage, with less than a five percent increase in costs. Another common goal is that workloads need to be more efficient every six months. The accompanying target would be that the cost per business metric needs to decrease by five percent every six months.

Use the right metrics, and set calculated KPIs for your organization. You can start with basic KPIs and evolve later based on business needs. A goal for cost optimization is to increase workload efficiency, which corresponds to a decrease in the cost per business outcome of the workload over time. Implement this goal for all workloads, and set a target like a five percent increase in efficiency every six months to a year. In the cloud, you can achieve this through the establishment of capability in cost optimization, as well as new service and feature releases.

Targets are the quantifiable benchmarks you want to reach to meet your goals, and benchmarks compare your actual results against a target. Establish benchmarks with KPIs for the cost per unit of compute services (such as Spot adoption, Graviton adoption, latest instance types, and On-Demand coverage), storage services (such as EBS gp3 adoption, obsolete EBS snapshots, and Amazon S3 standard storage), or database service usage (such as RDS open-source engines, Graviton adoption, and On-Demand coverage). These benchmarks and KPIs can help you verify that you use AWS services in the most cost-effective manner.

### Standard AWS Metrics (Reference)

| Category | KPI (%) | Description |
|----------|---------|-------------|
| Compute | EC2 usage coverage | EC2 instances (in cost or hours) using SP+RI+Spot compared to total (in cost or hours) of EC2 instances |
| Compute | Compute SP/RI utilization | Utilized SP or RI hours compared to total available SP or RI hours |
| Compute | EC2/hour cost | EC2 cost divided by the number of EC2 instances running in that hour |
| Compute | vCPU cost | Cost per vCPU for all instances |
| Compute | Latest instance generation | Percentage of instances on Graviton (or other modern generation instance types) |
| Database | RDS coverage | RDS instances (in cost or hours) using RI compared to total (in cost or hours) of RDS instances |
| Database | RDS utilization | Utilized RI hours compared to total available RI hours |
| Database | RDS uptime | RDS cost divided by the number of RDS instances running in that hour |
| Database | Latest instance generation | Percentage of instances on Graviton (or other modern instance types) |
| Storage | Storage utilization | Optimized storage cost (for example Glacier, Deep Archive, or Infrequent Access) divided by total storage cost |
| Tagging | Untagged resources | 1. Filter out credits, discounts, taxes, refunds, and marketplace, and copy the latest monthly cost. 2. Select "Show only untagged resources" in Cost Explorer. 3. Divide the amount in untagged resources by your monthly cost. |

Using this table, include target or benchmark values, which should be calculated based on your organizational goals. You need to measure certain metrics for your business and understand the business outcome for that workload to define accurate and realistic KPIs. When you evaluate performance metrics within an organization, distinguish between different types of metrics that serve distinct purposes. Metrics like those above primarily measure the performance and efficiency of the technical infrastructure rather than the overall business impact directly. For instance, they might track server response times, network latency, or system uptime. These metrics are crucial to assess how well the infrastructure supports the organization's technical operations. However, they don't provide direct insight into broader business objectives like customer satisfaction, revenue growth, or market share. To gain a comprehensive understanding of business performance, complement these efficiency metrics with strategic business metrics that directly correlate with business outcomes.

Establish near real-time visibility over your KPIs and related savings opportunities and track your progress over time (a sketch of one KPI calculation follows the implementation steps below). To get started with the definition and tracking of KPI goals, we recommend the KPI dashboard from Cloud Intelligence Dashboards (CID). Based on the data from the Cost and Usage Report (CUR), the KPI dashboard provides a series of recommended cost optimization KPIs, with the ability to set custom goals and track progress over time. If you have other solutions to set and track KPI goals, make sure these methods are adopted by all cloud financial management stakeholders in your organization.

### Implementation steps

1. **Define expected usage levels:** To begin, focus on usage levels. Engage with the application owners, marketing, and greater business teams to understand what the expected usage levels are for the workload. How might customer demand change over time, and what can change due to seasonal increases or marketing campaigns?
2. **Define workload resourcing and costs:** With usage levels defined, quantify the changes in workload resources required to meet those usage levels. You may need to increase the size or number of resources for a workload component, increase data transfer, or change workload components to a different service at a specific level. Specify the costs at each of these major points, and predict the change in cost when there is a change in usage.
3. **Define business goals:** Take the output from the expected changes in usage and cost, combine this with expected changes in technology or any programs that you are running, and develop goals for the workload. Goals must address usage and cost, as well as the relationship between the two. Goals must be simple, high-level, and help people understand what the business expects in terms of outcomes (such as making sure unused resources are kept below a certain cost level). You don't need to define goals for each unused resource type or define costs that can cause losses in goals and targets. Verify that there are organizational programs (for example, capability building like training and education) if there are expected changes in cost without changes in usage.
4. **Define targets:** For each of the defined goals, specify a measurable target. If the goal is to increase efficiency in the workload, the target should quantify the amount of improvement (typically in business outputs for each dollar spent) and when it should be delivered. For example, you could set a goal to minimize waste due to over-provisioning. With this goal, your target can be that waste due to compute over-provisioning in the first tier of production workloads should not exceed ten percent of tier compute cost. Additionally, a second target could be that waste due to compute over-provisioning in the second tier of production workloads should not exceed five percent of tier compute cost.
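To make the "EC2 usage coverage" KPI from the table concrete, here is a minimal sketch (not part of the CID dashboards) that pulls one month of EC2 compute cost from the Cost Explorer API grouped by purchase type and reports the share not billed as On-Demand. The service name and purchase-type values shown are assumptions to verify against your own cost data, and unblended cost is only one reasonable basis for this KPI.

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

def ec2_usage_coverage(start: str, end: str) -> float:
    """Return the percentage of EC2 compute cost covered by Savings Plans, RIs, or Spot.

    `start` and `end` are ISO dates, for example "2024-01-01" and "2024-02-01".
    """
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={
            "Dimensions": {
                "Key": "SERVICE",
                "Values": ["Amazon Elastic Compute Cloud - Compute"],
            }
        },
        GroupBy=[{"Type": "DIMENSION", "Key": "PURCHASE_TYPE"}],
    )

    covered = 0.0
    total = 0.0
    for result in response["ResultsByTime"]:
        for group in result["Groups"]:
            purchase_type = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            total += amount
            # Treat anything not labeled On-Demand as covered; the exact
            # PURCHASE_TYPE value names should be confirmed in your account.
            if purchase_type != "On Demand Instances":
                covered += amount

    return 100.0 * covered / total if total else 0.0

print(f"EC2 usage coverage: {ec2_usage_coverage('2024-01-01', '2024-02-01'):.1f}%")
```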

๐Ÿ’ผ COST02-BP03 Implement an account structure

Implement a structure of accounts that maps to your organization. This assists in allocating and managing costs throughout your organization.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

AWS Organizations allows you to create multiple AWS accounts, which can help you centrally govern your environment as you scale your workloads on AWS. You can model your organizational hierarchy by grouping AWS accounts in an organizational unit (OU) structure and creating multiple AWS accounts under each OU. To create an account structure, you first need to decide which of your AWS accounts will be the management account. After that, you can create new AWS accounts or select existing accounts as member accounts based on your designed account structure by following management account best practices and member account best practices. It is advised to always have at least one management account with one member account linked to it, regardless of your organization size or usage. All workload resources should reside only within member accounts, and no resource should be created in the management account. There is no one-size-fits-all answer for how many AWS accounts you should have. Assess your current and future operational and cost models to ensure that the structure of your AWS accounts reflects your organization's goals. Some companies create multiple AWS accounts for business reasons, for example:

- Administrative or fiscal and billing isolation is required between organization units, cost centers, or specific workloads.
- AWS service limits are set to be specific to particular workloads.
- There is a requirement for isolation and separation between workloads and resources.

Within AWS Organizations, consolidated billing creates the construct between one or more member accounts and the management account. Member accounts allow you to isolate and distinguish your cost and usage by groups. A common practice is to have separate member accounts for each organization unit (such as finance, marketing, and sales), or for each environment lifecycle (such as development, testing, and production), or for each workload (workload a, b, and c), and then aggregate these linked accounts using consolidated billing. Consolidated billing allows you to consolidate payment for multiple member AWS accounts under a single management account, while still providing visibility for each linked account's activity. As costs and usage are aggregated in the management account, this allows you to maximize your service volume discounts and maximize the use of your commitment discounts (Savings Plans and Reserved Instances) to achieve the highest discounts. AWS Control Tower can quickly set up and configure multiple AWS accounts, ensuring that governance is aligned with your organization's requirements.

### Implementation steps

1. Define separation requirements: Requirements for separation are a combination of multiple factors, including security, reliability, and financial constructs. Work through each factor in order and specify whether the workload or workload environment should be separate from other workloads. Security promotes adherence to access and data requirements. Reliability manages limits so that environments and workloads do not impact others. Review the security and reliability pillars of the Well-Architected Framework periodically and follow the provided best practices. Financial constructs create strict financial separation (different cost centers, workload ownership, and accountability). Common examples of separation are production and test workloads being run in separate accounts, or using a separate account so that the invoice and billing data can be provided to the individual business units or departments in the organization or the stakeholder who owns the account.
2. Define grouping requirements: Requirements for grouping do not override the separation requirements, but are used to assist management. Group together similar environments or workloads that do not require separation. An example of this is grouping multiple test or development environments from one or more workloads together.
3. Define account structure: Using these separations and groupings, specify an account for each group and maintain separation requirements. These accounts are your member or linked accounts. By grouping these member accounts under a single management or payer account, you combine usage, which allows for greater volume discounts across all accounts and provides a single bill for all accounts (see the sketch after these steps). It's possible to separate billing data and provide each member account with an individual view of its billing data. If a member account must not have its usage or billing data visible to any other account, or if a separate bill from AWS is required, define multiple management or payer accounts. In this case, each member account has its own management or payer account. Resources should always be placed in member or linked accounts. The management or payer accounts should only be used for management.
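As a hedged illustration of step 3 (define account structure), the following sketch uses the AWS Organizations API from the management account to create environment-lifecycle OUs and request a new member account. The OU names, account name, and email address are placeholders; AWS Control Tower or infrastructure as code is usually a better fit for repeatable account vending.

```python
import time
import boto3

organizations = boto3.client("organizations")

# Look up the root of the organization so OUs can be attached to it.
root_id = organizations.list_roots()["Roots"][0]["Id"]

# Hypothetical OU layout: one OU per environment lifecycle.
ou_ids = {}
for ou_name in ["Production", "Non-Production", "Sandbox"]:
    ou = organizations.create_organizational_unit(ParentId=root_id, Name=ou_name)
    ou_ids[ou_name] = ou["OrganizationalUnit"]["Id"]

# Request a new member account; creation is asynchronous, so poll the request.
request = organizations.create_account(
    Email="workload-a-prod@example.com",  # placeholder address
    AccountName="workload-a-prod",        # placeholder account name
)
request_id = request["CreateAccountStatus"]["Id"]

status = organizations.describe_create_account_status(
    CreateAccountRequestId=request_id
)["CreateAccountStatus"]
while status["State"] == "IN_PROGRESS":
    time.sleep(10)
    status = organizations.describe_create_account_status(
        CreateAccountRequestId=request_id
    )["CreateAccountStatus"]

# Move the new member account under the OU that matches its lifecycle.
if status["State"] == "SUCCEEDED":
    organizations.move_account(
        AccountId=status["AccountId"],
        SourceParentId=root_id,
        DestinationParentId=ou_ids["Production"],
    )
```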

๐Ÿ’ผ COST02-BP04 Implement groups and roles

Implement groups and roles that align to your policies and control who can create, modify, or decommission instances and resources in each group. For example, implement development, test, and production groups. This applies to AWS services and third-party solutions.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

User roles and groups are fundamental building blocks in the design and implementation of secure and efficient systems. Roles and groups help organizations balance the need for control with the requirement for flexibility and productivity, ultimately supporting organizational objectives and user needs. As recommended in the Identity and access management section of the AWS Well-Architected Framework Security Pillar, you need robust identity management and permissions in place to provide access to the right resources for the right people under the right conditions. Users receive only the access necessary to complete their tasks. This minimizes the risk associated with unauthorized access or misuse.

After you develop policies, you can create logical groups and user roles within your organization. This allows you to assign permissions, control usage, and implement robust access control mechanisms that prevent unauthorized access to sensitive information. Begin with high-level groupings of people. Typically, this aligns with organizational units and job roles (for example, a systems administrator in the IT department, a financial controller, or a business analyst). The groups categorize people that do similar tasks and need similar access. Roles define what a group is allowed to do. It is easier to manage permissions for groups and roles than for individual users. Roles and groups assign permissions consistently and systematically across all users, preventing errors and inconsistencies. When a user's role changes, administrators can adjust access at the role or group level rather than reconfiguring individual user accounts. For example, a systems administrator in IT requires access to create all resources, but an analytics team member only needs to create analytics resources.

### Implementation steps

1. Implement groups: Using the groups of users defined in your organizational policies, implement the corresponding groups, if necessary (a minimal example follows these steps). For best practices on users, groups, and authentication, see the Security Pillar of the AWS Well-Architected Framework.
2. Implement roles and policies: Using the actions defined in your organizational policies, create the required roles and access policies. For best practices on roles and policies, see the Security Pillar of the AWS Well-Architected Framework.
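As a minimal sketch of step 1 (the group name, user name, and attached policy are illustrative; in practice you would author customer managed policies that match your organizational policies), creating a group and managing permissions at the group level might look like this with boto3:

```python
import boto3

iam = boto3.client("iam")

# Create a group for the analytics team (name is illustrative).
iam.create_group(GroupName="analytics-team")

# Attach a managed policy to the group; ReadOnlyAccess is used here
# only as a placeholder for your own scoped policy.
iam.attach_group_policy(
    GroupName="analytics-team",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)

# Add an existing user to the group so access is adjusted at the
# group level rather than per user when roles change.
iam.add_user_to_group(GroupName="analytics-team", UserName="jane.doe")
```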

๐Ÿ’ผ COST02-BP05 Implement cost controls

Implement controls based on organization policies and defined groups and roles. These ensure that costs are incurred only as defined by organization requirements, such as controlling access to Regions or resource types.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

A common first step in implementing cost controls is to set up notifications when cost or usage events occur outside of policies. You can act quickly and verify whether corrective action is required without restricting or negatively impacting workloads or new activity. After you know the workload and environment limits, you can enforce governance. AWS Budgets allows you to set notifications and define monthly budgets for your AWS costs, usage, and commitment discounts (Savings Plans and Reserved Instances). You can create budgets at an aggregate cost level (for example, all costs), or at a more granular level that includes only specific dimensions such as linked accounts, services, tags, or Availability Zones.

Once you set up your budget limits with AWS Budgets, use AWS Cost Anomaly Detection to reduce unexpected costs. AWS Cost Anomaly Detection is a cost management service that uses machine learning to continually monitor your cost and usage and detect unusual spend. It helps you identify anomalous spend and root causes, so you can take action quickly. First, create a cost monitor in AWS Cost Anomaly Detection, then choose your alerting preference by setting a dollar threshold (such as an alert on anomalies with an impact greater than $1,000). Once you receive alerts, you can analyze the root cause of the anomaly and its impact on your costs. You can also monitor and perform your own anomaly analysis in AWS Cost Explorer.

Enforce governance policies in AWS through AWS Identity and Access Management (IAM) and AWS Organizations service control policies (SCPs). IAM allows you to securely manage access to AWS services and resources. Using IAM, you can control who can create or manage AWS resources, the type of resources that can be created, and where they can be created. This minimizes the possibility of resources being created outside of the defined policy. Use the roles and groups created previously and assign IAM policies to enforce the correct usage. SCPs offer central control over the maximum available permissions for all accounts in your organization, keeping your accounts within your access control guidelines. SCPs are available only in an organization that has all features turned on, and you can configure SCPs to either deny or allow actions for member accounts by default.

Governance can also be implemented through management of AWS service quotas. By ensuring service quotas are set with minimum overhead and accurately maintained, you can minimize resource creation outside of your organization's requirements. To achieve this, you must understand how quickly your requirements can change, understand projects in progress (both creation and decommissioning of resources), and factor in how fast quota changes can be implemented. You can request quota increases when required.

### Implementation steps

1. Implement notifications on spend: Using your defined organization policies, create AWS Budgets to notify you when spending is outside of your policies (a minimal example follows these steps). Configure multiple cost budgets, one for each account, to notify you about overall account spending. Configure additional cost budgets within each account for smaller units within the account. These units vary depending on your account structure; some common examples are AWS Regions, workloads (using tags), or AWS services. Configure an email distribution list as the recipient for notifications rather than an individual's email account. You can configure an actual budget to alert when an amount is exceeded, or a forecasted budget to alert on forecasted usage. You can also preconfigure AWS Budgets actions that can enforce specific IAM or SCP policies, or stop target Amazon EC2 or Amazon RDS instances. Budget actions can be started automatically or require workflow approval.
2. Implement notifications on anomalous spend: Use AWS Cost Anomaly Detection to reduce surprise costs in your organization and analyze the root cause of potentially anomalous spend. Once you create a cost monitor to identify unusual spend at your specified granularity and configure notifications in AWS Cost Anomaly Detection, it sends you an alert when unusual spend is detected. This allows you to analyze the root cause of the anomaly and understand the impact on your costs. Use AWS Cost Categories when configuring AWS Cost Anomaly Detection to identify which project or business unit team should analyze the root cause of the unexpected cost and take timely action.
3. Implement controls on usage: Using your defined organization policies, implement IAM policies and roles to specify which actions users can perform and which they cannot. Multiple organizational policies may be included in one AWS policy. As when you defined the policies, start broadly and then apply more granular controls at each step. Service quotas are also an effective control on usage. Implement the correct service quotas on all your accounts.
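As a minimal sketch of step 1 (the budget name, amount, threshold, and distribution list address are placeholders), a monthly cost budget with an actual-spend notification could be created like this with boto3:

```python
import boto3

budgets = boto3.client("budgets")
account_id = boto3.client("sts").get_caller_identity()["Account"]

budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "monthly-workload-a",  # illustrative name
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Alert when actual spend passes 80% of the budgeted amount.
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            # Send to a distribution list, not an individual mailbox.
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "cloud-finops@example.com"}
            ],
        }
    ],
)
```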

๐Ÿ’ผ COST02-BP06 Track project lifecycle

Track, measure, and audit the lifecycle of projects, teams, and environments to avoid using and paying for unnecessary resources.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

By effectively tracking the project lifecycle, organizations can achieve better cost control through enhanced planning, management, and resource optimization. The insights gained through tracking are invaluable for making informed decisions that contribute to the cost-effectiveness and overall success of the project. Tracking the entire lifecycle of the workload helps you understand when workloads or workload components are no longer required. Existing workloads and components may appear to be in use, but when AWS releases new services or features, they can be decommissioned or replaced with the new options. Review the previous stages of workloads: after a workload is in production, earlier environments can be decommissioned or greatly reduced in capacity until they are required again.

You can tag resources with a timeframe or reminder to record when the workload was last reviewed. For example, if the development environment was last reviewed months ago, it could be a good time to review it again, explore whether new services can be adopted, and confirm whether the environment is still in use. You can group and tag your applications with myApplications on AWS to manage and track metadata such as criticality, environment, last reviewed, and cost center. This lets you track your workload's lifecycle as well as monitor and manage the cost, health, security posture, and performance of your applications.

AWS provides various management and governance services you can use for entity lifecycle tracking. You can use AWS Config or AWS Systems Manager to provide a detailed inventory of your AWS resources and configuration. It is recommended that you integrate with your existing project or asset management systems to keep track of active projects and products within your organization. Combining your current system with the rich set of events and metrics provided by AWS allows you to build a view of significant lifecycle events and proactively manage resources to reduce unnecessary costs.

Similar to application lifecycle management (ALM), tracking the project lifecycle should involve multiple processes, tools, and teams working together, such as design and development, testing, production, support, and workload redundancy. By carefully monitoring each phase of a project's lifecycle, organizations gain crucial insights and enhanced control, facilitating successful project planning, implementation, and completion. This careful oversight verifies that projects not only meet quality standards, but are delivered on time and within budget, promoting overall cost efficiency.

### Implementation steps

1. Establish a project lifecycle monitoring process: The Cloud Center of Excellence team must establish a project lifecycle monitoring process. Establish a structured and systematic approach to monitoring workloads in order to improve control, visibility, and performance of the projects. Make the monitoring process transparent, collaborative, and focused on continuous improvement to maximize its effectiveness and value.
2. Perform workload reviews: As defined by your organizational policies, set up a regular cadence to audit your existing projects and perform workload reviews. The effort spent on the audit should be proportional to the approximate risk, value, or cost to the organization. Key areas to include in the audit are the risk to the organization of an incident or outage, the value or contribution to the organization (measured in revenue or brand reputation), the cost of the workload (measured as the total cost of resources and operational costs), and the usage of the workload (measured in the number of organization outcomes per unit of time). If these areas change over the lifecycle, adjustments to the workload are required, such as full or partial decommissioning.

๐Ÿ’ผ COST03-BP01 Configure detailed information sources

Set up cost management and reporting tools for enhanced analysis and transparency of cost and usage data. Configure your workload to create log entries that facilitate the tracking and segregation of costs and usage.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Detailed billing information, such as hourly granularity in cost management tools, allows organizations to track their consumption in greater detail and helps them identify the causes of cost increases. These data sources provide the most accurate view of cost and usage across your entire organization.

You can use AWS Data Exports to create exports of the AWS Cost and Usage Report (CUR) 2.0. This is the new and recommended way to receive your detailed cost and usage data from AWS. It provides daily or hourly usage granularity, rates, costs, and usage attributes for all chargeable AWS services (the same information as the CUR), along with some improvements. The CUR includes all available dimensions, such as tags, location, resource attributes, and account IDs. There are three export types, based on what you want to create: a standard data export, an export to a cost and usage dashboard with QuickSight integration, or a legacy data export.

- Standard data export: A customized export of a table that is delivered to Amazon S3 on a recurring basis.
- Cost and usage dashboard: An export and integration with QuickSight to deploy a pre-built cost and usage dashboard.
- Legacy data export: An export of the legacy AWS Cost and Usage Report (CUR).

You can create data exports with the following customizations:

- Include resource IDs
- Split cost allocation data
- Hourly granularity
- Versioning
- Compression type and file format

For workloads that run containers on Amazon ECS or Amazon EKS, enable split cost allocation data so that you can allocate container costs to individual business units and teams based on how your container workloads consume shared compute and memory resources. Split cost allocation data introduces cost and usage data for new container-level resources to the AWS Cost and Usage Report. It is calculated by computing the cost of individual ECS services and tasks running on the cluster.

A cost and usage dashboard export delivers the cost and usage dashboard table to an S3 bucket on a recurring basis and deploys a prebuilt cost and usage dashboard to QuickSight. Use this option if you want to quickly deploy a dashboard of your cost and usage data without needing customization. If desired, you can still export the CUR in legacy mode, where you can integrate other processing services such as AWS Glue to prepare the data for analysis and query the data with Amazon Athena using SQL.

### Implementation steps

1. Create data exports: Create customized exports with the data you want and control the schema of your exports. Create billing and cost management data exports using basic SQL, and visualize your billing and cost management data by integrating with QuickSight. You can also export your data in standard mode to analyze it with other processing tools like Amazon Athena.
2. Configure the cost and usage report: Using the billing console, configure at least one cost and usage report (a minimal example follows these steps). Configure a report with hourly granularity that includes all identifiers and resource IDs. You can also create other reports with different granularities to provide higher-level summary information.
3. Configure hourly granularity in Cost Explorer: To access cost and usage data with hourly granularity for the past 14 days, consider enabling hourly and resource-level data in the billing console.
4. Configure application logging: Verify that your application logs each business outcome that it delivers so it can be tracked and measured. Ensure that the granularity of this data is at least hourly so it matches the cost and usage data. For more details on logging and monitoring, see the Well-Architected Operational Excellence Pillar.
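As a minimal sketch of step 2 using the legacy Cost and Usage Report API (the report name, bucket, and prefix are placeholders, and the bucket is assumed to already carry the bucket policy that CUR delivery requires), an hourly report with resource IDs could be defined like this:

```python
import boto3

# The legacy CUR API is served from us-east-1.
cur = boto3.client("cur", region_name="us-east-1")

cur.put_report_definition(
    ReportDefinition={
        "ReportName": "hourly-cur-with-resources",  # illustrative name
        "TimeUnit": "HOURLY",                       # hourly granularity
        "Format": "Parquet",
        "Compression": "Parquet",
        "AdditionalSchemaElements": ["RESOURCES"],  # include resource IDs
        "S3Bucket": "example-cur-bucket",           # placeholder bucket
        "S3Prefix": "cur/",
        "S3Region": "us-east-1",
        "AdditionalArtifacts": ["ATHENA"],          # prepare data for Athena queries
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_REPORT",
    }
)
```

For new setups, the AWS Data Exports console (CUR 2.0) described above is the recommended path; the legacy API is shown only because it maps directly to this step.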

๐Ÿ’ผ COST03-BP02 Add organization information to cost and usage

Define a tagging schema based on your organization, workload attributes, and cost allocation categories so that you can filter and search for resources or monitor cost and usage in cost management tools. Implement consistent tagging across all resources where possible, by purpose, team, environment, or other criteria relevant to your business.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Implement tagging in AWS to add organization information to your resources, which is then added to your cost and usage information. A tag is a key-value pair: the key is defined and must be unique across your organization, and the value is unique to a group of resources. An example of a key-value pair is a key of Environment with a value of Production; all resources in the production environment carry this key-value pair. Tagging allows you to categorize and track your costs with meaningful, relevant organization information. You can apply tags that represent organization categories (such as cost centers, application names, projects, or owners) and that identify workloads and their characteristics (such as test or production) to attribute your costs and usage throughout your organization.

When you apply tags to your AWS resources (such as Amazon Elastic Compute Cloud instances or Amazon Simple Storage Service buckets) and activate the tags, AWS adds this information to your Cost and Usage Reports. You can run reports and perform analysis on tagged and untagged resources to improve compliance with internal cost management policies and ensure accurate attribution.

Creating and implementing an AWS tagging standard across your organization's accounts helps you manage and govern your AWS environments in a consistent and uniform manner. Use tag policies in AWS Organizations to define rules for how tags can be used on AWS resources in your accounts. Tag policies allow you to easily adopt a standardized approach for tagging AWS resources. AWS Tag Editor allows you to add, delete, and manage tags on multiple resources: you search for the resources that you want to tag, and then manage tags for the resources in your search results.

AWS Cost Categories allows you to assign organizational meaning to your costs without requiring tags on resources. You can map your cost and usage information to unique internal organization structures. You define category rules to map and categorize costs using billing dimensions, such as accounts and tags. This provides another level of management capability in addition to tagging. You can also map specific accounts and tags to multiple projects.

### Implementation steps

1. Define a tagging schema: Gather all stakeholders from across your business to define a schema. This typically includes people in technical, financial, and management roles. Define a list of tags that all resources must have, as well as a list of tags that resources should have. Verify that the tag names and values are consistent across your organization.
2. Tag resources: Using your defined cost attribution categories, place tags on all resources in your workloads according to the categories. Use tools such as the AWS CLI, Tag Editor, or AWS Systems Manager to increase efficiency (a bulk-tagging example follows these steps).
3. Implement AWS Cost Categories: You can create cost categories without implementing tagging; cost categories use the existing cost and usage dimensions. Create category rules from your schema and implement them as cost categories.
4. Automate tagging: To maintain high levels of tagging across all resources, automate tagging so that resources are automatically tagged when they are created. Use services such as AWS CloudFormation to verify that resources are tagged when created. You can also create a custom solution to tag resources automatically using Lambda functions, or use a microservice that scans the workload periodically and removes any resources that are not tagged, which is ideal for test and development environments.
5. Monitor and report on tagging: To maintain high levels of tagging across your organization, report and monitor the tags across your workloads. You can use AWS Cost Explorer to view the cost of tagged and untagged resources, or use services such as Tag Editor. Regularly review the number of untagged resources and take action to add tags until you reach the desired level of tagging.
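As a minimal sketch of step 2 (the tag keys and values are illustrative, and only resource types supported by the Resource Groups Tagging API are covered), you could find resources that are missing a required tag and apply your standard tags in bulk:

```python
import boto3

tagging = boto3.client("resourcegroupstaggingapi")

# Standard tags defined by the tagging schema (values are illustrative).
standard_tags = {"CostCenter": "1234", "Environment": "Production", "Owner": "workload-a-team"}

# Page through resources and collect those missing the CostCenter tag.
paginator = tagging.get_paginator("get_resources")
untagged_arns = []
for page in paginator.paginate():
    for resource in page["ResourceTagMappingList"]:
        keys = {t["Key"] for t in resource.get("Tags", [])}
        if "CostCenter" not in keys:
            untagged_arns.append(resource["ResourceARN"])

# Tag in batches of up to 20 ARNs per call (the API limit).
for i in range(0, len(untagged_arns), 20):
    result = tagging.tag_resources(
        ResourceARNList=untagged_arns[i : i + 20],
        Tags=standard_tags,
    )
    # FailedResourcesMap lists ARNs that could not be tagged, for example
    # resource types that this API does not support.
    if result["FailedResourcesMap"]:
        print("Failed to tag:", list(result["FailedResourcesMap"]))
```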

๐Ÿ’ผ COST03-BP03 Identify cost attribution categories

Identify organization categories, such as business units, departments, or projects, that can be used to allocate cost within your organization to the internal consuming entities. Use those categories to enforce spend accountability, create cost awareness, and drive effective consumption behaviors.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

The process of categorizing costs is crucial in budgeting, accounting, financial reporting, decision making, benchmarking, and project management. By classifying and categorizing expenses, teams can gain a better understanding of the types of costs they incur throughout their cloud journey, which helps them make informed decisions and manage budgets effectively.

Cloud spend accountability establishes a strong incentive for disciplined demand and cost management. The result is significantly greater cloud cost savings for organizations that allocate most of their cloud spend to consuming business units or teams. Moreover, allocating cloud spend helps organizations adopt more centralized cloud governance best practices.

During your regular cadence calls, work with your finance team and other relevant stakeholders to understand how costs must be allocated within your organization. Workload costs must be allocated throughout the entire lifecycle, including development, testing, production, and decommissioning. Understand how the costs incurred for learning, staff development, and idea creation are attributed in the organization. This can be helpful for correctly allocating accounts used for these purposes to training and development budgets instead of generic IT cost budgets.

After defining your cost attribution categories with stakeholders in your organization, use AWS Cost Categories to group your cost and usage information into meaningful categories in the AWS Cloud, such as the cost for a specific project, or the AWS accounts for departments or business units. You can create custom categories and map your cost and usage information into these categories based on rules you define using various dimensions, such as account, tag, service, or charge type. Once cost categories are set up, you can view your cost and usage information by these categories, which allows your organization to make better strategic and purchasing decisions. These categories are visible in AWS Cost Explorer, AWS Budgets, and the AWS Cost and Usage Report as well.

For example, create cost categories for your business units (such as a DevOps team), and under each category create multiple rules (one for each subcategory) with multiple dimensions (AWS accounts, cost allocation tags, services, or charge type) based on your defined groupings. With cost categories, you can organize your costs using a rule-based engine; the rules that you configure organize your costs into categories. Within these rules, you can filter using multiple dimensions for each category, such as specific AWS accounts, AWS services, or charge types. You can then use these categories across multiple products in the AWS Billing and Cost Management console, including AWS Cost Explorer, AWS Budgets, the AWS Cost and Usage Report, and AWS Cost Anomaly Detection. You can create groupings of costs using cost categories as well. After you create the cost categories (allow up to 24 hours for your usage records to be updated with values), they appear in AWS Cost Explorer, AWS Budgets, the AWS Cost and Usage Report, and AWS Cost Anomaly Detection. In AWS Cost Explorer and AWS Budgets, a cost category appears as an additional billing dimension. You can use this to filter for a specific cost category value, or to group by the cost category.

### Implementation steps

1. Define your organization categories: Meet with internal stakeholders and business units to define categories that reflect your organization's structure and requirements. These categories should directly map to the structure of existing financial categories, such as business unit, budget, cost center, or department. Also look at the outcomes the cloud delivers for your business, such as training or education, as these are organization categories too.
2. Define your functional categories: Meet with internal stakeholders and business units to define categories that reflect the functions within your business. These may be the workload or application names and the type of environment, such as production, testing, or development.
3. Define AWS Cost Categories: Create cost categories to organize your cost and usage information using AWS Cost Categories, and map your AWS cost and usage into meaningful categories (a minimal example follows these steps). Multiple categories can be assigned to a resource, and a resource can be in multiple different categories, so define as many categories as needed to manage your costs within the categorized structure.
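As a minimal sketch of step 3 (the category name, account IDs, and tag values are placeholders), a cost category that maps linked accounts and a tag to business units might be created like this:

```python
import boto3

ce = boto3.client("ce")  # Cost Categories are managed through the Cost Explorer API

ce.create_cost_category_definition(
    Name="BusinessUnit",                      # illustrative category name
    RuleVersion="CostCategoryExpression.v1",
    Rules=[
        {
            # Everything in these linked accounts rolls up to "DevOps".
            "Value": "DevOps",
            "Rule": {
                "Dimensions": {
                    "Key": "LINKED_ACCOUNT",
                    "Values": ["111111111111", "222222222222"],  # placeholder IDs
                }
            },
        },
        {
            # Anything tagged team=finance rolls up to "Finance".
            "Value": "Finance",
            "Rule": {"Tags": {"Key": "team", "Values": ["finance"]}},
        },
    ],
    DefaultValue="Unallocated",  # catch-all for costs no rule matches
)
```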

๐Ÿ’ผ COST03-BP04 Establish organization metrics

Establish the organization metrics that are required for this workload. Example metrics for a workload are customer reports produced or web pages served to customers.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Understand how your workload's output is measured against business success. Each workload typically has a small set of major outputs that indicate performance. If you have a complex workload with many components, you can prioritize the list, or define and track metrics for each component. Work with your teams to understand which metrics to use. These metrics are used to understand the efficiency of the workload, or the cost for each business output.

### Implementation steps

1. Define workload outcomes: Meet with the stakeholders in the business and define the outcomes for the workload. These are a primary measure of customer usage and must be business metrics, not technical metrics. There should be a small number of high-level metrics (fewer than five) per workload. If the workload produces multiple outcomes for different use cases, group them into a single metric.
2. Define workload component outcomes: Optionally, if you have a large and complex workload, or can easily break your workload into components (such as microservices) with well-defined inputs and outputs, define metrics for each component. The effort should reflect the value and cost of the component. Start with the largest components and work towards the smaller ones.
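One way to make a business metric available alongside hourly cost data is to publish it as a custom CloudWatch metric each time the workload completes an outcome. A minimal sketch follows; the namespace, metric name, and dimension are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_report_produced(workload: str) -> None:
    """Publish one business outcome so cost per outcome can be computed later."""
    cloudwatch.put_metric_data(
        Namespace="WorkloadOutcomes",  # hypothetical namespace
        MetricData=[
            {
                "MetricName": "CustomerReportsProduced",
                "Dimensions": [{"Name": "Workload", "Value": workload}],
                "Value": 1,
                "Unit": "Count",
            }
        ],
    )

# Call this wherever the workload delivers a business outcome.
record_report_produced("workload-a")
```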

๐Ÿ’ผ COST03-BP05 Configure billing and cost management tools

Configure cost management tools that meet your organization's policies to manage and optimize cloud spending. This includes services, tools, and resources to organize and track cost and usage data, enhance control through consolidated billing and access permissions, improve planning through budgeting and forecasts, receive notifications or alerts, and lower cost with resource and pricing optimizations.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

To establish strong accountability, consider your account strategy first as part of your cost allocation strategy. Get this right, and you may not need to go any further; otherwise, you risk a lack of cost awareness and further pain points later. To encourage accountability for cloud spend, grant users access to tools that provide visibility into their costs and usage. AWS recommends that you configure all workloads and teams for the following purposes:

- Organize: Establish your cost allocation and governance baseline with your own tagging strategy and taxonomy. Create multiple AWS accounts with tools such as AWS Control Tower or AWS Organizations. Tag the supported AWS resources and categorize them meaningfully based on your organization structure (business units, departments, or projects). Tag accounts for specific cost centers and map them with AWS Cost Categories to group accounts for business units to their cost centers, so that a business unit owner can see the consumption of multiple accounts in one place.
- Access: Track organization-wide billing information in consolidated billing. Verify that the right stakeholders and business owners have access.
- Control: Build effective governance mechanisms with the right guardrails to prevent unexpected scenarios, using service control policies (SCPs), tag policies, IAM policies, and budget alerts. For example, you can allow teams to create specific resources in preferred Regions only by using effective control mechanisms, and prevent resource creation without a specific tag (such as cost-center).
- Current state: Configure a dashboard that shows current levels of cost and usage. The dashboard should be available in a highly visible place within the work environment, similar to an operations dashboard. You can export data and use the Cost and Usage Dashboard from the AWS Cost Optimization Hub or any supported product to create this visibility. You may need to create different dashboards for different personas; for example, a manager's dashboard may differ from an engineering dashboard.
- Notifications: Provide notifications with AWS Budgets or AWS Cost Anomaly Detection when cost or usage exceeds defined limits or anomalies occur.
- Reports: Summarize all cost and usage information. Raise awareness and accountability for your cloud spend with detailed, attributable cost data. Create reports that are relevant to the teams consuming them and contain recommendations.
- Tracking: Show the current cost and usage against configured goals or targets.
- Analysis: Allow team members to perform custom and deep analysis down to hourly, daily, or monthly granularity with different filters (resource, account, tag, and so on).
- Inspect: Stay up to date with your resource deployments and cost optimization opportunities. Get notifications using Amazon CloudWatch, Amazon SNS, or Amazon SES for resource deployments at the organization level. Review cost optimization recommendations with AWS Trusted Advisor or AWS Compute Optimizer.
- Trend reports: Display the variability in cost and usage over the required period with the required granularity.
- Forecasts: Show estimated future costs and resource usage with forecast dashboards that you create.

You can use AWS Cost Optimization Hub to understand potential cost-saving opportunities consolidated in a centralized location and create data exports for integration with Amazon Athena. You can also use the AWS Cost Optimization Hub to deploy the Cost and Usage Dashboard, which uses QuickSight for interactive cost analysis and secure sharing of cost insights. If you don't have the necessary skills or bandwidth in your organization, you can work with AWS Professional Services, AWS Managed Services (AMS), or AWS Partners. You can also use third-party tools, but validate their value proposition.

### Implementation steps

1. Allow team-based access to tools: Configure your accounts and create groups that have access to the required cost and usage reports for their consumption, and use AWS Identity and Access Management to control access to tools such as AWS Cost Explorer. These groups must include representatives from all teams that own or manage an application. This ensures that every team has access to their cost and usage information to track their consumption.
2. Organize costs with tags and categories: Organize your costs across teams, business units, applications, environments, and projects. Use cost allocation tags to organize costs, and create cost categories based on dimensions such as tags, accounts, and services to map your costs.
3. Configure AWS Budgets: Configure AWS Budgets on all accounts for your workloads. Set budgets for the overall account spend and budgets for the workloads by using tags and cost categories. Configure notifications in AWS Budgets to receive alerts when you exceed your budgeted amounts, or when your estimated costs exceed your budgets.
4. Configure AWS Cost Anomaly Detection: Use AWS Cost Anomaly Detection for your accounts, core services, or the cost categories you created to monitor your cost and usage and detect unusual spend (a minimal example follows these steps). You can receive alerts individually or in aggregated reports, by email or through an Amazon SNS topic, which allows you to analyze and determine the root cause of the anomaly and identify the factor driving the cost increase.
5. Use cost analysis tools: Configure AWS Cost Explorer for your workload and accounts to visualize your cost data for further analysis. Create a dashboard for the workload that tracks overall spend, key usage metrics for the workload, and a forecast of future costs based on your historical cost data.
6. Use cost-saving analysis tools: Use AWS Cost Optimization Hub to identify savings opportunities with tailored recommendations, including deleting unused resources, rightsizing, Savings Plans, reservations, and AWS Compute Optimizer recommendations.
7. Configure advanced tools: Optionally, create visuals to facilitate interactive analysis and sharing of cost insights. With Data Exports in AWS Cost Optimization Hub, you can create a cost and usage dashboard powered by QuickSight for your organization that provides additional detail and granularity. You can also implement advanced analysis capability using data exports with Amazon Athena for advanced queries, and create dashboards in QuickSight. Work with AWS Partners to adopt cloud management solutions for consolidated cloud bill monitoring and optimization.
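As a minimal sketch of step 4 (the monitor name, subscription name, SNS topic ARN, and $1,000 impact threshold are placeholders, and the threshold expression shape is an assumption based on the Cost Explorer anomaly APIs), a service-level monitor with an SNS alert could be set up like this:

```python
import boto3

ce = boto3.client("ce")  # Cost Anomaly Detection is part of the Cost Explorer API

# Monitor AWS service spend for anomalies across the account.
monitor_arn = ce.create_anomaly_monitor(
    AnomalyMonitor={
        "MonitorName": "service-spend-monitor",  # illustrative name
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }
)["MonitorArn"]

# Alert immediately to an SNS topic when an anomaly's total impact
# reaches $1,000 (the topic ARN is a placeholder).
ce.create_anomaly_subscription(
    AnomalySubscription={
        "SubscriptionName": "anomaly-alerts",
        "MonitorArnList": [monitor_arn],
        "Subscribers": [
            {"Type": "SNS", "Address": "arn:aws:sns:us-east-1:111111111111:cost-alerts"}
        ],
        "Frequency": "IMMEDIATE",
        "ThresholdExpression": {
            "Dimensions": {
                "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                "Values": ["1000"],
                "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
            }
        },
    }
)
```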

๐Ÿ’ผ COST03-BP06 Allocate costs based on workload metrics

Allocate the workload's costs based on usage metrics or business outcomes to measure workload cost efficiency. Implement a process to analyze the cost and usage data with analytics services, which can provide insight and chargeback capability.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Cost optimization means delivering business outcomes at the lowest price point, which can only be achieved by allocating workload costs based on workload metrics (that is, by measuring workload efficiency). Monitor the defined workload metrics through log files or other application monitoring. Combine this data with the workload's costs, which can be obtained by looking at costs with a specific tag value or account ID. Perform this analysis at the hourly level. Your efficiency typically changes if you have static cost components (for example, a backend database running permanently) with a varying request rate (for example, usage peaks from nine in the morning to five in the evening, with few requests at night). Understanding the relationship between the static and variable costs helps you focus your optimization activities.

Creating workload metrics for shared resources, such as containerized applications on Amazon Elastic Container Service (Amazon ECS) or Amazon API Gateway, may be more challenging, but there are ways to categorize usage and track cost. If you need to track Amazon ECS and AWS Batch shared resources, you can enable split cost allocation data. With split cost allocation data, you can understand and optimize the cost and usage of your containerized applications and allocate application costs back to individual business entities based on how shared compute and memory resources are consumed.

### Implementation steps

- Allocate costs to workload metrics: Using the defined metrics and configured tags, create a metric that combines the workload output and workload cost (a minimal example follows). Use analytics services such as Amazon Athena and Amazon QuickSight to create an efficiency dashboard for the overall workload and any components.
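As a minimal sketch of combining tagged workload cost with a business metric to produce a cost-per-outcome figure (the tag key and value, date range, and daily outcome counts are illustrative; in practice the outcome counts would come from your own monitoring, such as the custom CloudWatch metric shown earlier):

```python
import boto3

ce = boto3.client("ce")

# Daily unblended cost for resources tagged to the workload.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-06-08"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Tags": {"Key": "workload", "Values": ["workload-a"]}},
)

daily_cost = {
    day["TimePeriod"]["Start"]: float(day["Total"]["UnblendedCost"]["Amount"])
    for day in response["ResultsByTime"]
}

# Business outcomes per day (illustrative values standing in for your
# own application monitoring data).
daily_outcomes = {"2024-06-01": 1200, "2024-06-02": 950}

for day, cost in daily_cost.items():
    outcomes = daily_outcomes.get(day)
    if outcomes:
        print(f"{day}: ${cost / outcomes:.4f} per report produced")
```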

๐Ÿ’ผ COST04-BP01 Track resources over their lifetime

Define and implement a method to track resources and their associations with systems over their lifetime. You can use tagging to identify the workload or function of the resource.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Decommission workload resources that are no longer required. A common example is resources used for testing: after testing has been completed, the resources can be removed. Tracking resources with tags (and running reports on those tags) can help you identify assets for decommissioning, because they are no longer in use or their licenses are about to expire. Using tags is an effective way to track resources, by labeling each resource with its function or a known date when it can be decommissioned, and then running reports on those tags. An example of feature tagging is a feature-X-testing value that identifies the purpose of the resource in terms of the workload lifecycle. Another example is tracking a lifespan or TTL for resources, such as a to-be-deleted tag key whose value defines the time period or specific date for decommissioning (a minimal example follows the implementation steps).

### Implementation steps

1. Implement a tagging scheme: Implement a tagging scheme that identifies the workload each resource belongs to, verifying that all resources within the workload are tagged accordingly. Tagging helps you categorize resources by purpose, team, environment, or other criteria relevant to your business. For more detail on tagging use cases, strategies, and techniques, see AWS Tagging Best Practices.
2. Implement workload throughput or output monitoring: Implement workload throughput monitoring or alarming, triggered on either input requests or output completions. Configure it to provide notifications when workload requests or outputs drop to zero, indicating that the workload resources are no longer used. Incorporate a time factor if the workload periodically drops to zero under normal conditions. For more detail on unused or underutilized resources, see the AWS Trusted Advisor cost optimization checks.
3. Group AWS resources: Create groups for AWS resources. You can use AWS Resource Groups to organize and manage your AWS resources that are in the same AWS Region. You can add tags to most of your resources to help identify and sort them within your organization. Use Tag Editor to add tags to supported resources in bulk. Consider using AWS Service Catalog to create, manage, and distribute portfolios of approved products to end users and manage the product lifecycle.
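As a minimal sketch of the to-be-deleted tag pattern described above (the tag key and the ISO date format of its value are assumptions for illustration), you could report resources whose decommission date has passed:

```python
from datetime import date

import boto3

tagging = boto3.client("resourcegroupstaggingapi")

# Assumes a to-be-deleted tag whose value is an ISO date, e.g. 2024-06-30.
paginator = tagging.get_paginator("get_resources")
for page in paginator.paginate(TagFilters=[{"Key": "to-be-deleted"}]):
    for resource in page["ResourceTagMappingList"]:
        tags = {t["Key"]: t["Value"] for t in resource["Tags"]}
        try:
            deadline = date.fromisoformat(tags["to-be-deleted"])
        except ValueError:
            continue  # skip malformed tag values
        if deadline <= date.today():
            # Feed these ARNs into your decommissioning process or report.
            print("Past decommission date:", resource["ResourceARN"])
```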

๐Ÿ’ผ COST04-BP02 Implement a decommissioning process

Implement a process to identify and decommission unused resources.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Implement a standardized process across your organization to identify and remove unused resources. The process should define how frequently searches are performed and the steps to remove a resource, to verify that all organization requirements are met.

### Implementation steps

1. Create and implement a decommissioning process: Work with the workload developers and owners to build a decommissioning process for the workload and its resources. The process should cover the method to verify whether the workload is in use, and also whether each of the workload's resources is in use. Detail the steps necessary to decommission the resource, removing it from service while ensuring compliance with any regulatory requirements. Include any associated resources, such as licenses or attached storage. Notify the workload owners that the decommissioning process has started. Use the following steps to guide what should be checked as part of your process:
   - Identify resources to be decommissioned: Identify resources that are eligible for decommissioning in your AWS Cloud. Record all necessary information and schedule the decommission. In your timeline, account for unexpected issues that may arise during the process.
   - Coordinate and communicate: Work with workload owners to confirm the resource to be decommissioned.
   - Record metadata and create backups: Record metadata (such as public IPs, Region, Availability Zone, VPC, subnet, and security groups) and create backups (such as Amazon Elastic Block Store snapshots, AMIs, key exports, and certificate exports) if they are required for production or critical resources.
   - Validate infrastructure as code: Determine whether resources were deployed with AWS CloudFormation, Terraform, the AWS Cloud Development Kit (AWS CDK), or another infrastructure-as-code deployment tool so they can be redeployed if necessary.
   - Prevent access: Apply restrictive controls for a period of time to prevent the use of resources while you determine whether the resource is required. Verify that the resource environment can be reverted to its original state if required.
   - Follow your internal decommissioning process: Follow the administrative tasks and decommissioning process of your organization, such as removing the resource from your organization domain, removing the DNS record, and removing the resource from your configuration management, monitoring, automation, and security tools.
2. If the resource is an Amazon EC2 instance, consult the following list (a cleanup sketch follows these steps):
   - Stop or terminate all your Amazon EC2 instances and load balancers. Amazon EC2 instances are visible in the console for a short time after they're terminated. You aren't billed for any instances that aren't in the running state.
   - Delete your Auto Scaling infrastructure.
   - Release all Dedicated Hosts.
   - Delete all Amazon EBS volumes and Amazon EBS snapshots.
   - Release all Elastic IP addresses.
   - Deregister all Amazon Machine Images (AMIs).
   - Terminate all AWS Elastic Beanstalk environments.
3. If the resource is an object in an Amazon S3 Glacier storage class and you delete the archive before meeting the minimum storage duration, you are charged a prorated early deletion fee. The minimum storage duration depends on the storage class used. For a summary of the minimum storage duration for each storage class, see Performance across the Amazon S3 storage classes.
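As a minimal sketch of part of step 2 (the decommission tag is illustrative, and the deletion calls are left commented out so backups can be confirmed first), you could terminate flagged instances and report leftover volumes and Elastic IPs that would otherwise keep accruing charges:

```python
import boto3

ec2 = boto3.client("ec2")

# Terminate instances that the decommissioning process has flagged
# (identified here by an illustrative tag).
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:decommission", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running", "stopped"]},
    ]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.terminate_instances(InstanceIds=instance_ids)

# Unattached EBS volumes continue to incur storage charges.
for vol in ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]:
    print("Unattached volume:", vol["VolumeId"])
    # ec2.delete_volume(VolumeId=vol["VolumeId"])  # after backups are confirmed

# Elastic IPs not associated with an instance are billed while idle.
for addr in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in addr:
        print("Idle Elastic IP:", addr.get("AllocationId"))
        # ec2.release_address(AllocationId=addr["AllocationId"])
```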

๐Ÿ’ผ COST04-BP03 Decommission resources

Decommission resources in response to events such as periodic audits or changes in usage. Decommissioning is typically performed periodically and can be manual or automated.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

The frequency of and effort spent searching for unused resources should reflect the potential savings, so an account with a small cost should be analyzed less frequently than an account with larger costs. Searches and decommission events can be triggered by state changes in the workload, such as a product going end of life or being replaced. They may also be triggered by external events, such as changes in market conditions or product termination.

### Implementation steps

- Decommission resources: This is the retirement stage for AWS resources that are no longer needed or whose licensing agreements are ending. Complete all final checks, such as taking snapshots or backups, before moving to the disposal stage and decommissioning resources, to prevent any unwanted disruption. Using the decommissioning process, decommission each of the resources that have been identified as unused.

๐Ÿ’ผ COST04-BP04 Decommission resources automatically

Design your workload to gracefully handle resource termination as you identify and decommission non-critical resources, resources that are not required, or resources with low utilization.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Use automation to reduce or remove the costs associated with the decommissioning process. Designing your workload to perform automated decommissioning reduces the overall workload costs during its lifetime. You can use Amazon EC2 Auto Scaling or Application Auto Scaling to perform the decommissioning process. You can also implement custom code using the API or SDK to decommission workload resources automatically.

Modern applications are built serverless-first, a strategy that prioritizes the adoption of serverless services. AWS developed serverless services for all three layers of your stack: compute, integration, and data stores. Using a serverless architecture allows you to save costs during low-traffic periods by scaling up and down automatically.

### Implementation steps

1. Implement Amazon EC2 Auto Scaling or Application Auto Scaling: For resources that are supported, configure them with Amazon EC2 Auto Scaling or Application Auto Scaling. These services can help you optimize your utilization and cost efficiency when consuming AWS services. When demand drops, these services automatically remove any excess resource capacity so you avoid overspending.
2. Configure CloudWatch to terminate instances: Instances can be configured to terminate using CloudWatch alarms. Using the metrics from the decommissioning process, implement an alarm with an Amazon Elastic Compute Cloud action (a minimal example follows these steps). Verify the operation in a non-production environment before rolling it out.
3. Implement code within the workload: You can use the AWS SDK or AWS CLI to decommission workload resources. Implement code within the application that integrates with AWS and terminates or removes resources that are no longer used.
4. Use serverless services: Prioritize building serverless and event-driven architectures on AWS to build and run your applications. AWS offers multiple serverless technology services that inherently provide automatically optimized resource utilization and automated decommissioning (scale in and scale out). With serverless applications, resource utilization is automatically optimized and you never pay for over-provisioning.
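As a minimal sketch of step 2 (the instance ID, Region, threshold, and evaluation window are placeholders, and the built-in EC2 terminate action ARN must be in the same Region as the instance; verify in a non-production environment first):

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="terminate-idle-dev-instance",  # illustrative name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=3600,              # one-hour periods
    EvaluationPeriods=24,     # a full day below the threshold
    Threshold=2.0,
    ComparisonOperator="LessThanThreshold",
    # Built-in EC2 alarm action that terminates the instance.
    AlarmActions=["arn:aws:automate:us-east-1:ec2:terminate"],
    AlarmDescription="Terminate the instance after 24 hours of near-zero CPU.",
)
```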

๐Ÿ’ผ COST04-BP05 Enforce data retention policies

Define data retention policies on supported resources to handle object deletion per your organization's requirements. Identify and delete unnecessary or orphaned resources and objects that are no longer required.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Use data retention policies and lifecycle policies to reduce the costs associated with the decommissioning process and the storage costs for the identified resources. Defining data retention policies and lifecycle policies that perform automated storage class migration and deletion reduces overall storage costs over the data's lifetime. You can use Amazon Data Lifecycle Manager to automate the creation and deletion of Amazon Elastic Block Store snapshots and Amazon EBS-backed Amazon Machine Images (AMIs), and use Amazon S3 Intelligent-Tiering or an Amazon S3 lifecycle configuration to manage the lifecycle of your Amazon S3 objects. You can also implement custom code using the API or SDK to create lifecycle policies and policy rules for objects to be deleted automatically.

### Implementation steps

1. Use Amazon Data Lifecycle Manager: Use lifecycle policies in Amazon Data Lifecycle Manager to automate deletion of Amazon EBS snapshots and Amazon EBS-backed AMIs.
2. Set up lifecycle configuration on a bucket: Use an Amazon S3 lifecycle configuration on a bucket to define actions for Amazon S3 to take during an object's lifecycle, including automatic deletion at the end of the object's lifecycle, based on your business requirements (a minimal example follows these steps).
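As a minimal sketch of step 2 (the bucket name, prefix, transition class, and retention periods are placeholders to be replaced with values from your retention policy):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-workload-logs",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "retain-logs-one-year",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Move to an infrequent-access class after 30 days,
                # then delete once the retention period ends.
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
                # Also clean up abandoned multipart uploads.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```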

๐Ÿ’ผ COST05-BP01 Identify organization requirements for cost

Work with team members to define the balance between cost optimization and other pillars, such as performance and reliability, for this workload.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

In most organizations, the information technology (IT) department is composed of multiple small teams, each with its own agenda and focus area, reflecting the specialties and skills of its team members. You need to understand your organization's overall objectives, priorities, and goals, and how each department or project contributes to them. Categorizing all essential resources, including personnel, equipment, technology, materials, and external services, is crucial for achieving organizational objectives and comprehensive budget planning. Adopting this systematic approach to cost identification and understanding is fundamental to establishing a realistic and robust cost plan for the organization.

When selecting services for your workload, it is key that you understand your organization's priorities. Create a balance between cost optimization and other AWS Well-Architected Framework pillars, such as performance and reliability. This process should be conducted systematically and regularly to reflect changes in the organization's objectives, market conditions, and operational dynamics. A fully cost-optimized workload is the solution that is most aligned to your organization's requirements, not necessarily the lowest-cost solution. Meet with all teams in your organization, such as product, business, technical, and finance, to collect information. Evaluate the impact of trade-offs between competing interests or alternative approaches to help make informed decisions when determining where to focus efforts or choosing a course of action. For example, accelerating speed to market for new features may be emphasized over cost optimization, or you may choose a relational database for non-relational data to simplify the effort of migrating a system, rather than migrating to a database optimized for your data type and updating your application.

### Implementation steps

1. Identify organization requirements for cost: Meet with team members from your organization, including those in product management, application owners, development and operational teams, management, and financial roles. Prioritize the Well-Architected pillars for this workload and its components; the output should be a list of the pillars in order. You can also add a weight to each pillar to indicate how much additional focus it has, or how similar the focus is between two pillars.
2. Address the technical debt and document it: During the workload review, address the technical debt. Document a backlog item to revisit the workload in the future, with the goal of refactoring or re-architecting to optimize it further. Clearly communicate the trade-offs that were made to other stakeholders.

๐Ÿ’ผ COST05-BP02 Analyze all components of the workload

Verify that every workload component is analyzed, regardless of current size or current costs. The review effort should reflect the potential benefit, such as current and projected costs.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Workload components, which are designed to deliver business value to the organization, may encompass various services. For each component, you might choose specific AWS Cloud services to address business needs. This selection could be influenced by factors such as familiarity with or prior experience using these services.

After identifying your organization's requirements (as described in COST05-BP01 Identify organization requirements for cost), perform a thorough analysis on all components in your workload. Analyze each component considering current and projected costs and sizes, and weigh the cost of the analysis against any potential workload savings over its lifecycle. The effort expended on the analysis of all components of this workload should correspond to the potential savings or improvements anticipated from optimizing that specific component. For example, if the cost of the proposed resource is $10 per month and under forecasted loads would not exceed $15 per month, spending a day of effort to reduce costs by 50% (five dollars per month) could exceed the potential benefit over the life of the system. Using a faster, more efficient data-based estimation creates the best overall outcome for this component.

Workloads can change over time, and the right set of services may not be optimal if the workload architecture or usage changes. Analysis for service selection must incorporate current and future workload states and usage levels. Implementing a service for a future workload state or usage level may reduce overall costs by reducing or removing the effort required to make future changes. For example, using EMR Serverless might be the appropriate choice initially; however, as consumption of that service increases, transitioning to EMR on EC2 could reduce costs for that component of the workload.

AWS Cost Explorer and the AWS Cost and Usage Report (CUR) can analyze the cost of a proof of concept (PoC) or running environment. You can also use AWS Pricing Calculator to estimate workload costs. Write a workflow for technical teams to follow when they review their workloads. Keep this workflow simple, but cover all the necessary steps to make sure the teams understand each component of the workload and its pricing. Your organization can then follow and customize this workflow based on the specific needs of each team:

1. List each service in use for your workload: This is a good starting point. Identify all of the services currently in use and where costs originate (a minimal example follows this section).
2. Understand how pricing works for those services: Understand the pricing model of each service. Different AWS services have different pricing models based on factors like usage volume, data transfer, and feature-specific pricing.
3. Focus on the services that have unexpected workload costs and that do not align with your expected usage and business outcome: Identify outliers or services where the cost is not proportional to the value or usage, using AWS Cost Explorer or the AWS Cost and Usage Report. It is important to correlate costs with business outcomes to prioritize optimization efforts.
4. Use AWS Cost Explorer, CloudWatch Logs, VPC Flow Logs, and Amazon S3 Storage Lens to understand the root cause of those high costs: These tools are instrumental in diagnosing high costs. Each service offers a different lens through which to view and analyze usage and costs. For instance, Cost Explorer helps determine overall cost trends, CloudWatch Logs provides operational insights, VPC Flow Logs displays IP traffic, and Amazon S3 Storage Lens is useful for storage analytics.
5. Use AWS Budgets to set budgets for certain amounts for services or accounts: Setting budgets is a proactive way to manage costs. Use AWS Budgets to set custom budget thresholds and receive alerts when costs exceed those thresholds.
6. Configure Amazon CloudWatch alarms to send billing and usage alerts: Set up monitoring and alerts for cost and usage metrics. CloudWatch alarms can notify you when certain thresholds are breached, which improves intervention response time.

A strategic review of all workload components, irrespective of their present attributes, can facilitate notable enhancements and financial savings over time. The effort invested in this review process should be deliberate, with careful consideration of the potential advantages that might be realized.

### Implementation steps

1. List the workload components: Build a list of your workload's components. Use this list to verify that each component was analyzed. The effort spent should reflect the criticality to the workload as defined by your organization's priorities. Group resources together functionally to improve efficiency (for example, production database storage, if there are multiple databases).
2. Prioritize the component list: Take the component list and prioritize it in order of effort. This is typically in order of the cost of the component, from most expensive to least expensive, or by criticality as defined by your organization's priorities.
3. Perform the analysis: For each component on the list, review the options and services available, and choose the option that aligns best with your organizational priorities.
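As a minimal sketch of step 1 of the review workflow above (the date range is a placeholder), grouping last month's unblended cost by service shows where costs originate and helps rank analysis effort by spend:

```python
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-07-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Rank services by cost so analysis effort can follow the spend.
groups = response["ResultsByTime"][0]["Groups"]
for group in sorted(
    groups, key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]), reverse=True
)[:10]:
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{group['Keys'][0]}: ${amount:,.2f}")
```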

๐Ÿ’ผ COST05-BP03 Perform a thorough analysis of each component

Look at overall cost to the organization of each component. Calculate the total cost of ownership by factoring in cost of operations and management, especially when using managed services by cloud provider. The review effort should reflect potential benefit (for example, time spent analyzing is proportional to component cost). **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Consider the time savings that will allow your team to focus on retiring technical debt, innovation, value-adding features and building what differentiates the business. For example, you might need to lift and shift (also known as rehost) your databases from your on-premises environment to the cloud as rapidly as possible and optimize later. It is worth exploring the possible savings attained by using managed services on AWS that may remove or reduce license costs. Managed services on AWS remove the operational and administrative burden of maintaining a service, such as patching or upgrading the OS, and allow you to focus on innovation and business. Since managed services operate at cloud scale, they can offer a lower cost per transaction or service. You can make potential optimizations in order to achieve some tangible benefit, without changing the core architecture of the application. For example, you may be looking to reduce the amount of time you spend managing database instances by migrating to a database-as-a-service platform like Amazon Relational Database Service (Amazon RDS) or migrating your application to a fully managed platform like AWS Elastic Beanstalk. Usually, managed services have attributes that you can set to ensure sufficient capacity. You must set and monitor these attributes so that your excess capacity is kept to a minimum and performance is maximized. You can modify the attributes of AWS Managed Services using the AWS Management Console or AWS APIs and SDKs to align resource needs with changing demand. For example, you can increase or decrease the number of nodes on an Amazon EMR cluster (or an Amazon Redshift cluster) to scale out or in. You can also pack multiple instances on an AWS resource to activate higher density usage. For example, you can provision multiple small databases on a single Amazon Relational Database Service (Amazon RDS) database instance. As usage grows, you can migrate one of the databases to a dedicated Amazon RDS database instance using a snapshot and restore process. When provisioning workloads on managed services, you must understand the requirements of adjusting the service capacity. These requirements are typically time, effort, and any impact to normal workload operation. The provisioned resource must allow time for any changes to occur, provision the required overhead to allow this. The ongoing effort required to modify services can be reduced to virtually zero by using APIs and SDKs that are integrated with system and monitoring tools, such as Amazon CloudWatch. Amazon RDS, Amazon Redshift, and Amazon ElastiCache provide a managed database service. Amazon Athena, Amazon EMR, and Amazon OpenSearch Service provide a managed analytics service. AMS is a service that operates AWS infrastructure on behalf of enterprise customers and partners. It provides a secure and compliant environment that you can deploy your workloads onto. AMS uses enterprise cloud operating models with automation to allow you to meet your organization requirements, move into the cloud faster, and reduce your on-going management costs. 
### Implementation steps 1. Perform a thorough analysis: Using the component list, work through each component from the highest priority to the lowest priority. For the higher priority and more costly components, perform additional analysis and assess all available options and their long-term impact. For lower priority components, assess whether changes in usage would change the priority of the component, and then perform an analysis of appropriate effort. 2. Compare managed and unmanaged resources: Consider the operational cost of the resources you manage and compare it with AWS managed resources. For example, review your databases running on Amazon EC2 instances and compare them with Amazon RDS options (an AWS managed service), or compare Amazon EMR with running Apache Spark on Amazon EC2. When moving from a self-managed workload to an AWS fully managed workload, research your options carefully. The three most important factors to consider are the type of managed service you want to use, the process you will use to migrate your data, and your understanding of the AWS shared responsibility model.
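To make the point above about adjusting managed service capacity through APIs concrete, the following is a minimal boto3 sketch that resizes an instance group on an existing Amazon EMR cluster. The cluster ID, instance group ID, and target instance count are placeholders chosen for illustration, not values from this document.

```python
import boto3

# Minimal sketch: adjust the capacity of a managed service (Amazon EMR) via the API.
# The cluster ID, instance group ID, and target count below are placeholders.
emr = boto3.client("emr")

emr.modify_instance_groups(
    ClusterId="j-EXAMPLECLUSTER",                  # placeholder cluster ID
    InstanceGroups=[
        {
            "InstanceGroupId": "ig-EXAMPLEGROUP",  # placeholder instance group ID
            "InstanceCount": 3,                    # scale the group out or in to match demand
        }
    ],
)
```

Wrapping a call like this in scheduled or CloudWatch-driven automation keeps excess capacity to a minimum without manual effort.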

๐Ÿ’ผ COST05-BP04 Select software with cost-effective licensing

Open-source software eliminates software licensing costs, which can contribute significant cost to workloads. Where licensed software is required, avoid licenses bound to arbitrary attributes such as CPUs; instead, look for licenses that are bound to output or outcomes. The cost of these licenses scales more closely to the benefit they provide. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Open source originated in the context of software development to indicate that the software complies with certain free distribution criteria. Open source software is composed of source code that anyone can inspect, modify, and enhance. Based on business requirements, skill of engineers, forecasted usage, or other technology dependencies, organizations can consider using open source software on AWS to minimize their license costs. In other words, the cost of software licenses can be reduced through the use of open source software. This can have a significant impact on workload costs as the size of the workload scales. Measure the benefits of licensed software against the total cost to optimize your workload. Model any changes in licensing and how they would impact your workload costs. If a vendor changes the cost of your database license, investigate how that impacts the overall efficiency of your workload. Consider historical pricing announcements from your vendors for trends of licensing changes across their products. Licensing costs may also scale independently of throughput or usage, such as licenses that scale by hardware (CPU-bound licenses). These licenses should be avoided because costs can rapidly increase without corresponding outcomes. For instance, operating an Amazon EC2 instance in us-east-1 with a Linux operating system can cut costs by approximately 45% compared to running an equivalent Amazon EC2 instance on Windows. The AWS Pricing Calculator offers a comprehensive way to compare the costs of various resources with different license options, such as Amazon RDS instances and different database engines. Additionally, AWS Cost Explorer provides an invaluable perspective on the costs of existing workloads, especially those that come with different licenses. For license management, AWS License Manager offers a streamlined method to oversee and handle software licenses. Customers can deploy and operationalize their preferred open source software in the AWS Cloud. ### Implementation steps 1. Analyze license options: Review the licensing terms of available software. Look for open source versions that have the required functionality, and determine whether the benefits of licensed software outweigh the cost. Favorable terms align the cost of the software to the benefits it provides. 2. Analyze the software provider: Review any historical pricing or licensing changes from the vendor. Look for any changes that do not align to outcomes, such as punitive terms for running on a specific vendor's hardware or platforms. Additionally, look at how they perform audits and what penalties could be imposed.
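To ground the Linux versus Windows cost comparison mentioned above, the following is a minimal sketch using the AWS Price List (Pricing) API through boto3. The instance type, Region, and filter values are example assumptions, and the structure-walking helper reflects the typical shape of the returned price list JSON.

```python
import json
import boto3

# Minimal sketch: compare On-Demand pricing for the same instance type with
# different operating systems using the AWS Price List API (served from us-east-1).
pricing = boto3.client("pricing", region_name="us-east-1")

def ondemand_products(operating_system: str):
    resp = pricing.get_products(
        ServiceCode="AmazonEC2",
        Filters=[
            {"Type": "TERM_MATCH", "Field": "instanceType", "Value": "m5.large"},
            {"Type": "TERM_MATCH", "Field": "location", "Value": "US East (N. Virginia)"},
            {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": operating_system},
            {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
            {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
            {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
        ],
        MaxResults=10,
    )
    # Each PriceList entry is a JSON document describing one product and its terms.
    return [json.loads(item) for item in resp["PriceList"]]

def hourly_usd(product: dict) -> str:
    # Walk the nested OnDemand terms to pull out the hourly USD rate.
    terms = product["terms"]["OnDemand"]
    dims = next(iter(terms.values()))["priceDimensions"]
    return next(iter(dims.values()))["pricePerUnit"]["USD"]

for os_name in ("Linux", "Windows"):
    for product in ondemand_products(os_name):
        print(os_name, hourly_usd(product), "USD per hour")
```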

๐Ÿ’ผ COST05-BP05 Select components of this workload to optimize cost in line with organization priorities

Factor in cost when selecting all components for your workload. This includes using application-level and managed services or serverless, containers, or event-driven architecture to reduce overall cost. Minimize license costs by using open-source software, software that does not have license fees, or alternatives to reduce spending. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Consider the cost of services and options when selecting all components. This includes using application-level and managed services, such as Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, Amazon Simple Notification Service (Amazon SNS), and Amazon Simple Email Service (Amazon SES), to reduce overall organization cost. Use serverless and containers for compute, such as AWS Lambda, and use Amazon Simple Storage Service (Amazon S3) for static websites. Containerize your application if possible and use AWS managed container services such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Minimize license costs by using open-source software or software that does not have license fees (for example, use Amazon Linux for compute workloads or migrate databases to Amazon Aurora). You can use serverless or application-level services such as Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon SNS, and Amazon SES. These services remove the need for you to manage a resource and provide the functions of code execution, queuing, and message delivery. The other benefit is that they scale in performance and cost in line with usage, allowing efficient cost allocation and attribution. Using event-driven architecture is also possible with serverless services. Event-driven architectures are push-based, so everything happens on demand as the event presents itself in the router. This way, you're not paying for continuous polling to check for an event. This means less network bandwidth consumption, less CPU utilization, less idle fleet capacity, and fewer SSL/TLS handshakes. ### Implementation steps 1. Select each service to optimize cost: Using your prioritized list and analysis, select each option that provides the best match with your organizational priorities. Instead of increasing capacity to meet demand, consider other options that may give you better performance at lower cost. For example, if you need to handle higher expected traffic to your databases on AWS, consider either increasing the instance size or using Amazon ElastiCache (for Redis or Memcached) to provide a caching layer in front of your databases. 2. Evaluate event-driven architecture: Using serverless architecture also allows you to build event-driven architecture for distributed microservice-based applications, which helps you build scalable, resilient, agile, and cost-effective solutions.

๐Ÿ’ผ COST05-BP06 Perform cost analysis for different usage over time

Workloads can change over time. Some services or features are more cost effective at different usage levels. By performing the analysis on each component over time and at projected usage, the workload remains cost-effective over its lifetime. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance As AWS releases new services and features, the optimal services for your workload may change. The effort required should reflect potential benefits. Workload review frequency depends on your organization's requirements. If it is a workload of significant cost, implementing new services sooner will maximize cost savings, so more frequent review can be advantageous. Another trigger for review is a change in usage patterns. Significant changes in usage can indicate that alternate services would be more optimal. If you need to move data into the AWS Cloud, you can choose from a wide variety of AWS services and partner tools to help you migrate your data sets, whether they are files, databases, machine images, block volumes, or even tape backups. For example, to move a large amount of data to and from AWS or process data at the edge, you can use one of the AWS purpose-built devices to cost-effectively move petabytes of data offline. As another example, for higher data transfer rates, AWS Direct Connect may be cheaper than a VPN while providing the consistent connectivity your business requires. Based on the cost analysis for different usage over time, review your scaling activity. Analyze the results to see if the scaling policy can be tuned to add instances with multiple instance types and purchase options. Review your settings to see if the minimum can be reduced to serve user requests with a smaller fleet size, and add more resources to meet the expected high demand. Perform cost analysis for different usage over time by discussing with stakeholders in your organization, and use AWS Cost Explorer's forecast feature to predict the potential impact of service changes (a minimal example follows the implementation steps below). Monitor usage levels using AWS Budgets, CloudWatch billing alarms, and AWS Cost Anomaly Detection to identify and implement the most cost-effective services sooner. ### Implementation steps 1. Define predicted usage patterns: Working with your organization, such as marketing and product owners, document what the expected and predicted usage patterns will be for the workload. Discuss with business stakeholders both historical and forecasted cost and usage increases, and make sure increases align with business requirements. Identify calendar days, weeks, or months where you expect more users to use your AWS resources, which indicate that you should increase the capacity of the existing resources or adopt additional services to reduce cost and increase performance. 2. Perform cost analysis at predicted usage: Using the usage patterns defined, perform analysis at each of these points. The analysis effort should reflect the potential outcome. For example, if the change in usage is large, a thorough analysis should be performed to verify any costs and changes. In other words, when cost increases, business usage should increase as well.
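As a minimal sketch of the Cost Explorer forecast capability referenced above, assuming boto3 credentials with Cost Explorer access; the dates are placeholders and the start date must not be in the past.

```python
import boto3

# Minimal sketch: forecast the next three months of cost with Cost Explorer.
# Dates are placeholders; the Start date must be today or later.
ce = boto3.client("ce")

forecast = ce.get_cost_forecast(
    TimePeriod={"Start": "2025-01-01", "End": "2025-04-01"},
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)

print("Forecast total:", forecast["Total"]["Amount"], forecast["Total"]["Unit"])
for period in forecast["ForecastResultsByTime"]:
    print(period["TimePeriod"]["Start"], period["MeanValue"])
```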

๐Ÿ’ผ COST06-BP01 Perform cost modeling

Identify organization requirements (such as business needs and existing commitments) and perform cost modeling (overall costs) of the workload and each of its components. Perform benchmark activities for the workload under different predicted loads and compare the costs. The modeling effort should reflect the potential benefit. For example, time spent is proportional to component cost. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Perform cost modelling for your workload and each of its components to understand the balance between resources, and find the correct size for each resource in the workload, given a specific level of performance. Understanding cost considerations can inform your organizational business case and decision- making process when evaluating the value realization outcomes for planned workload deployment. Perform benchmark activities for the workload under different predicted loads and compare the costs. The modelling effort should reflect potential benefit; for example, time spent is proportional to component cost or predicted saving. For best practices, refer to the Review section of the Performance Efficiency Pillar of the AWS Well-Architected Framework. As an example, to create cost modeling for a workload consisting of compute resources, AWS Compute Optimizer can assist with cost modelling for running workloads. It provides right-sizing recommendations for compute resources based on historical usage. Make sure CloudWatch Agents are deployed to the Amazon EC2 instances to collect memory metrics which help you with more accurate recommendations within AWS Compute Optimizer. This is the ideal data source for compute resources because it is a free service that uses machine learning to make multiple recommendations depending on levels of risk. There are multiple services you can use with custom logs as data sources for rightsizing operations for other services and workload components, such as AWS Trusted Advisor, Amazon CloudWatch and Amazon CloudWatch Logs. AWS Trusted Advisor checks resources and flags resources with low utilization which can help you right size your resources and create cost modelling. The following are recommendations for cost modelling data and metrics: - The monitoring must accurately reflect the user experience. Select the correct granularity for the time period and thoughtfully choose the maximum or 99th percentile instead of the average. - Select the correct granularity for the time period of analysis that is required to cover any workload cycles. For example, if a two-week analysis is performed, you might be overlooking a monthly cycle of high utilization, which could lead to under-provisioning. - Choose the right AWS services for your planned workload by considering your existing commitments, selected pricing models for other workloads, and ability to innovate faster and focus on your core business value. ### Implementation steps - Perform cost modeling for resources: Deploy the workload or a proof of concept into a separate account with the specific resource types and sizes to test. Run the workload with the test data and record the output results, along with the cost data for the time the test was run. Afterwards, redeploy the workload or change the resource types and sizes and run the test again. Include license fees of any products you may use with these resources and estimated operations (labor or engineer) costs for deploying and managing these resources while creating cost modeling. 
Consider cost modeling for a period (hourly, daily, monthly, yearly, or over three years).
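The following is a minimal boto3 sketch of pulling the AWS Compute Optimizer rightsizing recommendations described above, assuming the account has already opted in to Compute Optimizer and has sufficient metric history; the field names reflect the service's typical response shape.

```python
import boto3

# Minimal sketch: list EC2 rightsizing recommendations from AWS Compute Optimizer.
# Assumes the account has opted in to Compute Optimizer and has metric history.
co = boto3.client("compute-optimizer")

resp = co.get_ec2_instance_recommendations(maxResults=20)
for rec in resp["instanceRecommendations"]:
    print(rec["instanceArn"], rec["finding"], "current:", rec["currentInstanceType"])
    for option in rec["recommendationOptions"]:
        print("  candidate:", option["instanceType"])
```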

๐Ÿ’ผ COST06-BP02 Select resource type, size, and number based on data

Select resource size or type based on data about the workload and resource characteristics. For example, compute, memory, throughput, or write intensive. This selection is typically made using a previous (on-premises) version of the workload, using documentation, or using other sources of information about the workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Amazon EC2 provides a wide selection of instance types with different levels of CPU, memory, storage, and networking capacity to fit different use cases. These instance types feature different blends of CPU, memory, storage, and networking capabilities, giving you versatility when selecting the right resource combination for your projects. Every instance type comes in multiple sizes, so that you can adjust your resources based on your workloadโ€™s demands. To determine which instance type you need, gather details about the system requirements of the application or software that you plan to run on your instance. These details should include the following: - Operating system - Number of CPU cores - GPU cores - Amount of system memory (RAM) - Storage type and space - Network bandwidth requirement Identify the purpose of compute requirements and which instance is needed, and then explore the various Amazon EC2 instance families. Amazon offers the following instance type families: - General Purpose - Compute Optimized - Memory Optimized - Storage Optimized - Accelerated Computing - HPC Optimized System requirements gathering is critical for you to select the specific instance family and instance type that best serves your needs. Instance type names are comprised of the family name and the instance size. For example, the t2.micro instance is from the T2 family and is micro-sized. Select resource size or type based on workload and resource characteristics (for example, compute, memory, throughput, or write intensive). This selection is typically made using cost modelling, a previous version of the workload (such as an on-premises version), using documentation, or using other sources of information about the workload (whitepapers or published solutions). Using AWS pricing calculators or cost management tools can assist in making informed decisions about instance types, sizes, and configurations. ### Implementation steps - Select resources based on data: Use your cost modeling data to select the anticipated workload usage level, and choose the specified resource type and size. Relying on the cost modeling data, determine the number of virtual CPUs, total memory (GiB), the local instance store volume (GB), Amazon EBS volumes, and the network performance level, taking into account the data transfer rate required for the instance. Always make selections based on detailed analysis and accurate data to optimize performance while managing costs effectively.
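As a minimal sketch of turning the gathered system requirements above into candidate instance types, the following filters the EC2 instance type catalog by vCPU count and memory; the thresholds (4 vCPUs, 16 GiB) are assumed example values.

```python
import boto3

# Minimal sketch: find current-generation instance types that satisfy assumed
# requirements of 4 vCPUs and at least 16 GiB of memory.
ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instance_types")

for page in paginator.paginate(
    Filters=[{"Name": "current-generation", "Values": ["true"]}]
):
    for itype in page["InstanceTypes"]:
        vcpus = itype["VCpuInfo"]["DefaultVCpus"]
        mem_gib = itype["MemoryInfo"]["SizeInMiB"] / 1024
        if vcpus == 4 and mem_gib >= 16:
            print(itype["InstanceType"], vcpus, "vCPUs,", mem_gib, "GiB")
```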

๐Ÿ’ผ COST06-BP03 Select resource type, size, and number automatically based on metrics

Use metrics from the currently running workload to select the right size and type to optimize for cost. Appropriately provision throughput, sizing, and storage for compute, storage, data, and networking services. This can be done with a feedback loop such as automatic scaling or by custom code in the workload. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Create a feedback loop within the workload that uses active metrics from the running workload to make changes to that workload. You can use a managed service, such as AWS Auto Scaling, which you configure to perform the right sizing operations for you. AWS also provides APIs, SDKs, and features that allow resources to be modified with minimal effort. You can program a workload to stop-and-start an Amazon EC2 instance to allow a change of instance size or instance type. This provides the benefits of right-sizing while removing almost all the operational cost required to make the change. Some AWS services have built in automatic type or size selection, such as Amazon Simple Storage Service Intelligent-Tiering. Amazon S3 Intelligent- Tiering automatically moves your data between two access tiers, frequent access and infrequent access, based on your usage patterns. ### Implementation steps 1. Increase your observability by configuring workload metrics: Capture key metrics for the workload. These metrics provide an indication of the customer experience, such as workload output, and align to the differences between resource types and sizes, such as CPU and memory usage. For compute resource, analyze performance data to right size your Amazon EC2 instances. Identify idle instances and ones that are underutilized. Key metrics to look for are CPU usage and memory utilization (for example, 40% CPU utilization at 90% of the time as explained in Rightsizing with AWS Compute Optimizer and Memory Utilization Enabled). Identify instances with a maximum CPU usage and memory utilization of less than 40% over a four-week period. These are the instances to right size to reduce costs. For storage resources such as Amazon S3, you can use Amazon S3 Storage Lens, which allows you to see 28 metrics across various categories at the bucket level, and 14 days of historical data in the dashboard by default. You can filter your Amazon S3 Storage Lens dashboard by summary and cost optimization or events to analyze specific metrics. 2. View rightsizing recommendations: Use the rightsizing recommendations in AWS Compute Optimizer and the Amazon EC2 rightsizing tool in the Cost Management console, or review AWS Trusted Advisor right-sizing your resources to make adjustments on your workload. It is important to use the right tools when right-sizing different resources and follow right-sizing guidelines whether it is an Amazon EC2 instance, AWS storage classes, or Amazon RDS instance types. For storage resources, you can use Amazon S3 Storage Lens, which gives you visibility into object storage usage, activity trends, and makes actionable recommendations to optimize costs and apply data protection best practices. Using the contextual recommendations that Amazon S3 Storage Lens derives from analysis of metrics across your organization, you can take immediate steps to optimize your storage. 3. Select resource type and size automatically based on metrics: Using the workload metrics, manually or automatically select your workload resources. 
For compute resources, configuring AWS Auto Scaling or implementing code within your application can reduce the effort required if frequent changes are needed, and it can potentially implement changes sooner than a manual process. You can launch and automatically scale a fleet of On-Demand Instances and Spot Instances within a single Auto Scaling group. In addition to receiving discounts for using Spot Instances, you can use Reserved Instances or a Savings Plan to receive discounted rates on the regular On-Demand Instance pricing. All of these factors combined help you optimize your cost savings for Amazon EC2 instances and determine the desired scale and performance for your application. You can also use an attribute-based instance type selection (ABS) strategy in Auto Scaling groups (ASGs), which lets you express your instance requirements as a set of attributes, such as vCPU, memory, and storage. You can automatically use newer generation instance types when they are released and access a broader range of capacity with Amazon EC2 Spot Instances. Amazon EC2 Fleet and Amazon EC2 Auto Scaling select and launch instances that fit the specified attributes, removing the need to manually pick instance types. For storage resources, you can use the Amazon S3 Intelligent-Tiering and Amazon EFS Infrequent Access features, which automatically select storage classes that deliver storage cost savings when data access patterns change, without performance impact or operational overhead.
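To show the metric-driven feedback loop described above in code, here is a minimal sketch that attaches a target tracking policy to an existing Auto Scaling group so capacity follows average CPU utilization; the group name and target value are assumptions.

```python
import boto3

# Minimal sketch: let an Auto Scaling group adjust instance count automatically
# to hold average CPU utilization near a target. Group name and target are placeholders.
autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-workload-asg",   # placeholder group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # assumed target; tune from observed workload metrics
    },
)
```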

๐Ÿ’ผ COST06-BP04 Consider using shared resources

For already-deployed services at the organization level for multiple business units, consider using shared resources to increase utilization and reduce total cost of ownership (TCO). Using shared resources can be a cost-effective option to centralize management and costs by using existing solutions, sharing components, or both. Manage common functions like monitoring, backups, and connectivity either within an account boundary or in a dedicated account. You can also reduce cost by implementing standardization, reducing duplication, and reducing complexity. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Where multiple workloads perform the same function, use existing solutions and shared components to improve management and optimize costs. Consider using existing resources (especially shared ones), such as non-production database servers or directory services, to reduce cloud costs, while following security best practices and organizational regulations. For optimal value realization and efficiency, it is crucial to allocate costs back (using showback and chargeback) to the pertinent areas of the business driving consumption. Showback refers to reports that break down cloud costs into attributable categories, such as consumers, business units, general ledger accounts, or other responsible entities. The goal of showback is to show teams, business units, or individuals the cost of their consumed cloud resources. Chargeback means to allocate central service spend to cost units based on a strategy suitable for a specific financial management process. Chargeback allocates the cost incurred in a shared services account to the financial cost categories suitable for a customer reporting process. By establishing chargeback mechanisms, you can report costs incurred by different business units, products, and teams. Workloads can be categorized as critical and non-critical. Based on this classification, use shared resources with general configurations for less critical workloads. To further optimize costs, reserve dedicated servers solely for critical workloads. Share resources or provision them across several accounts to manage them efficiently. Even with distinct development, testing, and production environments, secure sharing is feasible and does not compromise organizational structure. To improve your understanding and optimize cost and usage for containerized applications, use split cost allocation data, which helps you allocate costs to individual business entities based on how the application consumes shared compute and memory resources. Split cost allocation data helps you achieve task-level showback and chargeback in container workloads running on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). For distributed architectures, build a shared services VPC, which provides centralized access to shared services required by workloads in each of the VPCs. These shared services can include resources such as directory services or VPC endpoints. To reduce administrative overhead and cost, share resources from a central location instead of building them in each VPC. When you use shared resources, you can save on operational costs, maximize resource utilization, and improve consistency. In a multi-account design, you can host some AWS services centrally and access them from several applications and accounts in a hub to save cost.
You can use AWS Resource Access Manager (AWS RAM) to share other common resources, such as VPC subnets and AWS Transit Gateway attachments, AWS Network Firewall, or Amazon SageMaker AI pipelines. In a multi-account environment, use AWS RAM to create a resource once and share it with other accounts (see the sketch below). Organizations should tag shared costs effectively and verify that they do not have a significant portion of their costs untagged or unallocated. If you do not allocate shared costs effectively and no one takes accountability for shared costs management, shared cloud costs can spiral. You should know where you have incurred costs at the resource, workload, team, or organization level, as this knowledge enhances your understanding of the value delivered at the applicable level when compared to the business outcomes achieved. Ultimately, organizations benefit from cost savings as a result of sharing cloud infrastructure. Encourage cost allocation on shared cloud resources to optimize cloud spend. ### Implementation steps 1. Evaluate existing resources: Review existing workloads that use similar services for your workload. Depending on the workload's components, consider existing platforms if business logic or technical requirements allow. 2. Use resource sharing in AWS RAM and restrict accordingly: Use AWS RAM to share resources with other AWS accounts within your organization. When you share resources, you don't need to duplicate resources in multiple accounts, which minimizes the operational burden of resource maintenance. This process also helps you securely share the resources that you have created with roles and users in your account, as well as with other AWS accounts. 3. Tag resources: Tag resources that are candidates for cost reporting and categorize them within cost categories. Activate these cost-related resource tags for cost allocation to provide visibility of AWS resource usage. Focus on creating an appropriate level of granularity with respect to cost and usage visibility, and influence cloud consumption behaviors through cost allocation reporting and KPI tracking.
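A minimal boto3 sketch of sharing a resource across accounts with AWS RAM, as described above; the share name, subnet ARN, and organization ARN are placeholders.

```python
import boto3

# Minimal sketch: share a VPC subnet with the rest of the organization using AWS RAM.
# All ARNs and names below are placeholders.
ram = boto3.client("ram")

ram.create_resource_share(
    name="shared-network-subnets",
    resourceArns=[
        "arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0123456789abcdef0"
    ],
    principals=[
        "arn:aws:organizations::111122223333:organization/o-exampleorgid"
    ],
    allowExternalPrincipals=False,  # keep sharing inside the organization
)
```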

๐Ÿ’ผ COST07-BP01 Perform pricing model analysis

Analyze each component of the workload. Determine if the component and resources will be running for extended periods (for commitment discounts) or are dynamic and short-running (for Spot or On-Demand). Perform an analysis on the workload using the recommendations in cost management tools and apply business rules to those recommendations to achieve high returns. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance AWS has multiple pricing models that allow you to pay for your resources in the most cost-effective way that suits your organization's needs, depending on the product. Work with your teams to determine the most appropriate pricing model. Often your pricing model consists of a combination of multiple options, as determined by your availability requirements. **On-Demand Instances** allow you to pay for compute or database capacity by the hour or by the second (60 seconds minimum) depending on which instances you run, without long-term commitments or upfront payments. **Savings Plans** are a flexible pricing model that offers low prices on Amazon EC2, Lambda, and AWS Fargate usage, in exchange for a commitment to a consistent amount of usage (measured in dollars per hour) over a one-year or three-year term. **Spot Instances** are an Amazon EC2 pricing mechanism that allows you to request spare compute capacity at a discounted hourly rate (up to 90% off the On-Demand price) without an upfront commitment. **Reserved Instances** allow a discount of up to 75 percent in exchange for a one- or three-year commitment. For more details, see Optimizing costs with reservations. You might choose to include a Savings Plan for the resources associated with the production, quality, and development environments. Alternatively, because sandbox resources are only powered on when needed, you might choose an On-Demand model for the resources in that environment. Use Amazon EC2 Spot Instances to reduce Amazon EC2 costs, or use Compute Savings Plans to reduce Amazon EC2, Fargate, and Lambda cost. The AWS Cost Explorer recommendations tool provides opportunities for commitment discounts with Savings Plans. If you have been purchasing Reserved Instances for Amazon EC2 in the past or have established cost allocation practices inside your organization, you can continue using Amazon EC2 Reserved Instances for the time being. However, we recommend working on a strategy to use Savings Plans in the future as a more flexible cost savings mechanism. You can refresh Savings Plans (SP) Recommendations in AWS Cost Management to generate new Savings Plans Recommendations at any time. Use Reserved Instances (RIs) to reduce Amazon RDS, Amazon Redshift, Amazon ElastiCache, and Amazon OpenSearch Service costs. Savings Plans and Reserved Instances are available in three payment options: all upfront, partial upfront, and no upfront. Use the recommendations provided in AWS Cost Explorer RI and SP purchase recommendations. To find opportunities for Spot workloads, use an hourly view of your overall usage, and look for regular periods of changing usage or elasticity. You can use Spot Instances for various fault-tolerant and flexible applications. Examples include stateless web servers, API endpoints, big data and analytics applications, containerized workloads, CI/CD, and other flexible workloads. Analyze whether your Amazon EC2 and Amazon RDS instances can be turned off when not in use (for example, after hours and on weekends). This approach can reduce costs by 70% or more compared to running them 24/7.
If you have Amazon Redshift clusters that only need to be available at specific times, you can pause the cluster and later resume it. When the Amazon Redshift cluster or Amazon EC2 and Amazon RDS Instance is stopped, the compute billing halts and only the storage charge applies. Note that On-Demand Capacity reservations (ODCR) are not a pricing discount. Capacity Reservations are charged at the equivalent On-Demand rate, whether you run instances in reserved capacity or not. They should be considered when you need to provide enough capacity for the resources you plan to run. ODCRs don't have to be tied to long-term commitments, as they can be cancelled when you no longer need them, but they can also benefit from the discounts that Savings Plans or Reserved Instances provide. ### Implementation steps 1. Analyze workload elasticity: Using the hourly granularity in Cost Explorer or a custom dashboard, analyze your workload's elasticity. Look for regular changes in the number of instances that are running. Short duration instances are candidates for Spot Instances or Spot Fleet. 2. Review existing pricing contracts: Review current contracts or commitments for long term needs. Analyze what you currently have and how much those commitments are in use. Leverage pre-existing contractual discounts or enterprise agreements. Enterprise Agreements give customers the option to tailor agreements that best suit their needs. For long term commitments, consider reserved pricing discounts, Reserved Instances or Savings Plans for the specific instance type, instance family, AWS Region, and Availability Zones. 3. Perform a commitment discount analysis: Using Cost Explorer in your account, review the Savings Plans and Reserved Instance recommendations.
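As a minimal sketch of the commitment discount analysis in the implementation steps above, assuming Cost Explorer access; the term, payment option, and lookback period are example choices, and the summary fields shown reflect the typical response shape.

```python
import boto3

# Minimal sketch: retrieve Compute Savings Plans purchase recommendations.
# Term, payment option, and lookback period are example parameter choices.
ce = boto3.client("ce")

rec = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",
)

summary = rec["SavingsPlansPurchaseRecommendation"].get(
    "SavingsPlansPurchaseRecommendationSummary", {}
)
print("Estimated monthly savings:", summary.get("EstimatedMonthlySavingsAmount"))
print("Hourly commitment to purchase:", summary.get("HourlyCommitmentToPurchase"))
```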

๐Ÿ’ผ COST07-BP02 Choose Regions based on cost

Resource pricing may be different in each Region. Identify Regional cost differences, and only deploy in higher-cost Regions when required to meet latency, data residency, or data sovereignty requirements. Factoring in Region cost helps you pay the lowest overall price for this workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The AWS Cloud infrastructure is global, hosted in multiple locations worldwide, and built around AWS Regions, Availability Zones, Local Zones, AWS Outposts, and Wavelength Zones. A Region is a physical location in the world, and each Region is a separate geographic area where AWS has multiple Availability Zones. Availability Zones, which are multiple isolated locations within each Region, consist of one or more discrete data centers, each with redundant power, networking, and connectivity. Each AWS Region operates within local market conditions, and resource pricing is different in each Region due to differences in the cost of land, fiber, electricity, and taxes, for example. Choose a specific Region to operate a component of your solution, or your entire solution, so that you can run at the lowest possible price globally. Use the AWS Pricing Calculator to estimate the costs of your workload in various Regions by searching services by location type (Region, Wavelength Zone, and Local Zone) and Region. When you architect your solutions, a best practice is to seek to place computing resources closer to users to provide lower latency and strong data sovereignty. Select the geographic location based on your business, data privacy, performance, and security requirements. For applications with global end users, use multiple locations. Use Regions that provide lower prices for AWS services to deploy your workloads if you have no obligations in data privacy, security, and business requirements. For example, if your default Region is Asia Pacific (Sydney) (ap-southeast-2), and there are no restrictions (such as data privacy or security) preventing you from using other Regions, deploying non-critical (development and test) Amazon EC2 instances in US East (N. Virginia) (us-east-1) will cost you less. ### Implementation steps 1. Review AWS Region pricing: Analyze the workload costs in the current Region. Starting with the highest costs by service and usage type, calculate the costs in other Regions that are available. If the forecasted saving outweighs the cost of moving the component or workload, migrate to the new Region. 2. Review requirements for multi-Region deployments: Analyze your business requirements and obligations (data privacy, security, or performance) to find out if there are any restrictions that prevent you from using multiple Regions. If there are no obligations restricting you to a single Region, then use multiple Regions. 3. Analyze required data transfer: Consider data transfer costs when selecting Regions. Keep your data close to your customers and close to the resources. Select less costly AWS Regions where data flows and where there is minimal data transfer. Depending on your business requirements for data transfer, you can use Amazon CloudFront, AWS PrivateLink, AWS Direct Connect, and AWS Virtual Private Network to reduce your networking costs, improve performance, and enhance security.
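To see where workload spend actually lands by Region before comparing Regional prices, here is a minimal Cost Explorer sketch; the time period values are placeholders.

```python
import boto3

# Minimal sketch: break last month's unblended cost down by AWS Region.
# The time period values are placeholders.
ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "REGION"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    region = group["Keys"][0]
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(region, amount, "USD")
```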

๐Ÿ’ผ COST07-BP03 Select third-party agreements with cost-efficient terms

Cost efficient agreements and terms ensure the cost of these services scales with the benefits they provide. Select agreements and pricing that scale when they provide additional benefits to your organization. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance There are multiple products on the market that can help you manage costs in your cloud environments. They may have some differences in terms of features that depend on customer requirements, such as some focusing on cost governance or cost visibility and others on cost optimization. One key factor for effective cost optimization and governance is using the right tool with necessary features and the right pricing model. These products have different pricing models. Some charge you a certain percentage of your monthly bill, while others charge a percentage of your realized savings. Ideally, you should pay only for what you need. When you use third-party solutions or services in the cloud, it's important that the pricing structures are aligned to your desired outcomes. Pricing should scale with the outcomes and value it provides. For example, in software that takes a percentage of savings it provides, the more you save (outcome), the more it charges. License agreements where you pay more as your expenses increase might not always be in your best interest for optimizing costs. However, if the vendor offers clear benefits for all parts of your bill, this scaling fee might be justified. For example, a solution that provides recommendations for Amazon EC2 and charges a percentage of your entire bill can become more expensive if you use other services that provide no benefit. Another example is a managed service that is charged at a percentage of the cost of managed resources. A larger instance size may not necessarily require more management effort, but can be charged more. Verify that these service pricing arrangements include a cost optimization program or features in their service to drive efficiency. Customers may find these products on the market more advanced or easier to use. You need to consider the cost of these products and think about potential cost optimization outcomes in the long term. ### Implementation steps - Analyze third-party agreements and terms: Review the pricing in third-party agreements. Perform modeling for different levels of your usage, and factor in new costs such as new service usage, or increases in current services due to workload growth. Decide if the additional costs provide the required benefits to your business.

๐Ÿ’ผ COST07-BP04 Implement pricing models for all components of this workload

Permanently running resources should use reserved capacity such as Savings Plans or Reserved Instances. Short-term capacity should be configured to use Spot Instances or Spot Fleet. On-Demand Instances are only used for short-term workloads that cannot be interrupted and do not run long enough for reserved capacity, between 25% and 75% of the period, depending on the resource type. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance To improve cost efficiency, AWS provides multiple commitment recommendations based on your past usage. You can use these recommendations to understand what you can save, and how the commitment will be used. You can use these services as On-Demand or Spot, or make a commitment for a certain period of time and reduce your On-Demand costs with Reserved Instances (RIs) and Savings Plans (SPs). To optimize your workload, you need to understand not only each workload component and the multiple AWS services involved, but also the commitment discounts, purchase options, and Spot Instances available for those services. Consider the requirements of your workload's components, and understand the different pricing models for these services. Define the availability requirement of these components. Determine if there are multiple independent resources that perform the function in the workload, and what the workload requirements are over time. Compare the cost of the resources using the default On-Demand pricing model and other applicable models. Factor in any potential changes in resources or workload components. For example, consider a web application architecture on AWS consisting of multiple AWS services, such as Amazon Route 53, AWS WAF, Amazon CloudFront, Amazon EC2 instances, Amazon RDS instances, load balancers, Amazon S3 storage, and Amazon Elastic File System (Amazon EFS). You need to review each of these services and identify potential cost saving opportunities with different pricing models: some of them may be eligible for RIs or SPs, while others may only be available On-Demand. ### Implementation steps 1. Implement pricing models: Using your analysis results, purchase Savings Plans, Reserved Instances, or implement Spot Instances. If it is your first commitment purchase, choose the top five or ten recommendations in the list, then monitor and analyze the results over the next month or two. The AWS Cost Management console guides you through the process. Review the RI or SP recommendations from the console, customize the recommendations (type, payment, and term), review the hourly commitment (for example, $20 per hour), and then add them to your cart. Discounts apply automatically to eligible usage. Purchase a small amount of commitment discounts in regular cycles (for example, every two weeks or monthly). Implement Spot Instances for workloads that can be interrupted or are stateless. Finally, select On-Demand Amazon EC2 instances and allocate resources for the remaining requirements. 2. Workload review cycle: Implement a review cycle for the workload that specifically analyzes pricing model coverage. Once the workload has the required coverage, purchase additional commitment discounts partially (every few months), or as your organization's usage changes.

๐Ÿ’ผ COST07-BP05 Perform pricing model analysis at the management account level

Check billing and cost management tools and see recommended discounts with commitments and reservations to perform regular analysis at the management account level. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Performing regular cost modeling helps you implement opportunities to optimize across multiple workloads. For example, if multiple workloads use On-Demand Instances at an aggregate level, the risk of change is lower, and implementing a commitment-based discount can achieve a lower overall cost. It is recommended to perform analysis in regular cycles of two weeks to one month. This allows you to make small adjustment purchases, so the coverage of your pricing models continues to evolve with your changing workloads and their components. Use the AWS Cost Explorer recommendations tool to find opportunities for commitment discounts in your management account. Recommendations at the management account level are calculated considering usage across all of the accounts in your AWS organization that have Reserved Instances (RI) or Savings Plans (SP). They're also calculated when discount sharing is activated to recommend a commitment that maximizes savings across accounts. While purchasing at the management account level optimizes for max savings in many cases, there may be situations where you might consider purchasing SPs at the linked account level, like when you want the discounts to apply first to usage in that particular linked account. Member account recommendations are calculated at the individual account level, to maximize savings for each isolated account. If your account owns both RI and SP commitments, they will be applied in this order: - Zonal RI - Standard RI - Convertible RI - Instance Savings Plan - Compute Savings Plan If you purchase an SP at the management account level, the savings will be applied based on highest to lowest discount percentage. SPs at the management account level look across all linked accounts and apply the savings wherever the discount will be the highest. If you wish to restrict where the savings are applied, you can purchase a Savings Plan at the linked account level and any time that account is running eligible compute services, the discount will be applied there first. When the account is not running eligible compute services, the discount will be shared across the other linked accounts under the same management account. Discount sharing is turned on by default, but can be turned off if needed. In a Consolidated Billing Family, Savings Plans are applied first to the owner account's usage, and then to other accounts' usage. This occurs only if you have sharing enabled. Your Savings Plans are applied to your highest savings percentage first. If there are multiple usages with equal savings percentages, Savings Plans are applied to the first usage with the lowest Savings Plans rate. Savings Plans continue to apply until there are no more remaining uses or your commitment is exhausted. Any remaining usage is charged at the On-Demand rates. You can refresh Savings Plans Recommendations in AWS Cost Management to generate new Savings Plans Recommendations at any time. After analyzing flexibility of instances, you can commit by following recommendations. 
Create cost modeling by analyzing the workloadโ€™s short-term costs with potential different resource options, analyzing AWS pricing models, and aligning them with your business requirements to find out total cost of ownership and cost optimization opportunities. ### Implementation steps Perform a commitment discount analysis: Use Cost Explorer in your account to review the Savings Plans and Reserved Instance recommendations. Make sure you understand Saving Plan recommendations, and estimate your monthly spend and monthly savings. Review recommendations at the management account level, which are calculated considering usage across all of the member accounts in your AWS organization that have RI or Savings Plans discount sharing enabled for maximum savings across accounts. You can verify that you implemented the correct recommendations with the required discounts and risk by following the Well-Architected labs.

๐Ÿ’ผ COST08-BP01 Perform data transfer modeling

Gather organization requirements and perform data transfer modeling of the workload and each of its components. This identifies the lowest cost point for its current data transfer requirements. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance When designing a solution in the cloud, data transfer fees are often overlooked, whether out of habit from designing for on-premises data centers or from a lack of awareness. Data transfer charges in AWS are determined by the source, destination, and volume of traffic. Factoring in these fees during the design phase can lead to cost savings. Understanding where the data transfer occurs in your workload, the cost of the transfer, and its associated benefit is very important to accurately estimate total cost of ownership (TCO). This allows you to make an informed decision to modify or accept the architectural decision. For example, you may have a Multi-Availability Zone configuration where you replicate data between the Availability Zones. You model the components of services which transfer the data in your workload, and decide that this is an acceptable cost (similar to paying for compute and storage in both Availability Zones) to achieve the required reliability and resilience. Model the costs over different usage levels. Workload usage can change over time, and different services may be more cost effective at different levels. While modeling your data transfer, think about how much data is ingested and where that data comes from. Additionally, consider how much data is processed and how much storage or compute capacity is needed. During modeling, follow networking best practices for your workload architecture to optimize your potential data transfer costs. The AWS Pricing Calculator can help you see estimated costs for specific AWS services and expected data transfer. If you have a workload already running (for test purposes or in a pre-production environment), use AWS Cost Explorer or the AWS Cost and Usage Report (CUR) to understand and model your data transfer costs. Configure a proof of concept (PoC) or test your workload, and run a test with a realistic simulated load. You can model your costs at different workload demands. ### Implementation steps 1. Identify requirements: What is the primary goal and what are the business requirements for the planned data transfer between source and destination? What is the expected business outcome at the end? Gather business requirements and define the expected outcome. 2. Identify source and destination: What is the data source and destination for the data transfer, such as within AWS Regions, to AWS services, or out to the internet? 3. Identify data classifications: What is the data classification for this data transfer? What kind of data is it? How big is the data? How frequently must data be transferred? Is the data sensitive? 4. Identify AWS services or tools to use: Which AWS services are used for this data transfer? Is it possible to use an already-provisioned service for another workload? 5. Calculate data transfer costs: Use AWS Pricing and the data transfer modeling you created previously to calculate the data transfer costs for the workload. Calculate the data transfer costs at different usage levels, for both increases and reductions in workload usage. Where there are multiple options for the workload architecture, calculate the cost for each option for comparison. 6. Link costs to outcomes: For each data transfer cost incurred, specify the outcome that it achieves for the workload. If it is transfer between components, it may be for decoupling; if it is between Availability Zones, it may be for redundancy. 7. Create data transfer modeling: After gathering all information, create a conceptual baseline data transfer model for multiple use cases and different workloads.
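As a minimal sketch of grounding the data transfer model above in actual usage, the following queries Cost Explorer for data-transfer-related usage type groups; the group names and dates are example assumptions that may need adjusting for your account.

```python
import boto3

# Minimal sketch: pull last month's data transfer cost and usage from Cost Explorer.
# The usage type group names and dates are examples and may need adjusting.
ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost", "UsageQuantity"],
    Filter={
        "Dimensions": {
            "Key": "USAGE_TYPE_GROUP",
            "Values": [
                "EC2: Data Transfer - Inter AZ",
                "EC2: Data Transfer - Internet (Out)",
            ],
        }
    },
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"], "USD")
```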

๐Ÿ’ผ COST08-BP02 Select components to optimize data transfer cost

All components are selected, and architecture is designed to reduce data transfer costs. This includes using components such as wide-area-network (WAN) optimization and Multi-Availability Zone (AZ) configurations. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Architecting for data transfer minimizes data transfer costs. This may involve using content delivery networks to locate data closer to users, or using dedicated network links from your premises to AWS. You can also use WAN optimization and application optimization to reduce the amount of data that is transferred between components. When transferring data to or within the AWS Cloud, it is essential to know the destination based on varied use cases, the nature of the data, and the available network resources in order to select the right AWS services to optimize data transfer. AWS offers a range of data transfer services tailored for diverse data migration requirements. Select the right data storage and data transfer options based on the business needs within your organization. When planning or reviewing your workload architecture, consider the following: - Use VPC endpoints within AWS: VPC endpoints allow for private connections between your VPC and supported AWS services. This allows you to avoid using the public internet, which can lead to data transfer costs. - Use a NAT gateway: Use a NAT gateway so that instances in a private subnet can connect to the internet or to the services outside your VPC. Check whether the resources behind the NAT gateway that send the most traffic are in the same Availability Zone as the NAT gateway. If they are not, create new NAT gateways in the same Availability Zone as the resource to reduce cross-AZ data transfer charges. - Use AWS Direct Connect: AWS Direct Connect bypasses the public internet and establishes a direct, private connection between your on-premises network and AWS. This can be more cost-effective and consistent than transferring large volumes of data over the internet. - Avoid transferring data across Regional boundaries: Data transfers between AWS Regions (from one Region to another) typically incur charges. It should be a very thoughtful decision to pursue a multi-Region path. - Monitor data transfer: Use Amazon CloudWatch and VPC flow logs to capture details about your data transfer and network usage. Analyze captured network traffic information in your VPCs, such as IP address or range going to and from network interfaces. - Analyze your network usage: Use metering and reporting tools such as AWS Cost Explorer, CUDOS Dashboards, or CloudWatch to understand data transfer cost of your workload. ### Implementation steps - Select components for data transfer: Using the data transfer modeling explained in COST08-BP01 Perform data transfer modeling, focus on where the largest data transfer costs are or where they would be if the workload usage changes. Look for alternative architectures or additional components that remove or reduce the need for data transfer (or lower its cost).
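A minimal sketch of the VPC endpoint guidance above: creating a gateway VPC endpoint for Amazon S3 so that S3 traffic stays off NAT gateways and the public internet. The VPC ID, route table ID, and Region in the service name are placeholders.

```python
import boto3

# Minimal sketch: create a gateway VPC endpoint for Amazon S3 so S3 traffic from the
# VPC avoids NAT gateway processing and internet data transfer charges.
# VPC ID, route table ID, and the Region in the service name are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    VpcEndpointType="Gateway",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```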

๐Ÿ’ผ COST08-BP03 Implement services to reduce data transfer costs

Implement services to reduce data transfer. For example, use edge locations or content delivery networks (CDN) to deliver content to end users, build caching layers in front of your application servers or databases, and use dedicated network connections instead of VPN for connectivity to the cloud. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance There are various AWS services that can help you to optimize your network data transfer usage. Depending on your workload components, type, and cloud architecture, these services can assist you with compression, caching, and the sharing and distribution of your traffic in the cloud. - Amazon CloudFront is a global content delivery network that delivers data with low latency and high transfer speeds. It caches data at edge locations across the world, which reduces the load on your resources. By using CloudFront, you can reduce the administrative effort in delivering content to large numbers of users globally with minimum latency. The CloudFront security savings bundle can help you save up to 30% on your CloudFront usage if you plan to grow your usage over time. - AWS Direct Connect allows you to establish a dedicated network connection to AWS. This can reduce network costs, increase bandwidth, and provide a more consistent network experience than internet-based connections. - AWS VPN allows you to establish a secure and private connection between your private network and the AWS global network. It is ideal for small offices or business partners because it provides simplified connectivity, and it is a fully managed and elastic service. - VPC endpoints allow connectivity between AWS services over private networking and can be used to reduce public data transfer and NAT gateway costs. Gateway VPC endpoints have no hourly charges and support Amazon S3 and Amazon DynamoDB. Interface VPC endpoints are provided by AWS PrivateLink and have an hourly fee and per-GB usage cost. - NAT gateways provide built-in scaling and management, reducing costs compared to a standalone NAT instance. Place NAT gateways in the same Availability Zones as high-traffic instances, and consider using VPC endpoints for the instances that need to access Amazon DynamoDB or Amazon S3 to reduce data transfer and processing costs. - Use AWS Snow Family devices, which have computing resources to collect and process data at the edge. AWS Snow Family devices (such as Snowcone, Snowball Edge, and Snowmobile) allow you to move petabytes of data to the AWS Cloud cost effectively and offline. ### Implementation steps - Implement services: Select applicable AWS network services based on your workload type, using the data transfer modeling and reviewing VPC Flow Logs. Look at where the largest costs and highest volume flows are. Review the AWS services and assess whether there is a service that reduces or removes the transfer, specifically networking and content delivery. Also look for caching services where there is repeated access to data or large amounts of data.

๐Ÿ’ผ COST09-BP01 Perform an analysis on the workload demand

Analyze the demand of the workload over time. Verify that the analysis covers seasonal trends and accurately represents operating conditions over the full workload lifetime. Analysis effort should reflect the potential benefit, for example, time spent is proportional to the workload cost. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Analyzing workload demand for cloud computing involves understanding the patterns and characteristics of computing tasks that are initiated in the cloud environment. This analysis helps users optimize resource allocation, manage costs, and verify that performance meets required levels. Know the requirements of the workload. Your organization's requirements should indicate the workload response times for requests. The response time can be used to determine if the demand is managed, or if the supply of resources should change to meet the demand. The analysis should include the predictability and repeatability of the demand, the rate of change in demand, and the amount of change in demand. Perform the analysis over a long enough period to incorporate any seasonal variance, such as end-of-month processing or holiday peaks. Analysis effort should reflect the potential benefits of implementing scaling. Look at the expected total cost of the component and any increases or decreases in usage and cost over the workload's lifetime. The following are some key aspects to consider when performing workload demand analysis for cloud computing: - Resource utilization and performance metrics: Analyze how AWS resources are being used over time. Determine peak and off-peak usage patterns to optimize resource allocation and scaling strategies. Monitor performance metrics such as response times, latency, throughput, and error rates. These metrics help assess the overall health and efficiency of the cloud infrastructure. - User and application scaling behaviour: Understand user behavior and how it affects workload demand. Examining the patterns of user traffic assists in enhancing the delivery of content and the responsiveness of applications. Analyze how workloads scale with increasing demand. Determine whether auto-scaling parameters are configured correctly and effectively for handling load fluctuations. - Workload types: Identify the different types of workloads running in the cloud, such as batch processing, real-time data processing, web applications, databases, or machine learning. Each type of workload may have different resource requirements and performance profiles. - Service-level agreements (SLAs): Compare actual performance with SLAs to ensure compliance and identify areas that need improvement. You can use Amazon CloudWatch to collect and track metrics, monitor log files, set alarms, and automatically react to changes in your AWS resources. You can also use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health. With AWS Trusted Advisor, you can provision your resources following best practices to improve system performance and reliability, increase security, and look for opportunities to save money. You can also turn off non-production instances and use Amazon CloudWatch and Auto Scaling to match increases or reductions in demand. Finally, you can use AWS Cost Explorer or QuickSight with the AWS Cost and Usage Report (CUR) file or your application logs to perform advanced analysis of workload demand. 
Overall, a comprehensive workload demand analysis allows organizations to make informed decisions about resource provisioning, scaling, and optimization, leading to better performance, cost efficiency, and user satisfaction. ### Implementation steps 1. Analyze existing workload data: Analyze data from the existing workload, previous versions of the workload, or predicted usage patterns. Use Amazon CloudWatch, log files, and monitoring data to gain insight into how the workload was used. Analyze a full cycle of the workload, and collect data for any seasonal changes such as end-of-month or end-of-year events. The effort spent on the analysis should reflect the workload characteristics: place the largest effort on high-value workloads that have the largest changes in demand, and the least effort on low-value workloads that have minimal changes in demand. 2. Forecast outside influence: Meet with team members from across the organization who can influence or change the demand on the workload. Common teams are sales, marketing, or business development. Work with them to understand the cycles they operate within, and whether there are any events that would change the demand on the workload. Forecast the workload demand with this data.
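As one hedged way to perform the "analyze existing workload data" step above, the sketch below pulls hourly average CPU utilization for an Auto Scaling group from Amazon CloudWatch over two weeks to look for daily and weekly demand patterns. The group name is a placeholder assumption.

```python
# A minimal sketch, assuming an Auto Scaling group named "my-workload-asg":
# retrieve two weeks of hourly average CPU utilization from CloudWatch.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-workload-asg"}],
    StartTime=start,
    EndTime=end,
    Period=3600,            # one datapoint per hour
    Statistics=["Average"],
)

# Print the datapoints in time order to eyeball daily and weekly patterns.
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].isoformat(), round(point["Average"], 1))
```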

๐Ÿ’ผ COST09-BP02 Implement a buffer or throttle to manage demand

Buffering and throttling modify the demand on your workload, smoothing out any peaks. Implement throttling when your clients perform retries. Implement buffering to store the request and defer processing until a later time. Verify that your throttles and buffers are designed so clients receive a response in the required time. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Implementing a buffer or throttle is crucial in cloud computing in order to manage demand and reduce the provisioned capacity required for your workload. For optimal performance, it's essential to gauge the total demand, including peaks, the pace of change in requests, and the necessary response time. When clients have the ability to resend their requests, it becomes practical to apply throttling. Conversely, for clients lacking retry functionality, the ideal approach is a buffer solution. Such buffers streamline the influx of requests and optimize the interaction of applications with varied operational speeds. Consider a workload whose demand curve has two peaks. To handle those peaks, resource capacity is provisioned at the level of the higher peak, so the resources and energy used for this workload are determined not by the area under the demand curve, but by the area under the provisioned capacity line. Flattening the workload demand curve can help you reduce the provisioned capacity for a workload and reduce its environmental impact. To smooth out the peaks, consider implementing a throttling or buffering solution. To understand them better, let's explore throttling and buffering: - Throttling: If the source of the demand has retry capability, then you can implement throttling. Throttling tells the source that if it cannot service the request at the current time, it should try again later. The source waits for a period of time, and then retries the request. Implementing throttling has the advantage of limiting the maximum amount of resources and costs of the workload. In AWS, you can use Amazon API Gateway to implement throttling. - Buffer based: A buffer-based approach uses producers (components that send messages to the queue), consumers (components that receive messages from the queue), and a queue (which holds messages) to store the messages. Messages are read by consumers and processed, allowing the messages to run at the rate that meets the consumers' business requirements. With a buffer-centric methodology, messages from producers are held in queues or streams, ready to be accessed by consumers at a pace that aligns with their operational demands. In AWS, you can choose from multiple services to implement a buffering approach. Amazon Simple Queue Service (Amazon SQS) is a managed service that provides queues that allow a single consumer to read individual messages. Amazon Kinesis provides a stream that allows many consumers to read the same messages. Buffering and throttling can smooth out any peaks by modifying the demand on your workload. Use throttling when clients retry actions, and use buffering to hold the request and process it later. When working with a buffer-based approach, architect your workload to service the request in the required time, and verify that you are able to handle duplicate requests for work. Analyze the overall demand, rate of change, and required response time to right-size the throttle or buffer required.
### Implementation steps 1. Analyze the client requirements: Analyze the client requests to determine whether the clients are capable of performing retries. For clients that cannot perform retries, buffers need to be implemented. Analyze the overall demand, rate of change, and required response time to determine the size of the throttle or buffer required. 2. Implement a buffer or throttle: Implement a buffer or throttle in the workload. A queue such as Amazon Simple Queue Service (Amazon SQS) can provide a buffer for your workload components. Amazon API Gateway can provide throttling for your workload components.
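To make the buffer-based approach above concrete, the following is a minimal, hedged sketch (Python with boto3) of an Amazon SQS producer and consumer. The queue URL is a placeholder, and real consumers should process messages idempotently because standard SQS queues can deliver duplicates.

```python
# A minimal sketch, assuming a placeholder queue URL: a producer buffers requests in
# SQS and a consumer drains the queue at the rate the backend can sustain.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/request-buffer"  # placeholder

def produce(request: dict) -> None:
    """Buffer a request instead of calling the backend directly."""
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(request))

def consume_batch() -> None:
    """Read up to 10 messages with long polling and process them at the consumer's pace."""
    messages = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    ).get("Messages", [])
    for message in messages:
        request = json.loads(message["Body"])
        # ... process the request idempotently (duplicates are possible) ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```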

๐Ÿ’ผ COST09-BP03 Supply resources dynamically

Resources are provisioned in a planned manner. This can be demand-based, such as through automatic scaling, or time-based, where demand is predictable and resources are provided based on time. These methods result in the least amount of over-provisioning or under-provisioning. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance There are several ways for AWS customers to increase the resources available to their applications and supply resources to meet demand. One option is to use AWS Instance Scheduler, which automates the starting and stopping of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances. Another option is to use AWS Auto Scaling, which allows you to automatically scale your computing resources based on the demand of your application or service. Supplying resources based on demand allows you to pay only for the resources you use, reduce cost by launching resources when they are needed, and terminate them when they aren't. AWS Instance Scheduler allows you to configure the stop and start of your Amazon EC2 and Amazon RDS instances at defined times so that you can meet demand that follows a consistent time pattern, such as users who access Amazon EC2 instances at eight in the morning every day and no longer need them after six at night. This solution helps reduce operational cost by stopping resources that are not in use and starting them when they are needed. You can also easily configure schedules for your Amazon EC2 instances across your accounts and Regions with a simple user interface (UI) using AWS Systems Manager Quick Setup. You can schedule Amazon EC2 or Amazon RDS instances with AWS Instance Scheduler, and you can stop and start existing instances. However, you cannot stop and start instances that are part of an Auto Scaling group (ASG) or that are managed by services such as Amazon Redshift or Amazon OpenSearch Service. Auto Scaling groups have their own scheduling for the instances in the group, and the group manages the lifecycle of those instances. AWS Auto Scaling helps you adjust your capacity to maintain steady, predictable performance at the lowest possible cost to meet changing demand. It is a fully managed and free service to scale the capacity of your application that integrates with Amazon EC2 instances and Spot Fleets, Amazon ECS, Amazon DynamoDB, and Amazon Aurora. Auto Scaling provides automatic resource discovery to help find resources in your workload that can be configured, has built-in scaling strategies to optimize performance, costs, or a balance between the two, and provides predictive scaling to assist with regularly occurring spikes. There are multiple scaling options available to scale your Auto Scaling group: - Maintain current instance levels at all times - Scale manually - Scale based on a schedule - Scale based on demand - Use predictive scaling Auto Scaling policies can be categorized as dynamic scaling policies, which respond to demand (simple, step, and target tracking scaling), and scheduled or predictive scaling policies, which supply capacity based on time or forecasted load. You can also use metrics and alarms from Amazon CloudWatch to trigger scaling events for your workload. We recommend you use launch templates, which allow you to access the latest features and improvements. Not all Auto Scaling features are available when you use launch configurations.
For example, you cannot create an Auto Scaling group that launches both Spot and On-Demand Instances or that specifies multiple instance types. You must use a launch template to configure these features. When using launch templates, we recommend that you version each one. With versioning of launch templates, you can create a subset of the full set of parameters and then reuse it to create other versions of the same launch template. You can use AWS Auto Scaling or incorporate scaling in your code with AWS APIs or SDKs. This reduces your overall workload costs by removing the operational cost of manually making changes to your environment, and changes can be performed much faster. This also matches your workload resourcing to your demand at any time. In order to follow this best practice and supply resources dynamically for your organization, you should understand horizontal and vertical scaling in the AWS Cloud, as well as the nature of the applications running on Amazon EC2 instances. It is best for your Cloud Financial Management team to work with technical teams to follow this best practice. Elastic Load Balancing (ELB) helps you scale by distributing demand across multiple resources. By using an Auto Scaling group with Elastic Load Balancing, you can manage incoming requests by optimally routing traffic so that no one instance in the Auto Scaling group is overwhelmed. Requests are distributed among all the targets of a target group in a round-robin fashion without consideration for capacity or utilization. Typical metrics can be standard Amazon EC2 metrics, such as CPU utilization, network throughput, and Elastic Load Balancing observed request and response latency. When possible, you should use a metric that is indicative of customer experience, typically a custom metric that might originate from application code within your workload. To explain how to meet demand dynamically, this document groups Auto Scaling into two categories, demand-based and time-based supply, and examines each in turn. **Demand-based supply:** Take advantage of the elasticity of the cloud to supply resources to meet changing demand by relying on near real-time demand state. For demand-based supply, use APIs or service features to programmatically vary the amount of cloud resources in your architecture. This allows you to scale components in your architecture, increase the number of resources during demand spikes to maintain performance, and decrease capacity when demand subsides to reduce costs. - Simple/step scaling: Monitors metrics and adds or removes instances according to steps that you define. - Target tracking: A thermostat-like control mechanism that automatically adds or removes instances to maintain a metric at a customer-defined target. When architecting with a demand-based approach, keep in mind two key considerations. First, understand how quickly you must provision new resources. Second, understand that the size of the margin between supply and demand will shift. You must be ready to cope with the rate of change in demand and also be ready for resource failures. **Time-based supply:** A time-based approach aligns resource capacity to demand that is predictable or well-defined by time. This approach is typically not dependent upon the utilization levels of the resources. A time-based approach ensures that resources are available at the specific time they are required and can be provided without any delays due to start-up procedures and system or consistency checks.
Using a time-based approach, you can provide additional resources or increase capacity during busy periods. You can use scheduled or predictive auto scaling to implement a time-based approach. Workloads can be scheduled to scale out or in at defined times (for example, the start of business hours), making resources available when users arrive or demand increases. Predictive scaling uses patterns to scale out, while scheduled scaling uses pre-defined times to scale out. You can also use the attribute-based instance type selection (ABS) strategy in Auto Scaling groups, which lets you express your instance requirements as a set of attributes, such as vCPU, memory, and storage. This also allows you to automatically use newer generation instance types when they are released and access a broader range of capacity with Amazon EC2 Spot Instances. Amazon EC2 Fleet and Amazon EC2 Auto Scaling select and launch instances that fit the specified attributes, removing the need to manually pick instance types. You can also leverage the AWS APIs, SDKs, and AWS CloudFormation to automatically provision and decommission entire environments as you need them. This approach is well suited for development or test environments that run only during defined business hours or periods of time. You can use APIs to scale the size of resources within an environment (vertical scaling). For example, you could scale up a production workload by changing the instance size or class. This can be achieved by stopping and starting the instance and selecting a different instance size or class. This technique can also be applied to other resources, such as Amazon EBS Elastic Volumes, which can be modified to increase size, adjust performance (IOPS), or change the volume type while in use. When architecting with a time-based approach, keep in mind two key considerations. First, how consistent is the usage pattern? Second, what is the impact if the pattern changes? You can increase the accuracy of predictions by monitoring your workloads and by using business intelligence. If you see significant changes in the usage pattern, you can adjust the times to ensure that coverage is provided. ### Implementation steps - **Configure scheduled scaling:** For predictable changes in demand, time-based scaling can provide the correct number of resources in a timely manner. It is also useful if resource creation and configuration is not fast enough to respond to changes in demand. Using the workload analysis, configure scheduled scaling in AWS Auto Scaling. To configure time-based scheduling, you can use predictive scaling or scheduled scaling to increase the number of Amazon EC2 instances in your Auto Scaling groups in advance, according to expected or predictable load changes. - **Configure predictive scaling:** Predictive scaling allows you to increase the number of Amazon EC2 instances in your Auto Scaling group in advance of daily and weekly patterns in traffic flows. If you have regular traffic spikes and applications that take a long time to start, you should consider using predictive scaling. Predictive scaling can help you scale faster by initializing capacity before projected load, compared to dynamic scaling alone, which is reactive in nature. For example, if users start using your workload at the start of business hours and don't use it after hours, then predictive scaling can add capacity before business hours, which eliminates the delay of dynamic scaling reacting to changing traffic.
- **Configure dynamic automatic scaling:** To configure scaling based on active workload metrics, use Auto Scaling. Use the analysis to configure Auto Scaling to launch at the correct resource levels, and verify that the workload scales in the required time. You can launch and automatically scale a fleet of On-Demand Instances and Spot Instances within a single Auto Scaling group. In addition to receiving discounts for using Spot Instances, you can use Reserved Instances or a Savings Plan to receive discounted rates compared to regular On-Demand Instance pricing. All of these factors combined help you optimize your cost savings for Amazon EC2 instances and get the desired scale and performance for your application.
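As a complement to the scheduled, predictive, and dynamic scaling steps above, here is a minimal, hedged sketch (Python with boto3) that attaches a scheduled action and a target tracking policy to an existing Auto Scaling group. The group name, sizes, cron expression, and CPU target are illustrative assumptions, not values from the guidance.

```python
# A minimal sketch, assuming an existing Auto Scaling group named "my-workload-asg":
# combine time-based supply (scheduled action) with demand-based supply (target tracking).
import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "my-workload-asg"  # placeholder

# Time-based supply: scale out ahead of an assumed 08:00 UTC start of business.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="scale-out-business-hours",
    Recurrence="0 7 * * MON-FRI",   # cron expression, evaluated in UTC
    MinSize=4,
    MaxSize=12,
    DesiredCapacity=6,
)

# Demand-based supply: keep average CPU utilization near a 50% target.
autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)
```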

๐Ÿ’ผ COST10-BP01 Develop a workload review process

Develop a process that defines the criteria and process for workload review. The review effort should reflect potential benefit. For example, core workloads or workloads with a value of over ten percent of the bill are reviewed quarterly or every six months, while workloads below ten percent are reviewed annually. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To have the most cost-efficient workload, you must regularly review the workload to know if there are opportunities to implement new services, features, and components. To achieve overall lower costs, the review process must be proportional to the potential amount of savings. For example, workloads that are 50% of your overall spend should be reviewed more regularly, and more thoroughly, than workloads that are five percent of your overall spend. Factor in any external factors or volatility. If the workload services a specific geography or market segment, and change in that area is predicted, more frequent reviews could lead to cost savings. Another factor in review is the effort to implement changes. If there are significant costs in testing and validating changes, reviews should be less frequent. Factor in the long-term cost of maintaining outdated and legacy components and resources, and the inability to implement new features in them. The current cost of testing and validation may exceed the proposed benefit. However, over time, the cost of making the change may significantly increase as the gap between the workload and the current technologies increases, resulting in even larger costs. For example, the cost of moving to a new programming language may not currently be cost effective. However, in five years' time, the cost of people skilled in that language may increase, and due to workload growth, you would be moving an even larger system to the new language, requiring even more effort than previously. Break down your workload into components, assign the cost of the component (an estimate is sufficient), and then list the factors (for example, effort and external markets) next to each component. Use these indicators to determine a review frequency for each workload. For example, you may have web servers as high cost, low change effort, and high external factors, resulting in a high frequency of review. A central database may be medium cost, high change effort, and low external factors, resulting in a medium frequency of review. Define a process to evaluate new services, design patterns, resource types, and configurations to optimize your workload cost as they become available. Similar to the performance and reliability pillar review processes, identify, validate, and prioritize optimization and improvement activities and issue remediation, and incorporate them into your backlog. ### Implementation steps - **Define review frequency:** Define how frequently the workload and its components should be reviewed. Allocate time and resources to continual improvement and review frequency to improve the efficiency and optimization of your workload. This is a combination of factors and may differ from workload to workload within your organization and between components in the workload.
Common factors include the importance to the organization measured in terms of revenue or brand, the total cost of running the workload (including operation and resource costs), the complexity of the workload, how easy it is to implement a change, any software licensing agreements, and whether a change would incur significant increases in licensing costs due to punitive licensing. Components can be defined functionally or technically, such as web servers and databases, or compute and storage resources. Balance the factors accordingly and develop a review period for the workload and its components. You may decide to review the full workload every 18 months, review the web servers every six months, the database every 12 months, compute and short-term storage every six months, and long-term storage every 12 months. - **Define review thoroughness:** Define how much effort is spent on the review of the workload or workload components. Similar to the review frequency, this is a balance of multiple factors. Evaluate and prioritize opportunities for improvement to focus efforts where they provide the greatest benefits while estimating how much effort is required for these activities. If the expected outcomes do not satisfy the goals, or the required effort costs more than the expected benefit, iterate using alternative courses of action. Your review processes should include dedicated time and resources to make continuous incremental improvements possible. As an example, you may decide to spend one week of analysis on the database component, one week of analysis for compute resources, and four hours for storage reviews.
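One hedged way to derive the review frequency described above is to rank workloads by their share of monthly spend using a cost allocation tag. The sketch below assumes a tag named "workload" has been activated for cost allocation; the ten percent threshold mirrors the example in this best practice.

```python
# A minimal sketch, assuming a "workload" cost allocation tag: rank workloads by share
# of one month's spend and map the share to a review frequency.
import boto3

ce = boto3.client("ce")

result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},  # example month; End is exclusive
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "workload"}],             # tag must be activated for cost allocation
)

groups = result["ResultsByTime"][0]["Groups"]
# Tag group keys come back as "<tagKey>$<tagValue>"; an empty value means untagged spend.
costs = {
    (g["Keys"][0].split("$", 1)[-1] or "untagged"): float(g["Metrics"]["UnblendedCost"]["Amount"])
    for g in groups
}
total = sum(costs.values()) or 1.0

for workload, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
    share = cost / total
    frequency = "quarterly" if share > 0.10 else "annual"  # thresholds from this best practice
    print(f"{workload:30s} {share:6.1%}  review: {frequency}")
```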

๐Ÿ’ผ COST10-BP02 Review and analyze this workload regularly

Existing workloads are regularly reviewed based on each defined process to find out if new services can be adopted, existing services can be replaced, or workloads can be re-architected. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance AWS is constantly adding new features so you can experiment and innovate faster with the latest technology. AWS What's New details how AWS is doing this and provides a quick overview of AWS services, features, and Regional expansion announcements as they are released. You can dive deeper into the launches that have been announced and use them in your review and analysis of your existing workloads. To realize the benefits of new AWS services and features, you should review your workloads and implement new services and features as required. This means you may need to replace existing services you use for your workload, or modernize your workload to adopt these new AWS services. For example, you might review your workloads and replace the messaging component with Amazon Simple Email Service. This removes the cost of operating and maintaining a fleet of instances, while providing all the functionality at a reduced cost. To analyze your workload and highlight potential opportunities, you should consider not only new services but also new ways of building solutions. Review the This is My Architecture videos on AWS to learn about other customers' architecture designs, their challenges, and their solutions. Check the All-In series to find out about real-world applications of AWS services and customer stories. You can also watch the Back to Basics video series that explains, examines, and breaks down basic cloud architecture pattern best practices. Another source is the How to Build This videos, which are designed to assist people with big ideas on how to bring their minimum viable product (MVP) to life using AWS services. It is a way for builders from all over the world who have a strong idea to gain architectural guidance from experienced AWS Solutions Architects. Finally, you can review the Getting Started resource materials, which have step-by-step tutorials. Before starting your review process, understand your business' requirements for the workload, including security, data privacy, and performance requirements and any constraints on using a specific service or Region, and follow your agreed review process. ### Implementation steps - **Regularly review the workload:** Using your defined process, perform reviews with the frequency specified. Verify that you spend the correct amount of effort on each component. This process would be similar to the initial design process where you selected services for cost optimization. Analyze the services and the benefits they would bring; this time, factor in the cost of making the change, not just the long-term benefits. - **Implement new services:** If the outcome of the analysis is to implement changes, first perform a baseline of the workload to know the current cost for each output. Implement the changes, then perform an analysis to confirm the new cost for each output.

๐Ÿ’ผ COST11-BP01 Perform automation for operations

Evaluate the operational costs on the cloud, focusing on quantifying the time and effort savings in administrative tasks, deployments, mitigating the risk of human errors, compliance, and other operations through automation. Assess the time and associated costs required for operational efforts and implement automation for administrative tasks to minimize manual effort wherever feasible. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Automating operations reduces the frequency of manual tasks, improves efficiency, and benefits customers by delivering a consistent and reliable experience when deploying, administering, or operating workloads. You can free up infrastructure resources from manual operational tasks and use them for higher value tasks and innovations, which improves business value. Enterprises require a proven, tested way to manage their workloads in the cloud. That solution must be secure, fast, and cost effective, with minimum risk and maximum reliability. Start by prioritizing your operational activities based on required effort by looking at overall operations cost. For example, how long does it take to deploy new resources in the cloud, make optimization changes to existing ones, or implement necessary configurations? Look at the total cost of human actions by factoring in the cost of operations and management. Prioritize automation of administrative tasks to reduce human effort. Review effort should reflect the potential benefit. For example, examine time spent performing tasks manually as opposed to automatically. Prioritize automating repetitive, high-value, time-consuming, and complex activities. Activities that are high value or pose a high risk of human error are typically the best place to start automating, as the risk often carries an unwanted additional operational cost (such as the operations team working extra hours). Use automation tools like AWS Systems Manager or AWS Config to streamline operations, compliance, monitoring, lifecycle, and termination processes. With AWS services, tools, and third-party products, you can customize the automations you implement to meet your specific requirements. The following list shows some of the core operations functions and capabilities you can automate with AWS services: - AWS Audit Manager: Continually audit your AWS usage to simplify risk and compliance assessment. - AWS Backup: Centrally manage and automate data protection. - AWS Config: Configure compute resources; assess, audit, and evaluate configurations and resource inventory. - AWS CloudFormation: Launch highly available resources with Infrastructure as Code. - AWS CloudTrail: IT change management, compliance, and control. - Amazon EventBridge: Schedule events and trigger AWS Lambda to take action. - AWS Lambda: Automate repetitive processes by triggering them with events or by running them on a fixed schedule with Amazon EventBridge. - AWS Systems Manager: Start and stop workloads, patch operating systems, automate configuration, and perform ongoing management. - AWS Step Functions: Schedule jobs and automate workflows. - AWS Service Catalog: Template consumption and infrastructure as code with compliance and control. If you would like to adopt automation immediately using AWS products and services but don't have the skills in your organization, reach out to AWS Managed Services (AMS), AWS Professional Services, or AWS Partners to increase adoption of automation and improve your operational excellence in the cloud.
AWS Managed Services (AMS) is a service that operates AWS infrastructure on behalf of enterprise customers and partners. It provides a secure and compliant environment that you can deploy your workloads onto. AMS uses enterprise cloud operating models with automation to allow you to meet your organization's requirements, move into the cloud faster, and reduce your ongoing management costs. AWS Professional Services can also help you achieve your desired business outcomes and automate operations with AWS. They help customers deploy automated, robust, and agile IT operations and governance capabilities optimized for the cloud. For detailed monitoring examples and recommended best practices, see the Operational Excellence Pillar whitepaper. ### Implementation steps - **Build once and deploy many:** Use infrastructure as code such as CloudFormation, the AWS SDK, or the AWS CLI to build once and deploy many times for similar environments or for disaster recovery scenarios. Tag while deploying to track your consumption as defined in other best practices. Use AWS Launch Wizard to reduce the time to deploy many popular enterprise workloads. AWS Launch Wizard guides you through the sizing, configuration, and deployment of enterprise workloads following AWS best practices. You can also use Service Catalog, which helps you create and manage approved infrastructure-as-code templates for use on AWS so anyone can discover approved, self-service cloud resources. - **Automate continuous compliance:** Consider automating assessment and remediation of recorded configurations against predefined standards. When you combine AWS Organizations with the capabilities of AWS Config and AWS CloudFormation, you can efficiently manage and automate configuration compliance at scale for hundreds of member accounts. You can review changes in configurations and relationships between AWS resources and dive into the history of a resource configuration. - **Automate monitoring tasks:** AWS provides various tools that you can use to monitor services. You can configure these tools to automate monitoring tasks. Create and implement a monitoring plan that collects monitoring data from all the parts of your workload so that you can more easily debug a multi-point failure if one occurs. For example, you can use the automated monitoring tools to observe Amazon EC2 and report back to you when something is wrong, using system status checks, instance status checks, and Amazon CloudWatch alarms. - **Automate maintenance and operations:** Run routine operations automatically without human intervention. Using AWS services and tools, you can choose which AWS automations to implement and customize for your specific requirements. For example, use EC2 Image Builder to build, test, and deploy virtual machine and container images for use on AWS or on-premises, or patch your EC2 instances with AWS Systems Manager (SSM). If your desired action cannot be done with AWS services, or you need more complex actions such as filtering resources, automate your operations using the AWS Command Line Interface (AWS CLI) or AWS SDK tools. The AWS CLI provides the ability to automate the entire process of controlling and managing AWS services with scripts, without using the AWS Management Console. Select your preferred AWS SDKs to interact with AWS services. For other code examples, see the AWS SDK Code examples repository.
- **Create a continual lifecycle with automations:** It is important that you establish and preserve mature lifecycle policies not only for regulations or redundancy but also for cost optimization. You can use AWS Backup to centrally manage and automate data protection of data stores, such as your buckets, volumes, databases, and file systems. You can also use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs. - **Delete unnecessary resources:** It's quite common to accumulate unused resources in sandbox or development AWS accounts. Developers create and experiment with various services and resources as part of the normal development cycle, and then they don't delete those resources when they're no longer needed. Unused resources can incur unnecessary and sometimes high costs for the organization. Deleting these resources can reduce the costs of operating these environments. Before deleting resources, make sure the data is no longer needed, or back it up if you are not sure. You can use AWS CloudFormation to clean up deployed stacks, which automatically deletes most resources defined in the template. Alternatively, you can create an automation for the deletion of AWS resources using tools like aws-nuke.
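As a hedged sketch of the "automate maintenance and operations" step above (a do-it-yourself alternative to the managed AWS Instance Scheduler), the following Lambda-style handler, which you could invoke on a schedule with Amazon EventBridge, stops running EC2 instances that carry a Schedule=office-hours tag. The tag key and value are assumptions.

```python
# A minimal sketch, assuming instances are tagged Schedule=office-hours: stop any that
# are still running when the scheduled EventBridge rule fires outside business hours.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    paginator = ec2.get_paginator("describe_instances")
    instance_ids = []
    for page in paginator.paginate(
        Filters=[
            {"Name": "tag:Schedule", "Values": ["office-hours"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    ):
        for reservation in page["Reservations"]:
            instance_ids.extend(i["InstanceId"] for i in reservation["Instances"])

    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}
```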

๐Ÿ’ผ CP-1 CONTINGENCY PLANNING POLICY AND PROCEDURES

The organization: CP-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: CP-1a.1. A contingency planning policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and CP-1a.2. Procedures to facilitate the implementation of the contingency planning policy and associated contingency planning controls; and CP-1b. Reviews and updates the current: CP-1b.1. Contingency planning policy [Assignment: organization-defined frequency]; and CP-1b.2. Contingency planning procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ CP-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] contingency planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the contingency planning policy and the associated contingency planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the contingency planning policy and procedures; and c. Review and update the current contingency planning: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ CP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] contingency planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the contingency planning policy and the associated contingency planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the contingency planning policy and procedures; and c. Review and update the current contingency planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] contingency planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the contingency planning policy and the associated contingency planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the contingency planning policy and procedures; and c. Review and update the current contingency planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] contingency planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the contingency planning policy and the associated contingency planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the contingency planning policy and procedures; and c. Review and update the current contingency planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ CP-10 (4) RESTORE WITHIN TIME PERIOD

The organization provides the capability to restore information system components within [Assignment: organization-defined restoration time-periods] from configuration-controlled and integrity-protected information representing a known, operational state for the components.

๐Ÿ’ผ CP-10 System Recovery and Reconstitution

Provide for the recovery and reconstitution of the system to a known state within [Assignment: organization-defined time period consistent with recovery time and recovery point objectives] after a disruption, compromise, or failure.

๐Ÿ’ผ CP-10(4) Restore Within Time Period (H)

Provide the capability to restore system components within [FedRAMP Assignment: time period consistent with the restoration time-periods defined in the service provider and organization SLA] from configuration-controlled and integrity-protected information representing a known, operational state for the components.

๐Ÿ’ผ CP-12 Safe Mode

When [Assignment: organization-defined conditions] are detected, enter a safe mode of operation with [Assignment: organization-defined restrictions of safe mode of operation].

๐Ÿ’ผ CP-12 SAFE MODE

The information system, when [Assignment: organization-defined conditions] are detected, enters a safe mode of operation with [Assignment: organization-defined restrictions of safe mode of operation].

๐Ÿ’ผ CP-13 Alternative Security Mechanisms

Employ [Assignment: organization-defined alternative or supplemental security mechanisms] for satisfying [Assignment: organization-defined security functions] when the primary means of implementing the security function is unavailable or compromised.

๐Ÿ’ผ CP-13 ALTERNATIVE SECURITY MECHANISMS

The organization employs [Assignment: organization-defined alternative or supplemental security mechanisms] for satisfying [Assignment: organization-defined security functions] when the primary means of implementing the security function is unavailable or compromised.

๐Ÿ’ผ CP-2 (2) CAPACITY PLANNING

The organization conducts capacity planning so that necessary capacity for information processing, telecommunications, and environmental support exists during contingency operations.

๐Ÿ’ผ CP-2 (6) ALTERNATE PROCESSING | STORAGE SITE

The organization plans for the transfer of essential missions and business functions to alternate processing and/or storage sites with little or no loss of operational continuity and sustains that continuity through information system restoration to primary processing and/or storage sites.

๐Ÿ’ผ CP-2 Contingency Plan

a. Develop a contingency plan for the system that: 1. Identifies essential mission and business functions and associated contingency requirements; 2. Provides recovery objectives, restoration priorities, and metrics; 3. Addresses contingency roles, responsibilities, assigned individuals with contact information; 4. Addresses maintaining essential mission and business functions despite a system disruption, compromise, or failure; 5. Addresses eventual, full system restoration without deterioration of the controls originally planned and implemented; 6. Addresses the sharing of contingency information; and 7. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; b. Distribute copies of the contingency plan to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; c. Coordinate contingency planning activities with incident handling activities; d. Review the contingency plan for the system [Assignment: organization-defined frequency]; e. Update the contingency plan to address changes to the organization, system, or environment of operation and problems encountered during contingency plan implementation, execution, or testing; f. Communicate contingency plan changes to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; g. Incorporate lessons learned from contingency plan testing, training, or actual contingency activities into contingency testing and training; and h. Protect the contingency plan from unauthorized disclosure and modification.

๐Ÿ’ผ CP-2 CONTINGENCY PLAN

The organization: CP-2a. Develops a contingency plan for the information system that: CP-2a.1. Identifies essential missions and business functions and associated contingency requirements; CP-2a.2. Provides recovery objectives, restoration priorities, and metrics; CP-2a.3. Addresses contingency roles, responsibilities, assigned individuals with contact information; CP-2a.4. Addresses maintaining essential missions and business functions despite an information system disruption, compromise, or failure; CP-2a.5. Addresses eventual, full information system restoration without deterioration of the security safeguards originally planned and implemented; and CP-2a.6. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; CP-2b. Distributes copies of the contingency plan to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; CP-2c. Coordinates contingency planning activities with incident handling activities; CP-2d. Reviews the contingency plan for the information system [Assignment: organization-defined frequency]; CP-2e. Updates the contingency plan to address changes to the organization, information system, or environment of operation and problems encountered during contingency plan implementation, execution, or testing; CP-2f. Communicates contingency plan changes to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; and CP-2g. Protects the contingency plan from unauthorized disclosure and modification.

๐Ÿ’ผ CP-2 Contingency Plan (L)(M)(H)

a. Develop a contingency plan for the system that: 1. Identifies essential mission and business functions and associated contingency requirements; 2. Provides recovery objectives, restoration priorities, and metrics; 3. Addresses contingency roles, responsibilities, assigned individuals with contact information; 4. Addresses maintaining essential mission and business functions despite a system disruption, compromise, or failure; 5. Addresses eventual, full system restoration without deterioration of the controls originally planned and implemented; 6. Addresses the sharing of contingency information; and 7. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; b. Distribute copies of the contingency plan to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; c. Coordinate contingency planning activities with incident handling activities; d. Review the contingency plan for the system [FedRAMP Assignment: at least annually]; e. Update the contingency plan to address changes to the organization, system, or environment of operation and problems encountered during contingency plan implementation, execution, or testing; f. Communicate contingency plan changes to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; g. Incorporate lessons learned from contingency plan testing, training, or actual contingency activities into contingency testing and training; and h. Protect the contingency plan from unauthorized disclosure and modification. **CP-2 Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB authorizations the contingency lists include designated FedRAMP personnel. **Requirement**: CSPs must use the FedRAMP Information System Contingency Plan (ISCP) Template (available on the fedramp.gov: <https://www.fedramp.gov/assets/resources/templates/SSP-Appendix-G-Information-System-Contingency-Plan-(ISCP)-Template.docx)>.

๐Ÿ’ผ CP-2 Contingency Plan (L)(M)(H)

a. Develop a contingency plan for the system that: 1. Identifies essential mission and business functions and associated contingency requirements; 2. Provides recovery objectives, restoration priorities, and metrics; 3. Addresses contingency roles, responsibilities, assigned individuals with contact information; 4. Addresses maintaining essential mission and business functions despite a system disruption, compromise, or failure; 5. Addresses eventual, full system restoration without deterioration of the controls originally planned and implemented; 6. Addresses the sharing of contingency information; and 7. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; b. Distribute copies of the contingency plan to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; c. Coordinate contingency planning activities with incident handling activities; d. Review the contingency plan for the system [FedRAMP Assignment: at least annually]; e. Update the contingency plan to address changes to the organization, system, or environment of operation and problems encountered during contingency plan implementation, execution, or testing; f. Communicate contingency plan changes to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; g. Incorporate lessons learned from contingency plan testing, training, or actual contingency activities into contingency testing and training; and h. Protect the contingency plan from unauthorized disclosure and modification. **CP-2 Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB authorizations the contingency lists include designated FedRAMP personnel. **Requirement**: CSPs must use the FedRAMP Information System Contingency Plan (ISCP) Template (available on the fedramp.gov: <https://www.fedramp.gov/assets/resources/templates/SSP-Appendix-G-Information-System-Contingency-Plan-(ISCP)-Template.docx)>.

๐Ÿ’ผ CP-2 Contingency Plan (L)(M)(H)

a. Develop a contingency plan for the system that: 1. Identifies essential mission and business functions and associated contingency requirements; 2. Provides recovery objectives, restoration priorities, and metrics; 3. Addresses contingency roles, responsibilities, assigned individuals with contact information; 4. Addresses maintaining essential mission and business functions despite a system disruption, compromise, or failure; 5. Addresses eventual, full system restoration without deterioration of the controls originally planned and implemented; 6. Addresses the sharing of contingency information; and 7. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; b. Distribute copies of the contingency plan to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; c. Coordinate contingency planning activities with incident handling activities; d. Review the contingency plan for the system [FedRAMP Assignment: at least annually]; e. Update the contingency plan to address changes to the organization, system, or environment of operation and problems encountered during contingency plan implementation, execution, or testing; f. Communicate contingency plan changes to [Assignment: organization-defined key contingency personnel (identified by name and/or by role) and organizational elements]; g. Incorporate lessons learned from contingency plan testing, training, or actual contingency activities into contingency testing and training; and h. Protect the contingency plan from unauthorized disclosure and modification. **CP-2 Additional FedRAMP Requirements and Guidance:** **Requirement**: For JAB authorizations the contingency lists include designated FedRAMP personnel. **Requirement**: CSPs must use the FedRAMP Information System Contingency Plan (ISCP) Template (available on the fedramp.gov: <https://www.fedramp.gov/assets/resources/templates/SSP-Appendix-G-Information-System-Contingency-Plan-(ISCP)-Template.docx)>.

๐Ÿ’ผ CP-2(2) Capacity Planning (H)

Conduct capacity planning so that necessary capacity for information processing, telecommunications, and environmental support exists during contingency operations.

๐Ÿ’ผ CP-3 Contingency Training

a. Provide contingency training to system users consistent with assigned roles and responsibilities: 1. Within [Assignment: organization-defined time period] of assuming a contingency role or responsibility; 2. When required by system changes; and 3. [Assignment: organization-defined frequency] thereafter; and b. Review and update contingency training content [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ CP-3 CONTINGENCY TRAINING

The organization provides contingency training to information system users consistent with assigned roles and responsibilities: CP-3a. Within [Assignment: organization-defined time period] of assuming a contingency role or responsibility; CP-3b. When required by information system changes; and CP-3c. [Assignment: organization-defined frequency] thereafter.

๐Ÿ’ผ CP-3 Contingency Training (L)(M)(H)

a. Provide contingency training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: *See Additional Requirements] of assuming a contingency role or responsibility; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update contingency training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]. **CP-3 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: Privileged admins and engineers must take the basic contingency training within ten (10) days. Consideration must be given for those privileged admins and engineers with critical contingency-related roles, to gain enough system context and situational awareness to understand the full impact of contingency training as it applies to their respective level. Newly hired critical contingency personnel must take this more in-depth training within sixty (60) days of hire date when the training will have more impact.

๐Ÿ’ผ CP-3 Contingency Training (L)(M)(H)

a. Provide contingency training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: *See Additional Requirements] of assuming a contingency role or responsibility; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update contingency training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]. **CP-3 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: Privileged admins and engineers must take the basic contingency training within ten (10) days. Consideration must be given for those privileged admins and engineers with critical contingency-related roles, to gain enough system context and situational awareness to understand the full impact of contingency training as it applies to their respective level. Newly hired critical contingency personnel must take this more in-depth training within sixty (60) days of hire date when the training will have more impact.

๐Ÿ’ผ CP-3 Contingency Training (L)(M)(H)

a. Provide contingency training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: *See Additional Requirements] of assuming a contingency role or responsibility; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update contingency training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events]. **CP-3 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: Privileged admins and engineers must take the basic contingency training within ten (10) days. Consideration must be given for those privileged admins and engineers with critical contingency-related roles, to gain enough system context and situational awareness to understand the full impact of contingency training as it applies to their respective level. Newly hired critical contingency personnel must take this more in-depth training within sixty (60) days of hire date when the training will have more impact.

๐Ÿ’ผ CP-4 (2) ALTERNATE PROCESSING SITE

The organization tests the contingency plan at the alternate processing site: CP-4 (2)(a) To familiarize contingency personnel with the facility and available resources; and CP-4 (2)(b) To evaluate the capabilities of the alternate processing site to support contingency operations.

๐Ÿ’ผ CP-4 Contingency Plan Testing

a. Test the contingency plan for the system [Assignment: organization-defined frequency] using the following tests to determine the effectiveness of the plan and the readiness to execute the plan: [Assignment: organization-defined tests]. b. Review the contingency plan test results; and c. Initiate corrective actions, if needed.

๐Ÿ’ผ CP-4 CONTINGENCY PLAN TESTING

The organization: CP-4a. Tests the contingency plan for the information system [Assignment: organization-defined frequency] using [Assignment: organization-defined tests] to determine the effectiveness of the plan and the organizational readiness to execute the plan; CP-4b. Reviews the contingency plan test results; and CP-4c. Initiates corrective actions, if needed.

๐Ÿ’ผ CP-4 Contingency Plan Testing (L)(M)(H)

a. Test the contingency plan for the system [FedRAMP Assignment: at least annually] using the following tests to determine the effectiveness of the plan and the readiness to execute the plan: [FedRAMP Assignment: functional exercises]. b. Review the contingency plan test results; and c. Initiate corrective actions, if needed. **CP-4 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider develops test plans in accordance with NIST Special Publication 800-34 (as amended); plans are approved by the JAB/AO prior to initiating testing. **(a) Requirement**: The service provider must include the Contingency Plan test results with the security package within the Contingency Plan-designated appendix (Appendix G, Contingency Plan Test Report).

๐Ÿ’ผ CP-4 Contingency Plan Testing (L)(M)(H)

a. Test the contingency plan for the system [FedRAMP Assignment: at least annually] using the following tests to determine the effectiveness of the plan and the readiness to execute the plan: [FedRAMP Assignment: functional exercises]. b. Review the contingency plan test results; and c. Initiate corrective actions, if needed. **CP-4 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider develops test plans in accordance with NIST Special Publication 800-34 (as amended); plans are approved by the JAB/AO prior to initiating testing. **(a) Requirement**: The service provider must include the Contingency Plan test results with the security package within the Contingency Plan-designated appendix (Appendix G, Contingency Plan Test Report).

๐Ÿ’ผ CP-4 Contingency Plan Testing (L)(M)(H)

a. Test the contingency plan for the system [FedRAMP Assignment: at least annually] using the following tests to determine the effectiveness of the plan and the readiness to execute the plan: [FedRAMP Assignment: functional exercises]. b. Review the contingency plan test results; and c. Initiate corrective actions, if needed. **CP-4 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider develops test plans in accordance with NIST Special Publication 800-34 (as amended); plans are approved by the JAB/AO prior to initiating testing. **(a) Requirement**: The service provider must include the Contingency Plan test results with the security package within the Contingency Plan-designated appendix (Appendix G, Contingency Plan Test Report).

๐Ÿ’ผ CP-4(2) Alternate Processing Site (H)

Test the contingency plan at the alternate processing site: (a) To familiarize contingency personnel with the facility and available resources; and (b) To evaluate the capabilities of the alternate processing site to support contingency operations.

๐Ÿ’ผ CP-6 (3) ACCESSIBILITY

The organization identifies potential accessibility problems to the alternate storage site in the event of an area-wide disruption or disaster and outlines explicit mitigation actions.

๐Ÿ’ผ CP-6 Alternate Storage Site

a. Establish an alternate storage site, including necessary agreements to permit the storage and retrieval of system backup information; and b. Ensure that the alternate storage site provides controls equivalent to that of the primary site.

๐Ÿ’ผ CP-6 ALTERNATE STORAGE SITE

The organization: CP-6a. Establishes an alternate storage site including necessary agreements to permit the storage and retrieval of information system backup information; and CP-6b. Ensures that the alternate storage site provides information security safeguards equivalent to that of the primary site.

๐Ÿ’ผ CP-6 Alternate Storage Site (M)(H)

a. Establish an alternate storage site, including necessary agreements to permit the storage and retrieval of system backup information; and b. Ensure that the alternate storage site provides controls equivalent to that of the primary site.

๐Ÿ’ผ CP-6 Alternate Storage Site (M)(H)

a. Establish an alternate storage site, including necessary agreements to permit the storage and retrieval of system backup information; and b. Ensure that the alternate storage site provides controls equivalent to that of the primary site.
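
In a cloud environment, one way a provider might satisfy an alternate storage site requirement is to replicate backup data into a second, geographically separate Region. The sketch below is an assumption about such a mapping, not FedRAMP guidance; the bucket names, IAM role ARN, and Region are placeholders, and both buckets are assumed to already exist with versioning enabled.

```python
# Hedged sketch: replicate backup objects to a bucket in another Region so an
# equivalent copy exists at the alternate storage site. All names are placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

s3.put_bucket_replication(
    Bucket="primary-backups",  # hypothetical primary-site bucket (versioning enabled)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-to-alternate-site",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    # Bucket in a geographically separate Region (alternate storage site)
                    "Bucket": "arn:aws:s3:::alternate-site-backups",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```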

๐Ÿ’ผ CP-6(3) Accessibility (M)(H)

Identify potential accessibility problems to the alternate storage site in the event of an area-wide disruption or disaster and outline explicit mitigation actions.

๐Ÿ’ผ CP-6(3) Accessibility (M)(H)

Identify potential accessibility problems to the alternate storage site in the event of an area-wide disruption or disaster and outline explicit mitigation actions.

๐Ÿ’ผ CP-7 (2) ACCESSIBILITY

The organization identifies potential accessibility problems to the alternate processing site in the event of an area-wide disruption or disaster and outlines explicit mitigation actions.

๐Ÿ’ผ CP-7 (3) PRIORITY OF SERVICE

The organization develops alternate processing site agreements that contain priority-of-service provisions in accordance with organizational availability requirements (including recovery time objectives).

๐Ÿ’ผ CP-7 (4) PREPARATION FOR USE

The organization prepares the alternate processing site so that the site is ready to be used as the operational site supporting essential missions and business functions.

๐Ÿ’ผ CP-7 Alternate Processing Site

a. Establish an alternate processing site, including necessary agreements to permit the transfer and resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period consistent with recovery time and recovery point objectives] when the primary processing capabilities are unavailable; b. Make available at the alternate processing site, the equipment and supplies required to transfer and resume operations or put contracts in place to support delivery to the site within the organization-defined time period for transfer and resumption; and c. Provide controls at the alternate processing site that are equivalent to those at the primary site.

๐Ÿ’ผ CP-7 ALTERNATE PROCESSING SITE

The organization: CP-7a. Establishes an alternate processing site including necessary agreements to permit the transfer and resumption of [Assignment: organization-defined information system operations] for essential missions/business functions within [Assignment: organization-defined time period consistent with recovery time and recovery point objectives] when the primary processing capabilities are unavailable; CP-7b. Ensures that equipment and supplies required to transfer and resume operations are available at the alternate processing site or contracts are in place to support delivery to the site within the organization-defined time period for transfer/resumption; and CP-7c. Ensures that the alternate processing site provides information security safeguards equivalent to those of the primary site.

๐Ÿ’ผ CP-7 Alternate Processing Site (M)(H)

a. Establish an alternate processing site, including necessary agreements to permit the transfer and resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period consistent with recovery time and recovery point objectives] when the primary processing capabilities are unavailable; b. Make available at the alternate processing site, the equipment and supplies required to transfer and resume operations or put contracts in place to support delivery to the site within the organization-defined time period for transfer and resumption; and c. Provide controls at the alternate processing site that are equivalent to those at the primary site. **CP-7 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines a time period consistent with the recovery time objectives and business impact analysis.

๐Ÿ’ผ CP-7 Alternate Processing Site (M)(H)

a. Establish an alternate processing site, including necessary agreements to permit the transfer and resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period consistent with recovery time and recovery point objectives] when the primary processing capabilities are unavailable; b. Make available at the alternate processing site, the equipment and supplies required to transfer and resume operations or put contracts in place to support delivery to the site within the organization-defined time period for transfer and resumption; and c. Provide controls at the alternate processing site that are equivalent to those at the primary site. **CP-7 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines a time period consistent with the recovery time objectives and business impact analysis.
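
For a cloud-hosted system, one common pattern for shifting traffic to an alternate processing site within a defined recovery time is DNS failover. The sketch below is only an illustration of how this control might be implemented on AWS; the hosted zone ID, record name, health check ID, and addresses are placeholders.

```python
# Hedged sketch: Route 53 failover records that send traffic to a standby
# (alternate) site when the primary site's health check fails.
import boto3

route53 = boto3.client("route53")

def upsert(set_id: str, role: str, address: str, health_check_id: str | None = None):
    record = {
        "Name": "app.example.com",   # placeholder record name
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,            # "PRIMARY" or "SECONDARY"
        "TTL": 60,
        "ResourceRecords": [{"Value": address}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",  # placeholder hosted zone
        ChangeBatch={"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record}]},
    )

upsert("primary-site", "PRIMARY", "192.0.2.10", health_check_id="hc-primary-example")
upsert("alternate-site", "SECONDARY", "198.51.100.20")
```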

๐Ÿ’ผ CP-7(1) Separation from Primary Site (M)(H)

Identify an alternate processing site that is sufficiently separated from the primary processing site to reduce susceptibility to the same threats. **CP-7 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: The service provider may determine what is considered a sufficient degree of separation between the primary and alternate processing sites, based on the types of threats that are of concern. For one particular type of threat (i.e., hostile cyber attack), the degree of separation between sites will be less relevant.

๐Ÿ’ผ CP-7(1) Separation from Primary Site (M)(H)

Identify an alternate processing site that is sufficiently separated from the primary processing site to reduce susceptibility to the same threats. **CP-7 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: The service provider may determine what is considered a sufficient degree of separation between the primary and alternate processing sites, based on the types of threats that are of concern. For one particular type of threat (i.e., hostile cyber attack), the degree of separation between sites will be less relevant.

๐Ÿ’ผ CP-7(2) Accessibility (M)(H)

Identify potential accessibility problems to alternate processing sites in the event of an area-wide disruption or disaster and outline explicit mitigation actions.

๐Ÿ’ผ CP-7(2) Accessibility (M)(H)

Identify potential accessibility problems to alternate processing sites in the event of an area-wide disruption or disaster and outline explicit mitigation actions.

๐Ÿ’ผ CP-8 (1) PRIORITY OF SERVICE PROVISIONS

The organization: CP-8 (1)(a) Develops primary and alternate telecommunications service agreements that contain priority-of-service provisions in accordance with organizational availability requirements (including recovery time objectives); and CP-8 (1)(b) Requests Telecommunications Service Priority for all telecommunications services used for national security emergency preparedness in the event that the primary and/or alternate telecommunications services are provided by a common carrier.

๐Ÿ’ผ CP-8 (4) PROVIDER CONTINGENCY PLAN

The organization: CP-8 (4)(a) Requires primary and alternate telecommunications service providers to have contingency plans; CP-8 (4)(b) Reviews provider contingency plans to ensure that the plans meet organizational contingency requirements; and CP-8 (4)(c) Obtains evidence of contingency testing/training by providers [Assignment: organization-defined frequency].

๐Ÿ’ผ CP-8 Telecommunications Services

Establish alternate telecommunications services, including necessary agreements to permit the resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period] when the primary telecommunications capabilities are unavailable at either the primary or alternate processing or storage sites.

๐Ÿ’ผ CP-8 TELECOMMUNICATIONS SERVICES

The organization establishes alternate telecommunications services including necessary agreements to permit the resumption of [Assignment: organization-defined information system operations] for essential missions and business functions within [Assignment: organization-defined time period] when the primary telecommunications capabilities are unavailable at either the primary or alternate processing or storage sites.

๐Ÿ’ผ CP-8 Telecommunications Services (M)(H)

Establish alternate telecommunications services, including necessary agreements to permit the resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period] when the primary telecommunications capabilities are unavailable at either the primary or alternate processing or storage sites. **CP-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider defines a time period consistent with the recovery time objectives and business impact analysis.

๐Ÿ’ผ CP-8 Telecommunications Services (M)(H)

Establish alternate telecommunications services, including necessary agreements to permit the resumption of [Assignment: organization-defined system operations] for essential mission and business functions within [Assignment: organization-defined time period] when the primary telecommunications capabilities are unavailable at either the primary or alternate processing or storage sites. **CP-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider defines a time period consistent with the recovery time objectives and business impact analysis.

๐Ÿ’ผ CP-8(1) Priority of Service Provisions (M)(H)

(a) Develop primary and alternate telecommunications service agreements that contain priority-of-service provisions in accordance with availability requirements (including recovery time objectives); and (b) Request Telecommunications Service Priority for all telecommunications services used for national security emergency preparedness if the primary and/or alternate telecommunications services are provided by a common carrier.

๐Ÿ’ผ CP-8(1) Priority of Service Provisions (M)(H)

(a) Develop primary and alternate telecommunications service agreements that contain priority-of-service provisions in accordance with availability requirements (including recovery time objectives); and (b) Request Telecommunications Service Priority for all telecommunications services used for national security emergency preparedness if the primary and/or alternate telecommunications services are provided by a common carrier.

๐Ÿ’ผ CP-8(1) Telecommunications Services | Priority of Service Provisions

(a) Develop primary and alternate telecommunications service agreements that contain priority-of-service provisions in accordance with availability requirements (including recovery time objectives); and (b) Request Telecommunications Service Priority for all telecommunications services used for national security emergency preparedness if the primary and/or alternate telecommunications services are provided by a common carrier.

๐Ÿ’ผ CP-8(4) Provider Contingency Plan (H)

(a) Require primary and alternate telecommunications service providers to have contingency plans; (b) Review provider contingency plans to ensure that the plans meet organizational contingency requirements; and (c) Obtain evidence of contingency testing and training by providers [FedRAMP Assignment: annually].

๐Ÿ’ผ CP-8(4) Telecommunications Services | Provider Contingency Plan

(a) Require primary and alternate telecommunications service providers to have contingency plans; (b) Review provider contingency plans to ensure that the plans meet organizational contingency requirements; and (c) Obtain evidence of contingency testing and training by providers [Assignment: organization-defined frequency].

๐Ÿ’ผ CP-9 (3) SEPARATE STORAGE FOR CRITICAL INFORMATION

The organization stores backup copies of [Assignment: organization-defined critical information system software and other security-related information] in a separate facility or in a fire-rated container that is not collocated with the operational system.

๐Ÿ’ผ CP-9 (5) TRANSFER TO ALTERNATE STORAGE SITE

The organization transfers information system backup information to the alternate storage site [Assignment: organization-defined time period and transfer rate consistent with the recovery time and recovery point objectives].

๐Ÿ’ผ CP-9 (6) REDUNDANT SECONDARY SYSTEM

The organization accomplishes information system backup by maintaining a redundant secondary system that is not collocated with the primary system and that can be activated without loss of information or disruption to operations.

๐Ÿ’ผ CP-9 INFORMATION SYSTEM BACKUP

The organization: CP-9a. Conducts backups of user-level information contained in the information system [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; CP-9b. Conducts backups of system-level information contained in the information system [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; CP-9c. Conducts backups of information system documentation including security-related documentation [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; and CP-9d. Protects the confidentiality, integrity, and availability of backup information at storage locations.

๐Ÿ’ผ CP-9 System Backup

a. Conduct backups of user-level information contained in [Assignment: organization-defined system components] [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; b. Conduct backups of system-level information contained in the system [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; c. Conduct backups of system documentation, including security- and privacy-related documentation [Assignment: organization-defined frequency consistent with recovery time and recovery point objectives]; and d. Protect the confidentiality, integrity, and availability of backup information.

๐Ÿ’ผ CP-9 System Backup (L)(M)(H)

a. Conduct backups of user-level information contained in [Assignment: organization-defined system components][FedRAMP Assignment: daily incremental; weekly full]; b. Conduct backups of system-level information contained in the system [FedRAMP Assignment: daily incremental; weekly full]; c. Conduct backups of system documentation, including security- and privacy-related documentation [FedRAMP Assignment: daily incremental; weekly full]; and d. Protect the confidentiality, integrity, and availability of backup information. **CP-9 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider shall determine what elements of the cloud environment require the Information System Backup control. The service provider shall determine how Information System Backup is going to be verified and appropriate periodicity of the check. **(a) Requirement**: The service provider maintains at least three (3) backup copies of user-level information (at least one of which is available online) or provides an equivalent alternative. **(b) Requirement**: The service provider maintains at least three (3) backup copies of system-level information (at least one of which is available online) or provides an equivalent alternative. **(c) Requirement**: The service provider maintains at least three (3) backup copies of information system documentation including security information (at least one of which is available online) or provides an equivalent alternative.

๐Ÿ’ผ CP-9 System Backup (L)(M)(H)

a. Conduct backups of user-level information contained in [Assignment: organization-defined system components][FedRAMP Assignment: daily incremental; weekly full]; b. Conduct backups of system-level information contained in the system [FedRAMP Assignment: daily incremental; weekly full]; c. Conduct backups of system documentation, including security- and privacy-related documentation [FedRAMP Assignment: daily incremental; weekly full]; and d. Protect the confidentiality, integrity, and availability of backup information. **CP-9 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider shall determine what elements of the cloud environment require the Information System Backup control. The service provider shall determine how Information System Backup is going to be verified and appropriate periodicity of the check. **(a) Requirement**: The service provider maintains at least three (3) backup copies of user-level information (at least one of which is available online) or provides an equivalent alternative. **(b) Requirement**: The service provider maintains at least three (3) backup copies of system-level information (at least one of which is available online) or provides an equivalent alternative. **(c) Requirement**: The service provider maintains at least three (3) backup copies of information system documentation including security information (at least one of which is available online) or provides an equivalent alternative.

๐Ÿ’ผ CP-9 System Backup (L)(M)(H)

a. Conduct backups of user-level information contained in [Assignment: organization-defined system components][FedRAMP Assignment: daily incremental; weekly full]; b. Conduct backups of system-level information contained in the system [FedRAMP Assignment: daily incremental; weekly full]; c. Conduct backups of system documentation, including security- and privacy-related documentation [FedRAMP Assignment: daily incremental; weekly full]; and d. Protect the confidentiality, integrity, and availability of backup information. **CP-9 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider shall determine what elements of the cloud environment require the Information System Backup control. The service provider shall determine how Information System Backup is going to be verified and appropriate periodicity of the check. **(a) Requirement**: The service provider maintains at least three (3) backup copies of user-level information (at least one of which is available online) or provides an equivalent alternative. **(b) Requirement**: The service provider maintains at least three (3) backup copies of system-level information (at least one of which is available online) or provides an equivalent alternative. **(c) Requirement**: The service provider maintains at least three (3) backup copies of information system documentation including security information (at least one of which is available online) or provides an equivalent alternative.
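
One way a provider might implement the daily-incremental/weekly-full schedule on AWS is an AWS Backup plan with a daily rule and a longer-retained weekly rule; many AWS Backup recovery points are incremental after the first full copy, so this mapping is an assumption rather than a literal "incremental vs. full" switch. Plan, vault, schedule, and retention values below are placeholders, and resources are attached to the plan separately with a backup selection.

```python
# Hedged sketch: an AWS Backup plan with a daily rule and a weekly rule as one
# possible reading of "daily incremental; weekly full". Names, cron schedules,
# and retention periods are placeholders.
import boto3

backup = boto3.client("backup")

backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "cp9-daily-weekly",  # placeholder plan name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "primary-vault",      # placeholder vault
                "ScheduleExpression": "cron(0 5 ? * * *)",      # every day, 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": 35},
            },
            {
                "RuleName": "weekly",
                "TargetBackupVaultName": "primary-vault",
                "ScheduleExpression": "cron(0 6 ? * SUN *)",    # Sundays, 06:00 UTC
                "Lifecycle": {"DeleteAfterDays": 365},
            },
        ],
    }
)
```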

๐Ÿ’ผ CP-9(5) Transfer to Alternate Storage Site (H)

Transfer system backup information to the alternate storage site [FedRAMP Assignment: time period and transfer rate consistent with the recovery time and recovery point objectives defined in the service provider and organization SLA.].

๐Ÿ’ผ CP-9(8) Cryptographic Protection (M)(H)

Implement cryptographic mechanisms to prevent unauthorized disclosure and modification of [FedRAMP Assignment: all backup files]. **CP-9 (8) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that this enhancement requires the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or National Security Agency (NSA) approved cryptography (see SC-13).

๐Ÿ’ผ CP-9(8) Cryptographic Protection (M)(H)

Implement cryptographic mechanisms to prevent unauthorized disclosure and modification of [FedRAMP Assignment: all backup files]. **CP-9 (8) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that this enhancement requires the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or National Security Agency (NSA) approved cryptography (see SC-13).
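
A hedged example of meeting this enhancement on AWS: create a customer-managed KMS key (AWS KMS uses FIPS-validated cryptographic modules) and use it to encrypt the backup vault that holds the recovery points. The key description and vault name are placeholders.

```python
# Hedged sketch: protect backup files by creating the backup vault with a
# customer-managed KMS key. Names are placeholders.
import boto3

kms = boto3.client("kms")
backup = boto3.client("backup")

# Create a customer-managed key for backup encryption (illustrative only).
key_arn = kms.create_key(
    Description="CMK for backup vault encryption (illustrative)"
)["KeyMetadata"]["Arn"]

# All recovery points stored in this vault are encrypted with the key above.
backup.create_backup_vault(
    BackupVaultName="encrypted-backup-vault",  # placeholder vault name
    EncryptionKeyArn=key_arn,
)
```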

๐Ÿ’ผ d. standards and guidelines โ€” the body of knowledge for developing secure software would typically be embodied in a set of standards and guidelines. Typically, standards would exist for each programming language, taking into account known vulnerabilities and what is considered to be good practice. It is important that standards remain aligned with industry developments such as emerging vulnerabilities/threats and associated compensating controls. In developing software standards and guidelines, consideration would typically be given to:

๐Ÿ’ผ Data Classification

Data classification provides a way to categorize organizational data based on criticality and sensitivity in order to help you determine appropriate protection and retention controls.
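
As one illustration of acting on a classification scheme, the hedged sketch below records a classification level as an S3 object tag so that downstream protection and retention controls (encryption, lifecycle, access policies) can key off the tag. The bucket, key, and tag values are hypothetical.

```python
# Hedged sketch: store a data classification level as an S3 object tag so that
# protection and retention controls can be driven by the tag. Names are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_object_tagging(
    Bucket="example-data-bucket",          # hypothetical bucket
    Key="reports/2024/q1-financials.csv",  # hypothetical object
    Tagging={
        "TagSet": [
            {"Key": "classification", "Value": "confidential"},
            {"Key": "retention", "Value": "7y"},
        ]
    },
)
```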

๐Ÿ’ผ Data management

The optimal data management solution for a particular system varies based on the kind of data type (block, file, or object), access patterns (random or sequential), required throughput, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-Architected workloads use purpose-built data stores which allow different features to improve performance.

๐Ÿ’ผ Data management

Implement data management practices to reduce the provisioned storage required to support your workload, and the resources required to use it. Understand your data, and use storage technologies and configurations that best support the business value of the data and how itโ€™s used. Lifecycle data to more efficient, less performant storage when requirements decrease, and delete data thatโ€™s no longer required.
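
As an illustration of lifecycling data to cheaper tiers and deleting what is no longer needed, the hedged sketch below applies an S3 lifecycle configuration; the bucket name, prefix, and day thresholds are assumptions chosen for the example.

```python
# Hedged sketch: move aging objects to colder storage classes and expire them,
# so provisioned storage tracks how the data is actually used. All names and
# thresholds are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
                ],
                "Expiration": {"Days": 365},                      # delete after one year
            }
        ]
    },
)
```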

๐Ÿ’ผ Data Security (PR.DS)

Information and records (data) are managed consistent with the organization's risk strategy to protect the confidentiality, integrity, and availability of information.

๐Ÿ’ผ Data Security (PR.DS)

Data are managed consistent with the organization's risk strategy to protect the confidentiality, integrity, and availability of information.

๐Ÿ’ผ DE.AE-02: Potentially adverse events are analyzed to better understand associated activities

1. Use security information and event management (SIEM) or other tools to continuously monitor log events for known malicious and suspicious activity 2. Utilize up-to-date cyber threat intelligence in log analysis tools to improve detection accuracy and characterize threat actors, their methods, and indicators of compromise 3. Regularly conduct manual reviews of log events for technologies that cannot be sufficiently monitored through automation 4. Use log analysis tools to generate reports on their findings
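
To make the first two items concrete, the sketch below shows the simplest form of the idea: checking log events against a small indicator-of-compromise list. It is a toy illustration, not a SIEM; the file name and event fields are hypothetical.

```python
# Minimal illustration of enriching log review with threat intelligence:
# flag events whose source IP appears on an indicator-of-compromise list.
import json

IOC_IPS = {"203.0.113.7", "198.51.100.23"}  # example indicators (documentation ranges)

def scan(log_path: str) -> list[dict]:
    findings = []
    with open(log_path) as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("src_ip") in IOC_IPS:
                findings.append({
                    "src_ip": event["src_ip"],
                    "action": event.get("action"),
                    "time": event.get("timestamp"),
                })
    return findings

if __name__ == "__main__":
    for hit in scan("auth_events.jsonl"):  # hypothetical JSON-lines log file
        print("possible IOC match:", hit)
```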

๐Ÿ’ผ DE.AE-03: Information is correlated from multiple sources

1. Constantly transfer log data generated by other sources to a relatively small number of log servers 2. Use event correlation technology (e.g., SIEM) to collect information captured by multiple sources 3. Utilize cyber threat intelligence to help correlate events among log sources
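
A minimal sketch of the correlation idea a SIEM automates: collect events from several sources and flag source addresses seen in more than one of them. The source names and event fields are hypothetical.

```python
# Toy correlation: group events from several log sources by source IP and flag
# addresses that appear in more than one source.
from collections import defaultdict

def correlate(events: list[dict]) -> dict[str, set[str]]:
    seen = defaultdict(set)  # src_ip -> set of log sources it appeared in
    for e in events:
        seen[e["src_ip"]].add(e["source"])
    return {ip: sources for ip, sources in seen.items() if len(sources) > 1}

events = [
    {"source": "firewall", "src_ip": "198.51.100.23", "action": "deny"},
    {"source": "web_proxy", "src_ip": "198.51.100.23", "action": "allow"},
    {"source": "vpn", "src_ip": "192.0.2.10", "action": "login"},
]
print(correlate(events))  # {'198.51.100.23': {'firewall', 'web_proxy'}}
```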

๐Ÿ’ผ DE.AE-06: Information on adverse events is provided to authorized staff and tools

1. Use cybersecurity software to generate alerts and provide them to the security operations center (SOC), incident responders, and incident response tools 2. Incident responders and other authorized personnel can access log analysis findings at all times 3. Automatically create and assign tickets in the organization's ticketing system when certain types of alerts occur 4. Manually create and assign tickets in the organization's ticketing system when technical staff discover indicators of compromise

๐Ÿ’ผ DE.CM-02: The physical environment is monitored to find potentially adverse events

1. Monitor logs from physical access control systems (e.g., badge readers) to find unusual access patterns (e.g., deviations from the norm) and failed access attempts 2. Review and monitor physical access records (e.g., from visitor registration, sign-in sheets) 3. Monitor physical access controls (e.g., locks, latches, hinge pins, alarms) for signs of tampering 4. Monitor the physical environment using alarm systems, cameras, and security guards

๐Ÿ’ผ DE.CM-09: Computing hardware and software, runtime environments, and their data are monitored to find potentially adverse events

Ex1: Monitor email, web, file sharing, collaboration services, and other common attack vectors to detect malware, phishing, data leaks and exfiltration, and other adverse events Ex2: Monitor authentication attempts to identify attacks against credentials and unauthorized credential reuse Ex3: Monitor software configurations for deviations from security baselines Ex4: Monitor hardware and software for signs of tampering Ex5: Use technologies with a presence on endpoints to detect cyber health issues (e.g., missing patches, malware infections, unauthorized software), and redirect the endpoints to a remediation environment before access is authorized

๐Ÿ’ผ Decommission resources

After you manage a list of projects, employees, and technology resources over time, you will be able to identify which resources are no longer being used and which projects no longer have an owner.

๐Ÿ’ผ Design for operations

Adopt approaches that improve the flow of changes into production and that enable refactoring, fast feedback on quality, and bug fixing. These accelerate beneficial changes entering production, limit issues deployed, and provide rapid identification and remediation of issues introduced through deployment activities.

๐Ÿ’ผ Design interactions in a distributed system to mitigate or withstand failures

Distributed systems rely on communications networks to interconnect components (such as servers or services). Your workload must operate reliably despite data loss or latency over these networks. Components of the distributed system must operate in a way that does not negatively impact other components or the workload. These best practices allow workloads to withstand stresses or failures, more quickly recover from them, and mitigate the impact of such impairments. The result is improved mean time to recovery (MTTR).

๐Ÿ’ผ Design interactions in a distributed system to prevent failures

Distributed systems rely on communications networks to interconnect components, such as servers or services. Your workload must operate reliably despite data loss or latency in these networks. Components of the distributed system must operate in a way that does not negatively impact other components or the workload. These best practices prevent failures and improve mean time between failures (MTBF).

๐Ÿ’ผ Design your workload service architecture

Build highly scalable and reliable workloads using a service-oriented architecture (SOA) or a microservices architecture. Service-oriented architecture (SOA) is the practice of making software components reusable via service interfaces. Microservices architecture goes further to make components smaller and simpler.

๐Ÿ’ผ Expiration Management

Policies for identifying resources that do not implement expiration and rotation management procedures for keys, secrets, and certificates.
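
One hedged way to surface such resources in AWS is to list Secrets Manager secrets and flag those with no rotation configured or no recent rotation; the 90-day window below is an assumption for the example, not a requirement from this policy set.

```python
# Hedged sketch: flag secrets that have no rotation enabled or have not been
# rotated within a chosen window. The window is an illustrative assumption.
from datetime import datetime, timedelta, timezone
import boto3

MAX_AGE = timedelta(days=90)  # assumed review window
client = boto3.client("secretsmanager")

stale = []
for page in client.get_paginator("list_secrets").paginate():
    for secret in page["SecretList"]:
        last = secret.get("LastRotatedDate")
        if not secret.get("RotationEnabled"):
            stale.append((secret["Name"], "rotation not enabled"))
        elif last and datetime.now(timezone.utc) - last > MAX_AGE:
            stale.append((secret["Name"], f"last rotated {last:%Y-%m-%d}"))

for name, reason in stale:
    print(f"review secret {name}: {reason}")
```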

๐Ÿ’ผ Governance (ID.GV)

The policies, procedures, and processes to manage and monitor the organization's regulatory, legal, risk, environmental, and operational requirements are understood and inform the management of cybersecurity risk.

๐Ÿ’ผ GV.OC-02: Internal and external stakeholders are understood, and their needs and expectations regarding cybersecurity risk management are understood and considered

1. Identify relevant internal stakeholders and their cybersecurity-related expectations (e.g., performance and risk expectations of officers, directors, and advisors; cultural expectations of employees) 2. Identify relevant external stakeholders and their cybersecurity-related expectations (e.g., privacy expectations of customers, business expectations of partnerships, compliance expectations of regulators, ethics expectations of society)

๐Ÿ’ผ GV.OC-03: Legal, regulatory, and contractual requirements regarding cybersecurity - including privacy and civil liberties obligations - are understood and managed

1. Determine a process to track and manage legal and regulatory requirements regarding protection of individuals' information (e.g., Health Insurance Portability and Accountability Act, California Consumer Privacy Act, General Data Protection Regulation) 2. Determine a process to track and manage contractual requirements for cybersecurity management of supplier, customer, and partner information 3. Align the organization's cybersecurity strategy with legal, regulatory, and contractual requirements

๐Ÿ’ผ GV.OC-04: Critical objectives, capabilities, and services that external stakeholders depend on or expect from the organization are understood and communicated

1. Establish criteria for determining the criticality of capabilities and services as viewed by internal and external stakeholders 2. Determine (e.g., from a business impact analysis) assets and business operations that are vital to achieving mission objectives and the potential impact of a loss (or partial loss) of such operations 3. Establish and communicate resilience objectives (e.g., recovery time objectives) for delivering critical capabilities and services in various operating states (e.g., under attack, during recovery, normal operation)

๐Ÿ’ผ GV.OC-05: Outcomes, capabilities, and services that the organization depends on are understood and communicated

1. Create an inventory of the organization's dependencies on external resources (e.g., facilities, cloud-based hosting providers) and their relationships to organizational assets and business functions 2. Identify and document external dependencies that are potential points of failure for the organization's critical capabilities and services, and share that information with appropriate personnel

๐Ÿ’ผ GV.PO-01: Policy for managing cybersecurity risks is established based on organizational context, cybersecurity strategy, and priorities and is communicated and enforced

1. Create, disseminate, and maintain an understandable, usable risk management policy with statements of management intent, expectations, and direction 2. Periodically review policy and supporting processes and procedures to ensure that they align with risk management strategy objectives and priorities, as well as the high-level direction of the cybersecurity policy 3. Require approval from senior management on policy 4. Communicate cybersecurity risk management policy and supporting processes and procedures across the organization 5. Require personnel to acknowledge receipt of policy when first hired, annually, and whenever policy is updated

๐Ÿ’ผ GV.PO-02: Policy for managing cybersecurity risks is reviewed, updated, communicated, and enforced to reflect changes in requirements, threats, technology, and organizational mission

1. Update policy based on periodic reviews of cybersecurity risk management results to ensure that policy and supporting processes and procedures adequately maintain risk at an acceptable level 2. Provide a timeline for reviewing changes to the organization's risk environment (e.g., changes in risk or in the organization's mission objectives), and communicate recommended policy updates 3. Update policy to reflect changes in legal and regulatory requirements 4. Update policy to reflect changes in technology (e.g., adoption of artificial intelligence) and changes to the business (e.g., acquisition of a new business, new contract requirements)

๐Ÿ’ผ GV.RM-01: Risk management objectives are established and agreed to by organizational stakeholders

1. Update near-term and long-term cybersecurity risk management objectives as part of annual strategic planning and when major changes occur 2. Establish measurable objectives for cybersecurity risk management (e.g., manage the quality of user training, ensure adequate risk protection for industrial control systems) 3. Senior leaders agree about cybersecurity objectives and use them for measuring and managing risk and performance

๐Ÿ’ผ GV.RM-04: Strategic direction that describes appropriate risk response options is established and communicated

1. Specify criteria for accepting and avoiding cybersecurity risk for various classifications of data 2. Determine whether to purchase cybersecurity insurance 3. Document conditions under which shared responsibility models are acceptable (e.g., outsourcing certain cybersecurity functions, having a third party perform financial transactions on behalf of the organization, using public cloud-based services)

๐Ÿ’ผ GV.RM-05: Lines of communication across the organization are established for cybersecurity risks, including risks from suppliers and other third parties

1. Determine how to update senior executives, directors, and management on the organization's cybersecurity posture at agreed-upon intervals 2. Identify how all departments across the organization - such as management, operations, internal auditors, legal, acquisition, physical security, and HR - will communicate with each other about cybersecurity risks

๐Ÿ’ผ GV.RM-06: A standardized method for calculating, documenting, categorizing, and prioritizing cybersecurity risks is established and communicated

1. Establish criteria for using a quantitative approach to cybersecurity risk analysis, and specify probability and exposure formulas 2. Create and use templates (e.g., a risk register) to document cybersecurity risk information (e.g., risk description, exposure, treatment, and ownership) 3. Establish criteria for risk prioritization at the appropriate levels within the enterprise 4. Use a consistent list of risk categories to support integrating, aggregating, and comparing cybersecurity risks

๐Ÿ’ผ GV.RR-01: Organizational leadership is responsible and accountable for cybersecurity risk and fosters a culture that is risk-aware, ethical, and continually improving

1. Leaders (e.g., directors) agree on their roles and responsibilities in developing, implementing, and assessing the organization's cybersecurity strategy 2. Share leaders' expectations regarding a secure and ethical culture, especially when current events present the opportunity to highlight positive or negative examples of cybersecurity risk management 3. Leaders direct the CISO to maintain a comprehensive cybersecurity risk strategy and review and update it at least annually and after major events 4. Conduct reviews to ensure adequate authority and coordination among those responsible for managing cybersecurity risk

๐Ÿ’ผ GV.RR-02: Roles, responsibilities, and authorities related to cybersecurity risk management are established, communicated, understood, and enforced

1. Document risk management roles and responsibilities in policy 2. Document who is responsible and accountable for cybersecurity risk management activities and how those teams and individuals are to be consulted and informed 3. Include cybersecurity responsibilities and performance requirements in personnel descriptions 4. Document performance goals for personnel with cybersecurity risk management responsibilities, and periodically measure performance to identify areas for improvement 5. Clearly articulate cybersecurity responsibilities within operations, risk functions, and internal audit functions

๐Ÿ’ผ GV.RR-03: Adequate resources are allocated commensurate with the cybersecurity risk strategy, roles, responsibilities, and policies

1. Conduct periodic management reviews to ensure that those given cybersecurity risk management responsibilities have the necessary authority 2. Identify resource allocation and investment in line with risk tolerance and response 3. Provide adequate and sufficient people, process, and technical resources to support the organization's cybersecurity strategy

๐Ÿ’ผ GV.RR-04: Cybersecurity is included in human resources practices

1. Integrate cybersecurity risk management considerations into human resources processes (e.g., personnel screening, onboarding, change notification, offboarding) 2. Consider cybersecurity knowledge to be a positive factor in hiring, training, and retention decisions 3. Conduct background checks prior to onboarding new personnel for sensitive roles, and periodically repeat background checks for personnel with such roles 4. Define and enforce obligations for personnel to be aware of, adhere to, and uphold security policies as they relate to their roles

๐Ÿ’ผ GV.SC-01: A cybersecurity supply chain risk management program, strategy, objectives, policies, and processes are established and agreed to by organizational stakeholders

1. Establish a strategy that expresses the objectives of the cybersecurity supply chain risk management program 2. Develop the cybersecurity supply chain risk management program, including a plan (with milestones), policies, and procedures that guide implementation and improvement of the program, and share the policies and procedures with the organizational stakeholders 3. Develop and implement program processes based on the strategy, objectives, policies, and procedures that are agreed upon and performed by the organizational stakeholders 4. Establish a cross-organizational mechanism that ensures alignment between functions that contribute to cybersecurity supply chain risk management, such as cybersecurity, IT, operations, legal, human resources, and engineering

๐Ÿ’ผ GV.SC-02: Cybersecurity roles and responsibilities for suppliers, customers, and partners are established, communicated, and coordinated internally and externally

1. Identify one or more specific roles or positions that will be responsible and accountable for planning, resourcing, and executing cybersecurity supply chain risk management activities 2. Document cybersecurity supply chain risk management roles and responsibilities in policy 3. Create responsibility matrixes to document who will be responsible and accountable for cybersecurity supply chain risk management activities and how those teams and individuals will be consulted and informed 4. Include cybersecurity supply chain risk management responsibilities and performance requirements in personnel descriptions to ensure clarity and improve accountability 5. Document performance goals for personnel with cybersecurity risk management-specific responsibilities, and periodically measure them to demonstrate and improve performance 6. Develop roles and responsibilities for suppliers, customers, and business partners to address shared responsibilities for applicable cybersecurity risks, and integrate them into organizational policies and applicable third-party agreements 7. Internally communicate cybersecurity supply chain risk management roles and responsibilities for third parties 8. Establish rules and protocols for information sharing and reporting processes between the organization and its suppliers

๐Ÿ’ผ GV.SC-03: Cybersecurity supply chain risk management is integrated into cybersecurity and enterprise risk management, risk assessment, and improvement processes

1. Identify areas of alignment and overlap with cybersecurity and enterprise risk management 2. Establish integrated control sets for cybersecurity risk management and cybersecurity supply chain risk management 3. Integrate cybersecurity supply chain risk management into improvement processes 4. Escalate material cybersecurity risks in supply chains to senior management, and address them at the enterprise risk management level

๐Ÿ’ผ GV.SC-04: Suppliers are known and prioritized by criticality

1. Develop criteria for supplier criticality based on, for example, the sensitivity of data processed or possessed by suppliers, the degree of access to the organization's systems, and the importance of the products or services to the organization's mission 2. Keep a record of all suppliers, and prioritize suppliers based on the criticality criteria

๐Ÿ’ผ GV.SC-05: Requirements to address cybersecurity risks in supply chains are established, prioritized, and integrated into contracts and other types of agreements with suppliers and other relevant third parties

1. Establish security requirements for suppliers, products, and services commensurate with their criticality level and potential impact if compromised 2. Include all cybersecurity and supply chain requirements that third parties must follow and how compliance with the requirements may be verified in default contractual language 3. Define the rules and protocols for information sharing between the organization and its suppliers and sub-tier suppliers in agreements 4. Manage risk by including security requirements in agreements based on their criticality and potential impact if compromised 5. Define security requirements in service-level agreements (SLAs) for monitoring suppliers for acceptable security performance throughout the supplier relationship lifecycle 6. Contractually require suppliers to disclose cybersecurity features, functions, and vulnerabilities of their products and services for the life of the product or the term of service 7. Contractually require suppliers to provide and maintain a current component inventory (e.g., software or hardware bill of materials) for critical products 8. Contractually require suppliers to vet their employees and guard against insider threats 9. Contractually require suppliers to provide evidence of performing acceptable security practices through, for example, self-attestation, conformance to known standards, certifications, or inspections 10. Specify in contracts and other agreements the rights and responsibilities of the organization, its suppliers, and their supply chains, with respect to potential cybersecurity risks

๐Ÿ’ผ GV.SC-06: Planning and due diligence are performed to reduce risks before entering into formal supplier or other third-party relationships

1. Perform thorough due diligence on prospective suppliers that is consistent with procurement planning and commensurate with the level of risk, criticality, and complexity of each supplier relationship 2. Assess the suitability of the technology and cybersecurity capabilities and the risk management practices of prospective suppliers 3. Conduct supplier risk assessments against business and applicable cybersecurity requirements 4. Assess the authenticity, integrity, and security of critical products prior to acquisition and use

๐Ÿ’ผ GV.SC-07: The risks posed by a supplier, their products and services, and other third parties are understood, recorded, prioritized, assessed, responded to, and monitored over the course of the relationship

1. Adjust assessment formats and frequencies based on the third party's reputation and the criticality of the products or services they provide 2. Evaluate third parties' evidence of compliance with contractual cybersecurity requirements, such as self-attestations, warranties, certifications, and other artifacts 3. Monitor critical suppliers to ensure that they are fulfilling their security obligations throughout the supplier relationship lifecycle using a variety of methods and techniques, such as inspections, audits, tests, or other forms of evaluation 4. Monitor critical suppliers, services, and products for changes to their risk profiles, and reevaluate supplier criticality and risk impact accordingly 5. Plan for unexpected supplier and supply chain-related interruptions to ensure business continuity

๐Ÿ’ผ GV.SC-08: Relevant suppliers and other third parties are included in incident planning, response, and recovery activities

1. Define and use rules and protocols for reporting incident response and recovery activities and the status between the organization and its suppliers 2. Identify and document the roles and responsibilities of the organization and its suppliers for incident response 3. Include critical suppliers in incident response exercises and simulations 4. Define and coordinate crisis communication methods and protocols between the organization and its critical suppliers 5. Conduct collaborative lessons learned sessions with critical suppliers

๐Ÿ’ผ GV.SC-09: Supply chain security practices are integrated into cybersecurity and enterprise risk management programs, and their performance is monitored throughout the technology product and service life cycle

1. Policies and procedures require provenance records for all acquired technology products and services 2. Periodically provide risk reporting to leaders about how acquired components are proven to be untampered and authentic 3. Communicate regularly among cybersecurity risk managers and operations personnel about the need to acquire software patches, updates, and upgrades only from authenticated and trustworthy software providers 4. Review policies to ensure that they require approved supplier personnel to perform maintenance on supplier products 5. Policies and procedures require checking upgrades to critical hardware for unauthorized changes

๐Ÿ’ผ GV.SC-10: Cybersecurity supply chain risk management plans include provisions for activities that occur after the conclusion of a partnership or service agreement

Ex1: Establish processes for terminating critical relationships under both normal and adverse circumstances Ex2: Define and implement plans for component end-of-life maintenance support and obsolescence Ex3: Verify that supplier access to organization resources is deactivated promptly when it is no longer needed Ex4: Verify that assets containing the organization's data are returned or properly disposed of in a timely, controlled, and safe manner Ex5: Develop and execute a plan for terminating or transitioning supplier relationships that takes supply chain security risk and resiliency into account Ex6: Mitigate risks to data and systems created by supplier termination Ex7: Manage data leakage risks associated with supplier termination

๐Ÿ’ผ Hardware and services

Look for opportunities to reduce workload sustainability impacts by making changes to your hardware management practices. Minimize the amount of hardware needed to provision and deploy, and select the most efficient hardware and services for your individual workload.

๐Ÿ’ผ IA-1 IDENTIFICATION AND AUTHENTICATION POLICY AND PROCEDURES

The organization: IA-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: IA-1a.1. An identification and authentication policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and IA-1a.2. Procedures to facilitate the implementation of the identification and authentication policy and associated identification and authentication controls; and IA-1b. Reviews and updates the current: IA-1b.1. Identification and authentication policy [Assignment: organization-defined frequency]; and IA-1b.2. Identification and authentication procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ IA-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] identification and authentication policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the identification and authentication policy and the associated identification and authentication controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the identification and authentication policy and procedures; and c. Review and update the current identification and authentication: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ IA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] identification and authentication policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the identification and authentication policy and the associated identification and authentication controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the identification and authentication policy and procedures; and c. Review and update the current identification and authentication: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] identification and authentication policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the identification and authentication policy and the associated identification and authentication controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the identification and authentication policy and procedures; and c. Review and update the current identification and authentication: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] identification and authentication policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the identification and authentication policy and the associated identification and authentication controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the identification and authentication policy and procedures; and c. Review and update the current identification and authentication: 1. Policy [FedRAMP Assignment: at least every 3 years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IA-10 Adaptive Authentication

Require individuals accessing the system to employ [Assignment: organization-defined supplemental authentication techniques or mechanisms] under specific [Assignment: organization-defined circumstances or situations].
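
As a minimal illustration of adaptive authentication, the sketch below steps up to a supplemental factor when a login comes from an unrecognized device or an unusual location. The circumstance checks, registries, and names (`KNOWN_DEVICES`, `requires_step_up`) are hypothetical stand-ins for the organization-defined circumstances and mechanisms.

```python
from dataclasses import dataclass

@dataclass
class LoginContext:
    user_id: str
    device_id: str
    country: str

# Hypothetical organization-defined data; in practice these would come from
# a device registry and the user's historical access patterns.
KNOWN_DEVICES = {"alice": {"laptop-01"}}
USUAL_COUNTRIES = {"alice": {"US"}}

def requires_step_up(ctx: LoginContext) -> bool:
    """Return True when a supplemental authentication technique should be demanded."""
    new_device = ctx.device_id not in KNOWN_DEVICES.get(ctx.user_id, set())
    unusual_location = ctx.country not in USUAL_COUNTRIES.get(ctx.user_id, set())
    return new_device or unusual_location

if requires_step_up(LoginContext("alice", "tablet-07", "FR")):
    print("Prompt for an additional authenticator (e.g., hardware token)")
```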

๐Ÿ’ผ IA-10 ADAPTIVE IDENTIFICATION AND AUTHENTICATION

The organization requires that individuals accessing the information system employ [Assignment: organization-defined supplemental authentication techniques or mechanisms] under specific [Assignment: organization-defined circumstances or situations].

๐Ÿ’ผ IA-11 RE-AUTHENTICATION

The organization requires users and devices to re-authenticate when [Assignment: organization-defined circumstances or situations requiring re-authentication].

๐Ÿ’ผ IA-11 Re-authentication (L)(M)(H)

Require users to re-authenticate when [Assignment: organization-defined circumstances or situations requiring re-authentication]. **IA-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The fixed time period cannot exceed the limits set in SP 800-63. At this time, for AAL2 (moderate baseline), the limits are twelve (12) hours or thirty (30) minutes of inactivity.
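
A minimal sketch of enforcing those AAL2 limits, assuming a session record that tracks when the user last authenticated and when they were last active; the function name and thresholds simply restate the guidance above and are not a prescribed implementation.

```python
import time
from typing import Optional

MAX_SESSION_SECONDS = 12 * 60 * 60  # AAL2 upper bound: twelve hours
MAX_IDLE_SECONDS = 30 * 60          # AAL2 upper bound: thirty minutes of inactivity

def must_reauthenticate(authenticated_at: float, last_activity_at: float,
                        now: Optional[float] = None) -> bool:
    """Return True when either AAL2 limit has been exceeded."""
    now = time.time() if now is None else now
    return (now - authenticated_at > MAX_SESSION_SECONDS
            or now - last_activity_at > MAX_IDLE_SECONDS)

# Example: a session authenticated 13 hours ago must re-authenticate.
t = time.time()
print(must_reauthenticate(t - 13 * 3600, t - 60, t))  # True
```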

๐Ÿ’ผ IA-11 Re-authentication (L)(M)(H)

Require users to re-authenticate when [Assignment: organization-defined circumstances or situations requiring re-authentication]. **IA-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The fixed time period cannot exceed the limits set in SP 800-63. At this time, for AAL2 (moderate baseline), the limits are twelve (12) hours or thirty (30) minutes of inactivity.

๐Ÿ’ผ IA-11 Re-authentication (L)(M)(H)

Require users to re-authenticate when [Assignment: organization-defined circumstances or situations requiring re-authentication]. **IA-11 Additional FedRAMP Requirements and Guidance:** **Guidance**: The fixed time period cannot exceed the limits set in SP 800-63. At this time, for AAL2 (moderate baseline), the limits are twelve (12) hours or thirty (30) minutes of inactivity.

๐Ÿ’ผ IA-12 Identity Proofing

a. Identity proof users that require accounts for logical access to systems based on appropriate identity assurance level requirements as specified in applicable standards and guidelines; b. Resolve user identities to a unique individual; and c. Collect, validate, and verify identity evidence.

๐Ÿ’ผ IA-12 Identity Proofing (M)(H)

a. Identity proof users that require accounts for logical access to systems based on appropriate identity assurance level requirements as specified in applicable standards and guidelines; b. Resolve user identities to a unique individual; and c. Collect, validate, and verify identity evidence. **IA-12 Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with NIST SP 800-63A Enrollment and Identity Proofing.

๐Ÿ’ผ IA-12 Identity Proofing (M)(H)

a. Identity proof users that require accounts for logical access to systems based on appropriate identity assurance level requirements as specified in applicable standards and guidelines; b. Resolve user identities to a unique individual; and c. Collect, validate, and verify identity evidence. **IA-12 Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with NIST SP 800-63A Enrollment and Identity Proofing.

๐Ÿ’ผ IA-12(5) Address Confirmation (M)(H)

Require that a [Selection: Assignment: registration code; notice of proofing] be delivered through an out-of-band channel to verify the user's address (physical or digital) of record. **IA-12 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with NIST SP 800-63A Enrollment and Identity Proofing.
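
A minimal sketch of issuing and confirming an out-of-band registration code, assuming delivery (for example, by mail or SMS to the address of record) happens elsewhere; the 24-hour validity window and field names are hypothetical.

```python
import secrets
import time

CODE_TTL_SECONDS = 24 * 60 * 60  # hypothetical validity window

def issue_registration_code() -> dict:
    """Generate a one-time registration code for out-of-band delivery."""
    return {
        "code": secrets.token_urlsafe(8),
        "expires_at": time.time() + CODE_TTL_SECONDS,
        "used": False,
    }

def confirm_address(record: dict, presented_code: str) -> bool:
    """Accept the code only once and only before expiry."""
    valid = (not record["used"]
             and time.time() < record["expires_at"]
             and secrets.compare_digest(record["code"], presented_code))
    if valid:
        record["used"] = True
    return valid
```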

๐Ÿ’ผ IA-12(5) Address Confirmation (M)(H)

Require that a [Selection: Assignment: registration code; notice of proofing] be delivered through an out-of-band channel to verify the user's address (physical or digital) of record. **IA-12 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with NIST SP 800-63A Enrollment and Identity Proofing.

๐Ÿ’ผ IA-2 (10) SINGLE SIGN-ON

The information system provides a single sign-on capability for [Assignment: organization-defined information system accounts and services].

๐Ÿ’ผ IA-2 (11) REMOTE ACCESS - SEPARATE DEVICE

The information system implements multifactor authentication for remote access to privileged and non-privileged accounts such that one of the factors is provided by a device separate from the system gaining access and the device meets [Assignment: organization-defined strength of mechanism requirements].

๐Ÿ’ผ IA-2 Identification and Authentication (Organizational Users) (L)(M)(H)

Uniquely identify and authenticate organizational users and associate that unique identification with processes acting on behalf of those users. **IA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: "Phishing-resistant" authentication refers to authentication processes designed to detect and prevent disclosure of authentication secrets and outputs to a website or application masquerading as a legitimate system. **Requirement**: For all control enhancements that specify multifactor authentication, the implementation must adhere to the Digital Identity Guidelines specified in NIST Special Publication 800-63B. **Requirement**: Multi-factor authentication must be phishing-resistant. **Requirement**: All uses of encrypted virtual private networks must meet all applicable Federal requirements and architecture, dataflow, and security and privacy controls must be documented, assessed, and authorized to operate.

๐Ÿ’ผ IA-2 Identification and Authentication (Organizational Users) (L)(M)(H)

Uniquely identify and authenticate organizational users and associate that unique identification with processes acting on behalf of those users. **IA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: "Phishing-resistant" authentication refers to authentication processes designed to detect and prevent disclosure of authentication secrets and outputs to a website or application masquerading as a legitimate system. **Requirement**: For all control enhancements that specify multifactor authentication, the implementation must adhere to the Digital Identity Guidelines specified in NIST Special Publication 800-63B. **Requirement**: Multi-factor authentication must be phishing-resistant. **Requirement**: All uses of encrypted virtual private networks must meet all applicable Federal requirements and architecture, dataflow, and security and privacy controls must be documented, assessed, and authorized to operate.

๐Ÿ’ผ IA-2 Identification and Authentication (Organizational Users) (L)(M)(H)

Uniquely identify and authenticate organizational users and associate that unique identification with processes acting on behalf of those users. **IA-2 Additional FedRAMP Requirements and Guidance:** **Guidance**: "Phishing-resistant" authentication refers to authentication processes designed to detect and prevent disclosure of authentication secrets and outputs to a website or application masquerading as a legitimate system. **Requirement**: For all control enhancements that specify multifactor authentication, the implementation must adhere to the Digital Identity Guidelines specified in NIST Special Publication 800-63B. **Requirement**: Multi-factor authentication must be phishing-resistant. **Requirement**: All uses of encrypted virtual private networks must meet all applicable Federal requirements and architecture, dataflow, and security and privacy controls must be documented, assessed, and authorized to operate.

๐Ÿ’ผ IA-2(1) Multi-factor Authentication to Privileged Accounts (L)(M)(H)

Implement multi-factor authentication for access to privileged accounts. **IA-2 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Multi-factor authentication to subsequent components in the same user domain is not required. **Requirement**: According to SP 800-63-3, SP 800-63A (IAL), SP 800-63B (AAL), and SP 800-63C (FAL). **Requirement**: Multi-factor authentication must be phishing-resistant.

๐Ÿ’ผ IA-2(1) Multi-factor Authentication to Privileged Accounts (L)(M)(H)

Implement multi-factor authentication for access to privileged accounts. **IA-2 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Multi-factor authentication to subsequent components in the same user domain is not required. **Requirement**: According to SP 800-63-3, SP 800-63A (IAL), SP 800-63B (AAL), and SP 800-63C (FAL). **Requirement**: Multi-factor authentication must be phishing-resistant.

๐Ÿ’ผ IA-2(1) Multi-factor Authentication to Privileged Accounts (L)(M)(H)

Implement multi-factor authentication for access to privileged accounts. **IA-2 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Multi-factor authentication to subsequent components in the same user domain is not required. **Requirement**: According to SP 800-63-3, SP 800-63A (IAL), SP 800-63B (AAL), and SP 800-63C (FAL). **Requirement**: Multi-factor authentication must be phishing-resistant.

๐Ÿ’ผ IA-2(12) Acceptance of PIV Credentials (L)(M)(H)

Accept and electronically verify Personal Identity Verification-compliant credentials. **IA-2 (12) Additional FedRAMP Requirements and Guidance:** **Guidance**: Include Common Access Card (CAC), i.e., the DoD technical implementation of PIV/FIPS 201/HSPD-12.

๐Ÿ’ผ IA-2(12) Acceptance of PIV Credentials (L)(M)(H)

Accept and electronically verify Personal Identity Verification-compliant credentials. **IA-2 (12) Additional FedRAMP Requirements and Guidance:** **Guidance**: Include Common Access Card (CAC), i.e., the DoD technical implementation of PIV/FIPS 201/HSPD-12.

๐Ÿ’ผ IA-2(12) Acceptance of PIV Credentials (L)(M)(H)

Accept and electronically verify Personal Identity Verification-compliant credentials. **IA-2 (12) Additional FedRAMP Requirements and Guidance:** **Guidance**: Include Common Access Card (CAC), i.e., the DoD technical implementation of PIV/FIPS 201/HSPD-12.

๐Ÿ’ผ IA-2(6) Access to Accounts - Separate Device (M)(H)

Implement multi-factor authentication for [FedRAMP Assignment: local, network and remote] access to [FedRAMP Assignment: privileged accounts; non-privileged accounts] such that: (a) One of the factors is provided by a device separate from the system gaining access; and (b) The device meets [FedRAMP Assignment: FIPS-validated or NSA-approved cryptography]. **IA-2 (6) Additional FedRAMP Requirements and Guidance:** **Guidance**: PIV=separate device. Please refer to NIST SP 800-157 Guidelines for Derived Personal Identity Verification (PIV) Credentials. **Guidance**: See SC-13 Guidance for more information on FIPS-validated or NSA-approved cryptography.

๐Ÿ’ผ IA-2(6) Access to Accounts - Separate Device (M)(H)

Implement multi-factor authentication for [FedRAMP Assignment: local, network and remote] access to [FedRAMP Assignment: privileged accounts; non-privileged accounts] such that: (a) One of the factors is provided by a device separate from the system gaining access; and (b) The device meets [FedRAMP Assignment: FIPS-validated or NSA-approved cryptography]. **IA-2 (6) Additional FedRAMP Requirements and Guidance:** **Guidance**: PIV=separate device. Please refer to NIST SP 800-157 Guidelines for Derived Personal Identity Verification (PIV) Credentials. **Guidance**: See SC-13 Guidance for more information on FIPS-validated or NSA-approved cryptography.

๐Ÿ’ผ IA-3 (1) CRYPTOGRAPHIC BIDIRECTIONAL AUTHENTICATION

The information system authenticates [Assignment: organization-defined specific devices and/or types of devices] before establishing [Selection (one or more): local; remote; network] connection using bidirectional authentication that is cryptographically based.
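
A minimal sketch of cryptographically based bidirectional (mutual TLS) authentication using Python's standard `ssl` module; the certificate and key file names are placeholders for credentials issued by a device PKI.

```python
import ssl

# Server side: require and verify a client certificate (bidirectional authentication).
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.verify_mode = ssl.CERT_REQUIRED          # reject clients without a valid certificate
server_ctx.load_cert_chain("server.crt", "server.key")
server_ctx.load_verify_locations("device-ca.pem")   # trust anchor for device certificates

# Client side: authenticate the server and present the device certificate.
client_ctx = ssl.create_default_context(cafile="server-ca.pem")
client_ctx.load_cert_chain("device.crt", "device.key")
```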

๐Ÿ’ผ IA-3 (3) DYNAMIC ADDRESS ALLOCATION

The organization: IA-3 (3)(a) Standardizes dynamic address allocation lease information and the lease duration assigned to devices in accordance with [Assignment: organization-defined lease information and lease duration]; and IA-3 (3)(b) Audits lease information when assigned to a device.

๐Ÿ’ผ IA-3 (4) DEVICE ATTESTATION

The organization ensures that device identification and authentication based on attestation is handled by [Assignment: organization-defined configuration management process].

๐Ÿ’ผ IA-4 (4) IDENTIFY USER STATUS

The organization manages individual identifiers by uniquely identifying each individual as [Assignment: organization-defined characteristic identifying individual status].

๐Ÿ’ผ IA-4 Identifier Management

Manage system identifiers by: a. Receiving authorization from [Assignment: organization-defined personnel or roles] to assign an individual, group, role, service, or device identifier; b. Selecting an identifier that identifies an individual, group, role, service, or device; c. Assigning the identifier to the intended individual, group, role, service, or device; and d. Preventing reuse of identifiers for [Assignment: organization-defined time period].

๐Ÿ’ผ IA-4 IDENTIFIER MANAGEMENT

The organization manages information system identifiers by: IA-4a. Receiving authorization from [Assignment: organization-defined personnel or roles] to assign an individual, group, role, or device identifier; IA-4b. Selecting an identifier that identifies an individual, group, role, or device; IA-4c. Assigning the identifier to the intended individual, group, role, or device; IA-4d. Preventing reuse of identifiers for [Assignment: organization-defined time period]; and IA-4e. Disabling the identifier after [Assignment: organization-defined time period of inactivity].

๐Ÿ’ผ IA-4 Identifier Management (L)(M)(H)

Manage system identifiers by: a. Receiving authorization from [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)] to assign an individual, group, role, service, or device identifier; b. Selecting an identifier that identifies an individual, group, role, service, or device; c. Assigning the identifier to the intended individual, group, role, service, or device; and d. Preventing reuse of identifiers for [FedRAMP Assignment: at least two (2) years].
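
A minimal sketch of part (d), preventing identifier reuse for at least two years, assuming a simple registry of retired identifiers; the data structure and function names are illustrative.

```python
import datetime as dt

REUSE_WINDOW = dt.timedelta(days=2 * 365)  # FedRAMP: at least two (2) years

# Hypothetical registry mapping retired identifiers to their retirement time.
retired_identifiers: dict[str, dt.datetime] = {}

def retire_identifier(identifier: str) -> None:
    retired_identifiers[identifier] = dt.datetime.now(dt.timezone.utc)

def can_assign(identifier: str) -> bool:
    """An identifier may be reassigned only after the reuse window has elapsed."""
    retired_at = retired_identifiers.get(identifier)
    if retired_at is None:
        return True
    return dt.datetime.now(dt.timezone.utc) - retired_at >= REUSE_WINDOW
```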

๐Ÿ’ผ IA-4 Identifier Management (L)(M)(H)

Manage system identifiers by: a. Receiving authorization from [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)] to assign an individual, group, role, service, or device identifier; b. Selecting an identifier that identifies an individual, group, role, service, or device; c. Assigning the identifier to the intended individual, group, role, service, or device; and d. Preventing reuse of identifiers for [FedRAMP Assignment: at least two (2) years].

๐Ÿ’ผ IA-4 Identifier Management (L)(M)(H)

Manage system identifiers by: a. Receiving authorization from [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)] to assign an individual, group, role, service, or device identifier; b. Selecting an identifier that identifies an individual, group, role, service, or device; c. Assigning the identifier to the intended individual, group, role, service, or device; and d. Preventing reuse of identifiers for [FedRAMP Assignment: at least two (2) years].

๐Ÿ’ผ IA-5 (1) PASSWORD-BASED AUTHENTICATION

The information system, for password-based authentication: IA-5 (1)(a) Enforces minimum password complexity of [Assignment: organization-defined requirements for case sensitivity, number of characters, mix of upper-case letters, lower-case letters, numbers, and special characters, including minimum requirements for each type]; IA-5 (1)(b) Enforces at least the following number of changed characters when new passwords are created: [Assignment: organization-defined number]; IA-5 (1)(c) Stores and transmits only cryptographically-protected passwords; IA-5 (1)(d) Enforces password minimum and maximum lifetime restrictions of [Assignment: organization-defined numbers for lifetime minimum, lifetime maximum]; IA-5 (1)(e) Prohibits password reuse for [Assignment: organization-defined number] generations; and IA-5 (1)(f) Allows the use of a temporary password for system logons with an immediate change to a permanent password.

๐Ÿ’ผ IA-5 (14) MANAGING CONTENT OF PKI TRUST STORES

The organization, for PKI-based authentication, employs a deliberate organization-wide methodology for managing the content of PKI trust stores installed across all platforms including networks, operating systems, browsers, and applications.

๐Ÿ’ผ IA-5 (2) PKI-BASED AUTHENTICATION

The information system, for PKI-based authentication: IA-5 (2)(a) Validates certifications by constructing and verifying a certification path to an accepted trust anchor including checking certificate status information; IA-5 (2)(b) Enforces authorized access to the corresponding private key; IA-5 (2)(c) Maps the authenticated identity to the account of the individual or group; and IA-5 (2)(d) Implements a local cache of revocation data to support path discovery and validation in case of inability to access revocation information via the network.

๐Ÿ’ผ IA-5 (3) IN-PERSON OR TRUSTED THIRD-PARTY REGISTRATION

The organization requires that the registration process to receive [Assignment: organization-defined types of and/or specific authenticators] be conducted [Selection: in person; by a trusted third party] before [Assignment: organization-defined registration authority] with authorization by [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ IA-5 Authenticator Management

Manage system authenticators by: a. Verifying, as part of the initial authenticator distribution, the identity of the individual, group, role, service, or device receiving the authenticator; b. Establishing initial authenticator content for any authenticators issued by the organization; c. Ensuring that authenticators have sufficient strength of mechanism for their intended use; d. Establishing and implementing administrative procedures for initial authenticator distribution, for lost or compromised or damaged authenticators, and for revoking authenticators; e. Changing default authenticators prior to first use; f. Changing or refreshing authenticators [Assignment: organization-defined time period by authenticator type] or when [Assignment: organization-defined events] occur; g. Protecting authenticator content from unauthorized disclosure and modification; h. Requiring individuals to take, and having devices implement, specific controls to protect authenticators; and i. Changing authenticators for group or role accounts when membership to those accounts changes.

๐Ÿ’ผ IA-5 AUTHENTICATOR MANAGEMENT

The organization manages information system authenticators by: IA-5a. Verifying, as part of the initial authenticator distribution, the identity of the individual, group, role, or device receiving the authenticator; IA-5b. Establishing initial authenticator content for authenticators defined by the organization; IA-5c. Ensuring that authenticators have sufficient strength of mechanism for their intended use; IA-5d. Establishing and implementing administrative procedures for initial authenticator distribution, for lost/compromised or damaged authenticators, and for revoking authenticators; IA-5e. Changing default content of authenticators prior to information system installation; IA-5f. Establishing minimum and maximum lifetime restrictions and reuse conditions for authenticators; IA-5g. Changing/refreshing authenticators [Assignment: organization-defined time period by authenticator type]; IA-5h. Protecting authenticator content from unauthorized disclosure and modification; IA-5i. Requiring individuals to take, and having devices implement, specific security safeguards to protect authenticators; and IA-5j. Changing authenticators for group/role accounts when membership to those accounts changes.

๐Ÿ’ผ IA-5 Authenticator Management (L)(M)(H)

Manage system authenticators by: a. Verifying, as part of the initial authenticator distribution, the identity of the individual, group, role, service, or device receiving the authenticator; b. Establishing initial authenticator content for any authenticators issued by the organization; c. Ensuring that authenticators have sufficient strength of mechanism for their intended use; d. Establishing and implementing administrative procedures for initial authenticator distribution, for lost or compromised or damaged authenticators, and for revoking authenticators; e. Changing default authenticators prior to first use; f. Changing or refreshing authenticators [Assignment: organization-defined time period by authenticator type] or when [Assignment: organization-defined events] occur; g. Protecting authenticator content from unauthorized disclosure and modification; h. Requiring individuals to take, and having devices implement, specific controls to protect authenticators; and i. Changing authenticators for group or role accounts when membership to those accounts changes. **IA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: SP 800-63C Section 6.2.3 Encrypted Assertion requires that authentication assertions be encrypted when passed through third parties, such as a browser. For example, a SAML assertion can be encrypted using XML-Encryption, or an OpenID Connect ID Token can be encrypted using JSON Web Encryption (JWE). **Requirement**: Authenticators must be compliant with NIST SP 800-63-3 Digital Identity Guidelines IAL, AAL, FAL level 2. Link <https://pages.nist.gov/800-63-3>.

๐Ÿ’ผ IA-5 Authenticator Management (L)(M)(H)

Manage system authenticators by: a. Verifying, as part of the initial authenticator distribution, the identity of the individual, group, role, service, or device receiving the authenticator; b. Establishing initial authenticator content for any authenticators issued by the organization; c. Ensuring that authenticators have sufficient strength of mechanism for their intended use; d. Establishing and implementing administrative procedures for initial authenticator distribution, for lost or compromised or damaged authenticators, and for revoking authenticators; e. Changing default authenticators prior to first use; f. Changing or refreshing authenticators [Assignment: organization-defined time period by authenticator type] or when [Assignment: organization-defined events] occur; g. Protecting authenticator content from unauthorized disclosure and modification; h. Requiring individuals to take, and having devices implement, specific controls to protect authenticators; and i. Changing authenticators for group or role accounts when membership to those accounts changes. **IA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: SP 800-63C Section 6.2.3 Encrypted Assertion requires that authentication assertions be encrypted when passed through third parties, such as a browser. For example, a SAML assertion can be encrypted using XML-Encryption, or an OpenID Connect ID Token can be encrypted using JSON Web Encryption (JWE). **Requirement**: Authenticators must be compliant with NIST SP 800-63-3 Digital Identity Guidelines IAL, AAL, FAL level 2. Link <https://pages.nist.gov/800-63-3>.

๐Ÿ’ผ IA-5 Authenticator Management (L)(M)(H)

Manage system authenticators by: a. Verifying, as part of the initial authenticator distribution, the identity of the individual, group, role, service, or device receiving the authenticator; b. Establishing initial authenticator content for any authenticators issued by the organization; c. Ensuring that authenticators have sufficient strength of mechanism for their intended use; d. Establishing and implementing administrative procedures for initial authenticator distribution, for lost or compromised or damaged authenticators, and for revoking authenticators; e. Changing default authenticators prior to first use; f. Changing or refreshing authenticators [Assignment: organization-defined time period by authenticator type] or when [Assignment: organization-defined events] occur; g. Protecting authenticator content from unauthorized disclosure and modification; h. Requiring individuals to take, and having devices implement, specific controls to protect authenticators; and i. Changing authenticators for group or role accounts when membership to those accounts changes. **IA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: SP 800-63C Section 6.2.3 Encrypted Assertion requires that authentication assertions be encrypted when passed through third parties, such as a browser. For example, a SAML assertion can be encrypted using XML-Encryption, or an OpenID Connect ID Token can be encrypted using JSON Web Encryption (JWE). **Requirement**: Authenticators must be compliant with NIST SP 800-63-3 Digital Identity Guidelines IAL, AAL, FAL level 2. Link <https://pages.nist.gov/800-63-3>.

๐Ÿ’ผ IA-5(1) Authenticator Management | Password-based Authentication

For password-based authentication: (a) Maintain a list of commonly-used, expected, or compromised passwords and update the list [Assignment: organization-defined frequency] and when organizational passwords are suspected to have been compromised directly or indirectly; (b) Verify, when users create or update passwords, that the passwords are not found on the list of commonly-used, expected, or compromised passwords in IA-5(1)(a); (c) Transmit passwords only over cryptographically-protected channels; (d) Store passwords using an approved salted key derivation function, preferably using a keyed hash; (e) Require immediate selection of a new password upon account recovery; (f) Allow user selection of long passwords and passphrases, including spaces and all printable characters; (g) Employ automated tools to assist the user in selecting strong password authenticators; and (h) Enforce the following composition and complexity rules: [Assignment: organization-defined composition and complexity rules].
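
A minimal sketch of parts (b) and (d): screening new passwords against a blocklist and storing only a salted derived key. The blocklist contents, KDF choice (PBKDF2-HMAC-SHA256), and iteration count are illustrative assumptions, not mandated values.

```python
import hashlib
import hmac
import os

# Hypothetical blocklist; in practice, use a regularly updated list of
# commonly used, expected, or compromised passwords (e.g., a breach corpus).
BLOCKED_PASSWORDS = {"password", "12345678", "qwertyuiop"}

def password_acceptable(candidate: str) -> bool:
    """Reject blocklisted values; long passphrases with spaces are allowed."""
    return candidate.lower() not in BLOCKED_PASSWORDS

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Store only a salted derived key (keyed hash via PBKDF2-HMAC), never the plaintext."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(key, expected)
```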

๐Ÿ’ผ IA-5(1) Password-based Authentication (L)(M)(H)

For password-based authentication: (a) Maintain a list of commonly-used, expected, or compromised passwords and update the list [Assignment: organization-defined frequency] and when organizational passwords are suspected to have been compromised directly or indirectly; (b) Verify, when users create or update passwords, that the passwords are not found on the list of commonly-used, expected, or compromised passwords in IA-5(1)(a); (c) Transmit passwords only over cryptographically-protected channels; (d) Store passwords using an approved salted key derivation function, preferably using a keyed hash; (e) Require immediate selection of a new password upon account recovery; (f) Allow user selection of long passwords and passphrases, including spaces and all printable characters; (g) Employ automated tools to assist the user in selecting strong password authenticators; and (h) Enforce the following composition and complexity rules: [Assignment: organization-defined composition and complexity rules]. **IA-5 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that (c) and (d) require the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or NSA approved cryptography (see SC-13). **Requirement**: Password policies must be compliant with NIST SP 800-63B for all memorized, lookup, out-of-band, or One-Time-Passwords (OTP). Password policies shall not enforce special character or minimum password rotation requirements for memorized secrets of users. **(h) Requirement**: For cases where technology doesn't allow multi-factor authentication, these rules should be enforced: must have a minimum length of 14 characters and must support all printable ASCII characters. For emergency use accounts, these rules should be enforced: must have a minimum length of 14 characters, must support all printable ASCII characters, and passwords must be changed if used.

๐Ÿ’ผ IA-5(1) Password-based Authentication (L)(M)(H)

For password-based authentication: (a) Maintain a list of commonly-used, expected, or compromised passwords and update the list [Assignment: organization-defined frequency] and when organizational passwords are suspected to have been compromised directly or indirectly; (b) Verify, when users create or update passwords, that the passwords are not found on the list of commonly-used, expected, or compromised passwords in IA-5(1)(a); (c) Transmit passwords only over cryptographically-protected channels; (d) Store passwords using an approved salted key derivation function, preferably using a keyed hash; (e) Require immediate selection of a new password upon account recovery; (f) Allow user selection of long passwords and passphrases, including spaces and all printable characters; (g) Employ automated tools to assist the user in selecting strong password authenticators; and (h) Enforce the following composition and complexity rules: [Assignment: organization-defined composition and complexity rules]. **IA-5 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that (c) and (d) require the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or NSA approved cryptography (see SC-13). **Requirement**: Password policies must be compliant with NIST SP 800-63B for all memorized, lookup, out-of-band, or One-Time-Passwords (OTP). Password policies shall not enforce special character or minimum password rotation requirements for memorized secrets of users. **(h) Requirement**: For cases where technology doesn't allow multi-factor authentication, these rules should be enforced: must have a minimum length of 14 characters and must support all printable ASCII characters. For emergency use accounts, these rules should be enforced: must have a minimum length of 14 characters, must support all printable ASCII characters, and passwords must be changed if used.

๐Ÿ’ผ IA-5(1) Password-based Authentication (L)(M)(H)

For password-based authentication: (a) Maintain a list of commonly-used, expected, or compromised passwords and update the list [Assignment: organization-defined frequency] and when organizational passwords are suspected to have been compromised directly or indirectly; (b) Verify, when users create or update passwords, that the passwords are not found on the list of commonly-used, expected, or compromised passwords in IA-5(1)(a); (c) Transmit passwords only over cryptographically-protected channels; (d) Store passwords using an approved salted key derivation function, preferably using a keyed hash; (e) Require immediate selection of a new password upon account recovery; (f) Allow user selection of long passwords and passphrases, including spaces and all printable characters; (g) Employ automated tools to assist the user in selecting strong password authenticators; and (h) Enforce the following composition and complexity rules: [Assignment: organization-defined composition and complexity rules]. **IA-5 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Note that (c) and (d) require the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or NSA approved cryptography (see SC-13). **Requirement**: Password policies must be compliant with NIST SP 800-63B for all memorized, lookup, out-of-band, or One-Time-Passwords (OTP). Password policies shall not enforce special character or minimum password rotation requirements for memorized secrets of users. **(h) Requirement**: For cases where technology doesn't allow multi-factor authentication, these rules should be enforced: must have a minimum length of 14 characters and must support all printable ASCII characters. For emergency use accounts, these rules should be enforced: must have a minimum length of 14 characters, must support all printable ASCII characters, and passwords must be changed if used.

๐Ÿ’ผ IA-5(13) Expiration of Cached Authenticators (H)

Prohibit the use of cached authenticators after [Assignment: organization-defined time period]. **IA-5 (13) Additional FedRAMP Requirements and Guidance:** **Guidance**: For components subject to configuration baseline(s) (such as STIG or CIS,) the time period should conform to the baseline standard.
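
A minimal sketch of rejecting cached authenticators once an organization-defined time period has elapsed; the four-hour TTL and record layout are placeholders to be replaced by the value from the applicable baseline (e.g., STIG or CIS).

```python
import time

# Placeholder TTL; substitute the organization-defined or baseline-mandated period.
CACHE_TTL_SECONDS = 4 * 60 * 60

# Example cached credential record: the value plus the time it was cached.
cached = {"token": "example-token", "cached_at": time.time()}

def cached_authenticator_usable(entry: dict) -> bool:
    """Refuse cached authenticators older than the defined time period."""
    return time.time() - entry["cached_at"] <= CACHE_TTL_SECONDS

print(cached_authenticator_usable(cached))  # True while within the TTL
```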

๐Ÿ’ผ IA-5(2) Authenticator Management | Public Key-based Authentication

(a) For public key-based authentication: (1) Enforce authorized access to the corresponding private key; and (2) Map the authenticated identity to the account of the individual or group; and (b) When public key infrastructure (PKI) is used: (1) Validate certificates by constructing and verifying a certification path to an accepted trust anchor, including checking certificate status information; and (2) Implement a local cache of revocation data to support path discovery and validation.
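
A minimal sketch of part (b)(2), checking a certificate against a locally cached CRL, using the third-party `cryptography` package; certification-path construction to a trust anchor is assumed to be handled by the surrounding PKI/TLS stack, and the file names are placeholders.

```python
from cryptography import x509

def load_cert(pem_path: str) -> x509.Certificate:
    with open(pem_path, "rb") as f:
        return x509.load_pem_x509_certificate(f.read())

def load_cached_crl(pem_path: str) -> x509.CertificateRevocationList:
    """A locally cached CRL supports revocation checks when the network
    revocation source (CRL distribution point or OCSP) is unreachable."""
    with open(pem_path, "rb") as f:
        return x509.load_pem_x509_crl(f.read())

def is_revoked(cert: x509.Certificate, crl: x509.CertificateRevocationList) -> bool:
    return crl.get_revoked_certificate_by_serial_number(cert.serial_number) is not None

cert = load_cert("user.pem")            # placeholder file names
crl = load_cached_crl("issuer-crl.pem")
print("revoked" if is_revoked(cert, crl) else "not on cached CRL")
```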

๐Ÿ’ผ IA-5(2) Public Key-based Authentication (M)(H)

(a) For public key-based authentication: 1. Enforce authorized access to the corresponding private key; and 2. Map the authenticated identity to the account of the individual or group; and (b) When public key infrastructure (PKI) is used: 1. Validate certificates by constructing and verifying a certification path to an accepted trust anchor, including checking certificate status information; and 2. Implement a local cache of revocation data to support path discovery and validation.

๐Ÿ’ผ IA-5(2) Public Key-based Authentication (M)(H)

(a) For public key-based authentication: 1. Enforce authorized access to the corresponding private key; and 2. Map the authenticated identity to the account of the individual or group; and (b) When public key infrastructure (PKI) is used: 1. Validate certificates by constructing and verifying a certification path to an accepted trust anchor, including checking certificate status information; and 2. Implement a local cache of revocation data to support path discovery and validation.

๐Ÿ’ผ IA-5(7) No Embedded Unencrypted Static Authenticators (M)(H)

Ensure that unencrypted static authenticators are not embedded in applications or other forms of static storage. **IA-5 (7) Additional FedRAMP Requirements and Guidance:** **Guidance**: In this context, prohibited static storage refers to any storage where unencrypted authenticators, such as passwords, persist beyond the time required to complete the access process.
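
A minimal sketch of avoiding embedded static authenticators by retrieving the secret at runtime from AWS Secrets Manager with boto3; the secret name is a placeholder and an AWS credential chain is assumed to be configured.

```python
import boto3

def get_database_password(secret_id: str = "prod/app/db-password") -> str:
    """Fetch the credential at runtime instead of embedding it in source or config."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]
```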

๐Ÿ’ผ IA-5(7) No Embedded Unencrypted Static Authenticators (M)(H)

Ensure that unencrypted static authenticators are not embedded in applications or other forms of static storage. **IA-5 (7) Additional FedRAMP Requirements and Guidance:** **Guidance**: In this context, prohibited static storage refers to any storage where unencrypted authenticators, such as passwords, persist beyond the time required to complete the access process.

๐Ÿ’ผ IA-5(8) Multiple System Accounts (H)

Implement [FedRAMP Assignment: different authenticators in different user authentication domains] to manage the risk of compromise due to individuals having accounts on multiple systems. **IA-5 (8) Additional FedRAMP Requirements and Guidance:** **Guidance**: If a single user authentication domain is used to access multiple systems, such as in single-sign-on, then only a single authenticator is required.

๐Ÿ’ผ IA-6 Authentication Feedback

Obscure feedback of authentication information during the authentication process to protect the information from possible exploitation and use by unauthorized individuals.
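
A minimal sketch of obscuring authentication feedback: suppress terminal echo while the secret is typed and return only a generic failure message. The `authenticate` function is a hypothetical stand-in for the real credential check.

```python
import getpass

def authenticate(username: str, password: str) -> bool:
    """Placeholder for the real credential check."""
    return False

# getpass suppresses echo so the secret is not displayed while typed.
password = getpass.getpass("Password: ")

# Keep failure feedback generic; do not reveal which element was incorrect
# or how much of the authenticator matched.
if authenticate("alice", password):
    print("Welcome.")
else:
    print("Authentication failed.")
```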

๐Ÿ’ผ IA-6 AUTHENTICATOR FEEDBACK

The information system obscures feedback of authentication information during the authentication process to protect the information from possible exploitation/use by unauthorized individuals.

๐Ÿ’ผ IA-7 Cryptographic Module Authentication

Implement mechanisms for authentication to a cryptographic module that meet the requirements of applicable laws, executive orders, directives, policies, regulations, standards, and guidelines for such authentication.

๐Ÿ’ผ IA-7 CRYPTOGRAPHIC MODULE AUTHENTICATION

The information system implements mechanisms for authentication to a cryptographic module that meet the requirements of applicable federal laws, Executive Orders, directives, policies, regulations, standards, and guidance for such authentication.

๐Ÿ’ผ IA-9 (2) TRANSMISSION OF DECISIONS

The organization ensures that identification and authentication decisions are transmitted between [Assignment: organization-defined services] consistent with organizational policies.

๐Ÿ’ผ ID.AM-03: Representations of the organization's authorized network communication and internal and external network data flows are maintained

1. Maintain baselines of communication and data flows within the organization's wired and wireless networks 2. Maintain baselines of communication and data flows between the organization and third parties 3. Maintain baselines of communication and data flows for the organization's infrastructure-as-a-service (IaaS) usage 4. Maintain documentation of expected network ports, protocols, and services that are typically used among authorized systems
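
As a minimal sketch of implementation example 4, the snippet below records expected ports, protocols, and zone-to-zone flows as a documented baseline and flags observations that fall outside it; the zone names and values are illustrative.

```python
# Documented baseline of expected network flows between zones (illustrative values).
EXPECTED_FLOWS = {
    ("web-tier", "app-tier"): {("tcp", 8443)},
    ("app-tier", "db-tier"): {("tcp", 5432)},
}

def is_expected(src_zone: str, dst_zone: str, protocol: str, port: int) -> bool:
    """Return True when an observed flow matches the documented baseline."""
    return (protocol, port) in EXPECTED_FLOWS.get((src_zone, dst_zone), set())

print(is_expected("web-tier", "db-tier", "tcp", 5432))  # False: not in the baseline
```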

๐Ÿ’ผ ID.AM-04: Inventories of services provided by suppliers are maintained

1. Inventory all external services used by the organization, including third-party infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) offerings; APIs; and other externally hosted application services 2. Update the inventory when a new external service is going to be utilized to ensure adequate cybersecurity risk management monitoring of the organization's use of that service

๐Ÿ’ผ ID.AM-07: Inventories of data and corresponding metadata for designated data types are maintained

1. Maintain a list of the designated data types of interest (e.g., personally identifiable information, protected health information, financial account numbers, organization intellectual property, operational technology data) 2. Continuously discover and analyze ad hoc data to identify new instances of designated data types 3. Assign data classifications to designated data types through tags or labels 4. Track the provenance, data owner, and geolocation of each instance of designated data types

๐Ÿ’ผ ID.AM-08: Systems, hardware, software, services, and data are managed throughout their life cycles

1. Integrate cybersecurity considerations throughout the life cycles of systems, hardware, software, and services 2. Integrate cybersecurity considerations into product life cycles 3. Identify unofficial uses of technology to meet mission objectives (i.e., shadow IT) 4. Periodically identify redundant systems, hardware, software, and services that unnecessarily increase the organization's attack surface 5. Properly configure and secure systems, hardware, software, and services prior to their deployment in production 6. Update inventories when systems, hardware, software, and services are moved or transferred within the organization 7. Securely destroy stored data based on the organization's data retention policy using the prescribed destruction method, and keep and manage a record of the destructions 8. Securely sanitize data storage when hardware is being retired, decommissioned, reassigned, or sent for repairs or replacement 9. Offer methods for destroying paper, storage media, and other physical forms of data storage

๐Ÿ’ผ ID.IM-01: Improvements are identified from evaluations

1. Perform self-assessments of critical services that take current threats and TTPs into consideration 2. Invest in third-party assessments or independent audits of the effectiveness of the organization's cybersecurity program to identify areas that need improvement 3. Constantly evaluate compliance with selected cybersecurity requirements through automated means

๐Ÿ’ผ ID.IM-02: Improvements are identified from security tests and exercises, including those done in coordination with suppliers and relevant third parties

1. Identify improvements for future incident response activities based on findings from incident response assessments (e.g., tabletop exercises and simulations, tests, internal reviews, independent audits) 2. Identify improvements for future business continuity, disaster recovery, and incident response activities based on exercises performed in coordination with critical service providers and product suppliers 3. Involve internal stakeholders (e.g., senior executives, legal department, HR) in security tests and exercises as appropriate 4. Perform penetration testing to identify opportunities to improve the security posture of selected high-risk systems as approved by leadership 5. Exercise contingency plans for responding to and recovering from the discovery that products or services did not originate with the contracted supplier or partner or were altered before receipt 6. Collect and analyze performance metrics using security tools and services to inform improvements to the cybersecurity program

๐Ÿ’ผ ID.IM-04: Incident response plans and other cybersecurity plans that affect operations are established, communicated, maintained, and improved

1. Establish contingency plans (e.g., incident response, business continuity, disaster recovery) for responding to and recovering from adverse events that can interfere with operations, expose confidential information, or otherwise endanger the organization's mission and viability 2. Include contact and communication information, processes for handling common scenarios, and criteria for prioritization, escalation, and elevation in all contingency plans 3. Create a vulnerability management plan to identify and assess all types of vulnerabilities and to prioritize, test, and implement risk responses 4. Communicate cybersecurity plans (including updates) to those responsible for carrying them out and to affected parties 5. Review and update all cybersecurity plans annually or when a need for significant improvements is identified

๐Ÿ’ผ ID.RA-01: Vulnerabilities in assets are identified, validated, and recorded

1. Use vulnerability management technologies to identify unpatched and misconfigured software 2. Assess network and system architectures for design and implementation weaknesses that affect cybersecurity 3. Review, analyze, or test organization-developed software to identify design, coding, and default configuration vulnerabilities 4. Assess facilities that house critical computing assets for physical vulnerabilities and resilience issues 5. Monitor sources of cyber threat intelligence for information on new vulnerabilities in products and services 6. Review processes and procedures for weaknesses that could be exploited to affect cybersecurity

๐Ÿ’ผ ID.RA-02: Cyber threat intelligence is received from information sharing forums and sources

1. Configure cybersecurity tools and technologies with detection or response capabilities to securely ingest cyber threat intelligence feeds 2. Receive and review advisories from reputable third parties on current threat actors and their tactics, techniques, and procedures (TTPs) 3. Monitor sources of cyber threat intelligence for information on the types of vulnerabilities that emerging technologies may have

๐Ÿ’ผ ID.RA-04: Potential impacts and likelihoods of threats exploiting vulnerabilities are identified and recorded

1. Business leaders and cybersecurity risk management practitioners work together to estimate the likelihood and impact of risk scenarios and record them in risk registers 2. Enumerate the potential business impacts of unauthorized access to the organization's communications, systems, and data processed in or by those systems 3. Account for the potential impacts of cascading failures for systems of systems

๐Ÿ’ผ ID.RA-06: Risk responses are chosen, prioritized, planned, tracked, and communicated

1. Apply the vulnerability management plan's criteria for deciding whether to accept, transfer, mitigate, or avoid risk 2. Apply the vulnerability management plan's criteria for selecting compensating controls to mitigate risk 3. Track the progress of risk response implementation (e.g., plan of action and milestones [POA&M], risk register, risk detail report) 4. Use risk assessment findings to inform risk response decisions and actions 5. Communicate planned risk responses to affected stakeholders in priority order

๐Ÿ’ผ ID.RA-07: Changes and exceptions are managed, assessed for risk impact, recorded, and tracked

1. Implement and follow procedures for the formal documentation, review, testing, and approval of proposed changes and requested exceptions 2. Document the possible risks of making or not making each proposed change, and provide guidance on rolling back changes 3. Document the risks related to each requested exception and the plan for responding to those risks 4. Periodically review risks that were accepted based upon planned future actions or milestones

๐Ÿ’ผ ID.RA-08: Processes for receiving, analyzing, and responding to vulnerability disclosures are established

1. Conduct vulnerability information sharing between the organization and its suppliers following the rules and protocols defined in contracts 2. Assign responsibilities and verify the execution of procedures for processing, analyzing the impact of, and responding to cybersecurity threat, vulnerability, or incident disclosures by suppliers, customers, partners, and government cybersecurity organizations

๐Ÿ’ผ Identity management

There are two types of identities you need to manage when operating secure AWS workloads. - **Human identities:** The human identities that require access to your AWS environments and applications can be categorized into three groups: workforce, third parties, and users. The workforce group includes administrators, developers, and operators who are members of your organization. They need access to manage, build, and operate your AWS resources. Third parties are external collaborators, such as contractors, vendors, or partners. They interact with your AWS resources as part of their engagement with your organization. Users are the consumers of your applications. They access your AWS resources through web browsers, client applications, mobile apps, or interactive command-line tools. - **Machine identities:** Your workload applications, operational tools, and components require an identity to make requests to AWS services, such as reading data. These identities also include machines running within your AWS environment, like Amazon EC2 instances or AWS Lambda functions. You may also manage machine identities for external parties, or machines outside of AWS, that require access to your AWS environment.
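
A minimal sketch of a machine identity obtaining short-lived credentials by assuming an IAM role with AWS STS via boto3; the role ARN and session name are placeholders, and an ambient credential source for the caller is assumed.

```python
import boto3

# A machine identity assumes an IAM role instead of using long-term access keys.
sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/workload-reader",  # placeholder ARN
    RoleSessionName="batch-job-42",
)
creds = resp["Credentials"]

# Use the temporary credentials for subsequent AWS API calls.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```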

๐Ÿ’ผ Implement change

Controlled changes are necessary to deploy new functionality and to ensure that the workloads and the operating environment are running known, properly patched software. If these changes are uncontrolled, it is difficult to predict their effects or to address issues that arise because of them.

๐Ÿ’ผ Implement observability

Implement observability in your workload so that you can understand its state and make data-driven decisions based on business requirements.

๐Ÿ’ผ Improvement (ID.IM)

Improvements to organizational cybersecurity risk management processes, procedures and activities are identified across all CSF Functions.

๐Ÿ’ผ Improvements (RS.IM)

Organizational response activities are improved by incorporating lessons learned from current and previous detection/response activities.

๐Ÿ’ผ IR-1 INCIDENT RESPONSE POLICY AND PROCEDURES

The organization: IR-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: IR-1a.1. An incident response policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and IR-1a.2. Procedures to facilitate the implementation of the incident response policy and associated incident response controls; and IR-1b. Reviews and updates the current: IR-1b.1. Incident response policy [Assignment: organization-defined frequency]; and IR-1b.2. Incident response procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ IR-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] incident response policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the incident response policy and the associated incident response controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the incident response policy and procedures; and c. Review and update the current incident response: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ IR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] incident response policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the incident response policy and the associated incident response controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the incident response policy and procedures; and c. Review and update the current incident response: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] incident response policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the incident response policy and the associated incident response controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the incident response policy and procedures; and c. Review and update the current incident response: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] incident response policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the incident response policy and the associated incident response controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the incident response policy and procedures; and c. Review and update the current incident response: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ IR-2 (1) SIMULATED EVENTS

The organization incorporates simulated events into incident response training to facilitate effective response by personnel in crisis situations.

๐Ÿ’ผ IR-2 Incident Response Training

a. Provide incident response training to system users consistent with assigned roles and responsibilities: 1. Within [Assignment: organization-defined time period] of assuming an incident response role or responsibility or acquiring system access; 2. When required by system changes; and 3. [Assignment: organization-defined frequency] thereafter; and b. Review and update incident response training content [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ IR-2 INCIDENT RESPONSE TRAINING

The organization provides incident response training to information system users consistent with assigned roles and responsibilities: IR-2a. Within [Assignment: organization-defined time period] of assuming an incident response role or responsibility; IR-2b. When required by information system changes; and IR-2c. [Assignment: organization-defined frequency] thereafter.

๐Ÿ’ผ IR-2 Incident Response Training (L)(M)(H)

a. Provide incident response training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: ten (10) days for privileged users, thirty (30) days for Incident Response roles] of assuming an incident response role or responsibility or acquiring system access; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update incident response training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events].

๐Ÿ’ผ IR-2 Incident Response Training (L)(M)(H)

a. Provide incident response training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: ten (10) days for privileged users, thirty (30) days for Incident Response roles] of assuming an incident response role or responsibility or acquiring system access; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update incident response training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events].

๐Ÿ’ผ IR-2 Incident Response Training (L)(M)(H)

a. Provide incident response training to system users consistent with assigned roles and responsibilities: 1. Within [FedRAMP Assignment: ten (10) days for privileged users, thirty (30) days for Incident Response roles] of assuming an incident response role or responsibility or acquiring system access; 2. When required by system changes; and 3. [FedRAMP Assignment: at least annually] thereafter; and b. Review and update incident response training content [FedRAMP Assignment: at least annually] and following [Assignment: organization-defined events].

๐Ÿ’ผ IR-3 Incident Response Testing

Test the effectiveness of the incident response capability for the system [Assignment: organization-defined frequency] using the following tests: [Assignment: organization-defined tests].

๐Ÿ’ผ IR-3 INCIDENT RESPONSE TESTING

The organization tests the incident response capability for the information system [Assignment: organization-defined frequency] using [Assignment: organization-defined tests] to determine the incident response effectiveness and documents the results.

๐Ÿ’ผ IR-3 Incident Response Testing (M)(H)

Test the effectiveness of the incident response capability for the system [FedRAMP Assignment: functional, at least annually] using the following tests: [Assignment: organization-defined tests]. **IR-3-2 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider defines tests and/or exercises in accordance with NIST Special Publication 800-61 (as amended). Functional testing must occur prior to testing for initial authorization. Annual functional testing may be concurrent with required penetration tests (see CA-8). The service provider provides test plans to the JAB/AO annually. Test plans are approved and accepted by the JAB/AO prior to test commencing.

๐Ÿ’ผ IR-3 Incident Response Testing (M)(H)

Test the effectiveness of the incident response capability for the system [FedRAMP Assignment: functional, at least annually] using the following tests: [Assignment: organization-defined tests]. **IR-3-2 Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider defines tests and/or exercises in accordance with NIST Special Publication 800-61 (as amended). Functional testing must occur prior to testing for initial authorization. Annual functional testing may be concurrent with required penetration tests (see CA-8). The service provider provides test plans to the JAB/AO annually. Test plans are approved and accepted by the JAB/AO prior to test commencing.

๐Ÿ’ผ IR-3(3) Incident Response Testing | Continuous Improvement

Use qualitative and quantitative data from testing to: (a) Determine the effectiveness of incident response processes; (b) Continuously improve incident response processes; and (c) Provide incident response measures and metrics that are accurate, consistent, and in a reproducible format.

๐Ÿ’ผ IR-4 (3) CONTINUITY OF OPERATIONS

The organization identifies [Assignment: organization-defined classes of incidents] and [Assignment: organization-defined actions to take in response to classes of incidents] to ensure continuation of organizational missions and business functions.

๐Ÿ’ผ IR-4 (8) CORRELATION WITH EXTERNAL ORGANIZATIONS

The organization coordinates with [Assignment: organization-defined external organizations] to correlate and share [Assignment: organization-defined incident information] to achieve a cross-organization perspective on incident awareness and more effective incident responses.

๐Ÿ’ผ IR-4 Incident Handling

a. Implement an incident handling capability for incidents that is consistent with the incident response plan and includes preparation, detection and analysis, containment, eradication, and recovery; b. Coordinate incident handling activities with contingency planning activities; c. Incorporate lessons learned from ongoing incident handling activities into incident response procedures, training, and testing, and implement the resulting changes accordingly; and d. Ensure the rigor, intensity, scope, and results of incident handling activities are comparable and predictable across the organization.

๐Ÿ’ผ IR-4 INCIDENT HANDLING

The organization: IR-4a. Implements an incident handling capability for security incidents that includes preparation, detection and analysis, containment, eradication, and recovery; IR-4b. Coordinates incident handling activities with contingency planning activities; and IR-4c. Incorporates lessons learned from ongoing incident handling activities into incident response procedures, training, and testing, and implements the resulting changes accordingly.

๐Ÿ’ผ IR-4 Incident Handling (L)(M)(H)

a. Implement an incident handling capability for incidents that is consistent with the incident response plan and includes preparation, detection and analysis, containment, eradication, and recovery; b. Coordinate incident handling activities with contingency planning activities; c. Incorporate lessons learned from ongoing incident handling activities into incident response procedures, training, and testing, and implement the resulting changes accordingly; and d. Ensure the rigor, intensity, scope, and results of incident handling activities are comparable and predictable across the organization. **IR-4 Additional FedRAMP Requirements and Guidance:** **Requirement**: The FISMA definition of "incident" shall be used: "An occurrence that actually or imminently jeopardizes, without lawful authority, the confidentiality, integrity, or availability of information or an information system; or constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies." **Requirement**: The service provider ensures that individuals conducting incident handling meet personnel security requirements commensurate with the criticality/sensitivity of the information being processed, stored, and transmitted by the information system.

๐Ÿ’ผ IR-4 Incident Handling (L)(M)(H)

a. Implement an incident handling capability for incidents that is consistent with the incident response plan and includes preparation, detection and analysis, containment, eradication, and recovery; b. Coordinate incident handling activities with contingency planning activities; c. Incorporate lessons learned from ongoing incident handling activities into incident response procedures, training, and testing, and implement the resulting changes accordingly; and d. Ensure the rigor, intensity, scope, and results of incident handling activities are comparable and predictable across the organization. **IR-4 Additional FedRAMP Requirements and Guidance:** **Requirement**: The FISMA definition of "incident" shall be used: "An occurrence that actually or imminently jeopardizes, without lawful authority, the confidentiality, integrity, or availability of information or an information system; or constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies." **Requirement**: The service provider ensures that individuals conducting incident handling meet personnel security requirements commensurate with the criticality/sensitivity of the information being processed, stored, and transmitted by the information system.

๐Ÿ’ผ IR-4 Incident Handling (L)(M)(H)

a. Implement an incident handling capability for incidents that is consistent with the incident response plan and includes preparation, detection and analysis, containment, eradication, and recovery; b. Coordinate incident handling activities with contingency planning activities; c. Incorporate lessons learned from ongoing incident handling activities into incident response procedures, training, and testing, and implement the resulting changes accordingly; and d. Ensure the rigor, intensity, scope, and results of incident handling activities are comparable and predictable across the organization. **IR-4 Additional FedRAMP Requirements and Guidance:** **Requirement**: The FISMA definition of "incident" shall be used: "An occurrence that actually or imminently jeopardizes, without lawful authority, the confidentiality, integrity, or availability of information or an information system; or constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies." **Requirement**: The service provider ensures that individuals conducting incident handling meet personnel security requirements commensurate with the criticality/sensitivity of the information being processed, stored, and transmitted by the information system.

๐Ÿ’ผ IR-4(2) Dynamic Reconfiguration (H)

Include the following types of dynamic reconfiguration for [FedRAMP Assignment: all network, data storage, and computing devices] as part of the incident response capability: [Assignment: organization-defined types of dynamic reconfiguration].
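
Dynamic reconfiguration can include changes such as modifying router rules, access control lists, or filter rules during an incident. As a hedged, illustrative sketch only (not a prescribed FedRAMP implementation), the snippet below revokes an offending ingress rule from an EC2 security group with boto3; the group ID and CIDR range are placeholder assumptions.

```python
import boto3

# Illustrative only: dynamically tighten a security group during incident response.
# The security group ID and the offending CIDR below are hypothetical placeholders.
ec2 = boto3.client("ec2")

ec2.revoke_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical security group
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],  # documentation range
        }
    ],
)
print("Ingress rule revoked as part of dynamic reconfiguration")
```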

๐Ÿ’ผ IR-4(3) Incident Handling | Continuity of Operations

Identify [Assignment: organization-defined classes of incidents] and take the following actions in response to those incidents to ensure continuation of organizational mission and business functions: [Assignment: organization-defined actions to take in response to classes of incidents].

๐Ÿ’ผ IR-6 Incident Reporting

a. Require personnel to report suspected incidents to the organizational incident response capability within [Assignment: organization-defined time period]; and b. Report incident information to [Assignment: organization-defined authorities].
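
Automation can help personnel meet the reporting time period. A minimal sketch, assuming a hypothetical SNS topic (`incident-reports`) that the organizational incident response capability subscribes to; the topic ARN and incident fields are illustrative assumptions, not a prescribed reporting mechanism.

```python
import json
import boto3

# Minimal sketch: forward a suspected incident to the incident response
# capability within the organization-defined reporting window.
# The topic ARN below is a hypothetical placeholder.
sns = boto3.client("sns")

incident = {
    "summary": "Suspected unauthorized access to the reporting service",
    "detected_at": "2024-05-01T12:34:56Z",
    "severity": "suspected",
}

sns.publish(
    TopicArn="arn:aws:sns:us-east-1:111122223333:incident-reports",
    Subject="Suspected incident report",
    Message=json.dumps(incident),
)
```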

๐Ÿ’ผ IR-6 INCIDENT REPORTING

The organization: IR-6a. Requires personnel to report suspected security incidents to the organizational incident response capability within [Assignment: organization-defined time period]; and IR-6b. Reports security incident information to [Assignment: organization-defined authorities].

๐Ÿ’ผ IR-6 Incident Reporting (L)(M)(H)

a. Require personnel to report suspected incidents to the organizational incident response capability within [FedRAMP Assignment: US-CERT incident reporting timelines as specified in NIST Special Publication 800-61 (as amended)]; and b. Report incident information to [Assignment: organization-defined authorities]. **IR-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Reports security incident information according to FedRAMP Incident Communications Procedure.

๐Ÿ’ผ IR-6 Incident Reporting (L)(M)(H)

a. Require personnel to report suspected incidents to the organizational incident response capability within [FedRAMP Assignment: US-CERT incident reporting timelines as specified in NIST Special Publication 800-61 (as amended)]; and b. Report incident information to [Assignment: organization-defined authorities]. **IR-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Reports security incident information according to FedRAMP Incident Communications Procedure.

๐Ÿ’ผ IR-6 Incident Reporting (L)(M)(H)

a. Require personnel to report suspected incidents to the organizational incident response capability within [FedRAMP Assignment: US-CERT incident reporting timelines as specified in NIST Special Publication 800-61 (as amended)]; and b. Report incident information to [Assignment: organization-defined authorities]. **IR-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: Reports security incident information according to FedRAMP Incident Communications Procedure.

๐Ÿ’ผ IR-6(3) Supply Chain Coordination (M)(H)

Provide incident information to the provider of the product or service and other organizations involved in the supply chain or supply chain governance for systems or system components related to the incident.

๐Ÿ’ผ IR-6(3) Supply Chain Coordination (M)(H)

Provide incident information to the provider of the product or service and other organizations involved in the supply chain or supply chain governance for systems or system components related to the incident.

๐Ÿ’ผ IR-7 (2) COORDINATION WITH EXTERNAL PROVIDERS

The organization: IR-7 (2)(a) Establishes a direct, cooperative relationship between its incident response capability and external providers of information system protection capability; and IR-7 (2)(b) Identifies organizational incident response team members to the external providers.

๐Ÿ’ผ IR-7 Incident Response Assistance

Provide an incident response support resource, integral to the organizational incident response capability, that offers advice and assistance to users of the system for the handling and reporting of incidents.

๐Ÿ’ผ IR-7 INCIDENT RESPONSE ASSISTANCE

The organization provides an incident response support resource, integral to the organizational incident response capability, that offers advice and assistance to users of the information system for the handling and reporting of security incidents.

๐Ÿ’ผ IR-8 Incident Response Plan

a. Develop an incident response plan that: 1. Provides the organization with a roadmap for implementing its incident response capability; 2. Describes the structure and organization of the incident response capability; 3. Provides a high-level approach for how the incident response capability fits into the overall organization; 4. Meets the unique requirements of the organization, which relate to mission, size, structure, and functions; 5. Defines reportable incidents; 6. Provides metrics for measuring the incident response capability within the organization; 7. Defines the resources and management support needed to effectively maintain and mature an incident response capability; 8. Addresses the sharing of incident information; 9. Is reviewed and approved by [Assignment: organization-defined personnel or roles] [Assignment: organization-defined frequency]; and 10. Explicitly designates responsibility for incident response to [Assignment: organization-defined entities, personnel, or roles]. b. Distribute copies of the incident response plan to [Assignment: organization-defined incident response personnel (identified by name and/or by role) and organizational elements]; c. Update the incident response plan to address system and organizational changes or problems encountered during plan implementation, execution, or testing; d. Communicate incident response plan changes to [Assignment: organization-defined incident response personnel (identified by name and/or by role) and organizational elements]; and e. Protect the incident response plan from unauthorized disclosure and modification.

๐Ÿ’ผ IR-8 INCIDENT RESPONSE PLAN

The organization: IR-8a. Develops an incident response plan that: IR-8a.1. Provides the organization with a roadmap for implementing its incident response capability; IR-8a.2. Describes the structure and organization of the incident response capability; IR-8a.3. Provides a high-level approach for how the incident response capability fits into the overall organization; IR-8a.4. Meets the unique requirements of the organization, which relate to mission, size, structure, and functions; IR-8a.5. Defines reportable incidents; IR-8a.6. Provides metrics for measuring the incident response capability within the organization; IR-8a.7. Defines the resources and management support needed to effectively maintain and mature an incident response capability; and IR-8a.8. Is reviewed and approved by [Assignment: organization-defined personnel or roles]; IR-8b. Distributes copies of the incident response plan to [Assignment: organization-defined incident response personnel (identified by name and/or by role) and organizational elements]; IR-8c. Reviews the incident response plan [Assignment: organization-defined frequency]; IR-8d. Updates the incident response plan to address system/organizational changes or problems encountered during plan implementation, execution, or testing; IR-8e. Communicates incident response plan changes to [Assignment: organization-defined incident response personnel (identified by name and/or by role) and organizational elements]; and IR-8f. Protects the incident response plan from unauthorized disclosure and modification.

๐Ÿ’ผ IR-8 Incident Response Plan (L)(M)(H)

a. Develop an incident response plan that: 1. Provides the organization with a roadmap for implementing its incident response capability; 2. Describes the structure and organization of the incident response capability; 3. Provides a high-level approach for how the incident response capability fits into the overall organization; 4. Meets the unique requirements of the organization, which relate to mission, size, structure, and functions; 5. Defines reportable incidents; 6. Provides metrics for measuring the incident response capability within the organization; 7. Defines the resources and management support needed to effectively maintain and mature an incident response capability; 8. Addresses the sharing of incident information; 9. Is reviewed and approved by [Assignment: organization-defined personnel or roles] [FedRAMP Assignment: at least annually]; and 10. Explicitly designates responsibility for incident response to [Assignment: organization-defined entities, personnel, or roles]. b. Distribute copies of the incident response plan to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; c. Update the incident response plan to address system and organizational changes or problems encountered during plan implementation, execution, or testing; d. Communicate incident response plan changes to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; and e. Protect the incident response plan from unauthorized disclosure and modification. **IR-8 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel. **(d) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel.

๐Ÿ’ผ IR-8 Incident Response Plan (L)(M)(H)

a. Develop an incident response plan that: 1. Provides the organization with a roadmap for implementing its incident response capability; 2. Describes the structure and organization of the incident response capability; 3. Provides a high-level approach for how the incident response capability fits into the overall organization; 4. Meets the unique requirements of the organization, which relate to mission, size, structure, and functions; 5. Defines reportable incidents; 6. Provides metrics for measuring the incident response capability within the organization; 7. Defines the resources and management support needed to effectively maintain and mature an incident response capability; 8. Addresses the sharing of incident information; 9. Is reviewed and approved by [Assignment: organization-defined personnel or roles] [FedRAMP Assignment: at least annually]; and 10. Explicitly designates responsibility for incident response to [Assignment: organization-defined entities, personnel, or roles]. b. Distribute copies of the incident response plan to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; c. Update the incident response plan to address system and organizational changes or problems encountered during plan implementation, execution, or testing; d. Communicate incident response plan changes to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; and e. Protect the incident response plan from unauthorized disclosure and modification. **IR-8 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel. **(d) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel.

๐Ÿ’ผ IR-8 Incident Response Plan (L)(M)(H)

a. Develop an incident response plan that: 1. Provides the organization with a roadmap for implementing its incident response capability; 2. Describes the structure and organization of the incident response capability; 3. Provides a high-level approach for how the incident response capability fits into the overall organization; 4. Meets the unique requirements of the organization, which relate to mission, size, structure, and functions; 5. Defines reportable incidents; 6. Provides metrics for measuring the incident response capability within the organization; 7. Defines the resources and management support needed to effectively maintain and mature an incident response capability; 8. Addresses the sharing of incident information; 9. Is reviewed and approved by [Assignment: organization-defined personnel or roles] [FedRAMP Assignment: at least annually]; and 10. Explicitly designates responsibility for incident response to [Assignment: organization-defined entities, personnel, or roles]. b. Distribute copies of the incident response plan to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; c. Update the incident response plan to address system and organizational changes or problems encountered during plan implementation, execution, or testing; d. Communicate incident response plan changes to [FedRAMP Assignment: see additional FedRAMP Requirements and Guidance]; and e. Protect the incident response plan from unauthorized disclosure and modification. **IR-8 Additional FedRAMP Requirements and Guidance:** **(b) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel. **(d) Requirement**: The service provider defines a list of incident response personnel (identified by name and/or by role) and organizational elements. The incident response list includes designated FedRAMP personnel.

๐Ÿ’ผ IR-8(1) Incident Response Plan | Breaches

Include the following in the Incident Response Plan for breaches involving personally identifiable information: (a) A process to determine if notice to individuals or other organizations, including oversight organizations, is needed; (b) An assessment process to determine the extent of the harm, embarrassment, inconvenience, or unfairness to affected individuals and any mechanisms to mitigate such harms; and (c) Identification of applicable privacy requirements.

๐Ÿ’ผ IR-9 (3) POST-SPILL OPERATIONS

The organization implements [Assignment: organization-defined procedures] to ensure that organizational personnel impacted by information spills can continue to carry out assigned tasks while contaminated systems are undergoing corrective actions.

๐Ÿ’ผ IR-9 Information Spillage Response

Respond to information spills by: a. Assigning [Assignment: organization-defined personnel or roles] with responsibility for responding to information spills; b. Identifying the specific information involved in the system contamination; c. Alerting [Assignment: organization-defined personnel or roles] of the information spill using a method of communication not associated with the spill; d. Isolating the contaminated system or system component; e. Eradicating the information from the contaminated system or component; f. Identifying other systems or system components that may have been subsequently contaminated; and g. Performing the following additional actions: [Assignment: organization-defined actions].
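
Part d (isolating the contaminated system or system component) is one step that lends itself to automation. A hedged sketch, assuming a hypothetical pre-created quarantine security group with no ingress or egress rules; the instance and group IDs are placeholders.

```python
import boto3

# Sketch only: isolate a contaminated EC2 instance by replacing its security
# groups with a deny-all quarantine group. The instance and group IDs are
# hypothetical placeholders.
ec2 = boto3.client("ec2")

ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    Groups=["sg-0quarantine0example0"],  # quarantine group with no rules
)
print("Instance network access restricted pending eradication")
```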

๐Ÿ’ผ IR-9 INFORMATION SPILLAGE RESPONSE

The organization responds to information spills by: IR-9a. Identifying the specific information involved in the information system contamination; IR-9b. Alerting [Assignment: organization-defined personnel or roles] of the information spill using a method of communication not associated with the spill; IR-9c. Isolating the contaminated information system or system component; IR-9d. Eradicating the information from the contaminated information system or component; IR-9e. Identifying other information systems or system components that may have been subsequently contaminated; and IR-9f. Performing other [Assignment: organization-defined actions].

๐Ÿ’ผ IR-9 Information Spillage Response (M)(H)

Respond to information spills by: a. Assigning [Assignment: organization-defined personnel or roles] with responsibility for responding to information spills; b. Identifying the specific information involved in the system contamination; c. Alerting [Assignment: organization-defined personnel or roles] of the information spill using a method of communication not associated with the spill; d. Isolating the contaminated system or system component; e. Eradicating the information from the contaminated system or component; f. Identifying other systems or system components that may have been subsequently contaminated; and g. Performing the following additional actions: [Assignment: organization-defined actions].

๐Ÿ’ผ IR-9 Information Spillage Response (M)(H)

Respond to information spills by: a. Assigning [Assignment: organization-defined personnel or roles] with responsibility for responding to information spills; b. Identifying the specific information involved in the system contamination; c. Alerting [Assignment: organization-defined personnel or roles] of the information spill using a method of communication not associated with the spill; d. Isolating the contaminated system or system component; e. Eradicating the information from the contaminated system or component; f. Identifying other systems or system components that may have been subsequently contaminated; and g. Performing the following additional actions: [Assignment: organization-defined actions].

๐Ÿ’ผ IR-9(3) Post-spill Operations (M)(H)

Implement the following procedures to ensure that organizational personnel impacted by information spills can continue to carry out assigned tasks while contaminated systems are undergoing corrective actions: [Assignment: organization-defined procedures].

๐Ÿ’ผ IR-9(3) Post-spill Operations (M)(H)

Implement the following procedures to ensure that organizational personnel impacted by information spills can continue to carry out assigned tasks while contaminated systems are undergoing corrective actions: [Assignment: organization-defined procedures].

๐Ÿ’ผ Learn, share, and improve

It's essential that you regularly provide time for analysis of operations activities, analysis of failures, experimentation, and making improvements. When things fail, you will want to ensure that your team, as well as your larger engineering community, learns from those failures. You should analyze failures to identify lessons learned and plan improvements. You will want to regularly review your lessons learned with other teams to validate your insights.

๐Ÿ’ผ Logging & Monitoring

Policies for identifying resources that do not implement logging and monitoring, which are needed to verify that resources are functioning properly and to detect issues or malicious behavior.
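
As one hedged example of checking such a policy programmatically, the sketch below lists S3 buckets that do not have server access logging enabled; it assumes only read permissions on S3 and is not tied to any particular policy engine.

```python
import boto3
from botocore.exceptions import ClientError

# Sketch: flag S3 buckets without server access logging enabled.
s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        logging_conf = s3.get_bucket_logging(Bucket=name)
    except ClientError as err:
        print(f"{name}: could not read logging configuration ({err})")
        continue
    if "LoggingEnabled" not in logging_conf:
        print(f"{name}: server access logging is NOT enabled")
```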

๐Ÿ’ผ MA-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] maintenance policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the maintenance policy and the associated maintenance controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the maintenance policy and procedures; and c. Review and update the current maintenance: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ MA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] maintenance policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the maintenance policy and the associated maintenance controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the maintenance policy and procedures; and c. Review and update the current maintenance: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] maintenance policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the maintenance policy and the associated maintenance controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the maintenance policy and procedures; and c. Review and update the current maintenance: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] maintenance policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the maintenance policy and the associated maintenance controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the maintenance policy and procedures; and c. Review and update the current maintenance: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MA-1 SYSTEM MAINTENANCE POLICY AND PROCEDURES

The organization: MA-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: MA-1a.1. A system maintenance policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and MA-1a.2. Procedures to facilitate the implementation of the system maintenance policy and associated system maintenance controls; and MA-1b. Reviews and updates the current: MA-1b.1. System maintenance policy [Assignment: organization-defined frequency]; and MA-1b.2. System maintenance procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ MA-2 (2) AUTOMATED MAINTENANCE ACTIVITIES

The organization: MA-2 (2)(a) Employs automated mechanisms to schedule, conduct, and document maintenance and repairs; and MA-2 (2)(b) Produces up-to-date, accurate, and complete records of all maintenance and repair actions requested, scheduled, in process, and completed.

๐Ÿ’ผ MA-2 Controlled Maintenance

a. Schedule, document, and review records of maintenance, repair, and replacement on system components in accordance with manufacturer or vendor specifications and/or organizational requirements; b. Approve and monitor all maintenance activities, whether performed on site or remotely and whether the system or system components are serviced on site or removed to another location; c. Require that [Assignment: organization-defined personnel or roles] explicitly approve the removal of the system or system components from organizational facilities for off-site maintenance, repair, or replacement; d. Sanitize equipment to remove the following information from associated media prior to removal from organizational facilities for off-site maintenance, repair, or replacement: [Assignment: organization-defined information]; e. Check all potentially impacted controls to verify that the controls are still functioning properly following maintenance, repair, or replacement actions; and f. Include the following information in organizational maintenance records: [Assignment: organization-defined information].

๐Ÿ’ผ MA-2 CONTROLLED MAINTENANCE

The organization: MA-2a. Schedules, performs, documents, and reviews records of maintenance and repairs on information system components in accordance with manufacturer or vendor specifications and/or organizational requirements; MA-2b. Approves and monitors all maintenance activities, whether performed on site or remotely and whether the equipment is serviced on site or removed to another location; MA-2c. Requires that [Assignment: organization-defined personnel or roles] explicitly approve the removal of the information system or system components from organizational facilities for off-site maintenance or repairs; MA-2d. Sanitizes equipment to remove all information from associated media prior to removal from organizational facilities for off-site maintenance or repairs; MA-2e. Checks all potentially impacted security controls to verify that the controls are still functioning properly following maintenance or repair actions; and MA-2f. Includes [Assignment: organization-defined maintenance-related information] in organizational maintenance records.

๐Ÿ’ผ MA-2 Controlled Maintenance (L)(M)(H)

a. Schedule, document, and review records of maintenance, repair, and replacement on system components in accordance with manufacturer or vendor specifications and/or organizational requirements; b. Approve and monitor all maintenance activities, whether performed on site or remotely and whether the system or system components are serviced on site or removed to another location; c. Require that [Assignment: organization-defined personnel or roles] explicitly approve the removal of the system or system components from organizational facilities for off-site maintenance, repair, or replacement; d. Sanitize equipment to remove the following information from associated media prior to removal from organizational facilities for off-site maintenance, repair, or replacement: [Assignment: organization-defined information]; e. Check all potentially impacted controls to verify that the controls are still functioning properly following maintenance, repair, or replacement actions; and f. Include the following information in organizational maintenance records: [Assignment: organization-defined information].

๐Ÿ’ผ MA-2 Controlled Maintenance (L)(M)(H)

a. Schedule, document, and review records of maintenance, repair, and replacement on system components in accordance with manufacturer or vendor specifications and/or organizational requirements; b. Approve and monitor all maintenance activities, whether performed on site or remotely and whether the system or system components are serviced on site or removed to another location; c. Require that [Assignment: organization-defined personnel or roles] explicitly approve the removal of the system or system components from organizational facilities for off-site maintenance, repair, or replacement; d. Sanitize equipment to remove the following information from associated media prior to removal from organizational facilities for off-site maintenance, repair, or replacement: [Assignment: organization-defined information]; e. Check all potentially impacted controls to verify that the controls are still functioning properly following maintenance, repair, or replacement actions; and f. Include the following information in organizational maintenance records: [Assignment: organization-defined information].

๐Ÿ’ผ MA-2 Controlled Maintenance (L)(M)(H)

a. Schedule, document, and review records of maintenance, repair, and replacement on system components in accordance with manufacturer or vendor specifications and/or organizational requirements; b. Approve and monitor all maintenance activities, whether performed on site or remotely and whether the system or system components are serviced on site or removed to another location; c. Require that [Assignment: organization-defined personnel or roles] explicitly approve the removal of the system or system components from organizational facilities for off-site maintenance, repair, or replacement; d. Sanitize equipment to remove the following information from associated media prior to removal from organizational facilities for off-site maintenance, repair, or replacement: [Assignment: organization-defined information]; e. Check all potentially impacted controls to verify that the controls are still functioning properly following maintenance, repair, or replacement actions; and f. Include the following information in organizational maintenance records: [Assignment: organization-defined information].

๐Ÿ’ผ MA-2(2) Automated Maintenance Activities (H)

(a) Schedule, conduct, and document maintenance, repair, and replacement actions for the system using [Assignment: organization-defined automated mechanisms]; and (b) Produce up-to-date, accurate, and complete records of all maintenance, repair, and replacement actions requested, scheduled, in process, and completed.
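
One way part (a) might be automated in an AWS environment is with AWS Systems Manager Maintenance Windows, which schedule maintenance runs and record their execution history. A minimal sketch; the window name and cron schedule are illustrative assumptions.

```python
import boto3

# Sketch: schedule recurring maintenance with an SSM Maintenance Window.
ssm = boto3.client("ssm")

window = ssm.create_maintenance_window(
    Name="weekly-patching",           # illustrative name
    Schedule="cron(0 2 ? * SUN *)",   # Sundays at 02:00 UTC
    Duration=3,                       # window length in hours
    Cutoff=1,                         # stop starting new tasks 1 hour before close
    AllowUnassociatedTargets=False,
)
print("Created maintenance window:", window["WindowId"])
```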

๐Ÿ’ผ MA-2(2) Controlled Maintenance | Automated Maintenance Activities

(a) Schedule, conduct, and document maintenance, repair, and replacement actions for the system using [Assignment: organization-defined automated mechanisms]; and (b) Produce up-to-date, accurate, and complete records of all maintenance, repair, and replacement actions requested, scheduled, in process, and completed.

๐Ÿ’ผ MA-3 (1) INSPECT TOOLS

The organization inspects the maintenance tools carried into a facility by maintenance personnel for improper or unauthorized modifications.

๐Ÿ’ผ MA-3 (2) INSPECT MEDIA

The organization checks media containing diagnostic and test programs for malicious code before the media are used in the information system.

๐Ÿ’ผ MA-3 (3) PREVENT UNAUTHORIZED REMOVAL

The organization prevents the unauthorized removal of maintenance equipment containing organizational information by: MA-3 (3)(a) Verifying that there is no organizational information contained on the equipment; MA-3 (3)(b) Sanitizing or destroying the equipment; MA-3 (3)(c) Retaining the equipment within the facility; or MA-3 (3)(d) Obtaining an exemption from [Assignment: organization-defined personnel or roles] explicitly authorizing removal of the equipment from the facility.

๐Ÿ’ผ MA-3 Maintenance Tools

a. Approve, control, and monitor the use of system maintenance tools; and b. Review previously approved system maintenance tools [Assignment: organization-defined frequency].

๐Ÿ’ผ MA-3 Maintenance Tools (M)(H)

a. Approve, control, and monitor the use of system maintenance tools; and b. Review previously approved system maintenance tools [FedRAMP Assignment: at least annually].

๐Ÿ’ผ MA-3 Maintenance Tools (M)(H)

a. Approve, control, and monitor the use of system maintenance tools; and b. Review previously approved system maintenance tools [FedRAMP Assignment: at least annually].

๐Ÿ’ผ MA-3(3) Maintenance Tools | Prevent Unauthorized Removal

Prevent the removal of maintenance equipment containing organizational information by: (a) Verifying that there is no organizational information contained on the equipment; (b) Sanitizing or destroying the equipment; (c) Retaining the equipment within the facility; or (d) Obtaining an exemption from [Assignment: organization-defined personnel or roles] explicitly authorizing removal of the equipment from the facility.

๐Ÿ’ผ MA-3(3) Prevent Unauthorized Removal (M)(H)

Prevent the removal of maintenance equipment containing organizational information by: (a) Verifying that there is no organizational information contained on the equipment; (b) Sanitizing or destroying the equipment; (c) Retaining the equipment within the facility; or (d) Obtaining an exemption from [FedRAMP Assignment: the information owner] explicitly authorizing removal of the equipment from the facility.

๐Ÿ’ผ MA-3(3) Prevent Unauthorized Removal (M)(H)

Prevent the removal of maintenance equipment containing organizational information by: (a) Verifying that there is no organizational information contained on the equipment; (b) Sanitizing or destroying the equipment; (c) Retaining the equipment within the facility; or (d) Obtaining an exemption from [FedRAMP Assignment: the information owner] explicitly authorizing removal of the equipment from the facility.

๐Ÿ’ผ MA-4 (1) AUDITING AND REVIEW

The organization: MA-4 (1)(a) Audits nonlocal maintenance and diagnostic sessions [Assignment: organization-defined audit events]; and MA-4 (1)(b) Reviews the records of the maintenance and diagnostic sessions.

๐Ÿ’ผ MA-4 (3) COMPARABLE SECURITY | SANITIZATION

The organization: MA-4 (3)(a) Requires that nonlocal maintenance and diagnostic services be performed from an information system that implements a security capability comparable to the capability implemented on the system being serviced; or MA-4 (3)(b) Removes the component to be serviced from the information system prior to nonlocal maintenance or diagnostic services, sanitizes the component (with regard to organizational information) before removal from organizational facilities, and after the service is performed, inspects and sanitizes the component (with regard to potentially malicious software) before reconnecting the component to the information system.

๐Ÿ’ผ MA-4 (4) AUTHENTICATION | SEPARATION OF MAINTENANCE SESSIONS

The organization protects nonlocal maintenance sessions by: MA-4 (4)(a) Employing [Assignment: organization-defined authenticators that are replay resistant]; and MA-4 (4)(b) Separating the maintenance sessions from other network sessions with the information system by either: MA-4 (4)(b)(1) Physically separated communications paths; or MA-4 (4)(b)(2) Logically separated communications paths based upon encryption.

๐Ÿ’ผ MA-4 (5) APPROVALS AND NOTIFICATIONS

The organization: MA-4 (5)(a) Requires the approval of each nonlocal maintenance session by [Assignment: organization-defined personnel or roles]; and MA-4 (5)(b) Notifies [Assignment: organization-defined personnel or roles] of the date and time of planned nonlocal maintenance.

๐Ÿ’ผ MA-4 Nonlocal Maintenance

a. Approve and monitor nonlocal maintenance and diagnostic activities; b. Allow the use of nonlocal maintenance and diagnostic tools only as consistent with organizational policy and documented in the security plan for the system; c. Employ strong authentication in the establishment of nonlocal maintenance and diagnostic sessions; d. Maintain records for nonlocal maintenance and diagnostic activities; and e. Terminate session and network connections when nonlocal maintenance is completed.

๐Ÿ’ผ MA-4 NONLOCAL MAINTENANCE

The organization: MA-4a. Approves and monitors nonlocal maintenance and diagnostic activities; MA-4b. Allows the use of nonlocal maintenance and diagnostic tools only as consistent with organizational policy and documented in the security plan for the information system; MA-4c. Employs strong authenticators in the establishment of nonlocal maintenance and diagnostic sessions; MA-4d. Maintains records for nonlocal maintenance and diagnostic activities; and MA-4e. Terminates session and network connections when nonlocal maintenance is completed.

๐Ÿ’ผ MA-4 Nonlocal Maintenance (L)(M)(H)

a. Approve and monitor nonlocal maintenance and diagnostic activities; b. Allow the use of nonlocal maintenance and diagnostic tools only as consistent with organizational policy and documented in the security plan for the system; c. Employ strong authentication in the establishment of nonlocal maintenance and diagnostic sessions; d. Maintain records for nonlocal maintenance and diagnostic activities; and e. Terminate session and network connections when nonlocal maintenance is completed.

๐Ÿ’ผ MA-4 Nonlocal Maintenance (L)(M)(H)

a. Approve and monitor nonlocal maintenance and diagnostic activities; b. Allow the use of nonlocal maintenance and diagnostic tools only as consistent with organizational policy and documented in the security plan for the system; c. Employ strong authentication in the establishment of nonlocal maintenance and diagnostic sessions; d. Maintain records for nonlocal maintenance and diagnostic activities; and e. Terminate session and network connections when nonlocal maintenance is completed.

๐Ÿ’ผ MA-4 Nonlocal Maintenance (L)(M)(H)

a. Approve and monitor nonlocal maintenance and diagnostic activities; b. Allow the use of nonlocal maintenance and diagnostic tools only as consistent with organizational policy and documented in the security plan for the system; c. Employ strong authentication in the establishment of nonlocal maintenance and diagnostic sessions; d. Maintain records for nonlocal maintenance and diagnostic activities; and e. Terminate session and network connections when nonlocal maintenance is completed.

๐Ÿ’ผ MA-4(3) Comparable Security and Sanitization (H)

(a) Require that nonlocal maintenance and diagnostic services be performed from a system that implements a security capability comparable to the capability implemented on the system being serviced; or (b) Remove the component to be serviced from the system prior to nonlocal maintenance or diagnostic services; sanitize the component (for organizational information); and after the service is performed, inspect and sanitize the component (for potentially malicious software) before reconnecting the component to the system.

๐Ÿ’ผ MA-4(3) Nonlocal Maintenance | Comparable Security and Sanitization

(a) Require that nonlocal maintenance and diagnostic services be performed from a system that implements a security capability comparable to the capability implemented on the system being serviced; or (b) Remove the component to be serviced from the system prior to nonlocal maintenance or diagnostic services; sanitize the component (for organizational information); and after the service is performed, inspect and sanitize the component (for potentially malicious software) before reconnecting the component to the system.

๐Ÿ’ผ MA-4(5) Nonlocal Maintenance | Approvals and Notifications

(a) Require the approval of each nonlocal maintenance session by [Assignment: organization-defined personnel or roles]; and (b) Notify the following personnel or roles of the date and time of planned nonlocal maintenance: [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MA-5 (1) INDIVIDUALS WITHOUT APPROPRIATE ACCESS

The organization: MA-5 (1)(a) Implements procedures for the use of maintenance personnel that lack appropriate security clearances or are not U.S. citizens, that include the following requirements: MA-5 (1)(a)(1) Maintenance personnel who do not have needed access authorizations, clearances, or formal access approvals are escorted and supervised during the performance of maintenance and diagnostic activities on the information system by approved organizational personnel who are fully cleared, have appropriate access authorizations, and are technically qualified; MA-5 (1)(a)(2) Prior to initiating maintenance or diagnostic activities by personnel who do not have needed access authorizations, clearances or formal access approvals, all volatile information storage components within the information system are sanitized and all nonvolatile storage media are removed or physically disconnected from the system and secured; and MA-5 (1)(b) Develops and implements alternate security safeguards in the event an information system component cannot be sanitized, removed, or disconnected from the system.

๐Ÿ’ผ MA-5 (2) SECURITY CLEARANCES FOR CLASSIFIED SYSTEMS

The organization ensures that personnel performing maintenance and diagnostic activities on an information system processing, storing, or transmitting classified information possess security clearances and formal access approvals for at least the highest classification level and for all compartments of information on the system.

๐Ÿ’ผ MA-5 (4) FOREIGN NATIONALS

The organization ensures that: MA-5 (4)(a) Cleared foreign nationals (i.e., foreign nationals with appropriate security clearances), are used to conduct maintenance and diagnostic activities on classified information systems only when the systems are jointly owned and operated by the United States and foreign allied governments, or owned and operated solely by foreign allied governments; and MA-5 (4)(b) Approvals, consents, and detailed operational conditions regarding the use of foreign nationals to conduct maintenance and diagnostic activities on classified information systems are fully documented within Memoranda of Agreements.

๐Ÿ’ผ MA-5 (5) NONSYSTEM-RELATED MAINTENANCE

The organization ensures that non-escorted personnel performing maintenance activities not directly associated with the information system but in the physical proximity of the system, have required access authorizations.

๐Ÿ’ผ MA-5 Maintenance Personnel

a. Establish a process for maintenance personnel authorization and maintain a list of authorized maintenance organizations or personnel; b. Verify that non-escorted personnel performing maintenance on the system possess the required access authorizations; and c. Designate organizational personnel with required access authorizations and technical competence to supervise the maintenance activities of personnel who do not possess the required access authorizations.

๐Ÿ’ผ MA-5 MAINTENANCE PERSONNEL

The organization: MA-5a. Establishes a process for maintenance personnel authorization and maintains a list of authorized maintenance organizations or personnel; MA-5b. Ensures that non-escorted personnel performing maintenance on the information system have required access authorizations; and MA-5c. Designates organizational personnel with required access authorizations and technical competence to supervise the maintenance activities of personnel who do not possess the required access authorizations.

๐Ÿ’ผ MA-5 Maintenance Personnel (L)(M)(H)

a. Establish a process for maintenance personnel authorization and maintain a list of authorized maintenance organizations or personnel; b. Verify that non-escorted personnel performing maintenance on the system possess the required access authorizations; and c. Designate organizational personnel with required access authorizations and technical competence to supervise the maintenance activities of personnel who do not possess the required access authorizations.

๐Ÿ’ผ MA-5 Maintenance Personnel (L)(M)(H)

a. Establish a process for maintenance personnel authorization and maintain a list of authorized maintenance organizations or personnel; b. Verify that non-escorted personnel performing maintenance on the system possess the required access authorizations; and c. Designate organizational personnel with required access authorizations and technical competence to supervise the maintenance activities of personnel who do not possess the required access authorizations.

๐Ÿ’ผ MA-5 Maintenance Personnel (L)(M)(H)

a. Establish a process for maintenance personnel authorization and maintain a list of authorized maintenance organizations or personnel; b. Verify that non-escorted personnel performing maintenance on the system possess the required access authorizations; and c. Designate organizational personnel with required access authorizations and technical competence to supervise the maintenance activities of personnel who do not possess the required access authorizations.

๐Ÿ’ผ MA-5(1) Individuals Without Appropriate Access (M)(H)

(a) Implement procedures for the use of maintenance personnel that lack appropriate security clearances or are not U.S. citizens, that include the following requirements: 1. Maintenance personnel who do not have needed access authorizations, clearances, or formal access approvals are escorted and supervised during the performance of maintenance and diagnostic activities on the system by approved organizational personnel who are fully cleared, have appropriate access authorizations, and are technically qualified; and 2. Prior to initiating maintenance or diagnostic activities by personnel who do not have needed access authorizations, clearances or formal access approvals, all volatile information storage components within the system are sanitized and all nonvolatile storage media are removed or physically disconnected from the system and secured; and (b) Develop and implement [Assignment: organization-defined alternate controls] in the event a system component cannot be sanitized, removed, or disconnected from the system. **MA-5 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: Only MA-5 (1) (a) (1) is required by FedRAMP Moderate Baseline.

๐Ÿ’ผ MA-5(1) Individuals Without Appropriate Access (M)(H)

(a) Implement procedures for the use of maintenance personnel that lack appropriate security clearances or are not U.S. citizens, that include the following requirements: 1. Maintenance personnel who do not have needed access authorizations, clearances, or formal access approvals are escorted and supervised during the performance of maintenance and diagnostic activities on the system by approved organizational personnel who are fully cleared, have appropriate access authorizations, and are technically qualified; and 2. Prior to initiating maintenance or diagnostic activities by personnel who do not have needed access authorizations, clearances or formal access approvals, all volatile information storage components within the system are sanitized and all nonvolatile storage media are removed or physically disconnected from the system and secured; and (b) Develop and implement [Assignment: organization-defined alternate controls] in the event a system component cannot be sanitized, removed, or disconnected from the system. **MA-5 (1) Additional FedRAMP Requirements and Guidance:** **Requirement**: Only MA-5 (1) (a) (1) is required by FedRAMP Moderate Baseline.

๐Ÿ’ผ MA-5(1) Maintenance Personnel | Individuals Without Appropriate Access

(a) Implement procedures for the use of maintenance personnel that lack appropriate security clearances or are not U.S. citizens, that include the following requirements: (1) Maintenance personnel who do not have needed access authorizations, clearances, or formal access approvals are escorted and supervised during the performance of maintenance and diagnostic activities on the system by approved organizational personnel who are fully cleared, have appropriate access authorizations, and are technically qualified; and (2) Prior to initiating maintenance or diagnostic activities by personnel who do not have needed access authorizations, clearances or formal access approvals, all volatile information storage components within the system are sanitized and all nonvolatile storage media are removed or physically disconnected from the system and secured; and (b) Develop and implement [Assignment: organization-defined alternate controls] in the event a system component cannot be sanitized, removed, or disconnected from the system.

๐Ÿ’ผ MA-5(4) Maintenance Personnel | Foreign Nationals

Ensure that: (a) Foreign nationals with appropriate security clearances are used to conduct maintenance and diagnostic activities on classified systems only when the systems are jointly owned and operated by the United States and foreign allied governments, or owned and operated solely by foreign allied governments; and (b) Approvals, consents, and detailed operational conditions regarding the use of foreign nationals to conduct maintenance and diagnostic activities on classified systems are fully documented within Memoranda of Agreements.

๐Ÿ’ผ MA-6 (1) PREVENTIVE MAINTENANCE

The organization performs preventive maintenance on [Assignment: organization-defined information system components] at [Assignment: organization-defined time intervals].

๐Ÿ’ผ MA-6 (2) PREDICTIVE MAINTENANCE

The organization performs predictive maintenance on [Assignment: organization-defined information system components] at [Assignment: organization-defined time intervals].

๐Ÿ’ผ MA-6 Timely Maintenance

Obtain maintenance support and/or spare parts for [Assignment: organization-defined system components] within [Assignment: organization-defined time period] of failure.

๐Ÿ’ผ MA-6 TIMELY MAINTENANCE

The organization obtains maintenance support and/or spare parts for [Assignment: organization-defined information system components] within [Assignment: organization-defined time period] of failure.

๐Ÿ’ผ MA-6 Timely Maintenance (M)(H)

Obtain maintenance support and/or spare parts for [Assignment: organization-defined system components] within [FedRAMP Assignment: a timeframe to support advertised uptime and availability] of failure.

๐Ÿ’ผ MA-6 Timely Maintenance (M)(H)

Obtain maintenance support and/or spare parts for [Assignment: organization-defined system components] within [FedRAMP Assignment: a timeframe to support advertised uptime and availability] of failure.

๐Ÿ’ผ MA-7 Field Maintenance

Restrict or prohibit field maintenance on [Assignment: organization-defined systems or system components] to [Assignment: organization-defined trusted maintenance facilities].

๐Ÿ’ผ Maintenance (PR.MA)

Maintenance and repairs of industrial control and information system components are performed consistent with policies and procedures.

๐Ÿ’ผ Manage demand and supply resources

When you move to the cloud, you pay only for what you need. You can supply resources to match the workload demand at the time they're needed, eliminating the need for costly and wasteful overprovisioning. You can also modify the demand using a throttle, buffer, or queue to smooth the demand and serve it with fewer resources.
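
A queue is one of the buffering mechanisms mentioned above. A minimal sketch, assuming a hypothetical SQS queue named `demand-buffer`: producers enqueue work instead of calling the backend directly, and a consumer drains the queue at a rate the backend can sustain.

```python
import boto3

# Sketch: buffer incoming work in SQS so downstream capacity can be sized
# for average rather than peak demand. The queue name is a hypothetical placeholder.
sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="demand-buffer")["QueueUrl"]

# Producer side: enqueue a unit of work instead of calling the backend directly.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"job_id": "example-123"}')

# Consumer side: poll at a rate the backend can sustain.
messages = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
)
for msg in messages.get("Messages", []):
    # ... process the job here ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```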

๐Ÿ’ผ Manage service quotas and constraints

For cloud-based workload architectures, there are service quotas (which are also referred to as service limits). These quotas exist to prevent accidentally provisioning more resources than you need and to limit request rates on API operations so as to protect services from abuse. There are also resource constraints, for example, the rate that you can push bits down a fiber-optic cable, or the amount of storage on a physical disk.
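
Applied quotas can be inventoried programmatically with the Service Quotas API so that increases are requested before a limit becomes a constraint. A hedged sketch; the service code `ec2` is just one example.

```python
import boto3

# Sketch: list applied quotas for a service so limits can be tracked and
# increases requested before they become constraints.
quotas = boto3.client("service-quotas")

paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="ec2"):
    for quota in page["Quotas"]:
        print(f'{quota["QuotaName"]}: {quota["Value"]}')
```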

๐Ÿ’ผ Mitigate deployment risks

Adopt approaches that provide fast feedback on quality and provide rapid recovery from changes that do not have desired outcomes. Using these practices mitigates the impact of issues introduced through the deployment of changes.

๐Ÿ’ผ Monitor cost and usage

Enable teams to take action on their cost and usage through detailed visibility into the workload. Cost optimization begins with a granular understanding of the breakdown in cost and usage, the ability to model and forecast future spend, usage, and features, and the implementation of sufficient mechanisms to align cost and usage to your organizationโ€™s objectives.
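One way to get that granular breakdown programmatically is the Cost Explorer API. A minimal sketch, assuming boto3, Cost Explorer enabled for the account, and an illustrative date range:

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# Break down one month of spend by service; grouping by a cost-allocation tag
# (for example, an "owner" or "team" tag) is a common alternative once tags
# are activated for billing.
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```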

๐Ÿ’ผ Monitor workload resources

Logs and metrics are powerful tools to gain insight into the health of your workload. You can configure your workload to monitor logs and metrics and send notifications when thresholds are crossed or significant events occur. Monitoring allows your workload to recognize when low-performance thresholds are crossed or failures occur, so it can recover automatically in response.
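A minimal sketch of a threshold-based notification, assuming boto3 and an existing SNS topic; the instance ID, topic ARN, and threshold values are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average CPU stays above 80% for two consecutive 5-minute periods
# and notify an SNS topic that pages the owning team.
cloudwatch.put_metric_alarm(
    AlarmName="web-tier-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-notifications"],
)
```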

๐Ÿ’ผ MP-1 MEDIA PROTECTION POLICY AND PROCEDURES

The organization: MP-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: MP-1a.1. A media protection policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and MP-1a.2. Procedures to facilitate the implementation of the media protection policy and associated media protection controls; and MP-1b. Reviews and updates the current: MP-1b.1. Media protection policy [Assignment: organization-defined frequency]; and MP-1b.2. Media protection procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ MP-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] media protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the media protection policy and the associated media protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the media protection policy and procedures; and c. Review and update the current media protection: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ MP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] media protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the media protection policy and the associated media protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the media protection policy and procedures; and c. Review and update the current media protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] media protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the media protection policy and the associated media protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the media protection policy and procedures; and c. Review and update the current media protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MP-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] media protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the media protection policy and the associated media protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the media protection policy and procedures; and c. Review and update the current media protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ MP-2 Media Access

Restrict access to [Assignment: organization-defined types of digital and/or non-digital media] to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MP-2 MEDIA ACCESS

The organization restricts access to [Assignment: organization-defined types of digital and/or non-digital media] to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MP-2 Media Access (L)(M)(H)

Restrict access to [FedRAMP Assignment: all types of digital and/or non-digital media containing sensitive information] to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MP-2 Media Access (L)(M)(H)

Restrict access to [FedRAMP Assignment: all types of digital and/or non-digital media containing sensitive information] to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MP-2 Media Access (L)(M)(H)

Restrict access to [FedRAMP Assignment: all types of digital and/or non-digital media containing sensitive information] to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ MP-3 Media Marking

a. Mark system media indicating the distribution limitations, handling caveats, and applicable security markings (if any) of the information; and b. Exempt [Assignment: organization-defined types of system media] from marking if the media remain within [Assignment: organization-defined controlled areas].

๐Ÿ’ผ MP-3 MEDIA MARKING

The organization: MP-3a. Marks information system media indicating the distribution limitations, handling caveats, and applicable security markings (if any) of the information; and MP-3b. Exempts [Assignment: organization-defined types of information system media] from marking as long as the media remain within [Assignment: organization-defined controlled areas].

๐Ÿ’ผ MP-3 Media Marking (M)(H)

a. Mark system media indicating the distribution limitations, handling caveats, and applicable security markings (if any) of the information; and b. Exempt [FedRAMP Assignment: no removable media types] from marking if the media remain within [FedRAMP Assignment: organization-defined security safeguards not applicable]. **MP-3 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: Second parameter not-applicable.

๐Ÿ’ผ MP-3 Media Marking (M)(H)

a. Mark system media indicating the distribution limitations, handling caveats, and applicable security markings (if any) of the information; and b. Exempt [FedRAMP Assignment: no removable media types] from marking if the media remain within [FedRAMP Assignment: organization-defined security safeguards not applicable]. **MP-3 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: Second parameter not-applicable.

๐Ÿ’ผ MP-4 Media Storage

a. Physically control and securely store [Assignment: organization-defined types of digital and/or non-digital media] within [Assignment: organization-defined controlled areas]; and b. Protect system media types defined in MP-4a until the media are destroyed or sanitized using approved equipment, techniques, and procedures.

๐Ÿ’ผ MP-4 MEDIA STORAGE

The organization: MP-4a. Physically controls and securely stores [Assignment: organization-defined types of digital and/or non-digital media] within [Assignment: organization-defined controlled areas]; and MP-4b. Protects information system media until the media are destroyed or sanitized using approved equipment, techniques, and procedures.

๐Ÿ’ผ MP-4 Media Storage (M)(H)

a. Physically control and securely store [FedRAMP Assignment: all types of digital and non-digital media with sensitive information] within [FedRAMP Assignment: see additional FedRAMP requirements and guidance]; and b. Protect system media types defined in MP-4a until the media are destroyed or sanitized using approved equipment, techniques, and procedures. **MP-4 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines controlled areas within facilities where the information and information system reside.

๐Ÿ’ผ MP-4 Media Storage (M)(H)

a. Physically control and securely store [FedRAMP Assignment: all types of digital and non-digital media with sensitive information] within [FedRAMP Assignment: see additional FedRAMP requirements and guidance]; and b. Protect system media types defined in MP-4a until the media are destroyed or sanitized using approved equipment, techniques, and procedures. **MP-4 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines controlled areas within facilities where the information and information system reside.

๐Ÿ’ผ MP-5 (3) CUSTODIANS

The organization employs an identified custodian during transport of information system media outside of controlled areas.

๐Ÿ’ผ MP-5 (4) CRYPTOGRAPHIC PROTECTION

The information system implements cryptographic mechanisms to protect the confidentiality and integrity of information stored on digital media during transport outside of controlled areas.

๐Ÿ’ผ MP-5 Media Transport

a. Protect and control [Assignment: organization-defined types of system media] during transport outside of controlled areas using [Assignment: organization-defined controls]; b. Maintain accountability for system media during transport outside of controlled areas; c. Document activities associated with the transport of system media; and d. Restrict the activities associated with the transport of system media to authorized personnel.

๐Ÿ’ผ MP-5 MEDIA TRANSPORT

The organization: MP-5a. Protects and controls [Assignment: organization-defined types of information system media] during transport outside of controlled areas using [Assignment: organization-defined security safeguards]; MP-5b. Maintains accountability for information system media during transport outside of controlled areas; MP-5c. Documents activities associated with the transport of information system media; and MP-5d. Restricts the activities associated with the transport of information system media to authorized personnel.

๐Ÿ’ผ MP-5 Media Transport (M)(H)

a. Protect and control [FedRAMP Assignment: all media with sensitive information] during transport outside of controlled areas using [FedRAMP Assignment: prior to leaving secure/controlled environment: for digital media, encryption in compliance with Federal requirements and utilizes FIPS validated or NSA approved cryptography (see SC-13.); for non-digital media, secured in locked container]; b. Maintain accountability for system media during transport outside of controlled areas; c. Document activities associated with the transport of system media; and d. Restrict the activities associated with the transport of system media to authorized personnel. **MP-5 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines security measures to protect digital and non-digital media in transport. The security measures are approved and accepted by the JAB/AO.

๐Ÿ’ผ MP-5 Media Transport (M)(H)

a. Protect and control [FedRAMP Assignment: all media with sensitive information] during transport outside of controlled areas using [FedRAMP Assignment: prior to leaving secure/controlled environment: for digital media, encryption in compliance with Federal requirements and utilizes FIPS validated or NSA approved cryptography (see SC-13.); for non-digital media, secured in locked container]; b. Maintain accountability for system media during transport outside of controlled areas; c. Document activities associated with the transport of system media; and d. Restrict the activities associated with the transport of system media to authorized personnel. **MP-5 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider defines security measures to protect digital and non-digital media in transport. The security measures are approved and accepted by the JAB/AO.

๐Ÿ’ผ MP-6 (2) EQUIPMENT TESTING

The organization tests sanitization equipment and procedures [Assignment: organization-defined frequency] to verify that the intended sanitization is being achieved.

๐Ÿ’ผ MP-6 (3) NONDESTRUCTIVE TECHNIQUES

The organization applies nondestructive sanitization techniques to portable storage devices prior to connecting such devices to the information system under the following circumstances: [Assignment: organization-defined circumstances requiring sanitization of portable storage devices].

๐Ÿ’ผ MP-6 (8) REMOTE PURGING | WIPING OF INFORMATION

The organization provides the capability to purge/wipe information from [Assignment: organization-defined information systems, system components, or devices] either remotely or under the following conditions: [Assignment: organization-defined conditions].

๐Ÿ’ผ MP-6 Media Sanitization

a. Sanitize [Assignment: organization-defined system media] prior to disposal, release out of organizational control, or release for reuse using [Assignment: organization-defined sanitization techniques and procedures]; and b. Employ sanitization mechanisms with the strength and integrity commensurate with the security category or classification of the information.

๐Ÿ’ผ MP-6 MEDIA SANITIZATION

The organization: MP-6a. Sanitizes [Assignment: organization-defined information system media] prior to disposal, release out of organizational control, or release for reuse using [Assignment: organization-defined sanitization techniques and procedures] in accordance with applicable federal and organizational standards and policies; and MP-6b. Employs sanitization mechanisms with the strength and integrity commensurate with the security category or classification of the information.

๐Ÿ’ผ MP-6 Media Sanitization (L)(M)(H)

a. Sanitize [Assignment: organization-defined system media] prior to disposal, release out of organizational control, or release for reuse using [FedRAMP Assignment: techniques and procedures IAW NIST SP 800-88 Section 4: Reuse and Disposal of Storage Media and Hardware]; and b. Employ sanitization mechanisms with the strength and integrity commensurate with the security category or classification of the information.

๐Ÿ’ผ MP-6 Media Sanitization (L)(M)(H)

a. Sanitize [Assignment: organization-defined system media] prior to disposal, release out of organizational control, or release for reuse using [FedRAMP Assignment: techniques and procedures IAW NIST SP 800-88 Section 4: Reuse and Disposal of Storage Media and Hardware]; and b. Employ sanitization mechanisms with the strength and integrity commensurate with the security category or classification of the information.

๐Ÿ’ผ MP-6 Media Sanitization (L)(M)(H)

a. Sanitize [Assignment: organization-defined system media] prior to disposal, release out of organizational control, or release for reuse using [FedRAMP Assignment: techniques and procedures IAW NIST SP 800-88 Section 4: Reuse and Disposal of Storage Media and Hardware]; and b. Employ sanitization mechanisms with the strength and integrity commensurate with the security category or classification of the information.

๐Ÿ’ผ MP-6(2) Equipment Testing (H)

Test sanitization equipment and procedures [FedRAMP Assignment: at least every six (6) months] to ensure that the intended sanitization is being achieved. **MP-6 (2) Additional FedRAMP Requirements and Guidance:** **Guidance**: Equipment and procedures may be tested or validated for effectiveness.

๐Ÿ’ผ MP-6(3) Nondestructive Techniques (H)

Apply nondestructive sanitization techniques to portable storage devices prior to connecting such devices to the system under the following circumstances: [Assignment: organization-defined circumstances requiring sanitization of portable storage devices]. **MP-6 (3) Additional FedRAMP Requirements and Guidance:** **Requirement**: Must comply with NIST SP 800-88.

๐Ÿ’ผ MP-7 Media Use

a. [Selection: Restrict; Prohibit] the use of [Assignment: organization-defined types of system media] on [Assignment: organization-defined systems or system components] using [Assignment: organization-defined controls]; and b. Prohibit the use of portable storage devices in organizational systems when such devices have no identifiable owner.

๐Ÿ’ผ MP-7 MEDIA USE

The organization [Selection: restricts; prohibits] the use of [Assignment: organization-defined types of information system media] on [Assignment: organization-defined information systems or system components] using [Assignment: organization-defined security safeguards].

๐Ÿ’ผ MP-7 Media Use (L)(M)(H)

a. [Selection: Restrict; Prohibit] the use of [Assignment: organization-defined types of system media] on [Assignment: organization-defined systems or system components] using [Assignment: organization-defined controls]; and b. Prohibit the use of portable storage devices in organizational systems when such devices have no identifiable owner.

๐Ÿ’ผ MP-7 Media Use (L)(M)(H)

a. [Selection: Restrict; Prohibit] the use of [Assignment: organization-defined types of system media] on [Assignment: organization-defined systems or system components] using [Assignment: organization-defined controls]; and b. Prohibit the use of portable storage devices in organizational systems when such devices have no identifiable owner.

๐Ÿ’ผ MP-7 Media Use (L)(M)(H)

a. [Selection: Restrict; Prohibit] the use of [Assignment: organization-defined types of system media] on [Assignment: organization-defined systems or system components] using [Assignment: organization-defined controls]; and b. Prohibit the use of portable storage devices in organizational systems when such devices have no identifiable owner.

๐Ÿ’ผ MP-8 (2) EQUIPMENT TESTING

The organization employs [Assignment: organization-defined tests] of downgrading equipment and procedures to verify correct performance [Assignment: organization-defined frequency].

๐Ÿ’ผ MP-8 (3) CONTROLLED UNCLASSIFIED INFORMATION

The organization downgrades information system media containing [Assignment: organization-defined Controlled Unclassified Information (CUI)] prior to public release in accordance with applicable federal and organizational standards and policies.

๐Ÿ’ผ MP-8 (4) CLASSIFIED INFORMATION

The organization downgrades information system media containing classified information prior to release to individuals without required access authorizations in accordance with NSA standards and policies.

๐Ÿ’ผ MP-8 Media Downgrading

a. Establish [Assignment: organization-defined system media downgrading process] that includes employing downgrading mechanisms with strength and integrity commensurate with the security category or classification of the information; b. Verify that the system media downgrading process is commensurate with the security category and/or classification level of the information to be removed and the access authorizations of the potential recipients of the downgraded information; c. Identify [Assignment: organization-defined system media requiring downgrading]; and d. Downgrade the identified system media using the established process.

๐Ÿ’ผ MP-8 MEDIA DOWNGRADING

The organization: MP-8a. Establishes [Assignment: organization-defined information system media downgrading process] that includes employing downgrading mechanisms with [Assignment: organization-defined strength and integrity]; MP-8b. Ensures that the information system media downgrading process is commensurate with the security category and/or classification level of the information to be removed and the access authorizations of the potential recipients of the downgraded information; MP-8c. Identifies [Assignment: organization-defined information system media requiring downgrading]; and MP-8d. Downgrades the identified information system media using the established process.

๐Ÿ’ผ Networking and content delivery

The optimal networking solution for a workload varies based on latency, throughput requirements, jitter, and bandwidth. Physical constraints, such as user or on-premises resources, determine location options. These constraints can be offset with edge locations or resource placement.

๐Ÿ’ผ Operating model

Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals to set the priorities that will create business success. Well-defined priorities will maximize the benefits of your efforts. Review your priorities regularly so that they can be updated as your organization's needs change.

๐Ÿ’ผ Operating your workloads securely

Operating workloads securely covers the whole lifecycle of a workload from design, to build, to run, and to ongoing improvement. One of the ways to improve your ability to operate securely in the cloud is by taking an organizational approach to governance. Governance is the way that decisions are guided consistently without depending solely on the good judgment of the people involved. Your governance model and process are the way you answer the question "How do I know that the control objectives for a given workload are met and are appropriate for that workload?" Having a consistent approach to making decisions speeds up the deployment of workloads and helps raise the bar for the security capability in your organization.

๐Ÿ’ผ Operational Excellence

Operational excellence (OE) is a commitment to build software correctly while consistently delivering a great customer experience. The operational excellence pillar contains best practices for organizing your team, designing your workload, operating it at scale, and evolving it over time.

๐Ÿ’ผ OPS01-BP01 Evaluate external customer needs

Involve key stakeholders, including business, development, and operations teams, to determine where to focus efforts on external customer needs. This verifies that you have a thorough understanding of the operations support that is required to achieve your desired business outcomes. **Desired outcome:** - You work backwards from customer outcomes. - You understand how your operational practices support business outcomes and objectives. - You engage all relevant parties. - You have mechanisms to capture external customer needs. **Common anti-patterns:** - You have decided not to have customer support outside of core business hours, but you haven't reviewed historical support request data. You do not know whether this will have an impact on your customers. - You are developing a new feature but have not engaged your customers to find out whether it is desired and, if so, in what form, and you have not experimented to validate the need and method of delivery. **Benefits of establishing this best practice:** Customers whose needs are satisfied are much more likely to remain customers. Evaluating and understanding external customer needs will inform how you prioritize your efforts to deliver business value. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance **Understand business needs:** Business success is created by shared goals and understanding across stakeholders, including business, development, and operations teams. **Review business goals, needs, and priorities of external customers:** Engage key stakeholders, including business, development, and operations teams, to discuss goals, needs, and priorities of external customers. This ensures that you have a thorough understanding of the operational support that is required to achieve business and customer outcomes. **Establish a shared understanding:** Establish a shared understanding of the business functions of the workload, the roles of each of the teams in operating the workload, and how these factors support your shared business goals across internal and external customers.

๐Ÿ’ผ OPS01-BP02 Evaluate internal customer needs

Involve key stakeholders, including business, development, and operations teams, when determining where to focus efforts on internal customer needs. This will ensure that you have a thorough understanding of the operations support that is required to achieve business outcomes. **Desired outcome:** - You use your established priorities to focus your improvement efforts where they will have the greatest impact (for example, developing team skills, improving workload performance, reducing costs, automating runbooks, or enhancing monitoring). - You update your priorities as needs change. **Common anti-patterns:** - You have decided to change IP address allocations for your product teams, without consulting them, to make managing your network easier. You do not know the impact this will have on your product teams. - You are implementing a new development tool but have not engaged your internal customers to find out if it is needed or if it is compatible with their existing practices. - You are implementing a new monitoring system but have not contacted your internal customers to find out if they have monitoring or reporting needs that should be considered. **Benefits of establishing this best practice:** Evaluating and understanding internal customer needs informs how you prioritize your efforts to deliver business value. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance - Understand business needs: Business success is created by shared goals and understanding across stakeholders including business, development, and operations teams. - Review business goals, needs, and priorities of internal customers: Engage key stakeholders, including business, development, and operations teams, to discuss goals, needs, and priorities of internal customers. This ensures that you have a thorough understanding of the operational support that is required to achieve business and customer outcomes. - Establish shared understanding: Establish shared understanding of the business functions of the workload, the roles of each of the teams in operating the workload, and how these factors support shared business goals across internal and external customers.

๐Ÿ’ผ OPS01-BP03 Evaluate governance requirements

Governance is the set of policies, rules, or frameworks that a company uses to achieve its business goals. Governance requirements are generated from within your organization. They can affect the types of technologies you choose or influence the way you operate your workload. Incorporate organizational governance requirements into your workload. Conformance is the ability to demonstrate that you have implemented governance requirements. **Desired outcome:** - Governance requirements are incorporated into the architectural design and operation of your workload. - You can provide proof that you have followed governance requirements. - Governance requirements are regularly reviewed and updated. **Common anti-patterns:** - Your organization mandates that the root account has multi-factor authentication. You failed to implement this requirement and the root account is compromised. - During the design of your workload, you choose an instance type that is not approved by the IT department. You are unable to launch your workload and must conduct a redesign. - You are required to have a disaster recovery plan. You did not create one and your workload suffers an extended outage. - Your team wants to use new instances but your governance requirements have not been updated to allow them. **Benefits of establishing this best practice:** - Following governance requirements aligns your workload with larger organization policies. - Governance requirements reflect industry standards and best practices for your organization. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Identify governance requirements by working with stakeholders and governance organizations. Incorporate governance requirements into your workload. Be able to demonstrate proof that you've followed governance requirements. ### Customer example At AnyCompany Retail, the cloud operations team works with stakeholders across the organization to develop governance requirements. For example, they prohibit SSH access into Amazon EC2 instances. If teams need system access, they are required to use AWS Systems Manager Session Manager. The cloud operations team regularly updates governance requirements as new services become available. ### Implementation steps 1. Identify the stakeholders for your workload, including any centralized teams. 2. Work with stakeholders to identify governance requirements. 3. Once you've generated a list, prioritize the improvement items, and begin implementing them into your workload. 1. Use services like AWS Config to create governance-as-code and validate that governance requirements are followed. 2. If you use AWS Organizations, you can leverage Service Control Policies to implement governance requirements. 4. Provide documentation that validates the implementation. **Level of effort for the implementation plan:** Medium. Implementing missing governance requirements may result in rework of your workload.
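As one hedged example of governance-as-code for the root-account MFA requirement mentioned above (assuming boto3 and that AWS Config is already recording in the account; the rule name is arbitrary), an AWS Config managed rule can evaluate the requirement continuously:

```python
import boto3

config = boto3.client("config")

# The AWS managed rule ROOT_ACCOUNT_MFA_ENABLED continuously checks the
# governance requirement that the root account has MFA enabled.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "root-account-mfa-enabled",
        "Source": {"Owner": "AWS", "SourceIdentifier": "ROOT_ACCOUNT_MFA_ENABLED"},
    }
)
```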

๐Ÿ’ผ OPS01-BP04 Evaluate compliance requirements

Regulatory, industry, and internal compliance requirements are an important driver for defining your organization's priorities. Your compliance framework may preclude you from using specific technologies or geographic locations. Apply due diligence if no external compliance frameworks are identified. Generate audits or reports that validate compliance. If you advertise that your product meets specific compliance standards, you must have an internal process for ensuring continuous compliance. Examples of compliance standards include PCI DSS, FedRAMP, and HIPAA. Applicable compliance standards are determined by various factors, such as what types of data the solution stores or transmits and which geographic regions the solution supports. **Desired outcome:** - Regulatory, industry, and internal compliance requirements are incorporated into architectural selection. - You can validate compliance and generate audit reports. **Common anti-patterns:** - Parts of your workload fall under the Payment Card Industry Data Security Standard (PCI-DSS) framework but your workload stores credit card data unencrypted. - Your software developers and architects are unaware of the compliance framework that your organization must adhere to. - The yearly Systems and Organizations Control (SOC2) Type II audit is happening soon and you are unable to verify that controls are in place. **Benefits of establishing this best practice:** - Evaluating and understanding the compliance requirements that apply to your workload will inform how you prioritize your efforts to deliver business value. - You choose the right locations and technologies that are congruent with your compliance framework. - Designing your workload for auditability helps you to prove you are adhering to your compliance framework. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implementing this best practice means that you incorporate compliance requirements into your architecture design process. Your team members are aware of the required compliance framework. You validate compliance in line with the framework. ### Customer example AnyCompany Retail stores credit card information for customers. Developers on the card storage team understand that they need to comply with the PCI-DSS framework. They've taken steps to verify that credit card information is stored and accessed securely in line with the PCI-DSS framework. Every year they work with their security team to validate compliance. ### Implementation steps 1. Work with your security and governance teams to determine which industry, regulatory, or internal compliance frameworks your workload must adhere to. Incorporate the compliance frameworks into your workload. 1. Validate continual compliance of AWS resources with services like AWS Compute Optimizer and AWS Security Hub. 2. Educate your team members on the compliance requirements so they can operate and evolve the workload in line with them. Compliance requirements should be included in architectural and technological choices. 3. Depending on the compliance framework, you may be required to generate an audit or compliance report. Work with your organization to automate this process as much as possible. 1. Use services like AWS Audit Manager to validate compliance and generate audit reports. 2. You can download AWS security and compliance documents with AWS Artifact. **Level of effort for the implementation plan:** Medium. Implementing compliance frameworks can be challenging. 
Generating audit reports or compliance documents adds additional complexity.
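A small sketch of pulling compliance evidence for audit reporting, assuming boto3 and that the named AWS Config rules already exist in the account (the rule names are illustrative):

```python
import boto3

config = boto3.client("config")

# Read the compliance state of the rules that map to your framework so the
# results can feed an audit or compliance report.
resp = config.describe_compliance_by_config_rule(
    ConfigRuleNames=["root-account-mfa-enabled", "encrypted-volumes"],
)
for item in resp["ComplianceByConfigRules"]:
    print(item["ConfigRuleName"], item["Compliance"]["ComplianceType"])
```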

๐Ÿ’ผ OPS01-BP05 Evaluate threat landscape

Evaluate threats to the business (for example, competition, business risk and liabilities, operational risks, and information security threats) and maintain current information in a risk registry. Include the impact of risks when determining where to focus efforts. The Well-Architected Framework emphasizes learning, measuring, and improving. It provides a consistent approach for you to evaluate architectures, and implement designs that will scale over time. AWS provides the AWS Well-Architected Tool to help you review your approach prior to development, the state of your workloads prior to production, and the state of your workloads in production. You can compare them to the latest AWS architectural best practices, monitor the overall status of your workloads, and gain insight to potential risks. AWS customers are eligible for a guided Well-Architected Review of their mission-critical workloads to measure their architectures against AWS best practices. Enterprise Support customers are eligible for an Operations Review, designed to help them to identify gaps in their approach to operating in the cloud. The cross-team engagement of these reviews helps to establish common understanding of your workloads and how team roles contribute to success. The needs identified through the review can help shape your priorities. AWS Trusted Advisor is a tool that provides access to a core set of checks that recommend optimizations that may help shape your priorities. Business and Enterprise Support customers receive access to additional checks focusing on security, reliability, performance, and cost-optimization that can further help shape their priorities. **Desired outcome:** - You regularly review and act on Well-Architected and Trusted Advisor outputs - You are aware of the latest patch status of your services - You understand the risk and impact of known threats and act accordingly - You implement mitigations as necessary - You communicate actions and context **Common anti-patterns:** - You are using an old version of a software library in your product. You are unaware of security updates to the library for issues that may have unintended impact on your workload. - Your competitor just released a version of their product that addresses many of your customers' complaints about your product. You have not prioritized addressing any of these known issues. - Regulators have been pursuing companies like yours that are not compliant with legal regulatory compliance requirements. You have not prioritized addressing any of your outstanding compliance requirements. **Benefits of establishing this best practice:** You identify and understand the threats to your organization and workload, which helps your determination of which threats to address, their priority, and the resources necessary to do so. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance - **Evaluate threat landscape:** Evaluate threats to the business (for example, competition, business risk and liabilities, operational risks, and information security threats), so that you can include their impact when determining where to focus efforts. - AWS Latest Security Bulletins - AWS Trusted Advisor - **Maintain a threat model:** Establish and maintain a threat model identifying potential threats, planned and in place mitigations, and their priority. 
Review the probability of threats manifesting as incidents, the cost to recover from those incidents and the expected harm caused, and the cost to prevent those incidents. Revise priorities as the contents of the threat model change.
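A minimal sketch of folding Trusted Advisor output into a threat or risk review, assuming boto3, a Business or Enterprise Support plan (required for the Support API), and the us-east-1 endpoint:

```python
import boto3

# The AWS Support API is served from the us-east-1 endpoint and requires a
# Business or Enterprise Support plan.
support = boto3.client("support", region_name="us-east-1")

# Flag security checks that are not "ok" so they can be reviewed alongside
# the risk registry.
checks = support.describe_trusted_advisor_checks(language="en")["checks"]
for check in checks:
    if check["category"] != "security":
        continue
    result = support.describe_trusted_advisor_check_result(
        checkId=check["id"], language="en"
    )["result"]
    if result["status"] != "ok":
        print(check["name"], result["status"])
```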

๐Ÿ’ผ OPS01-BP06 Evaluate tradeoffs while managing benefits and risks

Competing interests from multiple parties can make it challenging to prioritize efforts, build capabilities, and deliver outcomes aligned with business strategies. For example, you may be asked to accelerate speed-to-market for new features over optimizing IT infrastructure costs. This can put two interested parties in conflict with one another. In these situations, decisions need to be brought to a higher authority to resolve conflict. Data is required to remove emotional attachment from the decision-making process. The same challenge may occur at a tactical level. For example, the choice between using relational or non-relational database technologies can have a significant impact on the operation of an application. It's critical to understand the predictable results of various choices. AWS can help you educate your teams about AWS and its services to increase their understanding of how their choices can have an impact on your workload. Use the resources provided by Support (AWS Knowledge Center, AWS Discussion Forums, and Support Center) and AWS Documentation to educate your teams. For further questions, reach out to Support. AWS also shares operational best practices and patterns in The Amazon Builders' Library. A wide variety of other useful information is available through the AWS Blog and The Official AWS Podcast. **Desired outcome:** You have a clearly defined decision-making governance framework to facilitate important decisions at every level within your cloud delivery organization. This framework includes features like a risk register, defined roles that are authorized to make decisions, and a defined model for each level of decision that can be made. This framework defines in advance how conflicts are resolved, what data needs to be presented, and how options are prioritized, so that once decisions are made you can commit without delay. The decision-making framework includes a standardized approach to reviewing and weighing the benefits and risks of every decision to understand the tradeoffs. This may include external factors, such as adherence to regulatory compliance requirements. **Common anti-patterns:** - Your investors request that you demonstrate compliance with Payment Card Industry Data Security Standards (PCI DSS). You do not consider the tradeoffs between satisfying their request and continuing with your current development efforts. Instead, you proceed with your development efforts without demonstrating compliance. Your investors stop their support of your company over concerns about the security of your platform and their investments. - You have decided to include a library that one of your developers found on the internet. You have not evaluated the risks of adopting this library from an unknown source and do not know if it contains vulnerabilities or malicious code. - The original business justification for your migration was based upon the modernization of 60% of your application workloads. However, due to technical difficulties, a decision was made to modernize only 20%, leading to a reduction in planned benefits long-term, increased operator toil for infrastructure teams to manually support legacy systems, and greater reliance on developing new skillsets in your infrastructure teams that were not planning for this change. 
**Benefits of establishing this best practice:** Fully aligning and supporting board-level business priorities, understanding the risks to achieving success, making informed decisions, and acting appropriately when risks impede chances for success. Understanding the implications and consequences of your decisions helps you to prioritize your options and bring leaders into agreement faster, leading to improved business outcomes. Identifying the available benefits of your choices and being aware of the risks to your organization helps you make data-driven decisions, rather than relying on anecdotes. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Managing benefits and risks should be defined by a governing body that drives the requirements for key decision-making. You want decisions to be made and prioritized based on how they benefit the organization, with an understanding of the risks involved. Accurate information is critical for making the organizational decisions. This should be based on solid measurements and defined by common industry practices of cost benefit analysis. To make these types of decisions, strike a balance between centralized and decentralized authority. There is always a tradeoff, and it's important to understand how each choice impacts defined strategies and desired business outcomes. ### Implementation steps 1. Formalize benefits measurement practices within a holistic cloud governance framework. 1. Balance central control of decision-making with decentralized authority for some decisions. 2. Understand that burdensome decision-making processes imposed on every decision can slow you down. 3. Incorporate external factors into your decision making process (like compliance requirements). 2. Establish an agreed-upon decision-making framework for various levels of decisions, which includes who is required to unblock decisions that are subject to conflicted interests. 1. Centralize one-way door decisions that could be irreversible. 2. Allow two-way door decisions to be made by lower level organizational leaders. 3. Understand and manage benefits and risks. Balance the benefits of decisions against the risks involved. 1. Identify benefits: Identify benefits based on business goals, needs, and priorities. Examples include business case impact, time-to-market, security, reliability, performance, and cost. 2. Identify risks: Identify risks based on business goals, needs, and priorities. Examples include time-to-market, security, reliability, performance, and cost. 3. Assess benefits against risks and make informed decisions: Determine the impact of benefits and risks based on goals, needs, and priorities of your key stakeholders, including business, development, and operations. Evaluate the value of the benefit against the probability of the risk being realized and the cost of its impact. For example, emphasizing speed-to-market over reliability might provide competitive advantage. However, it may result in reduced uptime if there are reliability issues. 4. Programmatically enforce key decisions that automate your adherence to compliance requirements. 5. Leverage known industry frameworks and capabilities, such as Value Stream Analysis and LEAN, to baseline current state performance, business metrics, and define iterations of progress towards improvements to these metrics. **Level of effort for the implementation plan:** Medium-High
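To illustrate weighing a benefit against the probability and cost of a risk being realized, a plain-Python sketch with hypothetical numbers (not a prescribed model):

```python
def expected_net_value(option):
    """Benefit minus the probability-weighted cost of each identified risk."""
    expected_loss = sum(probability * cost for probability, cost in option["risks"])
    return option["benefit"] - expected_loss

# Hypothetical figures purely for illustration: each option has an estimated
# benefit and a list of (probability, impact cost) pairs.
options = {
    "ship-feature-now": {"benefit": 500_000, "risks": [(0.30, 400_000)]},
    "harden-reliability-first": {"benefit": 350_000, "risks": [(0.05, 400_000)]},
}

for name, option in options.items():
    print(name, expected_net_value(option))  # 380000.0 and 330000.0
```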

๐Ÿ’ผ OPS02-BP01 Resources have identified owners

Resources for your workload must have identified owners for change control, troubleshooting, and other functions. Owners are assigned for workloads, accounts, infrastructure, platforms, and applications. Ownership is recorded using tools like a central register or metadata attached to resources. The business value of components informs the processes and procedures applied to them. **Desired outcome** - Resources have identified owners using metadata or a central register. - Team members can identify who owns resources. - Accounts have a single owner where possible. **Common anti-patterns** - The alternate contacts for your AWS accounts are not populated. - Resources lack tags that identify what teams own them. - You have an ITSM queue without an email mapping. - Two teams have overlapping ownership of a critical piece of infrastructure. **Benefits of establishing this best practice** - Change control for resources is straightforward with assigned ownership. - You can involve the right owners when troubleshooting issues. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Define what ownership means for the resource use cases in your environment. Ownership can mean who oversees changes to the resource, supports the resource during troubleshooting, or who is financially accountable. Specify and record owners for resources, including name, contact information, organization, and team. **Customer example** AnyCompany Retail defines ownership as the team or individual that owns changes and support for resources. They leverage AWS Organizations to manage their AWS accounts. Alternate account contacts are configured using group inboxes. Each ITSM queue maps to an email alias. Tags identify who owns AWS resources. For other platforms and infrastructure, they have a wiki page that identifies ownership and contact information. ### Implementation steps 1. Start by defining ownership for your organization. Ownership can imply who owns the risk for the resource, who owns changes to the resource, or who supports the resource when troubleshooting. Ownership could also imply financial or administrative ownership of the resource. 2. Use AWS Organizations to manage accounts. You can manage the alternate contacts for your accounts centrally. 1. Using company-owned email addresses and phone numbers for contact information helps you to access them even if the individuals whom they belong to are no longer with your organization. For example, create separate email distribution lists for billing, operations, and security and configure these as Billing, Security, and Operations contacts in each active AWS account. Multiple people will receive AWS notifications and be able to respond, even if someone is on vacation, changes roles, or leaves the company. 2. If an account is not managed by AWS Organizations, alternate account contacts help AWS get in contact with the appropriate personnel if needed. Configure the account's alternate contacts to point to a group rather than an individual. 3. Use tags to identify owners for AWS resources. You can specify both owners and their contact information in separate tags. 1. You can use AWS Config rules to enforce that resources have the required ownership tags. 2. For in-depth guidance on how to build a tagging strategy for your organization, see the AWS Tagging Best Practices whitepaper. 4. 
Use Amazon Q Business, a conversational assistant that uses generative AI to enhance workforce productivity, answer questions, and complete tasks based on information in your enterprise systems. 1. Connect Amazon Q Business to your company's data source. Amazon Q Business offers prebuilt connectors to over 40 supported data sources, including Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and Atlassian Confluence. 5. For other resources, platforms, and infrastructure, create documentation that identifies ownership. This should be accessible to all team members. **Level of effort for the implementation plan:** Low. Leverage account contact information and tags to assign ownership of AWS resources. For other resources you can use something as simple as a table in a wiki to record ownership and contact information, or use an ITSM tool to map ownership.
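A short sketch of the two mechanisms described above, assuming boto3; the contact details and resource ARN are placeholders:

```python
import boto3

# Point the Operations alternate contact at a group inbox so notifications
# survive personnel changes.
account = boto3.client("account")
account.put_alternate_contact(
    AlternateContactType="OPERATIONS",
    Name="Cloud Operations",
    Title="Operations Team",
    EmailAddress="cloud-ops@example.com",
    PhoneNumber="+1-555-0100",
)

# Record ownership as tags on the resources themselves.
tagging = boto3.client("resourcegroupstaggingapi")
tagging.tag_resources(
    ResourceARNList=[
        "arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0"
    ],
    Tags={"owner": "payments-team", "contact": "payments-team@example.com"},
)
```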

๐Ÿ’ผ OPS02-BP02 Processes and procedures have identified owners

Understand who has ownership of the definition of individual processes and procedures, why those specific processes and procedures are used, and why that ownership exists. Understanding the reasons that specific processes and procedures are used aids in identification of improvement opportunities. **Desired outcome**: Your organization has a well-defined and maintained set of processes and procedures for operational tasks. The processes and procedures are stored in a central location and available to your team members. Processes and procedures are updated frequently by clearly assigned owners. Where possible, scripts, templates, and automation documents are implemented as code. **Common anti-patterns** - Processes are not documented. Fragmented scripts may exist on isolated operator workstations. - Knowledge of how to use scripts is held by a few individuals or informally as team knowledge. - A legacy process is due for an update, but ownership of the update is unclear, and the original author is no longer part of the organization. - Processes and scripts are not discoverable, so they are not readily available when required (for example, in response to an incident). **Benefits of establishing this best practice** - Processes and procedures boost your efforts to operate your workloads. - New team members become effective more quickly. - Reduced time to mitigate incidents. - Different team members (and teams) can use the same processes and procedures in a consistent manner. - Teams can scale their processes with repeatable processes. - Standardized processes and procedures help mitigate the impact of transferring workload responsibilities between teams. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance - Processes and procedures have identified owners who are responsible for their definition. - Identify the operations activities conducted in support of your workloads. Document these activities in a discoverable location. - Uniquely identify the individual or team responsible for the specification of an activity. They are responsible to verify that it can be successfully performed by an adequately skilled team member with the correct permissions, access, and tools. If there are issues with performing that activity, the team members performing it are responsible for providing the detailed feedback necessary for the activity to be improved. - Capture ownership in the metadata of the activity artifact through services like AWS Systems Manager, through documents, and AWS Lambda. Capture resource ownership using tags or resource groups, specifying ownership and contact information. Use AWS Organizations to create tagging policies and capture ownership and contact information. - Over time, these procedures should be evolved to be runnable as code, reducing the need for human intervention. - For example, consider AWS Lambda functions, CloudFormation templates, or AWS Systems Manager automation docs. - Perform version control in appropriate repositories. - Include suitable resource tagging so owners and documentation can readily be identified. **Customer example** AnyCompany Retail defines ownership as the team or individual that owns processes for an application or groups of applications (that share common architectural practices and technologies). 
Initially, the processes and procedures are documented as step-by-step guides in the document management system, discoverable using tags on the AWS account that hosts the application and on specific groups of resources within the account. They leverage AWS Organizations to manage their AWS accounts. Over time, these processes are converted to code, and resources are defined using infrastructure as code (such as CloudFormation or AWS Cloud Development Kit (AWS CDK) templates). The operational processes become automation documents in AWS Systems Manager or AWS Lambda functions, which can be initiated as scheduled tasks, in response to events such as Amazon CloudWatch alarms or Amazon EventBridge events, or started by requests within an IT service management (ITSM) platform. All processes have tags to identify ownership. Documentation for the automation and process is maintained within the wiki pages generated by the code repository for the process. ### Implementation steps 1. Document the existing processes and procedures. 1. Review and keep them up-to-date. 2. Identify an owner for each process or procedure. 3. Place them under version control. 4. Where possible, share processes and procedures across workloads and environments that share architectural designs. 2. Establish mechanisms for feedback and improvement. 1. Define policies for how frequently processes should be reviewed. 2. Define processes for reviewers and approvers. 3. Implement issues or a ticketing queue for feedback to be provided and tracked. 4. Wherever possible, processes and procedures should have pre-approval and risk classification from a change approval board (CAB). 3. Verify that processes and procedures are accessible and discoverable by those who need to run them. 1. Use tags to indicate where the process and procedures can be accessed for the workload. 2. Use meaningful error and event messaging to indicate the appropriate processes or procedures to address an issue. 3. Use wikis and document management, and make processes and procedures consistently searchable across the organization. 4. Use Amazon Q Business, a conversational assistant that uses generative AI to enhance workforce productivity, answer questions, and complete tasks based on information in your enterprise systems. 1. Connect Amazon Q Business to your company's data source. Amazon Q Business offers prebuilt connectors to over 40 supported data sources, including Amazon S3, Microsoft SharePoint, Salesforce, and Atlassian Confluence. For more information, see Amazon Q connectors. 5. Automate when appropriate. 1. Automations should be developed when services and technologies provide an API. 2. Educate adequately on processes. Develop the user stories and requirements to automate those processes. 3. Measure the successful use of your processes and procedures, and create issues or tickets to support iterative improvement. **Level of effort for the implementation plan:** Medium
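As an illustration of capturing a procedure as code, the sketch below (assuming boto3; the document name, step, and tag values are hypothetical) registers a simple SSM Automation runbook with an ownership tag:

```python
import json
import boto3

ssm = boto3.client("ssm")

# A procedure captured as code: an Automation runbook that reboots a given
# instance as part of a hypothetical web-tier restart procedure.
runbook = {
    "schemaVersion": "0.3",
    "description": "Reboot a web-tier instance as part of the restart runbook.",
    "parameters": {"InstanceId": {"type": "String"}},
    "mainSteps": [
        {
            "name": "RebootInstance",
            "action": "aws:executeAwsApi",
            "inputs": {
                "Service": "ec2",
                "Api": "RebootInstances",
                "InstanceIds": ["{{ InstanceId }}"],
            },
        }
    ],
}

ssm.create_document(
    Name="restart-web-tier",
    DocumentType="Automation",
    DocumentFormat="JSON",
    Content=json.dumps(runbook),
    Tags=[{"Key": "owner", "Value": "web-platform-team"}],
)
```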

๐Ÿ’ผ OPS02-BP03 Operations activities have identified owners responsible for their performance

Understand who has responsibility to perform specific activities on defined workloads and why that responsibility exists. Understanding who has responsibility to perform activities informs who will conduct the activity, validate the result, and provide feedback to the owner of the activity. **Desired outcome**: Your organization clearly defines responsibilities to perform specific activities on defined workloads and respond to events generated by the workload. The organization documents ownership of processes and fulfillment and makes this information discoverable. You review and update responsibilities when organizational changes take place, and teams track and measure the performance of defect and inefficiency identification activities. You implement feedback mechanisms to track defects and improvements and support iterative improvement. **Common anti-patterns** - You do not document responsibilities. - Fragmented scripts exist on isolated operator workstations. Only a few individuals know how to use them or informally refer to them as team knowledge. - A legacy process is due for update, but no one knows who owns the process, and the original author is no longer part of the organization. - Processes and scripts can't be discovered, and they are not readily available when required (for example, in response to an incident). **Benefits of establishing this best practice** - You understand who is responsible to perform an activity, who to notify when action is needed, and who performs the action, validates the result, and provides feedback to the owner of the activity. - Processes and procedures boost your efforts to operate your workloads. - New team members become effective more quickly. - You reduce the time it takes to mitigate incidents. - Different teams use the same processes and procedures to perform tasks in a consistent manner. - Teams can scale their processes with repeatable processes. - Standardized processes and procedures help mitigate the impact of transferring workload responsibilities between teams. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To begin to define responsibilities, start with existing documentation, like responsibility matrices, processes and procedures, roles and responsibilities, and tools and automation. Review and host discussions on the responsibilities for documented processes. Review with teams to identify misalignments between documented responsibilities and processes. Discuss services offered with internal customers of that team to identify expectation gaps between teams. Analyze and address the discrepancies. Identify opportunities for improvement, and look for frequently requested, resource-intensive activities, which are typically strong candidates for improvement. Explore best practices, patterns, and prescriptive guidance to simplify and standardize improvements. Record improvement opportunities, and track the improvements to completion. Over time, these procedures should be evolved to be run as code, reducing the need for human intervention. For example, procedures can be initiated as AWS Lambda functions, AWS CloudFormation templates, or AWS Systems Manager Automation documents. Verify that these procedures are version-controlled in appropriate repositories, and include suitable resource tagging so that teams can readily identify owners and documentation. 
Document the responsibility for carrying out the activities, and then monitor the automations for successful initiation and operation, as well as performance of the desired outcomes. **Customer example** AnyCompany Retail defines ownership as the team or individual that owns processes for an application or groups of applications that share common architectural practices and technologies. Initially, the company documents the processes and procedures as step-by-step guides in the document management system. They make the procedures discoverable using tags on the AWS account that hosts the application and on specific groups of resources within the account, using AWS Organizations to manage their AWS accounts. Over time, AnyCompany Retail converts these processes to code and defines resources using infrastructure as code (through services like CloudFormation or AWS Cloud Development Kit (AWS CDK) templates). The operational processes become Automation documents in AWS Systems Manager or AWS Lambda functions, which can be initiated as scheduled tasks, in response to events such as Amazon CloudWatch alarms or Amazon EventBridge events, or by requests within an IT service management (ITSM) platform (see the sketch after this section). All processes have tags to identify who owns them. Teams manage documentation for the automation and process within the wiki pages generated by the code repository for the process. ### Implementation steps 1. Document the existing processes and procedures. 1. Review and verify that they are up-to-date. 2. Verify that each process or procedure has an owner. 3. Place the procedures under version control. 4. Where possible, share processes and procedures across workloads and environments that share architectural designs. 2. Establish mechanisms for feedback and improvement. 1. Define policies for how frequently processes should be reviewed. 2. Define processes for reviewers and approvers. 3. Implement issues or a ticketing queue to provide and track feedback. 4. Wherever possible, provide pre-approval and risk classification for processes and procedures from a change approval board (CAB). 3. Make processes and procedures accessible and discoverable by users who need to run them. 1. Use tags to indicate where the process and procedures can be accessed for the workload. 2. Use meaningful error and event messaging to indicate the appropriate process or procedure to address the issue. 3. Use wikis or document management to make processes and procedures consistently searchable across the organization. 4. Automate when it is appropriate to do so. 1. Where services and technologies provide an API, develop automations. 2. Verify that processes are well-understood, and develop the user stories and requirements to automate those processes. 3. Measure the successful use of processes and procedures, with issue tracking to support iterative improvement. **Level of effort for the implementation plan:** Medium
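As a hedged illustration of the customer example above, the following Python (boto3) sketch starts an operational procedure that has been converted to a Systems Manager Automation runbook. It uses the AWS-owned `AWS-RestartEC2Instance` runbook purely as an example; the instance ID is a placeholder, and your own procedures would typically be custom runbooks started on a schedule, from an EventBridge rule, or from an ITSM request.

```python
"""Sketch: start an operational procedure that exists as a Systems Manager
Automation runbook. The runbook name is an AWS-owned example; the instance
ID is a placeholder."""
import boto3

ssm = boto3.client("ssm")

response = ssm.start_automation_execution(
    DocumentName="AWS-RestartEC2Instance",          # AWS-owned runbook, used for illustration
    Parameters={"InstanceId": ["i-0abcd1234example"]},  # placeholder instance ID
)

# The execution ID can be recorded against the ITSM ticket that requested the change.
print("Automation execution started:", response["AutomationExecutionId"])
```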

๐Ÿ’ผ OPS02-BP04 Mechanisms exist to manage responsibilities and ownership

Understand the responsibilities of your role and how you contribute to business outcomes, as this understanding informs the prioritization of your tasks and why your role is important. This helps team members recognize needs and respond appropriately. When team members know their role, they can establish ownership, identify improvement opportunities, and understand how to influence or make appropriate changes. Occasionally, a responsibility might not have a clear owner. In these situations, design a mechanism to resolve this gap. Create a well-defined escalation path to someone with the authority to assign ownership or plan to address the need. **Desired outcome**: Teams within your organization have clearly-defined responsibilities that include how they are related to resources, actions to be performed, processes, and procedures. These responsibilities align to the team's responsibilities and goals, as well as the responsibilities of other teams. You document the routes of escalation in a consistent and discoverable manner and feed these decisions into documentation artifacts, such as responsibility matrices, team definitions, or wiki pages. **Common anti-patterns** - The responsibilities of the team are ambiguous or poorly-defined. - The team does not align roles with responsibilities. - The team does not align its goals and objectives with its responsibilities, which makes it difficult to measure success. - Team member responsibilities do not align with the team and the wider organization. - Your team does not keep responsibilities up-to-date, which makes them inconsistent with the tasks performed by the team. - Escalation paths for determining responsibilities aren't defined or are unclear. - Escalation paths have no single-threaded owner to ensure timely response. - Roles, responsibilities, and escalation paths are not discoverable, and they are not readily available when required (for example, in response to an incident). **Benefits of establishing this best practice** - When you understand who has responsibility or ownership, you can contact the proper team or team member to make a request or transition a task. - To reduce the risk of inaction and unaddressed needs, you have identified a person who has the authority to assign responsibility or ownership. - When you clearly define the scope of a responsibility, your team members gain autonomy and ownership. - Your responsibilities inform the decisions you make, the actions you take, and your handoff activities to their proper owners. - It's easy to identify abandoned responsibilities because you have a clear understanding of what falls outside of your team's responsibility, which helps you escalate for clarification. - Teams avoid confusion and tension, and they can more adequately manage their workloads and resources. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Identify team members' roles and responsibilities, and verify that they understand the expectations of their role. Make this information discoverable so that members of your organization can identify who they need to contact for specific needs, whether it's a team or individual. As organizations seek to capitalize on the opportunities to migrate and modernize on AWS, roles and responsibilities might also change. Keep your teams and their members aware of their responsibilities, and train them appropriately to carry out their tasks during this change.
Determine the role or team that should receive escalations to identify responsibility and ownership. This team can engage with various stakeholders to come to a decision. However, they should own the management of the decision-making process. Provide accessible mechanisms for members of your organization to discover and identify ownership and responsibility. These mechanisms teach them who to contact for specific needs. **Customer example** AnyCompany Retail recently completed a migration of workloads from an on-premises environment to their landing zone in AWS with a lift and shift approach. They performed an operations review to reflect on how they accomplish common operational tasks and verified that their existing responsibility matrix reflects operations in the new environment. When they migrated from on-premises to AWS, they reduced the infrastructure team's responsibilities relating to the hardware and physical infrastructure. This move also revealed new opportunities to evolve the operating model for their workloads. While they identified, addressed, and documented the majority of responsibilities, they also defined escalation routes for any responsibilities that were missed or that may need to change as operations practices evolve. To explore new opportunities to standardize and improve efficiency across their workloads, they provide access to operations tools like AWS Systems Manager and security tools like AWS Security Hub and Amazon GuardDuty. AnyCompany Retail puts together a review of responsibilities and strategy based on improvements they want to address first. As the company adopts new ways of working and technology patterns, they update their responsibility matrix to match. ### Implementation steps 1. Start with existing documentation. Some typical source documents might include: 1. Responsibility or responsible, accountable, consulted, and informed (RACI) matrices 2. Team definitions or wiki pages 3. Service definitions and offerings 4. Role or job descriptions 2. Review and host discussions on the documented responsibilities: 1. Review with teams to identify misalignments between documented responsibilities and responsibilities the team typically performs. 2. Discuss potential services offered by internal customers to identify gaps in expectations between teams. 3. Analyze and address the discrepancies. 4. Identify opportunities for improvement. 1. Identify frequently-requested, resource-intensive activities, which are typically strong candidates for improvement. 2. Look for best practices, understand patterns, follow prescriptive guidance, and simplify and standardize improvements. 3. Record improvement opportunities, and track them to completion. 5. If a team doesn't already hold responsibility for managing and tracking the assignment of responsibilities, identify someone on the team to hold this responsibility. 1. Define a process for teams to request clarification of responsibility. 2. Review the process, and verify that it is clear and simple to use. 3. Make sure that someone owns and tracks escalations to their conclusion. 4. Establish operational metrics to measure effectiveness. 5. Create a feedback mechanism so that teams can highlight improvement opportunities. 6. Implement a mechanism for periodic review. 6. Document in a discoverable and accessible location. 1. Wikis or documentation portals are common choices. **Level of effort for the implementation plan:** Medium

๐Ÿ’ผ OPS02-BP05 Mechanisms exist to request additions, changes, and exceptions

You can make requests to owners of processes, procedures, and resources. Requests include additions, changes, and exceptions. These requests go through a change management process. Approve requests that are viable and determined to be appropriate after an evaluation of benefits and risks. **Desired outcome** - You can make requests to change processes, procedures, and resources based on assigned ownership. - Changes are made in a deliberate manner, weighing benefits and risks. **Common anti-patterns** - You must update the way you deploy your application, but there is no way to request a change to the deployment process from the operations team. - The disaster recovery plan must be updated, but there is no identified owner to send change requests to. **Benefits of establishing this best practice** - Processes, procedures, and resources can evolve as requirements change. - Owners can make informed decisions about when to make changes. - Changes are made in a deliberate manner. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance To implement this best practice, you need to be able to request changes to processes, procedures, and resources. The change management process can be lightweight. Document the change management process. **Customer example** AnyCompany Retail uses a responsibility assignment (RACI) matrix to identify who owns changes for processes, procedures, and resources. They have a documented change management process that's lightweight and easy to follow. Using the RACI matrix and the process, anyone can submit change requests. ### Implementation steps 1. Identify the processes, procedures, and resources for your workload and the owners for each. Document them in your knowledge management system. 1. If you have not implemented OPS02-BP01 Resources have identified owners, OPS02-BP02 Processes and procedures have identified owners, or OPS02-BP03 Operations activities have identified owners responsible for their performance, start with those first. 2. Work with stakeholders in your organization to develop a change management process. The process should cover additions, changes, and exceptions for resources, processes, and procedures. 1. You can use AWS Systems Manager Change Manager as a change management platform for workload resources (see the sketch after this section). 3. Document the change management process in your knowledge management system. **Level of effort for the implementation plan:** Medium. Developing a change management process requires alignment with multiple stakeholders across your organization.
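If you adopt the Systems Manager Change Manager option mentioned in step 2.1, a change request might be submitted programmatically along the lines of the following Python (boto3) sketch. The change template name, runbook name, and parameter values are hypothetical placeholders for artifacts your organization would define; this is not the only way to structure such a request.

```python
"""Sketch: submit a change request through AWS Systems Manager Change Manager.
The change template and runbook names are hypothetical placeholders."""
import boto3

ssm = boto3.client("ssm")

response = ssm.start_change_request_execution(
    ChangeRequestName="Update deployment process for checkout service",
    DocumentName="MyOrg-StandardChangeTemplate",           # hypothetical change template
    Runbooks=[
        {
            "DocumentName": "MyOrg-UpdateDeploymentPipeline",  # hypothetical runbook
            "Parameters": {"Environment": ["production"]},
        }
    ],
    ChangeDetails="Requested by the application team; reference the related ITSM ticket here.",
)
print("Change request submitted:", response["AutomationExecutionId"])
```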

๐Ÿ’ผ OPS02-BP06 Responsibilities between teams are predefined or negotiated

Have defined or negotiated agreements between teams describing how they work with and support each other (for example, response times, service level objectives, or service-level agreements). Inter-team communications channels are documented. Understanding the impact of the teams' work on business outcomes and the outcomes of other teams and organizations informs the prioritization of their tasks and helps them respond appropriately. When responsibility and ownership are undefined or unknown, you are at risk of both not addressing necessary activities in a timely fashion and of redundant and potentially conflicting efforts emerging to address those needs. **Desired outcome** - Inter-team working or support agreements are agreed to and documented. - Teams that support or work with each other have defined communication channels and response expectations. **Common anti-patterns** - An issue occurs in production and two separate teams start troubleshooting independently of each other. Their siloed efforts extend the outage. - The operations team needs assistance from the development team, but there is no agreed-upon response time. The request is stuck in the backlog. **Benefits of establishing this best practice** - Teams know how to interact and support each other. - Expectations for responsiveness are known. - Communications channels are clearly defined. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Implementing this best practice means that there is no ambiguity about how teams work with each other. Formal agreements codify how teams work together or support each other. Inter-team communication channels are documented. **Customer example** AnyCompany Retail's SRE team has a service level agreement with their development team. Whenever the development team makes a request in their ticketing system, they can expect a response within fifteen minutes. If there is a site outage, the SRE team takes the lead in the investigation with support from the development team. ### Implementation steps 1. Working with stakeholders across your organization, develop agreements between teams based on processes and procedures. 1. If a process or procedure is shared between two teams, develop a runbook on how the teams will work together. 2. If there are dependencies between teams, agree to a response SLA for requests. 2. Document responsibilities in your knowledge management system. **Level of effort for the implementation plan:** Medium. If there are no existing agreements between teams, it can take effort to come to agreement with stakeholders across your organization.

๐Ÿ’ผ OPS03-BP01 Provide executive sponsorship

At the highest level, senior leadership acts as the executive sponsor to clearly set expectations and direction for the organization's outcomes, including evaluating its success. The sponsor advocates and drives adoption of best practices and evolution of the organization. **Desired outcome:** Organizations that endeavor to adopt, transform, and optimize their cloud operations establish clear lines of leadership and accountability for desired outcomes. The organization understands each capability required to accomplish a new outcome and assigns ownership to functional teams for development. Leadership actively sets this direction, assigns ownership, takes accountability, and defines the work. Individuals across the organization can mobilize, feel inspired, and actively work towards the desired objectives. **Common anti-patterns** - There is a mandate for workload owners to migrate workloads to AWS without a clear sponsor and plan for cloud operations. This results in teams not consciously collaborating to improve and mature their operational capabilities. A lack of operational best practice standards overwhelms teams with operator toil, on-call burden, and technical debt, which constrains innovation. - A new organization-wide goal has been set to adopt an emerging technology without providing a leadership sponsor and strategy. Teams interpret goals differently, which causes confusion on where to focus efforts, why they matter, and how to measure impact. Consequently, the organization loses momentum in adopting the technology. **Benefits of establishing this best practice:** When executive sponsorship clearly communicates and shares vision, direction, and goals, team members know what is expected of them. Individuals and teams begin to intensely focus effort in the same direction to accomplish defined objectives when leaders are actively engaged. The organization maximizes the ability to succeed. When you evaluate success, you can better identify barriers to success so that they can be addressed through intervention by the executive sponsor. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance - At every phase of the cloud journey (migration, adoption, or optimization), success requires active involvement at the highest level of leadership with a designated executive sponsor. The executive sponsor aligns the team's mindset, skillsets, and ways of working to the defined strategy. - **Explain the why:** Bring clarity and explain the reasoning behind the vision and strategy. - **Set expectations:** Define and publish goals for your organization, including how progress and success are measured. - **Track achievement of goals:** Measure the incremental achievement of goals regularly (not just completion of tasks). Share the results so that appropriate action can be taken if outcomes are at risk. - **Provide the resources necessary to achieve your goals:** Bring people and teams together to collaborate and build the right solutions that bring about the defined outcomes. This reduces or eliminates organizational friction. - **Advocate for your teams:** Remain engaged with your teams so that you understand their performance and whether there are external factors affecting them. Identify obstacles that are impeding your teams' progress. Act on behalf of your teams to help address obstacles and remove unnecessary burdens. When your teams are impacted by external factors, reevaluate goals and adjust targets as appropriate.
- **Drive adoption of best practices:** Acknowledge best practices that provide quantifiable benefits, and recognize the creators and adopters. Encourage further adoption to magnify the benefits achieved. - **Encourage evolution of your teams:** Create a culture of continual improvement, and proactively learn from progress made as well as failures. Encourage both personal and organizational growth and development. Use data and anecdotes to evolve the vision and strategy. **Customer example** AnyCompany Retail is in the process of business transformation through rapid reinvention of customer experiences, enhancement of productivity, and acceleration of growth through generative AI. ### Implementation steps 1. Establish single-threaded leadership, and assign a primary executive sponsor to lead and drive the transformation. 2. Define clear business outcomes of your transformation, and assign ownership and accountability. Empower the primary executive with the authority to lead and make critical decisions. 3. Verify that your transformational strategy is very clear and communicated widely by the executive sponsor to every level of the organization. 1. Establish clearly defined business objectives for IT and cloud initiatives. 2. Document key business metrics to drive IT and cloud transformation. 3. Communicate the vision consistently to all teams and individuals responsible for parts of the strategy. 4. Develop communication planning matrices that specify what message needs to be delivered to specified leaders, managers, and individual contributors. Specify the person or team that should deliver this message. 1. Fulfill communications plans consistently and reliably. 2. Set and manage expectations through in-person events on a regular basis. 3. Accept feedback on the effectiveness of communications, and adjust the communications and plan accordingly. 4. Schedule communication events to proactively understand challenges from teams, and establish a consistent feedback loop that allows for correcting course where necessary. 5. Actively engage each initiative from a leadership perspective to verify that all impacted teams understand the outcomes they are accountable to achieve. 6. At every status meeting, executive sponsors should look for blockers, inspect established metrics, anecdotes, or feedback from the teams, and measure progress towards objectives. **Level of effort for the implementation plan:** Medium

๐Ÿ’ผ OPS03-BP02 Team members are empowered to take action when outcomes are at risk

A cultural behavior of ownership instilled by leadership results in any employee feeling empowered to act on behalf of the entire company beyond their defined scope of role and accountability. Employees can act to proactively identify risks as they emerge and take appropriate action. Such a culture allows employees to make high value decisions with situational awareness. For example, Amazon uses Leadership Principles as the guidelines to drive desired behavior for employees to move forward in situations, solve problems, deal with conflict, and take action. **Desired outcome:** Leadership has influenced a new culture that allows individuals and teams to make critical decisions, even at lower levels of the organization (as long as decisions are defined with auditable permissions and safety mechanisms). Failure is not discouraged, and teams iteratively learn to improve their decision-making and responses to tackle similar situations going forward. If someone's actions result in an improvement that can benefit other teams, they proactively share knowledge from such actions. Leadership measures operational improvements and incentivizes the individual and organization for adoption of such patterns. **Common anti-patterns** - There isn't clear guidance or mechanisms in an organization for what to do when a risk is identified. For example, when an employee notices a phishing attack, they fail to report it to the security team, resulting in a large portion of the organization falling for the attack. This causes a data breach. - Your customers complain about service unavailability, which primarily stems from failed deployments. Your SRE team is responsible for the deployment tool, and an automated rollback for deployments is in their long-term roadmap. In a recent application rollout, one of the engineers devised a solution to automate rolling back their application to a previous version. Though their solution can become the pattern for SRE teams, other teams do not adopt it, as there is no process to track such improvements. The organization continues to be plagued with failed deployments impacting customers and causing further negative sentiment. - In order to stay compliant, your infosec team oversees a long-established process to rotate shared SSH keys regularly on behalf of operators connecting to their Amazon EC2 Linux instances. It takes several days for the infosec team to complete rotating keys, and you are blocked from connecting to those instances. No one inside or outside of infosec suggests using other options on AWS to achieve the same result. **Benefits of establishing this best practice:** By decentralizing authority and empowering your teams to make key decisions, you are able to address issues more quickly with increasing success rates. Teams start to realize a sense of ownership, and failures are acceptable. Experimentation becomes a cultural mainstay. Managers and directors do not feel as though they are micro-managed through every aspect of their work. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance 1. Develop a culture where it is expected that failures can occur. 2. Define clear ownership and accountability for various functional areas within the organization. 3. Communicate ownership and accountability to everyone so that individuals know who can help them facilitate decentralized decisions. 4. 
Define your one-way and two-way door decisions to help individuals know when they need to escalate to higher levels of leadership. 5. Create organizational awareness that all employees are empowered to take action at various levels when outcomes are at risk. Provide your team members with documentation of governance, permission levels, tools, and opportunities to practice the skills necessary to respond effectively. 6. Give your team members the opportunity to practice the skills necessary to respond to various decisions. Once decision levels are defined, perform game days to verify that all individual contributors understand and can demonstrate the process. 1. Provide alternative safe environments where processes and procedures can be tested and trained upon. 2. Acknowledge and create awareness that team members have authority to take action when the outcome has a predefined level of risk. 3. Define the authority of your team members to take action by assigning permissions and access to the workloads and components they support. 7. Provide the ability for teams to share their learnings (operational successes and failures). 8. Empower teams to challenge the status quo, and provide mechanisms to track and measure improvements, as well as their impact on the organization. **Level of effort for the implementation plan:** Medium

๐Ÿ’ผ OPS03-BP03 Escalation is encouraged

Team members are encouraged by leadership to escalate issues and concerns to higher-level decision makers and stakeholders if they believe desired outcomes are at risk and expected standards are not met. This is a feature of the organization's culture and is driven at all levels. Escalation should be done early and often so that risks can be identified and prevented from causing incidents. Leadership does not reprimand individuals for escalating an issue. **Desired outcome:** Individuals throughout the organization are comfortable escalating problems to their immediate and higher levels of leadership. Leadership has deliberately and consciously established expectations that their teams should feel safe to escalate any issue. A mechanism exists to escalate issues at each level within the organization. When employees escalate to their manager, they jointly decide the level of impact and whether the issue should be escalated. In order to initiate an escalation, employees are required to include a recommended work plan to address the issue. If direct management does not take timely action, employees are encouraged to take issues to the highest level of leadership if they feel strongly that the risks to the organization warrant the escalation. **Common anti-patterns** - Executive leaders do not ask enough probing questions during your cloud transformation program status meeting to find where issues and blockers are occurring. Only good news is presented as status. The CIO has made it clear that she only likes to hear good news, as any challenges brought up make the CEO think that the program is failing. - You are a cloud operations engineer and you notice that the new knowledge management system is not being widely adopted by application teams. The company invested one year and several million dollars to implement this new knowledge management system, but people are still authoring their runbooks locally and sharing them on an organizational cloud share, making it difficult to find knowledge pertinent to supported workloads. You try to bring this to leadership's attention, because consistent use of this system can enhance operational efficiency. When you bring this to the director who led the implementation of the knowledge management system, she reprimands you because it calls the investment into question. - The infosec team responsible for hardening compute resources has decided to put a process in place that requires performing the scans necessary to ensure that Amazon EC2 instances are fully secured before the compute team releases the resource for use. This has created a time delay of an additional week for resources to be deployed, which breaks their SLA. The compute team is afraid to escalate this to the VP over cloud because this makes the VP of information security look bad. **Benefits of establishing this best practice** Complex or critical issues are addressed before they impact the business. Less time is wasted. Risks are minimized. Teams become more proactive and results-focused when solving problems. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance The willingness and ability to escalate freely at every level in the organization is an organizational and cultural foundation that should be consciously developed through emphasized training, leadership communications, expectation setting, and the deployment of mechanisms throughout the organization at every level. ### Implementation steps 1. 
Define policies, standards, and expectations for your organization. 1. Ensure wide adoption and understanding of policies, expectations, and standards. 2. Encourage, train, and empower workers for early and frequent escalation when standards are not met. 3. Organizationally acknowledge that early and frequent escalation is the best practice. Accept that escalations may prove to be unfounded, and that it is better to have the opportunity to prevent an incident than to miss that opportunity by not escalating. 1. Build a mechanism for escalation (like an Andon cord system). 2. Have documented procedures defining when and how escalation should occur. 3. Define the series of people with increasing authority to take or approve action, and document each stakeholder's contact information. 4. When escalation occurs, it should continue until the team member is satisfied that the risk has been mitigated through actions driven from leadership. 1. Escalations should include: - Description of the situation, and the nature of the risk - Criticality of the situation - Who or what is impacted - How great the impact is - Urgency if impact occurs - Suggested remedies and plans to mitigate 2. Protect employees who escalate. Have a policy that protects team members from retribution if they escalate around a non-responsive decision maker or stakeholder. Have mechanisms in place to identify if this is occurring and respond appropriately. 5. Encourage a culture of continuous improvement feedback loops in everything that the organization produces. Feedback loops act as minor escalations to the individuals responsible, and they identify improvement opportunities, even when escalation is not needed. Continuous improvement cultures force everyone to be more proactive. 6. Leadership should periodically reemphasize the policies, standards, mechanisms, and the desire for open escalation and continuous feedback loops without retribution. **Level of effort for the implementation plan:** Medium

๐Ÿ’ผ OPS03-BP04 Communications are timely, clear, and actionable

Leadership is responsible for the creation of strong and effective communications, especially when the organization adopts new strategies, technologies, or ways of working. Leaders should set expectations for all staff to work towards the company's objectives. Devise communication mechanisms that create and maintain awareness among the teams responsible for running plans that are funded and sponsored by leadership. Make use of cross-organizational diversity, and listen attentively to multiple unique perspectives. Use this perspective to increase innovation, challenge your assumptions, and reduce the risk of confirmation bias. Foster inclusion, diversity, and accessibility within your teams to gain beneficial perspectives. **Desired outcome:** Your organization designs communication strategies to address the impact of change to the organization. Teams remain informed and motivated to continue working with one another rather than against each other. Individuals understand how important their role is to achieve the stated objectives. Email is only a passive mechanism for communications and is used accordingly. Management spends time with their individual contributors to help them understand their responsibility, the tasks to complete, and how their work contributes to the overall mission. Leaders engage people directly in smaller venues to convey messages and verify that these messages are being delivered effectively. The organization performs at or above the expectations of leadership. Leadership encourages and seeks diverse opinions within and across teams. **Common anti-patterns** - Your organization has a five-year plan to migrate all workloads to AWS. The business case for cloud includes the modernization of 25% of all workloads to take advantage of serverless technology. The CIO communicates this strategy to direct reports and expects each leader to cascade this presentation to managers, directors, and individual contributors without any in-person communication. The CIO steps back and expects his organization to perform the new strategy. - Leadership does not provide or use a mechanism for feedback, and an expectation gap grows, which leads to stalled projects. - You are asked to make a change to your security groups, but you are not given any details of what change needs to be made, what the impact of the change could be on all the workloads, and when it should happen. The manager forwards an email from the VP of InfoSec and adds the message "Make this happen." - Changes were made to your migration strategy that reduce the planned modernization number from 25% to 10%. This has downstream effects on the operations organization. They were not informed of this strategic change, and thus they are not ready with enough skilled capacity to support a greater number of workloads lifted and shifted into AWS. **Benefits of establishing this best practice** - Your organization is well-informed on new or changed strategies, and they act accordingly with strong motivation to help each other achieve the overall objectives and metrics set by leadership. - Mechanisms exist and are used to provide timely notice to team members of known risks and planned events. - New ways of working (including changes to people or the organization, processes, or technology), along with required skills, are more effectively adopted by the organization, and your organization realizes business benefits more quickly. 
- Team members have the necessary context of the communications being received, and they can be more effective in their jobs. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To implement this best practice, work with stakeholders across your organization to agree to communication standards. Publicize those standards widely. For any significant IT transitions, an established planning team can more successfully manage the impact of change to its people than an organization that ignores this practice. In the absence of such a transition planning team, leadership holds 100% of the responsibility for effective communications. When establishing a transition planning team, assign team members to work with organizational leadership to define and manage effective communications at every level. ### Customer example AnyCompany Retail signed up for AWS Enterprise Support and depends on third-party providers for its cloud operations. The company uses chat and chatops as their main communication medium for operational activities. Alerts and other information populate specific channels. When someone must act, they clearly state the desired outcome, and in many cases, they receive a runbook or playbook to use. They schedule major changes to production systems with a change calendar. ### Implementation steps 1. Establish a core team within the organization that has accountability to build and initiate communication plans for changes that happen at multiple levels within the organization. 2. Institute single-threaded ownership to achieve oversight. Give individual teams the ability to innovate independently, and balance the use of consistent mechanisms, which allows for the right level of inspection and directional vision. 3. Work with stakeholders across your organization to agree to communication standards, practices, and plans. 4. Verify that the core communications team collaborates with organizational and program leadership to craft messages to appropriate staff on behalf of leaders. 5. Build strategic communication mechanisms to manage change through announcements, shared calendars, all-hands meetings, and in-person or one-on-one methods so that team members have proper expectations on the actions they should take. 6. Provide necessary context, details, and time (when possible) to determine if action is necessary. When action is needed, provide the required action and its impact. 7. Implement tools that facilitate tactical communications, like internal chat, email, and knowledge management. 8. Implement mechanisms to measure and verify that all communications lead to desired outcomes. 9. Establish a feedback loop that measures the effectiveness of all communications, especially when communications are related to resistance to changes throughout the organization. 10. For all AWS accounts, establish alternate contacts for billing, security, and operations. Ideally, each contact should be an email distribution as opposed to a specific individual contact. 11. Establish an escalation and reverse escalation communication plan to engage with your internal and external teams, including AWS support and other third-party providers. 12. Initiate and perform communication strategies consistently throughout the life of each transformation program. 13. Prioritize actions that are repeatable where possible to safely automate at scale. 14. 
When communications are required in scenarios with automated actions, the communication's purpose should be to inform teams, to provide an audit trail, or to serve as part of the change management process. 15. Analyze communications from your alert systems for false positives or alerts that are constantly created. Remove or change these alerts so that they are initiated only when human intervention is required. If an alert is initiated, provide a runbook or playbook. 1. You can use AWS Systems Manager Documents to build playbooks and runbooks for alerts. 16. Put mechanisms in place to provide notification of risks or planned events in a clear and actionable way with enough notice to allow appropriate responses. Use email lists or chat channels to send notifications ahead of planned events. 1. AWS Chatbot can be used to send alerts and respond to events within your organization's messaging platform. 17. Provide an accessible source of information where planned events can be discovered. Provide notifications of planned events from the same system. 1. AWS Systems Manager Change Calendar can be used to create change windows when changes can occur. This provides team members notice of when they can make changes safely (see the sketch after this list). 18. Monitor vulnerability notifications and patch information to understand vulnerabilities in the wild and potential risks associated with your workload components. Provide notification to team members so that they can act. 1. You can subscribe to AWS Security Bulletins to receive notifications of vulnerabilities on AWS. 19. Seek diverse opinions and perspectives: Encourage contributions from everyone. Give communication opportunities to under-represented groups. Rotate roles and responsibilities in meetings. 1. Expand roles and responsibilities: Provide opportunities for team members to take on roles that they might not otherwise hold. They can gain experience and perspective from the role and from interactions with new team members with whom they might not otherwise interact. They can also bring their experience and perspective to the new role and team members they interact with. As perspective increases, identify emergent business opportunities or new opportunities for improvement. Rotate common tasks that others typically perform between members within a team to understand the demands and impact of performing them. 2. Provide a safe and welcoming environment: Establish policy and controls that protect the mental and physical safety of team members within your organization. Team members should be able to interact without fear of reprisal. When team members feel safe and welcome, they are more likely to be engaged and productive. The more diverse your organization, the better your understanding can be of the people you support, including your customers. When your team members are comfortable, feel free to speak, and are confident they are heard, they are more likely to share valuable insights (for example, marketing opportunities, accessibility needs, unserved market segments, and unacknowledged risks in your environment). 3. Encourage team members to participate fully: Provide the resources necessary for your employees to participate fully in all work-related activities. Team members that face daily challenges develop skills for working around them. These uniquely-developed skills can provide significant benefit to your organization. Support team members with necessary accommodations to increase the benefits you can receive from their contributions.
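As a hedged sketch of the Change Calendar option mentioned in step 17, the following Python (boto3) example has an automation check a Systems Manager Change Calendar before acting, so that changes proceed only inside approved windows. The calendar name is a hypothetical placeholder; your organization would reference its own calendar.

```python
"""Sketch: check an AWS Systems Manager Change Calendar before making a change.
The calendar name is a hypothetical placeholder."""
import boto3

ssm = boto3.client("ssm")

state = ssm.get_calendar_state(CalendarNames=["MyOrg-ProductionChangeCalendar"])

if state["State"] == "OPEN":
    print("Change window is open; proceed with the planned change.")
else:
    # Defer and notify the requesting team through your usual channels.
    print("Calendar is CLOSED; defer the change until", state.get("NextTransitionTime"))
```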

๐Ÿ’ผ OPS03-BP05 Experimentation is encouraged

Experimentation is a catalyst for turning new ideas into products and features. It accelerates learning and keeps team members interested and engaged. Team members are encouraged to experiment often to drive innovation. Even when an undesired result occurs, there is value in knowing what not to do. Team members are not punished for successful experiments with undesired results. **Desired outcome** - Your organization encourages experimentation to foster innovation. - Experiments are used as an opportunity to learn. **Common anti-patterns** - You want to run an A/B test, but there is no mechanism to run the experiment. You deploy a UI change without the ability to test it, resulting in a negative customer experience. - Your company only has staging and production environments. There is no sandbox environment to experiment with new features or products, so experiments must be conducted in the production environment. **Benefits of establishing this best practice** - Experimentation drives innovation. - You can react faster to feedback from users through experimentation. - Your organization develops a culture of learning. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Experiments should be run in a safe manner. Leverage multiple environments to experiment without jeopardizing production resources. Use A/B testing and feature flags to test experiments. Provide team members with the ability to conduct experiments in a sandbox environment. ### Customer example AnyCompany Retail encourages experimentation. Team members can use 20% of their work week to experiment or learn new technologies. They have a sandbox environment where they can innovate. A/B testing is used for new features to validate them with real user feedback. ### Implementation steps 1. Work with leadership across your organization to support experimentation. Encourage team members to experiment safely. 2. Provide your team members with an environment where they can safely experiment. They must have access to an environment that is similar to production. 1. Use a separate AWS account to create a sandbox environment. AWS Control Tower can be used to provision these accounts. 3. Use feature flags and A/B testing to experiment safely and gather user feedback. 1. AWS AppConfig Feature Flags provides the ability to create feature flags (see the sketch after this section). 2. You can use AWS Lambda versions to deploy a new version of a function for beta testing. 4. Ensure experiments are logged, results are analyzed, and learnings are shared with the team to promote a culture of learning. **Level of effort for the implementation plan:** High. Providing a sandbox environment and safe mechanisms for experimentation may require significant investment. Application code may need modification to support feature flags or A/B testing.
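As a minimal sketch of the feature flag approach in step 3.1, the following Python (boto3) example reads an AWS AppConfig feature flag at runtime so an experiment can be toggled without a redeployment. The application, environment, configuration profile, and flag names are hypothetical placeholders, not values from this guidance.

```python
"""Sketch: read an AWS AppConfig feature flag to gate an experiment.
Application, environment, profile, and flag names are hypothetical."""
import json
import boto3

appconfig = boto3.client("appconfigdata")

# Start a configuration session for the flag profile.
session = appconfig.start_configuration_session(
    ApplicationIdentifier="retail-web",                 # hypothetical application
    EnvironmentIdentifier="beta",                       # hypothetical environment
    ConfigurationProfileIdentifier="checkout-flags",    # hypothetical flag profile
)

# Fetch the latest configuration; an empty body means it has not changed
# since the last poll with this token.
result = appconfig.get_latest_configuration(
    ConfigurationToken=session["InitialConfigurationToken"]
)
payload = result["Configuration"].read()
flags = json.loads(payload) if payload else {}

if flags.get("new-checkout-flow", {}).get("enabled"):
    print("Route this request to the experimental checkout flow.")
else:
    print("Use the existing checkout flow.")
```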

๐Ÿ’ผ OPS03-BP06 Team members are encouraged to maintain and grow their skill sets

Teams must grow their skill sets to adopt new technologies, and to support changes in demand and responsibilities for your workloads. Growth of skills in new technologies is frequently a source of team member satisfaction and supports innovation. Support your team members' pursuit and maintenance of industry certifications that validate and acknowledge their growing skills. Cross-train to promote knowledge transfer and reduce the risk of significant impact when you lose skilled and experienced team members with institutional knowledge. Provide dedicated structured time for learning. AWS provides resources, including the AWS Getting Started Resource Center, AWS Blogs, AWS Online Tech Talks, AWS Events and Webinars, and the AWS Well-Architected Labs, that provide guidance, examples, and detailed walkthroughs to educate your teams. Resources such as Support (AWS re:Post, Support Center) and AWS Documentation help remove technical roadblocks and improve operations. Reach out to Support through Support Center for help with your questions. AWS also shares best practices and patterns that we have learned through the operation of AWS in The Amazon Builders' Library and a wide variety of other useful educational material through the AWS Blog and The Official AWS Podcast. AWS Training and Certification includes free training through self-paced digital courses, along with learning plans by role or domain. You can also register for instructor-led training to further support the development of your teams' AWS skills. **Desired outcome:** Your organization constantly evaluates skill gaps and closes them with structured budget and investment. Teams encourage and incentivize their members with upskilling activities such as acquiring leading industry certifications. Teams take advantage of dedicated knowledge-sharing programs such as lunch-and-learns, immersion days, hackathons, and game days. Your organization's knowledge systems are kept up-to-date and relevant to cross-train team members, including new-hire onboarding training. **Common anti-patterns** - In the absence of a structured training program and budget, teams experience uncertainty as they try to keep pace with technology evolution, which results in increased attrition. - As part of migrating to AWS, your organization reveals skill gaps and varying cloud fluency among teams. Without an effort to upskill, teams find themselves overtasked with legacy and inefficient management of the cloud environment, which causes increased operator toil. The resulting burnout increases employee dissatisfaction. **Benefits of establishing this best practice:** When your organization consciously invests in improving the skills of its teams, it also helps accelerate and scale cloud adoption and optimization. Targeted learning programs drive innovation and build operational ability for teams to be prepared to handle events. Teams consciously invest in the implementation and evolution of best practices. Team morale is high, and team members value their contribution to the business. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance To adopt new technologies, fuel innovation, and keep pace with changes in demand and responsibilities to support your workloads, continually invest in the professional growth of your teams. ### Implementation steps 1. 
**Use structured cloud advocacy programs:** AWS Skills Guild provides consultative training to increase cloud skill confidence and ignite a culture of continuous learning. 2. **Provide resources for education:** Provide dedicated, structured time and access to training materials and lab resources, and support participation in conferences and access to professional organizations that provide opportunities for learning from both educators and peers. Provide your junior team members with access to senior team members as mentors, or allow the junior team members to shadow their seniors' work and be exposed to their methods and skills. Encourage learning about content not directly related to work in order to have a broader perspective. 3. **Encourage use of expert technical resources:** Leverage resources such as AWS re:Post to get access to curated knowledge and a vibrant community. 4. **Build and maintain an up-to-date knowledge repository:** Use knowledge-sharing platforms such as wikis and runbooks. Create your own reusable expert knowledge source with AWS re:Post Private to streamline collaboration, improve productivity, and accelerate employee onboarding. 5. **Team education and cross-team engagement:** Plan for the continuing education needs of your team members. Provide opportunities for team members to join other teams (temporarily or permanently) to share skills and best practices that benefit your entire organization. 6. **Support pursuit and maintenance of industry certifications:** Support your team members in the acquisition and maintenance of industry certifications that validate what they have learned and acknowledge their accomplishments. **Level of effort for the implementation plan:** High

๐Ÿ’ผ OPS03-BP07 Resource teams appropriately

Provision the right number of proficient team members, and provide tools and resources to support your workload needs. Overburdening team members increases the risk of human error. Investments in tools and resources, such as automation, can scale the effectiveness of your team and help them support a greater number of workloads without requiring additional capacity. **Desired outcome** - You have appropriately staffed your team to gain the skillsets needed for them to operate workloads in AWS in accordance with your migration plan. As your team has scaled itself up during the course of your migration project, they have gained proficiency in the core AWS technologies that the business plans to use when migrating or modernizing their applications. - You have carefully aligned your staffing plan to make efficient use of resources by leveraging automation and workflow. A smaller team can now manage more infrastructure on behalf of the application development teams. - With shifting operational priorities, any resource staffing constraints are proactively identified to protect the success of business initiatives. - Operational metrics that report operational toil (such as on-call fatigue or excessive paging) are reviewed to verify that staff are not overwhelmed. **Common anti-patterns** - Your staff have not ramped up on AWS skills as you close in on your multi-year cloud migration plan, which risks support of the workloads and lowers employee morale. - Your entire IT organization is shifting into agile ways of working. The business is prioritizing the product portfolio and setting metrics for what features need to be developed first. Your agile process does not require teams to assign story points to their work plans. As a result, it is impossible to know the capacity required for the next increment of work, or whether you have the right skills assigned to the work. - You are having an AWS partner migrate your workloads, and you don't have a support transition plan for your teams once the partner completes the migration project. Your teams struggle to efficiently and effectively support the workloads. **Benefits of establishing this best practice:** You have appropriately-skilled team members available in your organization to support the workloads. Resource allocation can adapt to shifting priorities without impacting performance. Teams are proficient at supporting workloads while maximizing time to focus on innovating for customers, which in turn raises employee satisfaction. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Resource planning for your cloud migration should occur at an organizational level that aligns to your migration plan, as well as the desired operating model being implemented to support your new cloud environment. This should include understanding which cloud technologies are deployed for the business and application development teams. Infrastructure and operations leadership should plan for skills gap analysis, training, and role definition for engineers who are leading cloud adoption. ### Implementation steps 1. Define success criteria for your teams with relevant operational metrics, such as staff productivity (for example, cost to support a workload or operator hours spent during incidents). 2. Define resource capacity planning and inspection mechanisms to verify that the right balance of qualified capacity is available when needed and can be adjusted over time. 3. 
Create mechanisms (for example, sending a monthly survey to teams) to understand work-related challenges that impact teams (like increasing responsibilities, changes in technology, loss of personnel, or an increase in customers supported). 4. Use these mechanisms to engage with teams and spot trends that may contribute to employee productivity challenges. When your teams are impacted by external factors, reevaluate goals and adjust targets as appropriate. Identify obstacles that are impeding your team's progress. 5. Regularly review whether your currently-provisioned resources are still sufficient or whether additional resources are needed, and make appropriate adjustments to support teams. **Level of effort for the implementation plan:** Medium

๐Ÿ’ผ OPS04-BP01 Identify key performance indicators

Implementing observability in your workload starts with understanding its state and making data-driven decisions based on business requirements. One of the most effective ways to ensure alignment between monitoring activities and business objectives is by defining and monitoring key performance indicators (KPIs). **Desired outcome:** Efficient observability practices that are tightly aligned with business objectives, ensuring that monitoring efforts are always in service of tangible business outcomes. **Common anti-patterns** - Undefined KPIs: Working without clear KPIs can lead to monitoring too much or too little, missing vital signals. - Static KPIs: Not revisiting or refining KPIs as the workload or business objectives evolve. - Misalignment: Focusing on technical metrics that don't correlate directly with business outcomes or are harder to correlate with real-world issues. **Benefits of establishing this best practice** - Ease of issue identification: Business KPIs often surface issues more clearly than technical metrics. A dip in a business KPI can pinpoint a problem more effectively than sifting through numerous technical metrics. - Business alignment: Ensures that monitoring activities directly support business objectives. - Efficiency: Prioritizes monitoring resources and attention on metrics that matter. - Proactivity: Recognizes and addresses issues before they have broader business implications. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To effectively define workload KPIs: 1. Start with business outcomes: Before diving into metrics, understand the desired business outcomes. Is it increased sales, higher user engagement, or faster response times? 2. Correlate technical metrics with business objectives: Not all technical metrics have a direct impact on business outcomes. Identify the technical metrics that do, and remember that it's often more straightforward to identify an issue using a business KPI. 3. Use Amazon CloudWatch: Employ CloudWatch to define and monitor metrics that represent your KPIs (see the sketch after this section). 4. Regularly review and update KPIs: As your workload and business evolve, keep your KPIs relevant. 5. Involve stakeholders: Involve both technical and business teams in defining and reviewing KPIs. **Level of effort for the implementation plan:** Medium
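As a minimal sketch of step 3, the following Python (boto3) example publishes a business KPI (completed orders) as a custom Amazon CloudWatch metric and creates an alarm on it, so the KPI can be watched alongside technical metrics. The namespace, metric name, dimension, and threshold values are hypothetical placeholders.

```python
"""Sketch: publish a business KPI as a custom CloudWatch metric and alarm on it.
Namespace, metric, dimension, and threshold values are hypothetical."""
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish the KPI value (for example, emitted by the order service on each batch).
cloudwatch.put_metric_data(
    Namespace="AnyCompanyRetail/Business",          # hypothetical namespace
    MetricData=[
        {
            "MetricName": "OrdersCompleted",
            "Dimensions": [{"Name": "Channel", "Value": "web"}],
            "Value": 42,
            "Unit": "Count",
        }
    ],
)

# Alarm when the KPI drops below an expected floor for three consecutive periods.
cloudwatch.put_metric_alarm(
    AlarmName="orders-completed-low",
    Namespace="AnyCompanyRetail/Business",
    MetricName="OrdersCompleted",
    Dimensions=[{"Name": "Channel", "Value": "web"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=3,
    Threshold=10,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",
)
```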

๐Ÿ’ผ OPS04-BP02 Implement application telemetry

Application telemetry serves as the foundation for observability of your workload. It's crucial to emit telemetry that offers actionable insights into the state of your application and the achievement of both technical and business outcomes. From troubleshooting to measuring the impact of a new feature or ensuring alignment with business key performance indicators (KPIs), application telemetry informs the way you build, operate, and evolve your workload. Metrics, logs, and traces form the three primary pillars of observability. These serve as diagnostic tools that describe the state of your application. Over time, they assist in creating baselines and identifying anomalies. However, to ensure alignment between monitoring activities and business objectives, it's pivotal to define and monitor KPIs. Business KPIs often make it easier to identify issues compared to technical metrics alone. Other telemetry types, like real user monitoring (RUM) and synthetic transactions, complement these primary data sources. RUM offers insights into real-time user interactions, whereas synthetic transactions simulate potential user behaviors, helping detect bottlenecks before real users encounter them. **Desired outcome:** Derive actionable insights into the performance of your workload. These insights allow you to make proactive decisions about performance optimization, achieve increased workload stability, streamline CI/CD processes, and utilize resources effectively. **Common anti-patterns** - Incomplete observability: Neglecting to incorporate observability at every layer of the workload, resulting in blind spots that can obscure vital system performance and behavior insights. - Fragmented data view: When data is scattered across multiple tools and systems, it becomes challenging to maintain a holistic view of your workload's health and performance. - User-reported issues: A sign that proactive issue detection through telemetry and business KPI monitoring is lacking. **Benefits of establishing this best practice** - Informed decision-making: With insights from telemetry and business KPIs, you can make data-driven decisions. - Improved operational efficiency: Data-driven resource utilization leads to cost-effectiveness. - Enhanced workload stability: Faster detection and resolution of issues leading to improved uptime. - Streamlined CI/CD processes: Insights from telemetry data facilitate refinement of processes and reliable code delivery. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To implement application telemetry for your workload, use AWS services like Amazon CloudWatch and AWS X-Ray. Amazon CloudWatch provides a comprehensive suite of monitoring tools, allowing you to observe your resources and applications in AWS and on-premises environments. It collects, tracks, and analyzes metrics, consolidates and monitors log data, and responds to changes in your resources, enhancing your understanding of how your workload operates. In tandem, AWS X-Ray lets you trace, analyze, and debug your applications, giving you a deep understanding of your workload's behavior. With features like service maps, latency distributions, and trace timelines, AWS X-Ray provides insights into your workload's performance and the bottlenecks affecting it. ### Implementation steps 1. Identify what data to collect: Ascertain the essential metrics, logs, and traces that would offer substantial insights into your workload's health, performance, and behavior. 2. 
Deploy the CloudWatch agent: The CloudWatch agent is instrumental in procuring system and application metrics and logs from your workload and its underlying infrastructure. The CloudWatch agent can also be used to collect OpenTelemetry or X-Ray traces and send them to X-Ray. 3. Implement anomaly detection for logs and metrics: Use CloudWatch Logs anomaly detection and CloudWatch Metrics anomaly detection to automatically identify unusual activities in your application's operations. These tools use machine learning algorithms to detect and alert on anomalies, which enhances your monitoring capabilities and speeds up response time to potential disruptions or security threats. Set up these features to proactively manage application health and security. 4. Secure sensitive log data: Use Amazon CloudWatch Logs data protection to mask sensitive information within your logs. This feature helps maintain privacy and compliance through automatic detection and masking of sensitive data before it is accessed. Implement data masking to securely handle and protect sensitive details such as personally identifiable information (PII). 5. Define and monitor business KPIs: Establish custom metrics that align with your business outcomes. 6. Instrument your application with AWS X-Ray: In addition to deploying the CloudWatch agent, it's crucial to instrument your application to emit trace data. This process can provide further insights into your workload's behavior and performance. 7. Standardize data collection across your application: Standardize data collection practices across your entire application. Uniformity aids in correlating and analyzing data, providing a comprehensive view of your application's behavior. 8. Implement cross-account observability: Enhance monitoring efficiency across multiple AWS accounts with Amazon CloudWatch cross-account observability. With this feature, you can consolidate metrics, logs, and alarms from different accounts into a single view, which simplifies management and improves response times for identified issues across your organization's AWS environment. 9. Analyze and act on the data: Once data collection and normalization are in place, use Amazon CloudWatch for metrics and logs analysis, and AWS X-Ray for trace analysis. Such analysis can yield crucial insights into your workload's health, performance, and behavior, guiding your decision-making process. **Level of effort for the implementation plan:** High
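The following sketch illustrates step 3 (metric anomaly detection) with boto3: it trains an anomaly detection model on an application latency metric and creates an alarm on the resulting band. The namespace, metric name, statistic, and band width are assumptions for illustration.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Train an anomaly detection model on an application latency metric (p99 is illustrative).
cloudwatch.put_anomaly_detector(
    Namespace="AnyCompany/App",
    MetricName="LatencyMs",
    Stat="p99",
)

# Alarm when latency leaves the band computed by the anomaly detection model.
cloudwatch.put_metric_alarm(
    AlarmName="latency-anomaly",
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=3,
    ThresholdMetricId="band",
    Metrics=[
        {"Id": "m1",
         "MetricStat": {"Metric": {"Namespace": "AnyCompany/App", "MetricName": "LatencyMs"},
                        "Period": 60, "Stat": "p99"},
         "ReturnData": True},
        {"Id": "band", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)", "ReturnData": True},
    ],
)
```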

๐Ÿ’ผ OPS04-BP03 Implement user experience telemetry

Gaining deep insights into customer experiences and interactions with your application is crucial. Real user monitoring (RUM) and synthetic transactions serve as powerful tools for this purpose. RUM provides data about real user interactions granting an unfiltered perspective of user satisfaction, while synthetic transactions simulate user interactions, helping in detecting potential issues even before they impact real users. **Desired outcome:** A holistic view of the customer experience, proactive detection of issues, and optimization of user interactions to deliver seamless digital experiences. **Common anti-patterns** - Applications without real user monitoring (RUM): - Delayed issue detection: Without RUM, you might not become aware of performance bottlenecks or issues until users complain. This reactive approach can lead to customer dissatisfaction. - Lack of user experience insights: Not using RUM means you lose out on crucial data that shows how real users interact with your application, limiting your ability to optimize the user experience. - Applications without synthetic transactions: - Missed edge cases: Synthetic transactions help you test paths and functions that might not be frequently used by typical users but are critical to certain business functions. Without them, these paths could malfunction and go unnoticed. - Checking for issues when the application is not being used: Regular synthetic testing can simulate times when real users aren't actively interacting with your application, ensuring the system always functions correctly. **Benefits of establishing this best practice** - Proactive issue detection: Identify and address potential issues before they impact real users. - Optimized user experience: Continuous feedback from RUM aids in refining and enhancing the overall user experience. - Insights on device and browser performance: Understand how your application performs across various devices and browsers, enabling further optimization. - Validated business workflows: Regular synthetic transactions ensure that core functionalities and critical paths remain operational and efficient. - Enhanced application performance: Leverage insights gathered from real user data to improve application responsiveness and reliability. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To leverage RUM and synthetic transactions for user activity telemetry, AWS offers services like Amazon CloudWatch RUM and Amazon CloudWatch Synthetics. Metrics, logs, and traces, coupled with user activity data, provide a comprehensive view of both the application's operational state and the user experience. ### Implementation steps 1. Deploy Amazon CloudWatch RUM: Integrate your application with CloudWatch RUM to collect, analyze, and present real user data. 1. Use the CloudWatch RUM JavaScript library to integrate RUM with your application. 2. Set up dashboards to visualize and monitor real user data. 2. Configure CloudWatch Synthetics: Create canaries, or scripted routines, that simulate user interactions with your application. 1. Define critical application workflows and paths. 2. Design canaries using CloudWatch Synthetics scripts to simulate user interactions for these paths. 3. Schedule and monitor canaries to run at specified intervals, ensuring consistent performance checks. 3. Analyze and act on data: Utilize data from RUM and synthetic transactions to gain insights and take corrective measures when anomalies are detected. 
Use CloudWatch dashboards and alarms to stay informed. **Level of effort for the implementation plan:** Medium
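A minimal sketch of step 2 is shown below: it registers and starts a CloudWatch Synthetics canary from a script stored in Amazon S3. The bucket names, IAM role ARN, handler, and schedule are placeholders, and the runtime version should be replaced with one currently supported in your Region.

```python
import boto3

synthetics = boto3.client("synthetics")

# Register a canary that exercises a critical workflow (for example, checkout).
synthetics.create_canary(
    Name="checkout-heartbeat",
    Code={"S3Bucket": "example-canary-code",       # placeholder bucket with the script bundle
          "S3Key": "checkout.zip",
          "Handler": "checkout.handler"},
    ArtifactS3Location="s3://example-canary-artifacts/checkout/",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/CanaryExecutionRole",  # placeholder role
    RuntimeVersion="syn-nodejs-puppeteer-7.0",      # substitute a currently supported runtime
    Schedule={"Expression": "rate(5 minutes)"},
)

# Start running the canary on its schedule.
synthetics.start_canary(Name="checkout-heartbeat")
```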

๐Ÿ’ผ OPS04-BP04 Implement dependency telemetry

Dependency telemetry is essential for monitoring the health and performance of the external services and components your workload relies on. It provides valuable insights into reachability, timeouts, and other critical events related to dependencies such as DNS, databases, or third-party APIs. When you instrument your application to emit metrics, logs, and traces about these dependencies, you gain a clearer understanding of potential bottlenecks, performance issues, or failures that might impact your workload. **Desired outcome:** Ensure that the dependencies your workload relies on are performing as expected, allowing you to proactively address issues and ensure optimal workload performance. **Common anti-patterns** - Overlooking external dependencies: Focusing only on internal application metrics while neglecting metrics related to external dependencies. - Lack of proactive monitoring: Waiting for issues to arise instead of continuously monitoring dependency health and performance. - Siloed monitoring: Using multiple, disparate monitoring tools which can result in fragmented and inconsistent views of dependency health. **Benefits of establishing this best practice** - Improved workload reliability: By ensuring that external dependencies are consistently available and performing optimally. - Faster issue detection and resolution: Proactively identifying and addressing issues with dependencies before they impact the workload. - Comprehensive view: Gaining a holistic view of both internal and external components that influence workload health. - Enhanced workload scalability: By understanding the scalability limits and performance characteristics of external dependencies. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implement dependency telemetry by starting with identifying the services, infrastructure, and processes that your workload depends on. Quantify what good conditions look like when those dependencies are functioning as expected, and then determine what data will be needed to measure those. With that information you can craft dashboards and alerts that provide insights to your operations teams on the state of those dependencies. Use AWS tools to discover and quantify the impacts when dependencies cannot deliver as needed. Continually revisit your strategy to account for changes in priorities, goals, and gained insights. ### Implementation steps 1. Identify external dependencies: Collaborate with stakeholders to pinpoint the external dependencies your workload relies on. External dependencies can encompass services like external databases, third-party APIs, network connectivity routes to other environments, and DNS services. The first step towards effective dependency telemetry is being comprehensive in understanding what those dependencies are. 2. Develop a monitoring strategy: Once you have a clear picture of your external dependencies, architect a monitoring strategy tailored to them. This involves understanding the criticality of each dependency, its expected behavior, and any associated service-level agreements or targets (SLA or SLTs). Set up proactive alerts to notify you of status changes or performance deviations. 3. Use network monitoring: Use Internet Monitor and Network Monitor, which provide comprehensive insights into global internet and network conditions. These tools help you understand and respond to outages, disruptions, or performance degradations that affect your external dependencies. 4. 
Stay informed with AWS Health: AWS Health is the authoritative source of information about the health of your AWS Cloud resources. Use AWS Health to visualize and receive notifications about any current service events and upcoming changes, such as planned lifecycle events, so you can take steps to mitigate impacts. 1. Create purpose-fit AWS Health event notifications to e-mail and chat channels through AWS User Notifications, and integrate programmatically with your monitoring and alerting tools through Amazon EventBridge or the AWS Health API. 2. Plan and track progress on health events that require action by integrating with change management or ITSM tools (like Jira or ServiceNow) that you may already use through Amazon EventBridge or the AWS Health API. 3. If you use AWS Organizations, enable organization view for AWS Health to aggregate AWS Health events across accounts. 5. Instrument your application with AWS X-Ray: AWS X-Ray provides insights into how applications and their underlying dependencies are performing. By tracing requests from start to end, you can identify bottlenecks or failures in the external services or components your application relies on. 6. Use Amazon DevOps Guru: This machine learning-driven service identifies operational issues, predicts when critical issues might occur, and recommends specific actions to take. It's invaluable for gaining insights into dependencies and ensuring they're not the source of operational problems. 7. Monitor regularly: Continually monitor metrics and logs related to external dependencies. Set up alerts for unexpected behavior or degraded performance. 8. Validate after changes: Whenever there's an update or change in any of the external dependencies, validate their performance and check their alignment with your application's requirements. **Level of effort for the implementation plan:** Medium
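As one possible implementation of the AWS Health integration described in step 4, the sketch below creates an EventBridge rule that forwards AWS Health events for selected dependency services to an SNS topic. The rule name, service list, and topic ARN are assumptions.

```python
import json
import boto3

events = boto3.client("events")

# Match AWS Health events for services this workload depends on.
events.put_rule(
    Name="aws-health-dependency-events",
    EventPattern=json.dumps({
        "source": ["aws.health"],
        "detail": {"service": ["RDS", "ROUTE53", "LAMBDA"]},  # illustrative dependency list
    }),
    State="ENABLED",
)

# Forward matching events to an operations notification topic (placeholder ARN).
events.put_targets(
    Rule="aws-health-dependency-events",
    Targets=[{"Id": "notify-operations",
              "Arn": "arn:aws:sns:us-east-1:111122223333:ops-notifications"}],
)
```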

๐Ÿ’ผ OPS04-BP05 Implement distributed tracing

Distributed tracing offers a way to monitor and visualize requests as they traverse through various components of a distributed system. By capturing trace data from multiple sources and analyzing it in a unified view, teams can better understand how requests flow, where bottlenecks exist, and where optimization efforts should focus. **Desired outcome:** Achieve a holistic view of requests flowing through your distributed system, allowing for precise debugging, optimized performance, and improved user experiences. **Common anti-patterns** - Inconsistent instrumentation: Not all services in a distributed system are instrumented for tracing. - Ignoring latency: Only focusing on errors and not considering the latency or gradual performance degradations. **Benefits of establishing this best practice** - Comprehensive system overview: Visualizing the entire path of requests, from entry to exit. - Enhanced debugging: Quickly identifying where failures or performance issues occur. - Improved user experience: Monitoring and optimizing based on actual user data, ensuring the system meets real-world demands. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Begin by identifying all of the elements of your workload that require instrumentation. Once all components are accounted for, leverage tools such as AWS X-Ray and OpenTelemetry to gather trace data for analysis with tools like X-Ray and Amazon CloudWatch ServiceLens Map. Engage in regular reviews with developers, and supplement these discussions with tools like Amazon DevOps Guru, X-Ray Analytics, and X-Ray Insights to help uncover deeper findings. Establish alerts from trace data to notify when outcomes, as defined in the workload monitoring plan, are at risk. ### Implementation steps 1. Adopt AWS X-Ray: Integrate X-Ray into your application to gain insights into its behavior, understand its performance, and pinpoint bottlenecks. Utilize X-Ray Insights for automatic trace analysis. 2. Instrument your services: Verify that every service, from an AWS Lambda function to an EC2 instance, sends trace data. The more services you instrument, the clearer the end-to-end view. 3. Incorporate CloudWatch Real User Monitoring and synthetic monitoring: Integrate Real User Monitoring (RUM) and synthetic monitoring with X-Ray. This allows for capturing real-world user experiences and simulating user interactions to identify potential issues. 4. Use the CloudWatch agent: The agent can send traces from either X-Ray or OpenTelemetry, enhancing the depth of insights obtained. 5. Use Amazon DevOps Guru: DevOps Guru uses data from X-Ray, CloudWatch, AWS Config, and AWS CloudTrail to provide actionable recommendations. 6. Analyze traces: Regularly review the trace data to discern patterns, anomalies, or bottlenecks that might impact your application's performance. 7. Set up alerts: Configure alarms in CloudWatch for unusual patterns or extended latencies, allowing proactive issue addressing. 8. Continuous improvement: Revisit your tracing strategy as services are added or modified to capture all relevant data points. **Level of effort for the implementation plan:** Medium
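A brief sketch of step 2, instrumenting a Python service with the AWS X-Ray SDK, follows. It assumes trace segments are opened for you elsewhere (for example, by Lambda active tracing or a web framework middleware); the service and subsegment names are illustrative.

```python
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch supported libraries (boto3, requests, and others) so their outbound
# calls are recorded automatically as subsegments.
patch_all()

xray_recorder.configure(service="order-service")  # service name is illustrative

@xray_recorder.capture("process_order")           # records this function as a subsegment
def process_order(order_id: str) -> None:
    subsegment = xray_recorder.begin_subsegment("validate")
    try:
        # Annotations are indexed and searchable in the X-Ray console.
        subsegment.put_annotation("order_id", order_id)
        # ... validation and downstream calls traced via patched clients ...
    finally:
        xray_recorder.end_subsegment()
```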

๐Ÿ’ผ OPS05-BP01 Use version control

Use version control to enable tracking of changes and releases. Many AWS services offer version control capabilities. Use a revision or source control system such as Git to manage code and other artifacts such as version-controlled AWS CloudFormation templates of your infrastructure. **Desired outcome:** Your teams collaborate on code. When merged, the code is consistent and no changes are lost. Erroneous changes can easily be reverted through correct versioning. **Common anti-patterns** - You have been developing and storing your code on your workstation. You have had an unrecoverable storage failure on the workstation and your code is lost. - After overwriting the existing code with your changes, you restart your application and it is no longer operable. You are unable to revert the change. - You have a write lock on a report file that someone else needs to edit. They contact you asking that you stop work on it so that they can complete their tasks. - Your research team has been working on a detailed analysis that shapes your future work. Someone has accidentally saved their shopping list over the final report. You are unable to revert the change and have to recreate the report. **Benefits of establishing this best practice:** By using version control capabilities you can easily revert to known good states and previous versions, and limit the risk of assets being lost. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Maintain assets in version-controlled repositories. Doing so supports tracking changes, deploying new versions, detecting changes to existing versions, and reverting to prior versions (for example, rolling back to a known good state in the event of a failure). Integrate the version control capabilities of your configuration management systems into your procedures.

๐Ÿ’ผ OPS05-BP02 Test and validate changes

Every change deployed must be tested to avoid errors in production. This best practice is focused on testing changes from version control to artifact build. Besides application code changes, testing should include infrastructure, configuration, security controls, and operations procedures. Testing takes many forms, from unit tests to software component analysis (SCA). Moving tests further to the left in the software integration and delivery process results in higher certainty of artifact quality. Your organization must develop testing standards for all software artifacts. Automated tests reduce toil and avoid manual test errors. Manual tests may be necessary in some cases. Developers must have access to automated test results to create feedback loops that improve software quality. **Desired outcome:** Your software changes are tested before they are delivered. Developers have access to test results and validations. Your organization has a testing standard that applies to all software changes. **Common anti-patterns** - You deploy a new software change without any tests. It fails to run in production, which leads to an outage. - New security groups are deployed with AWS CloudFormation without being tested in a pre-production environment. The security groups make your app unreachable for your customers. - A method is modified but there are no unit tests. The software fails when it is deployed to production. **Benefits of establishing this best practice** - The change failure rate of software deployments is reduced. - Software quality is improved. - Developers have increased awareness of the viability of their code. - Security policies can be rolled out with confidence to support the organization's compliance. - Infrastructure changes such as automatic scaling policy updates are tested in advance to meet traffic needs. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Testing is done on all changes, from application code to infrastructure, as part of your continuous integration practice. Test results are published so that developers have fast feedback. Your organization has a testing standard that all changes must pass. Use the power of generative AI with Amazon Q Developer to improve developer productivity and code quality. Amazon Q Developer includes generation of code suggestions (based on large language models), production of unit tests (including boundary conditions), and code security enhancements through detection and remediation of security vulnerabilities. **Customer example** As part of their continuous integration pipeline, AnyCompany Retail conducts several types of tests on all software artifacts. They practice test-driven development, so all software has unit tests. Once the artifact is built, they run end-to-end tests. After this first round of tests is complete, they run a static application security scan, which looks for known vulnerabilities. Developers receive messages as each testing gate is passed. Once all tests are complete, the software artifact is stored in an artifact repository. ### Implementation steps 1. Work with stakeholders in your organization to develop a testing standard for software artifacts. What standard tests should all artifacts pass? Are there compliance or governance requirements that must be included in the test coverage? Do you need to conduct code quality tests? When tests complete, who needs to know? 1. 
The AWS Deployment Pipeline Reference Architecture contains an authoritative list of types of tests that can be conducted on software artifacts as part of an integration pipeline. 2. Instrument your application with the necessary tests based on your software testing standard. Each set of tests should complete in under ten minutes. Tests should run as part of an integration pipeline. 1. Use Amazon Q Developer, a generative AI tool that can help create unit test cases (including boundary conditions), generate functions using code and comments, and implement well-known algorithms. 2. Use Amazon CodeGuru Reviewer to test your application code for defects. 3. You can use AWS CodeBuild to conduct tests on software artifacts. 4. AWS CodePipeline can orchestrate your software tests into a pipeline.
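To make the idea of a testing standard concrete, here is a minimal pytest example of the kind of unit test such a standard might require, covering typical, boundary, and invalid inputs. The function under test is hypothetical.

```python
# test_discount.py -- a minimal example of a unit test a testing standard might require.
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_typical_discount():
    assert apply_discount(100.0, 20) == 80.0

@pytest.mark.parametrize("percent", [0, 100])     # boundary conditions
def test_boundary_percentages(percent):
    assert 0 <= apply_discount(100.0, percent) <= 100.0

@pytest.mark.parametrize("percent", [-1, 101])    # invalid inputs are rejected
def test_out_of_range_rejected(percent):
    with pytest.raises(ValueError):
        apply_discount(100.0, percent)
```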

๐Ÿ’ผ OPS05-BP03 Use configuration management systems

Use configuration management systems to make and track configuration changes. These systems reduce errors caused by manual processes and reduce the level of effort to deploy changes. Static configuration management sets values when initializing a resource that are expected to remain consistent throughout the resourceโ€™s lifetime. Dynamic configuration management sets values at initialization that can or are expected to change during the lifetime of a resource. For example, you could set a feature toggle to activate functionality in your code through a configuration change, or change the level of log detail during an incident. Configurations should be deployed in a known and consistent state. You should use automated inspection to continually monitor resource configurations across environments and regions. These controls should be defined as code and management automated to ensure rules are consistently applied across environments. Changes to configurations should be updated through agreed change control procedures and applied consistently, honoring version control. Application configuration should be managed independently of application and infrastructure code. This allows for consistent deployment across multiple environments. Configuration changes do not result in rebuilding or redeploying the application. **Desired outcome** - You configure, validate, and deploy as part of your continuous integration, continuous delivery (CI/CD) pipeline. - You monitor to validate configurations are correct. - This minimizes any impact to end users and customers. **Common anti-patterns** - You manually update the web server configuration across your fleet and a number of servers become unresponsive due to update errors. - You manually update your application server fleet over the course of many hours. The inconsistency in configuration during the change causes unexpected behaviors. - Someone has updated your security groups and your web servers are no longer accessible. Without knowledge of what was changed, you spend significant time investigating the issue, extending your time to recovery. - You push a pre-production configuration into production through CI/CD without validation. You expose users and customers to incorrect data and services. **Benefits of establishing this best practice** - Adopting configuration management systems reduces the level of effort to make and track changes, and the frequency of errors caused by manual procedures. - Configuration management systems provide assurances with regards to governance, compliance, and regulatory requirements. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Configuration management systems are used to track and implement changes to application and environment configurations. Configuration management systems are also used to reduce errors caused by manual processes, make configuration changes repeatable and auditable, and reduce the level of effort. On AWS, you can use AWS Config to continually monitor your AWS resource configurations across accounts and Regions. It helps you to track their configuration history, understand how a configuration change would affect other resources, and audit them against expected or desired configurations using AWS Config Rules and AWS Config Conformance Packs. 
For dynamic configurations in your applications running on Amazon EC2 instances, AWS Lambda, containers, mobile applications, or IoT devices, you can use AWS AppConfig to configure, validate, deploy, and monitor them across your environments. ### Implementation steps 1. Identify configuration owners. 1. Make configuration owners aware of any compliance, governance, or regulatory needs. 2. Identify configuration items and deliverables. 1. Configuration items are all application and environmental configurations affected by a deployment within your CI/CD pipeline. 2. Deliverables include success criteria, validation, and what to monitor. 3. Select tools for configuration management based on your business requirements and delivery pipeline. 4. Consider weighted deployments such as canary deployments for significant configuration changes to minimize the impact of incorrect configurations. 5. Integrate your configuration management into your CI/CD pipeline. 6. Validate all changes pushed.
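As a sketch of how an application might read dynamic configuration at runtime, the example below polls a feature-flag profile through the AWS AppConfig data API. The application, environment, profile, and flag names are placeholders, and a real client would cache the configuration between polls.

```python
import json
import boto3

appconfig = boto3.client("appconfigdata")

# Identifiers are placeholders for your AppConfig application, environment, and profile.
session = appconfig.start_configuration_session(
    ApplicationIdentifier="my-app",
    EnvironmentIdentifier="prod",
    ConfigurationProfileIdentifier="feature-flags",
)
token = session["InitialConfigurationToken"]

response = appconfig.get_latest_configuration(ConfigurationToken=token)
token = response["NextPollConfigurationToken"]   # reuse this token on the next poll
payload = response["Configuration"].read()       # empty if unchanged since last poll

flags = json.loads(payload) if payload else {}
if flags.get("new-checkout", {}).get("enabled"):  # hypothetical flag name
    print("New checkout flow is on")
```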

๐Ÿ’ผ OPS05-BP04 Use build and deployment management systems

Use build and deployment management systems. These systems reduce errors caused by manual processes and reduce the level of effort to deploy changes. In AWS, you can build continuous integration/continuous deployment (CI/CD) pipelines using services such as AWS Developer Tools (for example, AWS CodeBuild, AWS CodePipeline, and AWS CodeDeploy). **Desired outcome** - Your build and deployment management systems support your organization's continuous integration and continuous delivery (CI/CD) system that provides capabilities for automating safe rollouts with the correct configurations. **Common anti-patterns** - After compiling your code on your development system, you copy the executable onto your production systems and it fails to start. The local log files indicate that it has failed due to missing dependencies. - You successfully build your application with new features in your development environment and provide the code to quality assurance (QA). It fails QA because it is missing static assets. - On Friday, after much effort, you successfully built your application manually in your development environment including your newly coded features. On Monday, you are unable to repeat the steps that allowed you to successfully build your application. - You perform the tests you have created for your new release. Then you spend the next week setting up a test environment and performing all the existing integration tests followed by the performance tests. The new code has an unacceptable performance impact and must be redeveloped and then retested. **Benefits of establishing this best practice** - By providing mechanisms to manage build and deployment activities you reduce the level of effort to perform repetitive tasks, free your team members to focus on their high value creative tasks, and limit the introduction of error from manual procedures. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Build and deployment management systems are used to track and implement change, reduce errors caused by manual processes, and reduce the level of effort required for safe deployments. Fully automate the integration and deployment pipeline from code check-in through build, testing, deployment, and validation. This reduces lead time, decreases cost, encourages increased frequency of change, reduces the level of effort, and increases collaboration. ### Implementation steps 1. Use a version control system to store and manage assets (such as documents, source code, and binary files). 2. Use CodeBuild to compile your source code, run unit tests, and produce artifacts that are ready to deploy. 3. Use CodeDeploy as a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless AWS Lambda functions, or Amazon ECS. 4. Monitor your deployments.
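The sketch below shows one way a script or operations tool might interact with such a pipeline using boto3: it starts a CodePipeline execution and then reports the status of each stage. The pipeline name is a placeholder.

```python
import boto3

codepipeline = boto3.client("codepipeline")

pipeline = "retail-app-pipeline"   # placeholder pipeline name

# Trigger a release, then inspect the state of each stage.
execution = codepipeline.start_pipeline_execution(name=pipeline)
print("Started execution:", execution["pipelineExecutionId"])

state = codepipeline.get_pipeline_state(name=pipeline)
for stage in state["stageStates"]:
    latest = stage.get("latestExecution", {})
    print(f'{stage["stageName"]}: {latest.get("status", "not yet run")}')
```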

๐Ÿ’ผ OPS05-BP05 Perform patch management

Perform patch management to gain features, address issues, and remain compliant with governance. Automate patch management to reduce errors caused by manual processes, scale, and reduce the level of effort to patch. Patch and vulnerability management are part of your benefit and risk management activities. It is preferable to have immutable infrastructures and deploy workloads in verified known good states. Where that is not viable, patching in place is the remaining option. AWS Health is the authoritative source of information about planned lifecycle events and other action-required events that affect the health of your AWS Cloud resources. You should be aware of upcoming changes and updates that should be performed. Major planned lifecycle events are sent at least six months in advance. Amazon EC2 Image Builder provides pipelines to update machine images. As a part of patch management, consider Amazon Machine Images (AMIs) using an AMI image pipeline or container images with a Docker image pipeline, while AWS Lambda provides patterns for custom runtimes and additional libraries to remove vulnerabilities. You should manage updates to Amazon Machine Images for Linux or Windows Server images using Amazon EC2 Image Builder. You can use Amazon Elastic Container Registry (Amazon ECR) with your existing pipeline to manage Amazon ECS images and manage Amazon EKS images. Lambda includes version management features. Patching should not be performed on production systems without first testing in a safe environment. Patches should only be applied if they support an operational or business outcome. On AWS, you can use AWS Systems Manager Patch Manager to automate the process of patching managed systems and schedule the activity using Systems Manager Maintenance Windows. **Desired outcome** - Your AMI and container images are patched, up-to-date, and ready for launch. - You are able to track the status of all deployed images and know patch compliance. - You are able to report on current status and have a process to meet your compliance needs. **Common anti-patterns** - You are given a mandate to apply all new security patches within two hours, resulting in multiple outages due to application incompatibility with patches. - An unpatched library results in unintended consequences as unknown parties use vulnerabilities within it to access your workload. - You patch the developer environments automatically without notifying the developers. You receive multiple complaints from the developers that their environment ceases to operate as expected. - You have not patched the commercial off-the-shelf software on a persistent instance. When you have an issue with the software and contact the vendor, they notify you that the version is not supported and you have to patch to a specific level to receive any assistance. - A recently released patch for the encryption software you used has significant performance improvements. Your unpatched system has performance issues that remain in place as a result of not patching. - You are notified of a zero-day vulnerability requiring an emergency fix and you have to patch all your environments manually. - You are not aware of critical actions needed to maintain your resources, such as mandatory version updates because you do not review upcoming planned lifecycle events and other information. You lose critical time for planning and execution, resulting in emergency changes for your teams and potential impact or unexpected downtime. 
**Benefits of establishing this best practice** - By establishing a patch management process, including your criteria for patching and methodology for distribution across your environments, you can scale and report on patch levels. - Provides assurances around security patching and ensures clear visibility on the status of known fixes being in place. - Encourages adoption of desired features and capabilities, rapid removal of issues, and sustained compliance with governance. - Implement patch management systems and automation to reduce the level of effort to deploy patches and limit errors caused by manual processes. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Patch systems to remediate issues, to gain desired features or capabilities, and to remain compliant with governance policy and vendor support requirements. In immutable systems, deploy with the appropriate patch set to achieve the desired result. Automate the patch management mechanism to reduce the elapsed time to patch, to avoid errors caused by manual processes, and lower the level of effort to patch. ### Implementation steps **For Amazon EC2 Image Builder:** 1. Using Amazon EC2 Image Builder, specify pipeline details: 1. Create an image pipeline and name it. 2. Define pipeline schedule and time zone. 3. Configure any dependencies. 2. Choose a recipe: 1. Select existing recipe or create a new one. 2. Select image type. 3. Name and version your recipe. 4. Select your base image. 5. Add build components and add to target registry. 3. Optional - define your infrastructure configuration. 4. Optional - define configuration settings. 5. Review settings. 6. Maintain recipe hygiene regularly. **For Systems Manager Patch Manager:** 1. Create a patch baseline. 2. Select a patching operations method. 3. Enable compliance reporting and scanning.
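As a rough illustration of the Systems Manager Patch Manager steps, the following sketch creates a patch baseline that auto-approves critical and important security patches after seven days, then reports compliance counts for a managed instance. The baseline name, operating system, approval rule, and instance ID are assumptions.

```python
import boto3

ssm = boto3.client("ssm")

# Approve critical and important security patches seven days after release.
baseline = ssm.create_patch_baseline(
    Name="prod-linux-security-baseline",   # illustrative name
    OperatingSystem="AMAZON_LINUX_2",
    ApprovalRules={"PatchRules": [{
        "PatchFilterGroup": {"PatchFilters": [
            {"Key": "CLASSIFICATION", "Values": ["Security"]},
            {"Key": "SEVERITY", "Values": ["Critical", "Important"]},
        ]},
        "ApproveAfterDays": 7,
    }]},
)
print("Baseline:", baseline["BaselineId"])

# Report patch compliance for a managed instance (instance ID is a placeholder).
states = ssm.describe_instance_patch_states(InstanceIds=["i-0123456789abcdef0"])
for s in states["InstancePatchStates"]:
    print(s["InstanceId"], "missing:", s["MissingCount"], "failed:", s["FailedCount"])
```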

๐Ÿ’ผ OPS05-BP06 Share design standards

Share best practices across teams to increase awareness and maximize the benefits of development efforts. Document them and keep them up to date as your architecture evolves. If shared standards are enforced in your organization, itโ€™s critical that mechanisms exist to request additions, changes, and exceptions to standards. Without this option, standards become a constraint on innovation. **Desired outcome** - Design standards are shared across teams in your organizations. - They are documented and kept up-to-date as best practices evolve. **Common anti-patterns** - Two development teams have each created a user authentication service. Your users must maintain a separate set of credentials for each part of the system they want to access. - Each team manages their own infrastructure. A new compliance requirement forces a change to your infrastructure and each team implements it in a different way. **Benefits of establishing this best practice** - Using shared standards supports the adoption of best practices and maximizes the benefits of development efforts. - Documenting and updating design standards keeps your organization up-to-date with best practices and security and compliance requirements. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Share existing best practices, design standards, checklists, operating procedures, guidance, and governance requirements across teams. Have procedures to request changes, additions, and exceptions to design standards to support improvement and innovation. Make teams aware of published content. Have a mechanism to keep design standards up-to-date as new best practices emerge. **Customer example** AnyCompany Retail has a cross-functional architecture team that creates software architecture patterns. This team builds the architecture with compliance and governance built in. Teams that adopt these shared standards get the benefits of having compliance and governance built in. They can quickly build on top of the design standard. The architecture team meets quarterly to evaluate architecture patterns and update them if necessary. ### Implementation steps 1. Identify a cross-functional team that owns developing and updating design standards. This team should work with stakeholders across your organization to develop design standards, operating procedures, checklists, guidance, and governance requirements. Document the design standards and share them within your organization. 1. AWS Service Catalog can be used to create portfolios representing design standards using infrastructure as code. You can share portfolios across accounts. 2. Have a mechanism in place to keep design standards up-to-date as new best practices are identified. 3. If design standards are centrally enforced, have a process to request changes, updates, and exemptions. **Level of effort for the implementation plan:** Medium. Developing a process to create and share design standards can take coordination and cooperation with stakeholders across your organization.
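A small sketch of the AWS Service Catalog approach in step 1 follows: it creates a portfolio that a central team could populate with approved products and share across accounts. The portfolio name and provider are illustrative.

```python
import boto3

servicecatalog = boto3.client("servicecatalog")

# Create a portfolio that the architecture team can populate with approved,
# version-controlled product templates and then share to member accounts.
portfolio = servicecatalog.create_portfolio(
    DisplayName="approved-architecture-patterns",   # illustrative name
    ProviderName="Cloud Architecture Team",
    Description="Design standards published as Service Catalog products",
)
print(portfolio["PortfolioDetail"]["Id"])
```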

๐Ÿ’ผ OPS05-BP07 Implement practices to improve code quality

Implement practices to improve code quality and minimize defects. Some examples include test-driven development, code reviews, standards adoption, and pair programming. Incorporate these practices into your continuous integration and delivery process. **Desired outcome** - Your organization uses best practices like code reviews or pair programming to improve code quality. - Developers and operators adopt code quality best practices as part of the software development lifecycle. **Common anti-patterns** - You commit code to the main branch of your application without a code review. The change automatically deploys to production and causes an outage. - A new application is developed without any unit, end-to-end, or integration tests. There is no way to test the application before deployment. - Your teams make manual changes in production to address defects. Changes do not go through testing or code reviews and are not captured or logged through continuous integration and delivery processes. **Benefits of establishing this best practice** - By adopting practices to improve code quality, you can help minimize issues introduced to production. - Code quality best practices include pair programming, code reviews, and implementation of AI productivity tools. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Implement practices to improve code quality to minimize defects before they are deployed. Use practices like test-driven development, code reviews, and pair programming to increase the quality of your development. Use the power of generative AI with Amazon Q Developer to improve developer productivity and code quality. Amazon Q Developer includes generation of code suggestions (based on large language models), production of unit tests (including boundary conditions), and code security enhancements through detection and remediation of security vulnerabilities. **Customer example** AnyCompany Retail adopts several practices to improve code quality. They have adopted test-driven development as the standard for writing applications. For some new features, they will have developers pair program together during a sprint. Every pull request goes through a code review by a senior developer before being integrated and deployed. ### Implementation steps 1. Adopt code quality practices like test-driven development, code reviews, and pair programming into your continuous integration and delivery process. Use these techniques to improve software quality. 1. Use Amazon Q Developer, a generative AI tool that can help create unit test cases (including boundary conditions), generate functions using code and comments, implement well-known algorithms, detect security policy violations and vulnerabilities in your code, detect secrets, scan infrastructure as code (IaC), document code, and learn third-party code libraries more quickly. 2. Amazon CodeGuru Reviewer can provide programming recommendations for Java and Python code using machine learning. **Level of effort for the implementation plan:** Medium. There are many ways of implementing this best practice, but getting organizational adoption may be challenging.
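As an example of wiring automated review into this workflow, the sketch below associates a CodeCommit repository with Amazon CodeGuru Reviewer so that pull requests receive machine-learning-based recommendations. The repository name is a placeholder.

```python
import boto3

reviewer = boto3.client("codeguru-reviewer")

# Associate a CodeCommit repository so its pull requests receive automated
# review recommendations (repository name is a placeholder).
association = reviewer.associate_repository(
    Repository={"CodeCommit": {"Name": "retail-app"}}
)
print(association["RepositoryAssociation"]["State"])
```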

๐Ÿ’ผ OPS05-BP08 Use multiple environment

Use multiple environments to experiment, develop, and test your workload. Use increasing levels of controls as environments approach production to gain confidence your workload operates as intended when deployed. **Desired outcome:** You have multiple environments that reflect your compliance and governance needs. You test and promote code through environments on your path to production. 1. Your organization does this through the establishment of a landing zone, which provides governance, controls, account automations, networking, security, and operational observability. Manage these landing zone capabilities by using multiple environments. A common example is a sandbox organization for developing and testing changes to an AWS Control Tower-based landing zone, which includes AWS IAM Identity Center and policies such as service control policies (SCPs). All of these elements can significantly impact the access to and operation of AWS accounts within the landing zone. 2. In addition to these services, your teams extend the landing zone capabilities with solutions published by AWS and AWS partners or as custom solutions developed within your organization. Examples of solutions published by AWS include Customizations for AWS Control Tower (CfCT) and AWS Control Tower Account Factory for Terraform (AFT). 3. Your organization applies the same principles of testing, promoting code, and policy changes for the landing zone through environments on your path to production. This strategy provides a stable and secure landing zone environment for your application and workload teams. **Common anti-patterns** - You are performing development in a shared development environment and another developer overwrites your code changes. - The restrictive security controls on your shared development environment are preventing you from experimenting with new services and features. - You perform load testing on your production systems and cause an outage for your users. - A critical error resulting in data loss has occurred in production. In your production environment, you attempt to recreate the conditions that lead to the data loss so that you can identify how it happened and prevent it from happening again. To prevent further data loss during testing, you are forced to make the application unavailable to your users. - You are operating a multi-tenant service and are unable to support a customer request for a dedicated environment. - You may not always test, but when you do, you test in your production environment. - You believe that the simplicity of a single environment overrides the scope of impact of changes within the environment. - You upgrade a key landing zone capability, but the change impairs your team's ability to vend accounts for either new projects or your existing workloads. - You apply new controls to your AWS accounts, but the change impacts your workload team's ability to deploy changes within their AWS accounts. **Benefits of establishing this best practice** - When you deploy multiple environments, you can support multiple simultaneous development, testing, and production environments without creating conflicts between developers or user communities. - For complex capabilities such as landing zones, it significantly reduces the risk of changes, simplifies the improvement process, and reduces the risk of critical updates to the environment. 
- Organizations that use landing zones naturally benefit from a multi-account AWS environment, with account structure, governance, network, and security configurations. - Over time, as your organization grows, the landing zone can evolve to secure and organize your workloads and resources. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Use multiple environments and provide developers sandbox environments with minimized controls to aid in experimentation. Provide individual development environments to help work in parallel, increasing development agility. Implement more rigorous controls in the environments approaching production to gain confidence that your workload operates as intended when deployed. Use infrastructure as code and configuration management systems to deploy environments that are configured consistently with the controls present in production to ensure systems operate as expected when deployed. When environments are not in use, turn them off to avoid costs associated with idle resources (for example, development systems on evenings and weekends). Deploy production-equivalent environments when load testing to obtain valid results. Teams such as platform engineering, networking, and security operations often manage capabilities at the organization level with distinct requirements. A separation of accounts alone is insufficient to provide and maintain separate environments for experimentation, development, and testing. In such cases, create separate instances of AWS Organizations.
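One common cost control mentioned above is turning off idle development resources outside working hours. The following sketch, intended to run on a schedule (for example, from an EventBridge-triggered Lambda function), stops running EC2 instances that carry a hypothetical Environment=dev tag.

```python
import boto3

ec2 = boto3.client("ec2")

def stop_idle_dev_instances() -> None:
    """Stop running instances tagged Environment=dev (tag key/value are assumptions)."""
    paginator = ec2.get_paginator("describe_instances")
    instance_ids = []
    for page in paginator.paginate(Filters=[
        {"Name": "tag:Environment", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]):
        for reservation in page["Reservations"]:
            instance_ids.extend(i["InstanceId"] for i in reservation["Instances"])
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

# Run on a schedule (evenings and weekends) to avoid paying for idle capacity.
stop_idle_dev_instances()
```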

๐Ÿ’ผ OPS05-BP09 Make frequent, small, reversible changes

Frequent, small, and reversible changes reduce the scope and impact of a change. When used in conjunction with change management systems, configuration management systems, and build and delivery systems, they result in more effective troubleshooting and faster remediation, with the option to roll back changes. **Common anti-patterns:** - You deploy a new version of your application quarterly with a change window that means a core service is turned off. - You frequently make changes to your database schema without tracking changes in your management systems. - You perform manual in-place updates, overwriting existing installations and configurations, and have no clear roll-back plan. **Benefits of establishing this best practice:** Development efforts are faster when you deploy small changes frequently. When the changes are small, it is much easier to identify if they have unintended consequences, and they are easier to reverse. When the changes are reversible, there is less risk in implementing the change, as recovery is simplified. The change process has a reduced risk and the impact of a failed change is reduced. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Use frequent, small, and reversible changes to reduce the scope and impact of a change. This eases troubleshooting, helps with faster remediation, and provides the option to roll back a change. It also increases the rate at which you can deliver value to the business.

๐Ÿ’ผ OPS05-BP10 Fully automate integration and deployment

Automate build, deployment, and testing of the workload. This reduces errors caused by manual processes and reduces the effort to deploy changes. Apply metadata using Resource Tags and AWS Resource Groups following a consistent tagging strategy to aid in identification of your resources. Tag your resources for organization, cost accounting, access controls, and targeting the run of automated operations activities. **Desired outcome:** Developers use tools to deliver code and promote through to production. Developers do not have to log into the AWS Management Console to deliver updates. There is a full audit trail of change and configuration, meeting the needs of governance and compliance. Processes are repeatable and are standardized across teams. Developers are free to focus on development and code pushes, increasing productivity. **Common anti-patterns:** - On Friday, you finish authoring the new code for your feature branch. On Monday, after running your code quality test scripts and each of your unit tests scripts, you check in your code for the next scheduled release. - You are assigned to code a fix for a critical issue impacting a large number of customers in production. After testing the fix, you commit your code and email change management to request approval to deploy it to production. - As a developer, you log into the AWS Management Console to create a new development environment using non-standard methods and systems. **Benefits of establishing this best practice:** By implementing automated build and deployment management systems, you reduce errors caused by manual processes and reduce the effort to deploy changes helping your team members to focus on delivering business value. You increase the speed of delivery as you promote through to production. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance You use build and deployment management systems to track and implement change, to reduce errors caused by manual processes, and reduce the level of effort. Fully automate the integration and deployment pipeline from code check-in through build, testing, deployment, and validation. This reduces lead time, encourages increased frequency of change, reduces the level of effort, increases the speed to market, results in increased productivity, and increases the security of your code as you promote through to production.
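As a sketch of applying a consistent tagging strategy from automation rather than by hand, the example below uses the Resource Groups Tagging API to apply the same metadata to several resources in one call. The ARNs and tag keys and values are placeholders.

```python
import boto3

tagging = boto3.client("resourcegroupstaggingapi")

# ARNs and tag values are placeholders; a pipeline would typically derive them
# from its deployment manifest so every resource it creates carries the same metadata.
response = tagging.tag_resources(
    ResourceARNList=[
        "arn:aws:lambda:us-east-1:111122223333:function:checkout",
        "arn:aws:dynamodb:us-east-1:111122223333:table/orders",
    ],
    Tags={"CostCenter": "retail-1234", "Environment": "prod", "Automation": "patch-group-a"},
)
print(response["FailedResourcesMap"] or "all resources tagged")
```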

๐Ÿ’ผ OPS06-BP01 Plan for unsuccessful changes

Plan to revert to a known good state, or remediate in the production environment if the deployment causes an undesired outcome. Having a policy to establish such a plan helps all teams develop strategies to recover from failed changes. Some example strategies are deployment and rollback steps, change policies, feature flags, traffic isolation, and traffic shifting. A single release may include multiple related component changes. The strategy should provide the ability to withstand or recover from a failure of any component change. **Desired outcome:** You have prepared a detailed recovery plan for your change in the event it is unsuccessful. In addition, you have reduced the size of your release to minimize the potential impact on other workload components. As a result, you have reduced your business impact by shortening the potential downtime caused by a failed change and increased the flexibility and efficiency of recovery times. **Common anti-patterns:** - You performed a deployment and your application has become unstable but there appear to be active users on the system. You have to decide whether to rollback the change and impact the active users or wait to rollback the change knowing the users may be impacted regardless. - After making a routine change, your new environments are accessible, but one of your subnets has become unreachable. You have to decide whether to rollback everything or try to fix the inaccessible subnet. While you are making that determination, the subnet remains unreachable. - Your systems are not architected in a way that allows them to be updated with smaller releases. As a result, you have difficulty in reversing those bulk changes during a failed deployment. - You do not use infrastructure as code (IaC) and you made manual updates to your infrastructure that resulted in an undesired configuration. You are unable to effectively track and revert the manual changes. - Because you have not measured increased frequency of your deployments, your team is not incentivized to reduce the size of their changes and improve their rollback plans for each change, leading to more risk and increased failure rates. - You do not measure the total duration of an outage caused by unsuccessful changes. Your team is unable to prioritize and improve its deployment process and recovery plan effectiveness. **Benefits of establishing this best practice:** Having a plan to recover from unsuccessful changes minimizes the mean time to recover (MTTR) and reduces your business impact. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance A consistent, documented policy and practice adopted by release teams allows an organization to plan what should happen if unsuccessful changes occur. The policy should allow for fixing forward in specific circumstances. In either situation, a fix forward or rollback plan should be well documented and tested before deployment to live production so that the time it takes to revert a change is minimized. ### Implementation steps 1. Document the policies that require teams to have effective plans to reverse changes within a specified period. 1. Policies should specify when a fix-forward situation is allowed. 2. Require a documented rollback plan to be accessible by all involved. 3. Specify the requirements to rollback (for example, when it is found that unauthorized changes have been deployed). 2. Analyze the level of impact of all changes related to each component of a workload. 1. 
Allow repeatable changes to be standardized, templated, and preauthorized if they follow a consistent workflow that enforces change policies. 2. Reduce the potential impact of any change by making the size of the change smaller so recovery takes less time and causes less business impact. 3. Ensure rollback procedures revert code to the known good state to avoid incidents where possible. 3. Integrate tools and workflows to enforce your policies programmatically. 4. Make data about changes visible to other workload owners to improve the speed of diagnosis of any failed change that cannot be rolled back. 1. Measure success of this practice using visible change data and identify iterative improvements. 5. Use monitoring tools to verify the success or failure of a deployment to speed up decision-making on rolling back. 6. Measure your duration of outage during an unsuccessful change to continually improve your recovery plans. **Level of effort for the implementation plan:** Medium
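To illustrate how rollback policy can be enforced programmatically, the sketch below starts an AWS CodeDeploy deployment with automatic rollback enabled on failure or on alarm. The application, deployment group, and revision details are placeholders, and the alarms that trigger DEPLOYMENT_STOP_ON_ALARM are assumed to be configured on the deployment group.

```python
import boto3

codedeploy = boto3.client("codedeploy")

# Application, deployment group, and S3 revision names are placeholders.
codedeploy.create_deployment(
    applicationName="retail-app",
    deploymentGroupName="prod-fleet",
    revision={"revisionType": "S3",
              "s3Location": {"bucket": "example-artifacts",
                             "key": "retail-app-1.4.2.zip",
                             "bundleType": "zip"}},
    autoRollbackConfiguration={
        "enabled": True,
        # Roll back automatically if the deployment fails or a monitored alarm fires.
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
    },
)
```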

💼 OPS06-BP02 Test deployments

Test release procedures in pre-production by using the same deployment configuration, security controls, steps, and procedures as in production. Validate that all deployed steps are completed as expected, such as inspecting files, configurations, and services. Further test all changes with functional, integration, and load tests, along with any monitoring such as health checks. By doing these tests, you can identify deployment issues early with an opportunity to plan and mitigate them prior to production.

You can create temporary parallel environments for testing every change. Automate the deployment of the test environments using infrastructure as code (IaC) to help reduce the amount of work involved and ensure stability, consistency, and faster feature delivery.

**Desired outcome:** Your organization adopts a test-driven development culture that includes testing deployments. This ensures teams are focused on delivering business value rather than managing releases. Teams are engaged early upon identification of deployment risks to determine the appropriate course of mitigation.

**Common anti-patterns:**

- During production releases, untested deployments cause frequent issues that require troubleshooting and escalation.
- Your release contains infrastructure as code (IaC) that updates existing resources. You are unsure if the IaC runs successfully or causes impact to the resources.
- You deploy a new feature to your application. It doesn't work as intended and there is no visibility until it gets reported by impacted users.
- You update your certificates. You accidentally install the certificates to the wrong components, which goes undetected and impacts website visitors because a secure connection to the website can't be established.

**Benefits of establishing this best practice:** Extensive testing in pre-production of deployment procedures, and of the changes introduced by them, minimizes the potential impact to production caused by the deployment steps. This increases confidence during production release and minimizes operational support without slowing down the velocity of the changes being delivered.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Testing your deployment process is as important as testing the changes that result from your deployment. This can be achieved by testing your deployment steps in a pre-production environment that mirrors production as closely as possible. Common issues, such as incomplete or incorrect deployment steps or misconfigurations, can be caught before going to production. In addition, you can test your recovery steps.

**Customer example**

As part of its continuous integration and continuous delivery (CI/CD) pipeline, AnyCompany Retail performs the defined steps needed to release infrastructure and software updates for its customers in a production-like environment. The pipeline comprises pre-checks to detect drift (changes to resources performed outside of your IaC) in resources prior to deployment, as well as validation of the actions that the IaC takes upon its initiation. It validates deployment steps, such as verifying that certain files and configurations are in place and that services are in running states and responding correctly to health checks on localhost before re-registering with the load balancer. Additionally, all changes trigger a number of automated tests, such as functional, security, regression, integration, and load tests.

### Implementation steps

1. Perform pre-install checks to mirror the pre-production environment to production (see the drift-detection sketch after this list).
   1. Use drift detection to detect when resources have been changed outside of AWS CloudFormation.
   2. Use change sets to validate that the intent of a stack update matches the actions that AWS CloudFormation takes when the change set is initiated.
2. Trigger a manual approval step in AWS CodePipeline to authorize the deployment to the pre-production environment.
3. Use deployment configurations such as AWS CodeDeploy AppSpec files to define deployment and validation steps.
4. Where applicable, integrate AWS CodeDeploy with other AWS services or with partner products and services.
5. Monitor deployments using Amazon CloudWatch, AWS CloudTrail, and Amazon SNS event notifications.
6. Perform post-deployment automated testing, including functional, security, regression, integration, and load testing.
7. Troubleshoot deployment issues.
8. Successful validation of the preceding steps should initiate a manual approval workflow to authorize deployment to production.

**Level of effort for the implementation plan:** High
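
The following is a minimal sketch of step 1, assuming boto3 credentials and an existing stack; the stack name and template URL are illustrative placeholders, not values from the source document.

```python
# Pre-install checks: detect drift, then preview an update with a change set.
import time
import boto3

cfn = boto3.client("cloudformation")
STACK = "pre-prod-stack"  # hypothetical stack name

# 1. Detect drift: resources changed outside of CloudFormation.
detection_id = cfn.detect_stack_drift(StackName=STACK)["StackDriftDetectionId"]
while True:
    status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)
print("Stack drift status:", status.get("StackDriftStatus"))

# 2. Create a change set to see what an update would actually do before running it.
cfn.create_change_set(
    StackName=STACK,
    ChangeSetName="release-candidate",
    TemplateURL="https://s3.amazonaws.com/my-bucket/template.yaml",  # hypothetical
)
cfn.get_waiter("change_set_create_complete").wait(
    StackName=STACK, ChangeSetName="release-candidate"
)
changes = cfn.describe_change_set(StackName=STACK, ChangeSetName="release-candidate")
for change in changes["Changes"]:
    rc = change["ResourceChange"]
    print(rc["Action"], rc["LogicalResourceId"])
```

In a pipeline, a non-empty drift result or an unexpected change set entry would typically fail the pre-check stage before any deployment begins.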

💼 OPS06-BP03 Employ safe deployment strategies

Safe production roll-outs control the flow of beneficial changes with an aim to minimize any perceived impact for customers from those changes. The safety controls provide inspection mechanisms to validate desired outcomes and limit the scope of impact from any defects introduced by the changes or from deployment failures. Safe roll-outs may include strategies such as feature flags, one-box, rolling (canary releases), immutable, traffic splitting, and blue/green deployments.

**Desired outcome:** Your organization uses a continuous integration and continuous delivery (CI/CD) system that provides capabilities for automating safe rollouts. Teams are required to use appropriate safe roll-out strategies.

**Common anti-patterns:**

1. You deploy an unsuccessful change to all of production all at once. As a result, all customers are impacted simultaneously.
2. A defect introduced in a simultaneous deployment to all systems requires an emergency release. Correcting it for all customers takes several days.
3. Managing a production release requires planning and participation of several teams. This puts constraints on your ability to frequently update features for your customers.
4. You perform a mutable deployment by modifying your existing systems. After discovering that the change was unsuccessful, you are forced to modify the systems again to restore the old version, extending your time to recovery.

**Benefits of establishing this best practice:** Automated deployments balance speed of roll-outs against delivering beneficial changes consistently to customers. Limiting impact prevents costly deployment failures and maximizes teams' ability to efficiently respond to failures.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Continuous-delivery failures can lead to reduced service availability and bad customer experiences. To maximize the rate of successful deployments, implement safety controls in the end-to-end release process to minimize deployment errors, with a goal of achieving zero deployment failures.

**Customer example**

AnyCompany Retail is on a mission to achieve minimal to zero downtime deployments, meaning that there's no perceivable impact to its users during deployments. To accomplish this, the company has established deployment patterns (see the following workflow diagram), such as rolling and blue/green deployments. All teams adopt one or more of these patterns in their CI/CD pipeline.

### Implementation steps

1. Use an approval workflow to initiate the sequence of production roll-out steps upon promotion to production.
2. Use an automated deployment system such as AWS CodeDeploy. AWS CodeDeploy deployment options include in-place deployments for EC2/On-Premises and blue/green deployments for EC2/On-Premises, AWS Lambda, and Amazon ECS (see the preceding workflow diagram; a canary sketch follows this list).
   - Where applicable, integrate AWS CodeDeploy with other AWS services or with partner products and services.
3. Use blue/green deployments for databases such as Amazon Aurora and Amazon RDS.
4. Monitor deployments using Amazon CloudWatch, AWS CloudTrail, and Amazon Simple Notification Service (Amazon SNS) event notifications.
5. Perform post-deployment automated testing, including functional, security, regression, integration, and any load tests.
6. Troubleshoot deployment issues.

**Level of effort for the implementation plan:** Medium
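
As a hedged illustration of one safe roll-out strategy from step 2, the sketch below starts a CodeDeploy canary deployment that shifts 10% of Lambda traffic to a new version before shifting the rest. It assumes boto3 credentials and an existing CodeDeploy application, deployment group, Lambda alias, and versions; every name and version number is hypothetical.

```python
# Canary traffic shifting for a Lambda function via AWS CodeDeploy.
import json
import boto3

codedeploy = boto3.client("codedeploy")

# Lambda-style AppSpec describing which alias and versions to shift between.
appspec = {
    "version": 0.0,
    "Resources": [{
        "myFunction": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Name": "my-function",      # hypothetical function name
                "Alias": "live",
                "CurrentVersion": "7",      # hypothetical versions
                "TargetVersion": "8",
            },
        }
    }],
}

codedeploy.create_deployment(
    applicationName="my-lambda-app",        # hypothetical application
    deploymentGroupName="prod",             # hypothetical deployment group
    # Built-in config: 10% of traffic for 5 minutes, then the remaining 90%.
    deploymentConfigName="CodeDeployDefault.LambdaCanary10Percent5Minutes",
    revision={
        "revisionType": "AppSpecContent",
        "appSpecContent": {"content": json.dumps(appspec)},
    },
)
```

Pairing this with the alarm-based auto-rollback shown under OPS06-BP01 limits the blast radius to the canary slice if the new version misbehaves.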

💼 OPS06-BP04 Automate testing and rollback

To increase the speed, reliability, and confidence of your deployment process, have a strategy for automated testing and rollback capabilities in pre-production and production environments. Automate testing when deploying to production to simulate human and system interactions that verify the changes being deployed. Automate rollback to revert to a previous known good state quickly. The rollback should be initiated automatically on pre-defined conditions, such as when the desired outcome of your change is not achieved or when the automated test fails. Automating these two activities improves your success rate for your deployments, minimizes recovery time, and reduces the potential impact to the business.

**Desired outcome:** Your automated tests and rollback strategies are integrated into your continuous integration and continuous delivery (CI/CD) pipeline. Your monitoring is able to validate against your success criteria and initiate automatic rollback upon failure. This minimizes any impact to end users and customers. For example, when all testing outcomes have been satisfied, you promote your code into the production environment where automated regression testing is initiated, leveraging the same test cases. If regression test results do not match expectations, then automated rollback is initiated in the pipeline workflow.

**Common anti-patterns:**

- Your systems are not architected in a way that allows them to be updated with smaller releases. As a result, you have difficulty reversing those bulk changes during a failed deployment.
- Your deployment process consists of a series of manual steps. After you deploy changes to your workload, you start post-deployment testing. After testing, you realize that your workload is inoperable and customers are disconnected. You then begin rolling back to the previous version. All of these manual steps delay overall system recovery and cause a prolonged impact to your customers.
- You spent time developing automated test cases for functionality that is not frequently used in your application, minimizing the return on investment in your automated testing capability.
- Your release comprises application, infrastructure, patch, and configuration updates that are independent from one another. However, you have a single CI/CD pipeline that delivers all changes at once. A failure in one component forces you to revert all changes, making your rollback complex and inefficient.
- Your team completes the coding work in sprint one and begins sprint two work, but your plan did not include testing until sprint three. As a result, automated tests revealed defects from sprint one that had to be resolved before testing of sprint two deliverables could be started, and the entire release is delayed, devaluing your automated testing.
- Your automated regression test cases for the production release are complete, but you are not monitoring workload health. Since you have no visibility into whether or not the service has restarted, you are not sure if rollback is needed or if it has already occurred.

**Benefits of establishing this best practice:** Automated testing increases the transparency of your testing process and your ability to cover more features in a shorter time period. By testing and validating changes in production, you are able to identify issues immediately. Improvement in consistency with automated testing tools allows for better detection of defects. By automatically rolling back to the previous version, the impact on your customers is minimized. Automated rollback ultimately inspires more confidence in your deployment capabilities by reducing business impact. Overall, these capabilities reduce time-to-delivery while ensuring quality.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Automate testing of deployed environments to confirm desired outcomes more quickly. Automate rollback to a previous known good state when pre-defined outcomes are not achieved to minimize recovery time and reduce errors caused by manual processes. Integrate testing tools with your pipeline workflow to consistently test and minimize manual inputs. Prioritize automating test cases, such as those that mitigate the greatest risks and need to be tested frequently with every change. Additionally, automate rollback based on specific conditions that are pre-defined in your test plan.

### Implementation steps

1. Establish a testing lifecycle for your development lifecycle that defines each stage of the testing process, from requirements planning to test case development, tool configuration, automated testing, and test case closure.
   1. Create a workload-specific testing approach from your overall test strategy.
   2. Consider a continuous testing strategy where appropriate throughout the development lifecycle.
2. Select automated tools for testing and rollback based on your business requirements and pipeline investments.
3. Decide which test cases you wish to automate and which should be performed manually. These can be defined based on the business value priority of the feature being tested. Align all team members to this plan and verify accountability for performing manual tests.
   1. Apply automated testing capabilities to specific test cases that make sense for automation, such as repeatable or frequently run cases, those that require repetitive tasks, or those that are required across multiple configurations.
   2. Define test automation scripts as well as the success criteria in the automation tool so continued workflow automation can be initiated when specific cases fail.
   3. Define specific failure criteria for automated rollback.
4. Prioritize test automation to drive consistent results with thorough test case development where complexity and human interaction have a higher risk of failure.
5. Integrate your automated testing and rollback tools into your CI/CD pipeline.
   1. Develop clear success criteria for your changes.
   2. Monitor and observe to detect these criteria and automatically reverse changes when specific rollback criteria are met (see the smoke-test sketch after this list).
6. Perform different types of automated production testing, such as:
   1. A/B testing to show results in comparison to the current version between two user testing groups.
   2. Canary testing that allows you to roll out your change to a subset of users before releasing it to all.
   3. Feature-flag testing, which allows a single feature of the new version at a time to be flagged on and off from outside the application so that each new feature can be validated one at a time.
   4. Regression testing to verify new functionality with existing interrelated components.
7. Monitor the operational aspects of the application, transactions, and interactions with other applications and components. Develop reports to show the success of changes by workload so that you can identify what parts of the automation and workflow can be further optimized.
   1. Develop test result reports that help you make quick decisions on whether or not rollback procedures should be invoked.
   2. Implement a strategy that allows for automated rollback based upon pre-defined failure conditions that result from one or more of your test methods.
8. Develop your automated test cases to allow for reusability across future repeatable changes.

**Level of effort for the implementation plan:** Medium
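
The sketch below is one hedged way to wire a post-deployment test to an automated rollback (step 5): a tiny smoke test that, on failure, stops the in-flight CodeDeploy deployment with rollback enabled and fails the pipeline stage. The health URL and the way the deployment ID is passed in are assumptions for illustration.

```python
# Smoke test + automated rollback trigger for a CodeDeploy deployment.
import sys
import urllib.request
import boto3

HEALTH_URL = "https://example.com/health"   # hypothetical endpoint for the new version
deployment_id = sys.argv[1]                 # e.g. passed in by the CI/CD stage

def healthy() -> bool:
    """Very small smoke test: the endpoint must answer HTTP 200 within 5 seconds."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

if not healthy():
    # Stop the in-flight deployment and let CodeDeploy roll back automatically.
    codedeploy = boto3.client("codedeploy")
    codedeploy.stop_deployment(deploymentId=deployment_id, autoRollbackEnabled=True)
    sys.exit(1)  # also fail the pipeline stage so the release is marked unsuccessful
```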

💼 OPS07-BP01 Ensure personnel capability

Have a mechanism to validate that you have the appropriate number of trained personnel to support the workload. They must be trained on the platform and services that make up your workload. Provide them with the knowledge necessary to operate the workload. You must have enough trained personnel to support the normal operation of the workload and troubleshoot any incidents that occur. Have enough personnel so that you can rotate during on-call and vacations to avoid burnout.

**Desired outcome**

- There are enough trained personnel to support the workload at times when the workload is available.
- You provide training for your personnel on the software and services that make up your workload.

**Common anti-patterns**

- Deploying a workload without team members trained to operate the platform and services in use.
- Not having enough personnel to support on-call rotations or personnel taking time off.

**Benefits of establishing this best practice**

- Having skilled team members helps you effectively support your workload.
- With enough team members, you can support the workload and on-call rotations while decreasing the risk of burnout.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Validate that there are sufficient trained personnel to support the workload. Verify that you have enough team members to cover normal operational activities, including on-call rotations.

**Customer example**

AnyCompany Retail makes sure that teams supporting the workload are properly staffed and trained. They have enough engineers to support an on-call rotation. Personnel get training on the software and platform that the workload is built on and are encouraged to earn certifications. There are enough personnel so that people can take time off while still supporting the workload and the on-call rotation.

### Implementation steps

1. Assign an adequate number of personnel to operate and support your workload, including on-call duties, security issues, and lifecycle events, such as end-of-support and certificate rotation tasks.
2. Train your personnel on the software and platforms that compose your workload.
   1. AWS Training and Certification has a library of courses about AWS. They provide free and paid courses, online and in person.
   2. AWS hosts events and webinars where you can learn from AWS experts.
3. Perform the following on a regular basis:
   1. Evaluate team size and skills as operating conditions and the workload change.
   2. Adjust team size and skills to match operational requirements.
   3. Verify the ability and capacity to address planned lifecycle events, unplanned security issues, and operational notifications through AWS Health.

**Level of effort for the implementation plan:** High. Hiring and training a team to support a workload can take significant effort but has substantial long-term benefits.

💼 OPS07-BP02 Ensure a consistent review of operational readiness

Use Operational Readiness Reviews (ORRs) to validate that you can operate your workload. ORR is a mechanism developed at Amazon to validate that teams can safely operate their workloads. An ORR is a review and inspection process using a checklist of requirements. An ORR is a self-service experience that teams use to certify their workloads. ORRs include best practices from lessons learned from our years of building software.

An ORR checklist is composed of architectural recommendations, operational processes, event management, and release quality. Our Correction of Error (CoE) process is a major driver of these items. Your own post-incident analysis should drive the evolution of your own ORR. An ORR is not only about following best practices but also preventing the recurrence of events that you've seen before. Lastly, security, governance, and compliance requirements can also be included in an ORR.

Run ORRs before a workload launches to general availability and then throughout the software development lifecycle. Running the ORR before launch increases your ability to operate the workload safely. Periodically re-run your ORR on the workload to catch any drift from best practices. You can have ORR checklists for new service launches and ORRs for periodic reviews. This helps you stay up to date on new best practices that arise and incorporate lessons learned from post-incident analysis. As your use of the cloud matures, you can build ORR requirements into your architecture as defaults.

**Desired outcome**

- You have an ORR checklist with best practices for your organization. ORRs are conducted before workloads launch.
- ORRs are run periodically over the course of the workload lifecycle.

**Common anti-patterns**

- You launch a workload without knowing if you can operate it.
- Governance and security requirements are not included in certifying a workload for launch.
- Workloads are not re-evaluated periodically.
- Workloads launch without required procedures in place.
- You see repetition of the same root cause failures in multiple workloads.

**Benefits of establishing this best practice**

- Your workloads include architecture, process, and management best practices.
- Lessons learned are incorporated into your ORR process.
- Required procedures are in place when workloads launch.
- ORRs are run throughout the software lifecycle of your workloads.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

An ORR is two things: a process and a checklist. Your ORR process should be adopted by your organization and supported by an executive sponsor. At a minimum, ORRs must be conducted before a workload launches to general availability. Run the ORR throughout the software development lifecycle to keep it up to date with best practices or new requirements. The ORR checklist should include configuration items, security and governance requirements, and best practices from your organization. Over time, you can use services such as AWS Config, AWS Security Hub, and AWS Control Tower Guardrails to build best practices from the ORR into guardrails for automatic detection of best practices (a minimal AWS Config sketch follows the implementation steps).

**Customer example**

After several production incidents, AnyCompany Retail decided to implement an ORR process. They built a checklist composed of best practices, governance and compliance requirements, and lessons learned from outages. New workloads conduct ORRs before they launch. Every workload conducts a yearly ORR with a subset of best practices to incorporate new best practices and requirements that are added to the ORR checklist. Over time, AnyCompany Retail used AWS Config to detect some best practices, speeding up the ORR process.

### Implementation steps

To learn more about ORRs, read the Operational Readiness Reviews (ORR) whitepaper. It provides detailed information on the history of the ORR process, how to build your own ORR practice, and how to develop your ORR checklist. The following steps are an abbreviated version of that document. For an in-depth understanding of what ORRs are and how to build your own, we recommend reading that whitepaper.

1. Gather the key stakeholders together, including representatives from security, operations, and development.
2. Have each stakeholder provide at least one requirement. For the first iteration, try to limit the number of items to thirty or less.
   - Appendix B: Example ORR questions from the Operational Readiness Reviews (ORR) whitepaper contains sample questions that you can use to get started.
3. Collect your requirements into a spreadsheet.
   - You can use custom lenses in the AWS Well-Architected Tool to develop your ORR and share them across your accounts and AWS Organization.
4. Identify one workload to conduct the ORR on. A pre-launch workload or an internal workload is ideal.
5. Run through the ORR checklist and take note of any discoveries made. Discoveries might be acceptable if a mitigation is in place. For any discovery that lacks a mitigation, add it to your backlog of items and implement it before launch.
6. Continue to add best practices and requirements to your ORR checklist over time.

Customers with Enterprise Support can request the Operational Readiness Review Workshop from their Technical Account Manager. The workshop is an interactive working backwards session to develop your own ORR checklist.

**Level of effort for the implementation plan:** High. Adopting an ORR practice in your organization requires executive sponsorship and stakeholder buy-in. Build and update the checklist with inputs from across your organization.
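
As a hedged sketch of turning ORR checklist items into automated checks, the snippet below reads compliance results for a couple of AWS Config rules. It assumes boto3 credentials and that AWS Config is already recording with rules deployed; the rule names are illustrative examples of checks an ORR might automate, not rules named in the source document.

```python
# Summarize compliance for Config rules that back ORR checklist items.
import boto3

config = boto3.client("config")

orr_rules = [
    "s3-bucket-server-side-encryption-enabled",   # example rule name (hypothetical)
    "cloud-trail-enabled",                        # example rule name (hypothetical)
]

response = config.describe_compliance_by_config_rule(ConfigRuleNames=orr_rules)
for item in response["ComplianceByConfigRules"]:
    name = item["ConfigRuleName"]
    status = item["Compliance"]["ComplianceType"]   # e.g. COMPLIANT / NON_COMPLIANT
    print(f"{name}: {status}")
```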

💼 OPS07-BP03 Use runbooks to perform procedures

A runbook is a documented process to achieve a specific outcome. Runbooks consist of a series of steps that someone follows to get something done. Runbooks have been used in operations going back to the early days of aviation. In cloud operations, we use runbooks to reduce risk and achieve desired outcomes. At its simplest, a runbook is a checklist to complete a task.

Runbooks are an essential part of operating your workload. From onboarding a new team member to deploying a major release, runbooks are the codified processes that provide consistent outcomes no matter who uses them. Runbooks should be published in a central location and updated as the process evolves, as updating runbooks is a key component of a change management process. They should also include guidance on error handling, tools, permissions, exceptions, and escalations in case a problem occurs.

As your organization matures, begin automating runbooks. Start with runbooks that are short and frequently used. Use scripting languages to automate steps or make steps easier to perform. As you automate the first few runbooks, you'll dedicate time to automating more complex runbooks. Over time, most of your runbooks should be automated in some way.

**Desired outcome**

- Your team has a collection of step-by-step guides for performing workload tasks.
- The runbooks contain the desired outcome, necessary tools and permissions, and instructions for error handling.
- They are stored in a central location (version control system) and updated frequently. For example, your runbooks provide capabilities for your teams to monitor, communicate, and respond to AWS Health events for critical accounts during application alarms, operational issues, and planned lifecycle events.

**Common anti-patterns**

- Relying on memory to complete each step of a process.
- Manually deploying changes without a checklist.
- Different team members performing the same process but with different steps or outcomes.
- Letting runbooks drift out of sync with system changes and automation.

**Benefits of establishing this best practice**

- Reduced error rates for manual tasks.
- Operations are performed in a consistent manner.
- New team members can start performing tasks sooner.
- Runbooks can be automated to reduce toil.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Runbooks can take several forms depending on the maturity level of your organization. At a minimum, they should consist of a step-by-step text document. The desired outcome should be clearly indicated. Clearly document necessary special permissions or tools. Provide detailed guidance on error handling and escalations in case something goes wrong. List the runbook owner and publish it in a central location. Once your runbook is documented, validate it by having someone else on your team run it. As procedures evolve, update your runbooks in accordance with your change management process.

Your text runbooks should be automated as your organization matures. Using services like AWS Systems Manager Automation, you can transform flat text into automations that can be run against your workload. These automations can be run in response to events, reducing the operational burden to maintain your workload. AWS Systems Manager Automation also provides a low-code visual design experience to create automation runbooks more easily.

**Customer example**

AnyCompany Retail must perform database schema updates during software deployments. The Cloud Operations Team worked with the Database Administration Team to build a runbook for manually deploying these changes. The runbook listed each step in the process in checklist form. It included a section on error handling in case something went wrong. They published the runbook on their internal wiki along with their other runbooks. The Cloud Operations Team plans to automate the runbook in a future sprint.

### Implementation steps

If you don't have an existing document repository, a version control repository is a great place to start building your runbook library. You can build your runbooks using Markdown. We have provided an example runbook template that you can use to start building runbooks.

1. If you don't have an existing documentation repository or wiki, create a new version control repository in your version control system.
2. Identify a process that does not have a runbook. An ideal process is one that is conducted semi-regularly, short in number of steps, and has low-impact failures.
3. In your document repository, create a new draft Markdown document using the template. Fill in Runbook Title and the required fields under Runbook Info.
4. Starting with the first step, fill in the Steps portion of the runbook.
5. Give the runbook to a team member. Have them use the runbook to validate the steps. If something is missing or needs clarity, update the runbook.
6. Publish the runbook to your internal documentation store. Once published, tell your team and other stakeholders.
7. Over time, you'll build a library of runbooks. As that library grows, start working to automate runbooks (see the Systems Manager sketch after this list).

**Level of effort for the implementation plan:** Low. The minimum standard for a runbook is a step-by-step text guide. Automating runbooks can increase the implementation effort.
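
The following is a minimal sketch of running an automated runbook with AWS Systems Manager Automation, assuming boto3 credentials; AWS-RestartEC2Instance is an AWS-managed automation runbook, and the instance ID is a placeholder.

```python
# Run an automation runbook instead of following a manual checklist.
import boto3

ssm = boto3.client("ssm")

execution = ssm.start_automation_execution(
    DocumentName="AWS-RestartEC2Instance",           # AWS-managed runbook
    Parameters={"InstanceId": ["i-0123456789abcdef0"]},  # placeholder instance
)
execution_id = execution["AutomationExecutionId"]

# Check the status of the automation run.
status = ssm.get_automation_execution(AutomationExecutionId=execution_id)
print(status["AutomationExecution"]["AutomationExecutionStatus"])
```

Your own runbooks follow the same pattern: author them as Automation documents, then start them by name with the parameters your procedure needs.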

💼 OPS07-BP04 Use playbooks to investigate issues

Playbooks are step-by-step guides used to investigate an incident. When incidents happen, playbooks are used to investigate, scope impact, and identify a root cause. Playbooks are used for a variety of scenarios, from failed deployments to security incidents. In many cases, playbooks identify the root cause that a runbook is used to mitigate. Playbooks are an essential component of your organization's incident response plans.

A good playbook has several key features. It guides the user, step by step, through the process of discovery. Thinking outside-in, what steps should someone follow to diagnose an incident? Clearly define in the playbook whether special tools or elevated permissions are needed. Having a communication plan to update stakeholders on the status of the investigation is a key component. In situations where a root cause can't be identified, the playbook should have an escalation plan. If the root cause is identified, the playbook should point to a runbook that describes how to resolve it. Playbooks should be stored centrally and regularly maintained. If playbooks are used for specific alerts, provide your team with pointers to the playbook within the alert.

As your organization matures, automate your playbooks. Start with playbooks that cover low-risk incidents. Use scripting to automate the discovery steps. Make sure that you have companion runbooks to mitigate common root causes.

**Desired outcome**

- Your organization has playbooks for common incidents. The playbooks are stored in a central location and available to your team members.
- Playbooks are updated frequently.
- For any known root causes, companion runbooks are built.

**Common anti-patterns**

- There is no standard way to investigate an incident.
- Team members rely on muscle memory or institutional knowledge to troubleshoot a failed deployment.
- New team members learn how to investigate issues through trial and error.
- Best practices for investigating issues are not shared across teams.

**Benefits of establishing this best practice**

- Playbooks boost your efforts to mitigate incidents.
- Different team members can use the same playbook to identify a root cause in a consistent manner.
- Known root causes can have runbooks developed for them, speeding up recovery time.
- Playbooks help team members to start contributing sooner.
- Teams can scale their processes with repeatable playbooks.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

How you build and use playbooks depends on the maturity of your organization. If you are new to the cloud, build playbooks in text form in a central document repository. As your organization matures, playbooks can become semi-automated with scripting languages like Python. These scripts can be run inside a Jupyter notebook to speed up discovery. Advanced organizations have fully automated playbooks for common issues that are auto-remediated with runbooks.

Start building your playbooks by listing common incidents that happen to your workload. Choose playbooks for incidents that are low risk and where the root cause has been narrowed down to a few issues to start. After you have playbooks for simpler scenarios, move on to the higher-risk scenarios or scenarios where the root cause is not well known.

Your text playbooks should be automated as your organization matures. Using services like AWS Systems Manager Automation, flat text can be transformed into automations. These automations can be run against your workload to speed up investigations. They can be activated in response to events, reducing the mean time to discover and resolve incidents.

Customers can use AWS Systems Manager Incident Manager to respond to incidents. This service provides a single interface to triage incidents, inform stakeholders during discovery and mitigation, and collaborate throughout the incident. It uses AWS Systems Manager Automation to speed up detection and recovery.

**Customer example**

A production incident impacted AnyCompany Retail. The on-call engineer used a playbook to investigate the issue. As they progressed through the steps, they kept the key stakeholders, identified in the playbook, up to date. The engineer identified the root cause as a race condition in a backend service. Using a runbook, the engineer relaunched the service, bringing AnyCompany Retail back online.

### Implementation steps

If you don't have an existing document repository, we suggest creating a version control repository for your playbook library. You can build your playbooks using Markdown, which is compatible with most playbook automation systems. If you are starting from scratch, use the following example playbook template.

1. If you don't have an existing document repository or wiki, create a new version control repository for your playbooks in your version control system.
2. Identify a common issue that requires investigation. This should be a scenario where the root cause is limited to a few issues and resolution is low risk.
3. Using the Markdown template, fill in the Playbook Name section and the fields under Playbook Info.
4. Fill in the troubleshooting steps. Be as clear as possible on what actions to perform or what areas you should investigate.
5. Give a team member the playbook and have them go through it to validate it. If there's anything missing or something isn't clear, update the playbook.
6. Publish your playbook in your document repository and inform your team and any stakeholders.
7. This playbook library will grow as you add more playbooks. Once you have several playbooks, start automating them using tools like AWS Systems Manager Automation to keep automation and playbooks in sync (a scripted discovery sketch follows this list).

**Level of effort for the implementation plan:** Low. Your playbooks should be text documents stored in a central location. More mature organizations will move towards automating playbooks.
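
One hedged example of scripting a discovery step from a playbook is querying recent errors with CloudWatch Logs Insights. The sketch assumes boto3 credentials; the log group name and query are illustrative.

```python
# Scripted playbook step: find the most recent errors in a service's logs.
import time
from datetime import datetime, timedelta, timezone
import boto3

logs = boto3.client("logs")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

query_id = logs.start_query(
    logGroupName="/aws/backend-service/application",   # hypothetical log group
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=(
        "fields @timestamp, @message "
        "| filter @message like /ERROR/ "
        "| sort @timestamp desc | limit 20"
    ),
)["queryId"]

# Poll until the query finishes, then print the matching log lines.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})
```

Run in a Jupyter notebook, a handful of cells like this can cover the discovery portion of a text playbook while the escalation and communication steps remain documented prose.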

💼 OPS07-BP05 Make informed decisions to deploy systems and changes

Have processes in place for successful and unsuccessful changes to your workload. A pre-mortem is an exercise where a team simulates a failure to develop mitigation strategies. Use pre-mortems to anticipate failure and create procedures where appropriate. Evaluate the benefits and risks of deploying changes to your workload. Verify that all changes comply with governance.

**Desired outcome**

- You make informed decisions when deploying changes to your workload.
- Changes comply with governance.

**Common anti-patterns**

- Deploying a change to your workload without a process to handle a failed deployment.
- Making changes to your production environment that are out of compliance with governance requirements.
- Deploying a new version of your workload without establishing a baseline for resource utilization.

**Benefits of establishing this best practice**

- You are prepared for unsuccessful changes to your workload.
- Changes to your workload are compliant with governance policies.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Use pre-mortems to develop processes for unsuccessful changes. Document your processes for unsuccessful changes. Ensure that all changes comply with governance. Evaluate the benefits and risks of deploying changes to your workload.

**Customer example**

AnyCompany Retail regularly conducts pre-mortems to validate their processes for unsuccessful changes. They document their processes in a shared wiki and update it frequently. All changes comply with governance requirements.

### Implementation steps

1. Make informed decisions when deploying changes to your workload. Establish and review criteria for a successful deployment. Develop scenarios or criteria that would initiate a rollback of a change. Weigh the benefits of deploying changes against the risks of an unsuccessful change.
2. Verify that all changes comply with governance policies.
3. Use pre-mortems to plan for unsuccessful changes and document mitigation strategies. Run a table-top exercise to model an unsuccessful change and validate rollback procedures.

**Level of effort for the implementation plan:** Moderate. Implementing a practice of pre-mortems requires coordination and effort from stakeholders across your organization.

💼 OPS07-BP06 Create support plans for production workloads

Enable support for any software and services that your production workload relies on. Select an appropriate support level to meet your production service-level needs. Support plans for these dependencies are necessary in case there is a service disruption or software issue. Document support plans and how to request support for all service and software vendors. Implement mechanisms that verify that support points of contact are kept up to date.

**Desired outcome**

- Implement support plans for software and services that production workloads rely on.
- Choose an appropriate support plan based on service-level needs.
- Document the support plans, support levels, and how to request support.

**Common anti-patterns**

- You have no support plan for a critical software vendor. Your workload is impacted by them and you can do nothing to expedite a fix or get timely updates from the vendor.
- A developer who was the primary point of contact for a software vendor left the company. You are not able to reach the vendor support directly. You must spend time researching and navigating generic contact systems, increasing the time required to respond when needed.
- A production outage occurs with a software vendor. There is no documentation on how to file a support case.

**Benefits of establishing this best practice**

- With the appropriate support level, you are able to get a response in the time frame necessary to meet service-level needs.
- As a supported customer, you can escalate if there are production issues.
- Software and services vendors can assist in troubleshooting during an incident.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Enable support plans for any software and services vendors that your production workload relies on. Set up appropriate support plans to meet service-level needs. For AWS customers, this means activating AWS Business Support or greater on any accounts where you have production workloads. Meet with support vendors on a regular cadence to get updates about support offerings, processes, and contacts. Document how to request support from software and services vendors, including how to escalate if there is an outage. Implement mechanisms to keep support contacts up to date.

**Customer example**

At AnyCompany Retail, all commercial software and services dependencies have support plans. For example, they have AWS Enterprise Support activated on all accounts with production workloads. Any developer can raise a support case when there is an issue. There is a wiki page with information on how to request support, whom to notify, and best practices for expediting a case.

### Implementation steps

1. Work with stakeholders in your organization to identify software and services vendors that your workload relies on. Document these dependencies.
2. Determine service-level needs for your workload. Select a support plan that aligns with them.
3. For commercial software and services, establish a support plan with the vendors.
   1. Subscribing to AWS Business Support or greater for all production accounts provides faster response time from AWS Support and is strongly recommended. If you don't have premium support, you must have an action plan to handle issues that require help from AWS Support. AWS Support provides a mix of tools and technology, people, and programs designed to proactively help you optimize performance, lower costs, and innovate faster. In addition, AWS Business Support provides additional benefits, including API access to AWS Trusted Advisor and AWS Health for programmatic integration with your systems, alongside other access methods like the AWS Management Console and Amazon EventBridge channels (a minimal AWS Health API sketch follows these steps).
4. Document the support plan in your knowledge management tool. Include how to request support, whom to notify if a support case is filed, and how to escalate during an incident. A wiki is a good mechanism to allow anyone to make necessary updates to documentation when they become aware of changes to support processes or contacts.

**Level of effort for the implementation plan:** Low. Most software and services vendors offer opt-in support plans. Documenting and sharing support best practices on your knowledge management system verifies that your team knows what to do when there is a production issue.
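
The sketch below is a minimal example of the programmatic AWS Health access that comes with Business Support or greater. It assumes boto3 credentials on an account with a premium support plan (the AWS Health API is not available otherwise) and simply lists open and upcoming events so they can be fed into your own dashboards or ticketing workflow.

```python
# List open and upcoming AWS Health events for this account.
import boto3

# The AWS Health API is served from the us-east-1 endpoint.
health = boto3.client("health", region_name="us-east-1")

events = health.describe_events(
    filter={"eventStatusCodes": ["open", "upcoming"]}
)["events"]

for event in events:
    print(event["service"], event["eventTypeCode"], event["statusCode"])
```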

💼 OPS08-BP01 Analyze workload metrics

After implementing application telemetry, regularly analyze the collected metrics. While latency, requests, errors, and capacity (or quotas) provide insights into system performance, it's vital to prioritize the review of business outcome metrics. This ensures you're making data-driven decisions aligned with your business objectives.

**Desired outcome**

- Accurate insights into workload performance that drive data-informed decisions, ensuring alignment with business objectives.

**Common anti-patterns**

- Analyzing metrics in isolation without considering their impact on business outcomes.
- Over-reliance on technical metrics while sidelining business metrics.
- Infrequent review of metrics, missing out on real-time decision-making opportunities.

**Benefits of establishing this best practice**

- Enhanced understanding of the correlation between technical performance and business outcomes.
- Improved decision-making process informed by real-time data.
- Proactive identification and mitigation of issues before they affect business outcomes.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Leverage tools like Amazon CloudWatch to perform metric analysis. AWS services such as CloudWatch anomaly detection and Amazon DevOps Guru can be used to detect anomalies, especially when static thresholds are unknown or when patterns of behavior are more suited for anomaly detection.

### Implementation steps

1. Analyze and review: Regularly review and interpret your workload metrics.
   1. Prioritize business outcome metrics over purely technical metrics.
   2. Understand the significance of spikes, drops, or patterns in your data.
2. Utilize Amazon CloudWatch: Use Amazon CloudWatch for a centralized view and deep-dive analysis.
   1. Configure CloudWatch dashboards to visualize your metrics and compare them over time.
   2. Use percentiles in CloudWatch to get a clear view of metric distribution, which can help in defining SLAs and understanding outliers.
   3. Set up CloudWatch anomaly detection to identify unusual patterns without relying on static thresholds (see the sketch after this list).
   4. Implement CloudWatch cross-account observability to monitor and troubleshoot applications that span multiple accounts within a Region.
   5. Use CloudWatch Metric Insights to query and analyze metric data across accounts and Regions, identifying trends and anomalies.
   6. Apply CloudWatch Metric Math to transform, aggregate, or perform calculations on your metrics for deeper insights.
3. Employ Amazon DevOps Guru: Incorporate Amazon DevOps Guru for its machine-learning-enhanced anomaly detection to identify early signs of operational issues for your serverless applications and remediate them before they impact your customers.
4. Optimize based on insights: Make informed decisions based on your metric analysis to adjust and improve your workloads.

**Level of effort for the implementation plan:** Medium
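
The following is a hedged sketch of step 2.3: create a CloudWatch anomaly detector for a metric and then retrieve the metric alongside its anomaly detection band. It assumes boto3 credentials; the namespace and metric name are illustrative placeholders for a business outcome metric.

```python
# Anomaly detection on a custom metric, plus a query for the anomaly band.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

# Train an anomaly detection model on the metric's p99 statistic.
cloudwatch.put_anomaly_detector(
    SingleMetricAnomalyDetector={
        "Namespace": "MyApp",             # hypothetical namespace
        "MetricName": "CheckoutLatency",  # hypothetical metric
        "Stat": "p99",
    }
)

# Retrieve the metric together with its anomaly detection band (2 std devs).
end = datetime.now(timezone.utc)
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "MyApp", "MetricName": "CheckoutLatency"},
                "Period": 300,
                "Stat": "p99",
            },
        },
        {"Id": "band", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)"},
    ],
    StartTime=end - timedelta(hours=6),
    EndTime=end,
)
for result in response["MetricDataResults"]:
    print(result["Id"], result["Values"][:5])
```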

💼 OPS08-BP02 Analyze workload logs

Regularly analyzing workload logs is essential for gaining a deeper understanding of the operational aspects of your application. By efficiently sifting through, visualizing, and interpreting log data, you can continually optimize application performance and security.

**Desired outcome**

- Rich insights into application behavior and operations derived from thorough log analysis, ensuring proactive issue detection and mitigation.

**Common anti-patterns**

- Neglecting the analysis of logs until a critical issue arises.
- Not using the full suite of tools available for log analysis, missing out on critical insights.
- Solely relying on manual review of logs without leveraging automation and querying capabilities.

**Benefits of establishing this best practice**

- Proactive identification of operational bottlenecks, security threats, and other potential issues.
- Efficient utilization of log data for continuous application optimization.
- Enhanced understanding of application behavior, aiding in debugging and troubleshooting.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Amazon CloudWatch Logs is a powerful tool for log analysis. Integrated features like CloudWatch Logs Insights and Contributor Insights make the process of deriving meaningful information from logs intuitive and efficient.

### Implementation steps

1. Set up CloudWatch Logs: Configure applications and services to send logs to CloudWatch Logs.
2. Use log anomaly detection: Utilize Amazon CloudWatch Logs anomaly detection to automatically identify and alert on unusual log patterns. This tool helps you proactively manage anomalies in your logs and detect potential issues early.
3. Set up CloudWatch Logs Insights: Use CloudWatch Logs Insights to interactively search and analyze your log data.
   1. Craft queries to extract patterns, visualize log data, and derive actionable insights.
   2. Use CloudWatch Logs Insights pattern analysis to analyze and visualize frequent log patterns. This feature helps you understand common operational trends and potential outliers in your log data.
   3. Use CloudWatch Logs compare (diff) to perform differential analysis between different time periods or across different log groups. Use this capability to pinpoint changes and assess their impacts on your system's performance or behavior.
4. Monitor logs in real time with Live Tail: Use Amazon CloudWatch Logs Live Tail to view log data in real time. You can actively monitor your application's operational activities as they occur, which provides immediate visibility into system performance and potential issues.
5. Leverage Contributor Insights: Use CloudWatch Contributor Insights to identify top talkers in high-cardinality dimensions like IP addresses or user agents.
6. Implement CloudWatch Logs metric filters: Configure CloudWatch Logs metric filters to convert log data into actionable metrics. This allows you to set alarms or further analyze patterns (see the sketch after this list).
7. Implement CloudWatch cross-account observability: Monitor and troubleshoot applications that span multiple accounts within a Region.
8. Regular review and refinement: Periodically review your log analysis strategies to capture all relevant information and continually optimize application performance.

**Level of effort for the implementation plan:** Medium
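
As a hedged sketch of step 6, the snippet below turns a log pattern into a metric with a metric filter and then alarms on it. It assumes boto3 credentials; the log group, filter pattern, metric, and threshold are illustrative.

```python
# Convert a log pattern (HTTP 5xx responses) into a metric, then alarm on it.
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

logs.put_metric_filter(
    logGroupName="/aws/my-service/access-logs",   # hypothetical log group
    filterName="http-5xx-count",
    filterPattern="[ip, id, user, timestamp, request, status=5*, size]",
    metricTransformations=[{
        "metricName": "Http5xxCount",
        "metricNamespace": "MyService",
        "metricValue": "1",
    }],
)

cloudwatch.put_metric_alarm(
    AlarmName="my-service-5xx-spike",
    Namespace="MyService",
    MetricName="Http5xxCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=50,                      # illustrative threshold
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```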

💼 OPS08-BP03 Analyze workload traces

Analyzing trace data is crucial for achieving a comprehensive view of an application's operational journey. By visualizing and understanding the interactions between various components, performance can be fine-tuned, bottlenecks identified, and user experiences enhanced.

**Desired outcome**

- Achieve clear visibility into your application's distributed operations, enabling quicker issue resolution and an enhanced user experience.

**Common anti-patterns**

- Overlooking trace data, relying solely on logs and metrics.
- Not correlating trace data with associated logs.
- Ignoring the metrics derived from traces, such as latency and fault rates.

**Benefits of establishing this best practice**

- Improve troubleshooting and reduce mean time to resolution (MTTR).
- Gain insights into dependencies and their impact.
- Swift identification and rectification of performance issues.
- Leveraging trace-derived metrics for informed decision-making.
- Improved user experiences through optimized component interactions.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

AWS X-Ray offers a comprehensive suite for trace data analysis, providing a holistic view of service interactions, monitoring user activities, and detecting performance issues. Features like ServiceLens, X-Ray Insights, X-Ray Analytics, and Amazon DevOps Guru enhance the depth of actionable insights derived from trace data.

### Implementation steps

1. Integrate AWS X-Ray: Ensure X-Ray is integrated with your applications to capture trace data (see the sketch after this list).
2. Analyze X-Ray metrics: Delve into metrics derived from X-Ray traces, such as latency, request rates, fault rates, and response time distributions, using the service map to monitor application health.
3. Use ServiceLens: Leverage the ServiceLens map for enhanced observability of your services and applications. This allows for integrated viewing of traces, metrics, logs, alarms, and other health information.
4. Enable X-Ray Insights:
   1. Turn on X-Ray Insights for automated anomaly detection in traces.
   2. Examine insights to pinpoint patterns and ascertain root causes, such as increased fault rates or latencies.
   3. Consult the insights timeline for a chronological analysis of detected issues.
5. Use X-Ray Analytics: X-Ray Analytics allows you to thoroughly explore trace data, pinpoint patterns, and extract insights.
6. Use groups in X-Ray: Create groups in X-Ray to filter traces based on criteria such as high latency, allowing for more targeted analysis.
7. Incorporate Amazon DevOps Guru: Engage Amazon DevOps Guru to benefit from machine learning models pinpointing operational anomalies in traces.
8. Use CloudWatch Synthetics: Use CloudWatch Synthetics to create canaries for continually monitoring your endpoints and workflows. These canaries can integrate with X-Ray to provide trace data for in-depth analysis of the applications being tested.
9. Use Real User Monitoring (RUM): With AWS X-Ray and CloudWatch RUM, you can analyze and debug the request path starting from end users of your application through downstream AWS managed services. This helps you identify latency trends and errors that impact your end users.
10. Correlate with logs: Correlate trace data with related logs within the X-Ray trace view for a granular perspective on application behavior. This allows you to view log events directly associated with traced transactions.
11. Implement CloudWatch cross-account observability: Monitor and troubleshoot applications that span multiple accounts within a Region.

**Level of effort for the implementation plan:** Medium
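
The following is a minimal sketch of querying trace data programmatically once X-Ray is integrated (step 1). It assumes boto3 credentials and that services already send traces; the filter expression simply pulls recent traces slower than one second.

```python
# Pull summaries of recent slow traces from X-Ray for further analysis.
from datetime import datetime, timedelta, timezone
import boto3

xray = boto3.client("xray")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)

paginator = xray.get_paginator("get_trace_summaries")
for page in paginator.paginate(
    StartTime=start,
    EndTime=end,
    FilterExpression="responsetime > 1",   # traces slower than 1 second
):
    for summary in page["TraceSummaries"]:
        print(summary["Id"], summary.get("ResponseTime"), summary.get("HasFault"))
```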

💼 OPS08-BP04 Create actionable alerts

Promptly detecting and responding to deviations in your application's behavior is crucial. Especially vital is recognizing when outcomes based on key performance indicators (KPIs) are at risk or when unexpected anomalies arise. Basing alerts on KPIs ensures that the signals you receive are directly tied to business or operational impact. This approach to actionable alerts promotes proactive responses and helps maintain system performance and reliability.

**Desired outcome**

- Receive timely, relevant, and actionable alerts for rapid identification and mitigation of potential issues, especially when KPI outcomes are at risk.

**Common anti-patterns**

- Setting up too many non-critical alerts, leading to alert fatigue.
- Not prioritizing alerts based on KPIs, making it hard to understand the business impact of issues.
- Neglecting to address root causes, leading to repetitive alerts for the same issue.

**Benefits of establishing this best practice**

- Reduced alert fatigue by focusing on actionable and relevant alerts.
- Improved system uptime and reliability through proactive issue detection and mitigation.
- Enhanced team collaboration and quicker issue resolution by integrating with popular alerting and communication tools.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

To create an effective alerting mechanism, it's vital to use metrics, logs, and trace data that flag when outcomes based on KPIs are at risk or anomalies are detected.

### Implementation steps

1. Determine key performance indicators (KPIs): Identify your application's KPIs. Alerts should be tied to these KPIs to reflect the business impact accurately.
2. Implement anomaly detection:
   - Use Amazon CloudWatch anomaly detection: Set up Amazon CloudWatch anomaly detection to automatically detect unusual patterns, which helps you generate alerts only for genuine anomalies.
   - Use AWS X-Ray Insights:
     1. Set up X-Ray Insights to detect anomalies in trace data.
     2. Configure notifications for X-Ray Insights to be alerted on detected issues.
   - Integrate with Amazon DevOps Guru:
     1. Leverage Amazon DevOps Guru for its machine learning capabilities in detecting operational anomalies with existing data.
     2. Navigate to the notification settings in DevOps Guru to set up anomaly alerts.
3. Implement actionable alerts: Design alerts that provide adequate information for immediate action.
   1. Monitor AWS Health events with Amazon EventBridge rules, or integrate programmatically with the AWS Health API to automate actions when you receive AWS Health events. These can be general actions, such as sending all planned lifecycle event messages to a chat interface, or specific actions, such as the initiation of a workflow in an IT service management tool.
4. Reduce alert fatigue: Minimize non-critical alerts. When teams are overwhelmed with numerous insignificant alerts, they can lose oversight of critical issues, which diminishes the overall effectiveness of the alert mechanism.
5. Set up composite alarms: Use Amazon CloudWatch composite alarms to consolidate multiple alarms (see the sketch after this list).
6. Integrate with alert tools: Incorporate tools like Ops Genie and PagerDuty.
7. Engage Amazon Q Developer in chat applications: Integrate Amazon Q Developer in chat applications to relay alerts to Amazon Chime, Microsoft Teams, and Slack.
8. Alert based on logs: Use log metric filters in CloudWatch to create alarms based on specific log events.
9. Review and iterate: Regularly revisit and refine alert configurations.

**Level of effort for the implementation plan:** Medium
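
The sketch below is a hedged example of step 5: a composite alarm that only notifies when two child alarms fire together, reducing noise from isolated blips. It assumes boto3 credentials, two existing alarms, and an SNS topic; all names and the ARN are illustrative.

```python
# Composite alarm: page only when errors AND latency are both in ALARM state.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_composite_alarm(
    AlarmName="checkout-kpi-at-risk",
    AlarmRule='ALARM("checkout-error-rate-high") AND ALARM("checkout-p99-latency-high")',
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:oncall-page"],  # hypothetical topic
    AlarmDescription="Checkout KPI at risk: elevated errors and latency together.",
)
```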

💼 OPS08-BP05 Create dashboards

Dashboards are the human-centric view into the telemetry data of your workloads. While they provide a vital visual interface, they should not replace alerting mechanisms, but complement them. When crafted with care, not only can they offer rapid insights into system health and performance, but they can also present stakeholders with real-time information on business outcomes and the impact of issues. **Desired outcome** - Clear, actionable insights into system and business health using visual representations. **Common anti-patterns** - Overcomplicating dashboards with too many metrics. - Relying on dashboards without alerts for anomaly detection. - Not updating dashboards as workloads evolve. **Benefits of this best practice** - Immediate visibility into critical system metrics and KPIs. - Enhanced stakeholder communication and understanding. - Rapid insight into the impact of operational issues. **Level of risk if this best practice isn't established:** Medium ## Implementation guidance **Business-centric dashboards** Dashboards tailored to business KPIs engage a wider array of stakeholders. While these individuals might not be interested in system metrics, they are keen on understanding the business implications of these numbers. A business-centric dashboard ensures that all technical and operational metrics being monitored and analyzed are in sync with overarching business goals. This alignment provides clarity, ensuring everyone is on the same page regarding what's essential and what's not. Additionally, dashboards that highlight business KPIs tend to be more actionable. Stakeholders can quickly understand the health of operations, areas that need attention, and the potential impact on business outcomes. With this in mind, when creating your dashboards, ensure that there's a balance between technical metrics and business KPIs. Both are vital, but they cater to different audiences. Ideally, you should have dashboards that provide a holistic view of the system's health and performance while also emphasizing key business outcomes and their implications. Amazon CloudWatch Dashboards are customizable home pages in the CloudWatch console that you can use to monitor your resources in a single view, even those resources that are spread across different AWS Regions and accounts. ### Implementation steps 1. Create a basic dashboard: Create a new dashboard in CloudWatch, giving it a descriptive name. 2. Use Markdown widgets: Before diving into the metrics, use Markdown widgets to add textual context at the top of your dashboard. This should explain what the dashboard covers, the significance of the represented metrics, and can also contain links to other dashboards and troubleshooting tools. 3. Create dashboard variables: Incorporate dashboard variables where appropriate to allow for dynamic and flexible dashboard views. 4. Create metrics widgets: Add metric widgets to visualize various metrics your application emits, tailoring these widgets to effectively represent system health and business outcomes. 5. Log Insights queries: Utilize CloudWatch Log Insights to derive actionable metrics from your logs and display these insights on your dashboard. 6. Set up alarms: Integrate CloudWatch Alarms into your dashboard for a quick view of any metrics breaching their thresholds. 7. Use Contributor Insights: Incorporate CloudWatch Contributor Insights to analyze high-cardinality fields and get a clearer understanding of your resource's top contributors. 8. 
Design custom widgets: For specific needs not met by standard widgets, consider creating custom widgets. These can pull from various data sources or represent data in unique ways. 9. Use AWS Health: AWS Health is the authoritative source of information about the health of your AWS Cloud resources. Use AWS Health Dashboard out of the box, or use AWS Health data in your own dashboards and tools so you have the right information available to make informed decisions. 10. Iterate and refine: As your application evolves, regularly revisit your dashboard to ensure its relevance.
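If it helps to make steps 1, 2, and 4 concrete, the following sketch uses the AWS SDK for Python (boto3) to create a dashboard with a Markdown widget and one metric widget. The dashboard name, metric, instance ID, and Region are placeholder assumptions rather than values from this guide.

```python
import json
import boto3

cloudwatch = boto3.client("cloudwatch")  # uses your default credentials and Region

# Placeholder names and metrics -- replace with values from your own workload.
dashboard_body = {
    "widgets": [
        {
            # Markdown widget that explains what the dashboard covers (step 2).
            "type": "text",
            "x": 0, "y": 0, "width": 24, "height": 2,
            "properties": {
                "markdown": "# Order service health\nKey metrics for the order service. "
                            "See the runbook wiki for troubleshooting steps."
            },
        },
        {
            # Metric widget for a representative system metric (step 4).
            "type": "metric",
            "x": 0, "y": 2, "width": 12, "height": 6,
            "properties": {
                "title": "Average CPU utilization",
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
                "stat": "Average",
                "period": 300,
                "region": "us-east-1",
            },
        },
    ]
}

cloudwatch.put_dashboard(
    DashboardName="order-service-overview",
    DashboardBody=json.dumps(dashboard_body),
)
```

The same DashboardBody JSON can be extended with alarm, log, and custom widgets to cover steps 5 through 8.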

💼 OPS09-BP01 Measure operations goals and KPIs with metrics

Obtain the goals and KPIs that define operations success from your organization, and confirm that your metrics reflect them. Set baselines as a point of reference and reevaluate regularly. Develop mechanisms to collect these metrics from teams for evaluation. The DevOps Research and Assessment (DORA) metrics provide a popular method to measure progress towards DevOps software delivery practices. **Desired outcome** - The organization publishes and shares the goals and KPIs for the operations teams. - You establish metrics that reflect these KPIs. Examples may include: - Ticket queue depth or average ticket age - Ticket count grouped by type of issue - Time spent working issues with or without a standardized operating procedure (SOP) - Amount of time spent recovering from a failed code push - Call volume **Common anti-patterns** - Deployment deadlines are missed because developers are pulled away to perform troubleshooting tasks. Development teams argue for more personnel, but cannot quantify how many they need because the time taken away cannot be measured. - A Tier 1 desk was set up to handle user calls. Over time, more workloads were added, but no headcount was allocated to the Tier 1 desk. Customer satisfaction suffers as call times increase and issues go longer without resolution, but management sees no indicators of this, so no action is taken. - A problematic workload has been handed off to a separate operations team for upkeep. Unlike other workloads, this new one was not supplied with proper documentation and runbooks. As such, teams spend longer troubleshooting and addressing failures. However, there are no metrics documenting this, which makes accountability difficult. **Benefits of establishing this best practice** - Where workload monitoring shows the state of your applications and services, monitoring operations teams provides owners with insight into changes among the consumers of those workloads, such as shifting business needs. - Measure the effectiveness of these teams and evaluate them against business goals by creating metrics that can reflect the state of operations. - Metrics can highlight support issues or identify when drifts occur away from a service level target. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Schedule time with business leaders and stakeholders to determine what the overall goals of the service will be. Determine what the tasks of the various operations teams should be and what challenges they might face. Using these, brainstorm key performance indicators (KPIs) that might reflect these operations goals. These might be customer satisfaction, time from feature conception to deployment, average issue resolution time, or cost efficiencies. Working from KPIs, identify the metrics and sources of data that might reflect these goals best. Customer satisfaction may be a combination of various metrics such as call wait or response times, satisfaction scores, and types of issues raised. Deployment times may be the sum of time needed for testing and deployment, plus any post-deployment fixes that need to be applied. Statistics showing the time spent on different types of issues (or the counts of those issues) can provide a window into where targeted effort is needed. One way to publish such measurements as custom CloudWatch metrics is sketched below.
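As a minimal sketch of turning an operations KPI into a measurable metric, the following boto3 snippet publishes two assumed ticket metrics to a custom CloudWatch namespace. The namespace, metric names, dimension, and values are illustrative; in practice they would be pulled from your ticketing system.

```python
from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Illustrative values -- in practice these come from your ticketing system's API.
ticket_queue_depth = 42
average_ticket_age_hours = 7.5

cloudwatch.put_metric_data(
    Namespace="Operations/Tickets",  # placeholder custom namespace
    MetricData=[
        {
            "MetricName": "QueueDepth",
            "Timestamp": datetime.now(timezone.utc),
            "Value": ticket_queue_depth,
            "Unit": "Count",
            "Dimensions": [{"Name": "Team", "Value": "Tier1"}],
        },
        {
            "MetricName": "AverageTicketAge",
            "Timestamp": datetime.now(timezone.utc),
            "Value": average_ticket_age_hours,
            # Value is in hours; CloudWatch has no hour unit, so the unit is left generic.
            "Unit": "None",
            "Dimensions": [{"Name": "Team", "Value": "Tier1"}],
        },
    ],
)
```

Once published, these metrics can be baselined, alarmed on, and reviewed alongside your other KPIs.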

💼 OPS09-BP02 Communicate status and trends to ensure visibility into operation

Knowing the state of your operations and its trending direction is necessary to identify when outcomes may be at risk, whether added work can be supported, and what effects changes have had on your teams. During operations events, having status pages that users and operations teams can refer to for information can reduce pressure on communication channels and disseminate information proactively. **Desired outcome** - Operations leaders have insight at a glance to see what sort of call volumes their teams are operating under and what efforts may be under way, such as deployments. - Alerts are disseminated to stakeholders and user communities when impacts to normal operations occur. - Organization leadership and stakeholders can check a status page in response to an alert or impact, and obtain information surrounding an operational event, such as points of contact, ticket information, and estimated recovery times. - Reports are made available to leadership and other stakeholders to show operations statistics such as call volumes over a period of time, user satisfaction scores, and numbers of outstanding tickets and their ages. **Common anti-patterns** - A workload goes down, leaving a service unavailable. Call volumes spike as users request to know what's going on. Managers add to the volume by requesting to know who's working the issue. Various operations teams duplicate efforts in trying to investigate. - A desire for a new capability leads to several personnel being reassigned to an engineering effort. No backfill is provided, and issue resolution times spike. This information is not captured, and only after several weeks and dissatisfied user feedback does leadership become aware of the issue. **Benefits of establishing this best practice** - During operational events where the business is impacted, much time and energy can be wasted querying information from various teams attempting to understand the situation. By establishing widely disseminated status pages and dashboards, stakeholders can quickly obtain information such as whether or not an issue was detected, who has the lead on the issue, or when a return to normal operations may be expected. This frees team members from spending too much time communicating status to others and gives them more time to address issues. Dashboards and reports can provide insights to decision-makers and stakeholders to see how operations teams are able to respond to business needs and how their resources are being allocated. This is crucial for determining if adequate resources are in place to support the business. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance 1. Build dashboards that show the current key metrics for your ops teams, and make them readily accessible both to operations leaders and management. 2. Build status pages that can be updated quickly to show when an incident or event is unfolding, who has ownership, and who is coordinating the response. Share any steps or workarounds that users should consider on this page, and disseminate the location widely. Encourage users to check this location first when confronted with an unknown issue. 3. Collect and provide reports that show the health of operations over time, and distribute them to leaders and decision makers to illustrate the work of operations along with challenges and needs. 4. Share across teams the metrics and reports that best reflect goals and KPIs, and highlight where they have been influential in driving change.
Dedicate time to these activities to elevate the importance of operations within and between teams. 5. Use AWS Health alongside your own dashboards, or integrate AWS Health events into them, so that your teams can correlate application issues with AWS service status (see the sketch below for one way to forward AWS Health events to a status topic).
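One way to implement step 5 is to forward AWS Health events to a notification topic that feeds your status page or chat channel. The sketch below uses boto3 and assumes an existing SNS topic; the rule name and topic ARN are placeholders.

```python
import json

import boto3

events = boto3.client("events")

# Placeholder target -- an existing SNS topic subscribed to by your status tooling.
STATUS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:ops-status-updates"

# Match all AWS Health events delivered to this account and Region.
events.put_rule(
    Name="forward-aws-health-to-status-topic",
    EventPattern=json.dumps({"source": ["aws.health"]}),
    State="ENABLED",
)

# Send matching events to the status topic.
events.put_targets(
    Rule="forward-aws-health-to-status-topic",
    Targets=[{"Id": "status-topic", "Arn": STATUS_TOPIC_ARN}],
)
```

Note that the topic's access policy must allow events.amazonaws.com to publish to it.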

💼 OPS09-BP03 Review operations metrics and prioritize improvement

Setting aside dedicated time and resources for reviewing the state of operations ensures that serving the day-to-day line of business remains a priority. Pull together operations leaders and stakeholders to regularly review metrics, reaffirm or modify goals and objectives, and prioritize improvements. **Desired outcome** - Operations leaders and staff regularly meet to review metrics over a given reporting period. Challenges are communicated, wins are celebrated, and lessons learned are shared. - Stakeholders and business leaders are regularly briefed on the state of operations and solicited for input regarding goals, KPIs, and future initiatives. Tradeoffs between service delivery, operations, and maintenance are discussed and placed into context. **Common anti-patterns** - A new product is launched, but the Tier 1 and Tier 2 operations teams are not adequately trained to support it, nor given additional staff. Metrics that show worsening ticket resolution times and increasing incident volumes are not seen by leaders. Action is taken weeks later when subscription numbers start to fall as discontented users move off the platform. - A manual process for performing maintenance on a workload has been in place for a long time. While a desire to automate has been present, this was a low priority given the low importance of the system. Over time, however, the system has grown in importance, and these manual processes now consume a majority of operations' time. No resources are scheduled for providing increased tooling to operations, leading to staff burnout as workloads increase. Leadership becomes aware once it's reported that staff are leaving for competitors. **Benefits of establishing this best practice** - Ensures operations receives the same attention and resources as service delivery and new offerings. - Provides early visibility into risks before they impact business outcomes. - Operations teams gain insights into impending business changes and initiatives, enabling proactive efforts. - Leadership gains visibility into operational metrics, improving prioritization and allocation of resources. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance 1. Dedicate time to review operations metrics between stakeholders and operations teams. Review report data in the context of the organization's goals and objectives to determine if they are being met. 2. Identify sources of ambiguity where goals are unclear or where conflicts exist between requested outcomes and actual deliverables. 3. Determine where time, people, and tools can aid operations outcomes. Map these to KPIs and define targets for success. 4. Revisit reviews regularly to ensure operations is sufficiently resourced to support the line of business.

💼 OPS10-BP01 Use a process for event, incident, and problem management

The ability to efficiently manage events, incidents, and problems is key to maintaining workload health and performance. Establishing and following well-defined processes for each ensures swift, effective handling of operational challenges. **Desired outcome** - The organization effectively manages operational events, incidents, and problems through documented and centrally stored processes. - Processes are updated regularly to reflect changes, ensuring streamlined handling, high service reliability, and workload performance. **Common anti-patterns** - Reactive response to events rather than proactive monitoring. - Inconsistent handling of different types of events or incidents. - Failure to analyze incidents for root causes to prevent recurrence. **Benefits of establishing this best practice** - Streamlined and standardized response processes. - Reduced impact of incidents on services and customers. - Faster issue resolution. - Continuous improvement in operational processes. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implementing this best practice involves tracking events, responding to incidents, and managing problems. Processes should be documented, shared, and frequently updated. ### Understanding events, incidents, and problems - **Events:** Observations of an action, occurrence, or change of state, planned or unplanned, internal or external. - **Incidents:** Events requiring a response due to unplanned interruptions or service degradations. - **Problems:** Root causes of one or more incidents, identified to prevent recurrence. ### Implementation steps #### Events 1. **Monitor events:** - Utilize observability tools to track application and workload activities. - Record user and service actions with AWS CloudTrail. - Respond to operational changes in real time via Amazon EventBridge. - Continuously assess resource configuration changes using AWS Config. 2. **Create processes:** - Define thresholds for normal and abnormal activities. - Establish criteria for escalating an event to an incident. - Review monitoring and response processes regularly, adjusting thresholds and alerting mechanisms. #### Incidents 1. **Respond to incidents:** - Use observability insights to quickly identify and resolve incidents. - Aggregate and manage incidents with AWS Systems Manager Ops Center. - Analyze and troubleshoot using Amazon CloudWatch and AWS X-Ray. - Leverage AWS Managed Services (AMS) or Enterprise Support features like Incident Detection and Response. 2. **Incident management process:** - Define clear roles, communication protocols, and steps for resolution. - Integrate with chat tools (e.g., Amazon Q Developer) for coordination. - Categorize incidents by severity with predefined response plans. 3. **Learn and improve:** - Conduct post-incident reviews and root cause analysis. - Update response plans and share lessons learned across teams. - Enterprise Support customers may use Incident Management Workshops to test and refine processes. #### Problems 1. **Identify problems:** - Analyze incident data to detect recurring patterns. - Use AWS CloudTrail and CloudWatch to uncover systemic issues. - Engage cross-functional teams for diverse perspectives on root causes. 2. **Problem management process:** - Focus on long-term solutions rather than quick fixes. - Apply root cause analysis techniques. - Update operational procedures and infrastructure to prevent recurrence. 3. 
**Continue to improve:** - Promote a culture of learning and proactive problem identification. - Regularly revise problem management processes to align with evolving business and technology needs. - Share insights and best practices organization-wide. 4. **Engage AWS Support:** - Leverage AWS Trusted Advisor for proactive guidance. - Enterprise Support customers can access specialized programs like AWS Countdown for critical events. **Level of effort for the implementation plan:** Medium
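To sketch the incident-aggregation step, the snippet below opens an OpsItem in AWS Systems Manager OpsCenter from your own monitoring code using boto3. The title, severity, category, and operational data shown are placeholder assumptions.

```python
import boto3

ssm = boto3.client("ssm")

response = ssm.create_ops_item(
    Title="Checkout API elevated 5xx error rate",  # placeholder incident title
    Description=(
        "Error rate breached the 5% threshold at 14:32 UTC. "
        "See the checkout-api dashboard for details."
    ),
    Source="custom-monitoring",  # identifies what raised the OpsItem; must not start with "aws"
    Severity="2",                # "1" (highest) through "4" (lowest)
    Category="Availability",
    OperationalData={
        # Searchable key/value pairs that responders can filter on in OpsCenter.
        "affectedService": {"Value": "checkout-api", "Type": "SearchableString"}
    },
)
print("Created OpsItem:", response["OpsItemId"])
```

An OpsItem created this way can then be linked to runbooks, related resources, and follow-up actions as part of your problem-management process.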

💼 OPS10-BP02 Have a process per alert

Establishing a clear and defined process for each alert in your system is essential for effective and efficient incident management. This practice ensures that every alert leads to a specific, actionable response, improving the reliability and responsiveness of your operations. **Desired outcome:** Every alert initiates a specific, well-defined response plan. Where possible, responses are automated, with clear ownership and a defined escalation path. Alerts are linked to an up-to-date knowledge base so that any operator can respond consistently and effectively. Responses are quick and uniform across the board, enhancing operational efficiency and reliability. **Common anti-patterns** - Alerts have no predefined response process, leading to makeshift and delayed resolutions. - Alert overload causes important alerts to be overlooked. - Alerts are inconsistently handled due to lack of clear ownership and responsibility. **Benefits of establishing this best practice** - Reduced alert fatigue by only raising actionable alerts. - Decreased mean time to resolution (MTTR) for operational issues. - Decreased mean time to investigate (MTTI), which helps reduce MTTR. - Enhanced ability to scale operational responses. - Improved consistency and reliability in handling operational events. For example, you have a defined process for AWS Health events for critical accounts, including application alarms, operational issues, and planned lifecycle events (like updating Amazon EKS versions before clusters are auto-updated). You provide the capability for your teams to actively monitor, communicate, and respond to these events. These actions help you prevent service disruptions caused by AWS-side changes or mitigate them faster when unexpected issues occur. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Having a process per alert involves establishing a clear response plan for each alert, automating responses where possible, and continually refining these processes based on operational feedback and evolving requirements. ### Implementation steps 1. Use composite alarms: Create composite alarms in CloudWatch to group related alarms, reducing noise and allowing for more meaningful responses (see the sketch after this list). 2. Stay informed with AWS Health: AWS Health is the authoritative source of information about the health of your AWS Cloud resources. Use AWS Health to visualize and get notified of any current service events and upcoming changes, such as planned lifecycle events, so you can take steps to mitigate impacts. 1. Create purpose-fit AWS Health event notifications for email and chat channels through AWS User Notifications, and integrate programmatically with your monitoring and alerting tools through Amazon EventBridge or the AWS Health API. 2. Plan and track progress on health events that require action by integrating with change management or ITSM tools (like Jira or ServiceNow) that you may already use through Amazon EventBridge or the AWS Health API. 3. If you use AWS Organizations, enable organization view for AWS Health to aggregate AWS Health events across accounts. 3. Integrate Amazon CloudWatch alarms with Incident Manager: Configure CloudWatch alarms to automatically create incidents in AWS Systems Manager Incident Manager. 4. Integrate Amazon EventBridge with Incident Manager: Create EventBridge rules to react to events and create incidents using defined response plans. 5. Prepare for incidents in Incident Manager: 1.
Establish detailed response plans in Incident Manager for each type of alert. 2. Establish chat channels through Amazon Q Developer in chat applications connected to response plans in Incident Manager, facilitating real-time communication during incidents across platforms like Slack, Microsoft Teams, and Amazon Chime. 3. Incorporate Systems Manager Automation runbooks within Incident Manager to drive automated responses to incidents.
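A minimal sketch of step 1, assuming two existing metric alarms on the same service: the boto3 call below groups them into one composite alarm so that a single actionable alert, tied to a single response process, fires instead of two. The child alarm names and the SNS paging topic are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumes two existing metric alarms, e.g. on error rate and latency for the same service.
cloudwatch.put_composite_alarm(
    AlarmName="checkout-service-unhealthy",
    AlarmRule='ALARM("checkout-high-error-rate") OR ALARM("checkout-high-p99-latency")',
    AlarmDescription=(
        "Fires when either the error-rate or latency alarm is in ALARM; "
        "responders follow the checkout-service runbook."
    ),
    ActionsEnabled=True,
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:oncall-paging"],  # placeholder topic
)
```

The composite alarm is then the one alert that maps to a documented response plan, while the child alarms remain available for diagnosis.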

💼 OPS10-BP03 Prioritize operational events based on business impact

Responding promptly to operational events is critical, but not all events are equal. When you prioritize based on business impact, you also prioritize addressing events with the potential for significant consequences, such as safety, financial loss, regulatory violations, or damage to reputation. **Desired outcome:** Responses to operational events are prioritized based on potential impact to business operations and objectives. This makes the responses efficient and effective. **Common anti-patterns** - Every event is treated with the same level of urgency, leading to confusion and delays in addressing critical issues. - You fail to distinguish between high and low impact events, leading to misallocation of resources. - Your organization lacks a clear prioritization framework, resulting in inconsistent responses to operational events. - Events are prioritized based on the order they are reported, rather than their impact on business outcomes. **Benefits of establishing this best practice** - Ensures critical business functions receive attention first, minimizing potential damage. - Improves resource allocation during multiple concurrent events. - Enhances the organization's ability to maintain trust and meet regulatory requirements. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance When faced with multiple operational events, a structured approach to prioritization based on impact and urgency is essential. This approach helps you make informed decisions, direct efforts where they're needed most, and mitigate the risk to business continuity. ### Implementation steps 1. Assess impact: Develop a classification system to evaluate the severity of events in terms of their potential impact on business operations and objectives. 2. Assess urgency: Define urgency levels for how quickly an event needs a response, considering factors such as safety, financial implications, and service-level agreements (SLAs). 3. Create a prioritization matrix: - Use a matrix to cross-reference impact and urgency, assigning priority levels to different combinations. - Make the matrix accessible and understood by all team members responsible for operational event responses. 4. Train and communicate: Train response teams on the prioritization matrix and the importance of following it during an event. Communicate the prioritization process to all stakeholders to set clear expectations. 5. Integrate with incident response: - Incorporate the prioritization matrix into your incident response plans and tools. - Automate the classification and prioritization of events where possible to speed up response times. - Enterprise Support customers can leverage AWS Incident Detection and Response, which provides 24x7 proactive monitoring and incident management for production workloads. 6. Review and adapt: Regularly review the effectiveness of the prioritization process and make adjustments based on feedback and changes in the business environment.
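The prioritization matrix in step 3 can start as a simple lookup table. The Python sketch below is purely illustrative; the impact and urgency levels and the resulting priorities are assumptions to replace with your organization's own definitions.

```python
# Illustrative impact/urgency matrix: (impact, urgency) -> priority (1 = respond first).
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("medium", "high"): 2,
    ("high", "low"): 3,
    ("medium", "medium"): 3,
    ("low", "high"): 3,
    ("medium", "low"): 4,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def prioritize(impact: str, urgency: str) -> int:
    """Return the response priority for an event, defaulting to the lowest priority."""
    return PRIORITY_MATRIX.get((impact.lower(), urgency.lower()), 5)


# Example: a regulatory-reporting outage during the filing window is both high impact
# and high urgency, so it gets priority 1.
print(prioritize("high", "high"))  # -> 1
```

Encoding the matrix this way also makes it easy to embed the same rules in your ticketing or incident tooling so events are classified consistently.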

💼 OPS10-BP04 Define escalation paths

Establish clear escalation paths within your incident response protocols to facilitate timely and effective action. This includes specifying prompts for escalation, detailing the escalation process, and pre-approving actions to expedite decision-making and reduce mean time to resolution (MTTR). **Desired outcome:** A structured and efficient process that escalates incidents to the appropriate personnel, minimizing response times and impact. **Common anti-patterns** - Lack of clarity on recovery procedures leads to makeshift responses during critical incidents. - Absence of defined permissions and ownership results in delays when urgent action is needed. - Stakeholders and customers are not informed in line with expectations. - Important decisions are delayed. **Benefits of establishing this best practice** - Streamlined incident response through predefined escalation procedures. - Reduced downtime with pre-approved actions and clear ownership. - Improved resource allocation and support-level adjustments according to incident severity. - Improved communication to stakeholders and customers. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Properly defined escalation paths are crucial for rapid incident response. AWS Systems Manager Incident Manager supports the setup of structured escalation plans and on-call schedules, which alert the right personnel so that they are ready to act when incidents occur. ### Implementation steps 1. Set up escalation prompts: Set up CloudWatch alarms to create an incident in AWS Systems Manager Incident Manager. 2. Set up on-call schedules: Create on-call schedules in Incident Manager that align with your escalation paths. Equip on-call personnel with the necessary permissions and tools to act swiftly. 3. Detail escalation procedures: - Determine specific conditions under which an incident should be escalated. - Create escalation plans in Incident Manager. - Escalation channels should consist of a contact or an on-call schedule. - Define the roles and responsibilities of the team at each escalation level. 4. Pre-approve mitigation actions: Collaborate with decision-makers to pre-approve actions for anticipated scenarios. Use Systems Manager Automation runbooks integrated with Incident Manager to speed up incident resolution. 5. Specify ownership: Clearly identify internal owners for each step of the escalation path. 6. Detail third-party escalations: - Document third-party service-level agreements (SLAs), and align them with internal goals. - Set clear protocols for vendor communication during incidents. - Integrate vendor contacts into incident management tools for direct access. - Conduct regular drills that include third-party response scenarios. - Keep vendor escalation information well-documented and easily accessible. 7. Train and rehearse escalation plans: Train your team on the escalation process and conduct regular incident response drills or game days. Enterprise Support customers can request an Incident Management Workshop. 8. Continue to improve: Review the effectiveness of your escalation paths regularly. Update your processes based on lessons learned from incident post-mortems and continuous feedback. **Level of effort for the implementation plan:** Moderate
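As a sketch of step 1, a CloudWatch alarm can open an incident by listing an Incident Manager response plan as an alarm action, which in turn engages the on-call schedule and escalation plan attached to that response plan. The metric, thresholds, and response plan ARN below are placeholders for resources you would already have defined.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder ARN of an existing Incident Manager response plan that has your
# escalation plan and on-call schedule attached.
RESPONSE_PLAN_ARN = "arn:aws:ssm-incidents::111122223333:response-plan/checkout-sev1"

cloudwatch.put_metric_alarm(
    AlarmName="checkout-availability-breach",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/checkout/0123456789abcdef"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    # When the alarm enters ALARM, Incident Manager starts the response plan, which
    # engages the on-call schedule and walks the escalation channels you defined.
    AlarmActions=[RESPONSE_PLAN_ARN],
)
```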

💼 OPS10-BP05 Define a customer communication plan for service-impacting events

Effective communication during service-impacting events is critical to maintaining trust and transparency with customers. A well-defined communication plan helps your organization quickly and clearly share information, both internally and externally, during incidents. **Desired outcome** - A robust communication plan that effectively informs customers and stakeholders during service-impacting events. - Transparency in communication to build trust and reduce customer anxiety. - Minimizing the impact of service-impacting events on customer experience and business operations. **Common anti-patterns** - Inadequate or delayed communication leads to customer confusion and dissatisfaction. - Overly technical or vague messaging fails to convey the actual impact on users. - There is no predefined communication strategy, resulting in inconsistent and reactive messaging. **Benefits of establishing this best practice** - Enhanced customer trust and satisfaction through proactive and clear communication. - Reduced burden on support teams by preemptively addressing customer concerns. - Improved ability to manage and recover from incidents effectively. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Creating a comprehensive communication plan for service-impacting events involves multiple facets, from choosing the right channels to crafting the message and tone. The plan should be adaptable, scalable, and cater to different outage scenarios. ### Implementation steps 1. Define roles and responsibilities: - Assign a major incident manager to oversee incident response activities. - Designate a communications manager responsible for coordinating all external and internal communications. - Include the support manager to provide consistent communication through support tickets. 2. Identify communication channels: Select channels like workplace chat, email, SMS, social media, in-app notifications, and status pages. These channels should be resilient and able to operate independently during service-impacting events. 3. Communicate quickly, clearly, and regularly to customers: - Develop templates for various service impairment scenarios, emphasizing simplicity and essential details. Include information about the service impairment, expected resolution time, and impact. - Use Amazon Pinpoint to alert customers using push notifications, in-app notifications, emails, text messages, voice messages, and messages over custom channels. - Use Amazon Simple Notification Service (Amazon SNS) to alert subscribers programmatically or through email, mobile push notifications, and text messages (see the sketch after these steps). - Communicate status through dashboards by sharing an Amazon CloudWatch dashboard publicly. 4. Encourage social media engagement: - Actively monitor social media to understand customer sentiment. - Post on social media platforms for public updates and community engagement. - Prepare templates for consistent and clear social media communication. 5. Coordinate internal communication: Implement internal protocols using tools like Amazon Q Developer in chat applications for team coordination and communication. Use CloudWatch dashboards to communicate status. 6. Orchestrate communication with dedicated tools and services: - Use AWS Systems Manager Incident Manager with Amazon Q Developer in chat applications to set up dedicated chat channels for real-time internal communication and coordination during incidents.
- Use AWS Systems Manager Incident Manager runbooks to automate customer notifications through Amazon Pinpoint, Amazon SNS, or third-party tools like social media platforms during incidents. - Incorporate approval workflows within runbooks to optionally review and authorize all external communications before sending. 7. Practice and improve: - Conduct training on the use of communication tools and strategies. Empower teams to make timely decisions during incidents. - Test the communication plan through regular drills or gamedays. Use these tests to refine messaging and evaluate the effectiveness of channels. - Implement feedback mechanisms to assess communication effectiveness during incidents. Continually evolve the communication plan based on feedback and changing needs. **Level of effort for the implementation plan:** High
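For the Amazon SNS option in step 3, the sketch below publishes a templated status update to a topic that customers or internal stakeholders subscribe to by email or SMS. The topic ARN and message wording are placeholders.

```python
import boto3

sns = boto3.client("sns")

STATUS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:customer-status-updates"  # placeholder

# A simple template kept deliberately non-technical: what is impacted, what you are
# doing about it, and when the next update will be sent.
message = (
    "We are currently investigating degraded checkout performance. "
    "Your saved carts are not affected. "
    "Next update: within 30 minutes, or sooner if the situation changes."
)

sns.publish(
    TopicArn=STATUS_TOPIC_ARN,
    Subject="[Service status] Degraded checkout performance",
    Message=message,
)
```

In a full implementation, a call like this would typically be wrapped in an Incident Manager runbook step with an approval gate, as described in step 6.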

💼 OPS10-BP06 Communicate status through dashboards

Use dashboards as a strategic tool to convey real-time operational status and key metrics to different audiences, including internal technical teams, leadership, and customers. These dashboards offer a centralized, visual representation of system health and business performance, enhancing transparency and decision-making efficiency. **Desired outcome** - Your dashboards provide a comprehensive view of the system and business metrics relevant to different stakeholders. - Stakeholders can proactively access operational information, reducing the need for frequent status requests. - Real-time decision-making is enhanced during normal operations and incidents. **Common anti-patterns** - Engineers joining an incident management call require status updates to get up to speed. - Relying on manual reporting for management, which leads to delays and potential inaccuracies. - Operations teams are frequently interrupted for status updates during incidents. **Benefits of establishing this best practice** - Empowers stakeholders with immediate access to critical information, promoting informed decision-making. - Reduces operational inefficiencies by minimizing manual reporting and frequent status inquiries. - Increases transparency and trust through real-time visibility into system performance and business metrics. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Dashboards effectively communicate the status of your system and business metrics and can be tailored to the needs of different audience groups. Tools like Amazon CloudWatch dashboards and Amazon QuickSight help you create interactive, real-time dashboards for system monitoring and business intelligence. ### Implementation steps 1. Identify stakeholder needs: Determine the specific information needs of different audience groups, such as technical teams, leadership, and customers. 2. Choose the right tools: Select appropriate tools like Amazon CloudWatch dashboards for system monitoring and Amazon QuickSight for interactive business intelligence. AWS Health provides a ready-to-use experience in the AWS Health Dashboard, or you can use Health events in Amazon EventBridge or through the AWS Health API to augment your own dashboards. 3. Design effective dashboards: - Design dashboards to clearly present relevant metrics and KPIs, ensuring they are understandable and actionable. - Incorporate system-level and business-level views as needed. - Include both high-level (for broad overviews) and low-level (for detailed analysis) dashboards. - Integrate automated alarms within dashboards to highlight critical issues. - Annotate dashboards with important metrics thresholds and goals for immediate visibility. 4. Integrate data sources: - Use Amazon CloudWatch to aggregate and display metrics from various AWS services and query metrics from other data sources, creating a unified view of your system's health and business metrics. - Use features like CloudWatch Logs Insights to query and visualize log data from different applications and services. - Use AWS Health events to stay informed about the operational status and confirmed operational issues from AWS services through the AWS Health API or AWS Health events on Amazon EventBridge. 5. Provide self-service access: - Share CloudWatch dashboards with relevant stakeholders for self-service information access using dashboard sharing features. - Ensure that dashboards are easily accessible and provide real-time, up-to-date information. 6. 
Regularly update and refine: - Continually update and refine dashboards to align with evolving business needs and stakeholder feedback. - Regularly review the dashboards to keep them relevant and effective for conveying the necessary information.
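To pull AWS-side status into your own dashboards (steps 2 and 4), you can query the AWS Health API. The boto3 sketch below lists open and upcoming Health events; note that programmatic access to AWS Health requires a Business, Enterprise On-Ramp, or Enterprise Support plan, and the fields printed here are only a subset of what the API returns.

```python
import boto3

# The AWS Health API is served from a global endpoint; us-east-1 is the conventional Region.
health = boto3.client("health", region_name="us-east-1")

response = health.describe_events(
    filter={"eventStatusCodes": ["open", "upcoming"]}
)

for event in response["events"]:
    # Surface the essentials a status dashboard needs: service, event type, Region, status.
    print(event["service"], event["eventTypeCode"], event.get("region"), event["statusCode"])
```

The same data is also available as events on Amazon EventBridge if you prefer a push model over polling.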

💼 OPS10-BP07 Automate responses to events

Automating event responses is key for fast, consistent, and error-free operational handling. Create streamlined processes and use tools to automatically manage and respond to events, minimizing manual interventions and enhancing operational effectiveness. **Desired outcome** - Reduced human errors and faster resolution times through automation. - Consistent and reliable operational event handling. - Enhanced operational efficiency and system reliability. **Common anti-patterns** - Manual event handling leads to delays and errors. - Automation is overlooked in repetitive, critical tasks. - Repetitive, manual tasks lead to alert fatigue and missing critical issues. **Benefits of establishing this best practice** - Accelerated event responses, reducing system downtime. - Reliable operations with automated and consistent event handling. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Incorporate automation to create efficient operational workflows and minimize manual interventions. ### Implementation steps 1. Identify automation opportunities: Determine repetitive tasks for automation, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing. 2. Identify automation prompts: - Assess and define specific conditions or metrics that initiate automated responses using Amazon CloudWatch alarm actions. - Use Amazon EventBridge to respond to events in AWS services, custom workloads, and SaaS applications. - Consider initiation events such as specific log entries, performance metrics thresholds, or state changes in AWS resources. 3. Implement event-driven automation: - Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks. - Creating incidents in Incident Manager automatically gathers and adds details about the involved AWS resources to the incident. - Proactively monitor quotas using Quota Monitor for AWS. - Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance. - Automate development pipelines with Amazon CodeCatalyst. - Smoke test or continually monitor endpoints and APIs using synthetic monitoring. 4. Perform risk mitigation through automation: - Implement automated security responses to swiftly address risks. - Use AWS Systems Manager State Manager to reduce configuration drift. - Remediate noncompliant resources with AWS Config Rules. **Level of effort for the implementation plan:** High
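As one small example of step 3, the sketch below starts the AWS-owned AWS-RestartEC2Instance Automation runbook against a placeholder instance ID. In practice, a call like this would typically be wired behind an EventBridge rule or a CloudWatch alarm action rather than run by hand.

```python
import boto3

ssm = boto3.client("ssm")

# AWS-RestartEC2Instance is an AWS-owned Automation runbook; the instance ID is a placeholder.
execution = ssm.start_automation_execution(
    DocumentName="AWS-RestartEC2Instance",
    Parameters={"InstanceId": ["i-0123456789abcdef0"]},
)
print("Automation execution started:", execution["AutomationExecutionId"])
```

The execution ID can be recorded on the related incident or OpsItem so responders can see what remediation has already run.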

💼 OPS11-BP01 Have a process for continuous improvement

Evaluate your workload against internal and external architecture best practices. Conduct frequent, intentional workload reviews. Prioritize improvement opportunities into your software development cadence. **Desired outcome** - You analyze your workload against architecture best practices frequently. - You give improvement opportunities equal priority to features in your software development process. **Common anti-patterns** - You have not conducted an architecture review on your workload since it was deployed several years ago. - You give a lower priority to improvement opportunities. Compared to new features, these opportunities stay in the backlog. - There is no standard for implementing modifications to best practices for the organization. **Benefits of establishing this best practice** - Your workload is kept up-to-date on architecture best practices. - You evolve your workload in an intentional manner. - You can leverage organization best practices to improve all workloads. - You make marginal gains that have a cumulative impact, which drives deeper efficiencies. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Frequently conduct an architectural review of your workload. Use internal and external best practices, evaluate your workload, and identify improvement opportunities. Prioritize improvement opportunities into your software development cadence. ### Implementation steps 1. Conduct periodic architecture reviews of your production workload with an agreed-upon frequency. Use a documented architectural standard that includes AWS-specific best practices. 1. Use your internally-defined standards for these reviews. If you do not have an internal standard, use the AWS Well-Architected Framework. 2. Use the AWS Well-Architected Tool to create a custom lens of your internal best practices and conduct your architecture review. 3. Contact your AWS Solution Architect or Technical Account Manager to conduct a guided Well-Architected Framework Review of your workload. 2. Prioritize improvement opportunities identified during the review into your software development process. **Level of effort for the implementation plan:** Low. You can use the AWS Well-Architected Framework to conduct your yearly architecture review.

💼 OPS11-BP02 Perform post-incident analysis

Review customer-impacting events and identify the contributing factors and preventative actions. Use this information to develop mitigations to limit or prevent recurrence. Develop procedures for prompt and effective responses. Communicate contributing factors and corrective actions as appropriate, tailored to target audiences. **Desired outcome** - You have established incident management processes that include post-incident analysis. - You have observability plans in place to collect data on events. - With this data, you understand and collect metrics that support your post-incident analysis process. - You learn from incidents to improve future outcomes. **Common anti-patterns** - You administer an application server. Approximately every 23 hours and 55 minutes all your active sessions are terminated. You have tried to identify what is going wrong on your application server. You suspect it could instead be a network issue but are unable to get cooperation from the network team as they are too busy to support you. You lack a predefined process to follow to get support and collect the information necessary to determine what is going on. - You have had data loss within your workload. This is the first time it has happened and the cause is not obvious. You decide it is not important because you can recreate the data. Data loss starts occurring with greater frequency impacting your customers. This also places additional operational burden on you as you restore the missing data. **Benefits of establishing this best practice** - You have a predefined process to determine the components, conditions, actions, and events that contributed to an incident, which helps you identify opportunities for improvement. - You use data from post-incident analysis to make improvements. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Use a process to determine contributing factors. Review all customer-impacting incidents. Have a process to identify and document the contributing factors of an incident so that you can develop mitigations to limit or prevent recurrence and you can develop procedures for prompt and effective responses. Communicate incident root causes as appropriate, and tailor the communication to your target audience. Share learnings openly within your organization. ### Implementation steps 1. Collect metrics such as deployment change, configuration change, incident start time, alarm time, time of engagement, mitigation start time, and incident resolved time. 2. Describe key time points on the timeline to understand the events of the incident. 3. Ask the following questions: - Could you improve time to detection? - Are there updates to metrics and alarms that would detect the incident sooner? - Can you improve the time to diagnosis? - Are there updates to your response plans or escalation plans that would engage the correct responders sooner? - Can you improve the time to mitigation? - Are there runbook or playbook steps that you could add or improve? - Can you prevent future incidents from occurring? 4. Create checklists and actions. Track and deliver all actions. **Level of effort for the implementation plan:** Medium
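To make step 1 concrete, the sketch below computes common post-incident timeline metrics (time to detect, engage, mitigate, and resolve) from manually collected timestamps. The timeline values are illustrative.

```python
from datetime import datetime

# Illustrative incident timeline (UTC) collected during the post-incident review.
timeline = {
    "deployment":   datetime(2024, 5, 2, 13, 40),
    "impact_start": datetime(2024, 5, 2, 14, 5),
    "alarm":        datetime(2024, 5, 2, 14, 19),
    "engaged":      datetime(2024, 5, 2, 14, 31),
    "mitigated":    datetime(2024, 5, 2, 15, 2),
    "resolved":     datetime(2024, 5, 2, 16, 45),
}


def minutes_between(start_key: str, end_key: str) -> float:
    """Elapsed minutes between two named points on the incident timeline."""
    return (timeline[end_key] - timeline[start_key]).total_seconds() / 60


print(f"Time to detect:   {minutes_between('impact_start', 'alarm'):.0f} min")
print(f"Time to engage:   {minutes_between('alarm', 'engaged'):.0f} min")
print(f"Time to mitigate: {minutes_between('impact_start', 'mitigated'):.0f} min")
print(f"Time to resolve:  {minutes_between('impact_start', 'resolved'):.0f} min")
```

Tracking these durations across incidents gives you the data to answer the detection, diagnosis, and mitigation questions in step 3.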

💼 OPS11-BP03 Implement feedback loops

Feedback loops provide actionable insights that drive decision making. Build feedback loops into your procedures and workloads. This helps you identify issues and areas that need improvement. They also validate investments made in improvements. These feedback loops are the foundation for continuously improving your workload. Feedback loops fall into two categories: immediate feedback and retrospective analysis. Immediate feedback is gathered through review of the performance and outcomes from operations activities. This feedback comes from team members, customers, or the automated output of the activity. Immediate feedback is received from things like A/B testing and shipping new features, and it is essential to failing fast. Retrospective analysis is performed regularly to capture feedback from the review of operational outcomes and metrics over time. These retrospectives happen at the end of a sprint, on a cadence, or after major releases or events. This type of feedback loop validates investments in operations or your workload. It helps you measure success and validates your strategy. **Desired outcome** - You use immediate feedback and retrospective analysis to drive improvements. - There is a mechanism to capture user and team member feedback. - Retrospective analysis is used to identify trends that drive improvements. **Common anti-patterns** - You launch a new feature but have no way of receiving customer feedback on it. - After investing in operations improvements, you don't conduct a retrospective to validate them. - You collect customer feedback but don't regularly review it. - Feedback loops lead to proposed action items but they aren't included in the software development process. - Customers don't receive feedback on improvements they've proposed. **Benefits of establishing this best practice** - You can work backwards from the customer to drive new features. - Your organization culture can react to changes faster. - Trends are used to identify improvement opportunities. - Retrospectives validate investments made to your workload and operations. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implementing this best practice means that you use both immediate feedback and retrospective analysis. These feedback loops drive improvements. There are many mechanisms for immediate feedback, including surveys, customer polls, or feedback forms. Your organization also uses retrospectives to identify improvement opportunities and validate initiatives. **Customer example** AnyCompany Retail created a web form where customers can give feedback or report issues. During the weekly scrum, user feedback is evaluated by the software development team. Feedback is regularly used to steer the evolution of their platform. They conduct a retrospective at the end of each sprint to identify items they want to improve. ### Implementation steps **Immediate feedback** 1. You need a mechanism to receive feedback from customers and team members. Your operations activities can also be configured to deliver automated feedback. 2. Your organization needs a process to review this feedback, determine what to improve, and schedule the improvement. 3. Feedback must be added to your software development process. 4. As you make improvements, follow up with the feedback submitter. 5. You can use AWS Systems Manager OpsCenter to create and track these improvements as OpsItems. **Retrospective analysis** 1.
Conduct retrospectives at the end of a development cycle, on a set cadence, or after a major release. 2. Gather stakeholders involved in the workload for a retrospective meeting. 3. Create three columns on a whiteboard or spreadsheet: Stop, Start, and Keep. - Stop is for anything that you want your team to stop doing. - Start is for ideas that you want to start doing. - Keep is for items that you want to keep doing. 4. Go around the room and gather feedback from the stakeholders. 5. Prioritize the feedback. Assign actions and stakeholders to any Start or Keep items. 6. Add the actions to your software development process and communicate status updates to stakeholders as you make the improvements. **Level of effort for the implementation plan:** Medium. To implement this best practice, you need a way to take in immediate feedback and analyze it. Also, you need to establish a retrospective analysis process.

💼 OPS11-BP04 Perform knowledge management

Knowledge management helps team members find the information they need to perform their jobs. In learning organizations, information is freely shared, which empowers individuals. The information can be discovered or searched. Information is accurate and up to date. Mechanisms exist to create new information, update existing information, and archive outdated information. The most common example of a knowledge management platform is a content management system like a wiki. **Desired outcome** - Team members have access to timely, accurate information. - Information is searchable. - Mechanisms exist to add, update, and archive information. **Common anti-patterns** - There is no centralized knowledge storage. Team members manage their own notes on their local machines. - You have a self-hosted wiki but no mechanisms to manage information, resulting in outdated information. - Someone identifies missing information but there's no process to request adding it to the team wiki. They add it themselves, but miss a key step, leading to an outage. **Benefits of establishing this best practice** - Team members are empowered because information is shared freely. - New team members are onboarded faster because documentation is up to date and searchable. - Information is timely, accurate, and actionable. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Knowledge management is an important facet of learning organizations. To begin, you need a central repository to store your knowledge (as a common example, a self-hosted wiki). You must develop processes for adding, updating, and archiving knowledge. Develop standards for what should be documented and let everyone contribute. **Customer example** AnyCompany Retail hosts an internal wiki where all knowledge is stored. Team members are encouraged to add to the knowledge base as they go about their daily duties. On a quarterly basis, a cross-functional team evaluates which pages are least updated and determines if they should be archived or updated. ### Implementation steps 1. Start by identifying the content management system where knowledge will be stored. Get agreement from stakeholders across your organization. 1. If you don't have an existing content management system, consider running a self-hosted wiki or using a version control repository as a starting point. 2. Develop runbooks for adding, updating, and archiving information. Educate your team on these processes. 3. Identify what knowledge should be stored in the content management system. Start with daily activities (runbooks and playbooks) that team members perform. Work with stakeholders to prioritize what knowledge is added. 4. On a periodic basis, work with stakeholders to identify out-of-date information and archive it or bring it up to date. **Level of effort for the implementation plan:** Medium. If you don't have an existing content management system, you can set up a self-hosted wiki or a version-controlled document repository.

💼 OPS11-BP05 Define drivers for improvement

Identify drivers for improvement to help you evaluate and prioritize opportunities based on data and feedback loops. Explore improvement opportunities in your systems and processes, and automate where appropriate. **Desired outcome** - You track data from across your environment. - You correlate events and activities to business outcomes. - You can compare and contrast between environments and systems. - You maintain a detailed activity history of your deployments and outcomes. - You collect data to support your security posture. **Common anti-patterns** - You collect data from across your environment but do not correlate events and activities. - You collect detailed data from across your estate, and it drives high Amazon CloudWatch and AWS CloudTrail activity and cost. However, you do not use this data meaningfully. - You do not account for business outcomes when defining drivers for improvement. - You do not measure the effects of new features. **Benefits of establishing this best practice** - You minimize the impact of event-based motivations or emotional investment by determining criteria for improvement. - You respond to business events, not just technical ones. - You measure your environment to identify areas of improvement. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance - Understand drivers for improvement: You should only make changes to a system when a desired outcome is supported. - Desired capabilities: Evaluate desired features and capabilities when evaluating opportunities for improvement. - Unacceptable issues: Evaluate unacceptable issues, bugs, and vulnerabilities when evaluating opportunities for improvement. Track rightsizing options, and seek optimization opportunities. - Compliance requirements: Evaluate updates and changes required to maintain compliance with regulation, policy, or to remain under support from a third party, when reviewing opportunities for improvement.

💼 OPS11-BP06 Validate insights

Review your analysis results and responses with cross-functional teams and business owners. Use these reviews to establish common understanding, identify additional impacts, and determine courses of action. Adjust responses as appropriate. **Desired outcome** - You review insights with business owners on a regular basis. Business owners provide additional context to newly-gained insights. - You review insights and request feedback from technical peers, and you share your learnings across teams. - You publish data and insights for other technical and business teams to review. Other departments factor your learnings into new practices. - You summarize and review new insights with senior leaders. Senior leaders use new insights to define strategy. **Common anti-patterns** - You release a new feature. This feature changes some of your customer behaviors. Your observability does not take these changes into account. You do not quantify the benefits of these changes. - You push a new update and neglect to refresh your CDN. The CDN cache is no longer compatible with the latest release. You measure the percentage of requests with errors. All of your users report HTTP 400 errors when communicating with backend servers. You investigate the client errors and find that because you measured the wrong dimension, your time was wasted. - Your service-level agreement stipulates 99.9% uptime, and your recovery point objective is four hours. The service owner maintains that the system is zero downtime. You implement an expensive and complex replication solution, which wastes time and money. **Benefits of establishing this best practice** - When you validate insights with business owners and subject matter experts, you establish common understanding and more effectively guide improvement. - You discover hidden issues and factor them into future decisions. - Your focus moves from technical outcomes to business outcomes. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Validate insights: Engage with business owners and subject matter experts to ensure there is common understanding and agreement on the meaning of the data you have collected. Identify additional concerns and potential impacts, and determine courses of action.

💼 OPS11-BP07 Perform operations metrics reviews

Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business. Use these reviews to identify opportunities for improvement, potential courses of action, and to share lessons learned. Look for opportunities to improve in all of your environments (for example, development, test, and production). **Desired outcome** - You frequently review business-affecting metrics. - You detect and review anomalies through your observability capabilities. - You use data to support business outcomes and goals. **Common anti-patterns** - Your maintenance window interrupts a significant retail promotion. The business remains unaware that there is a standard maintenance window that could be delayed if there are other business impacting events. - You suffered an extended outage because you commonly use an outdated library in your organization. You have since migrated to a supported library. The other teams in your organization do not know that they are at risk. - You do not regularly review attainment of customer SLAs. You are trending to not meet your customer SLAs. There are financial penalties related to not meeting your customer SLAs. **Benefits of establishing this best practice** - When you meet regularly to review operations metrics, events, and incidents, you maintain common understanding across teams. - Your team meets routinely to review metrics and incidents, which positions you to take action on risks and recognize customer SLAs. - You share lessons learned, which provides data for prioritization and targeted improvements for business outcomes. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance - Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business. - Engage stakeholders, including the business, development, and operations teams, to validate your findings from immediate feedback and retrospective analysis and share lessons learned. - Use their insights to identify opportunities for improvement and potential courses of action.

💼 OPS11-BP08 Document and share lessons learned

Document and share lessons learned from your operations activities so that you can use them internally and across teams. You should share what your teams learn to increase the benefit across your organization. Share information and resources to prevent avoidable errors and ease development efforts, and focus on delivery of desired features. Use AWS Identity and Access Management (IAM) to define permissions that permit controlled access to the resources you wish to share within and across accounts. **Desired outcome** - You use version-controlled repositories to share application libraries, scripted procedures, procedure documentation, and other system documentation. - You share your infrastructure standards as version-controlled AWS CloudFormation templates. - You review lessons learned across teams. **Common anti-patterns** - You suffered an extended outage because your organization commonly uses a buggy library. You have since migrated to a reliable library. The other teams in your organization do not know they are at risk, because no one has documented and shared the experience with this library. - You have identified an edge case in an internally-shared microservice that causes sessions to drop. You have updated your calls to the service to avoid this edge case. The other teams in your organization do not know that they are at risk. - You have found a way to significantly reduce the CPU utilization requirements for one of your microservices. You do not know if any other teams could take advantage of this technique. **Benefits of establishing this best practice:** Share lessons learned to support improvement and to maximize the benefits of experience. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance - Document and share lessons learned: Have procedures to document the lessons learned from the running of operations activities and retrospective analysis so that they can be used by other teams. - Share learnings: Have procedures to share lessons learned and associated artifacts across teams. For example, share updated procedures, guidance, governance, and best practices through an accessible wiki. Share scripts, code, and libraries through a common repository. - Leverage AWS re:Post Private as a knowledge service to streamline collaboration and knowledge sharing within your organization.

๐Ÿ’ผ OPS11-BP09 Allocate time to make improvements

Dedicate time and resources within your processes to make continuous incremental improvements possible. **Desired outcome** - You create temporary duplicates of environments, which lowers the risk, effort, and cost of experimentation and testing. - These duplicated environments can be used to test the conclusions from your analysis, experiment, and develop and test planned improvements. - You run game days, and you use AWS Fault Injection Service (AWS FIS) to provide the controls and guardrails that teams need to run experiments in a production-like environment. **Common anti-patterns** - There is a known performance issue in your application server. It is added to the backlog behind every planned feature implementation. If the rate of planned features being added remains constant, the performance issue will never be addressed. - To support continual improvement, you approve administrators and developers using all their extra time to select and implement improvements. No improvements are ever completed. - Operational acceptance is complete, and you do not test operational practices again. **Benefits of establishing this best practice:** By dedicating time and resources within your processes, you can make continuous, incremental improvements possible. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance - Allocate time to make improvements: Dedicate time and resources within your processes to make continuous, incremental improvements. - Implement changes to improve and evaluate the results to determine success. - If the results do not satisfy the goals and the improvement is still a priority, pursue alternative courses of action. - Simulate production workloads through game days, and use learnings from these simulations to improve.
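
As an illustration of using AWS FIS for a game day, the sketch below creates an experiment template that stops one tagged EC2 instance and halts the experiment if a CloudWatch alarm fires. The role ARN, alarm ARN, and tag values are placeholders, and the action parameters should be checked against the current FIS documentation before use.

```python
# Illustrative sketch of a FIS experiment template for a game day:
# stop one tagged EC2 instance, guarded by a CloudWatch alarm stop condition.
# The role ARN, alarm ARN, and tag values are placeholders.
import uuid
import boto3

fis = boto3.client("fis")

template = fis.create_experiment_template(
    clientToken=str(uuid.uuid4()),
    description="Game day: stop one instance in the test fleet",
    roleArn="arn:aws:iam::111122223333:role/fis-experiment-role",  # placeholder
    stopConditions=[
        {
            "source": "aws:cloudwatch:alarm",
            "value": "arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighErrorRate",  # placeholder
        }
    ],
    targets={
        "TestInstances": {
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"Environment": "game-day"},  # placeholder tag
            "selectionMode": "COUNT(1)",
        }
    },
    actions={
        "StopOneInstance": {
            "actionId": "aws:ec2:stop-instances",
            "parameters": {"startInstancesAfterDuration": "PT10M"},
            "targets": {"Instances": "TestInstances"},
        }
    },
)

print(template["experimentTemplate"]["id"])
```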

๐Ÿ’ผ Organization priorities

Your teams need to have a shared understanding of your entire workload, their role in it, and shared business goals to set the priorities that will create business success. Well-defined priorities will maximize the benefits of your efforts. Review your priorities regularly so that they can be updated as your organization's needs change.

๐Ÿ’ผ Organizational Context (GV.OC)

The circumstances - mission, stakeholder expectations, dependencies, and legal, regulatory, and contractual requirements - surrounding the organization's cybersecurity risk management decisions are understood.

๐Ÿ’ผ Oversight (GV.OV)

Results of organization-wide cybersecurity risk management activities and performance are used to inform, improve, and adjust the risk management strategy

๐Ÿ’ผ P1.1-1 Communicates to Data Subjects

Notice is provided to data subjects regarding the following: - Purpose for collecting personal information - Choice and consent - Types of personal information collected - Methods of collection (for example, use of cookies or other tracking techniques) - Use, retention, and disposal - Access - Disclosure to third parties - Security for privacy - Quality, including data subjects' responsibilities for quality - Monitoring and enforcement If personal information is collected from sources other than the individual, such sources are described in the privacy notice.

๐Ÿ’ผ P1.1-2 Provides Notice to Data Subjects

Notice is provided to data subjects (1) at or before the time personal information is collected or as soon as practical thereafter, (2) at or before the entity changes its privacy notice or as soon as practical thereafter, or (3) before personal information is used for new purposes not previously identified.

๐Ÿ’ผ P2.1 The entity communicates choices available regarding the collection, use, retention, disclosure, and disposal of personal information to the data subjects and the consequences, if any, of each choice.

Explicit consent for the collection, use, retention, disclosure, and disposal of personal information is obtained from data subjects or other authorized persons, if required. Such consent is obtained only for the intended purpose of the information to meet the entity's objectives related to privacy. The entity's basis for determining implicit consent for the collection, use, retention, disclosure, and disposal of personal information is documented.

๐Ÿ’ผ P2.1-1 Communicates to Data Subjects

Data subjects are informed (a) about the choices available to them with respect to the collection, use, and disclosure of personal information and (b) that implicit or explicit consent is required to collect, use, and disclose personal information, unless a law or regulation specifically requires or allows otherwise.

๐Ÿ’ผ P2.1-3 Obtains Implicit or Explicit Consent

Implicit or explicit consent is obtained from data subjects at or before the time personal information is collected or soon thereafter. The individualโ€™s preferences expressed in his or her consent are confirmed and implemented.

๐Ÿ’ผ P3.1-2 Collects Information by Fair and Lawful Means

Methods of collecting personal information are reviewed by management before they are implemented to confirm that personal information is obtained (a) fairly, without intimidation or deception, and (b) lawfully, adhering to all relevant rules of law, whether derived from statute or common law, relating to the collection of personal information.

๐Ÿ’ผ P5.1-1 Responds to Data Controller Requests

The entity has a process to respond to data subject requests received from data controllers in accordance with service agreements and privacy objectives. Such process may include authentication of the request, permitting access where appropriate, responding within a reasonable time, and notification if the request is denied.

๐Ÿ’ผ P5.2-1 Responds to Data Controller Requests

The entity has a process to respond to data controllers' update requests, including updates to personal information and denial of requests, in accordance with service agreements to support the achievement of the entity's objectives related to privacy.

๐Ÿ’ผ P5.2-2 Communicates Denial of Access Requests

Data subjects are informed, in writing, of the reason a request for access to their personal information was denied, the source of the entityโ€™s legal right to deny such access, if applicable, and the individualโ€™s right, if any, to challenge such denial, as specifically permitted or required by law or regulation.

๐Ÿ’ผ P6.1-3 Discloses Personal Information Only to Appropriate Third Parties

Personal information is disclosed only to third parties who have agreements with the entity to protect personal information in a manner consistent with the relevant aspects of the entityโ€™s privacy notice or other specific instructions or requirements. The entity has procedures in place to evaluate that the third parties have effective controls to meet the terms of the agreement, instructions, or requirements.

๐Ÿ’ผ PE-1 PHYSICAL AND ENVIRONMENTAL PROTECTION POLICY AND PROCEDURES

The organization: PE-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: PE-1a.1. A physical and environmental protection policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and PE-1a.2. Procedures to facilitate the implementation of the physical and environmental protection policy and associated physical and environmental protection controls; and PE-1b. Reviews and updates the current: PE-1b.1. Physical and environmental protection policy [Assignment: organization-defined frequency]; and PE-1b.2. Physical and environmental protection procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ PE-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] physical and environmental protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the physical and environmental protection policy and the associated physical and environmental protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the physical and environmental protection policy and procedures; and c. Review and update the current physical and environmental protection: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ PE-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] physical and environmental protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the physical and environmental protection policy and the associated physical and environmental protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the physical and environmental protection policy and procedures; and c. Review and update the current physical and environmental protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PE-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] physical and environmental protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the physical and environmental protection policy and the associated physical and environmental protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the physical and environmental protection policy and procedures; and c. Review and update the current physical and environmental protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PE-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] physical and environmental protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the physical and environmental protection policy and the associated physical and environmental protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the physical and environmental protection policy and procedures; and c. Review and update the current physical and environmental protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PE-10 Emergency Shutoff

a. Provide the capability of shutting off power to [Assignment: organization-defined system or individual system components] in emergency situations; b. Place emergency shutoff switches or devices in [Assignment: organization-defined location by system or system component] to facilitate access for authorized personnel; and c. Protect emergency power shutoff capability from unauthorized activation.

๐Ÿ’ผ PE-10 EMERGENCY SHUTOFF

The organization: PE-10a. Provides the capability of shutting off power to the information system or individual system components in emergency situations; PE-10b. Places emergency shutoff switches or devices in [Assignment: organization-defined location by information system or system component] to facilitate safe and easy access for personnel; and PE-10c. Protects emergency power shutoff capability from unauthorized activation.

๐Ÿ’ผ PE-10 Emergency Shutoff (M)(H)

a. Provide the capability of shutting off power to [Assignment: organization-defined system or individual system components] in emergency situations; b. Place emergency shutoff switches or devices in [FedRAMP Assignment: near more than one egress point of the IT area and ensures it is labeled and protected by a cover to prevent accidental shut-off] to facilitate access for authorized personnel; and c. Protect emergency power shutoff capability from unauthorized activation.

๐Ÿ’ผ PE-10 Emergency Shutoff (M)(H)

a. Provide the capability of shutting off power to [Assignment: organization-defined system or individual system components] in emergency situations; b. Place emergency shutoff switches or devices in [FedRAMP Assignment: near more than one egress point of the IT area and ensures it is labeled and protected by a cover to prevent accidental shut-off] to facilitate access for authorized personnel; and c. Protect emergency power shutoff capability from unauthorized activation.

๐Ÿ’ผ PE-11 (2) LONG-TERM ALTERNATE POWER SUPPLY - SELF-CONTAINED

The organization provides a long-term alternate power supply for the information system that is: PE-11 (2)(a) Self-contained; PE-11 (2)(b) Not reliant on external power generation; and PE-11 (2)(c) Capable of maintaining [Selection: minimally required operational capability; full operational capability] in the event of an extended loss of the primary power source.

๐Ÿ’ผ PE-11 Emergency Power

Provide an uninterruptible power supply to facilitate [Selection (one or more): an orderly shutdown of the system; transition of the system to long-term alternate power] in the event of a primary power source loss.

๐Ÿ’ผ PE-11 EMERGENCY POWER

The organization provides a short-term uninterruptible power supply to facilitate [Selection (one or more): an orderly shutdown of the information system; transition of the information system to long-term alternate power] in the event of a primary power source loss.

๐Ÿ’ผ PE-11 Emergency Power (M)(H)

Provide an uninterruptible power supply to facilitate [Selection (one-or-more): an orderly shutdown of the system; transition of the system to long-term alternate power] in the event of a primary power source loss.

๐Ÿ’ผ PE-11 Emergency Power (M)(H)

Provide an uninterruptible power supply to facilitate [Selection (one-or-more): an orderly shutdown of the system; transition of the system to long-term alternate power] in the event of a primary power source loss.

๐Ÿ’ผ PE-11(2) Emergency Power | Alternate Power Supply โ€” Self-contained

Provide an alternate power supply for the system that is activated [Selection: manually; automatically] and that is: (a) Self-contained; (b) Not reliant on external power generation; and (c) Capable of maintaining [Selection: minimally required operational capability; full operational capability] in the event of an extended loss of the primary power source.

๐Ÿ’ผ PE-12 Emergency Lighting

Employ and maintain automatic emergency lighting for the system that activates in the event of a power outage or disruption and that covers emergency exits and evacuation routes within the facility.

๐Ÿ’ผ PE-12 EMERGENCY LIGHTING

The organization employs and maintains automatic emergency lighting for the information system that activates in the event of a power outage or disruption and that covers emergency exits and evacuation routes within the facility.

๐Ÿ’ผ PE-12 Emergency Lighting (L)(M)(H)

Employ and maintain automatic emergency lighting for the system that activates in the event of a power outage or disruption and that covers emergency exits and evacuation routes within the facility.

๐Ÿ’ผ PE-12 Emergency Lighting (L)(M)(H)

Employ and maintain automatic emergency lighting for the system that activates in the event of a power outage or disruption and that covers emergency exits and evacuation routes within the facility.

๐Ÿ’ผ PE-12 Emergency Lighting (L)(M)(H)

Employ and maintain automatic emergency lighting for the system that activates in the event of a power outage or disruption and that covers emergency exits and evacuation routes within the facility.

๐Ÿ’ผ PE-13 (1) DETECTION DEVICES | SYSTEMS

The organization employs fire detection devices/systems for the information system that activate automatically and notify [Assignment: organization-defined personnel or roles] and [Assignment: organization-defined emergency responders] in the event of a fire.

๐Ÿ’ผ PE-13 (2) SUPPRESSION DEVICES | SYSTEMS

The organization employs fire suppression devices/systems for the information system that provide automatic notification of any activation to [Assignment: organization-defined personnel or roles] and [Assignment: organization-defined emergency responders].

๐Ÿ’ผ PE-13 (4) INSPECTIONS

The organization ensures that the facility undergoes [Assignment: organization-defined frequency] inspections by authorized and qualified inspectors and resolves identified deficiencies within [Assignment: organization-defined time period].

๐Ÿ’ผ PE-13 FIRE PROTECTION

The organization employs and maintains fire suppression and detection devices/systems for the information system that are supported by an independent energy source.

๐Ÿ’ผ PE-13(4) Fire Protection | Inspections

Ensure that the facility undergoes [Assignment: organization-defined frequency] fire protection inspections by authorized and qualified inspectors and identified deficiencies are resolved within [Assignment: organization-defined time period].

๐Ÿ’ผ PE-14 Environmental Controls

a. Maintain [Selection (one or more): temperature; humidity; pressure; radiation; [Assignment: organization-defined environmental control]] levels within the facility where the system resides at [Assignment: organization-defined acceptable levels]; and b. Monitor environmental control levels [Assignment: organization-defined frequency].

๐Ÿ’ผ PE-14 Environmental Controls (L)(M)(H)

a. Maintain [FedRAMP Assignment: consistent with American Society of Heating, Refrigerating and Air-conditioning Engineers (ASHRAE) document entitled Thermal Guidelines for Data Processing Environments] levels within the facility where the system resides at [Assignment: organization-defined acceptable levels]; and b. Monitor environmental control levels [FedRAMP Assignment: continuously]. **PE-14 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider measures temperature at server inlets and humidity levels by dew point.

๐Ÿ’ผ PE-14 Environmental Controls (L)(M)(H)

a. Maintain [FedRAMP Assignment: consistent with American Society of Heating, Refrigerating and Air-conditioning Engineers (ASHRAE) document entitled Thermal Guidelines for Data Processing Environments] levels within the facility where the system resides at [Assignment: organization-defined acceptable levels]; and b. Monitor environmental control levels [FedRAMP Assignment: continuously]. **PE-14 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider measures temperature at server inlets and humidity levels by dew point.

๐Ÿ’ผ PE-14 Environmental Controls (L)(M)(H)

a. Maintain [FedRAMP Assignment: consistent with American Society of Heating, Refrigerating and Air-conditioning Engineers (ASHRAE) document entitled Thermal Guidelines for Data Processing Environments] levels within the facility where the system resides at [Assignment: organization-defined acceptable levels]; and b. Monitor environmental control levels [FedRAMP Assignment: continuously]. **PE-14 Additional FedRAMP Requirements and Guidance:** **(a) Requirement**: The service provider measures temperature at server inlets and humidity levels by dew point.

๐Ÿ’ผ PE-14 TEMPERATURE AND HUMIDITY CONTROLS

The organization: PE-14a. Maintains temperature and humidity levels within the facility where the information system resides at [Assignment: organization-defined acceptable levels]; and PE-14b. Monitors temperature and humidity levels [Assignment: organization-defined frequency].

๐Ÿ’ผ PE-15 (1) AUTOMATION SUPPORT

The organization employs automated mechanisms to detect the presence of water in the vicinity of the information system and alerts [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ PE-15 Water Damage Protection

Protect the system from damage resulting from water leakage by providing master shutoff or isolation valves that are accessible, working properly, and known to key personnel.

๐Ÿ’ผ PE-15 WATER DAMAGE PROTECTION

The organization protects the information system from damage resulting from water leakage by providing master shutoff or isolation valves that are accessible, working properly, and known to key personnel.

๐Ÿ’ผ PE-15(1) Automation Support (H)

Detect the presence of water near the system and alert [FedRAMP Assignment: service provider building maintenance/physical security personnel] using [Assignment: organization-defined automated mechanisms].

๐Ÿ’ผ PE-16 Delivery and Removal

a. Authorize and control [Assignment: organization-defined types of system components] entering and exiting the facility; and b. Maintain records of the system components.

๐Ÿ’ผ PE-16 DELIVERY AND REMOVAL

The organization authorizes, monitors, and controls [Assignment: organization-defined types of information system components] entering and exiting the facility and maintains records of those items.

๐Ÿ’ผ PE-17 Alternate Work Site

a. Determine and document the [Assignment: organization-defined alternate work sites] allowed for use by employees; b. Employ the following controls at alternate work sites: [Assignment: organization-defined controls]; c. Assess the effectiveness of controls at alternate work sites; and d. Provide a means for employees to communicate with information security and privacy personnel in case of incidents.

๐Ÿ’ผ PE-17 ALTERNATE WORK SITE

The organization: PE-17a. Employs [Assignment: organization-defined security controls] at alternate work sites; PE-17b. Assesses as feasible, the effectiveness of security controls at alternate work sites; and PE-17c. Provides a means for employees to communicate with information security personnel in case of security incidents or problems.

๐Ÿ’ผ PE-17 Alternate Work Site (M)(H)

a. Determine and document the [Assignment: organization-defined alternate work sites] allowed for use by employees; b. Employ the following controls at alternate work sites: [Assignment: organization-defined controls]; c. Assess the effectiveness of controls at alternate work sites; and d. Provide a means for employees to communicate with information security and privacy personnel in case of incidents.

๐Ÿ’ผ PE-17 Alternate Work Site (M)(H)

a. Determine and document the [Assignment: organization-defined alternate work sites] allowed for use by employees; b. Employ the following controls at alternate work sites: [Assignment: organization-defined controls]; c. Assess the effectiveness of controls at alternate work sites; and d. Provide a means for employees to communicate with information security and privacy personnel in case of incidents.

๐Ÿ’ผ PE-18 (1) FACILITY SITE

The organization plans the location or site of the facility where the information system resides with regard to physical and environmental hazards and for existing facilities, considers the physical and environmental hazards in its risk mitigation strategy.

๐Ÿ’ผ PE-18 LOCATION OF INFORMATION SYSTEM COMPONENTS

The organization positions information system components within the facility to minimize potential damage from [Assignment: organization-defined physical and environmental hazards] and to minimize the opportunity for unauthorized access.

๐Ÿ’ผ PE-18 Location of System Components

Position system components within the facility to minimize potential damage from [Assignment: organization-defined physical and environmental hazards] and to minimize the opportunity for unauthorized access.

๐Ÿ’ผ PE-18 Location of System Components (H)

Position system components within the facility to minimize potential damage from [FedRAMP Assignment: physical and environmental hazards identified during threat assessment] and to minimize the opportunity for unauthorized access.

๐Ÿ’ผ PE-2 (2) TWO FORMS OF IDENTIFICATION

The organization requires two forms of identification from [Assignment: organization-defined list of acceptable forms of identification] for visitor access to the facility where the information system resides.

๐Ÿ’ผ PE-2 (3) RESTRICT UNESCORTED ACCESS

The organization restricts unescorted access to the facility where the information system resides to personnel with [Selection (one or more): security clearances for all information contained within the system; formal access authorizations for all information contained within the system; need for access to all information contained within the system; [Assignment: organization-defined credentials]].

๐Ÿ’ผ PE-2 Physical Access Authorizations

a. Develop, approve, and maintain a list of individuals with authorized access to the facility where the system resides; b. Issue authorization credentials for facility access; c. Review the access list detailing authorized facility access by individuals [Assignment: organization-defined frequency]; and d. Remove individuals from the facility access list when access is no longer required.

๐Ÿ’ผ PE-2 PHYSICAL ACCESS AUTHORIZATIONS

The organization: PE-2a. Develops, approves, and maintains a list of individuals with authorized access to the facility where the information system resides; PE-2b. Issues authorization credentials for facility access; PE-2c. Reviews the access list detailing authorized facility access by individuals [Assignment: organization-defined frequency]; and PE-2d. Removes individuals from the facility access list when access is no longer required.

๐Ÿ’ผ PE-2 Physical Access Authorizations (L)(M)(H)

a. Develop, approve, and maintain a list of individuals with authorized access to the facility where the system resides; b. Issue authorization credentials for facility access; c. Review the access list detailing authorized facility access by individuals [FedRAMP Assignment: at least annually]; and d. Remove individuals from the facility access list when access is no longer required.

๐Ÿ’ผ PE-2 Physical Access Authorizations (L)(M)(H)

a. Develop, approve, and maintain a list of individuals with authorized access to the facility where the system resides; b. Issue authorization credentials for facility access; c. Review the access list detailing authorized facility access by individuals [FedRAMP Assignment: at least annually]; and d. Remove individuals from the facility access list when access is no longer required.

๐Ÿ’ผ PE-2 Physical Access Authorizations (L)(M)(H)

a. Develop, approve, and maintain a list of individuals with authorized access to the facility where the system resides; b. Issue authorization credentials for facility access; c. Review the access list detailing authorized facility access by individuals [FedRAMP Assignment: at least annually]; and d. Remove individuals from the facility access list when access is no longer required.

๐Ÿ’ผ PE-2(3) Physical Access Authorizations | Restrict Unescorted Access

Restrict unescorted access to the facility where the system resides to personnel with [Selection (one or more): security clearances for all information contained within the system; formal access authorizations for all information contained within the system; need for access to all information contained within the system; [Assignment: organization-defined physical access authorizations]].

๐Ÿ’ผ PE-20 Asset Monitoring and Tracking

Employ [Assignment: organization-defined asset location technologies] to track and monitor the location and movement of [Assignment: organization-defined assets] within [Assignment: organization-defined controlled areas].

๐Ÿ’ผ PE-20 ASSET MONITORING AND TRACKING

The organization: PE-20a. Employs [Assignment: organization-defined asset location technologies] to track and monitor the location and movement of [Assignment: organization-defined assets] within [Assignment: organization-defined controlled areas]; and PE-20b. Ensures that asset location technologies are employed in accordance with applicable federal laws, Executive Orders, directives, regulations, policies, standards, and guidance.

๐Ÿ’ผ PE-22 Component Marking

Mark [Assignment: organization-defined system hardware components] indicating the impact level or classification level of the information permitted to be processed, stored, or transmitted by the hardware component.

๐Ÿ’ผ PE-23 Facility Location

a. Plan the location or site of the facility where the system resides considering physical and environmental hazards; and b. For existing facilities, consider the physical and environmental hazards in the organizational risk management strategy.

๐Ÿ’ผ PE-3 (1) INFORMATION SYSTEM ACCESS

The organization enforces physical access authorizations to the information system in addition to the physical access controls for the facility at [Assignment: organization-defined physical spaces containing one or more components of the information system].

๐Ÿ’ผ PE-3 (4) LOCKABLE CASINGS

The organization uses lockable physical casings to protect [Assignment: organization-defined information system components] from unauthorized physical access.

๐Ÿ’ผ PE-3 (5) TAMPER PROTECTION

The organization employs [Assignment: organization-defined security safeguards] to [Selection (one or more): detect; prevent] physical tampering or alteration of [Assignment: organization-defined hardware components] within the information system.

๐Ÿ’ผ PE-3 (6) FACILITY PENETRATION TESTING

The organization employs a penetration testing process that includes [Assignment: organization-defined frequency], unannounced attempts to bypass or circumvent security controls associated with physical access points to the facility.

๐Ÿ’ผ PE-3 Physical Access Control

a. Enforce physical access authorizations at [Assignment: organization-defined entry and exit points to the facility where the system resides] by: 1. Verifying individual access authorizations before granting access to the facility; and 2. Controlling ingress and egress to the facility using [Selection (one or more): [Assignment: organization-defined physical access control systems or devices]; guards]; b. Maintain physical access audit logs for [Assignment: organization-defined entry or exit points]; c. Control access to areas within the facility designated as publicly accessible by implementing the following controls: [Assignment: organization-defined physical access controls]; d. Escort visitors and control visitor activity [Assignment: organization-defined circumstances requiring visitor escorts and control of visitor activity]; e. Secure keys, combinations, and other physical access devices; f. Inventory [Assignment: organization-defined physical access devices] every [Assignment: organization-defined frequency]; and g. Change combinations and keys [Assignment: organization-defined frequency] and/or when keys are lost, combinations are compromised, or when individuals possessing the keys or combinations are transferred or terminated.

๐Ÿ’ผ PE-3 PHYSICAL ACCESS CONTROL

The organization: PE-3a. Enforces physical access authorizations at [Assignment: organization-defined entry/exit points to the facility where the information system resides] by; PE-3a.1. Verifying individual access authorizations before granting access to the facility; and PE-3a.2. Controlling ingress/egress to the facility using [Selection (one or more): [Assignment: organization-defined physical access control systems/devices]; guards]; PE-3b. Maintains physical access audit logs for [Assignment: organization-defined entry/exit points]; PE-3c. Provides [Assignment: organization-defined security safeguards] to control access to areas within the facility officially designated as publicly accessible; PE-3d. Escorts visitors and monitors visitor activity [Assignment: organization-defined circumstances requiring visitor escorts and monitoring]; PE-3e. Secures keys, combinations, and other physical access devices; PE-3f. Inventories [Assignment: organization-defined physical access devices] every [Assignment: organization-defined frequency]; and PE-3g. Changes combinations and keys [Assignment: organization-defined frequency] and/or when keys are lost, combinations are compromised, or individuals are transferred or terminated.

๐Ÿ’ผ PE-3 Physical Access Control (L)(M)(H)

a. Enforce physical access authorizations at [Assignment: organization-defined entry and exit points to the facility where the system resides] by: 1. Verifying individual access authorizations before granting access to the facility; and 2. Controlling ingress and egress to the facility using [FedRAMP Assignment: CSP defined physical access control systems/devices AND guards]; b. Maintain physical access audit logs for [Assignment: organization-defined entry or exit points]; c. Control access to areas within the facility designated as publicly accessible by implementing the following controls: [Assignment: organization-defined physical access controls]; d. Escort visitors and control visitor activity [FedRAMP Assignment: in all circumstances within restricted access area where the information system resides]; e. Secure keys, combinations, and other physical access devices; f. Inventory [Assignment: organization-defined physical access devices] every [FedRAMP Assignment: at least annually]; and g. Change combinations and keys [FedRAMP Assignment: at least annually or earlier as required by a security relevant event] and/or when keys are lost, combinations are compromised, or when individuals possessing the keys or combinations are transferred or terminated.

๐Ÿ’ผ PE-3 Physical Access Control (L)(M)(H)

a. Enforce physical access authorizations at [Assignment: organization-defined entry and exit points to the facility where the system resides] by: 1. Verifying individual access authorizations before granting access to the facility; and 2. Controlling ingress and egress to the facility using [FedRAMP Assignment: CSP defined physical access control systems/devices AND guards]; b. Maintain physical access audit logs for [Assignment: organization-defined entry or exit points]; c. Control access to areas within the facility designated as publicly accessible by implementing the following controls: [Assignment: organization-defined physical access controls]; d. Escort visitors and control visitor activity [FedRAMP Assignment: in all circumstances within restricted access area where the information system resides]; e. Secure keys, combinations, and other physical access devices; f. Inventory [Assignment: organization-defined physical access devices] every [FedRAMP Assignment: at least annually]; and g. Change combinations and keys [FedRAMP Assignment: at least annually or earlier as required by a security relevant event] and/or when keys are lost, combinations are compromised, or when individuals possessing the keys or combinations are transferred or terminated.

๐Ÿ’ผ PE-3 Physical Access Control (L)(M)(H)

a. Enforce physical access authorizations at [Assignment: organization-defined entry and exit points to the facility where the system resides] by: 1. Verifying individual access authorizations before granting access to the facility; and 2. Controlling ingress and egress to the facility using [FedRAMP Assignment: CSP defined physical access control systems/devices AND guards]; b. Maintain physical access audit logs for [Assignment: organization-defined entry or exit points]; c. Control access to areas within the facility designated as publicly accessible by implementing the following controls: [Assignment: organization-defined physical access controls]; d. Escort visitors and control visitor activity [FedRAMP Assignment: in all circumstances within restricted access area where the information system resides]; e. Secure keys, combinations, and other physical access devices; f. Inventory [Assignment: organization-defined physical access devices] every [FedRAMP Assignment: at least annually]; and g. Change combinations and keys [FedRAMP Assignment: at least annually or earlier as required by a security relevant event] and/or when keys are lost, combinations are compromised, or when individuals possessing the keys or combinations are transferred or terminated.

๐Ÿ’ผ PE-3(1) System Access (H)

Enforce physical access authorizations to the system in addition to the physical access controls for the facility at [Assignment: organization-defined physical spaces containing one or more components of the system].

๐Ÿ’ผ PE-4 Access Control for Transmission

Control physical access to [Assignment: organization-defined system distribution and transmission lines] within organizational facilities using [Assignment: organization-defined security controls].

๐Ÿ’ผ PE-4 ACCESS CONTROL FOR TRANSMISSION MEDIUM

The organization controls physical access to [Assignment: organization-defined information system distribution and transmission lines] within organizational facilities using [Assignment: organization-defined security safeguards].

๐Ÿ’ผ PE-5 (3) MARKING OUTPUT DEVICES

The organization marks [Assignment: organization-defined information system output devices] indicating the appropriate security marking of the information permitted to be output from the device.

๐Ÿ’ผ PE-6 (3) VIDEO SURVEILLANCE

The organization employs video surveillance of [Assignment: organization-defined operational areas] and retains video recordings for [Assignment: organization-defined time period].

๐Ÿ’ผ PE-6 Monitoring Physical Access

a. Monitor physical access to the facility where the system resides to detect and respond to physical security incidents; b. Review physical access logs [Assignment: organization-defined frequency] and upon occurrence of [Assignment: organization-defined events or potential indications of events]; and c. Coordinate results of reviews and investigations with the organizational incident response capability.

๐Ÿ’ผ PE-6 MONITORING PHYSICAL ACCESS

The organization: PE-6a. Monitors physical access to the facility where the information system resides to detect and respond to physical security incidents; PE-6b. Reviews physical access logs [Assignment: organization-defined frequency] and upon occurrence of [Assignment: organization-defined events or potential indications of events]; and PE-6c. Coordinates results of reviews and investigations with the organizational incident response capability.

๐Ÿ’ผ PE-6 Monitoring Physical Access (L)(M)(H)

a. Monitor physical access to the facility where the system resides to detect and respond to physical security incidents; b. Review physical access logs [FedRAMP Assignment: at least monthly] and upon occurrence of [Assignment: organization-defined events or potential indications of events]; and c. Coordinate results of reviews and investigations with the organizational incident response capability.

๐Ÿ’ผ PE-6 Monitoring Physical Access (L)(M)(H)

a. Monitor physical access to the facility where the system resides to detect and respond to physical security incidents; b. Review physical access logs [FedRAMP Assignment: at least monthly] and upon occurrence of [Assignment: organization-defined events or potential indications of events]; and c. Coordinate results of reviews and investigations with the organizational incident response capability.

๐Ÿ’ผ PE-6 Monitoring Physical Access (L)(M)(H)

a. Monitor physical access to the facility where the system resides to detect and respond to physical security incidents; b. Review physical access logs [FedRAMP Assignment: at least monthly] and upon occurrence of [Assignment: organization-defined events or potential indications of events]; and c. Coordinate results of reviews and investigations with the organizational incident response capability.

๐Ÿ’ผ PE-8 Visitor Access Records

a. Maintain visitor access records to the facility where the system resides for [Assignment: organization-defined time period]; b. Review visitor access records [Assignment: organization-defined frequency]; and c. Report anomalies in visitor access records to [Assignment: organization-defined personnel].

๐Ÿ’ผ PE-8 VISITOR ACCESS RECORDS

The organization: PE-8a. Maintains visitor access records to the facility where the information system resides for [Assignment: organization-defined time period]; and PE-8b. Reviews visitor access records [Assignment: organization-defined frequency].

๐Ÿ’ผ PE-8 Visitor Access Records (L)(M)(H)

a. Maintain visitor access records to the facility where the system resides for [FedRAMP Assignment: for a minimum of one (1) year]; b. Review visitor access records [FedRAMP Assignment: at least monthly]; and c. Report anomalies in visitor access records to [Assignment: organization-defined personnel].

๐Ÿ’ผ PE-8 Visitor Access Records (L)(M)(H)

a. Maintain visitor access records to the facility where the system resides for [FedRAMP Assignment: for a minimum of one (1) year]; b. Review visitor access records [FedRAMP Assignment: at least monthly]; and c. Report anomalies in visitor access records to [Assignment: organization-defined personnel].

๐Ÿ’ผ PE-8 Visitor Access Records (L)(M)(H)

a. Maintain visitor access records to the facility where the system resides for [FedRAMP Assignment: for a minimum of one (1) year]; b. Review visitor access records [FedRAMP Assignment: at least monthly]; and c. Report anomalies in visitor access records to [Assignment: organization-defined personnel].

๐Ÿ’ผ PERF01-BP01 Learn about and understand available cloud services and features

Continually learn about and discover available services and configurations that help you make better architectural decisions and improve performance efficiency in your workload architecture. **Common anti-patterns:** - You use the cloud as a colocated data center. - You do not modernize your application after migration to the cloud. - You only use one storage type for all things that need to be persisted. - You use instance types that are closest matched to your current standards, but are larger where needed. - You deploy and manage technologies that are available as managed services. **Benefits of establishing this best practice:** By considering new services and configurations, you may be able to greatly improve performance, reduce cost, and optimize the effort required to maintain your workload. It can also help you accelerate the time-to-value for cloud-enabled products. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance AWS continually releases new services and features that can improve performance and reduce the cost of cloud workloads. Staying up-to-date with these new services and features is crucial for maintaining performance efficiency in the cloud. Modernizing your workload architecture also helps you accelerate productivity, drive innovation, and unlock more growth opportunities. ### Implementation steps - Inventory your workload software and architecture for related services. Decide which category of products to learn more about. - Explore AWS offerings to identify and learn about the relevant services and configuration options that can help you improve performance and reduce cost and operational complexity. - Use Amazon Q to get relevant information and advice about services. - Use sandbox (non-production) environments to learn and experiment with new services without incurring extra cost. - Continually learn about new cloud services and features.
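
One low-effort way to keep this discovery continuous is to script part of it. The sketch below assumes only that boto3 is installed and credentials are configured; it lists the service APIs visible to the SDK and the EC2 instance families offered in your current Region as a starting inventory, not a substitute for reading the service documentation.

```python
# Minimal sketch: enumerate SDK-visible service APIs and the EC2 instance
# families offered in the current Region as a discovery starting point.
import boto3

session = boto3.session.Session()
print(f"{len(session.get_available_services())} service APIs visible to the SDK")

ec2 = session.client("ec2")
families = set()
paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate():
    for itype in page["InstanceTypes"]:
        families.add(itype["InstanceType"].split(".")[0])

print("Instance families in this Region:", ", ".join(sorted(families)))
```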

๐Ÿ’ผ PERF01-BP02 Use guidance from your cloud provider or an appropriate partner to learn about architecture patterns and best practices

Use cloud company resources such as documentation, solutions architects, professional services, or appropriate partners to guide your architectural decisions. These resources help you review and improve your architecture for optimal performance. **Common anti-patterns:** - You use AWS as a common cloud provider. - You use AWS services in a manner that they were not designed for. - You follow all guidance without considering your business context. **Benefits of establishing this best practice:** Using guidance from a cloud provider or an appropriate partner can help you to make the right architectural choices for your workload and give you confidence in your decisions. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance AWS offers a wide range of guidance, documentation, and resources that can help you build and manage efficient cloud workloads. AWS documentation provides code samples, tutorials, and detailed service explanations. In addition to documentation, AWS provides training and certification programs, solutions architects, and professional services that can help customers explore different aspects of cloud services and implement efficient cloud architecture on AWS. Leverage these resources to gain valuable knowledge and insight into best practices, save time, and achieve better outcomes in the AWS Cloud. ### Implementation steps - Review AWS documentation and guidance and follow the best practices. - Join AWS partner events (like AWS Global Summits, AWS re:Invent, user groups, and workshops) to learn from AWS experts about best practices for using AWS services. - Reach out to AWS for assistance when you need additional guidance or product information. AWS Solutions Architects and AWS Professional Services provide guidance for solution implementation. AWS Partners provide AWS expertise to help you unlock agility and innovation for your business. - Use Support if you need technical support to use a service effectively. Our Support plans are designed to give you the right mix of tools and access to expertise so that you can be successful with AWS while optimizing performance, managing risk, and keeping costs under control.

๐Ÿ’ผ PERF01-BP03 Factor cost into architectural decisions

Factor cost into your architectural decisions to improve resource utilization and performance efficiency of your cloud workload. When you are aware of the cost implications of your cloud workload, you are more likely to leverage efficient resources and reduce wasteful practices. **Common anti-patterns:** - You only use one family of instances. - You do not evaluate licensed solutions against open-source solutions. - You do not define storage lifecycle policies. - You do not review new services and features of the AWS Cloud. - You only use block storage. **Benefits of establishing this best practice:** Factoring cost into your decision making allows you to use more efficient resources and explore other investments. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Optimizing workloads for cost can improve resource utilization and avoid waste in a cloud workload. Factoring cost into architectural decisions usually includes right-sizing workload components and enabling elasticity, which results in improved cloud workload performance efficiency. ### Implementation steps - Establish cost objectives like budget limits for your cloud workload. - Identify the key components (like instances and storage) that drive cost of your workload. You can use AWS Pricing Calculator and AWS Cost Explorer to identify key cost drivers in your workload. - Understand pricing models in the cloud, such as On-Demand, Reserved Instances, Savings Plans, and Spot Instances. - Use Well-Architected cost optimization best practices to optimize these key components for cost. - Continually monitor and analyze cost to identify cost optimization opportunities in your workload. - Use AWS Budgets to get alerts for unacceptable costs. - Use AWS Compute Optimizer or AWS Trusted Advisor to get cost optimization recommendations. - Use AWS Cost Anomaly Detection to get automated cost anomaly detection and root cause analysis.
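
To identify the key cost drivers mentioned above, you can query Cost Explorer programmatically. A minimal sketch follows, assuming Cost Explorer is enabled in the account and boto3 credentials are configured; the date range is illustrative.

```python
# Minimal sketch: list one month's cost by service to find key cost drivers.
# Assumes Cost Explorer is enabled; the date range below is illustrative.
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if amount > 0:
        print(f"{service}: ${amount:,.2f}")
```

Feeding this output into your architectural reviews makes the cost dimension of each trade-off explicit rather than assumed.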

๐Ÿ’ผ PERF01-BP04 Evaluate how trade-offs impact customers and architecture efficiency

When evaluating performance-related improvements, determine which choices impact your customers and workload efficiency. For example, if using a key-value data store increases system performance, it is important to evaluate how the eventually consistent nature of this change will impact customers. **Common anti-patterns:** - You assume that all performance gains should be implemented, even if there are trade-offs for implementation. - You only evaluate changes to workloads when a performance issue has reached a critical point. **Benefits of establishing this best practice:** When you are evaluating potential performance-related improvements, you must decide if the trade-offs for the changes are acceptable with the workload requirements. In some cases, you may have to implement additional controls to compensate for the trade-offs. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Identify critical areas in your architecture in terms of performance and customer impact. Determine how you can make improvements, what trade-offs those improvements bring, and how they impact the system and the user experience. For example, implementing data caching can dramatically improve performance, but it requires a clear strategy for how and when to update or invalidate cached data to prevent incorrect system behavior. ### Implementation steps - Understand your workload requirements and SLAs. - Clearly define evaluation factors. Factors may relate to cost, reliability, security, and performance of your workload. - Select architecture and services that can address your requirements. - Conduct experimentation and proof of concepts (POCs) to evaluate trade-off factors and impact on customers and architecture efficiency. Usually, highly-available, performant, and secure workloads consume more cloud resources while providing better customer experience. Understand the trade-offs across your workloadโ€™s complexity, performance, and cost. Typically, prioritizing two of the factors comes at the expense of the third.
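
The caching example above hinges on having an explicit invalidation strategy. The sketch below is a plain-Python illustration of that trade-off: a small TTL cache that serves possibly stale but fast reads until the entry expires or is invalidated by a write.

```python
# Plain-Python illustration of the caching trade-off: faster reads in
# exchange for possibly stale data until TTL expiry or explicit invalidation.
import time


class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        """Return a cached value, reloading it only after the TTL expires."""
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                        # possibly stale, but fast
        value = loader(key)                        # slow authoritative read
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        """Call on writes so readers do not see stale data for a full TTL."""
        self._store.pop(key, None)


cache = TTLCache(ttl_seconds=30)
print(cache.get("user:42", loader=lambda k: f"loaded {k} from the database"))
cache.invalidate("user:42")  # for example, after updating user 42
```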

๐Ÿ’ผ PERF01-BP05 Use policies and reference architectures

Use internal policies and existing reference architectures when selecting services and configurations to be more efficient when designing and implementing your workload. **Common anti-patterns:** - You allow a wide variety of technology that may impact the management overhead of your company. **Benefits of establishing this best practice:** Establishing a policy for architecture, technology, and vendor choices allows decisions to be made quickly. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Having internal policies in selecting resources and architecture provides standards and guidelines to follow when making architectural choices. Those guidelines streamline the decision-making process when choosing the right cloud service and can help improve performance efficiency. Deploy your workload using policies or reference architectures. Integrate the services into your cloud deployment, then use your performance tests to verify that you can continue to meet your performance requirements. ### Implementation steps - Clearly understand the requirements of your cloud workload. - Review internal and external policies to identify the most relevant ones. - Use the appropriate reference architectures provided by AWS or your industry best practices. - Create a continuum consisting of policies, standards, reference architectures, and prescriptive guidelines for common situations. Doing so allows your teams to move faster. Tailor the assets for your vertical if applicable. - Validate these policies and reference architectures for your workload in sandbox environments. - Stay up-to-date with industry standards and AWS updates to make sure your policies and reference architectures help optimize your cloud workload.
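
One way to make such policies actionable is to encode them and check proposed designs against them before implementation. The following is a simple sketch under the assumption that your standards can be expressed as allow-lists; the values shown are invented examples, not recommendations.

```python
# Simple sketch: check a proposed design against internal allow-list policies.
# The allow-listed values below are invented examples, not recommendations.
POLICY = {
    "allowed_databases": {"Amazon Aurora", "Amazon DynamoDB"},
    "allowed_instance_families": {"m7g", "c7g", "r7g"},
    "required_tags": {"cost-center", "owner"},
}


def check_design(design: dict) -> list[str]:
    violations = []
    if design["database"] not in POLICY["allowed_databases"]:
        violations.append(f"database {design['database']} is not on the approved list")
    family = design["instance_type"].split(".")[0]
    if family not in POLICY["allowed_instance_families"]:
        violations.append(f"instance family {family} is not on the approved list")
    missing = POLICY["required_tags"] - set(design["tags"])
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    return violations


proposed = {
    "database": "Amazon Aurora",
    "instance_type": "m7g.large",
    "tags": {"owner": "payments-team"},
}
print(check_design(proposed) or "design complies with policy")
```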

๐Ÿ’ผ PERF01-BP06 Use benchmarking to drive architectural decisions

Benchmark the performance of an existing workload to understand how it performs on the cloud and drive architectural decisions based on that data. **Common anti-patterns:** - You rely on common benchmarks that are not indicative of your workloadโ€™s characteristics. - You rely on customer feedback and perceptions as your only benchmark. **Benefits of establishing this best practice:** Benchmarking your current implementation allows you to measure performance improvements. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Use benchmarking with synthetic tests to assess how your workloadโ€™s components perform. Benchmarking is generally quicker to set up than load testing and is used to evaluate the technology for a particular component. Benchmarking is often used at the start of a new project, when you lack a full solution to load test. You can either build your own custom benchmark tests or use an industry standard test, such as TPC-DS, to benchmark your workloads. Industry benchmarks are helpful when comparing environments. Custom benchmarks are useful for targeting specific types of operations that you expect to make in your architecture. When benchmarking, it is important to pre-warm your test environment to get valid results. Run the same benchmark multiple times to verify that youโ€™ve captured any variance over time. Because benchmarks are generally faster to run than load tests, they can be used earlier in the deployment pipeline and provide faster feedback on performance deviations. When you evaluate a significant change in a component or service, a benchmark can be a quick way to see if you can justify the effort to make the change. Using benchmarking in conjunction with load testing is important because load testing informs you about how your workload performs in production. ### Implementation steps - **Plan and define:** - Define the objectives, baseline, testing scenarios, metrics (like CPU utilization, latency, or throughput), and KPIs for your benchmark. - Focus on user requirements in terms of user experience and factors such as response time and accessibility. - Identify a benchmarking tool that is suitable for your workload. You can use AWS services like Amazon CloudWatch or a third-party tool that is compatible with your workload. - **Configure and instrument:** - Set up your environment and configure your resources. - Implement monitoring and logging to capture testing results. - **Benchmark and monitor:** - Perform your benchmark tests and monitor the metrics during the test. - **Analyze and document:** - Document your benchmarking process and findings. - Analyze the results to identify bottlenecks, trends, and areas of improvement. - Use test results to make architectural decisions and adjust your workload. This may include changing services or adopting new features. - **Optimize and repeat:** - Adjust resource configurations and allocations based on your benchmarks. - Retest your workload after the adjustment to validate your improvements. - Document your learnings, and repeat the process to identify other areas of improvement.
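
A minimal benchmarking harness that follows the pre-warm and repeat guidance above might look like the sketch below. It is pure Python, and the function under test is a stand-in for whatever component you actually want to measure.

```python
# Minimal benchmarking harness: warm up first, run repeatedly, and report
# the median and approximate p95 so variance over time is visible.
import statistics
import time


def component_under_test():
    # Stand-in for the operation you want to benchmark.
    sum(i * i for i in range(50_000))


def benchmark(fn, warmup_runs=5, measured_runs=50):
    for _ in range(warmup_runs):          # pre-warm caches, JITs, connections
        fn()
    samples = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(len(samples) * 0.95) - 1],
        "runs": measured_runs,
    }


print(benchmark(component_under_test))
```

Running the same harness before and after an architectural change gives you the faster feedback on performance deviations described above, while load testing remains the final check before production.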

๐Ÿ’ผ PERF01-BP07 Use a data-driven approach for architectural choices

Define a clear, data-driven approach for architectural choices to verify that the right cloud services and configurations are used to meet your specific business needs. **Common anti-patterns:** - You assume your current architecture is static and should not be updated over time. - Your architectural choices are based upon guesses and assumptions. - You introduce architecture changes over time without justification. **Benefits of establishing this best practice:** By having a well-defined approach for making architectural choices, you use data to influence your workload design and make informed decisions over time. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Use internal experience and knowledge of the cloud or external resources such as published use cases or whitepapers to choose resources and services in your architecture. You should have a well-defined process that encourages experimentation and benchmarking with the services that could be used in your workload. Backlogs for critical workloads should consist of not just user stories which deliver functionality relevant to business and users, but also technical stories which form an architecture runway for the workload. This runway is informed by new advancements in technology and new services and adopts them based on data and proper justification. This verifies that the architecture remains future-proof and does not stagnate. ### Implementation steps - Engage with key stakeholders to define workload requirements, including performance, availability, and cost considerations. Consider factors such as the number of users and usage pattern for your workload. - Create an architecture runway or a technology backlog which is prioritized along with the functional backlog. - Evaluate and assess different cloud services (for more detail, see PERF01-BP01 Learn about and understand available cloud services and features). - Explore different architectural patterns, like microservices or serverless, that meet your performance requirements (for more detail, see PERF01-BP02 Use guidance from your cloud provider or an appropriate partner to learn about architecture patterns and best practices). - Consult other teams, architecture diagrams, and resources, such as AWS Solution Architects, AWS Architecture Center, and AWS Partner Network, to help you choose the right architecture for your workload. - Define performance metrics like throughput and response time that can help you evaluate the performance of your workload. - Experiment and use defined metrics to validate the performance of the selected architecture. - Continually monitor and make adjustments as needed to maintain the optimal performance of your architecture. - Document your selected architecture and decisions as a reference for future updates and learnings. - Continually review and update the architecture selection approach based on learnings, new technologies, and metrics that indicate a needed change or problem in the current approach.
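
A data-driven comparison of candidate architectures can be as simple as a weighted scoring matrix fed with measured metrics. The sketch below uses invented candidates, scores, and weights purely to illustrate the mechanics; replace them with data from your own experiments and POCs.

```python
# Sketch of a weighted decision matrix for architecture candidates.
# Scores (1-5) and weights are invented; replace them with measured data.
WEIGHTS = {"performance": 0.4, "cost": 0.3, "operational_complexity": 0.3}

CANDIDATES = {
    "containers-on-eks": {"performance": 4, "cost": 3, "operational_complexity": 2},
    "serverless-lambda": {"performance": 3, "cost": 4, "operational_complexity": 4},
    "ec2-auto-scaling": {"performance": 4, "cost": 3, "operational_complexity": 3},
}


def score(candidate: dict) -> float:
    return sum(candidate[factor] * weight for factor, weight in WEIGHTS.items())


for name, factors in sorted(CANDIDATES.items(), key=lambda kv: score(kv[1]), reverse=True):
    print(f"{name}: {score(factors):.2f}")
```

Recording the matrix alongside the decision gives future reviewers the data and justification behind each architectural choice.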

๐Ÿ’ผ PERF02-BP01 Select the best compute options for your workload

Selecting the most appropriate compute option for your workload allows you to improve performance, reduce unnecessary infrastructure costs, and lower the operational effort required to maintain your workload.

**Common anti-patterns:**

- You use the same compute option that was used on premises.
- You lack awareness of the cloud compute options, features, and solutions, and how those solutions might improve your compute performance.
- You over-provision an existing compute option to meet scaling or performance requirements when an alternative compute option would align to your workload characteristics more precisely.

**Benefits of establishing this best practice:** By identifying the compute requirements and evaluating them against the options available, you can make your workload more resource efficient.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

To optimize your cloud workloads for performance efficiency, it is important to select the most appropriate compute options for your use case and performance requirements. AWS provides a variety of compute options that cater to different workloads in the cloud. For instance, you can use Amazon EC2 to launch and manage virtual servers, AWS Lambda to run code without having to provision or manage servers, Amazon ECS or Amazon EKS to run and manage containers, or AWS Batch to process large volumes of data in parallel. Based on your scale and compute needs, choose and configure the optimal compute solution for your situation. You can also consider using multiple types of compute solutions in a single workload, as each one has its own advantages and drawbacks.

The following steps guide you through selecting the right compute options to match your workload characteristics and performance requirements.

### Implementation steps

- Understand your workload compute requirements. Key requirements to consider include processing needs, traffic patterns, data access patterns, scaling needs, and latency requirements.
- Learn about different AWS compute services for your workload. For more information, see PERF01-BP01 Learn about and understand available cloud services and features. Here are some key AWS compute options, their characteristics, and common use cases:

| AWS service | Key characteristics | Common use cases |
|-------------|---------------------|------------------|
| Amazon Elastic Compute Cloud (Amazon EC2) | Dedicated options for hardware and license requirements; large selection of instance families, processor types, and compute accelerators | Lift-and-shift migrations, monolithic applications, hybrid environments, enterprise applications |
| Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS) | Easy deployment, consistent environments, scalable | Microservices, hybrid environments |
| AWS Lambda | Serverless compute service that runs code in response to events and automatically manages the underlying compute resources | Microservices, event-driven applications |
| AWS Batch | Efficiently and dynamically provisions Amazon ECS, Amazon EKS, and AWS Fargate compute resources, with an option to use On-Demand or Spot Instances based on your job requirements | HPC, training ML models |
| Amazon Lightsail | Preconfigured Linux and Windows applications for running small workloads | Simple web applications, custom websites |

- Evaluate the cost (like hourly charge or data transfer) and management overhead (like patching and scaling) associated with each compute option.
- Perform experiments and benchmarking in a non-production environment to identify which compute option can best address your workload requirements.
- Once you have experimented and identified your new compute solution, plan your migration and validate your performance metrics.
- Use AWS monitoring tools like Amazon CloudWatch and optimization services like AWS Compute Optimizer to continually optimize your compute resources based on real-world usage patterns (a hedged sketch follows this list).
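To make the "continually optimize" step concrete, the hedged sketch below pulls AWS Compute Optimizer findings for EC2 so compute choices are driven by observed utilization rather than guesses. The response field names reflect the Compute Optimizer API as I understand it; verify them against the current SDK documentation before relying on this.

```python
"""Hedged sketch: list EC2 right-sizing findings from AWS Compute Optimizer."""
import boto3

optimizer = boto3.client("compute-optimizer")

resp = optimizer.get_ec2_instance_recommendations()
for rec in resp.get("instanceRecommendations", []):
    current = rec["currentInstanceType"]
    finding = rec["finding"]  # e.g. OVER_PROVISIONED, UNDER_PROVISIONED, OPTIMIZED
    options = [o["instanceType"] for o in rec.get("recommendationOptions", [])]
    print(f"{rec['instanceArn']}: {current} is {finding}; candidates: {options}")
```

Compute Optimizer must be opted in for the account, and findings only appear after enough utilization history has accumulated.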

๐Ÿ’ผ PERF02-BP02 Understand the available compute configuration and features

Understand the available configuration options and features for your compute service to help you provision the right amount of resources and improve performance efficiency.

**Common anti-patterns:**

- You do not evaluate compute options or available instance families against workload characteristics.
- You over-provision compute resources to meet peak-demand requirements.

**Benefits of establishing this best practice:** Be familiar with AWS compute features and configurations so that you can use a compute solution optimized to meet your workload characteristics and needs.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Each compute solution has unique configurations and features available to support different workload characteristics and requirements. Learn how these options complement your workload, and determine which configuration options are best for your application. Examples of these options include instance family, sizes, features (GPU, I/O), bursting, time-outs, function sizes, container instances, and concurrency. If your workload has been using the same compute option for more than four weeks and you anticipate that the characteristics will remain the same in the future, you can use AWS Compute Optimizer to find out if your current compute option is suitable for the workload from a CPU and memory perspective.

### Implementation steps

- Understand workload requirements (like CPU need, memory, and latency).
- Review AWS documentation and best practices to learn about recommended configuration options that can help improve compute performance (a hedged example of inspecting instance-type characteristics follows this table). Here are some key configuration options to consider:

| Configuration option | Examples |
|----------------------|----------|
| Instance type | Compute-optimized instances are ideal for workloads that require a high vCPU-to-memory ratio. Memory-optimized instances deliver large amounts of memory to support memory-intensive workloads. Storage-optimized instances are designed for workloads that require high, sequential read and write access (IOPS) to local storage. |
| Pricing model | On-Demand Instances let you use the compute capacity by the hour or second with no long-term commitment. These instances are good for bursting above performance baseline needs. Savings Plans offer significant savings over On-Demand Instances in exchange for a commitment to use a specific amount of compute power for a one- or three-year period. Spot Instances let you take advantage of unused instance capacity at a discount for your stateless, fault-tolerant workloads. |
| Auto Scaling | Use Auto Scaling configuration to match compute resources to traffic patterns. |
| Sizing | Use Compute Optimizer to get machine-learning-powered recommendations on which compute configuration best matches your compute characteristics. Use AWS Lambda Power Tuning to select the best configuration for your Lambda function. |
| Hardware-based compute accelerators | Accelerated computing instances perform functions like graphics processing or data pattern matching more efficiently than CPU-based alternatives. For machine learning workloads, take advantage of purpose-built hardware that is specific to your workload, such as AWS Trainium, AWS Inferentia, and Amazon EC2 DL1. |
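One way to compare instance families before committing is to query their characteristics directly. The hedged sketch below uses the EC2 `DescribeInstanceTypes` API; the candidate instance types are examples only.

```python
"""Hedged sketch: compare vCPU, memory, and network characteristics of
candidate instance types before choosing a configuration."""
import boto3

ec2 = boto3.client("ec2")

candidates = ["c6i.large", "m6i.large", "r6i.large"]  # assumed candidates
resp = ec2.describe_instance_types(InstanceTypes=candidates)

for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    network = it["NetworkInfo"]["NetworkPerformance"]
    print(f"{it['InstanceType']}: {vcpus} vCPU, {mem_gib:.0f} GiB, network: {network}")
```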

๐Ÿ’ผ PERF02-BP03 Collect compute-related metrics

Record and track compute-related metrics to better understand how your compute resources are performing and to improve their performance and utilization.

**Common anti-patterns:**

- You only use manual log file searching for metrics.
- You only use the default metrics recorded by your monitoring software.
- You only review metrics when there is an issue.

**Benefits of establishing this best practice:** Collecting performance-related metrics will help you align application performance with business requirements to ensure that you are meeting your workload needs. It can also help you continually improve the resource performance and utilization in your workload.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Cloud workloads can generate large volumes of data such as metrics, logs, and events. In the AWS Cloud, collecting metrics is a crucial step to improve security, cost efficiency, performance, and sustainability. AWS provides a wide range of performance-related metrics using monitoring services such as Amazon CloudWatch to provide you with valuable insights. Metrics such as CPU utilization, memory utilization, disk I/O, and network inbound and outbound can provide insight into utilization levels or performance bottlenecks. Use these metrics as part of a data-driven approach to actively tune and optimize your workload's resources. In an ideal case, you should collect all metrics related to your compute resources in a single platform with retention policies implemented to support cost and operational goals.

### Implementation steps

- Identify which performance-related metrics are relevant to your workload. You should collect metrics around resource utilization and the way your cloud workload is operating (like response time and throughput). Default metrics examples:
  - Amazon EC2 default metrics
  - Amazon ECS default metrics
  - Amazon EKS default metrics
  - Lambda default metrics
  - Amazon EC2 memory and disk metrics
- Choose and set up the right logging and monitoring solution for your workload:
  - AWS native observability
  - AWS Distro for OpenTelemetry
  - Amazon Managed Service for Prometheus
- Define the required filter and aggregation for the metrics based on your workload requirements.
  - Quantify custom application metrics with Amazon CloudWatch Logs and metric filters.
  - Collect custom metrics with Amazon CloudWatch strategic tagging.
- Configure data retention policies for your metrics to match your security and operational goals:
  - Default data retention for CloudWatch metrics
  - Default data retention for CloudWatch Logs
- If required, create alarms and notifications for your metrics to help you proactively respond to performance-related issues (a hedged alarm example follows this list):
  - Create alarms for custom metrics using Amazon CloudWatch anomaly detection
  - Create metrics and alarms for specific web pages with Amazon CloudWatch RUM
- Use automation to deploy your metric and log aggregation agents:
  - AWS Systems Manager Automation
  - OpenTelemetry Collector
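As a hedged example of the alarm step, the snippet below creates a CloudWatch alarm on EC2 CPUUtilization. The instance ID, SNS topic ARN, and 80% threshold are placeholders to adapt to your workload.

```python
"""Hedged sketch: alarm when an EC2 instance's average CPU stays above 80%."""
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-example",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:example-topic"],  # placeholder
)
```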

๐Ÿ’ผ PERF02-BP04 Configure and right-size compute resources

Configure and right-size compute resources to match your workload's performance requirements and avoid under- or over-utilized resources.

**Common anti-patterns:**

- You ignore your workload performance requirements, resulting in over-provisioned or under-provisioned compute resources.
- You only choose the largest or smallest instance available for all workloads.
- You only use one instance family for ease of management.
- You ignore recommendations from AWS Cost Explorer or Compute Optimizer for right-sizing.
- You do not re-evaluate the workload for suitability of new instance types.
- You certify only a small number of instance configurations for your organization.

**Benefits of establishing this best practice:** Right-sizing compute resources ensures optimal operation in the cloud by avoiding over-provisioning and under-provisioning resources. Properly sizing compute resources typically results in better performance and an enhanced customer experience, while also lowering cost.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Right-sizing allows organizations to operate their cloud infrastructure in an efficient and cost-effective manner while addressing their business needs. Over-provisioning cloud resources can lead to extra costs, while under-provisioning can result in poor performance and a negative customer experience. AWS provides tools such as AWS Compute Optimizer and AWS Trusted Advisor that use historical data to provide recommendations to right-size your compute resources.

### Implementation steps

- Choose an instance type that best fits your needs:
  - Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, and CPU usage. Use this data to choose resources that best match your workload's profile and performance goals.
  - Monitor your resource usage using AWS monitoring tools such as Amazon CloudWatch.
- Select the right configuration for compute resources:
  - For ephemeral workloads, evaluate instance Amazon CloudWatch metrics such as CPUUtilization to identify if the instance is under-utilized or over-utilized (a hedged sketch follows this list).
  - For stable workloads, check AWS right-sizing tools such as AWS Compute Optimizer and AWS Trusted Advisor at regular intervals to identify opportunities to optimize and right-size the compute resources.
- Test configuration changes in a non-production environment before implementing them in a live environment.
- Continually re-evaluate new compute offerings and compare them against your workload's needs.
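The hedged sketch below checks average CPUUtilization over the past two weeks to flag an instance that may be over-provisioned. The 20% threshold and instance ID are illustrative assumptions, not AWS guidance.

```python
"""Hedged sketch: flag a potentially over-provisioned instance from its
recent average CPU utilization."""
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

def average_cpu(instance_id: str, days: int = 14) -> float:
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    points = [p["Average"] for p in resp["Datapoints"]]
    return sum(points) / len(points) if points else 0.0

if average_cpu("i-0123456789abcdef0") < 20:  # placeholder instance ID, assumed threshold
    print("Instance looks over-provisioned; review right-sizing recommendations.")
```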

๐Ÿ’ผ PERF02-BP05 Scale your compute resources dynamically

Use the elasticity of the cloud to scale your compute resources up or down dynamically to match your needs and avoid over- or under-provisioning capacity for your workload.

**Common anti-patterns:**

- You react to alarms by manually increasing capacity.
- You use the same sizing guidelines (generally static infrastructure) as on premises.
- You leave increased capacity in place after a scaling event instead of scaling back down.

**Benefits of establishing this best practice:** Configuring and testing the elasticity of compute resources can help you save money, maintain performance benchmarks, and improve reliability as traffic changes.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

AWS provides the flexibility to scale your resources up or down dynamically through a variety of scaling mechanisms in order to meet changes in demand. Combined with compute-related metrics, dynamic scaling allows workloads to automatically respond to changes and use the optimal set of compute resources to achieve their goals. You can use a number of different approaches to match supply of resources with demand:

- **Target-tracking approach:** Monitor your scaling metric and automatically increase or decrease capacity as needed.
- **Predictive scaling:** Scale in anticipation of daily and weekly trends.
- **Schedule-based approach:** Set your own scaling schedule according to predictable load changes.
- **Service scaling:** Choose services (like serverless) that automatically scale by design.

You must ensure that workload deployments can handle both scale-up and scale-down events.

### Implementation steps

- Compute instances, containers, and functions provide mechanisms for elasticity, either in combination with autoscaling or as a feature of the service. Examples of automatic scaling mechanisms:

| Autoscaling mechanism | Where to use |
|-----------------------|--------------|
| Amazon EC2 Auto Scaling | To ensure you have the correct number of Amazon EC2 instances available to handle the user load for your application. |
| Application Auto Scaling | To automatically scale resources for individual AWS services beyond Amazon EC2, such as AWS Lambda functions or Amazon ECS services. |
| Kubernetes Cluster Autoscaler/Karpenter | To automatically scale Kubernetes clusters. |

- Scaling is often discussed in relation to compute services like Amazon EC2 instances or AWS Lambda functions. Be sure to also consider the configuration of non-compute services like AWS Glue to match the demand.
- Verify that the metrics for scaling match the characteristics of the workload being deployed. For example, for a video transcoding application, 100% CPU utilization is expected and should not be your primary metric. Use the depth of the transcoding job queue instead. You can use a customized metric for your scaling policy if required. To choose the right metrics, consider the following guidance for Amazon EC2:
  - The metric should be a valid utilization metric and describe how busy an instance is.
  - The metric value must increase or decrease proportionally to the number of instances in the Auto Scaling group.
- Make sure that you use dynamic scaling instead of manual scaling for your Auto Scaling group. Target tracking scaling policies are recommended (a minimal sketch follows this list).
- Verify that workload deployments can handle both scaling events (up and down). For example, use the Activity history to verify a scaling activity for an Auto Scaling group.
- Evaluate your workload for predictable patterns and proactively scale as you anticipate predicted and planned changes in demand. Predictive scaling can eliminate the need to overprovision capacity.
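As a minimal sketch of the recommended target tracking approach, the snippet below attaches a policy that keeps average CPU utilization near a target. The Auto Scaling group name and 50% target are placeholders.

```python
"""Hedged sketch: attach a target tracking scaling policy to an Auto Scaling
group so capacity follows average CPU utilization."""
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-asg",  # placeholder group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,  # assumed target utilization
    },
)
```

For queue-driven workloads like the transcoding example above, a customized metric specification based on queue depth per instance would replace the predefined CPU metric.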

๐Ÿ’ผ PERF02-BP06 Use optimized hardware-based compute accelerators

Use hardware accelerators to perform certain functions more efficiently than CPU-based alternatives.

**Common anti-patterns:**

- In your workload, you haven't benchmarked a general-purpose instance against a purpose-built instance that can deliver higher performance and lower cost.
- You are using hardware-based compute accelerators for tasks that can be done more efficiently with CPU-based alternatives.
- You are not monitoring GPU usage.

**Benefits of establishing this best practice:** By using hardware-based accelerators, such as graphics processing units (GPUs) and field programmable gate arrays (FPGAs), you can perform certain processing functions more efficiently.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Accelerated computing instances provide access to hardware-based compute accelerators such as GPUs and FPGAs. These hardware accelerators perform certain functions like graphics processing or data pattern matching more efficiently than CPU-based alternatives. Many accelerated workloads, such as rendering, transcoding, and machine learning, are highly variable in terms of resource usage. Only run this hardware for the time needed, and decommission it with automation when it is not required to improve overall performance efficiency.

### Implementation steps

- Identify which accelerated computing instances can address your requirements.
- For machine learning workloads, take advantage of purpose-built hardware that is specific to your workload, such as AWS Trainium, AWS Inferentia, and Amazon EC2 DL1. AWS Inferentia instances, such as Inf2 instances, offer up to 50% better performance/watt over comparable Amazon EC2 instances.
- Collect usage metrics for your accelerated computing instances. For example, use the CloudWatch agent to collect metrics such as `utilization_gpu` and `utilization_memory` for your GPUs, as shown in Collect NVIDIA GPU metrics with Amazon CloudWatch.
- Optimize the code, network operations, and settings of hardware accelerators to ensure the underlying hardware is fully utilized.
  - Optimize GPU settings.
  - GPU Monitoring and Optimization in the Deep Learning AMI.
  - Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker AI.
- Use the latest high-performance libraries and GPU drivers.
- Use automation to release GPU instances when they are not in use (a hedged sketch follows this list).
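The hedged sketch below stops a GPU instance when the GPU utilization published by the CloudWatch agent stays near zero. The namespace and metric name depend on your agent configuration and are assumptions here; adjust them to whatever your agent actually emits, and the instance ID is a placeholder.

```python
"""Hedged sketch: stop an idle GPU instance based on recent GPU utilization."""
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

end = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="CWAgent",                      # assumed agent namespace
    MetricName="nvidia_smi_utilization_gpu",  # assumed metric name from the agent
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=end - timedelta(hours=2),
    EndTime=end,
    Period=300,
    Statistics=["Maximum"],
)

# If the GPU has been essentially idle for two hours, release the capacity.
if resp["Datapoints"] and max(p["Maximum"] for p in resp["Datapoints"]) < 1:
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```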

๐Ÿ’ผ PERF03-BP01 Use a purpose-built data store that best supports your data access and storage requirements

Understand data characteristics (like shareable, size, cache size, access patterns, latency, throughput, and persistence of data) to select the right purpose-built data stores (storage or database) for your workload.

**Common anti-patterns:**

- You stick to one data store because there is internal experience and knowledge of one particular type of database solution.
- You assume that all workloads have similar data storage and access requirements.
- You have not implemented a data catalog to inventory your data assets.

**Benefits of establishing this best practice:** Understanding data characteristics and requirements allows you to determine the most efficient and performant storage technology appropriate for your workload needs.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

When selecting and implementing data storage, make sure that the querying, scaling, and storage characteristics support the workload data requirements. AWS provides numerous data storage and database technologies, including block storage, object storage, streaming storage, file systems, and relational, key-value, document, in-memory, graph, time series, and ledger databases. Each data management solution has options and configurations available to you to support your use cases and data models. By understanding data characteristics and requirements, you can break away from monolithic storage technology and restrictive, one-size-fits-all approaches to focus on managing data appropriately.

### Implementation steps

1. Conduct an inventory of the various data types that exist in your workload.
2. Understand and document data characteristics and requirements, including:
   - Data type (unstructured, semi-structured, relational)
   - Data volume and growth
   - Data durability: persistent, ephemeral, transient
   - ACID (atomicity, consistency, isolation, durability) requirements
   - Data access patterns (read-heavy or write-heavy)
   - Latency
   - Throughput
   - IOPS (input/output operations per second)
   - Data retention period
3. Learn about the different data stores (storage and database services) available for your workload on AWS that can meet your data characteristics, as outlined in PERF01-BP01 Learn about and understand available cloud services and features. Some examples of AWS storage technologies and their key characteristics include:

| Type | AWS services | Key characteristics |
| ---- | ------------ | ------------------- |
| Object storage | Amazon S3 | Unlimited scalability, high availability, and multiple options for accessibility. Transferring and accessing objects in and out of Amazon S3 can use a service, such as Transfer Acceleration or Access Points, to support your location, security needs, and access patterns. |
| Archiving storage | Amazon S3 Glacier | Built for data archiving. |
| Streaming storage | Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK) | Efficient ingestion and storage of streaming data. |
| Shared file system | Amazon Elastic File System (Amazon EFS) | Mountable file system that can be accessed by multiple types of compute solutions. |
| Shared file system | Amazon FSx | Built on the latest AWS compute solutions to support four commonly used file systems: NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. Amazon FSx latency, throughput, and IOPS vary per file system and should be considered when selecting the right file system for your workload needs. |
| Block storage | Amazon Elastic Block Store (Amazon EBS) | Scalable, high-performance block-storage service designed for Amazon Elastic Compute Cloud (Amazon EC2). Amazon EBS includes SSD-backed storage for transactional, IOPS-intensive workloads and HDD-backed storage for throughput-intensive workloads. |
| Relational database | Amazon Aurora, Amazon RDS, Amazon Redshift | Designed to support ACID transactions and maintain referential integrity and strong data consistency. Many traditional applications, ERP, CRM, and ecommerce systems use relational databases to store their data. |
| Key-value database | Amazon DynamoDB | Optimized for common access patterns, typically to store and retrieve large volumes of data. High-traffic web apps, ecommerce systems, and gaming applications are typical use cases. |
| Document database | Amazon DocumentDB | Designed to store semi-structured data as JSON-like documents. These databases help developers build and update applications such as content management, catalogs, and user profiles quickly. |
| In-memory database | Amazon ElastiCache, Amazon MemoryDB for Redis | Used for applications that require real-time access to data with the lowest latency and highest throughput. Applications include caching, session management, gaming leaderboards, low-latency ML feature stores, microservices messaging, and high-throughput streaming mechanisms. |
| Graph database | Amazon Neptune | Used for applications that must navigate and query millions of relationships between highly connected graph datasets with millisecond latency at large scale. Common uses include fraud detection, social networking, and recommendation engines. |
| Time series database | Amazon Timestream | Used to efficiently collect, synthesize, and derive insights from data that changes over time. IoT applications, DevOps, and industrial telemetry can utilize time-series databases. |
| Wide column | Amazon Keyspaces (for Apache Cassandra) | Uses tables, rows, and columns, but column names and formats can vary per row. Commonly used in high-scale industrial apps like equipment maintenance, fleet management, and route optimization. |
| Ledger | Amazon Quantum Ledger Database (Amazon QLDB) | Provides a centralized and trusted authority to maintain a scalable, immutable, and cryptographically verifiable record of transactions. Commonly used in systems of record, supply chain, registrations, and banking transactions. |

4. If you are building a data platform, leverage modern data architecture on AWS to integrate your data lake, data warehouse, and purpose-built data stores.
5. The key questions that you need to consider when choosing a data store for your workload are as follows:

| Question | Things to consider |
| -------- | ------------------ |
| How is the data structured? | If the data is unstructured, consider an object store such as Amazon S3 or a NoSQL database such as Amazon DocumentDB. |
| | For key-value data, consider DynamoDB, Amazon ElastiCache (Redis OSS), or Amazon MemoryDB. |
| What level of referential integrity is required? | For foreign key constraints, relational databases such as Amazon RDS and Aurora can provide this level of integrity. |
| | Typically, within a NoSQL data model, you would de-normalize your data into a single document or collection of documents to be retrieved in a single request rather than joining across documents or tables. |
| Is ACID (atomicity, consistency, isolation, durability) compliance required? | If the ACID properties associated with relational databases are required, consider a relational database such as Amazon RDS or Aurora. |
| | If strong consistency is required for a NoSQL database, you can use strongly consistent reads with DynamoDB (see the sketch after these steps). |
| How will the storage requirements change over time? How does this impact scalability? | Serverless databases such as DynamoDB and Amazon Quantum Ledger Database (Amazon QLDB) will scale dynamically. |
| | Relational databases have upper bounds on provisioned storage, and often must be horizontally partitioned using mechanisms such as sharding once they reach these limits. |
| What is the proportion of read queries in relation to write queries? Would caching be likely to improve performance? | Read-heavy workloads can benefit from a caching layer, like ElastiCache or DAX if the database is DynamoDB. |
| | Reads can also be offloaded to read replicas with relational databases such as Amazon RDS. |
| Does storage and modification (OLTP - Online Transaction Processing) or retrieval and reporting (OLAP - Online Analytical Processing) have a higher priority? | For high-throughput, read-as-is transactional processing, consider a NoSQL database such as DynamoDB. |
| | For high-throughput and complex read patterns (like joins) with consistency, use Amazon RDS. |
| | For analytical queries, consider a columnar database such as Amazon Redshift, or export the data to Amazon S3 and perform analytics using Athena or Amazon QuickSight. |
| What level of durability does the data require? | Aurora automatically replicates your data across three Availability Zones within a Region, meaning your data is highly durable with less chance of data loss. |
| | DynamoDB is automatically replicated across multiple Availability Zones, providing high availability and data durability. |
| | Amazon S3 provides 11 nines of durability. Many database services, such as Amazon RDS and DynamoDB, support exporting data to Amazon S3 for long-term retention and archival. |
| Is there a desire to move away from commercial database engines or licensing costs? | Consider open-source engines such as PostgreSQL and MySQL on Amazon RDS or Aurora. |
| | Leverage AWS Database Migration Service and AWS Schema Conversion Tool to perform migrations from commercial database engines to open source. |
| What is the operational expectation for the database? Is moving to managed services a primary concern? | Leveraging Amazon RDS instead of Amazon EC2, and DynamoDB or Amazon DocumentDB instead of self-hosting a NoSQL database, can reduce operational overhead. |
| How is the database currently accessed? Is it only application access, or are there business intelligence (BI) users and other connected off-the-shelf applications? | If you have dependencies on external tooling, then you may have to maintain compatibility with the databases they support. Amazon RDS is fully compatible with the different engine versions that it supports, including Microsoft SQL Server, Oracle, MySQL, and PostgreSQL. |

6. Perform experiments and benchmarking in a non-production environment to identify which data store can address your workload requirements.
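As a small illustration of the consistency question above, the hedged sketch below performs a strongly consistent DynamoDB read. The table and key names are placeholders.

```python
"""Hedged sketch: a strongly consistent DynamoDB read."""
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-orders")  # placeholder table name

# ConsistentRead=True returns the latest committed value at the cost of
# higher read capacity consumption and potentially higher latency.
response = table.get_item(
    Key={"order_id": "12345"},  # placeholder key
    ConsistentRead=True,
)
print(response.get("Item"))
```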

๐Ÿ’ผ PERF03-BP02 Evaluate available configuration options for data store

Understand and evaluate the various features and configuration options available for your data stores to optimize storage space and performance for your workload.

**Common anti-patterns:**

- You only use one storage type, such as Amazon EBS, for all workloads.
- You use provisioned IOPS for all workloads without real-world testing against all storage tiers.
- You are not aware of the configuration options of your chosen data management solution.
- You rely solely on increasing instance size without looking at other available configuration options.
- You are not testing the scaling characteristics of your data store.

**Benefits of establishing this best practice:** By exploring and experimenting with the data store configurations, you may be able to reduce the cost of infrastructure, improve performance, and lower the effort required to maintain your workloads.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

A workload could use one or more data stores, based on data storage and access requirements. To optimize your performance efficiency and cost, you must evaluate data access patterns to determine the appropriate data store configurations. While you explore data store options, take into consideration various aspects such as the storage options, memory, compute, read replicas, consistency requirements, connection pooling, and caching options. Experiment with these various configuration options to improve performance efficiency metrics.

### Implementation steps

1. Understand the current configurations (like instance type, storage size, or database engine version) of your data store.
2. Review AWS documentation and best practices to learn about recommended configuration options that can help improve the performance of your data store. Key data store options to consider are the following:
   - **Offloading reads (like read replicas and caching):**
     - For DynamoDB tables, you can offload reads using DAX for caching.
     - You can create an Amazon ElastiCache (Redis OSS) cluster and configure your application to read from the cache first, falling back to the database if the requested item is not present.
     - Relational databases such as Amazon RDS and Aurora, and provisioned NoSQL databases such as Neptune and Amazon DocumentDB, all support adding read replicas to offload the read portions of the workload.
     - Serverless databases such as DynamoDB will scale automatically. Ensure that you have enough read capacity units (RCUs) provisioned to handle the workload.
   - **Scaling writes (like partition key sharding or introducing a queue):**
     - For relational databases, you can increase the size of the instance to accommodate an increased workload, or increase the provisioned IOPS to allow for increased throughput to the underlying storage.
     - You can also introduce a queue in front of your database rather than writing directly to the database. This pattern allows you to decouple ingestion from the database and control the flow rate so the database does not get overwhelmed.
     - Batching your write requests rather than creating many short-lived transactions can help improve throughput in high-write-volume relational databases.
     - Serverless databases like DynamoDB can scale write throughput automatically or by adjusting the provisioned write capacity units (WCUs), depending on the capacity mode.
     - You can still run into issues with hot partitions when you reach the throughput limits for a given partition key. This can be mitigated by choosing a more evenly distributed partition key or by write-sharding the partition key.
   - **Policies to manage the lifecycle of your datasets:**
     - You can use Amazon S3 Lifecycle to manage your objects throughout their lifecycle (a hedged example follows these steps). If the access patterns are unknown, changing, or unpredictable, you can use Amazon S3 Intelligent-Tiering, which monitors access patterns and automatically moves objects that have not been accessed to lower-cost access tiers.
     - You can leverage Amazon S3 Storage Lens metrics to identify optimization opportunities and gaps in lifecycle management.
     - Amazon EFS lifecycle management automatically manages file storage for your file systems.
   - **Connection management and pooling:**
     - Amazon RDS Proxy can be used with Amazon RDS and Aurora to manage connections to the database.
     - Serverless databases such as DynamoDB do not have connections associated with them, but consider the provisioned capacity and automatic scaling policies to deal with spikes in load.
3. Perform experiments and benchmarking in a non-production environment to identify which configuration option can address your workload requirements.
4. Once you have experimented, plan your migration and validate your performance metrics.
5. Use AWS monitoring (like Amazon CloudWatch) and optimization (like Amazon S3 Storage Lens) tools to continuously optimize your data store using real-world usage patterns.
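As a hedged example of lifecycle management, the snippet below defines an S3 Lifecycle rule that transitions objects to S3 Intelligent-Tiering after 30 days and expires them after a year. The bucket name, prefix, and timings are placeholders to adapt to your retention needs.

```python
"""Hedged sketch: apply an S3 Lifecycle rule for tiering and expiration."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-and-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # assumed prefix
            "Transitions": [{"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}],
            "Expiration": {"Days": 365},
        }],
    },
)
```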

๐Ÿ’ผ PERF03-BP03 Collect and record data store performance metrics

Track and record relevant performance metrics for your data store to understand how your data management solutions are performing. These metrics can help you optimize your data store, verify that your workload requirements are met, and provide a clear overview of how the workload performs.

**Common anti-patterns:**

- You only use manual log file searching for metrics.
- You only publish metrics to internal tools used by your team and don't have a comprehensive picture of your workload.
- You only use the default metrics recorded by your selected monitoring software.
- You only review metrics when there is an issue.
- You only monitor system-level metrics and do not capture data access or usage metrics.

**Benefits of establishing this best practice:** Establishing a performance baseline helps you understand the normal behavior and requirements of workloads. Abnormal patterns can be identified and debugged faster, improving the performance and reliability of the data store.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

To monitor the performance of your data stores, you must record multiple performance metrics over a period of time. This allows you to detect anomalies, as well as measure performance against business metrics to verify you are meeting your workload needs. Metrics should include both the underlying system that is supporting the data store and the database metrics. The underlying system metrics might include CPU utilization, memory, available disk storage, disk I/O, cache hit ratio, and network inbound and outbound metrics, while the data store metrics might include transactions per second, top queries, average query rates, response times, index usage, table locks, query timeouts, and number of open connections. This data is crucial to understand how the workload is performing and how the data management solution is used. Use these metrics as part of a data-driven approach to tune and optimize your workload's resources. Use tools, libraries, and systems that record performance measurements related to database performance.

### Implementation steps

1. Identify the key performance metrics for your data store to track.
2. Track metrics for the specific AWS services in your workload.
3. Use an approved logging and monitoring solution to collect these metrics (a hedged example follows these steps).
   - Amazon CloudWatch can collect metrics across the resources in your architecture.
   - You can also collect and publish custom metrics to surface business or derived metrics.
   - Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached.
4. Check if data store monitoring can benefit from a machine learning solution that detects performance anomalies. For example, Amazon DevOps Guru for Amazon RDS provides visibility into performance issues and makes recommendations for corrective actions.
5. Configure data retention in your monitoring and logging solution to match your security and operational goals.
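The hedged sketch below pulls recent p95 read latency for an RDS instance with CloudWatch `GetMetricData` to help establish a performance baseline. The DB instance identifier is a placeholder.

```python
"""Hedged sketch: retrieve p95 read latency for an RDS instance over a week."""
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_data(
    StartTime=end - timedelta(days=7),
    EndTime=end,
    MetricDataQueries=[{
        "Id": "read_latency",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/RDS",
                "MetricName": "ReadLatency",
                "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "example-db"}],  # placeholder
            },
            "Period": 3600,
            "Stat": "p95",
        },
    }],
)

result = resp["MetricDataResults"][0]
for ts, value in zip(result["Timestamps"], result["Values"]):
    # ReadLatency is reported in seconds; convert to milliseconds for readability.
    print(ts.isoformat(), round(value * 1000, 2), "ms")
```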

๐Ÿ’ผ PERF03-BP04 Implement strategies to improve query performance in data store

Implement access patterns that can benefit from caching data for fast retrieval of frequently accessed data.

**Common anti-patterns:**

- You cache data that changes frequently.
- You rely on cached data as if it is durably stored and always available.
- You don't consider the consistency of your cached data.
- You don't monitor the efficiency of your caching implementation.

**Benefits of establishing this best practice:** Storing data in a cache can improve read latency, read throughput, user experience, and overall efficiency, as well as reduce costs.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

A cache is a software or hardware component aimed at storing data so that future requests for the same data can be served faster or more efficiently. The data stored in a cache can be reconstructed if lost by repeating an earlier calculation or fetching it from another data store. Data caching can be one of the most effective strategies to improve your overall application performance and reduce the burden on your underlying primary data sources. Data can be cached at multiple levels in the application, such as within the application making remote calls, known as client-side caching, or by using a fast secondary service for storing the data, known as remote caching.

**Client-side caching:** With client-side caching, each client (an application or service that queries the backend data store) can store the results of its unique queries locally for a specified amount of time. This can reduce the number of requests across the network to a data store by checking the local client cache first. If the results are not present, the application can then query the data store and store those results locally. This pattern allows each client to store data in the closest location possible (the client itself), resulting in the lowest possible latency. Clients can also continue to serve some queries when the backend data store is unavailable, increasing the availability of the overall system. One disadvantage of this approach is that when multiple clients are involved, they may store the same cached data locally. This results in both duplicate storage usage and data inconsistency between those clients. One client might cache the results of a query, and one minute later another client can run the same query and get a different result.

**Remote caching:** To solve the issue of duplicate data between clients, a fast external service, or remote cache, can be used to store the queried data. Instead of checking a local data store, each client checks the remote cache before querying the backend data store. This strategy allows for more consistent responses between clients, better efficiency in stored data, and a higher volume of cached data, because the storage space scales independently of clients. The disadvantage of a remote cache is that the overall system may see higher latency, as an additional network hop is required to check the remote cache. Client-side caching can be used alongside remote caching for multi-level caching to improve latency.

### Implementation steps

1. Identify databases, APIs, and network services that could benefit from caching. Services that have heavy read workloads, have a high read-to-write ratio, or are expensive to scale are candidates for caching.
2. Identify the appropriate type of caching strategy that best fits your access pattern (a cache-aside sketch follows these steps).
3. Follow caching best practices for your data store.
4. Configure a cache invalidation strategy, such as a time-to-live (TTL), for all data, balancing freshness of data against reduced pressure on the backend data store.
5. Enable features such as automatic connection retries, exponential backoff, client-side timeouts, and connection pooling in the client, if available, as they can improve performance and reliability.
6. Monitor the cache hit rate with a goal of 80% or higher. Lower values may indicate insufficient cache size or an access pattern that does not benefit from caching.
7. Implement data replication to offload reads to multiple instances and improve data read performance and availability.
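As a hedged sketch of the remote cache-aside pattern described above, the snippet below checks a Redis-compatible cache (for example, ElastiCache (Redis OSS)) first and falls back to the primary data store on a miss, writing the result back with a TTL. The endpoint, key scheme, and `fetch_from_database()` are placeholders.

```python
"""Hedged sketch: cache-aside reads with a TTL against a Redis-compatible cache."""
import json
import redis  # pip install redis

cache = redis.Redis(host="example-cache.abc123.use1.cache.amazonaws.com", port=6379)  # placeholder endpoint
TTL_SECONDS = 300  # assumed freshness window

def fetch_from_database(product_id: str) -> dict:
    # Placeholder for the query against the primary data store.
    return {"product_id": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                           # cache hit: skip the database
        return json.loads(cached)
    item = fetch_from_database(product_id)           # cache miss: read from the source
    cache.setex(key, TTL_SECONDS, json.dumps(item))  # repopulate with a TTL
    return item
```

The TTL is the invalidation strategy from step 4: shorter TTLs improve freshness, longer TTLs reduce pressure on the backend data store.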

๐Ÿ’ผ PERF04-BP01 Understand how networking impacts performance

Analyze and understand how network-related decisions impact your workload to provide efficient performance and an improved user experience.

**Common anti-patterns:**

- All traffic flows through your existing data centers.
- You route all traffic through central firewalls instead of using cloud-native network security tools.
- You provision AWS Direct Connect connections without understanding actual usage requirements.
- You don't consider workload characteristics and encryption overhead when defining your networking solutions.
- You use on-premises concepts and strategies for networking solutions in the cloud.

**Benefits of establishing this best practice:** Understanding how networking impacts workload performance helps you identify potential bottlenecks, improve user experience, increase reliability, and lower operational maintenance as the workload changes.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

The network is responsible for the connectivity between application components, cloud services, edge networks, and on-premises data, and therefore it can heavily impact workload performance. In addition to workload performance, user experience can also be impacted by network latency, bandwidth, protocols, location, network congestion, jitter, throughput, and routing rules. Have a documented list of networking requirements from the workload, including latency, packet size, routing rules, protocols, and supporting traffic patterns. Review the available networking solutions and identify which service meets your workload networking characteristics. Cloud-based networks can be quickly rebuilt, so evolving your network architecture over time is necessary to improve performance efficiency.

### Implementation steps

1. Define and document networking performance requirements, including metrics such as network latency, bandwidth, protocols, locations, traffic patterns (spikes and frequency), throughput, encryption, inspection, and routing rules.
2. Learn about key AWS networking services like VPCs, AWS Direct Connect, Elastic Load Balancing (ELB), and Amazon Route 53.
3. Capture the following key networking characteristics (a sketch for enabling VPC Flow Logs follows these steps):
   - **Foundational networking characteristics:** VPC Flow Logs, AWS Transit Gateway Flow Logs, AWS Transit Gateway metrics, AWS PrivateLink metrics
   - **Application networking characteristics:** Elastic Fabric Adapter, AWS App Mesh metrics, Amazon API Gateway metrics
   - **Edge networking characteristics:** Amazon CloudFront metrics, Amazon Route 53 metrics, AWS Global Accelerator metrics
   - **Hybrid networking characteristics:** AWS Direct Connect metrics, AWS Site-to-Site VPN metrics, AWS Client VPN metrics, AWS Cloud WAN metrics
   - **Security networking characteristics:** AWS Shield, AWS WAF, and AWS Network Firewall metrics
   - **Tracing characteristics:** AWS X-Ray, VPC Reachability Analyzer, Network Access Analyzer, Amazon Inspector, Amazon CloudWatch RUM
4. Benchmark and test network performance:
   - Benchmark network throughput, as some factors can affect Amazon EC2 network performance when instances are in the same VPC. Measure the network bandwidth between Amazon EC2 Linux instances in the same VPC.
   - Perform load tests to experiment with networking solutions and options.
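To start capturing the foundational characteristics listed above, the hedged sketch below enables VPC Flow Logs delivered to S3. The VPC ID and bucket ARN are placeholders, and the bucket must allow flow log delivery.

```python
"""Hedged sketch: enable VPC Flow Logs delivered to an S3 bucket."""
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],                   # placeholder VPC
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::example-flow-log-bucket",   # placeholder bucket
)
```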

๐Ÿ’ผ PERF04-BP02 Evaluate available networking features

Evaluate networking features in the cloud that may increase performance. Measure the impact of these features through testing, metrics, and analysis. For example, take advantage of network-level features that are available to reduce latency, network distance, or jitter.

**Common anti-patterns:**

- You stay within one Region because that is where your headquarters is physically located.
- You use firewalls instead of security groups for filtering traffic.
- You break TLS for traffic inspection rather than relying on security groups, endpoint policies, and other cloud-native functionality.
- You only use subnet-based segmentation instead of security groups.

**Benefits of establishing this best practice:** Evaluating all service features and options can increase your workload performance, reduce the cost of infrastructure, decrease the effort required to maintain your workload, and increase your overall security posture. You can use the global AWS backbone to provide the optimal networking experience for your customers.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

AWS offers services like AWS Global Accelerator and Amazon CloudFront that can help improve network performance, and most AWS services have product features (such as the Amazon S3 Transfer Acceleration feature) to optimize network traffic. Review which network-related configuration options are available to you and how they could impact your workload. Performance optimization depends on understanding how these options interact with your architecture and the impact that they will have on both measured performance and user experience.

### Implementation steps

1. Create a list of workload components.
   1. Consider using AWS Cloud WAN to build, manage, and monitor your organization's network when building a unified global network.
   2. Monitor your global and core networks with Amazon CloudWatch Logs metrics. Leverage Amazon CloudWatch RUM, which provides insights to help identify, understand, and enhance users' digital experience.
   3. View aggregate network latency between AWS Regions and Availability Zones, as well as within each Availability Zone, using AWS Network Manager to gain insight into how your application performance relates to the performance of the underlying AWS network.
   4. Use an existing configuration management database (CMDB) tool or a service such as AWS Config to create an inventory of your workload and how it's configured.
2. If this is an existing workload, identify and document the benchmark for your performance metrics, focusing on the bottlenecks and areas to improve. Performance-related networking metrics will differ per workload based on business requirements and workload characteristics. Metrics to review might include bandwidth, latency, packet loss, jitter, and retransmits.
3. If this is a new workload, perform load tests to identify performance bottlenecks.
4. For the performance bottlenecks you identify, review the configuration options for your solutions to identify performance improvement opportunities. Key networking options and features include:
   - **Network path or routes:** Use Network Access Analyzer to identify paths or routes.
   - **Network protocols:** See PERF04-BP05 Choose network protocols to improve performance.
   - **Network topology:** Evaluate tradeoffs between VPC peering and AWS Transit Gateway. Share your AWS Transit Gateway between multiple accounts using AWS Resource Access Manager. See PERF04-BP03 for dedicated connectivity or VPN.
   - **Network services:** AWS Global Accelerator, Amazon CloudFront, Lambda@Edge, Amazon Route 53 routing options (latency-based, geolocation, geoproximity, IP-based).
   - **Storage resource features:** Amazon S3 Transfer Acceleration, Amazon S3 Multi-Region Access Points (a hedged example follows this list).
   - **Compute resource features:** elastic network interfaces (ENIs), placement group optimizations, Elastic Network Adapter (ENA), Elastic Fabric Adapter (EFA), Amazon EBS-optimized instances.
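As a hedged example of one storage-level network feature, the snippet below enables S3 Transfer Acceleration on a bucket and uploads through the accelerated endpoint. The bucket and file names are placeholders, and acceleration only helps for geographically distant clients.

```python
"""Hedged sketch: enable S3 Transfer Acceleration and upload through the
accelerated endpoint."""
import boto3
from botocore.config import Config

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="example-global-uploads",                 # placeholder bucket
    AccelerateConfiguration={"Status": "Enabled"},
)

# Route transfers through the accelerated endpoint for this client.
s3_accelerated = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accelerated.upload_file("large-file.bin", "example-global-uploads", "uploads/large-file.bin")
```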

๐Ÿ’ผ PERF04-BP03 Choose appropriate dedicated connectivity or VPN for your workload

When hybrid connectivity is required to connect on-premises and cloud resources, provision adequate bandwidth to meet your performance requirements. Estimate the bandwidth and latency requirements for your hybrid workload. These numbers will drive your sizing requirements.

**Common anti-patterns:**

- You only evaluate VPN solutions for your network encryption requirements.
- You do not evaluate backup or redundant connectivity options.
- You do not identify all workload requirements (encryption, protocol, bandwidth, and traffic needs).

**Benefits of establishing this best practice:** Selecting and configuring appropriate connectivity solutions will increase the reliability of your workload and maximize performance. By identifying workload requirements, planning ahead, and evaluating hybrid solutions, you can minimize expensive physical network changes and operational overhead while increasing your time-to-value.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Develop a hybrid networking architecture based on your bandwidth requirements. AWS Direct Connect allows you to connect your on-premises network privately with AWS. It is suitable when you need high bandwidth and low latency while achieving consistent performance. A VPN connection establishes a secure connection over the internet. It is used when only a temporary connection is required, when cost is a factor, or as a contingency while waiting for resilient physical network connectivity to be established when using AWS Direct Connect. If your bandwidth requirements are high, you might consider multiple AWS Direct Connect or VPN services. Traffic can be load balanced across services, although we don't recommend load balancing between AWS Direct Connect and VPN because of the latency and bandwidth differences.

### Implementation steps

1. Estimate the bandwidth and latency requirements of your existing applications.
   - For existing workloads that are moving to AWS, leverage the data from your internal network monitoring systems.
   - For new or existing workloads for which you don't have monitoring data, consult with the product owners to determine adequate performance metrics and provide a good user experience.
2. Select a dedicated connection or VPN as your connectivity option. Based on all workload requirements (encryption, bandwidth, and traffic needs), you can choose either AWS Direct Connect or AWS VPN (or both).
   - AWS Direct Connect provides dedicated connectivity to the AWS environment, from 50 Mbps up to 100 Gbps, using either dedicated connections or hosted connections. This gives you managed and controlled latency and provisioned bandwidth so your workload can connect efficiently to other environments. Using AWS Direct Connect partners, you can have end-to-end connectivity from multiple environments, providing an extended network with consistent performance. AWS offers scaling of Direct Connect connection bandwidth using either native 100 Gbps, link aggregation groups (LAG), or BGP equal-cost multipath (ECMP).
   - AWS Site-to-Site VPN provides a managed VPN service supporting Internet Protocol Security (IPsec). Each VPN connection includes two tunnels for high availability.
3. Follow AWS documentation to choose an appropriate connectivity option:
   - If you decide to use AWS Direct Connect, select the appropriate bandwidth for your connectivity.
   - If you are using AWS Site-to-Site VPN across multiple locations to connect to an AWS Region, use an accelerated Site-to-Site VPN connection to improve network performance (a hedged sketch follows these steps).
   - If your network design consists of IPsec VPN over AWS Direct Connect, consider using Private IP VPN to improve security and achieve segmentation. AWS Site-to-Site Private IP VPN is deployed on top of a transit virtual interface (VIF).
   - AWS Direct Connect SiteLink allows you to create low-latency and redundant connections between your data centers worldwide by sending data over the fastest path between AWS Direct Connect locations, bypassing AWS Regions.
4. Validate your connectivity setup before deploying to production. Perform security and performance testing to ensure it meets your bandwidth, reliability, latency, and compliance requirements.
5. Regularly monitor your connectivity performance and usage, and optimize if required.
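As a hedged sketch of the accelerated Site-to-Site VPN option mentioned in step 3, the snippet below creates a VPN connection with acceleration enabled. Accelerated VPN attaches to a transit gateway, and the gateway IDs here are placeholders; treat this as an assumption to verify against the current EC2 API reference.

```python
"""Hedged sketch: create an accelerated Site-to-Site VPN connection attached
to a transit gateway."""
import boto3

ec2 = boto3.client("ec2")

ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId="cgw-0123456789abcdef0",  # placeholder
    TransitGatewayId="tgw-0123456789abcdef0",   # placeholder
    Options={"EnableAcceleration": True},       # routes tunnels through Global Accelerator
)
```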

๐Ÿ’ผ PERF04-BP04 Use load balancing to distribute traffic across multiple resources

Distribute traffic across multiple resources or services to allow your workload to take advantage of the elasticity that the cloud provides. You can also use load balancing to offload encryption termination, improve performance and reliability, and manage and route traffic effectively.

**Common anti-patterns**

- You don't consider your workload requirements when choosing the load balancer type.
- You don't leverage the load balancer features for performance optimization.
- The workload is exposed directly to the internet without a load balancer.
- You route all internet traffic through existing load balancers.
- You use generic TCP load balancing and make each compute node handle SSL encryption.

**Benefits of establishing this best practice** A load balancer handles the varying load of your application traffic in a single Availability Zone or across multiple Availability Zones and enables high availability, automatic scaling, and better utilization for your workload.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Load balancers act as the entry point for your workload, from which they distribute traffic to your backend targets, such as compute instances or containers, to improve utilization. Choosing the right load balancer type is the first step to optimize your architecture. Start by listing your workload characteristics, such as protocol (like TCP, HTTP, TLS, or WebSockets), target type (like instances, containers, or serverless), application requirements (like long-running connections, user authentication, or stickiness), and placement (like Region, Local Zone, Outpost, or zonal isolation).

AWS provides multiple models for your applications to use load balancing. Application Load Balancer is best suited for load balancing of HTTP and HTTPS traffic and provides advanced request routing targeted at the delivery of modern application architectures, including microservices and containers. Network Load Balancer is best suited for load balancing of TCP traffic where extreme performance is required. It is capable of handling millions of requests per second while maintaining ultra-low latencies, and it is optimized to handle sudden and volatile traffic patterns.

Elastic Load Balancing provides integrated certificate management and SSL/TLS decryption, allowing you the flexibility to centrally manage the SSL settings of the load balancer and offload CPU-intensive work from your workload. After choosing the right load balancer, you can start leveraging its features to reduce the amount of work your backend has to do to serve the traffic. For example, with both Application Load Balancer (ALB) and Network Load Balancer (NLB), you can perform SSL/TLS offloading, which avoids the CPU-intensive TLS handshake being completed by your targets and also improves certificate management. When you configure SSL/TLS offloading in your load balancer, it becomes responsible for encrypting the traffic to and from clients while delivering the traffic unencrypted to your backends, freeing up your backend resources and improving the response time for the clients. Application Load Balancer can also serve HTTP/2 traffic without needing to support it on your targets. This simple decision can improve your application response time, as HTTP/2 uses TCP connections more efficiently.

Your workload latency requirements should be considered when defining the architecture. As an example, if you have a latency-sensitive application, you may decide to use Network Load Balancer, which offers extremely low latencies. Alternatively, you may decide to bring your workload closer to your customers by leveraging Application Load Balancer in AWS Local Zones or even AWS Outposts. Another consideration for latency-sensitive workloads is cross-zone load balancing. With cross-zone load balancing, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones.

Use Auto Scaling integrated with your load balancer. One of the key aspects of a performance-efficient system is right-sizing your backend resources. To do this, you can leverage load balancer integrations for backend target resources. Using the load balancer integration with Auto Scaling groups, targets are added or removed from the load balancer as required in response to incoming traffic. Load balancers can also integrate with Amazon ECS and Amazon EKS for containerized workloads.

### Implementation steps

1. Define your load balancing requirements, including traffic volume, availability, and application scalability.
2. Choose the right load balancer type for your application.
   - Use Application Load Balancer for HTTP/HTTPS workloads.
   - Use Network Load Balancer for non-HTTP workloads that run on TCP or UDP.
   - Use a combination of both (ALB as a target of NLB) if you want to leverage features of both products. For example, you can do this if you want to use the static IPs of NLB together with HTTP header-based routing from ALB, or if you want to expose your HTTP workload through AWS PrivateLink.
3. For a full comparison of load balancers, see ELB product comparison.
4. Use SSL/TLS offloading if possible (a minimal sketch follows these steps).
   - Configure HTTPS/TLS listeners with both Application Load Balancer and Network Load Balancer integrated with AWS Certificate Manager.
   - Note that some workloads may require end-to-end encryption for compliance reasons. In this case, encryption at the targets is a requirement.
   - For security best practices, see SEC09-BP02 Enforce encryption in transit.
5. Select the right routing algorithm (ALB only).
   - Least outstanding requests: Use to achieve a better load distribution to your backend targets when the requests for your application vary in complexity or your targets vary in processing capability.
   - Round robin: Use when the requests and targets are similar, or if you need to distribute requests equally among targets.
6. Consider cross-zone or zonal isolation.
   - Turn cross-zone load balancing off (zonal isolation) for latency improvements and zonal failure domains. It is turned off by default in NLB, and in ALB you can turn it off per target group.
   - Turn cross-zone load balancing on for increased availability and flexibility. By default, cross-zone is turned on for ALB, and in NLB you can turn it on per target group.
7. Turn on HTTP keep-alives for your HTTP workloads (ALB only). With this feature, the load balancer can reuse backend connections until the keep-alive timeout expires, improving your HTTP request and response time and also reducing resource utilization on your backend targets.
8. Turn on monitoring for your load balancer.
   - Turn on access logs for your Application Load Balancer and Network Load Balancer.
   - The main fields to consider for ALB are request_processing_time, target_processing_time, and response_processing_time.
   - The main fields to consider for NLB are connection_time and tls_handshake_time.
   - Be ready to query the logs when you need them. You can use Amazon Athena to query both ALB logs and NLB logs.
   - Create alarms for performance-related metrics such as TargetResponseTime for ALB.
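As a minimal sketch of the SSL/TLS offloading step above, the following boto3 snippet creates an HTTPS listener on an existing Application Load Balancer that terminates TLS with an ACM certificate and forwards plain HTTP to a target group. The load balancer ARN, certificate ARN, and target group ARN are placeholders, not resources named by this guide.

```python
# Sketch: terminate TLS at an Application Load Balancer so targets receive plain HTTP.
# All ARNs below are placeholders; substitute your own resources.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

response = elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123",
    Protocol="HTTPS",
    Port=443,
    SslPolicy="ELBSecurityPolicy-TLS13-1-2-2021-06",  # a predefined ELB security policy
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/example"}],
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/def456",
    }],
)
print(response["Listeners"][0]["ListenerArn"])
```

With this in place, the target group can listen on plain HTTP (port 80) while clients still connect over HTTPS, which is the offloading pattern described in the guidance.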

๐Ÿ’ผ PERF04-BP05 Choose network protocols to improve performance

Make decisions about protocols for communication between systems and networks based on the impact to the workload's performance. There is a relationship between latency and bandwidth to achieve throughput. If your file transfer is using Transmission Control Protocol (TCP), higher latencies will most likely reduce overall throughput. There are approaches to fix this with TCP tuning and optimized transfer protocols, but one solution is to use User Datagram Protocol (UDP).

**Common anti-patterns**

- You use TCP for all workloads regardless of performance requirements.

**Benefits of establishing this best practice** Verifying that an appropriate protocol is used for communication between users and workload components helps improve the overall user experience for your applications. For instance, connectionless UDP allows for high speed, but it doesn't offer retransmission or high reliability. TCP is a full-featured protocol, but it requires greater overhead for processing the packets.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

If you have the ability to choose different protocols for your application and you have expertise in this area, optimize your application and end-user experience by using a different protocol. Note that this approach comes with significant difficulty and should only be attempted if you have optimized your application in other ways first. A primary consideration for improving your workload's performance is to understand the latency and throughput requirements, and then choose network protocols that optimize performance.

### When to consider using TCP

TCP provides reliable data delivery and can be used for communication between workload components where reliability and guaranteed delivery of data is important. Many web-based applications rely on TCP-based protocols, such as HTTP and HTTPS, to open TCP sockets for communication between application components. Email and file data transfer are common applications that also make use of TCP, as it is a simple and reliable transfer mechanism between application components. Using TLS with TCP can add some overhead to the communication, which can result in increased latency and reduced throughput, but it comes with the advantage of security. The overhead comes mainly from the handshake process, which can take several round trips to complete. Once the handshake is complete, the overhead of encrypting and decrypting data is relatively small.

### When to consider using UDP

UDP is a connectionless protocol and is therefore suitable for applications that need fast, efficient transmission, such as log, monitoring, and VoIP data. Also, consider using UDP if you have workload components that respond to small queries from large numbers of clients to ensure optimal performance of the workload. Datagram Transport Layer Security (DTLS) is the UDP equivalent of Transport Layer Security (TLS). When using DTLS with UDP, the overhead comes from encrypting and decrypting the data, as the handshake process is simplified. DTLS also adds a small amount of overhead to the UDP packets, as it includes additional fields to indicate the security parameters and to detect tampering.

### When to consider using SRD

Scalable reliable datagram (SRD) is a network transport protocol optimized for high-throughput workloads due to its ability to load-balance traffic across multiple paths and quickly recover from packet drops or link failures. SRD is therefore best used for high performance computing (HPC) workloads that require high-throughput and low-latency communication between compute nodes. This might include parallel processing tasks such as simulation, modeling, and data analysis that involve a large amount of data transfer between nodes.

### Implementation steps

- Use the AWS Global Accelerator and AWS Transfer Family services to improve the throughput of your online file transfer applications. The AWS Global Accelerator service helps you achieve lower latency between your client devices and your workload on AWS. With AWS Transfer Family, you can use TCP-based protocols such as Secure Shell File Transfer Protocol (SFTP) and File Transfer Protocol over SSL (FTPS) to securely scale and manage your file transfers to AWS storage services.
- Use network latency to determine if TCP is appropriate for communication between workload components (see the sketch after this list). If the network latency between your client application and server is high, the TCP three-way handshake can take some time, impacting the responsiveness of your application. Metrics such as time to first byte (TTFB) and round-trip time (RTT) can be used to measure network latency. If your workload serves dynamic content to users, consider using Amazon CloudFront, which establishes a persistent connection to each origin for dynamic content to remove the connection setup time that would otherwise slow down each client request.
- Using TLS with TCP or UDP can result in increased latency and reduced throughput for your workload due to the impact of encryption and decryption. For such workloads, consider SSL/TLS offloading on Elastic Load Balancing to improve workload performance by allowing the load balancer to handle the SSL/TLS encryption and decryption process instead of having backend instances do it. This can help reduce the CPU utilization on the backend instances, which can improve performance and increase capacity.
- Use the Network Load Balancer (NLB) to deploy services that rely on the UDP protocol, such as authentication and authorization, logging, DNS, IoT, and streaming media, to improve the performance and reliability of your workload. The NLB distributes incoming UDP traffic across multiple targets, allowing you to scale your workload horizontally, increase capacity, and reduce the overhead of a single target.
- For your high performance computing (HPC) workloads, consider using the Elastic Network Adapter (ENA) Express functionality that uses the SRD protocol to improve network performance by providing a higher single-flow bandwidth (25 Gbps) and lower tail latency (99.9 percentile) for network traffic between EC2 instances.
- Use the Application Load Balancer (ALB) to route and load balance your gRPC (Remote Procedure Calls) traffic between workload components or between gRPC clients and services. gRPC uses the TCP-based HTTP/2 protocol for transport, and it provides performance benefits such as a lighter network footprint, compression, efficient binary serialization, support for numerous languages, and bi-directional streaming.
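The latency measurement step above suggests RTT and TTFB as decision inputs. The sketch below, which assumes a reachable HTTPS endpoint of your own (the hostname is a placeholder), approximates both using only the Python standard library: TCP connect time as a rough RTT proxy, and time to the first response byte as TTFB.

```python
# Sketch: rough RTT (TCP connect time) and TTFB measurement against one endpoint.
import socket
import time
from http.client import HTTPSConnection

HOST = "example.com"  # placeholder; substitute your workload's hostname

# Approximate RTT with the time to complete a TCP handshake.
start = time.perf_counter()
with socket.create_connection((HOST, 443), timeout=5):
    rtt_ms = (time.perf_counter() - start) * 1000

# Approximate TTFB: time from sending the request until the status line arrives.
conn = HTTPSConnection(HOST, timeout=5)
start = time.perf_counter()
conn.request("GET", "/")
resp = conn.getresponse()          # returns once the status line and headers are read
ttfb_ms = (time.perf_counter() - start) * 1000
resp.read()
conn.close()

print(f"approx RTT: {rtt_ms:.1f} ms, approx TTFB: {ttfb_ms:.1f} ms")
```

Run the measurement from locations representative of your users, not only from the Region hosting the workload, so the numbers reflect real client latency.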

๐Ÿ’ผ PERF04-BP06 Choose your workload's location based on network requirements

Evaluate options for resource placement to reduce network latency and improve throughput, providing an optimal user experience by reducing page load and data transfer times.

**Common anti-patterns**

- You consolidate all workload resources into one geographic location.
- You choose the Region closest to your own location rather than to the workload's end users.

**Benefits of establishing this best practice** User experience is greatly affected by the latency between the user and your application. By using appropriate AWS Regions and the AWS private global network, you can reduce latency and deliver a better experience to remote users.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Resources, such as Amazon EC2 instances, are placed into Availability Zones within AWS Regions, AWS Local Zones, AWS Outposts, or AWS Wavelength Zones. Selection of this location influences network latency and throughput from a given user location. Edge services like Amazon CloudFront and AWS Global Accelerator can also be used to improve network performance by either caching content at edge locations or providing users with an optimal path to the workload through the AWS global network.

Amazon EC2 provides placement groups for networking. A placement group is a logical grouping of instances to decrease latency. Using placement groups with supported instance types and an Elastic Network Adapter (ENA) enables workloads to participate in a low-latency, reduced-jitter 25 Gbps network. Placement groups are recommended for workloads that benefit from low network latency, high network throughput, or both.

Latency-sensitive services are delivered at edge locations using the AWS global network, such as Amazon CloudFront. These edge locations commonly provide services like content delivery network (CDN) and domain name system (DNS). By having these services at the edge, workloads can respond with low latency to requests for content or DNS resolution. These services also provide geographic services, such as geotargeting of content (providing different content based on the end users' location) or latency-based routing to direct end users to the nearest Region (minimum latency). Use edge services to reduce latency and to enable content caching. Configure cache control correctly for both DNS and HTTP/HTTPS to gain the most benefit from these approaches.

### Implementation steps

- Capture information about the IP traffic going to and from network interfaces.
- Analyze network access patterns in your workload to identify how users use your application.
- Use monitoring tools, such as Amazon CloudWatch and AWS CloudTrail, to gather data on network activities.
- Analyze the data to identify the network access pattern.
- Select Regions for your workload deployment based on the following key elements:
   - Where your data is located: For data-heavy applications (such as big data and machine learning), application code should run as close to the data as possible.
   - Where your users are located: For user-facing applications, choose a Region (or Regions) close to your workload's users.
   - Other constraints: Consider constraints such as cost and compliance, as explained in What to Consider when Selecting a Region for your Workloads.
- Use AWS Local Zones to run workloads like video rendering. Local Zones allow you to benefit from having compute and storage resources closer to end users.
- Use AWS Outposts for workloads that need to remain on premises and where you want that workload to run seamlessly with the rest of your other workloads in AWS.
- Applications like high-resolution live video streaming, high-fidelity audio, and augmented reality or virtual reality (AR/VR) require ultra-low latency for 5G devices. For such applications, consider AWS Wavelength. AWS Wavelength embeds AWS compute and storage services within 5G networks, providing mobile edge computing infrastructure for developing, deploying, and scaling ultra-low-latency applications.
- Use local caching or AWS caching solutions for frequently used assets to improve performance, reduce data movement, and lower environmental impact.
   - Amazon CloudFront: Use to cache static content such as images, scripts, and videos, as well as dynamic content such as API responses or web applications.
   - Amazon ElastiCache: Use to cache content for web applications.
   - DynamoDB Accelerator: Use to add in-memory acceleration to your DynamoDB tables.
- Use services that can help you run code closer to users of your workload, like the following:
   - Lambda@Edge: Use for compute-heavy operations that are initiated when objects are not in the cache.
   - Amazon CloudFront Functions: Use for simple use cases like HTTP(S) request or response manipulations that can be initiated by short-lived functions.
   - AWS IoT Greengrass: Use to run local compute, messaging, and data caching for connected devices.
- Some applications require fixed entry points or higher performance by reducing first-byte latency and jitter and increasing throughput. These applications can benefit from networking services that provide static anycast IP addresses and TCP termination at edge locations. AWS Global Accelerator can improve performance for your applications by up to 60% and provide quick failover for multi-Region architectures. AWS Global Accelerator provides you with static anycast IP addresses that serve as a fixed entry point for your applications hosted in one or more AWS Regions. These IP addresses permit traffic to ingress onto the AWS global network as close to your users as possible. AWS Global Accelerator reduces the initial connection setup time by establishing a TCP connection between the client and the AWS edge location closest to the client. Review the use of AWS Global Accelerator to improve the performance of your TCP/UDP workloads and provide quick failover for multi-Region architectures.
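To make the placement-group guidance above concrete, here is a hedged boto3 sketch that creates a cluster placement group and launches two instances into it. The AMI ID, instance type, and group name are placeholder assumptions, not values prescribed by this guide.

```python
# Sketch: create a cluster placement group and launch instances into it for low-latency networking.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster strategy packs instances close together within one Availability Zone.
ec2.create_placement_group(GroupName="low-latency-pg", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c6in.large",         # placeholder ENA-capable instance type
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "low-latency-pg"},
)
```

A cluster placement group trades resilience for latency, since all instances share one Availability Zone, so it suits tightly coupled, latency-sensitive tiers rather than the whole workload.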

๐Ÿ’ผ PERF04-BP07 Optimize network configuration based on metrics

Use collected and analyzed data to make informed decisions about optimizing your network configuration.

**Common anti-patterns**

- You assume that all performance-related issues are application-related.
- You only test your network performance from a location close to where you have deployed the workload.
- You use default configurations for all network services.
- You overprovision the network resource to provide sufficient capacity.

**Benefits of establishing this best practice** Collecting necessary metrics of your AWS network and implementing network monitoring tools allows you to understand network performance and optimize network configurations.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Monitoring traffic to and from VPCs, subnets, or network interfaces is crucial to understand how to utilize AWS network resources and optimize network configurations. The following AWS networking tools let you inspect information about traffic usage, network access, and logs.

### Implementation steps

1. Identify the key performance metrics to collect, such as latency or packet loss. AWS provides several tools that can help you collect these metrics:
   - Amazon VPC IP Address Manager (IPAM): Use IPAM to plan, track, and monitor IP addresses for your AWS and on-premises workloads. This is a best practice to optimize IP address usage and allocation.
   - VPC Flow Logs: Use VPC Flow Logs to capture detailed information about traffic to and from network interfaces in your VPCs (a sketch follows these steps). With VPC Flow Logs, you can diagnose overly restrictive or permissive security group rules and determine the direction of the traffic to and from the network interfaces.
   - AWS Transit Gateway Flow Logs: Use AWS Transit Gateway Flow Logs to capture information about the IP traffic going to and from your transit gateways.
   - DNS query logging: Log information about public or private DNS queries Route 53 receives. With DNS logs, you can optimize DNS configurations by understanding the domain or subdomain that was requested or the Route 53 edge locations that responded to DNS queries.
   - Reachability Analyzer: Reachability Analyzer helps you analyze and debug network reachability. It is a configuration analysis tool that allows you to perform connectivity testing between a source resource and a destination resource in your VPCs. This tool helps you verify that your network configuration matches your intended connectivity.
   - Network Access Analyzer: Network Access Analyzer helps you understand network access to your resources. You can use it to specify your network access requirements and identify potential network paths that do not meet your specified requirements. By optimizing your corresponding network configuration, you can understand and verify the state of your network and demonstrate whether your network on AWS meets your compliance requirements.
   - Amazon CloudWatch: Use Amazon CloudWatch and turn on the appropriate metrics for network options. Make sure to choose the right network metric for your workload. For example, you can turn on metrics for VPC Network Address Usage, VPC NAT Gateway, AWS Transit Gateway, VPN tunnel, AWS Network Firewall, Elastic Load Balancing, and AWS Direct Connect. Continually monitoring metrics is a good practice to observe and understand your network status and usage, which helps you optimize network configuration based on your observations.
   - AWS Network Manager: Using AWS Network Manager, you can monitor the real-time and historical performance of the AWS Global Network for operational and planning purposes. Network Manager provides aggregate network latency between AWS Regions and Availability Zones and within each Availability Zone, allowing you to better understand how your application performance relates to the performance of the underlying AWS network.
   - Amazon CloudWatch RUM: Use Amazon CloudWatch RUM to collect metrics that give you the insights to help you identify, understand, and improve user experience.
2. Identify top talkers and application traffic patterns using VPC and AWS Transit Gateway Flow Logs.
3. Assess and optimize your current network architecture, including VPCs, subnets, and routing. As an example, you can evaluate how VPC peering or AWS Transit Gateway can help you improve the networking in your architecture.
4. Assess the routing paths in your network to verify that the shortest path between destinations is always used. Network Access Analyzer can help you do this.
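As one concrete example for step 1, the boto3 sketch below turns on VPC Flow Logs for a single VPC and delivers them to CloudWatch Logs. The VPC ID, log group name, and IAM role ARN are placeholders, and the sketch assumes the role already allows flow log delivery to CloudWatch Logs.

```python
# Sketch: enable VPC Flow Logs delivered to CloudWatch Logs for one VPC.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],        # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",                            # capture accepted and rejected traffic
    LogDestinationType="cloud-watch-logs",
    LogGroupName="/network/vpc-flow-logs",        # placeholder log group
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/vpc-flow-logs-role",  # placeholder role
)
print(response["FlowLogIds"])
```

Once logs are flowing, the top-talker analysis in step 2 can be done with CloudWatch Logs Insights or Amazon Athena queries over the captured records.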

๐Ÿ’ผ PERF05-BP01 Establish key performance indicators (KPIs) to measure workload health and performance

Identify the KPIs that quantitatively and qualitatively measure workload performance. KPIs help you measure the health and performance of a workload related to a business goal.

**Common anti-patterns**

- You only monitor system-level metrics to gain insight into your workload and don't understand the business impacts of those metrics.
- You assume that your KPIs are already being published and shared as standard metric data.
- You do not define a quantitative, measurable KPI.
- You do not align KPIs with business goals or strategies.

**Benefits of establishing this best practice** Identifying specific KPIs that represent workload health and performance helps align teams on their priorities and define successful business outcomes. Sharing those metrics with all departments provides visibility and alignment on thresholds, expectations, and business impact.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

KPIs allow business and engineering teams to align on the measurement of goals and strategies and how these factors combine to produce business outcomes. For example, a website workload might use page load time as an indication of overall performance. This metric would be one of multiple data points that measures user experience. In addition to identifying the page load time thresholds, you should document the expected outcome or business risk if ideal performance is not met. A long page load time affects your end users directly, decreases their user experience rating, and can lead to a loss of customers.

When you define your KPI thresholds, combine both industry benchmarks and your end user expectations. For example, if the current industry benchmark is a webpage loading within a two-second time period, but your end users expect a webpage to load within a one-second time period, then you should take both of these data points into consideration when establishing the KPI.

Your team must evaluate your workload KPIs using real-time granular data and historical data for reference and create dashboards that perform metric math on your KPI data to derive operational and utilization insights. KPIs should be documented and include thresholds that support business goals and strategies, and should be mapped to metrics being monitored. KPIs should be revisited when business goals, strategies, or end user requirements change.

### Implementation steps

1. Identify stakeholders: Identify and document key business stakeholders, including development and operations teams.
2. Define objectives: Work with these stakeholders to define and document the objectives of your workload. Consider the critical performance aspects of your workload, such as throughput, response time, and cost, as well as business goals, such as user satisfaction.
3. Review industry best practices: Review industry best practices to identify relevant KPIs aligned with your workload objectives.
4. Identify metrics: Identify metrics that are aligned with your workload objectives and can help you measure performance and business goals. Establish KPIs based on these metrics. Example metrics are measurements like average response time or number of concurrent users.
5. Define and document KPIs: Use industry best practices and your workload objectives to set targets for your workload KPIs. Use this information to set KPI thresholds for severity or alarm level. Identify and document the risk and impact if a KPI is not met.
6. Implement monitoring: Use monitoring tools such as Amazon CloudWatch or AWS Config to collect metrics and measure KPIs (see the sketch after these steps).
7. Visually communicate KPIs: Use dashboard tools like Amazon QuickSight to visualize and communicate KPIs with stakeholders.
8. Analyze and optimize: Regularly review and analyze KPIs to identify areas of your workload that need to be improved. Work with stakeholders to implement these improvements.
9. Revisit and refine: Regularly review metrics and KPIs to assess their effectiveness, especially when business goals or workload performance change.
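To make the threshold and monitoring steps concrete, here is a hedged sketch that alarms when the p95 of an ALB's TargetResponseTime exceeds a documented one-second KPI threshold for three consecutive periods. The load balancer dimension value, threshold, and SNS topic are placeholder assumptions.

```python
# Sketch: alarm on a latency KPI (p95 TargetResponseTime > 1 s for 3 consecutive 5-minute periods).
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="kpi-p95-target-response-time",
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/abc123"}],  # placeholder dimension
    ExtendedStatistic="p95",
    Period=300,
    EvaluationPeriods=3,
    Threshold=1.0,                      # seconds; set from your documented KPI threshold
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:kpi-alerts"],       # placeholder SNS topic
)
```

The same pattern works for custom business KPIs published with put_metric_data; the point is that the alarm encodes the documented threshold rather than leaving it as tribal knowledge.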

๐Ÿ’ผ PERF05-BP02 Use monitoring solutions to understand the areas where performance is most critical

Understand and identify areas where increasing the performance of your workload will have a positive impact on efficiency or customer experience. For example, a website that has a large amount of customer interaction can benefit from using edge services to move content delivery closer to customers.

**Common anti-patterns**

- You assume that standard compute metrics such as CPU utilization or memory pressure are enough to catch performance issues.
- You only use the default metrics recorded by your selected monitoring software.
- You only review metrics when there is an issue.

**Benefits of establishing this best practice** Understanding critical areas of performance helps workload owners monitor KPIs and prioritize high-impact improvements.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Set up end-to-end tracing to identify traffic patterns, latency, and critical performance areas. Monitor your data access patterns for slow queries or poorly fragmented and partitioned data. Identify the constrained areas of the workload using load testing or monitoring. Increase performance efficiency by understanding your architecture, traffic patterns, and data access patterns, and identify your latency and processing times. Identify the potential bottlenecks that might affect the customer experience as the workload grows. After investigating these areas, look at which solution you could deploy to remove those performance concerns.

### Implementation steps

1. Set up end-to-end monitoring to capture all workload components and metrics. Here are examples of monitoring solutions on AWS.

   | Service | Where to use |
   |---------|--------------|
   | Amazon CloudWatch Real-User Monitoring (RUM) | To capture application performance metrics from real user client-side and frontend sessions. |
   | AWS X-Ray | To trace traffic through the application layers and identify latency between components and dependencies. Use X-Ray service maps to see relationships and latency between workload components. |
   | Amazon Relational Database Service Performance Insights | To view database performance metrics and identify performance improvements. |
   | Amazon RDS Enhanced Monitoring | To view database OS performance metrics. |
   | Amazon DevOps Guru | To detect abnormal operating patterns so you can identify operational issues before they impact your customers. |

2. Perform tests to generate metrics and identify traffic patterns, bottlenecks, and critical performance areas. Here are some examples of how to perform testing:
   - Set up CloudWatch Synthetics canaries to mimic browser-based user activities programmatically using Linux cron jobs or rate expressions to generate consistent metrics over time.
   - Use the AWS Distributed Load Testing solution to generate peak traffic or test the workload at the expected growth rate.
3. Evaluate the metrics and telemetry to identify your critical performance areas. Review these areas with your team to discuss monitoring and solutions to avoid bottlenecks.
4. Experiment with performance improvements and measure those changes with data. As an example, you can use CloudWatch Evidently to test new improvements and performance impacts to your workload.
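As a small illustration of the end-to-end tracing guidance above, the sketch below uses the AWS X-Ray SDK for Python to auto-instrument supported libraries and wrap a hypothetical hot path in a named subsegment. The service name, function, and DynamoDB table are assumptions, and a parent segment must already exist (for example, created by the SDK's web-framework middleware or the Lambda runtime).

```python
# Sketch: instrument a Python service with the AWS X-Ray SDK to surface per-component latency.
# Assumes a parent segment is created by middleware (e.g. Flask/Django) or the Lambda runtime.
import boto3
from boto3.dynamodb.conditions import Key
from aws_xray_sdk.core import xray_recorder, patch_all

xray_recorder.configure(service="orders-api")   # hypothetical service name
patch_all()                                      # auto-instrument supported libraries such as boto3 and requests

@xray_recorder.capture("load_order_history")     # record a subsegment around a suspected hot path
def load_order_history(customer_id: str):
    table = boto3.resource("dynamodb").Table("orders")   # hypothetical table name
    return table.query(KeyConditionExpression=Key("customer_id").eq(customer_id))["Items"]
```

With traces flowing, the X-Ray service map highlights which downstream call dominates latency, which is exactly the "critical performance area" this best practice asks you to identify.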

๐Ÿ’ผ PERF05-BP03 Define a process to improve workload performance

Define a process to evaluate new services, design patterns, resource types, and configurations as they become available. For example, run existing performance tests on new instance offerings to determine their potential to improve your workload.

**Common anti-patterns**

- You assume your current architecture is static and won't be updated over time.
- You introduce architecture changes over time with no metric justification.

**Benefits of establishing this best practice** By defining your process for making architectural changes, you can use gathered data to influence your workload design over time.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Your workload's performance has a few key constraints. Identify and document these constraints so that you know what kinds of innovation might improve the performance of your workload, and use this information when learning about new services or technology as they become available to identify ways to alleviate constraints or bottlenecks.

### Implementation steps

1. Identify KPIs: Identify your workload performance KPIs as outlined in PERF05-BP01 Establish key performance indicators (KPIs) to measure workload health and performance to baseline your workload.
2. Implement monitoring: Use AWS observability tools to collect performance metrics and measure KPIs.
3. Conduct analysis: Conduct in-depth analysis to identify the areas (like configuration and application code) in your workload that are underperforming, as outlined in PERF05-BP02 Use monitoring solutions to understand the areas where performance is most critical. Use your analysis and performance tools to identify the performance improvement strategies.
4. Validate improvements: Use sandbox or pre-production environments to validate the effectiveness of improvement strategies.
5. Implement changes: Implement the changes in production and continually monitor the workload's performance. Document the improvements, and communicate the changes to stakeholders.
6. Revisit and refine: Regularly review your performance improvement process to identify areas for enhancement.

๐Ÿ’ผ PERF05-BP04 Load test your workload

Load test your workload to verify it can handle production load and to identify any performance bottlenecks.

**Common anti-patterns**

- You load test individual parts of your workload but not your entire workload.
- You load test on infrastructure that is not the same as your production environment.
- You only load test up to your expected load and not beyond, to help foresee where you may have future problems.
- You perform load testing without consulting the Amazon EC2 Testing Policy and submitting a Simulated Event Submissions Form. This results in your test failing to run, as it looks like a denial-of-service event.

**Benefits of establishing this best practice** Measuring your performance under a load test will show you where you will be impacted as load increases. This can provide you with the capability of anticipating needed changes before they impact your workload.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Load testing in the cloud is a process to measure the performance of a cloud workload under realistic conditions with expected user load. This process involves provisioning a production-like cloud environment, using load testing tools to generate load, and analyzing metrics to assess the ability of your workload to handle a realistic load. Load tests must be run using synthetic or sanitized versions of production data (remove sensitive or identifying information). Automatically carry out load tests as part of your delivery pipeline, and compare the results against pre-defined KPIs and thresholds. This process helps you continue to achieve required performance.

### Implementation steps

1. Define your testing objectives: Identify the performance aspects of your workload that you want to evaluate, such as throughput and response time.
2. Select a testing tool: Choose and configure the load testing tool that suits your workload (one possible tool is sketched after these steps).
3. Set up your environment: Set up the test environment based on your production environment. You can use AWS services to run production-scale environments to test your architecture.
4. Implement monitoring: Use monitoring tools such as Amazon CloudWatch to collect metrics across the resources in your architecture. You can also collect and publish custom metrics.
5. Define scenarios: Define the load testing scenarios and parameters (like test duration and number of users).
6. Conduct load testing: Perform test scenarios at scale. Take advantage of the AWS Cloud to test your workload to discover where it fails to scale, or if it scales in a non-linear way. For example, use Spot Instances to generate loads at low cost and discover bottlenecks before they are experienced in production.
7. Analyze test results: Analyze the results to identify performance bottlenecks and areas for improvement.
8. Document and share findings: Document and report on findings and recommendations. Share this information with stakeholders to help them make informed decisions regarding performance optimization strategies.
9. Continually iterate: Load testing should be performed at a regular cadence, especially after a system change or update.
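Step 2 leaves the tool choice open. As one possible choice (not prescribed by this guide), the sketch below defines a minimal Locust scenario that exercises two hypothetical endpoints with weighted traffic; point it at a production-like test environment, never at production, and follow the EC2 Testing Policy noted above.

```python
# Sketch: a minimal Locust load test.
# Run with: locust -f loadtest.py --host https://test.example.com   (placeholder host)
from locust import HttpUser, task, between


class StorefrontUser(HttpUser):
    wait_time = between(1, 3)   # each simulated user pauses 1-3 s between tasks

    @task(3)
    def browse_catalog(self):
        self.client.get("/products")   # hypothetical read-heavy endpoint

    @task(1)
    def view_cart(self):
        self.client.get("/cart")       # hypothetical lighter endpoint
```

Ramp the user count past the expected peak so the test also reveals behavior beyond normal load, which addresses the third anti-pattern listed above.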

๐Ÿ’ผ PERF05-BP05 Use automation to proactively remediate performance-related issues

Use key performance indicators (KPIs), combined with monitoring and alerting systems, to proactively address performance-related issues.

**Common anti-patterns**

- You only allow operations staff the ability to make operational changes to the workload.
- You let all alarms filter through to the operations team with no proactive remediation.

**Benefits of establishing this best practice** Proactive remediation of alarm actions allows support staff to concentrate on those items that are not automatically actionable. This helps operations staff handle all alarms without being overwhelmed and instead focus only on critical alarms.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Use alarms to trigger automated actions to remediate issues where possible. Escalate the alarm to those able to respond if automated response is not possible. For example, you may have a system that can predict expected key performance indicator (KPI) values and alarm when they breach certain thresholds, or a tool that can automatically halt or roll back deployments if KPIs are outside of expected values. Implement processes that provide visibility into performance as your workload is running. Build monitoring dashboards and establish baseline norms for performance expectations to determine if the workload is performing optimally.

### Implementation steps

1. Identify the remediation workflow: Identify and understand the performance issues that can be remediated automatically. Use AWS monitoring solutions such as Amazon CloudWatch or AWS X-Ray to help you better understand the root cause of the issue.
2. Define the automation process: Create a step-by-step remediation process that can be used to automatically fix the issue.
3. Configure the initiation event: Configure the event to automatically initiate the remediation process. For example, you can define a trigger to automatically restart an instance when it reaches a certain threshold of CPU utilization.
4. Automate the remediation: Use AWS services and technologies to automate the remediation process. For example, AWS Systems Manager Automation provides a secure and scalable way to automate the remediation process (a sketch follows these steps). Make sure to use self-healing logic to revert changes if they do not successfully resolve the issue.
5. Test the workflow: Test the automated remediation process in a pre-production environment.
6. Implement the workflow: Implement the automated remediation in the production environment.
7. Develop a playbook: Develop and document a playbook that outlines the steps for the remediation plan, including the initiation events, remediation logic, and actions taken. Make sure to train stakeholders to help them effectively respond to automated remediation events.
8. Review and refine: Regularly assess the effectiveness of the automated remediation workflow. Adjust initiation events and remediation logic if necessary.
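As a hedged sketch of steps 3 and 4, the Lambda handler below could be wired to an EventBridge rule for CloudWatch alarm state changes and start the AWS-managed AWS-RestartEC2Instance Systems Manager Automation runbook. The event-parsing path assumes the alarm's metric carries an InstanceId dimension, which depends on how you define the alarm, so treat the structure as an assumption to verify against your own alarm events.

```python
# Sketch: Lambda handler that remediates a CPU alarm by starting an SSM Automation runbook.
# Assumed trigger: EventBridge rule for "CloudWatch Alarm State Change" events whose
# alarm metric includes an InstanceId dimension.
import boto3

ssm = boto3.client("ssm")


def handler(event, context):
    detail = event["detail"]
    if detail["state"]["value"] != "ALARM":
        return None  # only act on transitions into ALARM

    metric = detail["configuration"]["metrics"][0]["metricStat"]["metric"]
    instance_id = metric["dimensions"]["InstanceId"]   # assumption: alarm is per-instance

    execution = ssm.start_automation_execution(
        DocumentName="AWS-RestartEC2Instance",          # AWS-managed runbook
        Parameters={"InstanceId": [instance_id]},
    )
    return execution["AutomationExecutionId"]
```

Anything the handler cannot resolve automatically should still raise a notification for operators, in line with the escalation guidance above.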

๐Ÿ’ผ PERF05-BP06 Keep your workload and services up-to-date

Stay up to date on new cloud services and features to adopt efficient features, remove issues, and improve the overall performance efficiency of your workload.

**Common anti-patterns**

- You assume your current architecture is static and will not be updated over time.
- You do not have any systems or a regular cadence to evaluate if updated software and packages are compatible with your workload.

**Benefits of establishing this best practice** By establishing a process to stay up to date on new services and offerings, you can adopt new features and capabilities, resolve issues, and improve workload performance.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Evaluate ways to improve performance as new services, design patterns, and product features become available. Determine which of these could improve performance or increase the efficiency of the workload through evaluation, internal discussion, or external analysis. Define a process to evaluate updates, new features, and services relevant to your workload. For example, build a proof of concept that uses new technologies or consult with an internal group. When trying new ideas or services, run performance tests to measure the impact that they have on the performance of the workload.

### Implementation steps

1. Inventory your workload: Inventory your workload software and architecture and identify components that need to be updated.
2. Identify update sources: Identify news and update sources related to your workload components. As an example, you can subscribe to the What's New at AWS blog for the products that match your workload components. You can subscribe to the RSS feed or manage your email subscriptions.
3. Define an update schedule: Define a schedule to evaluate new services and features for your workload.
   - You can use AWS Systems Manager Inventory to collect operating system (OS), application, and instance metadata from your Amazon EC2 instances and quickly understand which instances are running the software and configurations required by your software policy and which instances need to be updated.
4. Assess the new update: Understand how to update the components of your workload. Take advantage of agility in the cloud to quickly test how new features can improve your workload to gain performance efficiency.
5. Use automation: Use automation for the update process to reduce the level of effort to deploy new features and limit errors caused by manual processes.
   - You can use CI/CD to automatically update AMIs, container images, and other artifacts related to your cloud application.
   - You can use tools such as AWS Systems Manager Patch Manager to automate the process of system updates, and schedule the activity using AWS Systems Manager Maintenance Windows (a sketch follows these steps).
6. Document the process: Document your process for evaluating updates and new services. Provide your owners the time and space needed to research, test, experiment, and validate updates and new services. Refer back to the documented business requirements and KPIs to help prioritize which updates will make a positive business impact.
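For step 5, here is one hedged sketch of scheduled patching: create a Systems Manager maintenance window, register instances by tag, and register the AWS-RunPatchBaseline run command task. The schedule, tag key, and sizing values are placeholder assumptions rather than recommended settings.

```python
# Sketch: schedule automated patching with Systems Manager Maintenance Windows and Patch Manager.
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

window = ssm.create_maintenance_window(
    Name="weekly-patching",
    Schedule="cron(0 4 ? * SUN *)",    # every Sunday 04:00 UTC (placeholder schedule)
    Duration=3,                        # hours the window stays open
    Cutoff=1,                          # stop starting new tasks 1 hour before the end
    AllowUnassociatedTargets=False,
)
window_id = window["WindowId"]

target = ssm.register_target_with_maintenance_window(
    WindowId=window_id,
    ResourceType="INSTANCE",
    Targets=[{"Key": "tag:PatchGroup", "Values": ["web"]}],   # placeholder tag key/value
)

ssm.register_task_with_maintenance_window(
    WindowId=window_id,
    Targets=[{"Key": "WindowTargetIds", "Values": [target["WindowTargetId"]]}],
    TaskArn="AWS-RunPatchBaseline",    # AWS-managed patching document
    TaskType="RUN_COMMAND",
    MaxConcurrency="2",
    MaxErrors="1",
    TaskInvocationParameters={"RunCommand": {"Parameters": {"Operation": ["Install"]}}},
)
```

Running the same definition through CI/CD or infrastructure as code keeps the schedule reviewable and repeatable, which is the point of the automation step.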

๐Ÿ’ผ PERF05-BP07 Review metrics at regular intervals

As part of routine maintenance or in response to events or incidents, review which metrics are collected. Use these reviews to identify which metrics were essential in addressing issues and which additional metrics, if they were being tracked, could help identify, address, or prevent issues.

**Common anti-patterns**

- You allow metrics to stay in an alarm state for an extended period of time.
- You create alarms that are not actionable by an automation system.

**Benefits of establishing this best practice** Continually review metrics that are being collected to verify that they properly identify, address, or prevent issues. Metrics can also become stale if you let them stay in an alarm state for an extended period of time.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Constantly improve metric collection and monitoring. As part of responding to incidents or events, evaluate which metrics were helpful in addressing the issue and which metrics could have helped but are not currently being tracked. Use this method to improve the quality of metrics you collect so that you can prevent, or more quickly resolve, future incidents.

### Implementation steps

1. Define metrics: Define critical performance metrics to monitor that are aligned with your workload objectives, including metrics such as response time and resource utilization.
2. Establish baselines: Set a baseline and desirable value for each metric. The baseline should provide reference points to identify deviations or anomalies.
3. Set up a cadence: Set a cadence (like weekly or monthly) to review critical metrics.
4. Identify performance issues: During each review, assess trends and deviations from the baseline values. Look for any performance bottlenecks or anomalies. For identified issues, conduct an in-depth root cause analysis to understand the main reason behind the issue.
5. Identify corrective actions: Use your analysis to identify corrective actions. This may include parameter tuning, fixing bugs, and scaling resources.
6. Document findings: Document your findings, including identified issues, root causes, and corrective actions.
7. Iterate and improve: Continually assess and improve the metrics review process. Use the lessons learned from previous reviews to enhance the process over time.
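To support the review cadence above, the sketch below lists alarms whose state has not changed for more than seven days, one signal that a metric, threshold, or alarm has gone stale and needs attention. The seven-day cutoff and Region are placeholder assumptions.

```python
# Sketch: flag alarms that have been stuck in ALARM state for more than 7 days.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cutoff = datetime.now(timezone.utc) - timedelta(days=7)

paginator = cloudwatch.get_paginator("describe_alarms")
for page in paginator.paginate(StateValue="ALARM"):
    for alarm in page["MetricAlarms"]:
        if alarm["StateUpdatedTimestamp"] < cutoff:
            print(f"{alarm['AlarmName']} has been in ALARM since {alarm['StateUpdatedTimestamp']}")
```

Feeding this list into the regular metrics review makes the first anti-pattern above visible instead of letting long-standing alarms become background noise.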

๐Ÿ’ผ Permissions management

Manage permissions to control access to human and machine identities that require access to AWS and your workloads. Permissions allow you to control who can access what, and under what conditions. By setting permissions to specific human and machine identities, you grant them access to specific service actions on specific resources. Additionally, you can specify conditions that must be true for access to be granted.
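As a small hedged illustration of condition-based permissions, the policy below grants read access to one S3 bucket only when the request arrives through a specific VPC endpoint. The bucket name, VPC endpoint ID, and policy name are placeholders; it is created here with boto3, though the same JSON could be attached through the console or infrastructure as code.

```python
# Sketch: an identity-based policy that allows S3 reads only through a specific VPC endpoint.
# Bucket name, VPC endpoint ID, and policy name are placeholders.
import json

import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-reports-bucket",
                "arn:aws:s3:::example-reports-bucket/*",
            ],
            # Condition: access is granted only via this VPC endpoint.
            "Condition": {"StringEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}},
        }
    ],
}

iam.create_policy(
    PolicyName="reports-read-from-vpce-only",
    PolicyDocument=json.dumps(policy_document),
)
```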

๐Ÿ’ผ PI1.1-2 Defines Data Necessary to Support a Product or Service

When data is provided as part of a service or product or as part of a reporting obligation related to a product or service: 1. The definition and purpose of the data is available to the users of the data. 2. The definition of the data includes the following information: a. The population of events or instances included in the set of data b. The nature of each element (for example, field) of the set of data (that is, the event or instance to which the data element relates, for example, transaction price of a sale of XYZ Corporation stock for the last trade in that stock on a given day) c. The sources of the data within the set d. The units of measurement of data elements (for example, fields) e. The accuracy, correctness, or precision of measurement f. The uncertainty or confidence interval inherent in each data element and in the population of those elements g. The time periods over which the set of data was measured or the period of time during which the events the data relates to occurred h. In addition to the date or period of time, the factors that determined the inclusion and exclusion of items in the data elements and population 3. The definition of the data is complete and accurate. 4. The description of the data identifies any information that is necessary to understand each data element and the population in a manner consistent with its definition and intended purpose (metadata) that has not been included within the data.

๐Ÿ’ผ PI1.4-1 Protects Output

Output is protected when stored or delivered, or both, to prevent theft, destruction, corruption, or deterioration that would prevent output from meeting specifications.

๐Ÿ’ผ PL-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the planning policy and the associated planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the planning policy and procedures; and c. Review and update the current planning: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ PL-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the planning policy and the associated planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the planning policy and procedures; and c. Review and update the current planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PL-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the planning policy and the associated planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the planning policy and procedures; and c. Review and update the current planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PL-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] planning policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the planning policy and the associated planning controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the planning policy and procedures; and c. Review and update the current planning: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PL-1 SECURITY PLANNING POLICY AND PROCEDURES

The organization: PL-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: PL-1a.1. A security planning policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and PL-1a.2. Procedures to facilitate the implementation of the security planning policy and associated security planning controls; and PL-1b. Reviews and updates the current: PL-1b.1. Security planning policy [Assignment: organization-defined frequency]; and PL-1b.2. Security planning procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ PL-2 System Security and Privacy Plans

a. Develop security and privacy plans for the system that: 1. Are consistent with the organization's enterprise architecture; 2. Explicitly define the constituent system components; 3. Describe the operational context of the system in terms of mission and business processes; 4. Identify the individuals that fulfill system roles and responsibilities; 5. Identify the information types processed, stored, and transmitted by the system; 6. Provide the security categorization of the system, including supporting rationale; 7. Describe any specific threats to the system that are of concern to the organization; 8. Provide the results of a privacy risk assessment for systems processing personally identifiable information; 9. Describe the operational environment for the system and any dependencies on or connections to other systems or system components; 10. Provide an overview of the security and privacy requirements for the system; 11. Identify any relevant control baselines or overlays, if applicable; 12. Describe the controls in place or planned for meeting the security and privacy requirements, including a rationale for any tailoring decisions; 13. Include risk determinations for security and privacy architecture and design decisions; 14. Include security- and privacy-related activities affecting the system that require planning and coordination with [Assignment: organization-defined individuals or groups]; and 15. Are reviewed and approved by the authorizing official or designated representative prior to plan implementation. b. Distribute copies of the plans and communicate subsequent changes to the plans to [Assignment: organization-defined personnel or roles]; c. Review the plans [Assignment: organization-defined frequency]; d. Update the plans to address changes to the system and environment of operation or problems identified during plan implementation or control assessments; and e. Protect the plans from unauthorized disclosure and modification.

๐Ÿ’ผ PL-2 System Security and Privacy Plans (L)(M)(H)

a. Develop security and privacy plans for the system that: 1. Are consistent with the organization's enterprise architecture; 2. Explicitly define the constituent system components; 3. Describe the operational context of the system in terms of mission and business processes; 4. Identify the individuals that fulfill system roles and responsibilities; 5. Identify the information types processed, stored, and transmitted by the system; 6. Provide the security categorization of the system, including supporting rationale; 7. Describe any specific threats to the system that are of concern to the organization; 8. Provide the results of a privacy risk assessment for systems processing personally identifiable information; 9. Describe the operational environment for the system and any dependencies on or connections to other systems or system components; 10. Provide an overview of the security and privacy requirements for the system; 11. Identify any relevant control baselines or overlays, if applicable; 12. Describe the controls in place or planned for meeting the security and privacy requirements, including a rationale for any tailoring decisions; 13. Include risk determinations for security and privacy architecture and design decisions; 14. Include security- and privacy-related activities affecting the system that require planning and coordination with [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]; and 15. Are reviewed and approved by the authorizing official or designated representative prior to plan implementation. b. Distribute copies of the plans and communicate subsequent changes to the plans to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role]; c. Review the plans [FedRAMP Assignment: at least annually]; d. Update the plans to address changes to the system and environment of operation or problems identified during plan implementation or control assessments; and e. Protect the plans from unauthorized disclosure and modification.

๐Ÿ’ผ PL-2 System Security and Privacy Plans (L)(M)(H)

a. Develop security and privacy plans for the system that: 1. Are consistent with the organization's enterprise architecture; 2. Explicitly define the constituent system components; 3. Describe the operational context of the system in terms of mission and business processes; 4. Identify the individuals that fulfill system roles and responsibilities; 5. Identify the information types processed, stored, and transmitted by the system; 6. Provide the security categorization of the system, including supporting rationale; 7. Describe any specific threats to the system that are of concern to the organization; 8. Provide the results of a privacy risk assessment for systems processing personally identifiable information; 9. Describe the operational environment for the system and any dependencies on or connections to other systems or system components; 10. Provide an overview of the security and privacy requirements for the system; 11. Identify any relevant control baselines or overlays, if applicable; 12. Describe the controls in place or planned for meeting the security and privacy requirements, including a rationale for any tailoring decisions; 13. Include risk determinations for security and privacy architecture and design decisions; 14. Include security- and privacy-related activities affecting the system that require planning and coordination with [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]; and 15. Are reviewed and approved by the authorizing official or designated representative prior to plan implementation. b. Distribute copies of the plans and communicate subsequent changes to the plans to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role]; c. Review the plans [FedRAMP Assignment: at least annually]; d. Update the plans to address changes to the system and environment of operation or problems identified during plan implementation or control assessments; and e. Protect the plans from unauthorized disclosure and modification.

๐Ÿ’ผ PL-2 System Security and Privacy Plans (L)(M)(H)

a. Develop security and privacy plans for the system that: 1. Are consistent with the organization's enterprise architecture; 2. Explicitly define the constituent system components; 3. Describe the operational context of the system in terms of mission and business processes; 4. Identify the individuals that fulfill system roles and responsibilities; 5. Identify the information types processed, stored, and transmitted by the system; 6. Provide the security categorization of the system, including supporting rationale; 7. Describe any specific threats to the system that are of concern to the organization; 8. Provide the results of a privacy risk assessment for systems processing personally identifiable information; 9. Describe the operational environment for the system and any dependencies on or connections to other systems or system components; 10. Provide an overview of the security and privacy requirements for the system; 11. Identify any relevant control baselines or overlays, if applicable; 12. Describe the controls in place or planned for meeting the security and privacy requirements, including a rationale for any tailoring decisions; 13. Include risk determinations for security and privacy architecture and design decisions; 14. Include security- and privacy-related activities affecting the system that require planning and coordination with [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]; and 15. Are reviewed and approved by the authorizing official or designated representative prior to plan implementation. b. Distribute copies of the plans and communicate subsequent changes to the plans to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role]; c. Review the plans [FedRAMP Assignment: at least annually]; d. Update the plans to address changes to the system and environment of operation or problems identified during plan implementation or control assessments; and e. Protect the plans from unauthorized disclosure and modification.

๐Ÿ’ผ PL-2 SYSTEM SECURITY PLAN

The organization: PL-2a. Develops a security plan for the information system that: PL-2a.1. Is consistent with the organization's enterprise architecture; PL-2a.2. Explicitly defines the authorization boundary for the system; PL-2a.3. Describes the operational context of the information system in terms of missions and business processes; PL-2a.4. Provides the security categorization of the information system including supporting rationale; PL-2a.5. Describes the operational environment for the information system and relationships with or connections to other information systems; PL-2a.6. Provides an overview of the security requirements for the system; PL-2a.7. Identifies any relevant overlays, if applicable; PL-2a.8. Describes the security controls in place or planned for meeting those requirements including a rationale for the tailoring decisions; and PL-2a.9. Is reviewed and approved by the authorizing official or designated representative prior to plan implementation; PL-2b. Distributes copies of the security plan and communicates subsequent changes to the plan to [Assignment: organization-defined personnel or roles]; PL-2c. Reviews the security plan for the information system [Assignment: organization-defined frequency]; PL-2d. Updates the plan to address changes to the information system/environment of operation or problems identified during plan implementation or security control assessments; and PL-2e. Protects the security plan from unauthorized disclosure and modification.

๐Ÿ’ผ PL-4 Rules of Behavior

a. Establish and provide to individuals requiring access to the system, the rules that describe their responsibilities and expected behavior for information and system usage, security, and privacy; b. Receive a documented acknowledgment from such individuals, indicating that they have read, understand, and agree to abide by the rules of behavior, before authorizing access to information and the system; c. Review and update the rules of behavior [Assignment: organization-defined frequency]; and d. Require individuals who have acknowledged a previous version of the rules of behavior to read and re-acknowledge [Selection (one or more): [Assignment: organization-defined frequency]; when the rules are revised or updated].

๐Ÿ’ผ PL-4 RULES OF BEHAVIOR

The organization: PL-4a. Establishes and makes readily available to individuals requiring access to the information system, the rules that describe their responsibilities and expected behavior with regard to information and information system usage; PL-4b. Receives a signed acknowledgment from such individuals, indicating that they have read, understand, and agree to abide by the rules of behavior, before authorizing access to information and the information system; PL-4c. Reviews and updates the rules of behavior [Assignment: organization-defined frequency]; and PL-4d. Requires individuals who have signed a previous version of the rules of behavior to read and re-sign when the rules of behavior are revised/updated.

๐Ÿ’ผ PL-4 Rules of Behavior (L)(M)(H)

a. Establish and provide to individuals requiring access to the system, the rules that describe their responsibilities and expected behavior for information and system usage, security, and privacy; b. Receive a documented acknowledgment from such individuals, indicating that they have read, understand, and agree to abide by the rules of behavior, before authorizing access to information and the system; c. Review and update the rules of behavior [FedRAMP Assignment: at least every three (3) years]; and d. Require individuals who have acknowledged a previous version of the rules of behavior to read and re-acknowledge [FedRAMP Assignment: at least annually and when the rules are revised or changed].

๐Ÿ’ผ PL-4 Rules of Behavior (L)(M)(H)

a. Establish and provide to individuals requiring access to the system, the rules that describe their responsibilities and expected behavior for information and system usage, security, and privacy; b. Receive a documented acknowledgment from such individuals, indicating that they have read, understand, and agree to abide by the rules of behavior, before authorizing access to information and the system; c. Review and update the rules of behavior [FedRAMP Assignment: at least every three (3) years]; and d. Require individuals who have acknowledged a previous version of the rules of behavior to read and re-acknowledge [FedRAMP Assignment: at least annually and when the rules are revised or changed].

๐Ÿ’ผ PL-4 Rules of Behavior (L)(M)(H)

a. Establish and provide to individuals requiring access to the system, the rules that describe their responsibilities and expected behavior for information and system usage, security, and privacy; b. Receive a documented acknowledgment from such individuals, indicating that they have read, understand, and agree to abide by the rules of behavior, before authorizing access to information and the system; c. Review and update the rules of behavior [FedRAMP Assignment: at least every three (3) years]; and d. Require individuals who have acknowledged a previous version of the rules of behavior to read and re-acknowledge [FedRAMP Assignment: at least annually and when the rules are revised or changed].

๐Ÿ’ผ PL-4(1) Social Media and External Site/Application Usage Restrictions (L)(M)(H)

Include in the rules of behavior, restrictions on: (a) Use of social media, social networking sites, and external sites/applications; (b) Posting organizational information on public websites; and (c) Use of organization-provided identifiers (e.g., email addresses) and authentication secrets (e.g., passwords) for creating accounts on external sites/applications.

๐Ÿ’ผ PL-4(1) Social Media and External Site/Application Usage Restrictions (L)(M)(H)

Include in the rules of behavior, restrictions on: (a) Use of social media, social networking sites, and external sites/applications; (b) Posting organizational information on public websites; and (c) Use of organization-provided identifiers (e.g., email addresses) and authentication secrets (e.g., passwords) for creating accounts on external sites/applications.

๐Ÿ’ผ PL-4(1) Social Media and External Site/Application Usage Restrictions (L)(M)(H)

Include in the rules of behavior, restrictions on: (a) Use of social media, social networking sites, and external sites/applications; (b) Posting organizational information on public websites; and (c) Use of organization-provided identifiers (e.g., email addresses) and authentication secrets (e.g., passwords) for creating accounts on external sites/applications.

๐Ÿ’ผ PL-7 Concept of Operations

a. Develop a Concept of Operations (CONOPS) for the system describing how the organization intends to operate the system from the perspective of information security and privacy; and b. Review and update the CONOPS [Assignment: organization-defined frequency].

๐Ÿ’ผ PL-7 SECURITY CONCEPT OF OPERATIONS

The organization: PL-7a. Develops a security Concept of Operations (CONOPS) for the information system containing at a minimum, how the organization intends to operate the system from the perspective of information security; and PL-7b. Reviews and updates the CONOPS [Assignment: organization-defined frequency].

๐Ÿ’ผ PL-8 (1) DEFENSE-IN-DEPTH

The organization designs its security architecture using a defense-in-depth approach that: PL-8 (1)(a) Allocates [Assignment: organization-defined security safeguards] to [Assignment: organization-defined locations and architectural layers]; and PL-8 (1)(b) Ensures that the allocated security safeguards operate in a coordinated and mutually reinforcing manner.

๐Ÿ’ผ PL-8 (2) SUPPLIER DIVERSITY

The organization requires that [Assignment: organization-defined security safeguards] allocated to [Assignment: organization-defined locations and architectural layers] are obtained from different suppliers.

๐Ÿ’ผ PL-8 INFORMATION SECURITY ARCHITECTURE

The organization: PL-8a. Develops an information security architecture for the information system that: PL-8a.1. Describes the overall philosophy, requirements, and approach to be taken with regard to protecting the confidentiality, integrity, and availability of organizational information; PL-8a.2. Describes how the information security architecture is integrated into and supports the enterprise architecture; and PL-8a.3. Describes any information security assumptions about, and dependencies on, external services; PL-8b. Reviews and updates the information security architecture [Assignment: organization-defined frequency] to reflect updates in the enterprise architecture; and PL-8c. Ensures that planned information security architecture changes are reflected in the security plan, the security Concept of Operations (CONOPS), and organizational procurements/acquisitions.

๐Ÿ’ผ PL-8 Security and Privacy Architectures

a. Develop security and privacy architectures for the system that: 1. Describe the requirements and approach to be taken for protecting the confidentiality, integrity, and availability of organizational information; 2. Describe the requirements and approach to be taken for processing personally identifiable information to minimize privacy risk to individuals; 3. Describe how the architectures are integrated into and support the enterprise architecture; and 4. Describe any assumptions about, and dependencies on, external systems and services; b. Review and update the architectures [Assignment: organization-defined frequency] to reflect changes in the enterprise architecture; and c. Reflect planned architecture changes in security and privacy plans, Concept of Operations (CONOPS), criticality analysis, organizational procedures, and procurements and acquisitions.

๐Ÿ’ผ PL-8 Security and Privacy Architectures (L)(M)(H)

a. Develop security and privacy architectures for the system that: 1. Describe the requirements and approach to be taken for protecting the confidentiality, integrity, and availability of organizational information; 2. Describe the requirements and approach to be taken for processing personally identifiable information to minimize privacy risk to individuals; 3. Describe how the architectures are integrated into and support the enterprise architecture; and 4. Describe any assumptions about, and dependencies on, external systems and services; b. Review and update the architectures [FedRAMP Assignment: at least annually and when a significant change occurs] to reflect changes in the enterprise architecture; and c. Reflect planned architecture changes in security and privacy plans, Concept of Operations (CONOPS), criticality analysis, organizational procedures, and procurements and acquisitions. **PL-8 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.

๐Ÿ’ผ PL-8 Security and Privacy Architectures (L)(M)(H)

a. Develop security and privacy architectures for the system that: 1. Describe the requirements and approach to be taken for protecting the confidentiality, integrity, and availability of organizational information; 2. Describe the requirements and approach to be taken for processing personally identifiable information to minimize privacy risk to individuals; 3. Describe how the architectures are integrated into and support the enterprise architecture; and 4. Describe any assumptions about, and dependencies on, external systems and services; b. Review and update the architectures [FedRAMP Assignment: at least annually and when a significant change occurs] to reflect changes in the enterprise architecture; and c. Reflect planned architecture changes in security and privacy plans, Concept of Operations (CONOPS), criticality analysis, organizational procedures, and procurements and acquisitions. **PL-8 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.

๐Ÿ’ผ PL-8 Security and Privacy Architectures (L)(M)(H)

a. Develop security and privacy architectures for the system that: 1. Describe the requirements and approach to be taken for protecting the confidentiality, integrity, and availability of organizational information; 2. Describe the requirements and approach to be taken for processing personally identifiable information to minimize privacy risk to individuals; 3. Describe how the architectures are integrated into and support the enterprise architecture; and 4. Describe any assumptions about, and dependencies on, external systems and services; b. Review and update the architectures [FedRAMP Assignment: at least annually and when a significant change occurs] to reflect changes in the enterprise architecture; and c. Reflect planned architecture changes in security and privacy plans, Concept of Operations (CONOPS), criticality analysis, organizational procedures, and procurements and acquisitions. **PL-8 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F.

๐Ÿ’ผ PL-8(1) Security and Privacy Architectures | Defense in Depth

Design the security and privacy architectures for the system using a defense-in-depth approach that: (a) Allocates [Assignment: organization-defined controls] to [Assignment: organization-defined locations and architectural layers]; and (b) Ensures that the allocated controls operate in a coordinated and mutually reinforcing manner.

๐Ÿ’ผ Plan for data transfer

An advantage of the cloud is that networking is a managed service: you no longer need to manage and operate a fleet of switches, routers, and other associated network equipment. Networking resources in the cloud are consumed and paid for in the same way as CPU and storage; you only pay for what you use. Efficient use of networking resources is therefore required for cost optimization in the cloud.

๐Ÿ’ผ Plan for Disaster Recovery (DR)

Having backups and redundant workload components in place is the start of your DR strategy. Your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are your objectives for restoring the workload; set them based on business needs. Implement a strategy to meet these objectives, considering the locations and function of workload resources and data. The probability of disruption and the cost of recovery are also key factors that help inform the business value of providing disaster recovery for a workload.
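
To make the RPO concrete, the check reduces to simple date arithmetic: compare the age of the newest recovery point against the objective. A minimal sketch in Python, assuming an illustrative one-hour RPO and a hypothetical recovery-point timestamp:

```python
# Minimal sketch: does the newest recovery point still satisfy the RPO?
# The one-hour RPO and the timestamps are illustrative values.
from datetime import datetime, timedelta, timezone
from typing import Optional

RPO = timedelta(hours=1)  # example objective: lose at most one hour of data

def rpo_met(last_recovery_point: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the newest recovery point falls within the RPO window."""
    now = now or datetime.now(timezone.utc)
    return (now - last_recovery_point) <= RPO

# A recovery point taken 45 minutes ago meets a one-hour RPO.
print(rpo_met(datetime.now(timezone.utc) - timedelta(minutes=45)))  # True
```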

๐Ÿ’ผ Plan your network topology

Workloads often exist in multiple environments. These include multiple cloud environments (both publicly accessible and private) and possibly your existing data center infrastructure. Plans must include network considerations, such as intrasystem and intersystem connectivity, public IP address management, private IP address management, and domain name resolution.
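
Private IP address management is one place where planning pays off early; overlapping CIDR ranges between environments block later connectivity. A minimal sketch, using Python's standard `ipaddress` module and illustrative ranges, of how such overlaps can be detected:

```python
# Minimal sketch: detect overlapping private IP ranges across environments,
# a common pitfall when connecting cloud networks to on-premises networks.
# The CIDR blocks below are illustrative.
from ipaddress import ip_network
from itertools import combinations

ranges = {
    "on-prem": ip_network("10.0.0.0/16"),
    "cloud-prod": ip_network("10.1.0.0/16"),
    "cloud-dev": ip_network("10.0.128.0/20"),  # overlaps with on-prem
}

for (name_a, net_a), (name_b, net_b) in combinations(ranges.items(), 2):
    if net_a.overlaps(net_b):
        print(f"Overlap: {name_a} {net_a} <-> {name_b} {net_b}")
```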

๐Ÿ’ผ PM-1 Information Security Program Plan

a. Develop and disseminate an organization-wide information security program plan that: 1. Provides an overview of the requirements for the security program and a description of the security program management controls and common controls in place or planned for meeting those requirements; 2. Includes the identification and assignment of roles, responsibilities, management commitment, coordination among organizational entities, and compliance; 3. Reflects the coordination among organizational entities responsible for information security; and 4. Is approved by a senior official with responsibility and accountability for the risk being incurred to organizational operations (including mission, functions, image, and reputation), organizational assets, individuals, other organizations, and the Nation; b. Review and update the organization-wide information security program plan [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and c. Protect the information security program plan from unauthorized disclosure and modification.

๐Ÿ’ผ PM-10 Authorization Process

a. Manage the security and privacy state of organizational systems and the environments in which those systems operate through authorization processes; b. Designate individuals to fulfill specific roles and responsibilities within the organizational risk management process; and c. Integrate the authorization processes into an organization-wide risk management program.

๐Ÿ’ผ PM-11 Mission and Business Process Definition

a. Define organizational mission and business processes with consideration for information security and privacy and the resulting risk to organizational operations, organizational assets, individuals, other organizations, and the Nation; b. Determine information protection and personally identifiable information processing needs arising from the defined mission and business processes; and c. Review and revise the mission and business processes [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-14 Testing, Training, and Monitoring

a. Implement a process for ensuring that organizational plans for conducting security and privacy testing, training, and monitoring activities associated with organizational systems: 1. Are developed and maintained; and 2. Continue to be executed; and b. Review testing, training, and monitoring plans for consistency with the organizational risk management strategy and organization-wide priorities for risk response actions.

๐Ÿ’ผ PM-15 Security and Privacy Groups and Associations

Establish and institutionalize contact with selected groups and associations within the security and privacy communities: a. To facilitate ongoing security and privacy education and training for organizational personnel; b. To maintain currency with recommended security and privacy practices, techniques, and technologies; and c. To share current security and privacy information, including threats, vulnerabilities, and incidents.

๐Ÿ’ผ PM-17 Protecting Controlled Unclassified Information on External Systems

a. Establish policy and procedures to ensure that requirements for the protection of controlled unclassified information that is processed, stored or transmitted on external systems, are implemented in accordance with applicable laws, executive orders, directives, policies, regulations, and standards; and b. Review and update the policy and procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-18 Privacy Program Plan

a. Develop and disseminate an organization-wide privacy program plan that provides an overview of the agency's privacy program, and: 1. Includes a description of the structure of the privacy program and the resources dedicated to the privacy program; 2. Provides an overview of the requirements for the privacy program and a description of the privacy program management controls and common controls in place or planned for meeting those requirements; 3. Includes the role of the senior agency official for privacy and the identification and assignment of roles of other privacy officials and staff and their responsibilities; 4. Describes management commitment, compliance, and the strategic goals and objectives of the privacy program; 5. Reflects coordination among organizational entities responsible for the different aspects of privacy; and 6. Is approved by a senior official with responsibility and accountability for the privacy risk being incurred to organizational operations (including mission, functions, image, and reputation), organizational assets, individuals, other organizations, and the Nation; and b. Update the plan [Assignment: organization-defined frequency] and to address changes in federal privacy laws and policy and organizational changes and problems identified during plan implementation or privacy control assessments.

๐Ÿ’ผ PM-19 Privacy Program Leadership Role

Appoint a senior agency official for privacy with the authority, mission, accountability, and resources to coordinate, develop, and implement, applicable privacy requirements and manage privacy risks through the organization-wide privacy program.

๐Ÿ’ผ PM-20 Dissemination of Privacy Program Information

Maintain a central resource webpage on the organization's principal public website that serves as a central source of information about the organization's privacy program and that: a. Ensures that the public has access to information about organizational privacy activities and can communicate with its senior agency official for privacy; b. Ensures that organizational privacy practices and reports are publicly available; and c. Employs publicly facing email addresses and/or phone lines to enable the public to provide feedback and/or direct questions to privacy offices regarding privacy practices.

๐Ÿ’ผ PM-20(1) Dissemination of Privacy Program Information | Privacy Policies on Websites, Applications, and Digital Services

Develop and post privacy policies on all external-facing websites, mobile applications, and other digital services, that: (a) Are written in plain language and organized in a way that is easy to understand and navigate; (b) Provide information needed by the public to make an informed decision about whether and how to interact with the organization; and (c) Are updated whenever the organization makes a substantive change to the practices it describes and includes a time/date stamp to inform the public of the date of the most recent changes.

๐Ÿ’ผ PM-21 Accounting of Disclosures

a. Develop and maintain an accurate accounting of disclosures of personally identifiable information, including: 1. Date, nature, and purpose of each disclosure; and 2. Name and address, or other contact information of the individual or organization to which the disclosure was made; b. Retain the accounting of disclosures for the length of the time the personally identifiable information is maintained or five years after the disclosure is made, whichever is longer; and c. Make the accounting of disclosures available to the individual to whom the personally identifiable information relates upon request.

๐Ÿ’ผ PM-22 Personally Identifiable Information Quality Management

Develop and document organization-wide policies and procedures for: a. Reviewing for the accuracy, relevance, timeliness, and completeness of personally identifiable information across the information life cycle; b. Correcting or deleting inaccurate or outdated personally identifiable information; c. Disseminating notice of corrected or deleted personally identifiable information to individuals or other appropriate entities; and d. Appeals of adverse decisions on correction or deletion requests.

๐Ÿ’ผ PM-24 Data Integrity Board

Establish a Data Integrity Board to: a. Review proposals to conduct or participate in a matching program; and b. Conduct an annual review of all matching programs in which the agency has participated.

๐Ÿ’ผ PM-25 Minimization of Personally Identifiable Information Used in Testing, Training, and Research

a. Develop, document, and implement policies and procedures that address the use of personally identifiable information for internal testing, training, and research; b. Limit or minimize the amount of personally identifiable information used for internal testing, training, and research purposes; c. Authorize the use of personally identifiable information when such information is required for internal testing, training, and research; and d. Review and update policies and procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-26 Complaint Management

Implement a process for receiving and responding to complaints, concerns, or questions from individuals about the organizational security and privacy practices that includes: a. Mechanisms that are easy to use and readily accessible by the public; b. All information necessary for successfully filing complaints; c. Tracking mechanisms to ensure all complaints received are reviewed and addressed within [Assignment: organization-defined time period]; d. Acknowledgement of receipt of complaints, concerns, or questions from individuals within [Assignment: organization-defined time period]; and e. Response to complaints, concerns, or questions from individuals within [Assignment: organization-defined time period].

๐Ÿ’ผ PM-27 Privacy Reporting

a. Develop [Assignment: organization-defined privacy reports] and disseminate to: 1. [Assignment: organization-defined oversight bodies] to demonstrate accountability with statutory, regulatory, and policy privacy mandates; and 2. [Assignment: organization-defined officials] and other personnel with responsibility for monitoring privacy program compliance; and b. Review and update privacy reports [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-28 Risk Framing

a. Identify and document: 1. Assumptions affecting risk assessments, risk responses, and risk monitoring; 2. Constraints affecting risk assessments, risk responses, and risk monitoring; 3. Priorities and trade-offs considered by the organization for managing risk; and 4. Organizational risk tolerance; b. Distribute the results of risk framing activities to [Assignment: organization-defined personnel]; and c. Review and update risk framing considerations [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-29 Risk Management Program Leadership Roles

a. Appoint a Senior Accountable Official for Risk Management to align organizational information security and privacy management processes with strategic, operational, and budgetary planning processes; and b. Establish a Risk Executive (function) to view and analyze risk from an organization-wide perspective and ensure management of risk is consistent across the organization.

๐Ÿ’ผ PM-3 Information Security and Privacy Resources

a. Include the resources needed to implement the information security and privacy programs in capital planning and investment requests and document all exceptions to this requirement; b. Prepare documentation required for addressing information security and privacy programs in capital planning and investment requests in accordance with applicable laws, executive orders, directives, policies, regulations, standards; and c. Make available for expenditure, the planned information security and privacy resources.

๐Ÿ’ผ PM-30 Supply Chain Risk Management Strategy

a. Develop an organization-wide strategy for managing supply chain risks associated with the development, acquisition, maintenance, and disposal of systems, system components, and system services; b. Implement the supply chain risk management strategy consistently across the organization; and c. Review and update the supply chain risk management strategy on [Assignment: organization-defined frequency] or as required, to address organizational changes.

๐Ÿ’ผ PM-31 Continuous Monitoring Strategy

Develop an organization-wide continuous monitoring strategy and implement continuous monitoring programs that include: a. Establishing the following organization-wide metrics to be monitored: [Assignment: organization-defined metrics]; b. Establishing [Assignment: organization-defined frequencies] for monitoring and [Assignment: organization-defined frequencies] for assessment of control effectiveness; c. Ongoing monitoring of organizationally-defined metrics in accordance with the continuous monitoring strategy; d. Correlation and analysis of information generated by control assessments and monitoring; e. Response actions to address results of the analysis of control assessment and monitoring information; and f. Reporting the security and privacy status of organizational systems to [Assignment: organization-defined personnel or roles] [Assignment: organization-defined frequency].

๐Ÿ’ผ PM-32 Purposing

Analyze [Assignment: organization-defined systems or systems components] supporting mission essential services or functions to ensure that the information resources are being used consistent with their intended purpose.

๐Ÿ’ผ PM-4 Plan of Action and Milestones Process

a. Implement a process to ensure that plans of action and milestones for the information security, privacy, and supply chain risk management programs and associated organizational systems: 1. Are developed and maintained; 2. Document the remedial information security, privacy, and supply chain risk management actions to adequately respond to risk to organizational operations and assets, individuals, other organizations, and the Nation; and 3. Are reported in accordance with established reporting requirements. b. Review plans of action and milestones for consistency with the organizational risk management strategy and organization-wide priorities for risk response actions.

๐Ÿ’ผ PM-7 Enterprise Architecture

Develop and maintain an enterprise architecture with consideration for information security, privacy, and the resulting risk to organizational operations and assets, individuals, other organizations, and the Nation.

๐Ÿ’ผ PM-9 Risk Management Strategy

a. Develop a comprehensive strategy to manage: 1. Security risk to organizational operations and assets, individuals, other organizations, and the Nation associated with the operation and use of organizational systems; and 2. Privacy risk to individuals resulting from the authorized processing of personally identifiable information; b. Implement the risk management strategy consistently across the organization; and c. Review and update the risk management strategy [Assignment: organization-defined frequency] or as required, to address organizational changes.

๐Ÿ’ผ PR.AA-01: Identities and credentials for authorized users, services, and hardware are managed by the organization

1. Initiate requests for new access or additional access for employees, contractors, and others, and track, review, and fulfill the requests, with permission from system or data owners when needed 2. Issue, manage, and revoke cryptographic certificates and identity tokens, cryptographic keys (i.e., key management), and other credentials 3. Select a unique identifier for each device from immutable hardware characteristics or an identifier securely provisioned to the device 4. Physically label authorized hardware with an identifier for inventory and servicing purposes

๐Ÿ’ผ PR.AA-03: Users, services, and hardware are authenticated

1. Require multifactor authentication 2. Enforce policies for the minimum strength of passwords, PINs, and similar authenticators 3. Periodically reauthenticate users, services, and hardware based on risk (e.g., in zero trust architectures) 4. Ensure that authorized personnel can access accounts essential for protecting safety under emergency conditions
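
As an illustration of item 2, a minimum-strength policy for passwords or PINs can be enforced programmatically at account creation or change time. A minimal sketch in Python; the thresholds are illustrative, not a recommended policy:

```python
# Minimal sketch of enforcing a minimum-strength password policy.
# The length and character-class rules below are illustrative only.
import re

MIN_LENGTH = 14

def meets_policy(password: str) -> bool:
    """Return True if the password satisfies the example minimum-strength rules."""
    return (
        len(password) >= MIN_LENGTH
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"\d", password) is not None
    )

print(meets_policy("correct-horse-battery-1A"))  # True
```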

๐Ÿ’ผ PR.AA-04: Identity assertions are protected, conveyed, and verified

1. Protect identity assertions that are used to convey authentication and user information through single sign-on systems 2. Protect identity assertions that are used to convey authentication and user information between federated systems 3. Implement standards-based approaches for identity assertions in all contexts, and follow all guidance for the generation (e.g., data models, metadata), protection (e.g., digital signing, encryption), and verification (e.g., signature validation) of identity assertions
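
As a simplified illustration of protecting and verifying an assertion, the sketch below signs a claims payload with an HMAC and verifies it with a constant-time comparison. Production systems would use a standards-based format (for example, SAML assertions or signed JWTs) and proper key management; the shared key and claims here are placeholders:

```python
# Minimal sketch: sign an identity assertion and verify it before trusting it.
# SHARED_KEY and the claims are illustrative placeholders.
import hashlib
import hmac
import json

SHARED_KEY = b"example-key-material"  # placeholder only; never hard-code real keys

def sign_assertion(claims: dict) -> tuple[bytes, str]:
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload, signature

def verify_assertion(payload: bytes, signature: str) -> bool:
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

payload, sig = sign_assertion({"sub": "alice", "idp": "example-idp"})
print(verify_assertion(payload, sig))  # True
```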

๐Ÿ’ผ PR.AA-05: Access permissions, entitlements, and authorizations are defined in a policy, managed, enforced, and reviewed, and incorporate the principles of least privilege and separation of duties

1. Review logical and physical access privileges periodically and whenever someone changes roles or leaves the organization, and promptly rescind privileges that are no longer needed 2. Take attributes of the requester and the requested resource into account for authorization decisions (e.g., geolocation, day/time, requester endpoint's cyber health) 3. Restrict access and privileges to the minimum necessary (e.g., zero trust architecture) 4. Periodically review the privileges associated with critical business functions to confirm proper separation of duties
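
A periodic review (item 1) can be partly automated by flagging grants that have gone unused for longer than the review window. A minimal sketch with illustrative records and a 90-day window:

```python
# Minimal sketch: flag access grants not used within the review window so a
# human reviewer can decide whether to rescind them. Records are illustrative.
from datetime import datetime, timedelta, timezone

REVIEW_WINDOW = timedelta(days=90)

grants = [
    {"user": "alice", "privilege": "prod-db-admin",
     "last_used": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"user": "bob", "privilege": "read-only",
     "last_used": datetime.now(timezone.utc)},
]

def stale_grants(grants, now=None):
    now = now or datetime.now(timezone.utc)
    return [g for g in grants if now - g["last_used"] > REVIEW_WINDOW]

for g in stale_grants(grants):
    print(f"Review and consider rescinding: {g['user']} / {g['privilege']}")
```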

๐Ÿ’ผ PR.AT-01: Personnel are provided with awareness and training so that they possess the knowledge and skills to perform general tasks with cybersecurity risks in mind

1. Provide basic cybersecurity awareness and training to employees, contractors, partners, suppliers, and all other users of the organization's non-public resources 2. Train personnel to recognize social engineering attempts and other common attacks, report attacks and suspicious activity, comply with acceptable use policies, and perform basic cyber hygiene tasks (e.g., patching software, choosing passwords, protecting credentials) 3. Explain the consequences of cybersecurity policy violations, both to individual users and the organization as a whole 4. Periodically assess or test users on their understanding of basic cybersecurity practices 5. Require annual refreshers to reinforce existing practices and introduce new practices

๐Ÿ’ผ PR.AT-02: Individuals in specialized roles are provided with awareness and training so that they possess the knowledge and skills to perform relevant tasks with cybersecurity risks in mind

1. Identify the specialized roles within the organization that require additional cybersecurity training, such as physical and cybersecurity personnel, finance personnel, senior leadership, and anyone with access to business-critical data 2. Provide role-based cybersecurity awareness and training to all those in specialized roles, including contractors, partners, suppliers, and other third parties 3. Periodically assess or test users on their understanding of cybersecurity practices for their specialized roles 4. Require annual refreshers to reinforce existing practices and introduce new practices

๐Ÿ’ผ PR.DS-01: The confidentiality, integrity, and availability of data-at-rest are protected

1. Use encryption, digital signatures, and cryptographic hashes to protect the confidentiality and integrity of stored data in files, databases, virtual machine disk images, container images, and other resources 2. Use full disk encryption to protect data stored on user endpoints 3. Confirm the integrity of software by validating signatures 4. Restrict the use of removable media to prevent data exfiltration 5. Physically secure removable media containing unencrypted sensitive information, such as within locked offices or file cabinets
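
As a small illustration of item 1, a cryptographic hash recorded at write time lets you detect later tampering with stored data. A minimal sketch using Python's standard `hashlib`; the file path is illustrative:

```python
# Minimal sketch: record a SHA-256 digest when data is stored and re-check it
# later to detect tampering. The path in the comments is illustrative.
import hashlib

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# At write time, store the digest alongside the data; at read time, compare:
# expected = sha256_of("backup/customer-export.parquet")
# assert sha256_of("backup/customer-export.parquet") == expected
```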

๐Ÿ’ผ PR.DS-02: The confidentiality, integrity, and availability of data-in-transit are protected

1. Use encryption, digital signatures, and cryptographic hashes to protect the confidentiality and integrity of network communications 2. Automatically encrypt or block outbound emails and other communications that contain sensitive data, depending on the data classification 3. Block access to personal email, file sharing, file storage services, and other personal communications applications and services from organizational systems and networks 4. Prevent reuse of sensitive data from production environments (e.g., customer records) in development, testing, and other non-production environments
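
For item 1, most platforms provide TLS with certificate validation out of the box. A minimal sketch using only the Python standard library, requiring TLS 1.2 or later for an outbound connection to an illustrative host:

```python
# Minimal sketch: open an outbound connection that requires TLS 1.2+ with
# certificate validation. The host name is illustrative.
import socket
import ssl

context = ssl.create_default_context()           # validates certificates by default
context.minimum_version = ssl.TLSVersion.TLSv1_2

with socket.create_connection(("example.org", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.org") as tls:
        print(tls.version(), tls.getpeercert()["subject"])
```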

๐Ÿ’ผ PR.DS-11: Backups of data are created, protected, maintained, and tested

1. Continuously back up critical data in near-real-time, and back up other data frequently at agreed-upon schedules 2. Test backups and restores for all types of data sources at least annually 3. Securely store some backups offline and offsite so that an incident or disaster will not damage them 4. Enforce geographic separation and geolocation restrictions for data backup storage
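
Items 1 and 2 can be spot-checked with two simple assertions: the newest backup is within the agreed schedule, and a test restore is byte-identical to its source. A minimal sketch with illustrative paths and a daily schedule:

```python
# Minimal sketch: confirm the latest backup is recent enough and that a test
# restore matches the source. The daily schedule and paths are illustrative.
import filecmp
import os
import time

MAX_BACKUP_AGE = 24 * 3600  # example agreed-upon schedule: daily backups

def backup_is_fresh(backup_path: str) -> bool:
    """True if the backup file was written within the agreed schedule."""
    return time.time() - os.path.getmtime(backup_path) <= MAX_BACKUP_AGE

def restore_matches(source_path: str, restored_path: str) -> bool:
    """True if a test restore is byte-identical to the original data."""
    return filecmp.cmp(source_path, restored_path, shallow=False)
```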

๐Ÿ’ผ PR.IR-01: Networks and environments are protected from unauthorized logical access and usage

1. Logically segment organization networks and cloud-based platforms according to trust boundaries and platform types (e.g., IT, IoT, OT, mobile, guests), and permit required communications only between segments 2. Logically segment organization networks from external networks, and permit only necessary communications to enter the organization's networks from the external networks 3. Implement zero trust architectures to restrict network access to each resource to the minimum necessary 4. Check the cyber health of endpoints before allowing them to access and use production resources
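
Items 1 and 2 amount to a default-deny policy between segments: enumerate the required flows and reject everything else. A minimal sketch with illustrative segment names and ports:

```python
# Minimal sketch: model inter-segment communications as an explicit allow list
# and deny everything else. Segment names and ports are illustrative.
ALLOWED_FLOWS = {
    ("guest", "internet"): {80, 443},
    ("it", "ot"): {4840},           # e.g., a single required OT protocol port
    ("internet", "dmz"): {443},
}

def is_permitted(src_segment: str, dst_segment: str, port: int) -> bool:
    """Default-deny: a flow is allowed only if it is explicitly listed."""
    return port in ALLOWED_FLOWS.get((src_segment, dst_segment), set())

print(is_permitted("guest", "it", 445))        # False -- not an allowed flow
print(is_permitted("guest", "internet", 443))  # True
```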

๐Ÿ’ผ Preparation

Preparing for an incident is critical for timely and effective incident response. Preparation spans three domains: - **People**: Preparing your people for a security incident involves identifying the relevant stakeholders for incident response and training them on incident response and cloud technologies. - **Process**: Preparing your processes for a security incident involves documenting architectures, developing thorough incident response plans, and creating playbooks for consistent response to security events. - **Technology**: Preparing your technology for a security incident involves setting up access, aggregating and monitoring necessary logs, implementing effective alerting mechanisms, and developing response and investigative capabilities. Each of these domains is equally important for effective incident response. No incident response program is complete or effective without all three. You need to prepare people, processes, and technology, tightly integrated, to be ready for an incident.

๐Ÿ’ผ Process and culture

When architecting workloads, there are principles and practices that you can adopt to help you better run efficient high-performing cloud workloads. This focus area offers best practices to help adopt a culture that fosters performance efficiency of cloud workloads.

๐Ÿ’ผ Process and culture

Look for opportunities to reduce your sustainability impact by making changes to your development, test, and deployment practices.

๐Ÿ’ผ Protecting Compute

Compute resources include EC2 instances, containers, AWS Lambda functions, database services, IoT devices, and more. Each of these compute resource types requires a different approach to secure it. However, they share common strategies that you need to consider: defense in depth, vulnerability management, reduction in attack surface, automation of configuration and operation, and performing actions at a distance. In this section, you will find general guidance for protecting your compute resources for key services. For each AWS service used, it's important to check the specific security recommendations in the service documentation.

๐Ÿ’ผ Protecting Data at Rest

Data at rest represents any data that you persist in non-volatile storage for any duration in your workload. This includes block storage, object storage, databases, archives, IoT devices, and any other storage medium on which data is persisted. Protecting your data at rest with encryption and appropriate access controls reduces the risk of unauthorized access.
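
As one example of verifying encryption at rest, object storage defaults can be checked programmatically. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the bucket name is illustrative:

```python
# Minimal sketch: report the default server-side encryption algorithm of an
# S3 bucket. Assumes boto3 and configured credentials; bucket name is illustrative.
import boto3

def default_encryption(bucket: str) -> str:
    s3 = boto3.client("s3")
    config = s3.get_bucket_encryption(Bucket=bucket)
    rule = config["ServerSideEncryptionConfiguration"]["Rules"][0]
    return rule["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]

# print(default_encryption("example-bucket"))  # e.g., "AES256" or "aws:kms"
```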

๐Ÿ’ผ Protecting Data in Transit

Data in transit is any data that is sent from one system to another. This includes communication between resources within your workload as well as communication between other services and your end users. By providing the appropriate level of protection for your data in transit, you protect the confidentiality and integrity of your workload's data.

๐Ÿ’ผ Protecting Networks

Users, whether in your workforce or among your customers, can be located anywhere. You need to move away from the traditional model of trusting anyone and anything that has access to your network. When you follow the principle of applying security at all layers, you employ a Zero Trust approach: a model in which application components or microservices are considered discrete from each other and no component or microservice trusts any other.

๐Ÿ’ผ Protective Technology (PR.PT)

Technical security solutions are managed to ensure the security and resilience of systems and assets, consistent with related policies, procedures, and agreements.

๐Ÿ’ผ PS-1 PERSONNEL SECURITY POLICY AND PROCEDURES

The organization: PS-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: PS-1a.1. A personnel security policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and PS-1a.2. Procedures to facilitate the implementation of the personnel security policy and associated personnel security controls; and PS-1b. Reviews and updates the current: PS-1b.1. Personnel security policy [Assignment: organization-defined frequency]; and PS-1b.2. Personnel security procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ PS-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] personnel security policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the personnel security policy and the associated personnel security controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the personnel security policy and procedures; and c. Review and update the current personnel security: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ PS-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] personnel security policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the personnel security policy and the associated personnel security controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the personnel security policy and procedures; and c. Review and update the current personnel security: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PS-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] personnel security policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the personnel security policy and the associated personnel security controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the personnel security policy and procedures; and c. Review and update the current personnel security: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PS-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] personnel security policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the personnel security policy and the associated personnel security controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the personnel security policy and procedures; and c. Review and update the current personnel security: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ PS-2 Position Risk Designation

a. Assign a risk designation to all organizational positions; b. Establish screening criteria for individuals filling those positions; and c. Review and update position risk designations [Assignment: organization-defined frequency].

๐Ÿ’ผ PS-2 POSITION RISK DESIGNATION

The organization: PS-2a. Assigns a risk designation to all organizational positions; PS-2b. Establishes screening criteria for individuals filling those positions; and PS-2c. Reviews and updates position risk designations [Assignment: organization-defined frequency].

๐Ÿ’ผ PS-2 Position Risk Designation (L)(M)(H)

a. Assign a risk designation to all organizational positions; b. Establish screening criteria for individuals filling those positions; and c. Review and update position risk designations [FedRAMP Assignment: at least every three (3) years].

๐Ÿ’ผ PS-2 Position Risk Designation (L)(M)(H)

a. Assign a risk designation to all organizational positions; b. Establish screening criteria for individuals filling those positions; and c. Review and update position risk designations [FedRAMP Assignment: at least every three (3) years].

๐Ÿ’ผ PS-2 Position Risk Designation (L)(M)(H)

a. Assign a risk designation to all organizational positions; b. Establish screening criteria for individuals filling those positions; and c. Review and update position risk designations [FedRAMP Assignment: at least every three (3) years].

๐Ÿ’ผ PS-3 (1) CLASSIFIED INFORMATION

The organization ensures that individuals accessing an information system processing, storing, or transmitting classified information are cleared and indoctrinated to the highest classification level of the information to which they have access on the system.

๐Ÿ’ผ PS-3 (2) FORMAL INDOCTRINATION

The organization ensures that individuals accessing an information system processing, storing, or transmitting types of classified information which require formal indoctrination, are formally indoctrinated for all of the relevant types of information to which they have access on the system.

๐Ÿ’ผ PS-3 (3) INFORMATION WITH SPECIAL PROTECTION MEASURES

The organization ensures that individuals accessing an information system processing, storing, or transmitting information requiring special protection: PS-3 (3)(a) Have valid access authorizations that are demonstrated by assigned official government duties; and PS-3 (3)(b) Satisfy [Assignment: organization-defined additional personnel screening criteria].

๐Ÿ’ผ PS-3 Personnel Screening

a. Screen individuals prior to authorizing access to the system; and b. Rescreen individuals in accordance with [Assignment: organization-defined conditions requiring rescreening and, where rescreening is so indicated, the frequency of rescreening].

๐Ÿ’ผ PS-3 PERSONNEL SCREENING

The organization: PS-3a. Screens individuals prior to authorizing access to the information system; and PS-3b. Rescreens individuals according to [Assignment: organization-defined conditions requiring rescreening and, where rescreening is so indicated, the frequency of such rescreening].

๐Ÿ’ผ PS-3 Personnel Screening (L)(M)(H)

a. Screen individuals prior to authorizing access to the system; and b. Rescreen individuals in accordance with [FedRAMP Assignment: for national security clearances; a reinvestigation is required during the fifth (5th) year for top secret security clearance, the tenth (10th) year for secret security clearance, and fifteenth (15th) year for confidential security clearance. For moderate risk law enforcement and high impact public trust level, a reinvestigation is required during the fifth (5th) year. There is no reinvestigation for other moderate risk positions or any low risk positions].

๐Ÿ’ผ PS-3 Personnel Screening (L)(M)(H)

a. Screen individuals prior to authorizing access to the system; and b. Rescreen individuals in accordance with [FedRAMP Assignment: for national security clearances; a reinvestigation is required during the fifth (5th) year for top secret security clearance, the tenth (10th) year for secret security clearance, and fifteenth (15th) year for confidential security clearance. For moderate risk law enforcement and high impact public trust level, a reinvestigation is required during the fifth (5th) year. There is no reinvestigation for other moderate risk positions or any low risk positions].

๐Ÿ’ผ PS-3 Personnel Screening (L)(M)(H)

a. Screen individuals prior to authorizing access to the system; and b. Rescreen individuals in accordance with [FedRAMP Assignment: for national security clearances; a reinvestigation is required during the fifth (5th) year for top secret security clearance, the tenth (10th) year for secret security clearance, and fifteenth (15th) year for confidential security clearance. For moderate risk law enforcement and high impact public trust level, a reinvestigation is required during the fifth (5th) year. There is no reinvestigation for other moderate risk positions or any low risk positions].

๐Ÿ’ผ PS-3(2) Personnel Screening | Formal Indoctrination

Verify that individuals accessing a system processing, storing, or transmitting types of classified information that require formal indoctrination, are formally indoctrinated for all the relevant types of information to which they have access on the system.

๐Ÿ’ผ PS-3(3) Information Requiring Special Protective Measures (M)(H)

Verify that individuals accessing a system processing, storing, or transmitting information requiring special protection: (a) Have valid access authorizations that are demonstrated by assigned official government duties; and (b) Satisfy [FedRAMP Assignment: personnel screening criteria - as required by specific information].

๐Ÿ’ผ PS-3(3) Information Requiring Special Protective Measures (M)(H)

Verify that individuals accessing a system processing, storing, or transmitting information requiring special protection: (a) Have valid access authorizations that are demonstrated by assigned official government duties; and (b) Satisfy [FedRAMP Assignment: personnel screening criteria - as required by specific information].

๐Ÿ’ผ PS-4 (1) POST-EMPLOYMENT REQUIREMENTS

The organization: PS-4 (1)(a) Notifies terminated individuals of applicable, legally binding post-employment requirements for the protection of organizational information; and PS-4 (1)(b) Requires terminated individuals to sign an acknowledgment of post-employment requirements as part of the organizational termination process.

๐Ÿ’ผ PS-4 Personnel Termination

Upon termination of individual employment: a. Disable system access within [Assignment: organization-defined time period]; b. Terminate or revoke any authenticators and credentials associated with the individual; c. Conduct exit interviews that include a discussion of [Assignment: organization-defined information security topics]; d. Retrieve all security-related organizational system-related property; and e. Retain access to organizational information and systems formerly controlled by terminated individual.

๐Ÿ’ผ PS-4 PERSONNEL TERMINATION

The organization, upon termination of individual employment: PS-4a. Disables information system access within [Assignment: organization-defined time period]; PS-4b. Terminates/revokes any authenticators/credentials associated with the individual; PS-4c. Conducts exit interviews that include a discussion of [Assignment: organization-defined information security topics]; PS-4d. Retrieves all security-related organizational information system-related property; PS-4e. Retains access to organizational information and information systems formerly controlled by terminated individual; and PS-4f. Notifies [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period].

๐Ÿ’ผ PS-4 Personnel Termination (L)(M)(H)

Upon termination of individual employment: a. Disable system access within [FedRAMP Assignment: four (4) hours]; b. Terminate or revoke any authenticators and credentials associated with the individual; c. Conduct exit interviews that include a discussion of [Assignment: organization-defined information security topics]; d. Retrieve all security-related organizational system-related property; and e. Retain access to organizational information and systems formerly controlled by terminated individual.

๐Ÿ’ผ PS-4 Personnel Termination (L)(M)(H)

Upon termination of individual employment: a. Disable system access within [FedRAMP Assignment: four (4) hours]; b. Terminate or revoke any authenticators and credentials associated with the individual; c. Conduct exit interviews that include a discussion of [Assignment: organization-defined information security topics]; d. Retrieve all security-related organizational system-related property; and e. Retain access to organizational information and systems formerly controlled by terminated individual.

๐Ÿ’ผ PS-4 Personnel Termination (L)(M)(H)

Upon termination of individual employment: a. Disable system access within [FedRAMP Assignment: four (4) hours]; b. Terminate or revoke any authenticators and credentials associated with the individual; c. Conduct exit interviews that include a discussion of [Assignment: organization-defined information security topics]; d. Retrieve all security-related organizational system-related property; and e. Retain access to organizational information and systems formerly controlled by terminated individual.

๐Ÿ’ผ PS-4(1) Personnel Termination | Post-employment Requirements

(a) Notify terminated individuals of applicable, legally binding post-employment requirements for the protection of organizational information; and (b) Require terminated individuals to sign an acknowledgment of post-employment requirements as part of the organizational termination process.

๐Ÿ’ผ PS-4(2) Automated Actions (H)

Use [Assignment: organization-defined automated mechanisms] to [Selection (one-or-more): notify [FedRAMP Assignment: access control personnel responsible for disabling access to the system] of individual termination actions; disable access to system resources].
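
A minimal sketch of the control's intent, using a hypothetical identity provider client and notifier (none of these calls are a real API): on a termination event, disable the account, revoke credentials, and notify the access control personnel responsible for the system:

```python
# Minimal sketch of automated termination actions. `idp` and `notifier` are
# hypothetical objects; their methods are placeholders, not a real API.
from datetime import datetime, timezone

def handle_termination(idp, notifier, username: str) -> None:
    idp.disable_account(username)             # hypothetical call: disable access
    idp.revoke_sessions_and_tokens(username)  # hypothetical call: revoke credentials
    notifier.send(                            # hypothetical call: notify personnel
        to="access-control-team",
        subject=f"Access disabled for {username}",
        body=f"Disabled at {datetime.now(timezone.utc).isoformat()} per PS-4.",
    )
```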

๐Ÿ’ผ PS-5 Personnel Transfer

a. Review and confirm ongoing operational need for current logical and physical access authorizations to systems and facilities when individuals are reassigned or transferred to other positions within the organization; b. Initiate [Assignment: organization-defined transfer or reassignment actions] within [Assignment: organization-defined time period following the formal transfer action]; c. Modify access authorization as needed to correspond with any changes in operational need due to reassignment or transfer; and d. Notify [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period].

๐Ÿ’ผ PS-5 PERSONNEL TRANSFER

The organization: PS-5a. Reviews and confirms ongoing operational need for current logical and physical access authorizations to information systems/facilities when individuals are reassigned or transferred to other positions within the organization; PS-5b. Initiates [Assignment: organization-defined transfer or reassignment actions] within [Assignment: organization-defined time period following the formal transfer action]; PS-5c. Modifies access authorization as needed to correspond with any changes in operational need due to reassignment or transfer; and PS-5d. Notifies [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period].

๐Ÿ’ผ PS-5 Personnel Transfer (L)(M)(H)

a. Review and confirm ongoing operational need for current logical and physical access authorizations to systems and facilities when individuals are reassigned or transferred to other positions within the organization; b. Initiate [Assignment: organization-defined transfer or reassignment actions] within [FedRAMP Assignment: twenty-four (24) hours]; c. Modify access authorization as needed to correspond with any changes in operational need due to reassignment or transfer; and d. Notify [FedRAMP Assignment: including access control personnel responsible for the system] within [FedRAMP Assignment: twenty-four (24) hours].

๐Ÿ’ผ PS-5 Personnel Transfer (L)(M)(H)

a. Review and confirm ongoing operational need for current logical and physical access authorizations to systems and facilities when individuals are reassigned or transferred to other positions within the organization; b. Initiate [Assignment: organization-defined transfer or reassignment actions] within [FedRAMP Assignment: twenty-four (24) hours]; c. Modify access authorization as needed to correspond with any changes in operational need due to reassignment or transfer; and d. Notify [FedRAMP Assignment: including access control personnel responsible for the system] within [FedRAMP Assignment: twenty-four (24) hours].

๐Ÿ’ผ PS-5 Personnel Transfer (L)(M)(H)

a. Review and confirm ongoing operational need for current logical and physical access authorizations to systems and facilities when individuals are reassigned or transferred to other positions within the organization; b. Initiate [Assignment: organization-defined transfer or reassignment actions] within [FedRAMP Assignment: twenty-four (24) hours]; c. Modify access authorization as needed to correspond with any changes in operational need due to reassignment or transfer; and d. Notify [FedRAMP Assignment: including access control personnel responsible for the system] within [FedRAMP Assignment: twenty-four (24) hours].

๐Ÿ’ผ PS-6 (2) CLASSIFIED INFORMATION REQUIRING SPECIAL PROTECTION

The organization ensures that access to classified information requiring special protection is granted only to individuals who: PS-6 (2)(a) Have a valid access authorization that is demonstrated by assigned official government duties; PS-6 (2)(b) Satisfy associated personnel security criteria; and PS-6 (2)(c) Have read, understood, and signed a nondisclosure agreement.

๐Ÿ’ผ PS-6 (3) POST-EMPLOYMENT REQUIREMENTS

The organization: PS-6 (3)(a) Notifies individuals of applicable, legally binding post-employment requirements for protection of organizational information; and PS-6 (3)(b) Requires individuals to sign an acknowledgment of these requirements, if applicable, as part of granting initial access to covered information.

๐Ÿ’ผ PS-6 Access Agreements

a. Develop and document access agreements for organizational systems; b. Review and update the access agreements [Assignment: organization-defined frequency]; and c. Verify that individuals requiring access to organizational information and systems: 1. Sign appropriate access agreements prior to being granted access; and 2. Re-sign access agreements to maintain access to organizational systems when access agreements have been updated or [Assignment: organization-defined frequency].

๐Ÿ’ผ PS-6 ACCESS AGREEMENTS

The organization: PS-6a. Develops and documents access agreements for organizational information systems; PS-6b. Reviews and updates the access agreements [Assignment: organization-defined frequency]; and PS-6c. Ensures that individuals requiring access to organizational information and information systems: PS-6c.1. Sign appropriate access agreements prior to being granted access; and PS-6c.2. Re-sign access agreements to maintain access to organizational information systems when access agreements have been updated or [Assignment: organization-defined frequency].

๐Ÿ’ผ PS-6 Access Agreements (L)(M)(H)

a. Develop and document access agreements for organizational systems; b. Review and update the access agreements [FedRAMP Assignment: at least annually]; and c. Verify that individuals requiring access to organizational information and systems: 1. Sign appropriate access agreements prior to being granted access; and 2. Re-sign access agreements to maintain access to organizational systems when access agreements have been updated or [FedRAMP Assignment: at least annually and any time there is a change to the user's level of access].

๐Ÿ’ผ PS-6 Access Agreements (L)(M)(H)

a. Develop and document access agreements for organizational systems; b. Review and update the access agreements [FedRAMP Assignment: at least annually]; and c. Verify that individuals requiring access to organizational information and systems: 1. Sign appropriate access agreements prior to being granted access; and 2. Re-sign access agreements to maintain access to organizational systems when access agreements have been updated or [FedRAMP Assignment: at least annually and any time there is a change to the user's level of access].

๐Ÿ’ผ PS-6 Access Agreements (L)(M)(H)

a. Develop and document access agreements for organizational systems; b. Review and update the access agreements [FedRAMP Assignment: at least annually]; and c. Verify that individuals requiring access to organizational information and systems: 1. Sign appropriate access agreements prior to being granted access; and 2. Re-sign access agreements to maintain access to organizational systems when access agreements have been updated or [FedRAMP Assignment: at least annually and any time there is a change to the user's level of access].

๐Ÿ’ผ PS-6(3) Access Agreements | Post-employment Requirements

(a) Notify individuals of applicable, legally binding post-employment requirements for protection of organizational information; and (b) Require individuals to sign an acknowledgment of these requirements, if applicable, as part of granting initial access to covered information.

๐Ÿ’ผ PS-7 External Personnel Security

a. Establish personnel security requirements, including security roles and responsibilities for external providers; b. Require external providers to comply with personnel security policies and procedures established by the organization; c. Document personnel security requirements; d. Require external providers to notify [Assignment: organization-defined personnel or roles] of any personnel transfers or terminations of external personnel who possess organizational credentials and/or badges, or who have system privileges within [Assignment: organization-defined time period]; and e. Monitor provider compliance with personnel security requirements.

๐Ÿ’ผ PS-7 External Personnel Security (L)(M)(H)

a. Establish personnel security requirements, including security roles and responsibilities for external providers; b. Require external providers to comply with personnel security policies and procedures established by the organization; c. Document personnel security requirements; d. Require external providers to notify [FedRAMP Assignment: including access control personnel responsible for the system and/or facilities, as appropriate] of any personnel transfers or terminations of external personnel who possess organizational credentials and/or badges, or who have system privileges within [FedRAMP Assignment: within twenty-four (24) hours]; and e. Monitor provider compliance with personnel security requirements.

๐Ÿ’ผ PS-7 External Personnel Security (L)(M)(H)

a. Establish personnel security requirements, including security roles and responsibilities for external providers; b. Require external providers to comply with personnel security policies and procedures established by the organization; c. Document personnel security requirements; d. Require external providers to notify [FedRAMP Assignment: including access control personnel responsible for the system and/or facilities, as appropriate] of any personnel transfers or terminations of external personnel who possess organizational credentials and/or badges, or who have system privileges within [FedRAMP Assignment: within twenty-four (24) hours]; and e. Monitor provider compliance with personnel security requirements.

๐Ÿ’ผ PS-7 External Personnel Security (L)(M)(H)

a. Establish personnel security requirements, including security roles and responsibilities for external providers; b. Require external providers to comply with personnel security policies and procedures established by the organization; c. Document personnel security requirements; d. Require external providers to notify [FedRAMP Assignment: including access control personnel responsible for the system and/or facilities, as appropriate] of any personnel transfers or terminations of external personnel who possess organizational credentials and/or badges, or who have system privileges within [FedRAMP Assignment: within twenty-four (24) hours]; and e. Monitor provider compliance with personnel security requirements.

๐Ÿ’ผ PS-7 THIRD-PARTY PERSONNEL SECURITY

The organization: PS-7a. Establishes personnel security requirements including security roles and responsibilities for third-party providers; PS-7b. Requires third-party providers to comply with personnel security policies and procedures established by the organization; PS-7c. Documents personnel security requirements; PS-7d. Requires third-party providers to notify [Assignment: organization-defined personnel or roles] of any personnel transfers or terminations of third-party personnel who possess organizational credentials and/or badges, or who have information system privileges within [Assignment: organization-defined time period]; and PS-7e. Monitors provider compliance.

๐Ÿ’ผ PS-8 Personnel Sanctions

a. Employ a formal sanctions process for individuals failing to comply with established information security and privacy policies and procedures; and b. Notify [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period] when a formal employee sanctions process is initiated, identifying the individual sanctioned and the reason for the sanction.

๐Ÿ’ผ PS-8 PERSONNEL SANCTIONS

The organization: PS-8a. Employs a formal sanctions process for individuals failing to comply with established information security policies and procedures; and PS-8b. Notifies [Assignment: organization-defined personnel or roles] within [Assignment: organization-defined time period] when a formal employee sanctions process is initiated, identifying the individual sanctioned and the reason for the sanction.

๐Ÿ’ผ PS-8 Personnel Sanctions (L)(M)(H)

a. Employ a formal sanctions process for individuals failing to comply with established information security and privacy policies and procedures; and b. Notify [FedRAMP Assignment: to include the ISSO and/or similar role within the organization] within [FedRAMP Assignment: twenty-four (24) hours] when a formal employee sanctions process is initiated, identifying the individual sanctioned and the reason for the sanction.

๐Ÿ’ผ PS-8 Personnel Sanctions (L)(M)(H)

a. Employ a formal sanctions process for individuals failing to comply with established information security and privacy policies and procedures; and b. Notify [FedRAMP Assignment: to include the ISSO and/or similar role within the organization] within [FedRAMP Assignment: twenty-four (24) hours] when a formal employee sanctions process is initiated, identifying the individual sanctioned and the reason for the sanction.

๐Ÿ’ผ PS-8 Personnel Sanctions (L)(M)(H)

a. Employ a formal sanctions process for individuals failing to comply with established information security and privacy policies and procedures; and b. Notify [FedRAMP Assignment: to include the ISSO and/or similar role within the organization] within [FedRAMP Assignment: twenty-four (24) hours] when a formal employee sanctions process is initiated, identifying the individual sanctioned and the reason for the sanction.

๐Ÿ’ผ PT-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] personally identifiable information processing and transparency policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the personally identifiable information processing and transparency policy and the associated personally identifiable information processing and transparency controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the personally identifiable information processing and transparency policy and procedures; and c. Review and update the current personally identifiable information processing and transparency: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ PT-2 Authority to Process Personally Identifiable Information

a. Determine and document the [Assignment: organization-defined authority] that permits the [Assignment: organization-defined processing] of personally identifiable information; and b. Restrict the [Assignment: organization-defined processing] of personally identifiable information to only that which is authorized.

๐Ÿ’ผ PT-3 Personally Identifiable Information Processing Purposes

a. Identify and document the [Assignment: organization-defined purpose(s)] for processing personally identifiable information; b. Describe the purpose(s) in the public privacy notices and policies of the organization; c. Restrict the [Assignment: organization-defined processing] of personally identifiable information to only that which is compatible with the identified purpose(s); and d. Monitor changes in processing personally identifiable information and implement [Assignment: organization-defined mechanisms] to ensure that any changes are made in accordance with [Assignment: organization-defined requirements].

๐Ÿ’ผ PT-4 Consent

Implement [Assignment: organization-defined tools or mechanisms] for individuals to consent to the processing of their personally identifiable information prior to its collection that facilitate individuals' informed decision-making.

๐Ÿ’ผ PT-4(2) Consent | Just-in-time Consent

Present [Assignment: organization-defined consent mechanisms] to individuals at [Assignment: organization-defined frequency] and in conjunction with [Assignment: organization-defined personally identifiable information processing].

๐Ÿ’ผ PT-4(3) Consent | Revocation

Implement [Assignment: organization-defined tools or mechanisms] for individuals to revoke consent to the processing of their personally identifiable information.

๐Ÿ’ผ PT-5 Privacy Notice

Provide notice to individuals about the processing of personally identifiable information that: a. Is available to individuals upon first interacting with an organization, and subsequently at [Assignment: organization-defined frequency]; b. Is clear and easy-to-understand, expressing information about personally identifiable information processing in plain language; c. Identifies the authority that authorizes the processing of personally identifiable information; d. Identifies the purposes for which personally identifiable information is to be processed; and e. Includes [Assignment: organization-defined information].

๐Ÿ’ผ PT-5(1) Privacy Notice | Just-in-time Notice

Present notice of personally identifiable information processing to individuals at a time and location where the individual provides personally identifiable information or in conjunction with a data action, or [Assignment: organization-defined frequency].

๐Ÿ’ผ PT-6 System of Records Notice

For systems that process information that will be maintained in a Privacy Act system of records: a. Draft system of records notices in accordance with OMB guidance and submit new and significantly modified system of records notices to the OMB and appropriate congressional committees for advance review; b. Publish system of records notices in the Federal Register; and c. Keep system of records notices accurate, up-to-date, and scoped in accordance with policy.

๐Ÿ’ผ PT-6(1) System of Records Notice | Routine Uses

Review all routine uses published in the system of records notice at [Assignment: organization-defined frequency] to ensure continued accuracy, and to ensure that routine uses continue to be compatible with the purpose for which the information was collected.

๐Ÿ’ผ PT-6(2) System of Records Notice | Exemption Rules

Review all Privacy Act exemptions claimed for the system of records at [Assignment: organization-defined frequency] to ensure they remain appropriate and necessary in accordance with law, that they have been promulgated as regulations, and that they are accurately described in the system of records notice.

๐Ÿ’ผ PT-7(1) Specific Categories of Personally Identifiable Information | Social Security Numbers

When a system processes Social Security numbers: (a) Eliminate unnecessary collection, maintenance, and use of Social Security numbers, and explore alternatives to their use as a personal identifier; (b) Do not deny any individual any right, benefit, or privilege provided by law because of such individual's refusal to disclose his or her Social Security number; and (c) Inform any individual who is asked to disclose his or her Social Security number whether that disclosure is mandatory or voluntary, by what statutory or other authority such number is solicited, and what uses will be made of it.

๐Ÿ’ผ PT-8 Computer Matching Requirements

When a system or organization processes information for the purpose of conducting a matching program: a. Obtain approval from the Data Integrity Board to conduct the matching program; b. Develop and enter into a computer matching agreement; c. Publish a matching notice in the Federal Register; d. Independently verify the information produced by the matching program before taking adverse action against an individual, if required; and e. Provide individuals with notice and an opportunity to contest the findings before taking adverse action against an individual.

๐Ÿ’ผ RA-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] risk assessment policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the risk assessment policy and the associated risk assessment controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the risk assessment policy and procedures; and c. Review and update the current risk assessment: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ RA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] risk assessment policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the risk assessment policy and the associated risk assessment controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the risk assessment policy and procedures; and c. Review and update the current risk assessment: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ RA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] risk assessment policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the risk assessment policy and the associated risk assessment controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the risk assessment policy and procedures; and c. Review and update the current risk assessment: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ RA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] risk assessment policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the risk assessment policy and the associated risk assessment controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the risk assessment policy and procedures; and c. Review and update the current risk assessment: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ RA-1 RISK ASSESSMENT POLICY AND PROCEDURES

The organization: RA-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: RA-1a.1. A risk assessment policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and RA-1a.2. Procedures to facilitate the implementation of the risk assessment policy and associated risk assessment controls; and RA-1b. Reviews and updates the current: RA-1b.1. Risk assessment policy [Assignment: organization-defined frequency]; and RA-1b.2. Risk assessment procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ RA-10 Threat Hunting

a. Establish and maintain a cyber threat hunting capability to: 1. Search for indicators of compromise in organizational systems; and 2. Detect, track, and disrupt threats that evade existing controls; and b. Employ the threat hunting capability [Assignment: organization-defined frequency].

๐Ÿ’ผ RA-2 Security Categorization

a. Categorize the system and information it processes, stores, and transmits; b. Document the security categorization results, including supporting rationale, in the security plan for the system; and c. Verify that the authorizing official or authorizing official designated representative reviews and approves the security categorization decision.

๐Ÿ’ผ RA-2 SECURITY CATEGORIZATION

The organization: RA-2a. Categorizes information and the information system in accordance with applicable federal laws, Executive Orders, directives, policies, regulations, standards, and guidance; RA-2b. Documents the security categorization results (including supporting rationale) in the security plan for the information system; and RA-2c. Ensures that the authorizing official or authorizing official designated representative reviews and approves the security categorization decision.

๐Ÿ’ผ RA-2 Security Categorization (L)(M)(H)

a. Categorize the system and information it processes, stores, and transmits; b. Document the security categorization results, including supporting rationale, in the security plan for the system; and c. Verify that the authorizing official or authorizing official designated representative reviews and approves the security categorization decision.

๐Ÿ’ผ RA-2 Security Categorization (L)(M)(H)

a. Categorize the system and information it processes, stores, and transmits; b. Document the security categorization results, including supporting rationale, in the security plan for the system; and c. Verify that the authorizing official or authorizing official designated representative reviews and approves the security categorization decision.

๐Ÿ’ผ RA-2 Security Categorization (L)(M)(H)

a. Categorize the system and information it processes, stores, and transmits; b. Document the security categorization results, including supporting rationale, in the security plan for the system; and c. Verify that the authorizing official or authorizing official designated representative reviews and approves the security categorization decision.

๐Ÿ’ผ RA-3 Risk Assessment

a. Conduct a risk assessment, including: 1. Identifying threats to and vulnerabilities in the system; 2. Determining the likelihood and magnitude of harm from unauthorized access, use, disclosure, disruption, modification, or destruction of the system, the information it processes, stores, or transmits, and any related information; and 3. Determining the likelihood and impact of adverse effects on individuals arising from the processing of personally identifiable information; b. Integrate risk assessment results and risk management decisions from the organization and mission or business process perspectives with system-level risk assessments; c. Document risk assessment results in [Selection: security and privacy plans; risk assessment report; [Assignment: organization-defined document]]; d. Review risk assessment results [Assignment: organization-defined frequency]; e. Disseminate risk assessment results to [Assignment: organization-defined personnel or roles]; and f. Update the risk assessment [Assignment: organization-defined frequency] or when there are significant changes to the system, its environment of operation, or other conditions that may impact the security or privacy state of the system.

๐Ÿ’ผ RA-3 RISK ASSESSMENT

The organization: RA-3a. Conducts an assessment of risk, including the likelihood and magnitude of harm, from the unauthorized access, use, disclosure, disruption, modification, or destruction of the information system and the information it processes, stores, or transmits; RA-3b. Documents risk assessment results in [Selection: security plan; risk assessment report; [Assignment: organization-defined document]]; RA-3c. Reviews risk assessment results [Assignment: organization-defined frequency]; RA-3d. Disseminates risk assessment results to [Assignment: organization-defined personnel or roles]; and RA-3e. Updates the risk assessment [Assignment: organization-defined frequency] or whenever there are significant changes to the information system or environment of operation (including the identification of new threats and vulnerabilities), or other conditions that may impact the security state of the system.

๐Ÿ’ผ RA-3 Risk Assessment (L)(M)(H)

a. Conduct a risk assessment, including: 1. Identifying threats to and vulnerabilities in the system; 2. Determining the likelihood and magnitude of harm from unauthorized access, use, disclosure, disruption, modification, or destruction of the system, the information it processes, stores, or transmits, and any related information; and 3. Determining the likelihood and impact of adverse effects on individuals arising from the processing of personally identifiable information; b. Integrate risk assessment results and risk management decisions from the organization and mission or business process perspectives with system-level risk assessments; c. Document risk assessment results in [FedRAMP Assignment: security assessment report]; d. Review risk assessment results [FedRAMP Assignment: at least every three (3) years and when a significant change occurs]; e. Disseminate risk assessment results to [Assignment: organization-defined personnel or roles]; and f. Update the risk assessment [FedRAMP Assignment: at least every three (3) years] or when there are significant changes to the system, its environment of operation, or other conditions that may impact the security or privacy state of the system. **RA-3 Additional FedRAMP Requirements and Guidance:** **Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F. **(e) Requirement**: Include all Authorizing Officials; for JAB authorizations to include FedRAMP.

๐Ÿ’ผ RA-3 Risk Assessment (L)(M)(H)

a. Conduct a risk assessment, including: 1. Identifying threats to and vulnerabilities in the system; 2. Determining the likelihood and magnitude of harm from unauthorized access, use, disclosure, disruption, modification, or destruction of the system, the information it processes, stores, or transmits, and any related information; and 3. Determining the likelihood and impact of adverse effects on individuals arising from the processing of personally identifiable information; b. Integrate risk assessment results and risk management decisions from the organization and mission or business process perspectives with system-level risk assessments; c. Document risk assessment results in [FedRAMP Assignment: security assessment report]; d. Review risk assessment results [FedRAMP Assignment: at least every three (3) years and when a significant change occurs]; e. Disseminate risk assessment results to [Assignment: organization-defined personnel or roles]; and f. Update the risk assessment [FedRAMP Assignment: at least every three (3) years] or when there are significant changes to the system, its environment of operation, or other conditions that may impact the security or privacy state of the system. **RA-3 Additional FedRAMP Requirements and Guidance:** **Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F. **(e) Requirement**: Include all Authorizing Officials; for JAB authorizations to include FedRAMP.

๐Ÿ’ผ RA-3 Risk Assessment (L)(M)(H)

a. Conduct a risk assessment, including: 1. Identifying threats to and vulnerabilities in the system; 2. Determining the likelihood and magnitude of harm from unauthorized access, use, disclosure, disruption, modification, or destruction of the system, the information it processes, stores, or transmits, and any related information; and 3. Determining the likelihood and impact of adverse effects on individuals arising from the processing of personally identifiable information; b. Integrate risk assessment results and risk management decisions from the organization and mission or business process perspectives with system-level risk assessments; c. Document risk assessment results in [FedRAMP Assignment: security assessment report]; d. Review risk assessment results [FedRAMP Assignment: at least every three (3) years and when a significant change occurs]; e. Disseminate risk assessment results to [Assignment: organization-defined personnel or roles]; and f. Update the risk assessment [FedRAMP Assignment: at least every three (3) years] or when there are significant changes to the system, its environment of operation, or other conditions that may impact the security or privacy state of the system. **RA-3 Additional FedRAMP Requirements and Guidance:** **Guidance**: Significant change is defined in NIST Special Publication 800-37 Revision 2, Appendix F. **(e) Requirement**: Include all Authorizing Officials; for JAB authorizations to include FedRAMP.

๐Ÿ’ผ RA-3(1) Risk Assessment | Supply Chain Risk Assessment

(a) Assess supply chain risks associated with [Assignment: organization-defined systems, system components, and system services]; and (b) Update the supply chain risk assessment [Assignment: organization-defined frequency], when there are significant changes to the relevant supply chain, or when changes to the system, environments of operation, or other conditions may necessitate a change in the supply chain.

๐Ÿ’ผ RA-3(1) Supply Chain Risk Assessment (L)(M)(H)

(a) Assess supply chain risks associated with [Assignment: organization-defined systems, system components, and system services]; and (b) Update the supply chain risk assessment [Assignment: organization-defined frequency], when there are significant changes to the relevant supply chain, or when changes to the system, environments of operation, or other conditions may necessitate a change in supply chain.

๐Ÿ’ผ RA-3(1) Supply Chain Risk Assessment (L)(M)(H)

(a) Assess supply chain risks associated with [Assignment: organization-defined systems, system components, and system services]; and (b) Update the supply chain risk assessment [Assignment: organization-defined frequency], when there are significant changes to the relevant supply chain, or when changes to the system, environments of operation, or other conditions may necessitate a change in supply chain.

๐Ÿ’ผ RA-3(1) Supply Chain Risk Assessment (L)(M)(H)

(a) Assess supply chain risks associated with [Assignment: organization-defined systems, system components, and system services]; and (b) Update the supply chain risk assessment [Assignment: organization-defined frequency], when there are significant changes to the relevant supply chain, or when changes to the system, environments of operation, or other conditions may necessitate a change in supply chain.

๐Ÿ’ผ RA-5 (4) DISCOVERABLE INFORMATION

The organization determines what information about the information system is discoverable by adversaries and subsequently takes [Assignment: organization-defined corrective actions].

๐Ÿ’ผ RA-5 (5) PRIVILEGED ACCESS

The information system implements privileged access authorization to [Assignment: organization-identified information system components] for selected [Assignment: organization-defined vulnerability scanning activities].

๐Ÿ’ผ RA-5 Vulnerability Monitoring and Scanning

a. Monitor and scan for vulnerabilities in the system and hosted applications [Assignment: organization-defined frequency and/or randomly in accordance with organization-defined process] and when new vulnerabilities potentially affecting the system are identified and reported; b. Employ vulnerability monitoring tools and techniques that facilitate interoperability among tools and automate parts of the vulnerability management process by using standards for: 1. Enumerating platforms, software flaws, and improper configurations; 2. Formatting checklists and test procedures; and 3. Measuring vulnerability impact; c. Analyze vulnerability scan reports and results from vulnerability monitoring; d. Remediate legitimate vulnerabilities [Assignment: organization-defined response times] in accordance with an organizational assessment of risk; e. Share information obtained from the vulnerability monitoring process and control assessments with [Assignment: organization-defined personnel or roles] to help eliminate similar vulnerabilities in other systems; and f. Employ vulnerability monitoring tools that include the capability to readily update the vulnerabilities to be scanned.

๐Ÿ’ผ RA-5 Vulnerability Monitoring and Scanning (L)(M)(H)

a. Monitor and scan for vulnerabilities in the system and hosted applications [FedRAMP Assignment: monthly operating system/infrastructure; monthly web applications (including APIs) and databases] and when new vulnerabilities potentially affecting the system are identified and reported; b. Employ vulnerability monitoring tools and techniques that facilitate interoperability among tools and automate parts of the vulnerability management process by using standards for: 1. Enumerating platforms, software flaws, and improper configurations; 2. Formatting checklists and test procedures; and 3. Measuring vulnerability impact; c. Analyze vulnerability scan reports and results from vulnerability monitoring; d. Remediate legitimate vulnerabilities [FedRAMP Assignment: high-risk vulnerabilities mitigated within thirty (30) days from date of discovery; moderate-risk vulnerabilities mitigated within ninety (90) days from date of discovery; low risk vulnerabilities mitigated within one hundred and eighty (180) days from date of discovery] in accordance with an organizational assessment of risk; e. Share information obtained from the vulnerability monitoring process and control assessments with [Assignment: organization-defined personnel or roles] to help eliminate similar vulnerabilities in other systems; and f. Employ vulnerability monitoring tools that include the capability to readily update the vulnerabilities to be scanned. **RA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: See the FedRAMP Documents page > Vulnerability Scanning Requirements <https://www.FedRAMP.gov/documents/> **Guidance**: Informational findings from a scanner are detailed as a returned result that holds no vulnerability risk or severity and for FedRAMP does not require an entry onto the POA&M or entry onto the RET during any assessment phase. Warning findings, on the other hand, are given a risk rating (low, moderate, high or critical) by the scanning solution and should be treated like any other finding with a risk or severity rating for tracking purposes onto either the POA&M or RET depending on when the findings originated (during assessments or during monthly continuous monitoring). If a warning is received during scanning, but further validation turns up no actual issue then this item should be categorized as a false positive. If this situation presents itself during an assessment phase (initial assessment, annual assessment or any SCR), follow guidance on how to report false positives in the Security Assessment Report (SAR). If this situation happens during monthly continuous monitoring, a deviation request will need to be submitted per the FedRAMP Vulnerability Deviation Request Form. Warnings are commonly associated with scanning solutions that also perform compliance scans, and if the scanner reports a โ€œwarningโ€ as part of the compliance scanning of a CSO, follow guidance surrounding the tracking of compliance findings during either the assessment phases (initial assessment, annual assessment or any SCR) or monthly continuous monitoring as it applies. Guidance on compliance scan findings can be found by searching on โ€œTracking of Compliance Scansโ€ in FAQs. **(a) Requirement**: an accredited independent assessor scans operating systems/infrastructure, web applications, and databases once annually. 
**(d) Requirement**: If a vulnerability is listed among the CISA Known Exploited Vulnerability (KEV) Catalog (<https://www.cisa.gov/known-exploited-vulnerabilities-catalog>) the KEV remediation date supersedes the FedRAMP parameter requirement. **(e) Requirement**: to include all Authorizing Officials; for JAB authorizations to include FedRAMP.
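
A minimal sketch of one way to track the remediation windows above, assuming vulnerability findings are aggregated in AWS Security Hub; the severity-to-deadline mapping (treating CRITICAL like HIGH) and the simple print-based reporting are assumptions, not FedRAMP requirements.

```python
# A minimal sketch: flag ACTIVE Security Hub findings that are older than the
# FedRAMP remediation windows (30/90/180 days by severity). Treating CRITICAL
# like HIGH is an assumption made here for illustration.
from datetime import datetime, timedelta, timezone
import boto3

REMEDIATION_DAYS = {"CRITICAL": 30, "HIGH": 30, "MEDIUM": 90, "LOW": 180}

securityhub = boto3.client("securityhub")

def overdue_findings():
    """Return active findings older than their severity's remediation window."""
    overdue = []
    now = datetime.now(timezone.utc)
    paginator = securityhub.get_paginator("get_findings")
    for severity, days in REMEDIATION_DAYS.items():
        cutoff = now - timedelta(days=days)
        pages = paginator.paginate(Filters={
            "SeverityLabel": [{"Value": severity, "Comparison": "EQUALS"}],
            "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
        })
        for page in pages:
            for finding in page["Findings"]:
                created = datetime.fromisoformat(
                    finding["CreatedAt"].replace("Z", "+00:00"))
                if created < cutoff:
                    overdue.append(finding)
    return overdue

if __name__ == "__main__":
    for finding in overdue_findings():
        print(finding["Id"], finding["Severity"]["Label"], finding["CreatedAt"])
```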

๐Ÿ’ผ RA-5 Vulnerability Monitoring and Scanning (L)(M)(H)

a. Monitor and scan for vulnerabilities in the system and hosted applications [FedRAMP Assignment: monthly operating system/infrastructure; monthly web applications (including APIs) and databases] and when new vulnerabilities potentially affecting the system are identified and reported; b. Employ vulnerability monitoring tools and techniques that facilitate interoperability among tools and automate parts of the vulnerability management process by using standards for: 1. Enumerating platforms, software flaws, and improper configurations; 2. Formatting checklists and test procedures; and 3. Measuring vulnerability impact; c. Analyze vulnerability scan reports and results from vulnerability monitoring; d. Remediate legitimate vulnerabilities [FedRAMP Assignment: high-risk vulnerabilities mitigated within thirty (30) days from date of discovery; moderate-risk vulnerabilities mitigated within ninety (90) days from date of discovery; low risk vulnerabilities mitigated within one hundred and eighty (180) days from date of discovery] in accordance with an organizational assessment of risk; e. Share information obtained from the vulnerability monitoring process and control assessments with [Assignment: organization-defined personnel or roles] to help eliminate similar vulnerabilities in other systems; and f. Employ vulnerability monitoring tools that include the capability to readily update the vulnerabilities to be scanned. **RA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: See the FedRAMP Documents page > Vulnerability Scanning Requirements <https://www.FedRAMP.gov/documents/> **Guidance**: Informational findings from a scanner are detailed as a returned result that holds no vulnerability risk or severity and for FedRAMP does not require an entry onto the POA&M or entry onto the RET during any assessment phase. Warning findings, on the other hand, are given a risk rating (low, moderate, high or critical) by the scanning solution and should be treated like any other finding with a risk or severity rating for tracking purposes onto either the POA&M or RET depending on when the findings originated (during assessments or during monthly continuous monitoring). If a warning is received during scanning, but further validation turns up no actual issue then this item should be categorized as a false positive. If this situation presents itself during an assessment phase (initial assessment, annual assessment or any SCR), follow guidance on how to report false positives in the Security Assessment Report (SAR). If this situation happens during monthly continuous monitoring, a deviation request will need to be submitted per the FedRAMP Vulnerability Deviation Request Form. Warnings are commonly associated with scanning solutions that also perform compliance scans, and if the scanner reports a โ€œwarningโ€ as part of the compliance scanning of a CSO, follow guidance surrounding the tracking of compliance findings during either the assessment phases (initial assessment, annual assessment or any SCR) or monthly continuous monitoring as it applies. Guidance on compliance scan findings can be found by searching on โ€œTracking of Compliance Scansโ€ in FAQs. **(a) Requirement**: an accredited independent assessor scans operating systems/infrastructure, web applications, and databases once annually. 
**(d) Requirement**: If a vulnerability is listed among the CISA Known Exploited Vulnerability (KEV) Catalog (<https://www.cisa.gov/known-exploited-vulnerabilities-catalog>) the KEV remediation date supersedes the FedRAMP parameter requirement. **(e) Requirement**: to include all Authorizing Officials; for JAB authorizations to include FedRAMP.

๐Ÿ’ผ RA-5 Vulnerability Monitoring and Scanning (L)(M)(H)

a. Monitor and scan for vulnerabilities in the system and hosted applications [FedRAMP Assignment: monthly operating system/infrastructure; monthly web applications (including APIs) and databases] and when new vulnerabilities potentially affecting the system are identified and reported; b. Employ vulnerability monitoring tools and techniques that facilitate interoperability among tools and automate parts of the vulnerability management process by using standards for: 1. Enumerating platforms, software flaws, and improper configurations; 2. Formatting checklists and test procedures; and 3. Measuring vulnerability impact; c. Analyze vulnerability scan reports and results from vulnerability monitoring; d. Remediate legitimate vulnerabilities [FedRAMP Assignment: high-risk vulnerabilities mitigated within thirty (30) days from date of discovery; moderate-risk vulnerabilities mitigated within ninety (90) days from date of discovery; low risk vulnerabilities mitigated within one hundred and eighty (180) days from date of discovery] in accordance with an organizational assessment of risk; e. Share information obtained from the vulnerability monitoring process and control assessments with [Assignment: organization-defined personnel or roles] to help eliminate similar vulnerabilities in other systems; and f. Employ vulnerability monitoring tools that include the capability to readily update the vulnerabilities to be scanned. **RA-5 Additional FedRAMP Requirements and Guidance:** **Guidance**: See the FedRAMP Documents page > Vulnerability Scanning Requirements <https://www.FedRAMP.gov/documents/> **Guidance**: Informational findings from a scanner are detailed as a returned result that holds no vulnerability risk or severity and for FedRAMP does not require an entry onto the POA&M or entry onto the RET during any assessment phase. Warning findings, on the other hand, are given a risk rating (low, moderate, high or critical) by the scanning solution and should be treated like any other finding with a risk or severity rating for tracking purposes onto either the POA&M or RET depending on when the findings originated (during assessments or during monthly continuous monitoring). If a warning is received during scanning, but further validation turns up no actual issue then this item should be categorized as a false positive. If this situation presents itself during an assessment phase (initial assessment, annual assessment or any SCR), follow guidance on how to report false positives in the Security Assessment Report (SAR). If this situation happens during monthly continuous monitoring, a deviation request will need to be submitted per the FedRAMP Vulnerability Deviation Request Form. Warnings are commonly associated with scanning solutions that also perform compliance scans, and if the scanner reports a โ€œwarningโ€ as part of the compliance scanning of a CSO, follow guidance surrounding the tracking of compliance findings during either the assessment phases (initial assessment, annual assessment or any SCR) or monthly continuous monitoring as it applies. Guidance on compliance scan findings can be found by searching on โ€œTracking of Compliance Scansโ€ in FAQs. **(a) Requirement**: an accredited independent assessor scans operating systems/infrastructure, web applications, and databases once annually. 
**(d) Requirement**: If a vulnerability is listed among the CISA Known Exploited Vulnerability (KEV) Catalog (<https://www.cisa.gov/known-exploited-vulnerabilities-catalog>) the KEV remediation date supersedes the FedRAMP parameter requirement. **(e) Requirement**: to include all Authorizing Officials; for JAB authorizations to include FedRAMP.

๐Ÿ’ผ RA-5 VULNERABILITY SCANNING

The organization: RA-5a. Scans for vulnerabilities in the information system and hosted applications [Assignment: organization-defined frequency and/or randomly in accordance with organization-defined process] and when new vulnerabilities potentially affecting the system/applications are identified and reported; RA-5b. Employs vulnerability scanning tools and techniques that facilitate interoperability among tools and automate parts of the vulnerability management process by using standards for: RA-5b.1. Enumerating platforms, software flaws, and improper configurations; RA-5b.2. Formatting checklists and test procedures; and RA-5b.3. Measuring vulnerability impact; RA-5c. Analyzes vulnerability scan reports and results from security control assessments; RA-5d. Remediates legitimate vulnerabilities [Assignment: organization-defined response times] in accordance with an organizational assessment of risk; and RA-5e. Shares information obtained from the vulnerability scanning process and security control assessments with [Assignment: organization-defined personnel or roles] to help eliminate similar vulnerabilities in other information systems (i.e., systemic weaknesses or deficiencies).

๐Ÿ’ผ RA-5(4) Discoverable Information (H)

Determine information about the system that is discoverable and take [FedRAMP Assignment: notify appropriate service provider personnel and follow procedures for organization and service provider-defined corrective actions].

๐Ÿ’ผ RA-5(8) Review Historic Audit Logs (H)

Review historic audit logs to determine if a vulnerability identified in a [Assignment: organization-defined system] has been previously exploited within an [Assignment: organization-defined time period]. **RA-5(8) Additional FedRAMP Requirement:** **Requirement**: This enhancement is required for all high (or critical) vulnerability scan findings.
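
A minimal sketch of a historic log review, assuming the relevant audit logs are retained in CloudWatch Logs; the log group name, the exploit indicator string, and the 90-day lookback are hypothetical placeholders for the organization-defined system and time period.

```python
# A minimal sketch: query CloudWatch Logs Insights for an exploit indicator
# associated with an identified vulnerability. Log group, indicator, and
# lookback window are hypothetical placeholders.
import time
import boto3

logs = boto3.client("logs")

LOG_GROUP = "/example/web-access-logs"   # hypothetical log group
INDICATOR = "/vulnerable/endpoint"       # hypothetical exploit indicator
LOOKBACK_DAYS = 90                       # organization-defined time period

def search_historic_logs():
    """Run a Logs Insights query for the exploit indicator and return results."""
    end = int(time.time())
    start = end - LOOKBACK_DAYS * 24 * 3600
    query_id = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=start,
        endTime=end,
        queryString=(
            'fields @timestamp, @message '
            f'| filter @message like "{INDICATOR}" | limit 100'
        ),
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled"):
            return response["results"]
        time.sleep(2)  # Poll until the query finishes.
```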

๐Ÿ’ผ RA-6 Technical Surveillance Countermeasures Survey

Employ a technical surveillance countermeasures survey at [Assignment: organization-defined locations] [Selection (one or more): [Assignment: organization-defined frequency]; when the following events or indicators occur: [Assignment: organization-defined events or indicators]].

๐Ÿ’ผ RA-6 TECHNICAL SURVEILLANCE COUNTERMEASURES SURVEY

The organization employs a technical surveillance countermeasures survey at [Assignment: organization-defined locations] [Selection (one or more): [Assignment: organization-defined frequency]; [Assignment: organization-defined events or indicators occur]].

๐Ÿ’ผ RA-7 Risk Response

Respond to findings from security and privacy assessments, monitoring, and audits in accordance with organizational risk tolerance.

๐Ÿ’ผ RA-8 Privacy Impact Assessments

Conduct privacy impact assessments for systems, programs, or other activities before: a. Developing or procuring information technology that processes personally identifiable information; and b. Initiating a new collection of personally identifiable information that: 1. Will be processed using information technology; and 2. Includes personally identifiable information permitting the physical or virtual (online) contacting of a specific individual, if identical questions have been posed to, or identical reporting requirements imposed on, ten or more individuals, other than agencies, instrumentalities, or employees of the federal government.

๐Ÿ’ผ RA-9 Criticality Analysis

Identify critical system components and functions by performing a criticality analysis for [Assignment: organization-defined systems, system components, or system services] at [Assignment: organization-defined decision points in the system development life cycle].

๐Ÿ’ผ RA-9 Criticality Analysis (M)(H)

Identify critical system components and functions by performing a criticality analysis for [Assignment: organization-defined systems, system components, or system services] at [Assignment: organization-defined decision points in the system development life cycle].

๐Ÿ’ผ RA-9 Criticality Analysis (M)(H)

Identify critical system components and functions by performing a criticality analysis for [Assignment: organization-defined systems, system components, or system services] at [Assignment: organization-defined decision points in the system development life cycle].

๐Ÿ’ผ RC.CO-03: Recovery activities and progress in restoring operational capabilities are communicated to designated internal and external stakeholders

1. Securely share recovery information, including restoration progress, consistent with response plans and information sharing agreements 2. Regularly update senior leadership on recovery status and restoration progress for major incidents 3. Follow the rules and protocols defined in contracts for incident information sharing between the organization and its suppliers 4. Coordinate crisis communication between the organization and its critical suppliers

๐Ÿ’ผ RC.RP-04: Critical mission functions and cybersecurity risk management are considered to establish post-incident operational norms

1. Use business impact and system categorization records (including service delivery objectives) to validate that essential services are restored in the appropriate order 2. Work with system owners to confirm the successful restoration of systems and the return to normal operations 3. Monitor the performance of restored systems to verify the adequacy of the restoration

๐Ÿ’ผ Region selection

The choice of Region for your workload significantly affects its KPIs, including performance, cost, and carbon footprint. To effectively improve these KPIs, you should choose Regions for your workloads based on both business requirements and sustainability goals.

๐Ÿ’ผ REL01-BP01 Aware of service quotas and constraints

Be aware of your default quotas and manage your quota increase requests for your workload architecture. Know which cloud resource constraints, such as disk or network, are potentially impactful. **Desired outcome** Customers can prevent service degradation or disruption in their AWS accounts by implementing proper guidelines for monitoring key metrics, infrastructure reviews, and automated remediation steps to verify that service quotas and constraints are not reached in ways that could cause service degradation or disruption. **Common anti-patterns** - Deploying a workload without understanding the hard or soft quotas and their limits for the services used. - Deploying a replacement workload without analyzing and reconfiguring the necessary quotas or contacting Support in advance. - Assuming that cloud services have no limits and can be used without consideration of rates, limits, counts, or quantities. - Assuming that quotas will automatically be increased. - Not knowing the process and timeline of quota requests. - Assuming that the default cloud service quota is identical for every service across Regions. - Assuming that service constraints can be breached and the systems will auto-scale or increase the limit beyond the resource's constraints. - Not testing the application at peak traffic in order to stress the utilization of its resources. - Provisioning the resource without analysis of the required resource size. - Overprovisioning capacity by choosing resource types that go well beyond actual need or expected peaks. - Not assessing capacity requirements for new levels of traffic in advance of a new customer event or deploying a new technology. **Benefits of establishing this best practice** Monitoring and automated management of service quotas and resource constraints can proactively reduce failures. Changes in traffic patterns for a customer's service can cause a disruption or degradation if best practices are not followed. By monitoring and managing these values across all Regions and all accounts, applications can have improved resiliency under adverse or unplanned events. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Service Quotas is an AWS service that helps you manage your quotas for over 250 AWS services from one location. Along with looking up the quota values, you can also request and track quota increases from the Service Quotas console or using the AWS SDK. AWS Trusted Advisor offers a service quotas check that displays your usage and quotas for some aspects of some services. The default service quotas for each service are also listed in the AWS documentation for that service (for example, see Amazon VPC Quotas). Some service limits, like rate limits on throttled APIs, are set within Amazon API Gateway itself by configuring a usage plan. Some limits that are set as configuration on their respective services include Provisioned IOPS, Amazon RDS storage allocated, and Amazon EBS volume allocations. Amazon Elastic Compute Cloud has its own service limits dashboard that can help you manage your instance, Amazon Elastic Block Store, and Elastic IP address limits. If you have a use case where service quotas impact your application's performance and they are not adjustable to your needs, then contact Support to see if there are mitigations. Service quotas can be Region-specific or global in nature.
Using an AWS service that reaches its quota will not act as expected in normal usage and may cause service disruption or degradation. For example, a service quota limits the number of DL Amazon EC2 instances used in a Region. That limit may be reached during a traffic scaling event using Auto Scaling groups (ASG). Service quotas for each account should be assessed for usage on a regular basis to determine what the appropriate service limits might be for that account. These service quotas exist as operational guardrails, to prevent accidentally provisioning more resources than you need. They also serve to limit request rates on API operations to protect services from abuse. Service constraints are different from service quotas. Service constraints represent a particular resourceโ€™s limits as defined by that resource type. These might be storage capacity (for example, gp2 has a size limit of 1 GB - 16 TB) or disk throughput. It is essential that a resource typeโ€™s constraint be engineered and constantly assessed for usage that might reach its limit. If a constraint is reached unexpectedly, the accountโ€™s applications or services may be degraded or disrupted. If there is a use case where service quotas impact an applicationโ€™s performance and they cannot be adjusted to required needs, contact Support to see if there are mitigations. For more detail on adjusting fixed quotas, see REL01-BP03 Accommodate fixed service quotas and constraints through architecture. There are a number of AWS services and tools to help monitor and manage Service Quotas. The service and tools should be leveraged to provide automated or manual checks of quota levels. - AWS Trusted Advisor offers a service quota check that displays your usage and quotas for some aspects of some services. It can aid in identifying services that are near quota. - AWS Management Console provides methods to display services quota values, manage, request new quotas, monitor status of quota requests, and display history of quotas. - AWS CLI and CDKs offer programmatic methods to automatically manage and monitor service quota levels and usage. ### Implementation steps **For Service Quotas:** 1. Review AWS Service Quotas. 2. To be aware of your existing service quotas, determine the services (like IAM Access Analyzer) that are used. There are approximately 250 AWS services controlled by service quotas. Then, determine the specific service quota name that might be used within each account and Region. There are approximately 3000 service quota names per Region. 3. Augment this quota analysis with AWS Config to find all AWS resources used in your AWS accounts. 4. Use AWS CloudFormation data to determine your AWS resources used. Look at the resources that were created either in the AWS Management Console or with the list-stack-resources AWS CLI command. You can also see resources configured to be deployed in the template itself. 5. Determine all the services your workload requires by looking at the deployment code. 6. Determine the service quotas that apply. Use the programmatically accessible information from Trusted Advisor and Service Quotas. 7. Establish an automated monitoring method (see REL01-BP02 Manage service quotas across accounts and regions and REL01-BP04 Monitor and manage quotas) to alert and inform if services quotas are near or have reached their limit. 8. 
Establish an automated and programmatic method to check if a service quota has been changed in one Region but not in other Regions in the same account (see REL01-BP02 Manage service quotas across accounts and regions and REL01-BP04 Monitor and manage quotas).
9. Automate scanning application logs and metrics to determine if there are any quota or service constraint errors. If these errors are present, send alerts to the monitoring system.
10. Establish engineering procedures to calculate the required change in quota (see REL01-BP05 Automate quota management) once it has been identified that larger quotas are required for specific services.
11. Create a provisioning and approval workflow to request changes in service quota. This should include an exception workflow in case of a denied or partially approved request.
12. Create an engineering method to review service quotas prior to provisioning and using new AWS services before rolling out to production or loaded environments (for example, a load testing account).

**For service constraints:**

1. Establish monitoring and metrics methods to alert for resources reaching their constraints. Leverage CloudWatch as appropriate for metrics or log monitoring (see the alarm sketch after this list).
2. Establish alert thresholds for each resource that has a constraint that is meaningful to the application or system.
3. Create workflow and infrastructure management procedures to change the resource type if the constraint is near utilization. This workflow should include load testing as a best practice to verify that the new type is the correct resource type with the new constraints.
4. Migrate the identified resource to the recommended new resource type, using existing procedures and processes.
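As one way to implement step 1 above, a CloudWatch alarm can watch a constraint-related metric on a specific resource. This is a minimal sketch, assuming a gp2 volume whose I/O burst balance is the constraint of interest; the volume ID and SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when a gp2 volume's burst credits drop below 20%, which signals the
# volume is operating close to its baseline throughput constraint.
cloudwatch.put_metric_alarm(
    AlarmName="gp2-burst-balance-low-vol-0123456789abcdef0",
    Namespace="AWS/EBS",
    MetricName="BurstBalance",
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=20.0,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:constraint-alerts"],
)
```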

๐Ÿ’ผ REL01-BP02 Manage service quotas across accounts and regions

If you are using multiple accounts or Regions, request the appropriate quotas in all environments in which your production workloads run.

**Desired outcome** Services and applications should not be affected by service quota exhaustion for configurations that span accounts or Regions or that have resilience designs using zone, Region, or account failover.

**Common anti-patterns**

- Allowing resource usage in one isolation Region to grow with no mechanism to maintain capacity in the other ones.
- Manually setting all quotas independently in isolation Regions.
- Not considering the effect of resiliency architectures (like active or passive) on future quota needs during a degradation in the non-primary Region.
- Not evaluating quotas regularly and making necessary changes in every Region and account the workload runs in.
- Not leveraging quota request templates to request increases across multiple Regions and accounts.
- Not updating service quotas due to incorrectly thinking that increasing quotas has cost implications like compute reservation requests.

**Benefits of establishing this best practice** You verify that you can handle your current load in secondary Regions or accounts if regional services become unavailable. This can help reduce the number of errors or levels of degradation that occur during Region loss.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Service quotas are tracked per account. Unless otherwise noted, each quota is AWS Region-specific. In addition to the production environments, also manage quotas in all applicable non-production environments so that testing and development are not hindered. Maintaining a high degree of resiliency requires that service quotas are assessed continually (whether automated or manual). With more workloads spanning Regions due to the implementation of designs using Active/Active, Active/Passive-Hot, Active/Passive-Cold, and Active/Passive-Pilot Light approaches, it is essential to understand all Region and account quota levels. Past traffic patterns are not always a good indicator of whether a service quota is set correctly. Equally important, the limit for a given service quota name is not always the same in every Region. In one Region the value could be five, and in another Region the value could be ten. Management of these quotas must span all the same services, accounts, and Regions to provide consistent resilience under load.

Reconcile all the service quota differences across different Regions (active Region or passive Region) and create processes to continually reconcile these differences. The testing plans of passive Region failovers are rarely scaled to peak active capacity, meaning that game day or tabletop exercises can fail to find differences in service quotas between Regions and then maintain the correct limits. Service quota drift, the condition where the limit for a specific named quota is changed in one Region but not in all Regions, is very important to track and assess. Consider changing the quota in every Region that carries traffic or could potentially carry traffic.

- Select relevant accounts and Regions based on your service requirements, latency, regulatory, and disaster recovery (DR) requirements.
- Identify service quotas across all relevant accounts, Regions, and Availability Zones. The limits are scoped to account and Region. These values should be compared for differences.

### Implementation steps
1. Review service quota values that might have exceeded a risky level of usage. AWS Trusted Advisor provides alerts for 80% and 90% threshold breaches.
2. Review values for service quotas in any passive Regions (in an Active/Passive design). Verify that load will successfully run in secondary Regions in the event of a failure in the primary Region.
3. Automate assessing if any service quota drift has occurred between Regions in the same account, and act accordingly to change the limits (see the drift-check sketch after this list).
4. If the customer Organizational Units (OU) are structured in the supported manner, service quota templates should be updated to reflect changes in any quotas that should be applied to multiple Regions and accounts.
   - Create a template and associate Regions to the quota change.
   - Review all existing service quota templates for any changes required (Region, limits, and accounts).
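One way to automate the drift check in step 3 is to read the applied value of the same quota from every Region and flag mismatches. This is a minimal sketch, assuming credentials valid in all listed Regions; the Region list and quota code (the same illustrative EC2 example used earlier) are placeholders.

```python
import boto3

REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]   # Regions that do or could carry traffic
SERVICE, QUOTA = "ec2", "L-1216C47A"                # illustrative quota code

values = {}
for region in REGIONS:
    sq = boto3.client("service-quotas", region_name=region)
    values[region] = sq.get_service_quota(
        ServiceCode=SERVICE, QuotaCode=QUOTA
    )["Quota"]["Value"]

if len(set(values.values())) > 1:
    # Quota drift: the same named quota differs between Regions in this account.
    print("Drift detected:", values)
else:
    print("No drift:", values)
```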

๐Ÿ’ผ REL01-BP03 Accommodate fixed service quotas and constraints through architecture

Be aware of unchangeable service quotas, service constraints, and physical resource limits. Design architectures for applications and services to prevent these limits from impacting reliability. Examples include network bandwidth, serverless function invocation payload size, the throttle burst rate of an API gateway, and concurrent user connections to a database.

**Desired outcome** The application or service performs as expected under normal and high traffic conditions. It has been designed to work within the limitations of that resource's fixed constraints or service quotas.

**Common anti-patterns**

- Choosing a design that uses a resource of a service, unaware that there are design constraints that will cause this design to fail as you scale.
- Performing benchmarking that is unrealistic and will reach fixed service quotas during the testing. For example, running tests at a burst limit but for an extended amount of time.
- Choosing a design that cannot scale or be modified if fixed service quotas are to be exceeded. For example, an SQS payload size of 256 KB.
- Observability has not been designed and implemented to monitor and alert on thresholds for service quotas that might be at risk during high traffic events.

**Benefits of establishing this best practice** Verifying that the application will run under all projected service load levels without disruption or degradation.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Unlike soft service quotas or resources that can be replaced with higher capacity units, AWS services' fixed quotas cannot be changed. This means that all these types of AWS services must be evaluated for potential hard capacity limits when used in an application design. Hard limits are shown in the Service Quotas console; if the column shows ADJUSTABLE = No, the service has a hard limit (a programmatic check is sketched at the end of this entry). Hard limits are also shown in some resources' configuration pages. For example, Lambda has specific hard limits that cannot be adjusted.

As an example, when designing a Python application to run in a Lambda function, the application should be evaluated to determine if there is any chance of Lambda running longer than 15 minutes. If the code may run longer than this service quota limit, alternate technologies or designs must be considered. If this limit is reached after production deployment, the application will suffer degradation and disruption until it can be remediated. Unlike soft quotas, there is no method to change these limits even under emergency Severity 1 events.

Once the application has been deployed to a testing environment, strategies should be used to find out whether any hard limits can be reached. Stress testing, load testing, and chaos testing should be part of the introduction test plan.

### Implementation steps

1. Review the complete list of AWS services that could be used in the application design phase.
2. Review the soft quota limits and hard quota limits for all these services. Not all limits are shown in the Service Quotas console. Some services describe these limits in alternate locations.
3. As you design your application, review your workload's business and technology drivers, such as business outcomes, use case, dependent systems, availability targets, and disaster recovery objectives. Let your business and technology drivers guide the process to identify the distributed system that is right for your workload.
4. Analyze service load across Regions and accounts.
Many hard limits are regionally based for services. However, some limits are account based. 5. Analyze resilience architectures for resource usage during a zonal failure and Regional failure. In the progression of multi-Region designs using active/active, active/passive โ€“ hot, active/passive - cold, and active/passive - pilot light approaches, these failure cases will cause higher usage. This creates a potential use case for hitting hard limits.
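To find which of a service's quotas are fixed (step 2 above), the Service Quotas API exposes an `Adjustable` flag on each quota. A minimal sketch, assuming the Lambda service code `lambda`; in a real review you would repeat this per service used by the workload.

```python
import boto3

sq = boto3.client("service-quotas")

# Walk the default quotas for one service and report the ones that cannot be
# raised (these show as ADJUSTABLE = No in the Service Quotas console).
fixed, token = [], None
while True:
    kwargs = {"ServiceCode": "lambda", "MaxResults": 100}
    if token:
        kwargs["NextToken"] = token
    page = sq.list_aws_default_service_quotas(**kwargs)
    for quota in page["Quotas"]:
        if not quota["Adjustable"]:
            fixed.append((quota["QuotaName"], quota["Value"], quota.get("Unit", "")))
    token = page.get("NextToken")
    if not token:
        break

for name, value, unit in fixed:
    print(f"Fixed quota: {name} = {value} {unit}")
```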

๐Ÿ’ผ REL01-BP04 Monitor and manage quotas

Evaluate your potential usage and increase your quotas appropriately, allowing for planned growth in usage.

**Desired outcome** Active and automated systems that manage and monitor quotas have been deployed. These operational solutions detect when quota usage thresholds are close to being reached and proactively remediate them by requesting quota changes.

**Common anti-patterns**

- Not configuring monitoring to check for service quota thresholds.
- Not configuring monitoring for hard limits, even though those values cannot be changed.
- Assuming that the amount of time required to request and secure a soft quota change is immediate or a short period.
- Configuring alarms for when service quotas are being approached, but having no process for responding to an alert.
- Only configuring alarms for services supported by AWS Service Quotas and not monitoring other AWS services.
- Not considering quota management for multi-Region resiliency designs, like active/active, active/passive-hot, active/passive-cold, and active/passive-pilot light approaches.
- Not assessing quota differences between Regions.
- Not assessing the needs in every Region for a specific quota increase request.
- Not leveraging templates for multi-Region quota management.

**Benefits of establishing this best practice** Automatic tracking of AWS Service Quotas and monitoring your usage against those quotas will allow you to see when you are approaching a quota limit. You can also use this monitoring data to help limit any degradations due to quota exhaustion.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

For supported services, you can monitor your quotas by configuring various services that can assess usage and then send alerts or alarms. This can aid in monitoring usage and can alert you to approaching quotas. These alarms can be invoked from AWS Config, Lambda functions, Amazon CloudWatch, or AWS Trusted Advisor (see the usage-metric alarm sketch at the end of this entry). You can also use metric filters on CloudWatch Logs to search and extract patterns in logs to determine if usage is approaching quota thresholds.

### Implementation steps

**For monitoring:**

1. Capture current resource consumption (for example, buckets or instances). Use service API operations, such as the Amazon EC2 DescribeInstances API, to collect current resource consumption.
2. Capture your current quotas that are essential and applicable to the services, using:
   - AWS Service Quotas
   - AWS Trusted Advisor
   - AWS documentation
   - AWS service-specific pages
   - AWS Command Line Interface (AWS CLI)
   - AWS Cloud Development Kit (AWS CDK)
3. Use AWS Service Quotas, an AWS service that helps you manage your quotas for over 250 AWS services from one location.
4. Use Trusted Advisor service limits to monitor your current service limits at various thresholds.
5. Use the service quota history (console or AWS CLI) to check on Regional increases.
6. Compare service quota changes in each Region and each account to create equivalency, if required.

**For management:**

1. Automated: Set up an AWS Config custom rule to scan service quotas across Regions and compare for differences.
2. Automated: Set up a scheduled Lambda function to scan service quotas across Regions and compare for differences.
3. Manual: Scan service quotas through the AWS CLI, API, or AWS Management Console across Regions and compare for differences. Report the differences.
4. If differences in quotas are identified between Regions, request a quota change, if required.
5. Review the result of all requests.
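For supported services, CloudWatch publishes usage metrics in the AWS/Usage namespace, and the SERVICE_QUOTA() metric math function can turn them into a percentage-of-quota alarm. This is a minimal sketch, assuming the quota of interest is Running On-Demand Standard EC2 instances (vCPU usage); the SNS topic ARN is a placeholder and the dimension values differ per service and quota.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

usage_metric = {
    "Namespace": "AWS/Usage",
    "MetricName": "ResourceCount",
    "Dimensions": [
        {"Name": "Service", "Value": "EC2"},
        {"Name": "Resource", "Value": "vCPU"},
        {"Name": "Type", "Value": "Resource"},
        {"Name": "Class", "Value": "Standard/OnDemand"},
    ],
}

cloudwatch.put_metric_alarm(
    AlarmName="ec2-vcpu-usage-over-80-percent-of-quota",
    # m1 is raw usage; e1 converts it to a percentage of the applied quota.
    Metrics=[
        {"Id": "m1", "ReturnData": False,
         "MetricStat": {"Metric": usage_metric, "Period": 300, "Stat": "Maximum"}},
        {"Id": "e1", "ReturnData": True,
         "Expression": "(m1 / SERVICE_QUOTA(m1)) * 100",
         "Label": "Percent of quota used"},
    ],
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:quota-alerts"],
)
```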

๐Ÿ’ผ REL01-BP05 Automate quota management

Service quotas, also referred to as limits in AWS services, are the maximum values for the resources in your AWS account. Each AWS service defines a set of quotas and their default values. To provide your workload access to all the resources it needs, you might need to increase your service quota values. Growth in workload consumption of AWS resources can threaten workload stability and impact the user experience if quotas are exceeded. Implement tools to alert you when your workload approaches the limits and consider creating quota increase requests automatically. **Desired outcome** Quotas are appropriately configured for the workloads running in each AWS account and Region. **Common anti-patterns** - You fail to consider and adjust quotas appropriately to meet workload requirements. - You track quotas and usage using methods that can become outdated, such as with spreadsheets. - You only update service limits on periodic schedules. - Your organization lacks operational processes to review existing quotas and request service quota increases when necessary. **Benefits of establishing this best practice** - Enhanced workload resiliency: You prevent errors caused by exceeding AWS resource quotas. - Simplified disaster recovery: You can reuse automated quota management mechanisms built in the primary Region during DR setup in another AWS Region. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance View current quotas and track ongoing quota consumption through mechanisms such as AWS Service Quotas console, AWS Command Line Interface (AWS CLI), and AWS SDKs. You can also integrate your configuration management databases (CMDB) and IT service management (ITSM) systems with the AWS Service Quota APIs. Generate automated alerts if quota usage reaches your defined thresholds, and define a process for submitting quota increase requests when you receive alerts. If the underlying workload is critical to your business, you can automate quota increase requests, but carefully test the automation to avoid the risk of runaway action such as a growth feedback loop. Smaller quota increases are often automatically approved. Larger quota requests may need to be manually processed by AWS support and can take additional time to review and process. Allow for additional time to process multiple requests or large increase requests. ### Implementation steps 1. Implement automated monitoring of service quotas, and issue alerts if your workload's resource utilization growth approaches your quota limits. For example, Quota Monitor for AWS can provide automated monitoring of service quotas. This tool integrates with AWS Organizations and deploys using Cloudformation StackSets so that new accounts are automatically monitored on creation. 2. Use features such as Service Quotas request templates or AWS Control Tower to simplify Service Quotas setup for new accounts. 3. Build dashboards of your current service quota use across all AWS accounts and regions and reference them as necessary to prevent exceeding your quotas. Trusted Advisor Organizational (TAO) Dashboard, part of the Cloud Intelligence Dashboards, can get you quickly started with such a dashboard. 4. Track service limit increase requests. Consolidated Insights from Multiple Accounts (CIMA) can provide an Organization-level view of all your requests. 5. Test alert generation and any quota increase request automation by setting lower quota thresholds in non-production accounts. 
Do not conduct these tests in a production account.
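Where an increase is warranted, the request itself can be automated and then tracked to completion. The following is a minimal sketch, assuming an upstream alert has already decided an increase is needed; the quota code is the same illustrative EC2 example, and a real implementation should cap growth per request to avoid the runaway feedback loops noted above.

```python
import boto3

sq = boto3.client("service-quotas")
SERVICE, QUOTA = "ec2", "L-1216C47A"   # illustrative quota code

current = sq.get_service_quota(ServiceCode=SERVICE, QuotaCode=QUOTA)["Quota"]["Value"]
desired = current * 1.5                # bounded growth per request

req = sq.request_service_quota_increase(
    ServiceCode=SERVICE, QuotaCode=QUOTA, DesiredValue=desired
)["RequestedQuota"]
print("Submitted request:", req["Id"], req["Status"])

# Track the request until AWS approves, denies, or escalates it to a support case.
history = sq.list_requested_service_quota_change_history_by_quota(
    ServiceCode=SERVICE, QuotaCode=QUOTA
)
for item in history["RequestedQuotas"]:
    print(item["Id"], item["Status"], item["DesiredValue"])
```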

๐Ÿ’ผ REL01-BP06 Ensure that a sufficient gap exists between the current quotas and the maximum usage to accommodate failover

This article explains how to maintain space between the resource quota and your usage, and how it can benefit your organization. After you finish using a resource, the usage quota may continue to account for that resource. This can result in a failing or inaccessible resource. Prevent resource failure by verifying that your quotas cover the overlap of inaccessible resources and their replacements. Consider cases like network failure, Availability Zone failure, or Region failure when calculating this gap.

**Desired outcome** Small or large failures in resources or resource accessibility can be covered within the current service thresholds. Zone failures, network failures, or even Regional failures have been considered in the resource planning.

**Common anti-patterns**

- Setting service quotas based on current needs without accounting for failover scenarios.
- Not considering the principles of static stability when calculating the peak quota for a service.
- Not considering the potential of inaccessible resources when calculating the total quota needed for each Region.
- Not considering AWS service fault isolation boundaries for some services and their potential abnormal usage patterns.

**Benefits of establishing this best practice** When service disruption events impact application availability, use the cloud to implement strategies to recover from these events. An example strategy is creating additional resources to replace inaccessible resources to accommodate failover conditions without exhausting your service limit.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

When evaluating a quota limit, consider failover cases that might occur due to some degradation. Consider the following failover cases:

- A disrupted or inaccessible VPC.
- An inaccessible subnet.
- A degraded Availability Zone that impacts resource accessibility.
- Networking routes or ingress and egress points are blocked or changed.
- A degraded Region that impacts resource accessibility.
- A subset of resources affected by a failure in a Region or an Availability Zone.

The decision to fail over is unique for each situation, as the business impact can vary. Address resource capacity planning in the failover location and the resources' quotas before deciding to fail over an application or service. Consider higher than normal peaks of activity when reviewing quotas for each service. These peaks might be related to resources that are inaccessible due to networking or permissions, but are still active. Unterminated active resources count against the service quota limit.

### Implementation steps

1. Maintain space between your service quota and your maximum usage to accommodate a failover or loss of accessibility.
2. Determine your service quotas. Account for typical deployment patterns, availability requirements, and consumption growth.
3. Request quota increases if necessary. Anticipate a wait time for the quota increase request.
4. Determine your reliability requirements (also known as your number of nines).
5. Understand potential fault scenarios such as loss of a component, an Availability Zone, or a Region.
6. Establish your deployment methodology (examples include canary, blue/green, red/black, and rolling).
7. Include an appropriate buffer on top of the current quota limit. An example buffer could be 15% (see the gap-calculation sketch after this list).
8. Include calculations for static stability (zonal and Regional) where appropriate.
9. Plan consumption growth and monitor your consumption trends.
10. Consider the static stability impact for your most critical workloads. Assess resources conforming to a statically stable system in all Regions and Availability Zones.
11. Consider using On-Demand Capacity Reservations to schedule capacity ahead of any failover. This is a useful strategy for critical business schedules to reduce the risk of not obtaining the correct quantity and type of resources during failover.
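The gap calculation itself is simple arithmetic once usage and quota are known: projected failover usage plus a buffer must stay below the applied quota. This is a minimal sketch, assuming a zonal-failover scenario in which the surviving Availability Zones absorb the load of a lost one; the numbers and quota code are illustrative.

```python
import boto3

sq = boto3.client("service-quotas")

quota = sq.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")["Quota"]["Value"]

current_vcpus = 900          # observed peak usage (illustrative)
azs_in_use = 3               # load spread evenly across three AZs
buffer = 0.15                # example 15% safety buffer

# Static stability for a zonal failure: remaining AZs absorb the lost AZ's share.
failover_usage = current_vcpus * azs_in_use / (azs_in_use - 1)
required = failover_usage * (1 + buffer)

print(f"Quota {quota}, required with failover and buffer {required:.0f}")
if required > quota:
    print("Insufficient gap: request a quota increase before relying on failover.")
```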

๐Ÿ’ผ REL02-BP01 Use highly available network connectivity for your workload public endpoints

Building highly available network connectivity to public endpoints of your workloads can help you reduce downtime due to loss of connectivity and improve the availability and SLA of your workload. To achieve this, use highly available DNS, content delivery networks (CDNs), API gateways, load balancing, or reverse proxies. **Desired outcome** It is critical to plan, build, and operationalize highly available network connectivity for your public endpoints. If your workload becomes unreachable due to a loss in connectivity, even if your workload is running and available, your customers will see your system as down. By combining the highly available and resilient network connectivity for your workloadโ€™s public endpoints, along with a resilient architecture for your workload itself, you can provide the best possible availability and service level for your customers. AWS Global Accelerator, Amazon CloudFront, Amazon API Gateway, AWS Lambda Function URLs, AWS AppSync APIs, and Elastic Load Balancing (ELB) all provide highly available public endpoints. Amazon Route 53 provides a highly available DNS service for domain name resolution to verify that your public endpoint addresses can be resolved. You can also evaluate AWS Marketplace software appliances for load balancing and proxying. **Common anti-patterns** - Designing a highly available workload without planning out DNS and network connectivity for high availability. - Using public internet addresses on individual instances or containers and managing the connectivity to them with DNS. - Using IP addresses instead of domain names for locating services. - Not testing out scenarios where connectivity to your public endpoints is lost. - Not analyzing network throughput needs and distribution patterns. - Not testing and planning for scenarios where internet network connectivity to your public endpoints of your workload might be interrupted. - Providing content (like web pages, static assets, or media files) to a large geographic area and not using a content delivery network. - Not planning for distributed denial of service (DDoS) attacks. DDoS attacks risk shutting out legitimate traffic and lowering availability for your users. **Benefits of establishing this best practice** Designing for highly available and resilient network connectivity ensures that your workload is accessible and available to your users. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance At the core of building highly available network connectivity to your public endpoints is the routing of the traffic. To verify your traffic is able to reach the endpoints, the DNS must be able to resolve the domain names to their corresponding IP addresses. Use a highly available and scalable Domain Name System (DNS) such as Amazon Route 53 to manage your domainโ€™s DNS records. You can also use health checks provided by Amazon Route 53. The health checks verify that your application is reachable, available, and functional, and they can be set up in a way that they mimic your userโ€™s behavior, such as requesting a web page or a specific URL. In case of failure, Amazon Route 53 responds to DNS resolution requests and directs the traffic to only healthy endpoints. You can also consider using Geo DNS and Latency Based Routing capabilities offered by Amazon Route 53. To verify that your workload itself is highly available, use Elastic Load Balancing (ELB). 
Amazon Route 53 can be used to target traffic to ELB, which distributes the traffic to the target compute instances. You can also use Amazon API Gateway along with AWS Lambda for a serverless solution. Customers can also run workloads in multiple AWS Regions. With multi-site active/active pattern, the workload can serve traffic from multiple Regions. With a multi-site active/passive pattern, the workload serves traffic from the active region while data is replicated to the secondary region and becomes active in the event of a failure in the primary region. Route 53 health checks can then be used to control DNS failover from any endpoint in a primary Region to an endpoint in a secondary Region, verifying that your workload is reachable and available to your users. Amazon CloudFront provides a simple API for distributing content with low latency and high data transfer rates by serving requests using a network of edge locations around the world. Content delivery networks (CDNs) serve customers by serving content located or cached at a location near to the user. This also improves availability of your application as the load for content is shifted away from your servers over to CloudFrontโ€™s edge locations. The edge locations and regional edge caches hold cached copies of your content close to your viewers resulting in quick retrieval and increasing reachability and availability of your workload. For workloads with users spread out geographically, AWS Global Accelerator helps you improve the availability and performance of the applications. AWS Global Accelerator provides Anycast static IP addresses that serve as a fixed entry point to your application hosted in one or more AWS Regions. This allows traffic to ingress onto the AWS global network as close to your users as possible, improving reachability and availability of your workload. AWS Global Accelerator also monitors the health of your application endpoints by using TCP, HTTP, and HTTPS health checks. Any changes in the health or configuration of your endpoints permit redirection of user traffic to healthy endpoints that deliver the best performance and availability to your users. In addition, AWS Global Accelerator has a fault-isolating design that uses two static IPv4 addresses that are serviced by independent network zones increasing the availability of your applications. To help protect customers from DDoS attacks, AWS provides AWS Shield Standard. Shield Standard comes automatically turned on and protects from common infrastructure (layer 3 and 4) attacks like SYN/UDP floods and reflection attacks to support high availability of your applications on AWS. For additional protections against more sophisticated and larger attacks (like UDP floods), state exhaustion attacks (like TCP SYN floods), and to help protect your applications running on Amazon Elastic Compute Cloud (Amazon EC2), Elastic Load Balancing (ELB), Amazon CloudFront, AWS Global Accelerator, and Route 53, you can consider using AWS Shield Advanced. For protection against Application layer attacks like HTTP POST or GET floods, use AWS WAF. AWS WAF can use IP addresses, HTTP headers, HTTP body, URI strings, SQL injection, and cross-site scripting conditions to determine if a request should be blocked or allowed. ### Implementation steps 1. Set up highly available DNS: Amazon Route 53 is a highly available and scalable domain name system (DNS) web service. Route 53 connects user requests to internet applications running on AWS or on-premises. 2. 
Set up health checks: When using Route 53, verify that only healthy targets are resolvable. Start by creating Route 53 health checks and configuring DNS failover (see the sketch after this list). The following aspects are important to consider when setting up health checks:
   - How Amazon Route 53 determines whether a health check is healthy
   - Creating, updating, and deleting health checks
   - Monitoring health check status and getting notifications
   - Best practices for Amazon Route 53 DNS
3. Connect your DNS service to your endpoints.
   - When using Elastic Load Balancing as a target for your traffic, create an alias record using Amazon Route 53 that points to your load balancer's regional endpoint. During the creation of the alias record, set the Evaluate target health option to Yes.
   - For serverless workloads or private APIs when API Gateway is used, use Route 53 to direct traffic to API Gateway.
4. Decide on a content delivery network.
   - For delivering content using edge locations closer to the user, start by understanding how CloudFront delivers content.
   - Get started with a simple CloudFront distribution. CloudFront then knows where you want the content to be delivered from, and the details about how to track and manage content delivery. The following aspects are important to understand and consider when setting up a CloudFront distribution:
     - How caching works with CloudFront edge locations
     - Increasing the proportion of requests that are served directly from the CloudFront caches (cache hit ratio)
     - Using Amazon CloudFront Origin Shield
     - Optimizing high availability with CloudFront origin failover
5. Set up application layer protection: AWS WAF helps you protect against common web exploits and bots that can affect availability, compromise security, or consume excessive resources. To get a deeper understanding, review how AWS WAF works, and when you are ready to implement protections against application layer HTTP POST and GET floods, review Getting started with AWS WAF. You can also use AWS WAF with CloudFront; see the documentation on how AWS WAF works with Amazon CloudFront features.
6. Set up additional DDoS protection: By default, all AWS customers receive protection from common, most frequently occurring network and transport layer DDoS attacks that target your web site or application with AWS Shield Standard at no additional charge. For additional protection of internet-facing applications running on Amazon EC2, Elastic Load Balancing, Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53, you can consider AWS Shield Advanced and review examples of DDoS-resilient architectures. To protect your workload and your public endpoints from DDoS attacks, review Getting started with AWS Shield Advanced.
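As a sketch of step 2, the Route 53 API can create an endpoint health check and attach it to a failover alias record so that DNS answers only point at healthy targets. This is minimal and illustrative: the hosted zone ID, domain name, ALB alias target, and health check path are placeholders.

```python
import uuid
import boto3

route53 = boto3.client("route53")

# Health check that probes the primary Region's public endpoint.
hc_id = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "primary.example.com",
        "ResourcePath": "/health",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)["HealthCheck"]["Id"]

# PRIMARY failover record: Route 53 returns this target only while the health check passes.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000000EXAMPLE",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "A",
            "SetIdentifier": "primary",
            "Failover": "PRIMARY",
            "HealthCheckId": hc_id,
            "AliasTarget": {
                "HostedZoneId": "Z35SXDOTRQ7X7K",  # example ALB hosted zone ID (us-east-1)
                "DNSName": "my-alb-123456789.us-east-1.elb.amazonaws.com",
                "EvaluateTargetHealth": True,
            },
        },
    }]},
)
```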

๐Ÿ’ผ REL02-BP02 Provision redundant connectivity between private networks in the cloud and on-premises environments

Implement redundancy in your connections between private networks in the cloud and on-premises environments to achieve connectivity resilience. This can be accomplished by deploying two or more links and traffic paths, preserving connectivity in the event of network failures. **Common anti-patterns** - You depend on just one network connection, which creates a single point of failure. - You use only one VPN tunnel or multiple tunnels that end in the same Availability Zone. - You rely on one ISP for VPN connectivity, which can lead to complete failures during ISP outages. - Not implementing dynamic routing protocols like BGP, which are crucial for rerouting traffic during network disruptions. - You ignore the bandwidth limitations of VPN tunnels and overestimate their backup capabilities. **Benefits of establishing this best practice** By implementing redundant connectivity between your cloud environment and your corporate or on-premises environment, the dependent services between the two environments can communicate reliably. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance When using AWS Direct Connect to connect your on-premises network to AWS, you can achieve maximum network resiliency (SLA of 99.99%) by using separate connections that end on distinct devices in more than one on-premises location and more than one AWS Direct Connect location. This topology offers resilience against device failures, connectivity issues, and complete location outages. Alternatively, you can achieve high resiliency (SLA of 99.9%) by using two individual connections to multiple locations (each on-premises location connected to a single Direct Connect location). This approach protects against connectivity disruptions caused by fiber cuts or device failures and helps mitigate complete location failures. The AWS Direct Connect Resiliency Toolkit can assist in designing your AWS Direct Connect topology. You can also consider AWS Site-to-Site VPN ending on an AWS Transit Gateway as a cost-effective backup to your primary AWS Direct Connect connection. This setup enables equal-cost multipath (ECMP) routing across multiple VPN tunnels, allowing for throughput of up to 50Gbps, even though each VPN tunnel is capped at 1.25 Gbps. It's important to note, however, that AWS Direct Connect is still the most effective choice for minimizing network disruptions and providing stable connectivity. When using VPNs over the internet to connect your cloud environment to your on-premises data center, configure two VPN tunnels as part of a single site-to-site VPN connection. Each tunnel should end in a different Availability Zone for high availability and use redundant hardware to prevent on-premises device failure. Additionally, consider multiple internet connections from various internet service providers (ISPs) at your on-premises location to avoid complete VPN connectivity disruption due to a single ISP outage. Selecting ISPs with diverse routing and infrastructure, especially those with separate physical paths to AWS endpoints, provides high connectivity availability. In addition to physical redundancy with multiple AWS Direct Connect connections and multiple VPN tunnels (or a combination of both), implementing Border Gateway Protocol (BGP) dynamic routing is also crucial. Dynamic BGP provides automatic rerouting of traffic from one path to another based on real-time network conditions and configured policies. 
This dynamic behavior is especially beneficial in maintaining network availability and service continuity in the event of link or network failures. It quickly selects alternative paths, enhancing the network's resilience and reliability. ### Implementation steps 1. Acquire highly-available connectivity between AWS and your on-premises environment. - Use multiple AWS Direct Connect connections or VPN tunnels between separately deployed private networks. - Use multiple AWS Direct Connect locations for high availability. - If using multiple AWS Regions, create redundancy in at least two of them. 2. Use AWS Transit Gateway, when possible, to end your VPN connection. 3. Evaluate AWS Marketplace appliances to end VPNs or extend your SD-WAN to AWS. If you use AWS Marketplace appliances, deploy redundant instances for high availability in different Availability Zones. 4. Provide a redundant connection to your on-premises environment. - You may need redundant connections to multiple AWS Regions to achieve your availability needs. - Use the AWS Direct Connect Resiliency Toolkit to get started.
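As an illustration of the Site-to-Site VPN backup path described above, the EC2 API can create a BGP-based VPN connection that terminates on a transit gateway; each connection provides two tunnels that should be spread across separate customer devices. A minimal sketch, assuming an existing transit gateway; the on-premises public IP, ASN, and IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Represent the on-premises VPN device (public IP and BGP ASN are placeholders).
cgw_id = ec2.create_customer_gateway(
    BgpAsn=65010, PublicIp="203.0.113.10", Type="ipsec.1"
)["CustomerGateway"]["CustomerGatewayId"]

# Dynamic (BGP) Site-to-Site VPN terminating on an existing transit gateway, so
# traffic can reroute automatically if the primary Direct Connect path fails.
vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw_id,
    Type="ipsec.1",
    TransitGatewayId="tgw-0123456789abcdef0",
    Options={"StaticRoutesOnly": False},  # use BGP rather than static routes
)["VpnConnection"]

print("VPN connection:", vpn["VpnConnectionId"])
```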

๐Ÿ’ผ REL02-BP03 Ensure IP subnet allocation accounts for expansion and availability

Amazon VPC IP address ranges must be large enough to accommodate workload requirements, including factoring in future expansion and allocation of IP addresses to subnets across Availability Zones. This includes load balancers, EC2 instances, and container-based applications. When you plan your network topology, the first step is to define the IP address space itself. Private IP address ranges (following RFC 1918 guidelines) should be allocated for each VPC. Accommodate the following requirements as part of this process:

- Allow IP address space for more than one VPC per Region.
- Within a VPC, allow space for multiple subnets so that you can cover multiple Availability Zones.
- Consider leaving unused CIDR block space within a VPC for future expansion.
- Ensure that there is IP address space to meet the needs of any transient fleets of Amazon EC2 instances that you might use, such as Spot Fleets for machine learning, Amazon EMR clusters, or Amazon Redshift clusters. Similar consideration should be given to Kubernetes clusters, such as Amazon Elastic Kubernetes Service (Amazon EKS), as each Kubernetes pod is assigned a routable address from the VPC CIDR block by default.
- Note that the first four IP addresses and the last IP address in each subnet CIDR block are reserved and not available for your use.
- Note that the initial VPC CIDR block allocated to your VPC cannot be changed or deleted, but you can add additional non-overlapping CIDR blocks to the VPC. Subnet IPv4 CIDRs cannot be changed, however IPv6 CIDRs can.
- The largest possible VPC CIDR block is a /16, and the smallest is a /28.
- Consider other connected networks (VPC, on-premises, or other cloud providers) and ensure non-overlapping IP address space. For more information, see REL02-BP05 Enforce non-overlapping private IP address ranges in all private address spaces where they are connected.

**Desired outcome** A scalable IP subnet allocation can help you accommodate future growth and avoid unnecessary waste.

**Common anti-patterns**

- Failing to consider future growth, resulting in CIDR blocks that are too small and requiring reconfiguration, potentially causing downtime.
- Incorrectly estimating how many IP addresses an elastic load balancer can use.
- Deploying many high traffic load balancers into the same subnets.
- Using automated scaling mechanisms whilst failing to monitor IP address consumption.
- Defining excessively large CIDR ranges well beyond future growth expectations, which can lead to difficulty peering with other networks with overlapping address ranges.

**Benefits of establishing this best practice** This ensures that you can accommodate the growth of your workloads and continue to provide availability as you scale up.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Plan your network to accommodate growth, regulatory compliance, and integration with others. Growth can be underestimated, regulatory compliance can change, and acquisitions or private network connections can be difficult to implement without proper planning.

- Select relevant AWS accounts and Regions based on your service requirements, latency, regulatory, and disaster recovery (DR) requirements.
- Identify your needs for regional VPC deployments.
- Identify the size of the VPCs.
- Determine if you are going to deploy multi-VPC connectivity.
  - What Is a Transit Gateway?
  - Single Region Multi-VPC Connectivity
- Determine if you need segregated networking for regulatory requirements.
- Make VPCs with appropriately-sized CIDR blocks to accommodate your current and future needs. - If you have unknown growth projections, you may wish to err on the side of larger CIDR blocks to reduce the potential for future reconfiguration. - Consider using IPv6 addressing for subnets as part of a dual-stack VPC. IPv6 is well suited to being used in private subnets containing fleets of ephemeral instances or containers that would otherwise require large numbers of IPv4 addresses.
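For the planning itself, Python's standard `ipaddress` module is enough to check how a candidate VPC CIDR divides into per-AZ subnets and how many usable addresses remain once the five AWS-reserved addresses per subnet are excluded. A minimal sketch with illustrative numbers.

```python
import ipaddress

vpc_cidr = ipaddress.ip_network("10.20.0.0/16")
az_count = 3
subnet_prefix = 20                      # one /20 per AZ, leaving room for future tiers

subnets = list(vpc_cidr.subnets(new_prefix=subnet_prefix))
for az, subnet in zip(["a", "b", "c"], subnets[:az_count]):
    usable = subnet.num_addresses - 5   # AWS reserves the first 4 and the last address
    print(f"AZ {az}: {subnet} -> {usable} usable addresses")

spare = subnets[az_count:]
print(f"{len(spare)} spare /{subnet_prefix} blocks kept for expansion")
```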

๐Ÿ’ผ REL02-BP04 Prefer hub-and-spoke topologies over many-to-many mesh

When connecting multiple private networks, such as Virtual Private Clouds (VPCs) and on-premises networks, opt for a hub-and-spoke topology over a meshed one. Unlike meshed topologies, where each network connects directly to the others and increases the complexity and management overhead, the hub-and-spoke architecture centralizes connections through a single hub. This centralization simplifies the network structure and enhances its operability, scalability, and control. AWS Transit Gateway is a managed, scalable, and highly-available service designed for construction of hub-and-spoke networks on AWS. It serves as the central hub of your network that provides network segmentation, centralized routing, and the simplified connection to both cloud and on-premises environments. **Desired outcome** - You have connected your Virtual Private Clouds (VPCs) and on-premises networks through a central hub. - You configure your peering connections through the hub, which acts as a highly scalable cloud router. - Routing is simplified because you do not have to work with complex peering relationships. - Traffic between networks is encrypted, and you have the ability to isolate networks. **Common anti-patterns** - You build complex network peering rules. - You provide routes between networks that should not communicate with one another (for example, separate workloads that have no interdependencies). - There is ineffective governance of the hub instance. **Benefits of establishing this best practice** As the number of connected networks increases, management and expansion of meshed connectivity becomes increasingly challenging. A mesh architecture introduces additional challenges, such as additional infrastructure components, configuration requirements, and deployment considerations. The mesh also introduces additional overhead to manage and monitor the data plane and control plane components. A hub-and-spoke model establishes centralized traffic routing across multiple networks, simplifying management and monitoring of the data plane and control plane components. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Create a Network Services account if one does not exist. Place the hub in the organization's Network Services account for central management by network engineers. The hub acts as a virtual router for traffic flowing between your VPCs and on-premises networks, reducing network complexity and simplifying troubleshooting. Consider your network design, including the VPCs, AWS Direct Connect, and Site-to-Site VPN connections you want to interconnect. Use a separate subnet for each Transit Gateway VPC attachment. Use small CIDRs (for example /28) for flexibility in compute resources. Create one network ACL and associate it with all hub subnets. Keep the ACL open for both inbound and outbound traffic. Design routing tables to provide routes only between networks that should communicate. Omit routes for networks that should remain isolated. ### Implementation steps 1. Plan your network: Determine which networks to connect and verify that CIDR ranges do not overlap. 2. Create an AWS Transit Gateway and attach your VPCs. 3. If needed, create VPN connections or Direct Connect gateways, and associate them with the Transit Gateway. 4. Configure Transit Gateway route tables to define how traffic is routed between the connected VPCs and other connections. 5. 
Use Amazon CloudWatch to monitor performance and adjust configurations as necessary for optimization and cost efficiency.
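A sketch of steps 2 through 4: create the transit gateway hub and attach a spoke VPC, using small dedicated attachment subnets as recommended above. The IDs are placeholders, route table design is left to the isolation model you choose, and in practice the transit gateway must reach the available state before attachments succeed.

```python
import boto3

ec2 = boto3.client("ec2")

# Central hub. Disabling default association and propagation keeps routing explicit,
# so only networks that should communicate receive routes (an assumption of this sketch).
tgw_id = ec2.create_transit_gateway(
    Description="network-services-hub",
    Options={
        "DefaultRouteTableAssociation": "disable",
        "DefaultRouteTablePropagation": "disable",
    },
)["TransitGateway"]["TransitGatewayId"]

# Attach a spoke VPC through small, dedicated attachment subnets (for example /28s),
# one per Availability Zone. Wait for the transit gateway to become available first.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0aaa1111bbbb2222c", "subnet-0ddd3333eeee4444f"],
)
```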

๐Ÿ’ผ REL02-BP05 Enforce non-overlapping private IP address ranges in all private address spaces where they are connected

The IP address ranges of each of your VPCs must not overlap when peered, connected via Transit Gateway, or connected over VPN. Avoid IP address conflicts between a VPC and on-premises environments or with other cloud providers that you use. You must also have a way to allocate private IP address ranges when needed. An IP address management (IPAM) system can help with automating this. **Desired outcome** - No IP address range conflicts between VPCs, on-premises environments, or other cloud providers. - Proper IP address management allows for easier scaling of network infrastructure to accommodate growth and changes in network requirements. **Common anti-patterns** - Using the same IP range in your VPC as you have on premises, in your corporate network, or other cloud providers. - Not tracking IP ranges of VPCs used to deploy your workloads. - Relying on manual IP address management processes, such as spreadsheets. - Over- or under-sizing CIDR blocks, which results in IP address waste or insufficient address space for your workload. **Benefits of establishing this best practice** Active planning of your network will ensure that you do not have multiple occurrences of the same IP address in interconnected networks. This prevents routing problems from occurring in parts of the workload that are using the different applications. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Make use of an IPAM, such as the Amazon VPC IP Address Manager, to monitor and manage your CIDR use. Several IPAMs are also available from the AWS Marketplace. Evaluate your potential usage on AWS, add CIDR ranges to existing VPCs, and create VPCs to allow planned growth in usage. ### Implementation steps 1. Capture current CIDR consumption (for example, VPCs and subnets). 1. Use service API operations to collect current CIDR consumption. 2. Use the Amazon VPC IP Address Manager to discover resources. 2. Capture your current subnet usage. 1. Use service API operations to collect subnets per VPC in each Region. 2. Use the Amazon VPC IP Address Manager to discover resources. 3. Record the current usage. 4. Determine if you created any overlapping IP ranges. 5. Calculate the spare capacity. 6. Identify overlapping IP ranges. You can either migrate to a new range of addresses or consider using techniques like private NAT Gateway or AWS PrivateLink if you need to connect the overlapping ranges.
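A lightweight version of steps 1 through 4 and 6: pull the CIDR associations from every VPC in scope and flag overlaps with the `ipaddress` module. A minimal sketch, assuming a single account; the Region list and on-premises range are placeholders.

```python
import itertools
import ipaddress
import boto3

REGIONS = ["us-east-1", "eu-west-1"]                          # placeholder scope
ranges = [("on-premises", ipaddress.ip_network("10.0.0.0/16"))]

for region in REGIONS:
    ec2 = boto3.client("ec2", region_name=region)
    for vpc in ec2.describe_vpcs()["Vpcs"]:
        for assoc in vpc.get("CidrBlockAssociationSet", []):
            ranges.append((f"{region}/{vpc['VpcId']}",
                           ipaddress.ip_network(assoc["CidrBlock"])))

# Report any pair of connected networks whose CIDRs overlap.
for (name_a, net_a), (name_b, net_b) in itertools.combinations(ranges, 2):
    if net_a.overlaps(net_b):
        print(f"Overlap: {name_a} {net_a} <-> {name_b} {net_b}")
```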

๐Ÿ’ผ REL03-BP01 Choose how to segment your workload

Workload segmentation is important when determining the resilience requirements of your application. Monolithic architecture should be avoided whenever possible. Instead, carefully consider which application components can be broken out into microservices. Depending on your application requirements, this may end up being a combination of a service-oriented architecture (SOA) with microservices where possible. Workloads that are capable of statelessness are better suited to deployment as microservices.

**Desired outcome** Workloads should be supportable, scalable, and as loosely coupled as possible. When making choices about how to segment your workload, balance the benefits against the complexities. What is right for a new product racing to first launch is different from what a workload built to scale from the start needs. When refactoring an existing monolith, you will need to consider how well the application will support a decomposition towards statelessness. Breaking services into smaller pieces allows small, well-defined teams to develop and manage them. However, smaller services can introduce complexities which include possible increased latency, more complex debugging, and increased operational burden.

**Common anti-patterns**

- The microservice Death Star is a situation in which the atomic components become so highly interdependent that a failure of one results in a much larger failure, making the components as rigid and fragile as a monolith.

**Benefits of establishing this best practice**

- More specific segments lead to greater agility, organizational flexibility, and scalability.
- Reduced impact of service interruptions.
- Application components may have different availability requirements, which can be supported by a more atomic segmentation.
- Well-defined responsibilities for teams supporting the workload.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Choose your architecture type based on how you will segment your workload. Choose an SOA or microservices architecture (or in some rare cases, a monolithic architecture). Even if you choose to start with a monolith architecture, you must ensure that it's modular and can ultimately evolve to SOA or microservices as your product scales with user adoption. SOA and microservices offer respectively smaller segmentation, which is preferred as a modern scalable and reliable architecture, but there are trade-offs to consider, especially when deploying a microservice architecture. One primary trade-off is that you now have a distributed compute architecture that can make it harder to achieve user latency requirements, and there is additional complexity in the debugging and tracing of user interactions. You can use AWS X-Ray to assist you in solving this problem. Another effect to consider is increased operational complexity as you increase the number of applications that you are managing, which requires the deployment of multiple independent components.

### Implementation steps

- Determine the appropriate architecture to refactor or build your application. SOA and microservices offer respectively smaller segmentation, which is preferred as a modern scalable and reliable architecture. SOA can be a good compromise for achieving smaller segmentation while avoiding some of the complexities of microservices.
- If your workload is amenable to it, and your organization can support it, you should use a microservices architecture to achieve the best agility and reliability.
- Consider following the Strangler Fig pattern to refactor a monolith into smaller components. This involves gradually replacing specific application components with new applications and services. AWS Migration Hub Refactor Spaces acts as the starting point for incremental refactoring. - Implementing microservices may require a service discovery mechanism to allow these distributed services to communicate with each other. AWS App Mesh can be used with service-oriented architectures to provide reliable discovery and access of services. AWS Cloud Map can also be used for dynamic, DNS-based service discovery. - If youโ€™re migrating from a monolith to SOA, Amazon MQ can help bridge the gap as a service bus when redesigning legacy applications in the cloud. - For existing monoliths with a single, shared database, choose how to reorganize the data into smaller segments. This could be by business unit, access pattern, or data structure. At this point in the refactoring process, you should choose to move forward with a relational or non-relational (NoSQL) type of database. **Level of effort for the implementation plan:** High

๐Ÿ’ผ REL03-BP02 Build services focused on specific business domains and functionality

Service-oriented architectures (SOA) define services with well-delineated functions defined by business needs. Microservices use domain models and bounded context to draw service boundaries along business context boundaries. Focusing on business domains and functionality helps teams define independent reliability requirements for their services. Bounded contexts isolate and encapsulate business logic, allowing teams to better reason about how to handle failures. **Desired outcome** Engineers and business stakeholders jointly define bounded contexts and use them to design systems as services that fulfill specific business functions. These teams use established practices like event storming to define requirements. New applications are designed as services with well-defined boundaries and loose coupling. Existing monoliths are decomposed into bounded contexts and system designs move towards SOA or microservice architectures. When monoliths are refactored, established approaches like bubble contexts and monolith decomposition patterns are applied. Domain-oriented services are executed as one or more processes that donโ€™t share state. They independently respond to fluctuations in demand and handle fault scenarios in light of domain-specific requirements. **Common anti-patterns** - Teams are formed around specific technical domains like UI and UX, middleware, or database instead of specific business domains. - Applications span domain responsibilities. Services that span bounded contexts can be more difficult to maintain, require larger testing efforts, and require multiple domain teams to participate in software updates. - Domain dependencies, like domain entity libraries, are shared across services such that changes for one service domain require changes to other service domains. - Service contracts and business logic donโ€™t express entities in a common and consistent domain language, resulting in translation layers that complicate systems and increase debugging efforts. **Benefits of establishing this best practice** Applications are designed as independent services bounded by business domains and use a common business language. Services are independently testable and deployable. Services meet domain-specific resiliency requirements for the domain implemented. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Domain-driven design (DDD) is the foundational approach of designing and building software around business domains. Itโ€™s helpful to work with an existing framework when building services focused on business domains. When working with existing monolithic applications, you can take advantage of decomposition patterns that provide established techniques to modernize applications into services. ### Implementation steps 1. Teams can hold event storming workshops to quickly identify events, commands, aggregates, and domains in a lightweight sticky note format. 2. Once domain entities and functions have been formed in a domain context, you can divide your domain into services using bounded context, where entities that share similar features and attributes are grouped together. With the model divided into contexts, a template for how to boundary microservices emerges. - For example, the Amazon.com website entities might include package, delivery, schedule, price, discount, and currency. - Package, delivery, and schedule are grouped into the shipping context, while price, discount, and currency are grouped into the pricing context. 4. 
Decomposing monoliths into microservices outlines patterns for refactoring microservices. Using patterns for decomposition by business capability, subdomain, or transaction aligns well with domain-driven approaches. 5. Tactical techniques such as the bubble context allow you to introduce DDD in existing or legacy applications without up-front rewrites and full commitments to DDD. In a bubble context approach, a small bounded context is established using a service mapping and coordination, or anti-corruption layer, which protects the newly defined domain model from external influences. After teams have performed domain analysis and defined entities and service contracts, they can take advantage of AWS services to implement their domain-driven design as cloud-based services. 1. Start your development by defining tests that exercise business rules of your domain. Test-driven development (TDD) and behavior-driven development (BDD) help teams keep services focused on solving business problems. 2. Select the AWS services that best meet your business domain requirements and microservice architecture: - AWS Serverless allows your team to focus on specific domain logic instead of managing servers and infrastructure. - Containers at AWS simplify the management of your infrastructure, so you can focus on your domain requirements. - Purpose-built databases help you match your domain requirements to the best fit database type. 3. Building hexagonal architectures on AWS outlines a framework to build business logic into services working backwards from a business domain to fulfill functional requirements and then attach integration adapters. Patterns that separate interface details from business logic with AWS services help teams focus on domain functionality and improve software quality.

๐Ÿ’ผ REL03-BP03 Provide service contracts per API

Service contracts are documented agreements between API producers and consumers defined in a machine-readable API definition. A contract versioning strategy allows consumers to continue using the existing API and migrate their applications to a newer API when they are ready. Producer deployment can happen any time as long as the contract is followed. Service teams can use the technology stack of their choice to satisfy the API contract. **Desired outcome** Applications built with service-oriented or microservice architectures are able to operate independently while having integrated runtime dependency. Changes deployed to an API consumer or producer do not interrupt the stability of the overall system when both sides follow a common API contract. Components that communicate over service APIs can perform independent functional releases, upgrades to runtime dependencies, or fail over to a disaster recovery (DR) site with little or no impact to each other. Discrete services are able to independently scale, absorbing resource demand without requiring other services to scale in unison. **Common anti-patterns** - Creating service APIs without strongly typed schemas. This results in APIs that cannot be used to generate API bindings and payloads that canโ€™t be programmatically validated. - Not adopting a versioning strategy, which forces API consumers to update and release or fail when service contracts evolve. - Error messages that leak details of the underlying service implementation rather than describe integration failures in the domain context and language. - Not using API contracts to develop test cases and mock API implementations to allow for independent testing of service components. **Benefits of establishing this best practice** Distributed systems composed of components that communicate over API service contracts can improve reliability. Developers can catch potential issues early in the development process with type checking during compilation to verify that requests and responses follow the API contract and required fields are present. API contracts provide a clear self-documenting interface for APIs and provide better interoperability between different systems and programming languages. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Once you have identified business domains and determined your workload segmentation, you can develop your service APIs. First, define machine-readable service contracts for APIs, and then implement an API versioning strategy. When you are ready to integrate services over common protocols like REST, GraphQL, or asynchronous events, you can incorporate AWS services into your architecture to integrate your components with strongly-typed API contracts. ### AWS services for service API contracts Incorporate AWS services including Amazon API Gateway, AWS AppSync, and Amazon EventBridge into your architecture to use API service contracts in your application. Amazon API Gateway helps you integrate directly with native AWS services and other web services. API Gateway supports the OpenAPI specification and versioning. AWS AppSync is a managed GraphQL endpoint you configure by defining a GraphQL schema to define a service interface for queries, mutations, and subscriptions. Amazon EventBridge uses event schemas to define events and generate code bindings for your events. ### Implementation steps 1. First, define a contract for your API. 
A contract will express the capabilities of an API as well as define strongly typed data objects and fields for the API input and output. 2. When you configure APIs in API Gateway, you can import and export OpenAPI Specifications for your endpoints. - Importing an OpenAPI definition simplifies the creation of your API and can be integrated with AWS infrastructure as code tools like the AWS Serverless Application Model and AWS Cloud Development Kit (AWS CDK). - Exporting an API definition simplifies integrating with API testing tools and provides service consumers with an integration specification. 3. You can define and manage GraphQL APIs with AWS AppSync by defining a GraphQL schema file to generate your contract interface and simplify interaction with complex REST models, multiple database tables, or legacy services. 4. AWS Amplify projects that are integrated with AWS AppSync generate strongly typed JavaScript query files for use in your application as well as an AWS AppSync GraphQL client library for Amazon DynamoDB tables. 5. When you consume service events from Amazon EventBridge, events adhere to schemas that already exist in the schema registry or that you define with the OpenAPI Spec. With a schema defined in the registry, you can also generate client bindings from the schema contract to integrate your code with events. 6. Extend or version your API. Extending an API is a simpler option when adding fields that can be configured with optional fields or default values for required fields. - JSON-based contracts for protocols like REST and GraphQL can be a good fit for contract extension. - XML-based contracts for protocols like SOAP should be tested with service consumers to determine the feasibility of contract extension. 7. When versioning an API, consider implementing proxy versioning where a facade is used to support versions so that logic can be maintained in a single codebase. - With API Gateway, you can use request and response mappings to simplify absorbing contract changes by establishing a facade to provide default values for new fields or to strip removed fields from a request or response. With this approach, the underlying service can maintain a single codebase.
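
As a minimal sketch of the contract-first workflow above, assuming a local `openapi.yaml` file and default AWS credentials, the following Python (Boto3) example imports an OpenAPI definition into Amazon API Gateway and deploys it to an illustrative `prod` stage.

```python
# Sketch: import a machine-readable contract into API Gateway and deploy it.
import boto3

apigw = boto3.client("apigateway")

with open("openapi.yaml", "rb") as f:  # assumed local OpenAPI definition
    spec = f.read()

# Fail on specification warnings so contract problems surface early.
api = apigw.import_rest_api(failOnWarnings=True, body=spec)

# Deploy the imported API to a stage so consumers can call the versioned contract.
apigw.create_deployment(restApiId=api["id"], stageName="prod")
print(f"Deployed API {api['id']} to stage prod")
```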

๐Ÿ’ผ REL04-BP01 Identify the kind of distributed systems you depend on

Distributed systems can be synchronous, asynchronous, or batch. Synchronous systems must process requests as quickly as possible and communicate with each other by making synchronous request and response calls using HTTP/S, REST, or remote procedure call (RPC) protocols. Asynchronous systems communicate with each other by exchanging data asynchronously through an intermediary service without coupling individual systems. Batch systems receive a large volume of input data, run automated data processes without human intervention, and generate output data. **Desired outcome** Design a workload that effectively interacts with synchronous, asynchronous, and batch dependencies. **Common anti-patterns** - Workload waits indefinitely for a response from its dependencies, which could lead to workload clients timing out, not knowing if their request has been received. - Workload uses a chain of dependent systems that call each other synchronously. This requires each system to be available and to successfully process a request before the whole chain can succeed, leading to potentially brittle behavior and reduced overall availability. - Workload communicates with its dependencies asynchronously and relies on the concept of exactly-once guaranteed delivery of messages, even though it is often still possible to receive duplicate messages. - Workload does not use proper batch scheduling tools and allows concurrent execution of the same batch job. **Benefits of establishing this best practice** It is common for a given workload to implement one or more styles of communication between synchronous, asynchronous, and batch. This best practice helps you identify the different trade-offs associated with each style of communication so that your workload can tolerate disruptions in any of its dependencies. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance The following sections contain both general and specific implementation guidance for each kind of dependency. ### General guidance - Make sure that the performance and reliability service-level objectives (SLOs) that your dependencies offer meet the performance and reliability requirements of your workload. - Use AWS observability services to monitor response times and error rates to make sure your dependency is providing service at the levels needed by your workload. - Identify the potential challenges that your workload may face when communicating with its dependencies. Distributed systems come with a wide range of challenges that might increase architectural complexity, operational burden, and cost. Common challenges include latency, network disruptions, data loss, scaling, and data replication lag. - Implement robust error handling and logging to help you troubleshoot problems when your dependency experiences issues. ### Synchronous dependency In synchronous communications, your workload sends a request to its dependency and blocks the operation waiting for a response. When its dependency receives the request, it tries to handle it as soon as possible and sends a response back to your workload. A significant challenge with synchronous communication is that it causes temporal coupling, which requires your workload and its dependencies to be available at the same time. When your workload needs to communicate synchronously with its dependencies, consider the following guidance: - Your workload should not rely on multiple synchronous dependencies to perform a single function. 
This chain of dependencies increases overall brittleness because all dependencies in the pathway need to be available in order for the request to complete successfully. - When a dependency is unhealthy or unavailable, determine your error handling and retry strategies. Avoid using bimodal behavior. Bimodal behavior is when your workload exhibits different behavior under normal and failure modes. For more details on bimodal behavior, see REL11-BP05 Use static stability to prevent bimodal behavior. - Keep in mind that failing fast is better than making your workload wait. For instance, the AWS Lambda Developer Guide describes how to handle retries and failures when you invoke Lambda functions. - Set timeouts when your workload calls its dependency. This technique avoids waiting too long or waiting indefinitely for a response. For helpful discussion of this issue, see Tuning AWS Java SDK HTTP request settings for latency-aware Amazon DynamoDB applications. - Minimize the number of calls made from your workload to its dependency to fulfill a single request. Having chatty calls between them increases coupling and latency. ### Asynchronous dependency To temporally decouple your workload from its dependency, they should communicate asynchronously. Using an asynchronous approach, your workload can continue with any other processing without having to wait for its dependency, or chain of dependencies, to send a response. When your workload needs to communicate asynchronously with its dependency, consider the following guidance: - Determine whether to use messaging or event streaming based on your use case and requirements. Messaging allows your workload to communicate with its dependency by sending and receiving messages through a message broker. Event streaming allows your workload and its dependency to use a streaming service to publish and subscribe to events, delivered as continuous streams of data, that need to be processed as soon as possible. - Messaging and event streaming handle messages differently, so you need to make trade-off decisions based on: - Message priority: message brokers can process high-priority messages ahead of normal messages. In event streaming, all messages have the same priority. - Message consumption: message brokers ensure that consumers receive the message. Event streaming consumers must keep track of the last message they have read. - Message ordering: with messaging, receiving messages in the exact order they are sent is not guaranteed unless you use a first-in-first-out (FIFO) approach. Event streaming always preserves the order in which the data was produced. - Message deletion: with messaging, the consumer must delete the message after processing it. The event streaming service appends the message to a stream, where it remains until the message's retention period expires. This deletion policy makes event streaming suitable for replaying messages. - Define how your workload knows when its dependency completes its work. For instance, when your workload invokes a Lambda function asynchronously, Lambda places the event in a queue and returns a success response without additional information. After processing is complete, the Lambda function can send the result to a destination, configurable based on success or failure. - Build your workload to handle duplicate messages by leveraging idempotency. Idempotency means that the results of your workload do not change even if the same message is processed more than once. 
Messaging or streaming services will redeliver a message if a network failure occurs or if an acknowledgement has not been received. - If your workload does not get a response from its dependency, it needs to resubmit the request. Consider limiting the number of retries to preserve your workload's CPU, memory, and network resources to handle other requests. The AWS Lambda documentation shows how to handle errors for asynchronous invocation. - Leverage suitable observability, debugging, and tracing tools to manage and operate your workload's asynchronous communication with its dependency. You can use Amazon CloudWatch to monitor messaging and event streaming services. You can also instrument your workload with AWS X-Ray to quickly gain insights for troubleshooting problems. ### Batch dependency Batch systems take input data, initiate a series of jobs to process it, and produce some output data, without manual intervention. Depending on the data size, jobs could run from minutes to, in some cases, several days. When your workload communicates with its batch dependency, consider the following guidance: - Define the time window when your workload should run the batch job. Your workload can set up a recurrence pattern to invoke a batch system, for example, every hour or at the end of every month. - Determine the location of the data input and the processed data output. Choose a storage service, such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), or Amazon FSx for Lustre, that allows your workload to read and write files at scale. - If your workload needs to invoke multiple batch jobs, you could leverage AWS Step Functions to simplify the orchestration of batch jobs that run in AWS or on-premises. This sample project demonstrates orchestration of batch jobs using Step Functions, AWS Batch, and Lambda. - Monitor batch jobs to look for abnormalities, such as a job taking longer than it should to complete. You could use tools like CloudWatch Container Insights to monitor AWS Batch environments and jobs. When an abnormality is detected, your workload can stop the next job from beginning and inform the relevant staff of the exception.
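
As a minimal sketch of interacting with a batch dependency, assuming a hypothetical AWS Batch job queue and job definition already exist, the following Python example submits a job and polls its status rather than blocking a synchronous request path on it.

```python
# Sketch: submit an AWS Batch job and poll for completion.
import time

import boto3

batch = boto3.client("batch")

job = batch.submit_job(
    jobName="nightly-report",          # illustrative name
    jobQueue="reporting-queue",        # assumed to exist
    jobDefinition="reporting-job:1",   # assumed to exist
)

# Poll for a terminal state; a production workload would typically use
# Step Functions or EventBridge rules instead of a tight polling loop.
while True:
    status = batch.describe_jobs(jobs=[job["jobId"]])["jobs"][0]["status"]
    if status in ("SUCCEEDED", "FAILED"):
        print(f"Job {job['jobId']} finished with status {status}")
        break
    time.sleep(30)
```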

๐Ÿ’ผ REL04-BP02 Implement loosely coupled dependencies

Dependencies such as queuing systems, streaming systems, workflows, and load balancers are loosely coupled. Loose coupling helps isolate behavior of a component from other components that depend on it, increasing resiliency and agility. Decoupling dependencies, such as queuing systems, streaming systems, and workflows, helps minimize the impact of changes or failure on a system. This separation isolates a component's behavior from affecting others that depend on it, improving resilience and agility. In tightly coupled systems, changes to one component can necessitate changes in other components that rely on it, resulting in degraded performance across all components. Loose coupling breaks this dependency so that dependent components only need to know the versioned and published interface. Implementing loose coupling between dependencies isolates a failure in one from impacting another. Loose coupling allows you to modify code or add features to a component while minimizing risk to other components that depend on it. It also allows for granular resilience at a component level, where you can scale out or even change the underlying implementation of the dependency. To further improve resiliency through loose coupling, make component interactions asynchronous where possible. This model is suitable for any interaction that does not need an immediate response and where an acknowledgment that a request has been registered will suffice. It involves one component that generates events and another that consumes them. The two components do not integrate through direct point-to-point interaction but usually through an intermediate durable storage layer, such as an Amazon SQS queue, a streaming data platform such as Amazon Kinesis, or AWS Step Functions. Amazon SQS queues and AWS Step Functions are just two ways to add an intermediate layer for loose coupling. Event-driven architectures can also be built in the AWS Cloud using Amazon EventBridge, which can abstract clients (event producers) from the services they rely on (event consumers). Amazon Simple Notification Service (Amazon SNS) is an effective solution when you need high-throughput, push-based, many-to-many messaging. Using Amazon SNS topics, your publisher systems can fan out messages to a large number of subscriber endpoints for parallel processing. While queues offer several advantages, in most hard real-time systems, requests older than a threshold time (often seconds) should be considered stale (the client has given up and is no longer waiting for a response), and not processed. This way, more recent (and likely still valid) requests can be processed instead. **Desired outcome:** Implementing loosely coupled dependencies allows you to minimize the surface area for failure to a component level, which helps diagnose and resolve issues. It also simplifies development cycles, allowing teams to implement changes at a modular level without affecting the performance of other components that depend on it. This approach provides the capability to scale out at a component level based on resource needs, as well as to improve the utilization of individual components, contributing to cost-effectiveness. **Common anti-patterns:** - Deploying a monolithic workload. - Directly invoking APIs between workload tiers with no capability of failover or asynchronous processing of the request. - Tight coupling using shared data. 
Loosely coupled systems should avoid sharing data through shared databases or other forms of tightly coupled data storage, which can reintroduce tight coupling and hinder scalability. - Ignoring back pressure. Your workload should have the ability to slow down or stop incoming data when a component can't process it at the same rate. **Benefits of establishing this best practice:** Loose coupling helps isolate behavior of a component from other components that depend on it, increasing resiliency and agility. Failure in one component is isolated from others. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implement loosely coupled dependencies. There are various solutions that allow you to build loosely coupled applications. These include fully managed queues, automated workflows, event-driven services, and managed APIs, among others, which help isolate the behavior of components from the components that depend on them, increasing resilience and agility. - Build event-driven architectures: Amazon EventBridge helps you build loosely coupled and distributed event-driven architectures. - Implement queues in distributed systems: You can use Amazon Simple Queue Service (Amazon SQS) to integrate and decouple distributed systems. - Containerize components as microservices: Microservices allow teams to build applications composed of small independent components which communicate over well-defined APIs. Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) can help you get started faster with containers. - Manage workflows with Step Functions: Step Functions help you coordinate multiple AWS services into flexible workflows. - Leverage publish-subscribe (pub/sub) messaging architectures: Amazon Simple Notification Service (Amazon SNS) provides message delivery from publishers to subscribers (also known as producers and consumers). ### Implementation steps - Components in an event-driven architecture are initiated by events. Events are actions that happen in a system, such as a user adding an item to a cart. When an action is successful, an event is generated that actuates the next component of the system. - Distributed messaging systems have three main parts that need to be implemented for a queue-based architecture. They include components of the distributed system, the queue that is used for decoupling (distributed on Amazon SQS servers), and the messages in the queue. A typical system has producers that place messages on the queue and consumers that receive messages from the queue. The queue stores messages across multiple Amazon SQS servers for redundancy. - Microservices, when well-utilized, enhance maintainability and boost scalability, as loosely coupled components are managed by independent teams. They also allow behaviors to be isolated to a single component when changes occur. - With AWS Step Functions, you can build distributed applications, automate processes, and orchestrate microservices, among other things. The orchestration of multiple components into an automated workflow allows you to decouple dependencies in your application.
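
As a minimal sketch of producer-side decoupling, assuming a hypothetical custom event bus named `orders-bus`, the following Python example publishes an event to Amazon EventBridge; the producer has no knowledge of which rules or consumers react to it.

```python
# Sketch: publish a domain event and let EventBridge rules route it to consumers.
import json

import boto3

events = boto3.client("events")

events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",        # assumed custom bus
            "Source": "com.example.orders",      # illustrative source
            "DetailType": "OrderPlaced",
            "Detail": json.dumps({"order_id": "1234", "total_cents": 2599}),
        }
    ]
)
```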

๐Ÿ’ผ REL04-BP03 Do constant work

Systems can fail when there are large, rapid changes in load. For example, if your workload is doing a health check that monitors the health of thousands of servers, it should send the same size payload (a full snapshot of the current state) each time. Whether no servers are failing, or all of them, the health check system is doing constant work with no large, rapid changes. For example, if the health check system is monitoring 100,000 servers, the load on it is nominal under the normally light server failure rate. However, if a major event makes half of those servers unhealthy, then the health check system would be overwhelmed trying to update notification systems and communicate state to its clients. So instead the health check system should send the full snapshot of the current state each time. 100,000 server health states, each represented by a bit, would only be a 12.5-KB payload. Whether no servers are failing, or all of them are, the health check system is doing constant work, and large, rapid changes are not a threat to the system stability. This is actually how Amazon Route 53 handles health checks for endpoints (such as IP addresses) to determine how end users are routed to them. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance - Do constant work so that systems do not fail when there are large, rapid changes in load. - Implement loosely coupled dependencies. Dependencies such as queuing systems, streaming systems, workflows, and load balancers are loosely coupled. Loose coupling helps isolate behavior of a component from other components that depend on it, increasing resiliency and agility.
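
As a minimal sketch of the constant-work pattern described above, the following Python example publishes a fixed-size health snapshot to an illustrative Amazon S3 bucket on every cycle, so the payload and the work performed stay the same whether zero or all servers are unhealthy.

```python
# Sketch: a fixed-size snapshot keeps the health-check system's load constant.
import boto3

NUM_SERVERS = 100_000  # 100,000 one-bit states is roughly a 12.5-KB payload


def publish_snapshot(healthy_flags: list[bool]) -> None:
    assert len(healthy_flags) == NUM_SERVERS
    snapshot = bytearray((NUM_SERVERS + 7) // 8)  # same size every time
    for i, healthy in enumerate(healthy_flags):
        if healthy:
            snapshot[i // 8] |= 1 << (i % 8)
    boto3.client("s3").put_object(
        Bucket="health-snapshots",   # assumed bucket
        Key="current-state",         # consumers always read the same object
        Body=bytes(snapshot),
    )


publish_snapshot([True] * NUM_SERVERS)
```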

๐Ÿ’ผ REL04-BP04 Make mutating operations idempotent

An idempotent service promises that each request is processed exactly once, such that making multiple identical requests has the same effect as making a single request. This makes it easier for a client to implement retries without fear that a request is erroneously processed multiple times. To do this, clients can issue API requests with an idempotency token, which is used whenever the request is repeated. An idempotent service API uses the token to return a response identical to the response that was returned the first time that the request was completed, even if the underlying state of the system has changed. In a distributed system, it is relatively simple to perform an action at most once (client makes only one request) or at least once (keep requesting until client gets confirmation of success). It is more difficult to guarantee an action is performed exactly once, such that making multiple identical requests has the same effect as making a single request. Using idempotency tokens in APIs, services can receive a mutating request one or more times without the need to create duplicate records or side effects. **Desired outcome:** You have a consistent, well-documented, and widely adopted approach for ensuring idempotency across all components and services. **Common anti-patterns:** - You apply idempotency indiscriminately, even when not needed. - You introduce overly complex logic for implementing idempotency. - You use timestamps as keys for idempotency. This can cause inaccuracies due to clock skew or due to multiple clients that use the same timestamps to apply changes. - You store entire payloads for idempotency. In this approach, you save complete data payloads for every request and overwrite it at each new request. This can degrade performance and affect scalability. - You generate keys inconsistently across services. Without consistent keys, services may fail to recognize duplicate requests, which results in unintended results. **Benefits of establishing this best practice:** - Greater scalability: The system can handle retries and duplicate requests without having to perform additional logic or complex state management. - Enhanced reliability: Idempotency helps services handle multiple identical requests in a consistent manner, which reduces the risk of unintended side effects or duplicate records. This is especially crucial in distributed systems, where network failures and retries are common. - Improved data consistency: Because the same request produces the same response, idempotency helps maintain data consistency across distributed systems. This is essential to maintain the integrity of transactions and operations. - Error handling: Idempotency tokens make error handling more straightforward. If a client does not receive a response due to an issue, it can safely resend the request with the same idempotency token. - Operational transparency: Idempotency allows for better monitoring and logging. Services can log requests with their idempotency tokens, which makes it easier to trace and debug issues. - Simplified API contract: It can simplify the contract between the client and server side systems and reduce the fear of erroneous data processing. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance In a distributed system, performing an action at most once (the client makes only one request) or at least once (the client keeps requesting until success is confirmed) is relatively straightforward. 
However, it's challenging to implement exactly-once behavior. To achieve this, your clients should generate and provide an idempotency token for each request. By using idempotency tokens, a service can distinguish between new requests and repeated ones. When a service receives a request with an idempotency token, it checks if the token has already been used. If the token has been used, the service retrieves and returns the stored response. If the token is new, the service processes the request, stores the response along with the token, and then returns the response. This mechanism makes all responses idempotent, which enhances the reliability and consistency of the distributed system. Idempotency is also an important behavior of event-driven architectures. These architectures are typically backed by a message queue such as Amazon SQS, Amazon MQ, Amazon Kinesis Data Streams, or Amazon Managed Streaming for Apache Kafka (MSK). In some circumstances, a message that was published only once may be accidentally delivered more than once. When a publisher generates and includes idempotency tokens in messages, consumers can detect duplicates so that processing a repeated message doesn't result in a repeated action. Consumers should keep track of each token received and ignore messages that contain duplicate tokens. Services and consumers should also pass the received idempotency token to any downstream services that they call. Every downstream service in the processing chain is similarly responsible for making sure that idempotency is implemented to avoid the side effect of processing a message more than once. ### Implementation steps 1. **Identify idempotent operations** Determine which operations require idempotency. These typically include POST, PUT, and DELETE HTTP methods and database insert, update, or delete operations. Operations that do not mutate state, such as read-only queries, usually do not require idempotency unless they have side effects. 2. **Use unique identifiers** Include a unique token in each idempotent operation request sent by the sender, either directly in the request or as part of its metadata (for example, an HTTP header). This allows the receiver to recognize and handle duplicate requests or operations. Identifiers commonly used for tokens include Universally Unique Identifiers (UUIDs) and K-Sortable Unique Identifiers (KSUIDs). 3. **Track and manage state** Maintain the state of each operation or request in your workload. This can be achieved by storing the idempotency token and the corresponding state (such as pending, completed, or failed) in a database, cache, or other persistent store. This state information allows the workload to identify and handle duplicate requests or operations. Maintain consistency and atomicity by using appropriate concurrency control mechanisms if needed, such as locks, transactions, or optimistic concurrency controls. This includes the process of recording the idempotent token and running all mutating operations associated with servicing the request. This helps prevent race conditions and verifies that idempotent operations run correctly. Regularly remove old idempotency tokens from the datastore to manage storage and performance. If your storage system supports it, consider using expiration timestamps for data (often known as time to live, or TTL values). The likelihood of idempotency token reuse diminishes over time. 
Common AWS storage options typically used for storing idempotency tokens and related state include: - **Amazon DynamoDB:** DynamoDB is a NoSQL database service that provides low-latency performance and high availability, which makes it well-suited for the storage of idempotency-related data. The key-value and document data model of DynamoDB allows for efficient storage and retrieval of idempotency tokens and associated state information. DynamoDB can also expire idempotency tokens automatically if your application sets a TTL value when it inserts them. - **Amazon ElastiCache:** ElastiCache can store idempotency tokens with high throughput, low latency, and at low cost. Both ElastiCache (Redis) and ElastiCache (Memcached) can also expire idempotency tokens automatically if your application sets a TTL value when it inserts them. - **Amazon Relational Database Service (RDS):** You can use Amazon RDS to store idempotency tokens and related state information, especially if your application already uses a relational database for other purposes. - **Amazon Simple Storage Service (S3):** Amazon S3 is a highly scalable and durable object storage service that can be used to store idempotency tokens and related metadata. The versioning capabilities of S3 can be particularly useful for maintenance of the state of idempotent operations. The choice of storage service typically depends on factors such as the volume of idempotency-related data, the required performance characteristics, the need for durability and availability, and how the idempotency mechanism integrates with the overall workload architecture. 4. **Implement idempotent operations** Design your API and workload components to be idempotent. Incorporate idempotency checks into your workload components. Before you process a request or perform an operation, check if the unique identifier has already been processed. If it has, return the previous result instead of executing the operation again. For example, if a client sends a request to create a user, check if a user with the same unique identifier already exists. If the user exists, it should return the existing user information instead of creating a new one. Similarly, if a queue consumer receives a message with a duplicate idempotency token, the consumer should ignore the message. Create comprehensive test suites that validate the idempotency of requests. They should cover a wide range of scenarios, such as successful requests, failed requests, and duplicate requests. If your workload leverages AWS Lambda functions, consider Powertools for AWS Lambda. Powertools for AWS Lambda is a developer toolkit that helps implement serverless best practices and increase developer velocity when you work with AWS Lambda functions. In particular, it provides a utility to convert your Lambda functions into idempotent operations which are safe to retry. 5. **Communicate idempotency clearly** Document your API and workload components to clearly communicate the idempotent nature of the operations. This helps clients understand the expected behavior and how to interact with your workload reliably. 6. **Monitor and audit** Implement monitoring and auditing mechanisms to detect any issues related to the idempotency of responses, such as unexpected response variations or excessive duplicate request handling. This can help you detect and investigate any issues or unexpected behaviors in your workload.
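
As a minimal sketch of steps 3 and 4, assuming a hypothetical DynamoDB table named `idempotency` keyed on `token` with TTL enabled on `expires_at`, the following Python example uses a conditional write to record a token exactly once and returns the stored response for duplicate requests.

```python
# Sketch: reserve an idempotency token atomically; duplicates get the stored response.
import time

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("idempotency")  # assumed table


def process_once(token: str, handler) -> dict:
    try:
        # Fails if this token has been seen before (attribute already exists).
        table.put_item(
            Item={"token": token, "status": "IN_PROGRESS",
                  "expires_at": int(time.time()) + 24 * 3600},  # TTL attribute
            ConditionExpression="attribute_not_exists(#t)",
            ExpressionAttributeNames={"#t": "token"},
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Duplicate request: return the previously stored response.
            return table.get_item(Key={"token": token})["Item"].get("response", {})
        raise

    response = handler()  # run the mutating operation exactly once
    table.update_item(
        Key={"token": token},
        UpdateExpression="SET #s = :done, #r = :resp",
        ExpressionAttributeNames={"#s": "status", "#r": "response"},
        ExpressionAttributeValues={":done": "COMPLETED", ":resp": response},
    )
    return response


result = process_once("req-123", lambda: {"status": "created"})
```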

๐Ÿ’ผ REL05-BP01 Implement graceful degradation to transform applicable hard dependencies into soft dependencies

Application components should continue to perform their core function even if dependencies become unavailable. They might be serving slightly stale data, alternate data, or even no data. This ensures overall system function is only minimally impeded by localized failures while delivering the central business value. **Desired outcome:** When a component's dependencies are unhealthy, the component itself can still function, although in a degraded manner. Failure modes of components should be seen as normal operation. Workflows should be designed in such a way that such failures do not lead to complete failure or at least to predictable and recoverable states. **Common anti-patterns:** - Not identifying the core business functionality needed. Not testing that components are functional even during dependency failures. - Serving no data on errors or when only one out of multiple dependencies is unavailable and partial results can still be returned. - Creating an inconsistent state when a transaction partially fails. - Not having an alternative way to access a central parameter store. - Invalidating or emptying local state as a result of a failed refresh without considering the consequences of doing so. **Benefits of establishing this best practice:** Graceful degradation improves the availability of the system as a whole and maintains the functionality of the most important functions even during failures. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Implementing graceful degradation helps minimize the impact of dependency failures on component function. Ideally, a component detects dependency failures and works around them in a way that minimally impacts other components or customers. Architecting for graceful degradation means considering potential failure modes during dependency design. For each failure mode, have a way to deliver most or at least the most critical functionality of the component to callers or customers. These considerations can become additional requirements that can be tested and verified. Ideally, a component is able to perform its core function in an acceptable manner even when one or multiple dependencies fail. This is as much a business discussion as a technical one. All business requirements are important and should be fulfilled if possible. However, it still makes sense to ask what should happen when not all of them can be fulfilled. A system can be designed to be available and consistent, but under circumstances where one requirement must be dropped, which one is more important? For payment processing, it might be consistency. For a real-time application, it might be availability. For a customer facing website, the answer may depend on customer expectations. What this means depends on the requirements of the component and what should be considered its core function. For example: - An ecommerce website might display data from multiple different systems like personalized recommendations, highest ranked products, and status of customer orders on the landing page. When one upstream system fails, it still makes sense to display everything else instead of showing an error page to a customer. - A component performing batch writes can still continue processing a batch if one of the individual operations fails. It should be simple to implement a retry mechanism. 
This can be done by returning information on which operations succeeded, which failed, and why they failed to the caller, or putting failed requests into a dead letter queue to implement asynchronous retries. Information about failed operations should be logged as well. - A system that processes transactions must verify that either all or no individual updates are executed. For distributed transactions, the saga pattern can be used to roll back previous operations in case a later operation of the same transaction fails. Here, the core function is maintaining consistency. - Time-critical systems should be able to deal with dependencies not responding in a timely manner. In these cases, the circuit breaker pattern can be used. When responses from a dependency start timing out, the system can switch to an open state where no additional calls are made. - An application may read parameters from a parameter store. It can be useful to create container images with a default set of parameters and use these in case the parameter store is unavailable. Note that the pathways taken in case of component failure need to be tested and should be significantly simpler than the primary pathway. Generally, fallback strategies should be avoided. ### Implementation steps Identify external and internal dependencies. Consider what kinds of failures can occur in them. Think about ways that minimize negative impact on upstream and downstream systems and customers during those failures. The following is a list of dependencies and how to degrade gracefully when they fail: 1. Partial failure of dependencies: A component may make multiple requests to downstream systems, either as multiple requests to one system or one request each to multiple systems. Depending on the business context, different ways of handling this may be appropriate. 2. A downstream system is unable to process requests due to high load: If requests to a downstream system are consistently failing, it does not make sense to continue retrying. This may create additional load on an already overloaded system and make recovery more difficult. The circuit breaker pattern can be utilized here, which monitors failing calls to a downstream system. If a high number of calls are failing, it will stop sending more requests to the downstream system and only occasionally let calls through to test whether the downstream system is available again. 3. A parameter store is unavailable: To transform a parameter store into a soft dependency, caching or sane defaults included in container or machine images may be used. 4. A monitoring service or other non-functional dependency is unavailable: If a component is intermittently unable to send logs, metrics, or traces to a central monitoring service, it is often best to still execute business functions as usual. Silently not logging or pushing metrics for a long time is often not acceptable. Also, some use cases may require complete auditing entries to fulfill compliance requirements. 5. A primary instance of a relational database may be unavailable: Amazon Relational Database Service, like almost all relational databases, can only have one primary writer instance. This creates a single point of failure for write workloads and makes scaling more difficult. This can partially be mitigated by using a Multi-AZ configuration for high availability or Amazon Aurora Serverless for better scaling. 
For very high availability requirements, it can make sense to not rely on the primary writer at all. For queries that only read, read replicas can be used, which provide redundancy and the ability to scale out, not just up. Writes can be buffered, for example in an Amazon Simple Queue Service queue, so that write requests from customers can still be accepted even if the primary is temporarily unavailable.
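
As a minimal sketch of degrading gracefully when a parameter store is unavailable (item 3 above), the following Python example falls back to a baked-in default when an illustrative AWS Systems Manager parameter cannot be read.

```python
# Sketch: serve the core function with a sane default instead of failing outright.
import boto3
from botocore.exceptions import BotoCoreError, ClientError

DEFAULT_PAGE_SIZE = 20  # shipped with the container image as a sane default


def get_page_size() -> int:
    try:
        ssm = boto3.client("ssm")
        value = ssm.get_parameter(Name="/webapp/page-size")["Parameter"]["Value"]
        return int(value)
    except (BotoCoreError, ClientError, ValueError):
        # Dependency unhealthy or value malformed: fall back to the default
        # rather than returning an error to the customer.
        return DEFAULT_PAGE_SIZE


page_size = get_page_size()
```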

๐Ÿ’ผ REL05-BP02 Throttle requests

Throttle requests to mitigate resource exhaustion due to unexpected increases in demand. Requests below throttling rates are processed while those over the defined limit are rejected with a return message indicating the request was throttled. **Desired outcome:** Large volume spikes either from sudden customer traffic increases, flooding attacks, or retry storms are mitigated by request throttling, allowing workloads to continue normal processing of supported request volume. **Common anti-patterns:** - API endpoint throttles are not implemented or are left at default values without considering expected volumes. - API endpoints are not load tested or throttling limits are not tested. - Throttling request rates without considering request size or complexity. - Testing maximum request rates or maximum request size, but not testing both together. - Resources are not provisioned to the same limits established in testing. - Usage plans have not been configured or considered for application to application (A2A) API consumers. - Queue consumers that horizontally scale do not have maximum concurrency settings configured. - Rate limiting on a per IP address basis has not been implemented. **Benefits of establishing this best practice:** Workloads that set throttle limits are able to operate normally and process accepted request load successfully under unexpected volume spikes. Sudden or sustained spikes of requests to APIs and queues are throttled and do not exhaust request processing resources. Rate limits throttle individual requestors so that high volumes of traffic from a single IP address or API consumer will not exhaust resources or impact other consumers. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Services should be designed to process a known capacity of requests; this capacity can be established through load testing. If request arrival rates exceed limits, the appropriate response signals that a request has been throttled. This allows the consumer to handle the error and retry later. When your service requires a throttling implementation, consider implementing the token bucket algorithm, where each token represents a request. Tokens are refilled at a throttle rate per second and emptied asynchronously by one token per request. Amazon API Gateway implements the token bucket algorithm according to account and region limits and can be configured per-client with usage plans. Additionally, Amazon Simple Queue Service (Amazon SQS) and Amazon Kinesis can buffer requests to smooth out the request rate, and allow higher throttling rates for requests that can be addressed. Finally, you can implement rate limiting with AWS WAF to throttle specific API consumers that generate unusually high load. ### Implementation steps You can configure API Gateway with throttling limits for your APIs and return 429 Too Many Requests errors when limits are exceeded. You can use AWS WAF with your AWS AppSync and API Gateway endpoints to enable rate limiting on a per IP address basis. Additionally, where your system can tolerate asynchronous processing, you can put messages into a queue or stream to speed up responses to service clients, which allows you to burst to higher throttle rates. With asynchronous processing, when you've configured Amazon SQS as an event source for AWS Lambda, you can configure maximum concurrency to avoid high event rates from consuming available account concurrent execution quota needed for other services in your workload or account. 
While API Gateway provides a managed implementation of the token bucket, in cases where you cannot use API Gateway, you can take advantage of language specific open-source implementations (see related examples in Resources) of the token bucket for your services. - Understand and configure API Gateway throttling limits at the account level per region, API per stage, and API key per usage plan levels. - Apply AWS WAF rate limiting rules to API Gateway and AWS AppSync endpoints to protect against floods and block malicious IPs. Rate limiting rules can also be configured on AWS AppSync API keys for A2A consumers. - Consider whether you require more throttling control than rate limiting for AWS AppSync APIs, and if so, configure an API Gateway in front of your AWS AppSync endpoint. - When Amazon SQS queues are set up as triggers for Lambda queue consumers, set maximum concurrency to a value that processes enough to meet your service level objectives but does not consume concurrency limits impacting other Lambda functions. Consider setting reserved concurrency on other Lambda functions in the same account and region when you consume queues with Lambda. - Use API Gateway with native service integrations to Amazon SQS or Kinesis to buffer requests. - If you cannot use API Gateway, look at language specific libraries to implement the token bucket algorithm for your workload. Check the examples section and do your own research to find a suitable library. - Test limits that you plan to set, or that you plan to allow to be increased, and document the tested limits. - Do not increase limits beyond what you establish in testing. When increasing a limit, verify that provisioned resources are already equivalent to or greater than those in test scenarios before applying the increase.
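
Where API Gateway cannot be used, the token bucket can be implemented directly. The following Python example is a minimal, illustrative sketch rather than a production rate limiter: tokens refill at a fixed rate, each request consumes one, and callers that find the bucket empty should be throttled with a 429 response.

```python
# Sketch: token bucket with a fixed refill rate and bounded burst capacity.
import threading
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at bucket capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # caller should respond with 429 Too Many Requests


bucket = TokenBucket(rate_per_sec=100, capacity=200)
if not bucket.allow():
    print("429 Too Many Requests")
```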

๐Ÿ’ผ REL05-BP03 Control and limit retry calls

Use exponential backoff to retry requests at progressively longer intervals between each retry. Introduce jitter between retries to randomize retry intervals. Limit the maximum number of retries. **Desired outcome:** Typical components in a distributed software system include servers, load balancers, databases, and DNS servers. During normal operation, these components can respond to requests with errors that are temporary or limited, and also errors that would be persistent regardless of retries. When clients make requests to services, the requests consume resources including memory, threads, connections, ports, or any other limited resources. Controlling and limiting retries is a strategy to release and minimize consumption of resources so that system components under strain are not overwhelmed. When client requests time out or receive error responses, they should determine whether or not to retry. If they do retry, they do so with exponential backoff with jitter and a maximum retry value. As a result, backend services and processes are given relief from load and time to self-heal, resulting in faster recovery and successful request servicing. **Common anti-patterns:** - Implementing retries without adding exponential backoff, jitter, and maximum retry values. Backoff and jitter help avoid artificial traffic spikes due to unintentionally coordinated retries at common intervals. - Implementing retries without testing their effects or assuming retries are already built into an SDK without testing retry scenarios. - Failing to understand published error codes from dependencies, leading to retrying all errors, including those with a clear cause that indicates lack of permission, configuration error, or another condition that predictably will not resolve without manual intervention. - Not addressing observability practices, including monitoring and alerting on repeated service failures so that underlying issues are made known and can be addressed. - Developing custom retry mechanisms when built-in or third-party retry capabilities suffice. - Retrying at multiple layers of your application stack in a manner which compounds retry attempts, further consuming resources in a retry storm. Be sure to understand how these errors affect your application and the dependencies you rely on, and then implement retries at only one level. - Retrying service calls that are not idempotent, causing unexpected side effects like duplicated results. **Benefits of establishing this best practice:** Retries help clients acquire desired results when requests fail but also consume more of a server's time to get the successful responses they want. When failures are rare or transient, retries work well. When failures are caused by resource overload, retries can make things worse. Adding exponential backoff with jitter to client retries allows servers to recover when failures are caused by resource overload. Jitter avoids alignment of requests into spikes, and backoff diminishes load escalation caused by adding retries to normal request load. Finally, it's important to configure a maximum number of retries or elapsed time to avoid creating backlogs that produce metastable failures. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Control and limit retry calls. Use exponential backoff to retry after progressively longer intervals. Introduce jitter to randomize retry intervals and limit the maximum number of retries. 
Some AWS SDKs implement retries and exponential backoff by default. Use these built-in AWS implementations where applicable in your workload. Implement similar logic in your workload when calling services that are idempotent and where retries improve your client availability. Decide what the timeouts are and when to stop retrying based on your use case. Build and exercise testing scenarios for those retry use cases. ### Implementation steps - Determine the optimal layer in your application stack to implement retries for the services your application relies on. - Be aware of existing SDKs that implement proven retry strategies with exponential backoff and jitter for your language of choice, and favor these over writing your own retry implementations. - Verify that services are idempotent before implementing retries. Once retries are implemented, be sure they are both tested and regularly exercised in production. - When calling AWS service APIs, use the AWS SDKs and AWS CLI and understand the retry configuration options. Determine if the defaults work for your use case, test, and adjust as needed.
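
As a minimal sketch of relying on the SDK's built-in behavior instead of a custom retry loop, the following Python (Boto3) example configures standard retry mode, which applies exponential backoff with jitter and caps the number of attempts; the values shown are illustrative.

```python
# Sketch: bound retries and let the SDK handle backoff and jitter.
import boto3
from botocore.config import Config

retry_config = Config(
    retries={
        "max_attempts": 4,     # bound total attempts to avoid retry storms
        "mode": "standard",    # backoff with jitter; retries only retryable errors
    }
)

dynamodb = boto3.client("dynamodb", config=retry_config)
tables = dynamodb.list_tables()
```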

๐Ÿ’ผ REL05-BP04 Fail fast and limit queues

When a service is unable to respond successfully to a request, fail fast. This allows resources associated with a request to be released, and permits a service to recover if it's running out of resources. Failing fast is a well-established software design pattern that can be leveraged to build highly reliable workloads in the cloud. Queuing is also a well-established enterprise integration pattern that can smooth load and allow clients to release resources when asynchronous processing can be tolerated. When a service is able to respond successfully under normal conditions but fails when the rate of requests is too high, use a queue to buffer requests. However, do not allow a buildup of long queue backlogs that can result in processing stale requests that a client has already given up on. **Desired outcome:** When systems experience resource contention, timeouts, exceptions, or grey failures that make service level objectives unachievable, fail fast strategies allow for faster system recovery. Systems that must absorb traffic spikes and can accommodate asynchronous processing can improve reliability by allowing clients to quickly release requests by using queues to buffer requests to backend services. When buffering requests to queues, queue management strategies are implemented to avoid insurmountable backlogs. **Common anti-patterns:** - Implementing message queues but not configuring dead letter queues (DLQ) or alarms on DLQ volumes to detect when a system is in failure. - Not measuring the age of messages in a queue, a measurement of latency to understand when queue consumers are falling behind or erroring out, causing retries. - Not clearing backlogged messages from a queue, when there is no value in processing these messages if the business need no longer exists. - Configuring first in first out (FIFO) queues when last in first out (LIFO) queues would better serve client needs, for example when strict ordering is not required and backlog processing is delaying all new and time-sensitive requests, resulting in all clients experiencing breached service levels. - Exposing internal queues to clients instead of exposing APIs that manage work intake and place requests into internal queues. - Combining too many work request types into a single queue, which can exacerbate backlog conditions by spreading resource demand across request types. - Processing complex and simple requests in the same queue, despite needing different monitoring, timeouts, and resource allocations. - Not validating inputs or using assertions to implement fail fast mechanisms in software that bubble up exceptions to higher level components that can handle errors gracefully. - Not removing faulty resources from request routing, especially when failures are grey, emitting both successes and failures due to crashing and restarting, intermittent dependency failure, reduced capacity, or network packet loss. **Benefits of establishing this best practice:** Systems that fail fast are easier to debug and fix, and often expose issues in coding and configuration before releases are published into production. Systems that incorporate effective queueing strategies provide greater resilience and reliability to traffic spikes and intermittent system fault conditions. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Fail fast strategies can be coded into software solutions as well as configured into infrastructure. 
In addition to failing fast, queues are a straightforward yet powerful architectural technique to decouple system components and smooth load. Amazon CloudWatch provides capabilities to monitor for and alarm on failures. Once a system is known to be failing, mitigation strategies can be invoked, including failing away from impaired resources. When systems implement queues with Amazon SQS and other queue technologies to smooth load, they must consider how to manage queue backlogs, as well as message consumption failures. ### Implementation steps - Implement programmatic assertions or specific metrics in your software and use them to explicitly alert on system issues. Amazon CloudWatch helps you create metrics and alarms based on application log patterns and SDK instrumentation. - Use CloudWatch metrics and alarms to fail away from impaired resources that are adding latency to processing or repeatedly failing to process requests. - Use asynchronous processing by designing APIs to accept requests and append requests to internal queues using Amazon SQS and then respond to the message-producing client with a success message so the client can release resources and move on with other work while backend queue consumers process requests. - Measure and monitor for queue processing latency by producing a CloudWatch metric each time you take a message off a queue, comparing the current time to the message timestamp. - When failures prevent successful message processing or traffic spikes in volumes that cannot be processed within service level agreements, sideline older or excess traffic to a spillover queue. This allows priority processing of new work, and older work when capacity is available. This technique is an approximation of LIFO processing and allows normal system processing for all new work. - Use dead letter or redrive queues to move messages that can't be processed out of the backlog into a location that can be researched and resolved later. - Either retry or, when tolerable, drop old messages by comparing the current time to the message timestamp and discarding messages that are no longer relevant to the requesting client.
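
As a minimal sketch of failing fast on stale work, assuming a hypothetical queue URL and a 60-second staleness threshold, the following Python example reads the `SentTimestamp` attribute from Amazon SQS messages and drops messages older than the threshold instead of processing them.

```python
# Sketch: discard stale messages so fresh requests are not stuck behind a backlog.
import time

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # illustrative
MAX_AGE_SECONDS = 60


def process(body: str) -> None:
    print("processing", body)  # placeholder for the real business handler


resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    AttributeNames=["SentTimestamp"],
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,
)

for msg in resp.get("Messages", []):
    age_seconds = time.time() - int(msg["Attributes"]["SentTimestamp"]) / 1000
    if age_seconds > MAX_AGE_SECONDS:
        # Stale: the client has likely given up, so drop (or sideline) the message.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        continue
    process(msg["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```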

๐Ÿ’ผ REL05-BP05 Set client timeouts

Set timeouts appropriately on connections and requests, verify them systematically, and do not rely on default values, as they are not aware of workload specifics. **Desired outcome:** Client timeouts should consider the cost to the client, server, and workload associated with waiting for requests that take abnormal amounts of time to complete. Since it is not possible to know the exact cause of any timeout, clients must use knowledge of services to develop expectations of probable causes and appropriate timeouts. Client connections time out based on configured values. After encountering a timeout, clients make decisions to back off and retry or open a circuit breaker. These patterns avoid issuing requests that may exacerbate an underlying error condition. **Common anti-patterns:** - Not being aware of system timeouts or default timeouts. - Not being aware of normal request completion timing. - Not considering causes for requests taking abnormally long or the resource cost of waiting. - Ignoring potential network impairments that may fail requests only after long timeouts. - Not testing timeout scenarios for both connections and requests. - Setting timeouts too high, causing long wait times and higher resource consumption. - Setting timeouts too low, causing artificial failures. - Overlooking patterns for handling timeout errors, like circuit breakers or retries. - Not monitoring service call error rates, latency, and outliers to inform timeout configuration. **Benefits of establishing this best practice:** Remote call timeouts are configured appropriately, systems handle timeouts gracefully and conserve resources when remote calls respond abnormally slowly, and errors are managed predictably. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Set both a connection timeout and a request timeout on any service dependency call and generally on any call across processes. Many frameworks offer built-in timeout capabilities, but be careful, as some have default values that are infinite or higher than acceptable for your service goals. A value that is too high reduces the usefulness of the timeout because resources continue to be consumed while the client waits for the timeout to occur. A value that is too low can generate increased traffic on the backend and increased latency because too many requests are retried. In some cases, this can lead to complete outages because all requests are being retried. Consider the following when determining timeout strategies: - Requests may take longer than normal to process because of their content, impairments in a target service, or a network partition failure. - Requests with abnormally expensive content could consume unnecessary server and client resources. In this case, timing out these requests and not retrying can preserve resources. Services should also protect themselves from abnormally expensive content with throttles and server-side timeouts. - Requests that take abnormally long due to a service impairment can be timed out and retried. Consideration should be given to service costs for the request and retry, but if the cause is a localized impairment, a retry is not likely to be expensive and will reduce client resource consumption. The timeout may also release server resources depending on the nature of the impairment. - Requests that take a long time to complete because the request or response has failed to be delivered by the network can be timed out and retried. 
Because the request or response was not delivered, failure would have been the outcome regardless of the length of the timeout. Timing out in this case will not release server resources, but it will release client resources and improve workload performance. Take advantage of well-established design patterns like retries and circuit breakers to handle timeouts gracefully and support fail-fast approaches. The AWS SDKs and AWS CLI allow for configuration of both connection and request timeouts and for retries with exponential backoff and jitter. AWS Lambda functions support configuration of timeouts, and with AWS Step Functions, you can build low-code circuit breakers that take advantage of pre-built integrations with AWS services and SDKs. AWS App Mesh Envoy provides timeout and circuit breaker capabilities. ### Implementation steps - Configure timeouts on remote service calls and take advantage of built-in language timeout features or open source timeout libraries. - When your workload makes calls with an AWS SDK, review the documentation for language-specific timeout configuration. - When using AWS SDKs or AWS CLI commands in your workload, configure default timeout values by setting the AWS configuration defaults for connectTimeoutInMillis and tlsNegotiationTimeoutInMillis. - Apply the command-line options cli-connect-timeout and cli-read-timeout to control one-off AWS CLI commands to AWS services. - Monitor remote service calls for timeouts, and set alarms on persistent errors so that you can proactively handle error scenarios. - Implement CloudWatch metrics and CloudWatch anomaly detection on call error rates, service level objectives for latency, and latency outliers to provide insight into managing overly aggressive or permissive timeouts. - Configure timeouts on Lambda functions. - API Gateway clients must implement their own retries when handling timeouts. API Gateway supports a 50 millisecond to 29 second integration timeout for downstream integrations and does not retry when integration requests time out. - Implement the circuit breaker pattern to avoid making remote calls when they are timing out. Open the circuit to avoid failing calls and close the circuit when calls are responding normally. - For container-based workloads, review App Mesh Envoy features to leverage built-in timeouts and circuit breakers. - Use AWS Step Functions to build low-code circuit breakers for remote service calls, especially when calling AWS SDKs and supported Step Functions integrations, to simplify your workload.
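As a minimal sketch of per-client timeout and retry configuration with the AWS SDK for Python (Boto3), the example below sets explicit connection and read timeouts plus a bounded retry budget. The specific values, the DynamoDB client, and the table name are illustrative assumptions; this per-client `botocore` configuration is a different mechanism from the SDK-wide configuration defaults (connectTimeoutInMillis, tlsNegotiationTimeoutInMillis) mentioned above.

```python
"""Sketch: explicit client timeouts and bounded retries with Boto3."""
import boto3
from botocore.config import Config
from botocore.exceptions import ConnectTimeoutError, ReadTimeoutError

# Fail fast: short connection timeout, a read timeout sized to the
# dependency's normal latency, and a small retry budget with backoff.
timeout_config = Config(
    connect_timeout=2,   # seconds to establish the connection
    read_timeout=5,      # seconds to wait for a response on an open connection
    retries={"max_attempts": 3, "mode": "standard"},  # standard mode adds backoff with jitter
)

dynamodb = boto3.client("dynamodb", config=timeout_config)

try:
    dynamodb.describe_table(TableName="example-table")  # hypothetical table name
except (ConnectTimeoutError, ReadTimeoutError):
    # Back off, return a degraded response, or trip a circuit breaker here.
    pass
```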

๐Ÿ’ผ REL05-BP06 Make systems stateless where possible

Systems should either not require state, or should offload state such that between different client requests, there is no dependence on data stored locally on disk or in memory. This allows servers to be replaced at will without causing an availability impact. When users or services interact with an application, they often perform a series of interactions that form a session. A session is unique data for users that persists between requests while they use the application. A stateless application is an application that does not need knowledge of previous interactions and does not store session information. Once your application is designed to be stateless, you can use serverless compute services, such as AWS Lambda or AWS Fargate. In addition to server replacement, another benefit of stateless applications is that they can scale horizontally, because any of the available compute resources (such as EC2 instances and AWS Lambda functions) can service any request. **Benefits of establishing this best practice** Systems that are designed to be stateless are more adaptable to horizontal scaling, making it possible to add or remove capacity based on fluctuating traffic and demand. They are also inherently resilient to failures and provide flexibility and agility in application development. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Make your applications stateless. Stateless applications allow horizontal scaling and are tolerant to the failure of an individual node. Analyze and understand the components of your application that maintain state within the architecture. This helps you assess the potential impact of transitioning to a stateless design. A stateless architecture decouples user data and offloads the session data. This provides the flexibility to scale each component independently to meet varying workload demands and optimize resource utilization. ### Implementation steps - Identify and understand the stateful components in your application. - Decouple data by separating and managing user data from the core application logic. - Amazon Cognito can decouple user data from application code by using features such as identity pools, user pools, and Amazon Cognito Sync. - You can use AWS Secrets Manager to decouple user data by storing secrets in a secure, centralized location. This means that the application code doesn't need to store secrets, which makes it more secure. - Consider using Amazon S3 to store large, unstructured data, such as images and documents. Your application can retrieve this data when required, eliminating the need to store it in memory. - Use Amazon DynamoDB to store information such as user profiles. Your application can query this data in near-real time. - Offload session data to a database, cache, or external files. - Amazon ElastiCache, Amazon DynamoDB, Amazon Elastic File System (Amazon EFS), and Amazon MemoryDB are examples of AWS services that you can use to offload session data. - Design a stateless architecture after you identify which state and user data need to be persisted with your storage solution of choice.
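One way session offloading could look is a thin data-access layer that reads and writes session records in DynamoDB so any server or function instance can serve any request. The table name, key schema, and TTL attribute below are assumptions for illustration, not a prescribed design.

```python
"""Sketch: offloading session state to Amazon DynamoDB."""
import time
import boto3

sessions = boto3.resource("dynamodb").Table("app-sessions")  # hypothetical table

def save_session(session_id: str, data: dict, ttl_seconds: int = 3600) -> None:
    # Store session data with an expiry so stale sessions are removed
    # automatically if DynamoDB TTL is enabled on the "expires_at" attribute.
    sessions.put_item(Item={
        "session_id": session_id,
        "data": data,
        "expires_at": int(time.time()) + ttl_seconds,
    })

def load_session(session_id: str):
    # Any instance can retrieve the session; no local disk or memory is required.
    response = sessions.get_item(Key={"session_id": session_id})
    return response.get("Item", {}).get("data")
```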

๐Ÿ’ผ REL05-BP07 Implement emergency levers

Emergency levers are rapid processes that can mitigate availability impact on your workload. Emergency levers work by disabling, throttling, or changing the behavior of components or dependencies using known and tested mechanisms. This can alleviate workload impairments caused by resource exhaustion due to unexpected increases in demand and reduce the impact of failures in non-critical components within your workload. **Desired outcome** By implementing emergency levers, you can establish known-good processes to maintain the availability of critical components in your workload. The workload should degrade gracefully and continue to perform its business-critical functions during the activation of an emergency lever. For more detail on graceful degradation, see REL05-BP01 Implement graceful degradation to transform applicable hard dependencies into soft dependencies. **Common anti-patterns** - Failure of non-critical dependencies impacts the availability of your core workload. - Not testing or verifying critical component behavior during non-critical component impairment. - No clear and deterministic criteria defined for activation or deactivation of an emergency lever. **Benefits of establishing this best practice** Implementing emergency levers can improve the availability of the critical components in your workload by providing your resolvers with established processes to respond to unexpected spikes in demand or failures of non-critical dependencies. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance - Identify critical components in your workload. - Design and architect the critical components in your workload to withstand failure of non-critical components. - Conduct testing to validate the behavior of your critical components during the failure of non-critical components. - Define and monitor relevant metrics or triggers to initiate emergency lever procedures. - Define the procedures (manual or automated) that comprise the emergency lever. ### Implementation steps 1. Identify business-critical components in your workload. - Each technical component in your workload should be mapped to its relevant business function and ranked as critical or non-critical. For examples of critical and non-critical functionality at Amazon, see Any Day Can Be Prime Day: How Amazon.com Search Uses Chaos Engineering to Handle Over 84K Requests Per Second. - This is both a technical and business decision, and varies by organization and workload. 2. Design and architect the critical components in your workload to withstand failure of non-critical components. - During dependency analysis, consider all potential failure modes, and verify that your emergency lever mechanisms deliver the critical functionality to downstream components. 3. Conduct testing to validate the behavior of your critical components during activation of your emergency levers. - Avoid bimodal behavior. For more detail, see REL11-BP05 Use static stability to prevent bimodal behavior. 4. Define, monitor, and alert on relevant metrics to initiate the emergency lever procedure. - Finding the right metrics to monitor depends on your workload. Some example metrics are latency or the number of failed requests to a dependency. 5. Define the procedures, manual or automated, that comprise the emergency lever. - This may include mechanisms such as load shedding, throttling requests, or implementing graceful degradation.
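A minimal sketch of one possible lever implementation: a feature flag stored in AWS Systems Manager Parameter Store that operators flip to shed a non-critical dependency while critical functionality keeps working. The parameter name, page structure, and degraded behavior are assumptions; AWS AppConfig feature flags would be another common choice.

```python
"""Sketch: an emergency lever as a Parameter Store feature flag."""
import boto3

ssm = boto3.client("ssm")

def recommendations_enabled() -> bool:
    # Operators set this parameter to "false" to disable the non-critical
    # recommendations dependency during an incident.
    response = ssm.get_parameter(Name="/myapp/levers/recommendations-enabled")  # hypothetical name
    return response["Parameter"]["Value"].lower() == "true"

def fetch_recommendations(product_id: str) -> list:
    # Placeholder for the non-critical downstream call.
    return []

def render_product_page(product_id: str) -> dict:
    page = {"product": product_id}
    if recommendations_enabled():
        page["recommendations"] = fetch_recommendations(product_id)
    # The business-critical content (the product itself) is always returned,
    # so the workload degrades gracefully when the lever is pulled.
    return page
```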

๐Ÿ’ผ REL06-BP01 Monitor all components for the workload (Generation)

Monitor the components of the workload with Amazon CloudWatch or third-party tools. Monitor AWS services with AWS Health Dashboard. All components of your workload should be monitored, including the front-end, business logic, and storage tiers. Define key metrics, describe how to extract them from logs (if necessary), and set thresholds for invoking corresponding alarm events. Ensure metrics are relevant to the key performance indicators (KPIs) of your workload, and use metrics and logs to identify early warning signs of service degradation. For example, a metric related to business outcomes, such as the number of orders successfully processed per minute, can indicate workload issues faster than a technical metric, such as CPU utilization. Use AWS Health Dashboard for a personalized view into the performance and availability of the AWS services underlying your AWS resources. Monitoring in the cloud offers new opportunities. Most cloud providers have developed customizable hooks and can deliver insights to help you monitor multiple layers of your workload. AWS services such as Amazon CloudWatch apply statistical and machine learning algorithms to continually analyze metrics of systems and applications, determine normal baselines, and surface anomalies with minimal user intervention. Anomaly detection algorithms account for the seasonality and trend changes of metrics. AWS makes an abundance of monitoring and log information available for consumption that can be used to define workload-specific metrics and change-in-demand processes, and to adopt machine learning techniques regardless of ML expertise. In addition, monitor all of your external endpoints to ensure that they are independent of your base implementation. This active monitoring can be done with synthetic transactions (sometimes referred to as user canaries, but not to be confused with canary deployments), which periodically run a number of common tasks matching actions performed by clients of the workload. Keep these tasks short in duration and be sure not to overload your workload during testing. Amazon CloudWatch Synthetics allows you to create synthetic canaries to monitor your endpoints and APIs. You can also combine the synthetic canary client nodes with the AWS X-Ray console to pinpoint which synthetic canaries are experiencing issues with errors, faults, or throttling rates for the selected time frame. **Desired outcome** Collect and use critical metrics from all components of the workload to ensure workload reliability and optimal user experience. Detecting that a workload is not achieving business outcomes allows you to quickly declare a disaster and recover from an incident. **Common anti-patterns** - Only monitoring external interfaces to your workload. - Not generating any workload-specific metrics and only relying on metrics provided to you by the AWS services your workload uses. - Only using technical metrics in your workload and not monitoring any metrics related to non-technical KPIs the workload contributes to. - Relying on production traffic and simple health checks to monitor and evaluate workload state. **Benefits of establishing this best practice** Monitoring at all tiers in your workload allows you to more rapidly anticipate and resolve problems in the components that comprise the workload. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance 1. Turn on logging where available. Monitoring data should be obtained from all components of the workloads. 
Turn on additional logging, such as S3 Access Logs, and permit your workload to log workload-specific data. Collect metrics for CPU, network I/O, and disk I/O averages from services such as Amazon ECS, Amazon EKS, Amazon EC2, Elastic Load Balancing, AWS Auto Scaling, and Amazon EMR. See AWS Services That Publish CloudWatch Metrics for a list of AWS services that publish metrics to CloudWatch. 2. Review all default metrics and explore any data collection gaps. Every service generates default metrics. Collecting default metrics allows you to better understand the dependencies between workload components, and how component reliability and performance affect the workload. You can also create and publish your own metrics to CloudWatch using the AWS CLI or an API. 3. Evaluate all the metrics to decide which ones to alert on for each AWS service in your workload. You may choose to select a subset of metrics that have a major impact on workload reliability. Focusing on critical metrics and thresholds allows you to refine the number of alerts and can help minimize false positives. 4. Define alerts and the recovery process for your workload after the alert is invoked. Defining alerts allows you to quickly notify, escalate, and follow the steps necessary to recover from an incident and meet your prescribed Recovery Time Objective (RTO). You can use Amazon CloudWatch Alarms to invoke automated workflows and initiate recovery procedures based on defined thresholds. 5. Explore the use of synthetic transactions to collect relevant data about workload state. Synthetic monitoring follows the same routes and performs the same actions as a customer, which makes it possible for you to continually verify your customer experience even when you don't have any customer traffic on your workloads. By using synthetic transactions, you can discover issues before your customers do.
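As a rough illustration of publishing a business-outcome metric and alarming on it, the sketch below emits an "orders processed" count to CloudWatch and raises an alarm when the per-minute sum drops below a floor. The namespace, metric name, and threshold are assumptions to adapt to your KPIs.

```python
"""Sketch: a business KPI metric with a CloudWatch alarm on its floor."""
import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit the KPI from the order-processing path (one data point per order).
cloudwatch.put_metric_data(
    Namespace="MyWorkload/Business",  # hypothetical namespace
    MetricData=[{"MetricName": "OrdersProcessed", "Value": 1, "Unit": "Count"}],
)

# Alarm when successful orders per minute fall below the expected floor
# for five consecutive periods.
cloudwatch.put_metric_alarm(
    AlarmName="orders-processed-low",
    Namespace="MyWorkload/Business",
    MetricName="OrdersProcessed",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=10,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",  # receiving no orders at all is also a problem
)
```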

๐Ÿ’ผ REL06-BP02 Define and calculate metrics (Aggregation)

Collect metrics and logs from your workload components and calculate relevant aggregate metrics from them. These metrics provide broad and deep observability of your workload and can significantly improve your resilience posture. Observability is more than just collecting metrics from workload components and being able to view and alert on them. It's about having a holistic understanding of your workload's behavior. This behavioral information comes from all components in your workloads, which includes the cloud services on which they depend, well-crafted logs, and metrics. This data gives you oversight of your workload's behavior as a whole, as well as an understanding of every component's interaction with every unit of work at a fine level of detail. **Desired outcome** - You collect logs from your workload components and AWS service dependencies, and you publish them to a central location where they can be easily accessed and processed. - Your logs contain high-fidelity and accurate timestamps. - Your logs contain relevant information about the processing context, such as a trace identifier, user or account identifier, and remote IP address. - You create aggregate metrics from your logs that represent your workload's behavior from a high-level perspective. - You are able to query your aggregated logs to gain deep and relevant insights about your workload and identify actual and potential problems. **Common anti-patterns** - You don't collect relevant logs or metrics from the compute instances your workloads run on or the cloud services they use. - You overlook the collection of logs and metrics related to your business key performance indicators (KPIs). - You analyze workload-related telemetry in isolation without aggregation and correlation. - You allow metrics and logs to expire too quickly, which hinders trend analysis and recurring issue identification. **Benefits of establishing these best practices** You can detect more anomalies and correlate events and metrics between different components of your workload. You can create insights from your workload components based on information contained in logs that frequently aren't available in metrics alone. You can determine causes of failure more quickly by querying your logs at scale. **Level of risk exposed if these best practices are not established:** High ## Implementation guidance Identify the sources of telemetry data that are relevant for your workloads and their components. This data comes not only from components that publish metrics, such as your operating system (OS) and application runtimes such as Java, but also from application and cloud service logs. For example, web servers typically log each request with detailed information such as the timestamp, processing latency, user ID, remote IP address, path, and query string. The level of detail in these logs helps you perform detailed queries and generate metrics that may not have been otherwise available. Collect the metrics and logs using appropriate tools and processes. Logs generated by applications running on Amazon EC2 instances can be collected by an agent such as the Amazon CloudWatch Agent and published to a central storage service such as Amazon CloudWatch Logs. AWS-managed compute services such as AWS Lambda and Amazon Elastic Container Service publish logs to CloudWatch Logs for you automatically. 
Enable log collection for AWS storage and processing services used by your workloads such as Amazon CloudFront, Amazon S3, Elastic Load Balancing, and Amazon API Gateway. Enrich your telemetry data with dimensions that can help you see behavioral patterns more clearly and isolate correlated problems to groups of related components. Once added, you can observe component behavior at a finer level of detail, detect correlated failures, and take appropriate remedial steps. Examples of useful dimensions include Availability Zone, EC2 instance ID, and container task or Pod ID. Once you have collected the metrics and logs, you can write queries and generate aggregate metrics from them that provide useful insights into both normal and anomalous behavior. For example, you can use Amazon CloudWatch Logs Insights to derive custom metrics from your application logs, Amazon CloudWatch Metrics Insights to query your metrics at scale, Amazon CloudWatch Container Insights to collect, aggregate and summarize metrics and logs from your containerized applications and microservices, or Amazon CloudWatch Lambda Insights if you're using AWS Lambda functions. To create an aggregate error rate metric, you can increment a counter each time an error response or message is found in your component logs or calculate the aggregate value of an existing error rate metric. You can use this data to generate histograms that show tail behavior, such as the worst-performing requests or processes. You can also scan this data in real time for anomalous patterns using solutions such as CloudWatch Logs anomaly detection. These insights can be placed on dashboards to keep them organized according to your needs and preferences. Querying logs can help you understand how specific requests were handled by your workload components and reveal request patterns or other context that has an impact on your workload's resilience. It can be useful to research and prepare queries in advance, based on your knowledge of how your applications and other components behave, so you can more easily run them as needed. For example, with CloudWatch Logs Insights, you can interactively search and analyze your log data stored in CloudWatch Logs. You can also use Amazon Athena to query logs from multiple sources, including many AWS services, at petabyte scale. When you define a log retention policy, consider the value of historical logs. Historical logs can help identify long-term usage and behavioral patterns, regressions, and improvements in your workload's performance. Permanently deleted logs cannot be analyzed later. However, the value of historical logs tends to diminish over long periods of time. Choose a policy that balances your needs as appropriate and is compliant with any legal or contractual requirements you might be subject to. ### Implementation steps 1. Choose collection, storage, analysis, and display mechanisms for your observability data. 2. Install and configure metric and log collectors on the appropriate components of your workload (for example, on Amazon EC2 instances and in sidecar containers). Configure these collectors to restart automatically if they unexpectedly stop. Enable disk or memory buffering for the collectors so that temporary publication failures don't impact your applications or result in lost data. 3. Enable logging on AWS services you use as a part of your workloads, and forward those logs to the storage service you selected if needed. 
Refer to the respective services' user or developer guides for detailed instructions. 4. Define the operational metrics relevant to your workloads that are based on your telemetry data. These could be based on direct metrics emitted from your workload components, which can include business KPI related metrics, or the results of aggregated calculations such as sums, rates, percentiles, or histograms. Calculate these metrics using your log analyzer, and place them on dashboards as appropriate. 5. Prepare appropriate log queries to analyze workload components, requests, or transaction behavior as needed. 6. Define and enable a log retention policy for your component logs. Periodically delete logs when they become older than the policy permits.
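As a small sketch of deriving an aggregate metric from centralized logs, the example below runs a CloudWatch Logs Insights query that counts total and error log lines per five-minute bin. The log group name and the assumption that errors are identified by the string "ERROR" in the message are illustrative.

```python
"""Sketch: aggregating an error count from logs with Logs Insights."""
import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/myapp/web",          # hypothetical log group
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString=(
        'stats count(*) as total, '
        'sum(strcontains(@message, "ERROR")) as errors '
        'by bin(5m)'
    ),
)["queryId"]

# Poll until the query finishes, then inspect the per-bin counts.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})
```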

๐Ÿ’ผ REL06-BP03 Send notifications (Real-time processing and alarming)

When organizations detect potential issues, they send real-time notifications and alerts to the appropriate personnel and systems in order to respond quickly and effectively to these issues. **Desired outcome** Rapid responses to operational events are possible through configuration of relevant alarms based on service and application metrics. When alarm thresholds are breached, the appropriate personnel and systems are notified so they can address underlying issues. **Common anti-patterns** - Configuring alarms with an excessively high threshold, resulting in the failure to send vital notifications. - Configuring alarms with a threshold that is too low, resulting in inaction on important alerts due to the noise of excessive notifications. - Not updating alarms and their thresholds when usage changes. - For alarms best addressed through automated actions, sending the notification to personnel instead of generating the automated action, which results in excessive notifications being sent. **Benefits of establishing this best practice** Sending real-time notifications and alerts to the appropriate personnel and systems allows for early detection of issues and rapid responses to operational incidents. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Workloads should be equipped with real-time processing and alarming to improve the detectability of issues that could impact the availability of the application and serve as triggers for automated response. Organizations can perform real-time processing and alarming by creating alerts with defined metrics in order to receive notifications whenever significant events occur or a metric exceeds a threshold. Amazon CloudWatch allows you to create metric and composite alarms using CloudWatch alarms based on static thresholds, anomaly detection, and other criteria. For more detail on the types of alarms you can configure using CloudWatch, see the alarms section of the CloudWatch documentation. You can construct customized views of metrics and alerts of your AWS resources for your teams using CloudWatch dashboards. The customizable home pages in the CloudWatch console allow you to monitor your resources in a single view across multiple Regions. Alarms can perform one or more actions, like sending a notification to an Amazon SNS topic, performing an Amazon EC2 action or an Amazon EC2 Auto Scaling action, or creating an OpsItem or incident in AWS Systems Manager. Amazon CloudWatch uses Amazon SNS to send notifications when the alarm changes state, providing message delivery from the publishers (producers) to the subscribers (consumers). CloudWatch sends EventBridge events whenever a CloudWatch alarm is created, updated, or deleted, or its state changes. You can use EventBridge with these events to create rules that perform actions, such as notifying you whenever the state of an alarm changes or automatically triggering events in your account using Systems Manager automation. Stay informed with AWS Health. AWS Health is the authoritative source of information about the health of your AWS Cloud resources. Use AWS Health to get notified of any confirmed service events so you can quickly take steps to mitigate any impact. Create purpose-fit AWS Health event notifications to email and chat channels through AWS User Notifications, and integrate programmatically with your monitoring and alerting tools through Amazon EventBridge. If you use AWS Organizations, aggregate AWS Health events across accounts. 
**When should you use EventBridge or Amazon SNS?** Both EventBridge and Amazon SNS can be used to develop event-driven applications, and your choice will depend on your specific needs. Amazon EventBridge is recommended when you want to build an application that reacts to events from your own applications, SaaS applications, and AWS services. EventBridge is the only event-based service that integrates directly with third-party SaaS partners. EventBridge also automatically ingests events from over 200 AWS services without requiring developers to create any resources in their account. EventBridge uses a defined JSON-based structure for events, and helps you create rules that are applied across the entire event body to select events to forward to a target. EventBridge currently supports over 20 AWS services as targets, including AWS Lambda, Amazon SQS, Amazon SNS, Amazon Kinesis Data Streams, and Amazon Data Firehose. Amazon SNS is recommended for applications that need high fan-out (thousands or millions of endpoints). A common pattern we see is that customers use Amazon SNS as a target for their rule to filter the events that they need, and fan out to multiple endpoints. Messages are unstructured and can be in any format. Amazon SNS supports forwarding messages to six different types of targets, including Lambda, Amazon SQS, HTTP/S endpoints, SMS, mobile push, and email. Amazon SNS typical latency is under 30 milliseconds. A wide range of AWS services send Amazon SNS messages by configuring the service to do so (more than 30, including Amazon EC2, Amazon S3, and Amazon RDS). ### Implementation steps 1. Create an alarm using Amazon CloudWatch alarms. - A metric alarm monitors a single CloudWatch metric or an expression dependent on CloudWatch metrics. The alarm initiates one or more actions based on the value of the metric or expression in comparison to a threshold over a number of time intervals. The action may consist of sending a notification to an Amazon SNS topic, performing an Amazon EC2 action or an Amazon EC2 Auto Scaling action, or creating an OpsItem or incident in AWS Systems Manager. - A composite alarm consists of a rule expression that considers the alarm conditions of other alarms you've created. The composite alarm only enters alarm state if all rule conditions are met. The alarms specified in the rule expression of a composite alarm can include metric alarms and additional composite alarms. Composite alarms can send Amazon SNS notifications when their state changes and can create Systems Manager OpsItems or incidents when they enter the alarm state, but they cannot perform Amazon EC2 or Auto Scaling actions. 2. Set up Amazon SNS notifications. When creating a CloudWatch alarm, you can include an Amazon SNS topic to send a notification when the alarm changes state. 3. Create rules in EventBridge that match specified CloudWatch alarms. Each rule supports multiple targets, including Lambda functions. For example, you can define an alarm that initiates when available disk space is running low, which triggers a Lambda function through an EventBridge rule to clean up the space.
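A condensed sketch of the notification wiring described above: an SNS topic with an email subscription, a metric alarm that notifies the topic, and an EventBridge rule that matches alarm state-change events for further automation. Topic name, email address, metric, and dimensions are assumptions.

```python
"""Sketch: alarm notifications via SNS plus an EventBridge alarm-state rule."""
import json
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")
events = boto3.client("events")

# 1. Topic and subscription for the on-call channel.
topic_arn = sns.create_topic(Name="workload-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="oncall@example.com")

# 2. Metric alarm that notifies the topic when it enters ALARM state.
cloudwatch.put_metric_alarm(
    AlarmName="high-5xx-rate",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/my-alb/123"}],  # hypothetical
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[topic_arn],
)

# 3. EventBridge rule matching CloudWatch alarm state changes; targets
#    (for example, a Lambda function) would be added with put_targets.
events.put_rule(
    Name="alarm-state-change",
    EventPattern=json.dumps({
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
    }),
)
```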

๐Ÿ’ผ REL06-BP04 Automate responses (Real-time processing and alarming)

Use automation to take action when an event is detected, for example, to replace failed components. Automated real-time processing of alarms is implemented so that systems can take quick corrective action and attempt to prevent failures or degraded service when alarms are triggered. Automated responses to alarms could include the replacement of failing components, the adjustment of compute capacity, the redirection of traffic to healthy hosts, Availability Zones, or other Regions, and the notification of operators. **Desired outcome** Real-time alarms are identified, and automated processing of alarms is set up to invoke the appropriate actions taken to maintain service-level objectives (SLOs) and service-level agreements (SLAs). Automation can range from self-healing activities of single components to full-site failover. **Common anti-patterns** - Not having a clear inventory or catalog of key real-time alarms. - No automated responses on critical alarms (for example, automatically scaling compute when it is nearing exhaustion). - Contradictory alarm response actions. - No standard operating procedures (SOPs) for operators to follow when they receive alert notifications. - Not monitoring configuration changes, as undetected configuration changes can cause downtime for workloads. - Not having a strategy to undo unintended configuration changes. **Benefits of establishing this best practice** Automating alarm processing can improve system resiliency. The system takes corrective actions automatically, reducing error-prone manual intervention. The workload meets its availability goals with less service disruption. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance To effectively manage alerts and automate their response, categorize alerts based on their criticality and impact, document response procedures, and plan responses before ranking tasks. Identify tasks requiring specific actions (often detailed in runbooks), and examine all runbooks and playbooks to determine which tasks can be automated. If actions can be defined, often they can be automated. If actions cannot be automated, document the manual steps in an SOP and train operators on them. Continually challenge manual processes for automation opportunities, and establish and maintain a plan to automate alert responses. ### Implementation steps 1. Create an inventory of alarms: To obtain a list of all alarms, you can use the AWS CLI using the Amazon CloudWatch command describe-alarms. Depending upon how many alarms you have set up, you might have to use pagination to retrieve a subset of alarms for each call, or alternatively you can use the AWS SDK to obtain the alarms using an API call. 2. Document all alarm actions: Update a runbook with all alarms and their actions, irrespective of whether they are manual or automated. AWS Systems Manager provides predefined runbooks. 3. Set up and manage alarm actions: For any of the alarms that require an action, specify the automated action using the CloudWatch SDK. For example, you can change the state of your Amazon EC2 instances automatically based on a CloudWatch alarm by creating and enabling actions on an alarm or disabling actions on an alarm. You can also use Amazon EventBridge to respond automatically to system events, such as application availability issues or resource changes. You can create rules to indicate which events you're interested in, and the actions to take when an event matches a rule. 
The actions that can be automatically initiated include invoking an AWS Lambda function, invoking Amazon EC2 Run Command, and relaying the event to Amazon Kinesis Data Streams. For more information, see Automate Amazon EC2 using EventBridge. 4. Standard operating procedures (SOPs): Based on your application components, AWS Resilience Hub recommends multiple SOP templates. You can use these SOPs to document all the processes an operator should follow in case an alert is raised. You can also construct an SOP based on Resilience Hub recommendations, for which you need a Resilience Hub application with an associated resiliency policy, as well as a historic resiliency assessment against that application. The recommendations for your SOP are produced by the resiliency assessment. Resilience Hub works with Systems Manager to automate the steps of your SOPs by providing a number of SSM documents you can use as the basis for those SOPs. For example, Resilience Hub may recommend an SOP for adding disk space based on an existing SSM automation document. 5. Perform automated actions using Amazon DevOps Guru: You can use Amazon DevOps Guru to automatically monitor application resources for anomalous behavior and deliver targeted recommendations to speed up problem identification and remediation times. With DevOps Guru, you can monitor streams of operational data in near real time from multiple sources, including Amazon CloudWatch metrics, AWS Config, AWS CloudFormation, and AWS X-Ray. You can also use DevOps Guru to automatically create OpsItems in OpsCenter and send events to EventBridge for additional automation.
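As a sketch of implementation step 1, the example below builds an alarm inventory with the paginated DescribeAlarms API so you can review which alarms already have automated actions attached. The interpretation of "no actions means a manual runbook is needed" is an illustrative assumption.

```python
"""Sketch: inventory CloudWatch alarms and their configured actions."""
import boto3

cloudwatch = boto3.client("cloudwatch")
inventory = []

# The paginator handles NextToken for accounts with many alarms.
for page in cloudwatch.get_paginator("describe_alarms").paginate():
    for alarm in page.get("MetricAlarms", []) + page.get("CompositeAlarms", []):
        inventory.append({
            "name": alarm["AlarmName"],
            "actions": alarm.get("AlarmActions", []),
            # Alarms with no actions are candidates for automation or at
            # least a documented manual SOP.
            "automated": bool(alarm.get("AlarmActions")),
        })

for item in inventory:
    print(item)
```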

๐Ÿ’ผ REL06-BP05 Analyze logs

Collect log files and metrics histories and analyze these for broader trends and workload insights. Amazon CloudWatch Logs Insights supports a simple yet powerful query language that you can use to analyze log data. Amazon CloudWatch Logs also supports subscriptions that allow data to flow seamlessly to Amazon S3, where you can use Amazon Athena to query the data. Athena also supports queries on a large array of formats. See Supported SerDes and Data Formats in the Amazon Athena User Guide for more information. For analysis of huge log file sets, you can use an Amazon EMR cluster to run petabyte-scale analyses. There are a number of tools provided by AWS Partners and third parties that allow for aggregation, processing, storage, and analytics. These tools include New Relic, Splunk, Loggly, Logstash, CloudHealth, and Nagios. However, outside generation of system and application logs is unique to each cloud provider, and often unique to each service. An often-overlooked part of the monitoring process is data management. You need to determine the retention requirements for monitoring data, and then apply lifecycle policies accordingly. Amazon S3 supports lifecycle management at the S3 bucket level. This lifecycle management can be applied differently to different paths in the bucket. Toward the end of the lifecycle, you can transition data to Amazon S3 Glacier for long-term storage, and then expire it after the retention period is reached. The S3 Intelligent-Tiering storage class is designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance - CloudWatch Logs Insights allows you to interactively search and analyze your log data in Amazon CloudWatch Logs. - Use Amazon CloudWatch Logs subscriptions to send logs to Amazon S3, where you can use Amazon Athena to query the data. - Create an S3 lifecycle policy for your server access logs bucket. Configure the lifecycle policy to periodically remove log files. Doing so reduces the amount of data that Athena analyzes for each query.
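One way the lifecycle guidance above could be expressed in code is shown below: a rule on a log bucket prefix that transitions older objects to Glacier and expires them at the end of retention. The bucket name, prefix, and day counts are assumptions to adapt to your own retention requirements.

```python
"""Sketch: S3 lifecycle policy for a log bucket (transition, then expire)."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-access-logs-bucket",           # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "log-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "server-access-logs/"},
            # Move to long-term storage after 90 days, delete after a year.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```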

๐Ÿ’ผ REL06-BP06 Regularly review monitoring scope and metrics

Frequently review how workload monitoring is implemented, and update it as your workload and its architecture evolve. Regular audits of your monitoring help reduce the risk of missed or overlooked trouble indicators and further help your workload meet its availability goals. Effective monitoring is anchored in key business metrics, which evolve as your business priorities change. Your monitoring review process should emphasize service-level indicators (SLIs) and incorporate insights from your infrastructure, applications, clients, and users. **Desired outcome** You have an effective monitoring strategy that is reviewed and updated periodically, as well as after any significant events or changes. You verify that key application health indicators are still relevant as your workload and business requirements evolve. **Common anti-patterns** - You collect only default metrics. - You set up a monitoring strategy, but you never review it. - You don't discuss monitoring when major changes are deployed. - You trust outdated metrics to determine workload health. - Your operations teams are overwhelmed with false-positive alerts due to outdated metrics and thresholds. - You lack observability of application components that are not being monitored. - You focus only on low-level technical metrics and exclude business metrics from your monitoring. **Benefits of establishing this best practice** When you regularly review your monitoring, you can anticipate potential problems and verify that you are capable of detecting them. It also allows you to uncover blind spots that you might have missed during earlier reviews, which further improves your ability to detect issues. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Review monitoring metrics and scope during your operational readiness review (ORR) process. Perform periodic operational readiness reviews on a consistent schedule to evaluate whether there are any gaps between your current workload and the monitoring you have configured. Establish a regular cadence for operational performance reviews and knowledge sharing to enhance your ability to achieve higher performance from your operational teams. Validate whether existing alert thresholds are still adequate, and check for situations where operational teams are receiving false-positive alerts or not monitoring aspects of the application that should be monitored. The Resilience Analysis Framework provides useful guidance that can help you navigate the process. The focus of the framework is to identify potential failure modes and the preventive and corrective controls you can use to mitigate their impact. This knowledge can help you identify the right metrics and events to monitor and alert upon. ### Implementation steps 1. Schedule and conduct regular reviews of the workload dashboards. You may have different cadences for the depth at which you inspect. 2. Inspect for trends in the metrics. Compare the metric values to historic values to see if there are trends that may indicate something needs investigation. Examples of this include increased latency, decreased primary business function, and increased failure responses. 3. Inspect for outliers and anomalies in your metrics, which can be masked by averages or medians. Look at the highest and lowest values during the time frame, and investigate the causes of observations that are far outside of normal bounds. 
As you continue to remove these causes, you can tighten your expected metric bounds in response to the improved consistency of your workload performance. 4. Look for sharp changes in behavior. An immediate change in the magnitude or direction of a metric may indicate a change in the application or in external factors, and you may need to add metrics to track it. 5. Review whether the current monitoring strategy remains relevant for the application. Based on an analysis of previous incidents (or the Resilience Analysis Framework), assess whether there are additional aspects of the application that should be incorporated into the monitoring scope. 6. Review your Real User Monitoring (RUM) metrics to determine whether there are any gaps in application functionality coverage. 7. Review your change management process. Update your procedures if necessary to include a monitoring analysis step that should be performed before you approve a change. 8. Implement monitoring review as part of your operational readiness review and correction of error processes.

๐Ÿ’ผ REL06-BP07 Monitor end-to-end tracing of requests through your system

Trace requests as they travel through service components so product teams can more easily analyze and debug issues and improve performance. **Desired outcome** Workloads with comprehensive tracing across all components are easy to debug, improving mean time to resolution (MTTR) of errors and latency by simplifying root cause discovery. End-to-end tracing reduces the time it takes to discover impacted components and drill into the detailed root causes of errors or latency. **Common anti-patterns** - Tracing is used for some components but not for all. For example, without tracing for AWS Lambda, teams might not clearly understand latency caused by cold starts in a spiky workload. - Synthetic canaries or real-user monitoring (RUM) are not configured with tracing. Without canaries or RUM, client interaction telemetry is omitted from the trace analysis, yielding an incomplete performance profile. - Hybrid workloads include both cloud-native and third-party tracing tools, but steps have not been taken to select and fully integrate a single tracing solution. Based on the selected tracing solution, cloud-native tracing SDKs should be used to instrument components that are not cloud native, or third-party tools should be configured to ingest cloud-native trace telemetry. **Benefits of establishing this best practice** When development teams are alerted to issues, they can see a full picture of system component interactions, including component-by-component correlation to logging, performance, and failures. Because tracing makes it easy to visually identify root causes, less time is spent investigating root causes. Teams that understand component interactions in detail make better and faster decisions when resolving issues. Decisions like when to invoke disaster recovery (DR) failover or where to best implement self-healing strategies can be improved by analyzing system traces, ultimately improving customer satisfaction with your services. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Teams that operate distributed applications can use tracing tools to establish a correlation identifier, collect traces of requests, and build service maps of connected components. All application components should be included in request traces, including service clients, middleware gateways and event buses, compute components, and storage, including key-value stores and databases. Include synthetic canaries and real-user monitoring in your end-to-end tracing configuration to measure remote client interactions and latency so that you can accurately evaluate your system's performance against your service-level agreements and objectives. You can use AWS X-Ray and Amazon CloudWatch Application Monitoring instrumentation services to provide a complete view of requests as they travel through your application. X-Ray collects application telemetry and allows you to visualize and filter it across payloads, functions, traces, services, and APIs, and it can be turned on for system components with no-code or low-code changes. CloudWatch application monitoring includes ServiceLens to integrate your traces with metrics, logs, and alarms. CloudWatch application monitoring also includes synthetics to monitor your endpoints and APIs, as well as real-user monitoring to instrument your web application clients. ### Implementation steps 1. Use AWS X-Ray on all supported native services like Amazon S3, AWS Lambda, and Amazon API Gateway. 
These AWS services enable X-Ray with configuration toggles using infrastructure as code, AWS SDKs, or the AWS Management Console. 2. Instrument applications with the AWS Distro for OpenTelemetry and X-Ray SDKs, or with third-party collection agents. 3. Review the AWS X-Ray Developer Guide for programming-language-specific implementation details. These documentation sections detail how to instrument HTTP requests, SQL queries, and other processes specific to your application programming language. 4. Use X-Ray tracing for Amazon CloudWatch Synthetics canaries and Amazon CloudWatch RUM to analyze the request path from your end-user client through your downstream AWS infrastructure. 5. Configure CloudWatch metrics and alarms based on resource health and canary telemetry so that teams are alerted to issues quickly, and can then deep dive into traces and service maps with ServiceLens. 6. Enable X-Ray integration for third-party tracing tools like Datadog, New Relic, or Dynatrace if you are using third-party tools for your primary tracing solution.
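A rough sketch of step 2 for a Python service using the AWS X-Ray SDK: patching supported libraries so downstream AWS calls appear in traces, plus a custom subsegment around business logic. The function, table name, and the assumption that a parent segment already exists (as it does inside Lambda with active tracing or behind the X-Ray web middleware) are illustrative.

```python
"""Sketch: instrumenting a Python component with the AWS X-Ray SDK."""
import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Avoid hard failures if no parent segment is open in local testing.
xray_recorder.configure(context_missing="LOG_ERROR")

# Auto-instrument supported clients (boto3, requests, and others) so their
# calls show up as subsegments in the trace and on the service map.
patch_all()

@xray_recorder.capture("process_order")   # custom subsegment for business logic
def process_order(order_id: str) -> None:
    table = boto3.resource("dynamodb").Table("orders")  # hypothetical table; call is traced
    table.get_item(Key={"order_id": order_id})
```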

๐Ÿ’ผ REL07-BP01 Use automation when obtaining or scaling resources

A cornerstone of reliability in the cloud is the programmatic definition, provisioning, and management of your infrastructure and resources. Automation helps you streamline resource provisioning, facilitate consistent and secure deployments, and scale resources across your entire infrastructure. **Desired outcome** You manage your infrastructure as code (IaC). You define and maintain your infrastructure code in version control systems (VCS). You delegate provisioning AWS resources to automated mechanisms and leverage managed services like Application Load Balancer (ALB), Network Load Balancer (NLB), and Auto Scaling groups. You provision your resources using continuous integration/continuous delivery (CI/CD) pipelines so that code changes automatically initiate resource updates, including updates to your Auto Scaling configurations. **Common anti-patterns** - You deploy resources manually using the command line or at the AWS Management Console (also known as click-ops). - You tightly couple your application components or resources, and create inflexible architectures as a result. - You implement inflexible scaling policies that do not adapt to changing business requirements, traffic patterns, or new resource types. - You manually estimate capacity to meet anticipated demand. **Benefits of establishing this best practice** Infrastructure as code (IaC) allows infrastructure to be defined programmatically. This helps you manage infrastructure changes through the same software development lifecycle as application changes, which promotes consistency and repeatability and reduces the risk of manual, error-prone tasks. You can further streamline the process of provisioning and updating resources through implementing IaC with automated delivery pipelines. You can deploy infrastructure updates reliably and efficiently without the need for manual intervention. This agility is particularly important when scaling resources to meet fluctuating demands. You can achieve dynamic, automated resource scaling in conjunction with IaC and delivery pipelines. By monitoring key metrics and applying predefined scaling policies, Auto Scaling can automatically provision or deprovision resources as needed, which improves performance and cost-efficiency. This reduces the potential for manual errors or delays in response to changes in application or workload requirements. The combination of IaC, automated delivery pipelines, and Auto Scaling helps organizations provision, update, and scale their environments with confidence. This automation is essential to maintain a responsive, resilient, and efficiently-managed cloud infrastructure. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To set up automation with CI/CD pipelines and infrastructure as code (IaC) for your AWS architecture, choose a version control system such as Git to store your IaC templates and configuration. These templates can be written using tools such as AWS CloudFormation. To start, define your infrastructure components (such as AWS VPCs, Amazon EC2 Auto Scaling Groups, and Amazon RDS databases) within these templates. Next, integrate these IaC templates with a CI/CD pipeline to automate the deployment process. AWS CodePipeline provides a seamless AWS-native solution, or you can use other third-party CI/CD solutions. Create a pipeline that activates when changes occur to your version control repository. 
Configure the pipeline to include stages that lint and validate your IaC templates, deploy the infrastructure to a staging environment, run automated tests, and finally, deploy to production. Incorporate approval steps where necessary to maintain control over changes. This automated pipeline not only speeds up deployment but also facilitates consistency and reliability across environments. Configure Auto Scaling of resources such as Amazon EC2 instances, Amazon ECS tasks, and database replicas in your IaC to provide automatic scale-out and scale-in as needed. This approach enhances application availability and performance and optimizes cost by dynamically adjusting resources based on demand. For a list of supported resources, see Amazon EC2 Auto Scaling and AWS Auto Scaling. ### Implementation steps 1. Create and use a source code repository to store the code that controls your infrastructure configuration. Commit changes to this repository to reflect any ongoing changes you want to make. 2. Select an infrastructure as code solution such as AWS CloudFormation to keep your infrastructure up to date and detect inconsistency (drift) from your intended state. 3. Integrate your IaC platform with your CI/CD pipeline to automate deployments. 4. Determine and collect the appropriate metrics for automatic scaling of resources. 5. Configure automatic scaling of resources using scale-out and scale-in policies appropriate for your workload components. Consider using scheduled scaling for predictable usage patterns. 6. Monitor deployments to detect failures and regressions. Implement rollback mechanisms within your CI/CD platform to revert changes if necessary.
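As a hedged sketch of a single pipeline deployment stage, the example below validates a CloudFormation template and then creates or updates a stack, waiting for completion before tests would run. The stack name, template path, and the create-or-update error handling are assumptions; a managed pipeline such as AWS CodePipeline would typically orchestrate this step for you.

```python
"""Sketch: validate and deploy an IaC template from a CI/CD stage."""
import boto3
from botocore.exceptions import ClientError

cloudformation = boto3.client("cloudformation")
stack_name = "my-workload-staging"        # hypothetical stack name

with open("template.yaml") as handle:     # hypothetical template path
    template_body = handle.read()

# Lint/validate before attempting a deployment.
cloudformation.validate_template(TemplateBody=template_body)

waiter_name = None
try:
    cloudformation.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_IAM"],
    )
    waiter_name = "stack_update_complete"
except ClientError as error:
    message = str(error)
    if "does not exist" in message:
        cloudformation.create_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            Capabilities=["CAPABILITY_IAM"],
        )
        waiter_name = "stack_create_complete"
    elif "No updates are to be performed" not in message:
        raise

# Block until the deployment finishes so later stages test the new infrastructure.
if waiter_name:
    cloudformation.get_waiter(waiter_name).wait(StackName=stack_name)
```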

๐Ÿ’ผ REL07-BP02 Obtain resources upon detection of impairment to a workload

Scale resources reactively when necessary to restore workload availability if it is impacted. You first must configure health checks and the criteria on these checks to indicate when availability is impacted by lack of resources. Then, either notify the appropriate personnel to manually scale the resource, or start automation to automatically scale it. Scale can be manually adjusted for your workload (for example, by changing the number of EC2 instances in an Auto Scaling group, or modifying throughput of a DynamoDB table through the AWS Management Console or AWS CLI). However, automation should be used whenever possible (refer to REL07-BP01 Use automation when obtaining or scaling resources). **Desired outcome:** Scaling activities (whether automatic or manual) are initiated to restore availability upon detection of a failure or degraded customer experience. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Implement observability and monitoring across all components in your workload to monitor customer experience and detect failure. Define the procedures, manual or automated, that scale the required resources. For more information, see REL11-BP01 Monitor all components of the workload to detect failures. ### Implementation steps - Define the procedures, manual or automated, that scale the required resources. - Scaling procedures depend on how the different components within your workload are designed. - Scaling procedures also vary depending on the underlying technology utilized. - Components using AWS Auto Scaling can use scaling plans to configure a set of instructions for scaling your resources. If you work with AWS CloudFormation or add tags to AWS resources, you can set up scaling plans for different sets of resources per application. Auto Scaling provides recommendations for scaling strategies customized to each resource. After you create your scaling plan, Auto Scaling combines dynamic scaling and predictive scaling methods together to support your scaling strategy. - Amazon EC2 Auto Scaling verifies that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling groups. You can specify the minimum and maximum number of instances in each Auto Scaling group, and Amazon EC2 Auto Scaling ensures that your group never goes below or above these limits. - Amazon DynamoDB auto scaling uses the Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This allows a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling.
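A minimal sketch of a reactive scaling action that could run from an alarm handler (for example, a Lambda function behind an EventBridge rule) to add capacity to an Auto Scaling group when availability is impaired. The group name and increment are assumptions, and the logic stays within the group's configured maximum.

```python
"""Sketch: reactive scale-out of an Auto Scaling group on detected impairment."""
import boto3

autoscaling = boto3.client("autoscaling")
GROUP = "web-tier-asg"  # hypothetical Auto Scaling group name

def scale_out(increment: int = 2) -> None:
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[GROUP]
    )["AutoScalingGroups"][0]
    # Respect the configured maximum so the action cannot over-provision.
    target = min(group["DesiredCapacity"] + increment, group["MaxSize"])
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=GROUP,
        DesiredCapacity=target,
        HonorCooldown=False,   # respond immediately to the detected impairment
    )
```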

๐Ÿ’ผ REL07-BP03 Obtain resources upon detection that more resources are needed for a workload

One of the most valuable features of cloud computing is the ability to provision resources dynamically. In traditional on-premises compute environments, you must identify and provision enough capacity in advance to serve peak demand. This is a problem because it is expensive and because it poses risks to availability if you underestimate the workload's peak capacity needs. In the cloud, you don't have to do this. Instead, you can provision compute, database, and other resource capacity as needed to meet current and forecasted demand. Automated solutions such as Amazon EC2 Auto Scaling and Application Auto Scaling can bring resources online for you based on metrics you specify. This can make the scaling process easier and more predictable, and it can make your workload significantly more reliable by ensuring you have enough resources available at all times. **Desired outcome:** You configure automatic scaling of compute and other resources to meet demand. You provide sufficient headroom in your scaling policies to allow bursts of traffic to be served while additional resources are brought online. **Common anti-patterns:** - You provision a fixed number of scalable resources. - You choose a scaling metric that does not correlate to actual demand. - You fail to provide enough headroom in your scaling plans to accommodate demand bursts. - Your scaling policies add capacity too late, which leads to capacity exhaustion and degraded service while additional resources are brought online. - You fail to correctly configure minimum and maximum resource counts, which leads to scaling failures. **Benefits of establishing this best practice:** Having enough resources to meet current demand is critical to provide high availability of your workload and adhere to your defined service-level objectives (SLOs). Automatic scaling allows you to provide the right amount of compute, database, and other resources your workload needs in order to serve current and forecasted demand. You don't need to determine peak capacity needs and statically allocate resources to serve it. Instead, as demand grows, you can allocate more resources to accommodate it, and after demand falls, you can deactivate resources to reduce cost. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance First, determine whether the workload component is suitable for automatic scaling. Such components are called horizontally scalable because they provide the same resources and behave identically. Examples of horizontally-scalable components include EC2 instances that are configured alike, Amazon Elastic Container Service (ECS) tasks, and pods running on Amazon Elastic Kubernetes Service (EKS). These compute resources are typically located behind a load balancer and are referred to as replicas. Other replicated resources may include database read replicas, Amazon DynamoDB tables, and Amazon ElastiCache (Redis OSS) clusters. For a complete list of supported resources, see AWS services that you can use with Application Auto Scaling. For container-based architectures, you may need to scale two different ways. First, you may need to scale the containers that provide horizontally-scalable services. Second, you may need to scale the compute resources to make space for new containers. Different automatic scaling mechanisms exist for each layer. To scale ECS tasks, you can use Application Auto Scaling. 
To scale Kubernetes pods, you can use Horizontal Pod Autoscaler (HPA) or Kubernetes Event-driven Autoscaling (KEDA). To scale the compute resources, you can use Capacity Providers for ECS, or for Kubernetes, you can use Karpenter or Cluster Autoscaler. Next, select how you will perform automatic scaling. There are three major options: metric-based scaling, scheduled scaling, and predictive scaling. ### Metric-based scaling Metric-based scaling provisions resources based on the value of one or more scaling metrics. A scaling metric is one that corresponds to your workload's demand. A good way to determine appropriate scaling metrics is to perform load testing in a non-production environment. During your load tests, keep the number of scalable resources fixed, and slowly increase demand (for example, throughput, concurrency, or simulated users). Then look for metrics that increase (or decrease) as demand grows, and conversely decrease (or increase) as demand falls. Typical scaling metrics include CPU utilization, work queue depth (such as an Amazon SQS queue), number of active users, and network throughput. **Note:** AWS has observed that with most applications, memory utilization increases as the application warms up and then reaches a steady value. When demand decreases, memory utilization typically remains elevated rather than decreasing in parallel. Because memory utilization does not correspond to demand in both directionsโ€“that is, growing and falling with demandโ€“consider carefully before you select this metric for automatic scaling. Metric-based scaling is a latent operation. It can take several minutes for utilization metrics to propagate to auto scaling mechanisms, and these mechanisms typically wait for a clear signal of increased demand before reacting. Then, as the auto scaler creates new resources, it can take additional time for them to come to full service. Because of this, it is important to not set your scaling metric targets too close to full utilization (for example, 90% CPU utilization). Doing so risks exhausting existing resource capacity before additional capacity can come online. Typical resource utilization targets can range between 50-70% for optimum availability, depending on demand patterns and time required to provision additional resources. ### Scheduled scaling Scheduled scaling provisions or removes resources based on the calendar or time of day. It is frequently used for workloads that have predictable demand, such as peak utilization during weekday business hours or sales events. Both Amazon EC2 Auto Scaling and Application Auto Scaling support scheduled scaling. KEDA's cron scaler supports scheduled scaling of Kubernetes pods. ### Predictive scaling Predictive scaling uses machine learning to automatically scale resources based on anticipated demand. Predictive scaling analyzes the historical value of a utilization metric you provide and continuously predicts its future value. The predicted value is then used to scale the resource up or down. Amazon EC2 Auto Scaling can perform predictive scaling. ### Implementation steps 1. Determine whether the workload component is suitable for automatic scaling. 2. Determine what kind of scaling mechanism is most appropriate for the workload: metric-based scaling, scheduled scaling, or predictive scaling. 3. Select the appropriate automatic scaling mechanism for the component. For Amazon EC2 instances, use Amazon EC2 Auto Scaling. For other AWS services, use Application Auto Scaling. 
For Kubernetes pods (such as those running in an Amazon EKS cluster), consider Horizontal Pod Autoscaler (HPA) or Kubernetes Event-driven Autoscaling (KEDA). For Kubernetes or EKS nodes, consider Karpenter and Cluster Auto Scaler (CAS). 4. For metric or scheduled scaling, conduct load testing to determine the appropriate scaling metrics and target values for your workload. For scheduled scaling, determine the number of resources needed at the dates and times you select. Determine the maximum number of resources needed to serve expected peak traffic. 5. Configure the auto scaler based on the information collected above. Consult the auto scaling service's documentation for details. Verify that the maximum and minimum scaling limits are configured correctly. 6. Verify the scaling configuration is working as expected. Perform load testing in a non-production environment and observe how the system reacts, and adjust as needed. When enabling auto scaling in production, configure appropriate alarms to notify you of any unexpected behavior.
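One way to express the metric-based guidance above is a target tracking policy on an EC2 Auto Scaling group that keeps average CPU near 60% to preserve headroom. The group name, policy name, and target value in this sketch are assumptions.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical Auto Scaling group; the group must already exist.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        # Leave headroom so new instances can come online before capacity is exhausted.
        "TargetValue": 60.0,
    },
)
```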

๐Ÿ’ผ REL07-BP04 Load test your workload

Adopt a load testing methodology to measure if scaling activity meets workload requirements. It's important to perform sustained load testing. Load tests should discover the breaking point and test the performance of your workload. AWS makes it easy to set up temporary testing environments that model the scale of your production workload. In the cloud, you can create a production-scale test environment on demand, complete your testing, and then decommission the resources. Because you only pay for the test environment when it's running, you can simulate your live environment for a fraction of the cost of testing on premises.

Load testing in production should also be considered as part of game days, where the production system is stressed during hours of lower customer usage, with all personnel on hand to interpret results and address any problems that arise.

**Common anti-patterns:**

- Performing load testing on deployments that do not have the same configuration as your production deployment.
- Performing load testing only on individual pieces of your workload, and not on the entire workload.
- Performing load testing with a subset of requests and not a representative set of actual requests.
- Performing load testing to a small safety factor above expected load.

**Benefits of establishing this best practice:** You know which components in your architecture fail under load and can identify the metrics to watch that indicate you are approaching that load, in time to address the problem and prevent the impact of that failure.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

- Perform load testing to identify which aspect of your workload indicates that you must add or remove capacity. Load testing should use representative traffic similar to what you receive in production. Increase the load while watching the metrics you have instrumented to determine which metric indicates when you must add or remove resources.
  - **Distributed Load Testing on AWS:** simulate thousands of connected users
- Identify the mix of requests. You may have varied mixes of requests, so you should look at various time frames when identifying the mix of traffic.
- Implement a load driver. You can use custom code, open source, or commercial software to implement a load driver.
- Load test initially using small capacity. You see some immediate effects by driving load onto a smaller capacity, possibly as small as one instance or container.
- Load test against larger capacity. The effects will be different on a distributed load, so you must test against an environment that is as close to production as possible.
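Where a custom load driver is appropriate, a minimal sketch like the following can ramp simulated users against a non-production endpoint while you watch your instrumented metrics. The URL, user counts, and request volumes are assumptions, not a recommendation for a specific tool.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint in a production-scale, non-production test environment.
TARGET_URL = "https://test.example.com/health"

def hit(_):
    """Issue one request and return (success, latency_seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(TARGET_URL, timeout=5) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

# Ramp simulated users step by step while keeping the scalable resources fixed.
for users in (10, 50, 100, 200):
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(hit, range(users * 5)))
    errors = sum(1 for ok, _ in results if not ok)
    avg_latency = sum(lat for _, lat in results) / len(results)
    print(f"{users} users: avg latency {avg_latency:.3f}s, errors {errors}/{len(results)}")
```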

๐Ÿ’ผ REL08-BP01 Use runbooks for standard activities such as deployment

Runbooks are the predefined procedures to achieve specific outcomes. Use runbooks to perform standard activities, whether done manually or automatically. Examples include deploying a workload, patching a workload, or making DNS modifications. For example, put processes in place to ensure rollback safety during deployments. Ensuring that you can roll back a deployment without any disruption for your customers is critical in making a service reliable. For runbook procedures, start with a valid effective manual process, implement it in code, and invoke it to automatically run where appropriate. Even for sophisticated workloads that are highly automated, runbooks are still useful for running game days or meeting rigorous reporting and auditing requirements. Note that playbooks are used in response to specific incidents, and runbooks are used to achieve specific outcomes. Often, runbooks are for routine activities, while playbooks are used for responding to non-routine events. **Common anti-patterns:** - Performing unplanned changes to configuration in production. - Skipping steps in your plan to deploy faster, resulting in a failed deployment. - Making changes without testing the reversal of the change. **Benefits of establishing this best practice:** Effective change planning increases your ability to successfully run the change because you are aware of all the systems impacted. Validating your change in test environments increases your confidence. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance - Provide consistent and prompt responses to well-understood events by documenting procedures in runbooks. - Use the principle of infrastructure as code to define your infrastructure. By using AWS CloudFormation (or a trusted third party) to define your infrastructure, you can use version control software to version and track changes. - Use AWS CloudFormation (or a trusted third-party provider) to define your infrastructure. - Create templates that are singular and decoupled, using good software design principles. - Determine the permissions, templates, and responsible parties for implementation. - Use a hosted source code management system based on a popular technology such as Git to store your source code and infrastructure as code (IaC) configuration.
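Once a runbook step such as "deploy the stack" is captured as infrastructure as code, it can be invoked from automation rather than followed by hand. A minimal sketch, assuming a hypothetical stack name and a template already stored in a version-controlled S3 bucket:

```python
import boto3

cfn = boto3.client("cloudformation")

# Deploy the versioned template as one runbook step.
cfn.create_stack(
    StackName="payments-service",
    TemplateURL="https://s3.amazonaws.com/example-iac-bucket/payments-service.yaml",
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "staging"}],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until the deployment step completes before the runbook continues.
cfn.get_waiter("stack_create_complete").wait(StackName="payments-service")
```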

๐Ÿ’ผ REL08-BP02 Integrate functional testing as part of your deployment

Use techniques such as unit tests and integration tests that validate required functionality. Unit testing is the process where you test the smallest functional unit of code to validate its behavior. Integration testing seeks to validate that each application feature works according to the software requirements. While unit tests focus on testing part of an application in isolation, integration tests consider side effects (for example, the effect of data being changed through a mutation operation). In either case, tests should be integrated into a deployment pipeline, and if success criteria are not met, the pipeline is halted or rolled back. These tests are run in a pre-production environment, which is staged prior to production in the pipeline. You achieve the best outcomes when these tests are run automatically as part of build and deployment actions. For instance, with AWS CodePipeline, developers commit changes to a source repository where CodePipeline automatically detects the changes. The application is built, and unit tests are run. After the unit tests have passed, the built code is deployed to staging servers for testing. From the staging server, CodePipeline runs more tests, such as integration or load tests. Upon the successful completion of those tests, CodePipeline deploys the tested and approved code to production instances. **Desired outcome:** You use automation to perform unit and integration tests to validate that your code behaves as expected. These tests are integrated into the deployment process, and a test failure aborts the deployment. **Common anti-patterns:** - You ignore or bypass test failures and plans during the deployment process in order to accelerate the deployment timeline. - You manually perform tests outside the deployment pipeline. - You skip testing steps in the automation through manual emergency workflows. - You run automated tests in an environment that does not closely resemble the production environment. - You build a test suite that is insufficiently flexible and is difficult to maintain, update, or scale as the application evolves. **Benefits of establishing this best practice:** Automated testing during the deployment process catches issues early, which reduces the risk of a release to production with bugs or unexpected behavior. Unit tests validate the code behaves as desired and API contracts are honored. Integration tests validate that the system operates according to specified requirements. These types of tests verify the intended working order of components such as user interfaces, APIs, databases, and source code. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Adopt a test-driven development (TDD) approach to writing software, where you develop test cases to specify and validate your code. To start, create test cases for each function. If the test fails, you write new code to pass the test. This approach helps you validate the expected result of each function. Run unit tests and validate that they pass before you commit code to a source code repository. Implement both unit and integration tests as part of the build, test, and deployment stages of the CI/CD pipeline. Automate testing, and automatically initiate tests whenever a new version of the application is ready to be deployed. If success criteria are not met, the pipeline is halted or rolled back. If the application is a web or mobile app, perform automated integration testing on multiple desktop browsers or real devices. 
This approach is particularly useful to validate the compatibility and functionality of mobile apps across a diverse range of devices. ### Implementation steps 1. Write unit tests before you write functional code (test-driven development, or TDD). Establish code guidelines so that writing and running unit tests are a non-functional coding requirement. 2. Create a suite of automated integration tests that cover the identified testable functionalities. These tests should simulate user interactions and validate the expected outcomes. 3. Create the necessary test environment to run the integration tests. This may include staging or pre-production environments that closely mimic the production environment. 4. Set up your source, build, test, and deploy stages using the AWS CodePipeline console or AWS Command Line Interface (CLI). 5. Deploy the application once the code has been built and tested. AWS CodeDeploy can deploy it to your staging (testing) and production environments. These environments may include Amazon EC2 instances, AWS Lambda functions, or on-premises servers. The same deployment mechanism should be used to deploy the application to all environments. 6. Monitor the progress of your pipeline and the status of each stage. Use quality checks to block the pipeline based on the status of your tests. You can also receive notifications for any pipeline stage failure or pipeline completion. 7. Continually monitor the results of the tests, and look for patterns, regressions or areas that require more attention. Use this information to improve the test suite, identify areas of the application that need more robust testing, and optimize the deployment process.
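As a small illustration of step 1, a test-first unit test can be written before the code it exercises exists. The `pricing` module and `apply_discount` function here are hypothetical placeholders for your own code under test.

```python
# test_discount.py - written before the implementation exists (TDD).
import pytest

from pricing import apply_discount  # hypothetical module under test


def test_ten_percent_discount():
    # Specifies the expected behavior; the implementation is written to pass this.
    assert apply_discount(total=100.0, percent=10) == 90.0


def test_discount_cannot_exceed_total():
    with pytest.raises(ValueError):
        apply_discount(total=100.0, percent=150)
```

Running these tests in the pipeline's build stage, and failing the pipeline when they fail, is what turns the tests into the deployment gate described above.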

๐Ÿ’ผ REL08-BP03 Integrate resiliency testing as part of your deployment

Integrate resiliency testing by consciously introducing failures in your system to measure its capability in case of disruptive scenarios. Resilience tests are different from unit and function tests that are usually integrated in deployment cycles, as they focus on the identification of unanticipated failures in your system. While it is safe to start with resiliency testing integration in pre-production, set a goal to implement these tests in production as a part of your game days. **Desired outcome:** Resiliency testing helps build confidence in the system's ability to withstand degradation in production. Experiments identify weak points that could lead to failure, which helps you improve your system to automatically and efficiently mitigate failure and degradation. **Common anti-patterns:** - Lack of observability and monitoring in deployment processes - Reliance on humans to resolve system failures - Poor quality analysis mechanisms - Focus on known issues in a system and a lack of experimentation to identify any unknowns - Identification of failures, but no resolution - No documentation of findings and runbooks **Benefits of establishing best practices:** Resilience testing integrated in your deployments helps to identify unknown issues in the system that otherwise go unnoticed, which can lead to downtime in production. Identification of these unknowns in a system helps you document findings, integrate testing into your CI/CD process, and build runbooks, which simplify mitigation through efficient, repeatable mechanisms. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The most common resiliency testing forms that can be integrated in your system's deployments are disaster recovery and chaos engineering. - Include updates to your disaster recovery plans and standard operating procedures (SOPs) with any significant deployment. - Integrate reliability testing into your automated deployment pipelines. Services such as AWS Resilience Hub can be integrated into your CI/CD pipeline to establish continuous resilience assessments that are automatically evaluated as part of every deployment. - Define your applications in AWS Resilience Hub. Resilience assessments generate code snippets that help you create recovery procedures as AWS Systems Manager documents for your applications and provide a list of recommended Amazon CloudWatch monitors and alarms. - Once your DR plans and SOPs are updated, complete disaster recovery testing to verify that they are effective. Disaster recovery testing helps you determine if you can restore your system after an event and return to normal operations. You can simulate various disaster recovery strategies and identify whether your planning is sufficient to meet your uptime requirements. Common disaster recovery strategies include backup and restore, pilot light, cold standby, warm standby, hot standby, and active-active, and they all differ in cost and complexity. Before disaster recovery testing, we recommend that you define your recovery time objective (RTO) and recovery point objective (RPO) to simplify the choice of strategy to simulate. AWS offers disaster recovery tools like AWS Elastic Disaster Recovery to help you get started with your planning and testing. - Chaos engineering experiments introduce disruptions to the system, such as network outages and service failures. 
By simulating with controlled failures, you can discover your system's vulnerabilities while containing the impacts of the injected failures. Just like the other strategies, run controlled failure simulations in non-production environments using services like AWS Fault Injection Service to gain confidence before deploying in production.
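AWS Fault Injection Service is the managed option named above; as a hand-rolled alternative, a simple controlled experiment in a non-production environment might terminate a workload's instances in one Availability Zone and observe whether the system self-heals. The tag key, tag value, and Availability Zone below are assumptions.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical experiment: simulate loss of one AZ in a non-production environment.
AZ = "us-east-1a"

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:workload", "Values": ["demo-app-test"]},
        {"Name": "availability-zone", "Values": [AZ]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.terminate_instances(InstanceIds=instance_ids)
    print(f"Injected failure: terminated {len(instance_ids)} instances in {AZ}")
```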

๐Ÿ’ผ REL08-BP04 Deploy using immutable infrastructure

Immutable infrastructure is a model that mandates that no updates, security patches, or configuration changes happen in-place on production workloads. When a change is needed, the architecture is built onto new infrastructure and deployed into production. Follow an immutable infrastructure deployment strategy to increase the reliability, consistency, and reproducibility in your workload deployments.

**Desired outcome:** With immutable infrastructure, no in-place modifications are allowed to running infrastructure resources within a workload. Instead, when a change is needed, a new set of updated infrastructure resources containing all the necessary changes is deployed in parallel to your existing resources. This deployment is validated automatically, and if successful, traffic is gradually shifted to the new set of resources. This deployment strategy applies to software updates, security patches, infrastructure changes, configuration updates, and application updates, among others.

**Common anti-patterns:**

- Implementing in-place changes to running infrastructure resources.

**Benefits of establishing this best practice:**

- **Increased consistency across environments:** Since there are no differences in infrastructure resources across environments, consistency is increased and testing is simplified.
- **Reduction in configuration drifts:** By replacing infrastructure resources with a known and version-controlled configuration, the infrastructure is set to a known, tested, and trusted state, avoiding configuration drifts.
- **Reliable atomic deployments:** Deployments either complete successfully or nothing changes, increasing consistency and reliability in the deployment process.
- **Simplified deployments:** Deployments are simplified because they don't need to support upgrades. Upgrades are just new deployments.
- **Safer deployments with fast rollback and recovery processes:** Deployments are safer because the previous working version is not changed. You can roll back to it if errors are detected.
- **Enhanced security posture:** By not allowing changes to infrastructure, remote access mechanisms (such as SSH) can be disabled. This reduces the attack vector, improving your organization's security posture.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

**Automation**

When defining an immutable infrastructure deployment strategy, it is recommended to use automation as much as possible to increase reproducibility and minimize the potential for human error. For more detail, see REL08-BP05 Deploy changes with automation and Automating safe, hands-off deployments.

With infrastructure as code (IaC), infrastructure provisioning, orchestration, and deployment steps are defined in a programmatic, descriptive, and declarative way and stored in a source control system. Leveraging infrastructure as code makes it simpler to automate infrastructure deployment and helps achieve infrastructure immutability.

**Deployment patterns**

When a change in the workload is required, the immutable infrastructure deployment strategy mandates that a new set of infrastructure resources is deployed, including all necessary changes. It is important for this new set of resources to follow a rollout pattern that minimizes user impact. There are two main strategies for this deployment:

- **Canary deployment:** The practice of directing a small number of your customers to the new version, usually running on a single service instance (the canary). You then deeply scrutinize any behavior changes or errors that are generated. You can remove traffic from the canary if you encounter critical problems and send the users back to the previous version. If the deployment is successful, you can continue to deploy at your desired velocity, while monitoring the changes for errors, until you are fully deployed. AWS CodeDeploy can be configured with a deployment configuration that allows a canary deployment.
- **Blue/green deployment:** Similar to the canary deployment, except that a full fleet of the application is deployed in parallel. You alternate your deployments across the two stacks (blue and green). Once again, you can send traffic to the new version, and fall back to the old version if you see problems with the deployment. Commonly, all traffic is switched at once; however, you can also shift fractions of your traffic to each version to dial up the adoption of the new version using the weighted DNS routing capabilities of Amazon Route 53. AWS CodeDeploy and AWS Elastic Beanstalk can be configured with a deployment configuration that allows a blue/green deployment.

**Drift detection**

Drift is defined as any change that causes an infrastructure resource to have a different state or configuration from what is expected. Any type of unmanaged configuration change goes against the notion of immutable infrastructure, and should be detected and remediated in order to have a successful implementation of immutable infrastructure.

### Implementation steps

1. Disallow the in-place modification of running infrastructure resources.
   - You can use AWS Identity and Access Management (IAM) to specify who or what can access services and resources in AWS, centrally manage fine-grained permissions, and analyze access to refine permissions across AWS.
2. Automate the deployment of infrastructure resources to increase reproducibility and minimize the potential for human error.
   - As described in the Introduction to DevOps on AWS whitepaper, automation is a cornerstone with AWS services and is internally supported in all services, features, and offerings.
   - Prebaking your Amazon Machine Images (AMIs) can speed up the time to launch them. EC2 Image Builder is a fully managed AWS service that helps you automate the creation, maintenance, validation, sharing, and deployment of customized, secure, and up-to-date Linux or Windows custom AMIs.
   - Some of the services that support automation are:
     - **AWS Elastic Beanstalk:** A service to rapidly deploy and scale web applications developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, NGINX, Passenger, and IIS.
     - **AWS Proton:** Helps platform teams connect and coordinate all the different tools your development teams need for infrastructure provisioning, code deployments, monitoring, and updates. AWS Proton enables automated infrastructure as code provisioning and deployment of serverless and container-based applications.
   - Leveraging infrastructure as code makes it easy to automate infrastructure deployment and helps achieve infrastructure immutability. AWS provides services that enable the creation, deployment, and maintenance of infrastructure in a programmatic, descriptive, and declarative way.
     - **AWS CloudFormation:** Helps developers create AWS resources in an orderly and predictable fashion. Resources are written in text files using JSON or YAML format. The templates require a specific syntax and structure that depends on the types of resources being created and managed. You author your resources in JSON or YAML with any code editor, check them into a version control system, and then AWS CloudFormation builds the specified services in a safe, repeatable manner.
     - **AWS Serverless Application Model (AWS SAM):** An open-source framework that you can use to build serverless applications on AWS. AWS SAM integrates with other AWS services, and is an extension of AWS CloudFormation.
     - **AWS Cloud Development Kit (AWS CDK):** An open-source software development framework to model and provision your cloud application resources using familiar programming languages. You can use AWS CDK to model application infrastructure using TypeScript, Python, Java, and .NET. AWS CDK uses AWS CloudFormation in the background to provision resources in a safe, repeatable manner.
     - **AWS Cloud Control API:** Introduces a common set of Create, Read, Update, Delete, and List (CRUDL) APIs to help developers manage their cloud infrastructure in an easy and consistent way. The Cloud Control API common APIs allow developers to uniformly manage the lifecycle of AWS and third-party services.
3. Implement deployment patterns that minimize user impact:
   - **Canary deployments:**
     - Set up an API Gateway canary release deployment.
     - Create a pipeline with canary deployments for Amazon ECS using AWS App Mesh.
   - **Blue/green deployments:** The Blue/Green Deployments on AWS whitepaper describes example techniques to implement blue/green deployment strategies.
4. Detect configuration or state drifts.
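For step 4, one way to detect drift on a CloudFormation-managed stack is sketched below. The stack name is an assumption, and the drift result should feed your remediation process.

```python
import time

import boto3

cfn = boto3.client("cloudformation")

# Start drift detection for a hypothetical stack; drift here means an
# out-of-band, in-place change that immutable infrastructure disallows.
detection_id = cfn.detect_stack_drift(StackName="payments-service")["StackDriftDetectionId"]

# Poll until detection finishes.
while True:
    status = cfn.describe_stack_drift_detection_status(
        StackDriftDetectionId=detection_id
    )
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

print("Stack drift status:", status.get("StackDriftStatus", "UNKNOWN"))
```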

๐Ÿ’ผ REL08-BP05 Deploy changes with automation

Deployments and patching are automated to eliminate negative impact. Making changes to production systems is one of the largest risk areas for many organizations. We consider deployments a first-class problem to be solved alongside the business problems that the software addresses. Today, this means the use of automation wherever practical in operations, including testing and deploying changes, adding or removing capacity, and migrating data.

**Desired outcome:** You build automated deployment safety into the release process with extensive pre-production testing, automatic rollbacks, and staggered production deployments. This automation minimizes the potential impact on production caused by failed deployments, and developers no longer need to actively watch deployments to production.

**Common anti-patterns:**

- You perform manual changes.
- You skip steps in your automation through manual emergency workflows.
- You don't follow your established plans and processes in favor of accelerated timelines.
- You perform rapid follow-on deployments without allowing for bake time.

**Benefits of establishing this best practice:** When you use automation to deploy all changes, you remove the potential for introduction of human error and provide the ability to test before you change production. Performing this process prior to the production push verifies that your plans are complete. Additionally, automatic rollback in your release process can identify production issues and return your workload to its previously working operational state.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Automate your deployment pipeline. Deployment pipelines allow you to invoke automated testing and detection of anomalies, and either halt the pipeline at a certain step before production deployment, or automatically roll back a change. An integral part of this is the adoption of a culture of continuous integration and continuous delivery/deployment (CI/CD), where a commit or code change passes through various automated stage gates from the build and test stages to deployment on production environments. Although conventional wisdom suggests that you keep people in the loop for the most difficult operational procedures, we suggest that you automate the most difficult procedures for that very reason.

### Implementation steps

You can automate deployments to remove manual operations by following these steps:

1. Set up a code repository to store your code securely: Use a hosted source code management system based on a popular technology such as Git to store your source code and infrastructure as code (IaC) configuration.
2. Configure a continuous integration service to compile your source code, run tests, and create deployment artifacts: To set up a build project for this purpose, see Getting started with AWS CodeBuild using the console.
3. Set up a deployment service that automates application deployments and handles the complexity of application updates without reliance on error-prone manual deployments: AWS CodeDeploy automates software deployments to a variety of compute services, such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. To configure these steps, see Getting started with CodeDeploy.
4. Set up a continuous delivery service that automates your release pipelines for quicker and more reliable application and infrastructure updates: Consider using AWS CodePipeline to help you automate your release pipelines.
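As a sketch of step 3, the call below starts a staggered CodeDeploy deployment with automatic rollback enabled. The application name, deployment group, and artifact location are assumptions and must already exist in your account.

```python
import boto3

codedeploy = boto3.client("codedeploy")

codedeploy.create_deployment(
    applicationName="payments-service",
    deploymentGroupName="production",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "example-artifacts",
            "key": "payments-service-1.4.2.zip",
            "bundleType": "zip",
        },
    },
    # Stagger the rollout and roll back automatically on failure or alarm.
    deploymentConfigName="CodeDeployDefault.OneAtATime",
    autoRollbackConfiguration={
        "enabled": True,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
    },
)
```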

๐Ÿ’ผ REL09-BP01 Identify and back up all data that needs to be backed up, or reproduce the data from sources

Understand and use the backup capabilities of the data services and resources used by the workload. Most services provide capabilities to back up workload data.

**Desired outcome:** Data sources have been identified and classified based on criticality, and a strategy for data recovery based on the RPO has been established. This strategy involves either backing up these data sources, or having the ability to reproduce data from other sources. In the case of data loss, the strategy implemented allows recovery or the reproduction of data within the defined RPO and RTO.

**Cloud maturity phase:** Foundational

**Common anti-patterns:**

- Not aware of all data sources for the workload and their criticality.
- Not taking backups of critical data sources.
- Taking backups of only some data sources without using criticality as a criterion.
- No defined RPO, or backup frequency cannot meet RPO.
- Not evaluating if a backup is necessary or if data can be reproduced from other sources.

**Benefits of establishing this best practice:** Identifying the places where backups are necessary and implementing a mechanism to create backups, or being able to reproduce the data from an external source, improves the ability to restore and recover data during an outage.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

All AWS data stores offer backup capabilities. Services such as Amazon RDS and Amazon DynamoDB additionally support automated backup that allows point-in-time recovery (PITR), which allows you to restore a backup to any point in time up to five minutes or less before the current time. Many AWS services offer the ability to copy backups to another AWS Region. AWS Backup is a tool that gives you the ability to centralize and automate data protection across AWS services. AWS Elastic Disaster Recovery allows you to copy full server workloads and maintain continuous data protection from on-premises, cross-AZ, or cross-Region sources, with a Recovery Point Objective (RPO) measured in seconds.

Amazon S3 can be used as a backup destination for self-managed and AWS-managed data sources. AWS services such as Amazon EBS, Amazon RDS, and Amazon DynamoDB have built-in capabilities to create backups. Third-party backup software can also be used. On-premises data can be backed up to the AWS Cloud using AWS Storage Gateway or AWS DataSync. Amazon S3 buckets can be used to store this data on AWS. Amazon S3 offers multiple storage tiers, such as Amazon S3 Glacier or S3 Glacier Deep Archive, to reduce the cost of data storage.

You might be able to meet data recovery needs by reproducing the data from other sources. For example, Amazon ElastiCache replica nodes or Amazon RDS read replicas could be used to reproduce data if the primary is lost. In cases where sources like this can be used to meet your Recovery Point Objective (RPO) and Recovery Time Objective (RTO), you might not require a backup. As another example, if you work with Amazon EMR, it might not be necessary to back up your HDFS data store, as long as you can reproduce the data into Amazon EMR from Amazon S3.

When selecting a backup strategy, consider the time it takes to recover data. The time needed to recover data depends on the type of backup (in the case of a backup strategy), or the complexity of the data reproduction mechanism. This time should fall within the RTO for the workload.

### Implementation steps

1. Identify all data sources for the workload. Data can be stored on a number of resources such as databases, volumes, filesystems, logging systems, and object storage. Refer to the Resources section to find Related documents on different AWS services where data is stored, and the backup capability these services provide.
2. Classify data sources based on criticality. Different data sets will have different levels of criticality for a workload, and therefore different requirements for resiliency. For example, some data might be critical and require an RPO near zero, while other data might be less critical and can tolerate a higher RPO and some data loss. Similarly, different data sets might have different RTO requirements as well.
3. Use AWS or third-party services to create backups of the data. AWS Backup is a managed service that allows creating backups of various data sources on AWS. AWS Elastic Disaster Recovery handles automated sub-second data replication to an AWS Region. Most AWS services also have native capabilities to create backups. The AWS Marketplace has many solutions that provide these capabilities as well.
4. For data that is not backed up, establish a data reproduction mechanism. You might choose not to back up data that can be reproduced from other sources for various reasons. There might be a situation where it is cheaper to reproduce data from sources when needed rather than creating a backup, as there may be a cost associated with storing backups. Another example is where restoring from a backup takes longer than reproducing the data from sources, resulting in a breach of the RTO. In such situations, consider the tradeoffs and establish a well-defined process for how data can be reproduced from these sources when data recovery is necessary. For example, if you have loaded data from Amazon S3 to a data warehouse (like Amazon Redshift) or a MapReduce cluster (like Amazon EMR) to do analysis on that data, this may be an example of data that can be reproduced from other sources. As long as the results of these analyses are either stored somewhere or reproducible, you would not suffer a data loss from a failure in the data warehouse or MapReduce cluster. Other examples that can be reproduced from sources include caches (like Amazon ElastiCache) or RDS read replicas.
5. Establish a cadence for backing up data. Creating backups of data sources is a periodic process, and the frequency should depend on the RPO.

**Level of effort for the Implementation Plan:** Moderate
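As a sketch of step 3, the call below requests an on-demand AWS Backup job for a single resource. The vault name, resource ARN, IAM role ARN, and retention period are placeholders and must already exist or be set according to your policy.

```python
import boto3

backup = boto3.client("backup")

backup.start_backup_job(
    BackupVaultName="critical-data-vault",
    ResourceArn="arn:aws:dynamodb:us-east-1:111122223333:table/Orders",
    IamRoleArn="arn:aws:iam::111122223333:role/BackupServiceRole",
    # Retain long enough to satisfy the retention policy for this data class.
    Lifecycle={"DeleteAfterDays": 35},
)
```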

๐Ÿ’ผ REL09-BP02 Secure and encrypt backups

Control and detect access to backups using authentication and authorization. Prevent and detect if data integrity of backups is compromised using encryption. Implement security controls to prevent unauthorized access to backup data. Encrypt backups to protect the confidentiality and integrity of your data.

**Common anti-patterns:**

- Having the same access to the backups and restoration automation as you do to the data.
- Not encrypting your backups.
- Not implementing immutability for protection against deletion or tampering.
- Using the same security domain for production and backup systems.
- Not validating backup integrity through regular testing.

**Benefits of establishing this best practice:**

- Securing your backups prevents tampering with the data, and encryption of the data prevents access to that data if it is accidentally exposed.
- Enhanced protection against ransomware and other cyber threats that target backup infrastructure.
- Reduced recovery time following a cyber incident through validated recovery processes.
- Improved business continuity capabilities during security incidents.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Control and detect access to backups using authentication and authorization, such as AWS Identity and Access Management (IAM). Prevent and detect if data integrity of backups is compromised using encryption.

Amazon S3 supports several methods of encryption of your data at rest. Using server-side encryption, Amazon S3 accepts your objects as unencrypted data, and then encrypts them as they are stored. Using client-side encryption, your workload application is responsible for encrypting the data before it is sent to Amazon S3. Both methods allow you to use AWS Key Management Service (AWS KMS) to create and store the data key, or you can provide your own key, which you are then responsible for. Using AWS KMS, you can set policies using IAM on who can and cannot access your data keys and decrypted data.

For Amazon RDS, if you have chosen to encrypt your databases, then your backups are encrypted also. DynamoDB backups are always encrypted. When using AWS Elastic Disaster Recovery, all data in transit and at rest is encrypted. With Elastic Disaster Recovery, data at rest can be encrypted using either the default Amazon EBS encryption Volume Encryption Key or a custom customer-managed key.

## Cyber resilience considerations

To enhance backup security against cyber threats, consider implementing these additional controls in addition to encryption:

- Implement immutability using AWS Backup Vault Lock or Amazon S3 Object Lock to prevent backup data from being altered or deleted during its retention period, protecting against ransomware and malicious deletion.
- Establish logical isolation between production and backup environments with AWS Backup logically air-gapped vault for critical systems, creating separation that helps prevent compromise of both environments simultaneously.
- Validate backup integrity regularly using AWS Backup restore testing to verify that backups are not corrupted and can be successfully restored following a cyber incident.
- Implement multi-party approval for critical recovery operations using AWS Backup multi-party approval to prevent unauthorized or malicious recovery attempts by requiring authorization from multiple designated approvers.

### Implementation steps

1. Use encryption on each of your data stores. If your source data is encrypted, then the backup will also be encrypted.
   - Use encryption in Amazon RDS. You can configure encryption at rest using AWS Key Management Service when you create an RDS instance.
   - Use encryption on Amazon EBS volumes. You can configure default encryption or specify a unique key upon volume creation.
   - Use the required Amazon DynamoDB encryption. DynamoDB encrypts all data at rest. You can either use an AWS owned AWS KMS key or an AWS managed KMS key, specifying a key that is stored in your account.
   - Encrypt your data stored in Amazon EFS. Configure the encryption when you create your file system.
   - Configure the encryption in the source and destination Regions. You can configure encryption at rest in Amazon S3 using keys stored in KMS, but the keys are Region-specific. You can specify the destination keys when you configure the replication.
   - Choose whether to use the default or custom Amazon EBS encryption for Elastic Disaster Recovery. This option will encrypt your replicated data at rest on the Staging Area Subnet disks and the replicated disks.
2. Implement least privilege permissions to access your backups. Follow best practices to limit the access to the backups, snapshots, and replicas in accordance with security best practices.
3. Configure immutability for critical backups. For critical data, implement AWS Backup Vault Lock or S3 Object Lock to prevent deletion or alteration during the specified retention period. For implementation details, see AWS Backup Vault Lock.
4. Create logical separation for backup environments. Implement AWS Backup logically air-gapped vault for critical systems requiring enhanced protection from cyber threats. For implementation guidance, see Building cyber resiliency with AWS Backup logically air-gapped vault.
5. Implement backup validation processes. Configure AWS Backup restore testing to regularly verify that backups are not corrupted and can be successfully restored following a cyber incident.
6. Configure multi-party approval for sensitive recovery operations. For critical systems, implement AWS Backup multi-party approval to require authorization from multiple designated approvers before recovery can proceed. For implementation details, see Improve recovery resilience with AWS Backup support for Multi-party approval.
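A minimal sketch of steps 1 and 3: create a backup vault encrypted with a customer managed KMS key and apply a Vault Lock minimum retention. The vault name, key ARN, and retention period are assumptions.

```python
import boto3

backup = boto3.client("backup")

# Hypothetical vault encrypted with a customer managed KMS key.
backup.create_backup_vault(
    BackupVaultName="critical-data-vault",
    EncryptionKeyArn="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)

# Vault Lock: make recovery points immutable for at least 35 days.
backup.put_backup_vault_lock_configuration(
    BackupVaultName="critical-data-vault",
    MinRetentionDays=35,
)
```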

๐Ÿ’ผ REL09-BP03 Perform data backup automatically

Configure backups to be taken automatically based on a periodic schedule informed by the Recovery Point Objective (RPO), or by changes in the dataset. Critical datasets with low data loss requirements need to be backed up automatically on a frequent basis, whereas less critical data where some loss is acceptable can be backed up less frequently.

**Desired outcome:** An automated process that creates backups of data sources at an established cadence.

**Common anti-patterns:**

- Performing backups manually.
- Using resources that have backup capability, but not including the backup in your automation.

**Benefits of establishing this best practice:** Automating backups verifies that they are taken regularly based on your RPO, and alerts you if they are not taken.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

AWS Backup can be used to create automated data backups of various AWS data sources. Amazon RDS instances can be backed up almost continuously every five minutes, and Amazon S3 objects can be backed up almost continuously every fifteen minutes, providing for point-in-time recovery (PITR) to a specific point in time within the backup history. For other AWS data sources, such as Amazon EBS volumes, Amazon DynamoDB tables, or Amazon FSx file systems, AWS Backup can run automated backups as frequently as every hour. These services also offer native backup capabilities.

AWS services that offer automated backup with point-in-time recovery include Amazon DynamoDB, Amazon RDS, and Amazon Keyspaces (for Apache Cassandra); these can be restored to a specific point in time within the backup history. Most other AWS data storage services offer the ability to schedule periodic backups, as frequently as every hour.

Amazon RDS and Amazon DynamoDB offer continuous backup with point-in-time recovery. Amazon S3 versioning, once turned on, is automatic. Amazon Data Lifecycle Manager can be used to automate the creation, copy, and deletion of Amazon EBS snapshots. It can also automate the creation, copy, deprecation, and deregistration of Amazon EBS-backed Amazon Machine Images (AMIs) and their underlying Amazon EBS snapshots.

AWS Elastic Disaster Recovery provides continuous block-level replication from the source environment (on-premises or AWS) to the target recovery Region. Point-in-time Amazon EBS snapshots are automatically created and managed by the service.

For a centralized view of your backup automation and history, AWS Backup provides a fully managed, policy-based backup solution. It centralizes and automates the backup of data across multiple AWS services in the cloud as well as on premises using the AWS Storage Gateway.

In addition to versioning, Amazon S3 features replication. The entire S3 bucket can be automatically replicated to another bucket in the same or a different AWS Region.

### Implementation steps

1. Identify data sources that are currently being backed up manually. For more detail, see REL09-BP01 Identify and back up all data that needs to be backed up, or reproduce the data from sources.
2. Determine the RPO for the workload. For more detail, see REL13-BP01 Define recovery objectives for downtime and data loss.
3. Use an automated backup solution or managed service. AWS Backup is a fully-managed service that makes it easy to centralize and automate data protection across AWS services, in the cloud, and on-premises. Using backup plans in AWS Backup, create rules which define the resources to back up, and the frequency at which these backups should be created. This frequency should be informed by the RPO established in Step 2. For hands-on guidance on how to create automated backups using AWS Backup, see Testing Backup and Restore of Data. Native backup capabilities are offered by most AWS services that store data. For example, RDS can be leveraged for automated backups with point-in-time recovery (PITR).
4. For data sources not supported by an automated backup solution or managed service, such as on-premises data sources or message queues, consider using a trusted third-party solution to create automated backups. Alternatively, you can create automation to do this using the AWS CLI or SDKs. You can use AWS Lambda functions or AWS Step Functions to define the logic involved in creating a data backup, and use Amazon EventBridge to invoke it at a frequency based on your RPO.

**Level of effort for the Implementation Plan:** Low
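As a sketch of step 3, the following creates a backup plan whose hourly schedule is derived from an assumed one-hour RPO and assigns resources by tag. The vault, role ARN, plan name, and tag key are assumptions.

```python
import boto3

backup = boto3.client("backup")

# Hypothetical plan: hourly backups to meet an assumed RPO of one hour.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "hourly-critical-data",
        "Rules": [
            {
                "RuleName": "hourly",
                "TargetBackupVaultName": "critical-data-vault",
                "ScheduleExpression": "cron(0 * * * ? *)",
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }
)

# Assign resources to the plan by tag, so newly tagged resources are covered.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-critical",
        "IamRoleArn": "arn:aws:iam::111122223333:role/BackupServiceRole",
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "backup",
                "ConditionValue": "critical",
            }
        ],
    },
)
```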

๐Ÿ’ผ REL09-BP04 Perform periodic recovery of the data to verify backup integrity and processes

Validate that your backup process implementation meets your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) by performing a recovery test.

**Desired outcome:** Data from backups is periodically recovered using well-defined mechanisms to verify that recovery is possible within the established recovery time objective (RTO) for the workload. Verify that restoration from a backup results in a resource that contains the original data without any of it being corrupted or inaccessible, and with data loss within the recovery point objective (RPO).

**Common anti-patterns:**

- Restoring a backup, but not querying or retrieving any data to check that the restoration is usable.
- Assuming that a backup exists.
- Assuming that the backup of a system is fully operational and that data can be recovered from it.
- Assuming that the time to restore or recover data from a backup falls within the RTO for the workload.
- Assuming that the data contained in the backup falls within the RPO for the workload.
- Restoring when necessary, without using a runbook or outside of an established automated procedure.

**Benefits of establishing this best practice:** Testing the recovery of the backups verifies that data can be restored when needed without any worry that data might be missing or corrupted, that the restoration and recovery is possible within the RTO for the workload, and that any data loss falls within the RPO for the workload.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Testing backup and restore capability increases confidence in the ability to perform these actions during an outage. Periodically restore backups to a new location and run tests to verify the integrity of the data. Some common tests that should be performed are checking if all data is available, is not corrupted, is accessible, and that any data loss falls within the RPO for the workload. Such tests can also help ascertain if recovery mechanisms are fast enough to accommodate the workload's RTO.

Using AWS, you can stand up a testing environment and restore your backups to assess RTO and RPO capabilities, and run tests on data content and integrity. Additionally, Amazon RDS and Amazon DynamoDB allow point-in-time recovery (PITR). Using continuous backup, you can restore your dataset to the state it was in at a specified date and time.

AWS Elastic Disaster Recovery offers continual point-in-time recovery snapshots of Amazon EBS volumes. As source servers are replicated, point-in-time states are chronicled over time based on the configured policy. Elastic Disaster Recovery helps you verify the integrity of these snapshots by launching instances for test and drill purposes without redirecting the traffic.

### Implementation steps

1. Identify data sources that are currently being backed up and where these backups are being stored. For implementation guidance, see REL09-BP01 Identify and back up all data that needs to be backed up, or reproduce the data from sources.
2. Establish criteria for data validation for each data source. Different types of data will have different properties, which might require different validation mechanisms. Consider how this data might be validated before you are confident to use it in production. Some common ways to validate data are using data and backup properties such as data type, format, checksum, size, or a combination of these with custom validation logic. For example, this might be a comparison of the checksum values between the restored resource and the data source at the time the backup was created.
3. Establish RTO and RPO for restoring the data based on data criticality. For implementation guidance, see REL13-BP01 Define recovery objectives for downtime and data loss.
4. Assess your recovery capability. Review your backup and restore strategy to understand if it can meet your RTO and RPO, and adjust the strategy as necessary. Using AWS Resilience Hub, you can run an assessment of your workload. The assessment evaluates your application configuration against the resiliency policy and reports if your RTO and RPO targets can be met.
5. Do a test restore using currently established processes used in production for data restoration. These processes depend on how the original data source was backed up, the format and storage location of the backup itself, or if the data is reproduced from other sources. For example, if you are using a managed service such as AWS Backup, this might be as simple as restoring the backup into a new resource. If you used AWS Elastic Disaster Recovery, you can launch a recovery drill.
6. Validate data recovery from the restored resource based on criteria you previously established for data validation. Does the restored and recovered data contain the most recent record or item at the time of backup? Does this data fall within the RPO for the workload?
7. Measure the time required for restore and recovery and compare it to your established RTO. Does this process fall within the RTO for the workload? For example, compare the timestamps from when the restoration process started and when the recovery validation completed to calculate how long this process takes. All AWS API calls are timestamped and this information is available in AWS CloudTrail. While this information can provide details on when the restore process started, the end timestamp for when the validation was completed should be recorded by your validation logic. If using an automated process, then services like Amazon DynamoDB can be used to store this information. Additionally, many AWS services provide an event history which provides timestamped information about when certain actions occurred. Within AWS Backup, backup and restore actions are referred to as jobs, and these jobs contain timestamp information as part of their metadata, which can be used to measure the time required for restoration and recovery.
8. Notify stakeholders if data validation fails, or if the time required for restoration and recovery exceeds the established RTO for the workload. When implementing automation to do this, such as in this lab, services like Amazon Simple Notification Service (Amazon SNS) can be used to send push notifications such as email or SMS to stakeholders. These messages can also be published to messaging applications such as Amazon Chime, Slack, or Microsoft Teams, or used to create tasks as OpsItems using AWS Systems Manager OpsCenter.
9. Automate this process to run periodically. For example, services like AWS Lambda or a State Machine in AWS Step Functions can be used to automate the restore and recovery processes, and Amazon EventBridge can be used to invoke this automation workflow periodically. Learn how to Automate data recovery validation with AWS Backup. Additionally, this Well-Architected lab provides a hands-on experience on one way to do automation for several of the steps here.

**Level of effort for the Implementation Plan:** Moderate to high depending on the complexity of the validation criteria.
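For steps 6 and 7, a small sketch of checking an AWS Backup restore job's elapsed time against the RTO is shown below. The restore job ID and the RTO value are placeholders, and the data validation logic is left to your workload.

```python
from datetime import timedelta

import boto3

backup = boto3.client("backup")

RTO = timedelta(hours=1)  # hypothetical recovery time objective

# Hypothetical restore job started earlier, for example with start_restore_job().
job = backup.describe_restore_job(RestoreJobId="replace-with-restore-job-id")

if job["Status"] == "COMPLETED":
    elapsed = job["CompletionDate"] - job["CreationDate"]
    print(f"Restore took {elapsed}; within RTO: {elapsed <= RTO}")
    # Next: run your validation logic (checksums, record counts, spot queries)
    # against the restored resource and notify stakeholders on failure.
else:
    print("Restore job status:", job["Status"])
```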

๐Ÿ’ผ REL10-BP01 Deploy the workload to multiple locations

Distribute workload data and resources across multiple Availability Zones or, where necessary, across AWS Regions. A fundamental principle for service design in AWS is to avoid single points of failure, including the underlying physical infrastructure. AWS provides cloud computing resources and services globally across multiple geographic locations called Regions. Each Region is physically and logically independent and consists of three or more Availability Zones (AZs). Availability Zones are geographically close to each other but are physically separated and isolated. When you distribute your workloads among Availability Zones and Regions, you mitigate the risk of threats such as fires, floods, weather-related disasters, earthquakes, and human error. Create a location strategy to provide high availability that is appropriate for your workloads. **Desired outcome:** Production workloads are distributed among multiple Availability Zones (AZs) or Regions to achieve fault tolerance and high availability. **Common anti-patterns** - Your production workload exists only in a single Availability Zone. - You implement a multi-Region architecture when a multi-AZ architecture would satisfy business requirements. - Your deployments or data become desynchronized, which results in configuration drift or under-replicated data. - You don't account for dependencies between application components if resilience and multi-location requirements differ between those components. **Benefits of establishing this best practice** - Your workload is more resilient to incidents, such as power or environmental control failures, natural disasters, upstream service failures, or network issues that impact an AZ or an entire Region. - You can access a wider inventory of Amazon EC2 instances and reduce the likelihood of InsufficientCapacityExceptions (ICE) when launching specific EC2 instance types. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Deploy and operate all production workloads in at least two Availability Zones (AZs) in a Region. ### Using multiple Availability Zones Availability Zones are resource hosting locations that are physically separated from each other to avoid correlated failures due to risks such as fires, floods, and tornadoes. Each Availability Zone has independent physical infrastructure, including utility power connections, backup power sources, mechanical services, and network connectivity. This arrangement limits faults in any of these components to just the impacted Availability Zone. For example, if an AZ-wide incident makes EC2 instances unavailable in the affected Availability Zone, your instances in other Availability Zone remains available. Despite being physically separated, Availability Zones in the same AWS Region are close enough to provide high-throughput, low-latency (single-digit millisecond) networking. You can replicate data synchronously between Availability Zones for most workloads without significantly impacting user experience. This means you can use Availability Zones in a Region in an active/active or active/standby configuration. All compute associated with your workload should be distributed among multiple Availability Zones. This includes Amazon EC2 instances, AWS Fargate tasks, and VPC-attached AWS Lambda functions. 
AWS compute services, including EC2 Auto Scaling, Amazon Elastic Container Service (ECS), and Amazon Elastic Kubernetes Service (EKS), provide ways for you to launch and manage compute across Availability Zones. Configure them to automatically replace compute as needed in a different Availability Zone to maintain availability. To direct traffic to available Availability Zones, place a load balancer in front of your compute, such as an Application Load Balancer or Network Load Balancer. AWS load balancers can reroute traffic to available instances in the event of an Availability Zone impairment.

You should also replicate data for your workload and make it available in multiple Availability Zones. Some AWS managed data services, such as Amazon S3, Amazon Elastic File System (EFS), Amazon Aurora, Amazon DynamoDB, Amazon Simple Queue Service (SQS), and Amazon Kinesis Data Streams, replicate data in multiple Availability Zones by default and are robust against Availability Zone impairment. With other AWS managed data services, such as Amazon Relational Database Service (RDS), Amazon Redshift, and Amazon ElastiCache, you must enable multi-AZ replication. Once enabled, these services automatically detect an Availability Zone impairment, redirect requests to an available Availability Zone, and re-replicate data as needed after recovery without customer intervention. Familiarize yourself with the user guide for each AWS managed data service you use to understand its multi-AZ capabilities, behaviors, and operations. If you are using self-managed storage, such as Amazon Elastic Block Store (EBS) volumes or Amazon EC2 instance storage, you must manage multi-AZ replication yourself.

### Using multiple AWS Regions

If you have workloads that require extreme resilience (such as critical infrastructure, health-related applications, or services with stringent customer or mandated availability requirements), you may require additional availability beyond what a single AWS Region can provide. In this case, you should deploy and operate your workload across at least two AWS Regions (assuming that your data residency requirements allow it). AWS Regions are located in different geographical areas around the world and on multiple continents. AWS Regions have even greater physical separation and isolation than Availability Zones alone. AWS services, with few exceptions, take advantage of this design to operate fully independently between different Regions (also known as Regional services). A failure of an AWS Regional service is designed not to impact the service in a different Region.

When you operate your workload in multiple Regions, you should consider additional requirements. Because resources in different Regions are separate from and independent of one another, you must duplicate your workload's components in each Region. This includes foundational infrastructure, such as VPCs, in addition to compute and data services.

NOTE: When you consider a multi-Regional design, verify that your workload is capable of running in a single Region. If you create dependencies between Regions where a component in one Region relies on services or components in a different Region, you can increase the risk of failure and significantly weaken your reliability posture.

To ease multi-Regional deployments and maintain consistency, AWS CloudFormation StackSets can replicate your entire AWS infrastructure across multiple Regions.
AWS CloudFormation can also detect configuration drift and inform you when your AWS resources in a Region are out of sync. Many AWS services offer multi-Region replication for important workload assets. For example, EC2 Image Builder can publish your EC2 machine images (AMIs) after every build to each Region you use. Amazon Elastic Container Registry (ECR) can replicate your container images to your selected Regions.

You must also replicate your data across each of your chosen Regions. Many AWS managed data services provide cross-Region replication capability, including Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Aurora, Amazon Redshift, Amazon ElastiCache, and Amazon EFS. Amazon DynamoDB global tables accept writes in any supported Region and replicate data among all your other configured Regions. With other services, you must designate a primary Region for writes, as other Regions contain read-only replicas. For each AWS managed data service your workload uses, refer to its user guide and developer guide to understand its multi-Region capabilities and limitations. Pay special attention to where writes must be directed, transactional capabilities and limitations, how replication is performed, and how to monitor synchronization between Regions.

AWS also provides the ability to route request traffic to your Regional deployments with great flexibility. For example, you can configure your DNS records using Amazon Route 53 to direct traffic to the closest available Region to the user. Alternatively, you can configure your DNS records in an active/standby configuration, where you designate one Region as primary and fall back to a Regional replica only if the primary Region becomes unhealthy. You can configure Route 53 health checks to detect unhealthy endpoints and perform automatic failover, and additionally use Amazon Application Recovery Controller (ARC) to provide highly available routing controls for manually re-routing traffic as needed.

Even if you choose not to operate in multiple Regions for high availability, consider multiple Regions as part of your disaster recovery (DR) strategy. If possible, replicate your workload's infrastructure components and data in a warm standby or pilot light configuration in a secondary Region. In this design, you replicate baseline infrastructure from the primary Region, such as VPCs, Auto Scaling groups, container orchestrators, and other components, but you configure the variable-sized components in the standby Region (such as the number of EC2 instances and database replicas) to be a minimally operable size. You also arrange for continuous data replication from the primary Region to the standby Region. If an incident occurs, you can then scale out, or grow, the resources in the standby Region, and then promote it to become the primary Region.

### Implementation steps

1. Work with business stakeholders and data residency experts to determine which AWS Regions can be used to host your resources and data.
2. Work with business and technical stakeholders to evaluate your workload, and determine whether its resilience needs can be met by a multi-AZ approach (single AWS Region) or if they require a multi-Region approach (if multiple Regions are permitted). The use of multiple Regions can achieve greater availability but can involve additional complexity and cost. Consider the following factors in your evaluation:
   1. Business objectives and customer requirements: How much downtime is permitted should a workload-impacting incident occur in an Availability Zone or a Region? Evaluate your recovery time and recovery point objectives as discussed in REL13-BP01 Define recovery objectives for downtime and data loss.
   2. Disaster recovery (DR) requirements: What kind of potential disaster do you want to insure yourself against? Consider the possibility of data loss or long-term unavailability at different scopes of impact, from a single Availability Zone to an entire Region. If you replicate data and resources across Availability Zones and a single Availability Zone experiences a sustained failure, you can recover service in another Availability Zone. If you replicate data and resources across Regions, you can recover service in another Region.
3. Deploy your compute resources into multiple Availability Zones.
   1. In your VPC, create multiple subnets in different Availability Zones. Configure each to be large enough to accommodate the resources needed to serve the workload, even during an incident. For more detail, see REL02-BP03 Ensure IP subnet allocation accounts for expansion and availability.
   2. If you are using Amazon EC2 instances, use EC2 Auto Scaling to manage your instances. Specify the subnets you chose in the previous step when you create your Auto Scaling groups.
   3. If you are using AWS Fargate compute for Amazon ECS or Amazon EKS, select the subnets you chose in the first step when you create an ECS service, launch an ECS task, or create a Fargate profile for EKS.
   4. If you are using AWS Lambda functions that need to run in your VPC, select the subnets you chose in the first step when you create the Lambda function. For any functions that do not have a VPC configuration, AWS Lambda manages availability for you automatically.
   5. Place traffic directors such as load balancers in front of your compute resources. If cross-zone load balancing is enabled, AWS Application Load Balancers and Network Load Balancers detect when targets such as EC2 instances and containers are unreachable due to Availability Zone impairment and reroute traffic towards targets in healthy Availability Zones. If you disable cross-zone load balancing, use Amazon Application Recovery Controller (ARC) to provide zonal shift capability. If you are using a third-party load balancer or have implemented your own load balancers, configure them with multiple front ends across different Availability Zones.
4. Replicate your workload's data across multiple Availability Zones.
   1. If you use an AWS-managed data service such as Amazon RDS, Amazon ElastiCache, or Amazon FSx, study its user guide to understand its data replication and resilience capabilities. Enable cross-AZ replication and failover if necessary.
   2. If you use AWS-managed storage services such as Amazon S3, Amazon EFS, and Amazon FSx, avoid using single-AZ or One Zone configurations for data that requires high durability. Use a multi-AZ configuration for these services. Check the respective service's user guide to determine whether multi-AZ replication is enabled by default or whether you must enable it.
   3. If you run a self-managed database, queue, or other storage service, arrange for multi-AZ replication according to the application's instructions or best practices. Familiarize yourself with the failover procedures for your application.
5. Configure your DNS service to detect AZ impairment and reroute traffic to a healthy Availability Zone.
   Amazon Route 53, when used in combination with Elastic Load Balancing, can do this automatically. Route 53 can also be configured with failover records that use health checks to respond to queries with only healthy IP addresses. For any DNS records used for failover, specify a short time to live (TTL) value (for example, 60 seconds or less) to help prevent record caching from impeding recovery (Route 53 alias records supply appropriate TTLs for you). A minimal sketch of such a failover record follows the steps below.

**Additional steps when using multiple AWS Regions**

1. Replicate all operating system (OS) and application code used by your workload across your selected Regions. Replicate Amazon Machine Images (AMIs) used by your EC2 instances if necessary using solutions such as Amazon EC2 Image Builder. Replicate container images stored in registries using solutions such as Amazon ECR cross-Region replication. Enable Regional replication for any Amazon S3 buckets used for storing application resources.
2. Deploy your compute resources and configuration metadata (such as parameters stored in AWS Systems Manager Parameter Store) into multiple Regions. Use the same procedures described in the previous steps, but replicate the configuration for each Region you are using for your workload. Use infrastructure as code solutions such as AWS CloudFormation to uniformly reproduce the configurations among Regions. If you are using a secondary Region in a pilot light configuration for disaster recovery, you may reduce the number of your compute resources to a minimum value to save cost, with a corresponding increase in time to recovery.
3. Replicate your data from your primary Region into your secondary Regions.
   1. Amazon DynamoDB global tables provide global replicas of your data that can be written to from any supported Region. With other AWS-managed data services, such as Amazon RDS, Amazon Aurora, and Amazon ElastiCache, you designate a primary (read/write) Region and replica (read-only) Regions. Consult the respective services' user and developer guides for details on Regional replication.
   2. If you are running a self-managed database, arrange for multi-Region replication according to the application's instructions or best practices. Familiarize yourself with the failover procedures for your application.
   3. If your workload uses Amazon EventBridge, you may need to forward selected events from your primary Region to your secondary Regions. To do so, specify event buses in your secondary Regions as targets for matched events in your primary Region.
4. Consider whether and to what extent you want to use identical encryption keys across Regions. A typical approach that balances security and ease of use is to use Region-scoped keys for Region-local data and authentication, and globally scoped keys for encryption of data that is replicated among different Regions. AWS Key Management Service (KMS) supports multi-Region keys to securely distribute and protect keys shared across Regions.
5. Consider AWS Global Accelerator to improve the availability of your application by directing traffic to Regions that contain healthy endpoints.
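The sketch below illustrates the DNS failover step above with a primary/secondary Route 53 failover record pair and a short TTL. The hosted zone ID, health check ID, record name, and load balancer DNS names are hypothetical placeholders.

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"                          # hypothetical
PRIMARY_HEALTH_CHECK_ID = "11111111-2222-3333-4444-555555555555"  # hypothetical

def upsert_failover_record(set_identifier, failover_role, target_dns, health_check_id=None):
    record = {
        "Name": "app.example.com",
        "Type": "CNAME",
        "SetIdentifier": set_identifier,
        "Failover": failover_role,   # "PRIMARY" or "SECONDARY"
        "TTL": 60,                   # short TTL so caching doesn't delay failover
        "ResourceRecords": [{"Value": target_dns}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record}]},
    )

# Primary answers while its health check passes; Route 53 fails over otherwise.
upsert_failover_record("primary", "PRIMARY",
                       "primary-alb.us-east-1.elb.amazonaws.com",
                       PRIMARY_HEALTH_CHECK_ID)
upsert_failover_record("secondary", "SECONDARY",
                       "standby-alb.us-west-2.elb.amazonaws.com")
```

Note that creating or changing these records is a Route 53 control plane operation, so set them up ahead of time; only health-check evaluation and query answering happen on the data plane during an event.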

๐Ÿ’ผ REL10-BP02 Automate recovery for components constrained to a single location

If components of the workload can only run in a single Availability Zone or in an on-premises data center, implement the capability to do a complete rebuild of the workload within your defined recovery objectives.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

If the best practice to deploy the workload to multiple locations is not possible due to technological constraints, you must implement an alternate path to resiliency. You must automate the ability to recreate necessary infrastructure, redeploy applications, and recreate necessary data for these cases.

For example, Amazon EMR launches all nodes for a given cluster in the same Availability Zone because running a cluster in the same zone improves the performance of job flows by providing a higher data access rate. If this component is required for workload resilience, then you must have a way to redeploy the cluster and its data. Also for Amazon EMR, you should provision redundancy in ways other than using Multi-AZ, such as provisioning multiple nodes. Using the EMR File System (EMRFS), data in EMR can be stored in Amazon S3, which in turn can be replicated across multiple Availability Zones or AWS Regions. Similarly, Amazon Redshift by default provisions your cluster in a randomly selected Availability Zone within the AWS Region that you select, and all the cluster nodes are provisioned in the same zone.

For stateful server-based workloads deployed to an on-premises data center, you can use AWS Elastic Disaster Recovery to protect your workloads in AWS. If you are already hosted in AWS, you can use Elastic Disaster Recovery to protect your workload to an alternative Availability Zone or Region. Elastic Disaster Recovery uses continual block-level replication to a lightweight staging area to provide fast, reliable recovery of on-premises and cloud-based applications.

### Implementation steps

1. Implement self-healing. Deploy your instances or containers using automatic scaling when possible. If you cannot use automatic scaling, use automatic recovery for EC2 instances or implement self-healing automation based on Amazon EC2 or ECS container lifecycle events.
   - Use Amazon EC2 Auto Scaling groups for instances and container workloads that have no requirements for a single instance IP address, private IP address, Elastic IP address, and instance metadata.
   - The launch template user data can be used to implement automation that can self-heal most workloads.
   - Use automatic recovery of Amazon EC2 instances for workloads that require a single instance ID, private IP address, Elastic IP address, and instance metadata (a minimal example follows this list).
   - Automatic recovery sends recovery status alerts to an SNS topic when an instance failure is detected.
   - Use Amazon EC2 instance lifecycle events or Amazon ECS events to automate self-healing where automatic scaling or EC2 recovery cannot be used.
   - Use the events to invoke automation that heals your component according to the process logic you require.
   - Protect stateful workloads that are limited to a single location using AWS Elastic Disaster Recovery.
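As a sketch of the automatic-recovery bullet above (the instance ID, SNS topic ARN, and Region are hypothetical), the following creates a CloudWatch alarm that recovers an instance when its system status check fails and also notifies an SNS topic:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

INSTANCE_ID = "i-0123456789abcdef0"                                   # hypothetical
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:ops-alerts"       # hypothetical

cloudwatch.put_metric_alarm(
    AlarmName=f"recover-{INSTANCE_ID}",
    # System status check failures indicate problems with the underlying host.
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[
        # Built-in EC2 recover action: moves the instance to healthy hardware while
        # keeping its instance ID, private IP, Elastic IP, and attached EBS volumes.
        "arn:aws:automate:us-east-1:ec2:recover",
        # Also notify operators that healing was needed (see REL11-BP06).
        SNS_TOPIC_ARN,
    ],
)
```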

๐Ÿ’ผ REL10-BP03 Use bulkhead architectures to limit scope of impact

Implement bulkhead architectures (also known as cell-based architectures) to restrict the effect of failure within a workload to a limited number of components.

**Desired outcome:** A cell-based architecture uses multiple isolated instances of a workload, where each instance is known as a cell. Each cell is independent, does not share state with other cells, and handles a subset of the overall workload requests. This reduces the potential impact of a failure, such as a bad software update, to an individual cell and the requests it is processing. If a workload uses 10 cells to service 100 requests, when a failure occurs, 90% of the overall requests would be unaffected by the failure.

**Common anti-patterns:**
- Allowing cells to grow without bounds.
- Applying code updates or deployments to all cells at the same time.
- Sharing state or components between cells (with the exception of the router layer).
- Adding complex business or routing logic to the router layer.
- Not minimizing cross-cell interactions.

**Benefits of establishing this best practice:** With cell-based architectures, many common types of failure are contained within the cell itself, providing additional fault isolation. These fault boundaries can provide resilience against failure types that otherwise are hard to contain, such as unsuccessful code deployments or requests that are corrupted or invoke a specific failure mode (also known as poison pill requests).

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

On a ship, bulkheads ensure that a hull breach is contained within one section of the hull. In complex systems, this pattern is often replicated to allow fault isolation. Fault isolated boundaries restrict the effect of a failure within a workload to a limited number of components. Components outside of the boundary are unaffected by the failure. Using multiple fault isolated boundaries, you can limit the impact on your workload. On AWS, customers can use multiple Availability Zones and Regions to provide fault isolation, but the concept of fault isolation can be extended to your workload's architecture as well. The overall workload is partitioned into cells by a partition key. This key needs to align with the grain of the service, or the natural way that a service's workload can be subdivided with minimal cross-cell interactions. Examples of partition keys are customer ID, resource ID, or any other parameter easily accessible in most API calls. A cell routing layer distributes requests to individual cells based on the partition key and presents a single endpoint to clients.

### Implementation steps

When designing a cell-based architecture, there are several design considerations:

1. **Partition key:** Special consideration should be taken while choosing the partition key.
   - It should align with the grain of the service, or the natural way that a service's workload can be subdivided with minimal cross-cell interactions. Examples are customer ID or resource ID.
   - The partition key must be available in all requests, either directly or in a way that can be inferred deterministically from other parameters.
2. **Persistent cell mapping:** Upstream services should only interact with a single cell for the lifecycle of their resources.
   - Depending on the workload, a cell migration strategy may be needed to migrate data from one cell to another.
     A possible scenario where a cell migration may be needed is when a particular user or resource in your workload becomes too large and requires a dedicated cell.
   - Cells should not share state or components with other cells.
   - Consequently, cross-cell interactions should be avoided or kept to a minimum, as those interactions create dependencies between cells and therefore diminish the fault isolation improvements.
3. **Router layer:** The router layer is a shared component between cells, and therefore cannot follow the same compartmentalization strategy as the cells themselves.
   - It is recommended that the router layer distribute requests to individual cells using a partition mapping algorithm in a computationally efficient manner, such as combining cryptographic hash functions and modular arithmetic to map partition keys to cells (a minimal sketch of such a mapping follows this list).
   - To avoid multi-cell impacts, the routing layer must remain as simple and horizontally scalable as possible, which necessitates avoiding complex business logic within this layer. This has the added benefit of making it easy to understand its expected behavior at all times, allowing for thorough testability.
4. **Cell size:** Cells should have a maximum size and should not be allowed to grow beyond it.
   - The maximum size should be identified by performing thorough testing, until breaking points are reached and safe operating margins are established. For more detail on how to implement testing practices, see REL07-BP04 Load test your workload.
   - The overall workload should grow by adding additional cells, allowing the workload to scale with increases in demand.
5. **Multi-AZ or multi-Region strategies:** Multiple layers of resilience should be leveraged to protect against different failure domains.
   - For resilience, you should use an approach that builds layers of defense. One layer protects against smaller, more common disruptions by building a highly available architecture using multiple AZs. Another layer of defense is meant to protect against rare events like widespread natural disasters and Region-level disruptions. This second layer involves architecting your application to span multiple AWS Regions. Implementing a multi-Region strategy for your workload helps protect it against widespread natural disasters that affect a large geographic region of a country, or technical failures of Region-wide scope. Be aware that implementing a multi-Region architecture can be significantly complex, and is usually not required for most workloads. For more detail, see REL10-BP01 Deploy the workload to multiple locations.
6. **Code deployment:** A staggered code deployment strategy should be preferred over deploying code changes to all cells at the same time.
   - This helps limit the impact of a bad deployment or human error to a subset of cells.
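To illustrate the partition-mapping approach mentioned for the router layer (a cryptographic hash combined with modular arithmetic), here is a minimal, illustrative sketch; the key format, cell count, and endpoint naming are hypothetical:

```python
import hashlib

NUM_CELLS = 10  # hypothetical fixed cell count; the workload grows by adding cells

def cell_for_partition_key(partition_key: str, num_cells: int = NUM_CELLS) -> int:
    """Deterministically map a partition key (for example, a customer ID) to a cell.

    A cryptographic hash spreads keys evenly and cheaply; the modulo picks the cell.
    The same key always maps to the same cell, which keeps the cell mapping
    persistent as long as the cell count does not change.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells

# Example: route a request for customer "cust-42" to its cell endpoint.
cell_id = cell_for_partition_key("cust-42")
endpoint = f"https://cell-{cell_id}.example.internal"  # hypothetical per-cell endpoint
```

One caveat worth noting: a naive modulo remaps many keys when the cell count changes, so production routers often keep an explicit partition-to-cell mapping table (or use consistent hashing) to preserve persistent cell mapping as cells are added.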

๐Ÿ’ผ REL11-BP01 Monitor all components of the workload to detect failures

Continually monitor the health of your workload so that you and your automated systems are aware of failures or degradations as soon as they occur. Monitor for key performance indicators (KPIs) based on business value. All recovery and healing mechanisms must start with the ability to detect problems quickly. Technical failures should be detected first so that they can be resolved. However, availability is based on the ability of your workload to deliver business value, so key performance indicators (KPIs) that measure this need to be a part of your detection and remediation strategy.

**Desired outcome:** Essential components of a workload are monitored independently to detect and alert on failures when and where they happen.

**Common anti-patterns:**
- No alarms have been configured, so outages occur without notification.
- Alarms exist, but at thresholds that don't provide adequate time to react.
- Metrics are not collected often enough to meet the recovery time objective (RTO).
- Only the customer-facing interfaces of the workload are actively monitored.
- Only collecting technical metrics, with no business function metrics.
- No metrics measuring the user experience of the workload.
- Too many monitors are created.

**Benefits of establishing this best practice:** Having appropriate monitoring at all layers allows you to reduce recovery time by reducing time to detection.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Identify all workloads that will be reviewed for monitoring. Once you have identified all components of the workload that need to be monitored, determine the monitoring interval. The monitoring interval has a direct impact on how fast recovery can be initiated, based on the time it takes to detect a failure. The mean time to detection (MTTD) is the amount of time between a failure occurring and when repair operations begin. The list of services should be extensive and complete. Monitoring must cover all layers of the application stack, including application, platform, infrastructure, and network. Your monitoring strategy should consider the impact of gray failures.

### Implementation steps

- Your monitoring interval is dependent on how quickly you must recover. Your recovery time is driven by the time it takes to recover, so determine the frequency of collection by accounting for this time and your recovery time objective (RTO).
- Configure detailed monitoring for components and managed services.
  - Determine if detailed monitoring for EC2 instances and Auto Scaling is necessary. Detailed monitoring provides one-minute interval metrics, and default monitoring provides five-minute interval metrics.
  - Determine if enhanced monitoring for RDS is necessary. Enhanced monitoring uses an agent on RDS instances to get useful information about different processes or threads.
  - Determine the monitoring requirements of critical serverless components for Lambda, API Gateway, Amazon EKS, Amazon ECS, and all types of load balancers.
  - Determine the monitoring requirements of storage components for Amazon S3, Amazon FSx, Amazon EFS, and Amazon EBS.
- Create custom metrics to measure business key performance indicators (KPIs). Workloads implement key business functions, which should be used as KPIs that help identify when an indirect problem happens.
- Monitor the user experience for failures using user canaries.
  Synthetic transaction testing (also known as canary testing, but not to be confused with canary deployments) that can run and simulate customer behavior is among the most important testing processes. Run these tests constantly against your workload endpoints from diverse remote locations.
- Create custom metrics that track the user's experience. If you can instrument the experience of the customer, you can determine when the customer experience degrades (a minimal sketch of publishing a custom business metric follows this list).
- Set alarms to detect when any part of your workload is not working properly and to indicate when to automatically scale resources. Alarms can be visually displayed on dashboards, send alerts through Amazon SNS or email, and work with Auto Scaling to scale workload resources up or down.
- Create dashboards to visualize your metrics. Dashboards can be used to visually see trends, outliers, and other indicators of potential problems, or to provide an indication of problems you may want to investigate.
- Create distributed tracing monitoring for your services. With distributed tracing, you can understand how your application and its underlying services are performing, and identify and troubleshoot the root cause of performance issues and errors.
- Create monitoring system dashboards and data collection (using CloudWatch or X-Ray) in a separate Region and account.
- Stay informed about service degradations with AWS Health. Create purpose-fit AWS Health event notifications to email and chat channels through AWS User Notifications, and integrate programmatically with your monitoring and alerting tools through Amazon EventBridge.
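As a sketch of the custom business-KPI idea above (the namespace, metric name, and dimension are hypothetical), an application could publish an order-completion count as a custom CloudWatch metric that alarms can watch alongside purely technical metrics:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def record_orders_completed(count: int, storefront: str) -> None:
    """Publish a business KPI as a custom CloudWatch metric.

    A drop in completed orders can reveal a customer-impacting problem even when
    CPU, memory, and HTTP 5xx metrics all look healthy.
    """
    cloudwatch.put_metric_data(
        Namespace="ExampleShop/Business",                 # hypothetical namespace
        MetricData=[
            {
                "MetricName": "OrdersCompleted",
                "Dimensions": [{"Name": "Storefront", "Value": storefront}],
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    )

record_orders_completed(17, "eu-web")
```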

๐Ÿ’ผ REL11-BP02 Fail over to healthy resources

If a resource failure occurs, healthy resources should continue to serve requests. For location impairments (such as Availability Zone or AWS Region), ensure that you have systems in place to fail over to healthy resources in unimpaired locations. When designing a service, distribute load across resources, Availability Zones, or Regions. Therefore, failure of an individual resource or impairment can be mitigated by shifting traffic to remaining healthy resources. Consider how services are discovered and routed to in the event of a failure.

Design your services with fault recovery in mind. At AWS, we design services to minimize the time to recover from failures and the impact on data. Our services primarily use data stores that acknowledge requests only after they are durably stored across multiple replicas within a Region. They are constructed to use cell-based isolation and use the fault isolation provided by Availability Zones. We use automation extensively in our operational procedures. We also optimize our replace-and-restart functionality to recover quickly from interruptions.

The patterns and designs that allow for failover vary for each AWS platform service. Many AWS managed services are natively multi-AZ (like Lambda or API Gateway). Other AWS services (like EC2 and EKS) require specific best practice designs to support failover of resources or data storage across AZs. Monitoring should be set up to check that the failover resource is healthy, track the progress of the resources failing over, and monitor business process recovery.

**Desired outcome:** Systems are capable of automatically or manually using new resources to recover from degradation.

**Common anti-patterns:**
- Planning for failure is not part of the planning and design phase.
- RTO and RPO are not established.
- Insufficient monitoring to detect failing resources.
- Failure domains are not properly isolated.
- Multi-Region failover is not considered.
- Detection of failure is too sensitive or aggressive when deciding to fail over.
- Not testing or validating the failover design.
- Performing auto healing automation, but not notifying that healing was needed.
- Lack of a dampening period to avoid failing back too soon.

**Benefits of establishing this best practice:** You can build more resilient systems that maintain reliability when experiencing failures by degrading gracefully and recovering quickly.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

AWS services, such as Elastic Load Balancing and Amazon EC2 Auto Scaling, help distribute load across resources and Availability Zones. Therefore, failure of an individual resource (such as an EC2 instance) or impairment of an Availability Zone can be mitigated by shifting traffic to remaining healthy resources.

For multi-Region workloads, designs are more complicated. For example, cross-Region read replicas allow you to deploy your data to multiple AWS Regions. However, failover is still required to promote the read replica to primary and then point your traffic to the new endpoint. Amazon Route 53, Amazon Application Recovery Controller (ARC), Amazon CloudFront, and AWS Global Accelerator can help route traffic across AWS Regions.

AWS services, such as Amazon S3, Lambda, API Gateway, Amazon SQS, Amazon SNS, Amazon SES, Amazon Pinpoint, Amazon ECR, AWS Certificate Manager, EventBridge, and Amazon DynamoDB, are automatically deployed to multiple Availability Zones by AWS.
In case of failure, these AWS services automatically route traffic to healthy locations. Data is redundantly stored in multiple Availability Zones and remains available.

For Amazon RDS, Amazon Aurora, Amazon Redshift, Amazon EKS, or Amazon ECS, Multi-AZ is a configuration option. AWS can direct traffic to the healthy instance if failover is initiated. This failover action may be taken by AWS or as required by the customer. For Amazon EC2 instances, Amazon Redshift, Amazon ECS tasks, or Amazon EKS pods, you choose which Availability Zones to deploy to. For some designs, Elastic Load Balancing provides the solution to detect instances in unhealthy zones and route traffic to the healthy ones. Elastic Load Balancing can also route traffic to components in your on-premises data center.

For multi-Region traffic failover, rerouting can leverage Amazon Route 53, Amazon Application Recovery Controller, AWS Global Accelerator, Route 53 Private DNS for VPCs, or CloudFront to provide a way to define internet domains and assign routing policies, including health checks, to route traffic to healthy Regions. AWS Global Accelerator provides static IP addresses that act as a fixed entry point to your application, then routes to endpoints in AWS Regions of your choosing, using the AWS global network instead of the internet for better performance and reliability.

### Implementation steps

- Create failover designs for all appropriate applications and services. Isolate each architecture component and create failover designs meeting RTO and RPO for each component.
- Configure lower environments (like development or test) with all services that are required to have a failover plan. Deploy the solutions using infrastructure as code (IaC) to ensure repeatability.
- Configure a recovery site, such as a second Region, to implement and test the failover designs. If necessary, resources for testing can be configured temporarily to limit additional costs.
- Determine which failover plans are automated by AWS, which can be automated by a DevOps process, and which might be manual. Document and measure each service's RTO and RPO.
- Create a failover playbook and include all steps to fail over each resource, application, and service (a minimal sketch of one such step, promoting a cross-Region read replica, follows this list).
- Create a failback playbook and include all steps to fail back (with timing) each resource, application, and service.
- Create a plan to initiate and rehearse the playbook. Use simulations and chaos testing to test the playbook steps and automation.
- For location impairment (such as Availability Zone or AWS Region), ensure you have systems in place to fail over to healthy resources in unimpaired locations. Check quotas, automatic scaling levels, and resources running before failover testing.
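The following is a minimal sketch of one failover playbook step mentioned above: promoting a cross-Region read replica to a writable primary in the recovery Region. The instance identifier and Region are hypothetical placeholders.

```python
import boto3

# Failover runbook step: promote a cross-Region read replica to a standalone,
# writable primary in the recovery Region.
rds = boto3.client("rds", region_name="us-west-2")

REPLICA_ID = "orders-db-replica-usw2"   # hypothetical replica identifier

rds.promote_read_replica(
    DBInstanceIdentifier=REPLICA_ID,
    BackupRetentionPeriod=7,            # enable automated backups on the new primary
)

# Wait until the promoted instance is available before repointing application traffic
# (for example, by updating the DNS record or configuration your application resolves).
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier=REPLICA_ID)

endpoint = rds.describe_db_instances(DBInstanceIdentifier=REPLICA_ID)[
    "DBInstances"
][0]["Endpoint"]["Address"]
print(f"New primary endpoint: {endpoint}")
```

Promoting a replica is a control plane action, so rehearse this step regularly and keep it in a tested runbook rather than improvising it during an incident (see REL11-BP04).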

๐Ÿ’ผ REL11-BP03 Automate healing on all layers

Upon detection of a failure, use automated capabilities to perform actions to remediate. Degradations may be automatically healed through internal service mechanisms or require resources to be restarted or removed through remediation actions.

For self-managed applications and cross-Region healing, recovery designs and automated healing processes can be pulled from existing best practices. The ability to restart or remove a resource is an important tool to remediate failures. A best practice is to make services stateless where possible. This prevents loss of data or availability on resource restart. In the cloud, you can (and generally should) replace the entire resource (for example, a compute instance or serverless function) as part of the restart. The restart itself is a simple and reliable way to recover from failure. Many different types of failures occur in workloads. Failures can occur in hardware, software, communications, and operations.

Restarting or retrying also applies to network requests. Apply the same recovery approach to both a network timeout and a dependency failure where the dependency returns an error. Both events have a similar effect on the system, so rather than attempting to make either event a special case, apply a similar strategy of limited retry with exponential backoff and jitter. The ability to restart is a recovery mechanism featured in recovery-oriented computing and high availability cluster architectures.

**Desired outcome:** Automated actions are performed to remediate detection of a failure.

**Common anti-patterns:**
- Provisioning resources without autoscaling.
- Deploying applications in instances or containers individually.
- Deploying applications that cannot be deployed into multiple locations without using automatic recovery.
- Manually healing applications that automatic scaling and automatic recovery fail to heal.
- No automation to fail over databases.
- Lack of automated methods to reroute traffic to new endpoints.
- No storage replication.

**Benefits of establishing this best practice:** Automated healing can reduce your mean time to recovery and improve your availability.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Designs for Amazon EKS or other Kubernetes services should include both minimum and maximum replica or stateful sets and the minimum cluster and node group sizing. These mechanisms provide a minimum amount of continually available processing resources while automatically remediating any failures using the Kubernetes control plane. Design patterns that are accessed through a load balancer using compute clusters should leverage Auto Scaling groups. Elastic Load Balancing (ELB) automatically distributes incoming application traffic across multiple targets and virtual appliances in one or more Availability Zones (AZs).

Clustered compute-based designs that do not use load balancing should have their size designed for the loss of at least one node. This allows the service to keep running, potentially at reduced capacity, while it recovers a new node. Example services are Mongo, DynamoDB Accelerator, Amazon Redshift, Amazon EMR, Cassandra, Kafka, MSK-EC2, Couchbase, ELK, and Amazon OpenSearch Service. Many of these services can be designed with additional auto healing features. Some cluster technologies must generate an alert upon the loss of a node, triggering an automated or manual workflow to recreate a new node.
This workflow can be automated using AWS Systems Manager to remediate issues quickly. Amazon EventBridge can be used to monitor and filter for events such as CloudWatch alarms or changes in state in other AWS services. Based on event information, it can then invoke AWS Lambda, Systems Manager Automation, or other targets to run custom remediation logic on your workload. Amazon EC2 Auto Scaling can be configured to check for EC2 instance health. If the instance is in any state other than running, or if the system status is impaired, Amazon EC2 Auto Scaling considers the instance to be unhealthy and launches a replacement instance. For large-scale replacements (such as the loss of an entire Availability Zone), static stability is preferred for high availability.

### Implementation steps

- Use Auto Scaling groups to deploy tiers in a workload. Auto Scaling can perform self-healing on stateless applications and add or remove capacity.
- For compute instances noted previously, use load balancing and choose the appropriate type of load balancer.
- Consider healing for Amazon RDS. With standby instances, configure automatic failover to the standby instance. For an Amazon RDS read replica, an automated workflow is required to make the read replica primary.
- Implement automatic recovery on EC2 instances that have applications deployed that cannot be deployed in multiple locations and can tolerate rebooting upon failures. Automatic recovery can be used to replace failed hardware and restart the instance when the application is not capable of being deployed in multiple locations. The instance metadata and associated IP addresses are kept, as well as the EBS volumes and mount points to Amazon EFS or Amazon FSx file systems. Using AWS OpsWorks, you can configure automatic healing of EC2 instances at the layer level.
- Implement automated recovery using AWS Step Functions and AWS Lambda when you cannot use automatic scaling or automatic recovery, or when automatic recovery fails.
- Amazon EventBridge can be used to monitor and filter for events such as CloudWatch alarms or changes in state in other AWS services. Based on event information, it can then invoke AWS Lambda (or other targets) to run custom remediation logic on your workload (see the sketch after this list).
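As a sketch of the EventBridge-based healing path above (the rule name, Lambda ARN, and the specific event pattern are hypothetical choices), the following creates a rule that matches unexpected EC2 instance stops and invokes a remediation function:

```python
import json
import boto3

events = boto3.client("events", region_name="us-east-1")

REMEDIATION_LAMBDA_ARN = (
    "arn:aws:lambda:us-east-1:111122223333:function:heal-instance"  # hypothetical
)

# Match EC2 instances that move to a stopped or terminated state.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

events.put_rule(
    Name="ec2-unexpected-stop",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
    Description="Invoke remediation logic when an instance stops unexpectedly",
)

events.put_targets(
    Rule="ec2-unexpected-stop",
    Targets=[{"Id": "heal-instance-lambda", "Arn": REMEDIATION_LAMBDA_ARN}],
)
# Note: the Lambda function also needs a resource-based permission allowing
# events.amazonaws.com to invoke it (lambda add_permission), omitted here.
```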

๐Ÿ’ผ REL11-BP04 Rely on the data plane and not the control plane during recovery

Control planes provide the administrative APIs used to create, read and describe, update, delete, and list (CRUDL) resources, while data planes handle day-to-day service traffic. When implementing recovery or mitigation responses to potentially resiliency-impacting events, focus on using a minimal number of control plane operations to recover, rescale, restore, heal, or fail over the service. Favor data plane actions over control plane activity during these degradation events.

For example, the following are all control plane actions: launching a new compute instance, creating block storage, and describing queue services. When you launch compute instances, the control plane has to perform multiple tasks like finding a physical host with capacity, allocating network interfaces, preparing local block storage volumes, generating credentials, and adding security rules. Control planes tend to involve complicated orchestration.

**Desired outcome:** When a resource enters an impaired state, the system is capable of automatically or manually recovering by shifting traffic from impaired to healthy resources.

**Common anti-patterns:**
- Dependence on changing DNS records to re-route traffic.
- Dependence on control-plane scaling operations to replace impaired components due to insufficiently provisioned resources.
- Relying on extensive, multi-service, multi-API control plane actions to remediate any category of impairment.

**Benefits of establishing this best practice:** Increased success rate for automated remediation can reduce your mean time to recovery and improve availability of the workload.

**Level of risk exposed if this best practice is not established:** Medium: For certain types of service degradations, control planes are affected. Dependencies on extensive use of the control plane for remediation may increase recovery time (RTO) and mean time to recovery (MTTR).

## Implementation guidance

To limit control plane actions, assess each service for which actions are required to restore service. Leverage Amazon Application Recovery Controller to shift DNS traffic. These features continually monitor your application's ability to recover from failures and allow you to control your application recovery across multiple AWS Regions, Availability Zones, and on premises.

Changes to Route 53 routing policies use the control plane, so do not rely on them for recovery. The Route 53 data planes answer DNS queries and perform and evaluate health checks. They are globally distributed and designed for a 100% availability service level agreement (SLA). The Route 53 management APIs and consoles where you create, update, and delete Route 53 resources run on control planes that are designed to prioritize the strong consistency and durability that you need when managing DNS. To achieve this, the control planes are located in a single Region: US East (N. Virginia). While both systems are built to be very reliable, the control planes are not included in the SLA. There could be rare events in which the data plane's resilient design allows it to maintain availability while the control planes do not. For disaster recovery and failover mechanisms, use data plane functions to provide the best possible reliability.

Design your compute infrastructure to be statically stable to avoid using the control plane during an incident. For example, if you are using Amazon EC2 instances, avoid provisioning new instances manually or instructing Auto Scaling groups to add instances in response.
For the highest levels of resilience, provision sufficient capacity in the cluster used for failover. If this capacity threshold must be limited, set throttles on the overall end-to-end system to safely limit the total traffic reaching the limited set of resources.

For services like Amazon DynamoDB, Amazon API Gateway, load balancers, and AWS Lambda, using those services leverages the data plane. However, creating new functions, load balancers, API gateways, or DynamoDB tables is a control plane action and should be completed before a degradation, as preparation for an event and rehearsal of failover actions. For Amazon RDS, data plane actions allow for access to data. Understand which operations are on the data plane and which are on the control plane.

### Implementation steps

For each workload that needs to be restored after a degradation event, evaluate the failover runbook, high availability design, auto healing design, or HA resource restoration plan. Identify each action that might be considered a control plane action.

Consider changing each control plane action to a data plane action:
- Auto Scaling (control plane) compared to pre-scaled Amazon EC2 resources (data plane)
- Amazon EC2 instance scaling (control plane) compared to AWS Lambda scaling (data plane)
- Assess any designs using Kubernetes and the nature of the control plane actions. Adding pods is a data plane action in Kubernetes. Actions should be limited to adding pods and not adding nodes. Using over-provisioned nodes is the preferred method to limit control plane actions.

Consider alternate approaches that allow data plane actions to achieve the same remediation:
- Route 53 record changes (control plane) compared to Amazon Application Recovery Controller routing controls (data plane); a minimal sketch of flipping a routing control follows this list
- Route 53 health checks for more automated updates

Consider placing some services in a secondary Region, if the service is mission critical, to allow for more control plane and data plane actions in an unaffected Region:
- Amazon EC2 Auto Scaling or Amazon EKS in a primary Region compared to Amazon EC2 Auto Scaling or Amazon EKS in a secondary Region, routing traffic to the secondary Region (control plane action)
- Promoting a read replica in the secondary Region to primary, compared to attempting the same action in the primary Region (control plane action)
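The following is a minimal sketch of the ARC routing-control idea referenced above. The cluster endpoint and routing control ARN are hypothetical placeholders, and in practice ARC exposes several Regional cluster endpoints that a failover script would try in turn; consult the ARC documentation for the exact operational procedure before relying on this pattern.

```python
import boto3

# Amazon Application Recovery Controller routing controls are flipped through the
# highly available cluster *data plane* endpoints rather than the management API.
# Both values below are hypothetical placeholders.
CLUSTER_ENDPOINT = "https://aaaa1111.route53-recovery-cluster.us-west-2.amazonaws.com/v1"
ROUTING_CONTROL_ARN = (
    "arn:aws:route53-recovery-control::111122223333:controlpanel/abc/routingcontrol/def"
)

client = boto3.client(
    "route53-recovery-cluster",
    region_name="us-west-2",
    endpoint_url=CLUSTER_ENDPOINT,
)

# Turning the routing control Off causes its associated Route 53 health check to fail,
# which shifts traffic away from the impaired location without editing DNS records.
client.update_routing_control_state(
    RoutingControlArn=ROUTING_CONTROL_ARN,
    RoutingControlState="Off",
)
```

In a real failover procedure you would retry this call against each available cluster endpoint until one succeeds, since any single endpoint may itself be unreachable during an event.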

๐Ÿ’ผ REL11-BP05 Use static stability to prevent bimodal behavior

Workloads should be statically stable and only operate in a single normal mode. Bimodal behavior is when your workload exhibits different behavior under normal and failure modes. For example, you might try to recover from an Availability Zone failure by launching new instances in a different Availability Zone. This can result in a bimodal response during a failure mode. You should instead build workloads that are statically stable and operate within only one mode. In this example, those instances should have been provisioned in the second Availability Zone before the failure. This static stability design verifies that the workload only operates in a single mode.

**Desired outcome:** Workloads do not exhibit bimodal behavior during normal and failure modes.

**Common anti-patterns:**
- Assuming resources can always be provisioned regardless of the failure scope.
- Trying to dynamically acquire resources during a failure.
- Not provisioning adequate resources across zones or Regions until a failure occurs.
- Considering statically stable designs for compute resources only.

**Benefits of establishing this best practice:** Workloads running with statically stable designs are capable of having predictable outcomes during normal and failure events.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Bimodal behavior occurs when your workload exhibits different behavior under normal and failure modes (for example, relying on launching new instances if an Availability Zone fails). By contrast, a statically stable Amazon EC2 design provisions enough instances in each Availability Zone to handle the workload's load even if one AZ were removed. Elastic Load Balancing or Amazon Route 53 health checks shift load away from the impaired instances. After traffic has shifted, use AWS Auto Scaling to asynchronously replace instances from the failed zone and launch them in the healthy zones.

Static stability for compute deployment (such as EC2 instances or containers) results in the highest reliability. This must be weighed against the cost of this model and the business value of maintaining the workload under all resilience cases. It's less expensive to provision less compute capacity and rely on launching new instances in the case of a failure, but for large-scale failures (such as an Availability Zone or Regional impairment) this approach is less effective, because it relies on both an operational control plane and sufficient resources being available in the unaffected zones or Regions. Your solution should also weigh reliability against the cost needs of your workload. Static stability applies to a variety of architectures, including compute instances spread across Availability Zones, database read replica designs, Kubernetes (Amazon EKS) cluster designs, and multi-Region failover architectures. It is also possible to implement a more statically stable design by using more resources in each zone. By adding more zones, you reduce the amount of additional compute you need for static stability.

Another example of bimodal behavior would be a network timeout that causes a system to attempt to refresh the configuration state of the entire system. This would add an unexpected load to another component and might cause it to fail, resulting in other unexpected consequences. This negative feedback loop impacts the availability of your workload. Instead, you can build systems that are statically stable and operate in only one mode.
A statically stable design would do constant work and always refresh the configuration state on a fixed cadence. When a call fails, the workload would use the previously cached value and initiate an alarm.

Another example of bimodal behavior is allowing clients to bypass your workload cache when failures occur. This might seem to be a solution that accommodates client needs, but it can significantly change the demands on your workload and is likely to result in failures.

Assess critical workloads to determine which workloads require this type of resilience design. For those that are deemed critical, each application component must be reviewed. Example types of services that require static stability evaluations are:

- Compute: Amazon EC2, EKS-EC2, ECS-EC2, EMR-EC2
- Databases: Amazon Redshift, Amazon RDS, Amazon Aurora
- Storage: Amazon S3 (Single Zone), Amazon EFS (mounts), Amazon FSx (mounts)
- Load balancers: Under certain designs

### Implementation steps

- Build systems that are statically stable and operate in only one mode. In this case, provision enough instances in each Availability Zone or Region to handle the workload capacity if one Availability Zone or Region were removed (a minimal capacity-sizing sketch follows this list). A variety of services can be used for routing to healthy resources, such as:
  - Cross-Region DNS routing
  - Amazon S3 Multi-Region Access Points
  - AWS Global Accelerator
  - Amazon Application Recovery Controller
- Configure database read replicas to account for the loss of a single primary instance or a read replica. If traffic is being served by read replicas, the quantity in each Availability Zone and each Region should equate to the overall need in case of zone or Region failure.
- Store critical data in Amazon S3 storage classes that are statically stable in case of an Availability Zone failure. If the Amazon S3 One Zone-IA storage class is used, this should not be considered statically stable, as the loss of that zone removes access to the stored data.
- Load balancers are sometimes configured incorrectly or by design to service a specific Availability Zone. In this case, the statically stable design might be to spread a workload across multiple AZs in a more complex design. The original design may be used to reduce interzone traffic for security, latency, or cost reasons.
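To make the capacity-sizing point concrete, here is a small, illustrative calculation (the peak-load figures and AZ counts are hypothetical) of how much capacity to pre-provision per Availability Zone so that losing one AZ leaves enough headroom without launching new instances:

```python
import math

def per_az_capacity(peak_load_units: int, num_azs: int, units_per_instance: int) -> int:
    """Instances to pre-provision in each AZ so that any (num_azs - 1) AZs can still
    serve peak load. This is the static-stability trade-off: extra idle capacity in
    exchange for not depending on the control plane (launching instances) during an
    AZ impairment."""
    if num_azs < 2:
        raise ValueError("Static stability across AZs requires at least two AZs")
    surviving_azs = num_azs - 1
    instances_needed_total = math.ceil(peak_load_units / units_per_instance)
    return math.ceil(instances_needed_total / surviving_azs)

# Hypothetical example: peak of 1,200 requests/sec, 100 requests/sec per instance.
for azs in (2, 3, 4):
    per_az = per_az_capacity(1200, azs, 100)
    print(f"{azs} AZs -> {per_az} instances per AZ ({per_az * azs} total)")
# 2 AZs -> 12 per AZ (24 total); 3 AZs -> 6 per AZ (18 total); 4 AZs -> 4 per AZ (16 total)
```

The shrinking totals illustrate the statement above: adding more zones reduces the amount of additional compute needed for static stability.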

๐Ÿ’ผ REL11-BP06 Send notifications when events impact availability

Notifications are sent upon the detection of breached thresholds, even if the event causing the issue was automatically resolved. Automated healing allows your workload to be reliable. However, it can also obscure underlying problems that need to be addressed. Implement appropriate monitoring and events so that you can detect patterns of problems, including those addressed by auto healing, so that you can resolve root cause issues. Resilient systems are designed so that degradation events are immediately communicated to the appropriate teams. These notifications should be sent through one or many communication channels.

**Desired outcome:** Alerts are immediately sent to operations teams when thresholds are breached, such as error rates, latency, or other critical key performance indicator (KPI) metrics, so that these issues are resolved as soon as possible and user impact is avoided or minimized.

**Common anti-patterns:**
- Sending too many alarms.
- Sending alarms that are not actionable.
- Setting alarm thresholds too high or too low, so that alarms either miss real issues or generate noise.
- Not sending alarms for external dependencies.
- Not considering gray failures when designing monitoring and alarms.
- Performing healing automation, but not notifying the appropriate team that healing was needed.

**Benefits of establishing this best practice:** Notifications of recovery make operational and business teams aware of service degradations so that they can react immediately to minimize both mean time to detect (MTTD) and mean time to repair (MTTR). Notifications of recovery events also assure that you don't ignore problems that occur infrequently.

**Level of risk exposed if this best practice is not established:** Medium. Failure to implement appropriate monitoring and event notification mechanisms can result in failure to detect patterns of problems, including those addressed by auto healing. A team will only be made aware of system degradation when users contact customer service or by chance.

## Implementation guidance

When defining a monitoring strategy, a triggered alarm is a common event. This event would likely contain an identifier for the alarm, the alarm state (such as IN ALARM or OK), and details of what triggered it. In many cases, an alarm event should be detected and an email notification sent. This is an example of an action on an alarm. Alarm notification is critical in observability, as it informs the right people that there is an issue. However, as action on events matures in your observability solution, it can automatically remediate the issue without the need for human intervention.

Once KPI-monitoring alarms have been established, alerts should be sent to appropriate teams when thresholds are exceeded. Those alerts may also be used to trigger automated processes that will attempt to remediate the degradation. For more complex threshold monitoring, composite alarms should be considered. Composite alarms use a number of KPI-monitoring alarms to create an alert based on operational business logic. CloudWatch alarms can be configured to send emails, or to log incidents in third-party incident tracking systems, using Amazon SNS integration or Amazon EventBridge.

### Implementation steps

Create various types of alarms based on how the workloads are monitored, such as:

- Application alarms are used to detect when any part of your workload is not working properly.
- Infrastructure alarms indicate when to scale resources.
  Alarms can be visually displayed on dashboards, send alerts through Amazon SNS or email, and work with Auto Scaling to scale workload resources in or out.
- Simple static alarms can be created to monitor when a metric breaches a static threshold for a specified number of evaluation periods.
- Composite alarms can account for complex alarms from multiple sources (see the sketch after this list).
- Once the alarm has been created, create appropriate notification events. You can directly invoke an Amazon SNS API to send notifications and link any automation for remediation or communication.
- Stay informed about service degradations with AWS Health. Create purpose-fit AWS Health event notifications to email and chat channels through AWS User Notifications, and integrate programmatically with your monitoring and alerting tools through Amazon EventBridge.
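Here is a minimal sketch of the composite-alarm idea above, assuming two hypothetical existing alarms (high error rate and high latency) and a hypothetical SNS topic for notifications:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:oncall-alerts"  # hypothetical

# Alert only when BOTH symptoms are present, which cuts noise from brief,
# single-signal blips while still catching real customer-impacting degradation.
cloudwatch.put_composite_alarm(
    AlarmName="checkout-degraded",
    AlarmRule="ALARM(checkout-5xx-rate-high) AND ALARM(checkout-p99-latency-high)",
    AlarmActions=[SNS_TOPIC_ARN],
    OKActions=[SNS_TOPIC_ARN],  # also notify on recovery so issues aren't ignored
    AlarmDescription="Checkout error rate and latency are both elevated",
)
```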

๐Ÿ’ผ REL11-BP07 Architect your product to meet availability targets and uptime service level agreements (SLAs)

Architect your product to meet availability targets and uptime service level agreements (SLAs). If you publish or privately agree to availability targets or uptime SLAs, verify that your architecture and operational processes are designed to support them.

**Desired outcome:** Each application has a defined target for availability and an SLA for performance metrics, which can be monitored and maintained in order to meet business outcomes.

**Common anti-patterns:**
- Designing and deploying workloads without setting any SLAs.
- SLA metrics are set too high without rationale or business requirements.
- Setting SLAs without taking into account dependencies and their underlying SLAs.
- Application designs are created without considering the Shared Responsibility Model for Resilience.

**Benefits of establishing this best practice:** Designing applications based on key resiliency targets helps you meet business objectives and customer expectations. These objectives help drive the application design process that evaluates different technologies and considers various tradeoffs.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Application designs have to account for a diverse set of requirements that are derived from business, operational, and financial objectives. Within the operational requirements, workloads need to have specific resilience metric targets so they can be properly monitored and supported. Resilience metrics should not be set or derived after deploying the workload. They should be defined during the design phase and help guide various decisions and tradeoffs.

- Every workload should have its own set of resilience metrics. Those metrics may be different from other business applications.
- Reducing dependencies can have a positive impact on availability. Each workload should consider its dependencies and their SLAs. In general, select dependencies with availability goals equal to or greater than the goals of your workload (a short availability calculation follows the implementation steps below).
- Consider loosely coupled designs so your workload can operate correctly despite dependency impairment, where possible.
- Reduce control plane dependencies, especially during recovery or a degradation. Evaluate designs that are statically stable for mission-critical workloads. Use resource sparing to increase the availability of those dependencies in a workload.
- Observability and instrumentation are critical for achieving SLAs by reducing Mean Time to Detection (MTTD) and Mean Time to Repair (MTTR).
- Less frequent failure (longer MTBF), shorter failure detection times (shorter MTTD), and shorter repair times (shorter MTTR) are the three factors that are used to improve availability in distributed systems.
- Establishing and meeting resilience metrics for a workload is foundational to any effective design. Those designs must factor in tradeoffs of design complexity, service dependencies, performance, scaling, and costs.

### Implementation steps

- Review and document the workload design, considering the following questions:
  - Where are control planes used in the workload?
  - How does the workload implement fault tolerance?
  - What are the design patterns for scaling, automatic scaling, redundancy, and highly available components?
  - What are the requirements for data consistency and availability?
  - Are there considerations for resource sparing or resource static stability?
  - What are the service dependencies?
- Define SLA metrics based on the workload architecture while working with stakeholders.
Consider the SLAs of all dependencies used by the workload. - Once the SLA target has been set, optimize the architecture to meet the SLA. - Once you have a design that will meet the SLA, implement operational changes, process automation, and runbooks that also focus on reducing MTTD and MTTR. - Once deployed, monitor and report on the SLA.
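
The dependency guidance above can be made concrete with a little arithmetic: a workload with hard serial dependencies can be no more available than the product of their availabilities, while independent redundant copies improve availability. A minimal sketch follows; the availability figures are illustrative assumptions, not published SLAs.

```python
# Sketch: estimate composite availability from dependency availability targets.
# The figures below are illustrative assumptions, not published SLAs.

def serial_availability(availabilities):
    """Hard (serial) dependencies: the workload is up only if every dependency is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def redundant_availability(availability, copies):
    """Independent redundant copies: the workload is down only if every copy is down."""
    return 1.0 - (1.0 - availability) ** copies

# A workload that depends serially on a load balancer, a compute tier, and a database:
deps = [0.9999, 0.999, 0.9995]   # assumed availability goals of each dependency
print(f"serial composite: {serial_availability(deps):.5f}")        # ~0.99840

# The same compute tier deployed redundantly across three Availability Zones:
print(f"redundant compute: {redundant_availability(0.999, 3):.9f}")
```

This is why the guidance recommends selecting dependencies with availability goals equal to or greater than your workload's target: a single weaker serial dependency caps the composite figure.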

๐Ÿ’ผ REL12-BP01 Use playbooks to investigate failures

Permit consistent and prompt responses to failure scenarios that are not well understood, by documenting the investigation process in playbooks. Playbooks are the predefined steps performed to identify the factors contributing to a failure scenario. The results from any process step are used to determine the next steps to take until the issue is identified or escalated. The playbook is proactive planning that you must do to be able to take reactive actions effectively. When failure scenarios not covered by the playbook are encountered in production, first address the issue (put out the fire). Then go back and review the steps you took to address the issue and use these to add a new entry in the playbook. Note that playbooks are used in response to specific incidents, while runbooks are used to achieve specific outcomes. Often, runbooks are used for routine activities and playbooks are used to respond to non-routine events. **Common anti-patterns:** - Planning to deploy a workload without knowing the processes to diagnose issues or respond to incidents. - Unplanned decisions about which systems to gather logs and metrics from when investigating an event. - Not retaining metrics and events long enough to be able to retrieve the data. **Benefits of establishing this best practice:** Capturing playbooks ensures that processes can be consistently followed. Codifying your playbooks limits the introduction of errors from manual activity. Automating playbooks shortens the time to respond to an event by eliminating the requirement for team member intervention or providing them additional information when their intervention begins. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance - Use playbooks to identify issues. Playbooks are documented processes to investigate issues. They allow consistent and prompt responses to failure scenarios. Playbooks must contain the information and guidance necessary for an adequately skilled person to gather applicable information, identify potential sources of failure, isolate faults, and determine contributing factors (perform post-incident analysis). - Implement playbooks as code. Perform your operations as code by scripting your playbooks to ensure consistency and reduce errors caused by manual processes. Playbooks can be composed of multiple scripts representing the different steps that might be necessary to identify the contributing factors to an issue. Runbook activities can be invoked or performed as part of playbook activities, or might prompt to run a playbook in response to identified events.
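
As a hedged illustration of implementing playbooks as code, the sketch below scripts one investigation step: it pulls recent error-level log events and a correlated error metric from CloudWatch for a fixed window. The log group, namespace, dimensions, and time window are placeholders for this example; a real playbook step would take them from the incident context.

```python
# Sketch: one scripted playbook step that collects evidence for an investigation.
# Log group, namespace, metric name, and dimensions are hypothetical placeholders.
import boto3
from datetime import datetime, timedelta, timezone

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)

# Step 1: pull recent error log events from the suspected component.
events = logs.filter_log_events(
    logGroupName="/app/orders-service",            # placeholder log group
    startTime=int(start.timestamp() * 1000),
    endTime=int(end.timestamp() * 1000),
    filterPattern="ERROR",
)
for event in events.get("events", [])[:20]:
    print(event["timestamp"], event["message"][:200])

# Step 2: pull the 5xx error metric for the same window to correlate impact.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/orders-alb/0123456789abcdef"}],  # placeholder
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```

Composing several such scripted steps, each emitting its findings, gives you a playbook whose execution is consistent regardless of who runs it.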

๐Ÿ’ผ REL12-BP02 Perform post-incident analysis

Review customer-impacting events, and identify the contributing factors and preventative action items. Use this information to develop mitigations to limit or prevent recurrence. Develop procedures for prompt and effective responses. Communicate contributing factors and corrective actions as appropriate, tailored to target audiences. Have a method to communicate these causes to others as needed. Assess why existing testing did not find the issue. Add tests for this case if tests do not already exist. **Desired outcome** Your teams have a consistent and agreed upon approach to handling post-incident analysis. One mechanism is the correction of error (COE) process. The COE process helps your teams identify, understand, and address the root causes for incidents, while also building mechanisms and guardrails to limit the probability of the same incident happening again. **Common anti-patterns** - Finding contributing factors, but not continuing to look deeper for other potential problems and approaches to mitigate. - Only identifying human error causes, and not providing any training or automation that could prevent human errors. - Focus on assigning blame rather than understanding the root cause, creating a culture of fear and hindering open communication. - Failure to share insights, which keeps incident analysis findings within a small group and prevents others from benefiting from the lessons learned. - No mechanism to capture institutional knowledge, thereby losing valuable insights by not preserving the lessons-learned in the form of updated best practices and resulting in repeat incidents with the same or similar root cause. **Benefits of establishing this best practice** Conducting post-incident analysis and sharing the results permits other workloads to mitigate the risk if they have implemented the same contributing factors, and allows them to implement the mitigation or automated recovery before an incident occurs. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Good post-incident analysis provides opportunities to propose common solutions for problems with architecture patterns that are used in other places in your systems. A cornerstone of the COE process is documenting and addressing issues. It is recommended to define a standardized way to document critical root causes, and ensure they are reviewed and addressed. Assign clear ownership for the post-incident analysis process. Designate a responsible team or individual who will oversee incident investigations and follow-ups. Encourage a culture that focuses on learning and improvement rather than assigning blame. Emphasize that the goal is to prevent future incidents, not to penalize individuals. Develop well-defined procedures for conducting post-incident analyses. These procedures should outline the steps to be taken, the information to be collected, and the key questions to be addressed during the analysis. Investigate incidents thoroughly, going beyond immediate causes to identify root causes and contributing factors. Use techniques like the five whys to delve deep into the underlying issues. Maintain a repository of lessons learned from incident analyses. This institutional knowledge can serve as a reference for future incidents and prevention efforts. Share findings and insights from post-incident analyses, and consider holding open-invite post-incident review meetings to discuss lessons learned. 
### Implementation steps - While conducting post-incident analysis, ensure the process is blame-free. This allows people involved in the incident to be dispassionate about the proposed corrective actions and promote honest self-assessment and collaboration across teams. - Define a standardized way to document critical issues. An example structure for such document is as follows: - What happened? - What was the impact on customers and your business? - What was the root cause? - What data do you have to support this? For example, metrics and graphs - What were the critical pillar implications, especially security? - When architecting workloads, you make trade-offs between pillars based upon your business context. These business decisions can drive your engineering priorities. You might optimize to reduce cost at the expense of reliability in development environments, or, for mission-critical solutions, you might optimize reliability with increased costs. Security is always job zero, as you have to protect your customers. - What lessons did you learn? - What corrective actions are you taking? - Action items - Related items - Create well-defined standard operating procedures for conducting post-incident analyses. - Set up a standardized incident reporting process. Document all incidents comprehensively, including the initial incident report, logs, communications, and actions taken during the incident. - Remember that an incident does not require an outage. It could be a near-miss, or a system performing in an unexpected way while still fulfilling its business function. - Continually improve your post-incident analysis process based on feedback and lessons learned. - Capture key findings in a knowledge management system, and consider any patterns that should be added to developer guides or pre-deployment checklists.
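
One lightweight way to enforce the standardized document structure listed above is to codify it. The sketch below is one possible representation of a correction-of-error record as a Python dataclass; the field names mirror the questions in the implementation steps, and the example values are invented purely for illustration.

```python
# Sketch: a structured correction-of-error (COE) record mirroring the fields above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CorrectionOfError:
    what_happened: str
    customer_and_business_impact: str
    root_cause: str
    supporting_data: List[str] = field(default_factory=list)    # metric links, graphs
    pillar_implications: str = ""                                # especially security
    lessons_learned: List[str] = field(default_factory=list)
    corrective_actions: List[str] = field(default_factory=list)  # tracked action items
    related_items: List[str] = field(default_factory=list)

# Hypothetical example record:
coe = CorrectionOfError(
    what_happened="Checkout latency exceeded 2 seconds for 40 minutes after a deployment.",
    customer_and_business_impact="~3% of checkout attempts timed out during the window.",
    root_cause="Connection pool exhausted after a configuration change halved its size.",
    supporting_data=["dashboard: p99 latency", "graph: DB connections"],
    lessons_learned=["Pool sizing is not covered by pre-deployment load tests."],
    corrective_actions=["Add pool-saturation alarm", "Add load test stage to pipeline"],
)
```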

๐Ÿ’ผ REL12-BP03 Test scalability and performance requirements

Use techniques such as load testing to validate that the workload meets scaling and performance requirements. In the cloud, you can create a production-scale test environment for your workload on demand. Instead of reliance on a scaled-down test environment, which could lead to inaccurate predictions of production behaviors, you can use the cloud to provision a test environment that closely mirrors your expected production environment. This environment helps you test in a more accurate simulation of the real-world conditions your application faces. Alongside your performance testing efforts, it's essential to validate that your base resources, scaling settings, service quotas, and resiliency design operate as expected under load. This holistic approach verifies that your application can reliably scale and perform as required, even under the most demanding conditions. **Desired outcome** Your workload maintains its expected behavior even while subject to peak load. You proactively address any performance-related issues that may arise as the application grows and evolves. **Common anti-patterns** - You use test environments that do not closely match the production environment. - You treat load testing as a separate, one-time activity rather than an integrated part of the deployment continuous integration (CI) pipeline. - You don't define clear and measurable performance requirements, such as response time, throughput, and scalability targets. - You perform tests with unrealistic or insufficient load scenarios, and you fail to test for peak loads, sudden spikes, and sustained high load. - You don't stress test the workload by exceeding expected load limits. - You use inadequate or inappropriate load testing and performance profiling tools. - You lack comprehensive monitoring and alerting systems to track performance metrics and detect anomalies. **Benefits of establishing this best practice** - Load testing helps you identify potential performance bottlenecks in your system before it goes into production. When you simulate production-level traffic and workloads, you can identify areas where your system may struggle to handle the load, such as slow response times, resource constraints, or system failures. - As you test your system under various load conditions, you can better understand the resource requirements needed to support your workload. This information can help you make informed decisions about resource allocation and prevent over-provisioning or under-provisioning of resources. - To identify potential failure points, you can observe how your workload performs under high load conditions. This information helps you improve your workload's reliability and resiliency by implementing fault-tolerance mechanisms, failover strategies, and redundancy measures, as appropriate. - You identify and address performance issues early, which helps you avoid the costly consequences of system outages, slow response times, and dissatisfied users. - Detailed performance data and profiling information collected during testing can help you troubleshoot performance-related issues that may arise in production. This can lead to faster incident response and resolution, which reduces the impact on users and your organization's operations. - In certain industries, proactive performance testing can help your workload meet compliance standards, which reduces the risk of penalties or legal issues. 
**Level of risk exposed if this best practice is not established:** High ## Implementation guidance The first step is to define a comprehensive testing strategy that covers all aspects of scaling and performance requirements. To start, clearly define your workload's service-level objectives (SLOs) based on your business needs, such as throughput, latency histogram, and error rate. Next, design a suite of tests that can simulate various load scenarios that range from average usage to sudden spikes and sustained peak loads, and verify that the workload's behavior meets your SLOs. These tests should be automated and integrated into your continuous integration and deployment pipeline to catch performance regressions early in the development process. To effectively test scaling and performance, invest in the right tools and infrastructure. This includes load testing tools that can generate realistic user traffic, performance profiling tools to identify bottlenecks, and monitoring solutions to track key metrics. Importantly, you should verify that your test environments closely match the production environment in terms of infrastructure and environment conditions to make your test results as accurate as possible. To make it easier to reliably replicate and scale production-like setups, use infrastructure as code and container-based applications. Scaling and performance tests are an ongoing process, not a one-time activity. Implement comprehensive monitoring and alerting to track the application's performance in production, and use this data to continually refine your test strategies and optimization efforts. Regularly analyze performance data to identify emerging issues, test new scaling strategies, and implement optimizations to improve the application's efficiency and reliability. When you adopt an iterative approach and constantly learn from production data, you can verify that your application can adapt to variable user demands and maintain resiliency and optimal performance over time. ### Implementation steps - Establish clear and measurable performance requirements, such as response time, throughput, and scalability targets. These requirements should be based on your workload's usage patterns, user expectations, and business needs. - Select and configure a load testing tool that can accurately mimic the load patterns and user behavior in your production environment. - Set up a test environment that closely matches the production environment, including infrastructure and environment conditions, to improve the accuracy of your test results. - Create a test suite that covers a wide range of scenarios, from average usage patterns to peak loads, rapid spikes, and sustained high loads. Integrate the tests into your continuous integration and deployment pipelines to catch performance regressions early in the development process. - Conduct load testing to simulate real-world user traffic and understand how your application behaves under different load conditions. To stress test your application, exceed the expected load and observe its behavior, such as response time degradation, resource exhaustion, or system failures, which helps identify the breaking point of your application and inform scaling strategies. Evaluate the scalability of your workload by incrementally increasing the load, and measure the performance impact to identify scaling limits and plan for future capacity needs. 
- Implement comprehensive monitoring and alerting to track performance metrics, detect anomalies, and initiate scaling actions or notifications when thresholds are exceeded. - Continually monitor and analyze performance data to identify areas for improvement. Iterate on your testing strategies and optimization efforts.
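
As a minimal, tool-agnostic sketch of the kind of check described above, the snippet below fires concurrent requests at an endpoint and compares the measured p99 latency and error rate against assumed SLO targets. A real load test would use a dedicated tool with realistic traffic models; the URL, request counts, and thresholds here are placeholders.

```python
# Sketch: a tiny concurrent load probe that checks latency and error-rate SLOs.
# URL, request count, and SLO thresholds are illustrative placeholders.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/health"   # placeholder endpoint
REQUESTS = 200
CONCURRENCY = 20
P99_TARGET_MS = 500
MAX_ERROR_RATE = 0.01

def probe(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            ok = 200 <= resp.status < 400
    except Exception:
        ok = False
    return ok, (time.perf_counter() - start) * 1000.0

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(probe, range(REQUESTS)))

latencies = sorted(ms for _, ms in results)
p99 = latencies[int(len(latencies) * 0.99) - 1]
error_rate = sum(1 for ok, _ in results if not ok) / len(results)

print(f"p99={p99:.0f} ms (target {P99_TARGET_MS}), error_rate={error_rate:.2%}")
assert p99 <= P99_TARGET_MS and error_rate <= MAX_ERROR_RATE, "SLO check failed"
```

Running a check like this in the CI/CD pipeline, against an environment that mirrors production, is what turns load testing from a one-time activity into a regression gate.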

๐Ÿ’ผ REL12-BP04 Test resiliency using chaos engineering

Run chaos experiments regularly in environments that are in or as close to production as possible to understand how your system responds to adverse conditions. **Desired outcome** The resilience of the workload is regularly verified by applying chaos engineering in the form of fault injection experiments or injection of unexpected load, in addition to resilience testing that validates known expected behavior of your workload during an event. Combine both chaos engineering and resilience testing to gain confidence that your workload can survive component failure and can recover from unexpected disruptions with minimal to no impact. **Common anti-patterns** - Designing for resiliency, but not verifying how the workload functions as a whole when faults occur. - Never experimenting under real-world conditions and expected load. - Not treating your experiments as code or maintaining them through the development cycle. - Not running chaos experiments both as part of your CI/CD pipeline, as well as outside of deployments. - Neglecting to use past post-incident analyses when determining which faults to experiment with. **Benefits of establishing this best practice** Injecting faults to verify the resilience of your workload allows you to gain confidence that the recovery procedures of your resilient design will work in the case of a real fault. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Chaos engineering provides your teams with capabilities to continually inject real world disruptions (simulations) in a controlled way at the service provider, infrastructure, workload, and component level, with minimal to no impact to your customers. It allows your teams to learn from faults and observe, measure, and improve the resilience of your workloads, as well as validate that alerts fire and teams get notified in the case of an event. When performed continually, chaos engineering can highlight deficiencies in your workloads that, if left unaddressed, could negatively affect availability and operation. If a system is able to withstand these disruptions, the chaos experiment should be maintained as an automated regression test. In this way, chaos experiments should be performed as part of your systems development lifecycle (SDLC) and as part of your CI/CD pipeline. To ensure that your workload can survive component failure, inject real world events as part of your experiments. For example, experiment with the loss of Amazon EC2 instances or failover of the primary Amazon RDS database instance, and verify that your workload is not impacted (or only minimally impacted). Use a combination of component faults to simulate events that may be caused by a disruption in an Availability Zone. For application-level faults (such as crashes), you can start with stressors such as memory and CPU exhaustion. To validate fallback or failover mechanisms for external dependencies due to intermittent network disruptions, your components should simulate such an event by blocking access to the third-party providers for a specified duration that can last from seconds to hours. Other modes of degradation might cause reduced functionality and slow responses, often resulting in a disruption of your services. Common sources of this degradation are increased latency on critical services and unreliable network communication (dropped packets). 
Experiments with these faults, including networking effects such as latency, dropped messages, and DNS failures, could include the inability to resolve a name, reach the DNS service, or establish connections to dependent services. **Chaos engineering tools** AWS Fault Injection Service (AWS FIS) is a fully managed service for running fault injection experiments that can be used as part of your CD pipeline, or outside of the pipeline. AWS FIS is a good choice to use during chaos engineering game days. It supports simultaneously introducing faults across different types of resources including Amazon EC2, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon RDS. These faults include termination of resources, forcing failovers, stressing CPU or memory, throttling, latency, and packet loss. Since it is integrated with Amazon CloudWatch Alarms, you can set up stop conditions as guardrails to roll back an experiment if it causes unexpected impact. There are also several third-party options for fault injection experiments. These include open-source tools such as Chaos Toolkit, Chaos Mesh, and Litmus Chaos, as well as commercial options like Gremlin. To expand the scope of faults that can be injected on AWS, AWS FIS integrates with Chaos Mesh and Litmus Chaos, allowing you to coordinate fault injection workflows among multiple tools. For example, you can run a stress test on a pod's CPU using Chaos Mesh or Litmus faults while terminating a randomly selected percentage of cluster nodes using AWS FIS fault actions. ### Implementation steps 1. Determine which faults to use for experiments. Assess the design of your workload for resiliency. Such designs (created using the best practices of the Well-Architected Framework) account for risks based on critical dependencies, past events, known issues, and compliance requirements. List each element of the design intended to maintain resilience and the faults it is designed to mitigate. For more information about creating such lists, see the Operational Readiness Review whitepaper. The Failure Modes and Effects Analysis (FMEA) process provides a framework for performing a component-level analysis of failures and how they impact your workload. 2. Assign a priority to each fault. Start with a coarse categorization such as high, medium, or low. To assess priority, consider frequency of the fault and impact of failure to the overall workload. When considering frequency of a given fault, analyze past data for this workload when available. If not available, use data from other workloads running in a similar environment. When considering impact of a given fault, the larger the scope of the fault, generally the larger the impact. Also consider the workload design and purpose. For example, the ability to access the source data stores is critical for a workload doing data transformation and analysis. In this case, you would prioritize experiments for access faults, as well as throttled access and latency insertion. Post-incident analyses are a good source of data to understand both frequency and impact of failure modes. Use the assigned priority to determine which faults to experiment with first and the order with which to develop new fault injection experiments. 3. For each experiment that you perform, follow the chaos engineering and continuous resilience flywheel in the following figure. 1. Define steady state as some measurable output of a workload that indicates normal behavior.
Your workload exhibits steady state if it is operating reliably and as expected. Therefore, validate that your workload is healthy before defining steady state. Steady state does not necessarily mean no impact to the workload when a fault occurs, as a certain percentage of faults could be within acceptable limits. The steady state is your baseline that you will observe during the experiment, which will highlight anomalies if your hypothesis defined in the next step does not turn out as expected. For example, a steady state of a payments system can be defined as the processing of 300 TPS with a success rate of 99% and round-trip time of 500 ms. 2. Form a hypothesis about how the workload will react to the fault. A good hypothesis is based on how the workload is expected to mitigate the fault to maintain the steady state. The hypothesis states that given the fault of a specific type, the system or workload will continue steady state, because the workload was designed with specific mitigations. The specific type of fault and mitigations should be specified in the hypothesis. The following template can be used for the hypothesis (but other wording is also acceptable): For example: - If 20% of the nodes in the Amazon EKS node-group are taken down, the Transaction Create API continues to serve the 99th percentile of requests in under 100 ms (steady state). The Amazon EKS nodes will recover within five minutes, and pods will get scheduled and process traffic within eight minutes after the initiation of the experiment. Alerts will fire within three minutes. - If a single Amazon EC2 instance failure occurs, the order system's Elastic Load Balancing health check will cause the Elastic Load Balancing to only send requests to the remaining healthy instances while the Amazon EC2 Auto Scaling replaces the failed instance, maintaining a less than 0.01% increase in server-side (5xx) errors (steady state). - If the primary Amazon RDS database instance fails, the Supply Chain data collection workload will fail over and connect to the standby Amazon RDS database instance to maintain less than 1 minute of database read or write errors (steady state). 3. Run the experiment by injecting the fault. An experiment should by default be fail-safe and tolerated by the workload. If you know that the workload will fail, do not run the experiment. Chaos engineering should be used to find known-unknowns or unknown-unknowns. Known-unknowns are things you are aware of but don't fully understand, and unknown-unknowns are things you are neither aware of nor fully understand. Experimenting against a workload that you know is broken won't provide you with new insights. Your experiment should be carefully planned, have a clear scope of impact, and provide a rollback mechanism that can be applied in case of unexpected turbulence. If your due diligence shows that your workload should survive the experiment, move forward with the experiment. There are several options for injecting the faults. For workloads on AWS, AWS FIS provides many predefined fault simulations called actions. You can also define custom actions that run in AWS FIS using AWS Systems Manager documents. We discourage the use of custom scripts for chaos experiments, unless the scripts have the capabilities to understand the current state of the workload, are able to emit logs, and provide mechanisms for rollbacks and stop conditions where possible.
An effective framework or toolset which supports chaos engineering should track the current state of an experiment, emit logs, and provide rollback mechanisms to support the controlled running of an experiment. Start with an established service like AWS FIS that allows you to perform experiments with a clearly defined scope and safety mechanisms that roll back the experiment if the experiment introduces unexpected turbulence. To learn about a wider variety of experiments using AWS FIS, also see the Resilient and Well-Architected Apps with Chaos Engineering lab. Also, AWS Resilience Hub will analyze your workload and create experiments that you can choose to implement and run in AWS FIS. Experiments should run in production under real-world load using canary deployments that spin up both a control and experimental system deployment, where feasible. Running experiments during off-peak times is a good practice to mitigate potential impact when first experimenting in production. Also, if using actual customer traffic poses too much risk, you can run experiments using synthetic traffic on production infrastructure against the control and experimental deployments. When using production is not possible, run experiments in pre-production environments that are as close to production as possible. You must establish and monitor guardrails to ensure the experiment does not impact production traffic or other systems beyond acceptable limits. Establish stop conditions to stop an experiment if it reaches a threshold on a guardrail metric that you define. This should include the metrics for steady state for the workload, as well as the metric against the components into which you're injecting the fault. A synthetic monitor (also known as a user canary) is one metric you should usually include as a user proxy. Stop conditions for AWS FIS are supported as part of the experiment template, allowing up to five stop conditions per template. One of the principles of chaos engineering is to minimize the scope of the experiment and its impact: while there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the chaos engineer to ensure the fallout from experiments is minimized and contained. A method to verify the scope and potential impact is to perform the experiment in a non-production environment first, verifying that thresholds for stop conditions activate as expected during an experiment and observability is in place to catch an exception, instead of directly experimenting in production. When running fault injection experiments, verify that all responsible parties are well-informed. Communicate with appropriate teams such as the operations teams, service reliability teams, and customer support to let them know when experiments will be run and what to expect. Give these teams communication tools to inform those running the experiment if they see any adverse effects. You must restore the workload and its underlying systems back to the original known-good state. Often, the resilient design of the workload will self-heal. But some fault designs or failed experiments can leave your workload in an unexpected failed state. By the end of the experiment, you must be aware of this and restore the workload and systems. With AWS FIS you can set a rollback configuration (also called a post action) within the action parameters. A post action returns the target to the state that it was in before the action was run.
Whether automated (such as using AWS FIS) or manual, these post actions should be part of a playbook that describes how to detect and handle failures. 4. Improve the workload design for resilience. If steady state was not maintained, then investigate how the workload design can be improved to mitigate the fault, applying the best practices of the AWS Well-Architected Reliability pillar. Additional guidance and resources can be found in the AWS Builder's Library, which hosts articles about how to improve your health checks or employ retries with backoff in your application code, among others. After these changes have been implemented, run the experiment again (shown by the dotted line in the chaos engineering flywheel) to determine their effectiveness. If the verify step indicates the hypothesis holds true, then the workload will be in steady state, and the cycle continues. 4. Run experiments regularly. A chaos experiment is a cycle, and experiments should be run regularly as part of chaos engineering. After a workload meets the experiment's hypothesis, the experiment should be automated to run continually as a regression test in your CI/CD pipeline. To learn how to do this, see this blog on how to run AWS FIS experiments using AWS CodePipeline. This lab on recurrent AWS FIS experiments in a CI/CD pipeline allows you to work hands-on. Fault injection experiments are also a part of game days (see REL12-BP05 Conduct game days regularly). Game days simulate a failure or event to verify systems, processes, and team responses. The purpose is to actually perform the actions the team would perform as if an exceptional event happened. 5. Capture and store experiment results. Results for fault injection experiments must be captured and persisted. Include all necessary data (such as time, workload, and conditions) to be able to later analyze experiment results and trends. Examples of results might include screenshots of dashboards, CSV dumps from your metrics database, or a hand-typed record of events and observations from the experiment. Experiment logging with AWS FIS can be part of this data capture.
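
To tie the AWS FIS discussion above to code, the hedged sketch below starts an experiment from an existing experiment template and polls its state. The template ID and tags are placeholders; stop conditions (CloudWatch alarms) and post actions are assumed to be configured on the template itself, as described earlier.

```python
# Sketch: start an AWS FIS experiment from an existing template and watch its state.
# The template ID and tags are placeholders; stop conditions and rollback (post
# actions) are assumed to live in the experiment template.
import time
import uuid
import boto3

fis = boto3.client("fis")

response = fis.start_experiment(
    clientToken=str(uuid.uuid4()),                 # idempotency token
    experimentTemplateId="EXT1234567890abcd",      # placeholder template ID
    tags={"game-day": "2025-q3", "owner": "payments-team"},
)
experiment_id = response["experiment"]["id"]

# Poll until the experiment completes, stops on a guardrail, or fails.
while True:
    state = fis.get_experiment(id=experiment_id)["experiment"]["state"]
    print(state["status"], state.get("reason", ""))
    if state["status"] in ("completed", "stopped", "failed"):
        break
    time.sleep(30)
```

A wrapper like this is also the natural hook for capturing results (step 5): record the experiment ID, start and end times, and the observed steady-state metrics alongside the hypothesis.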

๐Ÿ’ผ REL12-BP05 Conduct game days regularly

Conduct game days to regularly exercise your procedures for responding to workload-impacting events and impairments. Involve the same teams who would be responsible for handling production scenarios. These exercises help enforce measures to prevent user impact caused by production events. When you practice your response procedures in realistic conditions, you can identify and address any gaps or weaknesses before a real event occurs. Game days simulate events in production-like environments to test systems, processes, and team responses. The purpose is to perform the same actions the team would perform as if the event actually occurred. These exercises help you understand where improvements can be made and can help develop organizational experience in dealing with events and impairments. These should be conducted regularly so that your team builds ingrained habits for how to respond. Game days prepare teams to handle production events with greater confidence. Teams that are well-practiced are more able to quickly detect and respond to various scenarios. This results in a significantly improved readiness and resilience posture. **Desired outcome** You run resilience game days on a consistent, scheduled basis. These game days are seen as a normal and expected part of doing business. Your organization has built a culture of preparedness, and when production issues occur, your teams are well-prepared to respond effectively, resolve the issues efficiently, and mitigate the impact on customers. **Common anti-patterns** - You document your procedures, but never exercise them. - You exclude business decision makers in the test exercises. - You run a game day, but don't inform all relevant stakeholders. - You focus solely on technical failures, but don't involve business stakeholders. - You don't incorporate lessons learned from game days into your recovery processes. - You blame teams for failures or bugs. **Benefits of establishing this best practice** - Enhance response skills: On game days, teams practice their duties and test their communication mechanisms during simulated events, which creates a more coordinated and efficient response in production situations. - Identify and address dependencies: Complex environments often involve intricate dependencies between various systems, services, and components. Game days can help you identify and address these dependencies, and verify that your critical systems and services are properly covered by your runbook procedures and can be scaled up or recovered in a timely manner. - Foster a culture of resilience: Game days can help cultivate a mindset of resilience within an organization. When you involve cross-functional teams and stakeholders, these exercises promote awareness, collaboration, and a shared understanding of the importance of resilience across the entire organization. - Continuous improvement and adaptation: Regular game days help you to continually assess and adapt your resilience strategies, which keeps them relevant and effective in the face of changing circumstances. - Increase confidence in the system: Successful game days can help you build confidence in the system's ability to withstand and recover from disruptions. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Once you have designed and implemented the necessary resilience measures, conduct a game day to validate that everything works as planned in production. 
A game day, especially the first one, should involve all team members, and all stakeholders and participants should be informed in advance about the date, time, and simulated scenarios. During the game day, the involved teams simulate various events and potential scenarios according to the prescribed procedures. The participants closely monitor and assess the impact of these simulated events. If the system operates as designed, the automated detection, scaling, and self-healing mechanisms should activate and result in little to no impact on users. If the team observes any negative impact, they roll back the test and remedy the identified issues, either through automated means or manual intervention documented in the applicable runbooks. To continuously improve resilience, it's critical to document and incorporate lessons learned. This process is a feedback loop that systematically captures insights from game days and uses them to enhance systems, processes, and team capabilities. To help you reproduce real-world scenarios where system components or services may fail unexpectedly, inject simulated faults as a game day exercise. Teams can test the resilience and fault tolerance of their systems and simulate their incident response and recovery processes in a controlled environment. In AWS, your game days can be carried out with replicas of your production environment using infrastructure as code. Through this process, you can test in a safe environment that closely resembles your production environment. Consider AWS Fault Injection Service to create different failure scenarios. Use services like Amazon CloudWatch and AWS X-Ray to monitor system behavior during game days. Use AWS Systems Manager to manage and run playbooks, and use AWS Step Functions to orchestrate recurring game day workflows. ### Implementation steps 1. Establish a game day program: Develop a structured program that defines the frequency, scope, and objectives of game days. Involve key stakeholders and subject matter experts in planning and running these exercises. 2. Prepare the game day: - Identify the key business-critical services that are the focus of the game day. Catalog and map the people, processes, and technologies that support those services. - Set the agenda for the game day, and prepare the involved teams to participate in the event. Prepare your automation services to simulate the planned scenarios and run the appropriate recovery processes. AWS services such as AWS Fault Injection Service, AWS Step Functions, and AWS Systems Manager can help you automate various aspects of game days, such as injection of faults and initiation of recovery actions. 3. Run your simulation: On the game day, run the planned scenario. Observe and document how the people, processes, and technologies react to the simulated event. 4. Conduct post-exercise reviews: After the game day, conduct a retrospective session to review the lessons learned. Identify areas for improvement and any actions needed to improve operational resilience. Document your findings, and track any necessary changes to enhance your resilience strategies and preparedness to completion.
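
As one hedged example of automating pieces of a game day, the sketch below notifies stakeholders over an SNS topic and then starts a Systems Manager Automation runbook that drives the simulated scenario. The topic ARN, Automation document name, and parameters are hypothetical placeholders for this illustration.

```python
# Sketch: kick off a game day -- notify stakeholders, then run an SSM Automation runbook.
# Topic ARN, document name, and parameters are hypothetical placeholders.
import boto3

sns = boto3.client("sns")
ssm = boto3.client("ssm")

# 1. Tell the involved teams that the exercise is starting (no surprises).
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:game-day-announcements",  # placeholder
    Subject="Game day starting: AZ impairment simulation",
    Message="Simulated scenario begins at 14:00 UTC. Share observations in the game day channel.",
)

# 2. Start the Automation runbook that performs the simulated scenario.
execution = ssm.start_automation_execution(
    DocumentName="GameDay-SimulateAZImpairment",        # placeholder Automation document
    Parameters={"TargetAutoScalingGroup": ["orders-asg"]},
)
print("automation execution:", execution["AutomationExecutionId"])
```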

๐Ÿ’ผ REL13-BP01 Define recovery objectives for downtime and data loss

Failures can impact your business in several ways. First, failures can cause service interruption (downtime). Second, failures can cause data to become lost, inconsistent, or stale. In order to guide how you respond and recover from failures, define a Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for each workload. Recovery Time Objective (RTO) is the maximum acceptable delay between the interruption of service and restoration of service. Recovery Point Objective (RPO) is the maximum acceptable time after the last data recovery point. **Desired outcome** Every workload has a designated RTO and RPO based on technical considerations and business impact. **Common anti-patterns** - You haven't designated recovery objectives. - You select arbitrary recovery objectives. - You select recovery objectives that are too lenient and do not meet business objectives. - You have not evaluated the impact of downtime and data loss. - You select unrealistic recovery objectives, such as zero time to recover or zero data loss, which may not be achievable for your workload configuration. - You select recovery objectives that are more stringent than actual business objectives. This forces recovery implementations that are costlier and more complicated than what the workload needs. - You select recovery objectives that are incompatible with those of a dependent workload. - You fail to consider regulatory and compliance requirements. **Benefits of establishing this best practice** When you set RTOs and RPOs for your workloads, you establish clear and measurable goals for recovery based on your business needs. Once you've set those goals, you can create disaster recovery (DR) plans that are tailored to meet them. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Construct a matrix or worksheet to help guide your disaster recovery planning. In your matrix, create different workload categories or tiers based on their business impact (such as critical, high, medium, and low) and the associated RTOs and RPOs to target for each one. For each workload, investigate and understand the impact of downtime and lost data on your business. The impact typically grows with downtime and data loss, but the shape of the impact can differ based on the workload type. For example, downtime for up to an hour might have low impact, but after that, the impact could quickly intensify. Impact can take many forms, including financial impact (such as lost revenue), reputational impact (including loss of customer trust), operational impact (such as a missed payroll or decreased productivity), and regulatory risk. Once completed, assign the workload to the appropriate tier. Consider the following questions when you analyze the impact of failure: - What is the maximum time the workload can be unavailable before unacceptable impact to the business is incurred? - How much impact, and what kind, will be incurred by the business by a workload disruption? Consider all kinds of impact, including financial, reputational, operational, and regulatory. - What is the maximum amount of data that can be lost or unrecoverable before unacceptable impact to the business is incurred? - Can lost data be recreated from other sources (also known as derived data)? If so, also consider the RPOs of all source data used to recreate the workload data. - What are the recovery objectives and availability expectations of workloads that this one depends on (downstream)? 
Your workload's objectives must be achievable given the recovery capabilities of its downstream dependencies. Consider possible downstream dependency workarounds or mitigations that can improve this workload's recovery capability. - What are the recovery objectives and availability expectations of workloads that depend on this one (upstream)? Upstream workload objectives may require this workload to have more stringent recovery capabilities than it first appears. - Are there different recovery objectives based on the type of incident? For example, you might have different RTOs and RPOs depending on whether the incident impacts an Availability Zone or an entire Region. - Do your recovery objectives change during certain events or times of the year? For example, you might have different RTOs and RPOs around holiday shopping seasons, sporting events, special sales, and new product launches. - How do the recovery objectives align with any line-of-business and organizational disaster recovery strategy you might have? - Are there legal or contractual ramifications to consider? For example, are you contractually obligated to provide a service with a given RTO or RPO? What penalties might you incur for not meeting them? - Are you required to maintain data integrity to meet regulatory or compliance requirements? ### Implementation steps 1. Identify the business stakeholders and technical teams responsible for each workload, and engage with them. 2. Create categories or tiers of criticality for workload impact in your organization. Example categories include critical, high, medium, and low. For each category, choose an RTO and RPO that reflects your business objectives and requirements. 3. Assign one of the impact categories you created in the previous step to each workload. To decide how a workload maps to a category, consider the workload's importance to the business and the impact of interruption or data loss, and use the questions above to guide you. This results in an RTO and RPO for each workload. 4. Consider the RTO and RPO for each workload determined in the previous step. Involve the workload's business and technical teams to determine whether the objectives should be adjusted. For example, business stakeholders could determine that more stringent targets are required. Alternatively, technical teams could determine that targets should be modified to make them achievable with available resources and technological constraints.
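
A simple way to start the matrix described in the implementation steps is to record the tiers and workload assignments somewhere they can be versioned and reviewed. The tier targets and workload names below are invented for illustration only, not recommended values.

```python
# Sketch: a criticality-tier matrix with RTO/RPO targets and workload assignments.
# Tier targets and workload names are illustrative assumptions, not recommendations.
from datetime import timedelta

TIERS = {
    "critical": {"rto": timedelta(minutes=15), "rpo": timedelta(minutes=5)},
    "high":     {"rto": timedelta(hours=1),    "rpo": timedelta(minutes=15)},
    "medium":   {"rto": timedelta(hours=4),    "rpo": timedelta(hours=1)},
    "low":      {"rto": timedelta(hours=24),   "rpo": timedelta(hours=24)},
}

WORKLOADS = {
    "payments-api":       "critical",
    "order-history":      "high",
    "internal-reporting": "low",
}

for workload, tier in WORKLOADS.items():
    targets = TIERS[tier]
    print(f"{workload}: tier={tier}, RTO<={targets['rto']}, RPO<={targets['rpo']}")
```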

๐Ÿ’ผ REL13-BP02 Use defined recovery strategies to meet the recovery objectives

Define a disaster recovery (DR) strategy that meets your workload's recovery objectives. Choose a strategy such as backup and restore, standby (active/passive), or active/active. **Desired outcome** For each workload, there is a defined and implemented DR strategy that allows the workload to achieve DR objectives. DR strategies between workloads make use of reusable patterns (such as the strategies previously described). **Common anti-patterns** - Implementing inconsistent recovery procedures for workloads with similar DR objectives. - Leaving the DR strategy to be implemented ad-hoc when a disaster occurs. - Having no plan for disaster recovery. - Dependency on control plane operations during recovery. **Benefits of establishing this best practice** - Using defined recovery strategies allows you to use common tooling and test procedures. - Using defined recovery strategies improves knowledge sharing between teams and implementation of DR on the workloads they own. **Level of risk exposed if this best practice is not established:** High. Without a planned, implemented, and tested DR strategy, you are unlikely to achieve recovery objectives in the event of a disaster. ## Implementation guidance A DR strategy relies on the ability to stand up your workload in a recovery site if your primary location becomes unable to run the workload. The most common recovery objectives are RTO and RPO, as discussed in REL13-BP01 Define recovery objectives for downtime and data loss. A DR strategy across multiple Availability Zones (AZs) within a single AWS Region can provide mitigation against disaster events like fires, floods, and major power outages. If it is a requirement to implement protection against an unlikely event that prevents your workload from being able to run in a given AWS Region, you can use a DR strategy that uses multiple Regions. When architecting a DR strategy across multiple Regions, you should choose one of the following strategies. They are listed in increasing order of cost and complexity, and decreasing order of RTO and RPO. Recovery Region refers to an AWS Region other than the primary one used for your workload. - **Backup and restore (RPO in hours, RTO in 24 hours or less):** Back up your data and applications into the recovery Region. Using automated or continuous backups will permit point-in-time recovery (PITR), which can lower RPO to as low as 5 minutes in some cases. In the event of a disaster, you will deploy your infrastructure (using infrastructure as code to reduce RTO), deploy your code, and restore the backed-up data to recover from a disaster in the recovery Region. - **Pilot light (RPO in minutes, RTO in tens of minutes):** Provision a copy of your core workload infrastructure in the recovery Region. Replicate your data into the recovery Region and create backups of it there. Resources required to support data replication and backup, such as databases and object storage, are always on. Other elements such as application servers or serverless compute are not deployed, but can be created when needed with the necessary configuration and application code. - **Warm standby (RPO in seconds, RTO in minutes):** Maintain a scaled-down but fully functional version of your workload always running in the recovery Region. Business-critical systems are fully duplicated and are always on, but with a scaled down fleet. Data is replicated and live in the recovery Region. When the time comes for recovery, the system is scaled up quickly to handle the production load.
The more scaled-up the warm standby is, the lower the RTO and control plane reliance will be. When fully scaled, this is known as hot standby. - **Multi-Region (multi-site) active-active (RPO near zero, RTO potentially zero):** Your workload is deployed to, and actively serving traffic from, multiple AWS Regions. This strategy requires you to synchronize data across Regions. Possible conflicts caused by writes to the same record in two different regional replicas must be avoided or handled, which can be complex. Data replication is useful for data synchronization and will protect you against some types of disaster, but it will not protect you against data corruption or destruction unless your solution also includes options for point-in-time recovery. ### Implementation steps 1. Determine a DR strategy that will satisfy recovery requirements for this workload. Choosing a DR strategy is a trade-off between reducing downtime and data loss (RTO and RPO) and the cost and complexity of implementing the strategy. You should avoid implementing a strategy that is more stringent than it needs to be, as this incurs unnecessary costs. 2. Review the patterns for how the selected DR strategy can be implemented. This step is to understand how you will implement the selected strategy. The strategies are explained using AWS Regions as the primary and recovery sites. However, you can also choose to use Availability Zones within a single Region as your DR strategy, which makes use of elements of multiple of these strategies. In the following steps, you can apply the strategy to your specific workload. **Backup and restore** Backup and restore is the least complex strategy to implement, but will require more time and effort to restore the workload, leading to higher RTO and RPO. It is a good practice to always make backups of your data, and copy these to another site (such as another AWS Region). **Pilot light** With the pilot light approach, you replicate your data from your primary Region to your recovery Region. Core resources used for the workload infrastructure are deployed in the recovery Region, however additional resources and any dependencies are still needed to make this a functional stack. **Warm standby** The warm standby approach involves ensuring that there is a scaled down, but fully functional, copy of your production environment in another Region. This approach extends the pilot light concept and decreases the time to recovery because your workload is always-on in another Region. If the recovery Region is deployed at full capacity, then this is known as hot standby. Using warm standby or pilot light requires scaling up resources in the recovery Region. To verify capacity is available when needed, consider the use of capacity reservations for EC2 instances. If using AWS Lambda, then provisioned concurrency can provide runtime environments so that they are prepared to respond immediately to your function's invocations. **Multi-site active/active** You can run your workload simultaneously in multiple Regions as part of a multi-site active/active strategy. Multi-site active/active serves traffic from all Regions to which it is deployed. Customers may select this strategy for reasons other than DR. It can be used to increase availability, or when deploying a workload to a global audience (to put the endpoint closer to users and/or to deploy stacks localized to the audience in that Region).
As a DR strategy, if the workload cannot be supported in one of the AWS Regions to which it is deployed, then that Region is evacuated, and the remaining Regions are used to maintain availability. Multi-site active/active is the most operationally complex of the DR strategies, and should only be selected when business requirements necessitate it. **AWS Elastic Disaster Recovery** If you are considering the pilot light or warm standby strategy for disaster recovery, AWS Elastic Disaster Recovery could provide an alternative approach with improved benefits. Elastic Disaster Recovery can offer an RPO and RTO target similar to warm standby, but maintain the low-cost approach of pilot light. Elastic Disaster Recovery replicates your data from your primary Region to your recovery Region, using continual data protection to achieve an RPO measured in seconds and an RTO that can be measured in minutes. Only the resources required to replicate the data are deployed in the recovery Region, which keeps costs down, similar to the pilot light strategy. When using Elastic Disaster Recovery, the service coordinates and orchestrates the recovery of compute resources when initiated as part of failover or drill. **Additional practices for protecting data** With all strategies, you must also mitigate against a data disaster. Continuous data replication protects you against some types of disaster, but it may not protect you against data corruption or destruction unless your strategy also includes versioning of stored data or options for point-in-time recovery. You must also back up the replicated data in the recovery site to create point-in-time backups in addition to the replicas. **Using multiple Availability Zones (AZs) within a single AWS Region** When using multiple AZs within a single Region, your DR implementation uses multiple elements of the above strategies. This architecture makes use of a multi-site active/active approach, as the Amazon EC2 instances and the Elastic Load Balancer have resources deployed in multiple AZs, actively handling requests. The architecture also demonstrates hot standby, where if the primary Amazon RDS instance fails (or the AZ itself fails), then the standby instance is promoted to primary. In addition to this HA architecture, you need to add backups of all data required to run your workload. This is especially important for data that is constrained to a single zone such as Amazon EBS volumes or Amazon Redshift clusters. If an AZ fails, you will need to restore this data to another AZ. Where possible, you should also copy data backups to another AWS Region as an additional layer of protection. 3. Assess the resources of your workload, and what their configuration will be in the recovery Region prior to failover (during normal operation). For infrastructure and AWS resources, use infrastructure as code such as AWS CloudFormation or third-party tools like Hashicorp Terraform. To deploy across multiple accounts and Regions with a single operation you can use AWS CloudFormation StackSets. For Multi-site active/active and Hot Standby strategies, the deployed infrastructure in your recovery Region has the same resources as your primary Region. For Pilot Light and Warm Standby strategies, the deployed infrastructure will require additional actions to become production ready. Using CloudFormation parameters and conditional logic, you can control whether a deployed stack is active or standby with a single template.
When using Elastic Disaster Recovery, the service will replicate and orchestrate the restoration of application configurations and compute resources. All DR strategies require that data sources are backed up within the AWS Region, and then those backups are copied to the recovery Region. AWS Backup provides a centralized view where you can configure, schedule, and monitor backups for these resources. For Pilot Light, Warm Standby, and Multi-site active/active, you should also replicate data from the primary Region to data resources in the recovery Region, such as Amazon Relational Database Service (Amazon RDS) DB instances or Amazon DynamoDB tables. These data resources are therefore live and ready to serve requests in the recovery Region. 4. Determine and implement how you will make your recovery Region ready for failover when needed (during a disaster event). For multi-site active/active, failover means evacuating a Region, and relying on the remaining active Regions. In general, those Regions are ready to accept traffic. For Pilot Light and Warm Standby strategies, your recovery actions will need to deploy the missing resources, such as the EC2 instances in Figure 20, plus any other missing resources. For all of the above strategies you may need to promote read-only instances of databases to become the primary read/write instance. For backup and restore, restoring data from backup creates resources for that data such as EBS volumes, RDS DB instances, and DynamoDB tables. You also need to restore the infrastructure and deploy code. You can use AWS Backup to restore data in the recovery Region. See REL09-BP01 Identify and back up all data that needs to be backed up, or reproduce the data from sources for more details. Rebuilding the infrastructure includes creating resources like EC2 instances in addition to the Amazon Virtual Private Cloud (Amazon VPC), subnets, and security groups needed. You can automate much of the restoration process. 5. Determine and implement how you will reroute traffic to failover when needed (during a disaster event). This failover operation can be initiated either automatically or manually. Automatically initiated failover based on health checks or alarms should be used with caution since an unnecessary failover (false alarm) incurs costs such as non-availability and data loss. Manually initiated failover is therefore often used. In this case, you should still automate the steps for failover, so that the manual initiation is like the push of a button. There are several traffic management options to consider when using AWS services. One option is to use Amazon Route 53. Using Amazon Route 53, you can associate multiple IP endpoints in one or more AWS Regions with a Route 53 domain name. To implement manually initiated failover you can use Amazon Application Recovery Controller, which provides a highly available data plane API to reroute traffic to the recovery Region. When implementing failover, use data plane operations and avoid control plane ones as described in REL11-BP04 Rely on the data plane and not the control plane during recovery. To learn more about this and other options see this section of the Disaster Recovery Whitepaper. 6. Design a plan for how your workload will fail back. Failback is when you return workload operation to the primary Region, after a disaster event has abated. 
6. Design a plan for how your workload will fail back. Failback is when you return workload operation to the primary Region after a disaster event has abated. Provisioning infrastructure and code to the primary Region generally follows the same steps as were initially used, relying on infrastructure as code and code deployment pipelines. The challenge with failback is restoring data stores and ensuring their consistency with the recovery Region in operation. In the failed-over state, the databases in the recovery Region are live and have the up-to-date data. The goal then is to re-synchronize from the recovery Region to the primary Region, ensuring it is up-to-date. Some AWS services will do this automatically. If using Amazon DynamoDB global tables, even if the table in the primary Region had become unavailable, when it comes back online DynamoDB resumes propagating any pending writes. If using Amazon Aurora Global Database with managed planned failover, then the Aurora global database's existing replication topology is maintained. Therefore, the former read/write instance in the primary Region will become a replica and receive updates from the recovery Region. In cases where this is not automatic, you will need to re-establish the database in the primary Region as a replica of the database in the recovery Region. In many cases this will involve deleting the old primary database and creating new replicas (see the sketch at the end of this section).

After a failover, if you can continue running in your recovery Region, consider making this the new primary Region. You would still perform all of the above steps to make the former primary Region into a recovery Region. Some organizations do a scheduled rotation, swapping their primary and recovery Regions periodically (for example, every three months). All of the steps required to fail over and fail back should be maintained in a playbook that is available to all members of the team and is periodically reviewed. When using Elastic Disaster Recovery, the service will assist in orchestrating and automating the failback process.

**Level of effort for the Implementation Plan:** High
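For the failback discussion in step 6, the following is a minimal sketch that assumes an Aurora global database and the RDS `failover_global_cluster` operation; the Regions and cluster identifiers are placeholders, and depending on your Aurora version a planned failback may instead be exposed as a switchover operation.

```python
# Hypothetical sketch: initiate a managed failover of an Amazon Aurora global database so the
# former primary Region resumes the writer role while the replication topology is preserved.
import boto3

rds = boto3.client("rds", region_name="us-west-2")  # Region currently serving writes (placeholder)

rds.failover_global_cluster(
    GlobalClusterIdentifier="my-global-db",  # placeholder global cluster name
    # Promote the cluster in the former primary Region back to the writer role.
    TargetDbClusterIdentifier="arn:aws:rds:us-east-1:111122223333:cluster:primary-cluster",
)
```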

๐Ÿ’ผ REL13-BP03 Test disaster recovery implementation to validate the implementation

Regularly test failover to your recovery site to verify that it operates properly and that RTO and RPO are met.

**Common anti-patterns:**

- Never exercise failovers in production.

**Benefits of establishing this best practice:** Regularly testing your disaster recovery plan verifies that it will work when it needs to, and that your team knows how to perform the strategy.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

A pattern to avoid is developing recovery paths that are rarely exercised. For example, you might have a secondary data store that is used for read-only queries. When you write to a data store and the primary fails, you might want to fail over to the secondary data store. If you don't frequently test this failover, you might find that your assumptions about the capabilities of the secondary data store are incorrect. The capacity of the secondary, which might have been sufficient when you last tested, might no longer be able to tolerate the load under this scenario. Our experience has shown that the only error recovery that works is the path you test frequently. This is why having a small number of recovery paths is best. You can establish recovery patterns and regularly test them. If you have a complex or critical recovery path, you still need to regularly exercise that failure in production to convince yourself that the recovery path works. In the example we just discussed, you should fail over to the standby regularly, regardless of need.

### Implementation steps

1. Engineer your workloads for recovery. Regularly test your recovery paths. Recovery-oriented computing identifies the characteristics in systems that enhance recovery: isolation and redundancy, system-wide ability to roll back changes, ability to monitor and determine health, ability to provide diagnostics, automated recovery, modular design, and ability to restart. Exercise the recovery path to verify that you can accomplish the recovery in the specified time to the specified state. Use your runbooks during this recovery to document problems and find solutions for them before the next test.
2. For Amazon EC2-based workloads, use AWS Elastic Disaster Recovery to implement and launch drill instances for your DR strategy. AWS Elastic Disaster Recovery provides the ability to efficiently run drills, which helps you prepare for a failover event. You can also frequently launch your instances using Elastic Disaster Recovery for test and drill purposes without redirecting the traffic, as shown in the sketch below.
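A minimal sketch of launching such a drill with the boto3 `drs` client is shown below; the Region and the source server ID are placeholders for values from your own Elastic Disaster Recovery configuration.

```python
# Hypothetical sketch: launch DR drill instances with AWS Elastic Disaster Recovery.
# List your replicated servers with drs.describe_source_servers() to find real IDs.
import boto3

drs = boto3.client("drs", region_name="us-west-2")  # recovery Region (placeholder)

# isDrill=True launches drill instances without redirecting production traffic.
response = drs.start_recovery(
    isDrill=True,
    sourceServers=[{"sourceServerID": "s-0123456789abcdef0"}],  # placeholder server ID
)
print("Started drill job:", response["job"]["jobID"])
```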

๐Ÿ’ผ REL13-BP04 Manage configuration drift at the DR site or Region

To perform a successful disaster recovery (DR) procedure, your workload must be able to resume normal operations in a timely manner with no relevant loss of functionality or data once the DR environment has been brought online. To achieve this goal, it's essential to maintain consistent infrastructure, data, and configurations between your DR environment and the primary environment.

**Desired outcome:** Your disaster recovery site's configuration and data are in parity with the primary site, which facilitates rapid and complete recovery when needed.

**Common anti-patterns:**

- You fail to update recovery locations when changes are made to the primary locations, which results in outdated configurations that could hinder recovery efforts.
- You do not consider potential limitations such as service differences between primary and recovery locations, which can lead to unexpected failures during failover.
- You rely on manual processes to update and synchronize the DR environment, which increases the risk of human error and inconsistency.
- You fail to detect configuration drift, which leads to a false sense of DR site readiness prior to an incident.

**Benefits of establishing this best practice:** Consistency between the DR environment and the primary environment significantly improves the likelihood of a successful recovery after an incident and reduces the risk of a failed recovery procedure.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

A comprehensive approach to configuration management and failover readiness can help you verify that the DR site is consistently updated and prepared to take over in the event of a primary site failure. To achieve consistency between your primary and disaster recovery (DR) environments, validate that your delivery pipelines distribute applications to both your primary and DR sites. Roll out changes to the DR sites after an appropriate evaluation period (also known as staggered deployments) to detect problems at the primary site and halt the deployment before they spread. Implement monitoring to detect configuration drift, and track changes and compliance across your environments. Perform automated remediation in the DR site to keep it fully consistent and ready to take over in the event of an incident.

### Implementation steps

- Validate that the DR Region contains the AWS services and features required for successful execution of your DR plan.
- Use infrastructure as code (IaC). Keep your production infrastructure and application configuration templates accurate, and regularly apply them to your disaster recovery environment. AWS CloudFormation can detect drift between what your CloudFormation templates specify and what is actually deployed (see the sketch after this list).
- Configure CI/CD pipelines to deploy applications and infrastructure updates to all environments, including primary and DR sites. CI/CD solutions such as AWS CodePipeline can automate the deployment process, which reduces the risk of configuration drift.
- Stagger deployments between the primary and DR environments. This approach allows updates to be initially deployed and tested in the primary environment, which isolates issues before they are propagated to the DR site. It also prevents defects from being pushed to production and the DR site at the same time, and it maintains the integrity of the DR environment.
- Continually monitor resource configurations in both primary and DR environments.
Solutions such as AWS Config can help to enforce configuration compliance and detect drift, which helps maintain consistent configurations across environments.
- Implement alerting mechanisms to track and notify upon any configuration drift or data replication interruption or lag.
- Automate the remediation of detected configuration drift.
- Schedule regular audits and compliance checks to verify ongoing alignment between primary and DR configurations. Periodic reviews help you maintain compliance with defined rules and identify any discrepancies that need to be addressed.
- Check for mismatches in AWS provisioned capacity, service quotas, and throttle limits, as well as configuration and version discrepancies.
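The following sketch illustrates the CloudFormation drift-detection step mentioned in the list above with boto3; the stack name and Region are assumptions.

```python
# Minimal sketch: detect configuration drift on the DR Region's stack and report
# any resources whose live configuration differs from the template.
import time
import boto3

cfn = boto3.client("cloudformation", region_name="us-west-2")  # DR Region (placeholder)

detection_id = cfn.detect_stack_drift(StackName="dr-workload")["StackDriftDetectionId"]

# Wait for the drift detection run to finish.
while True:
    status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

print("Stack drift status:", status.get("StackDriftStatus"))

# List the individual resources that have drifted.
drifts = cfn.describe_stack_resource_drifts(
    StackName="dr-workload",
    StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
)
for drift in drifts["StackResourceDrifts"]:
    print(drift["LogicalResourceId"], drift["StackResourceDriftStatus"])
```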

๐Ÿ’ผ REL13-BP05 Automate recovery

Implement tested and automated recovery mechanisms that are reliable, observable, and reproducible to reduce the risk and business impact of failure.

**Desired outcome:** You have implemented a well-documented, standardized, and thoroughly tested automation workflow for recovery processes. Your recovery automation automatically corrects minor issues that pose a low risk of data loss or unavailability. You are able to quickly invoke recovery processes for serious incidents, observe the remediation behavior while they operate, and end the processes if you observe dangerous situations or failures.

**Common anti-patterns:**

- You depend on components or mechanisms that are in a failed or degraded state as part of your recovery plan.
- Your recovery processes require manual intervention, such as console access (also known as click ops).
- You automatically initiate recovery procedures in situations that present a high risk of data loss or unavailability.
- You fail to include a mechanism to abort a recovery procedure (like an Andon cord or big red stop button) that is not working or that poses additional risks.

**Benefits of establishing this best practice:**

- Increased reliability, predictability, and consistency of recovery operations.
- Ability to meet more stringent recovery objectives, including Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- Reduced likelihood of recovery failing during an incident.
- Reduced risk of failures associated with manual recovery processes that are prone to human error.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

To implement automated recovery, you need a comprehensive approach that uses AWS services and best practices. To start, identify critical components and potential failure points in your workload. Develop automated processes that can recover your workloads and data from failures without human intervention. Develop your recovery automation using infrastructure as code (IaC) principles. This makes your recovery environment consistent with the source environment and allows for version control of your recovery processes. To orchestrate complex recovery workflows, consider solutions such as AWS Systems Manager Automation or AWS Step Functions.

Automation of recovery processes provides significant benefits and can help you more easily achieve your Recovery Time Objective (RTO) and Recovery Point Objective (RPO). However, automated processes can encounter unexpected situations that may cause them to fail or create new risks of their own, such as additional downtime and data loss. To mitigate this risk, provide the ability to quickly halt a recovery automation in progress. Once halted, you can investigate and take corrective steps.

For supported workloads, consider solutions such as AWS Elastic Disaster Recovery (AWS DRS) to provide automated failover. AWS DRS continually replicates your machines (including operating system, system state configuration, databases, applications, and files) into a staging area in your target AWS account and preferred Region. If an incident occurs, AWS DRS automates the conversion of your replicated servers into fully provisioned workloads in your recovery Region on AWS.

Maintenance and improvement of automated recovery is an ongoing process. Continually test and refine your recovery procedures based on lessons learned, and stay updated on new AWS services and features that can enhance your recovery capabilities.
### Implementation steps

**Plan for automated recovery**

- Conduct a thorough review of your workload architecture, components, and dependencies to identify and plan automated recovery mechanisms. Categorize your workload's dependencies into hard and soft dependencies. Hard dependencies are those that the workload cannot operate without and for which no substitute can be provided. Soft dependencies are those that the workload ordinarily uses but that are replaceable with temporary substitute systems or processes, or that can be handled by graceful degradation.
- Establish processes to identify and recover missing or corrupted data.
- Define steps to confirm a recovered steady state after recovery actions have been completed.
- Consider any actions required to make the recovered system ready for full service, such as pre-warming and populating caches.
- Consider problems that could be encountered during the recovery process and how to detect and remediate them.
- Consider scenarios where the primary site and its control plane are inaccessible. Verify that recovery actions can be performed independently without reliance on the primary site. Consider solutions such as Amazon Application Recovery Controller (ARC) to redirect traffic without the need to manually mutate DNS records.

**Develop automated recovery processes**

- Implement automated fault detection and failover mechanisms for hands-free recovery. Build dashboards, such as with Amazon CloudWatch, to report the progress and health of automated recovery procedures. Include procedures to validate successful recovery. Provide a mechanism to abort a recovery in progress (see the sketch at the end of these steps).
- Build playbooks as a fallback process for faults that cannot be automatically recovered from, and take your disaster recovery plan into consideration.
- Test recovery processes as discussed in REL13-BP03.

**Prepare for recovery**

- Evaluate the state of your recovery site and deploy critical components to it in advance. For more detail, see REL13-BP04.
- Define clear roles, responsibilities, and decision-making processes for recovery operations, involving relevant stakeholders and teams across the organization.
- Define the conditions to initiate your recovery processes.
- Create a plan to revert the recovery process and fall back to your primary site if required or after it's considered safe.
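As a minimal sketch of the start-and-abort mechanism described above, the following assumes a hypothetical Systems Manager Automation runbook named `MyRecoveryRunbook` with illustrative parameters and Region.

```python
# Hypothetical sketch: start a Systems Manager Automation recovery runbook, observe its
# status, and provide a "big red button" to cancel it if it behaves dangerously.
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")  # recovery Region (placeholder)

# Start the recovery automation. Document name and parameters are placeholders.
execution_id = ssm.start_automation_execution(
    DocumentName="MyRecoveryRunbook",
    Parameters={"WorkloadName": ["payments"], "TargetRegion": ["us-west-2"]},
)["AutomationExecutionId"]

# Observe progress (for example, surface this on a CloudWatch dashboard or in a runbook).
status = ssm.get_automation_execution(AutomationExecutionId=execution_id)
print("Automation status:", status["AutomationExecution"]["AutomationExecutionStatus"])

def abort_recovery(automation_execution_id: str) -> None:
    """Cancel the in-progress recovery automation instead of letting it run to completion."""
    ssm.stop_automation_execution(
        AutomationExecutionId=automation_execution_id,
        Type="Cancel",  # stop outstanding steps rather than waiting for them to finish
    )
```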

๐Ÿ’ผ Resource Reliability

Policies for identifying resources that are not being upgraded to the latest versions, or that function inconsistently or incorrectly.

๐Ÿ’ผ Resource Right-Sizing

Policies for identifying resources that can be reconfigured to align resource capacity with actual workload demands, avoiding over- or under-provisioning.

๐Ÿ’ผ Responding to events

You should anticipate operational events, both planned (for example, sales promotions, deployments, and failure tests) and unplanned (for example, surges in utilization and component failures). You should use your existing runbooks and playbooks to deliver consistent results when you respond to alerts. Defined alerts should be owned by a role or a team that is accountable for the response and escalations. You will also want to know the business impact of your system components and use this to target efforts when needed. You should perform a root cause analysis (RCA) after events, and then prevent recurrence of failures or document workarounds.

๐Ÿ’ผ Risk Assessment (ID.RA)

The organization understands the cybersecurity risk to organizational operations (including mission, functions, image, or reputation), organizational assets, and individuals.

๐Ÿ’ผ Risk Management Strategy (GV.RM)

The organization's priorities, constraints, risk tolerance and appetite statements, and assumptions are established, communicated, and used to support operational risk decisions.

๐Ÿ’ผ RS.AN-03: Analysis is performed to establish what has taken place during an incident and the root cause of the incident

1. Determine the sequence of events that occurred during the incident and which assets and resources were involved in each event 2. Attempt to determine what vulnerabilities, threats, and threat actors were directly or indirectly involved in the incident 3. Analyze the incident to find the underlying, systemic root causes 4. Check any cyber deception technology for additional information on attacker behavior

๐Ÿ’ผ RS.AN-06: Actions performed during an investigation are recorded, and the records' integrity and provenance are preserved

1. Require each incident responder and others (e.g., system administrators, cybersecurity engineers) who perform incident response tasks to record their actions and make the record immutable 2. Require the incident lead to document the incident in detail and be responsible for preserving the integrity of the documentation and the sources of all information being reported

๐Ÿ’ผ RS.CO-02: Internal and external stakeholders are notified of incidents

1. Follow the organization's breach notification procedures after discovering a data breach incident, including notifying affected customers 2. Notify business partners and customers of incidents in accordance with contractual requirements 3. Notify law enforcement agencies and regulatory bodies of incidents based on criteria in the incident response plan and management approval

๐Ÿ’ผ RS.CO-03: Information is shared with designated internal and external stakeholders

1. Securely share information consistent with response plans and information sharing agreements 2. Voluntarily share information about an attacker's observed TTPs, with all sensitive data removed, with an Information Sharing and Analysis Center (ISAC) 3. Notify HR when malicious insider activity occurs 4. Regularly update senior leadership on the status of major incidents 5. Follow the rules and protocols defined in contracts for incident information sharing between the organization and its suppliers 6. Coordinate crisis communication methods between the organization and its critical suppliers

๐Ÿ’ผ RS.MA-03: Incidents are categorized and prioritized

1. Further review and categorize incidents based on the type of incident (e.g., data breach, ransomware, DDoS, account compromise) 2. Prioritize incidents based on their scope, likely impact, and time-critical nature 3. Select incident response strategies for active incidents by balancing the need to quickly recover from an incident with the need to observe the attacker or conduct a more thorough investigation

๐Ÿ’ผ RS.MI-01: Incidents are contained

1. Cybersecurity technologies (e.g., antivirus software) and cybersecurity features of other technologies (e.g., operating systems, network infrastructure devices) automatically perform containment actions 2. Allow incident responders to manually select and perform containment actions 3. Allow a third party (e.g., internet service provider, managed security service provider) to perform containment actions on behalf of the organization 4. Automatically transfer compromised endpoints to a remediation virtual local area network (VLAN)

๐Ÿ’ผ RS.MI-02: Incidents are eradicated

1. Cybersecurity technologies and cybersecurity features of other technologies (e.g., operating systems, network infrastructure devices) automatically perform eradication actions 2. Allow incident responders to manually select and perform eradication actions 3. Allow a third party (e.g., managed security service provider) to perform eradication actions on behalf of the organization

๐Ÿ’ผ SA-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] system and services acquisition policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and services acquisition policy and the associated system and services acquisition controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and services acquisition policy and procedures; and c. Review and update the current system and services acquisition: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ SA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and services acquisition policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and services acquisition policy and the associated system and services acquisition controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and services acquisition policy and procedures; and c. Review and update the current system and services acquisition: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and services acquisition policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and services acquisition policy and the associated system and services acquisition controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and services acquisition policy and procedures; and c. Review and update the current system and services acquisition: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SA-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and services acquisition policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and services acquisition policy and the associated system and services acquisition controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and services acquisition policy and procedures; and c. Review and update the current system and services acquisition: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SA-1 SYSTEM AND SERVICES ACQUISITION POLICY AND PROCEDURES

The organization: SA-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: SA-1a.1. A system and services acquisition policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and SA-1a.2. Procedures to facilitate the implementation of the system and services acquisition policy and associated system and services acquisition controls; and SA-1b. Reviews and updates the current: SA-1b.1. System and services acquisition policy [Assignment: organization-defined frequency]; and SA-1b.2. System and services acquisition procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ SA-10 (4) TRUSTED GENERATION

The organization requires the developer of the information system, system component, or information system service to employ tools for comparing newly generated versions of security-relevant hardware descriptions and software/firmware source and object code with previous versions.

๐Ÿ’ผ SA-10 (5) MAPPING INTEGRITY FOR VERSION CONTROL

The organization requires the developer of the information system, system component, or information system service to maintain the integrity of the mapping between the master build data (hardware drawings and software/firmware code) describing the current version of security-relevant hardware, software, and firmware and the on-site master copy of the data for the current version.

๐Ÿ’ผ SA-10 (6) TRUSTED DISTRIBUTION

The organization requires the developer of the information system, system component, or information system service to execute procedures for ensuring that security-relevant hardware, software, and firmware updates distributed to the organization are exactly as specified by the master copies.

๐Ÿ’ผ SA-10 Developer Configuration Management

Require the developer of the system, system component, or system service to: a. Perform configuration management during system, component, or service [Selection (one or more): design; development; implementation; operation; disposal]; b. Document, manage, and control the integrity of changes to [Assignment: organization-defined configuration items under configuration management]; c. Implement only organization-approved changes to the system, component, or service; d. Document approved changes to the system, component, or service and the potential security and privacy impacts of such changes; and e. Track security flaws and flaw resolution within the system, component, or service and report findings to [Assignment: organization-defined personnel].

๐Ÿ’ผ SA-10 DEVELOPER CONFIGURATION MANAGEMENT

The organization requires the developer of the information system, system component, or information system service to: SA-10a. Perform configuration management during system, component, or service [Selection (one or more): design; development; implementation; operation]; SA-10b. Document, manage, and control the integrity of changes to [Assignment: organization-defined configuration items under configuration management]; SA-10c. Implement only organization-approved changes to the system, component, or service; SA-10d. Document approved changes to the system, component, or service and the potential security impacts of such changes; and SA-10e. Track security flaws and flaw resolution within the system, component, or service and report findings to [Assignment: organization-defined personnel].

๐Ÿ’ผ SA-10 Developer Configuration Management (M)(H)

Require the developer of the system, system component, or system service to: a. Perform configuration management during system, component, or service [FedRAMP Assignment: development, implementation, AND operation]; b. Document, manage, and control the integrity of changes to [Assignment: organization-defined configuration items under configuration management]; c. Implement only organization-approved changes to the system, component, or service; d. Document approved changes to the system, component, or service and the potential security and privacy impacts of such changes; and e. Track security flaws and flaw resolution within the system, component, or service and report findings to [Assignment: organization-defined personnel]. **SA-10 Additional FedRAMP Requirements and Guidance:** **(e) Requirement**: track security flaws and flaw resolution within the system, component, or service and report findings to organization-defined personnel, to include FedRAMP.

๐Ÿ’ผ SA-10 Developer Configuration Management (M)(H)

Require the developer of the system, system component, or system service to: a. Perform configuration management during system, component, or service [FedRAMP Assignment: development, implementation, AND operation]; b. Document, manage, and control the integrity of changes to [Assignment: organization-defined configuration items under configuration management]; c. Implement only organization-approved changes to the system, component, or service; d. Document approved changes to the system, component, or service and the potential security and privacy impacts of such changes; and e. Track security flaws and flaw resolution within the system, component, or service and report findings to [Assignment: organization-defined personnel]. **SA-10 Additional FedRAMP Requirements and Guidance:** **(e) Requirement**: track security flaws and flaw resolution within the system, component, or service and report findings to organization-defined personnel, to include FedRAMP.

๐Ÿ’ผ SA-11 (1) STATIC CODE ANALYSIS

The organization requires the developer of the information system, system component, or information system service to employ static code analysis tools to identify common flaws and document the results of the analysis.

๐Ÿ’ผ SA-11 (2) THREAT AND VULNERABILITY ANALYSES

The organization requires the developer of the information system, system component, or information system service to perform threat and vulnerability analyses and subsequent testing/evaluation of the as-built system, component, or service.

๐Ÿ’ผ SA-11 (3) INDEPENDENT VERIFICATION OF ASSESSMENT PLANS | EVIDENCE

The organization: SA-11 (3)(a) Requires an independent agent satisfying [Assignment: organization-defined independence criteria] to verify the correct implementation of the developer security assessment plan and the evidence produced during security testing/evaluation; and SA-11 (3)(b) Ensures that the independent agent is either provided with sufficient information to complete the verification process or granted the authority to obtain such information.

๐Ÿ’ผ SA-11 (4) MANUAL CODE REVIEWS

The organization requires the developer of the information system, system component, or information system service to perform a manual code review of [Assignment: organization-defined specific code] using [Assignment: organization-defined processes, procedures, and/or techniques].

๐Ÿ’ผ SA-11 (5) PENETRATION TESTING

The organization requires the developer of the information system, system component, or information system service to perform penetration testing at [Assignment: organization-defined breadth/depth] and with [Assignment: organization-defined constraints].

๐Ÿ’ผ SA-11 (7) VERIFY SCOPE OF TESTING | EVALUATION

The organization requires the developer of the information system, system component, or information system service to verify that the scope of security testing/evaluation provides complete coverage of required security controls at [Assignment: organization-defined depth of testing/evaluation].

๐Ÿ’ผ SA-11 (8) DYNAMIC CODE ANALYSIS

The organization requires the developer of the information system, system component, or information system service to employ dynamic code analysis tools to identify common flaws and document the results of the analysis.

๐Ÿ’ผ SA-11 DEVELOPER SECURITY TESTING AND EVALUATION

The organization requires the developer of the information system, system component, or information system service to: SA-11a. Create and implement a security assessment plan; SA-11b. Perform [Selection (one or more): unit; integration; system; regression] testing/evaluation at [Assignment: organization-defined depth and coverage]; SA-11c. Produce evidence of the execution of the security assessment plan and the results of the security testing/evaluation; SA-11d. Implement a verifiable flaw remediation process; and SA-11e. Correct flaws identified during security testing/evaluation.

๐Ÿ’ผ SA-11 Developer Testing and Evaluation

Require the developer of the system, system component, or system service, at all post-design stages of the system development life cycle, to: a. Develop and implement a plan for ongoing security and privacy control assessments; b. Perform [Selection (one or more): unit; integration; system; regression] testing/evaluation [Assignment: organization-defined frequency] at [Assignment: organization-defined depth and coverage]; c. Produce evidence of the execution of the assessment plan and the results of the testing and evaluation; d. Implement a verifiable flaw remediation process; and e. Correct flaws identified during testing and evaluation.

๐Ÿ’ผ SA-11 Developer Testing and Evaluation (M)(H)

Require the developer of the system, system component, or system service, at all post-design stages of the system development life cycle, to: a. Develop and implement a plan for ongoing security and privacy assessments; b. Perform [Selection (one-or-more): unit; integration; system; regression] testing/evaluation [Assignment: organization-defined frequency] at [Assignment: organization-defined depth and coverage]; c. Produce evidence of the execution of the assessment plan and the results of the testing and evaluation; d. Implement a verifiable flaw remediation process; and e. Correct flaws identified during testing and evaluation.

๐Ÿ’ผ SA-11 Developer Testing and Evaluation (M)(H)

Require the developer of the system, system component, or system service, at all post-design stages of the system development life cycle, to: a. Develop and implement a plan for ongoing security and privacy assessments; b. Perform [Selection (one-or-more): unit; integration; system; regression] testing/evaluation [Assignment: organization-defined frequency] at [Assignment: organization-defined depth and coverage]; c. Produce evidence of the execution of the assessment plan and the results of the testing and evaluation; d. Implement a verifiable flaw remediation process; and e. Correct flaws identified during testing and evaluation.

๐Ÿ’ผ SA-11(1) Static Code Analysis (M)(H)

Require the developer of the system, system component, or system service to employ static code analysis tools to identify common flaws and document the results of the analysis. **SA-11(1) Additional FedRAMP Requirements:** **Requirement**: The service provider must document its methodology for reviewing newly developed code for the Service in its Continuous Monitoring Plan. If Static code analysis cannot be performed (for example, when the source code is not available), then dynamic code analysis must be performed (see SA-11 (8)).

๐Ÿ’ผ SA-11(1) Static Code Analysis (M)(H)

Require the developer of the system, system component, or system service to employ static code analysis tools to identify common flaws and document the results of the analysis. **SA-11(1) Additional FedRAMP Requirements:** **Requirement**: The service provider must document its methodology for reviewing newly developed code for the Service in its Continuous Monitoring Plan. If Static code analysis cannot be performed (for example, when the source code is not available), then dynamic code analysis must be performed (see SA-11 (8)).

๐Ÿ’ผ SA-11(2) Developer Testing and Evaluation | Threat Modeling and Vulnerability Analyses

Require the developer of the system, system component, or system service to perform threat modeling and vulnerability analyses during development and the subsequent testing and evaluation of the system, component, or service that: (a) Uses the following contextual information: [Assignment: organization-defined information concerning impact, environment of operations, known or assumed threats, and acceptable risk levels]; (b) Employs the following tools and methods: [Assignment: organization-defined tools and methods]; (c) Conducts the modeling and analyses at the following level of rigor: [Assignment: organization-defined breadth and depth of modeling and analyses]; and (d) Produces evidence that meets the following acceptance criteria: [Assignment: organization-defined acceptance criteria].

๐Ÿ’ผ SA-11(2) Threat Modeling and Vulnerability Analyses (M)(H)

Require the developer of the system, system component, or system service to perform threat modeling and vulnerability analyses during development and the subsequent testing and evaluation of the system, component, or service that: (a) Uses the following contextual information: [Assignment: organization-defined information concerning impact, environment of operations, known or assumed threats, and acceptable risk levels]; (b) Employs the following tools and methods: [Assignment: organization-defined tools and methods]; (c) Conducts the modeling and analyses at the following level of rigor: [Assignment: organization-defined breadth and depth of modeling and analyses]; and (d) Produces evidence that meets the following acceptance criteria: [Assignment: organization-defined acceptance criteria].

๐Ÿ’ผ SA-11(2) Threat Modeling and Vulnerability Analyses (M)(H)

Require the developer of the system, system component, or system service to perform threat modeling and vulnerability analyses during development and the subsequent testing and evaluation of the system, component, or service that: (a) Uses the following contextual information: [Assignment: organization-defined information concerning impact, environment of operations, known or assumed threats, and acceptable risk levels]; (b) Employs the following tools and methods: [Assignment: organization-defined tools and methods]; (c) Conducts the modeling and analyses at the following level of rigor: [Assignment: organization-defined breadth and depth of modeling and analyses]; and (d) Produces evidence that meets the following acceptance criteria: [Assignment: organization-defined acceptance criteria].

๐Ÿ’ผ SA-11(3) Developer Testing and Evaluation | Independent Verification of Assessment Plans and Evidence

(a) Require an independent agent satisfying [Assignment: organization-defined independence criteria] to verify the correct implementation of the developer security and privacy assessment plans and the evidence produced during testing and evaluation; and (b) Verify that the independent agent is provided with sufficient information to complete the verification process or granted the authority to obtain such information.

๐Ÿ’ผ SA-12 (11) PENETRATION TESTING | ANALYSIS OF ELEMENTS, PROCESSES, AND ACTORS

The organization employs [Selection (one or more): organizational analysis, independent third-party analysis, organizational penetration testing, independent third-party penetration testing] of [Assignment: organization-defined supply chain elements, processes, and actors] associated with the information system, system component, or information system service.

๐Ÿ’ผ SA-12 (14) IDENTITY AND TRACEABILITY

The organization establishes and retains unique identification of [Assignment: organization-defined supply chain elements, processes, and actors] for the information system, system component, or information system service.

๐Ÿ’ผ SA-12 (2) SUPPLIER REVIEWS

The organization conducts a supplier review prior to entering into a contractual agreement to acquire the information system, system component, or information system service.

๐Ÿ’ผ SA-12 (5) LIMITATION OF HARM

The organization employs [Assignment: organization-defined security safeguards] to limit harm from potential adversaries identifying and targeting the organizational supply chain.

๐Ÿ’ผ SA-12 (9) OPERATIONS SECURITY

The organization employs [Assignment: organization-defined Operations Security (OPSEC) safeguards] in accordance with classification guides to protect supply chain-related information for the information system, system component, or information system service.

๐Ÿ’ผ SA-12 SUPPLY CHAIN PROTECTION

The organization protects against supply chain threats to the information system, system component, or information system service by employing [Assignment: organization-defined security safeguards] as part of a comprehensive, defense-in-breadth information security strategy.

๐Ÿ’ผ SA-13 TRUSTWORTHINESS

The organization: SA-13a. Describes the trustworthiness required in the [Assignment: organization-defined information system, information system component, or information system service] supporting its critical missions/business functions; and SA-13b. Implements [Assignment: organization-defined assurance overlay] to achieve such trustworthiness.

๐Ÿ’ผ SA-14 CRITICALITY ANALYSIS

The organization identifies critical information system components and functions by performing a criticality analysis for [Assignment: organization-defined information systems, information system components, or information system services] at [Assignment: organization-defined decision points in the system development life cycle].

๐Ÿ’ผ SA-15 (1) QUALITY METRICS

The organization requires the developer of the information system, system component, or information system service to: SA-15 (1)(a) Define quality metrics at the beginning of the development process; and SA-15 (1)(b) Provide evidence of meeting the quality metrics [Selection (one or more): [Assignment: organization-defined frequency]; [Assignment: organization-defined program review milestones]; upon delivery].

๐Ÿ’ผ SA-15 (2) SECURITY TRACKING TOOLS

The organization requires the developer of the information system, system component, or information system service to select and employ a security tracking tool for use during the development process.

๐Ÿ’ผ SA-15 (3) CRITICALITY ANALYSIS

The organization requires the developer of the information system, system component, or information system service to perform a criticality analysis at [Assignment: organization-defined breadth/depth] and at [Assignment: organization-defined decision points in the system development life cycle].

๐Ÿ’ผ SA-15 (4) THREAT MODELING | VULNERABILITY ANALYSIS

The organization requires that developers perform threat modeling and a vulnerability analysis for the information system at [Assignment: organization-defined breadth/depth] that: SA-15 (4)(a) Uses [Assignment: organization-defined information concerning impact, environment of operations, known or assumed threats, and acceptable risk levels]; SA-15 (4)(b) Employs [Assignment: organization-defined tools and methods]; and SA-15 (4)(c) Produces evidence that meets [Assignment: organization-defined acceptance criteria].

๐Ÿ’ผ SA-15 (5) ATTACK SURFACE REDUCTION

The organization requires the developer of the information system, system component, or information system service to reduce attack surfaces to [Assignment: organization-defined thresholds].

๐Ÿ’ผ SA-15 (6) CONTINUOUS IMPROVEMENT

The organization requires the developer of the information system, system component, or information system service to implement an explicit process to continuously improve the development process.

๐Ÿ’ผ SA-15 (7) AUTOMATED VULNERABILITY ANALYSIS

The organization requires the developer of the information system, system component, or information system service to: SA-15 (7)(a) Perform an automated vulnerability analysis using [Assignment: organization-defined tools]; SA-15 (7)(b) Determine the exploitation potential for discovered vulnerabilities; SA-15 (7)(c) Determine potential risk mitigations for delivered vulnerabilities; and SA-15 (7)(d) Deliver the outputs of the tools and results of the analysis to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-15 (9) USE OF LIVE DATA

The organization approves, documents, and controls the use of live data in development and test environments for the information system, system component, or information system service.

๐Ÿ’ผ SA-15 Development Process, Standards, and Tools

a. Require the developer of the system, system component, or system service to follow a documented development process that: 1. Explicitly addresses security and privacy requirements; 2. Identifies the standards and tools used in the development process; 3. Documents the specific tool options and tool configurations used in the development process; and 4. Documents, manages, and ensures the integrity of changes to the process and/or tools used in development; and b. Review the development process, standards, tools, tool options, and tool configurations [Assignment: organization-defined frequency] to determine if the process, standards, tools, tool options and tool configurations selected and employed can satisfy the following security and privacy requirements: [Assignment: organization-defined security and privacy requirements].

๐Ÿ’ผ SA-15 DEVELOPMENT PROCESS, STANDARDS, AND TOOLS

The organization: SA-15a. Requires the developer of the information system, system component, or information system service to follow a documented development process that: SA-15a.1. Explicitly addresses security requirements; SA-15a.2. Identifies the standards and tools used in the development process; SA-15a.3. Documents the specific tool options and tool configurations used in the development process; and SA-15a.4. Documents, manages, and ensures the integrity of changes to the process and/or tools used in development; and SA-15b. Reviews the development process, standards, tools, and tool options/configurations [Assignment: organization-defined frequency] to determine if the process, standards, tools, and tool options/configurations selected and employed can satisfy [Assignment: organization-defined security requirements].

๐Ÿ’ผ SA-15 Development Process, Standards, and Tools (M)(H)

a. Require the developer of the system, system component, or system service to follow a documented development process that: 1. Explicitly addresses security and privacy requirements; 2. Identifies the standards and tools used in the development process; 3. Documents the specific tool options and tool configurations used in the development process; and 4. Documents, manages, and ensures the integrity of changes to the process and/or tools used in development; and b. Review the development process, standards, tools, tool options, and tool configurations [FedRAMP Assignment: frequency at least annually] to determine if the process, standards, tools, tool options and tool configurations selected and employed can satisfy the following security and privacy requirements: [FedRAMP Assignment: FedRAMP Security Authorization requirements].

๐Ÿ’ผ SA-15 Development Process, Standards, and Tools (M)(H)

a. Require the developer of the system, system component, or system service to follow a documented development process that: 1. Explicitly addresses security and privacy requirements; 2. Identifies the standards and tools used in the development process; 3. Documents the specific tool options and tool configurations used in the development process; and 4. Documents, manages, and ensures the integrity of changes to the process and/or tools used in development; and b. Review the development process, standards, tools, tool options, and tool configurations [FedRAMP Assignment: frequency at least annually] to determine if the process, standards, tools, tool options and tool configurations selected and employed can satisfy the following security and privacy requirements: [FedRAMP Assignment: FedRAMP Security Authorization requirements].

๐Ÿ’ผ SA-15(1) Development Process, Standards, and Tools | Quality Metrics

Require the developer of the system, system component, or system service to: (a) Define quality metrics at the beginning of the development process; and (b) Provide evidence of meeting the quality metrics [Selection (one or more): [Assignment: organization-defined frequency]; [Assignment: organization-defined program review milestones]; upon delivery].

๐Ÿ’ผ SA-15(3) Criticality Analysis (M)(H)

Require the developer of the system, system component, or system service to perform a criticality analysis: (a) At the following decision points in the system development life cycle: [Assignment: organization-defined decision points in the system development life cycle]; and (b) At the following level of rigor: [Assignment: organization-defined breadth and depth of criticality analysis].

๐Ÿ’ผ SA-15(3) Criticality Analysis (M)(H)

Require the developer of the system, system component, or system service to perform a criticality analysis: (a) At the following decision points in the system development life cycle: [Assignment: organization-defined decision points in the system development life cycle]; and (b) At the following level of rigor: [Assignment: organization-defined breadth and depth of criticality analysis].

๐Ÿ’ผ SA-15(3) Development Process, Standards, and Tools | Criticality Analysis

Require the developer of the system, system component, or system service to perform a criticality analysis: (a) At the following decision points in the system development life cycle: [Assignment: organization-defined decision points in the system development life cycle]; and (b) At the following level of rigor: [Assignment: organization-defined breadth and depth of criticality analysis].

๐Ÿ’ผ SA-15(7) Development Process, Standards, and Tools | Automated Vulnerability Analysis

Require the developer of the system, system component, or system service [Assignment: organization-defined frequency] to: (a) Perform an automated vulnerability analysis using [Assignment: organization-defined tools]; (b) Determine the exploitation potential for discovered vulnerabilities; (c) Determine potential risk mitigations for delivered vulnerabilities; and (d) Deliver the outputs of the tools and results of the analysis to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-16 Developer-provided Training

Require the developer of the system, system component, or system service to provide the following training on the correct use and operation of the implemented security and privacy functions, controls, and/or mechanisms: [Assignment: organization-defined training].

๐Ÿ’ผ SA-16 DEVELOPER-PROVIDED TRAINING

The organization requires the developer of the information system, system component, or information system service to provide [Assignment: organization-defined training] on the correct use and operation of the implemented security functions, controls, and/or mechanisms.

๐Ÿ’ผ SA-16 Developer-provided Training (H)

Require the developer of the system, system component, or system service to provide the following training on the correct use and operation of the implemented security and privacy functions, controls, and/or mechanisms: [Assignment: organization-defined training].

๐Ÿ’ผ SA-17 (1) FORMAL POLICY MODEL

The organization requires the developer of the information system, system component, or information system service to: SA-17 (1)(a) Produce, as an integral part of the development process, a formal policy model describing the [Assignment: organization-defined elements of organizational security policy] to be enforced; and SA-17 (1)(b) Prove that the formal policy model is internally consistent and sufficient to enforce the defined elements of the organizational security policy when implemented.

๐Ÿ’ผ SA-17 (2) SECURITY-RELEVANT COMPONENTS

The organization requires the developer of the information system, system component, or information system service to: SA-17 (2)(a) Define security-relevant hardware, software, and firmware; and SA-17 (2)(b) Provide a rationale that the definition for security-relevant hardware, software, and firmware is complete.

๐Ÿ’ผ SA-17 (3) FORMAL CORRESPONDENCE

The organization requires the developer of the information system, system component, or information system service to: SA-17 (3)(a) Produce, as an integral part of the development process, a formal top-level specification that specifies the interfaces to security-relevant hardware, software, and firmware in terms of exceptions, error messages, and effects; SA-17 (3)(b) Show via proof to the extent feasible with additional informal demonstration as necessary, that the formal top-level specification is consistent with the formal policy model; SA-17 (3)(c) Show via informal demonstration, that the formal top-level specification completely covers the interfaces to security-relevant hardware, software, and firmware; SA-17 (3)(d) Show that the formal top-level specification is an accurate description of the implemented security-relevant hardware, software, and firmware; and SA-17 (3)(e) Describe the security-relevant hardware, software, and firmware mechanisms not addressed in the formal top-level specification but strictly internal to the security-relevant hardware, software, and firmware.

๐Ÿ’ผ SA-17 (4) INFORMAL CORRESPONDENCE

The organization requires the developer of the information system, system component, or information system service to: SA-17 (4)(a) Produce, as an integral part of the development process, an informal descriptive top-level specification that specifies the interfaces to security-relevant hardware, software, and firmware in terms of exceptions, error messages, and effects; SA-17 (4)(b) Show via [Selection: informal demonstration, convincing argument with formal methods as feasible] that the descriptive top-level specification is consistent with the formal policy model; SA-17 (4)(c) Show via informal demonstration, that the descriptive top-level specification completely covers the interfaces to security-relevant hardware, software, and firmware; SA-17 (4)(d) Show that the descriptive top-level specification is an accurate description of the interfaces to security-relevant hardware, software, and firmware; and SA-17 (4)(e) Describe the security-relevant hardware, software, and firmware mechanisms not addressed in the descriptive top-level specification but strictly internal to the security-relevant hardware, software, and firmware.

๐Ÿ’ผ SA-17 (5) CONCEPTUALLY SIMPLE DESIGN

The organization requires the developer of the information system, system component, or information system service to: SA-17 (5)(a) Design and structure the security-relevant hardware, software, and firmware to use a complete, conceptually simple protection mechanism with precisely defined semantics; and SA-17 (5)(b) Internally structure the security-relevant hardware, software, and firmware with specific regard for this mechanism.

๐Ÿ’ผ SA-17 (6) STRUCTURE FOR TESTING

The organization requires the developer of the information system, system component, or information system service to structure security-relevant hardware, software, and firmware to facilitate testing.

๐Ÿ’ผ SA-17 (7) STRUCTURE FOR LEAST PRIVILEGE

The organization requires the developer of the information system, system component, or information system service to structure security-relevant hardware, software, and firmware to facilitate controlling access with least privilege.

๐Ÿ’ผ SA-17 Developer Security and Privacy Architecture and Design

Require the developer of the system, system component, or system service to produce a design specification and security and privacy architecture that: a. Is consistent with the organization's security and privacy architecture that is an integral part of the organization's enterprise architecture; b. Accurately and completely describes the required security and privacy functionality, and the allocation of controls among physical and logical components; and c. Expresses how individual security and privacy functions, mechanisms, and services work together to provide required security and privacy capabilities and a unified approach to protection.

๐Ÿ’ผ SA-17 Developer Security and Privacy Architecture and Design (H)

Require the developer of the system, system component, or system service to produce a design specification and security and privacy architecture that: a. Is consistent with the organization's security and privacy architecture that is an integral part of the organization's enterprise architecture; b. Accurately and completely describes the required security and privacy functionality, and the allocation of controls among physical and logical components; and c. Expresses how individual security and privacy functions, mechanisms, and services work together to provide required security and privacy capabilities and a unified approach to protection.

๐Ÿ’ผ SA-17 DEVELOPER SECURITY ARCHITECTURE AND DESIGN

The organization requires the developer of the information system, system component, or information system service to produce a design specification and security architecture that: SA-17a. Is consistent with and supportive of the organization's security architecture which is established within and is an integrated part of the organization's enterprise architecture; SA-17b. Accurately and completely describes the required security functionality, and the allocation of security controls among physical and logical components; and SA-17c. Expresses how individual security functions, mechanisms, and services work together to provide required security capabilities and a unified approach to protection.

๐Ÿ’ผ SA-17(1) Developer Security and Privacy Architecture and Design | Formal Policy Model

Require the developer of the system, system component, or system service to: (a) Produce, as an integral part of the development process, a formal policy model describing the [Assignment: organization-defined elements of organizational security and privacy policy] to be enforced; and (b) Prove that the formal policy model is internally consistent and sufficient to enforce the defined elements of the organizational security and privacy policy when implemented.

๐Ÿ’ผ SA-17(3) Developer Security and Privacy Architecture and Design | Formal Correspondence

Require the developer of the system, system component, or system service to: (a) Produce, as an integral part of the development process, a formal top-level specification that specifies the interfaces to security-relevant hardware, software, and firmware in terms of exceptions, error messages, and effects; (b) Show via proof to the extent feasible with additional informal demonstration as necessary, that the formal top-level specification is consistent with the formal policy model; (c) Show via informal demonstration, that the formal top-level specification completely covers the interfaces to security-relevant hardware, software, and firmware; (d) Show that the formal top-level specification is an accurate description of the implemented security-relevant hardware, software, and firmware; and (e) Describe the security-relevant hardware, software, and firmware mechanisms not addressed in the formal top-level specification but strictly internal to the security-relevant hardware, software, and firmware.

๐Ÿ’ผ SA-17(4) Developer Security and Privacy Architecture and Design | Informal Correspondence

Require the developer of the system, system component, or system service to: (a) Produce, as an integral part of the development process, an informal descriptive top-level specification that specifies the interfaces to security-relevant hardware, software, and firmware in terms of exceptions, error messages, and effects; (b) Show via [Selection: informal demonstration; convincing argument with formal methods as feasible] that the descriptive top-level specification is consistent with the formal policy model; (c) Show via informal demonstration, that the descriptive top-level specification completely covers the interfaces to security-relevant hardware, software, and firmware; (d) Show that the descriptive top-level specification is an accurate description of the interfaces to security-relevant hardware, software, and firmware; and (e) Describe the security-relevant hardware, software, and firmware mechanisms not addressed in the descriptive top-level specification but strictly internal to the security-relevant hardware, software, and firmware.

๐Ÿ’ผ SA-18 (1) MULTIPLE PHASES OF SDLC

The organization employs anti-tamper technologies and techniques during multiple phases in the system development life cycle including design, development, integration, operations, and maintenance.

๐Ÿ’ผ SA-19 COMPONENT AUTHENTICITY

The organization: SA-19a. Develops and implements anti-counterfeit policy and procedures that include the means to detect and prevent counterfeit components from entering the information system; and SA-19b. Reports counterfeit information system components to [Selection (one or more): source of counterfeit component; [Assignment: organization-defined external reporting organizations]; [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ SA-2 Allocation of Resources

a. Determine the high-level information security and privacy requirements for the system or system service in mission and business process planning; b. Determine, document, and allocate the resources required to protect the system or system service as part of the organizational capital planning and investment control process; and c. Establish a discrete line item for information security and privacy in organizational programming and budgeting documentation.

๐Ÿ’ผ SA-2 ALLOCATION OF RESOURCES

The organization: SA-2a. Determines information security requirements for the information system or information system service in mission/business process planning; SA-2b. Determines, documents, and allocates the resources required to protect the information system or information system service as part of its capital planning and investment control process; and SA-2c. Establishes a discrete line item for information security in organizational programming and budgeting documentation.

๐Ÿ’ผ SA-2 Allocation of Resources (L)(M)(H)

a. Determine the high-level information security and privacy requirements for the system or system service in mission and business process planning; b. Determine, document, and allocate the resources required to protect the system or system service as part of the organizational capital planning and investment control process; and c. Establish a discrete line item for information security and privacy in organizational programming and budgeting documentation.

๐Ÿ’ผ SA-2 Allocation of Resources (L)(M)(H)

a. Determine the high-level information security and privacy requirements for the system or system service in mission and business process planning; b. Determine, document, and allocate the resources required to protect the system or system service as part of the organizational capital planning and investment control process; and c. Establish a discrete line item for information security and privacy in organizational programming and budgeting documentation.

๐Ÿ’ผ SA-2 Allocation of Resources (L)(M)(H)

a. Determine the high-level information security and privacy requirements for the system or system service in mission and business process planning; b. Determine, document, and allocate the resources required to protect the system or system service as part of the organizational capital planning and investment control process; and c. Establish a discrete line item for information security and privacy in organizational programming and budgeting documentation.

๐Ÿ’ผ SA-21 (1) VALIDATION OF SCREENING

The organization requires the developer of the information system, system component, or information system service take [Assignment: organization-defined actions] to ensure that the required access authorizations and screening criteria are satisfied.

๐Ÿ’ผ SA-21 Developer Screening

Require that the developer of [Assignment: organization-defined system, system component, or system service]: a. Has appropriate access authorizations as determined by assigned [Assignment: organization-defined official government duties]; and b. Satisfies the following additional personnel screening criteria: [Assignment: organization-defined additional personnel screening criteria].

๐Ÿ’ผ SA-21 DEVELOPER SCREENING

The organization requires that the developer of [Assignment: organization-defined information system, system component, or information system service]: SA-21a. Have appropriate access authorizations as determined by assigned [Assignment: organization-defined official government duties]; and SA-21b. Satisfy [Assignment: organization-defined additional personnel screening criteria].

๐Ÿ’ผ SA-21 Developer Screening (H)

Require that the developer of [Assignment: organization-defined system, system component, or system service]: a. Has appropriate access authorizations as determined by assigned [Assignment: organization-defined official government duties]; and b. Satisfies the following additional personnel screening criteria: [Assignment: organization-defined additional personnel screening criteria].

๐Ÿ’ผ SA-22 Unsupported System Components

a. Replace system components when support for the components is no longer available from the developer, vendor, or manufacturer; or b. Provide the following options for alternative sources for continued support for unsupported components [Selection (one or more): in-house support; [Assignment: organization-defined support from external providers]].

๐Ÿ’ผ SA-22 UNSUPPORTED SYSTEM COMPONENTS

The organization: SA-22a. Replaces information system components when support for the components is no longer available from the developer, vendor, or manufacturer; and SA-22b. Provides justification and documents approval for the continued use of unsupported system components required to satisfy mission/business needs.

๐Ÿ’ผ SA-22 Unsupported System Components (L)(M)(H)

a. Replace system components when support for the components is no longer available from the developer, vendor, or manufacturer; or b. Provide the following options for alternative sources for continued support for unsupported components [Selection (one-or-more): in-house support; [Assignment: organization-defined support from external providers]].

๐Ÿ’ผ SA-22 Unsupported System Components (L)(M)(H)

a. Replace system components when support for the components is no longer available from the developer, vendor, or manufacturer; or b. Provide the following options for alternative sources for continued support for unsupported components [Selection (one-or-more): in-house support; [Assignment: organization-defined support from external providers]].

๐Ÿ’ผ SA-22 Unsupported System Components (L)(M)(H)

a. Replace system components when support for the components is no longer available from the developer, vendor, or manufacturer; or b. Provide the following options for alternative sources for continued support for unsupported components [Selection (one-or-more): in-house support; [Assignment: organization-defined support from external providers]].

๐Ÿ’ผ SA-23 Specialization

Employ [Selection (one or more): design; modification; augmentation; reconfiguration] on [Assignment: organization-defined systems or system components] supporting mission essential services or functions to increase the trustworthiness in those systems or components.

๐Ÿ’ผ SA-3 System Development Life Cycle

a. Acquire, develop, and manage the system using [Assignment: organization-defined system development life cycle] that incorporates information security and privacy considerations; b. Define and document information security and privacy roles and responsibilities throughout the system development life cycle; c. Identify individuals having information security and privacy roles and responsibilities; and d. Integrate the organizational information security and privacy risk management process into system development life cycle activities.

๐Ÿ’ผ SA-3 SYSTEM DEVELOPMENT LIFE CYCLE

The organization: SA-3a. Manages the information system using [Assignment: organization-defined system development life cycle] that incorporates information security considerations; SA-3b. Defines and documents information security roles and responsibilities throughout the system development life cycle; SA-3c. Identifies individuals having information security roles and responsibilities; and SA-3d. Integrates the organizational information security risk management process into system development life cycle activities.

๐Ÿ’ผ SA-3 System Development Life Cycle (L)(M)(H)

a. Acquire, develop, and manage the system using [Assignment: organization-defined system development life cycle] that incorporates information security and privacy considerations; b. Define and document information security and privacy roles and responsibilities throughout the system development life cycle; c. Identify individuals having information security and privacy roles and responsibilities; and d. Integrate the organizational information security and privacy risk management process into system development life cycle activities.

๐Ÿ’ผ SA-3 System Development Life Cycle (L)(M)(H)

a. Acquire, develop, and manage the system using [Assignment: organization-defined system development life cycle] that incorporates information security and privacy considerations; b. Define and document information security and privacy roles and responsibilities throughout the system development life cycle; c. Identify individuals having information security and privacy roles and responsibilities; and d. Integrate the organizational information security and privacy risk management process into system development life cycle activities.

๐Ÿ’ผ SA-3 System Development Life Cycle (L)(M)(H)

a. Acquire, develop, and manage the system using [Assignment: organization-defined system development life cycle] that incorporates information security and privacy considerations; b. Define and document information security and privacy roles and responsibilities throughout the system development life cycle; c. Identify individuals having information security and privacy roles and responsibilities; and d. Integrate the organizational information security and privacy risk management process into system development life cycle activities.

๐Ÿ’ผ SA-3(2) System Development Life Cycle | Use of Live or Operational Data

(a) Approve, document, and control the use of live data in preproduction environments for the system, system component, or system service; and (b) Protect preproduction environments for the system, system component, or system service at the same impact or classification level as any live data in use within the preproduction environments.

๐Ÿ’ผ SA-4 (10) USE OF APPROVED PIV PRODUCTS

The organization employs only information technology products on the FIPS 201-approved products list for Personal Identity Verification (PIV) capability implemented within organizational information systems.

๐Ÿ’ผ SA-4 (2) DESIGN | IMPLEMENTATION INFORMATION FOR SECURITY CONTROLS

The organization requires the developer of the information system, system component, or information system service to provide design and implementation information for the security controls to be employed that includes: [Selection (one or more): security-relevant external system interfaces; high-level design; low-level design; source code or hardware schematics; [Assignment: organization-defined design/implementation information]] at [Assignment: organization-defined level of detail].

๐Ÿ’ผ SA-4 (3) DEVELOPMENT METHODS | TECHNIQUES | PRACTICES

The organization requires the developer of the information system, system component, or information system service to demonstrate the use of a system development life cycle that includes [Assignment: organization-defined state-of-the-practice system/security engineering methods, software development methods, testing/evaluation/validation techniques, and quality control processes].

๐Ÿ’ผ SA-4 (5) SYSTEM | COMPONENT | SERVICE CONFIGURATIONS

The organization requires the developer of the information system, system component, or information system service to: SA-4 (5)(a) Deliver the system, component, or service with [Assignment: organization-defined security configurations] implemented; and SA-4 (5)(b) Use the configurations as the default for any subsequent system, component, or service reinstallation or upgrade.

๐Ÿ’ผ SA-4 (6) USE OF INFORMATION ASSURANCE PRODUCTS

The organization: SA-4 (6)(a) Employs only government off-the-shelf (GOTS) or commercial off-the-shelf (COTS) information assurance (IA) and IA-enabled information technology products that compose an NSA-approved solution to protect classified information when the networks used to transmit the information are at a lower classification level than the information being transmitted; and SA-4 (6)(b) Ensures that these products have been evaluated and/or validated by NSA or in accordance with NSA-approved procedures.

๐Ÿ’ผ SA-4 (7) NIAP-APPROVED PROTECTION PROFILES

The organization: SA-4 (7)(a) Limits the use of commercially provided information assurance (IA) and IA-enabled information technology products to those products that have been successfully evaluated against a National Information Assurance partnership (NIAP)-approved Protection Profile for a specific technology type, if such a profile exists; and SA-4 (7)(b) Requires, if no NIAP-approved Protection Profile exists for a specific technology type but a commercially provided information technology product relies on cryptographic functionality to enforce its security policy, that the cryptographic module is FIPS-validated.

๐Ÿ’ผ SA-4 (8) CONTINUOUS MONITORING PLAN

The organization requires the developer of the information system, system component, or information system service to produce a plan for the continuous monitoring of security control effectiveness that contains [Assignment: organization-defined level of detail].

๐Ÿ’ผ SA-4 Acquisition Process

Include the following requirements, descriptions, and criteria, explicitly or by reference, using [Selection (one or more): standardized contract language; [Assignment: organization-defined contract language]] in the acquisition contract for the system, system component, or system service: a. Security and privacy functional requirements; b. Strength of mechanism requirements; c. Security and privacy assurance requirements; d. Controls needed to satisfy the security and privacy requirements. e. Security and privacy documentation requirements; f. Requirements for protecting security and privacy documentation; g. Description of the system development environment and environment in which the system is intended to operate; h. Allocation of responsibility or identification of parties responsible for information security, privacy, and supply chain risk management; and i. Acceptance criteria.

๐Ÿ’ผ SA-4 ACQUISITION PROCESS

The organization includes the following requirements, descriptions, and criteria, explicitly or by reference, in the acquisition contract for the information system, system component, or information system service in accordance with applicable federal laws, Executive Orders, directives, policies, regulations, standards, guidelines, and organizational mission/business needs: SA-4a. Security functional requirements; SA-4b. Security strength requirements; SA-4c. Security assurance requirements; SA-4d. Security-related documentation requirements; SA-4e. Requirements for protecting security-related documentation; SA-4f. Description of the information system development environment and environment in which the system is intended to operate; and SA-4g. Acceptance criteria.

๐Ÿ’ผ SA-4 Acquisition Process (L)(M)(H)

Include the following requirements, descriptions, and criteria, explicitly or by reference, using [Selection (one-or-more): standardized contract language; [Assignment: organization-defined contract language]] in the acquisition contract for the system, system component, or system service: a. Security and privacy functional requirements; b. Strength of mechanism requirements; c. Security and privacy assurance requirements; d. Controls needed to satisfy the security and privacy requirements. e. Security and privacy documentation requirements; f. Requirements for protecting security and privacy documentation; g. Description of the system development environment and environment in which the system is intended to operate; h. Allocation of responsibility or identification of parties responsible for information security, privacy, and supply chain risk management; and i. Acceptance criteria. **SA-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: The use of Common Criteria (ISO/IEC 15408) evaluated products is strongly preferred. See <https://www.niap-ccevs.org/Product/index.cfm> or <https://www.commoncriteriaportal.org/products/>. **Requirement**: The service provider must comply with Federal Acquisition Regulation (FAR) Subpart 7.103, and Section 889 of the John S. McCain National Defense Authorization Act (NDAA) for Fiscal Year 2019 (Pub. L. 115-232), and FAR Subpart 4.21, which implements Section 889 (as well as any added updates related to FISMA to address security concerns in the system acquisitions process).

๐Ÿ’ผ SA-4 Acquisition Process (L)(M)(H)

Include the following requirements, descriptions, and criteria, explicitly or by reference, using [Selection (one-or-more): standardized contract language; [Assignment: organization-defined contract language]] in the acquisition contract for the system, system component, or system service: a. Security and privacy functional requirements; b. Strength of mechanism requirements; c. Security and privacy assurance requirements; d. Controls needed to satisfy the security and privacy requirements. e. Security and privacy documentation requirements; f. Requirements for protecting security and privacy documentation; g. Description of the system development environment and environment in which the system is intended to operate; h. Allocation of responsibility or identification of parties responsible for information security, privacy, and supply chain risk management; and i. Acceptance criteria. **SA-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: The use of Common Criteria (ISO/IEC 15408) evaluated products is strongly preferred. See <https://www.niap-ccevs.org/Product/index.cfm> or <https://www.commoncriteriaportal.org/products/>. **Requirement**: The service provider must comply with Federal Acquisition Regulation (FAR) Subpart 7.103, and Section 889 of the John S. McCain National Defense Authorization Act (NDAA) for Fiscal Year 2019 (Pub. L. 115-232), and FAR Subpart 4.21, which implements Section 889 (as well as any added updates related to FISMA to address security concerns in the system acquisitions process).

๐Ÿ’ผ SA-4 Acquisition Process (L)(M)(H)

Include the following requirements, descriptions, and criteria, explicitly or by reference, using [Selection (one-or-more): standardized contract language; [Assignment: organization-defined contract language]] in the acquisition contract for the system, system component, or system service: a. Security and privacy functional requirements; b. Strength of mechanism requirements; c. Security and privacy assurance requirements; d. Controls needed to satisfy the security and privacy requirements. e. Security and privacy documentation requirements; f. Requirements for protecting security and privacy documentation; g. Description of the system development environment and environment in which the system is intended to operate; h. Allocation of responsibility or identification of parties responsible for information security, privacy, and supply chain risk management; and i. Acceptance criteria. **SA-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: The use of Common Criteria (ISO/IEC 15408) evaluated products is strongly preferred. See <https://www.niap-ccevs.org/Product/index.cfm> or <https://www.commoncriteriaportal.org/products/>. **Requirement**: The service provider must comply with Federal Acquisition Regulation (FAR) Subpart 7.103, and Section 889 of the John S. McCain National Defense Authorization Act (NDAA) for Fiscal Year 2019 (Pub. L. 115-232), and FAR Subpart 4.21, which implements Section 889 (as well as any added updates related to FISMA to address security concerns in the system acquisitions process).

๐Ÿ’ผ SA-4(12) Acquisition Process | Data Ownership

(a) Include organizational data ownership requirements in the acquisition contract; and (b) Require all data to be removed from the contractor's system and returned to the organization within [Assignment: organization-defined time frame].

๐Ÿ’ผ SA-4(2) Acquisition Process | Design and Implementation Information for Controls

Require the developer of the system, system component, or system service to provide design and implementation information for the controls that includes: [Selection (one or more): security-relevant external system interfaces; high-level design; low-level design; source code or hardware schematics; [Assignment: organization-defined design and implementation information]] at [Assignment: organization-defined level of detail].

๐Ÿ’ผ SA-4(2) Design and Implementation Information for Controls (M)(H)

Require the developer of the system, system component, or system service to provide design and implementation information for the controls that includes: [FedRAMP Assignment: at a minimum to include security-relevant external system interfaces; high-level design; low-level design; source code or network and data flow diagram; [Assignment: organization-defined design and implementation information]] at [Assignment: organization-defined level of detail].

๐Ÿ’ผ SA-4(2) Design and Implementation Information for Controls (M)(H)

Require the developer of the system, system component, or system service to provide design and implementation information for the controls that includes: [FedRAMP Assignment: at a minimum to include security-relevant external system interfaces; high-level design; low-level design; source code or network and data flow diagram; [Assignment: organization-defined design and implementation information]] at [Assignment: organization-defined level of detail].

๐Ÿ’ผ SA-4(3) Acquisition Process | Development Methods, Techniques, and Practices

Require the developer of the system, system component, or system service to demonstrate the use of a system development life cycle process that includes: (a) [Assignment: organization-defined systems engineering methods]; (b) [Assignment: organization-defined [Selection (one or more): systems security; privacy] engineering methods]; and (c) [Assignment: organization-defined software development methods; testing, evaluation, assessment, verification, and validation methods; and quality control processes].

๐Ÿ’ผ SA-4(5) System, Component, and Service Configurations (H)

Require the developer of the system, system component, or system service to: (a) Deliver the system, component, or service with [FedRAMP Assignment: The service provider shall use the DoD STIGs to establish configuration settings; Center for Internet Security up to Level 2 (CIS Level 2) guidelines shall be used if STIGs are not available; Custom baselines shall be used if CIS is not available.] implemented; and (b) Use the configurations as the default for any subsequent system, component, or service reinstallation or upgrade.
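
As one illustration of the configuration-as-default requirement above, the following minimal Python sketch compares a delivered component's settings against an organization-approved hardening baseline (for example, one exported from a CIS or STIG scan) and blocks acceptance or reinstallation when they diverge. The file names and setting keys are hypothetical placeholders, not part of the control.

```python
# Minimal sketch: verify a delivered component's settings against an
# approved hardening baseline before accepting delivery or reinstalling.
# The file names and setting keys below are hypothetical placeholders.
import json
import sys

def load(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def verify(baseline: dict, actual: dict) -> list[str]:
    """Return a list of settings that deviate from the approved baseline."""
    return [
        f"{key}: expected {expected!r}, found {actual.get(key)!r}"
        for key, expected in baseline.items()
        if actual.get(key) != expected
    ]

if __name__ == "__main__":
    deviations = verify(load("approved_baseline.json"), load("delivered_config.json"))
    if deviations:
        print("Delivered configuration deviates from the approved baseline:")
        print("\n".join(deviations))
        sys.exit(1)  # block acceptance / reinstallation until corrected
    print("Delivered configuration matches the approved baseline.")
```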

๐Ÿ’ผ SA-4(6) Acquisition Process | Use of Information Assurance Products

(a) Employ only government off-the-shelf or commercial off-the-shelf information assurance and information assurance-enabled information technology products that compose an NSA-approved solution to protect classified information when the networks used to transmit the information are at a lower classification level than the information being transmitted; and (b) Ensure that these products have been evaluated and/or validated by NSA or in accordance with NSA-approved procedures.

๐Ÿ’ผ SA-4(7) Acquisition Process | NIAP-approved Protection Profiles

(a) Limit the use of commercially provided information assurance and information assurance-enabled information technology products to those products that have been successfully evaluated against a National Information Assurance partnership (NIAP)-approved Protection Profile for a specific technology type, if such a profile exists; and (b) Require, if no NIAP-approved Protection Profile exists for a specific technology type but a commercially provided information technology product relies on cryptographic functionality to enforce its security policy, that the cryptographic module is FIPS-validated or NSA-approved.

๐Ÿ’ผ SA-5 INFORMATION SYSTEM DOCUMENTATION

The organization: SA-5a. Obtains administrator documentation for the information system, system component, or information system service that describes: SA-5a.1. Secure configuration, installation, and operation of the system, component, or service; SA-5a.2. Effective use and maintenance of security functions/mechanisms; and SA-5a.3. Known vulnerabilities regarding configuration and use of administrative (i.e., privileged) functions; SA-5b. Obtains user documentation for the information system, system component, or information system service that describes: SA-5b.1. User-accessible security functions/mechanisms and how to effectively use those security functions/mechanisms; SA-5b.2. Methods for user interaction, which enables individuals to use the system, component, or service in a more secure manner; and SA-5b.3. User responsibilities in maintaining the security of the system, component, or service; SA-5c. Documents attempts to obtain information system, system component, or information system service documentation when such documentation is either unavailable or nonexistent and takes [Assignment: organization-defined actions] in response; SA-5d. Protects documentation as required, in accordance with the risk management strategy; and SA-5e. Distributes documentation to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-5 System Documentation

a. Obtain or develop administrator documentation for the system, system component, or system service that describes: 1. Secure configuration, installation, and operation of the system, component, or service; 2. Effective use and maintenance of security and privacy functions and mechanisms; and 3. Known vulnerabilities regarding configuration and use of administrative or privileged functions; b. Obtain or develop user documentation for the system, system component, or system service that describes: 1. User-accessible security and privacy functions and mechanisms and how to effectively use those functions and mechanisms; 2. Methods for user interaction, which enables individuals to use the system, component, or service in a more secure manner and protect individual privacy; and 3. User responsibilities in maintaining the security of the system, component, or service and privacy of individuals; c. Document attempts to obtain system, system component, or system service documentation when such documentation is either unavailable or nonexistent and take [Assignment: organization-defined actions] in response; and d. Distribute documentation to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-5 System Documentation (L)(M)(H)

a. Obtain or develop administrator documentation for the system, system component, or system service that describes: 1. Secure configuration, installation, and operation of the system, component, or service; 2. Effective use and maintenance of security and privacy functions and mechanisms; and 3. Known vulnerabilities regarding configuration and use of administrative or privileged functions; b. Obtain or develop user documentation for the system, system component, or system service that describes: 1. User-accessible security and privacy functions and mechanisms and how to effectively use those functions and mechanisms; 2. Methods for user interaction, which enables individuals to use the system, component, or service in a more secure manner and protect individual privacy; and 3. User responsibilities in maintaining the security of the system, component, or service and privacy of individuals; c. Document attempts to obtain system, system component, or system service documentation when such documentation is either unavailable or nonexistent and take [Assignment: organization-defined actions] in response; and d. Distribute documentation to [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)].

๐Ÿ’ผ SA-5 System Documentation (L)(M)(H)

a. Obtain or develop administrator documentation for the system, system component, or system service that describes: 1. Secure configuration, installation, and operation of the system, component, or service; 2. Effective use and maintenance of security and privacy functions and mechanisms; and 3. Known vulnerabilities regarding configuration and use of administrative or privileged functions; b. Obtain or develop user documentation for the system, system component, or system service that describes: 1. User-accessible security and privacy functions and mechanisms and how to effectively use those functions and mechanisms; 2. Methods for user interaction, which enables individuals to use the system, component, or service in a more secure manner and protect individual privacy; and 3. User responsibilities in maintaining the security of the system, component, or service and privacy of individuals; c. Document attempts to obtain system, system component, or system service documentation when such documentation is either unavailable or nonexistent and take [Assignment: organization-defined actions] in response; and d. Distribute documentation to [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)].

๐Ÿ’ผ SA-5 System Documentation (L)(M)(H)

a. Obtain or develop administrator documentation for the system, system component, or system service that describes: 1. Secure configuration, installation, and operation of the system, component, or service; 2. Effective use and maintenance of security and privacy functions and mechanisms; and 3. Known vulnerabilities regarding configuration and use of administrative or privileged functions; b. Obtain or develop user documentation for the system, system component, or system service that describes: 1. User-accessible security and privacy functions and mechanisms and how to effectively use those functions and mechanisms; 2. Methods for user interaction, which enables individuals to use the system, component, or service in a more secure manner and protect individual privacy; and 3. User responsibilities in maintaining the security of the system, component, or service and privacy of individuals; c. Document attempts to obtain system, system component, or system service documentation when such documentation is either unavailable or nonexistent and take [Assignment: organization-defined actions] in response; and d. Distribute documentation to [FedRAMP Assignment: at a minimum, the ISSO (or similar role within the organization)].

๐Ÿ’ผ SA-8 Security and Privacy Engineering Principles

Apply the following systems security and privacy engineering principles in the specification, design, development, implementation, and modification of the system and system components: [Assignment: organization-defined systems security and privacy engineering principles].

๐Ÿ’ผ SA-9 (1) RISK ASSESSMENTS | ORGANIZATIONAL APPROVALS

The organization: SA-9 (1)(a) Conducts an organizational assessment of risk prior to the acquisition or outsourcing of dedicated information security services; and SA-9 (1)(b) Ensures that the acquisition or outsourcing of dedicated information security services is approved by [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-9 (5) PROCESSING, STORAGE, AND SERVICE LOCATION

The organization restricts the location of [Selection (one or more): information processing; information/data; information system services] to [Assignment: organization-defined locations] based on [Assignment: organization-defined requirements or conditions].

๐Ÿ’ผ SA-9 EXTERNAL INFORMATION SYSTEM SERVICES

The organization: SA-9a. Requires that providers of external information system services comply with organizational information security requirements and employ [Assignment: organization-defined security controls] in accordance with applicable federal laws, Executive Orders, directives, policies, regulations, standards, and guidance; SA-9b. Defines and documents government oversight and user roles and responsibilities with regard to external information system services; and SA-9c. Employs [Assignment: organization-defined processes, methods, and techniques] to monitor security control compliance by external service providers on an ongoing basis.

๐Ÿ’ผ SA-9 External System Services

a. Require that providers of external system services comply with organizational security and privacy requirements and employ the following controls: [Assignment: organization-defined controls]; b. Define and document organizational oversight and user roles and responsibilities with regard to external system services; and c. Employ the following processes, methods, and techniques to monitor control compliance by external service providers on an ongoing basis: [Assignment: organization-defined processes, methods, and techniques].

๐Ÿ’ผ SA-9 External System Services (L)(M)(H)

a. Require that providers of external system services comply with organizational security and privacy requirements and employ the following controls: [FedRAMP Assignment: Appropriate FedRAMP Security Controls Baseline(s) if Federal information is processed or stored within the external system]; b. Define and document organizational oversight and user roles and responsibilities with regard to external system services; and c. Employ the following processes, methods, and techniques to monitor control compliance by external service providers on an ongoing basis: [FedRAMP Assignment: Federal/FedRAMP Continuous Monitoring requirements must be met for external systems where Federal information is processed or stored].

๐Ÿ’ผ SA-9 External System Services (L)(M)(H)

a. Require that providers of external system services comply with organizational security and privacy requirements and employ the following controls: [FedRAMP Assignment: Appropriate FedRAMP Security Controls Baseline(s) if Federal information is processed or stored within the external system]; b. Define and document organizational oversight and user roles and responsibilities with regard to external system services; and c. Employ the following processes, methods, and techniques to monitor control compliance by external service providers on an ongoing basis: [FedRAMP Assignment: Federal/FedRAMP Continuous Monitoring requirements must be met for external systems where Federal information is processed or stored].

๐Ÿ’ผ SA-9 External System Services (L)(M)(H)

a. Require that providers of external system services comply with organizational security and privacy requirements and employ the following controls: [FedRAMP Assignment: Appropriate FedRAMP Security Controls Baseline(s) if Federal information is processed or stored within the external system]; b. Define and document organizational oversight and user roles and responsibilities with regard to external system services; and c. Employ the following processes, methods, and techniques to monitor control compliance by external service providers on an ongoing basis: [FedRAMP Assignment: Federal/FedRAMP Continuous Monitoring requirements must be met for external systems where Federal information is processed or stored].

๐Ÿ’ผ SA-9(1) Risk Assessments and Organizational Approvals (M)(H)

(a) Conduct an organizational assessment of risk prior to the acquisition or outsourcing of information security services; and (b) Verify that the acquisition or outsourcing of dedicated information security services is approved by [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SA-9(1) Risk Assessments and Organizational Approvals (M)(H)

(a) Conduct an organizational assessment of risk prior to the acquisition or outsourcing of information security services; and (b) Verify that the acquisition or outsourcing of dedicated information security services is approved by [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SC-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] system and communications protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and communications protection policy and the associated system and communications protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and communications protection policy and procedures; and c. Review and update the current system and communications protection: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ SC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and communications protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and communications protection policy and the associated system and communications protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and communications protection policy and procedures; and c. Review and update the current system and communications protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and communications protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and communications protection policy and the associated system and communications protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and communications protection policy and procedures; and c. Review and update the current system and communications protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SC-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and communications protection policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and communications protection policy and the associated system and communications protection controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and communications protection policy and procedures; and c. Review and update the current system and communications protection: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SC-1 SYSTEM AND COMMUNICATIONS PROTECTION POLICY AND PROCEDURES

The organization: SC-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: SC-1a.1. A system and communications protection policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and SC-1a.2. Procedures to facilitate the implementation of the system and communications protection policy and associated system and communications protection controls; and SC-1b. Reviews and updates the current: SC-1b.1. System and communications protection policy [Assignment: organization-defined frequency]; and SC-1b.2. System and communications protection procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ SC-10 Network Disconnect

Terminate the network connection associated with a communications session at the end of the session or after [Assignment: organization-defined time period] of inactivity.

๐Ÿ’ผ SC-10 NETWORK DISCONNECT

The information system terminates the network connection associated with a communications session at the end of the session or after [Assignment: organization-defined time period] of inactivity.

๐Ÿ’ผ SC-10 Network Disconnect (M)(H)

Terminate the network connection associated with a communications session at the end of the session or after [FedRAMP Assignment: no longer than ten (10) minutes for privileged sessions and no longer than fifteen (15) minutes for user sessions] of inactivity.

๐Ÿ’ผ SC-10 Network Disconnect (M)(H)

Terminate the network connection associated with a communications session at the end of the session or after [FedRAMP Assignment: no longer than ten (10) minutes for privileged sessions and no longer than fifteen (15) minutes for user sessions] of inactivity.
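
The inactivity limits above are usually enforced at the network or session layer, but a minimal application-level sketch can make the intent concrete. The Python example below closes a connection once it has been idle past a limit; the 900-second value mirrors the fifteen-minute user-session figure, and the listening address is illustrative only.

```python
# Minimal sketch: close a network connection after a period of inactivity,
# in the spirit of SC-10. The 900-second value mirrors the fifteen-minute
# user-session figure; adjust per organizational policy.
import socket

USER_IDLE_LIMIT_SECONDS = 900  # 15 minutes of inactivity

def handle_connection(conn: socket.socket) -> None:
    conn.settimeout(USER_IDLE_LIMIT_SECONDS)
    try:
        while True:
            data = conn.recv(4096)
            if not data:          # peer ended the session normally
                break
            conn.sendall(data)    # placeholder for real session handling
    except socket.timeout:
        pass                      # idle limit reached: fall through and disconnect
    finally:
        conn.close()              # terminate the network connection

if __name__ == "__main__":
    with socket.create_server(("127.0.0.1", 9000)) as server:
        conn, _addr = server.accept()
        handle_connection(conn)
```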

๐Ÿ’ผ SC-11 Trusted Path

a. Provide a [Selection: physically; logically] isolated trusted communications path for communications between the user and the trusted components of the system; and b. Permit users to invoke the trusted communications path for communications between the user and the following security functions of the system, including at a minimum, authentication and re-authentication: [Assignment: organization-defined security functions].

๐Ÿ’ผ SC-11 TRUSTED PATH

The information system establishes a trusted communications path between the user and the following security functions of the system: [Assignment: organization-defined security functions to include at a minimum, information system authentication and re-authentication].

๐Ÿ’ผ SC-11(1) Trusted Path | Irrefutable Communications Path

(a) Provide a trusted communications path that is irrefutably distinguishable from other communications paths; and (b) Initiate the trusted communications path for communications between the [Assignment: organization-defined security functions] of the system and the user.

๐Ÿ’ผ SC-12 (2) SYMMETRIC KEYS

The organization produces, controls, and distributes symmetric cryptographic keys using [Selection: NIST FIPS-compliant; NSA-approved] key management technology and processes.

๐Ÿ’ผ SC-12 (3) ASYMMETRIC KEYS

The organization produces, controls, and distributes asymmetric cryptographic keys using [Selection: NSA-approved key management technology and processes; approved PKI Class 3 certificates or prepositioned keying material; approved PKI Class 3 or Class 4 certificates and hardware security tokens that protect the user's private key].

๐Ÿ’ผ SC-12 Cryptographic Key Establishment and Management

Establish and manage cryptographic keys when cryptography is employed within the system in accordance with the following key management requirements: [Assignment: organization-defined requirements for key generation, distribution, storage, access, and destruction].

๐Ÿ’ผ SC-12 CRYPTOGRAPHIC KEY ESTABLISHMENT AND MANAGEMENT

The organization establishes and manages cryptographic keys for required cryptography employed within the information system in accordance with [Assignment: organization-defined requirements for key generation, distribution, storage, access, and destruction].

๐Ÿ’ผ SC-12 Cryptographic Key Establishment and Management (L)(M)(H)

Establish and manage cryptographic keys when cryptography is employed within the system in accordance with the following key management requirements: [FedRAMP Assignment: In accordance with Federal requirements]. **SC-12 Additional FedRAMP Requirements and Guidance:** **Guidance**: See references in NIST 800-53 documentation. **Guidance**: Must meet applicable Federal Cryptographic Requirements. See References Section of control. **Guidance**: Wildcard certificates may be used internally within the system, but are not permitted for external customer access to the system.

๐Ÿ’ผ SC-12 Cryptographic Key Establishment and Management (L)(M)(H)

Establish and manage cryptographic keys when cryptography is employed within the system in accordance with the following key management requirements: [FedRAMP Assignment: In accordance with Federal requirements]. **SC-12 Additional FedRAMP Requirements and Guidance:** **Guidance**: See references in NIST 800-53 documentation. **Guidance**: Must meet applicable Federal Cryptographic Requirements. See References Section of control. **Guidance**: Wildcard certificates may be used internally within the system, but are not permitted for external customer access to the system.

๐Ÿ’ผ SC-12 Cryptographic Key Establishment and Management (L)(M)(H)

Establish and manage cryptographic keys when cryptography is employed within the system in accordance with the following key management requirements: [FedRAMP Assignment: In accordance with Federal requirements]. **SC-12 Additional FedRAMP Requirements and Guidance:** **Guidance**: See references in NIST 800-53 documentation. **Guidance**: Must meet applicable Federal Cryptographic Requirements. See References Section of control. **Guidance**: Wildcard certificates may be used internally within the system, but are not permitted for external customer access to the system.
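
Key establishment and management can be implemented in many ways; as a hedged illustration only, the sketch below walks a key through generation, rotation, use, and scheduled destruction with AWS KMS via boto3. It assumes AWS credentials and a region are configured; the key description and deletion window are illustrative and are not the control's required parameters.

```python
# Minimal sketch of a managed key lifecycle (generation, rotation, use,
# destruction) using AWS KMS via boto3. Assumes AWS credentials and region
# are configured; the description and deletion window are illustrative.
import boto3

kms = boto3.client("kms")

# Generation: create a symmetric customer managed key.
key_id = kms.create_key(Description="example data-encryption key")["KeyMetadata"]["KeyId"]

# Management: enable automatic rotation of the key material.
kms.enable_key_rotation(KeyId=key_id)

# Access/use: derive a data key; the plaintext copy is used in memory only.
data_key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
ciphertext_blob = data_key["CiphertextBlob"]  # store alongside the encrypted data

# Destruction: schedule deletion with a waiting period for recovery.
kms.schedule_key_deletion(KeyId=key_id, PendingWindowInDays=7)
```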

๐Ÿ’ผ SC-12(3) Cryptographic Key Establishment and Management | Asymmetric Keys

Produce, control, and distribute asymmetric cryptographic keys using [Selection: NSA-approved key management technology and processes; prepositioned keying material; DoD-approved or DoD-issued Medium Assurance PKI certificates; DoD-approved or DoD-issued Medium Hardware Assurance PKI certificates and hardware security tokens that protect the user's private key; certificates issued in accordance with organization-defined requirements].
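
A minimal sketch of producing an asymmetric key pair and protecting the private key at rest follows, using the Python cryptography package. The key size and passphrase are illustrative; the organization-selected mechanisms in the control (NSA-approved processes, DoD-issued PKI certificates, hardware tokens) are outside the scope of this sketch.

```python
# Minimal sketch: produce an asymmetric key pair and protect the private key
# at rest with a passphrase, using the 'cryptography' package. Key size and
# passphrase handling are illustrative, not organizationally prescribed values.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)

encrypted_private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(b"change-this-passphrase"),
)

public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
```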

๐Ÿ’ผ SC-13 Cryptographic Protection

a. Determine the [Assignment: organization-defined cryptographic uses]; and b. Implement the following types of cryptography required for each specified cryptographic use: [Assignment: organization-defined types of cryptography for each specified cryptographic use].

๐Ÿ’ผ SC-13 CRYPTOGRAPHIC PROTECTION

The information system implements [Assignment: organization-defined cryptographic uses and type of cryptography required for each use] in accordance with applicable federal laws, Executive Orders, directives, policies, regulations, and standards.

๐Ÿ’ผ SC-13 Cryptographic Protection (L)(M)(H)

a. Determine the [Assignment: organization-defined cryptographic uses]; and b. Implement the following types of cryptography required for each specified cryptographic use: [FedRAMP Assignment: FIPS-validated or NSA-approved cryptography]. **SC-13 Additional FedRAMP Requirements and Guidance:** **Guidance**: This control applies to all use of cryptography. In addition to encryption, this includes functions such as hashing, random number generation, and key generation. Examples include the following: - Encryption of data - Decryption of data - Generation of one time passwords (OTPs) for MFA - Protocols such as TLS, SSH, and HTTPS The requirement for FIPS 140 validation, as well as timelines for acceptance of FIPS 140-2, and 140-3 can be found at the [NIST Cryptographic Module Validation Program (CMVP)](https://csrc.nist.gov/projects/cryptographic-module-validation-program). **Guidance**: For NSA-approved cryptography, the National Information Assurance Partnership (NIAP) oversees a national program to evaluate Commercial IT Products for Use in National Security Systems. The NIAP Product Compliant List can be found at the following location: <https://www.niap-ccevs.org/Product/index.cfm> **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Moving to non-FIPS CM or product is acceptable when: - FIPS validated version has a known vulnerability - Feature with vulnerability is in use - Non-FIPS version fixes the vulnerability - Non-FIPS version is submitted to NIST for FIPS validation - POA&M is added to track approval, and deployment when ready **Guidance**: At a minimum, this control applies to cryptography in use for the following controls: AU-9(3), CP-9(8), IA-2(6), IA-5(1), MP-5, SC-8(1), and SC-28(1).

๐Ÿ’ผ SC-13 Cryptographic Protection (L)(M)(H)

a. Determine the [Assignment: organization-defined cryptographic uses]; and b. Implement the following types of cryptography required for each specified cryptographic use: [FedRAMP Assignment: FIPS-validated or NSA-approved cryptography]. **SC-13 Additional FedRAMP Requirements and Guidance:** **Guidance**: This control applies to all use of cryptography. In addition to encryption, this includes functions such as hashing, random number generation, and key generation. Examples include the following: - Encryption of data - Decryption of data - Generation of one time passwords (OTPs) for MFA - Protocols such as TLS, SSH, and HTTPS The requirement for FIPS 140 validation, as well as timelines for acceptance of FIPS 140-2, and 140-3 can be found at the [NIST Cryptographic Module Validation Program (CMVP)](https://csrc.nist.gov/projects/cryptographic-module-validation-program). **Guidance**: For NSA-approved cryptography, the National Information Assurance Partnership (NIAP) oversees a national program to evaluate Commercial IT Products for Use in National Security Systems. The NIAP Product Compliant List can be found at the following location: <https://www.niap-ccevs.org/Product/index.cfm> **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Moving to non-FIPS CM or product is acceptable when: - FIPS validated version has a known vulnerability - Feature with vulnerability is in use - Non-FIPS version fixes the vulnerability - Non-FIPS version is submitted to NIST for FIPS validation - POA&M is added to track approval, and deployment when ready **Guidance**: At a minimum, this control applies to cryptography in use for the following controls: AU-9(3), CP-9(8), IA-2(6), IA-5(1), MP-5, SC-8(1), and SC-28(1).

๐Ÿ’ผ SC-13 Cryptographic Protection (L)(M)(H)

a. Determine the [Assignment: organization-defined cryptographic uses]; and b. Implement the following types of cryptography required for each specified cryptographic use: [FedRAMP Assignment: FIPS-validated or NSA-approved cryptography]. **SC-13 Additional FedRAMP Requirements and Guidance:** **Guidance**: This control applies to all use of cryptography. In addition to encryption, this includes functions such as hashing, random number generation, and key generation. Examples include the following: - Encryption of data - Decryption of data - Generation of one time passwords (OTPs) for MFA - Protocols such as TLS, SSH, and HTTPS The requirement for FIPS 140 validation, as well as timelines for acceptance of FIPS 140-2, and 140-3 can be found at the [NIST Cryptographic Module Validation Program (CMVP)](https://csrc.nist.gov/projects/cryptographic-module-validation-program). **Guidance**: For NSA-approved cryptography, the National Information Assurance Partnership (NIAP) oversees a national program to evaluate Commercial IT Products for Use in National Security Systems. The NIAP Product Compliant List can be found at the following location: <https://www.niap-ccevs.org/Product/index.cfm> **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Moving to non-FIPS CM or product is acceptable when: - FIPS validated version has a known vulnerability - Feature with vulnerability is in use - Non-FIPS version fixes the vulnerability - Non-FIPS version is submitted to NIST for FIPS validation - POA&M is added to track approval, and deployment when ready **Guidance**: At a minimum, this control applies to cryptography in use for the following controls: AU-9(3), CP-9(8), IA-2(6), IA-5(1), MP-5, SC-8(1), and SC-28(1).
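
One common application of this control is protecting data in transit with approved protocols. The sketch below requires TLS 1.2 or later and full certificate validation for an outbound connection using Python's ssl module; whether the underlying OpenSSL build is FIPS-validated depends on the platform and is not something this code can guarantee. The endpoint name is illustrative.

```python
# Minimal sketch: require TLS 1.2 or later and certificate validation for an
# outbound connection, one common application of cryptographic protection for
# data in transit. FIPS validation of the underlying OpenSSL build depends on
# the platform and is not guaranteed by this code.
import socket
import ssl

context = ssl.create_default_context()          # validates certificates and hostnames
context.minimum_version = ssl.TLSVersion.TLSv1_2

hostname = "example.gov"                        # illustrative endpoint
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print("Negotiated protocol:", tls.version())
        print("Cipher suite:", tls.cipher()[0])
```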

๐Ÿ’ผ SC-15 COLLABORATIVE COMPUTING DEVICES

The information system: SC-15a. Prohibits remote activation of collaborative computing devices with the following exceptions: [Assignment: organization-defined exceptions where remote activation is to be allowed]; and SC-15b. Provides an explicit indication of use to users physically present at the devices.

๐Ÿ’ผ SC-15 Collaborative Computing Devices and Applications

a. Prohibit remote activation of collaborative computing devices and applications with the following exceptions: [Assignment: organization-defined exceptions where remote activation is to be allowed]; and b. Provide an explicit indication of use to users physically present at the devices.

๐Ÿ’ผ SC-15 Collaborative Computing Devices and Applications (L)(M)(H)

a. Prohibit remote activation of collaborative computing devices and applications with the following exceptions: [FedRAMP Assignment: no exceptions for computing devices]; and b. Provide an explicit indication of use to users physically present at the devices. **SC-15 Additional FedRAMP Requirements and Guidance:** **Requirement**: The information system provides disablement (instead of physical disconnect) of collaborative computing devices in a manner that supports ease of use.

๐Ÿ’ผ SC-15 Collaborative Computing Devices and Applications (L)(M)(H)

a. Prohibit remote activation of collaborative computing devices and applications with the following exceptions: [FedRAMP Assignment: no exceptions for computing devices]; and b. Provide an explicit indication of use to users physically present at the devices. **SC-15 Additional FedRAMP Requirements and Guidance:** **Requirement**: The information system provides disablement (instead of physical disconnect) of collaborative computing devices in a manner that supports ease of use.

๐Ÿ’ผ SC-15 Collaborative Computing Devices and Applications (L)(M)(H)

a. Prohibit remote activation of collaborative computing devices and applications with the following exceptions: [FedRAMP Assignment: no exceptions for computing devices]; and b. Provide an explicit indication of use to users physically present at the devices. **SC-15 Additional FedRAMP Requirements and Guidance:** **Requirement**: The information system provides disablement (instead of physical disconnect) of collaborative computing devices in a manner that supports ease of use.

๐Ÿ’ผ SC-17 Public Key Infrastructure Certificates

a. Issue public key certificates under an [Assignment: organization-defined certificate policy] or obtain public key certificates from an approved service provider; and b. Include only approved trust anchors in trust stores or certificate stores managed by the organization.

๐Ÿ’ผ SC-17 Public Key Infrastructure Certificates (M)(H)

a. Issue public key certificates under an [Assignment: organization-defined certificate policy] or obtain public key certificates from an approved service provider; and b. Include only approved trust anchors in trust stores or certificate stores managed by the organization.

๐Ÿ’ผ SC-17 Public Key Infrastructure Certificates (M)(H)

a. Issue public key certificates under an [Assignment: organization-defined certificate policy] or obtain public key certificates from an approved service provider; and b. Include only approved trust anchors in trust stores or certificate stores managed by the organization.
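
As one hedged illustration of part b of SC-17 above, the sketch below validates a TLS endpoint against a curated bundle of organization-approved trust anchors rather than the default system trust store. The bundle path and hostname are placeholders, not values taken from any baseline.

```python
"""Illustrative sketch: verify that a TLS endpoint's certificate chains to an
organization-approved trust anchor by validating against a curated CA bundle
instead of the default system store. Paths and hostnames are placeholders."""
import socket
import ssl

APPROVED_CA_BUNDLE = "/etc/pki/org-approved-cas.pem"  # hypothetical curated bundle

def chains_to_approved_anchor(host: str, port: int = 443) -> bool:
    """Return True if the endpoint's certificate validates against the
    approved CA bundle only."""
    ctx = ssl.create_default_context(cafile=APPROVED_CA_BUNDLE)
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False

if __name__ == "__main__":
    print(chains_to_approved_anchor("example.gov"))
```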

๐Ÿ’ผ SC-18 (4) PREVENT AUTOMATIC EXECUTION

The information system prevents the automatic execution of mobile code in [Assignment: organization-defined software applications] and enforces [Assignment: organization-defined actions] prior to executing the code.

๐Ÿ’ผ SC-18 Mobile Code

a. Define acceptable and unacceptable mobile code and mobile code technologies; and b. Authorize, monitor, and control the use of mobile code within the system.

๐Ÿ’ผ SC-18 MOBILE CODE

The organization: SC-18a. Defines acceptable and unacceptable mobile code and mobile code technologies; SC-18b. Establishes usage restrictions and implementation guidance for acceptable mobile code and mobile code technologies; and SC-18c. Authorizes, monitors, and controls the use of mobile code within the information system.

๐Ÿ’ผ SC-18 Mobile Code (M)(H)

a. Define acceptable and unacceptable mobile code and mobile code technologies; and b. Authorize, monitor, and control the use of mobile code within the system.

๐Ÿ’ผ SC-18 Mobile Code (M)(H)

a. Define acceptable and unacceptable mobile code and mobile code technologies; and b. Authorize, monitor, and control the use of mobile code within the system.

๐Ÿ’ผ SC-19 VOICE OVER INTERNET PROTOCOL

The organization: SC-19a. Establishes usage restrictions and implementation guidance for Voice over Internet Protocol (VoIP) technologies based on the potential to cause damage to the information system if used maliciously; and SC-19b. Authorizes, monitors, and controls the use of VoIP within the information system.

💼 SC-20 SECURE NAME / ADDRESS RESOLUTION SERVICE (AUTHORITATIVE SOURCE)

The information system: SC-20a. Provides additional data origin authentication and integrity verification artifacts along with the authoritative name resolution data the system returns in response to external name/address resolution queries; and SC-20b. Provides the means to indicate the security status of child zones and (if the child supports secure resolution services) to enable verification of a chain of trust among parent and child domains, when operating as part of a distributed, hierarchical namespace.

๐Ÿ’ผ SC-20 Secure Name/address Resolution Service (authoritative Source)

a. Provide additional data origin authentication and integrity verification artifacts along with the authoritative name resolution data the system returns in response to external name/address resolution queries; and b. Provide the means to indicate the security status of child zones and (if the child supports secure resolution services) to enable verification of a chain of trust among parent and child domains, when operating as part of a distributed, hierarchical namespace.

๐Ÿ’ผ SC-20 Secure Name/Address Resolution Service (Authoritative Source) (L)(M)(H)

a. Provide additional data origin authentication and integrity verification artifacts along with the authoritative name resolution data the system returns in response to external name/address resolution queries; and b. Provide the means to indicate the security status of child zones and (if the child supports secure resolution services) to enable verification of a chain of trust among parent and child domains, when operating as part of a distributed, hierarchical namespace. **SC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: SC-20 applies to use of external authoritative DNS to access a CSO from outside the boundary. **Guidance**: External authoritative DNS servers may be located outside an authorized environment. Positioning these servers inside an authorized boundary is encouraged. **Guidance**: CSPs are recommended to self-check DNSSEC configuration through one of many available analyzers such as [Sandia National Labs](https://dnsviz.net) **Requirement**: Control Description should include how DNSSEC is implemented on authoritative DNS servers to supply valid responses to external DNSSEC requests.

๐Ÿ’ผ SC-20 Secure Name/Address Resolution Service (Authoritative Source) (L)(M)(H)

a. Provide additional data origin authentication and integrity verification artifacts along with the authoritative name resolution data the system returns in response to external name/address resolution queries; and b. Provide the means to indicate the security status of child zones and (if the child supports secure resolution services) to enable verification of a chain of trust among parent and child domains, when operating as part of a distributed, hierarchical namespace. **SC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: SC-20 applies to use of external authoritative DNS to access a CSO from outside the boundary. **Guidance**: External authoritative DNS servers may be located outside an authorized environment. Positioning these servers inside an authorized boundary is encouraged. **Guidance**: CSPs are recommended to self-check DNSSEC configuration through one of many available analyzers such as [Sandia National Labs](https://dnsviz.net) **Requirement**: Control Description should include how DNSSEC is implemented on authoritative DNS servers to supply valid responses to external DNSSEC requests.

๐Ÿ’ผ SC-20 Secure Name/Address Resolution Service (Authoritative Source) (L)(M)(H)

a. Provide additional data origin authentication and integrity verification artifacts along with the authoritative name resolution data the system returns in response to external name/address resolution queries; and b. Provide the means to indicate the security status of child zones and (if the child supports secure resolution services) to enable verification of a chain of trust among parent and child domains, when operating as part of a distributed, hierarchical namespace. **SC-20 Additional FedRAMP Requirements and Guidance:** **Guidance**: SC-20 applies to use of external authoritative DNS to access a CSO from outside the boundary. **Guidance**: External authoritative DNS servers may be located outside an authorized environment. Positioning these servers inside an authorized boundary is encouraged. **Guidance**: CSPs are recommended to self-check DNSSEC configuration through one of many available analyzers such as [Sandia National Labs](https://dnsviz.net) **Requirement**: Control Description should include how DNSSEC is implemented on authoritative DNS servers to supply valid responses to external DNSSEC requests.
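
As a rough self-check in the spirit of the SC-20 guidance above, the sketch below (assuming the third-party dnspython package) queries an authoritative name server with the DNSSEC OK (DO) bit set and reports whether RRSIG records come back, an indicator that the zone is signed. This is not full chain-of-trust validation; analyzers such as dnsviz.net go much further. The zone and server address are placeholders.

```python
"""Rough self-check sketch (assumes the third-party dnspython package): query an
authoritative server with the DNSSEC OK bit set and report whether RRSIG records
are returned for the zone's SOA. Not a substitute for full DNSSEC validation."""
import dns.message
import dns.query
import dns.rdatatype

def zone_returns_rrsigs(zone: str, auth_server_ip: str) -> bool:
    """Return True if the authoritative server includes RRSIG records for the
    zone's SOA when queried with DNSSEC requested."""
    query = dns.message.make_query(zone, dns.rdatatype.SOA, want_dnssec=True)
    response = dns.query.udp(query, auth_server_ip, timeout=5)
    return any(rrset.rdtype == dns.rdatatype.RRSIG for rrset in response.answer)

if __name__ == "__main__":
    # Placeholder zone and server; substitute the CSO's external authoritative DNS.
    print(zone_returns_rrsigs("example.gov.", "203.0.113.10"))
```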

๐Ÿ’ผ SC-21 Secure Name/Address Resolution Service (Recursive or Caching Resolver) (L)(M)(H)

Request and perform data origin authentication and data integrity verification on the name/address resolution responses the system receives from authoritative sources. **SC-21 Additional FedRAMP Requirements and Guidance:** **Guidance**: Accepting an unsigned reply is acceptable **Guidance**: SC-21 applies to use of internal recursive DNS to access a domain outside the boundary by a component inside the boundary. DNSSEC resolution to access a component inside the boundary is excluded. **Requirement**: Control description should include how DNSSEC is implemented on recursive DNS servers to make DNSSEC requests when resolving DNS requests from internal components to domains external to the CSO boundary. - If the reply is signed, and fails DNSSEC, do not use the reply. - If the reply is unsigned: - CSP chooses the policy to apply. **Requirement**: Internal recursive DNS servers must be located inside an authorized environment. It is typically within the boundary or leveraged from an underlying IaaS/PaaS.

๐Ÿ’ผ SC-21 Secure Name/Address Resolution Service (Recursive or Caching Resolver) (L)(M)(H)

Request and perform data origin authentication and data integrity verification on the name/address resolution responses the system receives from authoritative sources. **SC-21 Additional FedRAMP Requirements and Guidance:** **Guidance**: Accepting an unsigned reply is acceptable **Guidance**: SC-21 applies to use of internal recursive DNS to access a domain outside the boundary by a component inside the boundary. DNSSEC resolution to access a component inside the boundary is excluded. **Requirement**: Control description should include how DNSSEC is implemented on recursive DNS servers to make DNSSEC requests when resolving DNS requests from internal components to domains external to the CSO boundary. - If the reply is signed, and fails DNSSEC, do not use the reply. - If the reply is unsigned: - CSP chooses the policy to apply. **Requirement**: Internal recursive DNS servers must be located inside an authorized environment. It is typically within the boundary or leveraged from an underlying IaaS/PaaS.

๐Ÿ’ผ SC-21 Secure Name/Address Resolution Service (Recursive or Caching Resolver) (L)(M)(H)

Request and perform data origin authentication and data integrity verification on the name/address resolution responses the system receives from authoritative sources. **SC-21 Additional FedRAMP Requirements and Guidance:** **Guidance**: Accepting an unsigned reply is acceptable **Guidance**: SC-21 applies to use of internal recursive DNS to access a domain outside the boundary by a component inside the boundary. DNSSEC resolution to access a component inside the boundary is excluded. **Requirement**: Control description should include how DNSSEC is implemented on recursive DNS servers to make DNSSEC requests when resolving DNS requests from internal components to domains external to the CSO boundary. - If the reply is signed, and fails DNSSEC, do not use the reply. - If the reply is unsigned: - CSP chooses the policy to apply. **Requirement**: Internal recursive DNS servers must be located inside an authorized environment. It is typically within the boundary or leveraged from an underlying IaaS/PaaS.
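
For the recursive-resolver case in SC-21 above, one hedged way to check behavior (again assuming dnspython) is to ask the internal recursive resolver for an external, DNSSEC-signed name and see whether it returns the answer with the AD (Authenticated Data) flag set, i.e. it performed validation. The resolver address and test domain below are placeholders.

```python
"""Sketch (assumes dnspython): query an internal recursive resolver for an
external, signed name with the DO bit set and check whether the resolver marks
the answer with the AD flag, indicating it validated the response."""
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

def resolver_validates(name: str, resolver_ip: str) -> bool:
    """Return True if the recursive resolver returns the answer with AD set."""
    query = dns.message.make_query(name, dns.rdatatype.A, want_dnssec=True)
    response = dns.query.udp(query, resolver_ip, timeout=5)
    return bool(response.flags & dns.flags.AD)

if __name__ == "__main__":
    # Placeholder resolver and domain; substitute internal resolver details.
    print(resolver_validates("example.gov.", "10.0.0.2"))
```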

๐Ÿ’ผ SC-24 Fail in Known State

Fail to a [Assignment: organization-defined known system state] for the following failures on the indicated components while preserving [Assignment: organization-defined system state information] in failure: [Assignment: list of organization-defined types of system failures on organization-defined system components].

๐Ÿ’ผ SC-24 FAIL IN KNOWN STATE

The information system fails to a [Assignment: organization-defined known-state] for [Assignment: organization-defined types of failures] preserving [Assignment: organization-defined system state information] in failure.

๐Ÿ’ผ SC-24 Fail in Known State (H)

Fail to a [Assignment: organization-defined known system state] for the following failures on the indicated components while preserving [Assignment: organization-defined system state information] in failure: [Assignment: list of organization-defined types of system failures on organization-defined system components].

๐Ÿ’ผ SC-25 Thin Nodes

Employ minimal functionality and information storage on the following system components: [Assignment: organization-defined system components].

๐Ÿ’ผ SC-25 THIN NODES

The organization employs [Assignment: organization-defined information system components] with minimal functionality and information storage.

๐Ÿ’ผ SC-26 Decoys

Include components within organizational systems specifically designed to be the target of malicious attacks for detecting, deflecting, and analyzing such attacks.

๐Ÿ’ผ SC-26 HONEYPOTS

The information system includes components specifically designed to be the target of malicious attacks for the purpose of detecting, deflecting, and analyzing such attacks.

๐Ÿ’ผ SC-28 (1) CRYPTOGRAPHIC PROTECTION

The information system implements cryptographic mechanisms to prevent unauthorized disclosure and modification of [Assignment: organization-defined information] on [Assignment: organization-defined information system components].

๐Ÿ’ผ SC-28 Protection of Information at Rest (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of the following information at rest: [Assignment: organization-defined information at rest]. **SC-28 Additional FedRAMP Requirements and Guidance:** **Guidance**: The organization supports the capability to use cryptographic mechanisms to protect information at rest. **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS services provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Note that this enhancement requires the use of cryptography in accordance with SC-13.

๐Ÿ’ผ SC-28 Protection of Information at Rest (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of the following information at rest: [Assignment: organization-defined information at rest]. **SC-28 Additional FedRAMP Requirements and Guidance:** **Guidance**: The organization supports the capability to use cryptographic mechanisms to protect information at rest. **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS services provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Note that this enhancement requires the use of cryptography in accordance with SC-13.

๐Ÿ’ผ SC-28 Protection of Information at Rest (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of the following information at rest: [Assignment: organization-defined information at rest]. **SC-28 Additional FedRAMP Requirements and Guidance:** **Guidance**: The organization supports the capability to use cryptographic mechanisms to protect information at rest. **Guidance**: When leveraging encryption from underlying IaaS/PaaS: While some IaaS/PaaS services provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Guidance**: Note that this enhancement requires the use of cryptography in accordance with SC-13.
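
For an AWS-hosted system, one way a CSP might spot-check the "verify encryption is properly configured" guidance above is to enumerate block-storage volumes that do not have encryption at rest enabled. This is a minimal sketch assuming boto3 and valid credentials; the region name is a placeholder.

```python
"""Spot-check sketch for an AWS environment (assumes boto3 and valid credentials):
list EBS volumes that are not encrypted at rest, since several services leave
encryption for the customer to enable. The region below is a placeholder."""
import boto3

def unencrypted_ebs_volumes(region: str = "us-east-1") -> list[str]:
    """Return the IDs of EBS volumes in the region with encryption disabled."""
    ec2 = boto3.client("ec2", region_name=region)
    findings = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate():
        for volume in page["Volumes"]:
            if not volume.get("Encrypted", False):
                findings.append(volume["VolumeId"])
    return findings

if __name__ == "__main__":
    for volume_id in unencrypted_ebs_volumes():
        print(f"Unencrypted volume: {volume_id}")
```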

๐Ÿ’ผ SC-28(1) Cryptographic Protection (L)(M)(H)

Implement cryptographic mechanisms to prevent unauthorized disclosure and modification of the following information at rest on [FedRAMP Assignment: all information system components storing Federal data or system data that must be protected at the High or Moderate impact levels]: [Assignment: organization-defined information]. **SC-28 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Organizations should select a mode of protection that is targeted towards the relevant threat scenarios. Examples: A. Organizations may apply full disk encryption (FDE) to a mobile device where the primary threat is loss of the device while storage is locked. B. For a database application housing data for a single customer, encryption at the file system level would often provide more protection than FDE against the more likely threat of an intruder on the operating system accessing the storage. C. For a database application housing data for multiple customers, encryption with unique keys for each customer at the database record level may be more appropriate.

๐Ÿ’ผ SC-28(1) Cryptographic Protection (L)(M)(H)

Implement cryptographic mechanisms to prevent unauthorized disclosure and modification of the following information at rest on [FedRAMP Assignment: all information system components storing Federal data or system data that must be protected at the High or Moderate impact levels]: [Assignment: organization-defined information]. **SC-28 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Organizations should select a mode of protection that is targeted towards the relevant threat scenarios. Examples: A. Organizations may apply full disk encryption (FDE) to a mobile device where the primary threat is loss of the device while storage is locked. B. For a database application housing data for a single customer, encryption at the file system level would often provide more protection than FDE against the more likely threat of an intruder on the operating system accessing the storage. C. For a database application housing data for multiple customers, encryption with unique keys for each customer at the database record level may be more appropriate.

๐Ÿ’ผ SC-28(1) Cryptographic Protection (L)(M)(H)

Implement cryptographic mechanisms to prevent unauthorized disclosure and modification of the following information at rest on [FedRAMP Assignment: all information system components storing Federal data or system data that must be protected at the High or Moderate impact levels]: [Assignment: organization-defined information]. **SC-28 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Organizations should select a mode of protection that is targeted towards the relevant threat scenarios. Examples: A. Organizations may apply full disk encryption (FDE) to a mobile device where the primary threat is loss of the device while storage is locked. B. For a database application housing data for a single customer, encryption at the file system level would often provide more protection than FDE against the more likely threat of an intruder on the operating system accessing the storage. C. For a database application housing data for multiple customers, encryption with unique keys for each customer at the database record level may be more appropriate.
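
Example C in the guidance above (unique keys per customer at the record level) can be illustrated structurally with a short sketch. It assumes the third-party `cryptography` package and its Fernet recipe; this only shows the key-per-tenant structure, and whether the underlying cryptographic module is FIPS-validated remains a separate SC-13 concern. All names are illustrative.

```python
"""Structural sketch of example C above: encrypt each customer's records under a
distinct per-customer key (assumes the third-party `cryptography` package).
Illustrates the key-per-tenant pattern only; FIPS validation of the underlying
module is governed by SC-13."""
from cryptography.fernet import Fernet

class TenantRecordStore:
    """Toy record store that keeps a distinct symmetric key per customer."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}
        self._records: dict[str, list[bytes]] = {}

    def _cipher_for(self, customer_id: str) -> Fernet:
        # In practice the per-tenant key would come from a KMS/HSM, not memory.
        key = self._keys.setdefault(customer_id, Fernet.generate_key())
        return Fernet(key)

    def put(self, customer_id: str, record: bytes) -> None:
        ciphertext = self._cipher_for(customer_id).encrypt(record)
        self._records.setdefault(customer_id, []).append(ciphertext)

    def get_all(self, customer_id: str) -> list[bytes]:
        cipher = self._cipher_for(customer_id)
        return [cipher.decrypt(c) for c in self._records.get(customer_id, [])]

if __name__ == "__main__":
    store = TenantRecordStore()
    store.put("customer-a", b"account balance: 100")
    store.put("customer-b", b"account balance: 250")
    print(store.get_all("customer-a"))
```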

๐Ÿ’ผ SC-29 (1) VIRTUALIZATION TECHNIQUES

The organization employs virtualization techniques to support the deployment of a diversity of operating systems and applications that are changed [Assignment: organization-defined frequency].

๐Ÿ’ผ SC-29 Heterogeneity

Employ a diverse set of information technologies for the following system components in the implementation of the system: [Assignment: organization-defined system components].

๐Ÿ’ผ SC-29 HETEROGENEITY

The organization employs a diverse set of information technologies for [Assignment: organization-defined information system components] in the implementation of the information system.

๐Ÿ’ผ SC-3 (5) LAYERED STRUCTURES

The organization implements security functions as a layered structure minimizing interactions between layers of the design and avoiding any dependence by lower layers on the functionality or correctness of higher layers.

๐Ÿ’ผ SC-30 (2) RANDOMNESS

The organization employs [Assignment: organization-defined techniques] to introduce randomness into organizational operations and assets.

๐Ÿ’ผ SC-30 (4) MISLEADING INFORMATION

The organization employs realistic, but misleading information in [Assignment: organization-defined information system components] with regard to its security state or posture.

๐Ÿ’ผ SC-30 Concealment and Misdirection

Employ the following concealment and misdirection techniques for [Assignment: organization-defined systems] at [Assignment: organization-defined time periods] to confuse and mislead adversaries: [Assignment: organization-defined concealment and misdirection techniques].

๐Ÿ’ผ SC-30 CONCEALMENT AND MISDIRECTION

The organization employs [Assignment: organization-defined concealment and misdirection techniques] for [Assignment: organization-defined information systems] at [Assignment: organization-defined time periods] to confuse and mislead adversaries.

๐Ÿ’ผ SC-31 (2) MAXIMUM BANDWIDTH

The organization reduces the maximum bandwidth for identified covert [Selection (one or more); storage; timing] channels to [Assignment: organization-defined values].

๐Ÿ’ผ SC-31 Covert Channel Analysis

a. Perform a covert channel analysis to identify those aspects of communications within the system that are potential avenues for covert [Selection (one or more): storage; timing] channels; and b. Estimate the maximum bandwidth of those channels.

๐Ÿ’ผ SC-31 COVERT CHANNEL ANALYSIS

The organization: SC-31a. Performs a covert channel analysis to identify those aspects of communications within the information system that are potential avenues for covert [Selection (one or more): storage; timing] channels; and SC-31b. Estimates the maximum bandwidth of those channels.

๐Ÿ’ผ SC-32 INFORMATION SYSTEM PARTITIONING

The organization partitions the information system into [Assignment: organization-defined information system components] residing in separate physical domains or environments based on [Assignment: organization-defined circumstances for physical separation of components].

๐Ÿ’ผ SC-32 System Partitioning

Partition the system into [Assignment: organization-defined system components] residing in separate [Selection: physical; logical] domains or environments based on [Assignment: organization-defined circumstances for physical or logical separation of components].

๐Ÿ’ผ SC-34 (1) NO WRITABLE STORAGE

The organization employs [Assignment: organization-defined information system components] with no writeable storage that is persistent across component restart or power on/off.

๐Ÿ’ผ SC-34 (3) HARDWARE-BASED PROTECTION

The organization: SC-34 (3)(a) Employs hardware-based, write-protect for [Assignment: organization-defined information system firmware components]; and SC-34 (3)(b) Implements specific procedures for [Assignment: organization-defined authorized individuals] to manually disable hardware write-protect for firmware modifications and re-enable the write-protect prior to returning to operational mode.

๐Ÿ’ผ SC-34 Non-modifiable Executable Programs

For [Assignment: organization-defined system components], load and execute: a. The operating environment from hardware-enforced, read-only media; and b. The following applications from hardware-enforced, read-only media: [Assignment: organization-defined applications].

๐Ÿ’ผ SC-34 NON-MODIFIABLE EXECUTABLE PROGRAMS

The information system at [Assignment: organization-defined information system components]: SC-34a. Loads and executes the operating environment from hardware-enforced, read-only media; and SC-34b. Loads and executes [Assignment: organization-defined applications] from hardware-enforced, read-only media.

๐Ÿ’ผ SC-35 HONEYCLIENTS

The information system includes components that proactively seek to identify malicious websites and/or web-based malicious code.

๐Ÿ’ผ SC-36 (1) POLLING TECHNIQUES

The organization employs polling techniques to identify potential faults, errors, or compromises to [Assignment: organization-defined distributed processing and storage components].

๐Ÿ’ผ SC-36(1) Distributed Processing and Storage | Polling Techniques

(a) Employ polling techniques to identify potential faults, errors, or compromises to the following processing and storage components: [Assignment: organization-defined distributed processing and storage components]; and (b) Take the following actions in response to identified faults, errors, or compromises: [Assignment: organization-defined actions].

💼 SC-37 (1) ENSURE DELIVERY / TRANSMISSION

The organization employs [Assignment: organization-defined security safeguards] to ensure that only [Assignment: organization-defined individuals or information systems] receive the [Assignment: organization-defined information, information system components, or devices].

๐Ÿ’ผ SC-37 Out-of-band Channels

Employ the following out-of-band channels for the physical delivery or electronic transmission of [Assignment: organization-defined information, system components, or devices] to [Assignment: organization-defined individuals or systems]: [Assignment: organization-defined out-of-band channels].

๐Ÿ’ผ SC-37 OUT-OF-BAND CHANNELS

The organization employs [Assignment: organization-defined out-of-band channels] for the physical delivery or electronic transmission of [Assignment: organization-defined information, information system components, or devices] to [Assignment: organization-defined individuals or information systems].

๐Ÿ’ผ SC-38 Operations Security

Employ the following operations security controls to protect key organizational information throughout the system development life cycle: [Assignment: organization-defined operations security controls].

๐Ÿ’ผ SC-38 OPERATIONS SECURITY

The organization employs [Assignment: organization-defined operations security safeguards] to protect key organizational information throughout the system development life cycle.

๐Ÿ’ผ SC-4 (2) PERIODS PROCESSING

The information system prevents unauthorized information transfer via shared resources in accordance with [Assignment: organization-defined procedures] when system processing explicitly switches between different information classification levels or security categories.

๐Ÿ’ผ SC-40 Wireless Link Protection

Protect external and internal [Assignment: organization-defined wireless links] from the following signal parameter attacks: [Assignment: organization-defined types of signal parameter attacks or references to sources for such attacks].

๐Ÿ’ผ SC-40 WIRELESS LINK PROTECTION

The information system protects external and internal [Assignment: organization-defined wireless links] from [Assignment: organization-defined types of signal parameter attacks or references to sources for such attacks].

๐Ÿ’ผ SC-41 Port and I/O Device Access

[Selection: Physically; Logically] disable or remove [Assignment: organization-defined connection ports or input/output devices] on the following systems or system components: [Assignment: organization-defined systems or system components].

💼 SC-41 PORT AND I/O DEVICE ACCESS

The organization physically disables or removes [Assignment: organization-defined connection ports or input/output devices] on [Assignment: organization-defined information systems or information system components].

๐Ÿ’ผ SC-42 (2) AUTHORIZED USE

The organization employs the following measures: [Assignment: organization-defined measures], so that data or information collected by [Assignment: organization-defined sensors] is only used for authorized purposes.

๐Ÿ’ผ SC-42 (3) PROHIBIT USE OF DEVICES

The organization prohibits the use of devices possessing [Assignment: organization-defined environmental sensing capabilities] in [Assignment: organization-defined facilities, areas, or systems].

๐Ÿ’ผ SC-42 Sensor Capability and Data

a. Prohibit [Selection (one or more): the use of devices possessing [Assignment: organization-defined environmental sensing capabilities] in [Assignment: organization-defined facilities, areas, or systems]; the remote activation of environmental sensing capabilities on organizational systems or system components with the following exceptions: [Assignment: organization-defined exceptions where remote activation of sensors is allowed]]; and b. Provide an explicit indication of sensor use to [Assignment: organization-defined group of users].

๐Ÿ’ผ SC-42 SENSOR CAPABILITY AND DATA

The information system: SC-42a. Prohibits the remote activation of environmental sensing capabilities with the following exceptions: [Assignment: organization-defined exceptions where remote activation of sensors is allowed]; and SC-42b. Provides an explicit indication of sensor use to [Assignment: organization-defined class of users].

๐Ÿ’ผ SC-43 Usage Restrictions

a. Establish usage restrictions and implementation guidelines for the following system components: [Assignment: organization-defined system components]; and b. Authorize, monitor, and control the use of such components within the system.

๐Ÿ’ผ SC-43 USAGE RESTRICTIONS

The organization: SC-43a. Establishes usage restrictions and implementation guidance for [Assignment: organization-defined information system components] based on the potential to cause damage to the information system if used maliciously; and SC-43b. Authorizes, monitors, and controls the use of such components within the information system.

๐Ÿ’ผ SC-44 DETONATION CHAMBERS

The organization employs a detonation chamber capability within [Assignment: organization-defined information system, system component, or location].

๐Ÿ’ผ SC-45(1) Synchronization with Authoritative Time Source (M)(H)

(a) Compare the internal system clocks [FedRAMP Assignment: At least hourly] with [FedRAMP Assignment: <http://tf.nist.gov/tf-cgi/servers.cgi>]; and (b) Synchronize the internal system clocks to the authoritative time source when the time difference is greater than [FedRAMP Assignment: any difference]. **SC-45(1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Synchronization of system clocks improves the accuracy of log analysis. **Requirement**: The service provider selects primary and secondary time servers used by the NIST Internet time service. The secondary server is selected from a different geographic region than the primary server. **Requirement**: The service provider synchronizes the system clocks of network computers that run operating systems other than Windows to the Windows Server Domain Controller emulator or to the same time source for that server.

๐Ÿ’ผ SC-45(1) Synchronization with Authoritative Time Source (M)(H)

(a) Compare the internal system clocks [FedRAMP Assignment: At least hourly] with [FedRAMP Assignment: <http://tf.nist.gov/tf-cgi/servers.cgi>]; and (b) Synchronize the internal system clocks to the authoritative time source when the time difference is greater than [FedRAMP Assignment: any difference]. **SC-45(1) Additional FedRAMP Requirements and Guidance:** **Guidance**: Synchronization of system clocks improves the accuracy of log analysis. **Requirement**: The service provider selects primary and secondary time servers used by the NIST Internet time service. The secondary server is selected from a different geographic region than the primary server. **Requirement**: The service provider synchronizes the system clocks of network computers that run operating systems other than Windows to the Windows Server Domain Controller emulator or to the same time source for that server.
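
The hourly clock comparison above can be sketched with a short script, assuming the third-party ntplib package. It queries an NTP server and reports the offset between the authoritative source and the local clock; under the FedRAMP assignment, any nonzero difference would trigger resynchronization. The server name is a placeholder chosen per the control's primary/secondary requirement.

```python
"""Sketch of an hourly clock check (assumes the third-party ntplib package):
query an NTP server from the NIST Internet Time Service and report the offset
between it and the local clock. The server name below is a placeholder."""
import ntplib

NIST_SERVER = "time.nist.gov"  # choose primary/secondary servers per the control

def clock_offset_seconds(server: str = NIST_SERVER) -> float:
    """Return the offset, in seconds, between the local clock and the server."""
    client = ntplib.NTPClient()
    response = client.request(server, version=3, timeout=5)
    return response.offset

if __name__ == "__main__":
    offset = clock_offset_seconds()
    if abs(offset) > 0:
        print(f"Clock differs from authoritative source by {offset:.3f}s; resync needed.")
    else:
        print("Clock matches the authoritative time source.")
```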

๐Ÿ’ผ SC-48 Sensor Relocation

Relocate [Assignment: organization-defined sensors and monitoring capabilities] to [Assignment: organization-defined locations] under the following conditions or circumstances: [Assignment: organization-defined conditions or circumstances].

๐Ÿ’ผ SC-5 (3) DETECTION | MONITORING

The organization: SC-5 (3)(a) Employs [Assignment: organization-defined monitoring tools] to detect indicators of denial of service attacks against the information system; and SC-5 (3)(b) Monitors [Assignment: organization-defined information system resources] to determine if sufficient resources exist to prevent effective denial of service attacks.

๐Ÿ’ผ SC-5 DENIAL OF SERVICE PROTECTION

The information system protects against or limits the effects of the following types of denial of service attacks: [Assignment: organization-defined types of denial of service attacks or references to sources for such information] by employing [Assignment: organization-defined security safeguards].

๐Ÿ’ผ SC-5 Denial-of-service Protection

a. [Selection: Protect against; Limit] the effects of the following types of denial-of-service events: [Assignment: organization-defined types of denial-of-service events]; and b. Employ the following controls to achieve the denial-of-service objective: [Assignment: organization-defined controls by type of denial-of-service event].

๐Ÿ’ผ SC-5 Denial-of-service Protection (L)(M)(H)

a. [FedRAMP Assignment: Protect against] the effects of the following types of denial-of-service events: [FedRAMP Assignment: at a minimum: ICMP (ping) flood, SYN flood, slowloris, buffer overflow attack, and volume attack]; and b. Employ the following controls to achieve the denial-of-service objective: [Assignment: organization-defined controls by type of denial-of-service event].

๐Ÿ’ผ SC-5 Denial-of-service Protection (L)(M)(H)

a. [FedRAMP Assignment: Protect against] the effects of the following types of denial-of-service events: [FedRAMP Assignment: at a minimum: ICMP (ping) flood, SYN flood, slowloris, buffer overflow attack, and volume attack]; and b. Employ the following controls to achieve the denial-of-service objective: [Assignment: organization-defined controls by type of denial-of-service event].

๐Ÿ’ผ SC-5 Denial-of-service Protection (L)(M)(H)

a. [FedRAMP Assignment: Protect against] the effects of the following types of denial-of-service events: [FedRAMP Assignment: at a minimum: ICMP (ping) flood, SYN flood, slowloris, buffer overflow attack, and volume attack]; and b. Employ the following controls to achieve the denial-of-service objective: [Assignment: organization-defined controls by type of denial-of-service event].
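
As one small application-layer control that could complement the network-level protections named in the SC-5 entries above, the sketch below shows a per-client token-bucket rate limiter. It bounds request rates from a single source but does not by itself address volumetric, SYN-flood, or slowloris attacks, which need protections closer to the network edge. The rate, burst, and client identifier are illustrative choices.

```python
"""Minimal sketch of one application-layer control that can complement
network-level DoS protections: a per-client token-bucket rate limiter."""
import time
from collections import defaultdict

class TokenBucketLimiter:
    def __init__(self, rate_per_second: float, burst: int) -> None:
        self.rate = rate_per_second
        self.burst = burst
        self._tokens: dict[str, float] = defaultdict(lambda: float(burst))
        self._last_seen: dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        """Consume one token for the client if available; refill over time."""
        now = time.monotonic()
        elapsed = now - self._last_seen.get(client_id, now)
        self._last_seen[client_id] = now
        tokens = min(self.burst, self._tokens[client_id] + elapsed * self.rate)
        if tokens >= 1.0:
            self._tokens[client_id] = tokens - 1.0
            return True
        self._tokens[client_id] = tokens
        return False

if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate_per_second=5, burst=10)
    allowed = sum(limiter.allow("198.51.100.7") for _ in range(20))
    print(f"{allowed} of 20 back-to-back requests allowed")
```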

๐Ÿ’ผ SC-5(3) Denial-of-service Protection | Detection and Monitoring

(a) Employ the following monitoring tools to detect indicators of denial-of-service attacks against, or launched from, the system: [Assignment: organization-defined monitoring tools]; and (b) Monitor the following system resources to determine if sufficient resources exist to prevent effective denial-of-service attacks: [Assignment: organization-defined system resources].

๐Ÿ’ผ SC-51 Hardware-based Protection

a. Employ hardware-based, write-protect for [Assignment: organization-defined system firmware components]; and b. Implement specific procedures for [Assignment: organization-defined authorized individuals] to manually disable hardware write-protect for firmware modifications and re-enable the write-protect prior to returning to operational mode.

๐Ÿ’ผ SC-6 Resource Availability

Protect the availability of resources by allocating [Assignment: organization-defined resources] by [Selection (one or more): priority; quota; [Assignment: organization-defined controls]].

๐Ÿ’ผ SC-6 RESOURCE AVAILABILITY

The information system protects the availability of resources by allocating [Assignment: organization-defined resources] by [Selection (one or more); priority; quota; [Assignment: organization-defined security safeguards]].

๐Ÿ’ผ SC-7 (12) HOST-BASED PROTECTION

The organization implements [Assignment: organization-defined host-based boundary protection mechanisms] at [Assignment: organization-defined information system components].

๐Ÿ’ผ SC-7 (4) EXTERNAL TELECOMMUNICATIONS SERVICES

The organization: SC-7 (4)(a) Implements a managed interface for each external telecommunication service; SC-7 (4)(b) Establishes a traffic flow policy for each managed interface; SC-7 (4)(c) Protects the confidentiality and integrity of the information being transmitted across each interface; SC-7 (4)(d) Documents each exception to the traffic flow policy with a supporting mission/business need and duration of that need; and SC-7 (4)(e) Reviews exceptions to the traffic flow policy [Assignment: organization-defined frequency] and removes exceptions that are no longer supported by an explicit mission/business need.

๐Ÿ’ผ SC-7 Boundary Protection

a. Monitor and control communications at the external managed interfaces to the system and at key internal managed interfaces within the system; b. Implement subnetworks for publicly accessible system components that are [Selection: physically; logically] separated from internal organizational networks; and c. Connect to external networks or systems only through managed interfaces consisting of boundary protection devices arranged in accordance with an organizational security and privacy architecture.

๐Ÿ’ผ SC-7 BOUNDARY PROTECTION

The information system: SC-7a. Monitors and controls communications at the external boundary of the system and at key internal boundaries within the system; SC-7b. Implements subnetworks for publicly accessible system components that are [Selection: physically; logically] separated from internal organizational networks; and SC-7c. Connects to external networks or information systems only through managed interfaces consisting of boundary protection devices arranged in accordance with an organizational security architecture.

๐Ÿ’ผ SC-7 Boundary Protection (L)(M)(H)

a. Monitor and control communications at the external managed interfaces to the system and at key internal managed interfaces within the system; b. Implement subnetworks for publicly accessible system components that are [Selection: physically; logically] separated from internal organizational networks; and c. Connect to external networks or systems only through managed interfaces consisting of boundary protection devices arranged in accordance with an organizational security and privacy architecture. **SC-7 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: SC-7 (b) should be met by subnet isolation. A subnetwork (subnet) is a physically or logically segmented section of a larger network defined at TCP/IP Layer 3, to both minimize traffic and, important for a FedRAMP Authorization, add a crucial layer of network isolation. Subnets are distinct from VLANs (Layer 2), security groups, and VPCs and are specifically required to satisfy SC-7 part b and other controls. See the [FedRAMP Subnets White Paper](<https://www.fedramp.gov/assets/resources/documents/FedRAMP_subnets_white_paper.pdf>) for additional information.

๐Ÿ’ผ SC-7 Boundary Protection (L)(M)(H)

a. Monitor and control communications at the external managed interfaces to the system and at key internal managed interfaces within the system; b. Implement subnetworks for publicly accessible system components that are [Selection: physically; logically] separated from internal organizational networks; and c. Connect to external networks or systems only through managed interfaces consisting of boundary protection devices arranged in accordance with an organizational security and privacy architecture. **SC-7 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: SC-7 (b) should be met by subnet isolation. A subnetwork (subnet) is a physically or logically segmented section of a larger network defined at TCP/IP Layer 3, to both minimize traffic and, important for a FedRAMP Authorization, add a crucial layer of network isolation. Subnets are distinct from VLANs (Layer 2), security groups, and VPCs and are specifically required to satisfy SC-7 part b and other controls. See the [FedRAMP Subnets White Paper](<https://www.fedramp.gov/assets/resources/documents/FedRAMP_subnets_white_paper.pdf>) for additional information.

๐Ÿ’ผ SC-7 Boundary Protection (L)(M)(H)

a. Monitor and control communications at the external managed interfaces to the system and at key internal managed interfaces within the system; b. Implement subnetworks for publicly accessible system components that are [Selection: physically; logically] separated from internal organizational networks; and c. Connect to external networks or systems only through managed interfaces consisting of boundary protection devices arranged in accordance with an organizational security and privacy architecture. **SC-7 Additional FedRAMP Requirements and Guidance:** **(b) Guidance**: SC-7 (b) should be met by subnet isolation. A subnetwork (subnet) is a physically or logically segmented section of a larger network defined at TCP/IP Layer 3, to both minimize traffic and, important for a FedRAMP Authorization, add a crucial layer of network isolation. Subnets are distinct from VLANs (Layer 2), security groups, and VPCs and are specifically required to satisfy SC-7 part b and other controls. See the [FedRAMP Subnets White Paper](<https://www.fedramp.gov/assets/resources/documents/FedRAMP_subnets_white_paper.pdf>) for additional information.

๐Ÿ’ผ SC-7(12) Host-based Protection (M)(H)

Implement [FedRAMP Assignment: Host Intrusion Prevention System (HIPS), Host Intrusion Detection System (HIDS), or minimally a host-based firewall] at [Assignment: organization-defined system components].

๐Ÿ’ผ SC-7(12) Host-based Protection (M)(H)

Implement [FedRAMP Assignment: Host Intrusion Prevention System (HIPS), Host Intrusion Detection System (HIDS), or minimally a host-based firewall] at [Assignment: organization-defined system components].

๐Ÿ’ผ SC-7(24) Boundary Protection | Personally Identifiable Information

For systems that process personally identifiable information: (a) Apply the following processing rules to data elements of personally identifiable information: [Assignment: organization-defined processing rules]; (b) Monitor for permitted processing at the external interfaces to the system and at key internal boundaries within the system; (c) Document each processing exception; and (d) Review and remove exceptions that are no longer supported.

๐Ÿ’ผ SC-7(4) Boundary Protection | External Telecommunications Services

(a) Implement a managed interface for each external telecommunication service; (b) Establish a traffic flow policy for each managed interface; (c) Protect the confidentiality and integrity of the information being transmitted across each interface; (d) Document each exception to the traffic flow policy with a supporting mission or business need and duration of that need; (e) Review exceptions to the traffic flow policy [Assignment: organization-defined frequency] and remove exceptions that are no longer supported by an explicit mission or business need; (f) Prevent unauthorized exchange of control plane traffic with external networks; (g) Publish information to enable remote networks to detect unauthorized control plane traffic from internal networks; and (h) Filter unauthorized control plane traffic from external networks.

๐Ÿ’ผ SC-7(4) External Telecommunications Services (M)(H)

(a) Implement a managed interface for each external telecommunication service; (b) Establish a traffic flow policy for each managed interface; (c) Protect the confidentiality and integrity of the information being transmitted across each interface; (d) Document each exception to the traffic flow policy with a supporting mission or business need and duration of that need; (e) Review exceptions to the traffic flow policy [FedRAMP Assignment: at least every one hundred and eighty (180) days or whenever there is a change in the threat environment that warrants a review of the exceptions] and remove exceptions that are no longer supported by an explicit mission or business need; (f) Prevent unauthorized exchange of control plane traffic with external networks; (g) Publish information to enable remote networks to detect unauthorized control plane traffic from internal networks; and (h) Filter unauthorized control plane traffic from external networks.

๐Ÿ’ผ SC-7(4) External Telecommunications Services (M)(H)

(a) Implement a managed interface for each external telecommunication service; (b) Establish a traffic flow policy for each managed interface; (c) Protect the confidentiality and integrity of the information being transmitted across each interface; (d) Document each exception to the traffic flow policy with a supporting mission or business need and duration of that need; (e) Review exceptions to the traffic flow policy [FedRAMP Assignment: at least every one hundred and eighty (180) days or whenever there is a change in the threat environment that warrants a review of the exceptions] and remove exceptions that are no longer supported by an explicit mission or business need; (f) Prevent unauthorized exchange of control plane traffic with external networks; (g) Publish information to enable remote networks to detect unauthorized control plane traffic from internal networks; and (h) Filter unauthorized control plane traffic from external networks.

๐Ÿ’ผ SC-7(5) Deny by Default โ€” Allow by Exception (M)(H)

Deny network communications traffic by default and allow network communications traffic by exception [Selection (one-or-more): at managed interfaces; for [FedRAMP Assignment: any systems]]. **SC-7 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: For JAB Authorization, CSPs shall include details of this control in their Architecture Briefing.

๐Ÿ’ผ SC-7(5) Deny by Default โ€” Allow by Exception (M)(H)

Deny network communications traffic by default and allow network communications traffic by exception [Selection (one-or-more): at managed interfaces; for [FedRAMP Assignment: any systems]]. **SC-7 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: For JAB Authorization, CSPs shall include details of this control in their Architecture Briefing.
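
One hedged, AWS-flavored spot check for the deny-by-default posture above is to list security group ingress rules that are open to the whole internet; under SC-7(5) each such rule should be an explicitly justified exception. The sketch assumes boto3 and valid credentials, and the region is a placeholder.

```python
"""Spot-check sketch (assumes boto3 and valid credentials): list security group
ingress rules open to 0.0.0.0/0, which under a deny-by-default posture should be
explicitly justified exceptions. The region below is a placeholder."""
import boto3

def world_open_ingress(region: str = "us-east-1") -> list[tuple[str, str]]:
    """Return (group_id, port_range) pairs for rules open to the internet."""
    ec2 = boto3.client("ec2", region_name=region)
    findings = []
    paginator = ec2.get_paginator("describe_security_groups")
    for page in paginator.paginate():
        for group in page["SecurityGroups"]:
            for perm in group["IpPermissions"]:
                if any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])):
                    ports = f"{perm.get('FromPort', 'all')}-{perm.get('ToPort', 'all')}"
                    findings.append((group["GroupId"], ports))
    return findings

if __name__ == "__main__":
    for group_id, ports in world_open_ingress():
        print(f"{group_id} allows 0.0.0.0/0 on ports {ports}")
```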

๐Ÿ’ผ SC-8 (1) CRYPTOGRAPHIC OR ALTERNATE PHYSICAL PROTECTION

The information system implements cryptographic mechanisms to [Selection (one or more): prevent unauthorized disclosure of information; detect changes to information] during transmission unless otherwise protected by [Assignment: organization-defined alternative physical safeguards].

๐Ÿ’ผ SC-8 Transmission Confidentiality and Integrity (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of transmitted information. **SC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: For each instance of data in transit, confidentiality AND integrity should be through cryptography as specified in SC-8 (1), physical means as specified in SC-8 (5), or in combination. For clarity, this control applies to all data in transit. Examples include the following data flows: - Crossing the system boundary - Between compute instances - including containers - From a compute instance to storage - Replication between availability zones - Transmission of backups to storage - From a load balancer to a compute instance - Flows from management tools required for their work - e.g. log collection, scanning, etc. The following applies only when choosing SC-8 (5) in lieu of SC-8 (1) FedRAMP-Defined Assignment / Selection Parameters SC-8 (5)-1 [a hardened or alarmed carrier Protective Distribution System (PDS) when outside of Controlled Access Area (CAA)] SC-8 (5)-2 [prevent unauthorized disclosure of information AND detect changes to information] **Guidance**: SC-8 (5) applies when physical protection has been selected as the method to protect confidentiality and integrity. For physical protection, data in transit must be in either a Controlled Access Area (CAA), or a Hardened or alarmed PDS. **Hardened or alarmed PDS**: Shall be as defined in SECTION X - CATEGORY 2 PDS INSTALLATION GUIDANCE of CNSSI No.7003, titled PROTECTED DISTRIBUTION SYSTEMS (PDS). Per the CNSSI No. 7003 Section VIII, PDS must originate and terminate in a Controlled Access Area (CAA). **Controlled Access Area (CAA)**: Data will be considered physically protected, and in a CAA if it meets Section 2.3 of the DHS's Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies. CSPs can meet Section 2.3 of the DHS' recommended practice by satisfactory implementation of the following controls PE-2 (1), PE-2 (2), PE-2 (3), PE-3 (2), PE-3 (3), PE-6 (2), and PE-6 (3). Note: When selecting SC-8 (5), the above SC-8(5), and the above referenced PE controls must be added to the SSP. CNSSI No.7003 can be accessed here: <https://www.dcsa.mil/Portals/91/documents/ctp/nao/CNSSI_7003_PDS_September_2015.pdf>. DHS Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies can be accessed here: <https://us-cert.cisa.gov/sites/default/files/FactSheets/NCCIC%20ICS_FactSheet_Defense_in_Depth_Strategies_S508C.pdf>

๐Ÿ’ผ SC-8 Transmission Confidentiality and Integrity (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of transmitted information. **SC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: For each instance of data in transit, confidentiality AND integrity should be through cryptography as specified in SC-8 (1), physical means as specified in SC-8 (5), or in combination. For clarity, this control applies to all data in transit. Examples include the following data flows: - Crossing the system boundary - Between compute instances - including containers - From a compute instance to storage - Replication between availability zones - Transmission of backups to storage - From a load balancer to a compute instance - Flows from management tools required for their work - e.g. log collection, scanning, etc. The following applies only when choosing SC-8 (5) in lieu of SC-8 (1) FedRAMP-Defined Assignment / Selection Parameters SC-8 (5)-1 [a hardened or alarmed carrier Protective Distribution System (PDS) when outside of Controlled Access Area (CAA)] SC-8 (5)-2 [prevent unauthorized disclosure of information AND detect changes to information] **Guidance**: SC-8 (5) applies when physical protection has been selected as the method to protect confidentiality and integrity. For physical protection, data in transit must be in either a Controlled Access Area (CAA), or a Hardened or alarmed PDS. **Hardened or alarmed PDS**: Shall be as defined in SECTION X - CATEGORY 2 PDS INSTALLATION GUIDANCE of CNSSI No.7003, titled PROTECTED DISTRIBUTION SYSTEMS (PDS). Per the CNSSI No. 7003 Section VIII, PDS must originate and terminate in a Controlled Access Area (CAA). **Controlled Access Area (CAA)**: Data will be considered physically protected, and in a CAA if it meets Section 2.3 of the DHS's Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies. CSPs can meet Section 2.3 of the DHS' recommended practice by satisfactory implementation of the following controls PE-2 (1), PE-2 (2), PE-2 (3), PE-3 (2), PE-3 (3), PE-6 (2), and PE-6 (3). Note: When selecting SC-8 (5), the above SC-8(5), and the above referenced PE controls must be added to the SSP. CNSSI No.7003 can be accessed here: <https://www.dcsa.mil/Portals/91/documents/ctp/nao/CNSSI_7003_PDS_September_2015.pdf>. DHS Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies can be accessed here: <https://us-cert.cisa.gov/sites/default/files/FactSheets/NCCIC%20ICS_FactSheet_Defense_in_Depth_Strategies_S508C.pdf>

๐Ÿ’ผ SC-8 Transmission Confidentiality and Integrity (L)(M)(H)

Protect the [FedRAMP Assignment: confidentiality AND integrity] of transmitted information. **SC-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: For each instance of data in transit, confidentiality AND integrity should be through cryptography as specified in SC-8 (1), physical means as specified in SC-8 (5), or in combination. For clarity, this control applies to all data in transit. Examples include the following data flows: - Crossing the system boundary - Between compute instances - including containers - From a compute instance to storage - Replication between availability zones - Transmission of backups to storage - From a load balancer to a compute instance - Flows from management tools required for their work - e.g. log collection, scanning, etc. The following applies only when choosing SC-8 (5) in lieu of SC-8 (1) FedRAMP-Defined Assignment / Selection Parameters SC-8 (5)-1 [a hardened or alarmed carrier Protective Distribution System (PDS) when outside of Controlled Access Area (CAA)] SC-8 (5)-2 [prevent unauthorized disclosure of information AND detect changes to information] **Guidance**: SC-8 (5) applies when physical protection has been selected as the method to protect confidentiality and integrity. For physical protection, data in transit must be in either a Controlled Access Area (CAA), or a Hardened or alarmed PDS. **Hardened or alarmed PDS**: Shall be as defined in SECTION X - CATEGORY 2 PDS INSTALLATION GUIDANCE of CNSSI No.7003, titled PROTECTED DISTRIBUTION SYSTEMS (PDS). Per the CNSSI No. 7003 Section VIII, PDS must originate and terminate in a Controlled Access Area (CAA). **Controlled Access Area (CAA)**: Data will be considered physically protected, and in a CAA if it meets Section 2.3 of the DHS's Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies. CSPs can meet Section 2.3 of the DHS' recommended practice by satisfactory implementation of the following controls PE-2 (1), PE-2 (2), PE-2 (3), PE-3 (2), PE-3 (3), PE-6 (2), and PE-6 (3). Note: When selecting SC-8 (5), the above SC-8(5), and the above referenced PE controls must be added to the SSP. CNSSI No.7003 can be accessed here: <https://www.dcsa.mil/Portals/91/documents/ctp/nao/CNSSI_7003_PDS_September_2015.pdf>. DHS Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies can be accessed here: <https://us-cert.cisa.gov/sites/default/files/FactSheets/NCCIC%20ICS_FactSheet_Defense_in_Depth_Strategies_S508C.pdf>

๐Ÿ’ผ SC-8(1) Cryptographic Protection (L)(M)(H)

Implement cryptographic mechanisms to [FedRAMP Assignment: prevent unauthorized disclosure of information AND detect changes to information] during transmission. **SC-8 (1) Additional FedRAMP Requirements and Guidance:** **Guidance**: See M-22-09, including "Agencies encrypt all DNS requests and HTTP traffic within their environment". SC-8 (1) applies when encryption has been selected as the method to protect confidentiality and integrity. Otherwise refer to SC-8 (5). SC-8 (1) is strongly encouraged. **Guidance**: Note that this enhancement requires the use of cryptography which must be compliant with Federal requirements and utilize FIPS validated or NSA approved cryptography (see SC-13). **Guidance**: When leveraging encryption from the underlying IaaS/PaaS: While some IaaS/PaaS services provide encryption by default, many require encryption to be configured, and enabled by the customer. The CSP has the responsibility to verify encryption is properly configured. **Requirement**: Please ensure SSP Section 10.3 Cryptographic Modules Implemented for Data At Rest (DAR) and Data In Transit (DIT) is fully populated for reference in this control.


๐Ÿ’ผ SEC01-BP01 Separate workloads using accounts

Establish common guardrails and isolation between environments (such as production, development, and test) and workloads through a multi-account strategy. Account-level separation is strongly recommended, as it provides a strong isolation boundary for security, billing, and access. **Desired outcome** An account structure that isolates cloud operations, unrelated workloads, and environments into separate accounts, increasing security across the cloud infrastructure. **Common anti-patterns** - Placing multiple unrelated workloads with different data sensitivity levels into the same account. - Poorly defined organizational unit (OU) structure. **Benefits of establishing this best practice** - Decreased scope of impact if a workload is inadvertently accessed. - Central governance of access to AWS services, resources, and Regions. - Maintained security of the cloud infrastructure with policies and centralized administration of security services. - Automated account creation and maintenance process. - Centralized auditing of your infrastructure for compliance and regulatory requirements. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance AWS accounts provide a security isolation boundary between workloads or resources that operate at different sensitivity levels. AWS provides tools to manage your cloud workloads at scale through a multi-account strategy that leverages this isolation boundary. For guidance on the concepts, patterns, and implementation of a multi-account strategy on AWS, see Organizing Your AWS Environment Using Multiple Accounts. When you have multiple AWS accounts under central management, your accounts should be organized into a hierarchy defined by layers of organizational units (OUs). Security controls can then be organized and applied to the OUs and member accounts, establishing consistent preventative controls on member accounts in the organization. The security controls are inherited, allowing you to filter permissions available to member accounts located at lower levels of an OU hierarchy. A good design takes advantage of this inheritance to reduce the number and complexity of security policies required to achieve the desired security controls for each member account. AWS Organizations and AWS Control Tower are two services that you can use to implement and manage this multi-account structure in your AWS environment. AWS Organizations allows you to organize accounts into a hierarchy defined by one or more layers of OUs, with each OU containing a number of member accounts. Service control policies (SCPs) allow the organization administrator to establish granular preventative controls on member accounts, and AWS Config can be used to establish proactive and detective controls on member accounts. Many AWS services integrate with AWS Organizations to provide delegated administrative controls and to perform service-specific tasks across all member accounts in the organization. Layered on top of AWS Organizations, AWS Control Tower provides a one-click best practices setup for a multi-account AWS environment with a landing zone. The landing zone is the entry point to the multi-account environment established by Control Tower. Control Tower provides several benefits over AWS Organizations alone. Three benefits that provide improved account governance are: - Integrated mandatory security controls that are automatically applied to accounts admitted into the organization. - Optional controls that can be turned on or off for a given set of OUs.
- AWS Control Tower Account Factory provides automated deployment of accounts containing pre-approved baselines and configuration options inside your organization. ### Implementation steps 1. Design an organizational unit structure: A properly designed organizational unit structure reduces the management burden required to create and maintain service control policies and other security controls. Your organizational unit structure should be aligned with your business needs, data sensitivity, and workload structure. 2. Create a landing zone for your multi-account environment: A landing zone provides a consistent security and infrastructure foundation from which your organization can quickly develop, launch, and deploy workloads. You can use a custom-built landing zone or AWS Control Tower to orchestrate your environment. 3. Establish guardrails: Implement consistent security guardrails for your environment through your landing zone. AWS Control Tower provides a list of mandatory and optional controls that can be deployed. Mandatory controls are automatically deployed when implementing Control Tower. Review the list of highly recommended and optional controls, and implement controls that are appropriate to your needs. 4. Restrict access to newly added Regions: For new AWS Regions, IAM resources such as users and roles are only propagated to the Regions that you specify. This action can be performed through the console when using Control Tower, or by adjusting IAM permission policies in AWS Organizations. 5. Consider AWS CloudFormation StackSets: StackSets help you deploy resources including IAM policies, roles, and groups into different AWS accounts and Regions from an approved template.
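
The following is a minimal sketch of steps 1, 3, and 4 using boto3 and AWS Organizations, assuming it is run from the management account with the SCP policy type enabled. The OU name, SCP name, and the approved Region list are illustrative placeholders, not prescribed values.

```python
"""Sketch: create a workloads OU and attach a Region-restriction SCP (assumed names)."""
import json

import boto3

org = boto3.client("organizations")

# The root sits at the top of the OU hierarchy; every organization has at least one.
root_id = org.list_roots()["Roots"][0]["Id"]

# Create an OU to hold workload accounts (placeholder name).
ou = org.create_organizational_unit(ParentId=root_id, Name="Workloads")
ou_id = ou["OrganizationalUnit"]["Id"]

# An SCP that denies actions outside an approved Region list, exempting a few
# global services; adjust both lists to match your own standards.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["us-east-1", "eu-west-1"]}
            },
        }
    ],
}

policy = org.create_policy(
    Name="restrict-regions",  # placeholder policy name
    Description="Deny use of Regions outside the approved list",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp_document),
)

# Attach the SCP to the OU; member accounts placed under it inherit the control.
org.attach_policy(PolicyId=policy["Policy"]["PolicySummary"]["Id"], TargetId=ou_id)
```

If you use AWS Control Tower, an equivalent Region deny control can be applied as a managed control instead of hand-rolling the SCP.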

๐Ÿ’ผ SEC01-BP02 Secure account root user and properties

The root user is the most privileged user in an AWS account, with full administrative access to all resources within the account, and in some cases cannot be constrained by security policies. Deactivating programmatic access to the root user, establishing appropriate controls for the root user, and avoiding routine use of the root user help reduce the risk of inadvertent exposure of the root credentials and subsequent compromise of the cloud environment. **Desired outcome** Securing the root user helps reduce the chance that accidental or intentional damage can occur through the misuse of root user credentials. Establishing detective controls can also alert the appropriate personnel when actions are taken using the root user. **Common anti-patterns** - Using the root user for tasks other than the few that require root user credentials. - Neglecting to test contingency plans on a regular basis to verify the functioning of critical infrastructure, processes, and personnel during an emergency. - Only considering the typical account login flow and neglecting to consider or test alternate account recovery methods. - Not handling DNS, email servers, and telephone providers as part of the critical security perimeter, as these are used in the account recovery flow. **Benefits of establishing this best practice** Securing access to the root user builds confidence that actions in your account are controlled and audited. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance AWS offers many tools to help secure your account. However, because some of these measures are not turned on by default, you must take direct action to implement them. Consider these recommendations as foundational steps to securing your AWS account. As you implement these steps, it's important that you build a process to continuously assess and monitor the security controls. When you first create an AWS account, you begin with one identity that has complete access to all AWS services and resources in the account. This identity is called the AWS account root user. You can sign in as the root user using the email address and password that you used to create the account. Due to the elevated access granted to the AWS root user, you must limit use of the AWS root user to perform tasks that specifically require it. The root user login credentials must be closely guarded, and multi-factor authentication (MFA) should always be used for the AWS account root user. In addition to the normal authentication flow to log into your root user using a username, password, and multi-factor authentication (MFA) device, there are account recovery flows to log into your AWS account root user given access to the email address and phone number associated with your account. Therefore, it is equally important to secure the root user email account where the recovery email is sent and the phone number associated with the account. Also consider potential circular dependencies where the email address associated with the root user is hosted on email servers or domain name service (DNS) resources from the same AWS account. When using AWS Organizations, there are multiple AWS accounts, each of which has a root user. One account is designated as the management account and several layers of member accounts can then be added underneath the management account. Prioritize securing your management account's root user, then address your member account root users.
The strategy for securing your management account's root user can differ from that for your member account root users, and you can place preventative security controls on your member account root users. ### Implementation steps The following implementation steps are recommended to establish controls for the root user. Where applicable, recommendations are cross-referenced to CIS AWS Foundations Benchmark version 1.4.0. In addition to these steps, consult AWS best practice guidelines for securing your AWS account and resources. ### Preventative controls 1. Set up accurate contact information for the account. 1. This information is used for the lost password recovery flow, the lost MFA device account recovery flow, and critical security-related communications with your team. 2. Use an email address hosted by your corporate domain, preferably a distribution list, as the root user's email address. Using a distribution list rather than an individual's email account provides additional redundancy and continuity for access to the root account over long periods of time. 3. The phone number listed on the contact information should be a dedicated, secure phone for this purpose. The phone number should not be listed or shared with anyone. 2. Do not create access keys for the root user. If access keys exist, remove them (CIS 1.4). 1. Eliminate any long-lived programmatic credentials (access and secret keys) for the root user. 2. If root user access keys already exist, you should transition processes using those keys to use temporary access keys from an AWS Identity and Access Management (IAM) role, then delete the root user access keys. 3. Determine if you need to store credentials for the root user. 1. If you are using AWS Organizations to create new member accounts, the initial password for the root user on new member accounts is set to a random value that is not exposed to you. Consider using the password reset flow from your AWS Organizations management account to gain access to the member account if needed. 2. For standalone AWS accounts or the AWS Organizations management account, consider creating and securely storing credentials for the root user. Use MFA for the root user. 4. Use preventative controls for member account root users in AWS multi-account environments. 1. Consider using the *Disallow Creation of Root Access Keys for the Root User* preventative guardrail for member accounts. 2. Consider using the *Disallow Actions as a Root User* preventative guardrail for member accounts. 5. If you need credentials for the root user: 1. Use a complex password. 2. Turn on multi-factor authentication (MFA) for the root user, especially for AWS Organizations management (payer) accounts (CIS 1.5). 3. Consider hardware MFA devices for resiliency and security, as single-use devices can reduce the chances that the devices containing your MFA codes might be reused for other purposes. Verify that hardware MFA devices powered by a battery are replaced regularly. (CIS 1.6) - To configure MFA for the root user, follow the instructions for creating either a virtual MFA or hardware MFA device. 4. Consider enrolling multiple MFA devices for backup. Up to 8 MFA devices are allowed per account. - Note that enrolling more than one MFA device for the root user automatically turns off the flow for recovering your account if the MFA device is lost. 5. Store the password securely, and consider circular dependencies if storing the password electronically.
Don't store the password in a way that would require access to the same AWS account to obtain it. 6. Optional: Consider establishing a periodic password rotation schedule for the root user. - Credential management best practices depend on your regulatory and policy requirements. Root users protected by MFA are not reliant on the password as a single factor of authentication. - Changing the root user password on a periodic basis reduces the risk that an inadvertently exposed password can be misused. ### Detective controls - Create alarms to detect use of the root credentials (CIS 1.7). Amazon GuardDuty can monitor and alert on root user API credential usage through the *RootCredentialUsage* finding. - Evaluate and implement the detective controls included in the AWS Well-Architected Security Pillar conformance pack for AWS Config, or, if using AWS Control Tower, the strongly recommended controls available inside Control Tower. ### Operational guidance - Determine who in the organization should have access to the root user credentials. - Use a two-person rule so that no one individual has access to all necessary credentials and MFA to obtain root user access. - Verify that the organization, and not a single individual, maintains control over the phone number and email alias associated with the account (which are used for password reset and MFA reset flow). - Use the root user only by exception (CIS 1.7). - The AWS root user must not be used for everyday tasks, even administrative ones. Only log in as the root user to perform AWS tasks that require the root user. All other actions should be performed by other users assuming appropriate roles. - Periodically check that access to the root user is functioning so that procedures are tested prior to an emergency situation requiring the use of the root user credentials. - Periodically check that the email address associated with the account and those listed under Alternate Contacts work. Monitor these email inboxes for security notifications you might receive from <abuse@amazon.com>. Also ensure any phone numbers associated with the account are working. - Prepare incident response procedures to respond to root account misuse. Refer to the AWS Security Incident Response Guide and the best practices in the Incident Response section of the Security Pillar whitepaper for more information on building an incident response strategy for your AWS account.
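
A minimal sketch of a detective check for two of the controls above, using boto3 and the IAM account summary. It assumes credentials with permission to call `iam:GetAccountSummary` and only prints findings; wiring the result into alerting is left to your own tooling.

```python
"""Sketch: verify root user hygiene (MFA enabled, no root access keys)."""
import boto3

iam = boto3.client("iam")
summary = iam.get_account_summary()["SummaryMap"]

# 1 means the root user has an MFA device enabled; 0 means it does not.
if summary.get("AccountMFAEnabled", 0) != 1:
    print("FAIL: root user does not have MFA enabled")

# 1 means root access keys exist and should be removed; 0 means none exist.
if summary.get("AccountAccessKeysPresent", 0) != 0:
    print("FAIL: root user access keys are present and should be deleted")
```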

๐Ÿ’ผ SEC01-BP03 Identify and validate control objectives

Based on your compliance requirements and risks identified from your threat model, derive and validate the control objectives and controls that you need to apply to your workload. Ongoing validation of control objectives and controls helps you measure the effectiveness of risk mitigation. **Desired outcome** - The security control objectives of your business are well-defined and aligned to your compliance requirements. - Controls are implemented and enforced through automation and policy and are continually evaluated for their effectiveness in achieving your objectives. - Evidence of effectiveness at both a point in time and over a period of time is readily reportable to auditors. **Common anti-patterns** - Regulatory requirements, market expectations, and industry standards for assurable security are not well-understood for your business. - Your cybersecurity frameworks and control objectives are misaligned to the requirements of your business. - The implementation of controls does not strongly align to your control objectives in a measurable way. - You do not use automation to report on the effectiveness of your controls. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance There are many common cybersecurity frameworks that can form the basis for your security control objectives. Consider the regulatory requirements, market expectations, and industry standards for your business to determine which frameworks best support your needs. Examples include AICPA SOC 2, HITRUST, PCI-DSS, ISO 27001, and NIST SP 800-53. For the control objectives you identify, understand how the AWS services you consume help you to achieve those objectives. Use AWS Artifact to find documentation and reports aligned to your target frameworks that describe the scope of responsibility covered by AWS and guidance for the remaining scope that is your responsibility. For further service-specific guidance on how services align to various framework control statements, see AWS Customer Compliance Guides. As you define the controls that achieve your objectives, codify enforcement using preventative controls, and automate mitigations using detective controls. Help prevent non-compliant resource configurations and actions across your AWS Organizations environment using service control policies (SCPs). Implement rules in AWS Config to monitor and report on non-compliant resources, then switch rules to an enforcement model once confident in their behavior. To deploy sets of pre-defined and managed rules that align to your cybersecurity frameworks, evaluate the use of AWS Security Hub standards as your first option. The AWS Foundational Security Best Practices (FSBP) standard and the CIS AWS Foundations Benchmark are good starting points with controls that align to many objectives that are shared across multiple standard frameworks. Where Security Hub does not intrinsically have the control detections desired, it can be complemented using AWS Config conformance packs. Use APN Partner Bundles recommended by the AWS Global Security and Compliance Acceleration (GSCA) team to get assistance from security advisors, consulting agencies, evidence collection and reporting systems, auditors, and other complementary services when required. ### Implementation steps - Evaluate common cybersecurity frameworks, and align your control objectives to the ones chosen. - Obtain relevant documentation on guidance and responsibilities for your framework using AWS Artifact.
Understand which parts of compliance fall on the AWS side of the shared responsibility model and which parts are your responsibility. - Use SCPs, resource policies, role trust policies, and other guardrails to prevent non-compliant resource configurations and actions. - Evaluate deploying Security Hub standards and AWS Config conformance packs that align to your control objectives.
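
A minimal sketch of the last step with boto3, assuming Security Hub is already enabled in the account and Region and the caller has the relevant `securityhub` permissions. The standard is located by name rather than by a hard-coded ARN.

```python
"""Sketch: enable the FSBP Security Hub standard if it is not already subscribed."""
import boto3

securityhub = boto3.client("securityhub")

# Find the AWS Foundational Security Best Practices standard available in this Region.
standards = securityhub.describe_standards()["Standards"]
fsbp = next(s for s in standards if "Foundational Security Best Practices" in s["Name"])

# Enable it only if it is not already part of the enabled standards.
enabled_arns = {
    s["StandardsArn"]
    for s in securityhub.get_enabled_standards()["StandardsSubscriptions"]
}
if fsbp["StandardsArn"] not in enabled_arns:
    securityhub.batch_enable_standards(
        StandardsSubscriptionRequests=[{"StandardsArn": fsbp["StandardsArn"]}]
    )
```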

๐Ÿ’ผ SEC01-BP04 Stay up to date with security threats and recommendations

Stay up to date with the latest threats and mitigations by monitoring industry threat intelligence publications and data feeds for updates. Evaluate managed service offerings that automatically update based on the latest threat data. **Desired outcome** - You stay informed as industry publications are updated with the latest threats and recommendations. - You use automation to detect potential vulnerabilities and exposures as you identify new threats. - You take mitigating action against these threats. - You adopt AWS services that automatically update with the latest threat intelligence. **Common anti-patterns** - Not having a reliable and repeatable mechanism to stay informed of the latest threat intelligence. - Maintaining manual inventory of your technology portfolio, workloads, and dependencies that require human review for potential vulnerabilities and exposures. - Not having mechanisms in place to update your workloads and dependencies to the latest versions available that provide known threat mitigations. **Benefits of establishing this best practice** Using threat intelligence sources to stay up to date reduces the risk of missing out on important changes to the threat landscape that can impact your business. Having automation in place to scan, detect, and remediate where potential vulnerabilities or exposures exist in your workloads and their dependencies can help you mitigate risks quickly and predictably, compared to manual alternatives. This helps control time and costs related to vulnerability mitigation. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Review trusted threat intelligence publications to stay on top of the threat landscape. Consult the MITRE ATT&CK knowledge base for documentation on known adversarial tactics, techniques, and procedures (TTPs). Review MITRE's Common Vulnerabilities and Exposures (CVE) list to stay informed on known vulnerabilities in products you rely on. Understand critical risks to web applications with the Open Worldwide Application Security Project (OWASP)'s popular OWASP Top 10 project. Stay up to date on AWS security events and recommended remediation steps with AWS Security Bulletins for CVEs. To reduce your overall effort and overhead of staying up to date, consider using AWS services that automatically incorporate new threat intelligence over time. For example, Amazon GuardDuty stays up to date with industry threat intelligence for detecting anomalous behaviors and threat signatures within your accounts. Amazon Inspector automatically keeps a database of the CVEs it uses for its continuous scanning features up to date. Both AWS WAF and AWS Shield Advanced provide managed rule groups that are updated automatically as new threats emerge. Review the Well-Architected operational excellence pillar for automated fleet management and patching. ### Implementation steps - Subscribe to updates for threat intelligence publications that are relevant to your business and industry. Subscribe to the AWS Security Bulletins. - Consider adopting services that incorporate new threat intelligence automatically, such as Amazon GuardDuty and Amazon Inspector. - Deploy a fleet management and patching strategy that aligns with the best practices of the Well-Architected Operational Excellence Pillar.
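
As a small illustration of the second step, the sketch below enables Amazon GuardDuty with boto3 if no detector exists yet, so the account benefits from its automatically updated threat intelligence. It assumes `guardduty` permissions in the target Region; the publishing frequency is an illustrative choice.

```python
"""Sketch: confirm Amazon GuardDuty is enabled in the current Region."""
import boto3

guardduty = boto3.client("guardduty")

detectors = guardduty.list_detectors()["DetectorIds"]
if detectors:
    print(f"GuardDuty already enabled, detector: {detectors[0]}")
else:
    detector = guardduty.create_detector(
        Enable=True,
        FindingPublishingFrequency="SIX_HOURS",  # how often findings are exported
    )
    print(f"GuardDuty enabled, detector: {detector['DetectorId']}")
```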

๐Ÿ’ผ SEC01-BP05 Reduce security management scope

Determine if you can reduce your security scope by using AWS services that shift management of certain controls to AWS (managed services). These services can help reduce your security maintenance tasks, such as infrastructure provisioning, software setup, patching, or backups. **Desired outcome** - You consider the scope of your security management when selecting AWS services for your workload. - The cost of management overhead and maintenance tasks (the total cost of ownership, or TCO) is weighed against the cost of the services you select, in addition to other Well-Architected considerations. - You incorporate AWS control and compliance documentation into your control evaluation and verification procedures. **Common anti-patterns** - Deploying workloads without thoroughly understanding the shared responsibility model for the services you select. - Hosting databases and other technologies on virtual machines without having evaluated a managed service equivalent. - Not including security management tasks in the total cost of ownership of hosting technologies on virtual machines when compared to managed service options. **Benefits of establishing this best practice** Using managed services can reduce your overall burden of managing operational security controls, which can reduce your security risks and total cost of ownership. Time that would otherwise be spent on certain security tasks can be reinvested into tasks that provide more value to your business. Managed services can also reduce the scope of your compliance requirements by shifting some control requirements to AWS. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance There are multiple ways you can integrate the components of your workload on AWS. Installing and running technologies on Amazon EC2 instances often requires you to take on the largest share of the overall security responsibility. To help reduce the burden of operating certain controls, identify AWS managed services that reduce the scope of your side of the shared responsibility model and understand how you can use them in your existing architecture. Examples include using the Amazon Relational Database Service (Amazon RDS) for deploying databases, Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS) for orchestrating containers, or using serverless options. When building new applications, think through which services can help reduce time and cost when it comes to implementing and managing security controls. Compliance requirements can also be a factor when selecting services. Managed services can shift the compliance of some requirements to AWS. Discuss with your compliance team their degree of comfort with auditing the aspects of services you operate and manage, and with accepting control statements in relevant AWS audit reports. You can provide the audit artifacts found in AWS Artifact to your auditors or regulators as evidence of AWS security controls. You can also use the responsibility guidance provided by some of the AWS audit artifacts to design your architecture, along with the AWS Customer Compliance Guides. This guidance helps determine the additional security controls you should put in place in order to support the specific use cases of your system.
When using managed services, be familiar with the process of updating their resources to newer versions (for example, updating the version of a database managed by Amazon RDS, or a programming language runtime for an AWS Lambda function). While the managed service may perform this operation for you, configuring the timing of the update and understanding the impact on your operations remains your responsibility. Tools like AWS Health can help you track and manage these updates throughout your environments. ### Implementation steps 1. Evaluate the components of your workload that can be replaced with a managed service. 1. If you are migrating a workload to AWS, consider the reduced management (time and expense) and reduction of risk when you assess if you should rehost, refactor, replatform, rebuild, or replace your workload. Sometimes additional investment at the start of a migration can have significant savings in the long run. 2. Consider implementing managed services, like Amazon RDS, instead of installing and managing your own technology deployments. 3. Use the responsibility guidance in AWS Artifact to help determine the security controls you should put in place for your workload. 4. Keep an inventory of resources in use, and stay up-to-date with new services and approaches to identify new opportunities to reduce scope.
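
A minimal sketch of that responsibility for Amazon RDS: listing pending maintenance actions with boto3 so you can plan the update window yourself. It assumes credentials with `rds:DescribePendingMaintenanceActions` in the target Region.

```python
"""Sketch: list pending maintenance actions for Amazon RDS resources."""
import boto3

rds = boto3.client("rds")

response = rds.describe_pending_maintenance_actions()
for resource in response["PendingMaintenanceActions"]:
    for action in resource["PendingMaintenanceActionDetails"]:
        # Each action includes what will change and, when scheduled, when AWS
        # will apply it automatically if you take no action.
        print(
            f"{resource['ResourceIdentifier']}: {action['Action']} "
            f"(auto-applied after {action.get('AutoAppliedAfterDate', 'not scheduled')})"
        )
```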

๐Ÿ’ผ SEC01-BP06 Automate deployment of standard security controls

Apply modern DevOps practices as you develop and deploy security controls that are standard across your AWS environments. Define standard security controls and configurations using Infrastructure as Code (IaC) templates, capture changes in a version control system, test changes as part of a CI/CD pipeline, and automate the deployment of changes to your AWS environments. **Desired outcome** - IaC templates capture standardized security controls and are committed to a version control system. - CI/CD pipelines are in place that detect changes and automate testing and deploying changes to your AWS environments. - Guardrails are in place to detect and alert on misconfigurations in templates before proceeding to deployment. - Workloads are deployed into environments where standard controls are in place. - Teams have access to deploy approved service configurations through a self-service mechanism. - Secure backup and recovery strategies are in place for control configurations, scripts, and related data. **Common anti-patterns** - Making changes to your standard security controls manually, through a web console or command-line interface. - Relying on individual workload teams to manually implement the controls a central team defines. - Relying on a central security team to deploy workload-level controls at the request of a workload team. - Allowing the same individuals or teams to develop, test, and deploy security control automation scripts without proper separation of duties or checks and balances. **Benefits of establishing this best practice** Using templates to define your standard security controls allows you to track and compare changes over time using a version control system. Using automation to test and deploy changes creates standardization and predictability, increasing the chances of a successful deployment and reducing manual repetitive tasks. Providing a self-serve mechanism for workload teams to deploy approved services and configurations reduces the risk of misconfiguration and misuse. This also helps them to incorporate controls earlier in the development process. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance When following the practices described in SEC01-BP01 Separate workloads using accounts, you will end up with multiple AWS accounts for different environments that you manage using AWS Organizations. While each of these environments and workloads may need distinct security controls, you can standardize some security controls across your organization. Examples include integrating centralized identity providers, defining networks and firewalls, and configuring standard locations for storing and analyzing logs. In the same way you can use infrastructure as code (IaC) to apply the same rigor of application code development to infrastructure provisioning, you can use IaC to define and deploy your standard security controls as well. Wherever possible, define your security controls in a declarative way, such as in AWS CloudFormation, and store them in a source control system. Use DevOps practices to automate the deployment of your controls for more predictable releases, automate testing using tools like AWS CloudFormation Guard, and detect drift between your deployed controls and your desired configuration. You can use services such as AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy to construct a CI/CD pipeline.
Consider the guidance in Organizing Your AWS Environment Using Multiple Accounts to configure these services in their own accounts that are separate from other deployment pipelines. You can also define templates to standardize defining and deploying AWS accounts, services, and configurations. This technique allows a central security team to manage these definitions and provide them to workload teams through a self-service approach. One way to achieve this is by using Service Catalog, where you can publish templates as products that workload teams can incorporate into their own pipeline deployments. If you are using AWS Control Tower, some templates and controls are available as a starting point. Control Tower also provides the Account Factory capability, allowing workload teams to create new AWS accounts using the standards you define. This capability helps remove dependencies on a central team to approve and create new accounts when they are identified as needed by your workload teams. You may need these accounts to isolate different workload components based on reasons such as the function they serve, the sensitivity of data being processed, or their behavior. ### Implementation steps 1. Determine how you will store and maintain your templates in a version control system. 2. Create CI/CD pipelines to test and deploy your templates. Define tests to check for misconfigurations and that templates adhere to your company standards. 3. Build a catalog of standardized templates for workload teams to deploy AWS accounts and services according to your requirements. 4. Implement secure backup and recovery strategies for your control configurations, scripts, and related data.
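
A minimal sketch of deploying one standard control as IaC with boto3 and AWS CloudFormation. The inline template (a central logging bucket with default encryption and public access blocked) and the stack name are illustrative placeholders; in practice the template would come from your version-controlled repository and be deployed by your pipeline after policy checks such as AWS CloudFormation Guard.

```python
"""Sketch: validate and deploy a standard security control defined as a CloudFormation template."""
import json

import boto3

TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "CentralLogBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
                    ]
                },
                "PublicAccessBlockConfiguration": {
                    "BlockPublicAcls": True,
                    "BlockPublicPolicy": True,
                    "IgnorePublicAcls": True,
                    "RestrictPublicBuckets": True,
                },
            },
        }
    },
}

cfn = boto3.client("cloudformation")
template_body = json.dumps(TEMPLATE)
stack_name = "standard-logging-controls"  # placeholder stack name

cfn.validate_template(TemplateBody=template_body)  # basic syntax check before deploying
cfn.create_stack(StackName=stack_name, TemplateBody=template_body)
cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```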

๐Ÿ’ผ SEC01-BP07 Identify threats and prioritize mitigations using a threat model

Perform threat modeling to identify and maintain an up-to-date register of potential threats and associated mitigations for your workload. Prioritize your threats and adapt your security control mitigations to prevent, detect, and respond. Revisit and maintain this in the context of your workload and the evolving security landscape. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance **What is threat modeling?** "Threat modeling works to identify, communicate, and understand threats and mitigations within the context of protecting something of value." – The Open Web Application Security Project (OWASP) Application Threat Modeling **Why should you threat model?** Systems are complex, and are becoming increasingly complex and capable over time, delivering more business value and increased customer satisfaction and engagement. This means that IT design decisions need to account for an ever-increasing number of use cases. This complexity and number of use-case permutations typically makes unstructured approaches ineffective for finding and mitigating threats. Instead, you need a systematic approach to enumerate the potential threats to the system, and to devise mitigations and prioritize these mitigations to make sure that the limited resources of your organization have the maximum impact in improving the overall security posture of the system. Threat modeling is designed to provide this systematic approach, with the aim of finding and addressing issues early in the design process, when the mitigations have a low relative cost and effort compared to later in the lifecycle. This approach aligns with the industry principle of shift-left security. Ultimately, threat modeling integrates with an organization's risk management process and helps drive decisions on which controls to implement by using a threat-driven approach. **When should threat modeling be performed?** Start threat modeling as early as possible in the lifecycle of your workload. This gives you better flexibility on what to do with the threats you have identified. Much like software bugs, the earlier you identify threats, the more cost-effective it is to address them. A threat model is a living document and should continue to evolve as your workloads change. Revisit your threat models over time, including when there is a major change, a change in the threat landscape, or when you adopt a new feature or service. ### Implementation steps **How can we perform threat modeling?** There are many different ways to perform threat modeling. Much like programming languages, there are advantages and disadvantages to each, and you should choose the way that works best for you. One approach is to start with Shostack's 4 Question Frame for Threat Modeling, which poses open-ended questions to provide structure to your threat modeling exercise: 1. **What are we working on?** The purpose of this question is to help you understand and agree upon the system you are building and the details about that system that are relevant to security. Creating a model or diagram is the most popular way to answer this question, as it helps you to visualize what you are building, for example, using a data flow diagram. Writing down assumptions and important details about your system also helps you define what is in scope.
This allows everyone contributing to the threat model to focus on the same thing, and avoid time-consuming detours into out-of-scope topics (including out-of-date versions of your system). For example, if you are building a web application, it is probably not worth your time threat modeling the operating system trusted boot sequence for browser clients, as you have no ability to affect this with your design. 2. **What can go wrong?** This is where you identify threats to your system. Threats are accidental or intentional actions or events that have unwanted impacts and could affect the security of your system. Without a clear understanding of what could go wrong, you have no way of doing anything about it. There is no canonical list of what can go wrong. Creating this list requires brainstorming and collaboration between all of the individuals within your team and relevant personas involved in the threat modeling exercise. You can aid your brainstorming by using a model for identifying threats, such as STRIDE, which suggests different categories to evaluate: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. In addition, you might want to aid the brainstorming by reviewing existing lists and research for inspiration, including the OWASP Top 10, the HITRUST Threat Catalog, and your organization's own threat catalog. 3. **What are we going to do about it?** As was the case with the previous question, there is no canonical list of all possible mitigations. The inputs into this step are the identified threats, actors, and areas of improvement from the previous step. Security and compliance is a shared responsibility between you and AWS. It's important to understand that when you ask "What are we going to do about it?", you are also asking "Who is responsible for doing something about it?". Understanding the balance of responsibilities between you and AWS helps you scope your threat modeling exercise to the mitigations that are under your control, which are typically a combination of AWS service configuration options and your own system-specific mitigations. For the AWS portion of the shared responsibility, you will find that AWS services are in scope for many compliance programs. These programs help you to understand the robust controls in place at AWS to maintain security and compliance of the cloud. The audit reports from these programs are available for download for AWS customers from AWS Artifact. Regardless of which AWS services you are using, there's always an element of customer responsibility, and mitigations aligned to these responsibilities should be included in your threat model. For security control mitigations for the AWS services themselves, you want to consider implementing security controls across domains, including domains such as identity and access management (authentication and authorization), data protection (at rest and in transit), infrastructure security, logging, and monitoring. The documentation for each AWS service has a dedicated security chapter that provides guidance on the security controls to consider as mitigations. Importantly, consider the code that you are writing and its code dependencies, and think about the controls that you could put in place to address those threats. These controls could be things such as input validation, session handling, and bounds handling. Often, the majority of vulnerabilities are introduced in custom code, so focus on this area.
4. **Did we do a good job?** The aim is for your team and organization to improve both the quality of threat models and the velocity at which you are performing threat modeling over time. These improvements come from a combination of practice, learning, teaching, and reviewing. To go deeper and get hands-on, it's recommended that you and your team complete the Threat modeling the right way for builders training course or workshop. In addition, if you are looking for guidance on how to integrate threat modeling into your organization's application development lifecycle, see the How to approach threat modeling post on the AWS Security Blog. **Threat Composer** To aid and guide you in performing threat modeling, consider using the Threat Composer tool, which aims to reduce your time-to-value when threat modeling. The tool helps you do the following: - Write useful threat statements aligned to threat grammar that work in a natural non-linear workflow - Generate a human-readable threat model - Generate a machine-readable threat model to allow you to treat threat models as code - Quickly identify areas of quality and coverage improvement using the Insights Dashboard For further reference, visit Threat Composer and switch to the system-defined Example Workspace.
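
As a small illustration of treating threat models as code, the sketch below defines a threat-register entry in Python. It is an illustrative structure with placeholder field names and example values, not the Threat Composer schema.

```python
"""Sketch: a machine-readable threat register entry (illustrative structure only)."""
from dataclasses import dataclass, field


@dataclass
class Threat:
    statement: str          # threat statement following your chosen threat grammar
    stride_category: str    # Spoofing, Tampering, Repudiation, Information Disclosure,
                            # Denial of Service, or Elevation of Privilege
    priority: str           # e.g. High / Medium / Low, driven by your risk process
    mitigations: list[str] = field(default_factory=list)
    owner: str = "unassigned"


# Example register with a single hypothetical entry.
register = [
    Threat(
        statement="A spoofed client bypasses authentication to the orders API",
        stride_category="Spoofing",
        priority="High",
        mitigations=["Require signed requests", "Enforce MFA for operator access"],
        owner="platform-team",
    ),
]
```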

๐Ÿ’ผ SEC01-BP08 Evaluate and implement new security services and features regularly

Evaluate and implement security services and features from AWS and AWS Partners that help you evolve the security posture of your workload. **Desired outcome** You have a standard practice in place that informs you of new features and services released by AWS and AWS Partners. You evaluate how these new capabilities influence the design of current and new controls for your environments and workloads. **Common anti-patterns** - You don't subscribe to AWS blogs and RSS feeds to learn of relevant new features and services quickly - You rely on news and updates about security services and features from second-hand sources - You don't encourage AWS users in your organization to stay informed on the latest updates **Benefits of establishing this best practice** When you stay on top of new security services and features, you can make informed decisions about the implementation of controls in your cloud environments and workloads. These sources help raise awareness of the evolving security landscape and how AWS services can be used to protect against new and emerging threats. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance AWS informs customers of new security services and features through several channels: - AWS What's New - AWS News Blog - AWS Security Blog - AWS Security Bulletins - AWS documentation overview You can subscribe to an AWS Daily Feature Updates topic using Amazon Simple Notification Service (Amazon SNS) for a comprehensive daily summary of updates. Some security services, such as Amazon GuardDuty and AWS Security Hub, provide their own SNS topics to stay informed about new standards, findings, and other updates for those particular services. New services and features are also announced and described in detail during conferences, events, and webinars conducted around the globe each year. Of particular note is the annual AWS re:Inforce security conference and the more general AWS re:Invent conference. The previously-mentioned AWS news channels share these conference announcements about security and other services, and you can view deep dive educational breakout sessions online at the AWS Events channel on YouTube. You can also ask your AWS account team about the latest security service updates and recommendations. You can reach out to your team through the Sales Support form if you do not have their direct contact information. Similarly, if you subscribed to AWS Enterprise Support, you will receive weekly updates from your Technical Account Manager (TAM) and can schedule a regular review meeting with them. ### Implementation steps 1. Subscribe to the various blogs and bulletins with your favorite RSS reader or to the Daily Features Updates SNS topic. 2. Evaluate which AWS events to attend to learn first-hand about new features and services. 3. Set up meetings with your AWS account team for any questions about updating security services and features. 4. Consider subscribing to Enterprise Support to have regular consultations with a Technical Account Manager (TAM).
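
A minimal sketch of the first step using boto3 and Amazon SNS. The topic ARN and email address are placeholders; substitute the ARN of the AWS Daily Feature Updates topic (or a service-specific topic such as the one Security Hub publishes) and a distribution list you control. The recipient must confirm the subscription from the email SNS sends.

```python
"""Sketch: subscribe a team mailbox to an SNS topic that announces AWS updates."""
import boto3

# Subscriptions are made in the topic's home Region (assumed here to be us-east-1).
sns = boto3.client("sns", region_name="us-east-1")

sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:aws-feature-updates",  # placeholder ARN
    Protocol="email",
    Endpoint="cloud-security-updates@example.com",  # placeholder distribution list
)
```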

๐Ÿ’ผ SEC02-BP01 Use strong sign-in mechanisms

Sign-ins (authentication using sign-in credentials) can present risks when not using mechanisms like multi-factor authentication (MFA), especially in situations where sign-in credentials have been inadvertently disclosed or are easily guessed. Use strong sign-in mechanisms to reduce these risks by requiring MFA and strong password policies. **Desired outcome** Reduce the risks of unintended access to credentials in AWS by using strong sign-in mechanisms for AWS Identity and Access Management (IAM) users, the AWS account root user, AWS IAM Identity Center, and third-party identity providers. This means requiring MFA, enforcing strong password policies, and detecting anomalous login behavior. **Common anti-patterns** - Not enforcing a strong password policy for your identities, including complex passwords and MFA. - Sharing the same credentials among different users. - Not using detective controls for suspicious sign-ins. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance There are several ways for human identities to sign in to AWS. It is an AWS best practice to rely on a centralized identity provider using federation (direct SAML 2.0 federation between AWS IAM and the centralized IdP or using AWS IAM Identity Center) when authenticating to AWS. In this case, establish a secure sign-in process with your identity provider or Microsoft Active Directory. When you first open an AWS account, you begin with an AWS account root user. You should only use the account root user to set up access for your users (and for tasks that require the root user). It's important to turn on multi-factor authentication (MFA) for the account root user immediately after opening your AWS account and to secure the root user using the AWS best practice guide. AWS IAM Identity Center is designed for workforce users, and you can create and manage user identities within the service and secure the sign-in process with MFA. Amazon Cognito, on the other hand, is designed for customer identity and access management (CIAM) and provides user pools and identity providers for external user identities in your applications. If you create users in AWS IAM Identity Center, secure the sign-in process in that service and turn on MFA. For external user identities in your applications, you can use Amazon Cognito user pools and secure the sign-in process in that service or through one of the supported identity providers in Amazon Cognito user pools. Additionally, for users in AWS IAM Identity Center, you can use AWS Verified Access to provide an additional layer of security by verifying the user's identity and device posture before they are granted access to AWS resources. If you are using AWS Identity and Access Management (IAM) users, secure the sign-in process using IAM. You can use both AWS IAM Identity Center and direct IAM federation simultaneously to manage access to AWS. You can use IAM federation to manage access to the AWS Management Console and services and IAM Identity Center to manage access to business applications like Amazon QuickSight or Amazon Q Business. Regardless of the sign-in method, it's critical to enforce a strong sign-in policy. ### Implementation steps The following are general strong sign-in recommendations. The actual settings you configure should be set by your company policy or use a standard like NIST 800-63. - Require MFA. It's an IAM best practice to require MFA for human identities and workloads.
Turning on MFA provides an additional layer of security requiring that users provide sign-in credentials and a one-time password (OTP) or a cryptographically verified and generated string from a hardware device. - Enforce a minimum password length, which is a primary factor in password strength. - Enforce password complexity to make passwords more difficult to guess. - Allow users to change their own passwords. - Create individual identities instead of shared credentials. By creating individual identities, you can give each user a unique set of security credentials. Individual users provide the ability to audit each user's activity. **IAM Identity Center recommendations:** - IAM Identity Center provides a predefined password policy when using the default directory that establishes password length, complexity, and reuse requirements. - Turn on MFA and configure the context-aware or always-on setting for MFA when the identity source is the default directory, AWS Managed Microsoft AD, or AD Connector. - Allow users to register their own MFA devices. **Amazon Cognito user pools directory recommendations:** - Configure the Password strength settings. - Require MFA for users. - Use the Amazon Cognito user pools advanced security settings for features like adaptive authentication which can block suspicious sign-ins. **IAM user recommendations:** - Ideally you are using IAM Identity Center or direct federation. However, you might have the need for IAM users. In that case, set a password policy for IAM users. You can use the password policy to define requirements such as minimum length or whether the password requires non-alphabetic characters. - Create an IAM policy to enforce MFA sign-in so that users are allowed to manage their own passwords and MFA devices.
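
A minimal sketch of the last IAM user recommendation using boto3. The specific values are illustrative; align them with your company policy or a standard such as NIST 800-63, and note that it requires `iam:UpdateAccountPasswordPolicy` permission.

```python
"""Sketch: set an account password policy for any remaining IAM users."""
import boto3

iam = boto3.client("iam")

iam.update_account_password_policy(
    MinimumPasswordLength=14,          # primary factor in password strength
    RequireUppercaseCharacters=True,
    RequireLowercaseCharacters=True,
    RequireNumbers=True,
    RequireSymbols=True,
    AllowUsersToChangePassword=True,   # let users rotate their own passwords
    PasswordReusePrevention=24,        # block reuse of recent passwords
)
```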

๐Ÿ’ผ SEC02-BP02 Use temporary credentials

When doing any type of authentication, it's best to use temporary credentials instead of long-term credentials to reduce or eliminate risks, such as credentials being inadvertently disclosed, shared, or stolen. **Desired outcome** To reduce the risk of long-term credentials, use temporary credentials wherever possible for both human and machine identities. Long-term credentials create many risks, such as exposure through uploads to public repositories. By using temporary credentials, you significantly reduce the chances of credentials becoming compromised. **Common anti-patterns** - Developers using long-term access keys from IAM users rather than obtaining temporary credentials from the CLI using federation. - Developers embedding long-term access keys in their code and uploading that code to public Git repositories. - Developers embedding long-term access keys in mobile apps that are then made available in app stores. - Users sharing long-term access keys with other users, or employees leaving the company with long-term access keys still in their possession. - Using long-term access keys for machine identities when temporary credentials could be used. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Use temporary security credentials instead of long-term credentials for all AWS API and CLI requests. API and CLI requests to AWS services must, in nearly every case, be signed using AWS access keys. These requests can be signed with either temporary or long-term credentials. The only time you should use long-term credentials, also known as long-term access keys, is if you are using an IAM user or the AWS account root user. When you federate to AWS or assume an IAM role through other methods, temporary credentials are generated. Even when you access the AWS Management Console using sign-in credentials, temporary credentials are generated for you to make calls to AWS services. There are few situations where you need long-term credentials and you can accomplish nearly all tasks using temporary credentials. Avoiding the use of long-term credentials in favor of temporary credentials should go hand in hand with a strategy of reducing the usage of IAM users in favor of federation and IAM roles. While IAM users have been used for both human and machine identities in the past, we now recommend not using them to avoid the risks in using long-term access keys. ### Implementation steps ### Human identities For workforce identities like employees, administrators, developers, and operators: - Rely on a centralized identity provider and require human users to use federation with an identity provider to access AWS using temporary credentials. Federation for your users can be done either with direct federation to each AWS account or using AWS IAM Identity Center and the identity provider of your choice. Federation provides a number of advantages over using IAM users in addition to eliminating long-term credentials. Your users can also request temporary credentials from the command line for direct federation or by using IAM Identity Center. This means that there are few use cases that require IAM users or long-term credentials for your users. For third-party identities - When granting third parties, such as software as a service (SaaS) providers, access to resources in your AWS account, you can use cross-account roles and resource-based policies. 
Additionally, you can use the Amazon Cognito OAuth 2.0 grant client credentials flow for B2B SaaS customers or partners. User identities that access your AWS resources through web browsers, client applications, mobile apps, or interactive command-line tools - If you need to grant applications for consumers or customers access to your AWS resources, you can use Amazon Cognito identity pools or Amazon Cognito user pools to provide temporary credentials. The permissions for the credentials are configured through IAM roles. You can also define a separate IAM role with limited permissions for guest users who are not authenticated. ### Machine identities For machine identities, you might need to use long-term credentials. In these cases, you should require workloads to use temporary credentials with IAM roles to access AWS. - For Amazon Elastic Compute Cloud (Amazon EC2), you can use roles for Amazon EC2. - AWS Lambda allows you to configure a Lambda execution role to grant the service permissions to perform AWS actions using temporary credentials. There are many other similar models for AWS services to grant temporary credentials using IAM roles. - For IoT devices, you can use the AWS IoT Core credential provider to request temporary credentials. - For on-premises systems or systems that run outside of AWS that need access to AWS resources, you can use IAM Roles Anywhere. There are scenarios where temporary credentials are not supported, which require the use of long-term credentials. In these situations, audit and rotate these credentials periodically and rotate access keys regularly. For highly restricted IAM user access keys, consider the following additional security measures: - Grant highly restricted permissions: - Adhere to the principle of least privilege (be specific about actions, resources, and conditions). - Consider granting the IAM user only the AssumeRole operation for one specific role. Depending on the on-premise architecture, this approach helps isolate and secure the long-term IAM credentials. - Limit the allowed network sources and IP addresses in the IAM role trust policy. - Monitor usage and set up alerts for unused permissions or misuse (using AWS CloudWatch Logs metric filters and alarms). - Enforce permission boundaries (service control policies (SCPs) and permission boundaries complement each other - SCPs are coarse-grained, while permission boundaries are fine-grained). - Implement a process to provision and securely store (in an on-premise vault) the credentials. Some other options for scenarios requiring long-term credentials include: - Build your own token vending API (using Amazon API Gateway). - For scenarios where you must use long-term credentials or credentials other than AWS access keys (such as database logins), you can use a service designed to handle the management of secrets, such as AWS Secrets Manager. Secrets Manager simplifies the management, rotation, and secure storage of encrypted secrets. Many AWS services support a direct integration with Secrets Manager. - For multi-cloud integrations, you can use identity federation based on your source credential service provider (CSP) credentials (see AWS STS AssumeRoleWithWebIdentity).
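
A minimal sketch of obtaining temporary credentials with AWS STS using boto3. The role ARN and session name are placeholders; the returned credentials expire automatically (one hour here), so nothing long-lived needs to be stored.

```python
"""Sketch: assume an IAM role and use the temporary credentials it returns."""
import boto3

sts = boto3.client("sts")

response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/workload-read-only",  # placeholder role
    RoleSessionName="audit-session",
    DurationSeconds=3600,
)
creds = response["Credentials"]

# Use the temporary credentials for subsequent calls instead of long-term keys.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("sts").get_caller_identity()["Arn"])
```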

๐Ÿ’ผ SEC02-BP03 Store and use secrets securely

A workload requires an automated capability to prove its identity to databases, resources, and third-party services. This is accomplished using secret access credentials, such as API access keys, passwords, and OAuth tokens. Using a purpose-built service to store, manage, and rotate these credentials helps reduce the likelihood that those credentials become compromised. **Desired outcome** Implementing a mechanism for securely managing application credentials that achieves the following goals: - Identifying what secrets are required for the workload. - Reducing the number of long-term credentials required by replacing them with short-term credentials when possible. - Establishing secure storage and automated rotation of remaining long-term credentials. - Auditing access to secrets that exist in the workload. - Continual monitoring to verify that no secrets are embedded in source code during the development process. - Reduce the likelihood of credentials being inadvertently disclosed. **Common anti-patterns** - Not rotating credentials. - Storing long-term credentials in source code or configuration files. - Storing credentials at rest unencrypted. **Benefits of establishing this best practice** - Secrets are stored encrypted at rest and in transit. - Access to credentials is gated through an API (think of it as a credential vending machine). - Access to a credential (both read and write) is audited and logged. - Separation of concerns: credential rotation is performed by a separate component, which can be segregated from the rest of the architecture. - Secrets are automatically distributed on-demand to software components and rotation occurs in a central location. - Access to credentials can be controlled in a fine-grained manner. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance In the past, credentials used to authenticate to databases, third-party APIs, tokens, and other secrets might have been embedded in source code or in environment files. AWS provides several mechanisms to store these credentials securely, automatically rotate them, and audit their usage. The best way to approach secrets management is to follow the guidance of remove, replace, and rotate. The most secure credential is one that you do not have to store, manage, or handle. There might be credentials that are no longer necessary to the functioning of the workload that can be safely removed. For credentials that are still required for the proper functioning of the workload, there might be an opportunity to replace a long-term credential with a temporary or short-term credential. For example, instead of hard-coding an AWS secret access key, consider replacing that long-term credential with a temporary credential using IAM roles. Some long-lived secrets might not be able to be removed or replaced. These secrets can be stored in a service such as AWS Secrets Manager, where they can be centrally stored, managed, and rotated on a regular basis. A common anti-pattern is embedding IAM access keys inside source code, configuration files, or mobile apps. When an IAM access key is required to communicate with an AWS service, use temporary (short-term) security credentials. These short-term credentials can be provided through IAM roles for EC2 instances, execution roles for Lambda functions, Cognito IAM roles for mobile user access, and IoT Core policies for IoT devices. 
When interfacing with third parties, prefer delegating access to an IAM role with the necessary access to your account's resources rather than configuring an IAM user and sending the third party the secret access key for that user. There are many cases where the workload requires the storage of secrets necessary to interoperate with other services and resources. AWS Secrets Manager is purpose-built to securely manage these credentials, as well as the storage, use, and rotation of API tokens, passwords, and other credentials. AWS Secrets Manager provides five key capabilities to ensure the secure storage and handling of sensitive credentials: encryption at rest, encryption in transit, comprehensive auditing, fine-grained access control, and extensible credential rotation. Other secret management services from AWS Partners or locally developed solutions that provide similar capabilities and assurances are also acceptable. When you retrieve a secret, you can use the Secrets Manager client-side caching components to cache it for future use. Retrieving a cached secret is faster than retrieving it from Secrets Manager. Additionally, because there is a cost for calling Secrets Manager APIs, using a cache can reduce your costs. For all of the ways you can retrieve secrets, see Get secrets. ### Implementation steps 1. Identify code paths containing hard-coded credentials using automated tools such as Amazon CodeGuru. - Use Amazon CodeGuru to scan your code repositories. Once the review is complete, filter on Type=Secrets in CodeGuru to find problematic lines of code. 2. Identify credentials that can be removed or replaced. 1. Identify credentials no longer needed and mark for removal. 2. For AWS Secret Keys that are embedded in source code, replace them with IAM roles associated with the necessary resources. If part of your workload is outside AWS but requires IAM credentials to access AWS resources, consider IAM Roles Anywhere or AWS Systems Manager Hybrid Activations. 3. For other third-party, long-lived secrets that require the use of the rotate strategy, integrate Secrets Manager into your code to retrieve third-party secrets at runtime. 1. The CodeGuru console can automatically create a secret in Secrets Manager using the discovered credentials. 2. Integrate secret retrieval from Secrets Manager into your application code. - Serverless Lambda functions can use a language-agnostic Lambda extension. - For EC2 instances or containers, AWS provides example client-side code for retrieving secrets from Secrets Manager in several popular programming languages. 4. Periodically review your code base and re-scan to verify no new secrets have been added to the code. - Consider using a tool such as git-secrets to prevent committing new secrets to your source code repository. 5. Monitor Secrets Manager activity for indications of unexpected usage, inappropriate secret access, or attempts to delete secrets. 6. Reduce human exposure to credentials. Restrict access to read, write, and modify credentials to an IAM role dedicated for this purpose, and only provide access to assume the role to a small subset of operational users.
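To illustrate the remove, replace, and rotate guidance, the sketch below (Python with boto3; the secret name and its JSON shape are hypothetical) retrieves a credential from AWS Secrets Manager at runtime and keeps it in a simple in-process cache to reduce API calls, instead of embedding the value in source code or configuration files.

```python
# Minimal sketch: retrieve a database credential from AWS Secrets Manager at runtime
# instead of hard-coding it. The secret name and JSON shape are hypothetical.
import json
import boto3

_secrets = boto3.client("secretsmanager")
_cache = {}  # naive in-process cache; the AWS-provided caching clients are an alternative

def get_secret(secret_id: str) -> dict:
    """Return the secret as a dict, caching it to reduce API calls and cost."""
    if secret_id not in _cache:
        response = _secrets.get_secret_value(SecretId=secret_id)
        _cache[secret_id] = json.loads(response["SecretString"])
    return _cache[secret_id]

db_secret = get_secret("prod/orders/db-credentials")  # hypothetical secret name
# Connect to the database using db_secret["username"] / db_secret["password"] here.
```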

๐Ÿ’ผ SEC02-BP04 Rely on a centralized identity provider

For workforce identities (employees and contractors), rely on an identity provider that allows you to manage identities in a centralized place. This makes it easier to manage access across multiple applications and systems, because you are creating, assigning, managing, revoking, and auditing access from a single location. **Desired outcome:** You have a centralized identity provider where you centrally manage workforce users, authentication policies (such as requiring multi-factor authentication (MFA)), and authorization to systems and applications (such as assigning access based on a user's group membership or attributes). Your workforce users sign in to the central identity provider and federate (single sign-on) to internal and external applications, removing the need for users to remember multiple credentials. Your identity provider is integrated with your human resources (HR) systems so that personnel changes are automatically synchronized to your identity provider. For example, if someone leaves your organization, you can automatically revoke access to federated applications and systems (including AWS). You have enabled detailed audit logging in your identity provider and are monitoring these logs for unusual user behavior. **Common anti-patterns** - You do not use federation and single sign-on. Your workforce users create separate user accounts and credentials in multiple applications and systems. - You have not automated the lifecycle of identities for workforce users, such as by integrating your identity provider with your HR systems. When a user leaves your organization or changes roles, you follow a manual process to delete or update their records in multiple applications and systems. **Benefits of establishing this best practice** By using a centralized identity provider, you have a single place to manage workforce user identities and policies, the ability to assign access to applications to users and groups, and the ability to monitor user sign-in activity. By integrating with your human resources (HR) systems, when a user changes roles, these changes are synchronized to the identity provider, which automatically updates their assigned applications and permissions. When a user leaves your organization, their identity is automatically disabled in the identity provider, revoking their access to federated applications and systems. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance **Guidance for workforce users accessing AWS** Workforce users like employees and contractors in your organization may require access to AWS using the AWS Management Console or AWS Command Line Interface (AWS CLI) to perform their job functions. You can grant AWS access to your workforce users by federating from your centralized identity provider to AWS at two levels: direct federation to each AWS account or federating to multiple accounts in your AWS organization. To federate your workforce users directly with each AWS account, you can use a centralized identity provider to federate to AWS Identity and Access Management in that account. The flexibility of IAM allows you to enable a separate SAML 2.0 or OpenID Connect (OIDC) identity provider for each AWS account and use federated user attributes for access control. Your workforce users will use their web browser to sign in to the identity provider by providing their credentials (such as passwords and MFA token codes).
The identity provider issues a SAML assertion to their browser that is submitted to the AWS Management Console sign-in URL to allow the user to single sign-on to the AWS Management Console by assuming an IAM role. Your users can also obtain temporary AWS API credentials for use in the AWS CLI or AWS SDKs from AWS STS by assuming the IAM role using a SAML assertion from the identity provider. To federate your workforce users with multiple accounts in your AWS organization, you can use AWS IAM Identity Center to centrally manage access for your workforce users to AWS accounts and applications. You enable Identity Center for your organization and configure your identity source. IAM Identity Center provides a default identity source directory which you can use to manage your users and groups. Alternatively, you can choose an external identity source by connecting to your external identity provider using SAML 2.0 and automatically provisioning users and groups using SCIM, or connecting to your Microsoft AD Directory using AWS Directory Service. Once an identity source is configured, you can assign access to users and groups to AWS accounts by defining least-privilege policies in your permission sets. Your workforce users can authenticate through your central identity provider to sign in to the AWS access portal and single sign-on to the AWS accounts and cloud applications assigned to them. Your users can configure the AWS CLI v2 to authenticate with Identity Center and get credentials to run AWS CLI commands. Identity Center also allows single sign-on access to AWS applications such as Amazon SageMaker AI Studio and AWS IoT SiteWise Monitor portals. After you follow the preceding guidance, your workforce users will no longer need to use IAM users and groups for normal operations when managing workloads on AWS. Instead, your users and groups are managed outside of AWS and users are able to access AWS resources as a federated identity. Federated identities use the groups defined by your centralized identity provider. You should identify and remove IAM groups, IAM users, and long-lived user credentials (passwords and access keys) that are no longer needed in your AWS accounts. You can find unused credentials using IAM credential reports, then delete the corresponding IAM users and IAM groups. You can apply a Service Control Policy (SCP) to your organization that helps prevent the creation of new IAM users and groups, enforcing that access to AWS is via federated identities. **Guidance for users of your applications** You can manage the identities of users of your applications, such as a mobile app, using Amazon Cognito as your centralized identity provider. Amazon Cognito enables authentication, authorization, and user management for your web and mobile apps. Amazon Cognito provides an identity store that scales to millions of users, supports social and enterprise identity federation, and offers advanced security features to help protect your users and business. You can integrate your custom web or mobile application with Amazon Cognito to add user authentication and access control to your applications in minutes. Built on open identity standards such as SAML and OpenID Connect (OIDC), Amazon Cognito supports various compliance regulations and integrates with frontend and backend development resources. ### Implementation steps **Steps for workforce users accessing AWS** 1.
Federate your workforce users to AWS through a centralized identity provider using one of the following approaches: - Use IAM Identity Center to enable single sign-on to multiple AWS accounts in your AWS organization by federating with your identity provider. - Use IAM to connect your identity provider directly to each AWS account, enabling federated fine-grained access. 2. Identify and remove IAM users and groups that are replaced by federated identities. **Steps for users of your applications** 1. Use Amazon Cognito as a centralized identity provider for your applications. 2. Integrate your custom applications with Amazon Cognito using OpenID Connect and OAuth. You can develop your custom applications using the Amplify libraries that provide simple interfaces to integrate with a variety of AWS services, such as Amazon Cognito for authentication.
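For the direct-federation approach, the following sketch (Python with boto3; the role and identity provider ARNs are hypothetical, and obtaining the base64-encoded SAML assertion from your identity provider's login endpoint is out of scope here) shows how a SAML assertion is exchanged for temporary AWS credentials that can then be used with the AWS CLI or SDKs.

```python
# Minimal sketch: exchange a SAML assertion from a centralized identity provider
# for temporary AWS credentials via AWS STS. ARNs are hypothetical placeholders.
import boto3

sts = boto3.client("sts")

def federated_credentials(saml_assertion_b64: str) -> dict:
    """Return short-lived credentials for the federated role."""
    response = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::111122223333:role/FederatedDevelopers",    # hypothetical
        PrincipalArn="arn:aws:iam::111122223333:saml-provider/CorpIdP",  # hypothetical
        SAMLAssertion=saml_assertion_b64,  # base64 assertion issued by the IdP
        DurationSeconds=3600,
    )
    return response["Credentials"]  # access key, secret key, session token, expiration
```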

๐Ÿ’ผ SEC02-BP05 Audit and rotate credentials periodically

Audit and rotate credentials periodically to limit how long the credentials can be used to access your resources. Long-term credentials create many risks, and these risks can be reduced by rotating long-term credentials regularly. **Desired outcome** Implement credential rotation to help reduce the risks associated with long-term credential usage. Regularly audit and remediate non-compliance with credential rotation policies. **Common anti-patterns** - Not auditing credential use. - Using long-term credentials unnecessarily. - Using long-term credentials and not rotating them regularly. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance When you cannot rely on temporary credentials and require long-term credentials, audit credentials to verify that defined controls like multi-factor authentication (MFA) are enforced, rotated regularly, and have the appropriate access level. Periodic validation, preferably through an automated tool, is necessary to verify that the correct controls are enforced. For human identities, you should require users to change their passwords periodically and retire access keys in favor of temporary credentials. As you move from AWS Identity and Access Management (IAM) users to centralized identities, you can generate a credential report to audit your users. We also recommend that you enforce and monitor MFA in your identity provider. You can set up AWS Config Rules, or use AWS Security Hub Security Standards, to monitor if users have configured MFA. Consider using IAM Roles Anywhere to provide temporary credentials for machine identities. In situations when using IAM roles and temporary credentials is not possible, frequent auditing and rotating access keys is necessary. ### Implementation steps 1. Regularly audit credentials: Auditing the identities that are configured in your identity provider and IAM helps verify that only authorized identities have access to your workload. Such identities can include, but are not limited to, IAM users, AWS IAM Identity Center users, Active Directory users, or users in a different upstream identity provider. For example, remove people that leave the organization, and remove cross-account roles that are no longer required. Have a process in place to periodically audit permissions to the services accessed by an IAM entity. This helps you identify the policies you need to modify to remove any unused permissions. Use credential reports and AWS Identity and Access Management Access Analyzer to audit IAM credentials and permissions. You can use Amazon CloudWatch to set up alarms for specific API calls called within your AWS environment. Amazon GuardDuty can also alert you to unexpected activity, which might indicate overly permissive access or unintended access to IAM credentials. 2. Rotate credentials regularly: When you are unable to use temporary credentials, rotate long-term IAM access keys regularly (maximum every 90 days). If an access key is unintentionally disclosed without your knowledge, this limits how long the credentials can be used to access your resources. 3. Review IAM permissions: To improve the security of your AWS account, regularly review and monitor each of your IAM policies. Verify that policies adhere to the principle of least privilege. 4. Consider automating IAM resource creation and updates: IAM Identity Center automates many IAM tasks, such as role and policy management. 
Alternatively, AWS CloudFormation can be used to automate the deployment of IAM resources, including roles and policies, to reduce the chance of human error because the templates can be verified and version controlled. 5. Use IAM Roles Anywhere to replace IAM users for machine identities: IAM Roles Anywhere allows you to use roles in areas that you traditionally could not, such as on-premises servers. IAM Roles Anywhere uses a trusted X.509 certificate to authenticate to AWS and receive temporary credentials. Using IAM Roles Anywhere avoids the need to rotate these credentials, as long-term credentials are no longer stored in your on-premises environment. Note that you still need to monitor and rotate the X.509 certificate as it approaches expiration.
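The following sketch (Python with boto3; the 90-day threshold mirrors the guidance above) is one way to support the audit and rotation steps: it flags active IAM access keys older than 90 days along with their last-used date, so they can be rotated or removed.

```python
# Minimal sketch: list active IAM access keys older than 90 days and show when
# each was last used, as input to a rotation or removal decision.
from datetime import datetime, timedelta, timezone

import boto3

iam = boto3.client("iam")
threshold = datetime.now(timezone.utc) - timedelta(days=90)

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        keys = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
        for key in keys:
            if key["Status"] == "Active" and key["CreateDate"] < threshold:
                last_used = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
                print(
                    user["UserName"],
                    key["AccessKeyId"],
                    last_used["AccessKeyLastUsed"].get("LastUsedDate", "never used"),
                )
```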

๐Ÿ’ผ SEC02-BP06 Employ user groups and attributes

Defining permissions according to user groups and attributes helps reduce the number and complexity of policies, making it simpler to achieve the principle of least privilege. You can use user groups to manage the permissions for many people in one place based on the function they perform in your organization. Attributes, such as department, project, or location, can provide an additional layer of permission scope when people perform a similar function but for different subsets of resources. **Desired outcome** You can apply changes in permissions based on function to all users who perform that function. Group membership and attributes govern user permissions, reducing the need to manage permissions at the individual user level. The groups and attributes you define in your identity provider (IdP) are propagated automatically to your AWS environments. **Common anti-patterns** - Managing permissions for individual users and duplicating across many users. - Defining groups at too high a level, granting overly-broad permissions. - Defining groups at too granular a level, creating duplication and confusion about membership. - Using groups with duplicate permissions across subsets of resources when attributes can be used instead. - Not managing groups, attributes, and memberships through a standardized identity provider integrated with your AWS environments. - Using role chaining when using AWS IAM Identity Center sessions **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance AWS permissions are defined in documents called policies that are associated with a principal, such as a user, group, role, or resource. You can scale permissions management by organizing permissions assignments (group, permissions, account) based on job-function, workload, and SDLC environment. For your workforce, this allows you to define groups based on the function your users perform for your organization, rather than based on the resources being accessed. For example, a WebAppDeveloper group may have a policy attached for configuring services like Amazon CloudFront within a development account. An AutomationDeveloper group may have some overlapping permissions with the WebAppDeveloper group. These common permissions can be captured in a separate policy and associated with both groups, rather than having users from both functions belong to a CloudFrontAccess group. In addition to groups, you can use attributes to further scope access. For example, you may have a Project attribute for users in your WebAppDeveloper group to scope access to resources specific to their project. Using this technique removes the need to have different groups for application developers working on different projects if their permissions are otherwise the same. The way you refer to attributes in permission policies is based on their source, whether they are defined as part of your federation protocol (such as SAML, OIDC, or SCIM), as custom SAML assertions, or set within IAM Identity Center. ### Implementation steps 1. Establish where you will define groups and attributes: - Following the guidance in SEC02-BP04 Rely on a centralized identity provider, you can determine whether you need to define groups and attributes within your identity provider, within IAM Identity Center, or using IAM user groups in a specific account. 2. Define groups: - Determine your groups based on function and scope of access required. 
Consider using a hierarchical structure or naming conventions to organize groups effectively. - If defining within IAM Identity Center, create groups and associate the desired level of access using permission sets. - If defining within an external identity provider, determine if the provider supports the SCIM protocol and consider enabling automatic provisioning within IAM Identity Center. This capability synchronizes the creation, membership, and deletion of groups between your provider and IAM Identity Center. 3. Define attributes: - If you use an external identity provider, both the SCIM and SAML 2.0 protocols provide certain attributes by default. Additional attributes can be defined and passed using SAML assertions with the https://aws.amazon.com/SAML/Attributes/PrincipalTag attribute name. Refer to your identity provider's documentation for guidance on defining and configuring custom attributes. - If you define roles within IAM Identity Center, enable the attribute-based access control (ABAC) feature, and define attributes as desired. Consider attributes that align with your organization's structure or resource tagging strategy. - If you require IAM role chaining from IAM Roles assumed through IAM Identity Center, values like source-identity and principal-tags will not propagate. For more detail, see Enable and configure attributes for access control. 4. Scope permissions based on groups and attributes: - Consider including conditions in your permission policies that compare the attributes of your principal with the attributes of the resources being accessed. For example, you can define a condition to allow access to a resource only if the value of a PrincipalTag condition key matches the value of a ResourceTag key of the same name. - When defining ABAC policies, follow the guidance in the ABAC authorization best practices and examples. - Regularly review and update your group and attribute structure as your organization's needs evolve to ensure optimal permissions management.
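As a sketch of the attribute-based pattern described above (Python with boto3; the policy name and the choice of Secrets Manager actions are hypothetical examples), the following policy allows access only when the caller's Project principal tag matches the resource's Project tag, so a single policy can serve developers across different projects.

```python
# Minimal sketch: create an ABAC policy that compares the principal's Project tag
# with the resource's Project tag. Policy name and actions are hypothetical.
import json
import boto3

iam = boto3.client("iam")

abac_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
        "Resource": "*",
        "Condition": {
            # Access is allowed only when the tag values match.
            "StringEquals": {"aws:ResourceTag/Project": "${aws:PrincipalTag/Project}"}
        },
    }],
}

iam.create_policy(
    PolicyName="ProjectScopedSecretsAccess",  # hypothetical
    PolicyDocument=json.dumps(abac_policy),
    Description="Allow access only when principal and resource Project tags match",
)
```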

๐Ÿ’ผ SEC03-BP01 Define access requirements

Each component or resource of your workload needs to be accessed by administrators, end users, or other components. Have a clear definition of who or what should have access to each component, and choose the appropriate identity type and method of authentication and authorization. **Common anti-patterns** - Hard-coding or storing secrets in your application. - Granting custom permissions for each user. - Using long-lived credentials. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Each component or resource of your workload needs to be accessed by administrators, end users, or other components. Have a clear definition of who or what should have access to each component, and choose the appropriate identity type and method of authentication and authorization. Regular access to AWS accounts within the organization should be provided using federated access or a centralized identity provider. You should also centralize your identity management and ensure that there is an established practice to integrate AWS access into your employee access lifecycle. For example, when an employee changes to a job role with a different access level, their group membership should also change to reflect their new access requirements. When defining access requirements for non-human identities, determine which applications and components need access and how permissions are granted. Using IAM roles built with the least privilege access model is a recommended approach. AWS managed policies provide predefined IAM policies that cover most common use cases. AWS services, such as AWS Secrets Manager and AWS Systems Manager Parameter Store, can help decouple secrets from the application or workload securely in cases where it's not feasible to use IAM roles. In Secrets Manager, you can establish automatic rotation for your credentials. You can use Systems Manager to reference parameters in your scripts, commands, SSM documents, configuration, and automation workflows by using the unique name that you specified when you created the parameter. You can use AWS IAM Roles Anywhere to obtain temporary security credentials in IAM for workloads that run outside of AWS. Your workloads can use the same IAM policies and IAM roles that you use with AWS applications to access AWS resources. Where possible, prefer short-term temporary credentials over long-term static credentials. For scenarios in which you need users with programmatic access and long-term credentials, use access key last used information to rotate and remove access keys. Users need programmatic access if they want to interact with AWS outside of the AWS Management Console. The way to grant programmatic access depends on the type of user that's accessing AWS.
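As an illustration of decoupling configuration from the workload, the sketch below (Python with boto3; the parameter name is hypothetical) reads a SecureString parameter from AWS Systems Manager Parameter Store by its unique name rather than embedding the value in the application.

```python
# Minimal sketch: read a configuration value from AWS Systems Manager Parameter Store
# at runtime. The parameter name is a hypothetical example; SecureString parameters
# are decrypted when WithDecryption is set.
import boto3

ssm = boto3.client("ssm")

parameter = ssm.get_parameter(
    Name="/orders/prod/third-party-api-endpoint",  # hypothetical parameter name
    WithDecryption=True,
)
endpoint = parameter["Parameter"]["Value"]
print("Resolved endpoint:", endpoint)
```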

๐Ÿ’ผ SEC03-BP02 Grant least privilege access

Grant only the access that users require to perform specific actions on specific resources under specific conditions. Use group and identity attributes to dynamically set permissions at scale, rather than defining permissions for individual users. For example, you can allow a group of developers access to manage only resources for their project. This way, if a developer leaves the project, their access is automatically revoked without changing the underlying access policies. **Desired outcome** Users have only the minimum permissions required for their specific job functions. You use separate AWS accounts to isolate developers from production environments. When developers need to access production environments for specific tasks, they are granted limited and controlled access only for the duration of those tasks. Their production access is immediately revoked after they complete the necessary work. You conduct regular reviews of permissions and promptly revoke them when no longer needed, such as when a user changes roles or leaves the organization. You restrict administrator privileges to a small, trusted group to reduce risk exposure. You give machine or system accounts only the minimum permissions required to perform their intended tasks. **Common anti-patterns** - By default, you grant users administrator permissions. - You use the root user account for daily activities. - You create overly permissive policies without proper scoping. - Your permissions reviews are infrequent, which leads to permissions creep. - You rely solely on attribute-based access control for environment isolation or permissions management. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance The principle of least privilege states that identities should only be permitted to perform the smallest set of actions necessary to fulfill a specific task. This balances usability, efficiency, and security. Operating under this principle helps limit unintended access and helps track who has access to what resources. IAM users and roles have no permissions by default. The root user has full access by default and should be tightly controlled, monitored, and used only for tasks that require root access. IAM policies are used to explicitly grant permissions to IAM roles or specific resources. For example, identity-based policies can be attached to IAM groups, while S3 buckets can be controlled by resource-based policies. When you create an IAM policy, you can specify the service actions, resources, and conditions that must be true for AWS to allow or deny access. AWS supports a variety of conditions to help you scope down access. For example, by using the PrincipalOrgID condition key, you can deny actions if the requestor isn't a part of your AWS Organization. You can also control requests that AWS services make on your behalf, such as AWS CloudFormation creating an AWS Lambda function, using the CalledVia condition key. You can layer different policy types to establish defense-in-depth and limit the overall permissions of your users. You can also restrict what permissions can be granted and under what conditions. For example, you can allow your workload teams to create their own IAM policies for systems they build, but only if they apply a Permission Boundary to limit the maximum permissions they can grant. ### Implementation steps 1. 
Implement least privilege policies: Assign access policies with least privilege to IAM groups and roles to reflect the user's role or function that you have defined. 2. Isolate development and production environments through separate AWS accounts: Use separate AWS accounts for development and production environments, and control access between them using service control policies, resource policies, and identity policies. 3. Base policies on API usage: One way to determine the needed permissions is to review AWS CloudTrail logs. You can use this review to create permissions tailored to the actions that the user actually performs within AWS. IAM Access Analyzer can automatically generate an IAM policy based on access activity. You can use IAM Access Advisor at the organization or account level to track the last accessed information for a particular policy. 4. Consider using AWS managed policies for job functions: When you begin to create fine-grained permissions policies, it can be helpful to use AWS managed policies for common job roles, such as billing, database administrators, and data scientists. These policies can help narrow the access that users have while you determine how to implement the least privilege policies. 5. Remove unnecessary permissions: Detect and remove unused IAM entities, credentials, and permissions to achieve the principle of least privilege. You can use IAM Access Analyzer to identify external and unused access, and IAM Access Analyzer policy generation can help fine-tune permissions policies. 6. Ensure that users have limited access to production environments: Users should only have access to production environments with a valid use case. After the user performs the specific tasks that required production access, access should be revoked. Limiting access to production environments helps prevent unintended production-impacting events and lowers the scope of impact of unintended access. 7. Consider permissions boundaries: A permissions boundary is a feature for using a managed policy that sets the maximum permissions that an identity-based policy can grant to an IAM entity. An entity's permissions boundary allows it to perform only the actions that are allowed by both its identity-based policies and its permissions boundaries. 8. Refine access using attribute-based access control and resource tags: Attribute-based access control (ABAC) using resource tags can be used to refine permissions when supported. You can use an ABAC model that compares principal tags to resource tags to refine access based on custom dimensions you define. This approach can simplify and reduce the number of permission policies in your organization. - It is recommended that ABAC only be used for access control when both the principals and resources are owned by your AWS Organization. External parties may use the same tag names and values as your organization for their own principals and resources. If you rely solely on these name-value pairs for granting access to external party principals or resources, you may provide unintended permissions. 9. Use service control policies for AWS Organizations: Service control policies centrally control the maximum available permissions for member accounts in your organization. Importantly, you can use service control policies to restrict root user permissions in member accounts. Also consider using AWS Control Tower, which provides prescriptive managed controls that enrich AWS Organizations. You can also define your own controls within Control Tower. 10. 
Establish a user lifecycle policy for your organization: User lifecycle policies define tasks to perform when users are onboarded onto AWS, change job role or scope, or no longer need access to AWS. Perform permission reviews during each step of a user's lifecycle to verify that permissions are properly restrictive and to avoid permissions creep. 11. Establish a regular schedule to review permissions and remove any unneeded permissions: You should regularly review user access to verify that users do not have overly permissive access. AWS Config and IAM Access Analyzer can help during user permissions audits. 12. Establish a job role matrix: A job role matrix visualizes the various roles and access levels required within your AWS footprint. With a job role matrix, you can define and separate permissions based on user responsibilities within your organization. Use groups instead of applying permissions directly to individual users or roles.
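The sketch below (Python with boto3; all names, ARNs, and the bucket are hypothetical) combines steps 1 and 7 above: a role receives a narrowly scoped inline policy, and the role is created with a permissions boundary that caps the maximum permissions any of its policies can grant.

```python
# Minimal sketch: create a role with a permissions boundary and attach a
# least-privilege inline policy. All names and ARNs are hypothetical.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="OrdersReportingRole",  # hypothetical
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    # A separately managed boundary policy caps this role's maximum permissions.
    PermissionsBoundary="arn:aws:iam::111122223333:policy/WorkloadBoundary",
)

least_privilege = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],                          # only the action required
        "Resource": "arn:aws:s3:::orders-reports/daily/*",   # only the prefix required
    }],
}

iam.put_role_policy(
    RoleName="OrdersReportingRole",
    PolicyName="ReadDailyReports",
    PolicyDocument=json.dumps(least_privilege),
)
```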

๐Ÿ’ผ SEC03-BP03 Establish emergency access process

Create a process that allows for emergency access to your workloads in the unlikely event of an issue with your centralized identity provider. You must design processes for different failure modes that may result in an emergency event. For example, under normal circumstances, your workforce users federate to the cloud using a centralized identity provider (SEC02-BP04) to manage their workloads. However, if your centralized identity provider fails, or the configuration for federation in the cloud is modified, then your workforce users may not be able to federate into the cloud. An emergency access process allows authorized administrators to access your cloud resources through alternate means (such as an alternate form of federation or direct user access) to fix issues with your federation configuration or your workloads. The emergency access process is used until the normal federation mechanism is restored. **Desired outcome** - You have defined and documented the failure modes that count as an emergency: consider your normal circumstances and the systems your users depend on to manage their workloads. Consider how each of these dependencies can fail and cause an emergency situation. You may find the questions and best practices in the Reliability pillar useful to identify failure modes and architect more resilient systems to minimize the likelihood of failures. - You have documented the steps that must be followed to confirm a failure as an emergency. For example, you can require your identity administrators to check the status of your primary and standby identity providers and, if both are unavailable, declare an emergency event for identity provider failure. - You have defined an emergency access process specific to each type of emergency or failure mode. Being specific can reduce the temptation on the part of your users to overuse a general process for all types of emergencies. Your emergency access processes describe the circumstances under which each process should be used, and conversely situations where the process should not be used and points to alternate processes that may apply. - Your processes are well-documented with detailed instructions and playbooks that can be followed quickly and efficiently. Remember that an emergency event can be a stressful time for your users and they may be under extreme time pressure, so design your process to be as simple as possible. **Common anti-patterns** - You do not have well-documented and well-tested emergency access processes. Your users are unprepared for an emergency and follow improvised processes when an emergency event arises. - Your emergency access processes depend on the same systems (such as a centralized identity provider) as your normal access mechanisms. This means that the failure of such a system may impact both your normal and emergency access mechanisms and impair your ability to recover from the failure. - Your emergency access processes are used in non-emergency situations. For example, your users frequently misuse emergency access processes as they find it easier to make changes directly than submit changes through a pipeline. - Your emergency access processes do not generate sufficient logs to audit the processes, or the logs are not monitored to alert for potential misuse of the processes. **Benefits of establishing this best practice** - By having well-documented and well-tested emergency access processes, you can reduce the time taken by your users to respond to and resolve an emergency event. 
This can result in less downtime and higher availability of the services you provide to your customers. - You can track each emergency access request and detect and alert on unauthorized attempts to misuse the process for non-emergency events. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance This section provides guidance for creating emergency access processes for several failure modes related to workloads deployed on AWS, starting with common guidance that applies to all failure modes and followed by specific guidance based on the type of failure mode. ### Common guidance for all failure modes Consider the following as you design an emergency access process for a failure mode: - Document the pre-conditions and assumptions of the process: when the process should be used and when it should not be used. It helps to detail the failure mode and document assumptions, such as the state of other related systems. For example, the process for the Failure Mode 2 assumes that the identity provider is available, but the configuration on AWS is modified or has expired. - Pre-create resources needed by the emergency access process (SEC10-BP05). For example, pre-create the emergency access AWS account with IAM users and roles, and the cross-account IAM roles in all the workload accounts. This verifies that these resources are ready and available when an emergency event happens. By pre-creating resources, you do not have a dependency on AWS control plane APIs (used to create and modify AWS resources) that may be unavailable in an emergency. Further, by pre-creating IAM resources, you do not need to account for potential delays due to eventual consistency. - Include emergency access processes as part of your incident management plans (SEC10-BP02). Document how emergency events are tracked and communicated to others in your organization such as peer teams, your leadership, and, when applicable, externally to your customers and business partners. - Define the emergency access request process in your existing service request workflow system if you have one. Typically, such workflow systems allow you to create intake forms to collect information about the request, track the request through each stage of the workflow, and add both automated and manual approval steps. Relate each request with a corresponding emergency event tracked in your incident management system. Having a uniform system for emergency accesses allows you to track those requests in a single system, analyze usage trends, and improve your processes. - Verify that your emergency access processes can only be initiated by authorized users and require approvals from the user's peers or management as appropriate. The approval process should operate effectively both inside and outside business hours. Define how requests for approval allow secondary approvers if the primary approvers are unavailable and are escalated up your management chain until approved. - Implement robust logging, monitoring, and alerting mechanisms for the emergency access process and mechanisms. Generate detailed audit logs for all successful and failed attempts to gain emergency access. Correlate the activity with ongoing emergency events from your incident management system, and initiate alerts when actions occur outside of expected time periods or when the emergency access account is used during normal operations. 
The emergency access account should only be accessed during emergencies, as break-glass procedures can be considered a backdoor. Integrate with your security information and event management (SIEM) tool or AWS Security Hub to report and audit all activities during the emergency access period. Upon returning to normal operations, automatically rotate the emergency access credentials, and notify the relevant teams. - Test emergency access processes periodically to verify that the steps are clear and grant the correct level of access quickly and efficiently. Your emergency access processes should be tested as part of incident response simulations (SEC10-BP07) and disaster recovery tests (REL13-BP03). ### Failure Mode 1: Identity provider used to federate to AWS is unavailable As described in SEC02-BP04 Rely on a centralized identity provider, we recommend relying on a centralized identity provider to federate your workforce users to grant access to AWS accounts. You can federate to multiple AWS accounts in your AWS organization using IAM Identity Center, or you can federate to individual AWS accounts using IAM. In both cases, workforce users authenticate with your centralized identity provider before being redirected to an AWS sign-in endpoint to single sign-on. In the unlikely event that your centralized identity provider is unavailable, your workforce users can't federate to AWS accounts or manage their workloads. In this emergency event, you can provide an emergency access process for a small set of administrators to access AWS accounts to perform critical tasks that cannot wait until your centralized identity providers are back online. For example, your identity provider is unavailable for 4 hours and during that period you need to modify the upper limits of an Amazon EC2 Auto Scaling group in a Production account to handle an unexpected spike in customer traffic. Your emergency administrators should follow the emergency access process to gain access to the specific production AWS account and make the necessary changes. The emergency access process relies on a pre-created emergency access AWS account that is used solely for emergency access and has AWS resources (such as IAM roles and IAM users) to support the emergency access process. During normal operations, no one should access the emergency access account and you must monitor and alert on the misuse of this account. The emergency access account has emergency access IAM roles with permissions to assume cross-account roles in the AWS accounts that require emergency access. These IAM roles are pre-created and configured with trust policies that trust the emergency account's IAM roles. The emergency access process can use one of the following approaches: - You can pre-create a set of IAM users for your emergency administrators in the emergency access account with associated strong passwords and MFA tokens. These IAM users have permissions to assume the IAM roles that then allow cross-account access to the AWS account where emergency access is required. We recommend creating as few such users as possible and assigning each user to a single emergency administrator. During an emergency, an emergency administrator user signs into the emergency access account using their password and MFA token code, switches to the emergency access IAM role in the emergency account, and finally switches to the emergency access IAM role in the workload account to perform the emergency change action. 
The advantage of this approach is that each IAM user is assigned to one emergency administrator and you can know which user signed-in by reviewing CloudTrail events. The disadvantage is that you have to maintain multiple IAM users with their associated long-lived passwords and MFA tokens. - You can use the emergency access AWS account root user to sign into the emergency access account, assume the IAM role for emergency access, and assume the cross-account role in the workload account. We recommend setting a strong password and multiple MFA tokens for the root user. We also recommend storing the password and the MFA tokens in a secure enterprise credential vault that enforces strong authentication and authorization. You should secure the password and MFA token reset factors: set the email address for the account to an email distribution list that is monitored by your cloud security administrators, and the phone number of the account to a shared phone number that is also monitored by security administrators. The advantage of this approach is that there is one set of root user credentials to manage. The disadvantage is that since this is a shared user, multiple administrators have ability to sign in as the root user. You must audit your enterprise vault log events to identify which administrator checked out the root user password. ### Failure Mode 2: Identity provider configuration on AWS is modified or has expired To allow your workforce users to federate to AWS accounts, you can configure the IAM Identity Center with an external identity provider or create an IAM Identity Provider (SEC02-BP04). Typically, you configure these by importing a SAML meta-data XML document provided by your identity provider. The meta-data XML document includes a X.509 certificate corresponding to a private key that the identity provider uses to sign its SAML assertions. These configurations on the AWS-side may be modified or deleted by mistake by an administrator. In another scenario, the X.509 certificate imported into AWS may expire and a new meta-data XML with a new certificate has not yet been imported into AWS. Both scenarios can break federation to AWS for your workforce users, resulting in an emergency. In such an emergency event, you can provide your identity administrators access to AWS to fix the federation issues. For example, your identity administrator uses the emergency access process to sign into the emergency access AWS account, switches to a role in the Identity Center administrator account, and updates the external identity provider configuration by importing the latest SAML meta-data XML document from your identity provider to re-enable federation. Once federation is fixed, your workforce users continue to use the normal operating process to federate into their workload accounts. You can follow the approaches detailed in the previous Failure Mode 1 to create an emergency access process. You can grant least-privilege permissions to your identity administrators to access only the Identity Center administrator account and perform actions on Identity Center in that account. ### Failure Mode 3: Identity Center disruption In the unlikely event of an IAM Identity Center or AWS Region disruption, we recommend that you set up a configuration that you can use to provide temporary access to the AWS Management Console. The emergency access process uses direct federation from your identity provider to IAM in an emergency account. 
### Implementation steps ### Common steps for all failure modes - Create an AWS account dedicated to emergency access processes. Pre-create the IAM resources needed in the account such as IAM roles or IAM users, and optionally IAM Identity Providers. Additionally, pre-create cross-account IAM roles in the workload AWS accounts with trust relationships with corresponding IAM roles in the emergency access account. You can use AWS CloudFormation StackSets with AWS Organizations to create such resources in the member accounts in your organization. - Create AWS Organizations service control policies (SCPs) to deny the deletion and modification of the cross-account IAM roles in the member AWS accounts. - Enable CloudTrail for the emergency access AWS account and send the trail events to a central S3 bucket in your log collection AWS account. If you are using AWS Control Tower to set up and govern your AWS multi-account environment, then every account you create using AWS Control Tower or enroll in AWS Control Tower has CloudTrail enabled by default and sent to an S3 bucket in a dedicated log archive AWS account. - Monitor the emergency access account for activity by creating EventBridge rules that match on console login and API activity by the emergency IAM roles. Send notifications to your security operations center when activity happens outside of an ongoing emergency event tracked in your incident management system. ### Additional steps for Failure Mode 1 and Failure Mode 2 - Pre-create resources depending on the mechanism you choose for emergency access: - Using IAM users: pre-create the IAM users with strong passwords and associated MFA devices. - Using the emergency account root user: configure the root user with a strong password and store the password in your enterprise credential vault. Associate multiple physical MFA devices with the root user and store the devices in locations that can be accessed quickly by members of your emergency administrator team. ### Additional steps for Failure Mode 3: Identity Center disruption - As detailed in Set up emergency access to the AWS Management Console, in the emergency access AWS account, create an IAM Identity Provider to enable direct SAML federation from your identity provider. - Create emergency operations groups in your IdP with no members. - Create IAM roles corresponding to the emergency operations groups in the emergency access account.
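As one way to implement the monitoring step above, the following sketch (Python with boto3; the rule name, SNS topic ARN, and event selection are illustrative assumptions) creates an EventBridge rule in the emergency access account that matches console sign-ins and AssumeRole calls recorded by CloudTrail and forwards them to a security operations notification topic.

```python
# Minimal sketch: alert on activity in the emergency access account by matching
# CloudTrail-derived events in EventBridge and forwarding them to SNS.
# Rule name, topic ARN, and the chosen event names are hypothetical.
import json
import boto3

events = boto3.client("events")

pattern = {
    "detail-type": ["AWS Console Sign In via CloudTrail", "AWS API Call via CloudTrail"],
    "detail": {"eventName": ["ConsoleLogin", "AssumeRole"]},
}

events.put_rule(
    Name="emergency-access-activity",  # hypothetical
    EventPattern=json.dumps(pattern),
    State="ENABLED",
    Description="Alert on any sign-in or role assumption in the emergency access account",
)

events.put_targets(
    Rule="emergency-access-activity",
    Targets=[{
        "Id": "notify-secops",
        "Arn": "arn:aws:sns:us-east-1:111122223333:secops-alerts",  # hypothetical topic
    }],
)
```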

๐Ÿ’ผ SEC03-BP04 Reduce permissions continuously

As your teams determine what access is required, remove unneeded permissions and establish review processes to achieve least privilege permissions. Continually monitor and remove unused identities and permissions for both human and machine access. **Desired outcome** Permission policies should adhere to the least privilege principle. As job duties and roles become better defined, your permission policies need to be reviewed to remove unnecessary permissions. This approach lessens the scope of impact should credentials be inadvertently exposed or otherwise accessed without authorization. **Common anti-patterns** - Defaulting to granting users administrator permissions. - Creating policies that are overly permissive, but without full administrator privileges. - Keeping permission policies after they are no longer needed. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance As teams and projects are just getting started, permissive permission policies might be used to inspire innovation and agility. For example, in a development or test environment, developers can be given access to a broad set of AWS services. We recommend that you evaluate access continuously and restrict access to only those services and service actions that are necessary to complete the current job. We recommend this evaluation for both human and machine identities. Machine identities, sometimes called system or service accounts, are identities that give AWS access to applications or servers. This access is especially important in a production environment, where overly permissive permissions can have a broad impact and potentially expose customer data. AWS provides multiple methods to help identify unused users, roles, permissions, and credentials. AWS can also help analyze access activity of IAM users and roles, including associated access keys, and access to AWS resources such as objects in Amazon S3 buckets. AWS Identity and Access Management Access Analyzer policy generation can assist you in creating restrictive permission policies based on the actual services and actions a principal interacts with. Attribute-based access control (ABAC) can help simplify permissions management, as you can provide permissions to users using their attributes instead of attaching permissions policies directly to each user. ### Implementation steps - Use AWS Identity and Access Management Access Analyzer: IAM Access Analyzer helps identify resources in your organization and accounts, such as Amazon Simple Storage Service (Amazon S3) buckets or IAM roles that are shared with an external entity. - Use IAM Access Analyzer policy generation: IAM Access Analyzer policy generation helps you create fine-grained permission policies based on an IAM user or roleโ€™s access activity. - Test permissions across lower environments before production: Start by using the less critical sandbox and development environments to test the permissions required for various job functions using IAM Access Analyzer. Then, progressively tighten and validate these permissions across the testing, quality assurance, and staging environments before applying them to production. The lower environments can have more relaxed permissions initially, as service control policies (SCPs) enforce guardrails by limiting the maximum permissions granted. - Determine an acceptable timeframe and usage policy for IAM users and roles: Use the last accessed timestamp to identify unused users and roles and remove them. 
Review service and action last accessed information to identify and scope permissions for specific users and roles. For example, you can use last accessed information to identify the specific Amazon S3 actions that your application role requires and restrict the roleโ€™s access to only those actions. Last accessed information features are available in the AWS Management Console and programmatically allow you to incorporate them into your infrastructure workflows and automated tools. - Consider logging data events in AWS CloudTrail: By default, CloudTrail does not log data events such as Amazon S3 object-level activity (for example, GetObject and DeleteObject) or Amazon DynamoDB table activities (for example, PutItem and DeleteItem). Consider using logging for these events to determine what users and roles need access to specific Amazon S3 objects or DynamoDB table items.
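To support the last-accessed review described above, the sketch below (Python with boto3; the role ARN is hypothetical) generates service last-accessed details for a role and lists the services the role has never used, which are candidates for removal from its permission policies.

```python
# Minimal sketch: report services that a role has never accessed, based on IAM
# service last-accessed data. The role ARN is a hypothetical placeholder.
import time

import boto3

iam = boto3.client("iam")
role_arn = "arn:aws:iam::111122223333:role/OrdersReportingRole"  # hypothetical

job = iam.generate_service_last_accessed_details(Arn=role_arn)

# Poll until the asynchronous report job finishes.
while True:
    details = iam.get_service_last_accessed_details(JobId=job["JobId"])
    if details["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(2)

for service in details.get("ServicesLastAccessed", []):
    if service["TotalAuthenticatedEntities"] == 0:
        print("Never used by this role:", service["ServiceNamespace"])
```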

๐Ÿ’ผ SEC03-BP05 Define permission guardrails for your organization

Use permission guardrails to reduce the scope of available permissions that can be granted to principals. The permission policy evaluation chain includes your guardrails to determine the effective permissions of a principal when making authorization decisions. You can define guardrails using a layer-based approach. Apply some guardrails broadly across your entire organization and apply others granularly to temporary access sessions. **Desired outcome** You have clear isolation of environments using separate AWS accounts. Service control policies (SCPs) are used to define organization-wide permission guardrails. Broader guardrails are set at the hierarchy levels closest to your organization root, and more strict guardrails are set closer to the level of individual accounts. Where supported, resource policies define the conditions that a principal must satisfy to gain access to a resource. Resource policies also scope down the set of allowable actions, where appropriate. Permission boundaries are placed on principals that manage workload permissions, delegating permission management to individual workload owners. **Common anti-patterns** - Creating member AWS accounts within an AWS Organization, but not using SCPs to restrict the use and permissions available to their root credentials. - Assigning permissions based on least privilege, but not placing guardrails on the maximum set of permissions that can be granted. - Relying on the implicit deny foundation of AWS IAM to restrict permissions, trusting that policies will not grant an undesired explicit allow permission. - Running multiple workload environments in the same AWS account, and then relying on mechanisms such as VPCs, tags, or resource policies to enforce permission boundaries. **Benefits of establishing this best practice** Permission guardrails help build confidence that undesired permissions cannot be granted, even when a permission policy attempts to do so. This can simplify defining and managing permissions by reducing the maximum scope of permissions needing consideration. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance We recommend you use a layer-based approach to define permission guardrails for your organization. This approach systematically reduces the maximum set of possible permissions as additional layers are applied. This helps you grant access based on the principle of least privilege, reducing the risk of unintended access due to policy misconfiguration. The first step to establish permission guardrails is to isolate your workloads and environments into separate AWS accounts. Principals from one account cannot access resources in another account without explicit permission to do so, even when both accounts are in the same AWS organization or under the same organizational unit (OU). You can use OUs to group accounts you want to administer as a single unit. The next step is to reduce the maximum set of permissions that you can grant to principals within the member accounts of your organization. You can use service control policies (SCPs) for this purpose, which you can apply to either an OU or an account. SCPs can enforce common access controls, such as restricting access to specific AWS Regions, help prevent resources from being deleted, or disabling potentially risky service actions. SCPs that you apply to the root of your organization only affect its member accounts, not the management account. SCPs only govern the principals within your organization. 
Your SCPs don't govern principals outside your organization that are accessing your resources. If you are using AWS Control Tower, you can leverage its controls and landing zones as the foundation for your permission guardrails and multi-account environment. The landing zones provide a pre-configured, secure baseline environment with separate accounts for different workloads and applications. The guardrails enforce mandatory controls around security, operations, and compliance through a combination of Service Control Policies (SCPs), AWS Config rules, and other configurations. However, when using Control Tower guardrails and landing zones alongside custom Organization SCPs, it's crucial to follow the best practices outlined in the AWS documentation to avoid conflicts and ensure proper governance. By adhering to these guidelines, you can effectively leverage Control Tower's guardrails, landing zones, and custom SCPs while mitigating potential conflicts and ensuring proper governance and control over your multi-account AWS environment. A further step is to use IAM resource policies to scope the available actions that you can take on the resources they govern, along with any conditions that the acting principal must meet. This can be as broad as allowing all actions so long as the principal is part of your organization (using the PrincipalOrgId condition key), or as granular as only allowing specific actions by a specific IAM role. You can take a similar approach with conditions in IAM role trust policies. If a resource or role trust policy explicitly names a principal in the same account as the role or resource it governs, that principal does not need an attached IAM policy that grants the same permissions. If the principal is in a different account from the resource, then the principal does need an attached IAM policy that grants those permissions. Often, a workload team will want to manage the permissions their workload requires. This may require them to create new IAM roles and permission policies. You can capture the maximum scope of permissions the team is allowed to grant in an IAM permission boundary, and associate this document to an IAM role the team can then use to manage their IAM roles and permissions. This approach can provide them the flexibility to complete their work while mitigating risks of having IAM administrative access. A more granular step is to implement privileged access management (PAM) and temporary elevated access management (TEAM) techniques. One example of PAM is to require principals to perform multi-factor authentication before taking privileged actions. TEAM requires a solution that manages the approval and timeframe that a principal is allowed to have elevated access. One approach is to temporarily add the principal to the role trust policy for an IAM role that has elevated access. Another approach is to, under normal operation, scope down the permissions granted to a principal by an IAM role using a session policy, and then temporarily lift this restriction during the approved time window. To learn more about solutions that AWS and select partners validated, see Temporary elevated access. ### Implementation steps 1. Isolate your workloads and environments into separate AWS accounts. 2. Use SCPs to reduce the maximum set of permissions that can be granted to principals within the member accounts of your organization. 1. 
When defining SCPs to reduce the maximum set of permissions that can be granted to principals within your organization's member accounts, you can choose between an allow list or deny list approach. The allow list strategy explicitly specifies the access that is allowed and implicitly blocks all other access. The deny list strategy explicitly specifies the access that isn't allowed and allows all other access by default. Both strategies have their advantages and trade-offs, and the appropriate choice depends on your organization's specific requirements and risk model. 2. Additionally, review the service control policy examples to understand how to construct SCPs effectively. 3. Use IAM resource policies to scope down and specify conditions for permitted actions on resources. Use conditions in IAM role trust policies to create restrictions on assuming roles. 4. Assign IAM permission boundaries to IAM roles that workload teams can then use to manage their own workload IAM roles and permissions. 5. Evaluate PAM and TEAM solutions based on your needs.
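
As a minimal sketch of the SCP guardrail layer described above, the following example uses the AWS SDK for Python (Boto3) to create a deny-list SCP that restricts activity to approved Regions and prevents member accounts from leaving the organization, then attaches it to an organizational unit. The Region list and OU ID are placeholders for illustration.

```python
import json
import boto3

org = boto3.client("organizations")

# Deny-list SCP: block use of Regions outside an approved set and prevent
# member accounts from leaving the organization. Global services are exempted
# through NotAction so IAM, Organizations, STS, and Support keep working.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnapprovedRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["us-east-1", "eu-west-1"]}
            },
        },
        {
            "Sid": "DenyLeavingOrganization",
            "Effect": "Deny",
            "Action": "organizations:LeaveOrganization",
            "Resource": "*",
        },
    ],
}

policy = org.create_policy(
    Name="workload-guardrails",
    Description="Region restriction and organization-membership guardrails",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp_document),
)

# Attach the SCP to an OU (placeholder ID) so it applies to every member account beneath it.
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-examplerootid-exampleouid",
)
```

The NotAction exemption illustrates the deny-list trade-off: the guardrail stays broad while carving out the global services that must remain reachable from any Region.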

💼 SEC03-BP06 Manage access based on lifecycle

Monitor and adjust the permissions granted to your principals (users, roles, and groups) throughout their lifecycle within your organization. Adjust group memberships as users change roles, and remove access when a user leaves the organization. **Desired outcome** - You monitor and adjust permissions throughout the lifecycle of principals within the organization, reducing risk of unnecessary privileges. - You grant appropriate access when you create a user. - You modify access as the user's responsibilities change, and you remove access when the user is no longer active or has left the organization. - You centrally manage changes to your users, roles, and groups. - You use automation to propagate changes to your AWS environments. **Common anti-patterns** - Granting excessive or broad access privileges to identities upfront, beyond what is initially required. - Not reviewing and adjusting access privileges as identities' roles and responsibilities change over time. - Leaving inactive or terminated identities with active access privileges. This increases the risk of unauthorized access. - Not leveraging automation to manage the lifecycle of identities. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Carefully manage and adjust access privileges that you grant to identities (such as users, roles, groups) throughout their lifecycle. This lifecycle includes the initial onboarding phase, ongoing changes in roles and responsibilities, and eventual offboarding or termination. Proactively manage access based on the stage of the lifecycle to maintain the appropriate access level. Adhere to the principle of least privilege to reduce the risk of excessive or unnecessary access privileges. You can manage the lifecycle of IAM users directly within the AWS account, or through federation from your workforce identity provider to AWS IAM Identity Center. For IAM users, you can create, modify, and delete users and their associated permissions within the AWS account. For federated users, you can use IAM Identity Center to manage their lifecycle by synchronizing user and group information from your organization's identity provider using the System for Cross-domain Identity Management (SCIM) protocol. SCIM is an open standard protocol for automated provisioning and deprovisioning of user identities across different systems. By integrating your identity provider with IAM Identity Center using SCIM, you can automatically synchronize user and group information, helping to validate that access privileges are granted, modified, or revoked based on changes in your organization's authoritative identity source. As the roles and responsibilities of employees change within your organization, adjust their access privileges accordingly. You can use IAM Identity Center's permission sets to define different job roles or responsibilities and associate them with the appropriate IAM policies and permissions. When an employee's role changes, you can update their assigned permission set to reflect their new responsibilities. Verify that they have the necessary access while adhering to the principle of least privilege. ### Implementation steps 1. Define and document an access management lifecycle process, including procedures for granting initial access, periodic reviews, and offboarding. 2. Implement IAM roles, groups, and permissions boundaries to manage access collectively and enforce maximum permissible access levels. 3. 
Integrate with a federated identity provider (such as Microsoft Active Directory, Okta, Ping Identity) as the authoritative source for user and group information using IAM Identity Center. 4. Use the SCIM protocol to synchronize user and group information from the identity provider into IAM Identity Center's Identity Store. 5. Create permission sets in IAM Identity Center that represent different job roles or responsibilities within your organization. Define the appropriate IAM policies and permissions for each permission set. 6. Implement regular access reviews, prompt access revocation, and continuous improvement of the access management lifecycle process. 7. Provide training and awareness to employees on access management best practices.
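
The following sketch illustrates the permission-set approach described above using the AWS SDK for Python (Boto3). The instance ARN, account ID, group ID, and the "ReadOnlyAuditor" permission set are hypothetical placeholders; substitute values from your own IAM Identity Center instance and identity source.

```python
import boto3

sso_admin = boto3.client("sso-admin")

# Placeholder identifiers: look these up in your own IAM Identity Center instance.
INSTANCE_ARN = "arn:aws:sso:::instance/ssoins-EXAMPLE"
ACCOUNT_ID = "111122223333"
GROUP_ID = "example-identity-store-group-id"  # group synced from your IdP via SCIM

# Define a job-function permission set that represents a responsibility such as read-only auditing.
permission_set_arn = sso_admin.create_permission_set(
    InstanceArn=INSTANCE_ARN,
    Name="ReadOnlyAuditor",
    Description="Read-only access for audit responsibilities",
    SessionDuration="PT4H",
)["PermissionSet"]["PermissionSetArn"]

# Attach an AWS managed policy that matches the responsibility.
sso_admin.attach_managed_policy_to_permission_set(
    InstanceArn=INSTANCE_ARN,
    PermissionSetArn=permission_set_arn,
    ManagedPolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)

# Assign the permission set to a group from the identity source. When a user changes
# roles, moving them between groups in the IdP updates their effective AWS access.
sso_admin.create_account_assignment(
    InstanceArn=INSTANCE_ARN,
    TargetId=ACCOUNT_ID,
    TargetType="AWS_ACCOUNT",
    PermissionSetArn=permission_set_arn,
    PrincipalType="GROUP",
    PrincipalId=GROUP_ID,
)
```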

💼 SEC03-BP07 Analyze public and cross-account access

Continually monitor findings that highlight public and cross-account access. Reduce public access and cross-account access to only the specific resources that require this access. **Desired outcome** - Know which of your AWS resources are shared and with whom. - Continually monitor and audit your shared resources to verify they are shared with only authorized principals. **Common anti-patterns** - Not keeping an inventory of shared resources. - Not following a process for approval of cross-account or public access to resources. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance If your account is in AWS Organizations, you can grant access to resources to the entire organization, specific organizational units, or individual accounts. If your account is not a member of an organization, you can share resources with individual accounts. You can grant direct cross-account access using resource-based policies โ€” for example, Amazon Simple Storage Service (Amazon S3) bucket policies โ€” or by allowing a principal in another account to assume an IAM role in your account. When using resource policies, verify that access is only granted to authorized principals. Define a process to approve all resources which are required to be publicly available. AWS Identity and Access Management Access Analyzer uses provable security to identify all access paths to a resource from outside of its account. It reviews resource policies continuously, and reports findings of public and cross-account access to make it simple for you to analyze potentially broad access. Consider configuring IAM Access Analyzer with AWS Organizations to verify that you have visibility to all your accounts. IAM Access Analyzer also allows you to preview findings before deploying resource permissions. This allows you to validate that your policy changes grant only the intended public and cross-account access to your resources. When designing for multi-account access, you can use trust policies to control in what cases a role can be assumed. For example, you could use the PrincipalOrgId condition key to deny an attempt to assume a role from outside your AWS Organizations. AWS Config can report resources that are misconfigured, and through AWS Config policy checks, can detect resources that have public access configured. Services such as AWS Control Tower and AWS Security Hub simplify deploying detective controls and guardrails across AWS Organizations to identify and remediate publicly exposed resources. For example, AWS Control Tower has a managed guardrail which can detect if any Amazon EBS snapshots are restorable by AWS accounts. ### Implementation steps 1. Consider using AWS Config for AWS Organizations: AWS Config allows you to aggregate findings from multiple accounts within an AWS Organizations to a delegated administrator account. This provides a comprehensive view, and allows you to deploy AWS Config Rules across accounts to detect publicly accessible resources. 2. Configure AWS Identity and Access Management Access Analyzer: IAM Access Analyzer helps you identify resources in your organization and accounts, such as Amazon S3 buckets or IAM roles that are shared with an external entity. 3. Use auto-remediation in AWS Config to respond to changes in public access configuration of Amazon S3 buckets: You can automatically turn on the block public access settings for Amazon S3 buckets. 4. 
Implement monitoring and alerting to identify if Amazon S3 buckets have become public: You must have monitoring and alerting in place to identify when Amazon S3 Block Public Access is turned off, and if Amazon S3 buckets become public. Additionally, if you are using AWS Organizations, you can create a service control policy that prevents changes to Amazon S3 public access policies. AWS Trusted Advisor checks for Amazon S3 buckets that have open access permissions. Bucket permissions that grant upload or delete access to everyone create potential security issues by allowing anyone to add, modify, or remove items in a bucket. The Trusted Advisor check examines explicit bucket permissions and associated bucket policies that might override the bucket permissions. You can also use AWS Config to monitor your Amazon S3 buckets for public access. When reviewing access controls for Amazon S3 buckets, it is important to consider the nature of the data stored within them. Amazon Macie is a service designed to help you discover and protect sensitive data, such as Personally Identifiable Information (PII), Protected Health Information (PHI), and credentials like private keys or AWS access keys.
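
As a starting point for the IAM Access Analyzer configuration described above, the following sketch (AWS SDK for Python, run from the delegated administrator account) creates an organization-level analyzer and lists its active public-access findings. The analyzer name is an arbitrary placeholder.

```python
import boto3

analyzer_client = boto3.client("accessanalyzer")

# Create an organization-wide analyzer so findings cover every member account.
analyzer_arn = analyzer_client.create_analyzer(
    analyzerName="org-external-access",
    type="ORGANIZATION",
)["arn"]

# Review active findings that report public access so they can be triaged or suppressed.
findings = analyzer_client.list_findings(
    analyzerArn=analyzer_arn,
    filter={
        "status": {"eq": ["ACTIVE"]},
        "isPublic": {"eq": ["true"]},
    },
)["findings"]

for finding in findings:
    print(finding["resourceType"], finding["resource"], finding.get("action"))
```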

💼 SEC03-BP08 Share resources securely within your organization

As the number of workloads grows, you might need to share access to resources in those workloads or provision the resources multiple times across multiple accounts. You might have constructs to compartmentalize your environment, such as having development, testing, and production environments. However, having separation constructs does not limit you from being able to share securely. By sharing components that overlap, you can reduce operational overhead and allow for a consistent experience without guessing what you might have missed while creating the same resource multiple times. **Desired outcome** - Minimize unintended access by using secure methods to share resources within your organization, and help with your data loss prevention initiative. - Reduce your operational overhead compared to managing individual components, reduce errors from manually creating the same component multiple times, and increase your workloadsโ€™ scalability. - You can benefit from decreased time to resolution in multi-point failure scenarios, and increase your confidence in determining when a component is no longer needed. - For prescriptive guidance on analyzing externally shared resources, see SEC03-BP07 Analyze public and cross-account access. **Common anti-patterns** - Lack of process to continually monitor and automatically alert on unexpected external share. - Lack of baseline on what should be shared and what should not. - Defaulting to a broadly open policy rather than sharing explicitly when required. - Manually creating foundational resources that overlap when required. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Architect your access controls and patterns to govern the consumption of shared resources securely and only with trusted entities. Monitor shared resources and review shared resource access continuously, and be alerted on inappropriate or unexpected sharing. Review Analyze public and cross-account access to help you establish governance to reduce the external access to only resources that require it, and to establish a process to monitor continuously and alert automatically. Cross-account sharing within AWS Organizations is supported by a number of AWS services, such as AWS Security Hub, Amazon GuardDuty, and AWS Backup. These services allow for data to be shared to a central account, be accessible from a central account, or manage resources and data from a central account. For example, AWS Security Hub can transfer findings from individual accounts to a central account where you can view all the findings. AWS Backup can take a backup for a resource and share it across accounts. You can use AWS Resource Access Manager (AWS RAM) to share other common resources, such as VPC subnets and Transit Gateway attachments, AWS Network Firewall, or Amazon SageMaker AI pipelines. To restrict your account to only share resources within your organization, use service control policies (SCPs) to prevent access to external principals. When sharing resources, combine identity-based controls and network controls to create a data perimeter for your organization to help protect against unintended access. A data perimeter is a set of preventive guardrails to help verify that only your trusted identities are accessing trusted resources from expected networks. These controls place appropriate limits on what resources can be shared and prevent sharing or exposing resources that should not be allowed. 
For example, as a part of your data perimeter, you can use VPC endpoint policies and the AWS:PrincipalOrgId condition to ensure the identities accessing your Amazon S3 buckets belong to your organization. It is important to note that SCPs do not apply to service-linked roles or AWS service principals. When using Amazon S3, turn off ACLs for your Amazon S3 bucket and use IAM policies to define access control. For restricting access to an Amazon S3 origin from Amazon CloudFront, migrate from origin access identity (OAI) to origin access control (OAC) which supports additional features including server-side encryption with AWS Key Management Service. In some cases, you might want to allow sharing resources outside of your organization or grant a third party access to your resources. For prescriptive guidance on managing permissions to share resources externally, see Permissions management. ### Implementation steps 1. Use AWS Organizations: AWS Organizations is an account management service that allows you to consolidate multiple AWS accounts into an organization that you create and centrally manage. You can group your accounts into organizational units (OUs) and attach different policies to each OU to help you meet your budgetary, security, and compliance needs. You can also control how AWS artificial intelligence (AI) and machine learning (ML) services can collect and store data, and use the multi-account management of the AWS services integrated with Organizations. 2. Integrate AWS Organizations with AWS services: When you use an AWS service to perform tasks on your behalf in the member accounts of your organization, AWS Organizations creates an IAM service-linked role (SLR) for that service in each member account. You should manage trusted access using the AWS Management Console, the AWS APIs, or the AWS CLI. For prescriptive guidance on turning on trusted access, see Using AWS Organizations with other AWS services and AWS services that you can use with Organizations. 3. Establish a data perimeter: A data perimeter provides a clear boundary of trust and ownership. On AWS, it is typically represented as your AWS organization managed by AWS Organizations, along with any on-premises networks or systems that access your AWS resources. The goal of the data perimeter is to verify that access is allowed if the identity is trusted, the resource is trusted, and the network is expected. However, establishing a data perimeter is not a one-size-fits-all approach. Evaluate and adopt the control objectives outlined in the Building a Perimeter on AWS whitepaper based on your specific security risk models and requirements. You should carefully consider your unique risk posture and implement the perimeter controls that align with your security needs. 4. Use resource sharing in AWS services and restrict accordingly: Many AWS services allow you to share resources with another account, or target a resource in another account, such as Amazon Machine Images (AMIs) and AWS Resource Access Manager (AWS RAM). Restrict the ModifyImageAttribute API to specify the trusted accounts to share the AMI with. Specify the ram:RequestedAllowsExternalPrincipals condition when using AWS RAM to constrain sharing to your organization only, to help prevent access from untrusted identities. For prescriptive guidance and considerations, see Resource sharing and external targets. 5. 
Use AWS RAM to share securely in an account or with other AWS accounts: AWS RAM helps you securely share the resources that you have created with roles and users in your account and with other AWS accounts. In a multi-account environment, AWS RAM allows you to create a resource once and share it with other accounts. This approach helps reduce your operational overhead while providing consistency, visibility, and auditability through integrations with Amazon CloudWatch and AWS CloudTrail, which you do not receive when using cross-account access. If you have resources that you shared previously using a resource-based policy, you can use the PromoteResourceShareCreatedFromPolicy API or an equivalent to promote the resource share to a full AWS RAM resource share. In some cases, you might need to take additional steps to share resources. For example, to share an encrypted snapshot, you also need to share the AWS KMS key used to encrypt it.
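
The following sketch, using the AWS SDK for Python (Boto3), shows one way to create an AWS RAM resource share that is restricted to principals inside your organization. The subnet ARN and account ID are placeholders for illustration.

```python
import boto3

ram = boto3.client("ram")

# Share a subnet (placeholder ARN) with another account in the same organization.
# allowExternalPrincipals=False keeps the share inside your organization, complementing
# the ram:RequestedAllowsExternalPrincipals condition described above.
share = ram.create_resource_share(
    name="shared-network-subnets",
    resourceArns=["arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0123456789abcdef0"],
    principals=["222233334444"],  # an account ID, or an OU/organization ARN
    allowExternalPrincipals=False,
)

print(share["resourceShare"]["resourceShareArn"])
```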

💼 SEC03-BP09 Share resources securely with a third party

The security of your cloud environment doesn't stop at your organization. Your organization might rely on a third party to manage a portion of your data. The permission management for the third-party managed system should follow the practice of just-in-time access using the principle of least privilege with temporary credentials. By working closely with a third party, you can reduce the scope of impact and risk of unintended access together. **Desired outcome** - You avoid using long-term AWS Identity and Access Management (IAM) credentials like access keys and secret keys, as they pose a security risk if misused. - You use IAM roles and temporary credentials to improve your security posture and minimize the operational overhead of managing long-term credentials. - When granting third-party access, you use a universally unique identifier (UUID) as the external ID in the IAM trust policy and keep the IAM policies attached to the role under your control to ensure least privilege access. - For prescriptive guidance on analyzing externally shared resources, see SEC03-BP07 Analyze public and cross-account access. **Common anti-patterns** - Using the default IAM trust policy without any conditions. - Using long-term IAM credentials and access keys. - Reusing external IDs. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance You might want to allow sharing resources outside of AWS Organizations or grant a third party access to your account. For example, a third party might provide a monitoring solution that needs to access resources within your account. In those cases, create an IAM cross-account role with only the privileges needed by the third party. Additionally, define a trust policy using the external ID condition. When using an external ID, you or the third party can generate a unique ID for each customer, third party, or tenancy. The unique ID should not be controlled by anyone but you after it's created. The third party must implement a process to relate the external ID to the customer in a secure, auditable, and reproducible manner. You can also use IAM Roles Anywhere to manage IAM roles for applications outside of AWS that use AWS APIs. If the third party no longer requires access to your environment, remove the role. Avoid providing long-term credentials to a third party. Maintain awareness of other AWS services that support sharing, such as the AWS Well-Architected Tool allowing sharing a workload with other AWS accounts, and AWS Resource Access Manager helping you securely share an AWS resource you own with other accounts. ### Implementation steps 1. Use cross-account roles to provide access to external accounts. Cross-account roles reduce the amount of sensitive information that is stored by external accounts and third parties for servicing their customers. Cross-account roles allow you to grant access to AWS resources in your account securely to a third party, such as AWS Partners or other accounts in your organization, while maintaining the ability to manage and audit that access. The third party might be providing service to you from a hybrid infrastructure or alternatively pulling data into an offsite location. IAM Roles Anywhere helps you allow third-party workloads to securely interact with your AWS workloads and further reduce the need for long-term credentials. You should not use long-term credentials or access keys associated with users to provide external account access. 
Instead, use cross-account roles to provide the cross-account access. 2. Perform due diligence and ensure secure access for third-party SaaS providers. When sharing resources with third-party SaaS providers, perform thorough due diligence to ensure they have a secure and responsible approach to accessing your AWS resources. Evaluate their shared responsibility model to understand what security measures they provide and what falls under your responsibility. Ensure that the SaaS provider has a secure and auditable process for accessing your resources, including the use of external IDs and least privilege access principles. The use of external IDs helps address the confused deputy problem. Implement security controls to ensure secure access and adherence to the principle of least privilege when granting access to third-party SaaS providers. This may include the use of external IDs, universally unique identifiers (UUIDs), and IAM trust policies that limit access to only what is strictly necessary. Work closely with the SaaS provider to establish secure access mechanisms, regularly review their access to your AWS resources, and conduct audits to ensure compliance with your security requirements. 3. Deprecate customer-provided long-term credentials. Deprecate the use of long-term credentials and use cross-account roles or IAM Roles Anywhere. If you must use long-term credentials, establish a plan to migrate to role-based access. For details on managing keys, see Identity management. Also, work with your AWS account team and the third party to establish a risk mitigation runbook. For prescriptive guidance on responding to and mitigating the potential impact of a security incident, see Incident response. 4. Verify that setup has prescriptive guidance or is automated. The external ID is not treated as a secret, but it must not be an easily guessable value, such as a phone number, name, or account ID. Make the external ID a read-only field so that it cannot be changed for the purpose of impersonating the setup. You or the third party can generate the external ID. Define a process to determine who is responsible for generating the ID. Regardless of which entity creates the external ID, the third party must enforce uniqueness and consistent formatting across customers. The policy created for cross-account access in your accounts must follow the least-privilege principle. The third party must provide a role policy document or an automated setup mechanism, such as an AWS CloudFormation template or equivalent, for you. This reduces the chance of errors associated with manual policy creation and offers an auditable trail. Ideally, the third party provides an automated, auditable setup mechanism; if they provide only a role policy document outlining the access needed, automate the setup of the role yourself using an AWS CloudFormation template or equivalent, and monitor for changes with drift detection. 5. Account for changes. Your account structure, your need for the third party, or the service offering they provide might change. You should anticipate changes and failures, and plan accordingly with the right people, process, and technology. Audit the level of access you provide on a periodic basis, and implement detection methods to alert you to unexpected changes. Monitor and audit the use of the role and the datastore of the external IDs. You should be prepared to revoke third-party access, either temporarily or permanently, as a result of unexpected changes or access patterns.
Also, measure the impact to your revocation operation, including the time it takes to perform, the people involved, the cost, and the impact to other resources. For prescriptive guidance on detection methods, see the Detection best practices.
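
As an illustration of the cross-account role pattern with an external ID, the following sketch uses the AWS SDK for Python (Boto3) to create a role that the third party's account can assume only when it supplies the generated external ID, and attaches a least-privilege read-only policy. The account ID, role name, and attached policy are placeholder assumptions; scope the permissions to what the third party actually needs.

```python
import json
import secrets
import boto3

iam = boto3.client("iam")

# Placeholder values: the third party's AWS account ID and a unique, non-guessable
# external ID generated for this specific customer relationship.
THIRD_PARTY_ACCOUNT_ID = "444455556666"
external_id = secrets.token_hex(16)

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{THIRD_PARTY_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }
    ],
}

iam.create_role(
    RoleName="third-party-monitoring",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    MaxSessionDuration=3600,
    Description="Cross-account role assumable by the monitoring vendor with an external ID",
)

# Grant only the least-privilege permissions the vendor needs, for example read-only monitoring.
iam.attach_role_policy(
    RoleName="third-party-monitoring",
    PolicyArn="arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess",
)
```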

💼 SEC04-BP01 Configure service and application logging

Retain security event logs from services and applications. This is a fundamental principle of security for audit, investigations, and operational use cases, and a common security requirement driven by governance, risk, and compliance (GRC) standards, policies, and procedures. **Desired outcome** An organization should be able to reliably and consistently retrieve security event logs from AWS services and applications in a timely manner when required to fulfill an internal process or obligation, such as a security incident response. Consider centralizing logs for better operational results. **Common anti-patterns** - Logs are stored in perpetuity or deleted too soon. - Everybody can access logs. - Relying entirely on manual processes for log governance and use. - Storing every single type of log just in case it is needed. - Checking log integrity only when necessary. **Benefits of establishing this best practice** Implement a root cause analysis (RCA) mechanism for security incidents and a source of evidence for your governance, risk, and compliance obligations. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance During a security investigation or other use cases based on your requirements, you need to be able to review relevant logs to record and understand the full scope and timeline of the incident. Logs are also required for alert generation, indicating that certain actions of interest have happened. It is critical to select, turn on, store, and set up querying and retrieval mechanisms and alerting. ### Implementation steps - Select and use log sources. Ahead of a security investigation, you need to capture relevant logs to retroactively reconstruct activity in an AWS account. Select log sources relevant to your workloads. The log source selection criteria should be based on the use cases required by your business. Establish a trail for each AWS account using AWS CloudTrail or an AWS Organizations trail, and configure an Amazon S3 bucket for it. AWS CloudTrail is a logging service that tracks API calls made against an AWS account capturing AWS service activity. Itโ€™s turned on by default with a 90-day retention of management events that can be retrieved through CloudTrail Event history using the AWS Management Console, the AWS CLI, or an AWS SDK. For longer retention and visibility of data events, create a CloudTrail trail and associate it with an Amazon S3 bucket, and optionally with an Amazon CloudWatch log group. Alternatively, you can create a CloudTrail Lake, which retains CloudTrail logs for up to seven years and provides a SQL-based querying facility. AWS recommends that customers using a VPC turn on network traffic and DNS logs using VPC Flow Logs and Amazon Route 53 resolver query logs, respectively, and streaming them to either an Amazon S3 bucket or a CloudWatch log group. You can create a VPC flow log for a VPC, a subnet, or a network interface. For VPC Flow Logs, you can be selective on how and where you use Flow Logs to reduce cost. AWS CloudTrail Logs, VPC Flow Logs, and Route 53 resolver query logs are the basic logging sources to support security investigations in AWS. You can also use Amazon Security Lake to collect, normalize, and store this log data in Apache Parquet format and Open Cybersecurity Schema Framework (OCSF), which is ready for querying. Security Lake also supports other AWS logs and logs from third-party sources. 
AWS services can generate logs not captured by the basic log sources, such as Elastic Load Balancing logs, AWS WAF logs, AWS Config recorder logs, Amazon GuardDuty findings, Amazon Elastic Kubernetes Service (Amazon EKS) audit logs, and Amazon EC2 instance operating system and application logs. For a full list of logging and monitoring options, see Appendix A: Cloud capability definitions โ€“ Logging and Events of the AWS Security Incident Response Guide. - Research logging capabilities for each AWS service and application: Each AWS service and application provides you with options for log storage, each of which with its own retention and lifecycle capabilities. The two most common log storage services are Amazon Simple Storage Service (Amazon S3) and Amazon CloudWatch. For long retention periods, it is recommended to use Amazon S3 for its cost effectiveness and flexible lifecycle capabilities. If the primary logging option is Amazon CloudWatch Logs, as an option, you should consider archiving less frequently accessed logs to Amazon S3. - Select log storage: The choice of log storage is generally related to which querying tool you use, retention capabilities, familiarity, and cost. The main options for log storage are an Amazon S3 bucket or a CloudWatch Log group. An Amazon S3 bucket provides cost-effective, durable storage with an optional lifecycle policy. Logs stored in Amazon S3 buckets can be queried using services such as Amazon Athena. A CloudWatch log group provides durable storage and a built-in query facility through CloudWatch Logs Insights. - Identify appropriate log retention: When you use an Amazon S3 bucket or CloudWatch log group to store logs, you must establish adequate lifecycles for each log source to optimize storage and retrieval costs. Customers generally have between three months to one year of logs readily available for querying, with retention of up to seven years. The choice of availability and retention should align with your security requirements and a composite of statutory, regulatory, and business mandates. - Use logging for each AWS service and application with proper retention and lifecycle policies: - Configure AWS CloudTrail Trail - Configure VPC Flow Logs - Configure Amazon GuardDuty Finding Export - Configure AWS Config recording - Configure AWS WAF web ACL traffic - Configure AWS Network Firewall network traffic logs - Configure Elastic Load Balancing access logs - Configure Amazon Route 53 resolver query logs - Configure Amazon RDS logs - Configure Amazon EKS Control Plane logs - Configure Amazon CloudWatch agent for Amazon EC2 instances and on-premises servers - Select and implement querying mechanisms for logs: For log queries, you can use CloudWatch Logs Insights for data stored in CloudWatch log groups, and Amazon Athena and Amazon OpenSearch Service for data stored in Amazon S3. You can also use third-party querying tools such as a security information and event management (SIEM) service. The process for selecting a log querying tool should consider the people, process, and technology aspects of your security operations. Select a tool that fulfills operational, business, and security requirements, and is both accessible and maintainable in the long term. Keep in mind that log querying tools work optimally when the number of logs to be scanned is kept within the toolโ€™s limits. It is not uncommon to have multiple querying tools because of cost or technical constraints. 
For example, you might use a third-party security information and event management (SIEM) tool to perform queries for the last 90 days of data, but use Athena to perform queries beyond 90 days because of the log ingestion cost of a SIEM. Regardless of the implementation, verify that your approach minimizes the number of tools required to maximize operational efficiency, especially during a security event investigation. - Use logs for alerting: AWS provides alerting through several security services: - AWS Config monitors and records your AWS resource configurations and allows you to automate the evaluation and remediation against desired configurations. - Amazon GuardDuty is a threat detection service that continually monitors for malicious activity and unauthorized behavior to protect your AWS accounts and workloads. GuardDuty ingests, aggregates, and analyzes information from sources, such as AWS CloudTrail management and data events, DNS logs, VPC Flow Logs, and Amazon EKS Audit logs. GuardDuty pulls independent data streams directly from CloudTrail, VPC Flow Logs, DNS query logs, and Amazon EKS. You donโ€™t have to manage Amazon S3 bucket policies or modify the way you collect and store logs. It is still recommended to retain these logs for your own investigation and compliance purposes. - AWS Security Hub provides a single place that aggregates, organizes, and prioritizes your security alerts or findings from multiple AWS services and optional third-party products to give you a comprehensive view of security alerts and compliance status. You can also use custom alert generation engines for security alerts not covered by these services or for specific alerts relevant to your environment. For information on building these alerts and detections, see Detection in the AWS Security Incident Response Guide.
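
As a minimal example of establishing a trail as described above, the following sketch uses the AWS SDK for Python (Boto3) to create a multi-Region CloudTrail trail with log file validation and to start logging. The bucket name is a placeholder, and the bucket must already exist with a policy that allows CloudTrail to write to it.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder bucket name; the bucket must already exist and carry a bucket policy
# that allows CloudTrail to deliver log files to it.
LOG_BUCKET = "example-org-cloudtrail-logs"

# Create a multi-Region trail with log file validation so log integrity can be verified later.
trail = cloudtrail.create_trail(
    Name="organization-management-events",
    S3BucketName=LOG_BUCKET,
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,
    IncludeGlobalServiceEvents=True,
)

# Trails do not record events until logging is started explicitly.
cloudtrail.start_logging(Name=trail["TrailARN"])
```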

💼 SEC04-BP02 Capture logs, findings, and metrics in standardized locations

Security teams rely on logs and findings to analyze events that may indicate unauthorized activity or unintentional changes. To streamline this analysis, capture security logs and findings in standardized locations. This makes data points of interest available for correlation and can simplify tool integrations. **Desired outcome** You have a standardized approach to collect, analyze, and visualize log data, findings, and metrics. Security teams can efficiently correlate, analyze, and visualize security data across disparate systems to discover potential security events and identify anomalies. Security information and event management (SIEM) systems or other mechanisms are integrated to query and analyze log data for timely responses, tracking, and escalation of security events. **Common anti-patterns** - Teams independently own and manage logging and metrics collection that is inconsistent with the organization's logging strategy. - Teams don't have adequate access controls to restrict visibility and alteration of the data collected. - Teams don't govern their security logs, findings, and metrics as part of their data classification policy. - Teams neglect data sovereignty and localization requirements when configuring data collections. **Benefits of establishing this best practice** A standardized logging solution to collect and query log data and events improves insights derived from the information they contain. Configuring an automated lifecycle for the collected log data can reduce the costs incurred by log storage. You can build fine-grained access control for the collected log information according to the sensitivity of the data and access patterns needed by your teams. You can integrate tooling to correlate, visualize, and derive insights from the data. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Growth in AWS usage within an organization results in a growing number of distributed workloads and environments. As each of these workloads and environments generate data about the activity within them, capturing and storing this data locally presents a challenge for security operations. Security teams use tools such as SIEM systems to collect data from distributed sources and undergo correlation, analysis, and response workflows. This requires managing a complex set of permissions for accessing the various data sources and additional overhead in operating the extraction, transformation, and loading (ETL) processes. To overcome these challenges, consider aggregating all relevant sources of security log data into a Log Archive account as described in *Organizing Your AWS Environment Using Multiple Accounts*. This includes all security-related data from your workload and logs that AWS services generate, such as AWS CloudTrail, AWS WAF, Elastic Load Balancing, and Amazon Route 53. There are several benefits to capturing this data in standardized locations in a separate AWS account with proper cross-account permissions. This practice helps prevent log tampering within compromised workloads and environments, provides a single integration point for additional tools, and offers a more simplified model for configuring data retention and lifecycle. Evaluate the impacts of data sovereignty, compliance scopes, and other regulations to determine if multiple security data storage locations and retention periods are required. To ease capturing and standardizing logs and findings, evaluate Amazon Security Lake in your Log Archive account. 
You can configure Security Lake to automatically ingest data from common sources such as CloudTrail, Route 53, Amazon EKS, and VPC Flow Logs. You can also configure AWS Security Hub as a data source into Security Lake, allowing you to correlate findings from other AWS services, such as Amazon GuardDuty and Amazon Inspector, with your log data. You can also use third-party data source integrations, or configure custom data sources. All integrations standardize your data into the Open Cybersecurity Schema Framework (OCSF) format, and are stored in Amazon S3 buckets as Parquet files, eliminating the need for ETL processing. Storing security data in standardized locations provides advanced analytics capabilities. AWS recommends you deploy tools for security analytics that operate in an AWS environment into a Security Tooling account that is separate from your Log Archive account. This approach allows you to implement controls at depth to protect the integrity and availability of the logs and log management process, distinct from the tools that access them. Consider using services, such as Amazon Athena, to run on-demand queries that correlate multiple data sources. You can also integrate visualization tools, such as QuickSight. AI-powered solutions are becoming increasingly available and can perform functions such as translating findings into human-readable summaries and natural language interaction. These solutions are often more readily integrated by having a standardized data storage location for querying. ### Implementation steps 1. **Create the Log Archive and Security Tooling accounts** - Using AWS Organizations, create the Log Archive and Security Tooling accounts under a security organizational unit. If you are using AWS Control Tower to manage your organization, the Log Archive and Security Tooling accounts are created for you automatically. Configure roles and permissions for accessing and administering these accounts as required. 2. **Configure your standardized security data locations** - Determine your strategy for creating standardized security data locations. You can achieve this through options like common data lake architecture approaches, third-party data products, or Amazon Security Lake. AWS recommends that you capture security data from AWS Regions that are opted-in for your accounts, even when not actively in use. 3. **Configure data source publication to your standardized locations** - Identify the sources for your security data and configure them to publish into your standardized locations. Evaluate options to automatically export data in the desired format as opposed to those where ETL processes need to be developed. With Amazon Security Lake, you can collect data from supported AWS sources and integrated third-party systems. 4. **Configure tools to access your standardized locations** - Configure tools such as Amazon Athena, QuickSight, or third-party solutions to have the access required to your standardized locations. Configure these tools to operate out of the Security Tooling account with cross-account read access to the Log Archive account where applicable. Create subscribers in Amazon Security Lake to provide these tools access to your data.
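
To illustrate querying a standardized location from the Security Tooling account, the following sketch runs an Amazon Athena query with the AWS SDK for Python (Boto3). The database, table, and OCSF column names, as well as the results bucket, are illustrative assumptions; adjust them to match your Security Lake or log archive catalog.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical Glue database/table names for Security Lake CloudTrail management events.
QUERY = """
SELECT time, api.operation, actor.user.uid, src_endpoint.ip
FROM amazon_security_lake_glue_db.cloudtrail_mgmt_table
WHERE api.operation = 'ConsoleLogin'
LIMIT 100
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "amazon_security_lake_glue_db"},
    ResultConfiguration={"OutputLocation": "s3://example-security-tooling-athena-results/"},
)

print("Query execution ID:", execution["QueryExecutionId"])
```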

💼 SEC04-BP03 Correlate and enrich security alerts

Unexpected activity can generate multiple security alerts by different sources, requiring further correlation and enrichment to understand the full context. Implement automated correlation and enrichment of security alerts to help achieve more accurate incident identification and response. **Desired outcome** As activity generates different alerts within your workloads and environments, automated mechanisms correlate data and enrich that data with additional information. This pre-processing presents a more detailed understanding of the event, which helps your investigators determine the criticality of the event and if it constitutes an incident that requires formal response. This process reduces the load on your monitoring and investigation teams. **Common anti-patterns** - Different groups of people investigate findings and alerts generated by different systems, unless otherwise mandated by separation of duty requirements. - Your organization funnels all security finding and alert data to standard locations, but requires investigators to perform manual correlation and enrichment. - You rely solely on the intelligence of threat detection systems to report on findings and establish criticality. **Benefits of establishing this best practice** Automated correlation and enrichment of alerts helps to reduce the overall cognitive load and manual data preparation required of your investigators. This practice can reduce the time it takes to determine if the event represents an incident and initiate a formal response. Additional context also helps you accurately assess the true severity of an event, as it may be higher or lower than what any one alert suggests. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Security alerts can come from many different sources within AWS, including: - Services such as Amazon GuardDuty, AWS Security Hub, Amazon Macie, Amazon Inspector, AWS Config, AWS Identity and Access Management Access Analyzer, and Network Access Analyzer. - Alerts from automated analysis of AWS service, infrastructure, and application logs, such as from Security Analytics for Amazon OpenSearch Service. - Alarms in response to changes in your billing activity from sources such as Amazon CloudWatch, Amazon EventBridge, or AWS Budgets. - Third-party sources such as threat intelligence feeds and Security Partner Solutions from the AWS Partner Network. - Contact by AWS Trust & Safety or other sources, such as customers or internal employees. In their most fundamental form, alerts contain information about who (the principal or identity) is doing what (the action taken) to what (the resources affected). For each of these sources, identify if there are ways you can create mappings across identifiers for these identities, actions, and resources as the foundation for performing correlation. This can take the form of integrating alert sources with a SIEM tool to perform automated correlation for you, building your own data pipelines and processing, or a combination of both. An example of a service that can perform correlation for you is Amazon Detective. Detective performs ongoing ingestion of alerts from various AWS and third-party sources and uses different forms of intelligence to assemble a visual graph of their relationships to aid investigations. While the initial criticality of an alert is an aid for prioritization, the context in which the alert happened determines its true criticality. 
For example, Amazon GuardDuty might alert that an EC2 instance within your workload is querying an unexpected domain name. GuardDuty might assign low criticality to this alert on its own. However, automated correlation with other activity around the time of the alert might uncover that several hundred EC2 instances were deployed by the same identity, increasing overall operating costs. In this event, this correlated event context would warrant a new security alert and the criticality might be adjusted to high, expediting further action. ### Implementation steps 1. Identify sources for security alert information. Understand how alerts from these systems represent identity, action, and resources to determine where correlation is possible. 2. Establish a mechanism for capturing alerts from different sources. Consider services such as Security Hub, EventBridge, and CloudWatch for this purpose. 3. Identify sources for data correlation and enrichment. Example sources include AWS CloudTrail, VPC Flow Logs, Route 53 Resolver logs, and infrastructure and application logs. Any or all of these logs might be consumed through a single integration with Amazon Security Lake. 4. Integrate your alerts with your data correlation and enrichment sources to create more detailed security event contexts and establish criticality. 1. Amazon Detective, SIEM tooling, or other third-party solutions can perform a certain level of ingestion, correlation, and enrichment automatically. 2. You can also use AWS services to build your own. For example, you can invoke an AWS Lambda function to run an Amazon Athena query against AWS CloudTrail or Amazon Security Lake, and publish the results to EventBridge.
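
The following sketch shows one way to automate enrichment with your own code, as mentioned in the steps above: an AWS Lambda handler (Python) that receives a GuardDuty finding from an assumed EventBridge rule and looks up the recent CloudTrail activity of the same principal. The event shape assumes a finding that includes IAM access key details; adapt the extraction logic to the finding types you handle.

```python
import datetime
import boto3

cloudtrail = boto3.client("cloudtrail")


def handler(event, context):
    """Enrich a GuardDuty finding (delivered via an assumed EventBridge rule) with the
    recent CloudTrail activity of the same IAM principal, so an investigator can see
    what else that identity did around the time of the alert."""
    detail = event["detail"]
    principal = (
        detail.get("resource", {})
        .get("accessKeyDetails", {})
        .get("userName", "unknown")
    )

    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(hours=1)

    events = cloudtrail.lookup_events(
        LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": principal}],
        StartTime=start,
        EndTime=end,
        MaxResults=50,
    )["Events"]

    return {
        "finding_id": detail.get("id"),
        "principal": principal,
        "related_api_calls": [e["EventName"] for e in events],
    }
```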

💼 SEC04-BP04 Initiate remediation for non-compliant resources

Your detective controls may alert on resources that are out of compliance with your configuration requirements. You can initiate programmatically-defined remediations, either manually or automatically, to fix these resources and help minimize potential impacts. When you define remediations programmatically, you can take prompt and consistent action. While automation can enhance security operations, you should implement and manage automation carefully. Place appropriate oversight and control mechanisms to verify that automated responses are effective, accurate, and aligned with organizational policies and risk appetite. **Desired outcome** You define resource configuration standards along with the steps to remediate when resources are detected to be non-compliant. Where possible, you've defined remediations programmatically so they can be initiated either manually or through automation. Detection systems are in place to identify non-compliant resources and publish alerts into centralized tools that are monitored by your security personnel. These tools support running your programmatic remediations, either manually or automatically. Automatic remediations have appropriate oversight and control mechanisms in place to govern their use. **Common anti-patterns** - You implement automation, but fail to thoroughly test and validate remediation actions. This can result in unintended consequences, such as disrupting legitimate business operations or causing system instability. - You improve response times and procedures through automation, but without proper monitoring and mechanisms that allow human intervention and judgment when needed. - You rely solely on remediations, rather than having remediations as one part of a broader incident response and recovery program. **Benefits of establishing this best practice** Automatic remediations can respond to misconfigurations faster than manual processes, which helps you minimize potential business impacts and reduce the window of opportunity for unintended uses. When you define remediations programmatically, they are applied consistently, which reduces the risk of human error. Automation also can handle a larger volume of alerts simultaneously, which is particularly important in environments operating at large scale. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance As described in SEC01-BP03 Identify and validate control objectives, services such as AWS Config and AWS Security Hub can help you monitor the configuration of resources in your accounts for adherence to your requirements. When non-compliant resources are detected, services such as AWS Security Hub, can help with routing alerts appropriately and remediation. These solutions provide a central place for your security investigators to monitor for issues and take corrective action. While some non-compliant resource situations are unique and require human judgment to remediate, other situations have a standard response that you can define programmatically. For example, a standard response to a misconfigured VPC security group could be to remove the disallowed rules and notify the owner. Responses can be defined in AWS Lambda functions, AWS Systems Manager Automation documents, or through other code environments you prefer. Make sure the environment is able to authenticate to AWS using an IAM role with the least amount of permission needed to take corrective action. 
Once you define the desired remediation, you can then determine your preferred means for initiating it. AWS Config can initiate remediations for you. If you are using Security Hub, you can do this through custom actions, which publish the finding information to Amazon EventBridge. An EventBridge rule can then initiate your remediation. You can configure remediations through Security Hub to run either automatically or manually. For programmatic remediation, we recommend that you have comprehensive logs and audits for the actions taken, as well as their outcomes. Review and analyze these logs to assess the effectiveness of the automated processes, and identify areas of improvement. Capture logs in Amazon CloudWatch Logs and remediation outcomes as finding notes in Security Hub. As a starting point, consider Automated Security Response on AWS, which has pre-built remediations for resolving common security misconfigurations. ### Implementation steps 1. Analyze and prioritize alerts. - Consolidate security alerts from various AWS services into Security Hub for centralized visibility, prioritization, and remediation. 2. Develop remediations. - Use services such as Systems Manager and AWS Lambda to run programmatic remediations. 3. Configure how remediations are initiated. - Using Security Hub, define custom actions that publish findings to EventBridge. Configure these actions to be initiated manually or automatically. - You can also use Amazon Simple Notification Service (SNS) to send notifications and alerts to relevant stakeholders (such as security or incident response teams) for manual intervention or escalation, if required. 4. Review and analyze remediation logs for effectiveness and improvement. - Send log output to CloudWatch Logs. Capture outcomes as finding notes in Security Hub.
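
As one example of a programmatically defined remediation, the following Python sketch is a Lambda handler that could be invoked from an EventBridge rule tied to a Security Hub custom action (the event wiring and SNS topic are assumptions). It removes security group ingress rules open to 0.0.0.0/0 and notifies the owning team.

```python
import os
import boto3

ec2 = boto3.client("ec2")
sns = boto3.client("sns")

# Hypothetical topic ARN supplied through the function's environment configuration.
TOPIC_ARN = os.environ.get(
    "NOTIFY_TOPIC_ARN", "arn:aws:sns:us-east-1:111122223333:security-remediation"
)


def handler(event, context):
    """Remediate a security group finding by revoking ingress rules open to the world,
    then notify the security team. Assumes a Security Hub custom action event whose
    first finding resource is the security group ARN."""
    group_id = event["detail"]["findings"][0]["Resources"][0]["Id"].split("/")[-1]
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]

    # Build revoke requests containing only the 0.0.0.0/0 ranges, leaving other CIDRs intact.
    to_revoke = []
    for permission in group["IpPermissions"]:
        open_ranges = [r for r in permission.get("IpRanges", []) if r.get("CidrIp") == "0.0.0.0/0"]
        if open_ranges:
            rule = {"IpProtocol": permission["IpProtocol"], "IpRanges": open_ranges}
            if "FromPort" in permission:
                rule["FromPort"] = permission["FromPort"]
                rule["ToPort"] = permission["ToPort"]
            to_revoke.append(rule)

    if to_revoke:
        ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=to_revoke)
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Security group remediated",
            Message=f"Removed {len(to_revoke)} world-open ingress rule(s) from {group_id}.",
        )

    return {"group_id": group_id, "rules_removed": len(to_revoke)}
```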

💼 SEC05-BP01 Create network layers

Segment your network topology into different layers based on logical groupings of your workload components according to their data sensitivity and access requirements. Distinguish between components that require inbound access from the internet, such as public web endpoints, and those that only need internal access, such as databases.

**Desired outcome**

The layers of your network are part of an integral defense-in-depth approach to security that complements the identity authentication and authorization strategy of your workloads. Layers are in place according to data sensitivity and access requirements, with appropriate traffic flow and control mechanisms.

**Common anti-patterns**

- You create all resources in a single VPC or subnet.
- You construct your network layers without considering data sensitivity requirements, component behaviors, or functionality.
- You use VPCs and subnets as defaults for all network layer considerations, and you don't consider how AWS managed services influence your topology.

**Benefits of establishing this best practice**

Establishing network layers is the first step in restricting unnecessary pathways through the network, particularly those that lead to critical systems and data. This makes it harder for unauthorized actors to gain access to your network and navigate to additional resources within it. Discrete network layers also reduce the scope of analysis for inspection systems, such as intrusion detection or malware prevention, which reduces the potential for false positives and unnecessary processing overhead.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

When designing a workload architecture, it is common to separate components into different layers based on their responsibility. For example, a web application can have a presentation layer, application layer, and data layer. You can take a similar approach when designing your network topology, because underlying network controls can help enforce your workload's data access requirements. For example, in a three-tier web application architecture, you can store your static presentation layer files on Amazon S3 and serve them from a content delivery network (CDN), such as Amazon CloudFront. The application layer can have public endpoints served by an Application Load Balancer (ALB) in an Amazon VPC public subnet (similar to a demilitarized zone, or DMZ), with back-end services deployed into private subnets. The data layer, which hosts resources such as databases and shared file systems, can reside in private subnets separate from those of your application layer. At each of these layer boundaries (CDN, public subnet, private subnet), you can deploy controls that allow only authorized traffic to traverse those boundaries.

Similar to modeling network layers based on the functional purpose of your workload's components, also consider the sensitivity of the data being processed. Using the web application example, while all of your workload services may reside within the application layer, different services may process data with different sensitivity levels. In this case, dividing the application layer using multiple private subnets, different VPCs in the same AWS account, or even different VPCs in different AWS accounts for each level of data sensitivity may be appropriate according to your control requirements.

A further consideration for network layers is the behavior consistency of your workload's components. Continuing the example, in the application layer you may have services that accept inputs from end users or external system integrations that are inherently riskier than the inputs to other services. Examples include file uploads, code scripts to run, email scanning, and so on. Placing these services in their own network layer helps create a stronger isolation boundary around them, and can prevent their unique behavior from creating false positive alerts in inspection systems.

As part of your design, consider how using AWS managed services influences your network topology. Explore how services such as Amazon VPC Lattice can make the interoperability of your workload components across network layers easier. When using AWS Lambda, deploy your functions in your VPC subnets unless there are specific reasons not to. Determine where VPC endpoints and AWS PrivateLink can simplify adhering to security policies that limit access to internet gateways.

### Implementation steps

1. Review your workload architecture. Logically group components and services based on the functions they serve, the sensitivity of data being processed, and their behavior.
2. For components responding to requests from the internet, consider using load balancers or other proxies to provide public endpoints. Explore shifting security controls by using managed services, such as CloudFront, Amazon API Gateway, Elastic Load Balancing, and AWS Amplify, to host public endpoints.
3. For components running in compute environments, such as Amazon EC2 instances, AWS Fargate containers, or Lambda functions, deploy them into private subnets based on your groups from the first step. A sketch of this layered subnet layout follows these steps.
4. For fully managed AWS services, such as Amazon DynamoDB, Amazon Kinesis, or Amazon SQS, consider using VPC endpoints as the default for access over private IP addresses.
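
The following is a minimal boto3 sketch of the layered layout described above: one VPC with a public subnet for internet-facing components and separate private subnets for the application and data layers. The Region, CIDR ranges, and Availability Zone are placeholder assumptions, not prescribed values.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # Region is an assumption

# Create the VPC that holds all network layers for the workload.
vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

# Public layer: subnet for internet-facing components such as an ALB.
public_subnet = ec2.create_subnet(
    VpcId=vpc_id, CidrBlock="10.0.0.0/24", AvailabilityZone="us-east-1a"
)["Subnet"]["SubnetId"]

# Private layers: application and data subnets with no route to the internet.
app_subnet = ec2.create_subnet(
    VpcId=vpc_id, CidrBlock="10.0.1.0/24", AvailabilityZone="us-east-1a"
)["Subnet"]["SubnetId"]
data_subnet = ec2.create_subnet(
    VpcId=vpc_id, CidrBlock="10.0.2.0/24", AvailabilityZone="us-east-1a"
)["Subnet"]["SubnetId"]

# Only the public subnet receives a route to an internet gateway.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=public_rt, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public_subnet)
```

In practice you would typically express this layout in infrastructure as code rather than ad hoc API calls, and repeat the subnets across multiple Availability Zones.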

๐Ÿ’ผ SEC05-BP02 Control traffic flow within your network layers

Within the layers of your network, use further segmentation to restrict traffic to only the flows necessary for each workload. First, focus on controlling traffic between the internet or other external systems and your workload environment (north-south traffic). Afterwards, look at flows between different components and systems (east-west traffic).

**Desired outcome**

You permit only the network flows necessary for the components of your workloads to communicate with each other, their clients, and any other services they depend on. Your design factors in considerations such as public compared to private ingress and egress, data classification, regional regulations, and protocol requirements. Wherever possible, you favor point-to-point flows over network peering as part of a principle of least privilege design.

**Common anti-patterns**

- You take a perimeter-based approach to network security and only control traffic flow at the boundary of your network layers.
- You assume all traffic within a network layer is authenticated and authorized.
- You apply controls for either your ingress traffic or your egress traffic, but not both.
- You rely solely on your workload components and network controls to authenticate and authorize traffic.

**Benefits of establishing this best practice**

This practice helps reduce the risk of unauthorized movement within your network and adds an extra layer of authorization to your workloads. By performing traffic flow control, you can restrict the scope of impact of a security incident and speed up detection and response.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

While network layers help establish boundaries around components of your workload that serve a similar function, data sensitivity level, and behavior, you can achieve a much finer-grained level of traffic control by using techniques that follow the principle of least privilege to further segment components within these layers. Within AWS, network layers are primarily defined using subnets according to IP address ranges within an Amazon VPC. Layers can also be defined using different VPCs, such as for grouping microservice environments by business domain. When using multiple VPCs, mediate routing using an AWS Transit Gateway. While this provides traffic control at Layer 4 (IP address and port ranges) using security groups and route tables, you can gain further control using additional services, such as AWS PrivateLink, Amazon Route 53 Resolver DNS Firewall, AWS Network Firewall, and AWS WAF.

Understand and inventory the data flow and communication requirements of your workloads in terms of connection-initiating parties, ports, protocols, and network layers. Evaluate the protocols available for establishing connections and transmitting data to select ones that achieve your protection requirements (for example, HTTPS rather than HTTP). Capture these requirements at both the boundaries of your networks and within each layer. Once these requirements are identified, explore options to allow only the required traffic to flow at each connection point. A good starting point is to use security groups within your VPC, as they can be attached to resources that use an elastic network interface (ENI), such as Amazon EC2 instances, Amazon ECS tasks, Amazon EKS pods, or Amazon RDS databases. Unlike a Layer 4 firewall, a security group can have a rule that allows traffic from another security group by its identifier, minimizing updates as resources within the group change over time. You can also filter traffic using both inbound and outbound security group rules (a sketch of this pattern follows the implementation steps).

When traffic moves between VPCs, it's common to use VPC peering for simple routing or AWS Transit Gateway for complex routing. With these approaches, you facilitate traffic flows between the range of IP addresses of both the source and destination networks. However, if your workload only requires traffic flows between specific components in different VPCs, consider using a point-to-point connection with AWS PrivateLink. To do this, identify which service should act as the producer and which should act as the consumer. Deploy a compatible load balancer for the producer, turn on PrivateLink accordingly, and then accept a connection request from the consumer. The producer service is then assigned a private IP address from the consumer's VPC that the consumer can use to make subsequent requests. This approach reduces the need to peer the networks. Include the costs for data processing and load balancing when evaluating PrivateLink.

While security groups and PrivateLink help control the flow between the components of your workloads, another major consideration is how to control which DNS domains your resources are allowed to access (if any). Depending on the DHCP configuration of your VPCs, you can consider two different AWS services for this purpose. Most customers use the default Route 53 Resolver DNS service (also called the Amazon DNS server or AmazonProvidedDNS) available to VPCs at the +2 address of their CIDR range. With this approach, you can create DNS Firewall rules, associate them with your VPC, and determine what actions to take for the domain lists you supply. If you are not using the Route 53 Resolver, or if you want to complement the Resolver with deeper inspection and flow control capabilities beyond domain filtering, consider deploying an AWS Network Firewall. This service inspects individual packets using either stateless or stateful rules to determine whether to deny or allow the traffic. You can take a similar approach for filtering inbound web traffic to your public endpoints using AWS WAF. For further guidance on these services, see SEC05-BP03 Implement inspection-based protection.

### Implementation steps

1. Identify the required data flows between the components of your workloads.
2. Apply multiple controls with a defense-in-depth approach for both inbound and outbound traffic, including the use of security groups and route tables.
3. Use firewalls to define fine-grained control over network traffic in, out, and across your VPCs, such as the Route 53 Resolver DNS Firewall, AWS Network Firewall, and AWS WAF. Consider using AWS Firewall Manager to centrally configure and manage your firewall rules across your organization.
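
As a minimal sketch of referencing one security group from another, the following assumes a placeholder VPC ID and a PostgreSQL database tier listening on port 5432; the group names are illustrative only.

```python
import boto3

ec2 = boto3.client("ec2")

vpc_id = "vpc-0123456789abcdef0"  # placeholder VPC ID

# Security group attached to application-tier resources (EC2, ECS tasks, and so on).
app_sg = ec2.create_security_group(
    GroupName="app-tier", Description="Application tier", VpcId=vpc_id
)["GroupId"]

# Security group attached to the database tier.
db_sg = ec2.create_security_group(
    GroupName="db-tier", Description="Database tier", VpcId=vpc_id
)["GroupId"]

# Allow PostgreSQL traffic to the database tier only from members of the
# application-tier security group, instead of from an IP address range.
ec2.authorize_security_group_ingress(
    GroupId=db_sg,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": app_sg}],
    }],
)
```

Because the rule references the group identifier, instances can scale in and out of the application tier without any change to the database tier's ingress rules.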

๐Ÿ’ผ SEC05-BP03 Implement inspection-based protection

Set up traffic inspection points between your network layers to make sure data in transit matches the expected categories and patterns. Analyze traffic flows, metadata, and patterns to help identify, detect, and respond to events more effectively.

**Desired outcome**

Traffic that traverses between your network layers is inspected and authorized. Allow and deny decisions are based on explicit rules, threat intelligence, and deviations from baseline behaviors. Protections become stricter as traffic moves closer to sensitive data.

**Common anti-patterns**

- Relying solely on firewall rules based on ports and protocols, without taking advantage of intelligent systems.
- Authoring firewall rules based on specific current threat patterns that are subject to change.
- Only inspecting traffic as it transits from private to public subnets, or from public subnets to the internet.
- Not having a baseline view of your network traffic to compare against for behavior anomalies.

**Benefits of establishing this best practice**

Inspection systems allow you to author intelligent rules, such as allowing or denying traffic only when certain conditions within the traffic data exist. Benefit from managed rule sets from AWS and partners, based on the latest threat intelligence, as the threat landscape changes over time. This reduces the overhead of maintaining rules and researching indicators of compromise, reducing the potential for false positives.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Gain fine-grained control over both your stateful and stateless network traffic using AWS Network Firewall, or other firewalls and intrusion prevention systems (IPS) available on the AWS Marketplace that you can deploy behind a Gateway Load Balancer (GWLB). AWS Network Firewall supports Suricata-compatible open-source IPS specifications to help protect your workload. Both AWS Network Firewall and vendor solutions that use a GWLB support different inline inspection deployment models. For example, you can perform inspection on a per-VPC basis, centralize inspection in an inspection VPC, or deploy in a hybrid model where east-west traffic flows through an inspection VPC and internet ingress is inspected per VPC. Another consideration is whether the solution supports unwrapping Transport Layer Security (TLS), enabling deep packet inspection for traffic flows initiated in either direction.

If you are using solutions that perform out-of-band inspection, such as pcap analysis of packet data from network interfaces operating in promiscuous mode, you can configure VPC traffic mirroring. Mirrored traffic counts towards the available bandwidth of your interfaces and is subject to the same data transfer charges as non-mirrored traffic. You can check whether virtual versions of these appliances are available on the AWS Marketplace, which may support inline deployment behind a GWLB.

For components that transact over HTTP-based protocols, protect your application from common threats with a web application firewall (WAF). AWS WAF is a web application firewall that lets you monitor and block HTTP(S) requests that match your configurable rules before they reach Amazon API Gateway, Amazon CloudFront, AWS AppSync, or an Application Load Balancer. Consider deep packet inspection when you evaluate the deployment of your web application firewall, as some solutions require you to terminate TLS before traffic inspection. To get started with AWS WAF, you can use AWS Managed Rules in combination with your own rules, or use existing partner integrations. You can centrally manage AWS WAF, AWS Shield Advanced, AWS Network Firewall, and Amazon VPC security groups across your AWS Organization with AWS Firewall Manager.

### Implementation steps

1. Determine if you can scope inspection rules broadly, such as through an inspection VPC, or if you require a more granular per-VPC approach.
2. For inline inspection solutions:
   - If using AWS Network Firewall, create rules, firewall policies, and the firewall itself. Once these have been configured, you can route traffic to the firewall endpoint to enable inspection.
   - If using a third-party appliance with a Gateway Load Balancer (GWLB), deploy and configure your appliance in one or more Availability Zones. Then create your GWLB, the endpoint service, and the endpoint, and configure routing for your traffic.
3. For out-of-band inspection solutions:
   - Turn on VPC Traffic Mirroring on interfaces where inbound and outbound traffic should be mirrored. You can use Amazon EventBridge rules to invoke an AWS Lambda function that turns on traffic mirroring on interfaces when new resources are created. Point the traffic mirroring sessions to the Network Load Balancer in front of your appliance that processes traffic.
4. For inbound web traffic solutions:
   - To configure AWS WAF, start by configuring a web access control list (web ACL). The web ACL is a collection of rules with a serially processed default action (ALLOW or DENY) that defines how your WAF handles traffic. You can create your own rules and rule groups or use AWS managed rule groups in your web ACL.
   - Once your web ACL is configured, associate it with an AWS resource (such as an Application Load Balancer, API Gateway REST API, or CloudFront distribution) to begin protecting web traffic. A sketch of this step follows the list.
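
As a sketch of the last step, the following assumes a regional web ACL protecting an existing Application Load Balancer, using the AWS Managed Rules common rule set. The names, metric names, and load balancer ARN are placeholders.

```python
import boto3

wafv2 = boto3.client("wafv2", region_name="us-east-1")

# Create a regional web ACL that allows requests by default and evaluates
# the AWS Managed Rules common rule set.
acl = wafv2.create_web_acl(
    Name="workload-web-acl",
    Scope="REGIONAL",  # use the CLOUDFRONT scope (in us-east-1) for CloudFront distributions
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "aws-common-rules",
        "Priority": 0,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesCommonRuleSet",
            }
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "aws-common-rules",
        },
    }],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "workload-web-acl",
    },
)

# Associate the web ACL with an existing Application Load Balancer (placeholder ARN).
wafv2.associate_web_acl(
    WebACLArn=acl["Summary"]["ARN"],
    ResourceArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-alb/abc123",
)
```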

๐Ÿ’ผ SEC05-BP04 Automate network protection

Automate the deployment of your network protections using DevOps practices, such as infrastructure as code (IaC) and CI/CD pipelines. These practices can help you track changes in your network protections through a version control system, reduce the time it takes to deploy changes, and help detect whether your network protections drift from your desired configuration.

**Desired outcome**

You define network protections with templates and commit them to a version control system. Automated pipelines that orchestrate testing and deployment are initiated when new changes are made. Policy checks and other static tests are in place to validate changes before deployment. You deploy changes into a staging environment to validate that the controls operate as expected. Deployment into your production environments is also performed automatically once controls are approved.

**Common anti-patterns**

- Relying on individual workload teams to each define their complete network stack, protections, and automations. Not publishing standard aspects of the network stack and protections centrally for workload teams to consume.
- Relying on a central network team to define all aspects of the network, protections, and automations. Not delegating workload-specific aspects of the network stack and protections to that workload's team.
- Striking the right balance between centralization and delegation between a network team and workload teams, but not applying consistent testing and deployment standards across your IaC templates and CI/CD pipelines. Not capturing required configurations in tooling that checks your templates for adherence.

**Benefits of establishing this best practice**

Using templates to define your network protections allows you to track and compare changes over time with a version control system. Using automation to test and deploy changes creates standardization and predictability, increasing the chances of a successful deployment and reducing repetitive manual configuration.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

A number of the network protection controls described in SEC05-BP02 Control traffic flow within your network layers and SEC05-BP03 Implement inspection-based protection come with managed rule systems that can update automatically based on the latest threat intelligence. Examples for protecting your web endpoints include AWS WAF managed rules and AWS Shield Advanced automatic application layer DDoS mitigation. Use AWS Network Firewall managed rule groups to stay up to date with low-reputation domain lists and threat signatures as well.

Beyond managed rules, we recommend you use DevOps practices to automate deploying your network resources, protections, and the rules you specify. You can capture these definitions in AWS CloudFormation or another infrastructure as code (IaC) tool of your choice, commit them to a version control system, and deploy them using CI/CD pipelines. Use this approach to gain the traditional benefits of DevOps for managing your network controls, such as more predictable releases, automated testing using tools like AWS CloudFormation Guard, and detecting drift between your deployed environment and your desired configuration.

Based on the decisions you made as part of SEC05-BP01 Create network layers, you may have a central management approach to creating VPCs that are dedicated to ingress, egress, and inspection flows. As described in the AWS Security Reference Architecture (AWS SRA), you can define these VPCs in a dedicated Network infrastructure account. You can use similar techniques to centrally define the VPCs used by your workloads in other accounts, their security groups, AWS Network Firewall deployments, Route 53 Resolver rules and DNS Firewall configurations, and other network resources. You can share these resources with your other accounts using AWS Resource Access Manager. With this approach, you can simplify the automated testing and deployment of your network controls to the Network account, with only one destination to manage. You can also do this in a hybrid model, where you deploy and share certain controls centrally and delegate other controls to the individual workload teams and their respective accounts.

### Implementation steps

1. Establish ownership over which aspects of the network and protections are defined centrally, and which your workload teams can maintain.
2. Create environments to test and deploy changes to your network and its protections. For example, use a Network Testing account and a Network Production account.
3. Determine how you will store and maintain your templates in a version control system. Store central templates in a repository that is distinct from workload repositories, while workload templates can be stored in repositories specific to that workload.
4. Create CI/CD pipelines to test and deploy templates. Define tests to check for misconfigurations and to verify that templates adhere to your company standards. A sketch of one such pipeline step follows this list.
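
As a sketch of step 4, a pipeline step might validate and deploy a network-protection template with boto3. The template file name, stack name, and use of a CloudFormation waiter below are illustrative assumptions, not a prescribed pipeline design; many teams would use the CloudFormation CLI, CDK, or a deployment service instead.

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# The template would normally come from the version control checkout in the pipeline.
with open("network-protections.yaml") as f:  # hypothetical template file
    template_body = f.read()

# Fail the pipeline early if the template is not syntactically valid.
cfn.validate_template(TemplateBody=template_body)

# Deploy the stack in the central Network testing or production environment.
cfn.create_stack(
    StackName="network-protections",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until the deployment completes so the pipeline step reports success or failure.
cfn.get_waiter("stack_create_complete").wait(StackName="network-protections")
```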

๐Ÿ’ผ SEC06-BP01 Perform vulnerability management

Frequently scan and patch for vulnerabilities in your code, dependencies, and infrastructure to help protect against new threats.

**Desired outcome**

You have a solution that continually scans your workload for software vulnerabilities, potential defects, and unintended network exposure. You have established processes and procedures to identify, prioritize, and remediate these vulnerabilities based on risk assessment criteria. Additionally, you have implemented automated patch management for your compute instances. Your vulnerability management program is integrated into your software development lifecycle, with solutions to scan your source code during the CI/CD pipeline.

**Common anti-patterns**

- Not having a vulnerability management program.
- Performing system patching without considering severity or risk avoidance.
- Using software that has passed its vendor-provided end of life (EOL) date.
- Deploying code into production before analyzing it for security issues.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Vulnerability management is a key aspect of maintaining a secure and robust cloud environment. It involves a comprehensive process that includes security scans, identification and prioritization of issues, and patch operations to resolve the identified vulnerabilities. Automation plays a pivotal role in this process because it facilitates continuous scanning of workloads for potential issues and unintended network exposure, as well as remediation efforts.

The AWS Shared Responsibility Model is a fundamental concept that underpins vulnerability management. According to this model, AWS is responsible for securing the underlying infrastructure, including the hardware, software, networking, and facilities that run AWS services. Conversely, you are responsible for securing your data, security configurations, and the management tasks associated with services like Amazon EC2 instances and Amazon S3 objects.

AWS offers a range of services to support vulnerability management programs. Amazon Inspector continuously scans AWS workloads for software vulnerabilities and unintended network access, while AWS Systems Manager Patch Manager helps manage patching across Amazon EC2 instances. These services can be integrated with AWS Security Hub, a cloud security posture management service that automates AWS security checks, centralizes security alerts, and provides a comprehensive view of an organization's security posture. Furthermore, Amazon CodeGuru Security uses static code analysis to identify potential issues in Java and Python applications during the development phase. By incorporating vulnerability management practices into the software development lifecycle, you can proactively address vulnerabilities before they are introduced into production environments, which reduces the risk of security events and minimizes the potential impact of vulnerabilities.

### Implementation steps

1. Understand the shared responsibility model: Review the AWS shared responsibility model to understand your responsibilities for securing your workloads and data in the cloud. AWS is responsible for securing the underlying cloud infrastructure, while you are responsible for securing your applications, data, and the services you use.
2. Implement vulnerability scanning: Configure a vulnerability scanning service, such as Amazon Inspector, to automatically scan your compute instances (for example, virtual machines, containers, or serverless functions) for software vulnerabilities, potential defects, and unintended network exposure.
3. Establish vulnerability management processes: Define processes and procedures to identify, prioritize, and remediate vulnerabilities. This may include setting up regular vulnerability scanning schedules, establishing risk assessment criteria, and defining remediation timelines based on vulnerability severity.
4. Set up patch management: Use a patch management service to automate the process of patching your compute instances, for both operating systems and applications. You can configure the service to scan instances for missing patches and automatically install them on a schedule. Consider AWS Systems Manager Patch Manager to provide this functionality.
5. Configure malware protection: Implement mechanisms to detect malicious software in your environment. For example, you can use Amazon GuardDuty to analyze, detect, and alert on malware in Amazon EC2 instances and Amazon EBS volumes. GuardDuty can also scan newly uploaded objects in Amazon S3 for potential malware or viruses and take action to isolate them before they are ingested into downstream processes.
6. Integrate vulnerability scanning in CI/CD pipelines: If you're using a CI/CD pipeline for your application deployment, integrate vulnerability scanning tools into your pipeline. Tools like Amazon CodeGuru Security and open-source options can scan your source code, dependencies, and artifacts for potential security issues.
7. Configure a security monitoring service: Set up a security monitoring service, such as AWS Security Hub, to get a comprehensive view of your security posture across multiple cloud services. The service should collect security findings from various sources and present them in a standardized format for easier prioritization and remediation. A sketch of querying findings this way follows this list.
8. Implement web application penetration testing: If your application is a web application, and your organization has the necessary skills or can hire outside assistance, consider implementing web application penetration testing to identify potential vulnerabilities in your application.
9. Automate with infrastructure as code: Use infrastructure as code (IaC) tools, such as AWS CloudFormation, to automate the deployment and configuration of your resources, including the security services mentioned previously. This practice helps you create a more consistent and standardized resource architecture across multiple accounts and environments.
10. Monitor and continually improve: Continually monitor your vulnerability management program's effectiveness, and make improvements as needed. Review security findings, assess the effectiveness of your remediation efforts, and adjust your processes and tools accordingly.
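
As a sketch of step 7, the following pulls active, critical-severity findings (such as CVEs surfaced by Amazon Inspector) from AWS Security Hub for prioritization. The choice of filter values is an assumption about what your triage process treats as urgent.

```python
import boto3

securityhub = boto3.client("securityhub")

# Retrieve active, critical-severity findings so they can be prioritized
# for remediation according to your risk assessment criteria.
filters = {
    "SeverityLabel": [{"Value": "CRITICAL", "Comparison": "EQUALS"}],
    "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
}

paginator = securityhub.get_paginator("get_findings")
for page in paginator.paginate(Filters=filters):
    for finding in page["Findings"]:
        resource_id = finding.get("Resources", [{}])[0].get("Id", "unknown")
        print(f"{finding['Title']} -> {resource_id}")
```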

๐Ÿ’ผ SEC06-BP02 Provision compute from hardened images

Provide fewer opportunities for unintended access to your runtime environments by deploying them from hardened images. Only acquire runtime dependencies, such as container images and application libraries, from trusted registries, and verify their signatures. Create your own private registries to store trusted images and libraries for use in your build and deploy processes.

**Desired outcome**

Your compute resources are provisioned from hardened baseline images. You retrieve external dependencies, such as container images and application libraries, only from trusted registries and verify their signatures. These are stored in private registries for your build and deployment processes to reference. You scan and update images and dependencies regularly to help protect against any newly discovered vulnerabilities.

**Common anti-patterns**

- Acquiring images and libraries from trusted registries, but not verifying their signatures or performing vulnerability scans before putting them into use.
- Hardening images, but not regularly testing them for new vulnerabilities or updating them to the latest version.
- Installing, or not removing, software packages that are not required during the expected lifecycle of the image.
- Relying solely on patching to keep production compute resources up to date. Patching alone can still cause compute resources to drift from the hardened standard over time. Patching can also fail to remove malware that may have been installed by a threat actor during a security event.

**Benefits of establishing this best practice**

Hardening images helps reduce the number of paths available in your runtime environment that can allow unintended access to unauthorized users or services. It can also reduce the scope of impact should any unintended access occur.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

To harden your systems, start from the latest versions of operating systems, container images, and application libraries. Apply patches for known issues. Minimize the system by removing any unneeded applications, services, device drivers, default users, and other credentials. Take any other needed actions, such as disabling ports, to create an environment that has only the resources and capabilities needed by your workloads. From this baseline, you can then install software, agents, or other processes you need for purposes such as workload monitoring or vulnerability management.

You can reduce the burden of hardening systems by using guidance from trusted sources, such as the Center for Internet Security (CIS) and the Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIGs). We recommend you start with an Amazon Machine Image (AMI) published by AWS or an APN partner, and use EC2 Image Builder to automate configuration according to an appropriate combination of CIS and STIG controls. While there are hardened images and EC2 Image Builder recipes available that apply the CIS or DISA STIG recommendations, you may find that their configuration prevents your software from running successfully. In this situation, you can start from a non-hardened base image, install your software, and then incrementally apply CIS controls to test their impact. For any CIS control that prevents your software from running, test whether you can implement the finer-grained hardening recommendations in a DISA STIG instead. Keep track of the different CIS controls and DISA STIG configurations you are able to apply successfully, and use them to define your image hardening recipes in EC2 Image Builder accordingly.

For containerized workloads, hardened Docker images are available in the Amazon Elastic Container Registry (ECR) public repository. You can use EC2 Image Builder to harden container images alongside AMIs.

Similar to operating systems and container images, you can obtain code packages (or libraries) from public repositories through tooling such as pip, npm, Maven, and NuGet. We recommend you manage code packages by integrating private repositories, such as those within AWS CodeArtifact, with trusted public repositories. This integration can handle retrieving, storing, and keeping packages up to date for you. Your application build processes can then obtain and test the latest version of these packages alongside your application, using techniques like Software Composition Analysis (SCA), Static Application Security Testing (SAST), and Dynamic Application Security Testing (DAST).

For serverless workloads that use AWS Lambda, simplify managing package dependencies using Lambda layers. Use Lambda layers to configure a set of standard dependencies that are shared across different functions into a standalone archive. You can create and maintain layers through their own build process, providing a central way for your functions to stay up to date.

### Implementation steps

1. Harden operating systems. Use base images from trusted sources as a foundation for building your hardened AMIs. Use EC2 Image Builder to help customize the software installed on your images.
2. Harden containerized resources. Configure containerized resources to meet security best practices. When using containers, implement ECR image scanning in your build pipeline and on a regular basis against your image repository to look for CVEs in your containers (see the sketch after this list).
3. When using a serverless implementation with AWS Lambda, use Lambda layers to segregate application function code and shared dependent libraries. Configure code signing for Lambda to make sure that only trusted code runs in your Lambda functions.
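
As a sketch of step 2, the following creates a hypothetical private ECR repository with scan-on-push enabled and triggers an on-demand scan of an already-pushed tag. The repository name and tag are placeholders.

```python
import boto3

ecr = boto3.client("ecr")

repository = "my-hardened-base-images"  # placeholder repository name

# Create a private repository for trusted, hardened images and turn on
# scan-on-push so every pushed image version is checked for known CVEs.
ecr.create_repository(
    repositoryName=repository,
    imageScanningConfiguration={"scanOnPush": True},
    imageTagMutability="IMMUTABLE",  # prevents tags from being silently repointed
)

# Scans can also be started on demand for images already in the repository.
ecr.start_image_scan(
    repositoryName=repository,
    imageId={"imageTag": "1.0.0"},
)
```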

๐Ÿ’ผ SEC06-BP03 Reduce manual management and interactive access

Use automation to perform deployment, configuration, maintenance, and investigative tasks wherever possible. Reserve manual access to compute resources for emergency procedures or safe (sandbox) environments when automation is not available.

**Desired outcome**

Programmatic scripts and automation documents (runbooks) capture authorized actions on your compute resources. These runbooks are initiated either automatically, through change detection systems, or manually, when human judgment is required. Direct access to compute resources is only made available in emergency situations when automation is not available. All manual activities are logged and incorporated into a review process so you can continually improve your automation capabilities.

**Common anti-patterns**

- Interactive access to Amazon EC2 instances with protocols such as SSH or RDP.
- Maintaining individual user logins, such as /etc/passwd entries or Windows local users.
- Sharing a password or private key to access an instance among multiple users.
- Manually installing software and creating or updating configuration files.
- Manually updating or patching software.
- Logging into an instance to troubleshoot problems.

**Benefits of establishing this best practice**

Performing actions with automation helps you reduce the operational risk of unintended changes and misconfigurations. Removing the use of Secure Shell (SSH) and Remote Desktop Protocol (RDP) for interactive access reduces the scope of access to your compute resources and takes away a common path for unauthorized actions. Capturing your compute resource management tasks in automation documents and programmatic scripts provides a mechanism to define and audit the full scope of authorized activities at a fine-grained level of detail.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Logging into an instance is a classic approach to system administration. After installing the server operating system, users would typically log in manually to configure the system and install the desired software. During the server's lifetime, users might log in to perform software updates, apply patches, change configurations, and troubleshoot problems. Manual access poses a number of risks, however. It requires a server that listens for requests, such as an SSH or RDP service, which can provide a potential path to unauthorized access. It also increases the risk of human error associated with performing manual steps, which can result in workload incidents, data corruption or destruction, or other security issues. Human access also requires protections against the sharing of credentials, creating additional management overhead.

To mitigate these risks, you can implement an agent-based remote access solution, such as AWS Systems Manager. AWS Systems Manager Agent (SSM Agent) initiates an encrypted channel and thus does not rely on listening for externally initiated requests. Consider configuring SSM Agent to establish this channel over a VPC endpoint. Systems Manager gives you fine-grained control over how you can interact with your managed instances. You define the automations to run, who can run them, and when they can run. Systems Manager can apply patches, install software, and make configuration changes without interactive access to the instance. Systems Manager can also provide access to a remote shell and log every command invoked during the session, and its output, to Amazon CloudWatch Logs and Amazon S3. AWS CloudTrail records invocations of Systems Manager APIs for inspection.

### Implementation steps

1. Install AWS Systems Manager Agent (SSM Agent) on your Amazon EC2 instances. Check whether SSM Agent is included and started automatically as part of your base AMI configuration.
2. Verify that the IAM roles associated with your EC2 instance profiles include the AmazonSSMManagedInstanceCore managed IAM policy.
3. Disable SSH, RDP, and other remote access services running on your instances. You can do this by running scripts configured in the user data section of your launch templates or by building customized AMIs with tools such as EC2 Image Builder.
4. Verify that the security group ingress rules applicable to your EC2 instances do not permit access on port 22/tcp (SSH) or port 3389/tcp (RDP). Implement detection and alerting on misconfigured security groups using services such as AWS Config (a simple audit sketch follows this list).
5. Define appropriate automations, runbooks, and run commands in Systems Manager. Use IAM policies to define who can perform these actions and the conditions under which they are permitted. Test these automations thoroughly in a non-production environment. Invoke these automations when necessary, instead of interactively accessing the instance.
6. Use AWS Systems Manager Session Manager to provide interactive access to instances when necessary. Turn on session activity logging to maintain an audit trail in Amazon CloudWatch Logs or Amazon S3.
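
As a lightweight complement to step 4 (not a replacement for AWS Config rules), the following sketch lists security groups that still allow SSH or RDP ingress; what you do with the results is left to your remediation process.

```python
import boto3

ec2 = boto3.client("ec2")

REMOTE_ACCESS_PORTS = {22, 3389}  # SSH and RDP

# Flag security groups that still allow interactive remote access.
for page in ec2.get_paginator("describe_security_groups").paginate():
    for sg in page["SecurityGroups"]:
        for perm in sg["IpPermissions"]:
            if perm.get("IpProtocol") == "-1":
                # An "all traffic" rule implicitly includes SSH and RDP.
                print(f"{sg['GroupId']} ({sg['GroupName']}) allows all traffic")
                continue
            from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
            if from_port is None:
                continue
            if any(from_port <= p <= to_port for p in REMOTE_ACCESS_PORTS):
                print(f"{sg['GroupId']} ({sg['GroupName']}) allows ports {from_port}-{to_port}")
```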

๐Ÿ’ผ SEC06-BP04 Validate software integrity

Use cryptographic verification to validate the integrity of software artifacts (including images) your workload uses. Cryptographically sign your software as a safeguard against unauthorized code running in your compute environments.

**Desired outcome**

All artifacts are obtained from trusted sources. Vendor website certificates are validated. Downloaded artifacts are cryptographically verified using their signatures. Your own software is cryptographically signed and verified by your computing environments.

**Common anti-patterns**

- Trusting reputable vendor websites to obtain software artifacts, but ignoring certificate expiration notices and proceeding with downloads without confirming certificates are valid.
- Validating vendor website certificates, but not cryptographically verifying the artifacts downloaded from those websites.
- Relying solely on digests or hashes to validate software integrity. Hashes establish that artifacts have not been modified from the original version, but do not validate their source.
- Not signing your own software, code, or libraries, even when they are only used in your own deployments.

**Benefits of establishing this best practice**

Validating the integrity of artifacts that your workload depends on helps prevent malware from entering your compute environments. Signing your software helps safeguard against unauthorized code running in your compute environments. Secure your software supply chain by signing and verifying code.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Operating system images, container images, and code artifacts are often distributed with integrity checks available, such as a digest or hash. These allow clients to verify integrity by computing their own hash of the payload and validating that it matches the published one. While these checks help verify that the payload has not been tampered with, they do not validate that the payload came from the original source (its provenance). Verifying provenance requires a certificate that a trusted authority issues to digitally sign the artifact. If you use downloaded software or artifacts in your workload, check whether the provider publishes a public key for digital signature verification. Here are some examples of how AWS provides a public key and verification instructions for software we publish:

- EC2 Image Builder: Verify the signature of the AWSTOE installation download
- AWS Systems Manager: Verifying the signature of SSM Agent
- Amazon CloudWatch: Verifying the signature of the CloudWatch agent package

Incorporate digital signature verification into the processes you use for obtaining and hardening images, as discussed in SEC06-BP02 Provision compute from hardened images. You can use AWS Signer to help manage the verification of signatures, as well as the code-signing lifecycle for your own software and artifacts. Both AWS Lambda and Amazon Elastic Container Registry provide integrations with Signer to verify the signatures of your code and images. Using the examples in the Resources section, you can incorporate Signer into your continuous integration and delivery (CI/CD) pipelines to automate verification of signatures and the signing of your own code and images.
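
The following is a minimal sketch of verifying a downloaded artifact's detached signature as a build step, assuming GnuPG is installed and using placeholder file names; a vendor's published verification instructions take precedence over this generic pattern.

```python
import subprocess

# Placeholder file names: the vendor's published public key, the detached
# signature, and the downloaded artifact itself.
PUBLIC_KEY = "vendor-public-key.asc"
SIGNATURE = "artifact.tar.gz.sig"
ARTIFACT = "artifact.tar.gz"

# Import the vendor's public key into the local keyring.
subprocess.run(["gpg", "--import", PUBLIC_KEY], check=True)

# Verify the detached signature. A non-zero exit status raises CalledProcessError,
# which should fail the build rather than continue with an unverified artifact.
subprocess.run(["gpg", "--verify", SIGNATURE, ARTIFACT], check=True)
print("Signature verified: artifact provenance confirmed.")
```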

๐Ÿ’ผ SEC06-BP05 Automate compute protection

Automate compute protection operations to reduce the need for human intervention. Use automated scanning to detect potential issues within your compute resources, and remediate with automated programmatic responses or fleet management operations. Incorporate automation in your CI/CD processes to deploy trustworthy workloads with up-to-date dependencies.

**Desired outcome**

Automated systems perform all scanning and patching of compute resources. You use automated verification to check that software images and dependencies come from trusted sources and have not been tampered with. Workloads are automatically checked for up-to-date dependencies, and are signed to establish trustworthiness in AWS compute environments. Automated remediations are initiated when non-compliant resources are detected.

**Common anti-patterns**

- Following the practice of immutable infrastructure, but not having a solution in place for emergency patching or replacement of production systems.
- Using automation to fix misconfigured resources, but not having a manual override mechanism in place. Situations may arise where you need to adjust the requirements, and you may need to suspend automations until you make these changes.

**Benefits of establishing this best practice**

Automation can reduce the risk of unauthorized access and use of your compute resources. It helps prevent misconfigurations from making their way into production environments, and helps detect and fix misconfigurations should they occur. Automation also helps detect unauthorized access and use of compute resources, reducing your time to respond. This in turn can reduce the overall scope of impact from the issue.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

You can apply the automations described in the Security Pillar practices for protecting your compute resources. SEC06-BP01 Perform vulnerability management describes how you can use Amazon Inspector in both your CI/CD pipelines and for continually scanning your runtime environments for known Common Vulnerabilities and Exposures (CVEs). You can use AWS Systems Manager to apply patches or redeploy from fresh images through automated runbooks to keep your compute fleet updated with the latest software and libraries. Use these techniques to reduce the need for manual processes and interactive access to your compute resources. See SEC06-BP03 Reduce manual management and interactive access to learn more.

Automation also plays a role in deploying workloads that are trustworthy, as described in SEC06-BP02 Provision compute from hardened images and SEC06-BP04 Validate software integrity. You can use services such as EC2 Image Builder, AWS Signer, AWS CodeArtifact, and Amazon Elastic Container Registry (ECR) to download, verify, construct, and store hardened and approved images and code dependencies. Alongside Inspector, each of these can play a role in your CI/CD process so your workload makes its way to production only when it is confirmed that its dependencies are up to date and from trusted sources. Your workload is also signed so AWS compute environments, such as AWS Lambda and Amazon Elastic Kubernetes Service (EKS), can verify it hasn't been tampered with before allowing it to run.

Beyond these preventative controls, you can use automation in your detective controls for your compute resources as well. As one example, AWS Security Hub offers the NIST 800-53 Rev. 5 standard, which includes checks such as [EC2.8] EC2 instances should use Instance Metadata Service Version 2 (IMDSv2). IMDSv2 uses session authentication, blocking of requests that contain an X-Forwarded-For HTTP header, and a network TTL of 1 to stop traffic originating from external sources from retrieving information about the EC2 instance. This check in Security Hub can detect when EC2 instances use IMDSv1 and initiate automated remediation (a sketch follows the implementation steps). Learn more about automated detection and remediation in SEC04-BP04 Initiate remediation for non-compliant resources.

### Implementation steps

1. Automate creating secure, compliant, and hardened AMIs with EC2 Image Builder. You can produce images that incorporate controls from the Center for Internet Security (CIS) Benchmarks or Security Technical Implementation Guide (STIG) standards from base AWS and APN partner images.
2. Automate configuration management. Enforce and validate secure configurations in your compute resources automatically by using a configuration management service or tool:
   - Automated configuration management using AWS Config
   - Automated security and compliance posture management using AWS Security Hub
3. Automate patching or replacing Amazon Elastic Compute Cloud (Amazon EC2) instances. AWS Systems Manager Patch Manager automates the process of patching managed instances with both security-related and other types of updates. You can use Patch Manager to apply patches for both operating systems and applications.
4. Automate scanning of compute resources for common vulnerabilities and exposures (CVEs), and embed security scanning solutions within your build pipeline.
5. Consider Amazon GuardDuty for automatic malware and threat detection to protect compute resources. GuardDuty can also identify potential issues when an AWS Lambda function is invoked in your AWS environment.
6. Consider AWS Partner solutions. AWS Partners offer industry-leading products that are equivalent or identical to, or that integrate with, existing controls in your on-premises environments. These products complement existing AWS services to let you deploy a comprehensive security architecture and a more seamless experience across your cloud and on-premises environments.
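
As a sketch of the IMDSv2 example above, the following finds running instances that do not yet require session tokens and enforces IMDSv2 on them. In practice you might run this as an automated remediation triggered by the Security Hub [EC2.8] finding rather than as a standalone script.

```python
import boto3

ec2 = boto3.client("ec2")

# Find running instances whose metadata service does not require IMDSv2
# session tokens, and remediate them by requiring tokens.
paginator = ec2.get_paginator("describe_instances")
filters = [{"Name": "instance-state-name", "Values": ["running"]}]

for page in paginator.paginate(Filters=filters):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            if instance.get("MetadataOptions", {}).get("HttpTokens") != "required":
                instance_id = instance["InstanceId"]
                print(f"Enforcing IMDSv2 on {instance_id}")
                ec2.modify_instance_metadata_options(
                    InstanceId=instance_id,
                    HttpTokens="required",
                    HttpEndpoint="enabled",
                )
```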

๐Ÿ’ผ SEC07-BP01 Understand your data classification scheme

Understand the classification of data your workload is processing, its handling requirements, the associated business processes, where the data is stored, and who the data owner is. Your data classification and handling scheme should consider the applicable legal and compliance requirements of your workload and what data controls are needed. Understanding the data is the first step in the data classification journey.

**Desired outcome**

The types of data present in your workload are well understood and documented. Appropriate controls are in place to protect sensitive data based on its classification. These controls govern considerations such as who is allowed to access the data and for what purpose, where the data is stored, the encryption policy for that data and how encryption keys are managed, the lifecycle of the data and its retention requirements, appropriate destruction processes, what backup and recovery processes are in place, and the auditing of access.

**Common anti-patterns**

- Not having a formal data classification policy in place to define data sensitivity levels and their handling requirements.
- Not having a good understanding of the sensitivity levels of data within your workload, and not capturing this information in architecture and operations documentation.
- Failing to apply the appropriate controls around your data based on its sensitivity and requirements, as outlined in your data classification and handling policy.
- Failing to provide feedback about data classification and handling requirements to the owners of the policies.

**Benefits of establishing this best practice**

This practice removes ambiguity around the appropriate handling of data within your workload. Applying a formal policy that defines the sensitivity levels of data in your organization and their required protections can help you comply with legal regulations and other cybersecurity attestations and certifications. Workload owners can have confidence in knowing where sensitive data is stored and what protection controls are in place. Capturing these in documentation helps new team members better understand and maintain the controls early in their tenure. These practices can also help reduce costs by right-sizing the controls for each type of data.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

When designing a workload, you may be considering ways to protect sensitive data intuitively. For example, in a multi-tenant application, it is intuitive to think of each tenant's data as sensitive and put protections in place so that one tenant can't access the data of another tenant. Likewise, you may intuitively design access controls so only administrators can modify data while other users have only read-level access or no access at all. By having these data sensitivity levels defined and captured in policy, along with their data protection requirements, you can formally identify what data resides in your workload. You can then determine whether the right controls are in place, whether the controls can be audited, and what responses are appropriate if data is found to be mishandled.

To help identify where sensitive data resides within your workload, consider using a data catalog. A data catalog is a database that maps data in your organization, its location, its sensitivity level, and the controls in place to protect that data. Additionally, consider using resource tags where available. For example, you can apply a tag with a tag key of Classification and a tag value of PHI for protected health information (PHI), and another tag with a tag key of Sensitivity and a tag value of High. Services such as AWS Config can then be used to monitor these resources for changes and alert if they are modified in a way that brings them out of compliance with your protection requirements (such as changing the encryption settings). You can capture the standard definition of your tag keys and acceptable values using tag policies, a feature of AWS Organizations. Don't include private or sensitive data in tag keys or values.

### Implementation steps

1. Understand your organization's data classification scheme and protection requirements.
2. Identify the types of sensitive data processed by your workloads.
3. Capture the data in a data catalog that provides a single view of where data resides in the organization and the sensitivity level of that data.
4. Consider using resource and data-level tagging, where available, to tag data with its sensitivity level and other operational metadata that can help with monitoring and incident response (see the sketch after this list).
   - AWS Organizations tag policies can be used to enforce tagging standards.
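
As a sketch of step 4, the following tags a hypothetical S3 bucket with the Classification and Sensitivity keys from the example above; the bucket name and tag values are placeholders.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "example-phi-data-bucket"  # placeholder bucket name

# Record the classification and sensitivity of the data the bucket holds, so
# services such as AWS Config can monitor it against your handling policy.
s3.put_bucket_tagging(
    Bucket=BUCKET,
    Tagging={
        "TagSet": [
            {"Key": "Classification", "Value": "PHI"},
            {"Key": "Sensitivity", "Value": "High"},
        ]
    },
)
```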

๐Ÿ’ผ SEC07-BP02 Apply data protection controls based on data sensitivity

Apply data protection controls that provide an appropriate level of control for each class of data defined in your classification policy. This practice can allow you to protect sensitive data from unauthorized access and use, while preserving the availability and use of data.

**Desired outcome**

You have a classification policy that defines the different levels of sensitivity for data in your organization. For each of these sensitivity levels, you have clear guidelines published for approved storage and handling services and locations, and their required configuration. You implement the controls for each level according to the level of protection required and their associated costs. You have monitoring and alerting in place to detect if data is present in unauthorized locations, processed in unauthorized environments, accessed by unauthorized actors, or the configuration of related services becomes non-compliant.

**Common anti-patterns**

- Applying the same level of protection controls across all data. This may lead to over-provisioning security controls for low-sensitivity data, or insufficient protection of highly sensitive data.
- Not involving relevant stakeholders from security, compliance, and business teams when defining data protection controls.
- Overlooking the operational overhead and costs associated with implementing and maintaining data protection controls.
- Not conducting periodic data protection control reviews to maintain alignment with classification policies.
- Not having a complete inventory of where data resides at rest and in transit.

**Benefits of establishing this best practice**

By aligning your controls to the classification level of your data, your organization can invest in higher levels of control where needed. This can include increasing resources on securing, monitoring, measuring, remediating, and reporting. Where fewer controls are appropriate, you can improve the accessibility and completeness of data for your workforce, customers, or constituents. This approach gives your organization the most flexibility with data usage, while still adhering to data protection requirements.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Implementing data protection controls based on data sensitivity levels involves several key steps. First, identify the different data sensitivity levels within your workload architecture (such as public, internal, confidential, and restricted) and evaluate where you store and process this data. Next, define isolation boundaries around data based on its sensitivity level. We recommend you separate data into different AWS accounts, using service control policies (SCPs) to restrict services and actions allowed for each data sensitivity level. This way, you can create strong isolation boundaries and enforce the principle of least privilege.

After you define the isolation boundaries, implement appropriate protection controls based on the data sensitivity levels. Refer to best practices for Protecting data at rest and Protecting data in transit to implement relevant controls like encryption, access controls, and auditing. Consider techniques like tokenization or anonymization to reduce the sensitivity level of your data. Simplify applying consistent data policies across your business with a centralized system for tokenization and de-tokenization.

Continuously monitor and test the effectiveness of the implemented controls. Regularly review and update the data classification scheme, risk assessments, and protection controls as your organization's data landscape and threats evolve. Align the implemented data protection controls with relevant industry regulations, standards, and legal requirements. Further, provide security awareness and training to help employees understand the data classification scheme and their responsibilities in handling and protecting sensitive data.

### Implementation steps

1. Identify the classification and sensitivity levels of data within your workload.
2. Define isolation boundaries for each level and determine an enforcement strategy.
3. Evaluate the controls you define that govern access, encryption, auditing, retention, and others required by your data classification policy.
4. Evaluate options to reduce the sensitivity level of data where appropriate, such as using tokenization or anonymization.
5. Verify your controls using automated testing and monitoring of your configured resources.
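
As one sketch of the SCP-based isolation described in the guidance, the following creates and attaches a guardrail policy for an organizational unit that holds restricted data. The OU ID, policy name, and the specific denied action are illustrative assumptions; your own SCPs would reflect your classification policy.

```python
import json
import boto3

orgs = boto3.client("organizations")

# Deny disabling default EBS encryption in accounts that hold restricted data.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["ec2:DisableEbsEncryptionByDefault"],
        "Resource": "*",
    }],
}

policy = orgs.create_policy(
    Content=json.dumps(scp_document),
    Description="Protect encryption settings for restricted-data accounts",
    Name="restricted-data-guardrails",
    Type="SERVICE_CONTROL_POLICY",
)

orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-abcd-12345678",  # placeholder OU containing restricted-data accounts
)
```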

๐Ÿ’ผ SEC07-BP03 Automate identification and classification

Automating the identification and classification of data can help you implement the correct controls. Using automation to augment manual determination reduces the risk of human error and exposure. **Desired outcome** You are able to verify whether the proper controls are in place based on your classification and handling policy. Automated tools and services help you to identify and classify the sensitivity level of your data. Automation also helps you continually monitor your environments to detect and alert if data is being stored or handled in unauthorized ways so corrective action can be taken quickly. **Common anti-patterns** - Relying solely on manual processes for data identification and classification, which can be error-prone and time-consuming. This can lead to inefficient and inconsistent data classification, especially as data volumes grow. - Not having mechanisms to track and manage data assets across the organization. - Overlooking the need for continuous monitoring and classification of data as it moves and evolves within the organization. **Benefits of establishing this best practice** Automating data identification and classification can lead to more consistent and accurate application of data protection controls, reducing the risk of human error. Automation can also provide visibility into sensitive data access and movement, helping you detect unauthorized handling and take corrective action. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance While human judgment is often used to classify data during the initial design phases of a workload, consider having systems in place that automate identification and classification on test data as a preventive control. For example, developers can be provided a tool or service to scan representative data to determine its sensitivity. Within AWS, you can upload data sets into Amazon S3 and scan them using Amazon Macie, Amazon Comprehend, or Amazon Comprehend Medical. Likewise, consider scanning data as part of unit and integration testing to detect where sensitive data is not expected. Alerting on sensitive data at this stage can highlight gaps in protections before deployment to production. Other features such as sensitive data detection in AWS Glue, Amazon SNS, and Amazon CloudWatch can also be used to detect PII and take mitigating action. For any automated tool or service, understand how it defines sensitive data, and augment it with other human or automated solutions to close any gaps as needed. As a detective control, use ongoing monitoring of your environments to detect if sensitive data is being stored in non-compliant ways. This can help detect situations such as sensitive data being emitted into log files or being copied to a data analytics environment without proper de-identification or redaction. Data that is stored in Amazon S3 can be continually monitored for sensitive data using Amazon Macie. ### Implementation steps 1. Review the data classification scheme within your organization described in SEC07-BP01. - With an understanding of your organization's data classification scheme, you can establish accurate processes for automated identification and classification that align with your company's policies. 3. Perform an initial scan of your environments for automated identification and classification. - An initial full scan of your data can help produce a comprehensive understanding of where sensitive data resides in your environments. 
When a full scan is not initially required or is unable to be completed up-front due to cost, evaluate if data sampling techniques are suitable to achieve your outcomes. For example, Amazon Macie can be configured to perform a broad automated sensitive data discovery operation across your S3 buckets. This capability uses sampling techniques to cost-efficiently perform a preliminary analysis of where sensitive data resides. A deeper analysis of S3 buckets can then be performed using a sensitive data discovery job. Other data stores can also be exported to S3 to be scanned by Macie. - Establish access control defined in SEC07-BP02 for your data storage resources identified within your scan. 3. Configure ongoing scans of your environments. - The automated sensitive data discovery capability of Macie can be used to perform ongoing scans of your environments. Known S3 buckets that are authorized to store sensitive data can be excluded using an allow list in Macie. 4. Incorporate identification and classification into your build and test processes. - Identify tools that developers can use to scan data for sensitivity while workloads are in development. Use these tools as part of integration testing to alert when sensitive data is unexpected and prevent further deployment. 5. Implement a system or runbook to take action when sensitive data is found in unauthorized locations. - Restrict access to data using auto-remediation. For example, you can move this data to an S3 bucket with restricted access or tag the object if you use attribute-based access control (ABAC). Additionally, consider masking the data when it is detected. - Alert your data protection and incident response teams to investigate the root cause of the incident. Any learnings they identify can help prevent future incidents.
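
As a rough illustration of the Macie guidance above, the sketch below uses boto3 to enable sampling-based automated sensitive data discovery and then run a deeper one-time classification job against a single bucket. It assumes Macie is already enabled in the account and Region; the Region, bucket name, and job name are placeholders rather than values from this guidance.

```python
"""Minimal sketch: use Amazon Macie to classify data in an S3 bucket."""
import boto3

macie = boto3.client("macie2", region_name="us-east-1")  # Region is an assumption

# Turn on broad, sampling-based automated sensitive data discovery
macie.update_automated_discovery_configuration(status="ENABLED")

# Run a deeper, one-time sensitive data discovery job against a specific bucket
account_id = boto3.client("sts").get_caller_identity()["Account"]
response = macie.create_classification_job(
    jobType="ONE_TIME",
    name="initial-sensitive-data-scan",  # hypothetical job name
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": account_id, "buckets": ["example-data-bucket"]}  # placeholder bucket
        ]
    },
)
print("Started Macie classification job:", response["jobId"])
```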

๐Ÿ’ผ SEC07-BP04 Define scalable data lifecycle management

Understand your data lifecycle requirements as they relate to your different levels of data classification and handling. This can include how data is handled when it first enters your environment, how data is transformed, and the rules for its destruction. Consider factors such as retention periods, access, auditing, and tracking provenance. **Desired outcome** You classify data as close as possible to the point and time of ingestion. When data classification requires masking, tokenization, or other processes that reduce sensitivity level, you perform these actions as close as possible to point and time of ingestion. You delete data in accordance with your policy when it is no longer appropriate to keep, based on its classification. **Common anti-patterns** - Implementing a one-size-fits-all approach to data lifecycle management, without considering varying sensitivity levels and access requirements. - Considering lifecycle management only from the perspective of either data that is usable, or data that is backed up, but not both. - Assuming that data that has entered your workload is valid, without establishing its value or provenance. - Relying on data durability as a substitute for data backups and protection. - Retaining data beyond its usefulness and required retention period. **Benefits of establishing this best practice** A well-defined and scalable data lifecycle management strategy helps maintain regulatory compliance, improves data security, optimizes storage costs, and enables efficient data access and sharing while maintaining appropriate controls. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Data within a workload is often dynamic. The form it takes when entering your workload environment can be different from when it is stored or used in business logic, reporting, analytics, or machine learning. In addition, the value of data can change over time. Some data is temporal in nature and loses value as it gets older. Consider how these changes to your data impact evaluation under your data classification scheme and associated controls. Where possible, use an automated lifecycle mechanism, such as Amazon S3 lifecycle policies and the Amazon Data Lifecycle Manager, to configure your data retention, archiving, and expiration processes. For data stored in DynamoDB, you can use the Time To Live (TTL) feature to define a per-item expiration timestamp. Distinguish between data that is available for use, and data that is stored as a backup. Consider using AWS Backup to automate the backup of data across AWS services. Amazon EBS snapshots provide a way to copy an EBS volume and store it using S3 features, including lifecycle, data protection, and access to protection mechanisms. Two of these mechanisms are S3 Object Lock and AWS Backup Vault Lock, which can provide you with additional security and control over your backups. Manage clear separation of duties and access for backups. Isolate backups at the account level to maintain separation from the affected environment during an event. Another aspect of lifecycle management is recording the history of data as it progresses through your workload, called data provenance tracking. This can give confidence that you know where the data came from, any transformations performed, what owner or process made those changes, and when. Having this history helps with troubleshooting issues and investigations during potential security events. 
For example, you can log metadata about transformations in an Amazon DynamoDB table. Within a data lake, you can keep copies of transformed data in different S3 buckets for each data pipeline stage. Store schema and timestamp information in an AWS Glue Data Catalog. Regardless of your solution, consider the requirements of your end users to determine the appropriate tooling you need to report on and track your data provenance. ### Implementation steps 1. Analyze the workload's data types, sensitivity levels, and access requirements to classify the data and define appropriate lifecycle management strategies. 2. Design and implement data retention policies and automated destruction processes that align with legal, regulatory, and organizational requirements. 3. Establish processes and automation for continuous monitoring, auditing, and adjustment of data lifecycle management strategies, controls, and policies as workload requirements and regulations evolve. - Use AWS Config to detect resources that do not have automated lifecycle management turned on.
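
The lifecycle automation described above can be sketched with boto3 as follows. The bucket name, prefix, table name, TTL attribute, and retention periods are placeholder assumptions; the actual values should come from your own classification and handling policy.

```python
"""Minimal sketch: automate retention with an S3 lifecycle rule and DynamoDB TTL."""
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},  # placeholder prefix
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)

# Per-item expiration for data stored in DynamoDB
dynamodb = boto3.client("dynamodb")
dynamodb.update_time_to_live(
    TableName="example-table",  # placeholder
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)
```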

๐Ÿ’ผ SEC08-BP01 Implement secure key management

Secure key management includes the storage, rotation, access control, and monitoring of key material required to secure data at rest for your workload. **Desired outcome** You have a scalable, repeatable, and automated key management mechanism. The mechanism enforces least privilege access to key material and provides the correct balance between key availability, confidentiality, and integrity. You monitor access to the keys, and if rotation of key material is required, you rotate them using an automated process. You do not allow key material to be accessed by human operators. **Common anti-patterns** - Human access to unencrypted key material. - Creating custom cryptographic algorithms. - Overly broad permissions to access key material. **Benefits of establishing this best practice** By establishing a secure key management mechanism for your workload, you can help provide protection for your content against unauthorized access. Additionally, you may be subject to regulatory requirements to encrypt your data. An effective key management solution can provide technical mechanisms aligned to those regulations to protect key material. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Encryption of data at rest is a fundamental security control. To implement this control, your workload needs a mechanism to securely store and manage the key material used to encrypt your data at rest. AWS offers the AWS Key Management Service (AWS KMS) to provide durable, secure, and redundant storage for AWS KMS keys. Many AWS services integrate with AWS KMS to support encryption of your data. AWS KMS uses FIPS 140-2 Level 3 validated hardware security modules to protect your keys. There is no mechanism to export AWS KMS keys in plain text. When deploying workloads using a multi-account strategy, you should keep AWS KMS keys in the same account as the workload that uses them. This distributed model places the responsibility for managing the AWS KMS keys with your team. In other use cases, your organization may choose to store AWS KMS keys into a centralized account. This centralized structure requires additional policies to enable the cross-account access required for the workload account to access keys stored in the centralized account, but may be more applicable in use cases where a single key is shared across multiple AWS accounts. Regardless of where the key material is stored, you should tightly control access to the key through the use of key policies and IAM policies. Key policies are the primary way to control access to an AWS KMS key. Additionally, AWS KMS key grants can provide access to AWS services to encrypt and decrypt data on your behalf. Review the guidance for access control to your AWS KMS keys. You should monitor the use of encryption keys to detect unusual access patterns. Operations performed using AWS managed keys and customer managed keys stored in AWS KMS can be logged in AWS CloudTrail and should be reviewed periodically. Pay special attention to monitoring key destruction events. To mitigate accidental or malicious destruction of key material, key destruction events do not delete the key material immediately. Attempts to delete keys in AWS KMS are subject to a waiting period, which defaults to 30 days and a minimum of 7 days, providing administrators time to review these actions and roll back the request if necessary. 
Most AWS services use AWS KMS in a way that is transparent to you - your only requirement is to decide whether to use an AWS managed or customer managed key. If your workload requires the direct use of AWS KMS to encrypt or decrypt data, you should use envelope encryption to protect your data. The AWS Encryption SDK can provide your applications with client-side encryption primitives to implement envelope encryption and integrate with AWS KMS. ### Implementation steps 1. Determine the appropriate key management options (AWS managed or customer managed) for the key. - For ease of use, AWS offers AWS owned and AWS managed keys for most services, which provide encryption-at-rest capability without the need to manage key material or key policies. - When using customer managed keys, consider the default key store to provide the best balance between agility, security, data sovereignty, and availability. Other use cases may require the use of custom key stores with AWS CloudHSM or the external key store. 2. Review the list of services that you are using for your workload to understand how AWS KMS integrates with the service. For example, EC2 instances can use encrypted EBS volumes, verifying that Amazon EBS snapshots created from those volumes are also encrypted using a customer managed key and mitigating accidental disclosure of unencrypted snapshot data. - For detailed information about the encryption options that an AWS service offers, see the Encryption at Rest topic in the user guide or the developer guide for the service. 3. Implement AWS KMS: AWS KMS makes it simple for you to create and manage keys and control the use of encryption across a wide range of AWS services and in your applications. 4. Consider AWS Encryption SDK: Use the AWS Encryption SDK with AWS KMS integration when your application needs to encrypt data client-side. 5. Enable IAM Access Analyzer to automatically review and notify if there are overly broad AWS KMS key policies. - Consider using custom policy checks to verify that a resource policy update does not grant public access to KMS keys. 6. Enable Security Hub to receive notifications if there are misconfigured key policies, keys scheduled for deletion, or keys without automated rotation enabled. 7. Determine the logging level appropriate for your AWS KMS keys. Since calls to AWS KMS, including read-only events, are logged, the CloudTrail logs associated with AWS KMS can become voluminous. - Some organizations prefer to segregate the AWS KMS logging activity into a separate trail.
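
A minimal sketch of creating a customer managed key with a key policy, automatic rotation, and an alias might look like the following. The description, alias, and the deliberately broad account-root policy statement are illustrative assumptions, not a recommended production policy; scope the key policy to your own principals.

```python
"""Minimal sketch: create a customer managed AWS KMS key with rotation enabled."""
import json
import boto3

kms = boto3.client("kms")
account_id = boto3.client("sts").get_caller_identity()["Account"]

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Illustrative statement only: allows key administration from the account
            "Sid": "EnableAccountAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}

key = kms.create_key(
    Description="Workload data-at-rest key (example)",
    KeyUsage="ENCRYPT_DECRYPT",
    Policy=json.dumps(key_policy),
)
key_id = key["KeyMetadata"]["KeyId"]

kms.enable_key_rotation(KeyId=key_id)  # automated rotation of key material
kms.create_alias(AliasName="alias/example-workload-data", TargetKeyId=key_id)  # hypothetical alias
print("Created KMS key:", key_id)
```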

๐Ÿ’ผ SEC08-BP02 Enforce encryption at rest

Encrypt private data at rest to maintain confidentiality and provide an additional layer of protection against unintended data disclosure or exfiltration. Encryption protects data so that it cannot be read or accessed without first being decrypted. Inventory and control unencrypted data to mitigate risks associated with data exposure. **Desired outcome** You have mechanisms that encrypt private data by default when at rest. These mechanisms help maintain the confidentiality of the data and provide an additional layer of protection against unintended data disclosure or exfiltration. You maintain an inventory of unencrypted data and understand the controls that are in place to protect it. **Common anti-patterns** - Not using encrypt-by-default configurations. - Providing overly permissive access to decryption keys. - Not monitoring the use of encryption and decryption keys. - Storing data unencrypted. - Using the same encryption key for all data regardless of data usage, types, and classification. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Map encryption keys to data classifications within your workloads. This approach helps protect against overly permissive access when using either a single key or a very small number of encryption keys for your data (see SEC07-BP01 Understand your data classification scheme). AWS Key Management Service (AWS KMS) integrates with many AWS services to make it easier to encrypt your data at rest. For example, in Amazon Elastic Compute Cloud (Amazon EC2), you can set default encryption for your account so that new EBS volumes are automatically encrypted. When using AWS KMS, consider how tightly the data needs to be restricted. Default and service-controlled AWS KMS keys are managed and used on your behalf by AWS. For sensitive data that requires fine-grained access to the underlying encryption key, consider customer managed keys (CMKs). You have full control over CMKs, including rotation and access management through the use of key policies. Additionally, services such as Amazon Simple Storage Service (Amazon S3) now encrypt all new objects by default. This implementation provides enhanced security with no impact on performance. Other services, such as Amazon Elastic Compute Cloud (Amazon EC2) or Amazon Elastic File System (Amazon EFS), support settings for default encryption. You can also use AWS Config Rules to automatically check that you are using encryption for Amazon Elastic Block Store (Amazon EBS) volumes, Amazon Relational Database Service (Amazon RDS) instances, Amazon S3 buckets, and other services within your organization. AWS also provides options for client-side encryption, allowing you to encrypt data prior to uploading it to the cloud. The AWS Encryption SDK provides a way to encrypt your data using envelope encryption. You provide the wrapping key, and the AWS Encryption SDK generates a unique data key for each data object it encrypts. Consider AWS CloudHSM if you need a managed single-tenant hardware security module (HSM). AWS CloudHSM allows you to generate, import, and manage cryptographic keys on a FIPS 140-2 Level 3 validated HSM. Some use cases for AWS CloudHSM include protecting private keys for issuing a certificate authority (CA), and turning on transparent data encryption (TDE) for Oracle databases. The AWS CloudHSM Client SDK provides software that allows you to encrypt data client-side using keys stored inside AWS CloudHSM prior to uploading your data into AWS.
The Amazon DynamoDB Encryption Client also allows you to encrypt and sign items prior to upload into a DynamoDB table. ### Implementation steps 1. Configure default encryption for new Amazon EBS volumes: Specify that you want all newly created Amazon EBS volumes to be created in encrypted form, with the option of using the default key provided by AWS or a key that you create. 2. Configure encrypted Amazon Machine Images (AMIs): Copying an existing AMI with encryption configured will automatically encrypt root volumes and snapshots. 3. Configure Amazon RDS encryption: Configure encryption for your Amazon RDS database clusters and snapshots at rest by using the encryption option. 4. Create and configure AWS KMS keys with policies that limit access to the appropriate principals for each classification of data: For example, create one AWS KMS key for encrypting production data and a different key for encrypting development or test data. You can also provide key access to other AWS accounts. Consider having different accounts for your development and production environments. If your production environment needs to decrypt artifacts in the development account, you can edit the CMK policy used to encrypt the development artifacts to give the production account the ability to decrypt those artifacts. The production environment can then ingest the decrypted data for use in production. 5. Configure encryption in additional AWS services: For other AWS services you use, review the security documentation for that service to determine the service's encryption options.
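
The encrypt-by-default settings for EBS and an S3 bucket described above can be sketched with boto3 as follows. The Region, KMS key alias, and bucket name are placeholders for your own values.

```python
"""Minimal sketch: turn on encrypt-by-default settings for EBS and an S3 bucket."""
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # per-Region setting; Region is an assumption
ec2.enable_ebs_encryption_by_default()
ec2.modify_ebs_default_kms_key_id(KmsKeyId="alias/example-workload-data")  # placeholder CMK alias

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="example-data-bucket",  # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-workload-data",
                },
                "BucketKeyEnabled": True,  # reduces the number of KMS requests
            }
        ]
    },
)
```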

๐Ÿ’ผ SEC08-BP03 Automate data at rest protection

Use automation to validate and enforce data at rest controls. Use automated scanning to detect misconfiguration of your data storage solutions, and perform remediations through automated programmatic response where possible. Incorporate automation in your CI/CD processes to detect data storage misconfigurations before they are deployed to production. **Desired outcome** Automated systems scan and monitor data storage locations for misconfiguration of controls, unauthorized access, and unexpected use. Detection of misconfigured storage locations initiates automated remediations. Automated processes create data backups and store immutable copies outside of the original environment. **Common anti-patterns** - Not considering options to enable encrypt-by-default settings, where supported. - Not considering security events, in addition to operational events, when formulating an automated backup and recovery strategy. - Not enforcing public access settings for storage services. - Not monitoring and auditing your controls for protecting data at rest. **Benefits of establishing this best practice** Automation helps to reduce the risk of misconfiguring your data storage locations. It helps to prevent misconfigurations from entering your production environments. This best practice also helps with detecting and fixing misconfigurations if they occur. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Automation is a theme throughout the practices for protecting your data at rest. SEC01-BP06 Automate deployment of standard security controls describes how you can capture the configuration of your resources using infrastructure as code (IaC) templates, such as with AWS CloudFormation. These templates are committed to a version control system, and are used to deploy resources on AWS through a CI/CD pipeline. These techniques equally apply to automating the configuration of your data storage solutions, such as encryption settings on Amazon S3 buckets. You can check the settings that you define in your IaC templates for misconfiguration in your CI/CD pipelines using rules in AWS CloudFormation Guard. You can monitor settings that are not yet available in CloudFormation or other IaC tooling for misconfiguration with AWS Config. Alerts that Config generates for misconfigurations can be remediated automatically, as described in SEC04-BP04 Initiate remediation for non-compliant resources. Using automation as part of your permissions management strategy is also an integral component of automated data protections. SEC03-BP02 Grant least privilege access and SEC03-BP04 Reduce permissions continuously describe configuring least-privilege access policies that are continually monitored by the AWS Identity and Access Management Access Analyzer to generate findings when permissions can be reduced. Beyond automation for monitoring permissions, you can configure Amazon GuardDuty to watch for anomalous data access behavior for your EBS volumes (by way of an EC2 instance), S3 buckets, and supported Amazon Relational Database Service databases. Automation also plays a role in detecting when sensitive data is stored in unauthorized locations. SEC07-BP03 Automate identification and classification describes how Amazon Macie can monitor your S3 buckets for unexpected sensitive data and generate alerts that can initiate an automated response. Follow the practices in REL09 Back up data to develop an automated data backup and recovery strategy.
Data backup and recovery is as important for recovering from security events as it is for operational events. ### Implementation steps 1. Capture data storage configuration in IaC templates. Use automated checks in your CI/CD pipelines to detect misconfigurations. - You can use for AWS CloudFormation your IaC templates, and AWS CloudFormation Guard for checking templates for misconfiguration. - Use AWS Config to run rules in a proactive evaluation mode. Use this setting to check the compliance of a resource as a step in your CI/CD pipeline before creating it. 2. Monitor resources for data storage misconfigurations. - Set AWS Config to monitor data storage resources for changes in control configurations and generate alerts to invoke remediation actions when it detects a misconfiguration. - See SEC04-BP04 Initiate remediation for non-compliant resources for more guidance on automated remediations. 3. Monitor and reduce data access permissions continually through automation. - IAM Access Analyzer can run continually to generate alerts when permissions can potentially be reduced. 4. Monitor and alert on anomalous data access behaviors. - GuardDuty watches for both known threat signatures and deviations from baseline access behaviors for data storage resources such as EBS volumes, S3 buckets, and RDS databases. 5. Monitor and alert on sensitive data being stored in unexpected locations. - Use Amazon Macie to continually scan your S3 buckets for sensitive data. 6. Automate secure and encrypted backups of your data. - AWS Backup is a managed service that creates encrypted and secure backups of various data sources on AWS. Elastic Disaster Recovery allows you to copy full server workloads and maintain continuous data protection with a recovery point objective (RPO) measured in seconds. You can configure both services to work together to automate creating data backups and copying them to failover locations. This can help keep your data available when impacted by either operational or security events.

๐Ÿ’ผ SEC08-BP04 Enforce access control

To help protect your data at rest, enforce access control using mechanisms such as isolation and versioning. Apply least privilege and conditional access controls. Prevent granting public access to your data. **Desired outcome** You verify that only authorized users can access data on a need-to-know basis. You protect your data with regular backups and versioning to prevent against intentional or inadvertent modification or deletion of data. You isolate critical data from other data to protect its confidentiality and data integrity. **Common anti-patterns** - Storing data with different sensitivity requirements or classification together. - Using overly permissive permissions on decryption keys. - Improperly classifying data. - Not retaining detailed backups of important data. - Providing persistent access to production data. - Not auditing data access or regularly reviewing permissions. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Protecting data at rest is important to maintain data integrity, confidentiality, and compliance with regulatory requirements. You can implement multiple controls to help achieve this, including access control, isolation, conditional access, and versioning. You can enforce access control with the principle of least privilege, which provides only the necessary permissions to users and services to perform their tasks. This includes access to encryption keys. Review your AWS Key Management Service (AWS KMS) policies to verify that the level of access you grant is appropriate and that relevant conditions apply. You can separate data based on different classification levels by using distinct AWS accounts for each level, and manage these accounts using AWS Organizations. This isolation can help prevent unauthorized access and minimizes the risk of data exposure. Regularly review the level of access granted in Amazon S3 bucket policies. Avoid using publicly readable or writeable buckets unless absolutely necessary. Consider using AWS Config to detect publicly available buckets and Amazon CloudFront to serve content from Amazon S3. Verify that buckets that should not allow public access are properly configured to prevent it. Implement versioning and object locking mechanisms for critical data stored in Amazon S3. Amazon S3 versioning preserves previous versions of objects to recover data from accidental deletion or overwrites. Amazon S3 Object Lock provides mandatory access control for objects, which prevents them from being deleted or overwritten, even by the root user, until the lock expires. Additionally, Amazon S3 Glacier Vault Lock offers a similar feature for archives stored in Amazon S3 Glacier. ### Implementation steps 1. Enforce access control with the principle of least privilege: - Review the access permissions granted to users and services, and verify that they have only the necessary permissions to perform their tasks. - Review access to encryption keys by checking the AWS Key Management Service (AWS KMS) policies. 2. Separate data based on different classification levels: - Use distinct AWS accounts for each data classification level. - Manage these accounts using AWS Organizations. 3. Review Amazon S3 bucket and object permissions: - Regularly review the level of access granted in Amazon S3 bucket policies. - Avoid using publicly readable or writeable buckets unless absolutely necessary. - Consider using AWS Config to detect publicly available buckets. 
- Use Amazon CloudFront to serve content from Amazon S3. - Verify that buckets that should not allow public access are properly configured to prevent it. - You can apply the same review process for databases and any other data sources that use IAM authentication, such as SQS or third-party data stores. 4. Use AWS IAM Access Analyzer: - You can configure AWS IAM Access Analyzer to analyze Amazon S3 buckets and generate findings when an S3 policy grants access to an external entity. 5. Implement versioning and object locking mechanisms: - Use Amazon S3 versioning to preserve previous versions of objects, which provides recovery from accidental deletion or overwrites. - Use Amazon S3 Object Lock to provide mandatory access control for objects, which prevents them from being deleted or overwritten, even by the root user, until the lock expires. - Use Amazon S3 Glacier Vault Lock for archives stored in Amazon S3 Glacier. 6. Use Amazon S3 Inventory: - You can use Amazon S3 Inventory to audit and report on the replication and encryption status of your S3 objects. 7. Review Amazon EBS and AMI sharing permissions: - Review your sharing permissions for Amazon EBS and AMI sharing to verify that your images and volumes are not shared with AWS accounts that are external to your workload. 8. Review AWS Resource Access Manager Shares periodically: - You can use AWS Resource Access Manager to share resources, such as AWS Network Firewall policies, Amazon Route 53 resolver rules, and subnets, within your Amazon VPCs. - Audit shared resources regularly and stop sharing resources that no longer need to be shared.

๐Ÿ’ผ SEC09-BP01 Implement secure key and certificate management

Transport Layer Security (TLS) certificates are used to secure network communications and establish the identity of websites, resources, and workloads over the internet, as well as private networks. **Desired outcome** A secure certificate management system that can provision, deploy, store, and renew certificates in a public key infrastructure (PKI). A secure key and certificate management mechanism prevents certificate private key material from disclosure and automatically renews the certificate on a periodic basis. It also integrates with other services to provide secure network communications and identity for machine resources inside of your workload. Key material should never be accessible to human identities. **Common anti-patterns** - Performing manual steps during the certificate deployment or renewal processes. - Paying insufficient attention to certificate authority (CA) hierarchy when designing a private CA. - Using self-signed certificates for public resources. **Benefits of establishing this best practice** - Simplify certificate management through automated deployment and renewal - Encourage encryption of data in transit using TLS certificates - Increased security and auditability of certificate actions taken by the certificate authority - Organization of management duties at different layers of the CA hierarchy **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Modern workloads make extensive use of encrypted network communications using PKI protocols such as TLS. PKI certificate management can be complex, but automated certificate provisioning, deployment, and renewal can reduce the friction associated with certificate management. AWS provides two services to manage general-purpose PKI certificates: AWS Certificate Manager and AWS Private Certificate Authority (AWS Private CA). ACM is the primary service that customers use to provision, manage, and deploy certificates for use in both public-facing as well as private AWS workloads. ACM issues private certificates using AWS Private CA and integrates with many other AWS managed services to provide secure TLS certificates for workloads. ACM can also issue publicly trusted certificates from Amazon Trust Services. Public certificates from ACM can be used on public facing workloads, as modern browsers and operating systems trust these certificates by default. AWS Private CA allows you to establish your own root or subordinate certificate authority and issue TLS certificates through an API. You can use these kinds of certificates in scenarios where you control and manage the trust chain on the client side of the TLS connection. In addition to TLS use cases, AWS Private CA can be used to issue certificates to Kubernetes pods, Matter device product attestations, code signing, and other use cases with a custom template. You can also use IAM Roles Anywhere to provide temporary IAM credentials to on-premises workloads that have been issued X.509 certificates signed by your Private CA. In addition to ACM and AWS Private CA, AWS IoT Core provides specialized support for provisioning, managing and deploying PKI certificates to IoT devices. AWS IoT Core provides specialized mechanisms for onboarding IoT devices into your public key infrastructure at scale. Some AWS services, such as Amazon API Gateway and Elastic Load Balancing, offer their own capabilities for using certificates to secure application connections. 
For example, both API Gateway and Application Load Balancer (ALB) support mutual TLS (mTLS) using client certificates that you create and export using the AWS Management Console, CLI, or APIs. **Considerations for establishing a private CA hierarchy** When you need to establish a private CA, it's important to take special care to properly design the CA hierarchy upfront. It's a best practice to deploy each level of your CA hierarchy into separate AWS accounts when creating a private CA hierarchy. This intentional step reduces the surface area for each level in the CA hierarchy, making it simpler to discover anomalies in CloudTrail log data and reducing the scope of access or impact if there is unauthorized access to one of the accounts. The root CA should reside in its own separate account and should only be used to issue one or more intermediate CA certificates. Then, create one or more intermediate CAs in accounts separate from the root CA's account to issue certificates for end users, devices, or other workloads. Finally, issue certificates from your root CA to the intermediate CAs, which will in turn issue certificates to your end users or devices. ### Implementation steps 1. Determine the relevant AWS services required for your use case: - Many use cases can leverage the existing AWS public key infrastructure using AWS Certificate Manager. ACM can be used to deploy TLS certificates for web servers, load balancers, or other uses for publicly trusted certificates. - Consider AWS Private CA when you need to establish your own private certificate authority hierarchy or need access to exportable certificates. ACM can then be used to issue many types of end-entity certificates using the AWS Private CA. - For use cases where certificates must be provisioned at scale for embedded Internet of things (IoT) devices, consider AWS IoT Core. - Consider using native mTLS functionality in services like Amazon API Gateway or Application Load Balancer. 2. Implement automated certificate renewal whenever possible: - Use ACM managed renewal for certificates issued by ACM along with integrated AWS managed services. 3. Establish logging and audit trails: - Enable CloudTrail logs to track access to the accounts holding certificate authorities. Consider configuring log file integrity validation in CloudTrail to verify the authenticity of the log data. - Periodically generate and review audit reports that list the certificates that your private CA has issued or revoked. These reports can be exported to an S3 bucket. - When deploying a private CA, you will also need to establish an S3 bucket to store the Certificate Revocation List (CRL). For guidance on configuring this S3 bucket based on your workload's requirements, see Planning a certificate revocation list (CRL).

๐Ÿ’ผ SEC09-BP02 Enforce encryption in transit

Enforce your defined encryption requirements based on your organizationโ€™s policies, regulatory obligations and standards to help meet organizational, legal, and compliance requirements. Only use protocols with encryption when transmitting sensitive data outside of your virtual private cloud (VPC). Encryption helps maintain data confidentiality even when the data transits untrusted networks. **Desired outcome** - You encrypt network traffic between your resources and the internet to mitigate unauthorized access to the data. - You encrypt network traffic within your internal AWS environment according to your security requirements. - You encrypt data in transit using secure TLS protocols and cipher suites. **Common anti-patterns** - Using deprecated versions of SSL, TLS, and cipher suite components (for example, SSL v3.0, 1024-bit RSA keys, and RC4 cipher). - Allowing unencrypted (HTTP) traffic to or from public-facing resources. - Not monitoring and replacing X.509 certificates prior to expiration. - Using self-signed X.509 certificates for TLS. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance AWS services provide HTTPS endpoints using TLS for communication, providing encryption in transit when communicating with the AWS APIs. Insecure HTTP protocols can be audited and blocked in a Virtual Private Cloud (VPC) through the use of security groups. HTTP requests can also be automatically redirected to HTTPS in Amazon CloudFront or on an Application Load Balancer. You can use an Amazon Simple Storage Service (Amazon S3) bucket policy to restrict the ability to upload objects through HTTP, effectively enforcing the use of HTTPS for object uploads to your bucket(s). You have full control over your computing resources to implement encryption in transit across your services. Additionally, you can use VPN connectivity into your VPC from an external network or AWS Direct Connect to facilitate encryption of traffic. Verify that your clients make calls to AWS APIs using at least TLS 1.2, as AWS has deprecated the use of earlier versions of TLS as of February 2024. We recommend you use TLS 1.3. If you have special requirements for encryption in transit, you can find third-party solutions available in the AWS Marketplace. ### Implementation steps 1. Enforce encryption in transit: Your defined encryption requirements should be based on the latest standards and best practices and only allow secure protocols. For example, configure a security group to only allow the HTTPS protocol to an application load balancer or Amazon EC2 instance. 2. Configure secure protocols in edge services: Configure HTTPS with Amazon CloudFront and use a security profile appropriate for your security posture and use case. 3. Use a VPN for external connectivity: Consider using an IPsec VPN for securing point-to-point or network-to-network connections to help provide both data privacy and integrity. 4. Configure secure protocols in load balancers: Select a security policy that provides the strongest cipher suites supported by the clients that will be connecting to the listener. Create an HTTPS listener for your Application Load Balancer. 5. Configure secure protocols in Amazon Redshift: Configure your cluster to require a secure socket layer (SSL) or transport layer security (TLS) connection. 6. Configure secure protocols: Review AWS service documentation to determine encryption-in-transit capabilities. 7. 
Configure secure access when uploading to Amazon S3 buckets: Use Amazon S3 bucket policy controls to enforce secure access to data. 8. Consider using AWS Certificate Manager: ACM allows you to provision, manage, and deploy public TLS certificates for use with AWS services. 9. Consider using AWS Private Certificate Authority for private PKI needs: AWS Private CA allows you to create private certificate authority (CA) hierarchies to issue end-entity X.509 certificates that can be used to create encrypted TLS channels.

๐Ÿ’ผ SEC09-BP03 Authenticate network communications

Verify the identity of communications by using protocols that support authentication, such as Transport Layer Security (TLS) or IPsec. Design your workload to use secure, authenticated network protocols whenever communicating between services, applications, or to users. Using network protocols that support authentication and authorization provides stronger control over network flows and reduces the impact of unauthorized access. **Desired outcome** A workload with well-defined data plane and control plane traffic flows between services. The traffic flows use authenticated and encrypted network protocols where technically feasible. **Common anti-patterns** - Unencrypted or unauthenticated traffic flows within your workload. - Reusing authentication credentials across multiple users or entities. - Relying solely on network controls as an access control mechanism. - Creating a custom authentication mechanism rather than relying on industry- standard authentication mechanisms. - Overly permissive traffic flows between service components or other resources in the VPC. **Benefits of establishing this best practice** - Limits the scope of impact for unauthorized access to one part of the workload. - Provides a higher level of assurance that actions are only performed by authenticated entities. - Improves decoupling of services by clearly defining and enforcing intended data transfer interfaces. - Enhances monitoring, logging, and incident response through request attribution and well-defined communication interfaces. - Provides defense-in-depth for your workloads by combining network controls with authentication and authorization controls. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Your workloadโ€™s network traffic patterns can be characterized into two categories: - East-west traffic represents traffic flows between services that make up a workload. - North-south traffic represents traffic flows between your workload and consumers. While it is common practice to encrypt north-south traffic, securing east-west traffic using authenticated protocols is less common. Modern security practices recommend that network design alone does not grant a trusted relationship between two entities. When two services may reside within a common network boundary, it is still best practice to encrypt, authenticate, and authorize communications between those services. As an example, AWS service APIs use the AWS Signature Version 4 (SigV4) signature protocol to authenticate the caller, no matter what network the request originates from. This authentication ensures that AWS APIs can verify the identity that requested the action, and that identity can then be combined with policies to make an authorization decision to determine whether the action should be allowed or not. Services such as Amazon VPC Lattice and Amazon API Gateway allow you use the same SigV4 signature protocol to add authentication and authorization to east-west traffic in your own workloads. If resources outside of your AWS environment need to communicate with services that require SigV4-based authentication and authorization, you can use AWS Identity and Access Management (IAM) Roles Anywhere on the non-AWS resource to acquire temporary AWS credentials. These credentials can be used to sign requests to services using SigV4 to authorize access. Another common mechanism for authenticating east-west traffic is TLS mutual authentication (mTLS). 
Many Internet of Things (IoT), business-to-business applications, and microservices use mTLS to validate the identity of both sides of a TLS communication through the use of both client and server-side X.509 certificates. These certificates can be issued by AWS Private Certificate Authority (AWS Private CA). You can use services such as Amazon API Gateway to provide mTLS authentication for inter- or intra-workload communication. Application Load Balancer also supports mTLS for internal or external facing workloads. While mTLS provides authentication information for both sides of a TLS communication, it does not provide a mechanism for authorization. Finally, OAuth 2.0 and OpenID Connect (OIDC) are two protocols typically used for controlling access to services by users, but are now becoming popular for service-to- service traffic as well. API Gateway provides a JSON Web Token (JWT) authorizer, allowing workloads to restrict access to API routes using JWTs issued from OIDC or OAuth 2.0 identity providers. OAuth2 scopes can be used as a source for basic authorization decisions, but the authorization checks still need to be implemented in the application layer, and OAuth2 scopes alone cannot support more complex authorization needs. ### Implementation steps 1. Define and document your workload network flows: The first step in implementing a defense-in-depth strategy is defining your workloadโ€™s traffic flows. - Create a data flow diagram that clearly defines how data is transmitted between different services that comprise your workload. This diagram is the first step to enforcing those flows through authenticated network channels. - Instrument your workload in development and testing phases to validate that the data flow diagram accurately reflects the workloadโ€™s behavior at runtime. - A data flow diagram can also be useful when performing a threat modeling exercise, as described in SEC01-BP07 Identify threats and prioritize mitigations using a threat model. 2. Establish network controls: Consider AWS capabilities to establish network controls aligned to your data flows. While network boundaries should not be the only security control, they provide a layer in the defense-in-depth strategy to protect your workload. - Use security groups to establish define and restrict data flows between resources. - Consider using AWS PrivateLink to communicate with both AWS and third-party services that support AWS PrivateLink. Data sent through a AWS PrivateLink interface endpoint stays within the AWS network backbone and does not traverse the public Internet. 3. Implement authentication and authorization across services in your workload: Choose the set of AWS services most appropriate to provide authenticated, encrypted traffic flows in your workload. - Consider Amazon VPC Lattice to secure service-to-service communication. VPC Lattice can use SigV4 authentication combined with auth policies to control service-to- service access. - For service-to-service communication using mTLS, consider API Gateway, Application Load Balancer. AWS Private CA can be used to establish a private CA hierarchy capable of issuing certificates for use with mTLS. - When integrating with services using OAuth 2.0 or OIDC, consider API Gateway using the JWT authorizer. - For communication between your workload and IoT devices, consider AWS IoT Core, which provides several options for network traffic encryption and authentication. 4. 
Monitor for unauthorized access: Continually monitor for unintended communication channels, unauthorized principals attempting to access protected resources, and other improper access patterns. - If using VPC Lattice to manage access to your services, consider enabling and monitoring VPC Lattice access logs. These access logs include information on the requesting entity, network information including source and destination VPC, and request metadata. - Consider enabling VPC flow logs to capture metadata on network flows and periodically review for anomalies. - Refer to the AWS Security Incident Response Guide and the Incident Response section of the AWS Well-Architected Framework security pillar for more guidance on planning, simulating, and responding to security incidents.

๐Ÿ’ผ SEC10-BP01 Identify key personnel and external resources

Identify internal and external personnel, resources, and legal obligations to help your organization respond to an incident. **Desired outcome** - You have a list of key personnel, their contact information, and the roles they play when responding to a security event. - You review this information regularly and update it to reflect personnel changes from an internal and external tools perspective. - You consider all third-party service providers and vendors while documenting this information, including security partners, cloud providers, and software-as-a-service (SaaS) applications. - During a security event, personnel are available with the appropriate level of responsibility, context, and access to be able to respond and recover. **Common anti-patterns** - Not maintaining an updated list of key personnel with contact information, their roles, and their responsibilities when responding to security events. - Assuming that everyone understands the people, dependencies, infrastructure, and solutions when responding to and recovering from an event. - Not having a document or knowledge repository that represents key infrastructure or application design. - Not having proper onboarding processes for new employees to effectively contribute to a security event response, such as conducting event simulations. - Not having an escalation path in place when key personnel are temporarily unavailable or fail to respond during security events. **Benefits of establishing this best practice** This practice reduces the triage and response time spent on identifying the right personnel and their roles during an event. Minimize wasted time during an event by maintaining an updated list of key personnel and their roles so you can bring the right individuals to triage and recover from an event. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance Identify key personnel in your organization: Maintain a contact list of personnel within your organization that you need to involve. Regularly review and update this information in the event of personnel movement, like organizational changes, promotions, and team changes. This is especially important for key roles like incident managers, incident responders, and communications lead. - Incident manager: Incident managers have overall authority during the event response. - Incident responders: Incident responders are responsible for investigation and remediation activities. These people can differ based on the type of event, but are typically developers and operation teams responsible for the impacted application. - Communications lead: The communications lead is responsible for internal and external communications, especially with public agencies, regulators, and customers. - Onboarding process: Regularly train and onboard new employees to equip them with the necessary skills and knowledge to contribute effectively to incident response efforts. Incorporate simulations and hands-on exercises as part of the onboarding process to facilitate their preparedness. - Subject matter experts (SMEs): In the case of distributed and autonomous teams, we recommend you identify an SME for mission critical workloads. They offer insights into the operation and data classification of critical workloads involved in the event. 
**Example table format:** | Role | Name | Contact Information | Responsibilities | | --- | --- | --- | --- | | Incident Manager | Jane Doe| jane.doe@example.com | Overall authority during response | | Incident Responder | John Smith | john.smith@example.com | Investigation and remediation | | Communications Lead | Emily Johnson | emily.johnson@example.com | Internal and external communications | | Communications Lead | Michael Brown | michael.brown@example.com | Insights on critical workloads | Consider using the AWS Systems Manager Incident Manager feature to capture key contacts, define a response plan, automate on-call schedules, and create escalation plans. Automate and rotate all staff through an on-call schedule, so that responsibility for the workload is shared across its owners. This promotes good practices, such as emitting relevant metrics and logs as well as defining alarm thresholds that matter for the workload. Identify external partners: Enterprises use tools built by independent software vendors (ISVs), partners, and subcontractors to build differentiating solutions for their customers. Engage key personnel from these parties who can help respond to and recover from an incident. We recommend you sign up for the appropriate level of Support in order to get prompt access to AWS subject matter experts through a support case. Consider similar arrangements with all critical solutions providers for the workloads. Some security events require publicly listed businesses to notify relevant public agencies and regulators of the event and impacts. Maintain and update contact information for the relevant departments and responsible individuals. ### Implementation steps 1. Set up an incident management solution. - Consider deploying Incident Manager in your Security Tooling account. 2. Define contacts in your incident management solution. - Define at least two types of contact channels for each contact (such as SMS, phone, or email), to ensure reachability during an incident. 3. Define a response plan. - Identify the most appropriate contacts to engage during an incident. Define escalation plans aligned to the roles of personnel to be engaged, rather than individual contacts. Consider including contacts that may be responsible for informing external entities, even if they are not directly engaged to resolve the incident.

๐Ÿ’ผ SEC10-BP02 Develop incident management plans

The first document to develop for incident response is the incident response plan. The incident response plan is designed to be the foundation for your incident response program and strategy. **Benefits of establishing this best practice** Developing thorough and clearly defined incident response processes is key to a successful and scalable incident response program. When a security event occurs, clear steps and workflows can help you to respond in a timely manner. You might already have existing incident response processes. Regardless of your current state, itโ€™s important to update, iterate, and test your incident response processes regularly. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance An incident management plan is critical to respond, mitigate, and recover from the potential impact of security incidents. An incident management plan is a structured process for identifying, remediating, and responding in a timely matter to security incidents. The cloud has many of the same operational roles and requirements found in an on-premises environment. When you create an incident management plan, it is important to factor response and recovery strategies that best align with your business outcome and compliance requirements. For example, if you operate workloads in AWS that are FedRAMP compliant in the United States, follow the recommendations in NIST SP 800-61 Computer Security Handling Guide. Similarly, when you operate workloads that store personally identifiable information (PII), consider how to protect and respond to issues related to data residency and use. When building an incident management plan for your workloads in AWS, start with the AWS Shared Responsibility Model for building a defense-in-depth approach towards incident response. In this model, AWS manages security of the cloud, and you are responsible for security in the cloud. This means that you retain control and are responsible for the security controls you choose to implement. The AWS Security Incident Response Guide details key concepts and foundational guidance for building a cloud-centric incident management plan. An effective incident management plan must be continually iterated upon, remaining current with your cloud operations goal. Consider using the implementation plans detailed below as you create and evolve your incident management plan. ### Implementation steps 1. Define roles and responsibilities within your organization for handling security events. This should involve representatives from various departments, including: - Human resources (HR) - Executive team - Legal department - Application owners and developers (subject matter experts, or SMEs) 2. Clearly outline who is responsible, accountable, consulted, and informed (RACI) during an incident. Create a RACI chart to facilitate quick and direct communication, and clearly outline the leadership across different stages of an event. 3. Involve application owners and developers (SMEs) during an incident, as they can provide valuable information and context to aid in measuring the impact. Build relationships with these SMEs, and practice incident response scenarios with them before an actual incident occurs. 4. Involve trusted partners or external experts in the investigation or response process, as they can provide additional expertise and perspective. 5. Align your incident management plans and roles with any local regulations or compliance requirements that govern your organization. 6. 
Practice and test your incident response plans regularly, and involve all the defined roles and responsibilities. This helps streamline the process and verify you have a coordinated and efficient response to security incidents. 7. Review and update the roles, responsibilities, and RACI chart periodically, or as your organizational structure or requirements change. ### Understand AWS response teams and support - **AWS Support:** - Support offers a range of plans that provide access to tools and expertise that support the success and operational health of your AWS solutions. If you need technical support and more resources to help plan, deploy, and optimize your AWS environment, you can select a support plan that best aligns with your AWS use case. - Consider the Support Center in AWS Management Console (sign-in required) as the central point of contact to get support for issues that affect your AWS resources. Access to Support is controlled by AWS Identity and Access Management. - **AWS Customer Incident Response Team (CIRT)** - The AWS Customer Incident Response Team (CIRT) is a specialized 24/7 global AWS team that provides support to customers during active security events on the customer side of the AWS Shared Responsibility Model. - When the AWS CIRT supports you, they provide assistance with triage and recovery for an active security event on AWS. They can assist in root cause analysis through the use of AWS service logs and provide you with recommendations for recovery. They can also provide security recommendations and best practices to help you avoid security events in the future. - AWS customers can engage the AWS CIRT through a Support case. - **DDoS response support** - AWS offers AWS Shield, which provides a managed distributed denial of service (DDoS) protection service that safeguards web applications running on AWS. Shield provides always-on detection and automatic inline mitigations that can minimize application downtime and latency, so there is no need to engage Support to benefit from DDoS protection. There are two tiers of Shield: AWS Shield Standard and AWS Shield Advanced. To learn about the differences between these two tiers, see Shield features documentation. - **AWS Managed Services (AMS)** - AWS Managed Services (AMS) provides ongoing management of your AWS infrastructure so you can focus on your applications. By implementing best practices to maintain your infrastructure, AMS helps reduce your operational overhead and risk. AMS automates common activities such as change requests, monitoring, patch management, security, and backup services, and provides full-lifecycle services to provision, run, and support your infrastructure. - AMS takes responsibility for deploying a suite of security detective controls and provides a 24/7 first line of response to alerts. When an alert is initiated, AMS follows a standard set of automated and manual playbooks to verify a consistent response. These playbooks are shared with AMS customers during onboarding so that they can develop and coordinate a response with AMS. ### Develop the incident response plan The incident response plan is designed to be the foundation for your incident response program and strategy. The incident response plan should be in a formal document. An incident response plan typically includes these sections: - **Incident response team overview:** Outlines the goals and functions of the incident response team. 
- **Roles and responsibilities:** Lists the incident response stakeholders and details their roles when an incident occurs.
- **A communication plan:** Details contact information and how you communicate during an incident.
- **Backup communication methods:** It’s a best practice to have out-of-band communication as a backup for incident communication. An example of an application that provides a secure out-of-band communications channel is AWS Wickr.
- **Phases of incident response and actions to take:** Enumerates the phases of incident response (for example, detect, analyze, contain, eradicate, and recover), including high-level actions to take within those phases.
- **Incident severity and prioritization definitions:** Details how to classify the severity of an incident, how to prioritize the incident, and how the severity definitions affect escalation procedures.

While these sections are common across companies of different sizes and industries, each organization’s incident response plan is unique. You need to build an incident response plan that works best for your organization.

๐Ÿ’ผ SEC10-BP03 Prepare forensic capabilities

Ahead of a security incident, consider developing forensics capabilities to support security event investigations.

**Level of risk exposed if this best practice is not established:** Medium

Concepts from traditional on-premises forensics apply to AWS. For key information to start building forensics capabilities in the AWS Cloud, see Forensic investigation environment strategies in the AWS Cloud. Once you have your environment and AWS account structure set up for forensics, define the technologies required to effectively perform forensically sound methodologies across the four phases:

- **Collection:** Collect relevant AWS logs, such as AWS CloudTrail, AWS Config, VPC Flow Logs, and host-level logs. Collect snapshots, backups, and memory dumps of impacted AWS resources where available.
- **Examination:** Examine the data collected by extracting and assessing the relevant information.
- **Analysis:** Analyze the data collected in order to understand the incident and draw conclusions from it.
- **Reporting:** Present the information resulting from the analysis phase.

### Implementation steps

### Prepare your forensics environment

AWS Organizations helps you centrally manage and govern an AWS environment as you grow and scale AWS resources. An AWS organization consolidates your AWS accounts so that you can administer them as a single unit. You can use organizational units (OUs) to group accounts together to administer as a single unit. For incident response, it’s helpful to have an AWS account structure that supports the functions of incident response, which includes a security OU and a forensics OU. Within the security OU, you should have accounts for:

- **Log archival:** Aggregate logs in a log archival AWS account with limited permissions.
- **Security tools:** Centralize security services in a security tool AWS account. This account operates as the delegated administrator for security services.

Within the forensics OU, you have the option to implement a single forensics account or accounts for each Region that you operate in, depending on which works best for your business and operational model. If you create a forensics account per Region, you can block the creation of AWS resources outside of that Region and reduce the risk of resources being copied to an unintended Region. For example, if you only operate in US East (N. Virginia) Region (us-east-1) and US West (Oregon) (us-west-2), then you would have two accounts in the forensics OU: one for us-east-1 and one for us-west-2. If you create a single forensics AWS account for multiple Regions, exercise caution when copying AWS resources to that account to verify that you align with your data sovereignty requirements. Because it takes time to provision new accounts, it is imperative to create and instrument the forensics accounts well ahead of an incident so that responders can be prepared to effectively use them for response.

### Capture backups and snapshots

Setting up backups of key systems and databases is critical for recovering from a security incident and for forensics purposes. With backups in place, you can restore your systems to their previous safe state. On AWS, you can take snapshots of various resources. Snapshots provide you with point-in-time backups of those resources. There are many AWS services that can support you in backup and recovery. Especially when it comes to situations such as ransomware, it’s critical for your backups to be well protected. 
For guidance on securing your backups, see Top 10 security best practices for securing backups in AWS. In addition to securing your backups, you should regularly test your backup and restore processes to verify that the technology and processes you have in place work as expected.

### Automate forensics

During a security event, your incident response team must be able to collect and analyze evidence quickly while maintaining accuracy for the time period surrounding the event (such as capturing logs related to a specific event or resource, or collecting a memory dump of an Amazon EC2 instance). It’s both challenging and time-consuming for the incident response team to manually collect the relevant evidence, especially across a large number of instances and accounts. Additionally, manual collection can be prone to human error. For these reasons, you should develop and implement automation for forensics as much as possible. AWS offers a number of automation resources for forensics, which are listed in the following Resources section. These resources are examples of forensics patterns that we have developed and customers have implemented. While they might be a useful reference architecture to start with, consider modifying them or creating new forensics automation patterns based on your environment, requirements, tools, and forensics processes.
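As one illustration of forensics automation, the following sketch uses boto3 to capture point-in-time EBS snapshots of the volumes attached to a potentially compromised EC2 instance and tags them for case tracking. The Region, instance ID, and case identifier are hypothetical placeholders; adapt the logic to your own forensics account structure and evidence-handling processes.

```python
"""Minimal sketch: snapshot all EBS volumes of a suspect EC2 instance for forensics.

Assumes credentials with ec2:CreateSnapshots and ec2:CreateTags permissions.
The Region, instance ID, and case identifier are illustrative placeholders.
"""
import boto3

REGION = "us-east-1"                  # hypothetical Region
INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical suspect instance
CASE_ID = "IR-2024-001"               # hypothetical incident case identifier

ec2 = boto3.client("ec2", region_name=REGION)

# Create snapshots of every EBS volume attached to the instance in one call.
response = ec2.create_snapshots(
    InstanceSpecification={"InstanceId": INSTANCE_ID, "ExcludeBootVolume": False},
    Description=f"Forensic evidence capture for case {CASE_ID}",
    TagSpecifications=[
        {
            "ResourceType": "snapshot",
            "Tags": [
                {"Key": "CaseId", "Value": CASE_ID},
                {"Key": "Purpose", "Value": "forensics"},
            ],
        }
    ],
)

for snap in response["Snapshots"]:
    print(f"Started snapshot {snap['SnapshotId']} for volume {snap['VolumeId']}")
```

A follow-on step in your automation could copy or share the resulting snapshots into the forensics account so that examination happens outside the potentially compromised account.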

๐Ÿ’ผ SEC10-BP04 Develop and test security incident response playbooks

A key part of preparing your incident response processes is developing playbooks. Incident response playbooks provide prescriptive guidance and steps to follow when a security event occurs. Having clear structure and steps simplifies the response and reduces the likelihood of human error.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Playbooks should be created for incident scenarios such as:

- **Expected incidents:** Playbooks should be created for incidents you anticipate. This includes threats like denial of service (DoS), ransomware, and credential compromise.
- **Known security findings or alerts:** Playbooks should be created to address your known security findings and alerts, such as those from Amazon GuardDuty. When you receive a GuardDuty finding, the playbook should provide clear steps to prevent mishandling or ignoring the alert. For more remediation details and guidance, see Remediating security issues discovered by GuardDuty.

Playbooks should contain technical steps for a security analyst to complete in order to adequately investigate and respond to a potential security incident.

### Implementation steps

Items to include in a playbook:

- **Playbook overview:** What risk or incident scenario does this playbook address? What is the goal of the playbook?
- **Prerequisites:** What logs, detection mechanisms, and automated tools are required for this incident scenario? What is the expected notification?
- **Communication and escalation information:** Who is involved and what is their contact information? What are each of the stakeholders’ responsibilities?
- **Response steps:** Across phases of incident response, what tactical steps should be taken? What queries should an analyst run? What code should be run to achieve the desired outcome (see the sketch after this list for one example)?
  - **Detect:** How will the incident be detected?
  - **Analyze:** How will the scope of impact be determined?
  - **Contain:** How will the incident be isolated to limit scope?
  - **Eradicate:** How will the threat be removed from the environment?
  - **Recover:** How will the affected system or resource be brought back into production?
- **Expected outcomes:** After queries and code are run, what is the expected result of the playbook?
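To make the response steps concrete, the analysis phase of a playbook often includes queries an analyst can run as-is. The following sketch uses boto3 to pull recent AWS CloudTrail events for a suspect IAM principal so the scope of activity can be assessed; the user name and time window are hypothetical, and you would adapt the lookup attributes to the scenario each playbook covers.

```python
"""Minimal sketch of an 'Analyze' playbook step: list recent CloudTrail activity
for a suspect principal. The user name and time window are hypothetical."""
from datetime import datetime, timedelta, timezone

import boto3

SUSPECT_USER = "example-builder"  # hypothetical IAM user named in the finding
LOOKBACK_HOURS = 24

cloudtrail = boto3.client("cloudtrail")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=LOOKBACK_HOURS)

paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": SUSPECT_USER}],
    StartTime=start,
    EndTime=end,
):
    for event in page["Events"]:
        # Print a compact timeline the analyst can paste into the incident record.
        print(event["EventTime"], event["EventName"], event.get("EventSource", ""))
```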

๐Ÿ’ผ SEC10-BP05 Pre-provision access

Verify that incident responders have the correct access pre-provisioned in AWS to reduce the time needed for investigation through to recovery.

**Common anti-patterns:**

- Using the root account for incident response.
- Altering existing accounts.
- Manipulating IAM permissions directly when providing just-in-time privilege elevation.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

AWS recommends reducing or eliminating reliance on long-lived credentials wherever possible, in favor of temporary credentials and just-in-time privilege escalation mechanisms. Long-lived credentials are prone to security risk and increase operational overhead. For most management tasks, as well as incident response tasks, we recommend you implement identity federation alongside temporary escalation for administrative access. In this model, a user requests elevation to a higher level of privilege (such as an incident response role) and, provided the user is eligible for elevation, a request is sent to an approver. If the request is approved, the user receives a set of temporary AWS credentials which can be used to complete their tasks. After these credentials expire, the user must submit a new elevation request.

We recommend the use of temporary privilege escalation in the majority of incident response scenarios. The correct way to do this is to use the AWS Security Token Service and session policies to scope access. There are scenarios where federated identities are unavailable, such as:

- Outage related to a compromised identity provider (IdP).
- Misconfiguration or human error that breaks the federated access management system.
- Malicious activity, such as a distributed denial of service (DDoS) event, that renders the system unavailable.

In the preceding cases, there should be emergency break glass access configured to allow investigation and timely remediation of incidents. We recommend that you use a user, group, or role with appropriate permissions to perform tasks and access AWS resources. Use the root user only for tasks that require root user credentials. To verify that incident responders have the correct level of access to AWS and other relevant systems, we recommend the pre-provisioning of dedicated accounts. The accounts require privileged access, and must be tightly controlled and monitored. The accounts must be built with the fewest privileges required to perform the necessary tasks, and the level of access should be based on the playbooks created as part of the incident management plan.

Use purpose-built and dedicated users and roles as a best practice. Temporarily escalating user or role access through the addition of IAM policies both makes it unclear what access users had during the incident, and risks the escalated privileges not being revoked. It is important to remove as many dependencies as possible to verify that access can be gained under the widest possible number of failure scenarios. To support this, create a playbook to verify that incident response users are created as users in a dedicated security account, and not managed through any existing federation or single sign-on (SSO) solution. Each individual responder must have their own named account. The account configuration must enforce a strong password policy and multi-factor authentication (MFA). 
If the incident response playbooks only require access to the AWS Management Console, the user should not have access keys configured and should be explicitly disallowed from creating access keys. This can be configured with IAM policies or service control policies (SCPs) as mentioned in the AWS Security Best Practices for AWS Organizations SCPs. The users should have no privileges other than the ability to assume incident response roles in other accounts.

During an incident it might be necessary to grant access to other internal or external individuals to support investigation, remediation, or recovery activities. In this case, use the playbook mechanism mentioned previously, and there must be a process to verify that any additional access is revoked immediately after the incident is complete.

To verify that the use of incident response roles can be properly monitored and audited, it is essential that the IAM accounts created for this purpose are not shared between individuals, and that the AWS account root user is not used unless required for a specific task. If the root user is required (for example, IAM access to a specific account is unavailable), use a separate process with a playbook available to verify availability of the root user sign-in credentials and MFA token.

To configure the IAM policies for the incident response roles, consider using IAM Access Analyzer to generate policies based on AWS CloudTrail logs. To do this, grant administrator access to the incident response role on a non-production account and run through your playbooks. Once complete, a policy can be created that allows only the actions taken. This policy can then be applied to all the incident response roles across all accounts. You might wish to create a separate IAM policy for each playbook to allow easier management and auditing. Example playbooks could include response plans for ransomware, data breaches, loss of production access, and other scenarios.

Use the incident response accounts to assume dedicated incident response IAM roles in other AWS accounts. These roles must be configured to only be assumable by users in the security account, and the trust relationship must require that the calling principal has authenticated using MFA. The roles must use tightly-scoped IAM policies to control access. Ensure that all AssumeRole requests for these roles are logged in CloudTrail and alerted on, and that any actions taken using these roles are logged. It is strongly recommended that both the IAM accounts and the IAM roles are clearly named to allow them to be easily found in CloudTrail logs. An example of this would be to name the IAM accounts <USER_ID>-BREAK-GLASS and the IAM roles BREAK-GLASS-ROLE.

CloudTrail is used to log API activity in your AWS accounts and should be used to configure alerts on usage of the incident response roles. Refer to the blog post on configuring alerts when root keys are used. The instructions can be modified to configure the Amazon CloudWatch metric filter to filter on AssumeRole events related to the incident response IAM role:

`{ $.eventName = "AssumeRole" && $.requestParameters.roleArn = "<INCIDENT_RESPONSE_ROLE_ARN>" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" }`

As the incident response roles are likely to have a high level of access, it is important that these alerts go to a wide group and are acted upon promptly. During an incident, it is possible that a responder might require access to systems which are not directly secured by IAM. 
These could include Amazon Elastic Compute Cloud instances, Amazon Relational Database Service databases, or software-as-a-service (SaaS) platforms. It is strongly recommended that rather than using native protocols such as SSH or RDP, AWS Systems Manager Session Manager is used for all administrative access to Amazon EC2 instances. This access can be controlled using IAM, which is secure and audited. It might also be possible to automate parts of your playbooks using AWS Systems Manager Run Command documents, which can reduce user error and improve time to recovery. For access to databases and third-party tools, we recommend storing access credentials in AWS Secrets Manager and granting access to the incident responder roles. Finally, the management of the incident response IAM accounts should be added to your Joiners, Movers, and Leavers processes and reviewed and tested periodically to verify that only the intended access is allowed.
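As a sketch of the trust relationship described in this best practice, the following boto3 example creates an incident response role that can only be assumed from a hypothetical security account, and only when the calling principal authenticated with MFA. The account ID and role name are placeholders; attach a least-privilege permissions policy derived from your playbooks (for example, generated with IAM Access Analyzer) rather than administrator access.

```python
"""Minimal sketch: create a tightly scoped incident response role that can only be
assumed from the security account with MFA. Account IDs and names are placeholders."""
import json

import boto3

SECURITY_ACCOUNT_ID = "111122223333"  # hypothetical security account

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{SECURITY_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            # Require that the calling principal authenticated with MFA.
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }
    ],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="BREAK-GLASS-ROLE",  # named for easy discovery in CloudTrail logs
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Incident response role assumable only from the security account with MFA",
    MaxSessionDuration=3600,  # keep sessions short; responders re-authenticate as needed
)
```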

๐Ÿ’ผ SEC10-BP06 Pre-deploy tools

Verify that security personnel have the right tools pre-deployed to reduce the time for investigation through to recovery. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance To automate security response and operations functions, you can use a comprehensive set of APIs and tools from AWS. You can fully automate identity management, network security, data protection, and monitoring capabilities and deliver them using popular software development methods that you already have in place. When you build security automation, your system can monitor, review, and initiate a response, rather than having people monitor your security position and manually react to events. If your incident response teams continue to respond to alerts in the same way, they risk alert fatigue. Over time, the team can become desensitized to alerts and can either make mistakes handling ordinary situations or miss unusual alerts. Automation helps avoid alert fatigue by using functions that process the repetitive and ordinary alerts, leaving humans to handle the sensitive and unique incidents. Integrating anomaly detection systems, such as Amazon GuardDuty, AWS CloudTrail Insights, and Amazon CloudWatch Anomaly Detection, can reduce the burden of common threshold-based alerts. You can improve manual processes by programmatically automating steps in the process. After you define the remediation pattern to an event, you can decompose that pattern into actionable logic, and write the code to perform that logic. Responders can then run that code to remediate the issue. Over time, you can automate more and more steps, and ultimately automatically handle whole classes of common incidents. During a security investigation, you need to be able to review relevant logs to record and understand the full scope and timeline of the incident. Logs are also required for alert generation, indicating certain actions of interest have happened. It is critical to select, enable, store, and set up querying and retrieval mechanisms, and set up alerting. Additionally, an effective way to provide tools to search log data is Amazon Detective. AWS offers over 200 cloud services and thousands of features. We recommend that you review the services that can support and simplify your incident response strategy. In addition to logging, you should develop and implement a tagging strategy. Tagging can help provide context around the purpose of an AWS resource. Tagging can also be used for automation. ### Implementation steps 1. **Enable security services to support detection and response** - AWS provides native detective, preventative, and responsive capabilities, and other services can be used to architect custom security solutions. For a list of the most relevant services for security incident response, see Cloud capability definitions. 2. **Develop and implement a tagging strategy** - Obtaining contextual information on the business use case and relevant internal stakeholders surrounding an AWS resource can be difficult. One way to do this is in the form of tags, which assign metadata to your AWS resources and consist of a user-defined key and value. You can create tags to categorize resources by purpose, owner, environment, type of data processed, and other criteria of your choice. - Having a consistent tagging strategy can speed up response times and minimize time spent on organizational context by allowing you to quickly identify and discern contextual information about an AWS resource. 
Tags can also serve as a mechanism to initiate response automations. Youโ€™ll want to first define the tags you want to implement across your organization. After that, youโ€™ll implement and enforce this tagging strategy.
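As a small illustration of the tagging strategy in step 2 above, the following boto3 sketch applies a consistent set of response-relevant tags to EC2 instances and then uses the same tags to find everything a team owns during triage. The tag keys, values, and instance IDs are hypothetical; define and enforce your own schema, for example with tag policies in AWS Organizations.

```python
"""Minimal sketch: apply a consistent tag set that gives responders context.
Tag keys, values, and resource IDs are hypothetical placeholders."""
import boto3

# Hypothetical tag schema agreed with application owners and the security team.
STANDARD_TAGS = [
    {"Key": "Owner", "Value": "payments-team"},
    {"Key": "Environment", "Value": "production"},
    {"Key": "DataClassification", "Value": "confidential"},
]

INSTANCE_IDS = ["i-0123456789abcdef0"]  # hypothetical resources to tag

ec2 = boto3.client("ec2")
ec2.create_tags(Resources=INSTANCE_IDS, Tags=STANDARD_TAGS)

# During triage, the same tags make it easy to find everything a team owns.
described = ec2.describe_instances(
    Filters=[{"Name": "tag:Owner", "Values": ["payments-team"]}]
)
for reservation in described["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance.get("Tags", []))
```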

๐Ÿ’ผ SEC10-BP07 Run simulations

As organizations grow and evolve over time, so does the threat landscape, making it important to continually review your incident response capabilities. Running simulations (also known as game days) is one method that can be used to perform this assessment. Simulations use real-world security event scenarios designed to mimic a threat actorโ€™s tactics, techniques, and procedures (TTPs) and allow an organization to exercise and evaluate their incident response capabilities by responding to these mock cyber events as they might occur in reality. **Benefits of establishing this best practice** Simulations have a variety of benefits: - Validating cyber readiness and developing the confidence of your incident responders. - Testing the accuracy and efficiency of tools and workflows. - Refining communication and escalation methods aligned with your incident response plan. - Providing an opportunity to respond to less common vectors. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance There are three main types of simulations: **Tabletop exercises:** The tabletop approach to simulations is a discussion-based session involving the various incident response stakeholders to practice roles and responsibilities and use established communication tools and playbooks. Exercise facilitation can typically be accomplished in a full day in a virtual venue, physical venue, or a combination. Because it is discussion-based, the tabletop exercise focuses on processes, people, and collaboration. Technology is an integral part of the discussion, but the actual use of incident response tools or scripts is generally not a part of the tabletop exercise. **Purple team exercises:** Purple team exercises increase the level of collaboration between the incident responders (blue team) and simulated threat actors (red team). The blue team is comprised of members of the security operations center (SOC), but can also include other stakeholders that would be involved during an actual cyber event. The red team is comprised of a penetration testing team or key stakeholders that are trained in offensive security. The red team works collaboratively with the exercise facilitators when designing a scenario so that the scenario is accurate and feasible. During purple team exercises, the primary focus is on the detection mechanisms, the tools, and the standard operating procedures (SOPs) supporting the incident response efforts. **Red team exercises:** During a red team exercise, the offense (red team) conducts a simulation to achieve a certain objective or set of objectives from a predetermined scope. The defenders (blue team) will not necessarily have knowledge of the scope and duration of the exercise, which provides a more realistic assessment of how they would respond to an actual incident. Because red team exercises can be invasive tests, be cautious and implement controls to verify that the exercise does not cause actual harm to your environment. Consider facilitating cyber simulations at a regular interval. Each exercise type can provide unique benefits to the participants and the organization as a whole, so you might choose to start with less complex simulation types (such as tabletop exercises) and progress to more complex simulation types (red team exercises). You should select a simulation type based on your security maturity, resources, and your desired outcomes. Some customers might not choose to perform red team exercises due to complexity and cost. 
### Implementation steps

Regardless of the type of simulation you choose, simulations generally follow these implementation steps:

1. **Define core exercise elements:** Define the simulation scenario and the objectives of the simulation. Both of these should have leadership acceptance.
2. **Identify key stakeholders:** At a minimum, an exercise needs exercise facilitators and participants. Depending on the scenario, additional stakeholders such as legal, communications, or executive leadership might be involved.
3. **Build and test the scenario:** The scenario might need to be redefined as it is being built if specific elements aren’t feasible. A finalized scenario is expected as the output of this stage.
4. **Facilitate the simulation:** The type of simulation determines the facilitation used (a paper-based scenario compared to a highly technical, simulated scenario). The facilitators should align their facilitation tactics to the exercise objectives, and they should engage all exercise participants wherever possible to provide the most benefit.
5. **Develop the after-action report (AAR):** Identify areas that went well, those that can use improvement, and potential gaps. The AAR should measure the effectiveness of the simulation as well as the team’s response to the simulated event so that progress can be tracked over time with future simulations.

๐Ÿ’ผ SEC10-BP08 Establish a framework for learning from incidents

Implementing a lessons learned framework and root cause analysis capability can not only help improve incident response capabilities, but also help prevent the incident from recurring. By learning from each incident, you can help avoid repeating the same mistakes, exposures, or misconfigurations, not only improving your security posture, but also minimizing time lost to preventable situations.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

It's important to implement a lessons learned framework that establishes, at a high level, the following points:

- When is a lessons learned session held?
- What is involved in the lessons learned process?
- How is a lessons learned session performed?
- Who is involved in the process, and how?
- How will areas of improvement be identified?
- How will you ensure improvements are effectively tracked and implemented?

The framework should not focus on or blame individuals, but instead should focus on improving tools and processes.

### Implementation steps

Aside from the preceding high-level outcomes, it’s important to make sure that you ask the right questions to derive the most value (information that leads to actionable improvements) from the process. Consider these questions to help get you started in fostering your lessons learned discussions:

- What was the incident?
- When was the incident first identified?
- How was it identified?
- What systems alerted on the activity?
- What systems, services, and data were involved?
- What specifically occurred?
- What worked well?
- What didn't work well?
- Which processes or procedures failed or failed to scale when responding to the incident?
- What can be improved within the following areas:

**People**

- Were the people who needed to be contacted actually available, and was the contact list up to date?
- Were people missing training or capabilities needed to effectively respond and investigate the incident?
- Were the appropriate resources ready and available?

**Process**

- Were processes and procedures followed?
- Were processes and procedures documented and available for this (type of) incident?
- Were required processes and procedures missing?
- Were the responders able to gain timely access to the required information to respond to the issue?

**Technology**

- Did existing alerting systems effectively identify and alert on the activity?
- How could we have reduced time-to-detection by 50%?
- Do existing alerts need improvement or new alerts need to be built for this (type of) incident?
- Did existing tools allow for effective investigation (search/analysis) of the incident?
- What can be done to help identify this (type of) incident sooner?
- What can be done to help prevent this (type of) incident from occurring again?
- Who owns the improvement plan, and how will you test that it has been implemented?
- What is the timeline for the additional monitoring or preventative controls and processes to be implemented and tested?

This list isn’t all-inclusive, but is intended to serve as a starting point for identifying what the organization and business needs are and how you can analyze them in order to most effectively learn from incidents and continuously improve your security posture. Most important is getting started by incorporating lessons learned as a standard part of your incident response process, documentation, and expectations across the stakeholders.

๐Ÿ’ผ SEC11-BP01 Train for application security

Provide training to your team on secure development and operation practices, which helps them build secure and high-quality software. This practice helps your team to prevent, detect, and remediate security issues earlier in the development lifecycle. Consider training that covers threat modeling, secure coding practices, and using services for secure configurations and operations. Provide your team access to training through self-service resources, and regularly gather their feedback for continuous improvement.

**Desired outcome**

You equip your team with the knowledge and skills necessary to design and build software with security in mind from the outset. Through training on threat modeling and secure development practices, your team has a deep understanding of potential security risks and how to mitigate them during the software development lifecycle (SDLC). This proactive approach to security is part of your team's culture, and your team can identify and remediate potential security issues early on. As a result, your team delivers high-quality, secure software and features more efficiently, which accelerates the overall delivery timeline. You have a collaborative and inclusive security culture within your organization, where the ownership of security is shared across all builders.

**Common anti-patterns**

- You wait until a security review, and then consider the security properties of a system.
- You leave all security decisions to a central security team.
- You don't communicate how the decisions taken in the SDLC relate to the overall security expectations or policies of the organization.
- You perform the security review process too late.

**Benefits of establishing this best practice**

- Better knowledge of the organizational requirements for security early in the development cycle.
- Being able to identify and remediate potential security issues faster, resulting in a quicker delivery of features.
- Improved quality of software and systems.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

To build secure and high-quality software, provide training to your team on common practices for secure development and operation of applications. This practice can help your team prevent, detect, and remediate security issues earlier in the development lifecycle, which can accelerate your delivery timeline.

To achieve this practice, consider training your team on threat modeling using AWS resources like the Threat Modeling Workshop. Threat modeling can help your team understand potential security risks and design systems with security in mind from the outset. Additionally, you can provide access to AWS Training and Certification, industry, or AWS Partner training on secure development practices.

Clearly define and communicate your organization's security review process, and outline the responsibilities of your team, the security team, and other stakeholders. Publish self-service guidance, code examples, and templates that demonstrate how to meet your security requirements. You can use AWS services like AWS CloudFormation, AWS Cloud Development Kit (AWS CDK) Constructs, and Service Catalog to provide pre-approved, secure configurations and reduce the need for custom setups. Regularly gather feedback from your team on their experience with the security review process and training, and use this feedback to continuously improve. 
Conduct game days or bug bash campaigns to identify and address security issues while simultaneously enhancing your team's skills. ### Implementation steps 1. **Identify training needs:** Assess the current skill level and knowledge gaps within your team regarding secure development practices through surveys, code reviews, or discussions with team members. 2. **Plan the training:** Based on the identified needs, create a training plan that covers relevant topics such as threat modeling, secure coding practices, security testing, and secure deployment practices. Employ resources like the Threat Modeling Workshop, AWS Training and Certification, and industry or AWS Partner training programs. 3. **Schedule and deliver training:** Schedule regular training sessions or workshops for your team. These can be instructor-led or self-paced, depending on your team's preferences and availability. Encourage hands-on exercises and practical examples to reinforce the learning. 4. **Define a security review process:** Collaborate with your security team and other stakeholders to clearly define the security review process for your applications. Document the responsibilities of each team or individual involved in the process, including your development team, security team, and other relevant stakeholders. 5. **Create self-service resources:** Develop self-service guidance, code examples, and templates that demonstrate how to meet your organization's security requirements. Consider AWS services like CloudFormation, AWS CDK Constructs, and Service Catalog to provide pre-approved, secure configurations and reduce the need for custom setups. 6. **Communicate and socialize:** Effectively communicate the security review process and the available self-service resources to your team. Conduct training sessions or workshops to familiarize them with these resources, and verify that they understand how to use them. 7. **Gather feedback and improve:** Regularly collect feedback from your team on their experience with the security review process and training. Use this feedback to identify areas for improvement and continuously refine the training materials, self-service resources, and the security review process. 8. **Conduct security exercises:** Organize game days or bug bash campaigns to identify and address security issues within your applications. These exercises not only help uncover potential vulnerabilities but also serve as practical learning opportunities for your team that enhance their skills in secure development and operation. 9. **Continue to learn and improve:** Encourage your team to stay up to date with the latest secure development practices, tools, and techniques. Regularly review and update your training materials and resources to reflect the evolving security landscape and best practices.
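For the self-service resources described in step 5 above, a pre-approved construct can be as small as an AWS CDK stack that encodes your security requirements once so builders do not have to rediscover them. The following Python CDK sketch shows an S3 bucket with encryption, TLS enforcement, versioning, and public access blocking applied by default; the stack and construct names are hypothetical, and the settings illustrate the idea rather than any specific organizational standard.

```python
"""Minimal AWS CDK (Python) sketch of a pre-approved construct: an S3 bucket with
baseline security settings applied by default. Names are hypothetical; requires
aws-cdk-lib and constructs to synthesize."""
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct


class SecureBucketStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Encryption at rest, TLS-only access, and no public access by default.
        s3.Bucket(
            self,
            "SecureBucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            enforce_ssl=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            versioned=True,
        )


app = cdk.App()
SecureBucketStack(app, "SecureBucketExample")  # hypothetical stack name
app.synth()
```

Publishing constructs like this through Service Catalog or an internal library gives builders a paved path that already satisfies the security review.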

๐Ÿ’ผ SEC11-BP02 Automate testing throughout the development and release lifecycle

Automate the testing for security properties throughout the development and release lifecycle. Automation makes it easier to consistently and repeatably identify potential issues in software prior to release, which reduces the risk of security issues in the software being provided. **Desired outcome** The goal of automated testing is to provide a programmatic way of detecting potential issues early and often throughout the development lifecycle. When you automate regression testing, you can rerun functional and non-functional tests to verify that previously tested software still performs as expected after a change. When you define security unit tests to check for common misconfigurations, such as broken or missing authentication, you can identify and fix these issues early in the development process. Test automation uses purpose-built test cases for application validation, based on the applicationโ€™s requirements and desired functionality. The result of the automated testing is based on comparing the generated test output to its respective expected output, which expedites the overall testing lifecycle. Testing methodologies such as regression testing and unit test suites are best suited for automation. Automating the testing of security properties allows builders to receive automated feedback without having to wait for a security review. Automated tests in the form of static or dynamic code analysis can increase code quality and help detect potential software issues early in the development lifecycle. **Common anti-patterns** - Not communicating the test cases and test results of the automated testing. - Performing the automated testing only immediately prior to a release. - Automating test cases with frequently changing requirements. - Failing to provide guidance on how to address the results of security tests. **Benefits of establishing this best practice** - Reduced dependency on people evaluating the security properties of systems. - Having consistent findings across multiple workstreams improves consistency. - Reduced likelihood of introducing security issues into production software. - Shorter window of time between detection and remediation due to catching software issues earlier. - Increased visibility of systemic or repeated behavior across multiple workstreams, which can be used to drive organization-wide improvements. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance As you build your software, adopt various mechanisms for software testing to ensure that you are testing your application for both functional requirements, based on your applicationโ€™s business logic, and non-functional requirements, which are focused on application reliability, performance, and security. Static application security testing (SAST) analyzes your source code for anomalous security patterns, and provides indications for defect prone code. SAST relies on static inputs, such as documentation (requirements specification, design documentation, and design specifications) and application source code to test for a range of known security issues. Static code analyzers can help expedite the analysis of large volumes of code. The NIST Quality Group provides a comparison of Source Code Security Analyzers, which includes open source tools for Byte Code Scanners and Binary Code Scanners. 
Complement your static testing with dynamic application security testing (DAST) methodologies, which perform tests against the running application to identify potentially unexpected behavior. Dynamic testing can be used to detect potential issues that are not detectable via static analysis. Testing at the code repository, build, and pipeline stages helps you stop different types of potential issues from entering your code. Amazon Q Developer provides code recommendations, including security scanning, in the builder's IDE. Amazon CodeGuru Security can identify critical issues, security issues, and hard-to-find bugs during application development, and provides recommendations to improve code quality.

Extracting a software bill of materials (SBOM) also provides a formal record containing the details and relationships of the various components used in building your software. This allows you to inform vulnerability management, and quickly identify software or component dependencies and supply chain risks.

The Security for Developers workshop uses AWS developer tools, such as AWS CodeBuild, AWS CodeCommit, and AWS CodePipeline, for release pipeline automation that includes SAST and DAST testing methodologies.

As you progress through your SDLC, establish an iterative process that includes periodic application reviews with your security team. Feedback gathered from these security reviews should be addressed and validated as part of your release readiness review. These reviews establish a robust application security posture, and provide builders with actionable feedback to address potential issues.

### Implementation steps

1. Implement consistent IDE, code review, and CI/CD tools that include security testing.
2. Consider where in the SDLC it is appropriate to block pipelines instead of just notifying builders that issues need to be remediated.
3. Automated Security Helper (ASH) is an example of an open-source code security scanning tool.
4. Performing testing or code analysis using automated tools, such as Amazon Q Developer integrated with developer IDEs, and Amazon CodeGuru Security for scanning code on commit, helps builders get feedback at the right time.
5. When building using AWS Lambda, you can use Amazon Inspector to scan the application code in your functions.
6. When automated testing is included in CI/CD pipelines, you should use a ticketing system to track the notification and remediation of software issues.
7. For security tests that might generate findings, linking to guidance for remediation helps builders improve code quality.
8. Regularly analyze the findings from automated tools to prioritize the next automation, builder training, or awareness campaign.
9. To extract SBOMs as part of your CI/CD pipelines, use Amazon Inspector SBOM Generator to produce SBOMs for archives, container images, directories, local systems, and compiled Go and Rust binaries in the CycloneDX SBOM format.
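As a lightweight complement to the tools above, security checks can also run as ordinary unit tests in your pipeline. The following pytest sketch fails the build if any S3 bucket defined in a CloudFormation template lacks server-side encryption; the template path is a hypothetical placeholder, and the check illustrates the security unit test idea rather than any specific AWS tool.

```python
"""Minimal sketch of a security unit test run in CI: fail the build if a
CloudFormation template defines an S3 bucket without server-side encryption.
The template path is a hypothetical placeholder."""
import json
from pathlib import Path

TEMPLATE_PATH = Path("infrastructure/template.json")  # hypothetical template


def _buckets(template: dict) -> dict:
    """Return all S3 bucket resources declared in the template."""
    resources = template.get("Resources", {})
    return {
        name: res
        for name, res in resources.items()
        if res.get("Type") == "AWS::S3::Bucket"
    }


def test_s3_buckets_have_encryption():
    template = json.loads(TEMPLATE_PATH.read_text())
    unencrypted = [
        name
        for name, res in _buckets(template).items()
        if "BucketEncryption" not in res.get("Properties", {})
    ]
    # The assertion message tells the builder exactly what to fix.
    assert not unencrypted, f"Buckets missing BucketEncryption: {unencrypted}"
```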

๐Ÿ’ผ SEC11-BP03 Perform regular penetration testing

Perform regular penetration testing of your software. This mechanism helps identify potential software issues that cannot be detected by automated testing or a manual code review. It can also help you understand the efficacy of your detective controls. Penetration testing should try to determine if the software can be made to perform in unexpected ways, such as exposing data that should be protected, or granting broader permissions than expected.

**Desired outcome**

Penetration testing is used to detect, remediate, and validate your application’s security properties. Regular and scheduled penetration testing should be performed as part of the software development lifecycle (SDLC). The findings from penetration tests should be addressed prior to the software being released. You should analyze the findings from penetration tests to identify if there are issues that could be found using automation. Having a regular and repeatable penetration testing process that includes an active feedback mechanism helps inform the guidance to builders and improves software quality.

**Common anti-patterns**

- Only penetration testing for known or prevalent security issues.
- Penetration testing applications without including their dependent third-party tools and libraries.
- Only penetration testing for package security issues, and not evaluating implemented business logic.

**Benefits of establishing this best practice**

- Increased confidence in the security properties of the software prior to release.
- Opportunity to identify preferred application patterns, which leads to greater software quality.
- A feedback loop that identifies earlier in the development cycle where automation or additional training can improve the security properties of software.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Penetration testing is a structured security testing exercise where you run planned security breach scenarios to detect, remediate, and validate security controls. Penetration tests start with reconnaissance, during which data is gathered based on the current design of the application and its dependencies. A curated list of security-specific testing scenarios is built and run. The key purpose of these tests is to uncover security issues in your application, which could be exploited for gaining unintended access to your environment, or unauthorized access to data. You should perform penetration testing when you launch new features, or whenever your application has undergone major changes in function or technical implementation.

You should identify the most appropriate stage in the development lifecycle to perform penetration testing. This testing should happen late enough that the functionality of the system is close to the intended release state, but with enough time remaining for any issues to be remediated.

### Implementation steps

1. Have a structured process for how penetration testing is scoped. Basing this process on the threat model is a good way of maintaining context.
2. Identify the appropriate place in the development cycle to perform penetration testing. This should be when there is minimal change expected in the application, but with enough time to perform remediation.
3. Train your builders on what to expect from penetration testing findings, and how to get information on remediation.
4. Use tools to speed up the penetration testing process by automating common or repeatable tests.
5. 
Analyze penetration testing findings to identify systemic security issues, and use this data to inform additional automated testing and ongoing builder education.

๐Ÿ’ผ SEC11-BP04 Conduct code reviews

Implement code reviews to help verify the quality and security of software being developed. Code reviews involve having team members other than the original code author review the code for potential issues, vulnerabilities, and adherence to coding standards and best practices. This process helps catch errors, inconsistencies, and security flaws that might have been overlooked by the original developer. Use automated tools to assist with code reviews. **Desired outcome** You include code reviews during development to increase the quality of the software being written. You upskill less experienced members of the team through learnings identified during the code review. You identify opportunities for automation and support the code review process using automated tools and testing. **Common anti-patterns** - You don't review code before deployment. - The same person writes and reviews the code. - You don't use automation and tools to assist or orchestrate code reviews. - You don't train builders on application security before they review code. **Benefits of establishing this best practice** - Increased code quality. - Increased consistency of code development through reuse of common approaches. - Reduction in the number of issues discovered during penetration testing and later stages. - Improved knowledge transfer within the team. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Code reviews help to verify the quality and security of the software during development. Manual reviews involve having a team member other than the original code author review the code for potential issues, vulnerabilities, and adherence to coding standards and best practices. This process helps catch errors, inconsistencies, and security flaws that might have been overlooked by the original developer. Consider Amazon CodeGuru Security to help conduct automated code reviews. CodeGuru Security uses machine learning and automated reasoning to analyze your code and identify potential security vulnerabilities and coding issues. Integrate automated code reviews with your existing code repositories and continuous integration/ continuous deployment (CI/CD) pipelines. ### Implementation steps 1. **Establish a code review process:** - Define when code reviews should occur, such as before merging code into the main branch or before deploying to production. - Determine who should be involved in the code review process, such as team members, senior developers, and security experts. - Decide on the code review methodology, including the process and tools to be used. 2. **Set up code review tools:** - Evaluate and select code review tools that fit your team's needs, such as GitHub Pull Requests or CodeGuru Security. - Integrate the chosen tools with your existing code repositories and CI/CD pipelines. - Configure the tools to enforce code review requirements, such as the minimum number of reviewers and approval rules. 3. **Define a code review checklist and guidelines:** - Create a code review checklist or guidelines that outline what should be reviewed. Consider factors such as code quality, security vulnerabilities, adherence to coding standards, and performance. - Share the checklist or guidelines with the development team, and verify everyone understands the expectations. 4. **Train developers on code review best practices:** - Provide training to your team on how to conduct effective code reviews. 
- Educate your team on application security principles and common vulnerabilities to look for during reviews. - Encourage knowledge sharing and pair programming sessions to upskill less experienced team members. 5. **Implement the code review process:** - Integrate the code review step into your development workflow, such as creating a pull request and assigning reviewers. - Require that code changes undergo a code review before merge or deployment. - Encourage open communication and constructive feedback during the review process. 6. **Monitor and improve:** - Regularly review the effectiveness of your code review process and gather feedback from the team. - Identify opportunities for automation or tool improvements to streamline the code review process. - Continuously update and refine the code review checklist or guidelines based on learnings and industry best practices. 7. **Foster a culture of code review:** - Emphasize the importance of code reviews to maintain code quality and security. - Celebrate successes and learnings from the code review process. - Encourage a collaborative and supportive environment where developers feel comfortable giving and receiving feedback.

๐Ÿ’ผ SEC11-BP05 Centralize services for packages and dependencies

Provide centralized services for your teams to obtain software packages and other dependencies. This allows the validation of packages before they are included in the software that you write and provides a source of data for the analysis of the software being used in your organization. **Desired outcome** You build your workload from external software packages in addition to the code that you write. This makes it simpler for you to implement functionality that is repeatedly used, such as a JSON parser or an encryption library. You centralize the sources for these packages and dependencies so your security team can validate them before they are used. You use this approach in conjunction with the manual and automated testing flows to increase the confidence in the quality of the software that you develop. **Common anti-patterns** - You pull packages from arbitrary repositories on the internet. - You don't test new packages before you make them available to builders. **Benefits of establishing this best practice** - Better understanding of what packages are being used in the software being built. - Being able to notify workload teams when a package needs to be updated based on the understanding of who is using what. - Reducing the risk of a package with issues being included in your software. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Provide centralized services for packages and dependencies in a way that is simple for builders to consume. Centralized services can be logically central rather than implemented as a monolithic system. This approach allows you to provide services in a way that meets the needs of your builders. You should implement an efficient way of adding packages to the repository when updates happen or new requirements emerge. AWS services such as AWS CodeArtifact or similar AWS partner solutions provide a way of delivering this capability. ### Implementation steps 1. Implement a logically centralized repository service that is available in all of the environments where software is developed. 2. Include access to the repository as part of the AWS account vending process. 3. Build automation to test packages before they are published in a repository. 4. Maintain metrics of the most commonly used packages, languages, and teams with the highest amount of change. 5. Provide an automated mechanism for builder teams to request new packages and provide feedback. 6. Regularly scan packages in your repository to identify the potential impact of newly discovered issues.
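If you use AWS CodeArtifact as the centralized repository, builders and CI jobs can fetch short-lived credentials programmatically instead of configuring public package sources. The following boto3 sketch retrieves an authorization token and the repository's PyPI-style endpoint so a build step can point pip at your curated repository; the domain, account ID, and repository names are hypothetical.

```python
"""Minimal sketch: obtain a short-lived CodeArtifact token and repository endpoint
so builds pull packages only from the curated internal repository.
Domain, account ID, and repository names are hypothetical."""
import boto3

DOMAIN = "my-org"                        # hypothetical CodeArtifact domain
DOMAIN_OWNER = "111122223333"            # hypothetical account that owns the domain
REPOSITORY = "approved-python-packages"  # hypothetical curated repository

codeartifact = boto3.client("codeartifact")

token = codeartifact.get_authorization_token(
    domain=DOMAIN,
    domainOwner=DOMAIN_OWNER,
    durationSeconds=900,  # short-lived credential for the build job
)["authorizationToken"]

endpoint = codeartifact.get_repository_endpoint(
    domain=DOMAIN,
    domainOwner=DOMAIN_OWNER,
    repository=REPOSITORY,
    format="pypi",
)["repositoryEndpoint"]

# A build step could then point pip at the curated repository, for example:
#   pip config set global.index-url https://aws:<token>@<endpoint-host>/simple/
print("Repository endpoint:", endpoint)
print("Token expires in 15 minutes; avoid writing it to build logs.")
```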

๐Ÿ’ผ SEC11-BP06 Deploy software programmatically

Perform software deployments programmatically where possible. This approach reduces the likelihood that a deployment fails or an unexpected issue is introduced due to human error. **Desired outcome** The version of your workload that you test is the version that you deploy, and the deployment is performed consistently every time. You externalize the configuration of your workload, which helps you deploy to different environments without changes. You employ cryptographic signing of your software packages to verify that nothing changes between environments. **Common anti-patterns** - Manually deploying software into production. - Manually performing changes to software to cater to different environments. **Benefits of establishing this best practice** - Increased confidence in the software release process. - Reduced risk of a failed change impacting business functionality. - Increased release cadence due to lower change risk. - Automatic rollback capability for unexpected events during deployment. - Ability to cryptographically prove that the software that was tested is the software deployed. **Level of risk exposed if this best practice is not established:** High ## Implementation guidance To maintain a robust and reliable application infrastructure, implement secure and automated deployment practices. This practice involves removing persistent human access from production environments, using CI/CD tools for deployments, and externalizing environment-specific configuration data. By following this approach, you can enhance security, reduce the risk of human errors, and streamline the deployment process. You can build your AWS account structure to remove persistent human access from production environments. This practice minimizes the risk of unauthorized changes or accidental modifications, which improves the integrity of your production systems. Instead of direct human access, you can use CI/CD tools like AWS CodeBuild and AWS CodePipeline to perform deployments. You can use these services to automate the build, test, and deployment processes, which reduces manual intervention and increases consistency. To further enhance security and traceability, you can sign your application packages after they have been tested and validate these signatures during deployment. To do so, use cryptographic tools such as AWS Signer or AWS Key Management Service (AWS KMS). By signing and verifying packages, you can make sure that you deploy only authorized and validated code to your environments. Additionally, your team can architect your workload to obtain environment-specific configuration data from an external source, such as AWS Systems Manager Parameter Store. This practice separates the application code from the configuration data, which helps you manage and update configurations independently without modifying the application code itself. To streamline infrastructure provisioning and management, consider using infrastructure as code (IaC) tools like AWS CloudFormation or AWS CDK. You can use these tools to define your infrastructure as code, which improves the consistency and repeatability of deployments across different environments. Consider canary deployments to validate the successful deployment of your software. Canary deployments involve rolling out changes to a subset of instances or users before deploying to the entire production environment. You can then monitor the impact of changes and roll back if necessary, which minimizes the risk of widespread issues. 
Follow the recommendations outlined in the Organizing Your AWS Environment Using Multiple Accounts whitepaper. This whitepaper provides guidance on separating environments (such as development, staging, and production) into distinct AWS accounts, which further enhances security and isolation.

### Implementation steps

1. **Set up AWS account structure:**
   - Follow the guidance in the Organizing Your AWS Environment Using Multiple Accounts whitepaper to create separate AWS accounts for different environments (for example, development, staging, and production).
   - Configure appropriate access controls and permissions for each account to restrict direct human access to production environments.
2. **Implement a CI/CD pipeline:**
   - Set up a CI/CD pipeline using services like AWS CodeBuild and AWS CodePipeline.
   - Configure the pipeline to automatically build, test, and deploy your application code to the respective environments.
   - Integrate code repositories with the CI/CD pipeline for version control and code management.
3. **Sign and verify application packages:**
   - Use AWS Signer or AWS Key Management Service (AWS KMS) to sign your application packages after they have been tested and validated.
   - Configure the deployment process to verify the signatures of the application packages before you deploy them to the target environments.
4. **Externalize configuration data:**
   - Store environment-specific configuration data in AWS Systems Manager Parameter Store.
   - Modify your application code to retrieve configuration data from the Parameter Store during deployment or runtime (see the sketch after this list).
5. **Implement infrastructure as code (IaC):**
   - Use IaC tools like AWS CloudFormation or AWS CDK to define and manage your infrastructure as code.
   - Create CloudFormation templates or CDK scripts to provision and configure the necessary AWS resources for your application.
   - Integrate IaC with your CI/CD pipeline to automatically deploy infrastructure changes alongside application code changes.
6. **Implement canary deployments:**
   - Configure your deployment process to support canary deployments, where changes are rolled out to a subset of instances or users before you deploy them to the entire production environment.
   - Use services like AWS CodeDeploy or Amazon ECS to manage canary deployments and monitor the impact of changes.
   - Implement rollback mechanisms to revert to the previous stable version if issues are detected during the canary deployment.
7. **Monitor and audit:**
   - Set up monitoring and logging mechanisms to track deployments, application performance, and infrastructure changes.
   - Use services like Amazon CloudWatch and AWS CloudTrail to collect and analyze logs and metrics.
   - Implement auditing and compliance checks to verify adherence to security best practices and regulatory requirements.
8. **Continuously improve:**
   - Regularly review and update your deployment practices, and incorporate feedback and lessons learned from previous deployments.
   - Automate as much of the deployment process as possible to reduce manual intervention and potential human errors.
   - Collaborate with cross-functional teams (for example, operations or security) to align and continuously improve deployment practices.

By following these steps, you can implement secure and automated deployment practices in your AWS environment, which enhances security, reduces the risk of human errors, and streamlines the deployment process.
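For step 4, externalized configuration can be read at startup rather than baked into the package. The following boto3 sketch retrieves environment-specific values from AWS Systems Manager Parameter Store so that the same signed artifact can be deployed unchanged to every environment; the application name and parameter hierarchy are hypothetical.

```python
"""Minimal sketch: read environment-specific configuration from Parameter Store at
startup so the same artifact deploys to every environment. Names are hypothetical."""
import os

import boto3

# The environment name is injected by the deployment pipeline, not hardcoded.
ENVIRONMENT = os.environ.get("APP_ENVIRONMENT", "staging")
PARAMETER_PREFIX = f"/example-app/{ENVIRONMENT}"  # hypothetical parameter hierarchy

ssm = boto3.client("ssm")


def get_config(name: str) -> str:
    """Fetch a single parameter, decrypting SecureString values."""
    response = ssm.get_parameter(
        Name=f"{PARAMETER_PREFIX}/{name}",
        WithDecryption=True,
    )
    return response["Parameter"]["Value"]


database_endpoint = get_config("database-endpoint")
feature_flags = get_config("feature-flags")
print(f"Loaded configuration for {ENVIRONMENT}")
```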

๐Ÿ’ผ SEC11-BP07 Regularly assess security properties of the pipelines

Apply the principles of the Well-Architected Security Pillar to your pipelines, with particular attention to the separation of permissions. Regularly assess the security properties of your pipeline infrastructure. Effectively managing the security of the pipelines allows you to deliver the security of the software that passes through them.

**Desired outcome:** The pipelines you use to build and deploy your software follow the same recommended practices as any other workload in your environment. The tests that you implement in your pipelines are not editable by the teams who use them. You give the pipelines only the permissions needed for the deployments they are doing, using temporary credentials. You implement safeguards to prevent pipelines from deploying to the wrong environments. You configure your pipelines to emit state so that the integrity of your build environments can be validated.

**Common anti-patterns:**

- Security tests that can be bypassed by builders.
- Overly broad permissions for deployment pipelines.
- Pipelines not being configured to validate inputs.
- Not regularly reviewing the permissions associated with your CI/CD infrastructure.
- Use of long-term or hardcoded credentials.

**Benefits of establishing this best practice:**

- Greater confidence in the integrity of the software that is built and deployed through the pipelines.
- Ability to stop a deployment when there is suspicious activity.

**Level of risk exposed if this best practice is not established:** High

## Implementation guidance

Your deployment pipelines are a critical component of your software development lifecycle and should follow the same security principles and practices as any other workload in your environment. This includes implementing proper access controls, validating inputs, and regularly reviewing and auditing the permissions associated with your CI/CD infrastructure.

Verify that the teams responsible for building and deploying applications do not have the ability to edit or bypass the security tests and checks implemented in your pipelines. This separation of concerns helps maintain the integrity of your build and deployment processes.

As a starting point, consider employing the AWS Deployment Pipelines Reference Architecture. This reference architecture provides a secure and scalable foundation for building your CI/CD pipelines on AWS. Additionally, you can use AWS Identity and Access Management (IAM) Access Analyzer to generate least-privilege IAM policies for your pipeline permissions, and as a step in your pipeline to verify workload permissions. This helps verify that your pipelines and workloads have only the permissions required for their specific functions, which reduces the risk of unauthorized access or actions.

### Implementation steps

1. Start with the AWS Deployment Pipelines Reference Architecture.
2. Consider using IAM Access Analyzer to programmatically generate least-privilege IAM policies for the pipelines.
3. Integrate your pipelines with monitoring and alerting so that you are notified of unexpected or abnormal activity. For AWS managed services, Amazon EventBridge allows you to route events to targets such as AWS Lambda or Amazon Simple Notification Service (Amazon SNS). A minimal EventBridge sketch follows these steps.
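To illustrate step 3, here is a minimal sketch, assuming a CodePipeline-based pipeline and a hypothetical SNS topic ARN. It creates an Amazon EventBridge rule that notifies security personnel when any pipeline execution fails; the SNS topic's access policy must also allow `events.amazonaws.com` to publish to it.

```python
# Minimal sketch (assumption: TOPIC_ARN refers to an existing SNS topic whose
# access policy allows events.amazonaws.com to publish). Creates an EventBridge
# rule that fires when a CodePipeline execution fails and routes it to SNS.
import json
import boto3

TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:pipeline-alerts"  # hypothetical

events = boto3.client("events")

events.put_rule(
    Name="pipeline-execution-failed",
    State="ENABLED",
    EventPattern=json.dumps({
        "source": ["aws.codepipeline"],
        "detail-type": ["CodePipeline Pipeline Execution State Change"],
        "detail": {"state": ["FAILED"]},
    }),
)

events.put_targets(
    Rule="pipeline-execution-failed",
    Targets=[{"Id": "notify-security", "Arn": TOPIC_ARN}],
)
```

The same pattern can route events to AWS Lambda instead of SNS, so that an automated response can pause further deployments while suspicious activity is investigated.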

๐Ÿ’ผ SEC11-BP08 Build a program that embeds security ownership in workload teams

Build a program or mechanism that empowers builder teams to make security decisions about the software that they create. Your security team still needs to validate these decisions during a review, but embedding security ownership in builder teams allows faster, more secure workloads to be built. This mechanism also promotes a culture of ownership that positively impacts the operation of the systems you build.

**Desired outcome:** You have embedded security ownership and decision-making in your teams. You have either trained your teams on how to think about security or augmented them with embedded or associated security people. As a result, your teams make higher-quality security decisions earlier in the development cycle.

**Common anti-patterns:**

- Leaving all security design decisions to a security team.
- Not addressing security requirements early enough in the development process.
- Not obtaining feedback from builders and security people on the operation of the program.

**Benefits of establishing this best practice:**

- Reduced time to complete security reviews.
- Reduction in security issues that are only detected at the security review stage.
- Improvement in the overall quality of the software being written.
- Opportunity to identify and understand systemic issues or areas of high-value improvement.
- Reduction in the amount of rework required due to security review findings.
- Improvement in the perception of the security function.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Start with the guidance in SEC11-BP01 Train for application security. Then identify the operational model for the program that you think might work best for your organization. The two main patterns are to train builders or to embed security people in builder teams. After you have decided on the initial approach, pilot with a single workload team or a small group of workload teams to prove that the model works for your organization. Leadership support from both the builder and security parts of the organization helps with the delivery and success of the program. As you build this program, it's important to choose metrics that can be used to show its value. Learning from how AWS has approached this problem is a good learning experience. This best practice is very much focused on organizational change and culture. The tools that you use should support collaboration between the builder and security communities.

### Implementation steps

1. Start by training your builders for application security.
2. Create a community and an onboarding program to educate builders.
3. Pick a name for the program. Guardians, Champions, or Advocates are commonly used.
4. Identify the model to use: train builders, embed security engineers, or have affinity security roles.
5. Identify project sponsors from security, builders, and potentially other relevant groups.
6. Track metrics for the number of people involved in the program, the time taken for reviews, and the feedback from builders and security people. Use these metrics to make improvements.

๐Ÿ’ผ Secure Access

Policies for identifying resources that do not follow best practices for private data access.

๐Ÿ’ผ Select the best pricing model

**Perform workload cost modeling:** Consider the requirements of the workload components and understand the potential pricing models. Define the availability requirement of each component. Determine whether there are multiple independent resources that perform the function in the workload, and what the workload requirements are over time. Compare the cost of the resources using the default On-Demand pricing model and other applicable models. Factor in any potential changes in resources or workload components.

**Perform regular account-level analysis:** Performing regular cost modeling ensures that opportunities to optimize across multiple workloads can be implemented. For example, if multiple workloads use On-Demand, at an aggregate level the risk of change is lower, and implementing a commitment-based discount will achieve a lower overall cost. It is recommended to perform this analysis in regular cycles of two weeks to one month. This analysis allows you to make small adjustment purchases, so the coverage of your pricing models continues to evolve with your changing workloads and their components. Use the AWS Cost Explorer recommendations tool to find opportunities for commitment discounts. To find opportunities for Spot workloads, use an hourly view of your overall usage and look for regular periods of changing usage or elasticity.

**Pricing models:** AWS has multiple pricing models that allow you to pay for your resources in the most cost-effective way that suits your organization's needs. The following sections describe each purchasing model:

- On-Demand Instances
- Spot Instances
- Commitment discounts – Savings Plans
- Commitment discounts – Reserved Instances/Capacity
- Geographic selection
- Third-party agreements and pricing

**On-Demand Instances:** This is the default, pay-as-you-go pricing model. When you use resources (for example, EC2 instances or services such as DynamoDB on demand), you pay a flat rate with no long-term commitments. You can increase or decrease the capacity of your resources or services based on the demands of your application. On-Demand has an hourly rate but, depending on the service, can be billed in increments of one second (for example, Amazon RDS or Linux EC2 instances). On-Demand is recommended for short-term workloads (for example, a four-month project), workloads that spike periodically, or unpredictable workloads that can't be interrupted. On-Demand is also suitable for workloads, such as pre-production environments, that require uninterrupted runtimes but do not run long enough to benefit from a commitment discount (Savings Plans or Reserved Instances).

**Spot Instances:** A Spot Instance is spare Amazon EC2 compute capacity available at discounts of up to 90% off On-Demand prices, with no long-term commitment required. With Spot Instances, you can significantly reduce the cost of running your applications or scale your application's compute capacity for the same budget. Unlike On-Demand, Spot Instances can be interrupted with a two-minute warning if Amazon EC2 needs the capacity back or the Spot Instance price exceeds your configured maximum price. On average, Spot Instances are interrupted less than 5% of the time. Spot Instances are ideal when there is a queue or buffer in place, or where multiple resources work independently to process requests (for example, Hadoop data processing).
Typically these workloads are fault-tolerant, stateless, and flexible, such as batch processing, big data and analytics, containerized environments, and high-performance computing (HPC). Non-critical workloads such as test and development environments are also candidates for Spot. Spot Instances are also integrated into multiple AWS services, such as Amazon EC2 Auto Scaling groups, Amazon EMR, Amazon Elastic Container Service (Amazon ECS), and AWS Batch.

When a Spot Instance needs to be reclaimed, Amazon EC2 sends a two-minute warning via a Spot Instance interruption notice delivered through CloudWatch Events, as well as in the instance metadata. During that two-minute period, your application can use the time to save its state, drain running containers, upload final log files, or remove itself from a load balancer. At the end of the two minutes, you have the option to hibernate, stop, or terminate the Spot Instance. Consider the following best practices when adopting Spot Instances in your workloads (a minimal interruption-handling sketch follows this list):

- **Be flexible across as many instance types as possible:** Be flexible in both the family and size of the instance type to improve the likelihood of fulfilling your target capacity requirements, obtain the lowest possible cost, and minimize the impact of interruptions.
- **Be flexible about where your workload will run:** Available capacity can vary by Availability Zone. Running across multiple Availability Zones improves the likelihood of fulfilling your target capacity by tapping into multiple spare capacity pools, and provides the lowest possible cost.
- **Design for continuity:** Design your workloads for statelessness and fault tolerance, so that if some of your EC2 capacity is interrupted, it does not impact the availability or performance of the workload.
- Use Spot Instances in combination with On-Demand and Savings Plans/Reserved Instances to balance workload cost optimization with performance.
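The following is a minimal sketch of the interruption handling described above, assuming it runs on a Spot Instance with IMDSv2 enabled; the drain logic is a placeholder. Services such as EC2 Auto Scaling and Amazon ECS can react to interruption notices for you, so hand-rolled polling like this is typically only needed for custom workloads.

```python
# Minimal sketch (not a production handler): poll the EC2 instance metadata
# service (IMDSv2) for a Spot interruption notice and begin draining when one
# appears. Only works when run on an EC2 Spot Instance.
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token(ttl=21600):
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )
    return urllib.request.urlopen(req, timeout=2).read().decode()

def interruption_notice(token):
    """Return the interruption notice, or None if no interruption is scheduled."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return json.loads(urllib.request.urlopen(req, timeout=2).read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:          # 404 means no interruption has been scheduled
            return None
        raise

def drain():
    # Placeholder: save state, deregister from the load balancer, stop taking work.
    print("Interruption scheduled; draining...")

if __name__ == "__main__":
    token = imds_token()
    while True:
        notice = interruption_notice(token)
        if notice:                   # e.g. {"action": "terminate", "time": "..."}
            drain()
            break
        time.sleep(5)                # poll every few seconds within the 2-minute window
```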
**Commitment discounts – Savings Plans:** AWS provides a number of ways for you to reduce your costs by reserving or committing to use a certain amount of resources and receiving a discounted rate for them. A Savings Plan allows you to make an hourly spend commitment for one or three years and receive discounted pricing across your resources. Savings Plans provide discounts for AWS compute services such as Amazon EC2, AWS Fargate, and AWS Lambda. When you make the commitment, you pay that commitment amount every hour, and it is subtracted from your On-Demand usage at the discounted rate. For example, suppose you commit to $50 an hour, you have $150 an hour of On-Demand usage, and the Savings Plans discount rate for your specific usage is 50%. Your $50 commitment then covers $100 of On-Demand usage, so you pay $50 (the commitment) plus $50 for the remaining On-Demand usage (a small worked sketch of this arithmetic appears below).

Compute Savings Plans are the most flexible and provide a discount of up to 66%. They automatically apply across Availability Zones, instance sizes, instance families, operating systems, tenancy, Regions, and compute services. EC2 Instance Savings Plans have less flexibility but provide a higher discount rate (up to 72%). They automatically apply across Availability Zones, instance sizes, operating systems, and tenancy. There are three payment options:

- **No upfront payment:** There is no upfront payment; you pay a reduced hourly rate each month for the total hours in the month.
- **Partial upfront payment:** Provides a higher discount rate than No upfront. Part of the usage is paid up front; you then pay a smaller reduced hourly rate each month for the total hours in the month.
- **All upfront payment:** Usage for the entire period is paid up front, and no other costs are incurred for the remainder of the term for usage that is covered by the commitment.

You can apply any combination of these three payment options across your workloads. Savings Plans apply first to the usage in the account in which they are purchased, from the highest discount percentage to the lowest, and then to the consolidated usage across all other accounts, again from the highest discount percentage to the lowest. It is recommended to purchase all Savings Plans in an account with no usage or resources, such as the management account. This ensures that the Savings Plans apply to the highest discount rates across all of your usage, maximizing the discount amount.

Workloads and usage typically change over time. It is recommended to continually purchase small amounts of Savings Plans commitment over time. This ensures that you maintain high levels of coverage to maximize your discounts, and that your plans closely match your workload and organization requirements at all times. Do not set a target coverage for your accounts, due to the variability of the discount that is possible. Low coverage does not necessarily indicate high potential savings: you may have low coverage in an account, but if your usage is made up of small instances with a licensed operating system, the potential saving could be as low as a few percent. Instead, track and monitor the potential savings shown in the Savings Plans recommendation tool. Frequently review the Savings Plans recommendations in Cost Explorer (perform regular analysis) and continue to purchase commitments until the estimated savings fall below the discount threshold required by your organization. For example, if you require potential savings to remain below 20%, make an additional purchase whenever they rise above that threshold. Monitor utilization and coverage, but only to detect changes. Do not aim for a specific utilization or coverage percentage, as these do not necessarily scale with savings. Ensure that a purchase of Savings Plans results in an increase in coverage, and if coverage or utilization decreases, ensure the decrease is quantified and understood. For example, you might migrate a workload resource to a newer instance type, which reduces utilization of an existing plan, but the performance benefit outweighs the reduced saving.
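The following is a small worked sketch of the commitment arithmetic in the example above ($50 per hour committed against $150 per hour of On-Demand usage at a 50% discount rate). The actual discount rate varies by instance family, Region, operating system, term, and payment option, so treat this purely as an illustration of how a commitment converts into covered On-Demand usage.

```python
def savings_plan_hourly_cost(commitment, on_demand_usage, discount_rate):
    """Estimate the effective hourly cost with a Savings Plans commitment.

    commitment:       hourly Savings Plans commitment, in dollars
    on_demand_usage:  hourly On-Demand-equivalent usage, in dollars
    discount_rate:    fractional Savings Plans discount for this usage (e.g. 0.5)
    """
    # Each committed dollar covers 1 / (1 - discount_rate) dollars of On-Demand usage.
    covered_on_demand = min(on_demand_usage, commitment / (1.0 - discount_rate))
    remaining_on_demand = on_demand_usage - covered_on_demand
    # You always pay the commitment, plus any usage not covered by it.
    return commitment + remaining_on_demand

# Worked example from the text: $50/hour commitment, $150/hour of usage, 50% discount.
# The $50 commitment covers $100 of On-Demand usage, leaving $50 billed On-Demand.
print(savings_plan_hourly_cost(50, 150, 0.5))  # -> 100.0, versus $150 purely On-Demand
```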
**Commitment discounts – Reserved Instances/Capacity:** Similar to Savings Plans, Reserved Instances (RIs) offer discounts of up to 72% in exchange for a commitment to run a minimum amount of resources. Reserved Instances are available for Amazon RDS, Amazon OpenSearch Service, Amazon ElastiCache, Amazon Redshift, and DynamoDB. Amazon CloudFront and AWS Elemental MediaConvert also provide discounts when you make minimum usage commitments. Reserved Instances are currently available for Amazon EC2; however, Savings Plans offer the same discount levels with increased flexibility and no management overhead. Reserved Instances offer the same payment options of no upfront, partial upfront, and all upfront, and the same terms of one or three years. Reserved Instances can be purchased in a Region or in a specific Availability Zone; they provide a capacity reservation when purchased in an Availability Zone. Amazon EC2 also features Convertible RIs; however, Savings Plans should be used for EC2 instances due to their increased flexibility and reduced operational cost. Use the same process and metrics to track and make purchases of Reserved Instances. It is recommended not to track RI coverage across your accounts, and not to monitor or track utilization percentage. Instead, view the utilization report in Cost Explorer and use the Net Savings column in the table. If net savings is a significantly negative amount, take action to remediate the unused RIs.

**EC2 Fleet:** EC2 Fleet is a feature that allows you to define a target compute capacity and then specify the instance types and the balance of On-Demand and Spot Instances for the fleet. EC2 Fleet automatically launches the lowest-price combination of resources to meet the defined capacity.

**Geographic selection:** When you architect your solutions, a best practice is to place computing resources closer to users to provide lower latency and strong data sovereignty. For global audiences, you should use multiple locations to meet these needs. You should also select the geographic location that minimizes your costs. The AWS Cloud infrastructure is built around Regions and Availability Zones. A Region is a physical location in the world with multiple Availability Zones. Availability Zones consist of one or more discrete data centers, each with redundant power, networking, and connectivity, housed in separate facilities. Each AWS Region operates within local market conditions, and resource pricing differs in each Region. Choose a specific Region to operate a component of, or your entire, solution so that you can run at the lowest possible price globally. You can use the AWS Simple Monthly Calculator to estimate the costs of your workload in various Regions.

**Third-party agreements and pricing:** When you use third-party solutions or services in the cloud, it is important that the pricing structures are aligned to cost optimization outcomes. Pricing should scale with the outcomes and value it provides. An example of this is software that charges a percentage of the savings it delivers: the more you save (the outcome), the more it charges. Agreements that scale with your bill are typically not aligned to cost optimization, unless they provide outcomes for every part of your bill. For example, a solution that provides recommendations only for Amazon EC2 but charges a percentage of your entire bill becomes more expensive if you use other services for which it provides no benefit. Another example is a managed service that is charged as a percentage of the cost of the resources it manages; a larger instance size may not require more management effort, but is charged more. Ensure that these service pricing arrangements include a cost optimization program or features in their service to drive efficiency.

๐Ÿ’ผ Select the correct resource type, size, and number

By selecting the best resource type, size, and number of resources, you meet the technical requirements at the lowest cost. Right-sizing activities take into account all of the resources of a workload, all of the attributes of each individual resource, and the effort involved in the right-sizing operation. Right-sizing can be an iterative process, initiated by changes in usage patterns and external factors such as AWS price drops or new AWS resource types. Right-sizing can also be a one-off activity if the cost of the right-sizing effort outweighs the potential savings over the life of the workload.

๐Ÿ’ผ SI-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] system and information integrity policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and information integrity policy and the associated system and information integrity controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and information integrity policy and procedures; and c. Review and update the current system and information integrity: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ SI-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and information integrity policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and information integrity policy and the associated system and information integrity controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and information integrity policy and procedures; and c. Review and update the current system and information integrity: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SI-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and information integrity policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and information integrity policy and the associated system and information integrity controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and information integrity policy and procedures; and c. Review and update the current system and information integrity: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SI-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] system and information integrity policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the system and information integrity policy and the associated system and information integrity controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the system and information integrity policy and procedures; and c. Review and update the current system and information integrity: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SI-1 SYSTEM AND INFORMATION INTEGRITY POLICY AND PROCEDURES

The organization: SI-1a. Develops, documents, and disseminates to [Assignment: organization-defined personnel or roles]: SI-1a.1. A system and information integrity policy that addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and SI-1a.2. Procedures to facilitate the implementation of the system and information integrity policy and associated system and information integrity controls; and SI-1b. Reviews and updates the current: SI-1b.1. System and information integrity policy [Assignment: organization-defined frequency]; and SI-1b.2. System and information integrity procedures [Assignment: organization-defined frequency].

๐Ÿ’ผ SI-10 (1) MANUAL OVERRIDE CAPABILITY

The information system: SI-10 (1)(a) Provides a manual override capability for input validation of [Assignment: organization-defined inputs]; SI-10 (1)(b) Restricts the use of the manual override capability to only [Assignment: organization-defined authorized individuals]; and SI-10 (1)(c) Audits the use of the manual override capability.

๐Ÿ’ผ SI-10 Information Input Validation (M)(H)

Check the validity of the following information inputs: [Assignment: organization-defined information inputs to the system]. **SI-10 Additional FedRAMP Requirements and Guidance:** **Requirement**: Validate all information inputs and document any exceptions.

๐Ÿ’ผ SI-10 Information Input Validation (M)(H)

Check the validity of the following information inputs: [Assignment: organization-defined information inputs to the system]. **SI-10 Additional FedRAMP Requirements and Guidance:** **Requirement**: Validate all information inputs and document any exceptions.

๐Ÿ’ผ SI-10(1) Information Input Validation | Manual Override Capability

(a) Provide a manual override capability for input validation of the following information inputs: [Assignment: organization-defined inputs defined in the base control (SI-10)]; (b) Restrict the use of the manual override capability to only [Assignment: organization-defined authorized individuals]; and (c) Audit the use of the manual override capability.

๐Ÿ’ผ SI-11 Error Handling

a. Generate error messages that provide information necessary for corrective actions without revealing information that could be exploited; and b. Reveal error messages only to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SI-11 ERROR HANDLING

The information system: SI-11a. Generates error messages that provide information necessary for corrective actions without revealing information that could be exploited by adversaries; and SI-11b. Reveals error messages only to [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SI-11 Error Handling (M)(H)

a. Generate error messages that provide information necessary for corrective actions without revealing information that could be exploited; and b. Reveal error messages only to [FedRAMP Assignment: to include the ISSO and/or similar role within the organization].

๐Ÿ’ผ SI-11 Error Handling (M)(H)

a. Generate error messages that provide information necessary for corrective actions without revealing information that could be exploited; and b. Reveal error messages only to [FedRAMP Assignment: to include the ISSO and/or similar role within the organization].

๐Ÿ’ผ SI-12 INFORMATION HANDLING AND RETENTION

The organization handles and retains information within the information system and information output from the system in accordance with applicable federal laws, Executive Orders, directives, policies, regulations, standards, and operational requirements.

๐Ÿ’ผ SI-12 Information Management and Retention

Manage and retain information within the system and information output from the system in accordance with applicable laws, executive orders, directives, regulations, policies, standards, guidelines and operational requirements.

๐Ÿ’ผ SI-13 (3) MANUAL TRANSFER BETWEEN COMPONENTS

The organization manually initiates transfers between active and standby information system components [Assignment: organization-defined frequency] if the mean time to failure exceeds [Assignment: organization-defined time period].

๐Ÿ’ผ SI-13 (4) STANDBY COMPONENT INSTALLATION | NOTIFICATION

The organization, if information system component failures are detected: SI-13 (4)(a) Ensures that the standby components are successfully and transparently installed within [Assignment: organization-defined time period]; and SI-13 (4)(b) [Selection (one or more): activates [Assignment: organization-defined alarm]; automatically shuts down the information system].

๐Ÿ’ผ SI-13 Predictable Failure Prevention

a. Determine mean time to failure (MTTF) for the following system components in specific environments of operation: [Assignment: organization-defined system components]; and b. Provide substitute system components and a means to exchange active and standby components in accordance with the following criteria: [Assignment: organization-defined MTTF substitution criteria].

๐Ÿ’ผ SI-13 PREDICTABLE FAILURE PREVENTION

The organization: SI-13a. Determines mean time to failure (MTTF) for [Assignment: organization-defined information system components] in specific environments of operation; and SI-13b. Provides substitute information system components and a means to exchange active and standby components at [Assignment: organization-defined MTTF substitution criteria].

๐Ÿ’ผ SI-14 Non-persistence

Implement non-persistent [Assignment: organization-defined system components and services] that are initiated in a known state and terminated [Selection (one or more): upon end of session of use; periodically at [Assignment: organization-defined frequency]].

๐Ÿ’ผ SI-14 NON-PERSISTENCE

The organization implements non-persistent [Assignment: organization-defined information system components and services] that are initiated in a known state and terminated [Selection (one or more): upon end of session of use; periodically at [Assignment: organization-defined frequency]].

๐Ÿ’ผ SI-15 Information Output Filtering

Validate information output from the following software programs and/or applications to ensure that the information is consistent with the expected content: [Assignment: organization-defined software programs and/or applications].

๐Ÿ’ผ SI-15 INFORMATION OUTPUT FILTERING

The information system validates information output from [Assignment: organization-defined software programs and/or applications] to ensure that the information is consistent with the expected content.

๐Ÿ’ผ SI-16 Memory Protection

Implement the following controls to protect the system memory from unauthorized code execution: [Assignment: organization-defined controls].

๐Ÿ’ผ SI-16 MEMORY PROTECTION

The information system implements [Assignment: organization-defined security safeguards] to protect its memory from unauthorized code execution.

๐Ÿ’ผ SI-17 Fail-safe Procedures

Implement the indicated fail-safe procedures when the indicated failures occur: [Assignment: organization-defined list of failure conditions and associated fail-safe procedures].

๐Ÿ’ผ SI-17 FAIL-SAFE PROCEDURES

The information system implements [Assignment: organization-defined fail-safe procedures] when [Assignment: organization-defined failure conditions occur].

๐Ÿ’ผ SI-19 De-identification

a. Remove the following elements of personally identifiable information from datasets: [Assignment: organization-defined elements of personally identifiable information]; and b. Evaluate [Assignment: organization-defined frequency] for effectiveness of de-identification.

๐Ÿ’ผ SI-2 Flaw Remediation

a. Identify, report, and correct system flaws; b. Test software and firmware updates related to flaw remediation for effectiveness and potential side effects before installation; c. Install security-relevant software and firmware updates within [Assignment: organization-defined time period] of the release of the updates; and d. Incorporate flaw remediation into the organizational configuration management process.

๐Ÿ’ผ SI-2 FLAW REMEDIATION

The organization: SI-2a. Identifies, reports, and corrects information system flaws; SI-2b. Tests software and firmware updates related to flaw remediation for effectiveness and potential side effects before installation; SI-2c. Installs security-relevant software and firmware updates within [Assignment: organization-defined time period] of the release of the updates; and SI-2d. Incorporates flaw remediation into the organizational configuration management process.

๐Ÿ’ผ SI-2 Flaw Remediation (L)(M)(H)

a. Identify, report, and correct system flaws; b. Test software and firmware updates related to flaw remediation for effectiveness and potential side effects before installation; c. Install security-relevant software and firmware updates within [FedRAMP Assignment: within thirty (30) days of release of updates] of the release of the updates; and d. Incorporate flaw remediation into the organizational configuration management process.

๐Ÿ’ผ SI-2 Flaw Remediation (L)(M)(H)

a. Identify, report, and correct system flaws; b. Test software and firmware updates related to flaw remediation for effectiveness and potential side effects before installation; c. Install security-relevant software and firmware updates within [FedRAMP Assignment: within thirty (30) days of release of updates] of the release of the updates; and d. Incorporate flaw remediation into the organizational configuration management process.

๐Ÿ’ผ SI-2 Flaw Remediation (L)(M)(H)

a. Identify, report, and correct system flaws; b. Test software and firmware updates related to flaw remediation for effectiveness and potential side effects before installation; c. Install security-relevant software and firmware updates within [FedRAMP Assignment: within thirty (30) days of release of updates] of the release of the updates; and d. Incorporate flaw remediation into the organizational configuration management process.

๐Ÿ’ผ SI-20 Tainting

Embed data or capabilities in the following systems or system components to determine if organizational data has been exfiltrated or improperly removed from the organization: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SI-21 Information Refresh

Refresh [Assignment: organization-defined information] at [Assignment: organization-defined frequencies] or generate the information on demand and delete the information when no longer needed.

๐Ÿ’ผ SI-22 Information Diversity

a. Identify the following alternative sources of information for [Assignment: organization-defined essential functions and services]: [Assignment: organization-defined alternative information sources]; and b. Use an alternative information source for the execution of essential functions or services on [Assignment: organization-defined systems or system components] when the primary source of information is corrupted or unavailable.

๐Ÿ’ผ SI-23 Information Fragmentation

Based on [Assignment: organization-defined circumstances]: a. Fragment the following information: [Assignment: organization-defined information]; and b. Distribute the fragmented information across the following systems or system components: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SI-3 (10) MALICIOUS CODE ANALYSIS

The organization: SI-3 (10)(a) Employs [Assignment: organization-defined tools and techniques] to analyze the characteristics and behavior of malicious code; and SI-3 (10)(b) Incorporates the results from malicious code analysis into organizational incident response and flaw remediation processes.

๐Ÿ’ผ SI-3 (6) TESTING | VERIFICATION

The organization: SI-3 (6)(a) Tests malicious code protection mechanisms [Assignment: organization-defined frequency] by introducing a known benign, non-spreading test case into the information system; and SI-3 (6)(b) Verifies that both detection of the test case and associated incident reporting occur.

๐Ÿ’ผ SI-3 (8) DETECT UNAUTHORIZED COMMANDS

The information system detects [Assignment: organization-defined unauthorized operating system commands] through the kernel application programming interface at [Assignment: organization-defined information system hardware components] and [Selection (one or more): issues a warning; audits the command execution; prevents the execution of the command].

๐Ÿ’ผ SI-3 Malicious Code Protection

a. Implement [Selection (one or more): signature based; non-signature based] malicious code protection mechanisms at system entry and exit points to detect and eradicate malicious code; b. Automatically update malicious code protection mechanisms as new releases are available in accordance with organizational configuration management policy and procedures; c. Configure malicious code protection mechanisms to: 1. Perform periodic scans of the system [Assignment: organization-defined frequency] and real-time scans of files from external sources at [Selection (one or more): endpoint; network entry and exit points] as the files are downloaded, opened, or executed in accordance with organizational policy; and 2. [Selection (one or more): block malicious code; quarantine malicious code; take [Assignment: organization-defined action]]; and send alert to [Assignment: organization-defined personnel or roles] in response to malicious code detection; and d. Address the receipt of false positives during malicious code detection and eradication and the resulting potential impact on the availability of the system.

๐Ÿ’ผ SI-3 MALICIOUS CODE PROTECTION

The organization: SI-3a. Employs malicious code protection mechanisms at information system entry and exit points to detect and eradicate malicious code; SI-3b. Updates malicious code protection mechanisms whenever new releases are available in accordance with organizational configuration management policy and procedures; SI-3c. Configures malicious code protection mechanisms to: SI-3c.1. Perform periodic scans of the information system [Assignment: organization-defined frequency] and real-time scans of files from external sources at [Selection (one or more); endpoint; network entry/exit points] as the files are downloaded, opened, or executed in accordance with organizational security policy; and SI-3c.2. [Selection (one or more): block malicious code; quarantine malicious code; send alert to administrator; [Assignment: organization-defined action]] in response to malicious code detection; and SI-3d. Addresses the receipt of false positives during malicious code detection and eradication and the resulting potential impact on the availability of the information system.

๐Ÿ’ผ SI-3 Malicious Code Protection (L)(M)(H)

a. Implement [FedRAMP Assignment: signature based and non-signature based] malicious code protection mechanisms at system entry and exit points to detect and eradicate malicious code; b. Automatically update malicious code protection mechanisms as new releases are available in accordance with organizational configuration management policy and procedures; c. Configure malicious code protection mechanisms to: 1. Perform periodic scans of the system [FedRAMP Assignment: at least weekly] and real-time scans of files from external sources at [FedRAMP Assignment: to include endpoints and network entry and exit points] as the files are downloaded, opened, or executed in accordance with organizational policy; and 2. [FedRAMP Assignment: to include blocking and quarantining malicious code] and send alert to [FedRAMP Assignment; administrator or defined security personnel near-realtime] in response to malicious code detection; and d. Address the receipt of false positives during malicious code detection and eradication and the resulting potential impact on the availability of the system.

๐Ÿ’ผ SI-3 Malicious Code Protection (L)(M)(H)

a. Implement [FedRAMP Assignment: signature based and non-signature based] malicious code protection mechanisms at system entry and exit points to detect and eradicate malicious code; b. Automatically update malicious code protection mechanisms as new releases are available in accordance with organizational configuration management policy and procedures; c. Configure malicious code protection mechanisms to: 1. Perform periodic scans of the system [FedRAMP Assignment: at least weekly] and real-time scans of files from external sources at [FedRAMP Assignment: to include endpoints and network entry and exit points] as the files are downloaded, opened, or executed in accordance with organizational policy; and 2. [FedRAMP Assignment: to include blocking and quarantining malicious code] and send alert to [FedRAMP Assignment; administrator or defined security personnel near-realtime] in response to malicious code detection; and d. Address the receipt of false positives during malicious code detection and eradication and the resulting potential impact on the availability of the system.

๐Ÿ’ผ SI-3 Malicious Code Protection (L)(M)(H)

a. Implement [FedRAMP Assignment: signature based and non-signature based] malicious code protection mechanisms at system entry and exit points to detect and eradicate malicious code; b. Automatically update malicious code protection mechanisms as new releases are available in accordance with organizational configuration management policy and procedures; c. Configure malicious code protection mechanisms to: 1. Perform periodic scans of the system [FedRAMP Assignment: at least weekly] and real-time scans of files from external sources at [FedRAMP Assignment: to include endpoints and network entry and exit points] as the files are downloaded, opened, or executed in accordance with organizational policy; and 2. [FedRAMP Assignment: to include blocking and quarantining malicious code] and send alert to [FedRAMP Assignment; administrator or defined security personnel near-realtime] in response to malicious code detection; and d. Address the receipt of false positives during malicious code detection and eradication and the resulting potential impact on the availability of the system.

๐Ÿ’ผ SI-3(10) Malicious Code Protection | Malicious Code Analysis

(a) Employ the following tools and techniques to analyze the characteristics and behavior of malicious code: [Assignment: organization-defined tools and techniques]; and (b) Incorporate the results from malicious code analysis into organizational incident response and flaw remediation processes.

๐Ÿ’ผ SI-3(8) Malicious Code Protection | Detect Unauthorized Commands

(a) Detect the following unauthorized operating system commands through the kernel application programming interface on [Assignment: organization-defined system hardware components]: [Assignment: organization-defined unauthorized operating system commands]; and (b) [Selection (one or more): issue a warning; audit the command execution; prevent the execution of the command].

๐Ÿ’ผ SI-4 (11) ANALYZE COMMUNICATIONS TRAFFIC ANOMALIES

The organization analyzes outbound communications traffic at the external boundary of the information system and selected [Assignment: organization-defined interior points within the system (e.g., subnetworks, subsystems)] to discover anomalies.

๐Ÿ’ผ SI-4 (12) AUTOMATED ALERTS

The organization employs automated mechanisms to alert security personnel of the following inappropriate or unusual activities with security implications: [Assignment: organization-defined activities that trigger alerts].

๐Ÿ’ผ SI-4 (13) ANALYZE TRAFFIC | EVENT PATTERNS

The organization: SI-4 (13)(a) Analyzes communications traffic/event patterns for the information system; SI-4 (13)(b) Develops profiles representing common traffic patterns and/or events; and SI-4 (13)(c) Uses the traffic/event profiles in tuning system-monitoring devices to reduce the number of false positives and the number of false negatives.

๐Ÿ’ผ SI-4 (18) ANALYZE TRAFFIC | COVERT EXFILTRATION

The organization analyzes outbound communications traffic at the external boundary of the information system (i.e., system perimeter) and at [Assignment: organization-defined interior points within the system (e.g., subsystems, subnetworks)] to detect covert exfiltration of information.

๐Ÿ’ผ SI-4 (19) INDIVIDUALS POSING GREATER RISK

The organization implements [Assignment: organization-defined additional monitoring] of individuals who have been identified by [Assignment: organization-defined sources] as posing an increased level of risk.

๐Ÿ’ผ SI-4 (22) UNAUTHORIZED NETWORK SERVICES

The information system detects network services that have not been authorized or approved by [Assignment: organization-defined authorization or approval processes] and [Selection (one or more): audits; alerts [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ SI-4 (23) HOST-BASED DEVICES

The organization implements [Assignment: organization-defined host-based monitoring mechanisms] at [Assignment: organization-defined information system components].

๐Ÿ’ผ SI-4 (3) AUTOMATED TOOL INTEGRATION

The organization employs automated tools to integrate intrusion detection tools into access control and flow control mechanisms for rapid response to attacks by enabling reconfiguration of these mechanisms in support of attack isolation and elimination.

๐Ÿ’ผ SI-4 (5) SYSTEM-GENERATED ALERTS

The information system alerts [Assignment: organization-defined personnel or roles] when the following indications of compromise or potential compromise occur: [Assignment: organization-defined compromise indicators].

๐Ÿ’ผ SI-4 (7) AUTOMATED RESPONSE TO SUSPICIOUS EVENTS

The information system notifies [Assignment: organization-defined incident response personnel (identified by name and/or by role)] of detected suspicious events and takes [Assignment: organization-defined least-disruptive actions to terminate suspicious events].

๐Ÿ’ผ SI-4 INFORMATION SYSTEM MONITORING

The organization: SI-4a. Monitors the information system to detect: SI-4a.1. Attacks and indicators of potential attacks in accordance with [Assignment: organization-defined monitoring objectives]; and SI-4a.2. Unauthorized local, network, and remote connections; SI-4b. Identifies unauthorized use of the information system through [Assignment: organization-defined techniques and methods]; SI-4c. Deploys monitoring devices: SI-4c.1. Strategically within the information system to collect organization-determined essential information; and SI-4c.2. At ad hoc locations within the system to track specific types of transactions of interest to the organization; SI-4d. Protects information obtained from intrusion-monitoring tools from unauthorized access, modification, and deletion; SI-4e. Heightens the level of information system monitoring activity whenever there is an indication of increased risk to organizational operations and assets, individuals, other organizations, or the Nation based on law enforcement information, intelligence information, or other credible sources of information; SI-4f. Obtains legal opinion with regard to information system monitoring activities in accordance with applicable federal laws, Executive Orders, directives, policies, or regulations; and SI-4g. Provides [Assignment: organization-defined information system monitoring information] to [Assignment: organization-defined personnel or roles] [Selection (one or more): as needed; [Assignment: organization-defined frequency]].

๐Ÿ’ผ SI-4 System Monitoring

a. Monitor the system to detect: 1. Attacks and indicators of potential attacks in accordance with the following monitoring objectives: [Assignment: organization-defined monitoring objectives]; and 2. Unauthorized local, network, and remote connections; b. Identify unauthorized use of the system through the following techniques and methods: [Assignment: organization-defined techniques and methods]; c. Invoke internal monitoring capabilities or deploy monitoring devices: 1. Strategically within the system to collect organization-determined essential information; and 2. At ad hoc locations within the system to track specific types of transactions of interest to the organization; d. Analyze detected events and anomalies; e. Adjust the level of system monitoring activity when there is a change in risk to organizational operations and assets, individuals, other organizations, or the Nation; f. Obtain legal opinion regarding system monitoring activities; and g. Provide [Assignment: organization-defined system monitoring information] to [Assignment: organization-defined personnel or roles] [Selection (one or more): as needed; [Assignment: organization-defined frequency]].

๐Ÿ’ผ SI-4 System Monitoring (L)(M)(H)

a. Monitor the system to detect: 1. Attacks and indicators of potential attacks in accordance with the following monitoring objectives: [Assignment: organization-defined monitoring objectives]; and 2. Unauthorized local, network, and remote connections; b. Identify unauthorized use of the system through the following techniques and methods: [Assignment: organization-defined techniques and methods]; c. Invoke internal monitoring capabilities or deploy monitoring devices: 1. Strategically within the system to collect organization-determined essential information; and 2. At ad hoc locations within the system to track specific types of transactions of interest to the organization; d. Analyze detected events and anomalies; e. Adjust the level of system monitoring activity when there is a change in risk to organizational operations and assets, individuals, other organizations, or the Nation; f. Obtain legal opinion regarding system monitoring activities; and g. Provide [Assignment: organization-defined system monitoring information] to [Assignment: organization-defined personnel or roles] [Selection (one-or-more): as needed; [Assignment: organization-defined frequency]]. **SI-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: See US-CERT Incident Response Reporting Guidelines.

๐Ÿ’ผ SI-4 System Monitoring (L)(M)(H)

a. Monitor the system to detect: 1. Attacks and indicators of potential attacks in accordance with the following monitoring objectives: [Assignment: organization-defined monitoring objectives]; and 2. Unauthorized local, network, and remote connections; b. Identify unauthorized use of the system through the following techniques and methods: [Assignment: organization-defined techniques and methods]; c. Invoke internal monitoring capabilities or deploy monitoring devices: 1. Strategically within the system to collect organization-determined essential information; and 2. At ad hoc locations within the system to track specific types of transactions of interest to the organization; d. Analyze detected events and anomalies; e. Adjust the level of system monitoring activity when there is a change in risk to organizational operations and assets, individuals, other organizations, or the Nation; f. Obtain legal opinion regarding system monitoring activities; and g. Provide [Assignment: organization-defined system monitoring information] to [Assignment: organization-defined personnel or roles] [Selection (one-or-more): as needed; [Assignment: organization-defined frequency]]. **SI-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: See US-CERT Incident Response Reporting Guidelines.

๐Ÿ’ผ SI-4 System Monitoring (L)(M)(H)

a. Monitor the system to detect: 1. Attacks and indicators of potential attacks in accordance with the following monitoring objectives: [Assignment: organization-defined monitoring objectives]; and 2. Unauthorized local, network, and remote connections; b. Identify unauthorized use of the system through the following techniques and methods: [Assignment: organization-defined techniques and methods]; c. Invoke internal monitoring capabilities or deploy monitoring devices: 1. Strategically within the system to collect organization-determined essential information; and 2. At ad hoc locations within the system to track specific types of transactions of interest to the organization; d. Analyze detected events and anomalies; e. Adjust the level of system monitoring activity when there is a change in risk to organizational operations and assets, individuals, other organizations, or the Nation; f. Obtain legal opinion regarding system monitoring activities; and g. Provide [Assignment: organization-defined system monitoring information] to [Assignment: organization-defined personnel or roles] [Selection (one-or-more): as needed; [Assignment: organization-defined frequency]]. **SI-4 Additional FedRAMP Requirements and Guidance:** **Guidance**: See US-CERT Incident Response Reporting Guidelines.

๐Ÿ’ผ SI-4(10) Visibility of Encrypted Communications (H)

Make provisions so that [Assignment: organization-defined encrypted communications traffic] is visible to [Assignment: organization-defined system monitoring tools and mechanisms]. **SI-4 (10) Additional FedRAMP Requirements and Guidance:** **Requirement**: The service provider must support Agency requirements to comply with [M-21-31](https://www.whitehouse.gov/wp-content/uploads/2021/08/M-21-31-Improving-the-Federal-Governments-Investigative-and-Remediation-Capabilities-Related-to-Cybersecurity-Incidents.pdf) and [M-22-09](https://www.whitehouse.gov/wp-content/uploads/2022/01/M-22-09.pdf).

๐Ÿ’ผ SI-4(12) Automated Organization-generated Alerts (H)

Alert [Assignment: organization-defined personnel or roles] using [Assignment: organization-defined automated mechanisms] when the following indications of inappropriate or unusual activities with security or privacy implications occur: [Assignment: organization-defined activities that trigger alerts].

๐Ÿ’ผ SI-4(19) Risk for Individuals (H)

Implement [Assignment: organization-defined additional monitoring] of individuals who have been identified by [Assignment: organization-defined sources] as posing an increased level of risk.

๐Ÿ’ผ SI-4(22) Unauthorized Network Services (H)

(a) Detect network services that have not been authorized or approved by [Assignment: organization-defined authorization or approval processes]; and (b) [Selection (one or more): Audit; Alert [Assignment: organization-defined personnel or roles]] when detected.

๐Ÿ’ผ SI-4(23) Host-based Devices (M)(H)

Implement the following host-based monitoring mechanisms at [Assignment: organization-defined system components]: [Assignment: organization-defined host-based monitoring mechanisms].

๐Ÿ’ผ SI-4(23) Host-based Devices (M)(H)

Implement the following host-based monitoring mechanisms at [Assignment: organization-defined system components]: [Assignment: organization-defined host-based monitoring mechanisms].

๐Ÿ’ผ SI-4(4) Inbound and Outbound Communications Traffic (M)(H)

(a) Determine criteria for unusual or unauthorized activities or conditions for inbound and outbound communications traffic; (b) Monitor inbound and outbound communications traffic [FedRAMP Assignment: continuously] for [Assignment: organization-defined unusual or unauthorized activities or conditions].

๐Ÿ’ผ SI-4(4) Inbound and Outbound Communications Traffic (M)(H)

(a) Determine criteria for unusual or unauthorized activities or conditions for inbound and outbound communications traffic; (b) Monitor inbound and outbound communications traffic [FedRAMP Assignment: continuously] for [Assignment: organization-defined unusual or unauthorized activities or conditions].

๐Ÿ’ผ SI-4(5) System-generated Alerts (M)(H)

Alert [Assignment: organization-defined personnel or roles] when the following system-generated indications of compromise or potential compromise occur: [Assignment: organization-defined compromise indicators]. **SI-4 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with the incident response plan.

๐Ÿ’ผ SI-4(5) System-generated Alerts (M)(H)

Alert [Assignment: organization-defined personnel or roles] when the following system-generated indications of compromise or potential compromise occur: [Assignment: organization-defined compromise indicators]. **SI-4 (5) Additional FedRAMP Requirements and Guidance:** **Guidance**: In accordance with the incident response plan.
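A common AWS pattern for system-generated alerts is a CloudWatch alarm that notifies an SNS topic monitored by incident-response personnel. This is a hedged sketch with placeholder names (the topic, alarm, and custom metric are assumptions); the actual compromise indicators remain organization-defined and should align with the incident response plan.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Placeholder SNS topic that the on-call / incident-response roles subscribe to.
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:security-alerts"

cloudwatch.put_metric_alarm(
    AlarmName="example-unauthorized-api-calls",
    # Assumes a metric filter already publishes this custom metric from CloudTrail logs.
    Namespace="CloudTrailMetrics",
    MetricName="UnauthorizedApiCallCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[ALERT_TOPIC_ARN],
)
```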

๐Ÿ’ผ SI-5 Security Alerts, Advisories, and Directives

a. Receive system security alerts, advisories, and directives from [Assignment: organization-defined external organizations] on an ongoing basis; b. Generate internal security alerts, advisories, and directives as deemed necessary; c. Disseminate security alerts, advisories, and directives to: [Selection (one or more): [Assignment: organization-defined personnel or roles]; [Assignment: organization-defined elements within the organization]; [Assignment: organization-defined external organizations]]; and d. Implement security directives in accordance with established time frames, or notify the issuing organization of the degree of noncompliance.

๐Ÿ’ผ SI-5 SECURITY ALERTS, ADVISORIES, AND DIRECTIVES

The organization: SI-5a. Receives information system security alerts, advisories, and directives from [Assignment: organization-defined external organizations] on an ongoing basis; SI-5b. Generates internal security alerts, advisories, and directives as deemed necessary; SI-5c. Disseminates security alerts, advisories, and directives to: [Selection (one or more): [Assignment: organization-defined personnel or roles]; [Assignment: organization-defined elements within the organization]; [Assignment: organization-defined external organizations]]; and SI-5d. Implements security directives in accordance with established time frames, or notifies the issuing organization of the degree of noncompliance.

๐Ÿ’ผ SI-5 Security Alerts, Advisories, and Directives (L)(M)(H)

a. Receive system security alerts, advisories, and directives from [FedRAMP Assignment: to include US-CERT and Cybersecurity and Infrastructure Security Agency (CISA) Directives] on an ongoing basis; b. Generate internal security alerts, advisories, and directives as deemed necessary; c. Disseminate security alerts, advisories, and directives to: [Selection (one-or-more): [FedRAMP Assignment: to include system security personnel and administrators with configuration/patch-management responsibilities]; [Assignment: organization-defined elements within the organization]; [Assignment: organization-defined external organizations]]; and d. Implement security directives in accordance with established time frames, or notify the issuing organization of the degree of noncompliance. **SI-5 Additional FedRAMP Requirements and Guidance:** **Requirement**: Service Providers must address the CISA Emergency and Binding Operational Directives applicable to their cloud service offering per FedRAMP guidance. This includes listing the applicable directives and stating compliance status.

๐Ÿ’ผ SI-5 Security Alerts, Advisories, and Directives (L)(M)(H)

a. Receive system security alerts, advisories, and directives from [FedRAMP Assignment: to include US-CERT and Cybersecurity and Infrastructure Security Agency (CISA) Directives] on an ongoing basis; b. Generate internal security alerts, advisories, and directives as deemed necessary; c. Disseminate security alerts, advisories, and directives to: [Selection (one-or-more): [FedRAMP Assignment: to include system security personnel and administrators with configuration/patch-management responsibilities]; [Assignment: organization-defined elements within the organization]; [Assignment: organization-defined external organizations]]; and d. Implement security directives in accordance with established time frames, or notify the issuing organization of the degree of noncompliance. **SI-5 Additional FedRAMP Requirements and Guidance:** **Requirement**: Service Providers must address the CISA Emergency and Binding Operational Directives applicable to their cloud service offering per FedRAMP guidance. This includes listing the applicable directives and stating compliance status.

๐Ÿ’ผ SI-5 Security Alerts, Advisories, and Directives (L)(M)(H)

a. Receive system security alerts, advisories, and directives from [FedRAMP Assignment: to include US-CERT and Cybersecurity and Infrastructure Security Agency (CISA) Directives] on an ongoing basis; b. Generate internal security alerts, advisories, and directives as deemed necessary; c. Disseminate security alerts, advisories, and directives to: [Selection (one-or-more): [FedRAMP Assignment: to include system security personnel and administrators with configuration/patch-management responsibilities]; [Assignment: organization-defined elements within the organization]; [Assignment: organization-defined external organizations]]; and d. Implement security directives in accordance with established time frames, or notify the issuing organization of the degree of noncompliance. **SI-5 Additional FedRAMP Requirements and Guidance:** **Requirement**: Service Providers must address the CISA Emergency and Binding Operational Directives applicable to their cloud service offering per FedRAMP guidance. This includes listing the applicable directives and stating compliance status.

๐Ÿ’ผ SI-6 Security and Privacy Function Verification

a. Verify the correct operation of [Assignment: organization-defined security and privacy functions]; b. Perform the verification of the functions specified in SI-6a [Selection (one or more): [Assignment: organization-defined system transitional states]; upon command by user with appropriate privilege; [Assignment: organization-defined frequency]]; c. Alert [Assignment: organization-defined personnel or roles] to failed security and privacy verification tests; and d. [Selection (one or more): Shut the system down; Restart the system; [Assignment: organization-defined alternative action(s)]] when anomalies are discovered.

๐Ÿ’ผ SI-6 Security and Privacy Function Verification (M)(H)

a. Verify the correct operation of [Assignment: organization-defined security and privacy functions]; b. Perform the verification of the functions specified in SI-6a [Selection (one-or-more): [FedRAMP Assignment: system transitional states to include upon system startup and/or restart; upon command by user with appropriate privilege]; [FedRAMP Assignment: at least monthly]]; c. Alert [FedRAMP Assignment: to include system administrators and security personnel] to failed security and privacy verification tests; and d. [Selection (one-or-more): Shut the system down; Restart the system; alternative action(s)] when anomalies are discovered.

๐Ÿ’ผ SI-6 Security and Privacy Function Verification (M)(H)

a. Verify the correct operation of [Assignment: organization-defined security and privacy functions]; b. Perform the verification of the functions specified in SI-6a [Selection (one-or-more): [FedRAMP Assignment: system transitional states to include upon system startup and/or restart; upon command by user with appropriate privilege]; [FedRAMP Assignment: at least monthly]]; c. Alert [FedRAMP Assignment: to include system administrators and security personnel] to failed security and privacy verification tests; and d. [Selection (one-or-more): Shut the system down; Restart the system; alternative action(s)] when anomalies are discovered.
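Security function verification is often automated as a scheduled self-test that runs a set of checks and alerts administrators when any fail. The sketch below is generic Python with assumed placeholder check functions; the functions to verify, the schedule, and the alerting channel and follow-up action are all organization-defined.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("si6-verification")

def firewall_rules_loaded() -> bool:
    """Placeholder check -- replace with a real verification of a security function."""
    return True

def audit_daemon_running() -> bool:
    """Placeholder check -- replace with a real verification of a security function."""
    return True

CHECKS: dict[str, Callable[[], bool]] = {
    "firewall_rules_loaded": firewall_rules_loaded,
    "audit_daemon_running": audit_daemon_running,
}

def run_verification() -> list[str]:
    """Run every registered check and return the names of the ones that failed."""
    failures = []
    for name, check in CHECKS.items():
        try:
            ok = check()
        except Exception:            # a crashing check counts as a failure
            ok = False
        if not ok:
            failures.append(name)
    return failures

if __name__ == "__main__":
    failed = run_verification()
    if failed:
        # Part c: alert personnel; part d: take the selected action (restart, shut down, etc.).
        log.error("Security function verification FAILED: %s", ", ".join(failed))
    else:
        log.info("All security function verification tests passed.")
```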

๐Ÿ’ผ SI-6 SECURITY FUNCTION VERIFICATION

The information system: SI-6a. Verifies the correct operation of [Assignment: organization-defined security functions]; SI-6b. Performs this verification [Selection (one or more): [Assignment: organization-defined system transitional states]; upon command by user with appropriate privilege; [Assignment: organization-defined frequency]]; SI-6c. Notifies [Assignment: organization-defined personnel or roles] of failed security verification tests; and SI-6d. [Selection (one or more): shuts the information system down; restarts the information system; [Assignment: organization-defined alternative action(s)]] when anomalies are discovered.

๐Ÿ’ผ SI-7 (1) INTEGRITY CHECKS

The information system performs an integrity check of [Assignment: organization-defined software, firmware, and information] [Selection (one or more): at startup; at [Assignment: organization-defined transitional states or security-relevant events]; [Assignment: organization-defined frequency]].
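Integrity checks are typically implemented by hashing the monitored files and comparing the results against a trusted baseline. The following minimal sketch uses SHA-256 with a hypothetical JSON baseline file; production systems would normally use a dedicated file-integrity-monitoring tool, signed manifests, and protected baseline storage.

```python
import hashlib
import json
from pathlib import Path

# Assumed baseline format: {"/path/to/file": "sha256-hex-digest", ...}
BASELINE_FILE = Path("integrity_baseline.json")

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_integrity() -> list[str]:
    """Return the paths whose current hash no longer matches the recorded baseline."""
    baseline = json.loads(BASELINE_FILE.read_text())
    violations = []
    for path_str, expected in baseline.items():
        path = Path(path_str)
        if not path.exists() or sha256_of(path) != expected:
            violations.append(path_str)
    return violations

if __name__ == "__main__":
    changed = check_integrity()
    if changed:
        print("Integrity violation detected in:", ", ".join(changed))
    else:
        print("All monitored files match the baseline.")
```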

๐Ÿ’ผ SI-7 (13) CODE EXECUTION IN PROTECTED ENVIRONMENTS

The organization allows execution of binary or machine-executable code obtained from sources with limited or no warranty and without the provision of source code only in confined physical or virtual machine environments and with the explicit approval of [Assignment: organization-defined personnel or roles].

๐Ÿ’ผ SI-7 (14) BINARY OR MACHINE EXECUTABLE CODE

The organization: SI-7 (14)(a) Prohibits the use of binary or machine-executable code from sources with limited or no warranty and without the provision of source code; and SI-7 (14)(b) Provides exceptions to the source code requirement only for compelling mission/operational requirements and with the approval of the authorizing official.

๐Ÿ’ผ SI-7 (15) CODE AUTHENTICATION

The information system implements cryptographic mechanisms to authenticate [Assignment: organization-defined software or firmware components] prior to installation.
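Cryptographic authentication of software or firmware prior to installation usually means verifying a vendor-supplied digital signature over the package. The sketch below verifies an RSA signature using the third-party `cryptography` library (assumed to be available); the key, package, and signature file names are placeholders, and OS package managers or code-signing services already provide this capability in many environments.

```python
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# Placeholder file names -- the vendor's public key, the package, and its detached signature.
PUBLIC_KEY_PEM = Path("vendor_signing_key.pem")
PACKAGE = Path("component-1.2.3.tar.gz")
SIGNATURE = Path("component-1.2.3.tar.gz.sig")

def package_is_authentic() -> bool:
    """Verify the vendor's RSA signature over the package before allowing installation."""
    public_key = serialization.load_pem_public_key(PUBLIC_KEY_PEM.read_bytes())
    try:
        public_key.verify(
            SIGNATURE.read_bytes(),
            PACKAGE.read_bytes(),
            padding.PKCS1v15(),
            hashes.SHA256(),
        )
        return True
    except InvalidSignature:
        return False

if __name__ == "__main__":
    if package_is_authentic():
        print("Signature valid -- installation may proceed.")
    else:
        print("Signature invalid -- do NOT install this component.")
```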

๐Ÿ’ผ SI-7 (8) AUDITING CAPABILITY FOR SIGNIFICANT EVENTS

The information system, upon detection of a potential integrity violation, provides the capability to audit the event and initiates the following actions: [Selection (one or more): generates an audit record; alerts current user; alerts [Assignment: organization-defined personnel or roles]; [Assignment: organization-defined other actions]].

๐Ÿ’ผ SI-7 Software, Firmware, and Information Integrity

a. Employ integrity verification tools to detect unauthorized changes to the following software, firmware, and information: [Assignment: organization-defined software, firmware, and information]; and b. Take the following actions when unauthorized changes to the software, firmware, and information are detected: [Assignment: organization-defined actions].

๐Ÿ’ผ SI-7 Software, Firmware, and Information Integrity (M)(H)

a. Employ integrity verification tools to detect unauthorized changes to the following software, firmware, and information: [Assignment: organization-defined software, firmware, and information]; and b. Take the following actions when unauthorized changes to the software, firmware, and information are detected: [Assignment: organization-defined actions].

๐Ÿ’ผ SI-7 Software, Firmware, and Information Integrity (M)(H)

a. Employ integrity verification tools to detect unauthorized changes to the following software, firmware, and information: [Assignment: organization-defined software, firmware, and information]; and b. Take the following actions when unauthorized changes to the software, firmware, and information are detected: [Assignment: organization-defined actions].

๐Ÿ’ผ SI-7(1) Integrity Checks (M)(H)

Perform an integrity check of [Assignment: organization-defined software, firmware, and information] [Selection (one-or-more): at startup; at [Assignment: organization-defined transitional states or security-relevant events]; [FedRAMP Assignment: at least monthly]].

๐Ÿ’ผ SI-7(1) Integrity Checks (M)(H)

Perform an integrity check of [Assignment: organization-defined software, firmware, and information] [Selection (one-or-more): at startup; at [Assignment: organization-defined transitional states or security-relevant events]; [FedRAMP Assignment: at least monthly]].

๐Ÿ’ผ SI-7(15) Code Authentication (H)

Implement cryptographic mechanisms to authenticate the following software or firmware components prior to installation: [FedRAMP Assignment: to include all software and firmware inside the boundary].

๐Ÿ’ผ SI-8 Spam Protection

a. Employ spam protection mechanisms at system entry and exit points to detect and act on unsolicited messages; and b. Update spam protection mechanisms when new releases are available in accordance with organizational configuration management policy and procedures.

๐Ÿ’ผ SI-8 SPAM PROTECTION

The organization: SI-8a. Employs spam protection mechanisms at information system entry and exit points to detect and take action on unsolicited messages; and SI-8b. Updates spam protection mechanisms when new releases are available in accordance with organizational configuration management policy and procedures.

๐Ÿ’ผ SI-8 Spam Protection (M)(H)

a. Employ spam protection mechanisms at system entry and exit points to detect and act on unsolicited messages; and b. Update spam protection mechanisms when new releases are available in accordance with organizational configuration management policy and procedures. **SI-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: When CSO sends email on behalf of the government as part of the business offering, Control Description should include implementation of Domain-based Message Authentication, Reporting & Conformance (DMARC) on the sending domain for outgoing messages as described in [DHS Binding Operational Directive (BOD) 18-01](https://cyber.dhs.gov/bod/18-01/). **Guidance**: CSPs should confirm DMARC configuration (where appropriate) to ensure that policy=reject and the rua parameter includes reports@dmarc.cyber.dhs.gov. DMARC compliance should be documented in the SI-08 control implementation solution description and list the FROM: domain(s) that will be seen by email recipients.

๐Ÿ’ผ SI-8 Spam Protection (M)(H)

a. Employ spam protection mechanisms at system entry and exit points to detect and act on unsolicited messages; and b. Update spam protection mechanisms when new releases are available in accordance with organizational configuration management policy and procedures. **SI-8 Additional FedRAMP Requirements and Guidance:** **Guidance**: When CSO sends email on behalf of the government as part of the business offering, Control Description should include implementation of Domain-based Message Authentication, Reporting & Conformance (DMARC) on the sending domain for outgoing messages as described in [DHS Binding Operational Directive (BOD) 18-01](https://cyber.dhs.gov/bod/18-01/). **Guidance**: CSPs should confirm DMARC configuration (where appropriate) to ensure that policy=reject and the rua parameter includes reports@dmarc.cyber.dhs.gov. DMARC compliance should be documented in the SI-08 control implementation solution description and list the FROM: domain(s) that will be seen by email recipients.
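The DMARC expectations in the guidance (a policy of `reject` and a `rua` address list that includes reports@dmarc.cyber.dhs.gov) can be spot-checked by parsing the domain's published DMARC record. The sketch below only parses a record string you supply; actually fetching the `_dmarc.<domain>` TXT record would need a DNS library, which is left out to keep the example self-contained.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Parse a DMARC TXT record of the form 'v=DMARC1; p=reject; rua=mailto:...' into tags."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def meets_fedramp_guidance(record: str) -> bool:
    tags = parse_dmarc(record)
    policy_is_reject = tags.get("p", "").lower() == "reject"
    rua_includes_dhs = "reports@dmarc.cyber.dhs.gov" in tags.get("rua", "").lower()
    return policy_is_reject and rua_includes_dhs

if __name__ == "__main__":
    # Example record for a hypothetical sending domain.
    example = "v=DMARC1; p=reject; rua=mailto:dmarc@example.gov,mailto:reports@dmarc.cyber.dhs.gov"
    print("Meets guidance:", meets_fedramp_guidance(example))
```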

๐Ÿ’ผ Software and architecture

Implement patterns for performing load smoothing and maintaining consistent high utilization of deployed resources to minimize the resources consumed. Components might become idle from lack of use because of changes in user behavior over time. Revise patterns and architecture to consolidate under-utilized components to increase overall utilization. Retire components that are no longer required. Understand the performance of your workload components, and optimize the components that consume the most resources. Be aware of the devices your customers use to access your services, and implement patterns to minimize the need for device upgrades.

๐Ÿ’ผ SR-1 Policy and Procedures

a. Develop, document, and disseminate to [Assignment: organization-defined personnel or roles]: 1. [Selection (one or more): Organization-level; Mission/business process-level; System-level] supply chain risk management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the supply chain risk management policy and the associated supply chain risk management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the supply chain risk management policy and procedures; and c. Review and update the current supply chain risk management: 1. Policy [Assignment: organization-defined frequency] and following [Assignment: organization-defined events]; and 2. Procedures [Assignment: organization-defined frequency] and following [Assignment: organization-defined events].

๐Ÿ’ผ SR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] supply chain risk management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the supply chain risk management policy and the associated supply chain risk management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the supply chain risk management policy and procedures; and c. Review and update the current supply chain risk management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] supply chain risk management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the supply chain risk management policy and the associated supply chain risk management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the supply chain risk management policy and procedures; and c. Review and update the current supply chain risk management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SR-1 Policy and Procedures (L)(M)(H)

a. Develop, document, and disseminate to [FedRAMP Assignment: to include chief privacy and ISSO and/or similar role or designees]: 1. [Selection (one-or-more): organization-level; mission/business process-level; system-level] supply chain risk management policy that: (a) Addresses purpose, scope, roles, responsibilities, management commitment, coordination among organizational entities, and compliance; and (b) Is consistent with applicable laws, executive orders, directives, regulations, policies, standards, and guidelines; and 2. Procedures to facilitate the implementation of the supply chain risk management policy and the associated supply chain risk management controls; b. Designate an [Assignment: organization-defined official] to manage the development, documentation, and dissemination of the supply chain risk management policy and procedures; and c. Review and update the current supply chain risk management: 1. Policy [FedRAMP Assignment: at least every three (3) years] and following [Assignment: organization-defined events]; and 2. Procedures [FedRAMP Assignment: at least annually] and following [FedRAMP Assignment: significant changes].

๐Ÿ’ผ SR-10 Inspection of Systems or Components

Inspect the following systems or system components [Selection (one or more): at random; at [Assignment: organization-defined frequency], upon [Assignment: organization-defined indications of need for inspection]] to detect tampering: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SR-10 Inspection of Systems or Components (L)(M)(H)

Inspect the following systems or system components [Selection (one-or-more): at random; at [Assignment: organization-defined frequency], upon [Assignment: organization-defined indications of need for inspection]] to detect tampering: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SR-10 Inspection of Systems or Components (L)(M)(H)

Inspect the following systems or system components [Selection (one-or-more): at random; at [Assignment: organization-defined frequency], upon [Assignment: organization-defined indications of need for inspection]] to detect tampering: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SR-10 Inspection of Systems or Components (L)(M)(H)

Inspect the following systems or system components [Selection (one-or-more): at random; at [Assignment: organization-defined frequency], upon [Assignment: organization-defined indications of need for inspection]] to detect tampering: [Assignment: organization-defined systems or system components].

๐Ÿ’ผ SR-11 Component Authenticity

a. Develop and implement anti-counterfeit policy and procedures that include the means to detect and prevent counterfeit components from entering the system; and b. Report counterfeit system components to [Selection (one or more): source of counterfeit component; [Assignment: organization-defined external reporting organizations]; [Assignment: organization-defined personnel or roles]].

๐Ÿ’ผ SR-11 Component Authenticity (L)(M)(H)

a. Develop and implement anti-counterfeit policy and procedures that include the means to detect and prevent counterfeit components from entering the system; and b. Report counterfeit system components to [Selection (one-or-more): source of counterfeit component; [Assignment: organization-defined external reporting organizations]; [Assignment: organization-defined personnel or roles]]. **SR-11 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure that their supply chain vendors provide authenticity of software and patches and the vendor must have a plan to protect the development pipeline.

๐Ÿ’ผ SR-11 Component Authenticity (L)(M)(H)

a. Develop and implement anti-counterfeit policy and procedures that include the means to detect and prevent counterfeit components from entering the system; and b. Report counterfeit system components to [Selection (one-or-more): source of counterfeit component; [Assignment: organization-defined external reporting organizations]; [Assignment: organization-defined personnel or roles]]. **SR-11 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure that their supply chain vendors provide authenticity of software and patches and the vendor must have a plan to protect the development pipeline.

๐Ÿ’ผ SR-11 Component Authenticity (L)(M)(H)

a. Develop and implement anti-counterfeit policy and procedures that include the means to detect and prevent counterfeit components from entering the system; and b. Report counterfeit system components to [Selection (one-or-more): source of counterfeit component; [Assignment: organization-defined external reporting organizations]; [Assignment: organization-defined personnel or roles]]. **SR-11 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure that their supply chain vendors provide authenticity of software and patches and the vendor must have a plan to protect the development pipeline.

๐Ÿ’ผ SR-12 Component Disposal

Dispose of [Assignment: organization-defined data, documentation, tools, or system components] using the following techniques and methods: [Assignment: organization-defined techniques and methods].

๐Ÿ’ผ SR-12 Component Disposal (L)(M)(H)

Dispose of [Assignment: organization-defined data, documentation, tools, or system components] using the following techniques and methods: [Assignment: organization-defined techniques and methods].

๐Ÿ’ผ SR-12 Component Disposal (L)(M)(H)

Dispose of [Assignment: organization-defined data, documentation, tools, or system components] using the following techniques and methods: [Assignment: organization-defined techniques and methods].

๐Ÿ’ผ SR-12 Component Disposal (L)(M)(H)

Dispose of [Assignment: organization-defined data, documentation, tools, or system components] using the following techniques and methods: [Assignment: organization-defined techniques and methods].

๐Ÿ’ผ SR-2 Supply Chain Risk Management Plan

a. Develop a plan for managing supply chain risks associated with the research and development, design, manufacturing, acquisition, delivery, integration, operations and maintenance, and disposal of the following systems, system components or system services: [Assignment: organization-defined systems, system components, or system services]; b. Review and update the supply chain risk management plan [Assignment: organization-defined frequency] or as required, to address threat, organizational or environmental changes; and c. Protect the supply chain risk management plan from unauthorized disclosure and modification.

๐Ÿ’ผ SR-2 Supply Chain Risk Management Plan (L)(M)(H)

a. Develop a plan for managing supply chain risks associated with the research and development, design, manufacturing, acquisition, delivery, integration, operations and maintenance, and disposal of the following systems, system components or system services: [Assignment: organization-defined systems, system components, or system services]; b. Review and update the supply chain risk management plan [FedRAMP Assignment: at least annually] or as required, to address threat, organizational or environmental changes; and c. Protect the supply chain risk management plan from unauthorized disclosure and modification.

๐Ÿ’ผ SR-2 Supply Chain Risk Management Plan (L)(M)(H)

a. Develop a plan for managing supply chain risks associated with the research and development, design, manufacturing, acquisition, delivery, integration, operations and maintenance, and disposal of the following systems, system components or system services: [Assignment: organization-defined systems, system components, or system services]; b. Review and update the supply chain risk management plan [FedRAMP Assignment: at least annually] or as required, to address threat, organizational or environmental changes; and c. Protect the supply chain risk management plan from unauthorized disclosure and modification.

๐Ÿ’ผ SR-2 Supply Chain Risk Management Plan (L)(M)(H)

a. Develop a plan for managing supply chain risks associated with the research and development, design, manufacturing, acquisition, delivery, integration, operations and maintenance, and disposal of the following systems, system components or system services: [Assignment: organization-defined systems, system components, or system services]; b. Review and update the supply chain risk management plan [FedRAMP Assignment: at least annually] or as required, to address threat, organizational or environmental changes; and c. Protect the supply chain risk management plan from unauthorized disclosure and modification.

๐Ÿ’ผ SR-2(1) Establish SCRM Team (L)(M)(H)

Establish a supply chain risk management team consisting of [Assignment: organization-defined personnel, roles, and responsibilities] to lead and support the following SCRM activities: [Assignment: organization-defined supply chain risk management activities].

๐Ÿ’ผ SR-2(1) Establish SCRM Team (L)(M)(H)

Establish a supply chain risk management team consisting of [Assignment: organization-defined personnel, roles, and responsibilities] to lead and support the following SCRM activities: [Assignment: organization-defined supply chain risk management activities].

๐Ÿ’ผ SR-2(1) Establish SCRM Team (L)(M)(H)

Establish a supply chain risk management team consisting of [Assignment: organization-defined personnel, roles, and responsibilities] to lead and support the following SCRM activities: [Assignment: organization-defined supply chain risk management activities].

๐Ÿ’ผ SR-3 Supply Chain Controls and Processes

a. Establish a process or processes to identify and address weaknesses or deficiencies in the supply chain elements and processes of [Assignment: organization-defined system or system component] in coordination with [Assignment: organization-defined supply chain personnel]; b. Employ the following controls to protect against supply chain risks to the system, system component, or system service and to limit the harm or consequences from supply chain-related events: [Assignment: organization-defined supply chain controls]; and c. Document the selected and implemented supply chain processes and controls in [Selection: security and privacy plans; supply chain risk management plan; [Assignment: organization-defined document]].

๐Ÿ’ผ SR-3 Supply Chain Controls and Processes (L)(M)(H)

a. Establish a process or processes to identify and address weaknesses or deficiencies in the supply chain elements and processes of [Assignment: organization-defined system or system component] in coordination with [Assignment: organization-defined supply chain personnel]; b. Employ the following controls to protect against supply chain risks to the system, system component, or system service and to limit the harm or consequences from supply chain-related events: [Assignment: organization-defined supply chain controls]; and c. Document the selected and implemented supply chain processes and controls in [Selection: security and privacy plans; supply chain risk management plan [Assignment: organization-defined document]]. **SR-3 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSO must document and maintain the supply chain custody, including replacement devices, to ensure the integrity of the devices before being introduced to the boundary.

๐Ÿ’ผ SR-3 Supply Chain Controls and Processes (L)(M)(H)

a. Establish a process or processes to identify and address weaknesses or deficiencies in the supply chain elements and processes of [Assignment: organization-defined system or system component] in coordination with [Assignment: organization-defined supply chain personnel]; b. Employ the following controls to protect against supply chain risks to the system, system component, or system service and to limit the harm or consequences from supply chain-related events: [Assignment: organization-defined supply chain controls]; and c. Document the selected and implemented supply chain processes and controls in [Selection: security and privacy plans; supply chain risk management plan [Assignment: organization-defined document]]. **SR-3 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSO must document and maintain the supply chain custody, including replacement devices, to ensure the integrity of the devices before being introduced to the boundary.

๐Ÿ’ผ SR-3 Supply Chain Controls and Processes (L)(M)(H)

a. Establish a process or processes to identify and address weaknesses or deficiencies in the supply chain elements and processes of [Assignment: organization-defined system or system component] in coordination with [Assignment: organization-defined supply chain personnel]; b. Employ the following controls to protect against supply chain risks to the system, system component, or system service and to limit the harm or consequences from supply chain-related events: [Assignment: organization-defined supply chain controls]; and c. Document the selected and implemented supply chain processes and controls in [Selection: security and privacy plans; supply chain risk management plan [Assignment: organization-defined document]]. **SR-3 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSO must document and maintain the supply chain custody, including replacement devices, to ensure the integrity of the devices before being introduced to the boundary.

๐Ÿ’ผ SR-4 Provenance

Document, monitor, and maintain valid provenance of the following systems, system components, and associated data: [Assignment: organization-defined systems, system components, and associated data].

๐Ÿ’ผ SR-4(1) Provenance | Identity

Establish and maintain unique identification of the following supply chain elements, processes, and personnel associated with the identified system and critical system components: [Assignment: organization-defined supply chain elements, processes, and personnel associated with organization-defined systems and critical system components].

๐Ÿ’ผ SR-4(2) Provenance | Track and Trace

Establish and maintain unique identification of the following systems and critical system components for tracking through the supply chain: [Assignment: organization-defined systems and critical system components].

๐Ÿ’ผ SR-4(4) Provenance | Supply Chain Integrity — Pedigree

Employ [Assignment: organization-defined controls] and conduct [Assignment: organization-defined analysis] to ensure the integrity of the system and system components by validating the internal composition and provenance of critical or mission-essential technologies, products, and services.

๐Ÿ’ผ SR-5 Acquisition Strategies, Tools, and Methods

Employ the following acquisition strategies, contract tools, and procurement methods to protect against, identify, and mitigate supply chain risks: [Assignment: organization-defined acquisition strategies, contract tools, and procurement methods].

๐Ÿ’ผ SR-6 Supplier Assessments and Reviews

Assess and review the supply chain-related risks associated with suppliers or contractors and the system, system component, or system service they provide [Assignment: organization-defined frequency].

๐Ÿ’ผ SR-6 Supplier Assessments and Reviews (M)(H)

Assess and review the supply chain-related risks associated with suppliers or contractors and the system, system component, or system service they provide [FedRAMP Assignment: at least annually]. **SR-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure that their supply chain vendors build and test their systems in alignment with NIST SP 800-171 or a commensurate security and compliance framework. CSOs must ensure that vendors are compliant with physical facility access and logical access controls to supplied products.

๐Ÿ’ผ SR-6 Supplier Assessments and Reviews (M)(H)

Assess and review the supply chain-related risks associated with suppliers or contractors and the system, system component, or system service they provide [FedRAMP Assignment: at least annually]. **SR-6 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure that their supply chain vendors build and test their systems in alignment with NIST SP 800-171 or a commensurate security and compliance framework. CSOs must ensure that vendors are compliant with physical facility access and logical access controls to supplied products.

๐Ÿ’ผ SR-6(1) Supplier Assessments and Reviews | Testing and Analysis

Employ [Selection (one or more): organizational analysis; independent third-party analysis; organizational testing; independent third-party testing] of the following supply chain elements, processes, and actors associated with the system, system component, or system service: [Assignment: organization-defined supply chain elements, processes, and actors].

๐Ÿ’ผ SR-7 Supply Chain Operations Security

Employ the following Operations Security (OPSEC) controls to protect supply chain-related information for the system, system component, or system service: [Assignment: organization-defined Operations Security (OPSEC) controls].

๐Ÿ’ผ SR-8 Notification Agreements

Establish agreements and procedures with entities involved in the supply chain for the system, system component, or system service for the [Selection (one or more): notification of supply chain compromises; results of assessments or audits; [Assignment: organization-defined information]].

๐Ÿ’ผ SR-8 Notification Agreements (L)(M)(H)

Establish agreements and procedures with entities involved in the supply chain for the system, system component, or system service for the [FedRAMP Assignment: notification of supply chain compromises and results of assessment or audits]. **SR-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure and document how they receive notifications from their supply chain vendor of newly discovered vulnerabilities including zero-day vulnerabilities.

๐Ÿ’ผ SR-8 Notification Agreements (L)(M)(H)

Establish agreements and procedures with entities involved in the supply chain for the system, system component, or system service for the [FedRAMP Assignment: notification of supply chain compromises and results of assessment or audits]. **SR-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure and document how they receive notifications from their supply chain vendor of newly discovered vulnerabilities including zero-day vulnerabilities.

๐Ÿ’ผ SR-8 Notification Agreements (L)(M)(H)

Establish agreements and procedures with entities involved in the supply chain for the system, system component, or system service for the [FedRAMP Assignment: notification of supply chain compromises and results of assessment or audits]. **SR-8 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure and document how they receive notifications from their supply chain vendor of newly discovered vulnerabilities including zero-day vulnerabilities.

๐Ÿ’ผ SR-9 Tamper Resistance and Detection (H)

Implement a tamper protection program for the system, system component, or system service. **SR-9 Additional FedRAMP Requirements and Guidance:** **Requirement**: CSOs must ensure vendors provide authenticity of software and patches supplied to the service provider including documenting the safeguards in place.

๐Ÿ’ผ Supply Chain Risk Management (ID.SC)

The organization's priorities, constraints, risk tolerances, and assumptions are established and used to support risk decisions associated with managing supply chain risk. The organization has established and implemented the processes to identify, assess and manage supply chain risks.

๐Ÿ’ผ SUS01-BP01 Choose Region based on both business requirements and sustainability goals

Choose a Region for your workload based on both your business requirements and sustainability goals to optimize its KPIs, including performance, cost, and carbon footprint. **Common anti-patterns:** - You select the workload's Region based on your own location. - You consolidate all workload resources into one geographic location. **Benefits of establishing this best practice:** Placing a workload close to Amazon renewable energy projects or Regions with low published carbon intensity can help to lower the carbon footprint of a cloud workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The AWS Cloud is a constantly expanding network of Regions and points of presence (PoP), with a global network infrastructure linking them together. The choice of Region for your workload significantly affects its KPIs, including performance, cost, and carbon footprint. To effectively improve these KPIs, you should choose Regions for your workload based on both your business requirements and sustainability goals. ### Implementation steps 1. Shortlist potential Regions: Follow these steps to assess and shortlist potential Regions for your workload based on your business requirements, including compliance, available features, cost, and latency: - Confirm that these Regions are compliant, based on your required local regulations (for example, data sovereignty). - Use the AWS Regional Services Lists to check if the Regions have the services and features you need to run your workload. - Calculate the cost of the workload on each Region using the AWS Pricing Calculator. - Test the network latency between your end user locations and each AWS Region (a latency probe sketch follows this entry). 2. Choose Regions: Choose Regions near Amazon renewable energy projects and Regions where the grid has a published carbon intensity that is lower than other locations (or Regions). - Identify your relevant sustainability guidelines to track and compare year-to-year carbon emissions based on the Greenhouse Gas Protocol (market-based and location-based methods). - Choose the Region based on the method you use to track carbon emissions.
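For the latency test in step 1, a quick first-pass comparison can be made by timing TCP connections to regional service endpoints. This is a rough sketch: the endpoint names follow the public `ec2.<region>.amazonaws.com` pattern, the candidate Regions are assumptions, and measurements from real user locations or managed monitoring tools give more reliable numbers.

```python
import socket
import time

# Candidate Regions shortlisted from business requirements (assumed for the example).
CANDIDATE_REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]

def tcp_connect_time_ms(host: str, port: int = 443, timeout: float = 3.0) -> float | None:
    """Return the TCP connect time to host:port in milliseconds, or None if unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return None

if __name__ == "__main__":
    for region in CANDIDATE_REGIONS:
        endpoint = f"ec2.{region}.amazonaws.com"   # public regional service endpoint pattern
        latency = tcp_connect_time_ms(endpoint)
        label = f"{latency:.1f} ms" if latency is not None else "unreachable"
        print(f"{region:<16} {label}")
```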

๐Ÿ’ผ SUS02-BP01 Scale workload infrastructure dynamically

Use elasticity of the cloud and scale your infrastructure dynamically to match supply of cloud resources to demand and avoid overprovisioned capacity in your workload. **Common anti-patterns:** - You do not scale your infrastructure with user load. - You manually scale your infrastructure all the time. - You leave increased capacity after a scaling event instead of scaling back down. **Benefits of establishing this best practice:** Configuring and testing workload elasticity help to efficiently match supply of cloud resources to demand and avoid overprovisioned capacity. You can take advantage of elasticity in the cloud to automatically scale capacity during and after demand spikes to make sure you are only using the right number of resources needed to meet your business requirements. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The cloud provides the flexibility to expand or reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Optimally matching supply to demand delivers the lowest environmental impact for a workload. Demand can be fixed or variable, requiring metrics and automation to make sure that management does not become burdensome. Applications can scale vertically (up or down) by modifying the instance size, horizontally (in or out) by modifying the number of instances, or a combination of both. You can use a number of different approaches to match supply of resources with demand. - **Target-tracking approach:** Monitor your scaling metric and automatically increase or decrease capacity as you need it. - **Predictive scaling:** Scale in anticipation of daily and weekly trends. - **Schedule-based approach:** Set your own scaling schedule according to predictable load changes. - **Service scaling:** Pick services (like serverless) that are natively scaling by design or provide auto scaling as a feature. Identify periods of low or no utilization and scale resources to remove excess capacity and improve efficiency. ### Implementation steps 1. Elasticity matches the supply of resources you have against the demand for those resources. Instances, containers, and functions provide mechanisms for elasticity, either in combination with automatic scaling or as a feature of the service. AWS provides a range of auto scaling mechanisms to ensure that workloads can scale down quickly and easily during periods of low user load. Here are some examples of auto scaling mechanisms: | Auto scaling mechanism | Where to use | |------------------------------|-------------------------------------------------------------------------------| | **Amazon EC2 Auto Scaling** | Use to verify you have the correct number of Amazon EC2 instances available to handle the user load for your application. | | **Application Auto Scaling** | Use to automatically scale the resources for individual AWS services beyond Amazon EC2, such as Lambda functions or Amazon Elastic Container Service (Amazon ECS) services. | | **Kubernetes Cluster Autoscaler** | Use to automatically scale Kubernetes clusters on AWS. | 2. Scaling is often discussed related to compute services like Amazon EC2 instances or AWS Lambda functions. Consider the configuration of non-compute services like Amazon DynamoDB read and write capacity units or Amazon Kinesis Data Streams shards to match the demand. 3. Verify that the metrics for scaling up or down are validated against the type of workload being deployed. 
If you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric. You can use a customized metric (such as memory utilization) for your scaling policy if required. To choose the right metrics, consider the following guidance for Amazon EC2: - The metric should be a valid utilization metric and describe how busy an instance is. - The metric value must increase or decrease proportionally to the number of instances in the Auto Scaling group. 4. Use dynamic scaling instead of manual scaling for your Auto Scaling group. We also recommend that you use target tracking scaling policies in your dynamic scaling. 5. Verify that workload deployments can handle both scale-out and scale-in events. Create test scenarios for scale-in events to verify that the workload behaves as expected and does not affect the user experience (like losing sticky sessions). You can use Activity history to verify a scaling activity for an Auto Scaling group. 6. Evaluate your workload for predictable patterns and proactively scale as you anticipate predicted and planned changes in demand. With predictive scaling, you can eliminate the need to overprovision capacity.
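As an illustration of step 4, the boto3 call below attaches a target tracking scaling policy to an existing Auto Scaling group so that capacity follows average CPU utilization. The group name and target value are placeholders, and the metric choice should follow the guidance in step 3.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Placeholder Auto Scaling group name -- the group itself must already exist.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-web-asg",
    PolicyName="target-50-percent-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,          # keep average CPU near 50% by scaling out and in
        "DisableScaleIn": False,      # allow scale-in so idle capacity is removed
    },
)
```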

๐Ÿ’ผ SUS02-BP02 Align SLAs with sustainability goals

Review and optimize workload service-level agreements (SLA) based on your sustainability goals to minimize the resources required to support your workload while continuing to meet business needs. **Common anti-patterns:** - Workload SLAs are unknown or ambiguous. - You define your SLA just for availability and performance. - You use the same design pattern (like Multi-AZ architecture) for all your workloads. **Benefits of establishing this best practice:** Aligning SLAs with sustainability goals leads to optimal resource usage while meeting business needs. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance SLAs define the level of service expected from a cloud workload, such as response time, availability, and data retention. They influence the architecture, resource usage, and environmental impact of a cloud workload. At a regular cadence, review SLAs and make trade-offs that significantly reduce resource usage in exchange for acceptable decreases in service levels. ### Implementation steps 1. **Understand sustainability goals:** Identify sustainability goals in your organization, such as carbon reduction or improving resource utilization. 2. **Review SLAs:** Evaluate your SLAs to assess if they support your business requirements. If you are exceeding SLAs, perform further review. 3. **Understand trade-offs:** Understand the trade-offs across your workload's complexity (like high volume of concurrent users), performance (like latency), and sustainability impact (like required resources). Typically, prioritizing two of the factors comes at the expense of the third. 4. **Adjust SLAs:** Adjust your SLAs by making trade-offs that significantly reduce sustainability impacts in exchange for acceptable decreases in service levels. - **Sustainability and reliability:** Highly available workloads tend to consume more resources. - **Sustainability and performance:** Using more resources to boost performance could have a higher environmental impact. - **Sustainability and security:** Overly secure workloads could have a higher environmental impact. 5. **Define sustainability SLAs if possible:** Include sustainability SLAs for your workload. For example, define a minimum utilization level as a sustainability SLA for your compute instances. 6. **Use efficient design patterns:** Use design patterns such as microservices on AWS that prioritize business-critical functions and allow lower service levels (such as response time or recovery time objectives) for non-critical functions. 7. **Communicate and establish accountability:** Share the SLAs with all relevant stakeholders, including your development team and your customers. Use reporting to track and monitor the SLAs. Assign accountability to meet the sustainability targets for your SLAs. 8. **Use incentives and rewards:** Use incentives and rewards to achieve or exceed SLAs aligned with sustainability goals. 9. **Review and iterate:** Regularly review and adjust your SLAs to make sure they are aligned with evolving sustainability and performance goals.

๐Ÿ’ผ SUS02-BP03 Stop the creation and maintenance of unused assets

Decommission unused assets in your workload to reduce the number of cloud resources required to support your demand and minimize waste. **Common anti-patterns:** - You do not analyze your application for assets that are redundant or no longer required. - You do not remove assets that are redundant or no longer required. **Benefits of establishing this best practice:** Removing unused assets frees resources and improves the overall efficiency of the workload. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Unused assets consume cloud resources like storage space and compute power. By identifying and eliminating these assets, you can free up these resources, resulting in a more efficient cloud architecture. Perform regular analysis on application assets such as pre-compiled reports, datasets, static images, and asset access patterns to identify redundancy, underutilization, and potential decommission targets. Remove those redundant assets to reduce the resource waste in your workload. ### Implementation steps 1. **Conduct an inventory:** Conduct a comprehensive inventory to identify all assets within your workload. 2. **Analyze usage:** Use continuous monitoring to identify static assets that are no longer required. 3. **Remove unused assets:** Develop a plan to remove assets that are no longer required. - Before removing any asset, evaluate the impact of removing it on the architecture. - Consolidate overlapping generated assets to remove redundant processing. - Update your applications to no longer produce and store assets that are not required. 4. **Communicate with third parties:** Instruct third parties to stop producing and storing assets managed on your behalf that are no longer required. Ask to consolidate redundant assets. 5. **Use lifecycle policies:** Use lifecycle policies to automatically delete unused assets (an S3 lifecycle sketch follows this entry). - You can use Amazon S3 Lifecycle to manage your objects throughout their lifecycle. - You can use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of Amazon EBS snapshots and Amazon EBS-backed AMIs. 6. **Review and optimize:** Regularly review your workload to identify and remove any unused assets.
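For the lifecycle-policy step (step 5 above), the boto3 sketch below expires a hypothetical prefix of generated report objects after 90 days and aborts stale multipart uploads. The bucket name, prefix, and retention period are assumptions to adapt to your own asset analysis.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and prefix identified by the asset inventory as safe to expire.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-generated-assets",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-stale-generated-reports",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},
                "Expiration": {"Days": 90},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```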

๐Ÿ’ผ SUS02-BP04 Optimize geographic placement of workloads based on their networking requirements

Select cloud location and services for your workload that reduce the distance network traffic must travel and decrease the total network resources required to support your workload. **Common anti-patterns:** - You select the workload's Region based on your own location. - You consolidate all workload resources into one geographic location. - All traffic flows through your existing data centers. **Benefits of establishing this best practice:** Placing a workload close to its users provides the lowest latency while decreasing data movement across the network and reducing environmental impact. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The AWS Cloud infrastructure is built around location options such as Regions, Availability Zones, placement groups, and edge locations such as AWS Outposts and AWS Local Zones. These location options are responsible for maintaining connectivity between application components, cloud services, edge networks, and on-premises data centers. Analyze the network access patterns in your workload to identify how to use these cloud location options and reduce the distance network traffic must travel. ### Implementation steps 1. Analyze network access patterns in your workload to identify how users use your application. - Use monitoring tools, such as Amazon CloudWatch and AWS CloudTrail, to gather data on network activities. - Analyze the data to identify the network access pattern. 2. Select the Regions for your workload deployment based on the following key elements: - **Your Sustainability goal:** as explained in Region selection. - **Where your data is located:** For data-heavy applications (such as big data and machine learning), application code should run as close to the data as possible. - **Where your users are located:** For user-facing applications, choose a Region (or Regions) close to your workload's users. - **Other constraints:** Consider constraints such as cost and compliance as explained in *What to Consider when Selecting a Region for your Workloads*. 3. Use local caching or AWS Caching Solutions for frequently used assets to improve performance, reduce data movement, and lower environmental impact. | Service | When to use | |-------------------------|---------------------------------------------------------------------------| | **Amazon CloudFront** | Use to cache static content such as images, scripts, and videos, as well as dynamic content such as API responses or web applications. | | **Amazon ElastiCache** | Use to cache content for web applications. | | **DynamoDB Accelerator**| Use to add in-memory acceleration to your DynamoDB tables. | 4. Use services that can help you run code closer to users of your workload: | Service | When to use | |---------------------------------|------------------------------------------------------------------------| | **Lambda@Edge** | Use for compute-heavy operations that are initiated when objects are not in the cache. | | **Amazon CloudFront Functions** | Use for simple use cases like HTTP(s) request or response manipulations that can be initiated by short-lived functions. | | **AWS IoT Greengrass** | Use to run local compute, messaging, and data caching for connected devices.| 5. Use connection pooling to allow for connection reuse and reduce required resources (a pooled-connection sketch follows this entry). 6. Use distributed data stores that don't rely on persistent connections and synchronous updates for consistency to serve regional populations. 7. Replace pre-provisioned static network capacity with shared dynamic capacity, and share the sustainability impact of network capacity with other subscribers.
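Step 5's connection pooling can be as simple as reusing a single HTTP connection pool across requests instead of opening a new connection per call. The sketch below uses `urllib3`, which is present in most Python environments as a dependency of common HTTP clients; the endpoint URL is a placeholder.

```python
import urllib3

# One PoolManager reuses TCP/TLS connections across requests instead of
# re-establishing them for every call, reducing network and CPU overhead.
http = urllib3.PoolManager(num_pools=4, maxsize=10)

ENDPOINT = "https://api.example.com/assets"   # placeholder endpoint

def fetch_asset(asset_id: str) -> bytes:
    response = http.request("GET", f"{ENDPOINT}/{asset_id}")
    return response.data

if __name__ == "__main__":
    for asset in ("a1", "a2", "a3"):
        print(asset, len(fetch_asset(asset)), "bytes")
```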

๐Ÿ’ผ SUS02-BP05 Optimize team member resources for activities performed

Optimize resources provided to team members to minimize the environmental sustainability impact while supporting their needs. **Common anti-patterns:** - You ignore the impact of devices used by your team members on the overall efficiency of your cloud application. - You manually manage and update resources used by team members. **Benefits of establishing this best practice:** Optimizing team member resources improves the overall efficiency of cloud-enabled applications. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Understand the resources your team members use to consume your services, their expected lifecycle, and the financial and sustainability impact. Implement strategies to optimize these resources. For example, perform complex operations, such as rendering and compilation, on highly utilized scalable infrastructure instead of on underutilized high-powered single-user systems. ### Implementation steps 1. **Use energy-efficient workstations:** Provide team members with energy-efficient workstations and peripherals. Use efficient power management features (like low power mode) in these devices to reduce their energy usage. 2. **Use virtualization:** Use virtual desktops and application streaming to limit upgrade and device requirements. 3. **Encourage remote collaboration:** Encourage team members to use remote collaboration tools such as Amazon Chime or AWS Wickr to reduce the need for travel and associated carbon emissions. 4. **Use energy-efficient software:** Provide team members with energy-efficient software by removing or turning off unnecessary features and processes. 5. **Manage lifecycles:** Evaluate the impact of processes and systems on your device lifecycle, and select solutions that minimize the requirement for device replacement while satisfying business requirements. Regularly maintain and update workstations or software to maintain and improve efficiency. 6. **Remote device management:** Implement remote management for devices to reduce required business travel. - AWS Systems Manager Fleet Manager is a unified user interface (UI) experience that helps you remotely manage your nodes running on AWS or on premises.

๐Ÿ’ผ SUS02-BP06 Implement buffering or throttling to flatten the demand curve

Buffering and throttling flatten the demand curve and reduce the provisioned capacity required for your workload. **Common anti-patterns:** - You process client requests immediately even when immediate processing is not needed. - You do not analyze the requirements for client requests. **Benefits of establishing this best practice:** Flattening the demand curve reduces the required provisioned capacity for the workload. Reducing the provisioned capacity means less energy consumption and less environmental impact. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Flattening the workload demand curve can help you to reduce the provisioned capacity for a workload and reduce its environmental impact. Consider a workload whose demand curve has two peaks; to handle those peaks, capacity is provisioned at the level of the higher peak. The resources and energy used for this workload are represented not by the area under the demand curve, but by the area under the provisioned capacity line, as provisioned capacity is needed to handle those two peaks. You can use buffering or throttling to modify the demand curve and smooth out the peaks, which means less provisioned capacity and less energy consumed. Implement throttling when your clients can perform retries. Implement buffering to store the request and defer processing until a later time. ### Implementation steps 1. Analyze the client requests to determine how to respond to them. Questions to consider include: - Can this request be processed asynchronously? - Does the client have retry capability? 2. If the client has retry capability, then you can implement throttling, which tells the source that if it cannot service the request at the current time, it should try again later. - You can use Amazon API Gateway to implement throttling. 3. For clients that cannot perform retries, a buffer needs to be implemented to flatten the demand curve. A buffer defers request processing, allowing applications that run at different rates to communicate effectively. A buffer-based approach uses a queue or a stream to accept messages from producers. Messages are read by consumers and processed, allowing the messages to run at the rate that meets the consumers' business requirements. - **Amazon Simple Queue Service (Amazon SQS):** A managed service that provides queues that allow a single consumer to read individual messages. - **Amazon Kinesis:** Provides a stream that allows many consumers to read the same messages. 4. Analyze the overall demand, rate of change, and required response time to right-size the throttle or buffer required (a minimal buffering sketch follows this section).
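The following is a minimal buffering sketch using Amazon SQS, assuming an illustrative queue name and message shape. Production queues usually also need dead-letter queues and IAM permissions that are not shown here.

```python
"""Sketch: a minimal SQS buffer that decouples producers from a slower consumer."""
import json

import boto3

sqs = boto3.client("sqs")

# Producer side: enqueue work instead of processing it immediately.
queue_url = sqs.create_queue(QueueName="demand-buffer-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"job_id": 42}))

# Consumer side: drain the queue at a rate the workload can sustain.
while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling reduces empty receives
    )
    messages = resp.get("Messages", [])
    if not messages:
        break  # nothing left to process right now
    for message in messages:
        job = json.loads(message["Body"])
        # ... process the job at the consumer's own pace ...
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

Because the queue absorbs bursts, the consumer fleet can be sized for average demand rather than peak demand.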

๐Ÿ’ผ SUS03-BP01 Optimize software and architecture for asynchronous and scheduled jobs

Use efficient software and architecture patterns, such as queue-driven architecture, to maintain consistent high utilization of deployed resources. **Common anti-patterns:** - You overprovision the resources in your cloud workload to meet unforeseen spikes in demand. - Your architecture does not decouple senders and receivers of asynchronous messages by a messaging component. **Benefits of establishing this best practice:** - Efficient software and architecture patterns minimize the unused resources in your workload and improve the overall efficiency. - You can scale the processing independently of the receiving of asynchronous messages. - Through a messaging component, you have relaxed availability requirements that you can meet with fewer resources. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Use efficient architecture patterns such as event-driven architecture that result in even utilization of components and minimize overprovisioning in your workload. Using efficient architecture patterns minimizes idle resources from lack of use due to changes in demand over time. Understand the requirements of your workload components and adopt architecture patterns that increase overall utilization of resources. Retire components that are no longer required. ### Implementation steps 1. Analyze the demand for your workload to determine how to respond to it. 2. For requests or jobs that don't require synchronous responses, use queue-driven architectures and auto scaling workers to maximize utilization. Here are some examples of when you might consider queue-driven architecture: | Queuing mechanism | Description | |-------------------|-------------| | **AWS Batch job queues** | AWS Batch jobs are submitted to a job queue where they reside until they can be scheduled to run in a compute environment.| | **Amazon Simple Queue Service and Amazon EC2 Spot Instances**| Pair Amazon SQS and Spot Instances to build a fault-tolerant and efficient architecture.| 3. For requests or jobs that can be processed anytime, use scheduling mechanisms to process jobs in batches for greater efficiency. Here are some examples of scheduling mechanisms on AWS (a hedged scheduling sketch also follows this section): | Scheduling mechanism | Description | |----------------------|-------------| | **Amazon EventBridge Scheduler** | A capability from Amazon EventBridge that allows you to create, run, and manage scheduled tasks at scale.| | **AWS Glue time-based schedule** | Define a time-based schedule for your crawlers and jobs in AWS Glue.| | **Amazon ECS scheduled tasks** | Amazon ECS supports creating scheduled tasks. Scheduled tasks use Amazon EventBridge rules to run tasks either on a schedule or in response to an EventBridge event. | | **Instance Scheduler** | Configure start and stop schedules for your Amazon EC2 and Amazon Relational Database Service instances.| 4. If you use polling and webhook mechanisms in your architecture, replace those with events. Use event-driven architectures to build highly efficient workloads. 5. Leverage serverless on AWS to eliminate over-provisioned infrastructure. 6. Right-size individual components of your architecture to prevent idling resources waiting for input. - You can use the Rightsizing Recommendations in AWS Cost Explorer or AWS Compute Optimizer to identify rightsizing opportunities.
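The following is a hedged sketch of step 3 using Amazon EventBridge Scheduler to run a batch job once a day instead of polling. The Lambda function ARN, role ARN, and schedule name are placeholders, not values from the original guidance.

```python
"""Sketch: schedule a batch job with Amazon EventBridge Scheduler instead of polling."""
import json

import boto3

scheduler = boto3.client("scheduler")

scheduler.create_schedule(
    Name="nightly-report-batch",
    # Run once a day at 02:00 UTC, when the results are actually needed.
    ScheduleExpression="cron(0 2 * * ? *)",
    # A flexible window lets the service pick an efficient start time.
    FlexibleTimeWindow={"Mode": "FLEXIBLE", "MaximumWindowInMinutes": 60},
    Target={
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:nightly-report",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke-role",
        "Input": json.dumps({"report_date": "latest"}),
    },
)
```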

๐Ÿ’ผ SUS03-BP02 Remove or refactor workload components with low or no use

Remove components that are unused and no longer required, and refactor components with little utilization to minimize waste in your workload. **Common anti-patterns:** - You do not regularly check the utilization level of individual components of your workload. - You do not check and analyze recommendations from AWS rightsizing tools such as AWS Compute Optimizer. **Benefits of establishing this best practice:** Removing unused components minimizes waste and improves the overall efficiency of your cloud workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Unused or underutilized components in a cloud workload consume unnecessary compute, storage, or network resources. Remove or refactor these components to directly reduce waste and improve the overall efficiency of a cloud workload. This is an iterative improvement process which can be initiated by changes in demand or the release of a new cloud service. For example, a significant drop in AWS Lambda function run time can indicate a need to lower the memory size. Also, as AWS releases new services and features, the optimal services and architecture for your workload may change. Continually monitor workload activity and look for opportunities to improve the utilization level of individual components. By removing idle components and performing rightsizing activities, you meet your business requirements with the fewest cloud resources. ### Implementation steps 1. **Inventory your AWS resources:** Create an inventory of your AWS resources. In AWS, you can turn on AWS Resource Explorer to explore and organize your AWS resources. 2. **Monitor utilization:** Monitor and capture the utilization metrics for critical components of your workload (like CPU utilization, memory utilization, or network throughput in Amazon CloudWatch metrics). 3. **Identify unused components:** Identify unused or under-utilized components in your architecture. - For stable workloads, check AWS rightsizing tools such as AWS Compute Optimizer at regular intervals to identify idle, unused, or underutilized components. - For ephemeral workloads, evaluate utilization metrics to identify idle, unused, or underutilized components. 4. **Remove unused components:** Retire components and associated assets (like Amazon ECR images) that are no longer needed. - Automated Cleanup of Unused Images in Amazon ECR. - Delete unused Amazon Elastic Block Store (Amazon EBS) volumes by using AWS Config and AWS Systems Manager. 5. **Refactor underutilized components:** Refactor or consolidate underutilized components with other resources to improve utilization efficiency. For example, you can provision multiple small databases on a single Amazon RDS database instance instead of running databases on individual underutilized instances. 6. **Evaluate improvements:** Understand the resources provisioned by your workload to complete a unit of work. Use this information to evaluate improvements achieved by removing or refactoring components.
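As a hedged complement to the monitoring and identification steps above, the sketch below flags running EC2 instances whose average CPU has stayed very low for two weeks. The 5% threshold is an assumption, and CPU alone is a simplification; memory and network metrics matter too, and AWS Compute Optimizer gives richer recommendations.

```python
"""Sketch: flag EC2 instances with consistently low CPU as candidates for removal or rightsizing."""
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        instance_id = instance["InstanceId"]
        datapoints = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=86400,
            Statistics=["Average"],
        )["Datapoints"]
        if not datapoints:
            continue
        avg_cpu = sum(p["Average"] for p in datapoints) / len(datapoints)
        if avg_cpu < 5.0:  # assumed threshold for "underutilized"
            print(f"{instance_id}: average CPU {avg_cpu:.1f}% over 14 days")
```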

๐Ÿ’ผ SUS03-BP03 Optimize areas of code that consume the most time or resources

Optimize your code that runs within different components of your architecture to minimize resource usage while maximizing performance. **Common anti-patterns:** - You ignore optimizing your code for resource usage. - You usually respond to performance issues by adding more resources. - Your code review and development process does not track performance changes. **Benefits of establishing this best practice:** Using efficient code minimizes resource usage and improves performance. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance It is crucial to examine every functional area, including the code for a cloud-architected application, to optimize its resource usage and performance. Continually monitor your workload's performance in build environments and production and identify opportunities to improve code snippets that have particularly high resource usage. Adopt a regular review process to identify bugs or anti-patterns within your code that use resources inefficiently. Leverage simple and efficient algorithms that produce the same results for your use case. ### Implementation steps 1. **Use an efficient programming language:** Use an efficient operating system and programming language for the workload. 2. **Use an AI coding companion:** Consider using an AI coding companion such as Amazon Q Developer to efficiently write code. 3. **Automate code reviews:** While developing your workloads, adopt an automated code review process to improve quality and identify bugs and anti-patterns. - **Automate code reviews with Amazon CodeGuru Reviewer** - **Detecting concurrency bugs with Amazon CodeGuru** - **Raising code quality for Python applications using Amazon CodeGuru** 4. **Use a code profiler:** Use a code profiler to identify the areas of code that use the most time or resources as targets for optimization (a minimal profiling sketch follows this section). - **Reducing your organization's carbon footprint with Amazon CodeGuru Profiler** - **Understanding memory usage in your Java application with Amazon CodeGuru Profiler** - **Improving customer experience and reducing cost with Amazon CodeGuru Profiler** 5. **Monitor and optimize:** Use continuous monitoring resources to identify components with high resource requirements or suboptimal configuration. - Replace computationally intensive algorithms with simpler and more efficient versions that produce the same result. - Remove unnecessary code such as sorting and formatting. 6. **Use code refactoring or transformation:** Explore the possibility of using Amazon Q code transformation for application maintenance and upgrades.
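The minimal profiling sketch below uses Python's built-in `cProfile` to find the functions that consume the most time. The `transform()` function is a hypothetical stand-in for your own workload code; managed profilers such as CodeGuru Profiler provide similar insight in production.

```python
"""Sketch: profile a hot code path with cProfile to find expensive functions."""
import cProfile
import pstats


def transform(records):
    # Deliberately naive: sorts and formats on every call.
    return [f"{value:08d}" for value in sorted(records)]


def handler():
    data = list(range(200_000, 0, -1))
    for _ in range(10):
        transform(data)


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Print the ten most expensive functions by cumulative time.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```

The output makes it obvious that repeated sorting dominates the run time, which is exactly the kind of unnecessary work step 5 recommends removing.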

๐Ÿ’ผ SUS03-BP04 Optimize impact on devices and equipment

Understand the devices and equipment used in your architecture and use strategies to reduce their usage. This can minimize the overall environmental impact of your cloud workload. **Common anti-patterns:** - You ignore the environmental impact of devices used by your customers. - You manually manage and update resources used by customers. **Benefits of establishing this best practice:** Implementing software patterns and features that are optimized for customer devices can reduce the overall environmental impact of your cloud workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Implementing software patterns and features that are optimized for customer devices can reduce the environmental impact in several ways: - Implementing new features that are backward compatible can reduce the number of hardware replacements. - Optimizing an application to run efficiently on devices can help to reduce their energy consumption and extend their battery life (if they are powered by battery). - Optimizing an application for devices can also reduce the data transfer over the network. Understand the devices and equipment used in your architecture, their expected lifecycle, and the impact of replacing those components. Implement software patterns and features that help minimize device energy consumption and reduce the need for customers to replace or manually upgrade their devices. ### Implementation steps 1. **Conduct an inventory:** Inventory the devices used in your architecture. Devices can be mobile phones, tablets, IoT devices, smart lights, or even smart devices in a factory. 2. **Use energy-efficient devices:** Consider using energy-efficient devices in your architecture. Use power management configurations on devices to enter low power mode when not in use. 3. **Run efficient applications:** Optimize the application running on the devices (a minimal payload-trimming sketch follows this section): - Use strategies such as running tasks in the background to reduce their energy consumption. - Account for network bandwidth and latency when building payloads, and implement capabilities that help your applications work well on low bandwidth, high latency links. - Convert payloads and files into optimized formats required by devices. For example, you can use Amazon Elastic Transcoder or AWS Elemental MediaConvert to convert large, high-quality digital media files into formats that users can play back on mobile devices, tablets, web browsers, and connected televisions. - Perform computationally intense activities server-side (such as image rendering), or use application streaming to improve the user experience on older devices. - Segment and paginate output, especially for interactive sessions, to manage payloads and limit local storage requirements. 4. **Engage suppliers:** Work with device suppliers who use sustainable materials and provide transparency in their supply chains and environmental certifications. 5. **Use over-the-air (OTA) updates:** Use automated OTA mechanisms to deploy updates to one or more devices. - You can use a CI/CD pipeline to update mobile applications. - You can use AWS IoT Device Management to remotely manage connected devices at scale. 6. **Use managed device farms:** To test new features and updates, use managed device farms with representative sets of hardware and iterate development to maximize the devices supported. For more details, see SUS06-BP05 *Use managed device farms for testing*. 7. **Continue to monitor and improve:** Track the energy usage of devices to identify areas for improvement. Use new technologies or best practices to reduce the environmental impact of these devices.
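The minimal payload-trimming sketch below illustrates step 3: keep only the fields the device needs and compress the response before sending it over a constrained link. The field names and allow-list are hypothetical examples.

```python
"""Sketch: trim and compress an API payload before sending it to constrained devices."""
import gzip
import json


def build_device_payload(items, fields=("id", "state")):
    """Return a gzip-compressed JSON body containing only device-relevant fields."""
    trimmed = [{key: item[key] for key in fields if key in item} for item in items]
    body = json.dumps(trimmed, separators=(",", ":")).encode("utf-8")
    return gzip.compress(body)  # devices decompress with standard gzip support


payload = build_device_payload(
    [{"id": 1, "state": "on", "debug_trace": "..." * 100}, {"id": 2, "state": "off"}]
)
print(f"compressed payload: {len(payload)} bytes")
```

Smaller payloads mean less radio time on battery-powered devices and less data moved across the network.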

๐Ÿ’ผ SUS03-BP05 Use software patterns and architectures that best support data access and storage patterns

Understand how data is used within your workload, consumed by your users, transferred, and stored. Use software patterns and architectures that best support data access and storage to minimize the compute, networking, and storage resources required to support the workload. **Common anti-patterns:** - You assume that all workloads have similar data storage and access patterns. - You only use one tier of storage, assuming all workloads fit within that tier. - You assume that data access patterns will stay consistent over time. - Your architecture supports a potential high data access burst, which results in the resources remaining idle most of the time. **Benefits of establishing this best practice:** Selecting and optimizing your architecture based on data access and storage patterns will help decrease development complexity and increase overall utilization. Understanding when to use global tables, data partitioning, and caching will help you decrease operational overhead and scale based on your workload needs. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance To improve long-term workload sustainability, use architecture patterns that support data access and storage characteristics for your workload. These patterns help you efficiently retrieve and process data. For example, you can use modern data architecture on AWS with purpose-built services optimized for your unique analytics use cases. These architecture patterns allow for efficient data processing and reduce the resource usage. ### Implementation steps 1. **Understand data characteristics:** Analyze your data characteristics and access patterns to identify the correct configuration for your cloud resources. Key characteristics to consider include: - **Data type:** structured, semi-structured, unstructured - **Data growth:** bounded, unbounded - **Data durability:** persistent, ephemeral, transient - **Access patterns:** reads or writes, update frequency, spiky, or consistent 2. **Use optimal architecture patterns:** Use architecture patterns that best support data access and storage patterns. 3. **Use purpose-built services:** Use technologies that are fit-for-purpose. - Use technologies that work natively with compressed data. - Use purpose-built analytics services for data processing in your architecture. - Use the database engine that best supports your dominant query pattern. Manage your database indexes for efficient querying. 4. **Minimize data transfer:** Select network protocols that reduce the amount of network capacity consumed in your architecture.

๐Ÿ’ผ SUS04-BP01 Implement a data classification policy

Classify data to understand its criticality to business outcomes and choose the right energy-efficient storage tier to store the data. **Common anti-patterns:** - You do not identify data assets with similar characteristics (such as sensitivity, business criticality, or regulatory requirements) that are being processed or stored. - You have not implemented a data catalog to inventory your data assets. **Benefits of establishing this best practice:** Implementing a data classification policy allows you to determine the most energy-efficient storage tier for data. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Data classification involves identifying the types of data that are being processed and stored in an information system owned or operated by an organization. It also involves making a determination on the criticality of the data and the likely impact of a data compromise, loss, or misuse. Implement a data classification policy by working backwards from the contextual use of the data and creating a categorization scheme that takes into account the level of criticality of a given dataset to an organization's operations. ### Implementation steps 1. **Perform data inventory:** Conduct an inventory of the various data types that exist for your workload. 2. **Group data:** Determine criticality, confidentiality, integrity, and availability of data based on risk to the organization. Use these requirements to group data into one of the data classification tiers that you adopt. As an example, see *Four simple steps to classify your data and secure your startup*. 3. **Define data classification levels and policies:** For each data group, define a data classification level (for example, public or confidential) and handling policies. Tag data accordingly (a minimal tagging sketch follows this section). For more detail on data classification categories, see *Data Classification whitepaper*. 4. **Periodically review:** Periodically review and audit your environment for untagged and unclassified data. Use automation to identify this data, and classify and tag the data appropriately. As an example, see *Data Catalog and crawlers in AWS Glue*. 5. **Establish a data catalog:** Establish a data catalog that provides audit and governance capabilities. 6. **Documentation:** Document data classification policies and handling procedures for each data class.
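The minimal tagging sketch below applies a classification tag to an existing S3 object. The bucket, key, and tag values are placeholders; your classification scheme and tag keys will differ.

```python
"""Sketch: tag an S3 object with its data classification."""
import boto3

s3 = boto3.client("s3")

s3.put_object_tagging(
    Bucket="example-data-bucket",          # placeholder bucket
    Key="exports/customers-2024.csv",      # placeholder object key
    Tagging={
        "TagSet": [
            {"Key": "DataClassification", "Value": "confidential"},
            {"Key": "RetentionClass", "Value": "7y"},
        ]
    },
)
```

Classification tags like these can then drive lifecycle rules, access policies, and storage-tier decisions automatically.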

๐Ÿ’ผ SUS04-BP02 Use technologies that support data access and storage patterns

Use storage technologies that best support how your data is accessed and stored to minimize the resources provisioned while supporting your workload. **Common anti-patterns:** - You assume that all workloads have similar data storage and access patterns. - You only use one tier of storage, assuming all workloads fit within that tier. - You assume that data access patterns will stay consistent over time. **Benefits of establishing this best practice:** Selecting and optimizing your storage technologies based on data access and storage patterns will help you reduce the required cloud resources to meet your business needs and improve the overall efficiency of cloud workload. **Level of risk exposed if this best practice is not established:** Low ## Implementation guidance Select the storage solution that aligns best to your access patterns, or consider changing your access patterns to align with the storage solution to maximize performance efficiency. ### Implementation steps 1. **Evaluate data and access characteristics:** Evaluate your data characteristics and access pattern to collect the key characteristics of your storage needs. Key characteristics to consider include: - **Data type:** structured, semi-structured, unstructured - **Data growth:** bounded, unbounded - **Data durability:** persistent, ephemeral, transient - **Access patterns:** reads or writes, frequency, spiky, or consistent 2. **Choose the right storage technology:** Migrate data to the appropriate storage technology that supports your data characteristics and access pattern. Here are some examples of AWS storage technologies and their key characteristics: - **Object storage:** Amazon S3 An object storage service with unlimited scalability, high availability, and multiple options for accessibility. Transferring and accessing objects in and out of Amazon S3 can use a service, such as Transfer Acceleration or Access Points, to support your location, security needs, and access patterns. - **Archiving storage:** Amazon Glacier Storage class of Amazon S3 built for data-archiving. - **Shared file system:** Amazon Elastic File System (Amazon EFS) Mountable file system that can be accessed by multiple types of compute solutions. Amazon EFS automatically grows and shrinks storage and is performance-optimized to deliver consistent low latencies. - **Shared file system:** Amazon FSx Built on the latest AWS compute solutions to support four commonly used file systems: NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. Amazon FSx latency, throughput, and IOPS vary per file system and should be considered when selecting the right file system for your workload needs. - **Block storage:** Amazon Elastic Block Store (Amazon EBS) Scalable, high-performance block-storage service designed for Amazon EC2. Amazon EBS includes SSD-backed storage for transactional, IOPS-intensive workloads and HDD-backed storage for throughput-intensive workloads. - **Relational database:** Amazon Aurora, Amazon RDS, Amazon Redshift Designed to support ACID transactions and maintain referential integrity and strong data consistency. Many traditional applications, ERP, CRM, and ecommerce systems use relational databases to store their data. - **Key-value database:** Amazon DynamoDB Optimized for common access patterns, typically to store and retrieve large volumes of data. High-traffic web apps, ecommerce systems, and gaming applications are typical use-cases for key-value databases. 3. 
**Automate storage allocation:** For storage systems that are a fixed size, such as Amazon EBS or Amazon FSx, monitor the available storage space and automate storage allocation on reaching a threshold. You can leverage Amazon CloudWatch to collect and analyze different metrics for Amazon EBS and Amazon FSx. 4. **Choose the right storage class:** Choose the appropriate storage class for your data. - Amazon S3 storage classes can be configured at the object level. A single bucket can contain objects stored across all of the storage classes. - You can use Amazon S3 Lifecycle policies to automatically transition objects between storage classes or remove data without any application changes. In general, you have to make a trade-off between resource efficiency, access latency, and reliability when considering these storage mechanisms.
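As a hedged illustration of step 4 (choosing the right storage class), the sketch below uploads a rarely accessed archive directly into a colder S3 storage class instead of defaulting to S3 Standard. The bucket, key, and choice of Glacier Instant Retrieval are examples only.

```python
"""Sketch: store an infrequently read object in a colder S3 storage class at upload time."""
import boto3

s3 = boto3.client("s3")

with open("audit-archive-2023.tar.gz", "rb") as archive:
    s3.put_object(
        Bucket="example-archive-bucket",              # placeholder bucket
        Key="audits/audit-archive-2023.tar.gz",
        Body=archive,
        # Match the storage class to the access pattern; Glacier classes
        # suit archives that are rarely retrieved.
        StorageClass="GLACIER_IR",
    )
```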

๐Ÿ’ผ SUS04-BP03 Use policies to manage the lifecycle of your datasets

Manage the lifecycle of all of your data and automatically enforce deletion to minimize the total storage required for your workload. **Common anti-patterns:** - You manually delete data. - You do not delete any of your workload data. - You do not move data to more energy-efficient storage tiers based on its retention and access requirements. **Benefits of establishing this best practice:** Using data lifecycle policies ensures efficient data access and retention in a workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Datasets usually have different retention and access requirements during their lifecycle. For example, your application may need frequent access to some datasets for a limited period of time. After that, those datasets are infrequently accessed. To improve the efficiency of data storage and computation over time, implement lifecycle policies, which are rules that define how data is handled over time. With lifecycle configuration rules, you can tell the specific storage service to transition a dataset to more energy-efficient storage tiers, archive it, or delete it. This practice minimizes active data storage and retrieval, which leads to lower energy consumption. In addition, practices such as archiving or deleting obsolete data support regulatory compliance and data governance. ### Implementation steps 1. **Use data classification:** Classify datasets in your workload. 2. **Define handling rules:** Define handling procedures for each data class. 3. **Enable automation:** Set automated lifecycle policies to enforce lifecycle rules. Here are some examples of how to set up automated lifecycle policies for different AWS storage services: - **Amazon S3:** You can use Amazon S3 Lifecycle to manage your objects throughout their lifecycle. If your access patterns are unknown, changing, or unpredictable, you can use Amazon S3 Intelligent-Tiering, which monitors access patterns and automatically moves objects that have not been accessed to lower-cost access tiers. You can leverage Amazon S3 Storage Lens metrics to identify optimization opportunities and gaps in lifecycle management. - **Amazon Elastic Block Store:** You can use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of Amazon EBS snapshots and Amazon EBS-backed AMIs. - **Amazon Elastic File System:** Amazon EFS lifecycle management automatically manages file storage for your file systems. - **Amazon Elastic Container Registry:** Amazon ECR lifecycle policies automate the cleanup of your container images by expiring images based on age or count. - **AWS Elemental MediaStore:** You can use an object lifecycle policy that governs how long objects should be stored in the MediaStore container. 4. **Delete unused assets:** Delete unused volumes, snapshots, and data that is out of its retention period. Use native service features like Amazon DynamoDB Time To Live or Amazon CloudWatch log retention for deletion. 5. **Aggregate and compress:** Aggregate and compress data where applicable based on lifecycle rules.
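The following is a minimal sketch of an automated S3 lifecycle policy, assuming an example bucket, prefix, and transition schedule; adjust the day counts and storage classes to your retention requirements.

```python
"""Sketch: an S3 lifecycle rule that tiers objects down and then expires them."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```

Once the rule is in place, no application changes are needed: objects move to more energy-efficient tiers and are deleted on schedule.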

๐Ÿ’ผ SUS04-BP04 Use elasticity and automation to expand block storage or file system

Use elasticity and automation to expand block storage or file system as data grows to minimize the total provisioned storage. **Common anti-patterns:** - You procure large block storage or file system for future need. - You overprovision the input and output operations per second (IOPS) of your file system. - You do not monitor the utilization of your data volumes. **Benefits of establishing this best practice:** Minimizing over-provisioning for storage system reduces the idle resources and improves the overall efficiency of your workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Create block storage and file systems with size allocation, throughput, and latency that are appropriate for your workload. Use elasticity and automation to expand block storage or file system as data grows without having to over-provision these storage services. ### Implementation steps 1. For fixed size storage like Amazon EBS, verify that you are monitoring the amount of storage used versus the overall storage size and create automation, if possible, to increase the storage size when reaching a threshold. 2. Use elastic volumes and managed block data services to automate allocation of additional storage as your persistent data grows. As an example, you can use Amazon EBS Elastic Volumes to change volume size, volume type, or adjust the performance of your Amazon EBS volumes. 3. Choose the right storage class, performance mode, and throughput mode for your file system to address your business need, not exceeding that. 4. Set target levels of utilization for your data volumes, and resize volumes outside of expected ranges. 5. Right size read-only volumes to fit the data. 6. Migrate data to object stores to avoid provisioning the excess capacity from fixed volume sizes on block storage. 7. Regularly review elastic volumes and file systems to terminate idle volumes and shrink over-provisioned resources to fit the current data size.
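The sketch below illustrates steps 1 and 2: grow an EBS volume in a modest increment when utilization crosses a threshold, instead of pre-provisioning a large volume. The volume ID is a placeholder, and how `used_percent` is obtained (for example, from CloudWatch agent disk metrics) is an assumption outside this sketch.

```python
"""Sketch: grow an EBS volume when used space crosses a threshold."""
import boto3

ec2 = boto3.client("ec2")


def grow_if_needed(volume_id: str, used_percent: float, threshold: float = 80.0) -> None:
    """Request a ~20% size increase once utilization crosses the threshold."""
    if used_percent < threshold:
        return
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    new_size = int(volume["Size"] * 1.2) + 1  # small increment, not a large buffer
    ec2.modify_volume(VolumeId=volume_id, Size=new_size)
    print(f"{volume_id}: requested resize from {volume['Size']} GiB to {new_size} GiB")


# Example: a disk-used percentage reported by your monitoring pipeline.
grow_if_needed("vol-0123456789abcdef0", used_percent=86.0)
```

Note that the file system on the instance still needs to be extended after the volume modification completes.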

๐Ÿ’ผ SUS04-BP05 Remove unneeded or redundant data

Remove unneeded or redundant data to minimize the storage resources required to store your datasets. **Common anti-patterns:** - You duplicate data that can be easily obtained or recreated. - You back up all data without considering its criticality. - You only delete data irregularly, on operational events, or not at all. - You store data redundantly irrespective of the storage service's durability. - You turn on Amazon S3 versioning without any business justification. **Benefits of establishing this best practice:** Removing unneeded data reduces the storage size required for your workload and its environmental impact. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance When you remove unneeded and redundant datasets, you can reduce storage cost and environmental footprint. This practice may also make computing more efficient, as compute resources only process important data instead of unneeded data. Automate the deletion of unneeded data. Use technologies that deduplicate data at the file and block level. Use service features for native data replication and redundancy. ### Implementation steps 1. Evaluate public datasets: Evaluate if you can avoid storing data by using existing publicly available datasets in AWS Data Exchange and Open Data on AWS. 2. Deduplicate data: Use mechanisms that can deduplicate data at the block and object level. Here are some examples of how to deduplicate data on AWS: | Storage service | Deduplication mechanism | | --------------- | ----------------------- | | Amazon S3 | Use AWS Lake Formation FindMatches to find matching records across a dataset (including ones without identifiers) by using the FindMatches ML Transform. | | Amazon FSx | Use data deduplication on Amazon FSx for Windows. | | Amazon Elastic Block Store snapshots | Snapshots are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved.| 3. Use lifecycle policies: Use lifecycle policies to automate unneeded data deletion. Use native service features like Amazon DynamoDB Time To Live, Amazon S3 Lifecycle, or Amazon CloudWatch log retention for deletion (a minimal example follows this section). 4. Use data virtualization: Use data virtualization capabilities on AWS to maintain data at its source and avoid data duplication. 5. Use incremental backup: Use backup technology that can make incremental backups. 6. Use native durability: Leverage the durability of Amazon S3 and replication of Amazon EBS to meet your durability goals instead of self-managed technologies (such as a redundant array of independent disks (RAID)). 7. Use efficient logging: Centralize log and trace data, deduplicate identical log entries, and establish mechanisms to tune verbosity when needed. 8. Use efficient caching: Pre-populate caches only where justified. 9. Monitor caches: Establish cache monitoring and automation to resize the cache accordingly. 10. Remove old version assets: Remove out-of-date deployments and assets from object stores and edge caches when pushing new versions of your workload.
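The minimal example below enables two of the native expiration features mentioned in step 3: DynamoDB Time to Live and CloudWatch Logs retention. The table name, TTL attribute, log group, and retention period are placeholders.

```python
"""Sketch: let AWS services expire unneeded data automatically."""
import boto3

dynamodb = boto3.client("dynamodb")
logs = boto3.client("logs")

# Expire session items automatically once their epoch timestamp passes.
dynamodb.update_time_to_live(
    TableName="user-sessions",  # placeholder table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Keep application logs for 30 days instead of retaining them forever.
logs.put_retention_policy(logGroupName="/app/example-service", retentionInDays=30)
```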

๐Ÿ’ผ SUS04-BP06 Use shared file systems or storage to access common data

Adopt shared file systems or storage to avoid data duplication and allow for more efficient infrastructure for your workload. **Common anti-patterns** - You provision storage for each individual client. - You do not detach data volume from inactive clients. - You do not provide access to storage across platforms and systems. **Benefits of establishing this best practice** Using shared file systems or storage allows for sharing data to one or more consumers without having to copy the data. This helps to reduce the storage resources required for the workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance If you have multiple users or applications accessing the same datasets, using shared storage technology is crucial to use efficient infrastructure for your workload. Shared storage technology provides a central location to store and manage datasets and avoid data duplication. It also enforces consistency of the data across different systems. Moreover, shared storage technology allows for more efficient use of compute power, as multiple compute resources can access and process data at the same time in parallel. Fetch data from these shared storage services only as needed and detach unused volumes to free up resources. ### Implementation steps 1. Use shared storage: Migrate data to shared storage when the data has multiple consumers. Here are some examples of shared storage technology on AWS: - **Amazon EBS Multi-Attach** Amazon EBS Multi-Attach allows you to attach a single Provisioned IOPS SSD (io1 or io2) volume to multiple instances that are in the same Availability Zone. - **Amazon EFS** See When to Choose Amazon EFS. - **Amazon FSx** See Choosing an Amazon FSx File System. - **Amazon S3** Applications that do not require a file system structure and are designed to work with object storage can use Amazon S3 as a massively scalable, durable, low-cost object storage solution. 2. Fetch data as needed: Copy data to or fetch data from shared file systems only as needed. As an example, you can create an Amazon FSx for Lustre file system backed by Amazon S3 and only load the subset of data required for processing jobs to Amazon FSx. 3. Delete unneeded data: Delete data as appropriate for your usage patterns as outlined in SUS04-BP03 Use policies to manage the lifecycle of your datasets. 4. Detach inactive clients: Detach volumes from clients that are not actively using them.

๐Ÿ’ผ SUS04-BP07 Minimize data movement across networks

Use shared file systems or object storage to access common data and minimize the total networking resources required to support data movement for your workload. **Common anti-patterns** - You store all data in the same AWS Region independent of where the data users are. - You do not optimize data size and format before moving it over the network. **Benefits of establishing this best practice** Optimizing data movement across the network reduces the total networking resources required for the workload and lowers its environmental impact. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Moving data around your organization requires compute, networking, and storage resources. Use techniques to minimize data movement and improve the overall efficiency of your workload. ### Implementation steps 1. Use proximity: Consider proximity to the data or users as a decision factor when selecting a Region for your workload. 2. Partition services: Partition regionally consumed services so that their Region-specific data is stored within the Region where it is consumed. 3. Use efficient file formats: Use efficient file formats (such as Parquet or ORC) and compress data before you move it over the network (a minimal conversion sketch follows this section). 4. Minimize data movement: Don't move unused data. Some examples that can help you avoid moving unused data: - Reduce API responses to only relevant data. - Aggregate data where detailed (record-level) information is not required. - See Well-Architected Lab - Optimize Data Pattern Using Amazon Redshift Data Sharing. - Consider Cross-account data sharing in AWS Lake Formation. 5. Use edge services: Use services that can help you run code closer to users of your workload. - **Lambda@Edge** Use for compute-heavy operations that are run when objects are not in the cache. - **CloudFront Functions** Use for simple use cases such as HTTP(S) request/response manipulations that can be initiated by short-lived functions. - **AWS IoT Greengrass** Run local compute, messaging, and data caching for connected devices.
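The minimal conversion sketch below writes row-oriented records as compressed Parquet before they are moved across the network. It assumes the `pyarrow` package is installed, and the record fields are illustrative only.

```python
"""Sketch: convert JSON-like records to compressed Parquet before transfer."""
import pyarrow as pa
import pyarrow.parquet as pq

records = [
    {"device_id": "a1", "reading": 21.4, "ts": 1700000000},
    {"device_id": "a2", "reading": 19.8, "ts": 1700000060},
]

# Columnar layout plus Snappy compression typically shrinks transfer size
# substantially compared to raw JSON or CSV.
table = pa.Table.from_pylist(records)
pq.write_table(table, "readings.parquet", compression="snappy")
```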

๐Ÿ’ผ SUS04-BP08 Back up data only when difficult to recreate

Avoid backing up data that has no business value to minimize the storage resources required for your workload. **Common anti-patterns** - You do not have a backup strategy for your data. - You back up data that can be easily recreated. **Benefits of establishing this best practice** Avoiding backups of non-critical data reduces the required storage resources for the workload and lowers its environmental impact. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Avoiding backups of unnecessary data can help lower cost and reduce the storage resources used by the workload. Only back up data that has business value or is needed to satisfy compliance requirements. Examine backup policies and exclude ephemeral storage that doesn't provide value in a recovery scenario. ### Implementation steps 1. Classify data: Implement a data classification policy as outlined in SUS04-BP01 Implement a data classification policy. 2. Design a backup strategy: Use the criticality from your data classification to design a backup strategy based on your recovery time objective (RTO) and recovery point objective (RPO). Avoid backing up non-critical data. - Exclude data that can be easily recreated. - Exclude ephemeral data from your backups. - Exclude local copies of data, unless the time required to restore that data from a common location exceeds your service-level agreements (SLAs). 3. Use automated backup: Use an automated solution or managed service to back up business-critical data. - **AWS Backup** is a fully managed service that makes it easy to centralize and automate data protection across AWS services, in the cloud, and on premises. For hands-on guidance on how to create automated backups using AWS Backup, see Well-Architected Labs - Testing Backup and Restore of Data. - Automate backups and optimize backup costs for Amazon EFS using AWS Backup.

๐Ÿ’ผ SUS05-BP01 Use the minimum amount of hardware to meet your needs

Use the minimum amount of hardware for your workload to efficiently meet your business needs. **Common anti-patterns** - You do not monitor resource utilization. - You have resources with a low utilization level in your architecture. - You do not review the utilization of static hardware to determine if it should be resized. - You do not set hardware utilization goals for your compute infrastructure based on business KPIs. **Benefits of establishing this best practice** Rightsizing your cloud resources helps to reduce a workload's environmental impact, save money, and maintain performance benchmarks. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Select the minimum amount of hardware required for your workload to improve its overall efficiency. The AWS Cloud provides the flexibility to expand or reduce the number of resources dynamically through a variety of mechanisms, such as AWS Auto Scaling, to meet changes in demand. It also provides APIs and SDKs that allow resources to be modified with minimal effort. Use these capabilities to make frequent changes to your workload implementations. Additionally, use rightsizing guidelines from AWS tools to efficiently operate your cloud resources and meet your business needs. ### Implementation steps 1. Choose the instance type: Choose the right instance type to best fit your needs. 2. Scale: Use small increments to scale variable workloads. 3. Use multiple compute purchase options: Balance instance flexibility, scalability, and cost savings with multiple compute purchase options. - **Amazon EC2 On-Demand Instances** are best suited for new, stateful, and spiky workloads which can't be instance type, location, or time flexible. - **Amazon EC2 Spot Instances** are a great way to supplement the other options for applications that are fault-tolerant and flexible. - **Leverage Compute Savings Plans** for steady-state workloads that allow flexibility if your needs (like AZ, Region, instance families, or instance types) change. 4. Use instance and Availability Zone diversity: Maximize application availability and take advantage of excess capacity by diversifying your instances and Availability Zones (a hedged Auto Scaling sketch follows this section). 5. Rightsize instances: Use the rightsizing recommendations from AWS tools to make adjustments on your workload. - Use rightsizing recommendations in AWS Cost Explorer or AWS Compute Optimizer to identify rightsizing opportunities. 6. Negotiate service-level agreements (SLAs): Negotiate SLAs that permit temporarily reducing capacity while automation deploys replacement resources.
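The hedged Auto Scaling sketch below combines steps 2 through 4: small base capacity, several instance types, multiple Availability Zones, and a mix of On-Demand and Spot. The launch template name, subnet IDs, instance types, and ratios are placeholders, not prescribed values.

```python
"""Sketch: an Auto Scaling group mixing On-Demand and Spot capacity across instance types."""
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-fleet",
    MinSize=2,
    MaxSize=12,
    DesiredCapacity=2,  # start small and scale in small increments
    VPCZoneIdentifier="subnet-0aaa0000,subnet-0bbb1111",  # subnets in multiple AZs
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "web-fleet-template",  # placeholder template
                "Version": "$Latest",
            },
            # Instance-type flexibility helps the group use spare capacity efficiently.
            "Overrides": [
                {"InstanceType": "m7g.large"},
                {"InstanceType": "m6g.large"},
                {"InstanceType": "m6i.large"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,
            "OnDemandPercentageAboveBaseCapacity": 25,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    },
)
```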

๐Ÿ’ผ SUS05-BP02 Use instance types with the least impact

Continually monitor and use new instance types to take advantage of energy efficiency improvements. **Common anti-patterns** - You are only using one family of instances. - You are only using x86 instances. - You specify one instance type in your Amazon EC2 Auto Scaling configuration. - You use AWS instances in a manner that they were not designed for (for example, you use compute-optimized instances for a memory-intensive workload). - You do not evaluate new instance types regularly. - You do not check recommendations from AWS rightsizing tools such as AWS Compute Optimizer. **Benefits of establishing this best practice** By using energy-efficient and right-sized instances, you are able to greatly reduce the environmental impact and cost of your workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Using efficient instances in cloud workload is crucial for lower resource usage and cost-effectiveness. Continually monitor the release of new instance types and take advantage of energy efficiency improvements, including those instance types designed to support specific workloads such as machine learning training and inference, and video transcoding. ### Implementation steps 1. Learn and explore instance types: Find instance types that can lower your workload's environmental impact. - Subscribe to What's New with AWS to stay up-to-date with the latest AWS technologies and instances. - Learn about different AWS instance types. - Learn about AWS Graviton-based instances which offer the best performance per watt of energy use in Amazon EC2 by watching re:Invent 2020 - Deep dive on AWS Graviton2 processor-powered Amazon EC2 instances and Deep dive into AWS Graviton3 and Amazon EC2 C7g instances. 2. Use instance types with the least impact: Plan and transition your workload to instance types with the least impact. - Define a process to evaluate new features or instances for your workload. Take advantage of agility in the cloud to quickly test how new instance types can improve your workload environmental sustainability. Use proxy metrics to measure how many resources it takes you to complete a unit of work. - If possible, modify your workload to work with different numbers of vCPUs and different amounts of memory to maximize your choice of instance type. - Consider transitioning your workload to Graviton-based instances to improve the performance efficiency of your workload. - Consider selecting the AWS Graviton option in your usage of AWS managed services. - Migrate your workload to Regions that offer instances with the least sustainability impact and still meet your business requirements. - For machine learning workloads, take advantage of purpose-built hardware that is specific to your workload such as AWS Trainium, AWS Inferentia, and Amazon EC2 DL1. AWS Inferentia instances such as Inf2 instances offer up to 50% better performance per watt over comparable Amazon EC2 instances. - Use Amazon SageMaker AI Inference Recommender to right size ML inference endpoint. - For spiky workloads (workloads with infrequent requirements for additional capacity), use burstable performance instances. - For stateless and fault-tolerant workloads, use Amazon EC2 Spot Instances to increase overall utilization of the cloud, and reduce the sustainability impact of unused resources. 3. Operate and optimize: Operate and optimize your workload instance. 
- For ephemeral workloads, evaluate instance Amazon CloudWatch metrics such as CPUUtilization to identify if the instance is idle or under-utilized. - For stable workloads, check AWS rightsizing tools such as AWS Compute Optimizer at regular intervals to identify opportunities to optimize and right-size the instances.

๐Ÿ’ผ SUS05-BP03 Use managed services

Use managed services to operate more efficiently in the cloud. **Common anti-patterns** - You use Amazon EC2 instances with low utilization to run your applications. - Your in-house team only manages the workload, without time to focus on innovation or simplifications. - You deploy and maintain technologies for tasks that can run more efficiently on managed services. **Benefits of establishing this best practice** - Using managed services shifts the responsibility to AWS, which has insights across millions of customers that can help drive new innovations and efficiencies. - Managed services distribute the environmental impact of the service across many users because of their multi-tenant control planes. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance Managed services shift responsibility to AWS for maintaining high utilization and sustainability optimization of the deployed hardware. Managed services also remove the operational and administrative burden of maintaining a service, which allows your team to have more time and focus on innovation. Review your workload to identify the components that can be replaced by AWS managed services. For example, Amazon RDS, Amazon Redshift, and Amazon ElastiCache provide managed database services. Amazon Athena, Amazon EMR, and Amazon OpenSearch Service provide managed analytics services. ### Implementation steps 1. Inventory your workload: Inventory your workload for services and components. 2. Identify candidates: Assess and identify components that can be replaced by managed services. Here are some examples of when you might consider using a managed service: - **Hosting a database** Use managed Amazon Relational Database Service (Amazon RDS) instances instead of maintaining your own database instances on Amazon Elastic Compute Cloud (Amazon EC2). - **Hosting a container workload** Use AWS Fargate instead of implementing your own container infrastructure. - **Hosting web apps** Use AWS Amplify Hosting as a fully managed CI/CD and hosting service for static websites and server-side rendered web apps. 3. Create a migration plan: Identify dependencies and create a migration plan. Update runbooks and playbooks accordingly. - AWS Application Discovery Service automatically collects and presents detailed information about application dependencies and utilization to help you make more informed decisions as you plan your migration. 4. Perform tests: Test the service before migrating to the managed service. 5. Replace self-hosted services: Use your migration plan to replace self-hosted services with managed services. 6. Monitor and adjust: Continually monitor the service after the migration is complete to make adjustments as required and optimize the service.

๐Ÿ’ผ SUS05-BP04 Optimize your use of hardware-based compute accelerators

Optimize your use of accelerated computing instances to reduce the physical infrastructure demands of your workload. **Common anti-patterns** - You are not monitoring GPU usage. - You are using a general-purpose instance for your workload when a purpose-built instance can deliver higher performance, lower cost, and better performance per watt. - You are using hardware-based compute accelerators for tasks that run more efficiently on CPU-based alternatives. **Benefits of establishing this best practice** By optimizing the use of hardware-based accelerators, you can reduce the physical-infrastructure demands of your workload. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance If you require high processing capability, you can benefit from using accelerated computing instances, which provide access to hardware-based compute accelerators such as graphics processing units (GPUs) and field programmable gate arrays (FPGAs). These hardware accelerators perform certain functions like graphic processing or data pattern matching more efficiently than CPU-based alternatives. Many accelerated workloads, such as rendering, transcoding, and machine learning, are highly variable in terms of resource usage. Only run these accelerators for the time needed, and decommission them with automation when not required to minimize resources consumed. ### Implementation steps 1. Explore compute accelerators: Identify which accelerated computing instances can address your requirements. 2. Use purpose-built hardware: For machine learning workloads, take advantage of purpose-built hardware that is specific to your workload, such as AWS Trainium, AWS Inferentia, and Amazon EC2 DL1. AWS Inferentia instances such as Inf2 instances offer up to 50% better performance per watt over comparable Amazon EC2 instances. 3. Monitor usage metrics: Collect usage metrics for your accelerated computing instances. For example, you can use the CloudWatch agent to collect metrics such as utilization_gpu and utilization_memory for your GPUs as shown in Collect NVIDIA GPU metrics with Amazon CloudWatch (a hedged monitoring sketch follows this section). 4. Rightsize: Optimize the code, network operation, and settings of hardware accelerators to make sure that the underlying hardware is fully utilized. - Optimize GPU settings - GPU Monitoring and Optimization in the Deep Learning AMI - Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker AI 5. Keep up to date: Use the latest high-performance libraries and GPU drivers. 6. Release unneeded instances: Use automation to release GPU instances when not in use.
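The hedged monitoring sketch below checks average GPU utilization collected by the CloudWatch agent and flags an accelerated instance that may be idle. The `CWAgent` namespace, the `nvidia_smi_utilization_gpu` metric name, the instance ID, and the 10% threshold are all assumptions that depend on your agent configuration.

```python
"""Sketch: flag a GPU instance whose average utilization stays low."""
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder GPU instance

end = datetime.now(timezone.utc)
start = end - timedelta(days=3)

datapoints = cloudwatch.get_metric_statistics(
    Namespace="CWAgent",                      # assumption: default agent namespace
    MetricName="nvidia_smi_utilization_gpu",  # assumption: agent's NVIDIA GPU metric
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=start,
    EndTime=end,
    Period=3600,
    Statistics=["Average"],
)["Datapoints"]

if datapoints:
    avg = sum(p["Average"] for p in datapoints) / len(datapoints)
    if avg < 10.0:
        print(f"{INSTANCE_ID}: GPU averaged {avg:.1f}% - consider stopping or downsizing")
```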

๐Ÿ’ผ SUS06-BP01 Communicate and cascade your sustainability goals

Technology is a key enabler of sustainability. IT teams play a crucial role in driving meaningful change towards your organization's sustainability goals. These teams should clearly understand the company's sustainability targets and work to communicate and cascade those priorities across its operations. **Common anti-patterns** - You don't know your organization's sustainability goals and how they apply to your team. - You have insufficient awareness and training about the environmental impact of cloud workloads. - You are unsure about the specific areas to prioritize. - You do not involve your employees and customers in your sustainability initiatives. **Benefits of establishing this best practice** From optimization of infrastructure and systems to use of innovative technologies, IT teams can reduce the organization's carbon emissions and minimize resource consumption. Communication of sustainability goals can provide the ability for IT teams to continuously improve and adapt to evolving sustainability challenges. Additionally, these sustainable optimizations often translate to cost savings as well, which strengthens the business case. **Level of risk exposed if this best practice is not established:** Medium ## Implementation guidance The primary sustainability goals for IT teams should be to optimize systems and solutions to increase resource efficiency and minimize the organization's carbon footprint and overall environmental impact. Shared services and initiatives like training programs and operational dashboards can support organizations as they optimize IT operations and build solutions that can help significantly reduce the carbon footprint. The cloud presents an opportunity not only to move physical infrastructure and energy procurement responsibilities to the shared responsibility of the cloud provider but also to continuously optimize the resource efficiency of cloud-based services. When teams use the cloud's inherent efficiency and shared responsibility model, they can drive meaningful reductions in the organization's environmental impact. This, in turn, can contribute to the organization's overall sustainability goals and demonstrate the value of these teams as strategic partners in the journey towards a more sustainable future. ### Implementation steps 1. Define goals and objectives: Establish well-defined goals for your IT program. This involves getting input from responsible stakeholders from different departments such as IT, sustainability, and finance. These teams should define measurable goals that align with your organization's sustainability goals, including areas such as carbon reduction and resource optimization. 2. Understand the carbon accounting boundaries of your business: Understand how carbon accounting methods like the Greenhouse Gas (GHG) Protocol relate to your workloads in the cloud. 3. Use cloud solutions for carbon accounting: Use cloud solutions such as carbon accounting solutions on AWS to track Scope 1, 2, and 3 GHG emissions across your operations, portfolios, and value chains. With these solutions, organizations can streamline GHG emission data acquisition, simplify reporting, and derive insights to inform their climate strategies. 4. Monitor the carbon footprint of your IT portfolio: Track and report carbon emissions of your IT systems. Use the AWS Customer Carbon Footprint Tool to track, measure, review, and forecast the carbon emissions generated from your AWS usage. 5. Communicate resource usage through proxy metrics to your teams: Track and report on your resource usage through proxy metrics. In the on-demand pricing models of the cloud, resource usage is related to cost, which is a generally understood metric. At a minimum, use cost as a proxy metric to communicate the resource usage and improvements by each team (a minimal reporting sketch follows this section). - Enable hourly granularity in your Cost Explorer and create a Cost and Usage Report (CUR): The CUR provides daily or hourly usage granularity, rates, costs, and usage attributes for all AWS services. Use the Cloud Intelligence Dashboards, including the Sustainability Proxy Metrics Dashboard, as a starting point for the processing and visualization of cost and usage data. 6. Continuously optimize and evaluate: Use an improvement process to continuously optimize your IT systems, including cloud workloads, for efficiency and sustainability. Monitor your carbon footprint before and after implementing an optimization strategy. Use the reduction in carbon footprint to assess the effectiveness. 7. Foster a sustainability culture: Use training programs (like AWS Skill Builder) to educate your employees about sustainability. Engage them in sustainability initiatives. Share and celebrate their success stories. Use incentives to reward them when they achieve sustainability targets.
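The minimal reporting sketch below uses cost grouped by a team cost-allocation tag as a proxy for resource usage, as described in step 5. It assumes a `team` tag has been activated for cost allocation; the tag key and date range are placeholders.

```python
"""Sketch: report monthly cost per team tag as a proxy for resource usage."""
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "team"}],  # assumes an activated 'team' tag
)

for group in response["ResultsByTime"][0]["Groups"]:
    team = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{team}: ${amount:,.2f} this month (proxy for resource usage)")
```

Sharing a report like this regularly gives each team a simple, comparable signal of its resource footprint and the effect of efficiency improvements.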

๐Ÿ’ผ SUS06-BP02 Adopt methods that can rapidly introduce sustainability improvements

Adopt methods and processes to validate potential improvements, minimize testing costs, and deliver small improvements.

**Common anti-patterns**

- Reviewing your application for sustainability is a task done only once at the beginning of a project.
- Your workload has become stale, as the release process is too cumbersome to introduce minor changes for resource efficiency.
- You do not have mechanisms to improve your workload for sustainability.

**Benefits of establishing this best practice** By establishing a process to introduce and track sustainability improvements, you will be able to continually adopt new features and capabilities, remove issues, and improve workload efficiency.

**Level of risk exposed if this best practice is not established:** Medium

## Implementation guidance

Test and validate potential sustainability improvements before deploying them to production. Account for the cost of testing when calculating the potential future benefit of an improvement. Develop low-cost testing methods to deliver small improvements.

### Implementation steps

1. Understand and communicate your organizational sustainability goals: Understand your organizational sustainability goals, such as carbon reduction or water stewardship. Translate these goals into sustainability requirements for your cloud workloads. Communicate these requirements to key stakeholders.
2. Add sustainability requirements to your backlog: Add requirements for sustainability improvement to your development backlog.
3. Iterate and improve: Use an iterative improvement process to identify, evaluate, prioritize, test, and deploy these improvements.
4. Test using a minimum viable product (MVP): Develop and test potential improvements using the minimum viable representative components to reduce the cost and environmental impact of testing.
5. Streamline the process: Continually improve and streamline your development processes. For example, automate your software delivery process using continuous integration and delivery (CI/CD) pipelines to test and deploy potential improvements, reducing the level of effort and limiting errors caused by manual processes (see the sketch after these steps).
6. Training and awareness: Run training programs for your team members to educate them about sustainability and how their activities impact your organizational sustainability goals.
7. Assess and adjust: Continually assess the impact of improvements and make adjustments as needed.
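One hypothetical way to make the CI/CD step concrete is a small pipeline gate that compares a sustainability proxy metric for a candidate change against the current baseline and fails the build on a regression. The metric name (vCPU-hours per 1,000 requests), the example values, and the tolerance below are all assumptions for illustration; the measurements themselves would come from your own tooling.

```python
# Sketch: a hypothetical CI/CD gate that compares a sustainability proxy metric
# (here, vCPU-hours per 1,000 requests) for a candidate change against a baseline.
# Metric values would come from your own measurement tooling; the numbers and
# tolerance below are illustrative assumptions.
import sys


def check_proxy_metric(baseline: float, candidate: float, tolerance: float = 0.05) -> bool:
    """Return True if the candidate does not regress beyond the allowed tolerance."""
    allowed = baseline * (1.0 + tolerance)
    return candidate <= allowed


if __name__ == "__main__":
    baseline_vcpu_hours = 1.20   # measured on the current production version
    candidate_vcpu_hours = 1.10  # measured on the proposed improvement (MVP test)

    if check_proxy_metric(baseline_vcpu_hours, candidate_vcpu_hours):
        print("Proxy metric OK: candidate uses no more resources than allowed.")
    else:
        print("Proxy metric regression detected: blocking deployment.")
        sys.exit(1)
```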

๐Ÿ’ผ SUS06-BP03 Keep your workload up-to-date

Keep your workload up-to-date to adopt efficient features, remove issues, and improve the overall efficiency of your workload.

**Common anti-patterns**

- You assume your current architecture is static and will not be updated over time.
- You do not have any systems or a regular cadence to evaluate if updated software and packages are compatible with your workload.

**Benefits of establishing this best practice** By establishing a process to keep your workload up to date, you can adopt new features and capabilities, resolve issues, and improve workload efficiency.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Up-to-date operating systems, runtimes, middleware, libraries, and applications can improve workload efficiency and make it easier to adopt more efficient technologies. Up-to-date software might also include features to measure the sustainability impact of your workload more accurately, as vendors deliver features to meet their own sustainability goals. Adopt a regular cadence to keep your workload up to date with the latest features and releases.

### Implementation steps

1. Define a process: Use a process and schedule to evaluate new features or instances for your workload. Take advantage of agility in the cloud to quickly test how new features can improve your workload to:
   - Reduce sustainability impacts.
   - Gain performance efficiencies.
   - Remove barriers for a planned improvement.
   - Improve your ability to measure and manage sustainability impacts.
2. Conduct an inventory: Inventory your workload software and architecture and identify components that need to be updated.
   - You can use AWS Systems Manager Inventory to collect operating system (OS), application, and instance metadata from your Amazon EC2 instances and quickly understand which instances are running the software and configurations required by your software policy and which instances need to be updated (see the sketch after these steps).
3. Learn the update procedure: Understand how to update the components of your workload.
   - **Machine images:** Use EC2 Image Builder to manage updates to Amazon Machine Images (AMIs) for Linux or Windows server images.
   - **Container images:** Use Amazon Elastic Container Registry (Amazon ECR) with your existing pipeline to manage Amazon Elastic Container Service (Amazon ECS) images.
   - **AWS Lambda:** AWS Lambda includes version management features.
4. Use automation: Automate updates to reduce the level of effort to deploy new features and limit errors caused by manual processes.
   - You can use CI/CD to automatically update AMIs, container images, and other artifacts related to your cloud application.
   - You can use tools such as AWS Systems Manager Patch Manager to automate the process of system updates, and schedule the activity using AWS Systems Manager Maintenance Windows.
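As a minimal sketch of the inventory step, the following Python snippet uses boto3 to call AWS Systems Manager and list each managed instance with its platform name, platform version, and SSM Agent version. It assumes your instances are already registered with Systems Manager (SSM Agent installed and an appropriate instance profile attached); comparing the output against your software policy is left to your own tooling.

```python
# Sketch: list managed EC2 instances and their OS details via Systems Manager,
# as a starting point for spotting components that need updates.
# Assumes boto3 credentials with ssm:DescribeInstanceInformation permission and
# instances already registered with Systems Manager.
import boto3

ssm = boto3.client("ssm")

paginator = ssm.get_paginator("describe_instance_information")
for page in paginator.paginate():
    for instance in page["InstanceInformationList"]:
        print(
            instance["InstanceId"],
            instance.get("PlatformName", "unknown"),
            instance.get("PlatformVersion", "unknown"),
            "SSM Agent:", instance.get("AgentVersion", "unknown"),
        )
```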

๐Ÿ’ผ SUS06-BP04 Increase utilization of build environments

Increase the utilization of the resources used to develop, test, and build your workloads.

**Common anti-patterns**

- You manually provision or terminate your build environments.
- You keep your build environments running independent of test, build, or release activities (for example, running an environment outside of the working hours of your development team members).
- You over-provision resources for your build environments.

**Benefits of establishing this best practice** By increasing the utilization of build environments, you can improve the overall efficiency of your cloud workload while allocating the resources builders need to develop, test, and build efficiently.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Use automation and infrastructure as code to bring build environments up when needed and take them down when not used. A common pattern is to schedule periods of availability that coincide with the working hours of your development team members. Your test environments should closely resemble the production configuration. However, look for opportunities to use instance types with burst capacity, Amazon EC2 Spot Instances, automatic scaling database services, containers, and serverless technologies to align development and test capacity with use. Limit data volume to just meet the test requirements. If you use production data in test, explore possibilities of sharing data from production rather than moving data across environments.

### Implementation steps

1. Use infrastructure as code: Use infrastructure as code to provision your build environments.
2. Use automation: Use automation to manage the lifecycle of your development and test environments and maximize the efficiency of your build resources (see the sketch after these steps).
3. Maximize utilization: Use strategies to maximize the utilization of development and test environments.
   - Use minimum viable representative environments to develop and test potential improvements.
   - Use serverless technologies if possible.
   - Use On-Demand Instances to supplement your developer devices.
   - Use instance types with burst capacity, Spot Instances, and other technologies to align build capacity with use.
   - Adopt native cloud services for secure instance shell access rather than deploying fleets of bastion hosts.
   - Automatically scale your build resources depending on your build jobs.
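The snippet below is a minimal sketch of the automation step: an AWS Lambda handler that stops running EC2 build-environment instances when invoked, for example by an Amazon EventBridge schedule at the end of the working day. The `environment=build` tag and the scheduling arrangement are assumptions for illustration; a matching start function or schedule would bring the environments back up before work begins.

```python
# Sketch: an AWS Lambda handler that stops EC2 build-environment instances outside
# working hours. Assumes a hypothetical "environment=build" tag on the instances
# and an Amazon EventBridge schedule (for example, every weekday evening) that
# invokes this function.
import boto3

ec2 = boto3.client("ec2")


def lambda_handler(event, context):
    # Find running instances tagged as build environments (tag key/value are assumptions).
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:environment", "Values": ["build"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )

    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]

    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

    return {"stopped": instance_ids}
```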

๐Ÿ’ผ SUS06-BP05 Use managed device farms for testing

Use managed device farms to efficiently test a new feature on a representative set of hardware.

**Common anti-patterns**

- You manually test and deploy your application on individual physical devices.
- You do not use an app testing service to test and interact with your apps (for example, Android, iOS, and web apps) on real, physical devices.

**Benefits of establishing this best practice** Using managed device farms for testing cloud-enabled applications provides a number of benefits:

- They include more efficient features to test applications on a wide range of devices.
- They eliminate the need for in-house infrastructure for testing.
- They offer diverse device types, including older and less popular hardware, which eliminates the need for unnecessary device upgrades.

**Level of risk exposed if this best practice is not established:** Low

## Implementation guidance

Using managed device farms can help you streamline the testing process for new features on a representative set of hardware. Managed device farms offer diverse device types, including older and less popular hardware, and avoid the customer sustainability impact of unnecessary device upgrades.

### Implementation steps

1. Define testing requirements: Define your testing requirements and plan (like test type, operating systems, and test schedule).
   - You can use Amazon CloudWatch RUM to collect and analyze client-side data and shape your testing plan.
2. Select a managed device farm: Select a managed device farm that can support your testing requirements. For example, you can use AWS Device Farm to test and understand the impact of your changes on a representative set of hardware (see the sketch after these steps).
3. Use automation: Use automation and continuous integration/continuous deployment (CI/CD) to schedule and run your tests. For example:
   - Integrate AWS Device Farm with your CI/CD pipeline to run cross-browser Selenium tests.
   - Build and test iOS and iPadOS apps with AWS DevOps and mobile services.
4. Review and adjust: Continually review your testing results and make necessary improvements.
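As a small sketch of evaluating AWS Device Farm against your testing requirements, the following Python snippet uses boto3 to list the physical devices the service offers, which can help you choose a representative device pool. It assumes credentials with Device Farm read access; at the time of writing, the Device Farm API is served from the us-west-2 Region.

```python
# Sketch: enumerate the devices available in AWS Device Farm to help select a
# representative set of hardware for testing. Only the first page of results is
# shown; Device Farm supports pagination for larger listings.
import boto3

devicefarm = boto3.client("devicefarm", region_name="us-west-2")

response = devicefarm.list_devices()
for device in response.get("devices", []):
    print(device["platform"], device["name"], device.get("os", "unknown OS"))
```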

๐Ÿ’ผ Test reliability

After you have designed your workload to be resilient to the stresses of production, testing is the only way to ensure that it will operate as designed and deliver the resiliency you expect.

๐Ÿ’ผ Threat Protection

Policies for identifying resources that do not implement configurations used to mitigate potential security threats.

๐Ÿ’ผ Use fault isolation to protect your workload

Fault isolation limits the impact of a component or system failure to a defined boundary. With proper isolation, components outside of the boundary are unaffected by the failure. Running your workload across multiple fault isolation boundaries can make it more resilient to failure.

๐Ÿ’ผ Utilizing workload observability

Ensure optimal workload health by leveraging observability. Utilize relevant metrics, logs, and traces to gain a comprehensive view of your workload's performance and address issues efficiently.