terraform.aws-baseline-infra/modules/ManagementGovernance/Monitoring.EC2
2024-01-24 11:18:35 +08:00

Monitoring module

This module deploys the default CloudWatch metric alarms for an EC2 instance.

Notes

The alarms use a Terraform lifecycle block that ignores changes to tags, which speeds up subsequent Terraform updates. CloudWatch alarm tags are not visible in the AWS console anyway.
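The following is a minimal sketch of that pattern. It assumes each alarm in the module is an aws_cloudwatch_metric_alarm resource. The resource name cpu, the alarm name format, the comparison operator and the 300-second period are illustrative assumptions. Only the input variable names come from the module inputs shown in the Example section.

```hcl
# Hypothetical excerpt: one CPU alarm as it might appear inside the module.
resource "aws_cloudwatch_metric_alarm" "cpu" {
  alarm_name          = "${var.cw-alarm-prefix}-${var.ec2-instance-id}-CPUUtilization" # assumed format
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  dimensions          = { InstanceId = var.ec2-instance-id }
  statistic           = "Average"
  period              = 300                             # assumed
  evaluation_periods  = 1                               # assumed
  comparison_operator = "GreaterThanOrEqualToThreshold" # assumed
  threshold           = var.threshold-CPUUtilization
  actions_enabled     = var.actions-enabled
  alarm_actions       = var.sns-targets # assumed to be a list of SNS topic ARNs
  tags                = var.default-tags

  lifecycle {
    # Tags are applied when the alarm is created. Later differences in tags,
    # whether in the configuration or in AWS, do not cause Terraform to plan an update.
    ignore_changes = [tags]
  }
}
```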

Example

module "ec2-instances" {
  source        = "../../modules/util/resource-list"
  resource-type = "ec2"
}

module "ec2-monitoring" {
  source   = "../../modules/ManagementGovernance/Monitoring.EC2"
  for_each = module.ec2-instances.result-set # one module instance per EC2 instance in the result set

  ec2-instance-id = each.value
  cw-alarm-prefix = local.cw-alarm-prefix
  default-tags    = local.default-tags

  # AWS/EC2 metric
  threshold-CPUUtilization = 90

  # CloudWatch agent metrics (Linux)
  threshold-mem_free         = 100000
  threshold-swap_free        = 100000
  threshold-disk_free        = 1 * 1000 * 1000 * 1000
  threshold-disk_inodes_free = 10000
  threshold-processes_total  = 500

  # CloudWatch agent metrics (Windows)
  threshold-LogicalDiskFreePct = 10
  threshold-MemoryCommittedPct = 90

  actions-enabled = var.actions-enabled
  sns-targets     = var.sns-targets
}
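The example reads local.cw-alarm-prefix, local.default-tags, var.actions-enabled and var.sns-targets from the calling module but does not define them. Below is one way the caller might declare them. The types, defaults and values are assumptions for illustration and are not taken from the module itself.

```hcl
# Hypothetical root-module declarations for the values the example references.
variable "actions-enabled" {
  description = "Whether alarm state changes trigger the configured actions"
  type        = bool
  default     = true
}

variable "sns-targets" {
  description = "SNS topic ARNs to notify (assumed to be a list of ARNs)"
  type        = list(string)
  default     = []
}

locals {
  cw-alarm-prefix = "prod" # assumed naming prefix for the alarms

  default-tags = {
    Environment = "prod"
    ManagedBy   = "terraform"
  }
}
```

With these declared, the example can be planned from the root module. In practice, actions-enabled = false is a convenient way to deploy the alarms first and enable notifications later.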

Sample CloudWatch alarm email notification

Subject: ALARM: "TestAlarmPleaseIgnore" in Asia Pacific (Hong Kong)

You are receiving this email because your Amazon CloudWatch Alarm "TestAlarmPleaseIgnore" in the 
Asia Pacific (Hong Kong) region has entered the ALARM state, because "Threshold Crossed: 1 out of 
the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0) 
(minimum 1 datapoint for OK -> ALARM transition)." at "Wednesday 24 January, 2024 01:01:34 UTC".

View this alarm in the AWS Management Console:
https://ap-east-1.console.aws.amazon.com%2Fcloudwatch...

Alarm Details:
- Name:                       TestAlarmPleaseIgnore
- Description:                Cloudwatch alarm for the following resource
- Instance ID: xxx
- Instance Name: yyy
- Instance IP: zz.zz.zz.zz
- State Change:               OK -> ALARM
- Reason for State Change:    Threshold Crossed: 1 out of the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0) (minimum 1 datapoint for OK -> ALARM transition).
- Timestamp:                  Wednesday 24 January, 2024 01:01:34 UTC
- AWS Account:                111122223333
- Alarm Arn:                  arn:aws:cloudwatch:ap-east-1:111122223333:alarm:TestAlarmPleaseIgnore

Threshold:
- The alarm is in the ALARM state when the metric is LessThanOrEqualToThreshold 900.0 for at least 1 of the last 1 period(s) of 300 seconds.

Monitored Metric:
- MetricNamespace:                     AWS/EC2
- MetricName:                          CPUCreditBalance
- Dimensions:                          [InstanceId = i-050d4adeafaa53cd0]
- Period:                              300 seconds
- Statistic:                           Average
- Unit:                                not specified
- TreatMissingData:                    missing


State Change Actions:
- OK:
- ALARM: [arn:aws:sns:ap-east-1:111122223333:CWA-SNS-Email-KenFong]
- INSUFFICIENT_DATA:
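For reference, the settings reported in the sample email map onto aws_cloudwatch_metric_alarm arguments roughly as shown below. This is a sketch reconstructed from the email, not the module's code. The resource names, the topic name and the subscription address are hypothetical, and the alarm description is omitted.

```hcl
# Hypothetical SNS topic with an email subscription, similar to the
# ALARM action target listed in the sample email.
resource "aws_sns_topic" "alarm_email" {
  name = "CWA-SNS-Email-Example"
}

resource "aws_sns_topic_subscription" "alarm_email" {
  topic_arn = aws_sns_topic.alarm_email.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # the recipient must confirm the subscription
}

# Alarm equivalent to "TestAlarmPleaseIgnore" as described in the email.
resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance" {
  alarm_name          = "TestAlarmPleaseIgnore"
  namespace           = "AWS/EC2"
  metric_name         = "CPUCreditBalance"
  dimensions          = { InstanceId = "i-050d4adeafaa53cd0" }
  statistic           = "Average"
  period              = 300 # "period(s) of 300 seconds"
  evaluation_periods  = 1   # "at least 1 of the last 1"
  datapoints_to_alarm = 1   # "minimum 1 datapoint for OK -> ALARM transition"
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = 900
  treat_missing_data  = "missing"

  # Only the ALARM state has an action. OK and INSUFFICIENT_DATA are left empty.
  alarm_actions = [aws_sns_topic.alarm_email.arn]
}
```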