3.0 KiB
3.0 KiB
Monitoring module
This module deploys the default cloudwatch metric monitoring
Notes
Terraform lifecycle ignores tags to speed up terraform subsequent update. Cloudwatch alarm tags cannot be read on aws console anyway.
Example
module "ec2-instances" {
source = "../../modules/util/resource-list"
resource-type = "ec2"
}
module "ec2-monitoring" {
cw-alarm-prefix = local.cw-alarm-prefix
for_each = module.ec2-instances.result-set
source = "../../modules/ManagementGovernance/Monitoring.EC2"
default-tags = local.default-tags
ec2-instance-id = each.value
threshold-CPUUtilization = 90
threshold-mem_free = 100000
threshold-swap_free = 100000
threshold-disk_free = 1 * 1000 * 1000 * 1000
threshold-disk_inodes_free = 10000
threshold-processes_total = 500
threshold-LogicalDiskFreePct = 10
threshold-MemoryCommittedPct = 90
actions-enabled = var.actions-enabled
sns-targets = var.sns-targets
}
Sample cloudwatch alarm email notification
Subject: ALARM: "TestAlarmPleaseIgnore" in Asia Pacific (Hong Kong)
You are receiving this email because your Amazon CloudWatch Alarm "TestAlarmPleaseIgnore" in the
Asia Pacific (Hong Kong) region has entered the ALARM state, because "Threshold Crossed: 1 out of
the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0)
(minimum 1 datapoint for OK -> ALARM transition)." at "Wednesday 24 January, 2024 01:01:34 UTC".
View this alarm in the AWS Management Console:
https://ap-east-1.console.aws.amazon.com%2Fcloudwatch...
Alarm Details:
- Name: TestAlarmPleaseIgnore
- Description: Cloudwatch alarm for the following resource
- Instance ID: xxx
- Instance Name: yyy
- Instance IP: zz.zz.zz.zz
- State Change: OK -> ALARM
- Reason for State Change: Threshold Crossed: 1 out of the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0) (minimum 1 datapoint for OK -> ALARM transition).
- Timestamp: Wednesday 24 January, 2024 01:01:34 UTC
- AWS Account: 111122223333
- Alarm Arn: arn:aws:cloudwatch:ap-east-1:111122223333:alarm:TestAlarmPleaseIgnore
Threshold:
- The alarm is in the ALARM state when the metric is LessThanOrEqualToThreshold 900.0 for at least 1 of the last 1 period(s) of 300 seconds.
Monitored Metric:
- MetricNamespace: AWS/EC2
- MetricName: CPUCreditBalance
- Dimensions: [InstanceId = i-050d4adeafaa53cd0]
- Period: 300 seconds
- Statistic: Average
- Unit: not specified
- TreatMissingData: missing
State Change Actions:
- OK:
- ALARM: [arn:aws:sns:ap-east-1:111122223333:CWA-SNS-Email-KenFong]
- INSUFFICIENT_DATA: