# Monitoring module
This module deploys the default CloudWatch metric alarms for an EC2 instance.
## Notes
The Terraform `lifecycle` setting ignores tag changes to speed up subsequent Terraform updates. CloudWatch alarm tags cannot be viewed in the AWS console anyway.
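As a rough illustration of the note above, an alarm resource inside the module might ignore tag changes as sketched below. This is a sketch, not the module's actual code: the resource name `cpu`, the alarm name format, the comparison operator, the period, and the assumption that `sns-targets` is a list of SNS topic ARNs are all hypothetical. Only the variable names come from the example in this README.

```terraform
# Hypothetical alarm resource; the real module may define its alarms differently.
resource "aws_cloudwatch_metric_alarm" "cpu" {
  # The alarm name format is an assumption made for illustration.
  alarm_name          = "${var.cw-alarm-prefix}-${var.ec2-instance-id}-CPUUtilization"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = var.threshold-CPUUtilization
  evaluation_periods  = 1
  period              = 300
  statistic           = "Average"

  dimensions = {
    InstanceId = var.ec2-instance-id
  }

  actions_enabled = var.actions-enabled
  alarm_actions   = var.sns-targets # assumed to be a list of SNS topic ARNs
  tags            = var.default-tags

  # Tag drift is ignored, so later plans do not update alarms for tag-only changes.
  lifecycle {
    ignore_changes = [tags]
  }
}
```

With `ignore_changes = [tags]`, tags are set when the alarm is created, and later edits to `default-tags` are not applied to existing alarms.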
## Example
```terraform
module "ec2-instances" {
  source        = "../../modules/util/resource-list"
  resource-type = "ec2"
}
module "ec2-monitoring" {
  source   = "../../modules/ManagementGovernance/Monitoring.EC2"
  for_each = module.ec2-instances.result-set

  cw-alarm-prefix = local.cw-alarm-prefix
  default-tags    = local.default-tags
  ec2-instance-id = each.value

  threshold-CPUUtilization     = 90
  threshold-mem_free           = 100000
  threshold-swap_free          = 100000
  threshold-disk_free          = 1 * 1000 * 1000 * 1000
  threshold-disk_inodes_free   = 10000
  threshold-processes_total    = 500
  threshold-LogicalDiskFreePct = 10
  threshold-MemoryCommittedPct = 90

  actions-enabled = var.actions-enabled
  sns-targets     = var.sns-targets
}
```
## Sample CloudWatch alarm email notification
```
Subject: ALARM: "TestAlarmPleaseIgnore" in Asia Pacific (Hong Kong)
You are receiving this email because your Amazon CloudWatch Alarm "TestAlarmPleaseIgnore" in the
Asia Pacific (Hong Kong) region has entered the ALARM state, because "Threshold Crossed: 1 out of
the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0)
(minimum 1 datapoint for OK -> ALARM transition)." at "Wednesday 24 January, 2024 01:01:34 UTC".
View this alarm in the AWS Management Console:
https://ap-east-1.console.aws.amazon.com%2Fcloudwatch...
Alarm Details:
- Name: TestAlarmPleaseIgnore
- Description: Cloudwatch alarm for the following resource
- Instance ID: xxx
- Instance Name: yyy
- Instance IP: zz.zz.zz.zz
- State Change: OK -> ALARM
- Reason for State Change: Threshold Crossed: 1 out of the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0) (minimum 1 datapoint for OK -> ALARM transition).
- Timestamp: Wednesday 24 January, 2024 01:01:34 UTC
- AWS Account: 111122223333
- Alarm Arn: arn:aws:cloudwatch:ap-east-1:111122223333:alarm:TestAlarmPleaseIgnore
Threshold:
- The alarm is in the ALARM state when the metric is LessThanOrEqualToThreshold 900.0 for at least 1 of the last 1 period(s) of 300 seconds.
Monitored Metric:
- MetricNamespace: AWS/EC2
- MetricName: CPUCreditBalance
- Dimensions: [InstanceId = i-050d4adeafaa53cd0]
- Period: 300 seconds
- Statistic: Average
- Unit: not specified
- TreatMissingData: missing
State Change Actions:
- OK:
- ALARM: [arn:aws:sns:ap-east-1:111122223333:CWA-SNS-Email-KenFong]
- INSUFFICIENT_DATA:
```