# Monitoring module
This module deploys the default CloudWatch metric alarms for monitoring.

## Notes
The Terraform `lifecycle` block ignores tag changes to speed up subsequent Terraform updates. CloudWatch alarm tags cannot be viewed in the AWS console anyway.
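
As a minimal sketch of the tag-ignoring pattern, an alarm resource can carry a `lifecycle` block with `ignore_changes`. The resource name `example`, the alarm name, the tag, and the metric settings below are illustrative assumptions and are not taken from this module's source.

```terraform
# Hypothetical sketch: names and values are placeholders, not the module's code.
resource "aws_cloudwatch_metric_alarm" "example" {
  alarm_name          = "example-CPUUtilization"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 90

  tags = {
    Example = "value"
  }

  lifecycle {
    # Tags are still applied when the alarm is created;
    # later differences in tags are ignored during updates.
    ignore_changes = [tags]
  }
}
```

If the provider is configured with `default_tags`, `tags_all` may also need to be listed in `ignore_changes`; whether that applies here depends on the provider setup, which this document does not show.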

## Example
```terraform
module "ec2-instances" {
  source        = "../../modules/util/resource-list"
  resource-type = "ec2"
}

module "ec2-monitoring" {
  source   = "../../modules/ManagementGovernance/Monitoring.EC2"
  for_each = module.ec2-instances.result-set

  cw-alarm-prefix = local.cw-alarm-prefix
  default-tags    = local.default-tags
  ec2-instance-id = each.value

  threshold-CPUUtilization     = 90
  threshold-mem_free           = 100000
  threshold-swap_free          = 100000
  threshold-disk_free          = 1 * 1000 * 1000 * 1000
  threshold-disk_inodes_free   = 10000
  threshold-processes_total    = 500
  threshold-LogicalDiskFreePct = 10
  threshold-MemoryCommittedPct = 90

  actions-enabled = var.actions-enabled
  sns-targets     = var.sns-targets
}
```

## Sample CloudWatch alarm email notification
```
Subject: ALARM: "TestAlarmPleaseIgnore" in Asia Pacific (Hong Kong)

You are receiving this email because your Amazon CloudWatch Alarm "TestAlarmPleaseIgnore" in the
Asia Pacific (Hong Kong) region has entered the ALARM state, because "Threshold Crossed: 1 out of
the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0)
(minimum 1 datapoint for OK -> ALARM transition)." at "Wednesday 24 January, 2024 01:01:34 UTC".

View this alarm in the AWS Management Console:
https://ap-east-1.console.aws.amazon.com%2Fcloudwatch...

Alarm Details:
- Name: TestAlarmPleaseIgnore
- Description: Cloudwatch alarm for the following resource
- Instance ID: xxx
- Instance Name: yyy
- Instance IP: zz.zz.zz.zz
- State Change: OK -> ALARM
- Reason for State Change: Threshold Crossed: 1 out of the last 1 datapoints [864.0 (24/01/24 00:56:00)] was less than or equal to the threshold (900.0) (minimum 1 datapoint for OK -> ALARM transition).
- Timestamp: Wednesday 24 January, 2024 01:01:34 UTC
- AWS Account: 111122223333
- Alarm Arn: arn:aws:cloudwatch:ap-east-1:111122223333:alarm:TestAlarmPleaseIgnore

Threshold:
- The alarm is in the ALARM state when the metric is LessThanOrEqualToThreshold 900.0 for at least 1 of the last 1 period(s) of 300 seconds.

Monitored Metric:
- MetricNamespace: AWS/EC2
- MetricName: CPUCreditBalance
- Dimensions: [InstanceId = i-050d4adeafaa53cd0]
- Period: 300 seconds
- Statistic: Average
- Unit: not specified
- TreatMissingData: missing


State Change Actions:
- OK:
- ALARM: [arn:aws:sns:ap-east-1:111122223333:CWA-SNS-Email-KenFong]
- INSUFFICIENT_DATA:
```
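
For orientation, an alarm that would produce a notification like the sample could be declared roughly as follows. This is a sketch reconstructed only from the fields shown in the email (metric, dimension, threshold, period, statistic, missing-data handling, SNS target). The resource name `cpu_credit_balance` is an assumption, and the module's actual definition may differ.

```terraform
# Hypothetical sketch rebuilt from the sample email; not the module's actual code.
resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance" {
  alarm_name = "TestAlarmPleaseIgnore"

  # Monitored metric, as listed under "Monitored Metric" in the email.
  namespace   = "AWS/EC2"
  metric_name = "CPUCreditBalance"
  dimensions = {
    InstanceId = "i-050d4adeafaa53cd0"
  }
  statistic = "Average"
  period    = 300

  # ALARM when the metric is <= 900 for at least 1 of the last 1 periods.
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = 900
  evaluation_periods  = 1
  datapoints_to_alarm = 1
  treat_missing_data  = "missing"

  # Only the ALARM transition notifies; OK and INSUFFICIENT_DATA are empty in the email.
  alarm_actions = ["arn:aws:sns:ap-east-1:111122223333:CWA-SNS-Email-KenFong"]
}
```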