grafana(host): account in Prom. hyperv. disk alerts for longer backups
All checks were successful
/ Ansible Lint (push) Successful in 1m39s

Set duration for Prometheus hypervisor disk rw rate and hard disk io
alerts to 2h to account for the very long running (over 90m) backup job.
This commit is contained in:
June 2025-02-18 15:38:07 +01:00
commit fce4c2f73b
Signed by: june
SSH key fingerprint: SHA256:o9EAq4Y9N9K0pBQeBTqhSDrND5E7oB+60ZNx0U1yPe0

View file

@ -166,7 +166,7 @@ groups:
# Longer intervals to account for disk intensive hypervisor tasks (backups, moving VMs, etc.).
- alert: HypervisorHostUnusualDiskReadRate
expr: (sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
for: 90m
for: 2h
labels:
severity: warning
annotations:
@ -174,7 +174,7 @@ groups:
description: "Disk is probably reading too much data (> 50 MB/s)\n VALUE = {{ $value }}"
- alert: HypervisorHostUnusualDiskWriteRate
expr: (sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
for: 90m
for: 2h
labels:
severity: warning
annotations:
@ -256,7 +256,7 @@ groups:
# Since hard disks on the hypervisor can easily have their IO saturated by hypervisor tasks (backups, moving VMs, etc.), alert when the IO is above the regular threshold for a very long time.
- alert: HypervisorHostUnusualHardDiskIo
expr: (rate(node_disk_io_time_seconds_total{device=~"s.+"}[1m]) > 0.5) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
for: 90m
for: 2h
labels:
severity: warning
annotations: