grafana: set dur. for Prom. hyperv. disk rw rate and hdd io aler. to 90m

All checks were successful
/ Ansible Lint (push) Successful in 1m43s

Set the duration (for:) of the Prometheus hypervisor disk read/write rate and hard disk IO alerts to 90m to account for the very long-running (over an hour) backup job.

parent 1bae6234ae
commit ac7e8bb6f2

1 changed file with 3 additions and 3 deletions

@@ -166,7 +166,7 @@ groups:
       # Longer intervals to account for disk intensive hypervisor tasks (backups, moving VMs, etc.).
       - alert: HypervisorHostUnusualDiskReadRate
         expr: (sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
-        for: 60m
+        for: 90m
         labels:
           severity: warning
         annotations:
@@ -174,7 +174,7 @@ groups:
           description: "Disk is probably reading too much data (> 50 MB/s)\n  VALUE = {{ $value }}"
       - alert: HypervisorHostUnusualDiskWriteRate
         expr: (sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
-        for: 60m
+        for: 90m
         labels:
           severity: warning
         annotations:
@@ -256,7 +256,7 @@ groups:
       # Since hard disks on the hypervisor can easily have their IO saturated by hypervisor tasks (backups, moving VMs, etc.), alert when the IO is above the regular threshold for a very long time.
       - alert: HypervisorHostUnusualHardDiskIo
         expr: (rate(node_disk_io_time_seconds_total{device=~"s.+"}[1m]) > 0.5) * on(instance) group_left (nodename) node_uname_info{nodename="chaosknoten"}
-        for: 50m
+        for: 90m
         labels:
           severity: warning
         annotations:
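
The effect of the longer duration can be illustrated with a small simulation. This is only a simplified sketch of how a Prometheus "for:" clause behaves (the alert is pending while the expression keeps returning a result and fires once it has done so for at least the configured duration; a single evaluation without a result resets it). The 75-minute backup length and the 1-minute evaluation interval are assumptions chosen for illustration; the commit only states that the backup runs for over an hour.

```python
from datetime import timedelta


def alert_fires(above_threshold, eval_interval, hold):
    """Return True if an alert with the given `for` duration would fire.

    above_threshold: one bool per rule evaluation, True when the alert
    expression returned a result at that evaluation.
    eval_interval: time between evaluations (assumed constant).
    hold: the `for` duration of the rule.
    """
    active_since = None  # time at which the alert became pending
    now = timedelta(0)
    for above in above_threshold:
        if above:
            if active_since is None:
                active_since = now
            # Pending alerts fire once they have been active for >= hold.
            if now - active_since >= hold:
                return True
        else:
            # A single evaluation below the threshold resets the alert.
            active_since = None
        now += eval_interval
    return False


# Hypothetical scenario: a backup keeps disk IO above the threshold for
# 75 minutes, with the rule evaluated once per minute.
EVAL_INTERVAL = timedelta(minutes=1)
backup = [True] * 75 + [False] * 30

fires_with_60m = alert_fires(backup, EVAL_INTERVAL, timedelta(minutes=60))
fires_with_90m = alert_fires(backup, EVAL_INTERVAL, timedelta(minutes=90))
print(f"for: 60m fires during backup: {fires_with_60m}")
print(f"for: 90m fires during backup: {fires_with_90m}")
```

Under these assumptions the old 60m setting raises a warning during the backup, while 90m stays silent; load that persists well beyond the backup window would still trigger the alert.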