Commit graph

20 commits

Author SHA1 Message Date
4060dbbe21 fix all ansible-lint yaml errors (except for line-length) 2024-11-23 02:49:23 +01:00
a386f9e2eb custom alerts for CI VMs
It's expected for some VMs to have high read/write rates for some time,
so this adds custom alerts for our CI VMs.
2024-11-10 17:06:41 +01:00
3284fae62a Add more Prometheus node exporters 2024-11-05 19:16:28 +01:00
261bd7d654 Add prometheus-node-exporter role and add it to most hosts 2024-11-03 21:27:51 +01:00
34dc6d9a84 Reduce "Host Memory is underutilized" alert threshold to 10% 2024-10-18 21:15:20 +02:00
4cac84e7ec prometheus: have different disk alerts for physical and virtual hosts
Have more relaxed read/write alerts for physical hosts as they are
probably hypervisors and regular high read/writes are more common.
Also differentiate between physical and virtual hosts for IO alerts and
allow for hard disks to spend more time in IO.
2024-10-05 17:22:45 +02:00
f721dd9fea prometheus: make opnsense-ccchh job not fail half the time
The scrape seems to take around a second to complete, so with the
configured timeout of 1s it failed about half the time. Use the
default, more relaxed scrape interval and timeout instead to make it
reliable.
2024-10-05 17:22:45 +02:00
0a05cad0a1 prometheus & alertmanager: add self-alerting
Add self-alerting for Prometheus and Alertmanager using rules from
https://samber.github.io/awesome-prometheus-alerts/rules
2024-10-02 04:13:37 +02:00
2e29b78f6a prometheus: move Jitsi's node exporter target to hosts job 2024-10-02 03:45:56 +02:00
30876f821c prometheus, alertmanager: use Prometheus alerts with Alertmanager
For now, introduce node-exporter/hosts alert rules, taken from
https://samber.github.io/awesome-prometheus-alerts/rules
The labels were removed from the descriptions, however, since they don't
render correctly (at least in Telegram) and don't seem to provide much
value, as we render the labels in the notification anyway.

Also only have Telegram as the notification channel for now, as it was
the easiest to set up.
2024-10-02 03:36:30 +02:00
803b19de0a prometheus: add job for node exporter (for the NixOS VMs for now) 2024-10-01 20:09:42 +02:00
29d2d2926f prometheus: don't duplicate scrape interval and timeout 2024-10-01 01:59:33 +02:00
0f732833de Add Grafana config for PVE 2024-02-26 22:29:02 +01:00
e2a0b9e74c grafana: add chaosknoten 2024-01-30 23:23:13 +01:00
2431b455c2 Use prometheus-jitsi-meet-exporter 2024-01-29 21:13:22 +01:00
3184154f7b Add jitsi video bridge stats 2024-01-29 20:31:12 +01:00
e0ebe2c720 Add jitsi as target 2024-01-28 07:52:46 +01:00
79ac891c30 Add metrics for club OPNsense 2024-01-26 19:28:09 +01:00
0307ad6c9f proxy access to metrics through nginx 2024-01-24 19:36:21 +01:00
a68edb81c4 Add Grafana/Prometheus config 2024-01-24 19:12:43 +01:00