7f1afef50d
move secrets from sops lookup plugin to sops vars plugin
/ Ansible Lint (push) Failing after 1m54s
This makes secret configuration and usage a good bit cleaner.
2025-05-04 16:50:15 +02:00
bbe4cc131a
eh22-netbox: remove eh22-netbox as it's being decommissioned
/ Ansible Lint (push) Failing after 1m44s
2025-05-03 23:40:03 +02:00
97b8386878
grafana(host): move secrets to SOPS
/ Ansible Lint (push) Failing after 1m49s
2025-05-03 22:18:26 +02:00
e183f1a2c3
prometheus remote write with alloy using it
/ Ansible Lint (push) Failing after 1m53s
2025-04-30 01:11:17 +02:00
e21ff26f36
fix: alertmanager
/ Ansible Lint (push) Failing after 1m56s
The message template now just outputs a simple string if the list of alerts is too long.
2025-04-28 23:02:13 +02:00
456117a789
adding loki
/ Ansible Lint (push) Failing after 1m55s
2025-04-28 20:31:55 +02:00
fce4c2f73b
grafana(host): account in Prom. hyperv. disk alerts for longer backups
/ Ansible Lint (push) Successful in 1m39s
Set duration for Prometheus hypervisor disk rw rate and hard disk io
alerts to 2h to account for the very long running (over 90m) backup job.
2025-02-18 15:38:07 +01:00
07511ef723
grafana(host): remove decommissioned nix-box-june from Prometheus targets
/ Ansible Lint (push) Successful in 1m42s
2025-02-18 04:51:26 +01:00
79012fb7f8
eh22-netbox: setup EH22 NetBox
/ Ansible Lint (push) Successful in 1m44s
2025-02-17 01:23:35 +01:00
ac7e8bb6f2
grafana: set dur. for Prom. hyperv. disk rw rate and hdd io aler. to 90m
/ Ansible Lint (push) Successful in 1m43s
Set duration for Prometheus hypervisor disk rw rate and hard disk io
alerts to 90m to account for the very long running (over an hour) backup
job.
2025-02-15 06:08:37 +01:00
40cddb67b4
grafana: account for long backup jobs in Prom. hyperv. disk rw rate al.
/ Ansible Lint (pull_request) Successful in 1m35s
/ Ansible Lint (push) Successful in 1m34s
2025-02-06 19:17:21 +01:00
c4e35c1adf
grafana: pull out prom. net. rec. err. alerts for OPNs. to ex. wg int.
/ Ansible Lint (push) Successful in 1m32s
/ Ansible Lint (pull_request) Successful in 1m30s
Pull out prometheus network receive error alerts for OPNsense to exclude
its WireGuard interfaces, which like to throw errors, but which aren't
of importance.
2025-02-06 01:34:45 +01:00
ee66631c2d
grafana: diff. prometheus disk io alerts by host task and disk type
/ Ansible Lint (push) Successful in 1m34s
/ Ansible Lint (pull_request) Successful in 1m32s
Differentiate by host task (hypervisor or not) and disk type (hard disk
or not), not by whether the host is physical or virtual and then by disk
type.
This is in line with the disk rate alerts changes and allows for
fine-grained adjustments based on the host task type, which actually
matters for these alerts.
2025-02-06 01:13:10 +01:00
9e77a41e3c
grafana: differentiate prometheus disk rate alerts by host task type
/ Ansible Lint (push) Successful in 1m38s
/ Ansible Lint (pull_request) Successful in 1m37s
Not by a mix of host task type (CI server or not) and whether or not the
host is virtual or physical.
Also only differentiate on the duration not the rate, to not
accidentally exclude slow hard disks.
2025-02-06 01:05:05 +01:00
5016407cef
grafana: group prometheus alert rules for better organization
/ Ansible Lint (push) Successful in 1m40s
/ Ansible Lint (pull_request) Successful in 1m37s
2025-02-06 00:12:50 +01:00
6fa896dd3f
Remove job for mumble.c3lingo.org since the endpoint no longer appears to exist
/ Ansible Lint (push) Successful in 1m49s
2025-01-19 21:03:38 +01:00
07dbbf055c
reorganize (config) files and templates into one "resources" dir
This groups the files and templates for each host together and therefore
makes it easier to see all the (config) files for a host.
Also clean up incorrect, unused docker_compose config for mumble and
clean up unused engelsystem configs.
2024-12-08 02:55:25 +01:00