2e66e5de3b
grafana: enable promql-experimental-functions
/ Ansible Lint (push) Failing after 2m0s
2025-07-20 19:09:59 +02:00
a4c703b185
grafana: there are more metrics now
/ Ansible Lint (push) Failing after 1m51s
2025-07-18 21:23:39 +02:00
9f0c276240
grafana: setup sendAlert = false receiver to mute alert
/ Ansible Lint (push) Failing after 1m57s
2025-07-16 22:55:07 +02:00
d734a1cc6c
grafana: accept WAL from remote write
/ Ansible Lint (push) Failing after 2m2s
2025-07-09 00:27:56 +02:00
ff5f8ffc80
ntfy-alertmanager silence now works
/ Ansible Lint (push) Failing after 2m0s
2025-06-12 20:04:02 +02:00
1cc4ca6947
ntfy-alertmanager setup silence in alertmanager
/ Ansible Lint (push) Failing after 1m57s
2025-06-12 19:48:18 +02:00
2cb9dc6dae
grafana: try things
/ Ansible Lint (push) Failing after 2m2s
2025-06-12 19:20:03 +02:00
0a50ee470a
grafana: add email alerts, and fix some rules
/ Ansible Lint (push) Failing after 2m0s
2025-06-10 21:22:53 +02:00
db99b153e4
grafana: make ntfy messages look a bit better
/ Ansible Lint (push) Failing after 1m58s
2025-06-05 00:45:45 +02:00
00bcd45111
grafana: alertmanager settings
/ Ansible Lint (push) Failing after 2m2s
2025-06-04 22:36:57 +02:00
5fe5304463
alertmanager
/ Ansible Lint (push) Failing after 2m20s
set repeat_interval for fux and try to restore telegram channel
2025-06-04 03:01:12 +02:00
9b444ec4c4
rules eval
/ Ansible Lint (push) Failing after 2m10s
2025-06-03 18:33:20 +02:00
06c1ebbd5f
grafana: fix remote write
/ Ansible Lint (push) Failing after 1m54s
2025-06-02 23:02:19 +02:00
3a9673b113
ntfy alerts
/ Ansible Lint (push) Failing after 1m55s
2025-06-02 22:42:37 +02:00
0e61131c1b
prometheus: pre filtering setup
/ Ansible Lint (push) Failing after 1m57s
2025-06-01 01:33:14 +02:00
7f1afef50d
move secrets from sops lookup plugin to sops vars plugin
/ Ansible Lint (push) Failing after 1m54s
This makes secret configuration and usage a good bit cleaner.
2025-05-04 16:50:15 +02:00
bbe4cc131a
eh22-netbox: remove eh22-netbox as it's being decommissioned
/ Ansible Lint (push) Failing after 1m44s
2025-05-03 23:40:03 +02:00
97b8386878
grafana(host): move secrets to SOPS
/ Ansible Lint (push) Failing after 1m49s
2025-05-03 22:18:26 +02:00
01c006ec22
grafana fix nginx ip allow list
/ Ansible Lint (push) Failing after 1m48s
2025-05-02 01:08:55 +02:00
58642620a1
IPv6 fix for metrics
/ Ansible Lint (push) Failing after 1m47s
2025-04-30 16:23:35 +02:00
bd9e04eef8
metrics fux
2025-04-30 02:16:09 +02:00
e183f1a2c3
prometheus remote write with alloy using it
/ Ansible Lint (push) Failing after 1m53s
2025-04-30 01:11:17 +02:00
e21ff26f36
fix: alertmanager
/ Ansible Lint (push) Failing after 1m56s
the message template now just outputs a simple string if the list of alerts is too long
2025-04-28 23:02:13 +02:00
456117a789
adding loki
/ Ansible Lint (push) Failing after 1m55s
2025-04-28 20:31:55 +02:00
fce4c2f73b
grafana(host): account in Prom. hyperv. disk alerts for longer backups
/ Ansible Lint (push) Successful in 1m39s
Set duration for Prometheus hypervisor disk rw rate and hard disk io
alerts to 2h to account for the very long running (over 90m) backup job.
2025-02-18 15:38:07 +01:00
07511ef723
grafana(host): remove decommissioned nix-box-june from Prometheus targets
/ Ansible Lint (push) Successful in 1m42s
2025-02-18 04:51:26 +01:00
79012fb7f8
eh22-netbox: setup EH22 NetBox
/ Ansible Lint (push) Successful in 1m44s
2025-02-17 01:23:35 +01:00
ac7e8bb6f2
grafana: set dur. for Prom. hyperv. disk rw rate and hdd io aler. to 90m
/ Ansible Lint (push) Successful in 1m43s
Set duration for Prometheus hypervisor disk rw rate and hard disk io
alerts to 90m to account for the very long running (over an hour) backup
job.
2025-02-15 06:08:37 +01:00
40cddb67b4
grafana: account for long backup jobs in Prom. hyperv. disk rw rate al.
/ Ansible Lint (pull_request) Successful in 1m35s
/ Ansible Lint (push) Successful in 1m34s
2025-02-06 19:17:21 +01:00
c4e35c1adf
grafana: pull out prom. net. rec. err. alerts for OPNs. to ex. wg int.
/ Ansible Lint (push) Successful in 1m32s
/ Ansible Lint (pull_request) Successful in 1m30s
Pull out prometheus network receive error alerts for OPNsense to exclude
its WireGuard interfaces, which like to throw errors, but which aren't
of importance.
2025-02-06 01:34:45 +01:00
ee66631c2d
grafana: diff. prometheus disk io alerts by host task and disk type
/ Ansible Lint (push) Successful in 1m34s
/ Ansible Lint (pull_request) Successful in 1m32s
Differentiate by host task type (hypervisor or not) and disk type (hard
disk or not), not by whether the host is physical or virtual and then by
disk type.
This is in line with the disk rate alerts changes and allows for
fine-grained adjustments based on the host task type, which actually
matters for these alerts.
2025-02-06 01:13:10 +01:00
9e77a41e3c
grafana: differentiate prometheus disk rate alerts by host task type
/ Ansible Lint (push) Successful in 1m38s
/ Ansible Lint (pull_request) Successful in 1m37s
Not by a mix of host task type (CI server or not) and whether or not the
host is virtual or physical.
Also differentiate only on the duration, not the rate, so as not to
accidentally exclude slow hard disks.
2025-02-06 01:05:05 +01:00
5016407cef
grafana: group prometheus alert rules for better organization
/ Ansible Lint (push) Successful in 1m40s
/ Ansible Lint (pull_request) Successful in 1m37s
2025-02-06 00:12:50 +01:00
6fa896dd3f
Remove job for mumble.c3lingo.org since the endpoint appears to no longer exist
/ Ansible Lint (push) Successful in 1m49s
2025-01-19 21:03:38 +01:00
07dbbf055c
reorganize (config) files and templates into one "resources" dir
This groups the files and templates for each host together and therefore
makes it easier to see all the (config) files for a host.
Also clean up incorrect, unused docker_compose config for mumble and
clean up unused engelsystem configs.
2024-12-08 02:55:25 +01:00