Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Wanted to add alerting for systemd services in Prometheus today, which spontaneously turned out to be a huge pain in the lower human backend.
For some reason, on Ubuntu 16.04 systemd adds services without unit files for software, that isn't even installed on the damn server (in this case for mysql-server / mysql-common and mysql-client are installed) and lists them as "not-found" and "inactive". The prometheus node exporter that we use, has a little bug in the systemd collector that makes sure that the states of *all* services are collected - even those without a unit file.
so those metrics are pulled by prometheus and now I have to take with those faulty metrics in the condition logic of the alert, because I'm trying to trigger that one on a service which is listed with state "active" = 0 or "failed" = 1.
now guess. right! If the unit file doesn't exist, the regarded systemd service is marked as "inactive", which is another possible state of the metrics in the node exporter. the problem is that the value 1 for state "inactive" means, that "active" has the value 0 (not even wrong) and the alert is triggered.
so systemd fucks up somehow, the node exporter collector fucks up because systemd fucked up and I have to unfuck this with some crazy horse shit logic. w.t.f. to that.
the only good news is, that it works like a charm on Ubuntu 18.04, as far, as I can tell.
while writing this little rant, I thought of a solution.
I could try to change the alert condition to state "active" = 0 AND "failed" = 1.. but that will wait till tomorrow.
one does not simply patch monitoring conditions at midnight..
rant