There is a release available here on GitHub: 2022.06.02. I started off using that, then migrated to the packages hosted on this mirror:
https://download.opensuse.org/repositories/home:/itrs-op5/CentOS_8_Stream/
In both cases I ran into situations where naemon would crash (and dump a core) and merlind would eventually peg itself at 99% CPU usage. I went in circles for a while trying to determine what was going on. I started to narrow in on service and host checks that returned a CRITICAL state and caused naemon to crash while it was attempting to generate a notification (even though I had notifications disabled globally). During my initial load testing I was using mostly ping checks that all returned OK, so I rarely hit this condition. But the moment I started getting checks that returned CRITICAL, things would break.
Anyway, long story short: I built merlin from source and everything is fine now. But given the run-around I went through, I figured I'd report this here for anyone else who may encounter this problem, or merely as a suggestion that it might be an appropriate time to package a new release.
I did see in the GitHub issues (#146) that there was a 2022.06.30 release, but I never actually found it.
I can re-configure these systems to trigger the issue again pretty easily if you need more info, but since the issue is fixed in the current source code I doubt any further troubleshooting is needed.
Some additional information about the systems where I encountered these problems:
CentOS Stream release 8
4.18.0-527.el8.x86_64 #1 SMP Thu Nov 23 14:16:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
libnaemon-1.4.1-18.1.x86_64
naemon-thruk-1.4.1-13.1.noarch
naemon-livestatus-1.4.1-14.1.x86_64
naemon-1.4.1-13.1.noarch
naemon-devel-1.4.1-18.1.x86_64
naemon-core-1.4.1-18.1.x86_64
naemon-vimvault-1.4.0-3.2.x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.9 (Ootpa)"
4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
naemon-livestatus-1.4.1-14.1.x86_64
naemon-1.4.1-13.1.noarch
libnaemon-1.4.1-18.1.x86_64
naemon-vimvault-1.4.0-3.2.x86_64
naemon-core-1.4.1-18.1.x86_64
naemon-thruk-1.4.1-13.1.noarch
(Both CentOS and RHEL systems were originally fetching naemon from https://labs.consol.de/repo/stable/rhel8/x86_64/ but switched to https://download.opensuse.org/repositories/home:/naemon/CentOS_7/)
It also seems like I may have had this exact same problem on a set of Debian machines on 3/6/2023 after naemon got upgraded there. I suppose the fix slipped my mind!
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
4.19.0-25-amd64 #1 SMP Debian 4.19.289-2 (2023-08-08) x86_64 GNU/Linux
ii libnaemon:amd64 1.4.1-1 amd64
ii naemon 1.4.1-1 amd64
ii naemon-core 1.4.1-1 amd64
ii naemon-dev 1.4.1-1 amd64
ii naemon-livestatus 1.4.1-1 amd64
ii naemon-thruk 1.4.1-1 amd64
ii naemon-vimvault 1.4.0-1 amd64
ii thruk 3.10-1 amd64
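For anyone who wants to compare their own installed packages against the lists above, here is a rough Python sketch that splits `rpm -qa` style entries and `dpkg -l` style rows into their parts and groups the RPM entries by version-release. The function names (`parse_rpm`, `parse_dpkg`, `group_by_version`) are my own and purely illustrative, and the parsing assumes the simple `name-version-release.arch` and `ii name version arch` layouts shown in this report. It is not part of naemon or merlin.

```python
from collections import defaultdict


def parse_rpm(line):
    """Split an 'rpm -qa' style entry into (name, version, release, arch).

    Assumes the layout name-version-release.arch, as in the lists above.
    """
    nvr, arch = line.strip().rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def parse_dpkg(line):
    """Split a 'dpkg -l' style row into (name, version, arch).

    Assumes the columns: status, name[:arch], version, arch.
    """
    _status, name, version, arch = line.split()[:4]
    return name.split(":")[0], version, arch


def group_by_version(packages):
    """Map each 'version-release' string to the sorted package names using it."""
    groups = defaultdict(list)
    for name, version, release, _arch in packages:
        groups[f"{version}-{release}"].append(name)
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    # Sample entries copied from the CentOS list in this report.
    rpm_lines = [
        "libnaemon-1.4.1-18.1.x86_64",
        "naemon-thruk-1.4.1-13.1.noarch",
        "naemon-livestatus-1.4.1-14.1.x86_64",
        "naemon-1.4.1-13.1.noarch",
        "naemon-core-1.4.1-18.1.x86_64",
    ]
    groups = group_by_version(parse_rpm(line) for line in rpm_lines)
    for version, names in sorted(groups.items()):
        print(version, ", ".join(names))
```

This only summarizes what is installed; it says nothing about which combination triggers the crash.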