monit-general

[monit] monit 5.0 beta4 bug - sends same message every cycle


From: Aleksander Kamenik
Subject: [monit] monit 5.0 beta4 bug - sends same message every cycle
Date: Wed, 19 Nov 2008 13:31:34 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Hi,

This is the second time this bug has occurred; the first time was on 13 Nov, also with beta4.


monit detects a high load at 05:12 (expected):

"Monit alert devel.kisise at Wed, 19 Nov 2008 05:12:11 +0200 on devel

  loadavg(1min) of 3.1 matches resource limit [loadavg(1min)>3.0]"

The load stayed that high for only about a minute, but instead of the "resource succeeded" message I got the same alert again on the next cycle (55 s), and on every cycle after that.

I got almost 400 messages, all exactly the same, before I arrived at work at noon and shut down monit.

Running "monit unmonitor all" did not stop the messages from being sent. "monit summary" showed that no services were monitored, but the messages still kept coming.

Shutting down monit stopped the messages, but as soon as I started monit up again, even with the services unmonitored, it started spamming me with the same message again. I tried to monitor and unmonitor again, but this did not help.

So this buggy state survives restarts.

I shut down monit again. Here is my little investigation; note the bunch of *_devel.kisise files:

devel:/var/monit # pwd
/var/monit
devel:/var/monit # ll
total 136
-rw------- 1 root root 154 Nov 18 05:13 1226977982_devel.kisise
-rw------- 1 root root 152 Nov 18 05:39 1226979565_devel.kisise
-rw------- 1 root root 196 Nov 19 05:12 1227064336_devel.kisise
-rw------- 1 root root 154 Nov 19 05:12 1227064376_devel.kisise
-rw------- 1 root root 152 Nov 19 05:38 1227065904_devel.kisise
-rw------- 1 root root 156 Nov 19 10:31 1227083466_apache2_bin
-rw------- 1 root root 157 Nov 19 10:31 1227083466_apache2_init
-rw------- 1 root root 154 Nov 19 10:31 1227083466_bootfs
-rw------- 1 root root 152 Nov 19 10:31 1227083466_cron
-rw------- 1 root root 154 Nov 19 10:31 1227083466_devel.kisise
-rw------- 1 root root 160 Nov 19 10:31 1227083466_mysqld_bin
-rw------- 1 root root 161 Nov 19 10:31 1227083466_mysqld_init
-rw------- 1 root root 164 Nov 19 10:31 1227083466_mysqldsafe_bin
-rw------- 1 root root 157 Nov 19 10:31 1227083466_ntpd_bin
-rw------- 1 root root 158 Nov 19 10:31 1227083466_ntpd_init
-rw------- 1 root root 157 Nov 19 10:31 1227083466_postfix_bin
-rw------- 1 root root 158 Nov 19 10:31 1227083466_postfix_init
-rw------- 1 root root 154 Nov 19 10:31 1227083466_rootfs
-rw------- 1 root root 157 Nov 19 10:31 1227083466_samba_init
-rw------- 1 root root 154 Nov 19 10:31 1227083466_sshd_bin
-rw------- 1 root root 155 Nov 19 10:31 1227083466_sshd_init
-rw------- 1 root root 152 Nov 19 10:31 1227083469_apache2
-rw------- 1 root root 155 Nov 19 10:31 1227083469_mysql
-rw------- 1 root root 153 Nov 19 10:31 1227083469_ntpd
-rw------- 1 root root 153 Nov 19 10:31 1227083469_postfix
-rw------- 1 root root 161 Nov 19 10:31 1227083469_samba_smbd_bin
-rw------- 1 root root 150 Nov 19 10:31 1227083469_smb
-rw------- 1 root root 150 Nov 19 10:31 1227083469_sshd
-rw------- 1 root root 146 Nov 19 10:37 1227083853_devel.kisise
-rw------- 1 root root 146 Nov 19 11:22 1227086528_devel.kisise
-rw------- 1 root root 146 Nov 19 13:16 1227093413_devel.kisise
-rw------- 1 root root 152 Nov 19 13:17 1227093437_devel.kisise
-rw------- 1 root root 154 Nov 19 13:19 1227093593_devel.kisise
-rw------- 1 root root 146 Nov 19 13:21 1227093677_devel.kisise
devel:/var/monit # grep 3.1 *
Binary file 1227064336_devel.kisise matches
devel:/var/monit # strings 1227064336_devel.kisise
devel.kisise
loadavg(1min) of 3.1 matches resource limit [loadavg(1min)>3.0]
devel:/var/monit #
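
The same inspection can be scripted. The following is only a rough sketch in Python, built on assumptions drawn from the listing above rather than from monit's source: that each file in /var/monit is named "<unix timestamp>_<service name>", and that the alert text is stored as plain ASCII somewhere inside an otherwise binary file. Under that reading, 1227064336 decodes to 05:12:16 +0200 on 19 Nov 2008, which agrees with the time shown by ll. The function names and the fixed +0200 offset are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed filename layout, inferred from the directory listing only:
# "<epoch seconds>_<service name>", e.g. "1227064336_devel.kisise".
QUEUE_NAME = re.compile(r"^(\d+)_(.+)$")

# The mail headers show +0200; adjust for another host.
LOCAL_TZ = timezone(timedelta(hours=2))


def parse_queue_name(name, tz=LOCAL_TZ):
    """Split a filename into (timestamp, service name), or return None."""
    match = QUEUE_NAME.match(name)
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match.group(1)), tz)
    return when, match.group(2)


def printable_strings(data, min_len=4):
    """Rough stand-in for strings(1): runs of printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


def summarize_queue(directory):
    """Group the files in `directory` by service name.

    Files are visited in name order, which is chronological as long as
    all timestamp prefixes have the same number of digits.
    """
    by_service = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        parsed = parse_queue_name(path.name)
        if parsed is None or not path.is_file():
            continue
        when, service = parsed
        by_service[service].append((when, printable_strings(path.read_bytes())))
    return dict(by_service)


if __name__ == "__main__":
    for service, entries in summarize_queue("/var/monit").items():
        print(service)
        for when, texts in entries:
            print("  ", when.isoformat(), texts)
```

The files are mode 0600 and owned by root, so the script would have to run as root. It only reads the files; it does not modify or delete anything.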

The only fortunate thing about this is that devel.kisise is the only box which sends only emails and no SMS. :)

This error obviously does not occur every night; tonight was only the second time. Last time a proper restart of monit cleared the bug, but this time it did not.

The box is running SLES10SP2 x86. This is monit 5.0 beta4; I'd guess this bug was introduced in one of the most recent betas.

If you need any more info, ask.

Regards,

--

Aleksander Kamenik
System Administrator
Krediidiinfo AS
an Experian Company
Phone: +372 665 9649
Email: address@hidden

http://www.krediidiinfo.ee/
http://www.experiangroup.com/



