Re: Not understanding 'Program Status Testing'

From: Paul Theodoropoulos
Date: Wed, 23 Jul 2014 13:06:24 -0700
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0

On 7/22/14 2:32 AM, Vincent WATREMEZ wrote:

Hi Paul,

By any chance, does the user running monit have the right privileges to run `bucardo status`?

Also, you might debug the status command by displaying its output to STDERR.

Regards

Vincent
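The suggestion to send the status command's output to STDERR could be sketched roughly as below. The helper name debug_run is hypothetical, and the sketch assumes a POSIX shell; it is meant for running by hand or with stderr redirected to a file, not as a statement about how monit itself handles program output.

```shell
#!/bin/sh
# debug_run (hypothetical helper): run the given command, copy its combined
# stdout/stderr to our stderr, and return the command's own exit status.
# Note: POSIX sh has no local variables, so "out" and "rc" are global.
debug_run() {
    echo "PATH=$PATH" >&2            # show the search path the command sees
    if out=$("$@" 2>&1); then
        rc=0
    else
        rc=$?
    fi
    printf '%s\n' "$out" >&2         # captured output goes to stderr
    echo "exit status: $rc" >&2
    return "$rc"
}
```

For example, `debug_run bucardo status` would print the PATH in effect, whatever the command wrote, and its exit status, which makes a "command not found" failure easy to spot.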

Thanks Vincent - I figured out part of it. It wasn't privileges, it was paths: monit runs programs with a very strict and limited PATH. When I ran the script myself as root on the command line, it worked fine, but under monit the test script needed every path spelled out:

#!/bin/sh
/usr/local/bin/bucardo.pl status |/bin/grep trumgr|/usr/bin/cut -d"|" -f4|/bin/grep ".m" >/dev/null 2>&1
exit $?
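An alternative to spelling out every binary is to set PATH once at the top of the script. The following is only a sketch: the directory list is an assumption based on the paths used above, and check_sync is a hypothetical name for the same grep/cut/grep test wrapped in a function.

```shell
#!/bin/sh
# Assumption: these directories hold bucardo.pl, grep, and cut on this host
# (taken from the absolute paths in the script above).
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# check_sync (hypothetical name): read status text on stdin and succeed
# (exit 0) when the fourth "|"-separated field of a line matching $1
# contains "<any char>m". The pipeline's status is that of the last grep.
check_sync() {
    grep "$1" | cut -d"|" -f4 | grep ".m" >/dev/null 2>&1
}
```

The script would then end with `bucardo.pl status | check_sync trumgr`; because that is the last command, its exit status becomes the script's exit status without an explicit `exit $?`.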

Now monit correctly understands when the test fails:

Program 'bucardo.monitor'
status                            Status failed
monitoring status                 Monitored
last started                      Wed, 23 Jul 2014 13:01:36
last exit value                   0
data collected                    Wed, 23 Jul 2014 13:01:36

Which is all great - except that it never generates an alert! I've confirmed that my other checks generate alerts - only this one fails to do so. I have of course tried reversing the status checks, etc - no joy. So I'm still stuck.

2014-07-21 23:44 GMT+02:00 Paul Theodoropoulos <address@hidden>:
I have a daemon whose status I want to monitor for a specific condition. I've created the following script called 'bucardo.monitor':

#!/bin/sh
bucardo status |grep mydb|cut -d"|" -f4| grep ".m" >/dev/null 2>&1
exit $?

In short, if the string "(one char)m" exists, I wish to get an alert. When I run the script from the command line, and the string I'm looking for exists, I get the following expected output:

me# bucardo.monitor;echo $?
0
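A run from an interactive shell inherits the login environment, so it may not reflect what a monitoring daemon sees. One way to approximate a sparse environment by hand is sketched below. The helper name run_sparse is hypothetical, and the PATH value is an assumption about what monit hands to child programs; check the documentation for the monit version in use.

```shell
#!/bin/sh
# run_sparse (hypothetical helper): run a command with an empty environment
# apart from a short PATH.
# Assumption: this PATH approximates the one monit gives child programs.
run_sparse() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}
```

For example, `run_sparse /usr/local/bin/bucardo.monitor; echo $?` would show whether the script still exits 0 when commands such as bucardo are no longer on the search path.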

I created a monit conf file thus:

alert address@hidden with reminder on 5 cycle
alert address@hidden with reminder on 5 cycle
check program bucardo-monitor with path /usr/local/bin/bucardo.monitor
with timeout 3 seconds
if status = 0 then alert

The manual states that the operator should be "==", but the last example under status uses only a single equals sign. I've tried both, with no difference. I've also used just "if status 0 then alert" as suggested in the manual, again with no difference.
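Another way to frame the test is to invert it in the script, so that a non-zero exit means "problem found" and the rule can be written as "if status != 0 then alert". This is a sketch of that inversion only, with alert_if_found as a hypothetical name; nothing in this thread confirms that it changes the alerting behaviour described here.

```shell
#!/bin/sh
# alert_if_found (hypothetical name): read status text on stdin and invert
# the original test. Returns 1 when the fourth "|"-separated field of a line
# matching $1 contains "<any char>m", and 0 otherwise.
alert_if_found() {
    if grep "$1" | cut -d"|" -f4 | grep ".m" >/dev/null 2>&1; then
        return 1    # string present: report failure so the rule fires
    fi
    return 0        # string absent: everything is fine
}
```

The script would end with `bucardo status | alert_if_found mydb`, so its exit status is 1 exactly when the string is present.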

The problem is that monit always shows a last exit status of "1" - except for a few moments after issuing 'monit reload' to deploy changes to the script:

Program 'bucardo-monitor'
status                            Status ok
monitoring status                 Monitored
last started                      Mon, 21 Jul 2014 14:40:47
last exit value                   1
data collected                    Mon, 21 Jul 2014 14:40:47

I've forced the test to be highly sensitive so that it will change from an exit of 0 to 1 every few minutes, well within my monitoring window - but again, I never get a status other than 1 in monit status, and thus never get an alert.

Am I doing something wrong? Misunderstanding?

-- Paul Theodoropoulos www.anastrophe.com

-- 
Paul Theodoropoulos
www.anastrophe.com