monit-general

Re: Program status testing - alert contains stdout and not stderr


From: Nestor Urquiza
Subject: Re: Program status testing - alert contains stdout and not stderr
Date: Sun, 14 Jul 2013 11:11:24 -0400

Some more information. I removed the timeout and restarted monit. I noticed that the script hung during an rsync, and the default timeout of 5 minutes did not apply. As a consequence, monit reported an error containing stdout content (the stdout content of the next execution, as explained in my previous message).

I will set the timeout explicitly to 30 minutes (the script runs every two hours) and report back tomorrow on any issues.


On Sat, Jul 13, 2013 at 6:14 PM, Nestor Urquiza <address@hidden> wrote:
Hi Jan-Henrik,

I went ahead and created a sample script to make sure this actually works, and I can confirm it does with that simple script. The issue, as the logs show, is apparently the result of a double notification. The script took so long that monit killed it, but the timeout was exactly equal to the time of the next occurrence:
[EDT Jul 13 03:15:51] error    : 'myscript' program timed out after 7230 seconds. Killing program with pid 4407
[EDT Jul 13 03:15:51] error    : 'myscript' Sun Microsystems Inc.     SunOS 5.10      Generic January 2005
You have new mail.

The first is a real error, but from the myscript logs I can see that at 03:15 it did start and ran correctly until it suddenly stopped, presumably because monit killed it. So my best guess at this moment is:
1. Monit receives previous myscript timeout notification at the same time as current myscript run events
2. Monit kills both instances
3. Monit alerts on the timeout and on the killed process; on the latter, however, there is nothing in stderr, so monit falls back to stdout

Clearly I have a workaround, which is setting a timeout shorter than the script's run cycle (2 hours in this script's case).
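
To make the arithmetic behind that workaround concrete, here is a minimal shell sketch. The function name timeout_fits_cycle is hypothetical; the numbers are the 7230 seconds from the log above, the 2-hour cycle, and a 30-minute timeout.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the timeout (seconds) is strictly
# shorter than the run cycle (seconds), so a run that times out is
# killed before the next run is due to start.
timeout_fits_cycle() {
    timeout_s=$1
    cycle_s=$2
    [ "$timeout_s" -lt "$cycle_s" ]
}

cycle=$((2 * 60 * 60))   # the script runs every two hours

# The timeout seen in the log is longer than the cycle.
timeout_fits_cycle 7230 "$cycle" || echo "7230s timeout overlaps the next run"

# A 30-minute timeout leaves plenty of room.
timeout_fits_cycle 1800 "$cycle" && echo "1800s timeout ends before the next run"
```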

On a side note/question, I noticed monit switches to "waiting" for the next occurrence of the script instead of staying in failed status. I would like to run 'monit summary' and know whether the script failed last time or not (and not rely solely on an alert). Is this a feature to be considered? You can reproduce this easily by scheduling a simple bash script and forcing it to exit with status 1, for example.
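
A minimal sketch of such a test script follows. The function name fail_with_status and the messages are made up for illustration. It writes one line to each stream so the alert content can be checked as well.

```shell
#!/bin/sh
# Hypothetical test program body: print one line to stdout and one to
# stderr, then fail with the given status (default 1).
fail_with_status() {
    status=${1:-1}
    echo "test output on stdout"
    echo "test failure on stderr" >&2
    return "$status"
}

# As a standalone script, the last line would be:
#   fail_with_status 1; exit $?
```

Pointing a check like the one quoted below at a script with this body should trigger the alert on every run.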

Thanks!
- Nestor


On Sat, Jul 13, 2013 at 9:02 AM, Jan-Henrik Haukeland <address@hidden> wrote:
On 13 Jul 2013, at 13:39, Nestor Urquiza <address@hidden> wrote:

> check program myscript with path "/usr/local/bin/myscript.sh" with timeout 1000 seconds if status != 0 then alert
> When it fails I get the stdout in the alert instead of stderr. There is a lot of logging in the script and monit is collecting only the first few lines, so the real cause of the issue is not coming up. This is happening on Solaris running 5.5.1.

Monit first reads from the script's stderr; if there is nothing there, _then_ it reads from stdout. Please make sure that your script really writes to stderr if needed. The output (if any) is part of the alert message, and to avoid overly long messages only 255 chars are read. Maybe your script could do some processing of the error and only write the relevant part to stderr?
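
One way to do that could look like the sketch below. The helper names report_error and run_step are hypothetical, and the 255-character cut simply mirrors the limit mentioned above. Verbose logging stays on stdout, and only a short failure summary goes to stderr.

```shell
#!/bin/sh
# Hypothetical helper: write a short failure summary to stderr,
# truncated to 255 characters (the limit mentioned above).
report_error() {
    printf '%.255s\n' "$1" >&2
}

# Hypothetical wrapper: run one command, log progress on stdout and
# report only the failure summary on stderr.
run_step() {
    echo "starting step"
    if ! "$@" >/dev/null 2>&1; then
        report_error "step failed: $*"
        return 1
    fi
    echo "step finished"
}
```

Inside the monitored script, something like run_step rsync ... || exit 1 would then leave a single relevant line in stderr; the rsync arguments are left out here on purpose.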


