monit-general

Monit daemon Hangs


From: Thomas Vaccarino
Subject: Monit daemon Hangs
Date: Wed, 10 Oct 2012 08:30:56 -0400

Hello,

I am in the process of deploying Monit across hundreds of machines, a number
that will grow to well over a thousand once the work is complete. I've been
running into a situation where Monit hangs after a monit reload is issued.
It happens at random, but when it does, a monit quit followed by a monit gets
everything running again. To support a hands-off deployment across so many
systems, I wrote two RPMs: one for the binaries and one for the configs. The
config RPM changes most often, so mass deployments of it are not uncommon.
In some cases, after the config RPM is installed and a monit reload is issued,
any subsequent monit command (monit summary, for example) fails. The log
snippet below provides some details.

I am deploying Monit 5.5 on CentOS 5.

Please note that the monit daemon is running when the following messages appear 
in the logs.

[EDT Oct 10 06:48:32] error    : monit: Openssl read timeout error!
[EDT Oct 10 06:48:32] error    : monit: Cannot connect to the monit daemon. Did you start it with http support?
[EDT Oct 10 06:48:53] error    : 'service goes here' process is not running
[EDT Oct 10 06:49:16] error    : monit: Openssl read timeout error!
[EDT Oct 10 06:49:16] error    : monit: Cannot connect to the monit daemon. Did you start it with http support?
[EDT Oct 10 06:55:11] info     : Shutting down monit HTTP server
[EDT Oct 10 06:55:11] info     : monit HTTP server stopped
[EDT Oct 10 06:55:11] info     : monit daemon with pid [12988] killed
[EDT Oct 10 06:55:11] info     : 'hostname goes here' Monit stopped
[EDT Oct 10 06:56:31] error    : monit: Status not available -- the monit daemon is not running
[EDT Oct 10 06:56:33] info     : Starting monit daemon with http interface at [*:2812]
[EDT Oct 10 06:56:33] info     : Starting monit HTTP server at [*:2812]
[EDT Oct 10 06:56:33] info     : monit HTTP server started

Restart fixes it:

address@hidden ~]# monit quit
monit daemon with pid [12988] killed

address@hidden bin]# monit
Starting monit daemon with http interface at [*:2812]
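
For illustration, the manual recovery above could be scripted roughly as follows. This is only a sketch, not something from my deployment: the names safe_reload, daemon_responds, MONIT, RETRIES and DELAY are made up for the example, and it assumes the monit client exits non-zero when it cannot reach the daemon. It relies on the client's own read timeout rather than an external timeout command.

```shell
#!/bin/sh
# Hypothetical helper: reload Monit, then confirm the daemon still answers.
# MONIT, RETRIES and DELAY are illustrative names, not Monit settings.
MONIT="${MONIT:-monit}"
RETRIES="${RETRIES:-3}"
DELAY="${DELAY:-5}"

# Returns 0 as soon as "monit summary" succeeds, 1 after RETRIES failures.
# Assumption: the client exits non-zero when it cannot reach the daemon.
daemon_responds() {
    i=0
    while [ "$i" -lt "$RETRIES" ]; do
        if "$MONIT" summary >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$DELAY"
    done
    return 1
}

# Reload, verify, and fall back to quit + start if the daemon is silent.
safe_reload() {
    "$MONIT" reload
    sleep "$DELAY"
    if daemon_responds; then
        return 0
    fi
    # Same manual recovery as shown above: quit, then start a fresh daemon.
    "$MONIT" quit
    sleep "$DELAY"
    "$MONIT"
    daemon_responds
}
```

Something like this could be called from the config RPM's post-install script in place of a bare monit reload. The return status of safe_reload reflects whether the daemon answered at the end.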

Monit is deployed with SSL and PAM enabled. The issue doesn't happen every
time: in a plant of about 35 hosts, I've had as many as 3 cases where the
daemon stopped responding to commands after the reload. To make things more
interesting, the application RPMs' pre- and post-install scripts run monit
unmonitor to cut down on the number of e-mails while an application
deployment is in progress. As it stands, when the monit daemon isn't
responding, the unmonitor calls in those scripts cause me some issues. I can
probably work around this, but it would be great if the hanging issue could
be solved.
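
The kind of workaround I have in mind for the scriptlets might look like the sketch below. The function name quiet_unmonitor and the MONIT variable are hypothetical, and it again assumes the monit client exits non-zero when the daemon does not answer. The idea is simply that a failed unmonitor should produce a warning, not break the package install.

```shell
#!/bin/sh
# Hypothetical scriptlet helper; MONIT and quiet_unmonitor are illustrative names.
MONIT="${MONIT:-monit}"

# Try to unmonitor a service, but never fail the RPM scriptlet if the
# daemon does not answer. A warning goes to stderr instead.
# Assumption: the client exits non-zero when the command is not accepted.
quiet_unmonitor() {
    service="$1"
    if "$MONIT" unmonitor "$service" >/dev/null 2>&1; then
        return 0
    fi
    echo "warning: monit did not accept unmonitor for $service" >&2
    return 0
}
```

The cost of this approach is that alert e-mails may still go out during a deployment on a host where the daemon is hung, so it only hides the symptom.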

Thanks for the help.

Tom Vaccarino

