Monit daemon Hangs
From: Thomas Vaccarino
Subject: Monit daemon Hangs
Date: Wed, 10 Oct 2012 08:30:56 -0400
Hello,
I am in the process of deploying Monit 5.5 on CentOS 5 across hundreds of
machines. This number will grow to well over a thousand machines once the work
is complete. I've been running into a situation where Monit hangs after a
monit reload is issued. The hang is intermittent, but when it happens, a monit
quit followed by a monit gets everything running again. To support a hands-off
deployment of Monit across so many systems, two RPMs were written to deploy
the binaries and the configs. The config RPM changes the most often, so mass
deployments of it are not uncommon. However, I am running into cases where,
after the config RPM is installed and a monit reload is issued, any subsequent
command to monit (e.g. monit summary) fails. The log snippet below provides
some details.
Please note that the monit daemon is running when the following messages appear
in the logs.
[EDT Oct 10 06:48:32] error : monit: Openssl read timeout error!
[EDT Oct 10 06:48:32] error : monit: Cannot connect to the monit daemon. Did you start it with http support?
[EDT Oct 10 06:48:53] error : 'service goes here' process is not running
[EDT Oct 10 06:49:16] error : monit: Openssl read timeout error!
[EDT Oct 10 06:49:16] error : monit: Cannot connect to the monit daemon. Did you start it with http support?
[EDT Oct 10 06:55:11] info : Shutting down monit HTTP server
[EDT Oct 10 06:55:11] info : monit HTTP server stopped
[EDT Oct 10 06:55:11] info : monit daemon with pid [12988] killed
[EDT Oct 10 06:55:11] info : 'hostname goes here' Monit stopped
[EDT Oct 10 06:56:31] error : monit: Status not available -- the monit daemon is not running
[EDT Oct 10 06:56:33] info : Starting monit daemon with http interface at [*:2812]
[EDT Oct 10 06:56:33] info : Starting monit HTTP server at [*:2812]
[EDT Oct 10 06:56:33] info : monit HTTP server started
A restart fixes it:
address@hidden ~]# monit quit
monit daemon with pid [12988] killed
address@hidden bin]# monit
Starting monit daemon with http interface at [*:2812]
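
The reload, probe, and restart sequence described above could be automated along these lines. This is only a minimal sketch: it assumes Python 3 is available on the hosts and that the monit command line exits non-zero (or hangs) when it cannot reach the daemon, which has not been verified against Monit 5.5. The function names, the settle delay, and the timeout values are hypothetical.

```python
import subprocess
import time


def _ok(run, args, timeout):
    """Run a command; True only if it exits 0 within `timeout` seconds."""
    try:
        result = run(args, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A hung CLI or a missing binary both count as "not responding".
        return False
    return result.returncode == 0


def reload_with_fallback(run=subprocess.run, sleep=time.sleep,
                         settle=5, timeout=30):
    """Issue `monit reload`, then probe the daemon with `monit summary`.

    If the probe fails, fall back to `monit quit` followed by `monit`.
    Returns "reloaded", "restarted", or "failed".
    """
    _ok(run, ["monit", "reload"], timeout)
    sleep(settle)  # assumed grace period for the daemon to re-read its config
    if _ok(run, ["monit", "summary"], timeout):
        return "reloaded"

    # The daemon did not answer: apply the manual workaround.
    _ok(run, ["monit", "quit"], timeout)
    sleep(settle)
    _ok(run, ["monit"], timeout)
    sleep(settle)
    return "restarted" if _ok(run, ["monit", "summary"], timeout) else "failed"
```

Calling reload_with_fallback() from the config RPM's post-install step, in place of a bare monit reload, would at least make the hang visible through the return value. It does not address the cause of the hang.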
Monit is deployed with SSL and PAM enabled. The issue doesn't happen every
time: in a plant of about 35 hosts I've had as many as 3 cases where the daemon
stopped responding to commands after the reload. To make things more
interesting, the pre- and post-install scripts of the application RPMs run a
monit unmonitor to cut down on the number of e-mails while an application
deployment is in progress. As it stands right now, when the monit daemon isn't
responding, this causes me problems when the unmonitor is run from those
scripts. I can probably work around this, but it would be great if the
hanging issue could be solved.
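
One possible shape for that workaround is a small wrapper that the RPM scripts call instead of invoking monit directly, so that a daemon that is not responding cannot stall or fail the package install. This is a sketch under assumptions: Python 3 on the host, a hypothetical helper name unmonitor_quietly, and an arbitrary 20-second timeout.

```python
import subprocess
import sys


def unmonitor_quietly(service, run=subprocess.run, timeout=20):
    """Try `monit unmonitor <service>` without ever raising.

    Returns True if the command exited 0 within `timeout` seconds,
    False if it failed, hung, or could not be started.
    """
    try:
        result = run(["monit", "unmonitor", service],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Log and carry on; the application deployment should still proceed.
        print("monit unmonitor %s did not complete: %s" % (service, exc),
              file=sys.stderr)
        return False
    return result.returncode == 0
```

A script using this helper would ignore a False result apart from logging it. The cost is that alert e-mails may still be sent for that deployment if the unmonitor did not take effect.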
Thanks for the help.
Tom Vaccarino