Re: [monit] how to specify this tricky situation
From: Martin Pala
Date: Sat, 21 Jun 2008 12:01:12 +0200

On Jun 13, 2008, at 10:21 PM, David Blank-Edelman wrote:
> Hi-
>
> I've been really happy with monit and appreciate all of the hard
> work put into it. I recently encountered a scenario for which I
> can't seem to find an elegant approach, and I'm wondering if you
> have any suggestions about the following.
>
> I have a piece of software I am trying to monitor which consists of
> two daemons, which I'll call "spam" and "proxy".
>
> I want to check that two things are always true:
>
> 1) the SMTP service on port 25 (provided by "proxy") is available
> 2) the process "spam" is running, according to its pid file
>
> That's all very easy to do using monit. Here is where it gets more
> interesting.
>
> I have three vendor-supplied scripts available to me:
>
>   start-software
>   stop-software
>   restart-software (calls stop-software and then start-software)
>
> start-software and stop-software always start and stop _both_
> daemons.
>
> Where I seem to get into trouble is with the two stanzas of my
> config file fighting each other. Let's say the "spam" process goes
> down. Monit attempts to restart it, but this has the side effect of
> bringing down the proxy daemon, and monit then attempts to correct
> the lack of SMTP service by, you guessed it, bringing down the spam
> process. And so on... The other part of this that I think is biting
> me is that at least part of the process is asynchronous, so some of
> these corrective actions overlap.
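
For illustration, the two-stanza setup described above might look roughly like the sketch below. The /opt/vendor/bin paths and the proxy pid file are hypothetical, since the thread does not give them. Because both stanzas point at the same stop and start scripts, restarting either service also takes down the other daemon, which the other stanza can then see as a failure and try to "fix" in turn.

--8<--
# Hypothetical sketch of the conflicting two-stanza setup; all paths are assumptions.
check process spam with pidfile /var/run/spam.pid
  start program = "/opt/vendor/bin/start-software"
  stop program = "/opt/vendor/bin/stop-software"

# Assumes "proxy" also writes a pid file at this (hypothetical) path.
check process proxy with pidfile /var/run/proxy.pid
  start program = "/opt/vendor/bin/start-software"
  stop program = "/opt/vendor/bin/stop-software"
  if failed port 25 protocol smtp then restart
--8<--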

Hi.

Yes, actions could overlap. This problem is addressed in monit-5.0
(currently in beta):
http://www.tildeslash.com/monit/download/

Monit 5.0 will most probably solve the problem: it waits for a
service to start before testing the next service.

> Ideally I'd love to construct a single stanza that says (atomically)
> "if either #1 or #2 fails, attempt a restart". I would think that I
> could use dependencies to help with this, but the problem is that
> the two daemons are mutually dependent (because of how they are
> started and stopped). I also considered putting #1 and #2 in the
> same group, but as far as I can tell groups aren't actually
> accessible from the config file (i.e. you can't say "restart group"
> from anything but the command line).

Since the restart script is shared, you can use a single service entry
in monit that combines the SMTP port check and the process check,
something like:
--8<--
check process spam_proxy with pidfile /var/run/spam.pid
start program = ...
stop program = ...
if failed port 25 protocol smtp then restart
--8<--
If either check fails, monit will restart the combined service
(running the stop program and then the start program), which
recovers both daemons.
Martin
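
A slightly fuller sketch of the combined entry follows, with the vendor scripts at a hypothetical /opt/vendor/bin (the real paths are elided in the message above). In a "check process" entry that defines start and stop programs, monit already acts on a missing process by itself, so check #2 needs no explicit test line; the port test covers check #1.

--8<--
# Hypothetical paths; adjust to wherever the vendor scripts are installed.
check process spam_proxy with pidfile /var/run/spam.pid
  # Check #2: monit acts on a missing "spam" process without an explicit test.
  start program = "/opt/vendor/bin/start-software"
  stop program = "/opt/vendor/bin/stop-software"
  # Check #1: "proxy" provides SMTP; a restart runs stop-software, then start-software.
  if failed port 25 protocol smtp then restart
--8<--

Because both tests belong to one service, a failure of either one leads to a single stop/start cycle of the whole software, instead of two competing entries each restarting it.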