monit-general

Re: monit watchdog timer restart


From: Martin Pala
Subject: Re: monit watchdog timer restart
Date: Thu, 26 Feb 2004 19:17:27 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040122 Debian/1.6-1

I'm sorry, my suggestion was not correct.

Monit propagates events upwards (which is logical :). For example, if you have the following chain:

A->B->C

where A depends on B, which depends on C: if C fails, all services that depend on it (A and B) inherit the same event (a stop, for example). However, if an event concerns only A, it is not inherited by the services A depends on; they can keep working independently of A's state.
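To make the direction of propagation concrete, here is a toy model in Python. It is a hypothetical sketch for illustration only, not monit's implementation: the function name `affected_services` and the dictionary representation (each service mapped to the services it "depends on") are assumptions. An event raised on a service reaches everything that depends on it, directly or transitively, but never travels down to that service's own dependencies.

```python
from __future__ import annotations

from collections import deque


def affected_services(depends_on: dict[str, list[str]], origin: str) -> set[str]:
    """Return `origin` plus every service that inherits an event raised on it.

    Toy model (not monit's code): events travel "upwards", from a service to
    everything that depends on it, never down to its own dependencies.
    """
    # Invert the graph so we can ask "who depends on this service?"
    dependants: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(service)

    # Breadth-first walk starting at the service where the event happened.
    seen = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for dependant in dependants.get(current, []):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


# The chain from the message: A depends on B, which depends on C.
chain = {"A": ["B"], "B": ["C"], "C": []}
print(sorted(affected_services(chain, "C")))  # ['A', 'B', 'C']
print(sorted(affected_services(chain, "A")))  # ['A']

# The watchdog setup in this thread: the file check depends on the process.
watchdog = {"AppManager_monit": ["AppManager"], "AppManager": []}
print(sorted(affected_services(watchdog, "AppManager_monit")))  # ['AppManager_monit']
```

In the last case the timestamp event stays on AppManager_monit and never reaches the AppManager process, which is consistent with the "stop action does not affect the process" symptom reported later in the thread.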

So in your case the following (corrected) setup should probably work:

 check process AppManager with pidfile "/var/lxs/run/AppManager.pid"
  start program = "/opt/lxs/bin/lxs.sh start AppManager"
  stop program = "/opt/lxs/bin/lxs.sh stop AppManager"

 check file AppManager_monit with path "/var/lxs/run/AppManager.monit"
  if timestamp > 2 minutes then exec "/usr/bin/monit stop AppManager"
  depends on AppManager


=> If the AppManager process fails (and no longer updates its state file), you stop it through monit's interface. This way monit is instructed to stop the service cleanly, including its dependant AppManager_monit (the file check). If you call the service's init script directly, monit will think that the service failed and will start it again, so the following setup is wrong:

  ...
  if timestamp > 2 minutes then exec "/opt/lxs/bin/lxs.sh stop AppManager"
  ...

The principle is common in cluster frameworks generally (for example, Sun Cluster): if a service is under cluster control, you must use the cluster's resource management instead of acting on it directly (to do that, you first need to take the service out of cluster control).
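The difference between the two ways of stopping can be sketched as a tiny state model. This is a hypothetical illustration in Python, not monit's code: the `Service` class and its method names are assumptions, and a real monit poll cycle does far more. The point it captures is the one made above: a stop that goes through monit also takes the service out of monitoring, while a stop done behind monit's back looks like a failure on the next check.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Toy model of one monit-managed service (hypothetical, not monit's code)."""

    name: str
    running: bool = True
    monitored: bool = True

    def stop_via_monit(self) -> None:
        # Like "monit stop <name>": stop the process AND stop watching it.
        self.running = False
        self.monitored = False

    def stop_via_init_script(self) -> None:
        # Like calling "lxs.sh stop <name>" directly: monit is not told.
        self.running = False

    def poll_cycle(self) -> str:
        # One check: a monitored service that is down looks failed and is started.
        if self.monitored and not self.running:
            self.running = True
            return "restarted"
        return "no action"


behind_monits_back = Service("AppManager")
behind_monits_back.stop_via_init_script()
print(behind_monits_back.poll_cycle())  # restarted

through_monit = Service("AppManager")
through_monit.stop_via_monit()
print(through_monit.poll_cycle())  # no action
```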

Martin


Peter Holdaway wrote:
I have tried this and the stop action does not affect the process.

monit.log says...

[ Feb 24 14:32:18] start: (AppManager) /opt/lxs/bin/lxs.sh
[ Feb 24 14:34:37] Event: timestamp test failed for /var/lxs/run/AppManager.monit


monit status says...

Process 'AppManager'                running
File 'AppManager_monit'             not monitored


Is this a bug?

Regards,

  Peter


-----Original Message-----
From: address@hidden
[mailto:address@hidden
On Behalf Of Martin Pala
Sent: Tuesday, 24 February 2004 2:12 PM
To: This is the general mailing list for monit
Subject: Re: monit watchdog timer restart

Hi, the following setup should work in this case:

 check process AppManager with pidfile "/var/lxs/run/AppManager.pid"
  start program = "/opt/lxs/bin/lxs.sh start AppManager"
  stop program = "/opt/lxs/bin/lxs.sh stop AppManager"

 check file AppManager_monit with path "/var/lxs/run/AppManager.monit"
  if timestamp > 2 minutes then stop
  depends on AppManager


Stop events are inherited via the dependency, even though the dependent
file-type service has no stop method defined.


Martin

Peter Holdaway wrote:

Hi,

 I would like some advice on the simplest way to implement a watchdog timer
restart of a process. Perhaps this could also be added to the documentation.

 I have a process that should be regularly updating a file. If it is not, I
can assume the process is broken and should be restarted. In the absence of
a network protocol, this is often the easiest way to instrument a process
for monitoring its readiness to perform work.
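The heartbeat ("watchdog file") idea can be sketched in a few lines of Python. This is a hypothetical illustration: the function names `beat` and `is_stale` are assumptions, and treating a missing file as stale is a choice made for the sketch. The application calls `beat` on every healthy iteration of its main loop; the checker (monit's timestamp test, in this thread) considers the process broken once the file's modification time is older than the threshold.

```python
from __future__ import annotations

import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 2 * 60  # mirrors "if timestamp > 2 minutes"


def beat(path: Path) -> None:
    """Called by the application: create the file or bump its mtime to now."""
    path.touch(exist_ok=True)


def is_stale(path: Path, max_age: float = MAX_AGE_SECONDS, now: float | None = None) -> bool:
    """Checker side: True if the file is missing or older than max_age seconds.

    Treating a missing file as stale is an assumption of this sketch.
    """
    if now is None:
        now = time.time()
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return True
    return now - mtime > max_age


with tempfile.TemporaryDirectory() as tmp:
    heartbeat = Path(tmp) / "AppManager.monit"
    beat(heartbeat)
    print(is_stale(heartbeat))  # False
    three_minutes_ago = time.time() - 3 * 60
    os.utime(heartbeat, (three_minutes_ago, three_minutes_ago))  # simulate a hung process
    print(is_stale(heartbeat))  # True
```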


 In version 3.2 of monit this was accomplished by...

check AppManager with pidfile /var/lxs/run/AppManager.pid
   start program = "/opt/lxs/bin/lxs.sh start AppManager"
   stop program = "/opt/lxs/bin/lxs.sh stop AppManager"
   if timestamp "/var/lxs/run/AppManager.monit" > 2 minute then restart


 Is the following the best way to express this in monit 4.2?


check process AppManager with pidfile "/var/lxs/run/AppManager.pid"
   start program = "/opt/lxs/bin/lxs.sh start AppManager"
   stop program = "/opt/lxs/bin/lxs.sh stop AppManager"

check file AppManager_monit with path "/var/lxs/run/AppManager.monit"
   if timestamp > 2 minutes then exec "/opt/lxs/bin/lxs.sh stop AppManager"
   depends on AppManager


Notice the unusual direction of the dependency. There do not seem to be any
examples of this in the documentation.

This dependency is required so that when "monit stop AppManager" is issued,
the AppManager_monit service is also stopped.

This solution requires two time periods to restart the process: one for the
timestamp and one for the process restart.


TIA

 Peter




