monit-general

Re: [PATCH] Running an external script with EXEC on a timeout


From: Gael Pourriel
Subject: Re: [PATCH] Running an external script with EXEC on a timeout
Date: Sun, 27 Feb 2005 19:59:55 +0100

Nice one. I looked into the source to do this too, since I'm running
the same setup as you with heartbeat and needed the node to reboot
on a timeout so the other one could take over.
However, I ended up using a different trick that works without modifying
the source. It does require that you control the script that starts
your services.
Basically, whenever I start apache using the apachectl script, I also
append a byte to a temp file (echo >> /tmp/apache_restart). I then use
another test in monit to check the size of this temp file and exec
"reboot" when it reaches a given size (10 bytes would mean 10 restarts).
I've also got another check that deletes the file if its timestamp is
older than a given period.
I agree it takes a lot more tweaking than using the timeout
function. I'll look into your patch to see if I can use it.
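
A minimal sketch of the wrapper side of this trick, assuming a small
shell script sits in front of the real start command. The names
COUNTER_FILE, START_CMD, record_start, restart_count and start_service
are hypothetical and only illustrate the idea; the monit checks on the
file's size and timestamp are not shown here.

```shell
#!/bin/sh
# Hypothetical wrapper: record every start in a counter file, then start
# the service. COUNTER_FILE and START_CMD are assumed names for this sketch.
COUNTER_FILE="${COUNTER_FILE:-/tmp/apache_restart}"
START_CMD="${START_CMD:-apachectl start}"

# A bare `echo` writes a single newline, so each call grows the file by
# exactly one byte: file size in bytes == number of recorded starts.
record_start() {
    echo >> "$COUNTER_FILE"
}

# Print the number of recorded starts (0 if the file does not exist).
restart_count() {
    if [ -f "$COUNTER_FILE" ]; then
        wc -c < "$COUNTER_FILE" | tr -d ' '
    else
        echo 0
    fi
}

# Record the start first, then run the real start command.
start_service() {
    record_start
    $START_CMD
}

# Only act when invoked as: wrapper.sh start
if [ "${1:-}" = "start" ]; then
    start_service
fi
```

Pointing the service's start program at this wrapper (instead of calling
apachectl directly) keeps the counter in step with restarts. Because the
counter is just the file size, monit only needs a size test on the file to
trigger the reboot, and deleting the file resets the count to zero.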

Gael





On Sat, 26 Feb 2005 16:35:22 +1100, Patrick Burns <address@hidden> wrote:
> I need this feature so much I'm trying to put it in myself. I'm
> intending to use monit to look after the nodes in my Heartbeat clusters.
> If something on a node fails (E.g. Apache goes down) monit can try to
> restart it. However if a number of restarts are unsuccessful, it would
> be good to have the node gracefully leave the cluster and initiate a
> fail-over.
> 
> I've got this in /etc/monitrc:
> 
> ---
> set daemon 10
> set alert address@hidden
> check process foo with pidfile /tmp/foo
>         if 3 restarts within 5 cycles then exec /tmp/bar
> ---
> 
> /tmp/bar just contains:
> 
> ---
> #!/bin/bash
> echo Hello World
> ---
> 
> Output looks like this (edited for brevity):
> 
> ---
> mail:~# monit -I -v -c /etc/monitrc
> Runtime constants:
> (removed)
> 
> The service list contains the following entries:
> 
> Process Name          = foo
>  Group                = (not defined)
>  Pid file             = /tmp/foo
>  Monitoring mode      = active
>  Timeout              = If 3 restart within 5 cycles then exec else if
>  recovered then alert
> 
> -------------------------------------------------------------------------------
> Starting monit daemon
> 'foo' process is not running
> Does not exist notification is sent to address@hidden
> monit: Start or stop method not defined -- process foo
> 'foo' process is not running
> monit: Start or stop method not defined -- process foo
> 'foo' process is not running
> monit: Start or stop method not defined -- process foo
> 'foo' service timed out and will not be checked anymore
> Timeout notification is sent to address@hidden
> Monitoring disabled -- service foo
> Hello World
> ^C
> monit daemon with pid [3155] killed
> You have new mail in /var/mail/patrickb
> ---
> 
> You can see the exec worked, as it printed "Hello World" to the console
> after the service timed out.
> 
> If I can exec any arbitrary command after a timeout, there's no reason
> why I can't put in "/etc/init.d/heartbeat stop" to cause the node to
> give up its resources if a service looks terminally broken. (Assuming
> the error hasn't propagated to the other node in the cluster as well...)
> 
> Patch attached...
> 
> --
>   Patrick Burns
>   address@hidden



