monit-general

From: Jeremy Clarke
Subject: Recursive loop of 'monit restart apache' when calling it from inside a system service definition
Date: Fri, 5 Nov 2010 20:12:25 -0400

Hi guys, I have the following setup on a purely-http server:

set daemon  60

check process apache
    with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd restart" with timeout 30 seconds
    stop program = "/etc/init.d/httpd stop"
    if failed host globalvoicesonline.org port 80 protocol http with timeout 25 seconds then alert

check system server1
    if loadavg (1min) > 8 for 1 cycles then exec "/bin/bash -c '/usr/bin/monit restart apache'"
    if memory usage > 70% for 3 cycles then exec "/bin/bash -c '/usr/bin/monit restart apache'"
    if memory usage > 80% then exec "/bin/bash -c '/usr/bin/monit restart apache'"



I got the "/bin/bash -c '/usr/bin/monit restart apache'" part from a thread in this group from a long time ago, and it seems to be doing what you would expect it to, but with one really big caveat: when the monit daemon is woken up by the exec command, it seems to re-check the 'server1' system service immediately and decide to restart apache again. The result is that when my load goes over 8, the server goes into a cycle of restarting apache every 2 seconds until the load has dropped. Obviously this is not desirable behavior.

Ideally I would expect monit to know better somehow, and wait at least the 30 second timeout before trying the restart process again. I'm not even sure what is causing the constant restarts, though it looks like a loop to me.

Here's an example from my monit log (truncated because even one example of this is super long):

[EDT Nov  5 19:32:47] error    : 'server1' loadavg(1min) of 12.0 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:47] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:47] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:47] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:47] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:47] info     : 'apache' trying to restart
[EDT Nov  5 19:32:47] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:48] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:49] info     : 'apache' restart action done
[EDT Nov  5 19:32:49] error    : 'server1' loadavg(1min) of 12.0 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:49] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:49] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:49] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:49] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:49] info     : 'apache' trying to restart
[EDT Nov  5 19:32:49] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:50] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:51] info     : 'apache' restart action done
[EDT Nov  5 19:32:51] error    : 'server1' loadavg(1min) of 12.0 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:51] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:51] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:51] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:51] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:51] info     : 'apache' trying to restart
[EDT Nov  5 19:32:51] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:52] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:53] info     : 'apache' restart action done
[EDT Nov  5 19:32:53] error    : 'server1' loadavg(1min) of 11.0 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:53] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:53] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:53] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:53] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:53] info     : 'apache' trying to restart
[EDT Nov  5 19:32:53] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:54] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:55] info     : 'apache' restart action done
[EDT Nov  5 19:32:55] error    : 'server1' loadavg(1min) of 11.0 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:55] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:55] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:55] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:55] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:55] info     : 'apache' trying to restart
[EDT Nov  5 19:32:55] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:56] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:57] info     : 'apache' restart action done
[EDT Nov  5 19:32:57] error    : 'server1' loadavg(1min) of 10.1 matches resource limit [loadavg(1min)>8.0]
[EDT Nov  5 19:32:57] info     : 'server1' exec: /bin/bash
[EDT Nov  5 19:32:57] info     : restart service 'apache' on user request
[EDT Nov  5 19:32:57] info     : monit daemon at 14056 awakened
[EDT Nov  5 19:32:57] info     : Awakened by User defined signal 1
[EDT Nov  5 19:32:57] info     : 'apache' trying to restart
[EDT Nov  5 19:32:57] info     : 'apache' stop: /etc/init.d/httpd
[EDT Nov  5 19:32:58] info     : 'apache' start: /etc/init.d/httpd
[EDT Nov  5 19:32:59] info     : 'apache' restart action done

----

And so on. It repeated that loop six more times before finally getting the desired result:

[EDT Nov  5 19:33:12] info     : 'server1' 'server1' loadavg(1min) check succeeded [current loadavg(1min)=8.0]
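
To put a number on the loop, here is a rough Python sketch that measures the gap between consecutive "restart action done" lines in a log like the one above. The function name `restart_intervals`, the `year` argument (the log lines carry no year) and the three-line `SAMPLE` excerpt are my own choices for illustration, and the regular expression only assumes the line layout shown in this message.

```python
import re
from datetime import datetime

# Matches lines such as:
# [EDT Nov  5 19:32:49] info     : 'apache' restart action done
LINE_RE = re.compile(r"^\[\w+ (\w+\s+\d+ \d\d:\d\d:\d\d)\]\s+\w+\s+: (.*)$")


def restart_intervals(log_text, year=2010):
    """Return the seconds between consecutive 'restart action done' lines.

    The year is an assumption supplied by the caller, because the
    timestamps in the log do not include one.
    """
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2).endswith("restart action done"):
            # Collapse the double space in "Nov  5" before parsing.
            stamp = " ".join(match.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# Three lines copied from the log excerpt above.
SAMPLE = """\
[EDT Nov  5 19:32:49] info     : 'apache' restart action done
[EDT Nov  5 19:32:51] info     : 'apache' restart action done
[EDT Nov  5 19:32:53] info     : 'apache' restart action done
"""

if __name__ == "__main__":
    print(restart_intervals(SAMPLE))
```

Run against the excerpt, the gaps come out at 2 seconds each, nowhere near the 60 second daemon interval.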

So is there a way I can avoid this behavior? Am I doomed to use a raw exec of the init script if I want to restart apache from the system definition?

In my testing, the Apache process never shows itself as using any RAM or CPU, so I can't test those inside the apache definition to restart it when they become problems.

Is there a way to tell the monit daemon "Please do this restart command, but don't do any of your own diagnostics."?

Maybe there is a way to be more explicit about waiting between restarts than the 30 second timeout on the 'start program' definition? My interval is 60 seconds, so there is no way that monit is respecting that if it is re-running the restart procedure every 2 seconds.

Thanks for any help, and for the great program. It's saved my ass most days since I installed it, but now I'm trying to get it to work properly and logically rather than just barely ;)

--
Jeremy Clarke | http://jeremyclarke.org
Code and Design | http://globalvoicesonline.org
