monit-general

Re: Restarting based on load average dangerous?


From: Martin Pala
Subject: Re: Restarting based on load average dangerous?
Date: Fri, 12 May 2006 20:27:59 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060205 Debian/1.7.12-1.1

... there is currently no way in monit to limit consecutive runs of the exec action while the service state stays constant: the exec action runs again on each check once the event ratio that triggers it has been reached.

You can use a state file to limit the exec to one instance, for example:


if loadavg (1min) > 2 for 2 times within 4 cycles
then exec "/bin/bash -c 'test ! -f /tmp/done && /usr/local/bin/over && touch /tmp/done'"
else if passed for 10 cycles
then exec "/bin/bash -c 'test -f /tmp/done && /usr/local/bin/under && rm -f /tmp/done'"
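
If the one-liners get hard to read, the same state-file guard could be moved into a small wrapper script. The following is only a sketch of that idea: the function names and the STATE_FILE, OVER_CMD and UNDER_CMD variables are made up for the example and are not monit settings; the default paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical wrapper: run the "over" action at most once per overload
# episode, and the "under" action only if "over" was run before.
# STATE_FILE, OVER_CMD and UNDER_CMD are illustrative names (assumptions).
STATE_FILE="${STATE_FILE:-/tmp/done}"
OVER_CMD="${OVER_CMD:-/usr/local/bin/over}"
UNDER_CMD="${UNDER_CMD:-/usr/local/bin/under}"

on_over() {
    # State file present: this episode was already handled, do nothing.
    [ -f "$STATE_FILE" ] && return 0
    # Create the state file only if the action itself succeeded.
    "$OVER_CMD" && touch "$STATE_FILE"
}

on_under() {
    # No state file: we never ran the "over" action, so nothing to undo.
    [ -f "$STATE_FILE" ] || return 0
    # Remove the state file only if the action itself succeeded.
    "$UNDER_CMD" && rm -f "$STATE_FILE"
}

# Dispatch on the first argument; with no argument nothing is run.
case "${1:-}" in
    over)  on_over ;;
    under) on_under ;;
esac
```

Assuming the script were installed under some path of your choice, the two exec lines would call it with "over" and "under" as the argument instead of the inline test/touch/rm commands.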


Martin



Micah Anderson wrote:
Martin Pala wrote:

Yes, see monit manual ... for example:

if loadavg(1min) > 25 for 8 times within 10 cycles
 then exec "/usr/bin/monit apachectl stop"
else if passed for 20 cycles
 then exec "/usr/bin/monit apachectl start"


I think you do not mean to put /usr/bin/monit in the exec line, right?

I tried this, and it didn't work as I expected. I set it like this:

  if loadavg (1min) > 2 for 2 times within 4 cycles
    then exec "/usr/local/bin/over"
  else if passed for 10 cycles
    then exec "/usr/local/bin/under"


(my cycles are 30 seconds).

For the first two cycles that the load was over '2' it did as I expected:
it exec'd after the second one. However, it *continued* to exec every 30
seconds. This is exactly what I do not want. I want monit to see that the
load is above 'x' within 'y' cycles, and if so, stop the service. Once
it has issued the stop, I want it to monitor the load and, once it has
dropped below 'x', start the service again (but only start it if it
previously stopped it):

May 12 13:02:27 black monit[16951]: 'localhost' loadavg(1min) of 14.5 matches resource limit [loadavg(1min)>2.0]
May 12 13:02:51 black monit[16951]: Monit has not changed
May 12 13:02:51 black monit[16951]: 'localhost' loadavg(1min) of 19.8 matches resource limit [loadavg(1min)>2.0]
(here exec was run)
May 12 13:03:21 black monit[16951]: 'localhost' loadavg(1min) of 16.7 matches resource limit [loadavg(1min)>2.0]
(here exec was run)
May 12 13:03:51 black monit[16951]: 'localhost' loadavg(1min) of 12.7 matches resource limit [loadavg(1min)>2.0]
(here exec was run)
May 12 13:04:21 black monit[16951]: 'localhost' loadavg(1min) of 12.4 matches resource limit [loadavg(1min)>2.0]
(here exec was run)



Martin


Micah Anderson wrote:

If the load on your system goes above a threshold and you know that
this is a result of a runaway process that needs to be restarted, will
this cause the process to be restarted over and over because the 1
minute load average will not drop fast enough to get below the threshold:

check system localhost
 if loadavg (1min) > 25 then exec "/usr/bin/monit apachectl restart"

I'm afraid that the load will climb to 35, monit will see this and
apache will be restarted, next cycle monit will see that the load
average is 30 (because it is going down), and it will issue a restart
*again*, the load will continue to drop, monit will see it's now 26 and
restart apache a third time, when really there is no load problem as
the load delta is dropping.

Is there a way to make a load dependency that says, "If load gets
above 25, stop this process, once the load drops back down below 10
things are probably back to normal, so start the process again."?

Thanks,
Micah





--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general



