monit-general

Re: [monit] sending a SIGKILL from monit for handling stale mongrel pids


From: Martin Pala
Subject: Re: [monit] sending a SIGKILL from monit for handling stale mongrel pids
Date: Tue, 16 Oct 2007 21:22:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 Iceape/1.1.4 (Debian-1.1.4-1)

timeout disables monitoring of the process and sends an alert. The idea is that if the service has been in an error state for too long and all repeated automatic recovery attempts have failed, it makes no sense to keep trying over and over, so monit can stop monitoring and alert the operator.

Regarding the kill ... you can use the "exec" action like this:

  if cpu is greater than 80% for 5 cycles then exec "/bin/pkill mongrel"

or, to be more specific (reusing the pid file):

if cpu is greater than 80% for 5 cycles then exec "/bin/bash -c 'kill -9 `cat /var/run/mongrel_cluster/mongrel.9006.pid`'"
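
If you would rather give the process a chance to exit cleanly before the kill -9, the same idea can go into a small helper script. This is only a sketch: the function name kill_stale_pid, the 5 second default grace period, and the script path in the usage note are assumptions for illustration, not part of monit or mongrel.

```shell
#!/bin/sh
# kill_stale_pid PIDFILE [GRACE_SECONDS]
# Hypothetical helper: send SIGTERM to the pid stored in PIDFILE, wait up to
# GRACE_SECONDS (default 5) for it to exit, then fall back to SIGKILL.
kill_stale_pid() {
    pidfile=${1:-}
    grace=${2:-5}

    # No readable pid file: nothing to kill.
    [ -r "$pidfile" ] || return 0

    pid=$(cat "$pidfile")
    # Refuse anything that is not a plain decimal pid.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Ask politely first; if the signal cannot be sent, the process is gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process exits or the grace period ends.
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# When run as a script, act on the pid file given on the command line.
if [ "$#" -gt 0 ]; then
    kill_stale_pid "$@"
fi
```

Assuming it is saved as /usr/local/bin/kill_stale_pid.sh (a made-up path), the exec action would then be: if cpu is greater than 80% for 5 cycles then exec "/usr/local/bin/kill_stale_pid.sh /var/run/mongrel_cluster/mongrel.9006.pid"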

Martin


Michael Steinfeld wrote:
So maybe I am a complete idiot... but here is what I have been pondering

Every once in a while it seems that monit will attempt to restart
mongrels if it meets the specified criteria: CPU too high for too long,
too much RAM, etc.

What happens is monit will attempt to restart mongrels, but the pids
are not dying. Even if I do, "monit -g group stop all" and wait...
they don't die. Even attempting to stop the process by itself doesn't
work. So I have to send a SIGKILL

(I have not been able to figure out what is causing this )

So I was thinking of having monit send a SIGKILL if 5 cycles don't
solve the issue.

#my monit service for mongrels
check process mongrel_9006
  with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
  start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
  stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
  if totalmem is greater than 110.0 MB for 3 cycles then
    restart                # eating up memory?
  if loadavg(5min) greater than 10 for 8 cycles then
    restart                # bad, bad, bad
  if cpu is greater than 50% for 2 cycles then
    alert                  # send an email to admin
  if cpu is greater than 80% for 3 cycles then
    restart
  if 10 restarts within 10 cycles then
    timeout

Instead of ..

<snip>
  if cpu is greater than 50% for 2 cycles then
    alert                  # send an email to admin
   if cpu is greater than 80% for 3 cycles then
</snip>

do this ...

<snip>
if cpu is greater than 50% for 2 cycles then
  alert                  # complain about it
if cpu is greater than 80% for 5 cycles then
 sigkill
      sleep 5 # enough time to kill all 8 mongrel pids
         start_fresh
</snip>

#so it would look like this... you get the idea.
#my monit service for mongrels
check process mongrel_9006
  with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
  start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
  stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"

  kill_the_bastard = "kill -9 <pid>"  # hmpf...

  if totalmem is greater than 110.0 MB for 3 cycles then
    restart                # eating up memory?
  if loadavg(5min) greater than 10 for 8 cycles then
    restart                # bad, bad, bad
  if cpu is greater than 50% for 2 cycles then
    alert                  # complain about it

  if cpu is greater than 80% for 5 cycles then
    kill_the_bastard
    # I am assuming that if it is killed, then monit will start it

  if 10 restarts within 10 cycles then
    timeout

So, question: does 'timeout' actually send a SIGTERM/SIGHUP to the
process, or does it just execute the stop command for that particular
service?

How are you guys handling stale pids with monit, in the case that
executing stop/restart doesn't work?

Is what I am suggesting even possible?




