monit-general

Re: [monit] sending a SIGKILL from monit for handling stale mongrel pids


From: Michael Steinfeld
Subject: Re: [monit] sending a SIGKILL from monit for handling stale mongrel pids
Date: Tue, 16 Oct 2007 15:43:33 -0400

Perfect! Thanks Martin.
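
For the archive: a minimal sketch of a helper that the exec action could
call instead of the inline bash -c one-liner quoted below. It tries SIGTERM
first and only falls back to SIGKILL after a grace period. The function name
kill_from_pidfile and the 5 second default are assumptions of this sketch,
not anything monit provides, and it has not been run against a real mongrel
cluster.

```sh
#!/bin/sh
# Hypothetical helper (not part of monit): stop the process named in a
# pid file, escalating from SIGTERM to SIGKILL.
# Usage: kill_from_pidfile PIDFILE [GRACE_SECONDS]
kill_from_pidfile() {
    pidfile=${1:-}
    grace=${2:-5}

    # Nothing to do if the pid file is missing or empty.
    [ -s "$pidfile" ] || return 0
    pid=$(tr -d '[:space:]' < "$pidfile")

    # Refuse anything that is not a plain number.
    case $pid in
        ''|*[!0-9]*) echo "bad pid in $pidfile" >&2; return 1 ;;
    esac

    # kill -0 sends no signal; it only checks that the process exists.
    kill -0 "$pid" 2>/dev/null || return 0

    # Ask politely first, then poll once per second.
    kill -TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

Saved as a script that ends with kill_from_pidfile "$@" (the install path
here is made up), the rule would then read something like:

   if cpu is greater than 80% for 5 cycles then exec "/usr/local/bin/kill_stale.sh /var/run/mongrel_cluster/mongrel.9006.pid"

As far as I understand, monit should notice on a later cycle that the
process is gone and run the start program on its own.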

On 10/16/07, Martin Pala <address@hidden> wrote:
> timeout disables monitoring of the process and sends an alert. The idea
> is that if the service has been in an error state for too long and all
> repeated automatic recovery attempts have failed, it makes no sense to
> keep trying, so monit can stop monitoring it and alert the operator.
>
> Regarding the kill ... you can use the "exec" action like this:
>
>    if cpu is greater than 80% for 5 cycles then exec "/bin/pkill mongrel"
>
> or more specific (reusing pid file):
>
>    if cpu is greater than 80% for 5 cycles then exec "/bin/bash -c 'kill -9 `cat /var/run/mongrel_cluster/mongrel.9006.pid`'"
>
> Martin
>
>
> Michael Steinfeld wrote:
> > So maybe I am a complete idiot... but here is what I have been pondering
> >
> > Every once in a while monit will attempt to restart the mongrels when
> > one of the specified criteria is met: CPU too high for too long, too
> > much RAM, etc.
> >
> > What happens is monit will attempt to restart mongrels, but the pids
> > are not dying. Even if I do, "monit -g group stop all" and wait...
> > they don't die. Even attempting to stop the process by itself doesn't
> > work. So I have to send a SIGKILL
> >
> > (I have not been able to figure out what is causing this )
> >
> > So.. I was thinking to have monit send a SIGKILL if 5 cycles doesn't
> > solve the issue.
> >
> > #my monit service for mongrels
> > check process mongrel_9006
> >   with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
> >   start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   if totalmem is greater than 110.0 MB for 3 cycles then
> >     restart                # eating up memory?
> >   if loadavg(5min) greater than 10 for 8 cycles then
> >     restart                # bad, bad, bad
> >   if cpu is greater than 50% for 2 cycles then
> >     alert                  # send an email to admin
> >   if cpu is greater than 80% for 3 cycles then
> >     restart
> >   if 10 restarts within 10 cycles then
> >     timeout
> >
> > Instead of ..
> >
> > <snip>
> >   if cpu is greater than 50% for 2 cycles then
> >     alert                  # send an email to admin
> >    if cpu is greater than 80% for 3 cycles then
> > </snip>
> >
> > do this ...
> >
> > <snip>
> > if cpu is greater than 50% for 2 cycles then
> >   alert                  # complain about it
> > if cpu is greater than 80% for 5 cycles then
> >  sigkill
> >       sleep 5 # enough time to kill all 8 mongrel pids
> >          start_fresh
> > </snip>
> >
> > #so it would look like this... you get the idea.
> > #my monit service for mongrels
> > check process mongrel_9006
> >   with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
> >   start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >
> >   kill_the_bastard = "kill -9 <pid>"  # hmpf...
> >
> >   if totalmem is greater than 110.0 MB for 3 cycles then
> >         restart       # eating up memory?
> >   if loadavg(5min) greater than 10 for 8 cycles then
> >     restart          # bad, bad, bad
> >   if cpu is greater than 50% for 2 cycles then
> >   alert                  # complain about it
> >
> > if cpu is greater than 80% for 5 cycles then
> >  kill_the_bastard
> >    # I am assuming that if it is killed, then monit will start it
> >
> >  if 10 restarts within 10 cycles then
> >     timeout
> >
> > So, a question: does 'timeout' actually send a SIGTERM/SIGHUP to the
> > process, or does it just execute the stop command for that particular
> > service?
> >
> > how are you guys handling stale pids with monit? In the case that
> > executing stop/restart doesn't work?
> >
> > Is what I am suggesting even possible?
> >
>
>
> --
> To unsubscribe:
> http://lists.nongnu.org/mailman/listinfo/monit-general
>


-- 
Michael Steinfeld
Linux Admin/Developer
AIM: mikesteinfeld
GTALK: address@hidden



