monit-general

Re: [monit] Monitoring processes without a pidfile


From: Daniel Clark
Subject: Re: [monit] Monitoring processes without a pidfile
Date: Tue, 22 Jul 2008 16:04:12 -0400

On 7/21/08, Martin Pala <address@hidden> wrote:
> Currently the workaround is to create the pidfile from startup script:
>  http://www.tildeslash.com/monit/doc/faq.php

The example script didn't work for me, because the service forks off
another process as part of becoming a daemon, so the PID recorded in the
pidfile no longer matches the running daemon. The ugly shell code below
seems to do the trick, but makes me think I should just use
runit/daemontools for this after all...

#!/bin/sh

ME="$(basename "$0")"
PIDFILE="/var/run/$ME.pid"

DIR="/usr/local/logpp-0.15"
CMD="$DIR/bin/logpp -d -r 5 -t www.example.com -l debug $DIR/etc/$ME.conf"

PEXISTS=1
if [ -f "$PIDFILE" ]; then
    # Redirect stdout first, then stderr, so that both are discarded.
    ps -p "$(cat "$PIDFILE")" >/dev/null 2>&1
    PEXISTS=$?
fi

case $1 in
    start)
           if [ $PEXISTS -eq 0 ]; then
               echo "$ME: already started; exiting with error..."
               exit 1
           fi
           if [ -f $PIDFILE ]; then
               echo "$ME: removing stale $PIDFILE"
               rm $PIDFILE
           fi
           # Send both stdout and stderr of the daemon to the output file.
           $CMD >/tmp/$ME.out 2>&1 &
           sleep 1
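           # pgrep -n picks the newest process owned by root whose full
           # command line matches $CMD (treated as a regular expression).
           pgrep -n -U root -f "$CMD" > "$PIDFILE"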
           RC=$?
           if [ $RC -ne 0 ]; then
               rm $PIDFILE
               echo "$ME: failed to find PID of running command."
               echo "$ME: you may need to manually kill a process."
               exit 2
           fi
           ;;
     stop)
           if [ $PEXISTS -ne 0 ]; then
               echo "$ME: already stopped; exiting with error..."
               exit 3
           fi
           PID="$(cat $PIDFILE)"
           kill -15 $PID 2>/dev/null
           RC=$?
           if [ $RC -ne 0 ]; then
               echo "$ME: Couldn't kill -15 $PID; will try kill -9..."
           else
                rm $PIDFILE
                exit 0
           fi
           kill -9 $PID 2>/dev/null
           RC=$?    # without this, RC still holds the kill -15 status
           if [ $RC -ne 0 ]; then
                echo "$ME: Couldn't kill -9 $PID; exiting with error."
                exit 4
           else
                rm $PIDFILE
                exit 0
           fi
           ;;
        *)
           echo "usage: $ME {start|stop}" >&2
           exit 5
           ;;
esac

>  If you need to mangle the service, you should do it via monit (like "monit
> restart <service>"), since the service is controlled by monit and if, for
> example, the service is stopped by a 3rd party process without monit knowing
> about it, monit will start it again.

Good point, although in practice bypassing monit doesn't seem to cause
problems in some cases, since the timing for things to go wrong would
need to be pretty specific. Unfortunately, one of the processes I am
using monit for (sphinx search's "searchd" daemon) pretty much requires
its companion "indexer" program to be run with a switch (--rotate) that
does a kill -HUP on the running "searchd". It has been running that way
for a few weeks, though, with no monit email so far.

>  We are also planning to add support for services controlled directly by
> monit - monit then won't need the pidfile (it will be parent of such
> services and will know the pid).

Ah, so perhaps if I wait long enough monit will have this
runit/daemontools feature... sweet! :-)

>  The regular expression for process name won't be very reliable, since it
> could be easily cheated by any user (starting a process with a matching name)
> and there can also be cases where multiple matching processes are running
> and it is hard to decide which process is the correct one, etc.

Well, as you see in the shell code above, that can be worked around by
restricting the match to processes run by a specific user (the
"-U root" option to pgrep). Also, I didn't mention it, but none of the
machines we are using monit on have non-sysadmin accounts.

I actually like the multiple matching in some cases, as it allows the
creation of scripts that will "fix" the machine if somehow more than
one instance of the exact same daemon got started by accident (in fact
I'll probably change the stop action of the included script to do that
later).
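
A rough sketch of what such a "stop everything that matches" action
could look like, assuming a pgrep that supports -U and -f (as used in
the script above). The function name stop_all_matching and its two
arguments are made up for illustration; this is not the stop action
from the script above, just one possible shape for it:

```shell
#!/bin/sh
# Hypothetical helper: stop every process owned by USER whose full
# command line matches PATTERN (a regular expression, as with pgrep -f).
stop_all_matching() {
    user="$1"
    pattern="$2"

    # Collect all matching PIDs, skipping this shell's own PID in case
    # its command line happens to contain the pattern.
    pids="$(pgrep -U "$user" -f "$pattern" | grep -v -x "$$")"
    if [ -z "$pids" ]; then
        return 1    # nothing matched, nothing to stop
    fi

    # Ask politely first, then give the processes a moment to exit.
    kill -15 $pids 2>/dev/null
    sleep 2

    # Anything that still answers to signal 0 gets SIGKILL.
    for pid in $pids; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid" 2>/dev/null
        fi
    done
    return 0
}
```

It would be called as something like stop_all_matching root "$CMD" from
the stop) branch, followed by removing the pidfile. The usual caveats
apply: the pattern has to be specific enough not to catch unrelated
processes, and a PID could in principle be reused between the pgrep and
the kill.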

BTW thanks so much for all the quick & useful replies!



