Re: [monit] monit starts a process twice


From: Nick Upson
Subject: Re: [monit] monit starts a process twice
Date: Fri, 14 Mar 2008 12:18:38 +0000

Great, thanks very much.

On 13/03/2008, Martin Pala <address@hidden> wrote:
> Hi,
>
> Thanks for the data. The problem is that monit enables monitoring too
> early: as soon as it creates the wait_start thread. If the process
> starts slowly, the monitoring check sees that it is not running, but
> it does not know that a start is already pending.
>
> Here is a patch which should fix the problem.
>
>
> Thanks,
> Martin
>
> Nick Upson wrote:
> > Here is an extract from the log, with additional comments, as I *may*
> > have an idea of the problem.
> >
> > Mon Feb  4 08:28:07 MST 2008 restart bs1
> > # this is me issuing the command for the restart
> >
> > [MST Feb  4 08:28:07] info     : restart service 'bs1' on user request
> > [MST Feb  4 08:28:07] info     : monit daemon at 10762 awakened
> > [MST Feb  4 08:28:07] info     : Awakened by User defined signal 1
> > [MST Feb  4 08:28:07] info     : 'bs1' trying to restart
> > [MST Feb  4 08:28:07] debug    : Monitoring disabled -- service bs1
> > [MST Feb  4 08:28:07] info     : 'bs1' stop: /opt/unb/bin/bs.sh
> > # everything as expected so far
> >
> > [MST Feb  4 08:28:08] debug    : 'bs1' Error testing process id
> > [10793] -- No such process
> > [MST Feb  4 08:28:08] debug    : 'bs1' Error testing process id
> > [10793] -- No such process
> > [MST Feb  4 08:28:08] debug    : 'bs1' Error testing process id
> > [10793] -- No such process
> > # now why is monit complaining about a process it says it has
> > stopped monitoring?
> >
> >
> > [MST Feb  4 08:28:08] info     : 'bs1' start: /opt/unb/bin/bs.sh
> > # this does work and did start it running as shown by ps output
> >
> >
> > [MST Feb  4 08:28:08] debug    : 'bs1' Error testing process id
> > [10793] -- No such process
> > # and by this point the pid file should contain the new pid
> >
> > [MST Feb  4 08:28:08] debug    : Monitoring enabled -- service bs1
> >
> > .
> > .
> > .
> > .
> > [MST Feb  4 08:28:09] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:28:10] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:28:11] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > # this continues until the next scheduled check on processes
> > .
> > .
> > .
> > [MST Feb  4 08:30:08] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:30:08] error    : 'bs1' process is not running
> > [MST Feb  4 08:30:08] info     : 'bs1' trying to restart
> > [MST Feb  4 08:30:08] debug    : Monitoring disabled -- service bs1
> > [MST Feb  4 08:30:08] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:30:08] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:30:08] info     : 'bs1' start: /opt/unb/bin/bs.sh
> > # so here monit starts a second copy of the same process
> >
> > [MST Feb  4 08:30:08] debug    : monit: pidfile '/var/run/bs1.pid'
> > does not exist
> > [MST Feb  4 08:30:08] debug    : Monitoring enabled -- service bs1
> >
> > # and from here on monit is happy with the second copy, but the first
> > one is still there as well
> >
> >
> > --
> > To unsubscribe:
> > http://lists.nongnu.org/mailman/listinfo/monit-general
>
> Index: control.c
> ===================================================================
> RCS file: /sources/monit/monit/control.c,v
> retrieving revision 1.108
> diff -u -r1.108 control.c
> --- control.c   8 Mar 2008 01:08:39 -0000       1.108
> +++ control.c   13 Mar 2008 21:47:33 -0000
> @@ -309,9 +309,10 @@
>        LogError("Warning: Failed to create the start controller thread. "
>            "Thread error -- %s.\n", strerror(status));
>       }
> +    } else {
> +      Util_monitorSet(s);
>     }
>   }
> -  Util_monitorSet(s);
>  }
>
>
> @@ -480,6 +481,8 @@
>     Run.wait_start--;
>   END_LOCK;
>
> +  Util_monitorSet(s);
> +
>   return NULL;
>
>  }
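
The race described in this thread can be illustrated with a small,
single-threaded model in C. This is only a sketch under stated assumptions,
not monit's code: the Service struct and the functions do_start,
wait_start_tick, poll_cycle and simulate are hypothetical names, and the tick
counts are invented. It assumes the first copy keeps starting up in the
background when the periodic check fires, as the log suggests. With
enable_early set, monitoring is switched on as soon as the start is issued
(the behaviour before the patch); with it cleared, monitoring is switched on
only when the wait for the start has finished (the idea behind the patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one monitored service; these are not monit's names. */
typedef struct {
    bool monitored;      /* may the periodic check act on this service? */
    bool start_pending;  /* a start was issued and is still being waited on */
    int  ticks_to_ready; /* ticks left until the process looks running */
    int  starts;         /* how many times the start program was executed */
} Service;

static bool is_running(const Service *s) {
    return s->starts > 0 && s->ticks_to_ready == 0;
}

/* Execute the start program. enable_early models the pre-patch behaviour. */
static void do_start(Service *s, int startup_ticks, bool enable_early) {
    s->starts++;                      /* the start program runs (again) */
    if (!s->start_pending) {          /* an earlier copy keeps starting up */
        s->start_pending = true;
        s->ticks_to_ready = startup_ticks;
    }
    s->monitored = enable_early;
}

/* One tick of the waiter: monitoring is enabled once the wait is over. */
static void wait_start_tick(Service *s) {
    if (!s->start_pending) return;
    if (s->ticks_to_ready > 0) s->ticks_to_ready--;
    if (s->ticks_to_ready == 0) {
        s->start_pending = false;
        s->monitored = true;
    }
}

/* Periodic check: start a monitored service that does not look running. */
static void poll_cycle(Service *s, int startup_ticks, bool enable_early) {
    if (s->monitored && !is_running(s))
        do_start(s, startup_ticks, enable_early);
}

/* Returns how many times the start program ran during the simulation. */
int simulate(bool enable_early, int startup_ticks, int poll_every,
             int total_ticks) {
    assert(poll_every > 0);
    Service s = { false, false, 0, 0 };
    do_start(&s, startup_ticks, enable_early);
    for (int t = 1; t <= total_ticks; t++) {
        wait_start_tick(&s);
        if (t % poll_every == 0)
            poll_cycle(&s, startup_ticks, enable_early);
    }
    return s.starts;
}
```

Under these assumptions, simulate(true, 5, 3, 20) counts two executions of
the start program, which corresponds to the double start in the log, while
simulate(false, 5, 3, 20) counts one. A service that becomes ready before the
first check, for example simulate(true, 2, 3, 20), is started once either way,
which is consistent with the problem only showing up for slow starters.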



