monit-general

Re: monit process restart problem - simultaneous stop/start race


From: Leif Gustafson
Subject: Re: monit process restart problem - simultaneous stop/start race
Date: Thu, 27 Sep 2012 10:18:23 -0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120312 Thunderbird/11.0

Yes, I have run into this many times as well.  For example, httpd on
Red Hat/CentOS is usually restarted, but the pid file is almost *always*
empty afterwards.  I'm considering having monit execute "/sbin/service
httpd restart" rather than relying on the separate start/stop programs.

On 09/27/2012 6:53 AM, Brano Zarnovican wrote:
> Hi,
>
> When I restart a service manually via its init script (service foo
> restart), it works every time.
> When I try the same with monit (monit restart foo), it ends in
> "Execution failed" most of the time.
>
> Root cause:
> On a restart action, monit forks and executes the start program as
> soon as the monitored process disappears, regardless of whether the
> stop program has finished or is still running. The end of the stop
> execution can therefore overlap the beginning of start.
>
> Typical init script
>
> start() {
>     start service &
>     echo $! > /var/run/foo.pid
> }
> stop() {
>     kill `cat /var/run/foo.pid`
>     rm -f /var/run/foo.pid
> }
>
>
> State #1: process 'foo' is running with pid 100, pid file exists
> monit restart foo
>
> stop: kill `cat /var/run/foo.pid`
> start: start service &
> start: echo $! > /var/run/foo.pid
> stop: rm -f /var/run/foo.pid
>
> State #2: process 'foo' is running with pid 200, pid file is missing
>
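The interleaving above can be replayed deterministically. This is only a
sketch: it runs the four steps in the order shown rather than racing two
real programs, it uses "sleep" as a stand-in for the service, and the
temporary pid file path is an assumption made for illustration.

```shell
#!/bin/sh
# Replay of the stop/start interleaving in a fixed order (no real race).
# Assumptions: `sleep 300` stands in for service 'foo'; the pid file
# lives in a temp location instead of /var/run/foo.pid.
PIDFILE="${TMPDIR:-/tmp}/foo-race-demo.$$.pid"

# State #1: old process is running and the pid file exists.
sleep 300 &
echo $! > "$PIDFILE"

kill "$(cat "$PIDFILE")"        # stop:  kill `cat pidfile`
sleep 300 &                     # start: start service &
new_pid=$!
echo "$new_pid" > "$PIDFILE"    # start: echo $! > pidfile
rm -f "$PIDFILE"                # stop:  rm -f pidfile (runs too late)

# State #2: the new process is alive but its pid file is gone.
if kill -0 "$new_pid" 2>/dev/null && [ ! -e "$PIDFILE" ]; then
    result="running without pid file"
else
    result="consistent"
fi
echo "$result"

kill "$new_pid" 2>/dev/null     # clean up the stand-in process
```

Run as written, the late "rm" removes the pid file that start has just
created, which is the State #2 described in the quoted message.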
> (later, monit attempts to start a process which it considers to be down)
> start: start service &
> start: echo $! > /var/run/foo.pid
>
> depending on how good your scripts are, you end up with either
> State #3a: process 'foo' is running with pid 200, pid file contains
> 300 (failed second process)
> or
> State #3b: process 'foo' is running with pid 200, pid file is still missing
>
> A workaround is to insert a few sleeps here and there (the best place
> is pre-startup). Or save the timestamp of the pid file before killing
> and check whether it has changed just before the 'rm'. Or don't delete
> the pid file at all.
>
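One way to sketch the "check before rm" workaround is below. Note that it
compares the pid file's content rather than its timestamp, which is a
substitution for the timestamp check suggested above, and the PIDFILE
default is a hypothetical path. It narrows the window but does not close
it: start could still write the file between the comparison and the rm.

```shell
#!/bin/sh
# Hypothetical pid file path for illustration; a real init script would
# typically use something like /var/run/foo.pid.
PIDFILE="${PIDFILE:-${TMPDIR:-/tmp}/foo-guard-demo.pid}"

stop() {
    # Nothing to do if there is no readable pid file.
    [ -r "$PIDFILE" ] || return 0
    pid=$(cat "$PIDFILE")
    # An empty pid file gives us nothing to kill; leave it untouched.
    [ -n "$pid" ] || return 0

    # Ignore the error if the process has already gone away.
    kill "$pid" 2>/dev/null || :

    # Remove the pid file only if it still names the process we killed.
    # If a concurrent start has already written a new pid, leave it alone.
    if [ "$(cat "$PIDFILE" 2>/dev/null)" = "$pid" ]; then
        rm -f "$PIDFILE"
    fi
}
```

With this guard, the late "rm" in the trace would find the new pid in the
file and skip the removal, provided start wrote the file before the
comparison ran.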
> The root of the problem is that the stop program may run code after
> the process has stopped, and that code simply must not overlap with
> start. The pid file is just one example. Imagine a stop program that
> deletes a tmp or persistent state file which is also created during
> startup.
>
> Suggested solution:
> Introduce an option that makes monit wait for the stop program to
> finish instead of only for process termination, or more precisely for
> the later of the two events. Only then would it call the start program.
>
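Until monit offers such an option, the same ordering can be approximated
inside the init script itself by making start and stop take a shared
lock. The sketch below is not a monit feature; with_lock and LOCKDIR are
hypothetical names, and the lock is a plain directory created with mkdir
so that it works in any POSIX shell.

```shell
#!/bin/sh
# Hypothetical lock location for illustration only.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/foo-restart.lock}"

# Run a command while holding the lock, so that a start launched early
# blocks until a still-running stop has finished its cleanup.
with_lock() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    until mkdir "$LOCKDIR" 2>/dev/null; do
        sleep 1
    done
    rc=0
    "$@" || rc=$?
    rmdir "$LOCKDIR"
    return "$rc"
}

# Intended use inside the init script's dispatcher (assumed layout):
#   case "$1" in
#       start) with_lock start ;;
#       stop)  with_lock stop ;;
#   esac
```

A stop program that is killed while holding the lock leaves the directory
behind, so a real script would also need some stale-lock handling.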
> Regards,
>
> BranoZ
>


