monit-general
Re: processing error?


From: Martin Pala
Subject: Re: processing error?
Date: Tue, 11 Sep 2012 17:49:17 +0200

Yes, monit calls the "restart" action (as shown in the log), which normally consists of "stop"->"start", provided that Monit can verify that the process is running. If the pidfile is missing, monit assumes that the service is not running and skips the "stop" program, since it supposes the program has already stopped. The "restart" action then expands to execution of the "start" script only. It seems that the apache start script finds the running apache instance even without the pidfile and probably exits without recreating the pidfile.
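
The decision described above can be sketched in shell. This is only an illustration of the behaviour as explained in this message, not Monit's actual code; the function name plan_restart is hypothetical.

```shell
# Illustration only: models the behaviour described above, not Monit's source.
# plan_restart PIDFILE prints the actions a "restart" would expand to.
plan_restart() {
    pidfile=$1
    # A readable pidfile whose PID accepts signal 0 counts as "running"
    # (kill -0 only succeeds if we are allowed to signal that process).
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "stop start"
    else
        # Missing (or stale) pidfile: the stop step is skipped.
        echo "start"
    fi
}
```

With the pidfile removed, plan_restart /var/run/httpd.pid prints only "start", which matches the single "start: /etc/init.d/httpd" line in the log.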



On Sep 11, 2012, at 5:42 PM, Nick Upson <address@hidden> wrote:

The problem appears to be that, according to the log, monit does a 'restart' but then does a 'start' as well.

Nick Upson



On 11 September 2012 16:28, Martin Pala <address@hidden> wrote:
Yes, if the pidfile is removed and Monit is set to use the pidfile, then it will try to restart the service, as it's driven by the pidfile content and checks it every cycle. Because the pidfile is missing, the "restart" action is reduced to plain "start" - if the apache start script doesn't create the missing pidfile in this case, then Monit thinks that the service is not running (as it has no key to the process' PID).

=> if the start of apache via Monit works (i.e. no environment problem), but the apache start script just doesn't recreate the pidfile, then you can try replacing the start action with restart to make sure that the apache script will create the pidfile:
--8<--
check process apache with pidfile /var/run/httpd.pid
start program = "/etc/init.d/httpd restart"
stop  program = "/etc/init.d/httpd stop"
--8<--

Regards,
Martin


On Sep 11, 2012, at 5:15 PM, Nick Upson <address@hidden> wrote:

The issue is not the environment, it's the existence of the pid file (or not). I can recreate the problem by taking a monit system that has already started apache (proving it's not the environment) and removing the pid file. I then get the same output as shown below, repeated every cycle, with no exit from that state.


Nick Upson



On 11 September 2012 15:55, Martin Pala <address@hidden> wrote:
Hi,

the debug messages about the non-existing pidfile, repeated within the same second, can be ignored - they are logged only in debug mode by the method Util_isProcessRunning() (the repetition of the debug messages is fixed in newer Monit versions). When you filter out the debug messages, the event-action sequence is shown:

[BST Sep  5 18:55:07] error    : 'apache' process is not running
[BST Sep  5 18:55:07] info     : 'apache' trying to restart
[BST Sep  5 18:55:07] info     : 'apache' start: /etc/init.d/httpd
[BST Sep  5 18:55:37] error    : 'apache' failed to start

=> Monit detected that apache is not running at 18:55:07, it tried to restart it, but apache didn't start within 30 seconds.

The reason why apache didn't start via Monit is most probably its dependence on some environment variable. Monit purges the environment for security reasons when executing the program, so you get only: PATH=/bin:/usr/bin:/sbin:/usr/sbin + MONIT_ variables. Sometimes Linux distributions use a special file (such as /etc/apache2/envvars on Debian) where you can set environment variables which are required for Apache.

You can wrap the start program in a shell - it'll load the environment you need:

--8<--
start program = "/bin/bash -c '/etc/init.d/httpd start'"
--8<--
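
To check by hand whether a start script depends on the environment, it can be run with a similarly stripped environment. The following is a rough sketch: the helper name run_like_monit is hypothetical, it sets only the PATH value quoted above and does not reproduce the MONIT_ variables. Note also that a non-login "bash -c" does not read profile files on its own, so any variables the script needs may have to be exported or sourced inside the quoted command.

```shell
# Sketch: run a command with approximately the purged environment
# described above (only PATH is set; MONIT_ variables are not reproduced).
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}

# Example usage (run as root on the affected server):
#   run_like_monit /etc/init.d/httpd start
```

If the script starts from a normal login shell but fails when run this way, a missing environment variable is a likely cause.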

Regards,
Martin



On Sep 10, 2012, at 2:51 PM, Nick Upson <address@hidden> wrote:

I think that my theory may hold water, from /etc/init.d/httpd

# When stopping httpd a delay of >10 second is required before SIGKILLing the
# httpd parent; this gives enough time for the httpd parent to SIGKILL any
# errant children.
stop() {
        echo -n $"Stopping $prog: "
        killproc -p ${pidfile} -d 10 $httpd
        RETVAL=$?
        echo
        [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
}

Nick Upson



On 10 September 2012 12:56, Nick Upson <address@hidden> wrote:
I've just found a server where monit has been attempting to restart httpd for several days.

the monit config entry is:

check process apache with pidfile /var/run/httpd.pid
start program = "/etc/init.d/httpd start"
stop  program = "/etc/init.d/httpd stop"

output in the monit.log is:

[BST Sep  5 18:55:07] debug    : monit: pidfile '/var/run/httpd.pid' does not exist
[BST Sep  5 18:55:07] error    : 'apache' process is not running
[BST Sep  5 18:55:07] info     : 'apache' trying to restart
[BST Sep  5 18:55:07] debug    : monit: pidfile '/var/run/httpd.pid' does not exist
[BST Sep  5 18:55:07] debug    : monit: pidfile '/var/run/httpd.pid' does not exist
[BST Sep  5 18:55:07] info     : 'apache' start: /etc/init.d/httpd
[BST Sep  5 18:55:07] debug    : monit: pidfile '/var/run/httpd.pid' does not exist
[BST Sep  5 18:55:07] debug    : monit: pidfile '/var/run/httpd.pid' does not exist
[BST Sep  5 18:55:37] error    : 'apache' failed to start

I think what is happening is that the restart and the start executions are overlapping, such that the removal of the pid file by the restart happens after the separate start. Why a restart is being attempted I do not understand.

This was corrected on the server by doing a manual kill of the httpd processes.


And, BTW, reporting that the pid file does not exist as part of starting a process is at best redundant
and at worst confusing.



Nick Upson




--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general
