monit-general

Re: Some Odd behavior on RHEL 5.4 with HTTPD (apache)


From: Martin Pala
Subject: Re: Some Odd behavior on RHEL 5.4 with HTTPD (apache)
Date: Mon, 14 Jan 2013 22:33:47 +0100

The most likely root cause is Red Hat 5.4's /etc/init.d/httpd script. As 
described, monit's behaviour is straightforward and driven by the pidfile 
content. If, for example, the httpd script removes the pidfile early (before 
httpd has really stopped), then monit thinks that the process has stopped, 
because the pidfile is no longer present (or the process it references is not 
running). I'll try to replicate the problem when I have some spare time.
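
To make the pidfile dependence concrete, here is a minimal shell sketch of a pidfile-driven liveness test. It is an illustration only, not monit's code; the function name pidfile_running is hypothetical. It shows why a pidfile that is removed too early makes the service look stopped even while httpd processes are still alive.

```shell
#!/bin/sh
# Hypothetical helper (not monit code): report whether the process named in a
# pidfile is alive, roughly the way a pidfile-based check sees the service.
pidfile_running() {
    pf_file=$1
    # No readable pidfile: the service looks stopped, even if httpd still runs.
    [ -r "$pf_file" ] || return 1
    pf_pid=$(cat "$pf_file")
    # Guard against an empty or non-numeric pidfile.
    case $pf_pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the PID exists and that we
    # are allowed to signal it (run this as root, as monit usually is).
    kill -0 "$pf_pid" 2>/dev/null
}
```

For example, "pidfile_running /etc/httpd/run/httpd.pid && echo running" prints nothing once the init script has deleted the pidfile, regardless of what "ps" shows.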

You can try to use the pattern based process check ("check process ... matching 
...") - it will wait till all httpd instances are stopped.

Regards,
Martin


On Jan 14, 2013, at 10:24 PM, "Leif Gustafson" <address@hidden> wrote:

> I concur that restarting httpd via stop and start monit mechanism is
> extremely unreliable on RedHat 5.x systems and even earlier.  It very rarely
> works correctly (processes get left behind and no pid file), so much so that
> I have started to use an exec action to run “/etc/init.d/httpd restart”
> which always works correctly.  I wonder if it is perhaps a timing issue and
> perhaps the culprit is really how RedHat's init script is structured?  If I
> find the time to research it further I will let you know what I discover.
> 
> 
> Thanks,
> Leif
> 
> From: address@hidden [mailto:address@hidden] On Behalf Of Martin Pala
> Sent: Monday, January 14, 2013 6:40 AM
> To: This is the general mailing list for monit
> Subject: Re: Some Odd behavior on RHEL 5.4 with HTTPD (apache)
> 
> Monit waits for the process to stop before the start script is executed. If
> you use the pidfile based process check, the stop will wait till the process
> with matching PID is not present. The start is done only if the stop
> succeeded.
> 
> You can see the internals in src/control.c - the machinery is driven by
> control_service() ... the ACTION_RESTART calls do_stop() which internally
> uses wait_process() to check that the process stopped. If stop failed, the
> restart failed too, otherwise the do_start() is called.
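
The sequence described above (stop, wait for the process to disappear, start only if the stop succeeded) can be sketched in shell. This is a simplified illustration under stated assumptions, not a port of src/control.c: the names wait_for_exit and restart_service are hypothetical, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical sketch of a "stop, wait, then start" restart. The function
# names are illustrative and are not monit's.

# Wait up to $2 seconds for PID $1 to disappear; return 0 once it is gone.
wait_for_exit() {
    wfe_pid=$1
    wfe_left=$2
    while kill -0 "$wfe_pid" 2>/dev/null; do
        [ "$wfe_left" -gt 0 ] || return 1   # still alive at the deadline
        sleep 1
        wfe_left=$((wfe_left - 1))
    done
    return 0
}

# restart_service PIDFILE TIMEOUT STOP_CMD START_CMD
restart_service() {
    # Remember the PID before stopping; the stop script may delete the pidfile.
    rs_pid=$(cat "$1" 2>/dev/null) || rs_pid=
    sh -c "$3"                              # run the stop command
    if [ -n "$rs_pid" ] && ! wait_for_exit "$rs_pid" "$2"; then
        echo "stop failed, not starting" >&2
        return 1
    fi
    sh -c "$4"                              # start only after a confirmed stop
}
```

A call such as "restart_service /etc/httpd/run/httpd.pid 30 '/etc/init.d/httpd stop' '/etc/init.d/httpd start'" would then run the two init script actions strictly one after the other. Note that it only tracks the single PID from the pidfile, so any httpd process not listed there goes unnoticed.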
> 
> There were some changes between various monit versions (the above is for
> monit 5.4), but in general the behaviour is the same in older monit versions
> too.
> 
> Regards,
> Martin
> 
> 
> On Jan 14, 2013, at 3:33 PM, Bill G. <address@hidden> wrote:
> 
> 
> Hi Martin,
> I know for a fact that they are not child processes, as the parent process
> is still alive, and the init script doesn't recognize HTTPD as running.
> What I am theorizing is that monit is running the restart commands
> simultaneously, i.e. instead of "/etc/init.d/httpd stop", waiting for the
> stop, and only then issuing "/etc/init.d/httpd start", it is issuing the
> commands as two separate threads. This confuses the init scripts, possibly
> because it generally takes longer for HTTPD to stop than to start.
> So the stop and start are issued at the same time. Stop wins the race as
> the first process to start: it sends the kill signal and waits. The start
> command hits next and says "oh look, the pid file is there but the pid it
> lists is gone, OK, start", and the start succeeds. Meanwhile the stop
> command finishes and removes the pid file. To top it off, the next restart
> attempt fails to start because port 80 is indeed listening and apache is
> running; the init system, however, is unaware.
> I hope that makes sense.
> Thanks,
> Bill
> On Jan 14, 2013 5:37 AM, "Martin Pala" <address@hidden> wrote:
> Hi,
> 
> the httpd processes which keep running could be Apache children which are
> left to finish pending requests when you stop Apache. You can verify that by
> checking their PPID: if it's 1 (init), then they'll die as soon as they
> finish the work.
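
One way to do the PPID check is sketched below. It assumes a Linux /proc filesystem; the helper name ppid_of is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): print the parent PID of process $1 by
# reading the "PPid:" field from /proc/<pid>/status.
ppid_of() {
    while read -r po_key po_val; do
        if [ "$po_key" = "PPid:" ]; then
            echo "$po_val"
            return 0
        fi
    done < "/proc/$1/status"
    # No PPid line found (should not happen on Linux).
    return 1
}
```

For each leftover httpd PID, "ppid_of 22033" printing 1 would mean the process has been re-parented to init. On systems with procps, "ps -o pid,ppid,args -C httpd" shows the same information for all httpd processes in one command.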
> 
> Have you verified that the stop+start commands work via Monit? Monit
> executes the start/stop/restart programs in a sandbox for security reasons:
> it drops all environment variables and sets only a spartan
> PATH=/bin:/usr/bin:/sbin:/usr/sbin. If your RHEL54-apache script depends on
> some environment variables, then it may fail.
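
To check by hand whether an init script depends on the caller's environment, the stripped-down environment described above can be approximated from a shell. This is a rough sketch: the wrapper name run_like_monit is hypothetical, and it reproduces only the empty environment and the PATH value quoted above, not anything else monit may do when it executes a program.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an empty environment except for the
# PATH that monit is described as setting, to approximate how an init script
# behaves when monit launches it.
run_like_monit() {
    # env -i starts from an empty environment; only PATH is passed through.
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}
```

Running "run_like_monit /etc/init.d/httpd stop; echo $?" as root and comparing the result with a plain "/etc/init.d/httpd stop" shows whether the script relies on variables from the login environment.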
> 
> You can verify the start/stop scripts this way:
> 
>         monit stop RHEL54-apache
>         monit start RHEL54-apache
> 
> You can also log the scripts output - modify the monit configuration file
> and reload monit:
> 
>   check process RHEL54-apache with path /etc/httpd/run/httpd.pid
>     start program = "/bin/bash -c '/etc/init.d/httpd start >> /tmp/RHEL54-apache.log 2>&1'"
>     stop program = "/bin/bash -c '/etc/init.d/httpd stop >> /tmp/RHEL54-apache.log 2>&1'"
>     ...
> 
> Regards,
> Martin
> 
> 
> 
> On Jan 13, 2013, at 11:53 AM, Bill G. <address@hidden> wrote:
> 
>> Hi List,
>> 
>> I may have stumbled across a bug, or I am just a nub (but I am pretty
>> sure it is the former)
>> 
>> I am running RHEL 5.2 and 5.4 (don't ask why, just know that at this
>> time it's a requirement)
>> 
>> For all intents and purposes the configurations between the two are
>> identical (changing only hostnames and IP addresses)
>> 
>> Here is the first configuration (note all hostnames, IP addresses, and
>> service names are obfuscated):
>> 
>>   check host RHEL54-vip with address 10.0.0.1
>>     start program = "/sbin/ifup eth0:1"
>>     stop program = "/sbin/ifdown eth0:1"
>>     if failed icmp type echo count 3 with timeout 3 seconds then restart
>>     if 5 restarts within 5 cycles then timeout
>> 
>>   check process RHEL54-apache with path /etc/httpd/run/httpd.pid
>>     start program = "/etc/init.d/httpd start"
>>     stop program = "/etc/init.d/httpd stop"
>>     if failed host RHEL54-vip port 80
>>       protocol HTTP request "/" then restart
>>     if 5 restarts within 5 cycles then timeout
>> 
>> Everything works as expected in 5.2, but in 5.4 you get the feeling
>> that the world is ending.
>> 
>> If I force a crash from RHEL52:
>> 
>> Logs:
>> 
>> ICMP echo response for 10.0.0.1 1/3 timed out -- no response within 3 seconds
>> ICMP echo response for 10.0.01 2/3 timed out -- no response within 3 seconds
>> ICMP echo response for 10.0.0.1 3/3 timed out -- no response within 3 seconds
>> 'RHEL52-vip' failed ICMP test [Echo Request]
>> 'RHEL52-apache' failed, cannot open a connection to INET[RHEL52-vip:80/] via TCP
>> 'RHEL52-apache' stop: /etc/init.d/httpd
>> 'RHEL52-apache' start: /etc/init.d/httpd
>> 'RHEL52-apache' started
>> 'RHEL52-apache' process is running with pid 5072
>> 
>> Everything else comes up and is happy
>> 
>> Same scenario on RHEL54
>> nds
>> nds
>> nds
>> 'RHEL54-vip' failed ICMP test [Echo Request]
>>  TCP
>> 'RHEL54-apache' stop: /etc/init.d/httpd
>> 'RHEL54-apache' start: /etc/init.d/httpd
>> 'RHEL54-apache' failed to start
>> 
>> Lather rinse repeat X5
>> 
>> What I have found is happening is that either the httpd process is not
>> being shut down properly, and/or monit is not waiting for a proper
>> return from the shutdown.
>> 
>> When in this condition:
>> 
>> # /etc/init.d/httpd status
>> httpd is stopped
>> # ps -auxwww|grep httpd
>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
>> root     22031  0.1  0.5 236080 11200 ?        Ss   10:07   0:00 /usr/sbin/httpd
>> apache   22033  0.0  0.3 236212  6640 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22034  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22035  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22036  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22037  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22038  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22039  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> apache   22040  0.0  0.3 236212  6636 ?        S    10:07   0:00 /usr/sbin/httpd
>> 
>> 
>> (i give it to you typo and all!)
>> 
>> The only way to fix it is to issue a killall command:
>> killall httpd
>> Then it starts as prescribed.
>> 
>> 
>> So I thought, maybe it is something silly with the binary, so I
>> decided to compile from source.. same issue.
>> 
>> My current workaround is as follows:
>> 
>>   check process RHEL54-apache with path /etc/httpd/run/httpd.pid
>>     start program = "/etc/init.d/httpd start"
>>     stop program = "/etc/init.d/httpd stop"
>>     if failed host RHEL54-vip port 80
>>       protocol HTTP request "/" then restart
>>     if 2 restarts within 2 cycles then exec "/usr/bin/killall httpd"
>>     if 5 restarts within 5 cycles then timeout
>> 
>> Any ideas how to make this function properly?
>> --
>> Thanks,
>> Bill G.
>> address@hidden
>> 
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general



