monit-general
Re: [monit] Monit restart command problem


From: Martin Pala
Subject: Re: [monit] Monit restart command problem
Date: Thu, 05 Mar 2009 20:26:57 +0100
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

The beta is over; the next release is final and is due within a week at most. We can add the feature to a later release after 5.0.



Perdue, Emmett wrote:
I understand the Parent - Child relationship, but it does not hold true in all cases. JBoss is a good example: there is a single start executable / script that spawns several processes, not all of which have a Parent - Child relationship. I think I can work around the issue temporarily, but a permanent solution would be for Monit to process every PID in a .pid file. That would be the cleanest way to handle it. Is that something that could be worked into Monit during the beta phase?

------------------------------------------------------------------------
*From:* address@hidden [mailto:address@hidden *On Behalf Of *Martin Pala
*Sent:* Thursday, March 05, 2009 1:52 PM
*To:* This is the general mailing list for monit
*Subject:* Re: [monit] Monit restart command problem

If these processes have a common parent process (like apache, which spawns child processes), monit watches the parent process.

If your script starts three independent processes whose parent is init (pid 1), then you will need a workaround. For example, modify the start script to check that all processes are stopped before starting; if any are still running, sleep 1 and check again.
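
A minimal sketch of that start-script workaround, assuming the pidfile holds one PID per line (as in the jboss_eradbre.pid shown in this thread). The function names all_stopped and wait_all_stopped and the 30-second default are hypothetical, not part of monit or of the JBoss init script:

```shell
#!/bin/sh
# Hypothetical helpers for an init script such as /etc/init.d/jboss_eradbre.
# Assumes one PID per line in the pidfile, and that the script runs as a
# user allowed to signal those processes (otherwise kill -0 fails even
# though the process is alive).

# all_stopped PIDFILE: succeeds only if no PID listed in PIDFILE is alive.
all_stopped() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0          # no pidfile: nothing to wait for
    # "|| [ -n "$pid" ]" also handles a last line without a trailing newline.
    while read -r pid || [ -n "$pid" ]; do
        [ -n "$pid" ] || continue          # skip blank lines
        # kill -0 sends no signal; it only tests whether the PID exists.
        if kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
    done < "$pidfile"
    return 0
}

# wait_all_stopped PIDFILE [MAX_SECONDS]: sleep 1 and check again until
# every PID is gone; give up after MAX_SECONDS (default 30, an assumption).
wait_all_stopped() {
    pidfile=$1
    remaining=${2:-30}
    until all_stopped "$pidfile"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

In the start branch of the init script, a call such as `wait_all_stopped "$PIDFILE" || exit 1` before launching JBoss would keep a quick stop/start sequence from overlapping with processes that are still shutting down.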

We could also modify monit to check all pids from the pidfile.



Alternatively, split the configuration and startup script into three independent processes (which they really are).

On Mar 5, 2009, at 7:38 PM, Perdue, Emmett wrote:

If a "program" that Monit controls has more than 1 PID and all of those are started from a single start script, but ALL must be stopped BEFORE the start command is issued on a restart... how is that done with Monit? Not every piece of software has just a single PID associated with it.

------------------------------------------------------------------------
*From:* address@hidden [mailto:address@hidden *On Behalf Of *Martin Pala
*Sent:* Thursday, March 05, 2009 1:34 PM
*To:* This is the general mailing list for monit
*Subject:* Re: [monit] Monit restart command problem

Yes, monit reads only one pid from the pidfile.


On Mar 5, 2009, at 7:30 PM, Perdue, Emmett wrote:

Process Name          = jboss_eradbre
 Group                = server
 Pid file             = /opt/local/software/jboss/jboss-4.0.5.GA/logs/jboss_eradbre.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/jboss_eradbre start' timeout 30 second(s)
 Stop program         = '/etc/init.d/jboss_eradbre stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Timeout              = If 3 restart within 5 cycles then unmonitor else if succeeded then alert

There are multiple PIDs in the jboss_eradbre.pid file, 3 in this case. See below:

$ cat /opt/local/software/jboss/jboss-4.0.5.GA/logs/jboss_eradbre.pid
9800
9981
10004

Could this be the problem? Could Monit be stopping the 1st PID and then issuing the start command without waiting for the 2nd and 3rd PIDs to stop? If I run /etc/init.d/jboss_eradbre itself, the problem does not happen; it only happens when Monit handles the process.
------------------------------------------------------------------------
*From:* address@hidden <mailto:address@hidden> [mailto:address@hidden *On Behalf Of *Martin Pala
*Sent:* Thursday, March 05, 2009 1:20 PM
*To:* This is the general mailing list for monit
*Subject:* Re: [monit] Monit restart command problem

If the service is a process, monit executes the stop command and waits for the process whose pid matches the pidfile content to stop. As soon as that process stops, the start script is executed. If the process stops quickly, the start script can be executed almost immediately (within the same second).
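
The sequence above can be modelled roughly in shell. This is only an illustration of the behaviour described in this message, not monit's actual code; the function name model_restart, its arguments, and the 30-second default are hypothetical:

```shell
#!/bin/sh
# Rough model of the restart sequence described above; NOT monit's code.

# model_restart PIDFILE STOP_CMD START_CMD [MAX_SECONDS]
model_restart() {
    pidfile=$1
    stop_cmd=$2
    start_cmd=$3
    remaining=${4:-30}                 # assumed upper bound on the wait
    # Only the first line of the pidfile is considered, mirroring the
    # "one pid per pidfile" behaviour discussed in this thread.
    pid=$(head -n 1 "$pidfile")
    sh -c "$stop_cmd"
    # Wait only for that single PID to disappear. Any other PIDs in the
    # file are ignored and may still be running when start is executed.
    while kill -0 "$pid" 2>/dev/null && [ "$remaining" -gt 0 ]; do
        sleep 1
        remaining=$((remaining - 1))
    done
    sh -c "$start_cmd"
}
```

With a pidfile that lists several PIDs, this model moves on to the start command as soon as the first PID exits, which would match a stop and start logged within the same second.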

If the check is for a different service type (like file, directory, host, etc.), then the stop script is executed and immediately followed by start, since monit currently has no way to tell whether the stop script finished OK or not.

What is the configuration of the jboss_eradbre service?

You can run monit with the -v option to see details.


On Mar 5, 2009, at 5:44 PM, Perdue, Emmett wrote:

I am seeing some strange behavior from Monit when a restart command is issued. When I issue a "monit restart app_name" command, Monit is sending the stop and start commands in the monitrc file back to back within 1/10 of a second. It is not sending the stop command and waiting for it to finish before sending the start command.

If I run the scripts outside of Monit, all is fine. What should I look for? Below is a snip of the Monit log from when the problem happens…

[EST Mar  5 10:12:32] debug    : restart service 'jboss_eradbre' on user request
[EST Mar  5 10:12:32] info     : monit daemon at 25448 awakened
[EST Mar  5 10:12:32] info     : Awakened by User defined signal 1
[EST Mar  5 10:12:32] info     : 'jboss_eradbre' trying to restart
[EST Mar  5 10:12:32] info     : 'jboss_eradbre' stop: /etc/init.d/jboss_eradbre
[EST Mar  5 10:12:33] info     : 'jboss_eradbre' start: /etc/init.d/jboss_eradbre


Thank You,

Emmett D. Perdue
*CSX Corp.*
*Sr. Systems Admin - RHCE*
*Middleware Software Provisioning*
Phone: (904) 633-5187 RNX: 633-5187
E-Mail: address@hidden <mailto:address@hidden>

*/"Individuals Play the Game, But Teams Win Championships!"/*


------------------------------------------------------------------------

*This email transmission and any accompanying attachments may contain CSX privileged and confidential information intended only for the use of the intended addressee. Any dissemination, distribution, copying or action taken in reliance on the contents of this email by anyone other than the intended recipient is strictly prohibited. If you have received this email in error please immediately delete it and notify sender at the above CSX email address. Sender and CSX accept no liability for any damage caused directly or indirectly by receipt of this email. *

--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general

