|Subject:||RE: Question on the dependency of processes|
|Date:||Thu, 14 Apr 2005 15:50:18 -0400|
I sent a couple of emails to the general mailing list under the same subject line, but then got busy with other things. I finally have some time to look at monit again. I am in the process of building a test case as requested, and will keep you posted on the results.
However, I noticed that the example in the 'Dependencies' section of the monit manual covers a single process start/stop/crash scenario. It does not mention scenarios where multiple processes crash. In fact, in the multiple crash/exit scenario, the startup sequence is different from that of a clean start. I am starting to wonder whether what I'm using monit for matches its design intent.
Using the following server startup as an example:
WEB-SERVER -> APPLICATION-SERVER -> DATABASE -> FILESYSTEM
   (a)               (b)              (c)          (d)
If b does not run, monit will "first stop a then start b and finally start a again". Cool.
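To make the single-failure case concrete, here is a minimal Python sketch of how I read that sentence in the manual. It is not monit's code: the `DEPENDS_ON` table and the `dependants_of` and `restart_plan` helpers are hypothetical names of my own, and the traversal is only meant for a simple chain or tree like the one above.

```python
# Hypothetical model of the chain above: each service lists what it depends on.
DEPENDS_ON = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}


def dependants_of(service, depends_on):
    """Return the services that depend on `service`, directly or
    indirectly, nearest dependant first (breadth-first walk)."""
    found = []
    queue = [service]
    while queue:
        current = queue.pop(0)
        for name, deps in depends_on.items():
            if current in deps and name not in found:
                found.append(name)
                queue.append(name)
    return found


def restart_plan(failed, depends_on):
    """Stop the dependants (outermost first), start the failed service,
    then start the dependants again (nearest first)."""
    dependants = dependants_of(failed, depends_on)
    plan = [("stop", name) for name in reversed(dependants)]
    plan.append(("start", failed))
    plan += [("start", name) for name in dependants]
    return plan


print(restart_plan("b", DEPENDS_ON))
```

For a failed b this gives stop a, start b, start a, which matches the documented behaviour.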
The behaviour changes quite a bit if both a and b are found to be not running by monit:
Because of the way monit is implemented, a will be started first, and then b. This conflicts with the 'If no servers are running' behaviour, where b is started before a.
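The ordering I would have expected when several services are down at once can be sketched as follows. Again, this is only an illustration under my own assumptions, not how monit builds or walks its servicelist: `DEPENDS_ON` and `start_order` are hypothetical names, and the function simply does a depth-first walk so that anything a service depends on is started before the service itself.

```python
# Hypothetical model of the chain above: each service lists what it depends on.
DEPENDS_ON = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}


def start_order(down, depends_on):
    """Order the services in `down` so that each one is started only
    after every service it depends on (and that is also down)."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        # Walk through dependencies first, even those that are still up,
        # so a down service further along the chain is not missed.
        for dep in depends_on.get(name, []):
            visit(dep)
        if name in down:
            ordered.append(name)

    for name in sorted(down):
        visit(name)
    return ordered


print(start_order({"a", "b"}, DEPENDS_ON))
```

With both a and b down, this yields b before a, the same relative order as in the clean start case.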
While servers should be designed to cope with the disappearance of their communication peers, there are applications that do not behave like this. Instead, they count on the process dependency framework (i.e. restarting dependent processes) to ensure that communication is properly re-established once the most depended-upon process is restarted.
Is monit designed to handle cases like this? Or is it a prerequisite that, if process a depends on process b and b goes down (crashed, killed, etc.), a remains up and running?
Thanks very much.
Phone: (613) 763-4286
> -----Original Message-----
> From: Jiang, Yiwen [CAR:9D10:EXCH]
> Sent: Wednesday, March 23, 2005 12:06 PM
> To: 'This is the general mailing list for monit'
> Cc: 'address@hidden'
> Subject: RE: Question on the dependency of processes
> Hi there,
> Sorry for the delayed response.... Busy with product delivery dates.
> > On Mar 14, 2005, at 15:38, Yiwen Jiang wrote:
> > > I am not sure if this is the proper news group that I should post
> > > this question to, as there are monit implementation questions in
> > > this email...
> > You should really take implementation issues to the monit-developer
> > list. But..
> K. This message is an attempt to bridge this topic over to
> the dev group.
> > > What I have found was that the order in the monitrc file for
> > > monitoring these processes generates different 'servicelist'
> > > content (in the source code). For example, the content of
> > > servicelist (when in validate.c::validate() to check for zombie
> > > processes) is different if the processes are listed in reverse
> > > order in the monitrc file.
> > >
> > > For example, say I have a service dependency tree like:
> > > E->D->C->B->A
> > > F->D->C->B->A
> > > G->A
> > > where A is the 'root' of the tree.
> > >
> > > In my monitrc file, I have 'check process' in the following
> > > order: E, F, D, C, B, G, A.
> > >
> > > If I turn debug on using -v option, the checks on the zombie
> > > processes are in the order of: G, F, E, D, C, B, A
> > >
> > > If I reverse the order in the monitrc file, and restart monit using
> > > -v option, the checks on the zombie processes are in the order of:
> > > E, F, D, C, B, G, A. This is a different result from the previous
> > > one.
> > The list is initially built during parsing and reshuffled if
> > dependencies are present. Because of this the final list may look
> > different if you change the order of the service entries. Note,
> > however, that in both cases the reshuffling is done so the leaf
> > nodes are first in the list.
> > > I went through the code, and noticed that the 'servicelist' is
> > > actually re-organized based on the dependencies after the
> > > configuration file is parsed. However, the result places the most
> > > visited process last on the servicelist.
> > >
> > > I don't quite understand why the most visited process is not at
> > > the beginning of the list. If my understanding is correct, validate
> > > goes through the servicelist to check process status every poll
> > > interval. Consider a scenario where, because process A crashed,
> > > process G exited. The current behaviour will result in G being
> > > restarted before A, despite the dependency.
> > Hmm, you have a point there. Although the end result should be the
> > same, it seems that you get one unnecessary restart of G. Have you
> > verified that this is the case? Browsing the code, it does indeed
> > look that way.
> Well, it is not the unnecessary restart of G that I'm concerned
> about (actually, I didn't notice that one at all). You see, the
> product I am working on depends heavily on the process dependencies
> and startup order. In this product, if G detects A is down, it will
> exit as well. The way monit works, if I understand correctly, is that
> it will detect G being down first, and restart G. I have observed
> that the startup time increases dramatically when G is started
> before A, even though it is A that crashed. Is this the expected
> behaviour?
> > > Would it not make more sense to have the servicelist constructed
> > > the other way, where the most dependent process is the first
> > > process on the servicelist?
> > >
> > > Because of the dependencies between these processes, it really only
> > > makes sense to me if monit would check for the 'root' process
> > > first. Or am I mis-using monit?
> > I don't remember why we ended up having the service list with the
> > least depending services first. There may be other scenarios that
> > justify this design, although none comes to mind right now. Could you
> > implement a test case with the most depending services first in the
> > list and verify that dependencies continue to work as described in
> > the monit manual? If
> This will take some time, due to the way my development
> environment works, plus work schedule. I will try though.
> > it does, we'll certainly reverse the service list and accept a patch
> > from you or fix it ourselves.
> What would happen if the test case fails? Shouldn't monit behave
> 'properly' (i.e. start the dead process that is closest to the trunk
> of the dependency tree first)?
> Thank you VERY MUCH for your help.
> > --
> > Jan-Henrik Haukeland
> > Mobil +47 97141255
> > --
> > To unsubscribe:
> > http://lists.nongnu.org/mailman/listinfo/monit-general