--- Begin Message ---
Subject: Re: Handle multiple branches of the dependency tree
Date: Thu, 28 Apr 2005 19:32:02 -0400
I'm sorry for not responding earlier. I am swamped with work, and
since this area of monit is more complicated, I have to find time to
investigate and test your changes before I can respond constructively.
I hope you understand. I have it on my TODO list and I'll look at it.
With regards
Jan-Henrik
On Apr 28, 2005, at 19:43, Yiwen Jiang wrote:
> Hi Jan,
>
> Sorry to bother you, but I sent the following email to the monit-dev
> group, and have not heard anything back from the group.
>
> As per your suggestion on March 14th on the monit mailing list, I
> have updated the monit code and tested that reversing the process
> dependency tree in monit did not break the existing functionality.
> Would you consider incorporating this change?
>
> I know you must be quite busy; I would just like to get an idea of
> how likely it is that this suggested modification will be accepted
> and incorporated into the mainstream monit code.
>
> Thanks very much for your help.
>
> Cheers,
> Yiwen
>
> --
> Yiwen Jiang
> Nortel Networks
> E-mail: address@hidden
> Phone: (613) 763-4286
> ESN: 393-4286
>
>
>
> > -----Original Message-----
> > From: Jiang, Yiwen [CAR:9D10:EXCH]
> > Sent: Monday, April 18, 2005 12:44 PM
> > To: 'address@hidden'
> > Subject: Handle multiple branches of the dependency tree
> >
> >
> > Hi there,
> >
> > I have a suspicion that v4.4 of monit does not support
> > multiple branches in the dependency tree. I am wondering if
> > anyone on the list can verify my suspicion.
> >
> > Say I have my dependency tree as follows,
> > E->D->C->B->A
> > F->H->I->B->A
> > G->A
> >
> > Where A is the root-most process, and if A dies, E, F, and G
> > will all die. Depending on the relative location of E, F, and
> > G in the servicelist (the linked list that
> > validate.c::validate() traverses to determine process
> > statuses), different processes will be started/restarted
> > when A fails.
> >
> > If E is at the beginning of the servicelist (relative to F
> > and G), then because it has no dependents (i.e. no stop of
> > dependents is required), E will be started (line 156 of
> > control.c). Function do_start() recursively walks the
> > dependency chain (E->D->C->B->A) to start A, B, C, D, E
> > in that order. However, the branch processes I, H, F, and G,
> > which also depend on A and in theory need to be restarted,
> > are not touched by monit.
> >
> > Similarly, if F is at the beginning of the servicelist
> > (relative to E and G), the processes that will be restarted
> > are: A, B, I, H, F.
> >
> > This is because servicelist is built with the least dependent
> > process at the beginning of the list. When validate() checks
> > the processes, the leaf process (the one nothing else depends
> > on) is checked first. Upon failure detection, monit tries to
> > start it. During do_start(), monit determines that it was
> > actually A that failed, and starts A. However, due to the
> > recursive nature of do_start(), only the downed processes on the
> > branch that led to the detection of A being down are started.
> >
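The branch-only restart described above can be reproduced with a small model. This is only a sketch under stated assumptions: Node, start_chain() and simulate_leaf_first() are hypothetical names, not monit's Service_T or do_start(), and each service is assumed to depend on exactly one other service, as in the example tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a service entry; not monit's Service_T. */
typedef struct Node {
    char name;
    struct Node *parent;   /* the service this one depends on, NULL for the root */
    int running;
} Node;

/* Start everything `n` depends on first, then `n` itself.
 * Appends the name of each newly started service to `out`. */
static void start_chain(Node *n, char *out)
{
    size_t len;

    if (n == NULL || n->running)
        return;
    start_chain(n->parent, out);   /* recurse towards the root */
    n->running = 1;
    len = strlen(out);
    out[len] = n->name;
    out[len + 1] = '\0';
}

/* Build the example tree with every service down, then start only
 * from leaf E, as happens when E is the first entry that is checked.
 * Writes the start order to `out` (at least 10 bytes) and returns
 * the number of services that are still down afterwards. */
static int simulate_leaf_first(char *out)
{
    Node a = {'A', NULL, 0}, b = {'B', &a, 0}, c = {'C', &b, 0};
    Node d = {'D', &c, 0}, e = {'E', &d, 0};
    Node i = {'I', &b, 0}, h = {'H', &i, 0}, f = {'F', &h, 0};
    Node g = {'G', &a, 0};
    Node *all[] = {&a, &b, &c, &d, &e, &f, &g, &h, &i};
    int down = 0;
    size_t k;

    assert(out != NULL);
    out[0] = '\0';
    start_chain(&e, out);
    for (k = 0; k < sizeof all / sizeof all[0]; k++)
        if (!all[k]->running)
            down++;
    return down;
}
```

In this model the recursion only ever follows the chain above E, so F, G, H and I are never visited, which matches the behaviour reported in the message.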
> > I emailed part of these findings to the mailing list on March
> > 14th. Now I have implemented a test case, as Jan suggested in
> > his response, with the most depending services first in the list.
> >
> > I have tested my changes against a complex, branched
> > dependency tree that I am currently using. It looks like
> > the root-most process that was down (say process A) was
> > detected first. Based on the existing monit logic, monit
> > stopped all the dependent processes of process A and restarted
> > the dependency tree from process A down, including all
> > branches of the tree.
> >
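For comparison, the root-first behaviour can be sketched as computing every service that transitively depends on the failed one, ordered so that each service comes after the one it depends on. The table encoding and the name restart_order() are assumptions made for this illustration and are not taken from monit's sources.

```c
#include <assert.h>
#include <string.h>

#define NSERVICES 9

/* Hypothetical encoding of the example tree: parent_of[k] is the index
 * of the service that k depends on, or -1 for the root. */
static const char names[NSERVICES + 1] = "ABCDEFGHI";
static const int parent_of[NSERVICES] = {
    -1, /* A: root        */
     0, /* B depends on A */
     1, /* C depends on B */
     2, /* D depends on C */
     3, /* E depends on D */
     7, /* F depends on H */
     0, /* G depends on A */
     8, /* H depends on I */
     1  /* I depends on B */
};

/* Write to `out` the failed service followed by all of its transitive
 * dependents, each one after the service it depends on.
 * `out` needs NSERVICES + 1 bytes. Returns the number of names written. */
static size_t restart_order(int failed, char *out)
{
    int marked[NSERVICES] = {0};
    size_t n = 0;
    int changed;
    int k;

    assert(failed >= 0 && failed < NSERVICES && out != NULL);
    marked[failed] = 1;
    out[n++] = names[failed];

    /* Repeat until a full pass adds nothing: a service is added as
     * soon as the service it depends on is already in the set. */
    do {
        changed = 0;
        for (k = 0; k < NSERVICES; k++) {
            int p = parent_of[k];
            if (!marked[k] && p >= 0 && marked[p]) {
                marked[k] = 1;
                out[n++] = names[k];
                changed = 1;
            }
        }
    } while (changed);

    out[n] = '\0';
    return n;
}
```

Calling restart_order(0, buf) for a failure of A covers all nine services across every branch, while restart_order(1, buf) for a failure of B leaves A and G alone.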
> > The code that I have changed is quite limited. Instead of
> > a singly linked list, a doubly linked list is used for
> > the servicelist data structure. Function validate() now uses
> > servicelist_backward and the prev pointer to check
> > process status. This means that the root-most processes are
> > checked before the leaf processes.
> >
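A minimal sketch of the doubly linked list idea follows, assuming a stripped-down entry type. Svc, link_backward() and visit_backward() are hypothetical names for this illustration; only the next/prev fields and the notion of a backward list head come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a service entry; the real struct has many more fields. */
typedef struct Svc {
    char name;
    struct Svc *next;
    struct Svc *prev;
} Svc;

/* Walk an existing singly linked list once, fill in every prev pointer
 * and return the tail, which serves as the head for backward traversal.
 * Returns NULL for an empty list. */
static Svc *link_backward(Svc *head)
{
    Svc *tail = NULL;
    Svc *s;

    for (s = head; s != NULL; s = s->next) {
        s->prev = tail;   /* the first entry gets prev == NULL */
        tail = s;
    }
    return tail;
}

/* Visit entries from tail to head, writing their names to `out`.
 * Returns the number of entries visited. */
static size_t visit_backward(Svc *tail, char *out)
{
    size_t n = 0;
    Svc *s;

    assert(out != NULL);
    for (s = tail; s != NULL; s = s->prev)
        out[n++] = s->name;
    out[n] = '\0';
    return n;
}
```

One design choice in this sketch: the tail is recorded for every entry rather than only when a next entry exists, so a list with a single service still yields a usable backward head and the first entry's prev is always NULL.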
> > This code change allows monit to support a wider
> > range of cases (including the case where some dependent
> > processes exit once they detect that the process they depend
> > on has exited). I do not know whether this change affects any
> > other scenarios that monit covers, but it covers all scenarios
> > that I can think of in my particular case.
> >
> > My questions are:
> > 1. Do you think that the case I mentioned above is a
> > valid scenario that monit should support?
> > 2. If so, do you have a more generic test driver where
> > a regression test can be done for this set of changes?
> > 3. Would you be interested in incorporating this change
> > into the monit code that you maintain?
> >
> > Thanks very much for your help.
> >
> > Here are the changes (affecting three files):
> >
> > 1. p.y
> > $ diff monit-4.4/p.y monitchanged/p.y
> > 2712c2712
> > < for(d= depend_list; d; d= d->next_depend)
> > ---
> > > for(d= depend_list; d; d= d->next_depend) {
> > 2714c2714,2719
> > <
> > ---
> > > if (d->next_depend != NULL) {
> > > d->next_depend->prev = d;
> > > servicelist_backward = d->next_depend;
> > > }
> > > }
> > >
> >
> > 2. validate.c
> > $ diff monit-4.4/validate.c monitchanged/validate.c
> > 141c141
> > < for(s= servicelist; s; s= s->next) {
> > ---
> > > for(s= servicelist_backward; s; s= s->prev) {
> >
> > 3. monitor.h
> > $ diff monit-4.4/monitor.h monitchanged/monitor.h
> > 617a618
> > > struct myservice *prev; /**< prev service in chain */
> > 631a633
> > > Service_T servicelist_backward; /**< The service list (created in p.y) */
> >
> >
> >
> > Cheers,
> > Yiwen
> >
> > --
> > Yiwen Jiang
> > Nortel Networks
> > E-mail: address@hidden
> > Phone: (613) 763-4286
> > ESN: 393-4286
> >
> >
> >
> > > -----Original Message-----
> > > From: Jiang, Yiwen [CAR:9D10:EXCH]
> > > Sent: Wednesday, March 23, 2005 12:06 PM
> > > To: 'This is the general mailing list for monit'
> > > Cc: 'address@hidden'
> > > Subject: RE: Question on the dependency of processes
> > >
> > >
> > > Hi there,
> > >
> > > Sorry for the delayed response... Busy with product delivery
> > > dates.
> > >
> > > > On Mar 14, 2005, at 15:38, Yiwen Jiang wrote:
> > > >
> > > > > I am not sure if this is the proper news group that I should
> > > > > post this question to, as there are monit implementation
> > > > > questions in this email...
> > > >
> > > > You should really take implementation issues to the
> > > > monit-developer list. But..
> > > K. This message is an attempt to bridge this topic over to
> > > the dev group.
> > >
> > > > > What I have found was that the order in the monitrc file for
> > > > > monitoring these processes generates different 'servicelist'
> > > > > content (in the source code). For example, the content of
> > > > > servicelist (when in validate.c::validate() to check for
> > > > > zombie processes) is different if the processes are listed
> > > > > in reverse order in the monitrc file.
> > > > >
> > > > > For example, say I have a service dependency tree like:
> > > > > E->D->C->B->A
> > > > > F->D->C->B->A
> > > > > G->A
> > > > > where A is the 'root' of the tree.
> > > > >
> > > > > In my monitrc file, I have 'check process' in the following
> > > > > order: E, F, D, C, B, G, A.
> > > > >
> > > > > If I turn debug on using -v option, the checks on the zombie
> > > > > processes are in the order of: G, F, E, D, C, B, A
> > > > >
> > > > > If I reverse the order in the monitrc file, and restart monit
> > > > > using -v option, the checks on the zombie processes are in
> > > > > the order of: E, F, D, C, B, G, A. This is a different result
> > > > > than the previous one.
> > > >
> > > > The list is initially built during parsing and reshuffled
> > > > afterwards if dependencies are present. Because of this the
> > > > final list may look different if you change the order of the
> > > > service entries. Note however that in both cases the
> > > > reshuffling is done so the leaf nodes are first in the list.
> > > >
> > > > > I went through the code, and noticed that the 'servicelist' is
> > > > > actually re-organized based on the dependencies after the
> > > > > configuration file is parsed. However, the result puts the
> > > > > most visited process last on the servicelist.
> > > > >
> > > > > I don't quite understand why the most visited process is not
> > > > > at the beginning of the list. If my understanding is correct,
> > > > > validate goes through the servicelist to check process status
> > > > > every poll interval. Think of a scenario where, because
> > > > > process A crashed, process G exited. The current behaviour
> > > > > will result in G being restarted before A, despite the
> > > > > dependency.
> > > >
> > > > Hmm you have a point there, although the end result should be
> > > > the same it seems that you got one unnecessary restart of G.
> > > > Have you verified that this is the case? Browsing the code it
> > > > does indeed look that way.
> > >
> > > Well, it is not the unnecessary restart of G that I'm
> > > concerned about (actually, I didn't notice that one at all).
> > > You see, the product I am working on depends heavily on
> > > the process dependencies and startup order. In this product,
> > > if G detects A is down, it will exit as well. The way monit
> > > works, if I understand correctly, is that it will detect G
> > > being down first, and restart G. I have observed that the
> > > startup time dramatically increases when G is started before
> > > A, even though it is A that crashed. Is this the expected
> > > behaviour?
> > >
> > >
> > > > > Would it not make more sense to have the servicelist
> > > > > constructed the other way, where the most dependent process
> > > > > is the first process on the servicelist?
> > > > >
> > > > > Because of the dependencies between these processes, it really
> > > > > only makes sense to me if monit would check for the 'root'
> > > > > process first. Or am I mis-using monit?
> > > >
> > > > I don't remember why we ended up having the service list with
> > > > the least depending services first. There may be other
> > > > scenarios that justify this design, although none comes to
> > > > mind right now. Could you implement a test case with the most
> > > > depending services first in the list and verify that
> > > > dependencies continue to work as described in the monit
> > > > manual? If
> > > This will take some time, due to the way my development
> > > environment works, plus work schedule. I will try though.
> > >
> > > > it does, we'll certainly reverse the service list and accept
> > > > a patch from you or fix it ourselves.
> > > What would happen if the test case fails? Shouldn't monit
> > > behave 'properly' (i.e. start the dead process that is
> > > closest to the trunk of the dependency tree first)?
> > >
> > > Thank you VERY MUCH for your help.
> > >
> > > > --
> > > > Jan-Henrik Haukeland
> > > > Mobil +47 97141255
> > > >
> > > >
> > > >
> > > > --
> > > > > To unsubscribe:
> > > > > http://lists.nongnu.org/mailman/listinfo/monit-general
> > > >
> > > >
> > >
> >
>
--
Jan-Henrik Haukeland
mobile +47 97141255
--- End Message ---