monit-general

RE: Circular dependency detection


From: Yiwen Jiang
Subject: RE: Circular dependency detection
Date: Tue, 4 Apr 2006 08:22:53 -0400

Thank you!

BTW, about a year ago I posted some emails regarding the way monit
handles multiple branches of the dependency tree. Can you tell me
whether 4.7 fixes this as well, please?

Thanks.

Cheers,
Yiwen

-- 
Yiwen Jiang
Nortel Networks
E-mail: address@hidden
Phone: (613) 763-4286
ESN: 393-4286
 


> -----Original Message-----
> From: address@hidden 
> [mailto:address@hidden 
> On Behalf Of Jan-Henrik Haukeland
> Sent: Monday, April 03, 2006 5:04 PM
> To: This is the general mailing list for monit
> Subject: Re: Circular dependency detection
> 
> 
> Try upgrading to version 4.7. The dependency graph was re-implemented
> thanks to Philipp Berndt and this version should fix the problem.
> 
> On 3. apr. 2006, at 22.27, Yiwen Jiang wrote:
> 
> > Hi there,
> >
> > I am using version 4.4 of monit.
> >
> > I ran into a peculiar problem the other day, with a dependency
> > definition similar to the following in the /etc/monitrc file:
> >
> > B dependent on A
> > C dependent on B
> > D dependent on C
> > Z dependent on A
> >
> > During startup, monit indicated that it had detected a circular
> > dependency. When I changed the configuration so that Z depends on C,
> > the error went away. But I do not believe the dependency above is a
> > circular dependency... Or is it?
> >
> > Thanks very much for your help.
> >
> > Cheers,
> > Yiwen
> 
> 
> 
> --
> To unsubscribe: http://lists.nongnu.org/mailman/listinfo/monit-general
> 
> 
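
The configuration in the original question (B depends on A, C on B, D on C,
Z on A) is a tree with two branches, not a cycle. The C sketch below shows
one standard way to tell the two apart: a depth-first search that keeps
three states per service, so a service reached a second time through another
branch is not mistaken for a loop. This is not monit's code, and the thread
does not say why 4.4 reported a cycle. Graph, graph_init, graph_depend and
graph_has_cycle are hypothetical names made up for this illustration.

```c
#include <assert.h>

#define MAX_SERVICES 16

/* Hypothetical, simplified dependency graph; not monit's Service_T. */
typedef struct {
    int n;                                 /* number of services */
    int ndeps[MAX_SERVICES];               /* dependency count per service */
    int deps[MAX_SERVICES][MAX_SERVICES];  /* deps[i][k]: k-th service i depends on */
} Graph;

enum { UNVISITED = 0, IN_PROGRESS = 1, DONE = 2 };

/* Prepare a graph with n services and no dependencies. */
void graph_init(Graph *g, int n) {
    assert(n >= 0 && n <= MAX_SERVICES);
    g->n = n;
    for (int i = 0; i < MAX_SERVICES; i++)
        g->ndeps[i] = 0;
}

/* Record "service `from` depends on service `to`". */
void graph_depend(Graph *g, int from, int to) {
    assert(from >= 0 && from < g->n && to >= 0 && to < g->n);
    assert(g->ndeps[from] < MAX_SERVICES);
    g->deps[from][g->ndeps[from]++] = to;
}

/* Depth-first visit; returns 1 only if a service on the current path
 * is reached again, which is what makes a dependency circular. */
int visit(const Graph *g, int s, int *state) {
    if (state[s] == IN_PROGRESS)
        return 1;   /* back edge: s depends on itself via the current path */
    if (state[s] == DONE)
        return 0;   /* already fully explored through another branch */
    state[s] = IN_PROGRESS;
    for (int k = 0; k < g->ndeps[s]; k++)
        if (visit(g, g->deps[s][k], state))
            return 1;
    state[s] = DONE;
    return 0;
}

/* Returns 1 if any circular dependency exists, 0 otherwise. */
int graph_has_cycle(const Graph *g) {
    int state[MAX_SERVICES] = {0};
    for (int s = 0; s < g->n; s++)
        if (visit(g, s, state))
            return 1;
    return 0;
}
```

With the services numbered A=0, B=1, C=2, D=3, Z=4, registering the four
dependencies from the question makes graph_has_cycle() return 0. Adding
"A depends on D" closes a loop and makes it return 1.
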
--- Begin Message ---
Subject: Re: Handle multiple branches of the dependency tree
Date: Thu, 28 Apr 2005 19:32:02 -0400

I'm sorry for not responding earlier. I am swamped with work and,
since this area of monit is more complicated, I have to find time to
investigate and test your changes before I can respond constructively.
I hope you understand. I have it on my TODO list and I'll look at it.

With regards

Jan-Henrik

On Apr 28, 2005, at 19:43, Yiwen Jiang wrote:

> Hi Jan,
>
> Sorry to bother you, but I sent the following email to the monit-dev 
> group, and have not heard anything back from the group.
>
> As per your suggestion on March 14th on the monit mailing list, I
> have updated the monit code and tested it to ensure that reversing the
> process dependency tree in monit did not break the existing
> functionality. Would you consider incorporating this change?
>
> I know you must be quite busy, and I would just like to get an idea of
> the likelihood of this suggested modification being accepted and
> incorporated into the mainstream monit code.
>
> Thanks very much for your help.
>
> Cheers,
> Yiwen
>
> -- 
> Yiwen Jiang
> Nortel Networks
> E-mail: address@hidden
> Phone: (613) 763-4286
> ESN: 393-4286
>  
>
>
> > -----Original Message-----
> > From: Jiang, Yiwen [CAR:9D10:EXCH]
> > Sent: Monday, April 18, 2005 12:44 PM
> > To: 'address@hidden'
> > Subject: Handle multiple branches of the dependency tree
> >
> >
> > Hi there,
> >
> > I have a suspicion that v4.4 of monit does not support
> > multiple branches in the dependency tree. I am wondering if
> > anyone on the list can verify my suspicion.
> >
> > Say I have my dependency tree as follows,
> > E->D->C->B->A
> > F->H->I->B->A
> > G->A
> >
> > Where A is the root-most process, and if A dies, E, F, and G
> > will all die. Depending on the relative location of E, F, and
> > G in the servicelist (the linked list that
> > validate.c::validate() traverses to determine process
> > statuses), different processes will be started/restarted
> > when A fails.
> >
> > If E is at the beginning of the servicelist (relative to F
> > and G), because it has no dependents (i.e. no stop of
> > dependents is required), E will be started (line 156 of
> > control.c). Function do_start() recursively walks through
> > the dependency tree of (E->D->C->B->A) to start A, B, C, D, E
> > in that order. However, the branch processes I, H, F, and G,
> > which also depend on A and in theory need to be restarted, are
> > not touched by monit.
> >
> > Similarly, if F is at the beginning of the servicelist
> > (relative to E and G), the processes that will be restarted
> > are: A, B, I, H, F.
> >
> > This is because servicelist is built with the least dependent
> > process at the beginning of the list. When validate() checks
> > for the processes, the least depending (leaf) process is
> > checked first. Upon failure detection, monit tries to start
> > it. During do_start(), it was determined that actually it was
> > A that failed. Monit starts A. However, due to the recursive
> > nature of do_start(), only the downed processes on the branch
> > that led to the detection of A being down are started.
> >
> > I emailed part of these findings to the mailing list on March
> > 14th. Now, I have implemented a test case as Jan suggested in
> > the response, with the most depending services first in the list.
> >
> > I have tested my changes against a complex, branched
> > dependency tree that I am currently using. It looked like
> > the root-most process that was down (say process A) was
> > detected first. Based on the existing monit logic, monit
> > stopped all the dependent processes of process A and restarted
> > the dependency tree from process A down, including all
> > branches of the tree.
> >
> > The code that I have changed is quite limited. Instead of
> > a singly linked list, a doubly linked list is used for
> > the servicelist data structure. Function validate() uses
> > the servicelist_backward and prev pointers to check
> > process status. This means that the root-most processes are
> > checked before the leaf processes.
> >
> > This code change allows monit to support a wider
> > range of cases (including the case where some dependent
> > processes exit once they detect that the process they depend
> > on has exited). I do not know if this change affects any other
> > scenarios that monit covers, but it covers all scenarios that
> > I can think of in my particular case.
> >
> > My questions are:
> >       1. Do you think that the case I mentioned above is a
> > valid scenario that monit should support?
> >       2. If so, do you have a more generic test driver where
> > a regression test can be done for this set of changes? 
> >       3. Would you be interested in incorporating this change
> > into the monit code that you maintain?
> >
> > Thanks very much for your help.
> >
> > Here are the changes (affecting three files):
> >
> > 1. p.y
> > $ diff monit-4.4/p.y monitchanged/p.y
> > 2712c2712
> > <     for(d= depend_list; d; d= d->next_depend)
> > ---
> > >     for(d= depend_list; d; d= d->next_depend) {
> > 2714c2714,2719
> > <    
> > ---
> > >       if (d->next_depend != NULL) {
> > >         d->next_depend->prev = d;
> > >         servicelist_backward = d->next_depend;
> > >       }
> > >     }
> > >
> >
> > 2. validate.c
> > $ diff monit-4.4/validate.c monitchanged/validate.c
> > 141c141
> > <   for(s= servicelist; s; s= s->next) {
> > ---
> > >   for(s= servicelist_backward; s; s= s->prev) {
> >
> > 3. monitor.h
> > $ diff monit-4.4/monitor.h monitchanged/monitor.h
> > 617a618
> > >   struct myservice *prev;                         /**< prev service in chain */
> > 631a633
> > > Service_T servicelist_backward;                /**< The service list (created in p.y) */
> >
> >
> >
> > Cheers,
> > Yiwen
> >
> > --
> > Yiwen Jiang
> > Nortel Networks
> > E-mail: address@hidden
> > Phone: (613) 763-4286
> > ESN: 393-4286
>
> >
> >
> > > -----Original Message-----
> > > From: Jiang, Yiwen [CAR:9D10:EXCH]
> > > Sent: Wednesday, March 23, 2005 12:06 PM
> > > To: 'This is the general mailing list for monit'
> > > Cc: 'address@hidden'
> > > Subject: RE: Question on the dependency of processes
> > >
> > >
> > > Hi there,
> > >
> > > Sorry for the delayed response.... Busy with product delivery dates.
> > >
> > > > On Mar 14, 2005, at 15:38, Yiwen Jiang wrote:
> > > >
> > > > > I am not sure if this is the proper newsgroup that I should
> > > > > post this question to, as there are monit implementation
> > > > > questions in this email...
> > > >
> > > > You should really take implementation issues to the
> > > > monit-developer list. But..
> > > K. This message is an attempt to bridge this topic over to
> > > the dev group.
> > > 
> > > > > What I have found is that the order in which these processes
> > > > > are listed in the monitrc file generates different 'servicelist'
> > > > > content (in the source code). For example, the content of
> > > > > servicelist (when in validate.c::validate() to check for zombie
> > > > > processes) is different if the processes are listed in reverse
> > > > > order in the monitrc file.
> > > > >
> > > > > For example, say I have a service dependency tree like:
> > > > > E->D->C->B->A
> > > > > F->D->C->B->A
> > > > > G->A
> > > > > where A is the root of the tree.
> > > > >
> > > > > In my monitrc file, I have 'check process' in the following
> > > > > order: E, F, D, C, B, G, A.
> > > > >
> > > > > If I turn debug on using the -v option, the checks on the zombie
> > > > > processes are in the order of: G, F, E, D, C, B, A
> > > > >
> > > > > If I reverse the order in the monitrc file, and restart monit
> > > > > using the -v option, the checks on the zombie processes are in
> > > > > the order of: E, F, D, C, B, G, A. This is a different result
> > > > > from the previous one.
> > > >
> > > > The list is initially built during parsing and reshuffled
> > > > afterwards if dependencies are present. Because of this the final
> > > > list may look different if you change the order of the service
> > > > entries. Note however that in both cases the reshuffling is done
> > > > so the leaf nodes are first in the list.
> > > >
> > > > > I went through the code, and noticed that the 'servicelist' is
> > > > > actually re-organized based on the dependencies after the
> > > > > configuration file is parsed. However, the result leaves the
> > > > > most visited process last on the servicelist.
> > > > >
> > > > > I don't quite understand why the most visited process is not at
> > > > > the beginning of the list. If my understanding is correct,
> > > > > validate goes through the servicelist to check process status
> > > > > every poll interval. Think of a scenario where, because process A
> > > > > crashed, process G exited. The current behaviour will result in G
> > > > > being restarted before A, despite the dependency.
> > > >
> > > > Hmm you have a point there, although the end result should be the
> > > > same it seems that you got one unnecessary restart of G. Have you
> > > > verified that this is the case? Browsing the code it does indeed
> > > > look that way.
> > >
> > > Well, it is not the unnecessary restart of G that I'm
> > > concerned about (actually, I didn't notice that one at all).
> > > You see, the product I am working on depends heavily on
> > > the process dependencies and start-up order. In this product,
> > > if G detects A is down, it will exit as well. The way monit
> > > works, if I understand correctly, is that it will detect G
> > > being down first, and restart G. I have observed that the
> > > startup time dramatically increases when G is started before
> > > A, even though it is A that crashed. Is this the expected
> > > behaviour?
> > >
> > >
> > > > > Would it not make more sense to have the servicelist constructed
> > > > > the other way, where the most dependent process is the first
> > > > > process on the servicelist?
> > > > >
> > > > > Because of the dependencies between these processes, it really
> > > > > only makes sense to me if monit checks the 'root' process
> > > > > first. Or am I mis-using monit?
> > > >
> > > > I don't remember why we ended up having the service list with the
> > > > least depending services first. There may be other scenarios that
> > > > justify this design, although none comes to mind right now. Could
> > > > you implement a test case with the most depending services first in
> > > > the list and verify that dependencies continue to work as described
> > > > in the monit manual? If
> > > This will take some time, due to the way my development
> > > environment works, plus work schedule. I will try though.
> > >
> > > > it does, we'll certainly reverse the service list and accept a
> > > > patch from you or fix it ourselves.
> > > What would happen if the test case fails? Shouldn't monit
> > > behave 'properly' (i.e. start the dead process that is
> > > closest to the trunk of the dependency tree first)?
> > >
> > > Thank you VERY MUCH for your help.
> > > 
> > > > --
> > > > Jan-Henrik Haukeland
> > > > Mobil +47 97141255
> > > >
> > > >
> > > >
> > > > --
> > > > To unsubscribe: http://lists.nongnu.org/mailman/listinfo/monit-general
> > > >
> > > >
> > >
> >
>
--
Jan-Henrik Haukeland
mobile +47 97141255



--- End Message ---
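
The patch quoted in the message above amounts to giving each service a prev
pointer and walking the list from its tail, so that root-most services are
checked before leaf services. A minimal C sketch of that idea follows.
Service, link_backward and check_order are hypothetical names for this
illustration, not monit's Service_T or its functions. The leaf-first order
G, F, E, D, C, B, A used in the note below is the debug output quoted in the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for monit's service structure. */
typedef struct Service {
    const char *name;
    struct Service *next;  /* forward link: leaf-most service first */
    struct Service *prev;  /* backward link, as the patch proposes */
} Service;

/* Fill in the prev pointers of a forward-linked list and return its
 * tail (NULL for an empty list). A single-entry list is handled too:
 * its only node is returned as the tail. */
Service *link_backward(Service *head) {
    Service *tail = NULL;
    Service *before = NULL;
    for (Service *s = head; s != NULL; s = s->next) {
        s->prev = before;
        before = s;
        tail = s;
    }
    return tail;
}

/* Walk from the tail via prev, as the patched validate() loop would,
 * and write the visited names into out, separated by spaces. */
void check_order(const Service *tail, char *out, size_t outsz) {
    assert(out != NULL);
    if (outsz == 0)
        return;
    out[0] = '\0';
    for (const Service *s = tail; s != NULL; s = s->prev) {
        if (out[0] != '\0')
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, s->name, outsz - strlen(out) - 1);
    }
}
```

For a forward list G, F, E, D, C, B, A, calling check_order() on the tail
returned by link_backward() produces "A B C D E F G", that is, the root
process A is visited first.
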
