bug-cfengine
RE: Definitely a bug in locking.


From: Cole, William
Subject: RE: Definitely a bug in locking.
Date: Mon, 21 Mar 2005 18:05:22 -0500

I found the problem by doing amazingly cruel debug things to locks.c and
figured out exactly what is happening *there* although I have not tracked it
back to the specific source of the problem. 

I believe the problem is that 'operand', as it is passed in, seems to be the
same pointer that was returned from a prior CanonifyName call, i.e. it points
at CanonifyName's static 'buffer'. The canonified name has not been copied
out to its own buffer. When the CanonifyName(operator) call is made on line
216, 'operand' is clobbered: it reads 'copy' after that call. On line 217
CanonifyName(operand) is called, and that completely whacks 'operand' because
it is still really the same pointer as 'buffer'.

I suppose there are two ways to approach this:

1. Find everywhere that GetLock is called and make sure that 'operand' as
passed is not CanonifyName's buffer. 
2. Make local copies of those strings. 
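
To illustrate what I think is going on, here is a minimal sketch. It is not
the cfengine source: canonify() is a hypothetical stand-in for CanonifyName
(assumed to zero a static buffer, fill it, and return a pointer to it), and
lock_name_unsafe()/lock_name_safe() are made-up helpers that only imitate the
two canonify calls in GetLock. The "safe" variant shows approach 2, copying
the strings before any canonify call.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CANON_BUFSIZE 256  /* hypothetical size, stands in for CF_BUFSIZE */

/* Stand-in for CanonifyName: the result lives in ONE static buffer that
   every call overwrites. */
static char *canonify(const char *str)
{
    static char buffer[CANON_BUFSIZE];
    size_t i;

    memset(buffer, 0, sizeof buffer);   /* wipes str too if str == buffer */
    for (i = 0; i + 1 < sizeof buffer && str[i] != '\0'; i++) {
        buffer[i] = isalnum((unsigned char)str[i]) ? str[i] : '_';
    }
    return buffer;
}

/* Imitates the suspected pattern: both inputs are canonified in turn with
   no private copy taken first. */
static void lock_name_unsafe(const char *oper, const char *operand,
                             char *out, size_t outlen)
{
    char c_oper[CANON_BUFSIZE];
    char c_operand[CANON_BUFSIZE];

    /* If operand aliases the static buffer, it now reads as oper. */
    snprintf(c_oper, sizeof c_oper, "%s", canonify(oper));
    /* operand is the buffer itself: it is zeroed before being read. */
    snprintf(c_operand, sizeof c_operand, "%s", canonify(operand));
    snprintf(out, outlen, "%s.%s", c_oper, c_operand);
}

/* Approach 2: take local copies before touching the static buffer. */
static void lock_name_safe(const char *oper, const char *operand,
                           char *out, size_t outlen)
{
    char oper_copy[CANON_BUFSIZE];
    char operand_copy[CANON_BUFSIZE];
    char c_oper[CANON_BUFSIZE];
    char c_operand[CANON_BUFSIZE];

    snprintf(oper_copy, sizeof oper_copy, "%s", oper);
    snprintf(operand_copy, sizeof operand_copy, "%s", operand);

    snprintf(c_oper, sizeof c_oper, "%s", canonify(oper_copy));
    snprintf(c_operand, sizeof c_operand, "%s", canonify(operand_copy));
    snprintf(out, outlen, "%s.%s", c_oper, c_operand);
}
```

If the caller passes the pointer returned by canonify("/etc/sudoers") as
'operand' together with "copy" as the operator, lock_name_unsafe() produces
"copy." (the operand is gone, just like in the debug output), while
lock_name_safe() produces "copy._etc_sudoers".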




> -----Original Message-----
> From: Mark Burgess [mailto:address@hidden
> Sent: Monday, March 21, 2005 5:14 PM
> To: Cole, William
> Cc: 'address@hidden'; Baker, Darryl; 'address@hidden'
> Subject: RE: Definitely a bug in locking.
> 
> 
> 
> Okay, I see. I cannot see anything wrong here, but I have made a small
> change in do.c where the lock is called. Get the latest from subversion
> and let me know if this continues to happen, thanks.
> 
> M
> 
> 
> On Mon, 2005-03-21 at 16:25 -0500, Cole, William wrote:
> > Mark,
> >    I am the co-worker Darryl referred to, and perhaps I can explain better
> > what I think I see here. 
> >  
> > > -----Original Message-----
> > > From: Mark Burgess [mailto:address@hidden
> > > Sent: Sunday, March 20, 2005 4:26 AM
> > > To: Baker, Darryl
> > > Cc: address@hidden
> > > Subject: Re: Definitely a bug in locking.
> > > 
> > > 
> > > I cannot figure out what, if anything, is wrong here. Sorry to be
> > > slow.
> > 
> > Note the last few lines of the debug output:
> > 
> > > >
> > 
> GetLock(copy,_var_cfengine_master_etc_sudoers__usr_local_etc_s
> udoers_sysadm0
> > 5_abh_vw_com,time=1111156198), ExpireAfter=60, IfElapsed=1
> > > > GetLastLock()
> > > > cfengine:asgqd545: Nothing scheduled for copy. (0/1 
> minutes elapsed) 
> > 
> > The first and last are generated by lines 212 and 257/258 respectively in
> > locks.c, inside the GetLock function. The last line is constructed thus:
> > 
> >    snprintf(OUTPUT,CF_BUFSIZE*2,"Nothing scheduled for %s.%s (%u/%u minutes elapsed\n",operator,operand,elapsedtime,ifelapsed);
> > 
> > Note that 'operand' seems to have become null!
> > 
> > 
> > > Can you make sure that we are talking about the same thing by trying
> > > the version currently in the subversion repository or snapshot. (e.g.
> > > send me the bit of code that you think is wrong) and explain exactly the
> > > symptoms. The debug output here tells me nothing because I don't know
> > > what I'm looking for.
> > 
> > This test was against a version downloaded 3/14/05 calling itself 2.1.14.
> > The locks.c appears to be identical to the one in the Subversion repository.
> > 
> > 
> > I have just downloaded the latest snapshot and will try running the cfagent
> > from it to see if the same thing happens. 
> > 
> > > 
> > > Thanks
> > > 
> > > M
> > > 
> > > On Fri, 2005-03-18 at 09:43 -0500, Baker, Darryl wrote:
> > > > One of my co-workers dug in a bit deeper into our problems last
> > > > night and found:
> > > > 
> > > > > Solaris 8 and 9 on SPARC using a few days old snap-shot. Berkeley
> > > > > DB=3.3   
> > > > > 
> > > > > I ran cfagent in all-out debug mode (-d2)  on asgqd545 and noticed
> > > > > from the trace below. This was repeated for all of the 'copy'
> > > > > actions that should have occurred. Note the 'IfElapsed' parameter
> > > > > to GetLock. Now note the third line. I dug up the doc on IfElapsed,
> > > > > and discovered that it is intended to assure that cfengine can't
> > > > > run off and insanely start copying everything at full tilt. I set
> > > > > "IfElapsed = ( 0 )" in cf.main and the file copies went right
> > > > > through. 
> > > > > 
> > > > > I have managed to prove that my C is hopelessly rusty, but I'm 99%
> > > > > certain that the bug is somewhere in lines 212-257 of locks.c. I
> > > > > know that because the variable 'operand' goes null in there somehow
> > > > > (!) and results in that third line of debug not having what it
> > > > > should. (Line 212 sets the first line, 257 sets the last, and
> > > > > 'operand' goes missing in there.... just wrong...) 
> > > > 
> > > > 
> > > > 
> > > > ExpandVarstring(sysadm05.abh.vw.com)
> > > > ExpandVarstring(/var/cfengine/master/etc/sudoers)
> > > > ExpandVarstring(/usr/local/etc/sudoers)
> > > > Checking copy from
> > > > sysadm05.abh.vw.com:/var/cfengine/master/etc/sudoers to
> > > > /usr/local/etc/sudoers
> > > > ExpandVarstring(sysadm05.abh.vw.com)
> > > > Server connection to sysadm05.abh.vw.com already open on 4
> > > > Authentic connection verified
> > > > cf_rstat(/var/cfengine/master/etc/sudoers)
> > > > GetCachedStatData(/var/cfengine/master/etc/sudoers)
> > > > Did not find in cache
> > > > Transaction Send[t 54][Packed text]
> > > > Attempting to send 62 bytes
> > > > SendSocketStream, sent 62
> > > > RecvSocketStream(8)
> > > >     (Concatenated 8 from stream)
> > > > Transaction Receive [t 74][]
> > > > RecvSocketStream(74)
> > > >     (Concatenated 74 from stream)
> > > > Mode = 288,0
> > > > OK: type=0
> > > >  mode=440
> > > >  lmode=0
> > > >  uid=0
> > > >  gid=0
> > > >  size=22418
> > > >  atime=1111154948
> > > >  mtime=1111103746 ino=281856 nlnk=1, dev=22282242
> > > > RecvSocketStream(8)
> > > >     (Concatenated 8 from stream)
> > > > Transaction Receive [t 3][]
> > > > RecvSocketStream(3)
> > > >     (Concatenated 3 from stream)
> > > > Linkbuffer: OK:
> > > > 
> > > > > GetLock(copy,_var_cfengine_master_etc_sudoers__usr_local_etc_sudoers_sysadm05_abh_vw_com,time=1111156198), ExpireAfter=60, IfElapsed=1
> > > > > GetLastLock()
> > > > > cfengine:asgqd545: Nothing scheduled for copy. (0/1 minutes elapsed)  
> > > > 
> > > > 
> > > > > _____________________________________________________________________
> > > > > Darryl Baker
> > > > gedas USA, Inc.
> > > > Operational Services Business Unit
> > > > 3800 Hamlin Road
> > > > Auburn Hills, MI 48326
> > > > US
> > > > phone   +1-248-754-5341
> > > > fax     +1-248-754-6399
> > > > address@hidden
> > > > http://www.gedasusa.com
> > > > 
> > > > > _____________________________________________________________________
> > > > >  
> > > >  <<Baker, Darryl.vcf>> 
> > > > _______________________________________________
> > > > Bug-cfengine mailing list
> > > > address@hidden
> > > > http://lists.gnu.org/mailman/listinfo/bug-cfengine
> > > 
> > > -----BEGIN PGP SIGNATURE-----
> > > Version: PGP Personal Security 7.0.3
> > > 
> > > iQA/AwUBQj7jO1e1Bhkj9lZeEQLPnACguJRtdfJFmlw0NvROWWeK1B58rO4An3HS
> > > 9NhHVhfh11Kl7nxnLELnJVgB
> > > =ZCQz
> > > -----END PGP SIGNATURE-----
> > >  
> > > 
> > 
> > 
> 



