Re: Strange cfexecd/cfagent interaction
From: Tim Auckland
Subject: Re: Strange cfexecd/cfagent interaction
Date: 20 Aug 2002 09:25:00 -0700
I think you'll see the EBADF on the systems that are working too.
That's quite normal on an exec.
I would suspect this is another instance of cfexecd's pthread running
out of stack space. This happened a lot in the betas of 2.0.0, but
should be fixed by now. As with any memory problem, commenting out
an unrelated line of code can sometimes "fix" the problem.
Take a look at the thread initialisation code in cfexecd and try giving
the thread more stack space, or try compiling without thread support,
and see if either makes any difference.
Tim
On Tue, 2002-08-20 at 09:07, David J. Bianco wrote:
> I've noticed on about 7 of my machines (out of about 150), cfexecd
> seems like it can't run the cfagent process when it starts up. Here's
> what I see in my syslog:
>
> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]: cfengine defines no system administrator address
> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]: Need: sysadm = ( address@hidden ) in control
>
> Now, I use the same config files on each of my hosts, and the same
> binaries too, architecture permitting. None of my other hosts complain
> and a manual check of the cfagent.conf file shows that I do define
> my email address there properly. I even get tons of reports emailed to
> me from the other machines, but not these malfunctioning 7.
>
> On the machines that malfunction, an strace of cfexecd when it starts
> up shows the following excerpt:
>
> [pid 997] close(0) = 0
> [pid 997] getpid() = 997
> [pid 997] rt_sigaction(SIGRT_0, {SIG_DFL}, NULL, 8) = 0
> [pid 997] rt_sigaction(SIGRT_1, {SIG_DFL}, NULL, 8) = 0
> [pid 997] rt_sigaction(SIGRT_2, {SIG_DFL}, NULL, 8) = 0
> [pid 997] execve("/var/cfengine/sbin/cfagent", ["/var/cfengine/sbin/cfagent", "-z"], [/* 57 vars */]) = 0
> [pid 997] fcntl(0, F_GETFD) = -1 EBADF (Bad file descriptor)
> [pid 997] --- SIGSEGV (Segmentation fault) ---
> <... read resumed> "", 4096) = 0
> --- SIGCHLD (Child exited) ---
>
> Translation: cfexecd tried to exec cfagent -z. Cfagent started, but
> before main() was invoked the process initialization routine tried
> to see if stdin should be preserved across the exec. Stdin was already
> closed, though, so fcntl() segfaulted before cfagent really had a chance
> to run.
>
> I traced this down to one line in cfpopen.c which seemed to be the
> trigger for this behavior, line 89:
>
> if (pid == 0)
> {
>     switch (*type)
>     {
>         case 'r':
>
>             /* THIS CLOSE IS THE TRIGGER LINE FOR THE BUG */
>             close(pd[0]);        /* Don't need output from parent */
>
>             if (pd[1] != 1)
>             {
>                 dup2(pd[1],1);   /* Attach pp=pd[1] to our stdout */
>                 dup2(pd[1],2);   /* Merge stdout/stderr */
>                 close(pd[1]);
>             }
>
>             break;
>
> This is the line that actually closes stdin for the newly created
> child process. If I comment it out, cfagent runs beautifully.
> If I leave it in, it bombs when cfexecd starts up.
>
> Now, I would argue that this is probably a bug in fcntl, since it
> should do some sort of error checking and return a -1 with errno
> set, rather than just segfaulting. Still, this code has been failing
> on more than one OS. The 7 machines it has trouble on are a mixture
> of HP-UX, Linux and Solaris.
>
> Has anyone else seen this? What would the implications be of *not*
> closing the child's stdin before execing cfagent? My brief analysis
> leads me to believe that it would be pretty safe, but I haven't looked
> into every call to cfpopen() in all parts of the code.
>
> Anyway, I'm not sure what the final fix for this is, but it seems
> that keeping stdin open might be a good one.
>
> David
>
>
> --
> David J. Bianco, GSEC <address@hidden>
> Thomas Jefferson National Accelerator Facility
>
> The views expressed herein are solely those of the author and
> not those of SURA/Jefferson Lab or the US DOE.
>