SIGPIPE problem
From: Yaroslav Halchenko
Subject: SIGPIPE problem
Date: Tue, 20 Sep 2005 17:32:07 -0400
User-agent: mutt-ng devel-20050619 (Debian)
Dear Cfengineers,
The following problem seems to have started when we extended the cluster
to 25 nodes:
/var/lib/cfengine2/bin/cfagent -Dfrom_cfexecd
hangs, and I guess that is what causes the following message to be sent by email:
cfengine:node5: Received signal 13 (SIGPIPE) while doing
[lock.cfagent_conf.node5.files._var_spool_torque_spool_3777_4000__1_1094]
cfengine:node5: Logical start time Tue Sep 20 16:42:23 2005
cfengine:node5: This sub-task started really at Tue Sep 20 16:42:23 2005
The hanging process has PID 29349, so here are some diagnostics.
Attaching gdb to the process gives a backtrace with frames numbered up to
#758, which looks like this:
#0 0xb7d0d2cb in nanosleep () from /lib/tls/libc.so.6
#1 0xb7d0d110 in sleep () from /lib/tls/libc.so.6
#2 0x0804d163 in ?? ()
#3 0x000009d1 in ?? ()
#4 0x00000000 in ?? ()
#5 0x0000000a in ?? ()
#6 0x00000000 in ?? ()
#7 0x08130cc0 in optarg ()
#8 0x00000000 in ?? ()
.....
#747 0xb7ca6e55 in getenv () from /lib/tls/libc.so.6
#748 0x0804b2d4 in ?? ()
#749 0x08120860 in optarg ()
#750 0x080a85f8 in _IO_stdin_used ()
#751 0x00000000 in ?? ()
#752 0xb7dadff4 in ?? () from /lib/tls/libc.so.6
#753 0x00000000 in ?? ()
#754 0xb7dadff4 in ?? () from /lib/tls/libc.so.6
#755 0xbffffe28 in ?? ()
#756 0xb7c8fec0 in __libc_start_main () from /lib/tls/libc.so.6
#757 0xb7c8fec0 in __libc_start_main () from /lib/tls/libc.so.6
#758 0x0804b0a1 in ?? ()
In the logs on the node I see:
cfengine.node5.runlog:Tue Sep 20 16:30:25 2005:Lock removed normally
:pid=29349:lock.cfagent_conf.node5.copy._etc_cfengine_inputs___var_lib_cfengine2_inputs_ravana_3343:
cfengine.node5.runlog:Tue Sep 20 16:30:26 2005:Lock removed normally
:pid=29349:lock.cfagent_conf.node5.tidy._var_lib_cfengine2_outputs_3023:
cfengine.node5.runlog:Tue Sep 20 17:16:47 2005:Lock removed normally
:pid=29349:lock.cfagent_conf.node5.disks.__3249:
cfengine.node5.runlog:Tue Sep 20 17:16:48 2005:Lock removed normally
:pid=29349:lock.cfagent_conf.node5.disks._usr_2286:
cfengine.node5.runlog:Tue Sep 20 17:16:48 2005:Lock removed normally
:pid=29349:lock.cfagent_conf.node5.disks._var_4909:
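The excerpt above shows no entries for this PID between 16:30:26 and 17:16:47. As a rough aid for spotting such stalls in a longer runlog, here is a minimal sketch in Python. The function name `find_gaps` and the five-minute threshold are illustrative assumptions, not part of cfengine, and the timestamp pattern is inferred only from the lines quoted above.

```python
import re
from datetime import datetime, timedelta

# Matches timestamps such as "Tue Sep 20 16:30:26 2005"; the weekday is
# matched but not captured, since the date alone is enough to parse.
TIMESTAMP = re.compile(
    r"[A-Z][a-z]{2} ([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)

def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs at least `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # continuation lines (":pid=...") carry no timestamp
        current = datetime.strptime(match.group(1), "%b %d %H:%M:%S %Y")
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current))
        previous = current
    return gaps

# Sample input copied from the log excerpt above.
sample = [
    "cfengine.node5.runlog:Tue Sep 20 16:30:25 2005:Lock removed normally",
    "cfengine.node5.runlog:Tue Sep 20 16:30:26 2005:Lock removed normally",
    ":pid=29349:lock.cfagent_conf.node5.tidy._var_lib_cfengine2_outputs_3023:",
    "cfengine.node5.runlog:Tue Sep 20 17:16:47 2005:Lock removed normally",
    ":pid=29349:lock.cfagent_conf.node5.disks.__3249:",
    "cfengine.node5.runlog:Tue Sep 20 17:16:48 2005:Lock removed normally",
]

gaps = find_gaps(sample)
for start, end in gaps:
    print(f"gap of {end - start} between {start} and {end}")
```

On the sample it reports a single gap, between the tidy lock and the first disks lock, which brackets the period in which the process was stuck.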
I kept gdb attached from approximately 17:00, and the process
"completed" as soon as I detached.
Where should I look to find the source of the SIGPIPE?
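For context, signal 13 is generally delivered when a process writes to a pipe or connected socket whose other end has already been closed. The following is a minimal sketch of that condition in Python. It is not cfengine code, and the helper name `write_to_closed_pipe` is made up for illustration. Python ignores SIGPIPE by default, so the failed write surfaces as an `EPIPE` error rather than as a signal.

```python
import errno
import os
import signal

def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader is left on the pipe
    try:
        # With SIGPIPE ignored (Python's default), the kernel reports the
        # broken pipe as EPIPE instead of delivering the signal.
        os.write(write_fd, b"x")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None

sigpipe_number = int(signal.SIGPIPE)  # 13 on Linux
pipe_errno = write_to_closed_pipe()
print(sigpipe_number, errno.errorcode[pipe_errno])
```

If the same mechanism applies here, the place to look would be whatever the agent was writing to at the time (a pipe to a child process or a connection to a server) rather than the lock named in the message, but that is only a guess from the general behaviour of the signal.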
I'm running Debian unstable with cfengine 2.1.15-1.
I have splaytime set to 50 and maxconnections set to 40.
Thank you in advance for any hints.
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07105
Student Ph.D. @ CS Dept. NJIT