Re: [GNUnet-developers] Named UNIX domain sockets slowing experiments

From: Sree Harsha Totakura
Subject: Re: [GNUnet-developers] Named UNIX domain sockets slowing experiments
Date: Fri, 31 Jan 2014 17:06:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131005 Icedove/17.0.9

OK, I shall introduce a new configuration option, --with-abstract-sockets.

Copying this message to the mailing list so that other devs also know of it.
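For reference, the difference between the two socket kinds can be sketched in C. This is a minimal, Linux-only sketch, not GNUnet's actual code. The helper `bind_service_socket()` and its `use_abstract` flag are hypothetical stand-ins for whatever the new option ends up controlling. An abstract socket name starts with a NUL byte and has no connection with file-system pathnames, so `bind()` never touches the (network) file system. It also has no file permissions, which is why the permission checks have to be skipped in that mode. Following Christian's suggestion in the quoted reply, a caller would pass `use_abstract` only when the option is set and access control is permissive anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not GNUnet's API): create a listening UNIX domain
 * socket either in the Linux abstract namespace (no file is created) or as
 * a named socket file at `name`. Returns the socket fd, or -1 on error. */
static int bind_service_socket(const char *name, int use_abstract)
{
  struct sockaddr_un addr;
  socklen_t len;
  size_t n;
  int fd;

  assert(name != NULL);
  n = strlen(name);
  /* Named: name plus terminating NUL; abstract: leading NUL plus name. */
  if (n + 1 > sizeof(addr.sun_path))
    return -1;

  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  if (use_abstract) {
    /* Leading NUL selects the abstract namespace; the address length covers
     * exactly the NUL plus the name, with no trailing NUL. No permission
     * checks apply, and the name vanishes when the socket is closed. */
    addr.sun_path[0] = '\0';
    memcpy(addr.sun_path + 1, name, n);
  } else {
    /* Named socket: bind() creates a socket file, i.e. a file-system
     * operation, which is what is slow on a shared network file system. */
    memcpy(addr.sun_path, name, n + 1);
  }
  len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n + 1);

  fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
  if (bind(fd, (struct sockaddr *)&addr, len) < 0 || listen(fd, 5) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

With `use_abstract` set, nothing has to be cleaned up after `close()`. In named mode the caller still has to `unlink()` the socket file, and `bind()` fails with `EADDRINUSE` if a stale file is left behind.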


On 01/31/2014 03:40 PM, Christian Grothoff wrote:
> Hi!
> Why #ifdef? We could have an extra option "USE_ABSTRACT_SOCKETS",
> and if that is set and access controls are permissive (as in, any
> user/group has full access anyway), then we switch to abstract
> sockets with a plain if (cond).  That should be fine, as you can
> obviously disable ACLs on the SuperMUC and set the abstract sockets
> option.
> My 2 cents
> -Christian
> On 01/31/2014 02:32 PM, Sree Harsha Totakura wrote:
>> Hi Christian,
>> We discussed the slow startup of peers on the SuperMUC a while
>> ago.  Here is something I found related to it.  The 'bind' calls are
>> still slower than they used to be.
>> I suspect this is due to the changes you introduced to fix #2887.  With
>> these changes, GNUnet *always* uses named sockets, and since the compute
>> nodes on the SuperMUC share a global network file system, creating
>> socket files on it is slowing us down.
>> I foresee that we will have the same problem on other testbeds, which
>> are very likely to use a network file system as well.  So, should I
>> #ifdef the code with a new configure flag to use abstract domain sockets
>> and forgo the user access permission checks when abstract sockets are
>> used?
>> -
>> Sree
>> On 12/04/2013 01:57 PM, Christian Grothoff wrote:
>>> Hi!
>>> 'stat' calls should now be reduced significantly in SVN HEAD.
>>> 'bind' might also succeed more often, which might also help.
>>> Please let me know if this helped, and how much.
>>> Best,
>>> Christian
>>> On 12/04/2013 01:16 PM, Sree Harsha Totakura wrote:
>>>> Hi Christian,
>>>> Just a status update on the SuperMUC:
>>>> I found the reason why our processes do not receive some signals:
>>>> SIGTERM, SIGINT, SIGPIPE and SIGHUP were masked by the MPI runtime.
>>>> Since the signal mask is preserved across exec, our processes, which
>>>> are exec'ed from MPI processes, also have these signals masked.  MSH
>>>> now clears this mask before exec'ing child processes, and this seems
>>>> to work.
>>>> The other problem we are having now is the embarrassingly slow startup
>>>> of peers on the compute nodes, with the ARM processes often found in
>>>> "uninterruptible sleep" (D) state.  I traced the system calls of all
>>>> the started ARM services and found that the processes spend a
>>>> considerable amount of time in `stat()'.  The following are the top 20
>>>> system calls ordered by the time spent executing them, collected and
>>>> aggregated over 200 ARM services:
>>>> stat          12972.281245
>>>> bind           5504.138075
>>>> select          187.148530
>>>> open             82.702509
>>>> access           78.527948
>>>> read             11.191872
>>>> fcntl            10.993160
>>>> close             5.692947
>>>> setsockopt        3.294542
>>>> socket            2.986985
>>>> brk               2.909969
>>>> execve            2.174772
>>>> mmap              1.771800
>>>> mprotect          1.767051
>>>> umask             1.240802
>>>> chmod             0.925778
>>>> rt_sigaction      0.753291
>>>> listen            0.722637
>>>> fstat             0.596976
>>>> fadvise64         0.357256
>>>> On a related note, I observed that the MPI tasks we start on the
>>>> SuperMUC are always pinned to CPUs, and all processes started from an
>>>> MPI task get pinned to the same CPU.  I can confirm this because we
>>>> start only one MSH task per compute node, and all the processes started
>>>> from this MSH process contend for and run on the same core.  I tried to
>>>> disable this process affinity, but the administrators restrict this.
>>>> For now, I am able to change the affinity setting to round-robin, but
>>>> this still means that processes are *always* pinned to a CPU:
>>>> previously they all got pinned to a single CPU because we start only
>>>> one MSH task; now they are pinned across 32 CPUs.  In any case, I
>>>> believe this behavior is harming us.
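To confirm the CPU pinning described above from inside a process (for example in MSH or in a peer's ARM service), the process can query its own affinity mask. This is a minimal Linux sketch that assumes glibc's `sched_getaffinity()`/`sched_setaffinity()` and the `CPU_*` macros. Both helper names are hypothetical. The affinity mask is inherited across fork() and preserved across execve(), so every process started from a pinned MSH task reports the same restricted set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Number of CPUs the calling process may run on according to its
 * affinity mask, or -1 on error. */
static int allowed_cpu_count(void)
{
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    return -1;
  return CPU_COUNT(&set);
}

/* Hypothetical helper: restrict the calling process to the lowest-numbered
 * CPU it is currently allowed to use, mimicking how the MPI runtime pins a
 * task. Returns that CPU number, or -1 on error. */
static int pin_to_first_allowed_cpu(void)
{
  cpu_set_t set, one;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
    if (CPU_ISSET(cpu, &set)) {
      CPU_ZERO(&one);
      CPU_SET(cpu, &one);
      return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
  }
  return -1;
}
```

Under the pinning described in the message, `allowed_cpu_count()` would be expected to return 1 in MSH and in every process it starts. `pin_to_first_allowed_cpu()` only reproduces that situation for local experiments. It cannot undo a restriction imposed by the administrators, because a process can only narrow its mask within what it is allowed to use.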
