
Re: [GNUnet-developers] Named UNIX domain sockets slowing experiments


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Named UNIX domain sockets slowing experiments
Date: Fri, 31 Jan 2014 17:31:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131103 Icedove/17.0.10

NO, not a configure option. Just a config-file option!

No #ifdef, simple C "if".
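
A minimal POSIX sketch of what such a runtime switch could look like (the
helper and the flag are illustrative only, not GNUnet's actual code):

#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sketch: bind a UNIX domain socket either at a filesystem path or in the
 * Linux abstract namespace, selected at runtime by a config-file option --
 * no #ifdef needed.  'use_abstract' would come from the configuration. */
static int
bind_unix_socket (const char *name, int use_abstract)
{
  struct sockaddr_un addr;
  socklen_t len;
  int sock = socket (AF_UNIX, SOCK_STREAM, 0);

  if (-1 == sock)
    return -1;
  memset (&addr, 0, sizeof (addr));
  addr.sun_family = AF_UNIX;
  if (use_abstract)
  {
    /* Abstract socket: sun_path starts with a NUL byte, so no file is
     * ever created on the (possibly networked) file system. */
    strncpy (&addr.sun_path[1], name, sizeof (addr.sun_path) - 2);
    len = offsetof (struct sockaddr_un, sun_path) + 1 + strlen (name);
  }
  else
  {
    /* Named socket: creates an inode at 'name' in the file system. */
    strncpy (addr.sun_path, name, sizeof (addr.sun_path) - 1);
    len = sizeof (addr);
  }
  if (-1 == bind (sock, (struct sockaddr *) &addr, len))
  {
    close (sock);
    return -1;
  }
  return sock;
}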

-Christian

On 01/31/2014 05:06 PM, Sree Harsha Totakura wrote:
> Ok, I shall introduce a new configuration option --with-abstract-sockets
> then.
> 
> Copying this message to the mailing list so that other devs also know of it.
> 
> Sree
> 
> On 01/31/2014 03:40 PM, Christian Grothoff wrote:
>> Hi!
>>
>> Why #ifdef? We could have an extra option "USE_ABSTRACT_SOCKETS",
>> and if that is set and access controls are permissive (as in, any
>> user/group has full access anyway), then we switch to abstract
>> sockets with a simple if (cond).  That should be fine, as you can
>> obviously disable ACL on supermuc and set the abstract sockets option.
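
A small sketch of that check (the option flag and the "permissive" test are
made up for illustration; they would map onto whatever socket permissions the
configuration actually requests):

#include <sys/stat.h>

/* Sketch: only use abstract sockets when the new option is enabled AND the
 * requested socket permissions are wide open anyway, so skipping the
 * file-system permission check loses nothing. */
static int
use_abstract_sockets (int option_enabled, mode_t requested_mode)
{
  const mode_t everyone_rw = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
                             | S_IROTH | S_IWOTH;

  return option_enabled
    && (everyone_rw == (requested_mode & everyone_rw));
}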
>>
>> My 2 cents
>>
>> -Christian
>>
>> On 01/31/2014 02:32 PM, Sree Harsha Totakura wrote:
>>> Hi Christian,
>>>
>>> We discussed the slow startup of peers on the supermuc a while
>>> ago.  Here is something I found related to it.  The 'bind' calls are
>>> still noticeably slower than they used to be.
>>>
>>> I suspect this is due to the changes you introduced to fix #2887.  With
>>> these changes, gnunet *always* uses named sockets, and since the compute
>>> nodes on the supermuc share a global network file system, creating
>>> socket files on it is slowing us down.
>>>
>>> I foresee that we will have the same problem on other testbeds, which
>>> are also very likely to use a network file system.  So, should I ifdef
>>> the code with a new configure flag to use abstract domain sockets and
>>> forgo the user access permission checks when abstract sockets are used?
>>>
>>> -
>>> Sree
>>>
>>> On 12/04/2013 01:57 PM, Christian Grothoff wrote:
>>>> Hi!
>>>>
>>>> 'stat' calls should now be reduced significantly in SVN HEAD.
>>>> 'bind' might also succeed more often, which might also help.
>>>>
>>>> Please let me know if this helped, and how much.
>>>>
>>>> Best,
>>>>
>>>> Christian
>>>>
>>>> On 12/04/2013 01:16 PM, Sree Harsha Totakura wrote:
>>>>> Hi Christian,
>>>>>
>>>>> Just a status update on the SuperMUC:
>>>>>
>>>>> I found the reason why our processes do not receive some signals --
>>>>> SIGTERM, SIGINT, SIGPIPE and SIGHUP were masked by the MPI runtime.  Since
>>>>> the signal mask is preserved through exec, our processes which are
>>>>> exec'ed from MPI processes also have these signals masked out.  MSH now
>>>>> clears this mask before exec'ing child processes, and this seems to work.
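
A minimal sketch of that kind of fix, clearing the inherited mask right
before exec (the helper is illustrative; MSH's actual code may differ):

#include <signal.h>
#include <unistd.h>

/* Sketch: undo a restrictive signal mask inherited from the parent (here,
 * the MPI runtime) before exec'ing the child; the mask survives both
 * fork() and exec(), so it must be cleared explicitly. */
static void
exec_with_clear_sigmask (char *const argv[])
{
  sigset_t empty;

  sigemptyset (&empty);
  sigprocmask (SIG_SETMASK, &empty, NULL);  /* unblock all signals */
  execv (argv[0], argv);
  _exit (1);  /* only reached if execv() failed */
}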
>>>>>
>>>>> The other problem we are having now is the embarrassingly slow startup
>>>>> of peers on the compute nodes, with the ARM processes often found in
>>>>> "uninterruptible sleep" (D) state.  I traced the system calls from all
>>>>> the started ARM services and found that the processes spend a
>>>>> considerable amount of time in `stat()'.  The following are the top 20
>>>>> system calls ordered by the time spent executing them.  They were
>>>>> collected and aggregated over 200 ARM services:
>>>>>
>>>>> stat 12972.281245
>>>>> bind 5504.138075
>>>>> select 187.148530
>>>>> open 82.702509
>>>>> access 78.527948
>>>>> read 11.191872
>>>>> fcntl 10.993160
>>>>> close 5.692947
>>>>> setsockopt 3.294542
>>>>> socket 2.986985
>>>>> brk 2.909969
>>>>> execve 2.174772
>>>>> mmap 1.771800
>>>>> mprotect 1.767051
>>>>> umask 1.240802
>>>>> chmod 0.925778
>>>>> rt_sigaction 0.753291
>>>>> listen 0.722637
>>>>> fstat 0.596976
>>>>> fadvise64 0.357256
>>>>>
>>>>> On a related note, I observed that the MPI tasks we start on the SuperMUC
>>>>> are always pinned to CPUs.  All processes which are started from an MPI
>>>>> task get pinned to the same CPU.  I can confirm this, as we only start
>>>>> one MSH task per compute node and all the processes started from this
>>>>> MSH process contend for and run on the same core.  I tried to disable
>>>>> this process affinity, but that is restricted by the administrators.
>>>>> For now, I am able to change the affinity setting to a round-robin
>>>>> scheme, but this still means that processes are *always* pinned to a
>>>>> CPU; previously they were all pinned to a single CPU because we only
>>>>> start 1 MSH task, now they are pinned round-robin across 32 CPUs.
>>>>> In any case, I believe that this behavior is harming us.
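
For reference, a sketch of how such pinning could be widened with the
standard Linux affinity call (whether this is permitted depends on the
site's batch-system policy; the helper is illustrative):

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Sketch: widen the CPU affinity mask inherited from the MPI task so the
 * given process may run on any of 'ncpus' cores instead of staying pinned
 * to the parent's core. */
static int
unpin_process (pid_t pid, int ncpus)
{
  cpu_set_t set;
  int i;

  CPU_ZERO (&set);
  for (i = 0; i < ncpus; i++)
    CPU_SET (i, &set);
  return sched_setaffinity (pid, sizeof (set), &set);
}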
>>>>>
>>>
> 


