
Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency


From: Pádraig Brady
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 16:15:41 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 16/03/11 15:32, Jim Meyering wrote:
> Pádraig Brady wrote:
>>
>> With SUBTHREAD_LINES_HEURISTIC=128k and the -S1M option to sort we get no
>> threads, as nlines never gets above 12787 (there looks to be around 80 bytes
>> of overhead per line?).
>> Only when -S >= 12M do we get nlines high enough to create threads.
> 
> Thanks for pursuing this.
> Here's a proposed patch to address the other problem.
> It doesn't have much of an effect (any?) on your
> issue when using very little memory, but when a sort user
> specifies -S1M, I think they probably want to avoid the
> expense (memory) of going multi-threaded.
> 
> What do you think?
> 
> -#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
> +#define INPUT_FILE_SIZE_GUESS (128 * 1024)

This does seem a bit like whack-a-mole
but at least we're lining them up better.

The above gives reasonable threading by default,
while reducing the large upfront malloc.

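$ for len in 1 79; do                  # line length, excluding the newline
    for i in $(seq 22); do
      lines=$((2<<$i))                 # 4, 8, ... 8388608 lines
      # build a file of $lines identical lines, each $len spaces long
      yes "$(printf %${len}s)"| head -n$lines > t.sort
      # count clone() calls, i.e. threads created; runs with none print nothing
      strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
      join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
      sed -n "s/\([0-9]*\) clone/$lines\t\1/p"
    done
  done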

#lines  threads (2 byte lines)
------------------------------
131072  1
262144  3
524288  7
1048576 15
2097152 15
4194304 15
8388608 15

#lines  threads (80 byte lines)
------------------------------
131072  1
262144  3
524288  7
1048576 15
2097152 15
4194304 22
8388608 60
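
The 1, 3, 7, 15 progression is what you'd expect if each sort call above
the 128k line threshold spawns one thread for half of its lines and keeps
the other half, splitting its thread budget the same way. Here's a rough
model of that, as a sketch only: clones() and threshold are names made up
for illustration, not code taken from sort.c, and the halving rule is an
assumption fitted to the numbers above. It makes no attempt to explain the
22 and 60 entries for the larger 80 byte inputs.

```shell
# Hypothetical model, not sort.c: count the threads created when sorting
# n lines with a budget of t threads, assuming one thread is spawned per
# split and that splitting stops below the line threshold.
threshold=$((128 * 1024))    # assumed value of SUBTHREAD_LINES_HEURISTIC

clones() {
  local n=$1 t=$2
  # No split without spare threads or with too few lines.
  if [ "$t" -le 1 ] || [ "$n" -lt "$threshold" ]; then
    echo 0
    return
  fi
  local lo_n=$((n / 2)) lo_t=$((t / 2))
  local lo hi
  lo=$(clones "$lo_n" "$lo_t")                  # half given to the new thread
  hi=$(clones $((n - lo_n)) $((t - lo_t)))      # half kept by the caller
  echo $((1 + lo + hi))
}

for i in $(seq 16 22); do
  lines=$((2 << i))
  printf '%s\t%s\n' "$lines" "$(clones "$lines" 16)"
done
```

With a budget of 16 this prints 1, 3, 7 and then 15 for every size from
1048576 lines up, which matches the 2 byte table.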

cheers,
Pádraig.


