[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] sort: Add --threads option, which parallelizes internal sort
From: |
Jim Meyering |
Subject: |
Re: [PATCH] sort: Add --threads option, which parallelizes internal sort. |
Date: |
Thu, 02 Apr 2009 13:06:43 +0200 |
Ralf Wildenhues wrote:
> Hi Paul, all,
> Paul Eggert writes:
>>
>> This patch is by Glen Lenker, Matt Pham, Benjamin Nuernberger, Sky
>> Lin, TaeSung Roh, and Paul Eggert. It adds support for parallelism
>> within an internal sort. On our simple tests on a 2-core desktop x86,
>> overall performance improved by roughly a factor of 1.6.
>
> This is too interesting to pass up.
>
> Example run, on an 8-way, and with cat'ed instances of the dictionary,
> on tmpfs, timings best of three:
>
> runtime [s] threads
> file size [MB] 1 2 4 8
> 1 0.06 0.04 0.03 0.04
> 2 0.13 0.09 0.07 0.07
> 4 0.28 0.20 0.16 0.15
> 8 0.61 0.43 0.34 0.32
> 16 1.34 0.94 0.74 0.72
> 32 3.00 2.06 1.63 1.57
> 64 6.36 4.38 3.44 3.32
> 128 13.49 9.30 7.13 7.24
> 256 28.62 19.49 15.17 15.18
>
> Here's the abbreviated 'time' output for the last row:
>
> 26.95user 1.67system 0:28.62elapsed 100%CPU
> 30.78user 1.98system 0:19.49elapsed 168%CPU
> 37.41user 2.04system 0:15.17elapsed 260%CPU
> 60.68user 2.79system 0:15.18elapsed 417%CPU
>
> It suggests to me that too much time is spent busy-waiting in pthread_join,
> or that sort is computing too much (I haven't looked at the patch in detail).
>
> Also, I'd have expected the rate going from 1 to 2 threads to get at least
> a bit better with bigger file size, but it remains remarkably constant,
> around 1.45 for this setup. What am I missing?
Thanks for doing that, Ralf.
Just to give everyone a heads-up,
I'm a little reluctant to apply this patch in its current
state, since it's not achieving reasonable efficiency.
I noticed that running the sole test,
env time make RUN_VERY_EXPENSIVE_TESTS=yes check -C tests \
TESTS=misc/sort-benchmark-random VERBOSE=yes
took about 36s on a pretty fast system.
changing the inner loop to this
print ((map {chr(32+rand(94))} (1..100)) , "\n");
reduced that to 25s
I changed it to write "exp" from within the script
that generated the data, but that didn't make a big difference.
Finally I realized that those 50M calls to "rand" were the
problem, and only about 4% of them were actually ever used
while sorting, since 94^4 is far more than 500k.
Adjusting accordingly brought the test's run time down to ~8s.
However, the resulting sort invocation took barely 1 second here,
and that's obviously too small to be useful. Ramping up to 5M lines,
the resulting test takes almost 2 minutes and the sort itself took 34s
on this particular quad-core system.
A more interesting test would be to ensure that when
run on a multi-core system sorting with --threads=2
is at least X% faster than sorting with --threads=1.
Also, it'd be nice to add the following:
- a test that uses --threads=N to exercise the new option-parsing code,
though if you adjust as suggested to compare --threads=N for
N=1&2, that's not needed.
- a NEWS item
>From b1332b9b429dfa9a8bba7286a78bbc7e37a2e419 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 2 Apr 2009 12:35:08 +0200
Subject: [PATCH] tests: make sort-benchmark-random more efficient, then use a
larger test
* tests/misc/sort-benchmark-random: cut 75% off run time, then
increase input size by a factor of 10, so that sort runs for more
than 1 second on fast hardware.
---
tests/misc/sort-benchmark-random | 42 +++++++++++++++++++-------------------
1 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/tests/misc/sort-benchmark-random b/tests/misc/sort-benchmark-random
index 2bace4f..0c4d61d 100755
--- a/tests/misc/sort-benchmark-random
+++ b/tests/misc/sort-benchmark-random
@@ -34,33 +34,33 @@ export LC_ALL
fail=0
perl -e '
-my $num_lines = 500000;
+my $num_lines = 5000000;
my $length = 100;
+# Only the first few bytes of each line need to be random.
+# the remaining bytes are solely to simulate a more realistic work load.
+my $rand_bytes = 5; # per line, so that 94^5 >> $num_lines
+
+my @line;
for (my $i=0; $i < $num_lines; $i++)
{
- for (my $j=0; $j < $length; $j++)
- {
- printf "%c", 32 + rand(94);
- }
- print "\n";
-}' > in || framework_failure
-
-# We need to generate a lot of data for sort to show a noticeable
-# improvement in performance. Sorting it in PERL may take awhile.
-
-perl -e '
-open (FILE, "<in");
-my @list = <FILE>;
-print sort(@list);
-close (FILE);
-' > exp || framework_failure
-
-#start=$(date +%s)
+ my $line = join ("", map {chr(32+rand(94))} (1..$rand_bytes))
+ . ("0" x ($length - $rand_bytes)) . "\n";
+ print $line;
+ push @line, $line;
+}
+END {
+ open FH, ">", "exp" or die;
+ print FH sort @line;
+ close FH or die;
+}
+' > in || framework_failure
+
+start=$(date +%s)
time sort in > out || fail=1
-#duration=$(expr $(date +%s) - $start)
+duration=$(expr $(date +%s) - $start)
-#echo sorting the random data took $duration seconds
+echo sorting the random data took $duration seconds
compare out exp || fail=1
--
1.6.2.rc1.285.gc5f54
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort.,
Jim Meyering <=
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Jim Meyering, 2009/04/03
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Paul Eggert, 2009/04/03
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Jim Meyering, 2009/04/03
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Ralf Wildenhues, 2009/04/04
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Ralf Wildenhues, 2009/04/04
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., James Youngman, 2009/04/05
- Re: [PATCH] sort: Add --threads option, which parallelizes internal sort., Ralf Wildenhues, 2009/04/18