coreutils

Re: [PATCH] Speedup wc -l


From: Kristoffer Brånemyr
Subject: Re: [PATCH] Speedup wc -l
Date: Mon, 16 Mar 2015 17:40:59 +0000 (UTC)

Well, from some quick research, I think gcc only uses the builtin version of memchr when the length is known at compile time. However, I can't get gcc to emit the builtin version even when I force a known length. Maybe gcc simply doesn't have a faster version than glibc's for the amd64 target?

It's interesting that people get varied results with my patch. I guess it could also be due to internal differences between CPUs; of the two people who have reported so far, one sees no difference and one sees a speedup. It would be interesting to see more test cases.
 
--
/Kristoffer Brånemyr


On Monday, 16 March 2015 0:11, Sami Kerola <address@hidden> wrote:


On 15 March 2015 at 22:18, Pádraig Brady <address@hidden> wrote:

> On 15/03/15 21:14, Kristoffer Brånemyr wrote:
>>
>>
>>
>>
>>>On Sunday, 15 March 2015 20:13, Pádraig Brady <address@hidden> wrote:
>>>
>>>
>>>>On 15/03/15 08:33, Kristoffer Brånemyr wrote:
>>>>
>>>> Hi,
>>>>
>>>> I did some tests and found that you can actually beat memchr with a simple loop. Tests were done on an Intel Xeon E3-1231v3 (4 × 3.4 GHz), on a 4 GB file that was already cached in memory. Benchmarking was done simply with the 'time' command. I don't know how this code would perform on other architectures, but I guess you could put it in an #ifdef?
>>>>
>>>> Coreutils 8.23 version, compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m3.126s
>>>> user    0m2.699s
>>>> sys    0m0.429s
>>>>
>>>>
>>>> Improved version compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m2.857s
>>>> user    0m2.461s
>>>> sys    0m0.396s
>>>>
>>>> Improved version compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m1.518s
>>>> user    0m1.157s
>>>> sys    0m0.361s
>>>>
>>>> I studied the generated assembly, and with -O3 gcc generates some fancy SSE code, which gives a nice speedup. As far as I know memchr is also SSE-optimized, so it's interesting that this is so much faster: twice as fast, actually.
>>>>
>>>> In case you don't want to turn on -O3 for some reason (I think the default in coreutils is -O2), the best version I could put together for -O2 was this:
>>>>
>>>> Improved version 2, compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m2.206s
>>>> user    0m1.827s
>>>> sys    0m0.379s
>>
>>
>>>Interesting. Thanks for the results.
>>>I use 'gcc -march=native -g -O3' locally, and with that can't see a difference in performance.
>>>
>>>What version of glibc and gcc are you using?
>>>gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.
>>>
>>>thanks,
>>>Pádraig.
>>
>>
>> Hi,
>>
>> This is with gcc 4.9.2-7 and glibc 2.19-17 on Debian amd64. The difference is still there for me when compiling with your CFLAGS. Have they improved memchr in glibc 2.20? Unfortunately, I don't think Debian has that yet.
>>
>> What CPU do you have?
>
>
> i3-2310M
>
> I was doing a very quick test with _short_ lines,
> specifically /usr/share/dict/words.
>
> Note GCC should be using builtin_memchr here so not
> hitting the function call overhead.
>
> I'll look in more detail later.


Built from coreutils and gnulib git checkouts at v8.23-149-gd95cdcc:

real    0m0.824s
real    0m0.828s
real    0m0.830s
real    0m0.831s
real    0m0.875s

After Kristoffer's change:

real    0m0.774s
real    0m0.776s
real    0m0.778s
real    0m0.779s
real    0m0.780s

I'm using an up-to-date Arch Linux with the testing repositories.

$ pacman -Q gcc glibc linux
gcc 4.9.2-4
glibc 2.21-2
linux 3.19.1-1

Built with: gcc -O3 -Ofast
CPU: AMD E1-1200

For reference, my test input had the following counts:

$ time wc test-input
1141570  8211600 49489140 test-input

--
Sami Kerola
http://www.iki.fi/kerolasa/



