Re: [PATCH] Speedup wc -l


From: Sami Kerola
Subject: Re: [PATCH] Speedup wc -l
Date: Sun, 15 Mar 2015 23:11:06 +0000

On 15 March 2015 at 22:18, Pádraig Brady <address@hidden> wrote:
> On 15/03/15 21:14, Kristoffer Brånemyr wrote:
>>
>>
>>
>>
>>>On Sunday, 15 March 2015 at 20:13, Pádraig Brady <address@hidden> wrote:
>>>
>>>
>>>>On 15/03/15 08:33, Kristoffer Brånemyr wrote:
>>>>
>>>> Hi,
>>>>
>>>> I did some tests and found out you can actually beat memchr with a simple
>>>> loop. Tests were done on an Intel Xeon E3-1231v3 (4*3.4GHz), on a 4GB
>>>> file that was already cached in memory. Benchmarking was done simply
>>>> with the 'time' command. I don't know how this code would run on other
>>>> architectures, but I guess you could put it in an #ifdef?
>>>>
>>>> Coreutils 8.23 version, compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m3.126s
>>>> user    0m2.699s
>>>> sys    0m0.429s
>>>>
>>>>
>>>> Improved version compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m2.857s
>>>> user    0m2.461s
>>>> sys    0m0.396s
>>>>
>>>> Improved version compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m1.518s
>>>> user    0m1.157s
>>>> sys    0m0.361s
>>>>
>>>> I studied the generated assembly, and with -O3 gcc generates some fancy SSE
>>>> code, getting some nice speedups. memchr is also SSE optimized as far as I
>>>> know, so it's interesting that this is so much faster: twice as fast,
>>>> actually.
>>>>
>>>> In case you don't like turning -O3 on for some reason (the default in
>>>> coreutils is -O2, I think), the best version I could put together for -O2
>>>> was this:
>>>>
>>>> Improved version 2, compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real    0m2.206s
>>>> user    0m1.827s
>>>> sys    0m0.379s
>>
>>
>>>Interesting. Thanks for the results.
>>>I use 'gcc -march=native -g -O3' locally, and with that can't see a 
>>>difference in performance.
>>>
>>>What version of glibc and gcc are you using?
>>>gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.
>>>
>>>thanks,
>>>Pádraig.
>>
>>
>> Hi,
>>
>> This is with gcc 4.9.2-7 and glibc 2.19-17 on Debian amd64. The difference
>> is still there for me when compiling with your CFLAGS. Have they improved
>> memchr in glibc 2.20? I don't think Debian has that version yet,
>> unfortunately.
>>
>> What cpu do you have?
>
>
> i3-2310M
>
> I was doing a very quick test with _short_ lines,
> specifically /usr/share/dict/words.
>
> Note GCC should be using __builtin_memchr here, so it should not be
> hitting the function call overhead.
>
> I'll look in more detail later.

Built from coreutils and gnulib git checkouts at v8.23-149-gd95cdcc:

real    0m0.824s
real    0m0.828s
real    0m0.830s
real    0m0.831s
real    0m0.875s

After Kristoffer's change:

real    0m0.774s
real    0m0.776s
real    0m0.778s
real    0m0.779s
real    0m0.780s

I'm using an up-to-date Arch Linux (testing).

$ pacman -Q gcc glibc linux
gcc 4.9.2-4
glibc 2.21-2
linux 3.19.1-1

Built with: gcc -O3 -Ofast
CPU: AMD E1-1200

For reference, my test input had the following data:

$ time wc test-input
 1141570  8211600 49489140 test-input
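
For anyone following along, the two approaches being compared can be
sketched roughly as below. This is only an illustration, not Kristoffer's
patch and not the code in wc.c; the function names (count_lines_memchr,
count_lines_loop, self_check) are made up for this sketch. The first
function searches for each newline with memchr, the second is the kind of
plain byte loop that gcc may auto-vectorize at -O3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes by jumping from one newline to the next with memchr.
   (Hypothetical helper, for illustration only.)  */
size_t count_lines_memchr(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;
    size_t lines = 0;

    /* memchr returns NULL once no newline is left in [p, end).  */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        ++p;        /* step past the newline just found */
        ++lines;
    }
    return lines;
}

/* Count '\n' bytes with a simple loop over every byte.
   The comparison yields 0 or 1, so the loop body has no branch.
   (Hypothetical helper, for illustration only.)  */
size_t count_lines_loop(const char *buf, size_t len)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; i++)
        lines += (buf[i] == '\n');
    return lines;
}

/* Sanity check: both approaches must agree on a small sample.  */
void self_check(void)
{
    static const char sample[] = "one\ntwo\n\nthree";
    size_t len = sizeof sample - 1;   /* exclude the trailing NUL */

    assert(count_lines_memchr(sample, len) == 3);
    assert(count_lines_loop(sample, len) == 3);
}
```

Which of the two is faster presumably depends on line length, compiler
flags, and the glibc memchr in use, which would fit the differing results
reported in this thread.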

-- 
Sami Kerola
http://www.iki.fi/kerolasa/


