Re: [PATCH] cksum: Implement Chorba algorithm in PCLMUL
From: Sam Russell
Subject: Re: [PATCH] cksum: Implement Chorba algorithm in PCLMUL
Date: Wed, 25 Dec 2024 10:34:13 +0100
I've increased the buffer from 64KiB to 2MiB and converted it to a ring
buffer, so we can remove the slow 32-byte reduction from every cycle.
The timings still need averaging over 5 calls, but I'm seeing a
consistent 20%+ reduction in time, made up of roughly:
- Chorba algorithm (~10%)
- Reducing fread calls (~5%)
- Removing the 32-byte reduction from each cycle (~5%)
No rush on this btw, Merry Christmas :)
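
To put a rough number on the "reducing fread calls" item: the test file in
the quoted runs below is 4294967296 bytes (4 GiB), so the buffer size
directly sets how often the buffer has to be refilled. This is only a sketch
of that arithmetic, assuming one read per full buffer; the helper name
`buffer_fills` is made up for illustration and is not taken from the patch.

```python
FILE_SIZE = 4294967296         # bytes, the size cksum reports for the test file
OLD_BUFFER = 64 * 1024         # 64 KiB, the previous buffer size
NEW_BUFFER = 2 * 1024 * 1024   # 2 MiB, the new buffer size


def buffer_fills(file_size, buffer_size):
    """Number of reads needed, rounding up for a partial last buffer."""
    return -(-file_size // buffer_size)


old_fills = buffer_fills(FILE_SIZE, OLD_BUFFER)
new_fills = buffer_fills(FILE_SIZE, NEW_BUFFER)

# Prints: 65536 2048 32
print(old_fills, new_fills, old_fills // new_fills)
```

So under that assumption the larger buffer means 32 times fewer reads over
the same 4 GiB file.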
On Wed, 25 Dec 2024 at 09:25, Sam Russell <sam.h.russell@gmail.com> wrote:
> That's interesting. I'm having issues across cfarm as they often don't
> have the coreutils dependencies and won't work with the version of clib I'm
> building against.
>
> Are you comparing the user times or the real times? IMO the user time is
> the important part as the sys part of the timing just depends on disk I/O.
> The high I/O (and the fact that we're only reading in 64KB chunks) means
> that there's going to be large variance, but I'm still seeing a consistent
> improvement over 5-10 runs.
>
> On amazon EC2 t3 (Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz)
>
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
> cksum_pclmul: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.129s
> user 0m0.422s
> sys 0m2.705s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
> cksum_pclmul: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.025s
> user 0m0.394s
> sys 0m2.630s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
> cksum_pclmul: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.705s
> user 0m0.517s
> sys 0m3.187s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
> cksum_pclmul: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.334s
> user 0m0.431s
> sys 0m2.903s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
> cksum_pclmul: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.250s
> user 0m0.420s
> sys 0m2.829s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
> cksum_pclmul_chorba: avx512 support not detected
> cksum_pclmul_chorba: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m2.888s
> user 0m0.368s
> sys 0m2.518s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
> cksum_pclmul_chorba: avx512 support not detected
> cksum_pclmul_chorba: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.032s
> user 0m0.366s
> sys 0m2.665s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
> cksum_pclmul_chorba: avx512 support not detected
> cksum_pclmul_chorba: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m2.938s
> user 0m0.347s
> sys 0m2.583s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
> cksum_pclmul_chorba: avx512 support not detected
> cksum_pclmul_chorba: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m3.148s
> user 0m0.419s
> sys 0m2.728s
> ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
> cksum_pclmul_chorba: avx512 support not detected
> cksum_pclmul_chorba: using pclmul hardware support
> 4215202376 4294967296 file
>
> real 0m2.808s
> user 0m0.344s
> sys 0m2.463s
>
> cfarm13 (Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz)
>
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
> 4215202376 4294967296 file
>
> real 0m1.103s
> user 0m0.436s
> sys 0m0.667s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
> 4215202376 4294967296 file
>
> real 0m1.320s
> user 0m0.464s
> sys 0m0.855s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
> 4215202376 4294967296 file
>
> real 0m1.641s
> user 0m0.416s
> sys 0m1.224s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
> 4215202376 4294967296 file
>
> real 0m1.714s
> user 0m0.496s
> sys 0m1.214s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
> 4215202376 4294967296 file
>
> real 0m1.107s
> user 0m0.457s
> sys 0m0.650s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
> 4215202376 4294967296 file
>
> real 0m1.091s
> user 0m0.485s
> sys 0m0.606s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
> 4215202376 4294967296 file
>
> real 0m1.083s
> user 0m0.483s
> sys 0m0.600s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
> 4215202376 4294967296 file
>
> real 0m1.102s
> user 0m0.403s
> sys 0m0.699s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
> 4215202376 4294967296 file
>
> real 0m1.081s
> user 0m0.412s
> sys 0m0.669s
> pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
> 4215202376 4294967296 file
>
> real 0m1.077s
> user 0m0.412s
> sys 0m0.665s
>
> If anyone has an i7 server I can test on, I'd be happy to get more results.
> I had another change I was working on earlier that's also a 5-10%
> improvement, which can get lost in the noise of the variance. I can combine
> them if we need a stronger improvement to consider taking this change?
>
> On Wed, 25 Dec 2024 at 00:52, Pádraig Brady <P@draigbrady.com> wrote:
>
>> On 24/12/2024 20:43, Sam Russell wrote:
>> > ah sorry, clicked on the wrong patch file, here is the real one
>> >
>> > On Tue, Dec 24, 2024, 19:36 Pádraig Brady <P@draigbrady.com> wrote:
>> >
>> > On 24/12/2024 16:03, Sam Russell wrote:
>> > > I've released a new paper here https://arxiv.org/abs/2412.16398
>> > > and this was the easiest algorithm to implement from it. It gets a
>> > > 5-20% speedup for SSE/AVX1 and diminishing returns for AVX2/AVX512
>> >
>> > Ignoring this as looks applicable to gnulib not coreutils,
>> > and I think you've already landed this in gnulib.
>>
>> Ah thanks,
>> However this is a regression on i7-5600U at least:
>>
>> $ truncate -s4G file
>>
>> $ time src/cksum --debug file
>> cksum: avx512 support not detected
>> cksum: avx2 support not detected
>> cksum: using pclmul hardware support
>> 4215202376 4294967296 file
>> real 0m1.445s
>> user 0m0.250s
>> sys 0m1.132s
>>
>> $ git am < ~/0001-cksum-Implement-Chorba-algorithm-in-PCLMUL.patch
>> $ make
>>
>> $ time src/cksum --debug file
>> cksum: avx512 support not detected
>> cksum: avx2 support not detected
>> cksum: using pclmul hardware support
>> 4215202376 4294967296 file
>> real 0m1.969s
>> user 0m0.263s
>> sys 0m1.683s
>>
>>
>> (I've run this a few times, with similar timings).
>>
>> cheers,
>> Pádraig
>>
>
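
Since the runs quoted above vary a fair amount, one way to compare them is
to average the `user` times per binary and per machine. The sketch below does
that for the five runs each on the EC2 t3 and cfarm13 hosts, with the values
copied from the transcripts. The variable and function names are chosen only
for illustration.

```python
from statistics import mean

# "user" times in seconds, copied from the quoted transcripts.
user_times = {
    "EC2 t3": {
        "cksum_pclmul": [0.422, 0.394, 0.517, 0.431, 0.420],
        "cksum_pclmul_chorba": [0.368, 0.366, 0.347, 0.419, 0.344],
    },
    "cfarm13": {
        "cksum_pclmul": [0.436, 0.464, 0.416, 0.496, 0.457],
        "cksum_pclmul_chorba": [0.485, 0.483, 0.403, 0.412, 0.412],
    },
}


def reduction_percent(baseline, candidate):
    """Percentage drop in mean time from baseline to candidate."""
    return 100.0 * (1.0 - mean(candidate) / mean(baseline))


for host, runs in user_times.items():
    base = runs["cksum_pclmul"]
    chorba = runs["cksum_pclmul_chorba"]
    print(f"{host}: pclmul {mean(base):.4f}s, chorba {mean(chorba):.4f}s, "
          f"reduction {reduction_percent(base, chorba):.1f}%")
```

On these numbers the mean user time drops by roughly 16% on the EC2 host and
roughly 3% on cfarm13, so with only five runs per binary the cfarm13
difference is small enough to be lost in the run-to-run variance.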
0001-cksum-Implement-Chorba-algorithm-in-PCLMUL.patch
Description: Binary data