Re: [PATCH RFC v4] target/arm: Implement SVE2 HISTCNT, HISTSEG
From: Richard Henderson
Subject: Re: [PATCH RFC v4] target/arm: Implement SVE2 HISTCNT, HISTSEG
Date: Thu, 16 Apr 2020 09:13:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
On 4/16/20 7:42 AM, Stephen Long wrote:
> +static inline uint8_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
> +{
> +    int esz = 0;
Clearer to use MO_8.
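That is, something like

    int esz = MO_8;    /* MO_8 == 0, so the arithmetic below is unchanged */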
> +    int bits = 8 << esz;
> +    uint64_t ones = dup_const(esz, 1);
> +    uint64_t signs = ones << (bits - 1);
> +    uint64_t cmp0, cmp1;
> +
> +    cmp1 = dup_const(1, n);
Error in the esz argument here.
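For reference: the first argument of dup_const() is the log2 element size,
so passing 1 replicates n across 16-bit lanes.  Presumably the intent was
bytes:

    cmp1 = dup_const(MO_8, n);    /* copy n into each of the 8 bytes */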
> +    cmp0 = cmp1 ^ m0;
> +    cmp1 = cmp1 ^ m1;
> +    cmp0 = (cmp0 - ones) & ~cmp0;
> +    cmp1 = (cmp1 - ones) & ~cmp1;
> +    return ctpop64((cmp0 | cmp1) & signs);
> +}
Ah, well, I may have been too brief in my suggestion before. I encourage you
to have a look at the bithacks patch and understand the algorithm here -- it's
quite clever.
We cannot simply OR the two halves together, since 8 | 8 == 8 loses one from
the count of bits. So:
    cmp0 = (cmp0 - ones) & ~cmp0 & signs;
    cmp1 = (cmp1 - ones) & ~cmp1 & signs;

    /*
     * Combine the two compares in a way that the bits do
     * not overlap, and so preserves the count of set bits.
     * If the host has an efficient instruction for ctpop,
     * then ctpop(x) + ctpop(y) has the same number of
     * operations as ctpop(x | (y >> 1)). If the host does
     * not have an efficient ctpop, then we only want to
     * use it once.
     */
    return ctpop64(cmp0 | (cmp1 >> 1));
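As a standalone illustration of the overlap problem (my own sketch, not part
of the patch; it uses the GCC/Clang builtin in place of QEMU's ctpop64):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /*
         * Two sign-masked compare words as produced above: only bit 7
         * of each byte can be set.  cmp0 flags one matching byte; cmp1
         * flags two, one of them in the same byte position as cmp0's.
         */
        uint64_t cmp0 = 0x0000000000000080ull;
        uint64_t cmp1 = 0x0000000000008080ull;

        /* A plain OR merges the overlapping flags: count is 2, not 3. */
        printf("%d\n", __builtin_popcountll(cmp0 | cmp1));

        /*
         * Shifting cmp1 right by 1 moves its flags to bit 6 of each
         * byte, disjoint from cmp0's bit-7 flags, so all three survive.
         */
        printf("%d\n", __builtin_popcountll(cmp0 | (cmp1 >> 1)));
        return 0;
    }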
> +        for (j = 0; j < 64; j += 8) {
> +            uint8_t count0 = do_histseg_cnt(n0 >> j, m0, m1);
> +            out0 |= count0 << j;
> +
> +            uint8_t count1 = do_histseg_cnt(n1 >> j, m0, m1);
> +            out1 |= count1 << j;
> +        }
Wrong type for count0/count1 for shifting by e.g. 56.
You might as well just use uint64_t as the return value from do_histseg_cnt()
so that we don't get unnecessary zero-extensions from the compiler.
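Something like this, perhaps (my sketch of both fixes: uint8_t promotes to
int before the shift, so count0 << 56 is undefined, while a uint64_t return
gives the shift its full width):

    static inline uint64_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
    ...
        for (j = 0; j < 64; j += 8) {
            uint64_t count0 = do_histseg_cnt(n0 >> j, m0, m1);
            uint64_t count1 = do_histseg_cnt(n1 >> j, m0, m1);

            out0 |= count0 << j;
            out1 |= count1 << j;
        }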
r~