qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC v4] target/arm: Implement SVE2 HISTCNT, HISTSEG


From: Richard Henderson
Subject: Re: [PATCH RFC v4] target/arm: Implement SVE2 HISTCNT, HISTSEG
Date: Thu, 16 Apr 2020 09:13:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 4/16/20 7:42 AM, Stephen Long wrote:
> +static inline uint8_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
> +{
> +    int esz = 0;

Clearer to use MO_8.

> +    int bits = 8 << esz;
> +    uint64_t ones = dup_const(esz, 1);
> +    uint64_t signs = ones << (bits - 1);
> +    uint64_t cmp0, cmp1;
> +
> +    cmp1 = dup_const(1, n);

Error in the esz argument here.

> +    cmp0 = cmp1 ^ m0;
> +    cmp1 = cmp1 ^ m1;
> +    cmp0 = (cmp0 - ones) & ~cmp0;
> +    cmp1 = (cmp1 - ones) & ~cmp1;
> +    return ctpop64((cmp0 | cmp1) & signs);
> +}

Ah, well, I may have been too brief in my suggestion before.  I encourage you
to have a look at the bithacks patch and understand the algorithm here -- it's
quite clever.

We cannot simply OR the two halves together, since 8 | 8 == 8 loses one from
the count of bits.  So:

  cmp0 = (cmp0 - ones) & ~cmp0 & signs;
  cmp1 = (cmp1 - ones) & ~cmp1 & signs;

  /*
   * Combine the two compares in a way that the bits do
   * not overlap, and so preserves the count of set bits.
   * If the host has a efficient instruction for ctpop,
   * then ctpop(x) + ctpop(y) has the same number of
   * operations as ctpop(x | (y >> 1)).  If the host does
   * not have an efficient ctpop, then we only want to
   * use it once.
   */
  return ctpop64(cmp0 | (cmp1 >> 1));

> +        for (j = 0; j < 64; j += 8) {
> +            uint8_t count0 = do_histseg_cnt(n0 >> j, m0, m1);
> +            out0 |= count0 << j;
> +
> +            uint8_t count1 = do_histseg_cnt(n1 >> j, m0, m1);
> +            out1 |= count1 << j;
> +        }

Wrong type for count0/count1 for shifting by e.g. 56.

You might as well just use uint64_t as the return value from do_histseg_cnt()
so that we don't get unnecessary zero-extensions from the compiler.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]