[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 13/55] target/arm: Implement MVE VCLZ
From: |
Richard Henderson |
Subject: |
Re: [PATCH 13/55] target/arm: Implement MVE VCLZ |
Date: |
Thu, 10 Jun 2021 07:03:20 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 |
On 6/10/21 5:40 AM, Peter Maydell wrote:
+#define DO_1OP(OP, ESIZE, TYPE, H, FN) \
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
+ { \
+ TYPE *d = vd, *m = vm; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ TYPE r = FN(m[H(e)]); \
+ uint64_t bytemask = mask_to_bytemask##ESIZE(mask); \
Why uint64_t and not TYPE? Or uint32_t?
A later patch adds the mask_to_bytemask8(), so I wanted
a type that was definitely unsigned (so TYPE isn't any good)
and which was definitely big enough for 64 bits.
Hmm. I was just concerned about the unnecessary type extension involved.
What about changing the interface. Not to return a mask as you do here, but to
perform the entire merge operation. E.g.
static uint8_t mergemask1(uint8_t d, uint8_t r, uint16_t mask)
{
return mask & 1 ? r : d;
}
static uint16_t mergemask2(uint16_t d, uint16_t r, uint16_t mask)
{
uint16_t bmask = array_whotsit[mask & 3];
return (d & ~bmask) | (r & bmask);
}
etc.
Or maybe with a pointer argument for D, so that the load+store is done there as
well. In which case you could use QEMU_GENERIC to select the function invoked,
instead of using token pasting everywhere. E.g.
static void mergemask_ub(uint8_t *d, uint8_r, uint16_t mask)
{
if (mask & 1) {
*d = r;
}
}
static void mergemask_sb(int8_t *d, int8_r, uint16_t mask)
{
mergemask_ub((uint8_t *)d, r, mask);
}
static void mergemask_uh(uint16_t *d, uint16_r, uint16_t mask)
{
uint16_t bmask = array_whotsit[mask & 3];
*d = (*d & ~bmask) | (r & bmask);
}
...
#define mergemask(D, R, M) \
QEMU_GENERIC(D, (uint8_t *, mergemask_ub), \
(int8_t *, mergemask_sb), \
... )
BTW, now that we're at minimal gcc 7, I think we can shift to -std=gnu11 so
that we can drop QEMU_GENERIC and just use _Generic, which is much easier to
read than the above, and will give better error messages for missing cases.
Anyway...
Which takes your boilerplate down to
+#define DO_1OP(OP, ESIZE, TYPE, H, FN) \
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
+ { \
+ TYPE *d = vd, *m = vm; \
+ uint16_t mask = mve_element_mask(env); \
+ for (unsigned e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ mergemask(&d[H(e)], FN(m[H(e)]), mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
which looks pretty tidy to me.
r~
[PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns, Peter Maydell, 2021/06/07
[PATCH 14/55] target/arm: Implement MVE VCLS, Peter Maydell, 2021/06/07
[PATCH 21/55] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR, Peter Maydell, 2021/06/07
[PATCH 20/55] target/arm: Implement MVE VDUP, Peter Maydell, 2021/06/07