qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 13/55] target/arm: Implement MVE VCLZ


From: Richard Henderson
Subject: Re: [PATCH 13/55] target/arm: Implement MVE VCLZ
Date: Thu, 10 Jun 2021 07:03:20 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 6/10/21 5:40 AM, Peter Maydell wrote:
+#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+    {                                                                   \
+        TYPE *d = vd, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            TYPE r = FN(m[H(e)]);                                       \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \

Why uint64_t and not TYPE?  Or uint32_t?

A later patch adds the mask_to_bytemask8(), so I wanted
a type that was definitely unsigned (so TYPE isn't any good)
and which was definitely big enough for 64 bits.

Hmm.  I was just concerned about the unnecessary type extension involved.

What about changing the interface. Not to return a mask as you do here, but to perform the entire merge operation. E.g.

static uint8_t mergemask1(uint8_t d, uint8_t r, uint16_t mask)
{
    return mask & 1 ? r : d;
}

static uint16_t mergemask2(uint16_t d, uint16_t r, uint16_t mask)
{
    uint16_t bmask = array_whotsit[mask & 3];
    return (d & ~bmask) | (r & bmask);
}

etc.

Or maybe with a pointer argument for D, so that the load+store is done there as well. In which case you could use QEMU_GENERIC to select the function invoked, instead of using token pasting everywhere. E.g.

static void mergemask_ub(uint8_t *d, uint8_r, uint16_t mask)
{
    if (mask & 1) {
        *d = r;
    }
}

static void mergemask_sb(int8_t *d, int8_r, uint16_t mask)
{
    mergemask_ub((uint8_t *)d, r, mask);
}

static void mergemask_uh(uint16_t *d, uint16_r, uint16_t mask)
{
    uint16_t bmask = array_whotsit[mask & 3];
    *d = (*d & ~bmask) | (r & bmask);
}

...

#define mergemask(D, R, M) \
    QEMU_GENERIC(D, (uint8_t *, mergemask_ub), \
                    (int8_t *,  mergemask_sb), \
                    ... )

BTW, now that we're at minimal gcc 7, I think we can shift to -std=gnu11 so that we can drop QEMU_GENERIC and just use _Generic, which is much easier to read than the above, and will give better error messages for missing cases. Anyway...

Which takes your boilerplate down to

+#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+    {                                                                   \
+        TYPE *d = vd, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        for (unsigned e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {     \
+            mergemask(&d[H(e)], FN(m[H(e)]), mask);                     \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }

which looks pretty tidy to me.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]