qemu-devel



From: Peter Maydell
Subject: Re: [PATCH v6 59/82] target/arm: Implement SVE mixed sign dot product (indexed)
Date: Thu, 13 May 2021 13:57:59 +0100

On Fri, 30 Apr 2021 at 22:04, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h           |  5 +++
>  target/arm/helper.h        |  4 +++
>  target/arm/sve.decode      |  4 +++
>  target/arm/translate-sve.c | 16 +++++++++
>  target/arm/vec_helper.c    | 68 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 97 insertions(+)

> diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
> index 8b7269d8e1..98b707f4f5 100644
> --- a/target/arm/vec_helper.c
> +++ b/target/arm/vec_helper.c
> @@ -677,6 +677,74 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm,
>      clear_tail(d, opr_sz, simd_maxsz(desc));
>  }
>
> +void HELPER(gvec_sudot_idx_b)(void *vd, void *vn, void *vm,
> +                              void *va, uint32_t desc)
> +{
> +    intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
> +    intptr_t index = simd_data(desc);
> +    int32_t *d = vd, *a = va;
> +    int8_t *n = vn;
> +    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
> +
> +    /*
> +     * Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
> +     * Otherwise opr_sz is a multiple of 16.
> +     */

These are only used by SVE, aren't they? I guess maintaining
the parallelism with the helpers that are shared is worthwhile.

> +    segend = MIN(4, opr_sz_4);
> +    i = 0;
> +    do {
> +        uint8_t m0 = m_indexed[i * 4 + 0];
> +        uint8_t m1 = m_indexed[i * 4 + 1];
> +        uint8_t m2 = m_indexed[i * 4 + 2];
> +        uint8_t m3 = m_indexed[i * 4 + 3];
> +
> +        do {
> +            d[i] = (a[i] +
> +                    n[i * 4 + 0] * m0 +
> +                    n[i * 4 + 1] * m1 +
> +                    n[i * 4 + 2] * m2 +
> +                    n[i * 4 + 3] * m3);
> +        } while (++i < segend);
> +        segend = i + 4;
> +    } while (i < opr_sz_4);
> +
> +    clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_usdot_idx_b)(void *vd, void *vn, void *vm,
> +                              void *va, uint32_t desc)
> +{
> +    intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
> +    intptr_t index = simd_data(desc);
> +    uint32_t *d = vd, *a = va;
> +    uint8_t *n = vn;
> +    int8_t *m_indexed = (int8_t *)vm + index * 4;
> +
> +    /*
> +     * Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
> +     * Otherwise opr_sz is a multiple of 16.
> +     */
> +    segend = MIN(4, opr_sz_4);
> +    i = 0;
> +    do {
> +        int8_t m0 = m_indexed[i * 4 + 0];
> +        int8_t m1 = m_indexed[i * 4 + 1];
> +        int8_t m2 = m_indexed[i * 4 + 2];
> +        int8_t m3 = m_indexed[i * 4 + 3];
> +
> +        do {
> +            d[i] = (a[i] +
> +                    n[i * 4 + 0] * m0 +
> +                    n[i * 4 + 1] * m1 +
> +                    n[i * 4 + 2] * m2 +
> +                    n[i * 4 + 3] * m3);
> +        } while (++i < segend);
> +        segend = i + 4;
> +    } while (i < opr_sz_4);
> +
> +    clear_tail(d, opr_sz, simd_maxsz(desc));
> +}

Maybe we should macroify this, as unless I'm misreading them
gvec_sdot_idx_b, gvec_udot_idx_b, gvec_sudot_idx_b and gvec_usdot_idx_b
only differ in the types of the index and the data.

But if you'd rather not you can have a
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
for this version.

thanks
-- PMM


