qemu-arm

Re: [PATCH v2] target/arm: Implement SVE2 FMMLA


From: Richard Henderson
Subject: Re: [PATCH v2] target/arm: Implement SVE2 FMMLA
Date: Wed, 22 Apr 2020 09:42:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 4/22/20 7:15 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long <address@hidden>
> 
> I'm guessing endianness doesn't matter because we are writing to the
> corresponding 32-bit/64-bit in the destination register.
> ---
>  target/arm/cpu.h           | 10 +++++++++
>  target/arm/helper-sve.h    |  3 +++
>  target/arm/sve.decode      |  4 ++++
>  target/arm/sve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 29 +++++++++++++++++++++++++
>  5 files changed, 90 insertions(+)

Endianness does matter for 32-bit, as we are writing into a host-endian 64-bit
quantity.  I was being over-brief in my earlier reply.
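
To illustrate the point, here is a minimal, self-contained sketch and not the
patch itself. It assumes a vector register stored as an array of host-endian
uint64_t, with 32-bit element 0 in the least significant half of each 64-bit
chunk. The names host_is_big_endian, elem32_index, read_elem32 and
write_elem32 are hypothetical; QEMU's helpers use index-adjusting macros such
as H4() for the same purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical runtime probe; real code would use a compile-time check. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Map architectural 32-bit element i to its slot in host memory.
 * On a big-endian host the two 32-bit halves of each 64-bit chunk
 * sit in the opposite order, so the low index bit is flipped.
 */
static unsigned elem32_index(unsigned i)
{
    return host_is_big_endian() ? (i ^ 1) : i;
}

/* Read architectural 32-bit element i from a host-endian uint64_t array. */
static uint32_t read_elem32(const uint64_t *reg, unsigned i)
{
    uint32_t v;

    memcpy(&v, (const uint8_t *)reg + 4 * elem32_index(i), sizeof(v));
    return v;
}

/* Write architectural 32-bit element i into a host-endian uint64_t array. */
static void write_elem32(uint64_t *reg, unsigned i, uint32_t v)
{
    memcpy((uint8_t *)reg + 4 * elem32_index(i), &v, sizeof(v));
}
```

For 64-bit elements no adjustment is needed, because each element occupies a
whole host-endian uint64_t.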


> +        TYPE p0, p1, results[4];                                            \
> +                                                                            \
> +        /* i = 0, j = 0 */                                                  \
> +        p0 = MUL(n00, m00, status);                                         \
> +        p1 = MUL(n01, m01, status);                                         \
> +        results[0] = ADD(a[0], ADD(p0, p1, status), status);                \
> +                                                                            \
> +        /* i = 0, j = 1 */                                                  \
> +        p0 = MUL(n00, m10, status);                                         \
> +        p1 = MUL(n01, m11, status);                                         \
> +        results[1] = ADD(a[1], ADD(p0, p1, status), status);                \
> +                                                                            \
> +        /* i = 1, j = 0 */                                                  \
> +        p0 = MUL(n10, m00, status);                                         \
> +        p1 = MUL(n11, m01, status);                                         \
> +        results[2] = ADD(a[2], ADD(p0, p1, status), status);                \
> +                                                                            \
> +        /* i = 1, j = 1 */                                                  \
> +        p0 = MUL(n10, m10, status);                                         \
> +        p1 = MUL(n11, m11, status);                                         \
> +        results[3] = ADD(a[3], ADD(p0, p1, status), status);                \
> +                                                                            \
> +        memcpy(d, results, sizeof(TYPE) * 4);                               \

There's no need for the result array -- we have already read the inputs, so we
can write back the result straight away.
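
A rough sketch of that shape, under stated assumptions: plain C float
arithmetic stands in for the MUL/ADD softfloat macros and the status argument,
and the function name fmmla_2x2_sketch is hypothetical. Every n and m operand
is read into a local first, so each result can be stored to d directly even if
d overlaps an input.

```c
#include <assert.h>

/*
 * Hypothetical plain-float stand-in for one 2x2 segment:
 * d[i][j] = a[i][j] + n[i][0] * m[j][0] + n[i][1] * m[j][1]
 */
static void fmmla_2x2_sketch(float *d, const float *a,
                             const float *n, const float *m)
{
    /* Read all multiplicands up front; d may alias n or m. */
    const float n00 = n[0], n01 = n[1], n10 = n[2], n11 = n[3];
    const float m00 = m[0], m01 = m[1], m10 = m[2], m11 = m[3];

    /* a[k] is read before d[k] is written, so d may also alias a. */
    d[0] = a[0] + (n00 * m00 + n01 * m01);   /* i = 0, j = 0 */
    d[1] = a[1] + (n00 * m10 + n01 * m11);   /* i = 0, j = 1 */
    d[2] = a[2] + (n10 * m00 + n11 * m01);   /* i = 1, j = 0 */
    d[3] = a[3] + (n10 * m10 + n11 * m11);   /* i = 1, j = 1 */
}
```

This drops both the temporary array and the memcpy while keeping the same
order of operations as the quoted macro.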


r~


