Re: [PATCH v2] target/arm: Implement SVE2 FMMLA
From: Richard Henderson
Subject: Re: [PATCH v2] target/arm: Implement SVE2 FMMLA
Date: Wed, 22 Apr 2020 09:42:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
On 4/22/20 7:15 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long <address@hidden>
>
> I'm guessing endianness doesn't matter because we are writing to the
> corresponding 32-bit/64-bit in the destination register.
> ---
> target/arm/cpu.h | 10 +++++++++
> target/arm/helper-sve.h | 3 +++
> target/arm/sve.decode | 4 ++++
> target/arm/sve_helper.c | 44 ++++++++++++++++++++++++++++++++++++++
> target/arm/translate-sve.c | 29 +++++++++++++++++++++++++
> 5 files changed, 90 insertions(+)
Endianness does matter for the 32-bit case, because we are writing 32-bit
elements into a host-endian 64-bit quantity. I was being over-brief in my
earlier reply.
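To make the host-endian point concrete, here is a minimal standalone sketch. It assumes, as QEMU does for SVE registers, that a vector is held as an array of host-endian uint64_t words, so 32-bit element i has to be stored at a swizzled position on a big-endian host. The helpers `h4_index()`, `set_elem32()` and `get_elem32()` are hypothetical illustrations; in QEMU itself this job is done by the H4() index macro, which is resolved at compile time rather than with the runtime probe used here.

```c
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe (hypothetical; QEMU decides this at build time). */
static int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}

/*
 * Hypothetical analogue of QEMU's H4(): map architectural 32-bit element
 * index i to its 32-bit slot in memory when the vector is stored as an
 * array of host-endian uint64_t words.  On a big-endian host the two
 * 32-bit halves of each 64-bit word are swapped, hence i ^ 1.
 */
static size_t h4_index(size_t i)
{
    return host_is_big_endian() ? (i ^ 1) : i;
}

/* Store 32-bit element i; element 0 lands in bits [31:0] of vec[0]. */
static void set_elem32(uint64_t *vec, size_t i, uint32_t val)
{
    memcpy((unsigned char *)vec + 4 * h4_index(i), &val, sizeof(val));
}

/* Load 32-bit element i using the same mapping. */
static uint32_t get_elem32(const uint64_t *vec, size_t i)
{
    uint32_t val;

    memcpy(&val, (const unsigned char *)vec + 4 * h4_index(i), sizeof(val));
    return val;
}
```

By contrast, a single memcpy of four 32-bit results into d, as in the quoted patch, puts results[0] into bits [63:32] of the first word on a big-endian host, so element order within each 64-bit word comes out swapped.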
> + TYPE p0, p1, results[4]; \
> + \
> + /* i = 0, j = 0 */ \
> + p0 = MUL(n00, m00, status); \
> + p1 = MUL(n01, m01, status); \
> + results[0] = ADD(a[0], ADD(p0, p1, status), status); \
> + \
> + /* i = 0, j = 1 */ \
> + p0 = MUL(n00, m10, status); \
> + p1 = MUL(n01, m11, status); \
> + results[1] = ADD(a[1], ADD(p0, p1, status), status); \
> + \
> + /* i = 1, j = 0 */ \
> + p0 = MUL(n10, m00, status); \
> + p1 = MUL(n11, m01, status); \
> + results[2] = ADD(a[2], ADD(p0, p1, status), status); \
> + \
> + /* i = 1, j = 1 */ \
> + p0 = MUL(n10, m10, status); \
> + p1 = MUL(n11, m11, status); \
> + results[3] = ADD(a[3], ADD(p0, p1, status), status); \
> + \
> + memcpy(d, results, sizeof(TYPE) * 4); \
There's no need for the result array -- we have already read the inputs, so we
can write back the result straight away.
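A minimal sketch of the suggested shape follows, written for 64-bit elements only so that no element swizzling is involved. Plain host `double` arithmetic stands in for the patch's MUL/ADD macros and float_status, so rounding and exception behaviour will not match a real softfloat helper. The function name `fmmla_segment_f64` and the row-major 2x2 layout of d, a, n and m are assumptions inferred from the index comments in the quoted macro.

```c
/*
 * Hypothetical sketch of one 2x2 matrix multiply-accumulate segment,
 * computing d[i][j] = a[i][j] + n[i][0]*m[j][0] + n[i][1]*m[j][1]
 * with each 2x2 matrix stored row-major as four consecutive elements.
 * d may alias a, n or m: the n and m inputs are loaded into locals
 * before any store, and d[k] only reads a[k].
 */
static void fmmla_segment_f64(double *d, const double *a,
                              const double *n, const double *m)
{
    /* Read the multiplicand inputs up front. */
    double n00 = n[0], n01 = n[1], n10 = n[2], n11 = n[3];
    double m00 = m[0], m01 = m[1], m10 = m[2], m11 = m[3];

    /* No temporary results array: write each element straight back. */
    d[0] = a[0] + (n00 * m00 + n01 * m01);   /* i = 0, j = 0 */
    d[1] = a[1] + (n00 * m10 + n01 * m11);   /* i = 0, j = 1 */
    d[2] = a[2] + (n10 * m00 + n11 * m01);   /* i = 1, j = 0 */
    d[3] = a[3] + (n10 * m10 + n11 * m11);   /* i = 1, j = 1 */
}
```

Storing directly into d is safe even when the destination is also the accumulator, since each d[k] is written only after its own a[k] has been read and no later element reads an earlier slot of a.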
r~