While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute two 32-bit outputs at the same time;
compute each output element individually.
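A minimal C model of the indexing may make the layout clearer. It is a sketch under stated assumptions, not the patched sme_helper.c code. The vector length is fixed at a small hypothetical DIM (32-bit elements per vector). Predication is omitted. The names dot4, smopa_s_model and check_model are illustrative only. Output ZA[row][col] is the 4-way dot product of byte group `row` of Zn with byte group `col` of Zm. Pairing one 64-bit chunk of each input lane by lane therefore yields the sums for (2r, 2c) and (2r+1, 2c+1). Those two cells are on different rows, not adjacent in one row.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model of a signed 8-bit -> 32-bit outer product
 * accumulate. DIM is the number of 32-bit elements per vector, fixed small
 * here for illustration; predication is not modelled.
 */
#define DIM 4

/* Signed dot product of four consecutive bytes from each input. */
static int32_t dot4(const int8_t *a, const int8_t *b)
{
    int32_t sum = 0;
    for (int k = 0; k < 4; ++k) {
        sum += (int32_t)a[k] * (int32_t)b[k];
    }
    return sum;
}

/*
 * za[row][col] += dot4(zn byte group `row`, zm byte group `col`).
 * Each 32-bit output is computed on its own: the two sums obtainable from
 * one 64-bit chunk of each input belong at za[2r][2c] and za[2r+1][2c+1].
 */
static void smopa_s_model(int32_t za[DIM][DIM],
                          const int8_t zn[4 * DIM],
                          const int8_t zm[4 * DIM])
{
    for (int row = 0; row < DIM; ++row) {
        for (int col = 0; col < DIM; ++col) {
            za[row][col] += dot4(&zn[4 * row], &zm[4 * col]);
        }
    }
}

/* Small self-check of the layout described above. */
static void check_model(void)
{
    int32_t za[DIM][DIM] = {{0}};
    /* zn groups 0,1 hold 1s and 2s; zm groups 0,1 hold 1s and 3s. */
    const int8_t zn[4 * DIM] = {1, 1, 1, 1, 2, 2, 2, 2};
    const int8_t zm[4 * DIM] = {1, 1, 1, 1, 3, 3, 3, 3};

    smopa_s_model(za, zn, zm);
    assert(za[0][0] == 4 && za[0][1] == 12);   /* row 0 uses zn group 0 */
    assert(za[1][0] == 8 && za[1][1] == 24);   /* row 1 uses zn group 1 */
    /* Lane-wise pairing of the first 8 bytes gives {4, 24}: diagonal cells,
       not the adjacent pair za[0][0], za[0][1]. */
    assert(za[0][1] != 24);
}
```

The model deliberately loops over single output cells. That is the shape the fix moves the helper towards, at the cost of not processing 64 bits of output per iteration.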
Cc: qemu-stable@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v2: Fixed endian issue; double-checked on s390x.
---
target/arm/tcg/sme_helper.c | 77 ++++++++++++++++++-------------
tests/tcg/aarch64/sme-smopa-1.c | 47 +++++++++++++++++++
tests/tcg/aarch64/sme-smopa-2.c | 54 ++++++++++++++++++++++
tests/tcg/aarch64/Makefile.target | 2 +-
4 files changed, 147 insertions(+), 33 deletions(-)
create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
create mode 100644 tests/tcg/aarch64/sme-smopa-2.c