[Stable-9.1.2 54/58] target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)

qemu-stable

From:	Michael Tokarev
Subject:	[Stable-9.1.2 54/58] target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
Date:	Sat, 9 Nov 2024 15:08:55 +0300

From: Peter Maydell <peter.maydell@linaro.org>

Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong.  Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes.  The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.
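
The effect of the two step values can be seen without computing any dot
products. The following is a minimal sketch, not QEMU code:
trace_segends() is a hypothetical helper that mirrors only the loop
structure of the helper macro and records each value of segend.
Its elem_size parameter stands in for sizeof(TYPED), and step for the
amount added to i after each inner loop; both names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Hypothetical helper (not part of QEMU): run the same nested
 * do/while loops as the indexed dot-product helper, but only record
 * the exclusive, 0-based end index of each segment.
 *
 *   opr_sz    - vector length in bytes
 *   elem_size - stands in for sizeof(TYPED): 4 or 8
 *   step      - value added to i to form the next segend
 *
 * Writes up to max_ends segment ends into ends[] and returns the
 * number of segments the loop visited.
 */
size_t trace_segends(intptr_t opr_sz, intptr_t elem_size, intptr_t step,
                     intptr_t *ends, size_t max_ends)
{
    assert(elem_size > 0 && 16 % elem_size == 0);

    intptr_t i = 0;
    intptr_t opr_sz_n = opr_sz / elem_size;
    intptr_t segend = MIN(16 / elem_size, opr_sz_n);
    size_t count = 0;

    do {
        if (count < max_ends) {
            ends[count] = segend;
        }
        count++;
        do {
            /* The real helper computes d[i] here. */
        } while (++i < segend);
        segend = i + step;
    } while (i < opr_sz_n);

    return count;
}
```

For a 512-bit vector (opr_sz = 64) of 64-bit elements, a step of 4
gives segment ends 2, 6, 10, so the loop runs past opr_sz_n = 8, while a
step of 16 / 8 = 2 gives 2, 4, 6, 8. With 32-bit elements both steps
are 4, so the trace is the same either way.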

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e. if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector.  Vector
lengths greater than 256 bits see wrong answers.  The instructions
that produce 32-bit results behave correctly, because for a 32-bit
element size the hard-coded step of 4 happens to equal
16 / sizeof(TYPED).

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
(cherry picked from commit e6b2fa1b81ac6b05c4397237c846a295a9857920)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 98604d170f..7cbd1b0f43 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -836,6 +836,13 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
 {                                                                         \
     intptr_t i = 0, opr_sz = simd_oprsz(desc);                            \
     intptr_t opr_sz_n = opr_sz / sizeof(TYPED);                           \
+    /*                                                                    \
+     * Special case: opr_sz == 8 from AA64/AA32 advsimd means the         \
+     * first iteration might not be a full 16 byte segment. But           \
+     * for vector lengths beyond that this must be SVE and we know        \
+     * opr_sz is a multiple of 16, so we need not clamp segend            \
+     * to opr_sz_n when we advance it at the end of the loop.             \
+     */                                                                   \
     intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);                  \
     intptr_t index = simd_data(desc);                                     \
     TYPED *d = vd, *a = va;                                               \
@@ -853,7 +860,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
                     n[i * 4 + 2] * m2 +                                   \
                     n[i * 4 + 3] * m3);                                   \
         } while (++i < segend);                                           \
-        segend = i + 4;                                                   \
+        segend = i + (16 / sizeof(TYPED));                                \
     } while (i < opr_sz_n);                                               \
     clear_tail(d, opr_sz, simd_maxsz(desc));                              \
 }
-- 
2.39.5
