Re: [RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector lo

qemu-devel

From:	Paolo Savini
Subject:	Re: [RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.
Date:	Tue, 10 Sep 2024 12:20:16 +0100
User-agent:	Mozilla Thunderbird

Thanks for the feedback, Richard. I'm working on the endianness. Could you please give me more details about the atomicity issues you are referring to?


Best wishes

Paolo

On 7/27/24 08:15, Richard Henderson wrote:

On 7/18/24 01:30, Paolo Savini wrote:
This patch optimizes the emulation of unit-stride load/store RVV instructions when the data being loaded/stored per iteration amounts to 64 bytes or more. The optimization consists of calling __builtin_memcpy on chunks of data of 128 and 256 bytes between the memory address of the simulated vector register and the destination memory address and vice versa.
This is done only if we have direct access to the RAM of the host machine.
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
---
  target/riscv/vector_helper.c | 17 ++++++++++++++++-
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 4b444c6bc5..7674972784 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -486,7 +486,22 @@ vext_group_ldst_host(CPURISCVState *env, void *vd, uint32_t byte_end,
      }
        fn = fns[is_load][group_size];
-    fn(vd, byte_offset, host + byte_offset);
+
+    if (byte_offset + 32 < byte_end) {
+      group_size = MO_256;
+      if (is_load)
+        __builtin_memcpy((uint8_t *)(vd + byte_offset), (uint8_t *)(host + byte_offset), 32);
+      else
+        __builtin_memcpy((uint8_t *)(host + byte_offset), (uint8_t *)(vd + byte_offset), 32);
+    } else if (byte_offset + 16 < byte_end) {
+      group_size = MO_128;
+      if (is_load)
+        __builtin_memcpy((uint8_t *)(vd + byte_offset), (uint8_t *)(host + byte_offset), 16);
+      else
+        __builtin_memcpy((uint8_t *)(host + byte_offset), (uint8_t *)(vd + byte_offset), 16);
+    } else {
+      fn(vd, byte_offset, host + byte_offset);
+    }
This will not work for big-endian hosts.
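
To illustrate the big-endian remark: as far as I understand QEMU's RISC-V vector helpers, the register file is kept as host-endian 64-bit words and byte elements are addressed through the H1() macro, which XORs the index with 7 on big-endian hosts. The sketch below is not QEMU code. The names h1(), load_bytes() and memcpy_matches() are hypothetical, and the host layout is simulated with a flag so the example behaves the same on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for QEMU's H1() byte-index adjustment: on a
 * big-endian host, byte element i sits at index i ^ 7 within each
 * 64-bit word of the register. The flag simulates either layout.
 */
static size_t h1(size_t i, int host_big_endian)
{
    return host_big_endian ? (i ^ 7) : i;
}

/* Element-wise byte load: put guest byte i where the register expects it. */
static void load_bytes(uint8_t *vd, const uint8_t *host, size_t n,
                       int host_big_endian)
{
    assert(n % 8 == 0); /* i ^ 7 stays in bounds only for whole 64-bit words */
    for (size_t i = 0; i < n; i++) {
        vd[h1(i, host_big_endian)] = host[i];
    }
}

/* Returns 1 if a raw memcpy yields the same register image, 0 otherwise. */
static int memcpy_matches(int host_big_endian)
{
    uint8_t guest[16], by_elem[16], by_memcpy[16];

    for (size_t i = 0; i < sizeof(guest); i++) {
        guest[i] = (uint8_t)i;
    }
    load_bytes(by_elem, guest, sizeof(guest), host_big_endian);
    memcpy(by_memcpy, guest, sizeof(guest));
    return memcmp(by_elem, by_memcpy, sizeof(guest)) == 0;
}
```

With this model, memcpy_matches(0) returns 1 and memcpy_matches(1) returns 0, which suggests the raw copy would have to be limited to little-endian hosts or replaced by a copy that accounts for the index adjustment.
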
This may have atomicity issues, depending on the spec, the compiler options, and the host capabilities.
r~
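
The thread does not spell out which atomicity issues are meant, so the following is only one possible reading. memcpy makes no promise about the width of the accesses it performs, so another thread could in principle observe a partially written element, whereas a per-element helper performs one access per element. A minimal sketch of an element-wise copy for 64-bit elements, assuming GCC/Clang __atomic builtins and naturally aligned buffers (copy_u64_elems() is a hypothetical name, not a QEMU function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: copy n_elems 64-bit elements with
 * one relaxed atomic load and one relaxed atomic store per element, so
 * each element is read and written as a single unit. A plain memcpy
 * gives no such guarantee about its access size.
 */
static void copy_u64_elems(uint64_t *dst, uint64_t *src, size_t n_elems)
{
    /* Single-copy atomicity assumes naturally aligned accesses. */
    assert(((uintptr_t)dst % sizeof(uint64_t)) == 0);
    assert(((uintptr_t)src % sizeof(uint64_t)) == 0);

    for (size_t i = 0; i < n_elems; i++) {
        uint64_t v = __atomic_load_n(&src[i], __ATOMIC_RELAXED);
        __atomic_store_n(&dst[i], v, __ATOMIC_RELAXED);
    }
}
```

Whether the RVV specification actually requires element-level atomicity here, and whether the host can provide it for the element size in question, is the open question in this thread; on hosts without native 64-bit atomics these builtins may fall back to library calls.
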
