Re: [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st

qemu-riscv

From:	Max Chou
Subject:	Re: [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions
Date:	Tue, 30 Jul 2024 23:16:45 +0800
User-agent:	Mozilla Thunderbird

On 2024/7/25 2:04 PM, Richard Henderson wrote:

On 7/17/24 23:39, Max Chou wrote:
+static inline QEMU_ALWAYS_INLINE void
+vext_continus_ldst_host(CPURISCVState *env, vext_ldst_elem_fn_host *ldst_host,
+                        void *vd, uint32_t evl, uint32_t reg_start, void *host,
+                        uint32_t esz, bool is_load)
+{
+#if TARGET_BIG_ENDIAN != HOST_BIG_ENDIAN
+    for (; reg_start < evl; reg_start++, host += esz) {
+        uint32_t byte_off = reg_start * esz;
+        ldst_host(vd, byte_off, host);
+    }
+#else
+    uint32_t byte_offset = reg_start * esz;
+    uint32_t size = (evl - reg_start) * esz;
+
+    if (is_load) {
+        memcpy(vd + byte_offset, host, size);
+    } else {
+        memcpy(host, vd + byte_offset, size);
+    }
+#endif
First, TARGET_BIG_ENDIAN is always false, so this reduces to HOST_BIG_ENDIAN.
Second, even if TARGET_BIG_ENDIAN were true, this optimization would be wrong, because of the element ordering given in vector_internals.h (i.e. H1 etc).

Thanks for the suggestions.
I missed the element ordering here.
I'll fix this in v6.
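
To illustrate the element-ordering point, here is a minimal sketch, not the QEMU code. It assumes that on a big-endian host the vector register is kept as host-endian 64-bit words, so byte element i sits at byte index i ^ 7, which is my understanding of what H1 does. The names demo_h1, demo_load_bytes and demo_h1_check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for H1(): index of byte element i inside a register
 * that is stored as an array of host-endian 64-bit words (assumption). */
static size_t demo_h1(size_t i, bool host_big_endian)
{
    return host_big_endian ? (i ^ 7) : i;
}

/* Copy n byte elements from guest memory order into the register image,
 * one element at a time, applying the index adjustment. */
static void demo_load_bytes(uint8_t *vd, const uint8_t *mem, size_t n,
                            bool host_big_endian)
{
    for (size_t i = 0; i < n; i++) {
        vd[demo_h1(i, host_big_endian)] = mem[i];
    }
}

/* On a big-endian host, element 0 lands at byte 7 of the first 64-bit word,
 * so a flat memcpy would place every element in the wrong slot. */
static void demo_h1_check(void)
{
    const uint8_t mem[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    uint8_t vd[8] = { 0 };

    demo_load_bytes(vd, mem, 8, true);
    assert(vd[7] == 0 && vd[0] == 7);

    demo_load_bytes(vd, mem, 8, false);
    assert(vd[0] == 0 && vd[7] == 7);
}
```

With the index adjustment in place, the big-endian register image is byte-reversed within each 64-bit word relative to memory, which is why a plain memcpy only matches the little-endian host layout.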

Third, this can be done with C if, instead of cpp ifdef, so that you always compile-test both sides.
Fourth... what are the atomicity guarantees of RVV? I didn't immediately see anything in the RVV manual, which suggests that the atomicity is the same as individual integer loads of the same size. Because there are no atomicity guarantees for memcpy, you can only use this for byte load/store.
For big-endian bytes, you can optimize this to 64-bit little-endian operations.
Compare arm gen_sve_ldr.
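
The third and fourth points could be combined roughly as follows. This is only a sketch under stated assumptions: DEMO_HOST_BIG_ENDIAN is a hypothetical stand-in for a macro that is always defined as 0 or 1, demo_ldst_u8 is a hypothetical name, the i ^ 7 adjustment stands in for H1, and the bulk copy is restricted to byte elements.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a macro that is always defined as 0 or 1. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_HOST_BIG_ENDIAN 1
#else
#define DEMO_HOST_BIG_ENDIAN 0
#endif

/* Byte-element load/store for elements [reg_start, evl); host points at
 * the memory for element reg_start. */
static void demo_ldst_u8(uint8_t *vd, uint8_t *host, uint32_t reg_start,
                         uint32_t evl, bool is_load)
{
    /* A plain C `if` on a constant: both branches are parsed and
     * type-checked on every host, and the dead one is optimized away. */
    if (DEMO_HOST_BIG_ENDIAN) {
        for (uint32_t i = reg_start; i < evl; i++, host++) {
            if (is_load) {
                vd[i ^ 7] = *host;   /* assumed H1-style adjustment */
            } else {
                *host = vd[i ^ 7];
            }
        }
    } else {
        uint32_t size = evl - reg_start;

        if (is_load) {
            memcpy(vd + reg_start, host, size);
        } else {
            memcpy(host, vd + reg_start, size);
        }
    }
}
```

The big-endian branch here is the simple per-byte loop, not the 64-bit little-endian optimization mentioned above.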

Thanks for the suggestion.
I'll check arm gen_sve_ldr.
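
For elements wider than a byte, the atomicity concern suggests one access of element size per element in place of memcpy. A minimal sketch of that idea for 32-bit elements follows. It assumes a little-endian host, naturally aligned host addresses, and the GCC/Clang __atomic builtins; demo_ldst_u32 is a hypothetical name and this is not what gen_sve_ldr or the QEMU helpers do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load/store 32-bit elements [reg_start, evl) with one relaxed atomic
 * access per element, so no element is torn by a concurrent writer.
 * Assumes host is 4-byte aligned and the host is little-endian. */
static void demo_ldst_u32(uint32_t *vd, uint32_t *host, uint32_t reg_start,
                          uint32_t evl, bool is_load)
{
    for (uint32_t i = reg_start; i < evl; i++, host++) {
        if (is_load) {
            vd[i] = __atomic_load_n(host, __ATOMIC_RELAXED);
        } else {
            __atomic_store_n(host, vd[i], __ATOMIC_RELAXED);
        }
    }
}
```

Only the individual element accesses are atomic here; the instruction as a whole is still not atomic, which matches the reading that RVV gives the same guarantee as individual integer loads of the same size.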

r~

Current Thread

[RFC PATCH v5 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions, Max Chou, 2024/07/17
- [RFC PATCH v5 1/5] target/riscv: Set vdata.vm field for vector load/store whole register instructions, Max Chou, 2024/07/17
- [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store, Max Chou, 2024/07/17
  - Re: [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store, Richard Henderson, 2024/07/25
    - Re: [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store, Max Chou, 2024/07/30
- [RFC PATCH v5 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store, Max Chou, 2024/07/17
- [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions, Max Chou, 2024/07/17
  - Re: [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions, Richard Henderson, 2024/07/25
    - Re: [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions, Max Chou <=
- [RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance, Max Chou, 2024/07/17
  - Re: [RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance, Richard Henderson, 2024/07/25
    - Re: [RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance, Max Chou, 2024/07/30
