From: Paolo Savini
Subject: [RFC 0/1 v3] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.
Date: Wed, 22 Jan 2025 16:49:04 +0000
Previous versions:
- RFC v1:
https://lore.kernel.org/all/20241218170840.1090473-1-paolo.savini@embecosm.com/
- RFC v2:
https://lore.kernel.org/all/20241220153834.16302-1-paolo.savini@embecosm.com/
Thanks Max for the feedback here:
https://lore.kernel.org/all/258795e9-4e97-4cd7-949f-24e88d24f25e@sifive.com/
The previous version had the issue that calls to tcg_gen_qemu_[ld/st]_i128 and
tcg_gen_[ld/st]_i128 did not generate 128-bit loads and stores but pairs of
64-bit loads/stores. This meant that if the second load/store of a pair
trapped, we were not able to increment vstart in ldst_whole_trans by the number
of elements processed by the first 64-bit load/store.
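To make the vstart problem concrete, here is a small toy model in C. It is not
QEMU code: the function name and its parameters are made up for illustration,
and it only assumes that progress can be recorded after each chunk that the
translator explicitly generates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (hypothetical, not taken from the patch): walk total_bytes of
 * a vector register group in chunks of chunk_bytes. Progress is recorded
 * only after a whole chunk completes. fault_at is the byte offset of the
 * faulting access; pass a value >= total_bytes for "no fault".
 * Returns the number of elements (elem_bytes each) recorded as done.
 */
static uint32_t elems_recorded(uint32_t total_bytes, uint32_t chunk_bytes,
                               uint32_t elem_bytes, uint32_t fault_at)
{
    assert(chunk_bytes != 0 && elem_bytes != 0);
    assert(chunk_bytes % elem_bytes == 0);
    assert(total_bytes % chunk_bytes == 0);

    uint32_t done = 0;
    for (uint32_t off = 0; off < total_bytes; off += chunk_bytes) {
        if (fault_at >= off && fault_at < off + chunk_bytes) {
            /* Trap: anything done inside this chunk is not recorded. */
            return done;
        }
        done += chunk_bytes / elem_bytes;
    }
    return done;
}
```

With 32-bit elements and a fault at byte 8, a single 16-byte chunk records no
progress (elems_recorded(16, 16, 4, 8) gives 0), while two explicit 8-byte
chunks record the two elements of the first half (elems_recorded(16, 8, 4, 8)
gives 2).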
I propose here the following fixes:
- Split the emulation of whole register loads/stores into smaller sizes:
we generate at best pairs of 64-bit loads/stores anyway, so we would rather
generate 64-bit load/store operations directly and update vstart
accordingly, instead of requesting 128-bit loads and stores that will be
split under the hood.
- Emulate whole register loads/stores in 32-bit blocks on hosts with 32-bit
registers: again, this avoids the load or store we generate being split
without us being able to set vstart correctly if a trap happens.
- Don't generate 32-bit loads/stores but fall back to the helper function
if the host has 32-bit registers and we are loading/storing 64-bit vector
elements. This prevents a trap from stopping the execution mid-element.
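The three rules above can be summarised with the following sketch. It is only
an illustration under simplifying assumptions: the enum, the function names and
the parameters are hypothetical, and the additional host-specific conditions
that also select the helper are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels; the numeric value is the access width in bits. */
enum emit_kind { EMIT_HELPER = 0, EMIT_TCG_32 = 32, EMIT_TCG_64 = 64 };

/* Pick the access width from the host register size and element size. */
static enum emit_kind choose_emit(uint32_t host_reg_bits, uint32_t elem_bits)
{
    assert(host_reg_bits == 32 || host_reg_bits == 64);
    assert(elem_bits == 8 || elem_bits == 16 ||
           elem_bits == 32 || elem_bits == 64);

    if (host_reg_bits == 64) {
        return EMIT_TCG_64;   /* 64-bit accesses are not split */
    }
    if (elem_bits == 64) {
        return EMIT_HELPER;   /* a 32-bit access would stop mid-element */
    }
    return EMIT_TCG_32;       /* 32-bit host, elements of 32 bits or less */
}

/* Number of elements by which vstart advances after one access. */
static uint32_t vstart_step(enum emit_kind kind, uint32_t elem_bits)
{
    assert(kind != EMIT_HELPER);
    assert(elem_bits != 0 && (uint32_t)kind % elem_bits == 0);
    return (uint32_t)kind / elem_bits;
}
```

For example, a 64-bit host with 16-bit elements advances vstart by 4 per
access, and a 32-bit host with 64-bit elements takes the helper path.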
The patch also adds a set of host-architecture-specific conditions for choosing
between tcg ops and the helper function.
We observed a performance gain for all combinations of vector length,
element size and number of fields in the emulation of whole register loads
and stores, apart from a few cases where the helper function performs better.
We add a set of conditions that cover those cases, and more can be added in
future if other architectures require them.
The commit message changed to better reflect the new behaviour of the patch.
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bmeng.cn@gmail.com>
Cc: Weiwei Li <liwei1518@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Cc: Helene Chelin <helene.chelin@embecosm.com>
Cc: Nathan Egge <negge@google.com>
Cc: Max Chou <max.chou@sifive.com>
Cc: Jeremy Bennett <jeremy.bennett@embecosm.com>
Cc: Craig Blackmore <craig.blackmore@embecosm.com>
Paolo Savini (1):
target/riscv: use tcg ops generation to emulate whole reg rvv
loads/stores.
target/riscv/insn_trans/trans_rvv.c.inc | 164 +++++++++++++++++-------
1 file changed, 119 insertions(+), 45 deletions(-)
--
2.34.1