Re: [PATCH v3 25/51] target/arm: Implement SME MOVA
From: Peter Maydell
Subject: Re: [PATCH v3 25/51] target/arm: Implement SME MOVA
Date: Thu, 23 Jun 2022 12:24:48 +0100
On Mon, 20 Jun 2022 at 19:20, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We can reuse the SVE functions for implementing moves to/from
> horizontal tile slices, but we need new ones for moves to/from
> vertical tile slices.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper-sme.h | 11 ++++
> target/arm/helper-sve.h | 2 +
> target/arm/translate-a64.h | 9 +++
> target/arm/translate.h | 5 ++
> target/arm/sme.decode | 15 +++++
> target/arm/sme_helper.c | 110 ++++++++++++++++++++++++++++++++++++-
> target/arm/sve_helper.c | 12 ++++
> target/arm/translate-a64.c | 19 +++++++
> target/arm/translate-sme.c | 105 +++++++++++++++++++++++++++++++++++
> 9 files changed, 287 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
> index c4ee1f09e4..600346e08c 100644
> --- a/target/arm/helper-sme.h
> +++ b/target/arm/helper-sme.h
> @@ -21,3 +21,14 @@ DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
> DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
>
> DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
> +
> +DEF_HELPER_FLAGS_4(sme_mova_avz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sme_mova_zav_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
What do 'avz' and 'zav' stand for here? I thought that
'zav' might mean "from the ZA storage to a Vector", but
then what is 'avz'?
> +static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
> +                                int tile_index, bool vertical)
> +{
> +    int tile = tile_index >> (4 - esz);
> +    int index = esz == MO_128 ? 0 : extract32(tile_index, 0, 4 - esz);
> +    int pos, len, offset;
> +    TCGv_i32 t_index;
> +    TCGv_ptr addr;
> +
> +    /* Resolve tile.size[index] to an untyped ZA slice index. */
> +    t_index = tcg_temp_new_i32();
> +    tcg_gen_trunc_tl_i32(t_index, cpu_reg(s, rs));
> +    tcg_gen_addi_i32(t_index, t_index, index);
> +
> +    len = ctz32(s->svl) - esz;
> +    pos = esz;
> +    offset = tile;
> +
> +    /*
> +     * Horizontal slice.  Index row N, column 0.
> +     * The helper will iterate by the element size.
> +     */
> +    if (!vertical) {
> +        pos += ctz32(sizeof(ARMVectorReg));
> +        offset *= sizeof(ARMVectorReg);
> +    }
> +    offset += offsetof(CPUARMState, zarray);
> +
> +    tcg_gen_deposit_z_i32(t_index, t_index, pos, len);
> +    tcg_gen_addi_i32(t_index, t_index, offset);
> +
> +    /*
> +     * Vertical tile slice.  Index row 0, column N.
> +     * The helper will iterate by the row spacing in the array.
> +     * Need to adjust addressing for elements smaller than uint64_t for BE.
> +     */
> +    if (HOST_BIG_ENDIAN && vertical && esz < MO_64) {
> +        tcg_gen_xori_i32(t_index, t_index, 8 - (1 << esz));
> +    }
> +
> +    addr = tcg_temp_new_ptr();
> +    tcg_gen_ext_i32_ptr(addr, t_index);
> +    tcg_temp_free_i32(t_index);
> +    tcg_gen_add_ptr(addr, addr, cpu_env);
> +
> +    return addr;
> +}
This is too confusing -- I spent half an hour looking at it and
couldn't figure out whether it was correct. I can see roughly
what it's supposed to be doing, but I don't really want to try
to reverse-engineer the details from the sequence of operations.
E.g. the way we sometimes just add in the tile number and sometimes
add in the tile number * the size of a vector reg looks very
strange. I figured out that the deposit op is doing the equivalent
of the pseudocode's "MOD dim" on the slice index, but the code doesn't
say so, and the calculation of len and pos is kind of obscure to me.
Perhaps (a) more commentary and (b) separating out the
horizontal and vertical cases would help?
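
To make the "MOD dim" point concrete, here is a rough sketch in plain C
(host arithmetic, not TCG ops) of what the two cases might look like once
separated. It is not the patch's code. The names slice_mod_dim,
za_horizontal_offset, za_vertical_offset and ROW_BYTES are hypothetical,
and deposit_z32 is only a local model of a deposit-into-zero. It assumes
that row r of tile t at element size esz lives in ZA array row
r * (1 << esz) + t, and that consecutive ZA rows are ROW_BYTES apart
(standing in for sizeof(ARMVectorReg)). The offsetof(CPUARMState, zarray)
term and the big-endian adjustment are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for sizeof(ARMVectorReg): bytes between ZA rows. */
#define ROW_BYTES 256u

/* Local model of deposit-into-zero: keep the low 'len' bits, shift by 'pos'. */
static inline uint32_t deposit_z32(uint32_t v, unsigned pos, unsigned len)
{
    return (v & ((1u << len) - 1)) << pos;   /* requires len < 32 */
}

/* The pseudocode's "slice MOD dim", where dim = svl / element size. */
static inline uint32_t slice_mod_dim(uint32_t slice, uint32_t svl, unsigned esz)
{
    uint32_t dim = svl >> esz;   /* rows (and columns) per tile */
    return slice & (dim - 1);    /* dim is a power of two */
}

/*
 * Horizontal slice: byte offset of tile row N, column 0, from the start
 * of the ZA array.  Tile rows are interleaved, so tile row N of 'tile'
 * is ZA row N * (1 << esz) + tile.
 */
static inline uint32_t za_horizontal_offset(uint32_t slice, unsigned tile,
                                            uint32_t svl, unsigned esz)
{
    uint32_t za_row = (slice_mod_dim(slice, svl, esz) << esz) + tile;
    return za_row * ROW_BYTES;
}

/*
 * Vertical slice: byte offset of tile row 0, column N.  Row 0 of 'tile'
 * is ZA row 'tile'; column N starts N elements into that row.
 */
static inline uint32_t za_vertical_offset(uint32_t slice, unsigned tile,
                                          uint32_t svl, unsigned esz)
{
    uint32_t col_bytes = slice_mod_dim(slice, svl, esz) << esz;
    return tile * ROW_BYTES + col_bytes;
}
```

With svl = 32 and esz = 2 (eight rows per tile), slice 9 of tile 3 reduces
to slice 1; the horizontal form then agrees with
deposit_z32(9, esz + 8, 3) + 3 * ROW_BYTES. Under the layout assumption
above, the vertical start address also scales the tile number by the row
stride, whereas the quoted code adds the unscaled tile number in that case.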
thanks
-- PMM