Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support

qemu-riscv

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support

From:	Richard Henderson
Subject:	Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support
Date:	Wed, 28 Jun 2023 11:07:40 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 6/27/23 19:45, Max Chou wrote:

+#define GEN_V_UNMASKED_TRANS(NAME, CHECK, EGS)                                \
+    static bool trans_##NAME(DisasContext *s, arg_##NAME *a)                  \
+    {                                                                         \
+        if (CHECK(s, a)) {                                                    \
+            TCGv_ptr rd_v, rs2_v;                                             \
+            TCGv_i32 desc;                                                    \
+            uint32_t data = 0;                                                \
+            TCGLabel *over = gen_new_label();                                 \
+            TCGLabel *vl_ok = gen_new_label();                                \
+            TCGLabel *vstart_ok = gen_new_label();                            \
+            TCGv_i32 tmp = tcg_temp_new_i32();                                \
+                                                                              \
+            /* save opcode for unwinding in case we throw an exception */     \
+            decode_save_opc(s);                                               \
+                                                                              \
+            /* check (vl % EGS == 0) assuming it's power of 2 */              \
+            tcg_gen_trunc_tl_i32(tmp, cpu_vl);                                \
+            tcg_gen_andi_i32(tmp, tmp, EGS - 1);                              \
+            tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 0, vl_ok);                  \
+            gen_helper_restore_cpu_and_raise_exception(                       \
+                cpu_env, tcg_constant_i32(RISCV_EXCP_ILLEGAL_INST));          \
+            gen_set_label(vl_ok);                                             \
+                                                                              \
+            /* check (vstart % EGS == 0) assuming it's power of 2 */          \
+            tcg_gen_trunc_tl_i32(tmp, cpu_vstart);                            \
+            tcg_gen_andi_i32(tmp, tmp, EGS - 1);                              \
+            tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 0, vstart_ok);              \
+            gen_helper_restore_cpu_and_raise_exception(                       \
+                cpu_env, tcg_constant_i32(RISCV_EXCP_ILLEGAL_INST));          \
+            gen_set_label(vstart_ok);                                         \
+                                                                              \
+            tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);        \
+            data = FIELD_DP32(data, VDATA, VM, a->vm);                        \
+            data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                    \
+            data = FIELD_DP32(data, VDATA, VTA, s->vta);                      \
+            data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);    \
+            data = FIELD_DP32(data, VDATA, VMA, s->vma);                      \
+            rd_v = tcg_temp_new_ptr();                                        \
+            rs2_v = tcg_temp_new_ptr();                                       \
+            desc = tcg_constant_i32(                                          \
+                simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \
+            tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));              \
+            tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));            \
+            gen_helper_##NAME(rd_v, rs2_v, cpu_env, desc);                    \
+            mark_vs_dirty(s);                                                 \
+            gen_set_label(over);                                              \
+            return true;                                                      \
+        }                                                                     \
+        return false;                                                         \
+    }


This kind of massive macro is bad style.
Much better to have a helper function and pass in gen_helper_foo as a parameter.

You can eliminate the vstart % EGS test, and the vstart < vl test, when 
VSTART_EQ_ZERO.
You can eliminate the vl % EGS test when VL_EQ_VLMAX.

You could move all of these tests out of line, into a helper_foo_chk() function whichperforms the checks and then calls helper_foo().

+#define GEN_VI_UNMASKED_TRANS(NAME, CHECK, EGS)                               \
+    static bool trans_##NAME(DisasContext *s, arg_##NAME *a)                  \
+    {                                                                         \
+        if (CHECK(s, a)) {                                                    \
+            TCGv_ptr rd_v, rs2_v;                                             \
+            TCGv_i32 uimm_v, desc;                                            \
+            uint32_t data = 0;                                                \
+            TCGLabel *over = gen_new_label();                                 \
+            TCGLabel *vl_ok = gen_new_label();                                \
+            TCGLabel *vstart_ok = gen_new_label();                            \
+            TCGv_i32 tmp = tcg_temp_new_i32();                                \
+                                                                              \
+            /* save opcode for unwinding in case we throw an exception */     \
+            decode_save_opc(s);                                               \
+                                                                              \
+            /* check (vl % EGS == 0) assuming it's power of 2 */              \
+            tcg_gen_trunc_tl_i32(tmp, cpu_vl);                                \
+            tcg_gen_andi_i32(tmp, tmp, EGS - 1);                              \
+            tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 0, vl_ok);                  \
+            gen_helper_restore_cpu_and_raise_exception(                       \
+                cpu_env, tcg_constant_i32(RISCV_EXCP_ILLEGAL_INST));          \
+            gen_set_label(vl_ok);                                             \
+                                                                              \
+            /* check (vstart % EGS == 0) assuming it's power of 2 */          \
+            tcg_gen_trunc_tl_i32(tmp, cpu_vstart);                            \
+            tcg_gen_andi_i32(tmp, tmp, EGS - 1);                              \
+            tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 0, vstart_ok);              \
+            gen_helper_restore_cpu_and_raise_exception(                       \
+                cpu_env, tcg_constant_i32(RISCV_EXCP_ILLEGAL_INST));          \
+            gen_set_label(vstart_ok);                                         \
+                                                                              \
+            tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);        \
+            data = FIELD_DP32(data, VDATA, VM, a->vm);                        \
+            data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                    \
+            data = FIELD_DP32(data, VDATA, VTA, s->vta);                      \
+            data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);    \
+            data = FIELD_DP32(data, VDATA, VMA, s->vma);                      \
+                                                                              \
+            rd_v = tcg_temp_new_ptr();                                        \
+            rs2_v = tcg_temp_new_ptr();                                       \
+            uimm_v = tcg_constant_i32(a->rs1);                                \
+            desc = tcg_constant_i32(                                          \
+                simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \
+            tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));              \
+            tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));            \
+            gen_helper_##NAME(rd_v, rs2_v, uimm_v, cpu_env, desc);            \
+            mark_vs_dirty(s);                                                 \
+            gen_set_label(over);                                              \
+            return true;                                                      \
+        }                                                                     \
+        return false;                                                         \
+    }


Likewise.

+#define GEN_ZVKNED_HELPER_VV(NAME, ...)                                   \
+    void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,  \
+                      uint32_t desc)                                      \
+    {                                                                     \
+        uint64_t *vd = vd_vptr;                                           \
+        uint64_t *vs2 = vs2_vptr;                                         \
+        uint32_t vl = env->vl;                                            \
+        uint32_t total_elems = vext_get_total_elems(env, desc, 4);        \
+        uint32_t vta = vext_vta(desc);                                    \
+                                                                          \
+        for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {        \
+            AESState round_key;                                           \
+            round_key.d[HOST_BIG_ENDIAN] = cpu_to_le64(vs2[i * 2 + 0]);   \
+            round_key.d[!HOST_BIG_ENDIAN] = cpu_to_le64(vs2[i * 2 + 1]);  \
+            AESState round_state;                                         \
+            cpu_to_le64s(vd + i * 2 + 0);                                 \
+            cpu_to_le64s(vd + i * 2 + 1);                                 \
+            for (int j = 0; j < 16; j++) {                                \
+                round_state.b[j] = ((uint8_t *)(vd + i * 2))[j];          \
+            }                                                             \


I think all of this byte swapping is wrong.
With this last loop particularly being particularly silly.

You want to present the 16 bytes in *host* endian order.
Because the words are always in little-endian order (see H1 et al),
we only need to swap the words on big-endian hosts.

See 
20230620110758.787479-21-richard.henderson@linaro.org/">https://lore.kernel.org/qemu-devel/20230620110758.787479-21-richard.henderson@linaro.org/
where I do exactly the same thing for ARM:

+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1];
+            t.d[1] = st->d[0];
+            aesdec_IMC(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            aesdec_IMC(ad, st, false);
+        }

+void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
+                        CPURISCVState *env, uint32_t desc)
+{
+    uint32_t *vd = vd_vptr;
+    uint32_t *vs2 = vs2_vptr;
+    uint32_t vl = env->vl;
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+    uint32_t vta = vext_vta(desc);
+
+    uimm &= 0b1111;
+    if (uimm > 10 || uimm == 0) {
+        uimm ^= 0b1000;
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        uint32_t rk[8];
+        static const uint32_t rcon[] = {
+            0x01000000, 0x02000000, 0x04000000, 0x08000000, 0x10000000,
+            0x20000000, 0x40000000, 0x80000000, 0x1B000000, 0x36000000,
+        };
+
+        rk[0] = bswap32(vs2[i * 4 + H4(0)]);
+        rk[1] = bswap32(vs2[i * 4 + H4(1)]);
+        rk[2] = bswap32(vs2[i * 4 + H4(2)]);
+        rk[3] = bswap32(vs2[i * 4 + H4(3)]);
+
+        rk[4] = rk[0] ^ (((uint32_t)AES_sbox[(rk[3] >> 16) & 0xff] << 24) |
+                         ((uint32_t)AES_sbox[(rk[3] >> 8) & 0xff] << 16) |
+                         ((uint32_t)AES_sbox[(rk[3] >> 0) & 0xff] << 8) |
+                         ((uint32_t)AES_sbox[(rk[3] >> 24) & 0xff] << 0))
+                      ^ rcon[uimm - 1];
+        rk[5] = rk[1] ^ rk[4];
+        rk[6] = rk[2] ^ rk[5];
+        rk[7] = rk[3] ^ rk[6];
+
+        vd[i * 4 + H4(0)] = bswap32(rk[4]);
+        vd[i * 4 + H4(1)] = bswap32(rk[5]);
+        vd[i * 4 + H4(2)] = bswap32(rk[6]);
+        vd[i * 4 + H4(3)] = bswap32(rk[7]);
+    }
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
+}


All of this byte swapping is going to be wrong for a big-endian host.
It is also a little bit silly to do for a little-endian host.

You're byte swapping uint32_t words, then extracting bytes from those words. Just extractthe exact byte you require from the original input, using the H1() macro, and now you havecorrect code for both big- and little-endian hosts.

r~

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v5 00/15] Add RISC-V vector cryptographic instruction set support, (continued)
- [PATCH v5 00/15] Add RISC-V vector cryptographic instruction set support, Max Chou, 2023/06/27
  - [PATCH v6 01/15] target/riscv: Refactor some of the generic vector functionality, Max Chou, 2023/06/27
  - [PATCH v6 02/15] target/riscv: Refactor vector-vector translation macro, Max Chou, 2023/06/27
  - [PATCH v6 03/15] target/riscv: Remove redundant "cpu_vl == 0" checks, Max Chou, 2023/06/27
  - [PATCH v6 04/15] target/riscv: Add Zvbc ISA extension support, Max Chou, 2023/06/27
  - [PATCH v6 05/15] target/riscv: Move vector translation checks, Max Chou, 2023/06/27
  - [PATCH v6 06/15] target/riscv: Refactor translation of vector-widening instruction, Max Chou, 2023/06/27
  - [PATCH v6 07/15] target/riscv: Refactor some of the generic vector functionality, Max Chou, 2023/06/27
  - [PATCH v6 08/15] target/riscv: Add Zvbb ISA extension support, Max Chou, 2023/06/27
  - [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support, Max Chou, 2023/06/27
    - Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support, Richard Henderson <=
    - Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support, Max Chou, 2023/06/29
    - Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support, Richard Henderson, 2023/06/29
  - [PATCH v6 10/15] target/riscv: Add Zvknh ISA extension support, Max Chou, 2023/06/27
    - Re: [PATCH v6 10/15] target/riscv: Add Zvknh ISA extension support, Richard Henderson, 2023/06/28
    - Re: [PATCH v6 10/15] target/riscv: Add Zvknh ISA extension support, Max Chou, 2023/06/29
  - [PATCH v6 11/15] target/riscv: Add Zvksh ISA extension support, Max Chou, 2023/06/27
  - [PATCH v6 12/15] target/riscv: Add Zvkg ISA extension support, Max Chou, 2023/06/27
  - [PATCH v6 13/15] crypto: Create sm4_subword, Max Chou, 2023/06/27
  - [PATCH v6 14/15] crypto: Add SM4 constant parameter CK, Max Chou, 2023/06/27
  - [PATCH v6 15/15] target/riscv: Add Zvksed ISA extension support, Max Chou, 2023/06/27

Prev by Date: Re: [PATCH v5 16/19] target/riscv/cpu.c: create KVM mock properties
Next by Date: Re: [PATCH v6 10/15] target/riscv: Add Zvknh ISA extension support
Previous by thread: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support
Next by thread: Re: [PATCH v6 09/15] target/riscv: Add Zvkned ISA extension support
Index(es):
- Date
- Thread