From: Richard Henderson
Subject: Re: [PATCH 02/11] tcg/loongarch64: Lower basic tcg vec ops to LSX
Date: Mon, 28 Aug 2023 09:57:45 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 8/28/23 08:19, Jiajie Chen wrote:
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg rd, int64_t v64)
+{
+    /* Try vldi if imm can fit */
+    if (vece <= MO_32 && (-0x200 <= v64 && v64 <= 0x1FF)) {
+        uint32_t imm = (vece << 10) | ((uint32_t)v64 & 0x3FF);
+        tcg_out_opc_vldi(s, rd, imm);
+        return;
+    }
v64 has the value replicated across 64 bits.
In order to do the comparison above, you'll want

    int64_t vale = sextract64(v64, 0, 8 << vece);
    if (-0x200 <= vale && vale <= 0x1ff)
        ...

Since the only documentation for LSX is qemu's own translator code: why are you testing vece <= MO_32? MO_64 should be available as well, or is there a bug in trans_vldi()?
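
Putting both comments together, the head of the function might look like this (just a sketch, assuming trans_vldi() does handle MO_64 correctly):

    int64_t vale = sextract64(v64, 0, 8 << vece);

    /* vldi: vece in imm[11:10], 10-bit signed element value in imm[9:0] */
    if (-0x200 <= vale && vale <= 0x1ff) {
        uint32_t imm = (vece << 10) | ((uint32_t)vale & 0x3ff);
        tcg_out_opc_vldi(s, rd, imm);
        return;
    }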
It might be nice to leave a to-do for vldi imm bit 12 set, for the patterns expanded by 
vldi_get_value().  In particular, mode == 9 is likely to be useful, and modes {1,2,3,5} 
are easy to test for.
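For example, mode 9 expands each bit of imm[7:0] into a full byte of 0x00 or 0xff, which is handy for byte masks. A test for it might look like the following sketch (the field layout, bit 12 set with the mode in imm[11:8] and the data in imm[7:0], is assumed from vldi_get_value()):

    /* mode 9: usable iff each byte of v64 is either 0x00 or 0xff */
    uint8_t bits = 0;
    int i;
    for (i = 0; i < 8; i++) {
        uint8_t byte = v64 >> (i * 8);
        if (byte == 0xff) {
            bits |= 1 << i;
        } else if (byte != 0) {
            break;
        }
    }
    if (i == 8) {
        tcg_out_opc_vldi(s, rd, 0x1000 | (9 << 8) | bits);
        return;
    }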

+
+    /* Fallback to vreplgr2vr */
+    tcg_out_movi(s, type, TCG_REG_TMP0, v64);
type is a vector type; you can't use it here.
Correct would be TCG_TYPE_I64.

Better to load vale instead, since that will take fewer insns in tcg_out_movi.
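Something along these lines (a sketch; the per-element-size vreplgr2vr emitters are assumed to follow the tcg_out_opc_* naming used above):

    /* Fall back to vreplgr2vr */
    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, vale);
    switch (vece) {
    case MO_8:
        tcg_out_opc_vreplgr2vr_b(s, rd, TCG_REG_TMP0);
        break;
    case MO_16:
        tcg_out_opc_vreplgr2vr_h(s, rd, TCG_REG_TMP0);
        break;
    case MO_32:
        tcg_out_opc_vreplgr2vr_w(s, rd, TCG_REG_TMP0);
        break;
    case MO_64:
        tcg_out_opc_vreplgr2vr_d(s, rd, TCG_REG_TMP0);
        break;
    default:
        g_assert_not_reached();
    }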


+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+                           unsigned vecl, unsigned vece,
+                           const TCGArg args[TCG_MAX_OP_ARGS],
+                           const int const_args[TCG_MAX_OP_ARGS])
+{
+    TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0, a1, a2;
+    TCGReg base;
+    TCGReg temp = TCG_REG_TMP0;
+    int32_t offset;
+
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+
+    /* Currently only supports V128 */
+    tcg_debug_assert(type == TCG_TYPE_V128);
+
+    switch (opc) {
+    case INDEX_op_st_vec:
+        /* Try to fit vst imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vst(s, a0, base, offset);
+        break;
+    case INDEX_op_ld_vec:
+        /* Try to fit vld imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vld(s, a0, base, offset);
tcg_out_addi has a hole in bits [15:12], and can take an extra insn if those bits are set. Better to load the offset with tcg_out_movi and then use VLDX/VSTX instead of VLD/VST.
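For the load side that would look something like this (sketch; assuming the indexed form is exposed as tcg_out_opc_vldx, taking a register offset in place of the immediate):

    if (-0x800 <= a2 && a2 <= 0x7ff) {
        tcg_out_opc_vld(s, a0, a1, a2);
    } else {
        tcg_out_movi(s, TCG_TYPE_I64, temp, a2);
        tcg_out_opc_vldx(s, a0, a1, temp);
    }
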
@@ -159,6 +170,30 @@ typedef enum {
  #define TCG_TARGET_HAS_mulsh_i64        1
  #define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_v64              0
+#define TCG_TARGET_HAS_v128             use_lsx_instructions
+#define TCG_TARGET_HAS_v256             0
Perhaps reserve for a follow-up, but TCG_TARGET_HAS_v64 can easily be supported using the 
same instructions.
The only difference is load/store, where you could use FLD.D/FST.D to load the lower 
64-bits of the fp/vector register, or VLDREPL.D to load and initialize all bits and 
VSTELM.D to store the lower 64-bits.
I tend to think the float insns are more flexible, having a larger displacement, and the 
availability of FLDX/FSTX as well.
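In tcg_out_ld, for example, the V64 case might look like this, using the usual (dest, base, offset) arguments (sketch only; assuming fld.d/fldx.d are exposed as tcg_out_opc_fld_d/tcg_out_opc_fldx_d):

    case TCG_TYPE_V64:
        if (-0x800 <= offset && offset <= 0x7ff) {
            tcg_out_opc_fld_d(s, dest, base, offset);
        } else {
            tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, offset);
            tcg_out_opc_fldx_d(s, dest, base, TCG_REG_TMP0);
        }
        break;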

r~


