
Re: [PATCH v2 2/5] plugins: add new inline op STORE_U64


From: Pierrick Bouvier
Subject: Re: [PATCH v2 2/5] plugins: add new inline op STORE_U64
Date: Wed, 13 Mar 2024 11:58:17 +0400
User-agent: Mozilla Thunderbird

On 3/13/24 01:15, Richard Henderson wrote:
On 3/11/24 21:54, Pierrick Bouvier wrote:
+static void gen_empty_inline_cb_store_u64(void)
+{
+    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+    TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
+    TCGv_i64 val = tcg_temp_ebb_new_i64();
+    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
+
+    tcg_gen_ld_i32(cpu_index, tcg_env,
+                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+    /* second operand will be replaced by immediate value */
+    tcg_gen_mul_i32(cpu_index, cpu_index, cpu_index);
+    tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
+    tcg_gen_movi_ptr(ptr, 0);
+    tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
+
+    tcg_gen_movi_i64(val, 0);
+    tcg_gen_st_i64(val, ptr, 0);
+
+    tcg_temp_free_ptr(ptr);
+    tcg_temp_free_i64(val);
+    tcg_temp_free_ptr(cpu_index_as_ptr);
+    tcg_temp_free_i32(cpu_index);
+}

I was never fond of this full pattern generate...


I agree with you. I didn't want to start this discussion, but yes, implementing this feels clunky and error prone (especially the replace part, which depends on the bitness of the architecture you execute on).

@@ -352,6 +385,20 @@ static TCGOp *copy_st_i64(TCGOp **begin_op, TCGOp *op)
     return op;
 }
+static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
+{
+    if (TCG_TARGET_REG_BITS == 32) {
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(TCGV_LOW(tcg_constant_i64(v)));
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(TCGV_HIGH(tcg_constant_i64(v)));
+    } else {
+        op = copy_op(begin_op, op, INDEX_op_mov_i64);
+        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
+    }
+    return op;
+}

... followed by pattern match and modify.  I think adding more of this is fragile, and a mistake.

(1) This encodes knowledge of the order of the parts of a mov_i64 for a 32-bit host.
(2) You shouldn't use TCGV_LOW/HIGH of tcg_constant_i64, but two separate calls to tcg_constant_i32 with the parts of @v.
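
For illustration, point (2) amounts to something like the following sketch (the low-then-high ordering from point (1) is still an assumption here):

static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
{
    if (TCG_TARGET_REG_BITS == 32) {
        /* two independent 32-bit constants built from the parts of @v */
        op = copy_op(begin_op, op, INDEX_op_mov_i32);
        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uint32_t)v));
        op = copy_op(begin_op, op, INDEX_op_mov_i32);
        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uint32_t)(v >> 32)));
    } else {
        op = copy_op(begin_op, op, INDEX_op_mov_i64);
        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
    }
    return op;
}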

But all of this would be easier if we simply generated new code now, instead of copying.

I'm open to working on this kind of change: simply have a single pass that generates TCG ops, just before the optimization step and the translation to the target arch. I would like to hear what Alex's opinion is on doing this.
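
As a rough sketch of what direct generation could look like (function name and emission point are assumptions; the body mirrors the quoted template, but emits the final operands immediately instead of placeholders):

static void gen_inline_store_u64(struct qemu_plugin_dyn_cb *cb)
{
    GArray *arr = cb->inline_insn.entry.score->data;
    char *base = arr->data + cb->inline_insn.entry.offset;
    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();

    tcg_gen_ld_i32(cpu_index, tcg_env,
                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
    /* no placeholder mul: scale by the real element size right away */
    tcg_gen_muli_i32(cpu_index, cpu_index, g_array_get_element_size(arr));
    tcg_gen_ext_i32_ptr(ptr, cpu_index);
    tcg_gen_addi_ptr(ptr, ptr, (intptr_t)base);
    tcg_gen_st_i64(tcg_constant_i64(cb->inline_insn.imm), ptr, 0);

    tcg_temp_free_ptr(ptr);
    tcg_temp_free_i32(cpu_index);
}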


+static TCGOp *append_inline_cb_store_u64(const struct qemu_plugin_dyn_cb *cb,
+                                       TCGOp *begin_op, TCGOp *op,
+                                       int *unused)
+{
+    char *ptr = cb->inline_insn.entry.score->data->data;
+    size_t elem_size = g_array_get_element_size(
+        cb->inline_insn.entry.score->data);
+    size_t offset = cb->inline_insn.entry.offset;
+    op = copy_ld_i32(&begin_op, op);
+    op = copy_mul_i32(&begin_op, op, elem_size);
+    op = copy_ext_i32_ptr(&begin_op, op);
+    op = copy_const_ptr(&begin_op, op, ptr + offset);
+    op = copy_add_ptr(&begin_op, op);
+    op = copy_mov_i64(&begin_op, op, cb->inline_insn.imm);
+    op = copy_st_i64(&begin_op, op);

You'd also be able to fold offset into the store.  This would allow the scoreboard address to be entered once into the constant pool and have multiple uses.
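
That is, instead of adding the offset into the pointer and storing at displacement 0, something like this (hypothetical helper; assumes @ptr already holds the scoreboard base plus the cpu_index scaling):

static void store_imm_at_entry(TCGv_ptr ptr, size_t entry_offset, uint64_t imm)
{
    /* the per-callback entry offset rides along in the store itself */
    tcg_gen_st_i64(tcg_constant_i64(imm), ptr, entry_offset);
}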


The problem is that several callbacks can operate on several scoreboards (with different entry offsets), so I'm not sure what we can precompute here.

We would need to keep a set of all target scoreboards, pre-compute the final pointer for every one of them, and emit this before any callback code. That sounded more complicated than just emitting all of it.
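
For reference, a rough sketch of that alternative (all names hypothetical; note that EBB-scoped temps would not even survive across the whole TB, which is part of the complication):

/* per-TB map: scoreboard -> TCGv_ptr holding its precomputed base */
static GHashTable *score_ptrs;

static TCGv_ptr get_score_base(struct qemu_plugin_scoreboard *score)
{
    TCGv_ptr base = g_hash_table_lookup(score_ptrs, score);
    if (!base) {
        base = tcg_temp_ebb_new_ptr(); /* EBB lifetime: too short for a TB */
        tcg_gen_movi_ptr(base, (intptr_t)score->data->data);
        g_hash_table_insert(score_ptrs, score, base);
    }
    return base;
}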


r~


