[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Another CXL/MMIO tcg tlb corner case
From: |
Jørgen Hansen |
Subject: |
Re: Another CXL/MMIO tcg tlb corner case |
Date: |
Fri, 15 Mar 2024 15:03:36 +0000 |
> On 15 Mar 2024, at 13.25, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Jørgen Hansen <Jorgen.Hansen@wdc.com> writes:
>
>> Hi,
>>
>> While doing some testing using numactl-based interleaving of application
>> memory
>> across regular memory and CXL-based memory using QEMU with tcg, I ran into an
>> issue similar to what we saw a while back - link to old issue:
>> https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t.
>>
>> When running:
>>
>> numactl --interleave 0,1 ./cachebench …
>>
>> I hit the following:
>>
>> numactl --interleave 0,1 ./cachebench --json_test_config
>> ../test_configs/hit_ratio/graph_cache_follower_assocs/config.json
>> qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fffc3926dd4
>
> Ok so this will be because we (by design) don't cache TB's for MMIO
> regions. But as you say:
>>
>> Looking at the tb being executed, it looks like it is a single instruction
>> tb,
>> so with my _very_ limited understanding of tcg, it shouldn’t be necessary to
>> do the IO recompile:
>>
>> (gdb) up 13
>>
>> #13 0x0000555555ca13ac in cpu_loop_exec_tb (tb_exit=0x7ffff49ff6d8,
>> last_tb=<synthetic pointer>, pc=<optimized out>, tb=0x7fffc3926cc0
>> <code_gen_buffer+464678035>, cpu=0x5555578c19c0) at
>> ../accel/tcg/cpu-exec.c:904
>> 904 tb = cpu_tb_exec(cpu, tb, tb_exit);
>> (gdb) print *tb
>> $1 = {pc = 0, cs_base = 0, flags = 415285939, cflags = 4278321152, size = 7,
>> icount = 1, tc = {ptr = 0x7fffc3926d80 <code_gen_buffer+464678227>, size =
>> 176}, page_next = {0, 0}, page_addr = {18446744073709551615,
>> 18446744073709551615}, jmp_lock = {value = 0}, jmp_reset_offset =
>> {65535, 65535}, jmp_insn_offset = {65535, 65535}, jmp_target_addr =
>> {0, 0}, jmp_list_head = 140736474540928, jmp_list_next = {0, 0},
>> jmp_dest = {0, 0}}
>
> with an icount of 1 we by definition can do io. This is done in the
> translator_loop:
>
> /*
> * Disassemble one instruction. The translate_insn hook should
> * update db->pc_next and db->is_jmp to indicate what should be
> * done next -- either exiting this loop or locate the start of
> * the next instruction.
> */
> if (db->num_insns == db->max_insns) {
> /* Accept I/O on the last instruction. */
> set_can_do_io(db, true);
> }
>
> and we see later on:
>
> tb->icount = db->num_insns;
>
> So I guess we must have gone into the translator loop expecting to
> translate more? (side note having int8_t type for the saved bool value
> seems weird).
>
> Could you confirm by doing something like:
>
> if (tb->icount == 1 && db->max_insns > 1) {
> fprintf(stderr, "ended up doing one insn (%d, %d)", db->max_insns,
> db->saved_can_do_io);
> }
>
> after we set tb->icount?
>
Thanks for looking into this - I tried what you suggest above and it looks
like that is a case that happens quite often (I see 100s of these just when
booting the VM), so it is hard to tell whether it is related directly to the
issue, e.g.,:
ended up doing one insn (512, 0)ended up doing one insn (512, 0)ended up doing
one insn (512, 0)ended up doing one insn (512, 0)ended up doing one insn (512,
0)ended up doing one insn (512, 0)ended up doing one insn (512, 0)qemu: fatal:
cpu_io_recompile: could not find TB for pc=0x7f4264da3d48
RAX=0000000000000000 RBX=00007f6798e69040 RCX=000000000205f958
RDX=000000000205f8f0
RSI=0000000000000000 RDI=00007ffd5663c720 RBP=00007ffd5663c720
RSP=00007ffd5663c6c0
R8 =000000000000001e R9 =000000000205f920 R10=0000000000000004
R11=00007f67984b56b0
R12=00007ffd5663c7e0 R13=00007f6798e5d0b8 R14=00007f6798e5d588
R15=00007ffd5663c700
RIP=00007f6798e39ffd RFL=00000246 [---Z-P-] CPL=3 II=0 A20=1 SMM=0 HLT=0
>> If the application is run entirely out of MMIO memory, things work fine (the
>> previous patches related to this is in Jonathans branch), so one thought is
>> that
>> it is related to having the code on a mix of regular and CXL memory. Since we
>> previously had issues with code crossing page boundaries where only the
>> second
>> page is MMIO, I tried out the following change to the fix introduced for that
>> issue thinking that reverting to the slow path in the middle of the
>> translation
>> might not correctly update can_do_io:
>>
>> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
>> index 38c34009a5..db6ea360e0 100644
>> --- a/accel/tcg/translator.c
>> +++ b/accel/tcg/translator.c
>> @@ -258,6 +258,7 @@ static void *translator_access(CPUArchState *env,
>> DisasContextBase *db,
>> if (unlikely(new_page1 == -1)) {
>> tb_unlock_pages(tb);
>> tb_set_page_addr0(tb, -1);
>> + set_can_do_io(db, true);
>> return NULL;
>> }
>>
>> With that change, things work fine. Not saying that this is a valid fix for
>> the
>> issue, but just trying to provide as much information as possible :)
>
> I can see why this works, my worry is if we address all the early exit
> cases here.
>
>>
>> Thanks,
>> Jorgen
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
Thanks,
Jorgen