[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Another CXL/MMIO tcg tlb corner case
From: |
Alex Bennée |
Subject: |
Re: Another CXL/MMIO tcg tlb corner case |
Date: |
Fri, 15 Mar 2024 12:25:49 +0000 |
User-agent: |
mu4e 1.12.1; emacs 29.2 |
Jørgen Hansen <Jorgen.Hansen@wdc.com> writes:
> Hi,
>
> While doing some testing using numactl-based interleaving of application
> memory
> across regular memory and CXL-based memory using QEMU with tcg, I ran into an
> issue similar to what we saw a while back - link to old issue:
> https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t.
>
> When running:
>
> numactl --interleave 0,1 ./cachebench …
>
> I hit the following:
>
> numactl --interleave 0,1 ./cachebench --json_test_config
> ../test_configs/hit_ratio/graph_cache_follower_assocs/config.json
> qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fffc3926dd4
Ok so this will be because we (by design) don't cache TB's for MMIO
regions. But as you say:
>
> Looking at the tb being executed, it looks like it is a single instruction tb,
> so with my _very_ limited understanding of tcg, it shouldn’t be necessary to
> do the IO recompile:
>
> (gdb) up 13
>
> #13 0x0000555555ca13ac in cpu_loop_exec_tb (tb_exit=0x7ffff49ff6d8,
> last_tb=<synthetic pointer>, pc=<optimized out>, tb=0x7fffc3926cc0
> <code_gen_buffer+464678035>, cpu=0x5555578c19c0) at
> ../accel/tcg/cpu-exec.c:904
> 904 tb = cpu_tb_exec(cpu, tb, tb_exit);
> (gdb) print *tb
> $1 = {pc = 0, cs_base = 0, flags = 415285939, cflags = 4278321152, size = 7,
> icount = 1, tc = {ptr = 0x7fffc3926d80 <code_gen_buffer+464678227>, size =
> 176}, page_next = {0, 0}, page_addr = {18446744073709551615,
> 18446744073709551615}, jmp_lock = {value = 0}, jmp_reset_offset =
> {65535, 65535}, jmp_insn_offset = {65535, 65535}, jmp_target_addr =
> {0, 0}, jmp_list_head = 140736474540928, jmp_list_next = {0, 0},
> jmp_dest = {0, 0}}
with an icount of 1 we by definition can do io. This is done in the
translator_loop:
/*
* Disassemble one instruction. The translate_insn hook should
* update db->pc_next and db->is_jmp to indicate what should be
* done next -- either exiting this loop or locate the start of
* the next instruction.
*/
if (db->num_insns == db->max_insns) {
/* Accept I/O on the last instruction. */
set_can_do_io(db, true);
}
and we see later on:
tb->icount = db->num_insns;
So I guess we must have gone into the translator loop expecting to
translate more? (side note having int8_t type for the saved bool value
seems weird).
Could you confirm by doing something like:
if (tb->icount == 1 && db->max_insns > 1) {
fprintf(stderr, "ended up doing one insn (%d, %d)", db->max_insns,
db->saved_can_do_io);
}
after we set tb->icount?
> If the application is run entirely out of MMIO memory, things work fine (the
> previous patches related to this is in Jonathans branch), so one thought is
> that
> it is related to having the code on a mix of regular and CXL memory. Since we
> previously had issues with code crossing page boundaries where only the second
> page is MMIO, I tried out the following change to the fix introduced for that
> issue thinking that reverting to the slow path in the middle of the
> translation
> might not correctly update can_do_io:
>
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 38c34009a5..db6ea360e0 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -258,6 +258,7 @@ static void *translator_access(CPUArchState *env,
> DisasContextBase *db,
> if (unlikely(new_page1 == -1)) {
> tb_unlock_pages(tb);
> tb_set_page_addr0(tb, -1);
> + set_can_do_io(db, true);
> return NULL;
> }
>
> With that change, things work fine. Not saying that this is a valid fix for
> the
> issue, but just trying to provide as much information as possible :)
I can see why this works, my worry is if we address all the early exit
cases here.
>
> Thanks,
> Jorgen
--
Alex Bennée
Virtualisation Tech Lead @ Linaro