qemu-arm

Re: DSB does not seem to wait for TLBI completion


From: Alex Bennée
Subject: Re: DSB does not seem to wait for TLBI completion
Date: Thu, 18 Nov 2021 17:01:45 +0000
User-agent: mu4e 1.7.5; emacs 28.0.60

Idan Horowitz <idan.horowitz@gmail.com> writes:

> Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the 
> following scenario:
> After receiving a data abort and mapping in the correct page I try to 
> invalidate the corresponding TLB entry using the following assembly
> sequence:
>
> dsb ish
> tlbi vaae1is, x0
> dsb sy
>
> Unfortunately this does not seem to have any immediate effect, as upon 
> returning back to the source of the exception I immediately hit
> the same Data Abort. This cycle of receiving a Data Abort and then updating 
> the mapping continues for 100s of times, until the TLB finally
> updates to the correct mapping.
>
> As part of my testing I also tried to replace the Inner Shareable tlbi I 
> showed above with the base version that only invalidates the current
> PE's TLB entry (tlbi vaae1, x0) this seemed to fix the issue, which made me 
> suspect something was up with QEMU itself, as the inner
> shareable version of the instruction is supposed to invalidate the current 
> PE's TLB entry as well as the others', so if the non-shareable
> version works the inner-shareable one should work as well.
>
> After digging a bit through the code I saw that the non-shareable version 
> calls 'tlb_flush_page_bits_by_mmuidx' which eventually calls
> 'tlb_flush_range_by_mmuidx_async_0' synchronously, while the inner-shareable 
> version calls
> 'tlb_flush_page_bits_by_mmuidx_all_cpus_synced' which also eventually calls 
> 'tlb_flush_range_by_mmuidx_async_0', but asynchronously
> this time.
>
> Moving on to the implementation of the DSB instruction I saw that it is 
> translated into an 'INDEX_op_mb' operation, but looking at the
> interpreter handling of that instruction, it simply performs a memory 
> barrier, it does not handle any of the async tasks in the work queue
> (at least explicitly) so from my (admittedly basic) understanding of the code 
> it looks like QEMU's implementation of the DSB instruction
> does not wait until the TLB flush has finished, as required.

If we exit the translation block like the code for ISB does then that
will give a chance for all the queued work to complete. If we have done
a _synced call this includes bringing all vCPUs to a halt before
flushing and restarting.
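
To make the ordering concrete, here is a toy model of the behaviour described above. It is only a sketch: every name in it (ToyCPU, toy_queue_work, toy_drain_work, toy_run_block) is made up for illustration and none of it is QEMU code. The assumption it encodes is the one from this thread: an inner-shareable TLBI only queues the flush, and the queue is drained when the vCPU leaves the translated block.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 8

typedef struct ToyCPU ToyCPU;
typedef void (*toy_work_fn)(ToyCPU *cpu);

/* Hypothetical stand-in for a vCPU: one TLB entry plus a deferred-work queue. */
struct ToyCPU {
    bool tlb_entry_stale;
    toy_work_fn queue[TOY_MAX_WORK];
    size_t queued;
};

/* The deferred flush: once it runs, the stale entry is gone. */
static void toy_flush(ToyCPU *cpu)
{
    cpu->tlb_entry_stale = false;
}

/* Queue work without running it (models the asynchronous flush path). */
static bool toy_queue_work(ToyCPU *cpu, toy_work_fn fn)
{
    if (cpu->queued == TOY_MAX_WORK) {
        return false;
    }
    cpu->queue[cpu->queued++] = fn;
    return true;
}

/* Run everything queued so far (models what happens between blocks). */
static void toy_drain_work(ToyCPU *cpu)
{
    for (size_t i = 0; i < cpu->queued; i++) {
        cpu->queue[i](cpu);
    }
    cpu->queued = 0;
}

/*
 * Model "tlbi ...is; dsb; <retry the faulting access>".
 * Returns true if the retried access still sees the stale entry.
 */
static bool toy_run_block(ToyCPU *cpu, bool barrier_ends_block)
{
    cpu->tlb_entry_stale = true;
    toy_queue_work(cpu, toy_flush);      /* tlbi: flush is only queued */

    if (barrier_ends_block) {
        toy_drain_work(cpu);             /* dsb ends the block: queue drains */
    }

    bool faulted_again = cpu->tlb_entry_stale;  /* the retried access */

    toy_drain_work(cpu);                 /* the block ends eventually anyway */
    return faulted_again;
}
```

With barrier_ends_block set to false the retried access still sees the stale entry, which matches the repeated Data Aborts reported above; with it set to true the flush has run before the access is retried.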

> If anyone can point me in the right direction it would be greatly
> appreciated.

Try:

modified   target/arm/translate-a64.c
@@ -1553,6 +1553,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
             break;
         }
         tcg_gen_mb(bar);
+        gen_goto_tb(s, 0, s->base.pc_next);
         return;
     case 6: /* ISB */

and see if that helps. I suspect that to be efficient we should probably
do some more decode on the instruction to make that decision, as ending a
block for every DMB/DSB might be overkill and impact performance.
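
As a rough illustration of what "more decode" could look like, the standalone sketch below picks out DSB from the A64 barrier encodings and leaves DMB alone. The helper name toy_barrier_should_end_tb is hypothetical, and "end the block on DSB only" is just one possible policy, not something the translator is known to do. The field positions (op2 in bits [7:5], CRm in bits [11:8]) follow the A64 barrier encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 barrier instructions match this pattern once CRm and op2 are masked out. */
#define TOY_BARRIER_MASK  0xFFFFF01Fu
#define TOY_BARRIER_VALUE 0xD503301Fu

/* op2 values, matching the case labels in the hunk above (ISB is case 6). */
enum {
    TOY_OP2_DSB = 4,
    TOY_OP2_DMB = 5,
    TOY_OP2_ISB = 6,
};

/* Extract 'len' bits starting at bit 'start'. */
static uint32_t toy_extract32(uint32_t value, int start, int len)
{
    return (value >> start) & ((1u << len) - 1u);
}

/*
 * Hypothetical policy: only DSB forces the end of the translation block.
 * DMB only orders memory accesses, so in this sketch it stays a plain
 * barrier. ISB is not handled here because it already ends the block.
 */
static bool toy_barrier_should_end_tb(uint32_t insn)
{
    if ((insn & TOY_BARRIER_MASK) != TOY_BARRIER_VALUE) {
        return false;                    /* not a barrier instruction */
    }
    return toy_extract32(insn, 5, 3) == TOY_OP2_DSB;
}
```

For example, 0xD5033F9F (dsb sy) and 0xD5033B9F (dsb ish) return true, while 0xD5033FBF (dmb sy) returns false. This still ends a block on every DSB, including ones with no TLBI before them.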

I don't think we have a way to track pending state awaiting a DSB
instruction in the translator but in theory we could. I thought
(ri->type & ARM_CP_IO) for system registers would ensure an end of block
but apparently that is only for icount.
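
Purely as a sketch of the "in theory we could" idea: the translator context could carry a flag that a broadcast TLBI sets and the next DSB consumes. Everything below is hypothetical (ToyDisasContext, the tlbi_pending field, and toy_translate_insn do not exist in the tree), and it assumes the TLBI and the DSB are translated as part of the same block.

```c
#include <stdbool.h>

/* Hypothetical instruction classes, just enough for the sketch. */
typedef enum {
    TOY_INSN_OTHER,
    TOY_INSN_TLBI_SHARED,   /* e.g. tlbi vaae1is */
    TOY_INSN_DSB,
} ToyInsnKind;

/* Hypothetical translator state: is a queued flush awaiting a DSB? */
typedef struct {
    bool tlbi_pending;
} ToyDisasContext;

/* Returns true if the block should end after this instruction. */
static bool toy_translate_insn(ToyDisasContext *s, ToyInsnKind kind)
{
    switch (kind) {
    case TOY_INSN_TLBI_SHARED:
        s->tlbi_pending = true;      /* remember the queued flush */
        return false;
    case TOY_INSN_DSB:
        if (s->tlbi_pending) {
            s->tlbi_pending = false; /* this DSB completes it */
            return true;             /* end the block so queued work runs */
        }
        return false;                /* ordinary DSB: keep translating */
    default:
        return false;
    }
}
```

A flag like this only covers a TLBI and DSB translated together; if the block ends between them for some other reason, the state would have to live somewhere that survives across blocks.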

>
> Thanks, Idan Horowitz.


-- 
Alex Bennée


