Subject: Re: [RFC PATCH] accel/tcg: clear all TBs from a page when it is written to
From: Nicholas Piggin
Date: Wed, 14 Aug 2024 16:09:36 +1000
On Mon Aug 12, 2024 at 11:25 AM AEST, Richard Henderson wrote:
> On 8/9/24 17:47, Nicholas Piggin wrote:
> > This is not a clean patch, but it does fix a problem I hit with TB
> > invalidation when the target software writes to memory that contains TBs.
> >
> > Lockup messages are triggering in Linux because page clearing takes a
> > long time when a code page has been freed: the clearing goes through a
> > huge number of notdirty notifier calls, which massively slows things
> > down. Linux might possibly have a bug here too, because it seems to
> > hang indefinitely in some cases, but even if it didn't, the latency of
> > clearing these pages is very high.
> >
> > This showed up when running KVM on the emulated machine and starting
> > and stopping guests, which causes lots of instruction pages to be
> > freed. Usually if you're just running Linux, executable pages remain
> > in the pagecache, so you get fewer of these bombs in the kernel memory
> > allocator. But page reclaim, JITs, deleting executable files, etc.,
> > could trigger it too.
> >
> > Invalidating all TBs from the page on any hit seems to avoid the problem
> > and generally speeds things up.
> >
> > How important is the precise invalidation? These days I assume the
> > tricky kind of SMC that frequently writes code close to where it's
> > executing is pretty rare and might not be something we really care about
> > for performance. Could we remove sub-page TB invalidation entirely?
>
> Happens on x86 and s390 regularly enough, so we can't remove it.
>
> > @@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> >      TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
> >  #endif /* TARGET_HAS_PRECISE_SMC */
> >  
> > +    start &= TARGET_PAGE_MASK;
> > +    last |= ~TARGET_PAGE_MASK;
> > +
> >      /* Range may not cross a page. */
> >      tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
>
> This would definitely break SMC.
They can't invalidate the instruction currently being executed?
I'll experiment a bit more.
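(For reference, the hunk above just widens whatever sub-page range comes
in to the whole page before the assertion. A standalone illustration of
the masking, with hypothetical values and assuming 4 KiB pages; this is
not QEMU code:

  #include <inttypes.h>
  #include <stdio.h>

  #define TARGET_PAGE_BITS 12
  #define TARGET_PAGE_MASK ((uint64_t)-1 << TARGET_PAGE_BITS)

  int main(void)
  {
      /* A 4-byte store at 0x12345 arrives as [start, last]. */
      uint64_t start = 0x12345, last = 0x12348;

      start &= TARGET_PAGE_MASK;  /* -> 0x12000, first byte of the page */
      last |= ~TARGET_PAGE_MASK;  /* -> 0x12fff, last byte of the page */

      printf("invalidate [0x%" PRIx64 ", 0x%" PRIx64 "]\n", start, last);
      return 0;
  }

After widening, every TB on the page overlaps the range and gets
invalidated, which is where the SMC breakage comes from.)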
> However, there's a better solution. We're already iterating over all of
> the TBs on the current page only. Move *everything* except the
> tb_phys_invalidate__locked call into the SMC ifdef, and unconditionally
> invalidate every TB selected in the loop.
Okay. I suspect *most* of the time even the strict SMC archs would
not be writing to the same page they're executing either. But I can
start with the !SMC case.
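Just to make sure I follow the shape, here is a toy, self-contained
model of that loop (all names here are invented for illustration; the
real loop is PAGE_FOR_EACH_TB in tb_invalidate_phys_page_range__locked):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Toy stand-in for a TranslationBlock's guest-physical extent. */
  typedef struct ToyTB { uint64_t first, last; bool valid; } ToyTB;

  static void invalidate_range(ToyTB *tbs, size_t n,
                               uint64_t start, uint64_t last)
  {
      for (size_t i = 0; i < n; i++) {
          ToyTB *tb = &tbs[i];
          if (!tb->valid || tb->last < start || tb->first > last) {
              continue;  /* TB does not overlap the written range */
          }
  #ifdef TARGET_HAS_PRECISE_SMC
          /* Only this bookkeeping would stay SMC-specific: notice that
           * the write came from the TB currently executing and arrange
           * for it to be retranslated and restarted. */
  #endif
          tb->valid = false;  /* stands in for tb_phys_invalidate__locked() */
      }
  }

  int main(void)
  {
      ToyTB tbs[] = { { 0x1000, 0x10ff, true }, { 0x1200, 0x12ff, true } };
      invalidate_range(tbs, 2, 0x1234, 0x1237);  /* overlaps only the 2nd */
      printf("%d %d\n", tbs[0].valid, tbs[1].valid);  /* prints "1 0" */
      return 0;
  }

i.e. the overlap test and the invalidation stay unconditional, and only
the restart-the-current-TB bookkeeping sits under the ifdef.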
> We experimented with something like this for aarch64, which used to
> spend a lot of the kernel startup time invalidating code pages from the
> (somewhat bloated) EDK2 bios. But it turned out the bigger problem was
> address space randomization, and with CF_PCREL the problem appeared to
> go away.
Interesting.
> I don't think we've done any kvm-under-tcg performance testing, but
> lockup messages would certainly be something to look for...
Yeah, actually Linux has been throwing these messages a bit more often
recently, at least on distros that enable page clearing at allocation
for security, because that clearing is a big chunk of work that can
happen in critical sections.
Thanks for the suggestion, I'll give it a try.
Thanks,
Nick