qemu-devel

From: Nicholas Piggin
Subject: Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Date: Tue, 20 Feb 2024 13:44:21 +1000

On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> On 19/02/2024 07.17, Nicholas Piggin wrote:
> > The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> > aligned ranges forgot to bring the TCG TLB up to date after clearing
> > some of the dirty memory bitmap bits. This can result in stores through
> > the TCG TLB not setting the dirty memory bitmap and ultimately causes
> > memory corruption / lost updates during migration from a TCG host.
> > 
> > Fix this by exporting an abstracted function to call when dirty bits
> > have been cleared.
> > 
> > Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
>
> Sounds promising! ... but it doesn't seem to fix the migration-test qtest 
> with s390x when it gets enabled again:
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
>           return g_test_run();
>       }
>
> -    /*
> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> -     * there until the problems are resolved
> -     */
> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> -        g_test_message("Skipping test: s390x host with KVM is required");
> -        return g_test_run();
> -    }
> -
>       tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
>       if (!tmpfs) {
>           g_test_message("Can't create temporary directory in %s: %s",
>
> I wonder whether more fixes like this are needed somewhere else?
>
> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to 
> see whether that works fine now?

I'm seeing a hang about every 10 minutes with s390x. ppc64 is reliable
so far.

So neither of my patches fixed the problem for s390. The test seems to
just stop running, so maybe it's a harness problem? I didn't dig into
what state the machine is in at that point.

I did recently fix a few ppc64 migration issues that came up while
testing reverse debugging. That was very good for finding problems
(though the failures were very difficult to diagnose). Maybe that
helped the stability of this test?

Thanks,
Nick
