qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V2 06/11] migration: fix mismatched GPAs during cpr


From: Peter Xu
Subject: Re: [PATCH V2 06/11] migration: fix mismatched GPAs during cpr
Date: Fri, 16 Aug 2024 10:43:23 -0400

On Thu, Aug 15, 2024 at 04:54:58PM -0400, Steven Sistare wrote:
> On 8/13/2024 4:43 PM, Peter Xu wrote:
> > On Wed, Aug 07, 2024 at 05:04:26PM -0400, Steven Sistare wrote:
> > > On 7/19/2024 12:28 PM, Peter Xu wrote:
> > > > On Sun, Jun 30, 2024 at 12:40:29PM -0700, Steve Sistare wrote:
> > > > > For new cpr modes, ramblock_is_ignored will always be true, because 
> > > > > the
> > > > > memory is preserved in place rather than copied.  However, for an 
> > > > > ignored
> > > > > block, parse_ramblock currently requires that the received address of 
> > > > > the
> > > > > block must match the address of the statically initialized region on 
> > > > > the
> > > > > target.  This fails for a PCI rom block, because the memory region 
> > > > > address
> > > > > is set when the guest writes to a BAR on the source, which does not 
> > > > > occur
> > > > > on the target, causing a "Mismatched GPAs" error during cpr migration.
> > > > 
> > > > Is this a common fix with/without cpr mode?
> > > > 
> > > > It looks to me mr->addr (for these ROMs) should only be set in PCI 
> > > > config
> > > > region updates as you mentioned.  But then I didn't figure out when 
> > > > they're
> > > > updated on dest in live migration: the ramblock info was sent at the
> > > > beginning of migration, so it doesn't even have PCI config space 
> > > > migrated;
> > > > I thought the real mr->addr should be in there.
> > > > 
> > > > I also failed to understand yet on why the mr->addr check needs to be 
> > > > done
> > > > by ignore-shared only.  Some explanation would be greatly helpful around
> > > > this area..
> > > 
> > > The error_report does not bite for normal migration because 
> > > migrate_ram_is_ignored()
> > > is false for the problematic blocks, so the block->mr->addr check is not
> > > performed.  However, mr->addr is never fixed up in this case, which is a
> > > quiet potential bug, and this patch fixes that with the "has_addr" check.
> > > 
> > > For cpr-exec, migrate_ram_is_ignored() is true for all blocks,
> > > because we do not copy the contents over the migration stream, we 
> > > preserve the
> > > memory in place.  So we fall into the block->mr->addr sanity check and 
> > > fail
> > > with the original code.
> > 
> > OK I get your point now.  However this doesn't look right, instead I start
> > to question why we need to send mr->addr at all..
> > 
> > As I said previously, AFAIU mr->addr should only be updated when there's
> > some PCI config space updates so that it moves the MR around in the address
> > space based on how guest drivers / BIOS (?) set things up.  Now after these
> > days not looking, and just started to look at this again, I think the only
> > sane place to do this update is during a post_load().
> > 
> > And if we start to check some of the memory_region_set_address() users,
> > that's exactly what happened..
> > 
> >    - ich9_pm_iospace_update(), update addr for ICH9LPCPMRegs.io, where
> >      ich9_pm_post_load() also invokes it.
> > 
> >    - pm_io_space_update(), updates PIIX4PMState.io, where
> >      vmstate_acpi_post_load() also invokes it.
> > 
> > I stopped here just looking at the initial two users, it looks all sane to
> > me that it only got updated there, because the update requires pci config
> > space being migrated first.
> > 
> > IOW, I don't think having mismatched mr->addr is wrong at this stage.
> > Instead, I don't see why we should send mr->addr at all in this case during
> > as early as SETUP, and I don't see anything justifies the mr->addr needs to
> > be verified in parse_ramblock() since ignore-shared introduced by Yury in
> > commit fbd162e629aaf8 in 2019.
> > 
> > We can't drop mr->addr now when it's on-wire, but I think we should drop
> > the error report and addr check, instead of this patch.
> 
> As it turns out, my test case triggers this bug because it sets 
> x-ignore-shared,
> but x-ignore-shared is not needed for cpr-exec, because migrate_ram_is_ignored
> is true for all blocks when mode==cpr-exec.  So, the best fix for the GPAs bug
> for me is to stop setting x-ignore-shared.  I will drop this patch.
> 
> I agree that post_load is the right place to restore mr->addr, and I don't
> understand why commit fbd162e629aaf8 added the error report, but I am going
> to leave it as is.

Ah, I didn't notice that cpr special cased migrate_ram_is_ignored()..

Shall we stick with the old check, but always require cpr to rely on
ignore-shared?

Then we replace this patch with removing the error_report, probably
together with not caring about whatever is received at all.. would that be
cleaner?

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]