Re: [PATCH v2 06/29] migration: Add auto-pause capability
From: Peter Xu
Date: Wed, 25 Oct 2023 10:58:16 -0400

On Wed, Oct 25, 2023 at 03:20:16PM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 25, 2023 at 10:57:12AM -0300, Fabiano Rosas wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> > > On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> > >> Add a capability that allows the management layer to delegate to QEMU
> > >> the decision of whether to pause a VM and perform a non-live
> > >> migration. Depending on the type of migration being performed, this
> > >> could bring performance benefits.
> > >
> > > I'm not really seeing what problem this is solving.
> > >
> >
> > Well, this is the fruit of your discussion with Peter Xu in the previous
> > version of the patch.
> >
> > To recap: he thinks QEMU is doing useless work with file migrations
> > because they are always asynchronous. He thinks we should always pause
> > before doing fixed-ram migration. You said that libvirt would rather use
> > fixed-ram for a more broad set of savevm-style commands, so you'd rather
> > not always pause. I'm trying to cater to both of your wishes. This new
> > capability is the middle ground I came up with.
> >
> > So fixed-ram would always pause the VM, because that is the primary
> > use-case, but libvirt would be allowed to say: don't pause this time.
>
> If the VM is going to be powered off immediately after saving
> a snapshot then yes, you might as well pause it, but we can't
> assume that will be the case. An equally common use case
> would be for saving periodic snapshots of a running VM. This
> should be transparent such that the VM remains running the
> whole time, except a narrow window at completion of RAM/state
> saving where we flip the disk snapshots, so they are in sync
> with the RAM snapshot.

Libvirt will still use fixed-ram for live snapshots, especially for
Windows? Then auto-pause may still be useful to distinguish that case from
what Fabiano wants to achieve here (which is, in reality, non-live)?

IIRC from the previous discussion, the major point was that libvirt can
still leverage fixed-ram for a live case, since Windows lacks an efficient
live snapshot (the background-snapshot feature).

From that POV it sounds like auto-pause is a good knob for that.

>
> IOW, save/restore to disk can imply paused, but snapshotting
> should not imply paused. So I don't see an unambiguous
> rationale that we should diverge when fixed-ram is set and
> auto-pause the VM.
>
> > > Mgmt apps are perfectly capable of pausing the VM before issuing
> > > the migrate operation.
> > >
> >
> > Right. But would QEMU be allowed to just assume that if a VM is paused
> > at the start of migration it can then go ahead and skip all dirty page
> > mechanisms?
>
> Skipping dirty page tracking would imply that the mgmt app cannot
> resume CPUs without either letting the operation complete, or
> aborting it.
>
> That is probably a reasonable assumption, as I can't come up with
> a use case for starting out paused and then later resuming, unless
> there was a scenario where you needed to synchronize something
> external with the start of migration. Synchronizing storage though
> is something that happens at the end of migration instead.
>
> > Without pausing, we're basically doing *live* migration into a static
> > file that will be kept on disk for who knows how long before being
> > restored on the other side. We could release the src QEMU resources (a
> > bit) earlier if we paused the VM beforehand.
>
> Can we really release resources early ? If the save operation fails
> right at the end, we want to be able to resume execution of CPUs,
> which assumes all resources are still available, otherwise we have
> a failure scenario where we've not successfully saved to disk and
> also don't still have the running QEMU.

Indeed we need to consider what happens if the user starts the VM again
during an auto-pause enabled migration. There are a few options, and one of
them should allow freeing resources early. Assuming auto-pause=on and the
migration has started, then:

  1) Allow the VM to start later

    1.a) Start dirty tracking right at this point

         I don't prefer this. It makes everything transparent, but IMHO
         adds unnecessary complexity in maintaining dirty tracking status.

    1.b) Fail the migration

         Can be a good option, IMHO, treating auto-pause as a promise from
         the user that the VM won't need to run anymore. If the VM starts,
         the promise is broken and the migration fails.

  2) Don't allow the VM to start later

     Can also be a good option. In this case VM resources (I think
     mostly RAM) can be freed right after they are migrated. If the user
     requests a VM start, fail the start instead of the migration itself.
     The migration must succeed or data is lost.
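
To make the trade-off concrete, here is a minimal sketch in Python (not QEMU
code) of how the three options above could react to a start request that
arrives during an auto-pause migration. All names (Policy, Outcome,
on_start_request, can_free_ram_early) are made up for illustration and do
not exist in QEMU; it only models the decisions, not the migration itself.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical labels for the options above; not QEMU identifiers.
    START_DIRTY_TRACKING = "1.a"
    FAIL_MIGRATION = "1.b"
    REFUSE_START = "2"


class Outcome(Enum):
    VM_RUNNING_TRACKING_ON = "VM runs, dirty tracking enabled late"
    VM_RUNNING_MIGRATION_FAILED = "VM runs, migration failed"
    VM_PAUSED_START_REJECTED = "VM stays paused, start request rejected"


def on_start_request(policy: Policy) -> Outcome:
    """What happens when the user asks to resume the VM mid-migration."""
    if policy is Policy.START_DIRTY_TRACKING:
        return Outcome.VM_RUNNING_TRACKING_ON
    if policy is Policy.FAIL_MIGRATION:
        return Outcome.VM_RUNNING_MIGRATION_FAILED
    return Outcome.VM_PAUSED_START_REJECTED


def can_free_ram_early(policy: Policy) -> bool:
    """Only option 2 guarantees the VM never runs again before the
    migration ends, so only there can source RAM be released as soon as
    it has been written out."""
    return policy is Policy.REFUSE_START
```

With 1.a and 1.b the source must keep everything around until completion,
because the VM may run again; only 2) trades that away for early freeing.
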
Thanks,
>
> > We're basically talking about whether we want the VM to be usable in the
> > (hopefully) very short time between issuing the migration command and
> > the migration being finished. We might be splitting hairs here, but we
> > need some sort of consensus.
>
> The time may not be very short for large VMs.
>
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
--
Peter Xu