Re: [PATCH] migration/docs: Explain two solutions for VMSD compatibility
From: Peter Xu
Subject: Re: [PATCH] migration/docs: Explain two solutions for VMSD compatibility
Date: Tue, 30 Jan 2024 12:25:47 +0800

On Mon, Jan 29, 2024 at 03:51:07PM +0000, Peter Maydell wrote:
> On Mon, 29 Jan 2024 at 15:18, Fabiano Rosas <farosas@suse.de> wrote:
> >
> > Peter Maydell <peter.maydell@linaro.org> writes:
> >
> > > On Mon, 29 Jan 2024 at 13:45, Fabiano Rosas <farosas@suse.de> wrote:
> > >>
> > >> Peter Xu <peterx@redhat.com> writes:
> > >> > Fundamentally, IMHO it's because QEMU as a project is used both in
> > >> > enterprise and personal emulations. I think it might be too strict to
> > >> > always request backward migration capability if we know some
> > >> > device / arch is only used for personal, or educational, purposes.
> > >>
> > >> Do we need migration support tiers? =)
> > >
> > > We already have them. The tier list is:
> >
> > Ah that's good. Thanks, Peter.
> >
> > >
> > > * if the machine type is a versioned one, then we maintain
> > > forwards compatibility for the versioned machine
> > > (i.e. can migrate machine-X.Y of QEMU A.B to the
> > > machine-X.Y of a QEMU C.D which is newer than A.B).
> > > * if the machine type is not versioned, then we do not make
> > > any guarantee of migration compatibility across QEMU versions.
> > > Instead the aim is that if the user tries it it either works
> > > or gives an error message that the migration failed
> > > (e.g. because the version field in a VMState struct was bumped).
> > > Migration breaks are generally called out in commit messages.
> > > Often for machines in this tier the user is really interested
> > > in state-save snapshots for debugging purposes, rather than
> > > in a true cross-host-machine migration.
> > > * some machine types do not support migration/savevm/loadvm
> > > at all, because of devices missing VMState structs. This
> > > is not desirable, and for new machine models we try to
> > > ensure that they have vmstate structs as part of the minimum
> > > quality bar, but it is true of some legacy machine types.
> >
> > Hm, does this mean in some cases we're requiring new models to have
> > vmstate only to never look at them again? Or do you mean some versioned
> > machines are currently broken?
>
> New device models have vmstate; we don't actively test that
> savevm/loadvm works, but as with most device models we fix bugs
> if anybody reports them. Some older device models simply omit
> the vmstate struct completely (which results in the guest not
> behaving right after savevm/loadvm); a few at least register a
> migration blocker. Usually if somebody's doing a refactoring
> and cleanup of an old device they'll add the vmstate while they're
> doing it.
>
> Any device which is used by a versioned machine type is supposed
> to have the vmstate support.
>
> > > AIUI we, in the sense of the upstream project, do not support
> > > backwards migration compatibility (i.e. migrating a machine-X.Y
> > > from QEMU C.D to QEMU A.B where A.B is an older version than C.D);
> > > though some downstreams (read: RedHat) may do so.
> >
> > Here we still need to make a distinction between migration code and
> > vmstate. If we simply ignore backwards migration then it might become
> > impossible for downstreams to provide it without major
> > modifications. But luckily this is the easy case.
>
> Yeah, there's no reason for us to make our downstreams' lives
> harder; the "not supported upstream" part is a mix of
> (a) we don't test it so it probably doesn't work and
> (b) we're not going to insist on patch submitters tying themselves
> in knots over trying to implement a level of compatibility for
> a device when we don't advertise that it's supposed to work

(b) makes sense.  I suppose that further justifies what this document wants
to make clear: we should probably allow vmsd versioning to be used, which is
already better than either being migration-incompatible or not supporting
migration at all on the system.
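
To make the versioning point concrete, here is a toy model in Python.  It is
not QEMU code: ToyVMSD, ToyField and the device fields are made up for
illustration, and it only mimics the usual version_id / minimum_version_id
check.  A newer side accepts an older stream and leaves later fields at their
defaults, while an older side rejects a stream whose version it does not
know.

```python
from dataclasses import dataclass, field


@dataclass
class ToyField:
    name: str
    since: int = 1  # first section version that carries this field


@dataclass
class ToyVMSD:
    """Toy stand-in for a versioned device section (hypothetical)."""
    name: str
    version_id: int
    minimum_version_id: int
    fields: list = field(default_factory=list)

    def save(self, state):
        # The sender always writes its own (latest) version.
        data = {f.name: state[f.name] for f in self.fields}
        return {"version": self.version_id, "data": data}

    def load(self, stream, defaults):
        version = stream["version"]
        if version > self.version_id:
            raise ValueError(f"{self.name}: version {version} is too new")
        if version < self.minimum_version_id:
            raise ValueError(f"{self.name}: version {version} is too old")
        state = dict(defaults)
        for f in self.fields:
            # Fields added after the stream's version are not in the stream.
            if f.since <= version:
                state[f.name] = stream["data"][f.name]
        return state


# "Old QEMU" knows version 1; "new QEMU" bumped to 2 and added a field.
old_qemu = ToyVMSD("toy-dev", 1, 1, [ToyField("ctrl")])
new_qemu = ToyVMSD("toy-dev", 2, 1,
                   [ToyField("ctrl"), ToyField("irq_mask", since=2)])
```

In this model, old_qemu.save() followed by new_qemu.load() succeeds (forward
migration), while new_qemu.save() followed by old_qemu.load() raises an
error, which is the backward direction a plain version bump does not cover.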

But shouldn't we still have a tier that requests "backward migration"?  Yes,
it's not covered by any test yet, but maybe some day we can have it.
Besides, we're really discussing the goal here, which can guide suggestions
to patch submitters and reviewers.  I think that's still a fair goal, so
that on extremely popular devices + architectures we still ask for
bi-directional migration capability.

Do we have a place where such tiering is documented?  Should I add one more
patch to describe it?

I think the hard part is still how to decide which change requires which
tier: we were discussing machine types, but normally IIUC many devices can
be used in multiple machine types.  In that case, should we pick the
strictest rule out of the machine types that can use the device?

Maybe it'll be easier if we just decide that during review.
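
As a rough sketch of the "strictest rule" idea, assuming we numbered the
tiers from weakest to strongest: the machine and device names below are
invented, and the BIDIRECTIONAL tier is only the one proposed in this mail,
not something upstream promises today.

```python
from enum import IntEnum


class Tier(IntEnum):
    NONE = 0           # no migration support (devices lack vmstate)
    BEST_EFFORT = 1    # unversioned machine: works, or fails with an error
    FORWARD = 2        # versioned machine: older QEMU -> newer QEMU
    BIDIRECTIONAL = 3  # proposed here: also newer QEMU -> older QEMU


# Hypothetical machine types and the tier each one is held to.
MACHINE_TIER = {
    "versioned-board-8.2": Tier.FORWARD,
    "hobby-board": Tier.BEST_EFFORT,
    "legacy-board": Tier.NONE,
}

# Hypothetical devices and the machine types that can use them.
DEVICE_MACHINES = {
    "shared-uart": ["versioned-board-8.2", "hobby-board", "legacy-board"],
    "hobby-timer": ["hobby-board"],
}


def required_tier(device):
    """Return the strictest tier among the machines that use the device."""
    return max(MACHINE_TIER[m] for m in DEVICE_MACHINES[device])
```

With this, a change to "shared-uart" would be reviewed against the FORWARD
rules because one of its users is a versioned machine, while "hobby-timer"
only needs BEST_EFFORT.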
--
Peter Xu