
Re: [PATCH v5 10/10] migration: introduce snapshot-{save,load,delete} QMP commands


From: Dr. David Alan Gilbert
Subject: Re: [PATCH v5 10/10] migration: introduce snapshot-{save,load,delete} QMP commands
Date: Tue, 6 Oct 2020 18:36:30 +0100
User-agent: Mutt/1.14.6 (2020-07-11)

* Eric Blake (eblake@redhat.com) wrote:
> On 10/2/20 11:27 AM, Daniel P. Berrangé wrote:
> > savevm, loadvm and delvm are some of the few HMP commands that have never
> > been converted to use QMP. The reasons for the lack of conversion are
> > that they blocked execution of the event thread, and the semantics
> > around choice of disks were ill-defined.
> > 
> > Despite this downside, however, libvirt and applications using libvirt
> > have used these commands for as long as QMP has existed, via the
> > "human-monitor-command" passthrough command. IOW, while it is clearly
> > desirable to be able to fix the problems, they are not a blocker to
> > all real world usage.
> > 
> > Meanwhile there is a need for other features which involve adding new
> > parameters to the commands. This is possible with HMP passthrough, but
> > it provides no reliable way for apps to introspect features, so using
> > QAPI modelling is highly desirable.
> > 
> > This patch thus introduces new snapshot-{load,save,delete} commands to
> > QMP that are intended to replace the old HMP counterparts. The new
> > commands are given different names, because they will be using the new
> > QEMU job framework and thus will have diverging behaviour from the HMP
> > originals. It would thus be misleading to keep the same name.
> > 
> > While this design uses the generic job framework, the current impl is
> > still blocking. The intention is that the blocking problem is fixed later.
> > Nonetheless, applications using these new commands should assume that
> > they are asynchronous and thus wait for the job status change event to
> > indicate completion.
> > 
> > In addition to using the job framework, the new commands require the
> > caller to be explicit about all the block device nodes used in the
> > snapshot operations, with no built-in default heuristics in use.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> 
> > +++ b/qapi/job.json
> > @@ -22,10 +22,17 @@
> >  #
> >  # @amend: image options amend job type, see "x-blockdev-amend" (since 5.1)
> >  #
> > +# @snapshot-load: snapshot load job type, see "snapshot-load" (since 5.2)
> > +#
> > +# @snapshot-save: snapshot save job type, see "snapshot-save" (since 5.2)
> > +#
> > +# @snapshot-delete: snapshot delete job type, see "snapshot-delete" (since 5.2)
> > +#
> >  # Since: 1.7
> >  ##
> >  { 'enum': 'JobType',
> > -  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend'] }
> > +  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
> > +           'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
> >  
> >  ##
> >  # @JobStatus:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 7f5e6fd681..d2bd551ad9 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1787,3 +1787,123 @@
> >  # Since: 5.2
> >  ##
> >  { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
> > +
> > +##
> > +# @snapshot-save:
> > +#
> > +# Save a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to create
> > +# @devices: list of block device node names to save a snapshot to
> > +# @vmstate: block device node name to save vmstate to
> 
> Here, you document vmstate last,...
> 
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> > +#
> > +# Note that the VM CPUs will be paused during the time it takes to
> > +# save the snapshot
> 
> "will be", or "may be"?  As you stated above, we may be able to lift the
> synchronous limitations down the road, while still maintaining the
> present interface of using this command to start the job and waiting on
> the job id until it is finished, at which point the CPUs might not need
> to be paused as much.
> 
> > +#
> > +# It is strongly recommended that @devices contain all writable
> > +# block device nodes if a consistent snapshot is required.
> > +#
> > +# If @tag already exists, an error will be reported
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-save",
> > +#      "data": {
> > +#         "job-id": "snapsave0",
> > +#         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> > +#         "devices": ["disk0", "disk1"]
> 
> ...here vmstate occurs before devices.  I don't know if our doc
> generator cares about inconsistent ordering.
> 
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'snapshot-save',
> > +  'data': { 'job-id': 'str',
> > +            'tag': 'str',
> > +            'vmstate': 'str',
> > +            'devices': ['str'] } }
> > +
> > +##
> > +# @snapshot-load:
> > +#
> > +# Load a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to load.
> > +# @devices: list of block device node names to load a snapshot from
> > +# @vmstate: block device node name to load vmstate from
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> 
> s/save/load/
> 
> > +#
> > +# Note that the VM CPUs will be paused during the time it takes to
> > +# save the snapshot
> 
> s/save/load/
> 
> But while pausing CPUs during save is annoying, pausing CPUs during
> restore makes sense (after all, executing on stale data that will still
> be updated during the restore is just wasted execution).

Note that there are other snapshotting schemes that can do this more
dynamically and page/load the state on demand - a rapid resume from
snapshot like that is quite attractive.
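
As an aside on the job-based flow in the quoted doc comments ("The job
commands / events must be used to determine completion"), here is a rough
client-side sketch of what that could look like. It assumes the existing
job framework pieces (the JOB_STATUS_CHANGE event, query-jobs) behave for
these new job types as they do for block jobs, which this patch does not
spell out. The function names are hypothetical, the socket transport is
left out, and the request puts the parameters under "arguments" as QMP
expects on the wire.

```python
def build_snapshot_load(job_id, tag, vmstate, devices):
    """Build a QMP request for the proposed snapshot-load command."""
    return {
        "execute": "snapshot-load",
        "arguments": {
            "job-id": job_id,
            "tag": tag,
            "vmstate": vmstate,
            "devices": list(devices),
        },
    }


def job_concluded(messages, job_id):
    """Return True once a JOB_STATUS_CHANGE event reports job_id as concluded.

    'messages' is any iterable of already-decoded QMP messages (dicts);
    replies and unrelated events are skipped.
    """
    for msg in messages:
        if msg.get("event") != "JOB_STATUS_CHANGE":
            continue
        data = msg.get("data", {})
        if data.get("id") == job_id and data.get("status") == "concluded":
            return True
    return False


def job_error(query_jobs_reply, job_id):
    """Return the 'error' string for job_id from a query-jobs reply.

    Returns None if the job carries no error; raises KeyError if the
    job is not listed at all.
    """
    for job in query_jobs_reply.get("return", []):
        if job.get("id") == job_id:
            return job.get("error")
    raise KeyError(job_id)
```

A caller would send the request, feed incoming messages to job_concluded(),
then issue query-jobs and pass the reply to job_error() before dismissing
the job with job-dismiss.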

Dave

> 
> > +#
> > +# It is strongly recommended that @devices contain all writable
> > +# block device nodes that can have changed since the original
> > +# @snapshot-save command execution.
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-load",
> > +#      "data": {
> > +#         "job-id": "snapload0",
> > +#         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> > +#         "devices": ["disk0", "disk1"]
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'snapshot-load',
> > +  'data': { 'job-id': 'str',
> > +            'tag': 'str',
> > +            'vmstate': 'str',
> > +            'devices': ['str'] } }
> > +
> > +##
> > +# @snapshot-delete:
> > +#
> > +# Delete a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to delete.
> > +# @devices: list of block device node names to delete a snapshot from
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> 
> Do we have a query- command handy to easily learn which snapshot names
> are even available to attempt deletion on?  If not, that's worth a
> separate patch.
> 
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-delete",
> > +#      "data": {
> > +#         "job-id": "snapdelete0",
> > +#         "tag": "my-snap",
> > +#         "devices": ["disk0", "disk1"]
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> 
> > +++ b/tests/qemu-iotests/group
> > @@ -291,6 +291,7 @@
> >  277 rw quick
> >  279 rw backing quick
> >  280 rw migration quick
> > +310 rw quick
> >  281 rw quick
> >  282 rw img quick
> >  283 auto quick
> 
> What's wrong with sorted order? I get the renumbering to appease a merge
> conflict, but it also requires rearrangement ;)
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 



-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



