[PATCH v7 00/11] migration: bring improved savevm/loadvm/delvm to QMP
From: Daniel P. Berrangé
Subject: [PATCH v7 00/11] migration: bring improved savevm/loadvm/delvm to QMP
Date: Wed, 21 Oct 2020 17:26:53 +0100
v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07523.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg07076.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg05221.html
v5: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg00587.html
v6: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg02158.html
This series aims to provide a better designed replacement for the
savevm/loadvm/delvm HMP commands, which despite their flaws continue
to be actively used in the QMP world via the HMP command passthrough
facility.
The main problems addressed are:
- The logic to pick which disk to store the vmstate in is not
satisfactory.
The first block driver state cannot be assumed to be the root disk
image; it might be the OVMF varstore, and we don't want to store
vmstate in there.
- The logic to decide which disks must be snapshotted is hardwired
to all disks which are writable.
Again with OVMF there might be a writable varstore, but this can be
raw rather than qcow2 format, and thus unable to be snapshotted.
While users might wish to snapshot their varstore, in some/many/most
cases it is entirely unnecessary. Because of this varstore, though,
users are blocked from snapshotting their VM.
- The commands are synchronous, blocking execution and returning
errors immediately.
This is partially addressed by integrating with the job framework.
This forces the client to use the async job commands to determine
the completion status or error message from the operations.
In the block code I've only dealt with node names for block devices, as
IIUC, this is all that libvirt should need in the -blockdev world it now
lives in. IOW, I've made no attempt to cope with people wanting to use
these QMP commands in combination with -drive args, as libvirt will
never use -drive with a QEMU new enough to have these new commands.
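As an illustration of the explicit device selection described above, here
is a minimal Python sketch that builds the JSON for the new commands. The
argument names ("job-id", "tag", "vmstate", "devices") and the node names
"disk0" and "varstore0" are assumptions made for this example; this cover
letter does not spell out the schema, so qapi/migration.json in the series
remains the authority.

```python
import json


def build_snapshot_command(action, job_id, tag, devices, vmstate=None):
    """Build a QMP command dict for snapshot-{save,load,delete}.

    Argument names are assumed for illustration, not taken from the
    cover letter; check qapi/migration.json for the real schema.
    """
    if action not in ("save", "load", "delete"):
        raise ValueError("unknown snapshot action: %s" % action)
    if not devices:
        # The device list is mandatory since v4: no built-in heuristics.
        raise ValueError("an explicit list of block node names is required")

    args = {"job-id": job_id, "tag": tag, "devices": list(devices)}
    if action in ("save", "load"):
        if vmstate is None:
            raise ValueError("save/load need a node name for the vmstate")
        args["vmstate"] = vmstate
    return {"execute": "snapshot-" + action, "arguments": args}


# Hypothetical guest: 'disk0' is a qcow2 root disk, 'varstore0' is a raw
# OVMF varstore. Only disk0 is listed, so the raw varstore does not block
# the snapshot and the vmstate is never stored in it.
cmd = build_snapshot_command(
    "save", "snapsave0", "my-snap", devices=["disk0"], vmstate="disk0"
)
print(json.dumps(cmd, indent=2))
```

Because the list is an include list rather than an exclude list, leaving
the varstore out is simply a matter of not naming it.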
The main limitations of the current implementation are:
- The snapshot process runs serialized in the main thread, i.e. QEMU
guest execution is blocked for the duration. The job framework
lets us fix this in future without changing the QMP semantics
exposed to the apps.
- Most vmstate loading errors just go to stderr, as they are not
using Error **errp reporting. Thus the job framework just
reports a fairly generic message:
"Error -22 while loading VM state"
Again this can be fixed later without changing the QMP semantics
exposed to apps.
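To make the client side of this concrete, the following Python sketch
shows how an application might read the outcome of a snapshot job from a
job listing. The reply shape (a "return" list whose entries carry "id",
"status" and an optional "error") is an assumption for illustration, as
is the job id "snapload0"; the sample error text is the generic message
quoted above.

```python
def job_outcome(query_jobs_reply, job_id):
    """Return (state, error) for job_id from a job listing reply.

    state is one of "unknown", "running", "ok" or "failed".
    The reply layout is assumed for illustration only.
    """
    for job in query_jobs_reply.get("return", []):
        if job.get("id") != job_id:
            continue
        if job.get("status") != "concluded":
            # Not finished yet: the client has to keep waiting.
            return ("running", None)
        error = job.get("error")
        return ("failed", error) if error else ("ok", None)
    return ("unknown", None)


# Hypothetical reply for a failed snapshot load.
sample_reply = {
    "return": [
        {
            "id": "snapload0",
            "type": "snapshot-load",
            "status": "concluded",
            "error": "Error -22 while loading VM state",
        }
    ]
}

state, error = job_outcome(sample_reply, "snapload0")
print(state, error)
```

With the current limitation, the detailed cause of such a failure would
only be visible on QEMU's stderr, not in the error string seen here.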
I've done some minimal work in libvirt to start to make use of the new
commands to validate their functionality, but this isn't finished yet.
My ultimate goal is to make the GNOME Boxes maintainer happy again by
having internal snapshots work with OVMF:
https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e
Changed in v7:
- Incorporate changes from:
https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03165.html
- Tweaked error message
Changed in v6:
- Resolve many conflicts with recent replay changes
- Misc typos in QAPI
Changed in v5:
- Fix prevention of tag overwriting
- Refactor and expand test suite coverage to validate
more negative scenarios
Changed in v4:
- Make the device lists mandatory, dropping all support for
QEMU's built-in heuristics to select devices.
- Improve some error reporting and I/O test coverage
Changed in v3:
- Schedule a bottom half to escape from coroutine context in
the jobs. This is needed because the locking in the snapshot
code goes horribly wrong when run from a background coroutine
instead of the main event thread.
- Refactor the way we iterate over devices, so that we correctly
report non-existent devices passed by the user over QMP.
- Add QAPI docs notes about limitations wrt vmstate error
reporting (it all goes to stderr not an Error **errp)
so QMP only gets a fairly generic error message currently.
- Add I/O test to validate many usage scenarios / errors
- Add I/O test helpers to handle QMP events with a deterministic
ordering
- Ensure 'delete-snapshot' reports an error if requesting
deletion from devices that don't support snapshots, instead of
silently succeeding with no error.
Changed in v2:
- Use new command names "snapshot-{load,save,delete}" to make it
clear that these are different from the "savevm|loadvm|delvm"
as they use the Job framework
- Use an include list for block devs, not an exclude list
Daniel P. Berrangé (10):
block: push error reporting into bdrv_all_*_snapshot functions
migration: stop returning errno from load_snapshot()
block: add ability to specify list of blockdevs during snapshot
block: allow specifying name of block device for vmstate storage
block: rename and alter bdrv_all_find_snapshot semantics
migration: control whether snapshots are ovewritten
migration: wire up support for snapshot device selection
migration: introduce a delete_snapshot wrapper
iotests: add support for capturing and matching QMP events
migration: introduce snapshot-{save,load,delete} QMP commands
Philippe Mathieu-Daudé (1):
migration: Make save_snapshot() return bool, not 0/-1
block/monitor/block-hmp-cmds.c | 7 +-
block/snapshot.c | 256 +++++++++++++++------
include/block/snapshot.h | 23 +-
include/migration/snapshot.h | 47 +++-
migration/savevm.c | 294 ++++++++++++++++++++----
monitor/hmp-cmds.c | 12 +-
qapi/job.json | 9 +-
qapi/migration.json | 121 ++++++++++
replay/replay-debugging.c | 12 +-
replay/replay-snapshot.c | 5 +-
softmmu/vl.c | 2 +-
tests/qemu-iotests/267.out | 12 +-
tests/qemu-iotests/310 | 385 +++++++++++++++++++++++++++++++
tests/qemu-iotests/310.out | 407 +++++++++++++++++++++++++++++++++
tests/qemu-iotests/common.qemu | 107 ++++++++-
tests/qemu-iotests/group | 1 +
16 files changed, 1548 insertions(+), 152 deletions(-)
create mode 100755 tests/qemu-iotests/310
create mode 100644 tests/qemu-iotests/310.out
-- 
2.26.2