qemu-devel

Re: [PATCH V3 00/22] Live Update


From: Stefan Hajnoczi
Subject: Re: [PATCH V3 00/22] Live Update
Date: Fri, 14 May 2021 12:53:06 +0100

On Thu, May 13, 2021 at 04:21:15PM -0400, Steven Sistare wrote:
> On 5/12/2021 12:42 PM, Stefan Hajnoczi wrote:
> > On Fri, May 07, 2021 at 05:24:58AM -0700, Steve Sistare wrote:
> >> Provide the cprsave and cprload commands for live update.  These save and
> >> restore VM state, with minimal guest pause time, so that qemu may be updated
> >> to a new version in between.
> >>
> >> cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
> >> modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
> >> /usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
> >> paused state and waits for the cprload command.
> > 
> > I think cprsave/cprload could be generalized by using QMP to stash the
> > file descriptors. The 'getfd' QMP command already exists and QEMU code
> > already opens fds passed using this mechanism.
> > 
> > I haven't checked but it may be possible to drop some patches by reusing
> > QEMU's monitor file descriptor passing since the code already knows how
> > to open from 'getfd' fds.
> > 
> > Using QMP is interesting because it eliminates the need for
> > execve(2). QEMU may be unable to execute a program due to
> > chroot, seccomp, etc.
> > 
> > QMP would enable cprsave/cprload to work both with and without
> > execve(2).
> > 
> > One tricky thing with this approach might be startup ordering: how to
> > get fds via the QMP monitor in the new process before processing the
> > entire command-line.
> 
> Early on I experimented with a similar approach.  Old qemu passed descriptors to an
> escrow process and exited; new qemu started and retrieved the descriptors from escrow.
> vfio mostly worked after I hacked the kernel to suppress the original-pid owner check.
> I suspect my recent vfio extensions would smooth the rough edges.

I wonder what the reason for VFIO's pid limitation is. Maybe it is
because VFIO pins pages from the original process?

Is this VFIO pid limitation the main reason why you chose to make QEMU
execve(2) the new binary?
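The execve(2) route depends on descriptors surviving exec: POSIX closes only the descriptors marked FD_CLOEXEC, so a process that re-execs itself must clear that flag on every fd it wants to keep (a VFIO device fd, for instance), and the new image must then learn which number is which, for example via argv or the environment. A minimal sketch under those assumptions; `keep_fd_across_exec`, `fd_closed_on_exec` and `reexec_with_fds` are hypothetical helpers, not code from the patch series.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so that fd stays open in the image started by execve(2).
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int keep_fd_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Report whether fd is valid and would be closed by execve(2). */
bool fd_closed_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC);
}

/* Hypothetical: mark the descriptors that must survive, then replace the
 * process image with path, reusing the same argv. Returns only on failure. */
int reexec_with_fds(const char *path, char *const argv[],
                    const int *fds, int nfds)
{
    for (int i = 0; i < nfds; i++) {
        if (keep_fd_across_exec(fds[i]) < 0) {
            return -1;
        }
    }
    execv(path, argv);
    return -1; /* execv only returns if it failed */
}
```

A chroot or seccomp policy that forbids execve(2) makes `reexec_with_fds` fail at the `execv` call, which is the limitation the QMP-based alternative avoids.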

> However, the main issue is that guest ram must be backed by named shared memory, and
> we would need to add code to support shared memory for all the secondary memory objects.
> That makes it less interesting for us at this time; we care about updating legacy qemu
> instances with anonymous guest memory.

Thanks for explaining this more in the other sub-thread. The secondary
memory objects you mentioned are relatively small so I don't think
saving them in the traditional way is a problem.

Two approaches for zero-copy memory migration fit into QEMU's existing
migration infrastructure:

- Marking RAM blocks that are backed by named memory (tmpfs, hugetlbfs,
  etc) so they are not saved into the savevm file. The existing --object
  memory-backend-file syntax can be used.

- Extending the live migration protocol to detect when file descriptor
  passing is available (i.e. UNIX domain socket migration) and using
  that for memory-backend-* objects that have fds.

Either of these approaches would handle RAM with existing savevm/migrate
commands.
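The first approach works because a RAM block backed by a named file (memory-backend-file on tmpfs or hugetlbfs, mapped shared, i.e. share=on) outlives the process that mapped it: the new QEMU maps the same path and finds the guest's pages already there, so none of them need to be written to the savevm file. A minimal sketch of that property with plain POSIX calls, assuming a MAP_SHARED file mapping; `ram_file_create` and `ram_file_map` are hypothetical helpers, and the second mapping merely stands in for the new process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a file-backed "RAM block" of len bytes.
 * template must end in "XXXXXX" (see mkstemp(3)). On tmpfs such as
 * /dev/shm the pages stay in memory rather than going to disk. */
int ram_file_create(char *template, size_t len)
{
    int fd = mkstemp(template);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        unlink(template);
        return -1;
    }
    return fd;
}

/* Map the block shared: stores land in the file's pages, so any later
 * mapping of the same file (e.g. by a new process) sees them without a copy.
 * Anonymous memory (MAP_ANONYMOUS | MAP_PRIVATE) has no such name to reopen. */
void *ram_file_map(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

In a live update the file would be left in place for the new process instead of being unlinked; the second approach would instead hand the already-open fd to the new process over a UNIX domain socket.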

The remaining issue is how to migrate VFIO and other file descriptors
that cannot be reopened by the new process. As mentioned, QEMU already
has file descriptor passing support in the QMP monitor and support for
opening passed file descriptors (see qemu_open_internal(),
monitor_fd_param(), and socket_get_fd()).

The advantage of integrating live update functionality into the existing
savevm/migrate commands is that it will work in more use cases, with
less code to duplicate, maintain, and protect from bitrot than the
special-case cprsave command in this patch series.

Maybe there is a fundamental technical reason why live update needs to
be different from QEMU's existing migration commands but I haven't
figured it out yet.

Stefan
