Re: [PATCH V3 00/22] Live Update
From: Dr. David Alan Gilbert
Subject: Re: [PATCH V3 00/22] Live Update
Date: Tue, 18 May 2021 10:57:50 +0100
User-agent: Mutt/2.0.7 (2021-05-04)
* Steven Sistare (steven.sistare@oracle.com) wrote:
> On 5/14/2021 7:53 AM, Stefan Hajnoczi wrote:
> > On Thu, May 13, 2021 at 04:21:15PM -0400, Steven Sistare wrote:
> >> On 5/12/2021 12:42 PM, Stefan Hajnoczi wrote:
> >>> On Fri, May 07, 2021 at 05:24:58AM -0700, Steve Sistare wrote:
> >>>> Provide the cprsave and cprload commands for live update. These save and
> >>>> restore VM state, with minimal guest pause time, so that qemu may be
> >>>> updated to a new version in between.
> >>>>
> >>>> cprsave stops the VM and saves vmstate to an ordinary file. It supports
> >>>> two modes: restart and reboot. For restart, cprsave exec's the qemu
> >>>> binary (or /usr/bin/qemu-exec if it exists) with the same argv. qemu
> >>>> restarts in a paused state and waits for the cprload command.
> >>>
> >>> I think cprsave/cprload could be generalized by using QMP to stash the
> >>> file descriptors. The 'getfd' QMP command already exists and QEMU code
> >>> already opens fds passed using this mechanism.
> >>>
> >>> I haven't checked but it may be possible to drop some patches by reusing
> >>> QEMU's monitor file descriptor passing since the code already knows how
> >>> to open from 'getfd' fds.
> >>>
> >>> The reason why using QMP is interesting is because it eliminates the
> >>> need for execve(2). QEMU may be unable to execute a program due to
> >>> chroot, seccomp, etc.
> >>>
> >>> QMP would enable cprsave/cprload to work both with and without
> >>> execve(2).
> >>>
> >>> One tricky thing with this approach might be startup ordering: how to
> >>> get fds via the QMP monitor in the new process before processing the
> >>> entire command-line.
> >>
> >> Early on I experimented with a similar approach. Old qemu passed
> >> descriptors to an escrow process and exited; new qemu started and
> >> retrieved the descriptors from escrow. vfio mostly worked after I hacked
> >> the kernel to suppress the original-pid owner check. I suspect my recent
> >> vfio extensions would smooth the rough edges.
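
For readers unfamiliar with descriptor passing, the escrow idea quoted above can be sketched roughly as follows. This is only an illustration of handing an open descriptor over a UNIX domain socket (SCM_RIGHTS), not code from the patch series or from QEMU. The names stash_fd, fetch_fd, demo and the "vmstate" tag are hypothetical, a socketpair stands in for the escrow connection, and socket.send_fds/recv_fds need Python 3.9 or later.

```python
import os
import socket
import tempfile


def stash_fd(sock, fd, name):
    """Send one descriptor, tagged with a name, as SCM_RIGHTS ancillary data."""
    socket.send_fds(sock, [name.encode()], [fd])


def fetch_fd(sock):
    """Receive one tagged descriptor; returns (name, fd)."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return msg.decode(), fds[0]


def demo():
    # The two ends stand in for "old qemu -> escrow" and "escrow -> new qemu".
    old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    with tempfile.TemporaryFile() as f:
        f.write(b"device state")
        f.flush()
        stash_fd(old_side, f.fileno(), "vmstate")
    # The sender's descriptor is closed here; the in-flight copy keeps the
    # underlying open file alive until the receiver picks it up.

    name, fd = fetch_fd(new_side)
    os.lseek(fd, 0, os.SEEK_SET)  # the file offset is shared with the sender
    data = os.read(fd, 64)

    os.close(fd)
    old_side.close()
    new_side.close()
    return name, data
```

Calling demo() returns the tag together with the bytes read back through the received descriptor. The sketch says nothing about the vfio owner check or locked-memory accounting discussed in this thread; those are kernel-side constraints that plain descriptor passing does not address.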
> >
> > I wonder about the reason for VFIO's pid limitation, maybe because it
> > pins pages from the original process?
>
> The dma unmap code verifies that the requesting task is the same as the
> task that mapped the pages. We could add an ioctl that passes ownership to
> a new task. We would also need to fix locked memory accounting, which is
> associated with the mm of the original task.
>
> > Is this VFIO pid limitation the main reason why you chose to make QEMU
> > execve(2) the new binary?
>
> That is one. Plus, re-attaching to named shared memory for pc.ram causes
> the vfio conflict errors I mentioned in the previous email. We would need
> to suppress redundant dma map calls, but allow legitimate dma maps and
> unmaps in response to the ongoing address space changes and diff callbacks
> caused by some drivers. It would be messy and fragile. In general, it felt
> like I was working against vfio rather than with it.
OK, the weirdness of vfio helps explain a bit about why you're doing it
this way; can you help separate some of the differences between restart
and reboot for me though:
In 'reboot' mode, where the guest must do suspend in its drivers, how
many of these vfio requirements are needed? I guess the memfd use
for the anonymous areas isn't any use for reboot mode.
You mention cprsave uses VFIO_DMA_UNMAP_FLAG_VADDR - after that, does
vfio still care about the currently-anonymous areas?
> Another big reason is a requirement to preserve anonymous memory for legacy
> qemu updates (via code injection which I briefly mentioned in KVM forum).
> If we extend cpr to allow updates without exec, I still need the exec
> option.
Can you explain what that code injection mechanism is for those of us
who didn't see that?
Dave
> >> However, the main issue is that guest ram must be backed by named shared
> >> memory, and we would need to add code to support shared memory for all
> >> the secondary memory objects. That makes it less interesting for us at
> >> this time; we care about updating legacy qemu instances with anonymous
> >> guest memory.
> >
> > Thanks for explaining this more in the other sub-thread. The secondary
> > memory objects you mentioned are relatively small so I don't think
> > saving them in the traditional way is a problem.
> >
> > Two approaches for zero-copy memory migration fit into QEMU's existing
> > migration infrastructure:
> >
> > - Marking RAM blocks that are backed by named memory (tmpfs, hugetlbfs,
> > etc) so they are not saved into the savevm file. The existing --object
> > memory-backend-file syntax can be used.
> >
> > - Extending the live migration protocol to detect when file descriptor
> > passing is available (i.e. UNIX domain socket migration) and using
> > that for memory-backend-* objects that have fds.
> >
> > Either of these approaches would handle RAM with existing savevm/migrate
> > commands.
>
> Yes, but the vfio issues would still need to be solved, and we would need
> new command line options to back existing and future secondary memory
> objects with named shared memory.
>
> > The remaining issue is how to migrate VFIO and other file descriptors
> > that cannot be reopened by the new process. As mentioned, QEMU already
> > has file descriptor passing support in the QMP monitor and support for
> > opening passed file descriptors (see qemu_open_internal(),
> > monitor_fd_param(), and socket_get_fd()).
> >
> > The advantage of integrating live update functionality into the existing
> > savevm/migrate commands is that it will work in more use cases with
> > less code duplication/maintenance/bitrot prevention than the
> > special-case cprsave command in this patch series.
> >
> > Maybe there is a fundamental technical reason why live update needs to
> > be different from QEMU's existing migration commands but I haven't
> > figured it out yet.
>
> vfio and anonymous memory.
>
> Regarding code duplication, I did consider whether to extend the migration
> syntax and implementation versus creating something new. Those functions
> handle stuff like bdrv snapshot, aio, and migration which are n/a for the
> cpr use case, and the cpr functions handle state that is n/a for the
> migration case. I judged that handling both in the same functions would be
> less readable and maintainable. After feedback during the V1 review, I
> simplified the cprsave code by calling qemu_save_device_state, as Xen does,
> thus eliminating any interaction with the migration code.
>
> Regarding bit rot, I still need to add a cpr test to the test suite, when the
> review is more complete and folks agree on the final form of the
> functionality.
>
> I do like the idea of supporting update without exec, but as a future
> project, and not at the expense of dropping update with exec.
>
> - Steve
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK