From: Peter Xu
Subject: Re: [PATCH v2 15/29] migration/ram: Add support for 'fixed-ram' outgoing migration
Date: Wed, 1 Nov 2023 12:24:22 -0400

On Wed, Nov 01, 2023 at 03:52:18PM +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 01, 2023 at 11:23:37AM -0400, Peter Xu wrote:
> > On Wed, Oct 25, 2023 at 10:39:58AM +0100, Daniel P. Berrangé wrote:
> > > If I'm reading the code correctly the new format has some padding
> > > such that each "ramblock pages" region starts on a 1 MB boundary.
> > > 
> > > eg so we get:
> > > 
> > >  --------------------------------
> > >  | ramblock 1 header            |
> > >  --------------------------------
> > >  | ramblock 1 fixed-ram header  |
> > >  --------------------------------
> > >  | padding to next 1MB boundary |
> > >  | ...                          |
> > >  --------------------------------
> > >  | ramblock 1 pages             |
> > >  | ...                          |
> > >  --------------------------------
> > >  | ramblock 2 header            |
> > >  --------------------------------
> > >  | ramblock 2 fixed-ram header  |
> > >  --------------------------------
> > >  | padding to next 1MB boundary |
> > >  | ...                          |
> > >  --------------------------------
> > >  | ramblock 2 pages             |
> > >  | ...                          |
> > >  --------------------------------
> > >  | ...                          |
> > >  --------------------------------
> > >  | RAM_SAVE_FLAG_EOS            |
> > >  --------------------------------
> > >  | ...                          |
> > >  -------------------------------
> > 
> > When reading the series, I was thinking about one more thing: whether
> > fixed-ram would want to leverage compression in the future?
> 
> Libvirt currently supports compression of saved state images, so yes,
> I think compression is a desirable feature.

Ah, yeah, this will work too; one more copy, as you mentioned below, but I
assume that's not a major concern so far (or.. will it?).
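
(Just to make that extra copy concrete, a rough sketch of the kind of
bounce-buffering flow described below; nothing in it is from the series,
the function and constant names are made up:)

    /* Hypothetical sketch: compressed output has to be bounced into a
     * suitably aligned buffer before the aligned pwrite() that O_DIRECT
     * requires -- the memcpy below is the extra copy. */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define DIO_ALIGN 4096

    static ssize_t write_compressed_aligned(int fd, off_t off,
                                            const void *cbuf, size_t clen)
    {
        size_t wlen = (clen + DIO_ALIGN - 1) & ~(size_t)(DIO_ALIGN - 1);
        void *bounce;

        if (posix_memalign(&bounce, DIO_ALIGN, wlen)) {
            return -1;
        }
        memcpy(bounce, cbuf, clen);                    /* <-- the extra copy */
        memset((char *)bounce + clen, 0, wlen - clen); /* pad to alignment  */
        ssize_t ret = pwrite(fd, bounce, wlen, off);
        free(bounce);
        return ret;
    }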

> 
> Due to libvirt's architecture it does compression on the stream and
> the final step in the sequence bounce-buffers into suitably aligned
> memory required for O_DIRECT.
> 
> > To be exact, not really fixed-ram as a feature, but non-live snapshot as
> > the real use case.  More below.
> > 
> > I just noticed that compression can be a great feature to have for such a
> > use case, where the image size can be further shrunk noticeably.  In this
> > case, the speed of savevm may not matter as much as the image size (as
> > compression can take some more cpu overhead): the VM will be stopped anyway.
> > 
> > With current fixed-ram layout, we probably can't have compression due to
> > two reasons:
> > 
> >   - We offset each page with page alignment in the final image, and that's
> >     where the term fixed-ram comes from; more fundamentally,
> > 
> >   - We allow the src VM to keep running (the plan is to drop auto-pause;
> >     even if we plan to guarantee it won't run, QEMU still can't take that
> >     as guaranteed), so we need page granularity when storing pages, and
> >     then it's hard to know the size of each page after compression.
> > 
> > With the guarantee that the VM is stopped, I think compression should be
> > easy to get?  Because once we drop the page-granularity requirement, we
> > can compress in chunks, storing the binary in the image, with each page
> > written only once.  We may lose O_DIRECT, but we can consider hardware
> > accelerators for [de]compression if necessary.
> 
> We can keep O_DIRECT if we buffer in QEMU between compressor output
> and disk I/O, which is what libvirt does. QEMU would still be saving
> at least one extra copy compared to libvirt.
> 
> 
> The fixed RAM layout was primarily intended to allow easy parallel
> I/O without needing any synchronization between threads. In theory
> fixed RAM layout even allows you to do something fun like
> 
>    mapped_addr = mmap(NULL, ramblock_size, PROT_READ | PROT_WRITE,
>                       MAP_SHARED, save_state_fd, offset);
>    memcpy(mapped_addr, ramblock, ramblock_size);
>    munmap(mapped_addr, ramblock_size);
> 
> which would still be buffered I/O without O_DIRECT, but might be better
> than many writes() as you avoid 1000's of syscalls.
> 
> Anyway back to compression, I think if you wanted to allow for parallel
> I/O, then it would require a different "fixed ram" approach, where each
> multifd thread requested use of a 64 MB region, compressed until that
> was full, then asked for another 64 MB region, repeating until done.

Right, we'd need a constant buffer per-thread if so.
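
(Roughly something like the loop below per multifd thread -- purely
illustrative, the 64M handling and all helper names are invented, not
anything in this series:)

    #define REGION_SIZE (64 * 1024 * 1024)

    /* Each save thread compresses into one constant, reusable buffer and
     * only takes the synchronised step of claiming a new file region once
     * the buffer cannot hold another compressed page. */
    static void save_thread_loop(void)
    {
        uint8_t *buf = g_malloc(REGION_SIZE);   /* constant per-thread buffer */
        size_t used = 0;

        while (have_more_pages()) {
            /* compress as many pages as still fit into this region */
            used += compress_pages_into(buf + used, REGION_SIZE - used);
            if (REGION_SIZE - used < MAX_COMPRESSED_PAGE) {
                uint64_t off = claim_file_region(used);  /* only sync point */
                write_region_at(off, buf, used);
                used = 0;
            }
        }
        if (used) {
            write_region_at(claim_file_region(used), buf, used);
        }
        g_free(buf);
    }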

> 
> The reason we didn't want to break up the file format into regions like
> this is because we wanted to allow for flexibility in configuration on
> save / restore. eg you might save using 7 threads, but restore using
> 3 threads. We didn't want the on-disk layout to have any structural
> artifact that was related to the number of threads saving data, as that
> would make restore less efficient. eg 2 threads would process 2 chunks
> each and 1 thread would process 3 chunks, which is unbalanced.

I didn't follow why the image needs to contain thread-count information.

Can the sub-header for each compressed chunk be described as follows
(assuming it sits under a specific ramblock header, so the ramblock is
known):

  - size of compressed data
  - (start_offset, end_offset) of pages this chunk of data represents

Then when saving, we assign 64M to each thread no matter how many there
are; each thread first compresses its 64M into binary, learning the
compressed size, then requests a writeback to the image, flushing the
chunk header and the binary.

Then the final image will be a sequence of chunks for each ramblock.

Decompression can presumably do the same, assigning different chunks to
each decompress thread, no matter how many there are.

Would that work?
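
(For illustration, one possible shape for that chunk sub-header -- the
struct and field names are invented, nothing like this exists in the
series:)

    /* Hypothetical on-disk chunk sub-header, sitting under a ramblock
     * header; start/end are guest-page offsets within that ramblock. */
    struct RamCompressedChunkHdr {
        uint64_t compressed_size;   /* bytes of compressed data following  */
        uint64_t start_offset;      /* first page offset this chunk covers */
        uint64_t end_offset;        /* one past the last covered offset    */
    } QEMU_PACKED;

Each chunk would then be self-describing: the ramblock comes from the
enclosing header, the page range from start/end_offset, and the amount of
data to feed the decompressor from compressed_size, so any number of
restore threads can pick up chunks independently.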

To go back to the original topic: I think it's fine if libvirt does the
compression; that is indeed more flexible, doing it per-file with whatever
compression algorithm the user wants, and even covering non-RAM data.

I think such considerations / thoughts on a compression solution would also
be nice to document in docs/ under this feature.

Thanks,

-- 
Peter Xu



