qemu-devel

Re: qemu-img convert vs writing another copy tool


From: Richard W.M. Jones
Subject: Re: qemu-img convert vs writing another copy tool
Date: Fri, 24 Jan 2020 09:55:55 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jan 23, 2020 at 01:21:28PM -0600, Eric Blake wrote:
> On 1/23/20 12:35 PM, Richard W.M. Jones wrote:
> >  - Hint that the target already contains zeroes.  It's almost always
> >    the case that we know this, but we cannot tell qemu.  This was the
> >    cause of a big performance regression last year.
> 
> This has just recently been proposed:
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg03617.html

Oh indeed, this is good.

> >  - NBD multi-conn.  In my tests this makes a really massive
> >    performance difference in certain situations.  Again, virt-v2v has
> >    a lot of information that we cannot pass to qemu: we know, for
> >    example, exactly whether the server supports the feature, how many
> >    threads are available, in some situations even have information
> >    about the network and backing disks that the data will travel over
> >    / be stored on.
> 
> Multi-conn for reading the source allows better parallelism.
> Multi-conn for writing is a bit trickier - it should be safe if the
> different connections are only touching distinct segments of the
> export (no overlaps), but as qemu does not advertise multiconn in
> such situations, you may still need a command-line switch to force
> multiple writers in spite of the server not advertising it.  Here,
> I'm not aware of anyone with patches underway, but I also think it
> would be a good ground for exploring.

But in the qemu-img convert case specifically, multi-conn should
be safe for writing?
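
To make the "distinct segments" idea concrete, here is a minimal
sketch of dividing an export of SIZE bytes between N connections so
that no two ranges overlap.  This is only my illustration, not how
qemu-img convert actually schedules its writes; the name split_ranges
is made up for the example, and a real tool would probably also want
to align the boundaries to the block size.

```shell
# split_ranges SIZE NCONN
# Prints up to NCONN lines "index offset length" that together cover
# bytes 0..SIZE-1 exactly once (hypothetical helper, illustration only).
split_ranges() {
  local size n chunk i off len
  size=$1 n=$2
  chunk=$(( (size + n - 1) / n ))   # ceiling division
  i=0 off=0
  while [ "$i" -lt "$n" ] && [ "$off" -lt "$size" ]; do
    len=$chunk
    if [ $((off + len)) -gt "$size" ]; then
      len=$((size - off))           # last range may be shorter
    fi
    echo "$i $off $len"
    off=$((off + len))
    i=$((i + 1))
  done
}
```

Each connection would then write only inside its own range, which is
the no-overlap condition described above.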

One additional problem with multi-conn is that NBD servers only
advertise that the feature is present, not the best degree of
parallelism to use.  (The server may not know this, or may be unable
to determine it.)

> >  - External block lists.  This is a rather obscure requirement, but
> >    it's necessary in the case where we can get the allocated block map
> >    from another source (eg. pyvmomi) and then want to use that with an
> >    NBD source that does not support extents (eg. nbdkit-ssh-plugin /
> >    libssh / sftp).  [Having said that, it may be possible to implement
> >    this as an nbdkit filter, so maybe this is not a blocking feature.]
> 
> How are you intending to use this? I'm guessing you have some way of
> feeding in information to qemu-img of which portions of the source
> image you want to copy, and ignore remaining portions.

I should say first that I've nearly finished an nbdkit filter
implementation of this, so feel free to ignore this for qemu.

The background to this feature is that some block device backends do
not have support for determining extents / disk block allocation
status.  The one that is most frequently used is ssh (sftp).  Note
that adding this support to sftp, while possible, doesn't really solve
the problem because the proprietary hypervisors we are pulling from
don't use recent SSH servers.

So copying from SSH is slow because you have no choice except to read
vast amounts of zeroes or deleted data.  (This doesn't affect virt-v2v
because it has another strategy to avoid this, but it does affect
other scenarios such as "warm" conversions and any migration that
doesn't involve using virt-v2v.)

However you can get the extent information by other means.  For VMware
you can use VMOMI to read this.  Or you can ssh in and run commands
like xfs_bmap.

So in theory at least it's possible to assemble the required data
from multiple sources and thus avoid wasteful copying.

With nbdkit you'll be able to do something like:

  # fetch the extents list over VMOMI > extents.txt, then
  nbdkit -U /tmp/sock --filter=extentlist ssh \
                   host=server /vmfs/.../file-flat.vmdk \
                   extentlist=extents.txt
  qemu-img convert nbd:unix:/tmp/sock ...
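
As a rough sketch of how extents.txt might be assembled: suppose the
external source has been reduced to sorted, non-overlapping
"offset length" pairs (in bytes) describing the allocated ranges.
That intermediate format and the name ranges_to_extents are my
assumptions for this example.  The gaps can then be filled in as
"hole,zero" records in the "offset length type" layout from the nbdcp
draft quoted further down; whether the filter will accept exactly
this layout is also an assumption.

```shell
# ranges_to_extents SIZE
# Reads sorted, non-overlapping "offset length" pairs (bytes) of
# allocated data on stdin and prints an extent list covering 0..SIZE.
# Gaps become "hole,zero"; allocated ranges get an empty type field.
ranges_to_extents() {
  local size pos off len
  size=$1 pos=0
  while read -r off len; do
    [ -n "$off" ] || continue              # skip blank lines
    if [ "$off" -gt "$pos" ]; then
      echo "$pos $((off - pos)) hole,zero" # gap before this range
    fi
    echo "$off $len"
    pos=$((off + len))
  done
  if [ "$pos" -lt "$size" ]; then
    echo "$pos $((size - pos)) hole,zero"  # trailing gap up to SIZE
  fi
}
```

It would be used along the lines of
"ranges_to_extents 10485760 < allocated.txt > extents.txt".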

> Note that it IS already possible to use qemu's copy-on-read feature
> as a way to copy only a subset of a source file over to a
> destination file. When demonstrating incremental backup, I wrote
> this shell function:
> 
> copyif() {
> if test $# -lt 2 || test $# -gt 3; then
>   echo 'usage: copyif src dst [bitmap]'
>   return 1
> fi
> if test -z "$3"; then
>   map_from="-f raw nbd://localhost:10809/$1"
>   state=true
> else
>   map_from="--image-opts driver=nbd,export=$1,server.type=inet"
>   map_from+=",server.host=localhost,server.port=10809"
>   map_from+=",x-dirty-bitmap=qemu:dirty-bitmap:$3"
>   state=false
> fi
> $qemu_img info -f raw nbd://localhost:10809/$1 || return
> $qemu_img info -f qcow2 $2 || return
> ret=0
> $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/$1 $2
> while read line; do
>   [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] || continue
>   start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
>   echo
>   echo " $start $len:"
>   qemu-io -C -c "r $start $len" -f qcow2 $2
> done < <($qemu_img map --output=json $map_from)
> $qemu_img rebase -u -f qcow2 -b '' $2
> if test $ret = 0; then echo 'Success!'; fi
> return $ret
> }
> 
> The key lines here are 'qemu-io -C -c "r $start $len" -f qcow2 $2',
> which is performed in a loop to read just targeted portions of the
> destination qcow2 file with copy-on-read set to pull in that portion
> from its backing file, and '<($qemu_img map --output=json
> $map_from)' which was used to derive the extent map driving which
> portions of the file to read.
> 
> We also have 'qemu-img dd' that can copy subsets of a file, although
> it is not currently the ideal interface, and probably needs to be
> enhanced (I have a branch where I had tried working on patches for
> it, but where the feedback was that we want the improvements to be
> more generic, or even teach 'qemu-img convert' to support offsets
> the way 'qemu-img dd' tries to; I'd need to revisit that branch...)
> 
> >
> >One thing which qemu-img convert can do which nbdcp could not:
> >
> >  - Read or write from qcow2 files.
> 
> Although you could still couple things together: nbdcp for new
> features plus qemu-nbd to drive an NBD wrapper around qcow2 (as
> source or as destination).
> 
> >
> >So instead of splitting the ecosystem and writing a new tool that
> >doesn't do as much as qemu-img convert, I wonder what qemu developers
> >think about the above missing features?  For example, are they in
> >scope for qemu-img convert?
> >
> 
> I could see all of these being viable additions to qemu-img, but
> also wonder if writing nbdcp would get those features available in a
> faster manner.
> 
> 
> >
> >SYNOPSIS
> >         nbdcp [-a|--target-allocation allocated|sparse]
> >               [-b|--block-list <blocksfile>]
> 
> These make sense for any qemu-img format.
> 
> >               [-m|--multi-conn <n>] [-M|--multi-conn-target <n>]
> 
> These might make more sense as tunables for how to set up NBD client
> (destination) or server (source), rather than directly as qemu-img
> options.  That is, I could imagine that we'd use qemu-img
> --image-opts, and then expose new blockdev-style knobs for setting
> up the NBD endpoint to enable multiconn usage of that endpoint.

Yes this makes sense.

> >               [-p|--progress-bar] [-S|--sparse-detect <n>]
> >               [-T|--threads <n>] [-z|--target-is-zero]
> >               'nbd://...'|DISK.IMG 'nbd://...'|DISK.IMG
> 
> And these options also seem like they are useful to qemu-img proper.
> 
> >
> >        This program cannot: copy from file to file (use cp(1) or
> >        dd(1)), copy to or from formats other than raw (use
> >        qemu-img(1) convert), or access servers other than NBD
> >        servers (also use qemu-img(1)).
> 
> Again, depending on how we want to mix-and-match things, using
> qemu-nbd to create the NBD endpoint for the nbdcp source or
> destination may be worthwhile (which is different than directly
> using qemu-img); we'd want some decent examples of building such
> chains between tools.  Or it could help us decide whether we can cut
> out some overhead by consolidating typical uses into one tool rather
> than requiring convoluted chains.
> 
> 
> >
> >        -b BLOCKSFILE
> >        --block-list=BLOCKSFILE
> >            Load the list of extents from an external file.  nbdcp considers
> >            this to be the truth for source extents.  The file should contain
> >            one record per line in the same format as nbdkit-sh-plugin(1),
> >            ie:
> >
> >             offset length type
> >
> >            with "offset" and "length" in bytes, and the "type" field being a
> >            comma-separated list of the words "hole" and "zero".  For
> >            example:
> >
> >             0  1M
> >             1M 9M  hole,zero
> 
> Could we also teach this to parse 'qemu-img map --output=json'
> format? And/or add 'qemu-img map --output=XYZ' (different from the
> current '--output=human') that gives sufficient information?  (Note:
> --output=human is NOT suitable for extent lists - it intentionally
> outputs only the data portions, and in so doing coalesces 'hole' and
> 'hole,zero' segments to be indistinguishable).

If qemu-img doesn't have the data (we have to get it from
another source), is the output of qemu-img map relevant?

Rich.

> >
> >        -p
> >        --progress-bar
> >            Display a progress bar during copying.
> >
> >        -p machine:FD
> >        --progress-bar=machine:FD
> >            Write a machine-readable progress bar to file descriptor "FD".
> >            This progress bar prints lines with the format "COPIED/TOTAL"
> >            (where "COPIED" and "TOTAL" are 64 bit unsigned integers).
> 
> Supporting optional arguments to long options is okay, but
> supporting optional arguments to short options gets tricky when
> using getopt.  I would recommend two separate options, '-p' with no
> argument as shorthand for progress to stderr, and '-P description'
> with mandatory argument for where to send progress, rather than trying
> to let '-p' have optional argument.
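
For illustration, the two-option split suggested above could be
handled with the POSIX getopts builtin roughly as follows.  The
function name parse_progress_opts and the "progress" variable are
invented for this sketch, and nbdcp itself would presumably do the
equivalent in C.

```shell
# parse_progress_opts [-p] [-P DESCRIPTION]
# Sets $progress to "" (no progress output), "stderr" for -p, or the
# mandatory argument given to -P (for example "machine:3").
parse_progress_opts() {
  local opt
  OPTIND=1                 # restart option parsing on every call
  progress=""
  while getopts "pP:" opt; do
    case $opt in
      p) progress="stderr" ;;
      P) progress="$OPTARG" ;;
      *) return 1 ;;       # unknown option or missing argument
    esac
  done
}
```

Because '-p' never takes an argument and '-P' always does, there is
no ambiguity about whether the next word belongs to the option.
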
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



