[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH v4 0/5] qcow2: async handling of fragmented io
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-block] [PATCH v4 0/5] qcow2: async handling of fragmented io |
Date: |
Tue, 3 Sep 2019 12:16:11 +0000 |
Pinging, as Stefan's branch merged into master and now these series based on
master.
16.08.2019 18:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> Here is an asynchronous scheme for handling fragmented qcow2
> reads and writes. Both qcow2 read and write functions loops through
> sequential portions of data. The series aim it to parallelize these
> loops iterations.
> It improves performance for fragmented qcow2 images, I've tested it
> as described below.
>
> v4 [perf results not updated]:
> 01: new patch. Unrelated, but need to fix 026 before the series to
> correctly fix it after :)
> 02: - use coroutine_fn where appropriate (i.e. in aio_task_pool_new too)
> - add Max's r-b
> 03,04: add Max's r-b
> 05: fix 026 output
>
> v3 (by Max's comments) [perf results not updated]:
>
> 01: - use coroutine_fn where appropriate !!!!!!!!!!!!!!!!!!!!!!!
> - add aio_task_pool_free
> - add some comments
> - move header to include/block
> - s/wait_done/waiting
> 02: - Rewrite note about decryption in guest buffers [thx to Eric]
> - separate g_assert_not_reached for QCOW2_CLUSTER_ZERO_*
> - drop return after g_assert_not_reached
> 03: - drop bytes_done and correctly use qiov_offset
> - fix comment
> 04: - move QCOW2_MAX_WORKERS to block/qcow2.h
> - initialize ret in qcow2_co_preadv_part
> Based-on: https://github.com/stefanha/qemu/commits/block
>
>
> v2: changed a lot, as
> 1. a lot of preparations around locks, hd_qiovs, threads for encryption
> are done
> 2. I decided to create separate file with async request handling API, to
> reuse it for backup, stream and copy-on-read to improve their performance
> too. Mirror and qemu-img convert has their own async request handling,
> may be we'll be able finally merge all these similar code into one
> feature.
> Note that not all API calls used in qcow2, some will be needed on
> following steps for parallelizing other io loops.
>
> About testing:
>
> I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
> t-seq.qcow2 - sequentially written qcow2 image
> t-reverse.qcow2 - filled by writing 64k portions from end to the start
> t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
> t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
> (see source code of image generation in the end for details)
>
> and I've done several runs like the following (sequential io by 1mb chunks):
>
> out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr
> in {"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n
> -s 1m -t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >>
> $out; done; done
>
>
> short info about parameters:
> -w - do writes (otherwise do reads)
> -c - count of blocks
> -s - block size
> -t none - disable cache
> -n - native aio
> -d 1 - don't use parallel requests provided by qemu-img bench itself
>
> results:
>
> +---------------------------+---------+---------+
> | file | master | async |
> +---------------------------+---------+---------+
> | /ssd/t-part-rand.qcow2 | 14.671 | 9.193 |
> +---------------------------+---------+---------+
> | /ssd/t-part-rand.qcow2 -w | 11.434 | 8.621 |
> +---------------------------+---------+---------+
> | /ssd/t-rand.qcow2 | 20.421 | 10.05 |
> +---------------------------+---------+---------+
> | /ssd/t-rand.qcow2 -w | 11.097 | 8.915 |
> +---------------------------+---------+---------+
> | /ssd/t-reverse.qcow2 | 17.515 | 9.407 |
> +---------------------------+---------+---------+
> | /ssd/t-reverse.qcow2 -w | 11.255 | 8.649 |
> +---------------------------+---------+---------+
> | /ssd/t-seq.qcow2 | 9.081 | 9.072 |
> +---------------------------+---------+---------+
> | /ssd/t-seq.qcow2 -w | 8.761 | 8.747 |
> +---------------------------+---------+---------+
> | /tmp/t-part-rand.qcow2 | 41.179 | 41.37 |
> +---------------------------+---------+---------+
> | /tmp/t-part-rand.qcow2 -w | 54.097 | 55.323 |
> +---------------------------+---------+---------+
> | /tmp/t-rand.qcow2 | 711.899 | 514.339 |
> +---------------------------+---------+---------+
> | /tmp/t-rand.qcow2 -w | 546.259 | 642.114 |
> +---------------------------+---------+---------+
> | /tmp/t-reverse.qcow2 | 86.065 | 96.522 |
> +---------------------------+---------+---------+
> | /tmp/t-reverse.qcow2 -w | 46.557 | 48.499 |
> +---------------------------+---------+---------+
> | /tmp/t-seq.qcow2 | 33.804 | 33.862 |
> +---------------------------+---------+---------+
> | /tmp/t-seq.qcow2 -w | 34.299 | 34.233 |
> +---------------------------+---------+---------+
>
>
> Performance gain is obvious, especially for read and especially for ssd.
> For hdd there is a degradation for reverse case, but this is the most
> impossible case and seems not critical.
>
> How images are generated:
>
> ==== gen-writes ======
> #!/usr/bin/env python
> import random
> import sys
>
> size = 4 * 1024 * 1024 * 1024
> block = 64 * 1024
> block2 = 1024 * 1024
>
> arg = sys.argv[1]
>
> if arg in ('rand', 'reverse', 'seq'):
> writes = list(range(0, size, block))
>
> if arg == 'rand':
> random.shuffle(writes)
> elif arg == 'reverse':
> writes.reverse()
> elif arg == 'part-rand':
> writes = []
> for off in range(0, size, block2):
> wr = list(range(off, off + block2, block))
> random.shuffle(wr)
> writes.extend(wr)
> elif arg != 'seq':
> sys.exit(1)
>
> for w in writes:
> print 'write -P 0xff {} {}'.format(w, block)
>
> print 'q'
> ==========================
>
> ===== gen-test-images.sh =====
> #!/bin/bash
>
> IMG_PATH=/ssd
>
> for name in seq reverse rand part-rand; do
> IMG=$IMG_PATH/t-$name.qcow2
> echo createing $IMG ...
> rm -f $IMG
> qemu-img create -f qcow2 $IMG 4G
> gen-writes $name | qemu-io $IMG
> done
> ==============================
>
>
> Vladimir Sementsov-Ogievskiy (5):
> tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache
> block: introduce aio task pool
> block/qcow2: refactor qcow2_co_preadv_part
> block/qcow2: refactor qcow2_co_pwritev_part
> block/qcow2: introduce parallel subrequest handling in read and write
>
> qapi/block-core.json | 2 +-
> block/qcow2.h | 3 +
> include/block/aio_task.h | 54 ++++
> block/aio_task.c | 124 ++++++++
> block/qcow2.c | 461 +++++++++++++++++++----------
> block/Makefile.objs | 2 +
> block/trace-events | 1 +
> tests/qemu-iotests/026.out | 18 +-
> tests/qemu-iotests/026.out.nocache | 188 ++++++------
> 9 files changed, 591 insertions(+), 262 deletions(-)
> create mode 100644 include/block/aio_task.h
> create mode 100644 block/aio_task.c
>
--
Best regards,
Vladimir
- Re: [Qemu-block] [PATCH v4 0/5] qcow2: async handling of fragmented io,
Vladimir Sementsov-Ogievskiy <=