From: Paolo Bonzini
Subject: Re: Fio regression caused by f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94
Date: Fri, 6 May 2022 10:42:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0
On 5/6/22 06:30, Lukáš Doktor wrote:
Also let me briefly share the details about the execution:
Thanks, this is super useful!  I got very similar results to yours:

    QEMU 6.2                  bw=1132MiB/s
    QEMU 7.0                  bw=1046MiB/s
    QEMU 7.0 + patch          bw=1012MiB/s
    QEMU 7.0 + tweaked patch  bw=1077MiB/s

The "tweaked patch" moves qemu_cond_signal after qemu_mutex_unlock.  It is
better than the QemuSemaphore in QEMU 7.0, but still not as good as the
original.

/me thinks

Paolo
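[For readers following along, here is a minimal sketch of the two qemu_cond_signal
orderings being compared.  It is illustrative only, not the actual
util/qemu-thread-posix.c code: it uses raw pthreads instead of QEMU's
qemu_mutex_*/qemu_cond_* wrappers, and all type and function names are made up.]

/* Minimal sketch (not the actual QEMU code) of the two post orderings
 * discussed above, using raw pthreads. */

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;
} sem_sketch;

/* Waiter: re-acquires 'lock' inside pthread_cond_wait() when woken. */
static void sem_sketch_wait(sem_sketch *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Post as described for QEMU 7.0's condvar-based semaphore: signal while
 * still holding the lock.  A waiter woken here may immediately block
 * again on 'lock'. */
static void sem_sketch_post(sem_sketch *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Post as in the "tweaked patch": unlock first, then signal, so the woken
 * waiter can usually take the lock without blocking. */
static void sem_sketch_post_tweaked(sem_sketch *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_mutex_unlock(&s->lock);
    pthread_cond_signal(&s->cond);
}

[Which ordering performs better can depend on the pthread implementation,
which may be part of why the tweak recovers only some of the lost bandwidth
in the numbers above.]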
---
mkdir -p /var/lib/runperf/runperf-nbd/
truncate -s 256M /var/lib/runperf/runperf-nbd//disk.img
nohup qemu-nbd -t -k /var/lib/runperf/runperf-nbd//socket -f raw /var/lib/runperf/runperf-nbd//disk.img &> $(mktemp /var/lib/runperf/runperf-nbd//qemu_nbd_XXXX.log) &
echo $! >> /var/lib/runperf/runperf-nbd//kill_pids
for PID in $(cat /var/lib/runperf/runperf-nbd//kill_pids); do disown -h $PID; done
export TERM=xterm-256color
true
mkdir -p /var/lib/runperf/runperf-nbd/
cat > /var/lib/runperf/runperf-nbd/nbd.fio << \Gr1UaS
# To use fio to test nbdkit:
#
# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
#
# To use fio to test qemu-nbd:
#
# rm -f /tmp/disk.img /tmp/socket
# truncate -s 256M /tmp/disk.img
# export target=/tmp/socket
# qemu-nbd -t -k $target -f raw /tmp/disk.img &
# fio examples/nbd.fio
# killall qemu-nbd

[global]
bs = $@
runtime = 30
ioengine = nbd
iodepth = 32
direct = 1
sync = 0
time_based = 1
clocksource = gettimeofday
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 10000
# log_hist_coarseness = 4 # 76 bins

rw = $@
uri=nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket
# Starting from nbdkit 1.14 the following will work:
#uri=${uri}

[job0]
offset=0

[job1]
offset=64m

[job2]
offset=128m

[job3]
offset=192m
Gr1UaS
benchmark_bin=/usr/local/bin/fio pbench-fio --block-sizes=4 --job-file=/var/lib/runperf/runperf-nbd/nbd.fio --numjobs=4 --runtime=60 --samples=5 --test-types=write --clients=$WORKER_IP
---

I am using pbench to run the execution, but you can simply replace the "$@"
variables in the produced "/var/lib/runperf/runperf-nbd/nbd.fio" and run it
directly using fio.

Regards,
Lukáš

On 05. 05. 22 at 15:27, Paolo Bonzini wrote:
> On 5/5/22 14:44, Daniel P. Berrangé wrote:
>>> util/thread-pool.c uses qemu_sem_*() to notify worker threads when
>>> work becomes available. It makes sense that this operation is
>>> performance-critical and that's why the benchmark regressed.
>>
>> Doh, I questioned whether the change would have a performance impact,
>> and it wasn't thought to be used in perf critical places
>
> The expectation was that there would be no contention and thus no
> overhead because of the pool->lock that exists anyway, but that was
> optimistic.
>
> Lukáš, can you run a benchmark with this condvar implementation that
> was suggested by Stefan:
> https://lore.kernel.org/qemu-devel/20220505131346.823941-1-pbonzini@redhat.com/raw
> ?
>
> If it still regresses, we can either revert the patch or look at a
> different implementation (even getting rid of the global queue is an
> option).
>
> Thanks,
> Paolo
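[For context on the exchange quoted above, here is a hypothetical sketch of the
wakeup pattern that util/thread-pool.c is described as using: the submitter
queues work under the pool lock and then posts a semaphore that idle workers
wait on.  This is not the actual QEMU code; PoolSketch, WorkItem, pool_submit
and pool_get_work are invented names, and the semaphore is modelled as the
condvar-based variant discussed in the thread rather than a plain sem_t.]

#include <pthread.h>

typedef struct WorkItem {
    struct WorkItem *next;
    void (*fn)(void *opaque);
    void *opaque;
} WorkItem;

typedef struct {
    pthread_mutex_t lock;        /* the "pool->lock" protecting the queue */
    WorkItem *queue;
    pthread_mutex_t sem_lock;    /* condvar-based semaphore, as in the */
    pthread_cond_t  sem_cond;    /* patch under discussion, instead of */
    unsigned        sem_count;   /* a plain POSIX sem_t                */
} PoolSketch;

/* Submitter: queue a request, then wake one idle worker.  With the
 * condvar-based semaphore the wakeup itself takes sem_lock, which is the
 * extra cost/contention being measured. */
static void pool_submit(PoolSketch *p, WorkItem *item)
{
    pthread_mutex_lock(&p->lock);
    item->next = p->queue;
    p->queue = item;
    pthread_mutex_unlock(&p->lock);

    pthread_mutex_lock(&p->sem_lock);      /* qemu_sem_post() equivalent */
    p->sem_count++;
    pthread_cond_signal(&p->sem_cond);
    pthread_mutex_unlock(&p->sem_lock);
}

/* Worker: wait until work is available, then pop one request. */
static WorkItem *pool_get_work(PoolSketch *p)
{
    pthread_mutex_lock(&p->sem_lock);      /* qemu_sem_wait() equivalent */
    while (p->sem_count == 0) {
        pthread_cond_wait(&p->sem_cond, &p->sem_lock);
    }
    p->sem_count--;
    pthread_mutex_unlock(&p->sem_lock);

    pthread_mutex_lock(&p->lock);
    WorkItem *item = p->queue;
    if (item) {
        p->queue = item->next;
    }
    pthread_mutex_unlock(&p->lock);
    return item;
}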