


From: Richard Henderson
Subject: [Qemu-commits] [qemu/qemu] 4b2b3d: coroutine: resize pool periodically instead of lim...
Date: Thu, 21 Oct 2021 12:26:06 -0700

  Branch: refs/heads/staging
  Home:   https://github.com/qemu/qemu
  Commit: 4b2b3d2653f255ef4259a7689af1956536565901
      https://github.com/qemu/qemu/commit/4b2b3d2653f255ef4259a7689af1956536565901
  Author: Stefan Hajnoczi <stefanha@redhat.com>
  Date:   2021-10-21 (Thu, 21 Oct 2021)

  Changed paths:
    A include/qemu/coroutine-pool-timer.h
    M include/qemu/coroutine.h
    M iothread.c
    A util/coroutine-pool-timer.c
    M util/main-loop.c
    M util/meson.build
    M util/qemu-coroutine.c

  Log Message:
  -----------
  coroutine: resize pool periodically instead of limiting size

It was reported that enabling SafeStack reduces IOPS significantly
(>25%) with the following fio benchmark on virtio-blk using an NVMe host
block device:

  # fio --rw=randrw --bs=4k --iodepth=64 --runtime=1m --direct=1 \
        --filename=/dev/vdb --name=job1 --ioengine=libaio --thread \
        --group_reporting --numjobs=16 --time_based \
        --output=/tmp/fio_result

Serge Guelton and I found that SafeStack is not really at fault; it just
increases the cost of coroutine creation. This fio workload exhausts the
coroutine pool, and coroutine creation becomes a bottleneck. Previous
work by Honghao Wang also pointed to excessive coroutine creation.

Creating new coroutines is expensive due to allocating new stacks with
mmap(2) and mprotect(2). Currently there are thread-local and global
pools that recycle old Coroutine objects and their stacks but the
hardcoded size limit of 64 for thread-local pools and 128 for the global
pool is insufficient for the fio benchmark shown above.
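
To illustrate why each new coroutine stack costs at least two system calls, here is a minimal sketch of allocating a stack with mmap(2) and protecting a guard page with mprotect(2). The names stack_alloc() and stack_free() are hypothetical and are not QEMU's functions, and placing the guard page at the low end assumes a stack that grows downward.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a stack of at least usable_size bytes plus one
 * inaccessible guard page at the low end. Returns the mapping base, or NULL
 * on failure. The full mapping length is stored in *total_size.
 */
void *stack_alloc(size_t usable_size, size_t *total_size)
{
    assert(usable_size > 0 && total_size != NULL);

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) {
        return NULL;
    }
    size_t page = (size_t)ps;

    /* Round the usable size up to a whole number of pages. */
    size_t usable = (usable_size + page - 1) / page * page;
    size_t total = usable + page;

    /* System call 1: reserve the address range. */
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }

    /* System call 2: make the lowest page a guard against overflow. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }

    *total_size = total;
    return base;
}

/* Hypothetical helper: release a stack obtained from stack_alloc(). */
void stack_free(void *base, size_t total_size)
{
    munmap(base, total_size);
}
```

Recycling a stack from a pool skips both calls, which is why an exhausted pool shows up directly in the I/O path.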

This patch changes the coroutine pool algorithm to a simple thread-local
pool without a maximum size limit. Threads periodically shrink the pool
down to a size sufficient for the maximum observed number of coroutines.
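
The algorithm described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this patch: PoolItem, pool_get(), pool_put(), pool_periodic_resize() and pool_free_count() are hypothetical names, malloc(3) stands in for coroutine creation, and "sufficient for the maximum observed number" is interpreted as keeping just enough free items to reach the peak in-use count seen since the previous resize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pooled object (e.g. a coroutine and its stack). */
typedef struct PoolItem {
    struct PoolItem *next;
} PoolItem;

typedef struct {
    PoolItem *free_list;  /* recycled items ready for reuse */
    size_t free_count;    /* number of items on free_list */
    size_t in_use;        /* items currently handed out */
    size_t max_in_use;    /* peak in_use since the last resize */
} Pool;

/* One pool per thread, with no upper bound on free_count. */
static _Thread_local Pool pool;

PoolItem *pool_get(void)
{
    PoolItem *item = pool.free_list;

    if (item) {
        pool.free_list = item->next;
        pool.free_count--;
    } else {
        item = malloc(sizeof(*item)); /* the expensive path */
        if (!item) {
            return NULL;
        }
    }
    pool.in_use++;
    if (pool.in_use > pool.max_in_use) {
        pool.max_in_use = pool.in_use;
    }
    return item;
}

void pool_put(PoolItem *item)
{
    assert(pool.in_use > 0);
    pool.in_use--;
    item->next = pool.free_list;
    pool.free_list = item;
    pool.free_count++;
}

/* Called periodically (for example from a timer) in the owning thread. */
void pool_periodic_resize(void)
{
    /* max_in_use >= in_use always holds, so this cannot underflow. */
    size_t target = pool.max_in_use - pool.in_use;

    while (pool.free_count > target) {
        PoolItem *item = pool.free_list;
        pool.free_list = item->next;
        pool.free_count--;
        free(item);
    }
    /* Start a new observation window. */
    pool.max_in_use = pool.in_use;
}

size_t pool_free_count(void)
{
    return pool.free_count;
}
```

With this interpretation a burst keeps its items cached for one timer period after it ends, and an idle thread drains its pool on the following period.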

The global pool is removed by this patch. It can help to hide the fact
that local pools are easily exhausted, but it doesn't fix the root
cause. I don't think there is a need for a global pool because QEMU's
threads are long-lived, so let's keep things simple.

Performance of the above fio benchmark is as follows:

      Before   After
IOPS     60k     97k

Memory usage varies over time as needed by the workload:

            VSZ (KB)             RSS (KB)
Before fio  4705248              843128
During fio  5747668 (+ ~1 GB)    849280
After fio   4694996 (- ~1 GB)    845184

This confirms that coroutines are indeed being freed when no longer
needed.

Thanks to Serge Guelton for working on identifying the bottleneck with
me!

Reported-by: Tingting Mao <timao@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20210913153524.1190696-1-stefanha@redhat.com
Cc: Serge Guelton <sguelton@redhat.com>
Cc: Honghao Wang <wanghonghao@bytedance.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniele Buono <dbuono@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

[Moved atexit notifier to coroutine_delete() after GitLab CI reported a
memory leak in tests/unit/test-aio-multithread because the Coroutine
object was created in the main thread but runs in an IOThread (where
it's also deleted).
--Stefan]


  Commit: 78d98bfbd39d507ffb38f846f60ad63335cd9c83
      https://github.com/qemu/qemu/commit/78d98bfbd39d507ffb38f846f60ad63335cd9c83
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2021-10-21 (Thu, 21 Oct 2021)

  Changed paths:
    A include/qemu/coroutine-pool-timer.h
    M include/qemu/coroutine.h
    M iothread.c
    A util/coroutine-pool-timer.c
    M util/main-loop.c
    M util/meson.build
    M util/qemu-coroutine.c

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

Performance optimization when guest applications submit a lot of parallel I/O.
This has also been found to improve clang SafeStack performance.

# gpg: Signature made Thu 21 Oct 2021 10:40:49 AM PDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]

* remotes/stefanha/tags/block-pull-request:
  coroutine: resize pool periodically instead of limiting size

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


Compare: https://github.com/qemu/qemu/compare/4c127fdbe81d...78d98bfbd39d


