
Re: [PATCH 0/2] support block encryption/decryption in parallel


From: Kevin Wolf
Subject: Re: [PATCH 0/2] support block encryption/decryption in parallel
Date: Thu, 16 Jan 2025 13:37:44 +0100

On 13.12.2024 at 16:56, Daniel P. Berrangé wrote:
> On Thu, Nov 28, 2024 at 06:51:20PM +0800, tugy@chinatelecom.cn wrote:
> > From: Guoyi Tu <tugy@chinatelecom.cn>
> > 
> > Currently, disk I/O encryption and decryption operations are performed
> > sequentially in the main thread or IOThread. When the number of I/O
> > requests increases, this becomes a performance bottleneck.
> > 
> > To address this issue, this patch uses a thread pool to perform I/O
> > encryption and decryption in parallel, improving overall efficiency.
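
(Aside, for readers without the series at hand: the idea is roughly the
following. This is a simplified, self-contained sketch with placeholder
names, where encrypt_sector() stands in for the real cipher call; the
actual series uses a thread pool rather than spawning threads per
request, so treat this purely as an illustration of the fan-out.)

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NWORKERS 4
    #define SECTOR   512
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* Placeholder for the real per-sector cipher call. */
    static void encrypt_sector(uint8_t *buf, uint64_t sector)
    {
        for (size_t i = 0; i < SECTOR; i++) {
            buf[i] ^= (uint8_t)(sector + i);
        }
    }

    struct job {
        uint8_t *buf;       /* first sector handled by this worker */
        uint64_t first;     /* its sector number */
        size_t nsectors;    /* consecutive sectors to process */
    };

    static void *worker(void *opaque)
    {
        struct job *j = opaque;

        for (size_t i = 0; i < j->nsectors; i++) {
            encrypt_sector(j->buf + i * SECTOR, j->first + i);
        }
        return NULL;
    }

    /* Encrypt nsectors starting at 'first', split across NWORKERS threads. */
    static void encrypt_parallel(uint8_t *buf, uint64_t first, size_t nsectors)
    {
        pthread_t tid[NWORKERS];
        struct job jobs[NWORKERS];
        size_t per = (nsectors + NWORKERS - 1) / NWORKERS;

        for (int w = 0; w < NWORKERS; w++) {
            size_t off = (size_t)w * per;
            size_t n = off < nsectors ? MIN(per, nsectors - off) : 0;

            jobs[w] = (struct job){ buf + off * SECTOR, first + off, n };
            pthread_create(&tid[w], NULL, worker, &jobs[w]);
        }
        for (int w = 0; w < NWORKERS; w++) {
            pthread_join(tid[w], NULL);
        }
    }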
> 
> We already have support for parallel encryption through use of IO threads
> since approximately this commit:
> 
>   commit af206c284e4c1b17cdfb0f17e898b288c0fc1751
>   Author: Stefan Hajnoczi <stefanha@redhat.com>
>   Date:   Mon May 27 11:58:50 2024 -0400
> 
>     block/crypto: create ciphers on demand
>     
>     Ciphers are pre-allocated by qcrypto_block_init_cipher() depending on
>     the given number of threads. The -device
>     virtio-blk-pci,iothread-vq-mapping= feature allows users to assign
>     multiple IOThreads to a virtio-blk device, but the association between
>     the virtio-blk device and the block driver happens after the block
>     driver is already open.
>     
>     When the number of threads given to qcrypto_block_init_cipher() is
>     smaller than the actual number of threads at runtime, the
>     block->n_free_ciphers > 0 assertion in qcrypto_block_pop_cipher() can
>     fail.
>     
>     Get rid of qcrypto_block_init_cipher()'s n_threads argument and allocate
>     ciphers on demand.
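
Concretely, the pattern that commit leaves us with is something like this
(a simplified sketch, not the actual crypto/block.c code; Cipher,
cipher_new(), cipher_free() and MAX_FREE are placeholders):

    #include <pthread.h>

    #define MAX_FREE 64                      /* arbitrary cap for the sketch */

    typedef struct Cipher Cipher;            /* placeholder cipher handle */
    Cipher *cipher_new(void);                /* placeholder constructor */
    void cipher_free(Cipher *c);             /* placeholder destructor */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static Cipher *free_list[MAX_FREE];
    static unsigned n_free;

    /* Reuse a free cipher if one exists, otherwise create one on demand.
     * No up-front count of threads is needed: the pool simply grows to
     * match however many threads actually run. */
    static Cipher *pop_cipher(void)
    {
        Cipher *c = NULL;

        pthread_mutex_lock(&lock);
        if (n_free > 0) {
            c = free_list[--n_free];
        }
        pthread_mutex_unlock(&lock);

        return c ? c : cipher_new();
    }

    /* Return a cipher to the free list; drop it if the list is full. */
    static void push_cipher(Cipher *c)
    {
        pthread_mutex_lock(&lock);
        if (n_free < MAX_FREE) {
            free_list[n_free++] = c;
            c = NULL;
        }
        pthread_mutex_unlock(&lock);

        if (c) {
            cipher_free(c);
        }
    }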
> 
> 
> Say we have QEMU pinned to 4 host CPUs, and we've set up 4 IO threads
> for the disk, then encryption can max out 4 host CPUs' worth of resources.

This is a lot of "if"s. Even the fact that it requires explicit
configuration and doesn't work out of the box is, for me, a strong
argument for having something that works by default (like a thread
pool).

You're assuming that it's even possible to set up 4 iothreads which share
the load evenly. That's not a given at all. The only device that can
even make use of more than one iothread is virtio-blk. (And if we're
looking at all devices that exist in QEMU, most devices can't even make
use of a single iothread!) But if you do have a virtio-blk device, then
that setup means one iothread per queue. In a Linux guest, if all I/O
comes from a single CPU, then it will use the same queue and we'll have
three idle iothreads and one that is overloaded.

So in order to achieve a similar effect with iothreads, you must be
using virtio-blk, you must explicitly configure four iothreads and four
mappings of virtqueues to iothreads, and you must run a workload in the
guest that performs I/O from four different threads running on different
CPUs.
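
Concretely, that configuration looks roughly like this (illustrative
only; the blockdev options are elided, and the JSON form of -device is
used because iothread-vq-mapping takes a list):

    qemu-system-x86_64 ... \
      -object iothread,id=iot0 -object iothread,id=iot1 \
      -object iothread,id=iot2 -object iothread,id=iot3 \
      -blockdev ...,node-name=disk0 \
      -device '{"driver": "virtio-blk-pci", "drive": "disk0", "num-queues": 4,
                "iothread-vq-mapping": [{"iothread": "iot0"}, {"iothread": "iot1"},
                                        {"iothread": "iot2"}, {"iothread": "iot3"}]}'

And even then, the guest workload still has to spread its I/O across all
four queues for the four iothreads to see comparable load.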

There are certainly good use cases for iothreads, but with this number
of preconditions, I don't think we can assume that they make it
unnecessary to parallelise things in other ways, too.

> How is this new proposed way to use a thread pool going to do better
> than that in an apples-to-apples comparison? I.e. allow the same number
> of host CPUs for both. The fundamental limit is still the AES performance
> of the host CPU(s) that you allow QEMU to execute work on. If the thread
> pool is allowed to use 4 host CPUs, it shouldn't be significantly different
> from allowing use of 4 host CPUs for I/O threads, surely?
> 
> Having multiple different ways to support parallel encryption is not
> ideal. If there's something I/O threads can't do optimally right
> now, is it practical to make them work better?

The limitations inside the guest obviously can't be changed by QEMU.

We could in theory add iothread support to more devices, though if they
don't have a concept of multiple queues that could be processed by
different threads, it's pretty pointless (apart from working around
limitations in the backends like you're suggesting here).

And of course, the most interesting improvement would be solving the
out-of-the-box aspect. This is far from trivial, because the optimal
configuration really depends on your workload, and nothing on the host
can know automatically what will eventually run in the guest. So it will
be tough to find defaults that improve this case without also hurting
other cases.

Kevin



