From: piaojun
Subject: Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3
Date: Mon, 5 Aug 2019 10:52:21 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
Hi Stefan,
From my test, 9p shows better bandwidth than virtiofs, as below:
---
9p Test:
# mount -t 9p -o
trans=virtio,version=9p2000.L,rw,nodev,msize=1000000000,access=client 9pshare
/mnt/9pshare
# fio -direct=1 -time_based -iodepth=1 -rw=randwrite -ioengine=libaio -bs=1M
-size=1G -numjob=1 -runtime=30 -group_reporting -name=file
-filename=/mnt/9pshare/file
file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=1
fio-2.13
Starting 1 process
file: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/1091MB/0KB /s] [0/1091/0 iops] [eta
00m:00s]
file: (groupid=0, jobs=1): err= 0: pid=6187: Mon Aug 5 17:55:44 2019
write: io=35279MB, bw=1175.1MB/s, iops=1175, runt= 30001msec
slat (usec): min=589, max=4211, avg=844.04, stdev=124.04
clat (usec): min=1, max=24, avg= 2.53, stdev= 1.16
lat (usec): min=591, max=4214, avg=846.57, stdev=124.14
clat percentiles (usec):
| 1.00th=[ 2], 5.00th=[ 2], 10.00th=[ 2], 20.00th=[ 2],
| 30.00th=[ 2], 40.00th=[ 2], 50.00th=[ 2], 60.00th=[ 3],
| 70.00th=[ 3], 80.00th=[ 3], 90.00th=[ 3], 95.00th=[ 3],
| 99.00th=[ 4], 99.50th=[ 13], 99.90th=[ 18], 99.95th=[ 20],
| 99.99th=[ 22]
lat (usec) : 2=0.04%, 4=98.27%, 10=1.15%, 20=0.48%, 50=0.06%
cpu : usr=23.83%, sys=5.24%, ctx=105843, majf=0, minf=9
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=35279/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
---
---
virtiofs Test:
# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/mnt/virtiofs/ -o
cache=none
# mount -t virtio_fs myfs /mnt/virtiofs -o rootmode=040000,user_id=0,group_id=0
# fio -direct=1 -time_based -iodepth=1 -rw=randwrite -ioengine=libaio -bs=1M
-size=1G -numjob=1 -runtime=30 -group_reporting -name=file
-filename=/mnt/virtiofs/file
file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=1
fio-2.13
Starting 1 process
file: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/895.1MB/0KB /s] [0/895/0 iops] [eta
00m:00s]
file: (groupid=0, jobs=1): err= 0: pid=6046: Mon Aug 5 17:54:58 2019
write: io=23491MB, bw=801799KB/s, iops=783, runt= 30001msec
slat (usec): min=93, max=390, avg=233.40, stdev=64.22
clat (usec): min=849, max=4083, avg=1039.32, stdev=178.98
lat (usec): min=971, max=4346, avg=1272.72, stdev=200.34
clat percentiles (usec):
| 1.00th=[ 972], 5.00th=[ 980], 10.00th=[ 988], 20.00th=[ 988],
| 30.00th=[ 996], 40.00th=[ 1004], 50.00th=[ 1012], 60.00th=[ 1012],
| 70.00th=[ 1020], 80.00th=[ 1032], 90.00th=[ 1032], 95.00th=[ 1384],
| 99.00th=[ 1560], 99.50th=[ 1768], 99.90th=[ 3664], 99.95th=[ 4016],
| 99.99th=[ 4048]
lat (usec) : 1000=37.21%
lat (msec) : 2=62.39%, 4=0.34%, 10=0.06%
cpu : usr=15.39%, sys=4.03%, ctx=23496, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=23491/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
---
The backend filesystem is ext4 on a ramdisk, and iostat shows that 9p drives a
deeper queue depth than virtiofs. I then checked the code and found that 9p
uses pwritev() while virtiofs uses pwrite(). I wonder whether virtiofs could
also use an iovec (pwritev) to improve its performance, roughly along the
lines of the sketch below.
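A minimal sketch of the idea (the struct and helper names below are
hypothetical, not the actual virtiofsd or 9p code): build an iovec over the
request's data segments and submit them with a single pwritev() call instead
of copying everything into one buffer and calling pwrite():

/*
 * Sketch only: gather the request's data segments into an iovec and issue
 * one pwritev() per batch instead of a copy + pwrite().
 */
#include <sys/uio.h>    /* struct iovec, pwritev */
#include <unistd.h>
#include <errno.h>

struct seg {            /* hypothetical: one data segment of a write request */
    void   *base;
    size_t  len;
};

static ssize_t write_segs(int fd, off_t offset,
                          const struct seg *segs, int nsegs)
{
    ssize_t total = 0;

    while (nsegs > 0) {
        struct iovec iov[64];
        int cnt = nsegs < 64 ? nsegs : 64;   /* 64 is well under Linux's IOV_MAX (1024) */

        for (int i = 0; i < cnt; i++) {
            iov[i].iov_base = segs[i].base;
            iov[i].iov_len  = segs[i].len;
        }

        ssize_t n = pwritev(fd, iov, cnt, offset);
        if (n < 0)
            return total > 0 ? total : -errno;

        total  += n;
        offset += n;
        /* Assumes full writes for brevity; real code must resume inside a
         * partially written segment after a short write. */
        segs   += cnt;
        nsegs  -= cnt;
    }

    return total;
}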
I'd like to help contribute the patch in the future.
Thanks,
Jun
On 2019/8/2 0:54, Stefan Hajnoczi wrote:
> This patch series introduces the virtiofsd --thread-pool-size=NUM option and sets the
> default value to 64. Each virtqueue has its own thread pool for processing
> requests. Blocking requests no longer pause virtqueue processing and I/O
> performance should be greatly improved when the queue depth is greater than 1.
>
> Linux boot and pjdfstest have been tested with these patches and the default
> thread pool size of 64.
>
> I have now concluded the thread-safety code audit. Please let me know if you
> have concerns about things I missed.
>
> Performance
> -----------
> Please try these patches out and share your results.
>
> Scalability
> -----------
> There are several synchronization primitives that are taken by the virtqueue
> processing thread or the thread pool worker. Unfortunately this is necessary
> to protect shared state. It means that thread pool workers contend on or at
> least access thread synchronization primitives. If anyone has suggestions for
> improving this situation, please discuss.
>
> 1. vu_dispatch_rwlock - protects libvhost-user from races between the
> vhost-user protocol thread and the virtqueue processing and thread pool
> worker threads.
>
> 2. vq_lock - protects the virtqueue from races between the virtqueue
> processing thread and thread pool workers.
>
> 3. init_rwlock - protects FUSE_INIT/FUSE_DESTROY from races with other
> requests.
>
> 4. se->lock - protects se->list and the FUSE_INTERRUPT shared state.
>
> Ideally we could avoid hitting all of these locks on each request. That would
> make the code scale better.
>
> Future work
> -----------
> This series does not complete the multithreading effort yet. Two items are
> still missing:
> 1. Multiqueue support
> 2. CPU affinity for virtqueue processing threads and thread pools
>
> Stefan Hajnoczi (4):
> virtiofsd: process requests in a thread pool
> virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
> virtiofsd: fix lo_destroy() resource leaks
> virtiofsd: add --thread-pool-size=NUM option
>
> contrib/virtiofsd/fuse_i.h | 2 +
> contrib/virtiofsd/fuse_lowlevel.c | 25 +-
> contrib/virtiofsd/fuse_virtio.c | 491 ++++++++++++++++-------------
> contrib/virtiofsd/passthrough_ll.c | 43 ++-
> contrib/virtiofsd/seccomp.c | 1 +
> 5 files changed, 318 insertions(+), 244 deletions(-)
>
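For my own illustration (not code from the patches; the names here are
hypothetical), the per-virtqueue thread pool described in the cover letter
above could look roughly like this with GLib's GThreadPool:

#include <glib.h>

#define THREAD_POOL_SIZE 64          /* default mentioned in the cover letter */

struct vq_info {                     /* hypothetical per-virtqueue state */
    int          qidx;
    GThreadPool *pool;
};

/* Worker callback: each popped request is processed on a pool thread, so a
 * blocking request stalls only its own worker, not virtqueue processing. */
static void handle_request(gpointer req, gpointer user_data)
{
    struct vq_info *vq = user_data;

    (void)req;
    (void)vq;                        /* process one FUSE request here */
}

static struct vq_info *vq_start(int qidx)
{
    struct vq_info *vq = g_new0(struct vq_info, 1);

    vq->qidx = qidx;
    vq->pool = g_thread_pool_new(handle_request, vq,
                                 THREAD_POOL_SIZE, FALSE, NULL);
    return vq;
}

static void vq_stop(struct vq_info *vq)
{
    /* Let queued requests finish before freeing the pool. */
    g_thread_pool_free(vq->pool, FALSE, TRUE);
    g_free(vq);
}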