Re: [PATCH v4 00/10] Optimize buffer_is_zero
From: Alexander Monakov
Subject: Re: [PATCH v4 00/10] Optimize buffer_is_zero
Date: Fri, 16 Feb 2024 02:37:46 +0300 (MSK)

On Thu, 15 Feb 2024, Richard Henderson wrote:
> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1
> > and v2, are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmromanov@ispras.ru/
> > https://lore.kernel.org/qemu-devel/20231027143704.7060-1-mmromanov@ispras.ru/
>
> I'm saying that this is not a reproducible description of methodology.
>
> With master, so with neither of our changes:
>
> I tried converting an 80G win7 image that I happened to have lying about, I
> see buffer_zero_avx2 with only 3.03% perf overhead. Then I tried truncating
> the image to 16G to see if having the entire image in ram would help -- not
> yet, still only 3.4% perf overhead. Finally, I truncated the image to 4G and
> saw 2.9% overhead.
>
> So... help me out here. I would like to be able to see results that are at
> least vaguely similar.

Ah, I guess you might be running with a low perf_event_paranoid setting that
allows unprivileged sampling of kernel events? In our submissions the
percentage was for perf_event_paranoid=2, i.e. relative to QEMU only,
excluding kernel time spent in syscalls.

Retrieve IE11.Win7.VirtualBox.zip from

    https://archive.org/details/ie11.win7.virtualbox

and use

    unzip -p IE11.Win7.VirtualBox.zip | tar xv

to extract 'IE11 - Win7-disk001.vmdk'.

(Mikhail used a different image when preparing the patch.)

On this image, I get 70% in buffer_zero_sse2 on a Sandy Bridge running

    qemu-img convert 'IE11 - Win7-disk001.vmdk' -O qcow2 /tmp/t.qcow2

The user:kernel time is about 0.15:2.3, so 70% relative to user time does
roughly correspond to a single-digit percentage relative to (user+kernel) time.

(This also tells us that qemu-img is doing I/O inefficiently: it shouldn't
need two seconds to read a fully cached 5 Gigabyte file.)

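To make the conversion between the two ways of reporting explicit, here is a minimal sketch of the arithmetic. It assumes the 0.15 and 2.3 figures are seconds of user and kernel time, and that the 70% share applies to user time only; the function name share_of_total is made up for illustration.

```python
def share_of_total(user_s, kernel_s, share_of_user):
    """Rescale a profile share measured against user time only
    into a share of combined user + kernel time."""
    return share_of_user * user_s / (user_s + kernel_s)

# Figures quoted above (assumed to be seconds).
user_s, kernel_s = 0.15, 2.3

# 70% of user time attributed to buffer_zero_sse2.
total_share = share_of_total(user_s, kernel_s, 0.70)
print(f"{total_share:.1%}")  # prints 4.3%
```

That lands in the same range as the roughly 3% seen when kernel samples are included in the profile.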
Alexander