qemu-devel

Re: [PATCH v4 00/10] Optimize buffer_is_zero


From: Alexander Monakov
Subject: Re: [PATCH v4 00/10] Optimize buffer_is_zero
Date: Fri, 16 Feb 2024 02:37:46 +0300 (MSK)

On Thu, 15 Feb 2024, Richard Henderson wrote:

> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and
> > v2, are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmromanov@ispras.ru/
> > https://lore.kernel.org/qemu-devel/20231027143704.7060-1-mmromanov@ispras.ru/
> 
> I'm saying that this is not a reproducible description of methodology.
> 
> With master, so with neither of our changes:
> 
> I tried converting an 80G win7 image that I happened to have lying about, I
> see buffer_zero_avx2 with only 3.03% perf overhead.  Then I tried truncating
> the image to 16G to see if having the entire image in ram would help -- not
> yet, still only 3.4% perf overhead.  Finally, I truncated the image to 4G and
> saw 2.9% overhead.
> 
> So... help me out here.  I would like to be able to see results that are at
> least vaguely similar.

Ah, I guess you might be running with a low perf_event_paranoid setting that
allows unprivileged sampling of kernel events? In our submissions the
percentage was for perf_event_paranoid=2, i.e. relative to QEMU user time only,
excluding kernel time spent in syscalls.
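
To check which case applies on a given machine, a minimal sketch along these
lines may help. It assumes a Linux host exposing
/proc/sys/kernel/perf_event_paranoid and an unprivileged user (root or
CAP_PERFMON is not subject to the restriction); the function names are
hypothetical, not part of perf or QEMU.

```python
from pathlib import Path

# Standard Linux sysctl location; assumed to exist on the profiling host.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def kernel_samples_allowed(level: int) -> bool:
    """Return True if an unprivileged user may profile kernel code.

    Levels of 2 and above restrict unprivileged users to user-space
    measurements, so perf percentages are relative to user time only.
    """
    return level < 2


def read_paranoid_level(path: Path = PARANOID_PATH):
    """Read the current setting, or return None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid not readable on this system")
    elif kernel_samples_allowed(level):
        print(f"level {level}: percentages include kernel samples")
    else:
        print(f"level {level}: percentages are relative to user time only")
```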

Retrieve IE11.Win7.VirtualBox.zip from
https://archive.org/details/ie11.win7.virtualbox
and use

  unzip -p IE11.Win7.VirtualBox.zip | tar xv

to extract 'IE11 - Win7-disk001.vmdk'.

(Mikhail used a different image when preparing the patch)

On this image, I get 70% in buffer_zero_sse2 on a Sandy Bridge running

  qemu-img convert 'IE11 - Win7-disk001.vmdk' -O qcow2 /tmp/t.qcow2

user:kernel time is about 0.15:2.3, so 70% relative to user time does
roughly correspond to a single-digit percentage relative to (user+kernel) time.
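
The conversion can be spelled out as a small calculation. This is only a
sketch using the approximate figures quoted above; share_of_total is a
hypothetical helper name, and the result is as rough as the inputs.

```python
def share_of_total(user_share: float, user_time: float, kernel_time: float) -> float:
    """Convert a share of user time into a share of (user + kernel) time."""
    return user_share * user_time / (user_time + kernel_time)


# Figures quoted above: 70% of user-space samples, user:kernel about 0.15:2.3.
total_share = share_of_total(0.70, 0.15, 2.3)
print(f"{total_share:.1%}")  # prints "4.3%"
```

That is 0.70 * 0.15 / (0.15 + 2.3), a bit over 4% of the combined time.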

(which does tell us that qemu-img is doing I/O inefficiently: it shouldn't
need two seconds to read a fully cached 5 GB file)

Alexander


