qemu-trivial

Re: [PATCH] bitops.h: Compile out asserts without --enable-debug


From: BALATON Zoltan
Subject: Re: [PATCH] bitops.h: Compile out asserts without --enable-debug
Date: Mon, 22 May 2023 14:00:45 +0200 (CEST)

On Mon, 22 May 2023, Peter Maydell wrote:
> On Sat, 20 May 2023 at 21:55, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
>> The low-level extract and deposit functions provided by bitops.h are
>> used in performance-critical places. They have crept into target/ppc via
>> FIELD_EX64 and are also used by softfloat, so PPC code that uses the FPU
>> heavily, where hardfloat is also disabled, is doubly affected.
>>
>> Normally asserts should be compiled out of release builds with
>> -DNDEBUG, but that cannot be done in QEMU because some places still
>> rely on asserts instead of proper error checking. To resolve this,
>> compile out the asserts in the deposit/extract functions in optimised
>> builds, which improves performance for target/ppc and possibly other
>> targets too.
>
> Can we have some figures for the performance improvements,
> please? General QEMU policy is that asserts remain,
> even in non-debug builds, so exceptions from that policy
> should come with justification with figures attached.
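For context, the helpers under discussion look roughly like the following minimal sketch, modeled on the extract64()/deposit64() pair in bitops.h and assuming their usual range check on start and length. The bitops_assert macro and the BITOPS_DEBUG switch are hypothetical, shown only to illustrate what compiling the checks out of non-debug builds means; they are not the mechanism the actual patch uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical switch (not QEMU's): build with -DBITOPS_DEBUG to keep the
 * range checks. Without it, bitops_assert() expands to nothing, so the
 * helpers reduce to plain shifts and masks. Note that assert() itself is
 * also removed whenever NDEBUG is defined.
 */
#ifdef BITOPS_DEBUG
#define bitops_assert(cond) assert(cond)
#else
#define bitops_assert(cond) ((void)0)
#endif

/* Return the length-bit field of value that starts at bit start. */
static inline uint64_t extract64(uint64_t value, int start, int length)
{
    bitops_assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (UINT64_MAX >> (64 - length));
}

/* Return value with its length-bit field at bit start replaced by fieldval. */
static inline uint64_t deposit64(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    bitops_assert(start >= 0 && length > 0 && length <= 64 - start);
    uint64_t mask = (UINT64_MAX >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}
```

With BITOPS_DEBUG undefined, extract64(0xABCD, 4, 8) is just a shift and a mask yielding 0xBC; defining it adds a compare-and-branch to every call, which is the kind of per-call cost that can add up in hot paths such as softfloat.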

Here are some figures from converting a 10 MB WAV file to MP3 with lame on AmigaOS running on the pegasos2 machine. This workload uses a lot of FPU operations, which on TCG target/ppc go through softfloat because hardfloat is disabled, so it is very slow:

          1st run          2nd run
   8.0:   1:11  0.8264x    1:11  0.8258x
master:   1:12  0.8117x    1:12  0.8103x
 patch:   1:02  0.9541x    1:02  0.9506x
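The relative gains can be worked out directly from these figures. A small sketch of the arithmetic, where the helper names to_seconds and pct_change are illustrative only:

```c
/* Convert a minutes:seconds reading from the table to seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Percentage change going from before to after (negative means a decrease). */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Comparing the first master run with the first patched run, time drops from 72 s to 62 s, so pct_change(72, 62) is about -13.9%, and speed rises from 0.8117x to 0.9541x, so pct_change(0.8117, 0.9541) is about +17.5%. Against 8.0 the speed gain is about 15.5%.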

The numbers are elapsed time (minutes:seconds) and speed relative to real-time playback (lame reports this as play/CPU). I did two runs: the first right after booting the guest and the second right after the first. Although the second run should benefit from warmer caches and less compilation overhead, the first runs seem to be slightly faster.

Real hardware reaches about 11x, although that is with Altivec, which does not help much on QEMU. So for FPU operations we are still far behind even on an Intel Core i7 at 3.6 GHz, whereas integer operations fare much better. I have received similar reports from Apple silicon hardware running macOS too.

Eventually fixing hardfloat for target/ppc would help even more, but that is independent of this change, and the gain here is significant enough to justify removing this overhead now. I have never seen these asserts fire, and they are unlikely to depend on run-time values, so keeping them in non-debug builds seems like unnecessary overkill that also hurts performance.

Regards,
BALATON Zoltan


