qemu-ppc

Re: About hardfloat in ppc


From: BALATON Zoltan
Subject: Re: About hardfloat in ppc
Date: Thu, 30 Apr 2020 22:17:45 +0200 (CEST)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Thu, 30 Apr 2020, Alex Bennée wrote:
BALATON Zoltan <address@hidden> writes:
On Tue, 28 Apr 2020, Alex Bennée wrote:
罗勇刚(Yonggang Luo) <address@hidden> writes:
I am confused about why we can use hard-float only when inexact is already set.

The inexact behaviour of the host hardware may be different from the
guest architecture we are trying to emulate and the host hardware may
not be configurable to emulate the guest mode.

Have a look in softfloat.c and see all the places where
float_flag_inexact is set. Can you convince yourself that the host
hardware will do the same?

Can you convince me that it won't? This all seems to be guessing
without evidence so I think what we need first is some tests to prove
it either way. Such tests could then also be used at runtime to decide
if the host and guest FPU are compatible enough to enable hardfloat.
Are such tests available somewhere or what would need to be done to
implement them?

I seem to recall it comes down to the various approaches that FPUs can
take when dealing with tiny numbers when rounding. Emilio did the
original work so I've CC'd him. The original paper is referenced in the
hardfloat commentary:

Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.

which is worth a read if you can get hold of it.

Running tests on start up is not without precedent. We have a
softfloat_init which checks for a broken FMA implementation. However I'd
caution about adding too many checks in there.

Sure, the runtime check should be quick, so the likely approach would be
to write detailed tests to profile different FPU implementations and then
include only one quick check to tell at runtime whether we're running on
a known good host. Maybe someone who knows the different FPUs can tell
this without tests, but I don't, and finding out from docs seems more
work than determining it empirically by testing. Does someone have some
hints on what operations should be tested to check for different inexact
handling in different FPUs?

This may not solve the problem with PPC target with non-cumulative
status bits but could improve hardfloat performance at least for some
host-guest combinations. To see if it is worth the effort we should run
such tests on common combinations (say x86_64, ARM and PPC hosts with
at least these guests).

We already enable hardfloat for all hosts apart from PPC and FAST_MATHS.

Only if inexact is set, which may be common, but not using softfloat at
all when the host's implementation is good enough for the guest could be
even faster.

And PPC always clears the inexact flag before calling the soft-float
functions, so we cannot optimize it with hard-float.
I need some resources about the inexact flag and why it is always cleared
in the PPC FP simulation.

Because that is the behaviour of the PPC floating point unit. The
inexact flag will represent the last operation done.

More precisely, in addition to the usual cumulative (or sticky) bits
there are two non-sticky bits for inexact and rounded (the latter of
which is not emulated), which currently require clearing the FP status
before every FP op.
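To make the interaction concrete, here is a toy model of the two kinds of inexact bit and of the "fast path only while inexact is already set" rule discussed in this thread. All names are hypothetical, and the op_is_inexact argument stands in for what a soft-float routine would report; no real FP arithmetic is done. It is a sketch of the argument, not of QEMU's code.

```c
#include <stdbool.h>

/* Hypothetical model of the two inexact bits discussed above. */
struct fp_status_model {
    bool sticky_inexact;   /* cumulative: stays set once any op was inexact */
    bool last_op_inexact;  /* non-sticky: describes only the last operation */
    unsigned host_ops;     /* ops that could have used the host FPU */
    unsigned soft_ops;     /* ops that had to go through soft-float */
};

/* The fast path is allowed only while the working inexact flag is set. */
static bool can_use_host(bool working_inexact)
{
    return working_inexact;
}

/* Target with only a sticky bit: the flag carries over between ops. */
void sticky_only_op(struct fp_status_model *s, bool op_is_inexact)
{
    if (can_use_host(s->sticky_inexact)) {
        s->host_ops++;                 /* flag already set, nothing to track */
    } else {
        s->soft_ops++;
        s->sticky_inexact |= op_is_inexact;
    }
}

/* PPC-like target: the working flag is cleared before every op so that
 * the non-sticky bit can be derived from it afterwards. */
void ppc_like_op(struct fp_status_model *s, bool op_is_inexact)
{
    bool working = false;              /* cleared before every op */

    if (can_use_host(working)) {
        s->host_ops++;                 /* never reached: working is false */
    } else {
        s->soft_ops++;
        working |= op_is_inexact;
    }
    s->last_op_inexact = working;      /* non-sticky bit */
    s->sticky_inexact |= working;      /* sticky bit */
}
```

Feeding the same sequence of operations to both functions shows the sticky-only model switching to the host path after the first inexact result, while the PPC-like model stays on the soft path for every operation.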

Thanks for the clarification.

I wonder if we can know when the guest reads these and rerun
the last FP op in softfloat to compute them only if they are read;
then it's enough to remember the last FP op. This could be relatively
simple and may be usable even if we don't detect access to the individual
bits within FPSCR but just any access to the FPSCR, as most guest code
likely does not check them and cross-platform code won't check
PPC-specific non-sticky bits, so I'd expect most guest code to be fine
with hardfloat.
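The "remember the last op, recompute on read" idea can be sketched as follows for addition only. The names are hypothetical. As a stand-in for rerunning the operation in softfloat, the sketch uses Knuth's TwoSum, which recovers the rounding error of a double addition exactly, assuming round-to-nearest, no overflow, finite inputs, and arithmetic evaluated in double precision (no fast-math, no x87 extended precision). A real implementation would call the soft-float routine instead.

```c
#include <stdbool.h>

/* Hypothetical record of the most recent FP operation (addition only). */
struct last_fp_op {
    bool valid;
    double a, b;
};

/* Fast path: compute on the host and remember the operands; no flag work. */
double fast_add(struct last_fp_op *last, double a, double b)
{
    last->valid = true;
    last->a = a;
    last->b = b;
    return a + b;
}

/*
 * Stand-in for rerunning the op in softfloat: TwoSum yields the exact
 * rounding error of a + b, so a non-zero error means the sum was inexact.
 * volatile keeps intermediate values rounded to double.
 */
static bool add_is_inexact(double a, double b)
{
    volatile double s = a + b;
    volatile double bv = s - a;
    volatile double av = s - bv;
    double err = (a - av) + (b - bv);

    return err != 0.0;
}

/* Called only when the guest reads the FPSCR: derive the non-sticky bit. */
bool read_last_op_inexact(const struct last_fp_op *last)
{
    return last->valid && add_is_inexact(last->a, last->b);
}
```

The cost of the flag computation is paid only on an FPSCR read, so guest code that never looks at the status register runs every addition on the fast path.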

You could go further if you know nothing in a block can fault you can
skip the calculation overhead of the per-op flags for all but the last
op in the block.

I think that's an additional optimisation that could be done once the
simple case of just rerunning the last op when flags are accessed works.
Just to keep complexity low at first, then try a more complex solution.
(Although I'm not planning to do this myself, so whatever complexity
whoever implements it can handle is fine, but less complexity means
fewer bugs, so I'd go for simple first.)

Although what about FP exceptions? We also need to revert
to softfloat if FP exceptions are enabled, so maybe using host FP
exceptions for managing status bits could be the way to go: let the
hardware manage this so we don't need to implement everything in
software.

Well for all apart from inexact handling (which would fault as soon as
set) all other exception types are detected before we pass them to
hardfloat anyway. Given the range of NaN types we would have to post
process any hardfloat operation anyway to give the right NaN.

Is checking for those exceptions beforehand really needed? Wouldn't it
be easier to install an exception handler and let the hardware do those
checks? If this is again done because of FPU implementation differences,
but inexact is determined by looking at the FP status (that's why it's
cleared on PPC), then we always use the host's inexact semantics and
don't emulate the guest correctly anyway, so we can skip the tests above.
Then why can't we install an exception handler and set guest bits
whenever that's raised?

Regards,
BALATON Zoltan
