
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC


From: BALATON Zoltan
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Date: Tue, 3 Mar 2020 00:01:44 +0100 (CET)
User-agent: Alpine 2.22 (BSF 395 2020-01-19)

On Mon, 2 Mar 2020, Alex Bennée wrote:
BALATON Zoltan <address@hidden> writes:
On Sun, 1 Mar 2020, Richard Henderson wrote:
On 3/1/20 4:13 PM, Programmingkid wrote:
Ok, I was just looking at Intel's x87 chip documentation. It
supports IEEE 754 floating point operations and exception flags.
This leads me to this question. Would simply taking the host
exception flags and using them to set the PowerPC's FPU's flag be
an acceptable solution to this problem?

In my understanding that's what is currently done. The problem with PPC,
as Richard said, is the non-sticky versions of some of these bits, which
require clearing the host FP exception status before every FPU op and
checking it again afterwards, and that seems to be expensive and slower
than using softfloat. So to use hardfloat we either accept that we can't
emulate these bits, or we need to do something other than clearing and
checking the flags around every FPU op.
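
To make the cost concrete, the per-op scheme described above would look
roughly like this in standard fenv.h terms (just a sketch with made-up
names, not a claim about the actual QEMU code):

#include <fenv.h>

#define GUEST_FI 0x1u  /* made-up stand-in for the guest's per-op inexact bit */

/* Sketch only: emulate one guest add while recreating a non-sticky,
 * per-op status bit.  Clearing and re-reading the host flags around
 * every single operation is what makes this approach expensive.
 * (Real code would also need FENV_ACCESS or equivalent compiler flags
 * so the add is not reordered around the fenv calls.) */
static double emulate_fadd(double a, double b, unsigned *guest_flags)
{
    feclearexcept(FE_ALL_EXCEPT);          /* reset host flags         */
    double r = a + b;                      /* the actual hardware op   */
    if (fetestexcept(FE_INEXACT)) {        /* read them back           */
        *guest_flags |= GUEST_FI;
    } else {
        *guest_flags &= ~GUEST_FI;         /* FI is per-op, not sticky */
    }
    /* ...map the other raised host bits to the guest FPSCR here...    */
    return r;
}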

While not emulating these bits doesn't seem to matter for most clients,
and other PPC emulations got away with it, QEMU prefers accuracy over
speed even for rarely used features.

No.

The primary issue is the FPSCR.FI flag.  This is not an accumulative bit, per
ieee754, but per operation.

The "hardfloat" option works (with other targets) only with ieee745
accumulative exceptions, when the most common of those exceptions, inexact, has
already been raised.  And thus need not be raised a second time.
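
In code terms, the generic hardfloat fast path is gated on roughly the
following condition (a sketch of the idea with invented names, not the
exact fpu/softfloat.c source):

#include <stdbool.h>

/* Made-up stand-ins for the softfloat status fields. */
#define FLAG_INEXACT        0x1u
#define ROUND_NEAREST_EVEN  0

/* Sketch: only take the hardware path when no information can be lost.
 * If inexact is already sticky-set and we are in the default rounding
 * mode, an op whose only effect would be to raise inexact again changes
 * nothing, so the host flags never have to be read. */
static bool can_use_hardfloat(unsigned accumulated_flags, int rounding_mode)
{
    return (accumulated_flags & FLAG_INEXACT) &&
           rounding_mode == ROUND_NEAREST_EVEN;
}

For PPC this is exactly what breaks down: even when the sticky bit is
already set, FI still has to be recomputed for the current op.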

Why exactly is it done that way? What are the differences between IEEE
FP implementations that prevent using hardfloat most of the time instead
of only using it in some (although supposedly common) special cases?

There are a couple of wrinkles. As far as NaN and denormal behaviour
goes we have enough slack in the spec that different guests have
slightly different behaviour. See pickNaN and friends in the softfloat
specialisation code. As a result we never try to hand off to hardfloat
for NaNs, Infs and Zeros. Luckily testing for those cases is a fairly
small part of the cost of the calculation.
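
Concretely, the hand-off check screens the operand classes first,
something along these lines (my sketch with an invented helper name; the
real checks in fpu/softfloat.c are organised differently):

#include <math.h>
#include <stdbool.h>

/* Sketch: per the above, NaNs, Infs and zeros (and denormals) never take
 * the hardware path, so guest-specific rules like pickNaN() stay in
 * softfloat.  fpclassify() is cheap relative to the whole emulated op. */
static bool operands_ok_for_hardfloat(double a, double b)
{
    return fpclassify(a) == FP_NORMAL && fpclassify(b) == FP_NORMAL;
}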

Also things tend to get unstuck on changes to rounding modes.
Fortunately that doesn't seem to be super common.

OK, but how do these relate to the inexact flag, and why is that the one
that's checked before using hardfloat? Also the rounding mode is
checked, but why can't we set the same mode on the host, and why is
hardfloat only used in one specific rounding mode? These two checks seem
to further limit hardfloat use beyond the above cases, or are they the
same thing?

You can read even more detail in the paper that originally prompted
Emilio's work:

 "supporting the neon and VFP instruction sets in an LLVM-based
  binary translator"
  https://www.thinkmind.org/download.php?articleid=icas_2015_5_20_20033

I've only had a quick look at it, but it seems not to discuss all the
details. They say the ARM instructions they wanted to emulate have some
non-standard flush-to-zero behaviour where exceptions (including
inexact) are handled differently. Is this related to the check above,
and if so, shouldn't it only apply to the ARM target? Other
standards-compliant targets probably should not be limited by this.

They've also found that clearing and reading the host FP flags is
"slower than QEMU", which is what we currently do for PPC. They say the
solution is not to use host exceptions at all but to compute the
exception flags in software, looking at the inputs and the result,
perhaps with additional FP ops that test for the exception cases.
Unfortunately the paper does not describe exactly how that's done, it
just says it may be described later. It sounds like a kind of softfloat
that uses the FPU for the actual calculation and deduces the exceptions
without access to the intermediate results softfloat may rely on. So
they can use the hardware for the calculation, which should be the
largest part, and compute the flags in software. This way they claim a
1.24 to 3.36 times speedup over the QEMU of the time (using only
softfloat I guess, which is still what we have for PPC today).
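
As an illustration of what deducing the flags in software could look
like, here is one textbook way to detect inexact for an addition without
touching the host status flags at all (the classic two-sum error term;
my own sketch, not necessarily what the paper implements):

#include <stdbool.h>

/* Sketch: do the add in hardware, then recover the rounding error with
 * Knuth's two-sum trick.  The operation was inexact exactly when the
 * error term is non-zero.  Assumes round-to-nearest and no overflow;
 * NaN/Inf inputs would have to be screened out beforehand. */
static double hard_add_detect_inexact(double a, double b, bool *inexact)
{
    double s   = a + b;                  /* the hardware result            */
    double bv  = s - a;                  /* part of b that made it into s  */
    double av  = s - bv;                 /* part of a that made it into s  */
    double err = (a - av) + (b - bv);    /* exact rounding error of a + b  */

    *inexact = (err != 0.0);
    return s;
}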

Per the PowerPC architecture, inexact must be recognized afresh for every
operation.  Which is cheap in hardware but expensive in software.

And once you're done with FI, FR has been and continues to be emulated 
incorrectly.
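
For readers not fluent in FPSCR bits: XX is the sticky inexact exception
bit, while FI ("fraction inexact") and FR ("fraction rounded") describe
only the last operation. In emulation terms the difference is roughly
the following (illustrative field names, not QEMU's actual FPSCR
representation):

#include <stdbool.h>

/* Illustrative only: invented layout, not how QEMU stores the FPSCR. */
struct guest_fpscr {
    bool fi;    /* fraction inexact, valid for the last op only  */
    bool fr;    /* fraction rounded/incremented, last op only    */
    bool xx;    /* sticky inexact exception bit                  */
};

static void update_inexact_bits(struct guest_fpscr *f,
                                bool op_inexact, bool op_rounded_up)
{
    f->fi  = op_inexact;       /* recomputed for every instruction  */
    f->fr  = op_rounded_up;    /* likewise per-op (the awkward one) */
    f->xx |= op_inexact;       /* sticky: only ever accumulates     */
}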

I think CPUs can also raise exceptions when they detect the condition
in hardware, so maybe we should install our own FPU exception handler
and set the guest flags from that; then we don't need to check after
every op and won't have a problem with these bits either. Why is that
not possible, or why isn't it done?

One of my original patches did just this:

 Subject: [PATCH] fpu/softfloat: use hardware sqrt if we can (EXPERIMENT!)
 Date: Tue, 20 Feb 2018 21:01:37 +0000
 Message-Id: <address@hidden>

It's this patch:
http://patchwork.ozlabs.org/patch/875764/

This at least shows where to hook in FP exception handling, but based on
the above paper maybe that's not the best solution after all. It may be
worth a try anyway in case it turns out to be simpler than what they
did.
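
For the record, the trap-based idea would look roughly like the sketch
below: unmask inexact on the host, and when the trap fires, record the
guest bit and bail out of the hardware path for that one op. This is
only a sketch under several assumptions: feenableexcept() is a glibc
extension, siglongjmp out of the handler avoids re-executing the
faulting instruction, a real implementation would redo the op in
softfloat when fell_back is set, and FENV_ACCESS or equivalent compiler
flags would be needed so the add is not moved around the fenv calls:

#define _GNU_SOURCE
#include <fenv.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>

static sigjmp_buf fp_trap_env;

/* Async-signal-safe: just escape back to the emulation path. */
static void sigfpe_handler(int sig)
{
    (void)sig;
    siglongjmp(fp_trap_env, 1);
}

/* Sketch: try the op on the host FPU with inexact unmasked.  If no trap
 * fires, the result was exact and no flag reading is needed.  If the
 * trap fires, record the guest bit and let the caller redo the op in
 * softfloat. */
static double hard_add_with_trap(double a, double b,
                                 bool *guest_fi, bool *fell_back)
{
    signal(SIGFPE, sigfpe_handler);
    *fell_back = false;

    if (sigsetjmp(fp_trap_env, 1)) {
        /* We trapped: the op was inexact.  Clean up the host FP state. */
        fedisableexcept(FE_INEXACT);
        feclearexcept(FE_ALL_EXCEPT);
        *guest_fi = true;
        *fell_back = true;       /* caller redoes this op in softfloat */
        return 0.0;
    }

    feclearexcept(FE_INEXACT);
    feenableexcept(FE_INEXACT);
    double r = a + b;            /* traps to sigfpe_handler if inexact */
    fedisableexcept(FE_INEXACT);

    *guest_fi = false;           /* no trap: the result was exact */
    return r;
}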

The two problems you run into are:

- relying on a trap for inexact will be slow if you keep hitting it

Which is slower? Clearing the exception flags before every op and
reading them again afterwards, or trapping on exceptions? I'd expect
that even if exceptions are common they should be less frequent than
every op (otherwise they would not be "exceptional").
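
If anyone wants to measure the first alternative, the per-op cost of
clearing and re-reading the host flags can be timed with something as
simple as the sketch below (numbers obviously depend on the host CPU and
libc):

#include <fenv.h>
#include <stdio.h>
#include <time.h>

/* Rough sketch: time the feclearexcept/op/fetestexcept pattern that a
 * per-op emulation of the non-sticky bits would need. */
int main(void)
{
    volatile double x = 1.0, y = 3.0, r = 0.0;
    const long n = 10 * 1000 * 1000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++) {
        feclearexcept(FE_ALL_EXCEPT);
        r = x / y;                          /* raises inexact every time */
        (void)fetestexcept(FE_ALL_EXCEPT);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per op (last result %g)\n", ns / n, r);
    return 0;
}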

- reading host FPU flag registers turns out to be pretty expensive

That's what using exceptions should avoid. If we only need to read and
clear the flags when an exception happens, that should be less frequent
than doing it for every FP op. Hopefully that still holds with the
additional overhead of calling the handler, if all the handler has to do
is set a corresponding flag in a global.

Regards,
BALATON Zoltan
