qemu-ppc

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC


From: Alex Bennée
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Date: Mon, 02 Mar 2020 17:10:30 +0000
User-agent: mu4e 1.3.9; emacs 27.0.90

BALATON Zoltan <address@hidden> writes:

> On Sun, 1 Mar 2020, Richard Henderson wrote:
>> On 3/1/20 4:13 PM, Programmingkid wrote:
>>> Ok, I was just looking at Intel's x87 chip documentation. It
>>> supports IEEE 754 floating point operations and exception flags.
>>> This leads me to this question. Would simply taking the host
>>> exception flags and using them to set the PowerPC's FPU's flag be
>>> an acceptable solution to this problem?
>
> In my understanding that's what is currently done, the problem with
> PPC as Richard said is the non-sticky versions of some of these bits
> which need clearing FP exception status before every FPU op which
> seems to be expensive and slower than using softfloat. So to use
> hardfloat we either accept that we can't emulate these bits with
> hardfloat or we need to do something else than clearing flags and
> checking after every FPU op.
>
> While not emulating these bits doesn't seem to matter for most clients
> and other PPC emulations got away with it, QEMU prefers accuracy over
> speed even for rarely used features.
>
>> No.
>>
>> The primary issue is the FPSCR.FI flag.  This is not an accumulative bit, per
>> ieee754, but per operation.
>>
>> The "hardfloat" option works (with other targets) only with ieee754
>> accumulative exceptions, when the most common of those exceptions, inexact,
>> has already been raised.  And thus need not be raised a second time.
>
> Why exactly is it done that way? What are the differences between IEEE
> FP implementations that prevent using hardfloat most of the time
> instead of only using it in some (although supposedly common) special
> cases?

There are a couple of wrinkles. As far as NaN and denormal behaviour
goes we have enough slack in the spec that different guests have
slightly different behaviour. See pickNaN and friends in the soft float
specialisation code. As a result we never try and hand off to hardfloat
for NaNs, Infs and Zeros. Luckily testing for those cases is a fairly
small part of the cost of the calculation.
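
A rough sketch of that kind of fast-path gating, to make the idea
concrete. This is not the actual softfloat code: guest_fp_status,
GUEST_INEXACT, slow_add and guest_add are hypothetical names, the slow
path is a stand-in that only tracks inexact for finite results, and I
assume round-to-nearest with plain IEEE doubles on the host:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical sticky guest flag and status block; not QEMU's real types. */
enum { GUEST_INEXACT = 1u };

typedef struct {
    unsigned flags;       /* accumulated (sticky) guest exception flags */
    unsigned slow_calls;  /* how often the slow path ran, for illustration */
} guest_fp_status;

/*
 * Stand-in for the softfloat slow path. It only tracks inexact, and only
 * for finite results, using the TwoSum error term (round-to-nearest only).
 */
static double slow_add(double a, double b, guest_fp_status *st)
{
    volatile double sum = a + b;

    st->slow_calls++;
    if (isfinite(sum)) {
        volatile double bv = sum - a;
        volatile double err = (a - (sum - bv)) + (b - bv);
        if (err != 0.0) {
            st->flags |= GUEST_INEXACT;
        }
    }
    return sum;
}

/* Host FPU is only safe if inexact is already set and inputs are normal. */
static bool can_use_host_fpu(double a, double b, const guest_fp_status *st)
{
    return (st->flags & GUEST_INEXACT)
        && fpclassify(a) == FP_NORMAL
        && fpclassify(b) == FP_NORMAL;
}

double guest_add(double a, double b, guest_fp_status *st)
{
    if (can_use_host_fpu(a, b, st)) {
        double r = a + b;
        if (fpclassify(r) == FP_NORMAL) {
            return r;   /* nothing to record: inexact is already sticky */
        }
        /* Zero, subnormal or overflowed result: redo it on the slow path. */
    }
    return slow_add(a, b, st);
}
```

The point of the sketch is that the fast path never touches the host
flags at all, which only works because the one flag it could raise is
already accumulated.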

Also things tend to come unstuck on changes to rounding modes.
Fortunately that doesn't seem to be super common.

You can read even more detail in the paper that originally prompted
Emilio's work:

  "supporting the neon and VFP instruction sets in an LLVM-based
   binary translator"
   https://www.thinkmind.org/download.php?articleid=icas_2015_5_20_20033

>> Per the PowerPC architecture, inexact must be recognized afresh for every
>> operation.  Which is cheap in hardware but expensive in software.
>>
>> And once you're done with FI, FR has been and continues to be emulated
>> incorrectly.
>
> I think CPUs can also raise exceptions when they detect the condition
> in hardware so maybe we should install our FPU exception handler and
> set guest flags from that then we don't need to check and won't have
> problem with these bits either. Why is that not possible or isn't
> done?

One of my original patches did just this:

  Subject: [PATCH] fpu/softfloat: use hardware sqrt if we can (EXPERIMENT!)
  Date: Tue, 20 Feb 2018 21:01:37 +0000
  Message-Id: <address@hidden>

The two problems you run into are:

 - relying on a trap for inexact will be slow if you keep hitting it
 - reading host FPU flag registers turns out to be pretty expensive

> The softfloat code has a comment that working with exceptions is
> not pleasant, but why? Isn't setting flags from a handler easier than
> checking separately for each op? If this is because of differences in
> how flags are handled by different targets we don't have to do that
> from the host FPU exception handler. That handler could just set a
> global flag on each exception, which targets can then check to
> handle their differences. This global flag can then include non-sticky
> versions if needed because clearing a global should be less expensive
> than clearing FPU status reg. But I don't really know, just guessing,
> someone who knows more about FPUs probably knows a better way.
>
> Regards,
> BALATON Zoltan


-- 
Alex Bennée


