
Re: About hardfloat in ppc


From: Alex Bennée
Subject: Re: About hardfloat in ppc
Date: Fri, 01 May 2020 14:10:13 +0100
User-agent: mu4e 1.4.1; emacs 28.0.50

罗勇刚(Yonggang Luo) <address@hidden> writes:

> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <address@hidden> wrote:
>
>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>> > That's what I suggested:
>> > we keep a cache of FP computations:
>> >
>> > typedef struct FpRecord {
>> >     uint8_t op;
>> >     float32 A;
>> >     float32 B;
>> > } FpRecord;
>> >
>> > FpRecord fp_cache[1024];
>> > int fp_cache_length;
>> > uint32_t fp_exceptions;
>> >
>> > 1. For each new FP operation we push it onto fp_cache.
>> > 2. Whenever fp_exceptions is read, we recompute it by re-running
>> > the recorded FpRecord sequence, then clear fp_cache_length.
>>
>> Why do you need to store more than the last FP op? The cumulative
>> (sticky) bits can be tracked as for other targets by not clearing
>> fp_status, so you can read them from there. Only the non-sticky FI
>> bit needs to be computed, but that is determined solely by the last
>> op, so it's enough to remember that op and re-run it with softfloat
>> (or even hardfloat after clearing the status, though softfloat may
>> be faster for this) to get the bits for the last op when the status
>> is read.
>>
> Yeah, storing only the last FP op is also an option. Do you mean we
> store the last FP op and recompute its flags only when necessary? I
> am thinking about a general FP optimization method that suits all
> targets.

I think that's getting a little ahead of yourself. Let's prove the
technique is valuable for PPC (given it has the most to gain). We can
always generalise later if it's worthwhile.

Rather than creating a new structure I would suggest creating 3 new
TCG globals (op, inA, inB) and re-factoring the front-end code so each
FP op stores its operands into them - see the sketch below. The TCG
optimizer should pick up the aliased writes and automatically
eliminate the dead ones. We might need some new machinery in TCG to
avoid spilling the values across potentially faulting loads/stores,
but that is likely a phase 2 problem.
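
To make that concrete, here is a rough sketch of the front-end side
(the CPUPPCState fields fp_op/fp_inA/fp_inB and the FP_OP_FADD
encoding are invented for illustration, not existing code):

  /* Hypothetical fields added to CPUPPCState (target/ppc/cpu.h):
   *   uint32_t fp_op;    last FP op, for lazy FPSCR recomputation
   *   uint64_t fp_inA;   its first source operand
   *   uint64_t fp_inB;   its second source operand
   */

  /* target/ppc/translate.c, next to the existing cpu_fpscr global: */
  static TCGv_i32 cpu_fp_op;
  static TCGv_i64 cpu_fp_inA, cpu_fp_inB;

      cpu_fp_op  = tcg_global_mem_new_i32(cpu_env,
                       offsetof(CPUPPCState, fp_op), "fp_op");
      cpu_fp_inA = tcg_global_mem_new_i64(cpu_env,
                       offsetof(CPUPPCState, fp_inA), "fp_inA");
      cpu_fp_inB = tcg_global_mem_new_i64(cpu_env,
                       offsetof(CPUPPCState, fp_inB), "fp_inB");

  /* Then in each FP op's translation, with t0/t1 holding the source
   * FPR values, record the op before emitting the computation: */
      tcg_gen_movi_i32(cpu_fp_op, FP_OP_FADD);
      tcg_gen_mov_i64(cpu_fp_inA, t0);
      tcg_gen_mov_i64(cpu_fp_inB, t1);

As consecutive FP ops overwrite the same three globals, all but the
last set of writes in a block should be dead and fold away.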

Next you will want to find the places that care about the per-op bits
of cpu_fpscr and call a helper there that re-runs the computation from
the new globals and feeds the resulting flags back in.
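
Something like this hedged sketch (the helper name and op encoding
are made up; the real thing needs a case per FP op plus logic to fold
the flags into FPSCR's non-sticky bits):

  uint32_t helper_fpscr_check_status(CPUPPCState *env)
  {
      /* Re-run just the last op on a scratch status so we only see
       * its flags, not the accumulated ones. */
      float_status tmp = env->fp_status;
      tmp.float_exception_flags = 0;

      switch (env->fp_op) {
      case FP_OP_FADD:              /* hypothetical op encoding */
          float64_add(env->fp_inA, env->fp_inB, &tmp);
          break;
      /* ... one case per recorded FP op ... */
      }

      /* Fold tmp.float_exception_flags into the FI/FR bits of
       * env->fpscr before returning it. */
      return env->fpscr;
  }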

That would give you a reasonable working prototype with which to start
measuring the overhead and whether the technique makes a difference.

>
>>
>> > 3. If fp_exceptions is cleared, we set fp_cache_length to 0 and
>> > clear fp_exceptions.
>> > 4. If fp_cache is full, we recompute fp_exceptions by re-running
>> > the recorded FpRecord sequence.
>>
>> All this cache management and more than one element seems unnecessary to
>> me although I may be missing something.
>>
>> > Now the key point is how to track reads and writes of the FPSCR
>> > register. The current code is:
>> >    cpu_fpscr = tcg_global_mem_new(cpu_env,
>> >                                   offsetof(CPUPPCState, fpscr), "fpscr");
>>
>> Maybe you could search for where the value is read; those should be
>> the places where we need to handle it. Changes may be needed to
>> create a clear API for this between target/ppc, TCG and softfloat,
>> which likely does not exist yet.

Once the per-op calculation is fixed in the PPC front-end I think the
only change needed is to remove the #if defined(TARGET_PPC) in
softfloat.c - it's only really there because it avoids the overhead of
checking flags which we always know to be clear in PPC's case.
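
For reference, the guard in question looks roughly like this in
fpu/softfloat.c (quoted from memory, so check the current tree):

  #if defined(TARGET_PPC) || defined(__FAST_MATH__)
  # define QEMU_NO_HARDFLOAT 1
  #else
  # define QEMU_NO_HARDFLOAT 0
  #endif

  static inline bool can_use_fpu(const float_status *s)
  {
      if (QEMU_NO_HARDFLOAT) {
          return false;
      }
      /* hardfloat only kicks in once inexact is already set and we
       * are rounding to nearest-even */
      return likely(s->float_exception_flags & float_flag_inexact &&
                    s->float_rounding_mode == float_round_nearest_even);
  }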

-- 
Alex Bennée


