Re: About hardfloat in ppc
From: Alex Bennée
Subject: Re: About hardfloat in ppc
Date: Fri, 01 May 2020 15:01:24 +0100
User-agent: mu4e 1.4.1; emacs 28.0.50

BALATON Zoltan <address@hidden> writes:
> On Fri, 1 May 2020, Alex Bennée wrote:
>> 罗勇刚(Yonggang Luo) <address@hidden> writes:
>>> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <address@hidden> wrote:
>>>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>>> That's what I suggested:
>>>>> we preserve a float computing cache.
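>>>>> typedef struct FpRecord {
>>>>>     uint8_t op;    /* which fp operation was executed */
>>>>>     float32 A;     /* first operand */
>>>>>     float32 B;     /* second operand */
>>>>> } FpRecord;
>>>>> FpRecord fp_cache[1024];   /* ops not yet replayed */
>>>>> int fp_cache_length;       /* number of valid entries in fp_cache */
>>>>> uint32_t fp_exceptions;    /* flags from previous replays */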
>>>>>
>>>>> 1. For each new fp operation, we push it to fp_cache.
>>>>> 2. Once fp_exceptions is read, we re-compute fp_exceptions by
>>>>> re-running the FpRecord sequence and clear fp_cache_length.
>>>>
>>>> Why do you need to store more than the last fp op? The cumulative bits can
>>>> be tracked like it's done for other targets, by not clearing fp_status;
>>>> then you can read them from there. Only the non-sticky FI bit needs to be
>>>> computed, but that's determined only by the last op, so it's enough to
>>>> remember that op and re-run it with softfloat (or even hardfloat after
>>>> clearing status, but softfloat may be faster for this) to get the bits for
>>>> the last op when the status is read.
>>>>
>>> Yep, storing only the last fp op is also an option. Do you mean that we
>>> store the last fp op and calculate it when necessary? I am thinking
>>> about a general fp optimization method that suits all targets.
>>
>> I think that's getting a little ahead of yourself. Let's prove the
>> technique is valuable for PPC (given it has the most to gain). We can
>> always generalise later if it's worthwhile.
>>
>> Rather than creating a new structure, I would suggest creating 3 new TCG
>> globals (op, inA, inB) and refactoring the front-end code so each FP op
>> loads the TCG globals.
>
> So basically, wherever you see helper_reset_fpstatus() in target/ppc,
> we would need to replace it with saving the op and args to globals?
> Or just repurpose this helper to do that? This is called before every
> fp op but not before the sub-ops within vector ops. Is that correct?
> Probably it is, as vector ops are a single op, but how do we detect
> flag changes made by the sub-ops of those? I think there may be some
> existing bugs there.
I'll defer to the PPC front end experts on this. I'm not familiar with
how it all goes together at all.
>
>> The TCG optimizer should pick up aliased loads
>> and automatically eliminate the dead ones. We might need some new
>> machinery for the TCG to avoid spilling the values over potentially
>> faulting loads/stores, but that is likely a phase 2 problem.
>
> I have no idea how to do this or even where to look. Some more
> detailed explanation may be needed here.
Don't worry about it now. Let's worry about it when we see how often
faulting instructions are interleaved with fp ops.
>
>> Next, you will want to find places that care about the per-op bits of
>> cpu_fpscr and call a helper with the new globals to re-run the
>> computation and feed the values in.
>
> So the code that cares about these bits is in the guest, thus we would
> need to compute them if we detect the guest accessing them. Detecting
> when the individual bits are accessed might be difficult, so at first
> we could check whether fpscr is read and recompute the FI bit then,
> before returning the value. You previously said these might be when
> fpscr is read or when generating exceptions, but I'm not sure where
> exactly these are done for ppc. (I'd expect an mffpscr, but there seem
> to be various other ops instead that access parts of fpscr, which are
> found in target/ppc/fp-impl.inc.c:567, so this would need studying the
> PPC docs to understand how the guest can access the FI bit of the
> fpscr reg.)
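On the read side, "recompute FI before returning the value" could look roughly like this minimal sketch: each FP op only marks FI as stale, and the bit is recomputed once, on the first FPSCR read after that op. The FI position (bit 17, LSB = 0) is assumed from QEMU's `FPSCR_FI` in target/ppc/cpu.h; `LazyFpscr`, the hook type and `example_recompute_fi()` are hypothetical, and the hook is where a last-op re-run would plug in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: FI is bit 17 (LSB = 0) of FPSCR, like FPSCR_FI in QEMU's target/ppc/cpu.h. */
#define FPSCR_FI_MASK (UINT32_C(1) << 17)

typedef struct {
    uint32_t fpscr;   /* FPSCR image; its FI bit may be out of date */
    bool fi_stale;    /* set by every FP op, cleared once FI is recomputed */
} LazyFpscr;

/* Hypothetical hook: re-run the last FP op and report whether it was inexact. */
typedef bool (*RecomputeFiFn)(void *opaque);

/* After each FP op: just mark FI as stale. */
static void lazy_fpscr_note_op(LazyFpscr *s)
{
    s->fi_stale = true;
}

/* Wherever the guest can observe FPSCR: recompute FI only if needed. */
static uint32_t lazy_fpscr_read(LazyFpscr *s, RecomputeFiFn recompute, void *opaque)
{
    if (s->fi_stale) {
        if (recompute(opaque)) {
            s->fpscr |= FPSCR_FI_MASK;
        } else {
            s->fpscr &= ~FPSCR_FI_MASK;
        }
        s->fi_stale = false;
    }
    return s->fpscr;
}

/* Example hook for illustration only: reports a preset answer and counts calls. */
static int recompute_calls;

static bool example_recompute_fi(void *opaque)
{
    recompute_calls++;
    return *(const bool *)opaque;
}
```

Two reads after one op cost a single recompute, and ops that are never followed by an FPSCR read cost nothing extra.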
>
>> That would give you a reasonable working prototype to start doing some
>> measurements of overhead and if it makes a difference.
>>
>>>
>>>>
>>>>> 3. If fp_exceptions is cleared, then we set fp_cache_length to 0 and
>>>>> clear fp_exceptions.
>>>>> 4. If fp_cache is full, then we re-compute fp_exceptions by
>>>>> re-running the FpRecord sequence.
>>>>
>>>> All this cache management and more than one element seems unnecessary to
>>>> me although I may be missing something.
>>>>
>>>>> Now the key point is how to track reads and writes of the FPSCR register.
>>>>> The current code is:
>>>>>     cpu_fpscr = tcg_global_mem_new(cpu_env,
>>>>>                                    offsetof(CPUPPCState, fpscr), "fpscr");
>>>>
>>>> Maybe you could search for where the value is read, which should be the
>>>> places where we need to handle it, but changes may be needed to make a
>>>> clear API for this between target/ppc, TCG and softfloat, which likely
>>>> does not exist yet.
>>
>> Once the per-op calculation is fixed in the PPC front-end, I think the
>> only change needed is to remove the #if defined(TARGET_PPC) in
>> softfloat.c; it's only really there because it avoids the overhead of
>> checking flags which we always know to be clear in its case.
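For context on that `#if`, here is a simplified model of the gate in fpu/softfloat.c, whose `can_use_fpu()` only lets the host FPU compute a result when the inexact flag is already set and rounding is nearest-even. The field and flag names below are stand-ins, and every op is assumed to be inexact. It illustrates why target/ppc, which clears the flags before every op via helper_reset_fpstatus(), would never reach the fast path even without the define.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for softfloat's status fields; names and values are assumptions. */
#define SKETCH_FLAG_INEXACT 0x20

typedef enum { ROUND_NEAREST_EVEN, ROUND_TO_ZERO } RoundMode;

typedef struct {
    uint8_t exception_flags;
    RoundMode rounding_mode;
} FpStatus;

/* Host FPU only if inexact is already set (so it need not be detected again)
   and the rounding mode is round-to-nearest-even. */
static bool can_use_host_fpu(const FpStatus *s)
{
    return (s->exception_flags & SKETCH_FLAG_INEXACT) &&
           s->rounding_mode == ROUND_NEAREST_EVEN;
}

/* Count how many of n ops could take the host-FPU path, assuming every op is
   inexact. reset_before_each_op models clearing the flags before each op. */
static int fast_path_ops(int n, bool reset_before_each_op)
{
    FpStatus s = { 0, ROUND_NEAREST_EVEN };
    int fast = 0;

    for (int i = 0; i < n; i++) {
        if (reset_before_each_op) {
            s.exception_flags = 0;
        }
        if (can_use_host_fpu(&s)) {
            fast++;                                   /* host FPU result */
        } else {
            s.exception_flags |= SKETCH_FLAG_INEXACT; /* softfloat sets inexact */
        }
    }
    return fast;
}
```

With the flags reset before every op, only the softfloat path is ever taken; with sticky flags, only the first inexact op pays for softfloat.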
>
> That's the theory, but I've found that removing that define currently
> makes general fp ops slower but vector ops faster, so I think there may
> be some bugs that would need to be found and fixed. So testing with
> a proper test suite might be needed.
You might want to do what Laurent did and hack up a testfloat with
"system" implementations:
https://github.com/vivier/m68k-testfloat/blob/master/testfloat/M68K-Linux-GCC/systfloat.c
It would be nice to plumb that sort of support into our existing
testfloat fork in the code base (tests/fp) but I suspect getting an
out-of-tree fork building and running first would be the quickest way
forward.
>
> Regards,
> BALATON Zoltan
--
Alex Bennée