gforth

Re: [gforth] Performance anomality with dynamic superinstructions on MIPSel


From: Bernd Paysan
Subject: Re: [gforth] Performance anomality with dynamic superinstructions on MIPSel
Date: Mon, 24 Mar 2014 03:17:41 +0100
User-agent: KMail/4.11.5 (Linux/3.11.10-7-desktop; KDE/4.11.5; x86_64; ; )

On Sunday, 23 March 2014, 19:46:24, Bernd Paysan wrote:
> On Sunday, 23 March 2014, 18:38:58, David Kuehling wrote:
> > Replying to myself, quick update (before I have to shut down my computer
> > for today):
> > 
> > The instruction in question is 'rdhwr v1,$29', which is a mips32r2
> > instruction, i.e. not supported on Loongson2f.  GCC outputs it via a
> > sequence like:
> >         .set    push
> >         .set    mips32r2
> >         rdhwr   $3,$29
> >         .set    pop
> > 
> > I guess that on MIPS the GCC runtime nowadays uses hardware register
> > $29 (which is not the general-purpose CPU register $29!) for addressing
> > thread-local storage.  To support older MIPS CPUs, the kernel emulates
> > the instruction in its reserved-instruction exception handler, i.e. very
> > slowly.  How can we prevent writes to thread-local storage from creeping
> > into the goto*?
> 
> This stuff is copied from the first NEXT, i.e. the thing between
> before_goto: and after_goto:
> 
> #define FIRST_NEXT_P2 NEXT_P1_5; GOTO_ALIGN; \
> before_goto: goto *real_ca; after_goto:
> 
> Suggestion: add an "asm volatile("": : :"memory")" before "before_goto:".
> 
> That should scare GCC away from moving stuff past it.
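For readers unfamiliar with the MIPS TLS issue described above, here is a minimal C sketch of the kind of thread-local access that leads to the `rdhwr $3,$29` sequence. The names (`tls_counter`, `bump_tls_counter`) are hypothetical and not taken from gforth. On a MIPS Linux target, GCC typically reads the thread pointer with that instruction before touching such a variable; on a CPU like the Loongson2f, which lacks it, each executed `rdhwr` traps into the kernel for emulation.

```c
#include <assert.h>

/* Hypothetical example: one copy of this counter per thread.  On a MIPS
 * Linux target GCC typically obtains the thread pointer with
 * `rdhwr $3,$29` before accessing it; on CPUs without that instruction
 * the kernel has to emulate each rdhwr the generated code executes. */
static __thread long tls_counter;

long bump_tls_counter(void)
{
    long v = ++tls_counter;
    assert(v > 0); /* sanity check: the counter only grows */
    return v;
}
```

If code like this is reachable from a primitive, GCC is free to schedule the thread-pointer read wherever it likes, including into the dispatch sequence that the engine later copies.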

I've looked at what GCC does on ARM and x86_64, and it also moves some code
in there: less on x86_64, more on ARM.  It's not as bad as your case (with an
emulated instruction), but it's still extra code.  asm __volatile__ ("": : :"memory")
doesn't prevent it, and neither does calling a dummy function.

What did the trick?  Actually using FIRST_NEXT in after_last:.  That label is a
dummy used only to find where the code of the last primitive ends, so we can
put anything we like there.  Doing FIRST_NEXT there makes it a no-op, and since
there's nothing left to move into the goto, it stays as small as it should.
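To make the layout concrete, here is a toy sketch, not gforth's engine code: code sizes are measured as differences between label addresses (GCC's labels-as-values), the dummy `after_last:` label marks the end of the last primitive, and, following the message, the dispatch bracketed by `before_goto:`/`after_goto:` is placed in that dummy tail. The names `run_tail`, `span`, `dispatch_bytes` and `last_prim_bytes` are hypothetical, and the sketch does not reproduce the code-motion effect itself, which depends on GCC's optimizer and the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not gforth's engine code.
 * Opcodes: 0 = inc, 1 = halt. */
static ptrdiff_t dispatch_bytes;  /* before_goto..after_goto, or -1 */
static ptrdiff_t last_prim_bytes; /* op_halt..after_last, or -1 */

/* Byte distance between two code labels.  GCC may reorder basic
 * blocks, so a non-positive result is reported as -1 ("not usable"). */
static ptrdiff_t span(void *from, void *to)
{
    ptrdiff_t n = (ptrdiff_t)((intptr_t)to - (intptr_t)from);
    return n > 0 ? n : -1;
}

long run_tail(const int *ip)
{
    /* Label addresses via GCC's labels-as-values extension. */
    static void *const labels[] = { &&op_inc, &&op_halt, &&after_last,
                                    &&before_goto, &&after_goto };
    long acc = 0;

    assert(ip != NULL);
    last_prim_bytes = span(labels[1], labels[2]);
    dispatch_bytes = span(labels[3], labels[4]);

    goto *labels[*ip++];
op_inc:
    acc += 1;
    goto *labels[*ip++];
op_halt:
    return acc;
after_last:   /* end marker for the last primitive (op_halt) ...    */
before_goto:  /* ... which here holds the bracketed dispatch itself */
    goto *labels[*ip++];
after_goto:
    return acc; /* never reached; the label only marks an address */
}
```

The design point is that the dummy tail costs nothing at run time, because only its address is used, so it is a safe place to keep the one dispatch copy whose bytes get measured.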

On the Core i7 I see no difference (the two LEAs and the one write are
swallowed by the sheer power of the Core i7), but on my Galaxy Note II this
gives a very clear and significant speedup:

 0.575 0.710  0.365 0.750 0.390 2014-03-24; Exynos 4 Quad 1.6GHz; gcc-4.8.x (Android 4.3)
 0.735 0.920  0.900 1.110 0.690 2012-10-31; Exynos 4 Quad 1.6GHz; gcc-4.6.x (Android 4.1.1)
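The two result lines can be compared column by column. The message does not say what the five columns measure or in what unit, and the rows also differ in GCC version (4.6.x vs 4.8.x) and Android version, so these ratios are not attributable to this change alone. A minimal sketch, assuming the numbers are run times where lower is better (`ratio` and `min_ratio` are hypothetical helpers):

```c
#include <assert.h>
#include <stddef.h>

/* Timings copied from the two result lines quoted above. */
static const double row_2012[] = { 0.735, 0.920, 0.900, 1.110, 0.690 };
static const double row_2014[] = { 0.575, 0.710, 0.365, 0.750, 0.390 };
#define NCOLS (sizeof row_2012 / sizeof row_2012[0])

/* Ratio old/new for column i; > 1 means the 2014 run took less time,
 * assuming the numbers are run times (lower is better). */
double ratio(size_t i)
{
    assert(i < NCOLS);
    return row_2012[i] / row_2014[i];
}

/* Smallest per-column ratio. */
double min_ratio(void)
{
    double m = ratio(0);
    for (size_t i = 1; i < NCOLS; i++)
        if (ratio(i) < m)
            m = ratio(i);
    return m;
}
```

With these rows every column has a ratio above 1, ranging from about 1.28 (first column) to about 2.47 (third column).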

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
