From: David Kuehling
Subject: Re: [gforth] Performance anomality with dynamic superinstructions on MIPSel
Date: Sat, 22 Mar 2014 15:25:42 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
>>>>> "Bernd" == Bernd Paysan <address@hidden> writes:
> Am Samstag, 22. März 2014, 07:24:55 schrieb David Kuehling:
>> I'm using a recent gforth revision from git (6ec9915f6277de) and
>> noticed that running gforth --dynamic produces pretty extreme
>> performance degradation [..]
> How does this affect other microbenchmarks, e.g. onebench.fs? And:
> SEE-CODE <word> shows the dynamically generated code; could you
> provide that for the microbenchmark above?
Ahh, SEE-CODE does a nice job. The disassembly of the complete code
sequences for my recursive micro-benchmark, for gforth-fast with and
without --dynamic, is listed below. It looks like there is a problem with
the CALL code sequence generated for calls into colon definitions:
gforth-fast --dynamic
: test1 ;
: test2 test1 ;
see-code test2
$2BB725B0 call
$2BB725B4 <test1>
( $2BFC9FA8 ) 3 16 0 addu,
( $2BFC9FAC ) 16 0 16 lw,
( $2BFC9FB0 ) 2 18 0 addu,
( $2BFC9FB4 ) 3 3 4 addiu,
( $2BFC9FB8 ) 18 18 -4 addiu,
( $2BFC9FBC ) 16 16 4 addiu,
( $2BFC9FC0 ) 3 -4 2 sw,
( $2BFC9FC4 ) 2 -4 16 lw,
( $2BFC9FC8 ) $7C03E83B , ( illegal inst )
( $2BFC9FCC ) 4 -32680 28 lw,
( $2BFC9FD0 ) 30 3 0 addu,
( $2BFC9FD4 ) 4 4 30 addu,
( $2BFC9FD8 ) 3 2 0 addu,
( $2BFC9FDC ) 4 256 29 sw,
( $2BFC9FE0 ) 3 jr,
( $2BFC9FE4 ) 1 1 0 or,
$2BB725B8 ;s ok
Compare this against the disassembly of CALL:
see call:
Code call
( $403C34 ) 3 16 0 addu,
( $403C38 ) 16 0 16 lw,
( $403C3C ) 2 18 0 addu,
( $403C40 ) 3 3 4 addiu,
( $403C44 ) 18 18 -4 addiu,
( $403C48 ) 16 16 4 addiu,
( $403C4C ) 3 -4 2 sw,
( $403C50 ) 2 -4 16 lw,
( $403C54 ) 3 2 0 addu,
( $403C58 ) 3 jr,
( $403C5C ) 1 1 0 or,
end-code
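The two listings can also be compared mechanically. The following is only a sketch: the instruction strings are copied from the SEE-CODE and SEE output above, while the variable names (call_static, call_dynamic, extra) are made up for illustration.

```python
import difflib

# CALL as shown by "see call" (the static primitive).
call_static = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "3 2 0 addu,", "3 jr,", "1 1 0 or,",
]

# CALL as shown by "see-code test2" under gforth-fast --dynamic.
call_dynamic = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "$7C03E83B ,", "4 -32680 28 lw,", "30 3 0 addu,", "4 4 30 addu,",
    "3 2 0 addu,", "4 256 29 sw,", "3 jr,", "1 1 0 or,",
]

# Collect the instructions that only appear in the dynamic copy.
matcher = difflib.SequenceMatcher(None, call_static, call_dynamic, autojunk=False)
extra = []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag in ("insert", "replace"):
        extra.extend(call_dynamic[j1:j2])

for inst in extra:
    print("only in --dynamic:", inst)
```

With these inputs the first eight instructions and the final jr/or pair are common to both copies; the differences all sit around the "3 2 0 addu," that starts the dispatch.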
Instead of NEXT, the code in test2 contains some nonsense, starting with
the invalid instruction $7C03E83B. I don't know why that instruction
doesn't raise SIGILL, but maybe it's a non-standard/undocumented
instruction on Loongson2f. Binutils doesn't know anything about that
opcode either:
echo -e "\x3b\xe8\x03\x7c" > /tmp/inst
objdump -D -EL -b binary -m mips:loongson_2f /tmp/inst
[..]
0: 7c03e83b 0x7c03e83b
I double-checked that the objdump command above disassembles other
instructions properly. It does, including FPU opcodes.
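As a cross-check that does not depend on objdump, the word can be split into the standard MIPS32 instruction fields by hand. This is a sketch, not part of the original analysis: the helper name decode_mips_word is hypothetical, and the SPECIAL3/RDHWR constants come from the MIPS32 Release 2 opcode tables rather than from anything stated in this thread. If that reading is right, $7C03E83B would be "rdhwr $3, $29" (read the user-local hardware register, as used for thread-local storage), which a -m mips:loongson_2f disassembly would not be expected to recognise. Whether Loongson2f executes it natively or the kernel emulates it after a trap is an assumption that would need checking on the machine.

```python
import struct

def decode_mips_word(word):
    """Split a 32-bit MIPS instruction word into its R-type style fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

# The same four bytes that were written to /tmp/inst, in little-endian order.
raw = b"\x3b\xe8\x03\x7c"
(word,) = struct.unpack("<I", raw)
fields = decode_mips_word(word)

# Constants assumed from the MIPS32 Release 2 encoding tables.
SPECIAL3 = 0x1F
RDHWR = 0x3B
is_rdhwr = fields["opcode"] == SPECIAL3 and fields["funct"] == RDHWR

print(hex(word), fields, "rdhwr" if is_rdhwr else "unknown")
```

Note that echo -e also appends a newline, so /tmp/inst is five bytes long; only the first four form the instruction word.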
This starts making sense. When benchmarking, the performance
degradation was worst for code that contains a lot of non-primitives.
That's why the RECURSE example is so telling: it is dominated
by the recursive non-primitive call.
Onebench.fs confirms that theory:
gforth-fast gforth/onebench.fs
sieve bubble matrix fib fft
1.388 1.828 1.640 2.124 1.836
gforth-fast --dynamic gforth/onebench.fs
sieve bubble matrix fib fft
1.880 2.228 2.660 10.776 5.792
The recursive 'fib' benchmark suffers worst (these results were obtained
under load, so they may not be very representative of Loongson2f).
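To put numbers on that, the slowdown per benchmark can be computed directly from the two onebench.fs runs. A small sketch, using only the timings quoted above; the dictionary names are made up for illustration.

```python
# Timings copied from the two onebench.fs runs above.
without_dynamic = {"sieve": 1.388, "bubble": 1.828, "matrix": 1.640,
                   "fib": 2.124, "fft": 1.836}
with_dynamic = {"sieve": 1.880, "bubble": 2.228, "matrix": 2.660,
                "fib": 10.776, "fft": 5.792}

# Slowdown factor of --dynamic relative to the plain gforth-fast run.
slowdown = {name: with_dynamic[name] / without_dynamic[name]
            for name in without_dynamic}
worst = max(slowdown, key=slowdown.get)

for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:7s} {factor:5.2f}x")
print("worst:", worst)
```

fib comes out at roughly a factor of five and fft at roughly three, while sieve and bubble stay well below 1.5.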
cheers,
David
PS: For reference, output of SEE-CODE for the recursion example from my
last mail:
--8<--
gforth-fast
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2BD76520 1-
$2BD76524 dup
$2BD76528 0>
$2BD7652C ?branch
$2BD76530 <735536444>
$2BD76534 call
$2BD76538 <b>
$2BD7653C ;s ok
--8<--
gforth-fast --dynamic
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2B176520 1-
( $2B5CDE84 ) 16 16 4 addiu,
( $2B5CDE88 ) 21 21 -1 addiu,
$2B176524 noop
( $2B5CDE8C ) 21 0 17 sw,
( $2B5CDE90 ) 17 17 -4 addiu,
( $2B5CDE94 ) 21 4 17 lw,
( $2B5CDE98 ) 16 16 4 addiu,
$2B176528 0>
( $2B5CDE9C ) 21 0 21 slt,
( $2B5CDEA0 ) 16 16 4 addiu,
( $2B5CDEA4 ) 21 0 21 subu,
$2B17652C ?branch
$2B176530 <722953532>
( $2B5CDEA8 ) 3 17 0 addu,
( $2B5CDEAC ) 2 0 16 lw,
( $2B5CDEB0 ) 21 0 28 bne,
( $2B5CDEB4 ) 17 17 4 addiu,
( $2B5CDEB8 ) 16 2 4 addiu,
( $2B5CDEBC ) 2 -4 16 lw,
( $2B5CDEC0 ) 21 4 3 lw,
( $2B5CDEC4 ) 3 2 0 addu,
( $2B5CDEC8 ) 3 jr,
( $2B5CDECC ) 1 1 0 or,
( $2B5CDED0 ) 21 4 3 lw,
( $2B5CDED4 ) 16 16 8 addiu,
$2B176534 call
$2B176538 <b>
( $2B5CDED8 ) 3 16 0 addu,
( $2B5CDEDC ) 16 0 16 lw,
( $2B5CDEE0 ) 2 18 0 addu,
( $2B5CDEE4 ) 3 3 4 addiu,
( $2B5CDEE8 ) 18 18 -4 addiu,
( $2B5CDEEC ) 16 16 4 addiu,
( $2B5CDEF0 ) 3 -4 2 sw,
( $2B5CDEF4 ) 2 -4 16 lw,
( $2B5CDEF8 ) $7C03E83B , ( illegal inst )
( $2B5CDEFC ) 4 -32680 28 lw,
( $2B5CDF00 ) 30 3 0 addu,
( $2B5CDF04 ) 4 4 30 addu,
( $2B5CDF08 ) 3 2 0 addu,
( $2B5CDF0C ) 4 256 29 sw,
( $2B5CDF10 ) 3 jr,
( $2B5CDF14 ) 1 1 0 or,
$2B17653C ;s ok
--8<--
--
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
Fingerprint: B63B 6AF2 4EEB F033 46F7 7F1D 935E 6F08 E457 205F