From: David Kuehling
Subject: Re: [gforth] Performance anomality with dynamic superinstructions on MIPSel
Date: Sat, 22 Mar 2014 15:25:42 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

>>>>> "Bernd" == Bernd Paysan <address@hidden> writes:

> On Saturday, 22 March 2014, 07:24:55, David Kuehling wrote:
>> I'm using a recent gforth revision from git (6ec9915f6277de) and
>> noticed that running gforth --dynamic produces pretty extreme
>> performance degradation [..]

> How does this affect other microbenchmarks, e.g. onebench.fs? And:
> SEE-CODE <word> shows the dynamically generated code; could you
> provide that for the microbenchmark above?

Ahh, SEE-CODE does a nice job.  The disassembly of the full code
sequences of my recursive micro-benchmark, for gforth-fast with and
without --dynamic, is listed in the PS below.  It looks like there is a
problem with the CALL code sequence generated for calls into
colon definitions:

  gforth-fast --dynamic
  : test1 ;
  : test2 test1 ;
  see-code test2

  $2BB725B0 call
  $2BB725B4 <test1> 
  ( $2BFC9FA8 ) 3 16 0 addu,
  ( $2BFC9FAC ) 16 0 16 lw,
  ( $2BFC9FB0 ) 2 18 0 addu,
  ( $2BFC9FB4 ) 3 3 4 addiu,
  ( $2BFC9FB8 ) 18 18 -4 addiu,
  ( $2BFC9FBC ) 16 16 4 addiu,
  ( $2BFC9FC0 ) 3 -4 2 sw,
  ( $2BFC9FC4 ) 2 -4 16 lw,
  ( $2BFC9FC8 ) $7C03E83B , ( illegal inst ) 
  ( $2BFC9FCC ) 4 -32680 28 lw,
  ( $2BFC9FD0 ) 30 3 0 addu,
  ( $2BFC9FD4 ) 4 4 30 addu,
  ( $2BFC9FD8 ) 3 2 0 addu,
  ( $2BFC9FDC ) 4 256 29 sw,
  ( $2BFC9FE0 ) 3 jr,
  ( $2BFC9FE4 ) 1 1 0 or,
  $2BB725B8 ;s ok

Compare this against the disassembly of the CALL primitive itself, as
printed by "see call":

  Code call  
  ( $403C34 ) 3 16 0 addu,
  ( $403C38 ) 16 0 16 lw,
  ( $403C3C ) 2 18 0 addu,
  ( $403C40 ) 3 3 4 addiu,
  ( $403C44 ) 18 18 -4 addiu,
  ( $403C48 ) 16 16 4 addiu,
  ( $403C4C ) 3 -4 2 sw,
  ( $403C50 ) 2 -4 16 lw,
  ( $403C54 ) 3 2 0 addu,
  ( $403C58 ) 3 jr,
  ( $403C5C ) 1 1 0 or,
  end-code
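
To make the difference between the two listings explicit, here is a
small Python sketch (an illustration, not part of the original session).
The instruction lists are copied from the see-code and "see call" output
above with the addresses stripped; the variable names are made up for
this example.

```python
import difflib

# Body of the CALL primitive as printed by "see call" (addresses stripped).
static_call = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "3 2 0 addu,", "3 jr,", "1 1 0 or,",
]

# Code generated for "call <test1>" under --dynamic (addresses stripped).
dynamic_call = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "$7C03E83B ,", "4 -32680 28 lw,", "30 3 0 addu,", "4 4 30 addu,",
    "3 2 0 addu,", "4 256 29 sw,", "3 jr,", "1 1 0 or,",
]

matcher = difflib.SequenceMatcher(a=static_call, b=dynamic_call, autojunk=False)
opcodes = matcher.get_opcodes()

# Collect the instructions that appear only in the dynamic copy.
extra = []
for tag, i1, i2, j1, j2 in opcodes:
    if tag == "insert":
        extra.extend(dynamic_call[j1:j2])

for line in extra:
    print(line)
```

Under these assumptions the dynamic copy contains every instruction of
the static primitive in the same order, plus five extra ones placed
around the final dispatch.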

Instead of a plain NEXT, the code in test2 holds some extra nonsense,
starting with the invalid instruction $7C03E83B.  I don't know why that
instruction doesn't raise SIGILL, but maybe it's a
non-standard/undocumented instruction on Loongson2f.  The binutils
don't know anything about that opcode either:

  echo -e "\x3b\xe8\x03\x7c" > /tmp/inst  
  objdump -D -EL -b binary -m mips:loongson_2f /tmp/inst 
  [..]
   0:   7c03e83b        0x7c03e83b

I double-checked that the objdump command above disassembles known
instructions properly.  It does, including FPU opcodes.
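
The word can also be split into the standard MIPS32 instruction fields
by hand.  The sketch below is an illustration, not something from the
original session; it assumes only the little-endian byte order used in
the echo command above, and the helper name decode_fields is made up.
Read against the MIPS32 Release 2 encoding tables, major opcode 0x1F
(SPECIAL3) with function code 0x3B is RDHWR, which would make this word
"rdhwr $3, $29".  Whether Loongson2f executes that in hardware or the
kernel emulates it is not established here.

```python
import struct

def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its register-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

# The four bytes written by the echo command, in file order (little-endian).
raw = bytes([0x3B, 0xE8, 0x03, 0x7C])
(word,) = struct.unpack("<I", raw)

fields = decode_fields(word)
print(f"word = 0x{word:08X}")
for name, value in fields.items():
    print(f"{name:>6} = {value} (0x{value:02X})")
```

If that reading is right, an emulated RDHWR in every call would fit the
slowdown seen for call-heavy code, but that remains an assumption until
checked on the actual hardware.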

This starts making sense.  When benchmarking, the performance
degradation was worst for code that contains a lot of calls to
non-primitives.  That's why the RECURSE example is so telling: it's
dominated by the recursive non-primitive call.

Onebench.fs confirms that theory:

  gforth-fast gforth/onebench.fs 
   sieve bubble matrix   fib   fft
   1.388  1.828  1.640 2.124 1.836

  gforth-fast --dynamic gforth/onebench.fs 
   sieve bubble matrix   fib   fft
   1.880  2.228  2.660 10.776 5.792

The recursive 'fib' benchmark suffers most (these results were obtained
under load, so they may not be very representative of Loongson2f).
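
For a quick view of the relative slowdown, the two result rows above can
be divided against each other.  This is a minimal sketch using only the
numbers quoted in this mail; the variable names are illustrative.

```python
names = ["sieve", "bubble", "matrix", "fib", "fft"]

# Timings copied from the two onebench.fs runs above.
plain = [1.388, 1.828, 1.640, 2.124, 1.836]      # gforth-fast
dynamic = [1.880, 2.228, 2.660, 10.776, 5.792]   # gforth-fast --dynamic

# Slowdown factor of --dynamic relative to the plain run, per benchmark.
slowdown = {n: d / p for n, p, d in zip(names, plain, dynamic)}

for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>6}: {factor:.2f}x")
```

With these figures fib slows down by roughly a factor of five and fft by
roughly a factor of three, while the other three stay below a factor of
two.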

cheers,

David

PS: For reference, output of SEE-CODE for the recursion example from my
last mail:

--8<--

gforth-fast
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2BD76520 1-
$2BD76524 dup
$2BD76528 0>
$2BD7652C ?branch
$2BD76530 <735536444> 
$2BD76534 call
$2BD76538 <b> 
$2BD7653C ;s ok

--8<--

gforth-fast --dynamic
: b 1- DUP 0> IF RECURSE THEN ;
see-code b
$2B176520 1-
( $2B5CDE84 ) 16 16 4 addiu,
( $2B5CDE88 ) 21 21 -1 addiu,   
$2B176524 noop
( $2B5CDE8C ) 21 0 17 sw,
( $2B5CDE90 ) 17 17 -4 addiu,   
( $2B5CDE94 ) 21 4 17 lw,
( $2B5CDE98 ) 16 16 4 addiu,
$2B176528 0>
( $2B5CDE9C ) 21 0 21 slt,
( $2B5CDEA0 ) 16 16 4 addiu,
( $2B5CDEA4 ) 21 0 21 subu,
$2B17652C ?branch
$2B176530 <722953532>
( $2B5CDEA8 ) 3 17 0 addu,
( $2B5CDEAC ) 2 0 16 lw,
( $2B5CDEB0 ) 21 0 28 bne,
( $2B5CDEB4 ) 17 17 4 addiu,
( $2B5CDEB8 ) 16 2 4 addiu,
( $2B5CDEBC ) 2 -4 16 lw,
( $2B5CDEC0 ) 21 4 3 lw,
( $2B5CDEC4 ) 3 2 0 addu,
( $2B5CDEC8 ) 3 jr,
( $2B5CDECC ) 1 1 0 or,
( $2B5CDED0 ) 21 4 3 lw,
( $2B5CDED4 ) 16 16 8 addiu,
$2B176534 call
$2B176538 <b>
( $2B5CDED8 ) 3 16 0 addu,
( $2B5CDEDC ) 16 0 16 lw,
( $2B5CDEE0 ) 2 18 0 addu,
( $2B5CDEE4 ) 3 3 4 addiu,
( $2B5CDEE8 ) 18 18 -4 addiu,   
( $2B5CDEEC ) 16 16 4 addiu,
( $2B5CDEF0 ) 3 -4 2 sw,
( $2B5CDEF4 ) 2 -4 16 lw,
( $2B5CDEF8 ) $7C03E83B , ( illegal inst )
( $2B5CDEFC ) 4 -32680 28 lw,   
( $2B5CDF00 ) 30 3 0 addu,
( $2B5CDF04 ) 4 4 30 addu,
( $2B5CDF08 ) 3 2 0 addu,
( $2B5CDF0C ) 4 256 29 sw,
( $2B5CDF10 ) 3 jr,
( $2B5CDF14 ) 1 1 0 or,
$2B17653C ;s ok

--8<--
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
Fingerprint: B63B 6AF2 4EEB F033 46F7  7F1D 935E 6F08 E457 205F
