chicken-users

Re: Performance question concerning chicken flonum vs "foreign flonum"


From: Christian Himpe
Subject: Re: Performance question concerning chicken flonum vs "foreign flonum"
Date: Sat, 06 Nov 2021 14:30:40 +0100 (CET)

felix.winkelmann@bevuta.com wrote on 2021-11-06:
> > modified code:
> >
> > 7.378s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
> > 8.498s CPU time, 0/238095 GCs (major/minor), maximum live heap: 30.78 MiB
> >
> > Both were compiled with -O3 optimization level in gcc.
> >
> > I am fine with these results given your layout of the internals in the 
> > background.
> >
> > Would it be theoretically thinkable to include such fma functionality 
> > directly into chicken.flonum, i.e. as fp+*, or are included modules 
> > typically unaltered?

> The core modules like chicken.flonum can be optimized freely, as they are
> always delivered with the base system and the compiler is often tuned to
> treat these specially.
> I wonder why the speed difference still exists. Could you send me the
> generated assembly code for the test program, as produced by your compiler?
> I'd like to see how far the C compiler goes at inlining the fma operation.
> If this can give a noticeable speedup, I see no reason not to add such an
> operation, but it would be nice to measure the effect before we do this. I
> can send you a patch for testing if you like.

> Note that one may have to use compiler intrinsics or special C compiler
> options to enable this, see for example:
>
> https://stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-sse-avx


> felix

Dear felix,

a patch would be great, if it is not too much work. Attached you will find
three assembly language files:

* fma-test_original.s (unchanged C output of csc, compiled to assembly)
* fma-test_modified.s (modified C output of csc from my previous mail)
* fma-test_modified_mfma.s (modified C output of csc, compiled with the gcc option -mfma)

All files were created with the additional gcc arguments -O3 -S -fverbose-asm.
Are these files sufficient?

I had hoped that the fma function from libc would spare one from using
intrinsics. The compiler option -mfma should enable (I think by defining a C
macro) the use of the corresponding CPU instruction (FMA3 on current x86),
which my CPU supports, but using it does not seem to make a difference.

Best

Christian

Attachment: fma-test_original.s
Description: Binary data

Attachment: fma-test_modified.s
Description: Binary data

Attachment: fma-test_modified_mfma.s
Description: Binary data

