chicken-users

Re: Performance question concerning chicken flonum vs "foreign flonum"


From: felix . winkelmann
Subject: Re: Performance question concerning chicken flonum vs "foreign flonum"
Date: Sat, 06 Nov 2021 01:32:59 +0100

> modified code:
>
> 7.378s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
> 8.498s CPU time, 0/238095 GCs (major/minor), maximum live heap: 30.78 MiB
>
> Both were compiled with -O3 optimization level in gcc.
>
> I am fine with these results given your layout of the internals in the 
> background.
>
> Would it be theoretically thinkable to include such fma functionality 
> directly into chicken.flonum, i.e. as fp+*, or are included modules typically 
> unaltered?

The core modules like chicken.flonum can be optimized freely, as they are always
delivered with the base system and the compiler is often tuned to treat these 
specially.
I wonder why the speed difference still exists. Could you send me the generated
assembly code for the test program, as produced by your compiler? I'd like to
see how far the C compiler goes in inlining the fma operation.
If this gives a noticeable speedup, I see no reason not to add such an
operation, but it would be nice to measure the effect before we do this. I can
send you a patch for testing if you like.

Note that one may have to use compiler intrinsics or special C compiler options
to enable this; see for example:

    https://stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-sse-avx


felix



