Re: Performance question concerning chicken flonum vs "foreign flonum"
From: Christian Himpe
Subject: Re: Performance question concerning chicken flonum vs "foreign flonum"
Date: Sat, 06 Nov 2021 14:30:40 +0100 (CET)
felix.winkelmann@bevuta.com wrote on 2021-11-06:
> > modified code:
> >
> > 7.378s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
> > 8.498s CPU time, 0/238095 GCs (major/minor), maximum live heap: 30.78 MiB
> >
> > Both were compiled with -O3 optimization level in gcc.
> >
> > I am fine with these results given your layout of the internals in the
> > background.
> >
> > Would it be theoretically thinkable to include such fma functionality
> > directly into chicken.flonum, i.e. as fp+*, or are included modules
> > typically unaltered?
> The core modules like chicken.flonum can be optimized freely, as they are
> always delivered with the base system and the compiler is often tuned to
> treat these specially.
> I wonder why the speed difference still exists, could you send me the
> generated assembly code for the test program, as produced by your compiler?
> I'd like to see how far the C compiler goes at inlining the fma operation.
> If this can give a noticeable speedup, I see no reason why not to add such
> an operation, but it would be nice to measure the effect before we do this.
> I can send you a patch for testing if you like.
> Note that one may have to use compiler intrinsics or special C compiler
> options to enable this, see for example:
>
> https://stackoverflow.com/questions/15933100/how-to-use-fused-multiply-add-fma-instructions-with-sse-avx
> felix
Dear felix,
a patch would be great, if it is not too much work. Attached you will find
three assembly language files:
* fma-test_original.s (unchanged C output of csc, compiled to assembly)
* fma-test_modified.s (modified csc C output from my previous mail)
* fma-test_modified_mfma.s (modified csc C output, compiled with the gcc option -mfma)
All files were created with the additional gcc arguments -O3 -S -fverbose-asm.
Are these files sufficient?
I had hoped that the libc function fma would insulate one from intrinsics. The
compiler option -mfma should enable (I think by defining a C macro) the use of
the corresponding CPU instruction (FMA3 on current x86), which my CPU supports,
but using it does not seem to make a difference.
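
For reference, here is a minimal sketch of how one could check from C whether
the option took effect at the preprocessor level. It assumes GCC or Clang,
which predefine __FMA__ when -mfma (or a matching -march) is given, and a
<math.h> that defines the C99 macro FP_FAST_FMA when fma() is expected to be
about as fast as a separate multiply and add. The two function names are
hypothetical helpers, not part of csc or CHICKEN.

```c
#include <math.h>   /* may define FP_FAST_FMA (C99) */

/* Hypothetical helper: returns 1 if the compiler predefined __FMA__,
   i.e. it was told it may emit hardware FMA instructions, else 0. */
int have_fma_macro(void)
{
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if <math.h> advertises fma() as fast,
   which usually means the call is expanded inline, else 0. */
int have_fast_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}
```

If both return 1 and the timings still do not change, searching the generated
.s file for vfmadd instructions versus a plain call to fma would show whether
the operation was actually inlined.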
Best
Christian
Attachments:
* fma-test_original.s
* fma-test_modified.s
* fma-test_modified_mfma.s