chicken-users


Re: Performance question concerning chicken flonum vs "foreign flonum"


From: Jörg F. Wittenberger
Subject: Re: Performance question concerning chicken flonum vs "foreign flonum"
Date: Thu, 4 Nov 2021 20:51:01 +0100

Hi Christian,

This might be a case of "never trust a statistic you did not falsify
yourself".

Before speculating about explanations, I would ask how stable the
results are with respect to larger N, repeated runs, etc.
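
A minimal sketch of such a stability check, assuming the definitions
from the quoted fma-test.scm. The helper names (cpu-ms, sample, report,
repeat-dot) and the sizes and repetition counts are made up for
illustration only; cpu-time comes from (chicken time) and returns user
and system CPU milliseconds as two values.

```scheme
;;;; fma-repeat.scm (sketch; compile with: csc -O5 fma-repeat.scm)

(import (chicken flonum) (chicken foreign) (chicken time) srfi-4)

(foreign-declare "#include <math.h>")

;; Same two FMA variants and test loop as in the quoted fma-test.scm
(define (scm-fma x y z)
  (fp+ z (fp* x y)))

(define c99-fma (foreign-lambda double "fma" double double double))

(define (dot fma a b)
  (do [(idx 0 (add1 idx))
       (dim (f64vector-length a))
       (ret 0.0 (fma (f64vector-ref a idx) (f64vector-ref b idx) ret))]
      ((= idx dim) ret)))

;; User CPU time in milliseconds consumed by one call of thunk
(define (cpu-ms thunk)
  (let-values ([(u0 s0) (cpu-time)])
    (thunk)
    (let-values ([(u1 s1) (cpu-time)])
      (- u1 u0))))

;; Time `runs` separate calls of thunk and return the list of timings
(define (sample runs thunk)
  (let loop ([i 0] [acc '()])
    (if (= i runs)
        (reverse acc)
        (loop (add1 i) (cons (cpu-ms thunk) acc)))))

;; Print the spread of one list of timings
(define (report label timings)
  (print label ": min " (apply min timings)
         " ms, max " (apply max timings) " ms"))

;; Hypothetical sizes and repetition counts, chosen only for illustration
(define sizes '(500000 1000000 2000000 4000000))
(define runs 5)
(define reps 10)

;; One timed sample consists of `reps` dot products
(define (repeat-dot fma a b)
  (do [(n 0 (add1 n))] ((= n reps))
    (dot fma a b)))

(for-each
 (lambda (dim)
   (let ([a (make-f64vector dim 1.2345)]
         [b (make-f64vector dim 0.9876)])
     (print "dim = " dim)
     (report "  scm-fma" (sample runs (lambda () (repeat-dot scm-fma a b))))
     (report "  c99-fma" (sample runs (lambda () (repeat-dot c99-fma a b))))))
 sizes)
```

If the min/max ranges of the two variants overlap for most sizes, the
difference is probably within the noise; if they stay clearly apart as
the size grows, it is more likely to be real.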

IMHO the results are too close to call.  Roughly, the Scheme version
shows 88% of the minor GCs (225861 vs. 256410) going along with 85% of
the runtime (7.558s vs. 8.839s).  Ergo: GC takes time.  My first guess:
there may be allocation going on in the FFI accounting for the
increased memory usage.

I'm in no way competent to actually confirm or rule out that
hypothesis.  Please take my whole assessment with a grain of salt; it
is just a first guess.

On Thu, 04 Nov 2021 16:46:50 +0100 (CET),
<christian.himpe@uni-muenster.de> wrote:

> Dear All,
> 
> I am currently experimenting with Chicken Scheme and I would like to
> ask about the following situation: I am comparing a "pure" Scheme
> fused-multiply-add (fma) using chicken.flonum against C99's fma via
> chicken.foreign. Here is my test code:
> 
> ;;;; fma-test.scm
> 
> (import (chicken flonum) (chicken foreign) srfi-4)
> 
> (foreign-declare "#include <math.h>")
> 
> ;; FMA via nested fp+ and fp* from chicken-flonum
> (define (scm-fma x y z)
>   (fp+ z (fp* x y)))
> 
> ;; FMA via C99 function through chicken-foreign
> (define c99-fma (foreign-lambda double "fma" double double double))
> 
> ;; Test function for FMAs
> (define (dot fma a b)
>   (do [(idx 0 (add1 idx))
>        (dim (f64vector-length a))
>        (ret 0.0 (fma (f64vector-ref a idx) (f64vector-ref b idx) ret))]
>       ((= idx dim) ret)))
> 
> ;; Test vector dimension
> (define dim 2000000)
> 
> ;; Test vector 1
> (define a (make-f64vector dim 1.2345))
> 
> ;; Test vector 2
> (define b (make-f64vector dim 0.9876))
> 
> ;; Test repetitions
> (define N 200)
> 
> ;; Test scm-dot
> (time (do [(n 0 (add1 n))]
>         ((= n N))
>         (dot scm-fma a b)))
> 
> ;; Test fma-dot
> (time (do [(n 0 (add1 n))]
>         ((= n N))
>         (dot c99-fma a b)))
> 
> ;eof
> 
> Running this code as follows:
> 
> csc -O5 fma-test.scm && ./fma-test
> 
> yields the following results (scm-fma first, then c99-fma):
> 
> 7.558s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
> 8.839s CPU time, 0/256410 GCs (major/minor), maximum live heap: 30.78 MiB
> 
> Now I wonder why C's single function (instruction) is slower than two
> Scheme function calls. I have four potential explanations:
> 
> 1. chicken.foreign needs to do some type conversion for each argument
> and return value, which accounts for the extra time. If so, could this
> be avoided by type declarations somehow?
> 
> 2. chicken.flonum does something to make fpX calls very fast. If so,
> can this be done for the foreign fma, too?
> 
> 3. I am using chicken.foreign inefficiently, but I think srfi-144 is
> using it similarly.
> 
> 4. Is this an effect only on my machine?
> 
> It would be great to get some help or explanation with this issue.
> 
> Here is my setup:
> 
> CHICKEN Scheme 5.2.0
> gcc 10.3.0
> Ubuntu 20.04
> AMD Ryzen 5 4500U with 16GB
> 
> Thank you very much
> 
> Christian
> 



