Performance question concerning chicken flonum vs "foreign flonum"
From: christian.himpe
Subject: Performance question concerning chicken flonum vs "foreign flonum"
Date: Thu, 04 Nov 2021 16:46:50 +0100 (CET)
Dear All,
I am currently experimenting with Chicken Scheme and I would like to ask about
the following situation: I am comparing a "pure" Scheme fused-multiply-add
(fma) using chicken.flonum against C99's fma via chicken.foreign. Here is my
test code:
;;;; fma-test.scm

(import (chicken flonum) (chicken foreign) srfi-4)

(foreign-declare "#include <math.h>")

;; FMA via nested fp+ and fp* from chicken.flonum
(define (scm-fma x y z)
  (fp+ z (fp* x y)))

;; FMA via C99 function through chicken.foreign
(define c99-fma (foreign-lambda double "fma" double double double))

;; Test function for FMAs
(define (dot fma a b)
  (do [(idx 0 (add1 idx))
       (dim (f64vector-length a))
       (ret 0.0 (fma (f64vector-ref a idx) (f64vector-ref b idx) ret))]
      ((= idx dim) ret)))

;; Test vector dimension
(define dim 2000000)

;; Test vector 1
(define a (make-f64vector dim 1.2345))

;; Test vector 2
(define b (make-f64vector dim 0.9876))

;; Test repetitions
(define N 200)

;; Test scm-dot
(time (do [(n 0 (add1 n))]
          ((= n N))
        (dot scm-fma a b)))

;; Test fma-dot
(time (do [(n 0 (add1 n))]
          ((= n N))
        (dot c99-fma a b)))

;eof
Running this code as follows:
csc -O5 fma-test.scm && ./fma-test
yields the following results (scm-fma first, then c99-fma):
7.558s CPU time, 0/225861 GCs (major/minor), maximum live heap: 30.78 MiB
8.839s CPU time, 0/256410 GCs (major/minor), maximum live heap: 30.78 MiB
Now I wonder why C's single function (instruction) is slower than two Scheme
function calls. I have four potential explanations:
1. chicken.foreign needs to do some type conversion for each argument and
return value, which accounts for the extra time. If so, could this be avoided
by type declarations somehow?
2. chicken.flonum does something to make fpX calls very fast. If so, can this
be done for the foreign fma, too?
3. I am using chicken.foreign inefficiently, but I think srfi-144 is using it
similarly.
4. Is this an effect only on my machine?
It would be great to get some help or explanation with this issue.
Here is my setup:
CHICKEN Scheme 5.2.0
gcc 10.3.0
Ubuntu 20.04
AMD Ryzen 5 4500U with 16GB
Thank you very much
Christian