On 06/19/2017 04:32 PM, Rik wrote:
On 06/19/2017 07:37 AM, Michael D Godfrey wrote:
On 05/23/2017 07:42 PM, John W. Eaton wrote:
On 05/23/2017 01:12 PM, Mike Miller wrote:
On Tue, May 23, 2017 at 12:16:07 -0400, John W. Eaton wrote:
On 05/23/2017 04:37 AM, Mike Miller wrote:
Confirmed here. I bisected and found a lot of performance loss
starting with c452180ab672. Its immediate predecessor f4d4d83f15c5
has about the same performance as 4.2.1. If you can compare those two
revisions and confirm, that's a good place to start looking for a
cause.
What was your test for performance here?
I recall timing "make check" when I made those changes and did not see
a significant change in performance.
If I have something to test, I'll take a look at it.
I ran Dmitri's test case a handful of times at each build revision. I
get a distinct difference between f4d4d83f15c5 and c452180ab672, all
other things being equal. I'm using OpenBLAS instead of ATLAS.
I ran multiple Octave sessions with -cli -W, built without Qt to speed
up bisecting, using the test case "x = rand(4000); tic; x'*x; toc".
f4d4d83f15c5: mean is 0.63071 seconds, std dev is 0.0024187.
c452180ab672: mean is 1.1713 seconds, std dev is 0.11803.
This is the test case that I used to bisect and the results stayed
consistent and converged on this revision.
Thanks, it should be fixed now with the latest two changesets
that I pushed.
The implementation of the compound binary expression object is
a bit tricky and I made a mistake when I translated the
rvalue1 operation to a tree_evaluator::visit* function.
I'm sure the reason that I didn't see anything significant in
my tests was that I only looked at the overall performance of
running the test suite, not any one operation individually. I
wasn't expecting much difference in performance in each
evaluation step. I was more concerned with whether using
stack objects to hold function results would perform worse
than returning values from the rvalue functions.
jwe
I have done some comparisons between 4.0.3 and the current dev
tip @ be69ea3de7a3 (also some previous devs)
and typically I see:
4.3.0+
test 2: cputime used: 9.2e-01 seconds
4.0.3 /usr/bin/octave --no-gui
test 2: cputime used: 6.4e-01 seconds
Initially, I was checking Rik's conversion of the elementary
functions to C++ std (which all seem to be alright), but I noticed
the large timing difference. The code that I used spends most of its
time transforming complex-valued arrays using exp(), atanh(), etc.
Since I ran some tests prior to Rik's new code, it appears that the
cause is not the new std functions.
Michael,
Thanks for noticing this. If the issue is a slowdown in
complex-valued arrays, then maybe you can re-test in about a week?
At the moment I am converting many of the basic mapper functions
which used to dispatch to gnulib, Fortran, or even our own
hand-rolled C++ code, to instead dispatch to the C++ standard
library. Besides making the code simpler and reducing our
external dependencies during configure, Octave will now sit
squarely atop the standard library, which is a well-debugged and
well-coded piece of software.
My next task, after the basic functions, is to look at how the
mapper functions are implemented for complex values. Currently,
we often hand code our own functions for complex values. However,
std::complex already includes templates for some of the basic math
functions. I would like to switch over to using the standard
templates whenever possible which might improve performance.
--Rik
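
For readers following along, here is a minimal sketch of the idea Rik
describes: applying the overloads that <complex> already provides for
std::complex to each element of an array, instead of hand-coded
complex routines. The helper names map_complex, array_exp, and
array_atanh are hypothetical and are not Octave's actual mapper code.
Whether this approach is faster than the hand-rolled versions is not
shown here and would need to be measured.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper (not Octave's API): apply a scalar complex
// function to every element of an array, as a mapper function would.
template <typename F>
std::vector<cplx> map_complex (const std::vector<cplx>& in, F fcn)
{
  std::vector<cplx> out (in.size ());
  std::transform (in.begin (), in.end (), out.begin (), fcn);
  return out;
}

// Element-wise exp using the std::complex overload of std::exp.
std::vector<cplx> array_exp (const std::vector<cplx>& in)
{
  return map_complex (in, [] (const cplx& z) { return std::exp (z); });
}

// Element-wise atanh using the std::complex overload of std::atanh
// (available since C++11).
std::vector<cplx> array_atanh (const std::vector<cplx>& in)
{
  return map_complex (in, [] (const cplx& z) { return std::atanh (z); });
}
```

Calling array_exp or array_atanh on a std::vector of complex values
returns a new vector of the same length. Timing such calls in
isolation, rather than timing a whole test suite, is the kind of
per-operation check discussed earlier in this thread.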
Rik,
I did not fully appreciate how much work you were doing! But, keep
in mind that the loss of performance seems to have occurred before
you started. Maybe what you are doing will recover some. In any case,
I will run the same tests as soon as you are done.
Thanks!
Michael