octave-maintainers

Re: profiling Octave


From: Rik
Subject: Re: profiling Octave
Date: Mon, 12 Aug 2019 14:08:51 -0700

On 08/10/2019 09:48 PM, Daniel J Sebald wrote:
>
> You've picked an instruction format heavily dependent on the efficiency
> of the interpreter, correct?

Correct.  The libraries behind Octave, like BLAS and LAPACK, are changing
very slowly and mostly just for the occasional bug fix.  Thus, I don't
foresee any slowdowns in the code that Octave calls.  Hence, I think any
slowdown over time is in Octave's sequence of parsing code, building an
intermediate representation, and then evaluating that representation.

>
> Is there any part of Octave around the interpreter engine that has gone
> from one version to another?  (I'm recalling some kind of open-source
> parser or compiler as one of the libraries included.)  That is, is it an
> Octave issue, or could it be some outside library's issue?

JWE has changed the parser from a pull parser to a push parser (or maybe
the other way around?) over time.  My hunch is that it is not parsing (done
by flex and bison--old, well-established tools), but evaluation that has slowed.

>
> Regarding the interpreting result and how efficient it might be, is there
> anywhere in the Octave code where one could dump out, say, the size of
> the interpreter code/instruction list/or anything that might give an
> indication of efficiency?  That is, say there is a point where the parser
> does its job, and it is then a case of executing some type of primitive
> "language".  We could then put a printf for the size of the primitive
> list in both 4.2.1 and 4.4.1.

This is a bit beyond me, but I think so.  I believe Octave constructs a
tree of nodes from the source code and then proceeds to evaluate them.  So
you are correct, if

y = x + 1

generates the same primitive actions

1) create temporary octave_value from scalar 1
2) fetch x
3) perform operator function plus (x, tmp_value)
4) assign octave_value output to y

between versions, but one version takes longer than another, then the
conclusion must be that the individual steps are taking longer.
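
To make the four steps concrete, here is a minimal sketch of a tree-walking
evaluator. It is written in Python purely for illustration and is not Octave's
actual C++ implementation; the node classes (Const, Var, BinOp, Assign) and the
trace list are hypothetical names. It logs one entry per primitive action, so
the length of the trace is the kind of number that could be compared between
versions. Operands are evaluated left to right here, so the fetch of x is
logged before the temporary is created; the set of actions is the same.

```python
from dataclasses import dataclass


@dataclass
class Const:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    lhs: object
    rhs: object


@dataclass
class Assign:
    target: str
    expr: object


# Hypothetical operator table; only "plus" is needed for y = x + 1.
OPERATORS = {"plus": lambda a, b: a + b}


def evaluate(node, env, trace):
    """Walk the tree and append one trace entry per primitive action."""
    if isinstance(node, Const):
        trace.append(f"create temporary from scalar {node.value}")
        return node.value
    if isinstance(node, Var):
        trace.append(f"fetch {node.name}")
        return env[node.name]
    if isinstance(node, BinOp):
        lhs = evaluate(node.lhs, env, trace)
        rhs = evaluate(node.rhs, env, trace)
        trace.append(f"perform operator function {node.op}")
        return OPERATORS[node.op](lhs, rhs)
    if isinstance(node, Assign):
        value = evaluate(node.expr, env, trace)
        trace.append(f"assign output to {node.target}")
        env[node.target] = value
        return value
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Tree for:  y = x + 1
tree = Assign("y", BinOp("plus", Var("x"), Const(1)))
env = {"x": 2}
trace = []
evaluate(tree, env, trace)

for step_number, action in enumerate(trace, start=1):
    print(f"{step_number}) {action}")
```

If two versions produce a trace of the same length for the same source, the
remaining suspect is the cost of each individual step.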
 
>
> Using a timer *inside* the code rather than external might help as well.
>  That is, say there is such a point I described above.  Printing out
> times at various locations along the way
>
> Start
> End of interpreter
> End of execution phase
>
> might reveal something.

We might have to go that route.  I wasn't able to get solid results from
oprofile or gprof, and callgrind just left me confused.
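
As a rough illustration of the internal-timer idea, the sketch below stamps the
clock at the end of each phase. It uses Python's own ast.parse, compile, and
exec as stand-ins for Octave's parse, intermediate-representation, and
evaluation phases, so it shows only the shape of the measurement; inside Octave
the timestamps would have to go into the C++ source, and the function name
timed_run and the report keys are hypothetical. It also counts the syntax-tree
nodes, which is one possible proxy for the "size of the primitive list".

```python
import ast
import time

# A double loop, loosely analogous to the benchmark under discussion.
SOURCE = """
total = 0
for i in range(200):
    for j in range(200):
        total = total + 1
"""


def timed_run(source):
    """Time parsing, compilation, and execution of source separately."""
    start = time.perf_counter()
    tree = ast.parse(source)                  # source text -> syntax tree
    end_parse = time.perf_counter()
    code = compile(tree, "<sketch>", "exec")  # syntax tree -> bytecode
    end_compile = time.perf_counter()
    namespace = {}
    exec(code, namespace)                     # run the compiled form
    end_execute = time.perf_counter()

    return {
        "parse_s": end_parse - start,
        "compile_s": end_compile - end_parse,
        "execute_s": end_execute - end_compile,
        # Counted outside the timed region so it does not skew the numbers.
        "node_count": sum(1 for _ in ast.walk(tree)),
        "total": namespace["total"],
    }


report = timed_run(SOURCE)
for key, value in report.items():
    print(f"{key}: {value}")
```

Comparing the per-phase numbers between two builds would show whether the
extra time is spent before or during evaluation.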

>
> One last question, is just-in-time compiling supposed to improve this by
> cutting down on the amount of interpretation needed in the double looping
> above?

I think so.  The idea is to parse to an intermediate form.  Then, instead
of calling evaluate on the IR each time through the loop, continue further
and actually compile the IR to true executable code that can just be run,
not evaluated.
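
The difference can be sketched in a few lines of Python. This is only an
analogy: a real JIT emits machine code, whereas the "compiled" path below
merely turns the tree into nested closures once, before the loop starts. The
tuple-based IR and the names evaluate, compile_node, run_interpreted, and
run_compiled are all hypothetical. What it does show is where the work moves:
the interpreted path re-inspects node types on every iteration, while the
compiled path dispatches once.

```python
def evaluate(node, env):
    """Interpret the IR: node kinds are inspected again on every call."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "plus":
        return evaluate(node[1], env) + evaluate(node[2], env)
    raise ValueError(f"unknown node kind: {kind}")


def compile_node(node):
    """Translate the IR once into a callable; no dispatch remains at run time."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    if kind == "plus":
        lhs = compile_node(node[1])
        rhs = compile_node(node[2])
        return lambda env: lhs(env) + rhs(env)
    raise ValueError(f"unknown node kind: {kind}")


# IR for the loop body:  x = x + 1
BODY = ("plus", ("var", "x"), ("const", 1))


def run_interpreted(iterations):
    env = {"x": 0}
    for _ in range(iterations):
        env["x"] = evaluate(BODY, env)  # walk the tree every iteration
    return env["x"]


def run_compiled(iterations):
    step = compile_node(BODY)           # one-time translation
    env = {"x": 0}
    for _ in range(iterations):
        env["x"] = step(env)            # just run the result
    return env["x"]


print(run_interpreted(1000), run_compiled(1000))
```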

--Rik



