octave-maintainers
Re: move constructors likely a requirement


From: Daniel J Sebald
Subject: Re: move constructors likely a requirement
Date: Fri, 30 Aug 2019 03:13:05 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 8/29/19 9:43 PM, John W. Eaton wrote:
> On 8/24/19 1:01 AM, Rik wrote:
>
>> The only thing it does is assign the static value 13 to the variable z.  In
>> 3.2.4 this took 0.17 seconds and now it takes 0.96.
>>
>> I think I would start investigations with this script.  Is it the creation
>> of a new octave_value for 13 every time?  Is it assignment?
>
> I spent a lot of time over the last week looking at the interpreter. It's a bunch of different things, but the biggest factor seems to be using the tree walker pattern instead of just doing virtual dispatch to evaluation methods in the parse tree objects themselves.  I think I can speed it up considerably, possibly even get back much closer to the 3.4.3 performance just by going back to something more like what we used to have for the evaluator (but keeping evaluator state in an object instead of as global data).  Another option is to implement a byte code interpreter instead of directly evaluating the parse tree, but that is more work.  OTOH, it might be instructive and helpful for JIT compiling.
>
> I'll try to provide a more complete summary of what I found with some patches tomorrow.
>
> This experience does show that we need some kind of performance testing, and not just an aggregate time required to run the test suite.  We need a set of tests to check performance of specific features.  But I'm not sure how best to do comparisons.  What do we use for baseline values?

Is there some type of builtin function like

speedtest(test)

where test might be "parse", "matmult", "fft", "diskio"? Then come up with some type of basic C++ routine that does operations resembling each of the test types. That is, the C++ implementation might not be general, but it would serve as a baseline for the most efficient one. So long as the algorithms stay fixed, one can do a relative comparison against the speedtest() result. That *might* take the computer/CPU/bus/design speed out of the picture.
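
To make the idea a bit more concrete, here is a minimal C++ sketch of the relative comparison. Nothing like this exists in Octave as far as I know; the names best_time, baseline_matmult, and relative_score are hypothetical, and the naive matrix product is only an assumed stand-in for what a fixed "matmult" baseline might look like.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper: best-of-N wall-clock time, in seconds, for a workload.
// Taking the minimum over several runs reduces noise from other processes.
double best_time (const std::function<void ()>& work, int repeats = 5)
{
  assert (repeats > 0);
  using clock = std::chrono::steady_clock;
  double best = std::numeric_limits<double>::infinity ();
  for (int i = 0; i < repeats; i++)
    {
      auto t0 = clock::now ();
      work ();
      std::chrono::duration<double> dt = clock::now () - t0;
      best = std::min (best, dt.count ());
    }
  return best;
}

// Assumed fixed baseline for a "matmult" test: naive n-by-n product of a
// matrix of ones with a matrix of twos.  The algorithm must never change,
// otherwise scores taken at different times are no longer comparable.
double baseline_matmult (std::size_t n)
{
  std::vector<double> a (n * n, 1.0), b (n * n, 2.0), c (n * n, 0.0);
  for (std::size_t i = 0; i < n; i++)
    for (std::size_t k = 0; k < n; k++)
      for (std::size_t j = 0; j < n; j++)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c[0];  // return a result so the work is not trivially discarded
}

// Relative score: time of the feature under test divided by the time of
// the fixed baseline, both measured on the same machine in the same run.
double relative_score (const std::function<void ()>& feature,
                       const std::function<void ()>& baseline,
                       int repeats = 5)
{
  return best_time (feature, repeats) / best_time (baseline, repeats);
}
```

The feature callable would wrap whatever is being measured (say, running a small script through the interpreter), and the resulting ratio, rather than the raw seconds, is what would be tracked from release to release. Whether the ratio really is independent of the hardware is exactly the open question.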

Dan


