From: Alois Schloegl
Subject: Re: package nan warnings
Date: Sat, 04 Aug 2012 02:10:01 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120613 Icedove/3.0.11

On 2012-08-03 19:21, Max Brister wrote:
> On Thu, Aug 2, 2012 at 4:27 PM, Alois Schloegl <address@hidden> wrote:
>> On 2012-08-02 22:43, Jordi Gutiérrez Hermoso wrote:
>>> On 2 August 2012 16:40, Alois Schloegl <address@hidden> wrote:
>>>> 3) After installing the NaN-toolbox, sum([1 NaN 2]) will still result in NaN. But with the NaN-toolbox you have an additional function, sumskipnan([1,NaN,2]), which gives 3.
>>>
>>> Why don't you name all of your functions this way and not shadow core functions, then? For example, why do you overwrite sumsq?
>>>
>>> - Jordi G. H.
>>
>> Ok, sumsq() is a borderline case because you might argue that it is not necessarily a statistical function. But for the other functions, why should one need to think about whether to use var() or nanvar(), mean() or nanmean(), std() or nanstd()? There is no need for the NaN-propagating version; you should always use the NaN-skipping version.
>
> This is not always true. For example, let's say I want to write a quick, simple test to see if rand is working. I might write something like
>
>   assert (mean (rand (10000)(:)), .5, .1); # the mean value of rand should be around .5
>
> I expect this case to fail if rand produces a NaN.

Hi Max,

thanks for your interest and your attempt to find a solution.

rand() never produces NaN, so it's not a good example. But let's assume there is some myrand() function that can produce NaN; then I'd expect NaN to be an encoding for missing values. In that case, mean() should ignore the NaNs.

If you need to test for NaNs, do it explicitly using any(isnan(x(:))). That's much cleaner, and others will know that your code is testing for NaNs. The problem with implicit NaN propagation is that it is very difficult to know whether the NaN handling was a conscious decision or just an arbitrary side effect.
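
A minimal sketch of such an explicit check, using only core functions. The hypothetical myrand() is simulated here by injecting a single NaN into the output of rand(); the variable names are illustrative assumptions, not part of any package.

```octave
# Hypothetical stand-in for myrand(): uniform samples with one NaN injected.
x = rand (100, 100);
x(1) = NaN;

# Explicit test: state up front whether any NaN is present.
has_nan = any (isnan (x(:)));

# Mean over the non-NaN entries only, selected by logical indexing.
valid = x(~isnan (x));
m = mean (valid);

printf ("contains NaN: %d, mean of valid samples: %f\n", has_nan, m);
```

Written this way, the NaN test and the statistic are two separate, visible steps, so a reader does not have to guess which NaN behaviour the code relies on.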

>> When one tries to solve a challenging problem, why should one need to think about whether to use var(), nanvar(), or some_other_varfunction()? There is just no need for such a proliferation of function names, all doing basically the same thing.
>
> As far as the user is concerned, I agree with you. If a user installs the NaN package, then when they call 'var' they want the NaN-skipping version. I do not think we should be spitting out a bunch of warnings, as what the user wants is unambiguous. On the other hand, this creates an issue for scripts in core. Your functions are doing basically, but not quite, the same thing. When writing scripts in core I expect NaNs to be propagated. It leads to a maintenance nightmare if you cannot be sure of exactly how a function behaves (see gnulib/autotools).

The functions in core and the NaN-tb do the same thing, except for NaN propagation. Even the core functions do not mention in their documentation that NaNs are propagated (see help mean, help var). So the NaN handling is really not strictly defined. Applications that rely on NaN propagation depend on undocumented behaviour. If you need to test for NaNs, you should do it explicitly, e.g. using any(isnan(x(:))). That avoids any ambiguity about NaN handling in your code.
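
The difference under discussion can be illustrated with the vector from earlier in the thread. This sketch assumes core Octave without the NaN package loaded, and it uses logical indexing as a stand-in for the NaN-skipping variants; it is not how the NaN-toolbox itself is implemented.

```octave
x = [1 NaN 2];

# Core behaviour: a single NaN propagates into the result.
s_core = sum (x);
m_core = mean (x);

# NaN-skipping stand-in: drop the NaN entries before reducing.
keep = ~isnan (x);
s_skip = sum (x(keep));
m_skip = mean (x(keep));

printf ("sum:  core = %g, skipping = %g\n", s_core, s_skip);
printf ("mean: core = %g, skipping = %g\n", m_core, m_skip);
```

The skipping sum agrees with the value 3 quoted above for sumskipnan([1,NaN,2]).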

>> Concerning your suggestion "to partition the namespaces (classes)": to me this sounds like second-class citizens. But perhaps it's just me, not being familiar with this technique. In that case, it would be best if someone else transformed the NaN-tb into a more compatible mode. I'm open to suggestions.
>
> A more practical solution would be to use a package [1]. The main problem here is that Octave does not support packages (yet). What do you think about having NaN inside of a package?
>
> [1] http://www.mathworks.com/help/techdoc/matlab_oop/brfynt_-1.html

I do not know; the concept of a "package" must be quite new, and I've never used it. It seems to me that it is another way to move the issue to some other namespace, class, or package.

These "solutions" have one thing in common: they are just a bad compromise to sidestep the real issue, namely what kind of NaN handling should be the default for statistical functions.

However, if you believe that there is some need for a compromise solution, a solution based on packages might be a good idea. In that case, just do it.

Alois

Max Brister