help-octave

## Re: handling NaN

 From: Schloegl Alois Subject: Re: handling NaN Date: Tue, 13 Aug 2002 22:27:16 +0200 (MEST) User-agent: IMP/PHP IMAP webmail program 2.2.8

```
Quoting Paul Kienzle <address@hidden>:

> ...
>
> > > > The consequences are that NaN's should be omitted by default.
> > >
> > > The consequences are that MISSING VALUES should be omitted by default.
> >
> >
> > Ok, you are right. Actually, I meant missing values. However, one
> > explanation of this slip is that in statistics and stochastic signal
> > processing there is no difference between NaNs and missing values.
> >
> > > NaNs are not missing values in many (most) computations, but instead
> > > are things that inform you that your computation has gone awry.
> >
> > In my experience, NaNs can be omitted most of the time.
> >
> > Let's take the example of calculating the mean (or any other statistic)
> > of a data series x.
> > Let's further assume that x is a ratio of s over u, i.e. x=s./u.
> > It could happen that for some elements both s and u are zero, resulting
> > in x=0/0.
> > Clearly, 0/0 is not defined and results in NaN (an awry computation?).
> > But we can still calculate the mean of that data series x.
> > The bottom line is that it does not matter whether NaNs come from 0/0
> > or from somewhere else.
>
> This is a bad example. You do not want to compute a scale factor between
> two signals using the mean of the ratio, especially if your measured value
> of the signal is near zero. It will be extremely sensitive to noise. In
> this case you are lucky to get the occasional NaN to let you know that
> something is going wrong ;-)

You can calculate the mean (as well as other statistics) for ratios.
The example might be simple, but it is reasonable.
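
% Sketch with made-up s and u (hypothetical values, not from the thread):
% x = s./u where one element is 0/0, comparing the plain mean with a mean
% taken over the non-NaN entries only.
s = [2, 0, 3, 4];
u = [1, 0, 2, 4];
x = s ./ u;                        % [2, NaN, 1.5, 1]
m_all = mean(x);                   % NaN: the single 0/0 propagates
m_omit = mean(x(~isnan(x)));       % mean of [2, 1.5, 1] = 1.5
fprintf('plain mean: %g, NaN omitted: %g\n', m_all, m_omit);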

>
> Furthermore, in this case, nanmean won't protect you from NaNs. Consider
> s=[-1,0,1] and u=[0,0,0]. In this case nanmean(s./u) will give you
> mean([-Inf,Inf]) which is NaN. This data isn't unreasonable.

If the result is not defined (like +inf-inf), NaN should be returned. That's
OK. You have just provided another example of how pre-processing can
produce undefined results.
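
% Sketch of Paul's counter-example: dropping the NaN cannot rescue a sum of
% -Inf and +Inf, which is itself undefined, so the result is still NaN.
s = [-1, 0, 1];
u = [0, 0, 0];
x = s ./ u;                        % [-Inf, NaN, Inf]
v = x(~isnan(x));                  % NaN removed: [-Inf, Inf]
m = mean(v);                       % -Inf + Inf is undefined, so m is NaN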

The following example also includes this case:

x = [1,2,3,4;inf,0,0/0,4;-inf,-2,0,4]

Now, suppose we'd like to estimate some statistic for each column, e.g. the
standard deviation std(x). Without omitting NaNs, you get:

» std(x)

NaN     2   NaN     0

With NaNs omitted, you get:
» addpath('e:\matlab\nan'); % turn on NaN-toolbox
» std(x)

NaN    2.0000    2.8284         0

Column 1 is still undefined, but now you get an estimate of the standard
deviation of column 3; in the former result, you did not.
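
% Sketch of the column-wise computation above, assuming a NaN-omitting std
% simply means "std of the non-NaN entries of each column" (the NaN-toolbox
% internals are not reproduced here).
x = [1, 2, 3, 4; inf, 0, 0/0, 4; -inf, -2, 0, 4];
s_plain = std(x);                  % NaN propagates: [NaN 2 NaN 0]
% Drop the NaN entries of column k before calling std on it.
s_omit = arrayfun(@(k) std(x(~isnan(x(:, k)), k)), 1:size(x, 2));
% Column 1 keeps [1; inf; -inf], which has no NaN but an undefined mean,
% so it stays NaN. Column 3 keeps [3; 0]; for the matrix as written its
% sample std is sqrt(4.5), about 2.1213, rather than the 2.8284 shown above.
fprintf('%g ', s_omit); fprintf('\n');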

Moreover, this result can be used to calculate a summary statistic (e.g. the
mean of the standard deviations). You get:
» rmpath('e:\matlab\nan'); % turn off NaN-toolbox
» mean(std(x))

ans =

NaN

» addpath('e:\matlab\nan');  % turn on NaN-toolbox
» mean(std(x))

ans =

1.6095

Again, omitting NaNs gives an estimate of the average standard deviation over
the columns. Propagating NaNs would have produced an undefined result.
You can even estimate the standard error of the average.
Getting an estimate is definitely better than getting none.
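
% Sketch of the summary statistic (mean of the column std's) and its standard
% error, again assuming "omit NaN" means dropping NaN entries before each step.
x = [1, 2, 3, 4; inf, 0, 0/0, 4; -inf, -2, 0, 4];
m_plain = mean(std(x));            % NaN, because std(x) contains NaN
s_omit = arrayfun(@(k) std(x(~isnan(x(:, k)), k)), 1:size(x, 2));
v = s_omit(~isnan(s_omit));        % [2, sqrt(4.5), 0]; column 1 is dropped
m_omit = mean(v);                  % (2 + sqrt(4.5)) / 3, about 1.3738
se = std(v) / sqrt(numel(v));      % standard error of that average
% With column 3 evaluating to sqrt(4.5) here, the average differs from the
% 1.6095 printed above, which corresponds to a column-3 value of 2.8284.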

Alois

P.S.: If NaNs must be identified, this can still be done with ISNAN.
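
% Sketch: locating the NaN entries of the example matrix with isnan, and
% telling them apart from the infinities with isinf.
x = [1, 2, 3, 4; inf, 0, 0/0, 4; -inf, -2, 0, 4];
nan_per_col = sum(isnan(x));       % [0 0 1 0]
[r, c] = find(isnan(x));           % r = 2, c = 3: the 0/0 entry
inf_per_col = sum(isinf(x));       % [2 0 0 0]: inf and -inf in column 1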

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------

```