help-octave

kolmogorov smirnov test. Normal distribution


From: A. Kalten
Subject: kolmogorov smirnov test. Normal distribution
Date: Wed, 23 Apr 2008 19:03:52 -0400

> OK.  Similar to the results I have got from minitab.

>>  [pval, ks] = kolmogorov_smirnov_test(X,'normal',mean(X),var(X), "<>");
>>  pval
>>  pval = 0.92970
>>  ks
>>  ks = 0.54297


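Conventionally, the parameters that define the normal distribution are
the mean and the standard deviation, not the variance, so I would expect
the call to be kolmogorov_smirnov_test(X,'normal',mean(X),std(X)).
Which one the test actually wants depends on the CDF function it calls
for 'normal', so check that function's documentation.  Note that your
var(X) call returns pval = 0.92970, the same p-value that R reports at
the end of this message for sd(X), which suggests that this version
expects the variance.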

But the more serious error is that parameters estimated from the sample
should not be used in the standard KS test.  The KS test compares sample
data with an assumed distribution whose parameters are fixed in advance.
In other words, you are testing whether or not your data fit a normal
distribution defined by some mean, x, and standard deviation, s, where
x and s are not determined from your sample.

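That said, the use of sample parameters can still give meaningful
results, but the KS test has to be modified.  The statistic is computed
in the same way, but it has to be compared against different critical
values (for the normal case this is the Lilliefors test), because
fitting the parameters to the data tends to make the statistic smaller.
I doubt that the octave code includes this modification (although
I haven't examined the source).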
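
To give a feel for how much the critical value changes when the mean and standard deviation are estimated from the sample, here is a rough Monte Carlo sketch. It is not the octave built-in and not a published table: the helper phi (a normal CDF written with erfc), the seed, and the number of replications are my own choices, and n = 15 is the sample size in this thread.

```octave
% Simulate the null distribution of the KS distance when the normal
% parameters are estimated from each sample (Lilliefors-style setting).
randn("state", 1);          % fixed seed so the run is repeatable
n    = 15;                  % sample size used in this thread
nrep = 2000;                % number of simulated samples (arbitrary choice)

% Normal CDF via erfc; a stand-in for normcdf(x, mu, sigma).
phi = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

idx = 1:n;
D   = zeros(1, nrep);
for r = 1:nrep
  x = sort(randn(1, n));                   % sample from a true normal
  F = phi(x, mean(x), std(x));             % CDF with *estimated* parameters
  % Largest gap on either side of each step of the empirical CDF.
  D(r) = max(max(idx / n - F), max(F - (idx - 1) / n));
end

Ds   = sort(D);
crit = Ds(ceil(0.95 * nrep));              % empirical 95th percentile
printf("simulated 5%% critical value for n = %d: %.3f\n", n, crit);
```

The simulated value should come out well below the critical value tabulated for fully specified parameters, which is the reason the unmodified test is too lenient when sample parameters are used.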

Anyway, the KS test can be performed manually using octave.
Let's try it with your data.

X=[7.11, 6.73, 6.95, 7.25, 7.25, 7.03, 7.10, 7.15, 6.78, 7.09, 7.37, 7.22, ...
   6.82, 6.72, 6.95]

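The KS statistic is just the maximum distance between the
empirical cumulative distribution defined by the data and
the cumulative distribution defined by the assumed population
parameters.  Because the empirical distribution is a step function,
strictly the distance has to be checked on both sides of each step:
at each data point and just below it.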

First sort the data:

X=sort(X)

Then determine the empirical cumulative distribution:

Xecdf=empirical_cdf(X,X)

Now we assume that the data is drawn from a normal population
with mean=7.0 and std=0.2.  These are close, but not the same
as the sample parameters.  Using these values, we determine
the cumulative distribution:

assumed_pop=normcdf(X,7.0,0.2)

The KS statistic is the maximum of the difference between these
two cumulative distributions:

ks=max(abs(assumed_pop-Xecdf))

ks =  0.14031
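
One caveat on the steps above: max(abs(assumed_pop-Xecdf)) compares the two curves only at the top of each step of the empirical distribution. The sketch below also checks the bottom of each step. The helper phi (a normal CDF written with erfc, so that no statistics package is needed) and the variable names are my own, not part of the original commands.

```octave
% Data from the thread, sorted.
X = sort([7.11, 6.73, 6.95, 7.25, 7.25, 7.03, 7.10, 7.15, 6.78, 7.09, ...
          7.37, 7.22, 6.82, 6.72, 6.95]);

% Normal CDF via erfc; a stand-in for normcdf(x, mu, sigma).
phi = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

F        = phi(X, 7.0, 0.2);                 % assumed population CDF
Fn_right = arrayfun(@(t) mean(X <= t), X);   % ECDF at each point (top of step)
Fn_left  = arrayfun(@(t) mean(X <  t), X);   % ECDF just below each point

d_top = max(abs(F - Fn_right));                    % top of each step only
D     = max(max(Fn_right - F), max(F - Fn_left));  % both sides of each step

printf("top of step only: %.5f, both sides: %.5f\n", d_top, D);
```

Here d_top reproduces the value above, while D can only be equal or larger. Using mean(X <= t) and mean(X < t) keeps the comparison correct for the tied values in the data (6.95 and 7.25 each occur twice).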

This statistic can vary a great deal depending on the
assumed population parameters.  For example, using
the actual sample parameters:

assumed_pop=normcdf(X,mean(X),std(X))
ks=max(abs(assumed_pop-Xecdf))
ks =  0.11502

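Consulting a handbook containing tables of the KS statistic,
in both cases the ks value is less than the tabulated two-sided
critical values for n=15, which are 0.304 at the 0.10 level and
0.338 at the 0.05 level.  This indicates that the null hypothesis,
which is that the sample was drawn from the assumed normal
distribution, should not be rejected.  Strictly, these tables apply
only when the parameters are not estimated from the sample.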

For some reason, the octave built-in KS test does not give the
same statistic as does the manual method outlined above:

[pval,ks]=kolmogorov_smirnov_test(X,"normal",mean(X),std(X))
pval =  0.30502
ks =  0.96875

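Without examining the octave code, I can't say for certain what accounts
for the discrepancy, but the numbers point to two things.  First, the ks
that octave returns looks like sqrt(n) times the usual statistic: for
your original call, 0.54297/sqrt(15) = 0.1402, which is the D that R
reports below.  Second, your original call passed var(X) and got the
same p-value as R, so the 'normal' CDF used by the test apparently takes
the variance, and passing std(X) here describes a different (wider)
distribution.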
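
As a rough check on the built-in's output, the sketch below computes the two-sided distance by hand with the sample mean and standard deviation, scales it by sqrt(n), and converts it to a p-value with the asymptotic Kolmogorov series. Both the sqrt(n) scaling and the use of the asymptotic series are assumptions inferred from the numbers in this thread, not taken from the octave source. The helper phi is my own stand-in for normcdf.

```octave
% Data from the thread, sorted.
X = sort([7.11, 6.73, 6.95, 7.25, 7.25, 7.03, 7.10, 7.15, 6.78, 7.09, ...
          7.37, 7.22, 6.82, 6.72, 6.95]);
n = numel(X);

% Normal CDF via erfc, parameterized by mean and standard deviation.
phi = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

F        = phi(X, mean(X), std(X));
Fn_right = arrayfun(@(t) mean(X <= t), X);   % ECDF at each point
Fn_left  = arrayfun(@(t) mean(X <  t), X);   % ECDF just below each point

D         = max(max(Fn_right - F), max(F - Fn_left));  % two-sided distance
ks_scaled = sqrt(n) * D;                               % assumed scaling

% Asymptotic two-sided p-value: 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 t^2)
k    = 1:100;
pval = 2 * sum((-1).^(k - 1) .* exp(-2 * k.^2 * ks_scaled^2));

printf("D = %.4f, sqrt(n)*D = %.5f, asymptotic p = %.4f\n", D, ks_scaled, pval);
```

If the assumptions hold, the scaled statistic and p-value should land close to the ks = 0.54297 and pval = 0.92970 quoted at the top of this message.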

For completeness, using the R statistical package, we get:

X <- c(7.11, 6.73, 6.95, 7.25, 7.25, 7.03, 7.10, 7.15, 6.78, 7.09, 7.37, 7.22, 
6.82, 6.72, 6.95)

ks.test(X,"pnorm",mean(X),sd(X))
        One-sample Kolmogorov-Smirnov test

data:  X 
D = 0.1402, p-value = 0.9297
alternative hypothesis: two-sided

AK


