help-octave
Re: Kolmogorov-Smirnov test 2


From: Tommy McCann
Subject: Re: Kolmogorov-Smirnov test 2
Date: Thu, 27 Jun 2019 14:35:51 -0700

The version of Octave I'm using is 4.2.0, and it looks like the statistics package is 1.3.0.

True, in the original thread the warning only occurred during the first run, and her primary question was why the p-value wasn't consistent. I got the warning because I was comparing two CDFs instead of two data sets. Here is a script similar to the one I used to troubleshoot.

%%begin script to test KS error...

%create random vectors
d1 = rand(100,1);
d2 = rand(100,1);

x = 0:0.05:1;

%create cdfs to visualize kolmogorov_smirnov_test results
d1_cdf = empirical_cdf(x,d1);
d2_cdf = empirical_cdf(x,d2);

%plot cdfs
figure(1);
plot(x,d1_cdf,'-',x,d2_cdf,'-');

% run KS-test
disp('first test, without duplicates, does not produce a warning:');
P = kolmogorov_smirnov_test_2(d1, d2)

%add duplicate value to vectors then re-run test
disp('second test, containing a duplicate, produces a warning:');
next = length(d1)+1;
d1(next) = 0.5;
d2(next) = 0.5;

P = kolmogorov_smirnov_test_2(d1, d2)

%attempt to use cdfs instead of raw data as specified in help info
disp('using cdf produces warning and incorrect results:');
P = kolmogorov_smirnov_test_2(d1_cdf, d2_cdf)
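
A rough way to see why the CDF inputs trigger the warning is to count the values the two inputs have in common. The sketch below uses only core functions. The broadcasting-based CDF is a stand-in I am assuming behaves like empirical_cdf(x, data) on this grid; it is not the package's implementation, and the variable names raw_ties and cdf_ties are made up for illustration.

```octave
% Sketch only: core Octave, no statistics package required.
rand ("state", 1);            % fixed seed so the run is repeatable
d1 = rand (100, 1);
d2 = rand (100, 1);
x  = 0:0.05:1;

% Empirical CDF on the grid x, computed by hand via broadcasting
% (assumed stand-in for empirical_cdf (x, d) in the script above).
d1_cdf = sum (d1 <= x, 1) / numel (d1);
d2_cdf = sum (d2 <= x, 1) / numel (d2);

% A tie is a value that occurs in both inputs.
raw_ties = intersect (d1, d2);
cdf_ties = intersect (d1_cdf, d2_cdf);

printf ("shared values in raw data:    %d\n", numel (raw_ties));
printf ("shared values in CDF vectors: %d\n", numel (cdf_ties));
```

Both CDF vectors start at 0 (no sample lies at or below x = 0) and take values that are multiples of 1/100, so they are bound to share values, whereas two continuous random samples almost never do.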

On Thu, Jun 27, 2019 at 1:09 AM Kai Torben Ohlhus <address@hidden> wrote:
On 6/22/19 8:15 AM, tmac017 wrote:
> I was trying to use the kolmogorov_smirnov_test_2 and I got this error
>
> warning: kolmogorov_smirnov_test_2: cannot compute correct p-values with
> ties
> warning: called from
>     kolmogorov_smirnov_test_2 at line 79 column 5
>
> I saw there was another thread about this, but it didn't answer the question
> and that thread is closed.  Since I spent some time looking at the code, I'm
> re-posting.
>
> The warning means that some values occur in both sets (ties). This is a
> problem because the code sorts the pooled values from both sets, and tied
> values can't occupy distinct places in an ordered series. To avoid an
> error caused by the sorting, the function deletes the D value at that
> point.  I don't think this should cause any problems, but it still prints
> a warning.
>
> I got this warning because I was using the function empirical_cdf to
> generate a CDF for each data set along the same range, since the HELP info
> said the function required CDF inputs.  Based on the code, it seems the
> function takes in two data sets, not CDFs. Because the CDF vectors differ
> in size from the original sets, this distorts the results.
>
> Note: in the other thread Hamish was having a hard time using the KS test for
> a = randn(2000,1);
> b = randn(2000,1);
> p = kolmogorov_smirnov_test_2(a,b)
>
> She got the same warning and the results weren't consistent. Ironically,
> this is BECAUSE of the large set size.  The test statistic is
> sqrt (n_x * n_y / (n_x + n_y)) * d.  Since the samples were randomly
> generated, some deviation was expected; the large sample size made the
> test more sensitive to deviation, and increasing the sample size made it
> even more sensitive.
>
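
For reference, the statistic quoted above can be computed by hand on a small example. This is a minimal sketch using only core functions and made-up sample values; it is not the code of kolmogorov_smirnov_test_2, and it does not compute a p-value.

```octave
% Sketch only: hand-rolled two-sample KS statistic on toy data.
x = [0.1; 0.4; 0.7];          % illustrative sample 1
y = [0.2; 0.5; 0.6; 0.9];     % illustrative sample 2
n_x = numel (x);
n_y = numel (y);

% Evaluate both empirical CDFs at every observed value.
z  = sort ([x; y]);
Fx = sum (x <= z', 1) / n_x;
Fy = sum (y <= z', 1) / n_y;

% d is the largest vertical gap between the two empirical CDFs.
d = max (abs (Fx - Fy));

% Scaled statistic, using the formula quoted in the message above.
ks = sqrt (n_x * n_y / (n_x + n_y)) * d;
printf ("d = %g, ks = %g\n", d, ks);
```

For a fixed gap d, the scaled value grows with the sample sizes: with n_x = n_y = n the factor is sqrt (n / 2).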
Could you please tell us the version of Octave and the version of the
statistics package you are using?  In Octave 4.4.0, many statistics
functions moved to the statistics package of Octave Forge [1].

Additionally, it would be nice to provide a reproducible test for this
warning message.  The example of Hamish from 2005 [2]

   N = 1e6; while 1, a = randn(N,1); b = randn(N,1); p = kolmogorov_smirnov_test_2(a,b), endwhile

did not throw the warning you described, for either N=2000 or N=1e6, in
5 minutes of running.

Best,
Kai


[1] https://octave.sourceforge.io/statistics/NEWS.html
[2] https://lists.gnu.org/archive/html/help-octave/2005-11/msg00232.html
