help-octave

Re: Kolmogorov-Smirnov test 2


From: Kai Torben Ohlhus
Subject: Re: Kolmogorov-Smirnov test 2
Date: Thu, 27 Jun 2019 17:09:38 +0900

On 6/22/19 8:15 AM, tmac017 wrote:
> I was trying to use the kolmogorov_smirnov_test_2 and I got this error
> 
> warning: kolmogorov_smirnov_test_2: cannot compute correct p-values with
> ties
> warning: called from
>     kolmogorov_smirnov_test_2 at line 79 column 5
> 
> I saw there was another thread about this but it didn't answer the question
> and that thread is closed.  Since I spent some time looking at the code I'm
> re-posting. 
> 
> The warning means that some values in each set are exactly the same. The
> reason this is a problem is because the code sorts the values from both sets
> and the sorted values can't occupy the same place in an ordered series. In
> order to avoid an error caused by the sorting the function deletes the D
> value at that point.  I don't think this should cause any problems but it
> still prints a warning. 
> 
> The reason I got this error is because I was using the function
> empirical_cdf to generate a cdf for each data set along the same range
> because the HELP info said the function required cdf inputs.  Based on the
> code it seems like the function takes in two data sets not CDFs. Because
> CDFs alter the size of the set it messes with the results. 
> 
> Note: in the other thread Hamish was having a hard time using the KS-test
> for 
> a = randn(2000,1); 
> b = randn(2000,1); 
> p = kolmogorov_smirnov_test_2(a,b) 
> 
> she got the same error and the results weren't consistent. This is
> ironically BECAUSE of the large set size.  The test statistic is sqrt (n_x *
> n_y / (n_x + n_y)) * d.  Since the curves were randomly generated, some
> deviation was expected; the large sample size made the test more sensitive
> to deviation, and increasing the sample size just made the test even more
> sensitive. 
> 
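For reference, the scaled statistic quoted above can be sketched in a few
lines.  This is only a minimal illustration on made-up data without ties;
the variable names are my own, and it is not necessarily how
kolmogorov_smirnov_test_2 is implemented in your version.

```octave
# Sketch: two-sample KS statistic from raw samples (no ties assumed).
x = [0.1; 0.4; 0.7];          # illustrative data, not from the thread
y = [0.2; 0.5; 0.8; 0.9];
n_x = numel (x);
n_y = numel (y);

# Sort the pooled sample; step up by 1/n_x at each x value and
# down by 1/n_y at each y value.
[s, idx] = sort ([x; y]);
steps = zeros (n_x + n_y, 1);
steps(idx <= n_x) = 1 / n_x;
steps(idx > n_x) = -1 / n_y;

# Largest absolute difference between the two empirical CDFs.
d = max (abs (cumsum (steps)));

# Scaled statistic: sqrt (n_x * n_y / (n_x + n_y)) * d
ks = sqrt (n_x * n_y / (n_x + n_y)) * d;
```

Note that the inputs here are the raw samples, not values of an
empirical CDF.
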
Could you please tell us which version of Octave and which version of the
statistics package you are using?  In Octave 4.4.0, many statistics
functions were moved to the statistics package of Octave Forge [1].

Additionally, it would be nice if you could provide a reproducible test
for this warning message.  The example of Hamish from 2005 [2]


   N = 1e6; while 1, a = randn(N,1); b = randn(N,1); p = kolmogorov_smirnov_test_2(a,b), endwhile

did not throw the warning you described, for either N=2000 or N=1e6, when
run for 5 minutes.
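
A deterministic test case might look like the sketch below.  It assumes,
following your reading of the code, that the warning is raised whenever the
sorted pooled sample contains repeated values; the data are made up, and
I have not checked this against every version of the function.

```octave
# Sketch: data recorded to a fixed precision, so values repeat.
a = [1; 2; 3; 4];             # illustrative values only
b = [2; 3; 5; 6];

# Ties show up as zero differences in the sorted pooled sample.
pooled = sort ([a; b]);
n_ties = sum (diff (pooled) == 0);
has_ties = n_ties > 0;

# With the statistics package loaded, the call below is expected
# (assumption) to print the "ties" warning for this input:
# p = kolmogorov_smirnov_test_2 (a, b)
```

Continuous draws such as randn (N,1) will almost never contain exact
repeats, which would be consistent with the loop above staying silent.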

Best,
Kai


[1] https://octave.sourceforge.io/statistics/NEWS.html
[2] https://lists.gnu.org/archive/html/help-octave/2005-11/msg00232.html


