Re: Statistical test for equality ?

Hi all,

thanks for all the replies.

It seems that statistical tests always revolve around distributions and parameters. They are very well suited to prove that two samples are different. But they only give hints as to whether samples are equal.

For example, you check whether the outcomes of two coins follow the same distribution, or whether two samples have the same mean, etc.

I put some independent data into a simulation-model and calculate a result. My input-data is not arbitrary, it has been observed, for example in a physical experiment. Also the results of the experiment have been observed.

The simulation-model should then output the same result as the experiment for the same input-data, otherwise the model is not correct. (Obviously in real-life experiments the measurements are never exact so even a perfect simulation-model will never exactly match the observed values.)

Assume a near-perfect model. Then the model results (x1) will be very close to the observed results (x2). In that case none of the previously mentioned statistical tests will reject the hypothesis that the two samples are equal.

A bit more formally:
- x1==x2 implies mean(x1)==mean(x2)
and
- x1==x2 implies distribution(x1)==distribution(x2), whatever the distribution may be.

However, the reverse conclusion is not necessarily correct. If mean(x1)==mean(x2), then maybe x1==x2, or maybe x1 is completely unrelated to x2 except for the equal means. The same applies to distributions: equal distribution parameters may or may not mean that x1==x2.

So statistical tests will show if my model generates data that looks similar to the original because it has equal means and equal distribution, but the test will not show if my model actually duplicates the observed reality.

It is easy to construct data sets that have equal means and equal distributions and are even highly correlated, and that still show non-trivial differences within the pairs, even though no test rejects the null hypothesis.
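A minimal Python sketch of such a construction, using made-up data (the values 1..100 and the helper `pearson` are purely illustrative). x2 is x1 cyclically shifted by one position, so both hold exactly the same values and therefore share mean and distribution, yet one pair differs by almost the full range:

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Illustrative data only: x2 is x1 shifted cyclically by one position.
x1 = list(range(1, 101))
x2 = x1[1:] + x1[:1]

same_mean = mean(x1) == mean(x2)              # identical values, identical mean
same_distribution = sorted(x1) == sorted(x2)  # x2 is a permutation of x1
r = pearson(x1, x2)                           # still strongly correlated
max_pair_diff = max(abs(u - v) for u, v in zip(x1, x2))

print(same_mean, same_distribution, round(r, 3), max_pair_diff)
```

Any test that looks only at means or distributions cannot tell x1 and x2 apart here, although the last pair is (100, 1).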

"Equality" in this context means that each related pair of observations should be equal. Or rather: equal enough, not too unequal, whatever "too unequal" may mean in this context.
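One way to write this pairwise notion down is sketched below in Python. The function name `fraction_within`, the data, and above all the tolerance `tol` are hypothetical; choosing a defensible tolerance (for example from the known measurement uncertainty) is exactly the open part of the question.

```python
def fraction_within(model, observed, tol):
    """Fraction of pairs whose absolute difference is at most tol."""
    if len(model) != len(observed):
        raise ValueError("samples must be paired, so lengths must match")
    hits = sum(1 for m, o in zip(model, observed) if abs(m - o) <= tol)
    return hits / len(model)


# Made-up paired values: model output versus observed result.
model = [10.0, 20.0, 30.0, 40.0]
observed = [10.5, 19.0, 30.0, 45.0]

tol = 1.0  # hypothetical tolerance, not derived from any real experiment
frac = fraction_within(model, observed, tol)
worst = max(abs(m - o) for m, o in zip(model, observed))

print(frac, worst)
```

The check works on the pairs themselves, so it is sensitive to which observation belongs to which model result, unlike a comparison of means or distributions.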

The usual statistical tests will only check if two samples are from the same target population, never if the same objects have been chosen for both samples.

2013/12/28 louis scott <address@hidden>

Nir replies with max(abs(x1-x2)); more formally, this is the Kolmogorov-Smirnov distance.

Yes, some kind of distance might be the answer. I already commented on rms(), see mail from Dec 24th. The same applies for the Kolmogorov-Smirnov distance. I can easily calculate distances but the question remains: how big can the distance get before the samples are not equal any more?
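For illustration, a small Python sketch of the two distances on made-up pairwise differences d = x1 - x2 (the helper names `rms` and `max_abs` are just for this example). Both difference vectors have the same rms, but they differ in their largest deviation, so the two measures describe different kinds of mismatch, and neither comes with a built-in threshold:

```python
from math import sqrt


def rms(d):
    """Root mean square of the pairwise differences."""
    return sqrt(sum(v * v for v in d) / len(d))


def max_abs(d):
    """Largest absolute pairwise difference, i.e. max(abs(x1-x2))."""
    return max(abs(v) for v in d)


# Made-up differences x1 - x2 for two hypothetical models.
diff_a = [1.0, 1.0, 1.0, 1.0]  # every pair is off by a little
diff_b = [0.0, 0.0, 0.0, 2.0]  # three exact pairs, one larger miss

print(rms(diff_a), max_abs(diff_a))
print(rms(diff_b), max_abs(diff_b))
```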

Kolmogorov-Smirnov tests seem rather sensitive. So far I have not found any real-world sample that kolmogorov_smirnov_test(x,"norm") accepted as normally distributed, no matter how "normal" hist() and normplot() look for that sample.
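One possible explanation worth checking, stated here as an assumption and not as a confirmed diagnosis: if the call with "norm" and no further parameters compares the sample against the standard normal N(0,1), then any unstandardized sample is rejected regardless of its shape. The Python sketch below illustrates the effect with simulated data. The helper `ks_statistic` is a hand-written one-sample Kolmogorov-Smirnov distance, not the Octave function, and the mean 50 and standard deviation 5 are arbitrary.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """One-sample KS distance: largest gap between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [rng.gauss(50.0, 5.0) for _ in range(200)]  # simulated, clearly normal

# Distance to the standard normal N(0,1): close to 1, since all values are far above 0.
d_standard = ks_statistic(sample, NormalDist(0.0, 1.0).cdf)

# Distance to a normal with the sample's own mean and standard deviation.
d_fitted = ks_statistic(sample, NormalDist(mean(sample), stdev(sample)).cdf)

print(d_standard, d_fitted)
```

Note that estimating mean and standard deviation from the same sample makes the usual Kolmogorov-Smirnov critical values too lenient, so the fitted comparison is only a rough indication.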

If you are ok with assuming gaussian process and only care for the difference in means,

Gaussian is fine, difference in means is not helping...

THX

stn

From:	stn021
Subject:	Re: Statistical test for equality ?
Date:	Fri, 3 Jan 2014 10:55:44 +0100