help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Problems with Statistics package


From: Hannes
Subject: Re: Problems with Statistics package
Date: Wed, 30 May 2012 11:06:42 +0200
User-agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4)

Ok, I think I got it pinned downed and solved. I have attached a patch.

Heres the explenation:

  total_mean = mean (y(:));
  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);

is always positive [sum of squares between]

  SST = sumsq (reshape (y, n, 1) - total_mean);

is always positive and >= SSB [sum of squares total]

  SSW = SST - SSB;

=> should always be >=0
but it is an ill conditioned operation if SST==SSB resulting in possibly negative error. Note that if the error is positive, it doesn't matter because it will produce f == Inf and p = 0 as expected.

  df_b = k - 1;
  df_w = n - k;
  v_b = SSB / df_b;
  v_w = SSW / df_w;

this can become negative, if the SSW is negative or 0 if SSW is zero or close to zero.


  f = v_b / v_w;

this results in divide by zero if v_w is zero and results in *wrong result without warning* if v_w is negative.


My patch changes this whole block to:

  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
  SST = sumsq (reshape (y, n, 1) - total_mean);
  SSW = SST - SSB;
  if (SSW < 0) # machine error if SSB == SSW
    SSW = 0;
  endif
  df_b = k - 1;
  df_w = n - k;
  v_b = SSB / df_b;
  v_w = SSW / df_w;
  if (v_w != 0)
    f = v_b / v_w;
    pval = 1 - fcdf (f, df_b, df_w);
  elseif (v_b == 0)
    f = NaN;
    pval = 1;

0 / 0 is  NaN and the hypothesis mus be rejected

  else
    f = Inf;
    pval = 0;

x / 0 is Inf and hypothesis must be accepted (because there is no variance inside the sample but there is outside

  endif

HTH,
Hannes

Attachment: patch_anova.m
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]