From: | Julien Bect |
Subject: | Re: [Help-gsl] Covariance calculation in gsl |
Date: | Thu, 13 May 2010 09:30:34 +0200 |
User-agent: | Mozilla/5.0 (X11; U; Linux i686 (x86_64); fr; rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0 |
On 13/05/2010 08:35, Srimal Jayawardena wrote:
> double gsl_stats_covariance (const double data1[], const size_t stride1,
>                              const double data2[], const size_t stride2,
>                              const size_t n)
>
> http://www.gnu.org/software/gsl/manual/html_node/Covariance.html
>
> covar = (1/(n - 1)) \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)
>
> Is there any particular reason for dividing by (n-1) instead of just 'n'?
> What's the reasoning behind this?
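Dividing by (n-1) makes "covar" an *unbiased* estimate of the population covariance.
This is no longer true if you divide by n instead: the two means are themselves estimated from the same n samples, so (for independent samples) the expected value of the result is (n-1)/n times the population covariance. See, for instance, http://en.wikipedia.org/wiki/Estimation_of_covariance_matrices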
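
To see the two divisors side by side, here is a minimal sketch in plain C. The helper covariance_ddof and its ddof parameter are hypothetical names invented for this illustration, not part of GSL; with ddof = 1 it follows the same (n-1) formula quoted above for contiguous data, and with ddof = 0 it divides by n.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GSL function): covariance of x and y about
   their sample means, dividing the sum of products by (n - ddof).
   ddof = 1 gives the unbiased estimate, ddof = 0 divides by n. */
double covariance_ddof(const double x[], const double y[],
                       size_t n, size_t ddof)
{
    assert(n > ddof); /* otherwise the divisor would be zero or wrap around */

    /* Sample means of both series. */
    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Sum of products of deviations from the means. */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (x[i] - mean_x) * (y[i] - mean_y);
    }

    return sum / (double)(n - ddof);
}
```

For x = {1, 2, 3, 4} and y = {2, 4, 6, 8}, the sum of products is 10, so ddof = 1 gives 10/3 and ddof = 0 gives 2.5. The two results always differ by the factor (n-1)/n, which matters for small n and becomes negligible as n grows.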