[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
gsl_multifit_linear: chisq is not residual sum of squares when X is not
From: |
Lijun WANG |
Subject: |
gsl_multifit_linear: chisq is not residual sum of squares when X is not full column rank |
Date: |
Sun, 4 Sep 2022 22:34:27 +0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 |
Hi,
Recently, I am using `gsl_multifit_linear` to regress a response vector
y on a design matrix X. I found that `chisq` is not the sum of squares
of the residuals when X is not full column rank. In other words, it is
different from the residual sum of squares (RSS) calculated based on the
returned coefficients, gif.latex (81×22).
In the related source code, "gsl-2.7/multifit/linear_common.c", on line
192, it assigns the norm calculated on line 118. However, if X is not
full column rank, the modified SVD sets alpha = 0 on line 172 to discard
the zero singular value, RSS is no longer the one || y - U U^T y ||
commented on line 113. Instead, it should be || y - U_* U_*^T y ||,
where U_* is constructed by U after discarding the columns corresponding
to zero singular values.
If you are familiar with R language, I can illustrate the idea with a
toy example,
> set.seed(1234)
> x = matrix(rnorm(10*2), 10, 2)
> y = x[,1] + rnorm(10) * 0.1
> x2 = cbind(x, 0)
> X = cbind(1, x2)
> sv = svd(X)
> # chisq in gsl_linear_multifit
> norm(y - sv$u %*% t(sv$u) %*% y, type = "2")^2
[1] 0.02189036
> # residual sum of squares since there is a zero singular value
> norm(y - sv$u[,1:3] %*% t(sv$u[,1:3]) %*% y, type = "2")^2
[1] 0.02193989
> # calculate via R
> sum(lm(y ~ x2)$residuals^2)
[1] 0.02193989
which shows that the current gsl_linear_multifit returns a different RSS
compared to R language (0.02189036 vs 0.02193989), but the results
(0.02193989 vs 0.02193989) agree after discarding the columns of zero
singular values.
So `rho_ls` on line 192 should be recalculated after discarding the
columns of U for zero singular values. Or a more direct way is to
calculate `rho_ls` with the estimated coefficient `c`,
/// Line 191
gsl_vector_memcpy(&t.vector, y);
gsl_blas_dgemv(CblasNoTrans, -1.0, X, c, 1.0,
&t.vector);
rho_ls = gsl_blas_dnrm2(&t.vector);
/// Line 192
What do you think?
Best,
Lijun
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- gsl_multifit_linear: chisq is not residual sum of squares when X is not full column rank,
Lijun WANG <=