bug-gnu-pspp

Re: PSPP-BUG: Logistic Regression bugs


From: John Darrington
Subject: Re: PSPP-BUG: Logistic Regression bugs
Date: Wed, 14 Nov 2012 10:00:50 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Tue, Nov 13, 2012 at 08:28:31PM -0500, Renan Levine wrote:
     Dear Mr. Darrington,

Please call me John :) - except on formal occasions, when I enjoy Dr. 
Darrington.
     
     The problem with the error message only concerns dichotomous
     dependent variables, not predictor variables. Missing values on
     the predictor variables do not pose any problems. Cases with
     missing values on any independent variables are dropped just like
     when completing OLS regressions.

Yes.  Currently PSPP drops cases with missing values on any 
independent variable.
     
     I think unequivocally that what the routine needs to do is to
     ignore all missing values and just focus on the non-missing
     categories. For example, STATA's manual says:  logit fits a
     maximum-likelihood logit model.  depvar=0 indicates a negative
     outcome; depvar!=0 & depvar!=. (typically depvar=1) indicate a
     positive outcome.

So you are suggesting dropping cases with a missing dependent variable too?
That would seem reasonable.
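
To make that rule concrete, here is a minimal sketch of listwise deletion
that also covers the dependent variable. It is not PSPP source code; the
names MISSING and complete_cases, and the sample data, are made up for
illustration.

```python
# Hypothetical sketch, not PSPP source: a case is dropped when the
# dependent variable OR any predictor holds a missing value.
MISSING = None  # stand-in for a system-missing value


def complete_cases(cases, dependent, predictors):
    """Return only the cases with no missing value in the listed variables."""
    needed = [dependent] + list(predictors)
    return [c for c in cases if all(c[v] is not MISSING for v in needed)]


cases = [
    {"y": 1, "x1": 2.0, "x2": 0.5},
    {"y": MISSING, "x1": 1.0, "x2": 0.3},  # missing dependent
    {"y": 0, "x1": MISSING, "x2": 0.9},    # missing predictor
    {"y": 0, "x1": 3.5, "x2": 1.1},
]

kept = complete_cases(cases, "y", ["x1", "x2"])
print(len(kept), "of", len(cases), "cases kept")
```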
     
     The way I understand that SPSS statement (if it's not a typo) is
     that the SPSS routine will generate a predicted value for any
     observations with a missing value on the dependent variable,
     assuming that none of the independent variables contain any
     missing values for that observation. This is one way that some
     use maximum likelihood techniques to impute missing values.
     
Yes, that is what it seems to be saying.  The question which arises is,
HOW does it generate the predicted value?  The only reasonable way I 
can think of would be to calculate it from the coefficients of the 
predictors --- but we don't know them a priori (the very purpose of
logistic regression is to find them).  Of course, it is possible to
run the procedure ignoring the cases with missing dependents, then 
impute the values from the calculated coefficients, and run the procedure
again, this time including the cases with imputed values.

However, that would yield exactly the same results, except for slightly
better (misleadingly better) confidence values.  So doing that doesn't
make much sense.  Hence my confusion.
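
The two-pass idea can be checked with a small sketch. It assumes the
imputed value is the fitted probability itself (a fractional response),
not a rounded 0/1 class, and it uses a hand-written Newton-Raphson fit
with one predictor. The function names and the data are invented for
illustration; none of this is PSPP or SPSS code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, iterations=50):
    """Newton-Raphson fit of P(y=1) = sigmoid(b0 + b1*x).

    ys may be fractional.  Returns (b0, b1, se_b0, se_b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations + 1):
        # Score vector (g) and information matrix (h) at the current estimate.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        if abs(d0) + abs(d1) < 1e-12:
            break
        b0 += d0
        b1 += d1
    # Standard errors from the diagonal of the inverse information matrix.
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


# Made-up data: eight complete cases, three cases with a missing dependent.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_obs = [0, 0, 1, 0, 1, 1, 0, 1]
x_missing_y = [1.5, 3.5, 5.5]

# Pass 1: ignore the cases with a missing dependent.
b0, b1, se0, se1 = fit_logit(x_obs, y_obs)

# Pass 2: impute the fitted probabilities and refit on all eleven cases.
y_imputed = [sigmoid(b0 + b1 * x) for x in x_missing_y]
r0, r1, rse0, rse1 = fit_logit(x_obs + x_missing_y, y_obs + y_imputed)

print("pass 1: b1 = %.6f, se = %.6f" % (b1, se1))
print("pass 2: b1 = %.6f, se = %.6f" % (r1, rse1))
```

Under that assumption the imputed cases contribute nothing to the score
at the first-pass estimate, so the coefficients come out the same, while
the information matrix grows and the standard errors shrink.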


If you have access to SPSS, perhaps you could try some experiments for me?
Can you see whether SPSS simply drops cases with a missing value on the
dependent variable?  Or does it treat them all as 0, or as 1, or something
else?

Thanks for your help.

John


-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.


