PSPP-BUG: Logistic Regression bugs

From:

Renan Levine

Subject:

Date:

Wed, 14 Nov 2012 13:29:54 -0500

User-agent:

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121010 Thunderbird/16.0.1

Dear John:

I did not have access to SPSS, but one of my grad students did us a favour... As a result, I can confirm that missing cases on the dependent variable are dropped from the logistic regression analysis by SPSS.

She writes:
See attached. Originally, the DV had 32.8% system missing cases and .4% DK (which I declared missing). In the first test, these missing cases are dropped. In the second and third tests, I coded the 32.8% missing cases as 0 and 1 respectively, and still declared the .4% DK as missing. This does indeed produce different Ns and coefficients.

As for how SPSS and other packages estimate a missing dependent variable value when there are no missing values among the independent variables: the fitted model yields a predicted Y, which after rounding gives 0 or 1. This link has a more extensive discussion, if you are interested: http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf
At this point, my concerns/interests are much more mundane. Having a simple logistic regression routine to use in the classroom is my primary ambition. Ideally, it would be nice if PSPP could replicate SPSS' classification table (a 2x2 table showing how well the model predicted actual responses and the percentage of observations that were correctly predicted). See http://www.ats.ucla.edu/stat/spss/dae/logit.htm for a brief exposition and annotated discussion of the analysis.
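
For concreteness, here is a minimal Python sketch of the kind of classification table described above. It is not PSPP or SPSS code. The function name, the use of None for a missing dependent variable, and the 0.5 cutoff are all assumptions made for illustration.

```python
def classification_table(actual, predicted_prob, cutoff=0.5):
    """Build a 2x2 table of actual vs. predicted outcomes.

    table[a][p] counts the cases whose actual value is a and whose
    predicted value is p. Cases with a missing actual value (None)
    are skipped. The cutoff of 0.5 is an assumption.
    """
    table = [[0, 0], [0, 0]]
    for a, prob in zip(actual, predicted_prob):
        if a is None:
            continue  # missing dependent variable: not classified
        pred = 1 if prob >= cutoff else 0
        table[a][pred] += 1
    n = sum(sum(row) for row in table)
    correct = table[0][0] + table[1][1]
    return table, 100.0 * correct / n


# Made-up example data: five observed cases and one missing case.
actual = [0, 0, 1, 1, 1, None]
probs = [0.2, 0.6, 0.7, 0.4, 0.9, 0.8]
table, pct_correct = classification_table(actual, probs)
print(table)        # [[1, 1], [1, 2]]
print(pct_correct)  # 60.0
```

Rows are the observed categories and columns the predicted ones, so the diagonal holds the correctly predicted cases.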

Yours,
Renan

-------- Original Message --------

Subject:	Re: PSPP-BUG: Logistic Regression bugs
Date:	Wed, 14 Nov 2012 10:00:50 +0100
From:	John Darrington <address@hidden>
To:	Renan Levine <address@hidden>
CC:	John Darrington <address@hidden>, address@hidden

On Tue, Nov 13, 2012 at 08:28:31PM -0500, Renan Levine wrote:
     Dear Mr. Darrington,

Please call me John :) - except on formal occasions, when I enjoy Dr. Darrington.
     
     The problem with the error message only concerns dichotomous
     dependent variables, not predictor variables. Missing values on
     the predictor variables do not pose any problems. Cases with
     missing values on any independent variables are dropped just like
     when completing OLS regressions.

Yes.  Currently PSPP drops cases with missing values on any 
independent variable.
     
     I think unequivocally that what the routine needs to do is to
     ignore all missing values and just focus on the non-missing
     categories. For example, STATA's manual says:  logit fits a
     maximum-likelihood logit model.  depvar=0 indicates a negative
     outcome; depvar!=0 & depvar!=. (typically depvar=1) indicate a
     positive outcome.

So you are suggesting dropping cases with a missing dependent variable too?
That would seem reasonable.
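
A minimal Python sketch of that rule, under stated assumptions: missing values are represented as None or NaN, any other non-zero value counts as a positive outcome (following the Stata description quoted above), and a case is dropped if its dependent variable or any predictor is missing. The function names are hypothetical and this is not PSPP code.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def code_outcome(depvar):
    """0 -> negative outcome, any other non-missing value -> positive."""
    if is_missing(depvar):
        return None
    return 0 if depvar == 0 else 1


def usable_cases(cases):
    """Keep only cases with no missing value anywhere.

    cases is a list of (depvar, predictors) pairs; the result holds
    (coded outcome, predictors) pairs for the cases that survive.
    """
    kept = []
    for depvar, predictors in cases:
        y = code_outcome(depvar)
        if y is None or any(is_missing(v) for v in predictors):
            continue  # listwise deletion
        kept.append((y, predictors))
    return kept


# Made-up example: the second and fourth cases are dropped.
cases = [(0, [1.2, 3.0]), (None, [0.5, 2.0]), (2, [0.1, 1.0]), (1, [None, 4.0])]
print(usable_cases(cases))  # [(0, [1.2, 3.0]), (1, [0.1, 1.0])]
```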
     
     The way I understand that SPSS statement (if it's not a typo) is
     that the SPSS routine will generate a predicted value for any
     observations with a missing value on the dependent variable,
     assuming that none of the  independent variables contain any
     missing values for that observation. This is one way that some
     use maximum likelihood techniques to impute missing values.
     
Yes, that is what it seems to be saying.  The question which arises is,
HOW does it generate the predicted value?  The only reasonable way I 
can think of would be to calculate it from the coefficients of the 
predictors --- but we don't know them a priori (the very purpose of
logistic regression is to find them).  Of course, it is possible to
run the procedure ignoring the cases with missing dependents, then 
impute the values from the calculated coefficients, and run the procedure
again, this time including the cases with imputed values.

However that would yield exactly the same results, except slightly
better (misleadingly better) confidence values.  So doing that doesn't
make much sense.  Hence my confusion.
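
The fit, impute, refit argument can be checked numerically. The following Python sketch makes several assumptions: one predictor plus an intercept, a plain Newton-Raphson fit, made-up data, and imputation by the fitted probability itself (not rounded to 0 or 1). It is an illustration only, not how PSPP or SPSS is implemented.

```python
import math


def score_and_info(b0, b1, xs, ys):
    """Score vector and information matrix of the logit log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def fit_logit(xs, ys, iters=25):
    """Newton-Raphson fit; returns (b0, b1, se0, se1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = score_and_info(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = score_and_info(b0, b1, xs, ys)
    det = h00 * h11 - h01 * h01
    # Standard errors from the diagonal of the inverse information matrix.
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


# Made-up data: cases with an observed 0/1 dependent variable ...
x_obs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 0.0, 1.0]
y_obs = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]
# ... and cases whose dependent variable is missing.
x_mis = [-1.5, -0.5, 0.5, 1.5, 2.0]

# Step 1: fit on the complete cases only.
b0, b1, se0, se1 = fit_logit(x_obs, y_obs)
# Step 2: impute each missing value as its fitted probability.
y_imp = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in x_mis]
# Step 3: refit with the imputed cases included.
b0_all, b1_all, se0_all, se1_all = fit_logit(x_obs + x_mis, y_obs + y_imp)

print("complete cases:", b0, b1, se0, se1)
print("with imputed  :", b0_all, b1_all, se0_all, se1_all)
```

Under these assumptions each imputed case adds nothing to the score at the first-pass estimates, so the coefficients come out the same, while the extra cases enlarge the information matrix and shrink the standard errors. If the imputed values were rounded to 0 or 1 instead, the coefficients would generally shift.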


If you have access to SPSS, perhaps you could try some experiments for me?
Can you see whether SPSS simply drops cases with a missing value on the dependent variable?
Or does it treat them all as 0, or as 1, or what ...

Thanks for your help.

John


-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.

Tests_Renan.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document