Re: Samples between PSPP and SPSS


From: Andy Choens
Subject: Re: Samples between PSPP and SPSS
Date: Wed, 21 Sep 2011 17:31:07 -0400
User-agent: KMail/4.7.1 (Linux/3.0.0-11-generic; KDE/4.7.1; x86_64; ; )

I ran a very basic test. I created a .sav file containing a single variable 
with the integers 1 through 250, in increments of 1. I can provide the file 
off-list if you'd like, but it's not hard to make.

I then fed this file to both SPSS 11 and PSPP 0.6.2 (the current version in the 
Ubuntu repos). I ran nearly the exact same syntax in both. My syntax was:

=====
set seed = 123456789123456789 .
sample .1 .
list .
show seed .
=====

The only difference is that in SPSS I forced it to use the Mersenne Twister RNG 
(SET RNG=MT) for consistency. The output differed greatly. The output in both 
cases also surprised me: I expected a sample of 25 cases (10% of 250), and 
neither program returned 25. PSPP returned 21 cases and SPSS returned 32.
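
One possible explanation for the sample sizes, offered as an assumption rather 
than something confirmed from either program's source: if SAMPLE with a 
proportion keeps each case independently with probability 0.1, the sample size 
is a binomial random variable with mean 25, not a fixed 25. A minimal Python 
sketch of that idea (the function name bernoulli_sample and the seed 12345 are 
made up for illustration and have nothing to do with PSPP or SPSS internals):

```python
import random

def bernoulli_sample(cases, proportion, rng):
    """Keep each case independently with probability `proportion`.

    Assumption: this mimics proportional sampling; it is not taken
    from the PSPP or SPSS implementation.
    """
    return [c for c in cases if rng.random() < proportion]

cases = list(range(1, 251))      # the integers 1..250, as in the test file
rng = random.Random(12345)       # arbitrary seed, for repeatability only

# Draw many samples and look at how the sample size varies.
sizes = [len(bernoulli_sample(cases, 0.1, rng)) for _ in range(1000)]
mean_size = sum(sizes) / len(sizes)

print("smallest:", min(sizes), "mean:", mean_size, "largest:", max(sizes))
```

Under this assumption the standard deviation of the sample size is 
sqrt(250 * 0.1 * 0.9), about 4.7, so sizes several cases above or below 25 
would be unremarkable.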

===============================================
PSPP
===============================================
sam_test
--------
      10 
      20 
      24 
      45 
      49 
      67 
      71 
      74 
      80 
      82 
      91 
     114 
     144 
     186 
     190 
     192 
     219 
     231 
     236 
     248 
     249 


===============================================
SPSS
===============================================
    9
    12
    20
    29
    32
    34
    43
    60
    70
    72
    79
    97
    117
    126
    138
    145
    146
    156
    159
    171
    173
    174
    175
    178
    179
    180
    181
    188
    199
    209
    231
    246

SEED = 1,929,887,249



PSPP failed to return the seed it used. Instead, it broke up its output in a 
funny way, but that is a bug for another day. At first blush, the results do 
not appear to be reproducible between the two systems.
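
To put a number on "differed greatly", the two listings above can be compared 
directly. This is a small Python sketch using only the values printed in the 
output; the variable names are arbitrary, and the "expected by chance" figure 
assumes two independent random samples of these sizes from 250 cases:

```python
# Case values copied from the PSPP and SPSS listings above.
pspp = {10, 20, 24, 45, 49, 67, 71, 74, 80, 82, 91, 114, 144, 186,
        190, 192, 219, 231, 236, 248, 249}
spss = {9, 12, 20, 29, 32, 34, 43, 60, 70, 72, 79, 97, 117, 126, 138,
        145, 146, 156, 159, 171, 173, 174, 175, 178, 179, 180, 181,
        188, 199, 209, 231, 246}

common = sorted(pspp & spss)

# Expected overlap if the two samples were drawn independently at random.
expected_by_chance = len(pspp) * len(spss) / 250

print(len(pspp), len(spss), common)   # 21 32 [20, 231]
print(expected_by_chance)             # 2.688
```

Only two cases appear in both samples, which is about what independent draws 
would produce by chance.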

My next attempt will be to run a similar test in R to see if it matches either 
SPSS or PSPP or neither one.
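
A caveat for any such comparison: sharing the Mersenne Twister algorithm does 
not by itself guarantee matching output, because the stream also depends on 
how the seed is expanded into the generator's state. The Python sketch below 
illustrates this with the reference MT19937 integer seeding routine 
(init_genrand) and Python's own random module, which is also MT19937 but seeds 
differently. How PSPP/GSL and SPSS seed their generators, and how they treat a 
seed too large for 32 bits such as 123456789123456789, is not established in 
this thread; the helper name mt19937_first_output is made up for illustration.

```python
import random

def mt19937_first_output(seed):
    """First 32-bit output of reference MT19937 seeded with init_genrand."""
    n, m = 624, 397
    mt = [0] * n
    mt[0] = seed & 0xFFFFFFFF            # only the low 32 bits are used here
    for i in range(1, n):
        mt[i] = (1812433253 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xFFFFFFFF

    # One full "twist" of the state, done in place.
    for k in range(n):
        y = (mt[k] & 0x80000000) | (mt[(k + 1) % n] & 0x7FFFFFFF)
        mt[k] = mt[(k + m) % n] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)

    # Temper the first state word to get the first output.
    y = mt[0]
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

seed = 5489                               # the reference implementation's default seed
ref_first = mt19937_first_output(seed)
py_first = random.Random(seed).getrandbits(32)

print(ref_first)                          # 3499211612
print(ref_first == py_first)
```

Even if the raw streams did match, the two programs would also have to turn 
random numbers into keep/drop decisions in the same way for the samples to 
agree.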

--andy

On Wednesday, September 21, 2011 06:29:01 PM you wrote:
> In SPSS, you can choose between the MC RNG and the Mersenne Twister
> using the commands:
> 
> SET RNG=MC.
> 
> or
> 
> SET RNG=MT.
> 
> 
> See the SPSS documentation for details.
> 
> 
> PSPP doesn't implement this.  (Perhaps we should?)  Like Jason says, we
> always use the Mersenne Twister.
> 
> I also notice that the SPSS docs say that the default seed is 2000000
> whereas we set it from the realtime clock.
> 
> I'd be interested to see some of your experiments to see what is necessary
> to make them match (if it's at all possible).  It would also be interesting
> to see how the random number distributions fare when analysed with some of
> the non-parametric tests.
> 
> J'
> 
> 
> 
> On Wed, Sep 21, 2011 at 12:50:47PM -0400, Jason Stover wrote:
> 
>      On Wed, Sep 21, 2011 at 10:12:39AM -0400, Andy Choens wrote:
>      > If I set a seed to a consistent value, say 123, and create a
>      > sample in PSPP will it match the sample created by SPSS?
>      I doubt it, but can't be sure because the source code to SPSS is kept
>      secret.
> 
>      > If someone
>      > knows which pseudorandom number generator is being used by PSPP and
>      > SPSS respectively that would also be a big help so I could
>      > replicate / confirm output independently.
> 
>      As of about 10 years ago, most of SPSS used a multiplicative
>      congruential random number generator. It had a period of either 2^31 -
>      1 or 2^32 - 1. They may still use such a generator, since changing it
>      would cause users' old syntax to give different answers.
> 
>      PSPP uses the Mersenne Twister, which has a period of 2^19937 - 1, as
>      implemented in GSL. You can see the code for it in src/math/random.c.
> 
>      -Jason
> 


