help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: problem with rand ?


From: Paul Kienzle
Subject: Re: problem with rand ?
Date: Wed, 28 Jul 2004 01:57:23 -0400

Dimitri,

The first thing to note is that I'm only using 32 bits of randomness, not 53.
I was wondering if that might be a problem.  Apparently it is.

With a slight change to the code (using rand53() rather than randu())
the following expression usually equals 0:

        length(find(diff(sort(rand(1e6,1)))==0))

Calculating the probability of at least one duplicate is the birthday paradox:

        1-p = m!/(m^n (m-n)!)

Using Stirling's approximation,

        log(1-p) ~ (m-n+1/2)(log m - log m-n) - n

For m=2^32, n=1e6, 1-p ~ 2.7e-51 => very high probability of duplicates. For m=2^53, n=1e6, 1-p ~ 0.99983 => very low probability of duplicates.
For m=2^53, n=6e7,  1-p ~ 0.55  => even chance of a duplicate.
For m=2^32, n=7.7e4, 1-p ~ 0.50 => even chances of a duplicate.

Unfortunately, my computer is not hefty enough to run this experiment
for n=6e7 (it takes 15 min/trial), but running 100 trials on n=7.7e4
yields a not unexpected 54 trials with no duplicates.  FWIW, the n=6e7
trial I did run yielded 0 duplicates.

The 53-bit code is some 35% slower than the 32-bit code.  Unless
there are objections, I will change to 53-bits.  Accuracy over speed.

- Paul


On Jul 27, 2004, at 11:46 PM, Dmitri A. Sergatskov wrote:

I found that random series returned by rand (from octave-forge) returns
a higher number of _identical_ values than I would expect:

octave:1> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)
ans = 1
octave:2> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = [](1x0)
octave:3> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = 60641
octave:4> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = [](1x0)
octave:5> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans =

  26998  27604

octave:6> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = 16034
octave:7> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = 23032
octave:8> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); find(dss1==0)
ans = 121716

etc...

I would think that for random 2^17 numbers probability of two identical
numbers would be something like 2^17 / 2^54 (assuming ieee 64 bit double
representation)

Matlab does not show this behavior even for larger series:

>> s1=rand(1,256*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

     0

>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

     0

>> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0)

ans =

     0

Am I missing something here?

Sincerely,

Dmitri.



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------




-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------



reply via email to

[Prev in Thread] Current Thread [Next in Thread]