octave-maintainers

Re: Problems with Octave random number generation, part 1


From: michaell
Subject: Re: Problems with Octave random number generation, part 1
Date: Tue, 17 Dec 2019 16:02:06 -0600 (CST)

First of all, let me note that whenever I do serious simulation programming
in C (serious meaning it finally winds up in peer-reviewed publications), I
always take the integer output of the PRNG directly and transform it to the
distribution I need, even if that is only a uniform floating-point one. The
reason is that I do not want to rely on such issues being implemented
correctly. In this sense, Octave's rand() is just a convenience function
that I use when I quickly try something out. Thus, for me, non-uniformities
at the level of 2^-32 are not essential, but of course we should try to be
correct as long as that is reasonably doable.


Rik-4 wrote
>  static float randu32 (void)
>  {
>    return (static_cast<float> (randi32 ()) + 0.5) * (1.0/4294967296.0);
>    /* divided by 2^32 */
>  }

Yes, that's a bug.


Rik-4 wrote
>   do
>     {
>       i = randi32 ();
>     }
>   while (i == 0 || i > 0xFFFFFF7F);  /* i.e. i > 2^32 - 129 */
> 
>   return i * (1.0 / 4294967296.0);

No, if that is really what the patch boils down to, that's not a problem in
my eyes.

Both Philipp's proposal and yours give uniformly distributed random numbers
as long as one looks at them on a scale coarser than eps. The difference is
that Philipp's uses all the representable floating-point numbers (at least
all above 2^-8, probably; if you used a randi32(), you would get even finer
resolution at small values), and as the discretization becomes finer at
small values, each individual number becomes less probable. But that is
okay, because if you look at how often a given interval is hit, it comes out
correctly. Actually, I would prefer his proposal, _specifically_ if you want
to pass the result through log() to generate exponentially distributed
numbers (but see below).

By the way, you should definitely drop the condition (i == 0xFFFFFF) in your
code; it makes you miss the last float value below 1.

But I am still not quite convinced by either of your proposals, because both
of you introduce some bias (and with 24-bit mantissas, this will be easily
visible). Let's assume randi32() is quite dumb and has a cycle length of
2^32, during which it returns every uint32 exactly once, and let's consider
the histogram of floats resulting from your approaches after one such cycle.
In the region around, say, 0.6, both proposals give the correct density of
2^32 per unit interval. However, in Philipp's proposal 2^32-128 values have
been distributed over the unit interval, while in your case it is only
2^32-256 (after you have dropped the (i == 0xFFFFFF) condition), because you
skip the zeros. So somewhere these numbers have to be missing. In your case
it is clear: they are missing from the endpoints of the interval
(symmetrically). In Philipp's case, they are missing at 0.5, 0.25, and so
on, every time the exponent decreases by one (because any number above 0.5
gets a range of plus and minus eps/2 around itself, while the interval
assigned to exactly 0.5 extends only eps/4 to the left).

I think that the best solution would be a simple

return (randi32 () + 0.5) * (1.0 / 4294967296.0);

but use one of the non-default IEEE-754 rounding modes, "round towards minus
infinity" or "round towards zero" (and verify that casting doubles to floats
honours the rounding mode, which I do believe it does). The vast majority of
computers that run Octave today will have no problem with that, but is there
a relevant architecture where this does not work?


Rik-4 wrote
> #define RANDU randu32()
> 
> and then RANDU is used in the algorithms for rand_normal and
> rnd_exponential.  In particular, the use occurs as the argument to the log
> function.  For example,
> 
> yy = - std::log (RANDU);

If this is really true, it is a massive problem, but fortunately one with an
easy solution: isn't there a randu64()? If not, one should construct a
double random number with all mantissa bits filled from two calls to
randu32(), and at the very least pass that through log. The point is that
2^32 is not so large: there are computers on which you can fill the RAM with
2^32 exponentially distributed random numbers in a few seconds. With your
proposal, Rik, the largest number you could ever get is 16.636; with
Philipp's it is 22.181. But you would actually expect the maximum of N
independent exponentially distributed random numbers to grow like log(N), so
already at 2^24 or 2^32 numbers you would start to see significant
deviations in the statistics of the maximum. If you use the full width of a
double, you need 2^53 numbers for that, which is not within the reach of
computers on which Octave runs today.




--
Sent from: https://octave.1599824.n4.nabble.com/Octave-Maintainers-f1638794.html


