octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Problems with Octave random number generation, part 1


From: Rik
Subject: Problems with Octave random number generation, part 1
Date: Tue, 17 Dec 2019 11:27:07 -0800

Maintainers,

I believe there are serious, but subtle, problems with Octave's random
number generation.  Resolving them, without introducing further issues,
requires careful thought and review beyond just a single coder (myself). 
Hence, I'm bringing this issue to the Maintainer's list.  I've also CC'ed
Jordi and Philipp Kutin who were active in initially discovering the bugs,
and Michael Leitner who was helpful in removing the bias from Octave's
randi function.

The tip of the iceberg was bug report #41742
(https://savannah.gnu.org/bugs/?41742).  In this report, using rand to
generate an integer in the range [1, a] using the straightforward code

x = round (rand * a + 0.5)

occasionally fails and returns a result of a+1 outside the desired range. 
This is only possible if rand should return exactly 1.  But, according to
the documentation, rand produces a floating point number in the range (0,
1), i.e., endpoints are excluded.  Unfortunately, it is quite simple to
demonstrate that the rand function does include the endpoint.  Test code
that I found that always reproduces the problem is

octave:1> rand ("state", 1);
octave:2> x = max (rand ([20e6, 1], "single"))
x = 1

This result is contrary to the documentation, and contrary to the behavior
of the rand function in Matlab.  Discussion between Jordi and Philipp
ensued at this thread
https://octave.1599824.n4.nabble.com/Re-Random-number-generation-quirk-td4662514.html.

The Octave code which generates a "single" (32-bit IEEE floating point)
random number is found in liboctave/numeric/randmtzig.cc

  /* generates a random number on (0,1)-real-interval */
  static float randu32 (void)
  {
    return (static_cast<float> (randi32 ()) + 0.5) * (1.0/4294967296.0);
    /* divided by 2^32 */
  }

The function randi32 (which I have no reason to doubt) generates an
unsigned 32-bit integer in the range [0, 2^32 - 1].  Unfortunately, I think
there was some confusion by the original programmer between Octave and C
implicit conversion rules.  In Octave, arguments are demoted to lower
precision before the operand is applied:

single + double = single

This is exactly reversed in C where arguments are promoted to the more
precise class.  In C,

single + double = double

So this code is actually performed completely with double precision values,
and then truncated to single precision float at the end of the function
because the return type for the function is float.  In Octave pseudo-code,

single ( (double (randi32 ()) + 0.5) * ( 1.0 / 4294967296.0) )

A useful Octave function for mimicking the C code is

fs = @(x) single ((floor (x) + 0.5) * (1.0 / 4294967296.0))

Using fs() one can determine the maximum valuable allowable in order for
the result not to become exactly equal to 1.

octave:28> fs (2^32-129)
ans = 0.9999999
octave:29> fs (2^32-128)
ans = 1

I assume Philipp did something similar.  He has provided a patch that,
after unpacking various macros, does the following

static float randu32 (void)
{
  uint32_t i;

  do
    {
      i = randi32 ();
    }
  while (i == 0 || i > (2^32 - 129);

  return i * (1.0 / 4294967296.0);
}

My fear, and more mathematical help in this specific would be nice, is that
this will introduce a bias.  In Octave, the new code can be modeled as

fs2 = @(x) single (floor (x) * (1.0 / 4294967296.0 ))

Trying some sample values

octave:78> fs2(1)
ans = 2.3283064e-10
octave:79> fs2(2)
ans = 4.6566129e-10
octave:80> fs2(3)
ans = 6.9849193e-10
octave:82> fs2(2^32-128)
ans = 1
octave:83> fs2(2^32-129)
ans = 0.9999999
octave:84> fs2(2^32-130)
ans = 0.9999999
octave:100> fs2(2^32-129) - fs2(2^32-383)
ans = 0

What one notices is that the small integer values all get mapped to
different floating point values, but at the upper end, large ranges of
different integer values get mapped to the same floating point value.

Given that IEEE single-precision floating point has only 24 bits for the
mantissa, I think something like this would work better

  static float randu32 (void)
  {
    uint32_t i;

    do
      {
        i = randi32 () & static_cast<uint32_t> (0xFFFFFF);
      }
    while (i == 0 || i == 0xFFFFFF);

    return i * (1.0f / 16777216.0f);
  }

In my testing, this does not produce exact values of 1, nor does it map
multiple integer values to the same floating point value.  Is anyone aware
of problems with this algorithm?

Resolving this must happen, but is only part of the issue.  There is a
#define in randmtzig.cc

#define RANDU randu32()

and then RANDU is used in the algorithms for rand_normal and
rnd_exponential.  In particular, the use occurs as the argument to the log
function.  For example,

yy = - std::log (RANDU);

The old RANDU = randu32() occasionally put out values of 1 which will be
mapped by the log function to 0.  Whereas any new function will always be
slightly less than one which means the log function output will be small,
but not exactly 0.  Do rand_exponential and rand_normal rely on the old
behavior in order to function correctly?  I have no idea.

--Rik







reply via email to

[Prev in Thread] Current Thread [Next in Thread]