help-gsl
[Top][All Lists]

## Re: why there are multiple functions for CDF of beta distribution?

 From: Martin Jansche Subject: Re: why there are multiple functions for CDF of beta distribution? Date: Fri, 10 Jan 2020 08:04:17 +0000

```Yup, this is a problem. When I compile and run your program gsl_invBeta.c
(which should #include <gsl/gsl_cdf.h>), I get:

Observed:
gsl: betainv.c:181:
<http://git.savannah.gnu.org/cgit/gsl.git/tree/cdf/betainv.c#n181> ERROR:
inverse failed to converge
Default GSL error handler invoked.
Abort trap: 6

Expected result:
1.0

Result with more precision:
0.99999999999999999999999999980178072518730245884823...

For the supplied values of alpha and beta, the CDF rises very steeply near
1 and the inverse CDF is equal to 1.0 within double precision when P>0.5
(that may not be a tight bound, but is a sufficient condition).

Generic workaround:

#include <stdio.h>
#include <gsl/gsl_sf_gamma.h>

// Computes the inverse function of gsl_sf_beta_inc by binary search.
double qbeta(double p, double a, double b) {
double x = 0.5;
for (double delta = 0.25; delta > 0; delta /= 2) {
const double beta_P = gsl_sf_beta_inc(a, b, x);
const double new_x = p > beta_P ? x + delta : x - delta;
if (beta_P == p || x == new_x) break;
x = new_x;
}
return x;
}

int main() {
double a = 5.3729222;
double b = 0.018122864;
double p = 0.67276126;

printf("%.18g\n", qbeta(p, a, b));

printf("%.7g (expected: 0.25)\n", qbeta(0.25, 1, 1));
printf("%.7g (expected: 0.9922861)\n", qbeta(0.25, 5, 0.1));
printf("%.7g (expected: 2.954261e-200)\n", qbeta(1e-50, 0.25, 0.5));

return 0;
}

Hope this helps,
-- mj

On Fri, Jan 10, 2020 at 1:30 AM Vasu Jaganath <address@hidden>
wrote:

> Martin and others,
>
> Following up, I have a particular example for which both Q and P variants
> of inverse functions fail.
>
> where as scipy beta.ppf (equivalent to gsl_cdf_beta_Pinv) does the right
> thing and converges to 1 or nearest double.
>
> I am attaching a very simple example with exact same data, the gsl
> function fails where as scipy function does the right thing.
>
> Any thoughts? any workarounds? Maybe there is a way I can specify
> convergence criteria?
>
> Thanks,
> Vasu
>
>
> On Wed, Jan 8, 2020 at 12:42 PM Vasu Jaganath <address@hidden>
> wrote:
>
>> Thanks Martin,
>>
>> I will test it out.
>>
>> On Tue, Jan 7, 2020 at 11:16 PM Martin Jansche <address@hidden>
>> wrote:
>>
>>> There are many more floating point values between 0.0 and 0.001 than
>>> there are between 0.999 and 1.0. The difference between 1.0 and the next
>>> smaller double value is only around 1e-16, but the next larger double value
>>> after 0.0 is about 1e-303. So beta_P(0.9, 1, 17) will be necessarily
>>> equivalent to 1.0 due to lack of precision, whereas beta_Q(0.9, 1, 17) will
>>> be 1e-17. (Haven't tried this in GSL. You may want to try and report back.)
>>>
>>> On Wed, Jan 8, 2020 at 1:35 AM Vasu Jaganath <address@hidden>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This is probably a very silly question,
>>>>
>>>> I don't understand why there are two separate  P and Q variants for
>>>> CDFs?
>>>> particularly for beta distribution?
>>>>
>>>>
>>>> https://www.gnu.org/software/gsl/doc/html/randist.html#the-beta-distribution
>>>>
>>>> Thanks,
>>>> Vasu
>>>>
>>>

```