[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: poisson_rnd() excruciatingly slow
From: |
Ted Harding |
Subject: |
RE: poisson_rnd() excruciatingly slow |
Date: |
Thu, 07 Oct 2004 10:22:56 +0100 (BST) |
On 07-Oct-04 address@hidden wrote:
> Is there a reason why poisson_rnd() is so *glacially slow*
> for high values of lambda? On a 2.0 GHz Athlon system,
> I measured poisson_rnd(200000) to take approximately 18 seconds,
> making this function next to useless for practical use.
The reason lies in the nature of the algorithm, which contains a
"while" loop whose duration is proportional to lambda (on average).
The basic theory for the algorithm is that if X1, X2, X3, ...
(indefinitely) are independent and have negative exponential
distributions with mean 1, then the number of true cases in
the following:
X1 < lambda
X1+X2 < lambda
X1+X2+X3 < lambda
...
has a Poisson distribution with mean lambda. This is in effect
the way the algorithm for poisson_rnd proceeds, hence the "while"
loop which in effect adds the next Xi and checks whether lambda
has yet been passed..
> Compare
> that to the 0.16 seconds for binomial_rnd(400000,0.5) which
> effectively computes the same thing, which I still consider
> to be a sizable amount, by the way.
binomial_rnd(n,p) on the other hand simply depends on counting the
number of times "n uniforms < p" in a vector of length n, so it
is a one-pass procedure. The time it takes will increase with n,
but only because of the time taken to process large vectors.
(in the case that the arguments n,p are not scalars, then of
course the time will increase again because even larger structures
are being processed). So again (approx) proportional to n, but
by a much smaller factor.
Both of these algorithms are based on exact principles, and such
algorithms for discrete distributions can be tricky. The one used
for poisson_rnd is the best exact algorithm I know.
However, you can consider inexact algorithms if you use them in
such a way that the chance of getting an inappropriate result
is unacceptably small.
Consider the case you discuss: poisson_rnd(200000).
[By the way, this is not "effectively the same" as the distribution
of binomial_rnd(400000,0.5) (as you state) since, while the mean
is right, the variance is wrong by a factor of 2:
Var(Poisson) = 200000,
Var(Binomial) = 100000.
More accurate would be something like binomial_rnd(2000000000,0.0001)
but I doubt you want to do this!]
With such a large lambda, you are well into Central Limit Theorem
territory, so
floor(200000 + X*sqrt(200000))
where X is a standard Normal deviate, is going to be a very good
approximation to Poisson(200000). Of course there is a tiny chance
of a negative result, but unless you have a very special problem
you can confidently ignore this. And again there will be very slight
discrepancies in the extreme tails.
Another approximate approach could be based on something like
min(which(cumsum(-log(U)) > 200000)) - 1
where U is N (large, say N=400000) uniforms on [0,1]. This is based
on a similar principle to the exact algorithm, but has two
possibilities for inappropriate result:
a) An empty result, where the condition is not satisfied (i.e. all
the U are sufficiently near 1).
b) An upper limit (N-1) on the value of the "Poisson" variable,
so the histogram of a very large sample generated in this
way would have a little peak at (N-1) and nothing at higher
values.
For sufficiently large N, you could usually consider the chance of
these results to be small enough to be acceptable.
As to whether there should be a built-in proviso in octave to
use such methods in place of the ones built on exact principles,
I have reservations. You could build in a test into poisson_rnd
such that if (say) lambda > 10000 then it uses the above cumsum
method, or the normal approximation, and otherwise uses the
exact algorithm. But this would then pre-empt the user's judgement
about whether or not, in the circumstances of the work the user
wants to do, the risk of inappropriate results is acceptable.
A refinement of this approach would allow the "break-point"
to be an optional argument (default say 10000) so that the user
then has a choice, but in any case in large simulations with
small lambda you would then lose time over the test.
> If my questions have already been answered on various lists,
> then I apologize for asking them, but as the search-function
> of said lists seems to be down, I cannot check myself.
It's an interesting and relevant question, and in any case I don't
think there's any harm in such a question turning up from time to
time on the list. If it's been asked before, it was certainly a
long time ago, and it does no harm that new users encounter it
in this way, and probably little harm to remind old users of it!
Keep posting!
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <address@hidden>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 07-Oct-04 Time: 10:22:56
------------------------------ XFMail ------------------------------
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------