[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Low hanging fruit - Accelerated random distribution functions
From: |
David Bateman |
Subject: |
Re: Low hanging fruit - Accelerated random distribution functions |
Date: |
Fri, 23 Feb 2007 14:27:03 +0100 |
User-agent: |
Thunderbird 1.5.0.7 (X11/20060921) |
Paul Kienzle wrote:
>
> This needs a fast discrete_rnd, which could be implemented
> using lookup as [untested!]
>
> rnd = discrete_rnd(p,v,n)
> edges = cumsum(p(1:end-1))/sum(p);
> rnd = v(lookup(edges,rand(n,1))+1);
>
> The current code is approximately:
>
> edges = cumsum(p)/sum(p);
> u = rand(1,n);
> rnd = v (1 + sum( edges*ones(1,n) <= ones(m,1)*u));
>
> Given n numbers generated from m possibilities:
> current code takes O(m*n) space and O(m*n) time.
> lookup takes O(m+n) space and O((m+n)*log(m+n)) time.
> so lookup saves both space and time.
>
> Note that the documentation says p is a probability
> whereas it is actually a proportion.
Humm, it seems also that the matlab unid* functions are closely related
to the discrete_* function. The difference is that matlab assumes that
there are equal proportions of each value in the vector v. Matlab also
accepts that v is a scalar and then assumes values of 1:v are in the
distribution. Should we modify discrete_* to have this behavior? Should
we have an extended interface that allows p to be set? Or should we keep
discrete_* functions as is and write matlab compatiable wraps to them
for the unid* functions? Assuming that keep discrete_*, consider the
attached patch. I think the unidrnd.m functions are compatible, but
might have missed some subtly.
I also tested the speed and compliance of your code to the original
distribution and I get good argument and speed like
octave:2> function rnd = discrete_rnd2(v, p, varargin), edges =
cumsum(p(1:end-1))/sum(p);rnd = v(lookup(edges,rand(varargin{:}))+1);
endfunction
octave:3> tic; b2 = discrete_rnd(1:100, [1:50,50:-1:1], 1,1e5); toc
Elapsed time is 0.994423 seconds.
octave:4> tic; b2 = discrete_rnd2(1:100, [1:50,50:-1:1], 1,1e5); toc
Elapsed time is 0.204079 seconds.
octave:5> tic; b2 = discrete_rnd2(1:10, [1:5,5:-1:1], 1,1e5); toc
Elapsed time is 0.202477 seconds.
octave:6> tic; b0 = discrete_rnd(1:10, [1:5,5:-1:1], 1,1e5); toc
Elapsed time is 0.177124 seconds.
octave:7> tic; b2 = discrete_rnd2(1:10, [1:5,5:-1:1], 1,1e6); toc
Elapsed time is 2.180717 seconds.
octave:8> tic; b0 = discrete_rnd(1:10, [1:5,5:-1:1], 1,1e6); toc
Elapsed time is 1.346832 seconds.
octave:9> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e4); toc
Elapsed time is 0.061501 seconds.
octave:10> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e4); toc
Elapsed time is 0.052948 seconds.
octave:11> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e5); toc
Elapsed time is 0.204903 seconds.
octave:12> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e5); toc
Elapsed time is 0.109550 seconds.
octave:13> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e6); toc
Elapsed time is 2.192253 seconds.
octave:14> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e6); toc
Elapsed time is 0.731766 seconds.
So for length(v) large there is a clear win for the new code. For
length(v) ~ 10 and smaller the original code is faster, though not by a
huge amount (only about a factor of 2), given that there is also memory
savings, I'd be inclined to think it is better to go with the new code..
Regards
David
--
David Bateman address@hidden
Motorola Labs - Paris +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE +33 1 69 35 77 01 (Fax)
The information contained in this communication has been classified as:
[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary
*** ./scripts/statistics/distributions/unidcdf.m.orig47 2007-02-23
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidcdf.m 2007-02-23
12:43:13.585784764 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007 David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING. If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidcdf (@var{x}, @var{v})
+ ## For each element of @var{x}, compute the cumulative distribution
+ ## function (CDF) at @var{x} of a univariate discrete distribution which
+ ## assumes the values in @var{v} with equal probability.
+ ## @end deftypefn
+
+ function cdf = unidcdf (x, v)
+
+ if (nargin != 2)
+ print_usage ();
+ endif
+
+ if (isscalar(v))
+ v = [1:v].';
+ else
+ v = v(:);
+ endif
+
+ cdf = discrete_cdf (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/unidrnd.m.orig47 2007-02-23
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidrnd.m 2007-02-23
12:43:04.224263096 +0100
***************
*** 0 ****
--- 1,45 ----
+ ## Copyright (C) 2007 David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING. If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidrnd (@var{v}, @var{r}, @var{c})
+ ## @deftypefnx {Function File} {} unidrnd (@var{v}, @var{sz})
+ ## Generate a row vector containing a random sample of values from
+ ## the univariate distribution which assumes the values in @var{v} with
+ ## eqal probability. If @var{v} is a scalar, it is promoted to
@code{1:@var{v}}.
+ ##
+ ## If @var{r} and @var{c} are given create a matrix with @var{r} rows and
+ ## @var{c} columns. Or if @var{sz} is a vector, create a matrix of size
+ ## @var{sz}.
+ ## @end deftypefn
+
+ function rnd = unidrnd (v, varargin)
+
+ if (nargin != 2)
+ print_usage ();
+ endif
+
+ if (isscalar(v))
+ v = [1:v].';
+ else
+ v = v(:);
+ endif
+
+ rnd = discrete_rnd (v, ones(size(v)), varargin{:});
+ endfunction
*** ./scripts/statistics/distributions/unidpdf.m.orig47 2007-02-23
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidpdf.m 2007-02-23
12:43:49.510946234 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007 David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING. If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidpdf (@var{x}, @var{v})
+ ## For each element of @var{x}, compute the probability density function
+ ## (pDF) at @var{x} of a univariate discrete distribution which assumes
+ ## the values in @var{v} with equal probability.
+ ## @end deftypefn
+
+ function pdf = unidpdf (x, v)
+
+ if (nargin != 2)
+ print_usage ();
+ endif
+
+ if (isscalar(v))
+ v = [1:v].';
+ else
+ v = v(:);
+ endif
+
+ pdf = discrete_pdf (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/unidinv.m.orig47 2007-02-23
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidinv.m 2007-02-23
12:43:41.653348752 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007 David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING. If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidinv (@var{x}, @var{v})
+ ## For each component of @var{x}, compute the quantile (the inverse of
+ ## the CDF) at @var{x} of the univariate distribution which assumes the
+ ## values in @var{v} with equal probability
+ ## @end deftypefn
+
+ function inv = unidinv (x, v)
+
+ if (nargin != 2)
+ print_usage ();
+ endif
+
+ if (isscalar(v))
+ v = [1:v].';
+ else
+ v = v(:);
+ endif
+
+ inv = discrete_inv (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/discrete_rnd.m.orig47 2007-02-23
12:46:45.663866735 +0100
--- ./scripts/statistics/distributions/discrete_rnd.m 2007-02-23
13:53:27.589019216 +0100
***************
*** 74,97 ****
error ("discrete_rnd: p must be a nonzero, nonnegative vector");
endif
! n = prod (sz);
! m = length (v);
! u = rand (1, n);
! s = reshape (cumsum (p / sum (p)), m, 1);
!
! ## The following loop is a space/time tradeoff in favor of space,
! ## since the dataset may be large.
! ##
! ## Vectorized code is:
! ##
! rnd = v (1 + sum ((s * ones (1, n)) <= ((ones (m, 1) * u))));
! rnd = reshape (rnd, sz);
! ##
! ## Non-vectorized code is:
! ##
! ## rnd = zeros (sz);
! ## for q=1:n
! ## rnd (q) = v (sum (s <= u (q)) + 1);
! ## endfor
!
endfunction
--- 74,78 ----
error ("discrete_rnd: p must be a nonzero, nonnegative vector");
endif
! rnd = v (lookup (cumsum (p (1 : end-1)) / sum(p), rand (sz)) + 1);
endfunction
- Re: Low hanging fruit - Accelerated random distribution functions, (continued)
Re: Low hanging fruit - Accelerated random distribution functions, David Bateman, 2007/02/23
- Re: Low hanging fruit - Accelerated random distribution functions, Daniel J Sebald, 2007/02/23
- Re: Low hanging fruit - Accelerated random distribution functions, David Bateman, 2007/02/23
- Re: Low hanging fruit - Accelerated random distribution functions, Daniel J Sebald, 2007/02/23
- Re: Low hanging fruit - Accelerated random distribution functions, David Bateman, 2007/02/24
- Re: Low hanging fruit - Accelerated random distribution functions, Daniel J Sebald, 2007/02/24
- Re: Low hanging fruit - Accelerated random distribution functions, David Bateman, 2007/02/24
Re: Low hanging fruit - Accelerated random distribution functions, Daniel J Sebald, 2007/02/24
Re: Low hanging fruit - Accelerated random distribution functions,
David Bateman <=