[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
hist(y,n) [was Re: I NEED HELP ...]
From: |
Paul Kienzle |
Subject: |
hist(y,n) [was Re: I NEED HELP ...] |
Date: |
Tue, 18 Jun 2002 11:24:41 -0400 |
On Tue, Jun 18, 2002 at 02:24:59AM -0400, Paul Kienzle wrote:
> On Fri, Jun 07, 2002 at 02:25:17PM -0500, Marco Boni wrote:
<snip>
> For example, in hist:
>
> for i = 1:n-1
> cutoff (i) = (x (i) + x (i + 1)) / 2;
> endfor
>
> is simply translated into
>
> cutoff = ( x(1:n-1) + x(2:n) ) / 2;
>
> The next loop in hist.m is more tricky:
>
> freq(1) = sum(y < cutoff (1));
> for i = 2:n-1
> freq (i) = sum (y >= cutoff (i-1) & y < cutoff (i));
> endfor
> freq(n) = sum (y >= cutoff (n-1));
>
> You can get part way there with sort:
>
> [s, idx] = sort ( [cutoff(:); y(:)] );
> pt = [0; idx > n; 0];
>
> This gives you a sequence of 0's and 1's where the zeros represent
> bin boundaries and the ones represent bin contents. Because sort
> is 'stable' all y values which are identical to an x value will
> appear after the x value, so the semi-open [x-1, x) logic will
> be preserved. The leading and trailing zeros capture all the y's
> which are not otherwise in a bin.
>
> Using cumulative sum, you can turn the ones into frequency counts
>
> chist = cumsum(pt);
>
> But you only need the counts at the bin boundaries:
>
> chist = s(find(diff(chist) == 0));
>
> Now you have a 'cumulative histogram'. Differentiate it and you
> should have the histogram:
>
> freq = diff(chist);
Oops! This will need to be:
freq = [chist(1); diff(chist) ]
>
> I may messed up something along the way, but hopefully you get
> the idea. Could you fix it, test it and submit it to bug-octave?
>
> Since you are dealing with really large numbers of bins (up to 2^20 if
> I'm reading your code correctly), fixing hist should address most of
> the speed problems.
I was bothered by having to compute with all those empty bins. Here is
a version for n equal sized bins which is independent of the number of bins:
bins = zeros(1,n);
q = sort(y(:).');
L = length(q);
if (L == 0)
return;
elseif (q(1) == q(L))
bins(n) = L;
return;
endif
q = (q - q(1))/(q(L)-q(1))/(1+eps); # set y-range to [0,1)
q = fix(q*n); # split into n bins
same = [ q == [q(2:L),-Inf] ]; # true if neighbours are in the same bin
q = q(~same); # q lists the 'active' bins (0-origin)
f = cumsum(same); # cumulative histogram
f = f(~same);
f = [f(1), diff(f)] + 1; # cumulative histogram -> histogram
# we need to add 1 since we did not count the point at the boundary between
# histograms (it was turned into zero)
# distribute f to the active bins, leaving the remaining bins empty
bins(q+1) = f;
Paul Kienzle
address@hidden
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------