help-octave
[Top][All Lists]

## hist(y,n) [was Re: I NEED HELP ...]

 From: Paul Kienzle Subject: hist(y,n) [was Re: I NEED HELP ...] Date: Tue, 18 Jun 2002 11:24:41 -0400

```On Tue, Jun 18, 2002 at 02:24:59AM -0400, Paul Kienzle wrote:
> On Fri, Jun 07, 2002 at 02:25:17PM -0500, Marco Boni wrote:
<snip>

> For example, in hist:
>
>       for i = 1:n-1
>               cutoff (i) = (x (i) + x (i + 1)) / 2;
>       endfor
>
> is simply translated into
>
>       cutoff = ( x(1:n-1) + x(2:n) ) / 2;
>
> The next loop in hist.m is more tricky:
>
>       freq(1) = sum(y < cutoff (1));
>       for i = 2:n-1
>               freq (i) = sum (y >= cutoff (i-1) & y < cutoff (i));
>       endfor
>       freq(n) = sum (y >= cutoff (n-1));
>
> You can get part way there with sort:
>
>       [s, idx] = sort ( [cutoff(:); y(:)] );
>       pt = [0; idx > n; 0];
>
> This gives you a sequence of 0's and 1's where the zeros represent
> bin boundaries and the ones represent bin contents.  Because sort
> is 'stable' all y values which are identical to an x value will
> appear after the x value, so the semi-open [x-1, x) logic will
> be preserved.  The leading and trailing zeros capture all the y's
> which are not otherwise in a bin.
>
> Using cumulative sum, you can turn the ones into frequency counts
>
>       chist = cumsum(pt);
>
> But you only need the counts at the bin boundaries:
>
>       chist = s(find(diff(chist) == 0));
>
> Now you have a 'cumulative histogram'.  Differentiate it and you
> should have the histogram:
>
>       freq = diff(chist);

Oops! This will need to be:

freq = [chist(1); diff(chist) ]

>
> I may messed up something along the way, but hopefully you get
> the idea.  Could you fix it, test it and submit it to bug-octave?
>
> Since you are dealing with really large numbers of bins (up to 2^20 if
> the speed problems.

I was bothered by having to compute with all those empty bins.  Here is
a version for n equal sized bins which is independent of the number of bins:

bins = zeros(1,n);
q = sort(y(:).');
L = length(q);
if (L == 0)
return;
elseif (q(1) == q(L))
bins(n) = L;
return;
endif

q = (q - q(1))/(q(L)-q(1))/(1+eps); # set y-range to [0,1)
q = fix(q*n);                  # split into n bins
same = [ q == [q(2:L),-Inf] ]; # true if neighbours are in the same bin
q = q(~same);                  # q lists the 'active' bins (0-origin)

f = cumsum(same);              # cumulative histogram
f = f(~same);
f = [f(1), diff(f)] + 1;       # cumulative histogram -> histogram
# we need to add 1 since we did not count the point at the boundary between
# histograms (it was turned into zero)

# distribute f to the active bins, leaving the remaining bins empty
bins(q+1) = f;

Paul Kienzle

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------

```