help-gsl

Re: Histogram


From: Peter Johansson
Subject: Re: Histogram
Date: Tue, 24 Mar 2020 10:55:34 +1000

Hi Max,

I think the current behavior is intuitive. It would be unexpected if the count of a bin depended on whether it is a bin in the middle or the last bin.

For example, the current behavior is that

$ echo "1 2 3" | gsl-histogram 1 3 | grep '^2'

results in

2 3 1

and I expect to get the same result with

$ echo "1 2 3" | gsl-histogram 1 4 | grep '^2'
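To make the point concrete, here is a minimal C sketch. The helper count_in_bin() is hypothetical and not part of GSL; it only counts the samples that land in a single bin, with a flag to include the upper edge the way a closed last bin would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GSL function: counts the samples that land in
   the bin starting at lo and ending at hi. With include_hi == 0 the bin is
   half-open, lo <= x < hi (the current rule). With include_hi != 0 the upper
   edge is included, lo <= x <= hi (what a closed last bin would do). */
static size_t count_in_bin(const double *x, size_t len,
                           double lo, double hi, int include_hi)
{
    assert(lo < hi);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (x[i] >= lo && (x[i] < hi || (include_hi && x[i] == hi)))
            count++;
    }
    return count;
}
```

For the data 1 2 3, the bin from 2 to 3 holds one sample under the half-open rule whether the histogram ends at 3 or at 4. If the last bin were closed, the same bin would hold two samples when the histogram ends at 3 and one sample when it ends at 4.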

Cheers,

Peter


On 13/3/20 12:59 am, maxgacode wrote:

Hi,

The current implementation of the histogram in GSL uses the following rules for the first and last bins, where n is the number of bins and d is the bin width:

bin[0] corresponds to xmin <= x < xmin + d

bin[n-1] corresponds to xmin + (n-1)d <= x < xmax

and there is a comment about the last bin


***
Thus any samples which fall on the upper end of the histogram are excluded. If you want to include this value for the last bin you will need to add an extra bin to your histogram.
***

I'm facing exactly this problem: samples that are exactly equal to xmax are not counted in the last bin. I can add an extra bin, but this is very inconvenient, especially when I call gsl_histogram_set_ranges_uniform() with the minimum and maximum data values after a call to gsl_histogram_alloc().

I'm wondering why GSL does not use a different binning strategy, like the one used by NCAR's gsn_histogram, and whether it would be possible to implement it:

https://www.ncl.ucar.edu/Document/Graphics/Interfaces/gsn_histogram.shtml

The linked page has the following statement

"Note that the last interval is treated specially. This is intentional, to make sure that all data values that fall inclusively between the lowest and highest intervals are binned. "

So the last bin is filled with the rule

bin[n-1] corresponds to xmin + (n-1)d <= x <= xmax
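The difference between the two rules can be sketched in plain C. The two helpers below are hypothetical and are not GSL functions; they assume n uniform bins of width d = (xmax - xmin)/n and only mirror the rules written above, not the actual GSL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bin index of x under the half-open rule,
   or -1 when x lies outside xmin <= x < xmax. */
static long bin_half_open(double xmin, double xmax, size_t n, double x)
{
    assert(n > 0 && xmax > xmin);
    if (x < xmin || x >= xmax)
        return -1;
    size_t i = (size_t)((x - xmin) / (xmax - xmin) * (double)n);
    if (i >= n)
        i = n - 1;              /* guard against floating-point rounding */
    return (long)i;
}

/* Hypothetical helper: same rule, except that x == xmax is assigned
   to the last bin instead of being rejected. */
static long bin_closed_last(double xmin, double xmax, size_t n, double x)
{
    assert(n > 0 && xmax > xmin);
    if (x == xmax)
        return (long)(n - 1);
    return bin_half_open(xmin, xmax, n, x);
}
```

With xmin = 1, xmax = 3 and n = 2, the sample x = 3 is rejected by bin_half_open() and placed in bin 1 by bin_closed_last(); every other sample gets the same bin from both.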

Using this rule in the functions

gsl_histogram_increment

and

gsl_histogram_accumulate

seems (to me) much more convenient, obvious and simple. Maybe it could be an option, set with a function call after the histogram allocation.

Any comments are welcome.

Max





