Re: [lmi] Ancient rate-table programs: GIGO

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Ancient rate-table programs: GIGO

From:	Greg Chicares
Subject:	Re: [lmi] Ancient rate-table programs: GIGO
Date:	Mon, 12 Dec 2016 06:11:46 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-11 19:23, Vadim Zeitlin wrote:
[...]
>  I couldn't help asking myself whether converting the number to string and
> counting the number of decimal digits in it was really the best way to
> determine the number of digits necessary for its representation. And it

I'm about to end the day's labors, so I'll paste my work-in-progress
here for your amusement. In practice, the argument is a 'double'
converted to string with value_cast<std::string>(v); this function
takes a string because that makes it easier to unit-test. (Because
it's the result of value_cast<>, the string cannot contain leading
or trailing blanks.)

int decimals(std::string const& arg)
{
    // Early exit: no decimal point means zero decimals.
    if(std::string::npos == arg.find('.'))
        {
        return 0;
        }

    std::string s(arg);
    std::size_t d = 0;

    // Strip leading zeros.
    std::string::size_type q = s.find_first_not_of("0");
    if(std::string::npos != q)
        {
        s.erase(0, q);
        }

    // Preliminary result is number of characters after '.'.
    // (Decrement for '.' unless nothing followed it.)
    d = s.size() - s.find('.');
    if(d) --d; 

    // Length of stripped string is number of significant digits
    // (on both sides of the decimal point) plus one for the '.'.
    // If this total exceeds 15--i.e., if there are more than 14
    // significant digits--then there may be excess precision.
    // In that case, keep only the first 15 digits (plus the '.',
    // for a total of 16 characters), because those digits are
    // guaranteed to be significant for IEEE754 double precision;
    // drop the rest, which may include arbitrary digits. Then
    // drop any trailing string that's all zeros or nines, and
    // return the length of the remaining string. This wrongly
    // truncates a number whose representation requires 15 or 16
    // digits when the last one or more decimal digit is a nine,
    // but that doesn't matter for the present use case: rate
    // tables aren't expected to have more than about eight
    // decimal places; and this function will be called for each
    // number in a table and the maximum result used, so that
    // such incorrect truncation can only occur if every number
    // in the table is ill-conditioned in this way.
    if(15 < s.size())
        {
        s.resize(16);
        if('0' == s.back() || '9' == s.back())
            {
            d = s.find_last_not_of(s.back()) - s.find('.');
            }
        }

    return d;
}

> turns out that the answer is much more interesting than I'd have guessed.
> Maybe you already know all this, but it turns out that the choice of the
> best algorithm for doing this is still a subject of active research, with
> the Grisu3 algorithm by F. Loitsch being published only in 2010 (I had at
> least heard about this one before...) and the latest important paper on
> this subject, "Printing Floating-Point Numbers (A Faster, Always Correct
> Method)"[1] was published less than a year ago (and I had no idea about it
> until 15 minutes ago...).

I was entirely unaware of this, so thanks very much for pointing it out.

>  I don't know what algorithm does the MinGW CRT implementation use, but
> from a quick web search it seems it doesn't use Grisu3 and it's, of course,
> impossible for it to use the algorithm only described in January 2016, so
> it must be using either something much slower or non-optimal (i.e. not
> always giving the shortest result).

It's been quite a while since I did any work on libmingwex, but IIRC it
uses David Gay's code--so I guess it's an optimized Dragon algorithm.
My impression is that MinGW-w64 must have retained that when they forked
from mingw.org; otherwise we would have noticed the msvcrt errors that
mingw.org fixed. (I just wish they had copied mingw.org's round().)

> So if we want the fastest and optimal
> answer, it seems like we'd need to implement either Grisu3 or the algorithm
> from the paper above ourselves (I'm not sure if non-optimality of Grisu3
> would affect our use or not, I didn't have time to read either of the
> papers fully yet).
> 
>  So now I wonder if you were planning to do this from the beginning or if
> you have another idea for solving this?

No, my work pasted above is crude and childish. I think it'll be just
good enough for the practical task at hand.

[Prev in Thread]

Current Thread

[Next in Thread]

[lmi] Ancient rate-table programs: GIGO, Greg Chicares, 2016/12/10
- Re: [lmi] Ancient rate-table programs: GIGO, Vadim Zeitlin, 2016/12/11
  - Re: [lmi] Ancient rate-table programs: GIGO, Greg Chicares, 2016/12/11
    - Re: [lmi] Ancient rate-table programs: GIGO, Vadim Zeitlin, 2016/12/11
    - Re: [lmi] Ancient rate-table programs: GIGO, Greg Chicares <=

Prev by Date: Re: [lmi] Ancient rate-table programs: GIGO
Next by Date: Re: [lmi] Numerics
Previous by thread: Re: [lmi] Ancient rate-table programs: GIGO
Next by thread: Re: [lmi] Numerics
Index(es):
- Date
- Thread