bug-gawk

Re: 2 new issues with gawk -M : one where output is correct as spec'ed,


From: arnold
Subject: Re: 2 new issues with gawk -M : one where output is correct as spec'ed, but desirability is question, the other being inconsistent NaN treatment
Date: Sun, 06 Feb 2022 03:36:44 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hello.

Thanks for your report.

Once again, I request that you send such mails to the bug-gawk@gnu.org
mailing list, instead of directly to me.  In the future, if you do
not do this, I will simply ignore such emails.

With respect to your issues:

"Jason C. Kwan" <jasonckwan@yahoo.com> wrote:

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Issue 1 : Casting operations performed when none needed, thus impacting 
> precision 
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Even when absolutely nothing has changed regarding the value itself,
> simply appending a decimal point without any digits to the right of the
> decimal point, or appending a scientific notation of "E+0", which involves
> zero shifting of the decimal point,  would cause it to be processed by
> the portion of the code requiring sufficient PREC,  for both the additive
> identity ( + 0 ) and the multiplicative identity ( x 1 ) operations.
>
> That difference exists both for inputs parsed from STDIN and for
> code given on the command line. The final portion of the code shows
> that gawk -M supposedly treats 0. == 0 and 1. == 1 as true, but clearly
> this isn't the case when they're used in conjunction with other values.
> While this is exactly the expected output per spec, whether
> such output is desirable is questionable, since none of the cases
> involves calling any function; it is purely a matter of how the parser
> chooses to interpret the input, and of whether pure identity operations
> such as adding zero ( + 0 ) or multiplying by one ( x 1 ) should receive
> special bypass-lane treatment.

When using -M, gawk distinguishes between integer and floating point
values. It uses GMP numbers for integers and MPFR numbers for
floating point. Appending "." or an exponent causes a conversion from
one to the other.
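
A minimal sketch of that distinction, assuming a gawk built with MPFR
support and the default PREC of 53 bits. The function name int_vs_float
and the test value (2^53 + 1) are illustrative choices, not taken from
the original report:

```sh
# Hypothetical demo: the same digits written as an integer literal and
# with a trailing ".", each passed through the additive identity.
int_vs_float() {
    gawk -M 'BEGIN {
        print 9007199254740993  + 0   # integer literal: handled as a GMP integer
        print 9007199254740993. + 0   # trailing ".": handled as an MPFR float
    }'
}
```

Calling int_vs_float prints two lines. Since 2^53 + 1 does not fit in
53 bits of precision, the second line would be expected to differ from
the first in its last digit, though the exact output may depend on the
gawk version and on the value of PREC.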

With floating point numbers it is not surprising that there is
a difference in the last digit, which I suspect is beyond the
default precision.

The semantics of string <--> number conversion in awk can sometimes be
subtle, and (from experience) attempting to optimize addition of zero
or multiplication by one is likely to just introduce more problems than
it solves.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Issue 2 : Inconsistent NaN treatments 
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Not only do purely sign-retaining identity operations end up flipping the
> sign of NaN, but even the basic output for variable "a" is oppositely signed
> depending on whether bignum -M mode is invoked.
>
> Which operations flip the sign is also inconsistent,
> e.g. the last one, ( -a ) x ( -1 ): standard gawk flips the sign, while
> bignum mode doesn't.

I am aware of this; it's something that is out of my control.

I plan to add the following to the documentation:

        The sign used for NaN values can vary!  The result depends
        upon both the underlying system architecture and the underlying
        library used to format NaN values. In particular, it's possible
        to get different results for the same function call depending
        upon whether or not @command{gawk} is running in MPFR mode
        (@option{-M}). Caveat Emptor!

You can write a simple function to check if a value is a NaN:

        function isnan(val)
        {
                val = val ""
                return (tolower(val) ~ /[-+]nan/)
        }
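
As a usage sketch, the function can be exercised on string inputs, which
avoids depending on how any particular awk prints a numeric NaN. The
wrapper name isnan_demo and the sample inputs are hypothetical, chosen
only for illustration:

```sh
# Hypothetical helper: runs the isnan() function from the message
# against a few sample strings, one per input line.
isnan_demo() {
    printf '%s\n' '+nan' '-NaN' '1.5' 'banana' |
    awk '
        function isnan(val)
        {
            val = val ""                        # force the string form
            return (tolower(val) ~ /[-+]nan/)   # signed "nan", any letter case
        }
        { print $0, isnan($0) }
    '
}

isnan_demo
```

This reports 1 for "+nan" and "-NaN" and 0 for "1.5" and "banana",
because the pattern requires a sign directly before "nan". The pattern is
not anchored, so a string such as "x-nan" would also match; anchoring it
with ^ and $ is one possible tightening if inputs may contain arbitrary
text.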

Arnold


