bug-gnu-utils

Re: gawk: Locale-dependant bug with string to floating point conversion


From: Paul Eggert
Subject: Re: gawk: Locale-dependant bug with string to floating point conversion
Date: Fri, 30 Apr 2004 23:56:10 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Aharon Robbins <address@hidden> writes:

>> Aharon Robbins <address@hidden> writes:
>>
>> > The correct fix will be to de-hardcode the '.' in the replacement strtod().
>>
>> That's part of the fix, but won't it be a little trickier than that?
>> For example, the current POSIX spec requires awk to support
>> hexadecimal floating-point constants like "0xep0", regardless of
>> whether these constants are in the program or in the input data.
>
> Would you care to cite chapter and verse to prove this assertion, please?
> I've skimmed the current posix awk stuff several times, and I never saw
> this.

OK, here's the chapter and verse.  It refers to the 2004 POSIX spec,
which was published today.

1.  <http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html>
under "Expressions in awk" says that a runtime string value is
recognized as a number if, after you do the following:

   * Trim all leading and trailing blanks.
   * Discard a leading '-' or '+'.
   * Change each occurrence of the locale's decimal point character to '.'.

then the resulting string is lexically recognized as a NUMBER
(i.e., a numeric token in an Awk program).

2.  That same spec under "Lexical conventions" says that a NUMBER
shall be equivalent to either a C99 floating-constant or
integer-constant, except that a floating-constant can't include a
suffix (f, F, l, L), and some other restrictions about
integer-constants.

3.  The C standard (in Section 6.4.4.2 of ISO/IEC 9899:1999) says that
a floating-constant can be either a decimal-floating-constant or a
hexadecimal-floating-constant, and if you follow the grammar for
hexadecimal-floating-constant you'll see that 0xep0 is one of the
possibilities.
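
To make the grammar concrete: in 0xep0 the hexadecimal digit e is 14
and p0 scales it by 2 to the power 0, so the value is 14.  A minimal
sketch that checks this with C99 strtod follows.  The helper name
parses_fully is hypothetical, and the sketch assumes a C99-conforming
strtod and the default "C" locale:

```c
#include <stdlib.h>

/* Hypothetical helper: return nonzero if strtod consumes all of s,
   storing the converted value in *out. */
int parses_fully(const char *s, double *out)
{
    char *end;

    *out = strtod(s, &end);
    return end != s && *end == '\0';
}
```

With such a helper, "0xep0" should convert to 14 and "0x1.8p1" to 3
(1.5 times 2).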


Here's an algorithm to convert a 2004 POSIX awk runtime string to a
number, assuming C99 is the implementation language:

  1.  Check that the input string is one of the following forms:

            B S F B
            B S I B

      where:

       B is a possibly empty sequence of blanks.
       S is '-' or '+' or empty.
       F is a C99 floating-point constant, except that it uses
         the locale's decimal-point character in place of '.',
         and it cannot have an f, F, l, or L suffix.
       I is a sequence of one or more decimal digits.

      If the string is not of this form, fail.

  2.  Invoke strtod on the string, and return whatever strtod returns.
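
The two steps above could be sketched in C99 roughly as follows.  This
is only an illustration under stated assumptions, not gawk's code: the
names awk_str_to_num, match_number and skip_digits are hypothetical,
"blank" is taken to mean whatever isblank accepts, and the
decimal-point character is read from localeconv():

```c
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Skip decimal digits (hex digits when hex is true); return new position. */
static const char *skip_digits(const char *p, bool hex)
{
    while (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Match form F or I at p, with dp as the decimal-point string.
   Return the position just past the match, or NULL if there is none. */
static const char *match_number(const char *p, const char *dp)
{
    bool hex = p[0] == '0' && (p[1] == 'x' || p[1] == 'X');
    if (hex)
        p += 2;

    const char *start = p;
    p = skip_digits(p, hex);
    size_t ndigits = (size_t)(p - start);

    size_t dplen = strlen(dp);
    if (dplen > 0 && strncmp(p, dp, dplen) == 0) {
        start = p + dplen;
        p = skip_digits(start, hex);
        ndigits += (size_t)(p - start);
    }
    if (ndigits == 0)
        return NULL;

    bool has_exp = hex ? (*p == 'p' || *p == 'P') : (*p == 'e' || *p == 'E');
    if (has_exp) {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        start = p;
        p = skip_digits(p, false);
        if (p == start)
            return NULL;        /* exponent marker without digits */
    }
    /* C99 requires a binary exponent on hexadecimal floating constants. */
    if (hex && !has_exp)
        return NULL;
    return p;
}

/* Step 1: check the form B S F B or B S I B.  Step 2: call strtod.
   Return true and store the value in *out on success, false otherwise. */
bool awk_str_to_num(const char *s, double *out)
{
    const char *p = s;

    while (isblank((unsigned char)*p))
        p++;
    if (*p == '+' || *p == '-')
        p++;
    p = match_number(p, localeconv()->decimal_point);
    if (p == NULL)
        return false;
    while (isblank((unsigned char)*p))
        p++;
    if (*p != '\0')
        return false;

    *out = strtod(s, NULL);
    return true;
}
```

The check in step 1 is what keeps out strings that strtod alone would
accept, such as "inf" or "0x1A", and strings with trailing garbage
such as "1.5f".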


Here's an algorithm to convert a 2004 POSIX awk source-language NUMBER
token (not a run time string) to a number, again assuming C99:

  1.  Check that the input string is one of the following forms:

            G
            I

      where:

       G is a C99 floating-point constant, except that it
         cannot have an f, F, l, or L suffix.
       I is a sequence of one or more decimal digits.

      If the string is not of this form, fail (this shouldn't happen
      if your lexical analyzer is correct).

  2.  Replace any '.' in the input string with the decimal-point
      character.

  3.  Invoke strtod on the string, and return whatever strtod returns.
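
Steps 2 and 3 could be sketched in C99 as follows, assuming the lexical
analyzer has already done the check in step 1, so the token contains at
most one '.'.  The name awk_token_to_num is hypothetical, and the
decimal-point character is again assumed to come from localeconv():

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert an already-validated NUMBER token.
   Return 0 on success, -1 if memory cannot be allocated. */
int awk_token_to_num(const char *token, double *out)
{
    const char *dp = localeconv()->decimal_point;
    const char *dot = strchr(token, '.');

    /* Nothing to replace: no '.' in the token, or the locale uses '.'. */
    if (dot == NULL || strcmp(dp, ".") == 0) {
        *out = strtod(token, NULL);
        return 0;
    }

    /* Build a copy with '.' replaced by the locale's decimal point,
       which may be longer than one byte. */
    size_t head = (size_t)(dot - token);
    size_t dplen = strlen(dp);
    size_t tail = strlen(dot + 1);
    char *buf = malloc(head + dplen + tail + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, token, head);
    memcpy(buf + head, dp, dplen);
    memcpy(buf + head + dplen, dot + 1, tail + 1);  /* copies the '\0' too */

    *out = strtod(buf, NULL);
    free(buf);
    return 0;
}
```

The copy is needed because strtod honors the current locale, whereas
the program text always spells the decimal point as '.'.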



