[gawk-diffs] [SCM] gawk branch, master, updated. ccdaa3f17b9341e628acd64


From: Arnold Robbins
Subject: [gawk-diffs] [SCM] gawk branch, master, updated. ccdaa3f17b9341e628acd64f68502c67141e8997
Date: Mon, 13 Jun 2011 19:30:41 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".

The branch, master has been updated
       via  ccdaa3f17b9341e628acd64f68502c67141e8997 (commit)
      from  6e7e7acd76d49c0d1f0cb60829e8b340df318b88 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=ccdaa3f17b9341e628acd64f68502c67141e8997

commit ccdaa3f17b9341e628acd64f68502c67141e8997
Author: Arnold D. Robbins <address@hidden>
Date:   Mon Jun 13 22:29:43 2011 +0300

    Make ranges be character based all the time.
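
    The difference between the two readings of a range such as [a-z] can
    be sketched as follows. This is an illustrative model only, not gawk's
    implementation (the actual change sets RE_RANGES_IGNORE_LOCALES in
    re.c). The names DICTIONARY_ORDER, expand_by_codepoint, and
    expand_by_collation are hypothetical, and the collation sequence
    "aAbBcC...zZ" is an assumed stand-in for a dictionary-order locale;
    real locales differ in detail.

```python
import string

# Assumed collation sequence for illustration: each lowercase letter is
# followed by its uppercase form ("aAbBcC...zZ"). Not taken from a real locale.
DICTIONARY_ORDER = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def expand_by_codepoint(first, last):
    """Expand a range using the numeric value of each character."""
    return "".join(chr(code) for code in range(ord(first), ord(last) + 1))


def expand_by_collation(first, last, order=DICTIONARY_ORDER):
    """Expand a range using positions in a collation sequence."""
    return order[order.index(first):order.index(last) + 1]


if __name__ == "__main__":
    # Character-based reading: only the lowercase letters.
    print(expand_by_codepoint("a", "z"))
    # Collation-based reading under the assumed order: uppercase letters
    # are swept in as well, except the one sorting after "z".
    print(expand_by_collation("a", "z"))
```

    Under the assumed ordering, the collation-based expansion of [A-Z]
    contains "b" and "c" but not "a", which is consistent with the
    "something1234a" result quoted in the documentation change below.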

diff --git a/ChangeLog b/ChangeLog
index 8eb3b70..8507b98 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+Sun Jun 12 23:43:06 2011  Arnold D. Robbins  <address@hidden>
+
+       * re.c (resetup): Always turn on RE_RANGES_IGNORE_LOCALES.
+       Add justifying comment with URLs for the relevant portions of
+       POSIX. Thanks to Paul Eggert for pointing out the happy change
+       to the rules and supplying the URLs.
+
 Wed Jun  8 22:41:30 2011  Arnold D. Robbins  <address@hidden>
 
        * regcomp.c (build_range_exp): Add check for RE_NO_EMPTY_RANGES
diff --git a/doc/ChangeLog b/doc/ChangeLog
index eb54d5c..8b32325 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,10 @@
+Mon Jun 13 22:28:02 2011  Arnold D. Robbins  <address@hidden>
+
+       * gawk.texi: Document that POSIX now says [a-z] is undefined outside
+       the C and POSIX locales, so gawk treats it as the Good Lord intended
+       in all cases.  Thanks to Paul Eggert for letting me know about this
+       and providing URLs to cite.
+
 Fri May 27 09:59:38 2011  Arnold D. Robbins  <address@hidden>
 
        * gawk.1, gawk.texi: Minor edits w.r.t. the bug reporting address.
diff --git a/doc/gawk.info b/doc/gawk.info
index 39e2f76..3dd9d73 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -171,7 +171,6 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 * Case-sensitivity::                   How to do case-insensitive matching.
 * Leftmost Longest::                   How much text matches.
 * Computed Regexps::                   Using Dynamic Regexps.
-* Locales::                            How the locale affects things.
 * Records::                            Controlling how data is split into
                                        records.
 * Fields::                             An introduction to fields.
@@ -270,6 +269,7 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
                                        third subexpression.
 * Function Calls::                     A function call is an expression.
 * Precedence::                         How various operators nest.
+* Locales::                            How the locale affects things.
 * Pattern Overview::                   What goes into a pattern.
 * Regexp Patterns::                    Using regexps as patterns.
 * Expression Patterns::                Any expression can be used as a
@@ -476,6 +476,8 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 * POSIX/GNU::                          The extensions in `gawk' not in
                                        POSIX `awk'.
 * Common Extensions::                  Common Extensions Summary.
+* Ranges and Locales::                 How locales used to affect regexp
+                                       ranges.
 * Contributors::                       The major contributors to
                                        `gawk'.
 * Gawk Distribution::                  What is in the `gawk'
@@ -2849,7 +2851,6 @@ you specify more complicated classes of strings.
 * Case-sensitivity::            How to do case-insensitive matching.
 * Leftmost Longest::            How much text matches.
 * Computed Regexps::            Using Dynamic Regexps.
-* Locales::                     How the locale affects things.
 
 
 File: gawk.info,  Node: Regexp Usage,  Next: Escape Sequences,  Up: Regexp
@@ -3235,15 +3236,16 @@ File: gawk.info,  Node: Bracket Expressions,  Next: GNU Regexp Operators,  Prev:
 3.4 Using Bracket Expressions
 =============================
 
-Within a bracket expression, a "range expression" consists of two
-characters separated by a hyphen.  It matches any single character that
-sorts between the two characters, using the locale's collating sequence
-and character set.  For example, `[0-9]' is equivalent to
-`[0123456789]'.
+As mentioned earlier, a bracket expression matches any character amongst
+those listed between the opening and closing square brackets.
 
-   Unfortunately, providing simple character ranges such as `[a-z]'
-usually does not work like you might expect, due to locale-related
-issues.  This is discussed more fully, in *note Locales::.
+   Within a bracket expression, a "range expression" consists of two
+characters separated by a hyphen.  It matches any single character that
+sorts between the two characters, based upon the system's native
+character set.  For example, `[0-9]' is equivalent to `[0123456789]'.
+(See *note Ranges and Locales::, for an explanation of how the POSIX
+standard and `gawk' have changed over time.  This is mainly of
+historical interest.)
 
    To include one of the characters `\', `]', `-', or `^' in a bracket
 expression, put a `\' in front of it.  For example:
@@ -3293,11 +3295,9 @@ Table 3.1: POSIX Character Classes
 
    For example, before the POSIX standard, you had to write
 `/[A-Za-z0-9]/' to match alphanumeric characters.  If your character
-set had other alphabetic characters in it, this would not match them,
-and if your character set collated differently from ASCII, this might
-not even match the ASCII alphanumeric characters.  With the POSIX
-character classes, you can write `/[[:alnum:]]/' to match the alphabetic
-and numeric characters in your character set.
+set had other alphabetic characters in it, this would not match them.
+With the POSIX character classes, you can write `/[[:alnum:]]/' to
+match the alphabetic and numeric characters in your character set.
 
    Two additional special sequences can appear in bracket expressions.
 These apply to non-ASCII character sets, which can have single symbols
@@ -3528,7 +3528,7 @@ this principle is also important for regexp-based record and field
 splitting (*note Records::, and also *note Field Separators::).
 
 
-File: gawk.info,  Node: Computed Regexps,  Next: Locales,  Prev: Leftmost Longest,  Up: Regexp
+File: gawk.info,  Node: Computed Regexps,  Prev: Leftmost Longest,  Up: Regexp
 
 3.8 Using Dynamic Regexps
 =========================
@@ -3607,86 +3607,6 @@ be used inside a bracket expression for a dynamic regexp:
 often in practice, but it's worth noting for future reference.
 
 
-File: gawk.info,  Node: Locales,  Prev: Computed Regexps,  Up: Regexp
-
-3.9 Where You Are Makes A Difference
-====================================
-
-Modern systems support the notion of "locales": a way to tell the
-system about the local character set and language.  The current locale
-setting can affect the way regexp matching works, often in surprising
-ways.
-
-   For example, in the default `"C"' locale, `[a-dx-z]' is equivalent to
-`[abcdxyz]'.  Many locales sort characters in dictionary order, and in
-these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]';
-instead it might be equivalent to `[aBbCcdXxYyz]', for example.
-
-   This point needs to be emphasized: Much literature teaches that you
-should use `[a-z]' to match a lowercase character.  But on systems with
-non-ASCII locales, this also matches all of the uppercase characters
-except `Z'!  This is a continuous cause of confusion, even well into
-the twenty-first century.
-
-     NOTE: In an attempt to end the confusion once and for all, when
-     not in POSIX mode (*note Options::), `gawk' expands ranges into
-     the characters they include, based only on the machine character
-     set.  This restores the traditional, pre-POSIX, pre-locales
-     behavior.  However, you should read the rest of this section so
-     that you can write portable scripts, instead of relying on
-     behavior specific to `gawk'.
-
-   To obtain the traditional interpretation of bracket expressions, you
-can use the `"C"' locale by setting the `LC_ALL' environment variable
-to the value `C'.  However, it is best to just use POSIX character
-classes, such as `[[:lower:]]' to match specific classes of characters.
-
-   To demonstrate these issues, the following example uses the `sub()'
-function, which does text replacement (*note String Functions::).  Here,
-the intent is to remove trailing uppercase characters:
-
-     $ echo something1234abc | gawk --posix '{ sub("[A-Z]*$", ""); print }'
-     -| something1234a
-
-This output is unexpected, since the `bc' at the end of
-`something1234abc' should not normally match `[A-Z]*'.  This result is
-due to the locale setting (and thus you may not see it on your system).
-There are two fixes.  The first is to use the POSIX character class
-`[[:upper:]]', instead of `[A-Z]'.  (This is preferred, since then your
-program will work everywhere.)
-
-   The second is to change the locale setting in the environment, before
-running `gawk', by using the shell statements:
-
-     LANG=C LC_ALL=C
-     export LANG LC_ALL
-
-   The setting `C' forces `gawk' to behave in the traditional Unix
-manner, where case distinctions do matter.  You may wish to put these
-statements into your shell startup file, e.g., `$HOME/.profile'.
-
-   Similar considerations apply to other ranges.  For example, `["-/]'
-is perfectly valid in ASCII, but is not valid in many Unicode locales,
-such as `en_US.UTF-8'.  (In general, such ranges should be avoided;
-either list the characters individually, or use a POSIX character class
-such as `[[:punct:]]'.)
-
-   An additional factor relates to splitting records.  For the normal
-case of `RS = "\n"', the locale is largely irrelevant.  For other
-single-character record separators, using `LC_ALL=C' will give you much
-better performance when reading records.  Otherwise, `gawk' has to make
-several function calls, _per input character_, to find the record
-terminator.
-
-   According to POSIX, string comparison is also affected by locales
-(similar to regular expressions).  The details are presented in *note
-POSIX String Comparison::.
-
-   Finally, the locale affects the value of the decimal point character
-used when `gawk' parses input data.  This is discussed in detail in
-*note Conversion::.
-
-
 File: gawk.info,  Node: Reading Files,  Next: Printing,  Prev: Regexp,  Up: Top
 
 4 Reading Input Files
@@ -6451,6 +6371,7 @@ operators.
 * Truth Values and Conditions:: Testing for true and false.
 * Function Calls::              A function call is an expression.
 * Precedence::                  How various operators nest.
+* Locales::                     How the locale affects things.
 
 
 File: gawk.info,  Node: Values,  Next: All Operators,  Up: Expressions
@@ -7897,7 +7818,7 @@ Here is a sample run:
      -| 5 1
 
 
-File: gawk.info,  Node: Precedence,  Prev: Function Calls,  Up: Expressions
+File: gawk.info,  Node: Precedence,  Next: Locales,  Prev: Function Calls,  Up: Expressions
 
 6.5 Operator Precedence (How Operators Nest)
 ============================================
@@ -7998,6 +7919,33 @@ precedence:
      POSIX.  For maximum portability, do not use them.
 
 
+File: gawk.info,  Node: Locales,  Prev: Precedence,  Up: Expressions
+
+6.6 Where You Are Makes A Difference
+====================================
+
+Modern systems support the notion of "locales": a way to tell the
+system about the local character set and language.
+
+   Once upon a time, the locale setting used to affect regexp matching
+(*note Ranges and Locales::), but this is no longer true.
+
+   Locales can affect record splitting.  For the normal case of `RS =
+"\n"', the locale is largely irrelevant.  For other single-character
+record separators, setting `LC_ALL=C' in the environment will give you
+much better performance when reading records.  Otherwise, `gawk' has to
+make several function calls, _per input character_, to find the record
+terminator.
+
+   According to POSIX, string comparison is also affected by locales
+(similar to regular expressions).  The details are presented in *note
+POSIX String Comparison::.
+
+   Finally, the locale affects the value of the decimal point character
+used when `gawk' parses input data.  This is discussed in detail in
+*note Conversion::.
+
+
 File: gawk.info,  Node: Patterns and Actions,  Next: Arrays,  Prev: Expressions,  Up: Top
 
 7 Patterns, Actions, and Variables
@@ -19753,6 +19701,7 @@ you can find more information.
 * POSIX/GNU::                   The extensions in `gawk' not in POSIX
                                 `awk'.
 * Common Extensions::           Common Extensions Summary.
+* Ranges and Locales::          How locales used to affect regexp ranges.
 * Contributors::                The major contributors to `gawk'.
 
 
@@ -20066,7 +20015,7 @@ the current version of `gawk'.
 
 
 
-File: gawk.info,  Node: Common Extensions,  Next: Contributors,  Prev: POSIX/GNU,  Up: Language History
+File: gawk.info,  Node: Common Extensions,  Next: Ranges and Locales,  Prev: POSIX/GNU,  Up: Language History
 
 A.6 Common Extensions Summary
 =============================
@@ -20092,9 +20041,108 @@ Feature                      BWK Awk   Mawk   GNU Awk
 `BINMODE' variable                     X      X
 
 
-File: gawk.info,  Node: Contributors,  Prev: Common Extensions,  Up: Language History
+File: gawk.info,  Node: Ranges and Locales,  Next: Contributors,  Prev: Common Extensions,  Up: Language History
+
+A.7 Regexp Ranges and Locales: A Long Sad Story
+===============================================
+
+This minor node describes the confusing history of ranges within
+regular expressions and their interactions with locales, and how this
+affected different versions of `gawk'.
+
+   The original Unix tools that worked with regular expressions defined
+character ranges (such as `[a-z]') to match any character between the
+first character in the range and the last character in the range,
+inclusive.  Ordering was based on the numeric value of each character
+in the machine's native character set.  Thus, on ASCII-based systems,
+`[a-z]' matched all the lowercase letters, and only the lowercase
+letters, since the numeric values for the letters from `a' through `z'
+were contiguous.  (On an EBCDIC system, the range `[a-z]' includes
+additional, non-alphabetic characters as well.)
+
+   Almost all introductory Unix literature explained range expressions
+as working in this fashion, and in particular, would teach that the
+"correct" way to match lowercase letters was with `[a-z]', and that
+`[A-Z]' was the "correct" way to match uppercase letters.  And
+indeed, this was true.
+
+   The 1993 POSIX standard introduced the idea of locales (*note
+Locales::).  Since many locales include other letters besides the plain
+twenty-six letters of the American English alphabet, the POSIX standard
+added character classes (*note Bracket Expressions::) as a way to match
+different kinds of characters besides the traditional ones in the ASCII
+character set.
+
+   However, the standard _changed_ the interpretation of range
+expressions.  In the `"C"' and `"POSIX"' locales, a range expression
+like `[a-dx-z]' is still equivalent to `[abcdxyz]', as in ASCII.  But
+outside those locales, the ordering was defined to be based on
+"collation order".
+
+   In many locales, `A' and `a' are both less than `B'.  In other
+words, these locales sort characters in dictionary order, and
+`[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might
+be equivalent to `[aBbCcdXxYyz]', for example.
+
+   This point needs to be emphasized: Much literature teaches that you
+should use `[a-z]' to match a lowercase character.  But on systems with
+non-ASCII locales, this also matched all of the uppercase characters
+except `Z'!  This was a continuous cause of confusion, even well into
+the twenty-first century.
+
+   To demonstrate these issues, the following example uses the `sub()'
+function, which does text replacement (*note String Functions::).  Here,
+the intent is to remove trailing uppercase characters:
+
+     $ echo something1234abc | gawk-3.1.8 '{ sub("[A-Z]*$", ""); print }'
+     -| something1234a
+
+This output is unexpected, since the `bc' at the end of
+`something1234abc' should not normally match `[A-Z]*'.  This result is
+due to the locale setting (and thus you may not see it on your system).
+
+   Similar considerations apply to other ranges.  For example, `["-/]'
+is perfectly valid in ASCII, but is not valid in many Unicode locales,
+such as `en_US.UTF-8'.
+
+   Early versions of `gawk' used regexp matching code that was not
+locale aware, so ranges had their traditional interpretation.
+
+   When `gawk' switched to using locale-aware regexp matchers, the
+problems began; especially as both GNU/Linux and commercial Unix
+vendors started implementing non-ASCII locales, _and making them the
+default_.  Perhaps the most frequently asked question became something
+like "why does `[A-Z]' match lowercase letters?!?"
+
+   This situation existed for close to 10 years, if not more, and the
+`gawk' maintainer grew weary of trying to explain that `gawk' was being
+nicely standards-compliant, and that the issue was in the user's
+locale.  During the development of version 4.0, he modified `gawk' to
+always treat ranges in the original, pre-POSIX fashion, unless
+`--posix' was used (*note Options::).
+
+   Fortunately, shortly before the final release of `gawk' 4.0, the
+maintainer learned that the 2008 standard had changed the definition of
+ranges, such that outside the `"C"' and `"POSIX"' locales, the meaning
+of range expressions was _undefined_.(1)
+
+   By using this lovely technical term, the standard gives license to
+implementors to implement ranges in whatever way they choose.  The
+`gawk' maintainer chose to apply the pre-POSIX meaning in all cases:
+the default regexp matching; with `--traditional', and with `--posix';
+in all cases, `gawk' remains POSIX compliant.
+
+   ---------- Footnotes ----------
+
+   (1) See the standard
+(http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05)
+and its rationale
+(http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05).
+
+
+File: gawk.info,  Node: Contributors,  Prev: Ranges and Locales,  Up: Language History
 
-A.7 Major Contributors to `gawk'
+A.8 Major Contributors to `gawk'
 ================================
 
      Always give credit where credit is due.
@@ -24595,7 +24643,7 @@ Index
 * - (hyphen), -= operator <1>:           Precedence.          (line  95)
 * - (hyphen), -= operator:               Assignment Ops.      (line 129)
 * - (hyphen), filenames beginning with:  Options.             (line  59)
-* - (hyphen), in bracket expressions:    Bracket Expressions. (line  16)
+* - (hyphen), in bracket expressions:    Bracket Expressions. (line  17)
 * --assign option:                       Options.             (line  32)
 * --c option:                            Options.             (line  78)
 * --characters-as-bytes option:          Options.             (line  68)
@@ -24765,7 +24813,7 @@ Index
                                                               (line  44)
 * \ (backslash), gsub()/gensub()/sub() functions and: Gory Details.
                                                               (line   6)
-* \ (backslash), in bracket expressions: Bracket Expressions. (line  16)
+* \ (backslash), in bracket expressions: Bracket Expressions. (line  17)
 * \ (backslash), in escape sequences:    Escape Sequences.    (line   6)
 * \ (backslash), in escape sequences, POSIX and: Escape Sequences.
                                                               (line 113)
@@ -24776,7 +24824,7 @@ Index
 * ^ (caret), ^ operator:                 Precedence.          (line  49)
 * ^ (caret), ^= operator <1>:            Precedence.          (line  95)
 * ^ (caret), ^= operator:                Assignment Ops.      (line 129)
-* ^ (caret), in bracket expressions:     Bracket Expressions. (line  16)
+* ^ (caret), in bracket expressions:     Bracket Expressions. (line  17)
 * ^, in FS:                              Regexp Field Splitting.
                                                               (line  59)
 * _ (underscore), _ C macro:             Explaining gettext.  (line  70)
@@ -25024,7 +25072,7 @@ Index
                                                               (line  44)
 * backslash (\), gsub()/gensub()/sub() functions and: Gory Details.
                                                               (line   6)
-* backslash (\), in bracket expressions: Bracket Expressions. (line  16)
+* backslash (\), in bracket expressions: Bracket Expressions. (line  17)
 * backslash (\), in escape sequences:    Escape Sequences.    (line   6)
 * backslash (\), in escape sequences, POSIX and: Escape Sequences.
                                                               (line 113)
@@ -25084,15 +25132,15 @@ Index
 * bracket expressions <1>:               Bracket Expressions. (line   6)
 * bracket expressions:                   Regexp Operators.    (line  55)
 * bracket expressions, character classes: Bracket Expressions.
-                                                              (line  29)
+                                                              (line  30)
 * bracket expressions, collating elements: Bracket Expressions.
-                                                              (line  70)
+                                                              (line  69)
 * bracket expressions, collating symbols: Bracket Expressions.
-                                                              (line  77)
+                                                              (line  76)
 * bracket expressions, complemented:     Regexp Operators.    (line  63)
 * bracket expressions, equivalence classes: Bracket Expressions.
-                                                              (line  83)
-* bracket expressions, non-ASCII:        Bracket Expressions. (line  70)
+                                                              (line  82)
+* bracket expressions, non-ASCII:        Bracket Expressions. (line  69)
 * bracket expressions, range expressions: Bracket Expressions.
                                                               (line   6)
 * break debugger command:                Breakpoint Control.  (line  11)
@@ -25135,7 +25183,7 @@ Index
 * caret (^), ^ operator:                 Precedence.          (line  49)
 * caret (^), ^= operator <1>:            Precedence.          (line  95)
 * caret (^), ^= operator:                Assignment Ops.      (line 129)
-* caret (^), in bracket expressions:     Bracket Expressions. (line  16)
+* caret (^), in bracket expressions:     Bracket Expressions. (line  17)
 * case keyword:                          Switch Statement.    (line   6)
 * case sensitivity, array indices and:   Array Intro.         (line  92)
 * case sensitivity, converting case:     String Functions.    (line 522)
@@ -25175,8 +25223,8 @@ Index
 * Close, Diane <1>:                      Contributors.        (line  21)
 * Close, Diane:                          Manual History.      (line  41)
 * close_func() input method:             Internals.           (line 160)
-* collating elements:                    Bracket Expressions. (line  70)
-* collating symbols:                     Bracket Expressions. (line  77)
+* collating elements:                    Bracket Expressions. (line  69)
+* collating symbols:                     Bracket Expressions. (line  76)
 * Colombo, Antonio:                      Acknowledgments.     (line  60)
 * columns, aligning:                     Print Examples.      (line  70)
 * columns, cutting:                      Cut Program.         (line   6)
@@ -25552,7 +25600,7 @@ Index
 * e debugger command (alias for enable): Breakpoint Control.  (line  72)
 * EBCDIC:                                Ordinal Functions.   (line  45)
 * egrep utility <1>:                     Egrep Program.       (line   6)
-* egrep utility:                         Bracket Expressions. (line  23)
+* egrep utility:                         Bracket Expressions. (line  24)
 * egrep.awk program:                     Egrep Program.       (line  54)
 * elements in arrays:                    Reference to Elements.
                                                               (line   6)
@@ -25596,7 +25644,7 @@ Index
 * equals sign (=), == operator <1>:      Precedence.          (line  65)
 * equals sign (=), == operator:          Comparison Operators.
                                                               (line  11)
-* EREs (Extended Regular Expressions):   Bracket Expressions. (line  23)
+* EREs (Extended Regular Expressions):   Bracket Expressions. (line  24)
 * ERRNO variable <1>:                    Internals.           (line 139)
 * ERRNO variable <2>:                    TCP/IP Networking.   (line  54)
 * ERRNO variable <3>:                    Auto-set.            (line  72)
@@ -25645,7 +25693,7 @@ Index
 * expressions, matching, See comparison expressions: Typing and Comparison.
                                                               (line   9)
 * expressions, selecting:                Conditional Exp.     (line   6)
-* Extended Regular Expressions (EREs):   Bracket Expressions. (line  23)
+* Extended Regular Expressions (EREs):   Bracket Expressions. (line  24)
 * eXtensible Markup Language (XML):      Internals.           (line 160)
 * extension() function (gawk):           Using Internal File Ops.
                                                               (line  15)
@@ -25888,7 +25936,7 @@ Index
 * gawk, bitwise operations in:           Bitwise Functions.   (line  39)
 * gawk, break statement in:              Break Statement.     (line  51)
 * gawk, built-in variables and:          Built-in Variables.  (line  14)
-* gawk, character classes and:           Bracket Expressions. (line  91)
+* gawk, character classes and:           Bracket Expressions. (line  90)
 * gawk, coding style in:                 Adding Code.         (line  38)
 * gawk, command-line options:            GNU Regexp Operators.
                                                               (line  70)
@@ -26074,7 +26122,7 @@ Index
 * hyphen (-), -= operator <1>:           Precedence.          (line  95)
 * hyphen (-), -= operator:               Assignment Ops.      (line 129)
 * hyphen (-), filenames beginning with:  Options.             (line  59)
-* hyphen (-), in bracket expressions:    Bracket Expressions. (line  16)
+* hyphen (-), in bracket expressions:    Bracket Expressions. (line  17)
 * i debugger command (alias for info):   Dgawk Info.          (line  12)
 * id utility:                            Id Program.          (line   6)
 * id.awk program:                        Id Program.          (line  30)
@@ -26177,7 +26225,7 @@ Index
                                                               (line  13)
 * internationalization, localization:    User-modified.       (line 153)
 * internationalization, localization, character classes: Bracket Expressions.
-                                                              (line  91)
+                                                              (line  90)
 * internationalization, localization, gawk and: Internationalization.
                                                               (line  13)
 * internationalization, localization, locale categories: Explaining gettext.
@@ -26618,9 +26666,9 @@ Index
 * POSIX awk, backslashes in string constants: Escape Sequences.
                                                               (line 113)
 * POSIX awk, BEGIN/END patterns:         I/O And BEGIN/END.   (line  16)
-* POSIX awk, bracket expressions and:    Bracket Expressions. (line  23)
+* POSIX awk, bracket expressions and:    Bracket Expressions. (line  24)
 * POSIX awk, bracket expressions and, character classes: Bracket Expressions.
-                                                              (line  29)
+                                                              (line  30)
 * POSIX awk, break statement and:        Break Statement.     (line  51)
 * POSIX awk, changes in awk versions:    POSIX.               (line   6)
 * POSIX awk, continue statement and:     Continue Statement.  (line  43)
@@ -27314,411 +27362,413 @@ Index
 
 Tag Table:
 Node: Top1346
-Node: Foreword33320
-Node: Preface37665
-Ref: Preface-Footnote-140632
-Ref: Preface-Footnote-240738
-Node: History40970
-Node: Names43361
-Ref: Names-Footnote-144838
-Node: This Manual44910
-Ref: This Manual-Footnote-149857
-Node: Conventions49957
-Node: Manual History52091
-Ref: Manual History-Footnote-155361
-Ref: Manual History-Footnote-255402
-Node: How To Contribute55476
-Node: Acknowledgments56620
-Node: Getting Started60951
-Node: Running gawk63330
-Node: One-shot64516
-Node: Read Terminal65741
-Ref: Read Terminal-Footnote-167391
-Ref: Read Terminal-Footnote-267667
-Node: Long67838
-Node: Executable Scripts69214
-Ref: Executable Scripts-Footnote-171083
-Ref: Executable Scripts-Footnote-271185
-Node: Comments71636
-Node: Quoting74103
-Node: DOS Quoting78726
-Node: Sample Data Files79401
-Node: Very Simple82433
-Node: Two Rules87032
-Node: More Complex89179
-Ref: More Complex-Footnote-192109
-Node: Statements/Lines92194
-Ref: Statements/Lines-Footnote-196656
-Node: Other Features96921
-Node: When97849
-Node: Invoking Gawk99996
-Node: Command Line101381
-Node: Options102164
-Ref: Options-Footnote-1115442
-Node: Other Arguments115467
-Node: Naming Standard Input118125
-Node: Environment Variables119219
-Node: AWKPATH Variable119663
-Ref: AWKPATH Variable-Footnote-1122260
-Node: Other Environment Variables122520
-Node: Exit Status124860
-Node: Include Files125535
-Node: Obsolete129020
-Node: Undocumented129706
-Node: Regexp129947
-Node: Regexp Usage131399
-Node: Escape Sequences133425
-Node: Regexp Operators139188
-Ref: Regexp Operators-Footnote-1146385
-Ref: Regexp Operators-Footnote-2146532
-Node: Bracket Expressions146630
-Ref: table-char-classes148433
-Node: GNU Regexp Operators151077
-Node: Case-sensitivity154800
-Ref: Case-sensitivity-Footnote-1157768
-Ref: Case-sensitivity-Footnote-2158003
-Node: Leftmost Longest158111
-Node: Computed Regexps159312
-Node: Locales162738
-Node: Reading Files166445
-Node: Records168386
-Ref: Records-Footnote-1177060
-Node: Fields177097
-Ref: Fields-Footnote-1180130
-Node: Nonconstant Fields180216
-Node: Changing Fields182418
-Node: Field Separators188396
-Node: Default Field Splitting191025
-Node: Regexp Field Splitting192142
-Node: Single Character Fields195484
-Node: Command Line Field Separator196543
-Node: Field Splitting Summary199984
-Ref: Field Splitting Summary-Footnote-1203176
-Node: Constant Size203277
-Node: Splitting By Content207861
-Ref: Splitting By Content-Footnote-1211587
-Node: Multiple Line211627
-Ref: Multiple Line-Footnote-1217474
-Node: Getline217653
-Node: Plain Getline219881
-Node: Getline/Variable221970
-Node: Getline/File223111
-Node: Getline/Variable/File224433
-Ref: Getline/Variable/File-Footnote-1226032
-Node: Getline/Pipe226119
-Node: Getline/Variable/Pipe228679
-Node: Getline/Coprocess229786
-Node: Getline/Variable/Coprocess231029
-Node: Getline Notes231743
-Node: Getline Summary233685
-Ref: table-getline-variants234028
-Node: Command line directories234884
-Node: Printing235509
-Node: Print237140
-Node: Print Examples238477
-Node: Output Separators241261
-Node: OFMT243021
-Node: Printf244379
-Node: Basic Printf245285
-Node: Control Letters246824
-Node: Format Modifiers250636
-Node: Printf Examples256645
-Node: Redirection259360
-Node: Special Files266344
-Node: Special FD266877
-Ref: Special FD-Footnote-1270502
-Node: Special Network270576
-Node: Special Caveats271426
-Node: Close Files And Pipes272222
-Ref: Close Files And Pipes-Footnote-1279245
-Ref: Close Files And Pipes-Footnote-2279393
-Node: Expressions279543
-Node: Values280612
-Node: Constants281288
-Node: Scalar Constants281968
-Ref: Scalar Constants-Footnote-1282827
-Node: Nondecimal-numbers283009
-Node: Regexp Constants286068
-Node: Using Constant Regexps286543
-Node: Variables289598
-Node: Using Variables290253
-Node: Assignment Options291977
-Node: Conversion293849
-Ref: table-locale-affects299225
-Ref: Conversion-Footnote-1299849
-Node: All Operators299958
-Node: Arithmetic Ops300588
-Node: Concatenation303093
-Ref: Concatenation-Footnote-1305886
-Node: Assignment Ops306006
-Ref: table-assign-ops310994
-Node: Increment Ops312402
-Node: Truth Values and Conditions315872
-Node: Truth Values316955
-Node: Typing and Comparison318004
-Node: Variable Typing318793
-Ref: Variable Typing-Footnote-1322690
-Node: Comparison Operators322812
-Ref: table-relational-ops323222
-Node: POSIX String Comparison326771
-Ref: POSIX String Comparison-Footnote-1327727
-Node: Boolean Ops327865
-Ref: Boolean Ops-Footnote-1331943
-Node: Conditional Exp332034
-Node: Function Calls333766
-Node: Precedence337360
-Node: Patterns and Actions341013
-Node: Pattern Overview342067
-Node: Regexp Patterns343733
-Node: Expression Patterns344276
-Node: Ranges347850
-Node: BEGIN/END350816
-Node: Using BEGIN/END351578
-Ref: Using BEGIN/END-Footnote-1354309
-Node: I/O And BEGIN/END354415
-Node: BEGINFILE/ENDFILE356697
-Node: Empty359530
-Node: Using Shell Variables359846
-Node: Action Overview362131
-Node: Statements364488
-Node: If Statement366342
-Node: While Statement367841
-Node: Do Statement369885
-Node: For Statement371041
-Node: Switch Statement374193
-Node: Break Statement376290
-Node: Continue Statement378280
-Node: Next Statement380067
-Node: Nextfile Statement382457
-Node: Exit Statement384754
-Node: Built-in Variables387170
-Node: User-modified388265
-Ref: User-modified-Footnote-1396291
-Node: Auto-set396353
-Ref: Auto-set-Footnote-1405644
-Node: ARGC and ARGV405849
-Node: Arrays409700
-Node: Array Basics411205
-Node: Array Intro411916
-Node: Reference to Elements416234
-Node: Assigning Elements418504
-Node: Array Example418995
-Node: Scanning an Array420727
-Node: Delete423393
-Ref: Delete-Footnote-1425828
-Node: Numeric Array Subscripts425885
-Node: Uninitialized Subscripts428068
-Node: Multi-dimensional429696
-Node: Multi-scanning432790
-Node: Arrays of Arrays434374
-Node: Functions438951
-Node: Built-in439773
-Node: Calling Built-in440851
-Node: Numeric Functions442839
-Ref: Numeric Functions-Footnote-1446604
-Ref: Numeric Functions-Footnote-2446961
-Ref: Numeric Functions-Footnote-3447009
-Node: String Functions447278
-Ref: String Functions-Footnote-1470775
-Ref: String Functions-Footnote-2470904
-Ref: String Functions-Footnote-3471152
-Node: Gory Details471239
-Ref: table-sub-escapes472918
-Ref: table-posix-sub474232
-Ref: table-gensub-escapes475145
-Node: I/O Functions476316
-Ref: I/O Functions-Footnote-1482971
-Node: Time Functions483118
-Ref: Time Functions-Footnote-1494010
-Ref: Time Functions-Footnote-2494078
-Ref: Time Functions-Footnote-3494236
-Ref: Time Functions-Footnote-4494347
-Ref: Time Functions-Footnote-5494459
-Ref: Time Functions-Footnote-6494686
-Node: Bitwise Functions494952
-Ref: table-bitwise-ops495510
-Ref: Bitwise Functions-Footnote-1499670
-Node: Type Functions499854
-Node: I18N Functions500324
-Node: User-defined501951
-Node: Definition Syntax502755
-Ref: Definition Syntax-Footnote-1507665
-Node: Function Example507734
-Node: Function Caveats510328
-Node: Calling A Function510749
-Node: Variable Scope511864
-Node: Pass By Value/Reference513839
-Node: Return Statement517279
-Node: Dynamic Typing520260
-Node: Indirect Calls520995
-Node: Internationalization530680
-Node: I18N and L10N532106
-Node: Explaining gettext532792
-Ref: Explaining gettext-Footnote-1537858
-Ref: Explaining gettext-Footnote-2538042
-Node: Programmer i18n538207
-Node: Translator i18n542407
-Node: String Extraction543200
-Ref: String Extraction-Footnote-1544161
-Node: Printf Ordering544247
-Ref: Printf Ordering-Footnote-1547031
-Node: I18N Portability547095
-Ref: I18N Portability-Footnote-1549544
-Node: I18N Example549607
-Ref: I18N Example-Footnote-1552242
-Node: Gawk I18N552314
-Node: Advanced Features552931
-Node: Nondecimal Data554444
-Node: Array Sorting556027
-Node: Controlling Array Traversal556727
-Node: Controlling Scanning With A Function557474
-Node: Controlling Scanning565177
-Ref: Controlling Scanning-Footnote-1568978
-Node: Array Sorting Functions569294
-Ref: Array Sorting Functions-Footnote-1572810
-Ref: Array Sorting Functions-Footnote-2572903
-Node: Two-way I/O573097
-Ref: Two-way I/O-Footnote-1578529
-Node: TCP/IP Networking578599
-Node: Profiling581443
-Node: Library Functions588917
-Ref: Library Functions-Footnote-1591924
-Node: Library Names592095
-Ref: Library Names-Footnote-1595566
-Ref: Library Names-Footnote-2595786
-Node: General Functions595872
-Node: Strtonum Function596825
-Node: Assert Function599755
-Node: Round Function603081
-Node: Cliff Random Function604624
-Node: Ordinal Functions605640
-Ref: Ordinal Functions-Footnote-1608710
-Ref: Ordinal Functions-Footnote-2608962
-Node: Join Function609171
-Ref: Join Function-Footnote-1610942
-Node: Gettimeofday Function611142
-Node: Data File Management614857
-Node: Filetrans Function615489
-Node: Rewind Function619628
-Node: File Checking621015
-Node: Empty Files622109
-Node: Ignoring Assigns624339
-Node: Getopt Function625892
-Ref: Getopt Function-Footnote-1637196
-Node: Passwd Functions637399
-Ref: Passwd Functions-Footnote-1646374
-Node: Group Functions646462
-Node: Walking Arrays654546
-Node: Sample Programs656115
-Node: Running Examples656780
-Node: Clones657508
-Node: Cut Program658732
-Node: Egrep Program668577
-Ref: Egrep Program-Footnote-1676350
-Node: Id Program676460
-Node: Split Program680076
-Ref: Split Program-Footnote-1683595
-Node: Tee Program683723
-Node: Uniq Program686526
-Node: Wc Program693955
-Ref: Wc Program-Footnote-1698221
-Ref: Wc Program-Footnote-2698421
-Node: Miscellaneous Programs698513
-Node: Dupword Program699701
-Node: Alarm Program701732
-Node: Translate Program706481
-Ref: Translate Program-Footnote-1710868
-Ref: Translate Program-Footnote-2711096
-Node: Labels Program711230
-Ref: Labels Program-Footnote-1714601
-Node: Word Sorting714685
-Node: History Sorting718569
-Node: Extract Program720408
-Ref: Extract Program-Footnote-1727891
-Node: Simple Sed728019
-Node: Igawk Program731081
-Ref: Igawk Program-Footnote-1746238
-Ref: Igawk Program-Footnote-2746439
-Node: Anagram Program746577
-Node: Signature Program749645
-Node: Debugger750745
-Node: Debugging751656
-Node: Debugging Concepts752069
-Node: Debugging Terms753925
-Node: Awk Debugging756548
-Node: Sample dgawk session757440
-Node: dgawk invocation757932
-Node: Finding The Bug759114
-Node: List of Debugger Commands765600
-Node: Breakpoint Control766911
-Node: Dgawk Execution Control770547
-Node: Viewing And Changing Data773898
-Node: Dgawk Stack777235
-Node: Dgawk Info778695
-Node: Miscellaneous Dgawk Commands782643
-Node: Readline Support788071
-Node: Dgawk Limitations788909
-Node: Language History791098
-Node: V7/SVR3.1792536
-Node: SVR4794857
-Node: POSIX796299
-Node: BTL797307
-Node: POSIX/GNU798041
-Node: Common Extensions803192
-Node: Contributors804293
-Node: Installation808554
-Node: Gawk Distribution809448
-Node: Getting809932
-Node: Extracting810758
-Node: Distribution contents812450
-Node: Unix Installation817672
-Node: Quick Installation818289
-Node: Additional Configuration Options820251
-Node: Configuration Philosophy821728
-Node: Non-Unix Installation824070
-Node: PC Installation824528
-Node: PC Binary Installation825827
-Node: PC Compiling827675
-Node: PC Testing830619
-Node: PC Using831795
-Node: Cygwin835980
-Node: MSYS836980
-Node: VMS Installation837494
-Node: VMS Compilation838097
-Ref: VMS Compilation-Footnote-1839104
-Node: VMS Installation Details839162
-Node: VMS Running840797
-Node: VMS Old Gawk842404
-Node: Bugs842878
-Node: Other Versions846731
-Node: Notes852012
-Node: Compatibility Mode852704
-Node: Additions853487
-Node: Accessing The Source854299
-Node: Adding Code855724
-Node: New Ports861691
-Node: Dynamic Extensions865804
-Node: Internals867180
-Node: Plugin License876283
-Node: Sample Library876917
-Node: Internal File Description877603
-Node: Internal File Ops881318
-Ref: Internal File Ops-Footnote-1886099
-Node: Using Internal File Ops886239
-Node: Future Extensions888616
-Node: Basic Concepts891120
-Node: Basic High Level891877
-Ref: Basic High Level-Footnote-1895912
-Node: Basic Data Typing896097
-Node: Floating Point Issues900622
-Node: String Conversion Precision901705
-Ref: String Conversion Precision-Footnote-1903405
-Node: Unexpected Results903514
-Node: POSIX Floating Point Problems905340
-Ref: POSIX Floating Point Problems-Footnote-1909045
-Node: Glossary909083
-Node: Copying934059
-Node: GNU Free Documentation License971616
-Node: Index996753
+Node: Foreword33440
+Node: Preface37785
+Ref: Preface-Footnote-140752
+Ref: Preface-Footnote-240858
+Node: History41090
+Node: Names43481
+Ref: Names-Footnote-144958
+Node: This Manual45030
+Ref: This Manual-Footnote-149977
+Node: Conventions50077
+Node: Manual History52211
+Ref: Manual History-Footnote-155481
+Ref: Manual History-Footnote-255522
+Node: How To Contribute55596
+Node: Acknowledgments56740
+Node: Getting Started61071
+Node: Running gawk63450
+Node: One-shot64636
+Node: Read Terminal65861
+Ref: Read Terminal-Footnote-167511
+Ref: Read Terminal-Footnote-267787
+Node: Long67958
+Node: Executable Scripts69334
+Ref: Executable Scripts-Footnote-171203
+Ref: Executable Scripts-Footnote-271305
+Node: Comments71756
+Node: Quoting74223
+Node: DOS Quoting78846
+Node: Sample Data Files79521
+Node: Very Simple82553
+Node: Two Rules87152
+Node: More Complex89299
+Ref: More Complex-Footnote-192229
+Node: Statements/Lines92314
+Ref: Statements/Lines-Footnote-196776
+Node: Other Features97041
+Node: When97969
+Node: Invoking Gawk100116
+Node: Command Line101501
+Node: Options102284
+Ref: Options-Footnote-1115562
+Node: Other Arguments115587
+Node: Naming Standard Input118245
+Node: Environment Variables119339
+Node: AWKPATH Variable119783
+Ref: AWKPATH Variable-Footnote-1122380
+Node: Other Environment Variables122640
+Node: Exit Status124980
+Node: Include Files125655
+Node: Obsolete129140
+Node: Undocumented129826
+Node: Regexp130067
+Node: Regexp Usage131456
+Node: Escape Sequences133482
+Node: Regexp Operators139245
+Ref: Regexp Operators-Footnote-1146442
+Ref: Regexp Operators-Footnote-2146589
+Node: Bracket Expressions146687
+Ref: table-char-classes148577
+Node: GNU Regexp Operators151100
+Node: Case-sensitivity154823
+Ref: Case-sensitivity-Footnote-1157791
+Ref: Case-sensitivity-Footnote-2158026
+Node: Leftmost Longest158134
+Node: Computed Regexps159335
+Node: Reading Files162745
+Node: Records164686
+Ref: Records-Footnote-1173360
+Node: Fields173397
+Ref: Fields-Footnote-1176430
+Node: Nonconstant Fields176516
+Node: Changing Fields178718
+Node: Field Separators184696
+Node: Default Field Splitting187325
+Node: Regexp Field Splitting188442
+Node: Single Character Fields191784
+Node: Command Line Field Separator192843
+Node: Field Splitting Summary196284
+Ref: Field Splitting Summary-Footnote-1199476
+Node: Constant Size199577
+Node: Splitting By Content204161
+Ref: Splitting By Content-Footnote-1207887
+Node: Multiple Line207927
+Ref: Multiple Line-Footnote-1213774
+Node: Getline213953
+Node: Plain Getline216181
+Node: Getline/Variable218270
+Node: Getline/File219411
+Node: Getline/Variable/File220733
+Ref: Getline/Variable/File-Footnote-1222332
+Node: Getline/Pipe222419
+Node: Getline/Variable/Pipe224979
+Node: Getline/Coprocess226086
+Node: Getline/Variable/Coprocess227329
+Node: Getline Notes228043
+Node: Getline Summary229985
+Ref: table-getline-variants230328
+Node: Command line directories231184
+Node: Printing231809
+Node: Print233440
+Node: Print Examples234777
+Node: Output Separators237561
+Node: OFMT239321
+Node: Printf240679
+Node: Basic Printf241585
+Node: Control Letters243124
+Node: Format Modifiers246936
+Node: Printf Examples252945
+Node: Redirection255660
+Node: Special Files262644
+Node: Special FD263177
+Ref: Special FD-Footnote-1266802
+Node: Special Network266876
+Node: Special Caveats267726
+Node: Close Files And Pipes268522
+Ref: Close Files And Pipes-Footnote-1275545
+Ref: Close Files And Pipes-Footnote-2275693
+Node: Expressions275843
+Node: Values276975
+Node: Constants277651
+Node: Scalar Constants278331
+Ref: Scalar Constants-Footnote-1279190
+Node: Nondecimal-numbers279372
+Node: Regexp Constants282431
+Node: Using Constant Regexps282906
+Node: Variables285961
+Node: Using Variables286616
+Node: Assignment Options288340
+Node: Conversion290212
+Ref: table-locale-affects295588
+Ref: Conversion-Footnote-1296212
+Node: All Operators296321
+Node: Arithmetic Ops296951
+Node: Concatenation299456
+Ref: Concatenation-Footnote-1302249
+Node: Assignment Ops302369
+Ref: table-assign-ops307357
+Node: Increment Ops308765
+Node: Truth Values and Conditions312235
+Node: Truth Values313318
+Node: Typing and Comparison314367
+Node: Variable Typing315156
+Ref: Variable Typing-Footnote-1319053
+Node: Comparison Operators319175
+Ref: table-relational-ops319585
+Node: POSIX String Comparison323134
+Ref: POSIX String Comparison-Footnote-1324090
+Node: Boolean Ops324228
+Ref: Boolean Ops-Footnote-1328306
+Node: Conditional Exp328397
+Node: Function Calls330129
+Node: Precedence333723
+Node: Locales337392
+Node: Patterns and Actions338481
+Node: Pattern Overview339535
+Node: Regexp Patterns341201
+Node: Expression Patterns341744
+Node: Ranges345318
+Node: BEGIN/END348284
+Node: Using BEGIN/END349046
+Ref: Using BEGIN/END-Footnote-1351777
+Node: I/O And BEGIN/END351883
+Node: BEGINFILE/ENDFILE354165
+Node: Empty356998
+Node: Using Shell Variables357314
+Node: Action Overview359599
+Node: Statements361956
+Node: If Statement363810
+Node: While Statement365309
+Node: Do Statement367353
+Node: For Statement368509
+Node: Switch Statement371661
+Node: Break Statement373758
+Node: Continue Statement375748
+Node: Next Statement377535
+Node: Nextfile Statement379925
+Node: Exit Statement382222
+Node: Built-in Variables384638
+Node: User-modified385733
+Ref: User-modified-Footnote-1393759
+Node: Auto-set393821
+Ref: Auto-set-Footnote-1403112
+Node: ARGC and ARGV403317
+Node: Arrays407168
+Node: Array Basics408673
+Node: Array Intro409384
+Node: Reference to Elements413702
+Node: Assigning Elements415972
+Node: Array Example416463
+Node: Scanning an Array418195
+Node: Delete420861
+Ref: Delete-Footnote-1423296
+Node: Numeric Array Subscripts423353
+Node: Uninitialized Subscripts425536
+Node: Multi-dimensional427164
+Node: Multi-scanning430258
+Node: Arrays of Arrays431842
+Node: Functions436419
+Node: Built-in437241
+Node: Calling Built-in438319
+Node: Numeric Functions440307
+Ref: Numeric Functions-Footnote-1444072
+Ref: Numeric Functions-Footnote-2444429
+Ref: Numeric Functions-Footnote-3444477
+Node: String Functions444746
+Ref: String Functions-Footnote-1468243
+Ref: String Functions-Footnote-2468372
+Ref: String Functions-Footnote-3468620
+Node: Gory Details468707
+Ref: table-sub-escapes470386
+Ref: table-posix-sub471700
+Ref: table-gensub-escapes472613
+Node: I/O Functions473784
+Ref: I/O Functions-Footnote-1480439
+Node: Time Functions480586
+Ref: Time Functions-Footnote-1491478
+Ref: Time Functions-Footnote-2491546
+Ref: Time Functions-Footnote-3491704
+Ref: Time Functions-Footnote-4491815
+Ref: Time Functions-Footnote-5491927
+Ref: Time Functions-Footnote-6492154
+Node: Bitwise Functions492420
+Ref: table-bitwise-ops492978
+Ref: Bitwise Functions-Footnote-1497138
+Node: Type Functions497322
+Node: I18N Functions497792
+Node: User-defined499419
+Node: Definition Syntax500223
+Ref: Definition Syntax-Footnote-1505133
+Node: Function Example505202
+Node: Function Caveats507796
+Node: Calling A Function508217
+Node: Variable Scope509332
+Node: Pass By Value/Reference511307
+Node: Return Statement514747
+Node: Dynamic Typing517728
+Node: Indirect Calls518463
+Node: Internationalization528148
+Node: I18N and L10N529574
+Node: Explaining gettext530260
+Ref: Explaining gettext-Footnote-1535326
+Ref: Explaining gettext-Footnote-2535510
+Node: Programmer i18n535675
+Node: Translator i18n539875
+Node: String Extraction540668
+Ref: String Extraction-Footnote-1541629
+Node: Printf Ordering541715
+Ref: Printf Ordering-Footnote-1544499
+Node: I18N Portability544563
+Ref: I18N Portability-Footnote-1547012
+Node: I18N Example547075
+Ref: I18N Example-Footnote-1549710
+Node: Gawk I18N549782
+Node: Advanced Features550399
+Node: Nondecimal Data551912
+Node: Array Sorting553495
+Node: Controlling Array Traversal554195
+Node: Controlling Scanning With A Function554942
+Node: Controlling Scanning562645
+Ref: Controlling Scanning-Footnote-1566446
+Node: Array Sorting Functions566762
+Ref: Array Sorting Functions-Footnote-1570278
+Ref: Array Sorting Functions-Footnote-2570371
+Node: Two-way I/O570565
+Ref: Two-way I/O-Footnote-1575997
+Node: TCP/IP Networking576067
+Node: Profiling578911
+Node: Library Functions586385
+Ref: Library Functions-Footnote-1589392
+Node: Library Names589563
+Ref: Library Names-Footnote-1593034
+Ref: Library Names-Footnote-2593254
+Node: General Functions593340
+Node: Strtonum Function594293
+Node: Assert Function597223
+Node: Round Function600549
+Node: Cliff Random Function602092
+Node: Ordinal Functions603108
+Ref: Ordinal Functions-Footnote-1606178
+Ref: Ordinal Functions-Footnote-2606430
+Node: Join Function606639
+Ref: Join Function-Footnote-1608410
+Node: Gettimeofday Function608610
+Node: Data File Management612325
+Node: Filetrans Function612957
+Node: Rewind Function617096
+Node: File Checking618483
+Node: Empty Files619577
+Node: Ignoring Assigns621807
+Node: Getopt Function623360
+Ref: Getopt Function-Footnote-1634664
+Node: Passwd Functions634867
+Ref: Passwd Functions-Footnote-1643842
+Node: Group Functions643930
+Node: Walking Arrays652014
+Node: Sample Programs653583
+Node: Running Examples654248
+Node: Clones654976
+Node: Cut Program656200
+Node: Egrep Program666045
+Ref: Egrep Program-Footnote-1673818
+Node: Id Program673928
+Node: Split Program677544
+Ref: Split Program-Footnote-1681063
+Node: Tee Program681191
+Node: Uniq Program683994
+Node: Wc Program691423
+Ref: Wc Program-Footnote-1695689
+Ref: Wc Program-Footnote-2695889
+Node: Miscellaneous Programs695981
+Node: Dupword Program697169
+Node: Alarm Program699200
+Node: Translate Program703949
+Ref: Translate Program-Footnote-1708336
+Ref: Translate Program-Footnote-2708564
+Node: Labels Program708698
+Ref: Labels Program-Footnote-1712069
+Node: Word Sorting712153
+Node: History Sorting716037
+Node: Extract Program717876
+Ref: Extract Program-Footnote-1725359
+Node: Simple Sed725487
+Node: Igawk Program728549
+Ref: Igawk Program-Footnote-1743706
+Ref: Igawk Program-Footnote-2743907
+Node: Anagram Program744045
+Node: Signature Program747113
+Node: Debugger748213
+Node: Debugging749124
+Node: Debugging Concepts749537
+Node: Debugging Terms751393
+Node: Awk Debugging754016
+Node: Sample dgawk session754908
+Node: dgawk invocation755400
+Node: Finding The Bug756582
+Node: List of Debugger Commands763068
+Node: Breakpoint Control764379
+Node: Dgawk Execution Control768015
+Node: Viewing And Changing Data771366
+Node: Dgawk Stack774703
+Node: Dgawk Info776163
+Node: Miscellaneous Dgawk Commands780111
+Node: Readline Support785539
+Node: Dgawk Limitations786377
+Node: Language History788566
+Node: V7/SVR3.1790078
+Node: SVR4792399
+Node: POSIX793841
+Node: BTL794849
+Node: POSIX/GNU795583
+Node: Common Extensions800734
+Node: Ranges and Locales801841
+Ref: Ranges and Locales-Footnote-1806448
+Node: Contributors806669
+Node: Installation810931
+Node: Gawk Distribution811825
+Node: Getting812309
+Node: Extracting813135
+Node: Distribution contents814827
+Node: Unix Installation820049
+Node: Quick Installation820666
+Node: Additional Configuration Options822628
+Node: Configuration Philosophy824105
+Node: Non-Unix Installation826447
+Node: PC Installation826905
+Node: PC Binary Installation828204
+Node: PC Compiling830052
+Node: PC Testing832996
+Node: PC Using834172
+Node: Cygwin838357
+Node: MSYS839357
+Node: VMS Installation839871
+Node: VMS Compilation840474
+Ref: VMS Compilation-Footnote-1841481
+Node: VMS Installation Details841539
+Node: VMS Running843174
+Node: VMS Old Gawk844781
+Node: Bugs845255
+Node: Other Versions849108
+Node: Notes854389
+Node: Compatibility Mode855081
+Node: Additions855864
+Node: Accessing The Source856676
+Node: Adding Code858101
+Node: New Ports864068
+Node: Dynamic Extensions868181
+Node: Internals869557
+Node: Plugin License878660
+Node: Sample Library879294
+Node: Internal File Description879980
+Node: Internal File Ops883695
+Ref: Internal File Ops-Footnote-1888476
+Node: Using Internal File Ops888616
+Node: Future Extensions890993
+Node: Basic Concepts893497
+Node: Basic High Level894254
+Ref: Basic High Level-Footnote-1898289
+Node: Basic Data Typing898474
+Node: Floating Point Issues902999
+Node: String Conversion Precision904082
+Ref: String Conversion Precision-Footnote-1905782
+Node: Unexpected Results905891
+Node: POSIX Floating Point Problems907717
+Ref: POSIX Floating Point Problems-Footnote-1911422
+Node: Glossary911460
+Node: Copying936436
+Node: GNU Free Documentation License973993
+Node: Index999130
 
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index b9190a6..a74773c 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -20,7 +20,7 @@
 @c applies to and all the info about who's publishing this edition
 
 @c These apply across the board.
-@set UPDATE-MONTH May, 2011
+@set UPDATE-MONTH June, 2011
 @set VERSION 4.0
 @set PATCHLEVEL 0
 
@@ -368,7 +368,6 @@ particular records in a file and perform operations upon them.
 * Case-sensitivity::                   How to do case-insensitive matching.
 * Leftmost Longest::                   How much text matches.
 * Computed Regexps::                   Using Dynamic Regexps.
-* Locales::                            How the locale affects things.
 * Records::                            Controlling how data is split into
                                        records.
 * Fields::                             An introduction to fields.
@@ -467,6 +466,7 @@ particular records in a file and perform operations upon them.
                                        third subexpression.
 * Function Calls::                     A function call is an expression.
 * Precedence::                         How various operators nest.
+* Locales::                            How the locale affects things.
 * Pattern Overview::                   What goes into a pattern.
 * Regexp Patterns::                    Using regexps as patterns.
 * Expression Patterns::                Any expression can be used as a
@@ -673,6 +673,8 @@ particular records in a file and perform operations upon them.
 * POSIX/GNU::                          The extensions in @command{gawk} not in
                                        POSIX @command{awk}.
 * Common Extensions::                  Common Extensions Summary.
+* Ranges and Locales::                 How locales used to affect regexp
+                                       ranges.
 * Contributors::                       The major contributors to
                                        @command{gawk}.
 * Gawk Distribution::                  What is in the @command{gawk}
@@ -4003,7 +4005,6 @@ regular expressions work, we present more complicated instances.
 * Case-sensitivity::            How to do case-insensitive matching.
 * Leftmost Longest::            How much text matches.
 * Computed Regexps::            Using Dynamic Regexps.
-* Locales::                     How the locale affects things.
 @end menu
 
 @node Regexp Usage
@@ -4530,15 +4531,14 @@ As in arithmetic, parentheses can change how operators are grouped.
 
 @cindex POSIX @command{awk}, regular expressions and
 @cindex @command{gawk}, regular expressions, precedence
-In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and @samp{?} operators
-stand for themselves when there is nothing in the regexp that precedes them.
-For example, @code{/+/} matches a literal plus sign.  However, many other versions of
-@command{awk} treat such a usage as a syntax error.
-
-If @command{gawk} is in compatibility mode
-(@pxref{Options}),
-interval expressions are not available in
-regular expressions.
+In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
+@samp{?} operators stand for themselves when there is nothing in the
+regexp that precedes them.  For example, @code{/+/} matches a literal
+plus sign.  However, many other versions of @command{awk} treat such a
+usage as a syntax error.
+
+If @command{gawk} is in compatibility mode (@pxref{Options}), interval
+expressions are not available in regular expressions.
 @c ENDOFRANGE regexpo
 
 @node Bracket Expressions
@@ -4548,15 +4548,16 @@ regular expressions.
 @cindex bracket expressions, range expressions
 @cindex range expressions (regexps)
 
+As mentioned earlier, a bracket expression matches any character amongst
+those listed between the opening and closing square brackets.
+
 Within a bracket expression, a @dfn{range expression} consists of two
 characters separated by a hyphen.  It matches any single character that
-sorts between the two characters, using the locale's
-collating sequence and character set.
-For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
-
-Unfortunately, providing simple character ranges such as @samp{[a-z]}
-usually does not work like you might expect, due to locale-related issues.
-This is discussed more fully, in @ref{Locales}.
+sorts between the two characters, based upon the system's native character
+set.  For example, @samp{[0-9]} is equivalent to @samp{[0123456789]}.
+(See @ref{Ranges and Locales}, for an explanation of how the POSIX
+standard and @command{gawk} have changed over time.  This is mainly
+of historical interest.)
 
 @cindex @code{\} (backslash), in bracket expressions
 @cindex backslash (@code{\}), in bracket expressions
@@ -4625,8 +4626,7 @@ control characters, or space characters).
 For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/}
 to match alphanumeric characters.  If your
 character set had other alphabetic characters in it, this would not
-match them, and if your character set collated differently from
-ASCII, this might not even match the ASCII alphanumeric characters.
+match them.
 With the POSIX character classes, you can write
 @code{/[[:alnum:]]/} to match the alphabetic
 and numeric characters in your character set.
@@ -5105,94 +5105,6 @@ occur often in practice, but it's worth noting for future reference.
 @c ENDOFRANGE regexpd
 @c ENDOFRANGE regexp
 
-@node Locales
-@section Where You Are Makes A Difference
-@cindex locale, definition of
-
-Modern systems support the notion of @dfn{locales}: a way to tell
-the system about the local character set and language.  The current
-locale setting can affect the way regexp matching works, often
-in surprising ways.
-
-For example, in the default @code{"C"} locale, @samp{[a-dx-z]} is equivalent to
-@samp{[abcdxyz]}.  Many locales sort characters in dictionary order,
-and in these locales, @samp{[a-dx-z]} is typically not equivalent to
-@samp{[abcdxyz]}; instead it might be equivalent to @samp{[aBbCcdXxYyz]},
-for example.
-
-This point needs to be emphasized: Much literature teaches that you should
-use @samp{[a-z]} to match a lowercase character.  But on systems with
-non-ASCII locales, this also matches all of the uppercase characters
-except @samp{Z}!  This is a continuous cause of confusion, even well
-into the twenty-first century.
-
-@quotation NOTE
-In an attempt to end the confusion once and for all,
-when not in POSIX mode (@pxref{Options}),
-@command{gawk} expands ranges into the characters they
-include, based only on the machine character set.
-This restores the traditional, pre-POSIX, pre-locales
-behavior.  However, you should read the rest of this section 
-so that you can write portable scripts, instead of relying
-on behavior specific to @command{gawk}.
-@end quotation
-
-To obtain the traditional interpretation of bracket expressions, you can
-use the @code{"C"} locale by setting the @env{LC_ALL} environment variable to the
-value @samp{C}.  However, it is best to just use POSIX character classes,
-such as @samp{[[:lower:]]} to match specific classes of characters.
-
-To demonstrate these issues, the following example uses the @code{sub()}
-function, which does text replacement (@pxref{String Functions}).  Here,
-the intent is to remove trailing uppercase characters:
-
-@example
-$ @kbd{echo something1234abc | gawk --posix '@{ sub("[A-Z]*$", ""); print @}'}
-@print{} something1234a
-@end example
-
-@noindent
-This output is unexpected, since the @samp{bc} at the end of
-@samp{something1234abc} should not normally match @samp{[A-Z]*}.
-This result is due to the locale setting (and thus you may not see
-it on your system).  There are two fixes.  The first is to use the
-POSIX character class @samp{[[:upper:]]}, instead of @samp{[A-Z]}.
-(This is preferred, since then your program will work everywhere.)
-
-The second is to change the locale setting in the environment, before
-running @command{gawk}, by using the shell statements:
-
-@example
-LANG=C LC_ALL=C
-export LANG LC_ALL
-@end example
-
-The setting @samp{C} forces @command{gawk} to behave in the traditional
-Unix manner, where case distinctions do matter.
-You may wish to put these statements into your shell startup file,
-e.g., @file{$HOME/.profile}.
-
-Similar considerations apply to other ranges.  For example,
-@samp{["-/]} is perfectly valid in ASCII, but is not valid in many
-Unicode locales, such as @samp{en_US.UTF-8}.  (In general, such
-ranges should be avoided; either list the characters individually,
-or use a POSIX character class such as @samp{[[:punct:]]}.)
-
-An additional factor relates to splitting records.
-For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
-For other single-character record separators, using @samp{LC_ALL=C}
-will give you much better performance when reading records.  Otherwise,
-@command{gawk} has to make several function calls, @emph{per input
-character}, to find the record terminator.
-
-According to POSIX, string comparison is also affected by locales
-(similar to regular expressions).  The details are presented in
-@ref{POSIX String Comparison}.
-
-Finally, the locale affects the value of the decimal point character
-used when @command{gawk} parses input data.  This is discussed in
-detail in @ref{Conversion}.
-
 @node Reading Files
 @chapter Reading Input Files
 
@@ -8773,6 +8685,7 @@ combinations of these with various operators.
 * Truth Values and Conditions:: Testing for true and false.
 * Function Calls::              A function call is an expression.
 * Precedence::                  How various operators nest.
+* Locales::                     How the locale affects things.
 @end menu
 
 @node Values
@@ -10933,6 +10846,33 @@ For maximum portability, do not use them.
 @end quotation
 @c ENDOFRANGE prec
 @c ENDOFRANGE oppr
+
+@node Locales
+@section Where You Are Makes A Difference
+@cindex locale, definition of
+
+Modern systems support the notion of @dfn{locales}: a way to tell
+the system about the local character set and language.
+
+Once upon a time, the locale setting used to affect regexp matching
+(@pxref{Ranges and Locales}), but this is no longer true.
+
+Locales can affect record splitting.
+For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant.
+For other single-character record separators, setting @samp{LC_ALL=C}
+in the environment
+will give you much better performance when reading records.  Otherwise,
+@command{gawk} has to make several function calls, @emph{per input
+character}, to find the record terminator.
+
+According to POSIX, string comparison is also affected by locales
+(similar to regular expressions).  The details are presented in
+@ref{POSIX String Comparison}.
+
+Finally, the locale affects the value of the decimal point character
+used when @command{gawk} parses input data.  This is discussed in
+detail in @ref{Conversion}.
+
 @c ENDOFRANGE exps
 
 @node Patterns and Actions
@@ -26434,6 +26374,7 @@ of the @value{DOCUMENT} where you can find more information.
 * POSIX/GNU::                   The extensions in @command{gawk} not in POSIX
                                 @command{awk}.
 * Common Extensions::           Common Extensions Summary.
+* Ranges and Locales::          How locales used to affect regexp ranges.
 * Contributors::                The major contributors to @command{gawk}.
 @end menu
 
@@ -26977,6 +26918,103 @@ the three most widely-used freely available versions of @command{awk}
 @item @code{BINMODE} variable @tab @tab X @tab X
 @end multitable
 
+@node Ranges and Locales
+@appendixsec Regexp Ranges and Locales: A Long Sad Story
+
+This @value{SECTION} describes the confusing history of ranges within
+regular expressions and their interactions with locales, and how this
+affected different versions of @command{gawk}.
+
+The original Unix tools that worked with regular expressions defined
+character ranges (such as @samp{[a-z]}) to match any character between
+the first character in the range and the last character in the range,
+inclusive.  Ordering was based on the numeric value of each character
+in the machine's native character set.  Thus, on ASCII-based systems,
+@samp{[a-z]} matched all the lowercase letters, and only the lowercase
+letters, since the numeric values for the letters from @samp{a} through
+@samp{z} were contiguous.  (On an EBCDIC system, the range @samp{[a-z]}
+includes additional, non-alphabetic characters as well.)
+
+Almost all introductory Unix literature explained range expressions
+as working in this fashion, and in particular, would teach that the
+``correct'' way to match lowercase letters was with @samp{[a-z]}, and
+that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
+And indeed, this was true.
+
+The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
+Since many locales include other letters besides the plain twenty-six
+letters of the American English alphabet, the POSIX standard added
+character classes (@pxref{Bracket Expressions}) as a way to match
+different kinds of characters besides the traditional ones in the ASCII
+character set.
+
+However, the standard @emph{changed} the interpretation of range expressions.
+In the @code{"C"} and @code{"POSIX"} locales, a range expression like
+@samp{[a-dx-z]} is still equivalent to @samp{[abcdxyz]}, as in ASCII.
+But outside those locales, the ordering was defined to be based on
+@dfn{collation order}.
+
+In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
+In other words, these locales sort characters in dictionary order,
+and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
+instead it might be equivalent to @samp{[aBbCcdXxYyz]}, for example.
+
+This point needs to be emphasized: Much literature teaches that you should
+use @samp{[a-z]} to match a lowercase character.  But on systems with
+non-ASCII locales, this also matched all of the uppercase characters
+except @samp{Z}!  This was a continuous cause of confusion, even well
+into the twenty-first century.
+
+To demonstrate these issues, the following example uses the @code{sub()}
+function, which does text replacement (@pxref{String Functions}).  Here,
+the intent is to remove trailing uppercase characters:
+
+@example
+$ @kbd{echo something1234abc | gawk-3.1.8 '@{ sub("[A-Z]*$", ""); print @}'}
+@print{} something1234a
+@end example
+
+@noindent
+This output is unexpected, since the @samp{bc} at the end of
+@samp{something1234abc} should not normally match @samp{[A-Z]*}.
+This result is due to the locale setting (and thus you may not see
+it on your system).
+
+Similar considerations apply to other ranges.  For example, @samp{["-/]}
+is perfectly valid in ASCII, but is not valid in many Unicode locales,
+such as @samp{en_US.UTF-8}.
+
+Early versions of @command{gawk} used regexp matching code that was not
+locale aware, so ranges had their traditional interpretation.
+
+When @command{gawk} switched to using locale-aware regexp matchers,
+the problems began; especially as both GNU/Linux and commercial Unix
+vendors started implementing non-ASCII locales, @emph{and making them
+the default}.  Perhaps the most frequently asked question became something
+like ``why does @code{[A-Z]} match lowercase letters?!?''
+
+This situation existed for close to 10 years, if not more, and
+the @command{gawk} maintainer grew weary of trying to explain that
+@command{gawk} was being nicely standards-compliant, and that the issue
+was in the user's locale.  During the development of version 4.0,
+he modified @command{gawk} to always treat ranges in the original,
+pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).
+
+Fortunately, shortly before the final release of @command{gawk} 4.0,
+the maintainer learned that the 2008 standard had changed the
+definition of ranges, such that outside the @code{"C"} and @code{"POSIX"}
+locales, the meaning of range expressions was
+@emph{undefined}.@footnote{See
+@uref{http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05, the standard}
+and
+@uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.}
+
+By using this lovely technical term, the standard gives license
+to implementors to implement ranges in whatever way they choose.
+The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all
+cases: the default regexp matching; with @option{--traditional}, and with
+@option{--posix}; in all cases, @command{gawk} remains POSIX compliant.
+
 @node Contributors
 @appendixsec Major Contributors to @command{gawk}
 @cindex @command{gawk}, list of contributors to
diff --git a/re.c b/re.c
index 3dce1d5..2e1a37e 100644
--- a/re.c
+++ b/re.c
@@ -382,13 +382,26 @@ resetup()
 {
        if (do_posix)
                syn = RE_SYNTAX_POSIX_AWK;      /* strict POSIX re's */
-       else if (do_traditional) {
+       else if (do_traditional)
                syn = RE_SYNTAX_AWK;            /* traditional Unix awk re's */
-               syn |= RE_RANGES_IGNORE_LOCALES;
-       } else
+       else
                syn = RE_SYNTAX_GNU_AWK;        /* POSIX re's + GNU ops */
 
        /*
+        * As of POSIX 1003.1-2008 (see rule 7 of
+        * http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
+        * and the rationale, at http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05)
+        * POSIX changed ranges outside the POSIX locale from requiring
+        * Collation Element Order to being "undefined". This gives an
+        * implementation, like gawk, the freedom to do ranges as it
+        * pleases.
+        *
+        * We very much please to always use numeric ordering, as
+        * the Good Lord intended.
+        */
+       syn |= RE_RANGES_IGNORE_LOCALES;
+
+       /*
         * Interval expressions are now on by default, as POSIX is
         * wide-spread enough that people want it. The do_intervals
         * variable remains for use with --traditional.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog     |    7 +
 doc/ChangeLog |    7 +
 doc/gawk.info | 1102 ++++++++++++++++++++++++++++++---------------------------
 doc/gawk.texi |  256 ++++++++------
 re.c          |   19 +-
 5 files changed, 753 insertions(+), 638 deletions(-)
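
Illustration (not part of the commit): the range semantics that the
re.c change settles on, ordering by numeric character value regardless
of locale, can be sketched in a few lines of C.  The helper names
in_range_numeric() and strip_trailing_range() are hypothetical and do
not exist in gawk; the real matching is done by the regex code once
RE_RANGES_IGNORE_LOCALES is set.  The sketch assumes an ASCII-based
execution character set.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not gawk code: does c fall within lo-hi when
 * characters are ordered by numeric value, ignoring the locale?
 */
int
in_range_numeric(unsigned char c, unsigned char lo, unsigned char hi)
{
	assert(lo <= hi);	/* a reversed range such as z-a is a caller error */
	return lo <= c && c <= hi;
}

/*
 * Hypothetical helper: remove trailing characters that fall within
 * lo-hi, roughly what sub("[A-Z]*$", "") does to a record.
 * Modifies s in place and returns it.
 */
char *
strip_trailing_range(char *s, unsigned char lo, unsigned char hi)
{
	size_t len = strlen(s);

	while (len > 0 && in_range_numeric((unsigned char) s[len - 1], lo, hi))
		len--;
	s[len] = '\0';
	return s;
}
```

With numeric ordering, calling strip_trailing_range() with 'A' and 'Z'
on "something1234abc" leaves the string unchanged, because no lowercase
letter lies between 'A' and 'Z' in ASCII.  A collation-based range in a
locale such as en_US.UTF-8 could instead treat "bc" as part of the
range, which is the surprise described in the new appendix text.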


hooks/post-receive
-- 
gawk


