[gawk-diffs] [SCM] gawk branch, gawk-4.0-stable, updated. 7bfc288d27bacb


From: Arnold Robbins
Subject: [gawk-diffs] [SCM] gawk branch, gawk-4.0-stable, updated. 7bfc288d27bacb715ff63dbf71be53304917685a
Date: Fri, 20 Jul 2012 09:27:14 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".

The branch, gawk-4.0-stable has been updated
       via  7bfc288d27bacb715ff63dbf71be53304917685a (commit)
      from  4fe1f4ac1aa0e4b99c9abb26794fc0d10ebb77c6 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=7bfc288d27bacb715ff63dbf71be53304917685a

commit 7bfc288d27bacb715ff63dbf71be53304917685a
Author: Arnold D. Robbins <address@hidden>
Date:   Fri Jul 20 12:26:59 2012 +0300

    Fix doc on ranges and locales.

diff --git a/doc/ChangeLog b/doc/ChangeLog
index e56c35a..75b3915 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2012-07-20         Arnold D. Robbins     <address@hidden>
+
+       * gawk.texi (Ranges and Locales): Clarified ranges and
+       locales.
+
 2012-07-13         Arnold D. Robbins     <address@hidden>
 
        * gawk.texi (Getline Notes): Discuss side effects in
diff --git a/doc/gawk.info b/doc/gawk.info
index c278148..c485e4c 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -20163,7 +20163,7 @@ additional, non-alphabetic characters as well.)
 as working in this fashion, and in particular, would teach that the
 "correct" way to match lowercase letters was with `[a-z]', and that
 `[A-Z]' was the "correct" way to match uppercase letters.  And indeed,
-this was true.
+this was true.(1)
 
    The 1993 POSIX standard introduced the idea of locales (*note
 Locales::).  Since many locales include other letters besides the plain
@@ -20181,13 +20181,14 @@ outside those locales, the ordering was defined to be based on
    In many locales, `A' and `a' are both less than `B'.  In other
 words, these locales sort characters in dictionary order, and
 `[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might
-be equivalent to `[aBbCcdXxYyz]', for example.
+be equivalent to `[aBbCcDdXxYyZz]', for example.  (And to make things
+worse, on other systems, it might be equivalent to `[aAbBcCdDxXyYz]'.)
 
    This point needs to be emphasized: Much literature teaches that you
 should use `[a-z]' to match a lowercase character.  But on systems with
 non-ASCII locales, this also matched all of the uppercase characters
-except `Z'!  This was a continuous cause of confusion, even well into
-the twenty-first century.
+except `A' or `Z'!  This was a continuous cause of confusion, even well
+into the twenty-first century.
 
    To demonstrate these issues, the following example uses the `sub()'
 function, which does text replacement (*note String Functions::).  Here,
@@ -20218,12 +20219,12 @@ like "why does `[A-Z]' match lowercase letters?!?"
 nicely standards-compliant, and that the issue was in the user's
 locale.  During the development of version 4.0, he modified `gawk' to
 always treat ranges in the original, pre-POSIX fashion, unless
-`--posix' was used (*note Options::).
+`--posix' was used (*note Options::).(2)
 
    Fortunately, shortly before the final release of `gawk' 4.0, the
 maintainer learned that the 2008 standard had changed the definition of
 ranges, such that outside the `"C"' and `"POSIX"' locales, the meaning
-of range expressions was _undefined_.(1)
+of range expressions was _undefined_.(3)
 
    By using this lovely technical term, the standard gives license to
 implementors to implement ranges in whatever way they choose.  The
@@ -20233,7 +20234,14 @@ in all cases, `gawk' remains POSIX compliant.
 
    ---------- Footnotes ----------
 
-   (1) See the standard
+   (1) And Life was good.
+
+   (2) And thus was born the Campain for Rational Range Interpretation
+(or RRI). A number of GNU tools, such as `grep' and `sed', have either
+implemented this change, or will soon.  Thanks to Karl Berry for
+coining the phrase "Rational Range Interpretation."
+
+   (3) See the standard
 (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05)
 and its rationale
 (http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05).
@@ -27817,61 +27825,63 @@ Node: BTL798181
 Node: POSIX/GNU798915
 Node: Common Extensions804066
 Node: Ranges and Locales805173
-Ref: Ranges and Locales-Footnote-1809777
-Node: Contributors809998
-Node: Installation814260
-Node: Gawk Distribution815154
-Node: Getting815638
-Node: Extracting816464
-Node: Distribution contents818156
-Node: Unix Installation823378
-Node: Quick Installation823995
-Node: Additional Configuration Options825957
-Node: Configuration Philosophy827434
-Node: Non-Unix Installation829776
-Node: PC Installation830234
-Node: PC Binary Installation831533
-Node: PC Compiling833381
-Node: PC Testing836325
-Node: PC Using837501
-Node: Cygwin841686
-Node: MSYS842686
-Node: VMS Installation843200
-Node: VMS Compilation843803
-Ref: VMS Compilation-Footnote-1844810
-Node: VMS Installation Details844868
-Node: VMS Running846503
-Node: VMS Old Gawk848110
-Node: Bugs848584
-Node: Other Versions852436
-Node: Notes857717
-Node: Compatibility Mode858409
-Node: Additions859192
-Node: Accessing The Source860004
-Node: Adding Code861429
-Node: New Ports867396
-Node: Dynamic Extensions871509
-Node: Internals872885
-Node: Plugin License881988
-Node: Sample Library882622
-Node: Internal File Description883308
-Node: Internal File Ops887023
-Ref: Internal File Ops-Footnote-1891804
-Node: Using Internal File Ops891944
-Node: Future Extensions894321
-Node: Basic Concepts896825
-Node: Basic High Level897582
-Ref: Basic High Level-Footnote-1901617
-Node: Basic Data Typing901802
-Node: Floating Point Issues906327
-Node: String Conversion Precision907410
-Ref: String Conversion Precision-Footnote-1909110
-Node: Unexpected Results909219
-Node: POSIX Floating Point Problems911045
-Ref: POSIX Floating Point Problems-Footnote-1914750
-Node: Glossary914788
-Node: Copying939764
-Node: GNU Free Documentation License977321
-Node: Index1002458
+Ref: Ranges and Locales-Footnote-1809884
+Ref: Ranges and Locales-Footnote-2809911
+Ref: Ranges and Locales-Footnote-3810171
+Node: Contributors810392
+Node: Installation814654
+Node: Gawk Distribution815548
+Node: Getting816032
+Node: Extracting816858
+Node: Distribution contents818550
+Node: Unix Installation823772
+Node: Quick Installation824389
+Node: Additional Configuration Options826351
+Node: Configuration Philosophy827828
+Node: Non-Unix Installation830170
+Node: PC Installation830628
+Node: PC Binary Installation831927
+Node: PC Compiling833775
+Node: PC Testing836719
+Node: PC Using837895
+Node: Cygwin842080
+Node: MSYS843080
+Node: VMS Installation843594
+Node: VMS Compilation844197
+Ref: VMS Compilation-Footnote-1845204
+Node: VMS Installation Details845262
+Node: VMS Running846897
+Node: VMS Old Gawk848504
+Node: Bugs848978
+Node: Other Versions852830
+Node: Notes858111
+Node: Compatibility Mode858803
+Node: Additions859586
+Node: Accessing The Source860398
+Node: Adding Code861823
+Node: New Ports867790
+Node: Dynamic Extensions871903
+Node: Internals873279
+Node: Plugin License882382
+Node: Sample Library883016
+Node: Internal File Description883702
+Node: Internal File Ops887417
+Ref: Internal File Ops-Footnote-1892198
+Node: Using Internal File Ops892338
+Node: Future Extensions894715
+Node: Basic Concepts897219
+Node: Basic High Level897976
+Ref: Basic High Level-Footnote-1902011
+Node: Basic Data Typing902196
+Node: Floating Point Issues906721
+Node: String Conversion Precision907804
+Ref: String Conversion Precision-Footnote-1909504
+Node: Unexpected Results909613
+Node: POSIX Floating Point Problems911439
+Ref: POSIX Floating Point Problems-Footnote-1915144
+Node: Glossary915182
+Node: Copying940158
+Node: GNU Free Documentation License977715
+Node: Index1002852
 
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index fb17b71..bf30d01 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -66,6 +66,15 @@
 @set DARKCORNER (d.c.)
 @set COMMONEXT (c.e.)
 @end ifdocbook
+@ifxml
+@set DOCUMENT book
+@set CHAPTER chapter
+@set APPENDIX appendix
+@set SECTION section
+@set SUBSECTION subsection
+@set DARKCORNER (d.c.)
+@set COMMONEXT (c.e.)
+@end ifxml
 @ifplaintext
 @set DOCUMENT book
 @set CHAPTER chapter
@@ -27062,7 +27071,7 @@ Almost all introductory Unix literature explained range expressions
 as working in this fashion, and in particular, would teach that the
 ``correct'' way to match lowercase letters was with @samp{[a-z]}, and
 that @samp{[A-Z]} was the ``correct'' way to match uppercase letters.
-And indeed, this was true.
+And indeed, this was true.@footnote{And Life was good.}
 
 The 1993 POSIX standard introduced the idea of locales (@pxref{Locales}).
 Since many locales include other letters besides the plain twenty-six
@@ -27080,12 +27089,14 @@ But outside those locales, the ordering was defined to be based on
 In many locales, @samp{A} and @samp{a} are both less than @samp{B}.
 In other words, these locales sort characters in dictionary order,
 and @samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]};
-instead it might be equivalent to @samp{[aBbCcdXxYyz]}, for example.
+instead it might be equivalent to @samp{[aBbCcDdXxYyZz]}, for example.
+(And to make things worse, on other systems, it might be equivalent to
+@samp{[aAbBcCdDxXyYz]}.)
 
 This point needs to be emphasized: Much literature teaches that you should
 use @samp{[a-z]} to match a lowercase character.  But on systems with
 non-ASCII locales, this also matched all of the uppercase characters
-except @samp{Z}!  This was a continuous cause of confusion, even well
+except @samp{A} or @samp{Z}!  This was a continuous cause of confusion, even well
 into the twenty-first century.
 
 To demonstrate these issues, the following example uses the @code{sub()}
@@ -27121,13 +27132,16 @@ the @command{gawk} maintainer grew weary of trying to explain that
 @command{gawk} was being nicely standards-compliant, and that the issue
 was in the user's locale.  During the development of version 4.0,
 he modified @command{gawk} to always treat ranges in the original,
-pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).
+pre-POSIX fashion, unless @option{--posix} was used (@pxref{Options}).@footnote{And
+thus was born the Campain for Rational Range Interpretation (or RRI). A number
+of GNU tools, such as @command{grep} and @command{sed}, have either
+implemented this change, or will soon.  Thanks to Karl Berry for coining the phrase
+``Rational Range Interpretation.''}
 
 Fortunately, shortly before the final release of @command{gawk} 4.0,
 the maintainer learned that the 2008 standard had changed the
 definition of ranges, such that outside the @code{"C"} and @code{"POSIX"}
-locales, the meaning of range expressions was
-@emph{undefined}.@footnote{See
+locales, the meaning of range expressions was @emph{undefined}.@footnote{See
 @uref{http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05, the standard}
 and
 @uref{http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html#tag_21_09_03_05, its rationale}.}

-----------------------------------------------------------------------

Summary of changes:
 doc/ChangeLog |    5 ++
 doc/gawk.info |  136 ++++++++++++++++++++++++++++++--------------------------
 doc/gawk.texi |   26 ++++++++---
 3 files changed, 98 insertions(+), 69 deletions(-)
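
The clarified text says that, outside the "C" and "POSIX" locales, `[a-z]`
could match every uppercase letter except `A` or `Z`, depending on the
system. The Python sketch below is only an illustration of why that happens.
It does not use gawk or any real locale data: the three orderings
(CODEPOINT, UPPER_FIRST, LOWER_FIRST) and the helper names are assumptions
made up for this example, modelling a range as "every character that
collates between the two endpoints".

```python
import string

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase

# Hypothetical collation orders over the 52 ASCII letters (illustration only).
CODEPOINT = UPPER + LOWER                                      # "ABC...Zabc...z"
UPPER_FIRST = "".join(u + l for u, l in zip(UPPER, LOWER))     # "AaBbCc...Zz"
LOWER_FIRST = "".join(l + u for u, l in zip(UPPER, LOWER))     # "aAbBcC...zZ"


def expand_range(lo, hi, order):
    """Return the characters from lo through hi, inclusive, as sorted by order."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


def uppercase_missing(order):
    """Return the uppercase letters that the range a-z does NOT cover."""
    matched = set(expand_range("a", "z", order))
    return sorted(set(UPPER) - matched)


if __name__ == "__main__":
    for name, order in [("codepoint", CODEPOINT),
                        ("upper-first", UPPER_FIRST),
                        ("lower-first", LOWER_FIRST)]:
        missing = "".join(uppercase_missing(order))
        print(f"{name:12s} [a-z] misses uppercase: {missing}")
```

Under the code-point ordering, the range covers no uppercase letters at all,
which is the pre-POSIX behavior that gawk 4.0 restores. Under the two
interleaved orderings, exactly one uppercase letter falls outside the range:
`A` when uppercase sorts first, `Z` when lowercase sorts first.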


hooks/post-receive
-- 
gawk


