gawk-diffs

[SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4135-gd9ac13f


From: Arnold Robbins
Subject: [SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4135-gd9ac13f
Date: Sun, 4 Oct 2020 05:33:55 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".

The branch, gawk-5.1-stable has been updated
       via  d9ac13ffbb4dfe7ea18884f0a8898c0decbfa07b (commit)
       via  0ee9fa0f6ef9ac84a254ebf4068d15a474e0605b (commit)
      from  d8e0c7f7f2d397484f11df9d3b4805d5a343a1b6 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=d9ac13ffbb4dfe7ea18884f0a8898c0decbfa07b

commit d9ac13ffbb4dfe7ea18884f0a8898c0decbfa07b
Author: Arnold D. Robbins <arnold@skeeve.com>
Date:   Sun Oct 4 12:33:36 2020 +0300

    Another small doc fix.

diff --git a/doc/ChangeLog b/doc/ChangeLog
index a6ad9a7..60b891e 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -4,9 +4,14 @@
 
        Unrelated:
 
-       * gawktexi (Wc Program): Update to POSIX, support both bytes
+       * gawktexi.in (Wc Program): Update to POSIX, support both bytes
        and characters via the gawkextlib mbs extension.
 
+       Unrelated:
+
+       * gawktexi.in: Remove TODO at end of file related to recursion;
+       it's already handled.
+
 2020-10-01         Arnold D. Robbins     <arnold@skeeve.com>
 
        * gawktexi.in (Split Program): Rewrite split to be POSIX
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 91625d0..08b30ee 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -46336,8 +46336,3 @@ But to use it you have to say
 which sorta sucks.
 
 TODO:
-FIXME:
-Add a section explaining recursion from ground zero. Probably
-easiest to do it with factorial as the example. Explain that
-recursion needs a stopping condition. Thanks to
-Bill Duncan <bduncan@beachnet.org> for the suggestion.
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index f982ae8..7558907 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -45307,8 +45307,3 @@ But to use it you have to say
 which sorta sucks.
 
 TODO:
-FIXME:
-Add a section explaining recursion from ground zero. Probably
-easiest to do it with factorial as the example. Explain that
-recursion needs a stopping condition. Thanks to
-Bill Duncan <bduncan@beachnet.org> for the suggestion.

http://git.sv.gnu.org/cgit/gawk.git/commit/?id=0ee9fa0f6ef9ac84a254ebf4068d15a474e0605b

commit 0ee9fa0f6ef9ac84a254ebf4068d15a474e0605b
Author: Arnold D. Robbins <arnold@skeeve.com>
Date:   Sun Oct 4 12:27:34 2020 +0300

    Update wc example to modern POSIX.

diff --git a/awklib/eg/prog/wc.awk b/awklib/eg/prog/wc.awk
index c46d098..4680473 100644
--- a/awklib/eg/prog/wc.awk
+++ b/awklib/eg/prog/wc.awk
@@ -2,39 +2,46 @@
 #
 # Arnold Robbins, arnold@skeeve.com, Public Domain
 # May 1993
+# Revised September 2020
 
 # Options:
 #    -l    only count lines
 #    -w    only count words
-#    -c    only count characters
+#    -c    only count bytes
+#    -m    only count characters
 #
-# Default is to count lines, words, characters
+# Default is to count lines, words, bytes
 #
 # Requires getopt() and file transition library functions
+# Requires mbs extension from gawkextlib
+
+@load "mbs"
 
 BEGIN {
     # let getopt() print a message about
     # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
+    while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) {
         if (c == "l")
             do_lines = 1
         else if (c == "w")
             do_words = 1
         else if (c == "c")
+            do_bytes = 1
+        else if (c == "m")
             do_chars = 1
     }
     for (i = 1; i < Optind; i++)
         ARGV[i] = ""
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+    # if no options, do lines, words, bytes
+    if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+        do_lines = do_words = do_bytes = 1
 
     print_total = (ARGC - i > 1)
 }
 function beginfile(file)
 {
-    lines = words = chars = 0
+    lines = words = chars = bytes = 0
     fname = FILENAME
 }
 function endfile(file)
@@ -42,17 +49,21 @@ function endfile(file)
     tlines += lines
     twords += words
     tchars += chars
+    tbytes += bytes
     if (do_lines)
         printf "\t%d", lines
     if (do_words)
         printf "\t%d", words
     if (do_chars)
         printf "\t%d", chars
+    if (do_bytes)
+        printf "\t%d", bytes
     printf "\t%s\n", fname
 }
 # do per line
 {
     chars += length($0) + 1    # get newline
+    bytes += mbs_length($0) + 1
     lines++
     words += NF
 }
@@ -64,6 +75,8 @@ END {
             printf "\t%d", twords
         if (do_chars)
             printf "\t%d", tchars
+        if (do_bytes)
+            printf "\t%d", tbytes
         print "\ttotal"
     }
 }
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 0fbba4e..a6ad9a7 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -2,6 +2,11 @@
 
        * gawktexi.in: Minor edits.
 
+       Unrelated:
+
+       * gawktexi (Wc Program): Update to POSIX, support both bytes
+       and characters via the gawkextlib mbs extension.
+
 2020-10-01         Arnold D. Robbins     <arnold@skeeve.com>
 
        * gawktexi.in (Split Program): Rewrite split to be POSIX
diff --git a/doc/gawk.info b/doc/gawk.info
index cf00ed1..320e065 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -19011,10 +19011,76 @@ File: gawk.info,  Node: Wc Program,  Prev: Uniq Program,  Up: Clones
 11.2.7 Counting Things
 ----------------------
 
-The 'wc' (word count) utility counts lines, words, and characters in one
-or more input files.  Its usage is as follows:
+The 'wc' (word count) utility counts lines, words, characters and bytes
+in one or more input files.
 
-     'wc' ['-lwc'] [FILES ...]
+* Menu:
+
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* wc program::                  Code for 'wc.awk'.
+
+
+File: gawk.info,  Node: Bytes vs. Characters,  Next: Using extensions,  Up: Wc Program
+
+11.2.7.1 Modern Character Sets
+..............................
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC, which
+each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+   Today, the most popular character set in use is Unicode (of which
+ASCII is a pure subset).  Unicode provides tens of thousands of unique
+characters (called "code points") to cover most existing human languages
+(living and dead) and a number of nonhuman ones as well (such as Klingon
+and J.R.R. Tolkien's elvish languages).
+
+   To save space in files, Unicode code points are "encoded", where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such "multibyte encodings".
+
+   The POSIX standard requires that 'awk' function in terms of
+characters, not bytes.  Thus in 'gawk', 'length()', 'substr()',
+'split()', 'match()' and the other string functions (*note String
+Functions::) all work in terms of characters in the local character set,
+and not in terms of bytes.  (Not all 'awk' implementations do so,
+though).
+
+   There is no standard, built-in way to distinguish characters from
+bytes in an 'awk' program.  For an 'awk' implementation of 'wc', which
+needs to make such a distinction, we will have to use an external
+extension.
+
+
+File: gawk.info,  Node: Using extensions,  Next: wc program,  Prev: Bytes vs. Characters,  Up: Wc Program
+
+11.2.7.2 A Brief Introduction To Extensions
+...........................................
+
+Loadable extensions are presented in full detail in *note Dynamic
+Extensions::.  They provide a way to add functions to 'gawk' which can
+call out to other facilities written in C or C++.
+
+   For the purposes of 'wc.awk', it's enough to know that the extension
+is loaded with the '@load' directive, and the additional function we
+will use is called 'mbs_length()'.  This function returns the number of
+bytes in a string, and not the number of characters.
+
+   The '"mbs"' extension comes from the 'gawkextlib' project.  *Note
+gawkextlib:: for more information.
+
+
+File: gawk.info,  Node: wc program,  Prev: Using extensions,  Up: Wc Program
+
+11.2.7.3 Code for 'wc.awk'
+..........................
+
+The usage for 'wc' is as follows:
+
+     'wc' ['-lwcm'] [FILES ...]
 
    If no files are specified on the command line, 'wc' reads its
 standard input.  If there are multiple files, it also prints total
@@ -19031,21 +19097,27 @@ follows:
      data.
 
 '-c'
+     Count only bytes.  Once upon a time, the 'c' in this option stood
+     for "characters."  But, as explained earlier, bytes and characters
+     are no longer synonymous with each other.
+
+'-m'
      Count only characters.
 
    Implementing 'wc' in 'awk' is particularly elegant, because 'awk'
 does a lot of the work for us; it splits lines into words (i.e., fields)
 and counts them, it counts lines (i.e., records), and it can easily tell
-us how long a line is.
+us how long a line is in characters.
 
    This program uses the 'getopt()' library function (*note Getopt
 Function::) and the file-transition functions (*note Filetrans
 Function::).
 
-   This version has one notable difference from traditional versions of
-'wc': it always prints the counts in the order lines, words, and
-characters.  Traditional versions note the order of the '-l', '-w', and
-'-c' options on the command line, and print the counts in that order.
+   This version has one notable difference from older versions of 'wc':
+it always prints the counts in the order lines, words, characters and
+bytes.  Older versions note the order of the '-l', '-w', and '-c'
+options on the command line, and print the counts in that order.  POSIX
+does not mandate this behavior, though.
 
    The 'BEGIN' rule does the argument processing.  The variable
 'print_total' is true if more than one file is named on the command
@@ -19056,40 +19128,46 @@ line:
      # Options:
      #    -l    only count lines
      #    -w    only count words
-     #    -c    only count characters
+     #    -c    only count bytes
+     #    -m    only count characters
      #
-     # Default is to count lines, words, characters
+     # Default is to count lines, words, bytes
      #
      # Requires getopt() and file transition library functions
+     # Requires mbs extension from gawkextlib
+
+     @load "mbs"
 
      BEGIN {
          # let getopt() print a message about
          # invalid options. we ignore them
-         while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
+         while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) {
              if (c == "l")
                  do_lines = 1
              else if (c == "w")
                  do_words = 1
              else if (c == "c")
+                 do_bytes = 1
+             else if (c == "m")
                  do_chars = 1
          }
          for (i = 1; i < Optind; i++)
              ARGV[i] = ""
 
-         # if no options, do all
-         if (! do_lines && ! do_words && ! do_chars)
-             do_lines = do_words = do_chars = 1
+         # if no options, do lines, words, bytes
+         if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+             do_lines = do_words = do_bytes = 1
 
          print_total = (ARGC - i > 1)
      }
 
    The 'beginfile()' function is simple; it just resets the counts of
-lines, words, and characters to zero, and saves the current file name in
-'fname':
+lines, words, characters and bytes to zero, and saves the current file
+name in 'fname':
 
      function beginfile(file)
      {
-         lines = words = chars = 0
+         lines = words = chars = bytes = 0
          fname = FILENAME
      }
 
@@ -19103,26 +19181,31 @@ those numbers for the file that was just read.  It relies on
          tlines += lines
          twords += words
          tchars += chars
+         tbytes += bytes
          if (do_lines)
              printf "\t%d", lines
          if (do_words)
              printf "\t%d", words
          if (do_chars)
              printf "\t%d", chars
+         if (do_bytes)
+             printf "\t%d", bytes
          printf "\t%s\n", fname
      }
 
    There is one rule that is executed for each line.  It adds the length
-of the record, plus one, to 'chars'.(1)  Adding one plus the record
-length is needed because the newline character separating records (the
-value of 'RS') is not part of the record itself, and thus not included
-in its length.  Next, 'lines' is incremented for each line read, and
+of the record, plus one, to 'chars'.  Adding one plus the record length
+is needed because the newline character separating records (the value of
+'RS') is not part of the record itself, and thus not included in its
+length.  Similarly, it adds the length of the record in bytes, plus one,
+to 'bytes'.  Next, 'lines' is incremented for each line read, and
 'words' is incremented by the value of 'NF', which is the number of
 "words" on this line:
 
      # do per line
      {
          chars += length($0) + 1    # get newline
+         bytes += mbs_length($0) + 1
          lines++
          words += NF
      }
@@ -19137,15 +19220,12 @@ in its length.  Next, 'lines' is incremented for each line read, and
                  printf "\t%d", twords
              if (do_chars)
                  printf "\t%d", tchars
+             if (do_bytes)
+                 printf "\t%d", tbytes
              print "\ttotal"
          }
      }
 
-   ---------- Footnotes ----------
-
-   (1) Because 'gawk' understands multibyte locales, this code counts
-characters, not bytes.
-
 
 File: gawk.info,  Node: Miscellaneous Programs,  Next: Programs Summary,  Prev: Clones,  Up: Sample Programs
 
@@ -35111,6 +35191,7 @@ Index
 * built-in functions:                    Functions.           (line   6)
 * built-in functions, evaluation order:  Calling Built-in.    (line  30)
 * BusyBox Awk:                           Other Versions.      (line  88)
+* bytes, counting:                       Wc Program.          (line   6)
 * C library functions, assert():         Assert Function.     (line   6)
 * C library functions, getopt():         Getopt Function.     (line  15)
 * C library functions, getpwent():       Passwd Functions.    (line  16)
@@ -35303,7 +35384,7 @@ Index
 * coprocesses <1>:                       Two-way I/O.         (line  27)
 * cos:                                   Numeric Functions.   (line  16)
 * cosine:                                Numeric Functions.   (line  16)
-* counting words, lines, and characters: Wc Program.          (line   6)
+* counting words, lines, characters, and bytes: Wc Program.  (line   6)
 * csh utility:                           Statements/Lines.    (line  45)
 * csh utility, POSIXLY_CORRECT environment variable: Options. (line 405)
 * csh utility, |& operator, comparison with: Two-way I/O.     (line  27)
@@ -37765,7 +37846,7 @@ Index
 * watchpoint (debugger):                 Debugging Terms.     (line  42)
 * watchpoints, show in debugger:         Debugger Info.       (line  51)
 * wc utility:                            Wc Program.          (line   6)
-* wc.awk program:                        Wc Program.          (line  46)
+* wc.awk program:                        wc program.          (line  51)
 * Weinberger, Peter:                     History.             (line  17)
 * Weinberger, Peter <1>:                 Contributors.        (line  12)
 * where debugger command (alias for backtrace): Execution Stack.
@@ -38140,266 +38221,268 @@ Ref: Split Program-Footnote-1766907
 Node: Tee Program767080
 Node: Uniq Program769870
 Node: Wc Program777434
-Ref: Wc Program-Footnote-1781689
-Node: Miscellaneous Programs781783
-Node: Dupword Program782996
-Node: Alarm Program785026
-Node: Translate Program789881
-Ref: Translate Program-Footnote-1794446
-Node: Labels Program794716
-Ref: Labels Program-Footnote-1798067
-Node: Word Sorting798151
-Node: History Sorting802223
-Node: Extract Program804448
-Node: Simple Sed812502
-Node: Igawk Program815576
-Ref: Igawk Program-Footnote-1829907
-Ref: Igawk Program-Footnote-2830109
-Ref: Igawk Program-Footnote-3830231
-Node: Anagram Program830346
-Node: Signature Program833408
-Node: Programs Summary834655
-Node: Programs Exercises835869
-Ref: Programs Exercises-Footnote-1839999
-Node: Advanced Features840085
-Node: Nondecimal Data842075
-Node: Array Sorting843666
-Node: Controlling Array Traversal844366
-Ref: Controlling Array Traversal-Footnote-1852734
-Node: Array Sorting Functions852852
-Ref: Array Sorting Functions-Footnote-1857943
-Node: Two-way I/O858139
-Ref: Two-way I/O-Footnote-1865860
-Ref: Two-way I/O-Footnote-2866047
-Node: TCP/IP Networking866129
-Node: Profiling869247
-Node: Advanced Features Summary878561
-Node: Internationalization880405
-Node: I18N and L10N881885
-Node: Explaining gettext882572
-Ref: Explaining gettext-Footnote-1888464
-Ref: Explaining gettext-Footnote-2888649
-Node: Programmer i18n888814
-Ref: Programmer i18n-Footnote-1893763
-Node: Translator i18n893812
-Node: String Extraction894606
-Ref: String Extraction-Footnote-1895738
-Node: Printf Ordering895824
-Ref: Printf Ordering-Footnote-1898610
-Node: I18N Portability898674
-Ref: I18N Portability-Footnote-1901130
-Node: I18N Example901193
-Ref: I18N Example-Footnote-1904468
-Ref: I18N Example-Footnote-2904541
-Node: Gawk I18N904650
-Node: I18N Summary905299
-Node: Debugger906640
-Node: Debugging907640
-Node: Debugging Concepts908081
-Node: Debugging Terms909890
-Node: Awk Debugging912465
-Ref: Awk Debugging-Footnote-1913410
-Node: Sample Debugging Session913542
-Node: Debugger Invocation914076
-Node: Finding The Bug915462
-Node: List of Debugger Commands921936
-Node: Breakpoint Control923269
-Node: Debugger Execution Control926963
-Node: Viewing And Changing Data930325
-Node: Execution Stack933866
-Node: Debugger Info935503
-Node: Miscellaneous Debugger Commands939574
-Node: Readline Support944636
-Node: Limitations945532
-Node: Debugging Summary948086
-Node: Namespaces949365
-Node: Global Namespace950476
-Node: Qualified Names951874
-Node: Default Namespace952873
-Node: Changing The Namespace953614
-Node: Naming Rules955228
-Node: Internal Name Management957076
-Node: Namespace Example958118
-Node: Namespace And Features960680
-Node: Namespace Summary962115
-Node: Arbitrary Precision Arithmetic963592
-Node: Computer Arithmetic965079
-Ref: table-numeric-ranges968845
-Ref: table-floating-point-ranges969338
-Ref: Computer Arithmetic-Footnote-1969996
-Node: Math Definitions970053
-Ref: table-ieee-formats973369
-Ref: Math Definitions-Footnote-1973972
-Node: MPFR features974077
-Node: FP Math Caution975795
-Ref: FP Math Caution-Footnote-1976867
-Node: Inexactness of computations977236
-Node: Inexact representation978196
-Node: Comparing FP Values979556
-Node: Errors accumulate980797
-Node: Getting Accuracy982230
-Node: Try To Round984940
-Node: Setting precision985839
-Ref: table-predefined-precision-strings986536
-Node: Setting the rounding mode988366
-Ref: table-gawk-rounding-modes988740
-Ref: Setting the rounding mode-Footnote-1992671
-Node: Arbitrary Precision Integers992850
-Ref: Arbitrary Precision Integers-Footnote-1996025
-Node: Checking for MPFR996174
-Node: POSIX Floating Point Problems997648
-Ref: POSIX Floating Point Problems-Footnote-11001933
-Node: Floating point summary1001971
-Node: Dynamic Extensions1004161
-Node: Extension Intro1005714
-Node: Plugin License1006980
-Node: Extension Mechanism Outline1007777
-Ref: figure-load-extension1008216
-Ref: figure-register-new-function1009781
-Ref: figure-call-new-function1010873
-Node: Extension API Description1012935
-Node: Extension API Functions Introduction1014648
-Ref: table-api-std-headers1016484
-Node: General Data Types1020733
-Ref: General Data Types-Footnote-11029363
-Node: Memory Allocation Functions1029662
-Ref: Memory Allocation Functions-Footnote-11034163
-Node: Constructor Functions1034262
-Node: API Ownership of MPFR and GMP Values1037728
-Node: Registration Functions1039041
-Node: Extension Functions1039741
-Node: Exit Callback Functions1045063
-Node: Extension Version String1046313
-Node: Input Parsers1046976
-Node: Output Wrappers1059697
-Node: Two-way processors1064209
-Node: Printing Messages1066474
-Ref: Printing Messages-Footnote-11067645
-Node: Updating ERRNO1067798
-Node: Requesting Values1068537
-Ref: table-value-types-returned1069274
-Node: Accessing Parameters1070210
-Node: Symbol Table Access1071447
-Node: Symbol table by name1071959
-Ref: Symbol table by name-Footnote-11074983
-Node: Symbol table by cookie1075111
-Ref: Symbol table by cookie-Footnote-11079296
-Node: Cached values1079360
-Ref: Cached values-Footnote-11082896
-Node: Array Manipulation1083049
-Ref: Array Manipulation-Footnote-11084140
-Node: Array Data Types1084177
-Ref: Array Data Types-Footnote-11086835
-Node: Array Functions1086927
-Node: Flattening Arrays1091425
-Node: Creating Arrays1098401
-Node: Redirection API1103168
-Node: Extension API Variables1106001
-Node: Extension Versioning1106712
-Ref: gawk-api-version1107141
-Node: Extension GMP/MPFR Versioning1108872
-Node: Extension API Informational Variables1110500
-Node: Extension API Boilerplate1111573
-Node: Changes from API V11115547
-Node: Finding Extensions1117119
-Node: Extension Example1117678
-Node: Internal File Description1118476
-Node: Internal File Ops1122556
-Ref: Internal File Ops-Footnote-11133906
-Node: Using Internal File Ops1134046
-Ref: Using Internal File Ops-Footnote-11136429
-Node: Extension Samples1136703
-Node: Extension Sample File Functions1138232
-Node: Extension Sample Fnmatch1145881
-Node: Extension Sample Fork1147368
-Node: Extension Sample Inplace1148586
-Node: Extension Sample Ord1152212
-Node: Extension Sample Readdir1153048
-Ref: table-readdir-file-types1153937
-Node: Extension Sample Revout1155004
-Node: Extension Sample Rev2way1155593
-Node: Extension Sample Read write array1156333
-Node: Extension Sample Readfile1158275
-Node: Extension Sample Time1159370
-Node: Extension Sample API Tests1161122
-Node: gawkextlib1161614
-Node: Extension summary1164532
-Node: Extension Exercises1168234
-Node: Language History1169476
-Node: V7/SVR3.11171132
-Node: SVR41173284
-Node: POSIX1174718
-Node: BTL1176099
-Node: POSIX/GNU1176828
-Node: Feature History1182606
-Node: Common Extensions1198925
-Node: Ranges and Locales1200208
-Ref: Ranges and Locales-Footnote-11204824
-Ref: Ranges and Locales-Footnote-21204851
-Ref: Ranges and Locales-Footnote-31205086
-Node: Contributors1205309
-Node: History summary1211306
-Node: Installation1212686
-Node: Gawk Distribution1213630
-Node: Getting1214114
-Node: Extracting1215077
-Node: Distribution contents1216715
-Node: Unix Installation1223195
-Node: Quick Installation1223877
-Node: Shell Startup Files1226291
-Node: Additional Configuration Options1227380
-Node: Configuration Philosophy1229695
-Node: Non-Unix Installation1232064
-Node: PC Installation1232524
-Node: PC Binary Installation1233362
-Node: PC Compiling1233797
-Node: PC Using1234914
-Node: Cygwin1238467
-Node: MSYS1239691
-Node: VMS Installation1240293
-Node: VMS Compilation1241084
-Ref: VMS Compilation-Footnote-11242313
-Node: VMS Dynamic Extensions1242371
-Node: VMS Installation Details1244056
-Node: VMS Running1246309
-Node: VMS GNV1250588
-Node: VMS Old Gawk1251323
-Node: Bugs1251794
-Node: Bug address1252457
-Node: Usenet1255439
-Node: Maintainers1256443
-Node: Other Versions1257628
-Node: Installation summary1264716
-Node: Notes1265925
-Node: Compatibility Mode1266719
-Node: Additions1267501
-Node: Accessing The Source1268426
-Node: Adding Code1269863
-Node: New Ports1276082
-Node: Derived Files1280457
-Ref: Derived Files-Footnote-11286117
-Ref: Derived Files-Footnote-21286152
-Ref: Derived Files-Footnote-31286750
-Node: Future Extensions1286864
-Node: Implementation Limitations1287522
-Node: Extension Design1288732
-Node: Old Extension Problems1289876
-Ref: Old Extension Problems-Footnote-11291394
-Node: Extension New Mechanism Goals1291451
-Ref: Extension New Mechanism Goals-Footnote-11294815
-Node: Extension Other Design Decisions1295004
-Node: Extension Future Growth1297117
-Node: Notes summary1297723
-Node: Basic Concepts1298881
-Node: Basic High Level1299562
-Ref: figure-general-flow1299844
-Ref: figure-process-flow1300529
-Ref: Basic High Level-Footnote-11303830
-Node: Basic Data Typing1304015
-Node: Glossary1307343
-Node: Copying1339228
-Node: GNU Free Documentation License1376771
-Node: Index1401891
+Node: Bytes vs. Characters777831
+Node: Using extensions779379
+Node: wc program780137
+Node: Miscellaneous Programs784995
+Node: Dupword Program786208
+Node: Alarm Program788238
+Node: Translate Program793093
+Ref: Translate Program-Footnote-1797658
+Node: Labels Program797928
+Ref: Labels Program-Footnote-1801279
+Node: Word Sorting801363
+Node: History Sorting805435
+Node: Extract Program807660
+Node: Simple Sed815714
+Node: Igawk Program818788
+Ref: Igawk Program-Footnote-1833119
+Ref: Igawk Program-Footnote-2833321
+Ref: Igawk Program-Footnote-3833443
+Node: Anagram Program833558
+Node: Signature Program836620
+Node: Programs Summary837867
+Node: Programs Exercises839081
+Ref: Programs Exercises-Footnote-1843211
+Node: Advanced Features843297
+Node: Nondecimal Data845287
+Node: Array Sorting846878
+Node: Controlling Array Traversal847578
+Ref: Controlling Array Traversal-Footnote-1855946
+Node: Array Sorting Functions856064
+Ref: Array Sorting Functions-Footnote-1861155
+Node: Two-way I/O861351
+Ref: Two-way I/O-Footnote-1869072
+Ref: Two-way I/O-Footnote-2869259
+Node: TCP/IP Networking869341
+Node: Profiling872459
+Node: Advanced Features Summary881773
+Node: Internationalization883617
+Node: I18N and L10N885097
+Node: Explaining gettext885784
+Ref: Explaining gettext-Footnote-1891676
+Ref: Explaining gettext-Footnote-2891861
+Node: Programmer i18n892026
+Ref: Programmer i18n-Footnote-1896975
+Node: Translator i18n897024
+Node: String Extraction897818
+Ref: String Extraction-Footnote-1898950
+Node: Printf Ordering899036
+Ref: Printf Ordering-Footnote-1901822
+Node: I18N Portability901886
+Ref: I18N Portability-Footnote-1904342
+Node: I18N Example904405
+Ref: I18N Example-Footnote-1907680
+Ref: I18N Example-Footnote-2907753
+Node: Gawk I18N907862
+Node: I18N Summary908511
+Node: Debugger909852
+Node: Debugging910852
+Node: Debugging Concepts911293
+Node: Debugging Terms913102
+Node: Awk Debugging915677
+Ref: Awk Debugging-Footnote-1916622
+Node: Sample Debugging Session916754
+Node: Debugger Invocation917288
+Node: Finding The Bug918674
+Node: List of Debugger Commands925148
+Node: Breakpoint Control926481
+Node: Debugger Execution Control930175
+Node: Viewing And Changing Data933537
+Node: Execution Stack937078
+Node: Debugger Info938715
+Node: Miscellaneous Debugger Commands942786
+Node: Readline Support947848
+Node: Limitations948744
+Node: Debugging Summary951298
+Node: Namespaces952577
+Node: Global Namespace953688
+Node: Qualified Names955086
+Node: Default Namespace956085
+Node: Changing The Namespace956826
+Node: Naming Rules958440
+Node: Internal Name Management960288
+Node: Namespace Example961330
+Node: Namespace And Features963892
+Node: Namespace Summary965327
+Node: Arbitrary Precision Arithmetic966804
+Node: Computer Arithmetic968291
+Ref: table-numeric-ranges972057
+Ref: table-floating-point-ranges972550
+Ref: Computer Arithmetic-Footnote-1973208
+Node: Math Definitions973265
+Ref: table-ieee-formats976581
+Ref: Math Definitions-Footnote-1977184
+Node: MPFR features977289
+Node: FP Math Caution979007
+Ref: FP Math Caution-Footnote-1980079
+Node: Inexactness of computations980448
+Node: Inexact representation981408
+Node: Comparing FP Values982768
+Node: Errors accumulate984009
+Node: Getting Accuracy985442
+Node: Try To Round988152
+Node: Setting precision989051
+Ref: table-predefined-precision-strings989748
+Node: Setting the rounding mode991578
+Ref: table-gawk-rounding-modes991952
+Ref: Setting the rounding mode-Footnote-1995883
+Node: Arbitrary Precision Integers996062
+Ref: Arbitrary Precision Integers-Footnote-1999237
+Node: Checking for MPFR999386
+Node: POSIX Floating Point Problems1000860
+Ref: POSIX Floating Point Problems-Footnote-11005145
+Node: Floating point summary1005183
+Node: Dynamic Extensions1007373
+Node: Extension Intro1008926
+Node: Plugin License1010192
+Node: Extension Mechanism Outline1010989
+Ref: figure-load-extension1011428
+Ref: figure-register-new-function1012993
+Ref: figure-call-new-function1014085
+Node: Extension API Description1016147
+Node: Extension API Functions Introduction1017860
+Ref: table-api-std-headers1019696
+Node: General Data Types1023945
+Ref: General Data Types-Footnote-11032575
+Node: Memory Allocation Functions1032874
+Ref: Memory Allocation Functions-Footnote-11037375
+Node: Constructor Functions1037474
+Node: API Ownership of MPFR and GMP Values1040940
+Node: Registration Functions1042253
+Node: Extension Functions1042953
+Node: Exit Callback Functions1048275
+Node: Extension Version String1049525
+Node: Input Parsers1050188
+Node: Output Wrappers1062909
+Node: Two-way processors1067421
+Node: Printing Messages1069686
+Ref: Printing Messages-Footnote-11070857
+Node: Updating ERRNO1071010
+Node: Requesting Values1071749
+Ref: table-value-types-returned1072486
+Node: Accessing Parameters1073422
+Node: Symbol Table Access1074659
+Node: Symbol table by name1075171
+Ref: Symbol table by name-Footnote-11078195
+Node: Symbol table by cookie1078323
+Ref: Symbol table by cookie-Footnote-11082508
+Node: Cached values1082572
+Ref: Cached values-Footnote-11086108
+Node: Array Manipulation1086261
+Ref: Array Manipulation-Footnote-11087352
+Node: Array Data Types1087389
+Ref: Array Data Types-Footnote-11090047
+Node: Array Functions1090139
+Node: Flattening Arrays1094637
+Node: Creating Arrays1101613
+Node: Redirection API1106380
+Node: Extension API Variables1109213
+Node: Extension Versioning1109924
+Ref: gawk-api-version1110353
+Node: Extension GMP/MPFR Versioning1112084
+Node: Extension API Informational Variables1113712
+Node: Extension API Boilerplate1114785
+Node: Changes from API V11118759
+Node: Finding Extensions1120331
+Node: Extension Example1120890
+Node: Internal File Description1121688
+Node: Internal File Ops1125768
+Ref: Internal File Ops-Footnote-11137118
+Node: Using Internal File Ops1137258
+Ref: Using Internal File Ops-Footnote-11139641
+Node: Extension Samples1139915
+Node: Extension Sample File Functions1141444
+Node: Extension Sample Fnmatch1149093
+Node: Extension Sample Fork1150580
+Node: Extension Sample Inplace1151798
+Node: Extension Sample Ord1155424
+Node: Extension Sample Readdir1156260
+Ref: table-readdir-file-types1157149
+Node: Extension Sample Revout1158216
+Node: Extension Sample Rev2way1158805
+Node: Extension Sample Read write array1159545
+Node: Extension Sample Readfile1161487
+Node: Extension Sample Time1162582
+Node: Extension Sample API Tests1164334
+Node: gawkextlib1164826
+Node: Extension summary1167744
+Node: Extension Exercises1171446
+Node: Language History1172688
+Node: V7/SVR3.11174344
+Node: SVR41176496
+Node: POSIX1177930
+Node: BTL1179311
+Node: POSIX/GNU1180040
+Node: Feature History1185818
+Node: Common Extensions1202137
+Node: Ranges and Locales1203420
+Ref: Ranges and Locales-Footnote-11208036
+Ref: Ranges and Locales-Footnote-21208063
+Ref: Ranges and Locales-Footnote-31208298
+Node: Contributors1208521
+Node: History summary1214518
+Node: Installation1215898
+Node: Gawk Distribution1216842
+Node: Getting1217326
+Node: Extracting1218289
+Node: Distribution contents1219927
+Node: Unix Installation1226407
+Node: Quick Installation1227089
+Node: Shell Startup Files1229503
+Node: Additional Configuration Options1230592
+Node: Configuration Philosophy1232907
+Node: Non-Unix Installation1235276
+Node: PC Installation1235736
+Node: PC Binary Installation1236574
+Node: PC Compiling1237009
+Node: PC Using1238126
+Node: Cygwin1241679
+Node: MSYS1242903
+Node: VMS Installation1243505
+Node: VMS Compilation1244296
+Ref: VMS Compilation-Footnote-11245525
+Node: VMS Dynamic Extensions1245583
+Node: VMS Installation Details1247268
+Node: VMS Running1249521
+Node: VMS GNV1253800
+Node: VMS Old Gawk1254535
+Node: Bugs1255006
+Node: Bug address1255669
+Node: Usenet1258651
+Node: Maintainers1259655
+Node: Other Versions1260840
+Node: Installation summary1267928
+Node: Notes1269137
+Node: Compatibility Mode1269931
+Node: Additions1270713
+Node: Accessing The Source1271638
+Node: Adding Code1273075
+Node: New Ports1279294
+Node: Derived Files1283669
+Ref: Derived Files-Footnote-11289329
+Ref: Derived Files-Footnote-21289364
+Ref: Derived Files-Footnote-31289962
+Node: Future Extensions1290076
+Node: Implementation Limitations1290734
+Node: Extension Design1291944
+Node: Old Extension Problems1293088
+Ref: Old Extension Problems-Footnote-11294606
+Node: Extension New Mechanism Goals1294663
+Ref: Extension New Mechanism Goals-Footnote-11298027
+Node: Extension Other Design Decisions1298216
+Node: Extension Future Growth1300329
+Node: Notes summary1300935
+Node: Basic Concepts1302093
+Node: Basic High Level1302774
+Ref: figure-general-flow1303056
+Ref: figure-process-flow1303741
+Ref: Basic High Level-Footnote-11307042
+Node: Basic Data Typing1307227
+Node: Glossary1310555
+Node: Copying1342440
+Node: GNU Free Documentation License1379983
+Node: Index1405103
 
 End Tag Table
 
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 9446e69..91625d0 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -26761,19 +26761,76 @@ as fast.''  Consider how to rewrite the logic to follow this suggestion.
 @node Wc Program
 @subsection Counting Things
 
-@c FIXME: One day, update to current POSIX version of wc
-
-@cindex counting words, lines, and characters
+@cindex counting words, lines, characters, and bytes
 @cindex input files @subentry counting elements in
 @cindex words @subentry counting
 @cindex characters @subentry counting
 @cindex lines @subentry counting
+@cindex bytes @subentry counting
 @cindex @command{wc} utility
-The @command{wc} (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
+The @command{wc} (word count) utility counts lines, words, characters
+and bytes in one or more input files.
+
+@menu
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* @command{wc} program::                  Code for @file{wc.awk}.
+@end menu
+
+@node Bytes vs. Characters
+@subsubsection Modern Character Sets
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC,
+which each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+Today, the most popular character set in use is Unicode (of which ASCII
+is a pure subset). Unicode provides tens of thousands of unique characters
+(called @dfn{code points}) to cover most existing human languages (living
+and dead) and a number of nonhuman ones as well (such as Klingon and
+J.R.R.@: Tolkien's elvish languages).
+
+To save space in files, Unicode code points are @dfn{encoded}, where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such @dfn{multibyte encodings}.
+
+The POSIX standard requires that @command{awk} function in terms
+of characters, not bytes.  Thus in @command{gawk}, @code{length()},
+@code{substr()}, @code{split()}, @code{match()} and the other string
+functions (@pxref{String Functions}) all work in terms of characters in
+the local character set, and not in terms of bytes. (Not all @command{awk}
+implementations do so, though.)
+
+There is no standard, built-in way to distinguish characters from bytes
+in an @command{awk} program.  For an @command{awk} implementation of
+@command{wc}, which needs to make such a distinction, we will have to
+use an external extension.
+
+@node Using extensions
+@subsubsection A Brief Introduction To Extensions
+
+Loadable extensions are presented in full detail in @ref{Dynamic Extensions}.
+They provide a way to add functions to @command{gawk} which can call
+out to other facilities written in C or C++.
+
+For the purposes of
+@file{wc.awk}, it's enough to know that the extension is loaded
+with the @code{@@load} directive, and the additional function we
+will use is called @code{mbs_length()}.  This function returns the
+number of bytes in a string, and not the number of characters.
+
+The @code{"mbs"} extension comes from the @code{gawkextlib}
+project. @xref{gawkextlib} for more information.
+
+@node @command{wc} program
+@subsubsection Code for @file{wc.awk}
+
+The usage for @command{wc} is as follows:
 
 @display
-@command{wc} [@option{-lwc}] [@var{files} @dots{}]
+@command{wc} [@option{-lwcm}] [@var{files} @dots{}]
 @end display
 
 If no files are specified on the command line, @command{wc} reads its standard
@@ -26791,24 +26848,30 @@ by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
 fields in its input data.
 
 @item -c
+Count only bytes.
+Once upon a time, the @samp{c} in this option stood for ``characters.''
+But, as explained earlier, bytes and characters are no longer synonymous
+with each other.
+
+@item -m
 Count only characters.
 @end table
 
 Implementing @command{wc} in @command{awk} is particularly elegant,
 because @command{awk} does a lot of the work for us; it splits lines into
 words (i.e., fields) and counts them, it counts lines (i.e., records),
-and it can easily tell us how long a line is.
+and it can easily tell us how long a line is in characters.
 
 This program uses the @code{getopt()} library function
 (@pxref{Getopt Function})
 and the file-transition functions
 (@pxref{Filetrans Function}).
 
-This version has one notable difference from traditional versions of
+This version has one notable difference from older versions of
 @command{wc}: it always prints the counts in the order lines, words,
-and characters.  Traditional versions note the order of the @option{-l},
+characters and bytes.  Older versions note the order of the @option{-l},
 @option{-w}, and @option{-c} options on the command line, and print the
-counts in that order.
+counts in that order.  POSIX does not mandate this behavior, though.
 
 The @code{BEGIN} rule does the argument processing.  The variable
 @code{print_total} is true if more than one file is named on the
@@ -26824,6 +26887,7 @@ command line:
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
+# Revised September 2020
 @c endfile
 @end ignore
 @c file eg/prog/wc.awk
@@ -26831,29 +26895,35 @@ command line:
 # Options:
 #    -l    only count lines
 #    -w    only count words
-#    -c    only count characters
+#    -c    only count bytes
+#    -m    only count characters
 #
-# Default is to count lines, words, characters
+# Default is to count lines, words, bytes
 #
 # Requires getopt() and file transition library functions
+# Requires mbs extension from gawkextlib
+
+@@load "mbs"
 
 BEGIN @{
     # let getopt() print a message about
     # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+    while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) @{
         if (c == "l")
             do_lines = 1
         else if (c == "w")
             do_words = 1
         else if (c == "c")
+            do_bytes = 1
+        else if (c == "m")
             do_chars = 1
     @}
     for (i = 1; i < Optind; i++)
         ARGV[i] = ""
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+    # if no options, do lines, words, bytes
+    if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+        do_lines = do_words = do_bytes = 1
 
     print_total = (ARGC - i > 1)
 @}
@@ -26861,14 +26931,14 @@ BEGIN @{
 @end example
 
 The @code{beginfile()} function is simple; it just resets the counts of lines,
-words, and characters to zero, and saves the current @value{FN} in
+words, characters and bytes to zero, and saves the current @value{FN} in
 @code{fname}:
 
 @example
 @c file eg/prog/wc.awk
 function beginfile(file)
 @{
-    lines = words = chars = 0
+    lines = words = chars = bytes = 0
     fname = FILENAME
 @}
 @c endfile
@@ -26886,6 +26956,7 @@ function endfile(file)
     tlines += lines
     twords += words
     tchars += chars
+    tbytes += bytes
     if (do_lines)
         printf "\t%d", lines
 @group
@@ -26894,26 +26965,28 @@ function endfile(file)
 @end group
     if (do_chars)
         printf "\t%d", chars
+    if (do_bytes)
+        printf "\t%d", bytes
     printf "\t%s\n", fname
 @}
 @c endfile
 @end example
 
 There is one rule that is executed for each line. It adds the length of
-the record, plus one, to @code{chars}.@footnote{Because @command{gawk}
-understands multibyte locales, this code counts characters, not bytes.}
-Adding one plus the record length
+the record, plus one, to @code{chars}.  Adding one plus the record length
 is needed because the newline character separating records (the value
 of @code{RS}) is not part of the record itself, and thus not included
-in its length.  Next, @code{lines} is incremented for each line read,
-and @code{words} is incremented by the value of @code{NF}, which is the
-number of ``words'' on this line:
+in its length.  Similarly, it adds the length of the record in bytes,
+plus one, to @code{bytes}.  Next, @code{lines} is incremented for each
+line read, and @code{words} is incremented by the value of @code{NF},
+which is the number of ``words'' on this line:
 
 @example
 @c file eg/prog/wc.awk
 # do per line
 @{
     chars += length($0) + 1    # get newline
+    bytes += mbs_length($0) + 1
     lines++
     words += NF
 @}
@@ -26932,6 +27005,8 @@ END @{
             printf "\t%d", twords
         if (do_chars)
             printf "\t%d", tchars
+        if (do_bytes)
+            printf "\t%d", tbytes
         print "\ttotal"
     @}
 @}
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index f96ff86..f982ae8 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -25771,19 +25771,76 @@ as fast.''  Consider how to rewrite the logic to follow this suggestion.
 @node Wc Program
 @subsection Counting Things
 
-@c FIXME: One day, update to current POSIX version of wc
-
-@cindex counting words, lines, and characters
+@cindex counting words, lines, characters, and bytes
 @cindex input files @subentry counting elements in
 @cindex words @subentry counting
 @cindex characters @subentry counting
 @cindex lines @subentry counting
+@cindex bytes @subentry counting
 @cindex @command{wc} utility
-The @command{wc} (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
+The @command{wc} (word count) utility counts lines, words, characters
+and bytes in one or more input files.
+
+@menu
+* Bytes vs. Characters::        Modern character sets.
+* Using extensions::            A brief intro to extensions.
+* @command{wc} program::                  Code for @file{wc.awk}.
+@end menu
+
+@node Bytes vs. Characters
+@subsubsection Modern Character Sets
+
+In the early days of computing, single bytes were used for storing
+characters.  The most common character sets were ASCII and EBCDIC,
+which each provided all the English upper- and lowercase letters, the 10
+Hindu-Arabic numerals from 0 through 9, and a number of other standard
+punctuation and control characters.
+
+Today, the most popular character set in use is Unicode (of which ASCII
+is a pure subset). Unicode provides tens of thousands of unique characters
+(called @dfn{code points}) to cover most existing human languages (living
+and dead) and a number of nonhuman ones as well (such as Klingon and
+J.R.R.@: Tolkien's elvish languages).
+
+To save space in files, Unicode code points are @dfn{encoded}, where each
+character takes from one to four bytes in the file.  UTF-8 is possibly
+the most popular of such @dfn{multibyte encodings}.
+
+The POSIX standard requires that @command{awk} function in terms
+of characters, not bytes.  Thus in @command{gawk}, @code{length()},
+@code{substr()}, @code{split()}, @code{match()} and the other string
+functions (@pxref{String Functions}) all work in terms of characters in
+the local character set, and not in terms of bytes. (Not all @command{awk}
+implementations do so, though.)
+
+There is no standard, built-in way to distinguish characters from bytes
+in an @command{awk} program.  For an @command{awk} implementation of
+@command{wc}, which needs to make such a distinction, we will have to
+use an external extension.
+
+@node Using extensions
+@subsubsection A Brief Introduction To Extensions
+
+Loadable extensions are presented in full detail in @ref{Dynamic Extensions}.
+They provide a way to add functions to @command{gawk} which can call
+out to other facilities written in C or C++.
+
+For the purposes of
+@file{wc.awk}, it's enough to know that the extension is loaded
+with the @code{@@load} directive, and the additional function we
+will use is called @code{mbs_length()}.  This function returns the
+number of bytes in a string, and not the number of characters.
+
+The @code{"mbs"} extension comes from the @code{gawkextlib}
+project. @xref{gawkextlib} for more information.
+
+@node @command{wc} program
+@subsubsection Code for @file{wc.awk}
+
+The usage for @command{wc} is as follows:
 
 @display
-@command{wc} [@option{-lwc}] [@var{files} @dots{}]
+@command{wc} [@option{-lwcm}] [@var{files} @dots{}]
 @end display
 
 If no files are specified on the command line, @command{wc} reads its standard
@@ -25801,24 +25858,30 @@ by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
 fields in its input data.
 
 @item -c
+Count only bytes.
+Once upon a time, the @samp{c} in this option stood for ``characters.''
+But, as explained earlier, bytes and characters are no longer synonymous
+with each other.
+
+@item -m
 Count only characters.
 @end table
 
 Implementing @command{wc} in @command{awk} is particularly elegant,
 because @command{awk} does a lot of the work for us; it splits lines into
 words (i.e., fields) and counts them, it counts lines (i.e., records),
-and it can easily tell us how long a line is.
+and it can easily tell us how long a line is in characters.
 
 This program uses the @code{getopt()} library function
 (@pxref{Getopt Function})
 and the file-transition functions
 (@pxref{Filetrans Function}).
 
-This version has one notable difference from traditional versions of
+This version has one notable difference from older versions of
 @command{wc}: it always prints the counts in the order lines, words,
-and characters.  Traditional versions note the order of the @option{-l},
+characters and bytes.  Older versions note the order of the @option{-l},
 @option{-w}, and @option{-c} options on the command line, and print the
-counts in that order.
+counts in that order.  POSIX does not mandate this behavior, though.
 
 The @code{BEGIN} rule does the argument processing.  The variable
 @code{print_total} is true if more than one file is named on the
@@ -25834,6 +25897,7 @@ command line:
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
+# Revised September 2020
 @c endfile
 @end ignore
 @c file eg/prog/wc.awk
@@ -25841,29 +25905,35 @@ command line:
 # Options:
 #    -l    only count lines
 #    -w    only count words
-#    -c    only count characters
+#    -c    only count bytes
+#    -m    only count characters
 #
-# Default is to count lines, words, characters
+# Default is to count lines, words, bytes
 #
 # Requires getopt() and file transition library functions
+# Requires mbs extension from gawkextlib
+
+@@load "mbs"
 
 BEGIN @{
     # let getopt() print a message about
     # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+    while ((c = getopt(ARGC, ARGV, "lwcm")) != -1) @{
         if (c == "l")
             do_lines = 1
         else if (c == "w")
             do_words = 1
         else if (c == "c")
+            do_bytes = 1
+        else if (c == "m")
             do_chars = 1
     @}
     for (i = 1; i < Optind; i++)
         ARGV[i] = ""
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+    # if no options, do lines, words, bytes
+    if (! do_lines && ! do_words && ! do_chars && ! do_bytes)
+        do_lines = do_words = do_bytes = 1
 
     print_total = (ARGC - i > 1)
 @}
@@ -25871,14 +25941,14 @@ BEGIN @{
 @end example
 
 The @code{beginfile()} function is simple; it just resets the counts of lines,
-words, and characters to zero, and saves the current @value{FN} in
+words, characters and bytes to zero, and saves the current @value{FN} in
 @code{fname}:
 
 @example
 @c file eg/prog/wc.awk
 function beginfile(file)
 @{
-    lines = words = chars = 0
+    lines = words = chars = bytes = 0
     fname = FILENAME
 @}
 @c endfile
@@ -25896,6 +25966,7 @@ function endfile(file)
     tlines += lines
     twords += words
     tchars += chars
+    tbytes += bytes
     if (do_lines)
         printf "\t%d", lines
 @group
@@ -25904,26 +25975,28 @@ function endfile(file)
 @end group
     if (do_chars)
         printf "\t%d", chars
+    if (do_bytes)
+        printf "\t%d", bytes
     printf "\t%s\n", fname
 @}
 @c endfile
 @end example
 
 There is one rule that is executed for each line. It adds the length of
-the record, plus one, to @code{chars}.@footnote{Because @command{gawk}
-understands multibyte locales, this code counts characters, not bytes.}
-Adding one plus the record length
+the record, plus one, to @code{chars}.  Adding one plus the record length
 is needed because the newline character separating records (the value
 of @code{RS}) is not part of the record itself, and thus not included
-in its length.  Next, @code{lines} is incremented for each line read,
-and @code{words} is incremented by the value of @code{NF}, which is the
-number of ``words'' on this line:
+in its length.  Similarly, it adds the length of the record in bytes,
+plus one, to @code{bytes}.  Next, @code{lines} is incremented for each
+line read, and @code{words} is incremented by the value of @code{NF},
+which is the number of ``words'' on this line:
 
 @example
 @c file eg/prog/wc.awk
 # do per line
 @{
     chars += length($0) + 1    # get newline
+    bytes += mbs_length($0) + 1
     lines++
     words += NF
 @}
@@ -25942,6 +26015,8 @@ END @{
             printf "\t%d", twords
         if (do_chars)
             printf "\t%d", tchars
+        if (do_bytes)
+            printf "\t%d", tbytes
         print "\ttotal"
     @}
 @}

-----------------------------------------------------------------------

Summary of changes:
 awklib/eg/prog/wc.awk |  27 ++-
 doc/ChangeLog         |  10 +
 doc/gawk.info         | 659 ++++++++++++++++++++++++++++----------------------
 doc/gawk.texi         | 128 +++++++---
 doc/gawktexi.in       | 128 +++++++---
 5 files changed, 599 insertions(+), 353 deletions(-)
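
Illustration (not part of the commit): the per-record counting that the
revised wc.awk performs can be sketched outside of awk to show how the
character and byte totals diverge on multibyte input. The sketch below is
in Python rather than awk so that it does not depend on the mbs extension
being installed. The function name wc_counts() is hypothetical, and
encoding to UTF-8 stands in for mbs_length() on the assumption that the
input is UTF-8 text; it is a rough model of the logic in the diff, not
the program itself.

```python
def wc_counts(text):
    """Model the per-record rule of the revised wc.awk (hypothetical helper).

    Assumes records are newline-separated and the text is UTF-8 encoded.
    """
    records = text.split("\n")
    if records and records[-1] == "":
        records.pop()  # a trailing newline ends the last record; it is not a record itself

    lines = words = chars = nbytes = 0
    for record in records:
        # length($0) + 1: characters in the record plus the newline separator
        chars += len(record) + 1
        # mbs_length($0) + 1: bytes in the record plus the one-byte newline
        nbytes += len(record.encode("utf-8")) + 1
        lines += 1
        # NF: whitespace-separated fields (str.split() approximates awk's default FS)
        words += len(record.split())
    return {"lines": lines, "words": words, "chars": chars, "bytes": nbytes}


if __name__ == "__main__":
    # Two accented letters take two bytes each in UTF-8, so bytes exceed chars by 2.
    print(wc_counts("na\u00efve caf\u00e9\nfoo bar baz\n"))
```

For pure ASCII input the two totals agree, which is why the -c option
could once be read as "characters".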


hooks/post-receive
-- 
gawk


