[gawk-diffs] [SCM] gawk branch, master, updated. da81e3806c39654f4036ac6


From: Arnold Robbins
Subject: [gawk-diffs] [SCM] gawk branch, master, updated. da81e3806c39654f4036ac63abacbc0c8c5e0e92
Date: Tue, 06 Nov 2012 20:00:17 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".

The branch, master has been updated
       via  da81e3806c39654f4036ac63abacbc0c8c5e0e92 (commit)
       via  9e49de573a7ea6f84e6577511aec5a5fc1f47cb6 (commit)
       via  d5cc356948eb6d3ed024b1addad6daccb809448b (commit)
      from  5d5984ca88b4872d8052cde29ac904bdb193ffd8 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=da81e3806c39654f4036ac63abacbc0c8c5e0e92

commit da81e3806c39654f4036ac63abacbc0c8c5e0e92
Author: Arnold D. Robbins <address@hidden>
Date:   Tue Nov 6 21:59:47 2012 +0200

    Document local arrays.

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 4616137..f0b304f 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -2,6 +2,9 @@
 
        * gawk.texi: Rearrange chapter order and separate into parts
        using @part for TeX.
+       (Variable Scope): Document that arrays can be local also.
+       Thanks to Denis Shirokov <address@hidden>, for pointing out
+       the lack.
 
 2012-11-05         Arnold D. Robbins     <address@hidden>
 
diff --git a/doc/gawk.info b/doc/gawk.info
index cc0c259..e20d14b 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -12899,6 +12899,42 @@ that `i' is a local variable, not an argument):
      foo's i=1
      top's i=10
 
+   Besides scalar values (strings and numbers), you may also have local
+arrays.  By using a parameter name as an array, `awk' treats it as an
+array, and it is local to the function.  In addition, recursive calls
+create new arrays.  Consider this example:
+
+     function some_func(p1,      a)
+     {
+         if (p1++ > 3)
+             return
+
+         a[p1] = p1
+
+         some_func(p1)
+
+         printf("At level %d, index %d %s found in a\n",
+              p1, (p1 - 1), (p1 - 1) in a ? "is" : "is not")
+         printf("At level %d, index %d %s found in a\n",
+              p1, p1, p1 in a ? "is" : "is not")
+         print ""
+     }
+
+     BEGIN {
+         some_func(1)
+     }
+
+   When run, this program produces the following output:
+
+     At level 4, index 3 is not found in a
+     At level 4, index 4 is found in a
+
+     At level 3, index 2 is not found in a
+     At level 3, index 3 is found in a
+
+     At level 2, index 1 is not found in a
+     At level 2, index 2 is found in a
+
 
 File: gawk.info,  Node: Pass By Value/Reference,  Prev: Variable Scope,  Up: Function Caveats
 
@@ -26798,7 +26834,7 @@ critical, that for any given branch, the above incantation _just works_.
                tar -xpzvf PACKAGE-X.Y.Z.tar.gz
                cd PACKAGE-X.Y.Z
                ./configure && make && make check
-               make install    # as root
+               make install    # as root
 
        B. These days the maintainer uses Ubuntu 10.11 which is medium
           current, but he is already doing the above for `autoconf' and
@@ -31929,271 +31965,271 @@ Node: Function Example535992
 Node: Function Caveats538586
 Node: Calling A Function539007
 Node: Variable Scope540122
-Node: Pass By Value/Reference542097
-Node: Return Statement545537
-Node: Dynamic Typing548518
-Node: Indirect Calls549253
-Node: Library Functions558938
-Ref: Library Functions-Footnote-1561937
-Node: Library Names562108
-Ref: Library Names-Footnote-1565579
-Ref: Library Names-Footnote-2565799
-Node: General Functions565885
-Node: Strtonum Function566838
-Node: Assert Function569768
-Node: Round Function573094
-Node: Cliff Random Function574637
-Node: Ordinal Functions575653
-Ref: Ordinal Functions-Footnote-1578723
-Ref: Ordinal Functions-Footnote-2578975
-Node: Join Function579184
-Ref: Join Function-Footnote-1580955
-Node: Getlocaltime Function581155
-Node: Data File Management584870
-Node: Filetrans Function585502
-Node: Rewind Function589641
-Node: File Checking591028
-Node: Empty Files592122
-Node: Ignoring Assigns594352
-Node: Getopt Function595905
-Ref: Getopt Function-Footnote-1607209
-Node: Passwd Functions607412
-Ref: Passwd Functions-Footnote-1616387
-Node: Group Functions616475
-Node: Walking Arrays624559
-Node: Sample Programs626128
-Node: Running Examples626805
-Node: Clones627533
-Node: Cut Program628757
-Node: Egrep Program638602
-Ref: Egrep Program-Footnote-1646375
-Node: Id Program646485
-Node: Split Program650101
-Ref: Split Program-Footnote-1653620
-Node: Tee Program653748
-Node: Uniq Program656551
-Node: Wc Program663980
-Ref: Wc Program-Footnote-1668246
-Ref: Wc Program-Footnote-2668446
-Node: Miscellaneous Programs668538
-Node: Dupword Program669726
-Node: Alarm Program671757
-Node: Translate Program676506
-Ref: Translate Program-Footnote-1680893
-Ref: Translate Program-Footnote-2681121
-Node: Labels Program681255
-Ref: Labels Program-Footnote-1684626
-Node: Word Sorting684710
-Node: History Sorting688594
-Node: Extract Program690433
-Ref: Extract Program-Footnote-1697916
-Node: Simple Sed698044
-Node: Igawk Program701106
-Ref: Igawk Program-Footnote-1716263
-Ref: Igawk Program-Footnote-2716464
-Node: Anagram Program716602
-Node: Signature Program719670
-Node: Internationalization720770
-Node: I18N and L10N722202
-Node: Explaining gettext722888
-Ref: Explaining gettext-Footnote-1727954
-Ref: Explaining gettext-Footnote-2728138
-Node: Programmer i18n728303
-Node: Translator i18n732503
-Node: String Extraction733296
-Ref: String Extraction-Footnote-1734257
-Node: Printf Ordering734343
-Ref: Printf Ordering-Footnote-1737127
-Node: I18N Portability737191
-Ref: I18N Portability-Footnote-1739640
-Node: I18N Example739703
-Ref: I18N Example-Footnote-1742338
-Node: Gawk I18N742410
-Node: Advanced Features743027
-Node: Nondecimal Data744531
-Node: Array Sorting746114
-Node: Controlling Array Traversal746811
-Node: Array Sorting Functions755049
-Ref: Array Sorting Functions-Footnote-1758723
-Ref: Array Sorting Functions-Footnote-2758816
-Node: Two-way I/O759010
-Ref: Two-way I/O-Footnote-1764442
-Node: TCP/IP Networking764512
-Node: Profiling767356
-Node: Debugger774810
-Node: Debugging775778
-Node: Debugging Concepts776211
-Node: Debugging Terms778067
-Node: Awk Debugging780664
-Node: Sample Debugging Session781556
-Node: Debugger Invocation782076
-Node: Finding The Bug783405
-Node: List of Debugger Commands789893
-Node: Breakpoint Control791227
-Node: Debugger Execution Control794891
-Node: Viewing And Changing Data798251
-Node: Execution Stack801607
-Node: Debugger Info803074
-Node: Miscellaneous Debugger Commands807055
-Node: Readline Support812500
-Node: Limitations813331
-Node: Arbitrary Precision Arithmetic815583
-Ref: Arbitrary Precision Arithmetic-Footnote-1817225
-Node: General Arithmetic817373
-Node: Floating Point Issues819093
-Node: String Conversion Precision819974
-Ref: String Conversion Precision-Footnote-1821680
-Node: Unexpected Results821789
-Node: POSIX Floating Point Problems823942
-Ref: POSIX Floating Point Problems-Footnote-1827767
-Node: Integer Programming827805
-Node: Floating-point Programming829558
-Ref: Floating-point Programming-Footnote-1835867
-Node: Floating-point Representation836131
-Node: Floating-point Context837296
-Ref: table-ieee-formats838138
-Node: Rounding Mode839522
-Ref: table-rounding-modes840001
-Ref: Rounding Mode-Footnote-1843005
-Node: Gawk and MPFR843186
-Node: Arbitrary Precision Floats844428
-Ref: Arbitrary Precision Floats-Footnote-1846857
-Node: Setting Precision847168
-Node: Setting Rounding Mode849901
-Ref: table-gawk-rounding-modes850305
-Node: Floating-point Constants851485
-Node: Changing Precision852909
-Ref: Changing Precision-Footnote-1854309
-Node: Exact Arithmetic854483
-Node: Arbitrary Precision Integers857591
-Ref: Arbitrary Precision Integers-Footnote-1860591
-Node: Dynamic Extensions860738
-Node: Extension Intro862061
-Node: Plugin License863264
-Node: Extension Design863938
-Node: Old Extension Problems865009
-Ref: Old Extension Problems-Footnote-1866519
-Node: Extension New Mechanism Goals866576
-Ref: Extension New Mechanism Goals-Footnote-1869288
-Node: Extension Other Design Decisions869474
-Node: Extension Mechanism Outline871221
-Ref: load-extension872246
-Ref: load-new-function873724
-Ref: call-new-function874705
-Node: Extension Future Growth876686
-Node: Extension API Description877428
-Node: Extension API Functions Introduction878748
-Node: General Data Types882823
-Ref: General Data Types-Footnote-1888456
-Node: Requesting Values888755
-Ref: table-value-types-returned889486
-Node: Constructor Functions890440
-Node: Registration Functions893436
-Node: Extension Functions894121
-Node: Exit Callback Functions895940
-Node: Extension Version String897183
-Node: Input Parsers897833
-Node: Output Wrappers906414
-Node: Two-way processors910807
-Node: Printing Messages912929
-Ref: Printing Messages-Footnote-1914006
-Node: Updating `ERRNO'914158
-Node: Accessing Parameters914897
-Node: Symbol Table Access916127
-Node: Symbol table by name916639
-Ref: Symbol table by name-Footnote-1918811
-Node: Symbol table by cookie918891
-Ref: Symbol table by cookie-Footnote-1923020
-Node: Cached values923083
-Ref: Cached values-Footnote-1926284
-Node: Array Manipulation926375
-Ref: Array Manipulation-Footnote-1927473
-Node: Array Data Types927512
-Ref: Array Data Types-Footnote-1930234
-Node: Array Functions930326
-Node: Flattening Arrays934092
-Node: Creating Arrays940923
-Node: Extension API Variables945719
-Node: Extension Versioning946355
-Node: Extension API Informational Variables948256
-Node: Extension API Boilerplate949342
-Node: Finding Extensions953176
-Node: Extension Example953723
-Node: Internal File Description954461
-Node: Internal File Ops958149
-Ref: Internal File Ops-Footnote-1969233
-Node: Using Internal File Ops969373
-Ref: Using Internal File Ops-Footnote-1971729
-Node: Extension Samples971995
-Node: Extension Sample File Functions973438
-Node: Extension Sample Fnmatch981807
-Node: Extension Sample Fork983533
-Node: Extension Sample Ord984747
-Node: Extension Sample Readdir985523
-Node: Extension Sample Revout987861
-Node: Extension Sample Rev2way988454
-Node: Extension Sample Read write array989144
-Node: Extension Sample Readfile991027
-Node: Extension Sample API Tests991782
-Node: Extension Sample Time992307
-Node: gawkextlib993616
-Node: Language History995999
-Node: V7/SVR3.1997521
-Node: SVR4999842
-Node: POSIX1001284
-Node: BTL1002292
-Node: POSIX/GNU1003026
-Node: Common Extensions1008561
-Node: Ranges and Locales1009668
-Ref: Ranges and Locales-Footnote-11014286
-Ref: Ranges and Locales-Footnote-21014313
-Ref: Ranges and Locales-Footnote-31014573
-Node: Contributors1014794
-Node: Installation1019090
-Node: Gawk Distribution1019984
-Node: Getting1020468
-Node: Extracting1021294
-Node: Distribution contents1022986
-Node: Unix Installation1028208
-Node: Quick Installation1028825
-Node: Additional Configuration Options1030787
-Node: Configuration Philosophy1032264
-Node: Non-Unix Installation1034606
-Node: PC Installation1035064
-Node: PC Binary Installation1036363
-Node: PC Compiling1038211
-Node: PC Testing1041155
-Node: PC Using1042331
-Node: Cygwin1046516
-Node: MSYS1047516
-Node: VMS Installation1048030
-Node: VMS Compilation1048633
-Ref: VMS Compilation-Footnote-11049640
-Node: VMS Installation Details1049698
-Node: VMS Running1051333
-Node: VMS Old Gawk1052940
-Node: Bugs1053414
-Node: Other Versions1057266
-Node: Notes1062581
-Node: Compatibility Mode1063168
-Node: Additions1063951
-Node: Accessing The Source1064878
-Node: Adding Code1066304
-Node: New Ports1072346
-Node: Derived Files1076481
-Ref: Derived Files-Footnote-11081786
-Ref: Derived Files-Footnote-21081820
-Ref: Derived Files-Footnote-31082420
-Node: Future Extensions1082518
-Node: Basic Concepts1084005
-Node: Basic High Level1084686
-Ref: figure-general-flow1084957
-Ref: figure-process-flow1085556
-Ref: Basic High Level-Footnote-11088785
-Node: Basic Data Typing1088970
-Node: Glossary1092325
-Node: Copying1117636
-Node: GNU Free Documentation License1155193
-Node: Index1180330
+Node: Pass By Value/Reference543085
+Node: Return Statement546525
+Node: Dynamic Typing549506
+Node: Indirect Calls550241
+Node: Library Functions559926
+Ref: Library Functions-Footnote-1562925
+Node: Library Names563096
+Ref: Library Names-Footnote-1566567
+Ref: Library Names-Footnote-2566787
+Node: General Functions566873
+Node: Strtonum Function567826
+Node: Assert Function570756
+Node: Round Function574082
+Node: Cliff Random Function575625
+Node: Ordinal Functions576641
+Ref: Ordinal Functions-Footnote-1579711
+Ref: Ordinal Functions-Footnote-2579963
+Node: Join Function580172
+Ref: Join Function-Footnote-1581943
+Node: Getlocaltime Function582143
+Node: Data File Management585858
+Node: Filetrans Function586490
+Node: Rewind Function590629
+Node: File Checking592016
+Node: Empty Files593110
+Node: Ignoring Assigns595340
+Node: Getopt Function596893
+Ref: Getopt Function-Footnote-1608197
+Node: Passwd Functions608400
+Ref: Passwd Functions-Footnote-1617375
+Node: Group Functions617463
+Node: Walking Arrays625547
+Node: Sample Programs627116
+Node: Running Examples627793
+Node: Clones628521
+Node: Cut Program629745
+Node: Egrep Program639590
+Ref: Egrep Program-Footnote-1647363
+Node: Id Program647473
+Node: Split Program651089
+Ref: Split Program-Footnote-1654608
+Node: Tee Program654736
+Node: Uniq Program657539
+Node: Wc Program664968
+Ref: Wc Program-Footnote-1669234
+Ref: Wc Program-Footnote-2669434
+Node: Miscellaneous Programs669526
+Node: Dupword Program670714
+Node: Alarm Program672745
+Node: Translate Program677494
+Ref: Translate Program-Footnote-1681881
+Ref: Translate Program-Footnote-2682109
+Node: Labels Program682243
+Ref: Labels Program-Footnote-1685614
+Node: Word Sorting685698
+Node: History Sorting689582
+Node: Extract Program691421
+Ref: Extract Program-Footnote-1698904
+Node: Simple Sed699032
+Node: Igawk Program702094
+Ref: Igawk Program-Footnote-1717251
+Ref: Igawk Program-Footnote-2717452
+Node: Anagram Program717590
+Node: Signature Program720658
+Node: Internationalization721758
+Node: I18N and L10N723190
+Node: Explaining gettext723876
+Ref: Explaining gettext-Footnote-1728942
+Ref: Explaining gettext-Footnote-2729126
+Node: Programmer i18n729291
+Node: Translator i18n733491
+Node: String Extraction734284
+Ref: String Extraction-Footnote-1735245
+Node: Printf Ordering735331
+Ref: Printf Ordering-Footnote-1738115
+Node: I18N Portability738179
+Ref: I18N Portability-Footnote-1740628
+Node: I18N Example740691
+Ref: I18N Example-Footnote-1743326
+Node: Gawk I18N743398
+Node: Advanced Features744015
+Node: Nondecimal Data745519
+Node: Array Sorting747102
+Node: Controlling Array Traversal747799
+Node: Array Sorting Functions756037
+Ref: Array Sorting Functions-Footnote-1759711
+Ref: Array Sorting Functions-Footnote-2759804
+Node: Two-way I/O759998
+Ref: Two-way I/O-Footnote-1765430
+Node: TCP/IP Networking765500
+Node: Profiling768344
+Node: Debugger775798
+Node: Debugging776766
+Node: Debugging Concepts777199
+Node: Debugging Terms779055
+Node: Awk Debugging781652
+Node: Sample Debugging Session782544
+Node: Debugger Invocation783064
+Node: Finding The Bug784393
+Node: List of Debugger Commands790881
+Node: Breakpoint Control792215
+Node: Debugger Execution Control795879
+Node: Viewing And Changing Data799239
+Node: Execution Stack802595
+Node: Debugger Info804062
+Node: Miscellaneous Debugger Commands808043
+Node: Readline Support813488
+Node: Limitations814319
+Node: Arbitrary Precision Arithmetic816571
+Ref: Arbitrary Precision Arithmetic-Footnote-1818213
+Node: General Arithmetic818361
+Node: Floating Point Issues820081
+Node: String Conversion Precision820962
+Ref: String Conversion Precision-Footnote-1822668
+Node: Unexpected Results822777
+Node: POSIX Floating Point Problems824930
+Ref: POSIX Floating Point Problems-Footnote-1828755
+Node: Integer Programming828793
+Node: Floating-point Programming830546
+Ref: Floating-point Programming-Footnote-1836855
+Node: Floating-point Representation837119
+Node: Floating-point Context838284
+Ref: table-ieee-formats839126
+Node: Rounding Mode840510
+Ref: table-rounding-modes840989
+Ref: Rounding Mode-Footnote-1843993
+Node: Gawk and MPFR844174
+Node: Arbitrary Precision Floats845416
+Ref: Arbitrary Precision Floats-Footnote-1847845
+Node: Setting Precision848156
+Node: Setting Rounding Mode850889
+Ref: table-gawk-rounding-modes851293
+Node: Floating-point Constants852473
+Node: Changing Precision853897
+Ref: Changing Precision-Footnote-1855297
+Node: Exact Arithmetic855471
+Node: Arbitrary Precision Integers858579
+Ref: Arbitrary Precision Integers-Footnote-1861579
+Node: Dynamic Extensions861726
+Node: Extension Intro863049
+Node: Plugin License864252
+Node: Extension Design864926
+Node: Old Extension Problems865997
+Ref: Old Extension Problems-Footnote-1867507
+Node: Extension New Mechanism Goals867564
+Ref: Extension New Mechanism Goals-Footnote-1870276
+Node: Extension Other Design Decisions870462
+Node: Extension Mechanism Outline872209
+Ref: load-extension873234
+Ref: load-new-function874712
+Ref: call-new-function875693
+Node: Extension Future Growth877674
+Node: Extension API Description878416
+Node: Extension API Functions Introduction879736
+Node: General Data Types883811
+Ref: General Data Types-Footnote-1889444
+Node: Requesting Values889743
+Ref: table-value-types-returned890474
+Node: Constructor Functions891428
+Node: Registration Functions894424
+Node: Extension Functions895109
+Node: Exit Callback Functions896928
+Node: Extension Version String898171
+Node: Input Parsers898821
+Node: Output Wrappers907402
+Node: Two-way processors911795
+Node: Printing Messages913917
+Ref: Printing Messages-Footnote-1914994
+Node: Updating `ERRNO'915146
+Node: Accessing Parameters915885
+Node: Symbol Table Access917115
+Node: Symbol table by name917627
+Ref: Symbol table by name-Footnote-1919799
+Node: Symbol table by cookie919879
+Ref: Symbol table by cookie-Footnote-1924008
+Node: Cached values924071
+Ref: Cached values-Footnote-1927272
+Node: Array Manipulation927363
+Ref: Array Manipulation-Footnote-1928461
+Node: Array Data Types928500
+Ref: Array Data Types-Footnote-1931222
+Node: Array Functions931314
+Node: Flattening Arrays935080
+Node: Creating Arrays941911
+Node: Extension API Variables946707
+Node: Extension Versioning947343
+Node: Extension API Informational Variables949244
+Node: Extension API Boilerplate950330
+Node: Finding Extensions954164
+Node: Extension Example954711
+Node: Internal File Description955449
+Node: Internal File Ops959137
+Ref: Internal File Ops-Footnote-1970221
+Node: Using Internal File Ops970361
+Ref: Using Internal File Ops-Footnote-1972717
+Node: Extension Samples972983
+Node: Extension Sample File Functions974426
+Node: Extension Sample Fnmatch982795
+Node: Extension Sample Fork984521
+Node: Extension Sample Ord985735
+Node: Extension Sample Readdir986511
+Node: Extension Sample Revout988849
+Node: Extension Sample Rev2way989442
+Node: Extension Sample Read write array990132
+Node: Extension Sample Readfile992015
+Node: Extension Sample API Tests992770
+Node: Extension Sample Time993295
+Node: gawkextlib994604
+Node: Language History996987
+Node: V7/SVR3.1998509
+Node: SVR41000830
+Node: POSIX1002272
+Node: BTL1003280
+Node: POSIX/GNU1004014
+Node: Common Extensions1009549
+Node: Ranges and Locales1010656
+Ref: Ranges and Locales-Footnote-11015274
+Ref: Ranges and Locales-Footnote-21015301
+Ref: Ranges and Locales-Footnote-31015561
+Node: Contributors1015782
+Node: Installation1020078
+Node: Gawk Distribution1020972
+Node: Getting1021456
+Node: Extracting1022282
+Node: Distribution contents1023974
+Node: Unix Installation1029196
+Node: Quick Installation1029813
+Node: Additional Configuration Options1031775
+Node: Configuration Philosophy1033252
+Node: Non-Unix Installation1035594
+Node: PC Installation1036052
+Node: PC Binary Installation1037351
+Node: PC Compiling1039199
+Node: PC Testing1042143
+Node: PC Using1043319
+Node: Cygwin1047504
+Node: MSYS1048504
+Node: VMS Installation1049018
+Node: VMS Compilation1049621
+Ref: VMS Compilation-Footnote-11050628
+Node: VMS Installation Details1050686
+Node: VMS Running1052321
+Node: VMS Old Gawk1053928
+Node: Bugs1054402
+Node: Other Versions1058254
+Node: Notes1063569
+Node: Compatibility Mode1064156
+Node: Additions1064939
+Node: Accessing The Source1065866
+Node: Adding Code1067292
+Node: New Ports1073334
+Node: Derived Files1077469
+Ref: Derived Files-Footnote-11082777
+Ref: Derived Files-Footnote-21082811
+Ref: Derived Files-Footnote-31083411
+Node: Future Extensions1083509
+Node: Basic Concepts1084996
+Node: Basic High Level1085677
+Ref: figure-general-flow1085948
+Ref: figure-process-flow1086547
+Ref: Basic High Level-Footnote-11089776
+Node: Basic Data Typing1089961
+Node: Glossary1093316
+Node: Copying1118627
+Node: GNU Free Documentation License1156184
+Node: Index1181321
 
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index e4ffc22..9493978 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -17314,6 +17314,47 @@ foo's i=1
 top's i=10
 @end example
 
+Besides scalar values (strings and numbers), you may also have
+local arrays.  By using a parameter name as an array, @command{awk}
+treats it as an array, and it is local to the function.
+In addition, recursive calls create new arrays.
+Consider this example:
+
+@example
+function some_func(p1,      a)
+@{
+    if (p1++ > 3)
+        return
+
+    a[p1] = p1
+
+    some_func(p1)
+
+    printf("At level %d, index %d %s found in a\n",
+         p1, (p1 - 1), (p1 - 1) in a ? "is" : "is not")
+    printf("At level %d, index %d %s found in a\n",
+         p1, p1, p1 in a ? "is" : "is not")
+    print ""
+@}
+
+BEGIN @{
+    some_func(1)
+@}
+@end example
+
+When run, this program produces the following output:
+
+@example
+At level 4, index 3 is not found in a
+At level 4, index 4 is found in a
+
+At level 3, index 2 is not found in a
+At level 3, index 3 is found in a
+
+At level 2, index 1 is not found in a
+At level 2, index 2 is found in a
+@end example
+
 @node Pass By Value/Reference
 @subsubsection Passing Function Arguments By Value Or By Reference
 
@@ -34698,7 +34739,7 @@ wget http://ftp.gnu.org/gnu/@var{package}/@address@hidden@address@hidden
 tar -xpzvf @address@hidden@address@hidden
 cd @address@hidden@address@hidden
 ./configure && make && make check
-make install   # as root
+make install    # as root
 @end example
 
 @item
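
The local-array behavior documented in the commit above can be checked
from the command line. What follows is only a minimal sketch, assuming a
POSIX shell with some `awk' on the PATH; the names `fill', `seen', `n',
`k' and `out' are hypothetical and do not come from the gawk manual. It
counts the elements each recursion level sees in its own array:

```shell
#!/bin/sh
# Sketch (hypothetical names): seen, n and k are extra parameters that
# the caller never passes, so awk treats them as locals. Using seen with
# a subscript makes it a local array, and each recursive call gets a new
# one.
out=$(awk '
function fill(depth,      seen, n, k)
{
    if (depth > 2)
        return

    seen[depth] = 1      # stored only in the array of this call
    fill(depth + 1)      # the callee starts with its own empty array

    n = 0
    for (k in seen)      # count the elements visible at this level
        n++
    printf("depth %d sees %d element(s)\n", depth, n)
}

BEGIN {
    fill(1)
}
')
printf '%s\n' "$out"
```

Under those assumptions, each level should report exactly one element
(depth 2 first, then depth 1), because the element stored by the inner
call is not visible in the outer call's array.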

http://git.sv.gnu.org/cgit/gawk.git/commit/?id=9e49de573a7ea6f84e6577511aec5a5fc1f47cb6

commit 9e49de573a7ea6f84e6577511aec5a5fc1f47cb6
Author: Arnold D. Robbins <address@hidden>
Date:   Tue Nov 6 21:40:09 2012 +0200

    Remove temp API doc after merge into gawk.texi.

diff --git a/doc/api.texi b/doc/api.texi
deleted file mode 100644
index 6d048de..0000000
--- a/doc/api.texi
+++ /dev/null
@@ -1,4103 +0,0 @@
-\input texinfo   @c -*-texinfo-*-
address@hidden %**start of header (This is for running Texinfo on a region.)
address@hidden api.info
address@hidden Writing Extensions For Gawk
address@hidden %**end of header (This is for running Texinfo on a region.)
-
address@hidden Text creation and manipulation
address@hidden
-* Gawk: (gawk).                 A text scanning and processing language.
address@hidden direntry
address@hidden Individual utilities
address@hidden
-* awk: (gawk)Invoking gawk.                     Text scanning and processing.
address@hidden direntry
-
address@hidden xref-automatic-section-title
-
address@hidden The following information should be updated here only!
address@hidden This sets the edition of the document, the version of gawk it
address@hidden applies to and all the info about who's publishing this edition
-
address@hidden These apply across the board.
address@hidden UPDATE-MONTH October, 2012
address@hidden VERSION 4.1
address@hidden PATCHLEVEL 0
-
address@hidden FSF
-
address@hidden TITLE Writing Extensions for Gawk
address@hidden SUBTITLE A Temporary Manual
address@hidden EDITION 1
-
address@hidden
address@hidden DOCUMENT book
address@hidden CHAPTER chapter
address@hidden APPENDIX appendix
address@hidden SECTION section
address@hidden SUBSECTION subsection
address@hidden DARKCORNER @address@hidden,1cm}, @image{rflashlight,1cm}}
address@hidden COMMONEXT (c.e.)
address@hidden iftex
address@hidden
address@hidden DOCUMENT Info file
address@hidden CHAPTER major node
address@hidden APPENDIX major node
address@hidden SECTION minor node
address@hidden SUBSECTION node
address@hidden DARKCORNER (d.c.)
address@hidden COMMONEXT (c.e.)
address@hidden ifinfo
address@hidden
address@hidden DOCUMENT Web page
address@hidden CHAPTER chapter
address@hidden APPENDIX appendix
address@hidden SECTION section
address@hidden SUBSECTION subsection
address@hidden DARKCORNER (d.c.)
address@hidden COMMONEXT (c.e.)
address@hidden ifhtml
address@hidden
address@hidden DOCUMENT book
address@hidden CHAPTER chapter
address@hidden APPENDIX appendix
address@hidden SECTION section
address@hidden SUBSECTION subsection
address@hidden DARKCORNER (d.c.)
address@hidden COMMONEXT (c.e.)
address@hidden ifdocbook
address@hidden
address@hidden DOCUMENT book
address@hidden CHAPTER chapter
address@hidden APPENDIX appendix
address@hidden SECTION section
address@hidden SUBSECTION subsection
address@hidden DARKCORNER (d.c.)
address@hidden COMMONEXT (c.e.)
address@hidden ifplaintext
-
address@hidden some special symbols
address@hidden
address@hidden LEQ @address@hidden
address@hidden PI @address@hidden
address@hidden iftex
address@hidden
address@hidden LEQ <=
address@hidden PI @i{pi}
address@hidden ifnottex
-
address@hidden
address@hidden ii{text}
address@hidden
address@hidden macro
address@hidden ifnottex
-
address@hidden For HTML, spell out email addresses, to avoid problems with
address@hidden address harvesters for spammers.
address@hidden
address@hidden EMAIL{real,spelled}
-``\spelled\''
address@hidden macro
address@hidden ifhtml
address@hidden
address@hidden EMAIL{real,spelled}
address@hidden
address@hidden macro
address@hidden ifnothtml
-
address@hidden FN file name
address@hidden FFN File Name
address@hidden DF data file
address@hidden DDF Data File
address@hidden PVERSION version
address@hidden CTL Ctrl
-
address@hidden
-Some comments on the layout for TeX.
-1. Use at least texinfo.tex 2000-09-06.09
-2. I have done A LOT of work to make this look good. There are  
address@hidden' commands
-   and use of address@hidden ... @end group' in a number of places. If you muck
-   with anything, it's your responsibility not to break the layout.
address@hidden ignore
-
address@hidden merge the function and variable indexes into the concept index
address@hidden
address@hidden fn cp
address@hidden vr cp
address@hidden ifinfo
address@hidden
address@hidden fn cp
address@hidden vr cp
address@hidden iftex
address@hidden
address@hidden fn cp
address@hidden vr cp
address@hidden ifxml
-
address@hidden If "finalout" is commented out, the printed output will show
address@hidden black boxes that mark lines that are too long.  Thus, it is
address@hidden unwise to comment it out when running a master in case there are
address@hidden overfulls which are deemed okay.
-
address@hidden
address@hidden
address@hidden iftex
-
address@hidden
-Copyright @copyright{} 2012
-Free Software Foundation, Inc.
address@hidden 2
-
-This is Edition @value{EDITION} of @address@hidden: @value{SUBTITLE}},
-for the @address@hidden (or later) version of the GNU
-implementation of AWK.
-
-Permission is granted to copy, distribute and/or modify this document
-under the terms of the GNU Free Documentation License, Version 1.3 or
-any later version published by the Free Software Foundation; with the
-Invariant Sections being ``GNU General Public License'', the Front-Cover
-texts being (a) (see below), and with the Back-Cover Texts being (b)
-(see below).  A copy of the license is included in the section entitled
-``GNU Free Documentation License''.
-
address@hidden a
address@hidden
-``A GNU Manual''
-
address@hidden
-``You have the freedom to
-copy and modify this GNU manual.  Buying copies from the FSF
-supports it in developing GNU and promoting software freedom.''
address@hidden enumerate
address@hidden copying
-
address@hidden Comment out the "smallbook" for technical review.  Saves
address@hidden considerable paper.  Remember to turn it back on *before*
address@hidden starting the page-breaking work.
-
address@hidden 4/2002: Karl Berry recommends commenting out this and the
address@hidden address@hidden odd', and letting users use `texi2dvi -t'
address@hidden if they want to waste paper.
address@hidden @smallbook
-
-
address@hidden Uncomment this for the release.  Leaving it off saves paper
address@hidden during editing and review.
address@hidden odd
-
address@hidden
address@hidden @value{TITLE}
address@hidden @value{SUBTITLE}
address@hidden Edition @value{EDITION}
address@hidden @value{UPDATE-MONTH}
address@hidden Arnold D. Robbins
-
address@hidden Include the Distribution inside the titlepage environment so
address@hidden that headings are turned off.  Headings on and off do not work.
-
address@hidden
address@hidden 0pt plus 1filll
-``To boldly go where no man has gone before'' is a
-Registered Trademark of Paramount Pictures Corporation. @*
address@hidden sorry, i couldn't resist
address@hidden 3
-Published by:
address@hidden 1
-
-Free Software Foundation @*
-51 Franklin Street, Fifth Floor @*
-Boston, MA  02110-1301 USA @*
-Phone: +1-617-542-5942 @*
-Fax: +1-617-542-2652 @*
-Email: @email{gnu@@gnu.org} @*
-URL: @uref{http://www.gnu.org/} @*
-
address@hidden This one is correct for gawk 3.1.0 from the FSF
-ISBN 1-882114-28-0 @*
address@hidden 2
address@hidden
address@hidden titlepage
-
address@hidden
address@hidden Top
address@hidden Top Node
-
-Fake top node.
-
address@hidden
-
address@hidden ifnottex
-
address@hidden
-* Extension API::               Writing Extensions for @command{gawk}.
-* Fake Chapter::                Fake Sections For Cross References.
-
address@hidden
-* Extension Intro::                     What is an extension.
-* Plugin License::                      A note about licensing.
-* Extension Design::                    Design notes about the extension API.
-* Old Extension Problems::              Problems with the old mechanism.
-* Extension New Mechanism Goals::       Goals for the new mechanism.
-* Extension Other Design Decisions::    Some other design decisions.
-* Extension Mechanism Outline::         An outline of how it works.
-* Extension Future Growth::             Some room for future growth.
-* Extension API Description::           A full description of the API.
-* Extension API Functions Introduction:: Introduction to the API functions.
-* General Data Types::                  The data types.
-* Requesting Values::                   How to get a value.
-* Constructor Functions::               Functions for creating values.
-* Registration Functions::              Functions to register things with
-                                        @command{gawk}.
-* Extension Functions::                 Registering extension functions.
-* Input Parsers::                       Registering an input parser.
-* Output Wrappers::                     Registering an output wrapper.
-* Two-way processors::                  Registering a two-way processor.
-* Exit Callback Functions::             Registering an exit callback.
-* Extension Version String::            Registering a version string.
-* Printing Messages::                   Functions for printing messages.
-* Updating @code{ERRNO}::               Functions for updating @code{ERRNO}.
-* Accessing Parameters::                Functions for accessing parameters.
-* Symbol Table Access::                 Functions for accessing global
-                                        variables.
-* Symbol table by name::                Accessing variables by name.
-* Symbol table by cookie::              Accessing variables by ``cookie''.
-* Cached values::                       Creating and using cached values.
-* Array Manipulation::                  Functions for working with arrays.
-* Array Data Types::                    Data types for working with arrays.
-* Array Functions::                     Functions for working with arrays.
-* Flattening Arrays::                   How to flatten arrays.
-* Creating Arrays::                     How to create and populate arrays.
-* Extension API Variables::             Variables provided by the API.
-* Extension Versioning::                API Version information.
-* Extension API Informational Variables:: Variables providing information about
-                                        @command{gawk}'s invocation.
-* Extension API Boilerplate::           Boilerplate code for using the API.
-* Finding Extensions::                   How @command{gawk} finds compiled
-                                        extensions.
-* Extension Example::                   Example C code for an extension.
-* Internal File Description::           What the new functions will do.
-* Internal File Ops::                   The code for internal file operations.
-* Using Internal File Ops::             How to use an external extension.
-* Extension Samples::                   The sample extensions that ship with
-                                        @code{gawk}.
-* Extension Sample File Functions::     The file functions sample.
-* Extension Sample Fnmatch::            An interface to @code{fnmatch()}.
-* Extension Sample Fork::               An interface to @code{fork()} and
-                                        other process functions.
-* Extension Sample Ord::                Character to value to character
-                                        conversions.
-* Extension Sample Readdir::            An interface to @code{readdir()}.
-* Extension Sample Revout::             Reversing output sample output
-                                        wrapper.
-* Extension Sample Rev2way::            Reversing data sample two-way
-                                        processor.
-* Extension Sample Read write array::   Serializing an array to a file.
-* Extension Sample Readfile::           Reading an entire file into a string.
-* Extension Sample API Tests::          Tests for the API.
-* Extension Sample Time::               An interface to @code{gettimeofday()}
-                                        and @code{sleep()}.
-* gawkextlib::                          The @code{gawkextlib} project.
-* Reference to Elements::               Referring to an Array Element.
-* Built-in::                            Built-in Functions.
-* Built-in Variables::                  Built-in Variables.
-* Options::                             Command-Line Options.
address@hidden detailmenu
address@hidden menu
-
address@hidden
-
address@hidden Extension API
address@hidden Writing Extensions for @command{gawk}
-
-It is possible to add new built-in functions to @command{gawk} using
-dynamically loaded libraries. This facility is available on systems (such
-as GNU/Linux) that support the C @code{dlopen()} and @code{dlsym()}
-functions.  This @value{CHAPTER} describes how to create extensions
-using code written in C or C++.  If you don't know anything about C
-programming, you can safely skip this @value{CHAPTER}, although you
-may wish to review the documentation on the extensions that come with
address@hidden (@pxref{Extension Samples}), and the section on the
address@hidden project (@pxref{gawkextlib}).
-
address@hidden NOTE
-When @option{--sandbox} is specified, extensions are disabled
-(@pxref{Options}).
address@hidden quotation
-
address@hidden
-* Extension Intro::             What is an extension.
-* Plugin License::              A note about licensing.
-* Extension Design::            Design notes about the extension API.
-* Extension API Description::   A full description of the API.
-* Extension Example::           Example C code for an extension.
-* Extension Samples::           The sample extensions that ship with
-                                @code{gawk}.
-* gawkextlib::                  The @code{gawkextlib} project.
address@hidden menu
-
address@hidden Extension Intro
address@hidden Introduction
-
-An @dfn{extension} (sometimes called a @dfn{plug-in}) is a piece of
-external compiled code that @command{gawk} can load at runtime to
-provide additional functionality, over and above the built-in capabilities
-described in the rest of this @value{DOCUMENT}.
-
-Extensions are useful because they allow you (of course) to extend
address@hidden's functionality. For example, they can provide access to
-system calls (such as @code{chdir()} to change directory) and to other
-C library routines that could be of use.  As with most software,
-``the sky is the limit;'' if you can imagine something that you might
-want to do and can write in C or C++, you can write an extension to do it!
-
-Extensions are written in C or C++, using the @dfn{Application Programming
-Interface} (API) defined for this purpose by the @command{gawk}
-developers.  The rest of this @value{CHAPTER} explains the design
-decisions behind the API, the facilities it provides and how to use
-them, and presents a small sample extension.  In addition, it documents
-the sample extensions included in the @command{gawk} distribution,
-and describes the @code{gawkextlib} project.
-
address@hidden Plugin License
address@hidden Extension Licensing
-
-Every dynamic extension should define the global symbol
address@hidden to assert that it has been licensed under
-a GPL-compatible license.  If this symbol does not exist, @command{gawk}
-emits a fatal error and exits when it tries to load your extension.
-
-The declared type of the symbol should be @code{int}.  It does not need
-to be in any allocated section, though.  The code merely asserts that
-the symbol exists in the global scope.  Something like this is enough:
-
address@hidden
-int plugin_is_GPL_compatible;
address@hidden example
-
address@hidden Extension Design
address@hidden Extension API Design
-
-The first version of extensions for @command{gawk} was developed in
-the mid-1990s and released with @command{gawk} 3.1 in the late 1990s.
-The basic mechanisms and design remained unchanged for close to 15 years,
-until 2012.
-
-The old extension mechanism used data types and functions from
address@hidden itself, with a ``clever hack'' to install extension
-functions.
-
address@hidden included some sample extensions, of which a few were
-really useful.  However, it was clear from the outset that the extension
-mechanism was bolted onto the side and was not really thought out.
-
address@hidden
-* Old Extension Problems::           Problems with the old mechanism.
-* Extension New Mechanism Goals::    Goals for the new mechanism.
-* Extension Other Design Decisions:: Some other design decisions.
-* Extension Mechanism Outline::      An outline of how it works.
-* Extension Future Growth::          Some room for future growth.
address@hidden menu
-
address@hidden Old Extension Problems
address@hidden Problems With The Old Mechanism
-
-The old extension mechanism had several problems:
-
address@hidden @bullet
address@hidden
-It depended heavily upon @command{gawk} internals.  Any time the
address@hidden address@hidden critical central data structure
-inside @command{gawk}.} changed, an extension would have to be
-recompiled. Furthermore, to really write extensions required understanding
-something about @command{gawk}'s internal functions.  There was some
-documentation in this @value{DOCUMENT}, but it was quite minimal.
-
address@hidden
-Being able to call into @command{gawk} from an extension required linker
-facilities that are common on Unix-derived systems but that did
-not work on Windows systems; users wanting extensions on Windows
-had to statically link them into @command{gawk}, even though Windows supports
-dynamic loading of shared objects.
-
address@hidden
-The API would change occasionally as @command{gawk} changed; no compatibility
-between versions was ever offered or planned for.
address@hidden itemize
-
-Despite the drawbacks, the @command{xgawk} project developers forked
address@hidden and developed several significant extensions. They also
-enhanced @command{gawk}'s facilities relating to file inclusion and
-shared object access.
-
-A new API was desired for a long time, but only in 2012 did the
address@hidden maintainer and the @command{xgawk} developers finally
-start working on it together.  More information about the @command{xgawk}
-project is provided in @ref{gawkextlib}.
-
address@hidden Extension New Mechanism Goals
address@hidden Goals For A New Mechanism
-
-Some goals for the new API were:
-
address@hidden @bullet
address@hidden
-The API should be independent of @command{gawk} internals.  Changes in
address@hidden internals should not be visible to the writer of an
-extension function.
-
address@hidden
-The API should provide @emph{binary} compatibility across @command{gawk}
-releases as long as the API itself does not change.
-
address@hidden
-The API should enable extensions written in C to have roughly the
-same ``appearance'' to @command{awk}-level code as @command{awk}
-functions do. This means that extensions should have:
-
address@hidden @minus
address@hidden
-The ability to access function parameters.
-
address@hidden
-The ability to turn an undefined parameter into an array (call by reference).
-
address@hidden
-The ability to create, access and update global variables.
-
address@hidden
-Easy access to all the elements of an array at once (``array flattening'')
-in order to loop over all the elements in an easy fashion for C code.
-
address@hidden
-The ability to create arrays (including @command{gawk}'s true
-multi-dimensional arrays).
address@hidden itemize
address@hidden itemize
-
-Some additional important goals were:
-
address@hidden @bullet
address@hidden
-The API should use only features in ISO C 90, so that extensions
-can be written using the widest range of C and C++ compilers. The header
-should include the appropriate @samp{#ifdef __cplusplus} and @samp{extern "C"}
-magic so that a C++ compiler could be used.  (If using C++, the runtime
-system has to be smart enough to call any constructors and destructors,
-as @command{gawk} is a C program. As of this writing, this has not been
-tested.)
-
address@hidden
-The API mechanism should not require access to @command{gawk}'s
address@hidden @dfn{symbols} are the variables and functions
-defined inside @command{gawk}.  Access to these symbols by code
-external to @command{gawk} loaded dynamically at runtime is
-problematic on Windows.} by the compile-time or dynamic linker,
-in order to enable creation of extensions that also work on Windows.
address@hidden itemize
-
-During development, it became clear that there were other features
-that should be available to extensions, which were also subsequently
-provided:
-
address@hidden @bullet
address@hidden
-Extensions should have the ability to hook into @command{gawk}'s
-I/O redirection mechanism.  In particular, the @command{xgawk}
-developers provided a so-called ``open hook'' to take over reading
-records.  During development, this was generalized to allow
-extensions to hook into input processing, output processing, and
-two-way I/O.
-
address@hidden
-An extension should be able to provide a ``call back'' function
-to perform clean up actions when @command{gawk} exits.
-
address@hidden
-An extension should be able to provide a version string so that
address@hidden's @option{--version} option can provide information
-about extensions as well.
address@hidden itemize
-
address@hidden Extension Other Design Decisions
address@hidden Other Design Decisions
-
-As an ``arbitrary'' design decision, extensions can read the values of
-built-in variables and arrays (such as @code{ARGV} and @code{FS}), but cannot
-change them, with the exception of @code{PROCINFO}.
-
-The reason for this is to prevent an extension function from affecting
-the flow of an @command{awk} program outside its control.  While a real
address@hidden function can do what it likes, that is at the discretion
-of the programmer.  An extension function should provide a service or
-make a C API available for use within @command{awk}, and not mess with
address@hidden or @code{ARGC} and @code{ARGV}.
-
-In addition, it becomes easy to start down a slippery slope. How
-much access to @command{gawk} facilities do extensions need?
-Do they need @code{getline}?  What about calling @code{gsub()} or
-compiling regular expressions?  What about calling into @command{awk}
-functions? (@emph{That} would be messy.)
-
-In order to avoid these issues, the @command{gawk} developers chose
-to start with the simplest, most basic features that are still truly useful.
-
-Another decision is that although @command{gawk} provides nice things like
-MPFR, and arrays indexed internally by integers, these features are not
-being brought out to the API in order to keep things simple and close to
-traditional @command{awk} semantics.  (In fact, arrays indexed internally
-by integers are so transparent that they aren't even documented!)
-
-With time, the API will undoubtedly evolve; the @command{gawk} developers
-expect this to be driven by user needs. For now, the current API seems
-to provide a minimal yet powerful set of features for creating extensions.
-
address@hidden Extension Mechanism Outline
address@hidden At A High Level How It Works
-
-The requirement to avoid access to @command{gawk}'s symbols is, at first
-glance, a difficult one to meet.
-
-One design, apparently used by Perl and Ruby and maybe others, would
-be to make the mainline @command{gawk} code into a library, with the
address@hidden utility a small C @code{main()} function linked against
-the library.
-
-This seemed like the tail wagging the dog, complicating build and
-installation and making a simple copy of the @command{gawk} executable
-from one system to another (or one place to another on the same
-system!) into a chancy operation.
-
-Pat Rankin suggested the solution that was adopted. Communication between
address@hidden and an extension is two-way.  First, when an extension
-is loaded, it is passed a pointer to a @code{struct} whose fields are
-function pointers.
address@hidden
-This is shown in @ref{load-extension}.
address@hidden iftex
-
address@hidden Figure,load-extension
address@hidden the extension}
address@hidden
address@hidden @image{api-figure1, , , Loading the extension, txt}
address@hidden ifinfo
address@hidden
address@hidden @image{api-figure1, , , Loading the extension, png}
address@hidden ifhtml
address@hidden
address@hidden
address@hidden @image{api-figure1, , , Loading the extension}
address@hidden ifnothtml
address@hidden ifnotinfo
address@hidden float
-
-The extension can call functions inside @command{gawk} through these
-function pointers, at runtime, without needing (link-time) access
-to @command{gawk}'s symbols.  One of these function pointers is to a
-function for ``registering'' new built-in functions.
address@hidden
-This is shown in @ref{load-new-function}.
address@hidden iftex
-
address@hidden Figure,load-new-function
address@hidden the new function}
address@hidden
address@hidden @image{api-figure2, , , Loading the new function, txt}
address@hidden ifinfo
address@hidden
address@hidden @image{api-figure2, , , Loading the new function, png}
address@hidden ifhtml
address@hidden
address@hidden
address@hidden @image{api-figure2, , , Loading the new function}
address@hidden ifnothtml
address@hidden ifnotinfo
address@hidden float
-
-In the other direction, the extension registers its new functions
-with @command{gawk} by passing function pointers to the functions that
-provide the new feature (@code{do_chdir()}, for example).  @command{gawk}
-associates the function pointer with a name and can then call it, using a
-defined calling convention.
address@hidden
-This is shown in @ref{call-new-function}.
address@hidden iftex
-
address@hidden Figure,call-new-function
address@hidden the new function}
address@hidden
address@hidden @image{api-figure3, , , Calling the new function, txt}
address@hidden ifinfo
address@hidden
address@hidden @image{api-figure3, , , Calling the new function, png}
address@hidden ifhtml
address@hidden
address@hidden
address@hidden @image{api-figure3, , , Calling the new function}
address@hidden ifnothtml
address@hidden ifnotinfo
address@hidden float
-
-The @address@hidden()} function, in turn, then uses the function
-pointers in the API @code{struct} to do its work, such as updating
-variables or arrays, printing messages, setting @code{ERRNO}, and so on.
-
-Convenience macros in the @file{gawkapi.h} header file make calling
-through the function pointers look like regular function calls so that
-extension code is quite readable and understandable.
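The mechanism outlined above can be sketched in miniature. The following is an illustrative model only, not the real @file{gawkapi.h}: the names @code{host_api_t}, @code{ext_id_t}, @code{ext_add_func}, @code{ext_warn}, @code{host_call} and @code{demo_load} are hypothetical, and @code{do_chdir} is a stub that changes no directory. It shows the host passing a table of function pointers to the extension, the extension registering a function back through that table, and macros hiding the indirection.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the real gawkapi.h types. */
typedef void *ext_id_t;
typedef int (*ext_func_t)(int nargs);

/* The table of function pointers that the host hands to an extension. */
typedef struct {
    int  (*add_func)(ext_id_t id, const char *name, ext_func_t fn);
    void (*warn)(ext_id_t id, const char *msg);
} host_api_t;

/* ---- Host side: a tiny registry of named extension functions ---- */
#define MAX_FUNCS 8
static struct { const char *name; ext_func_t fn; } registry[MAX_FUNCS];
static int n_registered;

static int host_add_func(ext_id_t id, const char *name, ext_func_t fn)
{
    (void) id;                      /* identifies the caller; unused here */
    if (n_registered >= MAX_FUNCS)
        return 0;
    registry[n_registered].name = name;
    registry[n_registered].fn = fn;
    n_registered++;
    return 1;
}

static void host_warn(ext_id_t id, const char *msg)
{
    (void) id;
    fprintf(stderr, "warning: %s\n", msg);
}

/* Look up a registered function by name and call it; -2 if unknown. */
static int host_call(const char *name, int nargs)
{
    int i;

    for (i = 0; i < n_registered; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].fn(nargs);
    return -2;
}

/* ---- Extension side: sees only the pointer table and its id ---- */
static const host_api_t *api;
static ext_id_t ext_id;

/* Convenience macros make indirect calls read like ordinary calls. */
#define ext_add_func(name, fn)  (api->add_func(ext_id, (name), (fn)))
#define ext_warn(msg)           (api->warn(ext_id, (msg)))

static int do_chdir(int nargs)
{
    if (nargs != 1) {
        ext_warn("chdir: expected exactly one argument");
        return -1;
    }
    return 0;   /* a real extension would call chdir() here */
}

static int ext_load(const host_api_t *api_p, ext_id_t id)
{
    api = api_p;
    ext_id = id;
    return ext_add_func("chdir", do_chdir);
}

/* Wire the two sides together, as the host would at load time. */
static int demo_load(void)
{
    static const host_api_t host_api = { host_add_func, host_warn };
    static int id_token;            /* any unique address serves as an id */

    return ext_load(&host_api, &id_token);
}
```

In this sketch the extension never refers to a host symbol by name; everything it needs arrives through the pointer passed to @code{ext_load}, which is the property that removes the need for link-time access to the host's symbols.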
-
-Although all of this sounds medium complicated, the result is that
-extension code is quite clean and straightforward. This can be seen in
-the sample extensions @file{filefuncs.c} (@pxref{Extension Example})
-and also the @file{testext.c} code for testing the APIs.
-
-Some other bits and pieces:
-
address@hidden @bullet
address@hidden
-The API provides access to @command{gawk}'s @address@hidden values,
-reflecting command line options, like @code{do_lint}, @code{do_profiling}
-and so on (@pxref{Extension API Variables}).
-These are informational: an extension cannot affect these
-inside @command{gawk}.  In addition, attempting to assign to them
-produces a compile-time error.
-
address@hidden
-The API also provides major and minor version numbers, so that an
-extension can check if the @command{gawk} it is loaded with supports the
-facilities it was compiled with.  (Version mismatches ``shouldn't''
-happen, but we all know how @emph{that} goes.)
address@hidden Versioning}, for details.
address@hidden itemize
-
address@hidden Extension Future Growth
address@hidden Room For Future Growth
-
-The API provides room for future growth, in two ways.
-
-An ``extension id'' is passed into the extension when it's loaded. This
-extension id is then passed back to @command{gawk} with each function
-call.  This allows @command{gawk} to identify the extension calling into it,
-should it need to know.
-
-A ``name space'' is passed into @command{gawk} when an extension function
-is registered.  This provides for a future mechanism for grouping
-extension functions and possibly avoiding name conflicts.
-
-Of course, as of this writing, no decisions have been made with respect
-to any of the above.
-
address@hidden Extension API Description
address@hidden API Description
-
-This (rather large) @value{SECTION} describes the API in detail.
-
address@hidden
-* Extension API Functions Introduction:: Introduction to the API functions.
-* General Data Types::                   The data types.
-* Requesting Values::                    How to get a value.
-* Constructor Functions::                Functions for creating values.
-* Registration Functions::               Functions to register things with
-                                         @command{gawk}.
-* Printing Messages::                    Functions for printing messages.
-* Updating @code{ERRNO}::                Functions for updating @code{ERRNO}.
-* Accessing Parameters::                 Functions for accessing parameters.
-* Symbol Table Access::                  Functions for accessing global
-                                         variables.
-* Array Manipulation::                   Functions for working with arrays.
-* Extension API Variables::              Variables provided by the API.
-* Extension API Boilerplate::            Boilerplate code for using the API.
-* Finding Extensions::                   How @command{gawk} finds compiled
-                                         extensions.
address@hidden menu
-
address@hidden Extension API Functions Introduction
address@hidden Introduction
-
-Access to facilities within @command{gawk} is made available
-by calling through function pointers passed into your extension.
-
-API function pointers are provided for the following kinds of operations:
-
address@hidden @bullet
address@hidden
-Registration functions. You may register:
address@hidden @minus
address@hidden
-extension functions,
address@hidden
-exit callbacks,
address@hidden
-a version string,
address@hidden
-input parsers,
address@hidden
-output wrappers,
address@hidden
-and two-way processors.
address@hidden itemize
-All of these are discussed in detail, later in this @value{CHAPTER}.
-
address@hidden
-Printing fatal, warning, and ``lint'' warning messages.
-
address@hidden
-Updating @code{ERRNO}, or unsetting it.
-
address@hidden
-Accessing parameters, including converting an undefined parameter into
-an array.
-
address@hidden
-Symbol table access: retrieving a global variable, creating one,
-or changing one.  This also includes the ability to create a scalar
-variable that will be @emph{constant} within @command{awk} code.
-
address@hidden
-Creating and releasing cached values; this provides an
-efficient way to use values for multiple variables and
-can be a big performance win.
-
address@hidden
-Manipulating arrays:
address@hidden @minus
address@hidden
-Retrieving, adding, deleting, and modifying elements
address@hidden
-Getting the count of elements in an array
address@hidden
-Creating a new array
address@hidden
-Clearing an array
address@hidden
-Flattening an array for easy C style looping over all its indices and elements
address@hidden itemize
address@hidden itemize
-
-Some points about using the API:
-
address@hidden @bullet
address@hidden
-You must include @code{<sys/types.h>} and @code{<sys/stat.h>} before including
-the @file{gawkapi.h} header file. In addition, you must include either
address@hidden<stddef.h>} or @code{<stdlib.h>} to get the definition of @code{size_t}.
-If you wish to use the boilerplate @code{dl_load_func()} macro, you will
-need to include @code{<stdio.h>} as well.
-Finally, to pass reasonable integer values for @code{ERRNO}, you
-will need to include @code{<errno.h>}.
-
address@hidden
-Although the API only uses ISO C 90 features, there is an exception; the
-``constructor'' functions use the @code{inline} keyword. If your compiler
-does not support this keyword, you should either place
address@hidden''} on your command line, or use the GNU Autotools and include a
address@hidden file in your extensions.
-
address@hidden
-All pointers filled in by @command{gawk} are to memory
-managed by @command{gawk} and should be treated by the extension as
-read-only.  Memory for @emph{all} strings passed into @command{gawk}
-from the extension @emph{must} come from @code{malloc()} and is managed
-by @command{gawk} from then on.
-
address@hidden
-The API defines several simple structs that map values as seen
-from @command{awk}.  A value can be a @code{double}, a string, or an
-array (as in multidimensional arrays, or when creating a new array).
-Strings maintain both pointer and length since embedded @code{NUL}
-characters are allowed.
-
-By intent, strings are maintained using the current multibyte encoding (as
-defined by @address@hidden environment variables) and not using wide
-characters.  This matches how @command{gawk} stores strings internally
-and also how characters are likely to be input and output from files.
-
address@hidden
-When retrieving a value (such as a parameter or that of a global variable
-or array element), the extension requests a specific type (number, string,
-scalars, value cookie, array, or ``undefined'').  When the request is
-``undefined,'' the returned value will have the real underlying type.
-
-However, if the request and actual type don't match, the access function
-returns ``false'' and fills in the type of the actual value that is there,
-so that the extension can, e.g., print an error message
-(``scalar passed where array expected'').
-
address@hidden This is documented in the header file and needs some expanding upon.
address@hidden The table there should be presented here
address@hidden itemize
-
-While you may call the API functions by using the function pointers
-directly, the interface is not so pretty. To make extension code look
-more like regular code, the @file{gawkapi.h} header file defines a number
-of macros which you should use in your code.  This @value{SECTION} presents
-the macros as if they were functions.
-
address@hidden General Data Types
address@hidden General Purpose Data Types
-
address@hidden
address@hidden have a true love/hate relationship with address@hidden
-Arnold Robbins
-
address@hidden's the thing about unions: the compiler will arrange things so they
-can accommodate both love and address@hidden
-Chet Ramey
address@hidden quotation
-
-The extension API defines a number of simple types and structures for general
-purpose use. Additional, more specialized, data structures, are introduced
-in subsequent @value{SECTION}s, together with the functions that use them.
-
address@hidden @code
address@hidden typedef void *awk_ext_id_t;
-A value of this type is received from @command{gawk} when an extension is loaded.
-That value must then be passed back to @command{gawk} as the first parameter of
-each API function.
-
address@hidden #define awk_const @dots{}
-This macro expands to @samp{const} when compiling an extension,
-and to nothing when compiling @command{gawk} itself.  This makes
-certain fields in the API data structures unwritable from extension code,
-while allowing @command{gawk} to use them as it needs to.
-
address@hidden typedef int awk_bool_t;
-A simple boolean type. At the moment, the API does not define special
-``true'' and ``false'' values, although perhaps it should.
-
address@hidden typedef struct @{
address@hidden @ @ @ @ char *str;@ @ @ @ @ @ /* data */
address@hidden @ @ @ @ size_t len;@ @ @ @ @ /* length thereof, in chars */
address@hidden @} awk_string_t;
-This represents a mutable string. @command{gawk}
-owns the memory pointed to if it supplied
-the value. Otherwise, it takes ownership of the memory pointed to.
address@hidden memory must come from @code{malloc()}!}
-
-As mentioned earlier, strings are maintained using the current
-multibyte encoding.
-
address@hidden typedef enum @{
address@hidden @ @ @ @ AWK_UNDEFINED,
address@hidden @ @ @ @ AWK_NUMBER,
address@hidden @ @ @ @ AWK_STRING,
address@hidden @ @ @ @ AWK_ARRAY,
address@hidden @ @ @ @ AWK_SCALAR,@ @ @ @ @ @ @ @ @ /* opaque access to a variable */
address@hidden @ @ @ @ AWK_VALUE_COOKIE@ @ @ /* for updating a previously created value */
address@hidden @} awk_valtype_t;
-This @code{enum} indicates the type of a value.
-It is used in the following @code{struct}.
-
address@hidden typedef struct @{
address@hidden @ @ @ @ awk_valtype_t   val_type;
address@hidden @ @ @ @ union @{
address@hidden @ @ @ @ @ @ @ @ awk_string_t@ @ @ @ @ @ @ s;
address@hidden @ @ @ @ @ @ @ @ double@ @ @ @ @ @ @ @ @ @ @ @ @ d;
address@hidden @ @ @ @ @ @ @ @ awk_array_t@ @ @ @ @ @ @ @ a;
address@hidden @ @ @ @ @ @ @ @ awk_scalar_t@ @ @ @ @ @ @ scl;
address@hidden @ @ @ @ @ @ @ @ awk_value_cookie_t@ vc;
address@hidden @ @ @ @ @} u;
address@hidden @} awk_value_t;
-An address@hidden value.''  
-The @code{val_type} member indicates what kind of value the
address@hidden holds, and each member is of the appropriate type.
-
address@hidden #define str_value@ @ @ @ @ @ u.s
address@hidden #define num_value@ @ @ @ @ @ u.d
address@hidden #define array_cookie@ @ @ u.a
address@hidden #define scalar_cookie@ @ u.scl
address@hidden #define value_cookie@ @ @ u.vc
-These macros make accessing the fields of the @code{awk_value_t} more
-readable.
-
address@hidden typedef void *awk_scalar_t;
-Scalars can be represented as an opaque type. These values are obtained from
address@hidden and then passed back into it. This is discussed in a general fashion below,
-and in more detail in @ref{Symbol table by cookie}.
-
address@hidden typedef void *awk_value_cookie_t;
-A ``value cookie'' is an opaque type representing a cached value.
-This is also discussed in a general fashion below,
-and in more detail in @ref{Cached values}.
-
address@hidden table
-
-Scalar values in @command{awk} are either numbers or strings. The
address@hidden struct represents values.  The @code{val_type} member
-indicates what is in the @code{union}.
-
-Representing numbers is easy---the API uses a C @code{double}.  Strings
-require more work. Since @command{gawk} allows embedded @code{NUL} bytes
-in string values, a string must be represented as a pair containing a
-data-pointer and length. This is the @code{awk_string_t} type.
-
-Identifiers (i.e., the names of global variables) can be associated
-with either scalar values or with arrays.  In addition, @command{gawk}
-provides true arrays of arrays, where any given array element can
-itself be an array.  Discussion of arrays is delayed until
address@hidden Manipulation}.
-
-The various macros listed earlier make it easier to use the elements
-of the @code{union} as if they were fields in a @code{struct}; this
-is a common coding practice in C.  Such code is easier to write and to
-read; however, it remains @emph{your} responsibility to make sure that
-the @code{val_type} member correctly reflects the type of the value in
-the @code{awk_value_t}.
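A short sketch may help make the tagged-union discipline concrete. The declarations below are local copies of the types quoted above, so that the fragment compiles without @file{gawkapi.h}. The @code{typedef} for @code{awk_array_t} is an assumption (an opaque pointer), and the helpers @code{set_number}, @code{set_string} and @code{type_name} are hypothetical names used only for illustration; they are not the API's constructor functions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local copies of the declarations quoted in the text. */
typedef struct {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

typedef enum {
    AWK_UNDEFINED, AWK_NUMBER, AWK_STRING,
    AWK_ARRAY, AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;

typedef void *awk_array_t;          /* assumption: opaque pointer */
typedef void *awk_scalar_t;
typedef void *awk_value_cookie_t;

typedef struct {
    awk_valtype_t val_type;
    union {
        awk_string_t        s;
        double              d;
        awk_array_t         a;
        awk_scalar_t        scl;
        awk_value_cookie_t  vc;
    } u;
} awk_value_t;

#define str_value   u.s
#define num_value   u.d

/* Hypothetical helper: store a number and tag the union to match. */
static awk_value_t *set_number(double d, awk_value_t *v)
{
    v->val_type = AWK_NUMBER;
    v->num_value = d;
    return v;
}

/*
 * Hypothetical helper: store a string.  The data is copied into
 * malloc()ed memory, and the explicit length lets embedded NUL
 * bytes survive.  Returns NULL if the allocation fails.
 */
static awk_value_t *set_string(const char *data, size_t len, awk_value_t *v)
{
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);
    copy[len] = '\0';
    v->val_type = AWK_STRING;
    v->str_value.str = copy;
    v->str_value.len = len;
    return v;
}

/* Always consult val_type before touching a union member. */
static const char *type_name(const awk_value_t *v)
{
    switch (v->val_type) {
    case AWK_NUMBER:        return "number";
    case AWK_STRING:        return "string";
    case AWK_ARRAY:         return "array";
    case AWK_SCALAR:        return "scalar cookie";
    case AWK_VALUE_COOKIE:  return "value cookie";
    default:                return "undefined";
    }
}
```

Each helper sets @code{val_type} in the same place that it writes the union member, which keeps the tag and the stored value from drifting apart.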
-
-Conceptually, the first three members of the @code{union} (number, string,
-and array) are all that is needed for working with @command{awk} values.
-However, since the API provides routines for accessing and changing
-the value of global scalar variables only by using the variable's name,
-there is a performance penalty: @command{gawk} must find the variable
-each time it is accessed and changed.  This turns out to be a real issue,
-not just a theoretical one.
-
-Thus, if you know that your extension will spend considerable time
-reading and/or changing the value of one or more scalar variables, you
-can obtain a @dfn{scalar address@hidden
address@hidden://catb.org/jargon/html/C/cookie.html, the ``cookie'' entry in the Jargon file} for a
-definition of @dfn{cookie}, and @uref{http://catb.org/jargon/html/M/magic-cookie.html,
-the ``magic cookie'' entry in the Jargon file} for a nice example. See
-also the entry for ``Cookie'' in the @ref{Glossary}.}
-object for that variable, and then use
-the cookie for getting the variable's value or for changing the variable's
-value.
-This is the @code{awk_scalar_t} type and @code{scalar_cookie} macro.
-Given a scalar cookie, @command{gawk} can directly retrieve or
-modify the value, as required, without having to first find it.
-
-The @code{awk_value_cookie_t} type and @code{value_cookie} macro are similar.
-If you know that you wish to
-use the same numeric or string @emph{value} for one or more variables,
-you can create the value once, retaining a @dfn{value cookie} for it,
-and then pass in that value cookie whenever you wish to set the value of a
-variable.  This saves both storage space within the running @command{gawk}
-process as well as the time needed to create the value.
-
address@hidden Requesting Values
address@hidden Requesting Values
-
-All of the functions that return values from @command{gawk}
-work in the same way. You pass in an @code{awk_valtype_t} value
-to indicate what kind of value you expect.  If the actual value
-matches what you requested, the function returns true and fills
-in the @code{awk_value_t} result.
-Otherwise, the function returns false, and the @code{val_type}
-member indicates the type of the actual value.  You may then
-print an error message, or reissue the request for the actual
-value type, as appropriate.  This behavior is summarized in
-@ref{table-value-types-returned}.
-
-@ifnotplaintext
-@float Table,table-value-types-returned
-@caption{Value Types Returned}
-@multitable @columnfractions .50 .50
-@headitem @tab Type of Actual Value:
-@end multitable
-@multitable @columnfractions .166 .166 .198 .15 .15 .166
-@headitem @tab @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{String} @tab String @tab String @tab false @tab false
-@item @tab @b{Number} @tab Number if can be converted, else false @tab Number @tab false @tab false
-@item @b{Type} @tab @b{Array} @tab false @tab false @tab Array @tab false
-@item @b{Requested:} @tab @b{Scalar} @tab Scalar @tab Scalar @tab false @tab false
-@item @tab @b{Undefined} @tab String @tab Number @tab Array @tab Undefined
-@item @tab @b{Value Cookie} @tab false @tab false @tab false @tab false
-@end multitable
-@end float
-@end ifnotplaintext
-@ifplaintext
-@float Table,table-value-types-returned
-@caption{Value Types Returned}
-@example
-                        +-------------------------------------------------+
-                        |                Type of Actual Value:            |
-                        +------------+------------+-----------+-----------+
-                        |   String   |   Number   | Array     | Undefined |
-+-----------+-----------+------------+------------+-----------+-----------+
-|           | String    |   String   |   String   | false     | false     |
-|           |-----------+------------+------------+-----------+-----------+
-|           | Number    | Number if  |   Number   | false     | false     |
-|           |           | can be     |            |           |           |
-|           |           | converted, |            |           |           |
-|           |           | else false |            |           |           |
-|           |-----------+------------+------------+-----------+-----------+
-|   Type    | Array     |   false    |   false    | Array     | false     |
-| Requested |-----------+------------+------------+-----------+-----------+
-|           | Scalar    |   Scalar   |   Scalar   | false     | false     |
-|           |-----------+------------+------------+-----------+-----------+
-|           | Undefined |  String    |   Number   | Array     | Undefined |
-|           |-----------+------------+------------+-----------+-----------+
-|           | Value     |   false    |   false    | false     | false     |
-|           | Cookie    |            |            |           |           |
-+-----------+-----------+------------+------------+-----------+-----------+
-@end example
-@end float
-@end ifplaintext
-
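The table above can be read as a small decision function. The following is only a model of the table for illustration, not @command{gawk}'s implementation: the @code{valtype} enum and @code{request_result()} are hypothetical stand-ins (the real type is @code{awk_valtype_t}), and whether a string can be converted to a number is simply passed in as a flag.

```c
#include <stdbool.h>

/* Stand-in for awk_valtype_t; these names are assumptions for this sketch. */
typedef enum {
    T_UNDEFINED, T_NUMBER, T_STRING, T_ARRAY, T_SCALAR, T_VALUE_COOKIE
} valtype;

/*
 * Model of the "Value Types Returned" table.
 * Returns true and stores the delivered type in *got when the request
 * succeeds; returns false and stores the actual type in *got otherwise.
 * str_is_numeric matters only when a number is requested from a string.
 */
static bool
request_result(valtype wanted, valtype actual, bool str_is_numeric,
               valtype *got)
{
    *got = actual;      /* on failure the caller learns the actual type */

    switch (wanted) {
    case T_STRING:      /* strings and numbers can both be had as strings */
        if (actual == T_STRING || actual == T_NUMBER) {
            *got = T_STRING;
            return true;
        }
        return false;
    case T_NUMBER:      /* a string qualifies only if it converts */
        if (actual == T_NUMBER || (actual == T_STRING && str_is_numeric)) {
            *got = T_NUMBER;
            return true;
        }
        return false;
    case T_ARRAY:
        return actual == T_ARRAY;
    case T_SCALAR:      /* either kind of scalar yields a scalar cookie */
        if (actual == T_STRING || actual == T_NUMBER) {
            *got = T_SCALAR;
            return true;
        }
        return false;
    case T_UNDEFINED:   /* "tell me what it is": always succeeds */
        return true;
    default:            /* a value cookie cannot be requested */
        return false;
    }
}
```

Requesting the undefined type is the row to use when the extension does not know in advance what it will get: the call succeeds and the result reports the actual type.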
address@hidden Constructor Functions
address@hidden Constructor Functions and Convenience Macros
-
-The API provides a number of @dfn{constructor} functions for creating
-string and numeric values, as well as a number of convenience macros.
-This @value{SUBSECTION} presents them all as function prototypes, in
-the way that extension code would use them.
-
address@hidden @code
-@item static inline awk_value_t *
-@itemx make_const_string(const char *string, size_t length, awk_value_t *result)
-This function creates a string value in the @code{awk_value_t} variable
-pointed to by @code{result}. It expects @code{string} to be a C string constant
-(or other string data), and automatically creates a @emph{copy} of the data
-for storage in @code{result}. It returns @code{result}.
-
-@item static inline awk_value_t *
-@itemx make_malloced_string(const char *string, size_t length, awk_value_t *result)
-This function creates a string value in the @code{awk_value_t} variable
-pointed to by @code{result}. It expects @code{string} to be a @samp{char *}
-value pointing to data previously obtained from @code{malloc()}. The idea here
-is that the data is passed directly to @command{gawk}, which assumes
-responsibility for it. It returns @code{result}.
-
address@hidden static inline awk_value_t *
address@hidden make_null_string(awk_value_t *result)
-This specialized function creates a null string (the ``undefined'' value)
-in the @code{awk_value_t} variable pointed to by @code{result}.
-It returns @code{result}.
-
address@hidden static inline awk_value_t *
address@hidden make_number(double num, awk_value_t *result)
-This function simply creates a numeric value in the @code{awk_value_t} variable
-pointed to by @code{result}.
address@hidden table
-
-Two convenience macros may be used for allocating storage from @code{malloc()}
-and @code{realloc()}. If the allocation fails, they cause @command{gawk} to
-exit with a fatal error message.  They should be used as if they were
-procedure calls that do not return a value.
-
address@hidden @code
address@hidden emalloc(pointer, type, size, message)
-The arguments to this macro are as follows:
address@hidden nested table
address@hidden @code
address@hidden pointer
-The pointer variable to point at the allocated storage.
-
address@hidden type
-The type of the pointer variable, used to create a cast for the call to @code{malloc()}.
-
address@hidden size
-The total number of bytes to be allocated.
-
address@hidden message
-A message to be prefixed to the fatal error message. Typically this is the name
-of the function using the macro.
address@hidden table
-
address@hidden
-For example, you might allocate a string value like so:
-
address@hidden
-awk_value_t result;
-char *message;
-const char greet[] = "Don't Panic!";
-
-emalloc(message, char *, sizeof(greet), "myfunc");
-strcpy(message, greet);
-make_malloced_string(message, strlen(message), & result);
address@hidden example
-
address@hidden erealloc(pointer, type, size, message)
-This is like @code{emalloc()}, but it calls @code{realloc()},
-instead of @code{malloc()}.
-The arguments are the same as for the @code{emalloc()} macro.
address@hidden table
-
address@hidden Registration Functions
address@hidden Registration Functions
-
-This @value{SECTION} describes the API functions for
-registering parts of your extension with @command{gawk}.
-
address@hidden
-* Extension Functions::         Registering extension functions.
-* Exit Callback Functions::     Registering an exit callback.
-* Extension Version String::    Registering a version string.
-* Input Parsers::               Registering an input parser.
-* Output Wrappers::             Registering an output wrapper.
-* Two-way processors::          Registering a two-way processor.
address@hidden menu
-
address@hidden Extension Functions
address@hidden Registering An Extension Function
-
-Extension functions are described by the following record:
-
address@hidden
-typedef struct @{
-@ @ @ @ const char *name;
-@ @ @ @ awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
-@ @ @ @ size_t num_expected_args;
address@hidden awk_ext_func_t;
address@hidden example
-
-The fields are:
-
address@hidden @code
address@hidden const char *name;
-The name of the new function.
-@command{awk} level code calls the function by this name.
-This is a regular C string.
-
-@item awk_value_t *(*function)(int num_actual_args, awk_value_t *result);
-This is a pointer to the C function that provides the desired
-functionality.
-The function must fill in the result with either a number
-or a string. @command{awk} takes ownership of any string memory.
-As mentioned earlier, string memory @strong{must} come from @code{malloc()}.
-
-The function must return the value of @code{result}.
-This is for the convenience of the calling code inside @command{gawk}.
-
address@hidden size_t num_expected_args;
-This is the number of arguments the function expects to receive.
-Each extension function may decide what to do if the number of
-arguments isn't what it expected.  Following @command{awk} functions, it
-is likely OK to ignore extra arguments.
address@hidden table
-
-Once you have a record representing your extension function, you register
-it with @command{gawk} using this API function:
-
address@hidden @code
-@item awk_bool_t add_ext_func(const char *namespace, const awk_ext_func_t *func);
-This function returns true upon success, false otherwise.
-The @code{namespace} parameter is currently not used; you should pass in an
-empty string (@code{""}).  The @code{func} pointer is the address of a
-@code{struct} representing your function, as just described.
address@hidden table
-
address@hidden Exit Callback Functions
address@hidden Registering An Exit Callback Function
-
-An @dfn{exit callback} function is a function that
-@command{gawk} calls before it exits.
-Such functions are useful if you have general ``clean up'' tasks
-that should be performed in your extension (such as closing database
-connections or other resource deallocations).
-You can register such
-a function with @command{gawk} using the following function.
-
address@hidden @code
address@hidden void awk_atexit(void (*funcp)(void *data, int exit_status),
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ void *arg0);
-The parameters are:
address@hidden nested table
address@hidden @code
address@hidden funcp
-A pointer to the function to be called before @command{gawk} exits. The @code{data}
-parameter will be the original value of @code{arg0}.
-The @code{exit_status} parameter is
-the exit status value that @command{gawk} will pass to the @code{exit()} system call.
-
address@hidden arg0
-A pointer to private data which @command{gawk} saves in order to pass to
-the function pointed to by @code{funcp}.
address@hidden table
address@hidden table
-
-Exit callback functions are called in Last-In-First-Out (LIFO) order---that is, in
-the reverse of the order in which they are registered with @command{gawk}.
-
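The LIFO ordering can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not @command{gawk}'s code: @code{demo_atexit()}, @code{demo_run_exit_callbacks()}, @code{note_cleanup()} and @code{call_log} are hypothetical names; only the callback signature is taken from the description of @code{awk_atexit()} above.

```c
#include <stdlib.h>
#include <string.h>

/* One registered callback; newest entries sit at the head of the list. */
struct exit_cb {
    void (*funcp)(void *data, int exit_status);
    void *arg0;
    struct exit_cb *next;
};

static struct exit_cb *exit_list;   /* most recently registered first */
static char call_log[64];           /* records the order of the calls */

/* Hypothetical stand-in for awk_atexit(): push onto the front of the list. */
static int
demo_atexit(void (*funcp)(void *data, int exit_status), void *arg0)
{
    struct exit_cb *cb = malloc(sizeof(*cb));

    if (cb == NULL)
        return -1;
    cb->funcp = funcp;
    cb->arg0 = arg0;
    cb->next = exit_list;
    exit_list = cb;
    return 0;
}

/* Walk the list from the head, so the last one registered runs first. */
static void
demo_run_exit_callbacks(int exit_status)
{
    while (exit_list != NULL) {
        struct exit_cb *cb = exit_list;

        exit_list = cb->next;
        cb->funcp(cb->arg0, exit_status);   /* data is the saved arg0 */
        free(cb);
    }
}

/* Sample callback: append the tag passed as private data to call_log. */
static void
note_cleanup(void *data, int exit_status)
{
    (void) exit_status;
    strncat(call_log, (const char *) data,
            sizeof(call_log) - strlen(call_log) - 1);
}
```

Registering @code{note_cleanup} first with the tag @code{"A"} and then with @code{"B"} leaves @code{"BA"} in @code{call_log} after the callbacks run, so a cleanup step that depends on an earlier-registered resource should be registered after it.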
address@hidden Extension Version String
address@hidden Registering An Extension Version String
-
-You can register a version string which indicates the name and
-version of your extension, with @command{gawk}, as follows:
-
address@hidden @code
address@hidden void register_ext_version(const char *version);
-Register the string pointed to by @code{version} with @command{gawk}.
-@command{gawk} does @emph{not} copy the @code{version} string, so
-it should not be changed.
address@hidden table
-
-@command{gawk} prints all registered extension version strings when it
-is invoked with the @option{--version} option.
-
address@hidden Input Parsers
address@hidden Customized Input Parsers
-
-By default, @command{gawk} reads text files as its input. It uses the value
-of @code{RS} to find the end of the record, and then uses @code{FS}
-(or @code{FIELDWIDTHS}) to split it into fields (@pxref{Reading Files}).
-Additionally, it sets the value of @code{RT} (@pxref{Built-in Variables}).
-
-If you want, you can provide your own, custom, input parser.  An input
-parser's job is to return a record to the @command{gawk} record processing
-code, along with indicators for the value and length of the data to be
-used for @code{RT}, if any.
-
-To provide an input parser, you must first provide two functions
-(where @var{XXX} is a prefix name for your extension):
-
address@hidden @code
address@hidden awk_bool_t @var{XXX}_can_take_file(const awk_input_buf_t *iobuf)
-This function examines the information available in @code{iobuf}
-(which we discuss shortly).  Based on the information there, it
-decides if the input parser should be used for this file.
-If so, it should return true. Otherwise, it should return false.
-It should not change any state (variable values, etc.) within @command{gawk}.
-
address@hidden awk_bool_t @var{XXX}_take_control_of(awk_input_buf_t *iobuf)
-When @command{gawk} decides to hand control of the file over to the
-input parser, it calls this function.  This function in turn must fill
-in certain fields in the @code{awk_input_buf_t} structure, and ensure
-that certain conditions are true.  It should then return true. If an
-error of some kind occurs, it should not fill in any fields, and should
-return false; then @command{gawk} will not use the input parser.
-The details are presented shortly.
address@hidden table
-
-Your extension should package these functions inside an
-@code{awk_input_parser_t}, which looks like this:
-
address@hidden
-typedef struct input_parser @{
-    const char *name;   /* name of parser */
-    awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
-    awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
-    awk_const struct input_parser *awk_const next;    /* for use by gawk */
address@hidden awk_input_parser_t;
address@hidden example
-
-The fields are:
-
address@hidden @code
address@hidden const char *name;
-The name of the input parser. This is a regular C string.
-
-@item awk_bool_t (*can_take_file)(const awk_input_buf_t *iobuf);
-A pointer to your @code{@var{XXX}_can_take_file()} function.
-
-@item awk_bool_t (*take_control_of)(awk_input_buf_t *iobuf);
-A pointer to your @code{@var{XXX}_take_control_of()} function.
-
address@hidden awk_const struct input_parser *awk_const next;
-This pointer is used by @command{gawk}.
-The extension cannot modify it.
address@hidden table
-
-The steps are as follows:
-
address@hidden
address@hidden
-Create a @code{static awk_input_parser_t} variable and initialize it
-appropriately.
-
address@hidden
-When your extension is loaded, register your input parser with
-@command{gawk} using the @code{register_input_parser()} API function
-(described below).
address@hidden enumerate
-
-An @code{awk_input_buf_t} looks like this:
-
address@hidden
-typedef struct awk_input @{
-    const char *name;       /* filename */
-    int fd;                 /* file descriptor */
-#define INVALID_HANDLE (-1)
-    void *opaque;           /* private data for input parsers */
-    int (*get_record)(char **out, struct awk_input *iobuf,
-                      int *errcode, char **rt_start, size_t *rt_len);
-    void (*close_func)(struct awk_input *iobuf);
-    struct stat sbuf;       /* stat buf */
address@hidden awk_input_buf_t;
address@hidden example
-
-The fields can be divided into two categories: those for use (initially,
-at least) by @code{@var{XXX}_can_take_file()}, and those for use by
-@code{@var{XXX}_take_control_of()}.  The first group of fields and their uses
-are as follows:
-
address@hidden @code
address@hidden const char *name;
-The name of the file.
-
-@item int fd;
-A file descriptor for the file.  If @command{gawk} was able to
-open the file, then @code{fd} will @emph{not} be equal to
-@code{INVALID_HANDLE}. Otherwise, it will.
-
-@item struct stat sbuf;
-If the file descriptor is valid, then @command{gawk} will have filled
-in this structure via a call to the @code{fstat()} system call.
address@hidden table
-
-The @code{@var{XXX}_can_take_file()} function should examine these
-fields and decide if the input parser should be used for the file.
-The decision can be made based upon @command{gawk} state (the value
-of a variable defined previously by the extension and set by
-@command{awk} code), the name of the
-file, whether or not the file descriptor is valid, the information
-in the @code{struct stat}, or any combination of the above.
-
-Once @code{@var{XXX}_can_take_file()} has returned true, and
-@command{gawk} has decided to use your input parser, it calls
-@code{@var{XXX}_take_control_of()}.  That function then fills in at
-least the @code{get_record} field of the @code{awk_input_buf_t}.  It must
-also ensure that @code{fd} is not set to @code{INVALID_HANDLE}.  All of
-the fields that may be filled by @code{@var{XXX}_take_control_of()}
-are as follows:
-
address@hidden @code
address@hidden void *opaque;
-This is used to hold any state information needed by the input parser
-for this file.  It is ``opaque'' to @command{gawk}.  The input parser
-is not required to use this pointer.
-
address@hidden int@ (*get_record)(char@ **out,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ struct@ awk_input *iobuf,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int *errcode,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ char **rt_start,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ size_t *rt_len);
-This function pointer should point to a function that creates the input
-records.  Said function is the core of the input parser.  Its behavior
-is described below.
-
address@hidden void (*close_func)(struct awk_input *iobuf);
-This function pointer should point to a function that does
-the ``tear down.'' It should release any resources allocated by
-@code{@var{XXX}_take_control_of()}.  It may also close the file. If it
-does so, it should set the @code{fd} field to @code{INVALID_HANDLE}.
-
-If @code{fd} is still not @code{INVALID_HANDLE} after the call to this
-function, @command{gawk} calls the regular @code{close()} system call.
-
-Having a ``tear down'' function is optional. If your input parser does
-not need it, do not set this field.  Then, @command{gawk} calls the
-regular @code{close()} system call on the file descriptor, so it should
-be valid.
address@hidden table
-
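The division of labor between the two functions can be sketched as follows. This is a minimal illustration, not code from @command{gawk}: the structure is a local copy of the @code{awk_input_buf_t} shown above so that the fragment compiles on its own, @code{awk_bool_t} is a stand-in, and every @code{demo_} name (including the choice to accept only directories) is an assumption made for the example.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>      /* EOF */
#include <stdlib.h>
#include <sys/stat.h>

/* Local stand-ins; a real extension gets these from the API header. */
#define INVALID_HANDLE (-1)
typedef enum { awk_false = 0, awk_true } awk_bool_t;

typedef struct awk_input {
    const char *name;       /* filename */
    int fd;                 /* file descriptor */
    void *opaque;           /* private data for input parsers */
    int (*get_record)(char **out, struct awk_input *iobuf,
                      int *errcode, char **rt_start, size_t *rt_len);
    void (*close_func)(struct awk_input *iobuf);
    struct stat sbuf;       /* stat buf */
} awk_input_buf_t;

/* Hypothetical per-file state, reached through iobuf->opaque. */
struct demo_state {
    long records_returned;
};

/* Stub record reader: reports end-of-file immediately. */
static int
demo_get_record(char **out, struct awk_input *iobuf, int *errcode,
                char **rt_start, size_t *rt_len)
{
    (void) out; (void) iobuf; (void) errcode; (void) rt_start;
    *rt_len = 0;
    return EOF;
}

/* Tear down: release what demo_take_control_of() allocated. */
static void
demo_close(struct awk_input *iobuf)
{
    free(iobuf->opaque);
    iobuf->opaque = NULL;
    /* fd is left open and valid, so gawk calls close() on it itself */
}

/* Decide only; change nothing.  Here: take open directories. */
static awk_bool_t
demo_can_take_file(const awk_input_buf_t *iobuf)
{
    if (iobuf == NULL || iobuf->fd == INVALID_HANDLE)
        return awk_false;
    return S_ISDIR(iobuf->sbuf.st_mode) ? awk_true : awk_false;
}

/* Fill in the fields, or leave them all untouched and return false. */
static awk_bool_t
demo_take_control_of(awk_input_buf_t *iobuf)
{
    struct demo_state *st = calloc(1, sizeof(*st));

    if (st == NULL)
        return awk_false;   /* error: no fields were filled in */
    iobuf->opaque = st;
    iobuf->get_record = demo_get_record;
    iobuf->close_func = demo_close;
    return awk_true;
}
```

Allocating the private state before touching any field keeps the error path simple: on failure the structure is exactly as @command{gawk} passed it in.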
-The @code{@var{XXX}_get_record()} function does the work of creating
-input records.  The parameters are as follows:
-
address@hidden @code
address@hidden char **out
-This is a pointer to a @code{char *} variable which is set to point
-to the record.  @command{gawk} makes its own copy of the data, so
-the extension must manage this storage.
-
address@hidden struct awk_input *iobuf
-This is the @code{awk_input_buf_t} for the file.  The fields should be
-used for reading data (@code{fd}) and for managing private state
-(@code{opaque}), if any.
-
address@hidden int *errcode
-If an error occurs, @code{*errcode} should be set to an appropriate
-code from @code{<errno.h>}.
-
address@hidden char **rt_start
address@hidden size_t *rt_len
-If the concept of a ``record terminator'' makes sense, then
-@code{*rt_start} should be set to point to the data to be used for
-@code{RT}, and @code{*rt_len} should be set to the length of the
-data. Otherwise, @code{*rt_len} should be set to zero.
-@command{gawk} makes its own copy of this data, so the
-extension must manage the storage.
address@hidden table
-
-The return value is the length of the buffer pointed to by
-@code{*out}, or @code{EOF} if end-of-file was reached or an
-error occurred.
-
-It is guaranteed that @code{errcode} is a valid pointer, so there is no
-need to test for a @code{NULL} value.  @command{gawk} sets @code{*errcode}
-to zero, so there is no need to set it unless an error occurs.
-
-If an error does occur, the function should return @code{EOF} and set
-@code{*errcode} to a non-zero value.  In that case, if @code{*errcode}
-does not equal @minus{}1, @command{gawk} automatically updates
-the @code{ERRNO} variable based on the value of @code{*errcode} (e.g.,
-setting @samp{*errcode = errno} should do the right thing).
-
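As an illustration of the contract just described, here is one possible record reader that treats a newline as the record terminator. It is a sketch under assumptions, not the sample extension's code: @code{struct awk_input} is copied locally from the definition above so that the fragment stands alone, @code{struct line_state} and @code{line_get_record()} are hypothetical, and reading one byte at a time is chosen for clarity rather than speed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>      /* EOF */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the structure shown earlier. */
struct awk_input {
    const char *name;
    int fd;
    void *opaque;
    int (*get_record)(char **out, struct awk_input *iobuf,
                      int *errcode, char **rt_start, size_t *rt_len);
    void (*close_func)(struct awk_input *iobuf);
    struct stat sbuf;
};

/* Hypothetical private state: a buffer reused for every record. */
struct line_state {
    char *buf;
    size_t cap;
};

static int
line_get_record(char **out, struct awk_input *iobuf, int *errcode,
                char **rt_start, size_t *rt_len)
{
    struct line_state *st = iobuf->opaque;
    size_t len = 0;
    ssize_t n;
    char c;

    *rt_len = 0;                    /* no terminator seen yet */
    for (;;) {
        if (len + 1 > st->cap) {    /* keep room for one more byte */
            size_t ncap = st->cap ? st->cap * 2 : 64;
            char *nbuf = realloc(st->buf, ncap);

            if (nbuf == NULL) {
                *errcode = ENOMEM;
                return EOF;
            }
            st->buf = nbuf;
            st->cap = ncap;
        }
        n = read(iobuf->fd, &c, 1);
        if (n < 0) {                /* error: report it through *errcode */
            *errcode = errno;
            return EOF;
        }
        if (n == 0) {
            if (len == 0)
                return EOF;         /* plain end-of-file, *errcode untouched */
            break;                  /* final record without a terminator */
        }
        st->buf[len] = c;
        if (c == '\n') {            /* terminator is stored after the record */
            *rt_start = st->buf + len;
            *rt_len = 1;
            break;
        }
        len++;
    }
    *out = st->buf;                 /* the extension keeps owning this buffer */
    return (int) len;
}
```

Because the caller copies both the record and the terminator, the same buffer can be reused on every call; it needs to be released only in the parser's ``tear down'' function.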
-@command{gawk} ships with a sample extension that reads directories,
-returning records for each entry in the directory (@pxref{Extension
-Sample Readdir}).  You may wish to use that code as a guide for writing
-your own input parser.
-
-When writing an input parser, you should think about (and document)
-how it is expected to interact with @command{awk} code.  You may want
-it to always be called, and take effect as appropriate (as the
-@code{readdir} extension does).  Or you may want it to take effect
-based upon the value of an @code{awk} variable, as the XML extension
-from the @code{gawkextlib} project does (@pxref{gawkextlib}).
-In the latter case, code in a @code{BEGINFILE} section
-can look at @code{FILENAME} and @code{ERRNO} to decide whether or
-not to activate an input parser (@pxref{BEGINFILE/ENDFILE}).
-
-You register your input parser with the following function:
-
address@hidden @code
address@hidden void register_input_parser(awk_input_parser_t *input_parser);
-Register the input parser pointed to by @code{input_parser} with
-@command{gawk}.
address@hidden table
-
address@hidden Output Wrappers
address@hidden Customized Output Wrappers
-
-An @dfn{output wrapper} is the mirror image of an input parser.
-It allows an extension to take over the output to a file opened
-with the @samp{>} or @samp{>>} operators (@pxref{Redirection}).
-
-The output wrapper is very similar to the input parser structure:
-
address@hidden
-typedef struct output_wrapper @{
-    const char *name;   /* name of the wrapper */
-    awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
-    awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
-    awk_const struct output_wrapper *awk_const next;  /* for use by gawk */
address@hidden awk_output_wrapper_t;
address@hidden example
-
-The members are as follows:
-
address@hidden @code
address@hidden const char *name;
-This is the name of the output wrapper.
-
address@hidden awk_bool_t (*can_take_file)(const awk_output_buf_t *outbuf);
-This points to a function that examines the information in
-the @code{awk_output_buf_t} structure pointed to by @code{outbuf}.
-It should return true if the output wrapper wants to take over the
-file, and false otherwise.  It should not change any state (variable
-values, etc.) within @command{gawk}.
-
address@hidden awk_bool_t (*take_control_of)(awk_output_buf_t *outbuf);
-The function pointed to by this field is called when @command{gawk}
-decides to let the output wrapper take control of the file. It should
-fill in appropriate members of the @code{awk_output_buf_t} structure,
-as described below, and return true if successful, false otherwise.
-
address@hidden awk_const struct output_wrapper *awk_const next;
-This is for use by @command{gawk}.
address@hidden table
-
-The @code{awk_output_buf_t} structure looks like this:
-
address@hidden
-typedef struct @{
-    const char *name;   /* name of output file */
-    const char *mode;   /* mode argument to fopen */
-    FILE *fp;           /* stdio file pointer */
-    awk_bool_t redirected;  /* true if a wrapper is active */
-    void *opaque;       /* for use by output wrapper */
-    size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
-                FILE *fp, void *opaque);
-    int (*gawk_fflush)(FILE *fp, void *opaque);
-    int (*gawk_ferror)(FILE *fp, void *opaque);
-    int (*gawk_fclose)(FILE *fp, void *opaque);
address@hidden awk_output_buf_t;
address@hidden example
-
-Here too, your extension will define @code{@var{XXX}_can_take_file()}
-and @code{@var{XXX}_take_control_of()} functions that examine and update
-data members in the @code{awk_output_buf_t}.
-The data members are as follows:
-
address@hidden @code
address@hidden const char *name;
-The name of the output file.
-
address@hidden const char *mode;
-The mode string (as would be used in the second argument to @code{fopen()})
-with which the file was opened.
-
address@hidden FILE *fp;
-The @code{FILE} pointer from @code{<stdio.h>}. @command{gawk} opens the file
-before attempting to find an output wrapper.
-
address@hidden awk_bool_t redirected;
-This field must be set to true by the @code{@var{XXX}_take_control_of()} function.
-
address@hidden void *opaque;
-This pointer is opaque to @command{gawk}. The extension should use it to store
-a pointer to any private data associated with the file.
-
address@hidden size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ FILE *fp, void 
*opaque);
address@hidden int (*gawk_fflush)(FILE *fp, void *opaque);
address@hidden int (*gawk_ferror)(FILE *fp, void *opaque);
address@hidden int (*gawk_fclose)(FILE *fp, void *opaque);
-These pointers should be set to point to functions that perform
-the equivalent function as the @code{<stdio.h>} functions do, if appropriate.
-@command{gawk} uses these function pointers for all output.
-@command{gawk} initializes the pointers to point to internal, ``pass through''
-functions that just call the regular @code{<stdio.h>} functions, so an
-extension only needs to redefine those functions that are appropriate for
-what it does.
address@hidden table
-
-The @code{@var{XXX}_can_take_file()} function should make a decision based
-upon the @code{name} and @code{mode} fields, and any additional state
-(such as @command{awk} variable values) that is appropriate.
-
-When @command{gawk} calls @code{@var{XXX}_take_control_of()}, it should fill
-in the other fields, as appropriate, except for @code{fp}, which it should just
-use normally.
-
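A short sketch may make the output side concrete. It is an illustration only: the structure is a local copy of the @code{awk_output_buf_t} shown above, @code{awk_bool_t} is a stand-in, and the @code{upper_} functions, along with the idea of upper-casing everything written to files whose names end in @file{.up}, are invented for the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { awk_false = 0, awk_true } awk_bool_t;    /* stand-in */

/* Local copy of the structure shown earlier. */
typedef struct {
    const char *name;       /* name of output file */
    const char *mode;       /* mode argument to fopen */
    FILE *fp;               /* stdio file pointer */
    awk_bool_t redirected;  /* true if a wrapper is active */
    void *opaque;           /* for use by output wrapper */
    size_t (*gawk_fwrite)(const void *buf, size_t size, size_t count,
                          FILE *fp, void *opaque);
    int (*gawk_fflush)(FILE *fp, void *opaque);
    int (*gawk_ferror)(FILE *fp, void *opaque);
    int (*gawk_fclose)(FILE *fp, void *opaque);
} awk_output_buf_t;

/* Decide from the name alone; change no state. */
static awk_bool_t
upper_can_take_file(const awk_output_buf_t *outbuf)
{
    size_t n;

    if (outbuf == NULL || outbuf->name == NULL)
        return awk_false;
    n = strlen(outbuf->name);
    if (n > 3 && strcmp(outbuf->name + n - 3, ".up") == 0)
        return awk_true;
    return awk_false;
}

/* Replacement for fwrite(): same contract, but upper-cases each byte. */
static size_t
upper_fwrite(const void *buf, size_t size, size_t count,
             FILE *fp, void *opaque)
{
    const unsigned char *p = buf;
    size_t total, i;

    (void) opaque;
    if (size == 0 || count == 0)
        return 0;
    total = size * count;
    for (i = 0; i < total; i++)
        if (putc(toupper(p[i]), fp) == EOF)
            return i / size;    /* number of complete items written */
    return count;
}

/* Override only what is needed; the other pointers keep their defaults. */
static awk_bool_t
upper_take_control_of(awk_output_buf_t *outbuf)
{
    if (outbuf == NULL || outbuf->fp == NULL)
        return awk_false;
    outbuf->gawk_fwrite = upper_fwrite;
    outbuf->redirected = awk_true;
    return awk_true;
}
```

Only @code{gawk_fwrite} is replaced here; flushing, error checking and closing continue to go through whatever pointers were already in the structure, and @code{fp} is used as given.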
-You register your output wrapper with the following function:
-
address@hidden @code
address@hidden void register_output_wrapper(awk_output_wrapper_t 
*output_wrapper);
-Register the output wrapper pointed to by @code{output_wrapper} with
-@command{gawk}.
address@hidden table
-
address@hidden Two-way processors
address@hidden Customized Two-way Processors
-
-A @dfn{two-way processor} combines an input parser and an output wrapper for
-two-way I/O with the @samp{|&} operator (@pxref{Redirection}).  It makes identical
-use of the @code{awk_input_buf_t} and @code{awk_output_buf_t} structures
-as described earlier.
-
-A two-way processor is represented by the following structure:
-
address@hidden
-typedef struct two_way_processor @{
-    const char *name;   /* name of the two-way processor */
-    awk_bool_t (*can_take_two_way)(const char *name);
-    awk_bool_t (*take_control_of)(const char *name,
-                                  awk_input_buf_t *inbuf,
-                                  awk_output_buf_t *outbuf);
-    awk_const struct two_way_processor *awk_const next;  /* for use by gawk */
address@hidden awk_two_way_processor_t;
address@hidden example
-
-The fields are as follows:
-
address@hidden @code
address@hidden const char *name;
-The name of the two-way processor.
-
address@hidden awk_bool_t (*can_take_two_way)(const char *name);
-This function returns true if it wants to take over two-way I/O for this filename.
-It should not change any state (variable
-values, etc.) within @command{gawk}.
-
address@hidden awk_bool_t (*take_control_of)(const char *name,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ 
awk_input_buf_t *inbuf,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ 
awk_output_buf_t *outbuf);
-This function should fill in the @code{awk_input_buf_t} and
-@code{awk_output_buf_t} structures pointed to by @code{inbuf} and
-@code{outbuf}, respectively.  These structures were described earlier.
-
address@hidden awk_const struct two_way_processor *awk_const next;
-This is for use by @command{gawk}.
address@hidden table
-
-As with the input parser and output wrapper, you provide
-``yes I can take this'' and ``take over for this'' functions,
-@code{@var{XXX}_can_take_two_way()} and @code{@var{XXX}_take_control_of()}.
-
-You register your two-way processor with the following function:
-
address@hidden @code
address@hidden void register_two_way_processor(awk_two_way_processor_t 
*two_way_processor);
-Register the two-way processor pointed to by @code{two_way_processor} with
-@command{gawk}.
address@hidden table
-
address@hidden Printing Messages
address@hidden Printing Messages
-
-You can print different kinds of warning messages from your
-extension, as described below.  Note that for these functions,
-you must pass in the extension id received from @command{gawk}
-when the extension was loaded.@footnote{Because the API uses only ISO C 90
-features, it cannot make use of the ISO C 99 variadic macro feature to hide
-that parameter. More's the pity.}
-
address@hidden @code
address@hidden void fatal(awk_ext_id_t id, const char *format, ...);
-Print a message and then cause @command{gawk} to exit immediately.
-
address@hidden void warning(awk_ext_id_t id, const char *format, ...);
-Print a warning message.
-
address@hidden void lintwarn(awk_ext_id_t id, const char *format, ...);
-Print a ``lint warning.''  Normally this is the same as printing a
-warning message, but if @command{gawk} was invoked with @samp{--lint=fatal},
-then lint warnings become fatal error messages.
address@hidden table
-
-All of these functions are otherwise like the C @code{printf()}
-family of functions, where the @code{format} parameter is a string
-with literal characters and formatting codes intermixed.
-
address@hidden Updating @code{ERRNO}
address@hidden Updating @code{ERRNO}
-
-The following functions allow you to update the @code{ERRNO}
-variable:
-
address@hidden @code
address@hidden void update_ERRNO_int(int errno_val);
-Set @code{ERRNO} to the string equivalent of the error code
-in @code{errno_val}. The value should be one of the defined
-error codes in @code{<errno.h>}, and @command{gawk} turns it
-into a (possibly translated) string using the C @code{strerror()} function.
-
address@hidden void update_ERRNO_string(const char *string);
-Set @code{ERRNO} directly to the string value of @code{string}.
-@command{gawk} makes a copy of the value of @code{string}.
-
address@hidden void unset_ERRNO();
-Unset @code{ERRNO}.
address@hidden table
-
address@hidden Accessing Parameters
address@hidden Accessing and Updating Parameters
-
-Two functions give you access to the arguments (parameters)
-passed to your extension function. They are:
-
address@hidden @code
address@hidden awk_bool_t get_argument(size_t count,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t 
wanted,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t 
*result);
-Fill in the @code{awk_value_t} structure pointed to by @code{result}
-with the @code{count}'th argument.  Return true if the actual
-type matches @code{wanted}, false otherwise.  In the latter
-case, @code{result@w{->}val_type} indicates the actual type
-(@pxref{table-value-types-returned}).  Counts are zero based---the first
-argument is numbered zero, the second one, and so on. @code{wanted}
-indicates the type of value expected.
-
address@hidden awk_bool_t set_argument(size_t count, awk_array_t array);
-Convert a parameter that was undefined into an array; this provides
-call-by-reference for arrays.  Return false if @code{count} is too big,
-or if the argument's type is not undefined.  @xref{Array Manipulation},
-for more information on creating arrays.
address@hidden table
-
address@hidden Symbol Table Access
address@hidden Symbol Table Access
-
-Two sets of routines provide access to global variables, and one set
-allows you to create and release cached values.
-
address@hidden
-* Symbol table by name::        Accessing variables by name.
-* Symbol table by cookie::      Accessing variables by ``cookie''.
-* Cached values::               Creating and using cached values.
address@hidden menu
-
address@hidden Symbol table by name
address@hidden Variable Access and Update by Name
-
-The following routines provide the ability to access and update
-global @command{awk}-level variables by name.  In compiler terminology,
-identifiers of different kinds are termed @dfn{symbols}, thus the ``sym''
-in the routines' names.  The data structure which stores information
-about symbols is termed a @dfn{symbol table}.
-
address@hidden @code
address@hidden awk_bool_t sym_lookup(const char *name,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
-Fill in the @code{awk_value_t} structure pointed to by @code{result}
-with the value of the variable named by the string @code{name}, which is
-a regular C string.  @code{wanted} indicates the type of value expected.
-Return true if the actual type matches @code{wanted}, false otherwise.
-In the latter case, @code{result->val_type} indicates the actual type
-(@pxref{table-value-types-returned}).
-
address@hidden awk_bool_t sym_update(const char *name, awk_value_t *value);
-Update the variable named by the string @code{name}, which is a regular
-C string.  The variable is added to @command{gawk}'s symbol table
-if it is not there.  Return true if everything worked, false otherwise.
-
-Changing types (scalar to array or vice versa) of an existing variable
-is @emph{not} allowed, nor may this routine be used to update an array.
-This routine cannot be used to update any of the predefined
-variables (such as @code{ARGC} or @code{NF}).
-
address@hidden awk_bool_t sym_constant(const char *name, awk_value_t *value);
-Create a variable named by the string @code{name}, which is
-a regular C string, that has the constant value as given by
address@hidden @command{awk}-level code cannot change the value of this
address@hidden (currently) is no @code{awk}-level feature that
-provides this ability.} The extension may change the value of @code{name}'s
-variable with subsequent calls to this routine, and may also convert
-a variable created by @code{sym_update()} into a constant.  However,
-once a variable becomes a constant it cannot later be reverted into a
-mutable variable.
address@hidden table
-
address@hidden Symbol table by cookie
address@hidden Variable Access and Update by Cookie
-
-A @dfn{scalar cookie} is an opaque handle that provides access
-to a global variable or array. It is an optimization that
-avoids looking up variables in @command{gawk}'s symbol table every time
-access is needed. This was discussed earlier, in @ref{General Data Types}.
-
-The following functions let you work with scalar cookies.
-
address@hidden @code
address@hidden awk_bool_t sym_lookup_scalar(awk_scalar_t cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
-Retrieve the current value of a scalar cookie.
-Once you have obtained a scalar cookie using @code{sym_lookup()}, you can
-use this function to get its value more efficiently.
-Return false if the value cannot be retrieved.
-
address@hidden awk_bool_t sym_update_scalar(awk_scalar_t cookie, awk_value_t *value);
-Update the value associated with a scalar cookie.  Return false if
-the new value is not one of @code{AWK_STRING} or @code{AWK_NUMBER}.
-Here too, the built-in variables may not be updated.
address@hidden table
-
-It is not obvious at first glance how to work with scalar cookies or
-what their @i{raison d'etre} really is.  In theory, the @code{sym_lookup()}
-and @code{sym_update()} routines are all you really need to work with
-variables.  For example, you might have code that looked up the value of
-a variable, evaluated a condition, and then possibly changed the value
-of the variable based on the result of that evaluation, like so:
-
address@hidden
-/*  do_magic --- do something really great */
-
-static awk_value_t *
-do_magic(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t value;
-
-    if (   sym_lookup("MAGIC_VAR", AWK_NUMBER, & value)
-        && some_condition(value.num_value)) @{
-            value.num_value += 42;
-            sym_update("MAGIC_VAR", & value);
-    @}
-
-    return make_number(0.0, result);
address@hidden
address@hidden example
-
address@hidden
-This code looks (and is) simple and straightforward. So what's the problem?
-
-Consider what happens if @command{awk}-level code associated with your
-extension calls the @code{magic()} function (implemented in C by @code{do_magic()}),
-once per record, while processing hundreds of thousands or millions of records.
-The @code{MAGIC_VAR} variable is looked up in the symbol table once or twice per function call!
-
-The symbol table lookup is really pure overhead; it is considerably more efficient
-to get a cookie that represents the variable, and use that to get the variable's
-value and update it as address@hidden difference is measurable and quite real. Trust us.}
-
-Thus, the way to use cookies is as follows.  First, install your extension's variable
-in @command{gawk}'s symbol table using @code{sym_update()}, as usual. Then get a
-scalar cookie for the variable using @code{sym_lookup()}:
-
address@hidden
-static awk_scalar_t magic_var_cookie;    /* cookie for MAGIC_VAR */
-
-static void
-my_extension_init()
address@hidden
-    awk_value_t value;
-
-    /* install initial value */
-    sym_update("MAGIC_VAR", make_number(42.0, & value));
-
-    /* get cookie */
-    sym_lookup("MAGIC_VAR", AWK_SCALAR, & value);
-
-    /* save the cookie */
-    magic_var_cookie = value.scalar_cookie;
-    @dots{}
address@hidden
address@hidden example
-
-Next, use the routines in this section for retrieving and updating
-the value through the cookie.  Thus, @code{do_magic()} now becomes
-something like this:
-
address@hidden
-/*  do_magic --- do something really great */
-
-static awk_value_t *
-do_magic(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t value;
-
-    if (   sym_lookup_scalar(magic_var_cookie, AWK_NUMBER, & value)
-        && some_condition(value.num_value)) @{
-            value.num_value += 42;
-            sym_update_scalar(magic_var_cookie, & value);
-    @}
-    @dots{}
-
-    return make_number(0.0, result);
address@hidden
address@hidden example
-
address@hidden NOTE
-The previous code omitted error checking for
-presentation purposes.  Your extension code should be more robust
-and carefully check the return values from the API functions.
address@hidden quotation
-
address@hidden Cached values
address@hidden Creating and Using Cached Values
-
-The routines in this section allow you to create and release
-cached values.  As with scalar cookies, in theory, cached values
-are not necessary. You can create numbers and strings using
-the functions in @ref{Constructor Functions}. You can then
-assign those values to variables using @code{sym_update()}
-or @code{sym_update_scalar()}, as you like.
-
-However, you can understand the point of cached values if you remember that
address@hidden string value's storage @emph{must} come from @code{malloc()}.
-If you have 20 variables, all of which have the same string value, you
-must create 20 identical copies of the address@hidden values
-are clearly less problematic, requiring only a C @code{double} to store.}
-
-It is clearly more efficient, if possible, to create a value once, and
-then tell @command{gawk} to reuse the value for multiple variables. That
-is what the routines in this section let you do.  The functions are as follows:
-
address@hidden @code
address@hidden awk_bool_t create_value(awk_value_t *value, awk_value_cookie_t *result);
-Create a cached string or numeric value from @code{value} for efficient later
-assignment.
-Only @code{AWK_NUMBER} and @code{AWK_STRING} values are allowed.  Any other type
-is rejected.  While @code{AWK_UNDEFINED} could be allowed, doing so would
-result in inferior performance.
-
address@hidden awk_bool_t release_value(awk_value_cookie_t vc);
-Release the memory associated with a value cookie obtained
-from @code{create_value()}.
address@hidden table
-
-You use value cookies in a fashion similar to the way you use scalar cookies.
-In the extension initialization routine, you create the value cookie:
-
address@hidden
-static awk_value_cookie_t answer_cookie;  /* static value cookie */
-
-static void
-my_extension_init()
address@hidden
-    awk_value_t value;
-    char *long_string;
-    size_t long_string_len;
-
-    /* code from earlier */
-    @dots{} 
-    /* @dots{} fill in long_string and long_string_len @dots{} */
-    make_malloced_string(long_string, long_string_len, & value);
-    create_value(& value, & answer_cookie);    /* create cookie */
-    @dots{}
address@hidden
address@hidden example
-
-Once the value is created, you can use it as the value of any number
-of variables:
-
address@hidden
-static awk_value_t *
-do_magic(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t value;
-
-    @dots{}    /* as earlier */
-
-    value.val_type = AWK_VALUE_COOKIE;
-    value.value_cookie = answer_cookie;
-    sym_update("VAR1", & value);
-    sym_update("VAR2", & value);
-    @dots{}
-    sym_update("VAR100", & value);
-    @dots{}
address@hidden
address@hidden example
-
address@hidden
-Using value cookies in this way saves considerable storage, since all of
address@hidden through @code{VAR100} share the same value.
-
-You might be wondering, ``Is this sharing problematic?
-What happens if @command{awk} code assigns a new value to @code{VAR1},
-are all the others changed too?''
-
-That's a great question. The answer is that no, it's not a problem.
address@hidden is smart enough to avoid such problems.
-
-Finally, as part of your clean up action (@pxref{Exit Callback Functions})
-you should release any cached values that you created, using
address@hidden()}.
-
address@hidden Array Manipulation
address@hidden Array Manipulation
-
-The primary data address@hidden, the only data structure.} in @command{awk}
-is the associative array (@pxref{Arrays}).
-Extensions need to be able to manipulate @command{awk} arrays.
-The API provides a number of data structures for working with arrays,
-functions for working with individual elements, and functions for
-working with arrays as a whole. This includes the ability to
-``flatten'' an array so that it is easy for C code to traverse
-every element in an array.  The array data structures integrate
-nicely with the data structures for values to make it easy to
-both work with and create true arrays of arrays (@pxref{General Data Types}).
-
address@hidden
-* Array Data Types::            Data types for working with arrays.
-* Array Functions::             Functions for working with arrays.
-* Flattening Arrays::           How to flatten arrays.
-* Creating Arrays::             How to create and populate arrays.
address@hidden menu
-
address@hidden Array Data Types
address@hidden Array Data Types
-
-The data types associated with arrays are listed below.
-
address@hidden @code
address@hidden typedef void *awk_array_t;
-If you request the value of an array variable, you get back an
address@hidden value. This value is address@hidden is also
-a ``cookie,'' but the @command{gawk} developers did not wish to overuse this
-term.} to the extension; it uniquely identifies the array but can
-only be used by passing it into API functions or receiving it from API
-functions. This is very similar to the way @samp{FILE *} values are used
-with the @code{<stdio.h>} library routines.
-
-
address@hidden
address@hidden typedef struct awk_element @{
address@hidden @ @ @ @ /* convenience linked list pointer, not used by gawk */
address@hidden @ @ @ @ struct awk_element *next;
address@hidden @ @ @ @ enum @{
address@hidden @ @ @ @ @ @ @ @ AWK_ELEMENT_DEFAULT = 0,@ @ /* set by gawk */
address@hidden @ @ @ @ @ @ @ @ AWK_ELEMENT_DELETE = 1@ @ @ @ /* set by extension if should be deleted */
address@hidden @ @ @ @ @} flags;
address@hidden @ @ @ @ awk_value_t    index;
address@hidden @ @ @ @ awk_value_t    value;
address@hidden @} awk_element_t;
-The @code{awk_element_t} is a ``flattened''
-array element. @command{awk} produces an array of these
-inside the @code{awk_flat_array_t} (see the next item).
-Individual elements may be marked for deletion. New elements must be added
-individually, one at a time, using the separate API for that purpose.
-The fields are as follows:
-
address@hidden nested table
address@hidden @code
address@hidden struct awk_element *next;
-This pointer is for the convenience of extension writers.  It allows
-an extension to create a linked list of new elements which can then be
-added to an array in a loop that traverses the list.
-
address@hidden enum @{ @dots{} @} flags;
-A set of flag values that convey information between @command{gawk}
-and the extension. Currently there is only one: @code{AWK_ELEMENT_DELETE},
-which the extension can set to cause @command{gawk} to delete the
-element from the original array upon release of the flattened array.
-
address@hidden index
address@hidden value
-The index and value of the element, respectively.
address@hidden memory pointed to by @code{index} and @code{value} belongs to @command{gawk}.
address@hidden table
-
address@hidden typedef struct awk_flat_array @{
address@hidden @ @ @ @ awk_const void *awk_const opaque1;@ @ @ @ /* private data for use by gawk */
address@hidden @ @ @ @ awk_const void *awk_const opaque2;@ @ @ @ /* private data for use by gawk */
address@hidden @ @ @ @ awk_const size_t count;@ @ @ @ @ /* how many elements */
address@hidden @ @ @ @ awk_element_t elements[1];@ @ /* will be extended */
address@hidden @} awk_flat_array_t;
-This is a flattened array. When an extension gets one of these
-from @command{gawk}, the @code{elements} array is of actual
-size @code{count}.
-The @code{opaque1} and @code{opaque2} pointers are for use by @command{gawk};
-therefore they are marked @code{awk_const} so that the extension cannot
-modify them.
address@hidden table
-
address@hidden Array Functions
address@hidden Array Functions
-
-The following functions relate to individual array elements.
-
address@hidden @code
address@hidden awk_bool_t get_element_count(awk_array_t a_cookie, size_t *count);
-For the array represented by @code{a_cookie}, return in @code{*count}
-the number of elements it contains. A subarray counts as a single element.
-Return false if there is an error.
-
address@hidden awk_bool_t get_array_element(awk_array_t a_cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const awk_value_t *const index,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_valtype_t wanted,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ awk_value_t *result);
-For the array represented by @code{a_cookie}, return in @code{*result}
-the value of the element whose index is @code{index}.
address@hidden specifies the type of value you wish to retrieve.
-Return false if @code{wanted} does not match the actual type or if
address@hidden is not in the array (@pxref{table-value-types-returned}).
-
-The value for @code{index} can be numeric, in which case @command{gawk}
-converts it to a string. Using non-integral values is possible, but
-requires that you understand how such values are converted to strings
-(@pxref{Conversion}); thus using integral values is safest.
-
-As with @emph{all} strings passed into @code{gawk} from an extension,
-the string value of @code{index} must come from @code{malloc()}, and
address@hidden releases the storage.
-
address@hidden awk_bool_t set_array_element(awk_array_t a_cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const index,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const@ awk_value_t *const value);
-In the array represented by @code{a_cookie}, create or modify
-the element whose index is given by @code{index}.
-The @code{ARGV} and @code{ENVIRON} arrays may not be changed.
-
address@hidden awk_bool_t set_array_element_by_elem(awk_array_t a_cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ 
@ @ @ @ @ awk_element_t element);
-Like @code{set_array_element()}, but take the @code{index} and @code{value}
-from @code{element}. This is a convenience macro.
-
address@hidden awk_bool_t del_array_element(awk_array_t a_cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ const 
awk_value_t* const index);
-Remove the element with the given index from the array
-represented by @code{a_cookie}.
-Return true if the element was removed, or false if the element did
-not exist in the array.
address@hidden table
-
-The following functions relate to arrays as a whole:
-
address@hidden @code
address@hidden awk_array_t create_array();
-Create a new array to which elements may be added.
address@hidden Arrays}, for a discussion of how to
-create a new array and add elements to it.
-
address@hidden awk_bool_t clear_array(awk_array_t a_cookie);
-Clear the array represented by @code{a_cookie}.
-Return false if there was some kind of problem, true otherwise.
-The array remains an array, but after calling this function, it
-has no elements. This is equivalent to using the @code{delete}
-statement (@pxref{Delete}).
-
address@hidden awk_bool_t flatten_array(awk_array_t a_cookie, awk_flat_array_t **data);
-For the array represented by @code{a_cookie}, create an @code{awk_flat_array_t}
-structure and fill it in. Set the pointer whose address is passed as @code{data}
-to point to this structure.
-Return true upon success, or false otherwise.
address@hidden Arrays}, for a discussion of how to
-flatten an array and work with it.
-
address@hidden awk_bool_t release_flattened_array(awk_array_t a_cookie,
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ 
@ @ @ awk_flat_array_t *data);
-When done with a flattened array, release the storage using this function.
-You must pass in both the original array cookie, and the address of
-the created @code{awk_flat_array_t} structure.
-The function returns true upon success, false otherwise.
address@hidden table
-
address@hidden Flattening Arrays
address@hidden Working With All The Elements of an Array
-
-To @dfn{flatten} an array is to create a structure that
-represents the full array in a fashion that makes it easy
-for C code to traverse the entire array.  Test code
-in @file{extension/testext.c} does this, and also serves
-as a nice example to show how to use the APIs.
-
-First, the @command{gawk} script that drives the test extension:
-
address@hidden
-@@load "testext"
-BEGIN @{
-    n = split("blacky rusty sophie raincloud lucky", pets)
-    printf "pets has %d elements\n", length(pets)
-    ret = dump_array_and_delete("pets", "3")
-    printf "dump_array_and_delete(pets) returned %d\n", ret
-    if ("3" in pets)
-        printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
-    else
-        printf("dump_array_and_delete() did remove index \"3\"!\n")
-    print ""
address@hidden
address@hidden example
-
address@hidden
-This code creates an array with @code{split()} (@pxref{String Functions})
-and then calls @code{dump_array_and_delete()}. That function looks up
-the array whose name is passed as the first argument, and
-deletes the element at the index passed in the second argument.
-It then prints the return value and checks if the element
-was indeed deleted.  Here is the C code that implements
address@hidden()}. It has been edited slightly for
-presentation.
-
-The first part declares variables, sets up the default
-return value in @code{result}, and checks that the function
-was called with the correct number of arguments:
-
address@hidden
-static awk_value_t *
-dump_array_and_delete(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t value, value2, value3;
-    awk_flat_array_t *flat_array;
-    size_t count;
-    char *name;
-    int i;
-
-    assert(result != NULL);
-    make_number(0.0, result);
-
-    if (nargs != 2) @{
-        printf("dump_array_and_delete: nargs not right "
-               "(%d should be 2)\n", nargs);
-        goto out;
-    @}
address@hidden example
-
-The function then proceeds in steps, as follows. First, retrieve
-the name of the array, passed as the first argument. Then
-retrieve the array itself. If either operation fails, print
-error messages and return:
-
address@hidden
-    /* get argument named array as flat array and print it */
-    if (get_argument(0, AWK_STRING, & value)) @{
-        name = value.str_value.str;
-        if (sym_lookup(name, AWK_ARRAY, & value2))
-            printf("dump_array_and_delete: sym_lookup of %s passed\n",
-                   name);
-        else @{
-            printf("dump_array_and_delete: sym_lookup of %s failed\n",
-                   name);
-            goto out;
-        @}
-    @} else @{
-        printf("dump_array_and_delete: get_argument(0) failed\n");
-        goto out;
-    @}
address@hidden example
-
-For testing purposes and to make sure that the C code sees
-the same number of elements as the @command{awk} code,
-the second step is to get the count of elements in the array
-and print it:
-
address@hidden
-    if (! get_element_count(value2.array_cookie, & count)) @{
-        printf("dump_array_and_delete: get_element_count failed\n");
-        goto out;
-    @}
-
-    printf("dump_array_and_delete: incoming size is %lu\n",
-           (unsigned long) count);
address@hidden example
-
-The third step is to actually flatten the array, and then
-to double check that the count in the @code{awk_flat_array_t}
-is the same as the count just retrieved:
-
address@hidden
-    if (! flatten_array(value2.array_cookie, & flat_array)) @{
-        printf("dump_array_and_delete: could not flatten array\n");
-        goto out;
-    @}
-
-    if (flat_array->count != count) @{
-        printf("dump_array_and_delete: flat_array->count (%lu)"
-               " != count (%lu)\n",
-                (unsigned long) flat_array->count,
-                (unsigned long) count);
-        goto out;
-    @}
address@hidden example
-
-The fourth step is to retrieve the index of the element
-to be deleted, which was passed as the second argument.
-Remember that argument counts passed to @code{get_argument()}
-are zero-based, thus the second argument is numbered one:
-
address@hidden
-    if (! get_argument(1, AWK_STRING, & value3)) @{
-        printf("dump_array_and_delete: get_argument(1) failed\n");
-        goto out;
-    @}
address@hidden example
-
-The fifth step is where the ``real work'' is done. The function
-loops over every element in the array, printing the index and
-element values. In addition, upon finding the element with the
-index that is supposed to be deleted, the function sets the
address@hidden bit in the @code{flags} field
-of the element.  When the array is released, @command{gawk}
-traverses the flattened array, and deletes any elements that
-have this flag bit set:
-
address@hidden
-    for (i = 0; i < flat_array->count; i++) @{
-        printf("\t%s[\"%.*s\"] = %s\n",
-            name,
-            (int) flat_array->elements[i].index.str_value.len,
-            flat_array->elements[i].index.str_value.str,
-            valrep2str(& flat_array->elements[i].value));
-
-        if (strcmp(value3.str_value.str,
-                   flat_array->elements[i].index.str_value.str)
-                   == 0) @{
-            flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
-            printf("dump_array_and_delete: marking element \"%s\" "
-                   "for deletion\n",
-                flat_array->elements[i].index.str_value.str);
-        @}
-    @}
address@hidden example
-
-The sixth step is to release the flattened array. This tells
address@hidden that the extension is no longer using the array,
-and that it should delete any elements marked for deletion.
address@hidden also frees any storage that was allocated,
-so you should not use the pointer (@code{flat_array} in this
-code) once you have called @code{release_flattened_array()}:
-
address@hidden
-    if (! release_flattened_array(value2.array_cookie, flat_array)) @{
-        printf("dump_array_and_delete: could not release flattened array\n");
-        goto out;
-    @}
address@hidden example
-
-Finally, since everything was successful, the function sets the
-return value to success, and returns:
-
address@hidden
-    make_number(1.0, result);
-out:
-    return result;
address@hidden
address@hidden example
-
-Here is the output from running this part of the test:
-
address@hidden
-pets has 5 elements
-dump_array_and_delete: sym_lookup of pets passed
-dump_array_and_delete: incoming size is 5
-        pets["1"] = "blacky"
-        pets["2"] = "rusty"
-        pets["3"] = "sophie"
-dump_array_and_delete: marking element "3" for deletion
-        pets["4"] = "raincloud"
-        pets["5"] = "lucky"
-dump_array_and_delete(pets) returned 1
-dump_array_and_delete() did remove index "3"!
address@hidden example
-
address@hidden Creating Arrays
address@hidden How To Create and Populate Arrays
-
-Besides working with arrays created by @command{awk} code, you can
-create arrays and populate them as you see fit, and then @command{awk}
-code can access them and manipulate them.
-
-There are two important points about creating arrays from extension code:
-
address@hidden 1
address@hidden
-You must install a new array into @command{gawk}'s symbol
-table immediately upon creating it.  Once you have done so,
-you can then populate the array.
-
address@hidden
-Strictly speaking, this is required only
-for arrays that will have subarrays as elements; however it is
-a good idea to always do this.  This restriction may be relaxed
-in a subsequent revision of the API.
address@hidden ignore
-
-Similarly, if installing a new array as a subarray of an existing array,
-you must add the new array to its parent before adding any elements to it.
-
-Thus, the correct way to build an array is to work ``top down.''  Create
-the array, and immediately install it in @command{gawk}'s symbol table
-using @code{sym_update()}, or install it as an element in a previously
-existing array using @code{set_array_element()}. Example code is coming shortly.
-
address@hidden
-Due to gawk internals, after using @code{sym_update()} to install an array
-into @command{gawk}, you have to retrieve the array cookie from the value
-passed in to @code{sym_update()} before doing anything else with it, like so:
-
address@hidden
-awk_value_t index, val;
-awk_array_t new_array;
-
-make_const_string("an index", 8, & index);
-
-new_array = create_array();
-val.val_type = AWK_ARRAY;
-val.array_cookie = new_array;
-
-/* install array in the symbol table */
-sym_update("array", & val);
-
-new_array = val.array_cookie;    /* YOU MUST DO THIS */
address@hidden example
-
-If installing an array as a subarray, you must also retrieve the value
-of the array cookie after the call to @code{set_array_element()}.
address@hidden enumerate
-
-The following C code is a simple test extension to create an array
-with two regular elements and with a subarray. The leading @samp{#include}
-directives and boilerplate variable declarations are omitted for brevity.
-The first step is to create a new array and then install it
-in the symbol table:
-
address@hidden
address@hidden
-#ifdef HAVE_CONFIG_H
-#include <config.h>
-#endif
-
-#include <stdio.h>
-#include <assert.h>
-#include <errno.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-
-#include <sys/types.h>
-#include <sys/stat.h>
-
-#include "gawkapi.h"
-
-static const gawk_api_t *api;   /* for convenience macros to work */
-static awk_ext_id_t *ext_id;
-static const char *ext_version = "testarray extension: version 1.0";
-
-int plugin_is_GPL_compatible;
-
address@hidden ignore
-/* create_new_array --- create a named array */
-
-static void
-create_new_array()
address@hidden
-    awk_array_t a_cookie;
-    awk_array_t subarray;
-    awk_value_t index, value;
-
-    a_cookie = create_array();
-    value.val_type = AWK_ARRAY;
-    value.array_cookie = a_cookie;
-
-    if (! sym_update("new_array", & value))
-        printf("create_new_array: sym_update(\"new_array\") failed!\n");
-    a_cookie = value.array_cookie;
address@hidden example
-
address@hidden
-Note how @code{a_cookie} is reset from the @code{array_cookie} field in
-the @code{value} structure.
-
-The second step is to install two regular values into @code{new_array}:
-
address@hidden
-    (void) make_const_string("hello", 5, & index);
-    (void) make_const_string("world", 5, & value);
-    if (! set_array_element(a_cookie, & index, & value)) @{
-        printf("fill_in_array: set_array_element failed\n");
-        return;
-    @}
-
-    (void) make_const_string("answer", 6, & index);
-    (void) make_number(42.0, & value);
-    if (! set_array_element(a_cookie, & index, & value)) @{
-        printf("fill_in_array: set_array_element failed\n");
-        return;
-    @}
address@hidden example
-
-The third step is to create the subarray and install it:
-
address@hidden
-    (void) make_const_string("subarray", 8, & index);
-    subarray = create_array();
-    value.val_type = AWK_ARRAY;
-    value.array_cookie = subarray;
-    if (! set_array_element(a_cookie, & index, & value)) @{
-        printf("fill_in_array: set_array_element failed\n");
-        return;
-    @}
-    subarray = value.array_cookie;
address@hidden example
-
-The final step is to populate the subarray with its own element:
-
address@hidden
-    (void) make_const_string("foo", 3, & index);
-    (void) make_const_string("bar", 3, & value);
-    if (! set_array_element(subarray, & index, & value)) @{
-        printf("fill_in_array: set_array_element failed\n");
-        return;
-    @}
address@hidden
address@hidden
-static awk_ext_func_t func_table[] = @{
-    @{ NULL, NULL, 0 @}
address@hidden;
-
-/* init_testarray --- additional initialization function */
-
-static awk_bool_t init_testarray(void)
address@hidden
-    create_new_array();
-
-    return 1;
address@hidden
-
-static awk_bool_t (*init_func)(void) = init_testarray;
-
-dl_load_func(func_table, testarray, "")
address@hidden ignore
address@hidden example
-
-Here is a sample script that loads the extension
-and then dumps the array:
-
address@hidden
-@@load "subarray"
-
-function dumparray(name, array,     i)
address@hidden
-    for (i in array)
-        if (isarray(array[i]))
-            dumparray(name "[\"" i "\"]", array[i])
-        else
-            printf("%s[\"%s\"] = %s\n", name, i, array[i])
address@hidden
-
-BEGIN @{
-    dumparray("new_array", new_array);
address@hidden
address@hidden example
-
-Here is the result of running the script:
-
address@hidden
-$ @kbd{AWKLIBPATH=$PWD ./gawk -f subarray.awk}
address@hidden new_array["subarray"]["foo"] = bar
address@hidden new_array["hello"] = world
address@hidden new_array["answer"] = 42
address@hidden example
-
address@hidden
-(@xref{Finding Extensions}, for more information on the
address@hidden environment variable.)
-
address@hidden Extension API Variables
address@hidden API Variables
-
-The API provides two sets of variables.  The first provides information
-about the version of the API (both the version with which the extension was
-compiled and the version with which @command{gawk} was compiled).  The second provides
-information about how @command{gawk} was invoked.
-
address@hidden
-* Extension Versioning::        API Version information.
-* Extension API Informational Variables:: Variables providing information about
-                                @command{gawk}'s invocation.
address@hidden menu
-
address@hidden Extension Versioning
address@hidden API Version Constants and Variables
-
-The API provides both a ``major'' and a ``minor'' version number.
-The API versions are available at compile time as constants:
-
address@hidden @code
address@hidden GAWK_API_MAJOR_VERSION
-The major version of the API.
-
address@hidden GAWK_API_MINOR_VERSION
-The minor version of the API.
address@hidden table
-
-The minor version increases when new functions are added to the API. Such
-new functions are always added to the end of the API @code{struct}.
-
-The major version increases (and the minor version is reset to zero) if any
-of the data types change size or member order, or if any of the existing
-functions change signature.
-
-It could happen that an extension may be compiled against one version
-of the API but loaded by a version of @command{gawk} using a different
-version. For this reason, the major and minor API versions of the
-running @command{gawk} are included in the API @code{struct} as read-only
-constant integers:
-
address@hidden @code
address@hidden api->major_version
-The major version of the running @command{gawk}.
-
address@hidden api->minor_version
-The minor version of the running @command{gawk}.
address@hidden table
-
-It is up to the extension to decide if there are API incompatibilities.
-Typically a check like this is enough:
-
address@hidden
-if (api->major_version != GAWK_API_MAJOR_VERSION
-    || api->minor_version < GAWK_API_MINOR_VERSION) @{
-        fprintf(stderr, "foo_extension: version mismatch with gawk!\n");
-        fprintf(stderr, "\tmy version (%d, %d), gawk version (%d, %d)\n",
-                GAWK_API_MAJOR_VERSION, GAWK_API_MINOR_VERSION,
-                api->major_version, api->minor_version);
-        exit(1);
address@hidden
address@hidden example
-
-Such code is included in the boilerplate @code{dl_load_func()} macro
-provided in @file{gawkapi.h} (discussed later, in
address@hidden API Boilerplate}).
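The compatibility rule described above can be sketched on its own, outside of gawk. This is a minimal illustration, not the code from gawkapi.h: the struct `api_version`, the function `api_is_compatible()`, and the `EXT_API_MAJOR`/`EXT_API_MINOR` constants (with made-up values) are hypothetical stand-ins for `api->major_version`, `api->minor_version`, and the `GAWK_API_*_VERSION` constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compile-time constants; values are made up. */
#define EXT_API_MAJOR 1
#define EXT_API_MINOR 3

/* Hypothetical, reduced model of the version fields in the API struct. */
struct api_version {
    int major_version;
    int minor_version;
};

/*
 * api_is_compatible --- return 1 if an extension compiled against
 * (EXT_API_MAJOR, EXT_API_MINOR) may be loaded by a gawk reporting
 * the versions in `running', 0 otherwise.
 */
static int
api_is_compatible(const struct api_version *running)
{
    assert(running != NULL);

    /* A different major version means data types or signatures changed. */
    if (running->major_version != EXT_API_MAJOR)
        return 0;

    /* An older minor version may lack functions the extension calls. */
    if (running->minor_version < EXT_API_MINOR)
        return 0;

    /* Same major, same or newer minor: new functions were only appended. */
    return 1;
}
```

A running gawk with a newer minor version is accepted because, as stated above, new functions are only ever added at the end of the API struct.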
-
address@hidden Extension API Informational Variables
address@hidden Informational Variables
-
-The API provides access to several variables that describe
-whether the corresponding command-line options were enabled when
-@command{gawk} was invoked.  The variables are:
-
address@hidden @code
address@hidden do_lint
-This variable is true if @command{gawk} was invoked with the @option{--lint} option
-(@pxref{Options}).
-
address@hidden do_traditional
-This variable is true if @command{gawk} was invoked with the
-@option{--traditional} option.
-
address@hidden do_profile
-This variable is true if @command{gawk} was invoked with the
-@option{--profile} option.
-
address@hidden do_sandbox
-This variable is true if @command{gawk} was invoked with the
-@option{--sandbox} option.
-
address@hidden do_debug
-This variable is true if @command{gawk} was invoked with the
-@option{--debug} option.
-
address@hidden do_mpfr
-This variable is true if @command{gawk} was invoked with the
-@option{--bignum} option.
address@hidden table
-
-The value of @code{do_lint} can change if @command{awk} code
-modifies the @code{LINT} built-in variable (@pxref{Built-in Variables}).
-The others should not change during execution.
-
address@hidden Extension API Boilerplate
address@hidden Boilerplate Code
-
-As mentioned earlier (@pxref{Extension Mechanism Outline}), the function
-definitions as presented are really macros. To use these macros, your
-extension must provide a small amount of boilerplate code (variables and
-functions) towards the top of your source file, using pre-defined names
-as described below.  The boilerplate needed is also provided in comments
-in the @file{gawkapi.h} header file:
-
address@hidden
-/* Boiler plate code: */
-int plugin_is_GPL_compatible;
-
-static const gawk_api_t *api;
-static awk_ext_id_t ext_id;
-static const char *ext_version = NULL; /* or @dots{} = "some string" */
-
-static awk_ext_func_t func_table[] = @{
-    @{ "name", do_name, 1 @},
-    /* @dots{} */
address@hidden;
-
-/* EITHER: */
-
-static awk_bool_t (*init_func)(void) = NULL;
-
-/* OR: */
-
-static awk_bool_t
-init_my_module(void)
address@hidden
-    @dots{}
address@hidden
-
-static awk_bool_t (*init_func)(void) = init_my_module;
-
-dl_load_func(func_table, some_name, "name_space_in_quotes")
address@hidden example
-
-These variables and functions are as follows:
-
address@hidden @code
address@hidden int plugin_is_GPL_compatible;
-This asserts that the extension is compatible with the GNU GPL
-(@pxref{Copying}).  If your extension does not have this, @command{gawk}
-will not load it (@pxref{Plugin License}).
-
-@item static const gawk_api_t *api;
-This global @code{static} variable should be set to point to
-the @code{gawk_api_t} pointer that @command{gawk} passes to your
-@code{dl_load()} function.  This variable is used by all of the macros.
-
address@hidden static awk_ext_id_t ext_id;
-This global static variable should be set to the @code{awk_ext_id_t}
-value that @command{gawk} passes to your @code{dl_load()} function.
-This variable is used by all of the macros.
-
-@item static const char *ext_version = NULL; /* or @dots{} = "some string" */
-This global @code{static} variable should be set either
-to @code{NULL}, or to point to a string giving the name and version of
-your extension.
-
address@hidden static awk_ext_func_t func_table[] = @{ @dots{} @};
-This is an array of one or more @code{awk_ext_func_t} structures
-as described earlier (@pxref{Extension Functions}).
-It can then be looped over for multiple calls to
address@hidden()}.
-
address@hidden static awk_bool_t (*init_func)(void) = NULL;
address@hidden @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @r{OR}
address@hidden static awk_bool_t init_my_module(void) @{ @dots{} @}
address@hidden static awk_bool_t (*init_func)(void) = init_my_module;
-If you need to do some initialization work, you should define a
-function that does it (creates variables, opens files, etc.)
-and then define the @code{init_func} pointer to point to your
-function.
-The function should return zero (false) upon failure, non-zero
-(success) if everything goes well.
-
-If you don't need to do any initialization, define the pointer and
-initialize it to @code{NULL}.
-
address@hidden dl_load_func(func_table, some_name, "name_space_in_quotes")
-This macro expands to a @code{dl_load()} function that performs
-all the necessary initializations.
address@hidden table
-
-The point of all the variables and arrays is to let the
-@code{dl_load()} function (from the @code{dl_load_func()}
-macro) do all the standard work. It does the following:
-
address@hidden 1
address@hidden
-Check the API versions. If the extension major version does not match
-@command{gawk}'s, or if the extension minor version is greater than
-@command{gawk}'s, it prints a fatal error message and exits.
-
address@hidden
-Load the functions defined in @code{func_table}.
-If any of them fails to load, it prints a warning message but
-continues on.
-
address@hidden
-If the @code{init_func} pointer is not @code{NULL}, call the
-function it points to. If it returns zero (false), print a
-warning message.
-
address@hidden
-If @code{ext_version} is not @code{NULL}, register
-the version string with @command{gawk}.
address@hidden enumerate
-
address@hidden Finding Extensions
address@hidden How @command{gawk} Finds Extensions
-
-Compiled extensions have to be installed in a directory where
-@command{gawk} can find them.  If @command{gawk} is configured and
-built in the default fashion, the directory in which to find
-extensions is @file{/usr/local/lib/gawk}.  You can also specify a search
-path with a list of directories to search for compiled extensions.
address@hidden Variable}, for more information.
-
address@hidden Extension Example
address@hidden Example: Some File Functions
-
address@hidden
-@i{No matter where you go, there you are.} @*
-Buckaroo Banzai
address@hidden quotation
-
address@hidden It's enough to show chdir and stat, no need for fts
-
-Two useful functions that are not in @command{awk} are @code{chdir()} (so
-that an @command{awk} program can change its directory) and @code{stat()}
-(so that an @command{awk} program can gather information about a file).
-This @value{SECTION} implements these functions for @command{gawk}
-in an extension.
-
address@hidden
-* Internal File Description::   What the new functions will do.
-* Internal File Ops::           The code for internal file operations.
-* Using Internal File Ops::     How to use an external extension.
address@hidden menu
-
address@hidden Internal File Description
address@hidden Using @code{chdir()} and @code{stat()}
-
-This @value{SECTION} shows how to use the new functions at
-the @command{awk} level once they've been integrated into the
-running @command{gawk} interpreter.  Using @code{chdir()} is very
-straightforward. It takes one argument, the new directory to change to:
-
address@hidden
-@@load "filefuncs"
address@hidden
-newdir = "/home/arnold/funstuff"
-ret = chdir(newdir)
-if (ret < 0) @{
-    printf("could not change to %s: %s\n",
-                   newdir, ERRNO) > "/dev/stderr"
-    exit 1
address@hidden
address@hidden
address@hidden example
-
-The return value is negative if the @code{chdir()} failed, and
-@code{ERRNO} (@pxref{Built-in Variables}) is set to a string indicating
-the error.
-
-Using @code{stat()} is a bit more complicated.  The C @code{stat()}
-function fills in a structure that has a fair amount of information.
-The right way to model this in @command{awk} is to fill in an associative
-array with the appropriate information:
-
address@hidden broke printf for page breaking
address@hidden
-file = "/home/arnold/.profile"
-ret = stat(file, fdata)
-if (ret < 0) @{
-    printf("could not stat %s: %s\n",
-             file, ERRNO) > "/dev/stderr"
-    exit 1
address@hidden
-printf("size of %s is %d bytes\n", file, fdata["size"])
address@hidden example
-
-The @code{stat()} function always clears the data array, even if
-the @code{stat()} fails.  It fills in the following elements:
-
address@hidden @code
address@hidden "name"
-The name of the file that was @code{stat()}'ed.
-
address@hidden "dev"
address@hidden "ino"
-The file's device and inode numbers, respectively.
-
address@hidden "mode"
-The file's mode, as a numeric value. This includes both the file's
-type and its permissions.
-
address@hidden "nlink"
-The number of hard links (directory entries) the file has.
-
address@hidden "uid"
address@hidden "gid"
-The numeric user and group ID numbers of the file's owner.
-
address@hidden "size"
-The size in bytes of the file.
-
address@hidden "blocks"
-The number of disk blocks the file actually occupies. This may not
-be a function of the file's size if the file has holes.
-
address@hidden "atime"
address@hidden "mtime"
address@hidden "ctime"
-The file's last access, modification, and inode update times,
-respectively.  These are numeric timestamps, suitable for formatting
-with @code{strftime()}
-(@pxref{Time Functions}).
-
address@hidden "pmode"
-The file's ``printable mode.''  This is a string representation of
-the file's type and permissions, such as is produced by
address@hidden -l}---for example, @code{"drwxr-xr-x"}.
-
address@hidden "type"
-A printable string representation of the file's type.  The value
-is one of the following:
-
address@hidden @code
address@hidden "blockdev"
address@hidden "chardev"
-The file is a block or character device (``special file'').
-
address@hidden
address@hidden "door"
-The file is a Solaris ``door'' (special file used for
-interprocess communications).
address@hidden ignore
-
address@hidden "directory"
-The file is a directory.
-
address@hidden "fifo"
-The file is a named pipe (also known as a FIFO).
-
address@hidden "file"
-The file is just a regular file.
-
address@hidden "socket"
-The file is an @code{AF_UNIX} (``Unix domain'') socket in the
-filesystem.
-
address@hidden "symlink"
-The file is a symbolic link.
address@hidden table
address@hidden table
-
-Several additional elements may be present depending upon the operating
-system and the type of the file.  You can test for them in your @command{awk}
-program by using the @code{in} operator
-(@pxref{Reference to Elements}):
-
address@hidden @code
address@hidden "blksize"
-The preferred block size for I/O to the file. This field is not
-present on all POSIX-like systems in the C @code{stat} structure.
-
address@hidden "linkval"
-If the file is a symbolic link, this element is the name of the
-file the link points to (i.e., the value of the link).
-
address@hidden "rdev"
address@hidden "major"
address@hidden "minor"
-If the file is a block or character device file, then these values
-represent the numeric device number and the major and minor components
-of that number, respectively.
address@hidden table
-
address@hidden Internal File Ops
address@hidden C Code for @code{chdir()} and @code{stat()}
-
-Here is the C code for these functions.@footnote{This version is
-edited slightly for presentation.  See @file{extension/filefuncs.c}
-in the @command{gawk} distribution for the complete version.}
-
-The file includes a number of standard header files, and then includes
-the @file{gawkapi.h} header file which provides the API definitions.
-Those are followed by the necessary variable declarations 
-to make use of the API macros and boilerplate code
-(@pxref{Extension API Boilerplate}).
-
address@hidden break line for page breaking
address@hidden
-#ifdef HAVE_CONFIG_H
-#include <config.h>
-#endif
-
-#include <stdio.h>
-#include <assert.h>
-#include <errno.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-
-#include <sys/types.h>
-#include <sys/stat.h>
-
-#include "gawkapi.h"
-
-#include "gettext.h"
-#define _(msgid)  gettext(msgid)
-#define N_(msgid) msgid
-
-#include "gawkfts.h"
-#include "stack.h"
-
-static const gawk_api_t *api;    /* for convenience macros to work */
-static awk_ext_id_t *ext_id;
-static awk_bool_t init_filefuncs(void);
-static awk_bool_t (*init_func)(void) = init_filefuncs;
-static const char *ext_version = "filefuncs extension: version 1.0";
-
-int plugin_is_GPL_compatible;
address@hidden example
-
address@hidden programming conventions, @command{gawk} internals
-By convention, for an @command{awk} function @code{foo()}, the C function
-that implements it is called @code{do_foo()}.  The function should have
-two arguments: the first is an @code{int} usually called @code{nargs},
-that represents the number of actual arguments for the function.
-The second is a pointer to an @code{awk_value_t}, usually named
-@code{result}.
-
address@hidden
-/*  do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-
-static awk_value_t *
-do_chdir(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t newdir;
-    int ret = -1;
-
-    assert(result != NULL);
-
-    if (do_lint && nargs != 1)
-        lintwarn(ext_id,
-                 _("chdir: called with incorrect number of arguments, "
-                   "expecting 1"));
address@hidden example
-
-The @code{newdir}
-variable represents the new directory to change to, retrieved
-with @code{get_argument()}.  Note that the first argument is
-numbered zero.
-
-If the argument is retrieved successfully, the function calls the
-@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
-is updated.
-
address@hidden
-    if (get_argument(0, AWK_STRING, & newdir)) @{
-        ret = chdir(newdir.str_value.str);
-        if (ret < 0)
-            update_ERRNO_int(errno);
-    @}
address@hidden example
-
-Finally, the function returns the return value to the @command{awk} level:
-
address@hidden
-    return make_number(ret, result);
address@hidden
address@hidden example
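The error-handling pattern in `do_chdir()` (call `chdir()`, and on a negative return record `errno` right away) can be tried without the gawk API. The following is a minimal sketch under that assumption; `try_chdir()` is a hypothetical helper, and storing `errno` in an `int` stands in for the call to `update_ERRNO_int()`.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * try_chdir --- hypothetical helper: change directory, returning
 * chdir()'s result and storing errno in *saved_errno on failure.
 */
static int
try_chdir(const char *newdir, int *saved_errno)
{
    int ret;

    assert(newdir != NULL && saved_errno != NULL);

    *saved_errno = 0;
    ret = chdir(newdir);
    if (ret < 0)
        *saved_errno = errno;   /* capture before another call overwrites it */

    return ret;
}
```

Saving `errno` immediately matters because later library calls are free to change it, even when they succeed.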
-
-The @code{stat()} built-in is more involved.  First comes a function
-that turns a numeric mode into a printable representation
-(e.g., octal 0644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
-
address@hidden break line for page breaking
address@hidden
-/* format_mode --- turn a stat mode field into something readable */
-
-static char *
-format_mode(unsigned long fmode)
address@hidden
-    @dots{}
address@hidden
address@hidden example
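Since the body is elided above, here is a minimal sketch of what such a function might look like. It is not the code from filefuncs.c: `format_mode_sketch()` is a hypothetical name, it returns a pointer to a static buffer, and it handles only the file type character and the nine basic permission bits (no setuid, setgid, or sticky handling).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for S_IFMT and friends on strict compilers */
#endif

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* format_mode_sketch --- hypothetical stand-in for the omitted format_mode() */
static char *
format_mode_sketch(unsigned long fmode)
{
    static char outbuf[11];     /* type + 9 permission characters + NUL */
    static const struct {
        unsigned long mask;
        char ch;
    } bits[] = {
        { S_IRUSR, 'r' }, { S_IWUSR, 'w' }, { S_IXUSR, 'x' },
        { S_IRGRP, 'r' }, { S_IWGRP, 'w' }, { S_IXGRP, 'x' },
        { S_IROTH, 'r' }, { S_IWOTH, 'w' }, { S_IXOTH, 'x' },
    };
    size_t i;

    /* first character: the file type */
    switch (fmode & S_IFMT) {
    case S_IFDIR:  outbuf[0] = 'd'; break;
    case S_IFCHR:  outbuf[0] = 'c'; break;
    case S_IFBLK:  outbuf[0] = 'b'; break;
    case S_IFLNK:  outbuf[0] = 'l'; break;
    case S_IFIFO:  outbuf[0] = 'p'; break;
    case S_IFSOCK: outbuf[0] = 's'; break;
    default:       outbuf[0] = '-'; break;   /* regular file or unknown */
    }

    /* remaining characters: one per permission bit, '-' if the bit is off */
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++)
        outbuf[i + 1] = (fmode & bits[i].mask) ? bits[i].ch : '-';
    outbuf[10] = '\0';

    return outbuf;
}
```

Because the result lives in a static buffer, a caller must copy it before calling the function again.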
-
-Next comes a function for reading symbolic links, which is also
-omitted here for brevity:
-
address@hidden
-/* read_symlink --- read a symbolic link into an allocated buffer.
-   @dots{} */
-
-static char *
-read_symlink(const char *fname, size_t bufsize, ssize_t *linksize)
address@hidden
-    @dots{}
address@hidden
address@hidden example
-
-Two helper functions simplify entering values in the
-array that will contain the result of the @code{stat()}:
-
address@hidden
-/* array_set --- set an array element */
-
-static void
-array_set(awk_array_t array, const char *sub, awk_value_t *value)
address@hidden
-    awk_value_t index;
-
-    set_array_element(array,
-                      make_const_string(sub, strlen(sub), & index),
-                      value);
-
address@hidden
-
-/* array_set_numeric --- set an array element with a number */
-
-static void
-array_set_numeric(awk_array_t array, const char *sub, double num)
address@hidden
-    awk_value_t tmp;
-
-    array_set(array, sub, make_number(num, & tmp));
address@hidden
address@hidden example
-
-The following function does most of the work to fill in
-the @code{awk_array_t} result array with values obtained
-from a valid @code{struct stat}. It is done in a separate function
-to support the @code{stat()} function for @command{gawk} and also
-to support the @code{fts()} extension which is included in
-the same file but whose code is not shown here
-(@pxref{Extension Sample File Functions}).
-
-The first part of the function is variable declarations,
-including a table to map file types to strings:
-
address@hidden
-/* fill_stat_array --- do the work to fill an array with stat info */
-
-static int
-fill_stat_array(const char *name, awk_array_t array, struct stat *sbuf)
address@hidden
-    char *pmode;    /* printable mode */
-    const char *type = "unknown";
-    awk_value_t tmp;
-    static struct ftype_map @{
-        unsigned int mask;
-        const char *type;
-    @} ftype_map[] = @{
-        @{ S_IFREG, "file" @},
-        @{ S_IFBLK, "blockdev" @},
-        @{ S_IFCHR, "chardev" @},
-        @{ S_IFDIR, "directory" @},
-#ifdef S_IFSOCK
-        @{ S_IFSOCK, "socket" @},
-#endif
-#ifdef S_IFIFO
-        @{ S_IFIFO, "fifo" @},
-#endif
-#ifdef S_IFLNK
-        @{ S_IFLNK, "symlink" @},
-#endif
-#ifdef S_IFDOOR /* Solaris weirdness */
-        @{ S_IFDOOR, "door" @},
-#endif /* S_IFDOOR */
-    @};
-    int j, k;
address@hidden example
-
-The destination array is cleared, and then code fills in
-various elements based on values in the @code{struct stat}:
-
address@hidden
-    /* empty out the array */
-    clear_array(array);
-
-    /* fill in the array */
-    array_set(array, "name", make_const_string(name, strlen(name),
-                                               & tmp));
-    array_set_numeric(array, "dev", sbuf->st_dev);
-    array_set_numeric(array, "ino", sbuf->st_ino);
-    array_set_numeric(array, "mode", sbuf->st_mode);
-    array_set_numeric(array, "nlink", sbuf->st_nlink);
-    array_set_numeric(array, "uid", sbuf->st_uid);
-    array_set_numeric(array, "gid", sbuf->st_gid);
-    array_set_numeric(array, "size", sbuf->st_size);
-    array_set_numeric(array, "blocks", sbuf->st_blocks);
-    array_set_numeric(array, "atime", sbuf->st_atime);
-    array_set_numeric(array, "mtime", sbuf->st_mtime);
-    array_set_numeric(array, "ctime", sbuf->st_ctime);
-
-    /* for block and character devices, add rdev,
-       major and minor numbers */
-    if (S_ISBLK(sbuf->st_mode) || S_ISCHR(sbuf->st_mode)) @{
-        array_set_numeric(array, "rdev", sbuf->st_rdev);
-        array_set_numeric(array, "major", major(sbuf->st_rdev));
-        array_set_numeric(array, "minor", minor(sbuf->st_rdev));
-    @}
address@hidden example
-
address@hidden
-The latter part of the function makes selective additions
-to the destination array, depending upon the availability of
-certain members and/or the type of the file. It then returns zero,
-for success:
-
address@hidden
-#ifdef HAVE_ST_BLKSIZE
-    array_set_numeric(array, "blksize", sbuf->st_blksize);
-#endif /* HAVE_ST_BLKSIZE */
-
-    pmode = format_mode(sbuf->st_mode);
-    array_set(array, "pmode", make_const_string(pmode, strlen(pmode),
-                                                & tmp));
-
-    /* for symbolic links, add a linkval field */
-    if (S_ISLNK(sbuf->st_mode)) @{
-        char *buf;
-        ssize_t linksize;
-
-        if ((buf = read_symlink(name, sbuf->st_size,
-                    & linksize)) != NULL)
-            array_set(array, "linkval",
-                      make_malloced_string(buf, linksize, & tmp));
-        else
-            warning(ext_id, _("stat: unable to read symbolic link `%s'"),
-                    name);
-    @}
-
-    /* add a type field */
-    type = "unknown";   /* shouldn't happen */
-    for (j = 0, k = sizeof(ftype_map)/sizeof(ftype_map[0]); j < k; j++) @{
-        if ((sbuf->st_mode & S_IFMT) == ftype_map[j].mask) @{
-            type = ftype_map[j].type;
-            break;
-        @}
-    @}
-
-    array_set(array, "type", make_const_string(type, strlen(type), &tmp));
-
-    return 0;
address@hidden
address@hidden example
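The table lookup that produces the "type" element can be isolated from the rest of `fill_stat_array()`. The sketch below assumes nothing from the gawk API: `file_type_name()` is a hypothetical helper that masks the mode with `S_IFMT` and compares it against each table entry, as the loop above does. The Solaris "door" entry is left out for simplicity.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for S_IFMT and friends on strict compilers */
#endif

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* file_type_name --- hypothetical helper: map a stat mode to a type string */
static const char *
file_type_name(unsigned long mode)
{
    static const struct {
        unsigned long mask;
        const char *type;
    } map[] = {
        { S_IFREG,  "file" },
        { S_IFBLK,  "blockdev" },
        { S_IFCHR,  "chardev" },
        { S_IFDIR,  "directory" },
        { S_IFSOCK, "socket" },
        { S_IFIFO,  "fifo" },
        { S_IFLNK,  "symlink" },
    };
    size_t j;

    /* the type bits are a field, not independent flags: compare for equality */
    for (j = 0; j < sizeof(map) / sizeof(map[0]); j++) {
        if ((mode & S_IFMT) == map[j].mask)
            return map[j].type;
    }

    return "unknown";   /* no table entry matched */
}
```

The comparison uses `==` after masking rather than a bitwise test, because several `S_IF*` values share bits with one another.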
-
-Finally, here is the @code{do_stat()} function. It starts with
-variable declarations and argument checking:
-
address@hidden
-Changed message for page breaking. Used to be:
-    "stat: called with incorrect number of arguments (%d), should be 2",
address@hidden ignore
address@hidden
-/* do_stat --- provide a stat() function for gawk */
-
-static awk_value_t *
-do_stat(int nargs, awk_value_t *result)
address@hidden
-    awk_value_t file_param, array_param;
-    char *name;
-    awk_array_t array;
-    int ret;
-    struct stat sbuf;
-
-    assert(result != NULL);
-
-    if (do_lint && nargs != 2) @{
-        lintwarn(ext_id,
-                 _("stat: called with wrong number of arguments"));
-        return make_number(-1, result);
-    @}
address@hidden example
-
-Then comes the actual work. First, the function gets the arguments.
-Next, it gets the information for the file.
-The code uses @code{lstat()} (instead of @code{stat()})
-to get the file information,
-in case the file is a symbolic link.
-If there's an error, it sets @code{ERRNO} and returns:
-
address@hidden
-    /* file is first arg, array to hold results is second */
-    if (   ! get_argument(0, AWK_STRING, & file_param)
-        || ! get_argument(1, AWK_ARRAY, & array_param)) @{
-        warning(ext_id, _("stat: bad parameters"));
-        return make_number(-1, result);
-    @}
-
-    name = file_param.str_value.str;
-    array = array_param.array_cookie;
-
-    /* always empty out the array */
-    clear_array(array);
-
-    /* lstat the file, if error, set ERRNO and return */
-    ret = lstat(name, & sbuf);
-    if (ret < 0) @{
-        update_ERRNO_int(errno);
-        return make_number(ret, result);
-    @}
address@hidden example
-
-The tedious work is done by @code{fill_stat_array()}, shown
-earlier.  When done, return the result from @code{fill_stat_array()}:
-
address@hidden
-    ret = fill_stat_array(name, array, & sbuf);
-
-    return make_number(ret, result);
address@hidden
address@hidden example
-
address@hidden programming conventions, @command{gawk} internals
-Finally, it's necessary to provide the ``glue'' that loads the
-new function(s) into @command{gawk}.
-
-The @code{filefuncs} extension also provides an @code{fts()}
-function, which we omit here. For its sake there is an initialization
-function:
-
address@hidden
-/* init_filefuncs --- initialization routine */
-
-static awk_bool_t
-init_filefuncs(void)
address@hidden
-    @dots{}
address@hidden
address@hidden example
-
-We are almost done. We need an array of @code{awk_ext_func_t}
-structures for loading each function into @command{gawk}:
-
address@hidden
-static awk_ext_func_t func_table[] = @{
-    @{ "chdir", do_chdir, 1 @},
-    @{ "stat",  do_stat, 2 @},
-    @{ "fts",   do_fts, 3 @},
address@hidden;
address@hidden example
-
-Each extension must have a routine named @code{dl_load()} to load
-everything that needs to be loaded.  It is simplest to use the
-@code{dl_load_func()} macro in @code{gawkapi.h}:
-
address@hidden
-/* define the dl_load() function using the boilerplate macro */
-
-dl_load_func(func_table, filefuncs, "")
address@hidden example
-
-And that's it!  As an exercise, consider adding functions to
-implement system calls such as @code{chown()}, @code{chmod()},
-and @code{umask()}.
-
address@hidden Using Internal File Ops
address@hidden Integrating The Extensions
-
address@hidden @command{gawk}, address@hidden adding code to
-Now that the code is written, it must be possible to add it at
-runtime to the running @command{gawk} interpreter.  First, the
-code must be compiled.  Assuming that the functions are in
-a file named @file{filefuncs.c}, and @var{idir} is the location
-of the @file{gawkapi.h} header file,
-the following steps@footnote{In practice, you would probably want to
-use the GNU Autotools---Automake, Autoconf, Libtool, and Gettext---to
-configure and build your libraries. Instructions for doing so are beyond
-the scope of this @value{DOCUMENT}. @xref{gawkextlib}, for WWW links to
-the tools.} create a GNU/Linux shared library:
-
address@hidden
-$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
-$ @kbd{ld -o filefuncs.so -shared filefuncs.o -lc}
address@hidden example
-
-Once the library exists, it is loaded by using the @code{@@load} keyword.
-
address@hidden
-# file testff.awk
-@@load "filefuncs"
-
-BEGIN @{
-    "pwd" | getline curdir  # save current directory
-    close("pwd")
-
-    chdir("/tmp")
-    system("pwd")   # test it
-    chdir(curdir)   # go back
-
-    print "Info for testff.awk"
-    ret = stat("testff.awk", data)
-    print "ret =", ret
-    for (i in data)
-        printf "data[\"%s\"] = %s\n", i, data[i]
-    print "testff.awk modified:",
-        strftime("%m %d %y %H:%M:%S", data["mtime"])
-
-    print "\nInfo for JUNK"
-    ret = stat("JUNK", data)
-    print "ret =", ret
-    for (i in data)
-        printf "data[\"%s\"] = %s\n", i, data[i]
-    print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
address@hidden
address@hidden example
-
-The @env{AWKLIBPATH} environment variable tells
-@command{gawk} where to find shared libraries (@pxref{Finding Extensions}).
-We set it to the current directory and run the program:
-
address@hidden
-$ @kbd{AWKLIBPATH=$PWD gawk -f testff.awk}
address@hidden /tmp
address@hidden Info for testff.awk
address@hidden ret = 0
address@hidden data["blksize"] = 4096
address@hidden data["mtime"] = 1350838628
address@hidden data["mode"] = 33204
address@hidden data["type"] = file
address@hidden data["dev"] = 2053
address@hidden data["gid"] = 1000
address@hidden data["ino"] = 1719496
address@hidden data["ctime"] = 1350838628
address@hidden data["blocks"] = 8
address@hidden data["nlink"] = 1
address@hidden data["name"] = testff.awk
address@hidden data["atime"] = 1350838632
address@hidden data["pmode"] = -rw-rw-r--
address@hidden data["size"] = 662
address@hidden data["uid"] = 1000
address@hidden testff.awk modified: 10 21 12 18:57:08
address@hidden 
address@hidden Info for JUNK
address@hidden ret = -1
address@hidden JUNK modified: 01 01 70 02:00:00
address@hidden example
-
address@hidden Extension Samples
address@hidden The Sample Extensions In The @command{gawk} Distribution
-
-This @value{SECTION} provides brief overviews of the sample extensions
-that come in the @command{gawk} distribution. Some of them are intended
-for production use, such as the @code{filefuncs} and @code{readdir} extensions.
-Others mainly provide example code that shows how to use the extension API.
-
address@hidden
-* Extension Sample File Functions::   The file functions sample.
-* Extension Sample Fnmatch::          An interface to @code{fnmatch()}.
-* Extension Sample Fork::             An interface to @code{fork()} and other
-                                      process functions.
-* Extension Sample Ord::              Character to value to character
-                                      conversions.
-* Extension Sample Readdir::          An interface to @code{readdir()}.
-* Extension Sample Revout::           Reversing output sample output wrapper.
-* Extension Sample Rev2way::          Reversing data sample two-way processor.
-* Extension Sample Read write array:: Serializing an array to a file.
-* Extension Sample Readfile::         Reading an entire file into a string.
-* Extension Sample API Tests::        Tests for the API.
-* Extension Sample Time::             An interface to @code{gettimeofday()}
-                                      and @code{sleep()}.
address@hidden menu
-
address@hidden Extension Sample File Functions
address@hidden File Related Functions
-
-The @code{filefuncs} extension provides three different functions.
-The usage is as follows:
-
address@hidden @code
address@hidden @@load "filefuncs"
-This is how you load the extension.
-
address@hidden result = chdir("/some/directory")
-The @code{chdir()} function is a direct hook to the @code{chdir()}
-system call to change the current directory.  It returns zero
-upon success or less than zero upon error.  In the latter case it updates
-@code{ERRNO}.
-
address@hidden result = stat("/some/path", statdata)
-The @code{stat()} function provides a hook into the
-@code{stat()} system call. In fact, it uses @code{lstat()}.
-It returns zero upon success or less than zero upon error.
-In the latter case it updates @code{ERRNO}.
-
-In all cases, it clears the @code{statdata} array.
-When the call is successful, @code{stat()} fills the @code{statdata}
-array with information retrieved from the filesystem, as follows:
-
address@hidden nested table
address@hidden @columnfractions .25 .60
address@hidden @code{statdata["name"]} @tab
-The name of the file.
-
address@hidden @code{statdata["dev"]} @tab
-Corresponds to the @code{st_dev} field in the @code{struct stat}.
-
address@hidden @code{statdata["ino"]} @tab
-Corresponds to the @code{st_ino} field in the @code{struct stat}.
-
address@hidden @code{statdata["mode"]} @tab
-Corresponds to the @code{st_mode} field in the @code{struct stat}.
-
address@hidden @code{statdata["nlink"]} @tab
-Corresponds to the @code{st_nlink} field in the @code{struct stat}.
-
address@hidden @code{statdata["uid"]} @tab
-Corresponds to the @code{st_uid} field in the @code{struct stat}.
-
address@hidden @code{statdata["gid"]} @tab
-Corresponds to the @code{st_gid} field in the @code{struct stat}.
-
address@hidden @code{statdata["size"]} @tab
-Corresponds to the @code{st_size} field in the @code{struct stat}.
-
address@hidden @code{statdata["atime"]} @tab
-Corresponds to the @code{st_atime} field in the @code{struct stat}.
-
address@hidden @code{statdata["mtime"]} @tab
-Corresponds to the @code{st_mtime} field in the @code{struct stat}.
-
address@hidden @code{statdata["ctime"]} @tab
-Corresponds to the @code{st_ctime} field in the @code{struct stat}.
-
address@hidden @code{statdata["rdev"]} @tab
-Corresponds to the @code{st_rdev} field in the @code{struct stat}.
-This element is only present for device files.
-
address@hidden @code{statdata["major"]} @tab
-The major device number, derived from the @code{st_rdev} field in the @code{struct stat}.
-This element is only present for device files.
-
address@hidden @code{statdata["minor"]} @tab
-The minor device number, derived from the @code{st_rdev} field in the @code{struct stat}.
-This element is only present for device files.
-
address@hidden @code{statdata["blksize"]} @tab
-Corresponds to the @code{st_blksize} field in the @code{struct stat},
-if this field is present on your system.
-(It is present on all modern systems that we know of.)
-
address@hidden @code{statdata["pmode"]} @tab
-A human-readable version of the mode value, such as printed by
address@hidden  For example, @code{"-rwxr-xr-x"}.
-
address@hidden @code{statdata["linkval"]} @tab
-If the named file is a symbolic link, this element will exist
-and its value is the value of the symbolic link (where the
-symbolic link points to).
-
address@hidden @code{statdata["type"]} @tab
-The type of the file as a string. One of
address@hidden"file"},
address@hidden"blockdev"},
address@hidden"chardev"},
address@hidden"directory"},
address@hidden"socket"},
address@hidden"fifo"},
address@hidden"symlink"},
address@hidden"door"},
-or
address@hidden"unknown"}.
-Not all systems support all file types.
address@hidden multitable
-
address@hidden flags = or(FTS_PHYSICAL, ...)
address@hidden result = fts(pathlist, flags, filedata)
-Walk the file trees provided in @code{pathlist} and fill in the
-@code{filedata} array as described below.  @code{flags} is the bitwise
-OR of several predefined constant values, also as described below.
-Return zero if there were no errors, otherwise return @minus{}1.
address@hidden table
-
-The @code{fts()} function provides a hook to the C library @code{fts()}
-routines for traversing file hierarchies.  Instead of returning data
-about one file at a time in a stream, it fills in a multi-dimensional
-array with data about each file and directory encountered in the requested
-hierarchies.
-
-The arguments are as follows:
-
address@hidden @code
address@hidden pathlist
-An array of filenames.  The element values are used; the index values are ignored.
-
address@hidden flags
-This should be the bitwise OR of one or more of the following
-predefined constant flag values.  At least one of
address@hidden or @code{FTS_PHYSICAL} must be provided; otherwise
address@hidden()} returns an error value and sets @code{ERRNO}.
-The flags are:
-
address@hidden nested table
address@hidden @code
address@hidden FTS_LOGICAL
-Do a ``logical'' file traversal, where the information returned for
-a symbolic link refers to the linked-to file, and not to the symbolic
-link itself.  This flag is mutually exclusive with @code{FTS_PHYSICAL}.
-
address@hidden FTS_PHYSICAL
-Do a ``physical'' file traversal, where the information returned for a
-symbolic link refers to the symbolic link itself.  This flag is mutually
-exclusive with @code{FTS_LOGICAL}.
-
address@hidden FTS_NOCHDIR
-As a performance optimization, the C library @code{fts()} routines
-change directory as they traverse a file hierarchy.  This flag disables
-that optimization.
-
address@hidden FTS_COMFOLLOW
-Immediately follow a symbolic link named in @code{pathlist},
-whether or not @code{FTS_LOGICAL} is set.
-
address@hidden FTS_SEEDOT
-By default, the @code{fts()} routines do not return entries for @file{.}
-and @file{..}.  This option causes entries for @file{..} to also
-be included.  (The extension always includes an entry for @file{.},
-see below.)
-
address@hidden FTS_XDEV
-During a traversal, do not cross onto a different mounted filesystem.
address@hidden table
-
address@hidden filedata
-The @code{filedata} array is first cleared.  Then, @code{fts()} creates
-an element in @code{filedata} for every element in @code{pathlist}.
-The index is the name of the directory or file given in @code{pathlist}.
-The element for this index is itself an array.  There are two cases.
-
address@hidden nested table
address@hidden @emph
address@hidden The path is a file.
-In this case, the array contains two or three elements:
-
address@hidden doubly nested table
address@hidden @code
address@hidden "path"
-The full path to this file, starting from the ``root'' that was given
-in the @code{pathlist} array.
-
address@hidden "stat"
-This element is itself an array, containing the same information as provided
-by the @code{stat()} function described earlier for its
address@hidden argument.  The element may not be present if
-the @code{stat()} system call for the file failed.
-
address@hidden "error"
-If some kind of error was encountered, the array will also
-contain an element named @code{"error"}, which is a string describing the error.
address@hidden table
-
address@hidden The path is a directory.
-In this case, the array contains one element for each entry in the
-directory.  If an entry is a file, that element is as for files, just
-described.  If the entry is a directory, that element is (recursively)
-an array describing the subdirectory.  If @code{FTS_SEEDOT} was provided
-in the flags, then there will also be an element named @code{".."}.  This
-element will be an array containing the data as provided by @code{stat()}.
-
-In addition, there will be an element whose index is @code{"."}.
-This element is an array containing the same two or three elements as
-for a file: @code{"path"}, @code{"stat"}, and @code{"error"}.
address@hidden table
address@hidden table
-
-The @code{fts()} function returns zero if there were no errors.
-Otherwise it returns @minus{}1.
-
address@hidden NOTE
-The @code{fts()} extension does not exactly mimic the
-interface of the C library @code{fts()} routines, choosing instead to
-provide an interface that is based on associative arrays, which should
-be more comfortable to use from an @command{awk} program.  This includes the
-lack of a comparison function, since @command{gawk} already provides
-powerful array sorting facilities.  While an @code{fts_read()}-like
-interface could have been provided, this felt less natural than simply
-creating a multi-dimensional array to represent the file hierarchy and
-its information.
address@hidden quotation
-
-See @file{test/fts.awk} in the @command{gawk} distribution for an example.
-
address@hidden Extension Sample Fnmatch
address@hidden Interface To @code{fnmatch()}
-
-This extension provides an interface to the C library
address@hidden()} function.  The usage is:
-
address@hidden
-@@load "fnmatch"
-
-result = fnmatch(pattern, string, flags)
address@hidden example
-
-The @code{fnmatch} extension adds a single function named
address@hidden()}, one constant (@code{FNM_NOMATCH}), and an array of
-flag values named @code{FNM}.
-
-The arguments to @code{fnmatch()} are:
-
address@hidden @code
address@hidden pattern
-The filename wildcard to match.
-
address@hidden string
-The filename string to match.
-
address@hidden flag
-Either zero, or the bitwise OR of one or more of the
-flags in the @code{FNM} array.
address@hidden table
-
-The return value is zero on success, @code{FNM_NOMATCH}
-if the string did not match the pattern, or
-a different non-zero value if an error occurred.
-
-The flags are as follows:
-
address@hidden @columnfractions .25 .75
address@hidden @code{FNM["CASEFOLD"]} @tab
-Corresponds to the @code{FNM_CASEFOLD} flag as defined in @code{fnmatch()}.
-
address@hidden @code{FNM["FILE_NAME"]} @tab
-Corresponds to the @code{FNM_FILE_NAME} flag as defined in @code{fnmatch()}.
-
address@hidden @code{FNM["LEADING_DIR"]} @tab
-Corresponds to the @code{FNM_LEADING_DIR} flag as defined in @code{fnmatch()}.
-
address@hidden @code{FNM["NOESCAPE"]} @tab
-Corresponds to the @code{FNM_NOESCAPE} flag as defined in @code{fnmatch()}.
-
address@hidden @code{FNM["PATHNAME"]} @tab
-Corresponds to the @code{FNM_PATHNAME} flag as defined in @code{fnmatch()}.
-
address@hidden @code{FNM["PERIOD"]} @tab
-Corresponds to the @code{FNM_PERIOD} flag as defined in @code{fnmatch()}.
address@hidden multitable
-
-Here is an example:
-
address@hidden
-@@load "fnmatch"
address@hidden
-flags = or(FNM["PERIOD"], FNM["NOESCAPE"])
-if (fnmatch("*.a", "foo.c", flags) == FNM_NOMATCH)
-    print "no match"
address@hidden example
-
address@hidden Extension Sample Fork
address@hidden Interface To @code{fork()}, @code{wait()} and @code{waitpid()}
-
-The @code{fork} extension adds three functions, as follows.
-
address@hidden @code
address@hidden @@load "fork"
-This is how you load the extension.
-
address@hidden pid = fork()
-This function creates a new process. The return value is zero in the
-child and the process-id number of the child in the parent, or @minus{}1
-upon error. In the latter case, @code{ERRNO} indicates the problem.
-In the child, @code{PROCINFO["pid"]} and @code{PROCINFO["ppid"]} are
-updated to reflect the correct values.
-
address@hidden ret = waitpid(pid)
-This function takes a numeric argument, which is the process-id to
-wait for. The return value is that of the
address@hidden()} system call.
-
address@hidden ret = wait()
-This function waits for the first child to die.
-The return value is that of the
address@hidden()} system call.
address@hidden table
-
-There is no corresponding @code{exec()} function.
-
-Here is an example:
-
address@hidden
-@@load "fork"
address@hidden
-if ((pid = fork()) == 0)
-    print "hello from the child"
-else
-    print "hello from the parent"
address@hidden example
-
address@hidden Extension Sample Ord
address@hidden Character and Numeric values: @code{ord()} and @code{chr()}
-
-The @code{ordchr} extension adds two functions, named
address@hidden()} and @code{chr()}, as follows.
-
address@hidden @code
address@hidden number = ord(string)
-Return the numeric value of the first character in @code{string}.
-
address@hidden char = chr(number)
-Return the string whose first character is that represented by @code{number}.
address@hidden table
-
-These functions are inspired by the Pascal language functions
-of the same name.  Here is an example:
-
address@hidden
-@@load "ordchr"
address@hidden
-printf("The numeric value of 'A' is %d\n", ord("A"))
-printf("The string value of 65 is %s\n", chr(65))
address@hidden example
-
address@hidden Extension Sample Readdir
address@hidden Reading Directories
-
-The @code{readdir} extension adds an input parser for directories, and
-adds a single function named @code{readdir_do_ftype()}.
-The usage is as follows:
-
address@hidden
-@@load "readdir"
-
-readdir_do_ftype("stat")    # or "dirent" or "never"
address@hidden example
-
-When this extension is in use, instead of skipping directories named
-on the command line (or with @code{getline}),
-they are read, with each entry returned as a record.
-
-The record consists of at least two fields: the inode number and the
-filename, separated by a forward slash character.
-On systems where the directory entry contains the file type, the record
-has a third field which is a single letter indicating the type of the
-file:
-
address@hidden @columnfractions .1 .9
address@hidden Letter @tab File Type
address@hidden @code{b} @tab Block device
address@hidden @code{c} @tab Character device
address@hidden @code{d} @tab Directory
address@hidden @code{f} @tab Regular file
address@hidden @code{l} @tab Symbolic link
address@hidden @code{p} @tab Named pipe (FIFO)
address@hidden @code{s} @tab Socket
address@hidden @code{u} @tab Anything else (unknown)
address@hidden multitable
-
-On systems without the file type information, calling
address@hidden("stat")} causes the extension to use the
address@hidden()} system call to retrieve the appropriate information. This
-is not the default, since @code{lstat()} is a potentially expensive
-operation.  By calling @samp{readdir_do_ftype("never")} one can ensure
-that the file type information is never displayed, even when readily
-available in the directory entry.
-
-The third option, @samp{readdir_do_ftype("dirent")}, takes file type
-information from the directory entry, if it is available.  This is the
-default on systems that supply this information.
-
-The @code{readdir_do_ftype()} function sets @code{ERRNO} if called
-without arguments or with invalid arguments.
-
address@hidden NOTE
-On GNU/Linux systems, there are filesystems that don't support the
address@hidden entry (see the @i{readdir}(3) manual page), and so the file
-type is always @samp{u}.  Therefore, using @samp{readdir_do_ftype("stat")}
-is advisable even on GNU/Linux systems.  In this case, the @code{readdir}
-extension falls back to using @code{lstat()} when it encounters an
-unknown file type.
address@hidden quotation
-
-Here is an example:
-
address@hidden
-@@load "readdir"
address@hidden
-BEGIN @{ FS = "/" @}
address@hidden print "file name is", $2 @}
address@hidden example
-
address@hidden Extension Sample Revout
address@hidden Reversing Output
-
-The @code{revoutput} extension adds a simple output wrapper that reverses
-the characters in each output line.  Its main purpose is to show how to
-write an output wrapper, although it may be mildly amusing for the unwary.
-Here is an example:
-
address@hidden
-@@load "revoutput"
-
-BEGIN @{
-    REVOUT = 1
-    print "hello, world" > "/dev/stdout"
address@hidden
address@hidden example
-
-The output from this program is:
address@hidden ,olleh}.
-
address@hidden Extension Sample Rev2way
address@hidden Two-Way I/O Example
-
-The @code{revtwoway} extension adds a simple two-way processor that
-reverses the characters in each line sent to it for reading back by
-the @command{awk} program.  Its main purpose is to show how to write
-a two-way processor, although it may also be mildly amusing.
-The following example shows how to use it:
-
address@hidden
-@@load "revtwoway"
-
-BEGIN @{
-    cmd = "/magic/mirror"
-    print "hello, world" |& cmd
-    cmd |& getline result
-    print result
-    close(cmd)
address@hidden
address@hidden example
-
address@hidden Extension Sample Read write array
address@hidden Dumping and Restoring An Array
-
-The @code{rwarray} extension adds two functions,
-named @code{writea()} and @code{reada()}, as follows:
-
address@hidden @code
address@hidden ret = writea(file, array)
-This function takes a string argument, which is the name of the file
-to which to dump the array, and the array itself as the second argument.
address@hidden()} understands multidimensional arrays.  It returns one on
-success, or zero upon failure.
-
address@hidden ret = reada(file, array)
address@hidden()} is the inverse of @code{writea()};
-it reads the file named as its first argument, filling in
-the array named as the second argument. It clears the array first.
-Here too, the return value is one on success and zero upon failure.
address@hidden table
-
-The array created by @code{reada()} is identical to that written by
address@hidden()} in the sense that the contents are the same. However,
-due to implementation issues, the array traversal order of the recreated
-array is likely to be different from that of the original array.  As array
-traversal order in @command{awk} is by default undefined, this is not
-(technically) a problem.  If you need to guarantee a particular traversal
-order, use the array sorting features in @command{gawk} to do so
-(@pxref{Array Sorting}).
-
-The file contains binary data.  All integral values are written in network
-byte order.  However, double precision floating-point values are written
-as native binary data.  Thus, arrays containing only string data can
-theoretically be dumped on systems with one byte order and restored on
-systems with a different one, but this has not been tried.
-
-Here is an example:
-
address@hidden
-@@load "rwarray"
address@hidden
-ret = writea("arraydump.bin", array)
address@hidden
-ret = reada("arraydump.bin", array)
address@hidden example
-
address@hidden Extension Sample Readfile
address@hidden Reading An Entire File
-
-The @code{readfile} extension adds a single function
-named @code{readfile()}:
-
address@hidden @code
address@hidden result = readfile("/some/path")
-The argument is the name of the file to read.  The return value is a
-string containing the entire contents of the requested file.  Upon error,
-the function returns the empty string and sets @code{ERRNO}.
address@hidden table
-
-Here is an example:
-
address@hidden
-@@load "readfile"
address@hidden
-contents = readfile("/path/to/file");
-if (contents == "" && ERRNO != "") @{
-    print("problem reading file", ERRNO) > "/dev/stderr"
-    ...
address@hidden
address@hidden example
-
address@hidden Extension Sample API Tests
address@hidden API Tests
-
-The @code{testext} extension exercises parts of the extension API that
-are not tested by the other samples.  The @file{extension/testext.c}
-file contains both the C code for the extension and @command{awk}
-test code inside C comments that run the tests. The testing framework
-extracts the @command{awk} code and runs the tests.  See the source file
-for more information.
-
address@hidden Extension Sample Time
address@hidden Extension Time Functions
-
address@hidden time
address@hidden sleep
-
-These functions can be used by either invoking @command{gawk}
-with a command-line argument of @samp{-l time} or by
-inserting @samp{@@load "time"} in your script.
-
address@hidden @code
-
address@hidden @code{gettimeofday} time extension function
address@hidden the_time = gettimeofday()
-Return the time in seconds that has elapsed since 1970-01-01 UTC as a
-floating point value.  If the time is unavailable on this platform, return
address@hidden and set @code{ERRNO}.  The returned time should have sub-second
-precision, but the actual precision will vary based on the platform.
-If the standard C @code{gettimeofday()} system call is available on this
-platform, then it simply returns the value.  Otherwise, if on Windows,
-it tries to use @code{GetSystemTimeAsFileTime()}.
-
address@hidden @code{sleep} time extension function
address@hidden result = sleep(@var{seconds})
-Attempt to sleep for @var{seconds} seconds.  If @var{seconds} is negative,
-or the attempt to sleep fails, return @minus{}1 and set @code{ERRNO}.
-Otherwise, return zero after sleeping for the indicated amount of time.
-Note that @var{seconds} may be a floating-point (non-integral) value.
-Implementation details: depending on platform availability, this function
-tries to use @code{nanosleep()} or @code{select()} to implement the delay.
address@hidden table
-
address@hidden gawkextlib
address@hidden The @code{gawkextlib} Project
-
-The @uref{http://sourceforge.net/projects/gawkextlib/, @code{gawkextlib}}
-project provides a number of @command{gawk} extensions, including one for
-processing XML files.  This is the evolution of the original @command{xgawk}
-(XML @command{gawk}) project.
-
-As of this writing, there are four extensions:
-
address@hidden @bullet
address@hidden
-XML parser extension, using the @uref{http://expat.sourceforge.net, Expat}
-XML parsing library.
-
address@hidden
-Postgres SQL extension.
-
address@hidden
-GD graphics library extension.
-
address@hidden
-MPFR library extension.
-This provides access to a number of MPFR functions which @command{gawk}'s
-native MPFR support does not.
address@hidden itemize
-
-The @code{time} extension described earlier (@pxref{Extension Sample
-Time}) was originally from this project but has been moved into the
-main @command{gawk} distribution.
-
-You can check out the code for the @code{gawkextlib} project
-using the @uref{http://git-scm.com, GIT} distributed source
-code control system.  The command is as follows:
-
address@hidden
-git clone git://git.code.sf.net/p/gawkextlib/code gawkextlib-code
address@hidden example
-
-You will need to have the @uref{http://expat.sourceforge.net, Expat}
-XML parser library installed in order to build and use the XML extension.
-
-In addition, you must have the GNU Autotools installed
-(@uref{http://www.gnu.org/software/autoconf, Autoconf},
address@hidden://www.gnu.org/software/automake, Automake},
address@hidden://www.gnu.org/software/libtool, Libtool},
-and
address@hidden://www.gnu.org/software/gettext, Gettext}).
-
-The simple recipe for building and testing @code{gawkextlib} is as follows.
-First, build and install @command{gawk}:
-
address@hidden
-cd .../path/to/gawk/code
-./configure --prefix=/tmp/newgawk     @ii{Install in /tmp/newgawk for now}
-make && make check                    @ii{Build and check that all is OK}
-make install                          @ii{Install gawk}
address@hidden example
-
-Next, build @code{gawkextlib} and test it:
-
address@hidden
-cd .../path/to/gawkextlib-code
-./update-autotools                    @ii{Generate configure, etc.}
-                                      @ii{You may have to run this command twice}
-./configure --with-gawk=/tmp/newgawk  @ii{Configure, point at ``installed'' gawk}
-make && make check                    @ii{Build and check that all is OK}
address@hidden example
-
-If you write an extension that you wish to share with other
address@hidden users, please consider doing so through the
address@hidden project.
-
address@hidden Fake Chapter
address@hidden Fake Sections For Cross References
-
address@hidden
-* Reference to Elements::       Referring to an Array Element.
-* Built-in::                    Built-in Functions.
-* Built-in Variables::          Built-in Variables.
-* Options::                     Command-Line Options.
-* AWKLIBPATH Variable::         The @env{AWKLIBPATH} Environment Variable.
-* BEGINFILE/ENDFILE::           The @code{BEGINFILE} and @code{ENDFILE} Special Patterns.
-* Redirection::                 Redirecting Output of @code{print} and @code{printf}.
-* Arrays::                      Arrays in @command{awk}.
-* Conversion::                  Conversion of Strings and Numbers.
-* Delete::                      The @code{delete} Statement.
-* String Functions::            String-Manipulation Functions.
-* Glossary::                    Glossary.
-* Copying::                     GNU General Public License.
-* Reading Files::               Reading Input Files.
-* Time Functions::              Time Functions.
-* Array Sorting::               Controlling Array Traversal and Array Sorting.
address@hidden menu
-
address@hidden Reference to Elements
address@hidden Referring to an Array Element
-
address@hidden Built-in
address@hidden Built-in Functions
-
address@hidden Built-in Variables
address@hidden Built-in Variables
-
address@hidden Options
address@hidden Command-Line Options
-
address@hidden AWKLIBPATH Variable
address@hidden The @env{AWKLIBPATH} Environment Variable
-
address@hidden BEGINFILE/ENDFILE
address@hidden The @code{BEGINFILE} and @code{ENDFILE} Special Patterns
-
address@hidden Redirection
address@hidden Redirecting Output of @code{print} and @code{printf}
-
address@hidden Arrays
address@hidden Arrays in @command{awk}
-
address@hidden Conversion
address@hidden Conversion of Strings and Numbers
-
address@hidden Delete
address@hidden The @code{delete} Statement
-
address@hidden String Functions
address@hidden String-Manipulation Functions
-
address@hidden Glossary
address@hidden Glossary
-
address@hidden Copying
address@hidden GNU General Public License
-
address@hidden Reading Files
address@hidden Reading Input Files
-
address@hidden Time Functions
address@hidden Time Functions
-
address@hidden Array Sorting
address@hidden Controlling Array Traversal and Array Sorting
-
address@hidden
-shold

http://git.sv.gnu.org/cgit/gawk.git/commit/?id=d5cc356948eb6d3ed024b1addad6daccb809448b

commit d5cc356948eb6d3ed024b1addad6daccb809448b
Author: Arnold D. Robbins <address@hidden>
Date:   Tue Nov 6 21:39:12 2012 +0200

    Add parts to gawk.texi, rearrange chapters.

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 61cce0f..4616137 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2012-11-06         Arnold D. Robbins     <address@hidden>
+
+       * gawk.texi: Rearrange chapter order and separate into parts
+       using @part for TeX.
+
 2012-11-05         Arnold D. Robbins     <address@hidden>
 
        * gawk.texi: Semi-rationalize invocations of @image.
diff --git a/doc/gawk.info b/doc/gawk.info
index f093d4f..cc0c259 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -87,13 +87,13 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 * Arrays::                         The description and use of arrays. Also
                                    includes array-oriented control statements.
 * Functions::                      Built-in and user-defined functions.
+* Library Functions::              A Library of `awk' Functions.
+* Sample Programs::                Many `awk' programs with complete
+                                   explanations.
 * Internationalization::           Getting `gawk' to speak your
                                    language.
 * Advanced Features::              Stuff for advanced users, specific to
                                    `gawk'.
-* Library Functions::              A Library of `awk' Functions.
-* Sample Programs::                Many `awk' programs with complete
-                                   explanations.
 * Debugger::                       The `gawk' debugger.
 * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
                                    `gawk'.
@@ -385,28 +385,6 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
                                         runtime.
 * Indirect Calls::                      Choosing the function to call at
                                         runtime.
-* I18N and L10N::                       Internationalization and Localization.
-* Explaining gettext::                  How GNU `gettext' works.
-* Programmer i18n::                     Features for the programmer.
-* Translator i18n::                     Features for the translator.
-* String Extraction::                   Extracting marked strings.
-* Printf Ordering::                     Rearranging `printf' arguments.
-* I18N Portability::                    `awk'-level portability
-                                        issues.
-* I18N Example::                        A simple i18n example.
-* Gawk I18N::                           `gawk' is also
-                                        internationalized.
-* Nondecimal Data::                     Allowing nondecimal input data.
-* Array Sorting::                       Facilities for controlling array
-                                        traversal and sorting arrays.
-* Controlling Array Traversal::         How to use PROCINFO["sorted_in"].
-* Array Sorting Functions::             How to use `asort()' and
-                                        `asorti()'.
-* Two-way I/O::                         Two-way communications with another
-                                        process.
-* TCP/IP Networking::                   Using `gawk' for network
-                                        programming.
-* Profiling::                           Profiling your `awk' programs.
 * Library Names::                       How to best name private global
                                         variables in library functions.
 * General Functions::                   Functions that are of general use.
@@ -468,6 +446,28 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 * Anagram Program::                     Finding anagrams from a dictionary.
 * Signature Program::                   People do amazing things with too much
                                         time on their hands.
+* I18N and L10N::                       Internationalization and Localization.
+* Explaining gettext::                  How GNU `gettext' works.
+* Programmer i18n::                     Features for the programmer.
+* Translator i18n::                     Features for the translator.
+* String Extraction::                   Extracting marked strings.
+* Printf Ordering::                     Rearranging `printf' arguments.
+* I18N Portability::                    `awk'-level portability
+                                        issues.
+* I18N Example::                        A simple i18n example.
+* Gawk I18N::                           `gawk' is also
+                                        internationalized.
+* Nondecimal Data::                     Allowing nondecimal input data.
+* Array Sorting::                       Facilities for controlling array
+                                        traversal and sorting arrays.
+* Controlling Array Traversal::         How to use PROCINFO["sorted_in"].
+* Array Sorting Functions::             How to use `asort()' and
+                                        `asorti()'.
+* Two-way I/O::                         Two-way communications with another
+                                        process.
+* TCP/IP Networking::                   Using `gawk' for network
+                                        programming.
+* Profiling::                           Profiling your `awk' programs.
 * Debugging::                           Introduction to `gawk'
                                         debugger.
 * Debugging Concepts::                  Debugging in General.
@@ -941,6 +941,12 @@ expert should find useful.  In particular, the description of POSIX
 `awk' and the example programs in *note Library Functions::, and in
 *note Sample Programs::, should be of interest.
 
+   This Info file is split into several parts, as follows:
+
+   Part I describes the `awk' language and `gawk' program in detail.
+It starts with the basics, and continues through all of the features of
+`awk'.  It contains the following chapters:
+
    *note Getting Started::, provides the essentials you need to know to
 begin using `awk'.
 
@@ -973,6 +979,21 @@ described, as well as sorting arrays in `gawk'.  It also describes how
    *note Functions::, describes the built-in functions `awk' and `gawk'
 provide, as well as how to define your own functions.
 
+   Part II shows how to use `awk' and `gawk' for problem solving.
+There is lots of code here for you to read and learn from.  It contains
+the following chapters:
+
+   *note Library Functions::, which provides a number of functions
+meant to be used from main `awk' programs.
+
+   *note Sample Programs::, which provides many sample `awk' programs.
+
+   Reading these two chapters allows you to see `awk' solving real
+problems.
+
+   Part III focuses on features specific to `gawk'.  It contains the
+following chapters:
+
    *note Internationalization::, describes special features in `gawk'
 for translating program messages into different languages at runtime.
 
@@ -981,10 +1002,6 @@ advanced features.  Of particular note are the abilities to have
 two-way communications with another process, perform TCP/IP networking,
 and profile your `awk' programs.
 
-   *note Library Functions::, and *note Sample Programs::, provide many
-sample `awk' programs.  Reading them allows you to see `awk' solving
-real problems.
-
    *note Debugger::, describes the `awk' debugger.
 
    *note Arbitrary Precision Arithmetic::, describes advanced
@@ -993,6 +1010,10 @@ arithmetic facilities provided by `gawk'.
    *note Dynamic Extensions::, describes how to add new variables and
 functions to `gawk' by writing extensions in C.
 
+   Part IV provides the appendices, the Glossary, and two licenses that
+cover the `gawk' source code and this Info file, respectively.  It
+contains the following appendices:
+
    *note Language History::, describes how the `awk' language has
 evolved since its first release to present.  It also describes how
 `gawk' has acquired features over time.
@@ -10928,7 +10949,7 @@ by creating an arbitrary index:
      -| a
 
 
-File: gawk.info,  Node: Functions,  Next: Internationalization,  Prev: Arrays,  Up: Top
+File: gawk.info,  Node: Functions,  Next: Library Functions,  Prev: Arrays,  Up: Top
 
 9 Functions
 ***********
@@ -13374,5809 +13395,5809 @@ example, in the following case:
 `gawk' will look up the actual function to call only once.
 
 
-File: gawk.info,  Node: Internationalization,  Next: Advanced Features,  Prev: Functions,  Up: Top
+File: gawk.info,  Node: Library Functions,  Next: Sample Programs,  Prev: Functions,  Up: Top
 
-10 Internationalization with `gawk'
-***********************************
+10 A Library of `awk' Functions
+*******************************
 
-Once upon a time, computer makers wrote software that worked only in
-English.  Eventually, hardware and software vendors noticed that if
-their systems worked in the native languages of non-English-speaking
-countries, they were able to sell more systems.  As a result,
-internationalization and localization of programs and software systems
-became a common practice.
+*note User-defined::, describes how to write your own `awk' functions.
+Writing functions is important, because it allows you to encapsulate
+algorithms and program tasks in a single place.  It simplifies
+programming, making program development more manageable, and making
+programs more readable.
 
-   For many years, the ability to provide internationalization was
-largely restricted to programs written in C and C++.  This major node
-describes the underlying library `gawk' uses for internationalization,
-as well as how `gawk' makes internationalization features available at
-the `awk' program level.  Having internationalization available at the
-`awk' level gives software developers additional flexibility--they are
-no longer forced to write in C or C++ when internationalization is a
-requirement.
+   One valuable way to learn a new programming language is to _read_
+programs in that language.  To that end, this major node and *note
+Sample Programs::, provide a good-sized body of code for you to read,
+and hopefully, to learn from.
 
-* Menu:
+   This major node presents a library of useful `awk' functions.  Many
+of the sample programs presented later in this Info file use these
+functions.  The functions are presented here in a progression from
+simple to complex.
 
-* I18N and L10N::               Internationalization and Localization.
-* Explaining gettext::          How GNU `gettext' works.
-* Programmer i18n::             Features for the programmer.
-* Translator i18n::             Features for the translator.
-* I18N Example::                A simple i18n example.
-* Gawk I18N::                   `gawk' is also internationalized.
+   *note Extract Program::, presents a program that you can use to
+extract the source code for these example library functions and
+programs from the Texinfo source for this Info file.  (This has already
+been done as part of the `gawk' distribution.)
 
-
-File: gawk.info,  Node: I18N and L10N,  Next: Explaining gettext,  Up: Internationalization
+   If you have written one or more useful, general-purpose `awk'
+functions and would like to contribute them to the `awk' user
+community, see *note How To Contribute::, for more information.
 
-10.1 Internationalization and Localization
-==========================================
+   The programs in this major node and in *note Sample Programs::,
+freely use features that are `gawk'-specific.  Rewriting these programs
+for different implementations of `awk' is pretty straightforward.
 
-"Internationalization" means writing (or modifying) a program once, in
-such a way that it can use multiple languages without requiring further
-source-code changes.  "Localization" means providing the data necessary
-for an internationalized program to work in a particular language.
-Most typically, these terms refer to features such as the language used
-for printing error messages, the language used to read responses, and
-information related to how numerical and monetary values are printed
-and read.
+   * Diagnostic error messages are sent to `/dev/stderr'.  Use `| "cat
+     1>&2"' instead of `> "/dev/stderr"' if your system does not have a
+     `/dev/stderr', or if you cannot use `gawk'.
 
-
-File: gawk.info,  Node: Explaining gettext,  Next: Programmer i18n,  Prev: I18N and L10N,  Up: Internationalization
+   * A number of programs use `nextfile' (*note Nextfile Statement::)
+     to skip any remaining input in the input file.
 
-10.2 GNU `gettext'
-==================
+   * Finally, some of the programs choose to ignore upper- and lowercase
+     distinctions in their input. They do so by assigning one to
+     `IGNORECASE'.  You can achieve almost the same effect(1) by adding
+     the following rule to the beginning of the program:
 
-The facilities in GNU `gettext' focus on messages; strings printed by a
-program, either directly or via formatting with `printf' or
-`sprintf()'.(1)
+          # ignore case
+          { $0 = tolower($0) }
 
-   When using GNU `gettext', each application has its own "text
-domain".  This is a unique name, such as `kpilot' or `gawk', that
-identifies the application.  A complete application may have multiple
-components--programs written in C or C++, as well as scripts written in
-`sh' or `awk'.  All of the components use the same text domain.
+     Also, verify that all regexp and string constants used in
+     comparisons use only lowercase letters.
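
As a minimal sketch of the lowercase-folding workaround described above (the sample input lines and the `/error/' pattern are invented for illustration, not taken from the library), the rule can be tried with any POSIX `awk':

```shell
# Portable stand-in for IGNORECASE: fold each record to lowercase first.
# The three input lines and the /error/ pattern are hypothetical.
out=$(printf 'Error: disk full\nall good\nERROR: retry\n' |
    awk '
        # ignore case
        { $0 = tolower($0) }

        # regexp constants must now be all lowercase
        /error/ { count++ }

        END { print count + 0 }
    ')
echo "$out"
```

Both the `Error' and `ERROR' records are counted, but any record printed after the first rule would appear in lowercase, which is the difference the footnote points out.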
 
-   To make the discussion concrete, assume we're writing an application
-named `guide'.  Internationalization consists of the following steps,
-in this order:
+* Menu:
 
-  1. The programmer goes through the source for all of `guide''s
-     components and marks each string that is a candidate for
-     translation.  For example, `"`-F': option required"' is a good
-     candidate for translation.  A table with strings of option names
-     is not (e.g., `gawk''s `--profile' option should remain the same,
-     no matter what the local language).
+* Library Names::               How to best name private global variables in
+                                library functions.
+* General Functions::           Functions that are of general use.
+* Data File Management::        Functions for managing command-line data
+                                files.
+* Getopt Function::             A function for processing command-line
+                                arguments.
+* Passwd Functions::            Functions for getting user information.
+* Group Functions::             Functions for getting group information.
+* Walking Arrays::              A function to walk arrays of arrays.
 
-  2. The programmer indicates the application's text domain (`"guide"')
-     to the `gettext' library, by calling the `textdomain()' function.
+   ---------- Footnotes ----------
 
-  3. Messages from the application are extracted from the source code
-     and collected into a portable object template file (`guide.pot'),
-     which lists the strings and their translations.  The translations
-     are initially empty.  The original (usually English) messages
-     serve as the key for lookup of the translations.
+   (1) The effects are not identical.  Output of the transformed record
+will be in all lowercase, while `IGNORECASE' preserves the original
+contents of the input record.
 
-  4. For each language with a translator, `guide.pot' is copied to a
-     portable object file (`.po') and translations are created and
-     shipped with the application.  For example, there might be a
-     `fr.po' for a French translation.
+
+File: gawk.info,  Node: Library Names,  Next: General Functions,  Up: Library Functions
 
-  5. Each language's `.po' file is converted into a binary message
-     object (`.mo') file.  A message object file contains the original
-     messages and their translations in a binary format that allows
-     fast lookup of translations at runtime.
+10.1 Naming Library Function Global Variables
+=============================================
 
-  6. When `guide' is built and installed, the binary translation files
-     are installed in a standard place.
+Due to the way the `awk' language evolved, variables are either
+"global" (usable by the entire program) or "local" (usable just by a
+specific function).  There is no intermediate state analogous to
+`static' variables in C.
 
-  7. For testing and development, it is possible to tell `gettext' to
-     use `.mo' files in a different directory than the standard one by
-     using the `bindtextdomain()' function.
+   Library functions often need to have global variables that they can
+use to preserve state information between calls to the function--for
+example, `getopt()''s variable `_opti' (*note Getopt Function::).  Such
+variables are called "private", since the only functions that need to
+use them are the ones in the library.
 
-  8. At runtime, `guide' looks up each string via a call to
-     `gettext()'.  The returned string is the translated string if
-     available, or the original string if not.
+   When writing a library function, you should try to choose names for
+your private variables that will not conflict with any variables used by
+either another library function or a user's main program.  For example,
+a name like `i' or `j' is not a good choice, because user programs
+often use variable names like these for their own purposes.
 
-  9. If necessary, it is possible to access messages from a different
-     text domain than the one belonging to the application, without
-     having to switch the application's default text domain back and
-     forth.
+   The example programs shown in this major node all start the names of
+their private variables with an underscore (`_').  Users generally
+don't use leading underscores in their variable names, so this
+convention immediately decreases the chances that the variable name
+will be accidentally shared with the user's program.
 
-   In C (or C++), the string marking and dynamic translation lookup are
-accomplished by wrapping each string in a call to `gettext()':
+   In addition, several of the library functions use a prefix that helps
+indicate what function or set of functions use the variables--for
+example, `_pw_byname' in the user database routines (*note Passwd
+Functions::).  This convention is recommended, since it even further
+decreases the chance of inadvertent conflict among variable names.
+Note that this convention is used equally well for variable names and
+for private function names.(1)
 
-     printf("%s", gettext("Don't Panic!\n"));
+   As a final note on variable naming, if a function makes global
+variables available for use by a main program, it is a good convention
+to start that variable's name with a capital letter--for example,
+`getopt()''s `Opterr' and `Optind' variables (*note Getopt Function::).
+The leading capital letter indicates that it is global, while the fact
+that the variable name is not all capital letters indicates that the
+variable is not one of `awk''s built-in variables, such as `FS'.
 
-   The tools that extract messages from source code pull out all
-strings enclosed in calls to `gettext()'.
+   It is also important that _all_ variables in library functions that
+do not need to save state are, in fact, declared local.(2) If this is
+not done, the variable could accidentally be used in the user's
+program, leading to bugs that are very difficult to track down:
 
-   The GNU `gettext' developers, recognizing that typing `gettext(...)'
-over and over again is both painful and ugly to look at, use the macro
-`_' (an underscore) to make things easier:
+     function lib_func(x, y,    l1, l2)
+     {
+         ...
+         USE VARIABLE some_var   # some_var should be local
+         ...                     # but is not by oversight
+     }
 
-     /* In the standard header file: */
-     #define _(str) gettext(str)
+   A different convention, common in the Tcl community, is to use a
+single associative array to hold the values needed by the library
+function(s), or "package."  This significantly decreases the number of
+actual global names in use.  For example, the functions described in
+*note Passwd Functions::, might have used array elements
+`PW_data["inited"]', `PW_data["total"]', `PW_data["count"]', and
+`PW_data["awklib"]', instead of `_pw_inited', `_pw_awklib', `_pw_total',
+and `_pw_count'.
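
The single-array convention might look like the following sketch.  The package name `CNT_data' and the functions `cnt_init()' and `cnt_add()' are hypothetical, chosen only to illustrate the idea; they are not part of the library:

```shell
out=$(awk '
    # Hypothetical package: all state lives in the one global array CNT_data,
    # so only a single global name is exposed to the main program.
    function cnt_init()
    {
        if (CNT_data["inited"])
            return
        CNT_data["inited"] = 1
        CNT_data["total"] = 0
    }

    function cnt_add(n)
    {
        cnt_init()
        CNT_data["total"] += n
        return CNT_data["total"]
    }

    BEGIN {
        cnt_add(3)
        cnt_add(4)
        print CNT_data["total"]
    }
')
echo "$out"
```

The trade-off is that every access goes through a string subscript, so a misspelled key silently creates a new element instead of a new variable.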
 
-     /* In the program text: */
-     printf("%s", _("Don't Panic!\n"));
-
-This reduces the typing overhead to just three extra characters per
-string and is considerably easier to read as well.
+   The conventions presented in this minor node are exactly that:
+conventions. You are not required to write your programs this way--we
+merely recommend that you do so.
 
-   There are locale "categories" for different types of locale-related
-information.  The defined locale categories that `gettext' knows about
-are:
+   ---------- Footnotes ----------
 
-`LC_MESSAGES'
-     Text messages.  This is the default category for `gettext'
-     operations, but it is possible to supply a different one
-     explicitly, if necessary.  (It is almost never necessary to supply
-     a different category.)
+   (1) While all the library routines could have been rewritten to use
+this convention, this was not done, in order to show how our own `awk'
+programming style has evolved and to provide some basis for this
+discussion.
 
-`LC_COLLATE'
-     Text-collation information; i.e., how different characters and/or
-     groups of characters sort in a given language.
+   (2) `gawk''s `--dump-variables' command-line option is useful for
+verifying this.
 
-`LC_CTYPE'
-     Character-type information (alphabetic, digit, upper- or
-     lowercase, and so on).  This information is accessed via the POSIX
-     character classes in regular expressions, such as `/[[:alnum:]]/'
-     (*note Regexp Operators::).
+
+File: gawk.info,  Node: General Functions,  Next: Data File Management,  Prev: Library Names,  Up: Library Functions
 
-`LC_MONETARY'
-     Monetary information, such as the currency symbol, and whether the
-     symbol goes before or after a number.
+10.2 General Programming
+========================
 
-`LC_NUMERIC'
-     Numeric information, such as which characters to use for the
-     decimal point and the thousands separator.(2)
+This minor node presents a number of functions that are of general
+programming use.
 
-`LC_RESPONSE'
-     Response information, such as how "yes" and "no" appear in the
-     local language, and possibly other information as well.
+* Menu:
 
-`LC_TIME'
-     Time- and date-related information, such as 12- or 24-hour clock,
-     month printed before or after the day in a date, local month
-     abbreviations, and so on.
+* Strtonum Function::           A replacement for the built-in
+                                `strtonum()' function.
+* Assert Function::             A function for assertions in `awk'
+                                programs.
+* Round Function::              A function for rounding if `sprintf()'
+                                does not do it correctly.
+* Cliff Random Function::       The Cliff Random Number Generator.
+* Ordinal Functions::           Functions for using characters as numbers and
+                                vice versa.
+* Join Function::               A function to join an array into a string.
+* Getlocaltime Function::       A function to get formatted times.
 
-`LC_ALL'
-     All of the above.  (Not too useful in the context of `gettext'.)
+
+File: gawk.info,  Node: Strtonum Function,  Next: Assert Function,  Up: General Functions
 
-   ---------- Footnotes ----------
+10.2.1 Converting Strings To Numbers
+------------------------------------
 
-   (1) For some operating systems, the `gawk' port doesn't support GNU
-`gettext'.  Therefore, these features are not available if you are
-using one of those operating systems. Sorry.
+The `strtonum()' function (*note String Functions::) is a `gawk'
+extension.  The following function provides an implementation for other
+versions of `awk':
 
-   (2) Americans use a comma every three decimal places and a period
-for the decimal point, while many Europeans do exactly the opposite:
-1,234.56 versus 1.234,56.
+     # mystrtonum --- convert string to number
 
-
-File: gawk.info,  Node: Programmer i18n,  Next: Translator i18n,  Prev: Explaining gettext,  Up: Internationalization
+     function mystrtonum(str,        ret, chars, n, i, k, c)
+     {
+         if (str ~ /^0[0-7]*$/) {
+             # octal
+             n = length(str)
+             ret = 0
+             for (i = 1; i <= n; i++) {
+                 c = substr(str, i, 1)
+                 if ((k = index("01234567", c)) > 0)
+                     k-- # adjust for 1-basing in awk
 
-10.3 Internationalizing `awk' Programs
-======================================
+                 ret = ret * 8 + k
+             }
+         } else if (str ~ /^0[xX][[:xdigit:]]+$/) {
+             # hexadecimal
+             str = substr(str, 3)    # lop off leading 0x
+             n = length(str)
+             ret = 0
+             for (i = 1; i <= n; i++) {
+                 c = substr(str, i, 1)
+                 c = tolower(c)
+                 if ((k = index("0123456789", c)) > 0)
+                     k-- # adjust for 1-basing in awk
+                 else if ((k = index("abcdef", c)) > 0)
+                     k += 9
 
-`gawk' provides the following variables and functions for
-internationalization:
+                 ret = ret * 16 + k
+             }
+         } else if (str ~ \
+       /^[-+]?([0-9]+([.][0-9]*([Ee][0-9]+)?)?|([.][0-9]+([Ee][-+]?[0-9]+)?))$/) {
+             # decimal number, possibly floating point
+             ret = str + 0
+         } else
+             ret = "NOT-A-NUMBER"
 
-`TEXTDOMAIN'
-     This variable indicates the application's text domain.  For
-     compatibility with GNU `gettext', the default value is
-     `"messages"'.
+         return ret
+     }
 
-`_"your message here"'
-     String constants marked with a leading underscore are candidates
-     for translation at runtime.  String constants without a leading
-     underscore are not translated.
+     # BEGIN {     # gawk test harness
+     #     a[1] = "25"
+     #     a[2] = ".31"
+     #     a[3] = "0123"
+     #     a[4] = "0xdeadBEEF"
+     #     a[5] = "123.45"
+     #     a[6] = "1.e3"
+     #     a[7] = "1.32"
+     #     a[8] = "1.32E2"
+     #
+     #     for (i = 1; i in a; i++)
+     #         print a[i], strtonum(a[i]), mystrtonum(a[i])
+     # }
 
-`dcgettext(STRING [, DOMAIN [, CATEGORY]])'
-     Return the translation of STRING in text domain DOMAIN for locale
-     category CATEGORY.  The default value for DOMAIN is the current
-     value of `TEXTDOMAIN'.  The default value for CATEGORY is
-     `"LC_MESSAGES"'.
+   The function first looks for C-style octal numbers (base 8).  If the
+input string matches a regular expression describing octal numbers,
+then `mystrtonum()' loops through each character in the string.  It
+sets `k' to the index in `"01234567"' of the current octal digit.
+Since the return value is one-based, the `k--' adjusts `k' so it can be
+used in computing the return value.
 
-     If you supply a value for CATEGORY, it must be a string equal to
-     one of the known locale categories described in *note Explaining
-     gettext::.  You must also supply a text domain.  Use `TEXTDOMAIN'
-     if you want to use the current domain.
+   Similar logic applies to the code that checks for and converts a
+hexadecimal value, which starts with `0x' or `0X'.  The use of
+`tolower()' simplifies the computation for finding the correct numeric
+value for each hexadecimal digit.
 
-          CAUTION: The order of arguments to the `awk' version of the
-          `dcgettext()' function is purposely different from the order
-          for the C version.  The `awk' version's order was chosen to
-          be simple and to allow for reasonable `awk'-style default
-          arguments.
+   Finally, if the string matches the (rather complicated) regexp for a
+regular decimal integer or floating-point number, the computation `ret
+= str + 0' lets `awk' convert the value to a number.
 
-`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])'
-     Return the plural form used for NUMBER of the translation of
-     STRING1 and STRING2 in text domain DOMAIN for locale category
-     CATEGORY. STRING1 is the English singular variant of a message,
-     and STRING2 the English plural variant of the same message.  The
-     default value for DOMAIN is the current value of `TEXTDOMAIN'.
-     The default value for CATEGORY is `"LC_MESSAGES"'.
+   A commented-out test program is included, so that the function can
+be tested with `gawk' and the results compared to the built-in
+`strtonum()' function.
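
To see the digit-by-digit accumulation in isolation, here is a small sketch of just the octal case.  The helper name `oct2dec()' is an assumption made for this example; it condenses the octal branch of `mystrtonum()' and does no input validation:

```shell
out=$(awk '
    # Condensed octal branch: index() is one-based, so subtract one
    # to turn the position of the digit into its numeric value.
    function oct2dec(str,    ret, n, i, k)
    {
        n = length(str)
        ret = 0
        for (i = 1; i <= n; i++) {
            k = index("01234567", substr(str, i, 1)) - 1
            ret = ret * 8 + k
        }
        return ret
    }

    BEGIN { print oct2dec("0123"), oct2dec("017") }
')
echo "$out"
```

For `"0123"' the running value goes 0, 1, 10, 83, which is 1*64 + 2*8 + 3.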
 
-     The same remarks about argument order as for the `dcgettext()'
-     function apply.
+
+File: gawk.info,  Node: Assert Function,  Next: Round Function,  Prev: Strtonum Function,  Up: General Functions
 
-`bindtextdomain(DIRECTORY [, DOMAIN])'
-     Change the directory in which `gettext' looks for `.mo' files, in
-     case they will not or cannot be placed in the standard locations
-     (e.g., during testing).  Return the directory in which DOMAIN is
-     "bound."
+10.2.2 Assertions
+-----------------
 
-     The default DOMAIN is the value of `TEXTDOMAIN'.  If DIRECTORY is
-     the null string (`""'), then `bindtextdomain()' returns the
-     current binding for the given DOMAIN.
+When writing large programs, it is often useful to know that a
+condition or set of conditions is true.  Before proceeding with a
+particular computation, you make a statement about what you believe to
+be the case.  Such a statement is known as an "assertion".  The C
+language provides an `<assert.h>' header file and corresponding
+`assert()' macro that the programmer can use to make assertions.  If an
+assertion fails, the `assert()' macro arranges to print a diagnostic
+message describing the condition that should have been true but was
+not, and then it kills the program.  In C, using `assert()' looks like this:
 
-   To use these facilities in your `awk' program, follow the steps
-outlined in *note Explaining gettext::, like so:
+     #include <assert.h>
 
-  1. Set the variable `TEXTDOMAIN' to the text domain of your program.
-     This is best done in a `BEGIN' rule (*note BEGIN/END::), or it can
-     also be done via the `-v' command-line option (*note Options::):
+     int myfunc(int a, double b)
+     {
+          assert(a <= 5 && b >= 17.1);
+          ...
+     }
 
-          BEGIN {
-              TEXTDOMAIN = "guide"
-              ...
-          }
+   If the assertion fails, the program prints a message similar to this:
 
-  2. Mark all translatable strings with a leading underscore (`_')
-     character.  It _must_ be adjacent to the opening quote of the
-     string.  For example:
+     prog.c:5: assertion failed: a <= 5 && b >= 17.1
 
-          print _"hello, world"
-          x = _"you goofed"
-          printf(_"Number of users is %d\n", nusers)
+   The C language makes it possible to turn the condition into a string
+for use in printing the diagnostic message.  This is not possible in
+`awk', so this `assert()' function also requires a string version of
+the condition that is being tested.  Following is the function:
 
-  3. If you are creating strings dynamically, you can still translate
-     them, using the `dcgettext()' built-in function:
+     # assert --- assert that a condition is true. Otherwise exit.
 
-          message = nusers " users logged in"
-          message = dcgettext(message, "adminprog")
-          print message
+     function assert(condition, string)
+     {
+         if (! condition) {
+             printf("%s:%d: assertion failed: %s\n",
+                 FILENAME, FNR, string) > "/dev/stderr"
+             _assert_exit = 1
+             exit 1
+         }
+     }
 
-     Here, the call to `dcgettext()' supplies a different text domain
-     (`"adminprog"') in which to find the message, but it uses the
-     default `"LC_MESSAGES"' category.
+     END {
+         if (_assert_exit)
+             exit 1
+     }
 
-  4. During development, you might want to put the `.mo' file in a
-     private directory for testing.  This is done with the
-     `bindtextdomain()' built-in function:
+   The `assert()' function tests the `condition' parameter. If it is
+false, it prints a message to standard error, using the `string'
+parameter to describe the failed condition.  It then sets the variable
+`_assert_exit' to one and executes the `exit' statement.  The `exit'
+statement jumps to the `END' rule. If the `END' rule finds
+`_assert_exit' to be true, it exits immediately.
 
-          BEGIN {
-             TEXTDOMAIN = "guide"   # our text domain
-             if (Testing) {
-                 # where to find our files
-                 bindtextdomain("testdir")
-                 # joe is in charge of adminprog
-                 bindtextdomain("../joe/testdir", "adminprog")
-             }
-             ...
-          }
+   The purpose of the test in the `END' rule is to keep any other `END'
+rules from running.  When an assertion fails, the program should exit
+immediately.  If no assertions fail, then `_assert_exit' is still false
+when the `END' rule is run normally, and the rest of the program's
+`END' rules execute.  For all of this to work correctly, `assert.awk'
+must be the first source file read by `awk'.  The function can be used
+in a program in the following way:
 
+     function myfunc(a, b)
+     {
+          assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
+          ...
+     }
 
-   *Note I18N Example::, for an example program showing the steps to
-create and use translations from `awk'.
+If the assertion fails, you see a message similar to the following:
 
-
-File: gawk.info,  Node: Translator i18n,  Next: I18N Example,  Prev: Programmer i18n,  Up: Internationalization
+     mydata:1357: assertion failed: a <= 5 && b >= 17.1
 
-10.4 Translating `awk' Programs
-===============================
+   There is a small problem with this version of `assert()'.  An `END'
+rule is automatically added to the program calling `assert()'.
+Normally, if a program consists of just a `BEGIN' rule, the input files
+and/or standard input are not read. However, now that the program has
+an `END' rule, `awk' attempts to read the input data files or standard
+input (*note Using BEGIN/END::), most likely causing the program to
+hang as it waits for input.
 
-Once a program's translatable strings have been marked, they must be
-extracted to create the initial `.po' file.  As part of translation, it
-is often helpful to rearrange the order in which arguments to `printf'
-are output.
+   There is a simple workaround to this: make sure that such a `BEGIN'
+rule always ends with an `exit' statement.
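
The workaround can be sketched as follows.  This is an illustration under assumptions: the variable `x' and the message printed are invented, and the diagnostic is sent through `| "cat 1>&2"' rather than `/dev/stderr' so that the sketch does not depend on `gawk':

```shell
out=$(awk '
    function assert(condition, string)
    {
        if (! condition) {
            printf("%s:%d: assertion failed: %s\n",
                FILENAME, FNR, string) | "cat 1>&2"
            _assert_exit = 1
            exit 1
        }
    }

    BEGIN {
        x = 3                       # hypothetical value under test
        assert(x <= 5, "x <= 5")
        print "assertion held"
        exit    # without this, the END rule below makes awk read input
    }

    END {
        if (_assert_exit)
            exit 1
    }
')
echo "$out"
```

Because the `BEGIN' rule ends with `exit', no input is read even though the program contains an `END' rule.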
 
-   `gawk''s `--gen-pot' command-line option extracts the messages and
-is discussed next.  After that, `printf''s ability to rearrange the
-order for `printf' arguments at runtime is covered.
+
+File: gawk.info,  Node: Round Function,  Next: Cliff Random Function,  Prev: Assert Function,  Up: General Functions
 
-* Menu:
+10.2.3 Rounding Numbers
+-----------------------
 
-* String Extraction::           Extracting marked strings.
-* Printf Ordering::             Rearranging `printf' arguments.
-* I18N Portability::            `awk'-level portability issues.
+The way `printf' and `sprintf()' (*note Printf::) perform rounding
+often depends upon the system's C `sprintf()' subroutine.  On many
+machines, `sprintf()' rounding is "unbiased," which means it doesn't
+always round a trailing `.5' up, contrary to naive expectations.  In
+unbiased rounding, `.5' rounds to even, rather than always up, so 1.5
+rounds to 2 but 4.5 rounds to 4.  This means that if you are using a
+format that does rounding (e.g., `"%.0f"'), you should check what your
+system does.  The following function does traditional rounding; it
+might be useful if your `awk''s `printf' does unbiased rounding:
 
-
-File: gawk.info,  Node: String Extraction,  Next: Printf Ordering,  Up: Translator i18n
+     # round.awk --- do normal rounding
 
-10.4.1 Extracting Marked Strings
---------------------------------
+     function round(x,   ival, aval, fraction)
+     {
+        ival = int(x)    # integer part, int() truncates
 
-Once your `awk' program is working, and all the strings have been
-marked and you've set (and perhaps bound) the text domain, it is time
-to produce translations.  First, use the `--gen-pot' command-line
-option to create the initial `.pot' file:
+        # see if fractional part
+        if (ival == x)   # no fraction
+           return ival   # ensure no decimals
 
-     $ gawk --gen-pot -f guide.awk > guide.pot
+        if (x < 0) {
+           aval = -x     # absolute value
+           ival = int(aval)
+           fraction = aval - ival
+           if (fraction >= .5)
+              return int(x) - 1   # -2.5 --> -3
+           else
+              return int(x)       # -2.3 --> -2
+        } else {
+           fraction = x - ival
+           if (fraction >= .5)
+              return ival + 1
+           else
+              return ival
+        }
+     }
 
-   When run with `--gen-pot', `gawk' does not execute your program.
-Instead, it parses it as usual and prints all marked strings to
-standard output in the format of a GNU `gettext' Portable Object file.
-Also included in the output are any constant strings that appear as the
-first argument to `dcgettext()' or as the first and second argument to
-`dcngettext()'.(1) *Note I18N Example::, for the full list of steps to
-go through to create and test translations for `guide'.
+     # test harness
+     { print $0, round($0) }
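
As a quick check of the behavior, the following sketch feeds a few halfway cases through a condensed variant of `round()'.  The condensed body is an assumption written for this example; it is meant to behave like the function above (halves round away from zero), not to replace it:

```shell
out=$(printf '2.5\n-2.5\n2.3\n4.5\n' | awk '
    # Condensed variant of round(): halves round away from zero.
    function round(x,   ival, fraction)
    {
        ival = int(x)           # int() truncates toward zero
        if (ival == x)
            return ival
        fraction = (x < 0) ? ival - x : x - ival
        if (fraction >= .5)
            return (x < 0) ? ival - 1 : ival + 1
        return ival
    }

    { printf "%s -> %d\n", $0, round($0) }
')
echo "$out"
```

Here 4.5 becomes 5, whereas a `"%.0f"' format on a system with unbiased rounding would give 4.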
 
-   ---------- Footnotes ----------
+
+File: gawk.info,  Node: Cliff Random Function,  Next: Ordinal Functions,  Prev: Round Function,  Up: General Functions
 
-   (1) The `xgettext' utility that comes with GNU `gettext' can handle
-`.awk' files.
+10.2.4 The Cliff Random Number Generator
+----------------------------------------
 
-
-File: gawk.info,  Node: Printf Ordering,  Next: I18N Portability,  Prev: String Extraction,  Up: Translator i18n
+The Cliff random number generator
+(http://mathworld.wolfram.com/CliffRandomNumberGenerator.html) is a
+very simple random number generator that "passes the noise sphere test
+for randomness by showing no structure."  It is easily programmed, in
+less than 10 lines of `awk' code:
 
-10.4.2 Rearranging `printf' Arguments
--------------------------------------
+     # cliff_rand.awk --- generate Cliff random numbers
 
-Format strings for `printf' and `sprintf()' (*note Printf::) present a
-special problem for translation.  Consider the following:(1)
+     BEGIN { _cliff_seed = 0.1 }
 
-     printf(_"String `%s' has %d characters\n",
-               string, length(string)))
+     function cliff_rand()
+     {
+         _cliff_seed = (100 * log(_cliff_seed)) % 1
+         if (_cliff_seed < 0)
+             _cliff_seed = - _cliff_seed
+         return _cliff_seed
+     }
 
-   A possible German translation for this might be:
+   This algorithm requires an initial "seed" of 0.1.  Each new value
+uses the current seed as input for the calculation.  If the built-in
+`rand()' function (*note Numeric Functions::) isn't random enough, you
+might try using this function instead.
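
A short usage sketch follows.  The driver loop and its range check are additions for illustration; the exact values produced depend on the floating-point arithmetic of the system, so the sketch only checks that each value lies between zero and one:

```shell
out=$(awk '
    BEGIN { _cliff_seed = 0.1 }

    function cliff_rand()
    {
        _cliff_seed = (100 * log(_cliff_seed)) % 1
        if (_cliff_seed < 0)
            _cliff_seed = - _cliff_seed
        return _cliff_seed
    }

    # Hypothetical driver: draw five values and verify their range.
    BEGIN {
        ok = 1
        for (i = 1; i <= 5; i++) {
            r = cliff_rand()
            if (r < 0 || r >= 1)
                ok = 0
        }
        print (ok ? "all in [0, 1)" : "out of range")
    }
')
echo "$out"
```

Since each value is derived from the previous seed, two runs of the program produce the same sequence.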
 
-     "%d Zeichen lang ist die Zeichenkette `%s'\n"
+
+File: gawk.info,  Node: Ordinal Functions,  Next: Join Function,  Prev: Cliff Random Function,  Up: General Functions
 
-   The problem should be obvious: the order of the format
-specifications is different from the original!  Even though `gettext()'
-can return the translated string at runtime, it cannot change the
-argument order in the call to `printf'.
+10.2.5 Translating Between Characters and Numbers
+-------------------------------------------------
 
-   To solve this problem, `printf' format specifiers may have an
-additional optional element, which we call a "positional specifier".
-For example:
+One commercial implementation of `awk' supplies a built-in function,
+`ord()', which takes a character and returns the numeric value for that
+character in the machine's character set.  If the string passed to
+`ord()' has more than one character, only the first one is used.
 
-     "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
+   The inverse of this function is `chr()' (from the function of the
+same name in Pascal), which takes a number and returns the
+corresponding character.  Both functions are written very nicely in
+`awk'; there is no real reason to build them into the `awk' interpreter:
 
-   Here, the positional specifier consists of an integer count, which
-indicates which argument to use, and a `$'. Counts are one-based, and
-the format string itself is _not_ included.  Thus, in the following
-example, `string' is the first argument and `length(string)' is the
-second:
+     # ord.awk --- do ord and chr
 
-     $ gawk 'BEGIN {
-     >     string = "Dont Panic"
-     >     printf _"%2$d characters live in \"%1$s\"\n",
-     >                         string, length(string)
-     > }'
-     -| 10 characters live in "Dont Panic"
+     # Global identifiers:
+     #    _ord_:        numerical values indexed by characters
+     #    _ord_init:    function to initialize _ord_
 
-   If present, positional specifiers come first in the format
-specification, before the flags, the field width, and/or the precision.
+     BEGIN    { _ord_init() }
 
-   Positional specifiers can be used with the dynamic field width and
-precision capability:
+     function _ord_init(    low, high, i, t)
+     {
+         low = sprintf("%c", 7) # BEL is ascii 7
+         if (low == "\a") {    # regular ascii
+             low = 0
+             high = 127
+         } else if (sprintf("%c", 128 + 7) == "\a") {
+             # ascii, mark parity
+             low = 128
+             high = 255
+         } else {        # ebcdic(!)
+             low = 0
+             high = 255
+         }
 
-     $ gawk 'BEGIN {
-     >    printf("%*.*s\n", 10, 20, "hello")
-     >    printf("%3$*2$.*1$s\n", 20, 10, "hello")
-     > }'
-     -|      hello
-     -|      hello
+         for (i = low; i <= high; i++) {
+             t = sprintf("%c", i)
+             _ord_[t] = i
+         }
+     }
 
-     NOTE: When using `*' with a positional specifier, the `*' comes
-     first, then the integer position, and then the `$'.  This is
-     somewhat counterintuitive.
+   Some explanation of the numbers used by `chr' is worthwhile.  The
+most prominent character set in use today is ASCII.(1) Although an
+8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
+defines characters that use the values from 0 to 127.(2) In the now
+distant past, at least one minicomputer manufacturer used ASCII, but
+with mark parity, meaning that the leftmost bit in the byte is always
+1.  This means that on those systems, characters have numeric values
+from 128 to 255.  Finally, large mainframe systems use the EBCDIC
+character set, which uses all 256 values.  While there are other
+character sets in use on some older systems, they are not really worth
+worrying about:
 
-   `gawk' does not allow you to mix regular format specifiers and those
-with positional specifiers in the same string:
+     function ord(str,    c)
+     {
+         # only first character is of interest
+         c = substr(str, 1, 1)
+         return _ord_[c]
+     }
 
-     $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }'
-     error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none
+     function chr(c)
+     {
+         # force c to be numeric by adding 0
+         return sprintf("%c", c + 0)
+     }
 
-     NOTE: There are some pathological cases that `gawk' may fail to
-     diagnose.  In such cases, the output may not be what you expect.
-     It's still a bad idea to try mixing them, even if `gawk' doesn't
-     detect it.
+     #### test code ####
+     # BEGIN    \
+     # {
+     #    for (;;) {
+     #        printf("enter a character: ")
+     #        if (getline var <= 0)
+     #            break
+     #        printf("ord(%s) = %d\n", var, ord(var))
+     #    }
+     # }
 
-   Although positional specifiers can be used directly in `awk'
-programs, their primary purpose is to help in producing correct
-translations of format strings into languages different from the one in
-which the program is first written.
+   An obvious improvement to these functions is to move the code for the
+`_ord_init' function into the body of the `BEGIN' rule.  It was written
+this way initially for ease of development.  There is a "test program"
+in a `BEGIN' rule, to test the function.  It is commented out for
+production use.
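
A minimal usage sketch of `ord()' and `chr()', wrapped in a shell
function so it can be run directly.  It assumes a plain ASCII system,
so the table is built for the values 1 to 127 only instead of probing
for mark parity or EBCDIC; the wrapper name `ord_chr_demo' is
hypothetical and not part of the library file:

```shell
ord_chr_demo() {
    awk '
    # Build the lookup table for plain ASCII only (an assumption of this sketch).
    function _ord_init(    i) {
        for (i = 1; i <= 127; i++)
            _ord_[sprintf("%c", i)] = i
    }
    # Only the first character of the argument is of interest.
    function ord(str) { return _ord_[substr(str, 1, 1)] }
    # Force c to be numeric by adding 0.
    function chr(c)   { return sprintf("%c", c + 0) }
    BEGIN {
        _ord_init()
        printf("%d %s\n", ord("A"), chr(66))
    }'
}
ord_chr_demo
```

On an ASCII system this prints `65 B'.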
 
    ---------- Footnotes ----------
 
-   (1) This example is borrowed from the GNU `gettext' manual.
+   (1) This is changing; many systems use Unicode, a very large
+character set that includes ASCII as a subset.  On systems with full
+Unicode support, a character can occupy up to 32 bits, making simple
+tests such as used here prohibitively expensive.
+
+   (2) ASCII has been extended in many countries to use the values from
+128 to 255 for country-specific characters.  If your system uses these
+extensions, you can simplify `_ord_init' to loop from 0 to 255.
 
 
-File: gawk.info,  Node: I18N Portability,  Prev: Printf Ordering,  Up: Translator i18n
-
-10.4.3 `awk' Portability Issues
--------------------------------
+File: gawk.info,  Node: Join Function,  Next: Getlocaltime Function,  Prev: Ordinal Functions,  Up: General Functions
 
-`gawk''s internationalization features were purposely chosen to have as
-little impact as possible on the portability of `awk' programs that use
-them to other versions of `awk'.  Consider this program:
+10.2.6 Merging an Array into a String
+-------------------------------------
 
-     BEGIN {
-         TEXTDOMAIN = "guide"
-         if (Test_Guide)   # set with -v
-             bindtextdomain("/test/guide/messages")
-         print _"don't panic!"
-     }
+When doing string processing, it is often useful to be able to join all
+the strings in an array into one long string.  The following function,
+`join()', accomplishes this task.  It is used later in several of the
+application programs (*note Sample Programs::).
 
-As written, it won't work on other versions of `awk'.  However, it is
-actually almost portable, requiring very little change:
+   Good function design is important; this function needs to be general
+but it should also have a reasonable default behavior.  It is called
+with an array as well as the beginning and ending indices of the
+elements in the array to be merged.  This assumes that the array
+indices are numeric--a reasonable assumption since the array was likely
+created with `split()' (*note String Functions::):
 
-   * Assignments to `TEXTDOMAIN' won't have any effect, since
-     `TEXTDOMAIN' is not special in other `awk' implementations.
+     # join.awk --- join an array into a string
 
-   * Non-GNU versions of `awk' treat marked strings as the
-     concatenation of a variable named `_' with the string following
-     it.(1) Typically, the variable `_' has the null string (`""') as
-     its value, leaving the original string constant as the result.
+     function join(array, start, end, sep,    result, i)
+     {
+         if (sep == "")
+            sep = " "
+         else if (sep == SUBSEP) # magic value
+            sep = ""
+         result = array[start]
+         for (i = start + 1; i <= end; i++)
+             result = result sep array[i]
+         return result
+     }
 
-   * By defining "dummy" functions to replace `dcgettext()',
-     `dcngettext()' and `bindtextdomain()', the `awk' program can be
-     made to run, but all the messages are output in the original
-     language.  For example:
+   An optional additional argument is the separator to use when joining
+the strings back together.  If the caller supplies a nonempty value,
+`join()' uses it; if it is not supplied, it has a null value.  In this
+case, `join()' uses a single space as a default separator for the
+strings.  If the value is equal to `SUBSEP', then `join()' joins the
+strings with no separator between them.  `SUBSEP' serves as a "magic"
+value to indicate that there should be no separation between the
+component strings.(1)
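
The three separator cases described above can be exercised with a short
sketch.  The wrapper name `join_demo' and the sample data are
hypothetical; the `join()' body is the one from `join.awk':

```shell
join_demo() {
    awk '
    function join(array, start, end, sep,    result, i) {
        if (sep == "")
            sep = " "
        else if (sep == SUBSEP)   # magic value: no separator
            sep = ""
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }
    BEGIN {
        n = split("a b c", parts, " ")
        print join(parts, 1, n)            # separator omitted: single space
        print join(parts, 1, n, "-")       # explicit separator
        print join(parts, 2, n, SUBSEP)    # elements 2..n, no separator
    }'
}
join_demo
```

This prints `a b c', `a-b-c', and `bc', one per line.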
 
-          function bindtextdomain(dir, domain)
-          {
-              return dir
-          }
+   ---------- Footnotes ----------
 
-          function dcgettext(string, domain, category)
-          {
-              return string
-          }
+   (1) It would be nice if `awk' had an assignment operator for
+concatenation.  The lack of an explicit operator for concatenation
+makes string operations more difficult than they really need to be.
 
-          function dcngettext(string1, string2, number, domain, category)
-          {
-              return (number == 1 ? string1 : string2)
-          }
+
+File: gawk.info,  Node: Getlocaltime Function,  Prev: Join Function,  Up: General Functions
 
-   * The use of positional specifications in `printf' or `sprintf()' is
-     _not_ portable.  To support `gettext()' at the C level, many
-     systems' C versions of `sprintf()' do support positional
-     specifiers.  But it works only if enough arguments are supplied in
-     the function call.  Many versions of `awk' pass `printf' formats
-     and arguments unchanged to the underlying C library version of
-     `sprintf()', but only one format and argument at a time.  What
-     happens if a positional specification is used is anybody's guess.
-     However, since the positional specifications are primarily for use
-     in _translated_ format strings, and since non-GNU `awk's never
-     retrieve the translated string, this should not be a problem in
-     practice.
+10.2.7 Managing the Time of Day
+-------------------------------
 
-   ---------- Footnotes ----------
+The `systime()' and `strftime()' functions described in *note Time
+Functions::, provide the minimum functionality necessary for dealing
+with the time of day in human readable form.  While `strftime()' is
+extensive, the control formats are not necessarily easy to remember or
+intuitively obvious when reading a program.
 
-   (1) This is good fodder for an "Obfuscated `awk'" contest.
+   The following function, `getlocaltime()', populates a user-supplied
+array with preformatted time information.  It returns a string with the
+current time formatted in the same way as the `date' utility:
 
-
-File: gawk.info,  Node: I18N Example,  Next: Gawk I18N,  Prev: Translator i18n,  Up: Internationalization
+     # getlocaltime.awk --- get the time of day in a usable format
 
-10.5 A Simple Internationalization Example
-==========================================
+     # Returns a string in the format of output of date(1)
+     # Populates the array argument time with individual values:
+     #    time["second"]       -- seconds (0 - 59)
+     #    time["minute"]       -- minutes (0 - 59)
+     #    time["hour"]         -- hours (0 - 23)
+     #    time["althour"]      -- hours (1 - 12)
+     #    time["monthday"]     -- day of month (1 - 31)
+     #    time["month"]        -- month of year (1 - 12)
+     #    time["monthname"]    -- name of the month
+     #    time["shortmonth"]   -- short name of the month
+     #    time["year"]         -- year modulo 100 (0 - 99)
+     #    time["fullyear"]     -- full year
+     #    time["weekday"]      -- day of week (Sunday = 0)
+     #    time["altweekday"]   -- day of week (Monday = 1)
+     #    time["dayname"]      -- name of weekday
+     #    time["shortdayname"] -- short name of weekday
+     #    time["yearday"]      -- day of year (1 - 366)
+     #    time["timezone"]     -- abbreviation of timezone name
+     #    time["ampm"]         -- AM or PM designation
+     #    time["weeknum"]      -- week number, Sunday first day
+     #    time["altweeknum"]   -- week number, Monday first day
 
-Now let's look at a step-by-step example of how to internationalize and
-localize a simple `awk' program, using `guide.awk' as our original
-source:
+     function getlocaltime(time,    ret, now, i)
+     {
+         # get time once, avoids unnecessary system calls
+         now = systime()
 
-     BEGIN {
-         TEXTDOMAIN = "guide"
-         bindtextdomain(".")  # for testing
-         print _"Don't Panic"
-         print _"The Answer Is", 42
-         print "Pardon me, Zaphod who?"
-     }
+         # return date(1)-style output
+         ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
 
-Run `gawk --gen-pot' to create the `.pot' file:
+         # clear out target array
+         delete time
 
-     $ gawk --gen-pot -f guide.awk > guide.pot
+         # fill in values, force numeric values to be
+         # numeric by adding 0
+         time["second"]       = strftime("%S", now) + 0
+         time["minute"]       = strftime("%M", now) + 0
+         time["hour"]         = strftime("%H", now) + 0
+         time["althour"]      = strftime("%I", now) + 0
+         time["monthday"]     = strftime("%d", now) + 0
+         time["month"]        = strftime("%m", now) + 0
+         time["monthname"]    = strftime("%B", now)
+         time["shortmonth"]   = strftime("%b", now)
+         time["year"]         = strftime("%y", now) + 0
+         time["fullyear"]     = strftime("%Y", now) + 0
+         time["weekday"]      = strftime("%w", now) + 0
+         time["altweekday"]   = strftime("%u", now) + 0
+         time["dayname"]      = strftime("%A", now)
+         time["shortdayname"] = strftime("%a", now)
+         time["yearday"]      = strftime("%j", now) + 0
+         time["timezone"]     = strftime("%Z", now)
+         time["ampm"]         = strftime("%p", now)
+         time["weeknum"]      = strftime("%U", now) + 0
+         time["altweeknum"]   = strftime("%W", now) + 0
 
-This produces:
+         return ret
+     }
 
-     #: guide.awk:4
-     msgid "Don't Panic"
-     msgstr ""
+   The string indices are easier to use and read than the various
+formats required by `strftime()'.  The `alarm' program presented in
+*note Alarm Program::, uses this function.  A more general design for
+the `getlocaltime()' function would have allowed the user to supply an
+optional timestamp value to use instead of the current time.
 
-     #: guide.awk:5
-     msgid "The Answer Is"
-     msgstr ""
+
+File: gawk.info,  Node: Data File Management,  Next: Getopt Function,  Prev: General Functions,  Up: Library Functions
 
-   This original portable object template file is saved and reused for
-each language into which the application is translated.  The `msgid' is
-the original string and the `msgstr' is the translation.
+10.3 Data File Management
+=========================
 
-     NOTE: Strings not marked with a leading underscore do not appear
-     in the `guide.pot' file.
+This minor node presents functions that are useful for managing
+command-line data files.
 
-   Next, the messages must be translated.  Here is a translation to a
-hypothetical dialect of English, called "Mellow":(1)
+* Menu:
 
-     $ cp guide.pot guide-mellow.po
-     ADD TRANSLATIONS TO guide-mellow.po ...
+* Filetrans Function::          A function for handling data file transitions.
+* Rewind Function::             A function for rereading the current file.
+* File Checking::               Checking that data files are readable.
+* Empty Files::                 Checking for zero-length files.
+* Ignoring Assigns::            Treating assignments as file names.
 
-Following are the translations:
+
+File: gawk.info,  Node: Filetrans Function,  Next: Rewind Function,  Up: Data File Management
 
-     #: guide.awk:4
-     msgid "Don't Panic"
-     msgstr "Hey man, relax!"
+10.3.1 Noting Data File Boundaries
+----------------------------------
 
-     #: guide.awk:5
-     msgid "The Answer Is"
-     msgstr "Like, the scoop is"
+The `BEGIN' and `END' rules are each executed exactly once at the
+beginning and end of your `awk' program, respectively (*note
+BEGIN/END::).  We (the `gawk' authors) once had a user who mistakenly
+thought that the `BEGIN' rule is executed at the beginning of each data
+file and the `END' rule is executed at the end of each data file.
 
-   The next step is to make the directory to hold the binary message
-object file and then to create the `guide.mo' file.  The directory
-layout shown here is standard for GNU `gettext' on GNU/Linux systems.
-Other versions of `gettext' may use a different layout:
+   When informed that this was not the case, the user requested that we
+add new special patterns to `gawk', named `BEGIN_FILE' and `END_FILE',
+that would have the desired behavior.  He even supplied us the code to
+do so.
 
-     $ mkdir en_US en_US/LC_MESSAGES
+   Adding these special patterns to `gawk' wasn't necessary; the job
+can be done cleanly in `awk' itself, as illustrated by the following
+library program.  It arranges to call two user-supplied functions,
+`beginfile()' and `endfile()', at the beginning and end of each data
+file.  Besides solving the problem in only nine(!) lines of code, it
+does so _portably_; this works with any implementation of `awk':
 
-   The `msgfmt' utility does the conversion from human-readable `.po'
-file to machine-readable `.mo' file.  By default, `msgfmt' creates a
-file named `messages'.  This file must be renamed and placed in the
-proper directory so that `gawk' can find it:
+     # transfile.awk
+     #
+     # Give the user a hook for filename transitions
+     #
+     # The user must supply functions beginfile() and endfile()
+     # that each take the name of the file being started or
+     # finished, respectively.
 
-     $ msgfmt guide-mellow.po
-     $ mv messages en_US/LC_MESSAGES/guide.mo
+     FILENAME != _oldfilename \
+     {
+         if (_oldfilename != "")
+             endfile(_oldfilename)
+         _oldfilename = FILENAME
+         beginfile(FILENAME)
+     }
 
-   Finally, we run the program to test it:
+     END   { endfile(FILENAME) }
 
-     $ gawk -f guide.awk
-     -| Hey man, relax!
-     -| Like, the scoop is 42
-     -| Pardon me, Zaphod who?
+   This file must be loaded before the user's "main" program, so that
+the rule it supplies is executed first.
 
-   If the three replacement functions for `dcgettext()', `dcngettext()'
-and `bindtextdomain()' (*note I18N Portability::) are in a file named
-`libintl.awk', then we can run `guide.awk' unchanged as follows:
+   This rule relies on `awk''s `FILENAME' variable that automatically
+changes for each new data file.  The current file name is saved in a
+private variable, `_oldfilename'.  If `FILENAME' does not equal
+`_oldfilename', then a new data file is being processed and it is
+necessary to call `endfile()' for the old file.  Because `endfile()'
+should only be called if a file has been processed, the program first
+checks to make sure that `_oldfilename' is not the null string.  The
+program then assigns the current file name to `_oldfilename' and calls
+`beginfile()' for the file.  Because, like all `awk' variables,
+`_oldfilename' is initialized to the null string, this rule executes
+correctly even for the first data file.
 
-     $ gawk --posix -f guide.awk -f libintl.awk
-     -| Don't Panic
-     -| The Answer Is 42
-     -| Pardon me, Zaphod who?
+   The program also supplies an `END' rule to do the final processing
+for the last file.  Because this `END' rule comes before any `END' rules
+supplied in the "main" program, `endfile()' is called first.  Once
+again the value of multiple `BEGIN' and `END' rules should be clear.
 
-   ---------- Footnotes ----------
+   If the same data file occurs twice in a row on the command line, then
+`endfile()' and `beginfile()' are not executed at the end of the first
+pass and at the beginning of the second pass.  The following version
+solves the problem:
 
-   (1) Perhaps it would be better if it were called "Hippy." Ah, well.
+     # ftrans.awk --- handle data file transitions
+     #
+     # user supplies beginfile() and endfile() functions
 
-
-File: gawk.info,  Node: Gawk I18N,  Prev: I18N Example,  Up: Internationalization
+     FNR == 1 {
+         if (_filename_ != "")
+             endfile(_filename_)
+         _filename_ = FILENAME
+         beginfile(FILENAME)
+     }
 
-10.6 `gawk' Can Speak Your Language
-===================================
+     END  { endfile(_filename_) }
 
-`gawk' itself has been internationalized using the GNU `gettext'
-package.  (GNU `gettext' is described in complete detail in *note (GNU
-`gettext' utilities)Top:: gettext, GNU gettext tools.)  As of this
-writing, the latest version of GNU `gettext' is version 0.18.1
-(ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz).
+   *note Wc Program::, shows how this library function can be used and
+how it simplifies writing the main program.
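
As a rough illustration of the `FNR == 1' version, the following sketch
names the same temporary file twice on the command line and counts
records per file.  The `beginfile()' and `endfile()' bodies, the
variable `n', and the wrapper name `ftrans_demo' are hypothetical
stand-ins for what a user's main program would supply:

```shell
ftrans_demo() {
    tmp=$(mktemp) || return 1
    printf 'one\ntwo\n' > "$tmp"
    awk '
    # Library part, as in ftrans.awk.
    FNR == 1 {
        if (_filename_ != "")
            endfile(_filename_)
        _filename_ = FILENAME
        beginfile(FILENAME)
    }
    END { endfile(_filename_) }

    # Hypothetical main program: count the records in each file.
    function beginfile(name) { n = 0 }
    function endfile(name)   { print "records: " n }
    { n++ }
    ' "$tmp" "$tmp"
    rm -f "$tmp"
}
ftrans_demo
```

Because the transition is detected with `FNR == 1' and not by comparing
file names, `records: 2' is printed twice, once per pass over the file.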
 
-   If a translation of `gawk''s messages exists, then `gawk' produces
-usage messages, warnings, and fatal errors in the local language.
+Advanced Notes: So Why Does `gawk' have `BEGINFILE' and `ENDFILE'?
+------------------------------------------------------------------
+
+You are probably wondering, if `beginfile()' and `endfile()' functions
+can do the job, why does `gawk' have `BEGINFILE' and `ENDFILE' patterns
+(*note BEGINFILE/ENDFILE::)?
+
+   Good question.  Normally, if `awk' cannot open a file, this causes
+an immediate fatal error.  In this case, there is no way for a
+user-defined function to deal with the problem, since the mechanism for
+calling it relies on the file being open and at the first record.  Thus,
+the main reason for `BEGINFILE' is to give you a "hook" to catch files
+that cannot be processed.  `ENDFILE' exists for symmetry, and because
+it provides an easy way to do per-file cleanup processing.
 
 
-File: gawk.info,  Node: Advanced Features,  Next: Library Functions,  Prev: Internationalization,  Up: Top
+File: gawk.info,  Node: Rewind Function,  Next: File Checking,  Prev: Filetrans Function,  Up: Data File Management
 
-11 Advanced Features of `gawk'
-******************************
+10.3.2 Rereading the Current File
+---------------------------------
 
-     Write documentation as if whoever reads it is a violent psychopath
-     who knows where you live.
-     Steve English, as quoted by Peter Langston
+Another request for a new built-in function was for a `rewind()'
+function that would make it possible to reread the current file.  The
+requesting user didn't want to have to use `getline' (*note Getline::)
+inside a loop.
 
-   This major node discusses advanced features in `gawk'.  It's a bit
-of a "grab bag" of items that are otherwise unrelated to each other.
-First, a command-line option allows `gawk' to recognize nondecimal
-numbers in input data, not just in `awk' programs.  Then, `gawk''s
-special features for sorting arrays are presented.  Next, two-way I/O,
-discussed briefly in earlier parts of this Info file, is described in
-full detail, along with the basics of TCP/IP networking.  Finally,
-`gawk' can "profile" an `awk' program, making it possible to tune it
-for performance.
+   However, as long as you are not in the `END' rule, it is quite easy
+to arrange to immediately close the current input file and then start
+over with it from the top.  For lack of a better name, we'll call it
+`rewind()':
 
-   *note Dynamic Extensions::, discusses the ability to dynamically add
-new built-in functions to `gawk'.  As this feature is still immature
-and likely to change, its description is relegated to an appendix.
+     # rewind.awk --- rewind the current file and start over
 
-* Menu:
+     function rewind(    i)
+     {
+         # shift remaining arguments up
+         for (i = ARGC; i > ARGIND; i--)
+             ARGV[i] = ARGV[i-1]
 
-* Nondecimal Data::             Allowing nondecimal input data.
-* Array Sorting::               Facilities for controlling array traversal and
-                                sorting arrays.
-* Two-way I/O::                 Two-way communications with another process.
-* TCP/IP Networking::           Using `gawk' for network programming.
-* Profiling::                   Profiling your `awk' programs.
+         # make sure gawk knows to keep going
+         ARGC++
 
-
-File: gawk.info,  Node: Nondecimal Data,  Next: Array Sorting,  Up: Advanced Features
+         # make current file next to get done
+         ARGV[ARGIND+1] = FILENAME
 
-11.1 Allowing Nondecimal Input Data
-===================================
+         # do it
+         nextfile
+     }
 
-If you run `gawk' with the `--non-decimal-data' option, you can have
-nondecimal constants in your input data:
+   This code relies on the `ARGIND' variable (*note Auto-set::), which
+is specific to `gawk'.  If you are not using `gawk', you can use ideas
+presented in *note Filetrans Function::, to either update `ARGIND' on
+your own or modify this code as appropriate.
 
-     $ echo 0123 123 0x123 |
-     > gawk --non-decimal-data '{ printf "%d, %d, %d\n",
-     >                                         $1, $2, $3 }'
-     -| 83, 123, 291
+   The `rewind()' function also relies on the `nextfile' keyword (*note
+Nextfile Statement::).
 
-   For this feature to work, write your program so that `gawk' treats
-your data as numeric:
+
+File: gawk.info,  Node: File Checking,  Next: Empty Files,  Prev: Rewind Function,  Up: Data File Management
 
-     $ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }'
-     -| 0123 123 0x123
+10.3.3 Checking for Readable Data Files
+---------------------------------------
 
-The `print' statement treats its expressions as strings.  Although the
-fields can act as numbers when necessary, they are still strings, so
-`print' does not try to treat them numerically.  You may need to add
-zero to a field to force it to be treated as a number.  For example:
+Normally, if you give `awk' a data file that isn't readable, it stops
+with a fatal error.  There are times when you might want to just ignore
+such files and keep going.  You can do this by prepending the following
+program to your `awk' program:
 
-     $ echo 0123 123 0x123 | gawk --non-decimal-data '
-     > { print $1, $2, $3
-     >   print $1 + 0, $2 + 0, $3 + 0 }'
-     -| 0123 123 0x123
-     -| 83 123 291
+     # readable.awk --- library file to skip over unreadable files
 
-   Because it is common to have decimal data with leading zeros, and
-because using this facility could lead to surprising results, the
-default is to leave it disabled.  If you want it, you must explicitly
-request it.
+     BEGIN {
+         for (i = 1; i < ARGC; i++) {
+             if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
+                 || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
+                 continue    # assignment or standard input
+             else if ((getline junk < ARGV[i]) < 0) # unreadable
+                 delete ARGV[i]
+             else
+                 close(ARGV[i])
+         }
+     }
 
-     CAUTION: _Use of this option is not recommended._ It can break old
-     programs very badly.  Instead, use the `strtonum()' function to
-     convert your data (*note Nondecimal-numbers::).  This makes your
-     programs easier to write and easier to read, and leads to less
-     surprising results.
+   This works, because the `getline' won't be fatal.  Removing the
+element from `ARGV' with `delete' skips the file (since it's no longer
+in the list).  See also *note ARGC and ARGV::.
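
A small sketch of the effect, assuming a nonexistent path stands in for
an unreadable file.  The check for assignments and standard input is
left out to keep the example short, and the wrapper name
`readable_demo' is hypothetical:

```shell
readable_demo() {
    tmp=$(mktemp) || return 1
    printf 'hello\n' > "$tmp"
    awk '
    BEGIN {
        for (i = 1; i < ARGC; i++) {
            if ((getline junk < ARGV[i]) < 0)   # unreadable
                delete ARGV[i]
            else
                close(ARGV[i])
        }
    }
    { print }
    ' "$tmp.missing" "$tmp"
    rm -f "$tmp"
}
readable_demo
```

The missing file is dropped from `ARGV' without a fatal error, and the
readable file is processed as usual.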
 
 
-File: gawk.info,  Node: Array Sorting,  Next: Two-way I/O,  Prev: Nondecimal Data,  Up: Advanced Features
+File: gawk.info,  Node: Empty Files,  Next: Ignoring Assigns,  Prev: File Checking,  Up: Data File Management
 
-11.2 Controlling Array Traversal and Array Sorting
-==================================================
+10.3.4 Checking For Zero-length Files
+-------------------------------------
 
-`gawk' lets you control the order in which a `for (i in array)' loop
-traverses an array.
+All known `awk' implementations silently skip over zero-length files.
+This is a by-product of `awk''s implicit
+read-a-record-and-match-against-the-rules loop: when `awk' tries to
+read a record from an empty file, it immediately receives an end of
+file indication, closes the file, and proceeds on to the next
+command-line data file, _without_ executing any user-level `awk'
+program code.
 
-   In addition, two built-in functions, `asort()' and `asorti()', let
-you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
+   Using `gawk''s `ARGIND' variable (*note Built-in Variables::), it is
+possible to detect when an empty data file has been skipped.  Similar
+to the library file presented in *note Filetrans Function::, the
+following library file calls a function named `zerofile()' that the
+user must provide.  The arguments passed are the file name and the
+position in `ARGV' where it was found:
 
-* Menu:
+     # zerofile.awk --- library file to process empty input files
 
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions::     How to use `asort()' and `asorti()'.
+     BEGIN { Argind = 0 }
 
-
-File: gawk.info,  Node: Controlling Array Traversal,  Next: Array Sorting Functions,  Up: Array Sorting
+     ARGIND > Argind + 1 {
+         for (Argind++; Argind < ARGIND; Argind++)
+             zerofile(ARGV[Argind], Argind)
+     }
 
-11.2.1 Controlling Array Traversal
-----------------------------------
+     ARGIND != Argind { Argind = ARGIND }
 
-By default, the order in which a `for (i in array)' loop scans an array
-is not defined; it is generally based upon the internal implementation
-of arrays inside `awk'.
+     END {
+         if (ARGIND > Argind)
+             for (Argind++; Argind <= ARGIND; Argind++)
+                 zerofile(ARGV[Argind], Argind)
+     }
 
-   Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose.  `gawk' lets
-you do this.
+   The user-level variable `Argind' allows the `awk' program to track
+its progress through `ARGV'.  Whenever the program detects that
+`ARGIND' is greater than `Argind + 1', it means that one or more empty
+files were skipped.  The action then calls `zerofile()' for each such
+file, incrementing `Argind' along the way.
 
-   *note Controlling Scanning::, describes how you can assign special,
-pre-defined values to `PROCINFO["sorted_in"]' in order to control the
-order in which `gawk' will traverse an array during a `for' loop.
+   The `ARGIND != Argind' rule simply keeps `Argind' up to date in the
+normal case.
 
-   In addition, the value of `PROCINFO["sorted_in"]' can be a function
-name.  This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function.  The comparison function should be defined with at least four
-arguments:
+   Finally, the `END' rule catches the case of any empty files at the
+end of the command-line arguments.  Note that the test in the condition
+of the `for' loop uses the `<=' operator, not `<'.
 
-     function comp_func(i1, v1, i2, v2)
-     {
-         COMPARE ELEMENTS 1 AND 2 IN SOME FASHION
-         RETURN < 0; 0; OR > 0
-     }
+   As an exercise, you might consider whether this same problem can be
+solved without relying on `gawk''s `ARGIND' variable.
 
-   Here, I1 and I2 are the indices, and V1 and V2 are the corresponding
-values of the two elements being compared.  Either V1 or V2, or both,
-can be arrays if the array being traversed contains subarrays as values.
-(*Note Arrays of Arrays::, for more information about subarrays.)  The
-three possible return values are interpreted as follows:
+   As a second exercise, revise this code to handle the case where an
+intervening value in `ARGV' is a variable assignment.
 
-`comp_func(i1, v1, i2, v2) < 0'
-     Index I1 comes before index I2 during loop traversal.
+
+File: gawk.info,  Node: Ignoring Assigns,  Prev: Empty Files,  Up: Data File Management
 
-`comp_func(i1, v1, i2, v2) == 0'
-     Indices I1 and I2 come together but the relative order with
-     respect to each other is undefined.
+10.3.5 Treating Assignments as File Names
+-----------------------------------------
 
-`comp_func(i1, v1, i2, v2) > 0'
-     Index I1 comes after index I2 during loop traversal.
+Occasionally, you might not want `awk' to process command-line variable
+assignments (*note Assignment Options::).  In particular, if you have a
+file name that contains an `=' character, `awk' treats the file name as
+an assignment, and does not process it.
 
-   Our first comparison function can be used to scan an array in
-numerical order of the indices:
+   Some users have suggested an additional command-line option for
+`gawk' to disable command-line assignments.  However, some simple
+programming with a library file does the trick:
 
-     function cmp_num_idx(i1, v1, i2, v2)
+     # noassign.awk --- library file to avoid the need for a
+     # special option that disables command-line assignments
+
+     function disable_assigns(argc, argv,    i)
      {
-          # numerical index comparison, ascending order
-          return (i1 - i2)
+         for (i = 1; i < argc; i++)
+             if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
+                 argv[i] = ("./" argv[i])
      }
 
-   Our second function traverses an array based on the string order of
-the element values rather than by indices:
-
-     function cmp_str_val(i1, v1, i2, v2)
-     {
-         # string value comparison, ascending order
-         v1 = v1 ""
-         v2 = v2 ""
-         if (v1 < v2)
-             return -1
-         return (v1 != v2)
+     BEGIN {
+         if (No_command_assign)
+             disable_assigns(ARGC, ARGV)
      }
 
-   The third comparison function makes all numbers, and numeric strings
-without any leading or trailing spaces, come out first during loop
-traversal:
+   You then run your program this way:
 
-     function cmp_num_str_val(i1, v1, i2, v2,   n1, n2)
-     {
-          # numbers before string value comparison, ascending order
-          n1 = v1 + 0
-          n2 = v2 + 0
-          if (n1 == v1)
-              return (n2 == v2) ? (n1 - n2) : -1
-          else if (n2 == v2)
-              return 1
-          return (v1 < v2) ? -1 : (v1 != v2)
-     }
+     awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
 
-   Here is a main program to demonstrate how `gawk' behaves using each
-of the previous functions:
+   The function works by looping through the arguments.  It prepends
+`./' to any argument that matches the form of a variable assignment,
+turning that argument into a file name.
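
A minimal sketch of the effect, wrapped in a shell function so it can be
run directly.  The wrapper name `demo_disable_assigns' is hypothetical,
and the pattern uses plain ASCII ranges instead of `[[:alpha:]]' (an
assumption made here for older `awk' versions; it is not the library's
own regexp).  The program only prints `ARGV' and reads no files:

```shell
# demo_disable_assigns is a hypothetical wrapper, not part of noassign.awk.
demo_disable_assigns() {
    awk '
    function disable_assigns(argc, argv,    i)
    {
        # ASCII-only stand-in for /^[[:alpha:]_][[:alnum:]_]*=.*/
        for (i = 1; i < argc; i++)
            if (argv[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=.*/)
                argv[i] = ("./" argv[i])
    }
    BEGIN {
        disable_assigns(ARGC, ARGV)
        # Show what awk would now see as its arguments.
        for (i = 1; i < ARGC; i++)
            print ARGV[i]
    }' "$@"
}

demo_disable_assigns a=b.txt data.txt 9x=y
```

Only the first argument has the form of an assignment, so only it gains
the `./' prefix; `9x=y' starts with a digit and is left alone.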
 
-     BEGIN {
-         data["one"] = 10
-         data["two"] = 20
-         data[10] = "one"
-         data[100] = 100
-         data[20] = "two"
+   The use of `No_command_assign' allows you to disable command-line
+assignments at invocation time, by giving the variable a true value.
+When not set, it is initially zero (i.e., false), so the command-line
+arguments are left alone.
 
-         f[1] = "cmp_num_idx"
-         f[2] = "cmp_str_val"
-         f[3] = "cmp_num_str_val"
-         for (i = 1; i <= 3; i++) {
-             printf("Sort function: %s\n", f[i])
-             PROCINFO["sorted_in"] = f[i]
-             for (j in data)
-                 printf("\tdata[%s] = %s\n", j, data[j])
-             print ""
-         }
-     }
+
+File: gawk.info,  Node: Getopt Function,  Next: Passwd Functions,  Prev: Data File Management,  Up: Library Functions
 
-   Here are the results when the program is run:
+10.4 Processing Command-Line Options
+====================================
 
-     $ gawk -f compdemo.awk
-     -| Sort function: cmp_num_idx      Sort by numeric index
-     -|     data[two] = 20
-     -|     data[one] = 10              Both strings are numerically zero
-     -|     data[10] = one
-     -|     data[20] = two
-     -|     data[100] = 100
-     -|
-     -| Sort function: cmp_str_val      Sort by element values as strings
-     -|     data[one] = 10
-     -|     data[100] = 100             String 100 is less than string 20
-     -|     data[two] = 20
-     -|     data[10] = one
-     -|     data[20] = two
-     -|
-     -| Sort function: cmp_num_str_val  Sort all numeric values before all strings
-     -|     data[one] = 10
-     -|     data[two] = 20
-     -|     data[100] = 100
-     -|     data[10] = one
-     -|     data[20] = two
+Most utilities on POSIX compatible systems take options on the command
+line that can be used to change the way a program behaves.  `awk' is an
+example of such a program (*note Options::).  Often, options take
+"arguments"; i.e., data that the program needs to correctly obey the
+command-line option.  For example, `awk''s `-F' option requires a
+string to use as the field separator.  The first occurrence on the
+command line of either `--' or a string that does not begin with `-'
+ends the options.
 
-   Consider sorting the entries of a GNU/Linux system password file
-according to login name.  The following program sorts records by a
-specific field position and can be used for this purpose:
+   Modern Unix systems provide a C function named `getopt()' for
+processing command-line arguments.  The programmer provides a string
+describing the one-letter options. If an option requires an argument,
+it is followed in the string with a colon.  `getopt()' is also passed
+the count and values of the command-line arguments and is called in a
+loop.  `getopt()' processes the command-line arguments for option
+letters.  Each time around the loop, it returns a single character
+representing the next option letter that it finds, or `?' if it finds
+an invalid option.  When it returns -1, there are no options left on
+the command line.
 
-     # sort.awk --- simple program to sort by field position
-     # field position is specified by the global variable POS
+   When using `getopt()', options that do not take arguments can be
+grouped together.  Furthermore, options that take arguments require
+that the argument be present.  The argument can immediately follow the
+option letter, or it can be a separate command-line argument.
 
-     function cmp_field(i1, v1, i2, v2)
-     {
-         # comparison by value, as string, and ascending order
-         return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
-     }
+   Given a hypothetical program that takes three command-line options,
+`-a', `-b', and `-c', where `-b' requires an argument, all of the
+following are valid ways of invoking the program:
 
-     {
-         for (i = 1; i <= NF; i++)
-             a[NR][i] = $i
-     }
+     prog -a -b foo -c data1 data2 data3
+     prog -ac -bfoo -- data1 data2 data3
+     prog -acbfoo data1 data2 data3
 
-     END {
-         PROCINFO["sorted_in"] = "cmp_field"
-         if (POS < 1 || POS > NF)
-             POS = 1
-         for (i in a) {
-             for (j = 1; j <= NF; j++)
-                 printf("%s%c", a[i][j], j < NF ? ":" : "")
-             print ""
-         }
-     }
+   Notice that when the argument is grouped with its option, the rest of
+the argument is considered to be the option's argument.  In this
+example, `-acbfoo' indicates that all of the `-a', `-b', and `-c'
+options were supplied, and that `foo' is the argument to the `-b'
+option.
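
The grouping rules can be tried out with the POSIX shell's `getopts'
builtin, which follows the same conventions as the C function.  This is
only a sketch: `getopts' is swapped in for C `getopt()' so that the
example runs without a compiler, and the function name `parse' and its
output format are made up for illustration:

```shell
# parse is a hypothetical helper; getopts stands in for C getopt().
parse() {
    OPTIND=1                      # restart option scanning on every call
    result=""
    while getopts "ab:c" opt; do  # "b:" means -b requires an argument
        case $opt in
        a|c) result="$result -$opt" ;;
        b)   result="$result -b=$OPTARG" ;;
        *)   return 1 ;;
        esac
    done
    shift $((OPTIND - 1))         # drop the options, keep the operands
    printf 'options:%s operands: %s\n' "$result" "$*"
}

parse -a -b foo -c data1 data2 data3
parse -ac -bfoo -- data1 data2 data3
parse -acbfoo data1 data2 data3
```

All three calls report the same three options, with `foo' as the
argument to `-b', and the same three operands; only the order in which
the options are seen differs.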
 
-   The first field in each entry of the password file is the user's
-login name, and the fields are separated by colons.  Each record
-defines a subarray, with each field as an element in the subarray.
-Running the program produces the following output:
+   `getopt()' provides four external variables that the programmer can
+use:
 
-     $ gawk -v POS=1 -F: -f sort.awk /etc/passwd
-     -| adm:x:3:4:adm:/var/adm:/sbin/nologin
-     -| apache:x:48:48:Apache:/var/www:/sbin/nologin
-     -| avahi:x:70:70:Avahi daemon:/:/sbin/nologin
-     ...
+`optind'
+     The index in the argument value array (`argv') where the first
+     nonoption command-line argument can be found.
 
-   The comparison should normally always return the same value when
-given a specific pair of array elements as its arguments.  If
-inconsistent results are returned then the order is undefined.  This
-behavior can be exploited to introduce random order into otherwise
-seemingly ordered data:
+`optarg'
+     The string value of the argument to an option.
 
-     function cmp_randomize(i1, v1, i2, v2)
-     {
-         # random order
-         return (2 - 4 * rand())
-     }
+`opterr'
+     Usually `getopt()' prints an error message when it finds an invalid
+     option.  Setting `opterr' to zero disables this feature.  (An
+     application might want to print its own error message.)
 
-   As mentioned above, the order of the indices is arbitrary if two
-elements compare equal.  This is usually not a problem, but letting the
-tied elements come out in arbitrary order can be an issue, especially
-when comparing item values.  The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added
-or removed from the array.  One way to resolve ties when comparing
-elements with otherwise equal values is to include the indices in the
-comparison rules.  Note that doing this may make the loop traversal
-less efficient, so consider it only if necessary.  The following
-comparison functions force a deterministic order, and are based on the
-fact that the indices of two elements are never equal:
+`optopt'
+     The letter representing the command-line option.
 
-     function cmp_numeric(i1, v1, i2, v2)
-     {
-         # numerical value (and index) comparison, descending order
-         return (v1 != v2) ? (v2 - v1) : (i2 - i1)
-     }
+   The following C fragment shows how `getopt()' might process
+command-line arguments for `awk':
 
-     function cmp_string(i1, v1, i2, v2)
+     int
+     main(int argc, char *argv[])
      {
-         # string value (and index) comparison, descending order
-         v1 = v1 i1
-         v2 = v2 i2
-         return (v1 > v2) ? -1 : (v1 != v2)
+         ...
+         /* print our own message */
+         opterr = 0;
+         while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
+             switch (c) {
+             case 'f':    /* file */
+                 ...
+                 break;
+             case 'F':    /* field separator */
+                 ...
+                 break;
+             case 'v':    /* variable assignment */
+                 ...
+                 break;
+             case 'W':    /* extension */
+                 ...
+                 break;
+             case '?':
+             default:
+                 usage();
+                 break;
+             }
+         }
+         ...
      }
 
-   A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to designing
-such a function.
+   As a side point, `gawk' actually uses the GNU `getopt_long()'
+function to process both normal and GNU-style long options (*note
+Options::).
 
-   When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices handled
-as strings, the value of `IGNORECASE' (*note Built-in Variables::)
-controls whether the comparisons treat corresponding uppercase and
-lowercase letters as equivalent or distinct.
+   The abstraction provided by `getopt()' is very useful and is quite
+handy in `awk' programs as well.  Following is an `awk' version of
+`getopt()'.  This function highlights one of the greatest weaknesses in
+`awk', which is that it is very poor at manipulating single characters.
+Repeated calls to `substr()' are necessary for accessing individual
+characters (*note String Functions::).(1)
 
-   Another point to keep in mind is that in the case of subarrays the
-element values can themselves be arrays; a production comparison
-function should use the `isarray()' function (*note Type Functions::),
-to check for this, and choose a defined sorting order for subarrays.
+   The discussion that follows walks through the code a bit at a time:
 
-   All sorting based on `PROCINFO["sorted_in"]' is disabled in POSIX
-mode, since the `PROCINFO' array is not special in that case.
+     # getopt.awk --- Do C library getopt(3) function in awk
 
-   As a side note, sorting the array indices before traversing the
-array has been reported to add 15% to 20% overhead to the execution
-time of `awk' programs. For this reason, sorted array traversal is not
-the default.
+     # External variables:
+     #    Optind -- index in ARGV of first nonoption argument
+     #    Optarg -- string value of argument to current option
+     #    Opterr -- if nonzero, print our own diagnostic
+     #    Optopt -- current option letter
 
-
-File: gawk.info,  Node: Array Sorting Functions,  Prev: Controlling Array Traversal,  Up: Array Sorting
+     # Returns:
+     #    -1     at end of options
+     #    "?"    for unrecognized option
+     #    <c>    a character representing the current option
 
-11.2.2 Sorting Array Values and Indices with `gawk'
----------------------------------------------------
+     # Private Data:
+     #    _opti  -- index in multi-flag option, e.g., -abc
 
-In most `awk' implementations, sorting an array requires writing a
-`sort()' function.  While this can be educational for exploring
-different sorting algorithms, usually that's not the point of the
-program.  `gawk' provides the built-in `asort()' and `asorti()'
-functions (*note String Functions::) for sorting arrays.  For example:
+   The function starts out with comments presenting a list of the
+global variables it uses, what the return values are, what they mean,
+and any global variables that are "private" to this library function.
+Such documentation is essential for any program, and particularly for
+library functions.
 
-     POPULATE THE ARRAY data
-     n = asort(data)
-     for (i = 1; i <= n; i++)
-         DO SOMETHING WITH data[i]
+   The `getopt()' function first checks that it was indeed called with
+a string of options (the `options' parameter).  If `options' has a zero
+length, `getopt()' immediately returns -1:
 
-   After the call to `asort()', the array `data' is indexed from 1 to
-some number N, the total number of elements in `data'.  (This count is
-`asort()''s return value.)  `data[1]' <= `data[2]' <= `data[3]', and so
-on.  The comparison is based on the type of the elements (*note Typing
-and Comparison::).  All numeric values come before all string values,
-which in turn come before all subarrays.
+     function getopt(argc, argv, options,    thisopt, i)
+     {
+         if (length(options) == 0)    # no options given
+             return -1
 
-   An important side effect of calling `asort()' is that _the array's
-original indices are irrevocably lost_.  As this isn't always
-desirable, `asort()' accepts a second argument:
+         if (argv[Optind] == "--") {  # all done
+             Optind++
+             _opti = 0
+             return -1
+         } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
+             _opti = 0
+             return -1
+         }
 
-     POPULATE THE ARRAY source
-     n = asort(source, dest)
-     for (i = 1; i <= n; i++)
-         DO SOMETHING WITH dest[i]
+   The next thing to check for is the end of the options.  A `--' ends
+the command-line options, as does any command-line argument that does
+not begin with a `-'.  `Optind' is used to step through the array of
+command-line arguments; it retains its value across calls to
+`getopt()', because it is a global variable.
 
-   In this case, `gawk' copies the `source' array into the `dest' array
-and then sorts `dest', destroying its indices.  However, the `source'
-array is not affected.
+   The regular expression that is used, `/^-[^:[:space:]]/', checks for
+a `-' followed by anything that is not whitespace and not a colon.  If
+the current command-line argument does not match this pattern, it is
+not an option, and it ends option processing. Continuing on:
 
-   `asort()' accepts a third string argument to control comparison of
-array elements.  As with `PROCINFO["sorted_in"]', this argument may be
-one of the predefined names that `gawk' provides (*note Controlling
-Scanning::), or the name of a user-defined function (*note Controlling
-Array Traversal::).
+         if (_opti == 0)
+             _opti = 2
+         thisopt = substr(argv[Optind], _opti, 1)
+         Optopt = thisopt
+         i = index(options, thisopt)
+         if (i == 0) {
+             if (Opterr)
+                 printf("%c -- invalid option\n",
+                                       thisopt) > "/dev/stderr"
+             if (_opti >= length(argv[Optind])) {
+                 Optind++
+                 _opti = 0
+             } else
+                 _opti++
+             return "?"
+         }
 
-     NOTE: In all cases, the sorted element values consist of the
-     original array's element values.  The ability to control
-     comparison merely affects the way in which they are sorted.
+   The `_opti' variable tracks the position in the current command-line
+argument (`argv[Optind]').  If multiple options are grouped together
+with one `-' (e.g., `-abx'), it is necessary to return them to the user
+one at a time.
 
-   Often, what's needed is to sort on the values of the _indices_
-instead of the values of the elements.  To do that, use the `asorti()'
-function.  The interface is identical to that of `asort()', except that
-the index values are used for sorting, and become the values of the
-result array:
+   If `_opti' is equal to zero, it is set to two, which is the index in
+the string of the next character to look at (we skip the `-', which is
+at position one).  The variable `thisopt' holds the character, obtained
+with `substr()'.  It is saved in `Optopt' for the main program to use.
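
As a rough illustration of that stepping, the sketch below walks a
grouped argument one character at a time, starting at position two.
The names `split_flags', `arg', and `pos' are hypothetical; `pos' plays
the role that `_opti' plays in the library function:

```shell
# split_flags is a hypothetical helper showing substr()-based stepping.
split_flags() {
    awk -v arg="$1" 'BEGIN {
        # Position 1 holds the "-", so the first option letter is at 2.
        for (pos = 2; pos <= length(arg); pos++)
            print pos, substr(arg, pos, 1)
    }'
}

split_flags -abx
```

Each output line pairs a position with the option letter found there,
which is what `getopt()' hands back one call at a time.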
 
-     { source[$0] = some_func($0) }
+   If `thisopt' is not in the `options' string, then it is an invalid
+option.  If `Opterr' is nonzero, `getopt()' prints an error message on
+the standard error that is similar to the message from the C version of
+`getopt()'.
 
-     END {
-         n = asorti(source, dest)
-         for (i = 1; i <= n; i++) {
-             Work with sorted indices directly:
-             DO SOMETHING WITH dest[i]
-             ...
-             Access original array via sorted indices:
-             DO SOMETHING WITH source[dest[i]]
-         }
-     }
+   Because the option is invalid, it is necessary to skip it and move
+on to the next option character.  If `_opti' is greater than or equal
+to the length of the current command-line argument, it is necessary to
+move on to the next argument, so `Optind' is incremented and `_opti' is
+reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
+incremented.
 
-   Similar to `asort()', in all cases, the sorted element values
-consist of the original array's indices.  The ability to control
-comparison merely affects the way in which they are sorted.
+   In any case, because the option is invalid, `getopt()' returns `"?"'.
+The main program can examine `Optopt' if it needs to know what the
+invalid option letter actually is. Continuing on:
 
-   Sorting the array by replacing the indices provides maximal
-flexibility.  To traverse the elements in decreasing order, use a loop
-that goes from N down to 1, either over the elements or over the
-indices.(1)
+         if (substr(options, i + 1, 1) == ":") {
+             # get option argument
+             if (length(substr(argv[Optind], _opti + 1)) > 0)
+                 Optarg = substr(argv[Optind], _opti + 1)
+             else
+                 Optarg = argv[++Optind]
+             _opti = 0
+         } else
+             Optarg = ""
 
-   Copying array indices and elements isn't expensive in terms of
-memory.  Internally, `gawk' maintains "reference counts" to data.  For
-example, when `asort()' copies the first array to the second one, there
-is only one copy of the original array elements' data, even though both
-arrays use the values.
+   If the option requires an argument, the option letter is followed by
+a colon in the `options' string.  If there are remaining characters in
+the current command-line argument (`argv[Optind]'), then the rest of
+that string is assigned to `Optarg'.  Otherwise, the next command-line
+argument is used (`-xFOO' versus `-x FOO'). In either case, `_opti' is
+reset to zero, because there are no more characters left to examine in
+the current command-line argument. Continuing:
 
-   Because `IGNORECASE' affects string comparisons, the value of
-`IGNORECASE' also affects sorting for both `asort()' and `asorti()'.
-Note also that the locale's sorting order does _not_ come into play;
-comparisons are based on character values only.(2) Caveat Emptor.
+         if (_opti == 0 || _opti >= length(argv[Optind])) {
+             Optind++
+             _opti = 0
+         } else
+             _opti++
+         return thisopt
+     }
 
-   ---------- Footnotes ----------
+   Finally, if `_opti' is either zero or greater than or equal to the
+length of the current command-line argument, this element in `argv' is
+through being processed, so `Optind' is incremented to point to the
+next element in `argv'.  If neither condition is true, then only
+`_opti' is incremented, so that the next option letter can be processed
+on the next call to `getopt()'.
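
The rule for picking up an option's argument (`-xFOO' versus `-x FOO')
can be isolated in a few lines.  This sketch assumes the option letter
sits at position two of the current argument; the helper `optarg_of'
and the variables `cur', `nxt', and `rest' are invented for the example
and do not appear in `getopt.awk':

```shell
# optarg_of is a hypothetical helper: $1 is the current argument,
# $2 is the argument that follows it on the command line.
optarg_of() {
    awk -v cur="$1" -v nxt="$2" 'BEGIN {
        rest = substr(cur, 3)     # whatever follows "-x"
        # Use the attached text if there is any, else the next argument.
        print (length(rest) > 0 ? rest : nxt)
    }'
}

optarg_of -xFOO bar
optarg_of -x FOO
```

Both calls yield `FOO': in the first it is attached to the option
letter, in the second it is taken from the following argument.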
 
-   (1) You may also use one of the predefined sorting names that sorts
-in decreasing order.
+   The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
+`Opterr' is set to one, since the default behavior is for `getopt()' to
+print a diagnostic message upon seeing an invalid option.  `Optind' is
+set to one, since there's no reason to look at the program name, which
+is in `ARGV[0]':
 
-   (2) This is true because locale-based comparison occurs only when in
-POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk'
-extensions, they are not available in that case.
+     BEGIN {
+         Opterr = 1    # default is to diagnose
+         Optind = 1    # skip ARGV[0]
 
-
-File: gawk.info,  Node: Two-way I/O,  Next: TCP/IP Networking,  Prev: Array Sorting,  Up: Advanced Features
+         # test program
+         if (_getopt_test) {
+             while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
+                 printf("c = <%c>, optarg = <%s>\n",
+                                            _go_c, Optarg)
+             printf("non-option arguments:\n")
+             for (; Optind < ARGC; Optind++)
+                 printf("\tARGV[%d] = <%s>\n",
+                                         Optind, ARGV[Optind])
+         }
+     }
 
-11.3 Two-Way Communications with Another Process
-================================================
+   The rest of the `BEGIN' rule is a simple test program.  Here is the
+result of two sample runs of the test program:
 
-     From: address@hidden (Mike Brennan)
-     Newsgroups: comp.lang.awk
-     Subject: Re: Learn the SECRET to Attract Women Easily
-     Date: 4 Aug 1997 17:34:46 GMT
-     Message-ID: <address@hidden>
+     $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
+     -| c = <a>, optarg = <>
+     -| c = <c>, optarg = <>
+     -| c = <b>, optarg = <ARG>
+     -| non-option arguments:
+     -|         ARGV[3] = <bax>
+     -|         ARGV[4] = <-x>
 
-     On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-     <address@hidden> wrote:
-     >Learn the SECRET to Attract Women Easily
-     >
-     >The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
+     $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
+     -| c = <a>, optarg = <>
+     error--> x -- invalid option
+     -| c = <?>, optarg = <>
+     -| non-option arguments:
+     -|         ARGV[4] = <xyz>
+     -|         ARGV[5] = <abc>
 
-     The scent of awk programmers is a lot more attractive to women than
-     the scent of perl programmers.
-     --
-     Mike Brennan
+   In both runs, the first `--' terminates the arguments to `awk', so
+that it does not try to interpret the `-a', etc., as its own options.
 
-   It is often useful to be able to send data to a separate program for
-processing and then read the result.  This can always be done with
-temporary files:
+     NOTE: After `getopt()' is through, it is the responsibility of the
+     user level code to clear out all the elements of `ARGV' from 1 up
+     to, but not including, `Optind', so that `awk' does not try to
+     process the command-line options as file names.
 
-     # Write the data for processing
-     tempfile = ("mydata." PROCINFO["pid"])
-     while (NOT DONE WITH DATA)
-         print DATA | ("subprogram > " tempfile)
-     close("subprogram > " tempfile)
+   Several of the sample programs presented in *note Sample Programs::,
+use `getopt()' to process their arguments.
 
-     # Read the results, remove tempfile when done
-     while ((getline newdata < tempfile) > 0)
-         PROCESS newdata APPROPRIATELY
-     close(tempfile)
-     system("rm " tempfile)
+   ---------- Footnotes ----------
 
-This works, but not elegantly.  Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, `/tmp' will not do, as another user might happen to be
-using a temporary file with the same name.
+   (1) This function was written before `gawk' acquired the ability to
+split strings into single characters using `""' as the separator.  We
+have left it alone, since using `substr()' is more portable.
 
-   However, with `gawk', it is possible to open a _two-way_ pipe to
-another process.  The second process is termed a "coprocess", since it
-runs in parallel with `gawk'.  The two-way connection is created using
-the `|&' operator (borrowed from the Korn shell, `ksh'):(1)
+
+File: gawk.info,  Node: Passwd Functions,  Next: Group Functions,  Prev: Getopt Function,  Up: Library Functions
 
-     do {
-         print DATA |& "subprogram"
-         "subprogram" |& getline results
-     } while (DATA LEFT TO PROCESS)
-     close("subprogram")
+10.5 Reading the User Database
+==============================
 
-   The first time an I/O operation is executed using the `|&' operator,
-`gawk' creates a two-way pipeline to a child process that runs the
-other program.  Output created with `print' or `printf' is written to
-the program's standard input, and output from the program's standard
-output can be read by the `gawk' program using `getline'.  As is the
-case with processes started by `|', the subprogram can be any program,
-or pipeline of programs, that can be started by the shell.
+The `PROCINFO' array (*note Built-in Variables::) provides access to
+the current user's real and effective user and group ID numbers, and if
+available, the user's supplementary group set.  However, because these
+are numbers, they do not provide very useful information to the average
+user.  There needs to be some way to find the user information
+associated with the user and group ID numbers.  This minor node
+presents a suite of functions for retrieving information from the user
+database.  *Note Group Functions::, for a similar suite that retrieves
+information from the group database.
 
-   There are some cautionary items to be aware of:
+   The POSIX standard does not define the file where user information is
+kept.  Instead, it provides the `<pwd.h>' header file and several C
+language subroutines for obtaining user information.  The primary
+function is `getpwent()', for "get password entry."  The "password"
+comes from the original user database file, `/etc/passwd', which stores
+user information, along with the encrypted passwords (hence the name).
 
-   * As the code inside `gawk' currently stands, the coprocess's
-     standard error goes to the same place that the parent `gawk''s
-     standard error goes. It is not possible to read the child's
-     standard error separately.
+   While an `awk' program could simply read `/etc/passwd' directly,
+this file may not contain complete information about the system's set
+of users.(1) To be sure you are able to produce a readable and complete
+version of the user database, it is necessary to write a small C
+program that calls `getpwent()'.  `getpwent()' is defined as returning
+a pointer to a `struct passwd'.  Each time it is called, it returns the
+next entry in the database.  When there are no more entries, it returns
+`NULL', the null pointer.  When this happens, the C program should call
+`endpwent()' to close the database.  Following is `pwcat', a C program
+that "cats" the password database:
 
-   * I/O buffering may be a problem.  `gawk' automatically flushes all
-     output down the pipe to the coprocess.  However, if the coprocess
-     does not flush its output, `gawk' may hang when doing a `getline'
-     in order to read the coprocess's results.  This could lead to a
-     situation known as "deadlock", where each process is waiting for
-     the other one to do something.
+     /*
+      * pwcat.c
+      *
+      * Generate a printable version of the password database
+      */
+     #include <stdio.h>
+     #include <pwd.h>
 
-   It is possible to close just one end of the two-way pipe to a
-coprocess, by supplying a second argument to the `close()' function of
-either `"to"' or `"from"' (*note Close Files And Pipes::).  These
-strings tell `gawk' to close the end of the pipe that sends data to the
-coprocess or the end that reads from it, respectively.
+     int
+     main(int argc, char **argv)
+     {
+         struct passwd *p;
 
-   This is particularly necessary in order to use the system `sort'
-utility as part of a coprocess; `sort' must read _all_ of its input
-data before it can produce any output.  The `sort' program does not
-receive an end-of-file indication until `gawk' closes the write end of
-the pipe.
+         while ((p = getpwent()) != NULL)
+             printf("%s:%s:%ld:%ld:%s:%s:%s\n",
+                 p->pw_name, p->pw_passwd, (long) p->pw_uid,
+                 (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
 
-   When you have finished writing data to the `sort' utility, you can
-close the `"to"' end of the pipe, and then start reading sorted data
-via `getline'.  For example:
+         endpwent();
+         return 0;
+     }
 
-     BEGIN {
-         command = "LC_ALL=C sort"
-         n = split("abcdefghijklmnopqrstuvwxyz", a, "")
+   If you don't understand C, don't worry about it.  The output from
+`pwcat' is the user database, in the traditional `/etc/passwd' format
+of colon-separated fields.  The fields are:
 
-         for (i = n; i > 0; i--)
-             print a[i] |& command
-         close(command, "to")
+Login name
+     The user's login name.
 
-         while ((command |& getline line) > 0)
-             print "got", line
-         close(command)
-     }
+Encrypted password
+     The user's encrypted password.  This may not be available on some
+     systems.
 
-   This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to `sort'.  It then closes the write
-end of the pipe, so that `sort' receives an end-of-file indication.
-This causes `sort' to sort the data and write the sorted data back to
-the `gawk' program.  Once all of the data has been read, `gawk'
-terminates the coprocess and exits.
+User-ID
+     The user's numeric user ID number.  (On some systems it's a C
+     `long', and not an `int'.  Thus we cast it to `long' for all
+     cases.)
 
-   As a side note, the assignment `LC_ALL=C' in the `sort' command
-ensures traditional Unix (ASCII) sorting from `sort'.
+Group-ID
+     The user's numeric group ID number.  (Similar comments about
+     `long' vs. `int' apply here.)
 
-   You may also use pseudo-ttys (ptys) for two-way communication
-instead of pipes, if your system supports them.  This is done on a
-per-command basis, by setting a special element in the `PROCINFO' array
-(*note Auto-set::), like so:
+Full name
+     The user's full name, and perhaps other information associated
+     with the user.
 
-     command = "sort -nr"           # command, save in convenience variable
-     PROCINFO[command, "pty"] = 1   # update PROCINFO
-     print ... |& command       # start two-way pipe
-     ...
+Home directory
+     The user's login (or "home") directory (familiar to shell
+     programmers as `$HOME').
 
-Using ptys avoids the buffer deadlock issues described earlier, at some
-loss in performance.  If your system does not have ptys, or if all the
-system's ptys are in use, `gawk' automatically falls back to using
-regular pipes.
+Login shell
+     The program that is run when the user logs in.  This is usually a
+     shell, such as Bash.
 
-   ---------- Footnotes ----------
+   A few lines representative of `pwcat''s output are as follows:
 
-   (1) This is very different from the same operator in the C shell.
+     $ pwcat
+     -| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
+     -| nobody:*:65534:65534::/:
+     -| daemon:*:1:1::/:
+     -| sys:*:2:2::/:/bin/csh
+     -| bin:*:3:3::/bin:
+     -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+     -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
+     -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
+     ...
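
Because the format is plain colon-separated text, a single record can
be taken apart by setting the field separator to `:'.  The sketch below
feeds one of the sample lines above to `awk'; the function name
`show_fields' and the output layout are assumptions made for the
example, and no real `pwcat' program is needed to run it:

```shell
# show_fields is a hypothetical helper; it reads passwd-style lines
# on standard input and labels a few of the seven fields.
show_fields() {
    awk -F: '{
        printf("login=%s uid=%d gid=%d home=%s shell=%s\n",
               $1, $3, $4, $6, $7)
    }'
}

printf '%s\n' 'arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh' |
    show_fields
```

Fields one, three, four, six, and seven are the login name, user ID,
group ID, home directory, and login shell, in the order listed earlier.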
 
-
-File: gawk.info,  Node: TCP/IP Networking,  Next: Profiling,  Prev: Two-way I/O,  Up: Advanced Features
+   With that introduction, following is a group of functions for
+getting user information.  There are several functions here,
+corresponding to the C functions of the same names:
 
-11.4 Using `gawk' for Network Programming
-=========================================
+     # passwd.awk --- access password file information
 
-     `EMISTERED':
-     A host is a host from coast to coast,
-     and no-one can talk to host that's close,
-     unless the host that isn't close
-     is busy hung or dead.
+     BEGIN {
+         # tailor this to suit your system
+         _pw_awklib = "/usr/local/libexec/awk/"
+     }
 
-   In addition to being able to open a two-way pipeline to a coprocess
-on the same system (*note Two-way I/O::), it is possible to make a
-two-way connection to another process on another system across an IP
-network connection.
+     function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+     {
+         if (_pw_inited)
+             return
 
-   You can think of this as just a _very long_ two-way pipeline to a
-coprocess.  The way `gawk' decides that you want to use TCP/IP
-networking is by recognizing special file names that begin with one of
-`/inet/', `/inet4/' or `/inet6'.
+         oldfs = FS
+         oldrs = RS
+         olddol0 = $0
+         using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+         using_fpat = (PROCINFO["FS"] == "FPAT")
+         FS = ":"
+         RS = "\n"
 
-   The full syntax of the special file name is
-`/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'.  The
-components are:
+         pwcat = _pw_awklib "pwcat"
+         while ((pwcat | getline) > 0) {
+             _pw_byname[$1] = $0
+             _pw_byuid[$3] = $0
+             _pw_bycount[++_pw_total] = $0
+         }
+         close(pwcat)
+         _pw_count = 0
+         _pw_inited = 1
+         FS = oldfs
+         if (using_fw)
+             FIELDWIDTHS = FIELDWIDTHS
+         else if (using_fpat)
+             FPAT = FPAT
+         RS = oldrs
+         $0 = olddol0
+     }
 
-NET-TYPE
-     Specifies the kind of Internet connection to make.  Use `/inet4/'
-     to force IPv4, and `/inet6/' to force IPv6.  Plain `/inet/' (which
-     used to be the only option) uses the system default, most likely
-     IPv4.
+   The `BEGIN' rule sets a private variable to the directory where
+`pwcat' is stored.  Because it is used to help out an `awk' library
+routine, we have chosen to put it in `/usr/local/libexec/awk'; however,
+you might want it to be in a different directory on your system.
 
-PROTOCOL
-     The protocol to use over IP.  This must be either `tcp', or `udp',
-     for a TCP or UDP IP connection, respectively.  The use of TCP is
-     recommended for most applications.
+   The function `_pw_init()' keeps three copies of the user information
+in three associative arrays.  The arrays are indexed by username
+(`_pw_byname'), by user ID number (`_pw_byuid'), and by order of
+occurrence (`_pw_bycount').  The variable `_pw_inited' is used for
+efficiency, since `_pw_init()' needs to be called only once.
 
-LOCAL-PORT
-     The local TCP or UDP port number to use.  Use a port number of `0'
-     when you want the system to pick a port. This is what you should do
-     when writing a TCP or UDP client.  You may also use a well-known
-     service name, such as `smtp' or `http', in which case `gawk'
-     attempts to determine the predefined port number using the C
-     `getaddrinfo()' function.
+   Because this function uses `getline' to read information from
+`pwcat', it first saves the values of `FS', `RS', and `$0'.  It notes
+in the variable `using_fw' whether field splitting with `FIELDWIDTHS'
+is in effect or not.  Doing so is necessary, since these functions
+could be called from anywhere within a user's program, and the user may
+have his or her own way of splitting records and fields.
 
-REMOTE-HOST
-     The IP address or fully-qualified domain name of the Internet host
-     to which you want to connect.
+   The `using_fw' variable checks `PROCINFO["FS"]', which is
+`"FIELDWIDTHS"' if field splitting is being done with `FIELDWIDTHS'.
+This makes it possible to restore the correct field-splitting mechanism
+later.  The test can only be true for `gawk'.  It is false if using
+`FS' or `FPAT', or on some other `awk' implementation.
 
-REMOTE-PORT
-     The TCP or UDP port number to use on the given REMOTE-HOST.
-     Again, use `0' if you don't care, or else a well-known service
-     name.
+   The code that checks for using `FPAT', using `using_fpat' and
+`PROCINFO["FS"]' is similar.
 
-     NOTE: Failure in opening a two-way socket will result in a
-     non-fatal error being returned to the calling code. The value of
-     `ERRNO' indicates the error (*note Auto-set::).
+   The main part of the function uses a loop to read database lines,
+split the line into fields, and then store the line into each array as
+necessary.  When the loop is done, `_pw_init()' cleans up by closing
+the pipeline, setting `_pw_inited' to one, and restoring `FS' (and
+`FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0'.  The use of
+`_pw_count' is explained shortly.
 
-   Consider the following very simple example:
+   The `getpwnam()' function takes a username as a string argument. If
+that user is in the database, it returns the appropriate line.
+Otherwise, it relies on the array reference to a nonexistent element to
+create the element with the null string as its value:
 
-     BEGIN {
-       Service = "/inet/tcp/0/localhost/daytime"
-       Service |& getline
-       print $0
-       close(Service)
+     function getpwnam(name)
+     {
+         _pw_init()
+         return _pw_byname[name]
      }
 
-   This program reads the current date and time from the local system's
-TCP `daytime' server.  It then prints the results and closes the
-connection.
+   Similarly, the `getpwuid' function takes a user ID number argument.
+If that user number is in the database, it returns the appropriate
+line. Otherwise, it returns the null string:
 
-   Because this topic is extensive, the use of `gawk' for TCP/IP
-programming is documented separately.  See *note (General
-Introduction)Top:: gawkinet, TCP/IP Internetworking with `gawk', for a
-much more complete introduction and discussion, as well as extensive
-examples.
+     function getpwuid(uid)
+     {
+         _pw_init()
+         return _pw_byuid[uid]
+     }
 
-
-File: gawk.info,  Node: Profiling,  Prev: TCP/IP Networking,  Up: Advanced Features
+   The `getpwent()' function simply steps through the database, one
+entry at a time.  It uses `_pw_count' to track its current position in
+the `_pw_bycount' array:
 
-11.5 Profiling Your `awk' Programs
-==================================
+     function getpwent()
+     {
+         _pw_init()
+         if (_pw_count < _pw_total)
+             return _pw_bycount[++_pw_count]
+         return ""
+     }
 
-You may produce execution traces of your `awk' programs.  This is done
-by passing the option `--profile' to `gawk'.  When `gawk' has finished
-running, it creates a profile of your program in a file named
-`awkprof.out'. Because it is profiling, it also executes up to 45%
-slower than `gawk' normally does.
+   The `endpwent()' function resets `_pw_count' to zero, so that
+subsequent calls to `getpwent()' start over again:
 
-   As shown in the following example, the `--profile' option can be
-used to change the name of the file where `gawk' will write the profile:
+     function endpwent()
+     {
+         _pw_count = 0
+     }
 
-     gawk --profile=myprog.prof -f myprog.awk data1 data2
+   A conscious design decision in this suite is that each subroutine
+calls `_pw_init()' to initialize the database arrays.  The overhead of
+running a separate process to generate the user database, and the I/O
+to scan it, are only incurred if the user's main program actually calls
+one of these functions.  If this library file is loaded along with a
+user's program, but none of the routines are ever called, then there is
+no extra runtime overhead.  (The alternative is to move the body of
+`_pw_init()' into a `BEGIN' rule, which always runs `pwcat'.  This
+simplifies the code but runs an extra process that may never be needed.)
 
-In the above example, `gawk' places the profile in `myprog.prof'
-instead of in `awkprof.out'.
+   In turn, calling `_pw_init()' is not too expensive, because the
+`_pw_inited' variable keeps the program from reading the data more than
+once.  If you are worried about squeezing every last cycle out of your
+`awk' program, the check of `_pw_inited' could be moved out of
+`_pw_init()' and duplicated in all the other functions.  In practice,
+this is not necessary, since most `awk' programs are I/O-bound, and
+such a change would clutter up the code.
 
-   Here is a sample session showing a simple `awk' program, its input
-data, and the results from running `gawk' with the `--profile' option.
-First, the `awk' program:
+   The `id' program in *note Id Program::, uses these functions.
 
-     BEGIN { print "First BEGIN rule" }
+   ---------- Footnotes ----------
 
-     END { print "First END rule" }
+   (1) It is often the case that password information is stored in a
+network database.
 
-     /foo/ {
-         print "matched /foo/, gosh"
-         for (i = 1; i <= 3; i++)
-             sing()
-     }
+
+File: gawk.info,  Node: Group Functions,  Next: Walking Arrays,  Prev: Passwd Functions,  Up: Library Functions
 
-     {
-         if (/foo/)
-             print "if is true"
-         else
-             print "else is true"
-     }
+10.6 Reading the Group Database
+===============================
 
-     BEGIN { print "Second BEGIN rule" }
+Much of the discussion presented in *note Passwd Functions::, applies
+to the group database as well.  Although there has traditionally been a
+well-known file (`/etc/group') in a well-known format, the POSIX
+standard only provides a set of C library routines (`<grp.h>' and
+`getgrent()') for accessing the information.  Even though this file may
+exist, it may not have complete information.  Therefore, as with the
+user database, it is necessary to have a small C program that generates
+the group database as its output.  `grcat', a C program that "cats" the
+group database, is as follows:
 
-     END { print "Second END rule" }
+     /*
+      * grcat.c
+      *
+      * Generate a printable version of the group database
+      */
+     #include <stdio.h>
+     #include <grp.h>
 
-     function sing(    dummy)
+     int
+     main(int argc, char **argv)
      {
-         print "I gotta be me!"
+         struct group *g;
+         int i;
+
+         while ((g = getgrent()) != NULL) {
+             printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
+                                          (long) g->gr_gid);
+             for (i = 0; g->gr_mem[i] != NULL; i++) {
+                 printf("%s", g->gr_mem[i]);
+                 if (g->gr_mem[i+1] != NULL)
+                     putchar(',');
+             }
+             putchar('\n');
+         }
+         endgrent();
+         return 0;
      }
 
-   Following is the input data:
+   Each line in the group database represents one group.  The fields are
+separated with colons and represent the following information:
 
-     foo
-     bar
-     baz
-     foo
-     junk
+Group Name
+     The group's name.
 
-   Here is the `awkprof.out' that results from running the `gawk'
-profiler on this program and data (this example also illustrates that
-`awk' programmers sometimes have to work late):
+Group Password
+     The group's encrypted password. In practice, this field is never
+     used; it is usually empty or set to `*'.
 
-             # gawk profile, created Sun Aug 13 00:00:15 2000
+Group ID Number
+     The group's numeric group ID number; this number must be unique
+     within the file.  (On some systems it's a C `long', and not an
+     `int'.  Thus we cast it to `long' for all cases.)
 
-             # BEGIN block(s)
+Group Member List
+     A comma-separated list of user names.  These users are members of
+     the group.  Modern Unix systems allow users to be members of
+     several groups simultaneously.  If your system does, then there
+     are elements `"group1"' through `"groupN"' in `PROCINFO' for those
+     group ID numbers.  (Note that `PROCINFO' is a `gawk' extension;
+     *note Built-in Variables::.)
 
-             BEGIN {
-          1          print "First BEGIN rule"
-          1          print "Second BEGIN rule"
-             }
+   Here is what running `grcat' might produce:
 
-             # Rule(s)
+     $ grcat
+     -| wheel:*:0:arnold
+     -| nogroup:*:65534:
+     -| daemon:*:1:
+     -| kmem:*:2:
+     -| staff:*:10:arnold,miriam,andy
+     -| other:*:20:
+     ...
 
-          5  /foo/   { # 2
-          2          print "matched /foo/, gosh"
-          6          for (i = 1; i <= 3; i++) {
-          6                  sing()
-                     }
-             }
+   Here are the functions for obtaining information from the group
+database.  There are several, modeled after the C library functions of
+the same names:
 
-          5  {
-          5          if (/foo/) { # 2
-          2                  print "if is true"
-          3          } else {
-          3                  print "else is true"
-                     }
-             }
+     # group.awk --- functions for dealing with the group file
 
-             # END block(s)
+     BEGIN    \
+     {
+         # Change to suit your system
+         _gr_awklib = "/usr/local/libexec/awk/"
+     }
 
-             END {
-          1          print "First END rule"
-          1          print "Second END rule"
-             }
+     function _gr_init(    oldfs, oldrs, olddol0, grcat,
+                                  using_fw, using_fpat, n, a, i)
+     {
+         if (_gr_inited)
+             return
 
-             # Functions, listed alphabetically
+         oldfs = FS
+         oldrs = RS
+         olddol0 = $0
+         using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+         using_fpat = (PROCINFO["FS"] == "FPAT")
+         FS = ":"
+         RS = "\n"
 
-          6  function sing(dummy)
-             {
-          6          print "I gotta be me!"
-             }
+         grcat = _gr_awklib "grcat"
+         while ((grcat | getline) > 0) {
+             if ($1 in _gr_byname)
+                 _gr_byname[$1] = _gr_byname[$1] "," $4
+             else
+                 _gr_byname[$1] = $0
+             if ($3 in _gr_bygid)
+                 _gr_bygid[$3] = _gr_bygid[$3] "," $4
+             else
+                 _gr_bygid[$3] = $0
 
-   This example illustrates many of the basic features of profiling
-output.  They are as follows:
+             n = split($4, a, "[ \t]*,[ \t]*")
+             for (i = 1; i <= n; i++)
+                 if (a[i] in _gr_groupsbyuser)
+                     _gr_groupsbyuser[a[i]] = \
+                         _gr_groupsbyuser[a[i]] " " $1
+                 else
+                     _gr_groupsbyuser[a[i]] = $1
 
-   * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule,
-     pattern/action rules, `ENDFILE' rule, `END' rule and functions,
-     listed alphabetically.  Multiple `BEGIN' and `END' rules are
-     merged together, as are multiple `BEGINFILE' and `ENDFILE' rules.
+             _gr_bycount[++_gr_count] = $0
+         }
+         close(grcat)
+         _gr_count = 0
+         _gr_inited++
+         FS = oldfs
+         if (using_fw)
+             FIELDWIDTHS = FIELDWIDTHS
+         else if (using_fpat)
+             FPAT = FPAT
+         RS = oldrs
+         $0 = olddol0
+     }
 
-   * Pattern-action rules have two counts.  The first count, to the
-     left of the rule, shows how many times the rule's pattern was
-     _tested_.  The second count, to the right of the rule's opening
-     left brace in a comment, shows how many times the rule's action
-     was _executed_.  The difference between the two indicates how many
-     times the rule's pattern evaluated to false.
+   The `BEGIN' rule sets a private variable to the directory where
+`grcat' is stored.  Because it is used to help out an `awk' library
+routine, we have chosen to put it in `/usr/local/libexec/awk'.  You
+might want it to be in a different directory on your system.
 
-   * Similarly, the count for an `if'-`else' statement shows how many
-     times the condition was tested.  To the right of the opening left
-     brace for the `if''s body is a count showing how many times the
-     condition was true.  The count for the `else' indicates how many
-     times the test failed.
+   These routines follow the same general outline as the user database
+routines (*note Passwd Functions::).  The `_gr_inited' variable is used
+to ensure that the database is scanned no more than once.  The
+`_gr_init()' function first saves `FS', `RS', and `$0', and then sets
+`FS' and `RS' to the correct values for scanning the group information.
+It also takes care to note whether `FIELDWIDTHS' or `FPAT' is being
+used, and to restore the appropriate field splitting mechanism.
 
-   * The count for a loop header (such as `for' or `while') shows how
-     many times the loop test was executed.  (Because of this, you
-     can't just look at the count on the first statement in a rule to
-     determine how many times the rule was executed.  If the first
-     statement is a loop, the count is misleading.)
+   The group information is stored in several associative arrays.  The
+arrays are indexed by group name (`_gr_byname'), by group ID number
+(`_gr_bygid'), and by position in the database (`_gr_bycount').  There
+is an additional array indexed by user name (`_gr_groupsbyuser'), which
+is a space-separated list of groups to which each user belongs.
 
-   * For user-defined functions, the count next to the `function'
-     keyword indicates how many times the function was called.  The
-     counts next to the statements in the body show how many times
-     those statements were executed.
+   Unlike the user database, it is possible to have multiple records in
+the database for the same group.  This is common when a group has a
+large number of members.  A pair of such entries might look like the
+following:
 
-   * The layout uses "K&R" style with TABs.  Braces are used
-     everywhere, even when the body of an `if', `else', or loop is only
-     a single statement.
+     tvpeople:*:101:johnny,jay,arsenio
+     tvpeople:*:101:david,conan,tom,joan
 
-   * Parentheses are used only where needed, as indicated by the
-     structure of the program and the precedence rules.  For example,
-     `(3 + 5) * 4' means add three plus five, then multiply the total
-     by four.  However, `3 + 5 * 4' has no parentheses, and means `3 +
-     (5 * 4)'.
+   For this reason, `_gr_init()' looks to see if a group name or group
+ID number has already been seen.  If it has, then the user names are simply
+concatenated onto the previous list of users.  (There is actually a
+subtle problem with the code just presented.  Suppose that the first
+time there were no names. This code adds the names with a leading
+comma. It also doesn't check that there is a `$4'.)
 
-   * Parentheses are used around the arguments to `print' and `printf'
-     only when the `print' or `printf' statement is followed by a
-     redirection.  Similarly, if the target of a redirection isn't a
-     scalar, it gets parenthesized.
+   Finally, `_gr_init()' closes the pipeline to `grcat', restores `FS'
+(and `FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0', initializes
+`_gr_count' to zero (it is used later), and makes `_gr_inited' nonzero.
 
-   * `gawk' supplies leading comments in front of the `BEGIN' and `END'
-     rules, the pattern/action rules, and the functions.
+   The `getgrnam()' function takes a group name as its argument, and if
+that group exists, it is returned.  Otherwise, it relies on the array
+reference to a nonexistent element to create the element with the null
+string as its value:
 
+     function getgrnam(group)
+     {
+         _gr_init()
+         return _gr_byname[group]
+     }
 
-   The profiled version of your program may not look exactly like what
-you typed when you wrote it.  This is because `gawk' creates the
-profiled version by "pretty printing" its internal representation of
-the program.  The advantage to this is that `gawk' can produce a
-standard representation.  The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple `BEGIN',
-`END', `BEGINFILE', and `ENDFILE' rules.  Also, things such as:
+   The `getgrgid()' function is similar; it takes a numeric group ID and
+looks up the information associated with that group ID:
 
-     /foo/
+     function getgrgid(gid)
+     {
+         _gr_init()
+         return _gr_bygid[gid]
+     }
 
-come out as:
+   The `getgruser()' function does not have a C counterpart. It takes a
+user name and returns the list of groups that have the user as a member:
 
-     /foo/   {
-         print $0
+     function getgruser(user)
+     {
+         _gr_init()
+         return _gr_groupsbyuser[user]
      }
 
-which is correct, but possibly surprising.
+   The `getgrent()' function steps through the database one entry at a
+time.  It uses `_gr_count' to track its position in the list:
 
-   Besides creating profiles when a program has completed, `gawk' can
-produce a profile while it is running.  This is useful if your `awk'
-program goes into an infinite loop and you want to see what has been
-executed.  To use this feature, run `gawk' with the `--profile' option
-in the background:
+     function getgrent()
+     {
+         _gr_init()
+         if (++_gr_count in _gr_bycount)
+             return _gr_bycount[_gr_count]
+         return ""
+     }
 
-     $ gawk --profile -f myprog &
-     [1] 13992
+   The `endgrent()' function resets `_gr_count' to zero so that
+`getgrent()' can start over again:
 
-The shell prints a job number and process ID number; in this case,
-13992.  Use the `kill' command to send the `USR1' signal to `gawk':
+     function endgrent()
+     {
+         _gr_count = 0
+     }
 
-     $ kill -USR1 13992
+   As with the user database routines, each function calls `_gr_init()'
+to initialize the arrays.  Doing so only incurs the extra overhead of
+running `grcat' if these functions are used (as opposed to moving the
+body of `_gr_init()' into a `BEGIN' rule).
 
-As usual, the profiled version of the program is written to
-`awkprof.out', or to a different file if one specified with the
-`--profile' option.
+   Most of the work is in scanning the database and building the various
+associative arrays.  The functions that the user calls are themselves
+very simple, relying on `awk''s associative arrays to do work.
 
-   Along with the regular profile, as shown earlier, the profile
-includes a trace of any active functions:
+   The `id' program in *note Id Program::, uses these functions.
 
-     # Function Call Stack:
+
+File: gawk.info,  Node: Walking Arrays,  Prev: Group Functions,  Up: Library Functions
 
-     #   3. baz
-     #   2. bar
-     #   1. foo
-     # -- main --
+10.7 Traversing Arrays of Arrays
+================================
 
-   You may send `gawk' the `USR1' signal as many times as you like.
-Each time, the profile and function call trace are appended to the
-output profile file.
+*note Arrays of Arrays::, described how `gawk' provides arrays of
+arrays.  In particular, any element of an array may be either a scalar,
+or another array. The `isarray()' function (*note Type Functions::)
+lets you distinguish an array from a scalar.  The following function,
+`walk_array()', recursively traverses an array, printing each element's
+indices and value.  You call it with the array and a string
+representing the name of the array:
 
-   If you use the `HUP' signal instead of the `USR1' signal, `gawk'
-produces the profile and the function call trace and then exits.
+     function walk_array(arr, name,      i)
+     {
+         for (i in arr) {
+             if (isarray(arr[i]))
+                 walk_array(arr[i], (name "[" i "]"))
+             else
+                 printf("%s[%s] = %s\n", name, i, arr[i])
+         }
+     }
 
-   When `gawk' runs on MS-Windows systems, it uses the `INT' and `QUIT'
-signals for producing the profile and, in the case of the `INT' signal,
-`gawk' exits.  This is because these systems don't support the `kill'
-command, so the only signals you can deliver to a program are those
-generated by the keyboard.  The `INT' signal is generated by the
-`Ctrl-<C>' or `Ctrl-<BREAK>' key, while the `QUIT' signal is generated
-by the `Ctrl-<\>' key.
+It works by looping over each element of the array. If any given
+element is itself an array, the function calls itself recursively,
+passing the subarray and a new string representing the current index.
+Otherwise, the function simply prints the element's name, index, and
+value.  Here is a main program to demonstrate:
 
-   Finally, `gawk' also accepts another option `--pretty-print'.  When
-called this way, `gawk' "pretty prints" the program into `awkprof.out',
-without any execution counts.
+     BEGIN {
+         a[1] = 1
+         a[2][1] = 21
+         a[2][2] = 22
+         a[3] = 3
+         a[4][1][1] = 411
+         a[4][2] = 42
 
-
-File: gawk.info,  Node: Library Functions,  Next: Sample Programs,  Prev: Advanced Features,  Up: Top
+         walk_array(a, "a")
+     }
 
-12 A Library of `awk' Functions
-*******************************
+   When run, the program produces the following output:
 
-*note User-defined::, describes how to write your own `awk' functions.
-Writing functions is important, because it allows you to encapsulate
-algorithms and program tasks in a single place.  It simplifies
-programming, making program development more manageable, and making
-programs more readable.
+     $ gawk -f walk_array.awk
+     -| a[4][1][1] = 411
+     -| a[4][2] = 42
+     -| a[1] = 1
+     -| a[2][1] = 21
+     -| a[2][2] = 22
+     -| a[3] = 3
 
-   One valuable way to learn a new programming language is to _read_
-programs in that language.  To that end, this major node and *note
-Sample Programs::, provide a good-sized body of code for you to read,
-and hopefully, to learn from.
+
+File: gawk.info,  Node: Sample Programs,  Next: Internationalization,  Prev: Library Functions,  Up: Top
 
-   This major node presents a library of useful `awk' functions.  Many
-of the sample programs presented later in this Info file use these
-functions.  The functions are presented here in a progression from
-simple to complex.
+11 Practical `awk' Programs
+***************************
 
-   *note Extract Program::, presents a program that you can use to
-extract the source code for these example library functions and
-programs from the Texinfo source for this Info file.  (This has already
-been done as part of the `gawk' distribution.)
+*note Library Functions::, presents the idea that reading programs in a
+language contributes to learning that language.  This major node
+continues that theme, presenting a potpourri of `awk' programs for your
+reading enjoyment.
 
-   If you have written one or more useful, general-purpose `awk'
-functions and would like to contribute them to the `awk' user
-community, see *note How To Contribute::, for more information.
+   Many of these programs use library functions presented in *note
+Library Functions::.
 
-   The programs in this major node and in *note Sample Programs::,
-freely use features that are `gawk'-specific.  Rewriting these programs
-for different implementations of `awk' is pretty straightforward.
+* Menu:
 
-   * Diagnostic error messages are sent to `/dev/stderr'.  Use `| "cat
-     1>&2"' instead of `> "/dev/stderr"' if your system does not have a
-     `/dev/stderr', or if you cannot use `gawk'.
+* Running Examples::            How to run these examples.
+* Clones::                      Clones of common utilities.
+* Miscellaneous Programs::      Some interesting `awk' programs.
 
-   * A number of programs use `nextfile' (*note Nextfile Statement::)
-     to skip any remaining input in the input file.
+
+File: gawk.info,  Node: Running Examples,  Next: Clones,  Up: Sample Programs
 
-   * Finally, some of the programs choose to ignore upper- and lowercase
-     distinctions in their input. They do so by assigning one to
-     `IGNORECASE'.  You can achieve almost the same effect(1) by adding
-     the following rule to the beginning of the program:
+11.1 Running the Example Programs
+=================================
 
-          # ignore case
-          { $0 = tolower($0) }
+To run a given program, you would typically do something like this:
 
-     Also, verify that all regexp and string constants used in
-     comparisons use only lowercase letters.
+     awk -f PROGRAM -- OPTIONS FILES
 
-* Menu:
+Here, PROGRAM is the name of the `awk' program (such as `cut.awk'),
+OPTIONS are any command-line options for the program that start with a
+`-', and FILES are the actual data files.
 
-* Library Names::               How to best name private global variables in
-                                library functions.
-* General Functions::           Functions that are of general use.
-* Data File Management::        Functions for managing command-line data
-                                files.
-* Getopt Function::             A function for processing command-line
-                                arguments.
-* Passwd Functions::            Functions for getting user information.
-* Group Functions::             Functions for getting group information.
-* Walking Arrays::              A function to walk arrays of arrays.
+   If your system supports the `#!' executable interpreter mechanism
+(*note Executable Scripts::), you can instead run your program directly:
 
-   ---------- Footnotes ----------
+     cut.awk -c1-8 myfiles > results
 
-   (1) The effects are not identical.  Output of the transformed record
-will be in all lowercase, while `IGNORECASE' preserves the original
-contents of the input record.
+   If your `awk' is not `gawk', you may instead need to use this:
+
+     cut.awk -- -c1-8 myfiles > results
 
 
-File: gawk.info,  Node: Library Names,  Next: General Functions,  Up: Library Functions
+File: gawk.info,  Node: Clones,  Next: Miscellaneous Programs,  Prev: Running Examples,  Up: Sample Programs
 
-12.1 Naming Library Function Global Variables
-=============================================
+11.2 Reinventing Wheels for Fun and Profit
+==========================================
 
-Due to the way the `awk' language evolved, variables are either
-"global" (usable by the entire program) or "local" (usable just by a
-specific function).  There is no intermediate state analogous to
-`static' variables in C.
+This minor node presents a number of POSIX utilities implemented in
+`awk'.  Reinventing these programs in `awk' is often enjoyable, because
+the algorithms can be very clearly expressed, and the code is usually
+very concise and simple.  This is true because `awk' does so much for
+you.
 
-   Library functions often need to have global variables that they can
-use to preserve state information between calls to the function--for
-example, `getopt()''s variable `_opti' (*note Getopt Function::).  Such
-variables are called "private", since the only functions that need to
-use them are the ones in the library.
+   It should be noted that these programs are not necessarily intended
+to replace the installed versions on your system.  Nor may all of these
+programs be fully compliant with the most recent POSIX standard.  This
+is not a problem; their purpose is to illustrate `awk' language
+programming for "real world" tasks.
 
-   When writing a library function, you should try to choose names for
-your private variables that will not conflict with any variables used by
-either another library function or a user's main program.  For example,
-a name like `i' or `j' is not a good choice, because user programs
-often use variable names like these for their own purposes.
+   The programs are presented in alphabetical order.
 
-   The example programs shown in this major node all start the names of
-their private variables with an underscore (`_').  Users generally
-don't use leading underscores in their variable names, so this
-convention immediately decreases the chances that the variable name
-will be accidentally shared with the user's program.
+* Menu:
 
-   In addition, several of the library functions use a prefix that helps
-indicate what function or set of functions use the variables--for
-example, `_pw_byname' in the user database routines (*note Passwd
-Functions::).  This convention is recommended, since it even further
-decreases the chance of inadvertent conflict among variable names.
-Note that this convention is used equally well for variable names and
-for private function names.(1)
+* Cut Program::                 The `cut' utility.
+* Egrep Program::               The `egrep' utility.
+* Id Program::                  The `id' utility.
+* Split Program::               The `split' utility.
+* Tee Program::                 The `tee' utility.
+* Uniq Program::                The `uniq' utility.
+* Wc Program::                  The `wc' utility.
 
-   As a final note on variable naming, if a function makes global
-variables available for use by a main program, it is a good convention
-to start that variable's name with a capital letter--for example,
-`getopt()''s `Opterr' and `Optind' variables (*note Getopt Function::).
-The leading capital letter indicates that it is global, while the fact
-that the variable name is not all capital letters indicates that the
-variable is not one of `awk''s built-in variables, such as `FS'.
+
+File: gawk.info,  Node: Cut Program,  Next: Egrep Program,  Up: Clones
 
-   It is also important that _all_ variables in library functions that
-do not need to save state are, in fact, declared local.(2) If this is
-not done, the variable could accidentally be used in the user's
-program, leading to bugs that are very difficult to track down:
+11.2.1 Cutting out Fields and Columns
+-------------------------------------
 
-     function lib_func(x, y,    l1, l2)
-     {
-         ...
-         USE VARIABLE some_var   # some_var should be local
-         ...                     # but is not by oversight
-     }
+The `cut' utility selects, or "cuts," characters or fields from its
+standard input and sends them to its standard output.  Fields are
+separated by TABs by default, but you may supply a command-line option
+to change the field "delimiter" (i.e., the field-separator character).
+`cut''s definition of fields is less general than `awk''s.
 
-   A different convention, common in the Tcl community, is to use a
-single associative array to hold the values needed by the library
-function(s), or "package."  This significantly decreases the number of
-actual global names in use.  For example, the functions described in
-*note Passwd Functions::, might have used array elements
-`PW_data["inited"]', `PW_data["total"]', `PW_data["count"]', and
-`PW_data["awklib"]', instead of `_pw_inited', `_pw_awklib', `_pw_total',
-and `_pw_count'.
+   A common use of `cut' might be to pull out just the login name of
+logged-on users from the output of `who'.  For example, the following
+pipeline generates a sorted, unique list of the logged-on users:
 
-   The conventions presented in this minor node are exactly that:
-conventions. You are not required to write your programs this way--we
-merely recommend that you do so.
+     who | cut -c1-8 | sort | uniq
 
-   ---------- Footnotes ----------
+   The options for `cut' are:
 
-   (1) While all the library routines could have been rewritten to use
-this convention, this was not done, in order to show how our own `awk'
-programming style has evolved and to provide some basis for this
-discussion.
+`-c LIST'
+     Use LIST as the list of characters to cut out.  Items within the
+     list may be separated by commas, and ranges of characters can be
+     separated with dashes.  The list `1-8,15,22-35' specifies
+     characters 1 through 8, 15, and 22 through 35.
 
-   (2) `gawk''s `--dump-variables' command-line option is useful for
-verifying this.
+`-f LIST'
+     Use LIST as the list of fields to cut out.
 
-
-File: gawk.info,  Node: General Functions,  Next: Data File Management,  Prev: Library Names,  Up: Library Functions
+`-d DELIM'
+     Use DELIM as the field-separator character instead of the TAB
+     character.
 
-12.2 General Programming
-========================
+`-s'
+     Suppress printing of lines that do not contain the field delimiter.
 
-This minor node presents a number of functions that are of general
-programming use.
+   The `awk' implementation of `cut' uses the `getopt()' library
+function (*note Getopt Function::) and the `join()' library function
+(*note Join Function::).
 
-* Menu:
+   The program begins with a comment describing the options, the library
+functions needed, and a `usage()' function that prints out a usage
+message and exits.  `usage()' is called if invalid arguments are
+supplied:
 
-* Strtonum Function::           A replacement for the built-in
-                                `strtonum()' function.
-* Assert Function::             A function for assertions in `awk'
-                                programs.
-* Round Function::              A function for rounding if `sprintf()'
-                                does not do it correctly.
-* Cliff Random Function::       The Cliff Random Number Generator.
-* Ordinal Functions::           Functions for using characters as numbers and
-                                vice versa.
-* Join Function::               A function to join an array into a string.
-* Getlocaltime Function::       A function to get formatted times.
+     # cut.awk --- implement cut in awk
 
-
-File: gawk.info,  Node: Strtonum Function,  Next: Assert Function,  Up: General Functions
+     # Options:
+     #    -f list     Cut fields
+     #    -d c        Field delimiter character
+     #    -c list     Cut characters
+     #
+     #    -s          Suppress lines without the delimiter
+     #
+     # Requires getopt() and join() library functions
 
-12.2.1 Converting Strings To Numbers
-------------------------------------
+     function usage(    e1, e2)
+     {
+         e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
+         e2 = "usage: cut [-c list] [files...]"
+         print e1 > "/dev/stderr"
+         print e2 > "/dev/stderr"
+         exit 1
+     }
 
-The `strtonum()' function (*note String Functions::) is a `gawk'
-extension.  The following function provides an implementation for other
-versions of `awk':
+The variables `e1' and `e2' are used so that the function fits nicely
+on the screen.
 
-     # mystrtonum --- convert string to number
+   Next comes a `BEGIN' rule that parses the command-line options.  It
+sets `FS' to a single TAB character, because that is `cut''s default
+field separator. The rule then sets the output field separator to be the
+same as the input field separator.  A loop using `getopt()' steps
+through the command-line options.  Exactly one of the variables
+`by_fields' or `by_chars' is set to true, to indicate that processing
+should be done by fields or by characters, respectively.  When cutting
+by characters, the output field separator is set to the null string:
 
-     function mystrtonum(str,        ret, chars, n, i, k, c)
+     BEGIN    \
      {
-         if (str ~ /^0[0-7]*$/) {
-             # octal
-             n = length(str)
-             ret = 0
-             for (i = 1; i <= n; i++) {
-                 c = substr(str, i, 1)
-                 if ((k = index("01234567", c)) > 0)
-                     k-- # adjust for 1-basing in awk
+         FS = "\t"    # default
+         OFS = FS
+         while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
+             if (c == "f") {
+                 by_fields = 1
+                 fieldlist = Optarg
+             } else if (c == "c") {
+                 by_chars = 1
+                 fieldlist = Optarg
+                 OFS = ""
+             } else if (c == "d") {
+                 if (length(Optarg) > 1) {
+                     printf("Using first character of %s" \
+                            " for delimiter\n", Optarg) > "/dev/stderr"
+                     Optarg = substr(Optarg, 1, 1)
+                 }
+                 FS = Optarg
+                 OFS = FS
+                 if (FS == " ")    # defeat awk semantics
+                     FS = "[ ]"
+             } else if (c == "s")
+                 suppress++
+             else
+                 usage()
+         }
 
-                 ret = ret * 8 + k
-             }
-         } else if (str ~ /^0[xX][[:xdigit:]]+/) {
-             # hexadecimal
-             str = substr(str, 3)    # lop off leading 0x
-             n = length(str)
-             ret = 0
-             for (i = 1; i <= n; i++) {
-                 c = substr(str, i, 1)
-                 c = tolower(c)
-                 if ((k = index("0123456789", c)) > 0)
-                     k-- # adjust for 1-basing in awk
-                 else if ((k = index("abcdef", c)) > 0)
-                     k += 9
-
-                 ret = ret * 16 + k
-             }
-         } else if (str ~ \
-       /^[-+]?([0-9]+([.][0-9]*([Ee][0-9]+)?)?|([.][0-9]+([Ee][-+]?[0-9]+)?))$/) {
-             # decimal number, possibly floating point
-             ret = str + 0
-         } else
-             ret = "NOT-A-NUMBER"
-
-         return ret
-     }
-
-     # BEGIN {     # gawk test harness
-     #     a[1] = "25"
-     #     a[2] = ".31"
-     #     a[3] = "0123"
-     #     a[4] = "0xdeadBEEF"
-     #     a[5] = "123.45"
-     #     a[6] = "1.e3"
-     #     a[7] = "1.32"
-     #     a[8] = "1.32E2"
-     #
-     #     for (i = 1; i in a; i++)
-     #         print a[i], strtonum(a[i]), mystrtonum(a[i])
-     # }
-
-   The function first looks for C-style octal numbers (base 8).  If the
-input string matches a regular expression describing octal numbers,
-then `mystrtonum()' loops through each character in the string.  It
-sets `k' to the index in `"01234567"' of the current octal digit.
-Since the return value is one-based, the `k--' adjusts `k' so it can be
-used in computing the return value.
+         # Clear out options
+         for (i = 1; i < Optind; i++)
+             ARGV[i] = ""
 
-   Similar logic applies to the code that checks for and converts a
-hexadecimal value, which starts with `0x' or `0X'.  The use of
-`tolower()' simplifies the computation for finding the correct numeric
-value for each hexadecimal digit.
+   The code must take special care when the field delimiter is a space.
+Using a single space (`" "') for the value of `FS' is incorrect--`awk'
+would separate fields with runs of spaces, TABs, and/or newlines, and
+we want them to be separated with individual spaces.  Also remember
+that after `getopt()' is through (as described in *note Getopt
+Function::), we have to clear out all the elements of `ARGV' from 1 up
+to, but not including, `Optind', so that `awk' does not try to process
+the command-line options as file names.
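
The difference between the two `FS' settings can be checked from the
shell.  The following is only a sketch: `count_fields' is a hypothetical
helper written for this illustration and is not part of `cut.awk'.  It
prints the number of fields `awk' finds in each input line for a given
separator:

```shell
# count_fields SEP: print how many fields awk sees in each input line
# when FS is set to SEP (hypothetical helper, for illustration only).
count_fields() {
    awk -v FS="$1" '{ print NF }'
}

# The input holds "a", two spaces, and "b".
printf 'a  b\n' | count_fields ' '     # default splitting on runs of blanks
printf 'a  b\n' | count_fields '[ ]'   # each single space is a separator
```

With a plain space the two blanks collapse into one separator, so the
line has two fields.  With `[ ]' each space separates on its own, so
there is an empty field between them and the line has three fields.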
 
-   Finally, if the string matches the (rather complicated) regexp for a
-regular decimal integer or floating-point number, the computation `ret
-= str + 0' lets `awk' convert the value to a number.
+   After dealing with the command-line options, the program verifies
+that the options make sense.  Only one or the other of `-c' and `-f'
+should be used, and both require a field list.  Then the program calls
+either `set_fieldlist()' or `set_charlist()' to pull apart the list of
+fields or characters:
 
-   A commented-out test program is included, so that the function can
-be tested with `gawk' and the results compared to the built-in
-`strtonum()' function.
+         if (by_fields && by_chars)
+             usage()
 
-
-File: gawk.info,  Node: Assert Function,  Next: Round Function,  Prev: Strtonum Function,  Up: General Functions
+         if (by_fields == 0 && by_chars == 0)
+             by_fields = 1    # default
 
-12.2.2 Assertions
------------------
+         if (fieldlist == "") {
+             print "cut: needs list for -c or -f" > "/dev/stderr"
+             exit 1
+         }
 
-When writing large programs, it is often useful to know that a
-condition or set of conditions is true.  Before proceeding with a
-particular computation, you make a statement about what you believe to
-be the case.  Such a statement is known as an "assertion".  The C
-language provides an `<assert.h>' header file and corresponding
-`assert()' macro that the programmer can use to make assertions.  If an
-assertion fails, the `assert()' macro arranges to print a diagnostic
-message describing the condition that should have been true but was
-not, and then it kills the program.  In C, using `assert()' looks like this:
+         if (by_fields)
+             set_fieldlist()
+         else
+             set_charlist()
+     }
 
-     #include <assert.h>
+   `set_fieldlist()' splits the field list apart at the commas into an
+array.  Then, for each element of the array, it looks to see if the
+element is actually a range, and if so, splits it apart.  The function
+checks the range to make sure that the first number is smaller than the
+second.  Each number in the list is added to the `flist' array, which
+simply lists the fields that will be printed.  Normal field splitting
+is used; the program lets `awk' handle the job of splitting each record
+into fields:
 
-     int myfunc(int a, double b)
+     function set_fieldlist(        n, m, i, j, k, f, g)
      {
-          assert(a <= 5 && b >= 17.1);
-          ...
+         n = split(fieldlist, f, ",")
+         j = 1    # index in flist
+         for (i = 1; i <= n; i++) {
+             if (index(f[i], "-") != 0) { # a range
+                 m = split(f[i], g, "-")
+                 if (m != 2 || g[1] >= g[2]) {
+                     printf("bad field list: %s\n",
+                                       f[i]) > "/dev/stderr"
+                     exit 1
+                 }
+                 for (k = g[1]; k <= g[2]; k++)
+                     flist[j++] = k
+             } else
+                 flist[j++] = f[i]
+         }
+         nfields = j - 1
      }
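
The list expansion can be tried in isolation.  This is a minimal sketch
rather than the code from `cut.awk': `expand_list' is a hypothetical
shell wrapper, and it prints the expanded numbers instead of storing
them in `flist':

```shell
# expand_list LIST: expand a cut-style list such as "1-3,5" into the
# individual numbers it names (hypothetical helper, for illustration).
expand_list() {
    awk -v list="$1" 'BEGIN {
        n = split(list, f, ",")
        out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            if (index(f[i], "-") != 0) {        # a range such as 1-3
                m = split(f[i], g, "-")
                if (m != 2 || g[1] + 0 >= g[2] + 0) {
                    print "bad list: " f[i] > "/dev/stderr"
                    exit 1
                }
                for (k = g[1] + 0; k <= g[2] + 0; k++) {
                    out = out sep k
                    sep = " "
                }
            } else {                            # a single number
                out = out sep f[i]
                sep = " "
            }
        }
        print out
    }'
}

expand_list '1-3,5'
```

For the list `1-3,5' this prints `1 2 3 5'.  Adding zero to the range
bounds forces a numeric comparison, which is an extra precaution taken
in this sketch.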
 
-   If the assertion fails, the program prints a message similar to this:
-
-     prog.c:5: assertion failed: a <= 5 && b >= 17.1
-
-   The C language makes it possible to turn the condition into a string
-for use in printing the diagnostic message.  This is not possible in
-`awk', so this `assert()' function also requires a string version of
-the condition that is being tested.  Following is the function:
+   The `set_charlist()' function is more complicated than
+`set_fieldlist()'.  The idea here is to use `gawk''s `FIELDWIDTHS'
+variable (*note Constant Size::), which describes constant-width input.
+When using a character list, that is exactly what we have.
 
-     # assert --- assert that a condition is true. Otherwise exit.
+   Setting up `FIELDWIDTHS' is more complicated than simply listing the
+fields that need to be printed.  We have to keep track of the fields to
+print and also the intervening characters that have to be skipped.  For
+example, suppose you wanted characters 1 through 8, 15, and 22 through
+35.  You would use `-c 1-8,15,22-35'.  The necessary value for
+`FIELDWIDTHS' is `"8 6 1 6 14"'.  This yields five fields, and the
+fields to print are `$1', `$3', and `$5'.  The intermediate fields are
+"filler", which is stuff in between the desired data.  `flist' lists
+the fields to print, and `t' tracks the complete field list, including
+filler fields:
 
-     function assert(condition, string)
+     function set_charlist(    field, i, j, f, g, n, m, t,
+                               filler, last, len)
      {
-         if (! condition) {
-             printf("%s:%d: assertion failed: %s\n",
-                 FILENAME, FNR, string) > "/dev/stderr"
-             _assert_exit = 1
-             exit 1
+         field = 1   # count total fields
+         n = split(fieldlist, f, ",")
+         j = 1       # index in flist
+         for (i = 1; i <= n; i++) {
+             if (index(f[i], "-") != 0) { # range
+                 m = split(f[i], g, "-")
+                 if (m != 2 || g[1] >= g[2]) {
+                     printf("bad character list: %s\n",
+                                    f[i]) > "/dev/stderr"
+                     exit 1
+                 }
+                 len = g[2] - g[1] + 1
+                 if (g[1] > 1)  # compute length of filler
+                     filler = g[1] - last - 1
+                 else
+                     filler = 0
+                 if (filler)
+                     t[field++] = filler
+                 t[field++] = len  # length of field
+                 last = g[2]
+                 flist[j++] = field - 1
+             } else {
+                 if (f[i] > 1)
+                     filler = f[i] - last - 1
+                 else
+                     filler = 0
+                 if (filler)
+                     t[field++] = filler
+                 t[field++] = 1
+                 last = f[i]
+                 flist[j++] = field - 1
+             }
          }
+         FIELDWIDTHS = join(t, 1, field - 1)
+         nfields = j - 1
      }
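
The width arithmetic can be checked without `gawk'.  The sketch below
only computes the string that would be assigned to `FIELDWIDTHS'; it
does not set the variable.  `char_widths' is a hypothetical helper, and
it assumes the list is in ascending order with no overlapping ranges:

```shell
# char_widths LIST: print the widths (data and filler) implied by a
# character list such as "1-8,15,22-35".  Hypothetical helper; assumes
# an ascending, non-overlapping list.
char_widths() {
    awk -v list="$1" 'BEGIN {
        n = split(list, f, ",")
        last = 0; out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            if (split(f[i], g, "-") == 2) {     # a range
                lo = g[1] + 0
                hi = g[2] + 0
            } else {                            # a single character
                lo = f[i] + 0
                hi = lo
            }
            filler = lo - last - 1              # characters to skip
            if (filler > 0) {
                out = out sep filler
                sep = " "
            }
            len = hi - lo + 1                   # characters to keep
            out = out sep len
            sep = " "
            last = hi
        }
        print out
    }'
}

char_widths '1-8,15,22-35'
```

For `1-8,15,22-35' this prints `8 6 1 6 14', matching the value worked
out in the text above.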
 
-     END {
-         if (_assert_exit)
-             exit 1
-     }
-
-   The `assert()' function tests the `condition' parameter. If it is
-false, it prints a message to standard error, using the `string'
-parameter to describe the failed condition.  It then sets the variable
-`_assert_exit' to one and executes the `exit' statement.  The `exit'
-statement jumps to the `END' rule. If the `END' rule finds
-`_assert_exit' to be true, it exits immediately.
+   Next is the rule that actually processes the data.  If the `-s'
+option is given, then `suppress' is true.  The first `if' statement
+makes sure that the input record does have the field separator.  If
+`cut' is processing fields, `suppress' is true, and the field separator
+character is not in the record, then the record is skipped.
 
-   The purpose of the test in the `END' rule is to keep any other `END'
-rules from running.  When an assertion fails, the program should exit
-immediately.  If no assertions fail, then `_assert_exit' is still false
-when the `END' rule is run normally, and the rest of the program's
-`END' rules execute.  For all of this to work correctly, `assert.awk'
-must be the first source file read by `awk'.  The function can be used
-in a program in the following way:
+   If the record is valid, then `gawk' has split the data into fields,
+either using the character in `FS' or using fixed-length fields and
+`FIELDWIDTHS'.  The loop goes through the list of fields that should be
+printed.  The corresponding field is printed if it contains data.  If
+the next field also has data, then the separator character is written
+out between the fields:
 
-     function myfunc(a, b)
      {
-          assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
-          ...
-     }
+         if (by_fields && suppress && index($0, FS) == 0)
+             next
 
-If the assertion fails, you see a message similar to the following:
+         for (i = 1; i <= nfields; i++) {
+             if ($flist[i] != "") {
+                 printf "%s", $flist[i]
+                 if (i < nfields && $flist[i+1] != "")
+                     printf "%s", OFS
+             }
+         }
+         print ""
+     }
 
-     mydata:1357: assertion failed: a <= 5 && b >= 17.1
+   This version of `cut' relies on `gawk''s `FIELDWIDTHS' variable to
+do the character-based cutting.  While it is possible in other `awk'
+implementations to use `substr()' (*note String Functions::), it is
+also extremely painful.  The `FIELDWIDTHS' variable supplies an elegant
+solution to the problem of picking the input line apart by characters.
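
For comparison, here is a minimal sketch of the `substr()' approach for
the simplest case, a single character range.  `cut_chars' is a
hypothetical helper and is not part of `cut.awk'; handling a full list
of ranges this way is where the extra work comes in:

```shell
# cut_chars FIRST LAST: print characters FIRST through LAST of each
# input line using substr() (hypothetical helper, single range only).
cut_chars() {
    awk -v lo="$1" -v hi="$2" '{ print substr($0, lo, hi - lo + 1) }'
}

printf 'abcdefghij\n' | cut_chars 3 5
```

This prints `cde'.  Lines shorter than the requested range are simply
truncated, since `substr()' returns whatever characters are available.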
 
-   There is a small problem with this version of `assert()'.  An `END'
-rule is automatically added to the program calling `assert()'.
-Normally, if a program consists of just a `BEGIN' rule, the input files
-and/or standard input are not read. However, now that the program has
-an `END' rule, `awk' attempts to read the input data files or standard
-input (*note Using BEGIN/END::), most likely causing the program to
-hang as it waits for input.
+
+File: gawk.info,  Node: Egrep Program,  Next: Id Program,  Prev: Cut Program,  Up: Clones
 
-   There is a simple workaround to this: make sure that such a `BEGIN'
-rule always ends with an `exit' statement.
+11.2.2 Searching for Regular Expressions in Files
+-------------------------------------------------
 
-
-File: gawk.info,  Node: Round Function,  Next: Cliff Random Function,  Prev: Assert Function,  Up: General Functions
+The `egrep' utility searches files for patterns.  It uses regular
+expressions that are almost identical to those available in `awk'
+(*note Regexp::).  You invoke it as follows:
 
-12.2.3 Rounding Numbers
------------------------
+     egrep [ OPTIONS ] 'PATTERN' FILES ...
 
-The way `printf' and `sprintf()' (*note Printf::) perform rounding
-often depends upon the system's C `sprintf()' subroutine.  On many
-machines, `sprintf()' rounding is "unbiased," which means it doesn't
-always round a trailing `.5' up, contrary to naive expectations.  In
-unbiased rounding, `.5' rounds to even, rather than always up, so 1.5
-rounds to 2 but 4.5 rounds to 4.  This means that if you are using a
-format that does rounding (e.g., `"%.0f"'), you should check what your
-system does.  The following function does traditional rounding; it
-might be useful if your `awk''s `printf' does unbiased rounding:
+   The PATTERN is a regular expression.  In typical usage, the regular
+expression is quoted to prevent the shell from expanding any of the
+special characters as file name wildcards.  Normally, `egrep' prints
+the lines that matched.  If multiple file names are provided on the
+command line, each output line is preceded by the name of the file and
+a colon.
 
-     # round.awk --- do normal rounding
+   The options to `egrep' are as follows:
 
-     function round(x,   ival, aval, fraction)
-     {
-        ival = int(x)    # integer part, int() truncates
+`-c'
+     Print out a count of the lines that matched the pattern, instead
+     of the lines themselves.
 
-        # see if fractional part
-        if (ival == x)   # no fraction
-           return ival   # ensure no decimals
+`-s'
+     Be silent.  No output is produced and the exit value indicates
+     whether the pattern was matched.
 
-        if (x < 0) {
-           aval = -x     # absolute value
-           ival = int(aval)
-           fraction = aval - ival
-           if (fraction >= .5)
-              return int(x) - 1   # -2.5 --> -3
-           else
-              return int(x)       # -2.3 --> -2
-        } else {
-           fraction = x - ival
-           if (fraction >= .5)
-              return ival + 1
-           else
-              return ival
-        }
-     }
+`-v'
+     Invert the sense of the test. `egrep' prints the lines that do
+     _not_ match the pattern and exits successfully if the pattern is
+     not matched.
 
-     # test harness
-     { print $0, round($0) }
+`-i'
+     Ignore case distinctions in both the pattern and the input data.
 
-
-File: gawk.info,  Node: Cliff Random Function,  Next: Ordinal Functions,  Prev: Round Function,  Up: General Functions
+`-l'
+     Only print (list) the names of the files that matched, not the
+     lines that matched.
 
-12.2.4 The Cliff Random Number Generator
-----------------------------------------
+`-e PATTERN'
+     Use PATTERN as the regexp to match.  The purpose of the `-e'
+     option is to allow patterns that start with a `-'.
 
-The Cliff random number generator
-(http://mathworld.wolfram.com/CliffRandomNumberGenerator.html) is a
-very simple random number generator that "passes the noise sphere test
-for randomness by showing no structure."  It is easily programmed, in
-less than 10 lines of `awk' code:
+   This version uses the `getopt()' library function (*note Getopt
+Function::) and the file transition library program (*note Filetrans
+Function::).
 
-     # cliff_rand.awk --- generate Cliff random numbers
+   The program begins with a descriptive comment and then a `BEGIN' rule
+that processes the command-line arguments with `getopt()'.  The `-i'
+(ignore case) option is particularly easy with `gawk'; we just use the
+`IGNORECASE' built-in variable (*note Built-in Variables::):
 
-     BEGIN { _cliff_seed = 0.1 }
+     # egrep.awk --- simulate egrep in awk
+     #
+     # Options:
+     #    -c    count of lines
+     #    -s    silent - use exit value
+     #    -v    invert test, success if no match
+     #    -i    ignore case
+     #    -l    print filenames only
+     #    -e    argument is pattern
+     #
+     # Requires getopt and file transition library functions
 
-     function cliff_rand()
-     {
-         _cliff_seed = (100 * log(_cliff_seed)) % 1
-         if (_cliff_seed < 0)
-             _cliff_seed = - _cliff_seed
-         return _cliff_seed
-     }
+     BEGIN {
+         while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
+             if (c == "c")
+                 count_only++
+             else if (c == "s")
+                 no_print++
+             else if (c == "v")
+                 invert++
+             else if (c == "i")
+                 IGNORECASE = 1
+             else if (c == "l")
+                 filenames_only++
+             else if (c == "e")
+                 pattern = Optarg
+             else
+                 usage()
+         }
 
-   This algorithm requires an initial "seed" of 0.1.  Each new value
-uses the current seed as input for the calculation.  If the built-in
-`rand()' function (*note Numeric Functions::) isn't random enough, you
-might try using this function instead.
+   Next comes the code that handles the `egrep'-specific behavior. If no
+pattern is supplied with `-e', the first nonoption on the command line
+is used.  The `awk' command-line arguments up to `ARGV[Optind]' are
+cleared, so that `awk' won't try to process them as files.  If no files
+are specified, the standard input is used, and if multiple files are
+specified, we make sure to note this so that the file names can precede
+the matched lines in the output:
 
-
-File: gawk.info,  Node: Ordinal Functions,  Next: Join Function,  Prev: Cliff Random Function,  Up: General Functions
+         if (pattern == "")
+             pattern = ARGV[Optind++]
 
-12.2.5 Translating Between Characters and Numbers
--------------------------------------------------
+         for (i = 1; i < Optind; i++)
+             ARGV[i] = ""
+         if (Optind >= ARGC) {
+             ARGV[1] = "-"
+             ARGC = 2
+         } else if (ARGC - Optind > 1)
+             do_filenames++
 
-One commercial implementation of `awk' supplies a built-in function,
-`ord()', which takes a character and returns the numeric value for that
-character in the machine's character set.  If the string passed to
-`ord()' has more than one character, only the first one is used.
+     #    if (IGNORECASE)
+     #        pattern = tolower(pattern)
+     }
 
-   The inverse of this function is `chr()' (from the function of the
-same name in Pascal), which takes a number and returns the
-corresponding character.  Both functions are written very nicely in
-`awk'; there is no real reason to build them into the `awk' interpreter:
+   The last two lines are commented out, since they are not needed in
+`gawk'.  They should be uncommented if you have to use another version
+of `awk'.
 
-     # ord.awk --- do ord and chr
+   The next set of lines should be uncommented if you are not using
+`gawk'.  This rule translates all the characters in the input line into
+lowercase if the `-i' option is specified.(1) The rule is commented out
+since it is not necessary with `gawk':
 
-     # Global identifiers:
-     #    _ord_:        numerical values indexed by characters
-     #    _ord_init:    function to initialize _ord_
+     #{
+     #    if (IGNORECASE)
+     #        $0 = tolower($0)
+     #}
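
A variation on this idea, shown here only as a sketch, compares a
lowercased copy of the record instead of overwriting `$0'.  The helper
name `imatch' is hypothetical, and the approach assumes a simple pattern
whose meaning does not change when it is lowercased:

```shell
# imatch PATTERN: print input lines that match PATTERN regardless of
# case, without relying on gawk's IGNORECASE (hypothetical helper).
imatch() {
    awk -v pattern="$1" '
    BEGIN { pattern = tolower(pattern) }
    tolower($0) ~ pattern { print }'
}

printf 'Foo\nbar\nFOOD\n' | imatch 'foo'
```

This prints `Foo' and `FOOD' in their original case, because only the
comparison uses the lowercased text.  The commented-out rule above
changes `$0' itself, so its output would be in lowercase.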
 
-     BEGIN    { _ord_init() }
+   The `beginfile()' function is called by the rule in `ftrans.awk'
+when each new file is processed.  In this case, it is very simple; all
+it does is initialize a variable `fcount' to zero. `fcount' tracks how
+many lines in the current file matched the pattern.  Naming the
+parameter `junk' shows we know that `beginfile()' is called with a
+parameter, but that we're not interested in its value:
 
-     function _ord_init(    low, high, i, t)
+     function beginfile(junk)
      {
-         low = sprintf("%c", 7) # BEL is ascii 7
-         if (low == "\a") {    # regular ascii
-             low = 0
-             high = 127
-         } else if (sprintf("%c", 128 + 7) == "\a") {
-             # ascii, mark parity
-             low = 128
-             high = 255
-         } else {        # ebcdic(!)
-             low = 0
-             high = 255
-         }
-
-         for (i = low; i <= high; i++) {
-             t = sprintf("%c", i)
-             _ord_[t] = i
-         }
+         fcount = 0
      }
 
-   Some explanation of the numbers used by `chr' is worthwhile.  The
-most prominent character set in use today is ASCII.(1) Although an
-8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
-defines characters that use the values from 0 to 127.(2) In the now
-distant past, at least one minicomputer manufacturer used ASCII, but
-with mark parity, meaning that the leftmost bit in the byte is always
-1.  This means that on those systems, characters have numeric values
-from 128 to 255.  Finally, large mainframe systems use the EBCDIC
-character set, which uses all 256 values.  While there are other
-character sets in use on some older systems, they are not really worth
-worrying about:
+   The `endfile()' function is called after each file has been
+processed.  It affects the output only when the user wants a count of
+the number of lines that matched.  `no_print' is true only if the exit
+status is desired.  `count_only' is true if line counts are desired.
+`egrep' therefore only prints line counts if printing and counting are
+enabled.  The output format must be adjusted depending upon the number
+of files to process.  Finally, `fcount' is added to `total', so that we
+know the total number of lines that matched the pattern:
 
-     function ord(str,    c)
+     function endfile(file)
      {
-         # only first character is of interest
-         c = substr(str, 1, 1)
-         return _ord_[c]
-     }
+         if (! no_print && count_only) {
+             if (do_filenames)
+                 print file ":" fcount
+             else
+                 print fcount
+         }
 
-     function chr(c)
-     {
-         # force c to be numeric by adding 0
-         return sprintf("%c", c + 0)
+         total += fcount
      }
 
-     #### test code ####
-     # BEGIN    \
-     # {
-     #    for (;;) {
-     #        printf("enter a character: ")
-     #        if (getline var <= 0)
-     #            break
-     #        printf("ord(%s) = %d\n", var, ord(var))
-     #    }
-     # }
+   The following rule does most of the work of matching lines. The
+variable `matches' is true if the line matched the pattern. If the user
+wants lines that did not match, the sense of `matches' is inverted
+using the `!' operator. `fcount' is incremented with the value of
+`matches', which is either one or zero, depending upon a successful or
+unsuccessful match.  If the line does not match, the `next' statement
+just moves on to the next record.
 
-   An obvious improvement to these functions is to move the code for the
-`_ord_init' function into the body of the `BEGIN' rule.  It was written
-this way initially for ease of development.  There is a "test program"
-in a `BEGIN' rule, to test the function.  It is commented out for
-production use.
+   A number of additional tests are made, but they are only done if we
+are not counting lines.  First, if the user only wants exit status
+(`no_print' is true), then it is enough to know that _one_ line in this
+file matched, and we can skip on to the next file with `nextfile'.
+Similarly, if we are only printing file names, we can print the file
+name, and then skip to the next file with `nextfile'.  Finally, each
+line is printed, with a leading file name and colon if necessary:
 
-   ---------- Footnotes ----------
+     {
+         matches = ($0 ~ pattern)
+         if (invert)
+             matches = ! matches
 
-   (1) This is changing; many systems use Unicode, a very large
-character set that includes ASCII as a subset.  On systems with full
-Unicode support, a character can occupy up to 32 bits, making simple
-tests such as used here prohibitively expensive.
+         fcount += matches    # 1 or 0
 
-   (2) ASCII has been extended in many countries to use the values from
-128 to 255 for country-specific characters.  If your  system uses these
-extensions, you can simplify `_ord_init' to loop from 0 to 255.
+         if (! matches)
+             next
 
-
-File: gawk.info,  Node: Join Function,  Next: Getlocaltime Function,  Prev: Ordinal Functions,  Up: General Functions
+         if (! count_only) {
+             if (no_print)
+                 nextfile
 
-12.2.6 Merging an Array into a String
--------------------------------------
+             if (filenames_only) {
+                 print FILENAME
+                 nextfile
+             }
 
-When doing string processing, it is often useful to be able to join all
-the strings in an array into one long string.  The following function,
-`join()', accomplishes this task.  It is used later in several of the
-application programs (*note Sample Programs::).
+             if (do_filenames)
+                 print FILENAME ":" $0
+             else
+                 print
+         }
+     }
 
-   Good function design is important; this function needs to be general
-but it should also have a reasonable default behavior.  It is called
-with an array as well as the beginning and ending indices of the
-elements in the array to be merged.  This assumes that the array
-indices are numeric--a reasonable assumption since the array was likely
-created with `split()' (*note String Functions::):
+   The `END' rule takes care of producing the correct exit status. If
+there are no matches, the exit status is one; otherwise it is zero:
 
-     # join.awk --- join an array into a string
+     END    \
+     {
+         if (total == 0)
+             exit 1
+         exit 0
+     }
 
-     function join(array, start, end, sep,    result, i)
+   The `usage()' function prints a usage message in case of invalid
+options, and then exits:
+
+     function usage(    e)
      {
-         if (sep == "")
-            sep = " "
-         else if (sep == SUBSEP) # magic value
-            sep = ""
-         result = array[start]
-         for (i = start + 1; i <= end; i++)
-             result = result sep array[i]
-         return result
+         e = "Usage: egrep [-csvil] [-e pat] [files ...]"
+         e = e "\n\tegrep [-csvil] pat [files ...]"
+         print e > "/dev/stderr"
+         exit 1
      }
 
-   An optional additional argument is the separator to use when joining
-the strings back together.  If the caller supplies a nonempty value,
-`join()' uses it; if it is not supplied, it has a null value.  In this
-case, `join()' uses a single space as a default separator for the
-strings.  If the value is equal to `SUBSEP', then `join()' joins the
-strings with no separator between them.  `SUBSEP' serves as a "magic"
-value to indicate that there should be no separation between the
-component strings.(1)
+   The variable `e' is used so that the function fits nicely on the
+printed page.
+
+   Just a note on programming style: you may have noticed that the `END'
+rule uses backslash continuation, with the open brace on a line by
+itself.  This is so that it more closely resembles the way functions
+are written.  Many of the examples in this major node use this style.
+You can decide for yourself if you like writing your `BEGIN' and `END'
+rules this way or not.
 
    ---------- Footnotes ----------
 
-   (1) It would be nice if `awk' had an assignment operator for
-concatenation.  The lack of an explicit operator for concatenation
-makes string operations more difficult than they really need to be.
+   (1) It also introduces a subtle bug; if a match happens, we output
+the translated line, not the original.
 
 
-File: gawk.info,  Node: Getlocaltime Function,  Prev: Join Function,  Up: General Functions
+File: gawk.info,  Node: Id Program,  Next: Split Program,  Prev: Egrep Program,  Up: Clones
 
-12.2.7 Managing the Time of Day
--------------------------------
+11.2.3 Printing out User Information
+------------------------------------
 
-The `systime()' and `strftime()' functions described in *note Time
-Functions::, provide the minimum functionality necessary for dealing
-with the time of day in human readable form.  While `strftime()' is
-extensive, the control formats are not necessarily easy to remember or
-intuitively obvious when reading a program.
+The `id' utility lists a user's real and effective user ID numbers,
+real and effective group ID numbers, and the user's group set, if any.
+`id' only prints the effective user ID and group ID if they are
+different from the real ones.  If possible, `id' also supplies the
+corresponding user and group names.  The output might look like this:
 
-   The following function, `getlocaltime()', populates a user-supplied
-array with preformatted time information.  It returns a string with the
-current time formatted in the same way as the `date' utility:
+     $ id
+     -| uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
 
-     # getlocaltime.awk --- get the time of day in a usable format
+   This information is part of what is provided by `gawk''s `PROCINFO'
+array (*note Built-in Variables::).  However, the `id' utility provides
+a more palatable output than just individual numbers.
 
-     # Returns a string in the format of output of date(1)
-     # Populates the array argument time with individual values:
-     #    time["second"]       -- seconds (0 - 59)
-     #    time["minute"]       -- minutes (0 - 59)
-     #    time["hour"]         -- hours (0 - 23)
-     #    time["althour"]      -- hours (0 - 12)
-     #    time["monthday"]     -- day of month (1 - 31)
-     #    time["month"]        -- month of year (1 - 12)
-     #    time["monthname"]    -- name of the month
-     #    time["shortmonth"]   -- short name of the month
-     #    time["year"]         -- year modulo 100 (0 - 99)
-     #    time["fullyear"]     -- full year
-     #    time["weekday"]      -- day of week (Sunday = 0)
-     #    time["altweekday"]   -- day of week (Monday = 0)
-     #    time["dayname"]      -- name of weekday
-     #    time["shortdayname"] -- short name of weekday
-     #    time["yearday"]      -- day of year (0 - 365)
-     #    time["timezone"]     -- abbreviation of timezone name
-     #    time["ampm"]         -- AM or PM designation
-     #    time["weeknum"]      -- week number, Sunday first day
-     #    time["altweeknum"]   -- week number, Monday first day
+   Here is a simple version of `id' written in `awk'.  It uses the user
+database library functions (*note Passwd Functions::) and the group
+database library functions (*note Group Functions::):
 
-     function getlocaltime(time,    ret, now, i)
-     {
-         # get time once, avoids unnecessary system calls
-         now = systime()
+   The program is fairly straightforward.  All the work is done in the
+`BEGIN' rule.  The user and group ID numbers are obtained from
+`PROCINFO'.  The code is repetitive.  The entry in the user database
+for the real user ID number is split into parts at the `:'. The name is
+the first field.  Similar code is used for the effective user ID number
+and the group numbers:
 
-         # return date(1)-style output
-         ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
+     # id.awk --- implement id in awk
+     #
+     # Requires user and group library functions
+     # output is:
+     # uid=12(foo) euid=34(bar) gid=3(baz) \
+     #             egid=5(blat) groups=9(nine),2(two),1(one)
 
-         # clear out target array
-         delete time
+     BEGIN    \
+     {
+         uid = PROCINFO["uid"]
+         euid = PROCINFO["euid"]
+         gid = PROCINFO["gid"]
+         egid = PROCINFO["egid"]
 
-         # fill in values, force numeric values to be
-         # numeric by adding 0
-         time["second"]       = strftime("%S", now) + 0
-         time["minute"]       = strftime("%M", now) + 0
-         time["hour"]         = strftime("%H", now) + 0
-         time["althour"]      = strftime("%I", now) + 0
-         time["monthday"]     = strftime("%d", now) + 0
-         time["month"]        = strftime("%m", now) + 0
-         time["monthname"]    = strftime("%B", now)
-         time["shortmonth"]   = strftime("%b", now)
-         time["year"]         = strftime("%y", now) + 0
-         time["fullyear"]     = strftime("%Y", now) + 0
-         time["weekday"]      = strftime("%w", now) + 0
-         time["altweekday"]   = strftime("%u", now) + 0
-         time["dayname"]      = strftime("%A", now)
-         time["shortdayname"] = strftime("%a", now)
-         time["yearday"]      = strftime("%j", now) + 0
-         time["timezone"]     = strftime("%Z", now)
-         time["ampm"]         = strftime("%p", now)
-         time["weeknum"]      = strftime("%U", now) + 0
-         time["altweeknum"]   = strftime("%W", now) + 0
+         printf("uid=%d", uid)
+         pw = getpwuid(uid)
+         if (pw != "") {
+             split(pw, a, ":")
+             printf("(%s)", a[1])
+         }
 
-         return ret
-     }
+         if (euid != uid) {
+             printf(" euid=%d", euid)
+             pw = getpwuid(euid)
+             if (pw != "") {
+                 split(pw, a, ":")
+                 printf("(%s)", a[1])
+             }
+         }
 
-   The string indices are easier to use and read than the various
-formats required by `strftime()'.  The `alarm' program presented in
-*note Alarm Program::, uses this function.  A more general design for
-the `getlocaltime()' function would have allowed the user to supply an
-optional timestamp value to use instead of the current time.
+         printf(" gid=%d", gid)
+         pw = getgrgid(gid)
+         if (pw != "") {
+             split(pw, a, ":")
+             printf("(%s)", a[1])
+         }
 
-
-File: gawk.info,  Node: Data File Management,  Next: Getopt Function,  Prev: General Functions,  Up: Library Functions
+         if (egid != gid) {
+             printf(" egid=%d", egid)
+             pw = getgrgid(egid)
+             if (pw != "") {
+                 split(pw, a, ":")
+                 printf("(%s)", a[1])
+             }
+         }
 
-12.3 Data File Management
-=========================
+         for (i = 1; ("group" i) in PROCINFO; i++) {
+             if (i == 1)
+                 printf(" groups=")
+             group = PROCINFO["group" i]
+             printf("%d", group)
+             pw = getgrgid(group)
+             if (pw != "") {
+                 split(pw, a, ":")
+                 printf("(%s)", a[1])
+             }
+             if (("group" (i+1)) in PROCINFO)
+                 printf(",")
+         }
 
-This minor node presents functions that are useful for managing
-command-line data files.
+         print ""
+     }
 
-* Menu:
+   The test in the `for' loop is worth noting.  Any supplementary
+groups in the `PROCINFO' array have the indices `"group1"' through
+`"groupN"' for some N, i.e., the total number of supplementary groups.
+However, we don't know in advance how many of these groups there are.
 
-* Filetrans Function::          A function for handling data file transitions.
-* Rewind Function::             A function for rereading the current file.
-* File Checking::               Checking that data files are readable.
-* Empty Files::                 Checking for zero-length files.
-* Ignoring Assigns::            Treating assignments as file names.
+   This loop works by starting at one, concatenating the value with
+`"group"', and then using `in' to see if that value is in the array.
+Eventually, `i' is incremented past the last group in the array and the
+loop exits.
+
+   The loop is also correct if there are _no_ supplementary groups;
+then the condition is false the first time it's tested, and the loop
+body never executes.
 
 
-File: gawk.info,  Node: Filetrans Function,  Next: Rewind Function,  Up: Data File Management
+File: gawk.info,  Node: Split Program,  Next: Tee Program,  Prev: Id Program,  Up: Clones
 
-12.3.1 Noting Data File Boundaries
-----------------------------------
+11.2.4 Splitting a Large File into Pieces
+-----------------------------------------
 
-The `BEGIN' and `END' rules are each executed exactly once at the
-beginning and end of your `awk' program, respectively (*note
-BEGIN/END::).  We (the `gawk' authors) once had a user who mistakenly
-thought that the `BEGIN' rule is executed at the beginning of each data
-file and the `END' rule is executed at the end of each data file.
-
-   When informed that this was not the case, the user requested that we
-add new special patterns to `gawk', named `BEGIN_FILE' and `END_FILE',
-that would have the desired behavior.  He even supplied us the code to
-do so.
+The `split' program splits large text files into smaller pieces.  Usage
+is as follows:(1)
 
-   Adding these special patterns to `gawk' wasn't necessary; the job
-can be done cleanly in `awk' itself, as illustrated by the following
-library program.  It arranges to call two user-supplied functions,
-`beginfile()' and `endfile()', at the beginning and end of each data
-file.  Besides solving the problem in only nine(!) lines of code, it
-does so _portably_; this works with any implementation of `awk':
+     split [-COUNT] file [ PREFIX ]
 
-     # transfile.awk
-     #
-     # Give the user a hook for filename transitions
-     #
-     # The user must supply functions beginfile() and endfile()
-     # that each take the name of the file being started or
-     # finished, respectively.
+   By default, the output files are named `xaa', `xab', and so on. Each
+file has 1000 lines in it, with the likely exception of the last file.
+To change the number of lines in each file, supply a number on the
+command line preceded with a minus; e.g., `-500' for files with 500
+lines in them instead of 1000.  To change the name of the output files
+to something like `myfileaa', `myfileab', and so on, supply an
+additional argument that specifies the file name prefix.
 
-     FILENAME != _oldfilename \
-     {
-         if (_oldfilename != "")
-             endfile(_oldfilename)
-         _oldfilename = FILENAME
-         beginfile(FILENAME)
-     }
+   Here is a version of `split' in `awk'. It uses the `ord()' and
+`chr()' functions presented in *note Ordinal Functions::.
 
-     END   { endfile(FILENAME) }
+   The program first sets its defaults, and then tests to make sure
+there are not too many arguments.  It then looks at each argument in
+turn.  The first argument could be a minus sign followed by a number.
+If it is, this happens to look like a negative number, so it is made
+positive, and that is the count of lines.  The data file name is
+skipped over and the final argument is used as the prefix for the
+output file names:
 
-   This file must be loaded before the user's "main" program, so that
-the rule it supplies is executed first.
+     # split.awk --- do split in awk
+     #
+     # Requires ord() and chr() library functions
+     # usage: split [-num] [file] [outname]
 
-   This rule relies on `awk''s `FILENAME' variable that automatically
-changes for each new data file.  The current file name is saved in a
-private variable, `_oldfilename'.  If `FILENAME' does not equal
-`_oldfilename', then a new data file is being processed and it is
-necessary to call `endfile()' for the old file.  Because `endfile()'
-should only be called if a file has been processed, the program first
-checks to make sure that `_oldfilename' is not the null string.  The
-program then assigns the current file name to `_oldfilename' and calls
-`beginfile()' for the file.  Because, like all `awk' variables,
-`_oldfilename' is initialized to the null string, this rule executes
-correctly even for the first data file.
+     BEGIN {
+         outfile = "x"    # default
+         count = 1000
+         if (ARGC > 4)
+             usage()
 
-   The program also supplies an `END' rule to do the final processing
-for the last file.  Because this `END' rule comes before any `END' rules
-supplied in the "main" program, `endfile()' is called first.  Once
-again the value of multiple `BEGIN' and `END' rules should be clear.
+         i = 1
+         if (ARGV[i] ~ /^-[[:digit:]]+$/) {
+             count = -ARGV[i]
+             ARGV[i] = ""
+             i++
+         }
+         # test argv in case reading from stdin instead of file
+         if (i in ARGV)
+             i++    # skip data file name
+         if (i in ARGV) {
+             outfile = ARGV[i]
+             ARGV[i] = ""
+         }
 
-   If the same data file occurs twice in a row on the command line, then
-`endfile()' and `beginfile()' are not executed at the end of the first
-pass and at the beginning of the second pass.  The following version
-solves the problem:
+         s1 = s2 = "a"
+         out = (outfile s1 s2)
+     }
 
-     # ftrans.awk --- handle data file transitions
-     #
-     # user supplies beginfile() and endfile() functions
+   The next rule does most of the work. `tcount' (temporary count)
+tracks how many lines have been printed to the output file so far. If
+it is greater than `count', it is time to close the current file and
+start a new one.  `s1' and `s2' track the current suffixes for the file
+name. If they are both `z', the file is just too big.  Otherwise, `s1'
+moves to the next letter in the alphabet and `s2' starts over again at
+`a':
 
-     FNR == 1 {
-         if (_filename_ != "")
-             endfile(_filename_)
-         _filename_ = FILENAME
-         beginfile(FILENAME)
+     {
+         if (++tcount > count) {
+             close(out)
+             if (s2 == "z") {
+                 if (s1 == "z") {
+                     printf("split: %s is too large to split\n",
+                            FILENAME) > "/dev/stderr"
+                     exit 1
+                 }
+                 s1 = chr(ord(s1) + 1)
+                 s2 = "a"
+             }
+             else
+                 s2 = chr(ord(s2) + 1)
+             out = (outfile s1 s2)
+             tcount = 1
+         }
+         print > out
      }
 
-     END  { endfile(_filename_) }
+The `usage()' function simply prints an error message and exits:
 
-   *note Wc Program::, shows how this library function can be used and
-how it simplifies writing the main program.
+     function usage(   e)
+     {
+         e = "usage: split [-num] [file] [outname]"
+         print e > "/dev/stderr"
+         exit 1
+     }
 
-Advanced Notes: So Why Does `gawk' have `BEGINFILE' and `ENDFILE'?
-------------------------------------------------------------------
+The variable `e' is used so that the function fits nicely on the screen.
 
-You are probably wondering, if `beginfile()' and `endfile()' functions
-can do the job, why does `gawk' have `BEGINFILE' and `ENDFILE' patterns
-(*note BEGINFILE/ENDFILE::)?
+   This program is a bit sloppy; it relies on `awk' to automatically
+close the last file instead of doing it in an `END' rule.  It also
+assumes that letters are contiguous in the character set, which isn't
+true for EBCDIC systems.
 
-   Good question.  Normally, if `awk' cannot open a file, this causes
-an immediate fatal error.  In this case, there is no way for a
-user-defined function to deal with the problem, since the mechanism for
-calling it relies on the file being open and at the first record.  Thus,
-the main reason for `BEGINFILE' is to give you a "hook" to catch files
-that cannot be processed.  `ENDFILE' exists for symmetry, and because
-it provides an easy way to do per-file cleanup processing.
+   ---------- Footnotes ----------
+
+   (1) This is the traditional usage. The POSIX usage is different, but
+not relevant for what the program aims to demonstrate.
 
 
-File: gawk.info,  Node: Rewind Function,  Next: File Checking,  Prev: Filetrans Function,  Up: Data File Management
+File: gawk.info,  Node: Tee Program,  Next: Uniq Program,  Prev: Split Program,  Up: Clones
 
-12.3.2 Rereading the Current File
----------------------------------
+11.2.5 Duplicating Output into Multiple Files
+---------------------------------------------
 
-Another request for a new built-in function was for a `rewind()'
-function that would make it possible to reread the current file.  The
-requesting user didn't want to have to use `getline' (*note Getline::)
-inside a loop.
+The `tee' program is known as a "pipe fitting."  `tee' copies its
+standard input to its standard output and also duplicates it to the
+files named on the command line.  Its usage is as follows:
 
-   However, as long as you are not in the `END' rule, it is quite easy
-to arrange to immediately close the current input file and then start
-over with it from the top.  For lack of a better name, we'll call it
-`rewind()':
+     tee [-a] file ...
 
-     # rewind.awk --- rewind the current file and start over
+   The `-a' option tells `tee' to append to the named files, instead of
+truncating them and starting over.
 
-     function rewind(    i)
-     {
-         # shift remaining arguments up
-         for (i = ARGC; i > ARGIND; i--)
-             ARGV[i] = ARGV[i-1]
+   The `BEGIN' rule first makes a copy of all the command-line arguments
+into an array named `copy'.  `ARGV[0]' is not copied, since it is not
+needed.  `tee' cannot use `ARGV' directly, since `awk' attempts to
+process each file name in `ARGV' as input data.
 
-         # make sure gawk knows to keep going
-         ARGC++
+   If the first argument is `-a', then the flag variable `append' is
+set to true, and both `ARGV[1]' and `copy[1]' are deleted. If `ARGC' is
+less than two, then no file names were supplied and `tee' prints a
+usage message and exits.  Finally, `awk' is forced to read the standard
+input by setting `ARGV[1]' to `"-"' and `ARGC' to two:
 
-         # make current file next to get done
-         ARGV[ARGIND+1] = FILENAME
+     # tee.awk --- tee in awk
+     #
+     # Copy standard input to all named output files.
+     # Append content if -a option is supplied.
+     #
+     BEGIN    \
+     {
+         for (i = 1; i < ARGC; i++)
+             copy[i] = ARGV[i]
 
-         # do it
-         nextfile
+         if (ARGV[1] == "-a") {
+             append = 1
+             delete ARGV[1]
+             delete copy[1]
+             ARGC--
+         }
+         if (ARGC < 2) {
+             print "usage: tee [-a] file ..." > "/dev/stderr"
+             exit 1
+         }
+         ARGV[1] = "-"
+         ARGC = 2
      }
 
-   This code relies on the `ARGIND' variable (*note Auto-set::), which
-is specific to `gawk'.  If you are not using `gawk', you can use ideas
-presented in *note Filetrans Function::, to either update `ARGIND' on
-your own or modify this code as appropriate.
+   The following single rule does all the work.  Since there is no
+pattern, it is executed for each line of input.  The body of the rule
+simply prints the line into each file on the command line, and then to
+the standard output:
 
-   The `rewind()' function also relies on the `nextfile' keyword (*note
-Nextfile Statement::).
+     {
+         # moving the if outside the loop makes it run faster
+         if (append)
+             for (i in copy)
+                 print >> copy[i]
+         else
+             for (i in copy)
+                 print > copy[i]
+         print
+     }
 
-
-File: gawk.info,  Node: File Checking,  Next: Empty Files,  Prev: Rewind Function,  Up: Data File Management
+It is also possible to write the loop this way:
 
-12.3.3 Checking for Readable Data Files
----------------------------------------
+     for (i in copy)
+         if (append)
+             print >> copy[i]
+         else
+             print > copy[i]
 
-Normally, if you give `awk' a data file that isn't readable, it stops
-with a fatal error.  There are times when you might want to just ignore
-such files and keep going.  You can do this by prepending the following
-program to your `awk' program:
+This is more concise but it is also less efficient.  The `if' is tested
+for each record and for each output file.  By duplicating the loop
+body, the `if' is only tested once for each input record.  If there are
+N input records and M output files, the first method only executes N
+`if' statements, while the second executes N`*'M `if' statements.
 
-     # readable.awk --- library file to skip over unreadable files
+   Finally, the `END' rule cleans up by closing all the output files:
 
-     BEGIN {
-         for (i = 1; i < ARGC; i++) {
-             if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
-                 || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
-                 continue    # assignment or standard input
-             else if ((getline junk < ARGV[i]) < 0) # unreadable
-                 delete ARGV[i]
-             else
-                 close(ARGV[i])
-         }
+     END    \
+     {
+         for (i in copy)
+             close(copy[i])
      }
 
-   This works, because the `getline' won't be fatal.  Removing the
-element from `ARGV' with `delete' skips the file (since it's no longer
-in the list).  See also *note ARGC and ARGV::.
-
 
-File: gawk.info,  Node: Empty Files,  Next: Ignoring Assigns,  Prev: File Checking,  Up: Data File Management
-
-12.3.4 Checking For Zero-length Files
--------------------------------------
-
-All known `awk' implementations silently skip over zero-length files.
-This is a by-product of `awk''s implicit
-read-a-record-and-match-against-the-rules loop: when `awk' tries to
-read a record from an empty file, it immediately receives an end of
-file indication, closes the file, and proceeds on to the next
-command-line data file, _without_ executing any user-level `awk'
-program code.
+File: gawk.info,  Node: Uniq Program,  Next: Wc Program,  Prev: Tee Program,  Up: Clones
 
-   Using `gawk''s `ARGIND' variable (*note Built-in Variables::), it is
-possible to detect when an empty data file has been skipped.  Similar
-to the library file presented in *note Filetrans Function::, the
-following library file calls a function named `zerofile()' that the
-user must provide.  The arguments passed are the file name and the
-position in `ARGV' where it was found:
+11.2.6 Printing Nonduplicated Lines of Text
+-------------------------------------------
 
-     # zerofile.awk --- library file to process empty input files
+The `uniq' utility reads sorted lines of data on its standard input,
+and by default removes duplicate lines.  In other words, it only prints
+unique lines--hence the name.  `uniq' has a number of options. The
+usage is as follows:
 
-     BEGIN { Argind = 0 }
+     uniq [-udc [-N]] [+N] [ INPUT FILE [ OUTPUT FILE ]]
 
-     ARGIND > Argind + 1 {
-         for (Argind++; Argind < ARGIND; Argind++)
-             zerofile(ARGV[Argind], Argind)
-     }
+   The options for `uniq' are:
 
-     ARGIND != Argind { Argind = ARGIND }
+`-d'
+     Print only repeated lines.
 
-     END {
-         if (ARGIND > Argind)
-             for (Argind++; Argind <= ARGIND; Argind++)
-                 zerofile(ARGV[Argind], Argind)
-     }
+`-u'
+     Print only nonrepeated lines.
 
-   The user-level variable `Argind' allows the `awk' program to track
-its progress through `ARGV'.  Whenever the program detects that
-`ARGIND' is greater than `Argind + 1', it means that one or more empty
-files were skipped.  The action then calls `zerofile()' for each such
-file, incrementing `Argind' along the way.
+`-c'
+     Count lines. This option overrides `-d' and `-u'.  Both repeated
+     and nonrepeated lines are counted.
 
-   The `Argind != ARGIND' rule simply keeps `Argind' up to date in the
-normal case.
+`-N'
+     Skip N fields before comparing lines.  The definition of fields is
+     similar to `awk''s default: nonwhitespace characters separated by
+     runs of spaces and/or TABs.
 
-   Finally, the `END' rule catches the case of any empty files at the
-end of the command-line arguments.  Note that the test in the condition
-of the `for' loop uses the `<=' operator, not `<'.
+`+N'
+     Skip N characters before comparing lines.  Any fields specified
+     with `-N' are skipped first.
 
-   As an exercise, you might consider whether this same problem can be
-solved without relying on `gawk''s `ARGIND' variable.
+`INPUT FILE'
+     Data is read from the input file named on the command line,
+     instead of from the standard input.
 
-   As a second exercise, revise this code to handle the case where an
-intervening value in `ARGV' is a variable assignment.
+`OUTPUT FILE'
+     The generated output is sent to the named output file, instead of
+     to the standard output.
 
-
-File: gawk.info,  Node: Ignoring Assigns,  Prev: Empty Files,  Up: Data File Management
+   Normally `uniq' behaves as if both the `-d' and `-u' options are
+provided.
 
-12.3.5 Treating Assignments as File Names
------------------------------------------
+   `uniq' uses the `getopt()' library function (*note Getopt Function::)
+and the `join()' library function (*note Join Function::).
 
-Occasionally, you might not want `awk' to process command-line variable
-assignments (*note Assignment Options::).  In particular, if you have a
-file name that contain an `=' character, `awk' treats the file name as
-an assignment, and does not process it.
+   The program begins with a `usage()' function and then a brief
+outline of the options and their meanings in comments.  The `BEGIN'
+rule deals with the command-line arguments and options. It uses a trick
+to get `getopt()' to handle options of the form `-25', treating such an
+option as the option letter `2' with an argument of `5'. If indeed two
+or more digits are supplied (`Optarg' looks like a number), `Optarg' is
+concatenated with the option digit and then the result is added to zero
+to make it into a number.  If there is only one digit in the option,
+then `Optarg' is not needed. In this case, `Optind' must be decremented
+so that `getopt()' processes it next time.  This code is admittedly a
+bit tricky.
 
-   Some users have suggested an additional command-line option for
-`gawk' to disable command-line assignments.  However, some simple
-programming with a library file does the trick:
+   If no options are supplied, then the default is taken, to print both
+repeated and nonrepeated lines.  The output file, if provided, is
+assigned to `outputfile'.  Early on, `outputfile' is initialized to the
+standard output, `/dev/stdout':
 
-     # noassign.awk --- library file to avoid the need for a
-     # special option that disables command-line assignments
+     # uniq.awk --- do uniq in awk
+     #
+     # Requires getopt() and join() library functions
 
-     function disable_assigns(argc, argv,    i)
+     function usage(    e)
      {
-         for (i = 1; i < argc; i++)
-             if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
-                 argv[i] = ("./" argv[i])
-     }
-
-     BEGIN {
-         if (No_command_assign)
-             disable_assigns(ARGC, ARGV)
+         e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
+         print e > "/dev/stderr"
+         exit 1
      }
 
-   You then run your program this way:
+     # -c    count lines. overrides -d and -u
+     # -d    only repeated lines
+     # -u    only nonrepeated lines
+     # -n    skip n fields
+     # +n    skip n characters, skip fields first
 
-     awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
+     BEGIN   \
+     {
+         count = 1
+         outputfile = "/dev/stdout"
+         opts = "udc0:1:2:3:4:5:6:7:8:9:"
+         while ((c = getopt(ARGC, ARGV, opts)) != -1) {
+             if (c == "u")
+                 non_repeated_only++
+             else if (c == "d")
+                 repeated_only++
+             else if (c == "c")
+                 do_count++
+             else if (index("0123456789", c) != 0) {
+                 # getopt requires args to options
+                 # this messes us up for things like -5
+                 if (Optarg ~ /^[[:digit:]]+$/)
+                     fcount = (c Optarg) + 0
+                 else {
+                     fcount = c + 0
+                     Optind--
+                 }
+             } else
+                 usage()
+         }
 
-   The function works by looping through the arguments.  It prepends
-`./' to any argument that matches the form of a variable assignment,
-turning that argument into a file name.
+         if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) {
+             charcount = substr(ARGV[Optind], 2) + 0
+             Optind++
+         }
 
-   The use of `No_command_assign' allows you to disable command-line
-assignments at invocation time, by giving the variable a true value.
-When not set, it is initially zero (i.e., false), so the command-line
-arguments are left alone.
+         for (i = 1; i < Optind; i++)
+             ARGV[i] = ""
 
-
-File: gawk.info,  Node: Getopt Function,  Next: Passwd Functions,  Prev: Data File Management,  Up: Library Functions
+         if (repeated_only == 0 && non_repeated_only == 0)
+             repeated_only = non_repeated_only = 1
 
-12.4 Processing Command-Line Options
-====================================
+         if (ARGC - Optind == 2) {
+             outputfile = ARGV[ARGC - 1]
+             ARGV[ARGC - 1] = ""
+         }
+     }
 
-Most utilities on POSIX compatible systems take options on the command
-line that can be used to change the way a program behaves.  `awk' is an
-example of such a program (*note Options::).  Often, options take
-"arguments"; i.e., data that the program needs to correctly obey the
-command-line option.  For example, `awk''s `-F' option requires a
-string to use as the field separator.  The first occurrence on the
-command line of either `--' or a string that does not begin with `-'
-ends the options.
+   The following function, `are_equal()', compares the current line,
+`$0', to the previous line, `last'.  It handles skipping fields and
+characters.  If no field count and no character count are specified,
+`are_equal()' simply returns one or zero depending upon the result of a
+simple string comparison of `last' and `$0'.  Otherwise, things get more
+complicated.  If fields have to be skipped, each line is broken into an
+array using `split()' (*note String Functions::); the desired fields
+are then joined back into a line using `join()'.  The joined lines are
+stored in `clast' and `cline'.  If no fields are skipped, `clast' and
+`cline' are set to `last' and `$0', respectively.  Finally, if
+characters are skipped, `substr()' is used to strip off the leading
+`charcount' characters in `clast' and `cline'.  The two strings are
+then compared and `are_equal()' returns the result:
 
-   Modern Unix systems provide a C function named `getopt()' for
-processing command-line arguments.  The programmer provides a string
-describing the one-letter options. If an option requires an argument,
-it is followed in the string with a colon.  `getopt()' is also passed
-the count and values of the command-line arguments and is called in a
-loop.  `getopt()' processes the command-line arguments for option
-letters.  Each time around the loop, it returns a single character
-representing the next option letter that it finds, or `?' if it finds
-an invalid option.  When it returns -1, there are no options left on
-the command line.
+     function are_equal(    n, m, clast, cline, alast, aline)
+     {
+         if (fcount == 0 && charcount == 0)
+             return (last == $0)
 
-   When using `getopt()', options that do not take arguments can be
-grouped together.  Furthermore, options that take arguments require
-that the argument be present.  The argument can immediately follow the
-option letter, or it can be a separate command-line argument.
+         if (fcount > 0) {
+             n = split(last, alast)
+             m = split($0, aline)
+             clast = join(alast, fcount+1, n)
+             cline = join(aline, fcount+1, m)
+         } else {
+             clast = last
+             cline = $0
+         }
+         if (charcount) {
+             clast = substr(clast, charcount + 1)
+             cline = substr(cline, charcount + 1)
+         }
 
-   Given a hypothetical program that takes three command-line options,
-`-a', `-b', and `-c', where `-b' requires an argument, all of the
-following are valid ways of invoking the program:
+         return (clast == cline)
+     }
 
-     prog -a -b foo -c data1 data2 data3
-     prog -ac -bfoo -- data1 data2 data3
-     prog -acbfoo data1 data2 data3
+   The following two rules are the body of the program.  The first one
+is executed only for the very first line of data.  It sets `last' equal
+to `$0', so that subsequent lines of text have something to be compared
+to.
 
-   Notice that when the argument is grouped with its option, the rest of
-the argument is considered to be the option's argument.  In this
-example, `-acbfoo' indicates that all of the `-a', `-b', and `-c'
-options were supplied, and that `foo' is the argument to the `-b'
-option.
-
-   `getopt()' provides four external variables that the programmer can
-use:
-
-`optind'
-     The index in the argument value array (`argv') where the first
-     nonoption command-line argument can be found.
-
-`optarg'
-     The string value of the argument to an option.
+   The second rule does the work. The variable `equal' is one or zero,
+depending upon the results of `are_equal()''s comparison. If `uniq' is
+counting repeated lines, and the lines are equal, then it increments
+the `count' variable.  Otherwise, it prints the line and resets `count',
+since the two lines are not equal.
 
-`opterr'
-     Usually `getopt()' prints an error message when it finds an invalid
-     option.  Setting `opterr' to zero disables this feature.  (An
-     application might want to print its own error message.)
+   If `uniq' is not counting, and if the lines are equal, `count' is
+incremented.  Nothing is printed, since the point is to remove
+duplicates.  Otherwise, if `uniq' is counting repeated lines and more
+than one line is seen, or if `uniq' is counting nonrepeated lines and
+only one line is seen, then the line is printed, and `count' is reset.
 
-`optopt'
-     The letter representing the command-line option.
+   Finally, similar logic is used in the `END' rule to print the final
+line of input data:
 
-   The following C fragment shows how `getopt()' might process
-command-line arguments for `awk':
+     NR == 1 {
+         last = $0
+         next
+     }
 
-     int
-     main(int argc, char *argv[])
      {
-         ...
-         /* print our own message */
-         opterr = 0;
-         while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
-             switch (c) {
-             case 'f':    /* file */
-                 ...
-                 break;
-             case 'F':    /* field separator */
-                 ...
-                 break;
-             case 'v':    /* variable assignment */
-                 ...
-                 break;
-             case 'W':    /* extension */
-                 ...
-                 break;
-             case '?':
-             default:
-                 usage();
-                 break;
+         equal = are_equal()
+
+         if (do_count) {    # overrides -d and -u
+             if (equal)
+                 count++
+             else {
+                 printf("%4d %s\n", count, last) > outputfile
+                 last = $0
+                 count = 1    # reset
              }
+             next
+         }
+
+         if (equal)
+             count++
+         else {
+             if ((repeated_only && count > 1) ||
+                 (non_repeated_only && count == 1))
+                     print last > outputfile
+             last = $0
+             count = 1
          }
-         ...
      }
 
-   As a side point, `gawk' actually uses the GNU `getopt_long()'
-function to process both normal and GNU-style long options (*note
-Options::).
+     END {
+         if (do_count)
+             printf("%4d %s\n", count, last) > outputfile
+         else if ((repeated_only && count > 1) ||
+                 (non_repeated_only && count == 1))
+             print last > outputfile
+         close(outputfile)
+     }
 
-   The abstraction provided by `getopt()' is very useful and is quite
-handy in `awk' programs as well.  Following is an `awk' version of
-`getopt()'.  This function highlights one of the greatest weaknesses in
-`awk', which is that it is very poor at manipulating single characters.
-Repeated calls to `substr()' are necessary for accessing individual
-characters (*note String Functions::).(1)
+
+File: gawk.info,  Node: Wc Program,  Prev: Uniq Program,  Up: Clones
 
-   The discussion that follows walks through the code a bit at a time:
+11.2.7 Counting Things
+----------------------
 
-     # getopt.awk --- Do C library getopt(3) function in awk
+The `wc' (word count) utility counts lines, words, and characters in
+one or more input files. Its usage is as follows:
 
-     # External variables:
-     #    Optind -- index in ARGV of first nonoption argument
-     #    Optarg -- string value of argument to current option
-     #    Opterr -- if nonzero, print our own diagnostic
-     #    Optopt -- current option letter
+     wc [-lwc] [ FILES ... ]
 
-     # Returns:
-     #    -1     at end of options
-     #    "?"    for unrecognized option
-     #    <c>    a character representing the current option
+   If no files are specified on the command line, `wc' reads its
+standard input. If there are multiple files, it also prints total
+counts for all the files.  The options and their meanings are shown in
+the following list:
 
-     # Private Data:
-     #    _opti  -- index in multi-flag option, e.g., -abc
+`-l'
+     Count only lines.
 
-   The function starts out with comments presenting a list of the
-global variables it uses, what the return values are, what they mean,
-and any global variables that are "private" to this library function.
-Such documentation is essential for any program, and particularly for
-library functions.
+`-w'
+     Count only words.  A "word" is a contiguous sequence of
+     nonwhitespace characters, separated by spaces and/or TABs.
+     Luckily, this is the normal way `awk' separates fields in its
+     input data.
 
-   The `getopt()' function first checks that it was indeed called with
-a string of options (the `options' parameter).  If `options' has a zero
-length, `getopt()' immediately returns -1:
+`-c'
+     Count only characters.
 
-     function getopt(argc, argv, options,    thisopt, i)
-     {
-         if (length(options) == 0)    # no options given
-             return -1
+   Implementing `wc' in `awk' is particularly elegant, since `awk' does
+a lot of the work for us; it splits lines into words (i.e., fields) and
+counts them, it counts lines (i.e., records), and it can easily tell us
+how long a line is.
 
-         if (argv[Optind] == "--") {  # all done
-             Optind++
-             _opti = 0
-             return -1
-         } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
-             _opti = 0
-             return -1
-         }
+   This program uses the `getopt()' library function (*note Getopt
+Function::) and the file-transition functions (*note Filetrans
+Function::).
 
-   The next thing to check for is the end of the options.  A `--' ends
-the command-line options, as does any command-line argument that does
-not begin with a `-'.  `Optind' is used to step through the array of
-command-line arguments; it retains its value across calls to
-`getopt()', because it is a global variable.
+   This version has one notable difference from traditional versions of
+`wc': it always prints the counts in the order lines, words, and
+characters.  Traditional versions note the order of the `-l', `-w', and
+`-c' options on the command line, and print the counts in that order.
 
-   The regular expression that is used, `/^-[^:[:space:]/', checks for
-a `-' followed by anything that is not whitespace and not a colon.  If
-the current command-line argument does not match this pattern, it is
-not an option, and it ends option processing. Continuing on:
+   The `BEGIN' rule does the argument processing.  The variable
+`print_total' is true if more than one file is named on the command
+line:
 
-         if (_opti == 0)
-             _opti = 2
-         thisopt = substr(argv[Optind], _opti, 1)
-         Optopt = thisopt
-         i = index(options, thisopt)
-         if (i == 0) {
-             if (Opterr)
-                 printf("%c -- invalid option\n",
-                                       thisopt) > "/dev/stderr"
-             if (_opti >= length(argv[Optind])) {
-                 Optind++
-                 _opti = 0
-             } else
-                 _opti++
-             return "?"
-         }
+     # wc.awk --- count lines, words, characters
 
-   The `_opti' variable tracks the position in the current command-line
-argument (`argv[Optind]').  If multiple options are grouped together
-with one `-' (e.g., `-abx'), it is necessary to return them to the user
-one at a time.
+     # Options:
+     #    -l    only count lines
+     #    -w    only count words
+     #    -c    only count characters
+     #
+     # Default is to count lines, words, characters
+     #
+     # Requires getopt() and file transition library functions
 
-   If `_opti' is equal to zero, it is set to two, which is the index in
-the string of the next character to look at (we skip the `-', which is
-at position one).  The variable `thisopt' holds the character, obtained
-with `substr()'.  It is saved in `Optopt' for the main program to use.
+     BEGIN {
+         # let getopt() print a message about
+         # invalid options. we ignore them
+         while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
+             if (c == "l")
+                 do_lines = 1
+             else if (c == "w")
+                 do_words = 1
+             else if (c == "c")
+                 do_chars = 1
+         }
+         for (i = 1; i < Optind; i++)
+             ARGV[i] = ""
 
-   If `thisopt' is not in the `options' string, then it is an invalid
-option.  If `Opterr' is nonzero, `getopt()' prints an error message on
-the standard error that is similar to the message from the C version of
-`getopt()'.
+         # if no options, do all
+         if (! do_lines && ! do_words && ! do_chars)
+             do_lines = do_words = do_chars = 1
 
-   Because the option is invalid, it is necessary to skip it and move
-on to the next option character.  If `_opti' is greater than or equal
-to the length of the current command-line argument, it is necessary to
-move on to the next argument, so `Optind' is incremented and `_opti' is
-reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
-incremented.
+         print_total = (ARGC - i > 2)
+     }
 
-   In any case, because the option is invalid, `getopt()' returns `"?"'.
-The main program can examine `Optopt' if it needs to know what the
-invalid option letter actually is. Continuing on:
+   The `beginfile()' function is simple; it just resets the counts of
+lines, words, and characters to zero, and saves the current file name in
+`fname':
 
-         if (substr(options, i + 1, 1) == ":") {
-             # get option argument
-             if (length(substr(argv[Optind], _opti + 1)) > 0)
-                 Optarg = substr(argv[Optind], _opti + 1)
-             else
-                 Optarg = argv[++Optind]
-             _opti = 0
-         } else
-             Optarg = ""
+     function beginfile(file)
+     {
+         lines = words = chars = 0
+         fname = FILENAME
+     }
 
-   If the option requires an argument, the option letter is followed by
-a colon in the `options' string.  If there are remaining characters in
-the current command-line argument (`argv[Optind]'), then the rest of
-that string is assigned to `Optarg'.  Otherwise, the next command-line
-argument is used (`-xFOO' versus `-x FOO'). In either case, `_opti' is
-reset to zero, because there are no more characters left to examine in
-the current command-line argument. Continuing:
+   The `endfile()' function adds the current file's numbers to the
+running totals of lines, words, and characters.(1)  It then prints out
+those numbers for the file that was just read. It relies on
+`beginfile()' to reset the numbers for the following data file:
 
-         if (_opti == 0 || _opti >= length(argv[Optind])) {
-             Optind++
-             _opti = 0
-         } else
-             _opti++
-         return thisopt
+     function endfile(file)
+     {
+         tlines += lines
+         twords += words
+         tchars += chars
+         if (do_lines)
+             printf "\t%d", lines
+         if (do_words)
+             printf "\t%d", words
+         if (do_chars)
+             printf "\t%d", chars
+         printf "\t%s\n", fname
      }
 
-   Finally, if `_opti' is either zero or greater than the length of the
-current command-line argument, it means this element in `argv' is
-through being processed, so `Optind' is incremented to point to the
-next element in `argv'.  If neither condition is true, then only
-`_opti' is incremented, so that the next option letter can be processed
-on the next call to `getopt()'.
+   There is one rule that is executed for each line. It adds the length
+of the record, plus one, to `chars'.(2) Adding one plus the record
+length is needed because the newline character separating records (the
+value of `RS') is not part of the record itself, and thus not included
+in its length.  Next, `lines' is incremented for each line read, and
+`words' is incremented by the value of `NF', which is the number of
+"words" on this line:
 
-   The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
-`Opterr' is set to one, since the default behavior is for `getopt()' to
-print a diagnostic message upon seeing an invalid option.  `Optind' is
-set to one, since there's no reason to look at the program name, which
-is in `ARGV[0]':
+     # do per line
+     {
+         chars += length($0) + 1    # get newline
+         lines++
+         words += NF
+     }
 
-     BEGIN {
-         Opterr = 1    # default is to diagnose
-         Optind = 1    # skip ARGV[0]
+   Finally, the `END' rule simply prints the totals for all the files:
 
-         # test program
-         if (_getopt_test) {
-             while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
-                 printf("c = <%c>, optarg = <%s>\n",
-                                            _go_c, Optarg)
-             printf("non-option arguments:\n")
-             for (; Optind < ARGC; Optind++)
-                 printf("\tARGV[%d] = <%s>\n",
-                                         Optind, ARGV[Optind])
+     END {
+         if (print_total) {
+             if (do_lines)
+                 printf "\t%d", tlines
+             if (do_words)
+                 printf "\t%d", twords
+             if (do_chars)
+                 printf "\t%d", tchars
+             print "\ttotal"
          }
      }
 
-   The rest of the `BEGIN' rule is a simple test program.  Here is the
-result of two sample runs of the test program:
+   ---------- Footnotes ----------
 
-     $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
-     -| c = <a>, optarg = <>
-     -| c = <c>, optarg = <>
-     -| c = <b>, optarg = <ARG>
-     -| non-option arguments:
-     -|         ARGV[3] = <bax>
-     -|         ARGV[4] = <-x>
+   (1) `wc' can't just use the value of `FNR' in `endfile()'. If you
+examine the code in *note Filetrans Function::, you will see that `FNR'
+has already been reset by the time `endfile()' is called.
 
-     $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
-     -| c = <a>, optarg = <>
-     error--> x -- invalid option
-     -| c = <?>, optarg = <>
-     -| non-option arguments:
-     -|         ARGV[4] = <xyz>
-     -|         ARGV[5] = <abc>
+   (2) Since `gawk' understands multibyte locales, this code counts
+characters, not bytes.
 
-   In both runs, the first `--' terminates the arguments to `awk', so
-that it does not try to interpret the `-a', etc., as its own options.
+
+File: gawk.info,  Node: Miscellaneous Programs,  Prev: Clones,  Up: Sample Programs
 
-     NOTE: After `getopt()' is through, it is the responsibility of the
-     user level code to clear out all the elements of `ARGV' from 1 to
-     `Optind', so that `awk' does not try to process the command-line
-     options as file names.
+11.3 A Grab Bag of `awk' Programs
+=================================
 
-   Several of the sample programs presented in *note Sample Programs::,
-use `getopt()' to process their arguments.
+This minor node is a large "grab bag" of miscellaneous programs.  We
+hope you find them both interesting and enjoyable.
 
-   ---------- Footnotes ----------
+* Menu:
 
-   (1) This function was written before `gawk' acquired the ability to
-split strings into single characters using `""' as the separator.  We
-have left it alone, since using `substr()' is more portable.
+* Dupword Program::             Finding duplicated words in a document.
+* Alarm Program::               An alarm clock.
+* Translate Program::           A program similar to the `tr' utility.
+* Labels Program::              Printing mailing labels.
+* Word Sorting::                A program to produce a word usage count.
+* History Sorting::             Eliminating duplicate entries from a history
+                                file.
+* Extract Program::             Pulling out programs from Texinfo source
+                                files.
+* Simple Sed::                  A Simple Stream Editor.
+* Igawk Program::               A wrapper for `awk' that includes
+                                files.
+* Anagram Program::             Finding anagrams from a dictionary.
+* Signature Program::           People do amazing things with too much time on
+                                their hands.
 
 
-File: gawk.info,  Node: Passwd Functions,  Next: Group Functions,  Prev: Getopt Function,  Up: Library Functions
+File: gawk.info,  Node: Dupword Program,  Next: Alarm Program,  Up: Miscellaneous Programs
 
-12.5 Reading the User Database
-==============================
+11.3.1 Finding Duplicated Words in a Document
+---------------------------------------------
 
-The `PROCINFO' array (*note Built-in Variables::) provides access to
-the current user's real and effective user and group ID numbers, and if
-available, the user's supplementary group set.  However, because these
-are numbers, they do not provide very useful information to the average
-user.  There needs to be some way to find the user information
-associated with the user and group ID numbers.  This minor node
-presents a suite of functions for retrieving information from the user
-database.  *Note Group Functions::, for a similar suite that retrieves
-information from the group database.
+A common error when writing large amounts of prose is to accidentally
+duplicate words.  Typically you will see this in text as something like
+"the the program does the following..."  When the text is online, often
+the duplicated words occur at the end of one line and the beginning of
+another, making them very difficult to spot.
 
-   The POSIX standard does not define the file where user information is
-kept.  Instead, it provides the `<pwd.h>' header file and several C
-language subroutines for obtaining user information.  The primary
-function is `getpwent()', for "get password entry."  The "password"
-comes from the original user database file, `/etc/passwd', which stores
-user information, along with the encrypted passwords (hence the name).
+   This program, `dupword.awk', scans through a file one line at a time
+and looks for adjacent occurrences of the same word.  It also saves the
+last word on a line (in the variable `prev') for comparison with the
+first word on the next line.
 
-   While an `awk' program could simply read `/etc/passwd' directly,
-this file may not contain complete information about the system's set
-of users.(1) To be sure you are able to produce a readable and complete
-version of the user database, it is necessary to write a small C
-program that calls `getpwent()'.  `getpwent()' is defined as returning
-a pointer to a `struct passwd'.  Each time it is called, it returns the
-next entry in the database.  When there are no more entries, it returns
-`NULL', the null pointer.  When this happens, the C program should call
-`endpwent()' to close the database.  Following is `pwcat', a C program
-that "cats" the password database:
+   The first two statements make sure that the line is all lowercase,
+so that, for example, "The" and "the" compare equal to each other.  The
+next statement replaces nonalphanumeric and nonwhitespace characters
+with spaces, so that punctuation does not affect the comparison either.
+The characters are replaced with spaces so that formatting controls
+don't create nonsense words (e.g., the Texinfo `@code{NF}' becomes
+`codeNF' if punctuation is simply deleted).  The record is then resplit
+into fields, yielding just the actual words on the line, and ensuring
+that there are no empty fields.
 
-     /*
-      * pwcat.c
-      *
-      * Generate a printable version of the password database
-      */
-     #include <stdio.h>
-     #include <pwd.h>
+   If there are no fields left after removing all the punctuation, the
+current record is skipped.  Otherwise, the program loops through each
+word, comparing it to the previous one:
 
-     int
-     main(int argc, char **argv)
+     # dupword.awk --- find duplicate words in text
      {
-         struct passwd *p;
-
-         while ((p = getpwent()) != NULL)
-             printf("%s:%s:%ld:%ld:%s:%s:%s\n",
-                 p->pw_name, p->pw_passwd, (long) p->pw_uid,
-                 (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
-
-         endpwent();
-         return 0;
+         $0 = tolower($0)
+         gsub(/[^[:alnum:][:blank:]]/, " ");
+         $0 = $0         # re-split
+         if (NF == 0)
+             next
+         if ($1 == prev)
+             printf("%s:%d: duplicate %s\n",
+                 FILENAME, FNR, $1)
+         for (i = 2; i <= NF; i++)
+             if ($i == $(i-1))
+                 printf("%s:%d: duplicate %s\n",
+                     FILENAME, FNR, $i)
+         prev = $NF
      }
 
-   If you don't understand C, don't worry about it.  The output from
-`pwcat' is the user database, in the traditional `/etc/passwd' format
-of colon-separated fields.  The fields are:
-
-Login name
-     The user's login name.
-
-Encrypted password
-     The user's encrypted password.  This may not be available on some
-     systems.
-
-User-ID
-     The user's numeric user ID number.  (On some systems it's a C
-     `long', and not an `int'.  Thus we cast it to `long' for all
-     cases.)
-
-Group-ID
-     The user's numeric group ID number.  (Similar comments about
-     `long' vs. `int' apply here.)
-
-Full name
-     The user's full name, and perhaps other information associated
-     with the user.
-
-Home directory
-     The user's login (or "home") directory (familiar to shell
-     programmers as `$HOME').
+
+File: gawk.info,  Node: Alarm Program,  Next: Translate Program,  Prev: Dupword Program,  Up: Miscellaneous Programs
 
-Login shell
-     The program that is run when the user logs in.  This is usually a
-     shell, such as Bash.
+11.3.2 An Alarm Clock Program
+-----------------------------
 
-   A few lines representative of `pwcat''s output are as follows:
+     Nothing cures insomnia like a ringing alarm clock.
+     Arnold Robbins
 
-     $ pwcat
-     -| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
-     -| nobody:*:65534:65534::/:
-     -| daemon:*:1:1::/:
-     -| sys:*:2:2::/:/bin/csh
-     -| bin:*:3:3::/bin:
-     -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
-     -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
-     -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
-     ...
+   The following program is a simple "alarm clock" program.  You give
+it a time of day and an optional message.  At the specified time, it
+prints the message on the standard output. In addition, you can give it
+the number of times to repeat the message as well as a delay between
+repetitions.
 
-   With that introduction, following is a group of functions for
-getting user information.  There are several functions here,
-corresponding to the C functions of the same names:
+   This program uses the `getlocaltime()' function from *note
+Getlocaltime Function::.
 
-     # passwd.awk --- access password file information
+   All the work is done in the `BEGIN' rule.  The first part is argument
+checking and setting of defaults: the delay, the count, and the message
+to print.  If the user supplied a message without the ASCII BEL
+character (known as the "alert" character, `"\a"'), then it is added to
+the message.  (On many systems, printing the ASCII BEL generates an
+audible alert. Thus when the alarm goes off, the system calls attention
+to itself in case the user is not looking at the computer.)  Just for a
+change, this program uses a `switch' statement (*note Switch
+Statement::), but the processing could be done with a series of
+`if'-`else' statements instead.  Here is the program:
 
-     BEGIN {
-         # tailor this to suit your system
-         _pw_awklib = "/usr/local/libexec/awk/"
-     }
+     # alarm.awk --- set an alarm
+     #
+     # Requires getlocaltime() library function
+     # usage: alarm time [ "message" [ count [ delay ] ] ]
 
-     function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+     BEGIN    \
      {
-         if (_pw_inited)
-             return
-
-         oldfs = FS
-         oldrs = RS
-         olddol0 = $0
-         using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
-         using_fpat = (PROCINFO["FS"] == "FPAT")
-         FS = ":"
-         RS = "\n"
+         # Initial argument sanity checking
+         usage1 = "usage: alarm time ['message' [count [delay]]]"
+         usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
 
-         pwcat = _pw_awklib "pwcat"
-         while ((pwcat | getline) > 0) {
-             _pw_byname[$1] = $0
-             _pw_byuid[$3] = $0
-             _pw_bycount[++_pw_total] = $0
+         if (ARGC < 2) {
+             print usage1 > "/dev/stderr"
+             print usage2 > "/dev/stderr"
+             exit 1
+         }
+         switch (ARGC) {
+         case 5:
+             delay = ARGV[4] + 0
+             # fall through
+         case 4:
+             count = ARGV[3] + 0
+             # fall through
+         case 3:
+             message = ARGV[2]
+             break
+         default:
+             if (ARGV[1] !~ /[[:digit:]]?[[:digit:]]:[[:digit:]]{2}/) {
+                 print usage1 > "/dev/stderr"
+                 print usage2 > "/dev/stderr"
+                 exit 1
+             }
+             break
          }
-         close(pwcat)
-         _pw_count = 0
-         _pw_inited = 1
-         FS = oldfs
-         if (using_fw)
-             FIELDWIDTHS = FIELDWIDTHS
-         else if (using_fpat)
-             FPAT = FPAT
-         RS = oldrs
-         $0 = olddol0
-     }
-
-   The `BEGIN' rule sets a private variable to the directory where
-`pwcat' is stored.  Because it is used to help out an `awk' library
-routine, we have chosen to put it in `/usr/local/libexec/awk'; however,
-you might want it to be in a different directory on your system.
-
-   The function `_pw_init()' keeps three copies of the user information
-in three associative arrays.  The arrays are indexed by username
-(`_pw_byname'), by user ID number (`_pw_byuid'), and by order of
-occurrence (`_pw_bycount').  The variable `_pw_inited' is used for
-efficiency, since `_pw_init()' needs to be called only once.
-
-   Because this function uses `getline' to read information from
-`pwcat', it first saves the values of `FS', `RS', and `$0'.  It notes
-in the variable `using_fw' whether field splitting with `FIELDWIDTHS'
-is in effect or not.  Doing so is necessary, since these functions
-could be called from anywhere within a user's program, and the user may
-have his or her own way of splitting records and fields.
-
-   The `using_fw' variable checks `PROCINFO["FS"]', which is
-`"FIELDWIDTHS"' if field splitting is being done with `FIELDWIDTHS'.
-This makes it possible to restore the correct field-splitting mechanism
-later.  The test can only be true for `gawk'.  It is false if using
-`FS' or `FPAT', or on some other `awk' implementation.
-
-   The code that checks for using `FPAT', using `using_fpat' and
-`PROCINFO["FS"]' is similar.
-
-   The main part of the function uses a loop to read database lines,
-split the line into fields, and then store the line into each array as
-necessary.  When the loop is done, `_pw_init()' cleans up by closing
-the pipeline, setting `_pw_inited' to one, and restoring `FS' (and
-`FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0'.  The use of
-`_pw_count' is explained shortly.
-
-   The `getpwnam()' function takes a username as a string argument. If
-that user is in the database, it returns the appropriate line.
-Otherwise, it relies on the array reference to a nonexistent element to
-create the element with the null string as its value:
 
-     function getpwnam(name)
-     {
-         _pw_init()
-         return _pw_byname[name]
-     }
+         # set defaults for once we reach the desired time
+         if (delay == 0)
+             delay = 180    # 3 minutes
+         if (count == 0)
+             count = 5
+         if (message == "")
+             message = sprintf("\aIt is now %s!\a", ARGV[1])
+         else if (index(message, "\a") == 0)
+             message = "\a" message "\a"
 
-   Similarly, the `getpwuid' function takes a user ID number argument.
-If that user number is in the database, it returns the appropriate
-line. Otherwise, it returns the null string:
+   The next section of code turns the alarm time into hours and
+minutes, converts it (if necessary) to a 24-hour clock, and then turns
+that time into a count of the seconds since midnight.  Next it turns
+the current time into a count of seconds since midnight.  The
+difference between the two is how long to wait before setting off the
+alarm:
 
-     function getpwuid(uid)
-     {
-         _pw_init()
-         return _pw_byuid[uid]
-     }
+         # split up alarm time
+         split(ARGV[1], atime, ":")
+         hour = atime[1] + 0    # force numeric
+         minute = atime[2] + 0  # force numeric
 
-   The `getpwent()' function simply steps through the database, one
-entry at a time.  It uses `_pw_count' to track its current position in
-the `_pw_bycount' array:
+         # get current broken down time
+         getlocaltime(now)
 
-     function getpwent()
-     {
-         _pw_init()
-         if (_pw_count < _pw_total)
-             return _pw_bycount[++_pw_count]
-         return ""
-     }
+         # if time given is 12-hour hours and it's after that
+         # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
+         # then add 12 to real hour
+         if (hour < 12 && now["hour"] > hour)
+             hour += 12
 
-   The `endpwent()' function resets `_pw_count' to zero, so that
-subsequent calls to `getpwent()' start over again:
+         # set target time in seconds since midnight
+         target = (hour * 60 * 60) + (minute * 60)
 
-     function endpwent()
-     {
-         _pw_count = 0
-     }
+         # get current time in seconds since midnight
+         current = (now["hour"] * 60 * 60) + \
+                    (now["minute"] * 60) + now["second"]
 
-   A conscious design decision in this suite is that each subroutine
-calls `_pw_init()' to initialize the database arrays.  The overhead of
-running a separate process to generate the user database, and the I/O
-to scan it, are only incurred if the user's main program actually calls
-one of these functions.  If this library file is loaded along with a
-user's program, but none of the routines are ever called, then there is
-no extra runtime overhead.  (The alternative is to move the body of
-`_pw_init()' into a `BEGIN' rule, which always runs `pwcat'.  This
-simplifies the code but runs an extra process that may never be needed.)
+         # how long to sleep for
+         naptime = target - current
+         if (naptime <= 0) {
+             print "time is in the past!" > "/dev/stderr"
+             exit 1
+         }
 
-   In turn, calling `_pw_init()' is not too expensive, because the
-`_pw_inited' variable keeps the program from reading the data more than
-once.  If you are worried about squeezing every last cycle out of your
-`awk' program, the check of `_pw_inited' could be moved out of
-`_pw_init()' and duplicated in all the other functions.  In practice,
-this is not necessary, since most `awk' programs are I/O-bound, and
-such a change would clutter up the code.
+   Finally, the program uses the `system()' function (*note I/O
+Functions::) to call the `sleep' utility.  The `sleep' utility simply
+pauses for the given number of seconds.  If the exit status is not zero,
+the program assumes that `sleep' was interrupted and exits. If `sleep'
+exited with an OK status (zero), then the program prints the message in
+a loop, again using `sleep' to delay for however many seconds are
+necessary:
 
-   The `id' program in *note Id Program::, uses these functions.
+         # zzzzzz..... go away if interrupted
+         if (system(sprintf("sleep %d", naptime)) != 0)
+             exit 1
 
-   ---------- Footnotes ----------
+         # time to notify!
+         command = sprintf("sleep %d", delay)
+         for (i = 1; i <= count; i++) {
+             print message
+             # if sleep command interrupted, go away
+             if (system(command) != 0)
+                 break
+         }
 
-   (1) It is often the case that password information is stored in a
-network database.
+         exit 0
+     }
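
The target-time arithmetic described above can be tried on its own. The following is only a sketch, not part of the commit: `alarm_target' is a hypothetical shell wrapper, and the current hour is passed as an optional second argument (an assumption made here so the sketch does not need `getlocaltime()').

```shell
# alarm_target TIME [CURRENT_HOUR]
# Hypothetical helper (not in gawk): prints TIME ("hh:mm") as seconds
# since midnight, applying the same 12-hour adjustment as alarm.awk.
alarm_target() {
    awk -v t="$1" -v nowh="${2:-0}" 'BEGIN {
        split(t, atime, ":")
        hour = atime[1] + 0      # force numeric
        minute = atime[2] + 0    # force numeric

        # a 12-hour time that has already passed today means p.m.
        if (hour < 12 && nowh > hour)
            hour += 12

        print (hour * 60 * 60) + (minute * 60)
    }'
}
```

For example, `alarm_target 5:30 9' treats 5:30 as 17:30 because the current hour (9) is already past 5.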
 
 
-File: gawk.info,  Node: Group Functions,  Next: Walking Arrays,  Prev: Passwd Functions,  Up: Library Functions
-
-12.6 Reading the Group Database
-===============================
+File: gawk.info,  Node: Translate Program,  Next: Labels Program,  Prev: Alarm Program,  Up: Miscellaneous Programs
 
-Much of the discussion presented in *note Passwd Functions::, applies
-to the group database as well.  Although there has traditionally been a
-well-known file (`/etc/group') in a well-known format, the POSIX
-standard only provides a set of C library routines (`<grp.h>' and
-`getgrent()') for accessing the information.  Even though this file may
-exist, it may not have complete information.  Therefore, as with the
-user database, it is necessary to have a small C program that generates
-the group database as its output.  `grcat', a C program that "cats" the
-group database, is as follows:
+11.3.3 Transliterating Characters
+---------------------------------
 
-     /*
-      * grcat.c
-      *
-      * Generate a printable version of the group database
-      */
-     #include <stdio.h>
-     #include <grp.h>
+The system `tr' utility transliterates characters.  For example, it is
+often used to map uppercase letters into lowercase for further
+processing:
 
-     int
-     main(int argc, char **argv)
-     {
-         struct group *g;
-         int i;
+     GENERATE DATA | tr 'A-Z' 'a-z' | PROCESS DATA ...
 
-         while ((g = getgrent()) != NULL) {
-             printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
-                                          (long) g->gr_gid);
-             for (i = 0; g->gr_mem[i] != NULL; i++) {
-                 printf("%s", g->gr_mem[i]);
-                 if (g->gr_mem[i+1] != NULL)
-                     putchar(',');
-             }
-             putchar('\n');
-         }
-         endgrent();
-         return 0;
-     }
+   `tr' requires two lists of characters.(1)  When processing the
+input, the first character in the first list is replaced with the first
+character in the second list, the second character in the first list is
+replaced with the second character in the second list, and so on.  If
+there are more characters in the "from" list than in the "to" list, the
+last character of the "to" list is used for the remaining characters in
+the "from" list.
 
-   Each line in the group database represents one group.  The fields are
-separated with colons and represent the following information:
+   Some time ago, a user proposed that a transliteration function should
+be added to `gawk'.  The following program was written to prove that
+character transliteration could be done with a user-level function.
+This program is not as complete as the system `tr' utility but it does
+most of the job.
 
-Group Name
-     The group's name.
+   The `translate' program demonstrates one of the few weaknesses of
+standard `awk': dealing with individual characters is very painful,
+requiring repeated use of the `substr()', `index()', and `gsub()'
+built-in functions (*note String Functions::).(2) There are two
+functions.  The first, `stranslate()', takes three arguments:
 
-Group Password
-     The group's encrypted password. In practice, this field is never
-     used; it is usually empty or set to `*'.
+`from'
+     A list of characters from which to translate.
 
-Group ID Number
-     The group's numeric group ID number; this number must be unique
-     within the file.  (On some systems it's a C `long', and not an
-     `int'.  Thus we cast it to `long' for all cases.)
+`to'
+     A list of characters to which to translate.
 
-Group Member List
-     A comma-separated list of user names.  These users are members of
-     the group.  Modern Unix systems allow users to be members of
-     several groups simultaneously.  If your system does, then there
-     are elements `"group1"' through `"groupN"' in `PROCINFO' for those
-     group ID numbers.  (Note that `PROCINFO' is a `gawk' extension;
-     *note Built-in Variables::.)
+`target'
+     The string on which to do the translation.
 
-   Here is what running `grcat' might produce:
+   Associative arrays make the translation part fairly easy. `t_ar'
+holds the "to" characters, indexed by the "from" characters.  Then a
+simple loop goes through `from', one character at a time.  For each
+character in `from', if the character appears in `target', it is
+replaced with the corresponding `to' character.
 
-     $ grcat
-     -| wheel:*:0:arnold
-     -| nogroup:*:65534:
-     -| daemon:*:1:
-     -| kmem:*:2:
-     -| staff:*:10:arnold,miriam,andy
-     -| other:*:20:
-     ...
+   The `translate()' function simply calls `stranslate()' using `$0' as
+the target.  The main program sets two global variables, `FROM' and
+`TO', from the command line, and then changes `ARGV' so that `awk'
+reads from the standard input.
 
-   Here are the functions for obtaining information from the group
-database.  There are several, modeled after the C library functions of
-the same names:
+   Finally, the processing rule simply calls `translate()' for each
+record:
 
-     # group.awk --- functions for dealing with the group file
+     # translate.awk --- do tr-like stuff
+     # Bugs: does not handle things like: tr A-Z a-z, it has
+     # to be spelled out. However, if `to' is shorter than `from',
+     # the last character in `to' is used for the rest of `from'.
 
-     BEGIN    \
+     function stranslate(from, to, target,     lf, lt, ltarget, t_ar, i, c,
+                                                                    result)
      {
-         # Change to suit your system
-         _gr_awklib = "/usr/local/libexec/awk/"
+         lf = length(from)
+         lt = length(to)
+         ltarget = length(target)
+         for (i = 1; i <= lt; i++)
+             t_ar[substr(from, i, 1)] = substr(to, i, 1)
+         if (lt < lf)
+             for (; i <= lf; i++)
+                 t_ar[substr(from, i, 1)] = substr(to, lt, 1)
+         for (i = 1; i <= ltarget; i++) {
+             c = substr(target, i, 1)
+             if (c in t_ar)
+                 c = t_ar[c]
+             result = result c
+         }
+         return result
      }
 
-     function _gr_init(    oldfs, oldrs, olddol0, grcat,
-                                  using_fw, using_fpat, n, a, i)
+     function translate(from, to)
      {
-         if (_gr_inited)
-             return
-
-         oldfs = FS
-         oldrs = RS
-         olddol0 = $0
-         using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
-         using_fpat = (PROCINFO["FS"] == "FPAT")
-         FS = ":"
-         RS = "\n"
-
-         grcat = _gr_awklib "grcat"
-         while ((grcat | getline) > 0) {
-             if ($1 in _gr_byname)
-                 _gr_byname[$1] = _gr_byname[$1] "," $4
-             else
-                 _gr_byname[$1] = $0
-             if ($3 in _gr_bygid)
-                 _gr_bygid[$3] = _gr_bygid[$3] "," $4
-             else
-                 _gr_bygid[$3] = $0
-
-             n = split($4, a, "[ \t]*,[ \t]*")
-             for (i = 1; i <= n; i++)
-                 if (a[i] in _gr_groupsbyuser)
-                     _gr_groupsbyuser[a[i]] = \
-                         _gr_groupsbyuser[a[i]] " " $1
-                 else
-                     _gr_groupsbyuser[a[i]] = $1
+         return $0 = stranslate(from, to, $0)
+     }
 
-             _gr_bycount[++_gr_count] = $0
+     # main program
+     BEGIN {
+         if (ARGC < 3) {
+             print "usage: translate from to" > "/dev/stderr"
+             exit
          }
-         close(grcat)
-         _gr_count = 0
-         _gr_inited++
-         FS = oldfs
-         if (using_fw)
-             FIELDWIDTHS = FIELDWIDTHS
-         else if (using_fpat)
-             FPAT = FPAT
-         RS = oldrs
-         $0 = olddol0
+         FROM = ARGV[1]
+         TO = ARGV[2]
+         ARGC = 2
+         ARGV[1] = "-"
      }
 
-   The `BEGIN' rule sets a private variable to the directory where
-`grcat' is stored.  Because it is used to help out an `awk' library
-routine, we have chosen to put it in `/usr/local/libexec/awk'.  You
-might want it to be in a different directory on your system.
+     {
+         translate(FROM, TO)
+         print
+     }
 
-   These routines follow the same general outline as the user database
-routines (*note Passwd Functions::).  The `_gr_inited' variable is used
-to ensure that the database is scanned no more than once.  The
-`_gr_init()' function first saves `FS', `RS', and `$0', and then sets
-`FS' and `RS' to the correct values for scanning the group information.
-It also takes care to note whether `FIELDWIDTHS' or `FPAT' is being
-used, and to restore the appropriate field splitting mechanism.
+   While it is possible to do character transliteration in a user-level
+function, it is not necessarily efficient, and we (the `gawk' authors)
+started to consider adding a built-in function.  However, shortly after
+writing this program, we learned that the System V Release 4 `awk' had
+added the `toupper()' and `tolower()' functions (*note String
+Functions::).  These functions handle the vast majority of the cases
+where character transliteration is necessary, and so we chose to simply
+add those functions to `gawk' as well and then leave well enough alone.
 
-   The group information is stored in several associative arrays.  The
-arrays are indexed by group name (`_gr_byname'), by group ID number
-(`_gr_bygid'), and by position in the database (`_gr_bycount').  There
-is an additional array indexed by user name (`_gr_groupsbyuser'), which
-is a space-separated list of groups to which each user belongs.
+   An obvious improvement to this program would be to set up the `t_ar'
+array only once, in a `BEGIN' rule. However, this assumes that the
+"from" and "to" lists will never change throughout the lifetime of the
+program.
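
One way the improvement just mentioned might look is sketched below. This is an illustration under stated assumptions, not code from the commit: `translate_once' is a hypothetical shell wrapper, and the lists are passed with `-v' (so backslash escapes in them are interpreted by `awk') instead of through `ARGV'.

```shell
# translate_once FROM TO
# Hypothetical variant of translate.awk that builds t_ar a single time
# in the BEGIN rule. Assumes FROM and TO never change during the run.
translate_once() {
    awk -v from="$1" -v to="$2" '
    BEGIN {
        lf = length(from)
        lt = length(to)
        # when TO is shorter, its last character covers the rest of FROM
        for (i = 1; i <= lf; i++)
            t_ar[substr(from, i, 1)] = substr(to, (i <= lt ? i : lt), 1)
    }
    {
        result = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            result = result ((c in t_ar) ? t_ar[c] : c)
        }
        print result
    }'
}
```

It reads standard input, as in `GENERATE DATA | translate_once abc xyz'.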
 
-   Unlike the user database, it is possible to have multiple records in
-the database for the same group.  This is common when a group has a
-large number of members.  A pair of such entries might look like the
-following:
+   ---------- Footnotes ----------
 
-     tvpeople:*:101:johnny,jay,arsenio
-     tvpeople:*:101:david,conan,tom,joan
+   (1) On some older systems, `tr' may require that the lists be
+written as range expressions enclosed in square brackets (`[a-z]') and
+quoted, to prevent the shell from attempting a file name expansion.
+This is not a feature.
 
-   For this reason, `_gr_init()' looks to see if a group name or group
-ID number has already been seen.  If it has, then the user names are simply
-concatenated onto the previous list of users.  (There is actually a
-subtle problem with the code just presented.  Suppose that the first
-time there were no names. This code adds the names with a leading
-comma. It also doesn't check that there is a `$4'.)
+   (2) This program was written before `gawk' acquired the ability to
+split each character in a string into separate array elements.
 
-   Finally, `_gr_init()' closes the pipeline to `grcat', restores `FS'
-(and `FIELDWIDTHS' or `FPAT' if necessary), `RS', and `$0', initializes
-`_gr_count' to zero (it is used later), and makes `_gr_inited' nonzero.
+
+File: gawk.info,  Node: Labels Program,  Next: Word Sorting,  Prev: Translate Program,  Up: Miscellaneous Programs
 
-   The `getgrnam()' function takes a group name as its argument, and if
-that group exists, it is returned.  Otherwise, it relies on the array
-reference to a nonexistent element to create the element with the null
-string as its value:
+11.3.4 Printing Mailing Labels
+------------------------------
 
-     function getgrnam(group)
-     {
-         _gr_init()
-         return _gr_byname[group]
-     }
+Here is a "real world"(1) program.  This script reads lists of names and
+addresses and generates mailing labels.  Each page of labels has 20
+labels on it, two across and 10 down.  The addresses are guaranteed to
+be no more than five lines of data.  Each address is separated from the
+next by a blank line.
 
-   The `getgrgid()' function is similar; it takes a numeric group ID and
-looks up the information associated with that group ID:
+   The basic idea is to read 20 labels worth of data.  Each line of
+each label is stored in the `line' array.  The single rule takes care
+of filling the `line' array and printing the page when 20 labels have
+been read.
 
-     function getgrgid(gid)
-     {
-         _gr_init()
-         return _gr_bygid[gid]
-     }
+   The `BEGIN' rule simply sets `RS' to the empty string, so that `awk'
+splits records at blank lines (*note Records::).  It sets `MAXLINES' to
+100, since 100 is the maximum number of lines on the page (20 * 5 =
+100).
 
-   The `getgruser()' function does not have a C counterpart. It takes a
-user name and returns the list of groups that have the user as a member:
+   Most of the work is done in the `printpage()' function.  The label
+lines are stored sequentially in the `line' array.  But they have to
+print horizontally; `line[1]' next to `line[6]', `line[2]' next to
+`line[7]', and so on.  Two loops are used to accomplish this.  The
+outer loop, controlled by `i', steps through every 10 lines of data;
+this is each row of labels.  The inner loop, controlled by `j', goes
+through the lines within the row.  As `j' goes from 0 to 4, `i+j' is
+the `j'-th line in the row, and `i+j+5' is the entry next to it.  The
+output ends up looking something like this:
 
-     function getgruser(user)
-     {
-         _gr_init()
-         return _gr_groupsbyuser[user]
-     }
+     line 1          line 6
+     line 2          line 7
+     line 3          line 8
+     line 4          line 9
+     line 5          line 10
+     ...
 
-   The `getgrent()' function steps through the database one entry at a
-time.  It uses `_gr_count' to track its position in the list:
+The `printf' format string `%-41s' left-aligns the data and prints it
+within a fixed-width field.
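
The index arithmetic of the two loops can be checked in isolation. The sketch below is not part of `labels.awk'; `label_pairs' is a hypothetical helper that prints only which `line' indices end up side by side for a given number of stored lines.

```shell
# label_pairs NLINES
# Hypothetical helper: lists the index pairs that the printpage()
# loops place next to each other, without printing any label text.
label_pairs() {
    awk -v nlines="$1" 'BEGIN {
        for (i = 1; i <= nlines; i += 10)     # one row of two labels
            for (j = 0; j < 5; j++)           # five lines per label
                printf "line[%d] line[%d]\n", i + j, i + j + 5
    }'
}
```

Running `label_pairs 20' lists ten pairs, starting with `line[1] line[6]'; in the real program, indices past the last stored line simply print as empty strings.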
 
-     function getgrent()
-     {
-         _gr_init()
-         if (++_gr_count in _gr_bycount)
-             return _gr_bycount[_gr_count]
-         return ""
-     }
+   As a final note, an extra blank line is printed at lines 21 and 61,
+to keep the output lined up on the labels.  This is dependent on the
+particular brand of labels in use when the program was written.  You
+will also note that there are two blank lines at the top and two blank
+lines at the bottom.
 
-   The `endgrent()' function resets `_gr_count' to zero so that
-`getgrent()' can start over again:
+   The `END' rule arranges to flush the final page of labels; there may
+not have been an even multiple of 20 labels in the data:
 
-     function endgrent()
-     {
-         _gr_count = 0
-     }
+     # labels.awk --- print mailing labels
 
-   As with the user database routines, each function calls `_gr_init()'
-to initialize the arrays.  Doing so only incurs the extra overhead of
-running `grcat' if these functions are used (as opposed to moving the
-body of `_gr_init()' into a `BEGIN' rule).
+     # Each label is 5 lines of data that may have blank lines.
+     # The label sheets have 2 blank lines at the top and 2 at
+     # the bottom.
 
-   Most of the work is in scanning the database and building the various
-associative arrays.  The functions that the user calls are themselves
-very simple, relying on `awk''s associative arrays to do work.
+     BEGIN    { RS = "" ; MAXLINES = 100 }
 
-   The `id' program in *note Id Program::, uses these functions.
+     function printpage(    i, j)
+     {
+         if (Nlines <= 0)
+             return
 
-
-File: gawk.info,  Node: Walking Arrays,  Prev: Group Functions,  Up: Library Functions
+         printf "\n\n"        # header
 
-12.7 Traversing Arrays of Arrays
-================================
+         for (i = 1; i <= Nlines; i += 10) {
+             if (i == 21 || i == 61)
+                 print ""
+             for (j = 0; j < 5; j++) {
+                 if (i + j > MAXLINES)
+                     break
+                 printf "   %-41s %s\n", line[i+j], line[i+j+5]
+             }
+             print ""
+         }
 
-*note Arrays of Arrays::, described how `gawk' provides arrays of
-arrays.  In particular, any element of an array may be either a scalar,
-or another array. The `isarray()' function (*note Type Functions::)
-lets you distinguish an array from a scalar.  The following function,
-`walk_array()', recursively traverses an array, printing each element's
-indices and value.  You call it with the array and a string
-representing the name of the array:
+         printf "\n\n"        # footer
 
-     function walk_array(arr, name,      i)
+         delete line
+     }
+
+     # main rule
      {
-         for (i in arr) {
-             if (isarray(arr[i]))
-                 walk_array(arr[i], (name "[" i "]"))
-             else
-                 printf("%s[%s] = %s\n", name, i, arr[i])
+         if (Count >= 20) {
+             printpage()
+             Count = 0
+             Nlines = 0
          }
+         n = split($0, a, "\n")
+         for (i = 1; i <= n; i++)
+             line[++Nlines] = a[i]
+         for (; i <= 5; i++)
+             line[++Nlines] = ""
+         Count++
      }
 
-It works by looping over each element of the array. If any given
-element is itself an array, the function calls itself recursively,
-passing the subarray and a new string representing the current index.
-Otherwise, the function simply prints the element's name, index, and
-value.  Here is a main program to demonstrate:
-
-     BEGIN {
-         a[1] = 1
-         a[2][1] = 21
-         a[2][2] = 22
-         a[3] = 3
-         a[4][1][1] = 411
-         a[4][2] = 42
-
-         walk_array(a, "a")
+     END    \
+     {
+         printpage()
      }
 
-   When run, the program produces the following output:
+   ---------- Footnotes ----------
 
-     $ gawk -f walk_array.awk
-     -| a[4][1][1] = 411
-     -| a[4][2] = 42
-     -| a[1] = 1
-     -| a[2][1] = 21
-     -| a[2][2] = 22
-     -| a[3] = 3
+   (1) "Real world" is defined as "a program actually used to get
+something done."
 
 
-File: gawk.info,  Node: Sample Programs,  Next: Debugger,  Prev: Library Functions,  Up: Top
+File: gawk.info,  Node: Word Sorting,  Next: History Sorting,  Prev: Labels Program,  Up: Miscellaneous Programs
 
-13 Practical `awk' Programs
-***************************
+11.3.5 Generating Word-Usage Counts
+-----------------------------------
 
-*note Library Functions::, presents the idea that reading programs in a
-language contributes to learning that language.  This major node
-continues that theme, presenting a potpourri of `awk' programs for your
-reading enjoyment.
+When working with large amounts of text, it can be interesting to know
+how often different words appear.  For example, an author may overuse
+certain words, in which case she might wish to find synonyms to
+substitute for words that appear too often. This node develops a
+program for counting words and presenting the frequency information in
+a useful format.
 
-   Many of these programs use library functions presented in *note
-Library Functions::.
+   At first glance, a program like this would seem to do the job:
 
-* Menu:
+     # Print list of word frequencies
 
-* Running Examples::            How to run these examples.
-* Clones::                      Clones of common utilities.
-* Miscellaneous Programs::      Some interesting `awk' programs.
+     {
+         for (i = 1; i <= NF; i++)
+             freq[$i]++
+     }
 
-
-File: gawk.info,  Node: Running Examples,  Next: Clones,  Up: Sample Programs
+     END {
+         for (word in freq)
+             printf "%s\t%d\n", word, freq[word]
+     }
 
-13.1 Running the Example Programs
-=================================
+   The program relies on `awk''s default field splitting mechanism to
+break each line up into "words," and uses an associative array named
+`freq', indexed by each word, to count the number of times the word
+occurs. In the `END' rule, it prints the counts.
 
-To run a given program, you would typically do something like this:
+   This program has several problems that prevent it from being useful
+on real text files:
+
+   * The `awk' language considers upper- and lowercase characters to be
+     distinct.  Therefore, "bartender" and "Bartender" are not treated
+     as the same word.  This is undesirable, since in normal text, words
+     are capitalized if they begin sentences, and a frequency analyzer
+     should not be sensitive to capitalization.
+
+   * Words are detected using the `awk' convention that fields are
+     separated just by whitespace.  Other characters in the input
+     (except newlines) don't have any special meaning to `awk'.  This
+     means that punctuation characters count as part of words.
+
+   * The output does not come out in any useful order.  You're more
+     likely to be interested in which words occur most frequently or in
+     having an alphabetized table of how frequently each word occurs.
+
+   The first problem can be solved by using `tolower()' to remove case
+distinctions.  The second problem can be solved by using `gsub()' to
+remove punctuation characters.  Finally, we solve the third problem by
+using the system `sort' utility to process the output of the `awk'
+script.  Here is the new version of the program:
 
-     awk -f PROGRAM -- OPTIONS FILES
+     # wordfreq.awk --- print list of word frequencies
 
-Here, PROGRAM is the name of the `awk' program (such as `cut.awk'),
-OPTIONS are any command-line options for the program that start with a
-`-', and FILES are the actual data files.
+     {
+         $0 = tolower($0)    # remove case distinctions
+         # remove punctuation
+         gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
+         for (i = 1; i <= NF; i++)
+             freq[$i]++
+     }
 
-   If your system supports the `#!' executable interpreter mechanism
-(*note Executable Scripts::), you can instead run your program directly:
+     END {
+         for (word in freq)
+             printf "%s\t%d\n", word, freq[word]
+     }
 
-     cut.awk -c1-8 myfiles > results
+   Assuming we have saved this program in a file named `wordfreq.awk',
+and that the data is in `file1', the following pipeline:
 
-   If your `awk' is not `gawk', you may instead need to use this:
+     awk -f wordfreq.awk file1 | sort -k 2nr
 
-     cut.awk -- -c1-8 myfiles > results
+produces a table of the words appearing in `file1' in order of
+decreasing frequency.
 
-
-File: gawk.info,  Node: Clones,  Next: Miscellaneous Programs,  Prev: Running Examples,  Up: Sample Programs
+   The `awk' program suitably massages the data and produces a word
+frequency table, which is not ordered.  The `awk' script's output is
+then sorted by the `sort' utility and printed on the screen.
 
-13.2 Reinventing Wheels for Fun and Profit
-==========================================
+   The options given to `sort' specify a sort that uses the second
+field of each input line (skipping one field), that the sort keys
+should be treated as numeric quantities (otherwise `15' would come
+before `5'), and that the sorting should be done in descending
+(reverse) order.
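
The whole pipeline can be wrapped up for a quick trial. This is a sketch only: `wordfreq' is a hypothetical shell function, and the bracket expression is spelled out as `[^a-z0-9_ \t]' instead of using POSIX character classes, on the assumption that some older `awk' implementations do not support them (so only ASCII letters and digits survive).

```shell
# wordfreq [FILE...]
# Hypothetical wrapper around the wordfreq.awk logic plus the sort step.
wordfreq() {
    awk '
    {
        $0 = tolower($0)                  # remove case distinctions
        gsub(/[^a-z0-9_ \t]/, "", $0)     # remove punctuation (ASCII only)
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort -k 2nr
}
```

Words with equal counts may come out in any order relative to each other, since only the numeric second field is used as the sort key.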
 
-This minor node presents a number of POSIX utilities implemented in
-`awk'.  Reinventing these programs in `awk' is often enjoyable, because
-the algorithms can be very clearly expressed, and the code is usually
-very concise and simple.  This is true because `awk' does so much for
-you.
+   The `sort' could even be done from within the program, by changing
+the `END' action to:
 
-   It should be noted that these programs are not necessarily intended
-to replace the installed versions on your system.  Nor may all of these
-programs be fully compliant with the most recent POSIX standard.  This
-is not a problem; their purpose is to illustrate `awk' language
-programming for "real world" tasks.
+     END {
+         sort = "sort -k 2nr"
+         for (word in freq)
+             printf "%s\t%d\n", word, freq[word] | sort
+         close(sort)
+     }
 
-   The programs are presented in alphabetical order.
+   This way of sorting must be used on systems that do not have true
+pipes at the command-line (or batch-file) level.  See the general
+operating system documentation for more information on how to use the
+`sort' program.
 
-* Menu:
+
+File: gawk.info,  Node: History Sorting,  Next: Extract Program,  Prev: Word Sorting,  Up: Miscellaneous Programs
 
-* Cut Program::                 The `cut' utility.
-* Egrep Program::               The `egrep' utility.
-* Id Program::                  The `id' utility.
-* Split Program::               The `split' utility.
-* Tee Program::                 The `tee' utility.
-* Uniq Program::                The `uniq' utility.
-* Wc Program::                  The `wc' utility.
+11.3.6 Removing Duplicates from Unsorted Text
+---------------------------------------------
 
-
-File: gawk.info,  Node: Cut Program,  Next: Egrep Program,  Up: Clones
+The `uniq' program (*note Uniq Program::), removes duplicate lines from
+_sorted_ data.
 
-13.2.1 Cutting out Fields and Columns
--------------------------------------
+   Suppose, however, you need to remove duplicate lines from a data
+file but that you want to preserve the order the lines are in.  A good
+example of this might be a shell history file.  The history file keeps
+a copy of all the commands you have entered, and it is not unusual to
+repeat a command several times in a row.  Occasionally you might want
+to compact the history by removing duplicate entries.  Yet it is
+desirable to maintain the order of the original commands.
 
-The `cut' utility selects, or "cuts," characters or fields from its
-standard input and sends them to its standard output.  Fields are
-separated by TABs by default, but you may supply a command-line option
-to change the field "delimiter" (i.e., the field-separator character).
-`cut''s definition of fields is less general than `awk''s.
+   This simple program does the job.  It uses two arrays.  The `data'
+array is indexed by the text of each line.  For each line, `data[$0]'
+is incremented.  If a particular line has not been seen before, then
+`data[$0]' is zero.  In this case, the text of the line is stored in
+`lines[count]'.  Each element of `lines' is a unique command, and the
+indices of `lines' indicate the order in which those lines are
+encountered.  The `END' rule simply prints out the lines, in order:
 
-   A common use of `cut' might be to pull out just the login name of
-logged-on users from the output of `who'.  For example, the following
-pipeline generates a sorted, unique list of the logged-on users:
+     # histsort.awk --- compact a shell history file
+     # Thanks to Byron Rakitzis for the general idea
 
-     who | cut -c1-8 | sort | uniq
+     {
+         if (data[$0]++ == 0)
+             lines[++count] = $0
+     }
 
-   The options for `cut' are:
+     END {
+         for (i = 1; i <= count; i++)
+             print lines[i]
+     }
 
-`-c LIST'
-     Use LIST as the list of characters to cut out.  Items within the
-     list may be separated by commas, and ranges of characters can be
-     separated with dashes.  The list `1-8,15,22-35' specifies
-     characters 1 through 8, 15, and 22 through 35.
+   This program also provides a foundation for generating other useful
+information.  For example, using the following `print' statement in the
+`END' rule indicates how often a particular command is used:
 
-`-f LIST'
-     Use LIST as the list of fields to cut out.
+     print data[lines[i]], lines[i]
 
-`-d DELIM'
-     Use DELIM as the field-separator character instead of the TAB
-     character.
+   This works because `data[$0]' is incremented each time a line is
+seen.
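
For a quick experiment, the counting variant can be wrapped in a shell
function and fed a short, made-up history.  The name `histcount' and the
sample commands are assumptions made here for illustration; the `awk'
code is the program above with the alternative `print' statement.

```shell
# histcount: hypothetical name; histsort.awk with the counting print.
histcount() {
    awk '
    {
        if (data[$0]++ == 0)      # first time this line is seen
            lines[++count] = $0   # remember it, in order of appearance
    }
    END {
        for (i = 1; i <= count; i++)
            print data[lines[i]], lines[i]
    }' "$@"
}

# A made-up history with one command repeated three times.
printf 'ls\ncd /tmp\nls\nmake\nls\n' | histcount
```

Each distinct command is printed once, in the order it first appeared,
preceded by the number of times it occurred.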
 
-`-s'
-     Suppress printing of lines that do not contain the field delimiter.
+
+File: gawk.info,  Node: Extract Program,  Next: Simple Sed,  Prev: History Sorting,  Up: Miscellaneous Programs
 
-   The `awk' implementation of `cut' uses the `getopt()' library
-function (*note Getopt Function::) and the `join()' library function
-(*note Join Function::).
+11.3.7 Extracting Programs from Texinfo Source Files
+----------------------------------------------------
 
-   The program begins with a comment describing the options, the library
-functions needed, and a `usage()' function that prints out a usage
-message and exits.  `usage()' is called if invalid arguments are
-supplied:
+The nodes *note Library Functions::, and *note Sample Programs::, are
+the top level nodes for a large number of `awk' programs.  If you want
+to experiment with these programs, it is tedious to have to type them
+in by hand.  Here we present a program that can extract parts of a
+Texinfo input file into separate files.
 
-     # cut.awk --- implement cut in awk
+This Info file is written in Texinfo (http://texinfo.org), the GNU
+project's document formatting language.  A single Texinfo source file
+can be used to produce both printed and online documentation.  The
+Texinfo language is described fully, starting with *note (Texinfo)Top::
+texinfo,Texinfo--The GNU Documentation Format.
 
-     # Options:
-     #    -f list     Cut fields
-     #    -d c        Field delimiter character
-     #    -c list     Cut characters
-     #
-     #    -s          Suppress lines without the delimiter
-     #
-     # Requires getopt() and join() library functions
+   For our purposes, it is enough to know three things about Texinfo
+input files:
 
-     function usage(    e1, e2)
-     {
-         e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
-         e2 = "usage: cut [-c list] [files...]"
-         print e1 > "/dev/stderr"
-         print e2 > "/dev/stderr"
-         exit 1
-     }
+   * The "at" symbol (`@') is special in Texinfo, much as the backslash
+     (`\') is in C or `awk'.  Literal `@' symbols are represented in
+     Texinfo source files as `@@'.
 
-The variables `e1' and `e2' are used so that the function fits nicely
-on the screen.
+   * Comments start with either `@c' or `@comment'.  The
+     file-extraction program works by using special comments that start
+     at the beginning of a line.
 
-   Next comes a `BEGIN' rule that parses the command-line options.  It
-sets `FS' to a single TAB character, because that is `cut''s default
-field separator. The rule then sets the output field separator to be the
-same as the input field separator.  A loop using `getopt()' steps
-through the command-line options.  Exactly one of the variables
-`by_fields' or `by_chars' is set to true, to indicate that processing
-should be done by fields or by characters, respectively.  When cutting
-by characters, the output field separator is set to the null string:
+   * Lines containing `@group' and `@end group' commands bracket
+     example text that should not be split across a page boundary.
+     (Unfortunately, TeX isn't always smart enough to do things exactly
+     right, so we have to give it some help.)
 
-     BEGIN    \
-     {
-         FS = "\t"    # default
-         OFS = FS
-         while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
-             if (c == "f") {
-                 by_fields = 1
-                 fieldlist = Optarg
-             } else if (c == "c") {
-                 by_chars = 1
-                 fieldlist = Optarg
-                 OFS = ""
-             } else if (c == "d") {
-                 if (length(Optarg) > 1) {
-                     printf("Using first character of %s" \
-                            " for delimiter\n", Optarg) > "/dev/stderr"
-                     Optarg = substr(Optarg, 1, 1)
-                 }
-                 FS = Optarg
-                 OFS = FS
-                 if (FS == " ")    # defeat awk semantics
-                     FS = "[ ]"
-             } else if (c == "s")
-                 suppress++
-             else
-                 usage()
-         }
+   The following program, `extract.awk', reads through a Texinfo source
+file and does two things, based on the special comments.  Upon seeing
+`@c system ...', it runs a command, by extracting the command text from
+the control line and passing it on to the `system()' function (*note
+I/O Functions::).  Upon seeing `@c file FILENAME', each subsequent line
+is sent to the file FILENAME, until `@c endfile' is encountered.  The
+rules in `extract.awk' match either `@c' or `@comment' by letting the
+`omment' part be optional.  Lines containing `@group' and `@end group'
+are simply removed.  `extract.awk' uses the `join()' library function
+(*note Join Function::).
 
-         # Clear out options
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
+   The example programs in the online Texinfo source for `GAWK:
+Effective AWK Programming' (`gawk.texi') have all been bracketed inside
+`file' and `endfile' lines.  The `gawk' distribution uses a copy of
+`extract.awk' to extract the sample programs and install many of them
+in a standard directory where `gawk' can find them.  The Texinfo file
+looks something like this:
 
-   The code must take special care when the field delimiter is a space.
-Using a single space (`" "') for the value of `FS' is incorrect--`awk'
-would separate fields with runs of spaces, TABs, and/or newlines, and
-we want them to be separated with individual spaces.  Also remember
-that after `getopt()' is through (as described in *note Getopt
-Function::), we have to clear out all the elements of `ARGV' from 1 to
-`Optind', so that `awk' does not try to process the command-line options
-as file names.
+     ...
+     This program has a @code{BEGIN} rule,
+     that prints a nice message:
 
-   After dealing with the command-line options, the program verifies
-that the options make sense.  Only one or the other of `-c' and `-f'
-should be used, and both require a field list.  Then the program calls
-either `set_fieldlist()' or `set_charlist()' to pull apart the list of
-fields or characters:
+     @example
+     @c file examples/messages.awk
+     BEGIN @{ print "Don't panic!" @}
+     @c end file
+     @end example
 
-         if (by_fields && by_chars)
-             usage()
+     It also prints some final advice:
 
-         if (by_fields == 0 && by_chars == 0)
-             by_fields = 1    # default
+     @example
+     @c file examples/messages.awk
+     END @{ print "Always avoid bored archeologists!" @}
+     @c end file
+     @end example
+     ...
 
-         if (fieldlist == "") {
-             print "cut: needs list for -c or -f" > "/dev/stderr"
-             exit 1
-         }
+   `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
+upper- and lowercase letters in the directives won't matter.
 
-         if (by_fields)
-             set_fieldlist()
-         else
-             set_charlist()
-     }
+   The first rule handles calling `system()', checking that a command is
+given (`NF' is at least three) and also checking that the command exits
+with a zero exit status, signifying OK:
 
-   `set_fieldlist()' splits the field list apart at the commas into an
-array.  Then, for each element of the array, it looks to see if the
-element is actually a range, and if so, splits it apart.  The function
-checks the range to make sure that the first number is smaller than the
-second.  Each number in the list is added to the `flist' array, which
-simply lists the fields that will be printed.  Normal field splitting
-is used.  The program lets `awk' handle the job of doing the field
-splitting:
+     # extract.awk --- extract files and run programs
+     #                 from texinfo files
 
-     function set_fieldlist(        n, m, i, j, k, f, g)
+     BEGIN    { IGNORECASE = 1 }
+
+     /^@c(omment)?[ \t]+system/    \
      {
-         n = split(fieldlist, f, ",")
-         j = 1    # index in flist
-         for (i = 1; i <= n; i++) {
-             if (index(f[i], "-") != 0) { # a range
-                 m = split(f[i], g, "-")
-                 if (m != 2 || g[1] >= g[2]) {
-                     printf("bad field list: %s\n",
-                                       f[i]) > "/dev/stderr"
-                     exit 1
-                 }
-                 for (k = g[1]; k <= g[2]; k++)
-                     flist[j++] = k
-             } else
-                 flist[j++] = f[i]
+         if (NF < 3) {
+             e = (FILENAME ":" FNR)
+             e = (e  ": badly formed `system' line")
+             print e > "/dev/stderr"
+             next
+         }
+         $1 = ""
+         $2 = ""
+         stat = system($0)
+         if (stat != 0) {
+             e = (FILENAME ":" FNR)
+             e = (e ": warning: system returned " stat)
+             print e > "/dev/stderr"
          }
-         nfields = j - 1
      }
 
-   The `set_charlist()' function is more complicated than
-`set_fieldlist()'.  The idea here is to use `gawk''s `FIELDWIDTHS'
-variable (*note Constant Size::), which describes constant-width input.
-When using a character list, that is exactly what we have.
+The variable `e' is used so that the rule fits nicely on the screen.
 
-   Setting up `FIELDWIDTHS' is more complicated than simply listing the
-fields that need to be printed.  We have to keep track of the fields to
-print and also the intervening characters that have to be skipped.  For
-example, suppose you wanted characters 1 through 8, 15, and 22 through
-35.  You would use `-c 1-8,15,22-35'.  The necessary value for
-`FIELDWIDTHS' is `"8 6 1 6 14"'.  This yields five fields, and the
-fields to print are `$1', `$3', and `$5'.  The intermediate fields are
-"filler", which is stuff in between the desired data.  `flist' lists
-the fields to print, and `t' tracks the complete field list, including
-filler fields:
+   The second rule handles moving data into files.  It verifies that a
+file name is given in the directive.  If the file named is not the
+current file, then the current file is closed.  Keeping the current file
+open until a new file is encountered allows the use of the `>'
+redirection for printing the contents, keeping open file management
+simple.
 
-     function set_charlist(    field, i, j, f, g, t,
-                               filler, last, len)
-     {
-         field = 1   # count total fields
-         n = split(fieldlist, f, ",")
-         j = 1       # index in flist
-         for (i = 1; i <= n; i++) {
-             if (index(f[i], "-") != 0) { # range
-                 m = split(f[i], g, "-")
-                 if (m != 2 || g[1] >= g[2]) {
-                     printf("bad character list: %s\n",
-                                    f[i]) > "/dev/stderr"
-                     exit 1
-                 }
-                 len = g[2] - g[1] + 1
-                 if (g[1] > 1)  # compute length of filler
-                     filler = g[1] - last - 1
-                 else
-                     filler = 0
-                 if (filler)
-                     t[field++] = filler
-                 t[field++] = len  # length of field
-                 last = g[2]
-                 flist[j++] = field - 1
-             } else {
-                 if (f[i] > 1)
-                     filler = f[i] - last - 1
-                 else
-                     filler = 0
-                 if (filler)
-                     t[field++] = filler
-                 t[field++] = 1
-                 last = f[i]
-                 flist[j++] = field - 1
-             }
-         }
-         FIELDWIDTHS = join(t, 1, field - 1)
-         nfields = j - 1
-     }
+   The `for' loop does the work.  It reads lines using `getline' (*note
+Getline::).  For an unexpected end of file, it calls the
+`unexpected_eof()' function.  If the line is an "endfile" line, then it
+breaks out of the loop.  If the line is an `@group' or `@end group'
+line, then it ignores it and goes on to the next line.  Similarly,
+comments within examples are also ignored.
 
-   Next is the rule that actually processes the data.  If the `-s'
-option is given, then `suppress' is true.  The first `if' statement
-makes sure that the input record does have the field separator.  If
-`cut' is processing fields, `suppress' is true, and the field separator
-character is not in the record, then the record is skipped.
+   Most of the work is in the following few lines.  If the line has no
+`@' symbols, the program can print it directly.  Otherwise, each
+leading `@' must be stripped off.  To remove the `@' symbols, the line
+is split into separate elements of the array `a', using the `split()'
+function (*note String Functions::).  The `@' symbol is used as the
+separator character.  Each element of `a' that is empty indicates two
+successive `@' symbols in the original line.  For each two empty
+elements (`@@' in the original file), we have to add a single `@'
+symbol back in.(1)
 
-   If the record is valid, then `gawk' has split the data into fields,
-either using the character in `FS' or using fixed-length fields and
-`FIELDWIDTHS'.  The loop goes through the list of fields that should be
-printed.  The corresponding field is printed if it contains data.  If
-the next field also has data, then the separator character is written
-out between the fields:
+   When the processing of the array is finished, `join()' is called
+with the value of `SUBSEP', to rejoin the pieces back into a single
+line.  That line is then printed to the output file:
 
+     /^@c(omment)?[ \t]+file/    \
      {
-         if (by_fields && suppress && index($0, FS) != 0)
+         if (NF != 3) {
+             e = (FILENAME ":" FNR ": badly formed `file' line")
+             print e > "/dev/stderr"
              next
+         }
+         if ($3 != curfile) {
+             if (curfile != "")
+                 close(curfile)
+             curfile = $3
+         }
 
-         for (i = 1; i <= nfields; i++) {
-             if ($flist[i] != "") {
-                 printf "%s", $flist[i]
-                 if (i < nfields && $flist[i+1] != "")
-                     printf "%s", OFS
+         for (;;) {
+             if ((getline line) <= 0)
+                 unexpected_eof()
+             if (line ~ /^@c(omment)?[ \t]+endfile/)
+                 break
+             else if (line ~ /^@(end[ \t]+)?group/)
+                 continue
+             else if (line ~ /^@c(omment+)?[ \t]+/)
+                 continue
+             if (index(line, "@") == 0) {
+                 print line > curfile
+                 continue
+             }
+             n = split(line, a, "@")
+             # if a[1] == "", means leading @,
+             # don't add one back in.
+             for (i = 2; i <= n; i++) {
+                 if (a[i] == "") { # was an @@
+                     a[i] = "@"
+                     if (a[i+1] == "")
+                         i++
+                 }
              }
+             print join(a, 1, n, SUBSEP) > curfile
          }
-         print ""
      }
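
To see what the loop over `a' has to work with, it can help to print the
pieces that `split()' produces for a line containing both single and
doubled `@' symbols.  This is a small sketch, not part of `extract.awk';
the helper name `show_split' and the sample line are made up here.

```shell
# show_split: hypothetical helper that prints the pieces split() produces.
show_split() {
    awk '{
        n = split($0, a, "@")          # "@" is the separator character
        for (i = 1; i <= n; i++)
            printf "a[%d] = <%s>\n", i, a[i]
    }'
}

# A made-up line with one doubled @ and two single ones.
printf '%s\n' 'user@@host @{ x @}' | show_split
```

The doubled `@' shows up as an empty element between `user' and `host',
which is the case the loop turns back into a single `@'.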
 
-   This version of `cut' relies on `gawk''s `FIELDWIDTHS' variable to
-do the character-based cutting.  While it is possible in other `awk'
-implementations to use `substr()' (*note String Functions::), it is
-also extremely painful.  The `FIELDWIDTHS' variable supplies an elegant
-solution to the problem of picking the input line apart by characters.
-
-
-File: gawk.info,  Node: Egrep Program,  Next: Id Program,  Prev: Cut Program,  Up: Clones
-
-13.2.2 Searching for Regular Expressions in Files
--------------------------------------------------
+   An important thing to note is the use of the `>' redirection.
+Output done with `>' only opens the file once; it stays open and
+subsequent output is appended to the file (*note Redirection::).  This
+makes it easy to mix program text and explanatory prose for the same
+sample source file (as has been done here!) without any hassle.  The
+file is only closed when a new data file name is encountered or at the
+end of the input file.
 
-The `egrep' utility searches files for patterns.  It uses regular
-expressions that are almost identical to those available in `awk'
-(*note Regexp::).  You invoke it as follows:
+   Finally, the function `unexpected_eof()' prints an appropriate error
+message and then exits.  The `END' rule handles the final cleanup,
+closing the open file:
 
-     egrep [ OPTIONS ] 'PATTERN' FILES ...
+     function unexpected_eof()
+     {
+         printf("%s:%d: unexpected EOF or error\n",
+             FILENAME, FNR) > "/dev/stderr"
+         exit 1
+     }
 
-   The PATTERN is a regular expression.  In typical usage, the regular
-expression is quoted to prevent the shell from expanding any of the
-special characters as file name wildcards.  Normally, `egrep' prints
-the lines that matched.  If multiple file names are provided on the
-command line, each output line is preceded by the name of the file and
-a colon.
+     END {
+         if (curfile)
+             close(curfile)
+     }
 
-   The options to `egrep' are as follows:
+   ---------- Footnotes ----------
 
-`-c'
-     Print out a count of the lines that matched the pattern, instead
-     of the lines themselves.
+   (1) This program was written before `gawk' had the `gensub()'
+function. Consider how you might use it to simplify the code.
 
-`-s'
-     Be silent.  No output is produced and the exit value indicates
-     whether the pattern was matched.
+
+File: gawk.info,  Node: Simple Sed,  Next: Igawk Program,  Prev: Extract Program,  Up: Miscellaneous Programs
 
-`-v'
-     Invert the sense of the test. `egrep' prints the lines that do
-     _not_ match the pattern and exits successfully if the pattern is
-     not matched.
+11.3.8 A Simple Stream Editor
+-----------------------------
 
-`-i'
-     Ignore case distinctions in both the pattern and the input data.
+The `sed' utility is a stream editor, a program that reads a stream of
+data, makes changes to it, and passes it on.  It is often used to make
+global changes to a large file or to a stream of data generated by a
+pipeline of commands.  While `sed' is a complicated program in its own
+right, its most common use is to perform global substitutions in the
+middle of a pipeline:
 
-`-l'
-     Only print (list) the names of the files that matched, not the
-     lines that matched.
+     command1 < orig.data | sed 's/old/new/g' | command2 > result
 
-`-e PATTERN'
-     Use PATTERN as the regexp to match.  The purpose of the `-e'
-     option is to allow patterns that start with a `-'.
+   Here, `s/old/new/g' tells `sed' to look for the regexp `old' on each
+input line and globally replace it with the text `new', i.e., all the
+occurrences on a line.  This is similar to `awk''s `gsub()' function
+(*note String Functions::).
 
-   This version uses the `getopt()' library function (*note Getopt
-Function::) and the file transition library program (*note Filetrans
-Function::).
+   The following program, `awksed.awk', accepts at least two
+command-line arguments: the pattern to look for and the text to replace
+it with. Any additional arguments are treated as data file names to
+process. If none are provided, the standard input is used:
 
-   The program begins with a descriptive comment and then a `BEGIN' rule
-that processes the command-line arguments with `getopt()'.  The `-i'
-(ignore case) option is particularly easy with `gawk'; we just use the
-`IGNORECASE' built-in variable (*note Built-in Variables::):
+     # awksed.awk --- do s/foo/bar/g using just print
+     #    Thanks to Michael Brennan for the idea
 
-     # egrep.awk --- simulate egrep in awk
-     #
-     # Options:
-     #    -c    count of lines
-     #    -s    silent - use exit value
-     #    -v    invert test, success if no match
-     #    -i    ignore case
-     #    -l    print filenames only
-     #    -e    argument is pattern
-     #
-     # Requires getopt and file transition library functions
+     function usage()
+     {
+         print "usage: awksed pat repl [files...]" > "/dev/stderr"
+         exit 1
+     }
 
      BEGIN {
-         while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
-             if (c == "c")
-                 count_only++
-             else if (c == "s")
-                 no_print++
-             else if (c == "v")
-                 invert++
-             else if (c == "i")
-                 IGNORECASE = 1
-             else if (c == "l")
-                 filenames_only++
-             else if (c == "e")
-                 pattern = Optarg
-             else
-                 usage()
-         }
-
-   Next comes the code that handles the `egrep'-specific behavior. If no
-pattern is supplied with `-e', the first nonoption on the command line
-is used.  The `awk' command-line arguments up to `ARGV[Optind]' are
-cleared, so that `awk' won't try to process them as files.  If no files
-are specified, the standard input is used, and if multiple files are
-specified, we make sure to note this so that the file names can precede
-the matched lines in the output:
+         # validate arguments
+         if (ARGC < 3)
+             usage()
 
-         if (pattern == "")
-             pattern = ARGV[Optind++]
+         RS = ARGV[1]
+         ORS = ARGV[2]
 
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
-         if (Optind >= ARGC) {
-             ARGV[1] = "-"
-             ARGC = 2
-         } else if (ARGC - Optind > 1)
-             do_filenames++
+         # don't use arguments as files
+         ARGV[1] = ARGV[2] = ""
+     }
 
-     #    if (IGNORECASE)
-     #        pattern = tolower(pattern)
+     # look ma, no hands!
+     {
+         if (RT == "")
+             printf "%s", $0
+         else
+             print
      }
 
-   The last two lines are commented out, since they are not needed in
-`gawk'.  They should be uncommented if you have to use another version
-of `awk'.
+   The program relies on `gawk''s ability to have `RS' be a regexp, as
+well as on the setting of `RT' to the actual text that terminates the
+record (*note Records::).
 
-   The next set of lines should be uncommented if you are not using
-`gawk'.  This rule translates all the characters in the input line into
-lowercase if the `-i' option is specified.(1) The rule is commented out
-since it is not necessary with `gawk':
+   The idea is to have `RS' be the pattern to look for. `gawk'
+automatically sets `$0' to the text between matches of the pattern.
+This is text that we want to keep, unmodified.  Then, by setting `ORS'
+to the replacement text, a simple `print' statement outputs the text we
+want to keep, followed by the replacement text.
 
-     #{
-     #    if (IGNORECASE)
-     #        $0 = tolower($0)
-     #}
+   There is one wrinkle to this scheme, which is what to do if the last
+record doesn't end with text that matches `RS'.  Using a `print'
+statement unconditionally prints the replacement text, which is not
+correct.  However, if the file did not end in text that matches `RS',
+`RT' is set to the null string.  In this case, we can print `$0' using
+`printf' (*note Printf::).
 
-   The `beginfile()' function is called by the rule in `ftrans.awk'
-when each new file is processed.  In this case, it is very simple; all
-it does is initialize a variable `fcount' to zero. `fcount' tracks how
-many lines in the current file matched the pattern.  Naming the
-parameter `junk' shows we know that `beginfile()' is called with a
-parameter, but that we're not interested in its value:
+   The `BEGIN' rule handles the setup, checking for the right number of
+arguments and calling `usage()' if there is a problem. Then it sets
+`RS' and `ORS' from the command-line arguments and sets `ARGV[1]' and
+`ARGV[2]' to the null string, so that they are not treated as file names
+(*note ARGC and ARGV::).
 
-     function beginfile(junk)
-     {
-         fcount = 0
-     }
+   The `usage()' function prints an error message and exits.  Finally,
+the single rule handles the printing scheme outlined above, using
+`print' or `printf' as appropriate, depending upon the value of `RT'.
 
-   The `endfile()' function is called after each file has been
-processed.  It affects the output only when the user wants a count of
-the number of lines that matched.  `no_print' is true only if the exit
-status is desired.  `count_only' is true if line counts are desired.
-`egrep' therefore only prints line counts if printing and counting are
-enabled.  The output format must be adjusted depending upon the number
-of files to process.  Finally, `fcount' is added to `total', so that we
-know the total number of lines that matched the pattern:
+
+File: gawk.info,  Node: Igawk Program,  Next: Anagram Program,  Prev: Simple Sed,  Up: Miscellaneous Programs
 
-     function endfile(file)
-     {
-         if (! no_print && count_only) {
-             if (do_filenames)
-                 print file ":" fcount
-             else
-                 print fcount
-         }
+11.3.9 An Easy Way to Use Library Functions
+-------------------------------------------
 
-         total += fcount
+In *note Include Files::, we saw how `gawk' provides a built-in
+file-inclusion capability.  However, this is a `gawk' extension.  This
+minor node provides the motivation for making file inclusion available
+for standard `awk', and shows how to do it using a combination of shell
+and `awk' programming.
+
+   Using library functions in `awk' can be very beneficial. It
+encourages code reuse and the writing of general functions. Programs are
+smaller and therefore clearer.  However, using library functions is
+only easy when writing `awk' programs; it is painful when running them,
+requiring multiple `-f' options.  If `gawk' is unavailable, then so too
+is the `AWKPATH' environment variable and the ability to put `awk'
+functions into a library directory (*note Options::).  It would be nice
+to be able to write programs in the following manner:
+
+     # library functions
+     @include getopt.awk
+     @include join.awk
+     ...
+
+     # main program
+     BEGIN {
+         while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
+             ...
+         ...
      }
 
-   The following rule does most of the work of matching lines. The
-variable `matches' is true if the line matched the pattern. If the user
-wants lines that did not match, the sense of `matches' is inverted
-using the `!' operator. `fcount' is incremented with the value of
-`matches', which is either one or zero, depending upon a successful or
-unsuccessful match.  If the line does not match, the `next' statement
-just moves on to the next record.
+   The following program, `igawk.sh', provides this service.  It
+simulates `gawk''s searching of the `AWKPATH' variable and also allows
+"nested" includes; i.e., a file that is included with `@include' can
+contain further `@include' statements.  `igawk' makes an effort to only
+include files once, so that nested includes don't accidentally include
+a library function twice.
 
-   A number of additional tests are made, but they are only done if we
-are not counting lines.  First, if the user only wants exit status
-(`no_print' is true), then it is enough to know that _one_ line in this
-file matched, and we can skip on to the next file with `nextfile'.
-Similarly, if we are only printing file names, we can print the file
-name, and then skip to the next file with `nextfile'.  Finally, each
-line is printed, with a leading file name and colon if necessary:
+   `igawk' should behave just like `gawk' externally.  This means it
+should accept all of `gawk''s command-line arguments, including the
+ability to have multiple source files specified via `-f', and the
+ability to mix command-line and library source files.
 
-     {
-         matches = ($0 ~ pattern)
-         if (invert)
-             matches = ! matches
+   The program is written using the POSIX Shell (`sh') command
+language.(1) It works as follows:
 
-         fcount += matches    # 1 or 0
+  1. Loop through the arguments, saving anything that doesn't represent
+     `awk' source code for later, when the expanded program is run.
 
-         if (! matches)
-             next
+  2. For any arguments that do represent `awk' text, put the arguments
+     into a shell variable that will be expanded.  There are two cases:
 
-         if (! count_only) {
-             if (no_print)
-                 nextfile
+       a. Literal text, provided with `--source' or `--source='.  This
+          text is just appended directly.
 
-             if (filenames_only) {
-                 print FILENAME
-                 nextfile
-             }
+       b. Source file names, provided with `-f'.  We use a neat trick
+          and append `@include FILENAME' to the shell variable's
+          contents.  Since the file-inclusion program works the way
+          `gawk' does, this gets the text of the file included into the
+          program at the correct point.
 
-             if (do_filenames)
-                 print FILENAME ":" $0
-             else
-                 print
-         }
-     }
+  3. Run an `awk' program (naturally) over the shell variable's
+     contents to expand `@include' statements.  The expanded program is
+     placed in a second shell variable.
 
-   The `END' rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
+  4. Run the expanded program with `gawk' and any other original
+     command-line arguments that the user supplied (such as the data
+     file names).
 
-     END    \
-     {
-         if (total == 0)
-             exit 1
-         exit 0
-     }
+   This program uses shell variables extensively: for storing
+command-line arguments, the text of the `awk' program that will expand
+the user's program, for the user's original program, and for the
+expanded program.  Doing so removes some potential problems that might
+arise were we to use temporary files instead, at the cost of making the
+script somewhat more complicated.
 
-   The `usage()' function prints a usage message in case of invalid
-options, and then exits:
+   The initial part of the program turns on shell tracing if the first
+argument is `debug'.
 
-     function usage(    e)
-     {
-         e = "Usage: egrep [-csvil] [-e pat] [files ...]"
-         e = e "\n\tegrep [-csvil] pat [files ...]"
-         print e > "/dev/stderr"
-         exit 1
-     }
+   The next part loops through all the command-line arguments.  There
+are several cases of interest:
 
-   The variable `e' is used so that the function fits nicely on the
-printed page.
+`--'
+     This ends the arguments to `igawk'.  Anything else should be
+     passed on to the user's `awk' program without being evaluated.
 
-   Just a note on programming style: you may have noticed that the `END'
-rule uses backslash continuation, with the open brace on a line by
-itself.  This is so that it more closely resembles the way functions
-are written.  Many of the examples in this major node use this style.
-You can decide for yourself if you like writing your `BEGIN' and `END'
-rules this way or not.
+`-W'
+     This indicates that the next option is specific to `gawk'.  To make
+     argument processing easier, the `-W' is appended to the front of
+     the remaining arguments and the loop continues.  (This is an `sh'
+     programming trick.  Don't worry about it if you are not familiar
+     with `sh'.)
 
-   ---------- Footnotes ----------
+`-v, -F'
+     These are saved and passed on to `gawk'.
 
-   (1) It also introduces a subtle bug; if a match happens, we output
-the translated line, not the original.
+`-f, --file, --file=, -Wfile='
+     The file name is appended to the shell variable `program' with an
+     `@include' statement.  The `expr' utility is used to remove the
+     leading option part of the argument (e.g., `--file=').  (Typical
+     `sh' usage would be to use the `echo' and `sed' utilities to do
+     this work.  Unfortunately, some versions of `echo' evaluate escape
+     sequences in their arguments, possibly mangling the program text.
+     Using `expr' avoids this problem.)
 
-
-File: gawk.info,  Node: Id Program,  Next: Split Program,  Prev: Egrep Program,  Up: Clones
+`--source, --source=, -Wsource='
+     The source text is appended to `program'.
 
-13.2.3 Printing out User Information
-------------------------------------
+`--version, -Wversion'
+     `igawk' prints its version number, runs `gawk --version' to get
+     the `gawk' version information, and then exits.
 
-The `id' utility lists a user's real and effective user ID numbers,
-real and effective group ID numbers, and the user's group set, if any.
-`id' only prints the effective user ID and group ID if they are
-different from the real ones.  If possible, `id' also supplies the
-corresponding user and group names.  The output might look like this:
+   If none of the `-f', `--file', `-Wfile', `--source', or `-Wsource'
+arguments are supplied, then the first nonoption argument should be the
+`awk' program.  If there are no command-line arguments left, `igawk'
+prints an error message and exits.  Otherwise, the first argument is
+appended to `program'.  In any case, after the arguments have been
+processed, `program' contains the complete text of the original `awk'
+program.
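
The `expr' idiom described above can be tried on its own.  The
following is only a sketch: the helper name `strip_file_option' and the
sample arguments are made up for illustration and do not appear in
`igawk'.

```shell
# strip_file_option: hypothetical helper, not part of igawk.
# Prints the file name carried by -fNAME, --file=NAME, or -Wfile=NAME.
strip_file_option() {
    case $1 in
    -[W-]file=*) expr "$1" : '-.file=\(.*\)' ;;
    -f*)         expr "$1" : '-f\(.*\)' ;;
    esac
}

strip_file_option --file=lib.awk      # prints lib.awk
strip_file_option -Wfile='a\tb.awk'   # the backslash sequence is left alone
strip_file_option -fgetopt.awk        # prints getopt.awk
```

Because `expr' prints the matched text itself, no `echo' is involved and
backslash sequences in the argument are passed through unchanged.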
 
-     $ id
-     -| uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
+   The program is as follows:
 
-   This information is part of what is provided by `gawk''s `PROCINFO'
-array (*note Built-in Variables::).  However, the `id' utility provides
-a more palatable output than just individual numbers.
+     #! /bin/sh
+     # igawk --- like gawk but do @include processing
 
-   Here is a simple version of `id' written in `awk'.  It uses the user
-database library functions (*note Passwd Functions::) and the group
-database library functions (*note Group Functions::):
+     if [ "$1" = debug ]
+     then
+         set -x
+         shift
+     fi
 
-   The program is fairly straightforward.  All the work is done in the
-`BEGIN' rule.  The user and group ID numbers are obtained from
-`PROCINFO'.  The code is repetitive.  The entry in the user database
-for the real user ID number is split into parts at the `:'. The name is
-the first field.  Similar code is used for the effective user ID number
-and the group numbers:
+     # A literal newline, so that program text is formatted correctly
+     n='
+     '
 
-     # id.awk --- implement id in awk
-     #
-     # Requires user and group library functions
-     # output is:
-     # uid=12(foo) euid=34(bar) gid=3(baz) \
-     #             egid=5(blat) groups=9(nine),2(two),1(one)
+     # Initialize variables to empty
+     program=
+     opts=
 
-     BEGIN    \
-     {
-         uid = PROCINFO["uid"]
-         euid = PROCINFO["euid"]
-         gid = PROCINFO["gid"]
-         egid = PROCINFO["egid"]
+     while [ $# -ne 0 ] # loop over arguments
+     do
+         case $1 in
+         --)     shift
+                 break ;;
 
-         printf("uid=%d", uid)
-         pw = getpwuid(uid)
-         if (pw != "") {
-             split(pw, a, ":")
-             printf("(%s)", a[1])
-         }
+         -W)     shift
+                 # The ${x?'message here'} construct prints a
+                 # diagnostic if $x is unset
+                 set -- -W"${@?'missing operand'}"
+                 continue ;;
 
-         if (euid != uid) {
-             printf(" euid=%d", euid)
-             pw = getpwuid(euid)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-         }
+         -[vF])  opts="$opts $1 '${2?'missing operand'}'"
+                 shift ;;
 
-         printf(" gid=%d", gid)
-         pw = getgrgid(gid)
-         if (pw != "") {
-             split(pw, a, ":")
-             printf("(%s)", a[1])
-         }
+         -[vF]*) opts="$opts '$1'" ;;
 
-         if (egid != gid) {
-             printf(" egid=%d", egid)
-             pw = getgrgid(egid)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-         }
+         -f)     program="$program$n@include ${2?'missing operand'}"
+                 shift ;;
 
-         for (i = 1; ("group" i) in PROCINFO; i++) {
-             if (i == 1)
-                 printf(" groups=")
-             group = PROCINFO["group" i]
-             printf("%d", group)
-             pw = getgrgid(group)
-             if (pw != "") {
-                 split(pw, a, ":")
-                 printf("(%s)", a[1])
-             }
-             if (("group" (i+1)) in PROCINFO)
-                 printf(",")
-         }
+         -f*)    f=$(expr "$1" : '-f\(.*\)')
+                 program="$program$n@include $f" ;;
 
-         print ""
-     }
+         -[W-]file=*)
+                 f=$(expr "$1" : '-.file=\(.*\)')
+                 program="$program$n@include $f" ;;
 
-   The test in the `for' loop is worth noting.  Any supplementary
-groups in the `PROCINFO' array have the indices `"group1"' through
-`"groupN"' for some N, i.e., the total number of supplementary groups.
-However, we don't know in advance how many of these groups there are.
+         -[W-]file)
+                 program="$program$n@include ${2?'missing operand'}"
+                 shift ;;
 
-   This loop works by starting at one, concatenating the value with
-`"group"', and then using `in' to see if that value is in the array.
-Eventually, `i' is incremented past the last group in the array and the
-loop exits.
+         -[W-]source=*)
+                 t=$(expr "$1" : '-.source=\(.*\)')
+                 program="$program$n$t" ;;
 
-   The loop is also correct if there are _no_ supplementary groups;
-then the condition is false the first time it's tested, and the loop
-body never executes.
+         -[W-]source)
+                 program="$program$n${2?'missing operand'}"
+                 shift ;;
 
-
-File: gawk.info,  Node: Split Program,  Next: Tee Program,  Prev: Id Program,  Up: Clones
+         -[W-]version)
+                 echo igawk: version 3.0 1>&2
+                 gawk --version
+                 exit 0 ;;
 
-13.2.4 Splitting a Large File into Pieces
------------------------------------------
+         -[W-]*) opts="$opts '$1'" ;;
 
-The `split' program splits large text files into smaller pieces.  Usage
-is as follows:(1)
+         *)      break ;;
+         esac
+         shift
+     done
 
-     split [-COUNT] file [ PREFIX ]
+     if [ -z "$program" ]
+     then
+          program=${1?'missing program'}
+          shift
+     fi
 
-   By default, the output files are named `xaa', `xab', and so on. Each
-file has 1000 lines in it, with the likely exception of the last file.
-To change the number of lines in each file, supply a number on the
-command line preceded with a minus; e.g., `-500' for files with 500
-lines in them instead of 1000.  To change the name of the output files
-to something like `myfileaa', `myfileab', and so on, supply an
-additional argument that specifies the file name prefix.
+     # At this point, `program' has the program.
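
The `${2?'missing operand'}' checks in the loop above can be seen in
isolation.  This is a minimal sketch; `need_operand' is a hypothetical
helper, and its body runs in a subshell so that a failed expansion does
not end the calling script.

```shell
# need_operand: hypothetical helper showing the ${2?message} check.
# It prints the operand when one follows the option; otherwise the
# shell writes the message to stderr and the subshell fails.
need_operand() (
    : "${2?missing operand}"
    printf '%s\n' "$2"
)

need_operand -f lib.awk           # prints lib.awk
need_operand -f 2>/dev/null ||
    echo "no operand after -f"    # the expansion aborted the subshell
need_operand -f ''                # an empty operand is still accepted
```

Without a colon, the expansion complains only when the parameter is
unset; `${2:?message}' would also reject an empty operand.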
 
-   Here is a version of `split' in `awk'. It uses the `ord()' and
-`chr()' functions presented in *note Ordinal Functions::.
+   The `awk' program to process `@include' directives is stored in the
+shell variable `expand_prog'.  Doing this keeps the shell script
+readable.  The `awk' program reads through the user's program, one line
+at a time, using `getline' (*note Getline::).  The input file names and
+`@include' statements are managed using a stack.  As each `@include' is
+encountered, the current file name is "pushed" onto the stack and the
+file named in the `@include' directive becomes the current file name.
+As each file is finished, the stack is "popped," and the previous input
+file becomes the current input file again.  The process is started by
+making the original file the first one on the stack.
 
-   The program first sets its defaults, and then tests to make sure
-there are not too many arguments.  It then looks at each argument in
-turn.  The first argument could be a minus sign followed by a number.
-If it is, this happens to look like a negative number, so it is made
-positive, and that is the count of lines.  The data file name is
-skipped over and the final argument is used as the prefix for the
-output file names:
+   The `pathto()' function does the work of finding the full path to a
+file.  It simulates `gawk''s behavior when searching the `AWKPATH'
+environment variable (*note AWKPATH Variable::).  If a file name has a
+`/' in it, no path search is done.  Similarly, if the file name is
+`"-"', then that string is used as-is.  Otherwise, the file name is
+concatenated with the name of each directory in the path, and an
+attempt is made to open the generated file name.  The only way to test
+if a file can be read in `awk' is to go ahead and try to read it with
+`getline'; this is what `pathto()' does.(2) If the file can be read, it
+is closed and the file name is returned:
 
-     # split.awk --- do split in awk
-     #
-     # Requires ord() and chr() library functions
-     # usage: split [-num] [file] [outname]
+     expand_prog='
+
+     function pathto(file,    i, t, junk)
+     {
+         if (index(file, "/") != 0)
+             return file
 
-     BEGIN {
-         outfile = "x"    # default
-         count = 1000
-         if (ARGC > 4)
-             usage()
+         if (file == "-")
+             return file
 
-         i = 1
-         if (ARGV[i] ~ /^-[[:digit:]]+$/) {
-             count = -ARGV[i]
-             ARGV[i] = ""
-             i++
+         for (i = 1; i <= ndirs; i++) {
+             t = (pathlist[i] "/" file)
+             if ((getline junk < t) > 0) {
+                 # found it
+                 close(t)
+                 return t
+             }
          }
-         # test argv in case reading from stdin instead of file
-         if (i in ARGV)
-             i++    # skip data file name
-         if (i in ARGV) {
-             outfile = ARGV[i]
-             ARGV[i] = ""
+         return ""
+     }
+
+   The main program is contained inside one `BEGIN' rule.  The first
+thing it does is set up the `pathlist' array that `pathto()' uses.
+After splitting the path on `:', null elements are replaced with `"."',
+which represents the current directory:
+
+     BEGIN {
+         path = ENVIRON["AWKPATH"]
+         ndirs = split(path, pathlist, ":")
+         for (i = 1; i <= ndirs; i++) {
+             if (pathlist[i] == "")
+                 pathlist[i] = "."
          }
 
-         s1 = s2 = "a"
-         out = (outfile s1 s2)
-     }
+   The stack is initialized with `ARGV[1]', which will be `/dev/stdin'.
+The main loop comes next.  Input lines are read in succession. Lines
+that do not start with `@include' are printed verbatim.  If the line
+does start with `@include', the file name is in `$2'.  `pathto()' is
+called to generate the full path.  If it cannot, then the program
+prints an error message and continues.
 
-   The next rule does most of the work. `tcount' (temporary count)
-tracks how many lines have been printed to the output file so far. If
-it is greater than `count', it is time to close the current file and
-start a new one.  `s1' and `s2' track the current suffixes for the file
-name. If they are both `z', the file is just too big.  Otherwise, `s1'
-moves to the next letter in the alphabet and `s2' starts over again at
-`a':
+   The next thing to check is if the file is included already.  The
+`processed' array is indexed by the full file name of each included
+file and it tracks this information for us.  If the file is seen again,
+a warning message is printed. Otherwise, the new file name is pushed
+onto the stack and processing continues.
 
-     {
-         if (++tcount > count) {
-             close(out)
-             if (s2 == "z") {
-                 if (s1 == "z") {
-                     printf("split: %s is too large to split\n",
-                            FILENAME) > "/dev/stderr"
-                     exit 1
+   Finally, when `getline' encounters the end of the input file, the
+file is closed and the stack is popped.  When `stackptr' is less than
+zero, the program is done:
+
+         stackptr = 0
+         input[stackptr] = ARGV[1] # ARGV[1] is first file
+
+         for (; stackptr >= 0; stackptr--) {
+             while ((getline < input[stackptr]) > 0) {
+                 if (tolower($1) != "@include") {
+                     print
+                     continue
                  }
-                 s1 = chr(ord(s1) + 1)
-                 s2 = "a"
+                 fpath = pathto($2)
+                 if (fpath == "") {
+                     printf("igawk:%s:%d: cannot find %s\n",
+                         input[stackptr], FNR, $2) > "/dev/stderr"
+                     continue
+                 }
+                 if (! (fpath in processed)) {
+                     processed[fpath] = input[stackptr]
+                     input[++stackptr] = fpath  # push onto stack
+                 } else
+                     print $2, "included in", input[stackptr],
+                         "already included in",
+                         processed[fpath] > "/dev/stderr"
              }
-             else
-                 s2 = chr(ord(s2) + 1)
-             out = (outfile s1 s2)
-             tcount = 1
+             close(input[stackptr])
          }
-         print > out
-     }
+     }'  # close quote ends `expand_prog' variable
 
-The `usage()' function simply prints an error message and exits:
+     processed_program=$(gawk -- "$expand_prog" /dev/stdin << EOF
+     $program
+     EOF
+     )
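
The stack technique used by `expand_prog' can be tried on a few small
files.  The following is a simplified sketch, not the `igawk' code: the
function name `expand_includes' and the file names are invented, there
is no `AWKPATH' search, and repeated includes are skipped silently
instead of producing a warning.

```shell
# expand_includes: hypothetical, simplified stand-in for expand_prog.
# It reads the program text from FILE, follows nested @include lines
# with a stack of open files, and includes each file only once.
expand_includes() {
    awk '
    BEGIN {
        stackptr = 0
        input[stackptr] = ARGV[1]
        for (; stackptr >= 0; stackptr--) {
            while ((getline line < input[stackptr]) > 0) {
                n = split(line, f, " ")
                if (n < 2 || f[1] != "@include") {
                    print line
                    continue
                }
                if (f[2] in processed)
                    continue                 # already included once
                processed[f[2]] = 1
                input[++stackptr] = f[2]     # push; it is read next
            }
            close(input[stackptr])           # end of file: pop
        }
    }' "$1"
}

# Demonstration in a scratch directory.
(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    printf '%s\n' 'function a() { }' > a.awk
    printf '%s\n' '@include a.awk' 'function b() { }' > b.awk
    printf '%s\n' '@include b.awk' '@include a.awk' 'BEGIN { b() }' > main.awk
    expand_includes main.awk
    cd / && rm -rf "$dir"
)
```

The demonstration prints the body of `a.awk', then that of `b.awk',
then the `BEGIN' line; the second request for `a.awk' is ignored.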
 
-     function usage(   e)
-     {
-         e = "usage: split [-num] [file] [outname]"
-         print e > "/dev/stderr"
-         exit 1
-     }
+   The shell construct `COMMAND << MARKER' is called a "here document".
+Everything in the shell script up to the MARKER is fed to COMMAND as
+input.  The shell processes the contents of the here document for
+variable and command substitution (and possibly other things as well,
+depending upon the shell).
 
-The variable `e' is used so that the function fits nicely on the screen.
+   The shell construct `$(...)' is called "command substitution".  The
+output of the command inside the parentheses is substituted into the
+command line.  Because the result is used in a variable assignment, it
+is saved as a single string, even if the results contain whitespace.
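
Here is a small sketch that combines a here document with command
substitution, as the script does.  The variable names are illustrative,
and `tr' stands in for the `gawk' invocation.

```shell
# program and upper are illustrative names; tr stands in for gawk.
program='line one
line   two'

# The here document feeds $program to tr on standard input;
# $(...) captures everything tr writes as a single string.
upper=$(tr '[:lower:]' '[:upper:]' << EOF
$program
EOF
)

printf '%s\n' "$upper"
```

The newline and the run of spaces inside the text survive the round
trip; command substitution removes only trailing newlines.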
 
-   This program is a bit sloppy; it relies on `awk' to automatically
-close the last file instead of doing it in an `END' rule.  It also
-assumes that letters are contiguous in the character set, which isn't
-true for EBCDIC systems.
+   The expanded program is saved in the variable `processed_program'.
+It's done in these steps:
 
-   ---------- Footnotes ----------
+  1. Run `gawk' with the `@include'-processing program (the value of
+     the `expand_prog' shell variable) on standard input.
 
-   (1) This is the traditional usage. The POSIX usage is different, but
-not relevant for what the program aims to demonstrate.
+  2. Standard input is the contents of the user's program, from the
+     shell variable `program'.  Its contents are fed to `gawk' via a
+     here document.
 
-
-File: gawk.info,  Node: Tee Program,  Next: Uniq Program,  Prev: Split Program,  Up: Clones
+  3. The results of this processing are saved in the shell variable
+     `processed_program' by using command substitution.
 
-13.2.5 Duplicating Output into Multiple Files
----------------------------------------------
+   The last step is to call `gawk' with the expanded program, along
+with the original options and command-line arguments that the user
+supplied.
 
-The `tee' program is known as a "pipe fitting."  `tee' copies its
-standard input to its standard output and also duplicates it to the
-files named on the command line.  Its usage is as follows:
+     eval gawk $opts -- '"$processed_program"' '"$@"'
 
-     tee [-a] file ...
+   The `eval' command is a shell construct that reruns the shell's
+parsing process.  This keeps things properly quoted.
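
The effect of `eval' on the stored quoting can be checked with a short
experiment.  This is a sketch under assumptions: `count_args' is a
hypothetical helper, and the option text and file name are made up.

```shell
# count_args: hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

opts="-v 'msg=hello world'"     # quoting stored as text, as igawk does
set -- 'data file.txt'          # one positional parameter with a space

# Without eval, the quotes inside $opts are ordinary characters and
# the value is split at every space.
without_eval=$(count_args $opts -- "$@")

# With eval, the shell parses the line a second time, so the stored
# quotes and the literal "$@" take effect.
with_eval=$(eval count_args $opts -- '"$@"')

echo "without eval: $without_eval arguments"
echo "with eval:    $with_eval arguments"
```

The first count is 5 and the second is 4: only the `eval' form keeps
`msg=hello world' together as a single argument.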
 
-   The `-a' option tells `tee' to append to the named files, instead of
-truncating them and starting over.
+   This version of `igawk' represents my fifth version of this program.
+There are four key simplifications that make the program work better:
 
-   The `BEGIN' rule first makes a copy of all the command-line arguments
-into an array named `copy'.  `ARGV[0]' is not copied, since it is not
-needed.  `tee' cannot use `ARGV' directly, since `awk' attempts to
-process each file name in `ARGV' as input data.
+   * Using `@include' even for the files named with `-f' makes building
+     the initial collected `awk' program much simpler; all the
+     `@include' processing can be done once.
 
-   If the first argument is `-a', then the flag variable `append' is
-set to true, and both `ARGV[1]' and `copy[1]' are deleted. If `ARGC' is
-less than two, then no file names were supplied and `tee' prints a
-usage message and exits.  Finally, `awk' is forced to read the standard
-input by setting `ARGV[1]' to `"-"' and `ARGC' to two:
+   * Not trying to save the line read with `getline' in the `pathto()'
+     function when testing for the file's accessibility for use with
+     the main program simplifies things considerably.
 
-     # tee.awk --- tee in awk
-     #
-     # Copy standard input to all named output files.
-     # Append content if -a option is supplied.
-     #
-     BEGIN    \
-     {
-         for (i = 1; i < ARGC; i++)
-             copy[i] = ARGV[i]
+   * Using a `getline' loop in the `BEGIN' rule does it all in one
+     place.  It is not necessary to call out to a separate loop for
+     processing nested `@include' statements.
 
-         if (ARGV[1] == "-a") {
-             append = 1
-             delete ARGV[1]
-             delete copy[1]
-             ARGC--
-         }
-         if (ARGC < 2) {
-             print "usage: tee [-a] file ..." > "/dev/stderr"
-             exit 1
-         }
-         ARGV[1] = "-"
-         ARGC = 2
-     }
+   * Instead of saving the expanded program in a temporary file,
+     putting it in a shell variable avoids some potential security
+     problems.  This has the disadvantage that the script relies upon
+     more features of the `sh' language, making it harder to follow for
+     those who aren't familiar with `sh'.
 
-   The following single rule does all the work.  Since there is no
-pattern, it is executed for each line of input.  The body of the rule
-simply prints the line into each file on the command line, and then to
-the standard output:
+   Also, this program illustrates that it is often worthwhile to combine
+`sh' and `awk' programming together.  You can usually accomplish quite
+a lot, without having to resort to low-level programming in C or C++,
+and it is frequently easier to do certain kinds of string and argument
+manipulation using the shell than it is in `awk'.
 
-     {
-         # moving the if outside the loop makes it run faster
-         if (append)
-             for (i in copy)
-                 print >> copy[i]
-         else
-             for (i in copy)
-                 print > copy[i]
-         print
-     }
+   Finally, `igawk' shows that it is not always necessary to add new
+features to a program; they can often be layered on top.
 
-It is also possible to write the loop this way:
+   As an additional example of this, consider the idea of having two
+files in a directory in the search path:
 
-     for (i in copy)
-         if (append)
-             print >> copy[i]
-         else
-             print > copy[i]
+`default.awk'
+     This file contains a set of default library functions, such as
+     `getopt()' and `assert()'.
 
-This is more concise but it is also less efficient.  The `if' is tested
-for each record and for each output file.  By duplicating the loop
-body, the `if' is only tested once for each input record.  If there are
-N input records and M output files, the first method only executes N
-`if' statements, while the second executes N`*'M `if' statements.
+`site.awk'
+     This file contains library functions that are specific to a site or
+     installation; i.e., locally developed functions.  Having a
+     separate file allows `default.awk' to change with new `gawk'
+     releases, without requiring the system administrator to update it
+     each time by adding the local functions.
 
-   Finally, the `END' rule cleans up by closing all the output files:
+   One user suggested that `gawk' be modified to automatically read
+these files upon startup.  Instead, it would be very simple to modify
+`igawk' to do this. Since `igawk' can process nested `@include'
+directives, `default.awk' could simply contain `@include' statements
+for the desired library functions.
+
+   ---------- Footnotes ----------
+
+   (1) Fully explaining the `sh' language is beyond the scope of this
+book. We provide some minimal explanations, but see a good shell
+programming book if you wish to understand things in more depth.
 
-     END    \
-     {
-         for (i in copy)
-             close(copy[i])
-     }
+   (2) On some very old versions of `awk', the test `getline junk < t'
+can loop forever if the file exists but is empty.  Caveat emptor.
 
 
-File: gawk.info,  Node: Uniq Program,  Next: Wc Program,  Prev: Tee Program,  Up: Clones
-
-13.2.6 Printing Nonduplicated Lines of Text
--------------------------------------------
+File: gawk.info,  Node: Anagram Program,  Next: Signature Program,  Prev: Igawk Program,  Up: Miscellaneous Programs
 
-The `uniq' utility reads sorted lines of data on its standard input,
-and by default removes duplicate lines.  In other words, it only prints
-unique lines--hence the name.  `uniq' has a number of options. The
-usage is as follows:
+11.3.10 Finding Anagrams From A Dictionary
+------------------------------------------
 
-     uniq [-udc [-N]] [+N] [ INPUT FILE [ OUTPUT FILE ]]
+An interesting programming challenge is to search for "anagrams" in a
+word list (such as `/usr/share/dict/words' on many GNU/Linux systems).
+One word is an anagram of another if both words contain the same letters
+(for example, "babbling" and "blabbing").
 
-   The options for `uniq' are:
+   An elegant algorithm is presented in Column 2, Problem C of Jon
+Bentley's `Programming Pearls', second edition.  The idea is to give
+words that are anagrams a common signature, sort all the words together
+by their signature, and then print them.  Dr. Bentley observes that
+taking the letters in each word and sorting them produces that common
+signature.
 
-`-d'
-     Print only repeated lines.
+   The following program uses arrays of arrays to bring together words
+with the same signature and array sorting to print the words in sorted
+order.
 
-`-u'
-     Print only nonrepeated lines.
+     # anagram.awk --- An implementation of the anagram finding algorithm
+     #                 from Jon Bentley's "Programming Pearls", 2nd edition.
+     #                 Addison Wesley, 2000, ISBN 0-201-65788-0.
+     #                 Column 2, Problem C, section 2.8, pp 18-20.
 
-`-c'
-     Count lines. This option overrides `-d' and `-u'.  Both repeated
-     and nonrepeated lines are counted.
+     /'s$/   { next }        # Skip possessives
 
-`-N'
-     Skip N fields before comparing lines.  The definition of fields is
-     similar to `awk''s default: nonwhitespace characters separated by
-     runs of spaces and/or TABs.
+   The program starts with a header, and then a rule to skip
+possessives in the dictionary file. The next rule builds up the data
+structure. The first dimension of the array is indexed by the
+signature; the second dimension is the word itself:
 
-`+N'
-     Skip N characters before comparing lines.  Any fields specified
-     with `-N' are skipped first.
+     {
+         key = word2key($1)  # Build signature
+         data[key][$1] = $1  # Store word with signature
+     }
 
-`INPUT FILE'
-     Data is read from the input file named on the command line,
-     instead of from the standard input.
+   The `word2key()' function creates the signature.  It splits the word
+apart into individual letters, sorts the letters, and then joins them
+back together:
 
-`OUTPUT FILE'
-     The generated output is sent to the named output file, instead of
-     to the standard output.
+     # word2key --- split word apart into letters, sort, join back together
 
-   Normally `uniq' behaves as if both the `-d' and `-u' options are
-provided.
+     function word2key(word,     a, i, n, result)
+     {
+         n = split(word, a, "")
+         asort(a)
 
-   `uniq' uses the `getopt()' library function (*note Getopt Function::)
-and the `join()' library function (*note Join Function::).
+         for (i = 1; i <= n; i++)
+             result = result a[i]
 
-   The program begins with a `usage()' function and then a brief
-outline of the options and their meanings in comments.  The `BEGIN'
-rule deals with the command-line arguments and options. It uses a trick
-to get `getopt()' to handle options of the form `-25', treating such an
-option as the option letter `2' with an argument of `5'. If indeed two
-or more digits are supplied (`Optarg' looks like a number), `Optarg' is
-concatenated with the option digit and then the result is added to zero
-to make it into a number.  If there is only one digit in the option,
-then `Optarg' is not needed. In this case, `Optind' must be decremented
-so that `getopt()' processes it next time.  This code is admittedly a
-bit tricky.
+         return result
+     }
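
The signature idea can be tried even where `asort()' is not available.
The following sketch is an assumption-laden variant, not the program
above: `word2key_demo' is a hypothetical shell wrapper, and the `awk'
inside uses `substr()' and an insertion sort in place of `split()' with
a null separator and `asort()'.

```shell
# word2key_demo: hypothetical wrapper.  The awk inside builds the same
# sorted-letter signature as word2key(), but without gawk extensions.
word2key_demo() {
    printf '%s\n' "$@" | awk '
    function word2key(word,    a, i, j, n, t, result)
    {
        n = length(word)
        for (i = 1; i <= n; i++)
            a[i] = substr(word, i, 1)
        for (i = 2; i <= n; i++) {          # insertion sort on letters
            t = a[i]
            for (j = i - 1; j >= 1 && a[j] > t; j--)
                a[j + 1] = a[j]
            a[j + 1] = t
        }
        for (i = 1; i <= n; i++)
            result = result a[i]
        return result
    }
    { print word2key($1), $1 }'
}

word2key_demo babbling blabbing
```

Both words produce the signature `abbbgiln', which is what lets the
main program group them under one key.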
 
-   If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines.  The output file, if provided, is
-assigned to `outputfile'.  Early on, `outputfile' is initialized to the
-standard output, `/dev/stdout':
+   Finally, the `END' rule traverses the array and prints out the
+anagram lists.  It sends the output to the system `sort' command, since
+otherwise the anagrams would appear in arbitrary order:
 
-     # uniq.awk --- do uniq in awk
-     #
-     # Requires getopt() and join() library functions
+     END {
+         sort = "sort"
+         for (key in data) {
+             # Sort words with same key
+             nwords = asorti(data[key], words)
+             if (nwords == 1)
+                 continue
 
-     function usage(    e)
-     {
-         e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
-         print e > "/dev/stderr"
-         exit 1
+             # And print. Minor glitch: trailing space at end of each line
+             for (j = 1; j <= nwords; j++)
+                 printf("%s ", words[j]) | sort
+             print "" | sort
+         }
+         close(sort)
      }
 
-     # -c    count lines. overrides -d and -u
-     # -d    only repeated lines
-     # -u    only nonrepeated lines
-     # -n    skip n fields
-     # +n    skip n characters, skip fields first
+   Here is some partial output when the program is run:
 
-     BEGIN   \
-     {
-         count = 1
-         outputfile = "/dev/stdout"
-         opts = "udc0:1:2:3:4:5:6:7:8:9:"
-         while ((c = getopt(ARGC, ARGV, opts)) != -1) {
-             if (c == "u")
-                 non_repeated_only++
-             else if (c == "d")
-                 repeated_only++
-             else if (c == "c")
-                 do_count++
-             else if (index("0123456789", c) != 0) {
-                 # getopt requires args to options
-                 # this messes us up for things like -5
-                 if (Optarg ~ /^[[:digit:]]+$/)
-                     fcount = (c Optarg) + 0
-                 else {
-                     fcount = c + 0
-                     Optind--
-                 }
-             } else
-                 usage()
-         }
+     $ gawk -f anagram.awk /usr/share/dict/words | grep '^b'
+     ...
+     babbled blabbed
+     babbler blabber brabble
+     babblers blabbers brabbles
+     babbling blabbing
+     babbly blabby
+     babel bable
+     babels beslab
+     babery yabber
+     ...
 
-         if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) {
-             charcount = substr(ARGV[Optind], 2) + 0
-             Optind++
-         }
+
+File: gawk.info,  Node: Signature Program,  Prev: Anagram Program,  Up: Miscellaneous Programs
 
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
+11.3.11 And Now For Something Completely Different
+--------------------------------------------------
 
-         if (repeated_only == 0 && non_repeated_only == 0)
-             repeated_only = non_repeated_only = 1
+The following program was written by Davide Brini and is published on
+his website (http://backreference.org/2011/02/03/obfuscated-awk/).  It
+serves as his signature in the Usenet group `comp.lang.awk'.  He
+supplies the following copyright terms:
 
-         if (ARGC - Optind == 2) {
-             outputfile = ARGV[ARGC - 1]
-             ARGV[ARGC - 1] = ""
-         }
-     }
+     Copyright (C) 2008 Davide Brini
 
-   The following function, `are_equal()', compares the current line,
-`$0', to the previous line, `last'.  It handles skipping fields and
-characters.  If no field count and no character count are specified,
-`are_equal()' simply returns one or zero depending upon the result of a
-simple string comparison of `last' and `$0'.  Otherwise, things get more
-complicated.  If fields have to be skipped, each line is broken into an
-array using `split()' (*note String Functions::); the desired fields
-are then joined back into a line using `join()'.  The joined lines are
-stored in `clast' and `cline'.  If no fields are skipped, `clast' and
-`cline' are set to `last' and `$0', respectively.  Finally, if
-characters are skipped, `substr()' is used to strip off the leading
-`charcount' characters in `clast' and `cline'.  The two strings are
-then compared and `are_equal()' returns the result:
+     Copying and distribution of the code published in this page, with
+     or without modification, are permitted in any medium without
+     royalty provided the copyright notice and this notice are
+     preserved.
 
-     function are_equal(    n, m, clast, cline, alast, aline)
-     {
-         if (fcount == 0 && charcount == 0)
-             return (last == $0)
+   Here is the program:
 
-         if (fcount > 0) {
-             n = split(last, alast)
-             m = split($0, aline)
-             clast = join(alast, fcount+1, n)
-             cline = join(aline, fcount+1, m)
-         } else {
-             clast = last
-             cline = $0
-         }
-         if (charcount) {
-             clast = substr(clast, charcount + 1)
-             cline = substr(cline, charcount + 1)
-         }
+     awk 'BEGIN{O="~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
+     printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
+     X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
+     O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O}'
 
-         return (clast == cline)
-     }
+   We leave it to you to determine what the program does.
 
-   The following two rules are the body of the program.  The first one
-is executed only for the very first line of data.  It sets `last' equal
-to `$0', so that subsequent lines of text have something to be compared
-to.
+
+File: gawk.info,  Node: Internationalization,  Next: Advanced Features,  Prev: Sample Programs,  Up: Top
 
-   The second rule does the work. The variable `equal' is one or zero,
-depending upon the results of `are_equal()''s comparison. If `uniq' is
-counting repeated lines, and the lines are equal, then it increments
-the `count' variable.  Otherwise, it prints the line and resets `count',
-since the two lines are not equal.
+12 Internationalization with `gawk'
+***********************************
 
-   If `uniq' is not counting, and if the lines are equal, `count' is
-incremented.  Nothing is printed, since the point is to remove
-duplicates.  Otherwise, if `uniq' is counting repeated lines and more
-than one line is seen, or if `uniq' is counting nonrepeated lines and
-only one line is seen, then the line is printed, and `count' is reset.
+Once upon a time, computer makers wrote software that worked only in
+English.  Eventually, hardware and software vendors noticed that if
+their systems worked in the native languages of non-English-speaking
+countries, they were able to sell more systems.  As a result,
+internationalization and localization of programs and software systems
+became a common practice.
 
-   Finally, similar logic is used in the `END' rule to print the final
-line of input data:
+   For many years, the ability to provide internationalization was
+largely restricted to programs written in C and C++.  This major node
+describes the underlying library `gawk' uses for internationalization,
+as well as how `gawk' makes internationalization features available at
+the `awk' program level.  Having internationalization available at the
+`awk' level gives software developers additional flexibility--they are
+no longer forced to write in C or C++ when internationalization is a
+requirement.
 
-     NR == 1 {
-         last = $0
-         next
-     }
+* Menu:
 
-     {
-         equal = are_equal()
+* I18N and L10N::               Internationalization and Localization.
+* Explaining gettext::          How GNU `gettext' works.
+* Programmer i18n::             Features for the programmer.
+* Translator i18n::             Features for the translator.
+* I18N Example::                A simple i18n example.
+* Gawk I18N::                   `gawk' is also internationalized.
 
-         if (do_count) {    # overrides -d and -u
-             if (equal)
-                 count++
-             else {
-                 printf("%4d %s\n", count, last) > outputfile
-                 last = $0
-                 count = 1    # reset
-             }
-             next
-         }
+
+File: gawk.info,  Node: I18N and L10N,  Next: Explaining gettext,  Up: Internationalization
 
-         if (equal)
-             count++
-         else {
-             if ((repeated_only && count > 1) ||
-                 (non_repeated_only && count == 1))
-                     print last > outputfile
-             last = $0
-             count = 1
-         }
-     }
+12.1 Internationalization and Localization
+==========================================
 
-     END {
-         if (do_count)
-             printf("%4d %s\n", count, last) > outputfile
-         else if ((repeated_only && count > 1) ||
-                 (non_repeated_only && count == 1))
-             print last > outputfile
-         close(outputfile)
-     }
+"Internationalization" means writing (or modifying) a program once, in
+such a way that it can use multiple languages without requiring further
+source-code changes.  "Localization" means providing the data necessary
+for an internationalized program to work in a particular language.
+Most typically, these terms refer to features such as the language used
+for printing error messages, the language used to read responses, and
+information related to how numerical and monetary values are printed
+and read.
 
 
-File: gawk.info,  Node: Wc Program,  Prev: Uniq Program,  Up: Clones
+File: gawk.info,  Node: Explaining gettext,  Next: Programmer i18n,  Prev: I18N and L10N,  Up: Internationalization
 
-13.2.7 Counting Things
-----------------------
+12.2 GNU `gettext'
+==================
 
-The `wc' (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
+The facilities in GNU `gettext' focus on messages; strings printed by a
+program, either directly or via formatting with `printf' or
+`sprintf()'.(1)
 
-     wc [-lwc] [ FILES ... ]
+   When using GNU `gettext', each application has its own "text
+domain".  This is a unique name, such as `kpilot' or `gawk', that
+identifies the application.  A complete application may have multiple
+components--programs written in C or C++, as well as scripts written in
+`sh' or `awk'.  All of the components use the same text domain.
 
-   If no files are specified on the command line, `wc' reads its
-standard input. If there are multiple files, it also prints total
-counts for all the files.  The options and their meanings are shown in
-the following list:
+   To make the discussion concrete, assume we're writing an application
+named `guide'.  Internationalization consists of the following steps,
+in this order:
 
-`-l'
-     Count only lines.
+  1. The programmer goes through the source for all of `guide''s
+     components and marks each string that is a candidate for
+     translation.  For example, `"`-F': option required"' is a good
+     candidate for translation.  A table with strings of option names
+     is not (e.g., `gawk''s `--profile' option should remain the same,
+     no matter what the local language).
 
-`-w'
-     Count only words.  A "word" is a contiguous sequence of
-     nonwhitespace characters, separated by spaces and/or TABs.
-     Luckily, this is the normal way `awk' separates fields in its
-     input data.
+  2. The programmer indicates the application's text domain (`"guide"')
+     to the `gettext' library, by calling the `textdomain()' function.
 
-`-c'
-     Count only characters.
+  3. Messages from the application are extracted from the source code
+     and collected into a portable object template file (`guide.pot'),
+     which lists the strings and their translations.  The translations
+     are initially empty.  The original (usually English) messages
+     serve as the key for lookup of the translations.
 
-   Implementing `wc' in `awk' is particularly elegant, since `awk' does
-a lot of the work for us; it splits lines into words (i.e., fields) and
-counts them, it counts lines (i.e., records), and it can easily tell us
-how long a line is.
+  4. For each language with a translator, `guide.pot' is copied to a
+     portable object file (`.po') and translations are created and
+     shipped with the application.  For example, there might be a
+     `fr.po' for a French translation.
 
-   This program uses the `getopt()' library function (*note Getopt
-Function::) and the file-transition functions (*note Filetrans
-Function::).
+  5. Each language's `.po' file is converted into a binary message
+     object (`.mo') file.  A message object file contains the original
+     messages and their translations in a binary format that allows
+     fast lookup of translations at runtime.
 
-   This version has one notable difference from traditional versions of
-`wc': it always prints the counts in the order lines, words, and
-characters.  Traditional versions note the order of the `-l', `-w', and
-`-c' options on the command line, and print the counts in that order.
+  6. When `guide' is built and installed, the binary translation files
+     are installed in a standard place.
 
-   The `BEGIN' rule does the argument processing.  The variable
-`print_total' is true if more than one file is named on the command
-line:
+  7. For testing and development, it is possible to tell `gettext' to
+     use `.mo' files in a different directory than the standard one by
+     using the `bindtextdomain()' function.
 
-     # wc.awk --- count lines, words, characters
+  8. At runtime, `guide' looks up each string via a call to
+     `gettext()'.  The returned string is the translated string if
+     available, or the original string if not.
 
-     # Options:
-     #    -l    only count lines
-     #    -w    only count words
-     #    -c    only count characters
-     #
-     # Default is to count lines, words, characters
-     #
-     # Requires getopt() and file transition library functions
+  9. If necessary, it is possible to access messages from a different
+     text domain than the one belonging to the application, without
+     having to switch the application's default text domain back and
+     forth.
 
-     BEGIN {
-         # let getopt() print a message about
-         # invalid options. we ignore them
-         while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
-             if (c == "l")
-                 do_lines = 1
-             else if (c == "w")
-                 do_words = 1
-             else if (c == "c")
-                 do_chars = 1
-         }
-         for (i = 1; i < Optind; i++)
-             ARGV[i] = ""
+   In C (or C++), the string marking and dynamic translation lookup are
+accomplished by wrapping each string in a call to `gettext()':
 
-         # if no options, do all
-         if (! do_lines && ! do_words && ! do_chars)
-             do_lines = do_words = do_chars = 1
+     printf("%s", gettext("Don't Panic!\n"));
 
-         print_total = (ARGC - i > 2)
-     }
+   The tools that extract messages from source code pull out all
+strings enclosed in calls to `gettext()'.
 
-   The `beginfile()' function is simple; it just resets the counts of
-lines, words, and characters to zero, and saves the current file name in
-`fname':
+   The GNU `gettext' developers, recognizing that typing `gettext(...)'
+over and over again is both painful and ugly to look at, use the macro
+`_' (an underscore) to make things easier:
 
-     function beginfile(file)
-     {
-         lines = words = chars = 0
-         fname = FILENAME
-     }
+     /* In the standard header file: */
+     #define _(str) gettext(str)
 
-   The `endfile()' function adds the current file's numbers to the
-running totals of lines, words, and characters.(1)  It then prints out
-those numbers for the file that was just read. It relies on
-`beginfile()' to reset the numbers for the following data file:
+     /* In the program text: */
+     printf("%s", _("Don't Panic!\n"));
 
-     function endfile(file)
-     {
-         tlines += lines
-         twords += words
-         tchars += chars
-         if (do_lines)
-             printf "\t%d", lines
-         if (do_words)
-             printf "\t%d", words
-         if (do_chars)
-             printf "\t%d", chars
-         printf "\t%s\n", fname
-     }
+This reduces the typing overhead to just three extra characters per
+string and is considerably easier to read as well.
 
-   There is one rule that is executed for each line. It adds the length
-of the record, plus one, to `chars'.(2) Adding one plus the record
-length is needed because the newline character separating records (the
-value of `RS') is not part of the record itself, and thus not included
-in its length.  Next, `lines' is incremented for each line read, and
-`words' is incremented by the value of `NF', which is the number of
-"words" on this line:
+   There are locale "categories" for different types of locale-related
+information.  The defined locale categories that `gettext' knows about
+are:
 
-     # do per line
-     {
-         chars += length($0) + 1    # get newline
-         lines++
-         words += NF
-     }
+`LC_MESSAGES'
+     Text messages.  This is the default category for `gettext'
+     operations, but it is possible to supply a different one
+     explicitly, if necessary.  (It is almost never necessary to supply
+     a different category.)
 
-   Finally, the `END' rule simply prints the totals for all the files:
+`LC_COLLATE'
+     Text-collation information; i.e., how different characters and/or
+     groups of characters sort in a given language.
 
-     END {
-         if (print_total) {
-             if (do_lines)
-                 printf "\t%d", tlines
-             if (do_words)
-                 printf "\t%d", twords
-             if (do_chars)
-                 printf "\t%d", tchars
-             print "\ttotal"
-         }
-     }
+`LC_CTYPE'
+     Character-type information (alphabetic, digit, upper- or
+     lowercase, and so on).  This information is accessed via the POSIX
+     character classes in regular expressions, such as `/[[:alnum:]]/'
+     (*note Regexp Operators::).
 
-   ---------- Footnotes ----------
+`LC_MONETARY'
+     Monetary information, such as the currency symbol, and whether the
+     symbol goes before or after a number.
 
-   (1) `wc' can't just use the value of `FNR' in `endfile()'. If you
-examine the code in *note Filetrans Function::, you will see that `FNR'
-has already been reset by the time `endfile()' is called.
+`LC_NUMERIC'
+     Numeric information, such as which characters to use for the
+     decimal point and the thousands separator.(2)
+
+`LC_RESPONSE'
+     Response information, such as how "yes" and "no" appear in the
+     local language, and possibly other information as well.
+
+`LC_TIME'
+     Time- and date-related information, such as 12- or 24-hour clock,
+     month printed before or after the day in a date, local month
+     abbreviations, and so on.
 
-   (2) Since `gawk' understands multibyte locales, this code counts
-characters, not bytes.
+`LC_ALL'
+     All of the above.  (Not too useful in the context of `gettext'.)
 
-
-File: gawk.info,  Node: Miscellaneous Programs,  Prev: Clones,  Up: Sample Programs
+   ---------- Footnotes ----------
 
-13.3 A Grab Bag of `awk' Programs
-=================================
+   (1) For some operating systems, the `gawk' port doesn't support GNU
+`gettext'.  Therefore, these features are not available if you are
+using one of those operating systems. Sorry.
 
-This minor node is a large "grab bag" of miscellaneous programs.  We
-hope you find them both interesting and enjoyable.
+   (2) Americans use a comma every three decimal places and a period
+for the decimal point, while many Europeans do exactly the opposite:
+1,234.56 versus 1.234,56.
 
-* Menu:
+
+File: gawk.info,  Node: Programmer i18n,  Next: Translator i18n,  Prev: Explaining gettext,  Up: Internationalization
 
-* Dupword Program::             Finding duplicated words in a document.
-* Alarm Program::               An alarm clock.
-* Translate Program::           A program similar to the `tr' utility.
-* Labels Program::              Printing mailing labels.
-* Word Sorting::                A program to produce a word usage count.
-* History Sorting::             Eliminating duplicate entries from a history
-                                file.
-* Extract Program::             Pulling out programs from Texinfo source
-                                files.
-* Simple Sed::                  A Simple Stream Editor.
-* Igawk Program::               A wrapper for `awk' that includes
-                                files.
-* Anagram Program::             Finding anagrams from a dictionary.
-* Signature Program::           People do amazing things with too much time on
-                                their hands.
+12.3 Internationalizing `awk' Programs
+======================================
 
-
-File: gawk.info,  Node: Dupword Program,  Next: Alarm Program,  Up: Miscellaneous Programs
+`gawk' provides the following variables and functions for
+internationalization:
 
-13.3.1 Finding Duplicated Words in a Document
----------------------------------------------
+`TEXTDOMAIN'
+     This variable indicates the application's text domain.  For
+     compatibility with GNU `gettext', the default value is
+     `"messages"'.
 
-A common error when writing large amounts of prose is to accidentally
-duplicate words.  Typically you will see this in text as something like
-"the the program does the following..."  When the text is online, often
-the duplicated words occur at the end of one line and the beginning of
-another, making them very difficult to spot.
+`_"your message here"'
+     String constants marked with a leading underscore are candidates
+     for translation at runtime.  String constants without a leading
+     underscore are not translated.
 
-   This program, `dupword.awk', scans through a file one line at a time
-and looks for adjacent occurrences of the same word.  It also saves the
-last word on a line (in the variable `prev') for comparison with the
-first word on the next line.
+`dcgettext(STRING [, DOMAIN [, CATEGORY]])'
+     Return the translation of STRING in text domain DOMAIN for locale
+     category CATEGORY.  The default value for DOMAIN is the current
+     value of `TEXTDOMAIN'.  The default value for CATEGORY is
+     `"LC_MESSAGES"'.
 
-   The first two statements make sure that the line is all lowercase,
-so that, for example, "The" and "the" compare equal to each other.  The
-next statement replaces nonalphanumeric and nonwhitespace characters
-with spaces, so that punctuation does not affect the comparison either.
-The characters are replaced with spaces so that formatting controls
-don't create nonsense words (e.g., the Texinfo `@code{NF}' becomes
-`codeNF' if punctuation is simply deleted).  The record is then resplit
-into fields, yielding just the actual words on the line, and ensuring
-that there are no empty fields.
+     If you supply a value for CATEGORY, it must be a string equal to
+     one of the known locale categories described in *note Explaining
+     gettext::.  You must also supply a text domain.  Use `TEXTDOMAIN'
+     if you want to use the current domain.
 
-   If there are no fields left after removing all the punctuation, the
-current record is skipped.  Otherwise, the program loops through each
-word, comparing it to the previous one:
+          CAUTION: The order of arguments to the `awk' version of the
+          `dcgettext()' function is purposely different from the order
+          for the C version.  The `awk' version's order was chosen to
+          be simple and to allow for reasonable `awk'-style default
+          arguments.
 
-     # dupword.awk --- find duplicate words in text
-     {
-         $0 = tolower($0)
-         gsub(/[^[:alnum:][:blank:]]/, " ");
-         $0 = $0         # re-split
-         if (NF == 0)
-             next
-         if ($1 == prev)
-             printf("%s:%d: duplicate %s\n",
-                 FILENAME, FNR, $1)
-         for (i = 2; i <= NF; i++)
-             if ($i == $(i-1))
-                 printf("%s:%d: duplicate %s\n",
-                     FILENAME, FNR, $i)
-         prev = $NF
-     }
+`dcngettext(STRING1, STRING2, NUMBER [, DOMAIN [, CATEGORY]])'
+     Return the plural form used for NUMBER of the translation of
+     STRING1 and STRING2 in text domain DOMAIN for locale category
+     CATEGORY. STRING1 is the English singular variant of a message,
+     and STRING2 the English plural variant of the same message.  The
+     default value for DOMAIN is the current value of `TEXTDOMAIN'.
+     The default value for CATEGORY is `"LC_MESSAGES"'.
 
-
-File: gawk.info,  Node: Alarm Program,  Next: Translate Program,  Prev: Dupword Program,  Up: Miscellaneous Programs
+     The same remarks about argument order as for the `dcgettext()'
+     function apply.
 
-13.3.2 An Alarm Clock Program
------------------------------
+`bindtextdomain(DIRECTORY [, DOMAIN])'
+     Change the directory in which `gettext' looks for `.mo' files, in
+     case they will not or cannot be placed in the standard locations
+     (e.g., during testing).  Return the directory in which DOMAIN is
+     "bound."
 
-     Nothing cures insomnia like a ringing alarm clock.
-     Arnold Robbins
+     The default DOMAIN is the value of `TEXTDOMAIN'.  If DIRECTORY is
+     the null string (`""'), then `bindtextdomain()' returns the
+     current binding for the given DOMAIN.
 
-   The following program is a simple "alarm clock" program.  You give
-it a time of day and an optional message.  At the specified time, it
-prints the message on the standard output. In addition, you can give it
-the number of times to repeat the message as well as a delay between
-repetitions.
+   To use these facilities in your `awk' program, follow the steps
+outlined in *note Explaining gettext::, like so:
 
-   This program uses the `getlocaltime()' function from *note
-Getlocaltime Function::.
+  1. Set the variable `TEXTDOMAIN' to the text domain of your program.
+     This is best done in a `BEGIN' rule (*note BEGIN/END::), or it can
+     also be done via the `-v' command-line option (*note Options::):
 
-   All the work is done in the `BEGIN' rule.  The first part is argument
-checking and setting of defaults: the delay, the count, and the message
-to print.  If the user supplied a message without the ASCII BEL
-character (known as the "alert" character, `"\a"'), then it is added to
-the message.  (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
-to itself in case the user is not looking at the computer.)  Just for a
-change, this program uses a `switch' statement (*note Switch
-Statement::), but the processing could be done with a series of
-`if'-`else' statements instead.  Here is the program:
+          BEGIN {
+              TEXTDOMAIN = "guide"
+              ...
+          }
 
-     # alarm.awk --- set an alarm
-     #
-     # Requires getlocaltime() library function
-     # usage: alarm time [ "message" [ count [ delay ] ] ]
+  2. Mark all translatable strings with a leading underscore (`_')
+     character.  It _must_ be adjacent to the opening quote of the
+     string.  For example:
 
-     BEGIN    \
-     {
-         # Initial argument sanity checking
-         usage1 = "usage: alarm time ['message' [count [delay]]]"
-         usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
+          print _"hello, world"
+          x = _"you goofed"
+          printf(_"Number of users is %d\n", nusers)
 
-         if (ARGC < 2) {
-             print usage1 > "/dev/stderr"
-             print usage2 > "/dev/stderr"
-             exit 1
-         }
-         switch (ARGC) {
-         case 5:
-             delay = ARGV[4] + 0
-             # fall through
-         case 4:
-             count = ARGV[3] + 0
-             # fall through
-         case 3:
-             message = ARGV[2]
-             break
-         default:
-             if (ARGV[1] !~ /[[:digit:]]?[[:digit:]]:[[:digit:]]{2}/) {
-                 print usage1 > "/dev/stderr"
-                 print usage2 > "/dev/stderr"
-                 exit 1
-             }
-             break
-         }
+  3. If you are creating strings dynamically, you can still translate
+     them, using the `dcgettext()' built-in function:
 
-         # set defaults for once we reach the desired time
-         if (delay == 0)
-             delay = 180    # 3 minutes
-         if (count == 0)
-             count = 5
-         if (message == "")
-             message = sprintf("\aIt is now %s!\a", ARGV[1])
-         else if (index(message, "\a") == 0)
-             message = "\a" message "\a"
+          message = nusers " users logged in"
+          message = dcgettext(message, "adminprog")
+          print message
 
-   The next minor node of code turns the alarm time into hours and
-minutes, converts it (if necessary) to a 24-hour clock, and then turns
-that time into a count of the seconds since midnight.  Next it turns
-the current time into a count of seconds since midnight.  The
-difference between the two is how long to wait before setting off the
-alarm:
+     Here, the call to `dcgettext()' supplies a different text domain
+     (`"adminprog"') in which to find the message, but it uses the
+     default `"LC_MESSAGES"' category.
 
-         # split up alarm time
-         split(ARGV[1], atime, ":")
-         hour = atime[1] + 0    # force numeric
-         minute = atime[2] + 0  # force numeric
+  4. During development, you might want to put the `.mo' file in a
+     private directory for testing.  This is done with the
+     `bindtextdomain()' built-in function:
 
-         # get current broken down time
-         getlocaltime(now)
+          BEGIN {
+             TEXTDOMAIN = "guide"   # our text domain
+             if (Testing) {
+                 # where to find our files
+                 bindtextdomain("testdir")
+                 # joe is in charge of adminprog
+                 bindtextdomain("../joe/testdir", "adminprog")
+             }
+             ...
+          }
 
-         # if time given is 12-hour hours and it's after that
-         # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
-         # then add 12 to real hour
-         if (hour < 12 && now["hour"] > hour)
-             hour += 12
 
-         # set target time in seconds since midnight
-         target = (hour * 60 * 60) + (minute * 60)
+   *Note I18N Example::, for an example program showing the steps to
+create and use translations from `awk'.
 
-         # get current time in seconds since midnight
-         current = (now["hour"] * 60 * 60) + \
-                    (now["minute"] * 60) + now["second"]
+
+File: gawk.info,  Node: Translator i18n,  Next: I18N Example,  Prev: Programmer i18n,  Up: Internationalization
 
-         # how long to sleep for
-         naptime = target - current
-         if (naptime <= 0) {
-             print "time is in the past!" > "/dev/stderr"
-             exit 1
-         }
+12.4 Translating `awk' Programs
+===============================
 
-   Finally, the program uses the `system()' function (*note I/O
-Functions::) to call the `sleep' utility.  The `sleep' utility simply
-pauses for the given number of seconds.  If the exit status is not zero,
-the program assumes that `sleep' was interrupted and exits. If `sleep'
-exited with an OK status (zero), then the program prints the message in
-a loop, again using `sleep' to delay for however many seconds are
-necessary:
+Once a program's translatable strings have been marked, they must be
+extracted to create the initial `.po' file.  As part of translation, it
+is often helpful to rearrange the order in which arguments to `printf'
+are output.
 
-         # zzzzzz..... go away if interrupted
-         if (system(sprintf("sleep %d", naptime)) != 0)
-             exit 1
+   `gawk''s `--gen-pot' command-line option extracts the messages and
+is discussed next.  After that, `printf''s ability to rearrange the
+order for `printf' arguments at runtime is covered.
 
-         # time to notify!
-         command = sprintf("sleep %d", delay)
-         for (i = 1; i <= count; i++) {
-             print message
-             # if sleep command interrupted, go away
-             if (system(command) != 0)
-                 break
-         }
+* Menu:
 
-         exit 0
-     }
+* String Extraction::           Extracting marked strings.
+* Printf Ordering::             Rearranging `printf' arguments.
+* I18N Portability::            `awk'-level portability issues.
 
 
-File: gawk.info,  Node: Translate Program,  Next: Labels Program,  Prev: Alarm Program,  Up: Miscellaneous Programs
+File: gawk.info,  Node: String Extraction,  Next: Printf Ordering,  Up: Translator i18n
 
-13.3.3 Transliterating Characters
----------------------------------
+12.4.1 Extracting Marked Strings
+--------------------------------
 
-The system `tr' utility transliterates characters.  For example, it is
-often used to map uppercase letters into lowercase for further
-processing:
+Once your `awk' program is working, and all the strings have been
+marked and you've set (and perhaps bound) the text domain, it is time
+to produce translations.  First, use the `--gen-pot' command-line
+option to create the initial `.pot' file:
 
-     GENERATE DATA | tr 'A-Z' 'a-z' | PROCESS DATA ...
+     $ gawk --gen-pot -f guide.awk > guide.pot
 
-   `tr' requires two lists of characters.(1)  When processing the
-input, the first character in the first list is replaced with the first
-character in the second list, the second character in the first list is
-replaced with the second character in the second list, and so on.  If
-there are more characters in the "from" list than in the "to" list, the
-last character of the "to" list is used for the remaining characters in
-the "from" list.
+   When run with `--gen-pot', `gawk' does not execute your program.
+Instead, it parses it as usual and prints all marked strings to
+standard output in the format of a GNU `gettext' Portable Object file.
+Also included in the output are any constant strings that appear as the
+first argument to `dcgettext()' or as the first and second arguments to
+`dcngettext()'.(1) *Note I18N Example::, for the full list of steps to
+go through to create and test translations for `guide'.
 
-   Some time ago, a user proposed that a transliteration function should
-be added to `gawk'.  The following program was written to prove that
-character transliteration could be done with a user-level function.
-This program is not as complete as the system `tr' utility but it does
-most of the job.
+   ---------- Footnotes ----------
 
-   The `translate' program demonstrates one of the few weaknesses of
-standard `awk': dealing with individual characters is very painful,
-requiring repeated use of the `substr()', `index()', and `gsub()'
-built-in functions (*note String Functions::).(2) There are two
-functions.  The first, `stranslate()', takes three arguments:
+   (1) The `xgettext' utility that comes with GNU `gettext' can handle
+`.awk' files.
 
-`from'
-     A list of characters from which to translate.
+
+File: gawk.info,  Node: Printf Ordering,  Next: I18N Portability,  Prev: String Extraction,  Up: Translator i18n
 
-`to'
-     A list of characters to which to translate.
+12.4.2 Rearranging `printf' Arguments
+-------------------------------------
 
-`target'
-     The string on which to do the translation.
+Format strings for `printf' and `sprintf()' (*note Printf::) present a
+special problem for translation.  Consider the following:(1)
 
-   Associative arrays make the translation part fairly easy. `t_ar'
-holds the "to" characters, indexed by the "from" characters.  Then a
-simple loop goes through `from', one character at a time.  For each
-character in `from', if the character appears in `target', it is
-replaced with the corresponding `to' character.
+     printf(_"String `%s' has %d characters\n",
+               string, length(string))
 
-   The `translate()' function simply calls `stranslate()' using `$0' as
-the target.  The main program sets two global variables, `FROM' and
-`TO', from the command line, and then changes `ARGV' so that `awk'
-reads from the standard input.
+   A possible German translation for this might be:
 
-   Finally, the processing rule simply calls `translate()' for each
-record:
+     "%d Zeichen lang ist die Zeichenkette `%s'\n"
 
-     # translate.awk --- do tr-like stuff
-     # Bugs: does not handle things like: tr A-Z a-z, it has
-     # to be spelled out. However, if `to' is shorter than `from',
-     # the last character in `to' is used for the rest of `from'.
+   The problem should be obvious: the order of the format
+specifications is different from the original!  Even though `gettext()'
+can return the translated string at runtime, it cannot change the
+argument order in the call to `printf'.
 
-     function stranslate(from, to, target,     lf, lt, ltarget, t_ar, i, c,
-                                                                    result)
-     {
-         lf = length(from)
-         lt = length(to)
-         ltarget = length(target)
-         for (i = 1; i <= lt; i++)
-             t_ar[substr(from, i, 1)] = substr(to, i, 1)
-         if (lt < lf)
-             for (; i <= lf; i++)
-                 t_ar[substr(from, i, 1)] = substr(to, lt, 1)
-         for (i = 1; i <= ltarget; i++) {
-             c = substr(target, i, 1)
-             if (c in t_ar)
-                 c = t_ar[c]
-             result = result c
-         }
-         return result
-     }
+   To solve this problem, `printf' format specifiers may have an
+additional optional element, which we call a "positional specifier".
+For example:
 
-     function translate(from, to)
-     {
-         return $0 = stranslate(from, to, $0)
-     }
+     "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
 
-     # main program
-     BEGIN {
-         if (ARGC < 3) {
-             print "usage: translate from to" > "/dev/stderr"
-             exit
-         }
-         FROM = ARGV[1]
-         TO = ARGV[2]
-         ARGC = 2
-         ARGV[1] = "-"
-     }
+   Here, the positional specifier consists of an integer count, which
+indicates which argument to use, and a `$'. Counts are one-based, and
+the format string itself is _not_ included.  Thus, in the following
+example, `string' is the first argument and `length(string)' is the
+second:
 
-     {
-         translate(FROM, TO)
-         print
-     }
+     $ gawk 'BEGIN {
+     >     string = "Dont Panic"
+     >     printf _"%2$d characters live in \"%1$s\"\n",
+     >                         string, length(string)
+     > }'
+     -| 10 characters live in "Dont Panic"
 
-   While it is possible to do character transliteration in a user-level
-function, it is not necessarily efficient, and we (the `gawk' authors)
-started to consider adding a built-in function.  However, shortly after
-writing this program, we learned that the System V Release 4 `awk' had
-added the `toupper()' and `tolower()' functions (*note String
-Functions::).  These functions handle the vast majority of the cases
-where character transliteration is necessary, and so we chose to simply
-add those functions to `gawk' as well and then leave well enough alone.
+   If present, positional specifiers come first in the format
+specification, before the flags, the field width, and/or the precision.
 
-   An obvious improvement to this program would be to set up the `t_ar'
-array only once, in a `BEGIN' rule. However, this assumes that the
-"from" and "to" lists will never change throughout the lifetime of the
-program.
+   Positional specifiers can be used with the dynamic field width and
+precision capability:
 
-   ---------- Footnotes ----------
+     $ gawk 'BEGIN {
+     >    printf("%*.*s\n", 10, 20, "hello")
+     >    printf("%3$*2$.*1$s\n", 20, 10, "hello")
+     > }'
+     -|      hello
+     -|      hello
 
-   (1) On some older systems, `tr' may require that the lists be
-written as range expressions enclosed in square brackets (`[a-z]') and
-quoted, to prevent the shell from attempting a file name expansion.
-This is not a feature.
+     NOTE: When using `*' with a positional specifier, the `*' comes
+     first, then the integer position, and then the `$'.  This is
+     somewhat counterintuitive.
 
-   (2) This program was written before `gawk' acquired the ability to
-split each character in a string into separate array elements.
+   `gawk' does not allow you to mix regular format specifiers and those
+with positional specifiers in the same string:
 
-
-File: gawk.info,  Node: Labels Program,  Next: Word Sorting,  Prev: Translate Program,  Up: Miscellaneous Programs
+     $ gawk 'BEGIN { printf _"%d %3$s\n", 1, 2, "hi" }'
+     error--> gawk: cmd. line:1: fatal: must use `count$' on all formats or none
 
-13.3.4 Printing Mailing Labels
-------------------------------
+     NOTE: There are some pathological cases that `gawk' may fail to
+     diagnose.  In such cases, the output may not be what you expect.
+     It's still a bad idea to try mixing them, even if `gawk' doesn't
+     detect it.
 
-Here is a "real world"(1) program.  This script reads lists of names and
-addresses and generates mailing labels.  Each page of labels has 20
-labels on it, two across and 10 down.  The addresses are guaranteed to
-be no more than five lines of data.  Each address is separated from the
-next by a blank line.
+   Although positional specifiers can be used directly in `awk'
+programs, their primary purpose is to help in producing correct
+translations of format strings into languages different from the one in
+which the program is first written.
 
-   The basic idea is to read 20 labels worth of data.  Each line of
-each label is stored in the `line' array.  The single rule takes care
-of filling the `line' array and printing the page when 20 labels have
-been read.
+   ---------- Footnotes ----------
 
-   The `BEGIN' rule simply sets `RS' to the empty string, so that `awk'
-splits records at blank lines (*note Records::).  It sets `MAXLINES' to
-100, since 100 is the maximum number of lines on the page (20 * 5 =
-100).
+   (1) This example is borrowed from the GNU `gettext' manual.
 
-   Most of the work is done in the `printpage()' function.  The label
-lines are stored sequentially in the `line' array.  But they have to
-print horizontally; `line[1]' next to `line[6]', `line[2]' next to
-`line[7]', and so on.  Two loops are used to accomplish this.  The
-outer loop, controlled by `i', steps through every 10 lines of data;
-this is each row of labels.  The inner loop, controlled by `j', goes
-through the lines within the row.  As `j' goes from 0 to 4, `i+j' is
-the `j'-th line in the row, and `i+j+5' is the entry next to it.  The
-output ends up looking something like this:
+
+File: gawk.info,  Node: I18N Portability,  Prev: Printf Ordering,  Up: Translator i18n
 
-     line 1          line 6
-     line 2          line 7
-     line 3          line 8
-     line 4          line 9
-     line 5          line 10
-     ...
+12.4.3 `awk' Portability Issues
+-------------------------------
 
-The `printf' format string `%-41s' left-aligns the data and prints it
-within a fixed-width field.
+`gawk''s internationalization features were purposely chosen to have as
+little impact as possible on the portability of `awk' programs that use
+them to other versions of `awk'.  Consider this program:
 
-   As a final note, an extra blank line is printed at lines 21 and 61,
-to keep the output lined up on the labels.  This is dependent on the
-particular brand of labels in use when the program was written.  You
-will also note that there are two blank lines at the top and two blank
-lines at the bottom.
+     BEGIN {
+         TEXTDOMAIN = "guide"
+         if (Test_Guide)   # set with -v
+             bindtextdomain("/test/guide/messages")
+         print _"don't panic!"
+     }
+
+As written, it won't work on other versions of `awk'.  However, it is
+actually almost portable, requiring very little change:
+
+   * Assignments to `TEXTDOMAIN' won't have any effect, since
+     `TEXTDOMAIN' is not special in other `awk' implementations.
+
+   * Non-GNU versions of `awk' treat marked strings as the
+     concatenation of a variable named `_' with the string following
+     it.(1) Typically, the variable `_' has the null string (`""') as
+     its value, leaving the original string constant as the result.
 
-   The `END' rule arranges to flush the final page of labels; there may
-not have been an even multiple of 20 labels in the data:
+   * By defining "dummy" functions to replace `dcgettext()',
+     `dcngettext()' and `bindtextdomain()', the `awk' program can be
+     made to run, but all the messages are output in the original
+     language.  For example:
 
-     # labels.awk --- print mailing labels
+          function bindtextdomain(dir, domain)
+          {
+              return dir
+          }
 
-     # Each label is 5 lines of data that may have blank lines.
-     # The label sheets have 2 blank lines at the top and 2 at
-     # the bottom.
+          function dcgettext(string, domain, category)
+          {
+              return string
+          }
 
-     BEGIN    { RS = "" ; MAXLINES = 100 }
+          function dcngettext(string1, string2, number, domain, category)
+          {
+              return (number == 1 ? string1 : string2)
+          }
 
-     function printpage(    i, j)
-     {
-         if (Nlines <= 0)
-             return
+   * The use of positional specifications in `printf' or `sprintf()' is
+     _not_ portable.  To support `gettext()' at the C level, many
+     systems' C versions of `sprintf()' do support positional
+     specifiers.  But it works only if enough arguments are supplied in
+     the function call.  Many versions of `awk' pass `printf' formats
+     and arguments unchanged to the underlying C library version of
+     `sprintf()', but only one format and argument at a time.  What
+     happens if a positional specification is used is anybody's guess.
+     However, since the positional specifications are primarily for use
+     in _translated_ format strings, and since non-GNU `awk's never
+     retrieve the translated string, this should not be a problem in
+     practice.
 
-         printf "\n\n"        # header
+   ---------- Footnotes ----------
 
-         for (i = 1; i <= Nlines; i += 10) {
-             if (i == 21 || i == 61)
-                 print ""
-             for (j = 0; j < 5; j++) {
-                 if (i + j > MAXLINES)
-                     break
-                 printf "   %-41s %s\n", line[i+j], line[i+j+5]
-             }
-             print ""
-         }
+   (1) This is good fodder for an "Obfuscated `awk'" contest.
 
-         printf "\n\n"        # footer
+
+File: gawk.info,  Node: I18N Example,  Next: Gawk I18N,  Prev: Translator i18n,  Up: Internationalization
 
-         delete line
-     }
+12.5 A Simple Internationalization Example
+==========================================
 
-     # main rule
-     {
-         if (Count >= 20) {
-             printpage()
-             Count = 0
-             Nlines = 0
-         }
-         n = split($0, a, "\n")
-         for (i = 1; i <= n; i++)
-             line[++Nlines] = a[i]
-         for (; i <= 5; i++)
-             line[++Nlines] = ""
-         Count++
-     }
+Now let's look at a step-by-step example of how to internationalize and
+localize a simple `awk' program, using `guide.awk' as our original
+source:
 
-     END    \
-     {
-         printpage()
+     BEGIN {
+         TEXTDOMAIN = "guide"
+         bindtextdomain(".")  # for testing
+         print _"Don't Panic"
+         print _"The Answer Is", 42
+         print "Pardon me, Zaphod who?"
      }
 
-   ---------- Footnotes ----------
+Run `gawk --gen-pot' to create the `.pot' file:
 
-   (1) "Real world" is defined as "a program actually used to get
-something done."
+     $ gawk --gen-pot -f guide.awk > guide.pot
 
-
-File: gawk.info,  Node: Word Sorting,  Next: History Sorting,  Prev: Labels Program,  Up: Miscellaneous Programs
+This produces:
 
-13.3.5 Generating Word-Usage Counts
------------------------------------
+     #: guide.awk:4
+     msgid "Don't Panic"
+     msgstr ""
 
-When working with large amounts of text, it can be interesting to know
-how often different words appear.  For example, an author may overuse
-certain words, in which case she might wish to find synonyms to
-substitute for words that appear too often. This node develops a
-program for counting words and presenting the frequency information in
-a useful format.
+     #: guide.awk:5
+     msgid "The Answer Is"
+     msgstr ""
 
-   At first glance, a program like this would seem to do the job:
+   This original portable object template file is saved and reused for
+each language into which the application is translated.  The `msgid' is
+the original string and the `msgstr' is the translation.
 
-     # Print list of word frequencies
+     NOTE: Strings not marked with a leading underscore do not appear
+     in the `guide.pot' file.
 
-     {
-         for (i = 1; i <= NF; i++)
-             freq[$i]++
-     }
+   Next, the messages must be translated.  Here is a translation to a
+hypothetical dialect of English, called "Mellow":(1)
 
-     END {
-         for (word in freq)
-             printf "%s\t%d\n", word, freq[word]
-     }
+     $ cp guide.pot guide-mellow.po
+     ADD TRANSLATIONS TO guide-mellow.po ...
 
-   The program relies on `awk''s default field splitting mechanism to
-break each line up into "words," and uses an associative array named
-`freq', indexed by each word, to count the number of times the word
-occurs. In the `END' rule, it prints the counts.
+Following are the translations:
 
-   This program has several problems that prevent it from being useful
-on real text files:
+     #: guide.awk:4
+     msgid "Don't Panic"
+     msgstr "Hey man, relax!"
 
-   * The `awk' language considers upper- and lowercase characters to be
-     distinct.  Therefore, "bartender" and "Bartender" are not treated
-     as the same word.  This is undesirable, since in normal text, words
-     are capitalized if they begin sentences, and a frequency analyzer
-     should not be sensitive to capitalization.
+     #: guide.awk:5
+     msgid "The Answer Is"
+     msgstr "Like, the scoop is"
 
-   * Words are detected using the `awk' convention that fields are
-     separated just by whitespace.  Other characters in the input
-     (except newlines) don't have any special meaning to `awk'.  This
-     means that punctuation characters count as part of words.
+   The next step is to make the directory to hold the binary message
+object file and then to create the `guide.mo' file.  The directory
+layout shown here is standard for GNU `gettext' on GNU/Linux systems.
+Other versions of `gettext' may use a different layout:
 
-   * The output does not come out in any useful order.  You're more
-     likely to be interested in which words occur most frequently or in
-     having an alphabetized table of how frequently each word occurs.
+     $ mkdir en_US en_US/LC_MESSAGES
 
-   The first problem can be solved by using `tolower()' to remove case
-distinctions.  The second problem can be solved by using `gsub()' to
-remove punctuation characters.  Finally, we solve the third problem by
-using the system `sort' utility to process the output of the `awk'
-script.  Here is the new version of the program:
+   The `msgfmt' utility does the conversion from human-readable `.po'
+file to machine-readable `.mo' file.  By default, `msgfmt' creates a
+file named `messages'.  This file must be renamed and placed in the
+proper directory so that `gawk' can find it:
 
-     # wordfreq.awk --- print list of word frequencies
+     $ msgfmt guide-mellow.po
+     $ mv messages en_US/LC_MESSAGES/guide.mo
 
-     {
-         $0 = tolower($0)    # remove case distinctions
-         # remove punctuation
-         gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
-         for (i = 1; i <= NF; i++)
-             freq[$i]++
-     }
+   Finally, we run the program to test it:
 
-     END {
-         for (word in freq)
-             printf "%s\t%d\n", word, freq[word]
-     }
+     $ gawk -f guide.awk
+     -| Hey man, relax!
+     -| Like, the scoop is 42
+     -| Pardon me, Zaphod who?
 
-   Assuming we have saved this program in a file named `wordfreq.awk',
-and that the data is in `file1', the following pipeline:
+   If the three replacement functions for `dcgettext()', `dcngettext()'
+and `bindtextdomain()' (*note I18N Portability::) are in a file named
+`libintl.awk', then we can run `guide.awk' unchanged as follows:
 
-     awk -f wordfreq.awk file1 | sort -k 2nr
+     $ gawk --posix -f guide.awk -f libintl.awk
+     -| Don't Panic
+     -| The Answer Is 42
+     -| Pardon me, Zaphod who?
 
-produces a table of the words appearing in `file1' in order of
-decreasing frequency.
+   ---------- Footnotes ----------
 
-   The `awk' program suitably massages the data and produces a word
-frequency table, which is not ordered.  The `awk' script's output is
-then sorted by the `sort' utility and printed on the screen.
+   (1) Perhaps it would be better if it were called "Hippy." Ah, well.
 
-   The options given to `sort' specify a sort that uses the second
-field of each input line (skipping one field), that the sort keys
-should be treated as numeric quantities (otherwise `15' would come
-before `5'), and that the sorting should be done in descending
-(reverse) order.
+
+File: gawk.info,  Node: Gawk I18N,  Prev: I18N Example,  Up: Internationalization
 
-   The `sort' could even be done from within the program, by changing
-the `END' action to:
+12.6 `gawk' Can Speak Your Language
+===================================
 
-     END {
-         sort = "sort -k 2nr"
-         for (word in freq)
-             printf "%s\t%d\n", word, freq[word] | sort
-         close(sort)
-     }
+`gawk' itself has been internationalized using the GNU `gettext'
+package.  (GNU `gettext' is described in complete detail in *note (GNU
+`gettext' utilities)Top:: gettext, GNU gettext tools.)  As of this
+writing, the latest version of GNU `gettext' is version 0.18.1
+(ftp://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz).
 
-   This way of sorting must be used on systems that do not have true
-pipes at the command-line (or batch-file) level.  See the general
-operating system documentation for more information on how to use the
-`sort' program.
+   If a translation of `gawk''s messages exists, then `gawk' produces
+usage messages, warnings, and fatal errors in the local language.
 
 
-File: gawk.info,  Node: History Sorting,  Next: Extract Program,  Prev: Word Sorting,  Up: Miscellaneous Programs
+File: gawk.info,  Node: Advanced Features,  Next: Debugger,  Prev: Internationalization,  Up: Top
 
-13.3.6 Removing Duplicates from Unsorted Text
----------------------------------------------
-
-The `uniq' program (*note Uniq Program::), removes duplicate lines from
-_sorted_ data.
+13 Advanced Features of `gawk'
+******************************
 
-   Suppose, however, you need to remove duplicate lines from a data
-file but that you want to preserve the order the lines are in.  A good
-example of this might be a shell history file.  The history file keeps
-a copy of all the commands you have entered, and it is not unusual to
-repeat a command several times in a row.  Occasionally you might want
-to compact the history by removing duplicate entries.  Yet it is
-desirable to maintain the order of the original commands.
+     Write documentation as if whoever reads it is a violent psychopath
+     who knows where you live.
+     Steve English, as quoted by Peter Langston
 
-   This simple program does the job.  It uses two arrays.  The `data'
-array is indexed by the text of each line.  For each line, `data[$0]'
-is incremented.  If a particular line has not been seen before, then
-`data[$0]' is zero.  In this case, the text of the line is stored in
-`lines[count]'.  Each element of `lines' is a unique command, and the
-indices of `lines' indicate the order in which those lines are
-encountered.  The `END' rule simply prints out the lines, in order:
+   This major node discusses advanced features in `gawk'.  It's a bit
+of a "grab bag" of items that are otherwise unrelated to each other.
+First, a command-line option allows `gawk' to recognize nondecimal
+numbers in input data, not just in `awk' programs.  Then, `gawk''s
+special features for sorting arrays are presented.  Next, two-way I/O,
+discussed briefly in earlier parts of this Info file, is described in
+full detail, along with the basics of TCP/IP networking.  Finally,
+`gawk' can "profile" an `awk' program, making it possible to tune it
+for performance.
 
-     # histsort.awk --- compact a shell history file
-     # Thanks to Byron Rakitzis for the general idea
+   *note Dynamic Extensions::, discusses the ability to dynamically add
+new built-in functions to `gawk'.  As this feature is still immature
+and likely to change, its description is relegated to an appendix.
 
-     {
-         if (data[$0]++ == 0)
-             lines[++count] = $0
-     }
+* Menu:
 
-     END {
-         for (i = 1; i <= count; i++)
-             print lines[i]
-     }
+* Nondecimal Data::             Allowing nondecimal input data.
+* Array Sorting::               Facilities for controlling array traversal and
+                                sorting arrays.
+* Two-way I/O::                 Two-way communications with another process.
+* TCP/IP Networking::           Using `gawk' for network programming.
+* Profiling::                   Profiling your `awk' programs.
 
-   This program also provides a foundation for generating other useful
-information.  For example, using the following `print' statement in the
-`END' rule indicates how often a particular command is used:
+
+File: gawk.info,  Node: Nondecimal Data,  Next: Array Sorting,  Up: Advanced Features
 
-     print data[lines[i]], lines[i]
+13.1 Allowing Nondecimal Input Data
+===================================
 
-   This works because `data[$0]' is incremented each time a line is
-seen.
+If you run `gawk' with the `--non-decimal-data' option, you can have
+nondecimal constants in your input data:
 
-
-File: gawk.info,  Node: Extract Program,  Next: Simple Sed,  Prev: History Sorting,  Up: Miscellaneous Programs
+     $ echo 0123 123 0x123 |
+     > gawk --non-decimal-data '{ printf "%d, %d, %d\n",
+     >                                         $1, $2, $3 }'
+     -| 83, 123, 291
 
-13.3.7 Extracting Programs from Texinfo Source Files
-----------------------------------------------------
+   For this feature to work, write your program so that `gawk' treats
+your data as numeric:
 
-The nodes *note Library Functions::, and *note Sample Programs::, are
-the top level nodes for a large number of `awk' programs.  If you want
-to experiment with these programs, it is tedious to have to type them
-in by hand.  Here we present a program that can extract parts of a
-Texinfo input file into separate files.
+     $ echo 0123 123 0x123 | gawk '{ print $1, $2, $3 }'
+     -| 0123 123 0x123
 
-This Info file is written in Texinfo (http://texinfo.org), the GNU
-project's document formatting language.  A single Texinfo source file
-can be used to produce both printed and online documentation.  The
-Texinfo language is described fully, starting with *note (Texinfo)Top::
-texinfo,Texinfo--The GNU Documentation Format.
+The `print' statement treats its expressions as strings.  Although the
+fields can act as numbers when necessary, they are still strings, so
+`print' does not try to treat them numerically.  You may need to add
+zero to a field to force it to be treated as a number.  For example:
 
-   For our purposes, it is enough to know three things about Texinfo
-input files:
+     $ echo 0123 123 0x123 | gawk --non-decimal-data '
+     > { print $1, $2, $3
+     >   print $1 + 0, $2 + 0, $3 + 0 }'
+     -| 0123 123 0x123
+     -| 83 123 291
 
-   * The "at" symbol (`@') is special in Texinfo, much as the backslash
-     (`\') is in C or `awk'.  Literal `@' symbols are represented in
-     Texinfo source files as `@@'.
+   Because it is common to have decimal data with leading zeros, and
+because using this facility could lead to surprising results, the
+default is to leave it disabled.  If you want it, you must explicitly
+request it.
 
-   * Comments start with either `@c' or `@comment'.  The
-     file-extraction program works by using special comments that start
-     at the beginning of a line.
+     CAUTION: _Use of this option is not recommended._ It can break old
+     programs very badly.  Instead, use the `strtonum()' function to
+     convert your data (*note Nondecimal-numbers::).  This makes your
+     programs easier to write and easier to read, and leads to less
+     surprising results.
 
-   * Lines containing `@group' and `@end group' commands bracket
-     example text that should not be split across a page boundary.
-     (Unfortunately, TeX isn't always smart enough to do things exactly
-     right, so we have to give it some help.)
+
+File: gawk.info,  Node: Array Sorting,  Next: Two-way I/O,  Prev: Nondecimal Data,  Up: Advanced Features
 
-   The following program, `extract.awk', reads through a Texinfo source
-file and does two things, based on the special comments.  Upon seeing
-`@c system ...', it runs a command, by extracting the command text from
-the control line and passing it on to the `system()' function (*note
-I/O Functions::).  Upon seeing `@c file FILENAME', each subsequent line
-is sent to the file FILENAME, until `@c endfile' is encountered.  The
-rules in `extract.awk' match either `@c' or `@comment' by letting the
-`omment' part be optional.  Lines containing `@group' and `@end group'
-are simply removed.  `extract.awk' uses the `join()' library function
-(*note Join Function::).
+13.2 Controlling Array Traversal and Array Sorting
+==================================================
 
-   The example programs in the online Texinfo source for `GAWK:
-Effective AWK Programming' (`gawk.texi') have all been bracketed inside
-`file' and `endfile' lines.  The `gawk' distribution uses a copy of
-`extract.awk' to extract the sample programs and install many of them
-in a standard directory where `gawk' can find them.  The Texinfo file
-looks something like this:
+`gawk' lets you control the order in which a `for (i in array)' loop
+traverses an array.
 
-     ...
-     This program has a @code{BEGIN} rule,
-     that prints a nice message:
+   In addition, two built-in functions, `asort()' and `asorti()', let
+you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
 
-     @example
-     @c file examples/messages.awk
-     BEGIN @{ print "Don't panic!" @}
-     @c end file
-     @end example
+* Menu:
 
-     It also prints some final advice:
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions::     How to use `asort()' and `asorti()'.
 
-     @example
-     @c file examples/messages.awk
-     END @{ print "Always avoid bored archeologists!" @}
-     @c end file
-     @end example
-     ...
+
+File: gawk.info,  Node: Controlling Array Traversal,  Next: Array Sorting Functions,  Up: Array Sorting
 
-   `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
-upper- and lowercase letters in the directives won't matter.
+13.2.1 Controlling Array Traversal
+----------------------------------
 
-   The first rule handles calling `system()', checking that a command is
-given (`NF' is at least three) and also checking that the command exits
-with a zero exit status, signifying OK:
+By default, the order in which a `for (i in array)' loop scans an array
+is not defined; it is generally based upon the internal implementation
+of arrays inside `awk'.
 
-     # extract.awk --- extract files and run programs
-     #                 from texinfo files
+   Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose.  `gawk' lets
+you do this.
 
-     BEGIN    { IGNORECASE = 1 }
+   *note Controlling Scanning::, describes how you can assign special,
+pre-defined values to `PROCINFO["sorted_in"]' in order to control the
+order in which `gawk' will traverse an array during a `for' loop.
 
-     /^@c(omment)?[ \t]+system/    \
+   In addition, the value of `PROCINFO["sorted_in"]' can be a function
+name.  This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function.  The comparison function should be defined with at least four
+arguments:
+
+     function comp_func(i1, v1, i2, v2)
      {
-         if (NF < 3) {
-             e = (FILENAME ":" FNR)
-             e = (e  ": badly formed `system' line")
-             print e > "/dev/stderr"
-             next
-         }
-         $1 = ""
-         $2 = ""
-         stat = system($0)
-         if (stat != 0) {
-             e = (FILENAME ":" FNR)
-             e = (e ": warning: system returned " stat)
-             print e > "/dev/stderr"
-         }
+         COMPARE ELEMENTS 1 AND 2 IN SOME FASHION
+         RETURN < 0; 0; OR > 0
      }
 
-The variable `e' is used so that the rule fits nicely on the screen.
+   Here, I1 and I2 are the indices, and V1 and V2 are the corresponding
+values of the two elements being compared.  Either V1 or V2, or both,
+can be arrays if the array being traversed contains subarrays as values.
+(*Note Arrays of Arrays::, for more information about subarrays.)  The
+three possible return values are interpreted as follows:
 
-   The second rule handles moving data into files.  It verifies that a
-file name is given in the directive.  If the file named is not the
-current file, then the current file is closed.  Keeping the current file
-open until a new file is encountered allows the use of the `>'
-redirection for printing the contents, keeping open file management
-simple.
+`comp_func(i1, v1, i2, v2) < 0'
+     Index I1 comes before index I2 during loop traversal.
 
-   The `for' loop does the work.  It reads lines using `getline' (*note
-Getline::).  For an unexpected end of file, it calls the
-`unexpected_eof()' function.  If the line is an "endfile" line, then it
-breaks out of the loop.  If the line is an `@group' or `@end group'
-line, then it ignores it and goes on to the next line.  Similarly,
-comments within examples are also ignored.
+`comp_func(i1, v1, i2, v2) == 0'
+     Indices I1 and I2 come together but the relative order with
+     respect to each other is undefined.
 
-   Most of the work is in the following few lines.  If the line has no
-`@' symbols, the program can print it directly.  Otherwise, each
-leading `@' must be stripped off.  To remove the `@' symbols, the line
-is split into separate elements of the array `a', using the `split()'
-function (*note String Functions::).  The `@' symbol is used as the
-separator character.  Each element of `a' that is empty indicates two
-successive `@' symbols in the original line.  For each two empty
-elements (`@@' in the original file), we have to add a single `@'
-symbol back in.(1)
+`comp_func(i1, v1, i2, v2) > 0'
+     Index I1 comes after index I2 during loop traversal.
 
-   When the processing of the array is finished, `join()' is called
-with the value of `SUBSEP', to rejoin the pieces back into a single
-line.  That line is then printed to the output file:
+   Our first comparison function can be used to scan an array in
+numerical order of the indices:
 
-     /^@c(omment)?[ \t]+file/    \
+     function cmp_num_idx(i1, v1, i2, v2)
      {
-         if (NF != 3) {
-             e = (FILENAME ":" FNR ": badly formed `file' line")
-             print e > "/dev/stderr"
-             next
-         }
-         if ($3 != curfile) {
-             if (curfile != "")
-                 close(curfile)
-             curfile = $3
-         }
+          # numerical index comparison, ascending order
+          return (i1 - i2)
+     }
 
-         for (;;) {
-             if ((getline line) <= 0)
-                 unexpected_eof()
-             if (line ~ /^@c(omment)?[ \t]+endfile/)
-                 break
-             else if (line ~ /^@(end[ \t]+)?group/)
-                 continue
-             else if (line ~ /^@c(omment+)?[ \t]+/)
-                 continue
-             if (index(line, "@") == 0) {
-                 print line > curfile
-                 continue
-             }
-             n = split(line, a, "@")
-             # if a[1] == "", means leading @,
-             # don't add one back in.
-             for (i = 2; i <= n; i++) {
-                 if (a[i] == "") { # was an @@
-                     a[i] = "@"
-                     if (a[i+1] == "")
-                         i++
-                 }
-             }
-             print join(a, 1, n, SUBSEP) > curfile
-         }
+   Our second function traverses an array based on the string order of
+the element values rather than by indices:
+
+     function cmp_str_val(i1, v1, i2, v2)
+     {
+         # string value comparison, ascending order
+         v1 = v1 ""
+         v2 = v2 ""
+         if (v1 < v2)
+             return -1
+         return (v1 != v2)
      }
 
-   An important thing to note is the use of the `>' redirection.
-Output done with `>' only opens the file once; it stays open and
-subsequent output is appended to the file (*note Redirection::).  This
-makes it easy to mix program text and explanatory prose for the same
-sample source file (as has been done here!) without any hassle.  The
-file is only closed when a new data file name is encountered or at the
-end of the input file.
-
-   Finally, the function `unexpected_eof()' prints an appropriate error
-message and then exits.  The `END' rule handles the final cleanup,
-closing the open file:
+   The third comparison function makes all numbers, and numeric strings
+without any leading or trailing spaces, come out first during loop
+traversal:
 
-     function unexpected_eof()
+     function cmp_num_str_val(i1, v1, i2, v2,   n1, n2)
      {
-         printf("%s:%d: unexpected EOF or error\n",
-             FILENAME, FNR) > "/dev/stderr"
-         exit 1
+          # numbers before string value comparison, ascending order
+          n1 = v1 + 0
+          n2 = v2 + 0
+          if (n1 == v1)
+              return (n2 == v2) ? (n1 - n2) : -1
+          else if (n2 == v2)
+              return 1
+          return (v1 < v2) ? -1 : (v1 != v2)
      }
 
-     END {
-         if (curfile)
-             close(curfile)
+   Here is a main program to demonstrate how `gawk' behaves using each
+of the previous functions:
+
+     BEGIN {
+         data["one"] = 10
+         data["two"] = 20
+         data[10] = "one"
+         data[100] = 100
+         data[20] = "two"
+
+         f[1] = "cmp_num_idx"
+         f[2] = "cmp_str_val"
+         f[3] = "cmp_num_str_val"
+         for (i = 1; i <= 3; i++) {
+             printf("Sort function: %s\n", f[i])
+             PROCINFO["sorted_in"] = f[i]
+             for (j in data)
+                 printf("\tdata[%s] = %s\n", j, data[j])
+             print ""
+         }
      }
 
-   ---------- Footnotes ----------
+   Here are the results when the program is run:
 
-   (1) This program was written before `gawk' had the `gensub()'
-function. Consider how you might use it to simplify the code.
+     $ gawk -f compdemo.awk
+     -| Sort function: cmp_num_idx      Sort by numeric index
+     -|     data[two] = 20
+     -|     data[one] = 10              Both strings are numerically zero
+     -|     data[10] = one
+     -|     data[20] = two
+     -|     data[100] = 100
+     -|
+     -| Sort function: cmp_str_val      Sort by element values as strings
+     -|     data[one] = 10
+     -|     data[100] = 100             String 100 is less than string 20
+     -|     data[two] = 20
+     -|     data[10] = one
+     -|     data[20] = two
+     -|
+     -| Sort function: cmp_num_str_val  Sort all numeric values before all strings
+     -|     data[one] = 10
+     -|     data[two] = 20
+     -|     data[100] = 100
+     -|     data[10] = one
+     -|     data[20] = two
 
-
-File: gawk.info,  Node: Simple Sed,  Next: Igawk Program,  Prev: Extract Program,  Up: Miscellaneous Programs
+   Consider sorting the entries of a GNU/Linux system password file
+according to login name.  The following program sorts records by a
+specific field position and can be used for this purpose:
 
-13.3.8 A Simple Stream Editor
------------------------------
+     # sort.awk --- simple program to sort by field position
+     # field position is specified by the global variable POS
 
-The `sed' utility is a stream editor, a program that reads a stream of
-data, makes changes to it, and passes it on.  It is often used to make
-global changes to a large file or to a stream of data generated by a
-pipeline of commands.  While `sed' is a complicated program in its own
-right, its most common use is to perform global substitutions in the
-middle of a pipeline:
+     function cmp_field(i1, v1, i2, v2)
+     {
+         # comparison by value, as string, and ascending order
+         return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
+     }
 
-     command1 < orig.data | sed 's/old/new/g' | command2 > result
+     {
+         for (i = 1; i <= NF; i++)
+             a[NR][i] = $i
+     }
 
-   Here, `s/old/new/g' tells `sed' to look for the regexp `old' on each
-input line and globally replace it with the text `new', i.e., all the
-occurrences on a line.  This is similar to `awk''s `gsub()' function
-(*note String Functions::).
+     END {
+         PROCINFO["sorted_in"] = "cmp_field"
+         if (POS < 1 || POS > NF)
+             POS = 1
+         for (i in a) {
+             for (j = 1; j <= NF; j++)
+                 printf("%s%c", a[i][j], j < NF ? ":" : "")
+             print ""
+         }
+     }
 
-   The following program, `awksed.awk', accepts at least two
-command-line arguments: the pattern to look for and the text to replace
-it with. Any additional arguments are treated as data file names to
-process. If none are provided, the standard input is used:
+   The first field in each entry of the password file is the user's
+login name, and the fields are separated by colons.  Each record
+defines a subarray, with each field as an element in the subarray.
+Running the program produces the following output:
 
-     # awksed.awk --- do s/foo/bar/g using just print
-     #    Thanks to Michael Brennan for the idea
+     $ gawk -v POS=1 -F: -f sort.awk /etc/passwd
+     -| adm:x:3:4:adm:/var/adm:/sbin/nologin
+     -| apache:x:48:48:Apache:/var/www:/sbin/nologin
+     -| avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+     ...
 
-     function usage()
+   The comparison function should always return the same value when
+given a specific pair of array elements as its arguments.  If it
+returns inconsistent results, the order is undefined.  This behavior
+can be exploited to introduce random order into otherwise seemingly
+ordered data:
+
+     function cmp_randomize(i1, v1, i2, v2)
      {
-         print "usage: awksed pat repl [files...]" > "/dev/stderr"
-         exit 1
+         # random order
+         return (2 - 4 * rand())
      }
 
-     BEGIN {
-         # validate arguments
-         if (ARGC < 3)
-             usage()
-
-         RS = ARGV[1]
-         ORS = ARGV[2]
+   As mentioned above, the order of the indices is arbitrary if two
+elements compare equal.  This is usually not a problem, but letting the
+tied elements come out in arbitrary order can be an issue, especially
+when comparing item values.  The partial ordering of the equal elements
+may change during the next loop traversal, if other elements are added
+to or removed from the array.  One way to resolve ties when comparing
+elements with otherwise equal values is to include the indices in the
+comparison rules.  Note that doing this may make the loop traversal
+less efficient, so consider it only if necessary.  The following
+comparison functions force a deterministic order, and are based on the
+fact that the indices of two elements are never equal:
 
-         # don't use arguments as files
-         ARGV[1] = ARGV[2] = ""
+     function cmp_numeric(i1, v1, i2, v2)
+     {
+         # numerical value (and index) comparison, descending order
+         return (v1 != v2) ? (v2 - v1) : (i2 - i1)
      }
 
-     # look ma, no hands!
+     function cmp_string(i1, v1, i2, v2)
      {
-         if (RT == "")
-             printf "%s", $0
-         else
-             print
+         # string value (and index) comparison, descending order
+         v1 = v1 i1
+         v2 = v2 i2
+         return (v1 > v2) ? -1 : (v1 != v2)
      }
 
-   The program relies on `gawk''s ability to have `RS' be a regexp, as
-well as on the setting of `RT' to the actual text that terminates the
-record (*note Records::).
+   A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to designing
+such a function.
 
-   The idea is to have `RS' be the pattern to look for. `gawk'
-automatically sets `$0' to the text between matches of the pattern.
-This is text that we want to keep, unmodified.  Then, by setting `ORS'
-to the replacement text, a simple `print' statement outputs the text we
-want to keep, followed by the replacement text.
+   When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices handled
+as strings, the value of `IGNORECASE' (*note Built-in Variables::)
+controls whether the comparisons treat corresponding uppercase and
+lowercase letters as equivalent or distinct.
 
-   There is one wrinkle to this scheme, which is what to do if the last
-record doesn't end with text that matches `RS'.  Using a `print'
-statement unconditionally prints the replacement text, which is not
-correct.  However, if the file did not end in text that matches `RS',
-`RT' is set to the null string.  In this case, we can print `$0' using
-`printf' (*note Printf::).
+   Another point to keep in mind is that in the case of subarrays the
+element values can themselves be arrays; a production comparison
+function should use the `isarray()' function (*note Type Functions::),
+to check for this, and choose a defined sorting order for subarrays.
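+
+   One possible policy can be sketched as follows.  The function name
+`cmp_scalars_first' and the ordering it picks (scalars first in string
+order, subarrays last, ordered among themselves by index) are
+assumptions made for illustration; any other well-defined order would
+serve equally well:
+
```awk
# cmp_scalars_first --- hypothetical comparison function:
# scalar values first, in ascending string order; subarrays last,
# ordered among themselves by index (compared as strings).
function cmp_scalars_first(i1, v1, i2, v2,    s1, s2)
{
    if (isarray(v1) || isarray(v2)) {
        if (! (isarray(v1) && isarray(v2)))
            return isarray(v1) ? 1 : -1     # the scalar sorts first
        s1 = i1 ""                          # both are subarrays:
        s2 = i2 ""                          # fall back to the indices
    } else {
        s1 = v1 ""
        s2 = v2 ""
    }
    return (s1 < s2) ? -1 : (s1 != s2)
}

BEGIN {
    data["x"] = "pear"
    data["y"][1] = "nested"
    data["z"] = "apple"

    PROCINFO["sorted_in"] = "cmp_scalars_first"
    for (k in data)
        print k, (isarray(data[k]) ? "(subarray)" : data[k])
}
```
+
+   The function never assigns to `v1' or `v2', since either one may be
+an array; it copies the values to be compared into the local variables
+`s1' and `s2' instead.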
 
-   The `BEGIN' rule handles the setup, checking for the right number of
-arguments and calling `usage()' if there is a problem. Then it sets
-`RS' and `ORS' from the command-line arguments and sets `ARGV[1]' and
-`ARGV[2]' to the null string, so that they are not treated as file names
-(*note ARGC and ARGV::).
+   All sorting based on `PROCINFO["sorted_in"]' is disabled in POSIX
+mode, since the `PROCINFO' array is not special in that case.
 
-   The `usage()' function prints an error message and exits.  Finally,
-the single rule handles the printing scheme outlined above, using
-`print' or `printf' as appropriate, depending upon the value of `RT'.
+   As a side note, sorting the array indices before traversing the
+array has been reported to add 15% to 20% overhead to the execution
+time of `awk' programs. For this reason, sorted array traversal is not
+the default.
 
 
-File: gawk.info,  Node: Igawk Program,  Next: Anagram Program,  Prev: Simple Sed,  Up: Miscellaneous Programs
+File: gawk.info,  Node: Array Sorting Functions,  Prev: Controlling Array Traversal,  Up: Array Sorting
 
-13.3.9 An Easy Way to Use Library Functions
--------------------------------------------
+13.2.2 Sorting Array Values and Indices with `gawk'
+---------------------------------------------------
+
+In most `awk' implementations, sorting an array requires writing a
+`sort()' function.  While this can be educational for exploring
+different sorting algorithms, usually that's not the point of the
+program.  `gawk' provides the built-in `asort()' and `asorti()'
+functions (*note String Functions::) for sorting arrays.  For example:
+
+     POPULATE THE ARRAY data
+     n = asort(data)
+     for (i = 1; i <= n; i++)
+         DO SOMETHING WITH data[i]
+
+   After the call to `asort()', the array `data' is indexed from 1 to
+some number N, the total number of elements in `data'.  (This count is
+`asort()''s return value.)  `data[1]' <= `data[2]' <= `data[3]', and so
+on.  The comparison is based on the type of the elements (*note Typing
+and Comparison::).  All numeric values come before all string values,
+which in turn come before all subarrays.
+
+   An important side effect of calling `asort()' is that _the array's
+original indices are irrevocably lost_.  As this isn't always
+desirable, `asort()' accepts a second argument:
+
+     POPULATE THE ARRAY source
+     n = asort(source, dest)
+     for (i = 1; i <= n; i++)
+         DO SOMETHING WITH dest[i]
+
+   In this case, `gawk' copies the `source' array into the `dest' array
+and then sorts `dest', destroying its indices.  However, the `source'
+array is not affected.
+
+   `asort()' accepts a third string argument to control comparison of
+array elements.  As with `PROCINFO["sorted_in"]', this argument may be
+one of the predefined names that `gawk' provides (*note Controlling
+Scanning::), or the name of a user-defined function (*note Controlling
+Array Traversal::).
 
-In *note Include Files::, we saw how `gawk' provides a built-in
-file-inclusion capability.  However, this is a `gawk' extension.  This
-minor node provides the motivation for making file inclusion available
-for standard `awk', and shows how to do it using a combination of shell
-and `awk' programming.
+     NOTE: In all cases, the sorted element values consist of the
+     original array's element values.  The ability to control
+     comparison merely affects the way in which they are sorted.
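+
+   The third argument might be used as in the following sketch.  The
+array names and the `by_length' function are hypothetical, chosen only
+for illustration; `"@val_num_desc"' is one of the predefined names:
+
```awk
# by_length --- hypothetical comparison function: shorter strings first
function by_length(i1, v1, i2, v2)
{
    return length(v1) - length(v2)
}

BEGIN {
    # A predefined name: numeric values, largest first.
    n = split("10 9 100 25", source, " ")
    m = asort(source, dest, "@val_num_desc")
    for (i = 1; i <= m; i++)
        print dest[i]

    # A user-defined function, named by a string.
    n = split("kiwi fig banana", words, " ")
    m = asort(words, sorted, "by_length")
    for (i = 1; i <= m; i++)
        print sorted[i]
}
```
+
+   In both calls the original array is left untouched, because a
+destination array is supplied as the second argument.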
 
-   Using library functions in `awk' can be very beneficial. It
-encourages code reuse and the writing of general functions. Programs are
-smaller and therefore clearer.  However, using library functions is
-only easy when writing `awk' programs; it is painful when running them,
-requiring multiple `-f' options.  If `gawk' is unavailable, then so too
-is the `AWKPATH' environment variable and the ability to put `awk'
-functions into a library directory (*note Options::).  It would be nice
-to be able to write programs in the following manner:
+   Often, what's needed is to sort on the values of the _indices_
+instead of the values of the elements.  To do that, use the `asorti()'
+function.  The interface is identical to that of `asort()', except that
+the index values are used for sorting, and become the values of the
+result array:
 
-     # library functions
-     @include getopt.awk
-     @include join.awk
-     ...
+     { source[$0] = some_func($0) }
 
-     # main program
-     BEGIN {
-         while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
+     END {
+         n = asorti(source, dest)
+         for (i = 1; i <= n; i++) {
+             Work with sorted indices directly:
+             DO SOMETHING WITH dest[i]
              ...
-         ...
+             Access original array via sorted indices:
+             DO SOMETHING WITH source[dest[i]]
+         }
      }
 
-   The following program, `igawk.sh', provides this service.  It
-simulates `gawk''s searching of the `AWKPATH' variable and also allows
-"nested" includes; i.e., a file that is included with `@include' can
-contain further `@include' statements.  `igawk' makes an effort to only
-include files once, so that nested includes don't accidentally include
-a library function twice.
+   Similar to `asort()', in all cases, the sorted element values
+consist of the original array's indices.  The ability to control
+comparison merely affects the way in which they are sorted.
 
-   `igawk' should behave just like `gawk' externally.  This means it
-should accept all of `gawk''s command-line arguments, including the
-ability to have multiple source files specified via `-f', and the
-ability to mix command-line and library source files.
+   Sorting the array by replacing the indices provides maximal
+flexibility.  To traverse the elements in decreasing order, use a loop
+that goes from N down to 1, either over the elements or over the
+indices.(1)
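+
+   As a small sketch of a decreasing traversal (the array contents are
+made up for illustration), the loop below sorts the indices with
+`asorti()' and then walks the result backwards:
+
```awk
BEGIN {
    source["pear"]  = 3
    source["apple"] = 7
    source["fig"]   = 1

    n = asorti(source, dest)    # dest[1..n] holds the sorted indices

    # Decreasing order: walk the result from n down to 1.
    for (i = n; i >= 1; i--)
        print dest[i], source[dest[i]]
}
```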
 
-   The program is written using the POSIX Shell (`sh') command
-language.(1) It works as follows:
+   Copying array indices and elements isn't expensive in terms of
+memory.  Internally, `gawk' maintains "reference counts" to data.  For
+example, when `asort()' copies the first array to the second one, there
+is only one copy of the original array elements' data, even though both
+arrays use the values.
 
-  1. Loop through the arguments, saving anything that doesn't represent
-     `awk' source code for later, when the expanded program is run.
+   Because `IGNORECASE' affects string comparisons, the value of
+`IGNORECASE' also affects sorting for both `asort()' and `asorti()'.
+Note also that the locale's sorting order does _not_ come into play;
+comparisons are based on character values only.(2) Caveat Emptor.
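+
+   The effect of `IGNORECASE' can be observed with a sketch such as the
+following; the word list and the array names `cs' and `ci' are
+illustrative assumptions.  With `IGNORECASE' unset, `Cherry' should
+sort first, since uppercase letters have lower character values than
+lowercase ones in ASCII; with `IGNORECASE' set, `apple' should:
+
```awk
BEGIN {
    n = split("banana Cherry apple", words, " ")

    asort(words, cs)            # IGNORECASE is zero: case matters
    IGNORECASE = 1
    asort(words, ci)            # case is now ignored

    # Print the two orderings side by side.
    for (i = 1; i <= n; i++)
        printf("%-8s %s\n", cs[i], ci[i])
}
```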
 
-  2. For any arguments that do represent `awk' text, put the arguments
-     into a shell variable that will be expanded.  There are two cases:
+   ---------- Footnotes ----------
 
-       a. Literal text, provided with `--source' or `--source='.  This
-          text is just appended directly.
+   (1) You may also use one of the predefined sorting names that sorts
+in decreasing order.
 
-       b. Source file names, provided with `-f'.  We use a neat trick
-          and append `@include FILENAME' to the shell variable's
-          contents.  Since the file-inclusion program works the way
-          `gawk' does, this gets the text of the file included into the
-          program at the correct point.
+   (2) This is true because locale-based comparison occurs only when in
+POSIX compatibility mode, and since `asort()' and `asorti()' are `gawk'
+extensions, they are not available in that case.
 
-  3. Run an `awk' program (naturally) over the shell variable's
-     contents to expand `@include' statements.  The expanded program is
-     placed in a second shell variable.
+
+File: gawk.info,  Node: Two-way I/O,  Next: TCP/IP Networking,  Prev: Array Sorting,  Up: Advanced Features
 
-  4. Run the expanded program with `gawk' and any other original
-     command-line arguments that the user supplied (such as the data
-     file names).
+13.3 Two-Way Communications with Another Process
+================================================
 
-   This program uses shell variables extensively: for storing
-command-line arguments, the text of the `awk' program that will expand
-the user's program, for the user's original program, and for the
-expanded program.  Doing so removes some potential problems that might
-arise were we to use temporary files instead, at the cost of making the
-script somewhat more complicated.
+     From: address@hidden (Mike Brennan)
+     Newsgroups: comp.lang.awk
+     Subject: Re: Learn the SECRET to Attract Women Easily
+     Date: 4 Aug 1997 17:34:46 GMT
+     Message-ID: <address@hidden>
 
-   The initial part of the program turns on shell tracing if the first
-argument is `debug'.
+     On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+     <address@hidden> wrote:
+     >Learn the SECRET to Attract Women Easily
+     >
+     >The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
 
-   The next part loops through all the command-line arguments.  There
-are several cases of interest:
+     The scent of awk programmers is a lot more attractive to women than
+     the scent of perl programmers.
+     --
+     Mike Brennan
 
-`--'
-     This ends the arguments to `igawk'.  Anything else should be
-     passed on to the user's `awk' program without being evaluated.
+   It is often useful to be able to send data to a separate program for
+processing and then read the result.  This can always be done with
+temporary files:
 
-`-W'
-     This indicates that the next option is specific to `gawk'.  To make
-     argument processing easier, the `-W' is appended to the front of
-     the remaining arguments and the loop continues.  (This is an `sh'
-     programming trick.  Don't worry about it if you are not familiar
-     with `sh'.)
+     # Write the data for processing
+     tempfile = ("mydata." PROCINFO["pid"])
+     while (NOT DONE WITH DATA)
+         print DATA | ("subprogram > " tempfile)
+     close("subprogram > " tempfile)
 
-`-v, -F'
-     These are saved and passed on to `gawk'.
+     # Read the results, remove tempfile when done
+     while ((getline newdata < tempfile) > 0)
+         PROCESS newdata APPROPRIATELY
+     close(tempfile)
+     system("rm " tempfile)
 
-`-f, --file, --file=, -Wfile='
-     The file name is appended to the shell variable `program' with an
-     `@include' statement.  The `expr' utility is used to remove the
-     leading option part of the argument (e.g., `--file=').  (Typical
-     `sh' usage would be to use the `echo' and `sed' utilities to do
-     this work.  Unfortunately, some versions of `echo' evaluate escape
-     sequences in their arguments, possibly mangling the program text.
-     Using `expr' avoids this problem.)
+This works, but not elegantly.  Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, `/tmp' will not do, as another user might happen to be
+using a temporary file with the same name.
 
-`--source, --source=, -Wsource='
-     The source text is appended to `program'.
+   However, with `gawk', it is possible to open a _two-way_ pipe to
+another process.  The second process is termed a "coprocess", since it
+runs in parallel with `gawk'.  The two-way connection is created using
+the `|&' operator (borrowed from the Korn shell, `ksh'):(1)
 
-`--version, -Wversion'
-     `igawk' prints its version number, runs `gawk --version' to get
-     the `gawk' version information, and then exits.
+     do {
+         print DATA |& "subprogram"
+         "subprogram" |& getline results
+     } while (DATA LEFT TO PROCESS)
+     close("subprogram")
 
-   If none of the `-f', `--file', `-Wfile', `--source', or `-Wsource'
-arguments are supplied, then the first nonoption argument should be the
-`awk' program.  If there are no command-line arguments left, `igawk'
-prints an error message and exits.  Otherwise, the first argument is
-appended to `program'.  In any case, after the arguments have been
-processed, `program' contains the complete text of the original `awk'
-program.
+   The first time an I/O operation is executed using the `|&' operator,
+`gawk' creates a two-way pipeline to a child process that runs the
+other program.  Output created with `print' or `printf' is written to
+the program's standard input, and output from the program's standard
+output can be read by the `gawk' program using `getline'.  As is the
+case with processes started by `|', the subprogram can be any program,
+or pipeline of programs, that can be started by the shell.
 
-   The program is as follows:
+   There are some cautionary items to be aware of:
 
-     #! /bin/sh
-     # igawk --- like gawk but do @include processing
+   * As the code inside `gawk' currently stands, the coprocess's
+     standard error goes to the same place that the parent `gawk''s
+     standard error goes. It is not possible to read the child's
+     standard error separately.
 
-     if [ "$1" = debug ]
-     then
-         set -x
-         shift
-     fi
+   * I/O buffering may be a problem.  `gawk' automatically flushes all
+     output down the pipe to the coprocess.  However, if the coprocess
+     does not flush its output, `gawk' may hang when doing a `getline'
+     in order to read the coprocess's results.  This could lead to a
+     situation known as "deadlock", where each process is waiting for
+     the other one to do something.
 
-     # A literal newline, so that program text is formatted correctly
-     n='
-     '
+   It is possible to close just one end of the two-way pipe to a
+coprocess, by supplying a second argument to the `close()' function of
+either `"to"' or `"from"' (*note Close Files And Pipes::).  These
+strings tell `gawk' to close the end of the pipe that sends data to the
+coprocess or the end that reads from it, respectively.
 
-     # Initialize variables to empty
-     program=
-     opts=
+   This is particularly necessary in order to use the system `sort'
+utility as part of a coprocess; `sort' must read _all_ of its input
+data before it can produce any output.  The `sort' program does not
+receive an end-of-file indication until `gawk' closes the write end of
+the pipe.
 
-     while [ $# -ne 0 ] # loop over arguments
-     do
-         case $1 in
-         --)     shift
-                 break ;;
+   When you have finished writing data to the `sort' utility, you can
+close the `"to"' end of the pipe, and then start reading sorted data
+via `getline'.  For example:
 
-         -W)     shift
-                 # The ${x?'message here'} construct prints a
-                 # diagnostic if $x is the null string
-                 set -- -W"${@?'missing operand'}"
-                 continue ;;
+     BEGIN {
+         command = "LC_ALL=C sort"
+         n = split("abcdefghijklmnopqrstuvwxyz", a, "")
 
-         -[vF])  opts="$opts $1 '${2?'missing operand'}'"
-                 shift ;;
+         for (i = n; i > 0; i--)
+             print a[i] |& command
+         close(command, "to")
 
-         -[vF]*) opts="$opts '$1'" ;;
+         while ((command |& getline line) > 0)
+             print "got", line
+         close(command)
+     }
+
+   This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to `sort'.  It then closes the write
+end of the pipe, so that `sort' receives an end-of-file indication.
+This causes `sort' to sort the data and write the sorted data back to
+the `gawk' program.  Once all of the data has been read, `gawk'
+terminates the coprocess and exits.
+
+   As a side note, the assignment `LC_ALL=C' in the `sort' command
+ensures traditional Unix (ASCII) sorting from `sort'.
 
-         -f)     program="$program$n@include ${2?'missing operand'}"
-                 shift ;;
+   You may also use pseudo-ttys (ptys) for two-way communication
+instead of pipes, if your system supports them.  This is done on a
+per-command basis, by setting a special element in the `PROCINFO' array
+(*note Auto-set::), like so:
 
-         -f*)    f=$(expr "$1" : '-f\(.*\)')
-                 program="$program$n@include $f" ;;
+     command = "sort -nr"           # command, save in convenience variable
+     PROCINFO[command, "pty"] = 1   # update PROCINFO
+     print ... |& command       # start two-way pipe
+     ...
 
-         -[W-]file=*)
-                 f=$(expr "$1" : '-.file=\(.*\)')
-                 program="$program$n@include $f" ;;
+Using ptys avoids the buffer deadlock issues described earlier, at some
+loss in performance.  If your system does not have ptys, or if all the
+system's ptys are in use, `gawk' automatically falls back to using
+regular pipes.
 
-         -[W-]file)
-                 program="$program$n@include ${2?'missing operand'}"
-                 shift ;;
+   ---------- Footnotes ----------
 
-         -[W-]source=*)
-                 t=$(expr "$1" : '-.source=\(.*\)')
-                 program="$program$n$t" ;;
+   (1) This is very different from the same operator in the C shell.
 
-         -[W-]source)
-                 program="$program$n${2?'missing operand'}"
-                 shift ;;
+
+File: gawk.info,  Node: TCP/IP Networking,  Next: Profiling,  Prev: Two-way I/O,  Up: Advanced Features
 
-         -[W-]version)
-                 echo igawk: version 3.0 1>&2
-                 gawk --version
-                 exit 0 ;;
+13.4 Using `gawk' for Network Programming
+=========================================
 
-         -[W-]*) opts="$opts '$1'" ;;
+     `EMISTERED':
+     A host is a host from coast to coast,
+     and no-one can talk to host that's close,
+     unless the host that isn't close
+     is busy hung or dead.
 
-         *)      break ;;
-         esac
-         shift
-     done
+   In addition to being able to open a two-way pipeline to a coprocess
+on the same system (*note Two-way I/O::), it is possible to make a
+two-way connection to another process on another system across an IP
+network connection.
 
-     if [ -z "$program" ]
-     then
-          program=${1?'missing program'}
-          shift
-     fi
+   You can think of this as just a _very long_ two-way pipeline to a
+coprocess.  The way `gawk' decides that you want to use TCP/IP
+networking is by recognizing special file names that begin with one of
+`/inet/', `/inet4/', or `/inet6/'.
 
-     # At this point, `program' has the program.
+   The full syntax of the special file name is
+`/NET-TYPE/PROTOCOL/LOCAL-PORT/REMOTE-HOST/REMOTE-PORT'.  The
+components are:
 
-   The `awk' program to process `@include' directives is stored in the
-shell variable `expand_prog'.  Doing this keeps the shell script
-readable.  The `awk' program reads through the user's program, one line
-at a time, using `getline' (*note Getline::).  The input file names and
-`@include' statements are managed using a stack.  As each `@include' is
-encountered, the current file name is "pushed" onto the stack and the
-file named in the `@include' directive becomes the current file name.
-As each file is finished, the stack is "popped," and the previous input
-file becomes the current input file again.  The process is started by
-making the original file the first one on the stack.
+NET-TYPE
+     Specifies the kind of Internet connection to make.  Use `/inet4/'
+     to force IPv4, and `/inet6/' to force IPv6.  Plain `/inet/' (which
+     used to be the only option) uses the system default, most likely
+     IPv4.
 
-   The `pathto()' function does the work of finding the full path to a
-file.  It simulates `gawk''s behavior when searching the `AWKPATH'
-environment variable (*note AWKPATH Variable::).  If a file name has a
-`/' in it, no path search is done.  Similarly, if the file name is
-`"-"', then that string is used as-is.  Otherwise, the file name is
-concatenated with the name of each directory in the path, and an
-attempt is made to open the generated file name.  The only way to test
-if a file can be read in `awk' is to go ahead and try to read it with
-`getline'; this is what `pathto()' does.(2) If the file can be read, it
-is closed and the file name is returned:
+PROTOCOL
+     The protocol to use over IP.  This must be either `tcp', or `udp',
+     for a TCP or UDP IP connection, respectively.  The use of TCP is
+     recommended for most applications.
 
-     expand_prog='
+LOCAL-PORT
+     The local TCP or UDP port number to use.  Use a port number of `0'
+     when you want the system to pick a port. This is what you should do
+     when writing a TCP or UDP client.  You may also use a well-known
+     service name, such as `smtp' or `http', in which case `gawk'
+     attempts to determine the predefined port number using the C
+     `getaddrinfo()' function.
 
-     function pathto(file,    i, t, junk)
-     {
-         if (index(file, "/") != 0)
-             return file
+REMOTE-HOST
+     The IP address or fully-qualified domain name of the Internet host
+     to which you want to connect.
 
-         if (file == "-")
-             return file
+REMOTE-PORT
+     The TCP or UDP port number to use on the given REMOTE-HOST.
+     Again, use `0' if you don't care, or else a well-known service
+     name.
 
-         for (i = 1; i <= ndirs; i++) {
-             t = (pathlist[i] "/" file)
-             if ((getline junk < t) > 0) {
-                 # found it
-                 close(t)
-                 return t
-             }
-         }
-         return ""
-     }
+     NOTE: Failure in opening a two-way socket will result in a
+     non-fatal error being returned to the calling code. The value of
+     `ERRNO' indicates the error (*note Auto-set::).
 
-   The main program is contained inside one `BEGIN' rule.  The first
-thing it does is set up the `pathlist' array that `pathto()' uses.
-After splitting the path on `:', null elements are replaced with `"."',
-which represents the current directory:
+   Consider the following very simple example:
 
      BEGIN {
-         path = ENVIRON["AWKPATH"]
-         ndirs = split(path, pathlist, ":")
-         for (i = 1; i <= ndirs; i++) {
-             if (pathlist[i] == "")
-                 pathlist[i] = "."
-         }
+       Service = "/inet/tcp/0/localhost/daytime"
+       Service |& getline
+       print $0
+       close(Service)
+     }
 
-   The stack is initialized with `ARGV[1]', which will be `/dev/stdin'.
-The main loop comes next.  Input lines are read in succession. Lines
-that do not start with `@include' are printed verbatim.  If the line
-does start with `@include', the file name is in `$2'.  `pathto()' is
-called to generate the full path.  If it cannot, then the program
-prints an error message and continues.
+   This program reads the current date and time from the local system's
+TCP `daytime' server.  It then prints the results and closes the
+connection.
 
-   The next thing to check is if the file is included already.  The
-`processed' array is indexed by the full file name of each included
-file and it tracks this information for us.  If the file is seen again,
-a warning message is printed. Otherwise, the new file name is pushed
-onto the stack and processing continues.
+   Because this topic is extensive, the use of `gawk' for TCP/IP
+programming is documented separately.  See *note (General
+Introduction)Top:: gawkinet, TCP/IP Internetworking with `gawk', for a
+much more complete introduction and discussion, as well as extensive
+examples.
 
-   Finally, when `getline' encounters the end of the input file, the
-file is closed and the stack is popped.  When `stackptr' is less than
-zero, the program is done:
+
+File: gawk.info,  Node: Profiling,  Prev: TCP/IP Networking,  Up: Advanced Features
 
-         stackptr = 0
-         input[stackptr] = ARGV[1] # ARGV[1] is first file
+13.5 Profiling Your `awk' Programs
+==================================
 
-         for (; stackptr >= 0; stackptr--) {
-             while ((getline < input[stackptr]) > 0) {
-                 if (tolower($1) != "@include") {
-                     print
-                     continue
-                 }
-                 fpath = pathto($2)
-                 if (fpath == "") {
-                     printf("igawk:%s:%d: cannot find %s\n",
-                         input[stackptr], FNR, $2) > "/dev/stderr"
-                     continue
-                 }
-                 if (! (fpath in processed)) {
-                     processed[fpath] = input[stackptr]
-                     input[++stackptr] = fpath  # push onto stack
-                 } else
-                     print $2, "included in", input[stackptr],
-                         "already included in",
-                         processed[fpath] > "/dev/stderr"
-             }
-             close(input[stackptr])
-         }
-     }'  # close quote ends `expand_prog' variable
+You may produce execution traces of your `awk' programs.  This is done
+by passing the option `--profile' to `gawk'.  When `gawk' has finished
+running, it creates a profile of your program in a file named
+`awkprof.out'. Because it is profiling, it also executes up to 45%
+slower than `gawk' normally does.
 
-     processed_program=$(gawk -- "$expand_prog" /dev/stdin << EOF
-     $program
-     EOF
-     )
+   As shown in the following example, the `--profile' option can be
+used to change the name of the file where `gawk' will write the profile:
 
-   The shell construct `COMMAND << MARKER' is called a "here document".
-Everything in the shell script up to the MARKER is fed to COMMAND as
-input.  The shell processes the contents of the here document for
-variable and command substitution (and possibly other things as well,
-depending upon the shell).
+     gawk --profile=myprog.prof -f myprog.awk data1 data2
 
-   The shell construct `$(...)' is called "command substitution".  The
-output of the command inside the parentheses is substituted into the
-command line.  Because the result is used in a variable assignment, it
-is saved as a single string, even if the results contain whitespace.
+In the above example, `gawk' places the profile in `myprog.prof'
+instead of in `awkprof.out'.
 
-   The expanded program is saved in the variable `processed_program'.
-It's done in these steps:
+   Here is a sample session showing a simple `awk' program, its input
+data, and the results from running `gawk' with the `--profile' option.
+First, the `awk' program:
 
-  1. Run `gawk' with the `@include'-processing program (the value of
-     the `expand_prog' shell variable) on standard input.
+     BEGIN { print "First BEGIN rule" }
 
-  2. Standard input is the contents of the user's program, from the
-     shell variable `program'.  Its contents are fed to `gawk' via a
-     here document.
+     END { print "First END rule" }
 
-  3. The results of this processing are saved in the shell variable
-     `processed_program' by using command substitution.
+     /foo/ {
+         print "matched /foo/, gosh"
+         for (i = 1; i <= 3; i++)
+             sing()
+     }
 
-   The last step is to call `gawk' with the expanded program, along
-with the original options and command-line arguments that the user
-supplied.
+     {
+         if (/foo/)
+             print "if is true"
+         else
+             print "else is true"
+     }
 
-     eval gawk $opts -- '"$processed_program"' '"$@"'
+     BEGIN { print "Second BEGIN rule" }
 
-   The `eval' command is a shell construct that reruns the shell's
-parsing process.  This keeps things properly quoted.
+     END { print "Second END rule" }
 
-   This version of `igawk' represents my fifth version of this program.
-There are four key simplifications that make the program work better:
+     function sing(    dummy)
+     {
+         print "I gotta be me!"
+     }
+
+   Following is the input data:
 
-   * Using `@include' even for the files named with `-f' makes building
-     the initial collected `awk' program much simpler; all the
-     `@include' processing can be done once.
+     foo
+     bar
+     baz
+     foo
+     junk
 
-   * Not trying to save the line read with `getline' in the `pathto()'
-     function when testing for the file's accessibility for use with
-     the main program simplifies things considerably.
+   Here is the `awkprof.out' that results from running the `gawk'
+profiler on this program and data (this example also illustrates that
+`awk' programmers sometimes have to work late):
 
-   * Using a `getline' loop in the `BEGIN' rule does it all in one
-     place.  It is not necessary to call out to a separate loop for
-     processing nested `@include' statements.
+             # gawk profile, created Sun Aug 13 00:00:15 2000
 
-   * Instead of saving the expanded program in a temporary file,
-     putting it in a shell variable avoids some potential security
-     problems.  This has the disadvantage that the script relies upon
-     more features of the `sh' language, making it harder to follow for
-     those who aren't familiar with `sh'.
+             # BEGIN block(s)
 
-   Also, this program illustrates that it is often worthwhile to combine
-`sh' and `awk' programming together.  You can usually accomplish quite
-a lot, without having to resort to low-level programming in C or C++,
-and it is frequently easier to do certain kinds of string and argument
-manipulation using the shell than it is in `awk'.
+             BEGIN {
+          1          print "First BEGIN rule"
+          1          print "Second BEGIN rule"
+             }
 
-   Finally, `igawk' shows that it is not always necessary to add new
-features to a program; they can often be layered on top.
+             # Rule(s)
 
-   As an additional example of this, consider the idea of having two
-files in a directory in the search path:
+          5  /foo/   { # 2
+          2          print "matched /foo/, gosh"
+          6          for (i = 1; i <= 3; i++) {
+          6                  sing()
+                     }
+             }
 
-`default.awk'
-     This file contains a set of default library functions, such as
-     `getopt()' and `assert()'.
+          5  {
+          5          if (/foo/) { # 2
+          2                  print "if is true"
+          3          } else {
+          3                  print "else is true"
+                     }
+             }
 
-`site.awk'
-     This file contains library functions that are specific to a site or
-     installation; i.e., locally developed functions.  Having a
-     separate file allows `default.awk' to change with new `gawk'
-     releases, without requiring the system administrator to update it
-     each time by adding the local functions.
+             # END block(s)
 
-   One user suggested that `gawk' be modified to automatically read
-these files upon startup.  Instead, it would be very simple to modify
-`igawk' to do this. Since `igawk' can process nested `@include'
-directives, `default.awk' could simply contain `@include' statements
-for the desired library functions.
+             END {
+          1          print "First END rule"
+          1          print "Second END rule"
+             }
 
-   ---------- Footnotes ----------
+             # Functions, listed alphabetically
 
-   (1) Fully explaining the `sh' language is beyond the scope of this
-book. We provide some minimal explanations, but see a good shell
-programming book if you wish to understand things in more depth.
+          6  function sing(dummy)
+             {
+          6          print "I gotta be me!"
+             }
 
-   (2) On some very old versions of `awk', the test `getline junk < t'
-can loop forever if the file exists but is empty.  Caveat emptor.
+   This example illustrates many of the basic features of profiling
+output.  They are as follows:
 
-
-File: gawk.info,  Node: Anagram Program,  Next: Signature Program,  Prev: Igawk Program,  Up: Miscellaneous Programs
+   * The program is printed in the order `BEGIN' rule, `BEGINFILE' rule,
+     pattern/action rules, `ENDFILE' rule, `END' rule and functions,
+     listed alphabetically.  Multiple `BEGIN' and `END' rules are
+     merged together, as are multiple `BEGINFILE' and `ENDFILE' rules.
 
-13.3.10 Finding Anagrams From A Dictionary
-------------------------------------------
+   * Pattern-action rules have two counts.  The first count, to the
+     left of the rule, shows how many times the rule's pattern was
+     _tested_.  The second count, to the right of the rule's opening
+     left brace in a comment, shows how many times the rule's action
+     was _executed_.  The difference between the two indicates how many
+     times the rule's pattern evaluated to false.
 
-An interesting programming challenge is to search for "anagrams" in a
-word list (such as `/usr/share/dict/words' on many GNU/Linux systems).
-One word is an anagram of another if both words contain the same letters
-(for example, "babbling" and "blabbing").
+   * Similarly, the count for an `if'-`else' statement shows how many
+     times the condition was tested.  To the right of the opening left
+     brace for the `if''s body is a count showing how many times the
+     condition was true.  The count for the `else' indicates how many
+     times the test failed.
 
-   An elegant algorithm is presented in Column 2, Problem C of Jon
-Bentley's `Programming Pearls', second edition.  The idea is to give
-words that are anagrams a common signature, sort all the words together
-by their signature, and then print them.  Dr. Bentley observes that
-taking the letters in each word and sorting them produces that common
-signature.
+   * The count for a loop header (such as `for' or `while') shows how
+     many times the loop test was executed.  (Because of this, you
+     can't just look at the count on the first statement in a rule to
+     determine how many times the rule was executed.  If the first
+     statement is a loop, the count is misleading.)
 
-   The following program uses arrays of arrays to bring together words
-with the same signature and array sorting to print the words in sorted
-order.
+   * For user-defined functions, the count next to the `function'
+     keyword indicates how many times the function was called.  The
+     counts next to the statements in the body show how many times
+     those statements were executed.
 
-     # anagram.awk --- An implementation of the anagram finding algorithm
-     #                 from Jon Bentley's "Programming Pearls", 2nd edition.
-     #                 Addison Wesley, 2000, ISBN 0-201-65788-0.
-     #                 Column 2, Problem C, section 2.8, pp 18-20.
+   * The layout uses "K&R" style with TABs.  Braces are used
+     everywhere, even when the body of an `if', `else', or loop is only
+     a single statement.
 
-     /'s$/   { next }        # Skip possessives
+   * Parentheses are used only where needed, as indicated by the
+     structure of the program and the precedence rules.  For example,
+     `(3 + 5) * 4' means add three plus five, then multiply the total
+     by four.  However, `3 + 5 * 4' has no parentheses, and means `3 +
+     (5 * 4)'.
 
-   The program starts with a header, and then a rule to skip
-possessives in the dictionary file. The next rule builds up the data
-structure. The first dimension of the array is indexed by the
-signature; the second dimension is the word itself:
+   * Parentheses are used around the arguments to `print' and `printf'
+     only when the `print' or `printf' statement is followed by a
+     redirection.  Similarly, if the target of a redirection isn't a
+     scalar, it gets parenthesized.
 
-     {
-         key = word2key($1)  # Build signature
-         data[key][$1] = $1  # Store word with signature
-     }
+   * `gawk' supplies leading comments in front of the `BEGIN' and `END'
+     rules, the pattern/action rules, and the functions.
 
-   The `word2key()' function creates the signature.  It splits the word
-apart into individual letters, sorts the letters, and then joins them
-back together:
 
-     # word2key --- split word apart into letters, sort, joining back together
+   The profiled version of your program may not look exactly like what
+you typed when you wrote it.  This is because `gawk' creates the
+profiled version by "pretty printing" its internal representation of
+the program.  The advantage to this is that `gawk' can produce a
+standard representation.  The disadvantage is that all source-code
+comments are lost, as are the distinctions among multiple `BEGIN',
+`END', `BEGINFILE', and `ENDFILE' rules.  Also, things such as:
 
-     function word2key(word,     a, i, n, result)
-     {
-         n = split(word, a, "")
-         asort(a)
+     /foo/
 
-         for (i = 1; i <= n; i++)
-             result = result a[i]
+come out as:
 
-         return result
+     /foo/   {
+         print $0
      }
 
-   Finally, the `END' rule traverses the array and prints out the
-anagram lists.  It sends the output to the system `sort' command, since
-otherwise the anagrams would appear in arbitrary order:
+which is correct, but possibly surprising.
 
-     END {
-         sort = "sort"
-         for (key in data) {
-             # Sort words with same key
-             nwords = asorti(data[key], words)
-             if (nwords == 1)
-                 continue
+   Besides creating profiles when a program has completed, `gawk' can
+produce a profile while it is running.  This is useful if your `awk'
+program goes into an infinite loop and you want to see what has been
+executed.  To use this feature, run `gawk' with the `--profile' option
+in the background:
 
-             # And print. Minor glitch: trailing space at end of each line
-             for (j = 1; j <= nwords; j++)
-                 printf("%s ", words[j]) | sort
-             print "" | sort
-         }
-         close(sort)
-     }
+     $ gawk --profile -f myprog &
+     [1] 13992
 
-   Here is some partial output when the program is run:
+The shell prints a job number and process ID number; in this case,
+13992.  Use the `kill' command to send the `USR1' signal to `gawk':
 
-     $ gawk -f anagram.awk /usr/share/dict/words | grep '^b'
-     ...
-     babbled blabbed
-     babbler blabber brabble
-     babblers blabbers brabbles
-     babbling blabbing
-     babbly blabby
-     babel bable
-     babels beslab
-     babery yabber
-     ...
+     $ kill -USR1 13992
 
-
-File: gawk.info,  Node: Signature Program,  Prev: Anagram Program,  Up: Miscellaneous Programs
+As usual, the profiled version of the program is written to
+`awkprof.out', or to a different file if one is specified with the
+`--profile' option.
 
-13.3.11 And Now For Something Completely Different
---------------------------------------------------
+   Along with the regular profile, as shown earlier, the profile
+includes a trace of any active functions:
 
-The following program was written by Davide Brini and is published on
-his website (http://backreference.org/2011/02/03/obfuscated-awk/).  It
-serves as his signature in the Usenet group `comp.lang.awk'.  He
-supplies the following copyright terms:
+     # Function Call Stack:
 
-     Copyright (C) 2008 Davide Brini
+     #   3. baz
+     #   2. bar
+     #   1. foo
+     # -- main --
 
-     Copying and distribution of the code published in this page, with
-     or without modification, are permitted in any medium without
-     royalty provided the copyright notice and this notice are
-     preserved.
+   You may send `gawk' the `USR1' signal as many times as you like.
+Each time, the profile and function call trace are appended to the
+output profile file.
 
-   Here is the program:
+   If you use the `HUP' signal instead of the `USR1' signal, `gawk'
+produces the profile and the function call trace and then exits.
 
-     awk 'BEGIN{O="~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
-     printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
-     X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
-     O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),x-O}'
+   When `gawk' runs on MS-Windows systems, it uses the `INT' and `QUIT'
+signals for producing the profile and, in the case of the `INT' signal,
+`gawk' exits.  This is because these systems don't support the `kill'
+command, so the only signals you can deliver to a program are those
+generated by the keyboard.  The `INT' signal is generated by the
+`Ctrl-<C>' or `Ctrl-<BREAK>' key, while the `QUIT' signal is generated
+by the `Ctrl-<\>' key.
 
-   We leave it to you to determine what the program does.
+   Finally, `gawk' also accepts another option `--pretty-print'.  When
+called this way, `gawk' "pretty prints" the program into `awkprof.out',
+without any execution counts.
 
 
-File: gawk.info,  Node: Debugger,  Next: Arbitrary Precision Arithmetic,  Prev: Sample Programs,  Up: Top
+File: gawk.info,  Node: Debugger,  Next: Arbitrary Precision Arithmetic,  Prev: Advanced Features,  Up: Top
 
 14 Debugging `awk' Programs
 ***************************
@@ -29489,8 +29510,8 @@ Index
 * break debugger command:                Breakpoint Control.  (line  11)
 * break statement:                       Break Statement.     (line   6)
 * Brennan, Michael <1>:                  Other Versions.      (line   6)
-* Brennan, Michael <2>:                  Simple Sed.          (line  25)
-* Brennan, Michael <3>:                  Two-way I/O.         (line   6)
+* Brennan, Michael <2>:                  Two-way I/O.         (line   6)
+* Brennan, Michael <3>:                  Simple Sed.          (line  25)
 * Brennan, Michael:                      Delete.              (line  56)
 * Brian Kernighan's awk, extensions <1>: Other Versions.      (line  13)
 * Brian Kernighan's awk, extensions:     BTL.                 (line   6)
@@ -31042,10 +31063,10 @@ Index
 * private variables:                     Library Names.       (line  11)
 * processes, two-way communications with: Two-way I/O.        (line  23)
 * processing data:                       Basic High Level.    (line   6)
-* PROCINFO array <1>:                    Id Program.          (line  15)
-* PROCINFO array <2>:                    Group Functions.     (line   6)
-* PROCINFO array <3>:                    Passwd Functions.    (line   6)
-* PROCINFO array <4>:                    Two-way I/O.         (line 116)
+* PROCINFO array <1>:                    Two-way I/O.         (line 116)
+* PROCINFO array <2>:                    Id Program.          (line  15)
+* PROCINFO array <3>:                    Group Functions.     (line   6)
+* PROCINFO array <4>:                    Passwd Functions.    (line   6)
 * PROCINFO array <5>:                    Time Functions.      (line  46)
 * PROCINFO array <6>:                    Auto-set.            (line 130)
 * PROCINFO array:                        Obsolete.            (line  11)
@@ -31672,507 +31693,507 @@ Node: History47786
 Node: Names50177
 Ref: Names-Footnote-151654
 Node: This Manual51726
-Ref: This Manual-Footnote-156854
-Node: Conventions56954
-Node: Manual History59088
-Ref: Manual History-Footnote-162358
-Ref: Manual History-Footnote-262399
-Node: How To Contribute62473
-Node: Acknowledgments63617
-Node: Getting Started68113
-Node: Running gawk70492
-Node: One-shot71678
-Node: Read Terminal72903
-Ref: Read Terminal-Footnote-174553
-Ref: Read Terminal-Footnote-274829
-Node: Long75000
-Node: Executable Scripts76376
-Ref: Executable Scripts-Footnote-178245
-Ref: Executable Scripts-Footnote-278347
-Node: Comments78894
-Node: Quoting81361
-Node: DOS Quoting85984
-Node: Sample Data Files86659
-Node: Very Simple89691
-Node: Two Rules94290
-Node: More Complex96437
-Ref: More Complex-Footnote-199367
-Node: Statements/Lines99452
-Ref: Statements/Lines-Footnote-1103914
-Node: Other Features104179
-Node: When105107
-Node: Invoking Gawk107254
-Node: Command Line108715
-Node: Options109498
-Ref: Options-Footnote-1124896
-Node: Other Arguments124921
-Node: Naming Standard Input127579
-Node: Environment Variables128673
-Node: AWKPATH Variable129231
-Ref: AWKPATH Variable-Footnote-1131989
-Node: AWKLIBPATH Variable132249
-Node: Other Environment Variables132846
-Node: Exit Status135341
-Node: Include Files136016
-Node: Loading Shared Libraries139585
-Node: Obsolete140810
-Node: Undocumented141507
-Node: Regexp141750
-Node: Regexp Usage143139
-Node: Escape Sequences145165
-Node: Regexp Operators150928
-Ref: Regexp Operators-Footnote-1158308
-Ref: Regexp Operators-Footnote-2158455
-Node: Bracket Expressions158553
-Ref: table-char-classes160443
-Node: GNU Regexp Operators162966
-Node: Case-sensitivity166689
-Ref: Case-sensitivity-Footnote-1169657
-Ref: Case-sensitivity-Footnote-2169892
-Node: Leftmost Longest170000
-Node: Computed Regexps171201
-Node: Reading Files174611
-Node: Records176614
-Ref: Records-Footnote-1185538
-Node: Fields185575
-Ref: Fields-Footnote-1188608
-Node: Nonconstant Fields188694
-Node: Changing Fields190896
-Node: Field Separators196877
-Node: Default Field Splitting199506
-Node: Regexp Field Splitting200623
-Node: Single Character Fields203965
-Node: Command Line Field Separator205024
-Node: Field Splitting Summary208465
-Ref: Field Splitting Summary-Footnote-1211657
-Node: Constant Size211758
-Node: Splitting By Content216342
-Ref: Splitting By Content-Footnote-1220068
-Node: Multiple Line220108
-Ref: Multiple Line-Footnote-1225955
-Node: Getline226134
-Node: Plain Getline228350
-Node: Getline/Variable230439
-Node: Getline/File231580
-Node: Getline/Variable/File232902
-Ref: Getline/Variable/File-Footnote-1234501
-Node: Getline/Pipe234588
-Node: Getline/Variable/Pipe237148
-Node: Getline/Coprocess238255
-Node: Getline/Variable/Coprocess239498
-Node: Getline Notes240212
-Node: Getline Summary242999
-Ref: table-getline-variants243407
-Node: Read Timeout244263
-Ref: Read Timeout-Footnote-1248008
-Node: Command line directories248065
-Node: Printing248695
-Node: Print250326
-Node: Print Examples251663
-Node: Output Separators254447
-Node: OFMT256207
-Node: Printf257565
-Node: Basic Printf258471
-Node: Control Letters260010
-Node: Format Modifiers263822
-Node: Printf Examples269831
-Node: Redirection272546
-Node: Special Files279530
-Node: Special FD280063
-Ref: Special FD-Footnote-1283688
-Node: Special Network283762
-Node: Special Caveats284612
-Node: Close Files And Pipes285408
-Ref: Close Files And Pipes-Footnote-1292431
-Ref: Close Files And Pipes-Footnote-2292579
-Node: Expressions292729
-Node: Values293861
-Node: Constants294537
-Node: Scalar Constants295217
-Ref: Scalar Constants-Footnote-1296076
-Node: Nondecimal-numbers296258
-Node: Regexp Constants299317
-Node: Using Constant Regexps299792
-Node: Variables302847
-Node: Using Variables303502
-Node: Assignment Options305226
-Node: Conversion307098
-Ref: table-locale-affects312474
-Ref: Conversion-Footnote-1313098
-Node: All Operators313207
-Node: Arithmetic Ops313837
-Node: Concatenation316342
-Ref: Concatenation-Footnote-1319135
-Node: Assignment Ops319255
-Ref: table-assign-ops324243
-Node: Increment Ops325651
-Node: Truth Values and Conditions329121
-Node: Truth Values330204
-Node: Typing and Comparison331253
-Node: Variable Typing332042
-Ref: Variable Typing-Footnote-1335939
-Node: Comparison Operators336061
-Ref: table-relational-ops336471
-Node: POSIX String Comparison340020
-Ref: POSIX String Comparison-Footnote-1340976
-Node: Boolean Ops341114
-Ref: Boolean Ops-Footnote-1345192
-Node: Conditional Exp345283
-Node: Function Calls347015
-Node: Precedence350609
-Node: Locales354278
-Node: Patterns and Actions355367
-Node: Pattern Overview356421
-Node: Regexp Patterns358090
-Node: Expression Patterns358633
-Node: Ranges362318
-Node: BEGIN/END365284
-Node: Using BEGIN/END366046
-Ref: Using BEGIN/END-Footnote-1368777
-Node: I/O And BEGIN/END368883
-Node: BEGINFILE/ENDFILE371165
-Node: Empty374069
-Node: Using Shell Variables374385
-Node: Action Overview376670
-Node: Statements379027
-Node: If Statement380881
-Node: While Statement382380
-Node: Do Statement384424
-Node: For Statement385580
-Node: Switch Statement388732
-Node: Break Statement390829
-Node: Continue Statement392819
-Node: Next Statement394612
-Node: Nextfile Statement397002
-Node: Exit Statement399643
-Node: Built-in Variables402059
-Node: User-modified403154
-Ref: User-modified-Footnote-1411509
-Node: Auto-set411571
-Ref: Auto-set-Footnote-1423922
-Ref: Auto-set-Footnote-2424127
-Node: ARGC and ARGV424183
-Node: Arrays428034
-Node: Array Basics429539
-Node: Array Intro430365
-Node: Reference to Elements434683
-Node: Assigning Elements436953
-Node: Array Example437444
-Node: Scanning an Array439176
-Node: Controlling Scanning441490
-Ref: Controlling Scanning-Footnote-1446423
-Node: Delete446739
-Ref: Delete-Footnote-1449504
-Node: Numeric Array Subscripts449561
-Node: Uninitialized Subscripts451744
-Node: Multi-dimensional453372
-Node: Multi-scanning456466
-Node: Arrays of Arrays458057
-Node: Functions462702
-Node: Built-in463524
-Node: Calling Built-in464602
-Node: Numeric Functions466590
-Ref: Numeric Functions-Footnote-1470422
-Ref: Numeric Functions-Footnote-2470779
-Ref: Numeric Functions-Footnote-3470827
-Node: String Functions471096
-Ref: String Functions-Footnote-1494593
-Ref: String Functions-Footnote-2494722
-Ref: String Functions-Footnote-3494970
-Node: Gory Details495057
-Ref: table-sub-escapes496736
-Ref: table-sub-posix-92498090
-Ref: table-sub-proposed499433
-Ref: table-posix-sub500783
-Ref: table-gensub-escapes502329
-Ref: Gory Details-Footnote-1503536
-Ref: Gory Details-Footnote-2503587
-Node: I/O Functions503738
-Ref: I/O Functions-Footnote-1510393
-Node: Time Functions510540
-Ref: Time Functions-Footnote-1521432
-Ref: Time Functions-Footnote-2521500
-Ref: Time Functions-Footnote-3521658
-Ref: Time Functions-Footnote-4521769
-Ref: Time Functions-Footnote-5521881
-Ref: Time Functions-Footnote-6522108
-Node: Bitwise Functions522374
-Ref: table-bitwise-ops522932
-Ref: Bitwise Functions-Footnote-1527153
-Node: Type Functions527337
-Node: I18N Functions527807
-Node: User-defined529434
-Node: Definition Syntax530238
-Ref: Definition Syntax-Footnote-1535148
-Node: Function Example535217
-Node: Function Caveats537811
-Node: Calling A Function538232
-Node: Variable Scope539347
-Node: Pass By Value/Reference541322
-Node: Return Statement544762
-Node: Dynamic Typing547743
-Node: Indirect Calls548478
-Node: Internationalization558163
-Node: I18N and L10N559589
-Node: Explaining gettext560275
-Ref: Explaining gettext-Footnote-1565341
-Ref: Explaining gettext-Footnote-2565525
-Node: Programmer i18n565690
-Node: Translator i18n569890
-Node: String Extraction570683
-Ref: String Extraction-Footnote-1571644
-Node: Printf Ordering571730
-Ref: Printf Ordering-Footnote-1574514
-Node: I18N Portability574578
-Ref: I18N Portability-Footnote-1577027
-Node: I18N Example577090
-Ref: I18N Example-Footnote-1579725
-Node: Gawk I18N579797
-Node: Advanced Features580414
-Node: Nondecimal Data581927
-Node: Array Sorting583510
-Node: Controlling Array Traversal584207
-Node: Array Sorting Functions592445
-Ref: Array Sorting Functions-Footnote-1596119
-Ref: Array Sorting Functions-Footnote-2596212
-Node: Two-way I/O596406
-Ref: Two-way I/O-Footnote-1601838
-Node: TCP/IP Networking601908
-Node: Profiling604752
-Node: Library Functions612206
-Ref: Library Functions-Footnote-1615213
-Node: Library Names615384
-Ref: Library Names-Footnote-1618855
-Ref: Library Names-Footnote-2619075
-Node: General Functions619161
-Node: Strtonum Function620114
-Node: Assert Function623044
-Node: Round Function626370
-Node: Cliff Random Function627913
-Node: Ordinal Functions628929
-Ref: Ordinal Functions-Footnote-1631999
-Ref: Ordinal Functions-Footnote-2632251
-Node: Join Function632460
-Ref: Join Function-Footnote-1634231
-Node: Getlocaltime Function634431
-Node: Data File Management638146
-Node: Filetrans Function638778
-Node: Rewind Function642917
-Node: File Checking644304
-Node: Empty Files645398
-Node: Ignoring Assigns647628
-Node: Getopt Function649181
-Ref: Getopt Function-Footnote-1660485
-Node: Passwd Functions660688
-Ref: Passwd Functions-Footnote-1669663
-Node: Group Functions669751
-Node: Walking Arrays677835
-Node: Sample Programs679404
-Node: Running Examples680069
-Node: Clones680797
-Node: Cut Program682021
-Node: Egrep Program691866
-Ref: Egrep Program-Footnote-1699639
-Node: Id Program699749
-Node: Split Program703365
-Ref: Split Program-Footnote-1706884
-Node: Tee Program707012
-Node: Uniq Program709815
-Node: Wc Program717244
-Ref: Wc Program-Footnote-1721510
-Ref: Wc Program-Footnote-2721710
-Node: Miscellaneous Programs721802
-Node: Dupword Program722990
-Node: Alarm Program725021
-Node: Translate Program729770
-Ref: Translate Program-Footnote-1734157
-Ref: Translate Program-Footnote-2734385
-Node: Labels Program734519
-Ref: Labels Program-Footnote-1737890
-Node: Word Sorting737974
-Node: History Sorting741858
-Node: Extract Program743697
-Ref: Extract Program-Footnote-1751180
-Node: Simple Sed751308
-Node: Igawk Program754370
-Ref: Igawk Program-Footnote-1769527
-Ref: Igawk Program-Footnote-2769728
-Node: Anagram Program769866
-Node: Signature Program772934
-Node: Debugger774034
-Node: Debugging775000
-Node: Debugging Concepts775433
-Node: Debugging Terms777289
-Node: Awk Debugging779886
-Node: Sample Debugging Session780778
-Node: Debugger Invocation781298
-Node: Finding The Bug782627
-Node: List of Debugger Commands789115
-Node: Breakpoint Control790449
-Node: Debugger Execution Control794113
-Node: Viewing And Changing Data797473
-Node: Execution Stack800829
-Node: Debugger Info802296
-Node: Miscellaneous Debugger Commands806277
-Node: Readline Support811722
-Node: Limitations812553
-Node: Arbitrary Precision Arithmetic814805
-Ref: Arbitrary Precision Arithmetic-Footnote-1816447
-Node: General Arithmetic816595
-Node: Floating Point Issues818315
-Node: String Conversion Precision819196
-Ref: String Conversion Precision-Footnote-1820902
-Node: Unexpected Results821011
-Node: POSIX Floating Point Problems823164
-Ref: POSIX Floating Point Problems-Footnote-1826989
-Node: Integer Programming827027
-Node: Floating-point Programming828780
-Ref: Floating-point Programming-Footnote-1835089
-Node: Floating-point Representation835353
-Node: Floating-point Context836518
-Ref: table-ieee-formats837360
-Node: Rounding Mode838744
-Ref: table-rounding-modes839223
-Ref: Rounding Mode-Footnote-1842227
-Node: Gawk and MPFR842408
-Node: Arbitrary Precision Floats843650
-Ref: Arbitrary Precision Floats-Footnote-1846079
-Node: Setting Precision846390
-Node: Setting Rounding Mode849123
-Ref: table-gawk-rounding-modes849527
-Node: Floating-point Constants850707
-Node: Changing Precision852131
-Ref: Changing Precision-Footnote-1853531
-Node: Exact Arithmetic853705
-Node: Arbitrary Precision Integers856813
-Ref: Arbitrary Precision Integers-Footnote-1859813
-Node: Dynamic Extensions859960
-Node: Extension Intro861283
-Node: Plugin License862486
-Node: Extension Design863160
-Node: Old Extension Problems864231
-Ref: Old Extension Problems-Footnote-1865741
-Node: Extension New Mechanism Goals865798
-Ref: Extension New Mechanism Goals-Footnote-1868510
-Node: Extension Other Design Decisions868696
-Node: Extension Mechanism Outline870443
-Ref: load-extension871468
-Ref: load-new-function872946
-Ref: call-new-function873927
-Node: Extension Future Growth875908
-Node: Extension API Description876650
-Node: Extension API Functions Introduction877970
-Node: General Data Types882045
-Ref: General Data Types-Footnote-1887678
-Node: Requesting Values887977
-Ref: table-value-types-returned888708
-Node: Constructor Functions889662
-Node: Registration Functions892658
-Node: Extension Functions893343
-Node: Exit Callback Functions895162
-Node: Extension Version String896405
-Node: Input Parsers897055
-Node: Output Wrappers905636
-Node: Two-way processors910029
-Node: Printing Messages912151
-Ref: Printing Messages-Footnote-1913228
-Node: Updating `ERRNO'913380
-Node: Accessing Parameters914119
-Node: Symbol Table Access915349
-Node: Symbol table by name915861
-Ref: Symbol table by name-Footnote-1918033
-Node: Symbol table by cookie918113
-Ref: Symbol table by cookie-Footnote-1922242
-Node: Cached values922305
-Ref: Cached values-Footnote-1925506
-Node: Array Manipulation925597
-Ref: Array Manipulation-Footnote-1926695
-Node: Array Data Types926734
-Ref: Array Data Types-Footnote-1929456
-Node: Array Functions929548
-Node: Flattening Arrays933314
-Node: Creating Arrays940145
-Node: Extension API Variables944941
-Node: Extension Versioning945577
-Node: Extension API Informational Variables947478
-Node: Extension API Boilerplate948564
-Node: Finding Extensions952398
-Node: Extension Example952945
-Node: Internal File Description953683
-Node: Internal File Ops957371
-Ref: Internal File Ops-Footnote-1968455
-Node: Using Internal File Ops968595
-Ref: Using Internal File Ops-Footnote-1970951
-Node: Extension Samples971217
-Node: Extension Sample File Functions972660
-Node: Extension Sample Fnmatch981029
-Node: Extension Sample Fork982755
-Node: Extension Sample Ord983969
-Node: Extension Sample Readdir984745
-Node: Extension Sample Revout987083
-Node: Extension Sample Rev2way987676
-Node: Extension Sample Read write array988366
-Node: Extension Sample Readfile990249
-Node: Extension Sample API Tests991004
-Node: Extension Sample Time991529
-Node: gawkextlib992838
-Node: Language History995221
-Node: V7/SVR3.1996743
-Node: SVR4999064
-Node: POSIX1000506
-Node: BTL1001514
-Node: POSIX/GNU1002248
-Node: Common Extensions1007783
-Node: Ranges and Locales1008890
-Ref: Ranges and Locales-Footnote-11013508
-Ref: Ranges and Locales-Footnote-21013535
-Ref: Ranges and Locales-Footnote-31013795
-Node: Contributors1014016
-Node: Installation1018312
-Node: Gawk Distribution1019206
-Node: Getting1019690
-Node: Extracting1020516
-Node: Distribution contents1022208
-Node: Unix Installation1027430
-Node: Quick Installation1028047
-Node: Additional Configuration Options1030009
-Node: Configuration Philosophy1031486
-Node: Non-Unix Installation1033828
-Node: PC Installation1034286
-Node: PC Binary Installation1035585
-Node: PC Compiling1037433
-Node: PC Testing1040377
-Node: PC Using1041553
-Node: Cygwin1045738
-Node: MSYS1046738
-Node: VMS Installation1047252
-Node: VMS Compilation1047855
-Ref: VMS Compilation-Footnote-11048862
-Node: VMS Installation Details1048920
-Node: VMS Running1050555
-Node: VMS Old Gawk1052162
-Node: Bugs1052636
-Node: Other Versions1056488
-Node: Notes1061803
-Node: Compatibility Mode1062390
-Node: Additions1063173
-Node: Accessing The Source1064100
-Node: Adding Code1065526
-Node: New Ports1071568
-Node: Derived Files1075703
-Ref: Derived Files-Footnote-11081008
-Ref: Derived Files-Footnote-21081042
-Ref: Derived Files-Footnote-31081642
-Node: Future Extensions1081740
-Node: Basic Concepts1083227
-Node: Basic High Level1083908
-Ref: figure-general-flow1084179
-Ref: figure-process-flow1084778
-Ref: Basic High Level-Footnote-11088007
-Node: Basic Data Typing1088192
-Node: Glossary1091547
-Node: Copying1116858
-Node: GNU Free Documentation License1154415
-Node: Index1179552
+Ref: This Manual-Footnote-157632
+Node: Conventions57732
+Node: Manual History59866
+Ref: Manual History-Footnote-163136
+Ref: Manual History-Footnote-263177
+Node: How To Contribute63251
+Node: Acknowledgments64395
+Node: Getting Started68891
+Node: Running gawk71270
+Node: One-shot72456
+Node: Read Terminal73681
+Ref: Read Terminal-Footnote-175331
+Ref: Read Terminal-Footnote-275607
+Node: Long75778
+Node: Executable Scripts77154
+Ref: Executable Scripts-Footnote-179023
+Ref: Executable Scripts-Footnote-279125
+Node: Comments79672
+Node: Quoting82139
+Node: DOS Quoting86762
+Node: Sample Data Files87437
+Node: Very Simple90469
+Node: Two Rules95068
+Node: More Complex97215
+Ref: More Complex-Footnote-1100145
+Node: Statements/Lines100230
+Ref: Statements/Lines-Footnote-1104692
+Node: Other Features104957
+Node: When105885
+Node: Invoking Gawk108032
+Node: Command Line109493
+Node: Options110276
+Ref: Options-Footnote-1125674
+Node: Other Arguments125699
+Node: Naming Standard Input128357
+Node: Environment Variables129451
+Node: AWKPATH Variable130009
+Ref: AWKPATH Variable-Footnote-1132767
+Node: AWKLIBPATH Variable133027
+Node: Other Environment Variables133624
+Node: Exit Status136119
+Node: Include Files136794
+Node: Loading Shared Libraries140363
+Node: Obsolete141588
+Node: Undocumented142285
+Node: Regexp142528
+Node: Regexp Usage143917
+Node: Escape Sequences145943
+Node: Regexp Operators151706
+Ref: Regexp Operators-Footnote-1159086
+Ref: Regexp Operators-Footnote-2159233
+Node: Bracket Expressions159331
+Ref: table-char-classes161221
+Node: GNU Regexp Operators163744
+Node: Case-sensitivity167467
+Ref: Case-sensitivity-Footnote-1170435
+Ref: Case-sensitivity-Footnote-2170670
+Node: Leftmost Longest170778
+Node: Computed Regexps171979
+Node: Reading Files175389
+Node: Records177392
+Ref: Records-Footnote-1186316
+Node: Fields186353
+Ref: Fields-Footnote-1189386
+Node: Nonconstant Fields189472
+Node: Changing Fields191674
+Node: Field Separators197655
+Node: Default Field Splitting200284
+Node: Regexp Field Splitting201401
+Node: Single Character Fields204743
+Node: Command Line Field Separator205802
+Node: Field Splitting Summary209243
+Ref: Field Splitting Summary-Footnote-1212435
+Node: Constant Size212536
+Node: Splitting By Content217120
+Ref: Splitting By Content-Footnote-1220846
+Node: Multiple Line220886
+Ref: Multiple Line-Footnote-1226733
+Node: Getline226912
+Node: Plain Getline229128
+Node: Getline/Variable231217
+Node: Getline/File232358
+Node: Getline/Variable/File233680
+Ref: Getline/Variable/File-Footnote-1235279
+Node: Getline/Pipe235366
+Node: Getline/Variable/Pipe237926
+Node: Getline/Coprocess239033
+Node: Getline/Variable/Coprocess240276
+Node: Getline Notes240990
+Node: Getline Summary243777
+Ref: table-getline-variants244185
+Node: Read Timeout245041
+Ref: Read Timeout-Footnote-1248786
+Node: Command line directories248843
+Node: Printing249473
+Node: Print251104
+Node: Print Examples252441
+Node: Output Separators255225
+Node: OFMT256985
+Node: Printf258343
+Node: Basic Printf259249
+Node: Control Letters260788
+Node: Format Modifiers264600
+Node: Printf Examples270609
+Node: Redirection273324
+Node: Special Files280308
+Node: Special FD280841
+Ref: Special FD-Footnote-1284466
+Node: Special Network284540
+Node: Special Caveats285390
+Node: Close Files And Pipes286186
+Ref: Close Files And Pipes-Footnote-1293209
+Ref: Close Files And Pipes-Footnote-2293357
+Node: Expressions293507
+Node: Values294639
+Node: Constants295315
+Node: Scalar Constants295995
+Ref: Scalar Constants-Footnote-1296854
+Node: Nondecimal-numbers297036
+Node: Regexp Constants300095
+Node: Using Constant Regexps300570
+Node: Variables303625
+Node: Using Variables304280
+Node: Assignment Options306004
+Node: Conversion307876
+Ref: table-locale-affects313252
+Ref: Conversion-Footnote-1313876
+Node: All Operators313985
+Node: Arithmetic Ops314615
+Node: Concatenation317120
+Ref: Concatenation-Footnote-1319913
+Node: Assignment Ops320033
+Ref: table-assign-ops325021
+Node: Increment Ops326429
+Node: Truth Values and Conditions329899
+Node: Truth Values330982
+Node: Typing and Comparison332031
+Node: Variable Typing332820
+Ref: Variable Typing-Footnote-1336717
+Node: Comparison Operators336839
+Ref: table-relational-ops337249
+Node: POSIX String Comparison340798
+Ref: POSIX String Comparison-Footnote-1341754
+Node: Boolean Ops341892
+Ref: Boolean Ops-Footnote-1345970
+Node: Conditional Exp346061
+Node: Function Calls347793
+Node: Precedence351387
+Node: Locales355056
+Node: Patterns and Actions356145
+Node: Pattern Overview357199
+Node: Regexp Patterns358868
+Node: Expression Patterns359411
+Node: Ranges363096
+Node: BEGIN/END366062
+Node: Using BEGIN/END366824
+Ref: Using BEGIN/END-Footnote-1369555
+Node: I/O And BEGIN/END369661
+Node: BEGINFILE/ENDFILE371943
+Node: Empty374847
+Node: Using Shell Variables375163
+Node: Action Overview377448
+Node: Statements379805
+Node: If Statement381659
+Node: While Statement383158
+Node: Do Statement385202
+Node: For Statement386358
+Node: Switch Statement389510
+Node: Break Statement391607
+Node: Continue Statement393597
+Node: Next Statement395390
+Node: Nextfile Statement397780
+Node: Exit Statement400421
+Node: Built-in Variables402837
+Node: User-modified403932
+Ref: User-modified-Footnote-1412287
+Node: Auto-set412349
+Ref: Auto-set-Footnote-1424700
+Ref: Auto-set-Footnote-2424905
+Node: ARGC and ARGV424961
+Node: Arrays428812
+Node: Array Basics430317
+Node: Array Intro431143
+Node: Reference to Elements435461
+Node: Assigning Elements437731
+Node: Array Example438222
+Node: Scanning an Array439954
+Node: Controlling Scanning442268
+Ref: Controlling Scanning-Footnote-1447201
+Node: Delete447517
+Ref: Delete-Footnote-1450282
+Node: Numeric Array Subscripts450339
+Node: Uninitialized Subscripts452522
+Node: Multi-dimensional454150
+Node: Multi-scanning457244
+Node: Arrays of Arrays458835
+Node: Functions463480
+Node: Built-in464299
+Node: Calling Built-in465377
+Node: Numeric Functions467365
+Ref: Numeric Functions-Footnote-1471197
+Ref: Numeric Functions-Footnote-2471554
+Ref: Numeric Functions-Footnote-3471602
+Node: String Functions471871
+Ref: String Functions-Footnote-1495368
+Ref: String Functions-Footnote-2495497
+Ref: String Functions-Footnote-3495745
+Node: Gory Details495832
+Ref: table-sub-escapes497511
+Ref: table-sub-posix-92498865
+Ref: table-sub-proposed500208
+Ref: table-posix-sub501558
+Ref: table-gensub-escapes503104
+Ref: Gory Details-Footnote-1504311
+Ref: Gory Details-Footnote-2504362
+Node: I/O Functions504513
+Ref: I/O Functions-Footnote-1511168
+Node: Time Functions511315
+Ref: Time Functions-Footnote-1522207
+Ref: Time Functions-Footnote-2522275
+Ref: Time Functions-Footnote-3522433
+Ref: Time Functions-Footnote-4522544
+Ref: Time Functions-Footnote-5522656
+Ref: Time Functions-Footnote-6522883
+Node: Bitwise Functions523149
+Ref: table-bitwise-ops523707
+Ref: Bitwise Functions-Footnote-1527928
+Node: Type Functions528112
+Node: I18N Functions528582
+Node: User-defined530209
+Node: Definition Syntax531013
+Ref: Definition Syntax-Footnote-1535923
+Node: Function Example535992
+Node: Function Caveats538586
+Node: Calling A Function539007
+Node: Variable Scope540122
+Node: Pass By Value/Reference542097
+Node: Return Statement545537
+Node: Dynamic Typing548518
+Node: Indirect Calls549253
+Node: Library Functions558938
+Ref: Library Functions-Footnote-1561937
+Node: Library Names562108
+Ref: Library Names-Footnote-1565579
+Ref: Library Names-Footnote-2565799
+Node: General Functions565885
+Node: Strtonum Function566838
+Node: Assert Function569768
+Node: Round Function573094
+Node: Cliff Random Function574637
+Node: Ordinal Functions575653
+Ref: Ordinal Functions-Footnote-1578723
+Ref: Ordinal Functions-Footnote-2578975
+Node: Join Function579184
+Ref: Join Function-Footnote-1580955
+Node: Getlocaltime Function581155
+Node: Data File Management584870
+Node: Filetrans Function585502
+Node: Rewind Function589641
+Node: File Checking591028
+Node: Empty Files592122
+Node: Ignoring Assigns594352
+Node: Getopt Function595905
+Ref: Getopt Function-Footnote-1607209
+Node: Passwd Functions607412
+Ref: Passwd Functions-Footnote-1616387
+Node: Group Functions616475
+Node: Walking Arrays624559
+Node: Sample Programs626128
+Node: Running Examples626805
+Node: Clones627533
+Node: Cut Program628757
+Node: Egrep Program638602
+Ref: Egrep Program-Footnote-1646375
+Node: Id Program646485
+Node: Split Program650101
+Ref: Split Program-Footnote-1653620
+Node: Tee Program653748
+Node: Uniq Program656551
+Node: Wc Program663980
+Ref: Wc Program-Footnote-1668246
+Ref: Wc Program-Footnote-2668446
+Node: Miscellaneous Programs668538
+Node: Dupword Program669726
+Node: Alarm Program671757
+Node: Translate Program676506
+Ref: Translate Program-Footnote-1680893
+Ref: Translate Program-Footnote-2681121
+Node: Labels Program681255
+Ref: Labels Program-Footnote-1684626
+Node: Word Sorting684710
+Node: History Sorting688594
+Node: Extract Program690433
+Ref: Extract Program-Footnote-1697916
+Node: Simple Sed698044
+Node: Igawk Program701106
+Ref: Igawk Program-Footnote-1716263
+Ref: Igawk Program-Footnote-2716464
+Node: Anagram Program716602
+Node: Signature Program719670
+Node: Internationalization720770
+Node: I18N and L10N722202
+Node: Explaining gettext722888
+Ref: Explaining gettext-Footnote-1727954
+Ref: Explaining gettext-Footnote-2728138
+Node: Programmer i18n728303
+Node: Translator i18n732503
+Node: String Extraction733296
+Ref: String Extraction-Footnote-1734257
+Node: Printf Ordering734343
+Ref: Printf Ordering-Footnote-1737127
+Node: I18N Portability737191
+Ref: I18N Portability-Footnote-1739640
+Node: I18N Example739703
+Ref: I18N Example-Footnote-1742338
+Node: Gawk I18N742410
+Node: Advanced Features743027
+Node: Nondecimal Data744531
+Node: Array Sorting746114
+Node: Controlling Array Traversal746811
+Node: Array Sorting Functions755049
+Ref: Array Sorting Functions-Footnote-1758723
+Ref: Array Sorting Functions-Footnote-2758816
+Node: Two-way I/O759010
+Ref: Two-way I/O-Footnote-1764442
+Node: TCP/IP Networking764512
+Node: Profiling767356
+Node: Debugger774810
+Node: Debugging775778
+Node: Debugging Concepts776211
+Node: Debugging Terms778067
+Node: Awk Debugging780664
+Node: Sample Debugging Session781556
+Node: Debugger Invocation782076
+Node: Finding The Bug783405
+Node: List of Debugger Commands789893
+Node: Breakpoint Control791227
+Node: Debugger Execution Control794891
+Node: Viewing And Changing Data798251
+Node: Execution Stack801607
+Node: Debugger Info803074
+Node: Miscellaneous Debugger Commands807055
+Node: Readline Support812500
+Node: Limitations813331
+Node: Arbitrary Precision Arithmetic815583
+Ref: Arbitrary Precision Arithmetic-Footnote-1817225
+Node: General Arithmetic817373
+Node: Floating Point Issues819093
+Node: String Conversion Precision819974
+Ref: String Conversion Precision-Footnote-1821680
+Node: Unexpected Results821789
+Node: POSIX Floating Point Problems823942
+Ref: POSIX Floating Point Problems-Footnote-1827767
+Node: Integer Programming827805
+Node: Floating-point Programming829558
+Ref: Floating-point Programming-Footnote-1835867
+Node: Floating-point Representation836131
+Node: Floating-point Context837296
+Ref: table-ieee-formats838138
+Node: Rounding Mode839522
+Ref: table-rounding-modes840001
+Ref: Rounding Mode-Footnote-1843005
+Node: Gawk and MPFR843186
+Node: Arbitrary Precision Floats844428
+Ref: Arbitrary Precision Floats-Footnote-1846857
+Node: Setting Precision847168
+Node: Setting Rounding Mode849901
+Ref: table-gawk-rounding-modes850305
+Node: Floating-point Constants851485
+Node: Changing Precision852909
+Ref: Changing Precision-Footnote-1854309
+Node: Exact Arithmetic854483
+Node: Arbitrary Precision Integers857591
+Ref: Arbitrary Precision Integers-Footnote-1860591
+Node: Dynamic Extensions860738
+Node: Extension Intro862061
+Node: Plugin License863264
+Node: Extension Design863938
+Node: Old Extension Problems865009
+Ref: Old Extension Problems-Footnote-1866519
+Node: Extension New Mechanism Goals866576
+Ref: Extension New Mechanism Goals-Footnote-1869288
+Node: Extension Other Design Decisions869474
+Node: Extension Mechanism Outline871221
+Ref: load-extension872246
+Ref: load-new-function873724
+Ref: call-new-function874705
+Node: Extension Future Growth876686
+Node: Extension API Description877428
+Node: Extension API Functions Introduction878748
+Node: General Data Types882823
+Ref: General Data Types-Footnote-1888456
+Node: Requesting Values888755
+Ref: table-value-types-returned889486
+Node: Constructor Functions890440
+Node: Registration Functions893436
+Node: Extension Functions894121
+Node: Exit Callback Functions895940
+Node: Extension Version String897183
+Node: Input Parsers897833
+Node: Output Wrappers906414
+Node: Two-way processors910807
+Node: Printing Messages912929
+Ref: Printing Messages-Footnote-1914006
+Node: Updating `ERRNO'914158
+Node: Accessing Parameters914897
+Node: Symbol Table Access916127
+Node: Symbol table by name916639
+Ref: Symbol table by name-Footnote-1918811
+Node: Symbol table by cookie918891
+Ref: Symbol table by cookie-Footnote-1923020
+Node: Cached values923083
+Ref: Cached values-Footnote-1926284
+Node: Array Manipulation926375
+Ref: Array Manipulation-Footnote-1927473
+Node: Array Data Types927512
+Ref: Array Data Types-Footnote-1930234
+Node: Array Functions930326
+Node: Flattening Arrays934092
+Node: Creating Arrays940923
+Node: Extension API Variables945719
+Node: Extension Versioning946355
+Node: Extension API Informational Variables948256
+Node: Extension API Boilerplate949342
+Node: Finding Extensions953176
+Node: Extension Example953723
+Node: Internal File Description954461
+Node: Internal File Ops958149
+Ref: Internal File Ops-Footnote-1969233
+Node: Using Internal File Ops969373
+Ref: Using Internal File Ops-Footnote-1971729
+Node: Extension Samples971995
+Node: Extension Sample File Functions973438
+Node: Extension Sample Fnmatch981807
+Node: Extension Sample Fork983533
+Node: Extension Sample Ord984747
+Node: Extension Sample Readdir985523
+Node: Extension Sample Revout987861
+Node: Extension Sample Rev2way988454
+Node: Extension Sample Read write array989144
+Node: Extension Sample Readfile991027
+Node: Extension Sample API Tests991782
+Node: Extension Sample Time992307
+Node: gawkextlib993616
+Node: Language History995999
+Node: V7/SVR3.1997521
+Node: SVR4999842
+Node: POSIX1001284
+Node: BTL1002292
+Node: POSIX/GNU1003026
+Node: Common Extensions1008561
+Node: Ranges and Locales1009668
+Ref: Ranges and Locales-Footnote-11014286
+Ref: Ranges and Locales-Footnote-21014313
+Ref: Ranges and Locales-Footnote-31014573
+Node: Contributors1014794
+Node: Installation1019090
+Node: Gawk Distribution1019984
+Node: Getting1020468
+Node: Extracting1021294
+Node: Distribution contents1022986
+Node: Unix Installation1028208
+Node: Quick Installation1028825
+Node: Additional Configuration Options1030787
+Node: Configuration Philosophy1032264
+Node: Non-Unix Installation1034606
+Node: PC Installation1035064
+Node: PC Binary Installation1036363
+Node: PC Compiling1038211
+Node: PC Testing1041155
+Node: PC Using1042331
+Node: Cygwin1046516
+Node: MSYS1047516
+Node: VMS Installation1048030
+Node: VMS Compilation1048633
+Ref: VMS Compilation-Footnote-11049640
+Node: VMS Installation Details1049698
+Node: VMS Running1051333
+Node: VMS Old Gawk1052940
+Node: Bugs1053414
+Node: Other Versions1057266
+Node: Notes1062581
+Node: Compatibility Mode1063168
+Node: Additions1063951
+Node: Accessing The Source1064878
+Node: Adding Code1066304
+Node: New Ports1072346
+Node: Derived Files1076481
+Ref: Derived Files-Footnote-11081786
+Ref: Derived Files-Footnote-21081820
+Ref: Derived Files-Footnote-31082420
+Node: Future Extensions1082518
+Node: Basic Concepts1084005
+Node: Basic High Level1084686
+Ref: figure-general-flow1084957
+Ref: figure-process-flow1085556
+Ref: Basic High Level-Footnote-11088785
+Node: Basic Data Typing1088970
+Node: Glossary1092325
+Node: Copying1117636
+Node: GNU Free Documentation License1155193
+Node: Index1180330
 
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 7584f35..e4ffc22 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -20,7 +20,7 @@
 @c applies to and all the info about who's publishing this edition
 
 @c These apply across the board.
-@set UPDATE-MONTH October, 2012
+@set UPDATE-MONTH November, 2012
 @set VERSION 4.0
 @set PATCHLEVEL 1
 
@@ -294,13 +294,13 @@ particular records in a file and perform operations upon them.
 * Arrays::                         The description and use of arrays. Also
                                    includes array-oriented control statements.
 * Functions::                      Built-in and user-defined functions.
+* Library Functions::              A Library of @command{awk} Functions.
+* Sample Programs::                Many @command{awk} programs with complete
+                                   explanations.
 * Internationalization::           Getting @command{gawk} to speak your
                                    language.
 * Advanced Features::              Stuff for advanced users, specific to
                                    @command{gawk}.
-* Library Functions::              A Library of @command{awk} Functions.
-* Sample Programs::                Many @command{awk} programs with complete
-                                   explanations.
 * Debugger::                       The @code{gawk} debugger.
 * Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
                                    @command{gawk}.
@@ -593,28 +593,6 @@ particular records in a file and perform operations upon them.
                                         runtime.
 * Indirect Calls::                      Choosing the function to call at
                                         runtime.
-* I18N and L10N::                       Internationalization and Localization.
-* Explaining gettext::                  How GNU @code{gettext} works.
-* Programmer i18n::                     Features for the programmer.
-* Translator i18n::                     Features for the translator.
-* String Extraction::                   Extracting marked strings.
-* Printf Ordering::                     Rearranging @code{printf} arguments.
-* I18N Portability::                    @command{awk}-level portability
-                                        issues.
-* I18N Example::                        A simple i18n example.
-* Gawk I18N::                           @command{gawk} is also
-                                        internationalized.
-* Nondecimal Data::                     Allowing nondecimal input data.
-* Array Sorting::                       Facilities for controlling array
-                                        traversal and sorting arrays.
-* Controlling Array Traversal::         How to use PROCINFO["sorted_in"].
-* Array Sorting Functions::             How to use @code{asort()} and
-                                        @code{asorti()}.
-* Two-way I/O::                         Two-way communications with another
-                                        process.
-* TCP/IP Networking::                   Using @command{gawk} for network
-                                        programming.
-* Profiling::                           Profiling your @command{awk} programs.
 * Library Names::                       How to best name private global
                                         variables in library functions.
 * General Functions::                   Functions that are of general use.
@@ -676,6 +654,28 @@ particular records in a file and perform operations upon them.
 * Anagram Program::                     Finding anagrams from a dictionary.
 * Signature Program::                   People do amazing things with too much
                                         time on their hands.
+* I18N and L10N::                       Internationalization and Localization.
+* Explaining gettext::                  How GNU @code{gettext} works.
+* Programmer i18n::                     Features for the programmer.
+* Translator i18n::                     Features for the translator.
+* String Extraction::                   Extracting marked strings.
+* Printf Ordering::                     Rearranging @code{printf} arguments.
+* I18N Portability::                    @command{awk}-level portability
+                                        issues.
+* I18N Example::                        A simple i18n example.
+* Gawk I18N::                           @command{gawk} is also
+                                        internationalized.
+* Nondecimal Data::                     Allowing nondecimal input data.
+* Array Sorting::                       Facilities for controlling array
+                                        traversal and sorting arrays.
+* Controlling Array Traversal::         How to use PROCINFO["sorted_in"].
+* Array Sorting Functions::             How to use @code{asort()} and
+                                        @code{asorti()}.
+* Two-way I/O::                         Two-way communications with another
+                                        process.
+* TCP/IP Networking::                   Using @command{gawk} for network
+                                        programming.
+* Profiling::                           Profiling your @command{awk} programs.
 * Debugging::                           Introduction to @command{gawk}
                                         debugger.
 * Debugging Concepts::                  Debugging in General.
@@ -1251,6 +1251,12 @@ expert should find useful.  In particular, the description of POSIX
 @ref{Sample Programs},
 should be of interest.
 
+This @value{DOCUMENT} is split into several parts, as follows:
+
+Part I describes the @command{awk} language and @command{gawk} program in detail.
+It starts with the basics, and continues through all of the features of @command{awk}.
+It contains the following chapters:
+
 @ref{Getting Started},
 provides the essentials you need to know to begin using @command{awk}.
 
@@ -1294,6 +1300,22 @@ describes the built-in functions @command{awk} and
 @command{gawk} provide, as well as how to define
 your own functions.
 
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
+
+@ref{Library Functions}, which provides a number of functions meant to
+be used from main @command{awk} programs.
+
+@ref{Sample Programs},
+which provides many sample @command{awk} programs.
+
+Reading these two chapters allows you to see @command{awk}
+solving real problems.
+
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
+
 @ref{Internationalization},
 describes special features in @command{gawk} for translating program
 messages into different languages at runtime.
@@ -1305,12 +1327,6 @@ are the abilities to have two-way communications with another process,
 perform TCP/IP networking, and
 profile your @command{awk} programs.
 
address@hidden Functions}, and
address@hidden Programs},
-provide many sample @command{awk} programs.
-Reading them allows you to see @command{awk}
-solving real problems.
-
 @ref{Debugger}, describes the @command{awk} debugger.
 
 @ref{Arbitrary Precision Arithmetic},
@@ -1320,6 +1336,10 @@ describes advanced arithmetic facilities provided by
 @ref{Dynamic Extensions}, describes how to add new variables and
 functions to @command{gawk} by writing extensions in C.
 
+Part IV provides the appendices, the Glossary, and two licenses that cover
+the @command{gawk} source code and this @value{DOCUMENT}, respectively.
+It contains the following appendices:
+
 @ref{Language History},
 describes how the @command{awk} language has evolved since
 its first release to present.  It also describes how @command{gawk}
@@ -1780,12 +1800,14 @@ Nof Ayalon @*
 ISRAEL @*
 March, 2011
 
address@hidden
address@hidden Try this
 @iftex
address@hidden
address@hidden off
address@hidden I@ @ @ @ The @command{awk} Language and @command{gawk}
address@hidden Part I:@* The @command{awk} Language
address@hidden iftex
+
address@hidden
address@hidden
address@hidden Part I:@* The @command{awk} Language
+
 Part I describes the @command{awk} language and @command{gawk} program in detail.
 It starts with the basics, and continues through all of the features of @command{awk}
 and @command{gawk}.  It contains the following chapters:
@@ -1795,6 +1817,9 @@ and @command{gawk}.  It contains the following chapters:
 @ref{Getting Started}.
 
 @item
address@hidden Gawk}.
+
address@hidden
 @ref{Regexp}.
 
 @item
@@ -1814,21 +1839,8 @@ and @command{gawk}.  It contains the following chapters:
 
 @item
 @ref{Functions}.
-
address@hidden
address@hidden
-
address@hidden
address@hidden Features}.
-
address@hidden
address@hidden Gawk}.
 @end itemize
-
address@hidden
address@hidden @thispage@ @ @ @address@hidden @| @|
address@hidden  @| @| @address@hidden@ @ @ @thispage
address@hidden iftex
address@hidden ifdocbook
 @end ignore
 
 @node Getting Started
@@ -4181,31 +4193,6 @@ long-undocumented ``feature'' of Unix @code{awk}.
 
 @end ignore
 
address@hidden
address@hidden Try this
address@hidden
address@hidden
address@hidden off
address@hidden II@ @ @ Using @command{awk} and @command{gawk}
-Part II shows how to use @command{awk} and @command{gawk} for problem solving.
-There is lots of code here for you to read and learn from.
-It contains the following chapters:
-
address@hidden @bullet
address@hidden
address@hidden Functions}.
-
address@hidden
address@hidden Programs}.
-
address@hidden itemize
-
address@hidden
address@hidden @thispage@ @ @ @address@hidden @| @|
address@hidden  @| @| @address@hidden@ @ @ @thispage
address@hidden iftex
address@hidden ignore
-
 @node Regexp
 @chapter Regular Expressions
 @cindex regexp, See regular expressions
@@ -17932,7764 +17919,7816 @@ for (i = 1; i <= n; i++)
 
 @c ENDOFRANGE funcud
 
address@hidden Internationalization
address@hidden Internationalization with @command{gawk}
address@hidden
address@hidden Part II:@* Problem Solving With @command{awk}
address@hidden iftex
 
-Once upon a time, computer makers
-wrote software that worked only in English.
-Eventually, hardware and software vendors noticed that if their
-systems worked in the native languages of non-English-speaking
-countries, they were able to sell more systems.
-As a result, internationalization and localization
-of programs and software systems became a common practice.
address@hidden
address@hidden
address@hidden Part II:@* Problem Solving With @command{awk}
 
address@hidden STARTOFRANGE inloc
address@hidden internationalization, localization
address@hidden @command{gawk}, internationalization and, See internationalization
address@hidden internationalization, localization, @command{gawk} and
-For many years, the ability to provide internationalization
-was largely restricted to programs written in C and C++.
-This @value{CHAPTER} describes the underlying library @command{gawk}
-uses for internationalization, as well as how
address@hidden makes internationalization
-features available at the @command{awk} program level.
-Having internationalization available at the @command{awk} level
-gives software developers additional flexibility---they are no
-longer forced to write in C or C++ when internationalization is
-a requirement.
+Part II shows how to use @command{awk} and @command{gawk} for problem solving.
+There is lots of code here for you to read and learn from.
+It contains the following chapters:
 
address@hidden
-* I18N and L10N::               Internationalization and Localization.
-* Explaining gettext::          How GNU @code{gettext} works.
-* Programmer i18n::             Features for the programmer.
-* Translator i18n::             Features for the translator.
-* I18N Example::                A simple i18n example.
-* Gawk I18N::                   @command{gawk} is also internationalized.
address@hidden menu
address@hidden @bullet
address@hidden
address@hidden Functions}.
 
address@hidden I18N and L10N
address@hidden Internationalization and Localization
address@hidden
address@hidden Programs}.
address@hidden itemize
address@hidden ifdocbook
address@hidden ignore
 
address@hidden internationalization
address@hidden localization, See address@hidden localization
address@hidden localization
address@hidden means writing (or modifying) a program once,
-in such a way that it can use multiple languages without requiring
-further source-code changes.
address@hidden means providing the data necessary for an
-internationalized program to work in a particular language.
-Most typically, these terms refer to features such as the language
-used for printing error messages, the language used to read
-responses, and information related to how numerical and
-monetary values are printed and read.
address@hidden Library Functions
address@hidden A Library of @command{awk} Functions
address@hidden STARTOFRANGE libf
address@hidden libraries of @command{awk} functions
address@hidden STARTOFRANGE flib
address@hidden functions, library
address@hidden STARTOFRANGE fudlib
address@hidden functions, user-defined, library of
 
address@hidden Explaining gettext
address@hidden GNU @code{gettext}
address@hidden, describes how to write
+your own @command{awk} functions.  Writing functions is important, because
+it allows you to encapsulate algorithms and program tasks in a single
+place.  It simplifies programming, making program development more
+manageable, and making programs more readable.
 
address@hidden internationalizing a program
address@hidden STARTOFRANGE gettex
address@hidden @code{gettext} library
-The facilities in GNU @code{gettext} focus on messages; strings printed
-by a program, either directly or via formatting with @code{printf} or
address@hidden()address@hidden some operating systems, the @command{gawk}
-port doesn't support GNU @code{gettext}.
-Therefore, these features are not available
-if you are using one of those operating systems. Sorry.}
+One valuable way to learn a new programming language is to @emph{read}
+programs in that language.  To that end, this @value{CHAPTER}
+and @ref{Sample Programs},
+provide a good-sized body of code for you to read,
+and hopefully, to learn from.
 
address@hidden portability, @code{gettext} library and
-When using GNU @code{gettext}, each application has its own
address@hidden domain}.  This is a unique name, such as @samp{kpilot} or @samp{gawk},
-that identifies the application.
-A complete application may have multiple components---programs written
-in C or C++, as well as scripts written in @command{sh} or @command{awk}.
-All of the components use the same text domain.
address@hidden 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
+This @value{CHAPTER} presents a library of useful @command{awk} functions.
+Many of the sample programs presented later in this @value{DOCUMENT}
+use these functions.
+The functions are presented here in a progression from simple to complex.
 
-To make the discussion concrete, assume we're writing an application
-named @command{guide}.  Internationalization consists of the
-following steps, in this order:
address@hidden Texinfo
address@hidden Program},
+presents a program that you can use to extract the source code for
+these example library functions and programs from the Texinfo source
+for this @value{DOCUMENT}.
+(This has already been done as part of the @command{gawk} distribution.)
 
address@hidden
address@hidden
-The programmer goes
-through the source for all of @command{guide}'s components
-and marks each string that is a candidate for translation.
-For example, @code{"`-F': option required"} is a good candidate for translation.
-A table with strings of option names is not (e.g., @command{gawk}'s
address@hidden option should remain the same, no matter what the local
-language).
+If you have written one or more useful, general-purpose @command{awk} functions
+and would like to contribute them to the @command{awk} user community, see
address@hidden To Contribute}, for more information.
 
address@hidden @code{textdomain()} function (C library)
address@hidden portability, example programs
+The programs in this @value{CHAPTER} and in
address@hidden Programs},
+freely use features that are @command{gawk}-specific.
+Rewriting these programs for different implementations of @command{awk}
+is pretty straightforward.
+
address@hidden @bullet
 @item
-The programmer indicates the application's text domain
-(@code{"guide"}) to the @code{gettext} library,
-by calling the @code{textdomain()} function.
+Diagnostic error messages are sent to @file{/dev/stderr}.
+Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"} if your system
+does not have a @file{/dev/stderr}, or if you cannot use @command{gawk}.
 
address@hidden @code{.pot} files
address@hidden files, @code{.pot}
address@hidden portable object template files
address@hidden files, portable object template
 @item
-Messages from the application are extracted from the source code and
-collected into a portable object template file (@file{guide.pot}),
-which lists the strings and their translations.
-The translations are initially empty.
-The original (usually English) messages serve as the key for
-lookup of the translations.
+A number of programs use @code{nextfile}
+(@pxref{Nextfile Statement})
+to skip any remaining input in the input file.
 
address@hidden @code{.po} files
address@hidden files, @code{.po}
address@hidden portable object files
address@hidden files, portable object
 @item
-For each language with a translator, @file{guide.pot}
-is copied to a portable object file (@code{.po})
-and translations are created and shipped with the application.
-For example, there might be a @file{fr.po} for a French translation.
address@hidden 12/2000: Thanks to Nelson Beebe for pointing out the output issue.
address@hidden case sensitivity, example programs
address@hidden @code{IGNORECASE} variable, in example programs
+Finally, some of the programs choose to ignore upper- and lowercase
+distinctions in their input. They do so by assigning one to @code{IGNORECASE}.
+You can achieve almost the same address@hidden effects are
+not identical.  Output of the transformed
+record will be in all lowercase, while @code{IGNORECASE} preserves the original
+contents of the input record.} by adding the following rule to the
+beginning of the program:
 
address@hidden @code{.mo} files
address@hidden files, @code{.mo}
address@hidden message object files
address@hidden files, message object
address@hidden
-Each language's @file{.po} file is converted into a binary
-message object (@file{.mo}) file.
-A message object file contains the original messages and their
-translations in a binary format that allows fast lookup of translations
-at runtime.
address@hidden
+# ignore case
address@hidden $0 = tolower($0) @}
address@hidden example
 
address@hidden
-When @command{guide} is built and installed, the binary translation files
-are installed in a standard place.
address@hidden
+Also, verify that all regexp and string constants used in
+comparisons use only lowercase letters.
address@hidden itemize
 
address@hidden @code{bindtextdomain()} function (C library)
address@hidden
-For testing and development, it is possible to tell @code{gettext}
-to use @file{.mo} files in a different directory than the standard
-one by using the @code{bindtextdomain()} function.
address@hidden
+* Library Names::               How to best name private global variables in
+                                library functions.
+* General Functions::           Functions that are of general use.
+* Data File Management::        Functions for managing command-line data
+                                files.
+* Getopt Function::             A function for processing command-line
+                                arguments.
+* Passwd Functions::            Functions for getting user information.
+* Group Functions::             Functions for getting group information.
+* Walking Arrays::              A function to walk arrays of arrays.
address@hidden menu
 
address@hidden @code{.mo} files, specifying directory of
address@hidden files, @code{.mo}, specifying directory of
address@hidden message object files, specifying directory of
address@hidden files, message object, specifying directory of
address@hidden
-At runtime, @command{guide} looks up each string via a call
-to @code{gettext()}.  The returned string is the translated string
-if available, or the original string if not.
address@hidden Library Names
address@hidden Naming Library Function Global Variables
 
address@hidden
-If necessary, it is possible to access messages from a different
-text domain than the one belonging to the application, without
-having to switch the application's default text domain back
-and forth.
address@hidden enumerate
address@hidden names, arrays/variables
address@hidden names, functions
address@hidden namespace issues
address@hidden @command{awk} programs, documenting
address@hidden documentation, of @command{awk} programs
+Due to the way the @command{awk} language evolved, variables are either
address@hidden (usable by the entire program) or @dfn{local} (usable just by
+a specific function).  There is no intermediate state analogous to
address@hidden variables in C.
 
address@hidden @code{gettext()} function (C library)
-In C (or C++), the string marking and dynamic translation lookup
-are accomplished by wrapping each string in a call to @code{gettext()}:
address@hidden variables, global, for library functions
address@hidden private variables
address@hidden variables, private
+Library functions often need to have global variables that they can use to
+preserve state information between calls to the function---for example,
address@hidden()}'s variable @code{_opti}
+(@pxref{Getopt Function}).
+Such variables are called @dfn{private}, since the only functions that need to
+use them are the ones in the library.
 
address@hidden
-printf("%s", gettext("Don't Panic!\n"));
address@hidden example
+When writing a library function, you should try to choose names for your
+private variables that will not conflict with any variables used by
+either another library function or a user's main program.  For example, a
+name like @code{i} or @code{j} is not a good choice, because user programs
+often use variable names like these for their own purposes.
 
-The tools that extract messages from source code pull out all
-strings enclosed in calls to @code{gettext()}.
address@hidden programming conventions, private variable names
+The example programs shown in this @value{CHAPTER} all start the names of their
+private variables with an underscore (@samp{_}).  Users generally don't use
+leading underscores in their variable names, so this convention immediately
+decreases the chances that the variable name will be accidentally shared
+with the user's program.
 
address@hidden @code{_} (underscore), @code{_} C macro
address@hidden underscore (@code{_}), @code{_} C macro
-The GNU @code{gettext} developers, recognizing that typing
address@hidden(@dots{})} over and over again is both painful and ugly to look
-at, use the macro @samp{_} (an underscore) to make things easier:
address@hidden @code{_} (underscore), in names of private variables
address@hidden underscore (@code{_}), in names of private variables
+In addition, several of the library functions use a prefix that helps
+indicate what function or set of functions use the variables---for example,
address@hidden in the user database routines
+(@pxref{Passwd Functions}).
+This convention is recommended, since it even further decreases the
+chance of inadvertent conflict among variable names.  Note that this
+convention is used equally well for variable names and for private
+function address@hidden all the library routines could have
+been rewritten to use this convention, this was not done, in order to
+show how our own @command{awk} programming style has evolved and to
+provide some basis for this discussion.}
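
As a minimal sketch of the prefix convention (the function @code{next_id()} and the variables @code{_seq_inited} and @code{_seq_count} are hypothetical names chosen for illustration, not part of the library), a function might keep its state between calls like this:

```awk
# next_id: return a new sequence number on each call
# (hypothetical example; not part of the gawk library)

function next_id(    result)      # result is local
{
    if (! _seq_inited) {          # private global: one-time setup flag
        _seq_count = 0            # private global: state kept between calls
        _seq_inited = 1
    }
    result = ++_seq_count
    return result
}

BEGIN {
    i = 10                        # the main program's own variable
    a = next_id()
    b = next_id()
    print a, b                    # prints "1 2"
    print i                       # prints "10"; untouched by the library
}
```

Because the private names carry both a leading underscore and the @code{seq} prefix, they are unlikely to collide with names such as @code{i} in the main program.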
 
address@hidden
-/* In the standard header file: */
-#define _(str) gettext(str)
+As a final note on variable naming, if a function makes global variables
+available for use by a main program, it is a good convention to start that
+variable's name with a capital letter---for
+example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
+(@pxref{Getopt Function}).
+The leading capital letter indicates that it is global, while the fact that
+the variable name is not all capital letters indicates that the variable is
+not one of @command{awk}'s built-in variables, such as @code{FS}.
 
-/* In the program text: */
-printf("%s", _("Don't Panic!\n"));
address@hidden @code{--dump-variables} option
+It is also important that @emph{all} variables in library
+functions that do not need to save state are, in fact, declared
address@hidden@command{gawk}'s @option{--dump-variables} command-line
+option is useful for verifying this.} If this is not done, the variable
+could accidentally be used in the user's program, leading to bugs that
+are very difficult to track down:
+
address@hidden
+function lib_func(x, y,    l1, l2)
address@hidden
+    @dots{}
+    @var{use variable} some_var   # some_var should be local
+    @dots{}                     # but is not by oversight
address@hidden
 @end example
 
address@hidden internationalization, localization, locale categories
address@hidden @code{gettext} library, locale categories
address@hidden locale categories
address@hidden
-This reduces the typing overhead to just three extra characters per string
-and is considerably easier to read as well.
address@hidden arrays, associative, library functions and
address@hidden libraries of @command{awk} functions, associative arrays and
address@hidden functions, library, associative arrays and
address@hidden Tcl
+A different convention, common in the Tcl community, is to use a single
+associative array to hold the values needed by the library function(s), or
+``package.''  This significantly decreases the number of actual global names
+in use.  For example, the functions described in
address@hidden Functions},
+might have used array elements @address@hidden"inited"]}}, @address@hidden"total"]}},
address@hidden@w{PW_data["count"]}}, and @address@hidden"awklib"]}}, instead of
address@hidden@w{_pw_inited}}, @address@hidden, @address@hidden,
+and @address@hidden
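
A minimal sketch of this array-based convention follows. The names @code{Stats_data}, @code{stats_add()}, and @code{stats_mean()} are hypothetical and do not appear in the library; they only illustrate how one global array can replace several private variables:

```awk
# Sketch of the single-array ("package") convention.
# Stats_data, stats_add(), and stats_mean() are hypothetical names.

function stats_add(value)
{
    if (! Stats_data["inited"]) {     # one-time initialization
        Stats_data["total"] = 0
        Stats_data["count"] = 0
        Stats_data["inited"] = 1
    }
    Stats_data["total"] += value
    Stats_data["count"]++
}

function stats_mean()
{
    if (Stats_data["count"] == 0)     # nothing added yet
        return 0
    return Stats_data["total"] / Stats_data["count"]
}

BEGIN {
    stats_add(2)
    stats_add(4)
    stats_add(9)
    print stats_mean()                # prints "5"
}
```

Only the single name @code{Stats_data} is added to the global namespace, however many values the functions need to remember.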
 
-There are locale @dfn{categories}
-for different types of locale-related information.
-The defined locale categories that @code{gettext} knows about are:
+The conventions presented in this @value{SECTION} are exactly
+that: conventions. You are not required to write your programs this
+way---we merely recommend that you do so.
 
address@hidden @code
address@hidden @code{LC_MESSAGES} locale category
address@hidden LC_MESSAGES
-Text messages.  This is the default category for @code{gettext}
-operations, but it is possible to supply a different one explicitly,
-if necessary.  (It is almost never necessary to supply a different category.)
address@hidden General Functions
address@hidden General Programming
 
address@hidden sorting characters in different languages
address@hidden @code{LC_COLLATE} locale category
address@hidden LC_COLLATE
-Text-collation information; i.e., how different characters
-and/or groups of characters sort in a given language.
+This @value{SECTION} presents a number of functions that are of general
+programming use.
 
address@hidden @code{LC_CTYPE} locale category
address@hidden LC_CTYPE
-Character-type information (alphabetic, digit, upper- or lowercase, and
-so on).
-This information is accessed via the
-POSIX character classes in regular expressions,
-such as @code{/[[:alnum:]]/}
-(@pxref{Regexp Operators}).
address@hidden
+* Strtonum Function::           A replacement for the built-in
+                                @code{strtonum()} function.
+* Assert Function::             A function for assertions in @command{awk}
+                                programs.
+* Round Function::              A function for rounding if @code{sprintf()}
+                                does not do it correctly.
+* Cliff Random Function::       The Cliff Random Number Generator.
+* Ordinal Functions::           Functions for using characters as numbers and
+                                vice versa.
+* Join Function::               A function to join an array into a string.
+* Getlocaltime Function::       A function to get formatted times.
address@hidden menu
 
address@hidden monetary information, localization
address@hidden currency symbols, localization
address@hidden @code{LC_MONETARY} locale category
address@hidden LC_MONETARY
-Monetary information, such as the currency symbol, and whether the
-symbol goes before or after a number.
address@hidden Strtonum Function
address@hidden Converting Strings To Numbers
 
address@hidden @code{LC_NUMERIC} locale category
address@hidden LC_NUMERIC
-Numeric information, such as which characters to use for the decimal
-point and the thousands address@hidden
-use a comma every three decimal places and a period for the decimal
-point, while many Europeans do exactly the opposite:
-1,234.56 versus 1.234,56.}
+The @code{strtonum()} function (@pxref{String Functions})
+is a @command{gawk} extension.  The following function
+provides an implementation for other versions of @command{awk}:
 
address@hidden @code{LC_RESPONSE} locale category
address@hidden LC_RESPONSE
-Response information, such as how ``yes'' and ``no'' appear in the
-local language, and possibly other information as well.
address@hidden
address@hidden file eg/lib/strtonum.awk
+# mystrtonum --- convert string to number
 
address@hidden time, localization and
address@hidden dates, information related address@hidden localization
address@hidden @code{LC_TIME} locale category
address@hidden LC_TIME
-Time- and date-related information, such as 12- or 24-hour clock, month printed
-before or after the day in a date, local month abbreviations, and so on.
-
address@hidden @code{LC_ALL} locale category
address@hidden LC_ALL
-All of the above.  (Not too useful in the context of @code{gettext}.)
address@hidden table
address@hidden ENDOFRANGE gettex
-
address@hidden Programmer i18n
address@hidden Internationalizing @command{awk} Programs
address@hidden STARTOFRANGE inap
address@hidden @command{awk} programs, internationalizing
-
address@hidden provides the following variables and functions for
-internationalization:
address@hidden endfile
address@hidden
address@hidden file eg/lib/strtonum.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# February, 2004
 
address@hidden @code
address@hidden @code{TEXTDOMAIN} variable
address@hidden TEXTDOMAIN
-This variable indicates the application's text domain.
-For compatibility with GNU @code{gettext}, the default
-value is @code{"messages"}.
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/strtonum.awk
+function mystrtonum(str,        ret, chars, n, i, k, c)
address@hidden
+    if (str ~ /^0[0-7]*$/) @{
+        # octal
+        n = length(str)
+        ret = 0
+        for (i = 1; i <= n; i++) @{
+            c = substr(str, i, 1)
+            if ((k = index("01234567", c)) > 0)
+                k-- # adjust for 1-basing in awk
 
address@hidden internationalization, localization, marked strings
address@hidden strings, for localization
address@hidden _"your message here"
-String constants marked with a leading underscore
-are candidates for translation at runtime.
-String constants without a leading underscore are not translated.
+            ret = ret * 8 + k
+        @}
+    @} else if (str ~ /^0[xX][[:xdigit:]]+$/) @{
+        # hexadecimal
+        str = substr(str, 3)    # lop off leading 0x
+        n = length(str)
+        ret = 0
+        for (i = 1; i <= n; i++) @{
+            c = substr(str, i, 1)
+            c = tolower(c)
+            if ((k = index("0123456789", c)) > 0)
+                k-- # adjust for 1-basing in awk
+            else if ((k = index("abcdef", c)) > 0)
+                k += 9
 
address@hidden @code{dcgettext()} function (@command{gawk})
address@hidden dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @address@hidden)
-Return the translation of @var{string} in
-text domain @var{domain} for locale category @var{category}.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
+            ret = ret * 16 + k
+        @}
+    @} else if (str ~ \
+  /^[-+]?([0-9]+([.][0-9]*([Ee][0-9]+)?)?|([.][0-9]+([Ee][-+]?[0-9]+)?))$/) @{
+        # decimal number, possibly floating point
+        ret = str + 0
+    @} else
+        ret = "NOT-A-NUMBER"
 
-If you supply a value for @var{category}, it must be a string equal to
-one of the known locale categories described in
address@hidden
-the previous @value{SECTION}.
address@hidden ifnotinfo
address@hidden
address@hidden gettext}.
address@hidden ifinfo
-You must also supply a text domain.  Use @code{TEXTDOMAIN} if
-you want to use the current domain.
+    return ret
address@hidden
 
address@hidden CAUTION
-The order of arguments to the @command{awk} version
-of the @code{dcgettext()} function is purposely different from the order for
-the C version.  The @command{awk} version's order was
-chosen to be simple and to allow for reasonable @command{awk}-style
-default arguments.
address@hidden quotation
+# BEGIN @{     # gawk test harness
+#     a[1] = "25"
+#     a[2] = ".31"
+#     a[3] = "0123"
+#     a[4] = "0xdeadBEEF"
+#     a[5] = "123.45"
+#     a[6] = "1.e3"
+#     a[7] = "1.32"
+#     a[8] = "1.32E2"
+# 
+#     for (i = 1; i in a; i++)
+#         print a[i], strtonum(a[i]), mystrtonum(a[i])
+# @}
address@hidden endfile
address@hidden example
 
address@hidden @code{dcngettext()} function (@command{gawk})
address@hidden dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @address@hidden)
-Return the plural form used for @var{number} of the
-translation of @var{string1} and @var{string2} in text domain
address@hidden for locale category @var{category}. @var{string1} is the
-English singular variant of a message, and @var{string2} the English plural
-variant of the same message.
-The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
-The default value for @var{category} is @code{"LC_MESSAGES"}.
+The function first looks for C-style octal numbers (base 8).
+If the input string matches a regular expression describing octal
+numbers, then @code{mystrtonum()} loops through each character in the
+string.  It sets @code{k} to the index in @code{"01234567"} of the current
+octal digit.  Since the return value is one-based, the @samp{k--}
+adjusts @code{k} so it can be used in computing the return value.
 
-The same remarks about argument order as for the @code{dcgettext()} function apply.
+Similar logic applies to the code that checks for and converts a
+hexadecimal value, which starts with @samp{0x} or @samp{0X}.
+The use of @code{tolower()} simplifies the computation for finding
+the correct numeric value for each hexadecimal digit.
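
The digit lookup can also be sketched on its own. The helper @code{hexdigit()} below is a hypothetical variation, not code from @file{strtonum.awk}: it assumes a single-character argument and searches one combined digit string instead of two:

```awk
# hexdigit: return the value (0-15) of one hexadecimal digit,
# or -1 if the character is not a hexadecimal digit
# (illustrative helper; not part of strtonum.awk)

function hexdigit(c,    k)
{
    # index() is one-based and returns 0 when c is not found,
    # so subtracting one yields 0-15 for digits and -1 otherwise
    k = index("0123456789abcdef", tolower(c))
    return k - 1
}

BEGIN {
    print hexdigit("7"), hexdigit("F"), hexdigit("g")   # prints "7 15 -1"
}
```

Lowercasing first means that @samp{F} and @samp{f} find the same position in the digit string, which is the simplification that @code{tolower()} provides.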
 
address@hidden @code{.mo} files, specifying directory of
address@hidden files, @code{.mo}, specifying directory of
address@hidden message object files, specifying directory of
address@hidden files, message object, specifying directory of
address@hidden @code{bindtextdomain()} function (@command{gawk})
address@hidden bindtextdomain(@var{directory} @r{[}, @address@hidden)
-Change the directory in which
address@hidden looks for @file{.mo} files, in case they
-will not or cannot be placed in the standard locations
-(e.g., during testing).
-Return the directory in which @var{domain} is ``bound.''
+Finally, if the string matches the (rather complicated) regexp for a
+regular decimal integer or floating-point number, the computation
address@hidden = str + 0} lets @command{awk} convert the value to a
+number.
 
-The default @var{domain} is the value of @code{TEXTDOMAIN}.
-If @var{directory} is the null string (@code{""}), then
address@hidden()} returns the current binding for the
-given @var{domain}.
address@hidden table
+A commented-out test program is included, so that the function can
+be tested with @command{gawk} and the results compared to the built-in
address@hidden()} function.
 
-To use these facilities in your @command{awk} program, follow the steps
-outlined in
address@hidden
-the previous @value{SECTION},
address@hidden ifnotinfo
address@hidden
address@hidden gettext},
address@hidden ifinfo
-like so:
address@hidden Assert Function
address@hidden Assertions
 
address@hidden
address@hidden @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
address@hidden @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
address@hidden
-Set the variable @code{TEXTDOMAIN} to the text domain of
-your program.  This is best done in a @code{BEGIN} rule
-(@pxref{BEGIN/END}),
-or it can also be done via the @option{-v} command-line
-option (@pxref{Options}):
address@hidden STARTOFRANGE asse
address@hidden assertions
address@hidden STARTOFRANGE assef
address@hidden @code{assert()} function (C library)
address@hidden STARTOFRANGE libfass
address@hidden libraries of @command{awk} functions, assertions
address@hidden STARTOFRANGE flibass
address@hidden functions, library, assertions
address@hidden @command{awk} programs, lengthy, assertions
+When writing large programs, it is often useful to know
+that a condition or set of conditions is true.  Before proceeding with a
+particular computation, you make a statement about what you believe to be
+the case.  Such a statement is known as an
address@hidden  The C language provides an @code{<assert.h>} header file
+and corresponding @code{assert()} macro that the programmer can use to make
+assertions.  If an assertion fails, the @code{assert()} macro arranges to
+print a diagnostic message describing the condition that should have
+been true but was not, and then it kills the program.  In C, using
address@hidden()} looks like this:
 
 @example
-BEGIN @{
-    TEXTDOMAIN = "guide"
-    @dots{}
+#include <assert.h>
+
+int myfunc(int a, double b)
address@hidden
+     assert(a <= 5 && b >= 17.1);
+     @dots{}
 @}
 @end example
 
address@hidden @code{_} (underscore), translatable string
address@hidden underscore (@code{_}), translatable string
address@hidden
-Mark all translatable strings with a leading underscore (@samp{_})
-character.  It @emph{must} be adjacent to the opening
-quote of the string.  For example:
+If the assertion fails, the program prints a message similar to this:
 
 @example
-print _"hello, world"
-x = _"you goofed"
-printf(_"Number of users is %d\n", nusers)
+prog.c:5: assertion failed: a <= 5 && b >= 17.1
 @end example
 
address@hidden
-If you are creating strings dynamically, you can
-still translate them, using the @code{dcgettext()}
-built-in function:
address@hidden @code{assert()} user-defined function
+The C language makes it possible to turn the condition into a string for use
+in printing the diagnostic message.  This is not possible in @command{awk}, so
+this @code{assert()} function also requires a string version of the condition
+that is being tested.
+Following is the function:
 
 @example
-message = nusers " users logged in"
-message = dcgettext(message, "adminprog")
-print message
address@hidden example
address@hidden file eg/lib/assert.awk
+# assert --- assert that a condition is true. Otherwise exit.
 
-Here, the call to @code{dcgettext()} supplies a different
-text domain (@code{"adminprog"}) in which to find the
-message, but it uses the default @code{"LC_MESSAGES"} category.
address@hidden endfile
address@hidden
address@hidden file eg/lib/assert.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May, 1993
 
address@hidden @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
address@hidden
-During development, you might want to put the @file{.mo}
-file in a private directory for testing.  This is done
-with the @code{bindtextdomain()} built-in function:
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/assert.awk
+function assert(condition, string)
address@hidden
+    if (! condition) @{
+        printf("%s:%d: assertion failed: %s\n",
+            FILENAME, FNR, string) > "/dev/stderr"
+        _assert_exit = 1
+        exit 1
+    @}
address@hidden
 
address@hidden
-BEGIN @{
-   TEXTDOMAIN = "guide"   # our text domain
-   if (Testing) @{
-       # where to find our files
-       bindtextdomain("testdir")
-       # joe is in charge of adminprog
-       bindtextdomain("../joe/testdir", "adminprog")
-   @}
-   @dots{}
address@hidden
+END @{
+    if (_assert_exit)
+        exit 1
 @}
address@hidden group
address@hidden endfile
 @end example
 
address@hidden enumerate
-
address@hidden Example},
-for an example program showing the steps to create
-and use translations from @command{awk}.
-
address@hidden Translator i18n
address@hidden Translating @command{awk} Programs
-
address@hidden @code{.po} files
address@hidden files, @code{.po}
address@hidden portable object files
address@hidden files, portable object
-Once a program's translatable strings have been marked, they must
-be extracted to create the initial @file{.po} file.
-As part of translation, it is often helpful to rearrange the order
-in which arguments to @code{printf} are output.
-
address@hidden's @option{--gen-pot} command-line option extracts
-the messages and is discussed next.
-After that, @code{printf}'s ability to
-rearrange the order for @code{printf} arguments at runtime
-is covered.
+The @code{assert()} function tests the @code{condition} parameter. If it
+is false, it prints a message to standard error, using the @code{string}
+parameter to describe the failed condition.  It then sets the variable
+@code{_assert_exit} to one and executes the @code{exit} statement.
+The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
+rule finds @code{_assert_exit} to be true, it exits immediately.
 
address@hidden
-* String Extraction::           Extracting marked strings.
-* Printf Ordering::             Rearranging @code{printf} arguments.
-* I18N Portability::            @command{awk}-level portability issues.
address@hidden menu
+The purpose of the test in the @code{END} rule is to
+keep any other @code{END} rules from running.  When an assertion fails, the
+program should exit immediately.
+If no assertions fail, then @code{_assert_exit} is still
+false when the @code{END} rule is run normally, and the rest of the
+program's @code{END} rules execute.
+For all of this to work correctly, @file{assert.awk} must be the
+first source file read by @command{awk}.
+The function can be used in a program in the following way:
 
address@hidden String Extraction
address@hidden Extracting Marked Strings
address@hidden strings, extracting
address@hidden marked address@hidden extracting
address@hidden @code{--gen-pot} option
address@hidden command-line options, string extraction
address@hidden string extraction (internationalization)
address@hidden marked string extraction (internationalization)
address@hidden extraction, of marked strings (internationalization)
address@hidden
+function myfunc(a, b)
address@hidden
+     assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
+     @dots{}
address@hidden
address@hidden example
 
address@hidden @code{--gen-pot} option
-Once your @command{awk} program is working, and all the strings have
-been marked and you've set (and perhaps bound) the text domain,
-it is time to produce translations.
-First, use the @option{--gen-pot} command-line option to create
-the initial @file{.pot} file:
address@hidden
+If the assertion fails, you see a message similar to the following:
 
 @example
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+mydata:1357: assertion failed: a <= 5 && b >= 17.1
 @end example
 
address@hidden @code{xgettext} utility
-When run with @option{--gen-pot}, @command{gawk} does not execute your
-program.  Instead, it parses it as usual and prints all marked strings
-to standard output in the format of a GNU @code{gettext} Portable Object
-file.  Also included in the output are any constant strings that
-appear as the first argument to @code{dcgettext()} or as the first and
-second argument to @code{dcngettext()address@hidden
address@hidden utility that comes with GNU
address@hidden can handle @file{.awk} files.}
address@hidden Example},
-for the full list of steps to go through to create and test
-translations for @command{guide}.
address@hidden @code{END} pattern, @code{assert()} user-defined function and
+There is a small problem with this version of @code{assert()}.
+An @code{END} rule is automatically added
+to the program calling @code{assert()}.  Normally, if a program consists
+of just a @code{BEGIN} rule, the input files and/or standard input are
+not read. However, now that the program has an @code{END} rule, @command{awk}
+attempts to read the input @value{DF}s or standard input
+(@pxref{Using BEGIN/END}),
+most likely causing the program to hang as it waits for input.
 
address@hidden Printf Ordering
address@hidden Rearranging @code{printf} Arguments
address@hidden @code{BEGIN} pattern, @code{assert()} user-defined function and
+There is a simple workaround to this:
+make sure that such a @code{BEGIN} rule always ends
+with an @code{exit} statement.
address@hidden ENDOFRANGE asse
address@hidden ENDOFRANGE assef
address@hidden ENDOFRANGE flibass
address@hidden ENDOFRANGE libfass
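The workaround described above can be tried out with a small shell wrapper.
This is only a sketch: the wrapper name `assert_demo` and the test value
passed with `-v` are assumptions made for illustration and are not part of
`assert.awk`. The `BEGIN` rule ends with `exit`, so `awk` never waits for
input even though the library contributes an `END` rule.

```shell
# assert_demo VALUE -- hypothetical wrapper, not part of assert.awk.
# Runs a BEGIN-only program that asserts VALUE <= 5, then reports
# the exit status of awk.
assert_demo() {
    awk -v a="$1" '
    function assert(condition, string)
    {
        if (! condition) {
            printf("%s:%d: assertion failed: %s\n",
                FILENAME, FNR, string) > "/dev/stderr"
            _assert_exit = 1
            exit 1
        }
    }

    BEGIN {
        assert(a <= 5, "a <= 5")
        print "ok"
        exit    # leave BEGIN explicitly so that no input is read
    }

    END {
        if (_assert_exit)
            exit 1
    }
    ' 2>&1
    echo "status=$?"
}
```

Calling `assert_demo 3` passes the assertion and exits with status 0, while
`assert_demo 7` prints the diagnostic and exits with status 1. Inside a
`BEGIN` rule no file is being read, so the file name and record number in
the message are not meaningful.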
 
address@hidden @code{printf} statement, positional specifiers
address@hidden positional specifiers, @code{printf} statement
-Format strings for @code{printf} and @code{sprintf()}
address@hidden Round Function
address@hidden Rounding Numbers
+
address@hidden rounding numbers
address@hidden numbers, rounding
address@hidden libraries of @command{awk} functions, rounding numbers
address@hidden functions, library, rounding numbers
address@hidden @code{print} statement, @code{sprintf()} function and
address@hidden @code{printf} statement, @code{sprintf()} function and
address@hidden @code{sprintf()} function, @code{print}/@code{printf} statements and
+The way @code{printf} and @code{sprintf()}
 (@pxref{Printf})
-present a special problem for translation.
-Consider the following:@footnote{This example is borrowed
-from the GNU @code{gettext} manual.}
+perform rounding often depends upon the system's C @code{sprintf()}
+subroutine.  On many machines, @code{sprintf()} rounding is ``unbiased,''
+which means it doesn't always round a trailing @samp{.5} up, contrary
+to naive expectations.  In unbiased rounding, @samp{.5} rounds to even,
+rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4.  This means
+that if you are using a format that does rounding (e.g., @code{"%.0f"}),
+you should check what your system does.  The following function does
+traditional rounding; it might be useful if your @command{awk}'s @code{printf}
+does unbiased rounding:
 
address@hidden line broken here only for smallbook format
address@hidden @code{round()} user-defined function
 @example
-printf(_"String `%s' has %d characters\n",
-          string, length(string)))
address@hidden example
address@hidden file eg/lib/round.awk
+# round.awk --- do normal rounding
address@hidden endfile
address@hidden
address@hidden file eg/lib/round.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# August, 1996
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/round.awk
 
-A possible German translation for this might be:
+function round(x,   ival, aval, fraction)
address@hidden
+   ival = int(x)    # integer part, int() truncates
 
address@hidden
-"%d Zeichen lang ist die Zeichenkette `%s'\n"
+   # see if fractional part
+   if (ival == x)   # no fraction
+      return ival   # ensure no decimals
+
+   if (x < 0) @{
+      aval = -x     # absolute value
+      ival = int(aval)
+      fraction = aval - ival
+      if (fraction >= .5)
+         return int(x) - 1   # -2.5 --> -3
+      else
+         return int(x)       # -2.3 --> -2
+   @} else @{
+      fraction = x - ival
+      if (fraction >= .5)
+         return ival + 1
+      else
+         return ival
+   @}
address@hidden
address@hidden endfile
address@hidden don't include test harness in the file that gets installed
+
+# test harness
address@hidden print $0, round($0) @}
 @end example
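As a quick check of the behavior described above, the following sketch
wraps the same `round()` logic in a shell function and feeds it a few
values, including negative ones. The wrapper name `round_demo` and the
output format are assumptions chosen for illustration.

```shell
# round_demo -- hypothetical wrapper; reads one number per line and
# prints "input -> rounded" using traditional (round half away from
# zero) rounding.
round_demo() {
    awk '
    function round(x,   ival, aval, fraction)
    {
        ival = int(x)            # integer part, int() truncates
        if (ival == x)           # no fractional part
            return ival
        if (x < 0) {
            aval = -x            # absolute value
            ival = int(aval)
            fraction = aval - ival
            return (fraction >= .5) ? int(x) - 1 : int(x)
        }
        fraction = x - ival
        return (fraction >= .5) ? ival + 1 : ival
    }

    { printf("%s -> %d\n", $0, round($0)) }
    '
}
```

For example, `printf '2.5\n4.5\n-2.5\n' | round_demo` reports 3, 5, and -3,
whereas a `"%.0f"` format on a system with unbiased rounding would give 2,
4, and -2.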
 
-The problem should be obvious: the order of the format
-specifications is different from the original!
-Even though @code{gettext()} can return the translated string
-at runtime,
-it cannot change the argument order in the call to @code{printf}.
address@hidden Cliff Random Function
address@hidden The Cliff Random Number Generator
address@hidden random numbers, Cliff
address@hidden Cliff random numbers
address@hidden numbers, Cliff random
address@hidden functions, library, Cliff random numbers
 
-To solve this problem, @code{printf} format specifiers may have
-an additional optional element, which we call a @dfn{positional specifier}.
-For example:
+The
+@uref{http://mathworld.wolfram.com/CliffRandomNumberGenerator.html, Cliff random number generator}
+is a very simple random number generator that ``passes the noise sphere test
+for randomness by showing no structure.''
+It is easily programmed, in fewer than 10 lines of @command{awk} code:
 
address@hidden @code{cliff_rand()} user-defined function
 @example
-"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
address@hidden example
address@hidden file eg/lib/cliff_rand.awk
+# cliff_rand.awk --- generate Cliff random numbers
address@hidden endfile
address@hidden
address@hidden file eg/lib/cliff_rand.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# December 2000
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/cliff_rand.awk
 
-Here, the positional specifier consists of an integer count, which indicates which
-argument to use, and a @samp{$}. Counts are one-based, and the
-format string itself is @emph{not} included.  Thus, in the following
-example, @samp{string} is the first argument and @samp{length(string)} is the second:
+BEGIN @{ _cliff_seed = 0.1 @}
 
address@hidden
-$ @kbd{gawk 'BEGIN @{}
->     @kbd{string = "Dont Panic"}
->     @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
->                         @kbd{string, length(string)}
-> @address@hidden'}
address@hidden 10 characters live in "Dont Panic"
+function cliff_rand()
address@hidden
+    _cliff_seed = (100 * log(_cliff_seed)) % 1
+    if (_cliff_seed < 0)
+        _cliff_seed = - _cliff_seed
+    return _cliff_seed
address@hidden
address@hidden endfile
 @end example
 
-If present, positional specifiers come first in the format specification,
-before the flags, the field width, and/or the precision.
+This algorithm requires an initial ``seed'' of 0.1.  Each new value
+uses the current seed as input for the calculation.
+If the built-in @code{rand()} function
+(@pxref{Numeric Functions})
+isn't random enough, you might try using this function instead.
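A minimal way to see the generator in action is sketched below. The
wrapper name `cliff_demo` and the choice of five values are assumptions
made for the example. Each call feeds the previous result back in as the
seed, so the sequence is fully determined by the initial seed of 0.1.

```shell
# cliff_demo -- hypothetical wrapper; prints five successive values
# from the Cliff generator, one per line.
cliff_demo() {
    awk '
    BEGIN { _cliff_seed = 0.1 }

    function cliff_rand()
    {
        # the fractional part of 100 * log(seed) becomes the new seed
        _cliff_seed = (100 * log(_cliff_seed)) % 1
        if (_cliff_seed < 0)
            _cliff_seed = - _cliff_seed
        return _cliff_seed
    }

    BEGIN {
        for (i = 1; i <= 5; i++)
            print cliff_rand()
        exit
    }
    '
}
```

Every value printed lies between 0 and 1. The exact digits can vary
slightly between systems, because they depend on the floating-point
behavior of the C library's `log()` function.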
 
-Positional specifiers can be used with the dynamic field width and
-precision capability:
address@hidden Ordinal Functions
address@hidden Translating Between Characters and Numbers
 
address@hidden
-$ @kbd{gawk 'BEGIN @{}
->    @kbd{printf("%*.*s\n", 10, 20, "hello")}
->    @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")}
-> @address@hidden'}
address@hidden      hello
address@hidden      hello
address@hidden example
-
address@hidden NOTE
-When using @samp{*} with a positional specifier, the @samp{*}
-comes first, then the integer position, and then the @samp{$}.
-This is somewhat counterintuitive.
address@hidden quotation
address@hidden libraries of @command{awk} functions, character values as numbers
address@hidden functions, library, character values as numbers
address@hidden characters, values of as numbers
address@hidden numbers, as values of characters
+One commercial implementation of @command{awk} supplies a built-in function,
+@code{ord()}, which takes a character and returns the numeric value for that
+character in the machine's character set.  If the string passed to
+@code{ord()} has more than one character, only the first one is used.
 
address@hidden @code{printf} statement, positional specifiers, mixing with regular formats
address@hidden positional specifiers, @code{printf} statement, mixing with regular formats
address@hidden format specifiers, mixing regular with positional specifiers
address@hidden does not allow you to mix regular format specifiers
-and those with positional specifiers in the same string:
+The inverse of this function is @code{chr()} (from the function of the same
+name in Pascal), which takes a number and returns the corresponding character.
+Both functions are written very nicely in @command{awk}; there is no real
+reason to build them into the @command{awk} interpreter:
 
address@hidden @code{ord()} user-defined function
address@hidden @code{chr()} user-defined function
 @example
-$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
address@hidden gawk: cmd. line:1: fatal: must use `count$' on all formats or none
address@hidden example
-
address@hidden NOTE
-There are some pathological cases that @command{gawk} may fail to
-diagnose.  In such cases, the output may not be what you expect.
-It's still a bad idea to try mixing them, even if @command{gawk}
-doesn't detect it.
address@hidden quotation
address@hidden file eg/lib/ord.awk
+# ord.awk --- do ord and chr
 
-Although positional specifiers can be used directly in @command{awk} programs,
-their primary purpose is to help in producing correct translations of
-format strings into languages different from the one in which the program
-is first written.
+# Global identifiers:
+#    _ord_:        numerical values indexed by characters
+#    _ord_init:    function to initialize _ord_
address@hidden endfile
address@hidden
address@hidden file eg/lib/ord.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# 16 January, 1992
+# 20 July, 1992, revised
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/ord.awk
 
address@hidden I18N Portability
address@hidden @command{awk} Portability Issues
+BEGIN    @{ _ord_init() @}
 
address@hidden portability, internationalization and
address@hidden internationalization, localization, portability and
address@hidden's internationalization features were purposely chosen to
-have as little impact as possible on the portability of @command{awk}
-programs that use them to other versions of @command{awk}.
-Consider this program:
+function _ord_init(    low, high, i, t)
address@hidden
+    low = sprintf("%c", 7) # BEL is ascii 7
+    if (low == "\a") @{    # regular ascii
+        low = 0
+        high = 127
+    @} else if (sprintf("%c", 128 + 7) == "\a") @{
+        # ascii, mark parity
+        low = 128
+        high = 255
+    @} else @{        # ebcdic(!)
+        low = 0
+        high = 255
+    @}
 
address@hidden
-BEGIN @{
-    TEXTDOMAIN = "guide"
-    if (Test_Guide)   # set with -v
-        bindtextdomain("/test/guide/messages")
-    print _"don't panic!"
+    for (i = low; i <= high; i++) @{
+        t = sprintf("%c", i)
+        _ord_[t] = i
+    @}
 @}
address@hidden endfile
 @end example
 
address@hidden
-As written, it won't work on other versions of @command{awk}.
-However, it is actually almost portable, requiring very little
-change:
-
address@hidden @bullet
address@hidden @code{TEXTDOMAIN} variable, portability and
address@hidden
-Assignments to @code{TEXTDOMAIN} won't have any effect,
-since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
-
address@hidden
-Non-GNU versions of @command{awk} treat marked strings
-as the concatenation of a variable named @code{_} with the string
-following address@hidden is good fodder for an ``Obfuscated
address@hidden'' contest.} Typically, the variable @code{_} has
-the null string (@code{""}) as its value, leaving the original string constant as
-the result.
-
address@hidden
-By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
-all the messages are output in the original language.
-For example:
address@hidden character sets (machine character encodings)
address@hidden ASCII
address@hidden EBCDIC
address@hidden mark parity
+Some explanation of the numbers used by @code{chr} is worthwhile.
+The most prominent character set in use today is ASCII.@footnote{This
+is changing; many systems use Unicode, a very large character set
+that includes ASCII as a subset.  On systems with full Unicode support,
+a character can occupy up to 32 bits, making simple tests such as
+used here prohibitively expensive.}
+Although an
+8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
+defines characters that use the values from 0 to 127.@footnote{ASCII
+has been extended in many countries to use the values from 128 to 255
+for country-specific characters.  If your system uses these extensions,
+you can simplify @code{_ord_init} to loop from 0 to 255.}
+In the now distant past,
+at least one minicomputer manufacturer
address@hidden Pr1me, blech
+used ASCII, but with mark parity, meaning that the leftmost bit in the byte
+is always 1.  This means that on those systems, characters
+have numeric values from 128 to 255.
+Finally, large mainframe systems use the EBCDIC character set, which
+uses all 256 values.
+While there are other character sets in use on some older systems,
+they are not really worth worrying about:
 
address@hidden @code{bindtextdomain()} function (@command{gawk}), portability and
address@hidden @code{dcgettext()} function (@command{gawk}), portability and
address@hidden @code{dcngettext()} function (@command{gawk}), portability and
 @example
address@hidden file eg/lib/libintl.awk
-function bindtextdomain(dir, domain)
address@hidden file eg/lib/ord.awk
+function ord(str,    c)
 @{
-    return dir
+    # only first character is of interest
+    c = substr(str, 1, 1)
+    return _ord_[c]
 @}
 
-function dcgettext(string, domain, category)
+function chr(c)
 @{
-    return string
+    # force c to be numeric by adding 0
+    return sprintf("%c", c + 0)
 @}
address@hidden endfile
 
-function dcngettext(string1, string2, number, domain, category)
address@hidden
-    return (number == 1 ? string1 : string2)
address@hidden
+#### test code ####
+# BEGIN    \
+# @{
+#    for (;;) @{
+#        printf("enter a character: ")
+#        if (getline var <= 0)
+#            break
+#        printf("ord(%s) = %d\n", var, ord(var))
+#    @}
+# @}
 @c endfile
 @end example
 
address@hidden
-The use of positional specifications in @code{printf} or
address@hidden()} is @emph{not} portable.
-To support @code{gettext()} at the C level, many systems' C versions of
address@hidden()} do support positional specifiers.  But it works only if
-enough arguments are supplied in the function call.  Many versions of
address@hidden pass @code{printf} formats and arguments unchanged to the
-underlying C library version of @code{sprintf()}, but only one format and
-argument at a time.  What happens if a positional specification is
-used is anybody's guess.
-However, since the positional specifications are primarily for use in
address@hidden format strings, and since non-GNU @command{awk}s never
-retrieve the translated string, this should not be a problem in practice.
address@hidden itemize
address@hidden ENDOFRANGE inap
+An obvious improvement to these functions is to move the code for the
+@code{@w{_ord_init}} function into the body of the @code{BEGIN} rule.  It was
+written this way initially for ease of development.
+There is a ``test program'' in a @code{BEGIN} rule, to test the
+function.  It is commented out for production use.
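The two functions can be exercised with a short, self-contained sketch.
It assumes a plain ASCII system and therefore maps only the codes 0
through 127, skipping the mark parity and EBCDIC detection done by the
full library file. The wrapper name `ord_demo` is also an assumption made
for illustration.

```shell
# ord_demo -- hypothetical wrapper; prints ord("Apple") and chr(97).
ord_demo() {
    awk '
    BEGIN { _ord_init() }

    function _ord_init(    i, t)
    {
        # Assumption: plain ASCII, so only codes 0 through 127 are mapped
        for (i = 0; i <= 127; i++) {
            t = sprintf("%c", i)
            _ord_[t] = i
        }
    }

    function ord(str,    c)
    {
        # only the first character is of interest
        c = substr(str, 1, 1)
        return _ord_[c]
    }

    function chr(c)
    {
        # force c to be numeric by adding 0
        return sprintf("%c", c + 0)
    }

    BEGIN {
        printf("%d %s\n", ord("Apple"), chr(97))
        exit
    }
    '
}
```

On an ASCII system, `ord_demo` prints `65 a`: only the leading `A` of
`Apple` is examined, and 97 is the code for a lowercase `a`.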
 
address@hidden I18N Example
address@hidden A Simple Internationalization Example
address@hidden Join Function
address@hidden Merging an Array into a String
 
-Now let's look at a step-by-step example of how to internationalize and
-localize a simple @command{awk} program, using @file{guide.awk} as our
-original source:
address@hidden libraries of @command{awk} functions, merging arrays into strings
address@hidden functions, library, merging arrays into strings
address@hidden strings, merging arrays into
address@hidden arrays, merging into strings
+When doing string processing, it is often useful to be able to join
+all the strings in an array into one long string.  The following function,
+@code{join()}, accomplishes this task.  It is used later in several of
+the application programs
+(@pxref{Sample Programs}).
+
+Good function design is important; this function needs to be general but it
+should also have a reasonable default behavior.  It is called with an array
+as well as the beginning and ending indices of the elements in the array to be
+merged.  This assumes that the array indices are numeric---a reasonable
+assumption since the array was likely created with @code{split()}
+(@pxref{String Functions}):
 
address@hidden @code{join()} user-defined function
 @example
address@hidden file eg/prog/guide.awk
-BEGIN @{
-    TEXTDOMAIN = "guide"
-    bindtextdomain(".")  # for testing
-    print _"Don't Panic"
-    print _"The Answer Is", 42
-    print "Pardon me, Zaphod who?"
address@hidden
address@hidden file eg/lib/join.awk
+# join.awk --- join an array into a string
 @c endfile
address@hidden example
-
address@hidden
-Run @samp{gawk --gen-pot} to create the @file{.pot} file:
address@hidden
address@hidden file eg/lib/join.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/join.awk
 
address@hidden
-$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+function join(array, start, end, sep,    result, i)
address@hidden
+    if (sep == "")
+       sep = " "
+    else if (sep == SUBSEP) # magic value
+       sep = ""
+    result = array[start]
+    for (i = start + 1; i <= end; i++)
+        result = result sep array[i]
+    return result
address@hidden
address@hidden endfile
 @end example
 
address@hidden
-This produces:
-
address@hidden
address@hidden file eg/data/guide.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr ""
+An optional additional argument is the separator to use when joining the
+strings back together.  If the caller supplies a nonempty value,
+@code{join()} uses it; if it is not supplied, it has a null
+value.  In this case, @code{join()} uses a single space as a default
+separator for the strings.  If the value is equal to @code{SUBSEP},
+then @code{join()} joins the strings with no separator between them.
+@code{SUBSEP} serves as a ``magic'' value to indicate that there should
+be no separation between the component pieces.@footnote{It would
+be nice if @command{awk} had an assignment operator for concatenation.
+The lack of an explicit operator for concatenation makes string operations
+more difficult than they really need to be.}
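The three separator cases described above (default space, caller-supplied
value, and the `SUBSEP` magic value) can be seen side by side in the
following sketch. The wrapper name `join_demo` and the sample calls are
assumptions made for illustration.

```shell
# join_demo -- hypothetical wrapper; splits each input line into words
# and rejoins them three different ways.
join_demo() {
    awk '
    function join(array, start, end, sep,    result, i)
    {
        if (sep == "")
            sep = " "
        else if (sep == SUBSEP) # magic value
            sep = ""
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }

    {
        n = split($0, words, " ")
        print join(words, 1, n)           # separator omitted: single space
        print join(words, 1, n, "-")      # caller-supplied separator
        print join(words, 2, n, SUBSEP)   # magic value: no separator
    }
    '
}
```

Given the input line `a b c`, the three calls print `a b c`, `a-b-c`, and
`bc`. The last call starts at index 2, which shows that the start and end
indices select a sub-range of the array.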
 
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr ""
address@hidden Getlocaltime Function
address@hidden Managing the Time of Day
 
address@hidden endfile
address@hidden example
address@hidden libraries of @command{awk} functions, managing, time
address@hidden functions, library, managing time
address@hidden timestamps, formatted
address@hidden time, managing
+The @code{systime()} and @code{strftime()} functions described in
+@ref{Time Functions},
+provide the minimum functionality necessary for dealing with the time of day
+in human readable form.  While @code{strftime()} is extensive, the control
+formats are not necessarily easy to remember or intuitively obvious when
+reading a program.
 
-This original portable object template file is saved and reused for each language
-into which the application is translated.  The @code{msgid}
-is the original string and the @code{msgstr} is the translation.
+The following function, @code{getlocaltime()}, populates a user-supplied array
+with preformatted time information.  It returns a string with the current
+time formatted in the same way as the @command{date} utility:
 
address@hidden NOTE
-Strings not marked with a leading underscore do not
-appear in the @file{guide.pot} file.
address@hidden quotation
address@hidden @code{getlocaltime()} user-defined function
address@hidden
address@hidden file eg/lib/gettime.awk
+# getlocaltime.awk --- get the time of day in a usable format
address@hidden endfile
address@hidden
address@hidden file eg/lib/gettime.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain, May 1993
+#
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/gettime.awk
 
-Next, the messages must be translated.
-Here is a translation to a hypothetical dialect of English,
-called ``Mellow'':@footnote{Perhaps it would be better if it were
-called ``Hippy.'' Ah, well.}
+# Returns a string in the format of output of date(1)
+# Populates the array argument time with individual values:
+#    time["second"]       -- seconds (0 - 59)
+#    time["minute"]       -- minutes (0 - 59)
+#    time["hour"]         -- hours (0 - 23)
+#    time["althour"]      -- hours (1 - 12)
+#    time["monthday"]     -- day of month (1 - 31)
+#    time["month"]        -- month of year (1 - 12)
+#    time["monthname"]    -- name of the month
+#    time["shortmonth"]   -- short name of the month
+#    time["year"]         -- year modulo 100 (0 - 99)
+#    time["fullyear"]     -- full year
+#    time["weekday"]      -- day of week (Sunday = 0)
+#    time["altweekday"]   -- day of week (Monday = 1)
+#    time["dayname"]      -- name of weekday
+#    time["shortdayname"] -- short name of weekday
+#    time["yearday"]      -- day of year (1 - 366)
+#    time["timezone"]     -- abbreviation of timezone name
+#    time["ampm"]         -- AM or PM designation
+#    time["weeknum"]      -- week number, Sunday first day
+#    time["altweeknum"]   -- week number, Monday first day
 
address@hidden
address@hidden
-$ cp guide.pot guide-mellow.po
address@hidden translations to} guide-mellow.po @dots{}
address@hidden group
address@hidden example
+function getlocaltime(time,    ret, now, i)
address@hidden
+    # get time once, avoids unnecessary system calls
+    now = systime()
 
address@hidden
-Following are the translations:
+    # return date(1)-style output
+    ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
 
address@hidden
address@hidden file eg/data/guide-mellow.po
-#: guide.awk:4
-msgid "Don't Panic"
-msgstr "Hey man, relax!"
+    # clear out target array
+    delete time
 
-#: guide.awk:5
-msgid "The Answer Is"
-msgstr "Like, the scoop is"
+    # fill in values, force numeric values to be
+    # numeric by adding 0
+    time["second"]       = strftime("%S", now) + 0
+    time["minute"]       = strftime("%M", now) + 0
+    time["hour"]         = strftime("%H", now) + 0
+    time["althour"]      = strftime("%I", now) + 0
+    time["monthday"]     = strftime("%d", now) + 0
+    time["month"]        = strftime("%m", now) + 0
+    time["monthname"]    = strftime("%B", now)
+    time["shortmonth"]   = strftime("%b", now)
+    time["year"]         = strftime("%y", now) + 0
+    time["fullyear"]     = strftime("%Y", now) + 0
+    time["weekday"]      = strftime("%w", now) + 0
+    time["altweekday"]   = strftime("%u", now) + 0
+    time["dayname"]      = strftime("%A", now)
+    time["shortdayname"] = strftime("%a", now)
+    time["yearday"]      = strftime("%j", now) + 0
+    time["timezone"]     = strftime("%Z", now)
+    time["ampm"]         = strftime("%p", now)
+    time["weeknum"]      = strftime("%U", now) + 0
+    time["altweeknum"]   = strftime("%W", now) + 0
 
+    return ret
address@hidden
 @c endfile
 @end example
 
address@hidden Linux
address@hidden GNU/Linux
-The next step is to make the directory to hold the binary message object
-file and then to create the @file{guide.mo} file.
-The directory layout shown here is standard for GNU @code{gettext} on
-GNU/Linux systems.  Other versions of @code{gettext} may use a different
-layout:
+The string indices are easier to use and read than the various formats
+required by @code{strftime()}.  The @code{alarm} program presented in
+@ref{Alarm Program},
+uses this function.
+A more general design for the @code{getlocaltime()} function would have
+allowed the user to supply an optional timestamp value to use instead
+of the current time.
 
address@hidden
-$ @kbd{mkdir en_US en_US/LC_MESSAGES}
address@hidden example
address@hidden Data File Management
address@hidden @value{DDF} Management
 
address@hidden @code{.po} files, converting to @code{.mo}
address@hidden files, @code{.po}, converting to @code{.mo}
address@hidden @code{.mo} files, converting from @code{.po}
address@hidden files, @code{.mo}, converting from @code{.po}
address@hidden portable object files, converting to message object files
address@hidden files, portable object, converting to message object files
address@hidden message object files, converting from portable object files
address@hidden files, message object, converting from portable object files
address@hidden @command{msgfmt} utility
-The @command{msgfmt} utility does the conversion from human-readable
address@hidden file to machine-readable @file{.mo} file.
-By default, @command{msgfmt} creates a file named @file{messages}.
-This file must be renamed and placed in the proper directory so that
address@hidden can find it:
address@hidden STARTOFRANGE dataf
address@hidden files, managing
address@hidden STARTOFRANGE libfdataf
address@hidden libraries of @command{awk} functions, managing, @value{DF}s
address@hidden STARTOFRANGE flibdataf
address@hidden functions, library, managing @value{DF}s
+This @value{SECTION} presents functions that are useful for managing
+command-line @value{DF}s.
 
address@hidden
-$ @kbd{msgfmt guide-mellow.po}
-$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
address@hidden example
address@hidden
+* Filetrans Function::          A function for handling data file transitions.
+* Rewind Function::             A function for rereading the current file.
+* File Checking::               Checking that data files are readable.
+* Empty Files::                 Checking for zero-length files.
+* Ignoring Assigns::            Treating assignments as file names.
address@hidden menu
 
-Finally, we run the program to test it:
address@hidden Filetrans Function
address@hidden Noting @value{DDF} Boundaries
 
address@hidden
-$ @kbd{gawk -f guide.awk}
address@hidden Hey man, relax!
address@hidden Like, the scoop is 42
address@hidden Pardon me, Zaphod who?
address@hidden example
address@hidden files, managing, @value{DF} boundaries
address@hidden files, initialization and cleanup
+The @code{BEGIN} and @code{END} rules are each executed exactly once at
+the beginning and end of your @command{awk} program, respectively
+(@pxref{BEGIN/END}).
+We (the @command{gawk} authors) once had a user who mistakenly thought that the
+@code{BEGIN} rule is executed at the beginning of each @value{DF} and the
+@code{END} rule is executed at the end of each @value{DF}.
 
-If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
-and @code{bindtextdomain()}
-(@pxref{I18N Portability})
-are in a file named @file{libintl.awk},
-then we can run @file{guide.awk} unchanged as follows:
+When informed
+that this was not the case, the user requested that we add new special
+patterns to @command{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that
+would have the desired behavior.  He even supplied us the code to do so.
+
+Adding these special patterns to @command{gawk} wasn't necessary;
+the job can be done cleanly in @command{awk} itself, as illustrated
+by the following library program.
+It arranges to call two user-supplied functions, @code{beginfile()} and
+@code{endfile()}, at the beginning and end of each @value{DF}.
+Besides solving the problem in only nine(!) lines of code, it does so
address@hidden; this works with any implementation of @command{awk}:
 
 @example
-$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
address@hidden Don't Panic
address@hidden The Answer Is 42
address@hidden Pardon me, Zaphod who?
address@hidden example
+# transfile.awk
+#
+# Give the user a hook for filename transitions
+#
+# The user must supply functions beginfile() and endfile()
+# that each take the name of the file being started or
+# finished, respectively.
address@hidden #
address@hidden # Arnold Robbins, arnold@@skeeve.com, Public Domain
address@hidden # January 1992
 
address@hidden Gawk I18N
address@hidden @command{gawk} Can Speak Your Language
+FILENAME != _oldfilename \
+@{
+    if (_oldfilename != "")
+        endfile(_oldfilename)
+    _oldfilename = FILENAME
+    beginfile(FILENAME)
+@}
 
address@hidden itself has been internationalized
-using the GNU @code{gettext} package.
-(GNU @code{gettext} is described in
-complete detail in
address@hidden
address@hidden, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
address@hidden ifinfo
address@hidden
address@hidden gettext tools}.)
address@hidden ifnotinfo
-As of this writing, the latest version of GNU @code{gettext} is
address@hidden://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
+END   @{ endfile(FILENAME) @}
address@hidden example
 
-If a translation of @command{gawk}'s messages exists,
-then @command{gawk} produces usage messages, warnings,
-and fatal errors in the local language.
address@hidden ENDOFRANGE inloc
-
address@hidden Advanced Features
address@hidden Advanced Features of @command{gawk}
address@hidden advanced features, network connections, See Also networks, connections
address@hidden STARTOFRANGE gawadv
address@hidden @command{gawk}, features, advanced
address@hidden STARTOFRANGE advgaw
address@hidden advanced features, @command{gawk}
address@hidden
-Contributed by: Peter Langston <address@hidden>
-
-    Found in Steve English's "signature" line:
-
-"Write documentation as if whoever reads it is a violent psychopath
-who knows where you live."
address@hidden ignore
address@hidden
address@hidden documentation as if whoever reads it is
-a violent psychopath who knows where you address@hidden
-Steve English, as quoted by Peter Langston
address@hidden quotation
-
-This @value{CHAPTER} discusses advanced features in @command{gawk}.
-It's a bit of a ``grab bag'' of items that are otherwise unrelated
-to each other.
-First, a command-line option allows @command{gawk} to recognize
-nondecimal numbers in input data, not just in @command{awk}
-programs.
-Then, @command{gawk}'s special features for sorting arrays are presented.
-Next, two-way I/O, discussed briefly in earlier parts of this
address@hidden, is described in full detail, along with the basics
-of TCP/IP networking.  Finally, @command{gawk}
-can @dfn{profile} an @command{awk} program, making it possible to tune
-it for performance.
-
address@hidden Extensions},
-discusses the ability to dynamically add new built-in functions to
address@hidden  As this feature is still immature and likely to change,
-its description is relegated to an appendix.
-
address@hidden
-* Nondecimal Data::             Allowing nondecimal input data.
-* Array Sorting::               Facilities for controlling array traversal and
-                                sorting arrays.
-* Two-way I/O::                 Two-way communications with another process.
-* TCP/IP Networking::           Using @command{gawk} for network programming.
-* Profiling::                   Profiling your @command{awk} programs.
address@hidden menu
-
address@hidden Nondecimal Data
address@hidden Allowing Nondecimal Input Data
address@hidden @code{--non-decimal-data} option
address@hidden advanced features, @command{gawk}, nondecimal input data
address@hidden input, address@hidden nondecimal
address@hidden constants, nondecimal
+This file must be loaded before the user's ``main'' program, so that the
+rule it supplies is executed first.
 
-If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal constants in your input data:
+This rule relies on @command{awk}'s @code{FILENAME} variable that
+automatically changes for each new @value{DF}.  The current @value{FN} is
+saved in a private variable, @code{_oldfilename}.  If @code{FILENAME} does
+not equal @code{_oldfilename}, then a new @value{DF} is being processed and
+it is necessary to call @code{endfile()} for the old file.  Because
+@code{endfile()} should only be called if a file has been processed, the
+program first checks to make sure that @code{_oldfilename} is not the null
+string.  The program then assigns the current @value{FN} to
+@code{_oldfilename} and calls @code{beginfile()} for the file.
+Because, like all @command{awk} variables, @code{_oldfilename} is
+initialized to the null string, this rule executes correctly even for the
+first @value{DF}.
 
address@hidden line break here for small book format
address@hidden
-$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
->                                         @kbd{$1, $2, $3 @}'}
address@hidden 83, 123, 291
address@hidden example
+The program also supplies an @code{END} rule to do the final processing for
+the last file.  Because this @code{END} rule comes before any @code{END} rules
+supplied in the ``main'' program, @code{endfile()} is called first.  Once
+again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
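
As a rough illustration of the library rule described above (a sketch, not part of the patch), the following shell session runs it over two temporary files. The bodies of `beginfile()` and `endfile()` and the `count` variable are hypothetical user code, and the Texinfo `@{`/`@}` escapes are written as plain braces so the program can run directly.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

out=$(awk '
# Library rule: must come before the main program rules.
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME) }

# Hypothetical user hooks: count the records in each file.
function beginfile(name) { count = 0 }
function endfile(name,    n, parts) {
    n = split(name, parts, "/")          # report only the base name
    print parts[n] ": " count " records"
}

# Main program rule.
{ count++ }
' "$dir/one.txt" "$dir/two.txt")

printf '%s\n' "$out"
rm -rf "$dir"
```

Because the library rule appears first, `beginfile()` resets the counter before the main rule sees the first record of each file.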
 
-For this feature to work, write your program so that
address@hidden treats your data as numeric:
address@hidden @code{beginfile()} user-defined function
address@hidden @code{endfile()} user-defined function
+If the same @value{DF} occurs twice in a row on the command line, then
+@code{endfile()} and @code{beginfile()} are not executed at the end of the
+first pass and at the beginning of the second pass.
+The following version solves the problem:
 
 @example
-$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
address@hidden 0123 123 0x123
address@hidden example
address@hidden file eg/lib/ftrans.awk
+# ftrans.awk --- handle data file transitions
+#
+# user supplies beginfile() and endfile() functions
address@hidden endfile
address@hidden
address@hidden file eg/lib/ftrans.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# November 1992
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/ftrans.awk
 
address@hidden
-The @code{print} statement treats its expressions as strings.
-Although the fields can act as numbers when necessary,
-they are still strings, so @code{print} does not try to treat them
-numerically.  You may need to add zero to a field to force it to
-be treated as a number.  For example:
+FNR == 1 @{
+    if (_filename_ != "")
+        endfile(_filename_)
+    _filename_ = FILENAME
+    beginfile(FILENAME)
+@}
 
address@hidden
-$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
-> @address@hidden print $1, $2, $3}
->   @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
address@hidden 0123 123 0x123
address@hidden 83 123 291
+END  @{ endfile(_filename_) @}
address@hidden endfile
 @end example
 
-Because it is common to have decimal data with leading zeros, and because
-using this facility could lead to surprising results, the default is to leave it
-disabled.  If you want it, you must explicitly request it.
address@hidden Program},
+shows how this library function can be used and
+how it simplifies writing the main program.
 
address@hidden programming conventions, @code{--non-decimal-data} option
address@hidden @code{--non-decimal-data} option, @code{strtonum()} function and
address@hidden @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
address@hidden CAUTION
address@hidden of this option is not recommended.}
-It can break old programs very badly.
-Instead, use the @code{strtonum()} function to convert your data
-(@pxref{Nondecimal-numbers}).
-This makes your programs easier to write and easier to read, and
-leads to less surprising results.
address@hidden quotation
address@hidden fakenode --- for prepinfo
address@hidden Advanced Notes: So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?
 
address@hidden Array Sorting
address@hidden Controlling Array Traversal and Array Sorting
+You are probably wondering, if @code{beginfile()} and @code{endfile()}
+functions can do the job, why does @command{gawk} have
+@code{BEGINFILE} and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
 
address@hidden lets you control the order in which a @samp{for (i in array)}
-loop traverses an array.
+Good question.  Normally, if @command{awk} cannot open a file, this
+causes an immediate fatal error.  In this case, there is no way for a
+user-defined function to deal with the problem, since the mechanism for
+calling it relies on the file being open and at the first record.  Thus,
+the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
+files that cannot be processed.  @code{ENDFILE} exists for symmetry,
+and because it provides an easy way to do per-file cleanup processing.
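
The "hook" role of `BEGINFILE` can be sketched as follows. This is only an illustration, and it assumes a `gawk` new enough to support `BEGINFILE` (4.0 or later) is installed under the name `gawk`; the `skipped` counter and the file names are made up for the example. Inside `BEGINFILE`, a non-empty `ERRNO` means the file could not be opened, and calling `nextfile` there skips it instead of ending the run with a fatal error.

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/good.txt"

if command -v gawk >/dev/null 2>&1; then
    out=$(gawk '
    BEGINFILE {
        if (ERRNO != "") {      # open failed: skip this file
            skipped++
            nextfile
        }
    }
    { print }
    END { print "skipped: " (skipped + 0) }
    ' "$dir/missing.txt" "$dir/good.txt")
else
    out="gawk not available"
fi

printf '%s\n' "$out"
rm -rf "$dir"
```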
 
-In addition, two built-in functions, @code{asort()} and @code{asorti()},
-let you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
address@hidden Rewind Function
address@hidden Rereading the Current File
 
address@hidden
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions::     How to use @code{asort()} and @code{asorti()}.
address@hidden menu
address@hidden files, reading
+Another request for a new built-in function was for a @code{rewind()}
+function that would make it possible to reread the current file.
+The requesting user didn't want to have to use @code{getline}
+(@pxref{Getline})
+inside a loop.
 
address@hidden Controlling Array Traversal
address@hidden Controlling Array Traversal
+However, as long as you are not in the @code{END} rule, it is
+quite easy to arrange to immediately close the current input file
+and then start over with it from the top.
+For lack of a better name, we'll call it @code{rewind()}:
 
-By default, the order in which a @samp{for (i in array)} loop
-scans an array is not defined; it is generally based upon
-the internal implementation of arrays inside @command{awk}.
address@hidden @code{rewind()} user-defined function
address@hidden
address@hidden file eg/lib/rewind.awk
+# rewind.awk --- rewind the current file and start over
address@hidden endfile
address@hidden
address@hidden file eg/lib/rewind.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# September 2000
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/rewind.awk
 
-Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose.  @command{gawk}
-lets you do this.
+function rewind(    i)
address@hidden
+    # shift remaining arguments up
+    for (i = ARGC; i > ARGIND; i--)
+        ARGV[i] = ARGV[i-1]
 
address@hidden Scanning}, describes how you can assign special,
-pre-defined values to @code{PROCINFO["sorted_in"]} in order to
-control the order in which @command{gawk} will traverse an array
-during a @code{for} loop.
+    # make sure gawk knows to keep going
+    ARGC++
 
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
-This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function.  The comparison function should be defined with at least
-four arguments:
+    # make current file next to get done
+    ARGV[ARGIND+1] = FILENAME
 
address@hidden
-function comp_func(i1, v1, i2, v2)
address@hidden
-    @var{compare elements 1 and 2 in some fashion}
-    @var{return < 0; 0; or > 0}
+    # do it
+    nextfile
 @}
address@hidden endfile
 @end example
 
-Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
-are the corresponding values of the two elements being compared.
-Either @var{v1} or @var{v2}, or both, can be arrays if the array being
-traversed contains subarrays as values.
-(@xref{Arrays of Arrays}, for more information about subarrays.)
-The three possible return values are interpreted as follows:
+This code relies on the @code{ARGIND} variable
+(@pxref{Auto-set}),
+which is specific to @command{gawk}.
+If you are not using
address@hidden, you can use ideas presented in
address@hidden
+the previous @value{SECTION}
address@hidden ifnotinfo
address@hidden
address@hidden Function},
address@hidden ifinfo
+to either update @code{ARGIND} on your own
+or modify this code as appropriate.
 
address@hidden @code
address@hidden comp_func(i1, v1, i2, v2) < 0
-Index @var{i1} comes before index @var{i2} during loop traversal.
+The @code{rewind()} function also relies on the @code{nextfile} keyword
+(@pxref{Nextfile Statement}).
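
A small usage sketch for `rewind()` follows. It assumes `gawk` is installed (the function needs `ARGIND`, and it calls `nextfile` from inside a function body); the `done` flag and the two-line input file are hypothetical. The first rule restarts the file once, when the second record is reached, so the first record is printed twice.

```shell
dir=$(mktemp -d)
printf 'line1\nline2\n' > "$dir/data.txt"

if command -v gawk >/dev/null 2>&1; then
    out=$(gawk '
    function rewind(    i)
    {
        # shift remaining arguments up
        for (i = ARGC; i > ARGIND; i--)
            ARGV[i] = ARGV[i-1]
        ARGC++                        # make sure gawk knows to keep going
        ARGV[ARGIND+1] = FILENAME     # make current file next to get done
        nextfile
    }
    FNR == 2 && !done { done = 1; rewind() }   # restart once
    { print }
    ' "$dir/data.txt")
else
    out="gawk not available"
fi

printf '%s\n' "$out"
rm -rf "$dir"
```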
 
address@hidden comp_func(i1, v1, i2, v2) == 0
-Indices @var{i1} and @var{i2}
-come together but the relative order with respect to each other is undefined.
address@hidden File Checking
address@hidden Checking for Readable @value{DDF}s
 
address@hidden comp_func(i1, v1, i2, v2) > 0
-Index @var{i1} comes after index @var{i2} during loop traversal.
address@hidden table
-
-Our first comparison function can be used to scan an array in
-numerical order of the indices:
-
address@hidden
-function cmp_num_idx(i1, v1, i2, v2)
address@hidden
-     # numerical index comparison, ascending order
-     return (i1 - i2)
address@hidden
address@hidden example
-
-Our second function traverses an array based on the string order of
-the element values rather than by indices:
-
address@hidden
-function cmp_str_val(i1, v1, i2, v2)
address@hidden
-    # string value comparison, ascending order
-    v1 = v1 ""
-    v2 = v2 ""
-    if (v1 < v2)
-        return -1
-    return (v1 != v2)
address@hidden
address@hidden example
-
-The third
-comparison function makes all numbers, and numeric strings without
-any leading or trailing spaces, come out first during loop traversal:  
address@hidden troubleshooting, readable @value{DF}s
address@hidden readable @address@hidden checking
address@hidden files, skipping
+Normally, if you give @command{awk} a @value{DF} that isn't readable,
+it stops with a fatal error.  There are times when you
+might want to just ignore such files and keep going.  You can
+do this by prepending the following program to your @command{awk}
+program:
 
address@hidden @code{readable.awk} program
 @example
-function cmp_num_str_val(i1, v1, i2, v2,   n1, n2)
address@hidden
-     # numbers before string value comparison, ascending order
-     n1 = v1 + 0
-     n2 = v2 + 0
-     if (n1 == v1) 
-         return (n2 == v2) ? (n1 - n2) : -1
-     else if (n2 == v2)
-         return 1 
-     return (v1 < v2) ? -1 : (v1 != v2)
address@hidden
address@hidden example
-
-Here is a main program to demonstrate how @command{gawk}
-behaves using each of the previous functions:
address@hidden file eg/lib/readable.awk
+# readable.awk --- library file to skip over unreadable files
address@hidden endfile
address@hidden
address@hidden file eg/lib/readable.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# October 2000
+# December 2010
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/readable.awk
 
address@hidden
 BEGIN @{
-    data["one"] = 10
-    data["two"] = 20
-    data[10] = "one"
-    data[100] = 100
-    data[20] = "two"
-    
-    f[1] = "cmp_num_idx"
-    f[2] = "cmp_str_val"
-    f[3] = "cmp_num_str_val"
-    for (i = 1; i <= 3; i++) @{
-        printf("Sort function: %s\n", f[i])
-        PROCINFO["sorted_in"] = f[i]
-        for (j in data)
-            printf("\tdata[%s] = %s\n", j, data[j])
-        print ""
+    for (i = 1; i < ARGC; i++) @{
+        if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
+            || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
+            continue    # assignment or standard input
+        else if ((getline junk < ARGV[i]) < 0) # unreadable
+            delete ARGV[i]
+        else
+            close(ARGV[i])
     @}
 @}
address@hidden endfile
 @end example
 
-Here are the results when the program is run:
address@hidden
address@hidden troubleshooting, @code{getline} function
+This works, because the @code{getline} won't be fatal.
+Removing the element from @code{ARGV} with @code{delete}
+skips the file (since it's no longer in the list).
+See also @ref{ARGC and ARGV}.
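
To see the effect, here is a minimal sketch (not part of the patch) that inlines the `BEGIN` rule ahead of a one-line main program. The file names are hypothetical, the Texinfo brace escapes are written as plain braces, and an `awk` that understands POSIX character classes is assumed. The nonexistent file is dropped from `ARGV`, so only the readable file is processed.

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/good.txt"

out=$(awk '
BEGIN {
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
            || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
            continue    # assignment or standard input
        else if ((getline junk < ARGV[i]) < 0) # unreadable
            delete ARGV[i]
        else
            close(ARGV[i])
    }
}
{ print }   # main program: copy input to output
' "$dir/missing.txt" "$dir/good.txt" </dev/null)

printf '%s\n' "$out"
rm -rf "$dir"
```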
 
address@hidden
-$ @kbd{gawk -f compdemo.awk}
address@hidden Sort function: cmp_num_idx      @ii{Sort by numeric index}
address@hidden     data[two] = 20
address@hidden     data[one] = 10              @ii{Both strings are numerically zero}
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden     data[100] = 100
address@hidden 
address@hidden Sort function: cmp_str_val      @ii{Sort by element values as strings}
address@hidden     data[one] = 10
address@hidden     data[100] = 100             @ii{String 100 is less than string 20}
address@hidden     data[two] = 20
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden 
address@hidden Sort function: cmp_num_str_val  @ii{Sort all numeric values before all strings}
address@hidden     data[one] = 10
address@hidden     data[two] = 20
address@hidden     data[100] = 100
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden example
address@hidden Empty Files
address@hidden Checking For Zero-length Files
 
-Consider sorting the entries of a GNU/Linux system password file
-according to login name.  The following program sorts records
-by a specific field position and can be used for this purpose:   
+All known @command{awk} implementations silently skip over zero-length files.
+This is a by-product of @command{awk}'s implicit 
+read-a-record-and-match-against-the-rules loop: when @command{awk}
+tries to read a record from an empty file, it immediately receives an
+end of file indication, closes the file, and proceeds on to the next
+command-line @value{DF}, @emph{without} executing any user-level
address@hidden program code.
+
+Using @command{gawk}'s @code{ARGIND} variable
+(@pxref{Built-in Variables}), it is possible to detect when an empty
address@hidden has been skipped.  Similar to the library file presented
+in @ref{Filetrans Function}, the following library file calls a function named
+@code{zerofile()} that the user must provide.  The arguments passed are
+the @value{FN} and the position in @code{ARGV} where it was found:
 
address@hidden @code{zerofile.awk} program
 @example
-# sort.awk --- simple program to sort by field position
-# field position is specified by the global variable POS
address@hidden file eg/lib/zerofile.awk
+# zerofile.awk --- library file to process empty input files
address@hidden endfile
address@hidden
address@hidden file eg/lib/zerofile.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# June 2003
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/zerofile.awk
 
-function cmp_field(i1, v1, i2, v2)
address@hidden
-    # comparison by value, as string, and ascending order
-    return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
address@hidden
+BEGIN @{ Argind = 0 @}
 
address@hidden
-    for (i = 1; i <= NF; i++)
-        a[NR][i] = $i
+ARGIND > Argind + 1 @{
+    for (Argind++; Argind < ARGIND; Argind++)
+        zerofile(ARGV[Argind], Argind)
 @}
 
+ARGIND != Argind @{ Argind = ARGIND @}
+
 END @{
-    PROCINFO["sorted_in"] = "cmp_field"
-    if (POS < 1 || POS > NF)
-        POS = 1
-    for (i in a) @{
-        for (j = 1; j <= NF; j++)
-            printf("%s%c", a[i][j], j < NF ? ":" : "")
-        print ""
-    @}
+    if (ARGIND > Argind)
+        for (Argind++; Argind <= ARGIND; Argind++)
+            zerofile(ARGV[Argind], Argind)
 @}
address@hidden endfile
 @end example
 
-The first field in each entry of the password file is the user's login name,
-and the fields are separated by colons.
-Each record defines a subarray,
-with each field as an element in the subarray.
-Running the program produces the
-following output:
+The user-level variable @code{Argind} allows the @command{awk} program
+to track its progress through @code{ARGV}.  Whenever the program detects
+that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or
+more empty files were skipped.  The action then calls @code{zerofile()} for
+each such file, incrementing @code{Argind} along the way.
 
address@hidden
-$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
address@hidden adm:x:3:4:adm:/var/adm:/sbin/nologin
address@hidden apache:x:48:48:Apache:/var/www:/sbin/nologin
address@hidden avahi:x:70:70:Avahi daemon:/:/sbin/nologin
address@hidden
address@hidden example
+The @samp{Argind != ARGIND} rule simply keeps @code{Argind} up to date
+in the normal case.
 
-The comparison should normally always return the same value when given a
-specific pair of array elements as its arguments.  If inconsistent
-results are returned then the order is undefined.  This behavior can be
-exploited to introduce random order into otherwise seemingly
-ordered data:
+Finally, the @code{END} rule catches the case of any empty files at
+the end of the command-line arguments.  Note that the test in the
+condition of the @code{for} loop uses the @samp{<=} operator,
+not @samp{<}.
 
address@hidden
-function cmp_randomize(i1, v1, i2, v2)
address@hidden
-    # random order
-    return (2 - 4 * rand())
address@hidden
address@hidden example
+As an exercise, you might consider whether this same problem can
+be solved without relying on @command{gawk}'s @code{ARGIND} variable.
 
-As mentioned above, the order of the indices is arbitrary if two
-elements compare equal.  This is usually not a problem, but letting
-the tied elements come out in arbitrary order can be an issue, especially
-when comparing item values.  The partial ordering of the equal elements
-may change during the next loop traversal, if other elements are added or
-removed from the array.  One way to resolve ties when comparing elements
-with otherwise equal values is to include the indices in the comparison
-rules.  Note that doing this may make the loop traversal less efficient,
-so consider it only if necessary.  The following comparison functions
-force a deterministic order, and are based on the fact that the
-indices of two elements are never equal:
+As a second exercise, revise this code to handle the case where
+an intervening value in @code{ARGV} is a variable assignment.
 
address@hidden
-function cmp_numeric(i1, v1, i2, v2)
address@hidden
-    # numerical value (and index) comparison, descending order
-    return (v1 != v2) ? (v2 - v1) : (i2 - i1)
address@hidden
address@hidden
+# zerofile2.awk --- same thing, portably
 
-function cmp_string(i1, v1, i2, v2)
address@hidden
-    # string value (and index) comparison, descending order
-    v1 = v1 i1
-    v2 = v2 i2
-    return (v1 > v2) ? -1 : (v1 != v2)
address@hidden
address@hidden example
+BEGIN @{
+    ARGIND = Argind = 0
+    for (i = 1; i < ARGC; i++)
+        Fnames[ARGV[i]]++
 
address@hidden Avoid using the term ``stable'' when describing the unpredictable behavior
address@hidden if two items compare equal.  Usually, the goal of a "stable algorithm"
address@hidden is to maintain the original order of the items, which is a meaningless
address@hidden concept for a list constructed from a hash.
-
-A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to
-designing such a function.
address@hidden
+FNR == 1 @{
+    while (ARGV[ARGIND] != FILENAME)
+        ARGIND++
+    Seen[FILENAME]++
+    if (Seen[FILENAME] == Fnames[FILENAME])
+        do
+            ARGIND++
+        while (ARGV[ARGIND] != FILENAME)
address@hidden
+ARGIND > Argind + 1 @{
+    for (Argind++; Argind < ARGIND; Argind++)
+        zerofile(ARGV[Argind], Argind)
address@hidden
+ARGIND != Argind @{
+    Argind = ARGIND
address@hidden
+END @{
+    if (ARGIND < ARGC - 1)
+        ARGIND = ARGC - 1 
+    if (ARGIND > Argind)
+        for (Argind++; Argind <= ARGIND; Argind++)
+            zerofile(ARGV[Argind], Argind)
address@hidden
address@hidden ignore
 
-When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices
-handled as strings, the value of @code{IGNORECASE}
-(@pxref{Built-in Variables}) controls whether
-the comparisons treat corresponding uppercase and lowercase letters as
-equivalent or distinct.
address@hidden Ignoring Assigns
address@hidden Treating Assignments as @value{FFN}s
 
-Another point to keep in mind is that in the case of subarrays
-the element values can themselves be arrays; a production comparison
-function should use the @code{isarray()} function
-(@pxref{Type Functions}),
-to check for this, and choose a defined sorting order for subarrays.
address@hidden assignments as filenames
address@hidden filenames, assignments as
+Occasionally, you might not want @command{awk} to process command-line
+variable assignments
+(@pxref{Assignment Options}).
+In particular, if you have a @value{FN} that contains an @samp{=} character,
address@hidden treats the @value{FN} as an assignment, and does not process it.
 
-All sorting based on @code{PROCINFO["sorted_in"]}
-is disabled in POSIX mode,
-since the @code{PROCINFO} array is not special in that case.
+Some users have suggested an additional command-line option for @command{gawk}
+to disable command-line assignments.  However, some simple programming with
+a library file does the trick:
 
-As a side note, sorting the array indices before traversing
-the array has been reported to add 15% to 20% overhead to the
-execution time of @command{awk} programs. For this reason,
-sorted array traversal is not the default.
address@hidden @code{noassign.awk} program
address@hidden
address@hidden file eg/lib/noassign.awk
+# noassign.awk --- library file to avoid the need for a
+# special option that disables command-line assignments
address@hidden endfile
address@hidden
address@hidden file eg/lib/noassign.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# October 1999
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/noassign.awk
 
address@hidden The @command{gawk}
address@hidden maintainers believe that only the people who wish to use a
address@hidden feature should have to pay for it.
+function disable_assigns(argc, argv,    i)
address@hidden
+    for (i = 1; i < argc; i++)
+        if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
+            argv[i] = ("./" argv[i])
address@hidden
 
address@hidden Array Sorting Functions
address@hidden Sorting Array Values and Indices with @command{gawk}
+BEGIN @{
+    if (No_command_assign)
+        disable_assigns(ARGC, ARGV)
address@hidden
address@hidden endfile
address@hidden example
 
address@hidden arrays, sorting
address@hidden @code{asort()} function (@command{gawk})
address@hidden @code{asort()} function (@command{gawk}), address@hidden sorting
address@hidden sort function, arrays, sorting
-In most @command{awk} implementations, sorting an array requires
-writing a @code{sort()} function.
-While this can be educational for exploring different sorting algorithms,
-usually that's not the point of the program.
address@hidden provides the built-in @code{asort()}
-and @code{asorti()} functions
-(@pxref{String Functions})
-for sorting arrays.  For example:
+You then run your program this way:
 
 @example
address@hidden the array} data
-n = asort(data)
-for (i = 1; i <= n; i++)
-    @var{do something with} data[i]
+awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
 @end example
 
-After the call to @code{asort()}, the array @code{data} is indexed from 1
-to some number @var{n}, the total number of elements in @code{data}.
-(This count is @code{asort()}'s return value.)
address@hidden @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
-The comparison is based on the type of the elements
-(@pxref{Typing and Comparison}).
-All numeric values come before all string values,
-which in turn come before all subarrays.
+The function works by looping through the arguments.
+It prepends @samp{./} to
+any argument that matches the form
+of a variable assignment, turning that argument into a @value{FN}.
 
address@hidden side effects, @code{asort()} function
-An important side effect of calling @code{asort()} is that
address@hidden array's original indices are irrevocably lost}.
-As this isn't always desirable, @code{asort()} accepts a
-second argument:
+The use of @code{No_command_assign} allows you to disable command-line
+assignments at invocation time, by giving the variable a true value.
+When not set, it is initially zero (i.e., false), so the command-line arguments
+are left alone.
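
The following sketch (an illustration only, with a hypothetical file named `x=1.txt`) inlines the library code ahead of a one-line main program. Without `No_command_assign`, `awk` would treat `x=1.txt` as the assignment `x = "1.txt"`; with it, the argument becomes `./x=1.txt` and is read as a file. An `awk` that understands POSIX character classes is assumed.

```shell
dir=$(mktemp -d)
printf 'inside\n' > "$dir/x=1.txt"

out=$(cd "$dir" && awk -v No_command_assign=1 '
function disable_assigns(argc, argv,    i)
{
    for (i = 1; i < argc; i++)
        if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
            argv[i] = ("./" argv[i])
}

BEGIN {
    if (No_command_assign)
        disable_assigns(ARGC, ARGV)
}

{ print FILENAME ": " $0 }   # main program
' x=1.txt </dev/null)

printf '%s\n' "$out"
rm -rf "$dir"
```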
address@hidden ENDOFRANGE dataf
address@hidden ENDOFRANGE flibdataf
address@hidden ENDOFRANGE libfdataf
 
address@hidden
address@hidden the array} source
-n = asort(source, dest)
-for (i = 1; i <= n; i++)
-    @var{do something with} dest[i]
address@hidden example
address@hidden Getopt Function
address@hidden Processing Command-Line Options
 
-In this case, @command{gawk} copies the @code{source} array into the
address@hidden array and then sorts @code{dest}, destroying its indices.
-However, the @code{source} array is not affected.
address@hidden STARTOFRANGE libfclo
address@hidden libraries of @command{awk} functions, command-line options
address@hidden STARTOFRANGE flibclo
address@hidden functions, library, command-line options
address@hidden STARTOFRANGE clop
address@hidden command-line options, processing
address@hidden STARTOFRANGE oclp
address@hidden options, command-line, processing
address@hidden STARTOFRANGE clibf
address@hidden functions, library, C library
address@hidden arguments, processing
+Most utilities on POSIX compatible systems take options on
+the command line that can be used to change the way a program behaves.
address@hidden is an example of such a program
+(@pxref{Options}).
+Often, options take @dfn{arguments}; i.e., data that the program needs to
+correctly obey the command-line option.  For example, @command{awk}'s
+@option{-F} option requires a string to use as the field separator.
+The first occurrence on the command line of either @option{--} or a
+string that does not begin with @samp{-} ends the options.
 
address@hidden()} accepts a third string argument to control comparison of
-array elements.  As with @code{PROCINFO["sorted_in"]}, this argument
-may be one of the predefined names that @command{gawk} provides
-(@pxref{Controlling Scanning}), or the name of a user-defined function
-(@pxref{Controlling Array Traversal}).
address@hidden @code{getopt()} function (C library)
+Modern Unix systems provide a C function named @code{getopt()} for processing
+command-line arguments.  The programmer provides a string describing the
+one-letter options. If an option requires an argument, it is followed in the
+string with a colon.  @code{getopt()} is also passed the
+count and values of the command-line arguments and is called in a loop.
+@code{getopt()} processes the command-line arguments for option letters.
+Each time around the loop, it returns a single character representing the
+next option letter that it finds, or @samp{?} if it finds an invalid option.
+When it returns @minus{}1, there are no options left on the command line.
 
address@hidden NOTE
-In all cases, the sorted element values consist of the original
-array's element values.  The ability to control comparison merely
-affects the way in which they are sorted.
address@hidden quotation
+When using @code{getopt()}, options that do not take arguments can be
+grouped together.  Furthermore, options that take arguments require that the
+argument be present.  The argument can immediately follow the option letter,
+or it can be a separate command-line argument.
 
-Often, what's needed is to sort on the values of the @emph{indices}
-instead of the values of the elements.
-To do that, use the
address@hidden()} function.  The interface is identical to that of
address@hidden()}, except that the index values are used for sorting, and
-become the values of the result array:
+Given a hypothetical program that takes
+three command-line options, @option{-a}, @option{-b}, and @option{-c}, where
+@option{-b} requires an argument, all of the following are valid ways of
+invoking the program:
 
 @example
address@hidden source[$0] = some_func($0) @}
-
-END @{
-    n = asorti(source, dest)
-    for (i = 1; i <= n; i++) @{
-        @ii{Work with sorted indices directly:}
-        @var{do something with} dest[i]
-        @dots{}
-        @ii{Access original array via sorted indices:}
-        @var{do something with} source[dest[i]]
-    @}
address@hidden
+prog -a -b foo -c data1 data2 data3
+prog -ac -bfoo -- data1 data2 data3
+prog -acbfoo data1 data2 data3
 @end example
 
-Similar to @code{asort()},
-in all cases, the sorted element values consist of the original
-array's indices.  The ability to control comparison merely
-affects the way in which they are sorted.
+Notice that when the argument is grouped with its option, the rest of
+the argument is considered to be the option's argument.
+In this example, @option{-acbfoo} indicates that all of the
+@option{-a}, @option{-b}, and @option{-c} options were supplied,
+and that @samp{foo} is the argument to the @option{-b} option.
 
-Sorting the array by replacing the indices provides maximal flexibility.
-To traverse the elements in decreasing order, use a loop that goes from
address@hidden down to 1, either over the elements or over the address@hidden
-may also use one of the predefined sorting names that sorts in
-decreasing order.}
+@code{getopt()} provides four external variables that the programmer can use:
 
address@hidden reference counting, sorting arrays
-Copying array indices and elements isn't expensive in terms of memory.
-Internally, @command{gawk} maintains @dfn{reference counts} to data.
-For example, when @code{asort()} copies the first array to the second one,
-there is only one copy of the original array elements' data, even though
-both arrays use the values.
+@table @code
+@item optind
+The index in the argument value array (@code{argv}) where the first
+nonoption command-line argument can be found.
 
address@hidden Document It And Call It A Feature. Sigh.
address@hidden @command{gawk}, @code{IGNORECASE} variable in
address@hidden @code{IGNORECASE} variable
address@hidden arrays, sorting, @code{IGNORECASE} variable and
address@hidden @code{IGNORECASE} variable, array sorting and
-Because @code{IGNORECASE} affects string comparisons, the value
-of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
-Note also that the locale's sorting order does @emph{not}
-come into play; comparisons are based on character values address@hidden
-is true because locale-based comparison occurs only when in POSIX
-compatibility mode, and since @code{asort()} and @code{asorti()} are
address@hidden extensions, they are not available in that case.}
-Caveat Emptor.
+@item optarg
+The string value of the argument to an option.
 
address@hidden Two-way I/O
address@hidden Two-Way Communications with Another Process
address@hidden Brennan, Michael
address@hidden programmers, attractiveness of
address@hidden
address@hidden Path: 
cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
-From: brennan@@whidbey.com (Mike Brennan)
-Newsgroups: comp.lang.awk
-Subject: Re: Learn the SECRET to Attract Women Easily
-Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
-Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
-
-On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-<tracy78@@kilgrona.com> wrote:
->Learn the SECRET to Attract Women Easily
->
->The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
+@item opterr
+Usually @code{getopt()} prints an error message when it finds an invalid
+option.  Setting @code{opterr} to zero disables this feature.  (An
+application might want to print its own error message.)
 
-The scent of awk programmers is a lot more attractive to women than
-the scent of perl programmers.
---
-Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
+@item optopt
+The letter representing the command-line option.
+@c While not usually documented, most versions supply this variable.
+@end table
 
address@hidden advanced features, @command{gawk}, address@hidden communicating 
with
address@hidden processes, two-way communications with
-It is often useful to be able to
-send data to a separate program for
-processing and then read the result.  This can always be
-done with temporary files:
+The following C fragment shows how @code{getopt()} might process command-line
+arguments for @command{awk}:
 
 @example
-# Write the data for processing
-tempfile = ("mydata." PROCINFO["pid"])
-while (@var{not done with data})
-    print @var{data} | ("subprogram > " tempfile)
-close("subprogram > " tempfile)
-
-# Read the results, remove tempfile when done
-while ((getline newdata < tempfile) > 0)
-    @var{process} newdata @var{appropriately}
-close(tempfile)
-system("rm " tempfile)
+int
+main(int argc, char *argv[])
+@{
+    @dots{}
+    /* print our own message */
+    opterr = 0;
+    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) @{
+        switch (c) @{
+        case 'f':    /* file */
+            @dots{}
+            break;
+        case 'F':    /* field separator */
+            @dots{}
+            break;
+        case 'v':    /* variable assignment */
+            @dots{}
+            break;
+        case 'W':    /* extension */
+            @dots{}
+            break;
+        case '?':
+        default:
+            usage();
+            break;
+        @}
+    @}
+    @dots{}
+@}
 @end example
 
address@hidden
-This works, but not elegantly.  Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, @file{/tmp} will not do, as another user might happen
-to be using a temporary file with the same name.
-
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
-However, with @command{gawk}, it is possible to
-open a @emph{two-way} pipe to another process.  The second process is
-termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
-The two-way connection is created using the @samp{|&} operator
-(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell.}
+As a side point, @command{gawk} actually uses the GNU @code{getopt_long()}
+function to process both normal and GNU-style long options
+(@pxref{Options}).
 
address@hidden
-do @{
-    print @var{data} |& "subprogram"
-    "subprogram" |& getline results
address@hidden while (@var{data left to process})
-close("subprogram")
address@hidden example
+The abstraction provided by @code{getopt()} is very useful and is quite
+handy in @command{awk} programs as well.  Following is an @command{awk}
+version of @code{getopt()}.  This function highlights one of the
+greatest weaknesses in @command{awk}, which is that it is very poor at
+manipulating single characters.  Repeated calls to @code{substr()} are
+necessary for accessing individual characters
+(@pxref{String Functions}).@footnote{This
+function was written before @command{gawk} acquired the ability to
+split strings into single characters using @code{""} as the separator.
+We have left it alone, since using @code{substr()} is more portable.}
address@hidden FIXME: could use split(str, a, "") to do it more easily.
 
-The first time an I/O operation is executed using the @samp{|&}
-operator, @command{gawk} creates a two-way pipeline to a child process
-that runs the other program.  Output created with @code{print}
-or @code{printf} is written to the program's standard input, and
-output from the program's standard output can be read by the @command{gawk}
-program using @code{getline}.
-As is the case with processes started by @samp{|}, the subprogram
-can be any program, or pipeline of programs, that can be started by
-the shell.
+The discussion that follows walks through the code a bit at a time:
 
-There are some cautionary items to be aware of:
address@hidden @code{getopt()} user-defined function
+@example
+@c file eg/lib/getopt.awk
+# getopt.awk --- Do C library getopt(3) function in awk
+@c endfile
+@ignore
+@c file eg/lib/getopt.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+#
+# Initial version: March, 1991
+# Revised: May, 1993
+@c endfile
+@end ignore
+@c file eg/lib/getopt.awk
 
address@hidden @bullet
address@hidden
-As the code inside @command{gawk} currently stands, the coprocess's
-standard error goes to the same place that the parent @command{gawk}'s
-standard error goes. It is not possible to read the child's
-standard error separately.
+# External variables:
+#    Optind -- index in ARGV of first nonoption argument
+#    Optarg -- string value of argument to current option
+#    Opterr -- if nonzero, print our own diagnostic
+#    Optopt -- current option letter
 
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
-I/O buffering may be a problem.  @command{gawk} automatically
-flushes all output down the pipe to the coprocess.
-However, if the coprocess does not flush its output,
address@hidden may hang when doing a @code{getline} in order to read
-the coprocess's results.  This could lead to a situation
-known as @dfn{deadlock}, where each process is waiting for the
-other one to do something.
address@hidden itemize
+# Returns:
+#    -1     at end of options
+#    "?"    for unrecognized option
+#    <c>    a character representing the current option
 
address@hidden @code{close()} function, two-way pipes and
-It is possible to close just one end of the two-way pipe to
-a coprocess, by supplying a second argument to the @code{close()}
-function of either @code{"to"} or @code{"from"}
-(@pxref{Close Files And Pipes}).
-These strings tell @command{gawk} to close the end of the pipe
-that sends data to the coprocess or the end that reads from it,
-respectively.
+# Private Data:
+#    _opti  -- index in multi-flag option, e.g., -abc
+@c endfile
+@end example
 
address@hidden @command{sort} utility, coprocesses and
-This is particularly necessary in order to use
-the system @command{sort} utility as part of a coprocess;
address@hidden must read @emph{all} of its input
-data before it can produce any output.
-The @command{sort} program does not receive an end-of-file indication
-until @command{gawk} closes the write end of the pipe.
+The function starts out with comments presenting
+a list of the global variables it uses,
+what the return values are, what they mean, and any global variables that
+are ``private'' to this library function.  Such documentation is essential
+for any program, and particularly for library functions.
 
-When you have finished writing data to the @command{sort}
-utility, you can close the @code{"to"} end of the pipe, and
-then start reading sorted data via @code{getline}.
-For example:
+The @code{getopt()} function first checks that it was indeed called with
+a string of options (the @code{options} parameter).  If @code{options}
+has a zero length, @code{getopt()} immediately returns @minus{}1:
 
address@hidden @code{getopt()} user-defined function
 @example
-BEGIN @{
-    command = "LC_ALL=C sort"
-    n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
-    for (i = n; i > 0; i--)
-        print a[i] |& command
-    close(command, "to")
+@c file eg/lib/getopt.awk
+function getopt(argc, argv, options,    thisopt, i)
+@{
+    if (length(options) == 0)    # no options given
+        return -1
 
-    while ((command |& getline line) > 0)
-        print "got", line
-    close(command)
-@}
+@group
+    if (argv[Optind] == "--") @{  # all done
+        Optind++
+        _opti = 0
+        return -1
+@end group
+    @} else if (argv[Optind] !~ /^-[^:[:space:]]/) @{
+        _opti = 0
+        return -1
+    @}
address@hidden endfile
 @end example
 
-This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to @command{sort}.  It then closes the
-write end of the pipe, so that @command{sort} receives an end-of-file
-indication.  This causes @command{sort} to sort the data and write the
-sorted data back to the @command{gawk} program.  Once all of the data
-has been read, @command{gawk} terminates the coprocess and exits.
+The next thing to check for is the end of the options.  A @option{--}
+ends the command-line options, as does any command-line argument that
+does not begin with a @samp{-}.  @code{Optind} is used to step through
+the array of command-line arguments; it retains its value across calls
+to @code{getopt()}, because it is a global variable.
 
-As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
-command ensures traditional Unix (ASCII) sorting from @command{sort}.
-
address@hidden @command{gawk}, @code{PROCINFO} array in
address@hidden @code{PROCINFO} array
-You may also use pseudo-ttys (ptys) for
-two-way communication instead of pipes, if your system supports them.
-This is done on a per-command basis, by setting a special element
-in the @code{PROCINFO} array
-(@pxref{Auto-set}),
-like so:
+The regular expression that is used, @code{@w{/^-[^:[:space:]]/}},
+checks for a @samp{-} followed by anything
+that is not whitespace and not a colon.
+If the current command-line argument does not match this pattern,
+it is not an option, and it ends option processing. Continuing on:
 
 @example
-command = "sort -nr"           # command, save in convenience variable
-PROCINFO[command, "pty"] = 1   # update PROCINFO
-print @dots{} |& command       # start two-way pipe
address@hidden
address@hidden file eg/lib/getopt.awk
+    if (_opti == 0)
+        _opti = 2
+    thisopt = substr(argv[Optind], _opti, 1)
+    Optopt = thisopt
+    i = index(options, thisopt)
+    if (i == 0) @{
+        if (Opterr)
+            printf("%c -- invalid option\n",
+                                  thisopt) > "/dev/stderr"
+        if (_opti >= length(argv[Optind])) @{
+            Optind++
+            _opti = 0
+        @} else
+            _opti++
+        return "?"
+    @}
address@hidden endfile
 @end example
 
address@hidden
-Using ptys avoids the buffer deadlock issues described earlier, at some
-loss in performance.  If your system does not have ptys, or if all the
-system's ptys are in use, @command{gawk} automatically falls back to
-using regular pipes.
-
address@hidden TCP/IP Networking
address@hidden Using @command{gawk} for Network Programming
address@hidden advanced features, @command{gawk}, network programming
address@hidden networks, programming
address@hidden STARTOFRANGE tcpip
address@hidden TCP/IP
address@hidden @code{/inet/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet/@dots{}} (@command{gawk})
address@hidden @code{/inet4/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet4/@dots{}} (@command{gawk})
address@hidden @code{/inet6/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet6/@dots{}} (@command{gawk})
address@hidden @code{EMISTERED}
address@hidden
address@hidden:@*
-@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and no-one can talk to host that's close,@*
-@ @ @ @ unless the host that isn't address@hidden
-@ @ @ @ is busy hung or dead.}
address@hidden quotation
-
-In addition to being able to open a two-way pipeline to a coprocess
-on the same system
-(@pxref{Two-way I/O}),
-it is possible to make a two-way connection to
-another process on another system across an IP network connection.
-
-You can think of this as just a @emph{very long} two-way pipeline to
-a coprocess.
-The way @command{gawk} decides that you want to use TCP/IP networking is
-by recognizing special @value{FN}s that begin with one of @samp{/inet/},
address@hidden/inet4/} or @samp{/inet6}.
-
-The full syntax of the special @value{FN} is
address@hidden/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
-The components are:
-
address@hidden @var
address@hidden net-type
-Specifies the kind of Internet connection to make.
-Use @samp{/inet4/} to force IPv4, and
address@hidden/inet6/} to force IPv6.
-Plain @samp{/inet/} (which used to be the only option) uses
-the system default, most likely IPv4.
+The @code{_opti} variable tracks the position in the current command-line
+argument (@code{argv[Optind]}).  If multiple options are
+grouped together with one @samp{-} (e.g., @option{-abx}), it is necessary
+to return them to the user one at a time.
 
address@hidden protocol
-The protocol to use over IP.  This must be either @samp{tcp}, or
address@hidden, for a TCP or UDP IP connection,
-respectively.  The use of TCP is recommended for most applications.
+If @code{_opti} is equal to zero, it is set to two, which is the index in
+the string of the next character to look at (we skip the @samp{-}, which
+is at position one).  The variable @code{thisopt} holds the character,
+obtained with @code{substr()}.  It is saved in @code{Optopt} for the main
+program to use.
 
address@hidden local-port
address@hidden @code{getaddrinfo()} function (C library)
-The local TCP or UDP port number to use.  Use a port number of @samp{0}
-when you want the system to pick a port. This is what you should do
-when writing a TCP or UDP client.
-You may also use a well-known service name, such as @samp{smtp}
-or @samp{http}, in which case @command{gawk} attempts to determine
-the predefined port number using the C @code{getaddrinfo()} function.
+If @code{thisopt} is not in the @code{options} string, then it is an
+invalid option.  If @code{Opterr} is nonzero, @code{getopt()} prints an error
+message on the standard error that is similar to the message from the C
+version of @code{getopt()}.
 
address@hidden remote-host
-The IP address or fully-qualified domain name of the Internet
-host to which you want to connect.
+Because the option is invalid, it is necessary to skip it and move on to the
+next option character.  If @code{_opti} is greater than or equal to the
+length of the current command-line argument, it is necessary to move on
+to the next argument, so @code{Optind} is incremented and @code{_opti} is reset
+to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
+incremented.
 
address@hidden remote-port
-The TCP or UDP port number to use on the given @var{remote-host}.
-Again, use @samp{0} if you don't care, or else a well-known
-service name.
address@hidden table
+In any case, because the option is invalid, @code{getopt()} returns @code{"?"}.
+The main program can examine @code{Optopt} if it needs to know what the
+invalid option letter actually is. Continuing on:
 
address@hidden @command{gawk}, @code{ERRNO} variable in
address@hidden @code{ERRNO} variable
address@hidden NOTE
-Failure in opening a two-way socket will result in a non-fatal error
-being returned to the calling code. The value of @code{ERRNO} indicates
-the error (@pxref{Auto-set}).
address@hidden quotation
+@example
+@c file eg/lib/getopt.awk
+    if (substr(options, i + 1, 1) == ":") @{
+        # get option argument
+        if (length(substr(argv[Optind], _opti + 1)) > 0)
+            Optarg = substr(argv[Optind], _opti + 1)
+        else
+            Optarg = argv[++Optind]
+        _opti = 0
+    @} else
+        Optarg = ""
+@c endfile
+@end example
 
-Consider the following very simple example:
+If the option requires an argument, the option letter is followed by a colon
+in the @code{options} string.  If there are remaining characters in the
+current command-line argument (@code{argv[Optind]}), then the rest of that
+string is assigned to @code{Optarg}.  Otherwise, the next command-line
+argument is used (@samp{-xFOO} versus @samp{@w{-x FOO}}). In either case,
+@code{_opti} is reset to zero, because there are no more characters left to
+examine in the current command-line argument. Continuing:
 
 @example
-BEGIN @{
-  Service = "/inet/tcp/0/localhost/daytime"
-  Service |& getline
-  print $0
-  close(Service)
address@hidden file eg/lib/getopt.awk
+    if (_opti == 0 || _opti >= length(argv[Optind])) @{
+        Optind++
+        _opti = 0
+    @} else
+        _opti++
+    return thisopt
 @}
address@hidden endfile
 @end example
 
-This program reads the current date and time from the local system's
-TCP @samp{daytime} server.
-It then prints the results and closes the connection.
-
-Because this topic is extensive, the use of @command{gawk} for
-TCP/IP programming is documented separately.
address@hidden
-See
address@hidden, , General Introduction, gawkinet, TCP/IP Internetworking with 
@command{gawk}},
address@hidden ifinfo
address@hidden
-See @cite{TCP/IP Internetworking with @command{gawk}},
-which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
-for a much more complete introduction and discussion, as well as
-extensive examples.
+Finally, if @code{_opti} is either zero or greater than or equal to the length
+of the current command-line argument, it means this element in @code{argv} is
+through being processed, so @code{Optind} is incremented to point to the
+next element in @code{argv}.  If neither condition is true, then only
+@code{_opti} is incremented, so that the next option letter can be processed
+on the next call to @code{getopt()}.
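
The option-argument step described earlier (-xFOO versus -x FOO) can be
sketched in C as follows. This is only an illustration of the selection
logic, not the code of any real getopt() implementation. The helper name
take_option_argument() and its parameters are hypothetical. Note that pos
is a zero-based C index, whereas _opti in the awk version is one-based.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the argument for the option letter found at argv[*ind][pos].
 * If characters remain after the letter, they are the argument (-xFOO).
 * Otherwise the next element of argv is consumed (-x FOO) and *ind is
 * advanced to it; the result is NULL if the argument vector ends there.
 */
static const char *
take_option_argument(char *argv[], int *ind, size_t pos)
{
    const char *rest;

    assert(argv != NULL && ind != NULL && argv[*ind] != NULL);
    assert(pos < strlen(argv[*ind]));

    rest = argv[*ind] + pos + 1;    /* text following the option letter */
    if (*rest != '\0')
        return rest;                /* attached form: -xFOO */
    return argv[++*ind];            /* separate form: -x FOO */
}
```

In both forms nothing is left to scan in the element that held the option
letter, which is why the awk version resets _opti to zero afterward.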
 
address@hidden ENDOFRANGE tcpip
+The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
+@code{Opterr} is set to one, since the default behavior is for @code{getopt()}
+to print a diagnostic message upon seeing an invalid option.  @code{Optind}
+is set to one, since there's no reason to look at the program name, which is
+in @code{ARGV[0]}:
 
address@hidden Profiling
address@hidden Profiling Your @command{awk} Programs
address@hidden STARTOFRANGE awkp
address@hidden @command{awk} programs, profiling
address@hidden STARTOFRANGE proawk
address@hidden profiling @command{awk} programs
address@hidden profiling @command{gawk}
address@hidden @code{awkprof.out} file
address@hidden files, @code{awkprof.out}
+@example
+@c file eg/lib/getopt.awk
+BEGIN @{
+    Opterr = 1    # default is to diagnose
+    Optind = 1    # skip ARGV[0]
 
-You may produce execution traces of your @command{awk} programs.
-This is done by passing the option @option{--profile} to @command{gawk}.
-When @command{gawk} has finished running, it creates a profile of your program in a file
-named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
address@hidden normally does.
+    # test program
+    if (_getopt_test) @{
+        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
+            printf("c = <%c>, optarg = <%s>\n",
+                                       _go_c, Optarg)
+        printf("non-option arguments:\n")
+        for (; Optind < ARGC; Optind++)
+            printf("\tARGV[%d] = <%s>\n",
+                                    Optind, ARGV[Optind])
+    @}
+@}
+@c endfile
+@end example
 
address@hidden @code{--profile} option
-As shown in the following example,
-the @option{--profile} option can be used to change the name of the file
-where @command{gawk} will write the profile:
+The rest of the @code{BEGIN} rule is a simple test program.  Here is the
+result of two sample runs of the test program:
 
 @example
-gawk --profile=myprog.prof -f myprog.awk data1 data2
+$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
+@print{} c = <a>, optarg = <>
+@print{} c = <c>, optarg = <>
+@print{} c = <b>, optarg = <ARG>
+@print{} non-option arguments:
+@print{}         ARGV[3] = <bax>
+@print{}         ARGV[4] = <-x>
+
+$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc}
+@print{} c = <a>, optarg = <>
+@error{} x -- invalid option
+@print{} c = <?>, optarg = <>
+@print{} non-option arguments:
+@print{}         ARGV[4] = <xyz>
+@print{}         ARGV[5] = <abc>
 @end example
 
address@hidden
-In the above example, @command{gawk} places the profile in
address@hidden instead of in @file{awkprof.out}.
+In both runs,
+the first @option{--} terminates the arguments to @command{awk}, so that it does
+not try to interpret the @option{-a}, etc., as its own options.
 
-Here is a sample session showing a simple @command{awk} program, its input data, and the
-results from running @command{gawk} with the @option{--profile} option.
-First, the @command{awk} program:
+@quotation NOTE
+After @code{getopt()} is through, it is the responsibility of the user-level
+code to
+clear out all the elements of @code{ARGV} from 1 up to, but not including,
+@code{Optind}, so that @command{awk} does not try to process the command-line
+options as @value{FN}s.
+@end quotation
 
address@hidden
-BEGIN @{ print "First BEGIN rule" @}
+Several of the sample programs presented in
+@ref{Sample Programs},
+use @code{getopt()} to process their arguments.
address@hidden ENDOFRANGE libfclo
address@hidden ENDOFRANGE flibclo
address@hidden ENDOFRANGE clop
address@hidden ENDOFRANGE oclp
 
-END @{ print "First END rule" @}
address@hidden Passwd Functions
address@hidden Reading the User Database
 
-/foo/ @{
-    print "matched /foo/, gosh"
-    for (i = 1; i <= 3; i++)
-        sing()
address@hidden
address@hidden STARTOFRANGE libfudata
address@hidden libraries of @command{awk} functions, user database, reading
address@hidden STARTOFRANGE flibudata
address@hidden functions, library, user database, reading
address@hidden STARTOFRANGE udatar
address@hidden user address@hidden reading
address@hidden STARTOFRANGE dataur
address@hidden database, address@hidden reading
address@hidden @code{PROCINFO} array
+The @code{PROCINFO} array
+(@pxref{Built-in Variables})
+provides access to the current user's real and effective user and group ID
+numbers, and if available, the user's supplementary group set.
+However, because these are numbers, they do not provide very useful
+information to the average user.  There needs to be some way to find the
+user information associated with the user and group ID numbers.  This
+@value{SECTION} presents a suite of functions for retrieving information from the
+user database.  @xref{Group Functions},
+for a similar suite that retrieves information from the group database.
 
address@hidden
-    if (/foo/)
-        print "if is true"
-    else
-        print "else is true"
address@hidden
address@hidden @code{getpwent()} function (C library)
address@hidden @code{getpwent()} user-defined function
address@hidden users, information about, retrieving
address@hidden login information
address@hidden account information
address@hidden password file
address@hidden files, password
+The POSIX standard does not define the file where user information is
+kept.  Instead, it provides the @code{<pwd.h>} header file
+and several C language subroutines for obtaining user information.
+The primary function is @code{getpwent()}, for ``get password entry.''
+The ``password'' comes from the original user database file,
+@file{/etc/passwd}, which stores user information, along with the
+encrypted passwords (hence the name).
 
-BEGIN @{ print "Second BEGIN rule" @}
address@hidden @command{pwcat} program
+While an @command{awk} program could simply read @file{/etc/passwd}
+directly, this file may not contain complete information about the
+system's set of users.@footnote{It is often the case that password
+information is stored in a network database.} To be sure you are able to
+produce a readable and complete version of the user database, it is necessary
+to write a small C program that calls @code{getpwent()}.  @code{getpwent()}
+is defined as returning a pointer to a @code{struct passwd}.  Each time it
+is called, it returns the next entry in the database.  When there are
+no more entries, it returns @code{NULL}, the null pointer.  When this
+happens, the C program should call @code{endpwent()} to close the database.
+Following is @command{pwcat}, a C program that ``cats'' the password database:
 
-END @{ print "Second END rule" @}
+@c Use old style function header for portability to old systems (SunOS, HP/UX).
 
-function sing(    dummy)
address@hidden
address@hidden file eg/lib/pwcat.c
+/*
+ * pwcat.c
+ *
+ * Generate a printable version of the password database
+ */
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
+/*
+ * Arnold Robbins, arnold@@skeeve.com, May 1993
+ * Public Domain
+ * December 2010, move to ANSI C definition for main().
+ */
+
+#if HAVE_CONFIG_H
+#include <config.h>
+#endif
+
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
+#include <stdio.h>
+#include <pwd.h>
+
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
+#if defined (STDC_HEADERS)
+#include <stdlib.h>
+#endif
+
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
+int
+main(int argc, char **argv)
 @{
-    print "I gotta be me!"
address@hidden
address@hidden example
+    struct passwd *p;
 
-Following is the input data:
+    while ((p = getpwent()) != NULL)
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
+#ifdef ZOS_USS
+        printf("%s:%ld:%ld:%s:%s\n",
+            p->pw_name, (long) p->pw_uid,
+            (long) p->pw_gid, p->pw_dir, p->pw_shell);
+#else
+@c endfile
+@end ignore
+@c file eg/lib/pwcat.c
+        printf("%s:%s:%ld:%ld:%s:%s:%s\n",
+            p->pw_name, p->pw_passwd, (long) p->pw_uid,
+            (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
+@c endfile
+@ignore
+@c file eg/lib/pwcat.c
+#endif
+@c endfile
+@end ignore
+@c file eg/lib/pwcat.c
 
-@example
-foo
-bar
-baz
-foo
-junk
+    endpwent();
+    return 0;
+@}
+@c endfile
 @end example
 
-Here is the @file{awkprof.out} that results from running the @command{gawk}
-profiler on this program and data (this example also illustrates that @command{awk}
-programmers sometimes have to work late):
-
-@cindex @code{BEGIN} pattern
-@cindex @code{END} pattern
-@example
-        # gawk profile, created Sun Aug 13 00:00:15 2000
+If you don't understand C, don't worry about it.
+The output from @command{pwcat} is the user database, in the traditional
+@file{/etc/passwd} format of colon-separated fields.  The fields are:
 
-        # BEGIN block(s)
+@table @asis
+@item Login name
+The user's login name.
 
-        BEGIN @{
-     1          print "First BEGIN rule"
-     1          print "Second BEGIN rule"
-        @}
+@item Encrypted password
+The user's encrypted password.  This may not be available on some systems.
 
-        # Rule(s)
+@item User-ID
+The user's numeric user ID number.
+(On some systems it's a C @code{long}, and not an @code{int}.  Thus
+we cast it to @code{long} for all cases.)
 
-     5  /foo/   @{ # 2
-     2          print "matched /foo/, gosh"
-     6          for (i = 1; i <= 3; i++) @{
-     6                  sing()
-                @}
-        @}
+@item Group-ID
+The user's numeric group ID number.
+(Similar comments about @code{long} vs.@: @code{int} apply here.)
 
-     5  @{
-     5          if (/foo/) @{ # 2
-     2                  print "if is true"
-     3          @} else @{
-     3                  print "else is true"
-                @}
-        @}
+@item Full name
+The user's full name, and perhaps other information associated with the
+user.
 
-        # END block(s)
+@item Home directory
+The user's login (or ``home'') directory (familiar to shell programmers as
address@hidden).
 
-        END @{
-     1          print "First END rule"
-     1          print "Second END rule"
-        @}
+@item Login shell
+The program that is run when the user logs in.  This is usually a
+shell, such as Bash.
+@end table
 
-        # Functions, listed alphabetically
+A few lines representative of @command{pwcat}'s output are as follows:
 
-     6  function sing(dummy)
-        @{
-     6          print "I gotta be me!"
-        @}
+@cindex Jacobs, Andrew
+@cindex Robbins, Arnold
+@cindex Robbins, Miriam
+@example
+$ @kbd{pwcat}
+@print{} root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
+@print{} nobody:*:65534:65534::/:
+@print{} daemon:*:1:1::/:
+@print{} sys:*:2:2::/:/bin/csh
+@print{} bin:*:3:3::/bin:
+@print{} arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
+@print{} miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
+@print{} andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
+@dots{}
 @end example
 
-This example illustrates many of the basic features of profiling output.
-They are as follows:
+With that introduction, following is a group of functions for getting user
+information.  There are several functions here, corresponding to the C
+functions of the same names:
 
-@itemize @bullet
-@item
-The program is printed in the order @code{BEGIN} rule,
-@code{BEGINFILE} rule,
-pattern/action rules,
-@code{ENDFILE} rule, @code{END} rule and functions, listed
-alphabetically.
-Multiple @code{BEGIN} and @code{END} rules are merged together,
-as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
+@cindex @code{_pw_init()} user-defined function
+@example
+@c file eg/lib/passwdawk.in
+# passwd.awk --- access password file information
+@c endfile
+@ignore
+@c file eg/lib/passwdawk.in
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
+# Revised October 2000
+# Revised December 2010
+@c endfile
+@end ignore
+@c file eg/lib/passwdawk.in
 
-@cindex patterns, counts
-@item
-Pattern-action rules have two counts.
-The first count, to the left of the rule, shows how many times
-the rule's pattern was @emph{tested}.
-The second count, to the right of the rule's opening left brace
-in a comment,
-shows how many times the rule's action was @emph{executed}.
-The difference between the two indicates how many times the rule's
-pattern evaluated to false.
+BEGIN @{
+    # tailor this to suit your system
+    _pw_awklib = "/usr/local/libexec/awk/"
+@}
 
-@item
-Similarly,
-the count for an @code{if}-@code{else} statement shows how many times
-the condition was tested.
-To the right of the opening left brace for the @code{if}'s body
-is a count showing how many times the condition was true.
-The count for the @code{else}
-indicates how many times the test failed.
+function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+@{
+    if (_pw_inited)
+        return
 
-@cindex loops, count for header
-@item
-The count for a loop header (such as @code{for}
-or @code{while}) shows how many times the loop test was executed.
-(Because of this, you can't just look at the count on the first
-statement in a rule to determine how many times the rule was executed.
-If the first statement is a loop, the count is misleading.)
+    oldfs = FS
+    oldrs = RS
+    olddol0 = $0
+    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+    using_fpat = (PROCINFO["FS"] == "FPAT")
+    FS = ":"
+    RS = "\n"
 
-@cindex functions, user-defined, counts
-@cindex user-defined, functions, counts
-@item
-For user-defined functions, the count next to the @code{function}
-keyword indicates how many times the function was called.
-The counts next to the statements in the body show how many times
-those statements were executed.
+    pwcat = _pw_awklib "pwcat"
+    while ((pwcat | getline) > 0) @{
+        _pw_byname[$1] = $0
+        _pw_byuid[$3] = $0
+        _pw_bycount[++_pw_total] = $0
+    @}
+    close(pwcat)
+    _pw_count = 0
+    _pw_inited = 1
+    FS = oldfs
+    if (using_fw)
+        FIELDWIDTHS = FIELDWIDTHS
+    else if (using_fpat)
+        FPAT = FPAT
+    RS = oldrs
+    $0 = olddol0
+@}
+@c endfile
+@end example
 
-@cindex @code{@{@}} (braces)
-@cindex braces (@code{@{@}})
-@item
-The layout uses ``K&R'' style with TABs.
-Braces are used everywhere, even when
-the body of an @code{if}, @code{else}, or loop is only a single statement.
+@cindex @code{BEGIN} pattern, @code{pwcat} program
+The @code{BEGIN} rule sets a private variable to the directory where
+@command{pwcat} is stored.  Because it is used to help out an @command{awk} library
+routine, we have chosen to put it in @file{/usr/local/libexec/awk};
+however, you might want it to be in a different directory on your system.
 
-@cindex @code{()} (parentheses)
-@cindex parentheses @code{()}
-@item
-Parentheses are used only where needed, as indicated by the structure
-of the program and the precedence rules.
-@c extra verbiage here satisfies the copyeditor. ugh.
-For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
-the total by four.  However, @samp{3 + 5 * 4} has no parentheses, and
-means @samp{3 + (5 * 4)}.
+The function @code{_pw_init()} keeps three copies of the user information
+in three associative arrays.  The arrays are indexed by username
+(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
+occurrence (@code{_pw_bycount}).
+The variable @code{_pw_inited} is used for efficiency, since @code{_pw_init()}
+needs to be called only once.
 
-@ignore
-@item
-All string concatenations are parenthesized too.
-(This could be made a bit smarter.)
-@end ignore
+@cindex @code{getline} command, @code{_pw_init()} function
+Because this function uses @code{getline} to read information from
+@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
+It notes in the variable @code{using_fw} whether field splitting
+with @code{FIELDWIDTHS} is in effect or not.
+Doing so is necessary, since these functions could be called
+from anywhere within a user's program, and the user may have his
+or her
+own way of splitting records and fields.
 
-@item
-Parentheses are used around the arguments to @code{print}
-and @code{printf} only when
-the @code{print} or @code{printf} statement is followed by a redirection.
-Similarly, if
-the target of a redirection isn't a scalar, it gets parenthesized.
+@cindex @code{PROCINFO} array
+The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
+is @code{"FIELDWIDTHS"} if field splitting is being done with
+@code{FIELDWIDTHS}.  This makes it possible to restore the correct
+field-splitting mechanism later.  The test can only be true for
+@command{gawk}.  It is false if using @code{FS} or @code{FPAT},
+or on some other @command{awk} implementation.
 
-@item
-@command{gawk} supplies leading comments in
-front of the @code{BEGIN} and @code{END} rules,
-the pattern/action rules, and the functions.
+The code that checks for using @code{FPAT}, using @code{using_fpat}
+and @code{PROCINFO["FS"]} is similar.
 
-@end itemize
+The main part of the function uses a loop to read database lines, split
+the line into fields, and then store the line into each array as necessary.
+When the loop is done, @code{@w{_pw_init()}} cleans up by closing the pipeline,
+setting @code{@w{_pw_inited}} to one, and restoring @code{FS}
+(and @code{FIELDWIDTHS} or @code{FPAT}
+if necessary), @code{RS}, and @code{$0}.
+The use of @code{@w{_pw_count}} is explained shortly.
 
-The profiled version of your program may not look exactly like what you
-typed when you wrote it.  This is because @command{gawk} creates the
-profiled version by ``pretty printing'' its internal representation of
-the program.  The advantage to this is that @command{gawk} can produce
-a standard representation.  The disadvantage is that all source-code
-comments are lost, as are the distinctions among multiple @code{BEGIN},
-@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules.  Also, things such as:
+@cindex @code{getpwnam()} function (C library)
+The @code{getpwnam()} function takes a username as a string argument. If that
+user is in the database, it returns the appropriate line. Otherwise, it
+relies on the array reference to a nonexistent
+element to create the element with the null string as its value:
 
+@cindex @code{getpwnam()} user-defined function
 @example
-/foo/
+@group
+@c file eg/lib/passwdawk.in
+function getpwnam(name)
+@{
+    _pw_init()
+    return _pw_byname[name]
+@}
+@c endfile
+@end group
 @end example
 
-@noindent
-come out as:
+@cindex @code{getpwuid()} function (C library)
+Similarly,
+the @code{getpwuid} function takes a user ID number argument. If that
+user number is in the database, it returns the appropriate line. Otherwise, it
+returns the null string:
 
+@cindex @code{getpwuid()} user-defined function
 @example
-/foo/   @{
-    print $0
+@c file eg/lib/passwdawk.in
+function getpwuid(uid)
+@{
+    _pw_init()
+    return _pw_byuid[uid]
 @}
+@c endfile
 @end example
 
-@noindent
-which is correct, but possibly surprising.
-
-@cindex profiling @command{awk} programs, dynamically
-@cindex @command{gawk} program, dynamic profiling
-Besides creating profiles when a program has completed,
-@command{gawk} can produce a profile while it is running.
-This is useful if your @command{awk} program goes into an
-infinite loop and you want to see what has been executed.
-To use this feature, run @command{gawk} with the @option{--profile}
-option in the background:
+@cindex @code{getpwent()} function (C library)
+The @code{getpwent()} function simply steps through the database, one entry at
+a time.  It uses @code{_pw_count} to track its current position in the
+@code{_pw_bycount} array:
 
+@cindex @code{getpwent()} user-defined function
 @example
-$ @kbd{gawk --profile -f myprog &}
-[1] 13992
+@c file eg/lib/passwdawk.in
+function getpwent()
+@{
+    _pw_init()
+    if (_pw_count < _pw_total)
+        return _pw_bycount[++_pw_count]
+    return ""
+@}
+@c endfile
 @end example
 
-@cindex @command{kill} command@comma{} dynamic profiling
-@cindex @code{USR1} signal
-@cindex @code{SIGUSR1} signal
-@cindex signals, @code{USR1}/@code{SIGUSR1}
-@noindent
-The shell prints a job number and process ID number; in this case, 13992.
-Use the @command{kill} command to send the @code{USR1} signal
-to @command{gawk}:
+@cindex @code{endpwent()} function (C library)
+The @code{@w{endpwent()}} function resets @code{@w{_pw_count}} to zero, so that
+subsequent calls to @code{getpwent()} start over again:
 
+@cindex @code{endpwent()} user-defined function
 @example
-$ @kbd{kill -USR1 13992}
+@c file eg/lib/passwdawk.in
+function endpwent()
+@{
+    _pw_count = 0
+@}
+@c endfile
 @end example
 
-@noindent
-As usual, the profiled version of the program is written to
-@file{awkprof.out}, or to a different file if one specified with
-the @option{--profile} option.
+A conscious design decision in this suite is that each subroutine calls
+@code{@w{_pw_init()}} to initialize the database arrays.
+The overhead of running
+a separate process to generate the user database, and the I/O to scan it,
+are only incurred if the user's main program actually calls one of these
+functions.  If this library file is loaded along with a user's program, but
+none of the routines are ever called, then there is no extra runtime overhead.
+(The alternative is move the body of @code{@w{_pw_init()}} into a
+@code{BEGIN} rule, which always runs @command{pwcat}.  This simplifies the
+code but runs an extra process that may never be needed.)
 
-Along with the regular profile, as shown earlier, the profile
-includes a trace of any active functions:
+In turn, calling @code{_pw_init()} is not too expensive, because the
+@code{_pw_inited} variable keeps the program from reading the data more than
+once.  If you are worried about squeezing every last cycle out of your
+@command{awk} program, the check of @code{_pw_inited} could be moved out of
+@code{_pw_init()} and duplicated in all the other functions.  In practice,
+this is not necessary, since most @command{awk} programs are I/O-bound,
+and such a change would clutter up the code.
 
-@example
-# Function Call Stack:
+The @command{id} program in @ref{Id Program},
+uses these functions.
+@c ENDOFRANGE libfudata
+@c ENDOFRANGE flibudata
+@c ENDOFRANGE udatar
+@c ENDOFRANGE dataur
 
-#   3. baz
-#   2. bar
-#   1. foo
-# -- main --
-@end example
+@node Group Functions
+@section Reading the Group Database
 
-You may send @command{gawk} the @code{USR1} signal as many times as you like.
-Each time, the profile and function call trace are appended to the output
-profile file.
-
-@cindex @code{HUP} signal
-@cindex @code{SIGHUP} signal
-@cindex signals, @code{HUP}/@code{SIGHUP}
-If you use the @code{HUP} signal instead of the @code{USR1} signal,
-@command{gawk} produces the profile and the function call trace and then exits.
-
-@cindex @code{INT} signal (MS-Windows)
-@cindex @code{SIGINT} signal (MS-Windows)
-@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
-@cindex @code{QUIT} signal (MS-Windows)
-@cindex @code{SIGQUIT} signal (MS-Windows)
-@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
-When @command{gawk} runs on MS-Windows systems, it uses the
-@code{INT} and @code{QUIT} signals for producing the profile and, in
-the case of the @code{INT} signal, @command{gawk} exits.  This is
-because these systems don't support the @command{kill} command, so the
-only signals you can deliver to a program are those generated by the
-keyboard.  The @code{INT} signal is generated by the
address@hidden@address@hidden or @address@hidden@key{BREAK}} key, while the
address@hidden signal is generated by the @address@hidden@key{\}} key.
-
-Finally, @command{gawk} also accepts another option @option{--pretty-print}.
-When called this way, @command{gawk} ``pretty prints'' the program into
-@file{awkprof.out}, without any execution counts.
-@c ENDOFRANGE advgaw
-@c ENDOFRANGE gawadv
-@c ENDOFRANGE awkp
-@c ENDOFRANGE proawk
-
-@node Library Functions
-@chapter A Library of @command{awk} Functions
-@c STARTOFRANGE libf
-@cindex libraries of @command{awk} functions
-@c STARTOFRANGE flib
-@cindex functions, library
-@c STARTOFRANGE fudlib
-@cindex functions, user-defined, library of
-
address@hidden, describes how to write
-your own @command{awk} functions.  Writing functions is important, because
-it allows you to encapsulate algorithms and program tasks in a single
-place.  It simplifies programming, making program development more
-manageable, and making programs more readable.
-
-One valuable way to learn a new programming language is to @emph{read}
-programs in that language.  To that end, this @value{CHAPTER}
-and @ref{Sample Programs},
-provide a good-sized body of code for you to read,
-and hopefully, to learn from.
-
-@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!!
-This @value{CHAPTER} presents a library of useful @command{awk} functions.
-Many of the sample programs presented later in this @value{DOCUMENT}
-use these functions.
-The functions are presented here in a progression from simple to complex.
-
-@cindex Texinfo
address@hidden Program},
-presents a program that you can use to extract the source code for
-these example library functions and programs from the Texinfo source
-for this @value{DOCUMENT}.
-(This has already been done as part of the @command{gawk} distribution.)
+@c STARTOFRANGE libfgdata
+@cindex libraries of @command{awk} functions, group database, reading
+@c STARTOFRANGE flibgdata
+@cindex functions, library, group database, reading
+@c STARTOFRANGE gdatar
+@cindex group database, reading
+@c STARTOFRANGE datagr
+@cindex database, group, reading
+@cindex @code{PROCINFO} array
+@cindex @code{getgrent()} function (C library)
+@cindex @code{getgrent()} user-defined function
+@cindex groups@comma{} information about
+@cindex account information
+@cindex group file
+@cindex files, group
+Much of the discussion presented in
+@ref{Passwd Functions},
+applies to the group database as well.  Although there has traditionally
+been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
+standard only provides a set of C library routines
+(@code{<grp.h>} and @code{getgrent()})
+for accessing the information.
+Even though this file may exist, it may not have
+complete information.  Therefore, as with the user database, it is necessary
+to have a small C program that generates the group database as its output.
+@command{grcat}, a C program that ``cats'' the group database,
+is as follows:
 
-If you have written one or more useful, general-purpose @command{awk} functions
-and would like to contribute them to the @command{awk} user community, see
address@hidden To Contribute}, for more information.
+@cindex @command{grcat} program
+@example
+@c file eg/lib/grcat.c
+/*
+ * grcat.c
+ *
+ * Generate a printable version of the group database
+ */
+@c endfile
+@ignore
+@c file eg/lib/grcat.c
+/*
+ * Arnold Robbins, arnold@@skeeve.com, May 1993
+ * Public Domain
+ * December 2010, move to ANSI C definition for main().
+ */
 
-@cindex portability, example programs
-The programs in this @value{CHAPTER} and in
-@ref{Sample Programs},
-freely use features that are @command{gawk}-specific.
-Rewriting these programs for different implementations of @command{awk}
-is pretty straightforward.
+/* For OS/2, do nothing. */
+#if HAVE_CONFIG_H
+#include <config.h>
+#endif
 
-@itemize @bullet
-@item
-Diagnostic error messages are sent to @file{/dev/stderr}.
-Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"} if your system
-does not have a @file{/dev/stderr}, or if you cannot use @command{gawk}.
+#if defined (STDC_HEADERS)
+#include <stdlib.h>
+#endif
 
-@item
-A number of programs use @code{nextfile}
-(@pxref{Nextfile Statement})
-to skip any remaining input in the input file.
+#ifndef HAVE_GETGRENT
+int main() { return 0; }
+#else
+@c endfile
+@end ignore
+@c file eg/lib/grcat.c
+#include <stdio.h>
+#include <grp.h>
 
-@item
-@c 12/2000: Thanks to Nelson Beebe for pointing out the output issue.
-@cindex case sensitivity, example programs
-@cindex @code{IGNORECASE} variable, in example programs
-Finally, some of the programs choose to ignore upper- and lowercase
-distinctions in their input. They do so by assigning one to @code{IGNORECASE}.
-You can achieve almost the same effect@footnote{The effects are
-not identical.  Output of the transformed
-record will be in all lowercase, while @code{IGNORECASE} preserves the original
-contents of the input record.} by adding the following rule to the
-beginning of the program:
+int
+main(int argc, char **argv)
+@{
+    struct group *g;
+    int i;
 
-@example
-# ignore case
-@{ $0 = tolower($0) @}
+    while ((g = getgrent()) != NULL) @{
+@c endfile
+@ignore
+@c file eg/lib/grcat.c
+#ifdef ZOS_USS
+        printf("%s:%ld:", g->gr_name, (long) g->gr_gid);
+#else
+@c endfile
+@end ignore
+@c file eg/lib/grcat.c
+        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
+                                     (long) g->gr_gid);
+@c endfile
+@ignore
+@c file eg/lib/grcat.c
+#endif
+@c endfile
+@end ignore
+@c file eg/lib/grcat.c
+        for (i = 0; g->gr_mem[i] != NULL; i++) @{
+            printf("%s", g->gr_mem[i]);
+@group
+            if (g->gr_mem[i+1] != NULL)
+                putchar(',');
+        @}
+@end group
+        putchar('\n');
+    @}
+    endgrent();
+    return 0;
+@}
+@c endfile
+@ignore
+@c file eg/lib/grcat.c
+#endif /* HAVE_GETGRENT */
+@c endfile
+@end ignore
 @end example
 
-@noindent
-Also, verify that all regexp and string constants used in
-comparisons use only lowercase letters.
-@end itemize
-
-@menu
-* Library Names::               How to best name private global variables in
-                                library functions.
-* General Functions::           Functions that are of general use.
-* Data File Management::        Functions for managing command-line data
-                                files.
-* Getopt Function::             A function for processing command-line
-                                arguments.
-* Passwd Functions::            Functions for getting user information.
-* Group Functions::             Functions for getting group information.
-* Walking Arrays::              A function to walk arrays of arrays.
-@end menu
-
-@node Library Names
-@section Naming Library Function Global Variables
-
-@cindex names, arrays/variables
-@cindex names, functions
-@cindex namespace issues
-@cindex @command{awk} programs, documenting
-@cindex documentation, of @command{awk} programs
-Due to the way the @command{awk} language evolved, variables are either
-@dfn{global} (usable by the entire program) or @dfn{local} (usable just by
-a specific function).  There is no intermediate state analogous to
-@code{static} variables in C.
-
-@cindex variables, global, for library functions
-@cindex private variables
-@cindex variables, private
-Library functions often need to have global variables that they can use to
-preserve state information between calls to the function---for example,
-@code{getopt()}'s variable @code{_opti}
-(@pxref{Getopt Function}).
-Such variables are called @dfn{private}, since the only functions that need to
-use them are the ones in the library.
+Each line in the group database represents one group.  The fields are
+separated with colons and represent the following information:
 
-When writing a library function, you should try to choose names for your
-private variables that will not conflict with any variables used by
-either another library function or a user's main program.  For example, a
-name like @code{i} or @code{j} is not a good choice, because user programs
-often use variable names like these for their own purposes.
+@table @asis
+@item Group Name
+The group's name.
 
-@cindex programming conventions, private variable names
-The example programs shown in this @value{CHAPTER} all start the names of their
-private variables with an underscore (@samp{_}).  Users generally don't use
-leading underscores in their variable names, so this convention immediately
-decreases the chances that the variable name will be accidentally shared
-with the user's program.
+@item Group Password
+The group's encrypted password. In practice, this field is never used;
+it is usually empty or set to @samp{*}.
 
-@cindex @code{_} (underscore), in names of private variables
-@cindex underscore (@code{_}), in names of private variables
-In addition, several of the library functions use a prefix that helps
-indicate what function or set of functions use the variables---for example,
address@hidden in the user database routines
-(@pxref{Passwd Functions}).
-This convention is recommended, since it even further decreases the
-chance of inadvertent conflict among variable names.  Note that this
-convention is used equally well for variable names and for private
-function address@hidden all the library routines could have
-been rewritten to use this convention, this was not done, in order to
-show how our own @command{awk} programming style has evolved and to
-provide some basis for this discussion.}
+@item Group ID Number
+The group's numeric group ID number;
+this number must be unique within the file.
+(On some systems it's a C @code{long}, and not an @code{int}.  Thus
+we cast it to @code{long} for all cases.)
 
-As a final note on variable naming, if a function makes global variables
-available for use by a main program, it is a good convention to start that
-variable's name with a capital letter---for
-example, @code{getopt()}'s @code{Opterr} and @code{Optind} variables
-(@pxref{Getopt Function}).
-The leading capital letter indicates that it is global, while the fact that
-the variable name is not all capital letters indicates that the variable is
-not one of @command{awk}'s built-in variables, such as @code{FS}.
+@item Group Member List
+A comma-separated list of user names.  These users are members of the group.
+Modern Unix systems allow users to be members of several groups
+simultaneously.  If your system does, then there are elements
+@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO}
+for those group ID numbers.
+(Note that @code{PROCINFO} is a @command{gawk} extension;
address@hidden Variables}.)
+@end table
 
-@cindex @code{--dump-variables} option
-It is also important that @emph{all} variables in library
-functions that do not need to save state are, in fact, declared
-local.@footnote{@command{gawk}'s @option{--dump-variables} command-line
-option is useful for verifying this.} If this is not done, the variable
-could accidentally be used in the user's program, leading to bugs that
-are very difficult to track down:
+Here is what running @command{grcat} might produce:
 
 @example
-function lib_func(x, y,    l1, l2)
-@{
-    @dots{}
-    @var{use variable} some_var   # some_var should be local
-    @dots{}                     # but is not by oversight
-@}
+$ @kbd{grcat}
+@print{} wheel:*:0:arnold
+@print{} nogroup:*:65534:
+@print{} daemon:*:1:
+@print{} kmem:*:2:
+@print{} staff:*:10:arnold,miriam,andy
+@print{} other:*:20:
+@dots{}
 @end example
 
-@cindex arrays, associative, library functions and
-@cindex libraries of @command{awk} functions, associative arrays and
-@cindex functions, library, associative arrays and
-@cindex Tcl
-A different convention, common in the Tcl community, is to use a single
-associative array to hold the values needed by the library function(s), or
-``package.''  This significantly decreases the number of actual global names
-in use.  For example, the functions described in
-@ref{Passwd Functions},
-might have used array elements @address@hidden"inited"]}}, 
@address@hidden"total"]}},
address@hidden@w{PW_data["count"]}}, and @address@hidden"awklib"]}}, instead of
address@hidden@w{_pw_inited}}, @address@hidden, @address@hidden,
-and @address@hidden
-
-The conventions presented in this @value{SECTION} are exactly
-that: conventions. You are not required to write your programs this
-way---we merely recommend that you do so.
-
-@node General Functions
-@section General Programming
-
-This @value{SECTION} presents a number of functions that are of general
-programming use.
-
-@menu
-* Strtonum Function::           A replacement for the built-in
-                                @code{strtonum()} function.
-* Assert Function::             A function for assertions in @command{awk}
-                                programs.
-* Round Function::              A function for rounding if @code{sprintf()}
-                                does not do it correctly.
-* Cliff Random Function::       The Cliff Random Number Generator.
-* Ordinal Functions::           Functions for using characters as numbers and
-                                vice versa.
-* Join Function::               A function to join an array into a string.
-* Getlocaltime Function::       A function to get formatted times.
-@end menu
-
-@node Strtonum Function
-@subsection Converting Strings To Numbers
-
-The @code{strtonum()} function (@pxref{String Functions})
-is a @command{gawk} extension.  The following function
-provides an implementation for other versions of @command{awk}:
+Here are the functions for obtaining information from the group database.
+There are several, modeled after the C library functions of the same names:
 
+@cindex @code{getline} command, @code{_gr_init()} user-defined function
+@cindex @code{_gr_init()} user-defined function
 @example
-@c file eg/lib/strtonum.awk
-# mystrtonum --- convert string to number
-
+@c file eg/lib/groupawk.in
+# group.awk --- functions for dealing with the group file
 @c endfile
 @ignore
-@c file eg/lib/strtonum.awk
+@c file eg/lib/groupawk.in
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-# February, 2004
-
+# May 1993
+# Revised October 2000
+# Revised December 2010
 @c endfile
 @end ignore
-@c file eg/lib/strtonum.awk
-function mystrtonum(str,        ret, chars, n, i, k, c)
+@c line break on _gr_init for smallbook
+@c file eg/lib/groupawk.in
+
+BEGIN    \
 @{
-    if (str ~ /^0[0-7]*$/) @{
-        # octal
-        n = length(str)
-        ret = 0
-        for (i = 1; i <= n; i++) @{
-            c = substr(str, i, 1)
-            if ((k = index("01234567", c)) > 0)
-                k-- # adjust for 1-basing in awk
+    # Change to suit your system
+    _gr_awklib = "/usr/local/libexec/awk/"
+@}
 
-            ret = ret * 8 + k
-        @}
-    @} else if (str ~ /^0[xX][[:xdigit:]]+/) @{
-        # hexadecimal
-        str = substr(str, 3)    # lop off leading 0x
-        n = length(str)
-        ret = 0
-        for (i = 1; i <= n; i++) @{
-            c = substr(str, i, 1)
-            c = tolower(c)
-            if ((k = index("0123456789", c)) > 0)
-                k-- # adjust for 1-basing in awk
-            else if ((k = index("abcdef", c)) > 0)
-                k += 9
+function _gr_init(    oldfs, oldrs, olddol0, grcat,
+                             using_fw, using_fpat, n, a, i)
+@{
+    if (_gr_inited)
+        return
 
-            ret = ret * 16 + k
-        @}
-    @} else if (str ~ \
-  /^[-+]?([0-9]+([.][0-9]*([Ee][0-9]+)?)?|([.][0-9]+([Ee][-+]?[0-9]+)?))$/) @{
-        # decimal number, possibly floating point
-        ret = str + 0
-    @} else
-        ret = "NOT-A-NUMBER"
+    oldfs = FS
+    oldrs = RS
+    olddol0 = $0
+    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+    using_fpat = (PROCINFO["FS"] == "FPAT")
+    FS = ":"
+    RS = "\n"
 
-    return ret
+    grcat = _gr_awklib "grcat"
+    while ((grcat | getline) > 0) @{
+        if ($1 in _gr_byname)
+            _gr_byname[$1] = _gr_byname[$1] "," $4
+        else
+            _gr_byname[$1] = $0
+        if ($3 in _gr_bygid)
+            _gr_bygid[$3] = _gr_bygid[$3] "," $4
+        else
+            _gr_bygid[$3] = $0
+
+        n = split($4, a, "[ \t]*,[ \t]*")
+        for (i = 1; i <= n; i++)
+            if (a[i] in _gr_groupsbyuser)
+                _gr_groupsbyuser[a[i]] = \
+                    _gr_groupsbyuser[a[i]] " " $1
+            else
+                _gr_groupsbyuser[a[i]] = $1
+
+        _gr_bycount[++_gr_count] = $0
+    @}
+    close(grcat)
+    _gr_count = 0
+    _gr_inited++
+    FS = oldfs
+    if (using_fw)
+        FIELDWIDTHS = FIELDWIDTHS
+    else if (using_fpat)
+        FPAT = FPAT
+    RS = oldrs
+    $0 = olddol0
 @}
-
-# BEGIN @{     # gawk test harness
-#     a[1] = "25"
-#     a[2] = ".31"
-#     a[3] = "0123"
-#     a[4] = "0xdeadBEEF"
-#     a[5] = "123.45"
-#     a[6] = "1.e3"
-#     a[7] = "1.32"
-#     a[7] = "1.32E2"
-# 
-#     for (i = 1; i in a; i++)
-#         print a[i], strtonum(a[i]), mystrtonum(a[i])
-# @}
 @c endfile
 @end example
 
-The function first looks for C-style octal numbers (base 8).
-If the input string matches a regular expression describing octal
-numbers, then @code{mystrtonum()} loops through each character in the
-string.  It sets @code{k} to the index in @code{"01234567"} of the current
-octal digit.  Since the return value is one-based, the @samp{k--}
-adjusts @code{k} so it can be used in computing the return value.
+The @code{BEGIN} rule sets a private variable to the directory where
+@command{grcat} is stored.  Because it is used to help out an @command{awk} library
+routine, we have chosen to put it in @file{/usr/local/libexec/awk}.  You might
+want it to be in a different directory on your system.
 
-Similar logic applies to the code that checks for and converts a
-hexadecimal value, which starts with @samp{0x} or @samp{0X}.
-The use of @code{tolower()} simplifies the computation for finding
-the correct numeric value for each hexadecimal digit.
+These routines follow the same general outline as the user database routines
+(@pxref{Passwd Functions}).
+The @code{@w{_gr_inited}} variable is used to
+ensure that the database is scanned no more than once.
+The @code{@w{_gr_init()}} function first saves @code{FS},
+@code{RS}, and
+@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for
+scanning the group information.
+It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
+is being used, and to restore the appropriate field splitting mechanism.
 
-Finally, if the string matches the (rather complicated) regexp for a
-regular decimal integer or floating-point number, the computation
address@hidden = str + 0} lets @command{awk} convert the value to a
-number.
+The group information is stored in several associative arrays.
+The arrays are indexed by group name (@code{@w{_gr_byname}}), by group ID number
+(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}).
+There is an additional array indexed by user name (@code{@w{_gr_groupsbyuser}}),
+which is a space-separated list of groups to which each user belongs.
 
-A commented-out test program is included, so that the function can
-be tested with @command{gawk} and the results compared to the built-in
address@hidden()} function.
+Unlike the user database, it is possible to have multiple records in the
+database for the same group.  This is common when a group has a large number
+of members.  A pair of such entries might look like the following:
 
-@node Assert Function
-@subsection Assertions
+@example
+tvpeople:*:101:johnny,jay,arsenio
+tvpeople:*:101:david,conan,tom,joan
+@end example
 
-@c STARTOFRANGE asse
-@cindex assertions
-@c STARTOFRANGE assef
-@cindex @code{assert()} function (C library)
-@c STARTOFRANGE libfass
-@cindex libraries of @command{awk} functions, assertions
-@c STARTOFRANGE flibass
-@cindex functions, library, assertions
-@cindex @command{awk} programs, lengthy, assertions
-When writing large programs, it is often useful to know
-that a condition or set of conditions is true.  Before proceeding with a
-particular computation, you make a statement about what you believe to be
-the case.  Such a statement is known as an
-@dfn{assertion}.  The C language provides an @code{<assert.h>} header file
-and corresponding @code{assert()} macro that the programmer can use to make
-assertions.  If an assertion fails, the @code{assert()} macro arranges to
-print a diagnostic message describing the condition that should have
-been true but was not, and then it kills the program.  In C, using
-@code{assert()} looks like this:
+For this reason, @code{_gr_init()} looks to see if a group name or
+group ID number has already been seen.  If it has, then the user names are
+simply concatenated onto the previous list of users.  (There is actually a
+subtle problem with the code just presented.  Suppose that
+the first time there were no names. This code adds the names with
+a leading comma. It also doesn't check that there is a @code{$4}.)
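+
+A minimal sketch of one way to avoid both problems follows.  It is not
+part of @file{group.awk}; the helper name @code{_gr_append()} is
+hypothetical, and the sketch assumes that a stored record with no
+members ends in a colon:
+
+@example
+# _gr_append --- hypothetical helper, not in group.awk
+function _gr_append(rec, members)
+@{
+    if (members == "")      # nothing to add (or no $4 at all)
+        return rec
+    if (rec ~ /:$/)         # record has no members yet
+        return rec members
+    return rec "," members
+@}
+
+BEGIN @{
+    print _gr_append("tvpeople:*:101:", "david,conan")
+    print _gr_append("tvpeople:*:101:johnny", "david")
+@}
+@end example
+
+@noindent
+Under those assumptions, the two calls print
+@samp{tvpeople:*:101:david,conan} and
+@samp{tvpeople:*:101:johnny,david}, with no stray comma.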
 
-@example
-#include <assert.h>
+Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
+@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
+initializes @code{_gr_count} to zero
+(it is used later), and makes @code{_gr_inited} nonzero.
 
-int myfunc(int a, double b)
+@cindex @code{getgrnam()} function (C library)
+The @code{getgrnam()} function takes a group name as its argument, and if that
+group exists, it is returned.
+Otherwise, it
+relies on the array reference to a nonexistent
+element to create the element with the null string as its value:
+
+@cindex @code{getgrnam()} user-defined function
+@example
+@c file eg/lib/groupawk.in
+function getgrnam(group)
 @{
-     assert(a <= 5 && b >= 17.1);
-     @dots{}
+    _gr_init()
+    return _gr_byname[group]
 @}
+@c endfile
 @end example
 
-If the assertion fails, the program prints a message similar to this:
+@cindex @code{getgrgid()} function (C library)
+The @code{getgrgid()} function is similar; it takes a numeric group ID and
+looks up the information associated with that group ID:
 
+@cindex @code{getgrgid()} user-defined function
 @example
-prog.c:5: assertion failed: a <= 5 && b >= 17.1
+@c file eg/lib/groupawk.in
+function getgrgid(gid)
+@{
+    _gr_init()
+    return _gr_bygid[gid]
+@}
+@c endfile
 @end example
 
-@cindex @code{assert()} user-defined function
-The C language makes it possible to turn the condition into a string for use
-in printing the diagnostic message.  This is not possible in @command{awk}, so
-this @code{assert()} function also requires a string version of the condition
-that is being tested.
-Following is the function:
+@cindex @code{getgruser()} function (C library)
+The @code{getgruser()} function does not have a C counterpart. It takes a
+user name and returns the list of groups that have the user as a member:
 
+@cindex @code{getgruser()} function, user-defined
 @example
-@c file eg/lib/assert.awk
-# assert --- assert that a condition is true. Otherwise exit.
-
+@c file eg/lib/groupawk.in
+function getgruser(user)
+@{
+    _gr_init()
+    return _gr_groupsbyuser[user]
+@}
 @c endfile
-@ignore
-@c file eg/lib/assert.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May, 1993
+@end example
+
+@cindex @code{getgrent()} function (C library)
+The @code{getgrent()} function steps through the database one entry at a time.
+It uses @code{_gr_count} to track its position in the list:
 
+@cindex @code{getgrent()} user-defined function
+@example
+@c file eg/lib/groupawk.in
+function getgrent()
+@{
+    _gr_init()
+    if (++_gr_count in _gr_bycount)
+        return _gr_bycount[_gr_count]
+    return ""
+@}
 @c endfile
-@end ignore
-@c file eg/lib/assert.awk
-function assert(condition, string)
+@end example
+@c ENDOFRANGE clibf
+
+@cindex @code{endgrent()} function (C library)
+The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can
+start over again:
+
+@cindex @code{endgrent()} user-defined function
+@example
+@c file eg/lib/groupawk.in
+function endgrent()
 @{
-    if (! condition) @{
-        printf("%s:%d: assertion failed: %s\n",
-            FILENAME, FNR, string) > "/dev/stderr"
-        _assert_exit = 1
-        exit 1
-    @}
+    _gr_count = 0
 @}
+@c endfile
+@end example
 
-@group
-END @{
-    if (_assert_exit)
-        exit 1
+As with the user database routines, each function calls @code{_gr_init()} to
+initialize the arrays.  Doing so only incurs the extra overhead of running
+@command{grcat} if these functions are used (as opposed to moving the body of
+@code{_gr_init()} into a @code{BEGIN} rule).
+
+Most of the work is in scanning the database and building the various
+associative arrays.  The functions that the user calls are themselves very
+simple, relying on @command{awk}'s associative arrays to do work.
+
+The @command{id} program in @ref{Id Program},
+uses these functions.
+
+@node Walking Arrays
+@section Traversing Arrays of Arrays
+
+@ref{Arrays of Arrays}, described how @command{gawk}
+provides arrays of arrays.  In particular, any element of
+an array may be either a scalar, or another array. The
+@code{isarray()} function (@pxref{Type Functions})
+lets you distinguish an array
+from a scalar.
+The following function, @code{walk_array()}, recursively traverses
+an array, printing each element's indices and value.
+You call it with the array and a string representing the name
+of the array:
+
+@cindex @code{walk_array()} user-defined function
+@example
+@c file eg/lib/walkarray.awk
+function walk_array(arr, name,      i)
+@{
+    for (i in arr) @{
+        if (isarray(arr[i]))
+            walk_array(arr[i], (name "[" i "]"))
+        else
+            printf("%s[%s] = %s\n", name, i, arr[i])
+    @}
 @}
-@end group
 @c endfile
 @end example
 
-The @code{assert()} function tests the @code{condition} parameter. If it
-is false, it prints a message to standard error, using the @code{string}
-parameter to describe the failed condition.  It then sets the variable
-@code{_assert_exit} to one and executes the @code{exit} statement.
-The @code{exit} statement jumps to the @code{END} rule. If the @code{END}
-rules finds @code{_assert_exit} to be true, it exits immediately.
+@noindent
+It works by looping over each element of the array. If any given
+element is itself an array, the function calls itself recursively,
+passing the subarray and a new string representing the current index.
+Otherwise, the function simply prints the element's name, index, and value.
+Here is a main program to demonstrate:
 
-The purpose of the test in the @code{END} rule is to
-keep any other @code{END} rules from running.  When an assertion fails, the
-program should exit immediately.
-If no assertions fail, then @code{_assert_exit} is still
-false when the @code{END} rule is run normally, and the rest of the
-program's @code{END} rules execute.
-For all of this to work correctly, @file{assert.awk} must be the
-first source file read by @command{awk}.
-The function can be used in a program in the following way:
+@example
+BEGIN @{
+    a[1] = 1
+    a[2][1] = 21
+    a[2][2] = 22
+    a[3] = 3
+    a[4][1][1] = 411
+    a[4][2] = 42
 
-@example
-function myfunc(a, b)
-@{
-     assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1")
-     @dots{}
+    walk_array(a, "a")
 @}
 @end example
 
-@noindent
-If the assertion fails, you see a message similar to the following:
+When run, the program produces the following output:
 
 @example
-mydata:1357: assertion failed: a <= 5 && b >= 17.1
+$ @kbd{gawk -f walk_array.awk}
+@print{} a[4][1][1] = 411
+@print{} a[4][2] = 42
+@print{} a[1] = 1
+@print{} a[2][1] = 21
+@print{} a[2][2] = 22
+@print{} a[3] = 3
 @end example
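+
+The order of the output above reflects the fact that @samp{for (i in arr)}
+visits elements in no particular order.  As a sketch, and assuming a
+@command{gawk} that supports @code{PROCINFO["sorted_in"]}, the traversal
+can be made to follow ascending numeric index order.  The function is
+repeated so that the example stands on its own:
+
+@example
+function walk_array(arr, name,      i)
+@{
+    for (i in arr) @{
+        if (isarray(arr[i]))
+            walk_array(arr[i], (name "[" i "]"))
+        else
+            printf("%s[%s] = %s\n", name, i, arr[i])
+    @}
+@}
+
+BEGIN @{
+    # gawk-specific: visit indices in ascending numeric order
+    PROCINFO["sorted_in"] = "@@ind_num_asc"
+    a[1] = 1
+    a[2][1] = 21
+    a[4][2] = 42
+    a[4][1][1] = 411
+
+    walk_array(a, "a")
+@}
+@end example
+
+@noindent
+With that setting, @code{a[1]} is printed first and the elements of
+@code{a[4]} last.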
 
-@cindex @code{END} pattern, @code{assert()} user-defined function and
-There is a small problem with this version of @code{assert()}.
-An @code{END} rule is automatically added
-to the program calling @code{assert()}.  Normally, if a program consists
-of just a @code{BEGIN} rule, the input files and/or standard input are
-not read. However, now that the program has an @code{END} rule, @command{awk}
-attempts to read the input @value{DF}s or standard input
-(@pxref{Using BEGIN/END}),
-most likely causing the program to hang as it waits for input.
-
-@cindex @code{BEGIN} pattern, @code{assert()} user-defined function and
-There is a simple workaround to this:
-make sure that such a @code{BEGIN} rule always ends
-with an @code{exit} statement.
-@c ENDOFRANGE asse
-@c ENDOFRANGE assef
-@c ENDOFRANGE flibass
-@c ENDOFRANGE libfass
-
-@node Round Function
-@subsection Rounding Numbers
+@c ENDOFRANGE libfgdata
+@c ENDOFRANGE flibgdata
+@c ENDOFRANGE gdatar
+@c ENDOFRANGE libf
+@c ENDOFRANGE flib
+@c ENDOFRANGE fudlib
+@c ENDOFRANGE datagr
 
-@cindex rounding numbers
-@cindex numbers, rounding
-@cindex libraries of @command{awk} functions, rounding numbers
-@cindex functions, library, rounding numbers
-@cindex @code{print} statement, @code{sprintf()} function and
-@cindex @code{printf} statement, @code{sprintf()} function and
-@cindex @code{sprintf()} function, @code{print}/@code{printf} statements and
-The way @code{printf} and @code{sprintf()}
-(@pxref{Printf})
-perform rounding often depends upon the system's C @code{sprintf()}
-subroutine.  On many machines, @code{sprintf()} rounding is ``unbiased,''
-which means it doesn't always round a trailing @samp{.5} up, contrary
-to naive expectations.  In unbiased rounding, @samp{.5} rounds to even,
-rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4.  This means
-that if you are using a format that does rounding (e.g., @code{"%.0f"}),
-you should check what your system does.  The following function does
-traditional rounding; it might be useful if your @command{awk}'s @code{printf}
-does unbiased rounding:
+@node Sample Programs
+@chapter Practical @command{awk} Programs
+@c STARTOFRANGE awkpex
+@cindex @command{awk} programs, examples of
 
-@cindex @code{round()} user-defined function
-@example
-@c file eg/lib/round.awk
-# round.awk --- do normal rounding
-@c endfile
-@ignore
-@c file eg/lib/round.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# August, 1996
-@c endfile
-@end ignore
-@c file eg/lib/round.awk
+@ref{Library Functions},
+presents the idea that reading programs in a language contributes to
+learning that language.  This @value{CHAPTER} continues that theme,
+presenting a potpourri of @command{awk} programs for your reading
+enjoyment.
+@ifnotinfo
+There are three sections.
+The first describes how to run the programs presented
+in this @value{CHAPTER}.
 
-function round(x,   ival, aval, fraction)
-@{
-   ival = int(x)    # integer part, int() truncates
+The second presents @command{awk}
+versions of several common POSIX utilities.
+These are programs that you are hopefully already familiar with,
+and therefore, whose problems are understood.
+By reimplementing these programs in @command{awk},
+you can focus on the @command{awk}-related aspects of solving
+the programming problem.
 
-   # see if fractional part
-   if (ival == x)   # no fraction
-      return ival   # ensure no decimals
+The third is a grab bag of interesting programs.
+These solve a number of different data-manipulation and management
+problems.  Many of the programs are short, which emphasizes @command{awk}'s
+ability to do a lot in just a few lines of code.
+@end ifnotinfo
 
-   if (x < 0) @{
-      aval = -x     # absolute value
-      ival = int(aval)
-      fraction = aval - ival
-      if (fraction >= .5)
-         return int(x) - 1   # -2.5 --> -3
-      else
-         return int(x)       # -2.3 --> -2
-   @} else @{
-      fraction = x - ival
-      if (fraction >= .5)
-         return ival + 1
-      else
-         return ival
-   @}
-@}
-@c endfile
-@c don't include test harness in the file that gets installed
+Many of these programs use library functions presented in
+@ref{Library Functions}.
 
-# test harness
-@{ print $0, round($0) @}
-@end example
+@menu
+* Running Examples::            How to run these examples.
+* Clones::                      Clones of common utilities.
+* Miscellaneous Programs::      Some interesting @command{awk} programs.
+@end menu
 
-@node Cliff Random Function
-@subsection The Cliff Random Number Generator
-@cindex random numbers, Cliff
-@cindex Cliff random numbers
-@cindex numbers, Cliff random
-@cindex functions, library, Cliff random numbers
+@node Running Examples
+@section Running the Example Programs
 
-The
-@uref{http://mathworld.wolfram.com/CliffRandomNumberGenerator.html, Cliff random number generator}
-is a very simple random number generator that ``passes the noise sphere test
-for randomness by showing no structure.''
-It is easily programmed, in less than 10 lines of @command{awk} code:
+To run a given program, you would typically do something like this:
 
-@cindex @code{cliff_rand()} user-defined function
 @example
-@c file eg/lib/cliff_rand.awk
-# cliff_rand.awk --- generate Cliff random numbers
-@c endfile
-@ignore
-@c file eg/lib/cliff_rand.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# December 2000
-@c endfile
-@end ignore
-@c file eg/lib/cliff_rand.awk
-
-BEGIN @{ _cliff_seed = 0.1 @}
-
-function cliff_rand()
-@{
-    _cliff_seed = (100 * log(_cliff_seed)) % 1
-    if (_cliff_seed < 0)
-        _cliff_seed = - _cliff_seed
-    return _cliff_seed
-@}
-@c endfile
+awk -f @var{program} -- @var{options} @var{files}
 @end example
 
-This algorithm requires an initial ``seed'' of 0.1.  Each new value
-uses the current seed as input for the calculation.
-If the built-in @code{rand()} function
-(@pxref{Numeric Functions})
-isn't random enough, you might try using this function instead.
+@noindent
+Here, @var{program} is the name of the @command{awk} program (such as
+@file{cut.awk}), @var{options} are any command-line options for the
+program that start with a @samp{-}, and @var{files} are the actual @value{DF}s.
 
-@node Ordinal Functions
-@subsection Translating Between Characters and Numbers
+If your system supports the @samp{#!} executable interpreter mechanism
+(@pxref{Executable Scripts}),
+you can instead run your program directly:
 
-@cindex libraries of @command{awk} functions, character values as numbers
-@cindex functions, library, character values as numbers
-@cindex characters, values of as numbers
-@cindex numbers, as values of characters
-One commercial implementation of @command{awk} supplies a built-in function,
-@code{ord()}, which takes a character and returns the numeric value for that
-character in the machine's character set.  If the string passed to
-@code{ord()} has more than one character, only the first one is used.
+@example
+cut.awk -c1-8 myfiles > results
+@end example
 
-The inverse of this function is @code{chr()} (from the function of the same
-name in Pascal), which takes a number and returns the corresponding character.
-Both functions are written very nicely in @command{awk}; there is no real
-reason to build them into the @command{awk} interpreter:
+If your @command{awk} is not @command{gawk}, you may instead need to use this:
 
-@cindex @code{ord()} user-defined function
-@cindex @code{chr()} user-defined function
 @example
-@c file eg/lib/ord.awk
-# ord.awk --- do ord and chr
+cut.awk -- -c1-8 myfiles > results
+@end example
 
-# Global identifiers:
-#    _ord_:        numerical values indexed by characters
-#    _ord_init:    function to initialize _ord_
-@c endfile
-@ignore
-@c file eg/lib/ord.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# 16 January, 1992
-# 20 July, 1992, revised
-@c endfile
-@end ignore
-@c file eg/lib/ord.awk
+@node Clones
+@section Reinventing Wheels for Fun and Profit
+@c STARTOFRANGE posimawk
+@cindex POSIX, programs@comma{} implementing in @command{awk}
 
-BEGIN    @{ _ord_init() @}
+This @value{SECTION} presents a number of POSIX utilities implemented in
+@command{awk}.  Reinventing these programs in @command{awk} is often enjoyable,
+because the algorithms can be very clearly expressed, and the code is usually
+very concise and simple.  This is true because @command{awk} does so much for you.
 
-function _ord_init(    low, high, i, t)
-@{
-    low = sprintf("%c", 7) # BEL is ascii 7
-    if (low == "\a") @{    # regular ascii
-        low = 0
-        high = 127
-    @} else if (sprintf("%c", 128 + 7) == "\a") @{
-        # ascii, mark parity
-        low = 128
-        high = 255
-    @} else @{        # ebcdic(!)
-        low = 0
-        high = 255
-    @}
+It should be noted that these programs are not necessarily intended to
+replace the installed versions on your system.
+Nor may all of these programs be fully compliant with the most recent
+POSIX standard.  This is not a problem; their
+purpose is to illustrate @command{awk} language programming for ``real world''
+tasks.
+
+The programs are presented in alphabetical order.
 
-    for (i = low; i <= high; i++) @{
-        t = sprintf("%c", i)
-        _ord_[t] = i
-    @}
-@}
-@c endfile
-@end example
+@menu
+* Cut Program::                 The @command{cut} utility.
+* Egrep Program::               The @command{egrep} utility.
+* Id Program::                  The @command{id} utility.
+* Split Program::               The @command{split} utility.
+* Tee Program::                 The @command{tee} utility.
+* Uniq Program::                The @command{uniq} utility.
+* Wc Program::                  The @command{wc} utility.
+@end menu
 
-@cindex character sets (machine character encodings)
-@cindex ASCII
-@cindex EBCDIC
-@cindex mark parity
-Some explanation of the numbers used by @code{chr} is worthwhile.
-The most prominent character set in use today is ASCII.@footnote{This
-is changing; many systems use Unicode, a very large character set
-that includes ASCII as a subset.  On systems with full Unicode support,
-a character can occupy up to 32 bits, making simple tests such as
-used here prohibitively expensive.}
-Although an
-8-bit byte can hold 256 distinct values (from 0 to 255), ASCII only
-defines characters that use the values from 0 to 127.@footnote{ASCII
-has been extended in many countries to use the values from 128 to 255
-for country-specific characters.  If your  system uses these extensions,
-you can simplify @code{_ord_init} to loop from 0 to 255.}
-In the now distant past,
-at least one minicomputer manufacturer
-@c Pr1me, blech
-used ASCII, but with mark parity, meaning that the leftmost bit in the byte
-is always 1.  This means that on those systems, characters
-have numeric values from 128 to 255.
-Finally, large mainframe systems use the EBCDIC character set, which
-uses all 256 values.
-While there are other character sets in use on some older systems,
-they are not really worth worrying about:
+@node Cut Program
+@subsection Cutting out Fields and Columns
 
-@example
-@c file eg/lib/ord.awk
-function ord(str,    c)
-@{
-    # only first character is of interest
-    c = substr(str, 1, 1)
-    return _ord_[c]
-@}
+@cindex @command{cut} utility
+@c STARTOFRANGE cut
+@cindex @command{cut} utility
+@c STARTOFRANGE ficut
+@cindex fields, cutting
+@c STARTOFRANGE colcut
+@cindex columns, cutting
+The @command{cut} utility selects, or ``cuts,'' characters or fields
+from its standard input and sends them to its standard output.
+Fields are separated by TABs by default,
+but you may supply a command-line option to change the field
+@dfn{delimiter} (i.e., the field-separator character). @command{cut}'s
+definition of fields is less general than @command{awk}'s.
 
-function chr(c)
-@{
-    # force c to be numeric by adding 0
-    return sprintf("%c", c + 0)
-@}
-@c endfile
+A common use of @command{cut} might be to pull out just the login name of
+logged-on users from the output of @command{who}.  For example, the following
+pipeline generates a sorted, unique list of the logged-on users:
 
-#### test code ####
-# BEGIN    \
-# @{
-#    for (;;) @{
-#        printf("enter a character: ")
-#        if (getline var <= 0)
-#            break
-#        printf("ord(%s) = %d\n", var, ord(var))
-#    @}
-# @}
-@c endfile
+@example
+who | cut -c1-8 | sort | uniq
 @end example
 
-An obvious improvement to these functions is to move the code for the
-@code{@w{_ord_init}} function into the body of the @code{BEGIN} rule.  It was
-written this way initially for ease of development.
-There is a ``test program'' in a @code{BEGIN} rule, to test the
-function.  It is commented out for production use.
+The options for @command{cut} are:
 
-@node Join Function
-@subsection Merging an Array into a String
+@table @code
+@item -c @var{list}
+Use @var{list} as the list of characters to cut out.  Items within the list
+may be separated by commas, and ranges of characters can be separated with
+dashes.  The list @samp{1-8,15,22-35} specifies characters 1 through
+8, 15, and 22 through 35.
 
-@cindex libraries of @command{awk} functions, merging arrays into strings
-@cindex functions, library, merging arrays into strings
-@cindex strings, merging arrays into
-@cindex arrays, merging into strings
-When doing string processing, it is often useful to be able to join
-all the strings in an array into one long string.  The following function,
-@code{join()}, accomplishes this task.  It is used later in several of
-the application programs
-(@pxref{Sample Programs}).
+@item -f @var{list}
+Use @var{list} as the list of fields to cut out.
 
-Good function design is important; this function needs to be general but it
-should also have a reasonable default behavior.  It is called with an array
-as well as the beginning and ending indices of the elements in the array to be
-merged.  This assumes that the array indices are numeric---a reasonable
-assumption since the array was likely created with @code{split()}
-(@pxref{String Functions}):
+@item -d @var{delim}
+Use @var{delim} as the field-separator character instead of the TAB
+character.
 
-@cindex @code{join()} user-defined function
+@item -s
+Suppress printing of lines that do not contain the field delimiter.
+@end table
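+
+To make the @var{list} syntax concrete, here is a minimal sketch that
+expands a list into an array of wanted positions.  It is only an
+illustration, not the code used by @file{cut.awk}; the name
+@code{expand_list()} is hypothetical, and open-ended ranges are not handled:
+
+@example
+# expand_list --- hypothetical helper: mark each position named in list
+function expand_list(list, wanted,    n, parts, i, m, range, j)
+@{
+    n = split(list, parts, ",")
+    for (i = 1; i <= n; i++) @{
+        m = split(parts[i], range, "-")
+        if (m == 1)                 # single position, e.g., 15
+            wanted[range[1] + 0] = 1
+        else                        # range, e.g., 22-35
+            for (j = range[1] + 0; j <= range[2] + 0; j++)
+                wanted[j] = 1
+    @}
+@}
+
+BEGIN @{
+    expand_list("1-8,15,22-35", w)
+    for (k in w)
+        count++
+    print count     # 8 + 1 + 14 = 23 positions
+@}
+@end example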
+
+The @command{awk} implementation of @command{cut} uses the @code{getopt()} library
+function (@pxref{Getopt Function})
+and the @code{join()} library function
+(@pxref{Join Function}).
+
+The program begins with a comment describing the options, the library
+functions needed, and a @code{usage()} function that prints out a usage
+message and exits.  @code{usage()} is called if invalid arguments are
+supplied:
+
+@cindex @code{cut.awk} program
 @example
-@c file eg/lib/join.awk
-# join.awk --- join an array into a string
+@c file eg/prog/cut.awk
+# cut.awk --- implement cut in awk
 @c endfile
 @ignore
-@c file eg/lib/join.awk
+@c file eg/prog/cut.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
 @c endfile
 @end ignore
-@c file eg/lib/join.awk
+@c file eg/prog/cut.awk
 
-function join(array, start, end, sep,    result, i)
+# Options:
+#    -f list     Cut fields
+#    -d c        Field delimiter character
+#    -c list     Cut characters
+#
+#    -s          Suppress lines without the delimiter
+#
+# Requires getopt() and join() library functions
+
+@group
+function usage(    e1, e2)
 @{
-    if (sep == "")
-       sep = " "
-    else if (sep == SUBSEP) # magic value
-       sep = ""
-    result = array[start]
-    for (i = start + 1; i <= end; i++)
-        result = result sep array[i]
-    return result
+    e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
+    e2 = "usage: cut [-c list] [files...]"
+    print e1 > "/dev/stderr"
+    print e2 > "/dev/stderr"
+    exit 1
 @}
+@end group
 @c endfile
 @end example
 
-An optional additional argument is the separator to use when joining the
-strings back together.  If the caller supplies a nonempty value,
-@code{join()} uses it; if it is not supplied, it has a null
-value.  In this case, @code{join()} uses a single space as a default
-separator for the strings.  If the value is equal to @code{SUBSEP},
-then @code{join()} joins the strings with no separator between them.
-@code{SUBSEP} serves as a ``magic'' value to indicate that there should
-be no separation between the component strings.@footnote{It would
-be nice if @command{awk} had an assignment operator for concatenation.
-The lack of an explicit operator for concatenation makes string operations
-more difficult than they really need to be.}
-
address@hidden Getlocaltime Function
address@hidden Managing the Time of Day
-
address@hidden libraries of @command{awk} functions, managing, time
address@hidden functions, library, managing time
address@hidden timestamps, formatted
address@hidden time, managing
-The @code{systime()} and @code{strftime()} functions described in
address@hidden Functions},
-provide the minimum functionality necessary for dealing with the time of day
-in human readable form.  While @code{strftime()} is extensive, the control
-formats are not necessarily easy to remember or intuitively obvious when
-reading a program.
address@hidden
+The variables @code{e1} and @code{e2} are used so that the function
+fits nicely on the
address@hidden
+page.
address@hidden ifnotinfo
address@hidden
+screen.
address@hidden ifnottex
 
-The following function, @code{getlocaltime()}, populates a user-supplied array
-with preformatted time information.  It returns a string with the current
-time formatted in the same way as the @command{date} utility:
address@hidden @code{BEGIN} pattern, running @command{awk} programs and
address@hidden @code{FS} variable, running @command{awk} programs and
+Next comes a @code{BEGIN} rule that parses the command-line options.
+It sets @code{FS} to a single TAB character, because that is @command{cut}'s
+default field separator. The rule then sets the output field separator to be the
+same as the input field separator.  A loop using @code{getopt()} steps
+through the command-line options.  Exactly one of the variables
address@hidden or @code{by_chars} is set to true, to indicate that
+processing should be done by fields or by characters, respectively.
+When cutting by characters, the output field separator is set to the null
+string:
 
address@hidden @code{getlocaltime()} user-defined function
 @example
address@hidden file eg/lib/gettime.awk
-# getlocaltime.awk --- get the time of day in a usable format
address@hidden endfile
address@hidden
address@hidden file eg/lib/gettime.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain, May 1993
-#
address@hidden file eg/prog/cut.awk
+BEGIN    \
address@hidden
+    FS = "\t"    # default
+    OFS = FS
+    while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{
+        if (c == "f") @{
+            by_fields = 1
+            fieldlist = Optarg
+        @} else if (c == "c") @{
+            by_chars = 1
+            fieldlist = Optarg
+            OFS = ""
+        @} else if (c == "d") @{
+            if (length(Optarg) > 1) @{
+                printf("Using first character of %s" \
+                       " for delimiter\n", Optarg) > "/dev/stderr"
+                Optarg = substr(Optarg, 1, 1)
+            @}
+            FS = Optarg
+            OFS = FS
+            if (FS == " ")    # defeat awk semantics
+                FS = "[ ]"
+        @} else if (c == "s")
+            suppress++
+        else
+            usage()
+    @}
+
+    # Clear out options
+    for (i = 1; i < Optind; i++)
+        ARGV[i] = ""
 @c endfile
address@hidden ignore
address@hidden file eg/lib/gettime.awk
address@hidden example
 
-# Returns a string in the format of output of date(1)
-# Populates the array argument time with individual values:
-#    time["second"]       -- seconds (0 - 59)
-#    time["minute"]       -- minutes (0 - 59)
-#    time["hour"]         -- hours (0 - 23)
-#    time["althour"]      -- hours (0 - 12)
-#    time["monthday"]     -- day of month (1 - 31)
-#    time["month"]        -- month of year (1 - 12)
-#    time["monthname"]    -- name of the month
-#    time["shortmonth"]   -- short name of the month
-#    time["year"]         -- year modulo 100 (0 - 99)
-#    time["fullyear"]     -- full year
-#    time["weekday"]      -- day of week (Sunday = 0)
-#    time["altweekday"]   -- day of week (Monday = 0)
-#    time["dayname"]      -- name of weekday
-#    time["shortdayname"] -- short name of weekday
-#    time["yearday"]      -- day of year (0 - 365)
-#    time["timezone"]     -- abbreviation of timezone name
-#    time["ampm"]         -- AM or PM designation
-#    time["weeknum"]      -- week number, Sunday first day
-#    time["altweeknum"]   -- week number, Monday first day
address@hidden field separators, spaces as
+The code must take
+special care when the field delimiter is a space.  Using
+a single space (@address@hidden" "}}) for the value of @code{FS} is
address@hidden would separate fields with runs of spaces,
+TABs, and/or newlines, and we want them to be separated with individual
+spaces.  Also remember that after @code{getopt()} is through
+(as described in @ref{Getopt Function}),
+we have to
+clear out all the elements of @code{ARGV} from 1 to @code{Optind},
+so that @command{awk} does not try to process the command-line options
+as @value{FN}s.
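
The difference between a single-space @code{FS} and the bracketed form can be checked from the shell. What follows is a minimal sketch, not part of @file{cut.awk}: the helper name @code{count_fields} is made up for illustration, and a POSIX @command{awk} is assumed to be on the search path.

```shell
# Hypothetical helper (not from cut.awk): report how many fields awk
# sees in a line for a given value of FS.
count_fields() {
    # $1 = value to assign to FS, $2 = the input line
    printf '%s\n' "$2" | awk -v fs="$1" 'BEGIN { FS = fs } { print NF }'
}

count_fields ' '   'a  b c'    # prints 3: runs of blanks act as one separator
count_fields '[ ]' 'a  b c'    # prints 4: every single space separates fields
```

The second call sees an empty field between the two adjacent spaces, which is the behavior @command{cut} needs for @samp{-d ' '}.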
 
-function getlocaltime(time,    ret, now, i)
address@hidden
-    # get time once, avoids unnecessary system calls
-    now = systime()
+After dealing with the command-line options, the program verifies that the
+options make sense.  Only one or the other of @option{-c} and @option{-f}
+should be used, and both require a field list.  Then the program calls
+either @code{set_fieldlist()} or @code{set_charlist()} to pull apart the
+list of fields or characters:
 
-    # return date(1)-style output
-    ret = strftime("%a %b %e %H:%M:%S %Z %Y", now)
address@hidden
address@hidden file eg/prog/cut.awk
+    if (by_fields && by_chars)
+        usage()
 
-    # clear out target array
-    delete time
+    if (by_fields == 0 && by_chars == 0)
+        by_fields = 1    # default
 
-    # fill in values, force numeric values to be
-    # numeric by adding 0
-    time["second"]       = strftime("%S", now) + 0
-    time["minute"]       = strftime("%M", now) + 0
-    time["hour"]         = strftime("%H", now) + 0
-    time["althour"]      = strftime("%I", now) + 0
-    time["monthday"]     = strftime("%d", now) + 0
-    time["month"]        = strftime("%m", now) + 0
-    time["monthname"]    = strftime("%B", now)
-    time["shortmonth"]   = strftime("%b", now)
-    time["year"]         = strftime("%y", now) + 0
-    time["fullyear"]     = strftime("%Y", now) + 0
-    time["weekday"]      = strftime("%w", now) + 0
-    time["altweekday"]   = strftime("%u", now) + 0
-    time["dayname"]      = strftime("%A", now)
-    time["shortdayname"] = strftime("%a", now)
-    time["yearday"]      = strftime("%j", now) + 0
-    time["timezone"]     = strftime("%Z", now)
-    time["ampm"]         = strftime("%p", now)
-    time["weeknum"]      = strftime("%U", now) + 0
-    time["altweeknum"]   = strftime("%W", now) + 0
+    if (fieldlist == "") @{
+        print "cut: needs list for -c or -f" > "/dev/stderr"
+        exit 1
+    @}
 
-    return ret
+    if (by_fields)
+        set_fieldlist()
+    else
+        set_charlist()
 @}
 @c endfile
 @end example
 
-The string indices are easier to use and read than the various formats
-required by @code{strftime()}.  The @code{alarm} program presented in
address@hidden Program},
-uses this function.
-A more general design for the @code{getlocaltime()} function would have
-allowed the user to supply an optional timestamp value to use instead
-of the current time.
-
address@hidden Data File Management
address@hidden @value{DDF} Management
-
address@hidden STARTOFRANGE dataf
address@hidden files, managing
address@hidden STARTOFRANGE libfdataf
address@hidden libraries of @command{awk} functions, managing, @value{DF}s
address@hidden STARTOFRANGE flibdataf
address@hidden functions, library, managing @value{DF}s
-This @value{SECTION} presents functions that are useful for managing
-command-line @value{DF}s.
address@hidden()} splits the field list apart at the commas
+into an array.  Then, for each element of the array, it looks to
+see if the element is actually a range, and if so, splits it apart.
+The function checks the range
+to make sure that the first number is smaller than the second.
+Each number in the list is added to the @code{flist} array, which
+simply lists the fields that will be printed.  Normal field splitting
+is used.  The program lets @command{awk} handle the job of doing the
+field splitting:
 
address@hidden
-* Filetrans Function::          A function for handling data file transitions.
-* Rewind Function::             A function for rereading the current file.
-* File Checking::               Checking that data files are readable.
-* Empty Files::                 Checking for zero-length files.
-* Ignoring Assigns::            Treating assignments as file names.
address@hidden menu
address@hidden
address@hidden file eg/prog/cut.awk
+function set_fieldlist(        n, m, i, j, k, f, g)
address@hidden
+    n = split(fieldlist, f, ",")
+    j = 1    # index in flist
+    for (i = 1; i <= n; i++) @{
+        if (index(f[i], "-") != 0) @{ # a range
+            m = split(f[i], g, "-")
address@hidden
+            if (m != 2 || g[1] >= g[2]) @{
+                printf("bad field list: %s\n",
+                                  f[i]) > "/dev/stderr"
+                exit 1
+            @}
address@hidden group
+            for (k = g[1]; k <= g[2]; k++)
+                flist[j++] = k
+        @} else
+            flist[j++] = f[i]
+    @}
+    nfields = j - 1
address@hidden
address@hidden endfile
address@hidden example
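
The range handling can be tried on its own. The following is a sketch under stated assumptions rather than the function from @file{cut.awk}: the wrapper name @code{expand_list} is hypothetical, the result is printed instead of being stored in @code{flist}, and @samp{+ 0} is added to force numeric comparison of the range endpoints.

```shell
# Hypothetical wrapper (not from cut.awk): expand a list such as
# 1,3-5,8 into the individual field numbers, one space apart.
expand_list() {
    awk -v list="$1" 'BEGIN {
        n = split(list, f, ",")
        sep = ""
        for (i = 1; i <= n; i++) {
            if (index(f[i], "-") != 0) {          # a range such as 3-5
                m = split(f[i], g, "-")
                lo = g[1] + 0; hi = g[2] + 0      # force numeric values
                if (m != 2 || lo >= hi) {
                    print "bad field list: " f[i] > "/dev/stderr"
                    exit 1
                }
                for (k = lo; k <= hi; k++) {
                    out = out sep k
                    sep = " "
                }
            } else {                              # a single field number
                out = out sep f[i]
                sep = " "
            }
        }
        print out
    }'
}

expand_list '1,3-5,8'    # prints 1 3 4 5 8
```

A reversed range such as @samp{5-3} makes the sketch print an error and exit with status 1, mirroring the check described above.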
 
address@hidden Filetrans Function
address@hidden Noting @value{DDF} Boundaries
+The @code{set_charlist()} function is more complicated than
address@hidden()}.
+The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable
+(@pxref{Constant Size}),
+which describes constant-width input.  When using a character list, that is
+exactly what we have.
 
address@hidden files, managing, @value{DF} boundaries
address@hidden files, initialization and cleanup
-The @code{BEGIN} and @code{END} rules are each executed exactly once at
-the beginning and end of your @command{awk} program, respectively
-(@pxref{BEGIN/END}).
-We (the @command{gawk} authors) once had a user who mistakenly thought that the
address@hidden rule is executed at the beginning of each @value{DF} and the
address@hidden rule is executed at the end of each @value{DF}.
+Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
+fields that need to be printed.  We have to keep track of the fields to
+print and also the intervening characters that have to be skipped.
+For example, suppose you wanted characters 1 through 8, 15, and
+22 through 35.  You would use @samp{-c 1-8,15,22-35}.  The necessary value
+for @code{FIELDWIDTHS} is @address@hidden"8 6 1 6 14"}}.  This yields five
+fields, and the fields to print
+are @code{$1}, @code{$3}, and @code{$5}.
+The intermediate fields are @dfn{filler},
+which is stuff in between the desired data.
address@hidden lists the fields to print, and @code{t} tracks the
+complete field list, including filler fields:
 
-When informed
-that this was not the case, the user requested that we add new special
-patterns to @command{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that
-would have the desired behavior.  He even supplied us the code to do so.
address@hidden
address@hidden file eg/prog/cut.awk
+function set_charlist(    field, i, j, f, g, t,
+                          filler, last, len)
address@hidden
+    field = 1   # count total fields
+    n = split(fieldlist, f, ",")
+    j = 1       # index in flist
+    for (i = 1; i <= n; i++) @{
+        if (index(f[i], "-") != 0) @{ # range
+            m = split(f[i], g, "-")
+            if (m != 2 || g[1] >= g[2]) @{
+                printf("bad character list: %s\n",
+                               f[i]) > "/dev/stderr"
+                exit 1
+            @}
+            len = g[2] - g[1] + 1
+            if (g[1] > 1)  # compute length of filler
+                filler = g[1] - last - 1
+            else
+                filler = 0
address@hidden
+            if (filler)
+                t[field++] = filler
address@hidden group
+            t[field++] = len  # length of field
+            last = g[2]
+            flist[j++] = field - 1
+        @} else @{
+            if (f[i] > 1)
+                filler = f[i] - last - 1
+            else
+                filler = 0
+            if (filler)
+                t[field++] = filler
+            t[field++] = 1
+            last = f[i]
+            flist[j++] = field - 1
+        @}
+    @}
+    FIELDWIDTHS = join(t, 1, field - 1)
+    nfields = j - 1
address@hidden
address@hidden endfile
address@hidden example
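
The filler arithmetic can be checked against the @samp{-c 1-8,15,22-35} example given earlier. This is a simplified sketch, not the @code{set_charlist()} function itself: the name @code{char_widths} is hypothetical, the list is assumed to be ascending and non-overlapping, no validation is done, and the widths are printed rather than assigned to @code{FIELDWIDTHS}.

```shell
# Hypothetical helper (not from cut.awk): turn a character list into
# the width string that FIELDWIDTHS would need, filler fields included.
char_widths() {
    awk -v list="$1" 'BEGIN {
        n = split(list, f, ",")
        last = 0          # last character position already covered
        sep = ""
        for (i = 1; i <= n; i++) {
            m = split(f[i], g, "-")
            first = g[1] + 0
            stop = (m == 2) ? g[2] + 0 : first    # single position: stop == first
            filler = first - last - 1             # characters to skip over
            if (filler > 0) {
                out = out sep filler
                sep = " "
            }
            len = stop - first + 1                # width of the wanted field
            out = out sep len
            sep = " "
            last = stop
        }
        print out
    }'
}

char_widths '1-8,15,22-35'    # prints 8 6 1 6 14
```

The output has five widths, of which the first, third, and fifth correspond to the wanted characters and the other two are filler.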
 
-Adding these special patterns to @command{gawk} wasn't necessary;
-the job can be done cleanly in @command{awk} itself, as illustrated
-by the following library program.
-It arranges to call two user-supplied functions, @code{beginfile()} and
address@hidden()}, at the beginning and end of each @value{DF}.
-Besides solving the problem in only nine(!) lines of code, it does so
address@hidden; this works with any implementation of @command{awk}:
+Next is the rule that actually processes the data.  If the @option{-s} option
+is given, then @code{suppress} is true.  The first @code{if} statement
+makes sure that the input record does have the field separator.  If
address@hidden is processing fields, @code{suppress} is true, and the field
+separator character is not in the record, then the record is skipped.
 
address@hidden
-# transfile.awk
-#
-# Give the user a hook for filename transitions
-#
-# The user must supply functions beginfile() and endfile()
-# that each take the name of the file being started or
-# finished, respectively.
address@hidden #
address@hidden # Arnold Robbins, arnold@@skeeve.com, Public Domain
address@hidden # January 1992
+If the record is valid, then @command{gawk} has split the data
+into fields, either using the character in @code{FS} or using fixed-length
+fields and @code{FIELDWIDTHS}.  The loop goes through the list of fields
+that should be printed.  The corresponding field is printed if it contains data.
+If the next field also has data, then the separator character is
+written out between the fields:
 
-FILENAME != _oldfilename \
address@hidden
address@hidden file eg/prog/cut.awk
 @{
-    if (_oldfilename != "")
-        endfile(_oldfilename)
-    _oldfilename = FILENAME
-    beginfile(FILENAME)
address@hidden
+    if (by_fields && suppress && index($0, FS) == 0)
+        next
 
-END   @{ endfile(FILENAME) @}
+    for (i = 1; i <= nfields; i++) @{
+        if ($flist[i] != "") @{
+            printf "%s", $flist[i]
+            if (i < nfields && $flist[i+1] != "")
+                printf "%s", OFS
+        @}
+    @}
+    print ""
address@hidden
address@hidden endfile
 @end example
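
The suppression test described before the rule, which skips a record that does not contain the field separator, can be sketched in isolation. The function name @code{cut_fields_s} and its two parameters are hypothetical; the sketch handles a single field number and a single-character delimiter other than space, and always behaves as if @option{-s} were given.

```shell
# Hypothetical helper (not from cut.awk): print one field per line,
# dropping lines that do not contain the delimiter at all.
cut_fields_s() {
    # $1 = delimiter character, $2 = field number; input comes from stdin
    awk -v d="$1" -v n="$2" '
        BEGIN { FS = d }
        index($0, FS) == 0 { next }    # no delimiter in the record: skip it
        { print $n }'
}

printf 'a:b\nnodelim\nc:d\n' | cut_fields_s : 2
```

With the sample input, only the two lines containing a colon contribute output.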
 
-This file must be loaded before the user's ``main'' program, so that the
-rule it supplies is executed first.
+This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS}
+variable to do the character-based cutting.  While it is possible in
+other @command{awk} implementations to use @code{substr()}
+(@pxref{String Functions}),
+it is also extremely painful.
+The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
+of picking the input line apart by characters.
address@hidden ENDOFRANGE cut
address@hidden ENDOFRANGE ficut
address@hidden ENDOFRANGE colcut
 
-This rule relies on @command{awk}'s @code{FILENAME} variable that
-automatically changes for each new @value{DF}.  The current @value{FN} is
-saved in a private variable, @code{_oldfilename}.  If @code{FILENAME} does
-not equal @code{_oldfilename}, then a new @value{DF} is being processed and
-it is necessary to call @code{endfile()} for the old file.  Because
address@hidden()} should only be called if a file has been processed, the
-program first checks to make sure that @code{_oldfilename} is not the null
-string.  The program then assigns the current @value{FN} to
address@hidden and calls @code{beginfile()} for the file.
-Because, like all @command{awk} variables, @code{_oldfilename} is
-initialized to the null string, this rule executes correctly even for the
-first @value{DF}.
address@hidden Exercise: Rewrite using split with "".
 
-The program also supplies an @code{END} rule to do the final processing for
-the last file.  Because this @code{END} rule comes before any @code{END} rules
-supplied in the ``main'' program, @code{endfile()} is called first.  Once
-again the value of multiple @code{BEGIN} and @code{END} rules should be clear.
address@hidden Egrep Program
address@hidden Searching for Regular Expressions in Files
 
address@hidden @code{beginfile()} user-defined function
address@hidden @code{endfile()} user-defined function
-If the same @value{DF} occurs twice in a row on the command line, then
address@hidden()} and @code{beginfile()} are not executed at the end of the
-first pass and at the beginning of the second pass.
-The following version solves the problem:
address@hidden STARTOFRANGE regexps
address@hidden regular expressions, searching for
address@hidden STARTOFRANGE sfregexp
address@hidden searching, files for regular expressions
address@hidden STARTOFRANGE fsregexp
address@hidden files, searching for regular expressions
address@hidden @command{egrep} utility
+The @command{egrep} utility searches files for patterns.  It uses regular
+expressions that are almost identical to those available in @command{awk}
+(@pxref{Regexp}).
+You invoke it as follows:
 
 @example
address@hidden file eg/lib/ftrans.awk
-# ftrans.awk --- handle data file transitions
-#
-# user supplies beginfile() and endfile() functions
address@hidden endfile
address@hidden
address@hidden file eg/lib/ftrans.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# November 1992
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/ftrans.awk
+egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{}
address@hidden example
 
-FNR == 1 @{
-    if (_filename_ != "")
-        endfile(_filename_)
-    _filename_ = FILENAME
-    beginfile(FILENAME)
address@hidden
+The @var{pattern} is a regular expression.  In typical usage, the regular
+expression is quoted to prevent the shell from expanding any of the
+special characters as @value{FN} wildcards.  Normally, @command{egrep}
+prints the lines that matched.  If multiple @value{FN}s are provided on
+the command line, each output line is preceded by the name of the file
+and a colon.
 
-END  @{ endfile(_filename_) @}
address@hidden endfile
address@hidden example
+The options to @command{egrep} are as follows:
 
address@hidden Program},
-shows how this library function can be used and
-how it simplifies writing the main program.
address@hidden @code
address@hidden -c
+Print out a count of the lines that matched the pattern, instead of the
+lines themselves.
 
address@hidden fakenode --- for prepinfo
address@hidden Advanced Notes: So Why Does @command{gawk} have @code{BEGINFILE} and @code{ENDFILE}?
address@hidden -s
+Be silent.  No output is produced and the exit value indicates whether
+the pattern was matched.
 
-You are probably wondering, if @code{beginfile()} and @code{endfile()}
-functions can do the job, why does @command{gawk} have
address@hidden and @code{ENDFILE} patterns (@pxref{BEGINFILE/ENDFILE})?
address@hidden -v
+Invert the sense of the test. @command{egrep} prints the lines that do
address@hidden match the pattern and exits successfully if the pattern is not
+matched.
 
-Good question.  Normally, if @command{awk} cannot open a file, this
-causes an immediate fatal error.  In this case, there is no way for a
-user-defined function to deal with the problem, since the mechanism for
-calling it relies on the file being open and at the first record.  Thus,
-the main reason for @code{BEGINFILE} is to give you a ``hook'' to catch
-files that cannot be processed.  @code{ENDFILE} exists for symmetry,
-and because it provides an easy way to do per-file cleanup processing.
address@hidden -i
+Ignore case distinctions in both the pattern and the input data.
 
address@hidden Rewind Function
address@hidden Rereading the Current File
address@hidden -l
+Only print (list) the names of the files that matched, not the lines that matched.
 
address@hidden files, reading
-Another request for a new built-in function was for a @code{rewind()}
-function that would make it possible to reread the current file.
-The requesting user didn't want to have to use @code{getline}
-(@pxref{Getline})
-inside a loop.
address@hidden -e @var{pattern}
+Use @var{pattern} as the regexp to match.  The purpose of the @option{-e}
+option is to allow patterns that start with a @samp{-}.
address@hidden table
 
-However, as long as you are not in the @code{END} rule, it is
-quite easy to arrange to immediately close the current input file
-and then start over with it from the top.
-For lack of a better name, we'll call it @code{rewind()}:
+This version uses the @code{getopt()} library function
+(@pxref{Getopt Function})
+and the file transition library program
+(@pxref{Filetrans Function}).
 
address@hidden @code{rewind()} user-defined function
+The program begins with a descriptive comment and then a @code{BEGIN} rule
+that processes the command-line arguments with @code{getopt()}.  The @option{-i}
+(ignore case) option is particularly easy with @command{gawk}; we just use the
address@hidden built-in variable
+(@pxref{Built-in Variables}):
+
address@hidden @code{egrep.awk} program
 @example
address@hidden file eg/lib/rewind.awk
-# rewind.awk --- rewind the current file and start over
address@hidden file eg/prog/egrep.awk
+# egrep.awk --- simulate egrep in awk
+#
 @c endfile
 @ignore
address@hidden file eg/lib/rewind.awk
-#
address@hidden file eg/prog/egrep.awk
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-# September 2000
+# May 1993
+
 @c endfile
 @end ignore
address@hidden file eg/lib/rewind.awk
address@hidden file eg/prog/egrep.awk
+# Options:
+#    -c    count of lines
+#    -s    silent - use exit value
+#    -v    invert test, success if no match
+#    -i    ignore case
+#    -l    print filenames only
+#    -e    argument is pattern
+#
+# Requires getopt and file transition library functions
 
-function rewind(    i)
address@hidden
-    # shift remaining arguments up
-    for (i = ARGC; i > ARGIND; i--)
-        ARGV[i] = ARGV[i-1]
+BEGIN @{
+    while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
+        if (c == "c")
+            count_only++
+        else if (c == "s")
+            no_print++
+        else if (c == "v")
+            invert++
+        else if (c == "i")
+            IGNORECASE = 1
+        else if (c == "l")
+            filenames_only++
+        else if (c == "e")
+            pattern = Optarg
+        else
+            usage()
+    @}
address@hidden endfile
address@hidden example
 
-    # make sure gawk knows to keep going
-    ARGC++
+Next comes the code that handles the @command{egrep}-specific behavior. If no
+pattern is supplied with @option{-e}, the first nonoption on the
+command line is used.  The @command{awk} command-line arguments up to @code{ARGV[Optind]}
+are cleared, so that @command{awk} won't try to process them as files.  If no
+files are specified, the standard input is used, and if multiple files are
+specified, we make sure to note this so that the @value{FN}s can precede the
+matched lines in the output:
 
-    # make current file next to get done
-    ARGV[ARGIND+1] = FILENAME
address@hidden
address@hidden file eg/prog/egrep.awk
+    if (pattern == "")
+        pattern = ARGV[Optind++]
 
-    # do it
-    nextfile
+    for (i = 1; i < Optind; i++)
+        ARGV[i] = ""
+    if (Optind >= ARGC) @{
+        ARGV[1] = "-"
+        ARGC = 2
+    @} else if (ARGC - Optind > 1)
+        do_filenames++
+
+#    if (IGNORECASE)
+#        pattern = tolower(pattern)
 @}
 @c endfile
 @end example
 
-This code relies on the @code{ARGIND} variable
-(@pxref{Auto-set}),
-which is specific to @command{gawk}.
-If you are not using
address@hidden, you can use ideas presented in
address@hidden
-the previous @value{SECTION}
address@hidden ifnotinfo
address@hidden
address@hidden Function},
address@hidden ifinfo
-to either update @code{ARGIND} on your own
-or modify this code as appropriate.
-
-The @code{rewind()} function also relies on the @code{nextfile} keyword
-(@pxref{Nextfile Statement}).
+The last two lines are commented out, since they are not needed in
address@hidden  They should be uncommented if you have to use another version
+of @command{awk}.
 
address@hidden File Checking
address@hidden Checking for Readable @value{DDF}s
+The next set of lines should be uncommented if you are not using
address@hidden  This rule translates all the characters in the input line
+into lowercase if the @option{-i} option is address@hidden
+also introduces a subtle bug;
+if a match happens, we output the translated line, not the original.}
+The rule is
+commented out since it is not necessary with @command{gawk}:
 
address@hidden troubleshooting, readable @value{DF}s
address@hidden readable @address@hidden checking
address@hidden files, skipping
-Normally, if you give @command{awk} a @value{DF} that isn't readable,
-it stops with a fatal error.  There are times when you
-might want to just ignore such files and keep going.  You can
-do this by prepending the following program to your @command{awk}
-program:
address@hidden Exercise: Fix this, w/array and new line as key to original line
 
address@hidden @code{readable.awk} program
 @example
address@hidden file eg/lib/readable.awk
-# readable.awk --- library file to skip over unreadable files
address@hidden file eg/prog/egrep.awk
address@hidden
+#    if (IGNORECASE)
+#        $0 = tolower($0)
address@hidden
 @c endfile
address@hidden
address@hidden file eg/lib/readable.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# October 2000
-# December 2010
address@hidden example
+
+The @code{beginfile()} function is called by the rule in @file{ftrans.awk}
+when each new file is processed.  In this case, it is very simple; all it
+does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
+how many lines in the current file matched the pattern.
+Naming the parameter @code{junk} shows we know that @code{beginfile()}
+is called with a parameter, but that we're not interested in its value:
+
address@hidden
address@hidden file eg/prog/egrep.awk
+function beginfile(junk)
address@hidden
+    fcount = 0
address@hidden
 @c endfile
address@hidden ignore
address@hidden file eg/lib/readable.awk
address@hidden example
 
-BEGIN @{
-    for (i = 1; i < ARGC; i++) @{
-        if (ARGV[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/ \
-            || ARGV[i] == "-" || ARGV[i] == "/dev/stdin")
-            continue    # assignment or standard input
-        else if ((getline junk < ARGV[i]) < 0) # unreadable
-            delete ARGV[i]
+The @code{endfile()} function is called after each file has been processed.
+It affects the output only when the user wants a count of the number of lines that
+matched.  @code{no_print} is true only if the exit status is desired.
address@hidden is true if line counts are desired.  @command{egrep}
+therefore only prints line counts if printing and counting are enabled.
+The output format must be adjusted depending upon the number of files to
+process.  Finally, @code{fcount} is added to @code{total}, so that we
+know the total number of lines that matched the pattern:
+
address@hidden
address@hidden file eg/prog/egrep.awk
+function endfile(file)
address@hidden
+    if (! no_print && count_only) @{
+        if (do_filenames)
+            print file ":" fcount
         else
-            close(ARGV[i])
+            print fcount
     @}
+
+    total += fcount
 @}
 @c endfile
 @end example
 
address@hidden troubleshooting, @code{getline} function
-This works, because the @code{getline} won't be fatal.
-Removing the element from @code{ARGV} with @code{delete}
-skips the file (since it's no longer in the list).
-See also @ref{ARGC and ARGV}.
+The following rule does most of the work of matching lines. The variable
address@hidden is true if the line matched the pattern. If the user
+wants lines that did not match, the sense of @code{matches} is inverted
+using the @samp{!} operator. @code{fcount} is incremented with the value of
address@hidden, which is either one or zero, depending upon a
+successful or unsuccessful match.  If the line does not match, the
address@hidden statement just moves on to the next record.
 
address@hidden Empty Files
address@hidden Checking For Zero-length Files
+A number of additional tests are made, but they are only done if we
+are not counting lines.  First, if the user only wants exit status
+(@code{no_print} is true), then it is enough to know that @emph{one}
+line in this file matched, and we can skip on to the next file with
address@hidden  Similarly, if we are only printing @value{FN}s, we can
+print the @value{FN}, and then skip to the next file with @code{nextfile}.
+Finally, each line is printed, with a leading @value{FN} and colon
+if necessary:
 
-All known @command{awk} implementations silently skip over zero-length files.
-This is a by-product of @command{awk}'s implicit 
-read-a-record-and-match-against-the-rules loop: when @command{awk}
-tries to read a record from an empty file, it immediately receives an
-end of file indication, closes the file, and proceeds on to the next
-command-line @value{DF}, @emph{without} executing any user-level
address@hidden program code.
address@hidden @code{!} (exclamation point), @code{!} operator
address@hidden exclamation point (@code{!}), @code{!} operator
address@hidden
address@hidden file eg/prog/egrep.awk
address@hidden
+    matches = ($0 ~ pattern)
+    if (invert)
+        matches = ! matches
 
-Using @command{gawk}'s @code{ARGIND} variable
-(@pxref{Built-in Variables}), it is possible to detect when an empty
address@hidden has been skipped.  Similar to the library file presented
-in @ref{Filetrans Function}, the following library file calls a function named
address@hidden()} that the user must provide.  The arguments passed are
-the @value{FN} and the position in @code{ARGV} where it was found:
+    fcount += matches    # 1 or 0
 
address@hidden @code{zerofile.awk} program
address@hidden
address@hidden file eg/lib/zerofile.awk
-# zerofile.awk --- library file to process empty input files
address@hidden endfile
address@hidden
address@hidden file eg/lib/zerofile.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# June 2003
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/zerofile.awk
+    if (! matches)
+        next
 
-BEGIN @{ Argind = 0 @}
+    if (! count_only) @{
+        if (no_print)
+            nextfile
 
-ARGIND > Argind + 1 @{
-    for (Argind++; Argind < ARGIND; Argind++)
-        zerofile(ARGV[Argind], Argind)
+        if (filenames_only) @{
+            print FILENAME
+            nextfile
+        @}
+
+        if (do_filenames)
+            print FILENAME ":" $0
+        else
+            print
+    @}
 @}
address@hidden endfile
address@hidden example
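
The counting logic, where a comparison yields one or zero and is added directly to a counter, can be exercised separately. The sketch below is an illustration only: @code{count_matches} and its @code{invert} argument are hypothetical names, input is read from standard input, and per-file handling is left out. Because the pattern is passed with @option{-v}, backslash escapes in it are processed by @command{awk} before matching.

```shell
# Hypothetical helper (not from egrep.awk): count the lines that match
# a pattern, or the lines that do not match when invert is 1.
count_matches() {
    # $1 = regular expression, $2 = 1 to invert the test, 0 otherwise
    awk -v pattern="$1" -v invert="$2" '
        {
            matches = ($0 ~ pattern)    # 1 on a match, 0 otherwise
            if (invert)
                matches = ! matches
            fcount += matches
        }
        END { print fcount + 0 }'
}

printf 'foo\nbar\nfood\n' | count_matches foo 0    # prints 2
printf 'foo\nbar\nfood\n' | count_matches foo 1    # prints 1
```

Adding zero in the @code{END} rule makes the sketch print 0 rather than an empty line when nothing matched.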
 
-ARGIND != Argind @{ Argind = ARGIND @}
+The @code{END} rule takes care of producing the correct exit status. If
+there are no matches, the exit status is one; otherwise it is zero:
 
-END @{
-    if (ARGIND > Argind)
-        for (Argind++; Argind <= ARGIND; Argind++)
-            zerofile(ARGV[Argind], Argind)
address@hidden
address@hidden file eg/prog/egrep.awk
+END    \
address@hidden
+    if (total == 0)
+        exit 1
+    exit 0
 @}
 @c endfile
 @end example
 
-The user-level variable @code{Argind} allows the @command{awk} program
-to track its progress through @code{ARGV}.  Whenever the program detects
-that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or
-more empty files were skipped.  The action then calls @code{zerofile()} for
-each such file, incrementing @code{Argind} along the way.
-
-The @samp{Argind != ARGIND} rule simply keeps @code{Argind} up to date
-in the normal case.
+The @code{usage()} function prints a usage message in case of invalid options,
+and then exits:
 
-Finally, the @code{END} rule catches the case of any empty files at
-the end of the command-line arguments.  Note that the test in the
-condition of the @code{for} loop uses the @samp{<=} operator,
-not @samp{<}.
address@hidden
address@hidden file eg/prog/egrep.awk
+function usage(    e)
address@hidden
+    e = "Usage: egrep [-csvil] [-e pat] [files ...]"
+    e = e "\n\tegrep [-csvil] pat [files ...]"
+    print e > "/dev/stderr"
+    exit 1
address@hidden
address@hidden endfile
address@hidden example
 
-As an exercise, you might consider whether this same problem can
-be solved without relying on @command{gawk}'s @code{ARGIND} variable.
+The variable @code{e} is used so that the function fits nicely
+on the printed page.
 
-As a second exercise, revise this code to handle the case where
-an intervening value in @code{ARGV} is a variable assignment.
address@hidden @code{END} pattern, backslash continuation and
address@hidden @code{\} (backslash), continuing lines and
address@hidden backslash (@code{\}), continuing lines and
+Just a note on programming style: you may have noticed that the @code{END}
+rule uses backslash continuation, with the open brace on a line by
+itself.  This is so that it more closely resembles the way functions
+are written.  Many of the examples
+in this @value{CHAPTER}
+use this style. You can decide for yourself if you like writing
+your @code{BEGIN} and @code{END} rules this way
+or not.
address@hidden ENDOFRANGE regexps
address@hidden ENDOFRANGE sfregexp
address@hidden ENDOFRANGE fsregexp
 
address@hidden
-# zerofile2.awk --- same thing, portably
address@hidden Id Program
address@hidden Printing out User Information
 
-BEGIN @{
-    ARGIND = Argind = 0
-    for (i = 1; i < ARGC; i++)
-        Fnames[ARGV[i]]++
address@hidden printing, user information
address@hidden users, information about, printing
address@hidden @command{id} utility
+The @command{id} utility lists a user's real and effective user ID numbers,
+real and effective group ID numbers, and the user's group set, if any.
address@hidden only prints the effective user ID and group ID if they are
+different from the real ones.  If possible, @command{id} also supplies the
+corresponding user and group names.  The output might look like this:
 
address@hidden
-FNR == 1 @{
-    while (ARGV[ARGIND] != FILENAME)
-        ARGIND++
-    Seen[FILENAME]++
-    if (Seen[FILENAME] == Fnames[FILENAME])
-        do
-            ARGIND++
-        while (ARGV[ARGIND] != FILENAME)
address@hidden
-ARGIND > Argind + 1 @{
-    for (Argind++; Argind < ARGIND; Argind++)
-        zerofile(ARGV[Argind], Argind)
address@hidden
-ARGIND != Argind @{
-    Argind = ARGIND
address@hidden
-END @{
-    if (ARGIND < ARGC - 1)
-        ARGIND = ARGC - 1 
-    if (ARGIND > Argind)
-        for (Argind++; Argind <= ARGIND; Argind++)
-            zerofile(ARGV[Argind], Argind)
address@hidden
address@hidden ignore
address@hidden
+$ @kbd{id}
address@hidden uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
address@hidden example
 
address@hidden Ignoring Assigns
address@hidden Treating Assignments as @value{FFN}s
address@hidden @code{PROCINFO} array
+This information is part of what is provided by @command{gawk}'s
address@hidden array (@pxref{Built-in Variables}).
+However, the @command{id} utility provides a more palatable output than just
+individual numbers.
 
address@hidden assignments as filenames
address@hidden filenames, assignments as
-Occasionally, you might not want @command{awk} to process command-line
-variable assignments
-(@pxref{Assignment Options}).
-In particular, if you have a @value{FN} that contain an @samp{=} character,
address@hidden treats the @value{FN} as an assignment, and does not process it.
+Here is a simple version of @command{id} written in @command{awk}.
+It uses the user database library functions
+(@pxref{Passwd Functions})
+and the group database library functions
+(@pxref{Group Functions}):
 
-Some users have suggested an additional command-line option for @command{gawk}
-to disable command-line assignments.  However, some simple programming with
-a library file does the trick:
+The program is fairly straightforward.  All the work is done in the
address@hidden rule.  The user and group ID numbers are obtained from
address@hidden
+The code is repetitive.  The entry in the user database for the real user ID
+number is split into parts at the @samp{:}. The name is the first field.
+Similar code is used for the effective user ID number and the group
+numbers:
 
address@hidden @code{noassign.awk} program
address@hidden @code{id.awk} program
 @example
address@hidden file eg/lib/noassign.awk
-# noassign.awk --- library file to avoid the need for a
-# special option that disables command-line assignments
address@hidden file eg/prog/id.awk
+# id.awk --- implement id in awk
+#
+# Requires user and group library functions
 @c endfile
 @ignore
address@hidden file eg/lib/noassign.awk
address@hidden file eg/prog/id.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-# October 1999
+# May 1993
+# Revised February 1996
+
 @c endfile
 @end ignore
address@hidden file eg/lib/noassign.awk
address@hidden file eg/prog/id.awk
+# output is:
+# uid=12(foo) euid=34(bar) gid=3(baz) \
+#             egid=5(blat) groups=9(nine),2(two),1(one)
 
-function disable_assigns(argc, argv,    i)
address@hidden
+BEGIN    \
 @{
-    for (i = 1; i < argc; i++)
-        if (argv[i] ~ /^[[:alpha:]_][[:alnum:]_]*=.*/)
-            argv[i] = ("./" argv[i])
address@hidden
-
-BEGIN @{
-    if (No_command_assign)
-        disable_assigns(ARGC, ARGV)
address@hidden
address@hidden endfile
address@hidden example
-
-You then run your program this way:
-
address@hidden
-awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk *
address@hidden example
-
-The function works by looping through the arguments.
-It prepends @samp{./} to
-any argument that matches the form
-of a variable assignment, turning that argument into a @value{FN}.
-
-The use of @code{No_command_assign} allows you to disable command-line
-assignments at invocation time, by giving the variable a true value.
-When not set, it is initially zero (i.e., false), so the command-line arguments
-are left alone.
address@hidden ENDOFRANGE dataf
address@hidden ENDOFRANGE flibdataf
address@hidden ENDOFRANGE libfdataf
+    uid = PROCINFO["uid"]
+    euid = PROCINFO["euid"]
+    gid = PROCINFO["gid"]
+    egid = PROCINFO["egid"]
address@hidden group
 
address@hidden Getopt Function
address@hidden Processing Command-Line Options
+    printf("uid=%d", uid)
+    pw = getpwuid(uid)
+    if (pw != "") @{
+        split(pw, a, ":")
+        printf("(%s)", a[1])
+    @}
 
address@hidden STARTOFRANGE libfclo
address@hidden libraries of @command{awk} functions, command-line options
address@hidden STARTOFRANGE flibclo
address@hidden functions, library, command-line options
address@hidden STARTOFRANGE clop
address@hidden command-line options, processing
address@hidden STARTOFRANGE oclp
address@hidden options, command-line, processing
address@hidden STARTOFRANGE clibf
address@hidden functions, library, C library
address@hidden arguments, processing
-Most utilities on POSIX compatible systems take options on
-the command line that can be used to change the way a program behaves.
address@hidden is an example of such a program
-(@pxref{Options}).
-Often, options take @dfn{arguments}; i.e., data that the program needs to
-correctly obey the command-line option.  For example, @command{awk}'s
address@hidden option requires a string to use as the field separator.
-The first occurrence on the command line of either @option{--} or a
-string that does not begin with @samp{-} ends the options.
+    if (euid != uid) @{
+        printf(" euid=%d", euid)
+        pw = getpwuid(euid)
+        if (pw != "") @{
+            split(pw, a, ":")
+            printf("(%s)", a[1])
+        @}
+    @}
 
address@hidden @code{getopt()} function (C library)
-Modern Unix systems provide a C function named @code{getopt()} for processing
-command-line arguments.  The programmer provides a string describing the
-one-letter options. If an option requires an argument, it is followed in the
-string with a colon.  @code{getopt()} is also passed the
-count and values of the command-line arguments and is called in a loop.
address@hidden()} processes the command-line arguments for option letters.
-Each time around the loop, it returns a single character representing the
-next option letter that it finds, or @samp{?} if it finds an invalid option.
-When it returns @minus{}1, there are no options left on the command line.
+    printf(" gid=%d", gid)
+    pw = getgrgid(gid)
+    if (pw != "") @{
+        split(pw, a, ":")
+        printf("(%s)", a[1])
+    @}
 
-When using @code{getopt()}, options that do not take arguments can be
-grouped together.  Furthermore, options that take arguments require that the
-argument be present.  The argument can immediately follow the option letter,
-or it can be a separate command-line argument.
+    if (egid != gid) @{
+        printf(" egid=%d", egid)
+        pw = getgrgid(egid)
+        if (pw != "") @{
+            split(pw, a, ":")
+            printf("(%s)", a[1])
+        @}
+    @}
 
-Given a hypothetical program that takes
-three command-line options, @option{-a}, @option{-b}, and @option{-c}, where
address@hidden requires an argument, all of the following are valid ways of
-invoking the program:
+    for (i = 1; ("group" i) in PROCINFO; i++) @{
+        if (i == 1)
+            printf(" groups=")
+        group = PROCINFO["group" i]
+        printf("%d", group)
+        pw = getgrgid(group)
+        if (pw != "") @{
+            split(pw, a, ":")
+            printf("(%s)", a[1])
+        @}
+        if (("group" (i+1)) in PROCINFO)
+            printf(",")
+    @}
 
address@hidden
-prog -a -b foo -c data1 data2 data3
-prog -ac -bfoo -- data1 data2 data3
-prog -acbfoo data1 data2 data3
+    print ""
address@hidden
address@hidden endfile
 @end example
 
-Notice that when the argument is grouped with its option, the rest of
-the argument is considered to be the option's argument.
-In this example, @option{-acbfoo} indicates that all of the
address@hidden, @option{-b}, and @option{-c} options were supplied,
-and that @samp{foo} is the argument to the @option{-b} option.
address@hidden @code{in} operator
+The test in the @code{for} loop is worth noting.
+Any supplementary groups in the @code{PROCINFO} array have the
+indices @code{"group1"} through @code{"address@hidden"} for some
address@hidden, i.e., the total number of supplementary groups.
+However, we don't know in advance how many of these groups
+there are.
 
address@hidden()} provides four external variables that the programmer can use:
+This loop works by starting at one, concatenating the value with
address@hidden"group"}, and then using @code{in} to see if that value is
+in the array.  Eventually, @code{i} is incremented past
+the last group in the array and the loop exits.
 
address@hidden @code
address@hidden optind
-The index in the argument value array (@code{argv}) where the first
-nonoption command-line argument can be found.
+The loop is also correct if there are @emph{no} supplementary
+groups; then the condition is false the first time it's
+tested, and the loop body never executes.
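The probing loop described above can be tried without gawk's PROCINFO array. The following is only a sketch under stated assumptions: the shell wrapper probe_groups, the array name info, and the values 101, 102, and so on are invented for the demonstration, and nothing beyond POSIX awk is assumed.

```shell
# Hypothetical demo (not part of id.awk): fill an ordinary array with
# keys "group1".."groupN", then walk it with the same `in' test that
# the id program applies to PROCINFO.
probe_groups() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++)
            info["group" i] = 100 + i      # made-up group IDs

        out = ""
        # Stop at the first index that is missing from the array.
        for (i = 1; ("group" i) in info; i++) {
            out = out info["group" i]
            if (("group" (i + 1)) in info)
                out = out ","
        }
        print "groups=" out
    }'
}
```

Calling probe_groups 0 exercises the empty case: the condition is false on the first test, so nothing follows the equals sign.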
 
address@hidden optarg
-The string value of the argument to an option.
address@hidden exercise!!!
address@hidden
+The POSIX version of @command{id} takes arguments that control which
+information is printed.  Modify this version to accept the same
+arguments and perform in the same way.
address@hidden ignore
 
address@hidden opterr
-Usually @code{getopt()} prints an error message when it finds an invalid
-option.  Setting @code{opterr} to zero disables this feature.  (An
-application might want to print its own error message.)
address@hidden Split Program
address@hidden Splitting a Large File into Pieces
 
address@hidden optopt
-The letter representing the command-line option.
address@hidden While not usually documented, most versions supply this variable.
address@hidden table
address@hidden FIXME: One day, update to current POSIX version of split
 
-The following C fragment shows how @code{getopt()} might process command-line
-arguments for @command{awk}:
address@hidden STARTOFRANGE filspl
address@hidden files, splitting
address@hidden @code{split} utility
+The @command{split} program splits large text files into smaller pieces.
+Usage is as follows:@footnote{This is the traditional usage. The
+POSIX usage is different, but not relevant for what the program
+aims to demonstrate.}
 
 @example
-int
-main(int argc, char *argv[])
address@hidden
-    @dots{}
-    /* print our own message */
-    opterr = 0;
-    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) @{
-        switch (c) @{
-        case 'f':    /* file */
-            @dots{}
-            break;
-        case 'F':    /* field separator */
-            @dots{}
-            break;
-        case 'v':    /* variable assignment */
-            @dots{}
-            break;
-        case 'W':    /* extension */
-            @dots{}
-            break;
-        case '?':
-        default:
-            usage();
-            break;
-        @}
-    @}
-    @dots{}
address@hidden
+split @address@hidden@r{]} file @r{[} @var{prefix} @r{]}
 @end example
 
-As a side point, @command{gawk} actually uses the GNU @code{getopt_long()}
-function to process both normal and GNU-style long options
-(@pxref{Options}).
-
-The abstraction provided by @code{getopt()} is very useful and is quite
-handy in @command{awk} programs as well.  Following is an @command{awk}
-version of @code{getopt()}.  This function highlights one of the
-greatest weaknesses in @command{awk}, which is that it is very poor at
-manipulating single characters.  Repeated calls to @code{substr()} are
-necessary for accessing individual characters
-(@pxref{String Functions})address@hidden
-function was written before @command{gawk} acquired the ability to
-split strings into single characters using @code{""} as the separator.
-We have left it alone, since using @code{substr()} is more portable.}
address@hidden FIXME: could use split(str, a, "") to do it more easily.
+By default,
+the output files are named @file{xaa}, @file{xab}, and so on. Each file has
+1000 lines in it, with the likely exception of the last file. To change the
+number of lines in each file, supply a number on the command line
+preceded with a minus; e.g., @samp{-500} for files with 500 lines in them
+instead of 1000.  To change the name of the output files to something like
address@hidden, @file{myfileab}, and so on, supply an additional
+argument that specifies the @value{FN} prefix.
 
-The discussion that follows walks through the code a bit at a time:
+Here is a version of @command{split} in @command{awk}. It uses the
address@hidden()} and @code{chr()} functions presented in
address@hidden Functions}.
 
address@hidden @code{getopt()} user-defined function
+The program first sets its defaults, and then tests to make sure there are
+not too many arguments.  It then looks at each argument in turn.  The
+first argument could be a minus sign followed by a number. If it is, this happens
+to look like a negative number, so it is made positive, and that is the
+count of lines.  The data @value{FN} is skipped over and the final argument
+is used as the prefix for the output @value{FN}s:
+
address@hidden @code{split.awk} program
 @example
address@hidden file eg/lib/getopt.awk
-# getopt.awk --- Do C library getopt(3) function in awk
address@hidden file eg/prog/split.awk
+# split.awk --- do split in awk
+#
+# Requires ord() and chr() library functions
 @c endfile
 @ignore
address@hidden file eg/lib/getopt.awk
address@hidden file eg/prog/split.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-#
-# Initial version: March, 1991
-# Revised: May, 1993
+# May 1993
+
 @c endfile
 @end ignore
address@hidden file eg/lib/getopt.awk
address@hidden file eg/prog/split.awk
+# usage: split [-num] [file] [outname]
 
-# External variables:
-#    Optind -- index in ARGV of first nonoption argument
-#    Optarg -- string value of argument to current option
-#    Opterr -- if nonzero, print our own diagnostic
-#    Optopt -- current option letter
+BEGIN @{
+    outfile = "x"    # default
+    count = 1000
+    if (ARGC > 4)
+        usage()
 
-# Returns:
-#    -1     at end of options
-#    "?"    for unrecognized option
-#    <c>    a character representing the current option
+    i = 1
+    if (ARGV[i] ~ /^-[[:digit:]]+$/) @{
+        count = -ARGV[i]
+        ARGV[i] = ""
+        i++
+    @}
+    # test argv in case reading from stdin instead of file
+    if (i in ARGV)
+        i++    # skip data file name
+    if (i in ARGV) @{
+        outfile = ARGV[i]
+        ARGV[i] = ""
+    @}
 
-# Private Data:
-#    _opti  -- index in multi-flag option, e.g., -abc
+    s1 = s2 = "a"
+    out = (outfile s1 s2)
address@hidden
 @c endfile
 @end example
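The line-count handling in the BEGIN rule relies on awk converting a string such as -500 to a number when unary minus is applied. A small sketch of just that step follows; the wrapper name parse_count is hypothetical, and [0-9] is used in place of [[:digit:]] only to stay portable to older awk implementations.

```shell
# Hypothetical helper: report the line count that the first argument
# would select, or the default of 1000 when it is not of the form -NUM.
parse_count() {
    awk -v arg="$1" 'BEGIN {
        count = 1000                 # default, as in split.awk
        if (arg ~ /^-[0-9]+$/)
            count = -arg             # "-500" looks negative; negate it
        print count
    }'
}
```

For example, parse_count -500 selects 500 lines per file, while parse_count data.txt leaves the default in place.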
 
-The function starts out with comments presenting
-a list of the global variables it uses,
-what the return values are, what they mean, and any global variables that
-are ``private'' to this library function.  Such documentation is essential
-for any program, and particularly for library functions.
-
-The @code{getopt()} function first checks that it was indeed called with
-a string of options (the @code{options} parameter).  If @code{options}
-has a zero length, @code{getopt()} immediately returns @minus{}1:
+The next rule does most of the work. @code{tcount} (temporary count) tracks
+how many lines have been printed to the output file so far. If it is greater
+than @code{count}, it is time to close the current file and start a new one.
address@hidden and @code{s2} track the current suffixes for the @value{FN}. If
+they are both @samp{z}, the file is just too big.  Otherwise, @code{s1}
+moves to the next letter in the alphabet and @code{s2} starts over again at
address@hidden:
 
address@hidden @code{getopt()} user-defined function
address@hidden else on separate line here for page breaking
 @example
address@hidden file eg/lib/getopt.awk
-function getopt(argc, argv, options,    thisopt, i)
address@hidden file eg/prog/split.awk
 @{
-    if (length(options) == 0)    # no options given
-        return -1
-
+    if (++tcount > count) @{
+        close(out)
+        if (s2 == "z") @{
+            if (s1 == "z") @{
+                printf("split: %s is too large to split\n",
+                       FILENAME) > "/dev/stderr"
+                exit 1
+            @}
+            s1 = chr(ord(s1) + 1)
+            s2 = "a"
+        @}
 @group
-    if (argv[Optind] == "--") @{  # all done
-        Optind++
-        _opti = 0
-        return -1
+        else
+            s2 = chr(ord(s2) + 1)
 @end group
-    @} else if (argv[Optind] !~ /^-[^:[:space:]]/) @{
-        _opti = 0
-        return -1
+        out = (outfile s1 s2)
+        tcount = 1
     @}
+    print > out
address@hidden
 @c endfile
 @end example
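The suffix stepping can also be written without ord() and chr(), by looking letters up in an explicit alphabet string with index() and substr(). This is a sketch of that alternative, not the code in split.awk; the wrapper next_suffix and the variable letters are assumptions made for illustration. Because it does not do arithmetic on character codes, it does not depend on letters being contiguous in the character set.

```shell
# Hypothetical helper: given a two-letter suffix, print the next one
# in the sequence aa, ab, ..., az, ba, ..., zz.
next_suffix() {
    awk -v s="$1" 'BEGIN {
        letters = "abcdefghijklmnopqrstuvwxyz"
        s1 = substr(s, 1, 1)
        s2 = substr(s, 2, 1)
        if (s2 == "z") {
            if (s1 == "z") {         # no suffixes left after zz
                print "overflow"
                exit 1
            }
            s1 = substr(letters, index(letters, s1) + 1, 1)
            s2 = "a"
        } else
            s2 = substr(letters, index(letters, s2) + 1, 1)
        print s1 s2
    }'
}
```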
 
-The next thing to check for is the end of the options.  A @option{--}
-ends the command-line options, as does any command-line argument that
-does not begin with a @samp{-}.  @code{Optind} is used to step through
-the array of command-line arguments; it retains its value across calls
-to @code{getopt()}, because it is a global variable.
address@hidden Exercise: do this with just awk builtin functions, index("abc..."), substr, etc.
 
-The regular expression that is used, @address@hidden/^-[^:[:space:]/}},
-checks for a @samp{-} followed by anything
-that is not whitespace and not a colon.
-If the current command-line argument does not match this pattern,
-it is not an option, and it ends option processing. Continuing on:
address@hidden
+The @code{usage()} function simply prints an error message and exits:
 
 @example
address@hidden file eg/lib/getopt.awk
-    if (_opti == 0)
-        _opti = 2
-    thisopt = substr(argv[Optind], _opti, 1)
-    Optopt = thisopt
-    i = index(options, thisopt)
-    if (i == 0) @{
-        if (Opterr)
-            printf("%c -- invalid option\n",
-                                  thisopt) > "/dev/stderr"
-        if (_opti >= length(argv[Optind])) @{
-            Optind++
-            _opti = 0
-        @} else
-            _opti++
-        return "?"
address@hidden file eg/prog/split.awk
+function usage(   e)
address@hidden
+    e = "usage: split [-num] [file] [outname]"
+    print e > "/dev/stderr"
+    exit 1
address@hidden
address@hidden endfile
address@hidden example
+
address@hidden
+The variable @code{e} is used so that the function
+fits nicely on the
address@hidden
+screen.
address@hidden ifinfo
address@hidden
+page.
address@hidden ifnotinfo
+
+This program is a bit sloppy; it relies on @command{awk} to automatically close the last file
+instead of doing it in an @code{END} rule.
+It also assumes that letters are contiguous in the character set,
+which isn't true for EBCDIC systems.
+
address@hidden Exercise: Fix these problems.
address@hidden BFD...
address@hidden ENDOFRANGE filspl
+
address@hidden Tee Program
address@hidden Duplicating Output into Multiple Files
+
address@hidden files, address@hidden duplicating output into
address@hidden output, duplicating into files
address@hidden @code{tee} utility
+The @code{tee} program is known as a ``pipe fitting.''  @code{tee} copies
+its standard input to its standard output and also duplicates it to the
+files named on the command line.  Its usage is as follows:
+
address@hidden
+tee @address@hidden file @dots{}
address@hidden example
+
+The @option{-a} option tells @code{tee} to append to the named files, instead of
+truncating them and starting over.
+
+The @code{BEGIN} rule first makes a copy of all the command-line arguments
+into an array named @code{copy}.
address@hidden is not copied, since it is not needed.
address@hidden cannot use @code{ARGV} directly, since @command{awk} attempts to
+process each @value{FN} in @code{ARGV} as input data.
+
address@hidden flag variables
+If the first argument is @option{-a}, then the flag variable
address@hidden is set to true, and both @code{ARGV[1]} and
address@hidden are deleted. If @code{ARGC} is less than two, then no
address@hidden were supplied and @code{tee} prints a usage message and exits.
+Finally, @command{awk} is forced to read the standard input by setting
address@hidden to @code{"-"} and @code{ARGC} to two:
+
address@hidden @code{tee.awk} program
address@hidden
address@hidden file eg/prog/tee.awk
+# tee.awk --- tee in awk
+#
+# Copy standard input to all named output files.
+# Append content if -a option is supplied.
+#
address@hidden endfile
address@hidden
address@hidden file eg/prog/tee.awk
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
+# Revised December 1995
+
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/tee.awk
+BEGIN    \
address@hidden
+    for (i = 1; i < ARGC; i++)
+        copy[i] = ARGV[i]
+
+    if (ARGV[1] == "-a") @{
+        append = 1
+        delete ARGV[1]
+        delete copy[1]
+        ARGC--
+    @}
+    if (ARGC < 2) @{
+        print "usage: tee [-a] file ..." > "/dev/stderr"
+        exit 1
     @}
+    ARGV[1] = "-"
+    ARGC = 2
address@hidden
address@hidden endfile
address@hidden example
+
+The following single rule does all the work.  Since there is no pattern, it is
+executed for each line of input.  The body of the rule simply prints the
+line into each file on the command line, and then to the standard output:
+
address@hidden
address@hidden file eg/prog/tee.awk
address@hidden
+    # moving the if outside the loop makes it run faster
+    if (append)
+        for (i in copy)
+            print >> copy[i]
+    else
+        for (i in copy)
+            print > copy[i]
+    print
address@hidden
 @c endfile
 @end example
 
-The @code{_opti} variable tracks the position in the current command-line
-argument (@code{argv[Optind]}).  If multiple options are
-grouped together with one @samp{-} (e.g., @option{-abx}), it is necessary
-to return them to the user one at a time.
-
-If @code{_opti} is equal to zero, it is set to two, which is the index in
-the string of the next character to look at (we skip the @samp{-}, which
-is at position one).  The variable @code{thisopt} holds the character,
-obtained with @code{substr()}.  It is saved in @code{Optopt} for the main
-program to use.
address@hidden
+It is also possible to write the loop this way:
 
-If @code{thisopt} is not in the @code{options} string, then it is an
-invalid option.  If @code{Opterr} is nonzero, @code{getopt()} prints an error
-message on the standard error that is similar to the message from the C
-version of @code{getopt()}.
address@hidden
+for (i in copy)
+    if (append)
+        print >> copy[i]
+    else
+        print > copy[i]
address@hidden example
 
-Because the option is invalid, it is necessary to skip it and move on to the
-next option character.  If @code{_opti} is greater than or equal to the
-length of the current command-line argument, it is necessary to move on
-to the next argument, so @code{Optind} is incremented and @code{_opti} is reset
-to zero. Otherwise, @code{Optind} is left alone and @code{_opti} is merely
-incremented.
address@hidden
+This is more concise but it is also less efficient.  The @samp{if} is
+tested for each record and for each output file.  By duplicating the loop
+body, the @samp{if} is only tested once for each input record.  If there are
address@hidden input records and @var{M} output files, the first method only
+executes @var{N} @samp{if} statements, while the second executes
address@hidden@address@hidden @samp{if} statements.
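The difference between the two loop shapes can be made concrete by counting how often the test would run. The sketch below only simulates the control flow; it writes no files, and the wrapper count_tests and the counter names hoisted and nested are hypothetical.

```shell
# Hypothetical simulation: for N records and M output files, count how
# many times the `if (append)' test runs when it sits outside the
# per-file loop (hoisted) versus inside it (nested).
count_tests() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        hoisted = nested = 0
        for (rec = 1; rec <= n; rec++) {
            hoisted++                    # one test per input record
            for (f = 1; f <= m; f++)
                nested++                 # one test per record per file
        }
        print hoisted, nested
    }'
}
```

With 3 records and 2 files, count_tests 3 2 reports 3 tests for the first form and 6 for the second.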
 
-In any case, because the option is invalid, @code{getopt()} returns @code{"?"}.
-The main program can examine @code{Optopt} if it needs to know what the
-invalid option letter actually is. Continuing on:
+Finally, the @code{END} rule cleans up by closing all the output files:
 
 @example
address@hidden file eg/lib/getopt.awk
-    if (substr(options, i + 1, 1) == ":") @{
-        # get option argument
-        if (length(substr(argv[Optind], _opti + 1)) > 0)
-            Optarg = substr(argv[Optind], _opti + 1)
-        else
-            Optarg = argv[++Optind]
-        _opti = 0
-    @} else
-        Optarg = ""
address@hidden file eg/prog/tee.awk
+END    \
address@hidden
+    for (i in copy)
+        close(copy[i])
address@hidden
 @c endfile
 @end example
 
-If the option requires an argument, the option letter is followed by a colon
-in the @code{options} string.  If there are remaining characters in the
-current command-line argument (@code{argv[Optind]}), then the rest of that
-string is assigned to @code{Optarg}.  Otherwise, the next command-line
-argument is used (@samp{-xFOO} versus @address@hidden FOO}}). In either case,
address@hidden is reset to zero, because there are no more characters left to
-examine in the current command-line argument. Continuing:
address@hidden Uniq Program
address@hidden Printing Nonduplicated Lines of Text
+
address@hidden FIXME: One day, update to current POSIX version of uniq
+
address@hidden STARTOFRANGE prunt
address@hidden printing, unduplicated lines of text
address@hidden STARTOFRANGE tpul
address@hidden address@hidden printing, unduplicated lines of
address@hidden @command{uniq} utility
+The @command{uniq} utility reads sorted lines of data on its standard
+input, and by default removes duplicate lines.  In other words, it only
+prints unique lines---hence the name.  @command{uniq} has a number of
+options. The usage is as follows:
 
 @example
address@hidden file eg/lib/getopt.awk
-    if (_opti == 0 || _opti >= length(argv[Optind])) @{
-        Optind++
-        _opti = 0
-    @} else
-        _opti++
-    return thisopt
address@hidden
address@hidden endfile
+uniq @r{[}-udc @address@hidden@r{]]} @address@hidden@r{]} @r{[} @var{input file} @r{[} @var{output file} @r{]]}
 @end example
 
-Finally, if @code{_opti} is either zero or greater than the length of the
-current command-line argument, it means this element in @code{argv} is
-through being processed, so @code{Optind} is incremented to point to the
-next element in @code{argv}.  If neither condition is true, then only
address@hidden is incremented, so that the next option letter can be processed
-on the next call to @code{getopt()}.
+The options for @command{uniq} are:
 
-The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one.
address@hidden is set to one, since the default behavior is for @code{getopt()}
-to print a diagnostic message upon seeing an invalid option.  @code{Optind}
-is set to one, since there's no reason to look at the program name, which is
-in @code{ARGV[0]}:
address@hidden @code
address@hidden -d
+Print only repeated lines.
 
address@hidden
address@hidden file eg/lib/getopt.awk
-BEGIN @{
-    Opterr = 1    # default is to diagnose
-    Optind = 1    # skip ARGV[0]
address@hidden -u
+Print only nonrepeated lines.
 
-    # test program
-    if (_getopt_test) @{
-        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
-            printf("c = <%c>, optarg = <%s>\n",
-                                       _go_c, Optarg)
-        printf("non-option arguments:\n")
-        for (; Optind < ARGC; Optind++)
-            printf("\tARGV[%d] = <%s>\n",
-                                    Optind, ARGV[Optind])
-    @}
address@hidden
address@hidden endfile
address@hidden example
address@hidden -c
+Count lines. This option overrides @option{-d} and @option{-u}.  Both repeated
+and nonrepeated lines are counted.
 
-The rest of the @code{BEGIN} rule is a simple test program.  Here is the
-result of two sample runs of the test program:
address@hidden address@hidden
+Skip @var{n} fields before comparing lines.  The definition of fields
+is similar to @command{awk}'s default: nonwhitespace characters separated
+by runs of spaces and/or TABs.
+
address@hidden address@hidden
+Skip @var{n} characters before comparing lines.  Any fields specified with
address@hidden@var{n}} are skipped first.
+
address@hidden @var{input file}
+Data is read from the input file named on the command line, instead of from
+the standard input.
+
address@hidden @var{output file}
+The generated output is sent to the named output file, instead of to the
+standard output.
address@hidden table
+
+Normally @command{uniq} behaves as if both the @option{-d} and
address@hidden options are provided.
+
address@hidden uses the
address@hidden()} library function
+(@pxref{Getopt Function})
+and the @code{join()} library function
+(@pxref{Join Function}).
+
+The program begins with a @code{usage()} function and then a brief outline of
+the options and their meanings in comments.
+The @code{BEGIN} rule deals with the command-line arguments and options. It
+uses a trick to get @code{getopt()} to handle options of the form @samp{-25},
+treating such an option as the option letter @samp{2} with an argument of
address@hidden If indeed two or more digits are supplied (@code{Optarg} looks
+like a number), @code{Optarg} is
+concatenated with the option digit and then the result is added to zero to make
+it into a number.  If there is only one digit in the option, then
address@hidden is not needed. In this case, @code{Optind} must be decremented so that
address@hidden()} processes it next time.  This code is admittedly a bit
+tricky.
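The digit handling can be looked at in isolation. The sketch below assumes that getopt() has already returned a digit as the option letter (passed here as c) together with whatever it took as that option's argument (passed as optarg); it does not call getopt() itself. The wrapper name digit_option and the words consumed and pushback are made up for the illustration.

```shell
# Hypothetical helper: combine an option digit with its argument the
# way the paragraph above describes, and report whether the argument
# was really part of the number or has to be handed back.
digit_option() {
    awk -v c="$1" -v optarg="$2" 'BEGIN {
        if (optarg ~ /^[0-9]+$/) {
            n = (c optarg) + 0       # "2" and "5" concatenate to 25
            print n, "consumed"
        } else {
            n = c + 0                # single digit; argument not ours
            print n, "pushback"      # caller would decrement Optind
        }
    }'
}
```

So digit_option 2 5 corresponds to the option -25, and digit_option 2 infile corresponds to -2 followed by a file name.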
+
+If no options are supplied, then the default is taken, to print both
+repeated and nonrepeated lines.  The output file, if provided, is assigned
+to @code{outputfile}.  Early on, @code{outputfile} is initialized to the
+standard output, @file{/dev/stdout}:
 
address@hidden @code{uniq.awk} program
 @example
-$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x}
address@hidden c = <a>, optarg = <>
address@hidden c = <c>, optarg = <>
address@hidden c = <b>, optarg = <ARG>
address@hidden non-option arguments:
address@hidden         ARGV[3] = <bax>
address@hidden         ARGV[4] = <-x>
address@hidden file eg/prog/uniq.awk
address@hidden
+# uniq.awk --- do uniq in awk
+#
+# Requires getopt() and join() library functions
address@hidden group
address@hidden endfile
address@hidden
address@hidden file eg/prog/uniq.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/uniq.awk
 
-$ @kbd{awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc}
address@hidden c = <a>, optarg = <>
address@hidden x -- invalid option
address@hidden c = <?>, optarg = <>
address@hidden non-option arguments:
address@hidden         ARGV[4] = <xyz>
address@hidden         ARGV[5] = <abc>
address@hidden example
+function usage(    e)
address@hidden
+    e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
+    print e > "/dev/stderr"
+    exit 1
address@hidden
 
-In both runs,
-the first @option{--} terminates the arguments to @command{awk}, so that it does
-not try to interpret the @option{-a}, etc., as its own options.
+# -c    count lines. overrides -d and -u
+# -d    only repeated lines
+# -u    only nonrepeated lines
+# -n    skip n fields
+# +n    skip n characters, skip fields first
+
+BEGIN   \
address@hidden
+    count = 1
+    outputfile = "/dev/stdout"
+    opts = "udc0:1:2:3:4:5:6:7:8:9:"
+    while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
+        if (c == "u")
+            non_repeated_only++
+        else if (c == "d")
+            repeated_only++
+        else if (c == "c")
+            do_count++
+        else if (index("0123456789", c) != 0) @{
+            # getopt requires args to options
+            # this messes us up for things like -5
+            if (Optarg ~ /^[[:digit:]]+$/)
+                fcount = (c Optarg) + 0
+            else @{
+                fcount = c + 0
+                Optind--
+            @}
+        @} else
+            usage()
+    @}
 
address@hidden NOTE
-After @code{getopt()} is through, it is the responsibility of the user level
-code to
-clear out all the elements of @code{ARGV} from 1 to @code{Optind},
-so that @command{awk} does not try to process the command-line options
-as @value{FN}s.
address@hidden quotation
+    if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{
+        charcount = substr(ARGV[Optind], 2) + 0
+        Optind++
+    @}
 
-Several of the sample programs presented in
address@hidden Programs},
-use @code{getopt()} to process their arguments.
address@hidden ENDOFRANGE libfclo
address@hidden ENDOFRANGE flibclo
address@hidden ENDOFRANGE clop
address@hidden ENDOFRANGE oclp
+    for (i = 1; i < Optind; i++)
+        ARGV[i] = ""
 
address@hidden Passwd Functions
address@hidden Reading the User Database
+    if (repeated_only == 0 && non_repeated_only == 0)
+        repeated_only = non_repeated_only = 1
 
address@hidden STARTOFRANGE libfudata
address@hidden libraries of @command{awk} functions, user database, reading
address@hidden STARTOFRANGE flibudata
address@hidden functions, library, user database, reading
address@hidden STARTOFRANGE udatar
address@hidden user address@hidden reading
address@hidden STARTOFRANGE dataur
address@hidden database, address@hidden reading
address@hidden @code{PROCINFO} array
-The @code{PROCINFO} array
-(@pxref{Built-in Variables})
-provides access to the current user's real and effective user and group ID
-numbers, and if available, the user's supplementary group set.
-However, because these are numbers, they do not provide very useful
-information to the average user.  There needs to be some way to find the
-user information associated with the user and group ID numbers.  This
address@hidden presents a suite of functions for retrieving information from the
-user database.  @xref{Group Functions},
-for a similar suite that retrieves information from the group database.
+    if (ARGC - Optind == 2) @{
+        outputfile = ARGV[ARGC - 1]
+        ARGV[ARGC - 1] = ""
+    @}
address@hidden
address@hidden endfile
address@hidden example
 
address@hidden @code{getpwent()} function (C library)
address@hidden @code{getpwent()} user-defined function
address@hidden users, information about, retrieving
address@hidden login information
address@hidden account information
address@hidden password file
address@hidden files, password
-The POSIX standard does not define the file where user information is
-kept.  Instead, it provides the @code{<pwd.h>} header file
-and several C language subroutines for obtaining user information.
-The primary function is @code{getpwent()}, for ``get password entry.''
-The ``password'' comes from the original user database file,
address@hidden/etc/passwd}, which stores user information, along with the
-encrypted passwords (hence the name).
+The following function, @code{are_equal()}, compares the current line,
address@hidden, to the
+previous line, @code{last}.  It handles skipping fields and characters.
+If no field count and no character count are specified, @code{are_equal()}
+simply returns one or zero depending upon the result of a simple string
+comparison of @code{last} and @code{$0}.  Otherwise, things get more
+complicated.
+If fields have to be skipped, each line is broken into an array using
address@hidden()}
+(@pxref{String Functions});
+the desired fields are then joined back into a line using @code{join()}.
+The joined lines are stored in @code{clast} and @code{cline}.
+If no fields are skipped, @code{clast} and @code{cline} are set to
address@hidden and @code{$0}, respectively.
+Finally, if characters are skipped, @code{substr()} is used to strip off the
+leading @code{charcount} characters in @code{clast} and @code{cline}.  The
+two strings are then compared and @code{are_equal()} returns the result:
 
address@hidden @command{pwcat} program
-While an @command{awk} program could simply read @file{/etc/passwd}
-directly, this file may not contain complete information about the
-system's set of address@hidden is often the case that password
-information is stored in a network database.} To be sure you are able to
-produce a readable and complete version of the user database, it is necessary
-to write a small C program that calls @code{getpwent()}.  @code{getpwent()}
-is defined as returning a pointer to a @code{struct passwd}.  Each time it
-is called, it returns the next entry in the database.  When there are
-no more entries, it returns @code{NULL}, the null pointer.  When this
-happens, the C program should call @code{endpwent()} to close the database.
-Following is @command{pwcat}, a C program that ``cats'' the password database:
address@hidden
address@hidden file eg/prog/uniq.awk
+function are_equal(    n, m, clast, cline, alast, aline)
address@hidden
+    if (fcount == 0 && charcount == 0)
+        return (last == $0)
 
address@hidden Use old style function header for portability to old systems (SunOS, HP/UX).
+    if (fcount > 0) @{
+        n = split(last, alast)
+        m = split($0, aline)
+        clast = join(alast, fcount+1, n)
+        cline = join(aline, fcount+1, m)
+    @} else @{
+        clast = last
+        cline = $0
+    @}
+    if (charcount) @{
+        clast = substr(clast, charcount + 1)
+        cline = substr(cline, charcount + 1)
+    @}
 
address@hidden
address@hidden file eg/lib/pwcat.c
-/*
- * pwcat.c
- *
- * Generate a printable version of the password database
- */
+    return (clast == cline)
address@hidden
 @c endfile
address@hidden
address@hidden file eg/lib/pwcat.c
-/*
- * Arnold Robbins, arnold@@skeeve.com, May 1993
- * Public Domain
- * December 2010, move to ANSI C definition for main().
- */
address@hidden example
 
-#if HAVE_CONFIG_H
-#include <config.h>
-#endif
+The following two rules are the body of the program.  The first one is
+executed only for the very first line of data.  It sets @code{last} equal to
address@hidden, so that subsequent lines of text have something to be compared to.
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
-#include <stdio.h>
-#include <pwd.h>
+The second rule does the work. The variable @code{equal} is one or zero,
+depending upon the results of @code{are_equal()}'s comparison. If @command{uniq}
+is counting repeated lines, and the lines are equal, then it increments the @code{count} variable.
+Otherwise, it prints the line and resets @code{count},
+since the two lines are not equal.
 
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
-#if defined (STDC_HEADERS)
-#include <stdlib.h>
-#endif
+If @command{uniq} is not counting, and if the lines are equal, @code{count} is incremented.
+Nothing is printed, since the point is to remove duplicates.
+Otherwise, if @command{uniq} is counting repeated lines and more than
+one line is seen, or if @command{uniq} is counting nonrepeated lines
+and only one line is seen, then the line is printed, and @code{count}
+is reset.
+
+Finally, similar logic is used in the @code{END} rule to print the final
+line of input data:
+
address@hidden
address@hidden file eg/prog/uniq.awk
+NR == 1 @{
+    last = $0
+    next
address@hidden
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
-int
-main(int argc, char **argv)
 @{
-    struct passwd *p;
+    equal = are_equal()
 
-    while ((p = getpwent()) != NULL)
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
-#ifdef ZOS_USS
-        printf("%s:%ld:%ld:%s:%s\n",
-            p->pw_name, (long) p->pw_uid,
-            (long) p->pw_gid, p->pw_dir, p->pw_shell);
-#else
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
-        printf("%s:%s:%ld:%ld:%s:%s:%s\n",
-            p->pw_name, p->pw_passwd, (long) p->pw_uid,
-            (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
address@hidden endfile
address@hidden
address@hidden file eg/lib/pwcat.c
-#endif
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/pwcat.c
+    if (do_count) @{    # overrides -d and -u
+        if (equal)
+            count++
+        else @{
+            printf("%4d %s\n", count, last) > outputfile
+            last = $0
+            count = 1    # reset
+        @}
+        next
+    @}
 
-    endpwent();
-    return 0;
+    if (equal)
+        count++
+    else @{
+        if ((repeated_only && count > 1) ||
+            (non_repeated_only && count == 1))
+                print last > outputfile
+        last = $0
+        count = 1
+    @}
address@hidden
+
+END @{
+    if (do_count)
+        printf("%4d %s\n", count, last) > outputfile
+    else if ((repeated_only && count > 1) ||
+            (non_repeated_only && count == 1))
+        print last > outputfile
+    close(outputfile)
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE prunt
address@hidden ENDOFRANGE tpul
 
-If you don't understand C, don't worry about it.
-The output from @command{pwcat} is the user database, in the traditional
address@hidden/etc/passwd} format of colon-separated fields.  The fields are:
-
address@hidden @asis
address@hidden Login name
-The user's login name.
address@hidden Wc Program
address@hidden Counting Things
 
address@hidden Encrypted password
-The user's encrypted password.  This may not be available on some systems.
address@hidden FIXME: One day, update to current POSIX version of wc
 
address@hidden User-ID
-The user's numeric user ID number.
-(On some systems it's a C @code{long}, and not an @code{int}.  Thus
-we cast it to @code{long} for all cases.)
address@hidden STARTOFRANGE count
address@hidden counting
address@hidden STARTOFRANGE infco
address@hidden input files, counting elements in
address@hidden STARTOFRANGE woco
address@hidden words, counting
address@hidden STARTOFRANGE chco
address@hidden characters, counting
address@hidden STARTOFRANGE lico
address@hidden lines, counting
address@hidden @command{wc} utility
+The @command{wc} (word count) utility counts lines, words, and characters in
+one or more input files. Its usage is as follows:
 
address@hidden Group-ID
-The user's numeric group ID number.
-(Similar comments about @code{long} vs.@: @code{int} apply here.)
address@hidden
+wc @address@hidden @r{[} @var{files} @dots{} @r{]}
address@hidden example
 
address@hidden Full name
-The user's full name, and perhaps other information associated with the
-user.
+If no files are specified on the command line, @command{wc} reads its standard
+input. If there are multiple files, it also prints total counts for all
+the files.  The options and their meanings are shown in the following list:
 
address@hidden Home directory
-The user's login (or ``home'') directory (familiar to shell programmers as
address@hidden).
address@hidden @code
address@hidden -l
+Count only lines.
 
address@hidden Login shell
-The program that is run when the user logs in.  This is usually a
-shell, such as Bash.
address@hidden -w
+Count only words.
+A ``word'' is a contiguous sequence of nonwhitespace characters, separated
+by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
+fields in its input data.
+
address@hidden -c
+Count only characters.
 @end table
 
-A few lines representative of @command{pwcat}'s output are as follows:
+Implementing @command{wc} in @command{awk} is particularly elegant,
+since @command{awk} does a lot of the work for us; it splits lines into
+words (i.e., fields) and counts them, it counts lines (i.e., records),
+and it can easily tell us how long a line is.
 
address@hidden Jacobs, Andrew
address@hidden Robbins, Arnold
address@hidden Robbins, Miriam
address@hidden
-$ @kbd{pwcat}
address@hidden root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
address@hidden nobody:*:65534:65534::/:
address@hidden daemon:*:1:1::/:
address@hidden sys:*:2:2::/:/bin/csh
address@hidden bin:*:3:3::/bin:
address@hidden arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
address@hidden miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
address@hidden andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
address@hidden
address@hidden example
+This program uses the @code{getopt()} library function
+(@pxref{Getopt Function})
+and the file-transition functions
+(@pxref{Filetrans Function}).
 
-With that introduction, following is a group of functions for getting user
-information.  There are several functions here, corresponding to the C
-functions of the same names:
+This version has one notable difference from traditional versions of
address@hidden: it always prints the counts in the order lines, words,
+and characters.  Traditional versions note the order of the @option{-l},
address@hidden, and @option{-c} options on the command line, and print the
+counts in that order.
 
address@hidden @code{_pw_init()} user-defined function
+The @code{BEGIN} rule does the argument processing.  The variable
address@hidden is true if more than one file is named on the
+command line:
+
address@hidden @code{wc.awk} program
 @example
address@hidden file eg/lib/passwdawk.in
-# passwd.awk --- access password file information
address@hidden file eg/prog/wc.awk
+# wc.awk --- count lines, words, characters
 @c endfile
 @ignore
address@hidden file eg/lib/passwdawk.in
address@hidden file eg/prog/wc.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
-# Revised October 2000
-# Revised December 2010
 @c endfile
 @end ignore
address@hidden file eg/lib/passwdawk.in
address@hidden file eg/prog/wc.awk
 
-BEGIN @{
-    # tailor this to suit your system
-    _pw_awklib = "/usr/local/libexec/awk/"
address@hidden
+# Options:
+#    -l    only count lines
+#    -w    only count words
+#    -c    only count characters
+#
+# Default is to count lines, words, characters
+#
+# Requires getopt() and file transition library functions
 
-function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
address@hidden
-    if (_pw_inited)
-        return
+BEGIN @{
+    # let getopt() print a message about
+    # invalid options. we ignore them
+    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
+        if (c == "l")
+            do_lines = 1
+        else if (c == "w")
+            do_words = 1
+        else if (c == "c")
+            do_chars = 1
+    @}
+    for (i = 1; i < Optind; i++)
+        ARGV[i] = ""
 
-    oldfs = FS
-    oldrs = RS
-    olddol0 = $0
-    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
-    using_fpat = (PROCINFO["FS"] == "FPAT")
-    FS = ":"
-    RS = "\n"
+    # if no options, do all
+    if (! do_lines && ! do_words && ! do_chars)
+        do_lines = do_words = do_chars = 1
 
-    pwcat = _pw_awklib "pwcat"
-    while ((pwcat | getline) > 0) @{
-        _pw_byname[$1] = $0
-        _pw_byuid[$3] = $0
-        _pw_bycount[++_pw_total] = $0
-    @}
-    close(pwcat)
-    _pw_count = 0
-    _pw_inited = 1
-    FS = oldfs
-    if (using_fw)
-        FIELDWIDTHS = FIELDWIDTHS
-    else if (using_fpat)
-        FPAT = FPAT
-    RS = oldrs
-    $0 = olddol0
+    print_total = (ARGC - i > 2)
 @}
 @c endfile
 @end example
 
address@hidden @code{BEGIN} pattern, @code{pwcat} program
-The @code{BEGIN} rule sets a private variable to the directory where
address@hidden is stored.  Because it is used to help out an @command{awk} library
-routine, we have chosen to put it in @file{/usr/local/libexec/awk};
-however, you might want it to be in a different directory on your system.
-
-The function @code{_pw_init()} keeps three copies of the user information
-in three associative arrays.  The arrays are indexed by username
-(@code{_pw_byname}), by user ID number (@code{_pw_byuid}), and by order of
-occurrence (@code{_pw_bycount}).
-The variable @code{_pw_inited} is used for efficiency, since @code{_pw_init()}
-needs to be called only once.
-
address@hidden @code{getline} command, @code{_pw_init()} function
-Because this function uses @code{getline} to read information from
address@hidden, it first saves the values of @code{FS}, @code{RS}, and @code{$0}.
-It notes in the variable @code{using_fw} whether field splitting
-with @code{FIELDWIDTHS} is in effect or not.
-Doing so is necessary, since these functions could be called
-from anywhere within a user's program, and the user may have his
-or her
-own way of splitting records and fields.
-
address@hidden @code{PROCINFO} array
-The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which
-is @code{"FIELDWIDTHS"} if field splitting is being done with
address@hidden  This makes it possible to restore the correct
-field-splitting mechanism later.  The test can only be true for
address@hidden  It is false if using @code{FS} or @code{FPAT},
-or on some other @command{awk} implementation.
-
-The code that checks for using @code{FPAT}, using @code{using_fpat}
-and @code{PROCINFO["FS"]} is similar.
-
-The main part of the function uses a loop to read database lines, split
-the line into fields, and then store the line into each array as necessary.
-When the loop is done, @address@hidden()}} cleans up by closing the pipeline,
-setting @address@hidden to one, and restoring @code{FS}
-(and @code{FIELDWIDTHS} or @code{FPAT}
-if necessary), @code{RS}, and @code{$0}.
-The use of @address@hidden is explained shortly.
-
address@hidden @code{getpwnam()} function (C library)
-The @code{getpwnam()} function takes a username as a string argument. If that
-user is in the database, it returns the appropriate line. Otherwise, it
-relies on the array reference to a nonexistent
-element to create the element with the null string as its value:
+The @code{beginfile()} function is simple; it just resets the counts of lines,
+words, and characters to zero, and saves the current @value{FN} in
address@hidden:
 
address@hidden @code{getpwnam()} user-defined function
 @example
address@hidden
address@hidden file eg/lib/passwdawk.in
-function getpwnam(name)
address@hidden file eg/prog/wc.awk
+function beginfile(file)
 @{
-    _pw_init()
-    return _pw_byname[name]
+    lines = words = chars = 0
+    fname = FILENAME
 @}
 @c endfile
address@hidden group
 @end example
 
address@hidden @code{getpwuid()} function (C library)
-Similarly,
-the @code{getpwuid} function takes a user ID number argument. If that
-user number is in the database, it returns the appropriate line. Otherwise, it
-returns the null string:
+The @code{endfile()} function adds the current file's numbers to the running
+totals of lines, words, and address@hidden@command{wc} can't just use the value of
address@hidden in @code{endfile()}. If you examine
+the code in
address@hidden Function},
+you will see that
address@hidden has already been reset by the time
address@hidden()} is called.}  It then prints out those numbers
+for the file that was just read. It relies on @code{beginfile()} to reset the
+numbers for the following @value{DF}:
address@hidden FIXME: ONE DAY: make the above footnote an exercise,
address@hidden instead of giving away the answer.
 
address@hidden @code{getpwuid()} user-defined function
 @example
address@hidden file eg/lib/passwdawk.in
-function getpwuid(uid)
address@hidden file eg/prog/wc.awk
+function endfile(file)
 @{
-    _pw_init()
-    return _pw_byuid[uid]
+    tlines += lines
+    twords += words
+    tchars += chars
+    if (do_lines)
+        printf "\t%d", lines
address@hidden
+    if (do_words)
+        printf "\t%d", words
address@hidden group
+    if (do_chars)
+        printf "\t%d", chars
+    printf "\t%s\n", fname
 @}
 @c endfile
 @end example
 
address@hidden @code{getpwent()} function (C library)
-The @code{getpwent()} function simply steps through the database, one entry at
-a time.  It uses @code{_pw_count} to track its current position in the
address@hidden array:
+There is one rule that is executed for each line. It adds the length of
+the record, plus one, to @address@hidden @command{gawk}
+understands multibyte locales, this code counts characters, not bytes.}
+Adding one plus the record length
+is needed because the newline character separating records (the value
+of @code{RS}) is not part of the record itself, and thus not included
+in its length.  Next, @code{lines} is incremented for each line read,
+and @code{words} is incremented by the value of @code{NF}, which is the
+number of ``words'' on this line:
 
address@hidden @code{getpwent()} user-defined function
 @example
address@hidden file eg/lib/passwdawk.in
-function getpwent()
address@hidden file eg/prog/wc.awk
+# do per line
 @{
-    _pw_init()
-    if (_pw_count < _pw_total)
-        return _pw_bycount[++_pw_count]
-    return ""
+    chars += length($0) + 1    # get newline
+    lines++
+    words += NF
 @}
 @c endfile
 @end example
 
address@hidden @code{endpwent()} function (C library)
-The @address@hidden()}} function resets @address@hidden to zero, so that
-subsequent calls to @code{getpwent()} start over again:
+Finally, the @code{END} rule simply prints the totals for all the files:
 
address@hidden @code{endpwent()} user-defined function
 @example
address@hidden file eg/lib/passwdawk.in
-function endpwent()
address@hidden
-    _pw_count = 0
address@hidden file eg/prog/wc.awk
+END @{
+    if (print_total) @{
+        if (do_lines)
+            printf "\t%d", tlines
+        if (do_words)
+            printf "\t%d", twords
+        if (do_chars)
+            printf "\t%d", tchars
+        print "\ttotal"
+    @}
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE count
address@hidden ENDOFRANGE infco
address@hidden ENDOFRANGE lico
address@hidden ENDOFRANGE woco
address@hidden ENDOFRANGE chco
address@hidden ENDOFRANGE posimawk
 
-A conscious design decision in this suite is that each subroutine calls
address@hidden@w{_pw_init()}} to initialize the database arrays.
-The overhead of running
-a separate process to generate the user database, and the I/O to scan it,
-are only incurred if the user's main program actually calls one of these
-functions.  If this library file is loaded along with a user's program, but
-none of the routines are ever called, then there is no extra runtime overhead.
-(The alternative is move the body of @address@hidden()}} into a
address@hidden rule, which always runs @command{pwcat}.  This simplifies the
-code but runs an extra process that may never be needed.)
-
-In turn, calling @code{_pw_init()} is not too expensive, because the
address@hidden variable keeps the program from reading the data more than
-once.  If you are worried about squeezing every last cycle out of your
address@hidden program, the check of @code{_pw_inited} could be moved out of
address@hidden()} and duplicated in all the other functions.  In practice,
-this is not necessary, since most @command{awk} programs are I/O-bound,
-and such a change would clutter up the code.
-
-The @command{id} program in @ref{Id Program},
-uses these functions.
address@hidden ENDOFRANGE libfudata
address@hidden ENDOFRANGE flibudata
address@hidden ENDOFRANGE udatar
address@hidden ENDOFRANGE dataur
address@hidden Miscellaneous Programs
address@hidden A Grab Bag of @command{awk} Programs
 
address@hidden Group Functions
address@hidden Reading the Group Database
+This @value{SECTION} is a large ``grab bag'' of miscellaneous programs.
+We hope you find them both interesting and enjoyable.
 
address@hidden STARTOFRANGE libfgdata
address@hidden libraries of @command{awk} functions, group database, reading
address@hidden STARTOFRANGE flibgdata
address@hidden functions, library, group database, reading
address@hidden STARTOFRANGE gdatar
address@hidden group database, reading
address@hidden STARTOFRANGE datagr
address@hidden database, group, reading
address@hidden @code{PROCINFO} array
address@hidden @code{getgrent()} function (C library)
address@hidden @code{getgrent()} user-defined function
address@hidden address@hidden information about
address@hidden account information
address@hidden group file
address@hidden files, group
-Much of the discussion presented in
address@hidden Functions},
-applies to the group database as well.  Although there has traditionally
-been a well-known file (@file{/etc/group}) in a well-known format, the POSIX
-standard only provides a set of C library routines
-(@code{<grp.h>} and @code{getgrent()})
-for accessing the information.
-Even though this file may exist, it may not have
-complete information.  Therefore, as with the user database, it is necessary
-to have a small C program that generates the group database as its output.
address@hidden, a C program that ``cats'' the group database,
-is as follows:
address@hidden
+* Dupword Program::             Finding duplicated words in a document.
+* Alarm Program::               An alarm clock.
+* Translate Program::           A program similar to the @command{tr} utility.
+* Labels Program::              Printing mailing labels.
+* Word Sorting::                A program to produce a word usage count.
+* History Sorting::             Eliminating duplicate entries from a history
+                                file.
+* Extract Program::             Pulling out programs from Texinfo source
+                                files.
+* Simple Sed::                  A Simple Stream Editor.
+* Igawk Program::               A wrapper for @command{awk} that includes
+                                files.
+* Anagram Program::             Finding anagrams from a dictionary.
+* Signature Program::           People do amazing things with too much time on
+                                their hands.
address@hidden menu
 
address@hidden @command{grcat} program
address@hidden
address@hidden file eg/lib/grcat.c
-/*
- * grcat.c
- *
- * Generate a printable version of the group database
- */
address@hidden endfile
address@hidden
address@hidden file eg/lib/grcat.c
-/*
- * Arnold Robbins, arnold@@skeeve.com, May 1993
- * Public Domain
- * December 2010, move to ANSI C definition for main().
- */
address@hidden Dupword Program
address@hidden Finding Duplicated Words in a Document
 
-/* For OS/2, do nothing. */
-#if HAVE_CONFIG_H
-#include <config.h>
-#endif
address@hidden words, address@hidden searching for
address@hidden searching, for words
address@hidden address@hidden searching
+A common error when writing large amounts of prose is to accidentally
+duplicate words.  Typically you will see this in text as something like ``the
+the program does the address@hidden''  When the text is online, often
+the duplicated words occur at the end of one line and the
address@hidden
+the
address@hidden iftex
+beginning of
+another, making them very difficult to spot.
address@hidden as here!
 
-#if defined (STDC_HEADERS)
-#include <stdlib.h>
-#endif
+This program, @file{dupword.awk}, scans through a file one line at a time
+and looks for adjacent occurrences of the same word.  It also saves the last
+word on a line (in the variable @code{prev}) for comparison with the first
+word on the next line.
 
-#ifndef HAVE_GETGRENT
-int main() { return 0; }
-#else
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/grcat.c
-#include <stdio.h>
-#include <grp.h>
address@hidden Texinfo
+The first two statements make sure that the line is all lowercase,
+so that, for example, ``The'' and ``the'' compare equal to each other.
+The next statement replaces nonalphanumeric and nonwhitespace characters
+with spaces, so that punctuation does not affect the comparison either.
+The characters are replaced with spaces so that formatting controls
+don't create nonsense words (e.g., the Texinfo @samp{@@address@hidden@}}
+becomes @samp{codeNF} if punctuation is simply deleted).  The record is
+then resplit into fields, yielding just the actual words on the line,
+and ensuring that there are no empty fields.
 
-int
-main(int argc, char **argv)
address@hidden
-    struct group *g;
-    int i;
+If there are no fields left after removing all the punctuation, the
+current record is skipped.  Otherwise, the program loops through each
+word, comparing it to the previous one:
 
-    while ((g = getgrent()) != NULL) @{
address@hidden endfile
address@hidden
address@hidden file eg/lib/grcat.c
-#ifdef ZOS_USS
-        printf("%s:%ld:", g->gr_name, (long) g->gr_gid);
-#else
address@hidden endfile
address@hidden ignore
address@hidden file eg/lib/grcat.c
-        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
-                                     (long) g->gr_gid);
address@hidden @code{dupword.awk} program
address@hidden
address@hidden file eg/prog/dupword.awk
+# dupword.awk --- find duplicate words in text
 @c endfile
 @ignore
address@hidden file eg/lib/grcat.c
-#endif
address@hidden file eg/prog/dupword.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# December 1991
+# Revised October 2000
+
 @c endfile
 @end ignore
address@hidden file eg/lib/grcat.c
-        for (i = 0; g->gr_mem[i] != NULL; i++) @{
-            printf("%s", g->gr_mem[i]);
address@hidden
-            if (g->gr_mem[i+1] != NULL)
-                putchar(',');
-        @}
address@hidden group
-        putchar('\n');
-    @}
-    endgrent();
-    return 0;
address@hidden file eg/prog/dupword.awk
address@hidden
+    $0 = tolower($0)
+    gsub(/[^[:alnum:][:blank:]]/, " ");
+    $0 = $0         # re-split
+    if (NF == 0)
+        next
+    if ($1 == prev)
+        printf("%s:%d: duplicate %s\n",
+            FILENAME, FNR, $1)
+    for (i = 2; i <= NF; i++)
+        if ($i == $(i-1))
+            printf("%s:%d: duplicate %s\n",
+                FILENAME, FNR, $i)
+    prev = $NF
 @}
 @c endfile
address@hidden
address@hidden file eg/lib/grcat.c
-#endif /* HAVE_GETGRENT */
address@hidden endfile
address@hidden ignore
 @end example
 
-Each line in the group database represents one group.  The fields are
-separated with colons and represent the following information:
-
address@hidden @asis
address@hidden Group Name
-The group's name.
-
address@hidden Group Password
-The group's encrypted password. In practice, this field is never used;
-it is usually empty or set to @samp{*}.
-
address@hidden Group ID Number
-The group's numeric group ID number;
-this number must be unique within the file.
-(On some systems it's a C @code{long}, and not an @code{int}.  Thus
-we cast it to @code{long} for all cases.)
-
address@hidden Group Member List
-A comma-separated list of user names.  These users are members of the group.
-Modern Unix systems allow users to be members of several groups
-simultaneously.  If your system does, then there are elements
address@hidden"group1"} through @code{"address@hidden"} in @code{PROCINFO}
-for those group ID numbers.
-(Note that @code{PROCINFO} is a @command{gawk} extension;
address@hidden Variables}.)
address@hidden table
address@hidden Alarm Program
address@hidden An Alarm Clock Program
address@hidden insomnia, cure for
address@hidden Robbins, Arnold
address@hidden
+@i{Nothing cures insomnia like a ringing alarm clock.}@*
+Arnold Robbins
address@hidden quotation
 
-Here is what running @command{grcat} might produce:
address@hidden STARTOFRANGE tialarm
address@hidden time, alarm clock example program
address@hidden STARTOFRANGE alaex
address@hidden alarm clock example program
+The following program is a simple ``alarm clock'' program.
+You give it a time of day and an optional message.  At the specified time,
+it prints the message on the standard output. In addition, you can give it
+the number of times to repeat the message as well as a delay between
+repetitions.
 
address@hidden
-$ @kbd{grcat}
address@hidden wheel:*:0:arnold
address@hidden nogroup:*:65534:
address@hidden daemon:*:1:
address@hidden kmem:*:2:
address@hidden staff:*:10:arnold,miriam,andy
address@hidden other:*:20:
address@hidden
address@hidden example
+This program uses the @code{getlocaltime()} function from
+@ref{Getlocaltime Function}.
 
-Here are the functions for obtaining information from the group database.
-There are several, modeled after the C library functions of the same names:
+All the work is done in the @code{BEGIN} rule.  The first part is argument
+checking and setting of defaults: the delay, the count, and the message to
+print.  If the user supplied a message without the ASCII BEL
+character (known as the ``alert'' character, @code{"\a"}), then it is added to
+the message.  (On many systems, printing the ASCII BEL generates an
+audible alert. Thus when the alarm goes off, the system calls attention
+to itself in case the user is not looking at the computer.)
+Just for a change, this program uses a @code{switch} statement
+(@pxref{Switch Statement}), but the processing could be done with a series of
+@code{if}-@code{else} statements instead.
+Here is the program:
 
address@hidden @code{getline} command, @code{_gr_init()} user-defined function
address@hidden @code{_gr_init()} user-defined function
address@hidden @code{alarm.awk} program
 @example
address@hidden file eg/lib/groupawk.in
-# group.awk --- functions for dealing with the group file
address@hidden file eg/prog/alarm.awk
+# alarm.awk --- set an alarm
+#
+# Requires getlocaltime() library function
 @c endfile
 @ignore
address@hidden file eg/lib/groupawk.in
address@hidden file eg/prog/alarm.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
-# Revised October 2000
 # Revised December 2010
+
 @c endfile
 @end ignore
address@hidden line break on _gr_init for smallbook
address@hidden file eg/lib/groupawk.in
address@hidden file eg/prog/alarm.awk
+# usage: alarm time [ "message" [ count [ delay ] ] ]
 
 BEGIN    \
 @{
-    # Change to suit your system
-    _gr_awklib = "/usr/local/libexec/awk/"
address@hidden
+    # Initial argument sanity checking
+    usage1 = "usage: alarm time ['message' [count [delay]]]"
+    usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
 
-function _gr_init(    oldfs, oldrs, olddol0, grcat,
-                             using_fw, using_fpat, n, a, i)
address@hidden
-    if (_gr_inited)
-        return
+    if (ARGC < 2) @{
+        print usage1 > "/dev/stderr"
+        print usage2 > "/dev/stderr"
+        exit 1
+    @}
+    switch (ARGC) @{
+    case 5:
+        delay = ARGV[4] + 0
+        # fall through
+    case 4:
+        count = ARGV[3] + 0
+        # fall through
+    case 3:
+        message = ARGV[2]
+        break
+    default:
+        if (ARGV[1] !~ /[[:digit:]]?[[:digit:]]:[[:digit:]]@{2@}/) @{
+            print usage1 > "/dev/stderr"
+            print usage2 > "/dev/stderr"
+            exit 1
+        @}
+        break
+    @}
 
-    oldfs = FS
-    oldrs = RS
-    olddol0 = $0
-    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
-    using_fpat = (PROCINFO["FS"] == "FPAT")
-    FS = ":"
-    RS = "\n"
+    # set defaults for once we reach the desired time
+    if (delay == 0)
+        delay = 180    # 3 minutes
address@hidden
+    if (count == 0)
+        count = 5
address@hidden group
+    if (message == "")
+        message = sprintf("\aIt is now %s!\a", ARGV[1])
+    else if (index(message, "\a") == 0)
+        message = "\a" message "\a"
address@hidden endfile
address@hidden example
 
-    grcat = _gr_awklib "grcat"
-    while ((grcat | getline) > 0) @{
-        if ($1 in _gr_byname)
-            _gr_byname[$1] = _gr_byname[$1] "," $4
-        else
-            _gr_byname[$1] = $0
-        if ($3 in _gr_bygid)
-            _gr_bygid[$3] = _gr_bygid[$3] "," $4
-        else
-            _gr_bygid[$3] = $0
+The next @value{SECTION} of code turns the alarm time into hours and minutes,
+converts it (if necessary) to a 24-hour clock, and then turns that
+time into a count of the seconds since midnight.  Next it turns the current
+time into a count of seconds since midnight.  The difference between the two
+is how long to wait before setting off the alarm:
 
-        n = split($4, a, "[ \t]*,[ \t]*")
-        for (i = 1; i <= n; i++)
-            if (a[i] in _gr_groupsbyuser)
-                _gr_groupsbyuser[a[i]] = \
-                    _gr_groupsbyuser[a[i]] " " $1
-            else
-                _gr_groupsbyuser[a[i]] = $1
address@hidden
address@hidden file eg/prog/alarm.awk
+    # split up alarm time
+    split(ARGV[1], atime, ":")
+    hour = atime[1] + 0    # force numeric
+    minute = atime[2] + 0  # force numeric
 
-        _gr_bycount[++_gr_count] = $0
+    # get current broken down time
+    getlocaltime(now)
+
+    # if time given is 12-hour hours and it's after that
+    # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
+    # then add 12 to real hour
+    if (hour < 12 && now["hour"] > hour)
+        hour += 12
+
+    # set target time in seconds since midnight
+    target = (hour * 60 * 60) + (minute * 60)
+
+    # get current time in seconds since midnight
+    current = (now["hour"] * 60 * 60) + \
+               (now["minute"] * 60) + now["second"]
+
+    # how long to sleep for
+    naptime = target - current
+    if (naptime <= 0) @{
+        print "time is in the past!" > "/dev/stderr"
+        exit 1
     @}
-    close(grcat)
-    _gr_count = 0
-    _gr_inited++
-    FS = oldfs
-    if (using_fw)
-        FIELDWIDTHS = FIELDWIDTHS
-    else if (using_fpat)
-        FPAT = FPAT
-    RS = oldrs
-    $0 = olddol0
address@hidden endfile
address@hidden example
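
The arithmetic described above can be tried in isolation.  The following is
a minimal sketch, not part of alarm.awk: the shell function name
secs_since_midnight and the sample times are assumptions made for
illustration only.  It repeats the "target" computation and then derives a
nap time for an alarm at 17:30 when the clock reads 09:15:20.

```sh
# Hypothetical helper (not in alarm.awk): convert hh:mm into a count of
# seconds since midnight, using the same arithmetic as `target'.
secs_since_midnight() {
    awk -v t="$1" 'BEGIN {
        split(t, atime, ":")
        hour = atime[1] + 0      # force numeric
        minute = atime[2] + 0    # force numeric
        print (hour * 60 * 60) + (minute * 60)
    }'
}

# Alarm at 17:30, current time 09:15:20 (sample values).
target=$(secs_since_midnight 17:30)
current=$(( $(secs_since_midnight 09:15) + 20 ))
echo $(( target - current ))     # prints 29680
```

A result of zero or less corresponds to the "time is in the past!" case
in the program.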
+
address@hidden @command{sleep} utility
+Finally, the program uses the @code{system()} function
+(@pxref{I/O Functions})
+to call the @command{sleep} utility.  The @command{sleep} utility simply pauses
+for the given number of seconds.  If the exit status is not zero,
+the program assumes that @command{sleep} was interrupted and exits. If
+@command{sleep} exited with an OK status (zero), then the program prints the
+message in a loop, again using @command{sleep} to delay for however many
+seconds are necessary:
+
address@hidden
address@hidden file eg/prog/alarm.awk
+    # zzzzzz..... go away if interrupted
+    if (system(sprintf("sleep %d", naptime)) != 0)
+        exit 1
+
+    # time to notify!
+    command = sprintf("sleep %d", delay)
+    for (i = 1; i <= count; i++) @{
+        print message
+        # if sleep command interrupted, go away
+        if (system(command) != 0)
+            break
+    @}
+
+    exit 0
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE tialarm
address@hidden ENDOFRANGE alaex
 
-The @code{BEGIN} rule sets a private variable to the directory where
-@command{grcat} is stored.  Because it is used to help out an @command{awk} library
-routine, we have chosen to put it in @file{/usr/local/libexec/awk}.  You might
-want it to be in a different directory on your system.
address@hidden Translate Program
address@hidden Transliterating Characters
 
-These routines follow the same general outline as the user database routines
-(@pxref{Passwd Functions}).
-The @address@hidden variable is used to
-ensure that the database is scanned no more than once.
-The @address@hidden()}} function first saves @code{FS},
address@hidden, and
address@hidden, and then sets @code{FS} and @code{RS} to the correct values for
-scanning the group information.
-It also takes care to note whether @code{FIELDWIDTHS} or @code{FPAT}
-is being used, and to restore the appropriate field splitting mechanism.
address@hidden STARTOFRANGE chtra
address@hidden characters, transliterating
address@hidden @command{tr} utility
+The system @command{tr} utility transliterates characters.  For example, it is
+often used to map uppercase letters into lowercase for further processing:
+
+@example
+@var{generate data} | tr 'A-Z' 'a-z' | @var{process data} @dots{}
+@end example
+
+@command{tr} requires two lists of characters.@footnote{On some older
+systems,
+@ifset ORA
+including Solaris,
+@end ifset
+@command{tr} may require that the lists be written as
+range expressions enclosed in square brackets (@samp{[a-z]}) and quoted,
+to prevent the shell from attempting a @value{FN} expansion.  This is
+not a feature.}  When processing the input, the first character in the
+first list is replaced with the first character in the second list,
+the second character in the first list is replaced with the second
+character in the second list, and so on.  If there are more characters
+in the ``from'' list than in the ``to'' list, the last character of the
+``to'' list is used for the remaining characters in the ``from'' list.
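
The padding rule just described can be checked with a short sketch.  This
is not the translate.awk program; the shell function name map_chars and the
sample strings are assumptions chosen for illustration.  It builds the
"from" to "to" map, reusing the last "to" character when "to" is shorter,
and applies it to a string.

```sh
# Hypothetical sketch of the mapping rule only.
# usage: map_chars FROM TO TEXT
map_chars() {
    awk -v from="$1" -v to="$2" -v text="$3" 'BEGIN {
        lf = length(from)
        lt = length(to)
        for (i = 1; i <= lf; i++) {
            # past the end of "to", reuse its last character
            j = (i <= lt) ? i : lt
            map[substr(from, i, 1)] = substr(to, j, 1)
        }
        n = length(text)
        for (i = 1; i <= n; i++) {
            c = substr(text, i, 1)
            out = out ((c in map) ? map[c] : c)
        }
        print out
    }'
}

map_chars abc xy "aabbcc cab"    # prints xxyyyy yxy
```

Here `c' has no partner in the "to" list, so it is mapped to `y', the last
character of that list.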
+
+Some time ago,
address@hidden early or mid-1989!
+a user proposed that a transliteration function should
+be added to @command{gawk}.
address@hidden Wishing to avoid gratuitous new features,
address@hidden at least theoretically
+The following program was written to
+prove that character transliteration could be done with a user-level
+function.  This program is not as complete as the system @command{tr} utility
+but it does most of the job.
+
+The @command{translate} program demonstrates one of the few weaknesses
+of standard @command{awk}: dealing with individual characters is very
+painful, requiring repeated use of the @code{substr()}, @code{index()},
+and @code{gsub()} built-in functions
+(@pxref{String Functions})[email protected]{This
+program was written before @command{gawk} acquired the ability to
+split each character in a string into separate array elements.}
+@c Exercise: How might you use this new feature to simplify the program?
+There are two functions.  The first, @code{stranslate()}, takes three
+arguments:
 
-The group information is stored in several associative arrays.
-The arrays are indexed by group name (@address@hidden), by group ID number
-(@address@hidden), and by position in the database (@address@hidden).
-There is an additional array indexed by user name (@address@hidden),
-which is a space-separated list of groups to which each user belongs.
address@hidden @code
address@hidden from
+A list of characters from which to translate.
 
-Unlike the user database, it is possible to have multiple records in the
-database for the same group.  This is common when a group has a large number
-of members.  A pair of such entries might look like the following:
address@hidden to
+A list of characters to which to translate.
 
address@hidden
-tvpeople:*:101:johnny,jay,arsenio
-tvpeople:*:101:david,conan,tom,joan
address@hidden example
address@hidden target
+The string on which to do the translation.
address@hidden table
 
-For this reason, @code{_gr_init()} looks to see if a group name or
-group ID number is already seen.  If it is, then the user names are
-simply concatenated onto the previous list of users.  (There is actually a
-subtle problem with the code just presented.  Suppose that
-the first time there were no names. This code adds the names with
-a leading comma. It also doesn't check that there is a @code{$4}.)
+Associative arrays make the translation part fairly easy. @code{t_ar} holds
+the ``to'' characters, indexed by the ``from'' characters.  Then a simple
+loop goes through @code{from}, one character at a time.  For each character
+in @code{from}, if the character appears in @code{target},
+it is replaced with the corresponding @code{to} character.
 
-Finally, @code{_gr_init()} closes the pipeline to @command{grcat}, restores
-@code{FS} (and @code{FIELDWIDTHS} or @code{FPAT} if necessary), @code{RS}, and @code{$0},
-initializes @code{_gr_count} to zero
-(it is used later), and makes @code{_gr_inited} nonzero.
+The @code{translate()} function simply calls @code{stranslate()} using @code{$0}
+as the target.  The main program sets two global variables, @code{FROM} and
+@code{TO}, from the command line, and then changes @code{ARGV} so that
+@command{awk} reads from the standard input.
 
address@hidden @code{getgrnam()} function (C library)
-The @code{getgrnam()} function takes a group name as its argument, and if that
-group exists, it is returned.
-Otherwise, it
-relies on the array reference to a nonexistent
-element to create the element with the null string as its value:
+Finally, the processing rule simply calls @code{translate()} for each record:
 
address@hidden @code{getgrnam()} user-defined function
address@hidden @code{translate.awk} program
 @example
address@hidden file eg/lib/groupawk.in
-function getgrnam(group)
address@hidden
-    _gr_init()
-    return _gr_byname[group]
address@hidden
address@hidden file eg/prog/translate.awk
+# translate.awk --- do tr-like stuff
 @c endfile
address@hidden example
address@hidden
address@hidden file eg/prog/translate.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# August 1989
+# February 2009 - bug fix
 
address@hidden @code{getgrgid()} function (C library)
-The @code{getgrgid()} function is similar; it takes a numeric group ID and
-looks up the information associated with that group ID:
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/translate.awk
+# Bugs: does not handle things like: tr A-Z a-z, it has
+# to be spelled out. However, if `to' is shorter than `from',
+# the last character in `to' is used for the rest of `from'.
 
address@hidden @code{getgrgid()} user-defined function
address@hidden
address@hidden file eg/lib/groupawk.in
-function getgrgid(gid)
+function stranslate(from, to, target,     lf, lt, ltarget, t_ar, i, c,
+                                                               result)
 @{
-    _gr_init()
-    return _gr_bygid[gid]
+    lf = length(from)
+    lt = length(to)
+    ltarget = length(target)
+    for (i = 1; i <= lt; i++)
+        t_ar[substr(from, i, 1)] = substr(to, i, 1)
+    if (lt < lf)
+        for (; i <= lf; i++)
+            t_ar[substr(from, i, 1)] = substr(to, lt, 1)
+    for (i = 1; i <= ltarget; i++) @{
+        c = substr(target, i, 1)
+        if (c in t_ar)
+            c = t_ar[c]
+        result = result c
+    @}
+    return result
 @}
address@hidden endfile
address@hidden example
 
address@hidden @code{getgruser()} function (C library)
-The @code{getgruser()} function does not have a C counterpart. It takes a
-user name and returns the list of groups that have the user as a member:
+function translate(from, to)
address@hidden
+    return $0 = stranslate(from, to, $0)
address@hidden
+
+# main program
+BEGIN @{
address@hidden
+    if (ARGC < 3) @{
+        print "usage: translate from to" > "/dev/stderr"
+        exit
+    @}
address@hidden group
+    FROM = ARGV[1]
+    TO = ARGV[2]
+    ARGC = 2
+    ARGV[1] = "-"
address@hidden
 
address@hidden @code{getgruser()} function, user-defined
address@hidden
address@hidden file eg/lib/groupawk.in
-function getgruser(user)
 @{
-    _gr_init()
-    return _gr_groupsbyuser[user]
+    translate(FROM, TO)
+    print
 @}
 @c endfile
 @end example
 
address@hidden @code{getgrent()} function (C library)
-The @code{getgrent()} function steps through the database one entry at a time.
-It uses @code{_gr_count} to track its position in the list:
+While it is possible to do character transliteration in a user-level
+function, it is not necessarily efficient, and we (the @command{gawk}
+authors) started to consider adding a built-in function.  However,
+shortly after writing this program, we learned that the System V Release 4
+@command{awk} had added the @code{toupper()} and @code{tolower()} functions
+(@pxref{String Functions}).
+These functions handle the vast majority of the
+cases where character transliteration is necessary, and so we chose to
+simply add those functions to @command{gawk} as well and then leave well
+enough alone.
+
+An obvious improvement to this program would be to set up the
+@code{t_ar} array only once, in a @code{BEGIN} rule. However, this
+assumes that the ``from'' and ``to'' lists
+will never change throughout the lifetime of the program.
address@hidden ENDOFRANGE chtra
+
address@hidden Labels Program
address@hidden Printing Mailing Labels
+
address@hidden STARTOFRANGE prml
address@hidden printing, mailing labels
address@hidden STARTOFRANGE mlprint
address@hidden mailing address@hidden printing
+Here is a ``real world''@footnote{``Real world'' is defined as
+``a program actually used to get something done.''}
+program.  This
+script reads lists of names and
+addresses and generates mailing labels.  Each page of labels has 20 labels
+on it, two across and 10 down.  The addresses are guaranteed to be no more
+than five lines of data.  Each address is separated from the next by a blank
+line.
+
+The basic idea is to read 20 labels worth of data.  Each line of each label
+is stored in the @code{line} array.  The single rule takes care of filling
+the @code{line} array and printing the page when 20 labels have been read.
+
+The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that
+@command{awk} splits records at blank lines
+(@pxref{Records}).
+It sets @code{MAXLINES} to 100, since 100 is the maximum number
+of lines on the page (20 * 5 = 100).
+
+Most of the work is done in the @code{printpage()} function.
+The label lines are stored sequentially in the @code{line} array.  But they
+have to print horizontally; @code{line[1]} next to @code{line[6]},
+@code{line[2]} next to @code{line[7]}, and so on.  Two loops are used to
+accomplish this.  The outer loop, controlled by @code{i}, steps through
+every 10 lines of data; this is each row of labels.  The inner loop,
+controlled by @code{j}, goes through the lines within the row.
+As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
+the row, and @samp{i+j+5} is the entry next to it.  The output ends up
+looking something like this:
 
address@hidden @code{getgrent()} user-defined function
 @example
address@hidden file eg/lib/groupawk.in
-function getgrent()
address@hidden
-    _gr_init()
-    if (++_gr_count in _gr_bycount)
-        return _gr_bycount[_gr_count]
-    return ""
address@hidden
address@hidden endfile
+line 1          line 6
+line 2          line 7
+line 3          line 8
+line 4          line 9
+line 5          line 10
address@hidden
 @end example
address@hidden ENDOFRANGE clibf
 
address@hidden @code{endgrent()} function (C library)
-The @code{endgrent()} function resets @code{_gr_count} to zero so that @code{getgrent()} can
-start over again:
address@hidden
+The @code{printf} format string @samp{%-41s} left-aligns
+the data and prints it within a fixed-width field.
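
The effect of the format string can be seen with a one-line sketch.  The
shell function name two_up and the sample names are assumptions for
illustration; only the format string itself comes from the program.

```sh
# Hypothetical demo of the format used by printpage():
# three leading spaces, a left-aligned 41-column field, a space,
# then the right-hand label text.
# usage: two_up LEFT RIGHT
two_up() {
    awk -v l="$1" -v r="$2" 'BEGIN { printf "   %-41s %s\n", l, r }'
}

two_up "Jane Doe" "John Smith"
```

Whatever the length of the left-hand text (up to 41 characters), the
right-hand text always starts in column 46.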
 
address@hidden @code{endgrent()} user-defined function
+As a final note, an extra blank line is printed at lines 21 and 61, to keep
+the output lined up on the labels.  This is dependent on the particular
+brand of labels in use when the program was written.  You will also note
+that there are two blank lines at the top and two blank lines at the bottom.
+
+The @code{END} rule arranges to flush the final page of labels; there may
+not have been an even multiple of 20 labels in the data:
+
address@hidden @code{labels.awk} program
 @example
address@hidden file eg/lib/groupawk.in
-function endgrent()
address@hidden
-    _gr_count = 0
address@hidden
address@hidden file eg/prog/labels.awk
+# labels.awk --- print mailing labels
 @c endfile
address@hidden example
address@hidden
address@hidden file eg/prog/labels.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# June 1992
+# December 2010, minor edits
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/labels.awk
 
-As with the user database routines, each function calls @code{_gr_init()} to
-initialize the arrays.  Doing so only incurs the extra overhead of running
address@hidden if these functions are used (as opposed to moving the body of
address@hidden()} into a @code{BEGIN} rule).
+# Each label is 5 lines of data that may have blank lines.
+# The label sheets have 2 blank lines at the top and 2 at
+# the bottom.
 
-Most of the work is in scanning the database and building the various
-associative arrays.  The functions that the user calls are themselves very
-simple, relying on @command{awk}'s associative arrays to do work.
+BEGIN    @{ RS = "" ; MAXLINES = 100 @}
+
+function printpage(    i, j)
address@hidden
+    if (Nlines <= 0)
+        return
+
+    printf "\n\n"        # header
 
-The @command{id} program in @ref{Id Program},
-uses these functions.
+    for (i = 1; i <= Nlines; i += 10) @{
+        if (i == 21 || i == 61)
+            print ""
+        for (j = 0; j < 5; j++) @{
+            if (i + j > MAXLINES)
+                break
+            printf "   %-41s %s\n", line[i+j], line[i+j+5]
+        @}
+        print ""
+    @}
 
address@hidden Walking Arrays
address@hidden Traversing Arrays of Arrays
+    printf "\n\n"        # footer
 
address@hidden of Arrays}, described how @command{gawk}
-provides arrays of arrays.  In particular, any element of
-an array may be either a scalar, or another array. The
address@hidden()} function (@pxref{Type Functions})
-lets you distinguish an array
-from a scalar.
-The following function, @code{walk_array()}, recursively traverses
-an array, printing each element's indices and value.
-You call it with the array and a string representing the name
-of the array:
+    delete line
address@hidden
 
address@hidden @code{walk_array()} user-defined function
address@hidden
address@hidden file eg/lib/walkarray.awk
-function walk_array(arr, name,      i)
+# main rule
 @{
-    for (i in arr) @{
-        if (isarray(arr[i]))
-            walk_array(arr[i], (name "[" i "]"))
-        else
-            printf("%s[%s] = %s\n", name, i, arr[i])
+    if (Count >= 20) @{
+        printpage()
+        Count = 0
+        Nlines = 0
     @}
+    n = split($0, a, "\n")
+    for (i = 1; i <= n; i++)
+        line[++Nlines] = a[i]
+    for (; i <= 5; i++)
+        line[++Nlines] = ""
+    Count++
address@hidden
+
+END    \
address@hidden
+    printpage()
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE prml
address@hidden ENDOFRANGE mlprint
 
address@hidden
-It works by looping over each element of the array. If any given
-element is itself an array, the function calls itself recursively,
-passing the subarray and a new string representing the current index.
-Otherwise, the function simply prints the element's name, index, and value.
-Here is a main program to demonstrate:
address@hidden Word Sorting
address@hidden Generating Word-Usage Counts
 
address@hidden
-BEGIN @{
-    a[1] = 1
-    a[2][1] = 21
-    a[2][2] = 22
-    a[3] = 3
-    a[4][1][1] = 411
-    a[4][2] = 42
address@hidden STARTOFRANGE worus
address@hidden words, usage address@hidden generating
 
-    walk_array(a, "a")
address@hidden
address@hidden example
+When working with large amounts of text, it can be interesting to know
+how often different words appear.  For example, an author may overuse
+certain words, in which case she might wish to find synonyms to substitute
+for words that appear too often. This @value{SUBSECTION} develops a
+program for counting words and presenting the frequency information
+in a useful format.
 
-When run, the program produces the following output:
+At first glance, a program like this would seem to do the job:
 
 @example
-$ @kbd{gawk -f walk_array.awk}
address@hidden a[4][1][1] = 411
address@hidden a[4][2] = 42
address@hidden a[1] = 1
address@hidden a[2][1] = 21
address@hidden a[2][2] = 22
address@hidden a[3] = 3
address@hidden example
-
address@hidden ENDOFRANGE libfgdata
address@hidden ENDOFRANGE flibgdata
address@hidden ENDOFRANGE gdatar
address@hidden ENDOFRANGE libf
address@hidden ENDOFRANGE flib
address@hidden ENDOFRANGE fudlib
address@hidden ENDOFRANGE datagr
+# Print list of word frequencies
 
address@hidden Sample Programs
address@hidden Practical @command{awk} Programs
address@hidden STARTOFRANGE awkpex
address@hidden @command{awk} programs, examples of
address@hidden
+    for (i = 1; i <= NF; i++)
+        freq[$i]++
address@hidden
 
address@hidden Functions},
-presents the idea that reading programs in a language contributes to
-learning that language.  This @value{CHAPTER} continues that theme,
-presenting a potpourri of @command{awk} programs for your reading
-enjoyment.
address@hidden
-There are three sections.
-The first describes how to run the programs presented
-in this @value{CHAPTER}.
+END @{
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word]
address@hidden
address@hidden example
 
-The second presents @command{awk}
-versions of several common POSIX utilities.
-These are programs that you are hopefully already familiar with,
-and therefore, whose problems are understood.
-By reimplementing these programs in @command{awk},
-you can focus on the @command{awk}-related aspects of solving
-the programming problem.
+The program relies on @command{awk}'s default field splitting
+mechanism to break each line up into ``words,'' and uses an
+associative array named @code{freq}, indexed by each word, to count
+the number of times the word occurs. In the @code{END} rule,
+it prints the counts.
 
-The third is a grab bag of interesting programs.
-These solve a number of different data-manipulation and management
-problems.  Many of the programs are short, which emphasizes @command{awk}'s
-ability to do a lot in just a few lines of code.
address@hidden ifnotinfo
+This program has several problems that prevent it from being
+useful on real text files:
 
-Many of these programs use library functions presented in
address@hidden Functions}.
address@hidden @bullet
address@hidden
+The @command{awk} language considers upper- and lowercase characters to be
+distinct.  Therefore, ``bartender'' and ``Bartender'' are not treated
+as the same word.  This is undesirable, since in normal text, words
+are capitalized if they begin sentences, and a frequency analyzer should not
+be sensitive to capitalization.
 
address@hidden
-* Running Examples::            How to run these examples.
-* Clones::                      Clones of common utilities.
-* Miscellaneous Programs::      Some interesting @command{awk} programs.
address@hidden menu
address@hidden
+Words are detected using the @command{awk} convention that fields are
+separated just by whitespace.  Other characters in the input (except
+newlines) don't have any special meaning to @command{awk}.  This means that
+punctuation characters count as part of words.
 
address@hidden Running Examples
address@hidden Running the Example Programs
address@hidden
+The output does not come out in any useful order.  You're more likely to be
+interested in which words occur most frequently or in having an alphabetized
+table of how frequently each word occurs.
address@hidden itemize
 
-To run a given program, you would typically do something like this:
address@hidden @command{sort} utility
+The first problem can be solved by using @code{tolower()} to remove case
+distinctions.  The second problem can be solved by using @code{gsub()}
+to remove punctuation characters.  Finally, we solve the third problem
+by using the system @command{sort} utility to process the output of the
+@command{awk} script.  Here is the new version of the program:
 
address@hidden @code{wordfreq.awk} program
 @example
-awk -f @var{program} -- @var{options} @var{files}
address@hidden example
-
address@hidden
-Here, @var{program} is the name of the @command{awk} program (such as
address@hidden), @var{options} are any command-line options for the
-program that start with a @samp{-}, and @var{files} are the actual @value{DF}s.
address@hidden file eg/prog/wordfreq.awk
+# wordfreq.awk --- print list of word frequencies
 
-If your system supports the @samp{#!} executable interpreter mechanism
-(@pxref{Executable Scripts}),
-you can instead run your program directly:
address@hidden
+    $0 = tolower($0)    # remove case distinctions
+    # remove punctuation
+    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
+    for (i = 1; i <= NF; i++)
+        freq[$i]++
address@hidden
 
address@hidden
-cut.awk -c1-8 myfiles > results
address@hidden endfile
+END @{
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word]
address@hidden
 @end example
 
-If your @command{awk} is not @command{gawk}, you may instead need to use this:
+Assuming we have saved this program in a file named @file{wordfreq.awk},
+and that the data is in @file{file1}, the following pipeline:
 
 @example
-cut.awk -- -c1-8 myfiles > results
+awk -f wordfreq.awk file1 | sort -k 2nr
 @end example
 
address@hidden Clones
address@hidden Reinventing Wheels for Fun and Profit
address@hidden STARTOFRANGE posimawk
address@hidden POSIX, address@hidden implementing in @command{awk}
-
-This @value{SECTION} presents a number of POSIX utilities implemented in
address@hidden  Reinventing these programs in @command{awk} is often enjoyable,
-because the algorithms can be very clearly expressed, and the code is usually
-very concise and simple.  This is true because @command{awk} does so much for you.
-
-It should be noted that these programs are not necessarily intended to
-replace the installed versions on your system.
-Nor may all of these programs be fully compliant with the most recent
-POSIX standard.  This is not a problem; their
-purpose is to illustrate @command{awk} language programming for ``real world''
-tasks.
-
-The programs are presented in alphabetical order.
-
address@hidden
-* Cut Program::                 The @command{cut} utility.
-* Egrep Program::               The @command{egrep} utility.
-* Id Program::                  The @command{id} utility.
-* Split Program::               The @command{split} utility.
-* Tee Program::                 The @command{tee} utility.
-* Uniq Program::                The @command{uniq} utility.
-* Wc Program::                  The @command{wc} utility.
address@hidden menu
-
address@hidden Cut Program
address@hidden Cutting out Fields and Columns
address@hidden
+produces a table of the words appearing in @file{file1} in order of
+decreasing frequency.
 
address@hidden @command{cut} utility
address@hidden STARTOFRANGE cut
address@hidden @command{cut} utility
address@hidden STARTOFRANGE ficut
address@hidden fields, cutting
address@hidden STARTOFRANGE colcut
address@hidden columns, cutting
-The @command{cut} utility selects, or ``cuts,'' characters or fields
-from its standard input and sends them to its standard output.
-Fields are separated by TABs by default,
-but you may supply a command-line option to change the field
address@hidden (i.e., the field-separator character). @command{cut}'s
-definition of fields is less general than @command{awk}'s.
+The @command{awk} program suitably massages the
+data and produces a word frequency table, which is not ordered.
+The @command{awk} script's output is then sorted by the @command{sort}
+utility and printed on the screen.
 
-A common use of @command{cut} might be to pull out just the login name of
-logged-on users from the output of @command{who}.  For example, the following
-pipeline generates a sorted, unique list of the logged-on users:
+The options given to @command{sort}
+specify a sort that uses the second field of each input line (skipping
+one field), that the sort keys should be treated as numeric quantities
+(otherwise @samp{15} would come before @samp{5}), and that the sorting
+should be done in descending (reverse) order.
+
+The @command{sort} could even be done from within the program, by changing
+the @code{END} action to:
 
 @example
-who | cut -c1-8 | sort | uniq
address@hidden file eg/prog/wordfreq.awk
+END @{
+    sort = "sort -k 2nr"
+    for (word in freq)
+        printf "%s\t%d\n", word, freq[word] | sort
+    close(sort)
address@hidden
address@hidden endfile
 @end example
 
-The options for @command{cut} are:
-
address@hidden @code
address@hidden -c @var{list}
-Use @var{list} as the list of characters to cut out.  Items within the list
-may be separated by commas, and ranges of characters can be separated with
-dashes.  The list @samp{1-8,15,22-35} specifies characters 1 through
-8, 15, and 22 through 35.
-
address@hidden -f @var{list}
-Use @var{list} as the list of fields to cut out.
+This way of sorting must be used on systems that do not
+have true pipes at the command-line (or batch-file) level.
+See the general operating system documentation for more information on how
+to use the @command{sort} program.
address@hidden ENDOFRANGE worus
 
address@hidden -d @var{delim}
-Use @var{delim} as the field-separator character instead of the TAB
-character.
address@hidden History Sorting
address@hidden Removing Duplicates from Unsorted Text
 
address@hidden -s
-Suppress printing of lines that do not contain the field delimiter.
address@hidden table
address@hidden STARTOFRANGE lidu
address@hidden lines, address@hidden removing
+The @command{uniq} program
+(@pxref{Uniq Program}),
+removes duplicate lines from @emph{sorted} data.
 
-The @command{awk} implementation of @command{cut} uses the @code{getopt()} library
-function (@pxref{Getopt Function})
-and the @code{join()} library function
-(@pxref{Join Function}).
+Suppose, however, you need to remove duplicate lines from a @value{DF} but
+that you want to preserve the order the lines are in.  A good example of
+this might be a shell history file.  The history file keeps a copy of all
+the commands you have entered, and it is not unusual to repeat a command
+several times in a row.  Occasionally you might want to compact the history
+by removing duplicate entries.  Yet it is desirable to maintain the order
+of the original commands.
 
-The program begins with a comment describing the options, the library
-functions needed, and a @code{usage()} function that prints out a usage
-message and exits.  @code{usage()} is called if invalid arguments are
-supplied:
+This simple program does the job.  It uses two arrays.  The @code{data}
+array is indexed by the text of each line.
+For each line, @code{data[$0]} is incremented.
+If a particular line has not
+been seen before, then @code{data[$0]} is zero.
+In this case, the text of the line is stored in @code{lines[count]}.
+Each element of @code{lines} is a unique command, and the indices of
address@hidden indicate the order in which those lines are encountered.
+The @code{END} rule simply prints out the lines, in order:
 
address@hidden @code{cut.awk} program
address@hidden Rakitzis, Byron
address@hidden @code{histsort.awk} program
 @example
address@hidden file eg/prog/cut.awk
-# cut.awk --- implement cut in awk
address@hidden file eg/prog/histsort.awk
+# histsort.awk --- compact a shell history file
+# Thanks to Byron Rakitzis for the general idea
 @c endfile
 @ignore
address@hidden file eg/prog/cut.awk
address@hidden file eg/prog/histsort.awk
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
 # May 1993
 @c endfile
 @end ignore
address@hidden file eg/prog/cut.awk
-
-# Options:
-#    -f list     Cut fields
-#    -d c        Field delimiter character
-#    -c list     Cut characters
-#
-#    -s          Suppress lines without the delimiter
-#
-# Requires getopt() and join() library functions
address@hidden file eg/prog/histsort.awk
 
 @group
-function usage(    e1, e2)
 @{
-    e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
-    e2 = "usage: cut [-c list] [files...]"
-    print e1 > "/dev/stderr"
-    print e2 > "/dev/stderr"
-    exit 1
+    if (data[$0]++ == 0)
+        lines[++count] = $0
address@hidden
address@hidden group
+
address@hidden
+END @{
+    for (i = 1; i <= count; i++)
+        print lines[i]
 @}
 @end group
 @c endfile
 @end example
 
address@hidden
-The variables @code{e1} and @code{e2} are used so that the function
-fits nicely on the
+This program also provides a foundation for generating other useful
+information.  For example, using the following @code{print} statement in the
address@hidden rule indicates how often a particular command is used:
+
address@hidden
+print data[lines[i]], lines[i]
address@hidden example
+
+This works because @code{data[$0]} is incremented each time a line is
+seen.
address@hidden ENDOFRANGE lidu
+
address@hidden Extract Program
address@hidden Extracting Programs from Texinfo Source Files
+
address@hidden STARTOFRANGE texse
address@hidden Texinfo, extracting programs from source files
address@hidden STARTOFRANGE fitex
address@hidden files, address@hidden extracting programs from
 @ifnotinfo
-page.
+Both this chapter and the previous chapter
+(@ref{Library Functions})
+present a large number of @command{awk} programs.
 @end ifnotinfo
address@hidden
-screen.
address@hidden ifnottex
address@hidden
+The nodes
address@hidden Functions},
+and @ref{Sample Programs},
+are the top level nodes for a large number of @command{awk} programs.
address@hidden ifinfo
+If you want to experiment with these programs, it is tedious to have to type
+them in by hand.  Here we present a program that can extract parts of a
+Texinfo input file into separate files.
 
address@hidden @code{BEGIN} pattern, running @command{awk} programs and
address@hidden @code{FS} variable, running @command{awk} programs and
-Next comes a @code{BEGIN} rule that parses the command-line options.
-It sets @code{FS} to a single TAB character, because that is @command{cut}'s
-default field separator. The rule then sets the output field separator to be the
-same as the input field separator.  A loop using @code{getopt()} steps
-through the command-line options.  Exactly one of the variables
address@hidden or @code{by_chars} is set to true, to indicate that
-processing should be done by fields or by characters, respectively.
-When cutting by characters, the output field separator is set to the null
-string:
address@hidden Texinfo
+This @value{DOCUMENT} is written in @uref{http://texinfo.org, Texinfo},
+the GNU project's document formatting language.
+A single Texinfo source file can be used to produce both
+printed and online documentation.
address@hidden
+Texinfo is fully documented in the book
address@hidden GNU Documentation Format},
+available from the Free Software Foundation.
address@hidden ifnotinfo
address@hidden
+The Texinfo language is described fully, starting with
address@hidden, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.
address@hidden ifinfo
 
address@hidden
address@hidden file eg/prog/cut.awk
-BEGIN    \
address@hidden
-    FS = "\t"    # default
-    OFS = FS
-    while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{
-        if (c == "f") @{
-            by_fields = 1
-            fieldlist = Optarg
-        @} else if (c == "c") @{
-            by_chars = 1
-            fieldlist = Optarg
-            OFS = ""
-        @} else if (c == "d") @{
-            if (length(Optarg) > 1) @{
-                printf("Using first character of %s" \
-                       " for delimiter\n", Optarg) > "/dev/stderr"
-                Optarg = substr(Optarg, 1, 1)
-            @}
-            FS = Optarg
-            OFS = FS
-            if (FS == " ")    # defeat awk semantics
-                FS = "[ ]"
-        @} else if (c == "s")
-            suppress++
-        else
-            usage()
-    @}
+For our purposes, it is enough to know three things about Texinfo input
+files:
 
-    # Clear out options
-    for (i = 1; i < Optind; i++)
-        ARGV[i] = ""
address@hidden endfile
address@hidden example
address@hidden @bullet
address@hidden
+The ``at'' symbol (@samp{@@}) is special in Texinfo, much as
+the backslash (@samp{\}) is in C
+or @command{awk}.  Literal @samp{@@} symbols are represented in Texinfo source
+files as @samp{@@@@}.
 
address@hidden field separators, spaces as
-The code must take
-special care when the field delimiter is a space.  Using
-a single space (@address@hidden" "}}) for the value of @code{FS} is
address@hidden would separate fields with runs of spaces,
-TABs, and/or newlines, and we want them to be separated with individual
-spaces.  Also remember that after @code{getopt()} is through
-(as described in @ref{Getopt Function}),
-we have to
-clear out all the elements of @code{ARGV} from 1 to @code{Optind},
-so that @command{awk} does not try to process the command-line options
-as @value{FN}s.
address@hidden
+Comments start with either @samp{@@c} or @samp{@@comment}.
+The file-extraction program works by using special comments that start
+at the beginning of a line.
 
-After dealing with the command-line options, the program verifies that the
-options make sense.  Only one or the other of @option{-c} and @option{-f}
-should be used, and both require a field list.  Then the program calls
-either @code{set_fieldlist()} or @code{set_charlist()} to pull apart the
-list of fields or characters:
address@hidden
+Lines containing @samp{@@group} and @samp{@@end group} commands bracket
+example text that should not be split across a page boundary.
+(Unfortunately, @TeX{} isn't always smart enough to do things exactly right,
+so we have to give it some help.)
address@hidden itemize
+
+The following program, @file{extract.awk}, reads through a Texinfo source
+file and does two things, based on the special comments.
+Upon seeing @address@hidden@@c system @dots{}}},
+it runs a command, by extracting the command text from the
+control line and passing it on to the @code{system()} function
+(@pxref{I/O Functions}).
+Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to
+the file @var{filename}, until @samp{@@c endfile} is encountered.
+The rules in @file{extract.awk} match either @samp{@@c} or
address@hidden@@comment} by letting the @samp{omment} part be optional.
+Lines containing @samp{@@group} and @samp{@@end group} are simply removed.
address@hidden uses the @code{join()} library function
+(@pxref{Join Function}).
+
+The example programs in the online Texinfo source for @address@hidden
+(@file{gawk.texi}) have all been bracketed inside @samp{file} and
address@hidden lines.  The @command{gawk} distribution uses a copy of
address@hidden to extract the sample programs and install many
+of them in a standard directory where @command{gawk} can find them.
+The Texinfo file looks something like this:
 
 @example
address@hidden file eg/prog/cut.awk
-    if (by_fields && by_chars)
-        usage()
address@hidden
+This program has a @@address@hidden@} rule,
+that prints a nice message:
 
-    if (by_fields == 0 && by_chars == 0)
-        by_fields = 1    # default
+@@example
+@@c file examples/messages.awk
+BEGIN @@@{ print "Don't panic!" @@@}
+@@c end file
+@@end example
 
-    if (fieldlist == "") @{
-        print "cut: needs list for -c or -f" > "/dev/stderr"
-        exit 1
-    @}
+It also prints some final advice:
 
-    if (by_fields)
-        set_fieldlist()
-    else
-        set_charlist()
address@hidden
address@hidden endfile
+@@example
+@@c file examples/messages.awk
+END @@@{ print "Always avoid bored archeologists!" @@@}
+@@c end file
+@@end example
address@hidden
 @end example
 
address@hidden()} splits the field list apart at the commas
-into an array.  Then, for each element of the array, it looks to
-see if the element is actually a range, and if so, splits it apart.
-The function checks the range
-to make sure that the first number is smaller than the second.
-Each number in the list is added to the @code{flist} array, which
-simply lists the fields that will be printed.  Normal field splitting
-is used.  The program lets @command{awk} handle the job of doing the
-field splitting:
address@hidden begins by setting @code{IGNORECASE} to one, so that
+mixed upper- and lowercase letters in the directives won't matter.
+
+The first rule handles calling @code{system()}, checking that a command is
+given (@code{NF} is at least three) and also checking that the command
+exits with a zero exit status, signifying OK:
 
address@hidden @code{extract.awk} program
 @example
address@hidden file eg/prog/cut.awk
-function set_fieldlist(        n, m, i, j, k, f, g)
address@hidden file eg/prog/extract.awk
+# extract.awk --- extract files and run programs
+#                 from texinfo files
address@hidden endfile
address@hidden
address@hidden file eg/prog/extract.awk
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
+# Revised September 2000
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/extract.awk
+
+BEGIN    @{ IGNORECASE = 1 @}
+
+/^@@c(omment)?[ \t]+system/    \
 @{
-    n = split(fieldlist, f, ",")
-    j = 1    # index in flist
-    for (i = 1; i <= n; i++) @{
-        if (index(f[i], "-") != 0) @{ # a range
-            m = split(f[i], g, "-")
address@hidden
-            if (m != 2 || g[1] >= g[2]) @{
-                printf("bad field list: %s\n",
-                                  f[i]) > "/dev/stderr"
-                exit 1
-            @}
address@hidden group
-            for (k = g[1]; k <= g[2]; k++)
-                flist[j++] = k
-        @} else
-            flist[j++] = f[i]
+    if (NF < 3) @{
+        e = (FILENAME ":" FNR)
+        e = (e  ": badly formed `system' line")
+        print e > "/dev/stderr"
+        next
+    @}
+    $1 = ""
+    $2 = ""
+    stat = system($0)
+    if (stat != 0) @{
+        e = (FILENAME ":" FNR)
+        e = (e ": warning: system returned " stat)
+        print e > "/dev/stderr"
     @}
-    nfields = j - 1
 @}
 @c endfile
 @end example
 
-The @code{set_charlist()} function is more complicated than
address@hidden()}.
-The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable
-(@pxref{Constant Size}),
-which describes constant-width input.  When using a character list, that is
-exactly what we have.
address@hidden
+The variable @code{e} is used so that the rule
+fits nicely on the
address@hidden
+page.
address@hidden ifnotinfo
address@hidden
+screen.
address@hidden ifnottex
 
-Setting up @code{FIELDWIDTHS} is more complicated than simply listing the
-fields that need to be printed.  We have to keep track of the fields to
-print and also the intervening characters that have to be skipped.
-For example, suppose you wanted characters 1 through 8, 15, and
-22 through 35.  You would use @samp{-c 1-8,15,22-35}.  The necessary value
-for @code{FIELDWIDTHS} is @address@hidden"8 6 1 6 14"}}.  This yields five
-fields, and the fields to print
-are @code{$1}, @code{$3}, and @code{$5}.
-The intermediate fields are @dfn{filler},
-which is stuff in between the desired data.
address@hidden lists the fields to print, and @code{t} tracks the
-complete field list, including filler fields:
+The second rule handles moving data into files.  It verifies that a
address@hidden is given in the directive.  If the file named is not the
+current file, then the current file is closed.  Keeping the current file
+open until a new file is encountered allows the use of the @samp{>}
+redirection for printing the contents, keeping open file management
+simple.
+
+The @code{for} loop does the work.  It reads lines using @code{getline}
+(@pxref{Getline}).
+For an unexpected end of file, it calls the @address@hidden()}}
+function.  If the line is an ``endfile'' line, then it breaks out of
+the loop.
+If the line is an @samp{@@group} or @samp{@@end group} line, then it
+ignores it and goes on to the next line.
+Similarly, comments within examples are also ignored.
+
+Most of the work is in the following few lines.  If the line has no @samp{@@}
+symbols, the program can print it directly.
+Otherwise, each leading @samp{@@} must be stripped off.
+To remove the @samp{@@} symbols, the line is split into separate elements of
+the array @code{a}, using the @code{split()} function
+(@pxref{String Functions}).
+The @samp{@@} symbol is used as the separator character.
+Each element of @code{a} that is empty indicates two successive @samp{@@}
+symbols in the original line.  For each two empty elements (@samp{@@@@} in
+the original file), we have to add a single @samp{@@} symbol back
address@hidden program was written before @command{gawk} had the
address@hidden()} function. Consider how you might use it to simplify the code.}
+
+When the processing of the array is finished, @code{join()} is called with the
+value of @code{SUBSEP}, to rejoin the pieces back into a single
+line.  That line is then printed to the output file:
 
 @example
address@hidden file eg/prog/cut.awk
-function set_charlist(    field, i, j, f, g, t,
-                          filler, last, len)
address@hidden file eg/prog/extract.awk
+/^@@c(omment)?[ \t]+file/    \
 @{
-    field = 1   # count total fields
-    n = split(fieldlist, f, ",")
-    j = 1       # index in flist
-    for (i = 1; i <= n; i++) @{
-        if (index(f[i], "-") != 0) @{ # range
-            m = split(f[i], g, "-")
-            if (m != 2 || g[1] >= g[2]) @{
-                printf("bad character list: %s\n",
-                               f[i]) > "/dev/stderr"
-                exit 1
+    if (NF != 3) @{
+        e = (FILENAME ":" FNR ": badly formed `file' line")
+        print e > "/dev/stderr"
+        next
+    @}
+    if ($3 != curfile) @{
+        if (curfile != "")
+            close(curfile)
+        curfile = $3
+    @}
+
+    for (;;) @{
+        if ((getline line) <= 0)
+            unexpected_eof()
+        if (line ~ /^@@c(omment)?[ \t]+endfile/)
+            break
+        else if (line ~ /^@@(end[ \t]+)?group/)
+            continue
+        else if (line ~ /^@@c(omment+)?[ \t]+/)
+            continue
+        if (index(line, "@@") == 0) @{
+            print line > curfile
+            continue
+        @}
+        n = split(line, a, "@@")
+        # if a[1] == "", means leading @@,
+        # don't add one back in.
+        for (i = 2; i <= n; i++) @{
+            if (a[i] == "") @{ # was an @@@@
+                a[i] = "@@"
+                if (a[i+1] == "")
+                    i++
             @}
-            len = g[2] - g[1] + 1
-            if (g[1] > 1)  # compute length of filler
-                filler = g[1] - last - 1
-            else
-                filler = 0
address@hidden
-            if (filler)
-                t[field++] = filler
address@hidden group
-            t[field++] = len  # length of field
-            last = g[2]
-            flist[j++] = field - 1
-        @} else @{
-            if (f[i] > 1)
-                filler = f[i] - last - 1
-            else
-                filler = 0
-            if (filler)
-                t[field++] = filler
-            t[field++] = 1
-            last = f[i]
-            flist[j++] = field - 1
         @}
+        print join(a, 1, n, SUBSEP) > curfile
     @}
-    FIELDWIDTHS = join(t, 1, field - 1)
-    nfields = j - 1
 @}
 @c endfile
 @end example
 
-Next is the rule that actually processes the data.  If the @option{-s} option
-is given, then @code{suppress} is true.  The first @code{if} statement
-makes sure that the input record does have the field separator.  If
address@hidden is processing fields, @code{suppress} is true, and the field
-separator character is not in the record, then the record is skipped.
+An important thing to note is the use of the @samp{>} redirection.
+Output done with @samp{>} only opens the file once; it stays open and
+subsequent output is appended to the file
+(@pxref{Redirection}).
+This makes it easy to mix program text and explanatory prose for the same
+sample source file (as has been done here!) without any hassle.  The file is
+only closed when a new data @value{FN} is encountered or at the end of the
+input file.
 
-If the record is valid, then @command{gawk} has split the data
-into fields, either using the character in @code{FS} or using fixed-length
-fields and @code{FIELDWIDTHS}.  The loop goes through the list of fields
-that should be printed.  The corresponding field is printed if it contains data.
-If the next field also has data, then the separator character is
-written out between the fields:
+Finally, the function @address@hidden()}} prints an appropriate
+error message and then exits.
+The @code{END} rule handles the final cleanup, closing the open file:
 
address@hidden function lb put on same line for page breaking. sigh
 @example
address@hidden file eg/prog/cut.awk
address@hidden file eg/prog/extract.awk
address@hidden
+function unexpected_eof()
 @{
-    if (by_fields && suppress && index($0, FS) != 0)
-        next
+    printf("%s:%d: unexpected EOF or error\n",
+        FILENAME, FNR) > "/dev/stderr"
+    exit 1
address@hidden
address@hidden group
 
-    for (i = 1; i <= nfields; i++) @{
-        if ($flist[i] != "") @{
-            printf "%s", $flist[i]
-            if (i < nfields && $flist[i+1] != "")
-                printf "%s", OFS
-        @}
-    @}
-    print ""
+END @{
+    if (curfile)
+        close(curfile)
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE texse
address@hidden ENDOFRANGE fitex
 
-This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS}
-variable to do the character-based cutting.  While it is possible in
-other @command{awk} implementations to use @code{substr()}
-(@pxref{String Functions}),
-it is also extremely painful.
-The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem
-of picking the input line apart by characters.
address@hidden ENDOFRANGE cut
address@hidden ENDOFRANGE ficut
address@hidden ENDOFRANGE colcut
-
address@hidden Exercise: Rewrite using split with "".
-
address@hidden Egrep Program
address@hidden Searching for Regular Expressions in Files
address@hidden Simple Sed
address@hidden A Simple Stream Editor
 
address@hidden STARTOFRANGE regexps
address@hidden regular expressions, searching for
address@hidden STARTOFRANGE sfregexp
address@hidden searching, files for regular expressions
address@hidden STARTOFRANGE fsregexp
address@hidden files, searching for regular expressions
address@hidden @command{egrep} utility
-The @command{egrep} utility searches files for patterns.  It uses regular
-expressions that are almost identical to those available in @command{awk}
-(@pxref{Regexp}).
-You invoke it as follows:
address@hidden @command{sed} utility
address@hidden stream editors
+The @command{sed} utility is a stream editor, a program that reads a
+stream of data, makes changes to it, and passes it on.
+It is often used to make global changes to a large file or to a stream
+of data generated by a pipeline of commands.
+While @command{sed} is a complicated program in its own right, its most common
+use is to perform global substitutions in the middle of a pipeline:
 
 @example
-egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{}
+command1 < orig.data | sed 's/old/new/g' | command2 > result
 @end example
 
-The @var{pattern} is a regular expression.  In typical usage, the regular
-expression is quoted to prevent the shell from expanding any of the
-special characters as @value{FN} wildcards.  Normally, @command{egrep}
-prints the lines that matched.  If multiple @value{FN}s are provided on
-the command line, each output line is preceded by the name of the file
-and a colon.
-
-The options to @command{egrep} are as follows:
-
address@hidden @code
address@hidden -c
-Print out a count of the lines that matched the pattern, instead of the
-lines themselves.
-
address@hidden -s
-Be silent.  No output is produced and the exit value indicates whether
-the pattern was matched.
-
address@hidden -v
-Invert the sense of the test. @command{egrep} prints the lines that do
address@hidden match the pattern and exits successfully if the pattern is not
-matched.
-
address@hidden -i
-Ignore case distinctions in both the pattern and the input data.
-
address@hidden -l
-Only print (list) the names of the files that matched, not the lines that matched.
-
address@hidden -e @var{pattern}
-Use @var{pattern} as the regexp to match.  The purpose of the @option{-e}
-option is to allow patterns that start with a @samp{-}.
address@hidden table
-
-This version uses the @code{getopt()} library function
-(@pxref{Getopt Function})
-and the file transition library program
-(@pxref{Filetrans Function}).
+Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp
address@hidden on each input line and globally replace it with the text
address@hidden, i.e., all the occurrences on a line.  This is similar to
address@hidden's @code{gsub()} function
+(@pxref{String Functions}).
 
-The program begins with a descriptive comment and then a @code{BEGIN} rule
-that processes the command-line arguments with @code{getopt()}.  The @option{-i}
-(ignore case) option is particularly easy with @command{gawk}; we just use the
address@hidden built-in variable
-(@pxref{Built-in Variables}):
+The following program, @file{awksed.awk}, accepts at least two command-line
+arguments: the pattern to look for and the text to replace it with. Any
+additional arguments are treated as data @value{FN}s to process. If none
+are provided, the standard input is used:
 
address@hidden @code{egrep.awk} program
address@hidden Brennan, Michael
address@hidden @command{awksed.awk} program
address@hidden @cindex simple stream editor
address@hidden @cindex stream editor, simple
 @example
address@hidden file eg/prog/egrep.awk
-# egrep.awk --- simulate egrep in awk
-#
address@hidden file eg/prog/awksed.awk
+# awksed.awk --- do s/foo/bar/g using just print
+#    Thanks to Michael Brennan for the idea
 @c endfile
 @ignore
address@hidden file eg/prog/egrep.awk
address@hidden file eg/prog/awksed.awk
+#
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-
+# August 1995
 @c endfile
 @end ignore
address@hidden file eg/prog/egrep.awk
-# Options:
-#    -c    count of lines
-#    -s    silent - use exit value
-#    -v    invert test, success if no match
-#    -i    ignore case
-#    -l    print filenames only
-#    -e    argument is pattern
-#
-# Requires getopt and file transition library functions
address@hidden file eg/prog/awksed.awk
 
-BEGIN @{
-    while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
-        if (c == "c")
-            count_only++
-        else if (c == "s")
-            no_print++
-        else if (c == "v")
-            invert++
-        else if (c == "i")
-            IGNORECASE = 1
-        else if (c == "l")
-            filenames_only++
-        else if (c == "e")
-            pattern = Optarg
-        else
-            usage()
-    @}
address@hidden endfile
address@hidden example
+function usage()
address@hidden
+    print "usage: awksed pat repl [files...]" > "/dev/stderr"
+    exit 1
address@hidden
 
-Next comes the code that handles the @command{egrep}-specific behavior. If no
-pattern is supplied with @option{-e}, the first nonoption on the
-command line is used.  The @command{awk} command-line arguments up to @code{ARGV[Optind]}
-are cleared, so that @command{awk} won't try to process them as files.  If no
-files are specified, the standard input is used, and if multiple files are
-specified, we make sure to note this so that the @value{FN}s can precede the
-matched lines in the output:
+BEGIN @{
+    # validate arguments
+    if (ARGC < 3)
+        usage()
 
address@hidden
address@hidden file eg/prog/egrep.awk
-    if (pattern == "")
-        pattern = ARGV[Optind++]
+    RS = ARGV[1]
+    ORS = ARGV[2]
 
-    for (i = 1; i < Optind; i++)
-        ARGV[i] = ""
-    if (Optind >= ARGC) @{
-        ARGV[1] = "-"
-        ARGC = 2
-    @} else if (ARGC - Optind > 1)
-        do_filenames++
+    # don't use arguments as files
+    ARGV[1] = ARGV[2] = ""
address@hidden
 
-#    if (IGNORECASE)
-#        pattern = tolower(pattern)
address@hidden
+# look ma, no hands!
address@hidden
+    if (RT == "")
+        printf "%s", $0
+    else
+        print
 @}
address@hidden group
 @c endfile
 @end example
 
-The last two lines are commented out, since they are not needed in
address@hidden  They should be uncommented if you have to use another version
-of @command{awk}.
+The program relies on @command{gawk}'s ability to have @code{RS} be a regexp,
+as well as on the setting of @code{RT} to the actual text that terminates the
+record (@pxref{Records}).
 
-The next set of lines should be uncommented if you are not using
address@hidden  This rule translates all the characters in the input line
-into lowercase if the @option{-i} option is address@hidden
-also introduces a subtle bug;
-if a match happens, we output the translated line, not the original.}
-The rule is
-commented out since it is not necessary with @command{gawk}:
+The idea is to have @code{RS} be the pattern to look for. @command{gawk}
+automatically sets @code{$0} to the text between matches of the pattern.
+This is text that we want to keep, unmodified.  Then, by setting @code{ORS}
+to the replacement text, a simple @code{print} statement outputs the
+text we want to keep, followed by the replacement text.
 
address@hidden Exercise: Fix this, w/array and new line as key to original line
+There is one wrinkle to this scheme, which is what to do if the last record
+doesn't end with text that matches @code{RS}.  Using a @code{print}
+statement unconditionally prints the replacement text, which is not correct.
+However, if the file did not end in text that matches @code{RS}, @code{RT}
+is set to the null string.  In this case, we can print @code{$0} using
address@hidden
+(@pxref{Printf}).
+
+The @code{BEGIN} rule handles the setup, checking for the right number
+of arguments and calling @code{usage()} if there is a problem. Then it sets
address@hidden and @code{ORS} from the command-line arguments and sets
address@hidden and @code{ARGV[2]} to the null string, so that they are
+not treated as @value{FN}s
+(@pxref{ARGC and ARGV}).
+
+The @code{usage()} function prints an error message and exits.
+Finally, the single rule handles the printing scheme outlined above,
+using @code{print} or @code{printf} as appropriate, depending upon the
+value of @code{RT}.
+
address@hidden
+Exercise, compare the performance of this version with the more
+straightforward:
+
+BEGIN {
+    pat = ARGV[1]
+    repl = ARGV[2]
+    ARGV[1] = ARGV[2] = ""
+}
+
+{ gsub(pat, repl); print }
+
+Exercise: what are the advantages and disadvantages of this version versus sed?
+  Advantage: egrep regexps
+             speed (?)
+  Disadvantage: no & in replacement text
 
address@hidden
address@hidden file eg/prog/egrep.awk
address@hidden
-#    if (IGNORECASE)
-#        $0 = tolower($0)
address@hidden
address@hidden endfile
address@hidden example
+Others?
address@hidden ignore
 
-The @code{beginfile()} function is called by the rule in @file{ftrans.awk}
-when each new file is processed.  In this case, it is very simple; all it
-does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
-how many lines in the current file matched the pattern.
-Naming the parameter @code{junk} shows we know that @code{beginfile()}
-is called with a parameter, but that we're not interested in its value:
address@hidden Igawk Program
address@hidden An Easy Way to Use Library Functions
 
address@hidden
address@hidden file eg/prog/egrep.awk
-function beginfile(junk)
address@hidden
-    fcount = 0
address@hidden
address@hidden endfile
address@hidden example
address@hidden STARTOFRANGE libfex
address@hidden libraries of @command{awk} functions, example program for using
address@hidden STARTOFRANGE flibex
address@hidden functions, library, example program for using
+In @ref{Include Files}, we saw how @command{gawk} provides a built-in
+file-inclusion capability.  However, this is a @command{gawk} extension.
+This @value{SECTION} provides the motivation for making file inclusion
+available for standard @command{awk}, and shows how to do it using a
+combination of shell and @command{awk} programming.
 
-The @code{endfile()} function is called after each file has been processed.
-It affects the output only when the user wants a count of the number of lines that
-matched.  @code{no_print} is true only if the exit status is desired.
address@hidden is true if line counts are desired.  @command{egrep}
-therefore only prints line counts if printing and counting are enabled.
-The output format must be adjusted depending upon the number of files to
-process.  Finally, @code{fcount} is added to @code{total}, so that we
-know the total number of lines that matched the pattern:
+Using library functions in @command{awk} can be very beneficial. It
+encourages code reuse and the writing of general functions. Programs are
+smaller and therefore clearer.
+However, using library functions is only easy when writing @command{awk}
+programs; it is painful when running them, requiring multiple @option{-f}
+options.  If @command{gawk} is unavailable, then so too is the @env{AWKPATH}
+environment variable and the ability to put @command{awk} functions into a
+library directory (@pxref{Options}).
+It would be nice to be able to write programs in the following manner:
 
 @example
address@hidden file eg/prog/egrep.awk
-function endfile(file)
address@hidden
-    if (! no_print && count_only) @{
-        if (do_filenames)
-            print file ":" fcount
-        else
-            print fcount
-    @}
+# library functions
+@@include getopt.awk
+@@include join.awk
address@hidden
 
-    total += fcount
+# main program
+BEGIN @{
+    while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
+        @dots{}
+    @dots{}
 @}
address@hidden endfile
 @end example
 
-The following rule does most of the work of matching lines. The variable
address@hidden is true if the line matched the pattern. If the user
-wants lines that did not match, the sense of @code{matches} is inverted
-using the @samp{!} operator. @code{fcount} is incremented with the value of
address@hidden, which is either one or zero, depending upon a
-successful or unsuccessful match.  If the line does not match, the
address@hidden statement just moves on to the next record.
-
-A number of additional tests are made, but they are only done if we
-are not counting lines.  First, if the user only wants exit status
-(@code{no_print} is true), then it is enough to know that @emph{one}
-line in this file matched, and we can skip on to the next file with
address@hidden  Similarly, if we are only printing @value{FN}s, we can
-print the @value{FN}, and then skip to the next file with @code{nextfile}.
-Finally, each line is printed, with a leading @value{FN} and colon
-if necessary:
+The following program, @file{igawk.sh}, provides this service.
+It simulates @command{gawk}'s searching of the @env{AWKPATH} variable
+and also allows @dfn{nested} includes; i.e., a file that is included
+with @samp{@@include} can contain further @samp{@@include} statements.
+@command{igawk} makes an effort to include each file only once, so that nested
+includes don't accidentally include a library function twice.
 
address@hidden @code{!} (exclamation point), @code{!} operator
address@hidden exclamation point (@code{!}), @code{!} operator
address@hidden
address@hidden file eg/prog/egrep.awk
address@hidden
-    matches = ($0 ~ pattern)
-    if (invert)
-        matches = ! matches
+@command{igawk} should behave just like @command{gawk} externally.  This
+means it should accept all of @command{gawk}'s command-line arguments,
+including the ability to have multiple source files specified via
+@option{-f}, and the ability to mix command-line and library source files.
 
-    fcount += matches    # 1 or 0
+The program is written using the POSIX Shell (@command{sh}) command
+language.@footnote{Fully explaining the @command{sh} language is beyond
+the scope of this book. We provide some minimal explanations, but see
+a good shell programming book if you wish to understand things in more
+depth.} It works as follows:
 
-    if (! matches)
-        next
address@hidden
address@hidden
+Loop through the arguments, saving anything that doesn't represent
+@command{awk} source code for later, when the expanded program is run.
 
-    if (! count_only) @{
-        if (no_print)
-            nextfile
address@hidden
+For any arguments that do represent @command{awk} text, put the arguments into
+a shell variable that will be expanded.  There are two cases:
 
-        if (filenames_only) @{
-            print FILENAME
-            nextfile
-        @}
address@hidden a
address@hidden
+Literal text, provided with @option{--source} or @option{--source=}.  This
+text is just appended directly.
 
-        if (do_filenames)
-            print FILENAME ":" $0
-        else
-            print
-    @}
address@hidden
address@hidden endfile
address@hidden example
address@hidden
+Source @value{FN}s, provided with @option{-f}.  We use a neat trick and append
+@samp{@@include @var{filename}} to the shell variable's contents.  Since the file-inclusion
+program works the way @command{gawk} does, this gets the text
+of the file included into the program at the correct point.
address@hidden enumerate
 
-The @code{END} rule takes care of producing the correct exit status. If
-there are no matches, the exit status is one; otherwise it is zero:
address@hidden
+Run an @command{awk} program (naturally) over the shell variable's contents to expand
+@samp{@@include} statements.  The expanded program is placed in a second
+shell variable.
 
address@hidden
address@hidden file eg/prog/egrep.awk
-END    \
address@hidden
-    if (total == 0)
-        exit 1
-    exit 0
address@hidden
address@hidden endfile
address@hidden example
address@hidden
+Run the expanded program with @command{gawk} and any other original command-line
+arguments that the user supplied (such as the data @value{FN}s).
address@hidden enumerate
 
-The @code{usage()} function prints a usage message in case of invalid options,
-and then exits:
+This program uses shell variables extensively: for storing command-line arguments,
+the text of the @command{awk} program that will expand the user's program, for the
+user's original program, and for the expanded program.  Doing so removes some
+potential problems that might arise were we to use temporary files instead,
+at the cost of making the script somewhat more complicated.
 
address@hidden
address@hidden file eg/prog/egrep.awk
-function usage(    e)
address@hidden
-    e = "Usage: egrep [-csvil] [-e pat] [files ...]"
-    e = e "\n\tegrep [-csvil] pat [files ...]"
-    print e > "/dev/stderr"
-    exit 1
address@hidden
address@hidden endfile
address@hidden example
+The initial part of the program turns on shell tracing if the first
+argument is @samp{debug}.
 
-The variable @code{e} is used so that the function fits nicely
-on the printed page.
+The next part loops through all the command-line arguments.
+There are several cases of interest:
 
address@hidden @code{END} pattern, backslash continuation and
address@hidden @code{\} (backslash), continuing lines and
address@hidden backslash (@code{\}), continuing lines and
-Just a note on programming style: you may have noticed that the @code{END}
-rule uses backslash continuation, with the open brace on a line by
-itself.  This is so that it more closely resembles the way functions
-are written.  Many of the examples
-in this @value{CHAPTER}
-use this style. You can decide for yourself if you like writing
-your @code{BEGIN} and @code{END} rules this way
-or not.
address@hidden ENDOFRANGE regexps
address@hidden ENDOFRANGE sfregexp
address@hidden ENDOFRANGE fsregexp
address@hidden @code
address@hidden --
+This ends the arguments to @command{igawk}.  Anything else should be passed on
+to the user's @command{awk} program without being evaluated.
 
address@hidden Id Program
address@hidden Printing out User Information
address@hidden -W
+This indicates that the next option is specific to @command{gawk}.  To make
+argument processing easier, the @option{-W} is prepended to the
+remaining arguments and the loop continues.  (This is an @command{sh}
+programming trick.  Don't worry about it if you are not familiar with
+@command{sh}.)
 
address@hidden printing, user information
address@hidden users, information about, printing
address@hidden @command{id} utility
-The @command{id} utility lists a user's real and effective user ID numbers,
-real and effective group ID numbers, and the user's group set, if any.
address@hidden only prints the effective user ID and group ID if they are
-different from the real ones.  If possible, @command{id} also supplies the
-corresponding user and group names.  The output might look like this:
address@hidden address@hidden,} -F
+These are saved and passed on to @command{gawk}.
 
address@hidden
-$ @kbd{id}
address@hidden uid=500(arnold) gid=500(arnold) groups=6(disk),7(lp),19(floppy)
address@hidden example
address@hidden address@hidden,} address@hidden,} address@hidden,} -Wfile=
+The @value{FN} is appended to the shell variable @code{program} with an
+@samp{@@include} statement.
+The @command{expr} utility is used to remove the leading option part of the
+argument (e.g., @samp{--file=}).
+(Typical @command{sh} usage would be to use the @command{echo} and @command{sed}
+utilities to do this work.  Unfortunately, some versions of @command{echo} evaluate
+escape sequences in their arguments, possibly mangling the program text.
+Using @command{expr} avoids this problem.)
 
address@hidden @code{PROCINFO} array
-This information is part of what is provided by @command{gawk}'s
address@hidden array (@pxref{Built-in Variables}).
-However, the @command{id} utility provides a more palatable output than just
-individual numbers.
address@hidden address@hidden,} address@hidden,} -Wsource=
+The source text is appended to @code{program}.
 
-Here is a simple version of @command{id} written in @command{awk}.
-It uses the user database library functions
-(@pxref{Passwd Functions})
-and the group database library functions
-(@pxref{Group Functions}):
address@hidden address@hidden,} -Wversion
+@command{igawk} prints its version number, runs @samp{gawk --version}
+to get the @command{gawk} version information, and then exits.
address@hidden table
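The prefix stripping described in the table above can be tried in isolation.  The following is a minimal sketch, not part of @file{igawk.sh}: the function name @code{strip_file_option} is hypothetical, and the patterns are the ones the script passes to @command{expr}.  It assumes an @command{expr} that treats a first operand beginning with @samp{-} as an ordinary string, which is what the script itself relies on.

```shell
# Hypothetical helper (not in igawk.sh): print the part of $1 that
# follows a -f, --file=, or -Wfile= prefix.  expr's ":" operator
# matches the pattern anchored at the start of the string and prints
# the text captured by \( \).
strip_file_option() {
    case $1 in
    -[W-]file=*) expr "$1" : '-.file=\(.*\)' ;;
    -f*)         expr "$1" : '-f\(.*\)' ;;
    esac
}

strip_file_option --file=getopt.awk
strip_file_option -Wfile=join.awk
strip_file_option -flib/util.awk
```

Because the argument never passes through @command{echo}, any backslash sequences in it reach the output untouched.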
 
-The program is fairly straightforward.  All the work is done in the
address@hidden rule.  The user and group ID numbers are obtained from
address@hidden
-The code is repetitive.  The entry in the user database for the real user ID
-number is split into parts at the @samp{:}. The name is the first field.
-Similar code is used for the effective user ID number and the group
-numbers:
+If none of the @option{-f}, @option{--file}, @option{-Wfile}, @option{--source},
+or @option{-Wsource} arguments are supplied, then the first nonoption argument
+should be the @command{awk} program.  If there are no command-line
+arguments left, @command{igawk} prints an error message and exits.
+Otherwise, the first argument is appended to @code{program}.
+In any case, after the arguments have been processed,
+@code{program} contains the complete text of the original @command{awk}
+program.
 
address@hidden @code{id.awk} program
+The program is as follows:
+
address@hidden @code{igawk.sh} program
 @example
address@hidden file eg/prog/id.awk
-# id.awk --- implement id in awk
-#
-# Requires user and group library functions
address@hidden file eg/prog/igawk.sh
+#! /bin/sh
+# igawk --- like gawk but do @@include processing
 @c endfile
 @ignore
address@hidden file eg/prog/id.awk
address@hidden file eg/prog/igawk.sh
 #
 # Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised February 1996
-
+# July 1993
+# December 2010, minor edits
 @c endfile
 @end ignore
address@hidden file eg/prog/id.awk
-# output is:
-# uid=12(foo) euid=34(bar) gid=3(baz) \
-#             egid=5(blat) groups=9(nine),2(two),1(one)
-
address@hidden
-BEGIN    \
address@hidden
-    uid = PROCINFO["uid"]
-    euid = PROCINFO["euid"]
-    gid = PROCINFO["gid"]
-    egid = PROCINFO["egid"]
address@hidden group
-
-    printf("uid=%d", uid)
-    pw = getpwuid(uid)
-    if (pw != "") @{
-        split(pw, a, ":")
-        printf("(%s)", a[1])
-    @}
address@hidden file eg/prog/igawk.sh
 
-    if (euid != uid) @{
-        printf(" euid=%d", euid)
-        pw = getpwuid(euid)
-        if (pw != "") @{
-            split(pw, a, ":")
-            printf("(%s)", a[1])
-        @}
-    @}
+if [ "$1" = debug ]
+then
+    set -x
+    shift
+fi
 
-    printf(" gid=%d", gid)
-    pw = getgrgid(gid)
-    if (pw != "") @{
-        split(pw, a, ":")
-        printf("(%s)", a[1])
-    @}
+# A literal newline, so that program text is formatted correctly
+n='
+'
 
-    if (egid != gid) @{
-        printf(" egid=%d", egid)
-        pw = getgrgid(egid)
-        if (pw != "") @{
-            split(pw, a, ":")
-            printf("(%s)", a[1])
-        @}
-    @}
+# Initialize variables to empty
+program=
+opts=
 
-    for (i = 1; ("group" i) in PROCINFO; i++) @{
-        if (i == 1)
-            printf(" groups=")
-        group = PROCINFO["group" i]
-        printf("%d", group)
-        pw = getgrgid(group)
-        if (pw != "") @{
-            split(pw, a, ":")
-            printf("(%s)", a[1])
-        @}
-        if (("group" (i+1)) in PROCINFO)
-            printf(",")
-    @}
+while [ $# -ne 0 ] # loop over arguments
+do
+    case $1 in
+    --)     shift
+            break ;;
 
-    print ""
address@hidden
address@hidden endfile
address@hidden example
+    -W)     shift
+            # The $@{x?'message here'@} construct prints a
+            # diagnostic if $x is unset
+            set -- -W"$@{@@?'missing operand'@}"
+            continue ;;
 
address@hidden @code{in} operator
-The test in the @code{for} loop is worth noting.
-Any supplementary groups in the @code{PROCINFO} array have the
-indices @code{"group1"} through @code{"address@hidden"} for some
address@hidden, i.e., the total number of supplementary groups.
-However, we don't know in advance how many of these groups
-there are.
+    -[vF])  opts="$opts $1 '$@{2?'missing operand'@}'"
+            shift ;;
 
-This loop works by starting at one, concatenating the value with
address@hidden"group"}, and then using @code{in} to see if that value is
-in the array.  Eventually, @code{i} is incremented past
-the last group in the array and the loop exits.
+    -[vF]*) opts="$opts '$1'" ;;
 
-The loop is also correct if there are @emph{no} supplementary
-groups; then the condition is false the first time it's
-tested, and the loop body never executes.
+    -f)     program="$program$n@@include $@{2?'missing operand'@}"
+            shift ;;
 
address@hidden exercise!!!
address@hidden
-The POSIX version of @command{id} takes arguments that control which
-information is printed.  Modify this version to accept the same
-arguments and perform in the same way.
address@hidden ignore
+    -f*)    f=$(expr "$1" : '-f\(.*\)')
+            program="$program$n@@include $f" ;;
 
address@hidden Split Program
address@hidden Splitting a Large File into Pieces
+    -[W-]file=*)
+            f=$(expr "$1" : '-.file=\(.*\)')
+            program="$program$n@@include $f" ;;
 
address@hidden FIXME: One day, update to current POSIX version of split
+    -[W-]file)
+            program="$program$n@@include $@{2?'missing operand'@}"
+            shift ;;
 
address@hidden STARTOFRANGE filspl
address@hidden files, splitting
address@hidden @code{split} utility
-The @command{split} program splits large text files into smaller pieces.
-Usage is as follows:@footnote{This is the traditional usage. The
-POSIX usage is different, but not relevant for what the program
-aims to demonstrate.}
+    -[W-]source=*)
+            t=$(expr "$1" : '-.source=\(.*\)')
+            program="$program$n$t" ;;
 
address@hidden
-split @address@hidden@r{]} file @r{[} @var{prefix} @r{]}
address@hidden example
+    -[W-]source)
+            program="$program$n$@{2?'missing operand'@}"
+            shift ;;
 
-By default,
-the output files are named @file{xaa}, @file{xab}, and so on. Each file has
-1000 lines in it, with the likely exception of the last file. To change the
-number of lines in each file, supply a number on the command line
-preceded with a minus; e.g., @samp{-500} for files with 500 lines in them
-instead of 1000.  To change the name of the output files to something like
address@hidden, @file{myfileab}, and so on, supply an additional
-argument that specifies the @value{FN} prefix.
+    -[W-]version)
+            echo igawk: version 3.0 1>&2
+            gawk --version
+            exit 0 ;;
 
-Here is a version of @command{split} in @command{awk}. It uses the
address@hidden()} and @code{chr()} functions presented in
address@hidden Functions}.
+    -[W-]*) opts="$opts '$1'" ;;
 
-The program first sets its defaults, and then tests to make sure there are
-not too many arguments.  It then looks at each argument in turn.  The
-first argument could be a minus sign followed by a number. If it is, this happens
-to look like a negative number, so it is made positive, and that is the
-count of lines.  The data @value{FN} is skipped over and the final argument
-is used as the prefix for the output @value{FN}s:
+    *)      break ;;
+    esac
+    shift
+done
 
address@hidden @code{split.awk} program
address@hidden
address@hidden file eg/prog/split.awk
-# split.awk --- do split in awk
-#
-# Requires ord() and chr() library functions
address@hidden endfile
address@hidden
address@hidden file eg/prog/split.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
+if [ -z "$program" ]
+then
+     program=$@{1?'missing program'@}
+     shift
+fi
 
+# At this point, `program' has the program.
 @c endfile
address@hidden ignore
address@hidden file eg/prog/split.awk
-# usage: split [-num] [file] [outname]
-
-BEGIN @{
-    outfile = "x"    # default
-    count = 1000
-    if (ARGC > 4)
-        usage()
address@hidden example
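Two shell idioms in the argument loop above are easy to misread, so here is a small sketch of both.  The function name @code{reglue_W} is hypothetical and exists only for this illustration.  It assumes POSIX @command{sh} semantics: the @samp{?} form of parameter expansion prints a diagnostic and makes a noninteractive shell exit when the parameter is unset, and a prefix written directly before @code{"$@@"} attaches to the first positional parameter only.

```shell
# Hypothetical helper (not in igawk.sh): rejoin a detached -W with the
# word that follows it and print the resulting first argument.
reglue_W() {
    shift                       # drop the bare -W
    : "${1?missing operand}"    # unset $1: print diagnostic, shell exits
    set -- -W"$@"               # -W is glued onto the first word only
    printf '%s\n' "$1"
}

reglue_W -W file=getopt.awk data.txt

# Run the failing case in a subshell so that only the subshell exits.
( reglue_W -W ) 2>/dev/null || echo "no operand after -W"
```

The first call prints @samp{-Wfile=getopt.awk}, which the loop's other @code{case} patterns can then match on the next iteration.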
 
-    i = 1
-    if (ARGV[i] ~ /^-[[:digit:]]+$/) @{
-        count = -ARGV[i]
-        ARGV[i] = ""
-        i++
-    @}
-    # test argv in case reading from stdin instead of file
-    if (i in ARGV)
-        i++    # skip data file name
-    if (i in ARGV) @{
-        outfile = ARGV[i]
-        ARGV[i] = ""
-    @}
+The @command{awk} program to process @samp{@@include} directives
+is stored in the shell variable @code{expand_prog}.  Doing this keeps
+the shell script readable.  The @command{awk} program
+reads through the user's program, one line at a time, using @code{getline}
+(@pxref{Getline}).  The input
+@value{FN}s and @samp{@@include} statements are managed using a stack.
+As each @samp{@@include} is encountered, the current @value{FN} is
+``pushed'' onto the stack and the file named in the @samp{@@include}
+directive becomes the current @value{FN}.  As each file is finished,
+the stack is ``popped,'' and the previous input file becomes the current
+input file again.  The process is started by making the original file
+the first one on the stack.
 
-    s1 = s2 = "a"
-    out = (outfile s1 s2)
address@hidden
address@hidden endfile
address@hidden example
+The @code{pathto()} function does the work of finding the full path to
+a file.  It simulates @command{gawk}'s behavior when searching the
+@env{AWKPATH} environment variable
+(@pxref{AWKPATH Variable}).
+If a @value{FN} has a @samp{/} in it, no path search is done.
+Similarly, if the @value{FN} is @code{"-"}, then that string is
+used as-is.  Otherwise,
+the @value{FN} is concatenated with the name of each directory in
+the path, and an attempt is made to open the generated @value{FN}.
+The only way to test if a file can be read in @command{awk} is to go
+ahead and try to read it with @code{getline}; this is what @code{pathto()}
+does.@footnote{On some very old versions of @command{awk}, the test
+@samp{getline junk < t} can loop forever if the file exists but is empty.
+Caveat emptor.} If the file can be read, it is closed and the @value{FN}
+is returned:
 
-The next rule does most of the work. @code{tcount} (temporary count) tracks
-how many lines have been printed to the output file so far. If it is greater
-than @code{count}, it is time to close the current file and start a new one.
address@hidden and @code{s2} track the current suffixes for the @value{FN}. If
-they are both @samp{z}, the file is just too big.  Otherwise, @code{s1}
-moves to the next letter in the alphabet and @code{s2} starts over again at
address@hidden:
address@hidden
+An alternative way to test for the file's existence would be to call
address@hidden("test -r " t)}, which uses the @command{test} utility to
+see if the file exists and is readable.  The disadvantage to this method
+is that it requires creating an extra process and can thus be slightly
+slower.
address@hidden ignore
 
address@hidden else on separate line here for page breaking
 @example
address@hidden file eg/prog/split.awk
address@hidden file eg/prog/igawk.sh
+expand_prog='
+
+function pathto(file,    i, t, junk)
 @{
-    if (++tcount > count) @{
-        close(out)
-        if (s2 == "z") @{
-            if (s1 == "z") @{
-                printf("split: %s is too large to split\n",
-                       FILENAME) > "/dev/stderr"
-                exit 1
-            @}
-            s1 = chr(ord(s1) + 1)
-            s2 = "a"
-        @}
+    if (index(file, "/") != 0)
+        return file
+
+    if (file == "-")
+        return file
+
+    for (i = 1; i <= ndirs; i++) @{
+        t = (pathlist[i] "/" file)
 @group
-        else
-            s2 = chr(ord(s2) + 1)
+        if ((getline junk < t) > 0) @{
+            # found it
+            close(t)
+            return t
+        @}
 @end group
-        out = (outfile s1 s2)
-        tcount = 1
     @}
-    print > out
+    return ""
 @}
 @c endfile
 @end example
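For comparison, the same search can be sketched in the shell, where a readability test is available directly.  This is an illustration only and is not how @file{igawk.sh} does it: the function below is a hypothetical shell counterpart that takes the path list as its second argument, and it uses @samp{[ -r ]} rather than attempting a read, so it also accepts a readable directory of the same name.

```shell
# Hypothetical shell counterpart of pathto(): print the first readable
# match for $1 along the colon-separated list in $2.
pathto() {
    case $1 in
    */*|-) printf '%s\n' "$1"; return 0 ;;   # no path search for these
    esac
    (                                # subshell keeps IFS and set -f local
        IFS=:
        set -f                       # no globbing while splitting the list
        for dir in $2
        do
            [ -n "$dir" ] || dir=.   # null element means current directory
            if [ -r "$dir/$1" ]
            then
                printf '%s\n' "$dir/$1"
                exit 0
            fi
        done
        exit 1                       # not found: print nothing, fail
    )
}

pathto ./local.awk /usr/share/awk    # contains a slash, returned as-is
```

A call such as @samp{pathto getopt.awk "$AWKPATH"} prints the full name of the first match and returns a nonzero status if there is none.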
 
address@hidden Exercise: do this with just awk builtin functions, 
index("abc..."), substr, etc.
-
address@hidden
-The @code{usage()} function simply prints an error message and exits:
+The main program is contained inside one @code{BEGIN} rule.  The first thing it
+does is set up the @code{pathlist} array that @code{pathto()} uses.  After
+splitting the path on @samp{:}, null elements are replaced with @code{"."},
+which represents the current directory:
 
 @example
address@hidden file eg/prog/split.awk
-function usage(   e)
address@hidden
-    e = "usage: split [-num] [file] [outname]"
-    print e > "/dev/stderr"
-    exit 1
address@hidden
address@hidden file eg/prog/igawk.sh
+BEGIN @{
+    path = ENVIRON["AWKPATH"]
+    ndirs = split(path, pathlist, ":")
+    for (i = 1; i <= ndirs; i++) @{
+        if (pathlist[i] == "")
+            pathlist[i] = "."
+    @}
 @c endfile
 @end example
 
address@hidden
-The variable @code{e} is used so that the function
-fits nicely on the
address@hidden
-screen.
address@hidden ifinfo
address@hidden
-page.
address@hidden ifnotinfo
-
-This program is a bit sloppy; it relies on @command{awk} to automatically close the last file
-instead of doing it in an @code{END} rule.
-It also assumes that letters are contiguous in the character set,
-which isn't true for EBCDIC systems.
-
address@hidden Exercise: Fix these problems.
address@hidden BFD...
address@hidden ENDOFRANGE filspl
+The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}.
+The main loop comes next.  Input lines are read in succession. Lines that
+do not start with @samp{@@include} are printed verbatim.
+If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}.
+@code{pathto()} is called to generate the full path.  If it cannot, then the program
+prints an error message and continues.
 
address@hidden Tee Program
address@hidden Duplicating Output into Multiple Files
+The next thing to check is if the file is included already.  The
+@code{processed} array is indexed by the full @value{FN} of each included
+file and it tracks this information for us.  If the file is
+seen again, a warning message is printed. Otherwise, the new @value{FN} is
+pushed onto the stack and processing continues.
 
address@hidden files, address@hidden duplicating output into
address@hidden output, duplicating into files
address@hidden @code{tee} utility
-The @code{tee} program is known as a ``pipe fitting.''  @code{tee} copies
-its standard input to its standard output and also duplicates it to the
-files named on the command line.  Its usage is as follows:
+Finally, when @code{getline} encounters the end of the input file, the file
+is closed and the stack is popped.  When @code{stackptr} is less than zero,
+the program is done:
 
 @example
-tee @address@hidden file @dots{}
address@hidden example
-
-The @option{-a} option tells @code{tee} to append to the named files, instead of
-truncating them and starting over.
-
-The @code{BEGIN} rule first makes a copy of all the command-line arguments
-into an array named @code{copy}.
address@hidden is not copied, since it is not needed.
address@hidden cannot use @code{ARGV} directly, since @command{awk} attempts to
-process each @value{FN} in @code{ARGV} as input data.
address@hidden file eg/prog/igawk.sh
+    stackptr = 0
+    input[stackptr] = ARGV[1] # ARGV[1] is first file
 
address@hidden flag variables
-If the first argument is @option{-a}, then the flag variable
address@hidden is set to true, and both @code{ARGV[1]} and
address@hidden are deleted. If @code{ARGC} is less than two, then no
address@hidden were supplied and @code{tee} prints a usage message and exits.
-Finally, @command{awk} is forced to read the standard input by setting
address@hidden to @code{"-"} and @code{ARGC} to two:
+    for (; stackptr >= 0; stackptr--) @{
+        while ((getline < input[stackptr]) > 0) @{
+            if (tolower($1) != "@@include") @{
+                print
+                continue
+            @}
+            fpath = pathto($2)
address@hidden
+            if (fpath == "") @{
+                printf("igawk:%s:%d: cannot find %s\n",
+                    input[stackptr], FNR, $2) > "/dev/stderr"
+                continue
+            @}
address@hidden group
+            if (! (fpath in processed)) @{
+                processed[fpath] = input[stackptr]
+                input[++stackptr] = fpath  # push onto stack
+            @} else
+                print $2, "included in", input[stackptr],
+                    "already included in",
+                    processed[fpath] > "/dev/stderr"
+        @}
+        close(input[stackptr])
+    @}
+@}'  # close quote ends `expand_prog' variable
 
address@hidden @code{tee.awk} program
address@hidden
address@hidden file eg/prog/tee.awk
-# tee.awk --- tee in awk
-#
-# Copy standard input to all named output files.
-# Append content if -a option is supplied.
-#
+processed_program=$(gawk -- "$expand_prog" /dev/stdin << EOF
+$program
+EOF
+)
 @c endfile
address@hidden
address@hidden file eg/prog/tee.awk
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised December 1995
address@hidden example
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/tee.awk
-BEGIN    \
address@hidden
-    for (i = 1; i < ARGC; i++)
-        copy[i] = ARGV[i]
+The shell construct @samp{@var{command} << @var{marker}} is called a @dfn{here document}.
+Everything in the shell script up to the @var{marker} is fed to @var{command} as input.
+The shell processes the contents of the here document for variable and command substitution
+(and possibly other things as well, depending upon the shell).
 
-    if (ARGV[1] == "-a") @{
-        append = 1
-        delete ARGV[1]
-        delete copy[1]
-        ARGC--
-    @}
-    if (ARGC < 2) @{
-        print "usage: tee [-a] file ..." > "/dev/stderr"
-        exit 1
-    @}
-    ARGV[1] = "-"
-    ARGC = 2
address@hidden
address@hidden endfile
address@hidden example
+The shell construct @samp{$(@dots{})} is called @dfn{command substitution}.
+The output of the command inside the parentheses is substituted
+into the command line.
+Because the result is used in a variable assignment,
+it is saved as a single string, even if the results contain whitespace.
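The combination of a here document and command substitution can be tried without @command{gawk}.  The sketch below is an illustration under stated assumptions: @command{cat} stands in for the @samp{@@include}-expanding program, and the variable names @code{program} and @code{copy} are chosen for this example.  It shows that the shell expands @samp{$program} once inside the here document, but does not rescan the resulting text, so the @samp{$0} and @samp{$5} in the @command{awk} source survive.

```shell
# A two-line awk program held in a shell variable; it contains "$"
# characters that must reach awk untouched.
program='BEGIN { price = "$5" }
{ print $0, price }'

# Feed the variable's contents to a command via a here document and
# capture that command's output with command substitution.
copy=$(cat << EOF
$program
EOF
)

printf '%s\n' "$copy"
```

Command substitution strips trailing newlines, so @code{copy} ends up identical to @code{program}, embedded newline included.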
 
-The following single rule does all the work.  Since there is no pattern, it is
-executed for each line of input.  The body of the rule simply prints the
-line into each file on the command line, and then to the standard output:
+The expanded program is saved in the variable @code{processed_program}.
+It's done in these steps:
 
address@hidden
address@hidden file eg/prog/tee.awk
address@hidden
-    # moving the if outside the loop makes it run faster
-    if (append)
-        for (i in copy)
-            print >> copy[i]
-    else
-        for (i in copy)
-            print > copy[i]
-    print
address@hidden
address@hidden endfile
address@hidden example
address@hidden
address@hidden
+Run @command{gawk} with the @samp{@@include}-processing program (the
+value of the @code{expand_prog} shell variable) on standard input.
 
address@hidden
-It is also possible to write the loop this way:
address@hidden
+Standard input is the contents of the user's program, from the shell variable @code{program}.
+Its contents are fed to @command{gawk} via a here document.
 
address@hidden
-for (i in copy)
-    if (append)
-        print >> copy[i]
-    else
-        print > copy[i]
address@hidden example
address@hidden
+The results of this processing are saved in the shell variable @code{processed_program} by using command substitution.
address@hidden enumerate
 
address@hidden
-This is more concise but it is also less efficient.  The @samp{if} is
-tested for each record and for each output file.  By duplicating the loop
-body, the @samp{if} is only tested once for each input record.  If there are
address@hidden input records and @var{M} output files, the first method only
-executes @var{N} @samp{if} statements, while the second executes
address@hidden@address@hidden @samp{if} statements.
+The last step is to call @command{gawk} with the expanded program,
+along with the original
+options and command-line arguments that the user supplied.
 
-Finally, the @code{END} rule cleans up by closing all the output files:
address@hidden this causes more problems than it solves, so leave it out.
address@hidden
+The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
+to handle an interesting case. Suppose that the user's program only has
+a @code{BEGIN} rule and there are no @value{DF}s to read.
+The program should exit without reading any @value{DF}s.
+However, suppose that an included library file defines an @code{END}
+rule of its own. In this case, @command{gawk} will hang, reading standard
+input. In order to avoid this, @file{/dev/null} is explicitly added to the
+command line. Reading from @file{/dev/null} always returns an immediate
+end of file indication.
+
address@hidden Hmm. Add /dev/null if $# is 0?  Still messes up ARGV. Sigh.
address@hidden ignore
 
 @example
address@hidden file eg/prog/tee.awk
-END    \
address@hidden
-    for (i in copy)
-        close(copy[i])
address@hidden
address@hidden file eg/prog/igawk.sh
+eval gawk $opts -- '"$processed_program"' '"$@@"'
 @c endfile
 @end example
 
address@hidden Uniq Program
address@hidden Printing Nonduplicated Lines of Text
-
address@hidden FIXME: One day, update to current POSIX version of uniq
+The @command{eval} command is a shell construct that reruns the shell's parsing
+process.  This keeps things properly quoted.
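
To see why the extra parsing pass matters, consider the following minimal
shell sketch. It is not part of igawk.sh: count_args, run_with_eval, and
run_without_eval are hypothetical helpers, and the values assigned to opts
and processed_program are invented for the demonstration. It assumes, as in
the script, that opts holds option text containing embedded single quotes.

```shell
# count_args: hypothetical helper that reports how many arguments it received.
count_args() {
    echo "$#"
}

# Same shape as the final line of igawk.sh, with count_args standing in
# for gawk.  The values of opts and processed_program are made up.
run_with_eval() {
    opts="-v 'msg=two words'"
    processed_program='BEGIN { print msg }'
    eval count_args $opts -- '"$processed_program"' '"$@"'
}

# The same command without eval: the quotes stored inside opts are never
# reparsed, so they stay literal and the option value splits in two.
run_without_eval() {
    opts="-v 'msg=two words'"
    processed_program='BEGIN { print msg }'
    count_args $opts -- "$processed_program" "$@"
}

run_with_eval "file one" "file two"       # prints 6
run_without_eval "file one" "file two"    # prints 7
```

With eval, the quoted option value arrives as a single argument, and the
program text and each file name are each passed through as one word.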
 
address@hidden STARTOFRANGE prunt
address@hidden printing, unduplicated lines of text
address@hidden STARTOFRANGE tpul
address@hidden address@hidden printing, unduplicated lines of
address@hidden @command{uniq} utility
-The @command{uniq} utility reads sorted lines of data on its standard
-input, and by default removes duplicate lines.  In other words, it only
-prints unique lines---hence the name.  @command{uniq} has a number of
-options. The usage is as follows:
+This version of @command{igawk} represents my fifth version of this program.
+There are four key simplifications that make the program work better:
 
address@hidden
-uniq @r{[}-udc @address@hidden@r{]]} @address@hidden@r{]} @r{[} @var{input file} @r{[} @var{output file} @r{]]}
address@hidden example
address@hidden @bullet
address@hidden
+Using @samp{@@include} even for the files named with @option{-f} makes building
+the initial collected @command{awk} program much simpler; all the
address@hidden@@include} processing can be done once.
 
-The options for @command{uniq} are:
address@hidden
+Not trying to save the line read with @code{getline}
+in the @code{pathto()} function when testing for the
+file's accessibility for use with the main program simplifies things
+considerably.
address@hidden what problem does this engender though - exercise
address@hidden answer, reading from "-" or /dev/stdin
 
address@hidden @code
address@hidden -d
-Print only repeated lines.
address@hidden
+Using a @code{getline} loop in the @code{BEGIN} rule does it all in one
+place.  It is not necessary to call out to a separate loop for processing
+nested @samp{@@include} statements.
 
address@hidden -u
-Print only nonrepeated lines.
address@hidden
+Instead of saving the expanded program in a temporary file, putting it in a shell variable
+avoids some potential security problems.
+This has the disadvantage that the script relies upon more features
+of the @command{sh} language, making it harder to follow for those who
+aren't familiar with @command{sh}.
address@hidden itemize
 
address@hidden -c
-Count lines. This option overrides @option{-d} and @option{-u}.  Both repeated
-and nonrepeated lines are counted.
+Also, this program illustrates that it is often worthwhile to combine
address@hidden and @command{awk} programming together.  You can usually
+accomplish quite a lot, without having to resort to low-level programming
+in C or C++, and it is frequently easier to do certain kinds of string
+and argument manipulation using the shell than it is in @command{awk}.
 
address@hidden address@hidden
-Skip @var{n} fields before comparing lines.  The definition of fields
-is similar to @command{awk}'s default: nonwhitespace characters separated
-by runs of spaces and/or TABs.
+Finally, @command{igawk} shows that it is not always necessary to add new
+features to a program; they can often be layered on top.
address@hidden
+With @command{igawk},
+there is no real reason to build @samp{@@include} processing into
address@hidden itself.
address@hidden ignore
 
address@hidden address@hidden
-Skip @var{n} characters before comparing lines.  Any fields specified with
address@hidden@var{n}} are skipped first.
address@hidden search paths
address@hidden search paths, for source files
address@hidden source address@hidden search path for
address@hidden files, address@hidden search path for
address@hidden directories, searching
+As an additional example of this, consider the idea of having two
+files in a directory in the search path:
 
address@hidden @var{input file}
-Data is read from the input file named on the command line, instead of from
-the standard input.
address@hidden @file
address@hidden default.awk
+This file contains a set of default library functions, such
+as @code{getopt()} and @code{assert()}.
 
address@hidden @var{output file}
-The generated output is sent to the named output file, instead of to the
-standard output.
address@hidden site.awk
+This file contains library functions that are specific to a site or
+installation; i.e., locally developed functions.
+Having a separate file allows @file{default.awk} to change with
+new @command{gawk} releases, without requiring the system administrator to
+update it each time by adding the local functions.
 @end table
 
-Normally @command{uniq} behaves as if both the @option{-d} and
address@hidden options are provided.
+One user
address@hidden Karl Berry, address@hidden, 10/95
+suggested that @command{gawk} be modified to automatically read these files
+upon startup.  Instead, it would be very simple to modify @command{igawk}
+to do this. Since @command{igawk} can process nested @samp{@@include}
+directives, @file{default.awk} could simply contain @samp{@@include}
+statements for the desired library functions.
 
address@hidden uses the
address@hidden()} library function
-(@pxref{Getopt Function})
-and the @code{join()} library function
-(@pxref{Join Function}).
address@hidden Exercise: make this change
address@hidden ENDOFRANGE libfex
address@hidden ENDOFRANGE flibex
address@hidden ENDOFRANGE awkpex
 
-The program begins with a @code{usage()} function and then a brief outline of
-the options and their meanings in comments.
-The @code{BEGIN} rule deals with the command-line arguments and options. It
-uses a trick to get @code{getopt()} to handle options of the form @samp{-25},
-treating such an option as the option letter @samp{2} with an argument of
address@hidden If indeed two or more digits are supplied (@code{Optarg} looks
-like a number), @code{Optarg} is
-concatenated with the option digit and then the result is added to zero to make
-it into a number.  If there is only one digit in the option, then
address@hidden is not needed. In this case, @code{Optind} must be decremented so that
address@hidden()} processes it next time.  This code is admittedly a bit
-tricky.
address@hidden Anagram Program
address@hidden Finding Anagrams From A Dictionary
 
-If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines.  The output file, if provided, is assigned
-to @code{outputfile}.  Early on, @code{outputfile} is initialized to the
-standard output, @file{/dev/stdout}:
+An interesting programming challenge is to
+search for @dfn{anagrams} in a
+word list (such as
address@hidden/usr/share/dict/words} on many GNU/Linux systems).
+One word is an anagram of another if both words contain
+the same letters
+(for example, ``babbling'' and ``blabbing'').
 
address@hidden @code{uniq.awk} program
+An elegant algorithm is presented in Column 2, Problem C of
+Jon Bentley's @cite{Programming Pearls}, second edition.
+The idea is to give words that are anagrams a common signature,
+sort all the words together by their signature, and then print them.
+Dr.@: Bentley observes that taking the letters in each word and
+sorting them produces that common signature.
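
The signature idea can be tried out directly from the shell. The following
is a minimal sketch, not the manual's program: the function is a
hypothetical shell counterpart to the word2key() idea, built from the
standard fold, sort, and tr utilities, and it assumes plain single-byte
(ASCII) words.

```shell
# word2key: hypothetical shell helper.  Put each letter of the word on
# its own line, sort the letters, and join them back together.
word2key() {
    printf '%s\n' "$1" | fold -w 1 | LC_ALL=C sort | tr -d '\n'
}

word2key babbling; echo    # prints abbbgiln
word2key blabbing; echo    # prints abbbgiln
```

Because both words produce the same signature, they are anagrams of
each other.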
+
+The following program uses arrays of arrays to bring together
+words with the same signature and array sorting to print the words
+in sorted order.
+
address@hidden @code{anagram.awk} program
 @example
address@hidden file eg/prog/uniq.awk
address@hidden
-# uniq.awk --- do uniq in awk
-#
-# Requires getopt() and join() library functions
address@hidden group
address@hidden file eg/prog/anagram.awk
+# anagram.awk --- An implementation of the anagram finding algorithm
+#                 from Jon Bentley's "Programming Pearls", 2nd edition.
+#                 Addison Wesley, 2000, ISBN 0-201-65788-0.
+#                 Column 2, Problem C, section 2.8, pp 18-20.
 @c endfile
 @ignore
address@hidden file eg/prog/uniq.awk
address@hidden file eg/prog/anagram.awk
 #
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
+# This program requires gawk 4.0 or newer.
+# Required gawk-specific features:
+#   - True multidimensional arrays
+#   - split() with "" as separator splits out individual characters
+#   - asort() and asorti() functions
+#
+# See http://savannah.gnu.org/projects/gawk.
+#
+# Arnold Robbins
+# arnold@@skeeve.com
+# Public Domain
+# January, 2011
 @c endfile
 @end ignore
address@hidden file eg/prog/uniq.awk
-
-function usage(    e)
address@hidden
-    e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]"
-    print e > "/dev/stderr"
-    exit 1
address@hidden
-
-# -c    count lines. overrides -d and -u
-# -d    only repeated lines
-# -u    only nonrepeated lines
-# -n    skip n fields
-# +n    skip n characters, skip fields first
-
-BEGIN   \
address@hidden
-    count = 1
-    outputfile = "/dev/stdout"
-    opts = "udc0:1:2:3:4:5:6:7:8:9:"
-    while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
-        if (c == "u")
-            non_repeated_only++
-        else if (c == "d")
-            repeated_only++
-        else if (c == "c")
-            do_count++
-        else if (index("0123456789", c) != 0) @{
-            # getopt requires args to options
-            # this messes us up for things like -5
-            if (Optarg ~ /^[[:digit:]]+$/)
-                fcount = (c Optarg) + 0
-            else @{
-                fcount = c + 0
-                Optind--
-            @}
-        @} else
-            usage()
-    @}
-
-    if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{
-        charcount = substr(ARGV[Optind], 2) + 0
-        Optind++
-    @}
address@hidden file eg/prog/anagram.awk
 
-    for (i = 1; i < Optind; i++)
-        ARGV[i] = ""
+/'s$/   @{ next @}        # Skip possessives
address@hidden endfile
address@hidden example
 
-    if (repeated_only == 0 && non_repeated_only == 0)
-        repeated_only = non_repeated_only = 1
+The program starts with a header, and then a rule to skip
+possessives in the dictionary file. The next rule builds
+up the data structure. The first dimension of the array
+is indexed by the signature; the second dimension is the word
+itself:
 
-    if (ARGC - Optind == 2) @{
-        outputfile = ARGV[ARGC - 1]
-        ARGV[ARGC - 1] = ""
-    @}
address@hidden
address@hidden file eg/prog/anagram.awk
address@hidden
+    key = word2key($1)  # Build signature
+    data[key][$1] = $1  # Store word with signature
 @}
 @c endfile
 @end example
 
-The following function, @code{are_equal()}, compares the current line,
address@hidden, to the
-previous line, @code{last}.  It handles skipping fields and characters.
-If no field count and no character count are specified, @code{are_equal()}
-simply returns one or zero depending upon the result of a simple string
-comparison of @code{last} and @code{$0}.  Otherwise, things get more
-complicated.
-If fields have to be skipped, each line is broken into an array using
address@hidden()}
-(@pxref{String Functions});
-the desired fields are then joined back into a line using @code{join()}.
-The joined lines are stored in @code{clast} and @code{cline}.
-If no fields are skipped, @code{clast} and @code{cline} are set to
address@hidden and @code{$0}, respectively.
-Finally, if characters are skipped, @code{substr()} is used to strip off the
-leading @code{charcount} characters in @code{clast} and @code{cline}.  The
-two strings are then compared and @code{are_equal()} returns the result:
+The @code{word2key()} function creates the signature.
+It splits the word apart into individual letters,
+sorts the letters, and then joins them back together:
 
 @example
address@hidden file eg/prog/uniq.awk
-function are_equal(    n, m, clast, cline, alast, aline)
address@hidden file eg/prog/anagram.awk
+# word2key --- split word apart into letters, sort, joining back together
+
+function word2key(word,     a, i, n, result)
 @{
-    if (fcount == 0 && charcount == 0)
-        return (last == $0)
+    n = split(word, a, "")
+    asort(a)
 
-    if (fcount > 0) @{
-        n = split(last, alast)
-        m = split($0, aline)
-        clast = join(alast, fcount+1, n)
-        cline = join(aline, fcount+1, m)
-    @} else @{
-        clast = last
-        cline = $0
-    @}
-    if (charcount) @{
-        clast = substr(clast, charcount + 1)
-        cline = substr(cline, charcount + 1)
-    @}
+    for (i = 1; i <= n; i++)
+        result = result a[i]
 
-    return (clast == cline)
+    return result
 @}
 @c endfile
 @end example
 
-The following two rules are the body of the program.  The first one is
-executed only for the very first line of data.  It sets @code{last} equal to
address@hidden, so that subsequent lines of text have something to be compared to.
+Finally, the @code{END} rule traverses the array
+and prints out the anagram lists.  It sends the output
+to the system @command{sort} command, since otherwise
+the anagrams would appear in arbitrary order:
 
-The second rule does the work. The variable @code{equal} is one or zero,
-depending upon the results of @code{are_equal()}'s comparison. If @command{uniq}
-is counting repeated lines, and the lines are equal, then it increments the @code{count} variable.
-Otherwise, it prints the line and resets @code{count},
-since the two lines are not equal.
address@hidden
address@hidden file eg/prog/anagram.awk
+END @{
+    sort = "sort"
+    for (key in data) @{
+        # Sort words with same key
+        nwords = asorti(data[key], words)
+        if (nwords == 1)
+            continue
 
-If @command{uniq} is not counting, and if the lines are equal, @code{count} is incremented.
-Nothing is printed, since the point is to remove duplicates.
-Otherwise, if @command{uniq} is counting repeated lines and more than
-one line is seen, or if @command{uniq} is counting nonrepeated lines
-and only one line is seen, then the line is printed, and @code{count}
-is reset.
+        # And print. Minor glitch: trailing space at end of each line
+        for (j = 1; j <= nwords; j++)
+            printf("%s ", words[j]) | sort
+        print "" | sort
+    @}
+    close(sort)
address@hidden
address@hidden endfile
address@hidden example
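
For comparison, the same group-by-signature idea can be sketched as a plain
shell pipeline. This is an illustration only, under the assumption of
single-byte words with no embedded whitespace; word2key and anagram_groups
are hypothetical names, and the external sort does the job that the arrays
of arrays do in the awk version.

```shell
# word2key: hypothetical helper; sorts the letters of a word.
word2key() {
    printf '%s\n' "$1" | fold -w 1 | LC_ALL=C sort | tr -d '\n'
}

# anagram_groups: read one word per line, print each group of two or
# more words that share a signature, one group per line.
anagram_groups() {
    while IFS= read -r word; do
        printf '%s %s\n' "$(word2key "$word")" "$word"
    done |
    LC_ALL=C sort |
    {
        prev= group= count=0
        while read -r key word; do
            if [ "$key" = "$prev" ]; then
                group="$group $word"
                count=$((count + 1))
            else
                # Signature changed: flush the previous group if it
                # held more than one word.
                if [ "$count" -gt 1 ]; then
                    printf '%s\n' "$group"
                fi
                prev=$key group=$word count=1
            fi
        done
        if [ "$count" -gt 1 ]; then
            printf '%s\n' "$group"
        fi
    }
}

printf '%s\n' babbling cat blabbing act dog | anagram_groups
# prints:
# babbling blabbing
# act cat
```

Words whose signature is unique, such as dog here, are skipped, just as the
END rule skips any key with only one word.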
 
-Finally, similar logic is used in the @code{END} rule to print the final
-line of input data:
+Here is some partial output when the program is run:
 
 @example
address@hidden file eg/prog/uniq.awk
-NR == 1 @{
-    last = $0
-    next
address@hidden
+$ @kbd{gawk -f anagram.awk /usr/share/dict/words | grep '^b'}
address@hidden
+babbled blabbed 
+babbler blabber brabble 
+babblers blabbers brabbles 
+babbling blabbing 
+babbly blabby 
+babel bable 
+babels beslab 
+babery yabber 
address@hidden
address@hidden example
 
address@hidden
-    equal = are_equal()
address@hidden Signature Program
address@hidden And Now For Something Completely Different
 
-    if (do_count) @{    # overrides -d and -u
-        if (equal)
-            count++
-        else @{
-            printf("%4d %s\n", count, last) > outputfile
-            last = $0
-            count = 1    # reset
-        @}
-        next
-    @}
+The following program was written by Davide Brini
address@hidden (@email{dave_br@@gmx.com})
+and is published on @uref{http://backreference.org/2011/02/03/obfuscated-awk/,
+his website}.
+It serves as his signature in the Usenet group @code{comp.lang.awk}.
+He supplies the following copyright terms:
 
-    if (equal)
-        count++
-    else @{
-        if ((repeated_only && count > 1) ||
-            (non_repeated_only && count == 1))
-                print last > outputfile
-        last = $0
-        count = 1
-    @}
address@hidden
address@hidden
+Copyright @copyright{} 2008 Davide Brini
+
+Copying and distribution of the code published in this page, with or without
+modification, are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
address@hidden quotation
+
+Here is the program:
+
address@hidden
+awk 'address@hidden"~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
+printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
+X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
+O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),address@hidden'
address@hidden example
+
+We leave it to you to determine what the program does.
+
address@hidden
+To: "Arnold Robbins" <address@hidden>
+Date: Sat, 20 Aug 2011 13:50:46 -0400
+Subject: The GNU Awk User's Guide, Section 13.3.11
+From: "Chris Johansen" <address@hidden>
+Message-ID: <address@hidden>
+
+Arnold, you don't know me, but we have a tenuous connection.  My wife is  
+Barbara A. Field, FAIA, GIT '65 (B. Arch.).
+
+I have had a couple of paper copies of "Effective Awk Programming" for  
+years, and now I'm going through a Kindle version of "The GNU Awk User's  
+Guide" again.  When I got to section 13.3.11, I reformatted and lightly  
+commented Davide Brini's signature script to understand its workings.
+
+It occurs to me that this might have pedagogical value as an example  
+(although imperfect) of the value of whitespace and comments, and a  
+starting point for that discussion.  It certainly helped _me_ understand  
+what's going on.  You are welcome to it, as-is or modified (subject to  
+Davide's constraints, of course, which I think I have met).
+
+If I were to include it in a future edition, I would put it at some  
+distance from section 13.3.11, say, as a note or an appendix, so as not to  
+be a "spoiler" to the puzzle.
+
+Best regards,
+-- 
+Chris Johansen {johansen at main dot nc dot us}
+  . . . collapsing the probability wave function, sending ripples of  
+certainty through the space-time continuum.
+
+
+#! /usr/bin/gawk -f
+
+# From "13.3.11 And Now For Something Completely Different"
+#   http://www.gnu.org/software/gawk/manual/html_node/Signature-Program.html#Signature-Program
+
+# Copyright © 2008 Davide Brini 
+
+# Copying and distribution of the code published in this page, with
+# or without modification, are permitted in any medium without
+# royalty provided the copyright notice and this notice are preserved.
+
+BEGIN {
+  O = "~" ~ "~";    #  1
+  o = "==" == "=="; #  1
+  o += +o;          #  2
+  x = O "" O;       # 11
+
+
+  while ( X++ <= x + o + o ) c = c "%c";
+
+  # O is  1
+  # o is  2
+  # x is 11
+  # X is 17
+  # c is "%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c"
+
+  printf c,
+    ( x - O )*( x - O),                  # 100 d
+    x*( x - o ) - o,                     #  97 a
+    x*( x - O ) + x - O - o,             # 118 v
+    +x*( x - O ) - x + o,                # 101 e
+    X*( o*o + O ) + x - O,               #  95 _
+    X*( X - x ) - o*o,                   #  98 b
+    ( x + X )*o*o + o,                   # 114 r
+    x*( X - x ) - O - O,                 #  64 @
+    x - O + ( O + o + X + x )*( o + O ), # 103 g
+    X*X - X*( x - O ) - x + O,           # 109 m
+    O + X*( o*( o + O ) + O ),           # 120 x
+    +x + O + X*o,                        #  46 .
+    x*( x - o),                          #  99 c
+    ( o + X + x )*o*o - ( x - O - O ),   # 111 o
+    O + ( X - x )*( X + O ),             # 109 m
+    x - O                                #  10 \n
+}
address@hidden ignore
+
address@hidden
address@hidden Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
address@hidden iftex
+
address@hidden
address@hidden
 
-END @{
-    if (do_count)
-        printf("%4d %s\n", count, last) > outputfile
-    else if ((repeated_only && count > 1) ||
-            (non_repeated_only && count == 1))
-        print last > outputfile
-    close(outputfile)
address@hidden
address@hidden endfile
address@hidden example
address@hidden ENDOFRANGE prunt
address@hidden ENDOFRANGE tpul
address@hidden Part III:@* Moving Beyond Standard @command{awk} With @command{gawk}
 
address@hidden Wc Program
address@hidden Counting Things
+Part III focuses on features specific to @command{gawk}.
+It contains the following chapters:
 
address@hidden FIXME: One day, update to current POSIX version of wc
address@hidden @bullet
address@hidden
address@hidden
 
address@hidden STARTOFRANGE count
address@hidden counting
address@hidden STARTOFRANGE infco
address@hidden input files, counting elements in
address@hidden STARTOFRANGE woco
address@hidden words, counting
address@hidden STARTOFRANGE chco
address@hidden characters, counting
address@hidden STARTOFRANGE lico
address@hidden lines, counting
address@hidden @command{wc} utility
-The @command{wc} (word count) utility counts lines, words, and characters in
-one or more input files. Its usage is as follows:
address@hidden
address@hidden Features}.
 
address@hidden
-wc @address@hidden @r{[} @var{files} @dots{} @r{]}
address@hidden example
address@hidden
address@hidden
 
-If no files are specified on the command line, @command{wc} reads its standard
-input. If there are multiple files, it also prints total counts for all
-the files.  The options and their meanings are shown in the following list:
address@hidden
address@hidden Precision Arithmetic}.
 
address@hidden @code
address@hidden -l
-Count only lines.
address@hidden
address@hidden Extensions}.
address@hidden ifdocbook
address@hidden ignore
 
address@hidden -w
-Count only words.
-A ``word'' is a contiguous sequence of nonwhitespace characters, separated
-by spaces and/or TABs.  Luckily, this is the normal way @command{awk} separates
-fields in its input data.
address@hidden Internationalization
address@hidden Internationalization with @command{gawk}
 
address@hidden -c
-Count only characters.
address@hidden table
+Once upon a time, computer makers
+wrote software that worked only in English.
+Eventually, hardware and software vendors noticed that if their
+systems worked in the native languages of non-English-speaking
+countries, they were able to sell more systems.
+As a result, internationalization and localization
+of programs and software systems became a common practice.
 
-Implementing @command{wc} in @command{awk} is particularly elegant,
-since @command{awk} does a lot of the work for us; it splits lines into
-words (i.e., fields) and counts them, it counts lines (i.e., records),
-and it can easily tell us how long a line is.
address@hidden STARTOFRANGE inloc
address@hidden internationalization, localization
address@hidden @command{gawk}, internationalization and, See internationalization
address@hidden internationalization, localization, @command{gawk} and
+For many years, the ability to provide internationalization
+was largely restricted to programs written in C and C++.
+This @value{CHAPTER} describes the underlying library @command{gawk}
+uses for internationalization, as well as how
address@hidden makes internationalization
+features available at the @command{awk} program level.
+Having internationalization available at the @command{awk} level
+gives software developers additional flexibility---they are no
+longer forced to write in C or C++ when internationalization is
+a requirement.
 
-This program uses the @code{getopt()} library function
-(@pxref{Getopt Function})
-and the file-transition functions
-(@pxref{Filetrans Function}).
address@hidden
+* I18N and L10N::               Internationalization and Localization.
+* Explaining gettext::          How GNU @code{gettext} works.
+* Programmer i18n::             Features for the programmer.
+* Translator i18n::             Features for the translator.
+* I18N Example::                A simple i18n example.
+* Gawk I18N::                   @command{gawk} is also internationalized.
address@hidden menu
 
-This version has one notable difference from traditional versions of
address@hidden: it always prints the counts in the order lines, words,
-and characters.  Traditional versions note the order of the @option{-l},
address@hidden, and @option{-c} options on the command line, and print the
-counts in that order.
address@hidden I18N and L10N
address@hidden Internationalization and Localization
 
-The @code{BEGIN} rule does the argument processing.  The variable
address@hidden is true if more than one file is named on the
-command line:
address@hidden internationalization
address@hidden localization, See address@hidden localization
address@hidden localization
address@hidden means writing (or modifying) a program once,
+in such a way that it can use multiple languages without requiring
+further source-code changes.
address@hidden means providing the data necessary for an
+internationalized program to work in a particular language.
+Most typically, these terms refer to features such as the language
+used for printing error messages, the language used to read
+responses, and information related to how numerical and
+monetary values are printed and read.
 
address@hidden @code{wc.awk} program
address@hidden
address@hidden file eg/prog/wc.awk
-# wc.awk --- count lines, words, characters
address@hidden endfile
address@hidden
address@hidden file eg/prog/wc.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/wc.awk
address@hidden Explaining gettext
address@hidden GNU @code{gettext}
 
-# Options:
-#    -l    only count lines
-#    -w    only count words
-#    -c    only count characters
-#
-# Default is to count lines, words, characters
-#
-# Requires getopt() and file transition library functions
address@hidden internationalizing a program
address@hidden STARTOFRANGE gettex
address@hidden @code{gettext} library
+The facilities in GNU @code{gettext} focus on messages: strings printed
+by a program, either directly or via formatting with @code{printf} or
address@hidden()address@hidden some operating systems, the @command{gawk}
+port doesn't support GNU @code{gettext}.
+Therefore, these features are not available
+if you are using one of those operating systems. Sorry.}
 
-BEGIN @{
-    # let getopt() print a message about
-    # invalid options. we ignore them
-    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{
-        if (c == "l")
-            do_lines = 1
-        else if (c == "w")
-            do_words = 1
-        else if (c == "c")
-            do_chars = 1
-    @}
-    for (i = 1; i < Optind; i++)
-        ARGV[i] = ""
address@hidden portability, @code{gettext} library and
+When using GNU @code{gettext}, each application has its own
address@hidden domain}.  This is a unique name, such as @samp{kpilot} or @samp{gawk},
+that identifies the application.
+A complete application may have multiple components---programs written
+in C or C++, as well as scripts written in @command{sh} or @command{awk}.
+All of the components use the same text domain.
 
-    # if no options, do all
-    if (! do_lines && ! do_words && ! do_chars)
-        do_lines = do_words = do_chars = 1
+To make the discussion concrete, assume we're writing an application
+named @command{guide}.  Internationalization consists of the
+following steps, in this order:
 
-    print_total = (ARGC - i > 2)
address@hidden
address@hidden endfile
address@hidden example
address@hidden
address@hidden
+The programmer goes
+through the source for all of @command{guide}'s components
+and marks each string that is a candidate for translation.
+For example, @code{"`-F': option required"} is a good candidate for translation.
+A table with strings of option names is not (e.g., @command{gawk}'s
address@hidden option should remain the same, no matter what the local
+language).
 
-The @code{beginfile()} function is simple; it just resets the counts of lines,
-words, and characters to zero, and saves the current @value{FN} in
address@hidden:
address@hidden @code{textdomain()} function (C library)
address@hidden
+The programmer indicates the application's text domain
+(@code{"guide"}) to the @code{gettext} library,
+by calling the @code{textdomain()} function.
 
address@hidden
address@hidden file eg/prog/wc.awk
-function beginfile(file)
address@hidden
-    lines = words = chars = 0
-    fname = FILENAME
address@hidden
address@hidden endfile
address@hidden example
address@hidden @code{.pot} files
address@hidden files, @code{.pot}
address@hidden portable object template files
address@hidden files, portable object template
address@hidden
+Messages from the application are extracted from the source code and
+collected into a portable object template file (@file{guide.pot}),
+which lists the strings and their translations.
+The translations are initially empty.
+The original (usually English) messages serve as the key for
+lookup of the translations.
 
-The @code{endfile()} function adds the current file's numbers to the running
-totals of lines, words, and address@hidden@command{wc} can't just use the value of
address@hidden in @code{endfile()}. If you examine
-the code in
address@hidden Function},
-you will see that
address@hidden has already been reset by the time
address@hidden()} is called.}  It then prints out those numbers
-for the file that was just read. It relies on @code{beginfile()} to reset the
-numbers for the following @value{DF}:
address@hidden FIXME: ONE DAY: make the above footnote an exercise,
address@hidden instead of giving away the answer.
address@hidden @code{.po} files
address@hidden files, @code{.po}
address@hidden portable object files
address@hidden files, portable object
address@hidden
+For each language with a translator, @file{guide.pot}
+is copied to a portable object file (@code{.po})
+and translations are created and shipped with the application.
+For example, there might be a @file{fr.po} for a French translation.
+
address@hidden @code{.mo} files
address@hidden files, @code{.mo}
address@hidden message object files
address@hidden files, message object
address@hidden
+Each language's @file{.po} file is converted into a binary
+message object (@file{.mo}) file.
+A message object file contains the original messages and their
+translations in a binary format that allows fast lookup of translations
+at runtime.
+
address@hidden
+When @command{guide} is built and installed, the binary translation files
+are installed in a standard place.
+
address@hidden @code{bindtextdomain()} function (C library)
address@hidden
+For testing and development, it is possible to tell @code{gettext}
+to use @file{.mo} files in a different directory than the standard
+one by using the @code{bindtextdomain()} function.
 
address@hidden
address@hidden file eg/prog/wc.awk
-function endfile(file)
address@hidden
-    tlines += lines
-    twords += words
-    tchars += chars
-    if (do_lines)
-        printf "\t%d", lines
address@hidden
-    if (do_words)
-        printf "\t%d", words
address@hidden group
-    if (do_chars)
-        printf "\t%d", chars
-    printf "\t%s\n", fname
address@hidden
address@hidden endfile
address@hidden example
address@hidden @code{.mo} files, specifying directory of
address@hidden files, @code{.mo}, specifying directory of
address@hidden message object files, specifying directory of
address@hidden files, message object, specifying directory of
address@hidden
+At runtime, @command{guide} looks up each string via a call
+to @code{gettext()}.  The returned string is the translated string
+if available, or the original string if not.
 
-There is one rule that is executed for each line. It adds the length of
-the record, plus one, to @address@hidden @command{gawk}
-understands multibyte locales, this code counts characters, not bytes.}
-Adding one plus the record length
-is needed because the newline character separating records (the value
-of @code{RS}) is not part of the record itself, and thus not included
-in its length.  Next, @code{lines} is incremented for each line read,
-and @code{words} is incremented by the value of @code{NF}, which is the
-number of ``words'' on this line:
address@hidden
+If necessary, it is possible to access messages from a different
+text domain than the one belonging to the application, without
+having to switch the application's default text domain back
+and forth.
address@hidden enumerate
+
address@hidden @code{gettext()} function (C library)
+In C (or C++), the string marking and dynamic translation lookup
+are accomplished by wrapping each string in a call to @code{gettext()}:
 
 @example
address@hidden file eg/prog/wc.awk
-# do per line
address@hidden
-    chars += length($0) + 1    # get newline
-    lines++
-    words += NF
address@hidden
address@hidden endfile
+printf("%s", gettext("Don't Panic!\n"));
 @end example
 
-Finally, the @code{END} rule simply prints the totals for all the files:
+The tools that extract messages from source code pull out all
+strings enclosed in calls to @code{gettext()}.
+
address@hidden @code{_} (underscore), @code{_} C macro
address@hidden underscore (@code{_}), @code{_} C macro
+The GNU @code{gettext} developers, recognizing that typing
+@samp{gettext(@dots{})} over and over again is both painful and ugly to look
+at, use the macro @samp{_} (an underscore) to make things easier:
 
 @example
address@hidden file eg/prog/wc.awk
-END @{
-    if (print_total) @{
-        if (do_lines)
-            printf "\t%d", tlines
-        if (do_words)
-            printf "\t%d", twords
-        if (do_chars)
-            printf "\t%d", tchars
-        print "\ttotal"
-    @}
address@hidden
address@hidden endfile
+/* In the standard header file: */
+#define _(str) gettext(str)
+
+/* In the program text: */
+printf("%s", _("Don't Panic!\n"));
 @end example
address@hidden ENDOFRANGE count
address@hidden ENDOFRANGE infco
address@hidden ENDOFRANGE lico
address@hidden ENDOFRANGE woco
address@hidden ENDOFRANGE chco
address@hidden ENDOFRANGE posimawk
 
address@hidden Miscellaneous Programs
address@hidden A Grab Bag of @command{awk} Programs
address@hidden internationalization, localization, locale categories
address@hidden @code{gettext} library, locale categories
address@hidden locale categories
address@hidden
+This reduces the typing overhead to just three extra characters per string
+and is considerably easier to read as well.
 
-This @value{SECTION} is a large ``grab bag'' of miscellaneous programs.
-We hope you find them both interesting and enjoyable.
+There are locale @dfn{categories}
+for different types of locale-related information.
+The defined locale categories that @code{gettext} knows about are:
 
address@hidden
-* Dupword Program::             Finding duplicated words in a document.
-* Alarm Program::               An alarm clock.
-* Translate Program::           A program similar to the @command{tr} utility.
-* Labels Program::              Printing mailing labels.
-* Word Sorting::                A program to produce a word usage count.
-* History Sorting::             Eliminating duplicate entries from a history
-                                file.
-* Extract Program::             Pulling out programs from Texinfo source
-                                files.
-* Simple Sed::                  A Simple Stream Editor.
-* Igawk Program::               A wrapper for @command{awk} that includes
-                                files.
-* Anagram Program::             Finding anagrams from a dictionary.
-* Signature Program::           People do amazing things with too much time on
-                                their hands.
address@hidden menu
address@hidden @code
address@hidden @code{LC_MESSAGES} locale category
address@hidden LC_MESSAGES
+Text messages.  This is the default category for @code{gettext}
+operations, but it is possible to supply a different one explicitly,
+if necessary.  (It is almost never necessary to supply a different category.)
 
address@hidden Dupword Program
address@hidden Finding Duplicated Words in a Document
address@hidden sorting characters in different languages
address@hidden @code{LC_COLLATE} locale category
address@hidden LC_COLLATE
+Text-collation information; i.e., how different characters
+and/or groups of characters sort in a given language.
 
address@hidden words, address@hidden searching for
address@hidden searching, for words
address@hidden address@hidden searching
-A common error when writing large amounts of prose is to accidentally
-duplicate words.  Typically you will see this in text as something like ``the
-the program does the address@hidden''  When the text is online, often
-the duplicated words occur at the end of one line and the
address@hidden
-the
address@hidden iftex
-beginning of
-another, making them very difficult to spot.
address@hidden as here!
address@hidden @code{LC_CTYPE} locale category
address@hidden LC_CTYPE
+Character-type information (alphabetic, digit, upper- or lowercase, and
+so on).
+This information is accessed via the
+POSIX character classes in regular expressions,
+such as @code{/[[:alnum:]]/}
+(@pxref{Regexp Operators}).
 
-This program, @file{dupword.awk}, scans through a file one line at a time
-and looks for adjacent occurrences of the same word.  It also saves the last
-word on a line (in the variable @code{prev}) for comparison with the first
-word on the next line.
address@hidden monetary information, localization
address@hidden currency symbols, localization
address@hidden @code{LC_MONETARY} locale category
address@hidden LC_MONETARY
+Monetary information, such as the currency symbol, and whether the
+symbol goes before or after a number.
 
address@hidden Texinfo
-The first two statements make sure that the line is all lowercase,
-so that, for example, ``The'' and ``the'' compare equal to each other.
-The next statement replaces nonalphanumeric and nonwhitespace characters
-with spaces, so that punctuation does not affect the comparison either.
-The characters are replaced with spaces so that formatting controls
-don't create nonsense words (e.g., the Texinfo @samp{@@address@hidden@}}
-becomes @samp{codeNF} if punctuation is simply deleted).  The record is
-then resplit into fields, yielding just the actual words on the line,
-and ensuring that there are no empty fields.
address@hidden @code{LC_NUMERIC} locale category
address@hidden LC_NUMERIC
+Numeric information, such as which characters to use for the decimal
+point and the thousands separator.@footnote{Americans
+use a comma every three decimal places and a period for the decimal
+point, while many Europeans do exactly the opposite:
+1,234.56 versus 1.234,56.}
 
-If there are no fields left after removing all the punctuation, the
-current record is skipped.  Otherwise, the program loops through each
-word, comparing it to the previous one:
address@hidden @code{LC_RESPONSE} locale category
address@hidden LC_RESPONSE
+Response information, such as how ``yes'' and ``no'' appear in the
+local language, and possibly other information as well.
 
address@hidden @code{dupword.awk} program
address@hidden
address@hidden file eg/prog/dupword.awk
-# dupword.awk --- find duplicate words in text
address@hidden endfile
address@hidden
address@hidden file eg/prog/dupword.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# December 1991
-# Revised October 2000
address@hidden time, localization and
address@hidden dates, information related address@hidden localization
address@hidden @code{LC_TIME} locale category
address@hidden LC_TIME
+Time- and date-related information, such as 12- or 24-hour clock, month printed
+before or after the day in a date, local month abbreviations, and so on.
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/dupword.awk
address@hidden
-    $0 = tolower($0)
-    gsub(/[^[:alnum:][:blank:]]/, " ");
-    $0 = $0         # re-split
-    if (NF == 0)
-        next
-    if ($1 == prev)
-        printf("%s:%d: duplicate %s\n",
-            FILENAME, FNR, $1)
-    for (i = 2; i <= NF; i++)
-        if ($i == $(i-1))
-            printf("%s:%d: duplicate %s\n",
-                FILENAME, FNR, $i)
-    prev = $NF
address@hidden
address@hidden endfile
address@hidden example
address@hidden @code{LC_ALL} locale category
address@hidden LC_ALL
+All of the above.  (Not too useful in the context of @code{gettext}.)
address@hidden table
address@hidden ENDOFRANGE gettex
 
address@hidden Alarm Program
address@hidden An Alarm Clock Program
address@hidden insomnia, cure for
address@hidden Robbins, Arnold
address@hidden
address@hidden cures insomnia like a ringing alarm address@hidden
-Arnold Robbins
address@hidden quotation
address@hidden Programmer i18n
address@hidden Internationalizing @command{awk} Programs
address@hidden STARTOFRANGE inap
address@hidden @command{awk} programs, internationalizing
 
address@hidden STARTOFRANGE tialarm
address@hidden time, alarm clock example program
address@hidden STARTOFRANGE alaex
address@hidden alarm clock example program
-The following program is a simple ``alarm clock'' program.
-You give it a time of day and an optional message.  At the specified time,
-it prints the message on the standard output. In addition, you can give it
-the number of times to repeat the message as well as a delay between
-repetitions.
+@command{gawk} provides the following variables and functions for
+internationalization:
 
-This program uses the @code{getlocaltime()} function from
address@hidden Function}.
address@hidden @code
address@hidden @code{TEXTDOMAIN} variable
address@hidden TEXTDOMAIN
+This variable indicates the application's text domain.
+For compatibility with GNU @code{gettext}, the default
+value is @code{"messages"}.
 
-All the work is done in the @code{BEGIN} rule.  The first part is argument
-checking and setting of defaults: the delay, the count, and the message to
-print.  If the user supplied a message without the ASCII BEL
-character (known as the ``alert'' character, @code{"\a"}), then it is added to
-the message.  (On many systems, printing the ASCII BEL generates an
-audible alert. Thus when the alarm goes off, the system calls attention
-to itself in case the user is not looking at the computer.)
-Just for a change, this program uses a @code{switch} statement
-(@pxref{Switch Statement}), but the processing could be done with a series of
address@hidden@code{else} statements instead.
-Here is the program:
address@hidden internationalization, localization, marked strings
address@hidden strings, for localization
address@hidden _"your message here"
+String constants marked with a leading underscore
+are candidates for translation at runtime.
+String constants without a leading underscore are not translated.
 
address@hidden @code{alarm.awk} program
address@hidden
address@hidden file eg/prog/alarm.awk
-# alarm.awk --- set an alarm
-#
-# Requires getlocaltime() library function
address@hidden endfile
address@hidden
address@hidden file eg/prog/alarm.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised December 2010
address@hidden @code{dcgettext()} function (@command{gawk})
+@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the translation of @var{string} in
+text domain @var{domain} for locale category @var{category}.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/alarm.awk
-# usage: alarm time [ "message" [ count [ delay ] ] ]
+If you supply a value for @var{category}, it must be a string equal to
+one of the known locale categories described in
+@ifnotinfo
+the previous @value{SECTION}.
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext}.
+@end ifinfo
+You must also supply a text domain.  Use @code{TEXTDOMAIN} if
+you want to use the current domain.
 
-BEGIN    \
address@hidden
-    # Initial argument sanity checking
-    usage1 = "usage: alarm time ['message' [count [delay]]]"
-    usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1])
address@hidden CAUTION
+The order of arguments to the @command{awk} version
+of the @code{dcgettext()} function is purposely different from the order for
+the C version.  The @command{awk} version's order was
+chosen to be simple and to allow for reasonable @command{awk}-style
+default arguments.
address@hidden quotation
 
-    if (ARGC < 2) @{
-        print usage1 > "/dev/stderr"
-        print usage2 > "/dev/stderr"
-        exit 1
-    @}
-    switch (ARGC) @{
-    case 5:
-        delay = ARGV[4] + 0
-        # fall through
-    case 4:
-        count = ARGV[3] + 0
-        # fall through
-    case 3:
-        message = ARGV[2]
-        break
-    default:
-        if (ARGV[1] !~ /[[:digit:]]?[[:digit:]]:[[:digit:address@hidden@}/) @{
-            print usage1 > "/dev/stderr"
-            print usage2 > "/dev/stderr"
-            exit 1
-        @}
-        break
-    @}
address@hidden @code{dcngettext()} function (@command{gawk})
+@item dcngettext(@var{string1}, @var{string2}, @var{number} @r{[}, @var{domain} @r{[}, @var{category}@r{]]})
+Return the plural form used for @var{number} of the
+translation of @var{string1} and @var{string2} in text domain
+@var{domain} for locale category @var{category}. @var{string1} is the
+English singular variant of a message, and @var{string2} the English plural
+variant of the same message.
+The default value for @var{domain} is the current value of @code{TEXTDOMAIN}.
+The default value for @var{category} is @code{"LC_MESSAGES"}.
 
-    # set defaults for once we reach the desired time
-    if (delay == 0)
-        delay = 180    # 3 minutes
address@hidden
-    if (count == 0)
-        count = 5
address@hidden group
-    if (message == "")
-        message = sprintf("\aIt is now %s!\a", ARGV[1])
-    else if (index(message, "\a") == 0)
-        message = "\a" message "\a"
address@hidden endfile
address@hidden example
+The same remarks about argument order as for the @code{dcgettext()} function apply.
 
-The next @value{SECTION} of code turns the alarm time into hours and minutes,
-converts it (if necessary) to a 24-hour clock, and then turns that
-time into a count of the seconds since midnight.  Next it turns the current
-time into a count of seconds since midnight.  The difference between the two
-is how long to wait before setting off the alarm:
address@hidden @code{.mo} files, specifying directory of
address@hidden files, @code{.mo}, specifying directory of
address@hidden message object files, specifying directory of
address@hidden files, message object, specifying directory of
address@hidden @code{bindtextdomain()} function (@command{gawk})
address@hidden bindtextdomain(@var{directory} @r{[}, @address@hidden)
+Change the directory in which
+@command{gawk} looks for @file{.mo} files, in case they
+will not or cannot be placed in the standard locations
+(e.g., during testing).
+Return the directory in which @var{domain} is ``bound.''
 
address@hidden
address@hidden file eg/prog/alarm.awk
-    # split up alarm time
-    split(ARGV[1], atime, ":")
-    hour = atime[1] + 0    # force numeric
-    minute = atime[2] + 0  # force numeric
+The default @var{domain} is the value of @code{TEXTDOMAIN}.
+If @var{directory} is the null string (@code{""}), then
+@code{bindtextdomain()} returns the current binding for the
+given @var{domain}.
address@hidden table
 
-    # get current broken down time
-    getlocaltime(now)
+To use these facilities in your @command{awk} program, follow the steps
+outlined in
+@ifnotinfo
+the previous @value{SECTION},
+@end ifnotinfo
+@ifinfo
+@ref{Explaining gettext},
+@end ifinfo
+like so:
 
-    # if time given is 12-hour hours and it's after that
-    # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m.,
-    # then add 12 to real hour
-    if (hour < 12 && now["hour"] > hour)
-        hour += 12
address@hidden
address@hidden @code{BEGIN} pattern, @code{TEXTDOMAIN} variable and
address@hidden @code{TEXTDOMAIN} variable, @code{BEGIN} pattern and
address@hidden
+Set the variable @code{TEXTDOMAIN} to the text domain of
+your program.  This is best done in a @code{BEGIN} rule
+(@pxref{BEGIN/END}),
+or it can also be done via the @option{-v} command-line
+option (@pxref{Options}):
 
-    # set target time in seconds since midnight
-    target = (hour * 60 * 60) + (minute * 60)
address@hidden
+BEGIN @{
+    TEXTDOMAIN = "guide"
+    @dots{}
address@hidden
address@hidden example
 
-    # get current time in seconds since midnight
-    current = (now["hour"] * 60 * 60) + \
-               (now["minute"] * 60) + now["second"]
address@hidden @code{_} (underscore), translatable string
address@hidden underscore (@code{_}), translatable string
address@hidden
+Mark all translatable strings with a leading underscore (@samp{_})
+character.  It @emph{must} be adjacent to the opening
+quote of the string.  For example:
 
-    # how long to sleep for
-    naptime = target - current
-    if (naptime <= 0) @{
-        print "time is in the past!" > "/dev/stderr"
-        exit 1
-    @}
address@hidden endfile
address@hidden
+print _"hello, world"
+x = _"you goofed"
+printf(_"Number of users is %d\n", nusers)
 @end example
 
address@hidden @command{sleep} utility
-Finally, the program uses the @code{system()} function
-(@pxref{I/O Functions})
-to call the @command{sleep} utility.  The @command{sleep} utility simply pauses
-for the given number of seconds.  If the exit status is not zero,
-the program assumes that @command{sleep} was interrupted and exits. If
address@hidden exited with an OK status (zero), then the program prints the
-message in a loop, again using @command{sleep} to delay for however many
-seconds are necessary:
address@hidden
+If you are creating strings dynamically, you can
+still translate them, using the @code{dcgettext()}
+built-in function:
 
 @example
address@hidden file eg/prog/alarm.awk
-    # zzzzzz..... go away if interrupted
-    if (system(sprintf("sleep %d", naptime)) != 0)
-        exit 1
+message = nusers " users logged in"
+message = dcgettext(message, "adminprog")
+print message
address@hidden example
 
-    # time to notify!
-    command = sprintf("sleep %d", delay)
-    for (i = 1; i <= count; i++) @{
-        print message
-        # if sleep command interrupted, go away
-        if (system(command) != 0)
-            break
-    @}
+Here, the call to @code{dcgettext()} supplies a different
+text domain (@code{"adminprog"}) in which to find the
+message, but it uses the default @code{"LC_MESSAGES"} category.
 
-    exit 0
+@cindex @code{LC_MESSAGES} locale category, @code{bindtextdomain()} function (@command{gawk})
address@hidden
+During development, you might want to put the @file{.mo}
+file in a private directory for testing.  This is done
+with the @code{bindtextdomain()} built-in function:
+
address@hidden
+BEGIN @{
+   TEXTDOMAIN = "guide"   # our text domain
+   if (Testing) @{
+       # where to find our files
+       bindtextdomain("testdir")
+       # joe is in charge of adminprog
+       bindtextdomain("../joe/testdir", "adminprog")
+   @}
+   @dots{}
 @}
address@hidden endfile
 @end example
address@hidden ENDOFRANGE tialarm
address@hidden ENDOFRANGE alaex
 
address@hidden Translate Program
address@hidden Transliterating Characters
address@hidden enumerate
 
address@hidden STARTOFRANGE chtra
address@hidden characters, transliterating
address@hidden @command{tr} utility
-The system @command{tr} utility transliterates characters.  For example, it is
-often used to map uppercase letters into lowercase for further processing:
+@xref{I18N Example},
+for an example program showing the steps to create
+and use translations from @command{awk}.
 
address@hidden
address@hidden data} | tr 'A-Z' 'a-z' | @var{process data} @dots{}
address@hidden example
address@hidden Translator i18n
address@hidden Translating @command{awk} Programs
 
address@hidden requires two lists of address@hidden some older
-systems,
address@hidden ORA
-including Solaris,
address@hidden ifset
address@hidden may require that the lists be written as
-range expressions enclosed in square brackets (@samp{[a-z]}) and quoted,
-to prevent the shell from attempting a @value{FN} expansion.  This is
-not a feature.}  When processing the input, the first character in the
-first list is replaced with the first character in the second list,
-the second character in the first list is replaced with the second
-character in the second list, and so on.  If there are more characters
-in the ``from'' list than in the ``to'' list, the last character of the
-``to'' list is used for the remaining characters in the ``from'' list.
address@hidden @code{.po} files
address@hidden files, @code{.po}
address@hidden portable object files
address@hidden files, portable object
+Once a program's translatable strings have been marked, they must
+be extracted to create the initial @file{.po} file.
+As part of translation, it is often helpful to rearrange the order
+in which arguments to @code{printf} are output.
 
-Some time ago,
address@hidden early or mid-1989!
-a user proposed that a transliteration function should
-be added to @command{gawk}.
address@hidden Wishing to avoid gratuitous new features,
address@hidden at least theoretically
-The following program was written to
-prove that character transliteration could be done with a user-level
-function.  This program is not as complete as the system @command{tr} utility
-but it does most of the job.
+@command{gawk}'s @option{--gen-pot} command-line option extracts
+the messages and is discussed next.
+After that, @code{printf}'s ability to
+rearrange the order for @code{printf} arguments at runtime
+is covered.
 
-The @command{translate} program demonstrates one of the few weaknesses
-of standard @command{awk}: dealing with individual characters is very
-painful, requiring repeated use of the @code{substr()}, @code{index()},
-and @code{gsub()} built-in functions
-(@pxref{String Functions})address@hidden
-program was written before @command{gawk} acquired the ability to
-split each character in a string into separate array elements.}
-@c Exercise: How might you use this new feature to simplify the program?
-There are two functions.  The first, @code{stranslate()}, takes three
-arguments:
address@hidden
+* String Extraction::           Extracting marked strings.
+* Printf Ordering::             Rearranging @code{printf} arguments.
+* I18N Portability::            @command{awk}-level portability issues.
address@hidden menu
 
address@hidden @code
address@hidden from
-A list of characters from which to translate.
address@hidden String Extraction
address@hidden Extracting Marked Strings
address@hidden strings, extracting
address@hidden marked address@hidden extracting
address@hidden @code{--gen-pot} option
address@hidden command-line options, string extraction
address@hidden string extraction (internationalization)
address@hidden marked string extraction (internationalization)
address@hidden extraction, of marked strings (internationalization)
 
address@hidden to
-A list of characters to which to translate.
address@hidden @code{--gen-pot} option
+Once your @command{awk} program is working, and all the strings have
+been marked and you've set (and perhaps bound) the text domain,
+it is time to produce translations.
+First, use the @option{--gen-pot} command-line option to create
+the initial @file{.pot} file:
 
address@hidden target
-The string on which to do the translation.
address@hidden table
address@hidden
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
address@hidden example
 
-Associative arrays make the translation part fairly easy. @code{t_ar} holds
-the ``to'' characters, indexed by the ``from'' characters.  Then a simple
-loop goes through @code{from}, one character at a time.  For each character
-in @code{from}, if the character appears in @code{target},
-it is replaced with the corresponding @code{to} character.
address@hidden @code{xgettext} utility
+When run with @option{--gen-pot}, @command{gawk} does not execute your
+program.  Instead, it parses it as usual and prints all marked strings
+to standard output in the format of a GNU @code{gettext} Portable Object
+file.  Also included in the output are any constant strings that
+appear as the first argument to @code{dcgettext()} or as the first and
+second argument to @code{dcngettext()}.@footnote{The
+@command{xgettext} utility that comes with GNU
+@code{gettext} can handle @file{.awk} files.}
+@xref{I18N Example},
+for the full list of steps to go through to create and test
+translations for @command{guide}.
 
-The @code{translate()} function simply calls @code{stranslate()} using @code{$0}
-as the target.  The main program sets two global variables, @code{FROM} and
-@code{TO}, from the command line, and then changes @code{ARGV} so that
-@command{awk} reads from the standard input.
address@hidden Printf Ordering
address@hidden Rearranging @code{printf} Arguments
 
-Finally, the processing rule simply calls @code{translate()} for each record:
address@hidden @code{printf} statement, positional specifiers
address@hidden positional specifiers, @code{printf} statement
+Format strings for @code{printf} and @code{sprintf()}
+(@pxref{Printf})
+present a special problem for translation.
+Consider the following:@footnote{This example is borrowed
+from the GNU @code{gettext} manual.}
 
address@hidden @code{translate.awk} program
address@hidden line broken here only for smallbook format
 @example
address@hidden file eg/prog/translate.awk
-# translate.awk --- do tr-like stuff
address@hidden endfile
address@hidden
address@hidden file eg/prog/translate.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# August 1989
-# February 2009 - bug fix
+printf(_"String `%s' has %d characters\n",
+          string, length(string))
address@hidden example
 
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/translate.awk
-# Bugs: does not handle things like: tr A-Z a-z, it has
-# to be spelled out. However, if `to' is shorter than `from',
-# the last character in `to' is used for the rest of `from'.
+A possible German translation for this might be:
 
-function stranslate(from, to, target,     lf, lt, ltarget, t_ar, i, c,
-                                                               result)
address@hidden
-    lf = length(from)
-    lt = length(to)
-    ltarget = length(target)
-    for (i = 1; i <= lt; i++)
-        t_ar[substr(from, i, 1)] = substr(to, i, 1)
-    if (lt < lf)
-        for (; i <= lf; i++)
-            t_ar[substr(from, i, 1)] = substr(to, lt, 1)
-    for (i = 1; i <= ltarget; i++) @{
-        c = substr(target, i, 1)
-        if (c in t_ar)
-            c = t_ar[c]
-        result = result c
-    @}
-    return result
address@hidden
address@hidden
+"%d Zeichen lang ist die Zeichenkette `%s'\n"
address@hidden example
 
-function translate(from, to)
address@hidden
-    return $0 = stranslate(from, to, $0)
address@hidden
+The problem should be obvious: the order of the format
+specifications is different from the original!
+Even though @code{gettext()} can return the translated string
+at runtime,
+it cannot change the argument order in the call to @code{printf}.
 
-# main program
-BEGIN @{
address@hidden
-    if (ARGC < 3) @{
-        print "usage: translate from to" > "/dev/stderr"
-        exit
-    @}
address@hidden group
-    FROM = ARGV[1]
-    TO = ARGV[2]
-    ARGC = 2
-    ARGV[1] = "-"
address@hidden
+To solve this problem, @code{printf} format specifiers may have
+an additional optional element, which we call a @dfn{positional specifier}.
+For example:
 
address@hidden
-    translate(FROM, TO)
-    print
address@hidden
address@hidden endfile
address@hidden
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n"
 @end example
 
-While it is possible to do character transliteration in a user-level
-function, it is not necessarily efficient, and we (the @command{gawk}
-authors) started to consider adding a built-in function.  However,
-shortly after writing this program, we learned that the System V Release 4
address@hidden had added the @code{toupper()} and @code{tolower()} functions
-(@pxref{String Functions}).
-These functions handle the vast majority of the
-cases where character transliteration is necessary, and so we chose to
-simply add those functions to @command{gawk} as well and then leave well
-enough alone.
+Here, the positional specifier consists of an integer count, which indicates which
+argument to use, and a @samp{$}. Counts are one-based, and the
+format string itself is @emph{not} included.  Thus, in the following
+example, @samp{string} is the first argument and @samp{length(string)} is the second:
 
-An obvious improvement to this program would be to set up the
address@hidden array only once, in a @code{BEGIN} rule. However, this
-assumes that the ``from'' and ``to'' lists
-will never change throughout the lifetime of the program.
address@hidden ENDOFRANGE chtra
address@hidden
+$ @kbd{gawk 'BEGIN @{}
+>     @kbd{string = "Dont Panic"}
+>     @kbd{printf _"%2$d characters live in \"%1$s\"\n",}
+>                         @kbd{string, length(string)}
+> @address@hidden'}
address@hidden 10 characters live in "Dont Panic"
address@hidden example
 
address@hidden Labels Program
address@hidden Printing Mailing Labels
+If present, positional specifiers come first in the format specification,
+before the flags, the field width, and/or the precision.
 
address@hidden STARTOFRANGE prml
address@hidden printing, mailing labels
address@hidden STARTOFRANGE mlprint
address@hidden mailing address@hidden printing
-Here is a ``real world''@footnote{``Real world'' is defined as
-``a program actually used to get something done.''}
-program.  This
-script reads lists of names and
-addresses and generates mailing labels.  Each page of labels has 20 labels
-on it, two across and 10 down.  The addresses are guaranteed to be no more
-than five lines of data.  Each address is separated from the next by a blank
-line.
+Positional specifiers can be used with the dynamic field width and
+precision capability:
 
-The basic idea is to read 20 labels worth of data.  Each line of each label
-is stored in the @code{line} array.  The single rule takes care of filling
-the @code{line} array and printing the page when 20 labels have been read.
address@hidden
+$ @kbd{gawk 'BEGIN @{}
+>    @kbd{printf("%*.*s\n", 10, 20, "hello")}
+>    @kbd{printf("%3$*2$.*1$s\n", 20, 10, "hello")}
+> @address@hidden'}
address@hidden      hello
address@hidden      hello
address@hidden example
 
-The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that
address@hidden splits records at blank lines
-(@pxref{Records}).
-It sets @code{MAXLINES} to 100, since 100 is the maximum number
-of lines on the page (20 * 5 = 100).
address@hidden NOTE
+When using @samp{*} with a positional specifier, the @samp{*}
+comes first, then the integer position, and then the @samp{$}.
+This is somewhat counterintuitive.
address@hidden quotation
 
-Most of the work is done in the @code{printpage()} function.
-The label lines are stored sequentially in the @code{line} array.  But they
-have to print horizontally; @code{line[1]} next to @code{line[6]},
address@hidden next to @code{line[7]}, and so on.  Two loops are used to
-accomplish this.  The outer loop, controlled by @code{i}, steps through
-every 10 lines of data; this is each row of labels.  The inner loop,
-controlled by @code{j}, goes through the lines within the row.
-As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}-th line in
-the row, and @samp{i+j+5} is the entry next to it.  The output ends up
-looking something like this:
+@cindex @code{printf} statement, positional specifiers, mixing with regular formats
+@cindex positional specifiers, @code{printf} statement, mixing with regular formats
+@cindex format specifiers, mixing regular with positional specifiers
+@command{gawk} does not allow you to mix regular format specifiers
+and those with positional specifiers in the same string:
 
 @example
-line 1          line 6
-line 2          line 7
-line 3          line 8
-line 4          line 9
-line 5          line 10
address@hidden
+$ @kbd{gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}'}
+@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none
 @end example
 
address@hidden
-The @code{printf} format string @samp{%-41s} left-aligns
-the data and prints it within a fixed-width field.
-
-As a final note, an extra blank line is printed at lines 21 and 61, to keep
-the output lined up on the labels.  This is dependent on the particular
-brand of labels in use when the program was written.  You will also note
-that there are two blank lines at the top and two blank lines at the bottom.
+@quotation NOTE
+There are some pathological cases that @command{gawk} may fail to
+diagnose.  In such cases, the output may not be what you expect.
+It's still a bad idea to try mixing them, even if @command{gawk}
+doesn't detect it.
+@end quotation
 
-The @code{END} rule arranges to flush the final page of labels; there may
-not have been an even multiple of 20 labels in the data:
+Although positional specifiers can be used directly in @command{awk} programs,
+their primary purpose is to help in producing correct translations of
+format strings into languages different from the one in which the program
+is first written.
 
address@hidden @code{labels.awk} program
address@hidden
address@hidden file eg/prog/labels.awk
-# labels.awk --- print mailing labels
address@hidden endfile
address@hidden
address@hidden file eg/prog/labels.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# June 1992
-# December 2010, minor edits
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/labels.awk
+@node I18N Portability
+@subsection @command{awk} Portability Issues
 
-# Each label is 5 lines of data that may have blank lines.
-# The label sheets have 2 blank lines at the top and 2 at
-# the bottom.
+@cindex portability, internationalization and
+@cindex internationalization, localization, portability and
+@command{gawk}'s internationalization features were purposely chosen to
+have as little impact as possible on the portability of @command{awk}
+programs that use them to other versions of @command{awk}.
+Consider this program:
 
-BEGIN    @{ RS = "" ; MAXLINES = 100 @}
+@example
+BEGIN @{
+    TEXTDOMAIN = "guide"
+    if (Test_Guide)   # set with -v
+        bindtextdomain("/test/guide/messages")
+    print _"don't panic!"
+@}
+@end example
 
-function printpage(    i, j)
address@hidden
-    if (Nlines <= 0)
-        return
address@hidden
+As written, it won't work on other versions of @command{awk}.
+However, it is actually almost portable, requiring very little
+change:
 
-    printf "\n\n"        # header
+@itemize @bullet
+@cindex @code{TEXTDOMAIN} variable, portability and
+@item
+Assignments to @code{TEXTDOMAIN} won't have any effect,
+since @code{TEXTDOMAIN} is not special in other @command{awk} implementations.
 
-    for (i = 1; i <= Nlines; i += 10) @{
-        if (i == 21 || i == 61)
-            print ""
-        for (j = 0; j < 5; j++) @{
-            if (i + j > MAXLINES)
-                break
-            printf "   %-41s %s\n", line[i+j], line[i+j+5]
-        @}
-        print ""
-    @}
+@item
+Non-GNU versions of @command{awk} treat marked strings
+as the concatenation of a variable named @code{_} with the string
+following it.@footnote{This is good fodder for an ``Obfuscated
+@command{awk}'' contest.} Typically, the variable @code{_} has
+the null string (@code{""}) as its value, leaving the original string constant as
+the result.
 
-    printf "\n\n"        # footer
+@item
+By defining ``dummy'' functions to replace @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}, the @command{awk} program can be made to run, but
+all the messages are output in the original language.
+For example:
 
-    delete line
+@cindex @code{bindtextdomain()} function (@command{gawk}), portability and
+@cindex @code{dcgettext()} function (@command{gawk}), portability and
+@cindex @code{dcngettext()} function (@command{gawk}), portability and
+@example
+@c file eg/lib/libintl.awk
+function bindtextdomain(dir, domain)
+@{
+    return dir
 @}
 
-# main rule
+function dcgettext(string, domain, category)
 @{
-    if (Count >= 20) @{
-        printpage()
-        Count = 0
-        Nlines = 0
-    @}
-    n = split($0, a, "\n")
-    for (i = 1; i <= n; i++)
-        line[++Nlines] = a[i]
-    for (; i <= 5; i++)
-        line[++Nlines] = ""
-    Count++
+    return string
 @}
 
-END    \
+function dcngettext(string1, string2, number, domain, category)
 @{
-    printpage()
+    return (number == 1 ? string1 : string2)
 @}
 @c endfile
 @end example
address@hidden ENDOFRANGE prml
address@hidden ENDOFRANGE mlprint
-
address@hidden Word Sorting
address@hidden Generating Word-Usage Counts
 
address@hidden STARTOFRANGE worus
address@hidden words, usage address@hidden generating
+@item
+The use of positional specifications in @code{printf} or
+@code{sprintf()} is @emph{not} portable.
+To support @code{gettext()} at the C level, many systems' C versions of
+@code{sprintf()} do support positional specifiers.  But it works only if
+enough arguments are supplied in the function call.  Many versions of
+@command{awk} pass @code{printf} formats and arguments unchanged to the
+underlying C library version of @code{sprintf()}, but only one format and
+argument at a time.  What happens if a positional specification is
+used is anybody's guess.
+However, since the positional specifications are primarily for use in
+@emph{translated} format strings, and since non-GNU @command{awk}s never
+retrieve the translated string, this should not be a problem in practice.
+@end itemize
+@c ENDOFRANGE inap
 
-When working with large amounts of text, it can be interesting to know
-how often different words appear.  For example, an author may overuse
-certain words, in which case she might wish to find synonyms to substitute
-for words that appear too often. This @value{SUBSECTION} develops a
-program for counting words and presenting the frequency information
-in a useful format.
+@node I18N Example
+@section A Simple Internationalization Example
 
-At first glance, a program like this would seem to do the job:
+Now let's look at a step-by-step example of how to internationalize and
+localize a simple @command{awk} program, using @file{guide.awk} as our
+original source:
 
 @example
-# Print list of word frequencies
-
address@hidden
-    for (i = 1; i <= NF; i++)
-        freq[$i]++
address@hidden
-
-END @{
-    for (word in freq)
-        printf "%s\t%d\n", word, freq[word]
+@c file eg/prog/guide.awk
+BEGIN @{
+    TEXTDOMAIN = "guide"
+    bindtextdomain(".")  # for testing
+    print _"Don't Panic"
+    print _"The Answer Is", 42
+    print "Pardon me, Zaphod who?"
 @}
+@c endfile
 @end example
 
-The program relies on @command{awk}'s default field splitting
-mechanism to break each line up into ``words,'' and uses an
-associative array named @code{freq}, indexed by each word, to count
-the number of times the word occurs. In the @code{END} rule,
-it prints the counts.
-
-This program has several problems that prevent it from being
-useful on real text files:
-
address@hidden @bullet
address@hidden
-The @command{awk} language considers upper- and lowercase characters to be
-distinct.  Therefore, ``bartender'' and ``Bartender'' are not treated
-as the same word.  This is undesirable, since in normal text, words
-are capitalized if they begin sentences, and a frequency analyzer should not
-be sensitive to capitalization.
-
address@hidden
-Words are detected using the @command{awk} convention that fields are
-separated just by whitespace.  Other characters in the input (except
-newlines) don't have any special meaning to @command{awk}.  This means that
-punctuation characters count as part of words.
address@hidden
+Run @samp{gawk --gen-pot} to create the @file{.pot} file:
 
address@hidden
-The output does not come out in any useful order.  You're more likely to be
-interested in which words occur most frequently or in having an alphabetized
-table of how frequently each word occurs.
address@hidden itemize
+@example
+$ @kbd{gawk --gen-pot -f guide.awk > guide.pot}
+@end example
 
address@hidden @command{sort} utility
-The first problem can be solved by using @code{tolower()} to remove case
-distinctions.  The second problem can be solved by using @code{gsub()}
-to remove punctuation characters.  Finally, we solve the third problem
-by using the system @command{sort} utility to process the output of the
address@hidden script.  Here is the new version of the program:
address@hidden
+This produces:
 
address@hidden @code{wordfreq.awk} program
 @example
address@hidden file eg/prog/wordfreq.awk
-# wordfreq.awk --- print list of word frequencies
+@c file eg/data/guide.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr ""
 
address@hidden
-    $0 = tolower($0)    # remove case distinctions
-    # remove punctuation
-    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
-    for (i = 1; i <= NF; i++)
-        freq[$i]++
address@hidden
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr ""
 
 @c endfile
-END @{
-    for (word in freq)
-        printf "%s\t%d\n", word, freq[word]
address@hidden
 @end example
 
-Assuming we have saved this program in a file named @file{wordfreq.awk},
-and that the data is in @file{file1}, the following pipeline:
+This original portable object template file is saved and reused for each language
+into which the application is translated.  The @code{msgid}
+is the original string and the @code{msgstr} is the translation.
+
+@quotation NOTE
+Strings not marked with a leading underscore do not
+appear in the @file{guide.pot} file.
+@end quotation
+
+Next, the messages must be translated.
+Here is a translation to a hypothetical dialect of English,
+called ``Mellow'':@footnote{Perhaps it would be better if it were
+called ``Hippy.'' Ah, well.}
 
 @example
-awk -f wordfreq.awk file1 | sort -k 2nr
+@group
+$ cp guide.pot guide-mellow.po
+@var{Add translations to} guide-mellow.po @dots{}
+@end group
 @end example
 
 @noindent
-produces a table of the words appearing in @file{file1} in order of
-decreasing frequency.
-
-The @command{awk} program suitably massages the
-data and produces a word frequency table, which is not ordered.
-The @command{awk} script's output is then sorted by the @command{sort}
-utility and printed on the screen.
+Following are the translations:
 
-The options given to @command{sort}
-specify a sort that uses the second field of each input line (skipping
-one field), that the sort keys should be treated as numeric quantities
-(otherwise @samp{15} would come before @samp{5}), and that the sorting
-should be done in descending (reverse) order.
+@example
+@c file eg/data/guide-mellow.po
+#: guide.awk:4
+msgid "Don't Panic"
+msgstr "Hey man, relax!"
 
-The @command{sort} could even be done from within the program, by changing
-the @code{END} action to:
+#: guide.awk:5
+msgid "The Answer Is"
+msgstr "Like, the scoop is"
 
address@hidden
address@hidden file eg/prog/wordfreq.awk
-END @{
-    sort = "sort -k 2nr"
-    for (word in freq)
-        printf "%s\t%d\n", word, freq[word] | sort
-    close(sort)
address@hidden
 @c endfile
 @end example
 
-This way of sorting must be used on systems that do not
-have true pipes at the command-line (or batch-file) level.
-See the general operating system documentation for more information on how
-to use the @command{sort} program.
address@hidden ENDOFRANGE worus
-
address@hidden History Sorting
address@hidden Removing Duplicates from Unsorted Text
-
address@hidden STARTOFRANGE lidu
address@hidden lines, address@hidden removing
-The @command{uniq} program
-(@pxref{Uniq Program}),
-removes duplicate lines from @emph{sorted} data.
+@cindex Linux
+@cindex GNU/Linux
+The next step is to make the directory to hold the binary message object
+file and then to create the @file{guide.mo} file.
+The directory layout shown here is standard for GNU @code{gettext} on
+GNU/Linux systems.  Other versions of @code{gettext} may use a different
+layout:
 
-Suppose, however, you need to remove duplicate lines from a @value{DF} but
-that you want to preserve the order the lines are in.  A good example of
-this might be a shell history file.  The history file keeps a copy of all
-the commands you have entered, and it is not unusual to repeat a command
-several times in a row.  Occasionally you might want to compact the history
-by removing duplicate entries.  Yet it is desirable to maintain the order
-of the original commands.
+@example
+$ @kbd{mkdir en_US en_US/LC_MESSAGES}
+@end example
 
-This simple program does the job.  It uses two arrays.  The @code{data}
-array is indexed by the text of each line.
-For each line, @code{data[$0]} is incremented.
-If a particular line has not
-been seen before, then @code{data[$0]} is zero.
-In this case, the text of the line is stored in @code{lines[count]}.
-Each element of @code{lines} is a unique command, and the indices of
-@code{lines} indicate the order in which those lines are encountered.
-The @code{END} rule simply prints out the lines, in order:
+@cindex @code{.po} files, converting to @code{.mo}
+@cindex files, @code{.po}, converting to @code{.mo}
+@cindex @code{.mo} files, converting from @code{.po}
+@cindex files, @code{.mo}, converting from @code{.po}
+@cindex portable object files, converting to message object files
+@cindex files, portable object, converting to message object files
+@cindex message object files, converting from portable object files
+@cindex files, message object, converting from portable object files
+@cindex @command{msgfmt} utility
+The @command{msgfmt} utility does the conversion from human-readable
+@file{.po} file to machine-readable @file{.mo} file.
+By default, @command{msgfmt} creates a file named @file{messages}.
+This file must be renamed and placed in the proper directory so that
address@hidden can find it:
 
address@hidden Rakitzis, Byron
address@hidden @code{histsort.awk} program
 @example
address@hidden file eg/prog/histsort.awk
-# histsort.awk --- compact a shell history file
-# Thanks to Byron Rakitzis for the general idea
address@hidden endfile
address@hidden
address@hidden file eg/prog/histsort.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/histsort.awk
+$ @kbd{msgfmt guide-mellow.po}
+$ @kbd{mv messages en_US/LC_MESSAGES/guide.mo}
+@end example
 
address@hidden
address@hidden
-    if (data[$0]++ == 0)
-        lines[++count] = $0
address@hidden
address@hidden group
+Finally, we run the program to test it:
 
address@hidden
-END @{
-    for (i = 1; i <= count; i++)
-        print lines[i]
address@hidden
address@hidden group
address@hidden endfile
+@example
+$ @kbd{gawk -f guide.awk}
+@print{} Hey man, relax!
+@print{} Like, the scoop is 42
+@print{} Pardon me, Zaphod who?
 @end example
 
-This program also provides a foundation for generating other useful
-information.  For example, using the following @code{print} statement in the
-@code{END} rule indicates how often a particular command is used:
+If the three replacement functions for @code{dcgettext()}, @code{dcngettext()}
+and @code{bindtextdomain()}
+(@pxref{I18N Portability})
+are in a file named @file{libintl.awk},
+then we can run @file{guide.awk} unchanged as follows:
 
 @example
-print data[lines[i]], lines[i]
+$ @kbd{gawk --posix -f guide.awk -f libintl.awk}
+@print{} Don't Panic
+@print{} The Answer Is 42
+@print{} Pardon me, Zaphod who?
 @end example
 
-This works because @code{data[$0]} is incremented each time a line is
-seen.
address@hidden ENDOFRANGE lidu
-
address@hidden Extract Program
address@hidden Extracting Programs from Texinfo Source Files
+@node Gawk I18N
+@section @command{gawk} Can Speak Your Language
 
address@hidden STARTOFRANGE texse
address@hidden Texinfo, extracting programs from source files
address@hidden STARTOFRANGE fitex
address@hidden files, address@hidden extracting programs from
address@hidden
-Both this chapter and the previous chapter
-(@ref{Library Functions})
-present a large number of @command{awk} programs.
address@hidden ifnotinfo
+@command{gawk} itself has been internationalized
+using the GNU @code{gettext} package.
+(GNU @code{gettext} is described in
+complete detail in
 @ifinfo
-The nodes
address@hidden Functions},
-and @ref{Sample Programs},
-are the top level nodes for a large number of @command{awk} programs.
address@hidden, , GNU @code{gettext} utilities, gettext, GNU gettext tools}.)
 @end ifinfo
-If you want to experiment with these programs, it is tedious to have to type
-them in by hand.  Here we present a program that can extract parts of a
-Texinfo input file into separate files.
-
address@hidden Texinfo
-This @value{DOCUMENT} is written in @uref{http://texinfo.org, Texinfo},
-the GNU project's document formatting language.
-A single Texinfo source file can be used to produce both
-printed and online documentation.
 @ifnotinfo
-Texinfo is fully documented in the book
address@hidden GNU Documentation Format},
-available from the Free Software Foundation.
address@hidden gettext tools}.)
 @end ifnotinfo
address@hidden
-The Texinfo language is described fully, starting with
address@hidden, , Texinfo, texinfo,Texinfo---The GNU Documentation Format}.
address@hidden ifinfo
+As of this writing, the latest version of GNU @code{gettext} is
address@hidden://ftp.gnu.org/gnu/gettext/gettext-0.18.1.tar.gz, @value{PVERSION} 0.18.1}.
 
-For our purposes, it is enough to know three things about Texinfo input
-files:
+If a translation of @command{gawk}'s messages exists,
+then @command{gawk} produces usage messages, warnings,
+and fatal errors in the local language.
+@c ENDOFRANGE inloc
 
address@hidden @bullet
address@hidden
-The ``at'' symbol (@samp{@@}) is special in Texinfo, much as
-the backslash (@samp{\}) is in C
-or @command{awk}.  Literal @samp{@@} symbols are represented in Texinfo source
-files as @samp{@@@@}.
+@node Advanced Features
+@chapter Advanced Features of @command{gawk}
+@cindex advanced features, network connections, See Also networks, connections
+@c STARTOFRANGE gawadv
+@cindex @command{gawk}, features, advanced
+@c STARTOFRANGE advgaw
+@cindex advanced features, @command{gawk}
+@ignore
+Contributed by: Peter Langston <address@hidden>
 
address@hidden
-Comments start with either @samp{@@c} or @samp{@@comment}.
-The file-extraction program works by using special comments that start
-at the beginning of a line.
+    Found in Steve English's "signature" line:
 
address@hidden
-Lines containing @samp{@@group} and @samp{@@end group} commands bracket
-example text that should not be split across a page boundary.
-(Unfortunately, @TeX{} isn't always smart enough to do things exactly right,
-so we have to give it some help.)
address@hidden itemize
+"Write documentation as if whoever reads it is a violent psychopath
+who knows where you live."
+@end ignore
+@quotation
address@hidden documentation as if whoever reads it is
+a violent psychopath who knows where you address@hidden
+Steve English, as quoted by Peter Langston
+@end quotation
 
-The following program, @file{extract.awk}, reads through a Texinfo source
-file and does two things, based on the special comments.
-Upon seeing @address@hidden@@c system @dots{}}},
-it runs a command, by extracting the command text from the
-control line and passing it on to the @code{system()} function
-(@pxref{I/O Functions}).
-Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to
-the file @var{filename}, until @samp{@@c endfile} is encountered.
-The rules in @file{extract.awk} match either @samp{@@c} or
address@hidden@@comment} by letting the @samp{omment} part be optional.
-Lines containing @samp{@@group} and @samp{@@end group} are simply removed.
address@hidden uses the @code{join()} library function
-(@pxref{Join Function}).
+This @value{CHAPTER} discusses advanced features in @command{gawk}.
+It's a bit of a ``grab bag'' of items that are otherwise unrelated
+to each other.
+First, a command-line option allows @command{gawk} to recognize
+nondecimal numbers in input data, not just in @command{awk}
+programs.
+Then, @command{gawk}'s special features for sorting arrays are presented.
+Next, two-way I/O, discussed briefly in earlier parts of this
address@hidden, is described in full detail, along with the basics
+of TCP/IP networking.  Finally, @command{gawk}
+can @dfn{profile} an @command{awk} program, making it possible to tune
+it for performance.
 
-The example programs in the online Texinfo source for @address@hidden
-(@file{gawk.texi}) have all been bracketed inside @samp{file} and
address@hidden lines.  The @command{gawk} distribution uses a copy of
address@hidden to extract the sample programs and install many
-of them in a standard directory where @command{gawk} can find them.
-The Texinfo file looks something like this:
+@ref{Dynamic Extensions},
+discusses the ability to dynamically add new built-in functions to
+@command{gawk}.  As this feature is still immature and likely to change,
+its description is relegated to an appendix.
 
address@hidden
address@hidden
-This program has a @@address@hidden@} rule,
-that prints a nice message:
+@menu
+* Nondecimal Data::             Allowing nondecimal input data.
+* Array Sorting::               Facilities for controlling array traversal and
+                                sorting arrays.
+* Two-way I/O::                 Two-way communications with another process.
+* TCP/IP Networking::           Using @command{gawk} for network programming.
+* Profiling::                   Profiling your @command{awk} programs.
+@end menu
 
-@@example
-@@c file examples/messages.awk
-BEGIN @@@{ print "Don't panic!" @@@}
-@@c end file
-@@end example
+@node Nondecimal Data
+@section Allowing Nondecimal Input Data
+@cindex @code{--non-decimal-data} option
+@cindex advanced features, @command{gawk}, nondecimal input data
+@cindex input, address@hidden nondecimal
+@cindex constants, nondecimal
 
-It also prints some final advice:
+If you run @command{gawk} with the @option{--non-decimal-data} option,
+you can have nondecimal constants in your input data:
 
-@@example
-@@c file examples/messages.awk
-END @@@{ print "Always avoid bored archeologists!" @@@}
-@@c end file
-@@end example
address@hidden
address@hidden line break here for small book format
address@hidden
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n",}
+>                                         @kbd{$1, $2, $3 @}'}
+@print{} 83, 123, 291
 @end example
 
address@hidden begins by setting @code{IGNORECASE} to one, so that
-mixed upper- and lowercase letters in the directives won't matter.
-
-The first rule handles calling @code{system()}, checking that a command is
-given (@code{NF} is at least three) and also checking that the command
-exits with a zero exit status, signifying OK:
+For this feature to work, write your program so that
+@command{gawk} treats your data as numeric:
 
address@hidden @code{extract.awk} program
 @example
address@hidden file eg/prog/extract.awk
-# extract.awk --- extract files and run programs
-#                 from texinfo files
address@hidden endfile
address@hidden
address@hidden file eg/prog/extract.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised September 2000
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/extract.awk
+$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
+@print{} 0123 123 0x123
+@end example
 
-BEGIN    @{ IGNORECASE = 1 @}
address@hidden
+The @code{print} statement treats its expressions as strings.
+Although the fields can act as numbers when necessary,
+they are still strings, so @code{print} does not try to treat them
+numerically.  You may need to add zero to a field to force it to
+be treated as a number.  For example:
 
-/^@@c(omment)?[ \t]+system/    \
address@hidden
-    if (NF < 3) @{
-        e = (FILENAME ":" FNR)
-        e = (e  ": badly formed `system' line")
-        print e > "/dev/stderr"
-        next
-    @}
-    $1 = ""
-    $2 = ""
-    stat = system($0)
-    if (stat != 0) @{
-        e = (FILENAME ":" FNR)
-        e = (e ": warning: system returned " stat)
-        print e > "/dev/stderr"
-    @}
address@hidden
address@hidden endfile
+@example
+$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
+> @kbd{@{ print $1, $2, $3}
+>   @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
+@print{} 0123 123 0x123
+@print{} 83 123 291
 @end example
 
address@hidden
-The variable @code{e} is used so that the rule
-fits nicely on the
address@hidden
-page.
address@hidden ifnotinfo
address@hidden
-screen.
address@hidden ifnottex
+Because it is common to have decimal data with leading zeros, and because
+using this facility could lead to surprising results, the default is to leave it
+disabled.  If you want it, you must explicitly request it.
 
-The second rule handles moving data into files.  It verifies that a
address@hidden is given in the directive.  If the file named is not the
-current file, then the current file is closed.  Keeping the current file
-open until a new file is encountered allows the use of the @samp{>}
-redirection for printing the contents, keeping open file management
-simple.
+@cindex programming conventions, @code{--non-decimal-data} option
+@cindex @code{--non-decimal-data} option, @code{strtonum()} function and
+@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
+@quotation CAUTION
+@emph{Use of this option is not recommended.}
+It can break old programs very badly.
+Instead, use the @code{strtonum()} function to convert your data
+(@pxref{Nondecimal-numbers}).
+This makes your programs easier to write and easier to read, and
+leads to less surprising results.
+@end quotation
 
-The @code{for} loop does the work.  It reads lines using @code{getline}
-(@pxref{Getline}).
-For an unexpected end of file, it calls the @address@hidden()}}
-function.  If the line is an ``endfile'' line, then it breaks out of
-the loop.
-If the line is an @samp{@@group} or @samp{@@end group} line, then it
-ignores it and goes on to the next line.
-Similarly, comments within examples are also ignored.
+@node Array Sorting
+@section Controlling Array Traversal and Array Sorting
 
-Most of the work is in the following few lines.  If the line has no @samp{@@}
-symbols, the program can print it directly.
-Otherwise, each leading @samp{@@} must be stripped off.
-To remove the @samp{@@} symbols, the line is split into separate elements of
-the array @code{a}, using the @code{split()} function
-(@pxref{String Functions}).
-The @samp{@@} symbol is used as the separator character.
-Each element of @code{a} that is empty indicates two successive @samp{@@}
-symbols in the original line.  For each two empty elements (@samp{@@@@} in
-the original file), we have to add a single @samp{@@} symbol back
address@hidden program was written before @command{gawk} had the
address@hidden()} function. Consider how you might use it to simplify the code.}
+@command{gawk} lets you control the order in which a @samp{for (i in array)}
+loop traverses an array.
 
-When the processing of the array is finished, @code{join()} is called with the
-value of @code{SUBSEP}, to rejoin the pieces back into a single
-line.  That line is then printed to the output file:
+In addition, two built-in functions, @code{asort()} and @code{asorti()},
+let you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
 
address@hidden
address@hidden file eg/prog/extract.awk
-/^@@c(omment)?[ \t]+file/    \
address@hidden
-    if (NF != 3) @{
-        e = (FILENAME ":" FNR ": badly formed `file' line")
-        print e > "/dev/stderr"
-        next
-    @}
-    if ($3 != curfile) @{
-        if (curfile != "")
-            close(curfile)
-        curfile = $3
-    @}
+@menu
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions::     How to use @code{asort()} and @code{asorti()}.
+@end menu
 
-    for (;;) @{
-        if ((getline line) <= 0)
-            unexpected_eof()
-        if (line ~ /^@@c(omment)?[ \t]+endfile/)
-            break
-        else if (line ~ /^@@(end[ \t]+)?group/)
-            continue
-        else if (line ~ /^@@c(omment+)?[ \t]+/)
-            continue
-        if (index(line, "@@") == 0) @{
-            print line > curfile
-            continue
-        @}
-        n = split(line, a, "@@")
-        # if a[1] == "", means leading @@,
-        # don't add one back in.
-        for (i = 2; i <= n; i++) @{
-            if (a[i] == "") @{ # was an @@@@
-                a[i] = "@@"
-                if (a[i+1] == "")
-                    i++
-            @}
-        @}
-        print join(a, 1, n, SUBSEP) > curfile
-    @}
address@hidden
address@hidden endfile
address@hidden example
+@node Controlling Array Traversal
+@subsection Controlling Array Traversal
 
-An important thing to note is the use of the @samp{>} redirection.
-Output done with @samp{>} only opens the file once; it stays open and
-subsequent output is appended to the file
-(@pxref{Redirection}).
-This makes it easy to mix program text and explanatory prose for the same
-sample source file (as has been done here!) without any hassle.  The file is
-only closed when a new data @value{FN} is encountered or at the end of the
-input file.
+By default, the order in which a @samp{for (i in array)} loop
+scans an array is not defined; it is generally based upon
+the internal implementation of arrays inside @command{awk}.
 
-Finally, the function @address@hidden()}} prints an appropriate
-error message and then exits.
-The @code{END} rule handles the final cleanup, closing the open file:
+Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose.  @command{gawk}
+lets you do this.
 
address@hidden function lb put on same line for page breaking. sigh
address@hidden
address@hidden file eg/prog/extract.awk
address@hidden
-function unexpected_eof()
address@hidden
-    printf("%s:%d: unexpected EOF or error\n",
-        FILENAME, FNR) > "/dev/stderr"
-    exit 1
address@hidden
address@hidden group
+@ref{Controlling Scanning}, describes how you can assign special,
+pre-defined values to @code{PROCINFO["sorted_in"]} in order to
+control the order in which @command{gawk} will traverse an array
+during a @code{for} loop.
+
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a function name.
+This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function.  The comparison function should be defined with at least
+four arguments:
 
-END @{
-    if (curfile)
-        close(curfile)
address@hidden
+function comp_func(i1, v1, i2, v2)
address@hidden
+    @var{compare elements 1 and 2 in some fashion}
+    @var{return < 0; 0; or > 0}
 @}
address@hidden endfile
 @end example
address@hidden ENDOFRANGE texse
address@hidden ENDOFRANGE fitex
 
address@hidden Simple Sed
address@hidden A Simple Stream Editor
+Here, @var{i1} and @var{i2} are the indices, and @var{v1} and @var{v2}
+are the corresponding values of the two elements being compared.
+Either @var{v1} or @var{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values.
+(@xref{Arrays of Arrays}, for more information about subarrays.)
+The three possible return values are interpreted as follows:
 
address@hidden @command{sed} utility
address@hidden stream editors
-The @command{sed} utility is a stream editor, a program that reads a
-stream of data, makes changes to it, and passes it on.
-It is often used to make global changes to a large file or to a stream
-of data generated by a pipeline of commands.
-While @command{sed} is a complicated program in its own right, its most common
-use is to perform global substitutions in the middle of a pipeline:
address@hidden @code
address@hidden comp_func(i1, v1, i2, v2) < 0
+Index @var{i1} comes before index @var{i2} during loop traversal.
 
address@hidden
-command1 < orig.data | sed 's/old/new/g' | command2 > result
address@hidden example
address@hidden comp_func(i1, v1, i2, v2) == 0
+Indices @var{i1} and @var{i2}
+come together but the relative order with respect to each other is undefined.
 
-Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp
address@hidden on each input line and globally replace it with the text
address@hidden, i.e., all the occurrences on a line.  This is similar to
address@hidden's @code{gsub()} function
-(@pxref{String Functions}).
address@hidden comp_func(i1, v1, i2, v2) > 0
+Index @var{i1} comes after index @var{i2} during loop traversal.
address@hidden table
 
-The following program, @file{awksed.awk}, accepts at least two command-line
-arguments: the pattern to look for and the text to replace it with. Any
-additional arguments are treated as data @value{FN}s to process. If none
-are provided, the standard input is used:
+Our first comparison function can be used to scan an array in
+numerical order of the indices:
 
address@hidden Brennan, Michael
address@hidden @command{awksed.awk} program
address@hidden @cindex simple stream editor
address@hidden @cindex stream editor, simple
 @example
address@hidden file eg/prog/awksed.awk
-# awksed.awk --- do s/foo/bar/g using just print
-#    Thanks to Michael Brennan for the idea
address@hidden endfile
address@hidden
address@hidden file eg/prog/awksed.awk
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# August 1995
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/awksed.awk
-
-function usage()
+function cmp_num_idx(i1, v1, i2, v2)
 @{
-    print "usage: awksed pat repl [files...]" > "/dev/stderr"
-    exit 1
+     # numerical index comparison, ascending order
+     return (i1 - i2)
 @}
address@hidden example
 
-BEGIN @{
-    # validate arguments
-    if (ARGC < 3)
-        usage()
-
-    RS = ARGV[1]
-    ORS = ARGV[2]
-
-    # don't use arguments as files
-    ARGV[1] = ARGV[2] = ""
address@hidden
+Our second function traverses an array based on the string order of
+the element values rather than by indices:
 
address@hidden
-# look ma, no hands!
address@hidden
+function cmp_str_val(i1, v1, i2, v2)
 @{
-    if (RT == "")
-        printf "%s", $0
-    else
-        print
+    # string value comparison, ascending order
+    v1 = v1 ""
+    v2 = v2 ""
+    if (v1 < v2)
+        return -1
+    return (v1 != v2)
 @}
address@hidden group
address@hidden endfile
 @end example
 
-The program relies on @command{gawk}'s ability to have @code{RS} be a regexp,
-as well as on the setting of @code{RT} to the actual text that terminates the
-record (@pxref{Records}).
-
-The idea is to have @code{RS} be the pattern to look for. @command{gawk}
-automatically sets @code{$0} to the text between matches of the pattern.
-This is text that we want to keep, unmodified.  Then, by setting @code{ORS}
-to the replacement text, a simple @code{print} statement outputs the
-text we want to keep, followed by the replacement text.
+The third
+comparison function makes all numbers, and numeric strings without
+any leading or trailing spaces, come out first during loop traversal:  
 
-There is one wrinkle to this scheme, which is what to do if the last record
-doesn't end with text that matches @code{RS}.  Using a @code{print}
-statement unconditionally prints the replacement text, which is not correct.
-However, if the file did not end in text that matches @code{RS}, @code{RT}
-is set to the null string.  In this case, we can print @code{$0} using
address@hidden
-(@pxref{Printf}).
address@hidden
+function cmp_num_str_val(i1, v1, i2, v2,   n1, n2)
address@hidden
+     # numbers before string value comparison, ascending order
+     n1 = v1 + 0
+     n2 = v2 + 0
+     if (n1 == v1) 
+         return (n2 == v2) ? (n1 - n2) : -1
+     else if (n2 == v2)
+         return 1 
+     return (v1 < v2) ? -1 : (v1 != v2)
address@hidden
address@hidden example
 
-The @code{BEGIN} rule handles the setup, checking for the right number
-of arguments and calling @code{usage()} if there is a problem. Then it sets
address@hidden and @code{ORS} from the command-line arguments and sets
address@hidden and @code{ARGV[2]} to the null string, so that they are
-not treated as @value{FN}s
-(@pxref{ARGC and ARGV}).
+Here is a main program to demonstrate how @command{gawk}
+behaves using each of the previous functions:
 
-The @code{usage()} function prints an error message and exits.
-Finally, the single rule handles the printing scheme outlined above,
-using @code{print} or @code{printf} as appropriate, depending upon the
-value of @code{RT}.
address@hidden
+BEGIN @{
+    data["one"] = 10
+    data["two"] = 20
+    data[10] = "one"
+    data[100] = 100
+    data[20] = "two"
+    
+    f[1] = "cmp_num_idx"
+    f[2] = "cmp_str_val"
+    f[3] = "cmp_num_str_val"
+    for (i = 1; i <= 3; i++) @{
+        printf("Sort function: %s\n", f[i])
+        PROCINFO["sorted_in"] = f[i]
+        for (j in data)
+            printf("\tdata[%s] = %s\n", j, data[j])
+        print ""
+    @}
address@hidden
address@hidden example
 
address@hidden
-Exercise, compare the performance of this version with the more
-straightforward:
+Here are the results when the program is run:
address@hidden
 
-BEGIN {
-    pat = ARGV[1]
-    repl = ARGV[2]
-    ARGV[1] = ARGV[2] = ""
-}
address@hidden
+$ @kbd{gawk -f compdemo.awk}
address@hidden Sort function: cmp_num_idx      @ii{Sort by numeric index}
address@hidden     data[two] = 20
address@hidden     data[one] = 10              @ii{Both strings are numerically zero}
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden     data[100] = 100
address@hidden 
address@hidden Sort function: cmp_str_val      @ii{Sort by element values as strings}
address@hidden     data[one] = 10
address@hidden     data[100] = 100             @ii{String 100 is less than string 20}
address@hidden     data[two] = 20
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden 
address@hidden Sort function: cmp_num_str_val  @ii{Sort all numeric values before all strings}
address@hidden     data[one] = 10
address@hidden     data[two] = 20
address@hidden     data[100] = 100
address@hidden     data[10] = one
address@hidden     data[20] = two
address@hidden example
 
-{ gsub(pat, repl); print }
+Consider sorting the entries of a GNU/Linux system password file
+according to login name.  The following program sorts records
+by a specific field position and can be used for this purpose:   
 
-Exercise: what are the advantages and disadvantages of this version versus sed?
-  Advantage: egrep regexps
-             speed (?)
-  Disadvantage: no & in replacement text
address@hidden
+# sort.awk --- simple program to sort by field position
+# field position is specified by the global variable POS
 
-Others?
address@hidden ignore
+function cmp_field(i1, v1, i2, v2)
address@hidden
+    # comparison by value, as string, and ascending order
+    return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
address@hidden
 
address@hidden Igawk Program
address@hidden An Easy Way to Use Library Functions
address@hidden
+    for (i = 1; i <= NF; i++)
+        a[NR][i] = $i
address@hidden
 
address@hidden STARTOFRANGE libfex
address@hidden libraries of @command{awk} functions, example program for using
address@hidden STARTOFRANGE flibex
address@hidden functions, library, example program for using
-In @ref{Include Files}, we saw how @command{gawk} provides a built-in
-file-inclusion capability.  However, this is a @command{gawk} extension.
-This @value{SECTION} provides the motivation for making file inclusion
-available for standard @command{awk}, and shows how to do it using a
-combination of shell and @command{awk} programming.
+END @{
+    PROCINFO["sorted_in"] = "cmp_field"
+    if (POS < 1 || POS > NF)
+        POS = 1
+    for (i in a) @{
+        for (j = 1; j <= NF; j++)
+            printf("%s%c", a[i][j], j < NF ? ":" : "")
+        print ""
+    @}
address@hidden
address@hidden example
 
-Using library functions in @command{awk} can be very beneficial. It
-encourages code reuse and the writing of general functions. Programs are
-smaller and therefore clearer.
-However, using library functions is only easy when writing @command{awk}
-programs; it is painful when running them, requiring multiple @option{-f}
-options.  If @command{gawk} is unavailable, then so too is the @env{AWKPATH}
-environment variable and the ability to put @command{awk} functions into a
-library directory (@pxref{Options}).
-It would be nice to be able to write programs in the following manner:
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons.
+Each record defines a subarray,
+with each field as an element in the subarray.
+Running the program produces the
+following output:
 
 @example
-# library functions
-@@include getopt.awk
-@@include join.awk
+$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
address@hidden adm:x:3:4:adm:/var/adm:/sbin/nologin
address@hidden apache:x:48:48:Apache:/var/www:/sbin/nologin
address@hidden avahi:x:70:70:Avahi daemon:/:/sbin/nologin
 @dots{}
address@hidden example
 
-# main program
-BEGIN @{
-    while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
-        @dots{}
-    @dots{}
+The comparison should normally always return the same value when given a
+specific pair of array elements as its arguments.  If inconsistent
+results are returned then the order is undefined.  This behavior can be
+exploited to introduce random order into otherwise seemingly
+ordered data:
+
address@hidden
+function cmp_randomize(i1, v1, i2, v2)
address@hidden
+    # random order
+    return (2 - 4 * rand())
 @}
 @end example
 
-The following program, @file{igawk.sh}, provides this service.
-It simulates @command{gawk}'s searching of the @env{AWKPATH} variable
-and also allows @dfn{nested} includes; i.e., a file that is included
-with @samp{@@include} can contain further @samp{@@include} statements.
address@hidden makes an effort to only include files once, so that nested
-includes don't accidentally include a library function twice.
+As mentioned above, the order of the indices is arbitrary if two
+elements compare equal.  This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values.  The partial ordering of the equal elements
+may change during the next loop traversal, if other elements are added or
+removed from the array.  One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules.  Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary.  The following comparison functions
+force a deterministic order, and are based on the fact that the
+indices of two elements are never equal:
 
address@hidden should behave just like @command{gawk} externally.  This
-means it should accept all of @command{gawk}'s command-line arguments,
-including the ability to have multiple source files specified via
address@hidden, and the ability to mix command-line and library source files.
address@hidden
+function cmp_numeric(i1, v1, i2, v2)
address@hidden
+    # numerical value (and index) comparison, descending order
+    return (v1 != v2) ? (v2 - v1) : (i2 - i1)
address@hidden
 
-The program is written using the POSIX Shell (@command{sh}) command
address@hidden explaining the @command{sh} language is beyond
-the scope of this book. We provide some minimal explanations, but see
-a good shell programming book if you wish to understand things in more
-depth.} It works as follows:
+function cmp_string(i1, v1, i2, v2)
address@hidden
+    # string value (and index) comparison, descending order
+    v1 = v1 i1
+    v2 = v2 i2
+    return (v1 > v2) ? -1 : (v1 != v2)
address@hidden
address@hidden example
 
address@hidden
address@hidden
-Loop through the arguments, saving anything that doesn't represent
address@hidden source code for later, when the expanded program is run.
address@hidden Avoid using the term ``stable'' when describing the unpredictable behavior
address@hidden if two items compare equal.  Usually, the goal of a "stable algorithm"
address@hidden is to maintain the original order of the items, which is a meaningless
address@hidden concept for a list constructed from a hash.
 
address@hidden
-For any arguments that do represent @command{awk} text, put the arguments into
-a shell variable that will be expanded.  There are two cases:
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
 
address@hidden a
address@hidden
-Literal text, provided with @option{--source} or @option{--source=}.  This
-text is just appended directly.
+When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices
+handled as strings, the value of @code{IGNORECASE}
+(@pxref{Built-in Variables}) controls whether
+the comparisons treat corresponding uppercase and lowercase letters as
+equivalent or distinct.
 
address@hidden
-Source @value{FN}s, provided with @option{-f}.  We use a neat trick and append
address@hidden@@include @var{filename}} to the shell variable's contents.  Since the file-inclusion
-program works the way @command{gawk} does, this gets the text
-of the file included into the program at the correct point.
address@hidden enumerate
+Another point to keep in mind is that in the case of subarrays
+the element values can themselves be arrays; a production comparison
+function should use the @code{isarray()} function
+(@pxref{Type Functions}),
+to check for this, and choose a defined sorting order for subarrays.
 
address@hidden
-Run an @command{awk} program (naturally) over the shell variable's contents to expand
address@hidden@@include} statements.  The expanded program is placed in a second
-shell variable.
+All sorting based on @code{PROCINFO["sorted_in"]}
+is disabled in POSIX mode,
+since the @code{PROCINFO} array is not special in that case.
 
address@hidden
-Run the expanded program with @command{gawk} and any other original command-line
-arguments that the user supplied (such as the data @value{FN}s).
address@hidden enumerate
+As a side note, sorting the array indices before traversing
+the array has been reported to add 15% to 20% overhead to the
+execution time of @command{awk} programs. For this reason,
+sorted array traversal is not the default.
 
-This program uses shell variables extensively: for storing command-line arguments,
-the text of the @command{awk} program that will expand the user's program, for the
-user's original program, and for the expanded program.  Doing so removes some
-potential problems that might arise were we to use temporary files instead,
-at the cost of making the script somewhat more complicated.
address@hidden The @command{gawk}
address@hidden maintainers believe that only the people who wish to use a
address@hidden feature should have to pay for it.
 
-The initial part of the program turns on shell tracing if the first
-argument is @samp{debug}.
address@hidden Array Sorting Functions
address@hidden Sorting Array Values and Indices with @command{gawk}
 
-The next part loops through all the command-line arguments.
-There are several cases of interest:
address@hidden arrays, sorting
address@hidden @code{asort()} function (@command{gawk})
address@hidden @code{asort()} function (@command{gawk}), address@hidden sorting
address@hidden sort function, arrays, sorting
+In most @command{awk} implementations, sorting an array requires
+writing a @code{sort()} function.
+While this can be educational for exploring different sorting algorithms,
+usually that's not the point of the program.
address@hidden provides the built-in @code{asort()}
+and @code{asorti()} functions
+(@pxref{String Functions})
+for sorting arrays.  For example:
 
address@hidden @code
address@hidden --
-This ends the arguments to @command{igawk}.  Anything else should be passed on
-to the user's @command{awk} program without being evaluated.
address@hidden
address@hidden the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+    @var{do something with} data[i]
address@hidden example
 
address@hidden -W
-This indicates that the next option is specific to @command{gawk}.  To make
-argument processing easier, the @option{-W} is appended to the front of the
-remaining arguments and the loop continues.  (This is an @command{sh}
-programming trick.  Don't worry about it if you are not familiar with
address@hidden)
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
address@hidden @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The comparison is based on the type of the elements
+(@pxref{Typing and Comparison}).
+All numeric values come before all string values,
+which in turn come before all subarrays.
 
address@hidden address@hidden,} -F
-These are saved and passed on to @command{gawk}.
address@hidden side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
address@hidden array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
 
address@hidden address@hidden,} address@hidden,} address@hidden,} -Wfile=
-The @value{FN} is appended to the shell variable @code{program} with an
address@hidden@@include} statement.
-The @command{expr} utility is used to remove the leading option part of the
-argument (e.g., @samp{--file=}).
-(Typical @command{sh} usage would be to use the @command{echo} and @command{sed}
-utilities to do this work.  Unfortunately, some versions of @command{echo} evaluate
-escape sequences in their arguments, possibly mangling the program text.
-Using @command{expr} avoids this problem.)
address@hidden
address@hidden the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+    @var{do something with} dest[i]
address@hidden example
 
address@hidden address@hidden,} address@hidden,} -Wsource=
-The source text is appended to @code{program}.
+In this case, @command{gawk} copies the @code{source} array into the
address@hidden array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
 
address@hidden address@hidden,} -Wversion
address@hidden prints its version number, runs @samp{gawk --version}
-to get the @command{gawk} version information, and then exits.
address@hidden table
address@hidden()} accepts a third string argument to control comparison of
+array elements.  As with @code{PROCINFO["sorted_in"]}, this argument
+may be one of the predefined names that @command{gawk} provides
+(@pxref{Controlling Scanning}), or the name of a user-defined function
+(@pxref{Controlling Array Traversal}).
 
-If none of the @option{-f}, @option{--file}, @option{-Wfile}, @option{--source},
-or @option{-Wsource} arguments are supplied, then the first nonoption argument
-should be the @command{awk} program.  If there are no command-line
-arguments left, @command{igawk} prints an error message and exits.
-Otherwise, the first argument is appended to @code{program}.
-In any case, after the arguments have been processed,
address@hidden contains the complete text of the original @command{awk}
-program.
address@hidden NOTE
+In all cases, the sorted element values consist of the original
+array's element values.  The ability to control comparison merely
+affects the way in which they are sorted.
address@hidden quotation
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements.
+To do that, use the
address@hidden()} function.  The interface is identical to that of
address@hidden()}, except that the index values are used for sorting, and
+become the values of the result array:
+
address@hidden
address@hidden source[$0] = some_func($0) @}
+
+END @{
+    n = asorti(source, dest)
+    for (i = 1; i <= n; i++) @{
+        @ii{Work with sorted indices directly:}
+        @var{do something with} dest[i]
+        @dots{}
+        @ii{Access original array via sorted indices:}
+        @var{do something with} source[dest[i]]
+    @}
address@hidden
address@hidden example
+
+Similar to @code{asort()},
+in all cases, the sorted element values consist of the original
+array's indices.  The ability to control comparison merely
+affects the way in which they are sorted.
+
+Sorting the array by replacing the indices provides maximal flexibility.
+To traverse the elements in decreasing order, use a loop that goes from
address@hidden down to 1, either over the elements or over the address@hidden
+may also use one of the predefined sorting names that sorts in
+decreasing order.}
+
address@hidden reference counting, sorting arrays
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
 
-The program is as follows:
address@hidden Document It And Call It A Feature. Sigh.
address@hidden @command{gawk}, @code{IGNORECASE} variable in
address@hidden @code{IGNORECASE} variable
address@hidden arrays, sorting, @code{IGNORECASE} variable and
address@hidden @code{IGNORECASE} variable, array sorting and
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values address@hidden
+is true because locale-based comparison occurs only when in POSIX
+compatibility mode, and since @code{asort()} and @code{asorti()} are
address@hidden extensions, they are not available in that case.}
+Caveat Emptor.
 
address@hidden @code{igawk.sh} program
address@hidden
address@hidden file eg/prog/igawk.sh
-#! /bin/sh
-# igawk --- like gawk but do @@include processing
address@hidden endfile
address@hidden
address@hidden file eg/prog/igawk.sh
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# July 1993
-# December 2010, minor edits
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/igawk.sh
address@hidden Two-way I/O
address@hidden Two-Way Communications with Another Process
address@hidden Brennan, Michael
address@hidden programmers, attractiveness of
address@hidden
address@hidden Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
+From: brennan@@whidbey.com (Mike Brennan)
+Newsgroups: comp.lang.awk
+Subject: Re: Learn the SECRET to Attract Women Easily
+Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
+Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
 
-if [ "$1" = debug ]
-then
-    set -x
-    shift
-fi
+On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+<tracy78@@kilgrona.com> wrote:
+>Learn the SECRET to Attract Women Easily
+>
+>The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
 
-# A literal newline, so that program text is formatted correctly
-n='
-'
+The scent of awk programmers is a lot more attractive to women than
+the scent of perl programmers.
+--
+Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
 
-# Initialize variables to empty
-program=
-opts=
address@hidden advanced features, @command{gawk}, address@hidden communicating with
address@hidden processes, two-way communications with
+It is often useful to be able to
+send data to a separate program for
+processing and then read the result.  This can always be
+done with temporary files:
 
-while [ $# -ne 0 ] # loop over arguments
-do
-    case $1 in
-    --)     shift
-            break ;;
address@hidden
+# Write the data for processing
+tempfile = ("mydata." PROCINFO["pid"])
+while (@var{not done with data})
+    print @var{data} | ("subprogram > " tempfile)
+close("subprogram > " tempfile)
 
-    -W)     shift
-            # The address@hidden'message here'@} construct prints a
-            # diagnostic if $x is the null string
-            set -- -W"address@hidden@@?'missing operand'@}"
-            continue ;;
+# Read the results, remove tempfile when done
+while ((getline newdata < tempfile) > 0)
+    @var{process} newdata @var{appropriately}
+close(tempfile)
+system("rm " tempfile)
address@hidden example
 
-    -[vF])  opts="$opts $1 'address@hidden'missing operand'@}'"
-            shift ;;
address@hidden
+This works, but not elegantly.  Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, @file{/tmp} will not do, as another user might happen
+to be using a temporary file with the same name.
 
-    -[vF]*) opts="$opts '$1'" ;;
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
+However, with @command{gawk}, it is possible to
+open a @emph{two-way} pipe to another process.  The second process is
+termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}.
+The two-way connection is created using the @samp{|&} operator
+(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
+different from the same operator in the C shell.}
 
-    -f)     program="$program$n@@include address@hidden'missing operand'@}"
-            shift ;;
address@hidden
+do @{
+    print @var{data} |& "subprogram"
+    "subprogram" |& getline results
address@hidden while (@var{data left to process})
+close("subprogram")
address@hidden example
 
-    -f*)    f=$(expr "$1" : '-f\(.*\)')
-            program="$program$n@@include $f" ;;
+The first time an I/O operation is executed using the @samp{|&}
+operator, @command{gawk} creates a two-way pipeline to a child process
+that runs the other program.  Output created with @code{print}
+or @code{printf} is written to the program's standard input, and
+output from the program's standard output can be read by the @command{gawk}
+program using @code{getline}.
+As is the case with processes started by @samp{|}, the subprogram
+can be any program, or pipeline of programs, that can be started by
+the shell.
 
-    -[W-]file=*)
-            f=$(expr "$1" : '-.file=\(.*\)')
-            program="$program$n@@include $f" ;;
+There are some cautionary items to be aware of:
 
-    -[W-]file)
-            program="$program$n@@include address@hidden'missing operand'@}"
-            shift ;;
address@hidden @bullet
address@hidden
+As the code inside @command{gawk} currently stands, the coprocess's
+standard error goes to the same place that the parent @command{gawk}'s
+standard error goes. It is not possible to read the child's
+standard error separately.
 
-    -[W-]source=*)
-            t=$(expr "$1" : '-.source=\(.*\)')
-            program="$program$n$t" ;;
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
+I/O buffering may be a problem.  @command{gawk} automatically
+flushes all output down the pipe to the coprocess.
+However, if the coprocess does not flush its output,
address@hidden may hang when doing a @code{getline} in order to read
+the coprocess's results.  This could lead to a situation
+known as @dfn{deadlock}, where each process is waiting for the
+other one to do something.
address@hidden itemize
 
-    -[W-]source)
-            program="address@hidden'missing operand'@}"
-            shift ;;
address@hidden @code{close()} function, two-way pipes and
+It is possible to close just one end of the two-way pipe to
+a coprocess, by supplying a second argument to the @code{close()}
+function of either @code{"to"} or @code{"from"}
+(@pxref{Close Files And Pipes}).
+These strings tell @command{gawk} to close the end of the pipe
+that sends data to the coprocess or the end that reads from it,
+respectively.
 
-    -[W-]version)
-            echo igawk: version 3.0 1>&2
-            gawk --version
-            exit 0 ;;
address@hidden @command{sort} utility, coprocesses and
+This is particularly necessary in order to use
+the system @command{sort} utility as part of a coprocess;
address@hidden must read @emph{all} of its input
+data before it can produce any output.
+The @command{sort} program does not receive an end-of-file indication
+until @command{gawk} closes the write end of the pipe.
 
-    -[W-]*) opts="$opts '$1'" ;;
+When you have finished writing data to the @command{sort}
+utility, you can close the @code{"to"} end of the pipe, and
+then start reading sorted data via @code{getline}.
+For example:
 
-    *)      break ;;
-    esac
-    shift
-done
address@hidden
+BEGIN @{
+    command = "LC_ALL=C sort"
+    n = split("abcdefghijklmnopqrstuvwxyz", a, "")
 
-if [ -z "$program" ]
-then
-     address@hidden'missing program'@}
-     shift
-fi
+    for (i = n; i > 0; i--)
+        print a[i] |& command
+    close(command, "to")
 
-# At this point, `program' has the program.
address@hidden endfile
+    while ((command |& getline line) > 0)
+        print "got", line
+    close(command)
address@hidden
 @end example
 
-The @command{awk} program to process @samp{@@include} directives
-is stored in the shell variable @code{expand_prog}.  Doing this keeps
-the shell script readable.  The @command{awk} program
-reads through the user's program, one line at a time, using @code{getline}
-(@pxref{Getline}).  The input
address@hidden and @samp{@@include} statements are managed using a stack.
-As each @samp{@@include} is encountered, the current @value{FN} is
-``pushed'' onto the stack and the file named in the @samp{@@include}
-directive becomes the current @value{FN}.  As each file is finished,
-the stack is ``popped,'' and the previous input file becomes the current
-input file again.  The process is started by making the original file
-the first one on the stack.
-
-The @code{pathto()} function does the work of finding the full path to
-a file.  It simulates @command{gawk}'s behavior when searching the
address@hidden environment variable
-(@pxref{AWKPATH Variable}).
-If a @value{FN} has a @samp{/} in it, no path search is done.
-Similarly, if the @value{FN} is @code{"-"}, then that string is
-used as-is.  Otherwise,
-the @value{FN} is concatenated with the name of each directory in
-the path, and an attempt is made to open the generated @value{FN}.
-The only way to test if a file can be read in @command{awk} is to go
-ahead and try to read it with @code{getline}; this is what @code{pathto()}
address@hidden some very old versions of @command{awk}, the test
address@hidden junk < t} can loop forever if the file exists but is empty.
-Caveat emptor.} If the file can be read, it is closed and the @value{FN}
-is returned:
+This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to @command{sort}.  It then closes the
+write end of the pipe, so that @command{sort} receives an end-of-file
+indication.  This causes @command{sort} to sort the data and write the
+sorted data back to the @command{gawk} program.  Once all of the data
+has been read, @command{gawk} terminates the coprocess and exits.
 
address@hidden
-An alternative way to test for the file's existence would be to call
address@hidden("test -r " t)}, which uses the @command{test} utility to
-see if the file exists and is readable.  The disadvantage to this method
-is that it requires creating an extra process and can thus be slightly
-slower.
address@hidden ignore
+As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
+command ensures traditional Unix (ASCII) sorting from @command{sort}.
+
+@cindex @command{gawk}, @code{PROCINFO} array in
+@cindex @code{PROCINFO} array
+You may also use pseudo-ttys (ptys) for
+two-way communication instead of pipes, if your system supports them.
+This is done on a per-command basis, by setting a special element
+in the @code{PROCINFO} array
+(@pxref{Auto-set}),
+like so:
 
 @example
address@hidden file eg/prog/igawk.sh
-expand_prog='
+command = "sort -nr"           # command, save in convenience variable
+PROCINFO[command, "pty"] = 1   # update PROCINFO
+print @dots{} |& command       # start two-way pipe
+@dots{}
+@end example
 
-function pathto(file,    i, t, junk)
address@hidden
-    if (index(file, "/") != 0)
-        return file
address@hidden
+Using ptys avoids the buffer deadlock issues described earlier, at some
+loss in performance.  If your system does not have ptys, or if all the
+system's ptys are in use, @command{gawk} automatically falls back to
+using regular pipes.
 
-    if (file == "-")
-        return file
+@node TCP/IP Networking
+@section Using @command{gawk} for Network Programming
+@cindex advanced features, @command{gawk}, network programming
+@cindex networks, programming
+@c STARTOFRANGE tcpip
+@cindex TCP/IP
+@cindex @code{/inet/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet/@dots{}} (@command{gawk})
+@cindex @code{/inet4/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet4/@dots{}} (@command{gawk})
+@cindex @code{/inet6/@dots{}} special files (@command{gawk})
+@cindex files, @code{/inet6/@dots{}} (@command{gawk})
+@cindex @code{EMISTERED}
+@quotation
+@code{EMISTERED}:@*
+@ @ @ @ @i{A host is a host from coast to coast,@*
+@ @ @ @ and no-one can talk to host that's close,@*
+@ @ @ @ unless the host that isn't close@*
+@ @ @ @ is busy hung or dead.}
+@end quotation
 
-    for (i = 1; i <= ndirs; i++) @{
-        t = (pathlist[i] "/" file)
address@hidden
-        if ((getline junk < t) > 0) @{
-            # found it
-            close(t)
-            return t
-        @}
address@hidden group
-    @}
-    return ""
address@hidden
address@hidden endfile
address@hidden example
+In addition to being able to open a two-way pipeline to a coprocess
+on the same system
+(@pxref{Two-way I/O}),
+it is possible to make a two-way connection to
+another process on another system across an IP network connection.
 
-The main program is contained inside one @code{BEGIN} rule.  The first thing it
-does is set up the @code{pathlist} array that @code{pathto()} uses.  After
-splitting the path on @samp{:}, null elements are replaced with @code{"."},
-which represents the current directory:
+You can think of this as just a @emph{very long} two-way pipeline to
+a coprocess.
+The way @command{gawk} decides that you want to use TCP/IP networking is
+by recognizing special @value{FN}s that begin with one of @samp{/inet/},
+@samp{/inet4/} or @samp{/inet6/}.
 
address@hidden
address@hidden file eg/prog/igawk.sh
-BEGIN @{
-    path = ENVIRON["AWKPATH"]
-    ndirs = split(path, pathlist, ":")
-    for (i = 1; i <= ndirs; i++) @{
-        if (pathlist[i] == "")
-            pathlist[i] = "."
-    @}
address@hidden endfile
address@hidden example
+The full syntax of the special @value{FN} is
+@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
+The components are:
 
-The stack is initialized with @code{ARGV[1]}, which will be @file{/dev/stdin}.
-The main loop comes next.  Input lines are read in succession. Lines that
-do not start with @samp{@@include} are printed verbatim.
-If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}.
address@hidden()} is called to generate the full path.  If it cannot, then the 
program
-prints an error message and continues.
+@table @var
+@item net-type
+Specifies the kind of Internet connection to make.
+Use @samp{/inet4/} to force IPv4, and
+@samp{/inet6/} to force IPv6.
+Plain @samp{/inet/} (which used to be the only option) uses
+the system default, most likely IPv4.
 
-The next thing to check is if the file is included already.  The
address@hidden array is indexed by the full @value{FN} of each included
-file and it tracks this information for us.  If the file is
-seen again, a warning message is printed. Otherwise, the new @value{FN} is
-pushed onto the stack and processing continues.
+@item protocol
+The protocol to use over IP.  This must be either @samp{tcp}, or
+@samp{udp}, for a TCP or UDP IP connection,
+respectively.  The use of TCP is recommended for most applications.
 
-Finally, when @code{getline} encounters the end of the input file, the file
-is closed and the stack is popped.  When @code{stackptr} is less than zero,
-the program is done:
+@item local-port
+@cindex @code{getaddrinfo()} function (C library)
+The local TCP or UDP port number to use.  Use a port number of @samp{0}
+when you want the system to pick a port. This is what you should do
+when writing a TCP or UDP client.
+You may also use a well-known service name, such as @samp{smtp}
+or @samp{http}, in which case @command{gawk} attempts to determine
+the predefined port number using the C @code{getaddrinfo()} function.
 
address@hidden
address@hidden file eg/prog/igawk.sh
-    stackptr = 0
-    input[stackptr] = ARGV[1] # ARGV[1] is first file
+@item remote-host
+The IP address or fully-qualified domain name of the Internet
+host to which you want to connect.
 
-    for (; stackptr >= 0; stackptr--) @{
-        while ((getline < input[stackptr]) > 0) @{
-            if (tolower($1) != "@@include") @{
-                print
-                continue
-            @}
-            fpath = pathto($2)
address@hidden
-            if (fpath == "") @{
-                printf("igawk:%s:%d: cannot find %s\n",
-                    input[stackptr], FNR, $2) > "/dev/stderr"
-                continue
-            @}
address@hidden group
-            if (! (fpath in processed)) @{
-                processed[fpath] = input[stackptr]
-                input[++stackptr] = fpath  # push onto stack
-            @} else
-                print $2, "included in", input[stackptr],
-                    "already included in",
-                    processed[fpath] > "/dev/stderr"
-        @}
-        close(input[stackptr])
-    @}
address@hidden'  # close quote ends `expand_prog' variable
+@item remote-port
+The TCP or UDP port number to use on the given @var{remote-host}.
+Again, use @samp{0} if you don't care, or else a well-known
+service name.
+@end table
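
As an illustration of the file name syntax just described, the following
sketch (not part of the commit) assembles the special file name from its
five components.  The helper name inet_name() and the host example.org are
hypothetical, chosen only for this example.  The script merely builds and
prints strings and opens no connection, so any POSIX awk will run it.

```shell
# Assemble /net-type/protocol/local-port/remote-host/remote-port.
# inet_name() is a hypothetical helper, not a gawk built-in.
names=$(awk '
function inet_name(net, proto, lport, rhost, rport)
{
    return "/" net "/" proto "/" lport "/" rhost "/" rport
}

BEGIN {
    # TCP client: local port 0 asks the system to pick a port
    print inet_name("inet", "tcp", 0, "localhost", "daytime")
    # Force IPv6 with UDP and a numeric remote port (placeholder host)
    print inet_name("inet6", "udp", 0, "example.org", 7)
}')
printf '%s\n' "$names"
```

In a gawk program, a string built this way would appear on the left-hand
side of the |& operator, just like a coprocess command.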
 
-processed_program=$(gawk -- "$expand_prog" /dev/stdin << EOF
-$program
-EOF
-)
address@hidden endfile
+@cindex @command{gawk}, @code{ERRNO} variable in
+@cindex @code{ERRNO} variable
+@quotation NOTE
+Failure to open a two-way socket results in a nonfatal error
+being returned to the calling code. The value of @code{ERRNO} indicates
+the error (@pxref{Auto-set}).
+@end quotation
+
+Consider the following very simple example:
+
+@example
+BEGIN @{
+  Service = "/inet/tcp/0/localhost/daytime"
+  Service |& getline
+  print $0
+  close(Service)
+@}
 @end example
 
-The shell construct @address@hidden << @var{marker}} is called a @dfn{here 
document}.
-Everything in the shell script up to the @var{marker} is fed to @var{command} 
as input.
-The shell processes the contents of the here document for variable and command 
substitution
-(and possibly other things as well, depending upon the shell).
+This program reads the current date and time from the local system's
+TCP @samp{daytime} server.
+It then prints the results and closes the connection.
 
-The shell construct @samp{$(@dots{})} is called @dfn{command substitution}.
-The output of the command inside the parentheses is substituted
-into the command line.
-Because the result is used in a variable assignment,
-it is saved as a single string, even if the results contain whitespace.
+Because this topic is extensive, the use of @command{gawk} for
+TCP/IP programming is documented separately.
address@hidden
+See
address@hidden, , General Introduction, gawkinet, TCP/IP Internetworking with 
@command{gawk}},
address@hidden ifinfo
address@hidden
+See @cite{TCP/IP Internetworking with @command{gawk}},
+which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
+for a much more complete introduction and discussion, as well as
+extensive examples.
 
-The expanded program is saved in the variable @code{processed_program}.
-It's done in these steps:
+@c ENDOFRANGE tcpip
 
address@hidden
address@hidden
-Run @command{gawk} with the @samp{@@include}-processing program (the
-value of the @code{expand_prog} shell variable) on standard input.
+@node Profiling
+@section Profiling Your @command{awk} Programs
+@c STARTOFRANGE awkp
+@cindex @command{awk} programs, profiling
+@c STARTOFRANGE proawk
+@cindex profiling @command{awk} programs
+@cindex profiling @command{gawk}
+@cindex @code{awkprof.out} file
+@cindex files, @code{awkprof.out}
 
address@hidden
-Standard input is the contents of the user's program, from the shell variable 
@code{program}.
-Its contents are fed to @command{gawk} via a here document.
+You may produce execution traces of your @command{awk} programs.
+This is done by passing the option @option{--profile} to @command{gawk}.
+When @command{gawk} has finished running, it creates a profile of your program in a file
+named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
+@command{gawk} normally does.
 
address@hidden
-The results of this processing are saved in the shell variable 
@code{processed_program} by using command substitution.
address@hidden enumerate
+@cindex @code{--profile} option
+As shown in the following example,
+the @option{--profile} option can be used to change the name of the file
+where @command{gawk} will write the profile:
 
-The last step is to call @command{gawk} with the expanded program,
-along with the original
-options and command-line arguments that the user supplied.
+@example
+gawk --profile=myprog.prof -f myprog.awk data1 data2
+@end example
 
address@hidden this causes more problems than it solves, so leave it out.
address@hidden
-The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk}
-to handle an interesting case. Suppose that the user's program only has
-a @code{BEGIN} rule and there are no @value{DF}s to read.
-The program should exit without reading any @value{DF}s.
-However, suppose that an included library file defines an @code{END}
-rule of its own. In this case, @command{gawk} will hang, reading standard
-input. In order to avoid this, @file{/dev/null} is explicitly added to the
-command-line. Reading from @file{/dev/null} always returns an immediate
-end of file indication.
address@hidden
+In the above example, @command{gawk} places the profile in
+@file{myprog.prof} instead of in @file{awkprof.out}.
 
address@hidden Hmm. Add /dev/null if $# is 0?  Still messes up ARGV. Sigh.
address@hidden ignore
+Here is a sample session showing a simple @command{awk} program, its input data, and the
+results from running @command{gawk} with the @option{--profile} option.
+First, the @command{awk} program:
 
 @example
address@hidden file eg/prog/igawk.sh
-eval gawk $opts -- '"$processed_program"' '"$@@"'
address@hidden endfile
address@hidden example
+BEGIN @{ print "First BEGIN rule" @}
 
-The @command{eval} command is a shell construct that reruns the shell's parsing
-process.  This keeps things properly quoted.
+END @{ print "First END rule" @}
 
-This version of @command{igawk} represents my fifth version of this program.
-There are four key simplifications that make the program work better:
+/foo/ @{
+    print "matched /foo/, gosh"
+    for (i = 1; i <= 3; i++)
+        sing()
+@}
 
address@hidden @bullet
address@hidden
-Using @samp{@@include} even for the files named with @option{-f} makes building
-the initial collected @command{awk} program much simpler; all the
address@hidden@@include} processing can be done once.
+@{
+    if (/foo/)
+        print "if is true"
+    else
+        print "else is true"
+@}
 
address@hidden
-Not trying to save the line read with @code{getline}
-in the @code{pathto()} function when testing for the
-file's accessibility for use with the main program simplifies things
-considerably.
address@hidden what problem does this engender though - exercise
address@hidden answer, reading from "-" or /dev/stdin
+BEGIN @{ print "Second BEGIN rule" @}
 
address@hidden
-Using a @code{getline} loop in the @code{BEGIN} rule does it all in one
-place.  It is not necessary to call out to a separate loop for processing
-nested @samp{@@include} statements.
+END @{ print "Second END rule" @}
 
address@hidden
-Instead of saving the expanded program in a temporary file, putting it in a 
shell variable
-avoids some potential security problems.
-This has the disadvantage that the script relies upon more features
-of the @command{sh} language, making it harder to follow for those who
-aren't familiar with @command{sh}.
address@hidden itemize
+function sing(    dummy)
+@{
+    print "I gotta be me!"
+@}
+@end example
 
-Also, this program illustrates that it is often worthwhile to combine
address@hidden and @command{awk} programming together.  You can usually
-accomplish quite a lot, without having to resort to low-level programming
-in C or C++, and it is frequently easier to do certain kinds of string
-and argument manipulation using the shell than it is in @command{awk}.
+Following is the input data:
 
-Finally, @command{igawk} shows that it is not always necessary to add new
-features to a program; they can often be layered on top.
address@hidden
-With @command{igawk},
-there is no real reason to build @samp{@@include} processing into
address@hidden itself.
address@hidden ignore
+@example
+foo
+bar
+baz
+foo
+junk
+@end example
 
address@hidden search paths
address@hidden search paths, for source files
address@hidden source address@hidden search path for
address@hidden files, address@hidden search path for
address@hidden directories, searching
-As an additional example of this, consider the idea of having two
-files in a directory in the search path:
+Here is the @file{awkprof.out} that results from running the @command{gawk}
+profiler on this program and data (this example also illustrates that @command{awk}
+programmers sometimes have to work late):
 
address@hidden @file
address@hidden default.awk
-This file contains a set of default library functions, such
-as @code{getopt()} and @code{assert()}.
+@cindex @code{BEGIN} pattern
+@cindex @code{END} pattern
+@example
+        # gawk profile, created Sun Aug 13 00:00:15 2000
 
address@hidden site.awk
-This file contains library functions that are specific to a site or
-installation; i.e., locally developed functions.
-Having a separate file allows @file{default.awk} to change with
-new @command{gawk} releases, without requiring the system administrator to
-update it each time by adding the local functions.
address@hidden table
+        # BEGIN block(s)
 
-One user
address@hidden Karl Berry, address@hidden, 10/95
-suggested that @command{gawk} be modified to automatically read these files
-upon startup.  Instead, it would be very simple to modify @command{igawk}
-to do this. Since @command{igawk} can process nested @samp{@@include}
-directives, @file{default.awk} could simply contain @samp{@@include}
-statements for the desired library functions.
+        BEGIN @{
+     1          print "First BEGIN rule"
+     1          print "Second BEGIN rule"
+        @}
 
address@hidden Exercise: make this change
address@hidden ENDOFRANGE libfex
address@hidden ENDOFRANGE flibex
address@hidden ENDOFRANGE awkpex
+        # Rule(s)
 
address@hidden Anagram Program
address@hidden Finding Anagrams From A Dictionary
+     5  /foo/   @{ # 2
+     2          print "matched /foo/, gosh"
+     6          for (i = 1; i <= 3; i++) @{
+     6                  sing()
+                @}
+        @}
 
-An interesting programming challenge is to
-search for @dfn{anagrams} in a
-word list (such as
address@hidden/usr/share/dict/words} on many GNU/Linux systems).
-One word is an anagram of another if both words contain
-the same letters
-(for example, ``babbling'' and ``blabbing'').
+     5  @{
+     5          if (/foo/) @{ # 2
+     2                  print "if is true"
+     3          @} else @{
+     3                  print "else is true"
+                @}
+        @}
 
-An elegant algorithm is presented in Column 2, Problem C of
-Jon Bentley's @cite{Programming Pearls}, second edition.
-The idea is to give words that are anagrams a common signature,
-sort all the words together by their signature, and then print them.
-Dr.@: Bentley observes that taking the letters in each word and
-sorting them produces that common signature.
+        # END block(s)
 
-The following program uses arrays of arrays to bring together
-words with the same signature and array sorting to print the words
-in sorted order.
+        END @{
+     1          print "First END rule"
+     1          print "Second END rule"
+        @}
 
address@hidden @code{anagram.awk} program
address@hidden
address@hidden file eg/prog/anagram.awk
-# anagram.awk --- An implementation of the anagram finding algorithm
-#                 from Jon Bentley's "Programming Pearls", 2nd edition.
-#                 Addison Wesley, 2000, ISBN 0-201-65788-0.
-#                 Column 2, Problem C, section 2.8, pp 18-20.
address@hidden endfile
address@hidden
address@hidden file eg/prog/anagram.awk
-#
-# This program requires gawk 4.0 or newer.
-# Required gawk-specific features:
-#   - True multidimensional arrays
-#   - split() with "" as separator splits out individual characters
-#   - asort() and asorti() functions
-#
-# See http://savannah.gnu.org/projects/gawk.
-#
-# Arnold Robbins
-# arnold@@skeeve.com
-# Public Domain
-# January, 2011
address@hidden endfile
address@hidden ignore
address@hidden file eg/prog/anagram.awk
+        # Functions, listed alphabetically
 
-/'s$/   @{ next @}        # Skip possessives
address@hidden endfile
+     6  function sing(dummy)
+        @{
+     6          print "I gotta be me!"
+        @}
 @end example
 
-The program starts with a header, and then a rule to skip
-possessives in the dictionary file. The next rule builds
-up the data structure. The first dimension of the array
-is indexed by the signature; the second dimension is the word
-itself:
+This example illustrates many of the basic features of profiling output.
+They are as follows:
 
address@hidden
address@hidden file eg/prog/anagram.awk
address@hidden
-    key = word2key($1)  # Build signature
-    data[key][$1] = $1  # Store word with signature
address@hidden
address@hidden endfile
address@hidden example
+@itemize @bullet
+@item
+The program is printed in the order @code{BEGIN} rule,
+@code{BEGINFILE} rule,
+pattern/action rules,
+@code{ENDFILE} rule, @code{END} rule and functions, listed
+alphabetically.
+Multiple @code{BEGIN} and @code{END} rules are merged together,
+as are multiple @code{BEGINFILE} and @code{ENDFILE} rules.
 
-The @code{word2key()} function creates the signature.
-It splits the word apart into individual letters,
-sorts the letters, and then joins them back together:
+@cindex patterns, counts
+@item
+Pattern-action rules have two counts.
+The first count, to the left of the rule, shows how many times
+the rule's pattern was @emph{tested}.
+The second count, to the right of the rule's opening left brace
+in a comment,
+shows how many times the rule's action was @emph{executed}.
+The difference between the two indicates how many times the rule's
+pattern evaluated to false.
+
+@item
+Similarly,
+the count for an @code{if}-@code{else} statement shows how many times
+the condition was tested.
+To the right of the opening left brace for the @code{if}'s body
+is a count showing how many times the condition was true.
+The count for the @code{else}
+indicates how many times the test failed.
+
+@cindex loops, count for header
+@item
+The count for a loop header (such as @code{for}
+or @code{while}) shows how many times the loop test was executed.
+(Because of this, you can't just look at the count on the first
+statement in a rule to determine how many times the rule was executed.
+If the first statement is a loop, the count is misleading.)
+
+@cindex functions, user-defined, counts
+@cindex user-defined, functions, counts
+@item
+For user-defined functions, the count next to the @code{function}
+keyword indicates how many times the function was called.
+The counts next to the statements in the body show how many times
+those statements were executed.
+
+@cindex @code{@{@}} (braces)
+@cindex braces (@code{@{@}})
+@item
+The layout uses ``K&R'' style with TABs.
+Braces are used everywhere, even when
+the body of an @code{if}, @code{else}, or loop is only a single statement.
 
address@hidden
address@hidden file eg/prog/anagram.awk
-# word2key --- split word apart into letters, sort, joining back together
+@cindex @code{()} (parentheses)
+@cindex parentheses @code{()}
+@item
+Parentheses are used only where needed, as indicated by the structure
+of the program and the precedence rules.
+@c extra verbiage here satisfies the copyeditor. ugh.
+For example, @samp{(3 + 5) * 4} means add three plus five, then multiply
+the total by four.  However, @samp{3 + 5 * 4} has no parentheses, and
+means @samp{3 + (5 * 4)}.
 
-function word2key(word,     a, i, n, result)
address@hidden
-    n = split(word, a, "")
-    asort(a)
+@ignore
+@item
+All string concatenations are parenthesized too.
+(This could be made a bit smarter.)
+@end ignore
 
-    for (i = 1; i <= n; i++)
-        result = result a[i]
+@item
+Parentheses are used around the arguments to @code{print}
+and @code{printf} only when
+the @code{print} or @code{printf} statement is followed by a redirection.
+Similarly, if
+the target of a redirection isn't a scalar, it gets parenthesized.
 
-    return result
address@hidden
address@hidden endfile
address@hidden example
+@item
+@command{gawk} supplies leading comments in
+front of the @code{BEGIN} and @code{END} rules,
+the pattern/action rules, and the functions.
 
-Finally, the @code{END} rule traverses the array
-and prints out the anagram lists.  It sends the output
-to the system @command{sort} command, since otherwise
-the anagrams would appear in arbitrary order:
+@end itemize
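
The parenthesization rule in the list above follows ordinary awk operator
precedence.  A quick check, offered only as a sketch and not part of the
commit (the variable names a and b are arbitrary):

```shell
# Multiplication binds tighter than addition, so only the first
# expression needs its parentheses to keep its meaning.
result=$(awk 'BEGIN {
    a = (3 + 5) * 4    # add first, then multiply
    b = 3 + 5 * 4      # same as 3 + (5 * 4)
    print a, b
}')
printf '%s\n' "$result"
```

The two values differ, which is why the pretty printer must keep the
parentheses in the first expression and can omit them in the second.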
 
address@hidden
address@hidden file eg/prog/anagram.awk
-END @{
-    sort = "sort"
-    for (key in data) @{
-        # Sort words with same key
-        nwords = asorti(data[key], words)
-        if (nwords == 1)
-            continue
+The profiled version of your program may not look exactly like what you
+typed when you wrote it.  This is because @command{gawk} creates the
+profiled version by ``pretty printing'' its internal representation of
+the program.  The advantage to this is that @command{gawk} can produce
+a standard representation.  The disadvantage is that all source-code
+comments are lost, as are the distinctions among multiple @code{BEGIN},
+@code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules.  Also, things such as:
 
-        # And print. Minor glitch: trailing space at end of each line
-        for (j = 1; j <= nwords; j++)
-            printf("%s ", words[j]) | sort
-        print "" | sort
-    @}
-    close(sort)
address@hidden
address@hidden endfile
+@example
+/foo/
 @end example
 
-Here is some partial output when the program is run:
+@noindent
+come out as:
 
 @example
-$ @kbd{gawk -f anagram.awk /usr/share/dict/words | grep '^b'}
address@hidden
-babbled blabbed 
-babbler blabber brabble 
-babblers blabbers brabbles 
-babbling blabbing 
-babbly blabby 
-babel bable 
-babels beslab 
-babery yabber 
address@hidden
+/foo/   @{
+    print $0
+@}
 @end example
 
address@hidden Signature Program
address@hidden And Now For Something Completely Different
-
-The following program was written by Davide Brini
address@hidden (@email{dave_br@@gmx.com})
-and is published on @uref{http://backreference.org/2011/02/03/obfuscated-awk/,
-his website}.
-It serves as his signature in the Usenet group @code{comp.lang.awk}.
-He supplies the following copyright terms:
-
address@hidden
-Copyright @copyright{} 2008 Davide Brini
-
-Copying and distribution of the code published in this page, with or without
-modification, are permitted in any medium without royalty provided the 
copyright
-notice and this notice are preserved.
address@hidden quotation
+@noindent
+which is correct, but possibly surprising.
 
-Here is the program:
+@cindex profiling @command{awk} programs, dynamically
+@cindex @command{gawk} program, dynamic profiling
+Besides creating profiles when a program has completed,
+@command{gawk} can produce a profile while it is running.
+This is useful if your @command{awk} program goes into an
+infinite loop and you want to see what has been executed.
+To use this feature, run @command{gawk} with the @option{--profile}
+option in the background:
 
 @example
-awk 'address@hidden"~"~"~";o="=="=="==";o+=+o;x=O""O;while(X++<=x+o+o)c=c"%c";
-printf c,(x-O)*(x-O),x*(x-o)-o,x*(x-O)+x-O-o,+x*(x-O)-x+o,X*(o*o+O)+x-O,
-X*(X-x)-o*o,(x+X)*o*o+o,x*(X-x)-O-O,x-O+(O+o+X+x)*(o+O),X*X-X*(x-O)-x+O,
-O+X*(o*(o+O)+O),+x+O+X*o,x*(x-o),(o+X+x)*o*o-(x-O-O),O+(X-x)*(X+O),address@hidden'
+$ @kbd{gawk --profile -f myprog &}
+[1] 13992
 @end example
 
-We leave it to you to determine what the program does.
-
address@hidden
-To: "Arnold Robbins" <address@hidden>
-Date: Sat, 20 Aug 2011 13:50:46 -0400
-Subject: The GNU Awk User's Guide, Section 13.3.11
-From: "Chris Johansen" <address@hidden>
-Message-ID: <address@hidden>
-
-Arnold, you don't know me, but we have a tenuous connection.  My wife is  
-Barbara A. Field, FAIA, GIT '65 (B. Arch.).
-
-I have had a couple of paper copies of "Effective Awk Programming" for  
-years, and now I'm going through a Kindle version of "The GNU Awk User's  
-Guide" again.  When I got to section 13.3.11, I reformatted and lightly  
-commented Davide Brin's signature script to understand its workings.
-
-It occurs to me that this might have pedagogical value as an example  
-(although imperfect) of the value of whitespace and comments, and a  
-starting point for that discussion.  It certainly helped _me_ understand  
-what's going on.  You are welcome to it, as-is or modified (subject to  
-Davide's constraints, of course, which I think I have met).
-
-If I were to include it in a future edition, I would put it at some  
-distance from section 13.3.11, say, as a note or an appendix, so as not to  
-be a "spoiler" to the puzzle.
-
-Best regards,
--- 
-Chris Johansen {johansen at main dot nc dot us}
-  . . . collapsing the probability wave function, sending ripples of  
-certainty through the space-time continuum.
-
+@cindex @command{kill} command@comma{} dynamic profiling
+@cindex @code{USR1} signal
+@cindex @code{SIGUSR1} signal
+@cindex signals, @code{USR1}/@code{SIGUSR1}
+@noindent
+The shell prints a job number and process ID number; in this case, 13992.
+Use the @command{kill} command to send the @code{USR1} signal
+to @command{gawk}:
 
-#! /usr/bin/gawk -f
+@example
+$ @kbd{kill -USR1 13992}
+@end example
 
-# From "13.3.11 And Now For Something Completely Different"
-#   
http://www.gnu.org/software/gawk/manual/html_node/Signature-Program.html#Signature-Program
+@noindent
+As usual, the profiled version of the program is written to
+@file{awkprof.out}, or to a different file if one is specified with
+the @option{--profile} option.
 
-# Copyright © 2008 Davide Brini 
+Along with the regular profile, as shown earlier, the profile
+includes a trace of any active functions:
 
-# Copying and distribution of the code published in this page, with
-# or without modification, are permitted in any medium without
-# royalty provided the copyright notice and this notice are preserved.
+@example
+# Function Call Stack:
 
-BEGIN {
-  O = "~" ~ "~";    #  1
-  o = "==" == "=="; #  1
-  o += +o;          #  2
-  x = O "" O;       # 11
+#   3. baz
+#   2. bar
+#   1. foo
+# -- main --
+@end example
 
+You may send @command{gawk} the @code{USR1} signal as many times as you like.
+Each time, the profile and function call trace are appended to the output
+profile file.
 
-  while ( X++ <= x + o + o ) c = c "%c";
+@cindex @code{HUP} signal
+@cindex @code{SIGHUP} signal
+@cindex signals, @code{HUP}/@code{SIGHUP}
+If you use the @code{HUP} signal instead of the @code{USR1} signal,
+@command{gawk} produces the profile and the function call trace and then exits.
 
-  # O is  1
-  # o is  2
-  # x is 11
-  # X is 17
-  # c is "%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c"
+@cindex @code{INT} signal (MS-Windows)
+@cindex @code{SIGINT} signal (MS-Windows)
+@cindex signals, @code{INT}/@code{SIGINT} (MS-Windows)
+@cindex @code{QUIT} signal (MS-Windows)
+@cindex @code{SIGQUIT} signal (MS-Windows)
+@cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows)
+When @command{gawk} runs on MS-Windows systems, it uses the
+@code{INT} and @code{QUIT} signals for producing the profile and, in
+the case of the @code{INT} signal, @command{gawk} exits.  This is
+because these systems don't support the @command{kill} command, so the
+only signals you can deliver to a program are those generated by the
+keyboard.  The @code{INT} signal is generated by the
+@kbd{@key{Ctrl}-@key{C}} or @kbd{@key{Ctrl}-@key{BREAK}} key, while the
+@code{QUIT} signal is generated by the @kbd{@key{Ctrl}-@key{\}} key.
 
-  printf c,
-    ( x - O )*( x - O),                  # 100 d
-    x*( x - o ) - o,                     #  97 a
-    x*( x - O ) + x - O - o,             # 118 v
-    +x*( x - O ) - x + o,                # 101 e
-    X*( o*o + O ) + x - O,               #  95 _
-    X*( X - x ) - o*o,                   #  98 b
-    ( x + X )*o*o + o,                   # 114 r
-    x*( X - x ) - O - O,                 #  64 @
-    x - O + ( O + o + X + x )*( o + O ), # 103 g
-    X*X - X*( x - O ) - x + O,           # 109 m
-    O + X*( o*( o + O ) + O ),           # 120 x
-    +x + O + X*o,                        #  46 .
-    x*( x - o),                          #  99 c
-    ( o + X + x )*o*o - ( x - O - O ),   # 111 0
-    O + ( X - x )*( X + O ),             # 109 m
-    x - O                                #  10 \n
-}
address@hidden ignore
+Finally, @command{gawk} also accepts another option @option{--pretty-print}.
+When called this way, @command{gawk} ``pretty prints'' the program into
address@hidden, without any execution counts.
address@hidden ENDOFRANGE advgaw
address@hidden ENDOFRANGE gawadv
address@hidden ENDOFRANGE awkp
address@hidden ENDOFRANGE proawk
 
 @c The original text for this chapter was contributed by Efraim Yawitz.
 @c FIXME: Add more indexing.
@@ -31858,16 +31897,18 @@ If you write an extension that you wish to share with other
 @command{gawk} users, please consider doing so through the
 @code{gawkextlib} project.
 
address@hidden
address@hidden Part IV:@* Appendices
address@hidden iftex
 
 @ignore
address@hidden Try this
address@hidden
address@hidden
address@hidden off
address@hidden III@ @ @ Appendixes
-Part III provides the appendixes, the Glossary, and two licenses that cover
address@hidden
+
address@hidden Part IV:@* Appendices
+
+Part IV provides the appendices, the Glossary, and two licenses that cover
 the @command{gawk} source code and this @value{DOCUMENT}, respectively.
-It contains the following appendixes:
+It contains the following appendices:
 
 @itemize @bullet
 @item
@@ -31891,11 +31932,7 @@ It contains the following appendixes:
 @item
 @ref{GNU Free Documentation License}.
 @end itemize
-
address@hidden
address@hidden @thispage@ @ @ @address@hidden @| @|
address@hidden  @| @| @address@hidden@ @ @ @thispage
address@hidden iftex
address@hidden ifdocbook
 @end ignore
 
 @node Language History

-----------------------------------------------------------------------

Summary of changes:
 doc/ChangeLog |    8 +
 doc/api.texi  | 4103 ------------------
 doc/gawk.info |10339 +++++++++++++++++++++++-----------------------
 doc/gawk.texi |12990 +++++++++++++++++++++++++++++----------------------------
 4 files changed, 11740 insertions(+), 15700 deletions(-)
 delete mode 100644 doc/api.texi


hooks/post-receive
-- 
gawk


