[PATCH] Updates to shell portability documentation


From: Paolo Bonzini
Subject: [PATCH] Updates to shell portability documentation
Date: Wed, 15 Oct 2008 13:33:35 +0200

This updates the documentation to reflect that M4sh will be able to
find an SVR2-or-better shell.  I also found a few more places where the
docs were out of date and referred to workarounds that Autoconf no
longer applies.

Ok?

Paolo
---
 doc/autoconf.texi |  276 +++++++++++++++++++++++++++++++----------------------
 NEWS              |    3 ++
 2 files changed, 172 insertions(+), 107 deletions(-)

2008-10-15  Paolo Bonzini  <address@hidden>

        * doc/autoconf.texi: Update all references to "Portable Shell" and
        "Limitations of Builtins" to use three-argument commands.
        (Programming in M4sh): Document AS_ECHO, AS_ECHO_N, AS_UNSET.
        (Portable Shell): Move here discussion about "Where is the POSIX
        shell?"  Mention that M4sh provides a SVR2 shell and takes care
        of unsetting variables if necessary.  Talk about M4sh and not only
        Autoconf-generated scripts.
        (Special Shell Variables): Talk about M4sh and not only
        Autoconf-generated scripts.  Don't talk about things that Autoconf
        does not do.  Mention problems of $LINENO with shell functions.
        (Limitations of Builtins): Mention AS_ECHO and AS_ECHO_N.  Move
        discussion of eval bugs before discussion on proper use of eval.
        Mention AS_IF.  Reword why not to use "shift N".  Mention "foo=;
        unset foo" trick.  Include M4sh code that unsets MAIL for Bash 2.01.
        * NEWS: Update list of documented M4sh macros.
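
For illustration only (this is not part of the patch), here is a minimal
sketch of the @command{unset} idioms that the new "Limitations of Builtins"
text describes, written as plain shell and assuming a shell that has
`unset' at all:

```shell
# Illustration only; not part of the patch.

# Setting the variable first keeps "unset" from failing in shells
# (e.g., Bash 2.05a) that complain when the variable is not set.
FOO=; unset FOO

# Probe "unset MAIL" in a subshell first, so that a shell that dumps
# core on it (Bash 2.01) or lacks "unset" does not abort the script.
( (unset MAIL) || exit 1) >/dev/null 2>&1 && unset MAIL || :

# Where "unset" may be missing, fall back to a neutralizing value.
PS1='$ '
```

The trailing `|| :' keeps the overall exit status at zero whether or not
the probe succeeds.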

diff --git a/NEWS b/NEWS
index 31e58c6..2f7914a 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,9 @@ GNU Autoconf NEWS - User visible changes.
    AS_ME_PREPARE
 
 ** The following m4sh macros are documented now:
+   AS_ECHO
+   AS_ECHO_N
+   AS_UNSET
    AS_VERSION_COMPARE
 
 
diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index ddd0638..fdd0c2a 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -1030,9 +1030,10 @@ use.  Autoconf macros already exist to check for many features; see
 you can use Autoconf template macros to produce custom checks; see
 @ref{Writing Tests}, for information about them.  For especially tricky
 or specialized features, @file{configure.ac} might need to contain some
-hand-crafted shell commands; see @ref{Portable Shell}.  The
-@command{autoscan} program can give you a good start in writing
-@file{configure.ac} (@pxref{autoscan Invocation}, for more information).
+hand-crafted shell commands; see @ref{Portable Shell, , Portable Shell
+Programming}.  The @command{autoscan} program can give you a good start
+in writing @file{configure.ac} (@pxref{autoscan Invocation}, for more
+information).
 
 Previous versions of Autoconf promoted the name @file{configure.in},
 which is somewhat ambiguous (the tool needed to process this file is not
@@ -11847,6 +11847,23 @@ if @code{$file} is @samp{/one/two/three}, the command
 @end defmac
 @end ignore
 
+@defmac AS_ECHO (@var{word})
+@asindex{ECHO}
+Emits @var{word} to the standard output, followed by a newline.  @var{word}
+must be a single shell word (typically a quoted string).  The bytes of
+@var{word} are output as-is, even if it starts with "-" or contains "\".
+Redirections can be placed outside the macro invocation.
+@end defmac
+
+@defmac AS_ECHO_N (@var{word})
+@asindex{ECHO_N}
+Emits @var{word} to the standard output, without a following newline.
+@var{word} must be a single shell word (typically a quoted string) and,
+for portability, should not include more than one newline.  The bytes of
+@var{word} are output as-is, even if it starts with "-" or contains "\".
+Redirections can be placed outside the macro invocation.
+@end defmac
+
 @defmac AS_IF (@var{test1}, @ovar{run-if-true1}, @dots{}, @ovar{run-if-false})
 @asindex{IF}
 Run shell code @var{test1}.  If @var{test1} exits with a zero status then
@@ -11911,6 +11912,12 @@ optimizing the common cases (@var{dir} or @var{file} is @samp{.},
 @var{file} is absolute, etc.).
 @end defmac
 
+@defmac AS_UNSET (@var{var})
+@asindex{UNSET}
+Unsets the shell variable @var{var}, working around bugs in older
+shells (@pxref{Limitations of Builtins, , Limitations of Shell Builtins}).
+@end defmac
+
 @defmac AS_VERSION_COMPARE (@var{version-1}, @var{version-2}, @
   @ovar{action-if-less}, @ovar{action-if-equal}, @ovar{action-if-greater})
 @asindex{VERSION_COMPARE}
@@ -12731,18 +12738,52 @@ test "$ac_cv_emxos2" = yes && EMXOS2=yes[]dnl
 When writing your own checks, there are some shell-script programming
 techniques you should avoid in order to make your code portable.  The
 Bourne shell and upward-compatible shells like the Korn shell and Bash
-have evolved over the years, but to prevent trouble, do not take
-advantage of features that were added after Unix version 7, circa
-1977 (@pxref{Systemology}).
+have evolved over the years, and many features added to the original
+Unix version 7 shell are now supported on all interesting porting targets.
+However, the following discussion between Russ Allbery and Robert Lipe
+is worth reading:
+
+@noindent
+Russ Allbery:
+
+@quotation
+The @acronym{GNU} assumption that @command{/bin/sh} is the one and only shell
+leads to a permanent deadlock.  Vendors don't want to break users'
+existing shell scripts, and there are some corner cases in the Bourne
+shell that are not completely compatible with a Posix shell.  Thus,
+vendors who have taken this route will @emph{never} (OK@dots{}``never say
+never'') replace the Bourne shell (as @command{/bin/sh}) with a
+Posix shell.
+@end quotation
+
+@noindent
+Robert Lipe:
+
+@quotation
+This is exactly the problem.  While most (at least most System V's) do
+have a Bourne shell that accepts shell functions most vendor
+@command{/bin/sh} programs are not the Posix shell.
 
-You should not use aliases, negated character classes, or other features
-that are not found in all Bourne-compatible shells; restrict yourself
-to the lowest common denominator.  Even @code{unset} is not supported
-by all shells!
+So while most modern systems do have a shell @emph{somewhere} that meets the
+Posix standard, the challenge is to find it.
+@end quotation
 
-Shell functions are considered portable nowadays.  However, some pitfalls
-have to be avoided for portable use of shell functions (@pxref{Shell
-Functions}).
+For this reason, part of the job of M4sh (@pxref{Programming in M4sh})
+is to find such a shell.  But to prevent trouble, if you're not using
+M4sh you should not take advantage of features that were added after Unix
+version 7, circa 1977 (@pxref{Systemology}); you should not use aliases,
+negated character classes, or even @command{unset}.  @code{#} comments,
+while not in Unix version 7, were retrofitted in the original Bourne
+shell and can be assumed to be part of the least common denominator.
+
+On the other hand, if you're using M4sh you can assume that the shell
+has the features that were added in SVR2, including shell functions,
+@command{return}, @command{unset}, and I/O redirection for builtins.  For
+more information, refer to @uref{http://www.in-ulm.de/~mascheck/bourne/}.
+However, some pitfalls have to be avoided for portable use of these
+constructs; these will be documented in the rest of this chapter.
+See in particular @ref{Shell Functions} and @ref{Limitations of
+Builtins, , Limitations of Shell Builtins}.
 
 Some ancient systems have quite
 small limits on the length of the @samp{#!} line; for instance, 32
@@ -12920,34 +12961,6 @@ The default Mac OS X @command{sh} was originally Zsh; it was changed to
 Bash in Mac OS X 10.2.
 @end table
 
-The following discussion between Russ Allbery and Robert Lipe is worth
-reading:
-
-@noindent
-Russ Allbery:
-
-@quotation
-The @acronym{GNU} assumption that @command{/bin/sh} is the one and only shell
-leads to a permanent deadlock.  Vendors don't want to break users'
-existing shell scripts, and there are some corner cases in the Bourne
-shell that are not completely compatible with a Posix shell.  Thus,
-vendors who have taken this route will @emph{never} (OK@dots{}``never say
-never'') replace the Bourne shell (as @command{/bin/sh}) with a
-Posix shell.
-@end quotation
-
-@noindent
-Robert Lipe:
-
-@quotation
-This is exactly the problem.  While most (at least most System V's) do
-have a Bourne shell that accepts shell functions most vendor
-@command{/bin/sh} programs are not the Posix shell.
-
-So while most modern systems do have a shell @emph{somewhere} that meets the
-Posix standard, the challenge is to find it.
-@end quotation
-
 @node Here-Documents
 @section Here-Documents
 @cindex Here-documents
@@ -13249,7 +13262,8 @@ esac
 
 @noindent
 Make sure you quote the brackets if appropriate and keep the backslash as
-first character (@pxref{Limitations of Builtins}).
+first character (@pxref{Limitations of Builtins, , Limitations of Shell
+Builtins}).
 
 Also, because the colon is used as part of a drivespec, these systems don't
 use it as path separator.  When creating or accessing paths, you can use the
@@ -13891,9 +13905,10 @@ it's not worth worrying about working around these horrendous bugs.
 
 Some shell variables should not be used, since they can have a deep
 influence on the behavior of the shell.  In order to recover a sane
-behavior from the shell, some variables should be unset, but
-@command{unset} is not portable (@pxref{Limitations of Builtins}) and a
-fallback value is needed.
+behavior from the shell, some variables should be unset; M4sh takes
+care of this and provides fallback values, whenever needed, to cater
+for a very old @file{/bin/sh} that does not support @command{unset}
+(@pxref{Portable Shell, , Portable Shell Programming}).
 
 As a general rule, shell variable names containing a lower-case letter
 are safe; you can define and use these variables without worrying about
@@ -13940,7 +13955,7 @@ In practice the shells that have this problem also support
 You can also avoid output by ensuring that your directory name is
 absolute or anchored at @samp{./}, as in @samp{abs=`cd ./src && pwd`}.
 
-Autoconf-generated scripts automatically unset @env{CDPATH} if
+Configure scripts use M4sh, which automatically unsets @env{CDPATH} if
 possible, so you need not worry about this problem in those scripts.
 
 @item DUALCASE
@@ -13966,7 +13981,8 @@ supposed to affect only interactive shells.  However, at least one
 shell (the pre-3.0 @sc{uwin} Korn shell) gets confused about
 whether it is interactive, which means that (for example) a @env{PS1}
 with a side effect can unexpectedly modify @samp{$?}.  To work around
-this bug, Autoconf-generated scripts do something like this:
+this bug, M4sh scripts (including @file{configure} scripts) do something
+like this:
 
 @example
 (unset ENV) >/dev/null 2>&1 && unset ENV MAIL MAILPATH
@@ -13975,6 +13991,10 @@ PS2='> '
 PS4='+ '
 @end example
 
+@noindent
+(actually, there is some more complication due to bugs in @command{unset};
+@pxref{Limitations of Builtins, , Limitations of Shell Builtins}).
+
 @item FPATH
 The Korn shell uses @env{FPATH} to find shell functions, so avoid
 @env{FPATH} in portable scripts.  @env{FPATH} is consulted after
@@ -14017,20 +14037,23 @@ to this and join with a space anyway.
 @evindex LC_NUMERIC
 @evindex LC_TIME
 
-Autoconf-generated scripts normally set all these variables to
-@samp{C} because so much configuration code assumes the C locale and
-Posix requires that locale environment variables be set to
-@samp{C} if the C locale is desired.  However, some older, nonstandard
-systems (notably @acronym{SCO}) break if locale environment variables
-are set to @samp{C}, so when running on these systems
-Autoconf-generated scripts unset the variables instead.
+You should set all these variables to @samp{C} because so much
+configuration code assumes the C locale and Posix requires that locale
+environment variables be set to @samp{C} if the C locale is desired;
+@file{configure} scripts and M4sh do that for you.
+Export these variables after setting them.
+
+@c  However, some older, nonstandard
+@c  systems (notably @acronym{SCO}) break if locale environment variables
+@c  are set to @samp{C}, so when running on these systems
+@c  Autoconf-generated scripts unset the variables instead.
 
 @item LANGUAGE
 @evindex LANGUAGE
 
 @env{LANGUAGE} is not specified by Posix, but it is a @acronym{GNU}
-extension that overrides @env{LC_ALL} in some cases, so
-Autoconf-generated scripts set it too.
+extension that overrides @env{LC_ALL} in some cases, so you (or M4sh)
+should set it too.
 
 @item LC_ADDRESS
 @itemx LC_IDENTIFICATION
@@ -14060,13 +14083,13 @@ character) with the line's number.  In M4sh scripts you should execute
 @code{AS_LINENO_PREPARE} so that these workarounds are included in
 your script; configure scripts do this automatically in @code{AC_INIT}.
 
-You should not rely on @code{LINENO} within @command{eval}, as the
-behavior differs in practice.  Also, the possibility of the Sed
-prepass means that you should not rely on @code{$LINENO} when quoted,
-when in here-documents, or when in long commands that cross line
-boundaries.  Subshells should be OK, though.  In the following
-example, lines 1, 6, and 9 are portable, but the other instances of
-@code{LINENO} are not:
+You should not rely on @code{LINENO} within @command{eval} or shell
+functions, as the behavior differs in practice.  Also, the possibility
+of the Sed prepass means that you should not rely on @code{$LINENO} when
+quoted, when in here-documents, or when in long commands that cross line
+boundaries.  Subshells should be OK, though.  In the following example,
+lines 1, 6, and 9 are portable, but the other instances of @code{LINENO}
+are not:
 
 @example
 @group
@@ -14187,7 +14210,7 @@ hence read-only.  Do not use it.
 @cindex Shell Functions
 
 Nowadays, it is difficult to find a shell that does not support
-shell functions at all.  However, some differences should be expected:
+shell functions at all.  However, some differences should be expected.
 
 Inside a shell function, you should not rely on the error status of a
 subshell if the last command of that subshell was @code{exit} or
@@ -14260,10 +14283,11 @@ No, no, we are serious: some shells do have limitations!  :)
 
 You should always keep in mind that any builtin or command may support
 options, and therefore differ in behavior with arguments
-starting with a dash.  For instance, the innocent @samp{echo "$word"}
+starting with a dash.  For instance, even the innocent @samp{echo "$word"}
 can give unexpected results when @code{word} starts with a dash.  It is
 often possible to avoid this problem using @samp{echo "x$word"}, taking
-the @samp{x} into account later in the pipe.
+the @samp{x} into account later in the pipe.  Many of these limitations
+can be worked around using M4sh (@pxref{Programming in M4sh}).
 
 @table @asis
 @item @command{.}
@@ -14491,12 +14515,8 @@ Also please see the discussion of the @command{pwd} command.
 @prindex @command{echo}
 The simple @command{echo} is probably the most surprising source of
 portability troubles.  It is not possible to use @samp{echo} portably
-unless both options and escape sequences are omitted.  New applications
-which are not aiming at portability should use @samp{printf} instead of
-@samp{echo}.
-
-Don't expect any option.  @xref{Preset Output Variables}, @code{ECHO_N}
-etc.@: for a means to simulate @option{-n}.
+unless both options and escape sequences are omitted.  Don't expect any
+option.  
 
 Do not use backslashes in the arguments, as there is no consensus on
 their handling.  For @samp{echo '\n' | wc -l}, the @command{sh} of
@@ -14517,6 +14537,12 @@ $foo
 EOF
 @end example
 
+New applications which are not aiming at portability should use
+@samp{printf} instead of @samp{echo}.  M4sh provides the @code{AS_ECHO}
+and @code{AS_ECHO_N} macros (the latter corresponding to @samp{echo -n}),
+which use @command{printf} if it is available, or otherwise resort to various
+creative tricks in order to work around the above problems.
+
 
 @item @command{eval}
 @c -----------------
@@ -14524,9 +14550,27 @@ EOF
 The @command{eval} command is useful in limited circumstances, e.g.,
 using commands like @samp{eval table_$key=\$value} and @samp{eval
 value=table_$key} to simulate a hash table when the key is known to be
-alphanumeric.  However, @command{eval} is tricky to use on arbitrary
-arguments, even when it is implemented correctly.
+alphanumeric. 
+
+You should also be wary of common bugs in @command{eval} implementations.
+In some shell implementations (e.g., older @command{ash}, Open@acronym{BSD} 3.8
+@command{sh}, @command{pdksh} v5.2.14 99/07/13.2, and @command{zsh}
+4.2.5), the arguments of @samp{eval} are evaluated in a context where
+@samp{$?} is 0, so they exhibit behavior like this:
+
+@example
+$ @kbd{false; eval 'echo $?'}
+0
+@end example
 
+The correct behavior here is to output a nonzero value,
+but portable scripts should not rely on this.
+
+You should not rely on @code{LINENO} within @command{eval}.
+@xref{Special Shell Variables}.
+
+Note that, even though these bugs are easily avoided,
+@command{eval} is tricky to use on arbitrary arguments.
 It is obviously unwise to use @samp{eval $cmd} if the string value of
 @samp{cmd} was derived from an untrustworthy source.  But even if the
 string value is valid, @samp{eval $cmd} might not work as intended,
@@ -14550,23 +14594,6 @@ since it mistakenly replaces the contents of @file{bar} by the
 string @samp{cat foo}.  No simple, general, and portable solution to
 this problem is known.
 
-You should also be wary of common bugs in @command{eval} implementations.
-In some shell implementations (e.g., older @command{ash}, Open@acronym{BSD} 3.8
-@command{sh}, @command{pdksh} v5.2.14 99/07/13.2, and @command{zsh}
-4.2.5), the arguments of @samp{eval} are evaluated in a context where
-@samp{$?} is 0, so they exhibit behavior like this:
-
-@example
-$ @kbd{false; eval 'echo $?'}
-0
-@end example
-
-The correct behavior here is to output a nonzero value,
-but portable scripts should not rely on this.
-
-You should not rely on @code{LINENO} within @command{eval}.
-@xref{Special Shell Variables}.
-
 @item @command{exec}
 @c -----------------
 @prindex @command{exec}
@@ -14752,6 +14779,18 @@ if cmp -s file file.new; then :; else
 fi
 @end example
 
+@noindent
+Or, especially if the @dfn{else} branch is short, you can use @code{||}.
+In M4sh, the @code{AS_IF} macro provides an easy way to write this kind
+of conditional:
+
+@example
+AS_IF([cmp -s file file.new], [], [mv file.new file])
+@end example
+
+This is especially useful in other M4 macros, where the @dfn{then} and
+@dfn{else} branches might be macro arguments.
+
 There are shells that do not reset the exit status from an @command{if}:
 
 @example
@@ -14917,8 +14956,8 @@ Not only is @command{shift}ing a bad idea when there is nothing left to
 shift, but in addition it is not portable: the shell of @acronym{MIPS
 RISC/OS} 4.52 refuses to do it.
 
-Don't use @samp{shift 2} etc.; it was not in the 7th Edition Bourne shell,
-and it is also absent in many pre-Posix shells.
+Don't use @samp{shift 2} etc.; while it is in the SVR1 shell (1983),
+it is absent from many pre-Posix shells.
 
 
 @item @command{source}
@@ -15115,23 +15154,29 @@ for @command{true}.
 @c ------------------
 @prindex @command{unset}
 In some nonconforming shells (e.g., Bash 2.05a), @code{unset FOO} fails
-when @code{FOO} is not set.  Also, Bash 2.01 mishandles @code{unset
-MAIL} in some cases and dumps core.
+when @code{FOO} is not set.  You can use
 
-A few ancient shells lack @command{unset} entirely.  Nevertheless, because
-it is extremely useful to disable embarrassing variables such as
-@code{PS1}, you can test for its existence and use
-it @emph{provided} you give a neutralizing value when @command{unset} is
-not supported:
+@smallexample
+FOO=; unset FOO
+@end smallexample
+
+if you are not sure that @code{FOO} is set.
+
+A few ancient shells lack @command{unset} entirely.  For some variables
+such as @code{PS1}, you can use a neutralizing value instead:
 
 @smallexample
-# "|| exit" suppresses any "Segmentation fault" message.
-if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then
-  unset=unset
-else
-  unset=false
-fi
-$unset PS1 || PS1='$ '
+PS1='$ '
+@end smallexample
+
+Usually, shells that do not support @command{unset} need less effort to
+make the environment sane, so for example it is not a problem if you cannot
+unset @env{CDPATH} on those shells.  However, Bash 2.01 mishandles
+@code{unset MAIL} in some cases and dumps core.  So, you should do
+something like
+
+@smallexample
+( (unset MAIL) || exit 1) >/dev/null 2>&1 && unset MAIL || :
 @end smallexample
 
 @noindent
-- 
1.5.5
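
For illustration only (again, not part of the patch), the @command{eval}
bug discussed above can be sidestepped by capturing the exit status in an
ordinary variable before calling `eval', rather than reading `$?' inside
it.  The names `status' and `result' below are hypothetical:

```shell
# Illustration only; not part of the patch.
status=0
false || status=$?      # capture the failure status outside eval
eval 'result=$status'   # does not depend on $? being preserved by eval
printf '%s\n' "$result"
```

This way the script behaves the same whether or not the shell resets
`$?' to 0 when it evaluates the arguments of `eval'.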




