printf incompatibilities with POSIX, ksh93
From: Paul Eggert
Subject: printf incompatibilities with POSIX, ksh93
Date: Fri, 26 Sep 2003 13:30:32 -0700
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.2 (gnu/linux)
Configuration Information [Automatically generated, do not change]:
Machine: sparc
OS: solaris2.8
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='sparc'
-DCONF_OSTYPE='solaris2.8' -DCONF_MACHTYPE='sparc-sun-solaris2.8'
-DCONF_VENDOR='sun' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -g
-O2 -Wall -W -Wno-sign-compare -Wpointer-arith -Wstrict-prototypes
-Wmissing-prototypes -Wmissing-noreturn -Wmissing-format-attribute
uname output: SunOS sic.twinsun.com 5.8 Generic_108528-23 sun4u sparc
SUNW,UltraSPARC-IIi-Engine
Machine Type: sparc-sun-solaris2.8
Bash Version: 2.05b
Patch Level: 0
Release Status: release
Description:
Bash's printf command has some incompatibilities with POSIX
1003.1-2001 and with ksh93. Here is the POSIX incompatibility:
$ printf '(\0007)'
(^G)
(Here, "^G" represents the character with octal code 7.)
This output doesn't conform to POSIX 1003.1-2001.
The output should be "(^@7)", where "^@" represents a null byte.
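To make the three-digit limit concrete, here is a minimal sketch in C of how a format-string octal escape could be decoded. The helper name octal_escape and its interface are hypothetical, invented for illustration; this is not code from Bash or from the patch in the "Fix" section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Bash): decode one octal escape.
   P points at the first character after the backslash.  At most three
   octal digits are consumed, which is the limit this report cites from
   POSIX for printf format strings.  The decoded byte is stored in *OUT
   and the number of digits consumed is returned (0 if there are none). */
static size_t
octal_escape (const char *p, unsigned char *out)
{
  size_t n = 0;
  unsigned int value = 0;

  assert (p != NULL && out != NULL);
  while (n < 3 && p[n] >= '0' && p[n] <= '7')
    {
      value = value * 8 + (unsigned int) (p[n] - '0');
      n++;
    }
  *out = (unsigned char) (value & 0xFF);
  return n;
}
```

Given the text "0007)", this sketch consumes "000" and yields a null byte, so the following "7" is left over as an ordinary character, which matches the "(^@7)" output described above.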
The remaining issues are not POSIX-conformance issues, but they are
incompatibilities with ksh93 and with what Standard C programmers
would expect. Here's the first one:
$ printf '(\x07e)'
(^Ge)
The C standard says that hexadecimal escapes can have any positive
number of digits, and ksh93 agrees with this, so it outputs "(~)" for
this example.
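As an illustration of the "any positive number of digits" rule, the following C sketch reads hexadecimal digits until the first non-hex character. The name hex_escape is hypothetical, and keeping only the low eight bits of an oversized value is an assumption made for this sketch, not something stated in this report.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of Bash): decode one hexadecimal escape.
   P points at the first character after "\x".  Every consecutive hex
   digit is consumed, with no upper bound on the count.  Returns the
   number of digits consumed; 0 means no digit was present, in which
   case *OUT is left untouched and the caller should report an error. */
static size_t
hex_escape (const char *p, unsigned char *out)
{
  size_t n = 0;
  unsigned int value = 0;

  assert (p != NULL && out != NULL);
  while (isxdigit ((unsigned char) p[n]))
    {
      int c = tolower ((unsigned char) p[n]);
      unsigned int digit = (c >= 'a') ? (unsigned int) (c - 'a' + 10)
                                      : (unsigned int) (c - '0');
      /* Assumption: keep only the low 8 bits, so long runs cannot overflow. */
      value = ((value << 4) | digit) & 0xFF;
      n++;
    }
  if (n > 0)
    *out = (unsigned char) value;
  return n;
}
```

For the text "07e)" the sketch consumes all three digits and produces 0x7e, which is "~" in ASCII, rather than stopping after "07".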
Here's the second ksh93 incompatibility:
$ printf '(\"\?)'
(\"\?)
Again, the C standard says that \" and \? are escapes for " and ?, and
ksh93 outputs "("?)" for this example.
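A small C sketch of a lookup for single-character escapes that includes \" and \? may help here. The function simple_escape is hypothetical and only mirrors the Standard C escape list; it does not model the %b special cases that the patch handles separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Bash): translate a single-character
   escape.  P points at the character after the backslash.  Returns 1 and
   stores the translated character in *OUT if the escape is one of the
   Standard C simple escapes, otherwise returns 0 and leaves *OUT alone. */
static int
simple_escape (const char *p, char *out)
{
  assert (p != NULL && out != NULL);
  switch (*p)
    {
    case 'a': *out = '\a'; break;
    case 'b': *out = '\b'; break;
    case 'f': *out = '\f'; break;
    case 'n': *out = '\n'; break;
    case 'r': *out = '\r'; break;
    case 't': *out = '\t'; break;
    case 'v': *out = '\v'; break;
    /* These four stand for themselves once the backslash is dropped. */
    case '\\': case '\'': case '"': case '?':
      *out = *p;
      break;
    default:
      return 0;
    }
  return 1;
}
```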
Repeat-By:
printf '(\0007)'
printf '(\x07e)'
printf '(\"\?)'
(Please see "Description" section for discussion.)
Fix:
Here is a proposed patch, which implements these changes:
* At most three octal digits are allowed in printf string octal escapes,
for compatibility with POSIX. Previously, Bash allowed four digits
if the first one was '0'.
* printf string hexadecimal escapes can now contain any positive
number of digits, for compatibility with the C standard and with
ksh93. Previously, Bash allowed at most two digits.
* New escape sequences \" and \? are now recognized in printf strings,
for compatibility with the C standard and with ksh93.
===================================================================
RCS file: builtins/printf.def,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- builtins/printf.def 2002/05/13 18:36:04 2.5.2.0
+++ builtins/printf.def 2003/09/26 20:24:50 2.5.2.1
@@ -30,7 +30,9 @@ characters, which are simply copied to s
sequences which are converted and copied to the standard output, and
format specifications, each of which causes printing of the next successive
argument. In addition to the standard printf(1) formats, %b means to
-expand backslash escape sequences in the corresponding argument, and %q
+expand backslash escape sequences in the corresponding argument (except
+that \c terminates output, backslashes in \', \", and \? are not removed,
+and octal escapes that start with \0 can have up to four digits), and %q
means to quote the argument in a way that can be reused as shell input.
$END
@@ -105,7 +107,7 @@ extern int errno;
static void printf_erange __P((char *));
static void printstr __P((char *, char *, int, int, int));
-static int tescape __P((char *, int, char *, int *));
+static int tescape __P((char *, char *, int *));
static char *bexpand __P((char *, int, int *, int *));
static char *mklong __P((char *, char *, size_t));
static int getchr __P((void));
@@ -186,9 +188,9 @@ printf_builtin (list)
if (*fmt == '\\')
{
fmt++;
- /* A NULL fourth argument to tescape means to not do special
- processing for \c. */
- fmt += tescape (fmt, 1, &nextch, (int *)NULL);
+ /* A NULL third argument to tescape causes it to bypass
+ the special processing for %b arguments. */
+ fmt += tescape (fmt, &nextch, (int *)NULL);
putchar (nextch);
fmt--; /* for loop will increment it for us again */
continue;
@@ -531,6 +533,7 @@ printstr (fmt, string, len, fieldwidth,
/* Convert STRING by expanding the escape sequences specified by the
POSIX standard for printf's `%b' format string. If SAWC is non-null,
+ do the processing appropriate for %b arguments. In particular,
recognize `\c' and use that as a string terminator. If we see \c, set
*SAWC to 1 before returning. LEN is the length of STRING. */
@@ -540,11 +543,11 @@ printstr (fmt, string, len, fieldwidth,
value. *SAWC is set to 1 if the escape sequence was \c, since that means
to short-circuit the rest of the processing. If SAWC is null, we don't
do the \c short-circuiting, and \c is treated as an unrecognized escape
- sequence. */
+ sequence; also we bypass the other processing that is needed only for
+ %b arguments. */
static int
-tescape (estart, trans_squote, cp, sawc)
+tescape (estart, cp, sawc)
char *estart;
- int trans_squote;
char *cp;
int *sawc;
{
@@ -576,14 +579,13 @@ tescape (estart, trans_squote, cp, sawc)
case 'v': *cp = '\v'; break;
- /* %b octal constants are `\0' followed by one, two, or three
- octal digits... */
- case '0':
- /* but, as an extension, the other echo-like octal escape
- sequences are supported as well. */
- case '1': case '2': case '3': case '4':
- case '5': case '6': case '7':
- for (temp = 2+(c=='0'), evalue = c - '0'; ISOCTAL (*p) && temp--; p++)
+ /* The octal escapes are \0 followed by up to 3 octal digits (if SAWC)
+ or \ followed by up to 3 octal digits (if !SAWC). As an extension,
+ we allow the latter form even if SAWC. */
+ case '0': case '1': case '2': case '3':
+ case '4': case '5': case '6': case '7':
+ evalue = OCTVALUE (c);
+ for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
evalue = (evalue * 8) + OCTVALUE (*p);
*cp = evalue & 0xFF;
break;
@@ -591,9 +593,9 @@ tescape (estart, trans_squote, cp, sawc)
/* And, as another extension, we allow \xNNN, where each N is a
hex digit. */
case 'x':
- for (temp = 2, evalue = 0; ISXDIGIT ((unsigned char)*p) && temp--; p++)
+ for (evalue = 0; ISXDIGIT ((unsigned char)*p); p++)
evalue = (evalue * 16) + HEXVALUE (*p);
- if (temp == 2)
+ if (p == estart + 1)
{
builtin_error ("missing hex digit for \\x");
*cp = '\\';
@@ -606,8 +608,9 @@ tescape (estart, trans_squote, cp, sawc)
*cp = c;
break;
- case '\'': /* TRANS_SQUOTE != 0 means \' -> ' */
- if (trans_squote)
+ /* !SAWC means \' -> ', and similarly for \" and \?. */
+ case '\'': case '"': case '?':
+ if (!sawc)
*cp = c;
else
{
@@ -657,7 +660,7 @@ bexpand (string, len, sawc, lenp)
continue;
}
temp = 0;
- s += tescape (s, 0, &c, &temp);
+ s += tescape (s, &c, &temp);
if (temp)
{
if (sawc)
===================================================================
RCS file: doc/bash.1,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- doc/bash.1 2002/07/15 19:21:03 2.5.2.0
+++ doc/bash.1 2003/09/26 20:24:50 2.5.2.1
@@ -6939,7 +6939,10 @@ format specifications, each of which cau
\fIargument\fP.
In addition to the standard \fIprintf\fP(1) formats, \fB%b\fP causes
\fBprintf\fP to expand backslash escape sequences in the corresponding
-\fIargument\fP, and \fB%q\fP causes \fBprintf\fP to output the corresponding
+\fIargument\fP (except that \fB\ec\fP terminates output, backslashes
+in \fB\e'\fP, \fB\e"\fP, and \fB\e?\fP are not removed, and octal
+escapes that start with \fB\e0\fP can have up to four digits),
+and \fB%q\fP causes \fBprintf\fP to output the corresponding
\fIargument\fP in a format that can be reused as shell input.
.sp 1
The \fIformat\fP is reused as necessary to consume all of the \fIarguments\fP.
===================================================================
RCS file: doc/bashref.texi,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- doc/bashref.texi 2002/07/15 19:21:24 2.5.2.0
+++ doc/bashref.texi 2003/09/26 20:24:50 2.5.2.1
@@ -3254,7 +3254,10 @@ format specifications, each of which cau
@var{argument}.
In addition to the standard @code{printf(1)} formats, @samp{%b} causes
@code{printf} to expand backslash escape sequences in the corresponding
-@var{argument}, and @samp{%q} causes @code{printf} to output the
+@var{argument} (except that @samp{\c} terminates output, backslashes
+in @samp{\'}, @samp{\"}, and @samp{\?} are not removed, and octal
+escapes that start with @samp{\0} can have up to four digits),
+and @samp{%q} causes @code{printf} to output the
corresponding @var{argument} in a format that can be reused as shell input.
The @var{format} is reused as necessary to consume all of the @var{arguments}.