bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

printf cleanup for POSIX and Bash compatibility


From: Paul Eggert
Subject: printf cleanup for POSIX and Bash compatibility
Date: Wed, 07 Jul 2004 00:00:23 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

I looked at the Fedora Core 2 patches for coreutils, and noticed that
they patched coreutils printf to ignore "ll" length specifiers in
formats like "%lld".  This reminded me of a long-ago task that I
didn't finish with coreutils: add support to "printf" for wider
integers, for compatibility with Bash.  I looked into all the
incompatibilities between Bash and coreutils printf (and looked at the
POSIX spec too) and came up with the following proposed patch.
This patch should remove the need for the Fedora Core 2 patch in
this area.

2004-07-06  Paul Eggert  <address@hidden>

        printf cleanup, to avoid undefined behavior, to add support for
        formats that Bash supports, and to support wide integers like
        Bash does.
        
        * NEWS: Document this.
        * src/printf.c (UNSPECIFIED): Remove.  All uses now replaced by
        booleans, so that we don't reserve any values for precision or
        width (like Bash).
        (STRTOX): Use prototype, not K&R-style definition.
        (vstrtoimax): Renamed from xstrtol (to avoid confusion with xstrtol
        in ../lib), with type change to intmax_t.
        All uses changed.
        (vstrtoumax): Renamed from xstrtoul, with type change to uintmax_t.
        All uses changed.
        (vstrtod): Renamed from xstrtod.  All uses changed.
        (print_direc): Use boolean arg instead of special value to indicate
        a missing precision or width.  LENGTH no longer includes
        length modifiers or conversion character.  New arg CONVERSION
        now specifies conversion character.
        Use intmax_t-width formatting for integers (like Bash).
        Add support for C99 %a, %A, %F (like Bash).
        Add support for field width with %c (POSIX requires this).
        Add a FIXME for lack of support for field width and precision
        for %b.
        Add support for '\'', '0' flags.
        Check for invalid combinations of flags, field width, precision,
        and conversion, to prevent use of undefined behavior.
        Allow multiple length modifiers, for formats like "%lld" (like Bash).
        Add support for C99 'j', 't', 'z' length modifiers (like Bash).
        In error message, output entire invalid conversion specification,
        instead of merely outputting % followed by the conversion char.
        * tests/misc/printf: Add tests for the above.

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.223
diff -p -u -r1.223 NEWS
--- NEWS        2 Jul 2004 16:59:41 -0000       1.223
+++ NEWS        7 Jul 2004 06:52:14 -0000
@@ -58,6 +58,19 @@ GNU coreutils NEWS                      
   POSIXLY_CORRECT is set and the first argument is not "-n", echo now
   outputs all option-like arguments instead of treating them as options.
 
+  printf has several changes:
+
+    It now uses 'intmax_t' (not 'long int') to format integers, so it
+    can now format 64-bit integers on most modern hosts.
+
+    On modern hosts it now supports the C99-inspired %a, %A, %F conversion
+    specs, the "'" and "0" flags, and the ll, j, t, and z length modifiers
+    (this is compatible with recent Bash versions).
+
+    The printf command now rejects invalid conversion specifications
+    like %#d, instead of relying on undefined behavior in the underlying
+    printf function.
+
   who now prints user names in full instead of truncating them after 8 bytes.
 
 ** New features
Index: src/printf.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/printf.c,v
retrieving revision 1.90
diff -p -u -r1.90 printf.c
--- src/printf.c        26 Jun 2004 13:09:17 -0000      1.90
+++ src/printf.c        7 Jul 2004 06:30:39 -0000
@@ -67,9 +67,6 @@
                     (c) >= 'A' && (c) <= 'F' ? (c) - 'A' + 10 : (c) - '0')
 #define octtobin(c) ((c) - '0')
 
-/* A value for field_width or precision that indicates it was not specified.  
*/
-#define UNSPECIFIED INT_MIN
-
 /* The value to return to the calling program.  */
 static int exit_status;
 
@@ -162,8 +159,7 @@ verify (const char *s, const char *end)
 
 #define STRTOX(TYPE, FUNC_NAME, LIB_FUNC_EXPR)                          \
 static TYPE                                                             \
-FUNC_NAME (s)                                                           \
-     const char *s;                                                     \
+FUNC_NAME (char const *s)                                               \
 {                                                                       \
   char *end;                                                            \
   TYPE val;                                                             \
@@ -187,9 +183,9 @@ FUNC_NAME (s)                                               
                 \
   return val;                                                           \
 }                                                                       \
 
-STRTOX (unsigned long int, xstrtoul, (strtoul (s, &end, 0)))
-STRTOX (long int,          xstrtol,  (strtol  (s, &end, 0)))
-STRTOX (double,            xstrtod,  (c_strtod (s, &end)))
+STRTOX (intmax_t,  vstrtoimax, (strtoimax (s, &end, 0)))
+STRTOX (uintmax_t, vstrtoumax, (strtoumax (s, &end, 0)))
+STRTOX (double,    vstrtod,    (c_strtod (s, &end)))
 
 /* Output a single-character \ escape.  */
 
@@ -317,97 +313,138 @@ print_esc_string (const char *str)
       putchar (*str);
 }
 
-/* Output a % directive.  START is the start of the directive,
-   LENGTH is its length, and ARGUMENT is its argument.
-   If FIELD_WIDTH or PRECISION is UNSPECIFIED, they are args for
-   '*' values in those fields. */
+/* Evaluate a printf conversion specification.  START is the start of
+   the directive, LENGTH is its length, and CONVERSION specifies the
+   type of conversion.  LENGTH does not include any length modifier or
+   the conversion specifier itself.  FIELD_WIDTH and PRECISION are the
+   field with and precision for '*' values, if HAVE_FIELD_WIDTH and
+   HAVE_PRECISION are true, respectively.  ARGUMENT is the argument to
+   be formatted.  */
 
 static void
-print_direc (const char *start, size_t length, int field_width,
-            int precision, const char *argument)
+print_direc (const char *start, size_t length, char conversion,
+            bool have_field_width, int field_width,
+            bool have_precision, int precision,
+            char const *argument)
 {
   char *p;             /* Null-terminated copy of % directive. */
 
-  p = xmalloc ((unsigned) (length + 1));
-  strncpy (p, start, length);
-  p[length] = 0;
+  /* Create a null-terminated copy of the % directive, with an
+     intmax_t-wide length modifier substituted for any existing
+     integer length modifier.  */
+  {
+    char *q;
+    size_t length_modifier_len;
+
+    switch (conversion)
+      {
+      case 'd': case 'i': case 'o': case 'u': case 'x': case 'X':
+       length_modifier_len = sizeof PRIdMAX - 2;
+       break;
+
+      default:
+       length_modifier_len = 0;
+       break;
+      }
+
+    p = xmalloc (length + length_modifier_len + 2);
+    q = mempcpy (p, start, length);
+    q = mempcpy (q, PRIdMAX, length_modifier_len);
+    *q++ = conversion;
+    *q = '\0';
+  }
 
-  switch (p[length - 1])
+  switch (conversion)
     {
     case 'd':
     case 'i':
-      if (field_width == UNSPECIFIED)
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, xstrtol (argument));
-         else
-           printf (p, precision, xstrtol (argument));
-       }
-      else
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, field_width, xstrtol (argument));
-         else
-           printf (p, field_width, precision, xstrtol (argument));
-       }
+      {
+       intmax_t arg = vstrtoimax (argument);
+       if (!have_field_width)
+         {
+           if (!have_precision)
+             printf (p, arg);
+           else
+             printf (p, precision, arg);
+         }
+       else
+         {
+           if (!have_precision)
+             printf (p, field_width, arg);
+           else
+             printf (p, field_width, precision, arg);
+         }
+      }
       break;
 
     case 'o':
     case 'u':
     case 'x':
     case 'X':
-      if (field_width == UNSPECIFIED)
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, xstrtoul (argument));
-         else
-           printf (p, precision, xstrtoul (argument));
-       }
-      else
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, field_width, xstrtoul (argument));
-         else
-           printf (p, field_width, precision, xstrtoul (argument));
-       }
+      {
+       uintmax_t arg = vstrtoumax (argument);
+       if (!have_field_width)
+         {
+           if (!have_precision)
+             printf (p, arg);
+           else
+             printf (p, precision, arg);
+         }
+       else
+         {
+           if (!have_precision)
+             printf (p, field_width, arg);
+           else
+             printf (p, field_width, precision, arg);
+         }
+      }
       break;
 
-    case 'f':
+    case 'a':
+    case 'A':
     case 'e':
     case 'E':
+    case 'f':
+    case 'F':
     case 'g':
     case 'G':
-      if (field_width == UNSPECIFIED)
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, xstrtod (argument));
-         else
-           printf (p, precision, xstrtod (argument));
-       }
-      else
-       {
-         if (precision == UNSPECIFIED)
-           printf (p, field_width, xstrtod (argument));
-         else
-           printf (p, field_width, precision, xstrtod (argument));
-       }
+      {
+       double arg = vstrtod (argument);
+       if (!have_field_width)
+         {
+           if (!have_precision)
+             printf (p, arg);
+           else
+             printf (p, precision, arg);
+         }
+       else
+         {
+           if (!have_precision)
+             printf (p, field_width, arg);
+           else
+             printf (p, field_width, precision, arg);
+         }
+      }
       break;
 
     case 'c':
-      printf (p, *argument);
+      if (!have_field_width)
+       printf (p, *argument);
+      else
+       printf (p, field_width, *argument);
       break;
 
     case 's':
-      if (field_width == UNSPECIFIED)
+      if (!have_field_width)
        {
-         if (precision == UNSPECIFIED)
+         if (!have_precision)
            printf (p, argument);
          else
            printf (p, precision, argument);
        }
       else
        {
-         if (precision == UNSPECIFIED)
+         if (!have_precision)
            printf (p, field_width, argument);
          else
            printf (p, field_width, precision, argument);
@@ -429,8 +466,11 @@ print_formatted (const char *format, int
   const char *f;               /* Pointer into `format'.  */
   const char *direc_start;     /* Start of % directive.  */
   size_t direc_length;         /* Length of % directive.  */
-  int field_width;             /* Arg to first '*', or UNSPECIFIED if none. */
-  int precision;               /* Arg to second '*', or UNSPECIFIED if none. */
+  bool have_field_width;       /* True if FIELD_WIDTH is valid.  */
+  int field_width = 0;         /* Arg to first '*'.  */
+  bool have_precision;         /* True if PRECISION is valid.  */
+  int precision = 0;           /* Arg to second '*'.  */
+  char ok[UCHAR_MAX + 1];      /* ok['x'] is true if %x is allowed.  */
 
   for (f = format; *f; ++f)
     {
@@ -439,7 +479,7 @@ print_formatted (const char *format, int
        case '%':
          direc_start = f++;
          direc_length = 1;
-         field_width = precision = UNSPECIFIED;
+         have_field_width = have_precision = false;
          if (*f == '%')
            {
              putchar ('%');
@@ -447,6 +487,8 @@ print_formatted (const char *format, int
            }
          if (*f == 'b')
            {
+             /* FIXME: Field width and precision are not supported
+                for %b, even though POSIX requires it.  */
              if (argc > 0)
                {
                  print_esc_string (*argv);
@@ -455,19 +497,42 @@ print_formatted (const char *format, int
                }
              break;
            }
-         while (*f == ' ' || *f == '#' || *f == '+' || *f == '-')
-           {
-             ++f;
-             ++direc_length;
-           }
+
+         memset (ok, 0, sizeof ok);
+         ok['a'] = ok['A'] = ok['c'] = ok['d'] = ok['e'] = ok['E'] =
+           ok['f'] = ok['F'] = ok['g'] = ok['G'] = ok['i'] = ok['o'] =
+           ok['s'] = ok['u'] = ok['x'] = ok['X'] = 1;
+
+         for (;; f++, direc_length++)
+           switch (*f)
+             {
+             case '\'':
+               ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E'] =
+                 ok['o'] = ok['s'] = ok['x'] = ok['X'] = 0;
+               break;
+             case '-': case '+': case ' ':
+               break;
+             case '#':
+               ok['c'] = ok['d'] = ok['i'] = ok['s'] = ok['u'] = 0;
+               break;
+             case '0':
+               ok['c'] = ok['s'] = 0;
+               break;
+             default:
+               goto no_more_flag_characters;
+             }
+       no_more_flag_characters:;
+
          if (*f == '*')
            {
              ++f;
              ++direc_length;
              if (argc > 0)
                {
-                 field_width = xstrtoul (*argv);
-                 if (field_width == UNSPECIFIED)
+                 intmax_t width = vstrtoimax (*argv);
+                 if (INT_MIN <= width && width <= INT_MAX)
+                   field_width = width;
+                 else
                    error (EXIT_FAILURE, 0, _("invalid field width: %s"),
                           *argv);
                  ++argv;
@@ -475,6 +540,7 @@ print_formatted (const char *format, int
                }
              else
                field_width = 0;
+             have_field_width = true;
            }
          else
            while (ISDIGIT (*f))
@@ -486,21 +552,32 @@ print_formatted (const char *format, int
            {
              ++f;
              ++direc_length;
+             ok['c'] = 0;
              if (*f == '*')
                {
                  ++f;
                  ++direc_length;
                  if (argc > 0)
                    {
-                     precision = xstrtoul (*argv);
-                     if (precision == UNSPECIFIED)
+                     intmax_t prec = vstrtoimax (*argv);
+                     if (prec < 0)
+                       {
+                         /* A negative precision is taken as if the
+                            precision were omitted, so -1 is safe
+                            here even if prec < INT_MIN.  */
+                         precision = -1;
+                       }
+                     else if (INT_MAX < prec)
                        error (EXIT_FAILURE, 0, _("invalid precision: %s"),
                               *argv);
+                     else
+                       precision = prec;
                      ++argv;
                      --argc;
                    }
                  else
                    precision = 0;
+                 have_precision = true;
                }
              else
                while (ISDIGIT (*f))
@@ -509,24 +586,23 @@ print_formatted (const char *format, int
                    ++direc_length;
                  }
            }
-         if (*f == 'l' || *f == 'L' || *f == 'h')
-           {
-             ++f;
-             ++direc_length;
-           }
-         if (! (*f && strchr ("diouxXfeEgGcs", *f)))
-           error (EXIT_FAILURE, 0, _("%%%c: invalid directive"), *f);
-         ++direc_length;
-         if (argc > 0)
-           {
-             print_direc (direc_start, direc_length, field_width,
-                          precision, *argv);
-             ++argv;
-             --argc;
-           }
-         else
-           print_direc (direc_start, direc_length, field_width,
-                        precision, "");
+
+         while (*f == 'l' || *f == 'L' || *f == 'h'
+                || *f == 'j' || *f == 't' || *f == 'z')
+           ++f;
+
+         {
+           unsigned char conversion = *f;
+           if (! ok[conversion])
+             error (EXIT_FAILURE, 0,
+                    _("%.*s: invalid conversion specification"),
+                    (int) (f + 1 - direc_start), direc_start);
+         }
+
+         print_direc (direc_start, direc_length, *f,
+                      have_field_width, field_width,
+                      have_precision, precision,
+                      (argc <= 0 ? "" : (argc--, *argv++)));
          break;
 
        case '\\':
Index: tests/misc/printf
===================================================================
RCS file: /home/meyering/coreutils/cu/tests/misc/printf,v
retrieving revision 1.7
diff -p -u -r1.7 printf
--- tests/misc/printf   15 Jun 2004 18:04:47 -0000      1.7
+++ tests/misc/printf   7 Jul 2004 03:43:57 -0000
@@ -63,6 +63,20 @@ $prog '7 \2y \02y \002y \0002y\n' |tr '\
 
 $prog '8 %b %b %b %b\n' '\1y' '\01y' '\001y' '\0001y'|tr '\1' = >> out
 
+$prog '9 %*dx\n' -2 0 >>out || fail=1
+
+$prog '10 %.*dx\n' -2147483648 0 >>out || fail=1
+
+$prog '11 %*c\n' 2 x >>out || fail=1
+
+$prog '%#d\n' 0 >>out 2> /dev/null && fail=1
+
+$prog '%0s\n' 0 >>out 2> /dev/null && fail=1
+
+$prog '%.9c\n' 0 >>out 2> /dev/null && fail=1
+
+$prog '%'\''s\n' 0 >>out 2> /dev/null && fail=1
+
 cat <<\EOF > exp
 1 x  y
 2 failed, as expected
@@ -72,6 +86,9 @@ cat <<\EOF > exp
 6 !
 7 =y =y =y *2y
 8 =y =y =y =y
+9 0 x
+10 0x
+11  x
 EOF
 
 cmp out exp || fail=1




reply via email to

[Prev in Thread] Current Thread [Next in Thread]