emacs-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#43439: closed ([PATCH] doprnt improvements)


From: GNU bug Tracking System
Subject: bug#43439: closed ([PATCH] doprnt improvements)
Date: Sat, 24 Oct 2020 21:03:01 +0000

Your message dated Sat, 24 Oct 2020 14:02:25 -0700
with message-id <3ac480a0-bd03-5068-ae9f-ccbe338245d6@cs.ucla.edu>
and subject line Re: bug#43439: [PATCH] doprnt improvements
has caused the debbugs.gnu.org bug report #43439,
regarding [PATCH] doprnt improvements
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
43439: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message --- Subject: [PATCH] doprnt improvements Date: Tue, 15 Sep 2020 18:50:51 -0700
Improve doprnt performance, internal checking, and internal
documentation.  On my platform (Ubuntu 18.04.5 x86-64), this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii.
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprint.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Omit FORMAT_END argument, since it’s always NULL,
which means doprnt must call strlen on FORMAT; doing this means
doprnt needs just one pass over FORMAT, not two.  All callers changed.
Assume C99 to make code clearer.  Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements.  Drop support for
"%S" which is never used and which would cause GCC to warn anyway.
* src/lisp.h (doprnt): Give it ATTRIBUTE_FORMAT_PRINTF (3, 0),
since GCC can grok doprnt's new API.
---
 src/doprnt.c | 223 ++++++++++++++++++++++++++-------------------------
 src/lisp.h   |   4 +-
 src/sysdep.c |   2 +-
 src/xdisp.c  |   5 +-
 4 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index b0ba12552b..f154578c0d 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and although %lld often works it is not
+   supported and code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,9 +121,29 @@
    another macro.  */
 #include "character.h"
 
-/* Generate output from a format-spec FORMAT,
-   terminated at position FORMAT_END.
-   (*FORMAT_END is not part of the format, but must exist and be readable.)
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Generate output from a format-spec FORMAT.
    Output goes in BUFFER, which has room for BUFSIZE chars.
    BUFSIZE must be positive.  If the output does not fit, truncate it
    to fit and return BUFSIZE - 1; if this truncates a multibyte
@@ -128,15 +154,11 @@
    Integers are passed as C integers.  */
 
 ptrdiff_t
-doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
-       const char *format_end, va_list ap)
+doprnt (char *buffer, ptrdiff_t bufsize, const char *format, va_list ap)
 {
   const char *fmt = format;    /* Pointer into format string.  */
   char *bufptr = buffer;       /* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +172,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];       /* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];      /* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-           ? fixed_buffer
-           : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
        {
-         ptrdiff_t size_bound = 0;
          ptrdiff_t width;  /* Columns occupied by STRING on display.  */
          enum {
            pDlen = sizeof pD - 1,
            pIlen = sizeof pI - 1,
-           pMlen = sizeof PRIdMAX - 2
+           pMlen = sizeof PRIdMAX - 2,
+           maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
          };
          enum {
            no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
          } length_modifier = no_modifier;
          static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-         int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
          int mlen;
+         char charbuf[MAX_MULTIBYTE_LENGTH + 1];       /* Used for %c.  */
 
-         /* Copy this one %-spec into fmtcpy.  */
-         string = fmtcpy;
+         /* Width and precision specified by this %-sequence.  */
+         int wid = 0, prec = -1;
+
+         /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+            Start by putting '%' into FMTSTAR.  */
+         char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+         char *string = fmtstar;
          *string++ = '%';
-         while (fmt < format_end)
+
+         /* Copy at most one instance of each flag into FMTSTAR.  */
+         bool minusflag = false, plusflag = false, zeroflag = false,
+           spaceflag = false;
+         for (;; fmt++)
            {
-             *string++ = *fmt;
-             if ('0' <= *fmt && *fmt <= '9')
+             *string = *fmt;
+             switch (*fmt)
                {
-                 /* Get an idea of how much space we might need.
-                    This might be a field width or a precision; e.g.
-                    %1.1000f and %1000.1f both might need 1000+ bytes.
-                    Parse the width or precision, checking for overflow.  */
-                 int n = *fmt - '0';
-                 bool overflow = false;
-                 while (fmt + 1 < format_end
-                        && '0' <= fmt[1] && fmt[1] <= '9')
-                   {
-                     overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-                     overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-                     *string++ = *++fmt;
-                   }
-
-                 if (overflow
-                     || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-                   error ("Format width or precision too large");
-                 if (size_bound < n)
-                   size_bound = n;
+               case '-': string += !minusflag; minusflag = true; continue;
+               case '+': string += !plusflag; plusflag = true; continue;
+               case ' ': string += !spaceflag; spaceflag = true; continue;
+               case '0': string += !zeroflag; zeroflag = true; continue;
                }
-             else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-                         || *fmt == '+'))
-               break;
-             fmt++;
+             break;
            }
 
+         /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+         if ('1' <= *fmt && *fmt <= '9')
+           fmt = parse_format_integer (fmt, &wid);
+         if (*fmt == '.')
+           fmt = parse_format_integer (fmt + 1, &prec);
+         *string++ = '*';
+         *string++ = '.';
+         *string++ = '*';
+
          /* Check for the length modifiers in textual length order, so
             that longer modifiers override shorter ones.  */
          for (mlen = 1; mlen <= maxmlen; mlen++)
            {
-             if (format_end - fmt < mlen)
-               break;
              if (mlen == 1 && *fmt == 'l')
                length_modifier = long_modifier;
-             if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+             if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
                length_modifier = pD_modifier;
-             if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+             if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
                length_modifier = pI_modifier;
-             if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+             if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
                length_modifier = pM_modifier;
            }
 
+         /* Copy optional length modifier and conversion specifier
+            character into FMTSTAR, and append a NUL.  */
          mlen = modifier_len[length_modifier];
-         memcpy (string, fmt + 1, mlen);
-         string += mlen;
+         string = mempcpy (string, fmt, mlen + 1);
          fmt += mlen;
          *string = 0;
 
-         /* Make the size bound large enough to handle floating point formats
+         /* An idea of how much space we might need.
+            This might be a field width or a precision; e.g.
+            %1.1000f and %1000.1f both might need 1000+ bytes.
+            Make it large enough to handle floating point formats
             with large numbers.  */
-         size_bound += SIZE_BOUND_EXTRA;
+         ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
          /* Make sure we have that much.  */
          if (size_bound > size_allocated)
@@ -257,48 +267,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
              sprintf_buffer = big_buffer;
              size_allocated = size_bound;
            }
-         minlen = 0;
+         int minlen = 0;
+         ptrdiff_t tem;
          switch (*fmt++)
            {
            default:
-             error ("Invalid format operation %s", fmtcpy);
+             error ("Invalid format operation %s", fmt0);
 
-/*         case 'b': */
-           case 'l':
            case 'd':
              switch (length_modifier)
                {
                case no_modifier:
                  {
                    int v = va_arg (ap, int);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case long_modifier:
                  {
                    long v = va_arg (ap, long);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case pD_modifier:
                signed_pD_modifier:
                  {
                    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case pI_modifier:
                  {
                    EMACS_INT v = va_arg (ap, EMACS_INT);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case pM_modifier:
                  {
                    intmax_t v = va_arg (ap, intmax_t);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
+               default:
+                 eassume (false);
                }
              /* Now copy into final output, truncating as necessary.  */
              string = sprintf_buffer;
@@ -311,13 +322,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
                case no_modifier:
                  {
                    unsigned v = va_arg (ap, unsigned);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case long_modifier:
                  {
                    unsigned long v = va_arg (ap, unsigned long);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case pD_modifier:
@@ -325,15 +336,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
                case pI_modifier:
                  {
                    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
                case pM_modifier:
                  {
                    uintmax_t v = va_arg (ap, uintmax_t);
-                   tem = sprintf (sprintf_buffer, fmtcpy, v);
+                   tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
                  }
                  break;
+               default:
+                 eassume (false);
                }
              /* Now copy into final output, truncating as necessary.  */
              string = sprintf_buffer;
@@ -344,22 +357,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
            case 'g':
              {
                double d = va_arg (ap, double);
-               tem = sprintf (sprintf_buffer, fmtcpy, d);
+               tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
                /* Now copy into final output, truncating as necessary.  */
                string = sprintf_buffer;
                goto doit;
              }
 
-           case 'S':
-             string[-1] = 's';
-             FALLTHROUGH;
            case 's':
-             if (fmtcpy[1] != 's')
-               minlen = atoi (&fmtcpy[1]);
+             minlen = minusflag ? -wid : wid;
              string = va_arg (ap, char *);
              tem = strnlen (string, STRING_BYTES_BOUND + 1);
              if (tem == STRING_BYTES_BOUND + 1)
-               error ("String for %%s or %%S format is too long");
+               error ("String for %%s format is too long");
              width = strwidth (string, tem);
              goto doit1;
 
@@ -432,14 +441,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
                string = charbuf;
                string[tem] = 0;
                width = strwidth (string, tem);
-               if (fmtcpy[1] != 'c')
-                 minlen = atoi (&fmtcpy[1]);
+               minlen = minusflag ? -wid : wid;
                goto doit1;
              }
 
            case '%':
              /* Treat this '%' as normal.  */
-             fmt0 = fmt - 1;
              break;
            }
        }
@@ -450,13 +457,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
        src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
        src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-       src = "'", srclen = 1;
       else
        {
-         while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-           fmt++;
-         src = fmt0, srclen = fmt - fmt0;
+         if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+           fmtchar = '\'';
+         eassert (ASCII_CHAR_P (fmtchar));
+         *bufptr++ = fmtchar;
+         continue;
        }
 
       if (bufsize < srclen)
@@ -479,8 +486,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;         /* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
@@ -495,10 +500,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char 
*format,
 ptrdiff_t
 esprintf (char *buf, char const *format, ...)
 {
-  ptrdiff_t nbytes;
   va_list ap;
   va_start (ap, format);
-  nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+  ptrdiff_t nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
   va_end (ap);
   return nbytes;
 }
@@ -534,10 +538,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
 {
   for (;;)
     {
-      ptrdiff_t nbytes;
       va_list ap_copy;
       va_copy (ap_copy, ap);
-      nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+      ptrdiff_t nbytes = doprnt (*buf, *bufsize, format, ap_copy);
       va_end (ap_copy);
       if (nbytes < *bufsize - 1)
        return nbytes;
diff --git a/src/lisp.h b/src/lisp.h
index a24898004d..957ca41702 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
 extern void syms_of_print (void);
 
 /* Defined in doprnt.c.  */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
-                        va_list);
+extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
 extern ptrdiff_t esprintf (char *, char const *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index e161172a79..790ae084d3 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, 
...)
   if (size)
     {
       va_start (ap, format);
-      nbytes = doprnt (buf, size, format, 0, ap);
+      nbytes = doprnt (buf, size, format, ap);
       va_end (ap);
     }
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 615f0ca7cf..213c2a464a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
        {
          if (m)
            {
-             ptrdiff_t len;
              ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
              USE_SAFE_ALLOCA;
              char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
-             len = doprnt (message_buf, maxsize, m, 0, ap);
-
+             ptrdiff_t len = doprnt (message_buf, maxsize, m, ap);
              message3 (make_string (message_buf, len));
              SAFE_FREE ();
            }
-- 
2.17.1




--- End Message ---
--- Begin Message --- Subject: Re: bug#43439: [PATCH] doprnt improvements Date: Sat, 24 Oct 2020 14:02:25 -0700 User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 OK, I installed the patch, followed by the attached minor cleanups. Closing the bug report.

Attachment: 0001-Rename-doprnt_nul-to-doprnt_non_null_end.patch
Description: Text Data

Attachment: 0002-Minor-doprnt-cleanup-remove-memchr-call.patch
Description: Text Data


--- End Message ---

reply via email to

[Prev in Thread] Current Thread [Next in Thread]