unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#43439: [PATCH] doprnt improvements
@ 2020-09-16  1:50 Paul Eggert
  2020-09-16 14:58 ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-09-16  1:50 UTC (permalink / raw)
  To: 43439; +Cc: Paul Eggert

Improve doprnt performance, internal checking, and internal
documentation.  On my platform (Ubuntu 18.04.5 x86-64), this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii.
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprint.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Omit FORMAT_END argument, since it’s always NULL,
which means doprnt must call strlen on FORMAT; doing this means
doprnt needs just one pass over FORMAT, not two.  All callers changed.
Assume C99 to make code clearer.  Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements.  Drop support for
"%S" which is never used and which would cause GCC to warn anyway.
* src/lisp.h (doprnt): Give it ATTRIBUTE_FORMAT_PRINTF (3, 0),
since GCC can grok doprnt's new API.
---
 src/doprnt.c | 223 ++++++++++++++++++++++++++-------------------------
 src/lisp.h   |   4 +-
 src/sysdep.c |   2 +-
 src/xdisp.c  |   5 +-
 4 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index b0ba12552b..f154578c0d 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and although %lld often works it is not
+   supported and code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,9 +121,29 @@
    another macro.  */
 #include "character.h"
 
-/* Generate output from a format-spec FORMAT,
-   terminated at position FORMAT_END.
-   (*FORMAT_END is not part of the format, but must exist and be readable.)
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Generate output from a format-spec FORMAT.
    Output goes in BUFFER, which has room for BUFSIZE chars.
    BUFSIZE must be positive.  If the output does not fit, truncate it
    to fit and return BUFSIZE - 1; if this truncates a multibyte
@@ -128,15 +154,11 @@
    Integers are passed as C integers.  */
 
 ptrdiff_t
-doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
-	const char *format_end, va_list ap)
+doprnt (char *buffer, ptrdiff_t bufsize, const char *format, va_list ap)
 {
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +172,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];	/* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-	    ? fixed_buffer
-	    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
 	{
-	  ptrdiff_t size_bound = 0;
 	  ptrdiff_t width;  /* Columns occupied by STRING on display.  */
 	  enum {
 	    pDlen = sizeof pD - 1,
 	    pIlen = sizeof pI - 1,
-	    pMlen = sizeof PRIdMAX - 2
+	    pMlen = sizeof PRIdMAX - 2,
+	    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
 	  };
 	  enum {
 	    no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
 	  } length_modifier = no_modifier;
 	  static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-	  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
 	  int mlen;
+	  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
 
-	  /* Copy this one %-spec into fmtcpy.  */
-	  string = fmtcpy;
+	  /* Width and precision specified by this %-sequence.  */
+	  int wid = 0, prec = -1;
+
+	  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+	     Start by putting '%' into FMTSTAR.  */
+	  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+	  char *string = fmtstar;
 	  *string++ = '%';
-	  while (fmt < format_end)
+
+	  /* Copy at most one instance of each flag into FMTSTAR.  */
+	  bool minusflag = false, plusflag = false, zeroflag = false,
+	    spaceflag = false;
+	  for (;; fmt++)
 	    {
-	      *string++ = *fmt;
-	      if ('0' <= *fmt && *fmt <= '9')
+	      *string = *fmt;
+	      switch (*fmt)
 		{
-		  /* Get an idea of how much space we might need.
-		     This might be a field width or a precision; e.g.
-		     %1.1000f and %1000.1f both might need 1000+ bytes.
-		     Parse the width or precision, checking for overflow.  */
-		  int n = *fmt - '0';
-		  bool overflow = false;
-		  while (fmt + 1 < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
-		    {
-		      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-		      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-		      *string++ = *++fmt;
-		    }
-
-		  if (overflow
-		      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-		    error ("Format width or precision too large");
-		  if (size_bound < n)
-		    size_bound = n;
+		case '-': string += !minusflag; minusflag = true; continue;
+		case '+': string += !plusflag; plusflag = true; continue;
+		case ' ': string += !spaceflag; spaceflag = true; continue;
+		case '0': string += !zeroflag; zeroflag = true; continue;
 		}
-	      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-			  || *fmt == '+'))
-		break;
-	      fmt++;
+	      break;
 	    }
 
+	  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+	  if ('1' <= *fmt && *fmt <= '9')
+	    fmt = parse_format_integer (fmt, &wid);
+	  if (*fmt == '.')
+	    fmt = parse_format_integer (fmt + 1, &prec);
+	  *string++ = '*';
+	  *string++ = '.';
+	  *string++ = '*';
+
 	  /* Check for the length modifiers in textual length order, so
 	     that longer modifiers override shorter ones.  */
 	  for (mlen = 1; mlen <= maxmlen; mlen++)
 	    {
-	      if (format_end - fmt < mlen)
-		break;
 	      if (mlen == 1 && *fmt == 'l')
 		length_modifier = long_modifier;
-	      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+	      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
 		length_modifier = pD_modifier;
-	      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+	      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
 		length_modifier = pI_modifier;
-	      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+	      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
 		length_modifier = pM_modifier;
 	    }
 
+	  /* Copy optional length modifier and conversion specifier
+	     character into FMTSTAR, and append a NUL.  */
 	  mlen = modifier_len[length_modifier];
-	  memcpy (string, fmt + 1, mlen);
-	  string += mlen;
+	  string = mempcpy (string, fmt, mlen + 1);
 	  fmt += mlen;
 	  *string = 0;
 
-	  /* Make the size bound large enough to handle floating point formats
+	  /* An idea of how much space we might need.
+	     This might be a field width or a precision; e.g.
+	     %1.1000f and %1000.1f both might need 1000+ bytes.
+	     Make it large enough to handle floating point formats
 	     with large numbers.  */
-	  size_bound += SIZE_BOUND_EXTRA;
+	  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
 	  /* Make sure we have that much.  */
 	  if (size_bound > size_allocated)
@@ -257,48 +267,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	      sprintf_buffer = big_buffer;
 	      size_allocated = size_bound;
 	    }
-	  minlen = 0;
+	  int minlen = 0;
+	  ptrdiff_t tem;
 	  switch (*fmt++)
 	    {
 	    default:
-	      error ("Invalid format operation %s", fmtcpy);
+	      error ("Invalid format operation %s", fmt0);
 
-/*	    case 'b': */
-	    case 'l':
 	    case 'd':
 	      switch (length_modifier)
 		{
 		case no_modifier:
 		  {
 		    int v = va_arg (ap, int);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    long v = va_arg (ap, long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
 		signed_pD_modifier:
 		  {
 		    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pI_modifier:
 		  {
 		    EMACS_INT v = va_arg (ap, EMACS_INT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    intmax_t v = va_arg (ap, intmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -311,13 +322,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case no_modifier:
 		  {
 		    unsigned v = va_arg (ap, unsigned);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    unsigned long v = va_arg (ap, unsigned long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
@@ -325,15 +336,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case pI_modifier:
 		  {
 		    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    uintmax_t v = va_arg (ap, uintmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -344,22 +357,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	    case 'g':
 	      {
 		double d = va_arg (ap, double);
-		tem = sprintf (sprintf_buffer, fmtcpy, d);
+		tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
 		/* Now copy into final output, truncating as necessary.  */
 		string = sprintf_buffer;
 		goto doit;
 	      }
 
-	    case 'S':
-	      string[-1] = 's';
-	      FALLTHROUGH;
 	    case 's':
-	      if (fmtcpy[1] != 's')
-		minlen = atoi (&fmtcpy[1]);
+	      minlen = minusflag ? -wid : wid;
 	      string = va_arg (ap, char *);
 	      tem = strnlen (string, STRING_BYTES_BOUND + 1);
 	      if (tem == STRING_BYTES_BOUND + 1)
-		error ("String for %%s or %%S format is too long");
+		error ("String for %%s format is too long");
 	      width = strwidth (string, tem);
 	      goto doit1;
 
@@ -432,14 +441,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		string = charbuf;
 		string[tem] = 0;
 		width = strwidth (string, tem);
-		if (fmtcpy[1] != 'c')
-		  minlen = atoi (&fmtcpy[1]);
+		minlen = minusflag ? -wid : wid;
 		goto doit1;
 	      }
 
 	    case '%':
 	      /* Treat this '%' as normal.  */
-	      fmt0 = fmt - 1;
 	      break;
 	    }
 	}
@@ -450,13 +457,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
 	src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-	src = "'", srclen = 1;
       else
 	{
-	  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-	    fmt++;
-	  src = fmt0, srclen = fmt - fmt0;
+	  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+	    fmtchar = '\'';
+	  eassert (ASCII_CHAR_P (fmtchar));
+	  *bufptr++ = fmtchar;
+	  continue;
 	}
 
       if (bufsize < srclen)
@@ -479,8 +486,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;		/* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
@@ -495,10 +500,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 ptrdiff_t
 esprintf (char *buf, char const *format, ...)
 {
-  ptrdiff_t nbytes;
   va_list ap;
   va_start (ap, format);
-  nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+  ptrdiff_t nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
   va_end (ap);
   return nbytes;
 }
@@ -534,10 +538,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
 {
   for (;;)
     {
-      ptrdiff_t nbytes;
       va_list ap_copy;
       va_copy (ap_copy, ap);
-      nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+      ptrdiff_t nbytes = doprnt (*buf, *bufsize, format, ap_copy);
       va_end (ap_copy);
       if (nbytes < *bufsize - 1)
 	return nbytes;
diff --git a/src/lisp.h b/src/lisp.h
index a24898004d..957ca41702 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
 extern void syms_of_print (void);
 
 /* Defined in doprnt.c.  */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
-			 va_list);
+extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
 extern ptrdiff_t esprintf (char *, char const *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index e161172a79..790ae084d3 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, ...)
   if (size)
     {
       va_start (ap, format);
-      nbytes = doprnt (buf, size, format, 0, ap);
+      nbytes = doprnt (buf, size, format, ap);
       va_end (ap);
     }
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 615f0ca7cf..213c2a464a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
 	{
 	  if (m)
 	    {
-	      ptrdiff_t len;
 	      ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
 	      USE_SAFE_ALLOCA;
 	      char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
-	      len = doprnt (message_buf, maxsize, m, 0, ap);
-
+	      ptrdiff_t len = doprnt (message_buf, maxsize, m, ap);
 	      message3 (make_string (message_buf, len));
 	      SAFE_FREE ();
 	    }
-- 
2.17.1






^ permalink raw reply related	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-09-16  1:50 bug#43439: [PATCH] doprnt improvements Paul Eggert
@ 2020-09-16 14:58 ` Eli Zaretskii
  2020-09-16 22:09   ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-09-16 14:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 15 Sep 2020 18:50:51 -0700
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> 
> (doprnt): Omit FORMAT_END argument, since it’s always NULL,
> which means doprnt must call strlen on FORMAT; doing this means
> doprnt needs just one pass over FORMAT, not two.  All callers changed.

This loses a feature.  Emacs traditionally supports strings with
embedded null characters, and this feature is in line with that.  It
is true that it is currently unused, but why is it a good idea to
remove it?

If the problem is the slight inefficiency caused by the call to
strlen, we could instead solve it in the callers: all the formats I've
seen are const strings, so the value of FORMAT_END can be computed at
compile time, and used instead of passing NULL.

> Drop support for
> "%S" which is never used and which would cause GCC to warn anyway.

This is an old compatibility feature, I'd rather not drop it.  Who
knows what code relies on the fact that 'message' and 'format-message'
support it?

Thanks.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-09-16 14:58 ` Eli Zaretskii
@ 2020-09-16 22:09   ` Paul Eggert
  2020-09-18  7:30     ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-09-16 22:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439

On 9/16/20 7:58 AM, Eli Zaretskii wrote:

> Emacs traditionally supports strings with
> embedded null characters, and this feature is in line with that.  It
> is true that it is currently unused, but why is it a good idea to
> remove it?

It's a good idea not only because the feature is currently unused and its 
support complicates and adds bugs to the code, but also because it would be a 
bad idea to ever use the feature.

The Emacs feature is for Lisp strings. Emacs does not (and for API reasons, it 
cannot practically) rely on embedded NULs in C strings. Among other things, if 
we tried to use C-style printf formats with embedded NULs, GCC's warnings about 
formats not matching their arguments would stop working. These GCC warnings are 
quite useful for preventing bugs in Emacs's C code and have helped to catch many 
such bugs, and we should not give them up.

More generally, the vestigial support for NULs and %S in doprnt's C formats 
dates back to long ago, before GCC warned about these features. It may have made 
sense back then but it does not make sense now. Any C-level formatting facility 
that supports NULs and %S should not attempt to use the longstanding printf API 
that is incompatible with such support - it should be a separate facility. And 
Emacs C code already has a non-printf facility for formatting with NULs and %S - 
e.g., Fformat and Fformat_message - so it doesn't need yet another one.

> If the problem is the slight inefficiency caused by the call to
> strlen, we could instead solve it in the callers: all the formats I've
> seen are const strings, so the value of FORMAT_END can be computed at
> compile time, and used instead of passing NULL.

This would require unnecessary complication of the code and runtime overhead. 
doprnt is called directly only in few places: esprintf, evxprintf, vmessage, and 
the (rarely-if-ever used) snprintf replacement. If we modified callers to do 
what you're suggesting, we'd need to modify the callers of these functions, and 
their callers too, all the way back to the original ancestor call that specifies 
the format. This would clutter and bloat the code and would add runtime cost to 
all the callers. For example, we'd have to change all calls to the 'error' 
function from something like this:

   if (ret < GNUTLS_E_SUCCESS)
     error ("GnuTLS AEAD cipher %s/%s initialization failed: %s",
	   gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));

to something like this:

   if (ret < GNUTLS_E_SUCCESS)
     {
       char const msg[] = "GnuTLS AEAD cipher %s/%s initialization failed: %s";
       error (msg, sizeof msg - 1,
	     gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));
     }

Of course we could invent a new macro ERROR to package this up, but such a macro 
would still be less efficient than what we have, and worse it would not always 
work, for cases like this:

   if (NILP (tem) || (XBUFFER (tem) != current_buffer))
     error (for_region ? "The mark is not set now, so there is no region"
	   : "The mark is not set now");

or like this:

       error (format, string); // in x_check_errors

And of course this could be gotten around as well, but we're now talking about a 
reasonably large amount of code surgery that will hurt code readability and 
runtime performance, all to support a low-level C feature that Emacs does not 
use and won't ever reasonably use.

Instead, we should leave most of the C code alone and just adjust 'doprnt' and 
its very few callers.

>> Drop support for
>> "%S" which is never used and which would cause GCC to warn anyway.
> 
> This is an old compatibility feature, I'd rather not drop it.  Who
> knows what code relies on the fact that 'message' and 'format-message'
> support it?

I know because I checked all the code. No Emacs C code uses %S. And none is 
likely to use it in the future because GCC warns about it if you try. (To be 
specific: GCC warns unless you use %S compatibly with its standard C meaning, 
which differs from that of Emacs doprnt - which is yet another compatibility 
minefield if we insist on keeping doprnt's unused %S feature.)





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-09-16 22:09   ` Paul Eggert
@ 2020-09-18  7:30     ` Eli Zaretskii
  2020-10-15 17:58       ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-09-18  7:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> Cc: 43439@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 16 Sep 2020 15:09:50 -0700
> 
> On 9/16/20 7:58 AM, Eli Zaretskii wrote:
> 
> > Emacs traditionally supports strings with
> > embedded null characters, and this feature is in line with that.  It
> > is true that it is currently unused, but why is it a good idea to
> > remove it?
> 
> It's a good idea not only because the feature is currently unused and its 
> support complicates and adds bugs to the code, but also because it would be a 
> bad idea to ever use the feature.
> 
> The Emacs feature is for Lisp strings. Emacs does not (and for API reasons, it 
> cannot practically) rely on embedded NULs in C strings. Among other things, if 
> we tried to use C-style printf formats with embedded NULs, GCC's warnings about 
> formats not matching their arguments would stop working. These GCC warnings are 
> quite useful for preventing bugs in Emacs's C code and have helped to catch many 
> such bugs, and we should not give them up.

How about a compromise: we modify doprint to exit when either it finds
NUL or reaches the character specified by FORMAT_END?  This will allow
us to keep some of the feature, and I think the amount of changes will
be smaller.  It should also not be much slower than what you propose.

> More generally, the vestigial support for NULs and %S in doprnt's C formats 
> dates back to long ago, before GCC warned about these features.

I understand your desire to have GCC warnings about this, but GCC is a
tool, it shouldn't dictate what features we keep and which ones we
drop.  We should do it the other way around.  doprnt callers don't
change much, so the GCC diagnostic features are not very important in
this case.

Thanks.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-09-18  7:30     ` Eli Zaretskii
@ 2020-10-15 17:58       ` Paul Eggert
  2020-10-15 18:12         ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-10-15 17:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

On 9/18/20 12:30 AM, Eli Zaretskii wrote:

> How about ... we modify doprint to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?  This will allow
> us to keep some of the feature, and I think the amount of changes will
> be smaller.  It should also not be much slower than what you propose.

Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf 
(named by analogy from esprintf) whose API is like C vsnprintf but which does 
formatting the Emacs way. We can avoid duplication of code by implementing 
doprnt in terms of evsnprintf. This fixes the performance issue with current 
Emacs, and avoids the need for evsnprintf having to check for both NULs and 
FORMAT_END etc. Updated patch attached.

[-- Attachment #2: 0001-New-function-evsnprintf-to-speed-clean-up-doprnt.patch --]
[-- Type: text/x-patch, Size: 23026 bytes --]

From c072575e7936c6e9f30864733677f19b1dcdc31a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 15 Oct 2020 10:47:10 -0700
Subject: [PATCH] New function evsnprintf to speed/clean up doprnt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The new low-level C function evsnprintf improves on doprnt’s
performance and internal checking.  On my platform, this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii (Bug#43439).
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprnt.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Implement in terms of the new evsnprintf function.
Put inside #if 0 since it’s not currently used.
(evsnprintf): New function, that takes just one pass over FORMAT,
not two.  All doprnt callers changed to use evsnprintf.
This new function has the contents of the old doprnt implementation,
except its format argument argument is a null-terminated string.
Assume C99 to make code clearer.  Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements.  Drop support for
"%S" which is not needed, is never used and which would cause GCC
to warn anyway.
---
 src/doprnt.c | 260 ++++++++++++++++++++++++++++++---------------------
 src/image.c  |   2 +-
 src/lisp.h   |   4 +-
 src/sysdep.c |   2 +-
 src/xdisp.c  |   5 +-
 5 files changed, 159 insertions(+), 114 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index ceadf3bdfa..022b9ef16b 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and %lld does not necessarily work and
+   code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations.
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,6 +121,36 @@
    another macro.  */
 #include "character.h"
 
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+#if 0
+/* The doprnt function is not currently used in Emacs.
+   It's here in case it's needed later.
+   It acts like evsnprintf, except it also supports NULs in formats.  */
+
+ptrdiff_t doprnt (char *, ptrdiff_t, char const *, char const *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
+
 /* Generate output from a format-spec FORMAT,
    terminated at position FORMAT_END.
    (*FORMAT_END is not part of the format, but must exist and be readable.)
@@ -130,13 +166,44 @@
 ptrdiff_t
 doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	const char *format_end, va_list ap)
+{
+  ptrdiff_t nbytes = 0;
+  char const *f = format;
+  for (char const *fend; (fend = memchr (f, 0, format_end - f)); f = fend + 1)
+    {
+      nbytes += evsnprintf (buffer + nbytes, bufsize - nbytes, f, ap);
+      if (nbytes == bufsize - 1)
+	return nbytes;
+      nbytes++;
+    }
+
+  USE_SAFE_ALLOCA;
+  ptrdiff_t ftaillen = format_end - f;
+  char *ftail = SAFE_ALLOCA (ftaillen + 1);
+  memcpy (ftail, f, ftaillen);
+  ftail[ftaillen] = 0;
+  nbytes += evsnprintf (buffer + nbytes, bufsize - nbytes, ftail, ap);
+  SAFE_FREE ();
+  return nbytes;
+}
+#endif
+
+/* Generate output from a format-spec FORMAT.
+   Output goes in BUFFER, which has room for BUFSIZE chars.
+   BUFSIZE must be positive.  If the output does not fit, truncate it
+   to fit and return BUFSIZE - 1; if this truncates a multibyte
+   sequence, store '\0' into the sequence's first byte.
+   Return the number of bytes stored into BUFFER, excluding
+   the terminating null byte.  Output is always null-terminated.
+   String arguments are passed as C strings.
+   Integers are passed as C integers.  */
+
+ptrdiff_t
+evsnprintf (char *buffer, ptrdiff_t bufsize, char const *format, va_list ap)
 {
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +217,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];	/* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-	    ? fixed_buffer
-	    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
 	{
-	  ptrdiff_t size_bound = 0;
 	  ptrdiff_t width;  /* Columns occupied by STRING on display.  */
 	  enum {
 	    pDlen = sizeof pD - 1,
 	    pIlen = sizeof pI - 1,
-	    pMlen = sizeof PRIdMAX - 2
+	    pMlen = sizeof PRIdMAX - 2,
+	    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
 	  };
 	  enum {
 	    no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
 	  } length_modifier = no_modifier;
 	  static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-	  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
 	  int mlen;
+	  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
 
-	  /* Copy this one %-spec into fmtcpy.  */
-	  string = fmtcpy;
+	  /* Width and precision specified by this %-sequence.  */
+	  int wid = 0, prec = -1;
+
+	  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+	     Start by putting '%' into FMTSTAR.  */
+	  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+	  char *string = fmtstar;
 	  *string++ = '%';
-	  while (fmt < format_end)
+
+	  /* Copy at most one instance of each flag into FMTSTAR.  */
+	  bool minusflag = false, plusflag = false, zeroflag = false,
+	    spaceflag = false;
+	  for (;; fmt++)
 	    {
-	      *string++ = *fmt;
-	      if ('0' <= *fmt && *fmt <= '9')
+	      *string = *fmt;
+	      switch (*fmt)
 		{
-		  /* Get an idea of how much space we might need.
-		     This might be a field width or a precision; e.g.
-		     %1.1000f and %1000.1f both might need 1000+ bytes.
-		     Parse the width or precision, checking for overflow.  */
-		  int n = *fmt - '0';
-		  bool overflow = false;
-		  while (fmt + 1 < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
-		    {
-		      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-		      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-		      *string++ = *++fmt;
-		    }
-
-		  if (overflow
-		      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-		    error ("Format width or precision too large");
-		  if (size_bound < n)
-		    size_bound = n;
+		case '-': string += !minusflag; minusflag = true; continue;
+		case '+': string += !plusflag; plusflag = true; continue;
+		case ' ': string += !spaceflag; spaceflag = true; continue;
+		case '0': string += !zeroflag; zeroflag = true; continue;
 		}
-	      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-			  || *fmt == '+'))
-		break;
-	      fmt++;
+	      break;
 	    }
 
+	  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+	  if ('1' <= *fmt && *fmt <= '9')
+	    fmt = parse_format_integer (fmt, &wid);
+	  if (*fmt == '.')
+	    fmt = parse_format_integer (fmt + 1, &prec);
+	  *string++ = '*';
+	  *string++ = '.';
+	  *string++ = '*';
+
 	  /* Check for the length modifiers in textual length order, so
 	     that longer modifiers override shorter ones.  */
 	  for (mlen = 1; mlen <= maxmlen; mlen++)
 	    {
-	      if (format_end - fmt < mlen)
-		break;
 	      if (mlen == 1 && *fmt == 'l')
 		length_modifier = long_modifier;
-	      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+	      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
 		length_modifier = pD_modifier;
-	      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+	      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
 		length_modifier = pI_modifier;
-	      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+	      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
 		length_modifier = pM_modifier;
 	    }
 
+	  /* Copy optional length modifier and conversion specifier
+	     character into FMTSTAR, and append a NUL.  */
 	  mlen = modifier_len[length_modifier];
-	  memcpy (string, fmt + 1, mlen);
-	  string += mlen;
+	  string = mempcpy (string, fmt, mlen + 1);
 	  fmt += mlen;
 	  *string = 0;
 
-	  /* Make the size bound large enough to handle floating point formats
+	  /* An idea of how much space we might need.
+	     This might be a field width or a precision; e.g.
+	     %1.1000f and %1000.1f both might need 1000+ bytes.
+	     Make it large enough to handle floating point formats
 	     with large numbers.  */
-	  size_bound += SIZE_BOUND_EXTRA;
+	  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
 	  /* Make sure we have that much.  */
 	  if (size_bound > size_allocated)
@@ -257,48 +312,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	      sprintf_buffer = big_buffer;
 	      size_allocated = size_bound;
 	    }
-	  minlen = 0;
+	  int minlen = 0;
+	  ptrdiff_t tem;
 	  switch (*fmt++)
 	    {
 	    default:
-	      error ("Invalid format operation %s", fmtcpy);
+	      error ("Invalid format operation %s", fmt0);
 
-/*	    case 'b': */
-	    case 'l':
 	    case 'd':
 	      switch (length_modifier)
 		{
 		case no_modifier:
 		  {
 		    int v = va_arg (ap, int);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    long v = va_arg (ap, long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
 		signed_pD_modifier:
 		  {
 		    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pI_modifier:
 		  {
 		    EMACS_INT v = va_arg (ap, EMACS_INT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    intmax_t v = va_arg (ap, intmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -311,13 +367,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case no_modifier:
 		  {
 		    unsigned v = va_arg (ap, unsigned);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    unsigned long v = va_arg (ap, unsigned long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
@@ -325,15 +381,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case pI_modifier:
 		  {
 		    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    uintmax_t v = va_arg (ap, uintmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -344,22 +402,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	    case 'g':
 	      {
 		double d = va_arg (ap, double);
-		tem = sprintf (sprintf_buffer, fmtcpy, d);
+		tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
 		/* Now copy into final output, truncating as necessary.  */
 		string = sprintf_buffer;
 		goto doit;
 	      }
 
-	    case 'S':
-	      string[-1] = 's';
-	      FALLTHROUGH;
 	    case 's':
-	      if (fmtcpy[1] != 's')
-		minlen = atoi (&fmtcpy[1]);
+	      minlen = minusflag ? -wid : wid;
 	      string = va_arg (ap, char *);
 	      tem = strnlen (string, STRING_BYTES_BOUND + 1);
 	      if (tem == STRING_BYTES_BOUND + 1)
-		error ("String for %%s or %%S format is too long");
+		error ("String for %%s format is too long");
 	      width = strwidth (string, tem);
 	      goto doit1;
 
@@ -432,14 +486,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		string = charbuf;
 		string[tem] = 0;
 		width = strwidth (string, tem);
-		if (fmtcpy[1] != 'c')
-		  minlen = atoi (&fmtcpy[1]);
+		minlen = minusflag ? -wid : wid;
 		goto doit1;
 	      }
 
 	    case '%':
 	      /* Treat this '%' as normal.  */
-	      fmt0 = fmt - 1;
 	      break;
 	    }
 	}
@@ -450,13 +502,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
 	src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-	src = "'", srclen = 1;
       else
 	{
-	  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-	    fmt++;
-	  src = fmt0, srclen = fmt - fmt0;
+	  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+	    fmtchar = '\'';
+	  eassert (ASCII_CHAR_P (fmtchar));
+	  *bufptr++ = fmtchar;
+	  continue;
 	}
 
       if (bufsize < srclen)
@@ -479,15 +531,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;		/* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
 /* Format to an unbounded buffer BUF.  This is like sprintf, except it
    is not limited to returning an 'int' so it doesn't have a silly 2
    GiB limit on typical 64-bit hosts.  However, it is limited to the
-   Emacs-style formats that doprnt supports, and it requotes ` and '
+   Emacs-style formats that evsnprintf supports, and it requotes ` and '
    as per ‘text-quoting-style’.
 
    Return the number of bytes put into BUF, excluding the terminating
@@ -495,10 +545,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 ptrdiff_t
 esprintf (char *buf, char const *format, ...)
 {
-  ptrdiff_t nbytes;
   va_list ap;
   va_start (ap, format);
-  nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+  ptrdiff_t nbytes = evsnprintf (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
   va_end (ap);
   return nbytes;
 }
@@ -534,10 +583,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
 {
   for (;;)
     {
-      ptrdiff_t nbytes;
       va_list ap_copy;
       va_copy (ap_copy, ap);
-      nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+      ptrdiff_t nbytes = evsnprintf (*buf, *bufsize, format, ap_copy);
       va_end (ap_copy);
       if (nbytes < *bufsize - 1)
 	return nbytes;
diff --git a/src/image.c b/src/image.c
index 25d5af8a8d..8a030c154a 100644
--- a/src/image.c
+++ b/src/image.c
@@ -7826,7 +7826,7 @@ tiff_handler (const char *, const char *, const char *, va_list)
 tiff_handler (const char *log_format, const char *title,
 	      const char *format, va_list ap)
 {
-  /* doprnt is not suitable here, as TIFF handlers are called from
+  /* evsnprintf is not suitable here, as TIFF handlers are called from
      libtiff and are passed arbitrary printf directives.  Instead, use
      vsnprintf, taking care to be portable to nonstandard environments
      where vsnprintf returns -1 on buffer overflow.  Since it's just a
diff --git a/src/lisp.h b/src/lisp.h
index 45353fbef3..e4de9c8e9c 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
 extern void syms_of_print (void);
 
 /* Defined in doprnt.c.  */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
-			 va_list);
+extern ptrdiff_t evsnprintf (char *, ptrdiff_t, const char *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
 extern ptrdiff_t esprintf (char *, char const *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index f6c0ddee01..da176634a5 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, ...)
   if (size)
     {
       va_start (ap, format);
-      nbytes = doprnt (buf, size, format, 0, ap);
+      nbytes = evsnprintf (buf, size, format, ap);
       va_end (ap);
     }
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 5a62cd6eb5..2f1fe74e1d 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
 	{
 	  if (m)
 	    {
-	      ptrdiff_t len;
 	      ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
 	      USE_SAFE_ALLOCA;
 	      char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
-	      len = doprnt (message_buf, maxsize, m, 0, ap);
-
+	      ptrdiff_t len = evsnprintf (message_buf, maxsize, m, ap);
 	      message3 (make_string (message_buf, len));
 	      SAFE_FREE ();
 	    }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-15 17:58       ` Paul Eggert
@ 2020-10-15 18:12         ` Eli Zaretskii
  2020-10-15 18:50           ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-15 18:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> Cc: 43439@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 15 Oct 2020 10:58:49 -0700
> 
> On 9/18/20 12:30 AM, Eli Zaretskii wrote:
> 
> > How about ... we modify doprint to exit when either it finds
> > NUL or reaches the character specified by FORMAT_END?  This will allow
> > us to keep some of the feature, and I think the amount of changes will
> > be smaller.  It should also not be much slower than what you propose.
> 
> Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf 
> (named by analogy from esprintf) whose API is like C vsnprintf but which does 
> formatting the Emacs way.

No, let's not, please.  I didn't agree to modifying doprnt in
significant ways, so you are now suggesting to do an even more radical
modification, just under another name?  This is moving away of a
potential compromise point, not towards it.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-15 18:12         ` Eli Zaretskii
@ 2020-10-15 18:50           ` Paul Eggert
  2020-10-15 19:05             ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-10-15 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439

On 10/15/20 11:12 AM, Eli Zaretskii wrote:
> I didn't agree to modifying doprnt in
> significant ways, so you are now suggesting to do an even more radical
> modification, just under another name?

If you'd rather have the patch keep doprnt entirely as-is (i.e., not change 
doprnt's implementation at all), I can easily modify the patch to do that.

All current Emacs code that calls doprnt would benefit from switching to the 
proposed evsnprintf function, an API that is simpler and faster and that has 
better static checking with GCC.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-15 18:50           ` Paul Eggert
@ 2020-10-15 19:05             ` Eli Zaretskii
  2020-10-15 20:06               ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-15 19:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> Cc: 43439@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 15 Oct 2020 11:50:48 -0700
> 
> On 10/15/20 11:12 AM, Eli Zaretskii wrote:
> > I didn't agree to modifying doprnt in
> > significant ways, so you are now suggesting to do an even more radical
> > modification, just under another name?
> 
> If you'd rather have the patch keep doprnt entirely as-is (i.e., not change 
> doprnt's implementation at all), I can easily modify the patch to do that.

No, I want to continue calling doprnt directly, not replace all its
calls with a call to another function.  doprnt by itself is useless
unless it is used by the relevant primitives.  I see no need to
replace it with another function, because doprnt works and works well.

> All current Emacs code that calls doprnt would benefit from switching to the 
> proposed evsnprintf function, an API that is simpler and faster and that has 
> better static checking with GCC.

Yes, that's exactly where we disagree.  I made my proposal to find
some kind of middle ground, and was disappointed to see you suggesting
to move even farther from a potential agreement.

In general, I'm against messing with code that has been stable for
ages, for ephemeral benefits or minor stylistic reasons.  If nothing
else, it gets in the way of maintaining Emacs because code I've known
for years and could find with my eyes closed constantly shifts and
changes under my feet.  Another example of this is that src/lisp.h
macros look nowadays completely different from what they were several
years ago.  This need to constantly unlearn that which was burned into
my muscle memory is not pleasant at all.  Changes that take us forward
because they are needed for new and improved features are welcome and
justified, but there are no new features in all those changes,
including in the doprnt patch.  I wish this fever would stop.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-15 19:05             ` Eli Zaretskii
@ 2020-10-15 20:06               ` Paul Eggert
  2020-10-17 18:32                 ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-10-15 20:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

On 10/15/20 12:05 PM, Eli Zaretskii wrote:
> I want to continue calling doprnt directly, not replace all its
> calls with a call to another function.

OK, then attached is a patch that does things that way. This patch affects only 
the implementation of doprnt; no doprnt callers are affected. The code should 
have most of the proposed performance benefits of the earlier patches I proposed.

[-- Attachment #2: 0001-Improve-doprnt-performance.patch --]
[-- Type: text/x-patch, Size: 18397 bytes --]

From 9c33be96d1a74b2910753117542358c3d7cc4cf0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 15 Oct 2020 12:09:55 -0700
Subject: [PATCH] Improve doprnt performance

This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii (Bug#43439).
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprnt.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt_nul): New function.
(doprnt): Use it.  In the typical case where FORMAT_END is null, take
just one pass over FORMAT, not two.  Assume C99 to make code clearer.
Do not use malloc or alloca to allocate a copy of the format FMTCPY;
instead, use a small fixed-size array FMTSTAR, and use '*' in that
array to represent width and precision, passing them as separate int
arguments.  Use eassume to pacify GCC in switch statements.  Drop
support for "%S" which is not needed, is never used and which would
cause GCC to warn anyway.
---
 src/doprnt.c | 241 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 140 insertions(+), 101 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index ceadf3bdfa..f1dbf0cb2c 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and %lld does not necessarily work and
+   code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations.
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,6 +121,56 @@
    another macro.  */
 #include "character.h"
 
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Like doprnt, except FORMAT_END must be non-null.
+   Although this function is never exercised in current Emacs,
+   it is retained in case some future Emacs version contains doprnt
+   callers with a nonnull FORMAT_END.  */
+static ptrdiff_t
+doprnt_nul (char *buffer, ptrdiff_t bufsize, char const *format,
+	    char const *format_end, va_list ap)
+{
+  ptrdiff_t nbytes = 0;
+  char const *f = format;
+  for (char const *fend; (fend = memchr (f, 0, format_end - f)); f = fend + 1)
+    {
+      nbytes += doprnt (buffer + nbytes, bufsize - nbytes, f, NULL, ap);
+      if (nbytes == bufsize - 1)
+	return nbytes;
+      nbytes++;
+    }
+
+  USE_SAFE_ALLOCA;
+  ptrdiff_t ftaillen = format_end - f;
+  char *ftail = SAFE_ALLOCA (ftaillen + 1);
+  memcpy (ftail, f, ftaillen);
+  ftail[ftaillen] = 0;
+  nbytes += doprnt (buffer + nbytes, bufsize - nbytes, ftail, NULL, ap);
+  SAFE_FREE ();
+  return nbytes;
+}
+
 /* Generate output from a format-spec FORMAT,
    terminated at position FORMAT_END.
    (*FORMAT_END is not part of the format, but must exist and be readable.)
@@ -131,12 +187,12 @@
 doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	const char *format_end, va_list ap)
 {
+  if (format_end)
+    return doprnt_nul (buffer, bufsize, format, format_end, ap);
+
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +206,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];	/* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-	    ? fixed_buffer
-	    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
 	{
-	  ptrdiff_t size_bound = 0;
 	  ptrdiff_t width;  /* Columns occupied by STRING on display.  */
 	  enum {
 	    pDlen = sizeof pD - 1,
 	    pIlen = sizeof pI - 1,
-	    pMlen = sizeof PRIdMAX - 2
+	    pMlen = sizeof PRIdMAX - 2,
+	    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
 	  };
 	  enum {
 	    no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
 	  } length_modifier = no_modifier;
 	  static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-	  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
 	  int mlen;
+	  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
 
-	  /* Copy this one %-spec into fmtcpy.  */
-	  string = fmtcpy;
+	  /* Width and precision specified by this %-sequence.  */
+	  int wid = 0, prec = -1;
+
+	  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+	     Start by putting '%' into FMTSTAR.  */
+	  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+	  char *string = fmtstar;
 	  *string++ = '%';
-	  while (fmt < format_end)
+
+	  /* Copy at most one instance of each flag into FMTSTAR.  */
+	  bool minusflag = false, plusflag = false, zeroflag = false,
+	    spaceflag = false;
+	  for (;; fmt++)
 	    {
-	      *string++ = *fmt;
-	      if ('0' <= *fmt && *fmt <= '9')
+	      *string = *fmt;
+	      switch (*fmt)
 		{
-		  /* Get an idea of how much space we might need.
-		     This might be a field width or a precision; e.g.
-		     %1.1000f and %1000.1f both might need 1000+ bytes.
-		     Parse the width or precision, checking for overflow.  */
-		  int n = *fmt - '0';
-		  bool overflow = false;
-		  while (fmt + 1 < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
-		    {
-		      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-		      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-		      *string++ = *++fmt;
-		    }
-
-		  if (overflow
-		      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-		    error ("Format width or precision too large");
-		  if (size_bound < n)
-		    size_bound = n;
+		case '-': string += !minusflag; minusflag = true; continue;
+		case '+': string += !plusflag; plusflag = true; continue;
+		case ' ': string += !spaceflag; spaceflag = true; continue;
+		case '0': string += !zeroflag; zeroflag = true; continue;
 		}
-	      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-			  || *fmt == '+'))
-		break;
-	      fmt++;
+	      break;
 	    }
 
+	  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+	  if ('1' <= *fmt && *fmt <= '9')
+	    fmt = parse_format_integer (fmt, &wid);
+	  if (*fmt == '.')
+	    fmt = parse_format_integer (fmt + 1, &prec);
+	  *string++ = '*';
+	  *string++ = '.';
+	  *string++ = '*';
+
 	  /* Check for the length modifiers in textual length order, so
 	     that longer modifiers override shorter ones.  */
 	  for (mlen = 1; mlen <= maxmlen; mlen++)
 	    {
-	      if (format_end - fmt < mlen)
-		break;
 	      if (mlen == 1 && *fmt == 'l')
 		length_modifier = long_modifier;
-	      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+	      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
 		length_modifier = pD_modifier;
-	      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+	      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
 		length_modifier = pI_modifier;
-	      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+	      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
 		length_modifier = pM_modifier;
 	    }
 
+	  /* Copy optional length modifier and conversion specifier
+	     character into FMTSTAR, and append a NUL.  */
 	  mlen = modifier_len[length_modifier];
-	  memcpy (string, fmt + 1, mlen);
-	  string += mlen;
+	  string = mempcpy (string, fmt, mlen + 1);
 	  fmt += mlen;
 	  *string = 0;
 
-	  /* Make the size bound large enough to handle floating point formats
+	  /* An idea of how much space we might need.
+	     This might be a field width or a precision; e.g.
+	     %1.1000f and %1000.1f both might need 1000+ bytes.
+	     Make it large enough to handle floating point formats
 	     with large numbers.  */
-	  size_bound += SIZE_BOUND_EXTRA;
+	  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
 	  /* Make sure we have that much.  */
 	  if (size_bound > size_allocated)
@@ -257,48 +301,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	      sprintf_buffer = big_buffer;
 	      size_allocated = size_bound;
 	    }
-	  minlen = 0;
+	  int minlen = 0;
+	  ptrdiff_t tem;
 	  switch (*fmt++)
 	    {
 	    default:
-	      error ("Invalid format operation %s", fmtcpy);
+	      error ("Invalid format operation %s", fmt0);
 
-/*	    case 'b': */
-	    case 'l':
 	    case 'd':
 	      switch (length_modifier)
 		{
 		case no_modifier:
 		  {
 		    int v = va_arg (ap, int);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    long v = va_arg (ap, long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
 		signed_pD_modifier:
 		  {
 		    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pI_modifier:
 		  {
 		    EMACS_INT v = va_arg (ap, EMACS_INT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    intmax_t v = va_arg (ap, intmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -311,13 +356,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case no_modifier:
 		  {
 		    unsigned v = va_arg (ap, unsigned);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    unsigned long v = va_arg (ap, unsigned long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
@@ -325,15 +370,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case pI_modifier:
 		  {
 		    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    uintmax_t v = va_arg (ap, uintmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -344,22 +391,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	    case 'g':
 	      {
 		double d = va_arg (ap, double);
-		tem = sprintf (sprintf_buffer, fmtcpy, d);
+		tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
 		/* Now copy into final output, truncating as necessary.  */
 		string = sprintf_buffer;
 		goto doit;
 	      }
 
-	    case 'S':
-	      string[-1] = 's';
-	      FALLTHROUGH;
 	    case 's':
-	      if (fmtcpy[1] != 's')
-		minlen = atoi (&fmtcpy[1]);
+	      minlen = minusflag ? -wid : wid;
 	      string = va_arg (ap, char *);
 	      tem = strnlen (string, STRING_BYTES_BOUND + 1);
 	      if (tem == STRING_BYTES_BOUND + 1)
-		error ("String for %%s or %%S format is too long");
+		error ("String for %%s format is too long");
 	      width = strwidth (string, tem);
 	      goto doit1;
 
@@ -432,14 +475,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		string = charbuf;
 		string[tem] = 0;
 		width = strwidth (string, tem);
-		if (fmtcpy[1] != 'c')
-		  minlen = atoi (&fmtcpy[1]);
+		minlen = minusflag ? -wid : wid;
 		goto doit1;
 	      }
 
 	    case '%':
 	      /* Treat this '%' as normal.  */
-	      fmt0 = fmt - 1;
 	      break;
 	    }
 	}
@@ -450,13 +491,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
 	src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-	src = "'", srclen = 1;
       else
 	{
-	  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-	    fmt++;
-	  src = fmt0, srclen = fmt - fmt0;
+	  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+	    fmtchar = '\'';
+	  eassert (ASCII_CHAR_P (fmtchar));
+	  *bufptr++ = fmtchar;
+	  continue;
 	}
 
       if (bufsize < srclen)
@@ -479,8 +520,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;		/* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-15 20:06               ` Paul Eggert
@ 2020-10-17 18:32                 ` Eli Zaretskii
  2020-10-18  2:24                   ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-17 18:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> Cc: 43439@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 15 Oct 2020 13:06:26 -0700
> 
> On 10/15/20 12:05 PM, Eli Zaretskii wrote:
> > I want to continue calling doprnt directly, not replace all its
> > calls with a call to another function.
> 
> OK, then attached is a patch that does things that way. This patch affects only 
> the implementation of doprnt; no doprnt callers are affected. The code should 
> have most of the proposed performance benefits of the earlier patches I proposed.

I'm sorry, but this is still nowhere near the compromise I suggested.
And it loses %S, something I didn't agree to.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-17 18:32                 ` Eli Zaretskii
@ 2020-10-18  2:24                   ` Paul Eggert
  2020-10-24 10:39                     ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Eggert @ 2020-10-18  2:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439

[-- Attachment #1: Type: text/plain, Size: 520 bytes --]

On 10/17/20 11:32 AM, Eli Zaretskii wrote:
> this is still nowhere near the compromise I suggested.
> And it loses %S, something I didn't agree to.

Attached is a revised patch that implements what you suggested, and it does not 
lose %S.

Here is what you suggested in 
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439#14>:

> How about a compromise: we modify doprint to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?

and the attached patch implements this proposed API change.

[-- Attachment #2: 0001-Improve-doprnt-performance.patch --]
[-- Type: text/x-patch, Size: 17945 bytes --]

From 43ecff27e57eb4ca6e340827315272c3159fcee7 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 17 Oct 2020 18:35:28 -0700
Subject: [PATCH] Improve doprnt performance

This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii (Bug#43439).
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprnt.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt_nul): New function.
(doprnt): Use it.  Change doprnt API to exit when either it finds NUL
or reaches the character specified by FORMAT_END.  In the typical case
where FORMAT_END is null, take just one pass over FORMAT, not two.
Assume C99 to make code clearer.  Do not use malloc or alloca to
allocate a copy of the format FMTCPY; instead, use a small fixed-size
array FMTSTAR, and use '*' in that array to represent width and
precision, passing them as separate int arguments.  Use eassume to
pacify GCC in switch statements.
---
 src/doprnt.c | 230 +++++++++++++++++++++++++++++----------------------
 1 file changed, 132 insertions(+), 98 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index ceadf3bdfa..07c4d8d797 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,12 +43,14 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and %lld does not necessarily work and
+   code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
@@ -61,19 +64,21 @@
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations.
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +104,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,7 +122,50 @@
    another macro.  */
 #include "character.h"
 
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Like doprnt, except FORMAT must not contain NUL bytes and
+   FORMAT_END must be non-null.  Although this function is never
+   exercised in current Emacs, it is retained in case some future
+   Emacs version contains doprnt callers that need such formats.
+   Having a separate function helps GCC optimize doprnt better.  */
+static ptrdiff_t
+doprnt_nul (char *buffer, ptrdiff_t bufsize, char const *format,
+	    char const *format_end, va_list ap)
+{
+  USE_SAFE_ALLOCA;
+  ptrdiff_t fmtlen = format_end - format;
+  char *fmt = SAFE_ALLOCA (fmtlen + 1);
+  memcpy (fmt, format, fmtlen);
+  fmt[fmtlen] = 0;
+  ptrdiff_t nbytes = doprnt (buffer, bufsize, fmt, NULL, ap);
+  SAFE_FREE ();
+  return nbytes;
+}
+
 /* Generate output from a format-spec FORMAT,
+   terminated at either the first NUL or (if FORMAT_END is non-null
+   and there are no NUL bytes between FORMAT and FORMAT_END)
    terminated at position FORMAT_END.
    (*FORMAT_END is not part of the format, but must exist and be readable.)
    Output goes in BUFFER, which has room for BUFSIZE chars.
@@ -131,12 +181,12 @@
 doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	const char *format_end, va_list ap)
 {
+  if (format_end && !memchr (format, 0, format_end - format))
+    return doprnt_nul (buffer, bufsize, format, format_end, ap);
+
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +200,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20];	/* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-	    ? fixed_buffer
-	    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
 	{
-	  ptrdiff_t size_bound = 0;
 	  ptrdiff_t width;  /* Columns occupied by STRING on display.  */
 	  enum {
 	    pDlen = sizeof pD - 1,
 	    pIlen = sizeof pI - 1,
-	    pMlen = sizeof PRIdMAX - 2
+	    pMlen = sizeof PRIdMAX - 2,
+	    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
 	  };
 	  enum {
 	    no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
 	  } length_modifier = no_modifier;
 	  static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-	  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
 	  int mlen;
+	  char charbuf[MAX_MULTIBYTE_LENGTH + 1];	/* Used for %c.  */
 
-	  /* Copy this one %-spec into fmtcpy.  */
-	  string = fmtcpy;
+	  /* Width and precision specified by this %-sequence.  */
+	  int wid = 0, prec = -1;
+
+	  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+	     Start by putting '%' into FMTSTAR.  */
+	  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+	  char *string = fmtstar;
 	  *string++ = '%';
-	  while (fmt < format_end)
+
+	  /* Copy at most one instance of each flag into FMTSTAR.  */
+	  bool minusflag = false, plusflag = false, zeroflag = false,
+	    spaceflag = false;
+	  for (;; fmt++)
 	    {
-	      *string++ = *fmt;
-	      if ('0' <= *fmt && *fmt <= '9')
+	      *string = *fmt;
+	      switch (*fmt)
 		{
-		  /* Get an idea of how much space we might need.
-		     This might be a field width or a precision; e.g.
-		     %1.1000f and %1000.1f both might need 1000+ bytes.
-		     Parse the width or precision, checking for overflow.  */
-		  int n = *fmt - '0';
-		  bool overflow = false;
-		  while (fmt + 1 < format_end
-			 && '0' <= fmt[1] && fmt[1] <= '9')
-		    {
-		      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-		      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-		      *string++ = *++fmt;
-		    }
-
-		  if (overflow
-		      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-		    error ("Format width or precision too large");
-		  if (size_bound < n)
-		    size_bound = n;
+		case '-': string += !minusflag; minusflag = true; continue;
+		case '+': string += !plusflag; plusflag = true; continue;
+		case ' ': string += !spaceflag; spaceflag = true; continue;
+		case '0': string += !zeroflag; zeroflag = true; continue;
 		}
-	      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-			  || *fmt == '+'))
-		break;
-	      fmt++;
+	      break;
 	    }
 
+	  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+	  if ('1' <= *fmt && *fmt <= '9')
+	    fmt = parse_format_integer (fmt, &wid);
+	  if (*fmt == '.')
+	    fmt = parse_format_integer (fmt + 1, &prec);
+	  *string++ = '*';
+	  *string++ = '.';
+	  *string++ = '*';
+
 	  /* Check for the length modifiers in textual length order, so
 	     that longer modifiers override shorter ones.  */
 	  for (mlen = 1; mlen <= maxmlen; mlen++)
 	    {
-	      if (format_end - fmt < mlen)
-		break;
 	      if (mlen == 1 && *fmt == 'l')
 		length_modifier = long_modifier;
-	      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+	      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
 		length_modifier = pD_modifier;
-	      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+	      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
 		length_modifier = pI_modifier;
-	      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+	      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
 		length_modifier = pM_modifier;
 	    }
 
+	  /* Copy optional length modifier and conversion specifier
+	     character into FMTSTAR, and append a NUL.  */
 	  mlen = modifier_len[length_modifier];
-	  memcpy (string, fmt + 1, mlen);
-	  string += mlen;
+	  string = mempcpy (string, fmt, mlen + 1);
 	  fmt += mlen;
 	  *string = 0;
 
-	  /* Make the size bound large enough to handle floating point formats
+	  /* An idea of how much space we might need.
+	     This might be a field width or a precision; e.g.
+	     %1.1000f and %1000.1f both might need 1000+ bytes.
+	     Make it large enough to handle floating point formats
 	     with large numbers.  */
-	  size_bound += SIZE_BOUND_EXTRA;
+	  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
 	  /* Make sure we have that much.  */
 	  if (size_bound > size_allocated)
@@ -257,48 +295,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	      sprintf_buffer = big_buffer;
 	      size_allocated = size_bound;
 	    }
-	  minlen = 0;
+	  int minlen = 0;
+	  ptrdiff_t tem;
 	  switch (*fmt++)
 	    {
 	    default:
-	      error ("Invalid format operation %s", fmtcpy);
+	      error ("Invalid format operation %s", fmt0);
 
-/*	    case 'b': */
-	    case 'l':
 	    case 'd':
 	      switch (length_modifier)
 		{
 		case no_modifier:
 		  {
 		    int v = va_arg (ap, int);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    long v = va_arg (ap, long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
 		signed_pD_modifier:
 		  {
 		    ptrdiff_t v = va_arg (ap, ptrdiff_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pI_modifier:
 		  {
 		    EMACS_INT v = va_arg (ap, EMACS_INT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    intmax_t v = va_arg (ap, intmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -311,13 +350,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case no_modifier:
 		  {
 		    unsigned v = va_arg (ap, unsigned);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case long_modifier:
 		  {
 		    unsigned long v = va_arg (ap, unsigned long);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pD_modifier:
@@ -325,15 +364,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		case pI_modifier:
 		  {
 		    EMACS_UINT v = va_arg (ap, EMACS_UINT);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
 		case pM_modifier:
 		  {
 		    uintmax_t v = va_arg (ap, uintmax_t);
-		    tem = sprintf (sprintf_buffer, fmtcpy, v);
+		    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
 		  }
 		  break;
+		default:
+		  eassume (false);
 		}
 	      /* Now copy into final output, truncating as necessary.  */
 	      string = sprintf_buffer;
@@ -344,18 +385,15 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	    case 'g':
 	      {
 		double d = va_arg (ap, double);
-		tem = sprintf (sprintf_buffer, fmtcpy, d);
+		tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
 		/* Now copy into final output, truncating as necessary.  */
 		string = sprintf_buffer;
 		goto doit;
 	      }
 
 	    case 'S':
-	      string[-1] = 's';
-	      FALLTHROUGH;
 	    case 's':
-	      if (fmtcpy[1] != 's')
-		minlen = atoi (&fmtcpy[1]);
+	      minlen = minusflag ? -wid : wid;
 	      string = va_arg (ap, char *);
 	      tem = strnlen (string, STRING_BYTES_BOUND + 1);
 	      if (tem == STRING_BYTES_BOUND + 1)
@@ -432,14 +470,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 		string = charbuf;
 		string[tem] = 0;
 		width = strwidth (string, tem);
-		if (fmtcpy[1] != 'c')
-		  minlen = atoi (&fmtcpy[1]);
+		minlen = minusflag ? -wid : wid;
 		goto doit1;
 	      }
 
 	    case '%':
 	      /* Treat this '%' as normal.  */
-	      fmt0 = fmt - 1;
 	      break;
 	    }
 	}
@@ -450,13 +486,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
 	src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
-	src = "'", srclen = 1;
       else
 	{
-	  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-	    fmt++;
-	  src = fmt0, srclen = fmt - fmt0;
+	  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+	    fmtchar = '\'';
+	  eassert (ASCII_CHAR_P (fmtchar));
+	  *bufptr++ = fmtchar;
+	  continue;
 	}
 
       if (bufsize < srclen)
@@ -479,8 +515,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0;		/* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-18  2:24                   ` Paul Eggert
@ 2020-10-24 10:39                     ` Eli Zaretskii
  2020-10-24 21:02                       ` Paul Eggert
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-24 10:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 43439

> Cc: 43439@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 17 Oct 2020 19:24:04 -0700
> 
> On 10/17/20 11:32 AM, Eli Zaretskii wrote:
> > this is still nowhere near the compromise I suggested.
> > And it loses %S, something I didn't agree to.
> 
> Attached is a revised patch that implements what you suggested, and it does not 
> lose %S.
> 
> Here is what you suggested in 
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439#14>:
> 
> > How about a compromise: we modify doprint to exit when either it finds
> > NUL or reaches the character specified by FORMAT_END?
> 
> and the attached patch implements this proposed API change.

Thanks, I'm okay with installing this on master.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* bug#43439: [PATCH] doprnt improvements
  2020-10-24 10:39                     ` Eli Zaretskii
@ 2020-10-24 21:02                       ` Paul Eggert
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Eggert @ 2020-10-24 21:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43439-done

[-- Attachment #1: Type: text/plain, Size: 93 bytes --]

OK, I installed the patch, followed by the attached minor cleanups. Closing the 
bug report.

[-- Attachment #2: 0001-Rename-doprnt_nul-to-doprnt_non_null_end.patch --]
[-- Type: text/x-patch, Size: 1534 bytes --]

From 28d2931b4bc934d06f449c01e067258d76a16738 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 24 Oct 2020 13:46:46 -0700
Subject: [PATCH 1/2] Rename doprnt_nul to doprnt_non_null_end

* src/doprnt.c (doprnt_non_null_end): Rename from doprnt_nul,
as the old name was misleading (left over from a previous proposal).
Caller changed.
---
 src/doprnt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index 07c4d8d797..be256f4497 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -150,8 +150,8 @@ parse_format_integer (char const *fmt, int *value)
    Emacs version contains doprnt callers that need such formats.
    Having a separate function helps GCC optimize doprnt better.  */
 static ptrdiff_t
-doprnt_nul (char *buffer, ptrdiff_t bufsize, char const *format,
-	    char const *format_end, va_list ap)
+doprnt_non_null_end (char *buffer, ptrdiff_t bufsize, char const *format,
+		     char const *format_end, va_list ap)
 {
   USE_SAFE_ALLOCA;
   ptrdiff_t fmtlen = format_end - format;
@@ -182,7 +182,7 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	const char *format_end, va_list ap)
 {
   if (format_end && !memchr (format, 0, format_end - format))
-    return doprnt_nul (buffer, bufsize, format, format_end, ap);
+    return doprnt_non_null_end (buffer, bufsize, format, format_end, ap);
 
   const char *fmt = format;	/* Pointer into format string.  */
   char *bufptr = buffer;	/* Pointer into output buffer.  */
-- 
2.25.1


[-- Attachment #3: 0002-Minor-doprnt-cleanup-remove-memchr-call.patch --]
[-- Type: text/x-patch, Size: 1640 bytes --]

From 32e427cca112f5471356c1fa95ba1ed256d200b6 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 24 Oct 2020 13:50:29 -0700
Subject: [PATCH 2/2] Minor doprnt cleanup: remove memchr call

* src/doprnt.c (doprnt): Remove unnecessary call to memchr.
---
 src/doprnt.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index be256f4497..ce259d07cf 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -144,10 +144,10 @@ parse_format_integer (char const *fmt, int *value)
   return fmt;
 }
 
-/* Like doprnt, except FORMAT must not contain NUL bytes and
-   FORMAT_END must be non-null.  Although this function is never
-   exercised in current Emacs, it is retained in case some future
-   Emacs version contains doprnt callers that need such formats.
+/* Like doprnt, except FORMAT_END must be non-null.
+   Although this function is never exercised in current Emacs,
+   it is retained in case some future Emacs version
+   contains doprnt callers that need such formats.
    Having a separate function helps GCC optimize doprnt better.  */
 static ptrdiff_t
 doprnt_non_null_end (char *buffer, ptrdiff_t bufsize, char const *format,
@@ -181,7 +181,7 @@ doprnt_non_null_end (char *buffer, ptrdiff_t bufsize, char const *format,
 doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 	const char *format_end, va_list ap)
 {
-  if (format_end && !memchr (format, 0, format_end - format))
+  if (format_end)
     return doprnt_non_null_end (buffer, bufsize, format, format_end, ap);
 
   const char *fmt = format;	/* Pointer into format string.  */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-10-24 21:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-16  1:50 bug#43439: [PATCH] doprnt improvements Paul Eggert
2020-09-16 14:58 ` Eli Zaretskii
2020-09-16 22:09   ` Paul Eggert
2020-09-18  7:30     ` Eli Zaretskii
2020-10-15 17:58       ` Paul Eggert
2020-10-15 18:12         ` Eli Zaretskii
2020-10-15 18:50           ` Paul Eggert
2020-10-15 19:05             ` Eli Zaretskii
2020-10-15 20:06               ` Paul Eggert
2020-10-17 18:32                 ` Eli Zaretskii
2020-10-18  2:24                   ` Paul Eggert
2020-10-24 10:39                     ` Eli Zaretskii
2020-10-24 21:02                       ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).