From: Paul Eggert <eggert@cs.ucla.edu>
To: 43439@debbugs.gnu.org
Cc: Paul Eggert <eggert@cs.ucla.edu>
Subject: bug#43439: [PATCH] doprnt improvements
Date: Tue, 15 Sep 2020 18:50:51 -0700 [thread overview]
Message-ID: <20200916015051.20517-1-eggert@cs.ucla.edu> (raw)
Improve doprnt performance, internal checking, and internal
documentation. On my platform (Ubuntu 18.04.5 x86-64), this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii.
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprint. Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use. We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Omit FORMAT_END argument, since it’s always NULL,
which means doprnt must call strlen on FORMAT; doing this means
doprnt needs just one pass over FORMAT, not two. All callers changed.
Assume C99 to make code clearer. Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements. Drop support for
"%S" which is never used and which would cause GCC to warn anyway.
* src/lisp.h (doprnt): Give it ATTRIBUTE_FORMAT_PRINTF (3, 0),
since GCC can grok doprnt's new API.
---
src/doprnt.c | 223 ++++++++++++++++++++++++++-------------------------
src/lisp.h | 4 +-
src/sysdep.c | 2 +-
src/xdisp.c | 5 +-
4 files changed, 117 insertions(+), 117 deletions(-)
diff --git a/src/doprnt.c b/src/doprnt.c
index b0ba12552b..f154578c0d 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
. For %s and %c, when field width is specified (e.g., %25s), it accounts for
the display width of each character, according to char-width-table. That
is, it does not assume that each character takes one column on display.
+ Nor does it assume that each character is a single byte.
. If the size of the buffer is not enough to produce the formatted string in
its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
Emacs can handle.
OTOH, this function supports only a small subset of the standard C formatted
- output facilities. E.g., %u and %ll are not supported, and precision is
- ignored %s and %c conversions. (See below for the detailed documentation of
- what is supported.) However, this is okay, as this function is supposed to
- be called from `error' and similar functions, and thus does not need to
- support features beyond those in `Fformat_message', which is used
- by `error' on the Lisp level. */
+ output facilities. E.g., %u is not supported, precision is ignored
+ in %s and %c conversions, and although %lld often works it is not
+ supported and code should use something like %"pM"d with intmax_t instead.
+ (See below for the detailed documentation of what is supported.)
+ However, this is okay, as this function is supposed to be called
+ from 'error' and similar C functions, and thus does not need to
+ support all the features of 'Fformat_message', which is used by the
+ Lisp 'error' function. */
/* In the FORMAT argument this function supports ` and ' as directives
that output left and right quotes as per ‘text-quoting style’. It
also supports the following %-sequences:
%s means print a string argument.
- %S is treated as %s, for loose compatibility with `Fformat_message'.
%d means print a `signed int' argument in decimal.
%o means print an `unsigned int' argument in octal.
%x means print an `unsigned int' argument in hex.
%e means print a `double' argument in exponential notation.
%f means print a `double' argument in decimal-point notation.
%g means print a `double' argument in exponential notation
- or in decimal-point notation, whichever uses fewer characters.
+ or in decimal-point notation, depending on the value;
+ this is often (though not always) the shorter of the two notations
%c means print a `signed int' argument as a single character.
%% means produce a literal % character.
- A %-sequence may contain optional flag, width, and precision specifiers, and
- a length modifier, as follows:
+ A %-sequence other than %% may contain optional flags, width, precision,
+ and length, as follows:
%<flags><width><precision><length>character
where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
- Also, %% in a format stands for a single % in the output. A % that
- does not introduce a valid %-sequence causes undefined behavior.
+ A % that does not introduce a valid %-sequence causes undefined behavior.
+ ASCII bytes in FORMAT other than % are copied through as-is;
+ non-ASCII bytes should not appear in FORMAT.
The + flag character inserts a + before any positive number, while a space
inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
For %e, %f, and %g sequences, the number after the "." in the precision
specifier says how many decimal places to show; if zero, the decimal point
- itself is omitted. For %s and %S, the precision specifier is ignored. */
+ itself is omitted. For %d, %o, and %x sequences, the precision specifies
+ the minimum number of digits to appear. Precision specifiers are
+ not supported for other %-sequences. */
#include <config.h>
#include <stdio.h>
@@ -115,9 +121,29 @@
another macro. */
#include "character.h"
-/* Generate output from a format-spec FORMAT,
- terminated at position FORMAT_END.
- (*FORMAT_END is not part of the format, but must exist and be readable.)
+/* Enough to handle floating point formats with large numbers. */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+ Return the address of the first byte after the integer.
+ If FMT is not an integer, return FMT and store zero into *VALUE. */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+ int n = 0;
+ bool overflow = false;
+ for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+ {
+ overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+ overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+ }
+ if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+ error ("Format width or precision too large");
+ *value = n;
+ return fmt;
+}
+
+/* Generate output from a format-spec FORMAT.
Output goes in BUFFER, which has room for BUFSIZE chars.
BUFSIZE must be positive. If the output does not fit, truncate it
to fit and return BUFSIZE - 1; if this truncates a multibyte
@@ -128,15 +154,11 @@
Integers are passed as C integers. */
ptrdiff_t
-doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
- const char *format_end, va_list ap)
+doprnt (char *buffer, ptrdiff_t bufsize, const char *format, va_list ap)
{
const char *fmt = format; /* Pointer into format string. */
char *bufptr = buffer; /* Pointer into output buffer. */
- /* Enough to handle floating point formats with large numbers. */
- enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
/* Use this for sprintf unless we need something really big. */
char tembuf[SIZE_BOUND_EXTRA + 50];
@@ -150,103 +172,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
char *big_buffer = NULL;
enum text_quoting_style quoting_style = text_quoting_style ();
- ptrdiff_t tem = -1;
- char *string;
- char fixed_buffer[20]; /* Default buffer for small formatting. */
- char *fmtcpy;
- int minlen;
- char charbuf[MAX_MULTIBYTE_LENGTH + 1]; /* Used for %c. */
- USE_SAFE_ALLOCA;
-
- if (format_end == 0)
- format_end = format + strlen (format);
-
- fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
- ? fixed_buffer
- : SAFE_ALLOCA (format_end - format + 1));
bufsize--;
/* Loop until end of format string or buffer full. */
- while (fmt < format_end && bufsize > 0)
+ while (*fmt && bufsize > 0)
{
char const *fmt0 = fmt;
char fmtchar = *fmt++;
if (fmtchar == '%')
{
- ptrdiff_t size_bound = 0;
ptrdiff_t width; /* Columns occupied by STRING on display. */
enum {
pDlen = sizeof pD - 1,
pIlen = sizeof pI - 1,
- pMlen = sizeof PRIdMAX - 2
+ pMlen = sizeof PRIdMAX - 2,
+ maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
};
enum {
no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
} length_modifier = no_modifier;
static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
- int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
int mlen;
+ char charbuf[MAX_MULTIBYTE_LENGTH + 1]; /* Used for %c. */
- /* Copy this one %-spec into fmtcpy. */
- string = fmtcpy;
+ /* Width and precision specified by this %-sequence. */
+ int wid = 0, prec = -1;
+
+ /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+ Start by putting '%' into FMTSTAR. */
+ char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+ char *string = fmtstar;
*string++ = '%';
- while (fmt < format_end)
+
+ /* Copy at most one instance of each flag into FMTSTAR. */
+ bool minusflag = false, plusflag = false, zeroflag = false,
+ spaceflag = false;
+ for (;; fmt++)
{
- *string++ = *fmt;
- if ('0' <= *fmt && *fmt <= '9')
+ *string = *fmt;
+ switch (*fmt)
{
- /* Get an idea of how much space we might need.
- This might be a field width or a precision; e.g.
- %1.1000f and %1000.1f both might need 1000+ bytes.
- Parse the width or precision, checking for overflow. */
- int n = *fmt - '0';
- bool overflow = false;
- while (fmt + 1 < format_end
- && '0' <= fmt[1] && fmt[1] <= '9')
- {
- overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
- overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
- *string++ = *++fmt;
- }
-
- if (overflow
- || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
- error ("Format width or precision too large");
- if (size_bound < n)
- size_bound = n;
+ case '-': string += !minusflag; minusflag = true; continue;
+ case '+': string += !plusflag; plusflag = true; continue;
+ case ' ': string += !spaceflag; spaceflag = true; continue;
+ case '0': string += !zeroflag; zeroflag = true; continue;
}
- else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
- || *fmt == '+'))
- break;
- fmt++;
+ break;
}
+ /* Parse width and precision, putting "*.*" into FMTSTAR. */
+ if ('1' <= *fmt && *fmt <= '9')
+ fmt = parse_format_integer (fmt, &wid);
+ if (*fmt == '.')
+ fmt = parse_format_integer (fmt + 1, &prec);
+ *string++ = '*';
+ *string++ = '.';
+ *string++ = '*';
+
/* Check for the length modifiers in textual length order, so
that longer modifiers override shorter ones. */
for (mlen = 1; mlen <= maxmlen; mlen++)
{
- if (format_end - fmt < mlen)
- break;
if (mlen == 1 && *fmt == 'l')
length_modifier = long_modifier;
- if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+ if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
length_modifier = pD_modifier;
- if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+ if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
length_modifier = pI_modifier;
- if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+ if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
length_modifier = pM_modifier;
}
+ /* Copy optional length modifier and conversion specifier
+ character into FMTSTAR, and append a NUL. */
mlen = modifier_len[length_modifier];
- memcpy (string, fmt + 1, mlen);
- string += mlen;
+ string = mempcpy (string, fmt, mlen + 1);
fmt += mlen;
*string = 0;
- /* Make the size bound large enough to handle floating point formats
+ /* An idea of how much space we might need.
+ This might be a field width or a precision; e.g.
+ %1.1000f and %1000.1f both might need 1000+ bytes.
+ Make it large enough to handle floating point formats
with large numbers. */
- size_bound += SIZE_BOUND_EXTRA;
+ ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
/* Make sure we have that much. */
if (size_bound > size_allocated)
@@ -257,48 +267,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
sprintf_buffer = big_buffer;
size_allocated = size_bound;
}
- minlen = 0;
+ int minlen = 0;
+ ptrdiff_t tem;
switch (*fmt++)
{
default:
- error ("Invalid format operation %s", fmtcpy);
+ error ("Invalid format operation %s", fmt0);
-/* case 'b': */
- case 'l':
case 'd':
switch (length_modifier)
{
case no_modifier:
{
int v = va_arg (ap, int);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case long_modifier:
{
long v = va_arg (ap, long);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case pD_modifier:
signed_pD_modifier:
{
ptrdiff_t v = va_arg (ap, ptrdiff_t);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case pI_modifier:
{
EMACS_INT v = va_arg (ap, EMACS_INT);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case pM_modifier:
{
intmax_t v = va_arg (ap, intmax_t);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
+ default:
+ eassume (false);
}
/* Now copy into final output, truncating as necessary. */
string = sprintf_buffer;
@@ -311,13 +322,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
case no_modifier:
{
unsigned v = va_arg (ap, unsigned);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case long_modifier:
{
unsigned long v = va_arg (ap, unsigned long);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case pD_modifier:
@@ -325,15 +336,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
case pI_modifier:
{
EMACS_UINT v = va_arg (ap, EMACS_UINT);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
case pM_modifier:
{
uintmax_t v = va_arg (ap, uintmax_t);
- tem = sprintf (sprintf_buffer, fmtcpy, v);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
}
break;
+ default:
+ eassume (false);
}
/* Now copy into final output, truncating as necessary. */
string = sprintf_buffer;
@@ -344,22 +357,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
case 'g':
{
double d = va_arg (ap, double);
- tem = sprintf (sprintf_buffer, fmtcpy, d);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
/* Now copy into final output, truncating as necessary. */
string = sprintf_buffer;
goto doit;
}
- case 'S':
- string[-1] = 's';
- FALLTHROUGH;
case 's':
- if (fmtcpy[1] != 's')
- minlen = atoi (&fmtcpy[1]);
+ minlen = minusflag ? -wid : wid;
string = va_arg (ap, char *);
tem = strnlen (string, STRING_BYTES_BOUND + 1);
if (tem == STRING_BYTES_BOUND + 1)
- error ("String for %%s or %%S format is too long");
+ error ("String for %%s format is too long");
width = strwidth (string, tem);
goto doit1;
@@ -432,14 +441,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
string = charbuf;
string[tem] = 0;
width = strwidth (string, tem);
- if (fmtcpy[1] != 'c')
- minlen = atoi (&fmtcpy[1]);
+ minlen = minusflag ? -wid : wid;
goto doit1;
}
case '%':
/* Treat this '%' as normal. */
- fmt0 = fmt - 1;
break;
}
}
@@ -450,13 +457,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
src = uLSQM, srclen = sizeof uLSQM - 1;
else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
src = uRSQM, srclen = sizeof uRSQM - 1;
- else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
- src = "'", srclen = 1;
else
{
- while (fmt < format_end && !CHAR_HEAD_P (*fmt))
- fmt++;
- src = fmt0, srclen = fmt - fmt0;
+ if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+ fmtchar = '\'';
+ eassert (ASCII_CHAR_P (fmtchar));
+ *bufptr++ = fmtchar;
+ continue;
}
if (bufsize < srclen)
@@ -479,8 +486,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
xfree (big_buffer);
*bufptr = 0; /* Make sure our string ends with a '\0' */
-
- SAFE_FREE ();
return bufptr - buffer;
}
@@ -495,10 +500,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
ptrdiff_t
esprintf (char *buf, char const *format, ...)
{
- ptrdiff_t nbytes;
va_list ap;
va_start (ap, format);
- nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+ ptrdiff_t nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
va_end (ap);
return nbytes;
}
@@ -534,10 +538,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
{
for (;;)
{
- ptrdiff_t nbytes;
va_list ap_copy;
va_copy (ap_copy, ap);
- nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+ ptrdiff_t nbytes = doprnt (*buf, *bufsize, format, ap_copy);
va_end (ap_copy);
if (nbytes < *bufsize - 1)
return nbytes;
diff --git a/src/lisp.h b/src/lisp.h
index a24898004d..957ca41702 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
extern void syms_of_print (void);
/* Defined in doprnt.c. */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
- va_list);
+extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, va_list)
+ ATTRIBUTE_FORMAT_PRINTF (3, 0);
extern ptrdiff_t esprintf (char *, char const *, ...)
ATTRIBUTE_FORMAT_PRINTF (2, 3);
extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index e161172a79..790ae084d3 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, ...)
if (size)
{
va_start (ap, format);
- nbytes = doprnt (buf, size, format, 0, ap);
+ nbytes = doprnt (buf, size, format, ap);
va_end (ap);
}
diff --git a/src/xdisp.c b/src/xdisp.c
index 615f0ca7cf..213c2a464a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
{
if (m)
{
- ptrdiff_t len;
ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
USE_SAFE_ALLOCA;
char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
- len = doprnt (message_buf, maxsize, m, 0, ap);
-
+ ptrdiff_t len = doprnt (message_buf, maxsize, m, ap);
message3 (make_string (message_buf, len));
SAFE_FREE ();
}
--
2.17.1
next reply other threads:[~2020-09-16 1:50 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-16 1:50 Paul Eggert [this message]
2020-09-16 14:58 ` bug#43439: [PATCH] doprnt improvements Eli Zaretskii
2020-09-16 22:09 ` Paul Eggert
2020-09-18 7:30 ` Eli Zaretskii
2020-10-15 17:58 ` Paul Eggert
2020-10-15 18:12 ` Eli Zaretskii
2020-10-15 18:50 ` Paul Eggert
2020-10-15 19:05 ` Eli Zaretskii
2020-10-15 20:06 ` Paul Eggert
2020-10-17 18:32 ` Eli Zaretskii
2020-10-18 2:24 ` Paul Eggert
2020-10-24 10:39 ` Eli Zaretskii
2020-10-24 21:02 ` Paul Eggert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200916015051.20517-1-eggert@cs.ucla.edu \
--to=eggert@cs.ucla.edu \
--cc=43439@debbugs.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).