From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.devel Subject: Re: i18n/l10n summary Date: Sun, 4 Jun 2017 08:54:31 -0700 Organization: UCLA Computer Science Department Message-ID: <0aec575b-6cbe-bc87-2d54-7de95be5c080@cs.ucla.edu> References: <3931675f-e26c-2f14-b229-61f518816fff@cs.ucla.edu> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------3783E2CDF2245EA1C2BDABD3" X-Trace: blaine.gmane.org 1496591982 23923 195.159.176.226 (4 Jun 2017 15:59:42 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 4 Jun 2017 15:59:42 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 To: Philipp Stephani , Jean-Christophe Helary , emacs-devel Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Jun 04 17:59:34 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dHXwE-0005ck-5h for ged-emacs-devel@m.gmane.org; Sun, 04 Jun 2017 17:59:34 +0200 Original-Received: from localhost ([::1]:57415 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dHXwG-0004ZU-45 for ged-emacs-devel@m.gmane.org; Sun, 04 Jun 2017 11:59:36 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33322) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dHXrW-0000DI-45 for emacs-devel@gnu.org; Sun, 04 Jun 2017 11:54:44 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dHXrS-00079B-T2 for emacs-devel@gnu.org; Sun, 04 Jun 2017 11:54:42 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:47234) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dHXrS-000774-Ew for emacs-devel@gnu.org; Sun, 04 Jun 2017 11:54:38 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 372A21600E3; Sun, 4 Jun 2017 08:54:34 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id GNj9-dZZqiyb; Sun, 4 Jun 2017 08:54:32 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 889591600E4; Sun, 4 Jun 2017 08:54:32 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id LA8Kkw_svpH6; Sun, 4 Jun 2017 08:54:32 -0700 (PDT) Original-Received: from [192.168.1.9] (unknown [47.153.188.248]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 502511600E3; Sun, 4 Jun 2017 08:54:32 -0700 (PDT) In-Reply-To: Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 131.179.128.68 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:215442 Archived-At: This is a multi-part message in MIME format. --------------3783E2CDF2245EA1C2BDABD3 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Philipp Stephani wrote: > It's not hard to fix, Thanks. However, on my platform those fixes slowed down a microbenchmark = (format=20 "%d %s %c" 3 "def" ?=E2=80=98) by 7%, so I installed the attached further= patch, which=20 recovered the performance loss for me. It avoids the need for the extra p= ass=20 through the format string and it caches more quantities in registers. Whi= le I=20 was at it I changed the doc to agree with POSIX that %% should not have=20 modifiers (not that we enforce this). One little thing that has tripped me up in the past. In Emacs the preferr= ed=20 style is typically to break assignments and initializations before the "=3D= ", not=20 after. Like this: int some_long_name =3D some_long_expression; I myself prefer the style you used, as it's more column-efficient and it=20 simplifies text searches for assignments to a variable, but there it is. --------------3783E2CDF2245EA1C2BDABD3 Content-Type: text/x-patch; name="0001-Tune-format-after-recent-fix.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-Tune-format-after-recent-fix.patch" =46rom d5fcf9e458053af1c50f0d4dad4c59db056790e5 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 4 Jun 2017 08:39:37 -0700 Subject: [PATCH] =3D?UTF-8?q?Tune=3D20=3DE2=3D80=3D98format=3DE2=3D80=3D9= 9=3D20after=3D20recen?=3D =3D?UTF-8?q?t=3D20fix?=3D MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit * doc/lispref/strings.texi (Formatting Strings): * src/editfns.c (Fformat): Format field numbers no longer need to be unique, reverting the previous doc change since that has now been fixed. Also, document that %% should not have modifiers. * src/editfns.c (styled_format): Improve performance. Remove the need for the new prepass over the format string, by using a typically-more-generous bound for the info array size. Initialize the info array lazily. Move string inspection to the same area to help caching. Avoid the need for a converted_to_string bitfield by using EQ. Cache arg in a local and avoid some potential aliasing issues to help the compiler. Info array is now 0-origin, not 1-origin. --- doc/lispref/strings.texi | 13 ++-- src/editfns.c | 154 ++++++++++++++++++++---------------------= ------ 2 files changed, 72 insertions(+), 95 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index f365c80..23961f9 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -926,7 +926,8 @@ Formatting Strings =20 @item %% Replace the specification with a single @samp{%}. This format -specification is unusual in that it does not use a value. For example, +specification is unusual in that its only form is plain +@samp{%%} and that it does not use a value. For example, @code{(format "%% %d" 30)} returns @code{"% 30"}. @end table =20 @@ -965,10 +966,9 @@ Formatting Strings decimal number immediately after the initial @samp{%}, followed by a literal dollar sign @samp{$}. It causes the format specification to convert the argument with the given number instead of the next -argument. Field numbers start at 1. A field number should differ -from the other field numbers in the same format. A format can contain -either numbered or unnumbered format specifications but not both, -except that @samp{%%} can be mixed with numbered specifications. +argument. Field numbers start at 1. A format can contain either +numbered or unnumbered format specifications but not both, except that +@samp{%%} can be mixed with numbered specifications. =20 @example (format "%2$s, %3$s, %%, %1$s" "x" "y" "z") @@ -1026,8 +1026,7 @@ Formatting Strings A specification can have a @dfn{width}, which is a decimal number that appears after any field number and flags. If the printed representation of the object contains fewer characters than this -width, @code{format} extends it with padding. The width is -ignored for the @samp{%%} specification. Any padding introduced by +width, @code{format} extends it with padding. Any padding introduced by= the width normally consists of spaces inserted on the left: =20 @example diff --git a/src/editfns.c b/src/editfns.c index 56aa8ce..43b17f9 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -3891,8 +3891,8 @@ the next available argument, or the argument explic= itly specified: The argument used for %d, %o, %x, %e, %f, %g or %c must be a number. Use %% to put a single % into the output. =20 -A %-sequence may contain optional field number, flag, width, and -precision specifiers, as follows: +A %-sequence other than %% may contain optional field number, flag, +width, and precision specifiers, as follows: =20 %character =20 @@ -3901,10 +3901,9 @@ where field is [0-9]+ followed by a literal dollar= "$", flags is followed by [0-9]+. =20 If a %-sequence is numbered with a field with positive value N, the -Nth argument is substituted instead of the next one. A field number -should differ from the other field numbers in the same format. A -format can contain either numbered or unnumbered %-sequences but not -both, except that %% can be mixed with numbered %-sequences. +Nth argument is substituted instead of the next one. A format can +contain either numbered or unnumbered %-sequences but not both, except +that %% can be mixed with numbered %-sequences. =20 The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only @@ -3980,49 +3979,40 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) bool arg_intervals =3D false; USE_SAFE_ALLOCA; =20 - /* Each element records, for one field, - the corresponding argument, - the start and end bytepos in the output string, - whether the argument has been converted to string (e.g., due to "%S= "), - and whether the argument is a string with intervals. */ + /* Information recorded for each format spec. */ struct info { + /* The corresponding argument, converted to string if conversion + was needed. */ Lisp_Object argument; + + /* The start and end bytepos in the output string. */ ptrdiff_t start, end; - bool_bf converted_to_string : 1; + + /* Whether the argument is a string with intervals. */ bool_bf intervals : 1; } *info; =20 CHECK_STRING (args[0]); char *format_start =3D SSDATA (args[0]); + bool multibyte_format =3D STRING_MULTIBYTE (args[0]); ptrdiff_t formatlen =3D SBYTES (args[0]); =20 - /* The number of percent characters is a safe upper bound for the - number of format fields. */ - ptrdiff_t num_percent =3D 0; - for (ptrdiff_t i =3D 0; i < formatlen; ++i) - if (format_start[i] =3D=3D '%') - ++num_percent; + /* Upper bound on number of format specs. Each uses at least 2 chars.= */ + ptrdiff_t nspec_bound =3D SCHARS (args[0]) >> 1; =20 /* Allocate the info and discarded tables. */ ptrdiff_t alloca_size; - if (INT_MULTIPLY_WRAPV (num_percent, sizeof *info, &alloca_size) - || INT_ADD_WRAPV (sizeof *info, alloca_size, &alloca_size) + if (INT_MULTIPLY_WRAPV (nspec_bound, sizeof *info, &alloca_size) || INT_ADD_WRAPV (formatlen, alloca_size, &alloca_size) || SIZE_MAX < alloca_size) memory_full (SIZE_MAX); - /* info[0] is unused. Unused elements have -1 for start. */ info =3D SAFE_ALLOCA (alloca_size); - memset (info, 0, alloca_size); - for (ptrdiff_t i =3D 0; i < num_percent + 1; i++) - { - info[i].argument =3D Qunbound; - info[i].start =3D -1; - } /* discarded[I] is 1 if byte I of the format string was not copied into the output. It is 2 if byte I was not the first byte of its character. */ - char *discarded =3D (char *) &info[num_percent + 1]; + char *discarded =3D (char *) &info[nspec_bound]; + memset (discarded, 0, formatlen); =20 /* Try to determine whether the result should be multibyte. This is not always right; sometimes the result needs to be multibyt= e @@ -4030,8 +4020,6 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) or because a grave accent or apostrophe is requoted, and in that case, we won't know it here. */ =20 - /* True if the format is multibyte. */ - bool multibyte_format =3D STRING_MULTIBYTE (args[0]); /* True if the output should be a multibyte string, which is true if any of the inputs is one. */ bool multibyte =3D multibyte_format; @@ -4042,6 +4030,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) int quoting_style =3D message ? text_quoting_style () : -1; =20 ptrdiff_t ispec; + ptrdiff_t nspec =3D 0; =20 /* If we start out planning a unibyte result, then discover it has to be multibyte, we jump back to retry. */ @@ -4155,11 +4144,14 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) if (! (n < nargs)) error ("Not enough arguments for format string"); =20 - eassert (ispec < num_percent); - ++ispec; - - if (EQ (info[ispec].argument, Qunbound)) - info[ispec].argument =3D args[n]; + struct info *spec =3D &info[ispec++]; + if (nspec < ispec) + { + spec->argument =3D args[n]; + spec->intervals =3D false; + nspec =3D ispec; + } + Lisp_Object arg =3D spec->argument; =20 /* For 'S', prin1 the argument, and then treat like 's'. For 's', princ any argument that is not a string or @@ -4167,16 +4159,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) happen after retrying. */ if ((conversion =3D=3D 'S' || (conversion =3D=3D 's' - && ! STRINGP (info[ispec].argument) - && ! SYMBOLP (info[ispec].argument)))) + && ! STRINGP (arg) && ! SYMBOLP (arg)))) { - if (! info[ispec].converted_to_string) + if (EQ (arg, args[n])) { Lisp_Object noescape =3D conversion =3D=3D 'S' ? Qnil : Qt; - info[ispec].argument =3D - Fprin1_to_string (info[ispec].argument, noescape); - info[ispec].converted_to_string =3D true; - if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte) + spec->argument =3D arg =3D Fprin1_to_string (arg, noescape); + if (STRING_MULTIBYTE (arg) && ! multibyte) { multibyte =3D true; goto retry; @@ -4186,29 +4175,25 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) } else if (conversion =3D=3D 'c') { - if (INTEGERP (info[ispec].argument) - && ! ASCII_CHAR_P (XINT (info[ispec].argument))) + if (INTEGERP (arg) && ! ASCII_CHAR_P (XINT (arg))) { if (!multibyte) { multibyte =3D true; goto retry; } - info[ispec].argument =3D - Fchar_to_string (info[ispec].argument); - info[ispec].converted_to_string =3D true; + spec->argument =3D arg =3D Fchar_to_string (arg); } =20 - if (info[ispec].converted_to_string) + if (!EQ (arg, args[n])) conversion =3D 's'; zero_flag =3D false; } =20 - if (SYMBOLP (info[ispec].argument)) + if (SYMBOLP (arg)) { - info[ispec].argument =3D - SYMBOL_NAME (info[ispec].argument); - if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte) + spec->argument =3D arg =3D SYMBOL_NAME (arg); + if (STRING_MULTIBYTE (arg) && ! multibyte) { multibyte =3D true; goto retry; @@ -4239,12 +4224,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) else { ptrdiff_t nch, nby; - width =3D lisp_string_width (info[ispec].argument, - prec, &nch, &nby); + width =3D lisp_string_width (arg, prec, &nch, &nby); if (prec < 0) { - nchars_string =3D SCHARS (info[ispec].argument); - nbytes =3D SBYTES (info[ispec].argument); + nchars_string =3D SCHARS (arg); + nbytes =3D SBYTES (arg); } else { @@ -4254,11 +4238,8 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args,= bool message) } =20 convbytes =3D nbytes; - if (convbytes && multibyte && - ! STRING_MULTIBYTE (info[ispec].argument)) - convbytes =3D - count_size_as_multibyte (SDATA (info[ispec].argument),= - nbytes); + if (convbytes && multibyte && ! STRING_MULTIBYTE (arg)) + convbytes =3D count_size_as_multibyte (SDATA (arg), nbytes); =20 ptrdiff_t padding =3D width < field_width ? field_width - width : 0; @@ -4274,20 +4255,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) p +=3D padding; nchars +=3D padding; } - info[ispec].start =3D nchars; + spec->start =3D nchars; =20 if (p > buf && multibyte && !ASCII_CHAR_P (*((unsigned char *) p - 1)) - && STRING_MULTIBYTE (info[ispec].argument) - && !CHAR_HEAD_P (SREF (info[ispec].argument, 0))) + && STRING_MULTIBYTE (arg) + && !CHAR_HEAD_P (SREF (arg, 0))) maybe_combine_byte =3D true; =20 - p +=3D copy_text (SDATA (info[ispec].argument), - (unsigned char *) p, + p +=3D copy_text (SDATA (arg), (unsigned char *) p, nbytes, - STRING_MULTIBYTE (info[ispec].argument), - multibyte); + STRING_MULTIBYTE (arg), multibyte); =20 nchars +=3D nchars_string; =20 @@ -4297,12 +4276,12 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) p +=3D padding; nchars +=3D padding; } - info[ispec].end =3D nchars; + spec->end =3D nchars; =20 /* If this argument has text properties, record where in the result string it appears. */ - if (string_intervals (info[ispec].argument)) - info[ispec].intervals =3D arg_intervals =3D true; + if (string_intervals (arg)) + spec->intervals =3D arg_intervals =3D true; =20 continue; } @@ -4313,8 +4292,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) || conversion =3D=3D 'X')) error ("Invalid format operation %%%c", STRING_CHAR ((unsigned char *) format - 1)); - else if (! (INTEGERP (info[ispec].argument) - || (FLOATP (info[ispec].argument) && conversion !=3D 'c'))) + else if (! (INTEGERP (arg) || (FLOATP (arg) && conversion !=3D 'c')))= error ("Format specifier doesn't match argument type"); else { @@ -4376,7 +4354,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) if (INT_AS_LDBL) { *f =3D 'L'; - f +=3D INTEGERP (info[ispec].argument); + f +=3D INTEGERP (arg); } } else if (conversion !=3D 'c') @@ -4408,22 +4386,22 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) ptrdiff_t sprintf_bytes; if (float_conversion) { - if (INT_AS_LDBL && INTEGERP (info[ispec].argument)) + if (INT_AS_LDBL && INTEGERP (arg)) { /* Although long double may have a rounding error if DIG_BITS_LBOUND * LDBL_MANT_DIG < FIXNUM_BITS - 1, it is more accurate than plain 'double'. */ - long double x =3D XINT (info[ispec].argument); + long double x =3D XINT (arg); sprintf_bytes =3D sprintf (sprintf_buf, convspec, prec, x); } else sprintf_bytes =3D sprintf (sprintf_buf, convspec, prec, - XFLOATINT (info[ispec].argument)); + XFLOATINT (arg)); } else if (conversion =3D=3D 'c') { /* Don't use sprintf here, as it might mishandle prec. */ - sprintf_buf[0] =3D XINT (info[ispec].argument); + sprintf_buf[0] =3D XINT (arg); sprintf_bytes =3D prec !=3D 0; } else if (conversion =3D=3D 'd' || conversion =3D=3D 'i') @@ -4432,11 +4410,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) instead so it also works for values outside the integer range. */ printmax_t x; - if (INTEGERP (info[ispec].argument)) - x =3D XINT (info[ispec].argument); + if (INTEGERP (arg)) + x =3D XINT (arg); else { - double d =3D XFLOAT_DATA (info[ispec].argument); + double d =3D XFLOAT_DATA (arg); if (d < 0) { x =3D TYPE_MINIMUM (printmax_t); @@ -4456,11 +4434,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args= , bool message) { /* Don't sign-extend for octal or hex printing. */ uprintmax_t x; - if (INTEGERP (info[ispec].argument)) - x =3D XUINT (info[ispec].argument); + if (INTEGERP (arg)) + x =3D XUINT (arg); else { - double d =3D XFLOAT_DATA (info[ispec].argument); + double d =3D XFLOAT_DATA (arg); if (d < 0) x =3D 0; else @@ -4541,7 +4519,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) exponent_bytes =3D src + sprintf_bytes - e; } =20 - info[ispec].start =3D nchars; + spec->start =3D nchars; if (! minus_flag) { memset (p, ' ', padding); @@ -4572,7 +4550,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) p +=3D padding; nchars +=3D padding; } - info[ispec].end =3D nchars; + spec->end =3D nchars; =20 continue; } @@ -4681,7 +4659,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) if (CONSP (props)) { ptrdiff_t bytepos =3D 0, position =3D 0, translated =3D 0; - ptrdiff_t fieldn =3D 1; + ptrdiff_t fieldn =3D 0; =20 /* Adjust the bounds of each text property to the proper start and end in the output string. */ @@ -4747,7 +4725,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, = bool message) =20 /* Add text properties from arguments. */ if (arg_intervals) - for (ptrdiff_t i =3D 1; i <=3D num_percent; i++) + for (ptrdiff_t i =3D 0; i < nspec; i++) if (info[i].intervals) { len =3D make_number (SCHARS (info[i].argument)); --=20 2.7.4 --------------3783E2CDF2245EA1C2BDABD3--