* i18n/l10n summary
@ 2017-05-28 5:29 Jean-Christophe Helary
2017-05-28 14:27 ` Drew Adams
` (3 more replies)
0 siblings, 4 replies; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-05-28 5:29 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]
The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
====================================================
On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
> My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected.
On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote:
> Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some
> use "error", some use other mechanisms.
Internally, they all boil down to a small set of C functions, which is
where we should make these changes.
====================================================
Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...
Jean-Christophe
[-- Attachment #2: Type: text/html, Size: 1953 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: i18n/l10n summary
2017-05-28 5:29 i18n/l10n summary Jean-Christophe Helary
@ 2017-05-28 14:27 ` Drew Adams
2017-05-28 14:36 ` Jean-Christophe Helary
` (2 more replies)
2017-05-31 22:18 ` Philipp Stephani
` (2 subsequent siblings)
3 siblings, 3 replies; 40+ messages in thread
From: Drew Adams @ 2017-05-28 14:27 UTC (permalink / raw)
To: Jean-Christophe Helary, emacs-devel
> The discussion so far seems to point at modifying 'message' and
> the likes so that developers don't have to bother with any l10n
> mechanism on their part (besides for writing clean strings).
(Caveat: I haven't been following this thread at all, so
ignore if not helpful.)
Is the idea that something will be done so that, for example,
`message' automatically uses a translation of the message to
the user's currently preferred language? I.e., if that is
what is planned, isn't it perhaps too systematic - all or
nothing?
If so, then perhaps there should be a way to easily, from
Lisp, specify the target language explicitly - e.g. by an
optional `message' argument or (better, because the scope
is controllable without changing the `message' calls) by
binding a variable.
This, as opposed to automatically and always just making
`message' target whatever language is currently being
used/preferred by the user.
And in that case, there should perhaps be a user option
that overrides such a language choice by Lisp code. IOW,
in general, let code control the language for a given
`message' call or for a given scope (by a variable), but
let a user customize Emacs to say whether to allow this.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 14:27 ` Drew Adams
@ 2017-05-28 14:36 ` Jean-Christophe Helary
2017-05-28 15:33 ` Eli Zaretskii
2017-06-05 12:55 ` Jean-Christophe Helary
2 siblings, 0 replies; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-05-28 14:36 UTC (permalink / raw)
To: emacs-devel
> On May 28, 2017, at 23:27, Drew Adams <drew.adams@oracle.com> wrote:
>
>> The discussion so far seems to point at modifying 'message' and
>> the likes so that developers don't have to bother with any l10n
>> mechanism on their part (besides for writing clean strings).
>
> (Caveat: I haven't been following this thread at all, so
> ignore if not helpful.)
>
> Is the idea that something will be done so that, for example,
> `message' automatically uses a translation of the message to
> the user's currently preferred language? I.e., if that is
> what is planned, isn't it perhaps too systematic - all or
> nothing?
The idea is to have the messages displayed according to the *specified* language.
How the language is specified has been considered out of the scope of this discussion, if I'm not wrong. But that's the idea.
Basically, there are 2 possible states:
1) Emacs new install → the language is the OS default
2) User setting within Emacs (allows for normal use and for l10n testing).
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 14:27 ` Drew Adams
2017-05-28 14:36 ` Jean-Christophe Helary
@ 2017-05-28 15:33 ` Eli Zaretskii
2017-06-05 12:55 ` Jean-Christophe Helary
2 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2017-05-28 15:33 UTC (permalink / raw)
To: Drew Adams; +Cc: jean.christophe.helary, emacs-devel
> Date: Sun, 28 May 2017 07:27:10 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
>
> Is the idea that something will be done so that, for example,
> `message' automatically uses a translation of the message to
> the user's currently preferred language? I.e., if that is
> what is planned, isn't it perhaps too systematic - all or
> nothing?
>
> If so, then perhaps there should be a way to easily, from
> Lisp, specify the target language explicitly - e.g. by an
> optional `message' argument or (better, because the scope
> is controllable without changing the `message' calls) by
> binding a variable.
(let ((current-language-environment "FOO"))
(message "FOOBAR"))
> And in that case, there should perhaps be a user option
> that overrides such a language choice by Lisp code.
M-x set-language-environment RET
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: i18n/l10n summary
[not found] ` <<83a85xgfo2.fsf@gnu.org>
@ 2017-05-28 21:52 ` Drew Adams
0 siblings, 0 replies; 40+ messages in thread
From: Drew Adams @ 2017-05-28 21:52 UTC (permalink / raw)
To: Eli Zaretskii, Drew Adams; +Cc: jean.christophe.helary, emacs-devel
> > If so, then perhaps there should be a way to easily, from
> > Lisp, specify the target language explicitly ... by
> > binding a variable.
>
> (let ((current-language-environment "FOO"))(message "FOOBAR"))
>
> > And...a user option that overrides such a language choice
> > by Lisp code.
>
> M-x set-language-environment RET
Great. Thx.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 5:29 i18n/l10n summary Jean-Christophe Helary
2017-05-28 14:27 ` Drew Adams
@ 2017-05-31 22:18 ` Philipp Stephani
2017-05-31 22:29 ` Jean-Christophe Helary
` (3 more replies)
2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙)
2018-04-25 12:58 ` Jean-Christophe Helary
3 siblings, 4 replies; 40+ messages in thread
From: Philipp Stephani @ 2017-05-31 22:18 UTC (permalink / raw)
To: Jean-Christophe Helary, emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 1707 bytes --]
Jean-Christophe Helary <jean.christophe.helary@gmail.com> schrieb am So.,
28. Mai 2017 um 07:29 Uhr:
> The discussion so far seems to point at modifying 'message' and the likes
> so that developers don't have to bother with any l10n mechanism on their
> part (besides for writing clean strings).
>
> ====================================================
> On May 27, 2017, at 10:52, Jean-Christophe Helary <
> jean.christophe.helary@gmail.com> wrote:
>
> My very uninformed idea is that we need an independent function that
> handles the preferred language check and the catalog parsing based on a
> key, and all the string displaying functions (message etc) would be
> redefined to call that function when a non default preferred langage
> (currently English) is detected.
>
>
> On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote:
>
> Yes but from what I've seen in package/el, a lot of translatable texts are
> not displayed with "message". Some
> use "error", some use other mechanisms.
>
>
> Internally, they all boil down to a small set of C functions, which is
> where we should make these changes.
> ====================================================
>
> Since it's C, I'm not going to be able to contribute to that before I
> understand the language, and the function definitions. I guess it's time I
> open that K&R that's been on my shelves forever...
>
One small aspect would be to implement field numbers for `format' so that
argument indices can be explicitly specified. That is probably quite
important because the word order is different between languages, but it's
also useful in other situations (e.g. when repeating an argument). I've
implemented this in the attached patch.
[-- Attachment #1.2: Type: text/html, Size: 2422 bytes --]
[-- Attachment #2: 0001-Implement-field-numbers-in-format-strings.txt --]
[-- Type: text/plain, Size: 10433 bytes --]
From 39a0f7439b6cd804c49ea91b412b0adb11851cdd Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Thu, 1 Jun 2017 00:09:43 +0200
Subject: [PATCH] Implement field numbers in format strings
A field number explicitly specifies the argument to be formatted.
This is especially important for potential localization work, since
grammars of various languages dictate different word orders.
* src/editfns.c (Fformat): Update documentation.
(styled_format): Implement field numbers.
* doc/lispref/strings.texi (Formatting Strings): Document field numbers.
* lisp/emacs-lisp/bytecomp.el (byte-compile-format-warn): Adapt.
* test/src/editfns-tests.el (format-with-field): New unit test.
---
doc/lispref/strings.texi | 31 ++++++++++++++++++++++---
etc/NEWS | 3 +++
lisp/emacs-lisp/bytecomp.el | 11 ++++++---
src/editfns.c | 55 ++++++++++++++++++++++++++++++++++++++-------
test/src/editfns-tests.el | 18 +++++++++++++++
5 files changed, 104 insertions(+), 14 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 9436a96ead..9bf52f2414 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -864,7 +864,8 @@ Formatting Strings
(format "%s" @var{arbitrary-string})
@end example
- If @var{string} contains more than one format specification, the
+ If @var{string} contains more than one format specification and none
+of the format specifications contain an explicit field number, the
format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
@@ -961,6 +962,25 @@ Formatting Strings
@end group
@end example
+@cindex field number
+ A specification can have a @dfn{field number}, which is a decimal
+number after the initial @samp{%}, followed by a literal dollar sign
+@samp{$}. If you provide a field numer, then the argument to be
+printed corresponds to the given field number instead of the next
+argument. Field numbers start at 1.
+
+You can mix specifications with and without field numbers. A
+specification without a field number that follows a specification with
+a field number will convert the argument after the one specified by
+the field number:
+
+@example
+(format "First argument %2$s, then %s, then %1$s" 1 2 3)
+ @result{} "First argument 2, then 3, then 1"
+@end example
+
+You can't use field numbers in a @samp{%%} specification.
+
@cindex field width
@cindex padding
A specification can have a @dfn{width}, which is a decimal number
@@ -996,9 +1016,14 @@ Formatting Strings
@end group
@end example
+If you want to use both a field number and a width, place the field
+number before the width. For example, in @samp{%2$7s}, @samp{2} is
+the field number and @samp{7} is the width.
+
@cindex flags in format specifications
- Immediately after the @samp{%} and before the optional width
-specifier, you can also put certain @dfn{flag characters}.
+ After the @samp{%} and before the optional width specifier, you can
+also put certain @dfn{flag characters}. The flag characters need to
+come directly after a potential field number.
The flag @samp{+} inserts a plus sign before a positive number, so
that it always has a sign. A space character as flag inserts a space
diff --git a/etc/NEWS b/etc/NEWS
index 43e7897120..e31c0c7cb0 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -364,6 +364,9 @@ large integers from being displayed as characters.
** Two new commands for finding the source code of Emacs Lisp
libraries: 'find-library-other-window' and 'find-library-other-frame'.
+** You can now provide explicit field numbers in format specifiers.
+For example, '(format "%2$s %1$s" 1 2)' produces "2 1".
+
\f
* Editing Changes in Emacs 26.1
diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index 12a7d4afc2..e5b9b47b1d 100644
--- a/lisp/emacs-lisp/bytecomp.el
+++ b/lisp/emacs-lisp/bytecomp.el
@@ -1375,10 +1375,15 @@ byte-compile-format-warn
(let ((nfields (with-temp-buffer
(insert (nth 1 form))
(goto-char (point-min))
- (let ((n 0))
+ (let ((i 0) (n 0))
(while (re-search-forward "%." nil t)
- (unless (eq ?% (char-after (1+ (match-beginning 0))))
- (setq n (1+ n))))
+ (backward-char)
+ (unless (eq ?% (char-after))
+ (setq i (if (looking-at "\\([0-9]+\\)\\$")
+ (string-to-number (match-string 1) 10)
+ (1+ i))
+ n (max n i)))
+ (forward-char))
n)))
(nargs (- (length form) 2)))
(unless (= nargs nfields)
diff --git a/src/editfns.c b/src/editfns.c
index a51670cfdf..b76208f8be 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -48,6 +48,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
#include <float.h>
#include <limits.h>
+#include <c-ctype.h>
#include <intprops.h>
#include <stdlib.h>
#include <strftime.h>
@@ -3857,7 +3858,7 @@ The first argument is a format control string.
The other arguments are substituted into it to make the result, a string.
The format control string may contain %-sequences meaning to substitute
-the next available argument:
+the next available argument, or the argument explicitly specified:
%s means print a string argument. Actually, prints any object, with `princ'.
%d means print as signed number in decimal.
@@ -3874,13 +3875,17 @@ the next available argument:
The argument used for %d, %o, %x, %e, %f, %g or %c must be a number.
Use %% to put a single % into the output.
-A %-sequence may contain optional flag, width, and precision
-specifiers, as follows:
+A %-sequence may contain optional field number, flag, width, and
+precision specifiers, as follows:
- %<flags><width><precision>character
+ %<field><flags><width><precision>character
-where flags is [+ #-0]+, width is [0-9]+, and precision is a literal
-period "." followed by [0-9]+
+where field is [0-9]+ followed by a literal dollar "$", flags is
+[+ #-0]+, width is [0-9]+, and precision is a literal period "."
+followed by [0-9]+.
+
+If field is given, it must be a one-based argument number; the given
+argument is substituted instead of the next one.
The + flag character inserts a + before any positive number, while a
space inserts a space before any positive number; these flags only
@@ -4033,14 +4038,19 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
{
/* General format specifications look like
- '%' [flags] [field-width] [precision] format
+ '%' [field-number] [flags] [field-width] [precision] format
where
+ field-number ::= [0-9]+ '$'
flags ::= [-+0# ]+
field-width ::= [0-9]+
precision ::= '.' [0-9]*
+ If a field-number is specified, it specifies the argument
+ number to substitute. Otherwise, the next argument is
+ taken.
+
If a field-width is specified, it specifies to which width
the output should be padded with blanks, if the output
string is shorter than field-width.
@@ -4049,6 +4059,29 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
digits to print after the '.' for floats, or the max.
number of chars to print from a string. */
+ char *field_end;
+ uintmax_t raw_field = strtoumax (format, &field_end, 10);
+ bool has_field = false;
+ if (c_isdigit (*format) && *field_end == '$')
+ {
+ if (raw_field < 1 || raw_field >= PTRDIFF_MAX)
+ {
+ /* doprnt doesn't support %.*s, so we need to copy
+ the field number string. */
+ ptrdiff_t length = field_end - format;
+ eassert (length > 0);
+ eassert (length < PTRDIFF_MAX);
+ char *field = SAFE_ALLOCA (length + 1);
+ memcpy (field, format, length);
+ field[length] = '\0';
+ error ("Invalid field number `%s'", field);
+ }
+ has_field = true;
+ /* n is incremented below. */
+ n = raw_field - 1;
+ format = field_end + 1;
+ }
+
bool minus_flag = false;
bool plus_flag = false;
bool space_flag = false;
@@ -4091,7 +4124,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
memset (&discarded[format0 - format_start], 1,
format - format0 - (conversion == '%'));
if (conversion == '%')
- goto copy_char;
+ {
+ if (has_field)
+ /* FIXME: `error' doesn't appear to support `%%'. */
+ error ("Field number specified together with `%c' conversion",
+ '%');
+ goto copy_char;
+ }
++n;
if (! (n < nargs))
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index 8019eb0383..f76c6c9fd3 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -177,4 +177,22 @@ transpose-test-get-byte-positions
(format-time-string "%Y-%m-%d %H:%M:%S.%3N %z" nil
(concat (make-string 2048 ?X) "0")))))
+(ert-deftest format-with-field ()
+ (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3)
+ "First argument 2, then 3, then 1"))
+ (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44)
+ "a 22 33 11 \"22\" 33 44 b"))
+ (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b"))
+ (should (equal (should-error (format "a %999999$s b" 11))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %$s b" 11))
+ ;; FIXME: there shouldn't be two % in the error
+ ;; string!
+ '(error "Invalid format operation %%$")))
+ (should (equal (should-error (format "a %0$s b" 11))
+ '(error "Invalid field number `0'")))
+ (should (equal
+ (should-error (format "a %1$% %s b" 11))
+ '(error "Field number specified together with `%' conversion"))))
+
;;; editfns-tests.el ends here
--
2.13.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-31 22:18 ` Philipp Stephani
@ 2017-05-31 22:29 ` Jean-Christophe Helary
2017-06-01 5:18 ` Paul Eggert
` (2 subsequent siblings)
3 siblings, 0 replies; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-05-31 22:29 UTC (permalink / raw)
To: emacs-devel
> On Jun 1, 2017, at 7:18, Philipp Stephani <p.stephani2@gmail.com> wrote:
>
>
>> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
>>
>> ====================================================
>> On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
>>
>>> My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected.
>>
>> On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>>> Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some
>>> use "error", some use other mechanisms.
>>
>> Internally, they all boil down to a small set of C functions, which is
>> where we should make these changes.
>> ====================================================
>>
>> Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...
>
> One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch.
That's really not a *small* aspect. Thank you for thinking about that. I'm in no position to review your work but I suppose others will.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-31 22:18 ` Philipp Stephani
2017-05-31 22:29 ` Jean-Christophe Helary
@ 2017-06-01 5:18 ` Paul Eggert
2017-06-01 8:17 ` Philipp Stephani
2017-06-01 14:21 ` Eli Zaretskii
2017-07-02 1:22 ` Jean-Christophe Helary
3 siblings, 1 reply; 40+ messages in thread
From: Paul Eggert @ 2017-06-01 5:18 UTC (permalink / raw)
To: Philipp Stephani, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 985 bytes --]
Thanks for that patch: it's a good move forward for i18n. Some suggestions:
* Today I fixed the bug with "%%" and the 'error' function, so there's no need
for a FIXME or a workaround any more.
* In strings.texi, reorder the format spec description so that it matches the
textual order of a format spec. This should lessen confusion.
* Allow field numbers in a %% spec. All other components of a format spec are
allowed in %%, so odd to report an error for just field numbers.
* There is no need for a special diagnostic for field numbers greater than
PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers use.
This avoids a need for an alloca.
* Reword "Invalid field number `0'" to "Invalid format field number 0" to make
it more obvious that it's a format and there's no need to quote the 0.
Proposed further patch attached (it addresses the above points), along with a
copy of your patch rebased to current master for convenience.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Implement-field-numbers-in-format-strings.patch --]
[-- Type: text/x-patch; name="0001-Implement-field-numbers-in-format-strings.patch", Size: 10666 bytes --]
From 7f98d9c6d59472abf189ccd5a5c76415f9d1cbb3 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Wed, 31 May 2017 22:09:39 -0700
Subject: [PATCH 1/2] Implement field numbers in format strings
A field number explicitly specifies the argument to be formatted.
This is especially important for potential localization work, since
grammars of various languages dictate different word orders.
* src/editfns.c (Fformat): Update documentation.
(styled_format): Implement field numbers.
* doc/lispref/strings.texi (Formatting Strings): Document field numbers.
* lisp/emacs-lisp/bytecomp.el (byte-compile-format-warn): Adapt.
* test/src/editfns-tests.el (format-with-field): New unit test.
---
doc/lispref/strings.texi | 31 ++++++++++++++++++++++---
etc/NEWS | 3 +++
lisp/emacs-lisp/bytecomp.el | 11 ++++++---
src/editfns.c | 55 ++++++++++++++++++++++++++++++++++++++-------
test/src/editfns-tests.el | 18 +++++++++++++++
5 files changed, 104 insertions(+), 14 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 9436a96..9bf52f2 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -864,7 +864,8 @@ Formatting Strings
(format "%s" @var{arbitrary-string})
@end example
- If @var{string} contains more than one format specification, the
+ If @var{string} contains more than one format specification and none
+of the format specifications contain an explicit field number, the
format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
@@ -961,6 +962,25 @@ Formatting Strings
@end group
@end example
+@cindex field number
+ A specification can have a @dfn{field number}, which is a decimal
+number after the initial @samp{%}, followed by a literal dollar sign
+@samp{$}. If you provide a field numer, then the argument to be
+printed corresponds to the given field number instead of the next
+argument. Field numbers start at 1.
+
+You can mix specifications with and without field numbers. A
+specification without a field number that follows a specification with
+a field number will convert the argument after the one specified by
+the field number:
+
+@example
+(format "First argument %2$s, then %s, then %1$s" 1 2 3)
+ @result{} "First argument 2, then 3, then 1"
+@end example
+
+You can't use field numbers in a @samp{%%} specification.
+
@cindex field width
@cindex padding
A specification can have a @dfn{width}, which is a decimal number
@@ -996,9 +1016,14 @@ Formatting Strings
@end group
@end example
+If you want to use both a field number and a width, place the field
+number before the width. For example, in @samp{%2$7s}, @samp{2} is
+the field number and @samp{7} is the width.
+
@cindex flags in format specifications
- Immediately after the @samp{%} and before the optional width
-specifier, you can also put certain @dfn{flag characters}.
+ After the @samp{%} and before the optional width specifier, you can
+also put certain @dfn{flag characters}. The flag characters need to
+come directly after a potential field number.
The flag @samp{+} inserts a plus sign before a positive number, so
that it always has a sign. A space character as flag inserts a space
diff --git a/etc/NEWS b/etc/NEWS
index 43e7897..e31c0c7 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -364,6 +364,9 @@ large integers from being displayed as characters.
** Two new commands for finding the source code of Emacs Lisp
libraries: 'find-library-other-window' and 'find-library-other-frame'.
+** You can now provide explicit field numbers in format specifiers.
+For example, '(format "%2$s %1$s" 1 2)' produces "2 1".
+
\f
* Editing Changes in Emacs 26.1
diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index 12a7d4a..e5b9b47 100644
--- a/lisp/emacs-lisp/bytecomp.el
+++ b/lisp/emacs-lisp/bytecomp.el
@@ -1375,10 +1375,15 @@ byte-compile-format-warn
(let ((nfields (with-temp-buffer
(insert (nth 1 form))
(goto-char (point-min))
- (let ((n 0))
+ (let ((i 0) (n 0))
(while (re-search-forward "%." nil t)
- (unless (eq ?% (char-after (1+ (match-beginning 0))))
- (setq n (1+ n))))
+ (backward-char)
+ (unless (eq ?% (char-after))
+ (setq i (if (looking-at "\\([0-9]+\\)\\$")
+ (string-to-number (match-string 1) 10)
+ (1+ i))
+ n (max n i)))
+ (forward-char))
n)))
(nargs (- (length form) 2)))
(unless (= nargs nfields)
diff --git a/src/editfns.c b/src/editfns.c
index 89a6724..44341ce 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -48,6 +48,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
#include <float.h>
#include <limits.h>
+#include <c-ctype.h>
#include <intprops.h>
#include <stdlib.h>
#include <strftime.h>
@@ -3856,7 +3857,7 @@ The first argument is a format control string.
The other arguments are substituted into it to make the result, a string.
The format control string may contain %-sequences meaning to substitute
-the next available argument:
+the next available argument, or the argument explicitly specified:
%s means print a string argument. Actually, prints any object, with `princ'.
%d means print as signed number in decimal.
@@ -3873,13 +3874,17 @@ the next available argument:
The argument used for %d, %o, %x, %e, %f, %g or %c must be a number.
Use %% to put a single % into the output.
-A %-sequence may contain optional flag, width, and precision
-specifiers, as follows:
+A %-sequence may contain optional field number, flag, width, and
+precision specifiers, as follows:
- %<flags><width><precision>character
+ %<field><flags><width><precision>character
-where flags is [+ #-0]+, width is [0-9]+, and precision is a literal
-period "." followed by [0-9]+
+where field is [0-9]+ followed by a literal dollar "$", flags is
+[+ #-0]+, width is [0-9]+, and precision is a literal period "."
+followed by [0-9]+.
+
+If field is given, it must be a one-based argument number; the given
+argument is substituted instead of the next one.
The + flag character inserts a + before any positive number, while a
space inserts a space before any positive number; these flags only
@@ -4032,14 +4037,19 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
{
/* General format specifications look like
- '%' [flags] [field-width] [precision] format
+ '%' [field-number] [flags] [field-width] [precision] format
where
+ field-number ::= [0-9]+ '$'
flags ::= [-+0# ]+
field-width ::= [0-9]+
precision ::= '.' [0-9]*
+ If a field-number is specified, it specifies the argument
+ number to substitute. Otherwise, the next argument is
+ taken.
+
If a field-width is specified, it specifies to which width
the output should be padded with blanks, if the output
string is shorter than field-width.
@@ -4048,6 +4058,29 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
digits to print after the '.' for floats, or the max.
number of chars to print from a string. */
+ char *field_end;
+ uintmax_t raw_field = strtoumax (format, &field_end, 10);
+ bool has_field = false;
+ if (c_isdigit (*format) && *field_end == '$')
+ {
+ if (raw_field < 1 || raw_field >= PTRDIFF_MAX)
+ {
+ /* doprnt doesn't support %.*s, so we need to copy
+ the field number string. */
+ ptrdiff_t length = field_end - format;
+ eassert (length > 0);
+ eassert (length < PTRDIFF_MAX);
+ char *field = SAFE_ALLOCA (length + 1);
+ memcpy (field, format, length);
+ field[length] = '\0';
+ error ("Invalid field number `%s'", field);
+ }
+ has_field = true;
+ /* n is incremented below. */
+ n = raw_field - 1;
+ format = field_end + 1;
+ }
+
bool minus_flag = false;
bool plus_flag = false;
bool space_flag = false;
@@ -4090,7 +4123,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
memset (&discarded[format0 - format_start], 1,
format - format0 - (conversion == '%'));
if (conversion == '%')
- goto copy_char;
+ {
+ if (has_field)
+ /* FIXME: `error' doesn't appear to support `%%'. */
+ error ("Field number specified together with `%c' conversion",
+ '%');
+ goto copy_char;
+ }
++n;
if (! (n < nargs))
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index 8019eb0..f76c6c9 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -177,4 +177,22 @@ transpose-test-get-byte-positions
(format-time-string "%Y-%m-%d %H:%M:%S.%3N %z" nil
(concat (make-string 2048 ?X) "0")))))
+(ert-deftest format-with-field ()
+ (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3)
+ "First argument 2, then 3, then 1"))
+ (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44)
+ "a 22 33 11 \"22\" 33 44 b"))
+ (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b"))
+ (should (equal (should-error (format "a %999999$s b" 11))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %$s b" 11))
+ ;; FIXME: there shouldn't be two % in the error
+ ;; string!
+ '(error "Invalid format operation %%$")))
+ (should (equal (should-error (format "a %0$s b" 11))
+ '(error "Invalid field number `0'")))
+ (should (equal
+ (should-error (format "a %1$% %s b" 11))
+ '(error "Field number specified together with `%' conversion"))))
+
;;; editfns-tests.el ends here
--
2.9.4
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Minor-improvements-to-format-field-numbers.patch --]
[-- Type: text/x-patch; name="0002-Minor-improvements-to-format-field-numbers.patch", Size: 12995 bytes --]
From 6b7edbf14be8b18e0781559345edda7346a1e363 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 31 May 2017 22:09:39 -0700
Subject: [PATCH 2/2] Minor improvements to format field numbers
* src/editfns.c (styled_format): Allow field numbers in a %% spec.
No need for a special diagnostic for field numbers greater than
PTRDIFF_MAX. Reword diagnostic for field 0.
* test/src/editfns-tests.el (format-with-field): Adjust to match.
---
doc/lispref/strings.texi | 119 +++++++++++++++++++++-------------------------
etc/NEWS | 2 +-
src/editfns.c | 56 ++++++++--------------
test/src/editfns-tests.el | 10 ++--
4 files changed, 79 insertions(+), 108 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 9bf52f2..7577410 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -864,15 +864,6 @@ Formatting Strings
(format "%s" @var{arbitrary-string})
@end example
- If @var{string} contains more than one format specification and none
-of the format specifications contain an explicit field number, the
-format specifications correspond to successive values from
-@var{objects}. Thus, the first format specification in @var{string}
-uses the first such value, the second format specification uses the
-second such value, and so on. Any extra format specifications (those
-for which there are no corresponding values) cause an error. Any
-extra values to be formatted are ignored.
-
Certain format specifications require values of particular types. If
you supply a value that doesn't fit the requirements, an error is
signaled.
@@ -962,68 +953,33 @@ Formatting Strings
@end group
@end example
+ By default, format specifications correspond to successive values from
+@var{objects}. Thus, the first format specification in @var{string}
+uses the first such value, the second format specification uses the
+second such value, and so on. Any extra format specifications (those
+for which there are no corresponding values) cause an error. Any
+extra values to be formatted are ignored.
+
@cindex field number
- A specification can have a @dfn{field number}, which is a decimal
-number after the initial @samp{%}, followed by a literal dollar sign
-@samp{$}. If you provide a field numer, then the argument to be
-printed corresponds to the given field number instead of the next
-argument. Field numbers start at 1.
+ A format specification can have a @dfn{field number}, which is a
+decimal number immediately after the initial @samp{%}, followed by a
+literal dollar sign @samp{$}. It causes the format specification to
+convert the argument with the given number instead of the next
+argument. Argument 1 is the argument just after the format.
-You can mix specifications with and without field numbers. A
+ You can mix specifications with and without field numbers. A
specification without a field number that follows a specification with
a field number will convert the argument after the one specified by
the field number:
@example
-(format "First argument %2$s, then %s, then %1$s" 1 2 3)
- @result{} "First argument 2, then 3, then 1"
-@end example
-
-You can't use field numbers in a @samp{%%} specification.
-
-@cindex field width
-@cindex padding
- A specification can have a @dfn{width}, which is a decimal number
-between the @samp{%} and the specification character. If the printed
-representation of the object contains fewer characters than this
-width, @code{format} extends it with padding. The width specifier is
-ignored for the @samp{%%} specification. Any padding introduced by
-the width specifier normally consists of spaces inserted on the left:
-
-@example
-(format "%5d is padded on the left with spaces" 123)
- @result{} " 123 is padded on the left with spaces"
-@end example
-
-@noindent
-If the width is too small, @code{format} does not truncate the
-object's printed representation. Thus, you can use a width to specify
-a minimum spacing between columns with no risk of losing information.
-In the following two examples, @samp{%7s} specifies a minimum width
-of 7. In the first case, the string inserted in place of @samp{%7s}
-has only 3 letters, and needs 4 blank spaces as padding. In the
-second case, the string @code{"specification"} is 13 letters wide but
-is not truncated.
-
-@example
-@group
-(format "The word '%7s' has %d letters in it."
- "foo" (length "foo"))
- @result{} "The word ' foo' has 3 letters in it."
-(format "The word '%7s' has %d letters in it."
- "specification" (length "specification"))
- @result{} "The word 'specification' has 13 letters in it."
-@end group
+(format "Argument %2$s, then %s, then %1$s" "x" "y" "z")
+ @result{} "Argument y, then z, then x"
@end example
-If you want to use both a field number and a width, place the field
-number before the width. For example, in @samp{%2$7s}, @samp{2} is
-the field number and @samp{7} is the width.
-
@cindex flags in format specifications
- After the @samp{%} and before the optional width specifier, you can
-also put certain @dfn{flag characters}. The flag characters need to
-come directly after a potential field number.
+ After the @samp{%} and any field number, you can put certain
+@dfn{flag characters}.
The flag @samp{+} inserts a plus sign before a positive number, so
that it always has a sign. A space character as flag inserts a space
@@ -1048,8 +1004,8 @@ Formatting Strings
These specification characters accept the @samp{0} flag, but still pad
with @emph{spaces}.
- The flag @samp{-} causes the padding inserted by the width
-specifier, if any, to be inserted on the right rather than the left.
+ The flag @samp{-} causes any padding inserted by the width,
+if specified, to be inserted on the right rather than the left.
If both @samp{-} and @samp{0} are present, the @samp{0} flag is
ignored.
@@ -1067,9 +1023,44 @@ Formatting Strings
@end group
@end example
+@cindex field width
+@cindex padding
+ A specification can have a @dfn{width}, which is a decimal number
+that appears after any field number and flags. If the printed
+representation of the object contains fewer characters than this
+width, @code{format} extends it with padding. The width is
+ignored for the @samp{%%} specification. Any padding introduced by
+the width normally consists of spaces inserted on the left:
+
+@example
+(format "%5d is padded on the left with spaces" 123)
+ @result{} " 123 is padded on the left with spaces"
+@end example
+
+@noindent
+If the width is too small, @code{format} does not truncate the
+object's printed representation. Thus, you can use a width to specify
+a minimum spacing between columns with no risk of losing information.
+In the following two examples, @samp{%7s} specifies a minimum width
+of 7. In the first case, the string inserted in place of @samp{%7s}
+has only 3 letters, and needs 4 blank spaces as padding. In the
+second case, the string @code{"specification"} is 13 letters wide but
+is not truncated.
+
+@example
+@group
+(format "The word '%7s' has %d letters in it."
+ "foo" (length "foo"))
+ @result{} "The word ' foo' has 3 letters in it."
+(format "The word '%7s' has %d letters in it."
+ "specification" (length "specification"))
+ @result{} "The word 'specification' has 13 letters in it."
+@end group
+@end example
+
@cindex precision in format specifications
All the specification characters allow an optional @dfn{precision}
-before the character (after the width, if present). The precision is
+after the field number, flags and width, if present. The precision is
a decimal-point @samp{.} followed by a digit-string. For the
floating-point specifications (@samp{%e} and @samp{%f}), the
precision specifies how many digits following the decimal point to
diff --git a/etc/NEWS b/etc/NEWS
index e31c0c7..dfe81cd 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -365,7 +365,7 @@ large integers from being displayed as characters.
libraries: 'find-library-other-window' and 'find-library-other-frame'.
** You can now provide explicit field numbers in format specifiers.
-For example, '(format "%2$s %1$s" 1 2)' produces "2 1".
+For example, '(format "%2$s %1$s" "X" "Y")' produces "Y X".
\f
* Editing Changes in Emacs 26.1
diff --git a/src/editfns.c b/src/editfns.c
index 44341ce..98187df 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4046,9 +4046,8 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
field-width ::= [0-9]+
precision ::= '.' [0-9]*
- If a field-number is specified, it specifies the argument
- number to substitute. Otherwise, the next argument is
- taken.
+ If present, a field-number specifies the argument number
+ to substitute. Otherwise, the next argument is taken.
If a field-width is specified, it specifies to which width
the output should be padded with blanks, if the output
@@ -4058,28 +4057,20 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
digits to print after the '.' for floats, or the max.
number of chars to print from a string. */
- char *field_end;
- uintmax_t raw_field = strtoumax (format, &field_end, 10);
- bool has_field = false;
- if (c_isdigit (*format) && *field_end == '$')
- {
- if (raw_field < 1 || raw_field >= PTRDIFF_MAX)
- {
- /* doprnt doesn't support %.*s, so we need to copy
- the field number string. */
- ptrdiff_t length = field_end - format;
- eassert (length > 0);
- eassert (length < PTRDIFF_MAX);
- char *field = SAFE_ALLOCA (length + 1);
- memcpy (field, format, length);
- field[length] = '\0';
- error ("Invalid field number `%s'", field);
- }
- has_field = true;
- /* n is incremented below. */
- n = raw_field - 1;
- format = field_end + 1;
- }
+ uintmax_t num;
+ char *num_end;
+ if (c_isdigit (*format))
+ {
+ num = strtoumax (format, &num_end, 10);
+ if (*num_end == '$')
+ {
+ if (num == 0)
+ error ("Invalid format field number 0");
+ n = min (num, PTRDIFF_MAX);
+ n--;
+ format = num_end + 1;
+ }
+ }
bool minus_flag = false;
bool plus_flag = false;
@@ -4104,11 +4095,10 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
space_flag &= ! plus_flag;
zero_flag &= ! minus_flag;
- char *num_end;
- uintmax_t raw_field_width = strtoumax (format, &num_end, 10);
- if (max_bufsize <= raw_field_width)
+ num = strtoumax (format, &num_end, 10);
+ if (max_bufsize <= num)
string_overflow ();
- ptrdiff_t field_width = raw_field_width;
+ ptrdiff_t field_width = num;
bool precision_given = *num_end == '.';
uintmax_t precision = (precision_given
@@ -4123,13 +4113,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
memset (&discarded[format0 - format_start], 1,
format - format0 - (conversion == '%'));
if (conversion == '%')
- {
- if (has_field)
- /* FIXME: `error' doesn't appear to support `%%'. */
- error ("Field number specified together with `%c' conversion",
- '%');
- goto copy_char;
- }
+ goto copy_char;
++n;
if (! (n < nargs))
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index f76c6c9..c5923aa 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -186,13 +186,9 @@ transpose-test-get-byte-positions
(should (equal (should-error (format "a %999999$s b" 11))
'(error "Not enough arguments for format string")))
(should (equal (should-error (format "a %$s b" 11))
- ;; FIXME: there shouldn't be two % in the error
- ;; string!
- '(error "Invalid format operation %%$")))
+ '(error "Invalid format operation %$")))
(should (equal (should-error (format "a %0$s b" 11))
- '(error "Invalid field number `0'")))
- (should (equal
- (should-error (format "a %1$% %s b" 11))
- '(error "Field number specified together with `%' conversion"))))
+ '(error "Invalid format field number 0")))
+ (should (equal (format "a %1$% %s b" 11) "a % 11 b")))
;;; editfns-tests.el ends here
--
2.9.4
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-01 5:18 ` Paul Eggert
@ 2017-06-01 8:17 ` Philipp Stephani
2017-06-01 23:20 ` Paul Eggert
0 siblings, 1 reply; 40+ messages in thread
From: Philipp Stephani @ 2017-06-01 8:17 UTC (permalink / raw)
To: Paul Eggert, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Do., 1. Juni 2017 um 07:18 Uhr:
> Thanks for that patch: it's a good move forward for i18n. Some suggestions:
>
> * Today I fixed the bug with "%%" and the 'error' function, so there's no
> need
> for a FIXME or a workaround any more.
>
> * In strings.texi, reorder the format spec description so that it matches
> the
> textual order of a format spec. This should lessen confusion.
>
> * Allow field numbers in a %% spec. All other components of a format spec
> are
> allowed in %%, so odd to report an error for just field numbers.
>
The reason I banned that initially is that the behavior for the case "%1$%
%d" is confusing: will the %d take argument 1 or 2? (We should ban such
mixing instead, see below.)
>
> * There is no need for a special diagnostic for field numbers greater than
> PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers
> use.
> This avoids a need for an alloca.
>
> * Reword "Invalid field number `0'" to "Invalid format field number 0" to
> make
> it more obvious that it's a format and there's no need to quote the 0.
>
> Proposed further patch attached (it addresses the above points), along
> with a
> copy of your patch rebased to current master for convenience.
>
Thanks, feel free to push.
Two further things:
- Probably there's a bug lurking because the info[n] ought to be indexed by
specification index, not argument index. Something like (format "%1$c %1$d"
?a) will probably do the wrong thing (untested).
- We should ban mixing explicit and implicit field numbers, like POSIX
printf(3) does. The gain from allowing to mix is negligible, and it makes
the implementation and the documentation needlessly complex.
[-- Attachment #2: Type: text/html, Size: 2314 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-31 22:18 ` Philipp Stephani
2017-05-31 22:29 ` Jean-Christophe Helary
2017-06-01 5:18 ` Paul Eggert
@ 2017-06-01 14:21 ` Eli Zaretskii
2017-07-02 1:22 ` Jean-Christophe Helary
3 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2017-06-01 14:21 UTC (permalink / raw)
To: Philipp Stephani; +Cc: jean.christophe.helary, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 31 May 2017 22:18:11 +0000
>
> +@cindex field number
I think this should say "field numbers in format spec"
> + A specification can have a @dfn{field number}, which is a decimal
> +number after the initial @samp{%}, followed by a literal dollar sign
> +@samp{$}. If you provide a field numer, then the argument to be
^^^^^
A typo.
Thank you for working on this valuable improvement.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-01 8:17 ` Philipp Stephani
@ 2017-06-01 23:20 ` Paul Eggert
2017-06-02 6:52 ` Philipp Stephani
0 siblings, 1 reply; 40+ messages in thread
From: Paul Eggert @ 2017-06-01 23:20 UTC (permalink / raw)
To: Philipp Stephani, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]
On 06/01/2017 01:17 AM, Philipp Stephani wrote:
>
> - Probably there's a bug lurking because the info[n] ought to be
> indexed by specification index, not argument index. Something like
> (format "%1$c %1$d" ?a) will probably do the wrong thing (untested).
Sorry, I'm not following. That call returns "a 97"; isn't that the
expected result?
> - We should ban mixing explicit and implicit field numbers, like POSIX
> printf(3) does. The gain from allowing to mix is negligible, and it
> makes the implementation and the documentation needlessly complex.
Sounds good, and I installed the attached.
The 1st patch fixes a performance regression introduced by calling
strtoumax. I went whole-hog and removed all calls to strtoumax, since
they're all performance-significant, plus it makes for one less porting
issue to worry about.
The 2nd patch fixes the documentation along the lines that you
suggested. And on further thought, the tradition for Emacs is to
document supported behavior and not worry about slowing Emacs down to
check for undocumented usage (aside from preventing crashes), so with
that in mind the 2nd patch removes the check for %0$ (which never crashes).
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-performance-by-avoiding-strtoumax.patch --]
[-- Type: text/x-patch; name="0001-Improve-performance-by-avoiding-strtoumax.patch", Size: 21709 bytes --]
From b126485b8ce3046ca47caaba2f845a39604dd76f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 1 Jun 2017 16:03:12 -0700
Subject: [PATCH 1/2] Improve performance by avoiding strtoumax
This made (string-to-number "10") 20% faster on my old desktop,
an AMD Phenom II X4 910e running Fedora 25 x86-64.
* admin/merge-gnulib (GNULIB_MODULES): Remove strtoumax.
* lib/gnulib.mk.in, m4/gnulib-comp.m4: Regenerate.
* lib/strtoul.c, lib/strtoull.c, lib/strtoumax.c, m4/strtoull.m4:
* m4/strtoumax.m4: Remove.
* src/editfns.c (str2num): New function.
(styled_format): Use it instead of strtoumax. Use ptrdiff_t
instead of uintmax_t. Check for integer overflow.
* src/lread.c (LEAD_INT, DOT_CHAR, TRAIL_INT, E_EXP):
Move to private scope and make them enums.
(string_to_number): Compute integer value directly during
first pass instead of revisiting it with strtoumax later.
---
admin/merge-gnulib | 2 +-
lib/gnulib.mk.in | 27 +------------------------
lib/strtoul.c | 19 ------------------
lib/strtoull.c | 26 ------------------------
lib/strtoumax.c | 2 --
m4/gnulib-comp.m4 | 30 ---------------------------
m4/strtoull.m4 | 24 ----------------------
m4/strtoumax.m4 | 28 --------------------------
src/editfns.c | 43 ++++++++++++++++++++++++++++-----------
src/lread.c | 59 +++++++++++++++++++++++-------------------------------
10 files changed, 58 insertions(+), 202 deletions(-)
delete mode 100644 lib/strtoul.c
delete mode 100644 lib/strtoull.c
delete mode 100644 lib/strtoumax.c
delete mode 100644 m4/strtoull.m4
delete mode 100644 m4/strtoumax.m4
diff --git a/admin/merge-gnulib b/admin/merge-gnulib
index 45e4a78..e5fb0f5 100755
--- a/admin/merge-gnulib
+++ b/admin/merge-gnulib
@@ -38,7 +38,7 @@ GNULIB_MODULES=
manywarnings memrchr mkostemp mktime
pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat
sig2str socklen stat-time std-gnu11 stdalign stddef stdio
- stpcpy strftime strtoimax strtoumax symlink sys_stat
+ stpcpy strftime strtoimax symlink sys_stat
sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub
update-copyright utimens
vla warnings
diff --git a/lib/gnulib.mk.in b/lib/gnulib.mk.in
index d23c2a5..73d3043 100644
--- a/lib/gnulib.mk.in
+++ b/lib/gnulib.mk.in
@@ -21,7 +21,7 @@
# the same distribution terms as the rest of that program.
#
# Generated by gnulib-tool.
-# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stat --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=tzset --avoid=unsetenv --avoid=utime --avoid=utime-h --gnu-make --makefile-name=gnulib.mk.in --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-leading-zeros count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode filevercmp flexmember fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog ignore-value intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time std-gnu11 stdalign stddef stdio stpcpy strftime strtoimax strtoumax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub update-copyright utimens vla warnings
+# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stat --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=tzset --avoid=unsetenv --avoid=utime --avoid=utime-h --gnu-make --makefile-name=gnulib.mk.in --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-leading-zeros count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode filevercmp flexmember fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog ignore-value intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time std-gnu11 stdalign stddef stdio stpcpy strftime strtoimax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub update-copyright utimens vla warnings
MOSTLYCLEANFILES += core *.stackdump
@@ -905,7 +905,6 @@ gl_GNULIB_ENABLED_getdtablesize = @gl_GNULIB_ENABLED_getdtablesize@
gl_GNULIB_ENABLED_getgroups = @gl_GNULIB_ENABLED_getgroups@
gl_GNULIB_ENABLED_secure_getenv = @gl_GNULIB_ENABLED_secure_getenv@
gl_GNULIB_ENABLED_strtoll = @gl_GNULIB_ENABLED_strtoll@
-gl_GNULIB_ENABLED_strtoull = @gl_GNULIB_ENABLED_strtoull@
gl_GNULIB_ENABLED_tempname = @gl_GNULIB_ENABLED_tempname@
gl_LIBOBJS = @gl_LIBOBJS@
gl_LTLIBOBJS = @gl_LTLIBOBJS@
@@ -2507,30 +2506,6 @@ EXTRA_libgnu_a_SOURCES += strtol.c strtoll.c
endif
## end gnulib module strtoll
-## begin gnulib module strtoull
-ifeq (,$(OMIT_GNULIB_MODULE_strtoull))
-
-ifneq (,$(gl_GNULIB_ENABLED_strtoull))
-
-endif
-EXTRA_DIST += strtol.c strtoul.c strtoull.c
-
-EXTRA_libgnu_a_SOURCES += strtol.c strtoul.c strtoull.c
-
-endif
-## end gnulib module strtoull
-
-## begin gnulib module strtoumax
-ifeq (,$(OMIT_GNULIB_MODULE_strtoumax))
-
-
-EXTRA_DIST += strtoimax.c strtoumax.c
-
-EXTRA_libgnu_a_SOURCES += strtoimax.c strtoumax.c
-
-endif
-## end gnulib module strtoumax
-
## begin gnulib module symlink
ifeq (,$(OMIT_GNULIB_MODULE_symlink))
diff --git a/lib/strtoul.c b/lib/strtoul.c
deleted file mode 100644
index c4974e0..0000000
--- a/lib/strtoul.c
+++ /dev/null
@@ -1,19 +0,0 @@
-/* Copyright (C) 1991, 1997, 2009-2017 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#define UNSIGNED 1
-
-#include "strtol.c"
diff --git a/lib/strtoull.c b/lib/strtoull.c
deleted file mode 100644
index 51ae3ac..0000000
--- a/lib/strtoull.c
+++ /dev/null
@@ -1,26 +0,0 @@
-/* Function to parse an 'unsigned long long int' from text.
- Copyright (C) 1995-1997, 1999, 2009-2017 Free Software Foundation, Inc.
- NOTE: The canonical source of this file is maintained with the GNU C
- Library. Bugs can be reported to bug-glibc@gnu.org.
-
- This program is free software: you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; either version 3 of the License, or any
- later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#define QUAD 1
-
-#include "strtoul.c"
-
-#ifdef _LIBC
-strong_alias (__strtoull_internal, __strtouq_internal)
-weak_alias (strtoull, strtouq)
-#endif
diff --git a/lib/strtoumax.c b/lib/strtoumax.c
deleted file mode 100644
index dc395d6..0000000
--- a/lib/strtoumax.c
+++ /dev/null
@@ -1,2 +0,0 @@
-#define UNSIGNED 1
-#include "strtoimax.c"
diff --git a/m4/gnulib-comp.m4 b/m4/gnulib-comp.m4
index 3f196d4..8f53a99 100644
--- a/m4/gnulib-comp.m4
+++ b/m4/gnulib-comp.m4
@@ -140,8 +140,6 @@ AC_DEFUN
# Code from module string:
# Code from module strtoimax:
# Code from module strtoll:
- # Code from module strtoull:
- # Code from module strtoumax:
# Code from module symlink:
# Code from module sys_select:
# Code from module sys_stat:
@@ -364,12 +362,6 @@ AC_DEFUN
gl_PREREQ_STRTOIMAX
fi
gl_INTTYPES_MODULE_INDICATOR([strtoimax])
- gl_FUNC_STRTOUMAX
- if test $HAVE_DECL_STRTOUMAX = 0 || test $REPLACE_STRTOUMAX = 1; then
- AC_LIBOBJ([strtoumax])
- gl_PREREQ_STRTOUMAX
- fi
- gl_INTTYPES_MODULE_INDICATOR([strtoumax])
gl_FUNC_SYMLINK
if test $HAVE_SYMLINK = 0 || test $REPLACE_SYMLINK = 1; then
AC_LIBOBJ([symlink])
@@ -420,7 +412,6 @@ AC_DEFUN
gl_gnulib_enabled_6099e9737f757db36c47fa9d9f02e88c=false
gl_gnulib_enabled_secure_getenv=false
gl_gnulib_enabled_strtoll=false
- gl_gnulib_enabled_strtoull=false
gl_gnulib_enabled_tempname=false
gl_gnulib_enabled_682e609604ccaac6be382e4ee3a4eaec=false
func_gl_gnulib_m4code_260941c0e5dc67ec9e87d1fb321c300b ()
@@ -569,18 +560,6 @@ AC_DEFUN
gl_gnulib_enabled_strtoll=true
fi
}
- func_gl_gnulib_m4code_strtoull ()
- {
- if ! $gl_gnulib_enabled_strtoull; then
- gl_FUNC_STRTOULL
- if test $HAVE_STRTOULL = 0; then
- AC_LIBOBJ([strtoull])
- gl_PREREQ_STRTOULL
- fi
- gl_STDLIB_MODULE_INDICATOR([strtoull])
- gl_gnulib_enabled_strtoull=true
- fi
- }
func_gl_gnulib_m4code_tempname ()
{
if ! $gl_gnulib_enabled_tempname; then
@@ -649,9 +628,6 @@ AC_DEFUN
if { test $HAVE_DECL_STRTOIMAX = 0 || test $REPLACE_STRTOIMAX = 1; } && test $ac_cv_type_long_long_int = yes; then
func_gl_gnulib_m4code_strtoll
fi
- if { test $HAVE_DECL_STRTOUMAX = 0 || test $REPLACE_STRTOUMAX = 1; } && test $ac_cv_type_unsigned_long_long_int = yes; then
- func_gl_gnulib_m4code_strtoull
- fi
if test $HAVE_TIMEGM = 0 || test $REPLACE_TIMEGM = 1; then
func_gl_gnulib_m4code_5264294aa0a5557541b53c8c741f7f31
fi
@@ -670,7 +646,6 @@ AC_DEFUN
AM_CONDITIONAL([gl_GNULIB_ENABLED_6099e9737f757db36c47fa9d9f02e88c], [$gl_gnulib_enabled_6099e9737f757db36c47fa9d9f02e88c])
AM_CONDITIONAL([gl_GNULIB_ENABLED_secure_getenv], [$gl_gnulib_enabled_secure_getenv])
AM_CONDITIONAL([gl_GNULIB_ENABLED_strtoll], [$gl_gnulib_enabled_strtoll])
- AM_CONDITIONAL([gl_GNULIB_ENABLED_strtoull], [$gl_gnulib_enabled_strtoull])
AM_CONDITIONAL([gl_GNULIB_ENABLED_tempname], [$gl_gnulib_enabled_tempname])
AM_CONDITIONAL([gl_GNULIB_ENABLED_682e609604ccaac6be382e4ee3a4eaec], [$gl_gnulib_enabled_682e609604ccaac6be382e4ee3a4eaec])
# End of code from modules
@@ -940,9 +915,6 @@ AC_DEFUN
lib/strtoimax.c
lib/strtol.c
lib/strtoll.c
- lib/strtoul.c
- lib/strtoull.c
- lib/strtoumax.c
lib/symlink.c
lib/sys_select.in.h
lib/sys_stat.in.h
@@ -1051,8 +1023,6 @@ AC_DEFUN
m4/string_h.m4
m4/strtoimax.m4
m4/strtoll.m4
- m4/strtoull.m4
- m4/strtoumax.m4
m4/symlink.m4
m4/sys_select_h.m4
m4/sys_socket_h.m4
diff --git a/m4/strtoull.m4 b/m4/strtoull.m4
deleted file mode 100644
index c6b2150..0000000
--- a/m4/strtoull.m4
+++ /dev/null
@@ -1,24 +0,0 @@
-# strtoull.m4 serial 7
-dnl Copyright (C) 2002, 2004, 2006, 2008-2017 Free Software Foundation, Inc.
-dnl This file is free software; the Free Software Foundation
-dnl gives unlimited permission to copy and/or distribute it,
-dnl with or without modifications, as long as this notice is preserved.
-
-AC_DEFUN([gl_FUNC_STRTOULL],
-[
- AC_REQUIRE([gl_STDLIB_H_DEFAULTS])
- dnl We don't need (and can't compile) the replacement strtoull
- dnl unless the type 'unsigned long long int' exists.
- AC_REQUIRE([AC_TYPE_UNSIGNED_LONG_LONG_INT])
- if test "$ac_cv_type_unsigned_long_long_int" = yes; then
- AC_CHECK_FUNCS([strtoull])
- if test $ac_cv_func_strtoull = no; then
- HAVE_STRTOULL=0
- fi
- fi
-])
-
-# Prerequisites of lib/strtoull.c.
-AC_DEFUN([gl_PREREQ_STRTOULL], [
- :
-])
diff --git a/m4/strtoumax.m4 b/m4/strtoumax.m4
deleted file mode 100644
index 43ef5b5..0000000
--- a/m4/strtoumax.m4
+++ /dev/null
@@ -1,28 +0,0 @@
-# strtoumax.m4 serial 12
-dnl Copyright (C) 2002-2004, 2006, 2009-2017 Free Software Foundation, Inc.
-dnl This file is free software; the Free Software Foundation
-dnl gives unlimited permission to copy and/or distribute it,
-dnl with or without modifications, as long as this notice is preserved.
-
-AC_DEFUN([gl_FUNC_STRTOUMAX],
-[
- AC_REQUIRE([gl_INTTYPES_H_DEFAULTS])
-
- dnl On OSF/1 5.1 with cc, this function is declared but not defined.
- AC_CHECK_FUNCS_ONCE([strtoumax])
- AC_CHECK_DECLS_ONCE([strtoumax])
- if test "$ac_cv_have_decl_strtoumax" = yes; then
- if test "$ac_cv_func_strtoumax" != yes; then
- # HP-UX 11.11 has "#define strtoimax(...) ..." but no function.
- REPLACE_STRTOUMAX=1
- fi
- else
- HAVE_DECL_STRTOUMAX=0
- fi
-])
-
-# Prerequisites of lib/strtoumax.c.
-AC_DEFUN([gl_PREREQ_STRTOUMAX], [
- AC_CHECK_DECLS([strtoull])
- AC_REQUIRE([AC_TYPE_UNSIGNED_LONG_LONG_INT])
-])
diff --git a/src/editfns.c b/src/editfns.c
index 98187df..1dbae8f 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -3851,6 +3851,23 @@ usage: (propertize STRING &rest PROPERTIES) */)
return string;
}
+/* Convert the prefix of STR from ASCII decimal digits to a number.
+ Set *STR_END to the address of the first non-digit. Return the
+ number, or PTRDIFF_MAX on overflow. Return 0 if there is no number.
+ This is like strtol for ptrdiff_t and base 10 and C locale,
+ except without negative numbers or errno. */
+
+static ptrdiff_t
+str2num (char *str, char **str_end)
+{
+ ptrdiff_t n = 0;
+ for (; c_isdigit (*str); str++)
+ if (INT_MULTIPLY_WRAPV (n, 10, &n) || INT_ADD_WRAPV (n, *str - '0', &n))
+ n = PTRDIFF_MAX;
+ *str_end = str;
+ return n;
+}
+
DEFUN ("format", Fformat, Sformat, 1, MANY, 0,
doc: /* Format a string out of a format-string and arguments.
The first argument is a format control string.
@@ -4057,17 +4074,16 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
digits to print after the '.' for floats, or the max.
number of chars to print from a string. */
- uintmax_t num;
+ ptrdiff_t num;
char *num_end;
if (c_isdigit (*format))
{
- num = strtoumax (format, &num_end, 10);
+ num = str2num (format, &num_end);
if (*num_end == '$')
{
if (num == 0)
error ("Invalid format field number 0");
- n = min (num, PTRDIFF_MAX);
- n--;
+ n = num - 1;
format = num_end + 1;
}
}
@@ -4095,15 +4111,15 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
space_flag &= ! plus_flag;
zero_flag &= ! minus_flag;
- num = strtoumax (format, &num_end, 10);
+ num = str2num (format, &num_end);
if (max_bufsize <= num)
string_overflow ();
ptrdiff_t field_width = num;
bool precision_given = *num_end == '.';
- uintmax_t precision = (precision_given
- ? strtoumax (num_end + 1, &num_end, 10)
- : UINTMAX_MAX);
+ ptrdiff_t precision = (precision_given
+ ? str2num (num_end + 1, &num_end)
+ : PTRDIFF_MAX);
format = num_end;
if (format == end)
@@ -4176,7 +4192,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
/* handle case (precision[n] >= 0) */
ptrdiff_t prec = -1;
- if (precision_given && precision <= TYPE_MAXIMUM (ptrdiff_t))
+ if (precision_given)
prec = precision;
/* lisp_string_width ignores a precision of 0, but GNU
@@ -4424,8 +4440,9 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
padding and excess precision. Deal with excess precision
first. This happens only when the format specifies
ridiculously large precision. */
- uintmax_t excess_precision = precision - prec;
- uintmax_t leading_zeros = 0, trailing_zeros = 0;
+ ptrdiff_t excess_precision
+ = precision_given ? precision - prec : 0;
+ ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
if (excess_precision)
{
if (float_conversion)
@@ -4451,7 +4468,9 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
/* Compute the total bytes needed for this item, including
excess precision and padding. */
- uintmax_t numwidth = sprintf_bytes + excess_precision;
+ ptrdiff_t numwidth;
+ if (INT_ADD_WRAPV (sprintf_bytes, excess_precision, &numwidth))
+ numwidth = PTRDIFF_MAX;
ptrdiff_t padding
= numwidth < field_width ? field_width - numwidth : 0;
if (max_bufsize - sprintf_bytes <= excess_precision
diff --git a/src/lread.c b/src/lread.c
index 368b86e..f849398 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3495,25 +3495,18 @@ substitute_in_interval (INTERVAL interval, Lisp_Object arg)
}
\f
-#define LEAD_INT 1
-#define DOT_CHAR 2
-#define TRAIL_INT 4
-#define E_EXP 16
-
-
-/* Convert STRING to a number, assuming base BASE. Return a fixnum if CP has
- integer syntax and fits in a fixnum, else return the nearest float if CP has
- either floating point or integer syntax and BASE is 10, else return nil. If
- IGNORE_TRAILING, consider just the longest prefix of CP that has
- valid floating point syntax. Signal an overflow if BASE is not 10 and the
- number has integer syntax but does not fit. */
+/* Convert STRING to a number, assuming base BASE. Return a fixnum if
+ STRING has integer syntax and fits in a fixnum, else return the
+ nearest float if STRING has either floating point or integer syntax
+ and BASE is 10, else return nil. If IGNORE_TRAILING, consider just
+ the longest prefix of STRING that has valid floating point syntax.
+ Signal an overflow if BASE is not 10 and the number has integer
+ syntax but does not fit. */
Lisp_Object
string_to_number (char const *string, int base, bool ignore_trailing)
{
- int state;
char const *cp = string;
- int leading_digit;
bool float_syntax = 0;
double value = 0;
@@ -3525,15 +3518,23 @@ string_to_number (char const *string, int base, bool ignore_trailing)
bool signedp = negative || *cp == '+';
cp += signedp;
- state = 0;
-
- leading_digit = digit_to_number (*cp, base);
+ enum { INTOVERFLOW = 1, LEAD_INT = 2, DOT_CHAR = 4, TRAIL_INT = 8,
+ E_EXP = 16 };
+ int state = 0;
+ int leading_digit = digit_to_number (*cp, base);
+ uintmax_t n = leading_digit;
if (leading_digit >= 0)
{
state |= LEAD_INT;
- do
- ++cp;
- while (digit_to_number (*cp, base) >= 0);
+ for (int digit; 0 <= (digit = digit_to_number (*++cp, base)); )
+ {
+ if (INT_MULTIPLY_OVERFLOW (n, base))
+ state |= INTOVERFLOW;
+ n *= base;
+ if (INT_ADD_OVERFLOW (n, digit))
+ state |= INTOVERFLOW;
+ n += digit;
+ }
}
if (*cp == '.')
{
@@ -3583,32 +3584,22 @@ string_to_number (char const *string, int base, bool ignore_trailing)
}
float_syntax = ((state & (DOT_CHAR|TRAIL_INT)) == (DOT_CHAR|TRAIL_INT)
- || state == (LEAD_INT|E_EXP));
+ || (state & ~INTOVERFLOW) == (LEAD_INT|E_EXP));
}
/* Return nil if the number uses invalid syntax. If IGNORE_TRAILING, accept
any prefix that matches. Otherwise, the entire string must match. */
if (! (ignore_trailing
? ((state & LEAD_INT) != 0 || float_syntax)
- : (!*cp && ((state & ~DOT_CHAR) == LEAD_INT || float_syntax))))
+ : (!*cp && ((state & ~(INTOVERFLOW | DOT_CHAR)) == LEAD_INT
+ || float_syntax))))
return Qnil;
/* If the number uses integer and not float syntax, and is in C-language
range, use its value, preferably as a fixnum. */
if (leading_digit >= 0 && ! float_syntax)
{
- uintmax_t n;
-
- /* Fast special case for single-digit integers. This also avoids a
- glitch when BASE is 16 and IGNORE_TRAILING, because in that
- case some versions of strtoumax accept numbers like "0x1" that Emacs
- does not allow. */
- if (digit_to_number (string[signedp + 1], base) < 0)
- return make_number (negative ? -leading_digit : leading_digit);
-
- errno = 0;
- n = strtoumax (string + signedp, NULL, base);
- if (errno == ERANGE)
+ if (state & INTOVERFLOW)
{
/* Unfortunately there's no simple and accurate way to convert
non-base-10 numbers that are out of C-language range. */
--
2.9.4
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Limit-format-fields-to-more-POSIX-like-spec.patch --]
[-- Type: text/x-patch; name="0002-Limit-format-fields-to-more-POSIX-like-spec.patch", Size: 5754 bytes --]
From 4c9bbc524171fa1022e0cbf4516827be98ae08a5 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 1 Jun 2017 16:03:12 -0700
Subject: [PATCH 2/2] Limit format fields to more POSIX-like spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* doc/lispref/strings.texi (Formatting Strings):
Don’t allow mixing numbered with unnumbered format specs.
* src/editfns.c (styled_format): Don’t bother checking for field 0,
since it doesn’t crash and the behavior is not specified.
* test/src/editfns-tests.el (format-with-field): Adjust tests to
match current doc. Add more tests for out-of-range fields.
---
doc/lispref/strings.texi | 13 +++++--------
src/editfns.c | 8 ++++----
test/src/editfns-tests.el | 28 ++++++++++++++++++++++------
3 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 4d33e55..e80e778 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -965,16 +965,13 @@ Formatting Strings
decimal number immediately after the initial @samp{%}, followed by a
literal dollar sign @samp{$}. It causes the format specification to
convert the argument with the given number instead of the next
-argument. Argument 1 is the argument just after the format.
-
- You can mix specifications with and without field numbers. A
-specification without a field number that follows a specification with
-a field number will convert the argument after the one specified by
-the field number:
+argument. Field numbers start at 1. A format can contain either
+numbered or unnumbered format specifications but not both, except that
+@samp{%%} can be mixed with numbered specifications.
@example
-(format "Argument %2$s, then %s, then %1$s" "x" "y" "z")
- @result{} "Argument y, then z, then x"
+(format "%2$s, %3$s, %%, %1$s" "x" "y" "z")
+ @result{} "y, z, %, x"
@end example
@cindex flags in format specifications
diff --git a/src/editfns.c b/src/editfns.c
index 1dbae8f..29af25a 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -3900,8 +3900,10 @@ where field is [0-9]+ followed by a literal dollar "$", flags is
[+ #-0]+, width is [0-9]+, and precision is a literal period "."
followed by [0-9]+.
-If field is given, it must be a one-based argument number; the given
-argument is substituted instead of the next one.
+If a %-sequence is numbered with a field with positive value N, the
+Nth argument is substituted instead of the next one. A format can
+contain either numbered or unnumbered %-sequences but not both, except
+that %% can be mixed with numbered %-sequences.
The + flag character inserts a + before any positive number, while a
space inserts a space before any positive number; these flags only
@@ -4081,8 +4083,6 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
num = str2num (format, &num_end);
if (*num_end == '$')
{
- if (num == 0)
- error ("Invalid format field number 0");
n = num - 1;
format = num_end + 1;
}
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index c5923aa..54fb743 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -178,17 +178,33 @@ transpose-test-get-byte-positions
(concat (make-string 2048 ?X) "0")))))
(ert-deftest format-with-field ()
- (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3)
+ (should (equal (format "First argument %2$s, then %3$s, then %1$s" 1 2 3)
"First argument 2, then 3, then 1"))
- (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44)
+ (should (equal (format "a %2$s %3$d %1$d %2$S %3$d %4$d b" 11 "22" 33 44)
"a 22 33 11 \"22\" 33 44 b"))
- (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b"))
+ (should (equal (format "a %08$s %0000000000000000009$s b" 1 2 3 4 5 6 7 8 9)
+ "a 8 9 b"))
(should (equal (should-error (format "a %999999$s b" 11))
'(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %2147483647$s b"))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %9223372036854775807$s b"))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %9223372036854775808$s b"))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %18446744073709551615$s b"))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error (format "a %18446744073709551616$s b"))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error
+ (format (format "a %%%d$d b" most-positive-fixnum)))
+ '(error "Not enough arguments for format string")))
+ (should (equal (should-error
+ (format (format "a %%%d$d b" (+ 1.0 most-positive-fixnum))))
+ '(error "Not enough arguments for format string")))
(should (equal (should-error (format "a %$s b" 11))
'(error "Invalid format operation %$")))
- (should (equal (should-error (format "a %0$s b" 11))
- '(error "Invalid format field number 0")))
- (should (equal (format "a %1$% %s b" 11) "a % 11 b")))
+ (should (equal (should-error (format "a %-1$s b" 11))
+ '(error "Invalid format operation %$"))))
;;; editfns-tests.el ends here
--
2.9.4
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-01 23:20 ` Paul Eggert
@ 2017-06-02 6:52 ` Philipp Stephani
2017-06-03 8:37 ` Paul Eggert
0 siblings, 1 reply; 40+ messages in thread
From: Philipp Stephani @ 2017-06-02 6:52 UTC (permalink / raw)
To: Paul Eggert, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 740 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Fr., 2. Juni 2017 um 01:20 Uhr:
> On 06/01/2017 01:17 AM, Philipp Stephani wrote:
> >
> > - Probably there's a bug lurking because the info[n] ought to be
> > indexed by specification index, not argument index. Something like
> > (format "%1$c %1$d" ?a) will probably do the wrong thing (untested).
>
> Sorry, I'm not following. That call returns "a 97"; isn't that the
> expected result?
>
Wrong example, try (format "%1$c %1$s" ?±)
> And on further thought, the tradition for Emacs is to
> document supported behavior and not worry about slowing Emacs down to
> check for undocumented usage
Would be great to break that tradition, but that's for another discussion.
[-- Attachment #2: Type: text/html, Size: 1227 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-02 6:52 ` Philipp Stephani
@ 2017-06-03 8:37 ` Paul Eggert
2017-06-03 9:12 ` Andreas Schwab
2017-06-03 9:34 ` Philipp Stephani
0 siblings, 2 replies; 40+ messages in thread
From: Paul Eggert @ 2017-06-03 8:37 UTC (permalink / raw)
To: Philipp Stephani, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 526 bytes --]
Philipp Stephani wrote:
> Wrong example, try (format "%1$c %1$s" ?±)
Ouch. Fixing that (without adversely affecting performance) would be a bit of a
hassle. Not sure that it's worth it. For now let's just document the limitation.
I installed the attached.
>> And on further thought, the tradition for Emacs is to
>> document supported behavior and not worry about slowing Emacs down to
>> check for undocumented usage
>
> Would be great to break that tradition, but that's for another discussion.
Indeed.
[-- Attachment #2: 0001-Document-uniqueness-limitation-of-format.txt --]
[-- Type: text/plain, Size: 2443 bytes --]
From 0147cdd4d96f1eaeef720ee0b89bddd27eaf4233 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 3 Jun 2017 01:31:04 -0700
Subject: [PATCH] =?UTF-8?q?Document=20uniqueness=20limitation=20of=20?=
=?UTF-8?q?=E2=80=98format=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* doc/lispref/strings.texi (Formatting Strings):
* src/editfns.c (Fformat):
Document that field numbers should be unique within a format.
---
doc/lispref/strings.texi | 7 ++++---
src/editfns.c | 7 ++++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index e80e778..f365c80 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -965,9 +965,10 @@ Formatting Strings
decimal number immediately after the initial @samp{%}, followed by a
literal dollar sign @samp{$}. It causes the format specification to
convert the argument with the given number instead of the next
-argument. Field numbers start at 1. A format can contain either
-numbered or unnumbered format specifications but not both, except that
-@samp{%%} can be mixed with numbered specifications.
+argument. Field numbers start at 1. A field number should differ
+from the other field numbers in the same format. A format can contain
+either numbered or unnumbered format specifications but not both,
+except that @samp{%%} can be mixed with numbered specifications.
@example
(format "%2$s, %3$s, %%, %1$s" "x" "y" "z")
diff --git a/src/editfns.c b/src/editfns.c
index 29af25a..a5088b0 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -3901,9 +3901,10 @@ where field is [0-9]+ followed by a literal dollar "$", flags is
followed by [0-9]+.
If a %-sequence is numbered with a field with positive value N, the
-Nth argument is substituted instead of the next one. A format can
-contain either numbered or unnumbered %-sequences but not both, except
-that %% can be mixed with numbered %-sequences.
+Nth argument is substituted instead of the next one. A field number
+should differ from the other field numbers in the same format. A
+format can contain either numbered or unnumbered %-sequences but not
+both, except that %% can be mixed with numbered %-sequences.
The + flag character inserts a + before any positive number, while a
space inserts a space before any positive number; these flags only
--
2.7.4
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-03 8:37 ` Paul Eggert
@ 2017-06-03 9:12 ` Andreas Schwab
2017-06-03 9:34 ` Philipp Stephani
1 sibling, 0 replies; 40+ messages in thread
From: Andreas Schwab @ 2017-06-03 9:12 UTC (permalink / raw)
To: Paul Eggert; +Cc: Philipp Stephani, Jean-Christophe Helary, emacs-devel
On Jun 03 2017, Paul Eggert <eggert@cs.ucla.edu> wrote:
> diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
> index e80e778..f365c80 100644
> --- a/doc/lispref/strings.texi
> +++ b/doc/lispref/strings.texi
> @@ -965,9 +965,10 @@ Formatting Strings
> decimal number immediately after the initial @samp{%}, followed by a
> literal dollar sign @samp{$}. It causes the format specification to
> convert the argument with the given number instead of the next
> -argument. Field numbers start at 1. A format can contain either
> -numbered or unnumbered format specifications but not both, except that
> -@samp{%%} can be mixed with numbered specifications.
> +argument. Field numbers start at 1. A field number should differ
> +from the other field numbers in the same format.
printf allows using an argument multiple times, Emacs should do as well.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-03 8:37 ` Paul Eggert
2017-06-03 9:12 ` Andreas Schwab
@ 2017-06-03 9:34 ` Philipp Stephani
2017-06-04 15:54 ` Paul Eggert
1 sibling, 1 reply; 40+ messages in thread
From: Philipp Stephani @ 2017-06-03 9:34 UTC (permalink / raw)
To: Paul Eggert, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 572 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Sa., 3. Juni 2017 um 10:37 Uhr:
> Philipp Stephani wrote:
> > Wrong example, try (format "%1$c %1$s" ?±)
>
> Ouch. Fixing that (without adversely affecting performance) would be a bit
> of a
> hassle. Not sure that it's worth it. For now let's just document the
> limitation.
> I installed the attached.
>
It's not hard to fix, just the info array needs to be indexed by spec
index, no argument. Installed 7d413cb4da89e0bdd70068e6a5e1dbc57190ed10;
0147cdd4d96f1eaeef720ee0b89bddd27eaf4233 can now be reverted.
[-- Attachment #2: Type: text/html, Size: 890 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-03 9:34 ` Philipp Stephani
@ 2017-06-04 15:54 ` Paul Eggert
2017-06-04 16:45 ` Eli Zaretskii
2017-12-03 5:43 ` Paul Eggert
0 siblings, 2 replies; 40+ messages in thread
From: Paul Eggert @ 2017-06-04 15:54 UTC (permalink / raw)
To: Philipp Stephani, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 878 bytes --]
Philipp Stephani wrote:
> It's not hard to fix,
Thanks. However, on my platform those fixes slowed down a microbenchmark (format
"%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
recovered the performance loss for me. It avoids the need for the extra pass
through the format string and it caches more quantities in registers. While I
was at it I changed the doc to agree with POSIX that %% should not have
modifiers (not that we enforce this).
One little thing that has tripped me up in the past. In Emacs the preferred
style is typically to break assignments and initializations before the "=", not
after. Like this:
int some_long_name
= some_long_expression;
I myself prefer the style you used, as it's more column-efficient and it
simplifies text searches for assignments to a variable, but there it is.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Tune-format-after-recent-fix.patch --]
[-- Type: text/x-patch; name="0001-Tune-format-after-recent-fix.patch", Size: 17123 bytes --]
From d5fcf9e458053af1c50f0d4dad4c59db056790e5 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 4 Jun 2017 08:39:37 -0700
Subject: [PATCH] =?UTF-8?q?Tune=20=E2=80=98format=E2=80=99=20after=20recen?=
=?UTF-8?q?t=20fix?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* doc/lispref/strings.texi (Formatting Strings):
* src/editfns.c (Fformat): Format field numbers no longer need
to be unique, reverting the previous doc change since that has
now been fixed. Also, document that %% should not have modifiers.
* src/editfns.c (styled_format): Improve performance. Remove
the need for the new prepass over the format string, by using
a typically-more-generous bound for the info array size.
Initialize the info array lazily. Move string inspection to
the same area to help caching. Avoid the need for a
converted_to_string bitfield by using EQ. Cache arg in a
local and avoid some potential aliasing issues to help the
compiler. Info array is now 0-origin, not 1-origin.
---
doc/lispref/strings.texi | 13 ++--
src/editfns.c | 154 ++++++++++++++++++++---------------------------
2 files changed, 72 insertions(+), 95 deletions(-)
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index f365c80..23961f9 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -926,7 +926,8 @@ Formatting Strings
@item %%
Replace the specification with a single @samp{%}. This format
-specification is unusual in that it does not use a value. For example,
+specification is unusual in that its only form is plain
+@samp{%%} and that it does not use a value. For example,
@code{(format "%% %d" 30)} returns @code{"% 30"}.
@end table
@@ -965,10 +966,9 @@ Formatting Strings
decimal number immediately after the initial @samp{%}, followed by a
literal dollar sign @samp{$}. It causes the format specification to
convert the argument with the given number instead of the next
-argument. Field numbers start at 1. A field number should differ
-from the other field numbers in the same format. A format can contain
-either numbered or unnumbered format specifications but not both,
-except that @samp{%%} can be mixed with numbered specifications.
+argument. Field numbers start at 1. A format can contain either
+numbered or unnumbered format specifications but not both, except that
+@samp{%%} can be mixed with numbered specifications.
@example
(format "%2$s, %3$s, %%, %1$s" "x" "y" "z")
@@ -1026,8 +1026,7 @@ Formatting Strings
A specification can have a @dfn{width}, which is a decimal number
that appears after any field number and flags. If the printed
representation of the object contains fewer characters than this
-width, @code{format} extends it with padding. The width is
-ignored for the @samp{%%} specification. Any padding introduced by
+width, @code{format} extends it with padding. Any padding introduced by
the width normally consists of spaces inserted on the left:
@example
diff --git a/src/editfns.c b/src/editfns.c
index 56aa8ce..43b17f9 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -3891,8 +3891,8 @@ the next available argument, or the argument explicitly specified:
The argument used for %d, %o, %x, %e, %f, %g or %c must be a number.
Use %% to put a single % into the output.
-A %-sequence may contain optional field number, flag, width, and
-precision specifiers, as follows:
+A %-sequence other than %% may contain optional field number, flag,
+width, and precision specifiers, as follows:
%<field><flags><width><precision>character
@@ -3901,10 +3901,9 @@ where field is [0-9]+ followed by a literal dollar "$", flags is
followed by [0-9]+.
If a %-sequence is numbered with a field with positive value N, the
-Nth argument is substituted instead of the next one. A field number
-should differ from the other field numbers in the same format. A
-format can contain either numbered or unnumbered %-sequences but not
-both, except that %% can be mixed with numbered %-sequences.
+Nth argument is substituted instead of the next one. A format can
+contain either numbered or unnumbered %-sequences but not both, except
+that %% can be mixed with numbered %-sequences.
The + flag character inserts a + before any positive number, while a
space inserts a space before any positive number; these flags only
@@ -3980,49 +3979,40 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
bool arg_intervals = false;
USE_SAFE_ALLOCA;
- /* Each element records, for one field,
- the corresponding argument,
- the start and end bytepos in the output string,
- whether the argument has been converted to string (e.g., due to "%S"),
- and whether the argument is a string with intervals. */
+ /* Information recorded for each format spec. */
struct info
{
+ /* The corresponding argument, converted to string if conversion
+ was needed. */
Lisp_Object argument;
+
+ /* The start and end bytepos in the output string. */
ptrdiff_t start, end;
- bool_bf converted_to_string : 1;
+
+ /* Whether the argument is a string with intervals. */
bool_bf intervals : 1;
} *info;
CHECK_STRING (args[0]);
char *format_start = SSDATA (args[0]);
+ bool multibyte_format = STRING_MULTIBYTE (args[0]);
ptrdiff_t formatlen = SBYTES (args[0]);
- /* The number of percent characters is a safe upper bound for the
- number of format fields. */
- ptrdiff_t num_percent = 0;
- for (ptrdiff_t i = 0; i < formatlen; ++i)
- if (format_start[i] == '%')
- ++num_percent;
+ /* Upper bound on number of format specs. Each uses at least 2 chars. */
+ ptrdiff_t nspec_bound = SCHARS (args[0]) >> 1;
/* Allocate the info and discarded tables. */
ptrdiff_t alloca_size;
- if (INT_MULTIPLY_WRAPV (num_percent, sizeof *info, &alloca_size)
- || INT_ADD_WRAPV (sizeof *info, alloca_size, &alloca_size)
+ if (INT_MULTIPLY_WRAPV (nspec_bound, sizeof *info, &alloca_size)
|| INT_ADD_WRAPV (formatlen, alloca_size, &alloca_size)
|| SIZE_MAX < alloca_size)
memory_full (SIZE_MAX);
- /* info[0] is unused. Unused elements have -1 for start. */
info = SAFE_ALLOCA (alloca_size);
- memset (info, 0, alloca_size);
- for (ptrdiff_t i = 0; i < num_percent + 1; i++)
- {
- info[i].argument = Qunbound;
- info[i].start = -1;
- }
/* discarded[I] is 1 if byte I of the format
string was not copied into the output.
It is 2 if byte I was not the first byte of its character. */
- char *discarded = (char *) &info[num_percent + 1];
+ char *discarded = (char *) &info[nspec_bound];
+ memset (discarded, 0, formatlen);
/* Try to determine whether the result should be multibyte.
This is not always right; sometimes the result needs to be multibyte
@@ -4030,8 +4020,6 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
or because a grave accent or apostrophe is requoted,
and in that case, we won't know it here. */
- /* True if the format is multibyte. */
- bool multibyte_format = STRING_MULTIBYTE (args[0]);
/* True if the output should be a multibyte string,
which is true if any of the inputs is one. */
bool multibyte = multibyte_format;
@@ -4042,6 +4030,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
int quoting_style = message ? text_quoting_style () : -1;
ptrdiff_t ispec;
+ ptrdiff_t nspec = 0;
/* If we start out planning a unibyte result,
then discover it has to be multibyte, we jump back to retry. */
@@ -4155,11 +4144,14 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
if (! (n < nargs))
error ("Not enough arguments for format string");
- eassert (ispec < num_percent);
- ++ispec;
-
- if (EQ (info[ispec].argument, Qunbound))
- info[ispec].argument = args[n];
+ struct info *spec = &info[ispec++];
+ if (nspec < ispec)
+ {
+ spec->argument = args[n];
+ spec->intervals = false;
+ nspec = ispec;
+ }
+ Lisp_Object arg = spec->argument;
/* For 'S', prin1 the argument, and then treat like 's'.
For 's', princ any argument that is not a string or
@@ -4167,16 +4159,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
happen after retrying. */
if ((conversion == 'S'
|| (conversion == 's'
- && ! STRINGP (info[ispec].argument)
- && ! SYMBOLP (info[ispec].argument))))
+ && ! STRINGP (arg) && ! SYMBOLP (arg))))
{
- if (! info[ispec].converted_to_string)
+ if (EQ (arg, args[n]))
{
Lisp_Object noescape = conversion == 'S' ? Qnil : Qt;
- info[ispec].argument =
- Fprin1_to_string (info[ispec].argument, noescape);
- info[ispec].converted_to_string = true;
- if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte)
+ spec->argument = arg = Fprin1_to_string (arg, noescape);
+ if (STRING_MULTIBYTE (arg) && ! multibyte)
{
multibyte = true;
goto retry;
@@ -4186,29 +4175,25 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
}
else if (conversion == 'c')
{
- if (INTEGERP (info[ispec].argument)
- && ! ASCII_CHAR_P (XINT (info[ispec].argument)))
+ if (INTEGERP (arg) && ! ASCII_CHAR_P (XINT (arg)))
{
if (!multibyte)
{
multibyte = true;
goto retry;
}
- info[ispec].argument =
- Fchar_to_string (info[ispec].argument);
- info[ispec].converted_to_string = true;
+ spec->argument = arg = Fchar_to_string (arg);
}
- if (info[ispec].converted_to_string)
+ if (!EQ (arg, args[n]))
conversion = 's';
zero_flag = false;
}
- if (SYMBOLP (info[ispec].argument))
+ if (SYMBOLP (arg))
{
- info[ispec].argument =
- SYMBOL_NAME (info[ispec].argument);
- if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte)
+ spec->argument = arg = SYMBOL_NAME (arg);
+ if (STRING_MULTIBYTE (arg) && ! multibyte)
{
multibyte = true;
goto retry;
@@ -4239,12 +4224,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
else
{
ptrdiff_t nch, nby;
- width = lisp_string_width (info[ispec].argument,
- prec, &nch, &nby);
+ width = lisp_string_width (arg, prec, &nch, &nby);
if (prec < 0)
{
- nchars_string = SCHARS (info[ispec].argument);
- nbytes = SBYTES (info[ispec].argument);
+ nchars_string = SCHARS (arg);
+ nbytes = SBYTES (arg);
}
else
{
@@ -4254,11 +4238,8 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
}
convbytes = nbytes;
- if (convbytes && multibyte &&
- ! STRING_MULTIBYTE (info[ispec].argument))
- convbytes =
- count_size_as_multibyte (SDATA (info[ispec].argument),
- nbytes);
+ if (convbytes && multibyte && ! STRING_MULTIBYTE (arg))
+ convbytes = count_size_as_multibyte (SDATA (arg), nbytes);
ptrdiff_t padding
= width < field_width ? field_width - width : 0;
@@ -4274,20 +4255,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
p += padding;
nchars += padding;
}
- info[ispec].start = nchars;
+ spec->start = nchars;
if (p > buf
&& multibyte
&& !ASCII_CHAR_P (*((unsigned char *) p - 1))
- && STRING_MULTIBYTE (info[ispec].argument)
- && !CHAR_HEAD_P (SREF (info[ispec].argument, 0)))
+ && STRING_MULTIBYTE (arg)
+ && !CHAR_HEAD_P (SREF (arg, 0)))
maybe_combine_byte = true;
- p += copy_text (SDATA (info[ispec].argument),
- (unsigned char *) p,
+ p += copy_text (SDATA (arg), (unsigned char *) p,
nbytes,
- STRING_MULTIBYTE (info[ispec].argument),
- multibyte);
+ STRING_MULTIBYTE (arg), multibyte);
nchars += nchars_string;
@@ -4297,12 +4276,12 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
p += padding;
nchars += padding;
}
- info[ispec].end = nchars;
+ spec->end = nchars;
/* If this argument has text properties, record where
in the result string it appears. */
- if (string_intervals (info[ispec].argument))
- info[ispec].intervals = arg_intervals = true;
+ if (string_intervals (arg))
+ spec->intervals = arg_intervals = true;
continue;
}
@@ -4313,8 +4292,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
|| conversion == 'X'))
error ("Invalid format operation %%%c",
STRING_CHAR ((unsigned char *) format - 1));
- else if (! (INTEGERP (info[ispec].argument)
- || (FLOATP (info[ispec].argument) && conversion != 'c')))
+ else if (! (INTEGERP (arg) || (FLOATP (arg) && conversion != 'c')))
error ("Format specifier doesn't match argument type");
else
{
@@ -4376,7 +4354,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
if (INT_AS_LDBL)
{
*f = 'L';
- f += INTEGERP (info[ispec].argument);
+ f += INTEGERP (arg);
}
}
else if (conversion != 'c')
@@ -4408,22 +4386,22 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
ptrdiff_t sprintf_bytes;
if (float_conversion)
{
- if (INT_AS_LDBL && INTEGERP (info[ispec].argument))
+ if (INT_AS_LDBL && INTEGERP (arg))
{
/* Although long double may have a rounding error if
DIG_BITS_LBOUND * LDBL_MANT_DIG < FIXNUM_BITS - 1,
it is more accurate than plain 'double'. */
- long double x = XINT (info[ispec].argument);
+ long double x = XINT (arg);
sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
}
else
sprintf_bytes = sprintf (sprintf_buf, convspec, prec,
- XFLOATINT (info[ispec].argument));
+ XFLOATINT (arg));
}
else if (conversion == 'c')
{
/* Don't use sprintf here, as it might mishandle prec. */
- sprintf_buf[0] = XINT (info[ispec].argument);
+ sprintf_buf[0] = XINT (arg);
sprintf_bytes = prec != 0;
}
else if (conversion == 'd' || conversion == 'i')
@@ -4432,11 +4410,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
instead so it also works for values outside
the integer range. */
printmax_t x;
- if (INTEGERP (info[ispec].argument))
- x = XINT (info[ispec].argument);
+ if (INTEGERP (arg))
+ x = XINT (arg);
else
{
- double d = XFLOAT_DATA (info[ispec].argument);
+ double d = XFLOAT_DATA (arg);
if (d < 0)
{
x = TYPE_MINIMUM (printmax_t);
@@ -4456,11 +4434,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
{
/* Don't sign-extend for octal or hex printing. */
uprintmax_t x;
- if (INTEGERP (info[ispec].argument))
- x = XUINT (info[ispec].argument);
+ if (INTEGERP (arg))
+ x = XUINT (arg);
else
{
- double d = XFLOAT_DATA (info[ispec].argument);
+ double d = XFLOAT_DATA (arg);
if (d < 0)
x = 0;
else
@@ -4541,7 +4519,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
exponent_bytes = src + sprintf_bytes - e;
}
- info[ispec].start = nchars;
+ spec->start = nchars;
if (! minus_flag)
{
memset (p, ' ', padding);
@@ -4572,7 +4550,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
p += padding;
nchars += padding;
}
- info[ispec].end = nchars;
+ spec->end = nchars;
continue;
}
@@ -4681,7 +4659,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
if (CONSP (props))
{
ptrdiff_t bytepos = 0, position = 0, translated = 0;
- ptrdiff_t fieldn = 1;
+ ptrdiff_t fieldn = 0;
/* Adjust the bounds of each text property
to the proper start and end in the output string. */
@@ -4747,7 +4725,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
/* Add text properties from arguments. */
if (arg_intervals)
- for (ptrdiff_t i = 1; i <= num_percent; i++)
+ for (ptrdiff_t i = 0; i < nspec; i++)
if (info[i].intervals)
{
len = make_number (SCHARS (info[i].argument));
--
2.7.4
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-04 15:54 ` Paul Eggert
@ 2017-06-04 16:45 ` Eli Zaretskii
2017-06-04 18:37 ` Paul Eggert
2017-12-03 5:43 ` Paul Eggert
1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-06-04 16:45 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, jean.christophe.helary, emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 4 Jun 2017 08:54:31 -0700
>
> However, on my platform those fixes slowed down a microbenchmark (format
> "%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
> recovered the performance loss for me.
Would it make sense to start collecting all those benchmarks, e.g.
somewhere under test/, so that in due time we will have a sound
body of code to make sure we don't lose performance during
development?
TIA
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-04 16:45 ` Eli Zaretskii
@ 2017-06-04 18:37 ` Paul Eggert
0 siblings, 0 replies; 40+ messages in thread
From: Paul Eggert @ 2017-06-04 18:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, jean.christophe.helary, emacs-devel
Eli Zaretskii wrote:
> Would it make sense to start collecting all those benchmarks, e.g.
> somewhere under test/, so that in due time we will have a sound
> body of code
I wouldn't collect microbenchmarks just for the sake of collecting them. They're
flaky, and the overhead of collecting them and maintaining the collection is
likely greater than any future benefit of having them around. If we want to have
a performance testsuite our main problem is building and supporting its
infrastructure, not coming up with examples.
If you're interested despite the above disclaimer, here's what I used:
(defun bench-with (n)
(let ((start (float-time (get-internal-run-time)))
(i 0))
(while (< i n)
(format "%d %s %c" 3 "def" ?‘)
(setq i (1+ i)))
(- (float-time (get-internal-run-time)) start)))
(defun bench-without (n)
(let ((start (float-time (get-internal-run-time)))
(i 0))
(while (< i n)
(setq i (1+ i)))
(- (float-time (get-internal-run-time)) start)))
(defun bench (n)
(- (bench-with n)
(bench-without n)))
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 14:27 ` Drew Adams
2017-05-28 14:36 ` Jean-Christophe Helary
2017-05-28 15:33 ` Eli Zaretskii
@ 2017-06-05 12:55 ` Jean-Christophe Helary
2017-07-17 23:22 ` Jean-Christophe Helary
2 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-06-05 12:55 UTC (permalink / raw)
To: emacs-devel
>> The discussion so far seems to point at modifying 'message' and
>> the likes so that developers don't have to bother with any l10n
>> mechanism on their part (besides for writing clean strings).
Just as a reminder, we'll need to update all the texi files so that they include:
@documentlanguage
@documentencoding
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-31 22:18 ` Philipp Stephani
` (2 preceding siblings ...)
2017-06-01 14:21 ` Eli Zaretskii
@ 2017-07-02 1:22 ` Jean-Christophe Helary
2017-07-02 2:34 ` Eli Zaretskii
3 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-02 1:22 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2113 bytes --]
I was wondering if that patch (and the fixes that came after) had been committed ?
Have some people started to use it ?
Jean-Christophe
> On Jun 1, 2017, at 7:18, Philipp Stephani <p.stephani2@gmail.com> wrote:
>
>
>
> Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> schrieb am So., 28. Mai 2017 um 07:29 Uhr:
> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
>
> ====================================================
> On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> wrote:
>
>> My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected.
>
> On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org <mailto:eliz@gnu.org>> wrote:
>
>> Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some
>> use "error", some use other mechanisms.
>
> Internally, they all boil down to a small set of C functions, which is
> where we should make these changes.
> ====================================================
>
> Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...
>
> One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch.
> <0001-Implement-field-numbers-in-format-strings.txt>
[-- Attachment #2: Type: text/html, Size: 3552 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-02 1:22 ` Jean-Christophe Helary
@ 2017-07-02 2:34 ` Eli Zaretskii
0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-02 2:34 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Sun, 2 Jul 2017 10:22:28 +0900
>
> I was wondering if that patch (and the fixes that came after) had been committed ?
It was.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-05 12:55 ` Jean-Christophe Helary
@ 2017-07-17 23:22 ` Jean-Christophe Helary
2017-07-22 12:48 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-17 23:22 UTC (permalink / raw)
To: emacs-devel
> On Jun 5, 2017, at 21:55, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
>
>
>>> The discussion so far seems to point at modifying 'message' and
>>> the likes so that developers don't have to bother with any l10n
>>> mechanism on their part (besides for writing clean strings).
>
> Just as a reminder, we'll need to update all the texi files so that they include:
> @documentlanguage
> @documentencoding
Is it ok to proceed with that ?
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-17 23:22 ` Jean-Christophe Helary
@ 2017-07-22 12:48 ` Jean-Christophe Helary
2017-07-22 13:06 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-22 12:48 UTC (permalink / raw)
To: emacs-devel
> On Jul 18, 2017, at 8:22, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
>>>> The discussion so far seems to point at modifying 'message' and
>>>> the likes so that developers don't have to bother with any l10n
>>>> mechanism on their part (besides for writing clean strings).
>>
>> Just as a reminder, we'll need to update all the texi files so that they include:
>> @documentlanguage
>> @documentencoding
>
> Is it ok to proceed with that ?
I don't think there was a reply to that so I guess that's a yes. I'm going to proceed and send a patch here and then wait for comments.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-22 12:48 ` Jean-Christophe Helary
@ 2017-07-22 13:06 ` Eli Zaretskii
2017-07-22 13:45 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-22 13:06 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Sat, 22 Jul 2017 21:48:29 +0900
>
>
> > On Jul 18, 2017, at 8:22, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
> >>>> The discussion so far seems to point at modifying 'message' and
> >>>> the likes so that developers don't have to bother with any l10n
> >>>> mechanism on their part (besides for writing clean strings).
> >>
> >> Just as a reminder, we'll need to update all the texi files so that they include:
> >> @documentlanguage
> >> @documentencoding
> >
> > Is it ok to proceed with that ?
>
> I don't think there was a reply to that so I guess that's a yes.
Actually, I didn't reply because I didn't understand what was the
question. Can you remind me?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-22 13:06 ` Eli Zaretskii
@ 2017-07-22 13:45 ` Jean-Christophe Helary
2017-07-22 14:08 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-22 13:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]
> On Jul 22, 2017, at 22:06, Eli Zaretskii <eliz@gnu.org> wrote:
>
>>>>>> The discussion so far seems to point at modifying 'message' and
>>>>>> the likes so that developers don't have to bother with any l10n
>>>>>> mechanism on their part (besides for writing clean strings).
>>>>
>>>> Just as a reminder, we'll need to update all the texi files so that they include:
>>>> @documentlanguage
>>>> @documentencoding
>>>
>>> Is it ok to proceed with that ?
>>
>> I don't think there was a reply to that so I guess that's a yes.
>
> Actually, I didn't reply because I didn't understand what was the
> question. Can you remind me?
It was just about making sure that all the files (in fact only docstyle.texi) included "@documentlanguage en_US" and "@documentencoding UTF-8".
After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.
Let me know if there is a problem with that.
Jean-Christophe
[-- Attachment #2.1: Type: text/html, Size: 3499 bytes --]
[-- Attachment #2.2: documentlanguage.diff --]
[-- Type: application/octet-stream, Size: 386 bytes --]
diff --git a/doc/emacs/docstyle.texi b/doc/emacs/docstyle.texi
index dfd14306b3..140101c076 100644
--- a/doc/emacs/docstyle.texi
+++ b/doc/emacs/docstyle.texi
@@ -1,4 +1,5 @@
@c Emacs documentation style settings
+@documentlanguage en_US
@documentencoding UTF-8
@c These two require Texinfo 5.0 or later, so we use the older
@c equivalent @set variables supported in 4.11 and hence
[-- Attachment #2.3: Type: text/html, Size: 233 bytes --]
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-22 13:45 ` Jean-Christophe Helary
@ 2017-07-22 14:08 ` Eli Zaretskii
2017-07-22 23:54 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-22 14:08 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Sat, 22 Jul 2017 22:45:41 +0900
> Cc: emacs-devel@gnu.org
>
> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.
en_US is the default in the absence of an explicit @documentlanguage,
so I'm not sure I understand why would we need to add it. It will
change nothing, AFAIK.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-22 14:08 ` Eli Zaretskii
@ 2017-07-22 23:54 ` Jean-Christophe Helary
2017-07-23 14:39 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-22 23:54 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Jul 22, 2017, at 23:08, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
>> Date: Sat, 22 Jul 2017 22:45:41 +0900
>> Cc: emacs-devel@gnu.org
>>
>> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.
>
> en_US is the default in the absence of an explicit @documentlanguage,
> so I'm not sure I understand why would we need to add it. It will
> change nothing, AFAIK.
When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-22 23:54 ` Jean-Christophe Helary
@ 2017-07-23 14:39 ` Eli Zaretskii
2017-07-23 23:29 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-23 14:39 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Sun, 23 Jul 2017 08:54:36 +0900
> Cc: emacs-devel@gnu.org
>
> >> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.
> >
> > en_US is the default in the absence of an explicit @documentlanguage,
> > so I'm not sure I understand why would we need to add it. It will
> > change nothing, AFAIK.
>
> When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check.
Sorry, I don't think I follow. Does po4a understand Texinfo in
general and the @documentlanguage directive in particular? If it
does, why doesn't it also know that en_US is the default when no such
directive is present? And if it doesn't understand Texinfo, why would
the presence of @documentlanguage change anything in its output? And
where would someone need to add the language string manually -- in
what file?
Thanks.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-23 14:39 ` Eli Zaretskii
@ 2017-07-23 23:29 ` Jean-Christophe Helary
2017-07-24 14:47 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-23 23:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Jul 23, 2017, at 23:39, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
>> Date: Sun, 23 Jul 2017 08:54:36 +0900
>> Cc: emacs-devel@gnu.org
>>
>>>> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.
>>>
>>> en_US is the default in the absence of an explicit @documentlanguage,
>>> so I'm not sure I understand why would we need to add it. It will
>>> change nothing, AFAIK.
>>
>> When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check.
>
> Sorry, I don't think I follow. Does po4a understand Texinfo in
> general and the @documentlanguage directive in particular?
No, it doesn't. It just extracts text. po4a is the tool that has been developed to allow people who are familiar with po to work with documentation files. It just round-trips the document to and from the po format and is not at all as smart as gettext.
For ex. The current docstyle.texi is converted to a docstyle.texi.fr.po whose contents is:
================================
# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2017-07-24 08:11+0900\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
#. type: Plain text
#: docstyle.texi:6
msgid "@documentencoding UTF-8 @documentlanguage en_US"
msgstr ""
================================
It is arguable that we could have something better like:
#. type: documentencoding
#: docstyle.texi:2
msgid "UTF-8"
msgstr ""
#. type: documentlanguage
#: docstyle.texi:3
msgid "en_US"
msgstr ""
but that's a separate discussion.
> And if it doesn't understand Texinfo, why would
> the presence of @documentlanguage change anything in its output?
po4a presents the strings to the translator for modification. As far as the translator is concerned, that is the only place where the language for the whole documentation set is presented and that the one and only place that needs to be changed so that the whole documentation gets the appropriate @documentlanguage directive.
> And where would someone need to add the language string manually -- in
> what file?
If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-23 23:29 ` Jean-Christophe Helary
@ 2017-07-24 14:47 ` Eli Zaretskii
2017-07-24 15:34 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-24 14:47 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Mon, 24 Jul 2017 08:29:37 +0900
> Cc: emacs-devel@gnu.org
>
> > And where would someone need to add the language string manually -- in
> > what file?
>
> If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language.
OK, thanks. But AFAIU, po4a can be told to add text to the translated
document; couldn't we simply insert @documentlanguage in the
translated manual that way, without expecting the original manuals to
add this redundant directive to the English original?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 14:47 ` Eli Zaretskii
@ 2017-07-24 15:34 ` Jean-Christophe Helary
2017-07-24 15:51 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-24 15:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Jul 24, 2017, at 23:47, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
>> Date: Mon, 24 Jul 2017 08:29:37 +0900
>> Cc: emacs-devel@gnu.org
>>
>>> And where would someone need to add the language string manually -- in
>>> what file?
>>
>> If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language.
>
> OK, thanks. But AFAIU, po4a can be told to add text to the translated
> document; couldn't we simply insert @documentlanguage in the
> translated manual that way, without expecting the original manuals to
> add this redundant directive to the English original?
I was not aware of this option, after checking the po4a manual here is what it says:
https://po4a.alioth.debian.org/man/man7/po4a.7.php
"HOWTO add extra text to translations (like translator's name)?"
> HOWTO add extra text to translations (like translator's name)?
>
> Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply editing a new file along the original one. But it remains possible, thanks to the so-called addenda.
> It may help the comprehension to consider addenda as a sort of patches applied to the localized document after processing. They are rather different from the usual patches (they have only one line of context, which can embed Perl regular expression, and they can only add new text without removing any), but the functionalities are the same.
>
> Their goal is to allow the translator to add extra content to the document which is not translated from the original document. The most common usage is to add a section about the translation itself, listing contributors and explaining how to report bug against the translation.
>
> An addendum must be provided as a separate file. The first line constitutes a header indicating where in the produced document they should be placed. The rest of the addendum file will be added verbatim at the determined position of the resulting document.
>
> The header has a pretty rigid syntax: It must begin with the string PO4A-HEADER:, followed by a semi-colon (;) separated list of key=value fields. White spaces ARE important. Note that you cannot use the semi-colon char (;) in the value, and that quoting it doesn't help.
So we'd have to create and maintain as many files as there are translations, instead of just adding 1 directive in 1 texi file and be done with it.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 15:34 ` Jean-Christophe Helary
@ 2017-07-24 15:51 ` Eli Zaretskii
2017-07-24 16:08 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-24 15:51 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Tue, 25 Jul 2017 00:34:54 +0900
> Cc: emacs-devel@gnu.org
>
> So we'd have to create and maintain as many files as there are translations
No, because the file can be auto-generated: its contents is simple and
known in advance as soon as the target language is specified.
Anyway, I think we should defer adding this directive until we have
the first translation of the first manual. It sounds wrong to me to
make preparations without having even a single client to benefit from
it.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 15:51 ` Eli Zaretskii
@ 2017-07-24 16:08 ` Jean-Christophe Helary
2017-07-24 16:29 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-24 16:08 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Jul 25, 2017, at 0:51, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
>> Date: Tue, 25 Jul 2017 00:34:54 +0900
>> Cc: emacs-devel@gnu.org
>>
>> So we'd have to create and maintain as many files as there are translations
>
> No, because the file can be auto-generated: its contents is simple and
> known in advance as soon as the target language is specified.
But that's only a trick for one specific case. How is that different from the code I modified in package.el ?
I've just opened tutorial.el and found that we have functions there that do not consider English is the default. Why not generalize that ?
> Anyway, I think we should defer adding this directive until we have
> the first translation of the first manual. It sounds wrong to me to
> make preparations without having even a single client to benefit from
> it.
What I did on package.el was similar. Except for a few obvious errors, the new strings only benefit potential translators that we'll only have when the elisp strings are exposed for translation which is even further down the localization road...
Also, I *am* working on the documentation French translation and that's why I found about this directive. If you want to see the French manual to consider adding the directive I'm fine with that, but I honestly consider that as a punishment rather as an incentive to contribute.
Jean-Christophe
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 16:08 ` Jean-Christophe Helary
@ 2017-07-24 16:29 ` Eli Zaretskii
2017-07-24 16:48 ` Jean-Christophe Helary
0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-24 16:29 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Tue, 25 Jul 2017 01:08:29 +0900
> Cc: emacs-devel@gnu.org
>
> Also, I *am* working on the documentation French translation and that's why I found about this directive. If you want to see the French manual to consider adding the directive I'm fine with that, but I honestly consider that as a punishment rather as an incentive to contribute.
It's not a punishment, believe me, just a precaution. I've seen too
many cases where some project was started, infrastructure added in
preparation for it, then the project stalled, and we were left with
the infrastructure no one uses. What's worse, after enough time has
passed no one even remembers why we added that.
I'm guessing you are working in a local branch, is that true? (If
not, you should, because manuals are being changed all the time.)
Then just add the directive in your branch, and it will be merged when
your work is brought into master. Would that be okay with you?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 16:29 ` Eli Zaretskii
@ 2017-07-24 16:48 ` Jean-Christophe Helary
2017-07-24 16:55 ` Eli Zaretskii
0 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2017-07-24 16:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]
> On Jul 25, 2017, at 1:29, Eli Zaretskii <eliz@gnu.org> wrote:
>
> I'm guessing you are working in a local branch, is that true? (If
> not, you should, because manuals are being changed all the time.)
I'm regularly building po files from the updated texi set and moving all that to an OmegaT project that's not in the repository to avoid any git management. There is a bug in po4a that I reported that keeps it from creating the po file for org mode but I guess Martin Quinson is working on it...
> Then just add the directive in your branch, and it will be merged when
> your work is brought into master. Would that be okay with you?
Perfect.
On a side note, I don't see how my French translation can be brought into master since we don't have a location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does not exist at the moment... But I guess that's something we'll discuss when we have things to add there...
Jean-Christophe
[-- Attachment #2: Type: text/html, Size: 3862 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-07-24 16:48 ` Jean-Christophe Helary
@ 2017-07-24 16:55 ` Eli Zaretskii
0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2017-07-24 16:55 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Tue, 25 Jul 2017 01:48:24 +0900
> Cc: emacs-devel@gnu.org
>
> On a side note, I don't see how my French translation can be brought into master since we don't have a
> location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does
> not exist at the moment... But I guess that's something we'll discuss when we have things to add there...
We can discuss this later, indeed, but up front I don't see why we'd
need a separate directory, since I think the file name(s) will need to
be different anyway. Texinfo doesn't gracefully handle different
manuals that go under the same name, it wasn't designed to solve that
problem. So it's best to avoid the issue to begin with, and have,
say, emacs-fr.info, emacs-jp.info, etc.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 5:29 i18n/l10n summary Jean-Christophe Helary
2017-05-28 14:27 ` Drew Adams
2017-05-31 22:18 ` Philipp Stephani
@ 2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙)
2018-04-25 12:58 ` Jean-Christophe Helary
3 siblings, 0 replies; 40+ messages in thread
From: Byung-Hee HWANG (황병희, 黃炳熙) @ 2017-07-28 0:15 UTC (permalink / raw)
To: emacs-devel
Jean-Christophe Helary <jean.christophe.helary@gmail.com> 께서 쓰시길,
《記事 全文 <AA86315D-1561-41EB-A349-63100C565E8D@gmail.com> 에서》:
> The discussion so far seems to point at modifying 'message' and the
> likes so that developers don't have to bother with any l10n mechanism
> on their part (besides for writing clean strings).
Sorry for late. Possibly we should go with *UTF-8* if this plan
committed. It is just my concern. Thanks!!!
--
^고맙습니다 _救濟蒼生_ 감사합니다_^))//
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-06-04 15:54 ` Paul Eggert
2017-06-04 16:45 ` Eli Zaretskii
@ 2017-12-03 5:43 ` Paul Eggert
1 sibling, 0 replies; 40+ messages in thread
From: Paul Eggert @ 2017-12-03 5:43 UTC (permalink / raw)
To: Philipp Stephani, Jean-Christophe Helary, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 506 bytes --]
I wrote on 2017-06-04:
> Philipp Stephani wrote:
>> It's not hard to fix,
>
> Thanks. However, on my platform those fixes slowed down a microbenchmark (format
> "%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
> recovered the performance loss for me.
Oops, that introduced an off-by-one error that caused Emacs to access one past
the end of the 'info' array. To fix this I installed the attached further patch
into emacs-26 and merged emacs-26 into master.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-bug-in-i18n-l10n-optimization.patch --]
[-- Type: text/x-patch; name="0001-Fix-bug-in-i18n-l10n-optimization.patch", Size: 1429 bytes --]
From 04e5b28ff1691345e023a944dc6a6a9e9573bd07 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 2 Dec 2017 21:31:24 -0800
Subject: [PATCH] Fix bug in i18n/l10n optimization
This fixes a off-by-one buffer overrun bug introduced in
2017-06-04T15:39:37Z!eggert@cs.ucla.edu. Problem uncovered by an
experimental version of Emacs built with -fcheck-pointer-bounds
and running on Intel MPX hardware.
* src/editfns.c (styled_format): Avoid overrunning internal buffers.
---
src/editfns.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/editfns.c b/src/editfns.c
index f275f33..f7c26b9 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4919,7 +4919,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
else if (discarded[bytepos] == 1)
{
position++;
- if (translated == info[fieldn].start)
+ if (fieldn < nspec && translated == info[fieldn].start)
{
translated += info[fieldn].end - info[fieldn].start;
fieldn++;
@@ -4939,7 +4939,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
else if (discarded[bytepos] == 1)
{
position++;
- if (translated == info[fieldn].start)
+ if (fieldn < nspec && translated == info[fieldn].start)
{
translated += info[fieldn].end - info[fieldn].start;
fieldn++;
--
2.7.4
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2017-05-28 5:29 i18n/l10n summary Jean-Christophe Helary
` (2 preceding siblings ...)
2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙)
@ 2018-04-25 12:58 ` Jean-Christophe Helary
2018-09-21 4:18 ` Jean-Christophe Helary
3 siblings, 1 reply; 40+ messages in thread
From: Jean-Christophe Helary @ 2018-04-25 12:58 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 881 bytes --]
> On May 28, 2017, at 14:29, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
>
> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
Since there does not seem to be an agreement on the "straightening" of the package.el strings that I posted here (even though Eli seemed to be fine with them and nobody raised any objection), I'm going to start working on other packages and I will continue to send patches here.
Eventually somebody will see why that is important and will commit my package.el modifications *or* tell me why they are not acceptable (which I'm totally fine with btw) :)
Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune
[-- Attachment #2: Type: text/html, Size: 3114 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary
2018-04-25 12:58 ` Jean-Christophe Helary
@ 2018-09-21 4:18 ` Jean-Christophe Helary
0 siblings, 0 replies; 40+ messages in thread
From: Jean-Christophe Helary @ 2018-09-21 4:18 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]
1) packages.el
I just want to remind interested parties that packages.el has seen most of its strings "straightened" so that they don't rely on English grammatical structures (s for plurals, etc.)
The modifications changed strings like:
"3 packages have been updated"
into
"Number of updated packages: 3"
as well as a few other things.
I'd like to thank Eli, Noam and a few others (sorry that I forgot your names) for the coding hints.
2) other packages with user facing strings
There are lots of packages that have user facing strings, and most of them use code to produce English grammar structures (a simple case is if a number is greater than 1 then add an "s" to the string). Such code should be fixed to remove the reliance on English grammatical structures. Similarly, code constructs that create a given word order ( (concat "Nominal group" "Verbal group" "Nominal group", for ex) must be fixed so that they are the most "natural language" agnostic possible.
It would be nice if people check the packages.el modifications and try to adapt other packages in the same spirit.
3) I heard that Xemacs had the infrastructure required to support gettext are there people here familiar with this aspect of Xemacs ?
Jean-Christophe
> On Apr 25, 2018, at 21:58, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote:
>
>
>> On May 28, 2017, at 14:29, Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> wrote:
>>
>> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
>
> Since there does not seem to be an agreement on the "straightening" of the package.el strings that I posted here (even though Eli seemed to be fine with them and nobody raised any objection), I'm going to start working on other packages and I will continue to send patches here.
>
> Eventually somebody will see why that is important and will commit my package.el modifications *or* tell me why they are not acceptable (which I'm totally fine with btw) :)
>
>
> Jean-Christophe Helary
> -----------------------------------------------
> http://mac4translators.blogspot.com <http://mac4translators.blogspot.com/> @brandelune
>
>
Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune
[-- Attachment #2: Type: text/html, Size: 6624 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2018-09-21 4:18 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-28 5:29 i18n/l10n summary Jean-Christophe Helary
2017-05-28 14:27 ` Drew Adams
2017-05-28 14:36 ` Jean-Christophe Helary
2017-05-28 15:33 ` Eli Zaretskii
2017-06-05 12:55 ` Jean-Christophe Helary
2017-07-17 23:22 ` Jean-Christophe Helary
2017-07-22 12:48 ` Jean-Christophe Helary
2017-07-22 13:06 ` Eli Zaretskii
2017-07-22 13:45 ` Jean-Christophe Helary
2017-07-22 14:08 ` Eli Zaretskii
2017-07-22 23:54 ` Jean-Christophe Helary
2017-07-23 14:39 ` Eli Zaretskii
2017-07-23 23:29 ` Jean-Christophe Helary
2017-07-24 14:47 ` Eli Zaretskii
2017-07-24 15:34 ` Jean-Christophe Helary
2017-07-24 15:51 ` Eli Zaretskii
2017-07-24 16:08 ` Jean-Christophe Helary
2017-07-24 16:29 ` Eli Zaretskii
2017-07-24 16:48 ` Jean-Christophe Helary
2017-07-24 16:55 ` Eli Zaretskii
2017-05-31 22:18 ` Philipp Stephani
2017-05-31 22:29 ` Jean-Christophe Helary
2017-06-01 5:18 ` Paul Eggert
2017-06-01 8:17 ` Philipp Stephani
2017-06-01 23:20 ` Paul Eggert
2017-06-02 6:52 ` Philipp Stephani
2017-06-03 8:37 ` Paul Eggert
2017-06-03 9:12 ` Andreas Schwab
2017-06-03 9:34 ` Philipp Stephani
2017-06-04 15:54 ` Paul Eggert
2017-06-04 16:45 ` Eli Zaretskii
2017-06-04 18:37 ` Paul Eggert
2017-12-03 5:43 ` Paul Eggert
2017-06-01 14:21 ` Eli Zaretskii
2017-07-02 1:22 ` Jean-Christophe Helary
2017-07-02 2:34 ` Eli Zaretskii
2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙)
2018-04-25 12:58 ` Jean-Christophe Helary
2018-09-21 4:18 ` Jean-Christophe Helary
[not found] <<AA86315D-1561-41EB-A349-63100C565E8D@gmail.com>
[not found] ` <<0aca6c65-4610-44c2-99c4-6cbe7aa68c9a@default>
[not found] ` <<83a85xgfo2.fsf@gnu.org>
2017-05-28 21:52 ` Drew Adams
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).