* RE: i18n/l10n summary [not found] ` <<83a85xgfo2.fsf@gnu.org> @ 2017-05-28 21:52 ` Drew Adams 0 siblings, 0 replies; 40+ messages in thread From: Drew Adams @ 2017-05-28 21:52 UTC (permalink / raw) To: Eli Zaretskii, Drew Adams; +Cc: jean.christophe.helary, emacs-devel > > If so, then perhaps there should be a way to easily, from > > Lisp, specify the target language explicitly ... by > > binding a variable. > > (let ((current-language-environment "FOO"))(message "FOOBAR")) > > > And...a user option that overrides such a language choice > > by Lisp code. > > M-x set-language-environment RET Great. Thx. ^ permalink raw reply [flat|nested] 40+ messages in thread
* i18n/l10n summary @ 2017-05-28 5:29 Jean-Christophe Helary 2017-05-28 14:27 ` Drew Adams ` (3 more replies) 0 siblings, 4 replies; 40+ messages in thread From: Jean-Christophe Helary @ 2017-05-28 5:29 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1261 bytes --] The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings). ==================================================== On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: > My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected. On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote: > Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some > use "error", some use other mechanisms. Internally, they all boil down to a small set of C functions, which is where we should make these changes. ==================================================== Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever... Jean-Christophe [-- Attachment #2: Type: text/html, Size: 1953 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: i18n/l10n summary 2017-05-28 5:29 Jean-Christophe Helary @ 2017-05-28 14:27 ` Drew Adams 2017-05-28 14:36 ` Jean-Christophe Helary ` (2 more replies) 2017-05-31 22:18 ` Philipp Stephani ` (2 subsequent siblings) 3 siblings, 3 replies; 40+ messages in thread From: Drew Adams @ 2017-05-28 14:27 UTC (permalink / raw) To: Jean-Christophe Helary, emacs-devel > The discussion so far seems to point at modifying 'message' and > the likes so that developers don't have to bother with any l10n > mechanism on their part (besides for writing clean strings). (Caveat: I haven't been following this thread at all, so ignore if not helpful.) Is the idea that something will be done so that, for example, `message' automatically uses a translation of the message to the user's currently preferred language? I.e., if that is what is planned, isn't it perhaps too systematic - all or nothing? If so, then perhaps there should be a way to easily, from Lisp, specify the target language explicitly - e.g. by an optional `message' argument or (better, because the scope is controllable without changing the `message' calls) by binding a variable. This, as opposed to automatically and always just making `message' target whatever language is currently being used/preferred by the user. And in that case, there should perhaps be a user option that overrides such a language choice by Lisp code. IOW, in general, let code control the language for a given `message' call or for a given scope (by a variable), but let a user customize Emacs to say whether to allow this. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 14:27 ` Drew Adams @ 2017-05-28 14:36 ` Jean-Christophe Helary 2017-05-28 15:33 ` Eli Zaretskii 2017-06-05 12:55 ` Jean-Christophe Helary 2 siblings, 0 replies; 40+ messages in thread From: Jean-Christophe Helary @ 2017-05-28 14:36 UTC (permalink / raw) To: emacs-devel > On May 28, 2017, at 23:27, Drew Adams <drew.adams@oracle.com> wrote: > >> The discussion so far seems to point at modifying 'message' and >> the likes so that developers don't have to bother with any l10n >> mechanism on their part (besides for writing clean strings). > > (Caveat: I haven't been following this thread at all, so > ignore if not helpful.) > > Is the idea that something will be done so that, for example, > `message' automatically uses a translation of the message to > the user's currently preferred language? I.e., if that is > what is planned, isn't it perhaps too systematic - all or > nothing? The idea is to have the messages displayed according to the *specified* language. How the language is specified has been considered out of the scope of this discussion, if I'm not wrong. But that's the idea. Basically, there are 2 possible states: 1) Emacs new install → the language is the OS default 2) User setting within Emacs (allows for normal use and for l10n testing). Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 14:27 ` Drew Adams 2017-05-28 14:36 ` Jean-Christophe Helary @ 2017-05-28 15:33 ` Eli Zaretskii 2017-06-05 12:55 ` Jean-Christophe Helary 2 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2017-05-28 15:33 UTC (permalink / raw) To: Drew Adams; +Cc: jean.christophe.helary, emacs-devel > Date: Sun, 28 May 2017 07:27:10 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > > Is the idea that something will be done so that, for example, > `message' automatically uses a translation of the message to > the user's currently preferred language? I.e., if that is > what is planned, isn't it perhaps too systematic - all or > nothing? > > If so, then perhaps there should be a way to easily, from > Lisp, specify the target language explicitly - e.g. by an > optional `message' argument or (better, because the scope > is controllable without changing the `message' calls) by > binding a variable. (let ((current-language-environment "FOO")) (message "FOOBAR")) > And in that case, there should perhaps be a user option > that overrides such a language choice by Lisp code. M-x set-language-environment RET ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 14:27 ` Drew Adams 2017-05-28 14:36 ` Jean-Christophe Helary 2017-05-28 15:33 ` Eli Zaretskii @ 2017-06-05 12:55 ` Jean-Christophe Helary 2017-07-17 23:22 ` Jean-Christophe Helary 2 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-06-05 12:55 UTC (permalink / raw) To: emacs-devel >> The discussion so far seems to point at modifying 'message' and >> the likes so that developers don't have to bother with any l10n >> mechanism on their part (besides for writing clean strings). Just as a reminder, we'll need to update all the texi files so that they include: @documentlanguage @documentencoding Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-05 12:55 ` Jean-Christophe Helary @ 2017-07-17 23:22 ` Jean-Christophe Helary 2017-07-22 12:48 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-17 23:22 UTC (permalink / raw) To: emacs-devel > On Jun 5, 2017, at 21:55, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: > > >>> The discussion so far seems to point at modifying 'message' and >>> the likes so that developers don't have to bother with any l10n >>> mechanism on their part (besides for writing clean strings). > > Just as a reminder, we'll need to update all the texi files so that they include: > @documentlanguage > @documentencoding Is it ok to proceed with that ? Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-17 23:22 ` Jean-Christophe Helary @ 2017-07-22 12:48 ` Jean-Christophe Helary 2017-07-22 13:06 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-22 12:48 UTC (permalink / raw) To: emacs-devel > On Jul 18, 2017, at 8:22, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: >>>> The discussion so far seems to point at modifying 'message' and >>>> the likes so that developers don't have to bother with any l10n >>>> mechanism on their part (besides for writing clean strings). >> >> Just as a reminder, we'll need to update all the texi files so that they include: >> @documentlanguage >> @documentencoding > > Is it ok to proceed with that ? I don't think there was a reply to that so I guess that's a yes. I'm going to proceed and send a patch here and then wait for comments. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-22 12:48 ` Jean-Christophe Helary @ 2017-07-22 13:06 ` Eli Zaretskii 2017-07-22 13:45 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-22 13:06 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Sat, 22 Jul 2017 21:48:29 +0900 > > > > On Jul 18, 2017, at 8:22, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: > >>>> The discussion so far seems to point at modifying 'message' and > >>>> the likes so that developers don't have to bother with any l10n > >>>> mechanism on their part (besides for writing clean strings). > >> > >> Just as a reminder, we'll need to update all the texi files so that they include: > >> @documentlanguage > >> @documentencoding > > > > Is it ok to proceed with that ? > > I don't think there was a reply to that so I guess that's a yes. Actually, I didn't reply because I didn't understand what was the question. Can you remind me? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-22 13:06 ` Eli Zaretskii @ 2017-07-22 13:45 ` Jean-Christophe Helary 2017-07-22 14:08 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-22 13:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1039 bytes --] > On Jul 22, 2017, at 22:06, Eli Zaretskii <eliz@gnu.org> wrote: > >>>>>> The discussion so far seems to point at modifying 'message' and >>>>>> the likes so that developers don't have to bother with any l10n >>>>>> mechanism on their part (besides for writing clean strings). >>>> >>>> Just as a reminder, we'll need to update all the texi files so that they include: >>>> @documentlanguage >>>> @documentencoding >>> >>> Is it ok to proceed with that ? >> >> I don't think there was a reply to that so I guess that's a yes. > > Actually, I didn't reply because I didn't understand what was the > question. Can you remind me? It was just about making sure that all the files (in fact only docstyle.texi) included "@documentlanguage en_US" and "@documentencoding UTF-8". After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there. Let me know if there is a problem with that. Jean-Christophe [-- Attachment #2.1: Type: text/html, Size: 3499 bytes --] [-- Attachment #2.2: documentlanguage.diff --] [-- Type: application/octet-stream, Size: 386 bytes --] diff --git a/doc/emacs/docstyle.texi b/doc/emacs/docstyle.texi index dfd14306b3..140101c076 100644 --- a/doc/emacs/docstyle.texi +++ b/doc/emacs/docstyle.texi @@ -1,4 +1,5 @@ @c Emacs documentation style settings +@documentlanguage en_US @documentencoding UTF-8 @c These two require Texinfo 5.0 or later, so we use the older @c equivalent @set variables supported in 4.11 and hence [-- Attachment #2.3: Type: text/html, Size: 233 bytes --] ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-22 13:45 ` Jean-Christophe Helary @ 2017-07-22 14:08 ` Eli Zaretskii 2017-07-22 23:54 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-22 14:08 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Sat, 22 Jul 2017 22:45:41 +0900 > Cc: emacs-devel@gnu.org > > After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there. en_US is the default in the absence of an explicit @documentlanguage, so I'm not sure I understand why would we need to add it. It will change nothing, AFAIK. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-22 14:08 ` Eli Zaretskii @ 2017-07-22 23:54 ` Jean-Christophe Helary 2017-07-23 14:39 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-22 23:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > On Jul 22, 2017, at 23:08, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> >> Date: Sat, 22 Jul 2017 22:45:41 +0900 >> Cc: emacs-devel@gnu.org >> >> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there. > > en_US is the default in the absence of an explicit @documentlanguage, > so I'm not sure I understand why would we need to add it. It will > change nothing, AFAIK. When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-22 23:54 ` Jean-Christophe Helary @ 2017-07-23 14:39 ` Eli Zaretskii 2017-07-23 23:29 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-23 14:39 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Sun, 23 Jul 2017 08:54:36 +0900 > Cc: emacs-devel@gnu.org > > >> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there. > > > > en_US is the default in the absence of an explicit @documentlanguage, > > so I'm not sure I understand why would we need to add it. It will > > change nothing, AFAIK. > > When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check. Sorry, I don't think I follow. Does po4a understand Texinfo in general and the @documentlanguage directive in particular? If it does, why doesn't it also know that en_US is the default when no such directive is present? And if it doesn't understand Texinfo, why would the presence of @documentlanguage change anything in its output? And where would someone need to add the language string manually -- in what file? Thanks. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-23 14:39 ` Eli Zaretskii @ 2017-07-23 23:29 ` Jean-Christophe Helary 2017-07-24 14:47 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-23 23:29 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > On Jul 23, 2017, at 23:39, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> >> Date: Sun, 23 Jul 2017 08:54:36 +0900 >> Cc: emacs-devel@gnu.org >> >>>> After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there. >>> >>> en_US is the default in the absence of an explicit @documentlanguage, >>> so I'm not sure I understand why would we need to add it. It will >>> change nothing, AFAIK. >> >> When po4a extracts translatable text to create the pot files docstyle.texi will also have a pot file that includes "@documentlanguage en_US" and that will allow translators to change that to "@documentlanguage fr_FR" etc. If we don't had that to docstyle.texi somebody will have to add the language string manually and that's an extra task that we'll have to check. > > Sorry, I don't think I follow. Does po4a understand Texinfo in > general and the @documentlanguage directive in particular? No, it doesn't. It just extracts text. po4a is the tool that has been developed to allow people who are familiar with po to work with documentation files. It just round-trips the document to and from the po format and is not at all as smart as gettext. For ex. The current docstyle.texi is converted to a docstyle.texi.fr.po whose contents is: ================================ # SOME DESCRIPTIVE TITLE # Copyright (C) YEAR Free Software Foundation, Inc. # This file is distributed under the same license as the PACKAGE package. # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR. # #, fuzzy msgid "" msgstr "" "Project-Id-Version: PACKAGE VERSION\n" "POT-Creation-Date: 2017-07-24 08:11+0900\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n" "Language-Team: LANGUAGE <LL@li.org>\n" "Language: \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=CHARSET\n" "Content-Transfer-Encoding: 8bit\n" #. type: Plain text #: docstyle.texi:6 msgid "@documentencoding UTF-8 @documentlanguage en_US" msgstr "" ================================ It is arguable that we could have something better like: #. type: documentencoding #: docstyle.texi:2 msgid "UTF-8" msgstr "" #. type: documentlanguage #: docstyle.texi:3 msgid "en_US" msgstr "" but that's a separate discussion. > And if it doesn't understand Texinfo, why would > the presence of @documentlanguage change anything in its output? po4a presents the strings to the translator for modification. As far as the translator is concerned, that is the only place where the language for the whole documentation set is presented and that the one and only place that needs to be changed so that the whole documentation gets the appropriate @documentlanguage directive. > And where would someone need to add the language string manually -- in > what file? If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-23 23:29 ` Jean-Christophe Helary @ 2017-07-24 14:47 ` Eli Zaretskii 2017-07-24 15:34 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-24 14:47 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Mon, 24 Jul 2017 08:29:37 +0900 > Cc: emacs-devel@gnu.org > > > And where would someone need to add the language string manually -- in > > what file? > > If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language. OK, thanks. But AFAIU, po4a can be told to add text to the translated document; couldn't we simply insert @documentlanguage in the translated manual that way, without expecting the original manuals to add this redundant directive to the English original? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 14:47 ` Eli Zaretskii @ 2017-07-24 15:34 ` Jean-Christophe Helary 2017-07-24 15:51 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-24 15:34 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > On Jul 24, 2017, at 23:47, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> >> Date: Mon, 24 Jul 2017 08:29:37 +0900 >> Cc: emacs-devel@gnu.org >> >>> And where would someone need to add the language string manually -- in >>> what file? >> >> If the directive were not present, somebody would need to manually add @documentlanguage [appropriate language] directive to the docstyle.texi file of that language. > > OK, thanks. But AFAIU, po4a can be told to add text to the translated > document; couldn't we simply insert @documentlanguage in the > translated manual that way, without expecting the original manuals to > add this redundant directive to the English original? I was not aware of this option, after checking the po4a manual here is what it says: https://po4a.alioth.debian.org/man/man7/po4a.7.php "HOWTO add extra text to translations (like translator's name)?" > HOWTO add extra text to translations (like translator's name)? > > Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply editing a new file along the original one. But it remains possible, thanks to the so-called addenda. > It may help the comprehension to consider addenda as a sort of patches applied to the localized document after processing. They are rather different from the usual patches (they have only one line of context, which can embed Perl regular expression, and they can only add new text without removing any), but the functionalities are the same. > > Their goal is to allow the translator to add extra content to the document which is not translated from the original document. The most common usage is to add a section about the translation itself, listing contributors and explaining how to report bug against the translation. > > An addendum must be provided as a separate file. The first line constitutes a header indicating where in the produced document they should be placed. The rest of the addendum file will be added verbatim at the determined position of the resulting document. > > The header has a pretty rigid syntax: It must begin with the string PO4A-HEADER:, followed by a semi-colon (;) separated list of key=value fields. White spaces ARE important. Note that you cannot use the semi-colon char (;) in the value, and that quoting it doesn't help. So we'd have to create and maintain as many files as there are translations, instead of just adding 1 directive in 1 texi file and be done with it. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 15:34 ` Jean-Christophe Helary @ 2017-07-24 15:51 ` Eli Zaretskii 2017-07-24 16:08 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-24 15:51 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Tue, 25 Jul 2017 00:34:54 +0900 > Cc: emacs-devel@gnu.org > > So we'd have to create and maintain as many files as there are translations No, because the file can be auto-generated: its contents is simple and known in advance as soon as the target language is specified. Anyway, I think we should defer adding this directive until we have the first translation of the first manual. It sounds wrong to me to make preparations without having even a single client to benefit from it. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 15:51 ` Eli Zaretskii @ 2017-07-24 16:08 ` Jean-Christophe Helary 2017-07-24 16:29 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-24 16:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > On Jul 25, 2017, at 0:51, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> >> Date: Tue, 25 Jul 2017 00:34:54 +0900 >> Cc: emacs-devel@gnu.org >> >> So we'd have to create and maintain as many files as there are translations > > No, because the file can be auto-generated: its contents is simple and > known in advance as soon as the target language is specified. But that's only a trick for one specific case. How is that different from the code I modified in package.el ? I've just opened tutorial.el and found that we have functions there that do not consider English is the default. Why not generalize that ? > Anyway, I think we should defer adding this directive until we have > the first translation of the first manual. It sounds wrong to me to > make preparations without having even a single client to benefit from > it. What I did on package.el was similar. Except for a few obvious errors, the new strings only benefit potential translators that we'll only have when the elisp strings are exposed for translation which is even further down the localization road... Also, I *am* working on the documentation French translation and that's why I found about this directive. If you want to see the French manual to consider adding the directive I'm fine with that, but I honestly consider that as a punishment rather as an incentive to contribute. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 16:08 ` Jean-Christophe Helary @ 2017-07-24 16:29 ` Eli Zaretskii 2017-07-24 16:48 ` Jean-Christophe Helary 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-07-24 16:29 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Tue, 25 Jul 2017 01:08:29 +0900 > Cc: emacs-devel@gnu.org > > Also, I *am* working on the documentation French translation and that's why I found about this directive. If you want to see the French manual to consider adding the directive I'm fine with that, but I honestly consider that as a punishment rather as an incentive to contribute. It's not a punishment, believe me, just a precaution. I've seen too many cases where some project was started, infrastructure added in preparation for it, then the project stalled, and we were left with the infrastructure no one uses. What's worse, after enough time has passed no one even remembers why we added that. I'm guessing you are working in a local branch, is that true? (If not, you should, because manuals are being changed all the time.) Then just add the directive in your branch, and it will be merged when your work is brought into master. Would that be okay with you? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 16:29 ` Eli Zaretskii @ 2017-07-24 16:48 ` Jean-Christophe Helary 2017-07-24 16:55 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-24 16:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1008 bytes --] > On Jul 25, 2017, at 1:29, Eli Zaretskii <eliz@gnu.org> wrote: > > I'm guessing you are working in a local branch, is that true? (If > not, you should, because manuals are being changed all the time.) I'm regularly building po files from the updated texi set and moving all that to an OmegaT project that's not in the repository to avoid any git management. There is a bug in po4a that I reported that keeps it from creating the po file for org mode but I guess Martin Quinson is working on it... > Then just add the directive in your branch, and it will be merged when > your work is brought into master. Would that be okay with you? Perfect. On a side note, I don't see how my French translation can be brought into master since we don't have a location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does not exist at the moment... But I guess that's something we'll discuss when we have things to add there... Jean-Christophe [-- Attachment #2: Type: text/html, Size: 3862 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-24 16:48 ` Jean-Christophe Helary @ 2017-07-24 16:55 ` Eli Zaretskii 0 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2017-07-24 16:55 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Tue, 25 Jul 2017 01:48:24 +0900 > Cc: emacs-devel@gnu.org > > On a side note, I don't see how my French translation can be brought into master since we don't have a > location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does > not exist at the moment... But I guess that's something we'll discuss when we have things to add there... We can discuss this later, indeed, but up front I don't see why we'd need a separate directory, since I think the file name(s) will need to be different anyway. Texinfo doesn't gracefully handle different manuals that go under the same name, it wasn't designed to solve that problem. So it's best to avoid the issue to begin with, and have, say, emacs-fr.info, emacs-jp.info, etc. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 5:29 Jean-Christophe Helary 2017-05-28 14:27 ` Drew Adams @ 2017-05-31 22:18 ` Philipp Stephani 2017-05-31 22:29 ` Jean-Christophe Helary ` (3 more replies) 2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙) 2018-04-25 12:58 ` Jean-Christophe Helary 3 siblings, 4 replies; 40+ messages in thread From: Philipp Stephani @ 2017-05-31 22:18 UTC (permalink / raw) To: Jean-Christophe Helary, emacs-devel [-- Attachment #1.1: Type: text/plain, Size: 1707 bytes --] Jean-Christophe Helary <jean.christophe.helary@gmail.com> schrieb am So., 28. Mai 2017 um 07:29 Uhr: > The discussion so far seems to point at modifying 'message' and the likes > so that developers don't have to bother with any l10n mechanism on their > part (besides for writing clean strings). > > ==================================================== > On May 27, 2017, at 10:52, Jean-Christophe Helary < > jean.christophe.helary@gmail.com> wrote: > > My very uninformed idea is that we need an independent function that > handles the preferred language check and the catalog parsing based on a > key, and all the string displaying functions (message etc) would be > redefined to call that function when a non default preferred langage > (currently English) is detected. > > > On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote: > > Yes but from what I've seen in package/el, a lot of translatable texts are > not displayed with "message". Some > use "error", some use other mechanisms. > > > Internally, they all boil down to a small set of C functions, which is > where we should make these changes. > ==================================================== > > Since it's C, I'm not going to be able to contribute to that before I > understand the language, and the function definitions. I guess it's time I > open that K&R that's been on my shelves forever... > One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch. [-- Attachment #1.2: Type: text/html, Size: 2422 bytes --] [-- Attachment #2: 0001-Implement-field-numbers-in-format-strings.txt --] [-- Type: text/plain, Size: 10433 bytes --] From 39a0f7439b6cd804c49ea91b412b0adb11851cdd Mon Sep 17 00:00:00 2001 From: Philipp Stephani <phst@google.com> Date: Thu, 1 Jun 2017 00:09:43 +0200 Subject: [PATCH] Implement field numbers in format strings A field number explicitly specifies the argument to be formatted. This is especially important for potential localization work, since grammars of various languages dictate different word orders. * src/editfns.c (Fformat): Update documentation. (styled_format): Implement field numbers. * doc/lispref/strings.texi (Formatting Strings): Document field numbers. * lisp/emacs-lisp/bytecomp.el (byte-compile-format-warn): Adapt. * test/src/editfns-tests.el (format-with-field): New unit test. --- doc/lispref/strings.texi | 31 ++++++++++++++++++++++--- etc/NEWS | 3 +++ lisp/emacs-lisp/bytecomp.el | 11 ++++++--- src/editfns.c | 55 ++++++++++++++++++++++++++++++++++++++------- test/src/editfns-tests.el | 18 +++++++++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index 9436a96ead..9bf52f2414 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -864,7 +864,8 @@ Formatting Strings (format "%s" @var{arbitrary-string}) @end example - If @var{string} contains more than one format specification, the + If @var{string} contains more than one format specification and none +of the format specifications contain an explicit field number, the format specifications correspond to successive values from @var{objects}. Thus, the first format specification in @var{string} uses the first such value, the second format specification uses the @@ -961,6 +962,25 @@ Formatting Strings @end group @end example +@cindex field number + A specification can have a @dfn{field number}, which is a decimal +number after the initial @samp{%}, followed by a literal dollar sign +@samp{$}. If you provide a field numer, then the argument to be +printed corresponds to the given field number instead of the next +argument. Field numbers start at 1. + +You can mix specifications with and without field numbers. A +specification without a field number that follows a specification with +a field number will convert the argument after the one specified by +the field number: + +@example +(format "First argument %2$s, then %s, then %1$s" 1 2 3) + @result{} "First argument 2, then 3, then 1" +@end example + +You can't use field numbers in a @samp{%%} specification. + @cindex field width @cindex padding A specification can have a @dfn{width}, which is a decimal number @@ -996,9 +1016,14 @@ Formatting Strings @end group @end example +If you want to use both a field number and a width, place the field +number before the width. For example, in @samp{%2$7s}, @samp{2} is +the field number and @samp{7} is the width. + @cindex flags in format specifications - Immediately after the @samp{%} and before the optional width -specifier, you can also put certain @dfn{flag characters}. + After the @samp{%} and before the optional width specifier, you can +also put certain @dfn{flag characters}. The flag characters need to +come directly after a potential field number. The flag @samp{+} inserts a plus sign before a positive number, so that it always has a sign. A space character as flag inserts a space diff --git a/etc/NEWS b/etc/NEWS index 43e7897120..e31c0c7cb0 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -364,6 +364,9 @@ large integers from being displayed as characters. ** Two new commands for finding the source code of Emacs Lisp libraries: 'find-library-other-window' and 'find-library-other-frame'. +** You can now provide explicit field numbers in format specifiers. +For example, '(format "%2$s %1$s" 1 2)' produces "2 1". + \f * Editing Changes in Emacs 26.1 diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el index 12a7d4afc2..e5b9b47b1d 100644 --- a/lisp/emacs-lisp/bytecomp.el +++ b/lisp/emacs-lisp/bytecomp.el @@ -1375,10 +1375,15 @@ byte-compile-format-warn (let ((nfields (with-temp-buffer (insert (nth 1 form)) (goto-char (point-min)) - (let ((n 0)) + (let ((i 0) (n 0)) (while (re-search-forward "%." nil t) - (unless (eq ?% (char-after (1+ (match-beginning 0)))) - (setq n (1+ n)))) + (backward-char) + (unless (eq ?% (char-after)) + (setq i (if (looking-at "\\([0-9]+\\)\\$") + (string-to-number (match-string 1) 10) + (1+ i)) + n (max n i))) + (forward-char)) n))) (nargs (- (length form) 2))) (unless (= nargs nfields) diff --git a/src/editfns.c b/src/editfns.c index a51670cfdf..b76208f8be 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -48,6 +48,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */ #include <float.h> #include <limits.h> +#include <c-ctype.h> #include <intprops.h> #include <stdlib.h> #include <strftime.h> @@ -3857,7 +3858,7 @@ The first argument is a format control string. The other arguments are substituted into it to make the result, a string. The format control string may contain %-sequences meaning to substitute -the next available argument: +the next available argument, or the argument explicitly specified: %s means print a string argument. Actually, prints any object, with `princ'. %d means print as signed number in decimal. @@ -3874,13 +3875,17 @@ the next available argument: The argument used for %d, %o, %x, %e, %f, %g or %c must be a number. Use %% to put a single % into the output. -A %-sequence may contain optional flag, width, and precision -specifiers, as follows: +A %-sequence may contain optional field number, flag, width, and +precision specifiers, as follows: - %<flags><width><precision>character + %<field><flags><width><precision>character -where flags is [+ #-0]+, width is [0-9]+, and precision is a literal -period "." followed by [0-9]+ +where field is [0-9]+ followed by a literal dollar "$", flags is +[+ #-0]+, width is [0-9]+, and precision is a literal period "." +followed by [0-9]+. + +If field is given, it must be a one-based argument number; the given +argument is substituted instead of the next one. The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only @@ -4033,14 +4038,19 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) { /* General format specifications look like - '%' [flags] [field-width] [precision] format + '%' [field-number] [flags] [field-width] [precision] format where + field-number ::= [0-9]+ '$' flags ::= [-+0# ]+ field-width ::= [0-9]+ precision ::= '.' [0-9]* + If a field-number is specified, it specifies the argument + number to substitute. Otherwise, the next argument is + taken. + If a field-width is specified, it specifies to which width the output should be padded with blanks, if the output string is shorter than field-width. @@ -4049,6 +4059,29 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) digits to print after the '.' for floats, or the max. number of chars to print from a string. */ + char *field_end; + uintmax_t raw_field = strtoumax (format, &field_end, 10); + bool has_field = false; + if (c_isdigit (*format) && *field_end == '$') + { + if (raw_field < 1 || raw_field >= PTRDIFF_MAX) + { + /* doprnt doesn't support %.*s, so we need to copy + the field number string. */ + ptrdiff_t length = field_end - format; + eassert (length > 0); + eassert (length < PTRDIFF_MAX); + char *field = SAFE_ALLOCA (length + 1); + memcpy (field, format, length); + field[length] = '\0'; + error ("Invalid field number `%s'", field); + } + has_field = true; + /* n is incremented below. */ + n = raw_field - 1; + format = field_end + 1; + } + bool minus_flag = false; bool plus_flag = false; bool space_flag = false; @@ -4091,7 +4124,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) memset (&discarded[format0 - format_start], 1, format - format0 - (conversion == '%')); if (conversion == '%') - goto copy_char; + { + if (has_field) + /* FIXME: `error' doesn't appear to support `%%'. */ + error ("Field number specified together with `%c' conversion", + '%'); + goto copy_char; + } ++n; if (! (n < nargs)) diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el index 8019eb0383..f76c6c9fd3 100644 --- a/test/src/editfns-tests.el +++ b/test/src/editfns-tests.el @@ -177,4 +177,22 @@ transpose-test-get-byte-positions (format-time-string "%Y-%m-%d %H:%M:%S.%3N %z" nil (concat (make-string 2048 ?X) "0"))))) +(ert-deftest format-with-field () + (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3) + "First argument 2, then 3, then 1")) + (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44) + "a 22 33 11 \"22\" 33 44 b")) + (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b")) + (should (equal (should-error (format "a %999999$s b" 11)) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %$s b" 11)) + ;; FIXME: there shouldn't be two % in the error + ;; string! + '(error "Invalid format operation %%$"))) + (should (equal (should-error (format "a %0$s b" 11)) + '(error "Invalid field number `0'"))) + (should (equal + (should-error (format "a %1$% %s b" 11)) + '(error "Field number specified together with `%' conversion")))) + ;;; editfns-tests.el ends here -- 2.13.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-31 22:18 ` Philipp Stephani @ 2017-05-31 22:29 ` Jean-Christophe Helary 2017-06-01 5:18 ` Paul Eggert ` (2 subsequent siblings) 3 siblings, 0 replies; 40+ messages in thread From: Jean-Christophe Helary @ 2017-05-31 22:29 UTC (permalink / raw) To: emacs-devel > On Jun 1, 2017, at 7:18, Philipp Stephani <p.stephani2@gmail.com> wrote: > > >> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings). >> >> ==================================================== >> On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: >> >>> My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected. >> >> On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org> wrote: >> >>> Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some >>> use "error", some use other mechanisms. >> >> Internally, they all boil down to a small set of C functions, which is >> where we should make these changes. >> ==================================================== >> >> Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever... > > One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch. That's really not a *small* aspect. Thank you for thinking about that. I'm in no position to review your work but I suppose others will. Jean-Christophe ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-31 22:18 ` Philipp Stephani 2017-05-31 22:29 ` Jean-Christophe Helary @ 2017-06-01 5:18 ` Paul Eggert 2017-06-01 8:17 ` Philipp Stephani 2017-06-01 14:21 ` Eli Zaretskii 2017-07-02 1:22 ` Jean-Christophe Helary 3 siblings, 1 reply; 40+ messages in thread From: Paul Eggert @ 2017-06-01 5:18 UTC (permalink / raw) To: Philipp Stephani, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 985 bytes --] Thanks for that patch: it's a good move forward for i18n. Some suggestions: * Today I fixed the bug with "%%" and the 'error' function, so there's no need for a FIXME or a workaround any more. * In strings.texi, reorder the format spec description so that it matches the textual order of a format spec. This should lessen confusion. * Allow field numbers in a %% spec. All other components of a format spec are allowed in %%, so odd to report an error for just field numbers. * There is no need for a special diagnostic for field numbers greater than PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers use. This avoids a need for an alloca. * Reword "Invalid field number `0'" to "Invalid format field number 0" to make it more obvious that it's a format and there's no need to quote the 0. Proposed further patch attached (it addresses the above points), along with a copy of your patch rebased to current master for convenience. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Implement-field-numbers-in-format-strings.patch --] [-- Type: text/x-patch; name="0001-Implement-field-numbers-in-format-strings.patch", Size: 10666 bytes --] From 7f98d9c6d59472abf189ccd5a5c76415f9d1cbb3 Mon Sep 17 00:00:00 2001 From: Philipp Stephani <phst@google.com> Date: Wed, 31 May 2017 22:09:39 -0700 Subject: [PATCH 1/2] Implement field numbers in format strings A field number explicitly specifies the argument to be formatted. This is especially important for potential localization work, since grammars of various languages dictate different word orders. * src/editfns.c (Fformat): Update documentation. (styled_format): Implement field numbers. * doc/lispref/strings.texi (Formatting Strings): Document field numbers. * lisp/emacs-lisp/bytecomp.el (byte-compile-format-warn): Adapt. * test/src/editfns-tests.el (format-with-field): New unit test. --- doc/lispref/strings.texi | 31 ++++++++++++++++++++++--- etc/NEWS | 3 +++ lisp/emacs-lisp/bytecomp.el | 11 ++++++--- src/editfns.c | 55 ++++++++++++++++++++++++++++++++++++++------- test/src/editfns-tests.el | 18 +++++++++++++++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index 9436a96..9bf52f2 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -864,7 +864,8 @@ Formatting Strings (format "%s" @var{arbitrary-string}) @end example - If @var{string} contains more than one format specification, the + If @var{string} contains more than one format specification and none +of the format specifications contain an explicit field number, the format specifications correspond to successive values from @var{objects}. Thus, the first format specification in @var{string} uses the first such value, the second format specification uses the @@ -961,6 +962,25 @@ Formatting Strings @end group @end example +@cindex field number + A specification can have a @dfn{field number}, which is a decimal +number after the initial @samp{%}, followed by a literal dollar sign +@samp{$}. If you provide a field numer, then the argument to be +printed corresponds to the given field number instead of the next +argument. Field numbers start at 1. + +You can mix specifications with and without field numbers. A +specification without a field number that follows a specification with +a field number will convert the argument after the one specified by +the field number: + +@example +(format "First argument %2$s, then %s, then %1$s" 1 2 3) + @result{} "First argument 2, then 3, then 1" +@end example + +You can't use field numbers in a @samp{%%} specification. + @cindex field width @cindex padding A specification can have a @dfn{width}, which is a decimal number @@ -996,9 +1016,14 @@ Formatting Strings @end group @end example +If you want to use both a field number and a width, place the field +number before the width. For example, in @samp{%2$7s}, @samp{2} is +the field number and @samp{7} is the width. + @cindex flags in format specifications - Immediately after the @samp{%} and before the optional width -specifier, you can also put certain @dfn{flag characters}. + After the @samp{%} and before the optional width specifier, you can +also put certain @dfn{flag characters}. The flag characters need to +come directly after a potential field number. The flag @samp{+} inserts a plus sign before a positive number, so that it always has a sign. A space character as flag inserts a space diff --git a/etc/NEWS b/etc/NEWS index 43e7897..e31c0c7 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -364,6 +364,9 @@ large integers from being displayed as characters. ** Two new commands for finding the source code of Emacs Lisp libraries: 'find-library-other-window' and 'find-library-other-frame'. +** You can now provide explicit field numbers in format specifiers. +For example, '(format "%2$s %1$s" 1 2)' produces "2 1". + \f * Editing Changes in Emacs 26.1 diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el index 12a7d4a..e5b9b47 100644 --- a/lisp/emacs-lisp/bytecomp.el +++ b/lisp/emacs-lisp/bytecomp.el @@ -1375,10 +1375,15 @@ byte-compile-format-warn (let ((nfields (with-temp-buffer (insert (nth 1 form)) (goto-char (point-min)) - (let ((n 0)) + (let ((i 0) (n 0)) (while (re-search-forward "%." nil t) - (unless (eq ?% (char-after (1+ (match-beginning 0)))) - (setq n (1+ n)))) + (backward-char) + (unless (eq ?% (char-after)) + (setq i (if (looking-at "\\([0-9]+\\)\\$") + (string-to-number (match-string 1) 10) + (1+ i)) + n (max n i))) + (forward-char)) n))) (nargs (- (length form) 2))) (unless (= nargs nfields) diff --git a/src/editfns.c b/src/editfns.c index 89a6724..44341ce 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -48,6 +48,7 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */ #include <float.h> #include <limits.h> +#include <c-ctype.h> #include <intprops.h> #include <stdlib.h> #include <strftime.h> @@ -3856,7 +3857,7 @@ The first argument is a format control string. The other arguments are substituted into it to make the result, a string. The format control string may contain %-sequences meaning to substitute -the next available argument: +the next available argument, or the argument explicitly specified: %s means print a string argument. Actually, prints any object, with `princ'. %d means print as signed number in decimal. @@ -3873,13 +3874,17 @@ the next available argument: The argument used for %d, %o, %x, %e, %f, %g or %c must be a number. Use %% to put a single % into the output. -A %-sequence may contain optional flag, width, and precision -specifiers, as follows: +A %-sequence may contain optional field number, flag, width, and +precision specifiers, as follows: - %<flags><width><precision>character + %<field><flags><width><precision>character -where flags is [+ #-0]+, width is [0-9]+, and precision is a literal -period "." followed by [0-9]+ +where field is [0-9]+ followed by a literal dollar "$", flags is +[+ #-0]+, width is [0-9]+, and precision is a literal period "." +followed by [0-9]+. + +If field is given, it must be a one-based argument number; the given +argument is substituted instead of the next one. The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only @@ -4032,14 +4037,19 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) { /* General format specifications look like - '%' [flags] [field-width] [precision] format + '%' [field-number] [flags] [field-width] [precision] format where + field-number ::= [0-9]+ '$' flags ::= [-+0# ]+ field-width ::= [0-9]+ precision ::= '.' [0-9]* + If a field-number is specified, it specifies the argument + number to substitute. Otherwise, the next argument is + taken. + If a field-width is specified, it specifies to which width the output should be padded with blanks, if the output string is shorter than field-width. @@ -4048,6 +4058,29 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) digits to print after the '.' for floats, or the max. number of chars to print from a string. */ + char *field_end; + uintmax_t raw_field = strtoumax (format, &field_end, 10); + bool has_field = false; + if (c_isdigit (*format) && *field_end == '$') + { + if (raw_field < 1 || raw_field >= PTRDIFF_MAX) + { + /* doprnt doesn't support %.*s, so we need to copy + the field number string. */ + ptrdiff_t length = field_end - format; + eassert (length > 0); + eassert (length < PTRDIFF_MAX); + char *field = SAFE_ALLOCA (length + 1); + memcpy (field, format, length); + field[length] = '\0'; + error ("Invalid field number `%s'", field); + } + has_field = true; + /* n is incremented below. */ + n = raw_field - 1; + format = field_end + 1; + } + bool minus_flag = false; bool plus_flag = false; bool space_flag = false; @@ -4090,7 +4123,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) memset (&discarded[format0 - format_start], 1, format - format0 - (conversion == '%')); if (conversion == '%') - goto copy_char; + { + if (has_field) + /* FIXME: `error' doesn't appear to support `%%'. */ + error ("Field number specified together with `%c' conversion", + '%'); + goto copy_char; + } ++n; if (! (n < nargs)) diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el index 8019eb0..f76c6c9 100644 --- a/test/src/editfns-tests.el +++ b/test/src/editfns-tests.el @@ -177,4 +177,22 @@ transpose-test-get-byte-positions (format-time-string "%Y-%m-%d %H:%M:%S.%3N %z" nil (concat (make-string 2048 ?X) "0"))))) +(ert-deftest format-with-field () + (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3) + "First argument 2, then 3, then 1")) + (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44) + "a 22 33 11 \"22\" 33 44 b")) + (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b")) + (should (equal (should-error (format "a %999999$s b" 11)) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %$s b" 11)) + ;; FIXME: there shouldn't be two % in the error + ;; string! + '(error "Invalid format operation %%$"))) + (should (equal (should-error (format "a %0$s b" 11)) + '(error "Invalid field number `0'"))) + (should (equal + (should-error (format "a %1$% %s b" 11)) + '(error "Field number specified together with `%' conversion")))) + ;;; editfns-tests.el ends here -- 2.9.4 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Minor-improvements-to-format-field-numbers.patch --] [-- Type: text/x-patch; name="0002-Minor-improvements-to-format-field-numbers.patch", Size: 12995 bytes --] From 6b7edbf14be8b18e0781559345edda7346a1e363 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Wed, 31 May 2017 22:09:39 -0700 Subject: [PATCH 2/2] Minor improvements to format field numbers * src/editfns.c (styled_format): Allow field numbers in a %% spec. No need for a special diagnostic for field numbers greater than PTRDIFF_MAX. Reword diagnostic for field 0. * test/src/editfns-tests.el (format-with-field): Adjust to match. --- doc/lispref/strings.texi | 119 +++++++++++++++++++++------------------------- etc/NEWS | 2 +- src/editfns.c | 56 ++++++++-------------- test/src/editfns-tests.el | 10 ++-- 4 files changed, 79 insertions(+), 108 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index 9bf52f2..7577410 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -864,15 +864,6 @@ Formatting Strings (format "%s" @var{arbitrary-string}) @end example - If @var{string} contains more than one format specification and none -of the format specifications contain an explicit field number, the -format specifications correspond to successive values from -@var{objects}. Thus, the first format specification in @var{string} -uses the first such value, the second format specification uses the -second such value, and so on. Any extra format specifications (those -for which there are no corresponding values) cause an error. Any -extra values to be formatted are ignored. - Certain format specifications require values of particular types. If you supply a value that doesn't fit the requirements, an error is signaled. @@ -962,68 +953,33 @@ Formatting Strings @end group @end example + By default, format specifications correspond to successive values from +@var{objects}. Thus, the first format specification in @var{string} +uses the first such value, the second format specification uses the +second such value, and so on. Any extra format specifications (those +for which there are no corresponding values) cause an error. Any +extra values to be formatted are ignored. + @cindex field number - A specification can have a @dfn{field number}, which is a decimal -number after the initial @samp{%}, followed by a literal dollar sign -@samp{$}. If you provide a field numer, then the argument to be -printed corresponds to the given field number instead of the next -argument. Field numbers start at 1. + A format specification can have a @dfn{field number}, which is a +decimal number immediately after the initial @samp{%}, followed by a +literal dollar sign @samp{$}. It causes the format specification to +convert the argument with the given number instead of the next +argument. Argument 1 is the argument just after the format. -You can mix specifications with and without field numbers. A + You can mix specifications with and without field numbers. A specification without a field number that follows a specification with a field number will convert the argument after the one specified by the field number: @example -(format "First argument %2$s, then %s, then %1$s" 1 2 3) - @result{} "First argument 2, then 3, then 1" -@end example - -You can't use field numbers in a @samp{%%} specification. - -@cindex field width -@cindex padding - A specification can have a @dfn{width}, which is a decimal number -between the @samp{%} and the specification character. If the printed -representation of the object contains fewer characters than this -width, @code{format} extends it with padding. The width specifier is -ignored for the @samp{%%} specification. Any padding introduced by -the width specifier normally consists of spaces inserted on the left: - -@example -(format "%5d is padded on the left with spaces" 123) - @result{} " 123 is padded on the left with spaces" -@end example - -@noindent -If the width is too small, @code{format} does not truncate the -object's printed representation. Thus, you can use a width to specify -a minimum spacing between columns with no risk of losing information. -In the following two examples, @samp{%7s} specifies a minimum width -of 7. In the first case, the string inserted in place of @samp{%7s} -has only 3 letters, and needs 4 blank spaces as padding. In the -second case, the string @code{"specification"} is 13 letters wide but -is not truncated. - -@example -@group -(format "The word '%7s' has %d letters in it." - "foo" (length "foo")) - @result{} "The word ' foo' has 3 letters in it." -(format "The word '%7s' has %d letters in it." - "specification" (length "specification")) - @result{} "The word 'specification' has 13 letters in it." -@end group +(format "Argument %2$s, then %s, then %1$s" "x" "y" "z") + @result{} "Argument y, then z, then x" @end example -If you want to use both a field number and a width, place the field -number before the width. For example, in @samp{%2$7s}, @samp{2} is -the field number and @samp{7} is the width. - @cindex flags in format specifications - After the @samp{%} and before the optional width specifier, you can -also put certain @dfn{flag characters}. The flag characters need to -come directly after a potential field number. + After the @samp{%} and any field number, you can put certain +@dfn{flag characters}. The flag @samp{+} inserts a plus sign before a positive number, so that it always has a sign. A space character as flag inserts a space @@ -1048,8 +1004,8 @@ Formatting Strings These specification characters accept the @samp{0} flag, but still pad with @emph{spaces}. - The flag @samp{-} causes the padding inserted by the width -specifier, if any, to be inserted on the right rather than the left. + The flag @samp{-} causes any padding inserted by the width, +if specified, to be inserted on the right rather than the left. If both @samp{-} and @samp{0} are present, the @samp{0} flag is ignored. @@ -1067,9 +1023,44 @@ Formatting Strings @end group @end example +@cindex field width +@cindex padding + A specification can have a @dfn{width}, which is a decimal number +that appears after any field number and flags. If the printed +representation of the object contains fewer characters than this +width, @code{format} extends it with padding. The width is +ignored for the @samp{%%} specification. Any padding introduced by +the width normally consists of spaces inserted on the left: + +@example +(format "%5d is padded on the left with spaces" 123) + @result{} " 123 is padded on the left with spaces" +@end example + +@noindent +If the width is too small, @code{format} does not truncate the +object's printed representation. Thus, you can use a width to specify +a minimum spacing between columns with no risk of losing information. +In the following two examples, @samp{%7s} specifies a minimum width +of 7. In the first case, the string inserted in place of @samp{%7s} +has only 3 letters, and needs 4 blank spaces as padding. In the +second case, the string @code{"specification"} is 13 letters wide but +is not truncated. + +@example +@group +(format "The word '%7s' has %d letters in it." + "foo" (length "foo")) + @result{} "The word ' foo' has 3 letters in it." +(format "The word '%7s' has %d letters in it." + "specification" (length "specification")) + @result{} "The word 'specification' has 13 letters in it." +@end group +@end example + @cindex precision in format specifications All the specification characters allow an optional @dfn{precision} -before the character (after the width, if present). The precision is +after the field number, flags and width, if present. The precision is a decimal-point @samp{.} followed by a digit-string. For the floating-point specifications (@samp{%e} and @samp{%f}), the precision specifies how many digits following the decimal point to diff --git a/etc/NEWS b/etc/NEWS index e31c0c7..dfe81cd 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -365,7 +365,7 @@ large integers from being displayed as characters. libraries: 'find-library-other-window' and 'find-library-other-frame'. ** You can now provide explicit field numbers in format specifiers. -For example, '(format "%2$s %1$s" 1 2)' produces "2 1". +For example, '(format "%2$s %1$s" "X" "Y")' produces "Y X". \f * Editing Changes in Emacs 26.1 diff --git a/src/editfns.c b/src/editfns.c index 44341ce..98187df 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -4046,9 +4046,8 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) field-width ::= [0-9]+ precision ::= '.' [0-9]* - If a field-number is specified, it specifies the argument - number to substitute. Otherwise, the next argument is - taken. + If present, a field-number specifies the argument number + to substitute. Otherwise, the next argument is taken. If a field-width is specified, it specifies to which width the output should be padded with blanks, if the output @@ -4058,28 +4057,20 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) digits to print after the '.' for floats, or the max. number of chars to print from a string. */ - char *field_end; - uintmax_t raw_field = strtoumax (format, &field_end, 10); - bool has_field = false; - if (c_isdigit (*format) && *field_end == '$') - { - if (raw_field < 1 || raw_field >= PTRDIFF_MAX) - { - /* doprnt doesn't support %.*s, so we need to copy - the field number string. */ - ptrdiff_t length = field_end - format; - eassert (length > 0); - eassert (length < PTRDIFF_MAX); - char *field = SAFE_ALLOCA (length + 1); - memcpy (field, format, length); - field[length] = '\0'; - error ("Invalid field number `%s'", field); - } - has_field = true; - /* n is incremented below. */ - n = raw_field - 1; - format = field_end + 1; - } + uintmax_t num; + char *num_end; + if (c_isdigit (*format)) + { + num = strtoumax (format, &num_end, 10); + if (*num_end == '$') + { + if (num == 0) + error ("Invalid format field number 0"); + n = min (num, PTRDIFF_MAX); + n--; + format = num_end + 1; + } + } bool minus_flag = false; bool plus_flag = false; @@ -4104,11 +4095,10 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) space_flag &= ! plus_flag; zero_flag &= ! minus_flag; - char *num_end; - uintmax_t raw_field_width = strtoumax (format, &num_end, 10); - if (max_bufsize <= raw_field_width) + num = strtoumax (format, &num_end, 10); + if (max_bufsize <= num) string_overflow (); - ptrdiff_t field_width = raw_field_width; + ptrdiff_t field_width = num; bool precision_given = *num_end == '.'; uintmax_t precision = (precision_given @@ -4123,13 +4113,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) memset (&discarded[format0 - format_start], 1, format - format0 - (conversion == '%')); if (conversion == '%') - { - if (has_field) - /* FIXME: `error' doesn't appear to support `%%'. */ - error ("Field number specified together with `%c' conversion", - '%'); - goto copy_char; - } + goto copy_char; ++n; if (! (n < nargs)) diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el index f76c6c9..c5923aa 100644 --- a/test/src/editfns-tests.el +++ b/test/src/editfns-tests.el @@ -186,13 +186,9 @@ transpose-test-get-byte-positions (should (equal (should-error (format "a %999999$s b" 11)) '(error "Not enough arguments for format string"))) (should (equal (should-error (format "a %$s b" 11)) - ;; FIXME: there shouldn't be two % in the error - ;; string! - '(error "Invalid format operation %%$"))) + '(error "Invalid format operation %$"))) (should (equal (should-error (format "a %0$s b" 11)) - '(error "Invalid field number `0'"))) - (should (equal - (should-error (format "a %1$% %s b" 11)) - '(error "Field number specified together with `%' conversion")))) + '(error "Invalid format field number 0"))) + (should (equal (format "a %1$% %s b" 11) "a % 11 b"))) ;;; editfns-tests.el ends here -- 2.9.4 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-01 5:18 ` Paul Eggert @ 2017-06-01 8:17 ` Philipp Stephani 2017-06-01 23:20 ` Paul Eggert 0 siblings, 1 reply; 40+ messages in thread From: Philipp Stephani @ 2017-06-01 8:17 UTC (permalink / raw) To: Paul Eggert, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1719 bytes --] Paul Eggert <eggert@cs.ucla.edu> schrieb am Do., 1. Juni 2017 um 07:18 Uhr: > Thanks for that patch: it's a good move forward for i18n. Some suggestions: > > * Today I fixed the bug with "%%" and the 'error' function, so there's no > need > for a FIXME or a workaround any more. > > * In strings.texi, reorder the format spec description so that it matches > the > textual order of a format spec. This should lessen confusion. > > * Allow field numbers in a %% spec. All other components of a format spec > are > allowed in %%, so odd to report an error for just field numbers. > The reason I banned that initially is that the behavior for the case "%1$% %d" is confusing: will the %d take argument 1 or 2? (We should ban such mixing instead, see below.) > > * There is no need for a special diagnostic for field numbers greater than > PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers > use. > This avoids a need for an alloca. > > * Reword "Invalid field number `0'" to "Invalid format field number 0" to > make > it more obvious that it's a format and there's no need to quote the 0. > > Proposed further patch attached (it addresses the above points), along > with a > copy of your patch rebased to current master for convenience. > Thanks, feel free to push. Two further things: - Probably there's a bug lurking because the info[n] ought to be indexed by specification index, not argument index. Something like (format "%1$c %1$d" ?a) will probably do the wrong thing (untested). - We should ban mixing explicit and implicit field numbers, like POSIX printf(3) does. The gain from allowing to mix is negligible, and it makes the implementation and the documentation needlessly complex. [-- Attachment #2: Type: text/html, Size: 2314 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-01 8:17 ` Philipp Stephani @ 2017-06-01 23:20 ` Paul Eggert 2017-06-02 6:52 ` Philipp Stephani 0 siblings, 1 reply; 40+ messages in thread From: Paul Eggert @ 2017-06-01 23:20 UTC (permalink / raw) To: Philipp Stephani, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1189 bytes --] On 06/01/2017 01:17 AM, Philipp Stephani wrote: > > - Probably there's a bug lurking because the info[n] ought to be > indexed by specification index, not argument index. Something like > (format "%1$c %1$d" ?a) will probably do the wrong thing (untested). Sorry, I'm not following. That call returns "a 97"; isn't that the expected result? > - We should ban mixing explicit and implicit field numbers, like POSIX > printf(3) does. The gain from allowing to mix is negligible, and it > makes the implementation and the documentation needlessly complex. Sounds good, and I installed the attached. The 1st patch fixes a performance regression introduced by calling strtoumax. I went whole-hog and removed all calls to strtoumax, since they're all performance-significant, plus it makes for one less porting issue to worry about. The 2nd patch fixes the documentation along the lines that you suggested. And on further thought, the tradition for Emacs is to document supported behavior and not worry about slowing Emacs down to check for undocumented usage (aside from preventing crashes), so with that in mind the 2nd patch removes the check for %0$ (which never crashes). [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Improve-performance-by-avoiding-strtoumax.patch --] [-- Type: text/x-patch; name="0001-Improve-performance-by-avoiding-strtoumax.patch", Size: 21709 bytes --] From b126485b8ce3046ca47caaba2f845a39604dd76f Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 1 Jun 2017 16:03:12 -0700 Subject: [PATCH 1/2] Improve performance by avoiding strtoumax This made (string-to-number "10") 20% faster on my old desktop, an AMD Phenom II X4 910e running Fedora 25 x86-64. * admin/merge-gnulib (GNULIB_MODULES): Remove strtoumax. * lib/gnulib.mk.in, m4/gnulib-comp.m4: Regenerate. * lib/strtoul.c, lib/strtoull.c, lib/strtoumax.c, m4/strtoull.m4: * m4/strtoumax.m4: Remove. * src/editfns.c (str2num): New function. (styled_format): Use it instead of strtoumax. Use ptrdiff_t instead of uintmax_t. Check for integer overflow. * src/lread.c (LEAD_INT, DOT_CHAR, TRAIL_INT, E_EXP): Move to private scope and make them enums. (string_to_number): Compute integer value directly during first pass instead of revisiting it with strtoumax later. --- admin/merge-gnulib | 2 +- lib/gnulib.mk.in | 27 +------------------------ lib/strtoul.c | 19 ------------------ lib/strtoull.c | 26 ------------------------ lib/strtoumax.c | 2 -- m4/gnulib-comp.m4 | 30 --------------------------- m4/strtoull.m4 | 24 ---------------------- m4/strtoumax.m4 | 28 -------------------------- src/editfns.c | 43 ++++++++++++++++++++++++++++----------- src/lread.c | 59 +++++++++++++++++++++++------------------------------- 10 files changed, 58 insertions(+), 202 deletions(-) delete mode 100644 lib/strtoul.c delete mode 100644 lib/strtoull.c delete mode 100644 lib/strtoumax.c delete mode 100644 m4/strtoull.m4 delete mode 100644 m4/strtoumax.m4 diff --git a/admin/merge-gnulib b/admin/merge-gnulib index 45e4a78..e5fb0f5 100755 --- a/admin/merge-gnulib +++ b/admin/merge-gnulib @@ -38,7 +38,7 @@ GNULIB_MODULES= manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time std-gnu11 stdalign stddef stdio - stpcpy strftime strtoimax strtoumax symlink sys_stat + stpcpy strftime strtoimax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub update-copyright utimens vla warnings diff --git a/lib/gnulib.mk.in b/lib/gnulib.mk.in index d23c2a5..73d3043 100644 --- a/lib/gnulib.mk.in +++ b/lib/gnulib.mk.in @@ -21,7 +21,7 @@ # the same distribution terms as the rest of that program. # # Generated by gnulib-tool. -# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stat --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=tzset --avoid=unsetenv --avoid=utime --avoid=utime-h --gnu-make --makefile-name=gnulib.mk.in --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-leading-zeros count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode filevercmp flexmember fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog ignore-value intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time std-gnu11 stdalign stddef stdio stpcpy strftime strtoimax strtoumax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub update-copyright utimens vla warnings +# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stat --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=tzset --avoid=unsetenv --avoid=utime --avoid=utime-h --gnu-make --makefile-name=gnulib.mk.in --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-leading-zeros count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode filevercmp flexmember fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog ignore-value intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time std-gnu11 stdalign stddef stdio stpcpy strftime strtoimax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub update-copyright utimens vla warnings MOSTLYCLEANFILES += core *.stackdump @@ -905,7 +905,6 @@ gl_GNULIB_ENABLED_getdtablesize = @gl_GNULIB_ENABLED_getdtablesize@ gl_GNULIB_ENABLED_getgroups = @gl_GNULIB_ENABLED_getgroups@ gl_GNULIB_ENABLED_secure_getenv = @gl_GNULIB_ENABLED_secure_getenv@ gl_GNULIB_ENABLED_strtoll = @gl_GNULIB_ENABLED_strtoll@ -gl_GNULIB_ENABLED_strtoull = @gl_GNULIB_ENABLED_strtoull@ gl_GNULIB_ENABLED_tempname = @gl_GNULIB_ENABLED_tempname@ gl_LIBOBJS = @gl_LIBOBJS@ gl_LTLIBOBJS = @gl_LTLIBOBJS@ @@ -2507,30 +2506,6 @@ EXTRA_libgnu_a_SOURCES += strtol.c strtoll.c endif ## end gnulib module strtoll -## begin gnulib module strtoull -ifeq (,$(OMIT_GNULIB_MODULE_strtoull)) - -ifneq (,$(gl_GNULIB_ENABLED_strtoull)) - -endif -EXTRA_DIST += strtol.c strtoul.c strtoull.c - -EXTRA_libgnu_a_SOURCES += strtol.c strtoul.c strtoull.c - -endif -## end gnulib module strtoull - -## begin gnulib module strtoumax -ifeq (,$(OMIT_GNULIB_MODULE_strtoumax)) - - -EXTRA_DIST += strtoimax.c strtoumax.c - -EXTRA_libgnu_a_SOURCES += strtoimax.c strtoumax.c - -endif -## end gnulib module strtoumax - ## begin gnulib module symlink ifeq (,$(OMIT_GNULIB_MODULE_symlink)) diff --git a/lib/strtoul.c b/lib/strtoul.c deleted file mode 100644 index c4974e0..0000000 --- a/lib/strtoul.c +++ /dev/null @@ -1,19 +0,0 @@ -/* Copyright (C) 1991, 1997, 2009-2017 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>. */ - -#define UNSIGNED 1 - -#include "strtol.c" diff --git a/lib/strtoull.c b/lib/strtoull.c deleted file mode 100644 index 51ae3ac..0000000 --- a/lib/strtoull.c +++ /dev/null @@ -1,26 +0,0 @@ -/* Function to parse an 'unsigned long long int' from text. - Copyright (C) 1995-1997, 1999, 2009-2017 Free Software Foundation, Inc. - NOTE: The canonical source of this file is maintained with the GNU C - Library. Bugs can be reported to bug-glibc@gnu.org. - - This program is free software: you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 3 of the License, or any - later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>. */ - -#define QUAD 1 - -#include "strtoul.c" - -#ifdef _LIBC -strong_alias (__strtoull_internal, __strtouq_internal) -weak_alias (strtoull, strtouq) -#endif diff --git a/lib/strtoumax.c b/lib/strtoumax.c deleted file mode 100644 index dc395d6..0000000 --- a/lib/strtoumax.c +++ /dev/null @@ -1,2 +0,0 @@ -#define UNSIGNED 1 -#include "strtoimax.c" diff --git a/m4/gnulib-comp.m4 b/m4/gnulib-comp.m4 index 3f196d4..8f53a99 100644 --- a/m4/gnulib-comp.m4 +++ b/m4/gnulib-comp.m4 @@ -140,8 +140,6 @@ AC_DEFUN # Code from module string: # Code from module strtoimax: # Code from module strtoll: - # Code from module strtoull: - # Code from module strtoumax: # Code from module symlink: # Code from module sys_select: # Code from module sys_stat: @@ -364,12 +362,6 @@ AC_DEFUN gl_PREREQ_STRTOIMAX fi gl_INTTYPES_MODULE_INDICATOR([strtoimax]) - gl_FUNC_STRTOUMAX - if test $HAVE_DECL_STRTOUMAX = 0 || test $REPLACE_STRTOUMAX = 1; then - AC_LIBOBJ([strtoumax]) - gl_PREREQ_STRTOUMAX - fi - gl_INTTYPES_MODULE_INDICATOR([strtoumax]) gl_FUNC_SYMLINK if test $HAVE_SYMLINK = 0 || test $REPLACE_SYMLINK = 1; then AC_LIBOBJ([symlink]) @@ -420,7 +412,6 @@ AC_DEFUN gl_gnulib_enabled_6099e9737f757db36c47fa9d9f02e88c=false gl_gnulib_enabled_secure_getenv=false gl_gnulib_enabled_strtoll=false - gl_gnulib_enabled_strtoull=false gl_gnulib_enabled_tempname=false gl_gnulib_enabled_682e609604ccaac6be382e4ee3a4eaec=false func_gl_gnulib_m4code_260941c0e5dc67ec9e87d1fb321c300b () @@ -569,18 +560,6 @@ AC_DEFUN gl_gnulib_enabled_strtoll=true fi } - func_gl_gnulib_m4code_strtoull () - { - if ! $gl_gnulib_enabled_strtoull; then - gl_FUNC_STRTOULL - if test $HAVE_STRTOULL = 0; then - AC_LIBOBJ([strtoull]) - gl_PREREQ_STRTOULL - fi - gl_STDLIB_MODULE_INDICATOR([strtoull]) - gl_gnulib_enabled_strtoull=true - fi - } func_gl_gnulib_m4code_tempname () { if ! $gl_gnulib_enabled_tempname; then @@ -649,9 +628,6 @@ AC_DEFUN if { test $HAVE_DECL_STRTOIMAX = 0 || test $REPLACE_STRTOIMAX = 1; } && test $ac_cv_type_long_long_int = yes; then func_gl_gnulib_m4code_strtoll fi - if { test $HAVE_DECL_STRTOUMAX = 0 || test $REPLACE_STRTOUMAX = 1; } && test $ac_cv_type_unsigned_long_long_int = yes; then - func_gl_gnulib_m4code_strtoull - fi if test $HAVE_TIMEGM = 0 || test $REPLACE_TIMEGM = 1; then func_gl_gnulib_m4code_5264294aa0a5557541b53c8c741f7f31 fi @@ -670,7 +646,6 @@ AC_DEFUN AM_CONDITIONAL([gl_GNULIB_ENABLED_6099e9737f757db36c47fa9d9f02e88c], [$gl_gnulib_enabled_6099e9737f757db36c47fa9d9f02e88c]) AM_CONDITIONAL([gl_GNULIB_ENABLED_secure_getenv], [$gl_gnulib_enabled_secure_getenv]) AM_CONDITIONAL([gl_GNULIB_ENABLED_strtoll], [$gl_gnulib_enabled_strtoll]) - AM_CONDITIONAL([gl_GNULIB_ENABLED_strtoull], [$gl_gnulib_enabled_strtoull]) AM_CONDITIONAL([gl_GNULIB_ENABLED_tempname], [$gl_gnulib_enabled_tempname]) AM_CONDITIONAL([gl_GNULIB_ENABLED_682e609604ccaac6be382e4ee3a4eaec], [$gl_gnulib_enabled_682e609604ccaac6be382e4ee3a4eaec]) # End of code from modules @@ -940,9 +915,6 @@ AC_DEFUN lib/strtoimax.c lib/strtol.c lib/strtoll.c - lib/strtoul.c - lib/strtoull.c - lib/strtoumax.c lib/symlink.c lib/sys_select.in.h lib/sys_stat.in.h @@ -1051,8 +1023,6 @@ AC_DEFUN m4/string_h.m4 m4/strtoimax.m4 m4/strtoll.m4 - m4/strtoull.m4 - m4/strtoumax.m4 m4/symlink.m4 m4/sys_select_h.m4 m4/sys_socket_h.m4 diff --git a/m4/strtoull.m4 b/m4/strtoull.m4 deleted file mode 100644 index c6b2150..0000000 --- a/m4/strtoull.m4 +++ /dev/null @@ -1,24 +0,0 @@ -# strtoull.m4 serial 7 -dnl Copyright (C) 2002, 2004, 2006, 2008-2017 Free Software Foundation, Inc. -dnl This file is free software; the Free Software Foundation -dnl gives unlimited permission to copy and/or distribute it, -dnl with or without modifications, as long as this notice is preserved. - -AC_DEFUN([gl_FUNC_STRTOULL], -[ - AC_REQUIRE([gl_STDLIB_H_DEFAULTS]) - dnl We don't need (and can't compile) the replacement strtoull - dnl unless the type 'unsigned long long int' exists. - AC_REQUIRE([AC_TYPE_UNSIGNED_LONG_LONG_INT]) - if test "$ac_cv_type_unsigned_long_long_int" = yes; then - AC_CHECK_FUNCS([strtoull]) - if test $ac_cv_func_strtoull = no; then - HAVE_STRTOULL=0 - fi - fi -]) - -# Prerequisites of lib/strtoull.c. -AC_DEFUN([gl_PREREQ_STRTOULL], [ - : -]) diff --git a/m4/strtoumax.m4 b/m4/strtoumax.m4 deleted file mode 100644 index 43ef5b5..0000000 --- a/m4/strtoumax.m4 +++ /dev/null @@ -1,28 +0,0 @@ -# strtoumax.m4 serial 12 -dnl Copyright (C) 2002-2004, 2006, 2009-2017 Free Software Foundation, Inc. -dnl This file is free software; the Free Software Foundation -dnl gives unlimited permission to copy and/or distribute it, -dnl with or without modifications, as long as this notice is preserved. - -AC_DEFUN([gl_FUNC_STRTOUMAX], -[ - AC_REQUIRE([gl_INTTYPES_H_DEFAULTS]) - - dnl On OSF/1 5.1 with cc, this function is declared but not defined. - AC_CHECK_FUNCS_ONCE([strtoumax]) - AC_CHECK_DECLS_ONCE([strtoumax]) - if test "$ac_cv_have_decl_strtoumax" = yes; then - if test "$ac_cv_func_strtoumax" != yes; then - # HP-UX 11.11 has "#define strtoimax(...) ..." but no function. - REPLACE_STRTOUMAX=1 - fi - else - HAVE_DECL_STRTOUMAX=0 - fi -]) - -# Prerequisites of lib/strtoumax.c. -AC_DEFUN([gl_PREREQ_STRTOUMAX], [ - AC_CHECK_DECLS([strtoull]) - AC_REQUIRE([AC_TYPE_UNSIGNED_LONG_LONG_INT]) -]) diff --git a/src/editfns.c b/src/editfns.c index 98187df..1dbae8f 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -3851,6 +3851,23 @@ usage: (propertize STRING &rest PROPERTIES) */) return string; } +/* Convert the prefix of STR from ASCII decimal digits to a number. + Set *STR_END to the address of the first non-digit. Return the + number, or PTRDIFF_MAX on overflow. Return 0 if there is no number. + This is like strtol for ptrdiff_t and base 10 and C locale, + except without negative numbers or errno. */ + +static ptrdiff_t +str2num (char *str, char **str_end) +{ + ptrdiff_t n = 0; + for (; c_isdigit (*str); str++) + if (INT_MULTIPLY_WRAPV (n, 10, &n) || INT_ADD_WRAPV (n, *str - '0', &n)) + n = PTRDIFF_MAX; + *str_end = str; + return n; +} + DEFUN ("format", Fformat, Sformat, 1, MANY, 0, doc: /* Format a string out of a format-string and arguments. The first argument is a format control string. @@ -4057,17 +4074,16 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) digits to print after the '.' for floats, or the max. number of chars to print from a string. */ - uintmax_t num; + ptrdiff_t num; char *num_end; if (c_isdigit (*format)) { - num = strtoumax (format, &num_end, 10); + num = str2num (format, &num_end); if (*num_end == '$') { if (num == 0) error ("Invalid format field number 0"); - n = min (num, PTRDIFF_MAX); - n--; + n = num - 1; format = num_end + 1; } } @@ -4095,15 +4111,15 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) space_flag &= ! plus_flag; zero_flag &= ! minus_flag; - num = strtoumax (format, &num_end, 10); + num = str2num (format, &num_end); if (max_bufsize <= num) string_overflow (); ptrdiff_t field_width = num; bool precision_given = *num_end == '.'; - uintmax_t precision = (precision_given - ? strtoumax (num_end + 1, &num_end, 10) - : UINTMAX_MAX); + ptrdiff_t precision = (precision_given + ? str2num (num_end + 1, &num_end) + : PTRDIFF_MAX); format = num_end; if (format == end) @@ -4176,7 +4192,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) /* handle case (precision[n] >= 0) */ ptrdiff_t prec = -1; - if (precision_given && precision <= TYPE_MAXIMUM (ptrdiff_t)) + if (precision_given) prec = precision; /* lisp_string_width ignores a precision of 0, but GNU @@ -4424,8 +4440,9 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) padding and excess precision. Deal with excess precision first. This happens only when the format specifies ridiculously large precision. */ - uintmax_t excess_precision = precision - prec; - uintmax_t leading_zeros = 0, trailing_zeros = 0; + ptrdiff_t excess_precision + = precision_given ? precision - prec : 0; + ptrdiff_t leading_zeros = 0, trailing_zeros = 0; if (excess_precision) { if (float_conversion) @@ -4451,7 +4468,9 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) /* Compute the total bytes needed for this item, including excess precision and padding. */ - uintmax_t numwidth = sprintf_bytes + excess_precision; + ptrdiff_t numwidth; + if (INT_ADD_WRAPV (sprintf_bytes, excess_precision, &numwidth)) + numwidth = PTRDIFF_MAX; ptrdiff_t padding = numwidth < field_width ? field_width - numwidth : 0; if (max_bufsize - sprintf_bytes <= excess_precision diff --git a/src/lread.c b/src/lread.c index 368b86e..f849398 100644 --- a/src/lread.c +++ b/src/lread.c @@ -3495,25 +3495,18 @@ substitute_in_interval (INTERVAL interval, Lisp_Object arg) } \f -#define LEAD_INT 1 -#define DOT_CHAR 2 -#define TRAIL_INT 4 -#define E_EXP 16 - - -/* Convert STRING to a number, assuming base BASE. Return a fixnum if CP has - integer syntax and fits in a fixnum, else return the nearest float if CP has - either floating point or integer syntax and BASE is 10, else return nil. If - IGNORE_TRAILING, consider just the longest prefix of CP that has - valid floating point syntax. Signal an overflow if BASE is not 10 and the - number has integer syntax but does not fit. */ +/* Convert STRING to a number, assuming base BASE. Return a fixnum if + STRING has integer syntax and fits in a fixnum, else return the + nearest float if STRING has either floating point or integer syntax + and BASE is 10, else return nil. If IGNORE_TRAILING, consider just + the longest prefix of STRING that has valid floating point syntax. + Signal an overflow if BASE is not 10 and the number has integer + syntax but does not fit. */ Lisp_Object string_to_number (char const *string, int base, bool ignore_trailing) { - int state; char const *cp = string; - int leading_digit; bool float_syntax = 0; double value = 0; @@ -3525,15 +3518,23 @@ string_to_number (char const *string, int base, bool ignore_trailing) bool signedp = negative || *cp == '+'; cp += signedp; - state = 0; - - leading_digit = digit_to_number (*cp, base); + enum { INTOVERFLOW = 1, LEAD_INT = 2, DOT_CHAR = 4, TRAIL_INT = 8, + E_EXP = 16 }; + int state = 0; + int leading_digit = digit_to_number (*cp, base); + uintmax_t n = leading_digit; if (leading_digit >= 0) { state |= LEAD_INT; - do - ++cp; - while (digit_to_number (*cp, base) >= 0); + for (int digit; 0 <= (digit = digit_to_number (*++cp, base)); ) + { + if (INT_MULTIPLY_OVERFLOW (n, base)) + state |= INTOVERFLOW; + n *= base; + if (INT_ADD_OVERFLOW (n, digit)) + state |= INTOVERFLOW; + n += digit; + } } if (*cp == '.') { @@ -3583,32 +3584,22 @@ string_to_number (char const *string, int base, bool ignore_trailing) } float_syntax = ((state & (DOT_CHAR|TRAIL_INT)) == (DOT_CHAR|TRAIL_INT) - || state == (LEAD_INT|E_EXP)); + || (state & ~INTOVERFLOW) == (LEAD_INT|E_EXP)); } /* Return nil if the number uses invalid syntax. If IGNORE_TRAILING, accept any prefix that matches. Otherwise, the entire string must match. */ if (! (ignore_trailing ? ((state & LEAD_INT) != 0 || float_syntax) - : (!*cp && ((state & ~DOT_CHAR) == LEAD_INT || float_syntax)))) + : (!*cp && ((state & ~(INTOVERFLOW | DOT_CHAR)) == LEAD_INT + || float_syntax)))) return Qnil; /* If the number uses integer and not float syntax, and is in C-language range, use its value, preferably as a fixnum. */ if (leading_digit >= 0 && ! float_syntax) { - uintmax_t n; - - /* Fast special case for single-digit integers. This also avoids a - glitch when BASE is 16 and IGNORE_TRAILING, because in that - case some versions of strtoumax accept numbers like "0x1" that Emacs - does not allow. */ - if (digit_to_number (string[signedp + 1], base) < 0) - return make_number (negative ? -leading_digit : leading_digit); - - errno = 0; - n = strtoumax (string + signedp, NULL, base); - if (errno == ERANGE) + if (state & INTOVERFLOW) { /* Unfortunately there's no simple and accurate way to convert non-base-10 numbers that are out of C-language range. */ -- 2.9.4 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Limit-format-fields-to-more-POSIX-like-spec.patch --] [-- Type: text/x-patch; name="0002-Limit-format-fields-to-more-POSIX-like-spec.patch", Size: 5754 bytes --] From 4c9bbc524171fa1022e0cbf4516827be98ae08a5 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 1 Jun 2017 16:03:12 -0700 Subject: [PATCH 2/2] Limit format fields to more POSIX-like spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * doc/lispref/strings.texi (Formatting Strings): Don’t allow mixing numbered with unnumbered format specs. * src/editfns.c (styled_format): Don’t bother checking for field 0, since it doesn’t crash and the behavior is not specified. * test/src/editfns-tests.el (format-with-field): Adjust tests to match current doc. Add more tests for out-of-range fields. --- doc/lispref/strings.texi | 13 +++++-------- src/editfns.c | 8 ++++---- test/src/editfns-tests.el | 28 ++++++++++++++++++++++------ 3 files changed, 31 insertions(+), 18 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index 4d33e55..e80e778 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -965,16 +965,13 @@ Formatting Strings decimal number immediately after the initial @samp{%}, followed by a literal dollar sign @samp{$}. It causes the format specification to convert the argument with the given number instead of the next -argument. Argument 1 is the argument just after the format. - - You can mix specifications with and without field numbers. A -specification without a field number that follows a specification with -a field number will convert the argument after the one specified by -the field number: +argument. Field numbers start at 1. A format can contain either +numbered or unnumbered format specifications but not both, except that +@samp{%%} can be mixed with numbered specifications. @example -(format "Argument %2$s, then %s, then %1$s" "x" "y" "z") - @result{} "Argument y, then z, then x" +(format "%2$s, %3$s, %%, %1$s" "x" "y" "z") + @result{} "y, z, %, x" @end example @cindex flags in format specifications diff --git a/src/editfns.c b/src/editfns.c index 1dbae8f..29af25a 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -3900,8 +3900,10 @@ where field is [0-9]+ followed by a literal dollar "$", flags is [+ #-0]+, width is [0-9]+, and precision is a literal period "." followed by [0-9]+. -If field is given, it must be a one-based argument number; the given -argument is substituted instead of the next one. +If a %-sequence is numbered with a field with positive value N, the +Nth argument is substituted instead of the next one. A format can +contain either numbered or unnumbered %-sequences but not both, except +that %% can be mixed with numbered %-sequences. The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only @@ -4081,8 +4083,6 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) num = str2num (format, &num_end); if (*num_end == '$') { - if (num == 0) - error ("Invalid format field number 0"); n = num - 1; format = num_end + 1; } diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el index c5923aa..54fb743 100644 --- a/test/src/editfns-tests.el +++ b/test/src/editfns-tests.el @@ -178,17 +178,33 @@ transpose-test-get-byte-positions (concat (make-string 2048 ?X) "0"))))) (ert-deftest format-with-field () - (should (equal (format "First argument %2$s, then %s, then %1$s" 1 2 3) + (should (equal (format "First argument %2$s, then %3$s, then %1$s" 1 2 3) "First argument 2, then 3, then 1")) - (should (equal (format "a %2$s %d %1$d %2$S %d %d b" 11 "22" 33 44) + (should (equal (format "a %2$s %3$d %1$d %2$S %3$d %4$d b" 11 "22" 33 44) "a 22 33 11 \"22\" 33 44 b")) - (should (equal (format "a %08$s %s b" 1 2 3 4 5 6 7 8 9) "a 8 9 b")) + (should (equal (format "a %08$s %0000000000000000009$s b" 1 2 3 4 5 6 7 8 9) + "a 8 9 b")) (should (equal (should-error (format "a %999999$s b" 11)) '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %2147483647$s b")) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %9223372036854775807$s b")) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %9223372036854775808$s b")) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %18446744073709551615$s b")) + '(error "Not enough arguments for format string"))) + (should (equal (should-error (format "a %18446744073709551616$s b")) + '(error "Not enough arguments for format string"))) + (should (equal (should-error + (format (format "a %%%d$d b" most-positive-fixnum))) + '(error "Not enough arguments for format string"))) + (should (equal (should-error + (format (format "a %%%d$d b" (+ 1.0 most-positive-fixnum)))) + '(error "Not enough arguments for format string"))) (should (equal (should-error (format "a %$s b" 11)) '(error "Invalid format operation %$"))) - (should (equal (should-error (format "a %0$s b" 11)) - '(error "Invalid format field number 0"))) - (should (equal (format "a %1$% %s b" 11) "a % 11 b"))) + (should (equal (should-error (format "a %-1$s b" 11)) + '(error "Invalid format operation %$")))) ;;; editfns-tests.el ends here -- 2.9.4 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-01 23:20 ` Paul Eggert @ 2017-06-02 6:52 ` Philipp Stephani 2017-06-03 8:37 ` Paul Eggert 0 siblings, 1 reply; 40+ messages in thread From: Philipp Stephani @ 2017-06-02 6:52 UTC (permalink / raw) To: Paul Eggert, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 740 bytes --] Paul Eggert <eggert@cs.ucla.edu> schrieb am Fr., 2. Juni 2017 um 01:20 Uhr: > On 06/01/2017 01:17 AM, Philipp Stephani wrote: > > > > - Probably there's a bug lurking because the info[n] ought to be > > indexed by specification index, not argument index. Something like > > (format "%1$c %1$d" ?a) will probably do the wrong thing (untested). > > Sorry, I'm not following. That call returns "a 97"; isn't that the > expected result? > Wrong example, try (format "%1$c %1$s" ?±) > And on further thought, the tradition for Emacs is to > document supported behavior and not worry about slowing Emacs down to > check for undocumented usage Would be great to break that tradition, but that's for another discussion. [-- Attachment #2: Type: text/html, Size: 1227 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-02 6:52 ` Philipp Stephani @ 2017-06-03 8:37 ` Paul Eggert 2017-06-03 9:12 ` Andreas Schwab 2017-06-03 9:34 ` Philipp Stephani 0 siblings, 2 replies; 40+ messages in thread From: Paul Eggert @ 2017-06-03 8:37 UTC (permalink / raw) To: Philipp Stephani, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 526 bytes --] Philipp Stephani wrote: > Wrong example, try (format "%1$c %1$s" ?±) Ouch. Fixing that (without adversely affecting performance) would be a bit of a hassle. Not sure that it's worth it. For now let's just document the limitation. I installed the attached. >> And on further thought, the tradition for Emacs is to >> document supported behavior and not worry about slowing Emacs down to >> check for undocumented usage > > Would be great to break that tradition, but that's for another discussion. Indeed. [-- Attachment #2: 0001-Document-uniqueness-limitation-of-format.txt --] [-- Type: text/plain, Size: 2443 bytes --] From 0147cdd4d96f1eaeef720ee0b89bddd27eaf4233 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sat, 3 Jun 2017 01:31:04 -0700 Subject: [PATCH] =?UTF-8?q?Document=20uniqueness=20limitation=20of=20?= =?UTF-8?q?=E2=80=98format=E2=80=99?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * doc/lispref/strings.texi (Formatting Strings): * src/editfns.c (Fformat): Document that field numbers should be unique within a format. --- doc/lispref/strings.texi | 7 ++++--- src/editfns.c | 7 ++++--- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index e80e778..f365c80 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -965,9 +965,10 @@ Formatting Strings decimal number immediately after the initial @samp{%}, followed by a literal dollar sign @samp{$}. It causes the format specification to convert the argument with the given number instead of the next -argument. Field numbers start at 1. A format can contain either -numbered or unnumbered format specifications but not both, except that -@samp{%%} can be mixed with numbered specifications. +argument. Field numbers start at 1. A field number should differ +from the other field numbers in the same format. A format can contain +either numbered or unnumbered format specifications but not both, +except that @samp{%%} can be mixed with numbered specifications. @example (format "%2$s, %3$s, %%, %1$s" "x" "y" "z") diff --git a/src/editfns.c b/src/editfns.c index 29af25a..a5088b0 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -3901,9 +3901,10 @@ where field is [0-9]+ followed by a literal dollar "$", flags is followed by [0-9]+. If a %-sequence is numbered with a field with positive value N, the -Nth argument is substituted instead of the next one. A format can -contain either numbered or unnumbered %-sequences but not both, except -that %% can be mixed with numbered %-sequences. +Nth argument is substituted instead of the next one. A field number +should differ from the other field numbers in the same format. A +format can contain either numbered or unnumbered %-sequences but not +both, except that %% can be mixed with numbered %-sequences. The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only -- 2.7.4 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-03 8:37 ` Paul Eggert @ 2017-06-03 9:12 ` Andreas Schwab 2017-06-03 9:34 ` Philipp Stephani 1 sibling, 0 replies; 40+ messages in thread From: Andreas Schwab @ 2017-06-03 9:12 UTC (permalink / raw) To: Paul Eggert; +Cc: Philipp Stephani, Jean-Christophe Helary, emacs-devel On Jun 03 2017, Paul Eggert <eggert@cs.ucla.edu> wrote: > diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi > index e80e778..f365c80 100644 > --- a/doc/lispref/strings.texi > +++ b/doc/lispref/strings.texi > @@ -965,9 +965,10 @@ Formatting Strings > decimal number immediately after the initial @samp{%}, followed by a > literal dollar sign @samp{$}. It causes the format specification to > convert the argument with the given number instead of the next > -argument. Field numbers start at 1. A format can contain either > -numbered or unnumbered format specifications but not both, except that > -@samp{%%} can be mixed with numbered specifications. > +argument. Field numbers start at 1. A field number should differ > +from the other field numbers in the same format. printf allows using an argument multiple times, Emacs should do as well. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-03 8:37 ` Paul Eggert 2017-06-03 9:12 ` Andreas Schwab @ 2017-06-03 9:34 ` Philipp Stephani 2017-06-04 15:54 ` Paul Eggert 1 sibling, 1 reply; 40+ messages in thread From: Philipp Stephani @ 2017-06-03 9:34 UTC (permalink / raw) To: Paul Eggert, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 572 bytes --] Paul Eggert <eggert@cs.ucla.edu> schrieb am Sa., 3. Juni 2017 um 10:37 Uhr: > Philipp Stephani wrote: > > Wrong example, try (format "%1$c %1$s" ?±) > > Ouch. Fixing that (without adversely affecting performance) would be a bit > of a > hassle. Not sure that it's worth it. For now let's just document the > limitation. > I installed the attached. > It's not hard to fix, just the info array needs to be indexed by spec index, no argument. Installed 7d413cb4da89e0bdd70068e6a5e1dbc57190ed10; 0147cdd4d96f1eaeef720ee0b89bddd27eaf4233 can now be reverted. [-- Attachment #2: Type: text/html, Size: 890 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-03 9:34 ` Philipp Stephani @ 2017-06-04 15:54 ` Paul Eggert 2017-06-04 16:45 ` Eli Zaretskii 2017-12-03 5:43 ` Paul Eggert 0 siblings, 2 replies; 40+ messages in thread From: Paul Eggert @ 2017-06-04 15:54 UTC (permalink / raw) To: Philipp Stephani, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 878 bytes --] Philipp Stephani wrote: > It's not hard to fix, Thanks. However, on my platform those fixes slowed down a microbenchmark (format "%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which recovered the performance loss for me. It avoids the need for the extra pass through the format string and it caches more quantities in registers. While I was at it I changed the doc to agree with POSIX that %% should not have modifiers (not that we enforce this). One little thing that has tripped me up in the past. In Emacs the preferred style is typically to break assignments and initializations before the "=", not after. Like this: int some_long_name = some_long_expression; I myself prefer the style you used, as it's more column-efficient and it simplifies text searches for assignments to a variable, but there it is. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Tune-format-after-recent-fix.patch --] [-- Type: text/x-patch; name="0001-Tune-format-after-recent-fix.patch", Size: 17123 bytes --] From d5fcf9e458053af1c50f0d4dad4c59db056790e5 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sun, 4 Jun 2017 08:39:37 -0700 Subject: [PATCH] =?UTF-8?q?Tune=20=E2=80=98format=E2=80=99=20after=20recen?= =?UTF-8?q?t=20fix?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * doc/lispref/strings.texi (Formatting Strings): * src/editfns.c (Fformat): Format field numbers no longer need to be unique, reverting the previous doc change since that has now been fixed. Also, document that %% should not have modifiers. * src/editfns.c (styled_format): Improve performance. Remove the need for the new prepass over the format string, by using a typically-more-generous bound for the info array size. Initialize the info array lazily. Move string inspection to the same area to help caching. Avoid the need for a converted_to_string bitfield by using EQ. Cache arg in a local and avoid some potential aliasing issues to help the compiler. Info array is now 0-origin, not 1-origin. --- doc/lispref/strings.texi | 13 ++-- src/editfns.c | 154 ++++++++++++++++++++--------------------------- 2 files changed, 72 insertions(+), 95 deletions(-) diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi index f365c80..23961f9 100644 --- a/doc/lispref/strings.texi +++ b/doc/lispref/strings.texi @@ -926,7 +926,8 @@ Formatting Strings @item %% Replace the specification with a single @samp{%}. This format -specification is unusual in that it does not use a value. For example, +specification is unusual in that its only form is plain +@samp{%%} and that it does not use a value. For example, @code{(format "%% %d" 30)} returns @code{"% 30"}. @end table @@ -965,10 +966,9 @@ Formatting Strings decimal number immediately after the initial @samp{%}, followed by a literal dollar sign @samp{$}. It causes the format specification to convert the argument with the given number instead of the next -argument. Field numbers start at 1. A field number should differ -from the other field numbers in the same format. A format can contain -either numbered or unnumbered format specifications but not both, -except that @samp{%%} can be mixed with numbered specifications. +argument. Field numbers start at 1. A format can contain either +numbered or unnumbered format specifications but not both, except that +@samp{%%} can be mixed with numbered specifications. @example (format "%2$s, %3$s, %%, %1$s" "x" "y" "z") @@ -1026,8 +1026,7 @@ Formatting Strings A specification can have a @dfn{width}, which is a decimal number that appears after any field number and flags. If the printed representation of the object contains fewer characters than this -width, @code{format} extends it with padding. The width is -ignored for the @samp{%%} specification. Any padding introduced by +width, @code{format} extends it with padding. Any padding introduced by the width normally consists of spaces inserted on the left: @example diff --git a/src/editfns.c b/src/editfns.c index 56aa8ce..43b17f9 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -3891,8 +3891,8 @@ the next available argument, or the argument explicitly specified: The argument used for %d, %o, %x, %e, %f, %g or %c must be a number. Use %% to put a single % into the output. -A %-sequence may contain optional field number, flag, width, and -precision specifiers, as follows: +A %-sequence other than %% may contain optional field number, flag, +width, and precision specifiers, as follows: %<field><flags><width><precision>character @@ -3901,10 +3901,9 @@ where field is [0-9]+ followed by a literal dollar "$", flags is followed by [0-9]+. If a %-sequence is numbered with a field with positive value N, the -Nth argument is substituted instead of the next one. A field number -should differ from the other field numbers in the same format. A -format can contain either numbered or unnumbered %-sequences but not -both, except that %% can be mixed with numbered %-sequences. +Nth argument is substituted instead of the next one. A format can +contain either numbered or unnumbered %-sequences but not both, except +that %% can be mixed with numbered %-sequences. The + flag character inserts a + before any positive number, while a space inserts a space before any positive number; these flags only @@ -3980,49 +3979,40 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) bool arg_intervals = false; USE_SAFE_ALLOCA; - /* Each element records, for one field, - the corresponding argument, - the start and end bytepos in the output string, - whether the argument has been converted to string (e.g., due to "%S"), - and whether the argument is a string with intervals. */ + /* Information recorded for each format spec. */ struct info { + /* The corresponding argument, converted to string if conversion + was needed. */ Lisp_Object argument; + + /* The start and end bytepos in the output string. */ ptrdiff_t start, end; - bool_bf converted_to_string : 1; + + /* Whether the argument is a string with intervals. */ bool_bf intervals : 1; } *info; CHECK_STRING (args[0]); char *format_start = SSDATA (args[0]); + bool multibyte_format = STRING_MULTIBYTE (args[0]); ptrdiff_t formatlen = SBYTES (args[0]); - /* The number of percent characters is a safe upper bound for the - number of format fields. */ - ptrdiff_t num_percent = 0; - for (ptrdiff_t i = 0; i < formatlen; ++i) - if (format_start[i] == '%') - ++num_percent; + /* Upper bound on number of format specs. Each uses at least 2 chars. */ + ptrdiff_t nspec_bound = SCHARS (args[0]) >> 1; /* Allocate the info and discarded tables. */ ptrdiff_t alloca_size; - if (INT_MULTIPLY_WRAPV (num_percent, sizeof *info, &alloca_size) - || INT_ADD_WRAPV (sizeof *info, alloca_size, &alloca_size) + if (INT_MULTIPLY_WRAPV (nspec_bound, sizeof *info, &alloca_size) || INT_ADD_WRAPV (formatlen, alloca_size, &alloca_size) || SIZE_MAX < alloca_size) memory_full (SIZE_MAX); - /* info[0] is unused. Unused elements have -1 for start. */ info = SAFE_ALLOCA (alloca_size); - memset (info, 0, alloca_size); - for (ptrdiff_t i = 0; i < num_percent + 1; i++) - { - info[i].argument = Qunbound; - info[i].start = -1; - } /* discarded[I] is 1 if byte I of the format string was not copied into the output. It is 2 if byte I was not the first byte of its character. */ - char *discarded = (char *) &info[num_percent + 1]; + char *discarded = (char *) &info[nspec_bound]; + memset (discarded, 0, formatlen); /* Try to determine whether the result should be multibyte. This is not always right; sometimes the result needs to be multibyte @@ -4030,8 +4020,6 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) or because a grave accent or apostrophe is requoted, and in that case, we won't know it here. */ - /* True if the format is multibyte. */ - bool multibyte_format = STRING_MULTIBYTE (args[0]); /* True if the output should be a multibyte string, which is true if any of the inputs is one. */ bool multibyte = multibyte_format; @@ -4042,6 +4030,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) int quoting_style = message ? text_quoting_style () : -1; ptrdiff_t ispec; + ptrdiff_t nspec = 0; /* If we start out planning a unibyte result, then discover it has to be multibyte, we jump back to retry. */ @@ -4155,11 +4144,14 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) if (! (n < nargs)) error ("Not enough arguments for format string"); - eassert (ispec < num_percent); - ++ispec; - - if (EQ (info[ispec].argument, Qunbound)) - info[ispec].argument = args[n]; + struct info *spec = &info[ispec++]; + if (nspec < ispec) + { + spec->argument = args[n]; + spec->intervals = false; + nspec = ispec; + } + Lisp_Object arg = spec->argument; /* For 'S', prin1 the argument, and then treat like 's'. For 's', princ any argument that is not a string or @@ -4167,16 +4159,13 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) happen after retrying. */ if ((conversion == 'S' || (conversion == 's' - && ! STRINGP (info[ispec].argument) - && ! SYMBOLP (info[ispec].argument)))) + && ! STRINGP (arg) && ! SYMBOLP (arg)))) { - if (! info[ispec].converted_to_string) + if (EQ (arg, args[n])) { Lisp_Object noescape = conversion == 'S' ? Qnil : Qt; - info[ispec].argument = - Fprin1_to_string (info[ispec].argument, noescape); - info[ispec].converted_to_string = true; - if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte) + spec->argument = arg = Fprin1_to_string (arg, noescape); + if (STRING_MULTIBYTE (arg) && ! multibyte) { multibyte = true; goto retry; @@ -4186,29 +4175,25 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) } else if (conversion == 'c') { - if (INTEGERP (info[ispec].argument) - && ! ASCII_CHAR_P (XINT (info[ispec].argument))) + if (INTEGERP (arg) && ! ASCII_CHAR_P (XINT (arg))) { if (!multibyte) { multibyte = true; goto retry; } - info[ispec].argument = - Fchar_to_string (info[ispec].argument); - info[ispec].converted_to_string = true; + spec->argument = arg = Fchar_to_string (arg); } - if (info[ispec].converted_to_string) + if (!EQ (arg, args[n])) conversion = 's'; zero_flag = false; } - if (SYMBOLP (info[ispec].argument)) + if (SYMBOLP (arg)) { - info[ispec].argument = - SYMBOL_NAME (info[ispec].argument); - if (STRING_MULTIBYTE (info[ispec].argument) && ! multibyte) + spec->argument = arg = SYMBOL_NAME (arg); + if (STRING_MULTIBYTE (arg) && ! multibyte) { multibyte = true; goto retry; @@ -4239,12 +4224,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) else { ptrdiff_t nch, nby; - width = lisp_string_width (info[ispec].argument, - prec, &nch, &nby); + width = lisp_string_width (arg, prec, &nch, &nby); if (prec < 0) { - nchars_string = SCHARS (info[ispec].argument); - nbytes = SBYTES (info[ispec].argument); + nchars_string = SCHARS (arg); + nbytes = SBYTES (arg); } else { @@ -4254,11 +4238,8 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) } convbytes = nbytes; - if (convbytes && multibyte && - ! STRING_MULTIBYTE (info[ispec].argument)) - convbytes = - count_size_as_multibyte (SDATA (info[ispec].argument), - nbytes); + if (convbytes && multibyte && ! STRING_MULTIBYTE (arg)) + convbytes = count_size_as_multibyte (SDATA (arg), nbytes); ptrdiff_t padding = width < field_width ? field_width - width : 0; @@ -4274,20 +4255,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) p += padding; nchars += padding; } - info[ispec].start = nchars; + spec->start = nchars; if (p > buf && multibyte && !ASCII_CHAR_P (*((unsigned char *) p - 1)) - && STRING_MULTIBYTE (info[ispec].argument) - && !CHAR_HEAD_P (SREF (info[ispec].argument, 0))) + && STRING_MULTIBYTE (arg) + && !CHAR_HEAD_P (SREF (arg, 0))) maybe_combine_byte = true; - p += copy_text (SDATA (info[ispec].argument), - (unsigned char *) p, + p += copy_text (SDATA (arg), (unsigned char *) p, nbytes, - STRING_MULTIBYTE (info[ispec].argument), - multibyte); + STRING_MULTIBYTE (arg), multibyte); nchars += nchars_string; @@ -4297,12 +4276,12 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) p += padding; nchars += padding; } - info[ispec].end = nchars; + spec->end = nchars; /* If this argument has text properties, record where in the result string it appears. */ - if (string_intervals (info[ispec].argument)) - info[ispec].intervals = arg_intervals = true; + if (string_intervals (arg)) + spec->intervals = arg_intervals = true; continue; } @@ -4313,8 +4292,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) || conversion == 'X')) error ("Invalid format operation %%%c", STRING_CHAR ((unsigned char *) format - 1)); - else if (! (INTEGERP (info[ispec].argument) - || (FLOATP (info[ispec].argument) && conversion != 'c'))) + else if (! (INTEGERP (arg) || (FLOATP (arg) && conversion != 'c'))) error ("Format specifier doesn't match argument type"); else { @@ -4376,7 +4354,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) if (INT_AS_LDBL) { *f = 'L'; - f += INTEGERP (info[ispec].argument); + f += INTEGERP (arg); } } else if (conversion != 'c') @@ -4408,22 +4386,22 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) ptrdiff_t sprintf_bytes; if (float_conversion) { - if (INT_AS_LDBL && INTEGERP (info[ispec].argument)) + if (INT_AS_LDBL && INTEGERP (arg)) { /* Although long double may have a rounding error if DIG_BITS_LBOUND * LDBL_MANT_DIG < FIXNUM_BITS - 1, it is more accurate than plain 'double'. */ - long double x = XINT (info[ispec].argument); + long double x = XINT (arg); sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x); } else sprintf_bytes = sprintf (sprintf_buf, convspec, prec, - XFLOATINT (info[ispec].argument)); + XFLOATINT (arg)); } else if (conversion == 'c') { /* Don't use sprintf here, as it might mishandle prec. */ - sprintf_buf[0] = XINT (info[ispec].argument); + sprintf_buf[0] = XINT (arg); sprintf_bytes = prec != 0; } else if (conversion == 'd' || conversion == 'i') @@ -4432,11 +4410,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) instead so it also works for values outside the integer range. */ printmax_t x; - if (INTEGERP (info[ispec].argument)) - x = XINT (info[ispec].argument); + if (INTEGERP (arg)) + x = XINT (arg); else { - double d = XFLOAT_DATA (info[ispec].argument); + double d = XFLOAT_DATA (arg); if (d < 0) { x = TYPE_MINIMUM (printmax_t); @@ -4456,11 +4434,11 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) { /* Don't sign-extend for octal or hex printing. */ uprintmax_t x; - if (INTEGERP (info[ispec].argument)) - x = XUINT (info[ispec].argument); + if (INTEGERP (arg)) + x = XUINT (arg); else { - double d = XFLOAT_DATA (info[ispec].argument); + double d = XFLOAT_DATA (arg); if (d < 0) x = 0; else @@ -4541,7 +4519,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) exponent_bytes = src + sprintf_bytes - e; } - info[ispec].start = nchars; + spec->start = nchars; if (! minus_flag) { memset (p, ' ', padding); @@ -4572,7 +4550,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) p += padding; nchars += padding; } - info[ispec].end = nchars; + spec->end = nchars; continue; } @@ -4681,7 +4659,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) if (CONSP (props)) { ptrdiff_t bytepos = 0, position = 0, translated = 0; - ptrdiff_t fieldn = 1; + ptrdiff_t fieldn = 0; /* Adjust the bounds of each text property to the proper start and end in the output string. */ @@ -4747,7 +4725,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) /* Add text properties from arguments. */ if (arg_intervals) - for (ptrdiff_t i = 1; i <= num_percent; i++) + for (ptrdiff_t i = 0; i < nspec; i++) if (info[i].intervals) { len = make_number (SCHARS (info[i].argument)); -- 2.7.4 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-04 15:54 ` Paul Eggert @ 2017-06-04 16:45 ` Eli Zaretskii 2017-06-04 18:37 ` Paul Eggert 2017-12-03 5:43 ` Paul Eggert 1 sibling, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2017-06-04 16:45 UTC (permalink / raw) To: Paul Eggert; +Cc: p.stephani2, jean.christophe.helary, emacs-devel > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sun, 4 Jun 2017 08:54:31 -0700 > > However, on my platform those fixes slowed down a microbenchmark (format > "%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which > recovered the performance loss for me. Would it make sense to start collecting all those benchmarks, e.g. somewhere under test/, so that in due time we will have a sound body of code to make sure we don't lose performance during development? TIA ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-04 16:45 ` Eli Zaretskii @ 2017-06-04 18:37 ` Paul Eggert 0 siblings, 0 replies; 40+ messages in thread From: Paul Eggert @ 2017-06-04 18:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: p.stephani2, jean.christophe.helary, emacs-devel Eli Zaretskii wrote: > Would it make sense to start collecting all those benchmarks, e.g. > somewhere under test/, so that in due time we will have a sound > body of code I wouldn't collect microbenchmarks just for the sake of collecting them. They're flaky, and the overhead of collecting them and maintaining the collection is likely greater than any future benefit of having them around. If we want to have a performance testsuite our main problem is building and supporting its infrastructure, not coming up with examples. If you're interested despite the above disclaimer, here's what I used: (defun bench-with (n) (let ((start (float-time (get-internal-run-time))) (i 0)) (while (< i n) (format "%d %s %c" 3 "def" ?‘) (setq i (1+ i))) (- (float-time (get-internal-run-time)) start))) (defun bench-without (n) (let ((start (float-time (get-internal-run-time))) (i 0)) (while (< i n) (setq i (1+ i))) (- (float-time (get-internal-run-time)) start))) (defun bench (n) (- (bench-with n) (bench-without n))) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-06-04 15:54 ` Paul Eggert 2017-06-04 16:45 ` Eli Zaretskii @ 2017-12-03 5:43 ` Paul Eggert 1 sibling, 0 replies; 40+ messages in thread From: Paul Eggert @ 2017-12-03 5:43 UTC (permalink / raw) To: Philipp Stephani, Jean-Christophe Helary, emacs-devel [-- Attachment #1: Type: text/plain, Size: 506 bytes --] I wrote on 2017-06-04: > Philipp Stephani wrote: >> It's not hard to fix, > > Thanks. However, on my platform those fixes slowed down a microbenchmark (format > "%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which > recovered the performance loss for me. Oops, that introduced an off-by-one error that caused Emacs to access one past the end of the 'info' array. To fix this I installed the attached further patch into emacs-26 and merged emacs-26 into master. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Fix-bug-in-i18n-l10n-optimization.patch --] [-- Type: text/x-patch; name="0001-Fix-bug-in-i18n-l10n-optimization.patch", Size: 1429 bytes --] From 04e5b28ff1691345e023a944dc6a6a9e9573bd07 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sat, 2 Dec 2017 21:31:24 -0800 Subject: [PATCH] Fix bug in i18n/l10n optimization This fixes a off-by-one buffer overrun bug introduced in 2017-06-04T15:39:37Z!eggert@cs.ucla.edu. Problem uncovered by an experimental version of Emacs built with -fcheck-pointer-bounds and running on Intel MPX hardware. * src/editfns.c (styled_format): Avoid overrunning internal buffers. --- src/editfns.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/editfns.c b/src/editfns.c index f275f33..f7c26b9 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -4919,7 +4919,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) else if (discarded[bytepos] == 1) { position++; - if (translated == info[fieldn].start) + if (fieldn < nspec && translated == info[fieldn].start) { translated += info[fieldn].end - info[fieldn].start; fieldn++; @@ -4939,7 +4939,7 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message) else if (discarded[bytepos] == 1) { position++; - if (translated == info[fieldn].start) + if (fieldn < nspec && translated == info[fieldn].start) { translated += info[fieldn].end - info[fieldn].start; fieldn++; -- 2.7.4 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-31 22:18 ` Philipp Stephani 2017-05-31 22:29 ` Jean-Christophe Helary 2017-06-01 5:18 ` Paul Eggert @ 2017-06-01 14:21 ` Eli Zaretskii 2017-07-02 1:22 ` Jean-Christophe Helary 3 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2017-06-01 14:21 UTC (permalink / raw) To: Philipp Stephani; +Cc: jean.christophe.helary, emacs-devel > From: Philipp Stephani <p.stephani2@gmail.com> > Date: Wed, 31 May 2017 22:18:11 +0000 > > +@cindex field number I think this should say "field numbers in format spec" > + A specification can have a @dfn{field number}, which is a decimal > +number after the initial @samp{%}, followed by a literal dollar sign > +@samp{$}. If you provide a field numer, then the argument to be ^^^^^ A typo. Thank you for working on this valuable improvement. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-31 22:18 ` Philipp Stephani ` (2 preceding siblings ...) 2017-06-01 14:21 ` Eli Zaretskii @ 2017-07-02 1:22 ` Jean-Christophe Helary 2017-07-02 2:34 ` Eli Zaretskii 3 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2017-07-02 1:22 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2113 bytes --] I was wondering if that patch (and the fixes that came after) had been committed ? Have some people started to use it ? Jean-Christophe > On Jun 1, 2017, at 7:18, Philipp Stephani <p.stephani2@gmail.com> wrote: > > > > Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> schrieb am So., 28. Mai 2017 um 07:29 Uhr: > The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings). > > ==================================================== > On May 27, 2017, at 10:52, Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> wrote: > >> My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non default preferred langage (currently English) is detected. > > On May 27, 2017, at 16:43, Eli Zaretskii <eliz@gnu.org <mailto:eliz@gnu.org>> wrote: > >> Yes but from what I've seen in package/el, a lot of translatable texts are not displayed with "message". Some >> use "error", some use other mechanisms. > > Internally, they all boil down to a small set of C functions, which is > where we should make these changes. > ==================================================== > > Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever... > > One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch. > <0001-Implement-field-numbers-in-format-strings.txt> [-- Attachment #2: Type: text/html, Size: 3552 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-07-02 1:22 ` Jean-Christophe Helary @ 2017-07-02 2:34 ` Eli Zaretskii 0 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2017-07-02 2:34 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Sun, 2 Jul 2017 10:22:28 +0900 > > I was wondering if that patch (and the fixes that came after) had been committed ? It was. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 5:29 Jean-Christophe Helary 2017-05-28 14:27 ` Drew Adams 2017-05-31 22:18 ` Philipp Stephani @ 2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙) 2018-04-25 12:58 ` Jean-Christophe Helary 3 siblings, 0 replies; 40+ messages in thread From: Byung-Hee HWANG (황병희, 黃炳熙) @ 2017-07-28 0:15 UTC (permalink / raw) To: emacs-devel Jean-Christophe Helary <jean.christophe.helary@gmail.com> 께서 쓰시길, 《記事 全文 <AA86315D-1561-41EB-A349-63100C565E8D@gmail.com> 에서》: > The discussion so far seems to point at modifying 'message' and the > likes so that developers don't have to bother with any l10n mechanism > on their part (besides for writing clean strings). Sorry for late. Possibly we should go with *UTF-8* if this plan committed. It is just my concern. Thanks!!! -- ^고맙습니다 _救濟蒼生_ 감사합니다_^))// ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2017-05-28 5:29 Jean-Christophe Helary ` (2 preceding siblings ...) 2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙) @ 2018-04-25 12:58 ` Jean-Christophe Helary 2018-09-21 4:18 ` Jean-Christophe Helary 3 siblings, 1 reply; 40+ messages in thread From: Jean-Christophe Helary @ 2018-04-25 12:58 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 881 bytes --] > On May 28, 2017, at 14:29, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: > > The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings). Since there does not seem to be an agreement on the "straightening" of the package.el strings that I posted here (even though Eli seemed to be fine with them and nobody raised any objection), I'm going to start working on other packages and I will continue to send patches here. Eventually somebody will see why that is important and will commit my package.el modifications *or* tell me why they are not acceptable (which I'm totally fine with btw) :) Jean-Christophe Helary ----------------------------------------------- http://mac4translators.blogspot.com @brandelune [-- Attachment #2: Type: text/html, Size: 3114 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: i18n/l10n summary 2018-04-25 12:58 ` Jean-Christophe Helary @ 2018-09-21 4:18 ` Jean-Christophe Helary 0 siblings, 0 replies; 40+ messages in thread From: Jean-Christophe Helary @ 2018-09-21 4:18 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2488 bytes --] 1) packages.el I just want to remind interested parties that packages.el has seen most of its strings "straightened" so that they don't rely on English grammatical structures (s for plurals, etc.) The modifications changed strings like: "3 packages have been updated" into "Number of updated packages: 3" as well as a few other things. I'd like to thank Eli, Noam and a few others (sorry that I forgot your names) for the coding hints. 2) other packages with user facing strings There are lots of packages that have user facing strings, and most of them use code to produce English grammar structures (a simple case is if a number is greater than 1 then add an "s" to the string). Such code should be fixed to remove the reliance on English grammatical structures. Similarly, code constructs that create a given word order ( (concat "Nominal group" "Verbal group" "Nominal group", for ex) must be fixed so that they are the most "natural language" agnostic possible. It would be nice if people check the packages.el modifications and try to adapt other packages in the same spirit. 3) I heard that Xemacs had the infrastructure required to support gettext are there people here familiar with this aspect of Xemacs ? Jean-Christophe > On Apr 25, 2018, at 21:58, Jean-Christophe Helary <jean.christophe.helary@gmail.com> wrote: > > >> On May 28, 2017, at 14:29, Jean-Christophe Helary <jean.christophe.helary@gmail.com <mailto:jean.christophe.helary@gmail.com>> wrote: >> >> The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings). > > Since there does not seem to be an agreement on the "straightening" of the package.el strings that I posted here (even though Eli seemed to be fine with them and nobody raised any objection), I'm going to start working on other packages and I will continue to send patches here. > > Eventually somebody will see why that is important and will commit my package.el modifications *or* tell me why they are not acceptable (which I'm totally fine with btw) :) > > > Jean-Christophe Helary > ----------------------------------------------- > http://mac4translators.blogspot.com <http://mac4translators.blogspot.com/> @brandelune > > Jean-Christophe Helary ----------------------------------------------- http://mac4translators.blogspot.com @brandelune [-- Attachment #2: Type: text/html, Size: 6624 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2018-09-21 4:18 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <<AA86315D-1561-41EB-A349-63100C565E8D@gmail.com> [not found] ` <<0aca6c65-4610-44c2-99c4-6cbe7aa68c9a@default> [not found] ` <<83a85xgfo2.fsf@gnu.org> 2017-05-28 21:52 ` i18n/l10n summary Drew Adams 2017-05-28 5:29 Jean-Christophe Helary 2017-05-28 14:27 ` Drew Adams 2017-05-28 14:36 ` Jean-Christophe Helary 2017-05-28 15:33 ` Eli Zaretskii 2017-06-05 12:55 ` Jean-Christophe Helary 2017-07-17 23:22 ` Jean-Christophe Helary 2017-07-22 12:48 ` Jean-Christophe Helary 2017-07-22 13:06 ` Eli Zaretskii 2017-07-22 13:45 ` Jean-Christophe Helary 2017-07-22 14:08 ` Eli Zaretskii 2017-07-22 23:54 ` Jean-Christophe Helary 2017-07-23 14:39 ` Eli Zaretskii 2017-07-23 23:29 ` Jean-Christophe Helary 2017-07-24 14:47 ` Eli Zaretskii 2017-07-24 15:34 ` Jean-Christophe Helary 2017-07-24 15:51 ` Eli Zaretskii 2017-07-24 16:08 ` Jean-Christophe Helary 2017-07-24 16:29 ` Eli Zaretskii 2017-07-24 16:48 ` Jean-Christophe Helary 2017-07-24 16:55 ` Eli Zaretskii 2017-05-31 22:18 ` Philipp Stephani 2017-05-31 22:29 ` Jean-Christophe Helary 2017-06-01 5:18 ` Paul Eggert 2017-06-01 8:17 ` Philipp Stephani 2017-06-01 23:20 ` Paul Eggert 2017-06-02 6:52 ` Philipp Stephani 2017-06-03 8:37 ` Paul Eggert 2017-06-03 9:12 ` Andreas Schwab 2017-06-03 9:34 ` Philipp Stephani 2017-06-04 15:54 ` Paul Eggert 2017-06-04 16:45 ` Eli Zaretskii 2017-06-04 18:37 ` Paul Eggert 2017-12-03 5:43 ` Paul Eggert 2017-06-01 14:21 ` Eli Zaretskii 2017-07-02 1:22 ` Jean-Christophe Helary 2017-07-02 2:34 ` Eli Zaretskii 2017-07-28 0:15 ` Byung-Hee HWANG (황병희, 黃炳熙) 2018-04-25 12:58 ` Jean-Christophe Helary 2018-09-21 4:18 ` Jean-Christophe Helary
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).