Re: bug#34520: delete-matching-lines should report how many lines it deleted

unofficial mirror of emacs-devel@gnu.org 

* Re: bug#34520: delete-matching-lines should report how many lines it deleted
       [not found]         ` <871s3p0zdz.fsf@mail.linkov.net>
@ 2019-03-03  3:04           ` Richard Stallman
  2019-03-03 15:31             ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-03  3:04 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

You wrote:

======================================================================
Here is an experimental but extensible implementation
that handles the case of formatting the recently added message
taking into account grammatical number of its argument:

  (defvar i18n-translations-hash (make-hash-table :test 'equal))

  (defun i18n-add-translation (_language-environment from to)
    (puthash from to i18n-translations-hash))

  (i18n-add-translation
   "English"
   "Deleted %d matching lines"
   (lambda (format-string count)
     (if (= count 1)
         "Deleted %d matching line"
         "Deleted %d matching lines")))

  (defun i18n-get-translation (format-string &rest args)
    (pcase (gethash format-string i18n-translations-hash)
      ((and (pred functionp) f) (apply f format-string args))
      ((and (pred stringp) s) s)
      (_ format-string)))

  (advice-add 'message :around
              (lambda (orig-fun format-string &rest args)
                (apply orig-fun (apply 'i18n-get-translation format-string args) args))
              '((name . message-i18n)))
======================================================================

It seems pretty good.  When installing it, it should not use
`advice-add'.  Rather, `message' should call a list of functions.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
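RMS's suggestion is that `message` should call a list of functions instead of being advised. That amounts to a chain of translators tried in order until one of them answers. The following is only an illustrative sketch. It is written in C, the language of the editfns.c change in this thread, rather than in Lisp. `translator_fn`, `english_number_fixer` and `translate_format` are hypothetical names, not existing Emacs functions, and a real implementation would work on Lisp objects:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical translator signature: given the original format string
   and the count argument, return a replacement format string, or NULL
   if this translator has nothing to offer for FORMAT.  */
typedef const char *(*translator_fn) (const char *format, long count);

/* Example translator: fixes English number agreement for one message.  */
static const char *
english_number_fixer (const char *format, long count)
{
  if (strcmp (format, "Deleted %d matching lines") == 0)
    return count == 1 ? "Deleted %d matching line"
                      : "Deleted %d matching lines";
  return NULL;
}

/* Try each of the N translators in CHAIN in order; the first non-NULL
   result wins.  If none applies, FORMAT is returned unchanged.  */
static const char *
translate_format (const translator_fn *chain, size_t n,
                  const char *format, long count)
{
  assert (format != NULL);
  for (size_t i = 0; i < n; i++)
    {
      const char *result = chain[i] (format, count);
      if (result != NULL)
        return result;
    }
  return format;
}
```

On the Lisp side, the same first-non-nil-wins behaviour is what `run-hook-with-args-until-success` provides for an abnormal hook.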





^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-03  3:04           ` bug#34520: delete-matching-lines should report how many lines it deleted Richard Stallman
@ 2019-03-03 15:31             ` Eli Zaretskii
  2019-03-03 20:57               ` Emacs i18n Juri Linkov
  2019-03-04  3:27               ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-03 15:31 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, juri

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 02 Mar 2019 22:04:06 -0500
> Cc: emacs-devel@gnu.org
> 
>   (advice-add 'message :around
>               (lambda (orig-fun format-string &rest args)
>                 (apply orig-fun (apply 'i18n-get-translation format-string args) args))
>               '((name . message-i18n)))
> ======================================================================
> 
> It seems pretty good.  When installing it, it should not use
> `advice-add'.  Rather, `message' should call a list of functions.

This has come up several times in the past.  The main problem with
i18n in Emacs is that, unlike in many text-mode programs, 'message'
covers a tiny portion of the Emacs UI.  We have help commands that pop
up buffers; we have commands that prompt in the minibuffer; we have
menu items and labels on tool-bar buttons; we have help-echo on menus,
tool bar, the mode line, and mouse-sensitive text; we have tooltips;
etc. etc.  What's worse, most of the text shown by these features is
computed dynamically by the commands that display the text.

Any reasonably relevant i18n infrastructure for Emacs should address
at least some of the above.  For example, significant progress could
be made if we had infrastructure for translating doc strings, which
would allow translators to provide message catalogs for individual
Lisp packages.  Past discussions revealed that even this limited
progress is not really trivial.

Unfortunately, past discussions didn't lead to any significant
progress wrt this.  While making some progress would be welcome, I
suggest that we don't pretend the solution is as easy as advice around
'message', but instead try to attack the more significant parts of the
problem.

Volunteers are welcome.


* Re: Emacs i18n
  2019-03-03 15:31             ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Eli Zaretskii
@ 2019-03-03 20:57               ` Juri Linkov
  2019-03-04  1:46                 ` Jean-Christophe Helary
  2019-03-04  3:27               ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
  1 sibling, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-03 20:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, emacs-devel


>> It seems pretty good.  When installing it, it should not use
>> `advice-add'.  Rather, `message' should call a list of functions.
>
> Unfortunately, past discussions didn't lead to any significant
> progress wrt this.

My intention was to fix the bug which manifests itself in
grammatically incorrect sentences displayed by ‘message’ like

  Deleted 1 matching lines
  1 matches found
  ...

After searching for available packages I found only this page
https://savannah.nongnu.org/projects/emacs-i18n
that shows no progress for many years.

So here is a patch that fixes the bug by translating currently
invalid messages into grammatically correct English.  It also
opens the door to translating messages into many languages.
Currently this feature is activated by (require 'i18n-message):

[-- Attachment #2: i18n-message.patch --]
[-- Type: text/x-diff, Size: 10283 bytes --]

diff --git a/lisp/replace.el b/lisp/replace.el
index 59ad1a375b..b05bb51353 100644
--- a/lisp/replace.el
+++ b/lisp/replace.el
@@ -986,6 +986,12 @@ flush-lines
     (when interactive (message "Deleted %d matching lines" count))
     count))
 
+(eval-after-load "i18n-message"
+  '(i18n-add-translation "English"
+                         "Deleted %d matching lines"
+                         '("Deleted %d matching line"
+                           "Deleted %d matching lines")))
+
 (defun how-many (regexp &optional rstart rend interactive)
   "Print and return number of matches for REGEXP following point.
 When called from Lisp and INTERACTIVE is omitted or nil, just return
@@ -1032,11 +1038,15 @@ how-many
 	(if (= opoint (point))
 	    (forward-char 1)
 	  (setq count (1+ count))))
-      (when interactive (message "%d occurrence%s"
-				 count
-				 (if (= count 1) "" "s")))
+      (when interactive (message "%d occurrences" count))
       count)))
 
+(eval-after-load "i18n-message"
+  '(i18n-add-translation "English"
+                         "%d occurrences"
+                         '("%d occurrence"
+                           "%d occurrences")))
+
 \f
 (defvar occur-menu-map
   (let ((map (make-sparse-keymap)))
@@ -2730,10 +2740,7 @@ perform-replace
                                            (1+ num-replacements))))))
                              (when (and (eq def 'undo-all)
                                         (null (zerop num-replacements)))
-                               (message "Undid %d %s" num-replacements
-                                        (if (= num-replacements 1)
-                                            "replacement"
-                                          "replacements"))
+                               (message "Undid %d replacements" num-replacements)
                                (ding 'no-terminate)
                                (sit-for 1)))
 			   (setq replaced nil last-was-undo t last-was-act-and-show nil)))
@@ -2859,9 +2866,8 @@ perform-replace
                       last-was-act-and-show     nil))))))
       (replace-dehighlight))
     (or unread-command-events
-	(message "Replaced %d occurrence%s%s"
+	(message "Replaced %d occurrences%s"
 		 replace-count
-		 (if (= replace-count 1) "" "s")
 		 (if (> (+ skip-read-only-count
 			   skip-filtered-count
 			   skip-invisible-count)
@@ -2883,6 +2889,16 @@ perform-replace
 		   "")))
     (or (and keep-going stack) multi-buffer)))
 
+(eval-after-load "i18n-message"
+  '(i18n-add-translations
+    "English"
+    '(("Undid %d replacements"
+       ("Undid %d replacement"
+        "Undid %d replacements"))
+      ("Replaced %d occurrences%s"
+       ("Replaced %d occurrence%s"
+        "Replaced %d occurrences%s")))))
+
 (provide 'replace)
 
 ;;; replace.el ends here
diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 3fd2a7e701..d2d748fca3 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -459,7 +459,7 @@ grep-mode-font-lock-keywords
      ;; remove match from grep-regexp-alist before fontifying
      ("^Grep[/a-zA-z]* started.*"
       (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t))
-     ("^Grep[/a-zA-z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?matches found\\)\\|\\(no matches found\\)\\).*"
+     ("^Grep[/a-zA-z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)\\|\\(no matches found\\)\\).*"
       (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t)
       (1 compilation-info-face nil t)
       (2 compilation-warning-face nil t))
@@ -561,6 +561,12 @@ grep-exit-message
 	     (cons msg code)))
     (cons msg code)))
 
+(eval-after-load "i18n-message"
+  '(i18n-add-translation "English"
+                         "finished with %d matches found\n"
+                         '("finished with %d match found\n"
+                           "finished with %d matches found\n")))
+
 (defun grep-filter ()
   "Handle match highlighting escape sequences inserted by the grep process.
 This function is called from `compilation-filter-hook'."
diff --git a/lisp/international/i18n-message.el b/lisp/international/i18n-message.el
new file mode 100644
index 0000000000..14755966e0
--- /dev/null
+++ b/lisp/international/i18n-message.el
@@ -0,0 +1,118 @@
+;;; i18n-message.el --- internationalization of messages  -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+
+;; Author: Juri Linkov <juri@linkov.net>
+;; Maintainer: emacs-devel@gnu.org
+;; Keywords: i18n, multilingual
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;
+
+;;; Code:
+
+(defcustom i18n-fallbacks
+  '(("en" "English"))
+  "An alist mapping the current language to possible fallbacks.
+Each element should look like (\"LANG\" . FALLBACK-LIST), where
+FALLBACK-LIST is a list of languages to try to find a translation."
+  :type '(alist :key-type (string :tag "Current language")
+                :value-type (repeat :tag "A list of fallbacks" string))
+  :group 'i18n
+  :version "27.1")
+
+(defvar i18n-dictionaries (make-hash-table :test 'equal))
+
+(defun i18n-add-dictionary (lang)
+  (unless (gethash lang i18n-dictionaries)
+    (puthash lang (make-hash-table :test 'equal) i18n-dictionaries)))
+
+;;;###autoload
+(defun i18n-add-translation (lang from to)
+  (let ((dict (gethash lang i18n-dictionaries)))
+    (unless dict
+      (setq dict (i18n-add-dictionary lang)))
+    (puthash from to dict)))
+
+;;;###autoload
+(defun i18n-add-translations (lang translations)
+  (dolist (translation translations)
+    (i18n-add-translation lang (nth 0 translation) (nth 1 translation))))
+
+(defun i18n-get-plural (lang n)
+  ;; Source: (info "(gettext) Plural forms")
+  (pcase lang
+    ((or "Japanese" "Vietnamese" "Korean" "Thai")
+     0)
+    ((or "English" "German" "Dutch" "Swedish" "Danish" "Norwegian"
+         "Faroese" "Spanish" "Portuguese" "Italian" "Bulgarian" "Greek"
+         "Finnish" "Estonian" "Hebrew" "Bahasa Indonesian" "Esperanto"
+         "Hungarian" "Turkish")
+     (if (/= n 1) 1 0))
+    ((or "Brazilian Portuguese" "French")
+     (if (> n 1) 1 0))
+    ((or "Latvian")
+     (if (and (= (% n 10) 1) (/= (% n 100) 11)) 0 (if (/= n 0) 1 2)))
+    ((or "Gaeilge" "Irish")
+     (if (= n 1) 0 (if (= n 2) 1 2)))
+    ((or "Romanian")
+     (if (= n 1) 0 (if (or (= n 0) (and (> (% n 100) 0) (< (% n 100) 20))) 1 2)))
+    ((or "Lithuanian")
+     (if (and (= (% n 10) 1) (/= (% n 100) 11)) 0
+       (if (and (>= (% n 10) 2) (or (< (% n 100) 10) (>= (% n 100) 20))) 1 2)))
+    ((or "Russian" "Ukrainian" "Belarusian" "Serbian" "Croatian")
+     (if (and (= (% n 10) 1) (/= (% n 100) 11)) 0
+       (if (and (>= (% n 10) 2) (<= (% n 10) 4) (or (< (% n 100) 10) (>= (% n 100) 20))) 1 2)))
+    ((or "Czech" "Slovak")
+     (if (= n 1) 0 (if (and (>= n 2) (<= n 4)) 1 2)))
+    ((or "Polish")
+     (if (= n 1) 0
+       (if (and (>= (% n 10) 2) (<= (% n 10) 4) (or (< (% n 100) 10) (>= (% n 100) 20))) 1 2)))
+    ((or "Slovenian")
+     (if (= (% n 100) 1) 0 (if (= (% n 100) 2) 1 (if (or (= (% n 100) 3) (= (% n 100) 4)) 2 3))))
+    ((or "Arabic")
+     (if (= n 0) 0 (if (= n 1) 1 (if (= n 2) 2 (if (and (>= (% n 100) 3) (<= (% n 100) 10)) 3
+                                                 (if (>= (% n 100) 11) 4 5))))))))
+
+(defun i18n-get-translation (format-string &rest args)
+  (let* ((lang current-language-environment)
+	 (fallbacks (cdr (assoc lang i18n-fallbacks)))
+	 dict found)
+    (while (and (not found) lang)
+      (when (setq dict (gethash lang i18n-dictionaries))
+	(setq found
+              (pcase (gethash format-string dict)
+		((and (pred functionp) f) (apply f format-string args))
+		((and (pred stringp) s) s)
+		((and (pred consp) l)
+		 (let ((n (i18n-get-plural lang (car args))))
+		   (when n (nth n l)))))))
+      (unless found
+	(setq lang (pop fallbacks))))
+    (or found format-string)))
+
+(defun i18n-message-translate (&rest args)
+  (apply 'i18n-get-translation args))
+
+(defvar message-translate-function)
+
+(setq message-translate-function 'i18n-message-translate)
+
+(provide 'i18n-message)
+;;; i18n-message.el ends here
diff --git a/src/editfns.c b/src/editfns.c
index bffb5db43e..f517679576 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -3050,6 +3050,14 @@ produced text.
 usage: (format STRING &rest OBJECTS)  */)
   (ptrdiff_t nargs, Lisp_Object *args)
 {
+  if (!NILP (Vmessage_translate_function) && nargs > 0)
+    {
+      Lisp_Object format = apply1 (Vmessage_translate_function,
+				   Flist (nargs, args));
+      if (STRINGP (format))
+	args[0] = format;
+    }
+
   return styled_format (nargs, args, false);
 }
 
@@ -3066,6 +3074,14 @@ and right quote replacement characters are specified by
 usage: (format-message STRING &rest OBJECTS)  */)
   (ptrdiff_t nargs, Lisp_Object *args)
 {
+  if (!NILP (Vmessage_translate_function) && nargs > 0)
+    {
+      Lisp_Object format = apply1 (Vmessage_translate_function,
+				   Flist (nargs, args));
+      if (STRINGP (format))
+	args[0] = format;
+    }
+
   return styled_format (nargs, args, true);
 }
 
@@ -4462,6 +4478,11 @@ of the buffer being accessed.  */);
 functions if all the text being accessed has this property.  */);
   Vbuffer_access_fontified_property = Qnil;
 
+  DEFVAR_LISP ("message-translate-function",
+	       Vmessage_translate_function,
+	       doc: /* Function that translates messages.  */);
+  Vmessage_translate_function = Qnil;
+
   DEFVAR_LISP ("system-name", Vsystem_name,
 	       doc: /* The host name of the machine Emacs is running on.  */);
   Vsystem_name = cached_system_name = Qnil;
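The rules in `i18n-get-plural` transcribe the Plural-Forms expressions from the gettext manual, which are themselves written in C syntax. As a rough cross-check, here is a sketch of four of them in C, with a helper that picks the matching format string. The function names are hypothetical, and only English, French, Russian and Polish are shown:

```c
#include <assert.h>
#include <stddef.h>

/* Plural-index rules transcribed from the gettext manual's
   Plural-Forms expressions.  Each returns an index into that
   language's list of forms; gettext uses an unsigned count.  */
static int
plural_english (unsigned long n)   /* 2 forms: singular, plural */
{
  return n != 1;
}

static int
plural_french (unsigned long n)    /* 0 and 1 both take the singular */
{
  return n > 1;
}

/* 3 forms: ends in 1 (not 11); ends in 2-4 (not 12-14); everything else.  */
static int
plural_russian (unsigned long n)
{
  return (n % 10 == 1 && n % 100 != 11) ? 0
       : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
       : 2;
}

/* Like Russian, except that only 1 itself takes the first form.  */
static int
plural_polish (unsigned long n)
{
  return n == 1 ? 0
       : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
       : 2;
}

/* Return the entry of FORMS (NFORMS long) that RULE selects for COUNT.  */
static const char *
select_form (int (*rule) (unsigned long), const char *const *forms,
             size_t nforms, unsigned long count)
{
  int index = rule (count);
  assert (index >= 0 && (size_t) index < nforms);
  return forms[index];
}
```

A count such as 21 shows why a simple singular/plural switch is not enough. Russian uses the same form for 21 as for 1, while Polish uses its third form.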


* Re: Emacs i18n
  2019-03-03 20:57               ` Emacs i18n Juri Linkov
@ 2019-03-04  1:46                 ` Jean-Christophe Helary
  2019-03-06  9:38                   ` Elias Mårtenson
  2019-03-21 20:33                   ` Clément Pit-Claudel
  0 siblings, 2 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-04  1:46 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Eli Zaretskii, Richard Stallman, emacs-devel




> On Mar 4, 2019, at 5:57, Juri Linkov <juri@linkov.net> wrote:
> 
> My intention was to fix the bug which manifests itself in
> grammatically incorrect sentences displayed by ‘message’ like
> 
>  Deleted 1 matching lines
>  1 matches found
>  ...

The best way to do that (I fixed almost 100% of the package.el code with that) is to not use such syntax but rather things like:

Number of matches found: %d


Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune





* Re: Emacs i18n
  2019-03-04  1:46                 ` Jean-Christophe Helary
@ 2019-03-06  9:38                   ` Elias Mårtenson
  2019-03-06 11:23                     ` Jean-Christophe Helary
  2019-03-21 20:33                   ` Clément Pit-Claudel
  1 sibling, 1 reply; 145+ messages in thread
From: Elias Mårtenson @ 2019-03-06  9:38 UTC (permalink / raw)
  To: Jean-Christophe Helary
  Cc: Eli Zaretskii, emacs-devel, Richard Stallman, Juri Linkov


On Mon, 4 Mar 2019 at 09:48, Jean-Christophe Helary <brandelune@gmail.com>
wrote:

>
> The best way to do that (I fixed almost 100% of the package.el code
> with that) is to not use such syntax but rather things like:
>
> Number of matches found: %d
>

That works for English most of the time (although I would argue that it
isn't great). But it may be harder in other languages.



* Re: Emacs i18n
  2019-03-06  9:38                   ` Elias Mårtenson
@ 2019-03-06 11:23                     ` Jean-Christophe Helary
  0 siblings, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-06 11:23 UTC (permalink / raw)
  To: emacs-devel

> On Mar 6, 2019, at 18:38, Elias Mårtenson wrote:
> 
>> The best way to do that (I fixed almost 100% of the package.el code with that) is to not use such syntax but rather things like:
>> 
>> Number of matches found: %d
> 
> That works for English most of the time (although I would argue that it isn't great).

I'm not sure why that is not "great" here. But I know from what I saw in package.el that what is even less great is to get lost in Lisp code that attempts to mimic natural-language inflections.

> But it may be harder in other languages.

It is unlikely that this structure is harder in non-English languages than the original. Removing the need to express the difference in number actually removes an order of complexity.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune


* Re: Emacs i18n
  2019-03-04  1:46                 ` Jean-Christophe Helary
  2019-03-06  9:38                   ` Elias Mårtenson
@ 2019-03-21 20:33                   ` Clément Pit-Claudel
  2019-03-21 20:50                     ` Eli Zaretskii
                                       ` (2 more replies)
  1 sibling, 3 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-21 20:33 UTC (permalink / raw)
  To: Jean-Christophe Helary, Juri Linkov
  Cc: Eli Zaretskii, Richard Stallman, emacs-devel

On 2019-03-03 20:46, Jean-Christophe Helary wrote:
>> On Mar 4, 2019, at 5:57, Juri Linkov <juri@linkov.net> wrote:
>> My intention was to fix the bug which manifests itself in
>> grammatically incorrect sentences displayed by ‘message’ like
>>
>>  Deleted 1 matching lines
>>  1 matches found
>>  ...
> 
> The best way to do that (I fixed almost 100% of the package.el code with that) is to not use such syntax but rather things like:
> 
> Number of matches found: %d

I'm a bit late to the party, but I hope it's still OK to respond :)  This is a valid way to work around the issue, but I'm not sure how much I like it (I just noticed the change after pulling the latest Emacs from git).

The current package.el doesn't say 'Number of packages that are not available: %d'; instead, it says 'Packages that are not available: %d' (it used to say "%s packages are not available").  Other examples are 'Packages to hide: %d' (originally 'Hiding %s packages') and 'Packages that can be upgraded: %d; type `%s' to mark for upgrading.' (originally '%d package%s can be upgraded; type `%s' to mark %s for upgrading.').

I find this suboptimal for three reasons. First, after 'packages that are not available', I expect to see a list of packages, not a number.  Second, the new phrasing puts the important bit in a less obvious place (in the middle of the message rather than at the beginning: "Packages that can be upgraded: 5; type `U' to mark for upgrading"). Third (and this is a bit more fuzzy), the new wording makes errors sound like normal events ('Packages that are not available: 3' reads like the answer to the query 'how many packages are not available?').

I understand that there's hope to support plurals and internationalization in a more principled way soon, but is this workaround (61f73703c74756e6963cc622f03bcc6938ab71b2) needed in the meantime?

Clément.


* Re: Emacs i18n
  2019-03-21 20:33                   ` Clément Pit-Claudel
@ 2019-03-21 20:50                     ` Eli Zaretskii
  2019-03-21 21:03                       ` Clément Pit-Claudel
  2019-03-21 21:17                     ` Jean-Christophe Helary
  2019-03-21 21:59                     ` Juri Linkov
  2 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-21 20:50 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel, brandelune, rms, juri

> Cc: Eli Zaretskii <eliz@gnu.org>, Richard Stallman <rms@gnu.org>,
>  emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 21 Mar 2019 16:33:21 -0400
> 
> I understand that there's hope to support plurals and internationalization in a more principled way soon, but is this workaround (61f73703c74756e6963cc622f03bcc6938ab71b2) needed in the meantime?

Whether we like it or not, it's one of the standard methods of solving
these situations.  It might sound somewhat more awkward in some
languages than the original wording, but it has other more important
advantages.




* Re: Emacs i18n
  2019-03-21 20:50                     ` Eli Zaretskii
@ 2019-03-21 21:03                       ` Clément Pit-Claudel
  2019-03-21 21:21                         ` Jean-Christophe Helary
  2019-03-22  8:22                         ` Eli Zaretskii
  0 siblings, 2 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-21 21:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, brandelune, rms, juri

On 2019-03-21 16:50, Eli Zaretskii wrote:
>> Cc: Eli Zaretskii <eliz@gnu.org>, Richard Stallman <rms@gnu.org>,
>>  emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Thu, 21 Mar 2019 16:33:21 -0400
>>
>> I understand that there's hope to support plurals and internationalization in a more principled way soon, but is this workaround (61f73703c74756e6963cc622f03bcc6938ab71b2) needed in the meantime?
> 
> Whether we like it or not, it's one of the standard methods of solving
> these situations.  It might sound somewhat more awkward in some
> languages than the original wording, but it has other more important
> advantages.

I don't understand: what does this change buy us currently, except the awkward wording?  Arguably the patch for the change was a simplification, but the original author had actually written the code to get the English plurals right in most cases.




* Re: Emacs i18n
  2019-03-21 21:03                       ` Clément Pit-Claudel
@ 2019-03-21 21:21                         ` Jean-Christophe Helary
  2019-03-21 21:34                           ` Clément Pit-Claudel
  2019-03-22  8:22                         ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-21 21:21 UTC (permalink / raw)
  To: emacs-devel



> On Mar 22, 2019, at 6:03, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
> 
> but the original author had actually written the code to get the English plurals right in most cases.

Yes, but the issue is not to have "most cases right" but rather to have readable strings in the code.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune






* Re: Emacs i18n
  2019-03-21 21:21                         ` Jean-Christophe Helary
@ 2019-03-21 21:34                           ` Clément Pit-Claudel
  2019-03-21 21:56                             ` Jean-Christophe Helary
  0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-21 21:34 UTC (permalink / raw)
  To: emacs-devel

On 2019-03-21 17:21, Jean-Christophe Helary wrote:
>> On Mar 22, 2019, at 6:03, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
>>
>> but the original author had actually written the code to get the English plurals right in most cases.
> 
> Yes, but the issue is not to have "most cases right" but rather to have readable strings in the code.

But is that worth the loss in readability in the UI?




* Re: Emacs i18n
  2019-03-21 21:34                           ` Clément Pit-Claudel
@ 2019-03-21 21:56                             ` Jean-Christophe Helary
  2019-03-21 22:05                               ` Clément Pit-Claudel
  0 siblings, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-21 21:56 UTC (permalink / raw)
  To: emacs-devel



> On Mar 22, 2019, at 6:34, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
> 
> On 2019-03-21 17:21, Jean-Christophe Helary wrote:
>>> On Mar 22, 2019, at 6:03, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
>>> 
>>> but the original author had actually written the code to get the English plurals right in most cases.
>> 
>> Yes, but the issue is not to have "most cases right" but rather to have readable strings in the code.
> 
> But is that worth the loss in readability in the UI?

That's a subjective issue. I think the strings are more readable now, and there are potentially fewer grammatical mistakes (I am not talking about the "Number of" part you mentioned), whereas before it was easy to miss a bug in the code. I think that is a net gain.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune






* Re: Emacs i18n
  2019-03-21 21:56                             ` Jean-Christophe Helary
@ 2019-03-21 22:05                               ` Clément Pit-Claudel
  2019-03-21 23:46                                 ` Jean-Christophe Helary
  0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-21 22:05 UTC (permalink / raw)
  To: Jean-Christophe Helary, emacs-devel

On 2019-03-21 17:56, Jean-Christophe Helary wrote:
> 
> 
>> On Mar 22, 2019, at 6:34, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
>>
>> On 2019-03-21 17:21, Jean-Christophe Helary wrote:
>>>> On Mar 22, 2019, at 6:03, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
>>>>
>>>> but the original author had actually written the code to get the English plurals right in most cases.
>>>
>>> Yes, but the issue is not to have "most cases right" but rather to have readable strings in the code.
>>
>> But is that worth the loss in readability in the UI?
> 
> That's a subjective issue. I think the strings are more readable now, and there are potentially fewer grammatical mistakes (I am not talking about the "Number of" part you mentioned), whereas before it was easy to miss a bug in the code. I think that is a net gain.

Understood, thanks.  I would have preferred sticking with the previous phrasing, but a single opinion is not much to act upon.




* Re: Emacs i18n
  2019-03-21 22:05                               ` Clément Pit-Claudel
@ 2019-03-21 23:46                                 ` Jean-Christophe Helary
  0 siblings, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-21 23:46 UTC (permalink / raw)
  To: emacs-devel

> On Mar 22, 2019, at 7:05, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
> 
>> That's a subjective issue. I think the strings are more readable now, and there are potentially fewer grammatical mistakes (I am not talking about the "Number of" part you mentioned), whereas before it was easy to miss a bug in the code. I think that is a net gain.
> 
> Understood, thanks.  I would have preferred sticking with the previous phrasing, but a single opinion is not much to act upon.

:) No, it's quite the opposite. Everything starts from a single opinion. I would also prefer more natural-sounding phrasing, but the tools available at the moment don't allow for that.

It is generally accepted in code internationalization that concatenation and similar techniques should not be used to generate natural-language strings. So there are two ways to deal with that: either you simplify the strings so that there are as few variations as possible (my choice for package.el), or you use redundant strings to cover all the possible variations, with tools that support that (which we don't have at the moment).
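To make the two approaches concrete, here is a sketch in C that contrasts them with the concatenation pattern. It uses a shortened form of the package.el message quoted earlier in the thread. The function names are hypothetical, and the two-entry table in `redundant` covers English only:

```c
#include <assert.h>
#include <stdio.h>

/* What i18n guidelines discourage: the sentence is glued together at
   run time, so no single string in the source holds the whole message.  */
static void
concatenated (char *buf, size_t size, int count)
{
  assert (count >= 0);
  snprintf (buf, size, "%d package%s can be upgraded",
            count, count == 1 ? "" : "s");
}

/* Approach 1: simplify the wording so one string is correct for any count.  */
static void
simplified (char *buf, size_t size, int count)
{
  assert (count >= 0);
  snprintf (buf, size, "Packages that can be upgraded: %d", count);
}

/* Approach 2: keep one complete string per grammatical form and pick
   one by count.  English needs two forms; other languages need a
   different number, chosen by a per-language plural rule.  */
static void
redundant (char *buf, size_t size, int count)
{
  static const char *const forms[] = {
    "%d package can be upgraded",
    "%d packages can be upgraded",
  };
  assert (count >= 0);
  snprintf (buf, size, forms[count == 1 ? 0 : 1], count);
}
```

Only the last two give a translator complete sentences to work with. The first produces the same English output as `redundant`, but its source string is a fragment whose `%s` suffix a translator cannot adapt.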

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune


* Re: Emacs i18n
  2019-03-21 21:03                       ` Clément Pit-Claudel
  2019-03-21 21:21                         ` Jean-Christophe Helary
@ 2019-03-22  8:22                         ` Eli Zaretskii
  2019-03-22 16:10                           ` Clément Pit-Claudel
  1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-22  8:22 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel, brandelune, rms, juri

> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 21 Mar 2019 17:03:31 -0400
> 
> > Whether we like it or not, it's one of the standard methods of solving
> > these situations.  It might sound somewhat more awkward in some
> > languages than the original wording, but it has other more important
> > advantages.
> 
> I don't understand: what does this change buy us currently, except the awkward wording?

It brings us a step closer to the i18n goal.  A very small step,
admittedly, but a step in the right direction nonetheless.



* Re: Emacs i18n
  2019-03-22  8:22                         ` Eli Zaretskii
@ 2019-03-22 16:10                           ` Clément Pit-Claudel
  2019-03-22 16:35                             ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-22 16:10 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, brandelune, rms, juri

On 2019-03-22 04:22, Eli Zaretskii wrote:
>> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Thu, 21 Mar 2019 17:03:31 -0400
>>
>>> Whether we like it or not, it's one of the standard methods of solving
>>> these situations.  It might sound somewhat more awkward in some
>>> languages than the original wording, but it has other more important
>>> advantages.
>>
>> I don't understand: what does this change buy us currently, except the awkward wording?
> 
> It brings us a step closer to the i18n goal.  A very small step,
> admittedly, but step in the right direction nonetheless.

Thanks, that's what puzzles me.  IIUC, we will revert to the previous strings once we have proper translation support in place, right?





* Re: Emacs i18n
  2019-03-22 16:10                           ` Clément Pit-Claudel
@ 2019-03-22 16:35                             ` Eli Zaretskii
  2019-03-22 17:16                               ` Clément Pit-Claudel
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-22 16:35 UTC
  To: Clément Pit-Claudel; +Cc: emacs-devel, brandelune, rms, juri

> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 Mar 2019 12:10:42 -0400
> 
> > It brings us a step closer to the i18n goal.  A very small step,
> > admittedly, but step in the right direction nonetheless.
> 
> Thanks, that's what puzzles me.  IIUC, we will revert to the previous strings once we have proper translation support in place, right?

It isn't clear to me yet.  At least at the time this change was made,
we didn't expect to revert; we thought this form would remain once it
could be translated.



* Re: Emacs i18n
  2019-03-22 16:35                             ` Eli Zaretskii
@ 2019-03-22 17:16                               ` Clément Pit-Claudel
  2019-03-22 17:35                                 ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-22 17:16 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, brandelune, rms, juri

On 2019-03-22 12:35, Eli Zaretskii wrote:
>> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 Mar 2019 12:10:42 -0400
>>
>>> It brings us a step closer to the i18n goal.  A very small step,
>>> admittedly, but step in the right direction nonetheless.
>>
>> Thanks, that's what puzzles me.  IIUC, we will revert to the previous strings once we have proper translation support in place, right?
> 
> It isn't clear to me yet.  At least at the time this change was made,
> we didn't expect to revert, we thought this form will remain when it
> can be translated.

Oh!  Then I misunderstood.  I thought the idea was that once we have a library that can handle this well, we'd write something like (ngettext "One package installed" "%d packages installed" n), with ngettext choosing between the two forms in English and picking the appropriate translation in other languages.

Clément.



* Re: Emacs i18n
  2019-03-22 17:16                               ` Clément Pit-Claudel
@ 2019-03-22 17:35                                 ` Eli Zaretskii
  2019-03-22 23:17                                   ` Clément Pit-Claudel
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-22 17:35 UTC
  To: Clément Pit-Claudel; +Cc: emacs-devel, brandelune, rms, juri

> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 Mar 2019 13:16:17 -0400
> 
> > It isn't clear to me yet.  At least at the time this change was made,
> > we didn't expect to revert, we thought this form will remain when it
> > can be translated.
> 
> Oh!  Then I misunderstood.  I thought the idea was that once we have a library that can handle this well, we'd write something like (ngettext "One package installed" "%d packages installed" n), with ngettext picking between both in English and picking the appropriate string in other languages.

Maybe.  ngettext wasn't on the table when this change was made, and
even now I'm not yet sure what the end result will look like.



* Re: Emacs i18n
  2019-03-22 17:35                                 ` Eli Zaretskii
@ 2019-03-22 23:17                                   ` Clément Pit-Claudel
  0 siblings, 0 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-22 23:17 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, brandelune, rms, juri

On 2019-03-22 13:35, Eli Zaretskii wrote:
>> Cc: brandelune@gmail.com, juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 Mar 2019 13:16:17 -0400
>>
>>> It isn't clear to me yet.  At least at the time this change was made,
>>> we didn't expect to revert, we thought this form will remain when it
>>> can be translated.
>>
>> Oh!  Then I misunderstood.  I thought the idea was that once we have a library that can handle this well, we'd write something like (ngettext "One package installed" "%d packages installed" n), with ngettext picking between both in English and picking the appropriate string in other languages.
> 
> Maybe.  ngettext wasn't on the table when this change was made, and
> even now I'm not yet sure what the end result will look like.

Got it. Thanks for taking the time to explain!



* Re: Emacs i18n
  2019-03-21 20:33                   ` Clément Pit-Claudel
  2019-03-21 20:50                     ` Eli Zaretskii
@ 2019-03-21 21:17                     ` Jean-Christophe Helary
  2019-03-21 21:59                     ` Juri Linkov
  2 siblings, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-21 21:17 UTC
  To: emacs-devel

Thank you, Clément, for the remarks. I rewrote all of these at first because the original code did not handle some singular cases. When I checked the code, I found a huge number of strings mixed with code, so I went a bit further and "fixed" that too. I don't remember leaving out the "Number of" part you mention, though. Sorry about that.

Jean-Christophe 

> On Mar 22, 2019, at 5:33, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
> 
> On 2019-03-03 20:46, Jean-Christophe Helary wrote:
>>> On Mar 4, 2019, at 5:57, Juri Linkov <juri@linkov.net <mailto:juri@linkov.net>> wrote:
>>> My intention was to fix the bug which manifests itself in
>>> grammatically incorrect sentences displayed by ‘message’ like
>>> 
>>> Deleted 1 matching lines
>>> 1 matches found
>>> ...
>> 
>> The best way to do that (I fixed the almost 100% of the package.el code with that) is to not use such syntax but rather things like:
>> 
>> Number of matches found: %d
> 
> I'm a bit late to the party, but I hope it's still OK to respond :)  This is a valid way to work around the issue, but I'm not sure how much I like it (I just noticed the change after pulling the latest Emacs from git).
> 
> The current package.el doesn't say 'Number of packages that are not available: %d'; instead, it says 'Packages that are not available: %d' (it used to say "%s packages are not available").  Other examples are 'Packages to hide: %d' (originally 'Hiding %s packages') and 'Packages that can be upgraded: %d; type `%s' to mark for upgrading.' (originally '%d package%s can be upgraded; type `%s' to mark %s for upgrading.').
> 
> I find this suboptimal for three reasons: First, after 'packages that are not available', I expect to see a list of packages, not a number.  Second, the new way the message is phrased puts the important bit in a less obvious place (in the middle of the message, rather than at the beginning: "Packages that can be upgraded: 5; type `U' to mark for upgrading"). Third (but this is a bit more fuzzy), the way the message is now written makes errors sound like normal events ('Packages that are not available: 3' read like the response to the query 'how many packages are not available?').
> 
> I understand that there's hope to support plurals and internationalization in a more principled way soon, but is this workaround (61f73703c74756e6963cc622f03bcc6938ab71b2) needed in the meantime?
> 
> Clément.
> 

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune





* Re: Emacs i18n
  2019-03-21 20:33                   ` Clément Pit-Claudel
  2019-03-21 20:50                     ` Eli Zaretskii
  2019-03-21 21:17                     ` Jean-Christophe Helary
@ 2019-03-21 21:59                     ` Juri Linkov
  2019-03-22  8:22                       ` Eli Zaretskii
  2 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-21 21:59 UTC
  To: Clément Pit-Claudel
  Cc: Eli Zaretskii, emacs-devel, Jean-Christophe Helary,
	Richard Stallman

> The current package.el doesn't say 'Number of packages that are not
> available: %d'; instead, it says 'Packages that are not available: %d'
> (it used to say "%s packages are not available").

Neither sounds natural; both are too robotic.
Let's use ngettext plurals from now on.



* Re: Emacs i18n
  2019-03-21 21:59                     ` Juri Linkov
@ 2019-03-22  8:22                       ` Eli Zaretskii
  2019-03-23 21:50                         ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-22  8:22 UTC
  To: Juri Linkov; +Cc: cpitclaudel, emacs-devel, brandelune, rms

> From: Juri Linkov <juri@linkov.net>
> Cc: Jean-Christophe Helary <brandelune@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,  Richard Stallman <rms@gnu.org>,  emacs-devel@gnu.org
> Date: Thu, 21 Mar 2019 23:59:23 +0200
> 
> Let's use ngettext plurals from now on.

I don't think I understand the practical implications of that.  Could
you please elaborate?



* Re: Emacs i18n
  2019-03-22  8:22                       ` Eli Zaretskii
@ 2019-03-23 21:50                         ` Juri Linkov
  2019-03-24  3:36                           ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-23 21:50 UTC
  To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel, brandelune, rms

>> Let's use ngettext plurals from now on.
>
> I don't think I understand the practical implications of that.  Could
> you please elaborate?

This means replacing

  (message "Packages to install: %d" n)

with

  (message (ngettext "One package will be installed"
                     "%d packages will be installed" n) n)

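A minimal C sketch of the selection rule Juri's example relies on, matching the placeholder behaviour of the stub discussed in this thread: the singular form is chosen only when N is exactly 1, and nothing is translated. C is used because the stub lives in src/editfns.c; the names plural_fallback and format_install_message are hypothetical and not part of Emacs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for ngettext with no message catalog loaded:
   return the singular form only when N is exactly 1.  */
const char *
plural_fallback (const char *msgid, const char *msgid_plural, int n)
{
  return n == 1 ? msgid : msgid_plural;
}

/* Mirror (message (ngettext ...) n): N both selects the form and is the
   format argument.  The singular form has no %d; printf-style functions
   evaluate and ignore surplus arguments, so passing N is harmless there.
   Returns the length snprintf reports.  */
int
format_install_message (char *buf, size_t size, int n)
{
  const char *fmt = plural_fallback ("One package will be installed",
                                     "%d packages will be installed", n);
  return snprintf (buf, size, fmt, n);
}
```

For example, N = 1 yields "One package will be installed" and N = 3 yields "3 packages will be installed". This two-way English rule is exactly what a real catalog would replace: some languages need more than two forms, and some (French, for instance) use the singular for 0.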


* Re: Emacs i18n
  2019-03-23 21:50                         ` Juri Linkov
@ 2019-03-24  3:36                           ` Eli Zaretskii
  2019-03-24 21:55                             ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-24  3:36 UTC
  To: Juri Linkov; +Cc: cpitclaudel, emacs-devel, brandelune, rms

> From: Juri Linkov <juri@linkov.net>
> Cc: cpitclaudel@gmail.com,  brandelune@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Sat, 23 Mar 2019 23:50:53 +0200
> 
> >> Let's use ngettext plurals from now on.
> >
> > I don't think I understand the practical implications of that.  Could
> > you please elaborate?
> 
> This means replacing
> 
>   (message "Packages to install: %d" n)
> 
> with
> 
>   (message (ngettext "One package will be installed"
>                      "%d packages will be installed" n) n)

But since we don't yet have ngettext, we cannot yet use this
paradigm.  I thought by "from now on" you literally meant from now;
did I misunderstand?



* Re: Emacs i18n
  2019-03-24  3:36                           ` Eli Zaretskii
@ 2019-03-24 21:55                             ` Juri Linkov
  2019-03-24 23:31                               ` Jean-Christophe Helary
                                                 ` (2 more replies)
  0 siblings, 3 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-24 21:55 UTC
  To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel, brandelune, rms

>> >> Let's use ngettext plurals from now on.
>> >
>> > I don't think I understand the practical implications of that.  Could
>> > you please elaborate?
>> 
>> This means replacing
>> 
>>   (message "Packages to install: %d" n)
>> 
>> with
>> 
>>   (message (ngettext "One package will be installed"
>>                      "%d packages will be installed" n) n)
>
> But since we don't yet have ngettext, we cannot yet use this
> paradigm.  I thought by "from now on" you literally meant from now;
> did I misunderstand?

Yes, literally.  After the patch from
http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
is pushed to master, ngettext is available to use for pluralization.



* Re: Emacs i18n
  2019-03-24 21:55                             ` Juri Linkov
@ 2019-03-24 23:31                               ` Jean-Christophe Helary
  2019-03-25 21:32                                 ` Juri Linkov
  2019-03-25  3:35                               ` Eli Zaretskii
  2019-03-25 10:52                               ` Mattias Engdegård
  2 siblings, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-24 23:31 UTC
  To: Emacs developers



> On Mar 25, 2019, at 6:55, Juri Linkov <juri@linkov.net> wrote:
> 
>>>>> Let's use ngettext plurals from now on.
>>>> 
>>>> I don't think I understand the practical implications of that.  Could
>>>> you please elaborate?
>>> 
>>> This means replacing
>>> 
>>>  (message "Packages to install: %d" n)
>>> 
>>> with
>>> 
>>>  (message (ngettext "One package will be installed"
>>>                     "%d packages will be installed" n) n)
>> 
>> But since we don't yet have ngettext, we cannot yet use this
>> paradigm.  I thought by "from now on" you literally meant from now;
>> did I misunderstand?
> 
> Yes, literally.  After the patch from
> http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
> is pushed to master, ngettext is available to use for pluralization.

Why put the patch in subr.el and not in a new i18n-related package of its own?

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune





* Re: Emacs i18n
  2019-03-24 23:31                               ` Jean-Christophe Helary
@ 2019-03-25 21:32                                 ` Juri Linkov
  2019-03-25 22:31                                   ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-25 21:32 UTC
  To: Jean-Christophe Helary; +Cc: Emacs developers

> Why put the patch in subr.el and not in its own i18n related new package ?

I don't know where i18n-related code belongs, so since ngettext will need
to call C code anyway, I moved it to editfns.c near the function ‘message’,
where for now it just returns the correct plural form without doing any translation.

* Re: Emacs i18n
  2019-03-25 21:32                                 ` Juri Linkov
@ 2019-03-25 22:31                                   ` Paul Eggert
  2019-03-26 16:11                                     ` Eli Zaretskii
  2019-03-26 23:16                                     ` Juri Linkov
  0 siblings, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-25 22:31 UTC
  To: Juri Linkov; +Cc: Jean-Christophe Helary, Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

On 3/25/19 2:32 PM, Juri Linkov wrote:
> I don't know where to put i18n related code, so since ngettext should
> have C calls anyway, I moved it to editfns.c near the function ‘message’
> where it still just returns the correct plurals without doing any translation.

That stub had some problems:

1. It lacked documentation in the Elisp manual. Important changes like
this should be documented -- to some extent the documentation is even
more important than the code. Can you write something?

2. While you're thinking about (1) here are some other questions. How
will ngettext determine the message catalog? Is the catalog visible to
users as a global variable, or as a hidden part of the global state, or
is it something explicit? How will catalogs from multiple packages be
used? How would a multi-lingual application work in Emacs if the message
catalog is part of global state? This seems to be a crucial issue, I'd
say. For example, should Emacs export dcngettext to Lisp code, instead
of just plain ngettext? (Emacs could then define ngettext in terms of
dcngettext.)

3. User C code is not supposed to inspect the _LIBC macro; that's for
glibc internal use. In Emacs _LIBC should be used only with code shared
with glibc, and we should assume _LIBC is never defined when files are
compiled for Emacs.

4. The stub doesn't work with bignums.

5. When calling the C-level ngettext, strings are not properly recoded.

I fixed (3) and (4), and temporarily worked around (5), by installing
the attached patch. To do a better job with (2) and (5) please see the
gettext manual's instructions for package maintainers, here:

https://www.gnu.org/software/gettext/manual/gettext.html#Maintainers

To my mind (1) and (2) are the most-pressing problems.


[-- Attachment #2: 0001-Port-recent-ngettext-stub-to-non-glibc.patch --]
[-- Type: text/x-patch; name="0001-Port-recent-ngettext-stub-to-non-glibc.patch", Size: 2850 bytes --]

From a361c54b8339ad79f65e924c4a1f7bbcdb1859e2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 25 Mar 2019 15:20:20 -0700
Subject: [PATCH] Port recent ngettext stub to non-glibc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/editfns.c: Don’t try to call glibc ngettext;
we’re not ready for that yet.
(Fngettext): Do not restrict integer arguments to fixnums.
Improve doc string a bit.
---
 src/editfns.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index ab48cdb6fd..bfffadc733 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -53,12 +53,6 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 #include "window.h"
 #include "blockinput.h"
 
-#ifdef _LIBC
-# include <libintl.h>
-#else
-# include "gettext.h"
-#endif
-
 static void update_buffer_properties (ptrdiff_t, ptrdiff_t);
 static Lisp_Object styled_format (ptrdiff_t, Lisp_Object *, bool);
 
@@ -2845,30 +2839,20 @@ usage: (save-restriction &rest BODY)  */)
 /* i18n (internationalization).  */
 
 DEFUN ("ngettext", Fngettext, Sngettext, 3, 3, 0,
-       doc: /* Return the plural form of the translation of the string.
-This function is similar to the `gettext' function as it finds the message
-catalogs in the same way.  But it takes two extra arguments.  The MSGID
-parameter must contain the singular form of the string to be converted.
-It is also used as the key for the search in the catalog.
-The MSGID_PLURAL parameter is the plural form.  The parameter N is used
-to determine the plural form.  If no message catalog is found MSGID is
-returned if N is equal to 1, otherwise MSGID_PLURAL.  */)
+       doc: /* Return the translation of MSGID (plural MSGID_PLURAL) depending on N.
+MSGID is the singular form of the string to be converted;
+use it as the key for the search in the translation catalog.
+MSGID_PLURAL is the plural form.  Use N to select the proper translation.
+If no message catalog is found, MSGID is returned if N is equal to 1,
+otherwise MSGID_PLURAL.  */)
   (Lisp_Object msgid, Lisp_Object msgid_plural, Lisp_Object n)
 {
   CHECK_STRING (msgid);
   CHECK_STRING (msgid_plural);
-  CHECK_FIXNUM (n);
+  CHECK_INTEGER (n);
 
-#ifdef _LIBGETTEXT_H
-  return build_string (ngettext (SSDATA (msgid),
-                                 SSDATA (msgid_plural),
-                                 XFIXNUM (n)));
-#else
-  if (XFIXNUM (n) == 1)
-    return msgid;
-  else
-    return msgid_plural;
-#endif
+  /* Placeholder implementation until we get our act together.  */
+  return EQ (n, make_fixnum (1)) ? msgid : msgid_plural;
 }
 \f
 DEFUN ("message", Fmessage, Smessage, 1, MANY, 0,
-- 
2.20.1


* Re: Emacs i18n
  2019-03-25 22:31                                   ` Paul Eggert
@ 2019-03-26 16:11                                     ` Eli Zaretskii
  2019-03-26 16:22                                       ` Stefan Monnier
                                                         ` (2 more replies)
  2019-03-26 23:16                                     ` Juri Linkov
  1 sibling, 3 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-26 16:11 UTC
  To: Paul Eggert; +Cc: emacs-devel, brandelune, juri

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 25 Mar 2019 15:31:14 -0700
> Cc: Jean-Christophe Helary <brandelune@gmail.com>,
> 	Emacs developers <emacs-devel@gnu.org>
> 
> 2. While you're thinking about (1) here are some other questions. How
> will ngettext determine the message catalog? Is the catalog visible to
> users as a global variable, or as a hidden part of the global state, or
> is it something explicit? How will catalogs from multiple packages be
> used? How would a multi-lingual application work in Emacs if the message
> catalog is part of global state? This seems to be a crucial issue, I'd
> say. For example, should Emacs export dcngettext to Lisp code, instead
> of just plain ngettext? (Emacs could then define ngettext in terms of
> dcngettext.)

Do we have any reasons not to follow the CLISP example of factoring
these issues?

> 5. When calling the C-level ngettext, strings are not properly recoded.

Did you mean decoding the translated string that ngettext returns?  If
so, we will need some way of getting at the encoding of the strings in
the catalog, I think.  Or will we mandate that Emacs catalogs need
always to be in UTF-8 encoding?



* Re: Emacs i18n
  2019-03-26 16:11                                     ` Eli Zaretskii
@ 2019-03-26 16:22                                       ` Stefan Monnier
  2019-03-26 16:55                                         ` Eli Zaretskii
  2019-03-26 22:35                                       ` Paul Eggert
  2019-03-27  2:34                                       ` Jean-Christophe Helary
  2 siblings, 1 reply; 145+ messages in thread
From: Stefan Monnier @ 2019-03-26 16:22 UTC
  To: emacs-devel

> Did you mean decoding the translated string that ngettext returns?  If
> so, we will need some way of getting at the encoding of the strings in
> the catalog, I think.  Or will we mandate that Emacs catalogs need
> always to be in UTF-8 encoding?

Mandating `utf-8-emacs` would make things simpler.


        Stefan




* Re: Emacs i18n
  2019-03-26 16:22                                       ` Stefan Monnier
@ 2019-03-26 16:55                                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-26 16:55 UTC
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Tue, 26 Mar 2019 12:22:52 -0400
> 
> > Did you mean decoding the translated string that ngettext returns?  If
> > so, we will need some way of getting at the encoding of the strings in
> > the catalog, I think.  Or will we mandate that Emacs catalogs need
> > always to be in UTF-8 encoding?
> 
> Mandating `utf-8-emacs` would make things simpler.

If the translators won't mind, sure.

Or maybe we will recode into UTF-8 before importing those catalogs
that aren't.

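Recoding a non-UTF-8 catalog before import could be done with POSIX iconv. A minimal C sketch, assuming the catalog strings are available as NUL-terminated byte strings and that the source charset is known (a .po file declares it in its Content-Type header); recode_to_utf8 is a hypothetical helper, not an Emacs or gettext function.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Recode the NUL-terminated string IN from FROM_CHARSET into UTF-8,
   writing at most OUT_SIZE bytes (including the terminating NUL) to OUT.
   Returns 0 on success, -1 if the conversion is unsupported, the input
   is invalid, or OUT is too small.  */
int
recode_to_utf8 (const char *from_charset, const char *in,
                char *out, size_t out_size)
{
  if (out_size == 0)
    return -1;

  iconv_t cd = iconv_open ("UTF-8", from_charset);
  if (cd == (iconv_t) -1)
    return -1;                     /* conversion not supported */

  char *inbuf = (char *) in;       /* POSIX iconv takes char **, not const */
  size_t inleft = strlen (in);
  char *outbuf = out;
  size_t outleft = out_size - 1;   /* keep room for the terminating NUL */

  size_t r = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
  /* Flush any pending shift state; UTF-8 is stateless, so this emits
     nothing here, but it is the correct general pattern.  */
  if (r != (size_t) -1)
    r = iconv (cd, NULL, NULL, &outbuf, &outleft);
  iconv_close (cd);

  if (r == (size_t) -1)
    return -1;                     /* invalid input or OUT too small */
  *outbuf = '\0';
  return 0;
}
```

At the file level, GNU gettext's msgconv (for example msgconv --to-code=UTF-8) performs the same conversion on whole .po files, which may be the simpler route if catalogs are normalized before they reach Emacs.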


* Re: Emacs i18n
  2019-03-26 16:11                                     ` Eli Zaretskii
  2019-03-26 16:22                                       ` Stefan Monnier
@ 2019-03-26 22:35                                       ` Paul Eggert
  2019-03-27  3:43                                         ` Eli Zaretskii
  2019-03-27  2:34                                       ` Jean-Christophe Helary
  2 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-26 22:35 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, brandelune, juri

On 3/26/19 9:11 AM, Eli Zaretskii wrote:
>> 2. While you're thinking about (1) here are some other questions. How
>> will ngettext determine the message catalog? Is the catalog visible to
>> users as a global variable, or as a hidden part of the global state, or
>> is it something explicit? How will catalogs from multiple packages be
>> used? How would a multi-lingual application work in Emacs if the message
>> catalog is part of global state? This seems to be a crucial issue, I'd
>> say. For example, should Emacs export dcngettext to Lisp code, instead
>> of just plain ngettext? (Emacs could then define ngettext in terms of
>> dcngettext.)
> Do we have any reasons not to follow the CLISP example of factoring
> these issues?

That's the first I've heard that CLISP does gettext. I looked into it,
and it's a reasonably simple binding, which means that the language is
part of the global state (Emacs would not easily be multilingual) and
that each package can have its own catalog and can specify that catalog
as a trailing argument to gettext (presumably the default catalog would
be for Emacs core). This should be good enough, though it will be a bit
of a hassle for non-core code to keep track of the catalog.


> will we mandate that Emacs catalogs need
> always to be in UTF-8 encoding?

Yes, that makes sense.




* Re: Emacs i18n
  2019-03-26 22:35                                       ` Paul Eggert
@ 2019-03-27  3:43                                         ` Eli Zaretskii
  2019-03-28 14:56                                           ` Clément Pit-Claudel
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-27  3:43 UTC
  To: Paul Eggert; +Cc: emacs-devel, brandelune, juri

> Cc: juri@linkov.net, brandelune@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 26 Mar 2019 15:35:22 -0700
> 
> > Do we have any reasons not to follow the CLISP example of factoring
> > these issues?
> 
> That's the first I've heard that CLISP does gettext.

I learned that from a post by Bruno up-thread.

> I looked into it, and it's a reasonably simple binding, which means
> that the language is part of the global state (Emacs would not
> easily be multilingual)

We could offer the language as another optional argument.  I'm not
sure we need to allow control of the CATEGORY (for choosing the LC_*
category), so we could replace that with the language.  Or we could
keep CATEGORY for compatibility and just add LANGUAGE.

> and that each package can have its own catalog and can specify that
> catalog as a trailing argument to gettext (presumably the default
> catalog would be for Emacs core). This should be good enough, though
> it will be a bit of a hassle for non-core code to keep track of the
> catalog.

If we want some automatic way of changing the domain when a function
from a package is called, we need to develop the infrastructure for
that.  But that could wait for later, I think.

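To make the CLISP-style binding concrete, here is a toy C sketch in which the lookup takes an explicit domain (one per package, with "emacs" standing for the core) plus the optional language argument Eli suggests, so the language is not hidden global state. Everything here is hypothetical: the catalog layout, the function name lookup_plural, and the hard-coded French entries. Real gettext catalogs are .mo files that carry a per-language Plural-Forms expression rather than the hard-coded rule used here.

```c
#include <stddef.h>
#include <string.h>

/* One translated message in a toy in-memory catalog (hypothetical
   layout).  */
struct catalog_entry
{
  const char *domain;     /* "emacs" for core, or a package name */
  const char *language;   /* e.g. "fr" */
  const char *msgid;      /* untranslated singular form, used as the key */
  const char *forms[2];   /* translated singular and plural forms */
};

static const struct catalog_entry catalog[] = {
  { "emacs", "fr", "One package will be installed",
    { "%d paquet sera installé", "%d paquets seront installés" } },
  { "foo", "fr", "One match found",
    { "%d correspondance trouvée", "%d correspondances trouvées" } },
};

/* Pick the form index: French uses the singular for 0 and 1
   (Plural-Forms: plural=(n > 1)); everything else uses the English
   rule here.  */
static int
plural_index (const char *language, long n)
{
  if (strcmp (language, "fr") == 0)
    return n > 1;
  return n != 1;
}

/* dcngettext-like lookup with explicit DOMAIN and LANGUAGE.  Falls back
   to MSGID or MSGID_PLURAL (English rule) when no entry matches.  */
const char *
lookup_plural (const char *domain, const char *language,
               const char *msgid, const char *msgid_plural, long n)
{
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    {
      const struct catalog_entry *e = &catalog[i];
      if (strcmp (e->domain, domain) == 0
          && strcmp (e->language, language) == 0
          && strcmp (e->msgid, msgid) == 0)
        return e->forms[plural_index (language, n)];
    }
  return n == 1 ? msgid : msgid_plural;
}
```

With these entries, looking up "One package will be installed" in the "emacs" domain for "fr" with N = 0 returns the French singular, while the same msgid in the "foo" domain, or in a language with no entry, falls back to the English forms. Passing the domain on every call is the bookkeeping Paul calls a hassle for non-core code.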


* Re: Emacs i18n
  2019-03-27  3:43                                         ` Eli Zaretskii
@ 2019-03-28 14:56                                           ` Clément Pit-Claudel
  2019-03-28 15:52                                             ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2019-03-28 14:56 UTC
  To: emacs-devel

On 2019-03-26 23:43, Eli Zaretskii wrote:
> If we want some automatic way of changing the domain when a function
> from a package is called, we need to develop the infrastructure for
> that.  But that could wait for later, I think.

I expect I'd define a foo-ngettext macro in each `foo' package expanding to ngettext with the appropriate group argument.  If there are multiple functions (gettext, ngettext, etc), maybe a single macro defining all foo-* variants at once would be nice.

Clément.

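Clément's per-package wrapper corresponds to a common C idiom (compare the usual #define _(String) gettext (String)). A minimal sketch, assuming a hypothetical domain-aware lookup domain_ngettext that stands in for dngettext; the package name "foo" and its single French string are made up for illustration.

```c
#include <string.h>

/* Hypothetical domain-aware lookup standing in for dngettext.  A real
   implementation would consult the catalog bound to DOMAIN; this stub
   knows one French string, only to show which domain was used.  */
const char *
domain_ngettext (const char *domain, const char *msgid,
                 const char *msgid_plural, long n)
{
  if (strcmp (domain, "foo") == 0 && strcmp (msgid, "One match found") == 0)
    return n > 1 ? "%ld correspondances trouvées"
                 : "%ld correspondance trouvée";
  return n == 1 ? msgid : msgid_plural;   /* untranslated fallback */
}

/* Per-package wrapper in the spirit of a foo-ngettext macro: the
   package's domain is fixed once, so call sites never repeat it.  */
#define FOO_DOMAIN "foo"
#define FOO_NGETTEXT(msgid, msgid_plural, n) \
  domain_ngettext (FOO_DOMAIN, (msgid), (msgid_plural), (n))
```

Call sites in package foo then write FOO_NGETTEXT ("One match found", "%ld matches found", n). Like the Lisp macro, this only covers strings the package formats itself; it does nothing for messages produced by core functions the package calls.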


* Re: Emacs i18n
  2019-03-28 14:56                                           ` Clément Pit-Claudel
@ 2019-03-28 15:52                                             ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-28 15:52 UTC
  To: Clément Pit-Claudel; +Cc: emacs-devel

> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 28 Mar 2019 10:56:44 -0400
> 
> On 2019-03-26 23:43, Eli Zaretskii wrote:
> > If we want some automatic way of changing the domain when a function
> > from a package is called, we need to develop the infrastructure for
> > that.  But that could wait for later, I think.
> 
> I expect I'd define a foo-ngettext macro in each `foo' package expanding to ngettext with the appropriate group argument.  If there are multiple functions (gettext, ngettext, etc), maybe a single macro defining all foo-* variants at once would be nice.

I really hope we could come up with something more elegant.  And
besides, your suggestion doesn't handle calls from Lisp packages to
core APIs, including primitives and modules.



* Re: Emacs i18n
  2019-03-26 16:11                                     ` Eli Zaretskii
  2019-03-26 16:22                                       ` Stefan Monnier
  2019-03-26 22:35                                       ` Paul Eggert
@ 2019-03-27  2:34                                       ` Jean-Christophe Helary
  2 siblings, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-27  2:34 UTC
  To: Emacs developers



> On Mar 27, 2019, at 1:11, Eli Zaretskii <eliz@gnu.org> wrote:

> Or will we mandate that Emacs catalogs need
> always to be in UTF-8 encoding?

Please.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune






* Re: Emacs i18n
  2019-03-25 22:31                                   ` Paul Eggert
  2019-03-26 16:11                                     ` Eli Zaretskii
@ 2019-03-26 23:16                                     ` Juri Linkov
  2019-03-27  1:35                                       ` Paul Eggert
  2019-04-24  6:39                                       ` Jean-Christophe Helary
  1 sibling, 2 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-26 23:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jean-Christophe Helary, Emacs developers

>> I don't know where to put i18n related code, so since ngettext should
>> have C calls anyway, I moved it to editfns.c near the function ‘message’
>> where it still just returns the correct plurals without doing any translation.
>
> That stub had some problems:
>
> 1. It lacked documentation in the Elisp manual. Important changes like
> this should be documented -- to some extent the documentation is even
> more important than the code. Can you write something?

I'll start writing documentation.  Is it allowed to make
references from the Elisp manual to the Gettext Info manual?
I see in (info "(gettext) elisp-format") a reference back to
the Elisp manual is a web link, not an Info reference.

> 2. While you're thinking about (1) here are some other questions. How
> will ngettext determine the message catalog? Is the catalog visible to
> users as a global variable, or as a hidden part of the global state, or
> is it something explicit? How will catalogs from multiple packages be
> used? How would a multi-lingual application work in Emacs if the message
> catalog is part of global state? This seems to be a crucial issue, I'd
> say. For example, should Emacs export dcngettext to Lisp code, instead
> of just plain ngettext? (Emacs could then define ngettext in terms of
> dcngettext.)

It seems most of these needs could be covered by adding two optional
arguments DOMAIN and CATEGORY to ngettext (where the default domain
"emacs" will be hard-coded).

As a convenience, so that a package need not add its domain to every
ngettext call, maybe when something like 'defdomain' is declared at the
beginning of the package, its value should set the default domain
within the package scope.
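A rough sketch of the optional-argument idea, where every name is hypothetical (my-ngettext, my-i18n-domain, my-with-i18n-domain and the hash-table catalog). It models only the English two-form plural rule, accepts but ignores CATEGORY, and binds the default domain dynamically:

```elisp
;;; -*- lexical-binding: t -*-

(defvar my-i18n-domain "emacs"
  "Hypothetical default domain for `my-ngettext' (dynamically bound).")

(defvar my-i18n-catalog (make-hash-table :test 'equal)
  "Hypothetical catalog: key (DOMAIN . MSGID), value (SINGULAR PLURAL).")

(defun my-ngettext (msgid msgid-plural n &optional domain _category)
  "Translate MSGID/MSGID-PLURAL for N in DOMAIN.
DOMAIN defaults to `my-i18n-domain'; CATEGORY is accepted but ignored.
Only the English rule (N = 1 versus everything else) is modelled."
  (let* ((entry (gethash (cons (or domain my-i18n-domain) msgid)
                         my-i18n-catalog))
         ;; Fall back to the untranslated strings when nothing is found.
         (pair (or entry (list msgid msgid-plural))))
    (if (= n 1) (nth 0 pair) (nth 1 pair))))

(defmacro my-with-i18n-domain (domain &rest body)
  "Evaluate BODY with DOMAIN as the default translation domain."
  (declare (indent 1))
  `(let ((my-i18n-domain ,domain))
     ,@body))
```

Because the binding is dynamic, it also affects functions from other packages that BODY happens to call, which is one reason a static, per-file mechanism may be preferable.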




* Re: Emacs i18n
  2019-03-26 23:16                                     ` Juri Linkov
@ 2019-03-27  1:35                                       ` Paul Eggert
  2019-04-24  6:39                                       ` Jean-Christophe Helary
  1 sibling, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-27  1:35 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Jean-Christophe Helary, Emacs developers

On 3/26/19 4:16 PM, Juri Linkov wrote:

> I'll start writing documentation. Is it allowed to make references
> from the Elisp manual to the Gettext Info manual? I see in (info
> "(gettext) elisp-format") a reference back to the Elisp manual is a
> web link, not an Info reference.
>
Thanks for taking this on. Yes, you can do cross-references; e.g.,
files.texi has this:

@xref{File permissions,,, coreutils, The @sc{gnu} @code{Coreutils}
Manual}


> It seems most of these needs could be covered by adding two optional
> arguments DOMAIN and CATEGORY to ngettext (where the default domain
> "emacs" will be hard-coded).
>
This appears to be what CLISP does; see:

https://sourceforge.net/p/clisp/clisp/ci/default/tree/modules/i18n/i18n.lisp

https://clisp.sourceforge.io/impnotes.html#i18n-mod

> As a convenience not to require a package to add its domain to every
> ngettext call, maybe when something like 'defdomain' is declared at
> the beginning of the package, its value should affect the domain
> within the package scope.
>
Would this be done statically or dynamically? Preferably the former but
I don't exactly see how it would work, and even dynamically the details
are not obvious to me.

For example, would you have to do something like the following?

  (defun mymodule--ngettext (sing-msgid pl-msgid n)
    (ngettext sing-msgid pl-msgid n "mymodule"))

  (defun report-items (n)
    (message (mymodule--ngettext "%d item" "%d items" n) n))

  (defun report-keystrokes (n)
    (message (mymodule--ngettext "%d keystroke received."
                                 "%d keystrokes received." n)
             n))

Something like this would work, but it looks pretty annoying....





* Re: Emacs i18n
  2019-03-26 23:16                                     ` Juri Linkov
  2019-03-27  1:35                                       ` Paul Eggert
@ 2019-04-24  6:39                                       ` Jean-Christophe Helary
  2019-04-24 20:18                                         ` Juri Linkov
  1 sibling, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-04-24  6:39 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Paul Eggert, Emacs developers

[-- Attachment #1: Type: text/plain, Size: 2166 bytes --]

So, where do we go from here now ?

Juri, have you written documentation ? Do you want help ?

Jean-Christophe

> On Mar 27, 2019, at 8:16, Juri Linkov <juri@linkov.net> wrote:
> 
>>> I don't know where to put i18n related code, so since ngettext should
>>> have C calls anyway, I moved it to editfns.c near the function ‘message’
>>> where it still just returns the correct plurals without doing any translation.
>> 
>> That stub had some problems:
>> 
>> 1. It lacked documentation in the Elisp manual. Important changes like
>> this should be documented -- to some extent the documentation is even
>> more important than the code. Can you write something?
> 
> I'll start writing documentation.  Is it allowed to make
> references from the Elisp manual to the Gettext Info manual?
> I see in (info "(gettext) elisp-format") a reference back to
> the Elisp manual is a web link, not an Info reference.
> 
>> 2. While you're thinking about (1) here are some other questions. How
>> will ngettext determine the message catalog? Is the catalog visible to
>> users as a global variable, or as a hidden part of the global state, or
>> is it something explicit? How will catalogs from multiple packages be
>> used? How would a multi-lingual application work in Emacs if the message
>> catalog is part of global state? This seems to be a crucial issue, I'd
>> say. For example, should Emacs export dcngettext to Lisp code, instead
>> of just plain ngettext? (Emacs could then define ngettext in terms of
>> dcngettext.)
> 
> It seems most of these needs could be covered by adding two optional
> arguments DOMAIN and CATEGORY to ngettext (where the default domain
> "emacs" will be hard-coded).
> 
> As a convenience not to require a package to add its domain to every
> ngettext call, maybe when something like 'defdomain' is declared at the
> beginning of the package, its value should affect the domain within
> the package scope.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune



[-- Attachment #2: Type: text/html, Size: 4267 bytes --]


* Re: Emacs i18n
  2019-04-24  6:39                                       ` Jean-Christophe Helary
@ 2019-04-24 20:18                                         ` Juri Linkov
  0 siblings, 0 replies; 145+ messages in thread
From: Juri Linkov @ 2019-04-24 20:18 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: Paul Eggert, Emacs developers

> So, where do we go from here now ?
>
> Juri, have you written documentation ?

It's still WIP.  First I looked at how i18n is implemented in XEmacs,
and discovered that, while the interface is documented, it's
not fully functional.  What is worse, it's quite ugly.
So I turned to a nicer interface in CLISP that could be
used as a basis for a gettext interface in Emacs Lisp.

> Do you want help ?

Help is needed to install the standard gettext infrastructure
using gettextize.  Help is expected from someone who has more
experience in applying gettext to other projects.

Once the default gettext infrastructure is installed,
I could help in adapting gettext to Emacs.

Meanwhile, currently I'm replacing dired-plural-s with ngettext
in bug#35287.  It's not without problems: one problematic place
is in dired-do-kill-lines:

  (defun dired-do-kill-lines (&optional arg fmt)
    ...
    (let ((count 0))
      (setq count (1+ count))
      (or (equal "" fmt)
          (message (or fmt "Killed %d line%s.") count (dired-plural-s count)))
      count)

  (defun dired-omit-expunge (&optional regexp)
    ...
    (setq count (dired-do-kill-lines
                 nil
                 (if dired-omit-verbose "Omitted %d line%s." "")))

The format string can't simply be replaced in dired-do-kill-lines with
something like

  (ngettext "Killed %d line." "Killed %d lines." count)

because dired-do-kill-lines can also be called with a format string
from dired-omit-expunge, and dired-omit-expunge has no access to the
variable 'count'.
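One conceivable way out, sketched as a hypothetical helper rather than the actual dired code: let FMT also be a function that receives COUNT and returns the format string, so a caller such as dired-omit-expunge can supply its own ngettext call without needing access to 'count'. The names my-dired-report-killed and my-omit-report are invented, and ngettext requires Emacs 27 or later:

```elisp
;;; -*- lexical-binding: t -*-

(defun my-dired-report-killed (count &optional fmt)
  "Report COUNT killed lines and return the message, or nil if silent.
FMT is nil (use the default text), \"\" (report nothing), a format
string, or a function of COUNT that returns a format string."
  (unless (equal fmt "")
    (message (cond ((functionp fmt) (funcall fmt count))
                   (fmt)                ; a plain format string
                   (t (ngettext "Killed %d line." "Killed %d lines." count)))
             count)))

;; A caller like `dired-omit-expunge' hands over the plural choice:
(defun my-omit-report (count)
  (my-dired-report-killed
   count
   (lambda (n) (ngettext "Omitted %d line." "Omitted %d lines." n))))
```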

There are more such marginal cases, but eventually they all
have to be resolved somehow.


* Re: Emacs i18n
  2019-03-24 21:55                             ` Juri Linkov
  2019-03-24 23:31                               ` Jean-Christophe Helary
@ 2019-03-25  3:35                               ` Eli Zaretskii
  2019-03-25  9:04                                 ` Jean-Christophe Helary
  2019-03-25 21:02                                 ` Juri Linkov
  2019-03-25 10:52                               ` Mattias Engdegård
  2 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-25  3:35 UTC (permalink / raw)
  To: Juri Linkov; +Cc: cpitclaudel, emacs-devel, brandelune, rms

> From: Juri Linkov <juri@linkov.net>
> Cc: cpitclaudel@gmail.com,  brandelune@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Sun, 24 Mar 2019 23:55:57 +0200
> 
> > But since we don't yet have ngettext, we cannot yet use this
> > paradigm.  I thought by "from now on" you literally meant from now;
> > did I misunderstand?
> 
> Yes, literally.  After the patch from
> http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
> is pushed to master, ngettext is available to use for pluralization.

That's changing history.




* Re: Emacs i18n
  2019-03-25  3:35                               ` Eli Zaretskii
@ 2019-03-25  9:04                                 ` Jean-Christophe Helary
  2019-03-25 21:02                                 ` Juri Linkov
  1 sibling, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-25  9:04 UTC (permalink / raw)
  To: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]



> On Mar 25, 2019, at 12:35, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Juri Linkov <juri@linkov.net>
>> Cc: cpitclaudel@gmail.com,  brandelune@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
>> Date: Sun, 24 Mar 2019 23:55:57 +0200
>> 
>>> But since we don't yet have ngettext, we cannot yet use this
>>> paradigm.  I thought by "from now on" you literally meant from now;
>>> did I misunderstand?
>> 
>> Yes, literally.  After the patch from
>> http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
>> is pushed to master, ngettext is available to use for pluralization.
> 
> That's changing history.

How do we practically use that ?

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune



[-- Attachment #2: Type: text/html, Size: 2835 bytes --]


* Re: Emacs i18n
  2019-03-25  3:35                               ` Eli Zaretskii
  2019-03-25  9:04                                 ` Jean-Christophe Helary
@ 2019-03-25 21:02                                 ` Juri Linkov
  2019-03-26  3:27                                   ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-25 21:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel, brandelune, rms

>> > But since we don't yet have ngettext, we cannot yet use this
>> > paradigm.  I thought by "from now on" you literally meant from now;
>> > did I misunderstand?
>>
>> Yes, literally.  After the patch from
>> http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
>> is pushed to master, ngettext is available to use for pluralization.
>
> That's changing history.

When you asked to read past discussions, I did it, and among all opinions
the most encouraging were the wise words of François Pinard:

  "Yet, when it is affordable to do so, and to spare the overall effort,
   it is often a good thing to aim in directions which have less chance to
   lead into dead ends, from which we might later have to backtrack from.
   Yet, dead ends are not always technical, so sometimes, not always, dead
   ends might be more fruitful than no road at all.  It is surely a fine art,
   being able to choose the best roads, considering all issues.

   My opinion is, when we are lacking of volunteer time, that the best road is
   the one having the least steps in it, and this often means that backtracking
   should be avoided.  Best is trying to do things the right way, even if not
   everything gets done at once.  So steps accumulate constructively over time."

I hope that starting with a small step of adding ngettext to provide
the correct plurals for English words would lead in the right direction
while avoiding the danger of backtracking from dead ends.


* Re: Emacs i18n
  2019-03-25 21:02                                 ` Juri Linkov
@ 2019-03-26  3:27                                   ` Eli Zaretskii
  2019-03-27 23:06                                     ` Richard Stallman
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-26  3:27 UTC (permalink / raw)
  To: Juri Linkov; +Cc: cpitclaudel, emacs-devel, brandelune, rms

> From: Juri Linkov <juri@linkov.net>
> Cc: cpitclaudel@gmail.com,  brandelune@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Mon, 25 Mar 2019 23:02:59 +0200
> 
> I hope that starting with a small step of adding ngettext to provide
> the correct plurals for English words would lead in the right direction
> while avoiding the danger of backtracking from dead ends.

That's not what I meant.  I asked a question, and you replied as if
the commit done yesterday already existed before I asked the question.




* Re: Emacs i18n
  2019-03-26  3:27                                   ` Eli Zaretskii
@ 2019-03-27 23:06                                     ` Richard Stallman
  0 siblings, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-27 23:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel, brandelune, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > I hope that starting with a small step of adding ngettext to provide
  > > the correct plurals for English words would lead in the right direction
  > > while avoiding the danger of backtracking from dead ends.

  > That's not what I meant.  I asked a question, and you replied as if
  > the commit done yesterday already existed before I asked the question.

It sounds like a minor miscommunication.  A short private conversation
might enable you to set it straight.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Emacs i18n
  2019-03-24 21:55                             ` Juri Linkov
  2019-03-24 23:31                               ` Jean-Christophe Helary
  2019-03-25  3:35                               ` Eli Zaretskii
@ 2019-03-25 10:52                               ` Mattias Engdegård
  2019-03-25 15:37                                 ` Eli Zaretskii
  2019-03-25 21:11                                 ` Juri Linkov
  2 siblings, 2 replies; 145+ messages in thread
From: Mattias Engdegård @ 2019-03-25 10:52 UTC (permalink / raw)
  To: Juri Linkov; +Cc: brandelune, Eli Zaretskii, cpitclaudel, rms, emacs-devel

24 mars 2019 kl. 22.55 skrev Juri Linkov <juri@linkov.net>:
> 
> http://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00586.html
> is pushed to master, ngettext is available to use for pluralization.

That patch exposes some Emacs-specific translation problems:

-                 (cons (format "finished with %d matches found\n" grep-num-matches-found)
+                 (cons (format (ngettext "finished with %d match found\n"
+                                         "finished with %d matches found\n"
+                                         grep-num-matches-found)
+                               grep-num-matches-found)

This is fine -- typical i18n code (except that the subject of the sentence is missing, which should go into a comment to translators).

      ;; remove match from grep-regexp-alist before fontifying
      ("^Grep[/a-zA-Z]* started.*"
       (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t))
-     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?matches found\\)\\|\\(no matches found\\)\\).*"
+     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)\\|\\(no matches found\\)\\).*"

Since it is not uncommon in Emacs to pattern-match on generated text, either the translator needs to understand regexps well or the code must be restructured to avoid that kind of matching, perhaps by using text properties. Besides, translating regexp strings precludes the use of modern regexp notations like rx, since gettext is string-oriented.
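One way to restructure along those lines, sketched with a hypothetical property name (my-grep-matches) and hypothetical function names: tag the generated status line with a text property that records the count, and look for the property instead of the words, so translation cannot break the lookup. It needs ngettext (Emacs 27 or later):

```elisp
;;; -*- lexical-binding: t -*-

(defun my-grep-status-line (n)
  "Return the status line for N matches, tagged with a text property.
The hypothetical `my-grep-matches' property records N, so later code
can locate the line without matching its (possibly translated) words."
  (propertize (format (ngettext "finished with %d match found\n"
                                "finished with %d matches found\n"
                                n)
                      n)
              'my-grep-matches n))

(defun my-grep-find-matches ()
  "Return the match count recorded in the current buffer, or nil."
  (let ((pos (if (get-text-property (point-min) 'my-grep-matches)
                 (point-min)
               ;; nil when the property never appears in the buffer.
               (next-single-property-change (point-min) 'my-grep-matches))))
    (and pos (get-text-property pos 'my-grep-matches))))
```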

Of course the patch was just a proof-of-concept and not intended as actual code. Please forgive me for using it to make a point.

This is also not an argument against using gettext. Quite the contrary; it's the obvious way to go if i18n is to be undertaken at all.


* Re: Emacs i18n
  2019-03-25 10:52                               ` Mattias Engdegård
@ 2019-03-25 15:37                                 ` Eli Zaretskii
  2019-03-25 21:11                                 ` Juri Linkov
  1 sibling, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-25 15:37 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: cpitclaudel, emacs-devel, brandelune, rms, juri

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Mon, 25 Mar 2019 11:52:45 +0100
> Cc: Eli Zaretskii <eliz@gnu.org>, cpitclaudel@gmail.com, emacs-devel@gnu.org,
>         brandelune@gmail.com, rms@gnu.org
> 
> Of course the patch was just a proof-of-concept and not intended as actual code.

It's not proof-of-concept, it's an actual patch that was committed
yesterday night.




* Re: Emacs i18n
  2019-03-25 10:52                               ` Mattias Engdegård
  2019-03-25 15:37                                 ` Eli Zaretskii
@ 2019-03-25 21:11                                 ` Juri Linkov
  2019-03-25 22:05                                   ` Mattias Engdegård
  1 sibling, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-25 21:11 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: brandelune, Eli Zaretskii, cpitclaudel, rms, emacs-devel

> +                 (cons (format (ngettext "finished with %d match found\n"
> +                                         "finished with %d matches found\n"
> +                                         grep-num-matches-found)
> +                               grep-num-matches-found)
>
>       ("^Grep[/a-zA-Z]* started.*"
>        (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t))
> -     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?matches found\\)\\|\\(no matches found\\)\\).*"
> +     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)\\|\\(no matches found\\)\\).*"
>
> Since it is not uncommon in Emacs to pattern-match on generated text,
> either the translator needs to understand regexps well or the code
> must be restructured to avoid that kind of matching, perhaps by using
> text properties. Besides, translating regexp strings precludes the use
> of modern regexp notations like rx, since gettext is string-oriented.

Is it possible to generate a regexp from ngettext arguments?
For example, given the same arguments and calling a hypothetical
function ‘rx-ngettext’:

  (rx-ngettext "finished with %d match found\n"
               "finished with %d matches found\n")

to generate a regexp like:

  "finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)"




* Re: Emacs i18n
  2019-03-25 21:11                                 ` Juri Linkov
@ 2019-03-25 22:05                                   ` Mattias Engdegård
  2019-03-27 21:22                                     ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Mattias Engdegård @ 2019-03-25 22:05 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Eli Zaretskii, emacs-devel, cpitclaudel, brandelune, rms

25 mars 2019 kl. 22.11 skrev Juri Linkov <juri@linkov.net>:
> 
> Is it possible to generate a regexp from ngettext arguments?
> For example, given the same arguments and calling a hypothetical
> function ‘rx-ngettext’:
> 
>  (rx-ngettext "finished with %d match found\n"
>               "finished with %d matches found\n")
> 
> to generate a regexp like:
> 
>  "finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)"

Trivially so by generating an or-pattern: "singular text\\|plural text". Anything better is a matter of optimisation, basically a diff algorithm (or just prefix and suffix merging).
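The trivial or-pattern could be sketched as follows (the name my-ngettext-regexp is hypothetical). It only understands %d; other directives, including %%, are not treated specially, and %s would need a catch-all subexpression:

```elisp
;;; -*- lexical-binding: t -*-

(defun my-ngettext-regexp (singular plural)
  "Return a regexp matching SINGULAR or PLURAL, with %d read as digits.
Only %d is handled; other directives (and %%) are not treated specially."
  (mapconcat (lambda (fmt)
               ;; Quote the literal text first, then turn %d into digits.
               (replace-regexp-in-string
                "%d" "[0-9]+" (regexp-quote fmt) t t))
             (list singular plural)
             "\\|"))
```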

Is it practical, though? For %s, we would need to generate a match-anything subexpression, even though the argument is much more constrained in practice.





* Re: Emacs i18n
  2019-03-25 22:05                                   ` Mattias Engdegård
@ 2019-03-27 21:22                                     ` Juri Linkov
  2019-03-28 11:03                                       ` Mattias Engdegård
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-27 21:22 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Eli Zaretskii, emacs-devel, cpitclaudel, brandelune, rms

>> Is it possible to generate a regexp from ngettext arguments?
>> For example, given the same arguments and calling a hypothetical
>> function ‘rx-ngettext’:
>>
>>  (rx-ngettext "finished with %d match found\n"
>>               "finished with %d matches found\n")
>>
>> to generate a regexp like:
>>
>>  "finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)"
>
> Trivially so by generating an or-pattern: "singular text\\|plural text".
> Anything better is a matter of optimisation, basically a diff algorithm
> (or just prefix and suffix merging).
>
> Is it practical, though? For %s, we would need to generate a match-anything
> subexpression, even though the argument is much more constrained in practice.

I tried ‘regexp-opt’ and it generates a ready-to-use regexp:

  (replace-regexp-in-string
   "%d" "\\\\([0-9]+\\\\)"
   (regexp-opt '("finished with %d match found"
                 "finished with %d matches found"
                 "finished with no matches found")))

  ⇒ "\\(?:finished with \\(?:\\(?:\\([0-9]+\\) match\\(?:es\\)?\\|no matches\\) found\\)\\)"




* Re: Emacs i18n
  2019-03-27 21:22                                     ` Juri Linkov
@ 2019-03-28 11:03                                       ` Mattias Engdegård
  0 siblings, 0 replies; 145+ messages in thread
From: Mattias Engdegård @ 2019-03-28 11:03 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Emacs developers

27 mars 2019 kl. 22.22 skrev Juri Linkov <juri@linkov.net>:
> 
> I tried ‘regexp-opt’ and it generates a ready-to-use regexp:
> 
>  (replace-regexp-in-string
>   "%d" "\\\\([0-9]+\\\\)"
>   (regexp-opt '("finished with %d match found"
>                 "finished with %d matches found"
>                 "finished with no matches found")))
> 
>  ⇒ "\\(?:finished with \\(?:\\(?:\\([0-9]+\\) match\\(?:es\\)?\\|no matches\\) found\\)\\)"

Well now. There is no guarantee that regexp-opt won't split the %d. Format strings must be parsed left-to-right for correctness¹. I'm still skeptical, but if you really want to give this a try, then first segment the format string:

"Today %d little piggies built %03o houses and said '%s'."
"Today %d little piggy built %o house and said '%s'."
=>
("Today " ?d " little piggies built " ?o " houses and said '" ?s "'.")
("Today " ?d " little piggy built " ?o " house and said '" ?s "'.")

leaving the format placeholders as atomic entities (here shown as characters, but you may need more information there).
Then run your fav diff algo on the result. Most important to performance is prefix merging; anything else is just to make the regexp smaller.
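The segmentation step could look roughly like this. It reuses the directive regexp from footnote ¹ and scans strictly left to right, so "%%" is never mistaken for the start of another directive. The names my-format-directive-re and my-segment-format are invented for illustration, and %% is turned into a literal "%" string:

```elisp
;;; -*- lexical-binding: t -*-
(require 'rx)

(defconst my-format-directive-re
  (rx "%"
      (opt (1+ digit) "$")
      (0+ digit)
      (opt "." (0+ digit))
      (any "%sdioxXefgcS"))
  "Regexp matching a single format directive.")

(defun my-segment-format (fmt)
  "Split FMT left to right into literal strings and directive chars.
A directive such as \"%03o\" becomes its conversion char (?o);
\"%%\" becomes the literal string \"%\"."
  (let ((pos 0) (segments '()))
    (while (string-match my-format-directive-re fmt pos)
      (let ((start (match-beginning 0))
            (end (match-end 0)))
        ;; Literal text between the previous directive and this one.
        (when (> start pos)
          (push (substring fmt pos start) segments))
        (let ((conv (aref fmt (1- end))))
          (push (if (eq conv ?%) "%" conv) segments))
        (setq pos end)))
    (when (< pos (length fmt))
      (push (substring fmt pos) segments))
    (nreverse segments)))
```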

Here, prefix and suffix merging would leave you with (still in abstract form)

("Today " ?d " little pigg"
 (("ies built " ?o " houses")
  ("y built " ?o " house"))
 " and said '" ?s "'.")

From there you can either recursively try to find more common subsequences, or call it a day and render it into a regexp:

"Today -?[0-9]+ little pigg\\(?:ies built -?[0-7]+ houses\\|y built -?[0-7]+ house\\) and said '\\(?:.\\|\n\\)*'."

All this will need to be done at run-time, since it is run on translated strings.
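A minimal sketch of the prefix/suffix merge and the rendering, assuming input in the segmented form shown above (literal strings plus directive characters). It does no recursive diffing, the per-directive subpatterns are assumptions (anything other than %d, %i, %o, %x and %X matches arbitrary text), and all names are hypothetical:

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my--explode-segments (segments)
  "Flatten SEGMENTS into tokens: literal chars, or (spec . CONV)."
  (cl-mapcan (lambda (seg)
               (if (stringp seg)
                   (string-to-list seg)
                 (list (cons 'spec seg))))
             segments))

(defun my--render-tokens (tokens)
  "Render the token sequence TOKENS as a regexp string."
  (mapconcat
   (lambda (tok)
     (if (consp tok)
         (pcase (cdr tok)
           ((or ?d ?i) "-?[0-9]+")
           (?o "-?[0-7]+")
           ((or ?x ?X) "-?[0-9a-fA-F]+")
           (_ "\\(?:.\\|\n\\)*"))       ; %s, %S, %c, floats: anything
       (regexp-quote (char-to-string tok))))
   tokens ""))

(defun my-merge-format-regexp (seg-a seg-b)
  "Return a regexp matching segmented format strings SEG-A or SEG-B.
Only the common prefix and suffix are merged; the differing middles
become two alternatives."
  (let* ((a (vconcat (my--explode-segments seg-a)))
         (b (vconcat (my--explode-segments seg-b)))
         (la (length a))
         (lb (length b))
         (pre 0)
         (suf 0))
    ;; Longest common prefix.
    (while (and (< pre la) (< pre lb)
                (equal (aref a pre) (aref b pre)))
      (cl-incf pre))
    ;; Longest common suffix that does not overlap the prefix.
    (while (and (< (+ pre suf) la) (< (+ pre suf) lb)
                (equal (aref a (- la suf 1)) (aref b (- lb suf 1))))
      (cl-incf suf))
    (concat (my--render-tokens (cl-subseq a 0 pre))
            "\\(?:"
            (my--render-tokens (cl-subseq a pre (- la suf)))
            "\\|"
            (my--render-tokens (cl-subseq b pre (- lb suf)))
            "\\)"
            (my--render-tokens (cl-subseq a (- la suf))))))
```

Applied to the two piggy strings above, this yields the same shape as the hand-written regexp, except that the final "." is quoted.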

¹ To match format parameters, try something like
  (rx "%"
      (opt (1+ digit) "$")
      (0+ digit)
      (opt "." (0+ digit))
      (any "%sdioxXefgcS"))


* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-03 15:31             ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Eli Zaretskii
  2019-03-03 20:57               ` Emacs i18n Juri Linkov
@ 2019-03-04  3:27               ` Richard Stallman
  2019-03-04 16:36                 ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-04  3:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > This has come up several times in the past.  The main problem with
  > i18n in Emacs is that, unlike in many text-mode programs, 'message'
  > covers a tiny portion of the Emacs UI.  We have help commands that pop
  > up buffers; we have commands that prompt in the minibuffer; we have
  > menu items and labels on tool-bar buttons; we have help-echo on menus,

That is quite true.  However, I recommend a different approach to
doing the job.  An incremental one.

Let's install the lookup code and make `message' call it -- not using
advice.  Perhaps we should rewrite it in C, since it is short
and we will want to call it from C code.

Let's develop something to load translations from po files.  Let's
develop software to generate and write lists of messages that need
translating.
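As a very small starting point for loading translations from PO files, something like the following could handle the simplest entries. It is a sketch with a hypothetical name (my-po-parse-simple) that ignores multi-line strings, msgid_plural, msgctxt and flags, all of which a real loader (or one reading the compiled .mo files that msgfmt produces) would have to handle:

```elisp
;;; -*- lexical-binding: t -*-

(defun my-po-parse-simple (text)
  "Parse PO-file TEXT into a hash table mapping msgid to msgstr.
Only single-line msgid/msgstr pairs are handled.  The header entry,
whose msgid is empty, and untranslated entries are skipped."
  (let ((table (make-hash-table :test 'equal))
        (msgid nil))
    (dolist (line (split-string text "\n"))
      (cond
       ((string-match "\\`msgid \\(\".*\"\\)\\'" line)
        ;; Common PO escapes (\n, \t, \", \\) match Lisp string
        ;; syntax, so `read' can decode the quoted string.
        (setq msgid (read (match-string 1 line))))
       ((and msgid (string-match "\\`msgstr \\(\".*\"\\)\\'" line))
        (let ((msgstr (read (match-string 1 line))))
          (unless (or (equal msgid "") (equal msgstr ""))
            (puthash msgid msgstr table)))
        (setq msgid nil))))
    table))
```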

Then people can start developing useful sets of translations.

Meanwhile, we can also hook it into other interfaces where
appropriate.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-04  3:27               ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
@ 2019-03-04 16:36                 ` Eli Zaretskii
  2019-03-04 18:37                   ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-04 16:36 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, juri

> From: Richard Stallman <rms@gnu.org>
> Cc: juri@linkov.net, emacs-devel@gnu.org
> Date: Sun, 03 Mar 2019 22:27:36 -0500
> 
> That is quite true.  However, I recommend a different approach to
> doing the job.  An incremental one.
> 
> Let's install the lookup code and make `message' call it -- not using
> advice.  Perhaps we should rewrite it in C, since it is short
> and we will want to call it from C code.
> 
> Let's develop something to load translations from po files.  Let's
> develop software to generate and write lists of messages that need
> translating.
> 
> Then people can start developing useful sets of translations.
> 
> Meanwhile, we can also hook it into other interfaces where
> appropriate.

The incremental approach is a great approach, but it does have its
limitations.  Especially when several non-trivial features will
eventually need to be compatible with each other to be true parts of a
greater whole, which is i18n for Emacs.

For example, it is IMO pointless to be able to display translated
strings from 'message' without also having a convenient automated way
of collecting translatable messages and creating a message catalog
that such a 'message' could use, or without being able to install
such message catalogs for different ELisp packages.

IOW, this feature, like many other large features, cannot be
implemented in increments that are too small.  Each increment should
be large enough to make sense.  And then there's a more complex issue
of how the increments will work together; some thought must be
invested in that up front.


* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-04 16:36                 ` Eli Zaretskii
@ 2019-03-04 18:37                   ` Paul Eggert
  2019-03-04 19:07                     ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-04 18:37 UTC (permalink / raw)
  To: Eli Zaretskii, rms; +Cc: juri, emacs-devel

On 3/4/19 8:36 AM, Eli Zaretskii wrote:
> For example, it is IMO pointless to be able to display translated
> strings from 'message' without also having a convenient automated way
> of collecting translatable messages and creating a message catalog
> that such a 'message' could use

There is longstanding technology to do that for C code. We could apply
that to Emacs, and then at least the builtin C-level messages will be
translated. Later, we could extend this to Elisp.





* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-04 18:37                   ` Paul Eggert
@ 2019-03-04 19:07                     ` Eli Zaretskii
  2019-03-05  2:09                       ` Paul Eggert
  2019-03-05  2:49                       ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-04 19:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: juri, rms, emacs-devel

> Cc: emacs-devel@gnu.org, juri@linkov.net
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 4 Mar 2019 10:37:31 -0800
> 
> On 3/4/19 8:36 AM, Eli Zaretskii wrote:
> > For example, it is IMO pointless to be able to display translated
> > strings from 'message' without also having a convenient automated way
> > of collecting translatable messages and creating a message catalog
> > that such a 'message' could use
> 
> There is longstanding technology to do that for C code. We could apply
> that to Emacs, and then at least the builtin C-level messages will be
> translated. Later, we could extend this to Elisp.

I'm saying that IMO it makes no sense at all to do this only for C.
The infrastructure used for that will most probably not work for Lisp,
let alone allow separate translations for separate packages to be
brought together and used in the same Emacs session.

* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-04 19:07                     ` Eli Zaretskii
@ 2019-03-05  2:09                       ` Paul Eggert
  2019-03-05 21:58                         ` Emacs i18n Juri Linkov
  2019-03-06  2:09                         ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
  2019-03-05  2:49                       ` Richard Stallman
  1 sibling, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-05  2:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, rms, emacs-devel

On 3/4/19 11:07 AM, Eli Zaretskii wrote:
> I'm saying that IMO it makes no sense at all to do this only for C.

Yes, of course it should also work for Elisp. I mentioned C only as a
way to get it started, since the C infrastructure already exists and we
need to do it for the C messages anyway.

> The infrastructure used for that will most probably not work for Lisp,
> let alone allow separate translations for separate packages to be
> brought together and used in the same Emacs session.

I don't see why it wouldn't work for Elisp. The gettext infrastructure
allows multiple message catalogs in the same session. Obviously some
hacking would be involved, since Elisp currently doesn't do any of this;
but it could be built atop the existing infrastructure used by other GNU
applications rather than being rewritten from scratch.

* Re: Emacs i18n
  2019-03-05  2:09                       ` Paul Eggert
@ 2019-03-05 21:58                         ` Juri Linkov
  2019-03-06  2:16                           ` Richard Stallman
                                             ` (2 more replies)
  2019-03-06  2:09                         ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
  1 sibling, 3 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-05 21:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, rms, emacs-devel

>> The infrastructure used for that will most probably not work for Lisp,
>> let alone allow separate translations for separate packages to be
>> brought together and used in the same Emacs session.
>
> I don't see why it wouldn't work for Elisp. The gettext infrastructure
> allows multiple message catalogs in the same session. Obviously some
> hacking would be involved, since Elisp currently doesn't do any of this;
> but it could be built atop the existing infrastructure used by other GNU
> applications rather than being rewritten from scratch.

One of the main decisions that has to be made is whether to wrap all
user-facing translatable strings in all Lisp files explicitly in a call
to a macro/function 'gettext' (alias '_'), as XEmacs' I18N3 does, which
would help to extract translations from the source code, or to use a
low-level implicit translation without changing the existing code, as
is done for handling text-quoting-style in format strings.  The latter
would even allow translation of strings that a package author forgot
to mark with '_'.

Depending on this decision a translation file format has to be selected,
be it flat Gettext PO format files or even some YAML-like hierarchical
Lisp structures with scopes.

* Re: Emacs i18n
  2019-03-05 21:58                         ` Emacs i18n Juri Linkov
@ 2019-03-06  2:16                           ` Richard Stallman
  2019-03-06 18:15                             ` Eli Zaretskii
  2019-03-06 17:30                           ` Eli Zaretskii
  2019-03-06 18:09                           ` Eli Zaretskii
  2 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-06  2:16 UTC (permalink / raw)
  To: Juri Linkov; +Cc: eliz, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > One of the main decisions that has to be made is whether to wrap all
  > user-facing translatable strings in all Lisp files using a macro/function
  > 'gettext' (alias '_') explicitly like is implemented in XEmacs' I18N3
  > that would help to extract translations from the source code, or to use
  > a low-level implicit translation without changing the existing code like
  > is implemented for handling text-quoting-style in format strings.
  > The latter will even allow translation of strings that a package author
  > forgot to mark with '_'.

We could recognize all strings passed as certain arguments to certain
functions as translatable automatically, and have an explicit
way to mark other strings as translatable.  That could reduce
the amount of work for developers to mark them.

The translatability could be recorded as a text property in the
string.  Then, if a function such as 'message' gets a format string
that is not translatable, it could warn, or save up a record that
developers could optionally look at later.  This would help remind
developers to mark the strings that need it.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-06  2:16                           ` Richard Stallman
@ 2019-03-06 18:15                             ` Eli Zaretskii
  2019-03-06 19:47                               ` Paul Eggert
  2019-03-07  3:42                               ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 18:15 UTC (permalink / raw)
  To: rms; +Cc: eggert, emacs-devel, juri

> From: Richard Stallman <rms@gnu.org>
> Cc: eggert@cs.ucla.edu, eliz@gnu.org, emacs-devel@gnu.org
> Date: Tue, 05 Mar 2019 21:16:07 -0500
> 
> We could recognize all strings passed as certain arguments to certain
> functions as translatable automatically, and have an explicit
> way to mark other strings as translatable.  That could reduce
> the amount of work for developers to mark them.

You mean, the function, such as 'message', that receives the string
will translate it?  As opposed to the alternative of translating the
string _before_ it gets passed to the function?

If we do that, how do we deal with strings that are computed by
concatenation or formatting?  They reach functions like 'message' in
one piece, but the catalog will not hold that concatenated string; it
will have the parts separately.  How will the function be able to look
up the translation in such cases?

* Re: Emacs i18n
  2019-03-06 18:15                             ` Eli Zaretskii
@ 2019-03-06 19:47                               ` Paul Eggert
  2019-03-06 20:19                                 ` Eli Zaretskii
  2019-03-08  4:07                                 ` Richard Stallman
  2019-03-07  3:42                               ` Richard Stallman
  1 sibling, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-06 19:47 UTC (permalink / raw)
  To: Eli Zaretskii, rms; +Cc: emacs-devel, juri

On 3/6/19 10:15 AM, Eli Zaretskii wrote:

> how do we deal with strings that are computed by concatenation or
> formatting?
>
The same way that other GNU packages deal with them: we redo calls, to
make the strings easier to translate. For example, instead of this code
(adapted from todo-mode.el):

  (message (concat "The highlighted item" (if (= count 1) " is " "s precedes ")
                   "the timestamp %s.")
           timestamp)

we do something like this:

  (nmessage count
            "The highlighted item is not up to date."
            "The highlighted items are not up to date."
            timestamp)

where (nmessage N FMT1 FMT2 ...) is a new function that mimics GNU
ngettext by returning a translation of FMT1 (using N) if a translation
is available, and if no translation is available it falls back on using
FMT1 if N is 1 and FMT2 otherwise.

It's inevitable that we'd need to redo Lisp code this way, as
translators cannot be expected to be programming experts that understand
arbitrary Lisp code involving 'concat' and whatnot. This is what other
GNU packages have done, and Emacs can do something similar.

Of course it will be a big task to fully internationalize in this way,
and it's not something that can be done all at once. But it doesn't have
to be done all at once: we can create the machinery, do some proper i18n
of a few key Lisp modules, and let the other modules be fixed up later
when people find time.

* Re: Emacs i18n
  2019-03-06 19:47                               ` Paul Eggert
@ 2019-03-06 20:19                                 ` Eli Zaretskii
  2019-03-07  1:52                                   ` Paul Eggert
  2019-03-08  4:07                                 ` Richard Stallman
  1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 20:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, rms, juri

> Cc: juri@linkov.net, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 11:47:23 -0800
> 
> On 3/6/19 10:15 AM, Eli Zaretskii wrote:
> 
> > how do we deal with strings that are computed by concatenation or
> > formatting?
> >
> The same way that other GNU packages deal with them: we redo calls, to
> make the strings easier to translate. For example, instead of this code
> (adapted from todo-mode.el):
> 
>   (message (concat "The highlighted item" (if (= count 1) " is " "s precedes ")
>                    "the timestamp %s.")
>            timestamp)
> 
> we do something like this:
> 
>   (nmessage count
>             "The highlighted item is not up to date."
>             "The highlighted items are not up to date."
>             timestamp)

That's the easy case.  This one is a bit tougher:

  (message "The program says: " (shell-command-to-string "foo"))

> It's inevitable that we'd need to redo Lisp code this way, as
> translators cannot be expected to be programming experts that understand
> arbitrary Lisp code involving 'concat' and whatnot. This is what other
> GNU packages have done, and Emacs can do something similar.

Except that Emacs is so much larger that doing this "like other
packages" might make the job infeasible.

Which is one reason why I think we should start from doc strings: they
are both easier and much more important.

* Re: Emacs i18n
  2019-03-06 20:19                                 ` Eli Zaretskii
@ 2019-03-07  1:52                                   ` Paul Eggert
  2019-03-07  3:37                                     ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-07  1:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, rms, juri

On 3/6/19 12:19 PM, Eli Zaretskii wrote:

>
> That's the easy case. This one is a bit tougher: (message "The program
> says: " (shell-command-to-string "foo"))
>
Assuming you meant this:

  (message (concat "The program says: " (shell-command-to-string "foo")))

then it shouldn't be tough at all. The Elisp code should be rewritten
like this:

   (message "The program says: %s" (shell-command-to-string "foo"))

xgettext will automatically put "The program says: %s" into the pool of
translatable strings. The output of the "foo" command won't be
translated, nor should it be.

Anyway, the Elisp code with "concat" needs to be rewritten regardless of
whether we do i18n, as it can throw an exception if the shell command's
output contains "%".

All this is routine for program internationalization. Emacs is not
special here; we've often had to do this sort of thing for other GNU
packages.

* Re: Emacs i18n
  2019-03-07  1:52                                   ` Paul Eggert
@ 2019-03-07  3:37                                     ` Eli Zaretskii
  2019-03-08  4:07                                       ` Richard Stallman
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07  3:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: juri, rms, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 17:52:05 -0800
> Cc: emacs-devel@gnu.org, rms@gnu.org, juri@linkov.net
> 
> then it shouldn't be tough at all. The Elisp code should be rewritten
> like this:
> 
>    (message "The program says: %s" (shell-command-to-string "foo"))
> 
> xgettext will automatically put "The program says: %s" into the pool of
> translatable strings. The output of the "foo" command won't be
> translated, nor should it be.
> 
> Anyway, the Elisp code with "concat" needs to be rewritten regardless of
> whether we do i18n, as it can throw an exception if the shell command's
> output contains "%".
> 
> All this is routine for program internationalization. Emacs is not
> special here; we've often had to do this sort of thing for other GNU
> packages.

Sure, except that Emacs is so much larger, and gives the programmer a
lot more freedom with treating code and data alike, than a typical C
program.  I just want people to realize how this job is more
complicated in Emacs than in any other program.  E.g., IIUC what you
say, we will need to rewrite also the likes of this:

  (let* ((field (get-char-property pos 'field))
	 (button (get-char-property pos 'button))
	 (doc (get-char-property pos 'widget-doc))
	 (text (cond (field "This is an editable text area.")
		     (button "This is an active area.")
		     (doc "This is documentation text.")
		     (t "This is unidentified text.")))
	 (widget (or field button doc)))
    (when widget
      (widget-browse widget))
    (message text)))

And this is just a random, and not the most complicated, example.

* Re: Emacs i18n
  2019-03-07  3:37                                     ` Eli Zaretskii
@ 2019-03-08  4:07                                       ` Richard Stallman
  2019-03-08  8:16                                         ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > (let* ((field (get-char-property pos 'field))
    > 	 (button (get-char-property pos 'button))
    > 	 (doc (get-char-property pos 'widget-doc))
    > 	 (text (cond (field "This is an editable text area.")
    > 		     (button "This is an active area.")
    > 		     (doc "This is documentation text.")
    > 		     (t "This is unidentified text.")))
    > 	 (widget (or field button doc)))
    >   (when widget
    >     (widget-browse widget))
    >   (message text)))

We would need to make SOME sort of change in it, but the change could be
very simple.  It could consist of writing a call to 'translate' around
each of those string constants.

Or we might adopt a reader syntax for translatable strings.
That might be convenient, since we want these to be found
by tools processing the code, not solely handled by
executing the code.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-08  4:07                                       ` Richard Stallman
@ 2019-03-08  8:16                                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08  8:16 UTC (permalink / raw)
  To: rms; +Cc: juri, eggert, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org, juri@linkov.net
> Date: Thu, 07 Mar 2019 23:07:20 -0500
> 
> Or we might adopt a reader syntax for translatable strings.
> That might be convenient, since we want these to be found
> by tools processing the code, not solely handled by
> executing the code.

I think we will need to come up with such a syntax anyway, because we
will want to leave the Lisp programmers the freedom of writing code
that computes displayable text out of thin air.

It doesn't have to be reader syntax, btw: it could be a special
function.

* Re: Emacs i18n
  2019-03-06 19:47                               ` Paul Eggert
  2019-03-06 20:19                                 ` Eli Zaretskii
@ 2019-03-08  4:07                                 ` Richard Stallman
  2019-03-08  4:33                                   ` Elias Mårtenson
  1 sibling, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > we do something like this:

  >   (nmessage count
  >             "The highlighted item is not up to date."
  >             "The highlighted items are not up to date."
  >             timestamp)

It might be better to define a function like this

(defun numeric-select (count &rest messages)
  (or (nth count messages)
      (car (last messages))))

and then write

 (message (numeric-select count
                          "The highlighted item is not up to date."
                          "The highlighted items are not up to date."))

Translation infrastructure might be able to recognize this construct
and mark the two strings as translatable if they are constants.

Even better, translation could allow replacing that list of messages
with a different list of messages, perhaps longer.   That would
make possible perfect support for a language where you need a different
text for 2 and for numbers larger than 2.

We could decide that the first element is for COUNT = 0,
and if that element is a number instead of a string, it means
to use the element for that number.

  (message (numeric-select count
                           2
                           "The highlighted item is not up to date."
                           "The highlighted items are not up to date."))

This, together with the feature of translating the list
as a different list, could be totally general.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-08  4:07                                 ` Richard Stallman
@ 2019-03-08  4:33                                   ` Elias Mårtenson
  2019-03-08  8:22                                     ` Eli Zaretskii
  2019-03-09  3:11                                     ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Elias Mårtenson @ 2019-03-08  4:33 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Eli Zaretskii, Paul Eggert, emacs-devel, Juri Linkov

On Fri, 8 Mar 2019 at 12:08, Richard Stallman <rms@gnu.org> wrote:

> Even better, translation could allow replacing that list of messages
> with a different list of messages, perhaps longer.   That would
> make possible perfect support for a language where you need a different
> text for 2 and for numbers larger than 2.


Russian, for example, uses three different grammatical cases, which
depend on the last digit of the number, so the system needs to be more
complicated.

Regards,
Elias

* Re: Emacs i18n
  2019-03-08  4:33                                   ` Elias Mårtenson
@ 2019-03-08  8:22                                     ` Eli Zaretskii
  2019-03-09  3:11                                     ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08  8:22 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: eggert, emacs-devel, rms, juri

> From: Elias Mårtenson <lokedhs@gmail.com>
> Date: Fri, 8 Mar 2019 12:33:24 +0800
> Cc: Paul Eggert <eggert@cs.ucla.edu>, Eli Zaretskii <eliz@gnu.org>, Juri Linkov <juri@linkov.net>, 
> 	emacs-devel <emacs-devel@gnu.org>
> 
> Russian, for example, uses three different grammatical cases, which are dependent on the last digit of the
> number, the system needs to be more complicated.

It's more complicated than that (e.g., 21 and 11 produce different
forms in Russian), but gettext already has infrastructure for all
that, AFAIR.

* Re: Emacs i18n
  2019-03-08  4:33                                   ` Elias Mårtenson
  2019-03-08  8:22                                     ` Eli Zaretskii
@ 2019-03-09  3:11                                     ` Richard Stallman
  2019-03-09  7:54                                       ` Paul Eggert
  1 sibling, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-09  3:11 UTC (permalink / raw)
  To: Elias MÃ¥rtenson; +Cc: eliz, eggert, emacs-devel, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Russian, for example, uses three different grammatical cases, which are
  > dependent on the last digit of the number, the system needs to be more
  > complicated.

Here's an idea for a scheme general enough to handle Russian as well.
I propose something like a case or select construct.
First, the elegant Lispy way to represent it:

  (numeric-case NUMBER
      (1 "Just one frob")
      (2 "Two frobs")
      (russian-masc "%d-m frobs")
      (russian-fem "%d-f frobs")
      (russian-neut "%d-n frobs")
      (t "%d frobs"))

Translation would have to replace the entire numeric-case construct
with another (translated) numeric-case construct.  Thus, the source
code would contain one suitable for English:

  (numeric-case NUMBER
      (1 "one frob")
      (t "%d frobs"))

and for Russian we would translate it into this one

  (numeric-case NUMBER
      (russian-masc "%d-m frobs")
      (russian-fem "%d-f frobs")
      (russian-neut "%d-n frobs"))

I think this framework could be extended to handle
whatever other weird grammatical rules we might encounter in other languages
in the future.

While doing it with Lisp syntax is elegant, it would require
generalization of the infrastructure for recording translations to
handle more than strings.   That would be a pain.

Here's a way to represent the conditional construct as a kind of
string.  That way, translation would only need to translate strings
into strings.

We could use | in the string to separate alternatives, and : to end
a condition.  It would look like this:

  (numeric-case NUMBER
    "1:one frob|\
     t:%d frobs")

For Russian, we would translate the source string

  1:one frob|t:%d frobs

into

  russian-masc:%d-m frobs|russian-fem:%d-f frobs|russian-neut:%d-n frobs

The subsequences : and | would be handled by the function numeric-case.
They would not affect the meaning of the string data type as such.
numeric-case would ignore whitespace after |.

With this string convention, we only need to translate strings.

To include a | in an alternative, you could write a double |.
We do not need a way to quote a colon.

Perhaps one could develop a smarter 'russian' alternative
that knows how to change the last letter automatically and handles
all three alternatives.

Maybe we need to define a format-spec for devouring and ignoring one argument.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-09  3:11                                     ` Richard Stallman
@ 2019-03-09  7:54                                       ` Paul Eggert
  2019-03-09 10:30                                         ` Eli Zaretskii
                                                           ` (3 more replies)
  0 siblings, 4 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-09  7:54 UTC (permalink / raw)
  To: rms; +Cc: eliz, emacs-devel, Elias Mårtenson, juri

Richard Stallman wrote:
> Here's an idea for a scheme general enough to handle Russian as well.

That idea's use of "-masc", "-fem", and "-neut" suggests that you misunderstood 
the problem with translating format strings like "%d items" into Russian.

Russian has three plural forms useful for translating a string that formats an
integer N. One form is for when (N%10 == 1 && N%100/10 != 1), one is for when
(2 <= N%10 && N%10 <= 4 && N%100/10 != 1), and one is for everything else. So the
form depends on N, not on whether the translation of the word "items" is
masculine or feminine or whatever. Other languages have other rules, with
varying levels of complexity; for example, Arabic has six different plural forms.

GNU gettext deals with this at the translation level, so that ordinary programs 
can just use a function like ngettext to translate an English-language format 
with two plural forms. Emacs Lisp should do something similar: we shouldn't try 
to reinvent this wheel.

Here's an example, taken from GNU dd. The C source code contains the two English 
forms and looks something like this:

          fprintf (stderr,
                   ngettext ("%"PRIuMAX" byte copied, %s, %s",
                             "%"PRIuMAX" bytes copied, %s, %s",
                             w_bytes),
                   w_bytes, delta_s_buf, bytes_per_second);

And the ru.po file (which Russian translators edit) looks like this:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"
...
#: src/dd.c:822
#, c-format
msgid "%<PRIuMAX> byte copied, %s, %s"
msgid_plural "%<PRIuMAX> bytes copied, %s, %s"
msgstr[0] "%<PRIuMAX> байт скопирован, %s, %s"
msgstr[1] "%<PRIuMAX> байта скопировано, %s, %s"
msgstr[2] "%<PRIuMAX> байт скопировано, %s, %s"

Each of the three Russian plural forms is supported, and the right one is chosen 
by the translation system without the programmer having to know how Russian 
plural forms work. For more about this, please see the GNU gettext manual, such 
as this web page:

https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html

PS. Although the email from Elias said "From: =?UTF-8?Q?Elias_M=C3=A5rtenson?=" 
which displays correctly as "Elias Mårtenson", your reply said "To: Elias 
=?iso-8859-1?Q?M=C3=A5rtenson?=" which displays incorrectly as "Elias 
MÃ¥rtenson". It looks like there's a bug in your email client, or in your 
configuration of it, a bug that munges names of your email correspondents.

* Re: Emacs i18n
  2019-03-09  7:54                                       ` Paul Eggert
@ 2019-03-09 10:30                                         ` Eli Zaretskii
  2019-03-10  3:05                                         ` Richard Stallman
                                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 10:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, lokedhs, rms, juri

> Cc: Elias Mårtenson <lokedhs@gmail.com>, eliz@gnu.org,
>  juri@linkov.net, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 8 Mar 2019 23:54:28 -0800
> 
> So the form depends on N, not on whether the translation of the word
> "items" is masculine or feminine or whatever.

It actually depends on both.

> GNU gettext deals with this at the translation level, so that ordinary programs 
> can just use a function like ngettext to translate an English-language format 
> with two plural forms. Emacs Lisp should do something similar: we shouldn't try 
> to reinvent this wheel.

Right.

* Re: Emacs i18n
  2019-03-09  7:54                                       ` Paul Eggert
  2019-03-09 10:30                                         ` Eli Zaretskii
@ 2019-03-10  3:05                                         ` Richard Stallman
  2019-03-10  6:07                                           ` Paul Eggert
  2019-03-10  8:45                                           ` Yuri Khan
  2019-03-10  3:05                                         ` Richard Stallman
  2019-03-10  3:05                                         ` Richard Stallman
  3 siblings, 2 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-10  3:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Russian has three plural forms useful for translating a string that formats an
  > integer N. One form is for when (N%10 == 1 && N%100/10 != 1), one is for when
  > (2 <= N%10 && N%10 <= 4 && N%100/10 != 1), and one is for everything else. So the
  > form depends on N, not on whether the translation of the word "items" is
  > masculine or feminine or whatever.

That's how I understood it, and that is exactly what my proposal does.

I will try to explain it again.

Each clause inside numeric-case handles certain numbers.
The car of the clause (in Lispy structure) selects numbers
to handle.

'russian-masc' selects numbers that require a masculine ending, in
Russian.  You use it with a string that contains the masculine ending.

'russian-fem' selects numbers that require a feminine ending, in
Russian.  You use it with a string that contains the feminine ending.

'russian-neut' selects numbers that require a neuter ending, in
Russian.  You use it with a string that contains the neuter ending.

If this does not work, why not?

In the example that was sent, I see code that tests for certain kinds
of numbers.  But since I don't know the language that that is written
in, the mathematical conditions are the only part I understand.  I
don't see what it will _do_ in each of those conditions.  I presume it
selects the appropriate suffix for the number, but I don't follow how
it does so.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-10  3:05                                         ` Richard Stallman
@ 2019-03-10  6:07                                           ` Paul Eggert
  2019-03-11  1:20                                             ` Richard Stallman
  2019-03-10  8:45                                           ` Yuri Khan
  1 sibling, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-10  6:07 UTC (permalink / raw)
  To: rms; +Cc: eliz, juri, lokedhs, emacs-devel

Richard Stallman wrote:

> If this does not work, why not?

Thanks for explaining the -masc, -fem, -neut part. I'm afraid, though, that I 
still don't fully understand the proposal. It sounds like it is a redesign of 
what GNU gettext does, but I don't see any advantage over GNU gettext.

> In the example that was sent, I see code that tests for certain kinds
> of numbers.  But since I don't know the language that that is written
> in, the mathematical conditions are the only part I understand.  I
> don't see what it will _do_ in each of those conditions.  I presume it
> selects the appropriate suffix for the number, but I don't follow how
> it does so.

The GNU gettext translation code doesn't know anything about suffixes. All it 
knows is that if n%10==1 && n%100!=11 then it should use msgstr[0], else if 
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) then it should use msgstr[1], else 
it should use msgstr[2]. The translations themselves are string formats that 
already have the proper suffixes, and GNU gettext simply copies those suffixes.
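The selection step described above is small enough to sketch in C, the language the Plural-Forms expression is written in. This is a minimal illustration, not GNU gettext's own code. The names `russian_plural_index`, `copied_fmt` and `format_copied` are hypothetical, `%lu` stands in for `%<PRIuMAX>`, and the trailing `, %s, %s` arguments of the dd.c message are dropped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the same expression as the Russian catalog's
   Plural-Forms header, returning which msgstr[] entry to use for N.  */
static int russian_plural_index(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}

/* The translator's three formats (from the dd.c example, shortened).
   The endings are already written out; the code only picks one.  */
static const char *const copied_fmt[3] = {
    "%lu байт скопирован",
    "%lu байта скопировано",
    "%lu байт скопировано",
};

/* Format the Russian message for N into BUF; returns snprintf's result.  */
static int format_copied(char *buf, size_t size, unsigned long n)
{
    return snprintf(buf, size, copied_fmt[russian_plural_index(n)], n);
}
```

For example, `format_copied(buf, sizeof buf, 24)` produces "24 байта скопировано", while 11 selects the third form and produces "11 байт скопировано".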

This is a simple scheme that does not attempt to solve the problem of generating 
idiomatic phrases for numbers (e.g., "twenty-four bytes" in English, "двадцать 
четыре байта" in Russian). All it solves is the problem of generating phrases 
containing numerals (e.g., "24 bytes" in English, "24 байта" in Russian), as 
these are the sorts of phrases that printf formats can generate. In practice, 
this is good enough.

* Re: Emacs i18n
  2019-03-10  6:07                                           ` Paul Eggert
@ 2019-03-11  1:20                                             ` Richard Stallman
  2019-03-11  3:52                                               ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-11  1:20 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, lokedhs, emacs-devel

  > Thanks for explaining the -masc, -fem, -neut part. I'm afraid, though, that I 
  > still don't fully understand the proposal. It sounds like it is a redesign of 
  > what GNU gettext does, but I don't see any advantage over GNU gettext.

The advantage -- which is a big one -- is that the way the translation
is represented is much cleaner.  Compare this

  (numeric-case NUMBER
      (russian-masc "%d байт скопирован, %s, %s")
      (russian-fem "%d байта скопировано, %s, %s")
      (russian-neut "%d байт скопировано, %s, %s"))

(I have filled in strings for the real example you sent.  Since I
don't speak Russian, I was unable to write one myself, and it would
have taken me hours to find one.)

or this:

      "russian-masc:%d байт скопирован, %s, %s|\
       russian-fem:%d байта скопировано, %s, %s|\
       russian-neut:%d байт скопировано, %s, %s"

with this:

    "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 
    && (n%100<10 || n%100>=20) ? 1 : 2);\n"
    ...
    #: src/dd.c:822
    #, c-format
    msgid "%<PRIuMAX> byte copied, %s, %s"
    msgid_plural "%<PRIuMAX> bytes copied, %s, %s"
    msgstr[0] "%<PRIuMAX> байт скопирован, %s, %s"
    msgstr[1] "%<PRIuMAX> байта скопировано, %s, %s"
    msgstr[2] "%<PRIuMAX> байт скопировано, %s, %s"

If the selector symbol can modify the string too,
I can envision something like this:

      "russian-nom:%d байт%| скопирован%|, %s, %s"

where the 'russian-nom' operator would replace the two %| sequences
with the appropriate declensional suffixes for the nominative case.
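One way to picture the proposed `%|` expansion is a small C sketch. Everything here is hypothetical, since no such operator exists in Emacs or gettext. `expand_suffixes` replaces the i-th `%|` with the suffix listed for that slot and plural category. The suffix table is read off the three strings in the `numeric-case` example above, with categories 0/1/2 numbered as in the gettext rule.

```c
#include <string.h>

/* Hypothetical suffix table for "%d байт%| скопирован%|":
   one row per %| slot, one column per plural category (0, 1, 2).  */
static const char *const bytes_copied_slots[][3] = {
    { "", "а", ""  },   /* байт / байта / байт */
    { "", "о", "о" },   /* скопирован / скопировано / скопировано */
};

/* Copy TMPL into OUT, replacing the i-th "%|" with SLOTS[i][CAT].
   Other characters, including "%d", are copied unchanged.
   Returns 0 on success, or -1 if OUT is too small or TMPL has more
   "%|" markers than NSLOTS.  */
static int expand_suffixes(const char *tmpl, const char *const slots[][3],
                           size_t nslots, int cat, char *out, size_t size)
{
    size_t used = 0, slot = 0;
    const char *p = tmpl;

    if (size == 0)
        return -1;
    while (*p) {
        const char *piece;
        size_t len;

        if (p[0] == '%' && p[1] == '|') {
            if (slot >= nslots)
                return -1;
            piece = slots[slot++][cat];
            len = strlen(piece);
            p += 2;
        } else {
            piece = p;
            len = 1;
            p++;
        }
        if (used + len >= size)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

In this sketch the suffixes come from a table written for this one message; the result is an ordinary format string that can then be passed to `snprintf`.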

Building that sort of thing into gettext would be bad architecture.
Gettext is too low level, and used in too many places.

Making Emacs handle 'russian-nom' in a string it pulls out of gettext
would be no problem at all.

  > This is a simple scheme that does not attempt to solve the problem of generating 
  > idiomatic phrases for numbers (e.g., "twenty-four bytes" in English,

I agree we don't need to do this.  But, with the mechanism I've just
proposed, it would be easy to do, so I suppose we would implement it.

* Re: Emacs i18n
  2019-03-11  1:20                                             ` Richard Stallman
@ 2019-03-11  3:52                                               ` Paul Eggert
  2019-03-12  3:31                                                 ` Richard Stallman
  2019-03-12  3:31                                                 ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-11  3:52 UTC (permalink / raw)
  To: rms; +Cc: eliz, juri, lokedhs, emacs-devel

Richard Stallman wrote:
> Compare this
> 
>    (numeric-case NUMBER
>        (russian-masc "%d байт скопирован, %s, %s")
>        (russian-fem "%d байта скопировано, %s, %s")
>        (russian-neut "%d байт скопировано, %s, %s"))
> 
> with this:
> 
>      "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4
>      && (n%100<10 || n%100>=20) ? 1 : 2);\n"
>      ...
>      #: src/dd.c:822
>      #, c-format
>      msgid "%<PRIuMAX> byte copied, %s, %s"
>      msgid_plural "%<PRIuMAX> bytes copied, %s, %s"
>      msgstr[0] "%<PRIuMAX> байт скопирован, %s, %s"
>      msgstr[1] "%<PRIuMAX> байта скопировано, %s, %s"
>      msgstr[2] "%<PRIuMAX> байт скопировано, %s, %s"

I'm afraid that's not an apples-to-apples comparison. The first form contains 
only the Russian translations, whereas the second form contains much more 
information: the source-code location of the untranslated strings, a copy of the 
untranslated English-language strings, and the general rules for Russian (the 
last is shared among all the Russian translations, not just the translations 
listed here). This extra information is useful for translators, and it has a 
reasonably extensive software suite that already supports it, not to mention 
translators who are already used to it.

> I can envision something like this:
> 
>        "russian-nom:%d байт%| скопирован%|, %s, %s"
> 
> where the 'russian-nom' operator would replace the two %| sequences
> with the appropriate declensional suffixes for the nominative case.

But Russian declension is not that simple. The Russian word for "byte" is 
"байт", but its plural form depends not only on the number (as in the above 
examples) but also on its case: the "байт" and "байта" in the above examples are 
not exhaustive. And some words have irregular declensions: for example, ребёнок 
("child", singular) versus де́ти ("children", plural) for the same noun. And it's not just nouns and 
pronouns that are affected: adjectives also have singular and plural forms. And 
I have by no means exhausted the issues involved here; to get a better feeling 
for the complexity in this area, please see:

https://en.wikipedia.org/wiki/Russian_declension

Although it wouldn't be impossible for Emacs Lisp code to handle all the special 
cases for Russian declension, it would be tricky to implement, or to document it 
in a way that translators would easily understand. And we'd also have to 
implement and document similarly tricky rules for other languages. And we'd have 
to deal with the fact that not every Russian-speaker agrees with how to decline 
words like "байт" that are imported from English. These sorts of issues should 
be delegated to translators, not to likely-fragile code in Emacs Lisp (a 
technology that translators typically do not grok).

In contrast, the gettext way is relatively simple and easily understood, and is 
already common practice.

* Re: Emacs i18n
  2019-03-11  3:52                                               ` Paul Eggert
@ 2019-03-12  3:31                                                 ` Richard Stallman
  2019-03-12  3:31                                                 ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-12  3:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, emacs-devel, lokedhs, juri

  > > I can envision something like this:
  > > 
  > >        "russian-nom:%d байт%| скопирован%|, %s, %s"
  > > 
  > > where the 'russian-nom' operator would replace the two %| sequences
  > > with the appropriate declensional suffixes for the nominative case.

  > But Russian declension is not that simple. The Russian word for "byte" is 
  > "байт", but its plural form depends not only on the number (as in the above 
  > examples) but also in its case:

Yes, of course.  I anticipated that.  That is why I called the
construct 'russian-nom', and specified that it provides "the
appropriate declensional suffixes for the nominative case."

We could define similar constructs for some of the other cases in
Russian, whichever ones translators would want to use.

  > the "байт" and "байта" in the above examples are 
  > not exhaustive.

No problem.  Nobody supposed that they were.  

  > And some words have irregular declensions:

I anticipated that, too.  The low-level forms 'russian-masc' and
friends can handle all such situations.  In them you can specify the
precise inflected forms for the irregular words in the message.

  > And it's not just nouns and 
  > pronouns that are affected: adjectives also have singular and plural forms.

'russian-masc' and friends allow explicit inflection of any part of
speech.

  >  And we'd have 
  > to deal with the fact that not every Russian-speaker agrees with how to decline 
  > words like "байт" that are imported from English.

The translator is always welcome to use the low-level constructs
'russian-masc' and friends, to exercise explicit control over that.

  > I have by no means exhausted the issues involved here; to get a better feeling 
  > for the complexity in this area, please see:

  > https://en.wikipedia.org/wiki/Russian_declension

I don't need to understand all the details of Russian numbers.  I've
designed a method so flexible that it can handle any such
complexities.

  > And we'd also have to 
  > implement and document similarly tricky rules for other languages.

No, we don't.  With my approach, we don't _have to_ implement any of
these specific solutions.  We can implement whichever ones we like.

* Re: Emacs i18n
  2019-03-11  3:52                                               ` Paul Eggert
  2019-03-12  3:31                                                 ` Richard Stallman
@ 2019-03-12  3:31                                                 ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-12  3:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, emacs-devel, lokedhs, juri

  > I'm afraid that's not an apples-to-apples comparison.

It doesn't need to be.

  > The first form contains 
  > only the Russian translations, whereas the second form contains much more 
  > information: the source-code location of the untranslated strings, a copy of the 
  > untranslated English-language strings, and the general rules for Russian (the 
  > last is shared among all the Russian translations, not just the translations 
  > listed here).

I can't draw any conclusions about the translation data you sent.  It
is in a format I have never seen and you have not explained it.
So I don't try to understand it.

* Re: Emacs i18n
  2019-03-10  3:05                                         ` Richard Stallman
  2019-03-10  6:07                                           ` Paul Eggert
@ 2019-03-10  8:45                                           ` Yuri Khan
  1 sibling, 0 replies; 145+ messages in thread
From: Yuri Khan @ 2019-03-10  8:45 UTC (permalink / raw)
  To: rms
  Cc: Eli Zaretskii, Paul Eggert, Elias Mårtenson,
	Emacs developers, Juri Linkov

On Sun, Mar 10, 2019 at 10:06 AM Richard Stallman <rms@gnu.org> wrote:

> 'russian-neut' selects numbers that require a neuter ending, in
> Russian.  You use it with a string that contains the neuter ending.
>
> If this does not work, why not?

You are conflating three grammatical categories: the number, the
gender, and the declension type.

Gender and declension type are attributes of the noun, and are fixed
with respect to the noun. So if your message is about bytes, your
translator knows to use noun endings according to declension type 1a
and verb endings for masculine gender; there is nothing left for the
machine to guess. (Gender of the noun also affects the form of the
numeral if it is spelled out, but for computer-generated messages we
usually do not do that and just use digits.)

Number depends on the numeral’s value and affects the forms of the
noun, and any adjectives and verbs attached to it. Singular number
applies to values that end in 1, except for values that end in 11.
Dual number applies to values that end in 2..4, again, except for
values that end in 12..14. Plural number applies to everything else.
Any grammatical number can apply to any noun, so the translator will
provide all three wordings and let the machine select one using the
above logic.

Your example would work if you changed -masc, -fem and -neut to -sing,
-dual and -pl. But that is, as Paul mentioned, reinventing
ngettext(3).
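Yuri's suggestion, clauses named after grammatical number rather than gender, can be sketched in C as follows. The names `ru_number_of`, `struct ru_forms` and `ru_select` are hypothetical. The classification follows the rules stated above (endings in 1 except 11; endings in 2..4 except 12..14; everything else), which select the same forms as the gettext expression quoted earlier in the thread.

```c
/* Grammatical number of a Russian noun after the numeral N.  */
enum ru_number { RU_SING, RU_DUAL, RU_PL };

static enum ru_number ru_number_of(unsigned long n)
{
    unsigned long last = n % 10, last2 = n % 100;

    if (last == 1 && last2 != 11)
        return RU_SING;                      /* 1, 21, 101, ... */
    if (last >= 2 && last <= 4 && !(last2 >= 12 && last2 <= 14))
        return RU_DUAL;                      /* 2-4, 22-24, ... */
    return RU_PL;                            /* 0, 5-20, 25-30, ... */
}

/* The three wordings a translator writes for one message, indexed by
   enum ru_number; the program picks one at run time.  */
struct ru_forms {
    const char *fmt[3];
};

static const char *ru_select(const struct ru_forms *forms, unsigned long n)
{
    return forms->fmt[ru_number_of(n)];
}
```

Apart from the names, this is the same index computation that a catalog's Plural-Forms expression performs, which is the overlap with ngettext that Yuri points out.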

* Re: Emacs i18n
  2019-03-09  7:54                                       ` Paul Eggert
  2019-03-09 10:30                                         ` Eli Zaretskii
  2019-03-10  3:05                                         ` Richard Stallman
@ 2019-03-10  3:05                                         ` Richard Stallman
  2019-03-10  6:14                                           ` Paul Eggert
  2019-03-10  3:05                                         ` Richard Stallman
  3 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-10  3:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, lokedhs, emacs-devel

  > Russian has three plural forms useful for translating a string that formats an 
  > integer N. One form is for when (N%10 == 1 && N%100/10 != 1), one is for when (2 
  > <= N%10 && N%10 <= 4 && N%100/10 != 1), and one is for everything else. So the 
  > form depends on N, not on whether the translation of the word "items" is 
  > masculine or feminine or whatever.

I know that.  That is the problem I addressed.

Each clause inside numeric-case tests for and handles certain numbers.
The first thing in the clause is a condition that tests the number.
If the condition is a number, it matches only that number.

'russian-masc' tests for numbers that require a masculine noun ending;
you use it with a string that contains the masculine ending.

'russian-fem' tests for numbers that require a feminine noun ending;
you use it with a string that contains the feminine ending.

'russian-neut' tests for numbers that require a neuter noun ending;
you use it with a string that contains the neuter ending.

Since I do not speak Russian, I wrote dummies for those endings:
-m, -f and -n.

Thus,

  (numeric-case NUMBER
      (russian-masc "%d-m frobs")
      (russian-fem "%d-f frobs")
      (russian-neut "%d-n frobs"))

Do you follow, now?

  > "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 
  > && (n%100<10 || n%100>=20) ? 1 : 2);\n"
  > ...
  > #: src/dd.c:822
  > #, c-format
  > msgid "%<PRIuMAX> byte copied, %s, %s"
  > msgid_plural "%<PRIuMAX> bytes copied, %s, %s"

It would be better if we can define these criteria just once, rather
than restate them in many places.  My idea is to incorporate it
into the definition of the conditionals, russian-masc etc.

* Re: Emacs i18n
  2019-03-10  3:05                                         ` Richard Stallman
@ 2019-03-10  6:14                                           ` Paul Eggert
  0 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-10  6:14 UTC (permalink / raw)
  To: rms; +Cc: eliz, juri, lokedhs, emacs-devel

Richard Stallman wrote:
>    > "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4
>    > && (n%100<10 || n%100>=20) ? 1 : 2);\n"
> 
> It would be better if we can define these criteria just once, rather
> than restate them in many places.

The criteria are stated just once per translation catalog. For example, the 
"Plural-Forms:" line appears just once in the Russian translation catalog for 
coreutils. The criteria need not be repeated for each translated message.

If Emacs ends up having dozens or hundreds of message catalogs, it may be worth 
looking into maintaining just one copy of the Russian criteria, rather than once 
per Russian translation catalog. I hope we don't go that route, though.
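To illustrate what "stated just once per translation catalog" means, here is a C sketch in which a catalog holds a single plural rule shared by all of its entries. It is a simplified model, not the .mo file format: real catalogs store the Plural-Forms expression as text in the header, and gettext evaluates it at run time. `struct catalog` and `catalog_ngettext` are hypothetical names. The fallback when no entry matches mirrors ngettext's documented behavior: the untranslated singular for n == 1, otherwise the untranslated plural.

```c
#include <stddef.h>
#include <string.h>

/* One translated message: the English msgid and its plural forms.  */
struct entry {
    const char *msgid;
    const char *msgstr[3];
};

/* A catalog states its plural rule once; every entry shares it.  */
struct catalog {
    int (*plural)(unsigned long n);
    const struct entry *entries;
    size_t nentries;
};

static int ru_plural(unsigned long n)
{
    return (n % 10 == 1 && n % 100 != 11) ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}

static const struct entry ru_entries[] = {
    { "%lu byte copied",
      { "%lu байт скопирован", "%lu байта скопировано", "%lu байт скопировано" } },
};

static const struct catalog ru_catalog = {
    ru_plural, ru_entries, sizeof ru_entries / sizeof ru_entries[0]
};

/* Look up MSGID in CAT and pick the form for N; without a match, fall
   back to the untranslated singular or plural.  */
static const char *catalog_ngettext(const struct catalog *cat, const char *msgid,
                                    const char *msgid_plural, unsigned long n)
{
    for (size_t i = 0; i < cat->nentries; i++)
        if (strcmp(cat->entries[i].msgid, msgid) == 0)
            return cat->entries[i].msgstr[cat->plural(n)];
    return n == 1 ? msgid : msgid_plural;
}
```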

* Re: Emacs i18n
  2019-03-09  7:54                                       ` Paul Eggert
                                                           ` (2 preceding siblings ...)
  2019-03-10  3:05                                         ` Richard Stallman
@ 2019-03-10  3:05                                         ` Richard Stallman
  3 siblings, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-10  3:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, lokedhs, emacs-devel

  > which displays correctly as "Elias Mårtenson", your reply said "To: Elias 
  > =?iso-8859-1?Q?M=C3=A5rtenson?=" which displays incorrectly as "Elias 
  > MÃ¥rtenson". It looks like there's a bug in your email client, or in your 
  > configuration of it, a bug that munges names of your email correspondents.

Indeed, it is a bug.  Maybe someday I can fix it.

* Re: Emacs i18n
  2019-03-06 18:15                             ` Eli Zaretskii
  2019-03-06 19:47                               ` Paul Eggert
@ 2019-03-07  3:42                               ` Richard Stallman
  2019-03-07 14:46                                 ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-07  3:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, eggert, emacs-devel

  > You mean, the function, such as 'message', that receives the string
  > will translate it?  As opposed to the alternative of translating the
  > string _before_ it gets passed to the function?

Yes, of course.

  > If we do that, how do we deal with strings that are computed by
  > concatenation or formatting?

Feed them in through %s or something like that.

I'm proposing the convention that the first argument to 'message' gets
by default translated, and other arguments don't.  With this
convention, whichever result you want, it is clear how to get it.

We already do things basically this way, because if you want to
compute a string to be the message, you don't want % to be treated
specially in it.  So you use "%s" as the first argument and pass that
string as the second.
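A C analogue of the proposed convention, in which only the format argument is translated, might look like this. `translate_format`, `message_tr` and the one-entry table, including its sample Russian translation, are hypothetical. The sketch shows only that computed text passed through `%s` is inserted verbatim: it is never looked up, and a `%` inside it is harmless.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catalog: English format -> translated format.  */
static const char *const tr_table[][2] = {
    { "Deleted %d matching lines", "Удалено строк: %d" },
};

/* Return the translation of FMT, or FMT itself if there is none.  */
static const char *translate_format(const char *fmt)
{
    for (size_t i = 0; i < sizeof tr_table / sizeof tr_table[0]; i++)
        if (strcmp(tr_table[i][0], fmt) == 0)
            return tr_table[i][1];
    return fmt;
}

/* Like 'message' under the proposed convention: only FMT is translated;
   the remaining arguments are formatted as given.  */
static int message_tr(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int result;

    va_start(ap, fmt);
    result = vsnprintf(buf, size, translate_format(fmt), ap);
    va_end(ap);
    return result;
}
```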

So I think this will require only occasional changes
and they won't be urgent.

  >   They get in one piece to functions like
  > 'message', but the catalog will not hold that concatenated string, it
  > will have the parts separately.

That would happen if the catalog is made ONLY by scanning the source.
That's why I suggested a feature to record whatever nontrivial format
strings are passed to 'message' and are not in the catalog.

Then you can add them to the catalog, or fix things some other way.
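The recording feature could be as simple as remembering each distinct format string that misses the catalog. Here is a minimal C sketch that assumes a fixed-size log and treats a bare "%s" as trivial. Both choices are assumptions, and `lookup_format` and `record_miss` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MISSES 64

/* Hypothetical catalog with a single entry.  */
static const char *const known[][2] = {
    { "Deleted %d matching lines", "Удалено строк: %d" },
};

/* Distinct format strings seen without a translation.  Only the pointer
   is stored, so FMT must stay valid (string literals do).  */
static const char *misses[MAX_MISSES];
static size_t nmisses;

static void record_miss(const char *fmt)
{
    for (size_t i = 0; i < nmisses; i++)
        if (strcmp(misses[i], fmt) == 0)
            return;                 /* already recorded */
    if (nmisses < MAX_MISSES)
        misses[nmisses++] = fmt;    /* once full, further misses are dropped */
}

/* Translate FMT if known; otherwise record it (unless trivial) and
   return it unchanged.  */
static const char *lookup_format(const char *fmt)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(known[i][0], fmt) == 0)
            return known[i][1];
    if (strcmp(fmt, "%s") != 0)     /* assumption: "%s" alone is trivial */
        record_miss(fmt);
    return fmt;
}
```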

* Re: Emacs i18n
  2019-03-07  3:42                               ` Richard Stallman
@ 2019-03-07 14:46                                 ` Eli Zaretskii
  2019-03-07 17:19                                   ` Paul Eggert
  2019-03-08  4:11                                   ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07 14:46 UTC (permalink / raw)
  To: rms; +Cc: juri, eggert, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org, juri@linkov.net
> Date: Wed, 06 Mar 2019 22:42:06 -0500
> 
>   > If we do that, how do we deal with strings that are computed by
>   > concatenation or formatting?
> 
> Feed them in through %s or something like that.

But then the strings that are formatted via %s will not be translated,
they will remain in English.

> I'm proposing the convention that the first argument to 'message' gets
> by default translated, and other arguments don't.  With this
> convention, whichever result you want, it is clear how to get it.
> 
> We already do things basically this way, because if you want to
> compute a string to be the message, you don't want % to be treated
> specially in it.  So you use "%s" as the first argument and pas that
> string as the second.

For the point I'm trying to make, it is immaterial whether the first
argument is "%s" and the second argument is computed from several
sources, or the first argument is that computed string.  The problems
that follow are the same.

>   >   They get in one piece to functions like
>   > 'message', but the catalog will not hold that concatenated string, it
>   > will have the parts separately.
> 
> That would happen if the catalog is made ONLY by scanning the source.
> That's why I suggested a feature to record whatever nontrivial format
> strings are passed to 'message' and are not in the catalog.

Such a feature will only help when a given call to 'message' produces a
small number of fixed text strings.  If the text it produces includes
some non-deterministic ingredient, this method will not help.



* Re: Emacs i18n
  2019-03-07 14:46                                 ` Eli Zaretskii
@ 2019-03-07 17:19                                   ` Paul Eggert
  2019-03-07 18:24                                     ` martin rudalics
  2019-03-07 20:22                                     ` Eli Zaretskii
  2019-03-08  4:11                                   ` Richard Stallman
  1 sibling, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-07 17:19 UTC (permalink / raw)
  To: Eli Zaretskii, rms; +Cc: juri, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3615 bytes --]

On 3/7/19 6:46 AM, Eli Zaretskii wrote:

>> From: Richard Stallman <rms@gnu.org>
>> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org, juri@linkov.net
>> Date: Wed, 06 Mar 2019 22:42:06 -0500
>>
>>   > If we do that, how do we deal with strings that are computed by
>>   > concatenation or formatting?
>>
>> Feed them in through %s or something like that.
>>
> But then the strings that are formatted via %s will not be translated,
> they will remain in English.
>
Yes, but the scenario you describe should not occur in a properly
internationalized GNU application. We obviously can't assume that
Emacs's translation subroutine acts like Google Translate and can
translate any English-language string to the user's language. All we can
assume is that the translation subroutine converts one of a fixed set of
English-language strings to a string appropriate for the user's language.

This limitation will cause problems with Elisp code that does extensive
parsing or processing of English syntax (doctor.el, say), and that sort
of Elisp code will remain English-only (unless someone takes the time to
i18nize them specially). However most Elisp code does not parse English
or generate idiomatic English on the fly: instead, it uses a fixed,
stilted style that can routinely be converted to calls like (message
FORMAT ARG1 ARG2 ...) where FORMAT is translated and the ARG values are not.

To get a quick feel for this issue, I did a simple grep for the string
'(message (concat' in the Emacs source code. I found 41 instances of
this string. Of these, 8 were erroneous because the result of the
concatenation could contain an unwanted "%" or '`' that could cause
'message' to go awry, and I fixed them by installing the attached patch
(by the way, it's routine for i18n efforts to find trivial bugs like
this). The other 33 instances could easily be reworded to do proper i18n
when 'message' translates just its first argument and only simple,
xgettext-style static analysis is used to find the message strings. For
example, this code in calc-do-embedded:

  (message (concat
            "Embedded Calc mode enabled; "
            (if calc-embedded-quiet
                "Type `C-x * x'"
              "Give this command again")
            " to return to normal"))

can easily be rewritten to this:

   (message
    (if calc-embedded-quiet
        "Embedded Calc mode enabled; Type `C-x * x' to return to normal"
      "Embedded Calc mode enabled; Give this command again to return to normal"))

which is easier for translators to grok, and is arguably clearer even if
we don't want to translate at all. Obviously my '(message (concat' grep
exercises only a small part of the problem, but the results of this
little sample are encouraging.
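The same two fixes carry over to C's printf family. The sketch below uses hypothetical function names. It chooses among complete sentences instead of gluing fragments into a format, and it passes computed text through "%s" rather than appending it to the format, as the calc-store.el hunk in the attached patch does.

```c
#include <stdio.h>
#include <string.h>

/* Each complete sentence can be its own catalog entry; concatenated
   fragments would reach translators only as pieces.  */
static const char *embedded_enabled_msg(int quiet)
{
    return quiet
        ? "Embedded Calc mode enabled; Type `C-x * x' to return to normal"
        : "Embedded Calc mode enabled; Give this command again to return to normal";
}

/* Computed text goes in as arguments, never as part of the format,
   so a "%" in SCONST, VAR or SUFFIX is printed literally.  */
static int copied_msg(char *buf, size_t size, const char *sconst,
                      const char *var, const char *suffix)
{
    return snprintf(buf, size, "Special constant \"%s\" copied to \"%s\"%s",
                    sconst, var, suffix);
}
```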

So this problem is solvable. Sure, it'll require substantial work, but
the work is routine and this sort of thing has been done for other packages.

The main argument against doing all this is that it's too much work
overall and nobody will have the time to do it all, so let's not even
bother. I have some sympathy for this argument, as i18n is clearly too
much work for any single contributor and the work will distract us from
other things. On the other hand, there's no pressing need to do all the
work quickly, it's a low-level task that can be farmed out to non-expert
volunteers that could conceivably grow the volunteer population, and if
we never even start the work then it will never get done and Emacs will
remain unfriendly to users who don't grok English.


[-- Attachment #2: 0001-Be-safer-about-in-message-formats.patch --]
[-- Type: text/x-patch; name="0001-Be-safer-about-in-message-formats.patch", Size: 7751 bytes --]

From f15d0d0247ffe7bc3bbd5fbe10271c93b2e2fb1c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 7 Mar 2019 09:02:15 -0800
Subject: [PATCH] Be safer about "%" in message formats
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/calc/calc-store.el (calc-copy-special-constant):
* lisp/net/rcirc.el (rcirc-handler-PART, rcirc-handler-KICK):
* lisp/org/org-agenda.el (org-agenda):
* lisp/org/org-clock.el (org-clock-out, org-clock-display):
* lisp/org/org.el (org-refile):
* lisp/progmodes/ada-xref.el (ada-goto-declaration):
* lisp/progmodes/idlwave.el (idlwave-scan-library-catalogs):
Don’t trust arbitrary strings to not contain "%" or "`" in
(message (concat STRING1 STRING2 ...)).
---
 lisp/calc/calc-store.el    |  4 ++--
 lisp/net/rcirc.el          |  4 ++--
 lisp/org/org-agenda.el     | 13 ++++++-------
 lisp/org/org-clock.el      | 22 ++++++++++++----------
 lisp/org/org.el            |  3 ++-
 lisp/progmodes/ada-xref.el |  3 +--
 lisp/progmodes/idlwave.el  |  7 +++----
 7 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/lisp/calc/calc-store.el b/lisp/calc/calc-store.el
index 589a776c41..3987c129c2 100644
--- a/lisp/calc/calc-store.el
+++ b/lisp/calc/calc-store.el
@@ -405,8 +405,8 @@ calc-copy-special-constant
                                     sconst))))
        (if var
            (let ((msg (calc-store-value var value "")))
-             (message (concat "Special constant \"%s\" copied to \"%s\"" msg)
-                      sconst (calc-var-name var)))))))))
+	     (message "Special constant \"%s\" copied to \"%s\"%s"
+		      sconst (calc-var-name var) msg))))))))
 
 (defun calc-copy-variable (&optional var1 var2)
   (interactive)
diff --git a/lisp/net/rcirc.el b/lisp/net/rcirc.el
index b1a6c1ce8d..9d53cd4436 100644
--- a/lisp/net/rcirc.el
+++ b/lisp/net/rcirc.el
@@ -2685,7 +2685,7 @@ rcirc-handler-PART-or-KICK
 (defun rcirc-handler-PART (process sender args _text)
   (let* ((channel (car args))
 	 (reason (cadr args))
-	 (message (concat channel " " reason)))
+	 (message (format "%s %s" channel reason)))
     (rcirc-print process sender "PART" channel message)
     ;; print in private chat buffer if it exists
     (when (rcirc-get-buffer (rcirc-buffer-process) sender)
@@ -2697,7 +2697,7 @@ rcirc-handler-KICK
   (let* ((channel (car args))
 	 (nick (cadr args))
 	 (reason (nth 2 args))
-	 (message (concat nick " " channel " " reason)))
+	 (message (format "%s %s %s" nick channel reason)))
     (rcirc-print process sender "KICK" channel message t)
     ;; print in private chat buffer if it exists
     (when (rcirc-get-buffer (rcirc-buffer-process) nick)
diff --git a/lisp/org/org-agenda.el b/lisp/org/org-agenda.el
index e416f5f062..23ee8d71e6 100644
--- a/lisp/org/org-agenda.el
+++ b/lisp/org/org-agenda.el
@@ -2882,13 +2882,12 @@ org-agenda
 	     (let* ((m (org-agenda-get-any-marker))
 		    (note (and m (org-entry-get m "THEFLAGGINGNOTE"))))
 	       (when note
-		 (message (concat
-			   "FLAGGING-NOTE ([?] for more info): "
-			   (org-add-props
-			       (replace-regexp-in-string
-				"\\\\n" "//"
-				(copy-sequence note))
-			       nil 'face 'org-warning)))))))
+		 (message "FLAGGING-NOTE ([?] for more info): %s"
+			  (org-add-props
+			   (replace-regexp-in-string
+			    "\\\\n" "//"
+			    (copy-sequence note))
+			   nil 'face 'org-warning))))))
 	 t t))
        ((equal org-keys "#") (call-interactively 'org-agenda-list-stuck-projects))
        ((equal org-keys "/") (call-interactively 'org-occur-in-agenda-files))
diff --git a/lisp/org/org-clock.el b/lisp/org/org-clock.el
index 34b694d487..62c7cd92d1 100644
--- a/lisp/org/org-clock.el
+++ b/lisp/org/org-clock.el
@@ -1622,9 +1622,10 @@ org-clock-out
 						"\\>"))))
 		  (org-todo org-clock-out-switch-to-state))))))
 	  (force-mode-line-update)
-	  (message (concat "Clock stopped at %s after "
-			   (org-duration-from-minutes (+ (* 60 h) m)) "%s")
-		   te (if remove " => LINE REMOVED" ""))
+	  (message (if remove
+		       "Clock stopped at %s after %s => LINE REMOVED"
+		     "Clock stopped at %s after %s")
+		   te (org-duration-from-minutes (+ (* 60 h) m)))
 	  (run-hooks 'org-clock-out-hook)
 	  (unless (org-clocking-p)
 	    (setq org-clock-current-task nil)))))))
@@ -1925,13 +1926,14 @@ org-clock-display
 		    nil 'local))))
     (let* ((h (/ org-clock-file-total-minutes 60))
 	   (m (- org-clock-file-total-minutes (* 60 h))))
-      (message (concat (format "Total file time%s: "
-			       (cond (todayp " for today")
-				     (customp " (custom)")
-				     (t "")))
-		       (org-duration-from-minutes
-			org-clock-file-total-minutes)
-		       " (%d hours and %d minutes)")
+      (message (cond
+		(todayp
+		 "Total file time for today: %s (%d hours and %d minutes)")
+		(customp
+		 "Total file time (custom): %s (%d hours and %d minutes)")
+		(t
+		 "Total file time: %s (%d hours and %d minutes)"))
+	       (org-duration-from-minutes org-clock-file-total-minutes)
 	       h m))))
 
 (defvar-local org-clock-overlays nil)
diff --git a/lisp/org/org.el b/lisp/org/org.el
index 3a434d12df..e3c78ae90d 100644
--- a/lisp/org/org.el
+++ b/lisp/org/org.el
@@ -11878,7 +11878,8 @@ org-refile
 	    (when (featurep 'org-inlinetask)
 	      (org-inlinetask-remove-END-maybe))
 	    (setq org-markers-to-move nil)
-	    (message (concat actionmsg " to \"%s\" in file %s: done") (car it) file)))))))
+	    (message "%s to \"%s\" in file %s: done" actionmsg
+		     (car it) file)))))))
 
 (defun org-refile-goto-last-stored ()
   "Go to the location where the last refile was stored."
diff --git a/lisp/progmodes/ada-xref.el b/lisp/progmodes/ada-xref.el
index 28c52b0653..c9c923e1d6 100644
--- a/lisp/progmodes/ada-xref.el
+++ b/lisp/progmodes/ada-xref.el
@@ -1133,8 +1133,7 @@ ada-goto-declaration
 	(ada-find-in-ali identlist other-frame)
       ;; File not found: print explicit error message
       (ada-error-file-not-found
-       (message (concat (error-message-string err)
-			(nthcdr 1 err))))
+       (message "%s%s" (error-message-string err) (nthcdr 1 err)))
 
       (error
        (let ((ali-file (ada-get-ali-file-name (ada-file-of identlist))))
diff --git a/lisp/progmodes/idlwave.el b/lisp/progmodes/idlwave.el
index 476d935e8a..25bc788ffc 100644
--- a/lisp/progmodes/idlwave.el
+++ b/lisp/progmodes/idlwave.el
@@ -5588,7 +5588,7 @@ idlwave-scan-library-catalogs
 	     (mapcar 'car idlwave-path-alist)))
 	  (old-libname "")
 	  dir-entry dir catalog all-routines)
-      (if message-base (message message-base))
+      (if message-base (message "%s" message-base))
       (while (setq dir (pop dirs))
 	(catch 'continue
 	  (when (file-readable-p
@@ -5603,8 +5603,7 @@ idlwave-scan-library-catalogs
 		     message-base
 		     (not (string= idlwave-library-catalog-libname
 				   old-libname)))
-		(message "%s" (concat message-base
-				      idlwave-library-catalog-libname))
+		(message "%s%s" message-base idlwave-library-catalog-libname)
 		(setq old-libname idlwave-library-catalog-libname))
 	      (when idlwave-library-catalog-routines
 		(setq all-routines
@@ -5618,7 +5617,7 @@ idlwave-scan-library-catalogs
 		       (setq dir-entry (assoc dir idlwave-path-alist)))
 	      (idlwave-path-alist-add-flag dir-entry 'lib)))))
       (unless no-load (setq idlwave-library-catalog-routines all-routines))
-      (if message-base (message (concat message-base "done"))))))
+      (if message-base (message "%sdone" message-base)))))
 
 ;;----- Communicating with the Shell -------------------
 
-- 
2.20.1

* Re: Emacs i18n
  2019-03-07 17:19                                   ` Paul Eggert
@ 2019-03-07 18:24                                     ` martin rudalics
  2019-03-07 18:44                                       ` Paul Eggert
  2019-03-07 20:22                                     ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: martin rudalics @ 2019-03-07 18:24 UTC (permalink / raw)
  To: Paul Eggert, Eli Zaretskii, rms; +Cc: emacs-devel, juri

These

-	 (message (concat channel " " reason)))
+	 (message "%s %s" channel reason))
      (rcirc-print process sender "PART" channel message)
      ;; print in private chat buffer if it exists
      (when (rcirc-get-buffer (rcirc-buffer-process) sender)
@@ -2697,7 +2697,7 @@ rcirc-handler-KICK
    (let* ((channel (car args))
  	 (nick (cadr args))
  	 (reason (nth 2 args))
-	 (message (concat nick " " channel " " reason)))
+	 (message "%s %s %s" nick channel reason))

give me these warnings:

In toplevel form:
../../lisp/net/rcirc.el:2685:1:Warning: Malformed `let*' binding: (message "%s %s" channel reason)
../../lisp/net/rcirc.el:2696:1:Warning: Malformed `let*' binding: (message "%s %s %s" nick channel reason)

martin

* Re: Emacs i18n
  2019-03-07 18:24                                     ` martin rudalics
@ 2019-03-07 18:44                                       ` Paul Eggert
  0 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-07 18:44 UTC (permalink / raw)
  To: martin rudalics, Eli Zaretskii, rms; +Cc: emacs-devel, juri

On 3/7/19 10:24 AM, martin rudalics wrote:
> ../../lisp/net/rcirc.el:2685:1:Warning: Malformed `let*' binding:
> (message "%s %s" channel reason)

Oops. Thanks for reporting that. I fixed it in master.
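
The warning arises because each `let*' binding has the form (VAR VALUE).
In these rcirc handlers `message' is a local variable holding the text,
not a call to the `message' function, so (message "%s %s" channel reason)
is a binding with too many elements.  A minimal sketch of a well-formed
alternative, assuming `format' is the intended replacement for `concat'
(the helper name `my-rcirc-part-text' is hypothetical, and the fix
actually committed to master may differ):

```elisp
;; Sketch only: `my-rcirc-part-text' is a hypothetical helper, not rcirc code.
;; In a `let*' binding list, (message VALUE) binds a *variable* named
;; `message'; it does not call the `message' function.
(defun my-rcirc-part-text (channel reason)
  "Return CHANNEL and REASON joined by a single space."
  (let* ((message (format "%s %s" channel reason)))
    message))
```

One behavioral difference to keep in mind: `concat' treats a nil REASON
(a PART with no reason) as an empty string, giving "#emacs ", whereas
`format' prints it, giving "#emacs nil".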

* Re: Emacs i18n
  2019-03-07 17:19                                   ` Paul Eggert
  2019-03-07 18:24                                     ` martin rudalics
@ 2019-03-07 20:22                                     ` Eli Zaretskii
  2019-03-07 22:25                                       ` Paul Eggert
  2019-03-08  4:18                                       ` Richard Stallman
  1 sibling, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07 20:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: juri, rms, emacs-devel

> Cc: emacs-devel@gnu.org, juri@linkov.net
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 7 Mar 2019 09:19:35 -0800
> 
> This limitation will cause problems with Elisp code that does extensive
> parsing or processing of English syntax (doctor.el, say), and that sort
> of Elisp code will remain English-only (unless someone takes the time to
> i18nize them specially). However most Elisp code does not parse English
> or generate idiomatic English on the fly: instead, it uses a fixed,
> stilted style that can routinely be converted to calls like (message
> FORMAT ARG1 ARG2 ...) where FORMAT is translated and the ARG values are not.
> 
> To get a quick feel for this issue, I did a simple grep for the string
> '(message (concat' in the Emacs source code. I found 41 instances of
> this string.

But 'message' is just a representative of a class of such functions.
There are others: 'signal', 'error', 'user-error', 'princ', 'format',
and probably some more I'm missing.  So the actual number of
occurrences is larger than the 40 you found.

I guess I'm saying that we should think some more whether we indeed
want to give up marking translatable strings and instead rely on some
functions always translating their argument strings.  Perhaps doing so
will impose restrictions on what a Lisp program can do, and we don't
want to live with such restrictions without some fire escape, in the
form of explicitly translated strings?

In general, I think we should not blindly accept any technique used
for localization, because Emacs is so much different from a typical
console program written in C.

* Re: Emacs i18n
  2019-03-07 20:22                                     ` Eli Zaretskii
@ 2019-03-07 22:25                                       ` Paul Eggert
  2019-03-08  7:29                                         ` Eli Zaretskii
  2019-03-08  4:18                                       ` Richard Stallman
  1 sibling, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-07 22:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, rms, emacs-devel

On 3/7/19 12:22 PM, Eli Zaretskii wrote:
>
> 'message' is just a representative of a class of such functions.
> There are others: 'signal', 'error', 'user-error', 'princ', 'format',
> and probably some more I'm missing.  So the actual number of
> occurrences is larger than the 40 you found.

Yes, of course. And even for 'message', all I searched for was the
string '(message (concat', which is just a fraction of the calls to
'message' that will need to be reworked. That search was not an attempt
to count all the problems we'd run into; it merely was a sample of the
problems. If the sample is representative then each individual problem
should be relatively easy to solve.

> whether we indeed
> want to give up marking translatable strings and instead rely on some
> functions always translating their argument strings.

We could mark each translatable string by hand. But this would make for
more churn to the source code and would be more work. It's hard to see
why that would be a win, compared to the reasonably-common practice of
marking some well-known functions as doing translations automatically.

> Perhaps doing so
> will impose restrictions on what a Lisp program can do, and we don't
> want to live with such restrictions without some fire escape, in the
> form of explicitly translated strings?

One can easily work around any such restrictions by having a variant of
'message' that does not translate its format argument. We're already
doing this for translation of '`', by having two functions 'format' and
'format-message': the former does not translate '`', the latter does. A
similar approach can work for natural-language translation and 'message'.
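
A quick illustration of the existing `format'/`format-message' split
(binding `text-quoting-style' to `straight' only makes the result
independent of whether curved quotes can be displayed):

```elisp
;; `format' copies grave accent and apostrophe unchanged; `format-message'
;; converts them as directed by `text-quoting-style'.
(let ((text-quoting-style 'straight))
  (list (format "Use `%s'" "foo")
        (format-message "Use `%s'" "foo")))
;; => ("Use `foo'" "Use 'foo'")
```

A non-translating variant of `message' would play the role `format'
plays here.  No such function exists today, so its name and interface
would still be up for design.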

> we should not blindly accept any technique used
> for localization, because Emacs is so much different from a typical
> console program written in C.

Of course we should not accept techniques blindly. We should use
techniques with our eyes open, based on experience. That being said,
this discussion suggests that Emacs is not really that much of a special
case aside from its size. If so, there is little need to reinvent the
i18n wheel just for Emacs, and there is a real advantage to reusing
existing GNU technology in this area rather than trying to reinvent it.

* Re: Emacs i18n
  2019-03-07 22:25                                       ` Paul Eggert
@ 2019-03-08  7:29                                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08  7:29 UTC (permalink / raw)
  To: Paul Eggert; +Cc: juri, rms, emacs-devel

> Cc: rms@gnu.org, emacs-devel@gnu.org, juri@linkov.net
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 7 Mar 2019 14:25:30 -0800
> 
> > whether we indeed
> > want to give up marking translatable strings and instead rely on some
> > functions always translating their argument strings.
> 
> We could mark each translatable string by hand.

No, I didn't mean "each", I meant just some, hopefully a small
minority.  Because most of the use cases are probably easy enough to
change so that strings could be collected by a tool, and 'message' and
its ilk could then translate them automatically.  Having an explicit
translation function would then be that "fire escape" for when
converting code not to compute strings would be too painful.

> > Perhaps doing so
> > will impose restrictions on what a Lisp program can do, and we don't
> > want to live with such restrictions without some fire escape, in the
> > form of explicitly translated strings?
> 
> One can easily work around any such restrictions by having a variant of
> 'message' that does not translate its format argument.

We could, but I don't see how that would help.  If a string is not
found in the catalog(s), it will be output untranslated anyway, so why
do we need a separate function?

> this discussion suggests that Emacs is not really that much of a special
> case aside from its size.

I'm not sure I agree.  I think the fact that Emacs is written mostly
in Lisp and not in a procedural compiled language will make another
qualitative difference.

> there is a real advantage to reusing existing GNU technology in this
> area rather than trying to reinvent it.

Where it fits, sure.  In particular, we should strive hard to use the PO
files for catalogs, because that affects the translation teams.

* Re: Emacs i18n
  2019-03-07 20:22                                     ` Eli Zaretskii
  2019-03-07 22:25                                       ` Paul Eggert
@ 2019-03-08  4:18                                       ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > But 'message' is just a representative of a class of such functions.
  > There are others: 'signal', 'error', 'user-error', 'princ', 'format',
  > and probably some more I'm missing.  So the actual number of
  > occurrences is larger than the 40 you found.

Some of them should be handled in the same way as 'message'.  But not
'format' -- it can be used for various things and some should not be
translated.

  > I guess I'm saying that we should think some more whether we indeed
  > want to give up marking translatable strings

Of course we need an explicit way to mark translatable strings -- but
we should also adopt short cuts (like recognizing first arg of
'message') so that a large fraction of these strings don't need to be
explicitly marked.

If we are going to handle translation, this is the obvious best way,
so let's not worry about the precise details.  It will get done.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-07 14:46                                 ` Eli Zaretskii
  2019-03-07 17:19                                   ` Paul Eggert
@ 2019-03-08  4:11                                   ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, emacs-devel, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > That would happen if the catalog is made ONLY by scanning the source.
  > > That's why I suggested a feature to record whatever nontrivial format
  > > strings are passed to 'message' and are not in the catalog.

  > Such a feature will only help when a given call to 'message' produces a
  > small number of fixed text strings.  If the text it produces includes
  > some non-deterministic ingredient, this method will not help.

That is true.  But it is a comparatively small problem,
because those cases are a small minority.

The approach I have in mind is to make several mechanisms,
each designed to handle a large fraction of cases easily,
and leave the exceptions to be handled less easily.
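
The recording feature from the quoted exchange could be prototyped with
`advice-add'.  In this sketch the catalog, the log table and the advice
function are all hypothetical names; the advice logs every format string
that reaches `message' and has no catalog entry:

```elisp
(require 'nadvice)

(defvar my-i18n-catalog (make-hash-table :test #'equal)
  "Hypothetical catalog: English format string -> translation.")

(defvar my-i18n-missing (make-hash-table :test #'equal)
  "Format strings seen by `message' that have no catalog entry.")

(defun my-i18n-record-missing (format-string &rest _args)
  "Remember FORMAT-STRING if it is a string not found in `my-i18n-catalog'."
  (when (and (stringp format-string)
             (not (gethash format-string my-i18n-catalog)))
    (puthash format-string t my-i18n-missing)))

;; :before advice runs first and does not change what `message' displays.
(advice-add 'message :before #'my-i18n-record-missing)
```

After (message "Deleted %d matching lines" 3), that format string is a
key in `my-i18n-missing'.  A call like (message (concat "Clock stopped at "
te)) logs a new key for every distinct TE, which is exactly the
limitation Eli describes.  Remove the advice with
(advice-remove 'message #'my-i18n-record-missing).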

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-05 21:58                         ` Emacs i18n Juri Linkov
  2019-03-06  2:16                           ` Richard Stallman
@ 2019-03-06 17:30                           ` Eli Zaretskii
  2019-03-06 18:09                           ` Eli Zaretskii
  2 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 17:30 UTC (permalink / raw)
  To: Juri Linkov; +Cc: eggert, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Tue, 05 Mar 2019 23:58:25 +0200
> 
> One of the main decisions that has to be made is whether to wrap all
> user-facing translatable strings in all Lisp files using a macro/function
> 'gettext' (alias '_') explicitly like is implemented in XEmacs' I18N3
> that would help to extract translations from the source code, or to use
> a low-level implicit translation without changing the existing code like
> is implemented for handling text-quoting-style in format strings.
> The latter will even allow translation of strings that a package author
> forgot to mark with '_'.

I'd encourage people who want or consider working on this to read past
discussions about related topics.  Some very important conclusions and
ideas came out of those discussions, and it would be a pity if we'd
need to reiterate all of what was already said and argued time and
again, instead of starting from where those past discussions ended.

Significant discussions of this happened in Dec 2001, in July 2007,
and lately in Apr 2017.  Some of those are quite long, but please do
read them, even if you were part of those discussions.  This current
discussion will be much more fruitful if we first recollect what we
already talked over.

> Depending on this decision a translation file format has to be selected,
> be it flat Gettext PO format files or even some YAML-like hierarchical
> Lisp structures with scopes.

The first alternative we should consider is to use the PO format,
because that's what translation teams out there are used to working with.
If it turns out that we cannot use the PO format for some good reasons
(which will have to be very good), we can consider other formats, but
translation teams will be in general very unhappy about that.

And I think these technicalities are not the first, let alone main,
decisions we must make.  They are important, but there are more
important and complex problems we need to address first.  I will talk
about this separately.

Thanks.

* Re: Emacs i18n
  2019-03-05 21:58                         ` Emacs i18n Juri Linkov
  2019-03-06  2:16                           ` Richard Stallman
  2019-03-06 17:30                           ` Eli Zaretskii
@ 2019-03-06 18:09                           ` Eli Zaretskii
  2019-03-06 19:39                             ` Paul Eggert
                                               ` (2 more replies)
  2 siblings, 3 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 18:09 UTC (permalink / raw)
  To: Juri Linkov; +Cc: eggert, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Tue, 05 Mar 2019 23:58:25 +0200
> 
> One of the main decisions that has to be made is whether to wrap all
> user-facing translatable strings in all Lisp files using a macro/function
> 'gettext'

First, AFAIR the conclusion back when this was discussed was that we
might not need to mark the translatable strings, because almost all of
them should be translatable.  If anything, we might consider marking
strings that do NOT need to be translated, as they are a very small
minority.  Just look at the strings in a typical Emacs source file and
try to find strings that you wouldn't want translated.  Unlike some
other programs, Emacs almost never says something that is not meant to
be read and understood by the user.

Second, I don't understand why we are still talking about 'message'.
Most of the user interaction in Emacs that will benefit the most from
translation is not messages we show in the echo area: Emacs actually
doesn't chatter there too much.  Most of the stuff that IMO is much
more important to have translated are the doc strings.  It's no
coincidence that Emacs has around 5000 calls to 'message', but almost
50000 doc strings, 10 times more than echo-area messages.  So even if
we do decide to attack the 'message' part first, we should consider
the doc strings as well, so that whatever infrastructure we develop
for messages will work for doc strings as well.  And that adds more
issues that the basic design must solve or be capable of solving.

Then there are some seemingly minor technical issues, but I think
Emacs will force us to deal with them up front, because Emacs is so
much different from a typical localized text-mode program.  Some of
the issues that came up in the past:

 . Do we use a separate message catalog for each Lisp package, or a
   single catalog for all of Emacs?  Each alternative has its merits
   and demerits.  For example, if we go with separate catalogs, then
   how do we make the correct bindtextdomain call, given that packages
   call each other?  If we go for a single catalog, how do we support
   installing and loading a new package without exiting Emacs?

 . How to specify which target language to use?  The locale is not
   necessarily correct, e.g., when editing with Tramp.  Also, since
   translating all of Emacs is such a humongous job, it's quite
   possible that some languages will have little or no translations,
   and the respective users might want to use translations for a
   "fallback" language, which they prefer to English.

 . Many user-facing text messages include portions that we generate
   directly from symbol names, which are of course in English.  We
   should have some idea for how to deal with that.
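
The fallback-language idea in the second item can be sketched as a
lookup that tries each preferred language in turn and finally returns
the original English string unchanged.  Everything here
(`my-i18n-catalogs', `my-i18n-languages', `my-i18n-translate') is
hypothetical; a real implementation would presumably load catalogs built
from PO files rather than a hand-made alist:

```elisp
;; Hypothetical catalogs: an alist of (LANGUAGE . TABLE), where TABLE maps
;; an English string to its translation.
(defvar my-i18n-catalogs
  (let ((de (make-hash-table :test #'equal))
        (fr (make-hash-table :test #'equal)))
    (puthash "Quit" "Beenden" de)
    (puthash "Quit" "Quitter" fr)
    (puthash "Undo" "Annuler" fr)
    (list (cons "de" de) (cons "fr" fr))))

(defvar my-i18n-languages '("de" "fr")
  "Preferred target languages, most preferred first.")

(defun my-i18n-translate (string)
  "Return the first translation of STRING found via `my-i18n-languages'.
Fall back to STRING itself (English) when no catalog has an entry."
  (or (catch 'found
        (dolist (lang my-i18n-languages)
          (let* ((table (cdr (assoc lang my-i18n-catalogs)))
                 (translation (and table (gethash string table))))
            (when translation
              (throw 'found translation)))))
      string))
```

With these tables, "Undo" has no German entry and falls back to the
French "Annuler", while "Redo" has no entry at all and comes back as
"Redo".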

* Re: Emacs i18n
  2019-03-06 18:09                           ` Eli Zaretskii
@ 2019-03-06 19:39                             ` Paul Eggert
  2019-03-06 19:49                               ` Eli Zaretskii
  2019-03-06 19:47                             ` Paul Eggert
  2019-03-07  3:44                             ` Richard Stallman
  2 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-06 19:39 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: rms, emacs-devel

On 3/6/19 10:09 AM, Eli Zaretskii wrote:
> we might consider marking
> strings that do NOT need to be translated, as they are a very small
> minority.  Just look at the strings in a typical Emacs source file and
> try to find strings that you wouldn't want translated.  Unlike some
> other programs, Emacs almost never says something that is not meant to
> be read and understood by the user.

My impression is just the opposite. Of course it depends on the module,
but I just now took a census of todo-mode.el (which I happened to be
editing anyway) and looked at the first 300 lines of source code (at
which point I got tired of counting). I counted 24 strings that should
not be translated, and 5 strings that should be. (I did not count doc
strings, which obviously should all be translated and shouldn't need to
be marked.)

Here are the strings needing translation:

"==--== DONE "
"DONE "
"Invalid value: must be distinct from `todo-item-mark'"
"%s category %d: %s"
"Invalid value: must be a positive integer"

and here are the strings that don't need translation:

"todo/"
"\\.toda\\'"
"\\.todo\\'"
"--==-- "
"="
"["
"]"
"*"
"\\(?4:\\(?5:"
"\\)\\|"
"\\(?6:%s\\)"
"\\(?7:[0-9]+\\|\\*\\)"
"\\(?8:[0-9]+\\|\\*\\)"
"-?\\(?9:[0-9]+\\|\\*\\)"
""
"\\)"
"^\\("
"\\|"
"\\)?"
"^\\["
"\\("
"\\|"
"\\)"
""

* Re: Emacs i18n
  2019-03-06 19:39                             ` Paul Eggert
@ 2019-03-06 19:49                               ` Eli Zaretskii
  2019-03-07  1:33                                 ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 19:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, rms, juri

> Cc: rms@gnu.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 11:39:50 -0800
> 
> On 3/6/19 10:09 AM, Eli Zaretskii wrote:
> > we might consider marking
> > strings that do NOT need to be translated, as they are a very small
> > minority.  Just look at the strings in a typical Emacs source file and
> > try to find strings that you wouldn't want translated.  Unlike some
> > other programs, Emacs almost never says something that is not meant to
> > be read and understood by the user.
> 
> My impression is just the opposite. Of course it depends on the module,
> but I just now took a census of todo-mode.el (which I happened to be
> editing anyway) and looked at the first 300 lines of source code (at
> which point I got tired of counting). I counted 24 strings that should
> not be translated, and 5 strings that should be.

We are miscommunicating: I meant strings passed to 'message' and its
ilk, not just any kind of strings.  It goes without saying that most
strings in our sources don't need to be translated, but that's not
what we are discussing.

* Re: Emacs i18n
  2019-03-06 19:49                               ` Eli Zaretskii
@ 2019-03-07  1:33                                 ` Paul Eggert
  2019-03-07  3:30                                   ` Eli Zaretskii
  2019-03-07  4:35                                   ` Jean-Christophe Helary
  0 siblings, 2 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-07  1:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, rms, juri

On 3/6/19 11:49 AM, Eli Zaretskii wrote:
> We are miscommunicating: I meant strings passed to 'message' and its
> ilk, not just any kind of strings.

In that case, the solution that Richard proposed should suffice for most
cases. That is, in most cases we shouldn't need to change the Elisp
source code; all we need is for xgettext (or its equivalent) to consider
the first argument of 'message' to be a translatable string. This is a
standard feature of xgettext (see its --keyword argument).

* Re: Emacs i18n
  2019-03-07  1:33                                 ` Paul Eggert
@ 2019-03-07  3:30                                   ` Eli Zaretskii
  2019-03-07 16:06                                     ` Paul Eggert
  2019-03-07  4:35                                   ` Jean-Christophe Helary
  1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07  3:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, rms, juri

> Cc: juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 17:33:26 -0800
> 
> On 3/6/19 11:49 AM, Eli Zaretskii wrote:
> > We are miscommunicating: I meant strings passed to 'message' and its
> > ilk, not just any kind of strings.
> 
> In that case, the solution that Richard proposed should suffice for most
> cases. That is, in most cases we shouldn't need to change the Elisp
> source code; all we need is for xgettext (or its equivalent) to consider
> the first argument of 'message' to be a translatable string. This is a
> standard feature of xgettext (see its --keyword argument).

This will solve the string extraction part.  But how will the actual
translation happen?  As I wrote elsewhere, I don't see how relying on
the function to perform the extraction will work with non-fixed
strings.

* Re: Emacs i18n
  2019-03-07  3:30                                   ` Eli Zaretskii
@ 2019-03-07 16:06                                     ` Paul Eggert
  0 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-07 16:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, rms, juri

On 3/6/19 7:30 PM, Eli Zaretskii wrote:
> I don't see how relying on
> the function to perform the extraction will work with non-fixed
> strings.

Yes, if a caller computes a string and then passes it to 'message',
xgettext's static analysis won't find the string. Although these calls
are in the minority, they do happen, and they'll need to be rewritten.
This is standard practice when any application is internationalized, and
I've already given an example of this.
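
The static-analysis limitation can be illustrated with a toy extractor.
This is a hypothetical sketch (`my-message-literals' is not part of
Emacs, and real xgettext scans source text rather than `read' forms)
that, like a keyword specification for `message', collects only string
literals appearing as the first argument of `message':

```elisp
(defun my-message-literals (form)
  "Return the literal strings passed as first argument to `message' in FORM.
Computed first arguments, such as (concat ...), are not collected:
like a keyword-based extractor, this only sees string literals."
  (let ((found nil))
    (when (consp form)
      ;; Is FORM itself a call whose first argument is a literal string?
      (when (and (eq (car form) 'message)
                 (stringp (car-safe (cdr form))))
        (push (cadr form) found))
      ;; Recurse into each element; the `consp' test also copes with
      ;; dotted lists.
      (let ((tail form))
        (while (consp tail)
          (setq found (append found (my-message-literals (car tail))))
          (setq tail (cdr tail)))))
    found))

(my-message-literals
 '(progn
    (message "Deleted %d matching lines" count)
    (message (concat "Clock stopped at " te))))
;; => ("Deleted %d matching lines")
```

The second call is missed because its format string is computed by
`concat', which is why the patch above rewrote such calls to pass a
literal format string.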

Of course Emacs is a much bigger project than a small program like 'cat'
or 'uniq', and so Emacs will take much more work to internationalize.
But this is a problem of quantity, not of technology. That is,
translators will need to do more work than usual (as there are more
messages to translate) and developers will need to do some more work (as
there are more "tricky" uses of 'message' in Emacs than there were
"tricky" uses of fprintf in 'cat'.) However, the standard GNU
internationalization technology should work just fine with Emacs.

> E.g., what to do
> with Org, which is in the core, but also distributed separately.

A simple way to address that problem is to have Org use and ship the
same message catalog that Emacs does. Alternatively, Org could ship a
separate message catalog that contains only Org's messages and is
therefore a subset of the Emacs catalog. However, I doubt whether the
hassle of doing the latter would be worth the effort.

* Re: Emacs i18n
  2019-03-07  1:33                                 ` Paul Eggert
  2019-03-07  3:30                                   ` Eli Zaretskii
@ 2019-03-07  4:35                                   ` Jean-Christophe Helary
  2019-03-07 16:04                                     ` Paul Eggert
                                                       ` (2 more replies)
  1 sibling, 3 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-07  4:35 UTC (permalink / raw)
  To: emacs-devel

> On Mar 7, 2019, at 10:33, Paul Eggert <eggert@cs.ucla.edu> wrote:
> 
> On 3/6/19 11:49 AM, Eli Zaretskii wrote:
>> We are miscommunicating: I meant strings passed to 'message' and its ilk, not just any kind of strings.
> 
> In that case, the solution that Richard proposed should suffice for most cases. That is, in most cases we shouldn't need to change the Elisp source code; all we need is for xgettext (or its equivalent) to consider the first argument of 'message' to be a translatable string. This is a standard feature of xgettext (see its --keyword argument).

Yes but... The first argument of message is often a Lisp expression that generates natural-language strings programmatically. That part will have to be modified (please check what I did in packages.el, although it is far from perfect, if what I wrote above is not clear).

PS: What is the proper way to reply to this list? Keep everybody in Cc, or remove the Cc and keep only the list address?

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune

* Re: Emacs i18n
  2019-03-07  4:35                                   ` Jean-Christophe Helary
@ 2019-03-07 16:04                                     ` Paul Eggert
  2019-03-08  4:09                                     ` Richard Stallman
  2019-03-11 21:48                                     ` Juri Linkov
  2 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-07 16:04 UTC (permalink / raw)
  To: Jean-Christophe Helary, emacs-devel

On 3/6/19 8:35 PM, Jean-Christophe Helary wrote:

> in most cases we shouldn't need to change the Elisp source code; all we need is for xgettext (or its equivalent) to consider the first argument of 'message' to be a translatable string. This is a standard feature of xgettext (see its --keyword argument).
> Yes but... The first argument of message is often a lisp expression

Of course; we are on the same page here. Most cases of 'message' should
be fine, but there will often be exceptions that we do need to rewrite.
These exceptions are not urgent: we can fix them as we find time.

> PS: What is the proper way to reply to this list? Keep everybody in Cc, or remove the Cc and keep only the list address?

I typically just reply to whatever my email software defaults to.
Occasionally I'll remove a Cc if I think that particular person probably
won't care about my reply or I know the person is on a mailing list I'm
already replying to; but this is optional and takes time and I typically
don't bother.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-07  4:35                                   ` Jean-Christophe Helary
  2019-03-07 16:04                                     ` Paul Eggert
@ 2019-03-08  4:09                                     ` Richard Stallman
  2019-03-11 21:48                                     ` Juri Linkov
  2 siblings, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:09 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Yes but... The first argument of message is often a lisp
  > expression that generates natural language strings
  > programmatically. That part will have to be modified (although far
  > from perfect, please check what I did on packages.el if what I
  > wrote above is not clear).

Those cases will need to be modified.
Such is life.  It may take time to get them all, but we will.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-07  4:35                                   ` Jean-Christophe Helary
  2019-03-07 16:04                                     ` Paul Eggert
  2019-03-08  4:09                                     ` Richard Stallman
@ 2019-03-11 21:48                                     ` Juri Linkov
  2019-03-11 22:51                                       ` Paul Eggert
                                                         ` (2 more replies)
  2 siblings, 3 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-11 21:48 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: emacs-devel

>> In that case, the solution that Richard proposed should suffice for most
>> cases. That is, in most cases we shouldn't need to change the Elisp
>> source code; all we need is for xgettext (or its equivalent) to consider
>> the first argument of 'message' to be a translatable string. This is
>> a standard feature of xgettext (see its --keyword argument).
>
> Yes but... The first argument of message is often a lisp expression that
> generates natural language strings programmatically. That part will have to
> be modified (although far from perfect, please check what I did on
> packages.el if what I wrote above is not clear).

Please note that you have to handle not only format-strings of ‘message’,
but also ‘error’ and even more low-level ‘format’, i.e. all these

  (error STRING &rest ARGS)
  (message FORMAT-STRING &rest ARGS)
  (format-message STRING &rest OBJECTS)
  (format STRING &rest OBJECTS)

because there are many places that construct the string arguments
of ‘message’ using ‘format’ like in ‘perform-replace’:

	(message "Replaced %d occurrences%s"
		 replace-count
		 (if (> (+ skip-read-only-count
			   skip-filtered-count
			   skip-invisible-count)
                        0)
		     (format " (skipped %s)"
			     (mapconcat
			      #'identity
			      (delq nil (list
					 (if (> skip-read-only-count 0)
					     (format "%s read-only"
						     skip-read-only-count))
					 (if (> skip-invisible-count 0)
					     (format "%s invisible"
						     skip-invisible-count))
					 (if (> skip-filtered-count 0)
					     (format "%s filtered out"
						     skip-filtered-count))))
			      ", "))
		   ""))



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-11 21:48                                     ` Juri Linkov
@ 2019-03-11 22:51                                       ` Paul Eggert
  2019-03-12 21:45                                         ` Juri Linkov
  2019-03-11 23:59                                       ` Jean-Christophe Helary
  2019-03-12  9:16                                       ` Michael Albinus
  2 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-11 22:51 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Jean-Christophe Helary, emacs-devel

On 3/11/19 2:48 PM, Juri Linkov wrote:

> Please note that you have to handle not only format-strings of
> ‘message’, but also ‘error’ and even more low-level ‘format’, i.e. all
> these
>
>   (error STRING &rest ARGS)
>   (message FORMAT-STRING &rest ARGS)
>   (format-message STRING &rest OBJECTS)
>   (format STRING &rest OBJECTS)
>
I expect that 'format' won't translate its first argument, whereas
'error', 'message', and 'format-message' will. This will be for the same
reason that 'format' does not translate quotes.
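A minimal sketch of that split, using a hypothetical catalog `my-i18n-catalog` and a hypothetical wrapper `my-translating-format-message` (neither exists in Emacs): plain `format` never looks up its first argument, while the translating variant looks it up first and then delegates to the real `format-message`.

```elisp
(defvar my-i18n-catalog (make-hash-table :test #'equal)
  "Hypothetical catalog mapping English msgids to translations.")

(defun my-i18n-gettext (msgid)
  "Return the translation of MSGID, or MSGID itself if none is known."
  (gethash msgid my-i18n-catalog msgid))

(defun my-translating-format-message (string &rest objects)
  "Translate STRING with `my-i18n-gettext', then format it with OBJECTS."
  (apply #'format-message (my-i18n-gettext string) objects))

;; Illustrative French entry.
(puthash "%d files" "%d fichiers" my-i18n-catalog)
```

With this entry, `(format "%d files" 3)` still returns "3 files", whereas `(my-translating-format-message "%d files" 3)` returns "3 fichiers". A string with no catalog entry passes through unchanged.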

> there are many places that construct the string arguments of ‘message’
> using ‘format’ like in ‘perform-replace’:
>
Yes, quite right. These places will need to be redone so that the
translation will work properly. Here's a first cut at how to redo the
perform-replace code that you mentioned (this could get fancier if needed):

  (nmessage replace-count
            "Replaced %d occurrence%s"
            "Replaced %d occurrences%s"
            replace-count
            (if (> (+ skip-read-only-count
                      skip-filtered-count
                      skip-invisible-count)
                   0)
                (format-message
                 " (skipped %s)"
                 (mapconcat
                  #'identity
                  (delq nil (list
                             (if (> skip-read-only-count 0)
                                 (format-message "%s read-only"
                                                 skip-read-only-count))
                             (if (> skip-invisible-count 0)
                                 (format-message "%s invisible"
                                                 skip-invisible-count))
                             (if (> skip-filtered-count 0)
                                 (format-message "%s filtered out"
                                                 skip-filtered-count))))
                  (gettext ", ")))
              ""))




^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-11 22:51                                       ` Paul Eggert
@ 2019-03-12 21:45                                         ` Juri Linkov
  2019-03-17 21:23                                           ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-12 21:45 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jean-Christophe Helary, emacs-devel

>> Please note that you have to handle not only format-strings of
>> ‘message’, but also ‘error’ and even more low-level ‘format’, i.e. all
>> these
>>
>>   (error STRING &rest ARGS)
>>   (message FORMAT-STRING &rest ARGS)
>>   (format-message STRING &rest OBJECTS)
>>   (format STRING &rest OBJECTS)
>>
> I expect that 'format' won't translate its first argument, whereas
> 'error', 'message', and 'format-message' will. This will be for the same
> reason that 'format' does not translate quotes.

Then it should be sufficient to add a gettext call to 'format-message' only,
because all the related functions ('message', 'error', 'tramp-message',
'tramp-error', etc.) use 'format-message' directly or indirectly.

If someone would create a new branch with all standard gettext prerequisites
like Makefiles, headers, textdomain bindings, locale settings,
i.e. everything that is required to translate other GNU applications,
then I could help with testing and finding more problematic places.
Only then could we see how well gettext (designed for static translation)
performs in the more dynamic Emacs environment.

>> there are many places that construct the string arguments of ‘message’
>> using ‘format’ like in ‘perform-replace’:
>>
> Yes, quite right. These places will need to be redone so that the
> translation will work properly. Here's a first cut at how to redo the
> perform-replace code that you mentioned (this could get fancier if needed):
>
>   (nmessage replace-count
>             "Replaced %d occurrence%s"
>             "Replaced %d occurrences%s"
>             replace-count

IIUC, using standard gettext functions this would rather correspond to

  (message (ngettext "Replaced %1$d occurrence%s"
                     "Replaced %1$d occurrences%s"
                     replace-count)
           replace-count
           (if (> (+ skip-read-only-count
                     skip-filtered-count
                     skip-invisible-count)
                  0)
               (format-message
                " (skipped %s)"
                (mapconcat
                 #'identity
                 (delq nil (list
                            (if (> skip-read-only-count 0)
                                (format-message "%s read-only"
                                                skip-read-only-count))
                            (if (> skip-invisible-count 0)
                                (format-message "%s invisible"
                                                skip-invisible-count))
                            (if (> skip-filtered-count 0)
                                (format-message "%s filtered out"
                                                skip-filtered-count))))
                 (gettext ", ")))
             ""))

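Field numbers such as `%1$d` matter because a translation may need to put the arguments in a different order than the English string does. Emacs's `format` accepts `%N$` field numbers (present in recent Emacs versions; check yours), so the calling code can keep passing its arguments in one fixed order. A small sketch with a hypothetical helper `my-summary`; the second template stands in for a "translation" that reorders its arguments:

```elisp
(defun my-summary (template count what)
  "Format TEMPLATE with COUNT and WHAT, always passed in that order.
Field numbers in TEMPLATE decide where each argument appears."
  (format template count what))

;; English order: count first.
;; (my-summary "%1$d %2$s replaced" 3 "links")   ; => "3 links replaced"
;; Reordered template: noun phrase first, same call.
;; (my-summary "%2$s replaced: %1$d" 3 "links")  ; => "links replaced: 3"
```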


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-12 21:45                                         ` Juri Linkov
@ 2019-03-17 21:23                                           ` Juri Linkov
  2019-03-18 21:20                                             ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-17 21:23 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2627 bytes --]

>>> Please note that you have to handle not only format-strings of
>>> ‘message’, but also ‘error’ and even more low-level ‘format’, i.e. all
>>> these
>>>
>>>   (error STRING &rest ARGS)
>>>   (message FORMAT-STRING &rest ARGS)
>>>   (format-message STRING &rest OBJECTS)
>>>   (format STRING &rest OBJECTS)
>>>
>> I expect that 'format' won't translate its first argument, whereas
>> 'error', 'message', and 'format-message' will. This will be for the same
>> reason that 'format' does not translate quotes.
>
> Then it should be sufficient to add a gettext call to 'format-message' only,
> because all the related functions ('message', 'error', 'tramp-message',
> 'tramp-error', etc.) use 'format-message' directly or indirectly.

Maybe I'm too stupid to comprehend the complexity of this task in its entirety,
but I tried installing the gettext infrastructure in Emacs with gettextize,
then ran xgettext on the source code, and saw no technical problems.
For example, this command extracts all messages:

  xgettext --from-code=UTF-8 -kformat-message -kmessage -kerror -ktramp-message -ktramp-error *.el

then this command extracts all Gnus messages into a separate file:

  xgettext --from-code=UTF-8 -kformat-message -kmessage -kerror gnus/*.el -o gnus_messages.po

this command extracts all menu items:

  xgettext --from-code=UTF-8 -kmenu-item *.el **/*.el -o menus.po

and this extracts all docstrings:

  xgettext --from-code=UTF-8 -kdefcustom:3 -kdefvar:3 -kdefun:3 *.el **/*.el -o docstrings.po

The size of docstrings.po is about 9MB, so perhaps it should reside in
a separate catalog defined by e.g.

  (defdomain emacs-docstrings

with semantics similar to defgroup, but I have no opinion about this.

I think this project urgently needs a coordinator to negotiate with
package authors and translation teams about how best to split
translations into message catalogs.  So the problems are not so much
technical as organizational.

> IIUC, using standard gettext functions this would rather correspond to
>
>   (message (ngettext "Replaced %1$d occurrence%s"
>                      "Replaced %1$d occurrences%s"
>                      replace-count)

It seems better to start with this standard function
and add more optimizations like ‘nmessage’ later.

Other Lisp implementations use ‘ngettext’ as well, e.g.:
https://clisp.sourceforge.io/impnotes.html#ggettext

So I'm going to start with the more obvious parts of the task
by fixing the current bugs with incorrect English plurals
in a forward-compatible way:


[-- Attachment #2: i18n-ngettext.patch --]
[-- Type: text/x-diff, Size: 6767 bytes --]

diff --git a/lisp/subr.el b/lisp/subr.el
index 6c0ad00afa..1f000f77ad 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -342,6 +342,13 @@ define-error
          (delete-dups (copy-sequence (cons name conditions))))
     (when message (put name 'error-message message))))
 
+(defun ngettext (msgid msgid_plural n &optional _domain _category)
+  "Return the plural form of the translation of MSGID and N.
+In the given DOMAIN, depending on the given CATEGORY.  MSGID and
+MSGID_PLURAL should be ASCII strings, and are normally the English singular
+and English plural variant of the message, respectively."
+  (if (/= n 1) msgid_plural msgid))
+
 ;; We put this here instead of in frame.el so that it's defined even on
 ;; systems where frame.el isn't loaded.
 (defun frame-configuration-p (object)
diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index a5427dd8b7..c0f47159c9 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -459,7 +459,7 @@ grep-mode-font-lock-keywords
      ;; remove match from grep-regexp-alist before fontifying
      ("^Grep[/a-zA-Z]* started.*"
       (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t))
-     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?matches found\\)\\|\\(no matches found\\)\\).*"
+     ("^Grep[/a-zA-Z]* finished with \\(?:\\(\\(?:[0-9]+ \\)?match\\(?:es\\)? found\\)\\|\\(no matches found\\)\\).*"
       (0 '(face nil compilation-message nil help-echo nil mouse-face nil) t)
       (1 compilation-info-face nil t)
       (2 compilation-warning-face nil t))
@@ -552,7 +552,10 @@ grep-exit-message
       ;; so the buffer is still unmodified if there is no output.
       (cond ((and (zerop code) (buffer-modified-p))
 	     (if (> grep-num-matches-found 0)
-                 (cons (format "finished with %d matches found\n" grep-num-matches-found)
+                 (cons (format (ngettext "finished with %d match found\n"
+                                         "finished with %d matches found\n"
+                                         grep-num-matches-found)
+                               grep-num-matches-found)
                        "matched")
                '("finished with matches found\n" . "matched")))
 	    ((not (buffer-modified-p))
diff --git a/lisp/replace.el b/lisp/replace.el
index 59ad1a375b..318a9fb025 100644
--- a/lisp/replace.el
+++ b/lisp/replace.el
@@ -983,7 +983,10 @@ flush-lines
 		       (progn (forward-line 1) (point)))
         (setq count (1+ count))))
     (set-marker rend nil)
-    (when interactive (message "Deleted %d matching lines" count))
+    (when interactive (message (ngettext "Deleted %d matching line"
+					 "Deleted %d matching lines"
+					 count)
+			       count))
     count))
 
 (defun how-many (regexp &optional rstart rend interactive)
@@ -1032,9 +1035,10 @@ how-many
 	(if (= opoint (point))
 	    (forward-char 1)
 	  (setq count (1+ count))))
-      (when interactive (message "%d occurrence%s"
-				 count
-				 (if (= count 1) "" "s")))
+      (when interactive (message (ngettext "%d occurrence"
+					   "%d occurrences"
+					   count)
+				 count))
       count)))
 
 \f
@@ -1617,11 +1621,12 @@ occur-1
 		  (not (eq occur-excluded-properties t))))))
 	  (let* ((bufcount (length active-bufs))
 		 (diff (- (length bufs) bufcount)))
-	    (message "Searched %d buffer%s%s; %s match%s%s"
-		     bufcount (if (= bufcount 1) "" "s")
+	    (message "Searched %d %s%s; %s %s%s"
+		     bufcount
+		     (ngettext "buffer" "buffers" bufcount)
 		     (if (zerop diff) "" (format " (%d killed)" diff))
 		     (if (zerop count) "no" (format "%d" count))
-		     (if (= count 1) "" "es")
+		     (ngettext "match" "matches" count)
 		     ;; Don't display regexp if with remaining text
 		     ;; it is longer than window-width.
 		     (if (> (+ (length (or (get-text-property 0 'isearch-string regexp)
@@ -1856,14 +1861,15 @@ occur-engine
 		  (let ((beg (point))
 		        end)
 		    (insert (propertize
-			     (format "%d match%s%s%s in buffer: %s%s\n"
-				     matches (if (= matches 1) "" "es")
+			     (format "%d %s%s%s in buffer: %s%s\n"
+				     matches
+				     (ngettext "match" "matches" matches)
 				     ;; Don't display the same number of lines
 				     ;; and matches in case of 1 match per line.
 				     (if (= lines matches)
-				         "" (format " in %d line%s"
+				         "" (format " in %d %s"
 						    lines
-						    (if (= lines 1) "" "s")))
+						    (ngettext "line" "lines" lines)))
 				     ;; Don't display regexp for multi-buffer.
 				     (if (> (length buffers) 1)
 				         "" (occur-regexp-descr regexp))
@@ -1889,13 +1895,15 @@ occur-engine
 	(goto-char (point-min))
 	(let ((beg (point))
 	      end)
-	  (insert (format "%d match%s%s total%s:\n"
-			  global-matches (if (= global-matches 1) "" "es")
+	  (insert (format "%d %s%s total%s:\n"
+			  global-matches
+			  (ngettext "match" "matches" global-matches)
 			  ;; Don't display the same number of lines
 			  ;; and matches in case of 1 match per line.
 			  (if (= global-lines global-matches)
-			      "" (format " in %d line%s"
-					 global-lines (if (= global-lines 1) "" "s")))
+			      "" (format " in %d %s"
+					 global-lines
+					 (ngettext "line" "lines" global-lines)))
 			  (occur-regexp-descr regexp)))
 	  (setq end (point))
 	  (when title-face
@@ -2730,10 +2738,10 @@ perform-replace
                                            (1+ num-replacements))))))
                              (when (and (eq def 'undo-all)
                                         (null (zerop num-replacements)))
-                               (message "Undid %d %s" num-replacements
-                                        (if (= num-replacements 1)
-                                            "replacement"
-                                          "replacements"))
+                               (message (ngettext "Undid %d replacement"
+                                                  "Undid %d replacements"
+                                                  num-replacements)
+                                        num-replacements)
                                (ding 'no-terminate)
                                (sit-for 1)))
 			   (setq replaced nil last-was-undo t last-was-act-and-show nil)))
@@ -2859,9 +2867,10 @@ perform-replace
                       last-was-act-and-show     nil))))))
       (replace-dehighlight))
     (or unread-command-events
-	(message "Replaced %d occurrence%s%s"
+	(message (ngettext "Replaced %d occurrence%s"
+			   "Replaced %d occurrences%s"
+			   replace-count)
 		 replace-count
-		 (if (= replace-count 1) "" "s")
 		 (if (> (+ skip-read-only-count
 			   skip-filtered-count
 			   skip-invisible-count)

^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-17 21:23                                           ` Juri Linkov
@ 2019-03-18 21:20                                             ` Juri Linkov
  2019-03-18 21:55                                               ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Juri Linkov @ 2019-03-18 21:20 UTC (permalink / raw)
  To: emacs-devel

> Other Lisp implementations use ‘ngettext’ as well, e.g.:
> https://clisp.sourceforge.io/impnotes.html#ggettext

And this command will extract ‘ngettext’ messages:

  xgettext --from-code=UTF-8 -kngettext:1,2 *.el **/*.el

Using only ‘ngettext’ has an additional advantage:
there will be no need to add more such functions as
nmessage, nerror, nuser-error, ntramp-error, etc.

But for cases when ‘message’ will receive a string
already translated by ‘ngettext’, e.g.:

  (message (ngettext "Replaced %d occurrence%s"
                     "Replaced %d occurrences%s"
                     replace-count)

we need to mark translated strings like Richard suggested:

  (defun ngettext (msgid msgid_plural n &optional _domain _category)
    "Return the plural form of the translation of MSGID and N."
    (propertize (if (/= n 1) msgid_plural msgid) 'translated t))

so ‘format-message’ should check if its first argument is translated,
and not call gettext again.
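To make the proposal concrete, here is a minimal sketch of what such a check could look like, with hypothetical names throughout (`my-marked-catalog`, `my-marked-gettext`, and so on). The plural lookup marks its result with a `translated` text property, and the formatting function skips the catalog lookup when the first character carries that mark. The extra text-property test on every call is exactly the kind of dynamic checking debated in this thread.

```elisp
(defvar my-marked-catalog (make-hash-table :test #'equal)
  "Hypothetical catalog mapping English msgids to translations.")

(defun my-marked-gettext (msgid)
  "Return the translation of MSGID, or MSGID itself if none is known."
  (gethash msgid my-marked-catalog msgid))

(defun my-marked-ngettext (msgid msgid-plural n)
  "Choose the English form for N, translate it, and mark the result."
  (propertize (my-marked-gettext (if (/= n 1) msgid-plural msgid))
              'translated t))

(defun my-marked-format-message (string &rest objects)
  "Format STRING like `format-message', translating it unless it is marked."
  (apply #'format-message
         (if (and (> (length string) 0)
                  (get-text-property 0 'translated string))
             string                     ; already translated: use as is
           (my-marked-gettext string))  ; otherwise look it up
         objects))
```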



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-18 21:20                                             ` Juri Linkov
@ 2019-03-18 21:55                                               ` Paul Eggert
  2019-03-19 20:40                                                 ` Juri Linkov
  0 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-18 21:55 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

On 3/18/19 2:20 PM, Juri Linkov wrote:
> Using only ‘ngettext’ has an additional advantage:
> there will be no need to add more such functions as
> nmessage, nerror, nuser-error, ntramp-error, etc.

That's not a real advantage, as there is no need to add those functions
anyway. They are merely conveniences that we can either add or not add,
depending on whether the convenience in use is worth the hassle of
supporting and documenting the functions.

For example, suppose 'message' always translates its format argument and
that there is no 'nmessage' function. Then you can use 'message' this
way to handle plurals:

  (message "%s" (format (ngettext "%d item" "%d items" n) n))

If we find expressions like the above to be common, we can easily write
an nmessage function in Lisp, so that the code can look like this instead:

  (nmessage n "%d item" "%d items" n)

but this is merely a convenience.
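Such a wrapper really is only a few lines. A minimal sketch, assuming a hypothetical `my-plural-form` in place of a real `ngettext` (it performs no translation) and taking N first, as in the `nmessage` calls above:

```elisp
(defun my-plural-form (msgid msgid-plural n)
  "Hypothetical `ngettext' stand-in: pick the English form for N."
  (if (/= n 1) msgid-plural msgid))

(defun my-nmessage (n singular plural &rest args)
  "Display a message formatted from SINGULAR or PLURAL, depending on N.
ARGS are the format arguments; `message' returns the formatted text."
  (apply #'message (my-plural-form singular plural n) args))
```

For example, `(my-nmessage 3 "%d item" "%d items" 3)` displays and returns "3 items". A real version would translate the chosen form before formatting; the only convenience is that call sites need not repeat the plural-selection wrapping.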

> ‘format-message’ should check if its first argument is translated,
> and not call gettext again.
>
I'd rather not involve dynamic checking like that, as it's fragile and
more complicated to explain and a bit slower. format-message should
either always translate, or never translate. In practice, it'll be more
convenient for format-message to always translate, so I expect we should
do it that way.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-18 21:55                                               ` Paul Eggert
@ 2019-03-19 20:40                                                 ` Juri Linkov
  0 siblings, 0 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-19 20:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

>> ‘format-message’ should check if its first argument is translated,
>> and not call gettext again.
>>
> I'd rather not involve dynamic checking like that, as it's fragile and
> more complicated to explain and a bit slower. format-message should
> either always translate, or never translate. In practice, it'll be more
> convenient for format-message to always translate, so I expect we should
> do it that way.

I see this as a kind of optimization.  But I don't know whether it is necessary
until we try how fast gettext is on very large translation files
(if it hashes translation strings, it should be fast enough).



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-11 21:48                                     ` Juri Linkov
  2019-03-11 22:51                                       ` Paul Eggert
@ 2019-03-11 23:59                                       ` Jean-Christophe Helary
  2019-03-12  9:16                                       ` Michael Albinus
  2 siblings, 0 replies; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-11 23:59 UTC (permalink / raw)
  To: emacs-devel



> On 2019/03/12, at 6:48, Juri Linkov <juri@linkov.net> wrote:
> 
>> Yes but... The first argument of message is often a lisp expression that
>> generates natural language strings programmatically. That part will have to
>> be modified (although far from perfect, please check what I did on
>> packages.el if what I wrote above is not clear).
> 
> Please note that you have to handle not only format-strings of ‘message’,
> but also ‘error’ and even more low-level ‘format’, i.e. all these

I know. That's what I tried to do with packages.el. There may be an expression or two that were too obscure for me, but I think I managed to straighten out all the strings there. Compare the current version with what was in the repository about a year ago (I don't remember when my fix was committed).

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune





^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-11 21:48                                     ` Juri Linkov
  2019-03-11 22:51                                       ` Paul Eggert
  2019-03-11 23:59                                       ` Jean-Christophe Helary
@ 2019-03-12  9:16                                       ` Michael Albinus
  2 siblings, 0 replies; 145+ messages in thread
From: Michael Albinus @ 2019-03-12  9:16 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Jean-Christophe Helary, emacs-devel

Juri Linkov <juri@linkov.net> writes:

> Please note that you have to handle not only format-strings of ‘message’,
> but also ‘error’ and even more low-level ‘format’, i.e. all these
>
>   (error STRING &rest ARGS)
>   (message FORMAT-STRING &rest ARGS)
>   (format-message STRING &rest OBJECTS)
>   (format STRING &rest OBJECTS)

There are even more functions to be considered. Tramp, for example,
consistently uses `tramp-message' instead of `message', and `tramp-error'
instead of `error'.

We should probably provide a means for a package like Tramp to add its
own entries to such a list of functions.
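One way to let packages extend the extraction list is a registry of xgettext keyword specs from which the command line is built. This is a hypothetical sketch; the variable and function names are made up. In xgettext's `-kKEYWORD:N` syntax, the `:N` suffix says which argument holds the string, and `ngettext:1,2` marks the singular and plural arguments. If I read their signatures correctly, `tramp-message` and `tramp-error` take the format string as their third argument, which is why the usage below registers them with `:3`.

```elisp
(defvar my-i18n-keywords
  '("format-message" "message" "error" "ngettext:1,2")
  "Hypothetical list of xgettext keyword specs for core Emacs.")

(defun my-i18n-add-keywords (&rest specs)
  "Let a package register additional keyword SPECS for its own wrappers."
  (dolist (spec specs)
    (add-to-list 'my-i18n-keywords spec t)))

(defun my-i18n-xgettext-args (files)
  "Return an argument list for running xgettext over FILES."
  (append '("--from-code=UTF-8")
          (mapcar (lambda (spec) (concat "-k" spec)) my-i18n-keywords)
          files))
```

A package would call `(my-i18n-add-keywords "tramp-message:3" "tramp-error:3")`, and a build step could then run `(apply #'call-process "xgettext" nil t nil (my-i18n-xgettext-args '("tramp.el")))`.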

Best regards, Michael.



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-06 18:09                           ` Eli Zaretskii
  2019-03-06 19:39                             ` Paul Eggert
@ 2019-03-06 19:47                             ` Paul Eggert
  2019-03-06 20:21                               ` Eli Zaretskii
  2019-03-07  3:44                             ` Richard Stallman
  2 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-06 19:47 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: rms, emacs-devel

On 3/6/19 10:09 AM, Eli Zaretskii wrote:
> even if
> we do decide to attack the 'message' part first, we should consider
> the doc strings as well

Absolutely. In some sense doc strings should be easier, since we
shouldn't need to make changes to existing code; all we need to do is
add some infrastructure that puts doc strings into a .po file and that
translates them when people ask for documentation.
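A rough sketch of the extraction side, assuming the functions of interest are already loaded: each function's raw doc string becomes a PO entry with an empty `msgstr` for translators to fill in. `my-po-escape` and `my-docstring-po-entry` are hypothetical helpers. A real tool would also cover variable and face doc strings and record source locations.

```elisp
(defun my-po-escape (string)
  "Escape STRING for use inside a double-quoted PO file string."
  (replace-regexp-in-string
   "[\\\"\n\t]"                         ; backslash, quote, newline, tab
   (lambda (match)
     (pcase match
       ("\\" "\\\\")
       ("\"" "\\\"")
       ("\n" "\\n")
       ("\t" "\\t")))
   string t t))                         ; fixed case, literal replacement

(defun my-docstring-po-entry (function)
  "Return a PO entry for FUNCTION's raw doc string, or nil if it has none."
  (let ((doc (and (fboundp function) (documentation function t))))
    (when doc
      (format "#. Doc string of `%s'\nmsgid \"%s\"\nmsgstr \"\"\n"
              function (my-po-escape doc)))))
```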

>  . Do we use a separate message catalog for each Lisp package, or a
>    single catalog for all of Emacs?

We can start with a single catalog that handles core Emacs; we'll need
to do that anyway. We can deal with packages later.
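Starting with a single core catalog does not rule out per-package catalogs later. A hypothetical sketch of how the two could coexist, modelled loosely on gettext's domain-qualified `dgettext`: a package's own catalog is consulted first, then the core `emacs` catalog, and finally the English msgid. The names are illustrative only.

```elisp
(defvar my-i18n-domains (make-hash-table :test #'eq)
  "Hypothetical registry mapping a domain symbol to its message catalog.")

(defun my-i18n-domain-catalog (domain)
  "Return DOMAIN's catalog, creating an empty one on first use."
  (or (gethash domain my-i18n-domains)
      (puthash domain (make-hash-table :test #'equal) my-i18n-domains)))

(defun my-dgettext (domain msgid)
  "Translate MSGID from DOMAIN's catalog, falling back to the `emacs' one.
Return MSGID unchanged if neither catalog has an entry."
  (or (gethash msgid (my-i18n-domain-catalog domain))
      (gethash msgid (my-i18n-domain-catalog 'emacs))
      msgid))
```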

>  . How to specify which target language to use?  The locale is not
>    necessarily correct, e.g., when editing with Tramp.  Also, since
>    translating all of Emacs is such a humongous job, it's quite
>    possible that some languages will have little or no translations,
>    and the respective users might want to use translations for a
>    "fallback" language, which they prefer to English.

It should be easy for Emacs users to specify a preferred locale for
messages, independently of the system locale. Similarly, they can
specify a preferred fallback locale. All this is relatively easy to do
at the C level.
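Whatever the C-level mechanism, the lookup order amounts to something like the following minimal sketch. The variables for the preferred-language chain and the per-language catalogs are hypothetical. Each language in the chain is tried in turn, and the English msgid is the last resort.

```elisp
(defvar my-i18n-language-chain '("br" "fr")
  "Hypothetical user preference: preferred language, then fallbacks.
English (the msgid itself) is the implicit last resort.")

(defvar my-i18n-catalogs nil
  "Hypothetical alist mapping a language code to a hash table of translations.")

(defun my-i18n-lookup (msgid)
  "Return the first translation of MSGID found along the language chain."
  (let ((chain my-i18n-language-chain)
        (result nil))
    (while (and chain (not result))
      (let ((table (cdr (assoc (car chain) my-i18n-catalogs))))
        (when table
          (setq result (gethash msgid table))))
      (setq chain (cdr chain)))
    (or result msgid)))
```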

>  . Many user-facing text messages include portions that we generate
>    directly from symbol names, which are of course in English.  We
>    should have some idea for how to deal with that.

We start by leaving them as English, as that's easier. We can get
fancier later, if there's need.

The bottom line is that we don't need to have a complete solution in
order to start working on this.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-06 19:47                             ` Paul Eggert
@ 2019-03-06 20:21                               ` Eli Zaretskii
  2019-03-07  1:43                                 ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-06 20:21 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, rms, juri

> Cc: rms@gnu.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 11:47:18 -0800
> 
> >  . Do we use a separate message catalog for each Lisp package, or a
> >    single catalog for all of Emacs?
> 
> We can start with a single catalog that handles core Emacs; we'll need
> to do that anyway. We can deal with packages later.

"Core" here being the C sources?  That's about 4% of the doc strings,
a drop in the sea.



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-06 20:21                               ` Eli Zaretskii
@ 2019-03-07  1:43                                 ` Paul Eggert
  2019-03-07  3:31                                   ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-07  1:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, rms, juri

On 3/6/19 12:21 PM, Eli Zaretskii wrote:
>> We can start with a single catalog that handles core Emacs; we'll need
>> to do that anyway. We can deal with packages later.
> "Core" here being the C sources?  That's about 4% of the doc strings,
> a drop in the sea.

Sure, but it should be relatively easy to also grab the doc strings from
the Emacs core elisp code. GNU gettext already supports getting strings
from Elisp code (this was for XEmacs) and it should be a relatively
minor change to adapt it to also get doc strings.




^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: Emacs i18n
  2019-03-07  1:43                                 ` Paul Eggert
@ 2019-03-07  3:31                                   ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07  3:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel, rms, juri

> Cc: juri@linkov.net, rms@gnu.org, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 6 Mar 2019 17:43:43 -0800
> 
> On 3/6/19 12:21 PM, Eli Zaretskii wrote:
> >> We can start with a single catalog that handles core Emacs; we'll need
> >> to do that anyway. We can deal with packages later.
> > "Core" here being the C sources?  That's about 4% of the doc strings,
> > a drop in the sea.
> 
> Sure, but it should be relatively easy to also grab the doc strings from
> the Emacs core elisp code. GNU gettext already supports getting strings
> from Elisp code (this was for XEmacs) and it should be a relatively
> minor change to adapt it to also get doc strings.

That gets back again to the problems I mentioned.  E.g., what to do
with Org, which is in the core, but also distributed separately.



* Re: Emacs i18n
  2019-03-06 18:09                           ` Eli Zaretskii
  2019-03-06 19:39                             ` Paul Eggert
  2019-03-06 19:47                             ` Paul Eggert
@ 2019-03-07  3:44                             ` Richard Stallman
  2019-03-07 14:48                               ` Eli Zaretskii
  2 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-07  3:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, emacs-devel, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Second, I don't understand why we are still talking about 'message'.
  > Most of the user interaction in Emacs that will benefit the most from
  > translation is not messages we show in the echo area: Emacs actually
  > doesn't chatter there too much.  Most of the stuff that IMO is much
  > more important to have translated are the doc strings.

I think it would be most natural to handle doc strings through
a special mechanism.  We have already had special mechanisms
for them -- I don't know whether we still do.  But it is easy
for the compiler to find them all and put them in a file
for translations.

   > . Do we use a separate message catalog for each Lisp package, or a
   >   single catalog for all of Emacs?  Each alternative has its merits
   >   and demerits.  For example, if we go with separate catalogs, then
   >   how do we make the correct bindtextdomain call, given that packages
   >   call each other?

I think they have to be separate, and we can use something like
lexical binding to specify the right one for each file.

This is worth having a special mechanism for.
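
One possible reading of "something like lexical binding" is that the text domain should be fixed where the code is written, not where it is called from. Below is a minimal sketch under that assumption. Every `my-i18n-` and `my-...-_` name, and the "emacs-gnus"/"emacs-tramp" domain names, are hypothetical; `my-i18n-dgettext` only mimics the argument order of gettext's C function `dgettext (domain, msgid)`.

```elisp
;;; Hypothetical sketch: a per-package text domain fixed at definition time.

(defvar my-i18n-catalogs (make-hash-table :test 'equal)
  "Map a domain name to an `equal' hash table of msgid -> translation.")

(defun my-i18n-dgettext (domain msgid)
  "Translate MSGID using DOMAIN's catalog, or return MSGID unchanged."
  (let ((catalog (gethash domain my-i18n-catalogs)))
    (or (and catalog (gethash msgid catalog)) msgid)))

(defmacro my-i18n-define-marker (name domain)
  "Define NAME as a function that translates its argument in DOMAIN."
  `(defun ,name (msgid)
     ,(format "Translate MSGID in the %S text domain." domain)
     (my-i18n-dgettext ,domain msgid)))

;; Each package would contain one such line, e.g. in a Gnus file:
(my-i18n-define-marker my-gnus-_ "emacs-gnus")
;; ...and in a Tramp file:
(my-i18n-define-marker my-tramp-_ "emacs-tramp")
```

Because `my-gnus-_` always names the Gnus domain, a Gnus function keeps using the Gnus catalog even when it is called from Tramp: what matters is the file that contains the string, not who calls it.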

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* Re: Emacs i18n
  2019-03-07  3:44                             ` Richard Stallman
@ 2019-03-07 14:48                               ` Eli Zaretskii
  2019-03-07 22:29                                 ` Juri Linkov
  2019-03-08  4:11                                 ` Richard Stallman
  0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-07 14:48 UTC (permalink / raw)
  To: rms; +Cc: eggert, emacs-devel, juri

> From: Richard Stallman <rms@gnu.org>
> Cc: juri@linkov.net, eggert@cs.ucla.edu, emacs-devel@gnu.org
> Date: Wed, 06 Mar 2019 22:44:10 -0500
> 
> I think it would be most natural to handle doc strings through
> a special mechanism.

Up to a point, perhaps.  We still should try to use .po files for
them, if at all possible, and perhaps also the gettext code that
supports looking up strings in .gmo catalogs generated from .po.



* Re: Emacs i18n
  2019-03-07 14:48                               ` Eli Zaretskii
@ 2019-03-07 22:29                                 ` Juri Linkov
  2019-03-08  1:48                                   ` Jean-Christophe Helary
  2019-03-08  7:37                                   ` Eli Zaretskii
  2019-03-08  4:11                                 ` Richard Stallman
  1 sibling, 2 replies; 145+ messages in thread
From: Juri Linkov @ 2019-03-07 22:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, rms, emacs-devel

>> I think it would be most natural to handle doc strings through
>> a special mechanism.
>
> Up to a point, perhaps.  We still should try to use .po files for
> them, if at all possible, and perhaps also the gettext code that
> supports looking up strings in .gmo catalogs generated from .po.

The PO format is best suited for translation of one-liners like
messages and menu items, but I doubt that the PO format would be
the most efficient implementation for multi-line doc strings since
gettext uses the whole text of the doc string as a key to translation.
Whereas more efficient would be to use a Lisp symbol (function or
variable name) as a translation key.
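
To make the trade-off concrete, here is a minimal sketch in the style of the prototype earlier in the thread. All `my-i18n-` names are hypothetical; the sketch only contrasts the two lookup keys and says nothing about how the tables would be filled.

```elisp
;;; Hypothetical sketch: two ways to key doc-string translations.

(defvar my-i18n-doc-by-msgid (make-hash-table :test 'equal)
  "Translations keyed by the full English doc string, as gettext does.")

(defvar my-i18n-doc-by-symbol (make-hash-table :test 'eq)
  "Translations keyed by the function or variable symbol.")

(defun my-i18n-doc-via-msgid (doc)
  "Return the translation of DOC, falling back to DOC itself."
  (gethash doc my-i18n-doc-by-msgid doc))

(defun my-i18n-doc-via-symbol (symbol doc)
  "Return the translation stored for SYMBOL, falling back to DOC."
  (gethash symbol my-i18n-doc-by-symbol doc))
```

One practical difference: with the msgid key, editing the English doc string makes the lookup miss, so users see the current English text; with the symbol key, the old translation keeps being shown until someone updates it.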



* Re: Emacs i18n
  2019-03-07 22:29                                 ` Juri Linkov
@ 2019-03-08  1:48                                   ` Jean-Christophe Helary
  2019-03-08  8:08                                     ` Eli Zaretskii
  2019-03-08  7:37                                   ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-08  1:48 UTC (permalink / raw)
  To: emacs-devel

> On Mar 8, 2019, at 7:29, Juri Linkov <juri@linkov.net> wrote:
> 
> The PO format is best suited for translation of one-liners like
> messages and menu items, but I doubt that the PO format would be the most efficient implementation for multi-line doc strings since gettext uses the whole text of the doc string as a key to translation.
> Whereas more efficient would be to use a Lisp symbol (function or variable name) as a translation key.

po4a is a commonly used Perl utility that creates po files from a number of documentation formats including texinfo. The msgid is indeed the paragraph itself but nobody sees any "efficiency" issue in the process.

Since the emacs code is not a documentation format, there would be a need to find a different way to extract the doc strings, but using each doc string paragraph as a msgid is not a problem in itself.
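
The paragraph-as-msgid approach can be sketched in Lisp as follows, assuming a translation table keyed by the English text of each blank-line-separated paragraph. The function name is hypothetical, and this is not how po4a itself is implemented.

```elisp
;;; Hypothetical sketch: translate a doc string paragraph by paragraph,
;;; using each paragraph's English text as the lookup key (msgid).

(defun my-i18n-translate-paragraphs (doc table)
  "Translate DOC one paragraph at a time using TABLE.
Paragraphs are separated by blank lines.  TABLE is an `equal' hash
table mapping English paragraphs to translations; paragraphs with no
entry are kept in English."
  (mapconcat (lambda (para) (gethash para table para))
             (split-string doc "\n\n")
             "\n\n"))
```

Editing one paragraph of a doc string then invalidates only that paragraph's translation, not the whole doc string.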

Let's not forget that most if not all issues regarding formats and processes on the l10n side were solved decades ago.

I think what really needs to be discussed is:

• which strings do we extract
• how to rewrite the mix of code and strings
• how to extract the resulting strings
• how to process the translations for display in emacs

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune

* Re: Emacs i18n
  2019-03-08  1:48                                   ` Jean-Christophe Helary
@ 2019-03-08  8:08                                     ` Eli Zaretskii
  2019-03-08 15:11                                       ` Jean-Christophe Helary
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08  8:08 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: emacs-devel

> From: Jean-Christophe Helary <brandelune@gmail.com>
> Date: Fri, 8 Mar 2019 10:48:46 +0900
> 
> I think what really needs to be discussed is:
> 
> • which strings do we extract
> • how to rewrite the mix of code and strings
> • how to extract the resulting strings
> • how to process the translations for display in emacs

Extraction is just a technicality, it can be done in either of several
possible ways.  We could use xgettext, or we could use a modification
of make-docfile (the latter is probably a must for collecting doc
strings from C sources), or we could use po4a or something similar.
As long as the catalogs are PO files, we could even use a mix of
tools if, for example, one of the tools is more convenient for Lisp
but not for C.
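
Purely as an illustration of the Lisp side of extraction, raw doc strings can be read with `documentation` (with its RAW argument non-nil, so command-key substitution is not applied) and written out as PO entries. The `my-po-` names are hypothetical; the sketch only handles the escaping that PO string literals need for backslashes, double quotes and newlines, and omits source references, plural forms and variable doc strings.

```elisp
;;; Hypothetical sketch: turn a function's raw doc string into a PO entry.

(defun my-po-quote (string)
  "Return STRING as a double-quoted PO string literal."
  (concat "\""
          (mapconcat (lambda (char)
                       (pcase char
                         (?\" "\\\"")   ; double quote -> \"
                         (?\\ "\\\\")   ; backslash    -> \\
                         (?\n "\\n")    ; newline      -> \n
                         (_ (string char))))
                     string "")
          "\""))

(defun my-po-entry-for-function (symbol)
  "Return a PO entry for SYMBOL's raw doc string, or nil if it has none."
  (let ((doc (and (fboundp symbol) (documentation symbol t))))
    (when doc
      (format "#. function: %s\nmsgid %s\nmsgstr \"\"\n"
              symbol (my-po-quote doc)))))
```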

And I don't understand what problems you see in the last item: what
should be done there other than display the translated string with
'message' or insert it into the *Help* buffer?

So I think you are bothered by stuff that is largely non-issues.  The
most important issues IMO are different: (a) what methodology of
extracting/marking translatable strings to choose so that this job
doesn't become infeasible; and (b) how to arrange the message catalogs
so that they will be easy to maintain and update, given the modular
nature of Emacs.  I think we should also take a better look at how the
built-in help facilities generate documentation and other displayable
strings from symbol names.  Macros such as define-minor-mode should
also be scrutinized to see if there are some special problems there.

Once this is done, the methodology decided, and the necessary tools
are available, the rest is just more or less mechanical work to
convert more and more parts of Emacs.

* Re: Emacs i18n
  2019-03-08  8:08                                     ` Eli Zaretskii
@ 2019-03-08 15:11                                       ` Jean-Christophe Helary
  2019-03-08 20:11                                         ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-08 15:11 UTC (permalink / raw)
  To: emacs-devel



> On Mar 8, 2019, at 17:08, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Jean-Christophe Helary <brandelune@gmail.com>
>> Date: Fri, 8 Mar 2019 10:48:46 +0900
>> 
>> I think what really needs to be discussed is:
>> 
>> • which strings do we extract
>> • how to rewrite the mix of code and strings
>> • how to extract the resulting strings
>> • how to process the translations for display in emacs
> 
> Extraction is just a technicality, it can be done in either of several
> possible ways.

Sure. I just meant that l10n issues (i.e. PO "efficiency", etc.) are already solved, but i18n in general has to be implemented from scratch.

> And I don't understand what problems you see in the last item: what
> should be done there other than display the translated string with
> 'message' or insert it into the *Help* buffer?

As I wrote above: we have to implement everything from scratch.

> So I think you are bothered by stuff that is largely non-issues.  The
> most important issues IMO are different: (a) what methodology of
> extracting/marking translatable strings to choose so that this job
> doesn't become infeasible;

That's my first 3 points elegantly combined into one :)

> and (b) how to arrange the message catalogs
> so that they will be easy to maintain and update, given the modular
> nature of Emacs.

I'm not sure what you mean by "how to arrange ..." Do you mean: how to provide the l10n packages to translator communities ?

>  I think we should also take a better look at how the
> built-in help facilities generate documentation and other displayable
> strings from symbol names.  Macros such as define-minor-mode should
> also be scrutinized to see if there are some special problems there.
> 
> Once this is done, the methodology decided, and the necessary tools
> are available, the rest is just more or less mechanical work to
> convert more and more parts of Emacs.

Can't we start with a survey of the strings we want extracted in a given number of emacs core packages ?


Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune





* Re: Emacs i18n
  2019-03-08 15:11                                       ` Jean-Christophe Helary
@ 2019-03-08 20:11                                         ` Eli Zaretskii
  2019-03-09  2:44                                           ` Jean-Christophe Helary
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08 20:11 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: emacs-devel

> From: Jean-Christophe Helary <brandelune@gmail.com>
> Date: Sat, 9 Mar 2019 00:11:24 +0900
> 
> > Extraction is just a technicality, it can be done in either of several
> > possible ways.
> 
> Sure. I just meant that l10n issues (i.e. PO "efficiency", etc.) are already solved, but i18n in general has to be implemented from scratch.

Not sure I understand: what part(s) would we need to implement from
scratch?  We already have the capability of inserting arbitrary
non-ASCII text into any buffer and displaying such text as echo area
messages.

> > and (b) how to arrange the message catalogs
> > so that they will be easy to maintain and update, given the modular
> > nature of Emacs.
> 
> I'm not sure what you mean by "how to arrange ..." Do you mean: how to provide the l10n packages to translator communities ?

No, I mean how many catalogs should we have and what should be their
granularity.  Also, how to merge several catalogs (the need for this
might disappear if, for example, we decide that each .el file will
have its own catalog), and how to load catalogs on demand when the
corresponding code is loaded/executed.

> Can't we start with a survey of the strings we want extracted in a given number of emacs core packages ?

How would such a survey help us?  We generally want all of the strings
that are displayed to the user translated.  We don't need any survey
for that decision.

* Re: Emacs i18n
  2019-03-08 20:11                                         ` Eli Zaretskii
@ 2019-03-09  2:44                                           ` Jean-Christophe Helary
  2019-03-09  6:40                                             ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Jean-Christophe Helary @ 2019-03-09  2:44 UTC (permalink / raw)
  To: emacs-devel

> On Mar 9, 2019, at 5:11, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Jean-Christophe Helary <brandelune@gmail.com>
>> Date: Sat, 9 Mar 2019 00:11:24 +0900
>> 
>>> Extraction is just a technicality, it can be done in either of several possible ways.
>> 
>> Sure. I just meant that l10n issues (i.e. PO "efficiency", etc.) are already solved, but i18n in general has to be implemented from scratch.
> 
> Not sure I understand: what part(s) would we need to implement from scratch?  We already have the capability of inserting arbitrary non-ASCII text into any buffer and displaying such text as echo area messages.

What I mean by "from scratch" is that we have the possibility to extract text and insert text, but i18n is nonexistent in emacs. So we have to build an i18n system that works for emacs and that does not exist yet, at all.

Also, the "how to load catalogs on demand" point that you mention below is part of i18n and as you seem to say has to be developed from scratch.

>>> and (b) how to arrange the message catalogs so that they will be easy to maintain and update, given the modular nature of Emacs.
>> 
>> I'm not sure what you mean by "how to arrange ..." Do you mean: how to provide the l10n packages to translator communities ?
> 
> No, I mean how many catalogs should we have and what should be their granularity.

Isn't that related to the below item ?

> Also, how to merge several catalogs (the need for this might disappear if, for example, we decide that each .el file will have its own catalog),

Won't this depend on the extracting tool's options ? And wouldn't that be more practical in the first place to not merge anything but have one catalog per .el file ? (practical in terms of translation/testing/management, as far as I can tell from experience, etc.)

> and how to load catalogs on demand when the corresponding code is loaded/executed.

I guess you mean the technicalities involved in the obvious (?) "we check the user preferred locale and display the catalog corresponding to that locale" ?

>> Can't we start with a survey of the strings we want extracted in a given number of emacs core packages ?
> 
> How would such a survey help us?  We generally want all of the strings that are displayed to the user translated.  We don't need any survey for that decision.

Of course, but a survey (sorry, I don't have a better word) of a few packages can help us see the workload, build prototypes, test them, establish best practices for developers, etc.

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune

* Re: Emacs i18n
  2019-03-09  2:44                                           ` Jean-Christophe Helary
@ 2019-03-09  6:40                                             ` Eli Zaretskii
  2019-03-09  8:37                                               ` Michael Albinus
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09  6:40 UTC (permalink / raw)
  To: Jean-Christophe Helary; +Cc: emacs-devel

> From: Jean-Christophe Helary <brandelune@gmail.com>
> Date: Sat, 9 Mar 2019 11:44:09 +0900
> 
> >> Sure. I just meant that l10n issues (i.e. PO "efficiency", etc.) are already solved, but i18n in general has to be implemented from scratch.
> > 
> > Not sure I understand: what part(s) would we need to implement from scratch?  We already have the capability of inserting arbitrary non-ASCII text into any buffer and displaying such text as echo area messages.
> 
> What I mean by "from scratch" is that we have the possibility to extract text and insert text, but i18n is nonexistent in emacs. So we have to build an i18n system that works for emacs and that does not exist yet, at all.

I don't see how we can start implementing before deciding what and how
to implement.  This discussion hopefully will eventually lead to such
decisions.

> Also, the "how to load catalogs on demand" point that you mention below is part of i18n and as you seem to say has to be developed from scratch.

If we decide that the gettext way is not entirely appropriate, yes.
But we didn't make that decision yet.

> > Also, how to merge several catalogs (the need for this might disappear if, for example, we decide that each .el file will have its own catalog),
> 
> Won't this depend on the extracting tool's options ?

Not directly, no.  It's actually the other way around: we should first
decide how to arrange the catalogs, and only after that see what
tools/options to use for that.

> And wouldn't that be more practical in the first place to not merge anything but have one catalog per .el file ? (practical in terms of translation/testing/management, as far as I can tell from experience, etc.)

If you are following the discussion, you know that not everyone agrees
with that.  There are advantages in having just one catalog or a small
number of large ones.

> > and how to load catalogs on demand when the corresponding code is loaded/executed.
> 
> I guess you mean the technicalities involved in the obvious (?) "we check the user preferred locale and display the catalog corresponding to that locale" ?

I said "load", not "display".  If you have one catalog per .el file,
when do you load it into memory and when, if ever, do you unload it?
Loading everything at the start would be uneconomical, to say the
least.
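
To make the load/unload question concrete, here is a minimal sketch of on-demand catalog loading, assuming each domain is registered with a loader function that is called on first lookup. All names are hypothetical; a real loader would presumably read a compiled catalog file rather than build a table in Lisp.

```elisp
;;; Hypothetical sketch: load a domain's catalog on first use,
;;; and allow dropping it again.

(defvar my-i18n-catalog-loaders (make-hash-table :test 'equal)
  "Map a domain name to a function that returns its catalog.")

(defvar my-i18n-loaded-catalogs (make-hash-table :test 'equal)
  "Catalogs that have already been loaded, keyed by domain name.")

(defun my-i18n-register-catalog (domain loader)
  "Arrange for LOADER to be called the first time DOMAIN is used."
  (puthash domain loader my-i18n-catalog-loaders))

(defun my-i18n-catalog (domain)
  "Return DOMAIN's catalog, loading it if needed; nil if unknown."
  (or (gethash domain my-i18n-loaded-catalogs)
      (let ((loader (gethash domain my-i18n-catalog-loaders)))
        (when loader
          ;; `puthash' returns the value it stores.
          (puthash domain (funcall loader) my-i18n-loaded-catalogs)))))

(defun my-i18n-unload-catalog (domain)
  "Forget DOMAIN's loaded catalog; it is reloaded on next use."
  (remhash domain my-i18n-loaded-catalogs))

(defun my-i18n-lookup (domain msgid)
  "Translate MSGID in DOMAIN, or return MSGID unchanged."
  (let ((catalog (my-i18n-catalog domain)))
    (or (and catalog (gethash msgid catalog)) msgid)))
```

The sketch deliberately leaves open when, if ever, `my-i18n-unload-catalog` should be called; that is the policy question raised above.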

> >> Can't we start with a survey of the strings we want extracted in a given number of emacs core packages ?
> > 
> > How would such a survey help us?  We generally want all of the strings that are displayed to the user translated.  We don't need any survey for that decision.
> 
> Of course, but a survey (sorry, I don't have a better word) of a few packages can help us see the workload, build prototypes, test them, establish best practices for developers, etc.

I don't think we have reached the point where building prototypes is
useful, since we don't yet have the basic design decisions for
prototyping.



* Re: Emacs i18n
  2019-03-09  6:40                                             ` Eli Zaretskii
@ 2019-03-09  8:37                                               ` Michael Albinus
  2019-03-09 10:45                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 145+ messages in thread
From: Michael Albinus @ 2019-03-09  8:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Jean-Christophe Helary, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

>> And wouldn't that be more practical in the first place to not merge
>> anything but have one catalog per .el file ? (practical in terms of
>> translation/testing/management, as far as I can tell from
>> experience, etc.)
>
> If you are following the discussion, you know that not everyone agrees
> with that.  There are advantages in having just one catalog or a small
> number of large ones.

One catalog for the whole Emacs is not appropriate for packages with a
life outside Emacs core, like org or Tramp.

Best regards, Michael.



* Re: Emacs i18n
  2019-03-09  8:37                                               ` Michael Albinus
@ 2019-03-09 10:45                                                 ` Eli Zaretskii
  2019-03-09 11:27                                                   ` Michael Albinus
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 10:45 UTC (permalink / raw)
  To: Michael Albinus; +Cc: brandelune, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Jean-Christophe Helary <brandelune@gmail.com>,  emacs-devel@gnu.org
> Date: Sat, 09 Mar 2019 09:37:11 +0100
> 
> > If you are following the discussion, you know that not everyone agrees
> > with that.  There are advantages in having just one catalog or a small
> > number of large ones.
> 
> One catalog for the whole Emacs is not appropriate for packages with a
> life outside Emacs core, like org or Tramp.

Yes, but the question still stands whether those packages which _are_
maintained only in the Emacs repository should have one catalog or
more than one, and if more than one, then at which granularity.



* Re: Emacs i18n
  2019-03-09 10:45                                                 ` Eli Zaretskii
@ 2019-03-09 11:27                                                   ` Michael Albinus
  2019-03-09 17:23                                                     ` Eli Zaretskii
  2019-03-09 19:22                                                     ` Paul Eggert
  0 siblings, 2 replies; 145+ messages in thread
From: Michael Albinus @ 2019-03-09 11:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: brandelune, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

> Yes, but the question still stands whether those packages which _are_
> maintained only in the Emacs repository should have one catalog or
> more than one, and if more than one, then at which granularity.

Packages with their own subdirectory (e.g., gnus, vc) should have their own
catalog. Tramp + ange-ftp.el could get their own subdirectory + catalog as
well (these are 17 *.el files).

Best regards, Michael.



* Re: Emacs i18n
  2019-03-09 11:27                                                   ` Michael Albinus
@ 2019-03-09 17:23                                                     ` Eli Zaretskii
  2019-03-09 19:55                                                       ` Paul Eggert
  2019-03-09 20:04                                                       ` Michael Albinus
  2019-03-09 19:22                                                     ` Paul Eggert
  1 sibling, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 17:23 UTC (permalink / raw)
  To: Michael Albinus; +Cc: brandelune, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: brandelune@gmail.com,  emacs-devel@gnu.org
> Date: Sat, 09 Mar 2019 12:27:04 +0100
> 
> > Yes, but the question still stands whether those packages which _are_
> > maintained only in the Emacs repository should have one catalog or
> > more than one, and if more than one, then at which granularity.
> 
> Packages with their own subdirectory (e.g., gnus, vc) should have their own
> catalog. Tramp + ange-ftp.el could get their own subdirectory + catalog as
> well (these are 17 *.el files).

So you are saying that we should have a single catalog for all the
other .el files, and load it unconditionally in every Emacs session?
That'd waste memory, no?  We have more than 1500 Lisp files in Emacs.



* Re: Emacs i18n
  2019-03-09 17:23                                                     ` Eli Zaretskii
@ 2019-03-09 19:55                                                       ` Paul Eggert
  2019-03-09 20:07                                                         ` Eli Zaretskii
  2019-03-09 20:04                                                       ` Michael Albinus
  1 sibling, 1 reply; 145+ messages in thread
From: Paul Eggert @ 2019-03-09 19:55 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii wrote:
> So you are saying that we should have a single catalog for all the
> other .el files, and load it unconditionally in every Emacs session?
> That'd waste memory, no?
Assuming we use GNU gettext, it'd consume virtual memory but not as much 
physical memory, as GNU gettext mmaps the message catalog (using PROT_READ so 
that it's read-only and the physical data can be shared). Only pages containing 
actual translations should need to be brought into physical memory (along with 
the indexes to these pages).

The total amount of virtual memory would depend on the catalog size. A 
reasonable upper bound for current Emacs master would be 61 MB (the sum of sizes 
of all of Emacs's .el files). Although 61 MB is nontrivial, there should be 
little trouble fitting it into virtual memory even on a 32-bit platform.



* Re: Emacs i18n
  2019-03-09 19:55                                                       ` Paul Eggert
@ 2019-03-09 20:07                                                         ` Eli Zaretskii
  2019-03-09 20:47                                                           ` Paul Eggert
  0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 20:07 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 9 Mar 2019 11:55:15 -0800
> 
> Eli Zaretskii wrote:
> > So you are saying that we should have a single catalog for all the
> > other .el files, and load it unconditionally in every Emacs session?
> > That'd waste memory, no?
> Assuming we use GNU gettext, it'd consume virtual memory but not as much 
> physical memory, as GNU gettext mmaps the message catalog (using PROT_READ so 
> that it's read-only and the physical data can be shared). Only pages containing 
> actual translations should need to be brought into physical memory (along with 
> the indexes to these pages).
> 
> The total amount of virtual memory would depend on the catalog size. A 
> reasonable upper bound for current Emacs master would be 61 MB (the sum of sizes 
> of all of Emacs's .el files). Although 61 MB is nontrivial, there should be 
> little trouble fitting it into virtual memory even on a 32-bit platform.

The same is true for the Lisp files themselves.  Yet we don't load
them all in advance, because that's simply not economical.



* Re: Emacs i18n
  2019-03-09 20:07                                                         ` Eli Zaretskii
@ 2019-03-09 20:47                                                           ` Paul Eggert
  0 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-09 20:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:
>> The total amount of virtual memory would depend on the catalog size. A
>> reasonable upper bound for current Emacs master would be 61 MB (the sum of sizes
>> of all of Emacs's .el files). Although 61 MB is nontrivial, there should be
>> little trouble fitting it into virtual memory even on a 32-bit platform.

> The same is true for the Lisp files themselves.  Yet we don't load
> them all in advance, because that's simply not economical.

No, it would be quite economical if we put all the .elc files into one big file 
that was mmapped in and then used lazily (which is what GNU gettext does for 
message catalogs). Emacs doesn't do that because historically it developed 
another way to use .elc files, a way that is good enough in practice even if it 
might not be as efficient as the mmap approach.

The GNU gettext library was historically developed to use mmap, and is good 
enough in practice for Emacs as-is. None of the issues discussed in this thread 
mean that we should redesign the gettext library, or split up message catalogs 
only for performance reasons. On the contrary, splitting things up (or rewriting 
the gettext library in Elisp) is likely to make things slower.

* Re: Emacs i18n
  2019-03-09 17:23                                                     ` Eli Zaretskii
  2019-03-09 19:55                                                       ` Paul Eggert
@ 2019-03-09 20:04                                                       ` Michael Albinus
  2019-03-09 20:14                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Michael Albinus @ 2019-03-09 20:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: brandelune, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

> So you are saying that we should have a single catalog for all the
> other .el files, and load it unconditionally in every Emacs session?
> That'd waste memory, no?  We have more than 1500 Lisp files in Emacs.

I haven't said this. I have no strong opinion about the other lisp
files.

Best regards, Michael.



* Re: Emacs i18n
  2019-03-09 20:04                                                       ` Michael Albinus
@ 2019-03-09 20:14                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 20:14 UTC (permalink / raw)
  To: Michael Albinus; +Cc: brandelune, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: brandelune@gmail.com,  emacs-devel@gnu.org
> Date: Sat, 09 Mar 2019 21:04:43 +0100
> 
> > So you are saying that we should have a single catalog for all the
> > other .el files, and load it unconditionally in every Emacs session?
> > That'd waste memory, no?  We have more than 1500 Lisp files in Emacs.
> 
> I haven't said this. I have no strong opinion about the other lisp
> files.

Oh, I agree that Gnus should have only one catalog, as should Tramp,
Calc, Org, ERC, NXML, Rmail, etc.  IOW, if a package has several Lisp
files, it should still have no more than one catalog.
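
As an illustration of "one catalog per package, however many files it has", a hypothetical rule could derive the text domain from the directory a file lives in under lisp/, falling back to one shared catalog for everything else. The directory list and the domain names are assumptions for the sake of the example, not an agreed layout.

```elisp
;;; Hypothetical sketch: pick a text domain from a Lisp file's directory.

(defvar my-i18n-package-directories '("calc" "erc" "gnus" "nxml" "org" "vc")
  "Subdirectories of lisp/ that would each get one catalog of their own.")

(defun my-i18n-file-domain (file)
  "Return the text domain for the Lisp source FILE.
Files directly inside one of `my-i18n-package-directories' share that
package's domain; every other file uses the shared \"emacs\" domain."
  (let* ((parent (file-name-directory file))
         (dir (and parent
                   (file-name-nondirectory (directory-file-name parent)))))
    (if (member dir my-i18n-package-directories)
        (concat "emacs-" dir)
      "emacs")))
```

Under this rule Tramp lands in the shared domain, because it lives in lisp/net/ alongside unrelated files; that is the situation behind Michael's suggestion to give Tramp and ange-ftp.el a subdirectory of their own.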



* Re: Emacs i18n
  2019-03-09 11:27                                                   ` Michael Albinus
  2019-03-09 17:23                                                     ` Eli Zaretskii
@ 2019-03-09 19:22                                                     ` Paul Eggert
  2019-03-09 19:39                                                       ` Eli Zaretskii
                                                                         ` (2 more replies)
  1 sibling, 3 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-09 19:22 UTC (permalink / raw)
  To: Michael Albinus, Eli Zaretskii; +Cc: brandelune, emacs-devel

Michael Albinus wrote:
> Packages with their own subdirectory (e.g., gnus, vc) should have their own
> catalog.

I'm not sure I agree.

Message catalogs are primarily of interest to translators and installers, not 
programmers. Assuming we're using the gettext machinery (a pretty safe 
assumption, as why reinvent the wheel?), the set of messages to be translated 
will be maintained automatically: programmers shouldn't care how many catalogs 
there are, or how they're updated.

Other GNU packages generally go with one large catalog, for several reasons. For 
example, translators can batch their work; similar translations can be shared 
more easily and reliably; and installation is simpler and a bit faster.

A few packages do have multiple catalogs. This is intended for convenience in 
installation, not for convenience to developers. For example, GNU gettext has 
two catalogs, one for the gettext runtime library (used by applications in 
production) and one for gettext tools (used by developers when extracting or 
doing translations). That way, operating systems packagers can install just the 
first message catalog on systems where users are not developers.

In practice, though, this multiple-catalog approach hasn't proved to be all that 
useful. Debian and Fedora both put the two gettext catalogs into one package. 
Ubuntu has a package language-pack-fr-base that contains French translations for
several core packages, including both gettext catalogs, and similarly for other 
languages. Fedora includes all translations of both gettext catalogs in its 
'gettext' package. So in hindsight, it seems to have been overkill for 'gettext' 
to have two translation catalogs.

With this in mind, I think it unlikely that OS packagers would find it useful 
for Emacs to maintain multiple message catalogs for each source subdirectory.


* Re: Emacs i18n
  2019-03-09 19:22                                                     ` Paul Eggert
@ 2019-03-09 19:39                                                       ` Eli Zaretskii
  2019-03-09 20:48                                                         ` Paul Eggert
  2019-03-09 20:08                                                       ` Michael Albinus
  2019-03-10  3:09                                                       ` Richard Stallman
  2 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-09 19:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: michael.albinus, brandelune, emacs-devel

> Cc: brandelune@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 9 Mar 2019 11:22:12 -0800
> 
> Other GNU packages generally go with one large catalog, for several reasons. For 
> example, translators can batch their work; similar translations can be shared 
> more easily and reliably; and installation is simpler and a bit faster.
> 
> A few packages do have multiple catalogs.

Any example of a package whose 90% gets loaded piecemeal on demand?
Out of ~500 packages that Emacs has, how many are loaded into our
"usual" session?  And if we don't load all of those 500, why should we
load their message catalogs?

This is one of those aspects that make Emacs so different from other
localized programs.  I think the difference really justifies
separating the catalogs by package.




* Re: Emacs i18n
  2019-03-09 19:39                                                       ` Eli Zaretskii
@ 2019-03-09 20:48                                                         ` Paul Eggert
  0 siblings, 0 replies; 145+ messages in thread
From: Paul Eggert @ 2019-03-09 20:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, brandelune, emacs-devel

Eli Zaretskii wrote:
>> A few packages do have multiple catalogs.
> Any example of a package whose 90% gets loaded piecemeal on demand?

I'm not quite sure what you mean by "whose 90% gets loaded piecemeal on demand".
However, it's routine for a program to retrieve only a few translations from a 
much larger catalog, so that most of the catalog is not loaded into physical 
RAM. The GNU gettext library is tuned for this sort of thing, and I see no 
reason why Emacs would pose important performance challenges to it.


* Re: Emacs i18n
  2019-03-09 19:22                                                     ` Paul Eggert
  2019-03-09 19:39                                                       ` Eli Zaretskii
@ 2019-03-09 20:08                                                       ` Michael Albinus
  2019-03-10  3:09                                                       ` Richard Stallman
  2 siblings, 0 replies; 145+ messages in thread
From: Michael Albinus @ 2019-03-09 20:08 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, brandelune, emacs-devel

Paul Eggert <eggert@cs.ucla.edu> writes:

Hi Paul,

> With this in mind, I think it unlikely that OS packagers would find it
> useful for Emacs to maintain multiple message catalogs for each source
> subdirectory.

There are packages which live outside Emacs. There are packages which
are also available as ELPA core packages. At least these packages need
their own catalog.

For all other packages I'm undecided.

Best regards, Michael.




* Re: Emacs i18n
  2019-03-09 19:22                                                     ` Paul Eggert
  2019-03-09 19:39                                                       ` Eli Zaretskii
  2019-03-09 20:08                                                       ` Michael Albinus
@ 2019-03-10  3:09                                                       ` Richard Stallman
  2019-03-10 13:38                                                         ` Eli Zaretskii
  2 siblings, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-10  3:09 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, michael.albinus, brandelune, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Packages with their own subdirectory (e.g., gnus, vc) should have their own
  > > catalog.

  > I'm not sure I agree.

Let's start out without any particular rule about this, and let people
try various things.  That way we will work out what is useful.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Emacs i18n
  2019-03-10  3:09                                                       ` Richard Stallman
@ 2019-03-10 13:38                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-10 13:38 UTC (permalink / raw)
  To: rms; +Cc: eggert, michael.albinus, brandelune, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 09 Mar 2019 22:09:09 -0500
> Cc: eliz@gnu.org, michael.albinus@gmx.de, brandelune@gmail.com,
> 	emacs-devel@gnu.org
> 
> Let's start out without any particular rule about this, and let people
> try various things.  That way we will work out what is useful.

Btw, translating messages also means that the likes of this:

  static bool
  set_message_1 (ptrdiff_t a1, Lisp_Object string)
  {
    [...]
    if (!NILP (BVAR (current_buffer, bidi_display_reordering)))
      bset_bidi_paragraph_direction (current_buffer, Qleft_to_right);

will need to depend on the current UI language, instead of being
hard-coded, so the value should probably be recorded in some file (the
message catalog?).  Likewise the direction of menu items and tool-bar
buttons, if/when we get to translating menus and the tool bar.




* Re: Emacs i18n
  2019-03-07 22:29                                 ` Juri Linkov
  2019-03-08  1:48                                   ` Jean-Christophe Helary
@ 2019-03-08  7:37                                   ` Eli Zaretskii
  2019-03-09  3:12                                     ` Richard Stallman
  1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-08  7:37 UTC (permalink / raw)
  To: Juri Linkov; +Cc: eggert, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: rms@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 08 Mar 2019 00:29:17 +0200
> 
> > We still should try to use .po files for them, if at all possible,
> > and perhaps also the gettext code that supports looking up strings
> > in .gmo catalogs generated from .po.
> 
> The PO format is best suited for translation of one-liners like
> messages and menu items, but I doubt that the PO format would be
> the most efficient implementation for multi-line doc strings since
> gettext uses the whole text of the doc string as a key to translation.

I'm not sure I understand why the length of the string is an important
factor here.  Can you explain?  If the problem is with the efficiency
of gettext implementation of indexing, then we could have our own
indexing method.

> Whereas more efficient would be to use a Lisp symbol (function or
> variable name) as a translation key.

A key other than the original string would mean abandoning the PO
format.  Any deviation from PO would mean major PITA for translation
teams, so we should make sure the reason for such a deviation is a
very good reason.  I'm not yet sure we have such a good reason.




* Re: Emacs i18n
  2019-03-08  7:37                                   ` Eli Zaretskii
@ 2019-03-09  3:12                                     ` Richard Stallman
  0 siblings, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-09  3:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, emacs-devel, juri

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

We can handle translating doc strings just like the other
translations.  Or we could have a special system for doc strings -- if
that proves more convenient.  Since doc strings are special in so many
ways, a special system might prove more convenient.  Or it might not.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


* Re: Emacs i18n
  2019-03-07 14:48                               ` Eli Zaretskii
  2019-03-07 22:29                                 ` Juri Linkov
@ 2019-03-08  4:11                                 ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-08  4:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > I think it would be most natural to handle doc strings through
  > > a special mechanism.

  > Up to a point, perhaps.  We still should try to use .po files for
  > them, if at all possible, and perhaps also the gettext code that
  > supports looking up strings in .gmo catalogs generated from .po.

I agree completely.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-05  2:09                       ` Paul Eggert
  2019-03-05 21:58                         ` Emacs i18n Juri Linkov
@ 2019-03-06  2:09                         ` Richard Stallman
  1 sibling, 0 replies; 145+ messages in thread
From: Richard Stallman @ 2019-03-06  2:09 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eliz, juri, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I don't see why it wouldn't work for Elisp. The gettext infrastructure
  > allows multiple message catalogs in the same session.

Please try it and see.

If we need to replace gettext.c with Lisp code, that won't be hard,
When we come to that point, people will be enthusiastic about
translations and someone will do it.

What I think is crucial is to be compatible with the gettext
infrastructure for writing, maintaining and distributing translations.
Moving the code in Emacs to Lisp would not conflict with that.
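For reference, the multiple-catalog facility mentioned above can be sketched in C with the standard gettext calls bindtextdomain and dgettext. This is a minimal sketch, not Emacs code: the domain names "gnus" and "tramp" and the /usr/share/locale directory are hypothetical, and on systems without glibc the program must be linked with -lintl. With no .mo files installed, dgettext simply returns the msgid, so the sketch runs anywhere.

```c
#include <libintl.h>  /* bindtextdomain, dgettext */
#include <locale.h>   /* setlocale */

/* Hypothetical domain names standing in for per-package catalogs.  */
#define GNUS_DOMAIN  "gnus"
#define TRAMP_DOMAIN "tramp"

/* Tell gettext where each domain's catalogs live; lookups use
   DIR/LOCALE/LC_MESSAGES/DOMAIN.mo.  The directory is an assumption,
   and nothing needs to exist there: a missing catalog just means
   the original string is returned untranslated.  */
static void
init_catalogs (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (GNUS_DOMAIN, "/usr/share/locale");
  bindtextdomain (TRAMP_DOMAIN, "/usr/share/locale");
}

/* Look up MSGID in the catalog of DOMAIN.  dgettext names its domain
   explicitly, so it does not depend on the process-wide default
   domain set by textdomain().  */
static const char *
package_message (const char *domain, const char *msgid)
{
  return dgettext (domain, msgid);
}
```

Because each dgettext call names its domain, several per-package catalogs can coexist in one process; whether Emacs should actually split its catalogs that way is the open question in this thread.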

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-04 19:07                     ` Eli Zaretskii
  2019-03-05  2:09                       ` Paul Eggert
@ 2019-03-05  2:49                       ` Richard Stallman
  2019-03-05  3:31                         ` Eli Zaretskii
  1 sibling, 1 reply; 145+ messages in thread
From: Richard Stallman @ 2019-03-05  2:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: juri, eggert, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I'm saying that IMO it makes no sense at all to do this only for C.
  > The infrastructure used for that will most probably not work for Lisp,
  > let alone allow separate translations for separate packages to be
  > brought together and used in the same Emacs session.

I tend to agree.  But I think that a first try for the infrastructure
in Lisp won't be hard, and will enable us to get volunteers involved
in contributing.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted)
  2019-03-05  2:49                       ` Richard Stallman
@ 2019-03-05  3:31                         ` Eli Zaretskii
  0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2019-03-05  3:31 UTC (permalink / raw)
  To: rms; +Cc: juri, eggert, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org, juri@linkov.net
> Date: Mon, 04 Mar 2019 21:49:19 -0500
> 
> I think that a first try for the infrastructure in Lisp won't be
> hard, and will enable us to get volunteers involved in
> contributing.

I certainly hope so.




end of thread, other threads:[~2019-04-24 20:18 UTC | newest]

Thread overview: 145+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <87o97aq6gz.fsf@jidanni.org>
     [not found] ` <87tvgoud56.fsf@mail.linkov.net>
     [not found]   ` <83o96wk2mi.fsf@gnu.org>
     [not found]     ` <87k1hjfvjd.fsf@mail.linkov.net>
     [not found]       ` <E1gzZKP-0000kS-Iw@fencepost.gnu.org>
     [not found]         ` <871s3p0zdz.fsf@mail.linkov.net>
2019-03-03  3:04           ` bug#34520: delete-matching-lines should report how many lines it deleted Richard Stallman
2019-03-03 15:31             ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Eli Zaretskii
2019-03-03 20:57               ` Emacs i18n Juri Linkov
2019-03-04  1:46                 ` Jean-Christophe Helary
2019-03-06  9:38                   ` Elias Mårtenson
2019-03-06 11:23                     ` Jean-Christophe Helary
2019-03-21 20:33                   ` Clément Pit-Claudel
2019-03-21 20:50                     ` Eli Zaretskii
2019-03-21 21:03                       ` Clément Pit-Claudel
2019-03-21 21:21                         ` Jean-Christophe Helary
2019-03-21 21:34                           ` Clément Pit-Claudel
2019-03-21 21:56                             ` Jean-Christophe Helary
2019-03-21 22:05                               ` Clément Pit-Claudel
2019-03-21 23:46                                 ` Jean-Christophe Helary
2019-03-22  8:22                         ` Eli Zaretskii
2019-03-22 16:10                           ` Clément Pit-Claudel
2019-03-22 16:35                             ` Eli Zaretskii
2019-03-22 17:16                               ` Clément Pit-Claudel
2019-03-22 17:35                                 ` Eli Zaretskii
2019-03-22 23:17                                   ` Clément Pit-Claudel
2019-03-21 21:17                     ` Jean-Christophe Helary
2019-03-21 21:59                     ` Juri Linkov
2019-03-22  8:22                       ` Eli Zaretskii
2019-03-23 21:50                         ` Juri Linkov
2019-03-24  3:36                           ` Eli Zaretskii
2019-03-24 21:55                             ` Juri Linkov
2019-03-24 23:31                               ` Jean-Christophe Helary
2019-03-25 21:32                                 ` Juri Linkov
2019-03-25 22:31                                   ` Paul Eggert
2019-03-26 16:11                                     ` Eli Zaretskii
2019-03-26 16:22                                       ` Stefan Monnier
2019-03-26 16:55                                         ` Eli Zaretskii
2019-03-26 22:35                                       ` Paul Eggert
2019-03-27  3:43                                         ` Eli Zaretskii
2019-03-28 14:56                                           ` Clément Pit-Claudel
2019-03-28 15:52                                             ` Eli Zaretskii
2019-03-27  2:34                                       ` Jean-Christophe Helary
2019-03-26 23:16                                     ` Juri Linkov
2019-03-27  1:35                                       ` Paul Eggert
2019-04-24  6:39                                       ` Jean-Christophe Helary
2019-04-24 20:18                                         ` Juri Linkov
2019-03-25  3:35                               ` Eli Zaretskii
2019-03-25  9:04                                 ` Jean-Christophe Helary
2019-03-25 21:02                                 ` Juri Linkov
2019-03-26  3:27                                   ` Eli Zaretskii
2019-03-27 23:06                                     ` Richard Stallman
2019-03-25 10:52                               ` Mattias Engdegård
2019-03-25 15:37                                 ` Eli Zaretskii
2019-03-25 21:11                                 ` Juri Linkov
2019-03-25 22:05                                   ` Mattias Engdegård
2019-03-27 21:22                                     ` Juri Linkov
2019-03-28 11:03                                       ` Mattias Engdegård
2019-03-04  3:27               ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
2019-03-04 16:36                 ` Eli Zaretskii
2019-03-04 18:37                   ` Paul Eggert
2019-03-04 19:07                     ` Eli Zaretskii
2019-03-05  2:09                       ` Paul Eggert
2019-03-05 21:58                         ` Emacs i18n Juri Linkov
2019-03-06  2:16                           ` Richard Stallman
2019-03-06 18:15                             ` Eli Zaretskii
2019-03-06 19:47                               ` Paul Eggert
2019-03-06 20:19                                 ` Eli Zaretskii
2019-03-07  1:52                                   ` Paul Eggert
2019-03-07  3:37                                     ` Eli Zaretskii
2019-03-08  4:07                                       ` Richard Stallman
2019-03-08  8:16                                         ` Eli Zaretskii
2019-03-08  4:07                                 ` Richard Stallman
2019-03-08  4:33                                   ` Elias Mårtenson
2019-03-08  8:22                                     ` Eli Zaretskii
2019-03-09  3:11                                     ` Richard Stallman
2019-03-09  7:54                                       ` Paul Eggert
2019-03-09 10:30                                         ` Eli Zaretskii
2019-03-10  3:05                                         ` Richard Stallman
2019-03-10  6:07                                           ` Paul Eggert
2019-03-11  1:20                                             ` Richard Stallman
2019-03-11  3:52                                               ` Paul Eggert
2019-03-12  3:31                                                 ` Richard Stallman
2019-03-12  3:31                                                 ` Richard Stallman
2019-03-10  8:45                                           ` Yuri Khan
2019-03-10  3:05                                         ` Richard Stallman
2019-03-10  6:14                                           ` Paul Eggert
2019-03-10  3:05                                         ` Richard Stallman
2019-03-07  3:42                               ` Richard Stallman
2019-03-07 14:46                                 ` Eli Zaretskii
2019-03-07 17:19                                   ` Paul Eggert
2019-03-07 18:24                                     ` martin rudalics
2019-03-07 18:44                                       ` Paul Eggert
2019-03-07 20:22                                     ` Eli Zaretskii
2019-03-07 22:25                                       ` Paul Eggert
2019-03-08  7:29                                         ` Eli Zaretskii
2019-03-08  4:18                                       ` Richard Stallman
2019-03-08  4:11                                   ` Richard Stallman
2019-03-06 17:30                           ` Eli Zaretskii
2019-03-06 18:09                           ` Eli Zaretskii
2019-03-06 19:39                             ` Paul Eggert
2019-03-06 19:49                               ` Eli Zaretskii
2019-03-07  1:33                                 ` Paul Eggert
2019-03-07  3:30                                   ` Eli Zaretskii
2019-03-07 16:06                                     ` Paul Eggert
2019-03-07  4:35                                   ` Jean-Christophe Helary
2019-03-07 16:04                                     ` Paul Eggert
2019-03-08  4:09                                     ` Richard Stallman
2019-03-11 21:48                                     ` Juri Linkov
2019-03-11 22:51                                       ` Paul Eggert
2019-03-12 21:45                                         ` Juri Linkov
2019-03-17 21:23                                           ` Juri Linkov
2019-03-18 21:20                                             ` Juri Linkov
2019-03-18 21:55                                               ` Paul Eggert
2019-03-19 20:40                                                 ` Juri Linkov
2019-03-11 23:59                                       ` Jean-Christophe Helary
2019-03-12  9:16                                       ` Michael Albinus
2019-03-06 19:47                             ` Paul Eggert
2019-03-06 20:21                               ` Eli Zaretskii
2019-03-07  1:43                                 ` Paul Eggert
2019-03-07  3:31                                   ` Eli Zaretskii
2019-03-07  3:44                             ` Richard Stallman
2019-03-07 14:48                               ` Eli Zaretskii
2019-03-07 22:29                                 ` Juri Linkov
2019-03-08  1:48                                   ` Jean-Christophe Helary
2019-03-08  8:08                                     ` Eli Zaretskii
2019-03-08 15:11                                       ` Jean-Christophe Helary
2019-03-08 20:11                                         ` Eli Zaretskii
2019-03-09  2:44                                           ` Jean-Christophe Helary
2019-03-09  6:40                                             ` Eli Zaretskii
2019-03-09  8:37                                               ` Michael Albinus
2019-03-09 10:45                                                 ` Eli Zaretskii
2019-03-09 11:27                                                   ` Michael Albinus
2019-03-09 17:23                                                     ` Eli Zaretskii
2019-03-09 19:55                                                       ` Paul Eggert
2019-03-09 20:07                                                         ` Eli Zaretskii
2019-03-09 20:47                                                           ` Paul Eggert
2019-03-09 20:04                                                       ` Michael Albinus
2019-03-09 20:14                                                         ` Eli Zaretskii
2019-03-09 19:22                                                     ` Paul Eggert
2019-03-09 19:39                                                       ` Eli Zaretskii
2019-03-09 20:48                                                         ` Paul Eggert
2019-03-09 20:08                                                       ` Michael Albinus
2019-03-10  3:09                                                       ` Richard Stallman
2019-03-10 13:38                                                         ` Eli Zaretskii
2019-03-08  7:37                                   ` Eli Zaretskii
2019-03-09  3:12                                     ` Richard Stallman
2019-03-08  4:11                                 ` Richard Stallman
2019-03-06  2:09                         ` Emacs i18n (was: bug#34520: delete-matching-lines should report how many lines it deleted) Richard Stallman
2019-03-05  2:49                       ` Richard Stallman
2019-03-05  3:31                         ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
