* Word order in Guix l10n
From: Zhu Zihao @ 2020-12-15 10:53 UTC
To: guix-devel

Hi, Guix users!

Currently I'm putting my energy into Guix l10n (zh_CN). However, there's
a serious flaw in the current implementation of l10n.

AFAIK, Guix uses `format` from (ice-9 format) to format the template
string returned by `G_`. The template string of (ice-9 format) looks
similar to the format template defined in ANSI CL, which only supports
consuming format arguments **one by one**.

In CJK languages, word order is often different from English.

For example, consider the message

  "could not find bootstrap binary '~a' for system '~a'"

If we mark the first ~a as %1 and the second as %2, it should be
translated into Chinese like this:

  "无法找到用于引导 %2 系统的二进制文件 %1"

But currently this looks impossible, because we can't refer to a
positional argument in a format template.

My suggestion is to create a new format function that supports
referring to positional arguments, like

  (pos-format "~3 ~2 ~1" "foo" "bar" "baz") ;; => "baz bar foo"

and replace `format` in l10n with `pos-format` step by step. Maybe we
can create a function that combines `G_` and `format`, so we can change
its implementation details without breaking existing code.

Please let me know what you think, thanks :)

--
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao
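A minimal sketch of the proposed `pos-format`, assuming it is written as a thin rewrite layer over the absolute argument-jumping directive `~N@*` that (ice-9 format) already provides (the directive later messages in this thread point to). `pos-format` and its `~N` placeholder syntax are hypothetical, not part of Guix or Guile; the rewriting uses `regexp-substitute/global` from (ice-9 regex).

```scheme
(use-modules (ice-9 format)   ; full `format', including argument jumping
             (ice-9 regex))   ; regexp-substitute/global, match:substring

;; HYPOTHETICAL helper, not part of Guix or Guile.  Each 1-based "~N"
;; placeholder is rewritten into the (ice-9 format) absolute jump
;; "~K@*~a" with K = N - 1, since ~@* counts arguments from 0.
;; Naive on purpose: it would also rewrite "~N" prefixes of other
;; directives (e.g. the "~2" in "~2%") and does not handle "~~".
(define (pos-format template . args)
  (define (placeholder->jump m)
    (let ((n (string->number (match:substring m 1))))
      (string-append "~" (number->string (- n 1)) "@*~a")))
  (apply format #f
         (regexp-substitute/global #f "~([0-9]+)" template
                                   'pre placeholder->jump 'post)
         args))

(pos-format "~3 ~2 ~1" "foo" "bar" "baz")   ; => "baz bar foo"
```

Since every `~N` just becomes a jump followed by `~a`, nothing new is needed at run time. A likely drawback of a custom syntax like this is that gettext's `scheme-format` checks would presumably not understand `~N`, which argues for writing `~N@*~a` directly in translations.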
* Re: Word order in Guix l10n
From: Julien Lepiller @ 2020-12-15 12:25 UTC
To: guix-devel, Zhu Zihao

Even when translating to French, I sometimes feel the need to change
word order, but I end up finding a slightly unnatural way to preserve
the order of arguments. I don't have an example at hand, though.

I don't know enough about Guile to know how best to implement that (or
whether it exists already).
* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-18 15:04 UTC
To: Julien Lepiller
Cc: guix-devel, Zhu Zihao

Hi!

Julien Lepiller <julien@lepiller.eu> writes:

> Even when translating to French, I sometimes feel the need to change
> word order, but I end up finding a slightly unnatural way to preserve
> the order of arguments. I don't have an example at hand though.
>
> I don't know enough about guile to know how best to implement that (or
> if that exists already).

This looks like a real issue. I’m surprised this isn’t already
addressed, though: after all, ‘printf’ format strings have the same
problem, right? How does everyone else deal with that?

Thanks,
Ludo’.
* Re: Word order in Guix l10n
From: Arun Isaac @ 2020-12-18 18:03 UTC
To: Ludovic Courtès, Julien Lepiller
Cc: guix-devel, Zhu Zihao

Hi,

> This looks like a real issue. I’m surprised this isn’t already
> addressed though: after all, ‘printf’ format strings have the same
> problem, right? How does everyone else deal with that?

For C's printf format strings, gettext supports a special syntax to
specify argument order. See
https://www.gnu.org/software/gettext/manual/html_node/c_002dformat-Flag.html

A German example is provided on that page:

  "%2$d Zeichen lang ist die Zeichenkette `%1$s'"

Regards,
Arun
* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-22 15:00 UTC
To: Arun Isaac
Cc: guix-devel, Zhu Zihao

Hi,

Arun Isaac <arunisaac@systemreboot.net> writes:

>> This looks like a real issue. I’m surprised this isn’t already
>> addressed though: after all, ‘printf’ format strings have the same
>> problem, right? How does everyone else deal with that?
>
> For C's printf format strings, gettext supports special syntax to
> specify argument order. See
> https://www.gnu.org/software/gettext/manual/html_node/c_002dformat-Flag.html

Oh, I see.

> A German example is provided on that page.
>
> "%2$d Zeichen lang ist die Zeichenkette `%1$s'"

With (ice-9 format), as has been suggested before, we should be able to
make do with the “argument jumping” syntax (info "(guile) Formatted
Output"):

  (format #f "~1@*~d Zeichen lang ist die Zeichenkette `~0@*~a'" "ab" 2)

It’s a bit awkward, though, in particular because we have to jump to
the previous argument (0 and 1 here instead of 1 and 2).

Does xgettext support that syntax? We’ve had trouble with ~* before.

If it does, where should we use this syntax in lieu of the simpler
forms? Everywhere?

Thanks,
Ludo’.
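For reference, here is what that call evaluates to with Guile's (ice-9 format), next to the same arguments formatted by an English template. The English string is an illustrative stand-in modeled on the gettext manual's example, not a message taken from Guix.

```scheme
(use-modules (ice-9 format))   ; the full `format', with ~* directives

;; Arguments: 0 = "ab" (the string), 1 = 2 (its length).
;; Illustrative English msgid: arguments consumed in their natural order.
(format #f "string `~a' has ~d characters" "ab" 2)
;; => "string `ab' has 2 characters"

;; German translation: "~1@*" jumps to argument 1 before "~d",
;; then "~0@*" jumps back to argument 0 before "~a".
(format #f "~1@*~d Zeichen lang ist die Zeichenkette `~0@*~a'" "ab" 2)
;; => "2 Zeichen lang ist die Zeichenkette `ab'"
```

The caller passes the same arguments in the same order for both strings; only the translated template decides where each one lands.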
* Re: Word order in Guix l10n
From: Julien Lepiller @ 2020-12-22 15:06 UTC
To: Ludovic Courtès, Arun Isaac
Cc: guix-devel, Zhu Zihao

This specific syntax looks OK, but we need to limit ourselves to the
syntax common to Guile and Common Lisp, because that's what gettext
supports.

We should use this kind of syntax everywhere we have more than one
argument. Also, thinking about RTL (right-to-left) languages, it's
probably important for them, though I'm not sure how gettext works for
them.
* Re: Word order in Guix l10n
From: Miguel Ángel Arruga Vivas @ 2020-12-23 12:22 UTC
To: Julien Lepiller
Cc: guix-devel, Zhu Zihao

Hi Julien,

Julien Lepiller <julien@lepiller.eu> writes:

> This specific syntax looks ok, but we need to limit ourself to the
> common syntax between guile and lisp, because that's what gettext
> supports.

The issue with Guile's format is explained in [1], as the
implementation used by default follows SRFI-28 [2]. But at the surface
level there is no difference between the format from Common Lisp and
the one from (ice-9 format) [3]: both implementations are compatible
regarding numeric, iteration, selection and jump directives, to name a
few.

Other directives, such as the plural directive ~P, may or may not be
compatible, although most of them shouldn't be used in any case: not
because they could have compatibility problems, but because they don't
fit correctly into internationalized messages.

For example, most languages have irregular cases for plural formation,
some have more than two grammatical number categories, such as
singular/dual/plural, and some, such as Japanese, don't have an
equivalent category at all. That's exactly the use case of ngettext.
In my other mail I pointed out the pending issue in that area, which
concerns the omission of the numeric parameter, not its order, and
applies both to Common Lisp and to (ice-9 format).

> We should use this kind of syntax everywhere we have more than one
> argument.

I don't see the advantage of using jumps everywhere in the msgids.
Nonetheless, a TRANSLATORS: comment placed on the first string
appearing in the POT file, pointing to the section of the manual for
(ice-9 format), or even an explicit and detailed explanation of this
syntax, could be very helpful for translators. The attached patch does
this, although any suggestion or even a complete rewrite is welcome,
because I don't feel it's quite inspired.

> Also thinking about rtl languages, it's probably important
> for them, though I'm not sure how gettext works for them.

gettext-family functions only see byte arrays and return the
corresponding array, whose bytes are always placed in increasing
memory locations. Right-to-left handling is the responsibility of the
display layer, which sometimes includes the final formatting, but that
is an issue even with left-to-right languages such as French. For
example, this composition...

  (string-append translated ": " other-translated)

... produces weird results, or convoluted French translations, because
it isn't handled properly. A format string must be used here too,
because it must include the white space expected in French before the
colon:

  (format #f (_ "~a: ~a") translated other-translated)

Newlines are the only thing sometimes left out of the
internationalized composition, because the top-to-bottom convention is
followed; this is a limitation of the teletype/terminal interface,
though. Graphical interfaces aren't built with this limitation, and
"whole widgets" should be the unit of localization, which is usually
the case.

Happy hacking!
Miguel

[1] https://www.gnu.org/software/guile/manual/html_node/Simple-Output.html
[2] https://www.gnu.org/software/guile/manual/html_node/SRFI_002d28.html
[3] https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html

[-- Attachment #2: comment.patch --]

From 2615934a2c377858dce2a0410982287faed754a9 Mon Sep 17 00:00:00 2001
From: Miguel Ángel Arruga Vivas <rosen644835@gmail.com>
Date: Wed, 23 Dec 2020 13:07:38 +0100
Subject: [PATCH] nls: Add comment about format directives.

* gnu.scm (%try-use-modules): Add comment for translations. It should
be placed on the first string found by xgettext.
---
 gnu.scm | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gnu.scm b/gnu.scm
index f139531ef3..0e87b10eb2 100644
--- a/gnu.scm
+++ b/gnu.scm
@@ -78,6 +78,19 @@
                   (raise
                    (apply
                     make-compound-condition
+                    ;; TRANSLATORS: The scheme-format tag is used to identify
+                    ;; strings that contain format directives as specified
+                    ;; here:
+                    ;; https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html
+                    ;;
+                    ;; The goto/jump directive can be used to alter the order
+                    ;; of the arguments, either performing relative jumps with
+                    ;; ~N* and ~N:* (forward and backwards respectively) or
+                    ;; the absolute position of the argument can be used
+                    ;; (starting from 0) with ~N@*. When N isn't provided,
+                    ;; it's understood to be 1 on the relative jumps (next and
+                    ;; previous argument respectively) and 0 on the absolute
+                    ;; jumps (first argument).
                     (formatted-message (G_ "module ~a not found")
                                        module)
                     (condition
-- 
2.29.2
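The three jump forms described in that TRANSLATORS comment can be checked directly at a Guile REPL. A minimal sketch, using made-up arguments "A" (argument 0) and "B" (argument 1):

```scheme
(use-modules (ice-9 format))

;; Absolute jump ~N@*: go to argument N, counting from 0.
(format #f "~1@*~a ~0@*~a" "A" "B")     ; => "B A"

;; Relative forward jump ~N*: skip N arguments (N defaults to 1).
(format #f "~*~a" "A" "B")              ; => "B"

;; Relative backward jump ~N:*: back up N arguments (N defaults to 1).
(format #f "~a ~:*~a ~a" "A" "B")       ; => "A A B"

;; Relative jumps can also reorder: skip "A", print "B", then back up
;; two positions (past "B" and "A") and print "A".
(format #f "~*~a ~2:*~a" "A" "B")       ; => "B A"
```

The absolute form is arguably easier to read in a translation, since each directive names its argument directly instead of depending on where the previous directive left the argument pointer.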
* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-27 22:13 UTC
To: Miguel Ángel Arruga Vivas
Cc: guix-devel, Zhu Zihao

Hello!

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> writes:

> From 2615934a2c377858dce2a0410982287faed754a9 Mon Sep 17 00:00:00 2001
> From: Miguel Ángel Arruga Vivas <rosen644835@gmail.com>
> Date: Wed, 23 Dec 2020 13:07:38 +0100
> Subject: [PATCH] nls: Add comment about format directives.
>
> * gnu.scm (%try-use-modules): Add comment for translations. It should
> be placed on the first string found by xgettext.

[...]

> +                    ;; TRANSLATORS: The scheme-format tag is used to identify
> +                    ;; strings that contain format directives as specified
> +                    ;; here:
> +                    ;; https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html
> +                    ;;
> +                    ;; The goto/jump directive can be used to alter the order
> +                    ;; of the arguments, either performing relative jumps with
> +                    ;; ~N* and ~N:* (forward and backwards respectively) or
> +                    ;; the absolute position of the argument can be used
> +                    ;; (starting from 0) with ~N@*. When N isn't provided,
> +                    ;; it's understood to be 1 on the relative jumps (next and
> +                    ;; previous argument respectively) and 0 on the absolute
> +                    ;; jumps (first argument).
>                      (formatted-message (G_ "module ~a not found")
>                                         module)

Oh good, so we’d keep msgids unchanged and let translators use argument
jumping, right? That sounds good to me.

The only downside is that it might be easier for translators to get it
wrong. Perhaps adding an example in the comment above would help?

Anyway, I’m all for this patch.

Thanks!
Ludo’.
* Re: Word order in Guix l10n
From: Miguel Ángel Arruga Vivas @ 2020-12-22 15:45 UTC
To: Ludovic Courtès
Cc: guix-devel, Zhu Zihao

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> With (ice-9 format), as has been suggested before, we should be able to
> do away with the “argument jumping” syntax (info "(guile) Formatted
> Output"):
>
>   (format #f "~1@*~d Zeichen lang ist die Zeichenkette `~0@*~a'" "ab" 2)
>
> It’s a bit awkward though, in particular because we have to jump to the
> previous argument (0 and 1 here instead of 1 and 2).

I wouldn't think of the absolute goto directive as jumping to the
previous argument; it's just another chapter of the eternal debate
about the first ordinal. Common Lisp, SLIB and (ice-9 format) use the
'0' convention for the 'first' position (the smallest element of the
set of natural numbers) instead of '1'. C-style arrays can be
interpreted like this too.

> Does xgettext support that syntax? We’ve had troubles before with ~*.

These troubles are related to plural forms [1]. Singular forms don't
have any issue, because the type and number of format specifiers must
always match.

> If it does, where should we use this syntax in lieu of the simpler
> forms? Everywhere?

Yup, for singular forms (non-ngettext) it can be used everywhere right
now. The translation of plural forms could, at most, omit one numeric
directive (the one used for the ngettext call) to allow a more natural
way of expressing the numeral implicitly, but this will have to wait
for the next release of GNU gettext; the patch is almost there [2].
Nonetheless, the current version of msgfmt works correctly when no
format directive is omitted.

Happy hacking!
Miguel

[1] https://lists.gnu.org/archive/html/bug-gettext/2020-11/msg00027.html
[2] https://lists.gnu.org/archive/html/bug-gettext/2020-12/msg00041.html
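To make the plural-form case concrete: with `ngettext` (part of Guile's core gettext support), the number is used twice, once to select the form and once as a format argument. A minimal sketch with a made-up message (`removed-message` is hypothetical), assuming no translation catalog is loaded, in which case `ngettext` returns the first string for 1 and the second one otherwise:

```scheme
(use-modules (ice-9 format))

;; Hypothetical message, for illustration only.
(define (removed-message n)
  ;; N selects the singular or plural msgid (or, with a catalog, one of
  ;; the translation's plural forms), and is then consumed by "~d".
  (format #f (ngettext "~d file removed" "~d files removed" n) n))

(removed-message 1)   ; => "1 file removed"
(removed-message 3)   ; => "3 files removed"
```

A translated form that expresses the number implicitly, without `~d`, is the omission that, per the message above, has to wait for the next GNU gettext release before msgfmt accepts it.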
* Re: Word order in Guix l10n
From: Arun Isaac @ 2020-12-15 13:11 UTC
To: Zhu Zihao, guix-devel

Hi Zhu Zihao,

I faced the same problem while working on the Tamil localization. I
used the ~* argument jumping supported by (ice-9 format) to work around
this. So, for the message "could not find bootstrap binary '~a' for
system '~a'", you could do something like

  "无法找到用于引导 ~1@* 系统的二进制文件 ~0@*"

~0@* refers to the 0th argument, ~1@* refers to the 1st argument, and
so on. Look for "argument jumping" in
https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html

Hope that helps!

Regards,
Arun
* Re: Word order in Guix l10n
From: Zhu Zihao @ 2020-12-15 13:51 UTC
To: Arun Isaac
Cc: guix-devel

Arun Isaac writes:

> [...] So, for the message "could not find bootstrap binary '~a' for
> system '~a'", you could do something like
>
>  "无法找到用于引导 ~1@* 系统的二进制文件 ~0@*"
>
> ~0@* refers to the 0th argument, ~1@* refers to the 1th argument, and so
> on. [...]

Thanks! Your code really helped me a lot ;) But there are a few
mistakes; I should use

  "无法找到用于引导 ~1@*~a 系统的二进制文件 ~0@*~a"

It still requires a "~a" after each jump to actually output the object.

--
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao
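The corrected translation can be checked directly. A minimal sketch; the binary and system names are made-up sample values, passed in the same order as for the English msgid (binary first, system second):

```scheme
(use-modules (ice-9 format))

;; Sample arguments (hypothetical values), in the msgid's order:
;; argument 0 is the binary, argument 1 is the system.
(define binary-name "tar")
(define system-type "x86_64-linux")

;; "~1@*" only moves to argument 1 and prints nothing; the "~a" after
;; it prints that argument.  Without the "~a" (as in the first
;; suggestion), the jumps would only reposition the argument pointer,
;; so neither value would appear in the output.
(format #f "无法找到用于引导 ~1@*~a 系统的二进制文件 ~0@*~a"
        binary-name system-type)
;; => "无法找到用于引导 x86_64-linux 系统的二进制文件 tar"
```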