unofficial mirror of guix-devel@gnu.org 
* Word order in Guix l10n
From: Zhu Zihao @ 2020-12-15 10:53 UTC (permalink / raw)
  To: guix-devel

Hi, Guix users!

I'm currently working on the Guix l10n for zh_CN. However, there's a
serious flaw in the current implementation of l10n.

AFAIK, Guix uses `format` from (ice-9 format) to fill in the template
string returned by `G_`. The template syntax of (ice-9 format) looks
similar to the format templates defined in ANSI CL, which consume
format arguments **one by one**, in the order they are passed.

In CJK languages, word order usually differs from English.

For example, consider the message

  "could not find bootstrap binary '~a' for system '~a'"

Let's label the first ~a as %1 and the second as %2. It should be
translated into Chinese like this:

  "无法找到用于引导 %2 系统的二进制文件 %1"

But currently this looks impossible, because we can't refer to
positional arguments in a format template.

My suggestion is to create a new format function that supports
references to positional arguments, like

  (pos-format "~3 ~2 ~1" "foo" "bar" "baz") ;; => "baz bar foo"

Then we could replace `format` in l10n with `pos-format` step by step.
Maybe we can create a function that combines `G_` and `format`, so we
can change its implementation details without breaking existing code.
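
A rough sketch of what such a helper could look like, assuming 1-based
"~N" placeholders as in the example above. `pos-format` is a
hypothetical name, not an existing Guile or Guix procedure, and this
version understands nothing but "~N":

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: replace every "~N" in TEMPLATE with the
;; displayed form of the Nth argument (N counts from 1).
(define (pos-format template . args)
  (regexp-substitute/global
   #f "~([0-9]+)" template
   'pre
   (lambda (m)
     (let ((index (- (string->number (match:substring m 1)) 1)))
       ;; Core `format' understands ~a, which prints like `display'.
       (format #f "~a" (list-ref args index))))
   'post))

(display (pos-format "~3 ~2 ~1" "foo" "bar" "baz")) ; prints: baz bar foo
(newline)
```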

Please let me know what you think, thanks :)


-- 
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao

* Re: Word order in Guix l10n
From: Julien Lepiller @ 2020-12-15 12:25 UTC (permalink / raw)
  To: guix-devel, Zhu Zihao

Even when translating to French, I sometimes feel the need to change word order, but I end up finding a slightly unnatural way to preserve the order of arguments. I don't have an example at hand though.

I don't know enough about Guile to know how best to implement that (or whether it exists already).

* Re: Word order in Guix l10n
From: Arun Isaac @ 2020-12-15 13:11 UTC (permalink / raw)
  To: Zhu Zihao, guix-devel

Hi Zhu Zihao,

I faced the same problem while working on the Tamil localization. I used
the ~* argument jumping supported by (ice-9 format) to work around
this. So, for the message "could not find bootstrap binary '~a' for
system '~a'", you could do something like

"无法找到用于引导 ~1@* 系统的二进制文件 ~0@*"

~0@* refers to argument 0, ~1@* refers to argument 1, and so
on. Look for "argument jumping" in
https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html

Hope that helps!

Regards,
Arun

* Re: Word order in Guix l10n
From: Zhu Zihao @ 2020-12-15 13:51 UTC (permalink / raw)
  To: Arun Isaac; +Cc: guix-devel

Arun Isaac writes:

> Hi Zhu Zihao,
>
> I faced the same problem while working on the Tamil localization. I used
> the ~* argument jumping supported by (ice-9 format) to work around
> this. So, for the message "could not find bootstrap binary '~a' for
> system '~a'", you could do something like
>
> "无法找到用于引导 ~1@* 系统的二进制文件 ~0@*"
>
> ~0@* refers to the 0th argument, ~1@* refers to the 1th argument, and so
> on. Look for "argument jumping" in
> https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html
>
> Hope that helps!
>
> Regards,
> Arun

Thanks! Your code really helps me a lot ;)

But there's one small mistake. I should use

"无法找到用于引导 ~1@*~a 系统的二进制文件 ~0@*~a"

The jump directive only moves the argument pointer, so a "~a" is still
required to actually print the object.
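
For reference, here is how that translated template behaves when passed
to `format` from (ice-9 format). The argument values are made up for
illustration:

```scheme
(use-modules (ice-9 format))

;; msgid:  "could not find bootstrap binary '~a' for system '~a'"
;; The translation prints argument 1 (the system) first and
;; argument 0 (the binary) last.
(define msgstr "无法找到用于引导 ~1@*~a 系统的二进制文件 ~0@*~a")

;; Illustrative arguments, in msgid order: binary, then system.
(define message (format #f msgstr "guile" "x86_64-linux"))

(display message)
(newline)
;; prints: 无法找到用于引导 x86_64-linux 系统的二进制文件 guile
```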


-- 
Retrieve my PGP public key:

  gpg --recv-keys D47A9C8B2AE3905B563D9135BE42B352A9F6821F

Zihao

* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-18 15:04 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: guix-devel, Zhu Zihao

Hi!

Julien Lepiller <julien@lepiller.eu> wrote:

> Even when translating to French, I sometimes feel the need to change
> word order, but I end up finding a slightly unnatural way to preserve
> the order of arguments. I don't have an example at hand though.
>
> I don't know enough about guile to know how best to implement that (or
> if that exists already).

This looks like a real issue.  I’m surprised this isn’t already
addressed though: after all, ‘printf’ format strings have the same
problem, right?  How does everyone else deal with that?

Thanks,
Ludo’.


* Re: Word order in Guix l10n
From: Arun Isaac @ 2020-12-18 18:03 UTC (permalink / raw)
  To: Ludovic Courtès, Julien Lepiller; +Cc: guix-devel, Zhu Zihao

Hi,

> This looks like a real issue.  I’m surprised this isn’t already
> addressed though: after all, ‘printf’ format strings have the same
> problem, right?  How does everyone else deal with that?

For C's printf format strings, gettext supports special syntax to
specify argument order. See
https://www.gnu.org/software/gettext/manual/html_node/c_002dformat-Flag.html

A German example is provided on that page.

"%2$d Zeichen lang ist die Zeichenkette `%1$s'"

Regards,
Arun

* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-22 15:00 UTC (permalink / raw)
  To: Arun Isaac; +Cc: guix-devel, Zhu Zihao

Hi,

Arun Isaac <arunisaac@systemreboot.net> wrote:

>> This looks like a real issue.  I’m surprised this isn’t already
>> addressed though: after all, ‘printf’ format strings have the same
>> problem, right?  How does everyone else deal with that?
>
> For C's printf format strings, gettext supports special syntax to
> specify argument order. See
> https://www.gnu.org/software/gettext/manual/html_node/c_002dformat-Flag.html

Oh, I see.

> A German example is provided on that page.
>
> "%2$d Zeichen lang ist die Zeichenkette `%1$s'"

With (ice-9 format), as has been suggested before, we should be able to
get away with the “argument jumping” syntax (info "(guile) Formatted
Output"):

  (format #f "~1@*~d Zeichen lang ist die Zeichenkette `~0@*~a'" "ab" 2)

It’s a bit awkward though, in particular because we have to jump to the
previous argument (0 and 1 here instead of 1 and 2).

Does xgettext support that syntax?  We’ve had troubles before with ~*.

If it does, where should we use this syntax in lieu of the simpler
forms?  Everywhere?

Thanks,
Ludo’.


* Re: Word order in Guix l10n
From: Julien Lepiller @ 2020-12-22 15:06 UTC (permalink / raw)
  To: Ludovic Courtès, Arun Isaac; +Cc: guix-devel, Zhu Zihao

This specific syntax looks OK, but we need to limit ourselves to the syntax common to Guile and Lisp, because that's what gettext supports.

We should use this kind of syntax everywhere we have more than one argument. Thinking about RTL languages, it's probably important for them too, though I'm not sure how gettext works for them.

* Re: Word order in Guix l10n
From: Miguel Ángel Arruga Vivas @ 2020-12-22 15:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Zhu Zihao

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> With (ice-9 format), as has been suggested before, we should be able to
> do away with the “argument jumping” syntax (info "(guile) Formatted
> Output"):
>
>   (format #f "~1@*~d Zeichen lang ist die Zeichenkette `~0@*~a'" "ab" 2)
>
> It’s a bit awkward though, in particular because we have to jump to the
> previous argument (0 and 1 here instead of 1 and 2).

I wouldn't think of the absolute goto directive as jumping to the
previous argument. It's just another chapter of the eternal debate about
the first ordinal: Common Lisp/SLIB/ice-9 use the '0' convention for the
'first' position---the smallest element of the set of natural
numbers---instead of '1'.  C-style arrays can be interpreted like this
too.

> Does xgettext support that syntax?  We’ve had troubles before with ~*.

These troubles are related to plural forms[1].  Singular forms don't
have any issue because the type and number of format specifiers must
match always.

> If it does, where should we use this syntax in lieu of the simpler
> forms?  Everywhere?

Yup, for singular forms (non-ngettext) it can be used everywhere right
now.  The translation of plural forms could, at most, omit one numeric
directive (the one used for the ngettext call) to allow a more natural
way of expressing implicitly the numeral, but this will need to wait for
the next release of GNU gettext---the patch is almost there[2].
Nonetheless, the current version of msgfmt works correctly when no
format directive is omitted.

Happy hacking!
Miguel

[1] https://lists.gnu.org/archive/html/bug-gettext/2020-11/msg00027.html
[2] https://lists.gnu.org/archive/html/bug-gettext/2020-12/msg00041.html


* Re: Word order in Guix l10n
From: Miguel Ángel Arruga Vivas @ 2020-12-23 12:22 UTC (permalink / raw)
  To: Julien Lepiller; +Cc: guix-devel, Zhu Zihao

Hi Julien,

Julien Lepiller <julien@lepiller.eu> writes:

> This specific syntax looks ok, but we need to limit ourself to the
> common syntax between guile and lisp, because that's what gettext
> supports.

The issue with Guile's format is explained here[1], as the
implementation used follows SRFI-28[2], but there is no difference
between the format from Common Lisp and the one from (ice-9 format)[3]
at the surface level: both implementations are compatible regarding
numeric, iteration, selection and jump directives, to name a few.

Other directives might be compatible, such as the plural directive ~P,
or not, although most of them shouldn't be used in any case: not because
they could have compatibility problems but because they don't fit into
internationalized messages correctly.

For example, most languages have irregular cases for plural formation,
some have more than two grammatical numeric cases, such as
singular/dual/plural, and some don't have an equivalent category, such
as Japanese.  That's exactly the use case of ngettext---I've pointed out
in the other mail the pending issue in that area, which is related to
the omission of the numeric parameter but not its order, and applies
both to Common Lisp and (ice-9 format).
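
As a small illustration of the ngettext route, here is a sketch that
assumes no message catalog is installed, so the English msgids come
back unchanged. `package-count-message` is a made-up name:

```scheme
(use-modules (ice-9 format))

;; ngettext picks the singular or plural msgid based on N; with a
;; catalog installed it would pick among the plural forms of the
;; target language instead.
(define (package-count-message n)
  (format #f (ngettext "~a package" "~a packages" n) n))

(display (package-count-message 1)) (newline)
(display (package-count-message 3)) (newline)
```

The count is still passed to `format` as an ordinary argument, so
argument jumping applies to it like to any other.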

> We should use this kind of syntax everywhere we have more than one
> argument.

I don't see the advantage of using jumps everywhere in the msgids.
Nonetheless, a TRANSLATORS: comment placed on the first string appearing
in the POT file, pointing to the section of the manual for (ice-9 format),
or even an explicit and detailed explanation of this syntax, could be
very helpful for translators.  The attached patch does this, although
any suggestion or even a complete rewrite is welcome, because I don't
find it particularly inspired.

> Also thinking about rtl languages, it's probably important
> for them, though I'm not sure how gettext works for them.

gettext-family functions only see byte arrays and return the
corresponding array; the bytes are always placed in increasing memory
locations.  Right-to-left handling is the responsibility of the
visualization layer, which sometimes includes the final format, but that
is an issue even with left-to-right languages such as French.

For example, this composition...

  (string-append translated ": " other-translated)

... produces weird results, or convoluted French translations, because
it isn't handled properly.  A format string must be used here too,
because it must include the white-space expected in French before the
colon:

  (format #f (G_ "~a: ~a") translated other-translated)
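
A self-contained way to see the effect, with a one-entry association
list standing in for the French catalog. The `catalog` table and this
local `G_` are stand-ins for illustration, not the Guix definitions, and
the argument strings are made up:

```scheme
(use-modules (ice-9 format))

;; Stand-in for a French message catalog: msgid -> msgstr.
(define catalog '(("~a: ~a" . "~a : ~a")))

;; Stand-in for gettext: fall back to the msgid when untranslated.
(define (G_ msgid)
  (or (assoc-ref catalog msgid) msgid))

(display (format #f (G_ "~a: ~a") "erreur" "fichier introuvable"))
(newline)
;; prints: erreur : fichier introuvable
```

With `string-append`, the ": " separator never reaches the translator,
so the space before the colon cannot be added.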

Newlines are the only characters that are sometimes left out of the
internationalized composition, because the top-to-bottom convention is
followed, but this is a limitation of the teletype/terminal interface;
graphical interfaces aren't composed under this limitation, and "whole
widgets" should be the localization frame, which is usually the case.

Happy hacking!
Miguel

[1] https://www.gnu.org/software/guile/manual/html_node/Simple-Output.html
[2] https://www.gnu.org/software/guile/manual/html_node/SRFI_002d28.html
[3] https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html

[-- Attachment #2: comment.patch --]
[-- Type: text/x-patch, Size: 1707 bytes --]

From 2615934a2c377858dce2a0410982287faed754a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Miguel=20=C3=81ngel=20Arruga=20Vivas?=
 <rosen644835@gmail.com>
Date: Wed, 23 Dec 2020 13:07:38 +0100
Subject: [PATCH] nls: Add comment about format directives.

* gnu.scm (%try-use-modules): Add comment for translations.  It should
be placed on the first string found by xgettext.
---
 gnu.scm | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gnu.scm b/gnu.scm
index f139531ef3..0e87b10eb2 100644
--- a/gnu.scm
+++ b/gnu.scm
@@ -78,6 +78,19 @@
                   (raise
                    (apply
                     make-compound-condition
+                    ;; TRANSLATORS: The scheme-format tag is used to identify
+                    ;; strings that contain format directives as specified
+                    ;; here:
+                    ;; https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html
+                    ;;
+                    ;; The goto/jump directive can be used to alter the order
+                    ;; of the arguments, either performing relative jumps with
+                    ;; ~N* and ~N:* (forward and backwards respectively) or
+                    ;; the absolute position of the argument can be used
+                    ;; (starting from 0) with ~N@*.  When N isn't provided,
+                    ;; it's understood to be 1 on the relative jumps (next and
+                    ;; previous argument respectively) and 0 on the absolute
+                    ;; jumps (first argument).
                     (formatted-message (G_ "module ~a not found")
                                        module)
                     (condition
-- 
2.29.2


* Re: Word order in Guix l10n
From: Ludovic Courtès @ 2020-12-27 22:13 UTC (permalink / raw)
  To: Miguel Ángel Arruga Vivas; +Cc: guix-devel, Zhu Zihao

¡Hola!

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> wrote:

> From 2615934a2c377858dce2a0410982287faed754a9 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Miguel=20=C3=81ngel=20Arruga=20Vivas?=
>  <rosen644835@gmail.com>
> Date: Wed, 23 Dec 2020 13:07:38 +0100
> Subject: [PATCH] nls: Add comment about format directives.
>
> * gnu.scm (%try-use-modules): Add comment for translations.  It should
> be placed on the first string found by xgettext.

[...]

> +                    ;; TRANSLATORS: The scheme-format tag is used to identify
> +                    ;; strings that contain format directives as specified
> +                    ;; here:
> +                    ;; https://www.gnu.org/software/guile/manual/html_node/Formatted-Output.html
> +                    ;;
> +                    ;; The goto/jump directive can be used to alter the order
> +                    ;; of the arguments, either performing relative jumps with
> +                    ;; ~N* and ~N:* (forward and backwards respectively) or
> +                    ;; the absolute position of the argument can be used
> +                    ;; (starting from 0) with ~N@*.  When N isn't provided,
> +                    ;; it's understood to be 1 on the relative jumps (next and
> +                    ;; previous argument respectively) and 0 on the absolute
> +                    ;; jumps (first argument).
>                      (formatted-message (G_ "module ~a not found")
>                                         module)

Oh good, so we’d keep msgids unchanged and let translators use argument
jumping, right?  That sounds good to me.

The only downside is that it might be easier for translators to get it
wrong.  Perhaps adding an example in the comment above would help?
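
Something along these lines, maybe. This is only a sketch with made-up
strings, covering the three jump forms the comment describes:

```scheme
(use-modules (ice-9 format))

;; Absolute jump: ~N@* moves to argument N, counting from 0.
(define swapped  (format #f "~1@*~a ~0@*~a" "first" "second"))

;; Relative forward jump: ~* skips the next argument.
(define skipped  (format #f "~*~a" "first" "second"))

;; Relative backward jump: ~:* backs up one argument, so the same
;; argument is printed twice.
(define repeated (format #f "~a ~:*~a" "first"))

(for-each (lambda (s) (display s) (newline))
          (list swapped skipped repeated))
;; prints:
;;   second first
;;   second
;;   first first
```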

Anyway I’m all for this patch.

Thanks!

Ludo’.

