From: Bruno Haible
Newsgroups: gmane.comp.gnu.gettext.bugs,gmane.emacs.devel
Subject: Re: Emacs i18n
Date: Wed, 20 Mar 2019 12:59:32 +0100
Message-ID: <25076895.mA2g9mTHSI@omega>
To: rms-mXXj517/zsQ@public.gmane.org
Cc: bug-gettext-mXXj517/zsQ@public.gmane.org, emacs-devel-mXXj517/zsQ@public.gmane.org

Richard Stallman wrote:
> I can envision something like this:
>
>   "russian-nom:%d байт%| скопирован%|, %s, %s"
>
> where the 'russian-nom' operator would replace the two %| sequences
> with the appropriate declensional suffixes for the nominative case.

It is, of course, tempting to try to do morphological analysis in an
algorithmic way, based on our background as algorithm hackers.
François Pinard and others considered this, back in 1995 when they started
i18n in GNU. The reason this approach was not chosen is still valid today:

When you design a translation system, you have two personas:
  - the programmer,
  - the translator.

The translation system defines
  1) which information flows from the programmer to the translator, and
     in which format,
  2) which information flows back from the translator to the programmer,
     and in which format.

And it has to cope with the assumed skills of these personas:
  - The programmer, you can assume, can write and understand algorithms,
    but (usually) does not master the grammar of more than one language.
  - The translator, you can assume, can translate sentences and knows
    about the different meanings of words in different contexts. But they
    can neither write nor understand algorithms. Many translators, in
    fact, don't see the grammar as a set of rules.

You may find some people in the intersection, such as a Russian hacker,
but it is hard to find people with both skills for languages such as
Vietnamese, Slovenian, or Basque. So, you had better design the system in
such a way that no person is assumed to have both skills.

The challenge is to define these formats 1) and 2) in a way that
  * Programmers can do their job with their skills (i.e. don't need to
    understand Russian).
  * Translators can do their job with their skills (i.e. don't need to
    understand algorithms).

In the gettext approach (where 1) are POT files and 2) are PO files) we
added plural form handling, which is just a small morphological variation,
and it required a significant amount of documentation and education for
translators. I would say it is at the limit of what we can make
translators grok.
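To give an idea of how much "algorithm" plural form handling already puts in
front of a translator, here is a minimal Python sketch of the plural rule
commonly found in the Plural-Forms header of Russian PO files. The function
names, the MSGSTR list, and the example noun are hypothetical and serve only
as illustration; this is not how gettext itself is implemented.

```python
# Plural-Forms rule commonly used in the header of Russian PO files:
#   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
#     n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);


def russian_plural_index(n: int) -> int:
    """Return the index of the msgstr[] entry that applies to the count n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2      # everything else: 0, 5-20, 25-30, ...


# Hypothetical catalog entry with three forms; the noun is illustrative only.
MSGSTR = ["%d файл", "%d файла", "%d файлов"]


def translate_count(n: int) -> str:
    """Pick the plural form for n and substitute the number into it."""
    return MSGSTR[russian_plural_index(n)] % n


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21, 22):
        print(translate_count(n))
```

The translator has to supply all three forms and trust that the formula in
the header picks the right one. The formula itself is the part that most
translators cannot be expected to write or verify.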
Now, when you give a translator a string

  "russian-nom:%d байт%| скопирован%|, %s, %s"

you need to think about the appropriate tooling that will make the
translator understand
  - what 'russian-nom' means,
  - what the '|' characters mean,
  - what the '%' characters mean.
Either the translator tool should somehow highlight these characters and
present on-line help, or it should present it as a sequence of strings to
translate:

  Rule: russian-nom
  "%d байт"
  " скопирован"
  ", %s, %s"

It is important to realize that each such case of morphological variation
requires translator tooling support. And unfortunately different such
tools exist, and every translator has their preferred one. For the plural
form handling alone, it took several years until the main tools had
support for it in their UI.

Bruno