From: Juri Linkov
Newsgroups: gmane.emacs.devel,gmane.comp.gnu.gettext.bugs
Subject: Re: Emacs i18n
Date: Wed, 20 Mar 2019 23:32:52 +0200
Message-ID: <87h8bx5ijn.fsf@mail.linkov.net>
References: <25076895.mA2g9mTHSI@omega>
To: Bruno Haible
Cc: emacs-devel@gnu.org, rms@gnu.org, bug-gettext@gnu.org


> Richard Stallman wrote in
> :
>
>> I can envision something like this:
>>
>>   "russian-nom:%d байт%| скопирован%|, %s, %s"
>>
>> where the 'russian-nom' operator would replace the two %| sequences
>> with the appropriate declensional suffixes for the nominative case.
>
> It is, of course, tempting to try to do morphological analysis in an
> algorithmic way, based on our background as algorithm hackers. François
> Pinard and others considered this, back in 1995 when they started i18n in GNU.
>
> The reason this approach was not chosen is still valid today:
>
> When you design a translation system, you have two personas:
>   - the programmer,
>   - the translator.
>
> The translation system defines
>   1) which information flows from the programmer to the translator,
>      and in which format,
>   2) which information flows back from the translator to the programmer,
>      and in which format.
>
> And it has to cope with the assumed skills of these personas:
>
>   - The programmer, you can assume, can write and understand algorithms,
>     but does not master the grammar of more than one language (usually).
>
>   - The translator, you can assume, can translate sentences and knows
>     about the different meanings of words in different contexts. But they
>     can neither write nor understand algorithms. Many translators, in fact,
>     don't see the grammar as a set of rules.
>
> You may find some people at the intersection, such as a Russian hacker,
> but it is hard to find people with both skills for languages such as
> Vietnamese, Slovenian, or Basque. So, you had better design the system in
> such a way that no person is assumed to have both skills.
>
> The challenge is to define these formats 1) and 2) in a way that
>
>   * Programmers can do their job with their skills (i.e. don't need to
>     understand Russian).
>
>   * Translators can do their job with their skills (i.e. don't need to
>     understand algorithms).
>
> In the gettext approach (where 1) are POT files and 2) are PO files) we
> added plural form handling, which is just a small morphological variation,
> and it required a significant amount of documentation and education for
> translators. I would say it is at the limit of what we can make
> translators grok.
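To make the division of labor concrete: with plural forms, the translator supplies a Plural-Forms expression in the PO header and one msgstr per form, while the programmer only calls ngettext with the English pair and a count. The following is a minimal sketch, assuming Python's standard gettext module; the build_mo helper is hypothetical (real projects run msgfmt on a PO file), the Plural-Forms rule is the Russian one given in the GNU gettext manual, and the Russian strings are illustrative rather than taken from any real catalog.

```python
import gettext
import io
import struct


def build_mo(entries: dict) -> bytes:
    """Serialize {msgid: msgstr} (both bytes) into a little-endian GNU .mo file.

    Hypothetical helper for this sketch; real projects run msgfmt on a PO file.
    """
    keys = sorted(entries)  # .mo files store msgids in sorted order
    ids = strs = b""
    index = []  # (msgid length, msgid offset, msgstr length, msgstr offset)
    for k in keys:
        v = entries[k]
        index.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\x00"  # each string is NUL-terminated
        strs += v + b"\x00"
    n = len(keys)
    orig_tab = 7 * 4  # the header is seven 32-bit words
    trans_tab = orig_tab + 8 * n
    ids_start = trans_tab + 8 * n
    strs_start = ids_start + len(ids)
    # magic, revision, count, msgid table, msgstr table, hash size, hash offset
    out = struct.pack("<7I", 0x950412DE, 0, n, orig_tab, trans_tab, 0, 0)
    for id_len, id_off, _, _ in index:
        out += struct.pack("<2I", id_len, ids_start + id_off)
    for _, _, str_len, str_off in index:
        out += struct.pack("<2I", str_len, strs_start + str_off)
    return out + ids + strs


# The translator's side: a header carrying the Russian Plural-Forms rule,
# plus one msgstr per plural form for the msgid/msgid_plural pair.
HEADER = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : "
    "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"
)
catalog = {
    b"": HEADER.encode(),
    # msgid and msgid_plural are joined by NUL; so are msgstr[0], [1], [2].
    b"%d byte\x00%d bytes": "%d байт\x00%d байта\x00%d байт".encode(),
}
ru = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))

# The programmer's side: only the English pair and the count.
for n in (1, 2, 5, 11, 21, 22, 111):
    print(ru.ngettext("%d byte", "%d bytes", n) % n)
```

This should print 1 байт, 2 байта, 5 байт, 11 байт, 21 байт, 22 байта and 111 байт, one per line. The inflection rule lives entirely in the translator's catalog; the program code never mentions Russian.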
>
> Now, when you give a translator a string
>
>   "russian-nom:%d байт%| скопирован%|, %s, %s"
>
> you need to think about the appropriate tooling that will make the
> translator understand
>   - what 'russian-nom' means,
>   - what the '|' characters mean,
>   - what the '%' characters mean.
> Either the translator tool should somehow highlight these characters
> and present on-line help, or it should present it as a sequence of
> strings to translate:
>
>   Rule: russian-nom
>   "%d байт"
>   " скопирован"
>   ", %s, %s"
>
> It is important to realize that each such case of morphological variation
> requires translator tooling support. And unfortunately many different such
> tools exist, and every translator has their preferred one. For the plural
> form handling alone, it took several years until the main tools had support
> for it in their UI.

Indeed, a complete implementation of all Russian morphological rules
takes ~1600 lines of dense Perl code:

http://www.linkov.net/files/nlp/Lingua-RU-Inflect.pm

I can't imagine how to include all these rules in gettext. But there is
no need, because gettext already strikes a decent balance between the
complexity of natural languages and the practical needs of program
internationalization: translators themselves decide how words in messages
should be inflected for the different plural forms.

Currently we have more urgent tasks: after the first step of adding
‘ngettext’ as in CLISP, development stalled on the problem of splitting
messages into domains. But maybe CLISP already provides a good way to map
packages to gettext domains? Does it require every package to have a
separate domain, or does it collect translations from all packages into
one domain?
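The two options in the domain question can be compared with a minimal sketch, assuming Python's standard gettext module (this does not describe what CLISP actually does). The package names pkg-files and pkg-mail, the domain_gettext helper and the build_mo helper are all hypothetical; catalogs are built in memory here instead of being loaded with gettext.translation() from a locale directory, and the Russian strings are illustrative. With one domain per package, the same msgid can be translated differently in each package; with a single merged domain, identical msgids collide and whichever catalog is merged last silently wins.

```python
import gettext
import io
import struct


def build_mo(entries: dict) -> bytes:
    """Hypothetical helper: serialize {msgid: msgstr} bytes into a GNU .mo file."""
    keys = sorted(entries)
    ids = strs = b""
    index = []
    for k in keys:
        v = entries[k]
        index.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\x00"
        strs += v + b"\x00"
    n = len(keys)
    orig_tab = 7 * 4
    trans_tab = orig_tab + 8 * n
    ids_start = trans_tab + 8 * n
    strs_start = ids_start + len(ids)
    out = struct.pack("<7I", 0x950412DE, 0, n, orig_tab, trans_tab, 0, 0)
    for id_len, id_off, _, _ in index:
        out += struct.pack("<2I", id_len, ids_start + id_off)
    for _, _, str_len, str_off in index:
        out += struct.pack("<2I", str_len, strs_start + str_off)
    return out + ids + strs


# Hypothetical packages, each shipping its own catalog (text domain).
# Both translate the same English msgid "Open", but differently.
HEADER = b"Content-Type: text/plain; charset=UTF-8\n"
catalogs = {
    "pkg-files": {b"": HEADER, b"Open": "Открыть файл".encode()},
    "pkg-mail": {b"": HEADER, b"Open": "Открыть письмо".encode()},
}

# Option 1: one domain per package; every lookup names its domain,
# much like dgettext(domain, msgid) in the C library.
domains = {name: gettext.GNUTranslations(io.BytesIO(build_mo(entries)))
           for name, entries in catalogs.items()}


def domain_gettext(domain: str, msgid: str) -> str:
    """Hypothetical helper: look msgid up in one package's catalog only."""
    return domains[domain].gettext(msgid)


# Option 2: collect all packages' translations into one domain.
# Identical msgids collide; the catalog merged last silently wins.
merged = {}
for entries in catalogs.values():
    merged.update(entries)
single = gettext.GNUTranslations(io.BytesIO(build_mo(merged)))

print(domain_gettext("pkg-files", "Open"))  # per-package translation
print(domain_gettext("pkg-mail", "Open"))
print(single.gettext("Open"))  # only one of the two translations survives
```

The per-package layout costs an explicit domain argument at each call site (or a per-package wrapper), while the single-domain layout is simpler to call but gives up the ability to translate a shared msgid differently in different packages.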