From: Daniel Brooks
To: Richard Stallman
Cc: all_but_last@163.com, bugs@gnu.support, dimech@gmx.com, abrochard@gmx.com, emacs-devel@gnu.org, eliz@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Internationalize Emacs's messages (swahili)
Date: Sun, 27 Dec 2020 00:48:19 -0800
Message-ID: <877dp3skh8.fsf@db48x.net>

Richard Stallman writes:

> I think the idea of integrating Fluent with gettext is interesting.
> Would you like to study that possibility?

Yes, I have been considering it. The biggest disconnect between Fluent
and gettext is that Fluent allows recursive substitutions and multiple
substitutions per message. Even just the fact that Fluent handles the
substitutions itself, instead of making the caller delegate to printf,
is a big difference. Each message needs to give names to the values
that will be substituted into it (basically argument names), and the PO
files would have to specify how to use those arguments, whether that
means substituting them into the message directly or switching on their
values. The message catalog files (MO files) just have flat strings
with no notion of substitutions or function calls. This is why my first
inclination was to generate elisp code from a Fluent file.

The easiest way to mush gettext and Fluent together is to put some
syntax into the messages that is post-processed before being returned
to the caller, which turns gettext into an interpreter. Something like
this in a PO file:

    msgid "-sync-brand-name"
    msgstr "Firefox Account"

    msgid "sync-signedout-title"
    msgstr "Connect with your {-sync-brand-name}"

A hypothetical igettext function could look for the curly braces,
recurse to find the value of the -sync-brand-name message, perform the
substitution (which allocates a new string), and then return the
result.
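To make the lookup-and-recurse idea concrete, here is a minimal sketch
in Python rather than C or elisp. Everything in it is an assumption for
illustration: the CATALOG dict stands in for an MO file, the reference
syntax is reduced to {-name}, and the cycle check is an addition that
the description above does not mention.

```python
import re

# Hypothetical in-memory stand-in for an MO catalog: flat msgid -> msgstr.
CATALOG = {
    "-sync-brand-name": "Firefox Account",
    "sync-signedout-title": "Connect with your {-sync-brand-name}",
}

# Matches a brace-delimited reference to another message,
# for example {-sync-brand-name}.
REFERENCE = re.compile(r"\{\s*(-[A-Za-z0-9_-]+)\s*\}")


def igettext(msgid, catalog=CATALOG, _seen=()):
    """Look up msgid and expand {-name} references recursively."""
    if msgid in _seen:
        chain = " -> ".join(_seen + (msgid,))
        raise ValueError("cyclic message reference: " + chain)
    # Like gettext, fall back to the msgid when there is no entry.
    template = catalog.get(msgid, msgid)

    def expand(match):
        # Recurse to find the value of the referenced message.
        return igettext(match.group(1), catalog, _seen + (msgid,))

    # re.sub returns a newly built string; the catalog entry is untouched.
    return REFERENCE.sub(expand, template)


print(igettext("sync-signedout-title"))
```

Even this small version has to scan every message on every call, which
is the interpreter overhead discussed in this message.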
Alternatively, the value of sync-signedout-title could be precomputed
before it is stored in the MO file. For this simple example, either
would work. But consider a more complicated scenario:

    msgid "tabs-close-tooltip"
    msgstr "{$tabCount ->
        [one] Zamknij kartę
        [few] Zamknij {$tabCount} karty
       *[many] Zamknij { $tabCount } kart
    }"

Again igettext can process this after retrieving it from the MO file,
but again that just turns it into an interpreter for a slightly lispy
language. It could be partially unrolled into the MO file by using up
multiple strings in the MO file, but it would still need to be an
interpreter to substitute in the tabCount value. Good translations
frequently have more complexity. I'd rather it were compiled to elisp
that can be byte-compiled and hopefully JIT-compiled before too long.

Also, note that the original language wants to have the same
substitution capabilities as the translations. To my mind it would be
really weird to embed those in the source of the program that uses
igettext. Consider the following hypothetical example:

    (message (igettext "{$tabCount ->
        [one] Zamknij kartę
        [few] Zamknij {$tabCount} karty
       *[many] Zamknij { $tabCount } kart
    }" 42))

The PO file would look something like this:

    msgid "{$tabCount ->
        [one] Zamknij kartę
        [few] Zamknij {$tabCount} karty
       *[many] Zamknij { $tabCount } kart
    }"
    msgstr "{$tabCount ->
        [one] Close { $tabCount } tab
       *[many] Close { $tabCount } tabs
    }"

This may be really weird for the translator, because it seems to imply
that the degree and type of abstractions used in the translation is
supposed to match that of the original text, which is not necessary at
all. Thus I have kept the Fluent convention of using simple textual
identifiers in the source code, which is a departure from the way
gettext is normally used.

My thoughts have gotten more organized as I wrote this up, so I
apologize if I've skipped any important deductive steps or otherwise
left anything unclear.
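As a rough picture of the "compile it instead" option for the
tabs-close-tooltip message above, the generated code might be shaped
like the following. It is written in Python purely for illustration
(the real target would be elisp), and the names polish_plural_category,
COMPILED and format_message are hypothetical. The plural test follows
the CLDR cardinal rules for Polish as I understand them, restricted to
non-negative integers.

```python
def polish_plural_category(n):
    """Plural category for a non-negative integer n in Polish."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


def tabs_close_tooltip(tabCount):
    """Hand-written equivalent of the Polish select expression."""
    category = polish_plural_category(tabCount)
    if category == "one":
        return "Zamknij kartę"
    if category == "few":
        return f"Zamknij {tabCount} karty"
    # *[many] is the default variant in the message.
    return f"Zamknij {tabCount} kart"


# Hypothetical table of compiled messages, keyed by the simple textual
# identifier that the calling program uses.
COMPILED = {"tabs-close-tooltip": tabs_close_tooltip}


def format_message(msgid, **args):
    """Call the compiled message with its named arguments."""
    return COMPILED[msgid](**args)


print(format_message("tabs-close-tooltip", tabCount=42))
```

The caller only names the message and its arguments. The variant
selection lives entirely in the per-language code, so a translation is
free to use more or fewer variants than the original text.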
> It seems that Fluent is not self-contained but rather depends
> on the presence of an interpreter for JS, Python, or Rust.
> Is that correct? That would be very undesirable in C programs
> that don't contain any interpreter (and don't need one).

Not quite. It's intended that various programs would either integrate
with an existing implementation, or implement the Fluent spec in their
own language where that's not convenient. For example, a C program can
link against the fluent-rs static library, so that its authors avoid
writing that code themselves. Of course depending on two compilers (one
for C and another for Rust, since fluent-rs is written in Rust) is
sometimes a deal-breaker. I think we would ignore these existing
implementations, except possibly as a source of inspiration, and write
our own.

> Is it feasible to write a small Fluent interpreter in C for this
> purpose?

Absolutely, but my personal preference is to write it in Elisp. My
second choice is actually to link against fluent-rs; Rust is a great
language and certainly better than C for implementing such things. Of
course that is its own can of worms. My third choice is to write an
implementation in C. This we could reasonably make a separate library
that we could share with anyone else wanting to use Fluent in their C
program. But then we would have to write a lot of C.

db48x