From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: RE: format use inquiry
Date: Sat, 1 Jul 2017 08:43:51 -0700 (PDT)
To: Jean-Christophe Helary, Emacs development discussions

> > As for the question of messages that use singular
> > vs plural forms, I'd again point to Common Lisp's
> > `format', which addresses that kind of thing (at
> > least for English).
          ^^^^^^^^^^^

> That -P flag for CL format contributes to producing
> just as ugly code as what we have here.

Code using `format' is ugly - sure. But it sure is convenient too. It's a bit like using Lisp `loop' or using `find' in UNIX, GNU, etc. Or using regexps.

For better _and_ worse, `format' is practically a language unto itself - especially the more complex `format' of Common Lisp.
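To make the `P' directive from the exchange above concrete, here is a minimal sketch of a hypothetical `cl_format'. It is written in Python rather than Emacs Lisp so that it runs standalone, and it emulates only `~D' and the `~P' family of Common Lisp `format'. `~P' emits "s" unless the argument is 1, `~@P' emits "y" or "ies", and the colon modifier reuses the previous argument instead of consuming a new one. It is not a real library and handles nothing else. The point is how much behavior a two-character directive carries, which is what makes such code compact but esoteric.

```python
import re

# Matches ~D and the ~P family: ~P, ~:P, ~@P, ~:@P (or ~@:P).
# Only this tiny subset of Common Lisp FORMAT is handled here.
_DIRECTIVE = re.compile(r"~([:@]*)([DP])", re.IGNORECASE)


def cl_format(control: str, *args: int) -> str:
    """Hypothetical Python emulation of a few CL FORMAT directives."""
    out = []
    pos = 0       # end of the last directive consumed in `control`
    next_arg = 0  # index of the next argument to consume
    for m in _DIRECTIVE.finditer(control):
        out.append(control[pos:m.start()])  # literal text before directive
        pos = m.end()
        mods, kind = m.group(1), m.group(2).upper()
        if kind == "D":
            out.append(str(args[next_arg]))
            next_arg += 1
            continue
        # ~P family. The colon modifier backs up and reuses the previous
        # argument instead of consuming a new one.
        if ":" in mods:
            n = args[next_arg - 1]
        else:
            n = args[next_arg]
            next_arg += 1
        if "@" in mods:
            out.append("y" if n == 1 else "ies")  # ~@P: "y" or "ies"
        else:
            out.append("" if n == 1 else "s")     # ~P: "" or "s"
    out.append(control[pos:])  # trailing literal text
    return "".join(out)


print(cl_format("~D file~:P copied", 3))  # 3 files copied
```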
Each of `loop', `find', and regexps is a language, and in each case the result of using it can be compact but esoteric code.

If you prefer to use conditional code with `concat' etc. instead of `format', feel free. I don't see why existing code that uses `format' should be replaced with code that does similar conditional processing that is more explicit and verbose.

> Code should never be used to create natural language
> strings with syntactic expectation.

In that case, a powerful alternative should be developed and proposed. It's clear that `format' and company are _not_ the right tools for natural-language processing, including for translation help. They were not designed for that. There is nothing in Emacs Lisp that provides such a tool, AFAIK.

If we start tearing apart existing Lisp code because it handles messages, menus, titles, doc, etc. in a way that does not facilitate natural-language treatment (including localization), that could just create a mess. Let simple sleeping dogs lie.

Instead, someone interested in that aspect of things would do well to work on creating powerful Emacs-Lisp constructs that really _do_ facilitate natural-language treatment. IOW, try to come up with something that is a reasonable alternative to the rudimentary constructs that we use today.

That's probably a hard thing to do, but if that's where someone's interest is, it could be a worthwhile endeavor. And with decades of natural-language processing and localization research behind us now, perhaps there is some existing code out there (perhaps not Emacs Lisp) that could serve as inspiration, if not as a direct model.

I don't think anyone would argue that `format' has what it takes to help with handling multiple languages cleanly. I certainly wouldn't. But it doesn't follow that we should now try to recode existing uses of it to recompose text into full words, in the interest of some potential future localization.
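For comparison, here is a sketch of the explicit conditional style for the same kind of tries/wins message. It is again in Python, and `tries_and_wins' is a hypothetical name. Each noun form is spelled out as a whole word instead of being assembled from a stem like "tr" plus a directive-chosen suffix. In Emacs Lisp the equivalent would typically be an `if' inside `concat' or `format'.

```python
def tries_and_wins(tries: int, wins: int) -> str:
    """Hypothetical explicit version of a "~D tr~:@P/~D win~:P" message.

    Every output word appears literally in the source: "try"/"tries"
    and "win"/"wins", rather than "tr" plus a computed suffix.
    """
    try_word = "try" if tries == 1 else "tries"
    win_word = "win" if wins == 1 else "wins"
    return f"{tries} {try_word}/{wins} {win_word}"


print(tries_and_wins(7, 1))  # 7 tries/1 win
print(tries_and_wins(1, 0))  # 1 try/0 wins
```

This version is longer and more repetitive than the one-line control string. However, every word it can emit is visible as a literal, which is closer to what a translator would work from.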
Better would be to start localizing, and make whatever changes are truly helpful immediately, here or there, as needed. IOW, demand-driven change vs eager, proactive change in ways that someone thinks might help.

> Here is the example given in "Common Lisp the Language,
> 2nd Edition"
> (format nil "~D tr~:@P/~D win~:P" 7 1) => "7 tries/1 win"
> (format nil "~D tr~:@P/~D win~:P" 1 0) => "1 try/0 wins"
> (format nil "~D tr~:@P/~D win~:P" 1 3) => "1 try/3 wins"
>
> How do you think that kind of strings can possibly be localized ?

I said: "at least for English". I was not proposing `format', Common Lisp or otherwise, as a tool or solution for localization.

But if some such code really did need to be localized, then the answer would presumably be to evaluate it, as you have done, to see the effect/result, and then code appropriately to produce an appropriate effect/result for other languages.

Any way you look at it, in such a situation, regardless of how the first-language treatment is coded, you would presumably need to translate each of the possible outputs. As long as there is no such built-in `format' handling for other languages, their handling would need to be done using some conditional tests (if-then-else etc.). That part would presumably follow what you would prefer. But why should that approach also be used for the English part, if a simpler expression is available for English?

And clearly such brute manipulation of strings for natural language is a sledgehammer, whether it uses `format' or explicit, verbose conditional code. As you point out, this is not the way to do natural-language processing.

IOW, it is not `format' or its use that is the problem. We don't have natural-language processing constructs in Emacs Lisp.

In the absence of general natural-language help for Emacs Lisp, a better short-term treatment might be for someone to extend `format' (or similar) so that it can handle this or that other language - IOW, to give French, etc.
a similar advantage wrt composing messages that handle both singular and plural etc. No, that would not be the right tool for localization or general natural-language processing either. But it would at least give French etc. the same advantage of compactness for formatting simple messages.

[One reason that something like `format', instead of some more systematic, language-oriented construct, has been used for English messages could be that English is so irregular. (1) It can be used, simply. (2) It's not easy to come up with something more systematic, for English. It does the simple job, and it's hard to do a more systematic job. I expect that it would be simpler to handle plurality, etc. in a more regular language, such as French. In English, there are very few useful rules to take advantage of, so the treatment by CL `format' is both (a) rudimentary and (b) about as good as it can get while remaining simple.]

Anyway, again, I was not proposing Common-Lisp `format' for any purpose of localization. I'm a fan of it for English, for simple things, and I think Emacs might benefit from it or something similar. My mention of it was essentially off-topic for your thread about localization. (But it could still be somewhat on-topic for a thread about `format'.)
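Here is a rough illustration of what extending singular/plural handling beyond English involves: languages differ in the rule, not just in the words. GNU gettext records the rule per language in a `Plural-Forms' header, for example `plural=n != 1' for English and `plural=n>1' for French, so French uses the singular for 0. The Python sketch below assumes that two-form model. `PLURAL_RULE', `CATALOG', and `plural_message' are hypothetical names, and the French strings are illustrative only.

```python
# Plural-form selection modeled on GNU gettext "Plural-Forms" headers:
#   English: nplurals=2; plural=n != 1;   (0 takes the plural)
#   French:  nplurals=2; plural=n > 1;    (0 takes the singular)
PLURAL_RULE = {
    "en": lambda n: int(n != 1),
    "fr": lambda n: int(n > 1),
}

# Hypothetical message catalog: whole-word singular and plural forms,
# indexed by the value the plural rule returns (0 or 1).
CATALOG = {
    "en": {"tries": ("{n} try", "{n} tries"),
           "wins": ("{n} win", "{n} wins")},
    "fr": {"tries": ("{n} tentative", "{n} tentatives"),
           "wins": ("{n} victoire", "{n} victoires")},
}


def plural_message(lang: str, key: str, n: int) -> str:
    form = PLURAL_RULE[lang](n)  # 0 = singular form, 1 = plural form
    return CATALOG[lang][key][form].format(n=n)


for lang in ("en", "fr"):
    print(f"{lang}:", plural_message(lang, "tries", 1), "/",
          plural_message(lang, "wins", 0))
# en: 1 try / 0 wins
# fr: 1 tentative / 0 victoire
```

The rule table is kept separate from the strings, so adding a language means supplying its rule and its whole-word forms, with no new conditional code at each message site.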