From: Paul Eggert
To: Eli Zaretskii
Cc: handa@gnu.org, monnier@iro.umontreal.ca, Emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: etc/HELLO markup etc.
Date: Sat, 22 Dec 2018 11:41:05 -0800
Organization: UCLA Computer Science Department

Eli Zaretskii wrote:
> If Han unification is the only important user of the charset property,
> then yes, we could remove the rest of the charset info from HELLO.

Yes, that's the case.

> the current HELLO just keeps the information
> that was there before recoding it in UTF-8, nothing was added.

Sure, but the non-Han markup is merely a relic of that file's old method of
encoding, which avoided Unicode and instead used ISO 2022 escape sequences to
switch among various 8- and 16-bit encodings, as that was the only way to show
text in (say) Russian under the constraints of the old method. The non-Han
markup is completely unnecessary now that the file uses UTF-8. (The Han markup
probably isn't needed either, though I also would like Handa's opinion on that.)

>> Although the etc/HELLO markup might be of interest to those who care about
>> annotating languages in the text, it's irrelevant to the ordinary purpose of
>> that file, which is to show textual translations of "Hello"
>
> That's not the original purpose of that file. The purpose is to show
> scripts, not languages, and to show how we display different scripts
> in the same buffer.

OK, but either way the non-Han markup is irrelevant to the ordinary purpose of
the file.

>> It's still not a good user interface, though, as it is difficult to see the
>> markup's effect when visiting etc/HELLO in the usual way
>
> If the usual way is via find-file and its ilk, then you should see the
> same results as with "C-h h", so I'm not sure I understand what you
> mean here.

I meant that one cannot see the markup's effect when visiting the file with
either C-h h or find-file in the usual way. It's useless markup.

> In what way most of what you say is not applicable to etc/enriched.txt
> in general?

Other forms of enriched-text markup are typically easily visible. If I visit
etc/enriched.txt I can easily see which parts are marked white on blue
background, which parts are marked italic, etc. Invisible enriched-text markup
is much harder to deal with when editing an enriched-text file.

>> the file is not a good showroom for how to maintain multilingual
>> text.
>
> What other facilities are you aware of or can suggest for showing
> multilingual text with such level of detail and precision?

In practice the most common and often the best way to deal with the situation is
to do what the non-markup part of etc/HELLO is already doing: indicate within
the text itself what language or script is being used, to help the reader who
may be unacquainted with them, and with enough punctuation within the text so
that the reader can easily see what's going on. This technique has been used for
centuries, it's by far the most popular technique in common practice today, and
it suffices for this particular application (with the possible exception of its
Chinese and Japanese text).

>> It's not a good sign that there seem to be errors in the
>> possibly-useful (i.e., CJ) markup that nobody has noticed since the
>> markup was introduced in May, and that I noticed these errors now
>> only because I was visiting the file literally.
>
> Which errors? I don't think we discovered any errors.

Yes, and that's the point! The approach we're taking is not good for dealing
with the situation. One example of such an error is that "日本語" has no charset
properties even though it's obviously intended to use a Japanese script (since
it follows the word "Japanese"). I'm sure there are others.