From: Paul Eggert
Newsgroups: gmane.emacs.bugs
Subject: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800
Organization: UCLA Computer Science Department
Message-ID: <3fd27fe5-e650-b207-fdd4-36f805b89b4d@cs.ucla.edu>
To: Stefan Monnier
Cc: 33796@debbugs.gnu.org

> I'm not really sure who to ask about this.

You can ask me (:-). Although I can't read East Asian languages, I do
have significant experience with CJK text: my previous (15-year) job
was at a company whose customers were almost all CJK, where CJK
internationalization was essential, and where I regularly dealt with
weird encodings and displays.
And this one is an easy call: for
maintaining these particular files, UTF-8 is an improvement and this
patch should go in.

To take just one example, titdic-cnv.el: people who are seriously
maintaining it and who need to read the Chinese text will almost surely
have their environment set up to display UTF-8 Chinese text well
already. Furthermore, if you take a look at all the changes made to this
file in the last decade, here are the statistics:

  edits contributor
     15 Author: Paul Eggert
     10 Author: Glenn Morris
      2 Author: Stefan Monnier
      2 Author: Juanma Barranquero
      1 Author: Phillip Lord
      1 Author: Kenichi Handa
      1 Author: Andreas Schwab

Only one edit was made by a CJK user, and Handa's edit involved only
ASCII characters. Switching this file to UTF-8 would not have made any
of our maintenance any more difficult in the last decade.

Conversely, I commonly use tools like 'git grep' to look for issues in
the code, and these tools mishandle non-UTF-8 files and I see mojibake
on my screen because of this. So it will be a significant win for me
(and I suspect others) when we switch these files to UTF-8.

To try to answer Stefan's questions:

> - Do those people who edit those files really care about the difference?

No, almost always: see above.

>   utf-8 is becoming standard even in the CJK world so
>   maybe the change is not that terrible (or at least, users have gotten
>   used to lowering their expectations in this respect).

Yes, that’s happened. I looked for recent reports about this, and it
appears that the controversy is mostly over.
For example, one source (dated 2015)
lamented the demise of Japanese Knoppix and said that Plamo Linux had
problems with EUC-JP and suggested users switch to UTF-8. More recently,
another (dated this year) says that the choice of EUC-JP or UTF-8 is
user-specific for Plamo Linux, and that applications like Firefox have
problems with EUC-JP so discretion is advised if you choose EUC-JP.
If even hardcore holdouts like Plamo are folding....

> - If the change is indeed problematic, can we adjust it by using
>   a file-global language tag?

I hope that’s not necessary, but it’d be OK if we have to do it.

> - If that's not sufficient, can we use a scheme like that
>   of etc/HELLO but to keep the files directly usable as Elisp (so as to
>   have our cake and eat it too)?

etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
other than Emacs to look at it, and even Emacs screws up if I do
something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend
this disaster to other files.

PS. One minor suggestion for your patch: please also update the list of
files in admin/notes/unicode to remove mention of the files in question.

PPS. How about also converting etc/tutorials/TUTORIAL.ja,
lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el,
lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?
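
As a rough illustration of the conversion discussed in this message, here is a minimal Python sketch (not the language or tooling used in the thread) of re-encoding legacy-encoded bytes as UTF-8, and of what a UTF-8-only tool might show before the conversion. The function names and the sample text are hypothetical, and "euc_jp" is chosen only for illustration. The files named above may use other encodings, and any in-file coding declaration would also need updating, which this sketch does not attempt.

```python
def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def as_seen_by_utf8_tool(data: bytes) -> str:
    """Approximate what a tool that assumes UTF-8 would display."""
    # Invalid sequences become U+FFFD, the replacement character.
    return data.decode("utf-8", errors="replace")


# Illustrative sample text; it is not taken from any Emacs source file.
sample = "日本語"
legacy = sample.encode("euc_jp")
converted = reencode_to_utf8(legacy, "euc_jp")

# Before conversion, a UTF-8-only view contains replacement characters.
print("\ufffd" in as_seen_by_utf8_tool(legacy))
# After conversion, the same view recovers the original text.
print(as_seen_by_utf8_tool(converted) == sample)
```

Decoding strictly, rather than with errors="replace", in the conversion step is deliberate: a wrong guess at the source encoding should fail loudly instead of silently writing damaged text.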