From: Paul Eggert <eggert@cs.ucla.edu>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 33796@debbugs.gnu.org
Subject: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800
Message-ID: <3fd27fe5-e650-b207-fdd4-36f805b89b4d@cs.ucla.edu>
In-Reply-To: <jwv36qu3c6i.fsf@iro.umontreal.ca>
> I'm not really sure who to ask about this.
You can ask me (:-). Although I can't read east-Asian languages I do
have significant experience with CJK text as my previous (15-year) job
was in a company whose customers were almost all CJK and where CJK
internationalization was essential and where I regularly dealt with
weird encodings and displays. And this one is an easy call: for
maintaining these particular files, UTF-8 is an improvement and this
patch should go in.
To take just one example, titdic-cnv.el: people who are seriously
maintaining it and who need to read the Chinese text will almost surely
have their environment set up to display UTF-8 Chinese text well
already. Furthermore, if you take a look at all the changes made to this
file in the last decade, here are the statistics:

  edits  contributor
     15  Author: Paul Eggert <eggert@cs.ucla.edu>
     10  Author: Glenn Morris <rgm@gnu.org>
      2  Author: Stefan Monnier <monnier@iro.umontreal.ca>
      2  Author: Juanma Barranquero <lekktu@gmail.com>
      1  Author: Phillip Lord <phillip.lord@russet.org.uk>
      1  Author: Kenichi Handa <handa@m17n.org>
      1  Author: Andreas Schwab <schwab@linux-m68k.org>

Only one edit was made by a CJK user, and Handa's edit involved only
ASCII characters. Switching this file to UTF-8 would not have made any
of our maintenance more difficult in the last decade.
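
For anyone who wants to reproduce a tally like the one above for another file, here is a minimal sketch. It assumes the counts come from the "Author:" lines of default 'git log' output; the helper names tally_authors and authors_for_file are hypothetical, and this is not necessarily the command that produced the table.

```python
import subprocess
from collections import Counter


def tally_authors(log_text):
    """Count 'Author:' header lines in `git log` output, most frequent first."""
    counts = Counter(
        line.strip()
        for line in log_text.splitlines()
        # Commit-message lines are indented by git, so only real
        # header lines start with "Author:" in column 0.
        if line.startswith("Author:")
    )
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def authors_for_file(path, since="10 years ago"):
    """Run `git log` on one file in the current checkout and tally authors."""
    out = subprocess.run(
        ["git", "log", "--since=" + since, "--", path],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate odd bytes in old commit metadata
    ).stdout
    return tally_authors(out)
```

Calling authors_for_file with the path of titdic-cnv.el inside an Emacs checkout should give (author line, count) pairs comparable to the table, though the numbers depend on when it is run.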
Conversely, I commonly use tools like 'git grep' to look for issues in
the code. These tools mishandle non-UTF-8 files, so I see mojibake
on my screen. It will be a significant win for me
(and, I suspect, for others) when we switch these files to UTF-8.
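
As a rough way to find files that such tools are likely to garble, the sketch below lists files whose bytes do not decode as UTF-8. The names is_utf8 and non_utf8_files are hypothetical, and the check has a known blind spot: 7-bit encodings such as ISO-2022-JP use only ASCII bytes plus escape sequences, so they pass as valid UTF-8 here even though they are not readable as such.

```python
from pathlib import Path


def is_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(root, pattern="*.el"):
    """List files under `root` matching `pattern` that are not valid UTF-8."""
    return sorted(
        p
        for p in Path(root).rglob(pattern)
        if p.is_file() and not is_utf8(p.read_bytes())
    )
```

Running non_utf8_files("lisp") from the top of a checkout would flag 8-bit legacy encodings such as EUC-JP or Big5; files in 7-bit encodings need a separate check, for example looking for ESC bytes or a "coding:" cookie.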
To try to answer Stefan's questions:
> - Do those people who edit those files really care about the difference?
Almost always, no: see above.
> utf-8 is becoming standard even in the CJK world so
> maybe the change is not that terrible (or at least, users have gotten
> used to lowering their expectations in this respect).
Yes, that’s happened. I looked for recent reports about this, and it
appears that the controversy is mostly over. For example,
<https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015)
lamented the demise of Japanese Knoppix and said that Plamo Linux had
problems with EUC-JP and suggested users switch to UTF-8. More recently
<https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated this
year) says that the choice of EUC-JP or UTF-8 is user-specific for Plamo
Linux, and that applications like Firefox have problems with EUC-JP so
discretion is advised if you choose EUC-JP. If even hardcore holdouts
like Plamo are folding....
> - If the change is indeed problematic, can we adjust it by using
> a file-global language tag?
I hope that’s not necessary, but it’d be OK if we have to do it.
> - If that's not sufficient, can we use a scheme like that
> of etc/HELLO but to keep the files directly usable as Elisp (so as to
> have our cake and eat it too)?
etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
other than Emacs to look at it, and even Emacs screws up if I do
something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend
this disaster to other files.
PS. One minor suggestion for your patch: please also update the list of
files in admin/notes/unicode to remove mention of the files in question.
PPS. How about also converting etc/tutorials/TUTORIAL.ja,
lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el,
lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?