From: Paul Eggert <eggert@cs.ucla.edu>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 33796@debbugs.gnu.org
Subject: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800
Message-ID: <3fd27fe5-e650-b207-fdd4-36f805b89b4d@cs.ucla.edu>
In-Reply-To: <jwv36qu3c6i.fsf@iro.umontreal.ca>

 > I'm not really sure who to ask about this.

You can ask me (:-). Although I can't read East Asian languages, I do 
have significant experience with CJK text: my previous (15-year) job 
was at a company whose customers were almost all CJK, where CJK 
internationalization was essential, and where I regularly dealt with 
weird encodings and displays. And this one is an easy call: for 
maintaining these particular files, UTF-8 is an improvement and this 
patch should go in.

To take just one example, titdic-cnv.el: people who are seriously 
maintaining it and who need to read the Chinese text will almost surely 
have their environment set up to display UTF-8 Chinese text well 
already. Furthermore, if you take a look at all the changes made to this 
file in the last decade, here are the statistics:

   edits contributor
      15 Author: Paul Eggert <eggert@cs.ucla.edu>
      10 Author: Glenn Morris <rgm@gnu.org>
       2 Author: Stefan Monnier <monnier@iro.umontreal.ca>
       2 Author: Juanma Barranquero <lekktu@gmail.com>
       1 Author: Phillip Lord <phillip.lord@russet.org.uk>
       1 Author: Kenichi Handa <handa@m17n.org>
       1 Author: Andreas Schwab <schwab@linux-m68k.org>

Only one edit was made by a CJK user, and Handa's edit involved only 
ASCII characters. Switching this file to UTF-8 would not have made any 
of that maintenance more difficult in the last decade.
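
The message does not say how the table above was produced. Its layout 
resembles a per-author count of "Author:" lines in 'git log' output, so 
here is a minimal Python sketch of that kind of tally. The sample log 
text, the names in it, and the function name tally_authors are all 
hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical excerpt of `git log` output for one file.
# The commits and authors are placeholders, not real history.
SAMPLE_LOG = """\
commit 1111111
Author: Alice Example <alice@example.org>
Date:   Mon Jan 1 00:00:00 2018 +0000

    Fix a typo

commit 2222222
Author: Bob Example <bob@example.org>
Date:   Tue Jan 2 00:00:00 2018 +0000

    Update copyright year

commit 3333333
Author: Alice Example <alice@example.org>
Date:   Wed Jan 3 00:00:00 2018 +0000

    Adjust whitespace
"""


def tally_authors(log_text):
    """Count 'Author:' header lines, most frequent first."""
    counts = Counter(
        line.strip()
        for line in log_text.splitlines()
        if line.startswith("Author:")  # commit messages are indented, so they are skipped
    )
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(
        ((n, who) for who, n in counts.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )


for n, who in tally_authors(SAMPLE_LOG):
    print(f"{n:8d} {who}")
```

Such a count only shows who touched the file. Whether a given edit 
changed any non-ASCII text still has to be checked by reading the diffs.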

Conversely, I commonly use tools like 'git grep' to look for issues in 
the code. These tools mishandle non-UTF-8 files, so I see mojibake on 
my screen. Switching these files to UTF-8 will therefore be a 
significant win for me (and, I suspect, for others).
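
To illustrate the kind of breakage meant here, the following Python 
sketch models a tool that assumes every file is UTF-8. It does not 
reproduce what 'git grep' itself does. The sample word, the choice of 
EUC-JP as the legacy encoding, and the helper names search and 
display_as_utf8 are assumptions made for the example.

```python
# Hypothetical sample: one Japanese word stored two ways.
text = "日本語"
legacy_bytes = text.encode("euc_jp")  # legacy encoding, chosen for illustration
utf8_bytes = text.encode("utf-8")


def search(data, pattern):
    """Model a UTF-8-only tool: the pattern is matched as UTF-8 bytes."""
    return pattern.encode("utf-8") in data


def display_as_utf8(data):
    """Model a UTF-8 terminal: undecodable bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


found_in_legacy = search(legacy_bytes, text)  # the search misses the word
found_in_utf8 = search(utf8_bytes, text)      # the search finds the word
shown = display_as_utf8(legacy_bytes)         # mojibake, not the original word

print("match in legacy file:", found_in_legacy)
print("match in UTF-8 file: ", found_in_utf8)
print("legacy file shown as UTF-8:", repr(shown))
```

The same bytes decode correctly once the tool is told the real 
encoding, which is the extra step that converting the files to UTF-8 
makes unnecessary.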

To try to answer Stefan's questions:

 > - Do those people who edit those files really care about the difference?

Almost always, no: see above.

 >   utf-8 is becoming standard even in the CJK world so
 >   maybe the change is not that terrible (or at least, users have gotten
 >   used to lowering their expectations in this respect).

Yes, that’s happened. I looked for recent reports about this, and it 
appears that the controversy is mostly over. For example, 
<https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015) 
lamented the demise of Japanese Knoppix and said that Plamo Linux had 
problems with EUC-JP and suggested users switch to UTF-8. More recently, 
<https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated 2018) 
says that the choice of EUC-JP or UTF-8 is user-specific for Plamo 
Linux, and that applications like Firefox have problems with EUC-JP, so 
discretion is advised if you choose EUC-JP. If even hardcore holdouts 
like Plamo are folding....

 > - If the change is indeed problematic, can we adjust it by using
 >   a file-global language tag?

I hope that’s not necessary, but it’d be OK if we have to do it.

 > - If that's not sufficient, can we use a scheme like that
 >   of etc/HELLO but to keep the files directly usable as Elisp (so as to
 >   have our cake and eat it too)?

etc/HELLO is pretty much a disaster for me now, as I can’t use any tool 
other than Emacs to look at it, and even Emacs screws up if I do 
something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend 
this disaster to other files.

PS. One minor suggestion for your patch: please also update the list of 
files in admin/notes/unicode to remove mention of the files in question.

PPS. How about also converting etc/tutorials/TUTORIAL.ja, 
lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el, 
lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?





