From: stephen@xemacs.org
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: emacs-devel@gnu.org
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 15:20:56 +0900
Message-ID: <E1Zg5KS-0005NI-Ul@turnbull.sk.tsukuba.ac.jp>
In-Reply-To: <56077431.7010906@cs.ucla.edu>
Paul Eggert writes:
> I think your information is out of date.
Rather, I think that yours is superficial. Really, you should listen
to those of us who live and work outside of the ASCII hemisphere.
I live and teach in Japan (a stone's throw from ETL, as it happens),
and most of the students I supervise are Chinese. I regularly need to
access Chinese and Japanese government and corporate data, and
retrieve preprints and data (and sometimes code) from the personal
pages of other scholars. Mojibake in the HTML pages is frequent, in
both Firefox and Chrome (of course it's almost always easy to guess
the actual coded character set in use, but it is mojibake). A
frequent cause is a webserver configured to send "Content-Type:
text/html; charset=utf-8" for a page that is actually encoded in
something else.
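To make the failure mode concrete, here is a minimal sketch in Python of a
page whose bytes are Shift JIS while the header claims UTF-8. The sample
string, the helper name decode_with_fallback, and the fallback list are
illustrative assumptions, not anything taken from Emacs, Firefox, or Chrome.

```python
# Hypothetical illustration: the sample text, the helper name, and the
# fallback list are assumptions made for this sketch.

text = "日本語"                    # what the page author wrote
body = text.encode("shift_jis")   # the bytes the server actually sends
declared = "utf-8"                # what the Content-Type header claims

# Trusting the header: invalid sequences become U+FFFD, i.e. mojibake.
mojibake = body.decode(declared, errors="replace")


def decode_with_fallback(data, declared, fallbacks=("shift_jis",)):
    """Try the declared charset strictly, then each fallback in order.

    Returns a (decoded_text, charset_used) pair. If nothing decodes
    cleanly, fall back to the declared charset with replacement
    characters.
    """
    for enc in (declared, *fallbacks):
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return data.decode(declared, errors="replace"), declared


decoded, used = decode_with_fallback(body, declared)
print(repr(mojibake))
print(decoded, used)
```

The order of the fallback list matters: legacy multibyte encodings often
accept each other's byte sequences without raising an error, so a strict
decode that succeeds is not proof that the guess was right.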
> Yes, ten years ago there was a lot of non-UTF-8 out there, but
> nowadays they've largely moved on to UTF-8.
"Beauty is only skin-deep." The *top* pages, and some whole sites,
have moved on, because having beautiful (if mostly useless) top pages
is a matter of "face", so they buy new ones from companies with fancy
up-to-date web design software every couple of years. Perhaps most
recently authored pages are UTF-8. But the data sets themselves are
typically flat files, either CSV or plaintext. The explanatory pages,
even if in HTML, often haven't been revised in decades. Such useful
content is typically in a national standard coded character set rather
than Unicode.
And Emacs is hardly limited to the web. In practice, almost all mail
I receive from Chinese correspondents (even when it is in English or
Japanese) is labelled GB2312, GBK, or GB18030. The great majority of
Japanese mail
is either Shift JIS or ISO 2022 JP (sometimes with "OEM characters"
that even today aren't in Unicode because they're not in JIS).
> Of course one can still find a few web sites using other encodings,
> but like it or not, UTF-8 dominates now.
What's not to like about UTF-8?! I *wish* non-UTF-8 were a matter of
information archaeology and Buddhist scholarship! I'm sad to say, it
is not: GB variants, Big5, and JIS variants are the *majority* of the
non-ASCII data I handle every day in my Emacs. (It's not the "great
majority" only because about 30% of the non-ASCII text I handle in
Emacs is authored by me, in UTF-8, of course.)
Regards,