From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: stephen@xemacs.org
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding
	UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 09:12:51 +0900
Message-ID: <E1ZfzaF-0003SO-1m@turnbull.sk.tsukuba.ac.jp>
References: <20150921165211.20434.28114@vcs.savannah.gnu.org>
	<E1Ze4K3-0005KC-5U@vcs.savannah.gnu.org>
	<jwv6133mtuz.fsf-monnier+emacsdiffs@gnu.org>
	<83fv27mt7r.fsf@gnu.org> <83wpvfix7i.fsf@gnu.org>
	<jwva8sbbj7w.fsf-monnier+emacsdiffs@gnu.org>
	<83fv23hr0z.fsf@gnu.org>
	<jwv37y2hf6x.fsf-monnier+emacsdiffs@gnu.org>
	<5605CB6B.4000102@cs.ucla.edu> <83twqhhf0g.fsf@gnu.org>
	<5606AC48.7090801@cs.ucla.edu> <83zj09fbzp.fsf@gnu.org>
	<5606C140.6090309@cs.ucla.edu>
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1443312798 11132 80.91.229.3 (27 Sep 2015 00:13:18 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 27 Sep 2015 00:13:18 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Sep 27 02:13:14 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1ZfzaS-0008A6-5z
	for ged-emacs-devel@m.gmane.org; Sun, 27 Sep 2015 02:13:04 +0200
Original-Received: from localhost ([::1]:55661 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1ZfzaR-0001at-I0
	for ged-emacs-devel@m.gmane.org; Sat, 26 Sep 2015 20:13:03 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46921)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <steve@turnbull.sk.tsukuba.ac.jp>) id 1ZfzaO-0001ae-BH
	for emacs-devel@gnu.org; Sat, 26 Sep 2015 20:13:01 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <steve@turnbull.sk.tsukuba.ac.jp>) id 1ZfzaL-0004mP-40
	for emacs-devel@gnu.org; Sat, 26 Sep 2015 20:13:00 -0400
Original-Received: from turnbull.sk.tsukuba.ac.jp ([130.158.96.25]:48887)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <steve@turnbull.sk.tsukuba.ac.jp>) id 1ZfzaK-0004jV-Qy
	for emacs-devel@gnu.org; Sat, 26 Sep 2015 20:12:57 -0400
Original-Received: from steve by turnbull.sk.tsukuba.ac.jp with local (Exim 4.86)
	(envelope-from <steve@turnbull.sk.tsukuba.ac.jp>) id 1ZfzaF-0003SO-1m
	for emacs-devel@gnu.org; Sun, 27 Sep 2015 09:12:51 +0900
In-Reply-To: <5606C140.6090309@cs.ucla.edu>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: steve@turnbull.sk.tsukuba.ac.jp
X-SA-Exim-Scanned: No (on turnbull.sk.tsukuba.ac.jp);
	SAEximRunCond expanded to false
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 130.158.96.25
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:190372
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/190372>

>>>>> Paul Eggert writes:
 > Eli Zaretskii wrote:

 >> So you are, in effect, saying that it is incorrect to derive the
 >> default encodings from the locale's codeset?

 > Yes, for Emacs developers.

I think this makes sense.  IIUC Emacs already uses characters outside
of the Unicode repertoire, so it shouldn't be too hard to replicate
any Emacs capabilities that require non-Unicode characters or charsets
*inside* Emacs by using such characters.  Assuming there are any; I
suspect even HELLO doesn't actually need them.  There's no "gaiji"
problem of how to tell Emacs what to do with those characters; the
developer who introduces them into Emacs is responsible for adding
them to Emacs's non-Unicode repertoire.

 > And come to think of it, for most Emacs users.

I hope not, because that would imply that Emacs users in China, Japan,
probably Korea, and Taiwan are becoming a decreasing rather than
increasing fraction of Emacs users.

 > Nowadays in my experience most non-ASCII text files use UTF-8,
 > regardless of locale.

Toto, I don't think we're in Kansas any more.

 > The old days of having to guess encoding from the locale are
 > passing away.  This is partly due to UTF-8 being the encoding of
 > choice for HTML and XML, where UTF-8 overtook the older 8-bit
 > encodings in 2008 and now is by far the dominant encoding.

On the commercial internet, yes, but not for government and academic
sites in Japan and China.

 > One way to accommodate the new reality would be to

Recognize that it's probably due to insufficient experience?

 > change Emacs so that by default the system locale does not affect
 > Emacs's guess of a file's encoding if the file's initial sample is
 > valid UTF-8.

"Not affect" is probably a bad idea.  Giving UTF-8 too strong
a preference on Windows is a bad idea, because there are a lot of
Windows coding systems that use UTF-8 trailing bytes to represent
characters; it's occasionally possible to run into UTF-8-conforming
files that are intended to be something else.  This isn't true for
ISO-8859 coding systems.
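
To make the ambiguity concrete, here is a minimal sketch in Python
(not Emacs code, and not how Emacs detects coding systems).  It takes
one byte sequence that is valid UTF-8 and decodes it three ways.  The
choice of cp1252 as the Windows coding system, and the helper names
decodes_as and has_c1_controls, are assumptions made only for
illustration.

```python
# Illustrative sketch only: the sample bytes, the codecs compared, and the
# helper names are assumptions chosen for this example.

# UTF-8 encoding of U+201C LEFT DOUBLE QUOTATION MARK: bytes E2 80 9C.
sample = "\u201c".encode("utf-8")


def decodes_as(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def has_c1_controls(text):
    """True if the text contains a C1 control character (U+0080..U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


as_utf8 = decodes_as(sample, "utf-8")
as_cp1252 = decodes_as(sample, "cp1252")
as_latin1 = decodes_as(sample, "iso-8859-1")

# The same bytes are valid in all three encodings, so validity alone
# cannot tell us which one the author of the file intended.
for name, text in [("utf-8", as_utf8), ("cp1252", as_cp1252), ("iso-8859-1", as_latin1)]:
    print(name, ascii(text), "C1 controls:", has_c1_controls(text))
```

In cp1252 the trailing bytes 0x80 and 0x9C are printable characters,
so the cp1252 reading is three ordinary characters.  In ISO-8859-1
the same two bytes are C1 control characters, which rarely appear in
real text, so the ISO-8859-1 reading is easy to rule out.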

 > Users could set a variable to re-enable the old behavior.  If we
 > did this, we wouldn't have the error-prone process of sprinkling
 > 'coding: utf-8' cookies all over the place.