From: stephen@xemacs.org
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 09:12:51 +0900
To: emacs-devel@gnu.org
In-Reply-To: <5606C140.6090309@cs.ucla.edu>

>>>>> Paul Eggert writes:

> Eli Zaretskii wrote:
>> So you are, in effect, saying that it is incorrect to derive the
>> default encodings from the locale's codeset?

> Yes, for Emacs developers.

I think this makes sense. IIUC Emacs already uses characters outside
of the Unicode repertoire, so it shouldn't be too hard to replicate
any Emacs capabilities that require non-Unicode characters or charsets
*inside* Emacs by using such characters. Assuming there are any; I
suspect even HELLO doesn't actually need them. There's no "gaiji"
problem of how to tell Emacs what to do with those characters; the
developer who introduces them into Emacs is responsible for adding
them to Emacs's non-Unicode repertoire.

> And come to think of it, for most Emacs users.

I hope not, because that would imply that Emacs users in China, Japan,
probably Korea, and Taiwan are becoming a decreasing rather than
increasing fraction of Emacs users.

> Nowadays in my experience most non-ASCII text files use UTF-8,
> regardless of locale.

Toto, I don't think we're in Kansas any more.

> The old days of having to guess encoding from the locale are
> passing away.
> This is partly due to UTF-8 being the encoding of
> choice for HTML and XML, where UTF-8 overtook the older 8-bit
> encodings in 2008 and now is by far the dominant encoding.

On the commercial internet, yes, but not for government and academic
sites in Japan and China.

> One way to accommodate the new reality would be to

Recognize that it's probably due to insufficient experience?

> change Emacs so that by default the system locale does not affect
> Emacs's guess of a file's encoding if the file's initial sample is
> valid UTF-8.

"Not affect" is probably a bad idea. Giving UTF-8 too strong a
preference on Windows is a bad idea, because there are a lot of
Windows coding systems that use UTF-8 trailing bytes to represent
characters; it's occasionally possible to run into UTF-8-conforming
files that are intended to be something else. This isn't true for
ISO-8859 coding systems.

> Users could set a variable to re-enable the old behavior. If we
> did this, we wouldn't have the error-prone process of sprinkling
> 'coding: utf-8' cookies all over the place.
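
To make the ambiguity discussed above concrete, here is a rough
sketch in Python (not Emacs code, and not how Emacs actually detects
coding systems). The function name guess_encoding and its fallback
rule are hypothetical, written only to mimic the proposed "valid
UTF-8 sample wins over the locale" heuristic. It shows a two-byte
file written in the Windows Japanese code page (cp932) that also
happens to be well-formed UTF-8, so the heuristic picks the wrong
coding system.

```python
import codecs


def guess_encoding(sample: bytes, locale_encoding: str) -> str:
    """Hypothetical heuristic: choose UTF-8 whenever the initial sample
    is well-formed UTF-8, otherwise fall back to the locale's encoding."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        # final=False tolerates a sample that was cut off mid-character.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return locale_encoding
    return "utf-8"


# Text the (imaginary) user really wrote: two half-width katakana.
intended = "\uff83\uff69"
raw = intended.encode("cp932")  # the bytes 0xC3 0xA9

# The same two bytes are also a well-formed UTF-8 encoding of U+00E9,
# so the heuristic ignores the locale and guesses UTF-8.
guessed = guess_encoding(raw, "cp932")
print(guessed, repr(raw.decode(guessed)), repr(raw.decode("cp932")))

# Typical kanji text in cp932 is not valid UTF-8, so the locale still wins.
print(guess_encoding("日本語".encode("cp932"), "cp932"))
```

Collisions like this need the non-ASCII bytes to pair up just so,
which is why they are only occasional, but a short file of half-width
katakana is enough to trigger one.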