From: Eli Zaretskii
To: Stefan Monnier
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Fri, 25 Sep 2015 10:55:24 +0300
Message-ID: <83fv23hr0z.fsf@gnu.org>
References: <20150921165211.20434.28114@vcs.savannah.gnu.org> <83fv27mt7r.fsf@gnu.org> <83wpvfix7i.fsf@gnu.org>

> From: Stefan Monnier
> Cc: emacs-devel@gnu.org
> Date: Thu, 24 Sep 2015 17:29:38 -0400
>
> >   emacs -Q
> >   M-x set-locale-environment RET he_IL.ISO-8859-8 RET
> >   C-x C-f doc/lispref/tips.texi RET
>
> Hmm.... I don't think this is using prefer-utf-8.  `prefer-utf-8' is
> used for *.el files via file-coding-system-alist.

So we now agree that at least non-*.el files should have the coding
cookie, yes?

As for *.el files: prefer-utf-8 is too easily duped for us to have
such infinite faith in it.  I can easily force a .el file to be saved
in non-UTF-8 encoding, and then it will be decoded incorrectly when
visited, if it doesn't have a coding cookie.  E.g., try saving a
foo.el with the following contents:

  (setq string "א“”")

using cp1255, then kill the buffer and visit it again.  You will see
this instead:

  (setq string "Ӕ")

Bottom line: we use prefer-utf-8 for *.el files so that the
probability of such catastrophic errors be minimized when the lazy
maintainers couldn't be bothered to add a cookie.  But we don't want
to be lazy ourselves, with the files we own and control.

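The foo.el example can be looked at byte by byte.  What follows is a
minimal Python sketch, not a description of Emacs's decoder.  It
assumes Python's "cp1255" codec matches the coding system used to save
the file.  The helper names are hypothetical, and the idea that the
misreading comes from accepting a non-shortest-form UTF-8 sequence is
an assumption; the sketch only shows that this arithmetic yields the
single character quoted above.

```python
# Hypothetical illustration; this is not Emacs's actual decoding code.
text = "\u05d0\u201c\u201d"      # Hebrew alef, left and right double quotes
raw = text.encode("cp1255")      # the three bytes stored in foo.el


def strict_utf8_ok(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def lenient_utf8_3byte(data: bytes) -> str:
    """Decode one 3-byte UTF-8-shaped sequence without the
    shortest-form check (assumed source of the misreading)."""
    shaped = (
        len(data) == 3
        and data[0] & 0xF0 == 0xE0
        and all(b & 0xC0 == 0x80 for b in data[1:])
    )
    if not shaped:
        raise ValueError("not a 3-byte UTF-8-shaped sequence")
    code = ((data[0] & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
    return chr(code)


print(raw.hex())                           # e09394
print(strict_utf8_ok(raw))                 # False
print(hex(ord(lenient_utf8_3byte(raw))))   # 0x4d4
```

The three cp1255 bytes happen to have the shape of one three-byte
UTF-8 sequence, which is why a detector that guesses from content
alone can be fooled, and why an explicit cookie removes the guesswork.
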
More generally, I think we should require any text file in the Emacs repository that includes non-ASCII characters to have an explicit coding cookie, so that these subtle problems don't lie low because most Emacs contributors live in UTF-8 locales.
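
A rule like this could be checked mechanically.  The following is a
rough Python sketch under stated assumptions, not an existing Emacs
tool: the function names are hypothetical, and the cookie detection is
a simplification of Emacs's rules (a "-*- ... coding: ... -*-" cookie
on one of the first two lines, or a "coding:" entry in a trailing
"Local Variables:" block within the last 3000 bytes).

```python
import re

# Cookie of the form "-*- coding: utf-8 -*-" (possibly with other variables).
FIRST_LINES_COOKIE = re.compile(rb"-\*-.*\bcoding:\s*[\w.-]+.*-\*-")
# A "coding: NAME" entry on some line of a Local Variables block.
LOCAL_VARS_COOKIE = re.compile(rb"^.*\bcoding:\s*[\w.-]+", re.MULTILINE)


def has_non_ascii(data: bytes) -> bool:
    """True if any byte is outside the 7-bit ASCII range."""
    return any(byte > 0x7F for byte in data)


def has_coding_cookie(data: bytes) -> bool:
    """Simplified check for an explicit coding cookie."""
    head = data.splitlines()[:2]
    if any(FIRST_LINES_COOKIE.search(line) for line in head):
        return True
    tail = data[-3000:]
    marker = tail.rfind(b"Local Variables:")
    if marker == -1:
        return False
    return LOCAL_VARS_COOKIE.search(tail[marker:]) is not None


def needs_cookie(data: bytes) -> bool:
    """True for content that has non-ASCII bytes but no cookie."""
    return has_non_ascii(data) and not has_coding_cookie(data)
```

Applied to each text file's raw bytes (for instance via
pathlib.Path.read_bytes), needs_cookie would flag the files this
proposal asks to fix, regardless of the locale the check runs in.
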