From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 01:22:48 -0700
Organization: UCLA Computer Science Department
Message-ID: <5607A758.4020205@cs.ucla.edu>
In-Reply-To: <83pp14fhj5.fsf@gnu.org>
To: Eli Zaretskii
Cc: dak@gnu.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Eli Zaretskii wrote:

> I've also looked at the *.po files in the latest releases of GNU Make,
> Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> such files still use non-UTF-8 encodings.

Yes, and those files are a pain to look at with Emacs now, since it typically misguesses their encodings. Presumably Emacs should be looking at .po files' charset= decorations.

What's likely happening with those files is that they were originally created long ago in an 8-bit locale, and nobody has bothered to update their encodings since then. Many of the files haven't been changed in ages (about half of them have revision dates before 2010), and of course the older files will prefer legacy encodings. These older files are not a particularly good match for text that people edit today.

> while I agree with you that UTF-8 encoded files are the majority
> among non-ASCII files (and Emacs development aligns itself with that
> fact very well), the non-UTF-8 minority, even in the Posix world, is
> still significant enough, and we cannot possibly ignore it.

Naturally we cannot ignore it. All I'm suggesting is that we change the default behavior so that it's more UTF-8 friendly, since that's the way the world is going. The old Emacs behavior should still be available, for people who need it.
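
To make the two ideas above concrete (honor a .po file's charset= declaration, and otherwise try UTF-8 before falling back to a legacy encoding), here is a minimal Python sketch. It is not how Emacs does or would do this. The function names `po_declared_charset` and `decode_po` and the `latin-1` fallback are hypothetical choices made for illustration. The sketch also assumes the PO header is written in an ASCII-compatible encoding and that the first `charset=` in the file belongs to the header.

```python
import re
from typing import Optional

# Assumption: the PO header is ASCII-compatible, so the raw bytes can be
# searched for the charset parameter before the encoding is known.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def po_declared_charset(data: bytes) -> Optional[str]:
    """Return the charset named by the first charset= in the file, if any."""
    match = _CHARSET_RE.search(data)
    if match is None:
        return None
    name = match.group(1).decode("ascii")
    # Unfilled templates carry the literal placeholder CHARSET.
    return None if name == "CHARSET" else name


def decode_po(data: bytes, fallback: str = "latin-1") -> str:
    """Decode PO bytes: declared charset first, then UTF-8, then a legacy fallback."""
    declared = po_declared_charset(data)
    if declared is not None:
        try:
            return data.decode(declared)
        except (LookupError, UnicodeDecodeError):
            pass  # Unknown or incorrect declaration: fall through to guessing.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical stand-in for the old behavior that stays available.
        return data.decode(fallback)
```

Calling `decode_po(open(path, "rb").read())` on a legacy file would use its declared encoding if present. The `fallback` parameter stands in for whatever legacy default a user still needs.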