From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.bugs
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Tue, 24 May 2016 23:19:05 -0700
Organization: UCLA Computer Science Department
Message-ID: <574543D9.6080901@cs.ucla.edu>
References: <87bn3z4l9i.fsf@mat.ucm.es> <1444321464004323@web25h.yandex.ru> <83h9do67pp.fsf@gnu.org> <21f6198c-a2fc-365f-caf7-79fad5027f1c@yandex.ru> <83twho41xd.fsf@gnu.org> <1f8cf525-c138-03f6-7f17-65015dc5cdfa@yandex.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
To: Dmitry Gutov, Eli Zaretskii
Cc: oub@mat.ucm.es, 23595@debbugs.gnu.org
X-GNU-PR-Message: followup 23595
X-GNU-PR-Package: emacs
In-Reply-To: <1f8cf525-c138-03f6-7f17-65015dc5cdfa@yandex.ru>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

Dmitry Gutov wrote:
> Should `utf-8' altogether replace `undecided' in vc-coding-system-for-diff? Then
> the use of buffer-file-coding-system could be predicated on its being compatible
> with ascii.

That might be going too far.

We want buffer-file-coding-system to be compatible-enough with ASCII for the
case where diff output might contain ASCII metadata or non-ASCII file contents
or both. In this case, if buffer-file-coding-system is greatly incompatible with
ASCII, then ASCII will often be wrong (because the file data in the diff output
will be mostly UTF-16, say), and buffer-file-coding-system will often be wrong
too (because the non-file data will be mostly ASCII). So when
buffer-file-coding-system is greatly incompatible with ASCII, we can't use
either buffer-file-coding-system or UTF-8; they're both wrong too often.
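
To make the "both wrong too often" point concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the file name, the diff header text, and the sample content are all made up, and real diff output for a UTF-16 file may look different. It builds a byte string that mixes ASCII metadata with UTF-16LE file data and then decodes the whole thing with a single encoding each way.

```python
# Hypothetical diff output (names and content are illustrative only):
# ASCII metadata written by diff, followed by file content copied
# byte-for-byte from a UTF-16LE file.
metadata = "--- a/notes.txt\n+++ b/notes.txt\n@@ -1 +1 @@\n"
content = "日本語\n"
mixed = metadata.encode("ascii") + content.encode("utf-16-le")

# Decoding everything as UTF-8 keeps the metadata but mangles the content.
as_utf8 = mixed.decode("utf-8", errors="replace")

# Decoding everything as UTF-16LE mangles the metadata instead.
as_utf16 = mixed.decode("utf-16-le", errors="replace")

print("UTF-8:    metadata ok?", metadata in as_utf8, " content ok?", content in as_utf8)
print("UTF-16LE: metadata ok?", metadata in as_utf16, " content ok?", content in as_utf16)
```

Neither single decoding recovers both parts, which is the situation described above for a coding system that is greatly incompatible with ASCII.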

The way it's *supposed* to work in a POSIX system is that diff is supposed to
be applied to a file that is valid text according to the current locale's
encoding, and diff is supposed to generate both metadata and data that use the
current locale's encoding. I expect that we should fall back on this approach
when buffer-file-coding-system is greatly incompatible with ASCII. This will
better handle unusual cases such as a system operating in an EBCDIC locale
(which can happen on IBM mainframes, though admittedly Emacs is not likely to
work well on such platforms). And this argues for sticking with 'undecided'
instead of 'utf-8' here.

(In theory it's possible for a GNU/Linux system to establish a locale with
UTF-16 encoding, so that diff's metadata and data are consistently UTF-16 for
this example. However, I've never heard of such a thing, and couldn't find any
evidence of one just now when I searched for it. So I don't think we need to
worry about this now.)
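
The fallback rule proposed above can be sketched roughly as follows. This is Python rather than Emacs Lisp and is not the vc.el code: the function names are hypothetical, the ASCII-compatibility test is a crude approximation, and the locale's preferred encoding is used only as a stand-in for the fallback (Emacs's 'undecided' does its own detection, which is not modeled here).

```python
import locale
from typing import Optional


def is_ascii_compatible(encoding: str) -> bool:
    """Crude check (an assumption of this sketch): typical diff metadata
    encodes to the same bytes as it does in ASCII."""
    sample = "--- +++ @@ -1,2 +1,2 @@ diff\n"
    try:
        return sample.encode(encoding) == sample.encode("ascii")
    except (LookupError, UnicodeError):
        # Unknown encoding, or one that cannot represent the sample.
        return False


def diff_output_encoding(file_encoding: Optional[str]) -> str:
    """Pick an encoding for reading diff output for a file.

    Use the file's own encoding when it is ASCII-compatible; otherwise
    fall back on the current locale's encoding.
    """
    if file_encoding and is_ascii_compatible(file_encoding):
        return file_encoding
    return locale.getpreferredencoding(False)


for enc in ("utf-8", "latin-1", "utf-16-le", "cp037"):
    print(enc, "->", diff_output_encoding(enc))
```

Here cp037 is one of Python's EBCDIC codecs, so it and UTF-16LE both take the locale fallback, while UTF-8 and Latin-1 are used as-is.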