From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 19:48:50 +0300
Message-ID: <83h9do67pp.fsf@gnu.org>
References: <87bn3z4l9i.fsf@mat.ucm.es> <1444321464004323@web25h.yandex.ru>
In-reply-to: <1444321464004323@web25h.yandex.ru> (message from Dmitry Gutov on Mon, 23 May 2016 14:52:03 +0300)
To: Dmitry Gutov
Cc: oub@mat.ucm.es, 23595@debbugs.gnu.org

> From: Dmitry Gutov
> Date: Mon, 23 May 2016 14:52:03 +0300
>
> > The resulting diff contains either rubbish or fails to run.
>
> Files attached.

I don't see any rubbish in the Git output. With RCS, the command
signals an error, so more digging is needed to find out what's wrong
(although it could be that rcsdiff exits with non-zero status when it
sees what looks like binary files).

> It seems, to an extent, to be caused by our setting
> coding-system-for-read inside vc-diff-internal (to
> utf-16be-with-signature-unix, which is also the value of
> buffer-file-coding-system).
>
> Without that, the result of vc-diff (at least with Git) is "Binary
> files a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs 24.5
> does the same.

Setting coding-system-for-read is correct, because the important use
case is when the diffs are actually output. The problem is that
UTF-16 is not ASCII-compatible, and so text output by Git itself will
be mishandled. Another problem is that Git doesn't show the diffs at
all.

> Which is weird, considering vc-diff-internal and
> vc-coding-system-for-diff have both been virtually untouched for the
> last couple of years.

Not sure what you see as weird.

> But even if we figure out why this happens, you (Uwe) probably want
> Git, Hg, etc, to treat this file as text, and not binary. Only then
> you'll be able to get meaningful diffs. I don't have any specific
> advice on that.

Why can't we invoke "git diff --text"? That should fix the second
problem, I think.

As for the first problem, we should probably refrain from binding
coding-system-for-read to a CODING-SYSTEM for which
(coding-system-get CODING-SYSTEM :ascii-compatible-p) returns nil.
We should instead bind it to no-conversion and decode the file data
parts by hand, skipping the parts that Git itself outputs (yes, this
is messy). Patches to that effect are welcome.

Bottom line: users who put UTF-16 encoded files into VCS are playing
with fire, and are best advised not to do that!
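
To make the two points above concrete, here is a minimal sketch in
Python (not Emacs Lisp, and not the actual VC code). All function
names are hypothetical. It assumes the file is UTF-16BE and that the
diff was produced with something like "git diff --text"; it shows why
a non-ASCII-compatible encoding cannot be applied to the whole output,
and what "decode the file data parts by hand" could look like. It does
not handle UTF-16LE, nor characters whose UTF-16 bytes happen to
contain 0x0A, which would be split across diff lines.

```python
# Illustrative sketch only: hypothetical helpers, not Emacs or Git APIs.

def is_ascii_compatible(encoding):
    """Return True if plain ASCII text encodes to the same bytes in ENCODING."""
    sample = "diff --git a/x b/x\n"
    try:
        return sample.encode(encoding) == sample.encode("ascii")
    except (LookupError, UnicodeError):
        return False


def split_on_lf(raw):
    """Split RAW at each 0x0A byte, keeping the 0x0A at the end of each piece."""
    parts = raw.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])  # trailing piece without a final newline
    return lines


def decode_diff(raw, file_encoding="utf-16-be"):
    """Decode diff output whose hunk bodies are in FILE_ENCODING.

    Lines written by the diff tool itself (headers, hunk markers) are
    decoded as UTF-8; only the payload after the leading '+', '-' or ' '
    of a hunk line is decoded with FILE_ENCODING.
    """
    out = []
    in_hunk = False
    for line in split_on_lf(raw):
        if line.startswith(b"diff --git"):
            in_hunk = False
            out.append(line.decode("utf-8", errors="replace"))
        elif line.startswith(b"@@"):
            in_hunk = True
            out.append(line.decode("utf-8", errors="replace"))
        elif in_hunk and line[:1] in (b"+", b"-", b" "):
            marker = line[:1].decode("ascii")
            body = line[1:].decode(file_encoding, errors="replace")
            out.append(marker + body)
        else:
            # File headers ("--- a/...", "+++ b/...") and other tool output.
            out.append(line.decode("utf-8", errors="replace"))
    return "".join(out)


if __name__ == "__main__":
    raw = (
        b"diff --git a/t.tex b/t.tex\n--- a/t.tex\n+++ b/t.tex\n@@ -1 +1 @@\n"
        + b"-" + "日本語\n".encode("utf-16-be")
        + b"+" + "中文\n".encode("utf-16-be")
    )
    print(is_ascii_compatible("utf-8"), is_ascii_compatible("utf-16-be"))
    print(decode_diff(raw))
```

The check in is_ascii_compatible plays the role that
(coding-system-get CODING-SYSTEM :ascii-compatible-p) plays in the
suggestion above: when it fails, the output is read as raw bytes and
only the file-content parts are decoded.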