From: Uwe Brauer
Newsgroups: gmane.emacs.bugs
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 17:00:53 +0000
Message-ID: <8737p8k8u2.fsf@mat.ucm.es>
References: <87bn3z4l9i.fsf@mat.ucm.es> <1444321464004323@web25h.yandex.ru> <83h9do67pp.fsf@gnu.org>
In-Reply-To: <87bn3z4l9i.fsf@mat.ucm.es>
Reply-To: Uwe Brauer
To: Eli Zaretskii
Cc: oub@mat.ucm.es, Dmitry Gutov, 23595@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

>>> "Eli" == Eli Zaretskii writes:

>> From: Dmitry Gutov
>> Date: Mon, 23 May 2016 14:52:03 +0300
>>
>> > The resulting diff contains either rubbish or fails to run.
>> > Files attached.

> I don't see any rubbish in the Git output.
> With RCS, the command signals an error, so more digging is needed
> to find out what's wrong (although it could be that rcsdiff exits
> with non-zero status when it sees what looks like binary files).

>> It seems, to an extent, be caused by our setting
>> coding-system-for-read inside vc-diff-internal (to
>> utf-16be-with-signature-unix, which is also the value of
>> buffer-file-coding-system).
>>
>> Without that, the result of vc-diff (at least with Git) is "Binary
>> files a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs
>> 24.5 does the same.

> Setting coding-system-for-read is correct, because the important use
> case is when the diffs are actually output. The problem is that
> UTF-16 is not ASCII-compatible, and so text output by Git itself will
> be mishandled. Another problem is that Git doesn't show the diffs at
> all.

>> Which is weird, considering both vc-diff-internal and
>> vc-coding-system-for-diff have both been virtually untouched for the
>> last couple of years.

> Not sure what do you see as weird.

>> But even if we figure out why happens, you (Uwe) probably want Git,
>> Hg, etc, to treat this file as text, and not binary. Only then
>> you'll be able to get meaningful diffs. I don't have a specific
>> advice on that.

> Why can't we invoke "git diff --text"? That should fix the second
> problem, I think.

I thought the problem arose because I did not type those characters
myself but copied them from a tex.stackexchange site; I see now that
this was not the reason.

What about Mercurial?[1]

> As for the first problem, we should probably refrain from binding
> coding-system-for-read to a CODING-SYSTEM for which
>
>   (coding-system-get CODING-SYSTEM :ascii-compatible-p)
>
> returns nil. We should instead bind it to no-conversion and decode
> the file data parts by hand, skipping the parts that Git itself
> outputs (yes, this is messy). Patches to that effect are welcome.
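
To make the "decode by hand" idea concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is not the vc code and not a proposed patch: the names `HEADER_PREFIXES` and `decode_diff_line` and the sample bytes are hypothetical. It assumes the diff has already been split into lines and that the line terminator has been removed, which is itself awkward for UTF-16 data. It only shows why ASCII headers and UTF-16 content need different decoding.

```python
# Illustrative only: these names are hypothetical and this is not Emacs code.
HEADER_PREFIXES = (b"diff ", b"index ", b"--- ", b"+++ ", b"@@ ")


def decode_diff_line(raw: bytes, content_encoding: str = "utf-16-be") -> str:
    """Decode one diff line given as raw bytes, line terminator already removed."""
    if raw.startswith(HEADER_PREFIXES):
        # Written by the diff tool itself, so treat it as plain ASCII.
        return raw.decode("ascii", errors="replace")
    # Content line: one ASCII marker byte (' ', '+' or '-'), then file data.
    marker, payload = raw[:1], raw[1:]
    return marker.decode("ascii", errors="replace") + payload.decode(
        content_encoding, errors="replace"
    )


header = b"--- a/test-chin-jap.tex"
added = b"+" + "漢字".encode("utf-16-be")

# Decoding everything as UTF-16 pairs up the ASCII bytes and garbles the header.
naive = header.decode("utf-16-be", errors="replace")

print(repr(naive))
print(decode_diff_line(header))
print(decode_diff_line(added))
```

The prefix test is a simplification: a content line that happens to begin with one of those byte sequences would be misclassified, which is part of why the approach is called messy above.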
> Bottom line: users who put UTF-16 encoded files into VCS are playing
> with fire, and are best advised not to do that!

Right, I see. It was just two characters in a document that otherwise
contained Latin-1 or UTF-8.

So Chinese and Japanese programmers are at a disadvantage, no?

Footnotes:
[1]  I don't care so much about RCS in that context.
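
A likely reason for the "Binary files ... differ" message quoted earlier is that UTF-16 text contains NUL bytes even for plain ASCII characters, and diff tools commonly treat NUL bytes near the start of a file as a sign of binary data. The sketch below illustrates that under stated assumptions: `contains_nul` and the 8000-byte probe size are stand-ins chosen for illustration, not code taken from Git, Mercurial, or RCS.

```python
def contains_nul(data: bytes, probe: int = 8000) -> bool:
    """Report whether a NUL byte occurs in the first `probe` bytes (assumed heuristic)."""
    return b"\x00" in data[:probe]


sample = "\\documentclass{article}\n% 漢字\n"

as_utf8 = sample.encode("utf-8")
as_utf16 = sample.encode("utf-16-be")

print(contains_nul(as_utf8))   # False: UTF-8 text has no NUL bytes here
print(contains_nul(as_utf16))  # True: every ASCII character carries a 0x00 byte
```

If that is indeed the cause, saving the file as UTF-8 keeps the Chinese and Japanese characters while avoiding the NUL bytes, so the encoding rather than the script is what triggers the binary classification.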