From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#65049: Minor update to the repro steps
Date: Sun, 27 Aug 2023 04:14:11 +0300
Message-ID: <8be534f8-9f03-5de6-53c8-76be0f9456fa@gutov.dev>
In-Reply-To: <83v8d2kx1g.fsf@gnu.org>
To: Eli Zaretskii
Cc: 65049@debbugs.gnu.org, habamax@gmail.com, juri@linkov.net

On 26/08/2023 11:50, Eli Zaretskii wrote:

>>> That code fragment is very old, so just removing it is scary, even if
>>> only in master.
>>
>> Yeah, I noticed: it's from 2007 :-)
>
> No, it's older. The addition of 'undecided' is from 2007, but the
> vc-coding-system-for-diff part is from the original 1992 code.

Even better.

> I guess we need to force the EOL conversion part to be 'unix? Like
> this:
>
> diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el
> index 410fe5c..529553e 100644
> --- a/lisp/vc/vc.el
> +++ b/lisp/vc/vc.el
> @@ -1910,7 +1910,11 @@ vc-diff-internal
>           ;; but the only way to set it for each file included would
>           ;; be to call the back end separately for each file.
>           (coding-system-for-read
> -          (if files (vc-coding-system-for-diff (car files)) 'undecided))
> +          ;; Force EOL conversion to -unix, in case the file itself
> +          ;; has DOS EOLs.
> +          (coding-system-change-eol-conversion
> +           (if files (vc-coding-system-for-diff (car files)) 'undecided)
> +           'unix))
>           (orig-diff-buffer-clone
>            (if revert-buffer-in-progress-p
>                (clone-buffer

Yes, that fixes that scenario, thanks. Both standalone and as part of
the full patch at the end of your message.

>> So we can conclude that this code is at least a little buggy. But... (*)
>>
>>> If that doesn't work, please tell to what value does
>>> vc-diff-internal set coding-system-for-read in your case there, and I
>>> will try to figure out what would needs to be done there.
>>
>> (vc-coding-system-for-diff (car files)) either returns 'undecided when
>> FILES contains the directory (vc-root-diff), or 'undecided-dos when
>> FILES contains hello.txt as the sole element (because our scenario made
>> sure the file has that encoding), that's the vc-diff case.
>
> OK, clear. So the above should DTRT in both cases.

At least in regards to line endings, yes. I'm guessing that if we try
hard enough with files encoded in an "alien" coding system, we will see
a similar difference between vc-diff and vc-root-diff.

>>> (In general, I believe that using Git on Posix hosts with files that
>>> have DOS EOLs could have such problems in other use cases, where diffs
>>> are generated and then applied as patches. We just don't know about
>>> those cases because they are extremely rare in Real Life.)
>>
>> I'm definitely curious which scenarios made Eric add that line.
>>
>> (*) ... upon some reflection, though, it seems like our success here is
>> kind of relying on vc-root-diff's bug. Remember I mentioned the ^M chars
>> appearing at the ends of lines? That is because the encoding of the diff
>> buffer (utf-8-unix) doesn't match the encoding of the file (utf-8-dos).
>>
>> That only happens with the root diff, but not with vc-diff, which
>> follows the old design and uses the return value of
>> vc-coding-system-for-diff (undecided-dos). As luck would have it,
>> though, our patch generation and application works well with the former
>> behavior but not the latter.
>>
>> Still, Eric's old design did not make allowance for root diffs. Not sure
>> what to do with that; though I suppose we could post-process the diff
>> outputs instead: read the name of the first file in there, then detect
>> its encoding on disk, and then re-decode the diff contents if the
>> current value of buffer-file-coding-system doesn't match. And *then* we
>> would need to fix vc-git-checkin-patch in that scenario (and maybe other
>> backends as well).
>>
>> Or we decide that seeing ^M in diff buffers is a good thing under those
>> conditions, and delete the line in question.
>
> I don't completely understand what you are saying, probably because I
> don't have a clear picture of all the callers of vc-diff-internal. So
> I can only explain the fundamental issues here of which I'm aware:
>
>   . When the compared files have DOS EOLs, applying the patch on Posix
>     hosts (and with Git on all hosts) must preserve the ^M characters
>     at ends of lines in the diffs buffer. This might be a bit ugly
>     when viewing the diffs, but if the same commands are used for
>     patching, this cannot be helped.

There are two questions here: how the diff buffer should look to the
user, and what patch to feed to Git programmatically. If we decide that
the formats should be different (e.g. with/without ^M), we could
probably perform additional newline conversion inside the patch text
too.

>   . In all my experience with VCSes managing repositories with mixed
>     EOL formats (such as what we have in Emacs) on Windows, the only
>     sane way of doing that is to force the VCS to leave the original
>     EOLs intact. With CVS and RCS, this is done by checking out all
>     the text files as "binary"; in Git, there's a config setting to do
>     that. I have no real experience with SVN and Hg, so I don't know
>     what happens there. So it's possible we should remove the special
>     handling of Windows in vc-diff-internal, because its only reason
>     is to show "nicer" diffs.

What does it look like on Windows without the "special handling"? Not
displayed as a bunch of ^M, right?

>   . The line you suggest to remove should IMO stay, because your
>     suggestion is based on what you see with plain-ASCII files. If
>     the files have some non-trivial text encoding, failing to use the
>     right encoding for the diffs will produce mojibake. The EOL
>     conversion produced by vc-coding-system-for-diff is indeed
>     problematic, see above; but the text-conversion part is not, and
>     should stay.
>
> Therefore, I propose the patch below, which incorporates the above
> change, for the emacs-29 branch. I think it is safe to use the 'unix
> EOL conversion on all systems, in the vc-git.el part of the changeset,
> but if you feel uneasy about that on the release branch, we could make
> it Windows-specific on emacs-29 and remove the condition on master.

LGTM for emacs-29, thank you. In case anybody reports a problem, we can
add that OS limitation later.

Regarding your paragraph above about mojibake, though. That makes a lot
of sense, but I feel I have to stress: this mechanism doesn't work for
vc-root-diff (C-x v D). Does that mean the coding system mismatch
sufferers just silently use vc-diff and never report their problems with
vc-root-diff?

The latter command was added in 2009. No contest with 1992, but still.
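
For reference, the two behaviors discussed in this message can be checked in a scratch buffer. This is only an illustrative sketch and not part of the proposed patch: the sample string and the my/ variable names are made up here, while coding-system-change-eol-conversion and decode-coding-string are the standard Emacs functions.

```elisp
;; Illustrative only; the my/ names and the sample data are hypothetical.

;; 1. What the proposed change computes in the two cases above:
;;    vc-root-diff (FILES is a directory, so the base is 'undecided)
;;    and vc-diff on a file detected as 'undecided-dos.
(defvar my/root-diff-coding
  (coding-system-change-eol-conversion 'undecided 'unix))
(defvar my/file-diff-coding
  (coding-system-change-eol-conversion 'undecided-dos 'unix))
;; Both values should name the -unix EOL variant of `undecided'.

;; 2. Why ^M does or does not show up: decode the same diff bytes
;;    with either EOL type.
(defvar my/sample-bytes "+hello\r\n+world\r\n")
(defvar my/decoded-unix
  (decode-coding-string my/sample-bytes 'utf-8-unix))
(defvar my/decoded-dos
  (decode-coding-string my/sample-bytes 'utf-8-dos))
;; The -unix decoding keeps each CR (displayed as ^M in the buffer);
;; the -dos decoding folds every CRLF pair into a plain newline.
```

Evaluating the forms and inspecting the four variables should show that forcing 'unix only changes the EOL part of the coding system, and that the CR characters survive only in the -unix decoding, which is what the patch text fed to Git needs when the file on disk has DOS EOLs.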