From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#65049: Minor update to the repro steps Date: Sat, 26 Aug 2023 11:50:03 +0300 Message-ID: <83v8d2kx1g.fsf@gnu.org> References: <83y1iruky1.fsf@gnu.org> <83il9qom6k.fsf@gnu.org> <86v8dandhq.fsf@mail.linkov.net> <83bkf1woy3.fsf@gnu.org> <835y57tf23.fsf@gnu.org> <87edjvp6ev.fsf@gmail.com> <83350btdw8.fsf@gnu.org> <831qftspal.fsf@gnu.org> <35b50832-e9ca-9f57-fad6-68621d9b42e7@gutov.dev> <83pm3dqbtp.fsf@gnu.org> <789dacd3-8e62-74ad-f691-5b48cb1d678b@gutov.dev> <2f6986e7-f96b-98bd-4581-7503bb01b111@gutov.dev> <83ttsnoda5.fsf@gnu.org> <49d5e741-f97d-ae4d-f79c-ec418051d868@gutov.dev> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34306"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 65049@debbugs.gnu.org, habamax@gmail.com, juri@linkov.net To: Dmitry Gutov Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Aug 26 10:50:14 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qZozl-0008k1-2L for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 26 Aug 2023 10:50:13 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qZozX-0008Cw-MQ; Sat, 26 Aug 2023 04:49:59 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qZozV-0008CY-Ub for bug-gnu-emacs@gnu.org; Sat, 26 Aug 2023 04:49:58 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qZozV-0003gB-MN for bug-gnu-emacs@gnu.org; Sat, 26 Aug 2023 04:49:57 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1qZoza-0005Lf-HK for bug-gnu-emacs@gnu.org; Sat, 26 Aug 2023 04:50:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 26 Aug 2023 08:50:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 65049 X-GNU-PR-Package: emacs Original-Received: via spool by 65049-submit@debbugs.gnu.org id=B65049.169303979220537 (code B ref 65049); Sat, 26 Aug 2023 08:50:02 +0000 Original-Received: (at 65049) by debbugs.gnu.org; 26 Aug 2023 08:49:52 +0000 Original-Received: from localhost ([127.0.0.1]:41723 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qZozP-0005L8-LV for submit@debbugs.gnu.org; Sat, 26 Aug 2023 04:49:52 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:58472) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qZozK-0005Kp-EA for 65049@debbugs.gnu.org; Sat, 26 Aug 2023 04:49:49 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qZoz9-0003fO-FC; Sat, 26 Aug 2023 04:49:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=OV+RpBoUbCK9D1mjK8iqC3AVm+ueBIlF+VgrCQwLnFk=; b=KOrD8eSqqkw8 VkBWYW/f+5OdypFePwTM+3ulmC5lpUSZrCe1CL612zWbdeTgPPAE7BRAz9n8DbxrBleDrHxuKOV1j uO+jrQW1PB/BHino3OVgN47UK914xVEb8fAJ2Kf3YX9K0fPfCj/AhPewsf27TmY/Pf+Nldto2YB/n dgfplDWGyaxPzIoCrUJnoYZ/hgV9Ju4JeRry9SAVV1OKTvri9JWzqj51kGUptVMQ8QQ8fO0X1wcpW UY3fTmWd9hr+I5h/l8vKfcSpdVVBKUD3fvrKmleS9V4ueTFlGCnWP2a0hsFf/BBhwu1cMtcVc87Gu iBfuClLjcqwRYnG3wwzZ+g==; In-Reply-To: <49d5e741-f97d-ae4d-f79c-ec418051d868@gutov.dev> (message from Dmitry Gutov on Sat, 26 Aug 2023 03:45:41 +0300) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:268491 Archived-At: > Date: Sat, 26 Aug 2023 03:45:41 +0300 > Cc: juri@linkov.net, habamax@gmail.com, 65049@debbugs.gnu.org > From: Dmitry Gutov > > On 25/08/2023 09:18, Eli Zaretskii wrote: > > >> Looks like it's this line: > >> > >> (coding-system-for-read > >> (if files (vc-coding-system-for-diff (car files)) 'undecided)) > >> > >> near the beginning of vc-diff-internal that creates the difference. > >> Commenting it out makes the scenario work with both 'C-x v =' and 'C-x v D'. > > > > That code fragment is very old, so just removing it is scary, even if > > only in master. > > Yeah, I noticed: it's from 2007 :-) No, it's older. The addition of 'undecided' is from 2007, but the vc-coding-system-for-diff part is from the original 1992 code. > > What if you change that fragment to say > > > > (coding-system-for-read > > (if files (vc-coding-system-for-diff (car files)) 'undecided-unix)) > > > > instead? > > No change at all. The reasons are twofold: > > - You changed the value that was seemingly used for the "root" case, > because in the individual diff's case files must not be nil: it would > contain the files to be diff'd. That's why that change doesn't affect > 'C-x v ='. > > - But it also doesn't affect 'C-x v D'. Because even in that case FILES > is non-nil ;-(. In that scenario FILES is a list with one item: the > repository's root directory. I guess we need to force the EOL conversion part to be 'unix? Like this: diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el index 410fe5c..529553e 100644 --- a/lisp/vc/vc.el +++ b/lisp/vc/vc.el @@ -1910,7 +1910,11 @@ vc-diff-internal ;; but the only way to set it for each file included would ;; be to call the back end separately for each file. (coding-system-for-read - (if files (vc-coding-system-for-diff (car files)) 'undecided)) + ;; Force EOL conversion to -unix, in case the file itself + ;; has DOS EOLs. + (coding-system-change-eol-conversion + (if files (vc-coding-system-for-diff (car files)) 'undecided) + 'unix)) (orig-diff-buffer-clone (if revert-buffer-in-progress-p (clone-buffer > So we can conclude that this code is at least a little buggy. But... (*) > > > If that doesn't work, please tell to what value does > > vc-diff-internal set coding-system-for-read in your case there, and I > > will try to figure out what would needs to be done there. > > (vc-coding-system-for-diff (car files)) either returns 'undecided when > FILES contains the directory (vc-root-diff), or 'undecided-dos when > FILES contains hello.txt as the sole element (because our scenario made > sure the file has that encoding), that's the vc-diff case. OK, clear. So the above should DTRT in both cases. > > (In general, I believe that using Git on Posix hosts with files that > > have DOS EOLs could have such problems in other use cases, where diffs > > are generated and then applied as patches. We just don't know about > > those cases because they are extremely rare in Real Life.) > > I'm definitely curious which scenarios made Eric add that line. > > (*) ... upon some reflection, though, it seems like our success here is > kind of relying on vc-root-diff's bug. Remember I mentioned the ^M chars > appearing at the ends of lines? That is because the encoding of the diff > buffer (utf-8-unix) doesn't match the encoding of the file (utf-8-dos). > > That only happens with the root diff, but not with vc-diff, which > follows the old design and uses the return value of > vc-coding-system-for-diff (undecided-dos). As luck would have it, > though, our patch generation and application works well with the former > behavior but not the latter. > > Still, Eric's old design did not make allowance for root diffs. Not sure > what to do with that; though I suppose we could post-process the diff > outputs instead: read the name of the first file in there, then detect > its encoding on disk, and then re-decode the diff contents if the > current value of buffer-file-coding-system doesn't match. And *then* we > would need to fix vc-git-checkin-patch in that scenario (and maybe other > backends as well). > > Or we decide that seeing ^M in diff buffers is a good thing under those > conditions, and delete the line in question. I don't completely understand what you are saying, probably because I don't have a clear picture of all the callers of vc-diff-internal. So I can only explain the fundamental issues here of which I'm aware: . When the compared files have DOS EOLs, applying the patch on Posix hosts (and with Git on all hosts) must preserve the ^M characters at ends of lines in the diffs buffer. This might be a bit ugly when viewing the diffs, but if the same commands are used for patching, this cannot be helped. . In all my experience with VCSes managing repositories with mixed EOL formats (such as what we have in Emacs) on Windows, the only sane way of doing that is to force the VCS to leave the original EOLs intact. With CVS and RCS, this is done by checking out all the text files as "binary"; in Git, there's a config setting to do that. I have no real experience with SVN and Hg, so I don't know what happens there. So it's possible we should remove the special handling of Windows in vc-diff-internal, because its only reason is to show "nicer" diffs. . The line you suggest to remove should IMO stay, because your suggestion is based on what you see with plain-ASCII files. If the files have some non-trivial text encoding, failing to use the right encoding for the diffs will produce mojibake. The EOL conversion produced by vc-coding-system-for-diff is indeed problematic, see above; but the text-conversion part is not, and should stay. Therefore, I propose the patch below, which incorporates the above change, for the emacs-29 branch. I think it is safe to use the 'unix EOL conversion on all systems, in the vc-git.el part of the changeset, but if you feel uneasy about that on the release branch, we could make it Windows-specific on emacs-29 and remove the condition on master. diff --git a/lisp/vc/vc-git.el b/lisp/vc/vc-git.el index 7ae763d..218696c 100644 --- a/lisp/vc/vc-git.el +++ b/lisp/vc/vc-git.el @@ -1051,7 +1051,15 @@ vc-git-checkin (user-error "Index not empty")) (setq pos (point)))))) (unless (string-empty-p vc-git-patch-string) - (let ((patch-file (make-nearby-temp-file "git-patch"))) + (let ((patch-file (make-nearby-temp-file "git-patch")) + ;; Temporarily countermand the let-binding at the + ;; beginning of this function. + (coding-system-for-write + (coding-system-change-eol-conversion + ;; On DOS/Windows, it is important for the patch file + ;; to have the Unix EOL format, because Git expects + ;; that, even on Windows. + (or pcsw vc-git-commits-coding-system) 'unix))) (with-temp-file patch-file (insert vc-git-patch-string)) (unwind-protect diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el index 410fe5c..c314988 100644 --- a/lisp/vc/vc.el +++ b/lisp/vc/vc.el @@ -1910,15 +1910,26 @@ vc-diff-internal ;; but the only way to set it for each file included would ;; be to call the back end separately for each file. (coding-system-for-read - (if files (vc-coding-system-for-diff (car files)) 'undecided)) + ;; Force the EOL conversion to be -unix, in case the files + ;; to be compared have DOS EOLs. In that case, EOL + ;; conversion will produce a patch file that will either + ;; fail to apply, or will change the EOL format of some of + ;; the lines in the patched file. + (coding-system-change-eol-conversion + (if files (vc-coding-system-for-diff (car files)) 'undecided) + 'unix)) (orig-diff-buffer-clone (if revert-buffer-in-progress-p (clone-buffer (generate-new-buffer-name " *vc-diff-clone*") nil)))) ;; On MS-Windows and MS-DOS, Diff is likely to produce DOS-style ;; EOLs, which will look ugly if (car files) happens to have Unix - ;; EOLs. - (if (memq system-type '(windows-nt ms-dos)) + ;; EOLs. But for Git, we must force Unix EOLs in the diffs, since + ;; Git always produces Unix EOLs in the parts that didn't come + ;; from the file, and wants to see any CR characters when applying + ;; patches. + (if (and (memq system-type '(windows-nt ms-dos)) + (not (eq (vc-deduce-backend) 'Git))) (setq coding-system-for-read (coding-system-change-eol-conversion coding-system-for-read 'dos)))