bug-gnu-emacs
bug#65049: Minor update to the repro steps


From: Eli Zaretskii
Subject: bug#65049: Minor update to the repro steps
Date: Sat, 26 Aug 2023 11:50:03 +0300

> Date: Sat, 26 Aug 2023 03:45:41 +0300
> Cc: juri@linkov.net, habamax@gmail.com, 65049@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 25/08/2023 09:18, Eli Zaretskii wrote:
> 
> >> Looks like it's this line:
> >>
> >>     (coding-system-for-read
> >>      (if files (vc-coding-system-for-diff (car files)) 'undecided))
> >>
> >> near the beginning of vc-diff-internal that creates the difference.
> >> Commenting it out makes the scenario work with both 'C-x v =' and
> >> 'C-x v D'.
> > 
> > That code fragment is very old, so just removing it is scary, even if
> > only in master.
> 
> Yeah, I noticed: it's from 2007 :-)

No, it's older.  The addition of 'undecided' is from 2007, but the
vc-coding-system-for-diff part is from the original 1992 code.

> > What if you change that fragment to say
> > 
> >      (coding-system-for-read
> >       (if files (vc-coding-system-for-diff (car files)) 'undecided-unix))
> > 
> > instead?
> 
> No change at all. The reasons are twofold:
> 
> - You changed the value that was seemingly used for the "root" case, 
> because in the individual diff's case files must not be nil: it would 
> contain the files to be diff'd. That's why that change doesn't affect 
> 'C-x v ='.
> 
> - But it also doesn't affect 'C-x v D'. Because even in that case FILES 
> is non-nil ;-(. In that scenario FILES is a list with one item: the 
> repository's root directory.

I guess we need to force the EOL conversion part to be 'unix?  Like
this:

diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el
index 410fe5c..529553e 100644
--- a/lisp/vc/vc.el
+++ b/lisp/vc/vc.el
@@ -1910,7 +1910,11 @@ vc-diff-internal
         ;; but the only way to set it for each file included would
         ;; be to call the back end separately for each file.
         (coding-system-for-read
-         (if files (vc-coding-system-for-diff (car files)) 'undecided))
+          ;; Force EOL conversion to -unix, in case the file itself
+          ;; has DOS EOLs.
+          (coding-system-change-eol-conversion
+          (if files (vc-coding-system-for-diff (car files)) 'undecided)
+           'unix))
          (orig-diff-buffer-clone
           (if revert-buffer-in-progress-p
               (clone-buffer

> So we can conclude that this code is at least a little buggy. But... (*)
> 
> > If that doesn't work, please tell me what value vc-diff-internal
> > sets coding-system-for-read to in your case, and I will try to
> > figure out what needs to be done there.
> 
> (vc-coding-system-for-diff (car files)) either returns 'undecided when 
> FILES contains the directory (vc-root-diff), or 'undecided-dos when 
> FILES contains hello.txt as the sole element (because our scenario made 
> sure the file has that encoding), that's the vc-diff case.

OK, clear.  So the above should DTRT in both cases.
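
As a quick sanity check of that claim, here is a minimal sketch (not part
of the patch; my/force-unix-eol is a made-up helper name used only for
illustration) that applies the same wrapper to the two values reported
above.  Both callers should end up with the same EOL variant:

```elisp
;; Illustrative only: how the proposed wrapper treats the two values
;; reported for `vc-coding-system-for-diff' in this thread.
(defun my/force-unix-eol (coding-system)
  "Return CODING-SYSTEM with its EOL conversion forced to `unix'.
Hypothetical helper; the text-conversion part is left untouched."
  (coding-system-change-eol-conversion coding-system 'unix))

(my/force-unix-eol 'undecided)      ; vc-root-diff case => undecided-unix
(my/force-unix-eol 'undecided-dos)  ; vc-diff case      => undecided-unix
(my/force-unix-eol 'utf-8-dos)      ; only the EOL part changes => utf-8-unix
```

Only the EOL part of the coding system is replaced, so whatever text
encoding vc-coding-system-for-diff picked is still in effect.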

> > (In general, I believe that using Git on Posix hosts with files that
> > have DOS EOLs could have such problems in other use cases, where diffs
> > are generated and then applied as patches.  We just don't know about
> > those cases because they are extremely rare in Real Life.)
> 
> I'm definitely curious which scenarios made Eric add that line.
> 
> (*) ... upon some reflection, though, it seems like our success here is 
> kind of relying on vc-root-diff's bug. Remember I mentioned the ^M chars 
> appearing at the ends of lines? That is because the encoding of the diff 
> buffer (utf-8-unix) doesn't match the encoding of the file (utf-8-dos).
> 
> That only happens with the root diff, but not with vc-diff, which 
> follows the old design and uses the return value of 
> vc-coding-system-for-diff (undecided-dos). As luck would have it, 
> though, our patch generation and application works well with the former 
> behavior but not the latter.
> 
> Still, Eric's old design did not make allowance for root diffs. Not sure 
> what to do with that; though I suppose we could post-process the diff 
> outputs instead: read the name of the first file in there, then detect 
> its encoding on disk, and then re-decode the diff contents if the 
> current value of buffer-file-coding-system doesn't match. And *then* we 
> would need to fix vc-git-checkin-patch in that scenario (and maybe other 
> backends as well).
> 
> Or we decide that seeing ^M in diff buffers is a good thing under those 
> conditions, and delete the line in question.

I don't completely understand what you are saying, probably because I
don't have a clear picture of all the callers of vc-diff-internal.  So
I can only explain the fundamental issues here of which I'm aware:

  . When the compared files have DOS EOLs, applying the patch on Posix
    hosts (and with Git on all hosts) must preserve the ^M characters
    at ends of lines in the diffs buffer.  This might be a bit ugly
    when viewing the diffs, but if the same commands are used for
    patching, this cannot be helped.
  . In all my experience with VCSes managing repositories with mixed
    EOL formats (such as what we have in Emacs) on Windows, the only
    sane way of doing that is to force the VCS to leave the original
    EOLs intact.  With CVS and RCS, this is done by checking out all
    the text files as "binary"; in Git, there's a config setting to do
    that.  I have no real experience with SVN and Hg, so I don't know
    what happens there.  So it's possible we should remove the special
    handling of Windows in vc-diff-internal, because its only reason
    is to show "nicer" diffs.
  . The line you suggest to remove should IMO stay, because your
    suggestion is based on what you see with plain-ASCII files.  If
    the files have some non-trivial text encoding, failing to use the
    right encoding for the diffs will produce mojibake.  The EOL
    conversion produced by vc-coding-system-for-diff is indeed
    problematic, see above; but the text-conversion part is not, and
    should stay.
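
To make the first and last points concrete, here is a small sketch that
decodes raw bytes the way a diff buffer would see them.  It is only an
approximation of what happens to process output, and
my/decode-diff-bytes is a hypothetical name, not anything in vc.el:

```elisp
;; Illustrative only: the effect of the two halves of a coding system
;; (EOL conversion and text conversion) on bytes coming from a diff.
(defun my/decode-diff-bytes (bytes coding-system)
  "Decode the unibyte string BYTES using CODING-SYSTEM.
Hypothetical helper standing in for what `coding-system-for-read'
does to the output of the diff process."
  (decode-coding-string bytes coding-system))

;; EOL part: -unix keeps the CR that the patch needs, -dos drops it.
(my/decode-diff-bytes "+hello\r\n" 'utf-8-unix)   ; => "+hello\r\n"
(my/decode-diff-bytes "+hello\r\n" 'utf-8-dos)    ; => "+hello\n"

;; Text part: the two bytes below are UTF-8 for e-acute.  Decoding
;; them with the wrong encoding yields two characters (mojibake).
(my/decode-diff-bytes "\303\251" 'utf-8-unix)        ; => "é"
(my/decode-diff-bytes "\303\251" 'iso-latin-1-unix)  ; => "Ã©"
```

This is why only the EOL conversion should be overridden: dropping the
binding altogether would also lose the text-conversion half.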

Therefore, I propose the patch below, which incorporates the above
change, for the emacs-29 branch.  I think it is safe to use the 'unix
EOL conversion on all systems, in the vc-git.el part of the changeset,
but if you feel uneasy about that on the release branch, we could make
it Windows-specific on emacs-29 and remove the condition on master.

diff --git a/lisp/vc/vc-git.el b/lisp/vc/vc-git.el
index 7ae763d..218696c 100644
--- a/lisp/vc/vc-git.el
+++ b/lisp/vc/vc-git.el
@@ -1051,7 +1051,15 @@ vc-git-checkin
                 (user-error "Index not empty"))
               (setq pos (point))))))
       (unless (string-empty-p vc-git-patch-string)
-        (let ((patch-file (make-nearby-temp-file "git-patch")))
+        (let ((patch-file (make-nearby-temp-file "git-patch"))
+              ;; Temporarily countermand the let-binding at the
+              ;; beginning of this function.
+              (coding-system-for-write
+               (coding-system-change-eol-conversion
+                ;; On DOS/Windows, it is important for the patch file
+                ;; to have the Unix EOL format, because Git expects
+                ;; that, even on Windows.
+                (or pcsw vc-git-commits-coding-system) 'unix)))
           (with-temp-file patch-file
             (insert vc-git-patch-string))
           (unwind-protect
diff --git a/lisp/vc/vc.el b/lisp/vc/vc.el
index 410fe5c..c314988 100644
--- a/lisp/vc/vc.el
+++ b/lisp/vc/vc.el
@@ -1910,15 +1910,26 @@ vc-diff-internal
         ;; but the only way to set it for each file included would
         ;; be to call the back end separately for each file.
         (coding-system-for-read
-         (if files (vc-coding-system-for-diff (car files)) 'undecided))
+          ;; Force the EOL conversion to be -unix, in case the files
+          ;; to be compared have DOS EOLs.  In that case, EOL
+          ;; conversion will produce a patch file that will either
+          ;; fail to apply, or will change the EOL format of some of
+          ;; the lines in the patched file.
+          (coding-system-change-eol-conversion
+          (if files (vc-coding-system-for-diff (car files)) 'undecided)
+           'unix))
          (orig-diff-buffer-clone
           (if revert-buffer-in-progress-p
               (clone-buffer
                (generate-new-buffer-name " *vc-diff-clone*") nil))))
     ;; On MS-Windows and MS-DOS, Diff is likely to produce DOS-style
     ;; EOLs, which will look ugly if (car files) happens to have Unix
-    ;; EOLs.
-    (if (memq system-type '(windows-nt ms-dos))
+    ;; EOLs.  But for Git, we must force Unix EOLs in the diffs, since
+    ;; Git always produces Unix EOLs in the parts that didn't come
+    ;; from the file, and wants to see any CR characters when applying
+    ;; patches.
+    (if (and (memq system-type '(windows-nt ms-dos))
+             (not (eq (vc-deduce-backend) 'Git)))
        (setq coding-system-for-read
              (coding-system-change-eol-conversion coding-system-for-read
                                                   'dos)))
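
For the vc-git.el part, the following sketch shows what the EOL
conversion of coding-system-for-write does to a patch string that
already contains CR characters.  It is a standalone illustration under
my own assumptions (my/written-bytes is a made-up helper, and the
sample text is not from the bug report), not code from the changeset:

```elisp
;; Illustrative only: write TEXT to a temporary file under a given
;; `coding-system-for-write' and return the bytes that hit the disk.
(defun my/written-bytes (text coding-system)
  "Write TEXT to a temp file using CODING-SYSTEM; return the raw bytes.
Hypothetical helper mimicking the `with-temp-file' call in the patch."
  (let ((file (make-temp-file "eol-demo")))
    (unwind-protect
        (progn
          (let ((coding-system-for-write coding-system))
            (with-temp-file file
              (insert text)))
          ;; Read the file back without any decoding.
          (with-temp-buffer
            (set-buffer-multibyte nil)
            (insert-file-contents-literally file)
            (buffer-string)))
      (delete-file file))))

;; A patch line for a file with DOS EOLs, with its CR preserved.
(my/written-bytes "+hello\r\n" 'utf-8-unix)  ; => "+hello\r\n"
;; With -dos conversion every LF gains another CR on output.
(my/written-bytes "+hello\r\n" 'utf-8-dos)   ; => "+hello\r\r\n"
```

With the -unix variant the patch file contains exactly the bytes of the
patch string, which is what the new let-binding is meant to guarantee.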




