unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Syntactic fontification of diff hunks
@ 2018-08-16 21:08 Juri Linkov
  2018-08-16 21:44 ` Kaushal Modi
  2018-08-17  5:41 ` Yuri Khan
  0 siblings, 2 replies; 12+ messages in thread
From: Juri Linkov @ 2018-08-16 21:08 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Most version control sites like gitlab/github highlight syntax
in code snippets inside diff hunks, for example:
https://github.com/magit/magit/pull/2834/commits/95cacde4fcccc95c25d6fb9988d2aa097193f8c0

This is very helpful when looking at code changes.  I missed this feature in Emacs
for a long time.  This is why I asked a question about a possible implementation in
https://emacs.stackexchange.com/questions/43957/syntactic-fontification-of-diff-hunks
but no one had an answer.

Then I realized that much simpler would be just to use the same approach
implemented by diff-mode refinement, i.e. to take each diff hunk one by
one, and like the diff refinement highlights more fine-grained changes,
do the same for syntax highlighting according to the language in
compared files/commits.

You can see the result at the following screenshot:


[-- Attachment #2: diff-syntax.png --]
[-- Type: image/png, Size: 27954 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-16 21:08 Syntactic fontification of diff hunks Juri Linkov
@ 2018-08-16 21:44 ` Kaushal Modi
  2018-08-16 22:27   ` Juri Linkov
  2018-08-17  5:41 ` Yuri Khan
  1 sibling, 1 reply; 12+ messages in thread
From: Kaushal Modi @ 2018-08-16 21:44 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1031 bytes --]

Looks nice.

Where you supposed to attach the patch too? :)

On Thu, Aug 16, 2018 at 5:42 PM Juri Linkov <juri@linkov.net> wrote:

> Most version control sites like gitlab/github highlight syntax
> in code snippets inside diff hunks, for example:
>
> https://github.com/magit/magit/pull/2834/commits/95cacde4fcccc95c25d6fb9988d2aa097193f8c0
>
> This is very helpful when looking at code changes.  I missed this feature
> in Emacs
> for a long time.  This is why I asked a question about a possible
> implementation in
>
> https://emacs.stackexchange.com/questions/43957/syntactic-fontification-of-diff-hunks
> but no one had an answer.
>
> Then I realized that much simpler would be just to use the same approach
> implemented by diff-mode refinement, i.e. to take each diff hunk one by
> one, and like the diff refinement highlights more fine-grained changes,
> do the same for syntax highlighting according to the language in
> compared files/commits.
>
> You can see the result at the following screenshot:
>
> --

Kaushal Modi

[-- Attachment #2: Type: text/html, Size: 1760 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-16 21:44 ` Kaushal Modi
@ 2018-08-16 22:27   ` Juri Linkov
  2018-08-16 22:33     ` Kaushal Modi
  0 siblings, 1 reply; 12+ messages in thread
From: Juri Linkov @ 2018-08-16 22:27 UTC (permalink / raw)
  To: Kaushal Modi; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 159 bytes --]

> Looks nice.
>
> Where you supposed to attach the patch too? :)

The patch is on the screenshot :)

But if you want to try, here it is for your convenience:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: diff-syntax.patch --]
[-- Type: text/x-diff, Size: 3764 bytes --]

diff --git a/lisp/vc/diff-mode.el b/lisp/vc/diff-mode.el
index b91a2ba45a..175687f184 100644
--- a/lisp/vc/diff-mode.el
+++ b/lisp/vc/diff-mode.el
@@ -96,6 +96,11 @@ diff-font-lock-refine
   :version "27.1"
   :type 'boolean)
 
+(defcustom diff-font-lock-syntaxify t
+  "If non-nil, diff hunk font-lock includes syntax highlighting."
+  :version "27.1"
+  :type 'boolean)
+
 (defcustom diff-font-lock-prettify nil
   "If non-nil, font-lock will try and make the format prettier."
   :version "27.1"
@@ -402,7 +411,8 @@ diff-font-lock-keywords
      (2 font-lock-comment-face))
     ("^[^-=+*!<>#].*\n" (0 'diff-context))
     (,#'diff--font-lock-prettify)
-    (,#'diff--font-lock-refined)))
+    (,#'diff--font-lock-refined)
+    (,#'diff--font-lock-syntaxify)))
 
 (defconst diff-font-lock-defaults
   '(diff-font-lock-keywords t nil nil nil (font-lock-multiline . nil)))
@@ -2230,6 +2246,66 @@ diff--font-lock-prettify
                              'display "")))))
   nil)
 
+;;; Syntactic fontification
+
+(defun diff--font-lock-syntaxify (max)
+  "Highlight language syntax in diff hunks."
+  (when diff-font-lock-syntaxify
+    (when (get-char-property (point) 'diff--font-lock-syntaxified)
+      (goto-char (next-single-char-property-change
+                  (point) 'diff--font-lock-syntaxified nil max)))
+    (let* ((min (point))
+           (beg (or (ignore-errors (diff-beginning-of-hunk))
+                    (ignore-errors (let ((diff-auto-refine-mode nil))
+                                     (diff-hunk-next))
+                                   (point))
+                    max)))
+      (while (< beg max)
+        (let ((file (save-excursion
+                      (diff-beginning-of-file)
+                      (when (looking-at "^\\S-+\\s-+\\(\\S-+\\)")
+                        (match-string 1))))
+              (end (save-excursion (goto-char beg) (diff-end-of-hunk) (point))))
+          (if (< end min) (setq beg min))
+          (unless (or (< end beg)
+                      (get-char-property beg 'diff--font-lock-syntaxified))
+            (diff--syntaxify-hunk beg end file)
+            (let ((ol (make-overlay beg end)))
+              (overlay-put ol 'diff--font-lock-syntaxified t)
+              (overlay-put ol 'evaporate t)))
+          (goto-char (max beg end))
+          (setq beg (or (ignore-errors (let ((diff-auto-refine-mode nil))
+                                         (diff-hunk-next))
+                                       (point))
+                        max))))))
+  nil)
+
+(defun diff--syntaxify-hunk (beg end file-name)
+  (let (props (buffer (current-buffer)))
+    (with-temp-buffer
+      (insert-buffer-substring-no-properties buffer beg end)
+      (goto-char (point-min))
+      (while (re-search-forward "^[-+!]" nil t)
+        (replace-match " " nil nil))
+      (let ((buffer-file-name file-name))
+        (set-auto-mode))
+      (jit-lock-register #'font-lock-fontify-region)
+      (jit-lock-fontify-now (point-min) (point-max))
+      (goto-char (point-min))
+      (let* ((from (point)) to
+             (val (get-text-property from 'face)))
+        (while (setq to (next-single-property-change from 'face))
+          (when val (push (list from to val) props))
+          (setq val (get-text-property to 'face)
+                from to))))
+    (dolist (prop props)
+      (let ((ol (make-overlay (+ beg (nth 0 prop) -1)
+                              (+ beg (nth 1 prop) -1)
+                              nil 'front-advance nil)))
+        (overlay-put ol 'evaporate t)
+        (overlay-put ol 'face (nth 2 prop))
+        ol))))
+
 ;;; Support for converting a diff to diff3 markers via `wiggle'.
 
 ;; Wiggle can be found at http://neil.brown.name/wiggle/ or in your nearest

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-16 22:27   ` Juri Linkov
@ 2018-08-16 22:33     ` Kaushal Modi
  2018-08-16 22:41       ` Juri Linkov
  0 siblings, 1 reply; 12+ messages in thread
From: Kaushal Modi @ 2018-08-16 22:33 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

On Thu, Aug 16, 2018 at 6:28 PM Juri Linkov <juri@linkov.net> wrote:

>
> The patch is on the screenshot :)
>

Bummer ..

But if you want to try, here it is for your convenience:
>

Thanks!

Just a quick note on the first thing that struck me on reading the new
defcustom name.

It should be `diff-font-lock-syntactify'.

syntax -> syntactical (not syntaxical)

On similar lines, may be syntactify (not syntaxify).
-- 

Kaushal Modi

[-- Attachment #2: Type: text/html, Size: 1059 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-16 22:33     ` Kaushal Modi
@ 2018-08-16 22:41       ` Juri Linkov
  0 siblings, 0 replies; 12+ messages in thread
From: Juri Linkov @ 2018-08-16 22:41 UTC (permalink / raw)
  To: Kaushal Modi; +Cc: emacs-devel

> Just a quick note on the first thing that struck me on reading the new
> defcustom name.
>
> It should be `diff-font-lock-syntactify'.
>
> syntax -> syntactical (not syntaxical)
>
> On similar lines, may be syntactify (not syntaxify).

I choose the name syntaxify to be similar to Taxify
(because I had a job in the same office with this company :)

But I agree that syntactify is more correct syntactically.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-16 21:08 Syntactic fontification of diff hunks Juri Linkov
  2018-08-16 21:44 ` Kaushal Modi
@ 2018-08-17  5:41 ` Yuri Khan
  2018-08-17  6:00   ` Andreas Röhler
  1 sibling, 1 reply; 12+ messages in thread
From: Yuri Khan @ 2018-08-17  5:41 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Emacs developers

On Fri, Aug 17, 2018 at 4:41 AM Juri Linkov <juri@linkov.net> wrote:
>
> Most version control sites like gitlab/github highlight syntax
> in code snippets inside diff hunks, for example:
> https://github.com/magit/magit/pull/2834/commits/95cacde4fcccc95c25d6fb9988d2aa097193f8c0
>
> This is very helpful when looking at code changes.  I missed this feature in Emacs
> for a long time.

Oh yesss.

> Then I realized that much simpler would be just to use the same approach
> implemented by diff-mode refinement, i.e. to take each diff hunk one by
> one, and like the diff refinement highlights more fine-grained changes,
> do the same for syntax highlighting according to the language in
> compared files/commits.

I suppose some integration will be needed before this works in Magit, too?

Also, it looks like it’s going to be somewhere between slightly and
horribly inaccurate depending on where the hunk starts (e.g. in the
middle of a string literal or comment)?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17  5:41 ` Yuri Khan
@ 2018-08-17  6:00   ` Andreas Röhler
  2018-08-17  6:47     ` Yuri Khan
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Röhler @ 2018-08-17  6:00 UTC (permalink / raw)
  To: emacs-devel

?
> 
> Also, it looks like it’s going to be somewhere between slightly and
> horribly inaccurate depending on where the hunk starts (e.g. in the
> middle of a string literal or comment)?
> 

Which probably thwarts the whole project. Theoretically if being behind 
the end or start of a string could be fetched from source files - but 
probably not worth the effort. Interesting try anyway, thanks guys.

Andreas



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17  6:00   ` Andreas Röhler
@ 2018-08-17  6:47     ` Yuri Khan
  2018-08-17 17:47       ` Juri Linkov
  2018-08-18 20:02       ` Andreas Röhler
  0 siblings, 2 replies; 12+ messages in thread
From: Yuri Khan @ 2018-08-17  6:47 UTC (permalink / raw)
  To: Andreas Röhler; +Cc: Emacs developers

On Fri, Aug 17, 2018 at 12:57 PM Andreas Röhler
<andreas.roehler@online.de> wrote:

> > Also, it looks like it’s going to be somewhere between slightly and
> > horribly inaccurate depending on where the hunk starts (e.g. in the
> > middle of a string literal or comment)?
>
> Which probably thwarts the whole project.

Not necessarily; I think it might be good enough for a major part of
real-life use cases. Also, if the diff comes from an interactive tool,
the user can probably just increase context size a few times so as to
shift the balance towards “slightly inaccurate”.

> Theoretically if being behind
> the end or start of a string could be fetched from source files -

That condition is undecidable based on the hunk text alone.

An implementation that has access to at least one of the source files
could just fontify that whole file and extract fontification from that
using hunk line offsets (assuming they are accurate). This probably
covers Ediff, vc, and Magit, but not reading standalone patches.

There is also the theoretical issue of syntax being changed by the
patch — e.g. introducing an unbalanced multiline string or comment
opener or closer on a separate line.

     (defun foo ()
    +  "
       (bar baz)
    +  "
       (quux xyzzy))

The middle line here has (punctuation identifier identifier
punctuation) syntax according to the “before” file, but (string)
according to “after”.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17  6:47     ` Yuri Khan
@ 2018-08-17 17:47       ` Juri Linkov
  2018-08-17 18:34         ` Yuri Khan
  2018-08-18 20:02       ` Andreas Röhler
  1 sibling, 1 reply; 12+ messages in thread
From: Juri Linkov @ 2018-08-17 17:47 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Andreas Röhler, Emacs developers

>> > I suppose some integration will be needed before this works in Magit, too?

If Magit doesn't use diff-mode, then yes, the code should be duplicated
in Magit.

>> > Also, it looks like it’s going to be somewhere between slightly and
>> > horribly inaccurate depending on where the hunk starts (e.g. in the
>> > middle of a string literal or comment)?
>>
>> Which probably thwarts the whole project.
>
> Not necessarily; I think it might be good enough for a major part of
> real-life use cases. Also, if the diff comes from an interactive tool,
> the user can probably just increase context size a few times so as to
> shift the balance towards “slightly inaccurate”.

Such as e.g. using --function-context for git will provide enough context.

>> Theoretically if being behind
>> the end or start of a string could be fetched from source files -
>
> That condition is undecidable based on the hunk text alone.
>
> An implementation that has access to at least one of the source files
> could just fontify that whole file and extract fontification from that
> using hunk line offsets (assuming they are accurate). This probably
> covers Ediff, vc, and Magit, but not reading standalone patches.

I see that gitlab and github highlight syntax on server using whole
files, then send highlighted html chunks to browser.  The same way we
can easily get whole files from git by their sha from the index in diff
headers, or get files by relative paths when using diff on regular files.

But when reading standalone patches e.g. in Gnus, then indeed there is
a potential problem, but in practice when I looked at large patches,
there was only small amount of such problematic places, that fortunately
are narrowed to only one diff hunk, unlike failures in font-lock of some
programming language modes (most often I remember such cases in cperl mode)
where an incorrectly recognized quote breaks fontification to the end of
the whole file.

> There is also the theoretical issue of syntax being changed by the
> patch — e.g. introducing an unbalanced multiline string or comment
> opener or closer on a separate line.
>
>      (defun foo ()
>     +  "
>        (bar baz)
>     +  "
>        (quux xyzzy))
>
> The middle line here has (punctuation identifier identifier
> punctuation) syntax according to the “before” file, but (string)
> according to “after”.

I think “after” should have priority over “before” in context because the
main goal of reading patches is to see how code will look after changes,
so in this case ‘(bar baz)’ should be highlighted as a string.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17 17:47       ` Juri Linkov
@ 2018-08-17 18:34         ` Yuri Khan
  2018-08-19 20:57           ` Juri Linkov
  0 siblings, 1 reply; 12+ messages in thread
From: Yuri Khan @ 2018-08-17 18:34 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Andreas Röhler, Emacs developers

On Sat, Aug 18, 2018 at 12:50 AM Juri Linkov <juri@linkov.net> wrote:

> I think “after” should have priority over “before” in context because the
> main goal of reading patches is to see how code will look after changes,
> so in this case ‘(bar baz)’ should be highlighted as a string.

Yes, that is sensible.

I see another potential issue. In your patch above, the whole hunk is
fontified as a whole, with all its context lines, deleted lines, and
added lines. A change on the line that opens a multiline string will
disrupt syntax until the end of hunk:

     (def foo ()
    -  "Lorem ipsum
    +  "Cthulhu fhtagn
       dolor sit amet")

A more robust approach would be to fontify separately the “after”
version by taking context + added lines and “before” by taking context
+ deleted lines. Then use fontification from “before” for deleted
lines, and from “after” for context and added lines.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17  6:47     ` Yuri Khan
  2018-08-17 17:47       ` Juri Linkov
@ 2018-08-18 20:02       ` Andreas Röhler
  1 sibling, 0 replies; 12+ messages in thread
From: Andreas Röhler @ 2018-08-18 20:02 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Emacs Devel



On 17.08.2018 08:47, Yuri Khan wrote:
> There is also the theoretical issue of syntax being changed by the
> patch — e.g. introducing an unbalanced multiline string or comment
> opener or closer on a separate line.
> 
>       (defun foo ()
>      +  "
>         (bar baz)
>      +  "
>         (quux xyzzy))
> 
> The middle line here has (punctuation identifier identifier
> punctuation) syntax according to the “before” file, but (string)
> according to “after”.

May go further saying: if a such a hunk contains valid syntax is not 
decidable. Because, even if it looks like, the impression might be wrong 
by missing syntactic context.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Syntactic fontification of diff hunks
  2018-08-17 18:34         ` Yuri Khan
@ 2018-08-19 20:57           ` Juri Linkov
  0 siblings, 0 replies; 12+ messages in thread
From: Juri Linkov @ 2018-08-19 20:57 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Andreas Röhler, Emacs developers

> I see another potential issue. In your patch above, the whole hunk is
> fontified as a whole, with all its context lines, deleted lines, and
> added lines. A change on the line that opens a multiline string will
> disrupt syntax until the end of hunk:
>
>      (def foo ()
>     -  "Lorem ipsum
>     +  "Cthulhu fhtagn
>        dolor sit amet")
>
> A more robust approach would be to fontify separately the “after”
> version by taking context + added lines and “before” by taking context
> + deleted lines. Then use fontification from “before” for deleted
> lines, and from “after” for context and added lines.

Yes, this is easy to do.  Also there is one significant difference
between diff refinement overlays and diff syntax overlays: while for
multiline refinement it's probably ok to put a refined overlay over the
diff indicator symbols (such as + or - on at the beginning of the line)
when refinement is highlighted using background colors and indicator with
foreground color, but for multiline syntax elements (e.g. multiline
strings) their foreground colors will overwrite foreground colors of
diff indicators.  A solution would be to split multiline syntax overlays
and put them separately on each diff line.

But the real problem is with git diffs.  I tried to reuse the same code
that visits a diff hunk in the corresponding file (this code could be used
to visit the file, fontify it and copy fontification back to the diff buffer),
but bumped into a deficiency:

1. try to open some git log with vc-print-log in a *vc-change-log* buffer;
2. open a diff using ‘d’ (log-view-diff) or ‘D’ (log-view-diff-changeset);
3. try to navigate to the corresponding place in the file with ‘C-c C-c’
   (diff-goto-source)

It visits the current file version and most of the time errs with
“Hunk text not found”.

It seems the most sensible would be to checkout the corresponding file
revision like does ‘f’ (log-view-find-revision) from *vc-change-log*
(BTW, I wonder why it creates a file with the name “filename~commit-sha~”
instead of “filename~treefile-sha~”), but unlike ‘f’ better not to save it,
but just to create a file buffer and navigate to the changed place.

Another question: shouldn't diff-goto-source be context-sensitive?
Currently it visits the old file when invoked with the prefix arg.
Maybe better also allow it to visit the old file when invoked
on old lines beginning with ‘-’ or ‘<’?



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-19 20:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-16 21:08 Syntactic fontification of diff hunks Juri Linkov
2018-08-16 21:44 ` Kaushal Modi
2018-08-16 22:27   ` Juri Linkov
2018-08-16 22:33     ` Kaushal Modi
2018-08-16 22:41       ` Juri Linkov
2018-08-17  5:41 ` Yuri Khan
2018-08-17  6:00   ` Andreas Röhler
2018-08-17  6:47     ` Yuri Khan
2018-08-17 17:47       ` Juri Linkov
2018-08-17 18:34         ` Yuri Khan
2018-08-19 20:57           ` Juri Linkov
2018-08-18 20:02       ` Andreas Röhler

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).