unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Yuan Fu <casouri@gmail.com>
To: Troy Brown <brownts@troybrown.dev>
Cc: Eli Zaretskii <eliz@gnu.org>, 64329@debbugs.gnu.org
Subject: bug#64329: 29.0.92; treesit/fill-paragraph syntax highlighting problem
Date: Wed, 28 Jun 2023 17:17:14 -0700	[thread overview]
Message-ID: <14493480-9226-4AB1-A313-E4E2D61AB0BD@gmail.com> (raw)
In-Reply-To: <3031A934-37EC-497D-8A48-ECE7FD703B31@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2943 bytes --]



> On Jun 28, 2023, at 2:23 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
> 
> 
>> On Jun 28, 2023, at 9:46 AM, Troy Brown <brownts@troybrown.dev> wrote:
>> 
>> I've noticed this problem on multiple tree-sitter major modes including
>> c-ts-mode, c++-ts-mode, java-ts-mode, bash-ts-mode.  I haven't tried
>> others, but I suspect those might also suffer from this problem.
>> 
>> The issue occurs when attempting to fill the paragraph of a comment
>> block.  The following comment block can be used as an example to
>> reproduce the problem and happens with "emacs -Q" (assuming
>> corresponding tree-sitter libraries are available).
>> 
>> --8<---------------cut here---------------start------------->8---
>> // The quick brown fox jumps over the
>> // lazy dog.
>> // The quick brown fox jumps over the lazy dog.
>> --8<---------------cut here---------------end--------------->8---
>> 
>> Switch to one of the tree-sitter modes (e.g., M-x java-ts-mode).  Move
>> point to the first line of the comment block above and then execute the
>> fill-paragraph command (i.e., M-q).
>> 
>> The text which is wrapped onto the first line of the comment block will
>> be highlighted incorrectly.  The results appear as if the comment
>> delimiter was removed, fontification occurred, then the text was moved
>> to the first line of the comment block and never refontified with the
>> comment face.
> 
> Thank you very much! It’s funny that how long this went under the radar, presumably because we always use block comment.
> 
> The culprit is the subst-char-in-region function used by the filling function. It has a branch:
> 
> if (xxx)
>  {
> replace_range (pos, pos + 1, string, ...);
>  }
> else
>  {
> for (i = 0; i < len; i++) *p++ = tostr[i];
>  }
> 
> I overlooked the else branch and thought subst-char-in-region always calls replace_range. replace_range notifies tree-sitter of the change it makes; but when subst-char-in-region manually replaces the text in the else branch, those edits are not notified to tree-sitter.

Prompted by this, I went over all the functions that calls signal_after_change again, and found two other editfns.c functions that are missing calls to treesit_record_change. Please see the attached patches that follows the previous one. Sorry for the overlook. I believe I’ve found all places that needs to call treesit_record_change now.

> Please see the attached patch. Eli, is it more preferable to add a subroutine in insdel.c that does what "for (i = 0; i < len; i++) *p++ = tostr[I];” does, plus calling treesit_record_change, and make subst-char-in-region call that subroutine? (This way editfns.c don’t need to include treesit.h and call treesit_record_change itself.)

Since now there are three functions in editfns.c that needs to call treesit_record_change, we might as well just include treesit.h and call treesit_record_change directly.

Yuan


[-- Attachment #2: add-two-other-missing-calls.patch --]
[-- Type: application/octet-stream, Size: 11162 bytes --]

From a09bfd309ffbb4e33df3c1a850193dc20b93af1c Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 28 Jun 2023 17:03:19 -0700
Subject: [PATCH 1/2] Add missing calls to treesit_record_change in editfns.c

These should be all that are missing.  See the next commit for detail.

* src/editfns.c (Ftranslate_region_internal):
(Ftranspose_regions): Call treesit_record_change.
---
 src/editfns.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/editfns.c b/src/editfns.c
index 0cbeefb3262..a1e48daf6c6 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -2603,6 +2603,15 @@ DEFUN ("translate-region-internal", Ftranslate_region_internal,
 		    *p++ = *str++;
 		  signal_after_change (pos, 1, 1);
 		  update_compositions (pos, pos + 1, CHECK_BORDER);
+
+#ifdef HAVE_TREE_SITTER
+		  /* In the previous branch, replace_range() notifies
+                     changes to tree-sitter, but in this branch, we
+                     modified buffer content manually, so we need to
+                     notify tree-sitter manually.  */
+		  treesit_record_change (pos_byte, pos_byte + len,
+					 pos_byte + len);
+#endif
 		}
 	      characters_changed++;
 	    }
@@ -4776,6 +4785,13 @@ DEFUN ("transpose-regions", Ftranspose_regions, Stranspose_regions, 4, 5,
       adjust_markers_bytepos (start1, start1_byte, end2, end2_byte, 0);
     }
 
+#ifdef HAVE_TREE_SITTER
+  /* I don't think it's common to transpose two far-apart regions, so
+     amalgamating the edit into one should be fine.  This is what the
+     signal_after_change below does, too.  */
+  treesit_record_change (start1_byte, end2_byte, end2_byte);
+#endif
+
   signal_after_change (start1, end2 - start1, end2 - start1);
   return Qnil;
 }
-- 
2.33.1


From 94758fb0e11ce4dd763cd50990828879f9c41b19 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 28 Jun 2023 17:05:29 -0700
Subject: [PATCH 2/2] ; * admin/notes/tree-sitter/treesit_record_change:
 Update.

---
 admin/notes/tree-sitter/treesit_record_change | 180 +++++++++++++++++-
 1 file changed, 174 insertions(+), 6 deletions(-)

diff --git a/admin/notes/tree-sitter/treesit_record_change b/admin/notes/tree-sitter/treesit_record_change
index 0dc6491e2d1..e80df4adfa7 100644
--- a/admin/notes/tree-sitter/treesit_record_change
+++ b/admin/notes/tree-sitter/treesit_record_change
@@ -3,10 +3,10 @@ NOTES ON TREESIT_RECORD_CHANGE
 It is vital that Emacs informs tree-sitter of every change made to the
 buffer, lest tree-sitter's parse tree would be corrupted/out of sync.
 
-All buffer changes in Emacs are made through functions in insdel.c
-(and casefiddle.c), I augmented functions in those files with calls to
-treesit_record_change.  Below is a manifest of all the relevant
-functions in insdel.c as of Emacs 29:
+Almost all buffer changes in Emacs are made through functions in
+insdel.c (see below for exceptions), I augmented functions in insdel.c
+with calls to treesit_record_change.  Below is a manifest of all the
+relevant functions in insdel.c as of Emacs 29:
 
 Function                          Calls
 ----------------------------------------------------------------------
@@ -43,8 +43,176 @@ insert_from_buffer but not insert_from_buffer_1.  I also left a
 reminder comment.
 
 
-As for casefiddle.c, do_casify_unibyte_region and
+EXCEPTIONS
+
+
+There are a couple of functions that replaces characters in-place
+rather than insert/delete. They are in casefiddle.c and editfns.c.
+
+In casefiddle.c, do_casify_unibyte_region and
 do_casify_multibyte_region modifies buffer, but they are static
 functions and are called by casify_region, which calls
 treesit_record_change.  Other higher-level functions calls
-casify_region to do the work.
\ No newline at end of file
+casify_region to do the work.
+
+In editfns.c, subst-char-in-region and translate-region-internal might
+replace characters in-place, I made them to call
+treesit_record_change.  transpose-regions uses memcpy to move text
+around, it calls treesit_record_change too.
+
+I found these exceptions by grepping for signal_after_change and
+checking each caller manually.  Below is all the result as of Emacs 29
+and some comment for each one.  Readers can use
+
+(highlight-regexp "^[^[:space:]]+?\\.c:[[:digit:]]+:[^z-a]+?$" 'highlight)
+
+to make things easier to read.
+
+grep [...] --color=auto -i --directories=skip -nH --null -e signal_after_change *.c
+
+callproc.c:789:             calling prepare_to_modify_buffer and signal_after_change.
+callproc.c:793:             is one call to signal_after_change in each of the
+callproc.c:800:             signal_after_change hasn't.  A continue statement
+callproc.c:804:             again, and this time signal_after_change gets called,
+
+Not code.
+
+callproc.c:820:              signal_after_change (PT - nread, 0, nread);
+callproc.c:863:              signal_after_change (PT - process_coding.produced_char,
+
+Both are called in call-process.  I don’t think we’ll ever use
+tree-sitter in call-process’s stdio buffer, right?  I didn’t check
+line-by-line, but it seems to only use insert_1_both and del_range_2.
+
+casefiddle.c:558:      signal_after_change (start, end - start - added, end - start);
+
+Called in casify-region, calls treesit_record_change.
+
+decompress.c:195:      signal_after_change (data->orig, data->start - data->orig,
+
+Called in unwind_decompress, uses del_range_2, insdel function.
+
+decompress.c:334:  signal_after_change (istart, iend - istart, unwind_data.nbytes);
+
+Called in zlib-decompress-region, uses del_range_2, insdel function.
+
+editfns.c:2139:      signal_after_change (BEGV, size_a, ZV - BEGV);
+
+Called in replace-buffer-contents, which calls del_range and
+Finsert_buffer_substring, both are ok.
+
+editfns.c:2416:      signal_after_change (changed,
+
+Called in subst-char-in-region, which either calls replace_range (a
+insdel function) or modifies buffer content by itself (need to call
+treesit_record_change).
+
+editfns.c:2544:	      /* Reload as signal_after_change in last iteration may GC.  */
+
+Not code.
+
+editfns.c:2604:		  signal_after_change (pos, 1, 1);
+
+Called in translate-region-internal, which has three cases:
+
+if (nc != oc && nc >= 0) {
+  if (len != str_len) {
+	replace_range()
+  } else {
+	while (str_len-- > 0)
+	  *p++ = *str++;
+  }
+}
+else if (nc < 0) {
+  replace_range()
+}
+
+replace_range is ok, but in the case where it manually modifies buffer
+content, it needs to call treesit_record_change.
+
+editfns.c:4779:  signal_after_change (start1, end2 - start1, end2 - start1);
+
+Called in transpose-regions.  It just uses memcpy’s and doesn’t use
+insdel functions; needs to call treesit_record_change.
+
+fileio.c:4825:      signal_after_change (PT, 0, inserted);
+
+Called in insert_file_contents.  Uses insert_1_both (very first in the
+function); del_range_1 and del_range_byte (the optimized way to
+implement replace when decoding isn’t needed); del_range_byte and
+insert_from_buffer (the optimized way used when decoding is needed);
+decode_coding_gap or insert_from_gap_1 (I’m not sure the condition for
+this, but anyway it’s safe).  The function also calls memcpy and
+memmove, but they are irrelevant: memcpy is used for decoding, and
+memmove is moving stuff inside the gap for decode_coding_gap.
+
+I’d love someone to verify this function, since it’s so complicated
+and large, but from what I can tell it’s safe.
+
+fns.c:3998:  signal_after_change (XFIXNAT (beg), 0, inserted_chars);
+
+Called in base64-decode-region, uses insert_1_both and del_range_both,
+safe.
+
+insdel.c:681:      signal_after_change (opoint, 0, len);
+insdel.c:696:      signal_after_change (opoint, 0, len);
+insdel.c:741:      signal_after_change (opoint, 0, len);
+insdel.c:757:      signal_after_change (opoint, 0, len);
+insdel.c:976:  signal_after_change (opoint, 0, PT - opoint);
+insdel.c:996:  signal_after_change (opoint, 0, PT - opoint);
+insdel.c:1187:  signal_after_change (opoint, 0, PT - opoint);
+insdel.c:1412:   signal_after_change.  */
+insdel.c:1585:      signal_after_change (from, nchars_del, GPT - from);
+insdel.c:1600:   prepare_to_modify_buffer and never call signal_after_change.
+insdel.c:1603:   region once.  Apart from signal_after_change, any caller of this
+insdel.c:1747:  signal_after_change (from, to - from, 0);
+insdel.c:1789:  signal_after_change (from, to - from, 0);
+insdel.c:1833:  signal_after_change (from, to - from, 0);
+insdel.c:2223:signal_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
+insdel.c:2396:  signal_after_change (begpos, endpos - begpos - change, endpos - begpos);
+
+I’ve checked all insdel functions.  We can assume insdel functions are
+all safe.
+
+json.c:790:  signal_after_change (PT, 0, inserted);
+
+Called in json-insert, calls either decode_coding_gap or
+insert_from_gap_1, both are safe. Calls memmove but it’s for
+decode_coding_gap.
+
+keymap.c:2873:	    /* Insert calls signal_after_change which may GC.  */
+
+Not code.
+
+print.c:219:      signal_after_change (PT - print_buffer.pos, 0, print_buffer.pos);
+
+Called in print_finish, calls copy_text and insert_1_both, safe.
+
+process.c:6365:	 process buffer is changed in the signal_after_change above.
+search.c:2763:     (see signal_before_change and signal_after_change).  Try to error
+
+Not code.
+
+search.c:2777:  signal_after_change (sub_start, sub_end - sub_start, SCHARS (newtext));
+
+Called in replace_match.  Calls replace_range, upcase-region,
+upcase-initials-region (both calls casify_region in the end), safe.
+Calls memcpy but it’s for string manipulation.
+
+textprop.c:1261:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1272:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1283:	    signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1458:    signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1652:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1661:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1672:	    signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1750:     before changes are made and signal_after_change when we are done.
+textprop.c:1752:     and call signal_after_change before returning if MODIFIED. */
+textprop.c:1764:		    signal_after_change (XFIXNUM (start),
+textprop.c:1778:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1791:		signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
+textprop.c:1810:                signal_after_change (XFIXNUM (start),
+
+We don’t care about text property changes.
+
+Grep finished with 51 matches found at Wed Jun 28 15:12:23
-- 
2.33.1


[-- Attachment #3: Type: text/plain, Size: 2 bytes --]




  reply	other threads:[~2023-06-29  0:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28 16:46 bug#64329: 29.0.92; treesit/fill-paragraph syntax highlighting problem Troy Brown
2023-06-28 21:23 ` Yuan Fu
2023-06-29  0:17   ` Yuan Fu [this message]
2023-06-29  5:22     ` Eli Zaretskii
2023-06-29 18:17       ` Yuan Fu
2023-06-29  5:10   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14493480-9226-4AB1-A313-E4E2D61AB0BD@gmail.com \
    --to=casouri@gmail.com \
    --cc=64329@debbugs.gnu.org \
    --cc=brownts@troybrown.dev \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).