* bug#61369: Problem with keeping tree-sitter parse tree up-to-date @ 2023-02-08 15:34 Dmitry Gutov 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Dmitry Gutov @ 2023-02-08 15:34 UTC (permalink / raw) To: 61369 1. Have some buffer with text "use std::Path::{self, Path, PathBuf}; // good: std is a crate name ... (maybe something else " 2. Have this text in a different buffer (I used scratch): " let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc); " Meaning, the buffer starts with two empty lines. 3. Select the first line from the first buffer including the trailing newline, yank and then paste at the beginning of the second buffer. The second buffer will now look like this: "use std::Path::{self, Path, PathBuf}; // good: std is a crate name let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc); " The parse tree will, however, be (according to treesit-explore-mode): (source_file (use_declaration use argument: (scoped_use_list path: (scoped_identifier path: (identifier) :: name: (identifier)) :: list: (use_list { (self) , (identifier) , (identifier) })) ;) (line_comment) (let_declaration let pattern: (identifier) = value: (scoped_identifier path: (generic_type type: (type_identifier) :: type_arguments: (type_arguments < (scoped_type_identifier path: (identifier) :: name: (type_identifier)) >)) :: name: (identifier)) (ERROR ( (identifier) , (scoped_identifier path: (identifier) :: name: (identifier)) ;) ;)) But if I edit the buffer after that, e.g. add and remove a space at the beginning, the error node disappears: (source_file (use_declaration use argument: (scoped_use_list path: (scoped_identifier path: (identifier) :: name: (identifier)) :: list: (use_list { (self) , (identifier) , (identifier) })) ;) (line_comment) (let_declaration let pattern: (identifier) = value: (call_expression function: (scoped_identifier path: (generic_type type: (type_identifier) :: type_arguments: (type_arguments < (scoped_type_identifier path: (identifier) :: name: (type_identifier)) >)) :: name: (identifier)) arguments: (arguments ( (identifier) , (scoped_identifier path: (identifier) :: name: (identifier)) ))) ;)) ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-08 15:34 bug#61369: Problem with keeping tree-sitter parse tree up-to-date Dmitry Gutov @ 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-02-08 19:41 ` Dmitry Gutov 2023-02-10 1:22 ` Yuan Fu ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-08 18:20 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 61369 Dmitry Gutov <dgutov@yandex.ru> writes: > 1. Have some buffer with text > > "use std::Path::{self, Path, PathBuf}; // good: std is a crate name > ... (maybe something else > " > > 2. Have this text in a different buffer (I used scratch): > > " > > > let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc); > " > > Meaning, the buffer starts with two empty lines. > > 3. Select the first line from the first buffer including the trailing newline, > yank and then paste at the beginning of the second buffer. > > The second buffer will now look like this: > > "use std::Path::{self, Path, PathBuf}; // good: std is a crate name > > > let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc); > " I just want to confirm that I can reproduce this, and that if you skip the trailing newline from the use-statement, I don't get this behavior. So it seems like the newline is the crucial point, right? Theo ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-08 19:41 ` Dmitry Gutov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Gutov @ 2023-02-08 19:41 UTC (permalink / raw) To: Theodor Thornhill; +Cc: 61369@debbugs.gnu.org [-- Attachment #1: Type: text/html, Size: 1628 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-08 15:34 bug#61369: Problem with keeping tree-sitter parse tree up-to-date Dmitry Gutov 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-10 1:22 ` Yuan Fu 2023-02-10 1:38 ` Dmitry Gutov 2023-02-13 9:10 ` Yuan Fu 2023-02-13 23:59 ` Yuan Fu 3 siblings, 1 reply; 16+ messages in thread From: Yuan Fu @ 2023-02-10 1:22 UTC (permalink / raw) To: Dmitry Gutov; +Cc: theo, 61369 > I just want to confirm that I can reproduce this, and that if you skip > the trailing newline from the use-statement, I don't get this behavior. > So it seems like the newline is the crucial point, right? > > Yes, same. > > Thr trailing newline is necessary. > > The empty lines at the beginning of the buffer (being copied to) are necessary to reproduce this as well. Hmmm, it might be related to how does tree-sitter does incremental parsing? If the newline is necessary, then I guess it’s not because Emacs missed characters when reporting edits to tree-sitter. Yuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-10 1:22 ` Yuan Fu @ 2023-02-10 1:38 ` Dmitry Gutov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Gutov @ 2023-02-10 1:38 UTC (permalink / raw) To: Yuan Fu; +Cc: theo, 61369 On 10/02/2023 03:22, Yuan Fu wrote: >> I just want to confirm that I can reproduce this, and that if you skip >> the trailing newline from the use-statement, I don't get this behavior. >> So it seems like the newline is the crucial point, right? >> >> Yes, same. >> >> Thr trailing newline is necessary. >> >> The empty lines at the beginning of the buffer (being copied to) are necessary to reproduce this as well. > Hmmm, it might be related to how does tree-sitter does incremental > parsing? If the newline is necessary, then I guess it’s not because > Emacs missed characters when reporting edits to tree-sitter. The newline is somewhat necessary: the scenario doesn't work, for example, if the pasted text doesn't include the newline but the buffer had an additional (third) one at the top. But the scenario also doesn't work if some other (any) character is removed from the yanked line before pasting: it could be even one after the comment instruction (//). OTOH, if I add an extra char to the yanked line, anywhere, I can skip the newline. E.g. I can paste use std::path::{self, Path, PathBuf}; // good: std is a crate namee without a newline and still see the exact same syntax error. So it looks more like an off-by-one error somewhere. Maybe in our code, but maybe in tree-sitter somewhere. ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-08 15:34 bug#61369: Problem with keeping tree-sitter parse tree up-to-date Dmitry Gutov 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-02-10 1:22 ` Yuan Fu @ 2023-02-13 9:10 ` Yuan Fu 2023-02-13 23:59 ` Yuan Fu 3 siblings, 0 replies; 16+ messages in thread From: Yuan Fu @ 2023-02-13 9:10 UTC (permalink / raw) To: Dmitry Gutov; +Cc: theo, 61369 Dmitry Gutov <dgutov@yandex.ru> writes: > On 10/02/2023 03:22, Yuan Fu wrote: >>> I just want to confirm that I can reproduce this, and that if you skip >>> the trailing newline from the use-statement, I don't get this behavior. >>> So it seems like the newline is the crucial point, right? >>> >>> Yes, same. >>> >>> Thr trailing newline is necessary. >>> >>> The empty lines at the beginning of the buffer (being copied to) are necessary to reproduce this as well. >> Hmmm, it might be related to how does tree-sitter does incremental >> parsing? If the newline is necessary, then I guess it’s not because >> Emacs missed characters when reporting edits to tree-sitter. > > The newline is somewhat necessary: the scenario doesn't work, for > example, if the pasted text doesn't include the newline but the buffer > had an additional (third) one at the top. > > But the scenario also doesn't work if some other (any) character is > removed from the yanked line before pasting: it could be even one > after the comment instruction (//). > > OTOH, if I add an extra char to the yanked line, anywhere, I can skip > the newline. E.g. I can paste > > use std::path::{self, Path, PathBuf}; // good: std is a crate namee > > without a newline and still see the exact same syntax error. > > So it looks more like an off-by-one error somewhere. Maybe in our > code, but maybe in tree-sitter somewhere. Some progress report: I added a function that reads the buffer like a parser would, like this: DEFUN ("treesit--parser-view", Ftreesit__parser_view, Streesit__parser_view, 1, 1, 0, doc: /* Return the view of PARSER. Read buffer like PARSER would into a string and return it. */) (Lisp_Object parser) { const ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg; const ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end; const ptrdiff_t view_len = visible_end - visible_beg; char *str_buf = xzalloc (view_len + 1); uint32_t read = 0; TSPoint pos = { 0 }; for (int idx = 0; idx < view_len; idx++) { const char *ch = treesit_read_buffer (XTS_PARSER (parser), idx, pos, &read); if (read == 0) { xfree (str_buf); xsignal1 (Qtreesit_error, make_fixnum (idx)); } else str_buf[idx] = *ch; } Lisp_Object ret_str = make_string (str_buf, view_len); xfree (str_buf); return ret_str; } After I follow the steps and got the error node, I run this function on the parser, and the returned string looks good. Next I’ll try to log every character actually read by the parser and see if anything seems fishy. Yuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-08 15:34 bug#61369: Problem with keeping tree-sitter parse tree up-to-date Dmitry Gutov ` (2 preceding siblings ...) 2023-02-13 9:10 ` Yuan Fu @ 2023-02-13 23:59 ` Yuan Fu 2023-02-15 2:17 ` Dmitry Gutov 3 siblings, 1 reply; 16+ messages in thread From: Yuan Fu @ 2023-02-13 23:59 UTC (permalink / raw) To: Dmitry Gutov; +Cc: theo, 61369 [-- Attachment #1: Type: text/plain, Size: 5401 bytes --] Yuan Fu <casouri@gmail.com> writes: > Dmitry Gutov <dgutov@yandex.ru> writes: > >> On 10/02/2023 03:22, Yuan Fu wrote: >>>> I just want to confirm that I can reproduce this, and that if you skip >>>> the trailing newline from the use-statement, I don't get this behavior. >>>> So it seems like the newline is the crucial point, right? >>>> >>>> Yes, same. >>>> >>>> Thr trailing newline is necessary. >>>> >>>> The empty lines at the beginning of the buffer (being copied to) are necessary to reproduce this as well. >>> Hmmm, it might be related to how does tree-sitter does incremental >>> parsing? If the newline is necessary, then I guess it’s not because >>> Emacs missed characters when reporting edits to tree-sitter. >> >> The newline is somewhat necessary: the scenario doesn't work, for >> example, if the pasted text doesn't include the newline but the buffer >> had an additional (third) one at the top. >> >> But the scenario also doesn't work if some other (any) character is >> removed from the yanked line before pasting: it could be even one >> after the comment instruction (//). >> >> OTOH, if I add an extra char to the yanked line, anywhere, I can skip >> the newline. E.g. I can paste >> >> use std::path::{self, Path, PathBuf}; // good: std is a crate namee >> >> without a newline and still see the exact same syntax error. >> >> So it looks more like an off-by-one error somewhere. Maybe in our >> code, but maybe in tree-sitter somewhere. > > Some progress report: I added a function that reads the buffer like a > parser would, like this: > > DEFUN ("treesit--parser-view", > Ftreesit__parser_view, > Streesit__parser_view, 1, 1, 0, > doc: /* Return the view of PARSER. > Read buffer like PARSER would into a string and return it. */) > (Lisp_Object parser) > { > const ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg; > const ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end; > const ptrdiff_t view_len = visible_end - visible_beg; > > char *str_buf = xzalloc (view_len + 1); > uint32_t read = 0; > TSPoint pos = { 0 }; > for (int idx = 0; idx < view_len; idx++) > { > const char *ch = treesit_read_buffer (XTS_PARSER (parser), > idx, pos, &read); > if (read == 0) > { > xfree (str_buf); > xsignal1 (Qtreesit_error, make_fixnum (idx)); > } > else > str_buf[idx] = *ch; > } > Lisp_Object ret_str = make_string (str_buf, view_len); > xfree (str_buf); > return ret_str; > } > > After I follow the steps and got the error node, I run this function on > the parser, and the returned string looks good. > > Next I’ll try to log every character actually read by the parser and see > if anything seems fishy. I don’t know if it’s good news or bad news, but it doesn’t seem like a off-by-one. Here is what I did: 1. I applied the attached patch (patch.diff) so that treesit_read_buffer, the function used by tree-sitter parser to read buffer contents, prints the position it read and the character it gets to stdout. 2. I open test.rs which contains " let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc); " as in the recipe. I have rust-ts-mode enabled, so Emacs prints the characters read by the parser to stdout. I type return several times to separate this first batch of output from the next, which is what I’m interested in. 3. I paste "use std::Path::{self, Path, PathBuf}; // good: std is a crate name " at the beginning of the buffer. Now the parse tree contains that error node. I go to the terminal, copy the output out, which looks like: 0 117 1 115 2 101 3 32 0 117 1 115 2 101 ... 133 59 134 10 134 10 134 10 134 10 4. I paste this output (output.txt) into a buffer, and reconstruct the text read by the parser with (setq str (reconstruct)), where reconstruct is: (defun reconstruct () (goto-char (point-min)) (let ((result "")) (while (< (point) (point-max)) (let* ((str (buffer-substring (point) (line-end-position))) (nums (string-split str)) (pos (string-to-number (car nums))) (char (string-to-number (cadr nums)))) (when (not (< pos (length result))) (setq result (concat result (make-string (- (1+ pos) (length result)) ?0)))) (setf (aref result pos) char)) (forward-line 1)) result)) 5. I insert str into a new buffer, and (to my disappointment) the content is identical to the buffer text. There are two surprises here: 1) there isn’t an off-by-one bug, 2) the parser actually read the whole buffer, rather than reading only the new content. Then there are even less reason for it to create that error node. In addition, I inserted a new line in the Rust source buffer (test.rs) (which fixes the error node), here is what the parser read after that insertion: "0000000000000000000000000000000000000000000000000000000000000000000 let 0000 = 000000000000000000000000000000000000000000000000000);" 0 means it didn’t read that position, we can see that the parser read all the newlines, "let ", " = ", and ");". I can’t discern anything interesting from that, tho. Yuan [-- Attachment #2: output.txt --] [-- Type: text/plain, Size: 1375 bytes --] 0 117 1 115 2 101 3 32 0 117 1 115 2 101 3 32 4 115 3 32 4 115 5 116 6 100 7 58 4 115 5 116 6 100 7 58 8 58 9 80 10 97 11 116 12 104 13 58 9 80 13 58 14 58 15 123 16 115 17 101 18 108 19 102 20 44 16 115 17 101 18 108 19 102 20 44 21 32 22 80 21 32 22 80 23 97 24 116 25 104 26 44 22 80 26 44 27 32 28 80 27 32 28 80 29 97 30 116 31 104 32 66 33 117 34 102 35 125 28 80 35 125 36 59 37 32 38 32 39 47 40 47 37 32 38 32 39 47 40 47 41 32 42 103 43 111 44 111 45 100 46 58 47 32 48 115 49 116 50 100 51 32 52 105 53 115 54 32 55 97 56 32 57 99 58 114 59 97 60 116 61 101 62 32 63 110 64 97 65 109 66 101 67 10 68 10 69 10 70 108 67 10 68 10 69 10 70 108 71 101 72 116 73 32 70 108 71 101 72 116 73 32 74 100 73 32 74 100 75 97 76 116 77 101 78 32 74 100 75 97 78 32 79 61 78 32 79 61 80 32 81 68 80 32 81 68 82 97 83 116 84 101 85 84 86 105 87 109 88 101 89 58 81 68 89 58 90 58 91 60 92 99 93 104 94 114 95 111 96 110 97 111 98 58 92 99 93 104 94 114 98 58 99 58 100 85 101 116 102 99 103 62 100 85 103 62 104 58 105 58 106 102 107 114 108 111 109 109 110 95 111 117 112 116 113 99 114 40 106 102 107 114 114 40 115 100 116 97 117 116 118 101 119 44 115 100 116 97 119 44 120 32 121 99 120 32 121 99 122 104 123 114 124 111 125 110 126 111 127 58 121 99 122 104 123 114 127 58 128 58 129 85 130 116 131 99 132 41 129 85 133 59 134 10 132 41 133 59 134 10 134 10 134 10 134 10 [-- Attachment #3: patch.diff --] [-- Type: application/octet-stream, Size: 1767 bytes --] diff --git a/src/treesit.c b/src/treesit.c index cab2f0d5354..ad87a6ae759 100644 --- a/src/treesit.c +++ b/src/treesit.c @@ -1101,6 +1101,13 @@ treesit_read_buffer (void *parser, uint32_t byte_index, assertion should never hit. */ eassert (len < UINT32_MAX); *bytes_read = (uint32_t) len; + + if (*bytes_read > 0) + { + printf ("%d %d\n", byte_index, *beg); + fflush (stdout); + } + return beg; } @@ -3432,6 +3439,37 @@ DEFUN ("treesit-subtree-stat", } } +DEFUN ("treesit--parser-view", + Ftreesit__parser_view, + Streesit__parser_view, 1, 1, 0, + doc: /* Return the view of PARSER. +Read buffer like PARSER would into a string and return it. */) + (Lisp_Object parser) +{ + const ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg; + const ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end; + const ptrdiff_t view_len = visible_end - visible_beg; + + char *str_buf = xzalloc (view_len + 1); + uint32_t read = 0; + TSPoint pos = { 0 }; + for (int idx = 0; idx < view_len; idx++) + { + const char *ch = treesit_read_buffer (XTS_PARSER (parser), + idx, pos, &read); + if (read == 0) + { + xfree (str_buf); + xsignal1 (Qtreesit_error, make_fixnum (idx)); + } + else + str_buf[idx] = *ch; + } + Lisp_Object ret_str = make_string (str_buf, view_len); + xfree (str_buf); + return ret_str; +} + #endif /* HAVE_TREE_SITTER */ DEFUN ("treesit-available-p", Ftreesit_available_p, @@ -3633,6 +3671,8 @@ syms_of_treesit (void) defsubr (&Streesit_search_forward); defsubr (&Streesit_induce_sparse_tree); defsubr (&Streesit_subtree_stat); + + defsubr (&Streesit__parser_view); #endif /* HAVE_TREE_SITTER */ defsubr (&Streesit_available_p); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-13 23:59 ` Yuan Fu @ 2023-02-15 2:17 ` Dmitry Gutov 2023-02-15 22:44 ` Dmitry Gutov 0 siblings, 1 reply; 16+ messages in thread From: Dmitry Gutov @ 2023-02-15 2:17 UTC (permalink / raw) To: Yuan Fu; +Cc: theo, 61369 On 14/02/2023 01:59, Yuan Fu wrote: > There are two surprises here: 1) there isn’t an off-by-one bug, 2) the > parser actually read the whole buffer, rather than reading only the new > content. Then there are even less reason for it to create that error > node. The parser reads the whole buffer, but if it tries to reparse based on the previous parse tree with incorrect positions, it might get into an invalid state as a result. I've tried gdb-ing treesit_tree_edit_1 (after dropping the 'inline' qualifier), and here's what I see: - If I paste the test line without the trailing newline or not, the value. - If I paste the test line with the trailing newline, the value of new_end_byte is still 67. But then it is followed by this call right away: Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 (tree=tree@entry=0x5555574139b0, start_byte=start_byte@entry=134, old_end_byte=old_end_byte@entry=134, new_end_byte=135) at treesit.c:739 - If I 'undo' after that, the call is as expected: Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 (tree=0x555557435cd0, start_byte=start_byte@entry=0, old_end_byte=old_end_byte@entry=68, new_end_byte=new_end_byte@entry=0) at treesit.c:739 739 { So I tried again to figure out the odd call, with the backtrace: Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 (tree=tree@entry=0x5555575b64f0, start_byte=start_byte@entry=134, old_end_byte=old_end_byte@entry=134, new_end_byte=269) at treesit.c:739 739 { (gdb) backtrace #0 treesit_tree_edit_1 (tree=tree@entry=0x5555575b64f0, start_byte=start_byte@entry=134, old_end_byte=old_end_byte@entry=134, new_end_byte=269) at treesit.c:739 #1 0x00005555557cb085 in treesit_sync_visible_region (parser=parser@entry=XIL(0x555556fc329d)) at treesit.c:931 #2 0x00005555557ccf28 in treesit_ensure_parsed (parser=XIL(0x555556fc329d)) at treesit.c:1025 #3 Ftreesit_parser_root_node (parser=XIL(0x555556fc329d)) at treesit.c:1507 treesit.c:739 points to a treesit_tree_edit_1 call which is predicated on this condition: if (visible_end < BUF_ZV_BYTE (buffer)) ...which shouldn't be the case since the buffer is small enough to fit in the default window. It might already be the consequence of passing the wrong value of new_end_byte to ts_tree_edit, though. Going back to the first call, the backtrace looks like this: Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 (tree=0x5555574f0ff0, start_byte=start_byte@entry=0, old_end_byte=old_end_byte@entry=0, new_end_byte=new_end_byte@entry=67) at treesit.c:739 739 { (gdb) backtrace #0 treesit_tree_edit_1 (tree=0x5555574f0ff0, start_byte=start_byte@entry=0, old_end_byte=old_end_byte@entry=0, new_end_byte=new_end_byte@entry=67) at treesit.c:739 #1 0x00005555557cc991 in treesit_record_change (start_byte=1, old_end_byte=1, new_end_byte=69) at treesit.c:806 #2 0x00005555556f8bb7 in insert_from_string_1 (string=XIL(0x55555744c4f4), pos=0, pos_byte=0, nchars=68, nbytes=68, inherit=<optimized out>, before_markers=false) at insdel.c:1084 Seems like treesit_record_change turns new_end_byte=69 into new_end_byte=67 inside treesit_tree_edit_1. It seems to fail in this calculation: ptrdiff_t new_end_offset = (min (visible_end, max (visible_end, new_end_byte)) - visible_beg); because visible_end is still 68 there. It value gets updated later, closer to the end of this function. ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-15 2:17 ` Dmitry Gutov @ 2023-02-15 22:44 ` Dmitry Gutov 2023-02-17 22:32 ` Yuan Fu 0 siblings, 1 reply; 16+ messages in thread From: Dmitry Gutov @ 2023-02-15 22:44 UTC (permalink / raw) To: Yuan Fu; +Cc: theo, 61369 On 15/02/2023 04:17, Dmitry Gutov wrote: > Seems like treesit_record_change turns new_end_byte=69 into > new_end_byte=67 inside treesit_tree_edit_1. > > It seems to fail in this calculation: > > ptrdiff_t new_end_offset = (min (visible_end, > max (visible_end, new_end_byte)) > - visible_beg); > > because visible_end is still 68 there. It value gets updated later, > closer to the end of this function. So FWIW the patch below fixes the problem. But I'm not sure about change's clipping by the current restriction, or how it should be handled exactly. diff --git a/src/treesit.c b/src/treesit.c index cab2f0d5354..9f15b88a8bd 100644 --- a/src/treesit.c +++ b/src/treesit.c @@ -794,9 +794,7 @@ treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte, ptrdiff_t old_end_offset = (min (visible_end, max (visible_beg, old_end_byte)) - visible_beg); - ptrdiff_t new_end_offset = (min (visible_end, - max (visible_beg, new_end_byte)) - - visible_beg); + ptrdiff_t new_end_offset = max (visible_beg, new_end_byte) - visible_beg; eassert (start_offset <= old_end_offset); eassert (start_offset <= new_end_offset); ^ permalink raw reply related [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-15 22:44 ` Dmitry Gutov @ 2023-02-17 22:32 ` Yuan Fu 2023-02-18 0:11 ` Dmitry Gutov 0 siblings, 1 reply; 16+ messages in thread From: Yuan Fu @ 2023-02-17 22:32 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Theodor Thornhill, 61369 > On Feb 15, 2023, at 2:44 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 15/02/2023 04:17, Dmitry Gutov wrote: >> Seems like treesit_record_change turns new_end_byte=69 into new_end_byte=67 inside treesit_tree_edit_1. >> It seems to fail in this calculation: >> ptrdiff_t new_end_offset = (min (visible_end, >> max (visible_end, new_end_byte)) >> - visible_beg); >> because visible_end is still 68 there. It value gets updated later, closer to the end of this function. > > So FWIW the patch below fixes the problem. But I'm not sure about change's clipping by the current restriction, or how it should be handled exactly. > > diff --git a/src/treesit.c b/src/treesit.c > index cab2f0d5354..9f15b88a8bd 100644 > --- a/src/treesit.c > +++ b/src/treesit.c > @@ -794,9 +794,7 @@ treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte, > ptrdiff_t old_end_offset = (min (visible_end, > max (visible_beg, old_end_byte)) > - visible_beg); > - ptrdiff_t new_end_offset = (min (visible_end, > - max (visible_beg, new_end_byte)) > - - visible_beg); > + ptrdiff_t new_end_offset = max (visible_beg, new_end_byte) - visible_beg; > eassert (start_offset <= old_end_offset); > eassert (start_offset <= new_end_offset); Thank you very much! I thought that clipping the change into the fixed visible range, and rely on treesit_sync_visible_region to add back the clipped “tail” when we extend the visible range would be equivalent to not clipping, but I guess clipping and re-adding affects how incremental parsing works inside tree-sitter. I don’t think this change would have any adverse effect, because if you think of it, inserting text in a narrowed region always extends the region, rather than pushing text at the end out of the narrowed region. So the right thing to do here is in fact not clipping new_end_offset. I pushed this change. Yuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-17 22:32 ` Yuan Fu @ 2023-02-18 0:11 ` Dmitry Gutov 2023-02-18 1:14 ` Yuan Fu 2023-02-18 7:15 ` Eli Zaretskii 0 siblings, 2 replies; 16+ messages in thread From: Dmitry Gutov @ 2023-02-18 0:11 UTC (permalink / raw) To: Yuan Fu; +Cc: Theodor Thornhill, 61369 On 18/02/2023 00:32, Yuan Fu wrote: > Thank you very much! I thought that clipping the change into the fixed visible range, and rely on treesit_sync_visible_region to add back the clipped “tail” when we extend the visible range would be equivalent to not clipping, but I guess clipping and re-adding affects how incremental parsing works inside tree-sitter. It seems like the "repairing" sync used a different range, one that didn't include the character number 68 inserted from the beginning. It just synced the 1 or 2 characters at the end of the buffer, the difference between the computed visible_end and the actual BUF_ZV_BYTE. > I don’t think this change would have any adverse effect, because if you think of it, inserting text in a narrowed region always extends the region, rather than pushing text at the end out of the narrowed region. So the right thing to do here is in fact not clipping new_end_offset. I figured it could be a problem if both old_end_byte and new_end_byte extend past the current restriction. But I'm not sure whether that could actually happen in practice. The obvious attempts (undo a change outside of the narrowing, or revert the buffer when narrowing is in effect) didn't play out, but I'm not sure whether there is an actual hard limit on modifying the text outside of the current restriction. > I pushed this change. Thanks. Good to see it make it in. ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-18 0:11 ` Dmitry Gutov @ 2023-02-18 1:14 ` Yuan Fu 2023-02-18 1:25 ` Dmitry Gutov 2023-02-18 7:15 ` Eli Zaretskii 1 sibling, 1 reply; 16+ messages in thread From: Yuan Fu @ 2023-02-18 1:14 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Theodor Thornhill, 61369 > On Feb 17, 2023, at 4:11 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 18/02/2023 00:32, Yuan Fu wrote: >> Thank you very much! I thought that clipping the change into the fixed visible range, and rely on treesit_sync_visible_region to add back the clipped “tail” when we extend the visible range would be equivalent to not clipping, but I guess clipping and re-adding affects how incremental parsing works inside tree-sitter. > > It seems like the "repairing" sync used a different range, one that didn't include the character number 68 inserted from the beginning. > > It just synced the 1 or 2 characters at the end of the buffer, the difference between the computed visible_end and the actual BUF_ZV_BYTE. That should be enough, no? Because other text didn’t change, they just moved. And tree-sitter should know that they moved. Or maybe I’m misunderstanding what you mean. > >> I don’t think this change would have any adverse effect, because if you think of it, inserting text in a narrowed region always extends the region, rather than pushing text at the end out of the narrowed region. So the right thing to do here is in fact not clipping new_end_offset. > > I figured it could be a problem if both old_end_byte and new_end_byte extend past the current restriction. That should be fine (ie, technically correct), since when we widen, the clipped text are reparsed by tree-sitter as new text. > > But I'm not sure whether that could actually happen in practice. The obvious attempts (undo a change outside of the narrowing, or revert the buffer when narrowing is in effect) didn't play out, but I'm not sure whether there is an actual hard limit on modifying the text outside of the current restriction. It is my impression that Emacs in general enforces the narrowing restriction strictly. And we are still correct when exceptions occur. Yuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-18 1:14 ` Yuan Fu @ 2023-02-18 1:25 ` Dmitry Gutov 2023-02-18 10:05 ` Yuan Fu 0 siblings, 1 reply; 16+ messages in thread From: Dmitry Gutov @ 2023-02-18 1:25 UTC (permalink / raw) To: Yuan Fu; +Cc: Theodor Thornhill, 61369 On 18/02/2023 03:14, Yuan Fu wrote: > > >> On Feb 17, 2023, at 4:11 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: >> >> On 18/02/2023 00:32, Yuan Fu wrote: >>> Thank you very much! I thought that clipping the change into the fixed visible range, and rely on treesit_sync_visible_region to add back the clipped “tail” when we extend the visible range would be equivalent to not clipping, but I guess clipping and re-adding affects how incremental parsing works inside tree-sitter. >> >> It seems like the "repairing" sync used a different range, one that didn't include the character number 68 inserted from the beginning. >> >> It just synced the 1 or 2 characters at the end of the buffer, the difference between the computed visible_end and the actual BUF_ZV_BYTE. > > That should be enough, no? Because other text didn’t change, they just moved. And tree-sitter should know that they moved. Or maybe I’m misunderstanding what you mean. But the "unsynced" character is at position 68. And we just tell tree-sitter to update positions 134-136. So it stays ignorant of the changed char in the middle of the buffer. It's not just about not knowing about the change either (the character in question is a newline, so its absence wouldn't lead to a syntax error), but about wrong offsets in the old parse tree, based on which the new tree is generated. That probably creates a wrong picture of the source text in the parser. >>> I don’t think this change would have any adverse effect, because if you think of it, inserting text in a narrowed region always extends the region, rather than pushing text at the end out of the narrowed region. So the right thing to do here is in fact not clipping new_end_offset. >> >> I figured it could be a problem if both old_end_byte and new_end_byte extend past the current restriction. > > That should be fine (ie, technically correct), since when we widen, the clipped text are reparsed by tree-sitter as new text. I guess the effect I was thinking of is that XTS_PARSER (lisp_parser)->visible_end would end up with a higher value than BUF_ZV_BYTE. Not sure if it's a problem. >> >> But I'm not sure whether that could actually happen in practice. The obvious attempts (undo a change outside of the narrowing, or revert the buffer when narrowing is in effect) didn't play out, but I'm not sure whether there is an actual hard limit on modifying the text outside of the current restriction. > > It is my impression that Emacs in general enforces the narrowing restriction strictly. And we are still correct when exceptions occur. Very good. ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-18 1:25 ` Dmitry Gutov @ 2023-02-18 10:05 ` Yuan Fu 0 siblings, 0 replies; 16+ messages in thread From: Yuan Fu @ 2023-02-18 10:05 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Theodor Thornhill, 61369 > On Feb 17, 2023, at 5:25 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 18/02/2023 03:14, Yuan Fu wrote: >>> On Feb 17, 2023, at 4:11 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: >>> >>> On 18/02/2023 00:32, Yuan Fu wrote: >>>> Thank you very much! I thought that clipping the change into the fixed visible range, and rely on treesit_sync_visible_region to add back the clipped “tail” when we extend the visible range would be equivalent to not clipping, but I guess clipping and re-adding affects how incremental parsing works inside tree-sitter. >>> >>> It seems like the "repairing" sync used a different range, one that didn't include the character number 68 inserted from the beginning. >>> >>> It just synced the 1 or 2 characters at the end of the buffer, the difference between the computed visible_end and the actual BUF_ZV_BYTE. >> That should be enough, no? Because other text didn’t change, they just moved. And tree-sitter should know that they moved. Or maybe I’m misunderstanding what you mean. > > But the "unsynced" character is at position 68. > > And we just tell tree-sitter to update positions 134-136. So it stays ignorant of the changed char in the middle of the buffer. > > It's not just about not knowing about the change either (the character in question is a newline, so its absence wouldn't lead to a syntax error), but about wrong offsets in the old parse tree, based on which the new tree is generated. That probably creates a wrong picture of the source text in the parser. Ok, I made some visualization to understand it, and yeah you are right. I’ll need to modify the comment a bit. |visible range| updated range ------------- |aaaaaa| |bbbbbbbbaaaa|aa start: 0, old_end: 0, new_end: 6 ------ |bbbbbbbbaaaaaa| start: 12, old_end: 12, new_end: 14 -- > >>>> I don’t think this change would have any adverse effect, because if you think of it, inserting text in a narrowed region always extends the region, rather than pushing text at the end out of the narrowed region. So the right thing to do here is in fact not clipping new_end_offset. >>> >>> I figured it could be a problem if both old_end_byte and new_end_byte extend past the current restriction. >> That should be fine (ie, technically correct), since when we widen, the clipped text are reparsed by tree-sitter as new text. > > I guess the effect I was thinking of is that > > XTS_PARSER (lisp_parser)->visible_end > > would end up with a higher value than BUF_ZV_BYTE. Not sure if it's a problem. It shouldn’t be, since BUF_ZV_BYTE should automatically grow when user inserts text. Even if it does, we always call treesit_sync_visible_region to sync up visible_beg/end with BUF_(Z)V_BYTE before parsing, so it shouldn’t be a problem. Yuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-18 0:11 ` Dmitry Gutov 2023-02-18 1:14 ` Yuan Fu @ 2023-02-18 7:15 ` Eli Zaretskii 2023-02-18 17:21 ` Dmitry Gutov 1 sibling, 1 reply; 16+ messages in thread From: Eli Zaretskii @ 2023-02-18 7:15 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, theo, 61369 > Cc: Theodor Thornhill <theo@thornhill.no>, 61369@debbugs.gnu.org > Date: Sat, 18 Feb 2023 02:11:22 +0200 > From: Dmitry Gutov <dgutov@yandex.ru> > > I'm not sure whether there is an actual hard limit on modifying the > text outside of the current restriction. It is, for most/all practical purposes. If you try to modify text outside of the current restriction, you risk many parts of code barfing or signaling errors on you. For example, conversion from character to byte positions and vice versa will stop working (and in a build with --enable-checking will actually raise SIGABRT). ^ permalink raw reply [flat|nested] 16+ messages in thread
* bug#61369: Problem with keeping tree-sitter parse tree up-to-date 2023-02-18 7:15 ` Eli Zaretskii @ 2023-02-18 17:21 ` Dmitry Gutov 0 siblings, 0 replies; 16+ messages in thread From: Dmitry Gutov @ 2023-02-18 17:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, theo, 61369 On 18/02/2023 09:15, Eli Zaretskii wrote: >> Cc: Theodor Thornhill<theo@thornhill.no>,61369@debbugs.gnu.org >> Date: Sat, 18 Feb 2023 02:11:22 +0200 >> From: Dmitry Gutov<dgutov@yandex.ru> >> >> I'm not sure whether there is an actual hard limit on modifying the >> text outside of the current restriction. > It is, for most/all practical purposes. If you try to modify text > outside of the current restriction, you risk many parts of code > barfing or signaling errors on you. For example, conversion from > character to byte positions and vice versa will stop working (and in a > build with --enable-checking will actually raise SIGABRT). All right, thank you both. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-02-18 17:21 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-08 15:34 bug#61369: Problem with keeping tree-sitter parse tree up-to-date Dmitry Gutov 2023-02-08 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-02-08 19:41 ` Dmitry Gutov 2023-02-10 1:22 ` Yuan Fu 2023-02-10 1:38 ` Dmitry Gutov 2023-02-13 9:10 ` Yuan Fu 2023-02-13 23:59 ` Yuan Fu 2023-02-15 2:17 ` Dmitry Gutov 2023-02-15 22:44 ` Dmitry Gutov 2023-02-17 22:32 ` Yuan Fu 2023-02-18 0:11 ` Dmitry Gutov 2023-02-18 1:14 ` Yuan Fu 2023-02-18 1:25 ` Dmitry Gutov 2023-02-18 10:05 ` Yuan Fu 2023-02-18 7:15 ` Eli Zaretskii 2023-02-18 17:21 ` Dmitry Gutov
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.