unofficial mirror of bug-gnu-emacs@gnu.org 
From: Dmitry Gutov <dgutov@yandex.ru>
To: Yuan Fu <casouri@gmail.com>
Cc: theo@thornhill.no, 61369@debbugs.gnu.org
Subject: bug#61369: Problem with keeping tree-sitter parse tree up-to-date
Date: Wed, 15 Feb 2023 04:17:29 +0200
Message-ID: <9c4e551b-42b3-8202-ccff-fb8170b616a6@yandex.ru>
In-Reply-To: <1AC63591-F4EF-411F-B554-7CD38B4B4888@gmail.com>

On 14/02/2023 01:59, Yuan Fu wrote:
> There are two surprises here: 1) there isn’t an off-by-one bug, 2) the
> parser actually read the whole buffer, rather than reading only the new
> content. Then there is even less reason for it to create that error
> node.

The parser reads the whole buffer, but if it tries to reparse based on 
the previous parse tree with incorrect positions, it might get into an 
invalid state as a result.

I've set a breakpoint on treesit_tree_edit_1 in gdb (after dropping the
'inline' qualifier), and here's what I see:

- If I paste the test line without the trailing newline, the value of
new_end_byte is 67.

- If I paste the test line with the trailing newline, the value of 
new_end_byte is still 67. But then it is followed by this call right away:

Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 
(tree=tree@entry=0x5555574139b0, start_byte=start_byte@entry=134, 
old_end_byte=old_end_byte@entry=134, new_end_byte=135) at treesit.c:739

- If I 'undo' after that, the call is as expected:

Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 
(tree=0x555557435cd0, start_byte=start_byte@entry=0, 
old_end_byte=old_end_byte@entry=68, new_end_byte=new_end_byte@entry=0) 
at treesit.c:739
739	{

So I reproduced the odd call again, this time with a backtrace:

Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 
(tree=tree@entry=0x5555575b64f0, start_byte=start_byte@entry=134, 
old_end_byte=old_end_byte@entry=134, new_end_byte=269) at treesit.c:739
739	{
(gdb) backtrace
#0  treesit_tree_edit_1 (tree=tree@entry=0x5555575b64f0, 
start_byte=start_byte@entry=134, old_end_byte=old_end_byte@entry=134, 
new_end_byte=269) at treesit.c:739
#1  0x00005555557cb085 in treesit_sync_visible_region 
(parser=parser@entry=XIL(0x555556fc329d)) at treesit.c:931
#2  0x00005555557ccf28 in treesit_ensure_parsed 
(parser=XIL(0x555556fc329d)) at treesit.c:1025
#3  Ftreesit_parser_root_node (parser=XIL(0x555556fc329d)) at treesit.c:1507

treesit.c:931 (frame #1, in treesit_sync_visible_region) points to a
treesit_tree_edit_1 call which is predicated on this condition:

   if (visible_end < BUF_ZV_BYTE (buffer))

...which shouldn't be the case since the buffer is small enough to fit
in the default window. It might already be a consequence of passing
the wrong value of new_end_byte to ts_tree_edit, though.

Going back to the first call, the backtrace looks like this:

Thread 1 "emacs" hit Breakpoint 3, treesit_tree_edit_1 
(tree=0x5555574f0ff0, start_byte=start_byte@entry=0, 
old_end_byte=old_end_byte@entry=0, new_end_byte=new_end_byte@entry=67) 
at treesit.c:739
739	{
(gdb) backtrace
#0  treesit_tree_edit_1 (tree=0x5555574f0ff0, 
start_byte=start_byte@entry=0, old_end_byte=old_end_byte@entry=0, 
new_end_byte=new_end_byte@entry=67) at treesit.c:739
#1  0x00005555557cc991 in treesit_record_change (start_byte=1, 
old_end_byte=1, new_end_byte=69) at treesit.c:806
#2  0x00005555556f8bb7 in insert_from_string_1 
(string=XIL(0x55555744c4f4), pos=0, pos_byte=0, nchars=68, nbytes=68, 
inherit=<optimized out>, before_markers=false) at insdel.c:1084

It seems that treesit_record_change turns new_end_byte=69 into
new_end_byte=67 before passing it to treesit_tree_edit_1.

It seems to fail in this calculation:

   ptrdiff_t new_end_offset = (min (visible_end,
				   max (visible_end, new_end_byte))
			      - visible_beg);

because visible_end is still 68 there. Its value gets updated later,
closer to the end of this function.




