From: Yuan Fu <casouri@gmail.com>
To: Mickey Petersen <mickey@masteringemacs.org>
Cc: Po Lu <luangruo@yahoo.com>, Eli Zaretskii <eliz@gnu.org>,
60237@debbugs.gnu.org
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 14:37:22 -0800 [thread overview]
Message-ID: <4FF7BE3E-A966-4419-AA22-BAF57D1EF792@gmail.com> (raw)
In-Reply-To: <871qmbw55n.fsf@masteringemacs.org>
> On Feb 27, 2023, at 6:29 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>>
>>>
>>> Yuan Fu <casouri@gmail.com> writes:
>>>
>>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>>>>
>>>>>
>>>>> Yuan Fu <casouri@gmail.com> writes:
>>>>>
>>>>>>> GC has historically never called xmalloc, so the profiler will
>>>>>>> likely
>>>>>>> crash upon growing the mark stack as well. I guess another
>>>>>>> important
>>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>>
>>>>>>
>>>>>>> As you see, when we call ts_tree_delete, it calls
>>>>>>> ts_subtree_release,
>>>>>>> which in turn calls malloc (redirected into our xmalloc). Is this
>>>>>>> expected? Can you look in the tree-sitter sources and verify that
>>>>>>> this is OK?
>>>>>>
>>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>>> pointer to the complete data. This way multiple trees share common
>>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>>>> tree and get a new tree, these two trees will share the unchanged part
>>>>>> of the tree.
>>>>>
>>>>> Would that mean we could possibly preserve node instances -- either
>>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>>> incremental parsing? That would be useful for refactoring.
>>>>
>>>> What kind of exact interface (function) do you want? The
>>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>>> delete the old tree and give it to you, or allow you to access
>>>> information of an outdated node.
>>>
>>> OK, so let me explain:
>>>
>>> Touching the buffer for any reason invalidates the whole tree; that's
>>> not good. It's not good, because a lot of the information may still be
>>> useful and viable. Outdating the node is not a bad idea as it avoids a
>>> lot of 'traps' around accidental modifications that can corrupt things
>>> without the developer's knowledge.
>>>
>>> I'd like to be able to access all the information possible; perhaps
>>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>>> I'm really mostly interested in is:
>>>
>>> - How well the node references handle changes in byte positions in TS.
>>
>> They don’t handle position changes. If the buffer content changed, we
>> need to reparse. Once we reparsed the buffer, a new tree is
>> born. While it is true that the new tree shares some node with the old
>> tree, tree-sitter does not expose any function or information that
>> tells you which node in the new tree is “the same” as which node in
>> the old tree; nor does it tell you whether a node in the old tree
>> still “exists” in the new tree.
>>
>> Now, there does exist a function (in tree-sitter’s API) that allows
>> you to “edit” a node with position changes. But a) I’m not sure how
>> does it handle the case where the node is deleted by the change and b)
>> it is not very useful because once you reparse the buffer, the new
>> tree is completely independent from the old tree (ignoring the
>> implementation detail which is not exposed).
>>
>>>
>>> - Does changing something at X shift (like a `point-marker`) everything
>>> below it? Does an outdated node correctly reference its new location
>>> and state, such as changes to children or its position in the tree?
>>
>> Like I said above, any buffer change will create a new tree with no relation to the old tree, so there is no shifting.
>>
>> And there really isn’t a “new location”: we don’t know if the old node
>> is still in the new tree. Mind you, even if the node is completely
>> outside of the changed region, it can still disappear from the new
>> tree because of change of its surrounding context. For example, in the
>> following C code:
>>
>> /*
>> int c = 1;
>>
>> If I insert a closing comment delimiter, and buffer becomes
>>
>> /*
>> int c = 1;
>> */
>>
>> Even though int c = 1; is not in the changed range, nor did it’s
>> position move, all those nodes (int, c, =, etc) are not in the new
>> tree anymore, because the whole thing becomes a comment.
>>
>> I made any access to outdated nodes error because there really isn’t
>> any good reason to use them, at least I didn’t think of any at the
>> time. And make them error out should help people catch errors.
>>
>>>
>>> Right now, Combobulate can make a proxy node, which essentially
>>> captures the basics of a live node and stores it in a defstruct. That
>>> way I can at least retain the start/end, type, text, etc. of a node
>>> and still do light refactoring without contorting myself to do things
>>> in a particular order, which is not always possible (like delaying
>>> editing to the very end.)
>>
>> IIUC, you want to do some very minor whitespace edit to the buffer
>> which doesn’t really change the parse tree, so you don’t want the
>> nodes to be invalidated for no good reason? Not erroring on outdated
>> nodes is easy. As you said, we can add a
>> treesit-inhibit-error-outdated variable. But not it’s not so easy to
>> automatically update outdated nodes’ positions (with aforementioned
>> tree-sitter function). However, if you are making those changes, you
>> much know how to adjust your nodes position, right? So maybe it isn’t
>> a must-have for your purpose.
>
> It's a good point, but it's also easy to create a scenario where you
> at least want to keep the position and esp. the type and text (for
> reporting information to the user, or similar.)
I should be clearer. I meant that treesit-inhibit-error-outdated is reasonable and easy to implement. So if you want we can add it. OTOH auto-updating outdated nodes with position information is nontrivial, and might not be must-have for your purpose.
>
> My main interest is now refactoring and how to best do it. If TS can
> do some of it, then all the better. I realise it was never meant to,
> but if we can continue accessing the information contained in a node
> even if it is outdated, then that could be useful, however niche.
I guess “refactoring” includes not only whitespace changes but also some structural changes like slurping (or whatever it’s called), right? If you want to do structural changes, tree-sitter probably can’t help you much, as you observed. Maybe it’s better to “export” the tree-sitter tree to your own tree and do transformations with it? Maybe that’s already what you does now.
> Currently I use overlays and point markers, but they are not
> infallible.
Yuan
next prev parent reply other threads:[~2023-02-27 22:37 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-21 12:24 bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node Mickey Petersen
2022-12-24 7:23 ` Eli Zaretskii
2022-12-24 9:20 ` Yuan Fu
2022-12-29 14:21 ` Mickey Petersen
2023-02-24 23:22 ` Yuan Fu
2023-02-25 7:13 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-25 7:51 ` Eli Zaretskii
2023-02-26 2:01 ` Yuan Fu
2023-02-26 2:37 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26 6:18 ` Eli Zaretskii
2023-02-26 6:14 ` Eli Zaretskii
2023-02-26 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-28 14:00 ` Eli Zaretskii
2023-03-01 4:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 13:27 ` Eli Zaretskii
2023-03-01 14:08 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 15:51 ` Eli Zaretskii
2023-03-01 17:39 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-04 12:21 ` Eli Zaretskii
2023-03-08 16:34 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 18:28 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 20:56 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11 6:45 ` Eli Zaretskii
2023-03-11 17:45 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 23:52 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11 2:41 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11 3:29 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11 3:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02 5:53 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02 20:24 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26 9:41 ` Mickey Petersen
2023-02-27 0:34 ` Yuan Fu
2023-02-27 8:22 ` Mickey Petersen
2023-02-27 9:05 ` Yuan Fu
2023-02-27 14:29 ` Mickey Petersen
2023-02-27 22:37 ` Yuan Fu [this message]
2023-02-27 22:45 ` Dmitry Gutov
2023-02-24 23:29 ` Yuan Fu
2023-02-25 7:55 ` Eli Zaretskii
2023-02-26 2:02 ` Yuan Fu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FF7BE3E-A966-4419-AA22-BAF57D1EF792@gmail.com \
--to=casouri@gmail.com \
--cc=60237@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=luangruo@yahoo.com \
--cc=mickey@masteringemacs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.