all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Yuan Fu <casouri@gmail.com>
To: Mickey Petersen <mickey@masteringemacs.org>
Cc: Po Lu <luangruo@yahoo.com>, Eli Zaretskii <eliz@gnu.org>,
	60237@debbugs.gnu.org
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 14:37:22 -0800	[thread overview]
Message-ID: <4FF7BE3E-A966-4419-AA22-BAF57D1EF792@gmail.com> (raw)
In-Reply-To: <871qmbw55n.fsf@masteringemacs.org>



> On Feb 27, 2023, at 6:29 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>> 
>>> 
>>> Yuan Fu <casouri@gmail.com> writes:
>>> 
>>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>>>> 
>>>>> 
>>>>> Yuan Fu <casouri@gmail.com> writes:
>>>>> 
>>>>>>> GC has historically never called xmalloc, so the profiler will
>>>>>>> likely
>>>>>>> crash upon growing the mark stack as well.  I guess another
>>>>>>> important
>>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>> 
>>>>>> 
>>>>>>> As you see, when we call ts_tree_delete, it calls
>>>>>>> ts_subtree_release,
>>>>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>>>>> this is OK?
>>>>>> 
>>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>>> pointer to the complete data. This way multiple trees share common
>>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>>>> tree and get a new tree, these two trees will share the unchanged part
>>>>>> of the tree.
>>>>> 
>>>>> Would that mean we could possibly preserve node instances -- either
>>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>>> incremental parsing? That would be useful for refactoring.
>>>> 
>>>> What kind of exact interface (function) do you want? The
>>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>>> delete the old tree and give it to you, or allow you to access
>>>> information of an outdated node.
>>> 
>>> OK, so let me explain:
>>> 
>>> Touching the buffer for any reason invalidates the whole tree; that's
>>> not good. It's not good, because a lot of the information may still be
>>> useful and viable. Outdating the node is not a bad idea as it avoids a
>>> lot of 'traps' around accidental modifications that can corrupt things
>>> without the developer's knowledge.
>>> 
>>> I'd like to be able to access all the information possible; perhaps
>>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>>> I'm really mostly interested in is:
>>> 
>>> - How well the node references handle changes in byte positions in TS.
>> 
>> They don’t handle position changes. If the buffer content changed, we
>> need to reparse. Once we reparsed the buffer, a new tree is
>> born. While it is true that the new tree shares some node with the old
>> tree, tree-sitter does not expose any function or information that
>> tells you which node in the new tree is “the same” as which node in
>> the old tree; nor does it tell you whether a node in the old tree
>> still “exists” in the new tree.
>> 
>> Now, there does exist a function (in tree-sitter’s API) that allows
>> you to “edit” a node with position changes. But a) I’m not sure how
>> does it handle the case where the node is deleted by the change and b)
>> it is not very useful because once you reparse the buffer, the new
>> tree is completely independent from the old tree (ignoring the
>> implementation detail which is not exposed).
>> 
>>> 
>>> - Does changing something at X shift (like a `point-marker`) everything
>>> below it? Does an outdated node correctly reference its new location
>>> and state, such as changes to children or its position in the tree?
>> 
>> Like I said above, any buffer change will create a new tree with no relation to the old tree, so there is no shifting.
>> 
>> And there really isn’t a “new location”: we don’t know if the old node
>> is still in the new tree. Mind you, even if the node is completely
>> outside of the changed region, it can still disappear from the new
>> tree because of change of its surrounding context. For example, in the
>> following C code:
>> 
>> /*
>> int c = 1;
>> 
>> If I insert a closing comment delimiter, and buffer becomes
>> 
>> /*
>> int c = 1;
>> */
>> 
>> Even though int c = 1; is not in the changed range, nor did it’s
>> position move, all those nodes (int, c, =, etc) are not in the new
>> tree anymore, because the whole thing becomes a comment.
>> 
>> I made any access to outdated nodes error because there really isn’t
>> any good reason to use them, at least I didn’t think of any at the
>> time. And make them error out should help people catch errors.
>> 
>>> 
>>> Right now, Combobulate can make a proxy node, which essentially
>>> captures the basics of a live node and stores it in a defstruct. That
>>> way I can at least retain the start/end, type, text, etc. of a node
>>> and still do light refactoring without contorting myself to do things
>>> in a particular order, which is not always possible (like delaying
>>> editing to the very end.)
>> 
>> IIUC, you want to do some very minor whitespace edit to the buffer
>> which doesn’t really change the parse tree, so you don’t want the
>> nodes to be invalidated for no good reason? Not erroring on outdated
>> nodes is easy. As you said, we can add a
>> treesit-inhibit-error-outdated variable. But not it’s not so easy to
>> automatically update outdated nodes’ positions (with aforementioned
>> tree-sitter function). However, if you are making those changes, you
>> much know how to adjust your nodes position, right? So maybe it isn’t
>> a must-have for your purpose.
> 
> It's a good point, but it's also easy to create a scenario where you
> at least want to keep the position and esp. the type and text (for
> reporting information to the user, or similar.)

I should be clearer. I meant that treesit-inhibit-error-outdated is reasonable and easy to implement. So if you want we can add it. OTOH auto-updating outdated nodes with position information is nontrivial, and might not be must-have for your purpose.

> 
> My main interest is now refactoring and how to best do it. If TS can
> do some of it, then all the better. I realise it was never meant to,
> but if we can continue accessing the information contained in a node
> even if it is outdated, then that could be useful, however niche.

I guess “refactoring” includes not only whitespace changes but also some structural changes like slurping (or whatever it’s called), right? If you want to do structural changes, tree-sitter probably can’t help you much, as you observed. Maybe it’s better to “export” the tree-sitter tree to your own tree and do transformations with it? Maybe that’s already what you does now.

> Currently I use overlays and point markers, but they are not
> infallible.

Yuan




  reply	other threads:[~2023-02-27 22:37 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-21 12:24 bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node Mickey Petersen
2022-12-24  7:23 ` Eli Zaretskii
2022-12-24  9:20 ` Yuan Fu
2022-12-29 14:21   ` Mickey Petersen
2023-02-24 23:22 ` Yuan Fu
2023-02-25  7:13   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-25  7:51   ` Eli Zaretskii
2023-02-26  2:01     ` Yuan Fu
2023-02-26  2:37       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26  6:18         ` Eli Zaretskii
2023-02-26  6:14       ` Eli Zaretskii
2023-02-26 15:16         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-28 14:00           ` Eli Zaretskii
2023-03-01  4:07             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 13:27               ` Eli Zaretskii
2023-03-01 14:08                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 15:51                   ` Eli Zaretskii
2023-03-01 17:39                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-04 12:21                       ` Eli Zaretskii
2023-03-08 16:34                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 18:28                           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 20:56                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  6:45                               ` Eli Zaretskii
2023-03-11 17:45                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 23:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  2:41                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  3:29                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  3:38                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02  5:53                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02 20:24                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26  9:41       ` Mickey Petersen
2023-02-27  0:34         ` Yuan Fu
2023-02-27  8:22           ` Mickey Petersen
2023-02-27  9:05             ` Yuan Fu
2023-02-27 14:29               ` Mickey Petersen
2023-02-27 22:37                 ` Yuan Fu [this message]
2023-02-27 22:45                   ` Dmitry Gutov
2023-02-24 23:29 ` Yuan Fu
2023-02-25  7:55   ` Eli Zaretskii
2023-02-26  2:02     ` Yuan Fu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF7BE3E-A966-4419-AA22-BAF57D1EF792@gmail.com \
    --to=casouri@gmail.com \
    --cc=60237@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=luangruo@yahoo.com \
    --cc=mickey@masteringemacs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.