unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Yuan Fu <casouri@gmail.com>
To: Mickey Petersen <mickey@masteringemacs.org>
Cc: Po Lu <luangruo@yahoo.com>, Eli Zaretskii <eliz@gnu.org>,
	60237@debbugs.gnu.org
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 01:05:49 -0800	[thread overview]
Message-ID: <8A0520AE-7C8C-43D2-BE93-E80D5CC8856C@gmail.com> (raw)
In-Reply-To: <875ybnwm2r.fsf@masteringemacs.org>



> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>> 
>>> 
>>> Yuan Fu <casouri@gmail.com> writes:
>>> 
>>>>> GC has historically never called xmalloc, so the profiler will
>>>>> likely
>>>>> crash upon growing the mark stack as well.  I guess another
>>>>> important
>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>> 
>>>> 
>>>>> As you see, when we call ts_tree_delete, it calls
>>>>> ts_subtree_release,
>>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>>> this is OK?
>>>> 
>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>> pointer to the complete data. This way multiple trees share common
>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>> tree and get a new tree, these two trees will share the unchanged part
>>>> of the tree.
>>> 
>>> Would that mean we could possibly preserve node instances -- either
>>> the real TS ones, or an Emacs-created facsimile -- between
>>> incremental parsing? That would be useful for refactoring.
>> 
>> What kind of exact interface (function) do you want? The
>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>> delete the old tree and give it to you, or allow you to access
>> information of an outdated node.
> 
> OK, so let me explain:
> 
> Touching the buffer for any reason invalidates the whole tree; that's
> not good. It's not good, because a lot of the information may still be
> useful and viable. Outdating the node is not a bad idea as it avoids a
> lot of 'traps' around accidental modifications that can corrupt things
> without the developer's knowledge.
> 
> I'd like to be able to access all the information possible; perhaps
> behind a flag variable like `treesit-allow-outdated-node-access'. What
> I'm really mostly interested in is:
> 
> - How well the node references handle changes in byte positions in TS.

They don’t handle position changes. If the buffer content changed, we need to reparse. Once we reparsed the buffer, a new tree is born. While it is true that the new tree shares some node with the old tree, tree-sitter does not expose any function or information that tells you which node in the new tree is “the same” as which node in the old tree; nor does it tell you whether a node in the old tree still “exists” in the new tree.

Now, there does exist a function (in tree-sitter’s API) that allows you to “edit” a node with position changes. But a) I’m not sure how does it handle the case where the node is deleted by the change and b) it is not very useful because once you reparse the buffer, the new tree is completely independent from the old tree (ignoring the implementation detail which is not exposed).

> 
> - Does changing something at X shift (like a `point-marker`) everything
> below it? Does an outdated node correctly reference its new location
> and state, such as changes to children or its position in the tree?

Like I said above, any buffer change will create a new tree with no relation to the old tree, so there is no shifting.

And there really isn’t a “new location”: we don’t know if the old node is still in the new tree. Mind you, even if the node is completely outside of the changed region, it can still disappear from the new tree because of change of its surrounding context. For example, in the following C code:

/*
int c = 1;

If I insert a closing comment delimiter, and buffer becomes

/*
int c = 1;
*/

Even though int c = 1; is not in the changed range, nor did it’s position move, all those nodes (int, c, =, etc) are not in the new tree anymore, because the whole thing becomes a comment.

I made any access to outdated nodes error because there really isn’t any good reason to use them, at least I didn’t think of any at the time. And make them error out should help people catch errors.

> 
> Right now, Combobulate can make a proxy node, which essentially
> captures the basics of a live node and stores it in a defstruct. That
> way I can at least retain the start/end, type, text, etc. of a node
> and still do light refactoring without contorting myself to do things
> in a particular order, which is not always possible (like delaying
> editing to the very end.)

IIUC, you want to do some very minor whitespace edit to the buffer which doesn’t really change the parse tree, so you don’t want the nodes to be invalidated for no good reason? Not erroring on outdated nodes is easy. As you said, we can add a treesit-inhibit-error-outdated variable. But not it’s not so easy to automatically update outdated nodes’ positions (with aforementioned tree-sitter function). However, if you are making those changes, you much know how to adjust your nodes position, right? So maybe it isn’t a must-have for your purpose.


Yuan






  reply	other threads:[~2023-02-27  9:05 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-21 12:24 bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node Mickey Petersen
2022-12-24  7:23 ` Eli Zaretskii
2022-12-24  9:20 ` Yuan Fu
2022-12-29 14:21   ` Mickey Petersen
2023-02-24 23:22 ` Yuan Fu
2023-02-25  7:13   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-25  7:51   ` Eli Zaretskii
2023-02-26  2:01     ` Yuan Fu
2023-02-26  2:37       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26  6:18         ` Eli Zaretskii
2023-02-26  6:14       ` Eli Zaretskii
2023-02-26 15:16         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-28 14:00           ` Eli Zaretskii
2023-03-01  4:07             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 13:27               ` Eli Zaretskii
2023-03-01 14:08                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-01 15:51                   ` Eli Zaretskii
2023-03-01 17:39                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-04 12:21                       ` Eli Zaretskii
2023-03-08 16:34                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 18:28                           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 20:56                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  6:45                               ` Eli Zaretskii
2023-03-11 17:45                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-10 23:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  2:41                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  3:29                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-11  3:38                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02  5:53                   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-02 20:24                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-26  9:41       ` Mickey Petersen
2023-02-27  0:34         ` Yuan Fu
2023-02-27  8:22           ` Mickey Petersen
2023-02-27  9:05             ` Yuan Fu [this message]
2023-02-27 14:29               ` Mickey Petersen
2023-02-27 22:37                 ` Yuan Fu
2023-02-27 22:45                   ` Dmitry Gutov
2023-02-24 23:29 ` Yuan Fu
2023-02-25  7:55   ` Eli Zaretskii
2023-02-26  2:02     ` Yuan Fu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8A0520AE-7C8C-43D2-BE93-E80D5CC8856C@gmail.com \
    --to=casouri@gmail.com \
    --cc=60237@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=luangruo@yahoo.com \
    --cc=mickey@masteringemacs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).