all messages for Emacs-related lists mirrored at yhetil.org
* bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates visited node instances
From: Mickey Petersen @ 2023-01-08 11:08 UTC
  To: 60656


If you parse some text, retrieve a node -- using `treesit-node-at', for example -- and then edit the buffer, then the node you retrieved is marked outdated.

However, tree-sitter is capable of handling that, to a greater or lesser extent:

https://tree-sitter.github.io/tree-sitter/using-parsers#editing

It is therefore possible to refresh node instances that were created _before_ the edit. I suppose it could remain an explicit step: you enter a special form, and Emacs tracks the node instances issued inside that form and refreshes them when edits take place inside it.

As it stands, it is very hard to edit and maintain a node registry at the same time. (I'm using markers and overlays as a crude hack to work around it.)
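
To make the marker workaround concrete, here is a minimal sketch in Python rather than Emacs Lisp. `ToyBuffer` and `Marker` are hypothetical stand-ins for an Emacs buffer and its markers, not real APIs; the only point is that a pair of positions that moves with edits can be used to find a node's text again after the node object itself has gone stale.

```python
class Marker:
    """Hypothetical stand-in for an Emacs marker: a position that moves with edits."""

    def __init__(self, pos):
        self.pos = pos


class ToyBuffer:
    """Hypothetical stand-in for a buffer that keeps its markers up to date."""

    def __init__(self, text):
        self.text = text
        self.markers = []

    def make_marker(self, pos):
        marker = Marker(pos)
        self.markers.append(marker)
        return marker

    def insert(self, pos, string):
        self.text = self.text[:pos] + string + self.text[pos:]
        for marker in self.markers:
            # Mirrors Emacs's default: a marker does not advance when text
            # is inserted exactly at its position, only when inserted before it.
            if marker.pos > pos:
                marker.pos += len(string)


buf = ToyBuffer("foo bar")
# Remember where the node for "bar" was, instead of keeping the node itself.
start, end = buf.make_marker(4), buf.make_marker(7)
buf.insert(0, "x = ")
tracked_text = buf.text[start.pos:end.pos]
```

After the insertion the two markers still bracket "bar", so a fresh node could be looked up at that range once the buffer has been reparsed.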




In GNU Emacs 30.0.50 (build 6, x86_64-pc-linux-gnu, GTK+ Version
 3.24.20, cairo version 1.16.0) of 2023-01-02 built on mickey-work
Repository revision: c209802f7b3721a1b95113290934a23fee88f678
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12013000
System Description: Ubuntu 20.04.3 LTS






* bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates visited node instances
From: Yuan Fu @ 2023-01-09  3:57 UTC
  To: Mickey Petersen; +Cc: 60656


Mickey Petersen <mickey@masteringemacs.org> writes:

> If you parse some text, retrieve a node -- using `treesit-node-at',
> for example -- and then edit the buffer, then the node you retrieved
> is marked outdated.
>
> However, tree-sitter is capable of handling that, to a greater or lesser extent:
>
> https://tree-sitter.github.io/tree-sitter/using-parsers#editing
>
> It is therefore possible to refresh node instances that were created
> _before_ the edit. I suppose it could remain an explicit step: you
> enter a special form, and Emacs tracks the node instances issued
> inside that form and refreshes them when edits take place inside
> it.
>
> As it stands, it is very hard to edit and maintain a node registry at
> the same time. (I'm using markers and overlays as a crude hack to work
> around it.)

This is kind of a limitation of tree-sitter. The "node editing" isn’t
what you think it is (it fooled me too when I first read it).
Tree-sitter’s incremental parsing works roughly like this:

1. You have a parsed tree, TREE, corresponding to some TEXT.
2. You make some edit to the TEXT, e.g., TEXT’ = insert(TEXT, 1, "abc").
3. Now you need to "edit" the old tree with the _positions_ of your edit:
   edit(TREE, Insert(pos=1, len=3)) (Notice that this modifies the tree in place.)
4. You reparse the edited tree and get a new tree:
   TREE’ = parse(TREE, TEXT’) (Notice that this returns a new tree.)

If you have a NODE from TREE, editing that node only updates its position
information. That corresponds to the edit(TREE, ...) step. There is no
equivalent of the parse(TREE, TEXT’) step for nodes: once the tree is
reparsed and a new tree is returned, none of the nodes in the old tree
are carried over to the new tree. In practice, tree-sitter reuses the old
tree’s data, but conceptually the old and new trees don’t share any nodes.
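
The four steps and the node behaviour described above can be sketched with a toy model. This is Python, not the tree-sitter API: `Node`, `Tree`, `parse` and `edit_insert` are hypothetical names, positions are 0-based, and the "parser" just makes one node per word. It only illustrates that the edit step shifts positions in place while the parse step hands back entirely new node objects.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int
    end: int


@dataclass
class Tree:
    nodes: list = field(default_factory=list)


def parse(text, old_tree=None):
    """Toy parser: one "word" node per run of non-space characters.

    It always builds fresh Node objects; old_tree is accepted only to
    mirror the shape of the parse(TREE, TEXT') step and is not consulted.
    """
    return Tree([Node("word", m.start(), m.end())
                 for m in re.finditer(r"\S+", text)])


def edit_insert(tree, pos, length):
    """Mirror of edit(TREE, Insert(...)): shift positions in place, no reparse."""
    for node in tree.nodes:
        if node.start >= pos:
            node.start += length
        if node.end > pos:
            node.end += length


text = "foo bar"
tree = parse(text)                         # step 1
old_node = tree.nodes[1]                   # the node covering "bar"
new_text = text[:1] + "abc" + text[1:]     # step 2
edit_insert(tree, 1, 3)                    # step 3: old_node moves to 7..10
new_tree = parse(new_text, tree)           # step 4: a brand-new tree
new_node = new_tree.nodes[1]
```

Here `old_node` and `new_node` end up with the same kind and range, yet they are distinct objects, and nothing in the model links one to the other.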

IOW, the editing feature for nodes is for very specific situations,
where you have edited the parse tree but haven’t reparsed yet. In this
case, if you want your node’s positions to be correct, you edit the node.
But once you reparse, there is no way to somehow "update" this old node
into its "equivalent" in the new tree.

I’m not sure whether tree-sitter is capable of doing what you want (after
all, the old and new trees share data). But currently it doesn’t
expose a feature to do that.

Yuan






* bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates visited node instances
From: Mickey Petersen @ 2023-01-09  8:56 UTC
  To: Yuan Fu; +Cc: 60656


Yuan Fu <casouri@gmail.com> writes:

> I’m not sure whether tree-sitter is capable of doing what you want (after
> all, the old and new trees share data). But currently it doesn’t
> expose a feature to do that.
>

That's a shame. The documentation is a little ambiguous, then. But if the library returns a brand-new tree, and thus brand-new nodes, then I can see why this won't work.

One possible workaround is to make outdated nodes proxies for their
underlying data (node type, range, text, anonymous/named) so that
their last known state is kept around. That would allow `equal' checks
to still succeed on an outdated node and a "brand-new, but identical" node.
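
A rough illustration of that idea, in Python rather than Emacs Lisp: `NodeSnapshot` and `snapshot` are hypothetical names, not part of treesit. The snapshot keeps only plain data, so it stays usable after the node it was taken from becomes outdated, and comparing two snapshots is a plain field-by-field equality check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSnapshot:
    """Plain-data stand-in for a node (hypothetical, not a treesit type)."""
    node_type: str
    start: int
    end: int
    text: str
    named: bool


def snapshot(node_type, start, end, buffer_text, named=True):
    """Capture a node's state as plain values that survive a reparse."""
    return NodeSnapshot(node_type, start, end, buffer_text[start:end], named)


# Snapshot taken before an edit that appends text after the node ...
before = snapshot("identifier", 4, 7, "foo bar")
# ... and one taken from the "brand-new, but identical" node afterwards.
after = snapshot("identifier", 4, 7, "foo bar baz")
# An edit before the node shifts its range, so this snapshot differs.
shifted = snapshot("identifier", 8, 11, "x = foo bar")
```

`before` and `after` compare equal, while `shifted` does not, because the range is part of the comparison. Whether the range should take part in equality is a design choice this sketch leaves open.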

Food for thought.








* bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates visited node instances
From: Yuan Fu @ 2023-01-09 20:30 UTC
  To: Mickey Petersen; +Cc: 60656


Mickey Petersen <mickey@masteringemacs.org> writes:

> That's a shame. The documentation is a little ambiguous, then. But
> if the library returns a brand-new tree, and thus brand-new nodes,
> then I can see why this won't work.

Yeah, I wish tree-sitter had it. Maybe you can raise an issue on
tree-sitter’s GitHub. The author seems to be rather busy, though.

> One possible workaround is to make outdated nodes proxies for their
> underlying data (node type, range, text, anonymous/named) so that
> their last known state is kept around. That would allow `equal' checks
> to still succeed on an outdated node and a "brand-new, but identical" node.
>
> Food for thought.

If you can describe what high-level feature you want to accomplish (with
node update), maybe I can provide some suggestions.

Yuan




