unofficial mirror of bug-gnu-emacs@gnu.org 
From: Theodor Thornhill via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Mickey Petersen <mickey@masteringemacs.org>
Cc: Eli Zaretskii <eliz@gnu.org>, 61529@debbugs.gnu.org
Subject: bug#61529: 30.0.50; tree-sitter: weird off-by-one error but only in css-ts-mode(?) with `treesit-node-at'
Date: Wed, 15 Feb 2023 20:42:45 +0100
Message-ID: <874jrm7m16.fsf@thornhill.no>
In-Reply-To: <87cz6akakn.fsf@masteringemacs.org>

Mickey Petersen <mickey@masteringemacs.org> writes:

> Theodor Thornhill <theo@thornhill.no> writes:
>
>> Mickey Petersen <mickey@masteringemacs.org> writes:
>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: Mickey Petersen <mickey@masteringemacs.org>
>>>>> Date: Wed, 15 Feb 2023 08:25:53 +0000
>>>>>
>>>>>
>>>>> With point at '2', then I'd expect `treesit-node-at' to yield that node. But it does not:
>>>>>
>>>>> (cons (point) (treesit-node-at (point)))
>>>>>
>>>>> => (34 . #<treesit-node "(" in 34-35>)
>>>>
>>>> The value of point is the number of the character which _follows_
>>>> point, yes?  So when the cursor is on '2', point is actually between
>>>> '(' and '2'.  Right?  What does this mean in terms of the node that
>>>> should be returned by tree-sitter?
>>>
>>> Correct, point is between '(' and '2'. So 34-35 means it occupies
>>> position 34-35 or [34,35). So point is outside the scope of the '('
>>> single-char anonymous node.
>>>
>>> Or at least it should be: the problem is that it *is* inside it in
>>> this one weird instance and, near as I can find, only in this mode,
>>> and then only in this place, it isn't. I suspect `treesit-node-at' has
>>> a bug.
>>>
>>
>> Hi, Mickey!
>>
> Hey Theo!
>
>>> Consider:
>>>
>>>     a {
>>>       background: linear-gradient(210deg, rgba(|255,82,41,1) 0%, rgba(251,165,85,1) 54%, rgba(163,73,73,1) 100%);
>>>     }
>>>
>>> Note the new position of point in rgba. `treesit-node-at` with `(point)` now correctly returns
>>>
>>>     #<treesit-node integer_value in 48-51>
>>>
>>> Move point back one position:
>>>
>>>     a {
>>>       background: linear-gradient(210deg, rgba|(255,82,41,1) 0%, rgba(251,165,85,1) 54%, rgba(163,73,73,1) 100%);
>>>     }
>>>
>>> And now:
>>>
>>>   (treesit-node-at (point)) => #<treesit-node "(" in 47-48>
>>>
>>> In stark contrast to the original example.
>>
>> So the docstring of treesit-node-at states:
>>
>>
>>   "Return the leaf node at position POS.
>>
>> A leaf node is a node that doesn't have any child nodes.
>>
>> The returned node's span covers POS: the node's beginning is before
>> or at POS, and the node's end is at or after POS.
>>
>> If no leaf node's span covers POS (e.g., POS is on whitespace
>> between two leaf nodes), return the first leaf node after POS.
>>
>> If there is no leaf node after POS, return the first leaf node
>> before POS.
>>
>> Return nil if no leaf node can be returned.  If NAMED is non-nil,
>> only look for named nodes."
>>
>> Doesn't this describe this behavior?
>>
>
> It's a good question: I suppose it's a question of wording (or
> understanding) more than it necessarily being *wrong* -- it is, after
> all, a custom function.
>
> I read "*end is at* or after POS" to mean that point counts as being
> inside the "(" node when it sits exactly at that node's end, which,
> given how tree-sitter determines node extents, it technically isn't.
>
> But I think it's fair enough if this is intentional -- I've no real
> suggestions for improving its behaviour if this is intended. So if
> it's working as expected, then it's safe to close the issue.
>

There is one thing here which confuses me a lot and that you might also
have some thoughts on. Consider some simple tsx:

```
const x = () => (
  <div>
    try to C-SPC C-SPC at the beginning of try after activating treesit-explore-mode
  </div>
)
```

Now you can maybe see that the jsx_text node covers a lot more than just
the line in the middle.  There are other cases like this in some
languages, and they do trip up our semantics.  Might this be a similar
case, just one that doesn't concern indentation?

IOW, sometimes the parser returns nodes that include whitespace, so it
looks like we are outside a node when we are in fact still inside it.
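
To make the two readings of the docstring quoted above concrete, here is
a tiny sketch.  It is a Python model, not how Emacs or tree-sitter
implement the lookup; `Leaf', `covering_half_open' and
`covering_inclusive' are made-up names, and the extents are the ones
from the rgba example in the quoted text.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    """Hypothetical stand-in for a leaf node: a type plus its extent."""
    kind: str
    beg: int
    end: int


# The two adjacent leaves from the rgba example: "(" in 47-48 and
# integer_value in 48-51.
LEAVES = [Leaf("(", 47, 48), Leaf("integer_value", 48, 51)]


def covering_half_open(leaves, pos):
    """Leaves whose extent [beg, end) contains POS."""
    return [leaf.kind for leaf in leaves if leaf.beg <= pos < leaf.end]


def covering_inclusive(leaves, pos):
    """Leaves with beg <= POS <= end, as the docstring is worded."""
    return [leaf.kind for leaf in leaves if leaf.beg <= pos <= leaf.end]


print(covering_half_open(LEAVES, 47))
print(covering_half_open(LEAVES, 48))
print(covering_inclusive(LEAVES, 48))
```

Under the half-open reading each position belongs to exactly one of the
two leaves, which matches what the rgba example reports.  Under the
inclusive wording, position 48 is covered by both leaves, so the wording
alone does not say which one wins.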

Theo





