Re: treesitter local parser: huge slowdown and memory usage in a long file

unofficial mirror of emacs-devel@gnu.org 

From: Dmitry Gutov <dmitry@gutov.dev>
To: Yuan Fu <casouri@gmail.com>
Cc: Vincenzo Pupillo <v.pupillo@gmail.com>,
	"Ergus via Emacs development discussions." <emacs-devel@gnu.org>,
	Eli Zaretskii <eliz@gnu.org>
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sun, 18 Feb 2024 05:37:46 +0200
Message-ID: <b0e91c50-6f5f-460f-a63d-6e7f4f13abc5@gutov.dev>
In-Reply-To: <47F1243E-0515-418D-96B9-4D3FE3CC4BBC@gmail.com>

On 13/02/2024 10:08, Yuan Fu wrote:

>> On 12/02/2024 06:16, Yuan Fu wrote:
>>> Thanks, the culprit is the call to treesit-update-ranges in
>>> treesit--pre-redisplay, where we don’t pass it any specific range, so it
>>> updates the range for the whole buffer. Eli, is there any way to get a
>>> rough estimate of the range that redisplay is refreshing? Do you think
>>> something like this would work?
>>
>> If we don't update the ranges outside of some interval surrounding the window, what does that mean for correctness?
> 
> If the place of update and the embedded code currently in view belong to the same node in the host language, then when we update ranges for the current window-visible range, the whole node’s range is updated. So at least for this node, the range is correct.
> 
> If the place of update and the embedded code currently in view belong to different nodes in the host language, then when we update ranges for the current window-visible range, only the visible node’s range is updated.
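
To make the two cases concrete, here is a toy model, written in Python rather than Emacs Lisp so that it runs standalone. It is not how treesit.el is implemented: `HostNode`, `update_ranges`, and the `stale` flag are names made up for illustration. The only behaviour it assumes is the one described above, namely that every host node intersecting the requested range gets its whole range refreshed, and nothing else does.

```python
from dataclasses import dataclass


@dataclass
class HostNode:
    """Hypothetical stand-in for a host-language node holding embedded code."""
    start: int
    end: int           # exclusive
    stale: bool = True  # True until the embedded parser's range is refreshed


def update_ranges(nodes, beg, end):
    """Refresh every node intersecting [beg, end); return the refreshed ranges."""
    refreshed = []
    for node in nodes:
        if node.start < end and beg < node.end:
            # The whole node is refreshed, not just the part inside [beg, end).
            node.stale = False
            refreshed.append((node.start, node.end))
    return refreshed


nodes = [HostNode(0, 100), HostNode(500, 900), HostNode(5000, 5200)]
visible = (600, 700)  # assumed window-visible range
refreshed = update_ranges(nodes, *visible)
print(refreshed)                  # [(500, 900)]
print([n.stale for n in nodes])   # [True, False, True]
```

In this model the node under the window is refreshed in full, while the nodes before and after it keep whatever ranges they had.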

Okay. What about positions after the visible part of the buffer? Can 
their ranges be outdated? It's probably okay when the ranges are only 
used for font-lock and syntax-ppss, but I wonder about possible other 
applications (reindenting the whole buffer, for example).

>>
>> Perhaps the mode has a syntax-propertize-function which behaves differently (as it should) depending on the language at point. Or different ranges have different syntax tables, something like that.
>>
>> If the ranges, after some edit (perhaps a programmatic one, performed far from the visible area), are not kept up to date somewhere around the beginning of the buffer, do we not risk confusing the syntax-ppss parser, for example?
> 
> That can happen, yes.
> 
>>
>> Come to think of it, take treesit-indent: it only updates the ranges for the current line. But the line's indentation usually depends on the previous buffer positions, doesn't it?
> 
> The range passed to treesit-update-ranges acts as an intersecting range: we capture the nodes that intersect with the range and use them to update ranges. If the line to be indented is in an embedded language block, the whole block will be captured and its range will be given to the embedded language parser.
> 
> 
> We haven’t had any problems so far, mainly because most embedded code blocks are local, and it’s rare for an edit that affects ranges to take place far from the visible portion while the user expects that edit to affect the currently visible range.
> 
> I don’t have any great idea for a better way to update ranges right now. Let me think about that. In the meantime, I’ll push a temporary fix so V’s original problem can be solved.

I was thinking (having considered the same problem in mmm-mode, 
actually) that it would make sense to either plug into 
syntax-propertize-function, or keep a parallel data structure that 
similarly tracks the outdated buffer regions, so that only the parts 
of the buffer modified since the last update get their ranges updated.
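
A minimal sketch of such a parallel structure, again as a standalone Python toy rather than Emacs Lisp. `DirtyRegions`, `mark`, `take`, and `pending` are hypothetical names, not existing Emacs APIs. The sketch assumes edits are reported as plain (beg, end) intervals and ignores the way later positions shift after insertions and deletions, which a real implementation would have to handle.

```python
class DirtyRegions:
    """Hypothetical tracker of regions modified since the last range update."""

    def __init__(self):
        self._dirty = []  # sorted, non-overlapping (beg, end) pairs

    def mark(self, beg, end):
        """Record an edit covering [beg, end), merging touching regions."""
        merged = []
        for b, e in self._dirty:
            if e < beg or end < b:
                merged.append((b, e))  # disjoint from the new edit
            else:
                beg, end = min(beg, b), max(end, e)
        merged.append((beg, end))
        self._dirty = sorted(merged)

    def take(self, beg, end):
        """Return the dirty parts inside [beg, end) and mark them clean."""
        taken, kept = [], []
        for b, e in self._dirty:
            lo, hi = max(b, beg), min(e, end)
            if lo >= hi:               # no overlap with the request
                kept.append((b, e))
                continue
            taken.append((lo, hi))
            if b < lo:
                kept.append((b, lo))   # still dirty before the request
            if hi < e:
                kept.append((hi, e))   # still dirty after the request
        self._dirty = sorted(kept)
        return taken

    def pending(self):
        """Return the regions that are still outdated."""
        return list(self._dirty)


tracker = DirtyRegions()
tracker.mark(10, 20)
tracker.mark(15, 40)        # overlaps the first edit
tracker.mark(1000, 1010)    # far from the "visible" part
first = tracker.take(0, 30)
second = tracker.take(0, 30)
print(first, second, tracker.pending())
```

Here the second `take` over the same interval returns an empty list, which is the cheap "nothing changed here" answer, while the edit at 1000 stays recorded until some later request covers it.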

Dealing with the "remainder" of the buffer might be trickier, but maybe 
some heuristic which would help detect the "no changes" case could be 
implemented.

