From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sun, 18 Feb 2024 05:37:46 +0200
To: Yuan Fu
Cc: Vincenzo Pupillo, "Ergus via Emacs development discussions.", Eli Zaretskii
gvrhhnpeegleefteekgffhvdfhtdegveevveetteegteevgeettdehhfdukeetheffueek keenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegumh hithhrhiesghhuthhovhdruggvvh X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 17 Feb 2024 22:37:48 -0500 (EST) Content-Language: en-US In-Reply-To: <47F1243E-0515-418D-96B9-4D3FE3CC4BBC@gmail.com> Received-SPF: pass client-ip=66.111.4.28; envelope-from=dmitry@gutov.dev; helo=out4-smtp.messagingengine.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:316293 Archived-At: On 13/02/2024 10:08, Yuan Fu wrote: >> On 12/02/2024 06:16, Yuan Fu wrote: >>> Thanks, the culprit is the call to treesit-update-ranges in >>> treesit--pre-redisplay, where we don’t pass it any specific range, so it >>> updates the range for the whole buffer. Eli, is there any way to get a >>> rough estimate the range that redisplay is refreshing? Do you think >>> something like this would work? >> >> If we don't update the ranges outside of some interval surrounding the window, what does that mean for correctness? 
>
> If the place of update and the embedded code currently in view belong to the same node in the host language, then when we update ranges for the current window-visible range, the whole node’s range is updated. So at least for this node, the range is correct.
>
> If the place of update and the embedded code currently in view belong to different nodes in the host language, then when we update ranges for the current window-visible range, only the visible node’s range is updated.

Okay. What about positions after the visible part of the buffer? Can their ranges be outdated?

It's probably okay when the ranges are only used for font-lock and syntax-ppss, but I wonder about possible other applications (reindenting the whole buffer, for example).

>>
>> Perhaps the mode has a syntax-propertize-function which behaves differently (as it should) depending on the language at point. Or different ranges have different syntax tables, something like that.
>>
>> If the ranges, after some edit (perhaps a programmatic one, performed far from the visible area), are not kept up to date somewhere around the beginning of the buffer, do we not risk confusing the syntax-ppss parser, for example?
>
> That can happen, yes.
>
>>
>> Come to think of it, take treesit-indent: it only updates the ranges for the current line. But the line's indentation usually depends on the previous buffer positions, doesn't it?
>
> The range passed to treesit-update-ranges acts as an intersecting range: we capture the nodes that intersect with the range and use them to update ranges. If the line to be indented is in an embedded language block, the whole block will be captured and its range will be given to the embedded language parser.
>
>
> We haven’t had any problems so far, mainly because most embedded code blocks are local, and it’s rare for an edit far from the visible portion to affect ranges while the user expects that edit to affect the currently visible range.
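
To make the "intersecting range" behaviour concrete, here is a small model of it, written in Python rather than Emacs Lisp so it can be run anywhere. It is only a sketch of the idea as described above: `capture_intersecting` and the block positions are made up for illustration and are not part of treesit.el.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) buffer positions


def capture_intersecting(blocks: List[Range], beg: int, end: int) -> List[Range]:
    """Return, in full, every embedded block that overlaps [beg, end)."""
    return [(s, e) for (s, e) in blocks if s < end and beg < e]


# Hypothetical embedded-code blocks found in the host-language tree.
blocks = [(10, 50), (120, 300), (900, 950)]

# Updating ranges for a single line at 200..210 captures the whole
# (120, 300) block, but the blocks at (10, 50) and (900, 950) are
# never revisited by this call.
captured = capture_intersecting(blocks, 200, 210)
print(captured)  # [(120, 300)]
```

So the block under the requested interval comes out whole, while anything that does not overlap the interval keeps whatever ranges it had before, which is where the staleness question comes from.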
>
> I don’t have any great idea for a better way to update ranges right now. Let me think about that. In the meantime, I’ll push a temporary fix so V’s original problem can be solved.

I was thinking (since I've been considering the same problem in mmm-mode, actually) that it would make sense to either plug into syntax-propertize-function, or have a parallel data structure that similarly tracks the outdated buffer regions, so that we would only update the part of the buffer which had been modified since last time.

Dealing with the "remainder" of the buffer might be trickier, but maybe some heuristic which would help detect the "no changes" case could be implemented.
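
A rough sketch of what such a parallel structure could look like, modelled in Python rather than Emacs Lisp. It assumes the simplest possible bookkeeping, a single "valid up to here" position, which is roughly how I understand syntax-propertize to track its own progress. `RangeFrontier` and its methods are hypothetical names, and the `update` callback is only a stand-in for whatever would actually recompute the ranges.

```python
from typing import Callable, List, Tuple


class RangeFrontier:
    """Positions below `done` have up-to-date ranges; the rest are stale."""

    def __init__(self, update: Callable[[int, int], None]) -> None:
        self.update = update  # hypothetical stand-in for a range updater
        self.done = 0         # nothing is known to be valid initially

    def after_change(self, beg: int) -> None:
        # An edit at `beg` may invalidate every range from there onwards.
        self.done = min(self.done, beg)

    def ensure(self, pos: int) -> None:
        # Lazily bring ranges up to date, but only as far as requested.
        if pos > self.done:
            self.update(self.done, pos)
            self.done = pos


calls: List[Tuple[int, int]] = []
tracker = RangeFrontier(lambda beg, end: calls.append((beg, end)))

tracker.ensure(500)        # first request: update 0..500
tracker.ensure(300)        # already valid: no work
tracker.after_change(200)  # an edit at 200 invalidates 200 onwards
tracker.ensure(500)        # only 200..500 is redone
print(calls)               # [(0, 500), (200, 500)]
```

This never touches the text past the requested position, so the "remainder" problem stays open: everything after an edit is simply treated as stale until somebody asks for it, and the "no changes" heuristic would be what lets `ensure` stop early.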