From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Mon, 6 May 2024 05:04:41 +0300
To: Yuan Fu
Cc: "Ergus via Emacs development discussions."
In-Reply-To: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com>

Hi Yuan,

Sorry if I'm being too pedantic here.

On 20/04/2024 05:18, Yuan Fu wrote:
> I believe I’ve found a good way to solve this problem. I pushed the changes to master.
>
> Basically I added a function treesit-parser-changed-ranges that can directly return the change ranges from last reparse. This means we don’t need to use notifiers to get those change ranges anymore. Then in treesit-pre-redisplay, we reparse the primary parser and get the changed ranges from it.
>
> Once we have the changed ranges, we update other non-primary parser’s ranges, but only within the changed ranges. Originally we were updating those parser’s ranges on the whole buffer, which led to the slowdown. Then we had to use some workaround to solve this. Now the workaround isn’t needed anymore.

The essence of the change (querying fewer ranges) looks good.

I'm a bit uneasy about the new function and how it's supposed to be used.

treesit-parser-changed-ranges returns the ranges changed during the last reparse. That seems to imply that all of its callers must have up-to-date information about the state of the buffer before that reparse, and thus basically follow the parser's updates through some mechanism. The implementation also saves some information during every reparse, whether or not anybody is going to call treesit-parser-changed-ranges.

To take our new code as an example, the only client of treesit-parser-changed-ranges now is treesit--pre-redisplay, which is called from syntax-propertize-extend-region-functions and pre-redisplay-functions. Is it possible that multiple changes and reparses would occur between firings of the above hooks? For example, some new feature might go over the buffer's text with an automated multi-step transformation, calling the parser (but not syntax-ppss) on each step.

In such a scenario it seems treesit--pre-redisplay might miss intermediate range updates. Would that be okay?
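
To make the scenario concrete, here is a toy model in Python. It is not Emacs code: ToyParser, run_transformation and the sample ranges are all made up for illustration, and the only assumption carried over from the discussion is that each reparse replaces the previously recorded changed ranges.

```python
class ToyParser:
    """Hypothetical stand-in for a parser that remembers only the
    ranges changed by its most recent reparse (an assumption, not
    the actual treesit implementation)."""

    def __init__(self):
        self.pending = []       # edits made since the last reparse
        self.last_changed = []  # ranges recorded by the latest reparse only

    def edit(self, start, end):
        self.pending.append((start, end))

    def reparse(self):
        if not self.pending:
            return  # nothing to do, keep the previous record
        # Each real reparse overwrites the previous record.
        self.last_changed = self.pending
        self.pending = []

    def changed_ranges(self):
        return list(self.last_changed)


def run_transformation(parser, steps):
    """Apply each edit and reparse right away, the way an automated
    multi-step transformation might, with no hook firing in between."""
    for start, end in steps:
        parser.edit(start, end)
        parser.reparse()


steps = [(10, 20), (300, 320), (5000, 5010)]

parser = ToyParser()
run_transformation(parser, steps)

# The "hook" only runs now, after all three steps.
parser.reparse()
seen_by_hook = parser.changed_ranges()
missed = [r for r in steps if r not in seen_by_hook]

print("seen by hook:", seen_by_hook)
print("missed:", missed)
```

In this model the late consumer sees only the range from the final step, and the two earlier ranges are never reported to it. Whether the real code behaves the same way depends on how the changed ranges are stored across reparses, which is what I'm asking about.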