From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Tue, 28 May 2024 01:24:16 +0300
Message-ID: <46b255d5-d8ec-49ce-b649-02ce8488e873@gutov.dev>
References: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com> <0132CFC2-CFA0-4D58-9632-6E6E03FE57DB@gmail.com> <8E3466C4-0875-4187-ADC3-5C72FF23A24F@gmail.com> <81dab46b-dba3-45d0-b509-1d40f4b116bf@gutov.dev> <6D101DD5-6201-4CA6-A105-28A6DA32C3DF@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla Thunderbird
Cc: "Ergus via Emacs development discussions."
To: Yuan Fu
uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpeffmhhi thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth htvghrnhepgeelfeetkefghfdvhfdtgeevveevteetgeetveegtedthefhudekteehffeu keeknecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug hmihhtrhihsehguhhtohhvrdguvghv X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 27 May 2024 18:24:18 -0400 (EDT) Content-Language: en-US In-Reply-To: <6D101DD5-6201-4CA6-A105-28A6DA32C3DF@gmail.com> Received-SPF: pass client-ip=103.168.172.151; envelope-from=dmitry@gutov.dev; helo=fout8-smtp.messagingengine.com X-Spam_score_int: -26 X-Spam_score: -2.7 X-Spam_bar: -- X-Spam_report: (-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_SBL_A=0.1 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:319633 Archived-At: On 28/05/2024 01:03, Yuan Fu wrote: >> But if one operation just changes text in that range (keeping its length intact, e.g. capitalizing the whole region), and another does the same (back to lower case), then the combined range would remain 200..300. >> >> Computing that might be difficult without having access to the kinds of changes are being done (does tree-sitter report those?). OTOH, most of the time the most important part is the position of the beginning of the changes (e.g. 
for syntax-ppss), and we could treat the rest of the buffer as invalidated… > > Oh you’re absolutely right, the range will be shifted by later edits in the buffer. It’ll be hella hairy to keep track of all that—say the previous changed range is (100 . 200), and user inserted 50 chars in position 150, we need to account for that and update the range to (100 . 250) before merging the new updated ranges with this one. > > So it seems the best way is really to move treesit--pre-redisplay entirely into the primary parser’s notifier, WDYT? Yep, that sounds easier. And the performance should be about the same, even if it'd have a bit extra overhead in those theoretical complex cases.
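
For illustration only, here is a minimal sketch of the bookkeeping described in the quoted text: adjusting a previously recorded changed range for a later insertion, then merging it with a newer range. It is plain Python rather than Emacs code, and the names shift_range and merge_ranges are hypothetical; nothing like them is claimed to exist in treesit.el. Only insertions are handled, and whether an insertion exactly at the range start shifts or grows the range is an arbitrary choice made here.

```python
def shift_range(rng, pos, inserted):
    """Adjust a (start, end) range for `inserted` chars inserted at `pos`.

    Assumption: an insertion at or before `start` moves the whole range,
    an insertion strictly inside it grows the range, and an insertion at
    or after `end` leaves it alone.
    """
    start, end = rng
    if pos <= start:
        return (start + inserted, end + inserted)
    if pos < end:
        return (start, end + inserted)
    return (start, end)


def merge_ranges(a, b):
    """Return the smallest single range covering both `a` and `b`."""
    return (min(a[0], b[0]), max(a[1], b[1]))


# The example from the quoted text: range (100 . 200), 50 chars at 150.
old = shift_range((100, 200), 150, 50)
print(old)

# Two length-preserving edits over 200..300 (upcase, then downcase)
# leave the combined range unchanged.
print(merge_ranges((200, 300), (200, 300)))
```

Even this simplified version has to replay every later edit against every stored range, which is the overhead that doing the work in the primary parser's notifier would avoid.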