From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Thu, 23 May 2024 02:42:53 +0300
Message-ID: <81dab46b-dba3-45d0-b509-1d40f4b116bf@gutov.dev>
In-Reply-To: <8E3466C4-0875-4187-ADC3-5C72FF23A24F@gmail.com>
To: Yuan Fu
Cc: "Ergus via Emacs development discussions."

On 22/05/2024 08:51, Yuan Fu wrote:

>> This listener would be specific for a particular consumer. In our case, we'd have a listener which would populate - and then update - the variable used by treesit--pre-redisplay. That variable would store the "up to date" list of updated ranges. The listener, on every call, would "merge" its current value with the new list of ranges (*). treesit--pre-redisplay would use the data in that data structure instead of calling treesit-parser-changed-ranges, and set the value to nil to "reset" it for the next update.
>>
>> (*) So real "merging" would only need to be performed when the listener fires 2+ times between two adjacent treesit--pre-redisplay calls. Otherwise the current value is nil, so the new list is simply assigned to the variable. Anyway, the merging logic seems to be the trickiest part in this scheme (managing and interpreting offsets), but it should be very similar in both approaches.

> I agree. The usefulness of treesit-parser-changed-ranges isn’t really justified at this point (well, except that it makes the caller’s code much easier to follow).

That it does.

> Let me implement what you described and let’s see how it goes.

Thank you, looking forward to it!

> I think we don’t even need to merge the ranges (which will be prone to bugs if I were to write it 😉); we can just push the new ranges to a list and later process them one by one.

I think this might amount to the same thing (merging when generating, or merging when processing).

It seems there will also be a small issue of "kinds" of ranges?.. For example, suppose we have two consecutive operations which each insert new characters in range 200..300. The result should be a range that spans 200..400, right? But if one operation just changes text in that range (keeping its length intact, e.g. capitalizing the whole region), and another does the same (back to lower case), then the combined range would remain 200..300.

Computing that might be difficult without having access to the kinds of changes being made (does tree-sitter report those?).

OTOH, most of the time the most important part is the position of the beginning of the changes (e.g. for syntax-ppss), and we could treat the rest of the buffer as invalidated...
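
To make the two cases above concrete, here is a minimal sketch in Python (purely illustrative, not treesit code). It assumes each change arrives as a hypothetical (start, old_end, new_end) triple, similar in shape to the start/old-end/new-end fields of tree-sitter's TSInputEdit; whether the ranges handed to a notifier carry that much information is exactly the open question. The names shift, merge_edit and replay are made up for this example.

```python
from typing import Iterable, Optional, Tuple

Range = Tuple[int, int]      # half-open [start, end), current buffer positions
Edit = Tuple[int, int, int]  # (start, old_end, new_end): assumed change record


def shift(pos: int, start: int, old_end: int, new_end: int, is_end: bool) -> int:
    """Translate a position recorded before an edit into post-edit coordinates."""
    if pos <= start:
        return pos                        # before the edit: unaffected
    if pos >= old_end:
        return pos + (new_end - old_end)  # after the edit: moved by the length delta
    # Inside the replaced text: be conservative and snap outward.
    return new_end if is_end else start


def merge_edit(dirty: Optional[Range], start: int, old_end: int, new_end: int) -> Range:
    """Fold one edit into the single accumulated dirty range."""
    if dirty is None:
        return (start, new_end)
    lo = shift(dirty[0], start, old_end, new_end, is_end=False)
    hi = shift(dirty[1], start, old_end, new_end, is_end=True)
    return (min(lo, start), max(hi, new_end))


def replay(edits: Iterable[Edit]) -> Optional[Range]:
    """Accumulate edits in order, as a listener would between two redisplays."""
    dirty: Optional[Range] = None
    for start, old_end, new_end in edits:
        dirty = merge_edit(dirty, start, old_end, new_end)
    return dirty


# Two insertions of 100 characters at position 200.
print(replay([(200, 200, 300), (200, 200, 300)]))  # (200, 400)
# Two same-length rewrites of 200..300 (upcase, then downcase).
print(replay([(200, 300, 300), (200, 300, 300)]))  # (200, 300)
```

With the old end of each change available, the two cases come out different, as expected. Without it, an insertion and a same-length rewrite both look like "200..300 changed", and the conservative fallback is to keep only the earliest start and treat everything after it as invalidated.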