From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vincenzo Pupillo
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Tue, 04 Jun 2024 10:19:30 +0200
Message-ID: <3466362.LZWGnKmheA@3-191.divsi.unimi.it>
References: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com> <46b255d5-d8ec-49ce-b649-02ce8488e873@gutov.dev>
To: Dmitry Gutov, emacs-devel@gnu.org
Cc: "Ergus via Emacs development discussions.", Stefan Monnier, Yuan Fu

Thank you, Yuan!
Yesterday I tried your patch with my php-ts-mode (with and without treesit-primary-parser) and it works fine.
Yesterday I also pushed a commit that uses the new variable.
When I work at home, php-ts-mode is the major mode I use, and it is almost done (I just need to add support for flymake).
Next week I hope to submit it for inclusion in Emacs.
It also seems to me that it might be useful for trying out various combinations of parsers.

Thanks
V.

On Tuesday, 4 June 2024 06:53:47 CEST, Yuan Fu wrote:
>
> > On May 27, 2024, at 3:24 PM, Dmitry Gutov wrote:
> >
> > On 28/05/2024 01:03, Yuan Fu wrote:
> >
> >>> But if one operation just changes text in that range (keeping its length intact, e.g. capitalizing the whole region), and another does the same (back to lower case), then the combined range would remain 200..300.
> >>>
> >>> Computing that might be difficult without having access to the kinds of changes that are being done (does tree-sitter report those?). OTOH, most of the time the most important part is the position of the beginning of the changes (e.g. for syntax-ppss), and we could treat the rest of the buffer as invalidated…
> >> Oh you’re absolutely right, the range will be shifted by later edits in the buffer. It’ll be hella hairy to keep track of all that. Say the previous changed range is (100 . 200), and the user inserted 50 chars at position 150: we need to account for that and update the range to (100 . 250) before merging the new updated ranges with this one.
> >> So it seems the best way is really to move treesit--pre-redisplay entirely into the primary parser’s notifier, WDYT?
> >
> > Yep, that sounds easier. And the performance should be about the same, even if it'd have a bit of extra overhead in those theoretical complex cases.
> >
>
> Ok, I pushed a commit to master that does just that. I tried with C’s block comment, and php-ts-mode. Everything seems to work fine.
>
> I also added treesit-primary-parser. This is supposed to be another configuration variable that a major mode should set. I’ve encountered various cases where knowing the primary parser (the parser that parses the entire buffer rather than just a subset of it) would be very helpful. treesit-primary-parser can be auto-guessed if the major mode doesn’t set it, so it shouldn’t break anything. I’d love to know your and Stefan’s thoughts on it.
>
> Yuan
>
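
The range bookkeeping discussed in the quoted text (a previously recorded changed range being shifted by a later insertion before it can be merged with new ranges) can be sketched roughly as follows. This is only an illustration in Python of the arithmetic in the (100 . 200) example, not the code in Emacs; the function names `shift_range` and `merge_ranges` are hypothetical, and only insertions are handled.

```python
def shift_range(changed, pos, inserted):
    """Adjust an earlier changed range for a later insertion.

    changed  -- (start, end) positions recorded before the insertion
    pos      -- position where the new text was inserted
    inserted -- number of characters inserted
    (Hypothetical helper; deletions are not handled in this sketch.)
    """
    start, end = changed
    if pos <= start:
        # Insertion at or before the start: the whole range moves right.
        return (start + inserted, end + inserted)
    if pos < end:
        # Insertion inside the range: only the end moves right.
        return (start, end + inserted)
    # Insertion at or after the end: the range is unaffected.
    return (start, end)


def merge_ranges(a, b):
    """Return the smallest single range covering both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


# The example from the thread: range (100 . 200), then 50 chars inserted at 150.
previous = shift_range((100, 200), 150, 50)
print(previous)                             # (100, 250)
# Merge with the range covered by the newly inserted text.
print(merge_ranges(previous, (150, 200)))   # (100, 250)
```

Every recorded range would need this adjustment for every later edit, which is the bookkeeping the thread avoids by doing the work in the primary parser's notifier instead.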