From mboxrd@z Thu Jan  1 00:00:00 1970
From: Vincenzo Pupillo
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sat, 20 Apr 2024 21:14:14 +0200
Message-ID: <6049335.lOV4Wx5bFT@fedora>
References: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com>
In-Reply-To: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com>
Cc: "Ergus via Emacs development discussions.", Yuan Fu
To: 付禹安, Dmitry Gutov, emacs-devel@gnu.org

Great job!
I tried your new patch with my usual benchmark, tcpdf.php, and my
php-ts-mode, and it works very well!
Thank you very much

V.

On Saturday, 20 April 2024 04:18:53 CEST, Yuan Fu wrote:
> > > On Feb 18, 2024, at 9:53 PM, Yuan Fu wrote:
> > >> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov wrote:
> > >>
> > >> On 13/02/2024 10:08, Yuan Fu wrote:
> > >>>> On 12/02/2024 06:16, Yuan Fu wrote:
> > >>>>> Thanks, the culprit is the call to treesit-update-ranges in
> > >>>>> treesit--pre-redisplay, where we don’t pass it any specific
> > >>>>> range, so it updates the range for the whole buffer. Eli, is
> > >>>>> there any way to get a rough estimate of the range that
> > >>>>> redisplay is refreshing? Do you think something like this
> > >>>>> would work?
> > >>>>
> > >>>> If we don't update the ranges outside of some interval
> > >>>> surrounding the window, what does that mean for correctness?
> > >>>
> > >>> If the place of update and the embedded code currently in view
> > >>> belong to the same node in the host language, then when we
> > >>> update ranges for the current window-visible range, the whole
> > >>> node’s range is updated. So at least for this node, the range
> > >>> is correct.
> > >>> If the place of update and the embedded code currently in view
> > >>> belong to different nodes in the host language, then when we
> > >>> update ranges for the current window-visible range, only the
> > >>> visible node’s range is updated.
> > >>
> > >> Okay. What about positions after the visible part of the
> > >> buffer? Can their ranges be outdated? It's probably okay when
> > >> the ranges are only used for font-lock and syntax-ppss, but I
> > >> wonder about possible other applications (reindenting the whole
> > >> buffer, for example).
> > >
> > > It’s the same as positions before the visible part. For
> > > reindenting the whole buffer, treesit-indent-region will update
> > > the range for the whole buffer at the very beginning.
> > >
> > >>>> Perhaps the mode has a syntax-propertize-function which
> > >>>> behaves differently (as it should) depending on the language
> > >>>> at point. Or different ranges have different syntax tables,
> > >>>> something like that.
> > >>>>
> > >>>> If the ranges, after some edit (perhaps a programmatic one,
> > >>>> performed far from the visible area), are not kept up to date
> > >>>> somewhere around the beginning of the buffer, do we not risk
> > >>>> confusing the syntax-ppss parser, for example?
> > >>>
> > >>> That can happen, yes.
> > >>>
> > >>>> Come to think of it, take treesit-indent: it only updates the
> > >>>> ranges for the current line. But the line's indentation
> > >>>> usually depends on the previous buffer positions, doesn't it?
> > >>>
> > >>> The range passed to treesit-update-ranges acts as an
> > >>> intersecting range: we capture nodes that intersect with the
> > >>> range and use them to update ranges.
> > >>> If the line to be indented is in an embedded language block,
> > >>> the whole block will be captured and its range will be given
> > >>> to the embedded language parser.
> > >>> We haven’t had any problems so far, mainly because most
> > >>> embedded code blocks are local, and it’s rare for an edit that
> > >>> affects ranges to take place far from the visible portion
> > >>> while the user expects that edit to affect the currently
> > >>> visible range.
> > >>> I don’t have any great idea for a better way to update ranges
> > >>> right now. Let me think about that. In the meantime, I’ll push
> > >>> a temporary fix so V’s original problem can be solved.
> > >>
> > >> I was thinking (since considering the same problem in mmm-mode,
> > >> actually) that it would make sense to either plug into
> > >> syntax-propertize-function, or have a parallel data structure
> > >> similarly tracking the outdated buffer regions, which would
> > >> only update the part of the buffer which had been modified
> > >> since last time.
> > >>
> > >> Dealing with the "remainder" of the buffer might be trickier,
> > >> but maybe some heuristic which would help detect the "no
> > >> changes" case could be implemented.
> > >
> > > Yeah, something similar to syntax-ppss or jit-lock. Or maybe it
> > > can be avoided, since the current on-demand range update had
> > > been working fine until we added treesit--pre-redisplay for
> > > syntax-ppss.
> >
> > This is actually a bit involved, because there could be multiple
> > layers of parsers: the host language sets ranges for a local
> > parser, and the local parser can set ranges for a nested-nested
> > parser. E.g., we might have a markdown parser for parsing
> > doc-comments, and inside the markdown there could be code blocks
> > which require another level of nested parser.
> >
> > This use-case is a bit advanced but we definitely need to support
> > it in our design. And my brain is twisted by all the dependencies
> > and ranges. If you guys have some ideas, they’ll be most welcome :-)
>
> I believe I’ve found a good way to solve this problem. I pushed the
> changes to master.
>
> Basically I added a function treesit-parser-changed-ranges that can
> directly return the changed ranges from the last reparse. This means
> we don’t need to use notifiers to get those changed ranges anymore.
> Then in treesit--pre-redisplay, we reparse the primary parser and get
> the changed ranges from it.
>
> Once we have the changed ranges, we update the other non-primary
> parsers’ ranges, but only within the changed ranges. Originally we
> were updating those parsers’ ranges on the whole buffer, which led to
> the slowdown. Then we had to use some workaround to solve this. Now
> the workaround isn’t needed anymore.
>
> I also removed some notifier functions and moved their work into
> treesit--pre-redisplay.
>
> Yuan
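
The idea in the last quoted message, updating the non-primary parsers'
ranges only inside the regions the primary parser reports as changed,
can be illustrated with a small language-neutral sketch. This is not
the code from the patch and does not use the Emacs API: update_ranges,
intersects and the query_host callback are hypothetical names, ranges
are modelled as half-open (start, end) pairs, and positions are assumed
to be already adjusted for the edit.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) buffer positions


def intersects(a: Range, b: Range) -> bool:
    """True when two half-open ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def update_ranges(
    old: List[Range],
    changed: List[Range],
    query_host: Callable[[int, int], List[Range]],
) -> List[Range]:
    """Refresh embedded-parser ranges only where the host parse changed.

    old        -- ranges currently assigned to the embedded parser
    changed    -- regions the host parser reported as changed
    query_host -- hypothetical callback returning the ranges of the
                  embedded blocks whose host nodes intersect [beg, end)
    """
    # Ranges that touch no changed region are kept exactly as they are.
    kept = [r for r in old if not any(intersects(r, c) for c in changed)]
    # Only the changed regions are re-queried, never the whole buffer.
    fresh = set()
    for beg, end in changed:
        fresh.update(query_host(beg, end))
    return sorted(set(kept) | fresh)


# Toy data: three embedded blocks; an edit grew the middle one.
blocks = [(10, 20), (100, 140), (500, 520)]


def query_host(beg: int, end: int) -> List[Range]:
    """Stand-in for querying the host parse tree (assumption)."""
    return [b for b in blocks if intersects(b, (beg, end))]


old = [(10, 20), (100, 130), (500, 520)]
print(update_ranges(old, [(120, 145)], query_host))
```

In this toy model the cost of an update depends on the number and size
of the changed regions rather than on the buffer length, which matches
the motivation given in the thread for dropping whole-buffer updates.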