From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Fri, 19 Apr 2024 19:18:53 -0700
Message-ID: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com>
To: 付禹安, Dmitry Gutov
Cc: "Ergus via Emacs development discussions."

>
> > On Feb 18, 2024, at 9:53 PM, Yuan Fu wrote:
> >
> >> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov wrote:
> >>
> >> On 13/02/2024 10:08, Yuan Fu wrote:
> >>
> >>>> On 12/02/2024 06:16, Yuan Fu wrote:
> >>>>> Thanks, the culprit is the call to treesit-update-ranges in
> >>>>> treesit--pre-redisplay, where we don’t pass it any specific range, so it
> >>>>> updates the range for the whole buffer. Eli, is there any way to get a
> >>>>> rough estimate of the range that redisplay is refreshing? Do you think
> >>>>> something like this would work?
> >>>>
> >>>> If we don't update the ranges outside of some interval surrounding the
> >>>> window, what does that mean for correctness?
> >>> If the place of update and the embedded code currently in view belong to
> >>> the same node in the host language, then when we update ranges for the
> >>> current window-visible range, the whole node’s range is updated. So at
> >>> least for this node, the range is correct.
> >>> If the place of update and the embedded code currently in view belong to
> >>> different nodes in the host language, then when we update ranges for the
> >>> current window-visible range, only the visible node’s range is updated.
> >>
> >> Okay. What about positions after the visible part of the buffer? Can their
> >> ranges be outdated? It's probably okay when the ranges are only used for
> >> font-lock and syntax-ppss, but I wonder about possible other applications
> >> (reindenting the whole buffer, for example).
> >
> > It’s the same as positions before the visible part. For reindenting the whole
> > buffer, treesit-indent-region will update the range for the whole buffer at
> > the very beginning.
> >
> >>>>
> >>>> Perhaps the mode has a syntax-propertize-function which behaves
> >>>> differently (as it should) depending on the language at point. Or
> >>>> different ranges have different syntax tables, something like that.
> >>>>
> >>>> If the ranges, after some edit (perhaps a programmatic one, performed far
> >>>> from the visible area), are kept not updated somewhere around the beginning
> >>>> of the buffer, do we not risk confusing the syntax-ppss parser, for
> >>>> example?
> >>> That can happen, yes.
> >>>>
> >>>> Come to think of it, take treesit-indent: it only updates the ranges for
> >>>> the current line. But the line's indentation usually depends on the
> >>>> previous buffer positions, doesn't it?
> >>> The range passed to treesit-update-ranges acts as an intercepting range -- we
> >>> capture nodes that intercept the range and use them to update ranges.
> >>> If the line to be indented is in an embedded language block, the whole
> >>> block will be captured and its range will be given to the embedded
> >>> language parser.
> >>> We haven’t had any problems so far mainly because most embedded code blocks
> >>> are local, and it’s rare for some edit to take place far from the visible
> >>> portion which affects ranges and the user expects that edit to affect the
> >>> current visible range.
> >>> I don’t have any great idea for a better way to update ranges right now.
> >>> Let me think about that. In the meantime, I’ll push a temporary fix so V’s
> >>> original problem can be solved.
> >>
> >> I was thinking (since considering the same problem in mmm-mode, actually)
> >> that it would make sense to either plug into syntax-propertize-function, or
> >> have a parallel data structure similarly tracking the outdated buffer
> >> regions, which would only update the part of the buffer which had been
> >> modified since last time.
> >>
> >> Dealing with the "remainder" of the buffer might be trickier, but maybe some
> >> heuristic which would help detect the "no changes" case could be implemented.
> >
> > Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be
> > avoided, since the current on-demand range update has been working fine,
> > until we added treesit--pre-redisplay for syntax-ppss.
>
> This is actually a bit involved, because there could be multiple layers of
> parsers: the host language sets ranges for a local parser, and the local parser
> can set ranges for a nested-nested parser. E.g., we might have a markdown parser
> for parsing doc-comments, and inside the markdown there could be code blocks
> which require another level of nested parser.
>
> This use-case is a bit advanced but we definitely need to support it in our
> design. And my brain is twisted by all the dependencies and ranges.
> If you guys have some ideas they’ll be most welcome :-)
>

I believe I’ve found a good way to solve this problem. I pushed the
changes to master.

Basically I added a function, treesit-parser-changed-ranges, that
directly returns the changed ranges from the last reparse. This means
we don’t need to use notifiers to get those changed ranges anymore.
Then in treesit--pre-redisplay, we reparse the primary parser and get
the changed ranges from it.

Once we have the changed ranges, we update the other, non-primary
parsers’ ranges, but only within the changed ranges. Originally we were
updating those parsers’ ranges over the whole buffer, which led to the
slowdown, and we had to use a workaround to deal with that. Now the
workaround isn’t needed anymore.

I also removed some notifier functions and moved their work into
treesit--pre-redisplay.

Yuan
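
For illustration only, the idea of updating embedded-parser ranges just
within the changed ranges can be modelled in a few lines of Python. This
is a toy sketch, not the code pushed to master: the names
update_embedded_ranges, blocks_in and overlaps are made up for the
example, ranges are half-open (start, end) pairs, and the old ranges are
assumed to have already been adjusted for the edit.

```python
Range = tuple[int, int]  # half-open (start, end) buffer positions


def overlaps(a: Range, b: Range) -> bool:
    """Return True if the half-open ranges A and B share any position."""
    return a[0] < b[1] and b[0] < a[1]


def blocks_in(host_blocks: list[Range], region: Range) -> list[Range]:
    """Hypothetical stand-in for a range-restricted query on the host tree:
    return the embedded-code blocks that overlap REGION."""
    return [b for b in host_blocks if overlaps(b, region)]


def update_embedded_ranges(
    old_ranges: list[Range],
    host_blocks: list[Range],
    changed: list[Range],
) -> list[Range]:
    """Recompute embedded-parser ranges only where the host tree changed.

    old_ranges:  ranges currently assigned to the embedded parser
    host_blocks: embedded-code blocks in the freshly reparsed host tree
    changed:     ranges reported as changed by the last host reparse
    """
    # Old ranges that no changed range touches are kept untouched.
    kept = [r for r in old_ranges
            if not any(overlaps(r, c) for c in changed)]
    # Blocks are re-captured only inside the changed ranges.
    fresh = [b for c in changed for b in blocks_in(host_blocks, c)]
    return sorted(set(kept + fresh))


# An edit grew the middle block from (50, 60) to (50, 65); only the
# region around it is reported as changed.
print(update_embedded_ranges(
    old_ranges=[(10, 20), (50, 60), (90, 100)],
    host_blocks=[(10, 20), (50, 65), (90, 100)],
    changed=[(48, 66)],
))
# [(10, 20), (50, 65), (90, 100)]
```

With an empty list of changed ranges the old ranges come back unchanged,
which corresponds to the "no changes" case discussed earlier in the
thread: nothing outside the changed ranges is re-examined.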