From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Yuan Fu <casouri@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long
 file
Date: Sun, 18 Feb 2024 21:53:45 -0800
Message-ID: <B8D6CA18-4C24-4858-842E-D951CC2A4D37@gmail.com>
References: <5991618.MhkbZ0Pkbq@fedora>
 <93F7DE13-0EC7-4A17-89B1-E07C99C6347B@gmail.com>
 <acd2994d-ca5f-4747-9875-4d362697108f@gutov.dev>
 <47F1243E-0515-418D-96B9-4D3FE3CC4BBC@gmail.com>
 <b0e91c50-6f5f-460f-a63d-6e7f4f13abc5@gutov.dev>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\))
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Vincenzo Pupillo <v.pupillo@gmail.com>,
 "Ergus via Emacs development discussions." <emacs-devel@gnu.org>,
 Eli Zaretskii <eliz@gnu.org>
To: Dmitry Gutov <dmitry@gutov.dev>
In-Reply-To: <b0e91c50-6f5f-460f-a63d-6e7f4f13abc5@gutov.dev>
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/316338>



> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov <dmitry@gutov.dev> wrote:
>
> On 13/02/2024 10:08, Yuan Fu wrote:
>
>>> On 12/02/2024 06:16, Yuan Fu wrote:
>>>> Thanks, the culprit is the call to treesit-update-ranges in
>>>> treesit--pre-redisplay, where we don't pass it any specific range,
>>>> so it updates the range for the whole buffer. Eli, is there any way
>>>> to get a rough estimate of the range that redisplay is refreshing?
>>>> Do you think something like this would work?
>>>
>>> If we don't update the ranges outside of some interval surrounding
>>> the window, what does that mean for correctness?
>> If the place of update and the embedded code currently in view belong
>> to the same node in the host language, then when we update ranges for
>> the current window-visible range, the whole node's range is updated.
>> So at least for this node, the range is correct.
>> If the place of update and the embedded code currently in view belong
>> to different nodes in the host language, then when we update ranges
>> for the current window-visible range, only the visible node's range
>> is updated.
>
> Okay. What about positions after the visible part of the buffer? Can
> their ranges be outdated? It's probably okay when the ranges are only
> used for font-lock and syntax-ppss, but I wonder about possible other
> applications (reindenting the whole buffer, for example).

It's the same as positions before the visible part. For reindenting
the whole buffer, treesit-indent-region will update the range for the
whole buffer at the very beginning.
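
To make the same-node / different-node distinction concrete, here is a
rough model in Python. It is not Emacs code: the Node type and the
update_ranges function are made up for illustration, and positions are
plain integers that do not shift on edits the way buffer positions do.
A request range captures every host node it intersects, each captured
node contributes its whole range, and ranges outside the request are
left as they were, even if they are stale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical host-language node that holds embedded code."""
    start: int
    end: int  # exclusive


def update_ranges(host_nodes, known, beg, end):
    """Return embedded ranges after refreshing only [beg, end).

    `known` holds (start, end) ranges from earlier updates.  Ranges that
    intersect the request are replaced by the current host nodes that
    intersect it; everything else is kept as-is, even if it is stale.
    """
    def hits(s, e):
        return s < end and e > beg

    kept = [r for r in known if not hits(*r)]
    fresh = [(n.start, n.end) for n in host_nodes if hits(n.start, n.end)]
    return sorted(kept + fresh)


# Ranges recorded before an edit, and the host nodes after it.
known = [(10, 20), (100, 150)]
nodes = [Node(10, 25), Node(105, 155)]

visible = update_ranges(nodes, known, 12, 18)    # window shows 12..18
whole = update_ranges(nodes, known, 0, 1000)     # whole-buffer update
```

In this model the visible update refreshes the first block in full but
leaves the second one at its old range, while the whole-buffer update
(the treesit-indent-region case) refreshes both.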

>
>>>
>>> Perhaps the mode has a syntax-propertize-function which behaves
>>> differently (as it should) depending on the language at point. Or
>>> different ranges have different syntax tables, something like that.
>>>
>>> If the ranges, after some edit (perhaps a programmatic one,
>>> performed far from the visible area), are not kept up to date
>>> somewhere around the beginning of the buffer, do we not risk
>>> confusing the syntax-ppss parser, for example?
>> That can happen, yes.
>>>
>>> Come to think of it, take treesit-indent: it only updates the ranges
>>> for the current line. But the line's indentation usually depends on
>>> the previous buffer positions, doesn't it?
>> The range passed to treesit-update-ranges acts as an intersecting
>> range: we capture nodes that intersect with the range and use them
>> to update ranges. If the line to be indented is in an embedded
>> language block, the whole block will be captured and its range will
>> be given to the embedded language parser.
>> We haven't had any problems so far mainly because most embedded code
>> blocks are local, and it's rare for an edit far from the visible
>> portion to affect ranges while the user expects that edit to affect
>> the currently visible range.
>> I don't have any great idea for a better way to update ranges right
>> now. Let me think about that. In the meantime, I'll push a temporary
>> fix so V's original problem can be solved.
>
> I was thinking (since considering the same problem in mmm-mode,
> actually) that it would make sense to either plug into
> syntax-propertize-function, or have a parallel data structure
> similarly tracking the outdated buffer regions, which would only
> update the part of the buffer which had been modified since last time.
>
> Dealing with the "remainder" of the buffer might be trickier, but
> maybe some heuristic which would help detect the "no changes" case
> could be implemented.

Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be
avoided, since the current on-demand range update was working fine
until we added treesit--pre-redisplay for syntax-ppss.
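
For the sake of discussion, the "parallel data structure" idea could
look roughly like the Python sketch below. This is only a model under
my own assumptions, not a proposal for the actual implementation: the
DirtyTracker class and its mark/take methods are invented names, and
the sketch ignores the fact that real buffer positions shift when text
is inserted or deleted. Edits mark regions as dirty; a range update
for some interval only processes the dirty pieces inside it, and an
empty result is the cheap "no changes" case.

```python
class DirtyTracker:
    """Hypothetical tracker of buffer regions whose ranges may be stale."""

    def __init__(self):
        self.dirty = []  # sorted, non-overlapping (start, end) pairs

    def mark(self, beg, end):
        """Record an edit covering [beg, end), merging overlaps."""
        merged = []
        for s, e in sorted(self.dirty + [(beg, end)]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.dirty = merged

    def take(self, beg, end):
        """Return the dirty pieces inside [beg, end) and mark them clean."""
        todo, rest = [], []
        for s, e in self.dirty:
            lo, hi = max(s, beg), min(e, end)
            if lo < hi:
                todo.append((lo, hi))
                # Keep whatever part of this region lies outside the request.
                if s < lo:
                    rest.append((s, lo))
                if hi < e:
                    rest.append((hi, e))
            else:
                rest.append((s, e))
        self.dirty = rest
        return todo


tracker = DirtyTracker()
tracker.mark(5, 10)
tracker.mark(8, 20)       # overlaps the first edit, so the two merge
tracker.mark(500, 510)    # far from the window

first = tracker.take(0, 15)    # pieces to refresh for a window at 0..15
second = tracker.take(0, 15)   # nothing left there: the "no changes" case
```

The far-away edit at 500..510 stays recorded until something actually
asks for that part of the buffer, which is the on-demand behavior we
want to preserve.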

Yuan