From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sun, 18 Feb 2024 21:53:45 -0800
To: Dmitry Gutov
Cc: Vincenzo Pupillo, "Ergus via Emacs development discussions.", Eli Zaretskii

> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov wrote:
>
> On 13/02/2024 10:08, Yuan Fu wrote:
>
>>> On 12/02/2024 06:16, Yuan Fu wrote:
>>>> Thanks, the culprit is the call to treesit-update-ranges in
>>>> treesit--pre-redisplay, where we don't pass it any specific range,
>>>> so it updates the range for the whole buffer. Eli, is there any way
>>>> to get a rough estimate of the range that redisplay is refreshing?
>>>> Do you think something like this would work?
>>>
>>> If we don't update the ranges outside of some interval surrounding
>>> the window, what does that mean for correctness?
>> If the place of update and the embedded code currently in view belong
>> to the same node in the host language, then when we update ranges for
>> the current window-visible range, the whole node's range is updated.
>> So at least for this node, the range is correct.
>> If the place of update and the embedded code currently in view belong
>> to different nodes in the host language, then when we update ranges
>> for the current window-visible range, only the visible node's range
>> is updated.
>
> Okay. What about positions after the visible part of the buffer? Can
> their ranges be outdated? It's probably okay when the ranges are only
> used for font-lock and syntax-ppss, but I wonder about possible other
> applications (reindenting the whole buffer, for example).

It's the same as for positions before the visible part. For reindenting
the whole buffer, treesit-indent-region will update the ranges for the
whole buffer at the very beginning.

>
>>>
>>> Perhaps the mode has a syntax-propertize-function which behaves
>>> differently (as it should) depending on the language at point. Or
>>> different ranges have different syntax tables, something like that.
>>>
>>> If the ranges, after some edit (perhaps a programmatic one,
>>> performed far from the visible area), are not kept up to date
>>> somewhere around the beginning of the buffer, do we not risk
>>> confusing the syntax-ppss parser, for example?
>> That can happen, yes.
>>>
>>> Come to think of it, take treesit-indent: it only updates the ranges
>>> for the current line. But the line's indentation usually depends on
>>> the previous buffer positions, doesn't it?
>> The range passed to treesit-update-ranges acts as an intersecting
>> range: we capture nodes that intersect with the range and use them
>> to update ranges. If the line to be indented is in an embedded
>> language block, the whole block will be captured and its range will
>> be given to the embedded language parser.
>> We haven't had any problems so far, mainly because most embedded code
>> blocks are local, and it's rare for an edit far from the visible
>> portion to affect ranges in a way the user expects to see reflected
>> in the currently visible range.
>> I don't have any great idea for a better way to update ranges right
>> now. Let me think about that. In the meantime, I'll push a temporary
>> fix so V's original problem can be solved.
>
> I was thinking (since considering the same problem in mmm-mode,
> actually) that it would make sense to either plug into
> syntax-propertize-function, or have a parallel data structure
> similarly tracking the outdated buffer regions, which would only
> update the part of the buffer which had been modified since last time.
>
> Dealing with the "remainder" of the buffer might be trickier, but
> maybe some heuristic which would help detect the "no changes" case
> could be implemented.

Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be
avoided, since the current on-demand range update had been working
fine until we added treesit--pre-redisplay for syntax-ppss.

Yuan
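
For readers following the thread, the range-update behaviour described
above can be modelled with a small toy sketch. This is not Emacs's
implementation: it is written in Python for brevity, and the names
Block, intersects and update_ranges are hypothetical. It assumes only
what the thread states, namely that an update over [beg, end) captures
every embedded block that intersects that span, in full, and leaves
ranges elsewhere untouched (and therefore possibly stale).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """An embedded-language block, as a half-open [start, end) span."""
    start: int
    end: int


def intersects(block, beg, end):
    """Return True if block overlaps the half-open span [beg, end)."""
    return block.start < end and beg < block.end


def update_ranges(known, host_blocks, beg, end):
    """Toy model of an update restricted to [beg, end).

    known       -- ranges currently given to the embedded parser
                   (possibly stale)
    host_blocks -- blocks the host-language tree reports right now
    """
    # Old ranges that do not touch the span are kept as they are,
    # even if an edit elsewhere has made them stale.
    kept = [r for r in known if not intersects(r, beg, end)]
    # Every current block touching the span is captured whole, not
    # clipped to the span.
    captured = [b for b in host_blocks if intersects(b, beg, end)]
    return sorted(kept + captured, key=lambda b: b.start)


# Both blocks changed, but only one line (30..35) is updated.
stale = [Block(10, 40), Block(190, 250)]
current = [Block(10, 50), Block(200, 260)]
print(update_ranges(stale, current, 30, 35))
```

In this model the block around the updated line is refreshed in full,
while the block far from it keeps its old range. That is the
correctness gap Dmitry asks about, and it closes only when the update
span covers the whole buffer.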