From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Wed, 20 Mar 2024 23:39:19 -0700
Message-ID: <1589C0CB-D9CB-493A-A13C-544F7153081C@gmail.com>
References: <5991618.MhkbZ0Pkbq@fedora> <93F7DE13-0EC7-4A17-89B1-E07C99C6347B@gmail.com> <47F1243E-0515-418D-96B9-4D3FE3CC4BBC@gmail.com>
To: Dmitry Gutov
Cc: Vincenzo Pupillo, "Ergus via Emacs development discussions.", Eli Zaretskii
Xref: news.gmane.io gmane.emacs.devel:317221

> On Feb 18, 2024, at 9:53 PM, Yuan Fu wrote:
>
>> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov wrote:
>>
>> On 13/02/2024 10:08, Yuan Fu wrote:
>>
>>>> On 12/02/2024 06:16, Yuan Fu wrote:
>>>>> Thanks, the culprit is the call to treesit-update-ranges in
>>>>> treesit--pre-redisplay, where we don’t pass it any specific range, so it
>>>>> updates the range for the whole buffer. Eli, is there any way to get a
>>>>> rough estimate of the range that redisplay is refreshing? Do you think
>>>>> something like this would work?
>>>>
>>>> If we don't update the ranges outside of some interval surrounding the window, what does that mean for correctness?
>>> If the place of update and the embedded code currently in view belong to the same node in the host language, then when we update ranges for the current window-visible range, the whole node’s range is updated. So at least for this node, the range is correct.
>>> If the place of update and the embedded code currently in view belong to different nodes in the host language, then when we update ranges for the current window-visible range, only the visible node’s range is updated.
>>
>> Okay. What about positions after the visible part of the buffer? Can their ranges be outdated? It's probably okay when the ranges are only used for font-lock and syntax-ppss, but I wonder about possible other applications (reindenting the whole buffer, for example).
>
> It’s the same as positions before the visible part. For reindenting the whole buffer, treesit-indent-region will update the range for the whole buffer at the very beginning.
>
>>
>>>>
>>>> Perhaps the mode has a syntax-propertize-function which behaves differently (as it should) depending on the language at point. Or different ranges have different syntax tables, something like that.
>>>>
>>>> If the ranges, after some edit (perhaps a programmatic one, performed far from the visible area), are not kept up to date somewhere around the beginning of the buffer, do we not risk confusing the syntax-ppss parser, for example?
>>> That can happen, yes.
>>>>
>>>> Come to think of it, take treesit-indent: it only updates the ranges for the current line. But the line's indentation usually depends on the previous buffer positions, doesn't it?
>>> The range passed to treesit-update-ranges acts as an intersecting range: we capture nodes that intersect with the range and use them to update ranges.
>>> If the line to be indented is in an embedded language block, the whole block will be captured and its range will be given to the embedded language parser.
>>> We haven’t had any problems so far, mainly because most embedded code blocks are local, and it’s rare for an edit far from the visible portion to affect ranges while the user expects that edit to affect the currently visible range.
>>> I don’t have any great idea for a better way to update ranges right now. Let me think about that. In the meantime, I’ll push a temporary fix so V’s original problem can be solved.
>>
>> I was thinking (since considering the same problem in mmm-mode, actually) that it would make sense to either plug into syntax-propertize-function, or have a parallel data structure similarly tracking the outdated buffer regions, which would only update the part of the buffer which had been modified since last time.
>>
>> Dealing with the "remainder" of the buffer might be trickier, but maybe some heuristic which would help detect the "no changes" case could be implemented.
>
> Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be avoided, since the current on-demand range update has been working fine, until we added treesit--pre-redisplay for syntax-ppss.

This is actually a bit involved, because there could be multiple layers of parsers: the host language sets ranges for a local parser, and the local parser can set ranges for a further nested parser. E.g., we might have a markdown parser for parsing doc-comments, and inside the markdown there could be code blocks which require another level of nested parser.

This use case is a bit advanced, but we definitely need to support it in our design. And my brain is twisted by all the dependencies and ranges. If you guys have some ideas, they’ll be most welcome :-)

Yuan
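
For illustration only: the idea raised in the thread of a parallel data structure that tracks outdated buffer regions (similar in spirit to syntax-ppss or jit-lock) might be sketched roughly as below. This is a hypothetical Python model, not Emacs code. The names DirtyRegions, mark, and take are invented for this sketch, positions are assumed not to shift after an edit, and the nested-parser layering described above is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class DirtyRegions:
    """Hypothetical tracker of regions whose parser ranges may be stale."""

    # Sorted, non-overlapping half-open intervals [beg, end).
    regions: list = field(default_factory=list)

    def mark(self, beg, end):
        """Record an edit in [beg, end), merging with overlapping regions."""
        merged = []
        for b, e in sorted(self.regions + [(beg, end)]):
            if merged and b <= merged[-1][1]:
                # Overlaps or touches the previous region: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((b, e))
        self.regions = merged

    def take(self, beg, end):
        """Return the stale pieces inside [beg, end) and mark them clean.

        Stale pieces outside the requested range stay recorded, so a later
        request (e.g. after scrolling) can still pick them up.
        """
        stale, remaining = [], []
        for b, e in self.regions:
            lo, hi = max(b, beg), min(e, end)
            if lo < hi:
                stale.append((lo, hi))
                if b < lo:
                    remaining.append((b, lo))
                if hi < e:
                    remaining.append((hi, e))
            else:
                remaining.append((b, e))
        self.regions = remaining
        return stale
```

In this model, a change hook would call mark for every edit, and a consumer such as redisplay would call take with the visible range and update ranges only for the returned pieces. An empty result corresponds to the "no changes" case mentioned in the thread.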