From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Yuan Fu Newsgroups: gmane.emacs.devel Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file Date: Tue, 21 May 2024 22:51:02 -0700 Message-ID: <8E3466C4-0875-4187-ADC3-5C72FF23A24F@gmail.com> References: <2DB11528-C657-4AC1-A143-A13B1EAC897A@gmail.com> <0132CFC2-CFA0-4D58-9632-6E6E03FE57DB@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6.1.1\)) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="15150"; mail-complaints-to="usenet@ciao.gmane.io" Cc: "Ergus via Emacs development discussions." To: Dmitry Gutov Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed May 22 07:52:10 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1s9etT-0003fx-Dl for ged-emacs-devel@m.gmane-mx.org; Wed, 22 May 2024 07:52:07 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s9esl-0005Df-QX; Wed, 22 May 2024 01:51:23 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s9esg-0005DP-6q for emacs-devel@gnu.org; Wed, 22 May 2024 01:51:19 -0400 Original-Received: from mail-pj1-x1034.google.com ([2607:f8b0:4864:20::1034]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s9ese-0000J3-9Z for emacs-devel@gnu.org; Wed, 22 May 2024 01:51:17 -0400 Original-Received: by mail-pj1-x1034.google.com with SMTP id 98e67ed59e1d1-2bd9061eac8so831593a91.3 for ; Tue, 21 May 2024 22:51:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716357074; x=1716961874; darn=gnu.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=D4enYIuU6y7oD55qUKUpRUj5LxRrDhWw7tj8RDdaNvY=; b=XQw0AESf9X40rsby57epSR1xf+YWwmqsU3JTVKent9zLaF5z4E/PHM3grvB3fVIiy3 iwTjF3gYXIsGRWOBEye7G5wPiPgEG/UQ0buotPUp0NvknIlCRo789ZUUPNckBuTQXKh/ Oj6qy4Aj0jXrPV8lyJ+nYXrd+vaT3RJSNh2NaPQYF5Cr+g3xoAOfActDGOppkX27t4rr n/bjjViNqnv5IKonGjRr3GFAZrXGQh/J3IOHnNW3stsiIsKCd1nugUdimpXU7DQjlHyV V6v0YqZuiNLiQdTwPHHB6tUD1TYVnJGx8dEfNSaiK4IuvjLblatRfL2i37HncisZUCtN ukZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716357074; x=1716961874; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D4enYIuU6y7oD55qUKUpRUj5LxRrDhWw7tj8RDdaNvY=; b=lzYtUPqX9Mh92NY4VbfbLcrMe5nD4XqEDw2eFptmVB86FVX7VkBJnkgusE7MpnSmxe 9vDvfIOXCMARC+AOQCVENpl8Xc5YpxuNw3ebPo60cqIN0Ay0ffQnNXsBKblygi/3n2aP X5yYU7tKMSriyR9Suh0ejRLv39/M4NEdGG91wJckJUB5NET9HOH6zhpm3hqjpKw9USWd JlZ3+rReq3GGQzQbV3ED2Q30zxtfQE+UU73d9UxPvAkVEWc3ZyQEAHvdHdQUy4FaqRbZ XxukJqFoQTy313Yzf3JtfDkqzLgIJLgjhzZWLHPNLktWTAWkh5Ru2MOLqKW8UR1NzTCC uycA== X-Gm-Message-State: AOJu0YzKHsK/PVnbZsDYbgu/hs1kID1UxIezNGrdYyhwFL0rEumopzRP O+kjuPqh3xeNGYkrHReHRwyUdFuD0HbQo8EU4XTs5BqK6ep81GcKofrmpA== X-Google-Smtp-Source: AGHT+IFv07/Hh37e7dEYhcpyP/yWa4rrARNv9S7CSQmxKUMeQtcmm29bOmifhvxi5Y6krG9miXoOfQ== X-Received: by 2002:a17:90a:4402:b0:2bd:82c0:8724 with SMTP id 98e67ed59e1d1-2bd9f45fc08mr1678107a91.15.1716357073740; Tue, 21 May 2024 22:51:13 -0700 (PDT) Original-Received: from smtpclient.apple ([2601:641:300:4910:d47e:183a:1bb4:b282]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2b671782d89sm22746111a91.55.2024.05.21.22.51.12 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 21 May 2024 22:51:13 -0700 (PDT) In-Reply-To: X-Mailer: Apple Mail (2.3731.700.6.1.1) Received-SPF: pass client-ip=2607:f8b0:4864:20::1034; envelope-from=casouri@gmail.com; helo=mail-pj1-x1034.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:319471 Archived-At: > On May 12, 2024, at 4:44 PM, Dmitry Gutov wrote: >=20 > On 09/05/2024 03:16, Yuan Fu wrote: >=20 >>> Is it possible that there would occur multiple changes and reparses = between some firings of the above hooks? For example, some new feature = might go over the buffer's text with an automated multi-step = transformation, calling the parser (but not syntax-ppss) on each step. >>> In such a scenario it seems treesit--pre-redisplay might miss = intermediate range updates. Would that be okay? >> I think you=E2=80=99re right. The chance of it actually go wrong will = be slim, but anything that=E2=80=99s possible to go wrong will = eventually go wrong. >=20 > Thanks for confirming the concern. >=20 >> The remaining question is how. I=E2=80=99m thinking of keeping a = history of updated ranges, each marked with the parser timestamp. The = parser timestamp is already there, it=E2=80=99s incremented every time = the parser reparses. And treesit-parser-changed-ranges will return the = timestamp along with the updated ranges. Then in the next iteration, the = consumer can pass the last timestamp to treesit-parser-changed-ranges, = which tells it to return all the changed ranges since that timestamp. >> The only problem is to decide how long a history of updated ranges do = we keep for each parser. The 100% correct approach is to maintain a = separate history for each consumer, and never throw away old ranges = until the consumer consumes them. But then you risk wasting memory if = some consumer never consumes the ranges. To handle that we can add a = hard limit. But then this hard limit might be too low for some edge = case=E2=80=A6 We can make this hard limit configurable, and if we ever = encountered a case where this hard limit is not enough and there=E2=80=99s= no way around it (unlikely), we can instruct users or lisp program to = increase it. >=20 > That could work. Although it's hard for me to imagine how far back the = history would have to be stored, and would that have any practical = consequences for Emacs's memory use. Maybe not. >=20 > The approach I was thinking of is in different direction: we take a = step back, remove (or stop using at least) the new function, and go back = to the idea of subscribing to parsers' after-change notifiers. The = improvement in commit f62c1b4cd00 seems to stem from relying foremost on = changes ranges in the primary parser. Okay - we re-add the listener for = the primary parser only. >=20 > This listener would be specific for a particular consumer. In our = case, we'd have a listener which would populate - and then update - the = variable used by treesit--pre-redisplay. That variable would store the = "up to date" list of updated ranges. The listener, on every call, would = "merge" its current value one with the new list of ranges (*). = treesit--pre-redisplay would use the data in that data structure instead = of calling treesit-parser-changed-ranges, and set the value to nil to = "reset" it for the next update. >=20 > (*) So real "merging" would only need to be performed when listener = fires 2+ times between the two adjacent treesit--pre-redisplay calls. = Otherwise the current value is nil, so the the new list is simply = assigned to the variable. Anyway, the merging logic seems to be the = trickiest part in this scheme (managing and interpreting offsets), but = it should be very similar in both approaches. I agree. The usefulness of treesit-parser-changed-ranges aren=E2=80=99t = really justified at this point (well, except that it makes the = caller=E2=80=99s code much easier to follow). Let me implement what you = described and let=E2=80=99s see how it goes. I think we don=E2=80=99t = even need to merge the ranges (which will be prone to bugs if I were to = write it ;-), we can just push the new ranges to a list and later = process them one by one. Yuan=