From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Wed, 8 May 2024 17:16:39 -0700
Message-ID: <0132CFC2-CFA0-4D58-9632-6E6E03FE57DB@gmail.com>
To: Dmitry Gutov
Cc: "Ergus via Emacs development discussions."

> On May 5, 2024, at 7:04 PM, Dmitry Gutov wrote:
>
> Hi Yuan,
>
> Sorry if I'm being too pedantic here.
>
> On 20/04/2024 05:18, Yuan Fu wrote:
>
>> I believe I’ve found a good way to solve this problem. I pushed the changes to master.
>> Basically I added a function treesit-parser-changed-ranges that can directly return the changed ranges from the last reparse. This means we don’t need to use notifiers to get those changed ranges anymore. Then in treesit--pre-redisplay, we reparse the primary parser and get the changed ranges from it.
>> Once we have the changed ranges, we update the other non-primary parsers’ ranges, but only within the changed ranges. Originally we were updating those parsers’ ranges over the whole buffer, which led to the slowdown. Then we had to use some workaround to solve this. Now the workaround isn’t needed anymore.
>
> The essence of the change (querying fewer ranges) looks good.
>
> I'm a bit uneasy about the new function and how it's supposed to be used.
> treesit-parser-changed-ranges returns the ranges changed during the last reparse. That seems to imply that all of its callers must have the up-to-date information about the state of the buffer before that reparse, and thus basically follow the parser's updates through some mechanism. The implementation also saves some information during every reparse, whether somebody is going to call treesit-parser-changed-ranges or not.
>
> To take our new code as an example, the only client of treesit-parser-changed-ranges now is treesit--pre-redisplay, which is called from syntax-propertize-extend-region-functions and pre-redisplay-functions.
>
> Is it possible that there would occur multiple changes and reparses between some firings of the above hooks? For example, some new feature might go over the buffer's text with an automated multi-step transformation, calling the parser (but not syntax-ppss) on each step.
> In such a scenario it seems treesit--pre-redisplay might miss intermediate range updates. Would that be okay?

I think you’re right. The chance of it actually going wrong is slim, but anything that can go wrong eventually will.

The remaining question is how to fix it. I’m thinking of keeping a history of updated ranges, each marked with the parser timestamp. The parser timestamp already exists; it’s incremented every time the parser reparses. treesit-parser-changed-ranges would then return the timestamp along with the updated ranges. On the next iteration, the consumer passes the last timestamp it saw to treesit-parser-changed-ranges, which tells it to return all the ranges changed since that timestamp.

The only remaining problem is deciding how long a history of updated ranges to keep for each parser. The 100% correct approach is to maintain a separate history for each consumer and never throw away old ranges until that consumer has consumed them.
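To make the timestamp idea concrete, here is a minimal Python sketch, assuming changed ranges are (start, end) position pairs and that something records the changed ranges after every reparse. `ChangedRangeLog`, `record_reparse` and `changed_ranges` are hypothetical names for illustration only, not the actual treesit API. The sketch keeps one shared log and never discards entries, which is the simplest way to guarantee that a consumer gets every range changed after the timestamp it passes in, even when several reparses happen between two calls.

```python
from dataclasses import dataclass, field

# A changed range as (start, end) buffer positions (assumed representation).
Range = tuple[int, int]


@dataclass
class ChangedRangeLog:
    """Hypothetical per-parser history of changed ranges (illustration only)."""

    timestamp: int = 0  # incremented on every reparse
    # Each entry: (timestamp of the reparse, ranges changed by that reparse).
    history: list[tuple[int, list[Range]]] = field(default_factory=list)

    def record_reparse(self, ranges: list[Range]) -> None:
        """Called after each reparse with the ranges that reparse changed."""
        self.timestamp += 1
        self.history.append((self.timestamp, list(ranges)))

    def changed_ranges(self, since: int = 0) -> tuple[int, list[Range]]:
        """Return the current timestamp and every range changed after `since`."""
        ranges = [r for ts, rs in self.history if ts > since for r in rs]
        return self.timestamp, ranges


# A consumer (e.g. a pre-redisplay hook) keeps the last timestamp it saw.
log = ChangedRangeLog()
seen, _ = log.changed_ranges()
log.record_reparse([(10, 20)])  # two reparses happen before the hook runs
log.record_reparse([(50, 60)])
seen, ranges = log.changed_ranges(seen)
print(seen, ranges)  # 2 [(10, 20), (50, 60)]
```

Because nothing is ever discarded in this sketch, the log keeps growing for as long as the parser lives, whether or not anyone calls `changed_ranges`.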
But then you risk wasting memory if some consumer never consumes the ranges. To handle that, we can add a hard limit, but then the limit might be too low for some edge case… We can make the limit configurable, and if we ever encounter a case where it isn’t enough and there’s no way around it (unlikely), we can instruct users or Lisp programs to increase it.

Yuan
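A sketch of the capped variant, under the same assumptions (hypothetical names, ranges as (start, end) pairs). `max_entries` stands in for the configurable hard limit; its default of 100 is arbitrary. The message doesn't say what a consumer should do when its timestamp is older than the oldest retained entry; here `changed_ranges` returns `None` in that case so the caller can detect the gap and fall back to, for example, updating ranges over the whole buffer (an assumed fallback).

```python
from collections import deque
from typing import Optional

Range = tuple[int, int]  # (start, end) buffer positions (assumed representation)


class BoundedChangedRangeLog:
    """Hypothetical changed-range history capped at `max_entries` reparses."""

    def __init__(self, max_entries: int = 100) -> None:
        # The cap is a plain constructor argument here, standing in for a
        # user-configurable variable; the default of 100 is arbitrary.
        self.timestamp = 0
        # deque(maxlen=...) silently drops the oldest entry on overflow.
        self.history: deque = deque(maxlen=max_entries)

    def record_reparse(self, ranges: list[Range]) -> None:
        """Called after each reparse with the ranges that reparse changed."""
        self.timestamp += 1
        self.history.append((self.timestamp, list(ranges)))

    def changed_ranges(self, since: int) -> tuple[int, Optional[list[Range]]]:
        """Return (timestamp, ranges changed after `since`).

        The ranges are None when an entry the caller has not seen was
        already evicted, i.e. the history was too short for this consumer.
        """
        oldest_kept = self.history[0][0] if self.history else self.timestamp + 1
        if since + 1 < oldest_kept:
            return self.timestamp, None  # gap: caller must do a full update
        ranges = [r for ts, rs in self.history if ts > since for r in rs]
        return self.timestamp, ranges


log = BoundedChangedRangeLog(max_entries=2)
for r in [(0, 5), (10, 15), (20, 25)]:
    log.record_reparse([r])
print(log.changed_ranges(1))  # (3, [(10, 15), (20, 25)])
print(log.changed_ranges(0))  # (3, None): timestamp 1 was evicted
```

With this design the history holds at most `max_entries` reparses' worth of entries no matter how many consumers stall, and a consumer that falls too far behind finds out explicitly instead of silently missing ranges.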