From: Vincenzo Pupillo
Newsgroups: gmane.emacs.devel
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Tue, 13 Feb 2024 10:39:10 +0100
Message-ID: <3442019.5fSG56mABF@3-191.divsi.unimi.it>
References: <5991618.MhkbZ0Pkbq@fedora> <864jedsrjt.fsf@gnu.org> <2F0B4B85-5EAB-4285-BB6B-6CAF24EB96C3@gmail.com>
In-Reply-To: <2F0B4B85-5EAB-4285-BB6B-6CAF24EB96C3@gmail.com>
To: Eli Zaretskii, Yuan Fu
Cc: emacs-devel@gnu.org

I don't know if this is a stupid idea or not, but I'll try to explain it.

My other php-ts-mode (the one without a tree-sitter parser for php) does the following: there is a "treesit-font-lock-rules" rule that captures a comment node, and this rule calls a function that tries to figure out whether it is a comment block in PHP. If it is a comment block, it uses some regular expressions for the font-locking; otherwise it uses treesit-fontify-with-override for the entire comment.

Treesit "knows" the intervals in the file where the embedded parser is to be injected. Can this information be used for local embedded parsers?

V.

On Tuesday, 13 February 2024 09:15:49 CET, Yuan Fu wrote:
>
> > On Feb 12, 2024, at 6:09 AM, Eli Zaretskii wrote:
> >
> >> From: Yuan Fu
> >> Date: Sun, 11 Feb 2024 20:16:11 -0800
> >> Cc: "Ergus via Emacs development discussions.",
> >>  Eli Zaretskii
> >>
> >> Thanks, the culprit is the call to treesit-update-ranges in treesit--pre-redisplay, where we don’t pass it any specific range, so it updates the range for the whole buffer. Eli, is there any way to get a rough estimate the range that redisplay is refreshing? Do you think something like this would work?
> >>
> >> (treesit-update-ranges
> >>  (max (point-min) (- (window-start) 1000)) ; BEG
> >>  (min (point-max) (+ (or (window-end) (+ (window-start) 4000)) 1000))) ; END
> >>
> >> I guess the window-start would be outdated in pre-redisplay-function...
> >
> > The problem is that window-start is not guaranteed to be up-to-date
> > when pre-redisplay-function is called: the window-start is updated by
> > redisplay, and pre-redisplay-function is called before the update.
> > Moreover, pre-redisplay-function could be called either once or twice
> > in a redisplay cycle, and window-start is up-to-date only for the
> > second call.
> >
> > The window-end point is basically never up-to-date during redisplay,
> > only at its very end.
> >
> > So my suggestion would be to define the range from position of point,
> > using the window dimensions; see get_narrowed_width for ideas. This
> > could lose if the buffer has a lot of invisible text, so I suggest to
> > check for invisible properties, and if they are present in the buffer,
> > punt and use the whole accessible portion of the buffer (I don't
> > expect PHP buffers, or any buffers in programming-language modes, to
> > have invisible text).
>
> Ah, clever :-) Programming language buffers could have invisible text when the user uses hideshow, or folded some section of code using outline-minor-mode :-(
>
> But as I said in the reply to Dmitry, we might need some better design for updating parser ranges than the current one. I’ll just fix V’s problem for now by updating the range around point, and ignore invisible text for now.
>
> Yuan
>