From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
Date: Sun, 20 Nov 2022 12:59:42 -0800
References: <83v8n94ij9.fsf@gnu.org> <87k03pwgf6.fsf@thornhill.no> <83h6yt4c12.fsf@gnu.org> <87v8n9qscd.fsf@thornhill.no>
To: Theodor Thornhill
Cc: eliz@gnu.org, 59415@debbugs.gnu.org

> On Nov 20, 2022, at 12:33 PM, Theodor Thornhill wrote:
>
> Eli Zaretskii writes:
>
>>> From: Theodor Thornhill
>>> Cc: Yuan Fu
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>>
>>>> Observe that fontifications stop at this line for some reason.
>>>> Fontification reappears on line 209271. Maybe it's because of the many
>>>> braces that appear in warning face? Why does TS think there are syntax
>>>> errors here? The C++ TS parser doesn't have that problem, btw.
>>>
>>> It seems the C parser definitely can't handle what it's seeing.
>>
>> Yes, but do you have any clue why it gives up at that line?
>>
>
> No, not yet.

Because the whole thing is contained in an ERROR node.
It wasn't covered in error face because our rule for errors doesn't
"override": if there are existing faces in the range, the error face
isn't applied. If I change the rule fontifying errors to override,
everything is in error face. Alternatively, you can disable fontifying
errors, like this:

(add-hook 'c-ts-mode-hook #'c-ts-setup)
(defun c-ts-setup ()
  ;; Remove the `error' feature from the active font-lock features.
  (treesit-font-lock-recompute-features nil '(error)))

>
>
>> One thing that I see is that many braces around there are shown in warning
>> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>>
>
> Yeah that's my first guess, but that shouldn't be an issue, it should be
> able to font-lock _something_.

Yeah, see above.

>
>>>> P.S. Btw, isn't the treesit-max-buffer-size limit too low? 4 MiB?
>>>
>>> It might be! IIRC treesit uses 10x the buffer size to store the AST, so
>>> it'll be some more memory usage.
>>
>> After lifting the limit to allow visiting the file, this file causes Emacs
>> to go up to 350 MiB. Which is significant, but definitely not outrageous
>> enough to prevent using TS with this file. And I'm sure "normal" C files
>> (as opposed to ones written by a program) will need less memory. So 4 MiB
>> sounds too restrictive to me. We should maybe increase that to 15 MiB on
>> 32-bit systems and say 40 MiB on 64-bit?
>>
>
> I think it should probably be the same as in the C level, as I mentioned
> in the other mail?

4GB is the absolute upper limit, but the practical maximum size is well
below that. Though 4MB might be too conservative.

>
>>> I'll do some more digging, but in the
>>> meantime I attach this profiler report that shows font-locking as the
>>> culprit:
>>
>> Culprit for what? For slow performance?
>
> Yeah.
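The non-overriding behavior of the error rule described earlier can be illustrated without a parser. The sketch below is a hypothetical stand-in (`my/fontify-with-override` is not a treesit.el function), assuming the semantics stated above for a rule whose `:override` option in `treesit-font-lock-rules` is nil: if any part of the captured range already has a face, the new face is not applied at all. The buffer text and the pre-existing `font-lock-type-face` on "int" are made up to stand in for an ERROR node that already contains some fontified captures.

```elisp
(defun my/fontify-with-override (start end face override)
  "Put FACE on the text between START and END.
Hypothetical helper: with OVERRIDE nil, FACE is applied only if no
position in the range already has a face; with OVERRIDE non-nil,
FACE replaces whatever faces are there."
  (if override
      (put-text-property start end 'face face)
    ;; `text-property-not-all' returns the first position in the range
    ;; whose `face' is non-nil, or nil if the range has no face yet.
    (unless (text-property-not-all start end 'face nil)
      (put-text-property start end 'face face))))

;; Pretend the whole buffer is one ERROR node and that an earlier
;; capture already fontified "int" (positions 1-3).
(with-temp-buffer
  (insert "int x = {{{ broken")
  (put-text-property 1 4 'face 'font-lock-type-face)
  (my/fontify-with-override (point-min) (point-max)
                            'font-lock-warning-face nil)
  ;; Position 9 is the first "{": the error face was skipped.
  (get-text-property 9 'face))          ; => nil
```

Passing t instead of nil puts `font-lock-warning-face` on every character, including "int", which corresponds to the "everything is in error face" result of making the error rule override.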
>
>> Don't get me wrong: from my POV, TS works here better than CC Mode, in
>> many use cases which are much more important than scrolling through
>> the entire humongous file top to bottom. For example, just visiting
>> the file takes 3 times as long with CC Mode as with c-ts-mode; going
>> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
>> sec. And likewise jumping into a random point in the file. Instead
>> of Alan's 150 sec for a full scroll by CC Mode I get 27 min. The
>> number of GC cycles with CC Mode is 10 times as large as with TS.
>> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
>> and the language support libraries are, of course, fully optimized.)
>>
>
> Ok, that's good to know!
>
>>> In this profile I followed your repro, and did some more movement around
>>> the buffer after. This isn't from emacs -Q, but I believe the results
>>> will be just the same, considering where the slowness seems to be
>>>
>>>
>>>       16695  85% - redisplay_internal (C function)
>>>       16695  85%  - jit-lock-function
>>>       16695  85%   - jit-lock-fontify-now
>>>       16695  85%    - jit-lock--run-functions
>>>       16695  85%     - run-hook-wrapped
>>>       16695  85%      - #
>>>       16695  85%       - font-lock-fontify-region
>>>       16695  85%        - font-lock-default-fontify-region
>>>       16679  84%         - treesit-font-lock-fontify-region
>
>> Yes, treesit-font-lock-fontify-region takes the lion's share. If you or
>> Yuan can speed this up, please do. But I see no reason to consider this a
>> catastrophe, quite to the contrary.
>
> I think it boils down to getting the root too many times. In an
> unmodified buffer I think getting the root node should be instant, and
> it seems to take some time. I'll try to figure out why.

Getting the root is trivial; the bulk of the time is spent in
treesit-query-capture. Running the following in that file gives me
1.87 seconds, while in a smaller file it only takes 0.00016.
(benchmark-run 100
  (let ((query (caar treesit-font-lock-settings))
        (root (treesit-buffer-root-node)))
    (treesit-query-capture root query 7700472 7703604)))

> This diff fixes the font-lock issues:
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 674c984dfe..0f84d8b83e 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>            ;; will give you that quote node. We want to capture the string
>            ;; and apply string face to it, but querying on the quote node
>            ;; will not give us the string node.
> -          (when-let ((root (treesit-buffer-root-node language))
> +          (when-let (
>                       ;; Only activate if ENABLE flag is t.
>                       (activate (eq t enable)))
>              (ignore activate)
>              (let ((captures (treesit-query-capture
> -                             root query start end))
> +                             (treesit-node-on start end) query start end))
>                    (inhibit-point-motion-hooks t))
>                (with-silent-modifications
>                  (dolist (capture captures)
>
>
> However, the comment right above makes a case for why we should have
> this. BUT, is this still relevant, Yuan, after the changes in treesit
> reporting what has changed etc? In what exact case is that an issue? And
> is it more severe than the behavior this bug is exhibiting?

The case described by the comment is still relevant. With this patch,
the quote described in that case still wouldn't be fontified. We can
use some heuristic to get a node that is "large enough" but is not the
root node, e.g., some top-level node. That should make
treesit-query-capture much faster.

Yuan
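A minimal sketch of the "top-level node" heuristic, assuming the Emacs 29 treesit API (`treesit-buffer-root-node`, `treesit-node-on`, `treesit-node-parent`, `treesit-node-eq`). The name `my/treesit-top-level-node-on` is hypothetical; this is not an existing treesit.el function and not the fix that was eventually applied. It starts from the smallest node covering START..END (in the comment's case, the quote node) and climbs until its parent is the root, so a region inside a string yields the whole top-level form containing the string node.

```elisp
(defun my/treesit-top-level-node-on (start end &optional language)
  "Return the top-level node that covers START..END.
Hypothetical helper: climb from the smallest node covering the range
until its parent is the root node.  If the range spans several
top-level nodes, the smallest covering node is the root itself, so
the root is returned unchanged."
  (let ((root (treesit-buffer-root-node language))
        (node (treesit-node-on start end language)))
    ;; Stop when NODE has no parent (NODE is the root) or when its
    ;; parent is the root (NODE is a top-level node).
    (while (and node
                (treesit-node-parent node)
                (not (treesit-node-eq (treesit-node-parent node) root)))
      (setq node (treesit-node-parent node)))
    (or node root)))
```

The returned node would then be passed to `treesit-query-capture` in place of the root node. Note that when a single ERROR node spans most of the file, as in this bug, the top-level node is that ERROR node, so the heuristic would save less there than in a well-formed file.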