From: Yuan Fu <casouri@gmail.com>
To: Theodor Thornhill <theo@thornhill.no>
Cc: eliz@gnu.org, 59415@debbugs.gnu.org
Subject: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
Date: Sun, 20 Nov 2022 12:59:42 -0800
Message-ID: <BBE9584B-B584-4CE3-BC36-93BBEB4054BE@gmail.com>
In-Reply-To: <87v8n9qscd.fsf@thornhill.no>



> On Nov 20, 2022, at 12:33 PM, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Theodor Thornhill <theo@thornhill.no>
>>> Cc: Yuan Fu <casouri@gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>>> Observe that fontifications stop at this line for some reason.
>>>> Fontification reappears on line 209271.  Maybe it's because of the many
>>>> braces that appear in warning face?  Why does TS think there are syntax
>>>> errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>> 
>> Yes, but do you have any clue why it gives up at that line?
>> 
> 
> No, not yet.

Because the whole thing is contained in an ERROR node. It wasn’t covered in error face because our rule for errors doesn’t “override”: if there are existing faces in the range, the error face isn’t applied. If I change the rule fontifying errors to override, everything is in error face. Alternatively, you can disable fontifying errors, like this:

;; Remove the `error' feature so ERROR nodes are not fontified in c-ts-mode.
(add-hook 'c-ts-mode-hook #'c-ts-setup)
(defun c-ts-setup ()
  (treesit-font-lock-recompute-features nil '(error)))

> 
> 
>> One thing that I see is that many braces around there are shown in warning
>> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>> 
> 
> Yeah that's my first guess, but that shouldn't be an issue, it should be
> able to font-lock _something_.

Yeah, see above.

> 
>>>> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>>> 
>>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>>> it'll be some more memory usage.
>> 
>> After lifting the limit to allow visiting the file, this file causes Emacs
>> to go up to 350 MiB.  Which is significant, but definitely not outrageous
>> enough to prevent using TS with this file.  And I'm sure "normal" C files
>> (as opposed to ones written by a program) will need less memory.  So 4 MiB
>> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
>> 32-bit systems and say 40 MiB on 64-bit?
>> 
> 
> I think it should probably be the same as in the C level, as I mentioned
> in the other mail?

4GB is the absolute upper limit, but the practical maximum size is well below that. Though 4MB might be too conservative.
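
For anyone who wants to try a larger limit locally in the meantime, a minimal sketch, assuming treesit-max-buffer-size is the user option being discussed and using the 40 MiB figure floated above for 64-bit systems purely as an illustrative value:

```elisp
;; Assumption: 40 MiB is only the value proposed above for 64-bit
;; builds; it is not a tested recommendation.
(setq treesit-max-buffer-size (* 40 1024 1024))
```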

> 
>>> I'll do some more digging, but in the
>>> meantime I attach this profiler report that shows font-locking as the
>>> culprit:
>> 
>> Culprit for what?  For slow performance?
> 
> Yeah.
> 
>> Don't get me wrong: from my POV, TS works here better than CC Mode, in
>> many use cases which are much more important than scrolling through
>> the entire humongous file top to bottom.  For example, just visiting
>> the file takes 3 times as much with CC Mode as with c-ts-mode; going
>> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
>> sec.  And likewise jumping into a random point in the file.  Instead
>> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
>> number of GC cycles with CC Mode is 10 times as large as with TS.
>> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
>> and the language support libraries are, of course, fully optimized.)
>> 
> 
> Ok, that's good to know!
> 
>>> In this profile I followed your repro, and did some more movement around
>>> the buffer after.  This isn't from emacs -Q, but I believe the results
>>> will be just the same, considering where the slowness seems to be
>>> 
>>> 
>>>       16695  85% - redisplay_internal (C function)
>>>       16695  85%  - jit-lock-function
>>>       16695  85%   - jit-lock-fontify-now
>>>       16695  85%    - jit-lock--run-functions
>>>       16695  85%     - run-hook-wrapped
>>>       16695  85%      - #<compiled -0x156eddb48a262583>
>>>       16695  85%       - font-lock-fontify-region
>>>       16695  85%        - font-lock-default-fontify-region
>>>       16679  84%         - treesit-font-lock-fontify-region
>> 
>> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
>> Yuan can speed this up, please do.  But I see no reason to consider this a
>> catastrophe, quite to the contrary.
> 
> I think it boils down to getting the root too many times.  In an
> unmodified buffer I think getting the root node should be instant, and
> it seems to take some time.  I'll try to figure out why.

Getting root is trivial; the bulk of the time is spent in query-capture.

Running the following in that file gives me 1.87 seconds, while in a smaller file it only takes 0.00016 seconds.

(benchmark-run 100
  (let ((query (caar treesit-font-lock-settings))
        (root (treesit-buffer-root-node)))
    (treesit-query-capture root query 7700472 7703604)))

> This diff fixes the font-lock issues:
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 674c984dfe..0f84d8b83e 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>       ;; will give you that quote node.  We want to capture the string
>       ;; and apply string face to it, but querying on the quote node
>       ;; will not give us the string node.
> -      (when-let ((root (treesit-buffer-root-node language))
> +      (when-let (
>                  ;; Only activate if ENABLE flag is t.
>                  (activate (eq t enable)))
>         (ignore activate)
>         (let ((captures (treesit-query-capture
> -                         root query start end))
> +                         (treesit-node-on start end) query start end))
>               (inhibit-point-motion-hooks t))
>           (with-silent-modifications
>             (dolist (capture captures)
> 
> 
> However, the comment right above makes a case for why we should have
> this.  BUT, is this still relevant, Yuan, after the changes in treesit
> reporting what has changed etc?  What exact case is that an issue?  And
> is it more severe than the behavior this bug is exhibiting?

The case described by the comment is still relevant. With this patch, the quote described in that case still wouldn’t be fontified. We can use some heuristic to get a node “large enough” and not the root node. E.g., find some top-level node. That should make query-capture much faster.
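
A rough sketch of that heuristic, not a tested patch: start from the smallest node covering the region and climb until the parent is the root. The function name my/treesit-toplevel-node-on is hypothetical; it only uses treesit-buffer-root-node, treesit-node-on, treesit-node-parent and treesit-node-eq.

```elisp
(defun my/treesit-toplevel-node-on (start end &optional language)
  "Return the top-level node covering START to END for LANGUAGE.
Fall back to the root node when the region spans several
top-level nodes.  Hypothetical helper, for illustration only."
  (let* ((root (treesit-buffer-root-node language))
         (node (treesit-node-on start end language)))
    ;; Climb until NODE is a direct child of ROOT.  If NODE is ROOT
    ;; itself, it has no parent and the loop does not run.
    (while (and node
                (treesit-node-parent node)
                (not (treesit-node-eq (treesit-node-parent node) root)))
      (setq node (treesit-node-parent node)))
    (or node root)))
```

The query would then run as (treesit-query-capture (my/treesit-toplevel-node-on start end language) query start end). If the region sits inside one very large top-level node, such as the ERROR node in this file, this alone probably won't help much.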

Yuan




