unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
@ 2022-11-20 17:55 Eli Zaretskii
  2022-11-20 19:54 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:17 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-20 17:55 UTC
  To: 59415; +Cc: Yuan Fu, Theodor Thornhill

To reproduce:

  emacs -Q
  Evaluate:

    (setq auto-mode-alist
	  (append
	   '(("\\.c\\'" . c-ts-mode))
	   auto-mode-alist))
    (setq treesit-max-buffer-size (* 11 1024 1024))

  C-x C-f packet-rrc.c RET

(This file is the one from bug#45248.)

  C-u 194770 M-g g

Observe that fontifications stop at this line for some reason.
Fontification reappears on line 209271.  Maybe it's because of the many
braces that appear in warning face?  Why does TS think there are syntax
errors here?  The C++ TS parser doesn't have that problem, btw.

P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?


In GNU Emacs 29.0.50 (build 31, i686-pc-mingw32) of 2022-11-20 built on
 HOME-C4E4A596F7
Repository revision: 4fa13b2d838e11cbe3b713f3172721cb61d499f3
Repository branch: feature/tree-sitter
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: C

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
c-ts-mode rx treesit cl-seq vc-bzr vc-dispatcher vc-cvs vc-rcs log-view
easy-mmode pcvs-util cc-mode cc-fonts cc-guess cc-menus cc-cmds
cc-styles cc-align cc-engine cc-vars cc-defs cl-loaddefs cl-lib rmc
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel dos-w32 ls-lisp disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
faces cus-face macroexp files window text-properties overlay sha1 md5
base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
make-network-process emacs)

Memory information:
((conses 16 71496 109093)
 (symbols 48 8917 42)
 (strings 16 25650 9189)
 (string-bytes 1 810278)
 (vectors 16 13855)
 (vector-slots 8 190997 54271)
 (floats 8 26 158)
 (intervals 40 583 684)
 (buffers 904 13))






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 17:55 bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file Eli Zaretskii
@ 2022-11-20 19:54 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:16   ` Eli Zaretskii
  2022-11-20 20:17 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 19:54 UTC
  To: eliz, 59415; +Cc: Yuan Fu


Hi, and thanks for the Cc.
>
> Observe that fontifications stop at this line for some reason.
> Fontification reappears on line 209271.  Maybe it's because of the many
> braces that appear in warning face?  Why does TS think there are syntax
> errors here?  The C++ TS parser doesn't have that problem, btw.
>

It seems the c parser definitely can't handle what it's seeing.

> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>

It might be!  IIRC treesit uses 10x the buffer size to store the AST, so
it'll use some more memory.  I'll do some more digging, but in the
meantime I attach this profiler report that shows font-locking as the
culprit:

In this profile I followed your repro, and did some more movement around
the buffer after.  This isn't from emacs -Q, but I believe the results
will be just the same, considering where the slowness seems to be


       16695  85% - redisplay_internal (C function)
       16695  85%  - jit-lock-function
       16695  85%   - jit-lock-fontify-now
       16695  85%    - jit-lock--run-functions
       16695  85%     - run-hook-wrapped
       16695  85%      - #<compiled -0x156eddb48a262583>
       16695  85%       - font-lock-fontify-region
       16695  85%        - font-lock-default-fontify-region
       16679  84%         - treesit-font-lock-fontify-region
        2080  10%            treesit-buffer-root-node
        2689  13% - command-execute
        2689  13%  - call-interactively
        2380  12%   - funcall-interactively
        1576   8%    - scroll-up-command
        1525   7%     - scroll-up
        1525   7%      - jit-lock-function
        1525   7%       - jit-lock-fontify-now
        1525   7%        - jit-lock--run-functions
        1525   7%         - run-hook-wrapped
        1525   7%          - #<compiled -0x15bd2ea490f7f983>
        1525   7%           - font-lock-fontify-region
        1525   7%            - font-lock-default-fontify-region
        1525   7%               treesit-font-lock-fontify-region
         633   3%    - end-of-buffer
         628   3%     - recenter
         628   3%      - jit-lock-function
         628   3%       - jit-lock-fontify-now
         628   3%        - jit-lock--run-functions
         628   3%         - run-hook-wrapped
         628   3%          - #<compiled -0x14388b9914c40883>
         628   3%           - font-lock-fontify-region
         628   3%            - font-lock-default-fontify-region
         628   3%               treesit-font-lock-fontify-region
           5   0%       push-mark
         128   0%    - project-find-file
         128   0%     - project-find-file-in
          86   0%      - project--read-file-cpd-relative
          86   0%       - project--completing-read-strict
          86   0%        - completing-read
          86   0%         - completing-read-default
          86   0%          - apply
          86   0%           - vertico--advice
          86   0%            - apply
          86   0%             - #<compiled -0x2e553dfe9f75520>
          79   0%              - read-from-minibuffer
          37   0%               - vertico--exhibit
          26   0%                - vertico--update
          22   0%                   redisplay
           4   0%                 - vertico--recompute
           4   0%                  - vertico-sort-history-length-alpha
           4   0%                   - mapcan
           4   0%                    - #<compiled -0x1cada1a01280ac5f>
           4   0%                       sort
          11   0%                - vertico--display-candidates
          11   0%                   vertico--resize-window
          15   0%               - timer-event-handler
          10   0%                - apply
           7   0%                 - battery-update-handler
           7   0%                  - sit-for
           7   0%                   - redisplay
           7   0%                      redisplay_internal (C function)
           3   0%                   #<compiled 0x12c58df73848dc86>
           2   0%               - internal-timer-start-idle
           2   0%                  timerp
           2   0%               - command-execute
           2   0%                - call-interactively
           2   0%                 - funcall-interactively
           2   0%                  - vertico-exit
           2   0%                   - vertico--match-p
           2   0%                    - test-completion
           2   0%                     - #<compiled -0x1464df124877e5c8>
           2   0%                        complete-with-action
          27   0%      - find-file
          27   0%       - find-file-noselect
          24   0%        - find-file-noselect-1
           4   0%         - insert-file-contents
           4   0%          - set-auto-coding
           4   0%           - find-auto-coding
           4   0%              sgml-html-meta-auto-coding-function
           4   0%         - after-find-file
           4   0%          - normal-mode
           4   0%           - set-auto-mode
           4   0%            - set-auto-mode--apply-alist
           4   0%             - set-auto-mode-0
           4   0%              - c-ts-mode
           4   0%                 treesit-ready-p
           3   0%        - find-buffer-visiting
           3   0%           abbreviate-file-name
          15   0%      - project-files
          15   0%       - apply
          15   0%        - #<compiled -0x7a9f28e22b82f80>
          15   0%         - mapcan
          15   0%          - #<compiled 0x14d13416934a6c69>
          15   0%           - project--vc-list-files
          11   0%            - apply
          11   0%             - vc-git--run-command-string
          11   0%              - #<compiled 0x88854d79be8a>
          11   0%               - kill-buffer
          11   0%                - replace-buffer-in-windows
          11   0%                 - unrecord-window-buffer
          11   0%                    assq-delete-all
           4   0%              split-string
           5   0%    - next-line
           5   0%     - line-move
           5   0%        line-move-visual
           4   0%    - execute-extended-command
           4   0%     - command-execute
           4   0%      - call-interactively
           4   0%       - funcall-interactively
           4   0%          profiler-stop
           2   0%    - digit-argument
           2   0%     - universal-argument--mode
           2   0%        set-transient-map
         309   1%   - byte-code
         309   1%    - read-extended-command
         309   1%     - read-extended-command-1
         309   1%      - completing-read
         309   1%       - completing-read-default
         309   1%        - apply
         309   1%         - vertico--advice
         309   1%          - apply
         309   1%           - #<compiled -0x2e553dfe9f75520>
         276   1%            - read-from-minibuffer
         253   1%             - vertico--exhibit
         249   1%              - vertico--update
         240   1%               - vertico--recompute
         236   1%                - vertico--all-completions
         236   1%                 - apply
         236   1%                  - completion-all-completions
         236   1%                   - completion--nth-completion
         236   1%                    - completion--some
         236   1%                     - #<compiled -0x18735a95ea969dbf>
         163   0%                      - completion-basic-all-completions
         163   0%                       - completion-pcm--all-completions
         163   0%                        - all-completions
         163   0%                         - #<compiled -0xf2f3e8a19f62ad2>
         163   0%                          - complete-with-action
           4   0%                           - all-completions
           4   0%                            - #<compiled 0xadd42c29ce50255>
           4   0%                               #<compiled 0x1a1dcc3780af9553>
          73   0%                      - completion-substring-all-completions
          73   0%                       - completion-substring--all-completions
          64   0%                        - completion-pcm--all-completions
          64   0%                         - all-completions
          64   0%                          - #<compiled -0x1464df124877e5c8>
          64   0%                             complete-with-action
           4   0%                - test-completion
           4   0%                 - #<compiled -0xf2f3e8a19f62ad2>
           4   0%                    complete-with-action
           7   0%                 redisplay
           4   0%              - vertico--display-candidates
           4   0%                 vertico--resize-window
           4   0%             - redisplay_internal (C function)
           4   0%              - eval
           4   0%                 unless
         201   1% + timer-event-handler
          50   0% + ...
           4   0%   set-message-functions






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 19:54 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-20 20:16   ` Eli Zaretskii
  2022-11-20 20:33     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-20 20:16 UTC
  To: Theodor Thornhill; +Cc: 59415, casouri

> From: Theodor Thornhill <theo@thornhill.no>
> Cc: Yuan Fu <casouri@gmail.com>
> Date: Sun, 20 Nov 2022 20:54:05 +0100
> 
> > Observe that fontifications stop at this line for some reason.
> > Fontification reappears on line 209271.  Maybe it's because of the many
> > braces that appear in warning face?  Why does TS think there are syntax
> > errors here?  The C++ TS parser doesn't have that problem, btw.
> 
> It seems the c parser definitely can't handle what it's seeing.

Yes, but do you have any clue why it gives up at that line?

One thing that I see is that many braces around there are shown in warning
face, so perhaps the parser is overwhelmed by the amount of parsing errors?

> > P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
> 
> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
> it'll be some more memory usage.

After lifting the limit to allow visiting the file, this file causes Emacs
to go up to 350 MiB.  Which is significant, but definitely not outrageous
enough to prevent using TS with this file.  And I'm sure "normal" C files
(as opposed to ones written by a program) will need less memory.  So 4 MiB
sounds too restrictive to me.  We should maybe increase that to 15 MiB on
32-bit systems and say 40 MiB on 64-bit?
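Eli's proposal can be sketched as a default chosen by word size. This is a hypothetical illustration in C, not how Emacs defines `treesit-max-buffer-size`; the names `MIB`, `proposed_max_buffer_size` and `buffer_size_allowed` are made up, and using pointer width to tell 32-bit from 64-bit systems is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the limits proposed above; none of these
   names exist in Emacs.  "32-bit" is approximated by pointer width,
   which is an assumption of this sketch.  */
#define MIB ((size_t) 1024 * 1024)

/* Proposed default: 15 MiB with 32-bit pointers, 40 MiB otherwise.  */
static size_t
proposed_max_buffer_size (void)
{
  return sizeof (void *) <= 4 ? 15 * MIB : 40 * MIB;
}

/* True if a buffer of BYTES bytes would be parsed under that limit.  */
static bool
buffer_size_allowed (size_t bytes)
{
  return bytes <= proposed_max_buffer_size ();
}
```

Under either value, the 11 MiB setting from the reproduction recipe would fit without any customization.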

> I'll do some more digging, but in the
> meantime I attach this profiler report that shows font-locking as the
> culprit:

Culprit for what?  For slow performance?  Don't get me wrong: from my POV,
TS works here better than CC Mode, in many use cases which are much more
important than scrolling through the entire humongous file top to bottom.
For example, just visiting the file takes 3 times as long with CC Mode as
with c-ts-mode; going to EOB with CC Mode takes more than 1 min 20 sec, whereas
TS does it in 2.5 sec.  And likewise jumping into a random point in the
file.  Instead of Alan's 150 sec for a full scroll by CC Mode I get 27 min.
The number of GC cycles with CC Mode is 10 times as large as with TS.
(Caveat: my Emacs is built without optimizations, whereas Tree-sitter and
the language support libraries are, of course, fully optimized.)

> In this profile I followed your repro, and did some more movement around
> the buffer after.  This isn't from emacs -Q, but I believe the results
> will be just the same, considering where the slowness seems to be
> 
> 
>        16695  85% - redisplay_internal (C function)
>        16695  85%  - jit-lock-function
>        16695  85%   - jit-lock-fontify-now
>        16695  85%    - jit-lock--run-functions
>        16695  85%     - run-hook-wrapped
>        16695  85%      - #<compiled -0x156eddb48a262583>
>        16695  85%       - font-lock-fontify-region
>        16695  85%        - font-lock-default-fontify-region
>        16679  84%         - treesit-font-lock-fontify-region

Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
Yuan can speed this up, please do.  But I see no reason to consider this a
catastrophe, quite to the contrary.






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 17:55 bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file Eli Zaretskii
  2022-11-20 19:54 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-20 20:17 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 20:17 UTC
  To: eliz, 59415; +Cc: Yuan Fu

>
> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?

I thought it was supposed to be 4 gigs, as seen in this function:

```
static void
treesit_check_buffer_size (struct buffer *buffer)
{
  ptrdiff_t buffer_size = (BUF_Z (buffer) - BUF_BEG (buffer));
  if (buffer_size > UINT32_MAX)
    xsignal2 (Qtreesit_buffer_too_large,
	      build_pure_c_string ("Buffer size cannot be larger than 4GB"),
	      make_fixnum (buffer_size));
}
```

So my guess is that this is a typo, and that it should be


(defcustom treesit-max-buffer-size (* 4 1024 1024 1024)
  "Maximum buffer size for enabling tree-sitter parsing (in bytes)."
  :type 'integer
  :version "29.1")

or something like that :-)
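To make the gap between the two limits concrete, here is a standalone sketch in C. The names are hypothetical; it only mirrors the comparison in `treesit_check_buffer_size` and the two Lisp-level values mentioned in this thread. Tree-sitter reports node positions as `uint32_t` byte offsets, which is presumably why the C check uses `UINT32_MAX`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone sketch (hypothetical names) comparing the limits
   discussed above.  */

/* Mirrors the C-level test: sizes above UINT32_MAX are rejected.  */
static bool
within_c_level_limit (int64_t buffer_size)
{
  assert (buffer_size >= 0);
  return buffer_size <= (int64_t) UINT32_MAX;
}

/* Lisp-level default at the time of this thread: 4 MiB.  */
#define CURRENT_LISP_LIMIT ((int64_t) 4 * 1024 * 1024)

/* The value suggested above: (* 4 1024 1024 1024).  */
#define SUGGESTED_LISP_LIMIT ((int64_t) 4 * 1024 * 1024 * 1024)
```

The suggested value is 1024 times the 4 MiB default, and it is exactly one more than `UINT32_MAX`, so a size equal to the suggested value would still be rejected by the C check.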

Theo






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 20:16   ` Eli Zaretskii
@ 2022-11-20 20:33     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:51       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:59       ` Yuan Fu
  0 siblings, 2 replies; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 20:33 UTC
  To: Eli Zaretskii; +Cc: 59415, casouri

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>> 
>> > Observe that fontifications stop at this line for some reason.
>> > Fontification reappears on line 209271.  Maybe it's because of the many
>> > braces that appear in warning face?  Why does TS think there are syntax
>> > errors here?  The C++ TS parser doesn't have that problem, btw.
>> 
>> It seems the c parser definitely can't handle what it's seeing.
>
> Yes, but do you have any clue why it gives up at that line?
>

No, not yet.


> One thing that I see is that many braces around there are shown in warning
> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>

Yeah, that's my first guess, but that shouldn't be an issue; it should
still be able to font-lock _something_.

>> > P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>> 
>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>> it'll be some more memory usage.
>
> After lifting the limit to allow visiting the file, this file causes Emacs
> to go up to 350 MiB.  Which is significant, but definitely not outrageous
> enough to prevent using TS with this file.  And I'm sure "normal" C files
> (as opposed to ones written by a program) will need less memory.  So 4 MiB
> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
> 32-bit systems and say 40 MiB on 64-bit?
>

I think it should probably be the same as in the C level, as I mentioned
in the other mail?

>> I'll do some more digging, but in the
>> meantime I attach this profiler report that shows font-locking as the
>> culprit:
>
> Culprit for what?  For slow performance?

Yeah.

> Don't get me wrong: from my POV, TS works here better than CC Mode, in
> many use cases which are much more important than scrolling through
> the entire humongous file top to bottom.  For example, just visiting
> the file takes 3 times as much with CC Mode as with c-ts-mode; going
> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
> sec.  And likewise jumping into a random point in the file.  Instead
> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
> number of GC cycles with CC Mode is 10 times as large as with TS.
> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
> and the language support libraries are, of course, fully optimized.)
>

Ok, that's good to know!

>> In this profile I followed your repro, and did some more movement around
>> the buffer after.  This isn't from emacs -Q, but I believe the results
>> will be just the same, considering where the slowness seems to be
>> 
>> 
>>        16695  85% - redisplay_internal (C function)
>>        16695  85%  - jit-lock-function
>>        16695  85%   - jit-lock-fontify-now
>>        16695  85%    - jit-lock--run-functions
>>        16695  85%     - run-hook-wrapped
>>        16695  85%      - #<compiled -0x156eddb48a262583>
>>        16695  85%       - font-lock-fontify-region
>>        16695  85%        - font-lock-default-fontify-region
>>        16679  84%         - treesit-font-lock-fontify-region
>
> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
> Yuan can speed this up, please do.  But I see no reason to consider this a
> catastrophe, quite to the contrary.

I think it boils down to getting the root too many times.  In an
unmodified buffer I think getting the root node should be instant, and
it seems to take some time.  I'll try to figure out why.

Theo






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 20:33     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-20 20:51       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:59       ` Yuan Fu
  1 sibling, 0 replies; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 20:51 UTC
  To: Eli Zaretskii; +Cc: 59415, casouri

Theodor Thornhill <theo@thornhill.no> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Theodor Thornhill <theo@thornhill.no>
>>> Cc: Yuan Fu <casouri@gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>> > Observe that fontifications stop at this line for some reason.
>>> > Fontification reappears on line 209271.  Maybe it's because of the many
>>> > braces that appear in warning face?  Why does TS think there are syntax
>>> > errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>>
>> Yes, but do you have any clue why it gives up at that line?
>>
>
> No, not yet.
>
>

This diff fixes the font-lock issues:

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 674c984dfe..0f84d8b83e 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
       ;; will give you that quote node.  We want to capture the string
       ;; and apply string face to it, but querying on the quote node
       ;; will not give us the string node.
-      (when-let ((root (treesit-buffer-root-node language))
+      (when-let (
                  ;; Only activate if ENABLE flag is t.
                  (activate (eq t enable)))
         (ignore activate)
         (let ((captures (treesit-query-capture
-                         root query start end))
+                         (treesit-node-on start end) query start end))
               (inhibit-point-motion-hooks t))
           (with-silent-modifications
             (dolist (capture captures)


However, the comment right above makes a case for why we should query
from the root node.  BUT, is this still relevant, Yuan, after the changes
in treesit that report what has changed, etc.?  In what exact case is
that an issue?  And is it more severe than the behavior this bug is
exhibiting?
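The situation the comment in the patch describes can be reproduced on a toy tree. This C sketch is a hypothetical model, not the tree-sitter API: `struct node` and `smallest_covering` are made up, with `smallest_covering` standing in for what `treesit-node-on` returns (the smallest node covering a range), and the node types are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy syntax tree, NOT the tree-sitter API: each node covers the
   half-open range [start, end) and points to its children, which
   are ordered by position.  */
struct node
{
  const char *type;
  unsigned start, end;
  struct node **children;
  size_t n_children;
};

/* Return the smallest node under N that covers [START, END), in the
   spirit of `treesit-node-on'.  N itself must cover the range.  */
static struct node *
smallest_covering (struct node *n, unsigned start, unsigned end)
{
  assert (n->start <= start && end <= n->end);
  for (size_t i = 0; i < n->n_children; i++)
    {
      struct node *c = n->children[i];
      if (c->start <= start && end <= c->end)
        return smallest_covering (c, start, end);
    }
  return n;
}
```

For a string spanning [8, 14) whose opening quote is [8, 9), asking for the region [8, 9) returns the quote leaf, so a query run on that node cannot capture the enclosing string node, whereas a query from the root can.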






* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 20:33     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 20:51       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-20 20:59       ` Yuan Fu
  2022-11-20 21:09         ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 18+ messages in thread
From: Yuan Fu @ 2022-11-20 20:59 UTC
  To: Theodor Thornhill; +Cc: eliz, 59415



> On Nov 20, 2022, at 12:33 PM, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Theodor Thornhill <theo@thornhill.no>
>>> Cc: Yuan Fu <casouri@gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>>> Observe that fontifications stop at this line for some reason.
>>>> Fontification reappears on line 209271.  Maybe it's because of the many
>>>> braces that appear in warning face?  Why does TS think there are syntax
>>>> errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>> 
>> Yes, but do you have any clue why it gives up at that line?
>> 
> 
> No, not yet.

Because the whole thing is contained in an ERROR node. It wasn’t covered in error face because our rule for errors doesn’t “override”: if there are existing faces in the range, the error face isn’t applied. If I change the rule fontifying errors to override, everything is in error face. Alternatively, you can disable fontifying errors, like this:

(add-hook 'c-ts-mode-hook #'c-ts-setup)
(defun c-ts-setup ()
  (treesit-font-lock-recompute-features nil '(error)))
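The non-overriding behaviour described above can be modelled with a toy array of faces. This is an illustrative C sketch, not the code in treesit.el: faces are small integers, 0 means "no face", and `apply_face` is a hypothetical helper that handles only the two cases mentioned here (override off and override on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: FACES holds one face id per character; 0 = no face.  */
enum { NO_FACE = 0 };

/* Apply FACE to FACES[START..END).  Without OVERRIDE, do nothing at
   all if any character in the range already has a face.  */
static void
apply_face (int *faces, size_t start, size_t end, int face, bool override)
{
  assert (start <= end);
  if (!override)
    for (size_t i = start; i < end; i++)
      if (faces[i] != NO_FACE)
        return;
  for (size_t i = start; i < end; i++)
    faces[i] = face;
}
```

With override off, a range that already contains any face is left alone, which matches why the large ERROR node was not shown in error face; with override on, every character in the range gets the error face.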

> 
> 
>> One thing that I see is that many braces around there are shown in warning
>> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>> 
> 
> Yeah that's my first guess, but that shouldn't be an issue, it should be
> able to font-lock _something_.

Yeah, see above.

> 
>>>> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>>> 
>>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>>> it'll be some more memory usage.
>> 
>> After lifting the limit to allow visiting the file, this file causes Emacs
>> to go up to 350 MiB.  Which is significant, but definitely not outrageous
>> enough to prevent using TS with this file.  And I'm sure "normal" C files
>> (as opposed to ones written by a program) will need less memory.  So 4 MiB
>> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
>> 32-bit systems and say 40 MiB on 64-bit?
>> 
> 
> I think it should probably be the same as in the C level, as I mentioned
> in the other mail?

4GB is the absolute upper limit, but the practical maximum size is well below that. Though 4MB might be too conservative.

> 
>>> I'll do some more digging, but in the
>>> meantime I attach this profiler report that shows font-locking as the
>>> culprit:
>> 
>> Culprit for what?  For slow performance?
> 
> Yeah.
> 
>> Don't get me wrong: from my POV, TS works here better than CC Mode, in
>> many use cases which are much more important than scrolling through
>> the entire humongous file top to bottom.  For example, just visiting
>> the file takes 3 times as much with CC Mode as with c-ts-mode; going
>> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
>> sec.  And likewise jumping into a random point in the file.  Instead
>> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
>> number of GC cycles with CC Mode is 10 times as large as with TS.
>> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
>> and the language support libraries are, of course, fully optimized.)
>> 
> 
> Ok, that's good to know!
> 
>>> In this profile I followed your repro, and did some more movement around
>>> the buffer after.  This isn't from emacs -Q, but I believe the results
>>> will be just the same, considering where the slowness seems to be
>>> 
>>> 
>>>       16695  85% - redisplay_internal (C function)
>>>       16695  85%  - jit-lock-function
>>>       16695  85%   - jit-lock-fontify-now
>>>       16695  85%    - jit-lock--run-functions
>>>       16695  85%     - run-hook-wrapped
>>>       16695  85%      - #<compiled -0x156eddb48a262583>
>>>       16695  85%       - font-lock-fontify-region
>>>       16695  85%        - font-lock-default-fontify-region
>>>       16679  84%         - treesit-font-lock-fontify-region
>> 
>> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
>> Yuan can speed this up, please do.  But I see no reason to consider this a
>> catastrophe, quite to the contrary.
> 
> I think it boils down to getting the root too many times.  In an
> unmodified buffer I think getting the root node should be instant, and
> it seems to take some time.  I'll try to figure out why.

Getting the root is trivial; the bulk of the time is spent in query-capture.

Running the following in that file gives me 1.87 seconds, while in a smaller file it only takes 0.00016 seconds.

(benchmark-run 100
  (let ((query (caar treesit-font-lock-settings))
        (root (treesit-buffer-root-node)))
    (treesit-query-capture root query 7700472 7703604)))
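Putting numbers on the benchmark above, assuming both figures are totals over the 100 repetitions (which is what `benchmark-run` reports as elapsed time), the queried range is small while each call on the large file is comparatively expensive:

```latex
\[
7703604 - 7700472 = 3132 \ \text{characters queried}
\]
\[
\frac{1.87\ \mathrm{s}}{100} = 18.7\ \mathrm{ms\ per\ call}, \qquad
\frac{0.00016\ \mathrm{s}}{100} = 1.6\ \mu\mathrm{s\ per\ call}, \qquad
\frac{1.87}{0.00016} \approx 1.2 \times 10^{4}
\]
```

So the per-call cost appears to depend on the tree being queried rather than on the width of the range, which is consistent with querying a smaller node than the root.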

> This diff fixes the font-lock issues:
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 674c984dfe..0f84d8b83e 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>       ;; will give you that quote node.  We want to capture the string
>       ;; and apply string face to it, but querying on the quote node
>       ;; will not give us the string node.
> -      (when-let ((root (treesit-buffer-root-node language))
> +      (when-let (
>                  ;; Only activate if ENABLE flag is t.
>                  (activate (eq t enable)))
>         (ignore activate)
>         (let ((captures (treesit-query-capture
> -                         root query start end))
> +                         (treesit-node-on start end) query start end))
>               (inhibit-point-motion-hooks t))
>           (with-silent-modifications
>             (dolist (capture captures)
> 
> 
> However, the comment right above makes a case for why we should have
> this.  BUT, is this still relevant, Yuan, after the changes in treesit
> reporting what has changed, etc.?  In exactly which case is that an issue?  And
> is it more severe than the behavior this bug is exhibiting?

The case described by the comment is still relevant. With this patch, the quote described in that case still wouldn’t be fontified. We can use some heuristic to get a node that is “large enough” but not the root node, e.g., some top-level node. That should make query-capture much faster.

Yuan

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 20:59       ` Yuan Fu
@ 2022-11-20 21:09         ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-20 21:27           ` Yuan Fu
  0 siblings, 1 reply; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 21:09 UTC (permalink / raw)
  To: Yuan Fu; +Cc: eliz, 59415

> The case described by the comment is still relevant. With this patch,
> the quote described in that case still wouldn’t be fontified. We can
> use some heuristic to get a node “large enough” and not the root
> node. E.g., find some top-level node. That should make query-capture
> much faster.
>

I appreciate the explanation.  I think getting the root is a bit
excessive.  I got the same results as you in the capture.  Maybe reuse
the treesit-defun-type-regexp, and default to root if none found?
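A sketch of the “enclosing defun, else root” idea, assuming `treesit-defun-type-regexp` holds a plain regexp string (anything else falls back to the root). The helper name `my/treesit-defun-or-root` is hypothetical. As the thread goes on to report, in packet-rrc.c the enclosing top-level node is itself several megabytes, so this alone does not help there.

```elisp
(defun my/treesit-defun-or-root (start end)
  "Return a defun-like node enclosing START..END, else the root node.
Hypothetical helper; only handles `treesit-defun-type-regexp' when it
is a plain regexp string, and falls back to the root otherwise."
  (let ((node (treesit-node-on start end))
        (defun-p (lambda (n)
                   (string-match-p treesit-defun-type-regexp
                                   (treesit-node-type n)))))
    (or (and node
             (stringp treesit-defun-type-regexp)
             ;; Check NODE itself, then walk up through its parents.
             (if (funcall defun-p node)
                 node
               (treesit-parent-until node defun-p)))
        ;; Nothing defun-like encloses the region: use the whole tree.
        (treesit-buffer-root-node))))
```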


* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 21:09         ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-20 21:27           ` Yuan Fu
  2022-11-20 21:56             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 18+ messages in thread
From: Yuan Fu @ 2022-11-20 21:27 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: eliz, 59415



> On Nov 20, 2022, at 1:09 PM, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> I appreciate the explanation.  I think getting the root is a bit
> excessive.  I got the same results as you in the capture.  Maybe reuse
> the treesit-defun-type-regexp, and default to root if none found?

I tried the “top-level node” approach, and it didn’t help in packet-rrc.c: the top-level node (a function definition) is still too large (it spans 7680306-9936062). Since the case I described in the comment against using treesit-node-on is the exception rather than the norm, maybe we can go the other way around: use treesit-node-on first, and if the node seems too small (by some heuristic), enlarge it to some degree.
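One possible shape for “use treesit-node-on first, then enlarge”, sketched under the assumption that “too small” simply means spanning fewer than some number of characters. The helper name `my/treesit-node-for-fontify` and the 1000-character default are hypothetical; the heuristic actually adopted may differ.

```elisp
(defun my/treesit-node-for-fontify (start end &optional min-size)
  "Return a node covering START..END that spans at least MIN-SIZE chars.
Hypothetical helper.  Start from `treesit-node-on' and climb to parent
nodes while the node is smaller than MIN-SIZE (default 1000, an
arbitrary guess), so that e.g. a lone quote node gets its string."
  (let ((min-size (or min-size 1000))
        (node (treesit-node-on start end)))
    ;; Stop at the root: a node without a parent cannot grow further.
    (while (and node
                (< (- (treesit-node-end node) (treesit-node-start node))
                   min-size)
                (treesit-node-parent node))
      (setq node (treesit-node-parent node)))
    node))
```

The loop only ever grows the node, so a region that already sits inside a huge node (as around line 194770) stays as slow as plain treesit-node-on.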

Yuan

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 21:27           ` Yuan Fu
@ 2022-11-20 21:56             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-21  1:27               ` Yuan Fu
  2022-11-21 12:41               ` Eli Zaretskii
  0 siblings, 2 replies; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-20 21:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: eliz, 59415

>> 
>> I appreciate the explanation.  I think getting the root is a bit
>> excessive.  I got the same results as you in the capture.  Maybe reuse
>> the treesit-defun-type-regexp, and default to root if none found?
>
> I tried the “top-level node” approach, and it didn’t help in
> packet-rrc.c: the top-level node (a function definition) is still too
> large (spans 7680306-9936062). Since the case I described in the
> comment against using treesit-node-on is the exception rather than the
> norm, maybe we can go the other way around: use treesit-node-on first,
> and if the node seems too small (by some heuristic), enlarge it to
> some degree.
>

Makes sense!

BTW, should jit-lock's chunk size be up for discussion again?  I
ran the benchmarks from this thread [0] on this file, and it seems that
increasing jit-lock-chunk-size from 1500 to 4500 in increments of 500
brings the average down from 2 seconds to 1.65 seconds.
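A sketch of such a sweep, modelled loosely on the scroll-up loop used later in this thread. The helper name `my/jit-lock-chunk-size-sweep` is hypothetical; it relies on `jit-lock-chunk-size` being a special variable, so a `let` binding is what jit-lock sees while `redisplay` runs inside it.

```elisp
(defun my/jit-lock-chunk-size-sweep ()
  "Time one full scroll through the buffer per chunk size.
Hypothetical helper; returns a list of (CHUNK-SIZE . SECONDS)."
  (interactive)
  (let (results)
    (dolist (size (number-sequence 1500 4500 500))
      (let ((jit-lock-chunk-size size)
            start)
        (font-lock-flush)               ; forget existing fontification
        (goto-char (point-min))
        (set-window-start nil (point-min))
        (setq start (float-time))
        ;; `scroll-up' signals `end-of-buffer' once it cannot go further.
        (condition-case nil
            (while t (scroll-up) (redisplay))
          (end-of-buffer nil))
        (push (cons size (- (float-time) start)) results)))
    (setq results (nreverse results))
    (message "%S" results)
    results))
```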

The density of that file absolutely is a concern performance-wise.

Theo


[0]: https://lists.gnu.org/archive/html/emacs-devel/2021-09/msg00538.html

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 21:56             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-21  1:27               ` Yuan Fu
  2022-11-21 11:00                 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
                                   ` (2 more replies)
  2022-11-21 12:41               ` Eli Zaretskii
  1 sibling, 3 replies; 18+ messages in thread
From: Yuan Fu @ 2022-11-21  1:27 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Eli Zaretskii, 59415



> On Nov 20, 2022, at 1:56 PM, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors <bug-gnu-emacs@gnu.org> wrote:
> 
>> I tried the “top-level node” approach, and it didn’t help in
>> packet-rrc.c: the top-level node (a function definition) is still too
>> large (spans 7680306-9936062). Since the case I described in the
>> comment against using treesit-node-on is the exception rather than the
>> norm, maybe we can go the other way around: use treesit-node-on first,
>> and if the node seems too small (by some heuristic), enlarge it to
>> some degree.
>> 
> 
> Makes sense!

I pushed a change that uses treesit-node-on. Now scrolling in most parts of the buffer is pretty fast. Scrolling around line 194770 is still laggy, because the node we get from treesit-node-on is still too large. I tried some heuristics, but they didn’t work very well, IMO because tree-sitter couldn’t parse that part of the code very well. The code should follow a structure like {{}, {}, {}, {}, {}, …} with tens of thousands of inner braces, so ideally we would only need to grab the {}’s in the region we want to fontify. But tree-sitter seems to parse it into some odd structure, and we still end up with very large nodes, far larger than the region we want to fontify and slow to query.

I’ll try to improve it further in the future, but for now I think it’s good enough (because in most cases fontification is pretty fast).
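To see the mismatch described above for a given screenful, a small diagnostic like the following can help. It is a hypothetical sketch (`my/report-fontify-node-size` is not part of treesit.el): it reports how large the node returned by treesit-node-on is compared with the visible region.

```elisp
(defun my/report-fontify-node-size ()
  "Compare the node `treesit-node-on' returns with the visible region.
Hypothetical diagnostic: a node much larger than the window means the
query visits far more of the tree than the region being fontified."
  (interactive)
  (let* ((start (window-start))
         (end (window-end nil t))       ; t: make sure the value is current
         (node (treesit-node-on start end)))
    (if (null node)
        (message "No node covers %d-%d" start end)
      (message "Window spans %d chars; node `%s' spans %d chars (%d-%d)"
               (- end start)
               (treesit-node-type node)
               (- (treesit-node-end node) (treesit-node-start node))
               (treesit-node-start node)
               (treesit-node-end node)))))
```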

Also I think we should probably disable fontifying errors in C. C’s macros just create too many errors.

Yuan

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21  1:27               ` Yuan Fu
@ 2022-11-21 11:00                 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-21 13:44                 ` Eli Zaretskii
  2022-11-21 15:15                 ` Eli Zaretskii
  2 siblings, 0 replies; 18+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-21 11:00 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, 59415

Yuan Fu <casouri@gmail.com> writes:

> I pushed a change that uses treesit-node-on. Now scrolling in most
> parts of the buffer is pretty fast. Scrolling around line 194770 is
> still laggy, because the node we get from treesit-node-on is still too
> large. I tried some heuristics, but they didn’t work very well, IMO
> because tree-sitter couldn’t parse that part of the code very
> well. The code should follow a structure like {{}, {}, {}, {}, {}, …}
> with tens of thousands of inner braces, so ideally we would only
> need to grab the {}’s in the region we want to fontify. But
> tree-sitter seems to parse it into some odd structure, and we
> still end up with very large nodes, far larger than the
> region we want to fontify and slow to query.
>
> I’ll try to improve it further in the future, but for now I think it’s
> good enough (because in most cases fontification is pretty fast).
>
> Also I think we should probably disable fontifying errors in C. C’s
> macros just create too many errors.


Good job.  I ran this in both c-ts-mode and c-mode on the same file:

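(defun scroll-up-benchmark ()
  "Scroll to the end of the buffer, redisplaying each screen; report GCs and time."
  (interactive)
  (let ((oldgc gcs-done)
        (oldtime (float-time)))
    ;; `scroll-up' signals an error at the end of the buffer, ending the loop.
    (condition-case nil (while t (scroll-up) (redisplay))
      (error (message "GCs: %d Elapsed time: %f seconds"
                      (- gcs-done oldgc) (- (float-time) oldtime))))))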


c-ts-mode: GCs: 87 Elapsed time: 135.700742 seconds

c-mode: GCs: 224 Elapsed time: 133.329396 seconds

Font locking seems correct too.

Theo

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-20 21:56             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-21  1:27               ` Yuan Fu
@ 2022-11-21 12:41               ` Eli Zaretskii
  1 sibling, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-21 12:41 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: casouri, 59415

> From: Theodor Thornhill <theo@thornhill.no>
> Cc: Eli Zaretskii <eliz@gnu.org>, Bug Report Emacs <bug-gnu-emacs@gnu.org>
> Date: Sun, 20 Nov 2022 22:56:12 +0100
> 
> > I tried the “top-level node” approach, and it didn’t help in
> > packet-rrc.c: the top-level node (a function definition) is still too
> > large (spans 7680306-9936062). Since the case I described in the
> > comment against using treesit-node-on is the exception rather than the
> > norm, maybe we can go the other way around: use treesit-node-on first,
> > and if the node seems too small (by some heuristic), enlarge it to
> > some degree.
> >
> 
> Makes sense!
> 
> BTW, should jit-lock's chunk size be up for discussion again?  I
> ran the benchmarks from this thread [0] on this file, and it seems that
> increasing jit-lock-chunk-size from 1500 to 4500 in increments of 500
> brings the average down from 2 seconds to 1.65 seconds.
> 
> The density of that file absolutely is a concern performance-wise.

FWIW, if the root cause is the humongous data structure, I'm not too
worried, because such cases are extremely rare.  If some clever idea arises
that could improve things without endangering more practical use cases, then
fine; otherwise, I'm okay with the slightly slower performance in these
extreme cases -- after all, the interactive responsiveness is not that bad.

But I still don't understand why fontification stopped _completely_
starting at that line.  That is, if the entire struct is in error, why is
most of it fontified, and only the last part isn't?  What is the mechanism
which causes that?

Thanks.

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21  1:27               ` Yuan Fu
  2022-11-21 11:00                 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-21 13:44                 ` Eli Zaretskii
  2022-11-21 15:15                 ` Eli Zaretskii
  2 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-21 13:44 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 59415, theo

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 20 Nov 2022 17:27:16 -0800
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  59415@debbugs.gnu.org
> 
> Also I think we should probably disable fontifying errors in C. C’s macros just create too many errors.

Let's make it optional.  I, at least, would like to see those errors.
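A user-side sketch of such a switch. It assumes the mode lists a font-lock feature named `error` in `treesit-font-lock-feature-list` (check that variable in your Emacs; the feature name is an assumption here) and uses `treesit-font-lock-recompute-features` from treesit.el. The function `my/c-ts-mode-toggle-error-face` is hypothetical.

```elisp
;; Assumes c-ts-mode lists a font-lock feature named `error' in
;; `treesit-font-lock-feature-list'; check that variable first.
(defun my/c-ts-mode-toggle-error-face (&optional enable)
  "Turn tree-sitter error highlighting off, or on when ENABLE is non-nil."
  (if enable
      (treesit-font-lock-recompute-features '(error) nil)
    (treesit-font-lock-recompute-features nil '(error)))
  ;; Refontify so the change shows up in already-displayed text.
  (font-lock-flush))

;; Called with no argument from the hook, so errors are not highlighted.
(add-hook 'c-ts-mode-hook #'my/c-ts-mode-toggle-error-face)
```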

Thanks.

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21  1:27               ` Yuan Fu
  2022-11-21 11:00                 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-21 13:44                 ` Eli Zaretskii
@ 2022-11-21 15:15                 ` Eli Zaretskii
  2022-11-21 16:53                   ` Yuan Fu
  2 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-21 15:15 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 59415, theo

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 20 Nov 2022 17:27:16 -0800
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  59415@debbugs.gnu.org
> 
> I pushed a change that uses treesit-node-on. Now scrolling in most parts of the buffer is pretty fast. Scrolling around line 194770 is still laggy, because the node we get from treesit-node-on is still too large. I tried some heuristics, but they didn’t work very well, IMO because tree-sitter couldn’t parse that part of the code very well. The code should follow a structure like {{}, {}, {}, {}, {}, …} with tens of thousands of inner braces, so ideally we would only need to grab the {}’s in the region we want to fontify. But tree-sitter seems to parse it into some odd structure, and we still end up with very large nodes, far larger than the region we want to fontify and slow to query.
> 
> I’ll try to improve it further in the future, but for now I think it’s good enough (because in most cases fontification is pretty fast).

Agreed.

Thanks, I think we can close the bug now.

What do you think about enlarging treesit-max-buffer-size as I proposed
up-thread?

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21 15:15                 ` Eli Zaretskii
@ 2022-11-21 16:53                   ` Yuan Fu
  2022-11-21 17:17                     ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Yuan Fu @ 2022-11-21 16:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 59415, Theodor Thornhill



> On Nov 21, 2022, at 7:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> Agreed.
> 
> Thanks, I think we can close the bug now.
> 
> What do you think about enlarging treesit-max-buffer-size as I proposed
> up-thread?

Yeah, we should do it, but to what value? 40MB?

Yuan

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21 16:53                   ` Yuan Fu
@ 2022-11-21 17:17                     ` Eli Zaretskii
  2022-11-22  7:31                       ` Yuan Fu
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2022-11-21 17:17 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 59415, theo

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 21 Nov 2022 08:53:25 -0800
> Cc: Theodor Thornhill <theo@thornhill.no>,
>  59415@debbugs.gnu.org
> 
> > What do you think about enlarging treesit-max-buffer-size as I proposed
> > up-thread?
> 
> Yeah, we should do it, but to what value? 40MB?

My suggestion was 15 MiB on 32-bit systems and 40 MiB on 64-bit systems.
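A sketch of how that default might be computed, under the assumption that a build counts as 32-bit if its fixnums are narrower than 2^31 or if it was configured with --with-wide-int (as in the original report's i686 build, where wide ints make the fixnum test alone insufficient). The helper name `my/treesit-default-max-buffer-size` is hypothetical.

```elisp
(defun my/treesit-default-max-buffer-size ()
  "Return 15 MiB on 32-bit builds and 40 MiB on 64-bit builds.
Hypothetical helper.  A build counts as 32-bit if its fixnums are
narrower than 2^31, or if it was configured with --with-wide-int,
which widens fixnums on 32-bit hosts."
  (let ((mib (* 1024 1024)))
    (if (or (< most-positive-fixnum (* 2.0 1024 mib)) ; 2^31 as a float
            (and (stringp system-configuration-options)
                 (string-match-p "--with-wide-int"
                                 system-configuration-options)))
        (* 15 mib)
      (* 40 mib))))
```

It could then be used as `(setq treesit-max-buffer-size (my/treesit-default-max-buffer-size))`. Note that a 64-bit build configured with the (redundant) --with-wide-int flag would also get 15 MiB under this heuristic.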

* bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
  2022-11-21 17:17                     ` Eli Zaretskii
@ 2022-11-22  7:31                       ` Yuan Fu
  0 siblings, 0 replies; 18+ messages in thread
From: Yuan Fu @ 2022-11-22  7:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 59415, Theodor Thornhill, 59415-done



> On Nov 21, 2022, at 9:17 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> My suggestion was 15 MiB on 32-bit systems and 40 MiB on 64-bit systems.

Cool, changed, will push soon.

end of thread

Thread overview: 18+ messages
2022-11-20 17:55 bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file Eli Zaretskii
2022-11-20 19:54 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-20 20:16   ` Eli Zaretskii
2022-11-20 20:33     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-20 20:51       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-20 20:59       ` Yuan Fu
2022-11-20 21:09         ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-20 21:27           ` Yuan Fu
2022-11-20 21:56             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-21  1:27               ` Yuan Fu
2022-11-21 11:00                 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-11-21 13:44                 ` Eli Zaretskii
2022-11-21 15:15                 ` Eli Zaretskii
2022-11-21 16:53                   ` Yuan Fu
2022-11-21 17:17                     ` Eli Zaretskii
2022-11-22  7:31                       ` Yuan Fu
2022-11-21 12:41               ` Eli Zaretskii
2022-11-20 20:17 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
