unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Yoav Marco <yoavm448@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Yuan Fu <casouri@gmail.com>, emacs-devel@gnu.org
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Thu, 12 May 2022 17:16:41 +0300	[thread overview]
Message-ID: <87sfpekf6t.fsf@gmail.com> (raw)
In-Reply-To: <838rr7qqhw.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 5405 bytes --]

> Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 11 May 2022 13:14:33 -0700
>> Cc: Yoav Marco <yoavm448@gmail.com>,
>>  emacs-devel@gnu.org
>>
>> I redid the benchmark, but without his reuse patch, just to see how
>> much time is spent on creating query objects. So fortifying 40 lines
>> for 463 times takes 6.92s (according to Emacs, 7.30s according to the
>> profiler). That counts to 0.0158s per call to font-lock-region, of
>> which 0.0104s is spent on creating the query object. That seems to
>> tell me if we optimize away the query object creation we can make
>> font-locking very very fast?

This is a little confusing, which profiler are we talking about? Is the
difference between Emacs's 6.92s and the profiler's 7.30 because Emacs
is only benchmarking the loop, and the profiler's measuring the entire
execution? Query compilation doesn't improve startup time, so the
conclusion that only 10ms is spent on query compilation might be wrong.
And it probably is: in my benchmark, query compilation improved
performance in much more than 16/6=266%: it went from 6.06 to 0.01.

> According to your benchmarks, it is already very fast: 16 msec is a
> negligible time interval.  Of course, 40 is a somewhat arbitrary
> number, but to get a less arbitrary one, we should determine it from
> some concrete scenarios, such as the 512-character chunk JIT font-lock
> uses during redisplay, or the number of lines on a typical window
> that's important when one scrolls with C-v/M-v, etc.

It's easy enough to convert the benchmarks to 512-chars chunks rather
than 40 lines. See table a few paragraphs below.

>> font-lock: 88.28s -> 0.1997285067873303 / loop
>
> So we already have an order-of-magnitude speed-up with tree-sitter: we
> go from 200 msec down to 16 msec.  Also, 200 msec is above the
> threshold of human perception of a response delay, whereas 16 msec is
> way below that threshold.  With such significantly faster font-lock, I
> wouldn't bother caching anything, at least not yet, not unless someone
> comes up with a practical use case where the query-compilation part
> really makes a significant practical difference in terms of absolute
> response times.

> Bottom line: I think the 6-msec speedup (from 16 to 10) in the
> scenario that was used in these benchmarks doesn't justify the
> complexities of caching the queries, given the overall excellent
> performance we get with tree-sitter.  Caching is an optimization, and
> in this case it sounds like doing that now would be a premature
> optimization.

As said, I think 16→10 is a wrong conclusion.

>> If we expose "compiled query” we don’t need to cache them either.
>
> Then the Lisp program will have to do that, which is even worse,
> because the problems I described will now have to be solved by Lisp
> application programmers, each time anew.

Will they? They'd just need to compile their queries once, when defining
them or when setting treesit-font-lock-defaults.

Right now the most convenient way to represent queries is as sexps, but
although treesit accepts queries as lists major-modes are encouraged to
stringify them, since the tree-sitter API works with string queries.
This exact discussion occured when Theodor asked for feedback on the
go-mode.el:

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 2022-05-09 21:10 UTC
> To: Eli Zaretskii
>
> I have some comments below, I haven’t tested the patch yet.
>>
>> +(defvar js-treesit-font-lock-settings-1
>> +  '((javascript
>> +     (
>> +      ((identifier) @font-lock-constant-face
>> +       (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
>
> I would use treesit-expand-query to “expand” the sexp query to string,
> so Emacs don’t need to re-expand it every time treesit-query-capture is
> called. I don’t know how much it speed things up, but hey its free.

Why don't we check how much it speeds things up?

|   |                     | font-lock | TS sexp |     TS | TS query reuse |
| 1 | xdisp.c all at once |    12.886 |   0.031 |  0.016 |          0.017 |
| 2 | 20 × 512c           |     0.273 |   0.214 |  0.209 |          0.000 |
| 3 | 512c to end         |       4m+ |  24.177 | 23.474 |          0.037 |

So the time to stringify is negligible compared to query compilation.
Also, I don't know why font lock took that much time in the last
benchmark.

> or the number of lines on a typical window that's important when one
> scrolls with C-v/M-v, etc.
The following calculation sounds a little silly to me, but here it is anyway.

xdisp.c has 32.3 chars per line on average, so each 512 char
fontification covers 15.8 lines. My Emacs window can fit 50 lines, so
when jumping to an unfontified buffer location I'll need 4 calls for
fontification. That would take, depending on the engine:

| font-lock | TS sexp |    TS | TS query reuse |
|     0.054 |   0.042 | 0.041 |           0.00 |
(The 20 × 512c row, divided by 5 to represent 4 × 512c)

Improving fontification by 41ms is worth it in my opinion, as long as
it's not complicated, which it shouldn't be when letting users compile
their queries before use, though I don't know the downsides of exposing
another type to lisp. (Currently tree-sitter adds two new types,
treesit-node and treesit-parser.)

 - Yoav

[-- Attachment #2: tree-sitter-benchmark.el --]
[-- Type: text/plain, Size: 3156 bytes --]

;;; tree-sitter-benchmark.el -*- lexical-binding: t; -*-
;; run benchmark with
;; emacs -Q --script tree-sitter-benchmark.el [-1] [-2] [-3] [-regexp]

(require 'treesit)
(require 'cc-mode)

(defvar query-type 'list
  "How to save the query. Either `string' or `list'.")
(defcustom fontifying-mode 'treesit
  "Benchmark mode."
  :type '(choice (const treesit)
                 (const regexp)))
(setq fontifying-mode 'regexp)


(defvar c-font-lock-settings-1
  `((c
     ,(with-temp-buffer
        (insert-file-contents-literally "./highlights.scm")
        ;; make capture names map to a face, any face
        (goto-char (point-min))
        (while (re-search-forward "@[a-z.]+" nil t)
          (replace-match "@font-lock-string-face" t))
        (pcase query-type
          ('string
           (buffer-substring (point-min) (point-max)))
          ('list
           (goto-char (point-min))
           (insert "(")
           (goto-char (point-max))
           (insert ")")
           (goto-char (point-min))
           (read (current-buffer)))
          (_ (user-error "`query-type' must be 'string or 'list")))))))

(defun setup-fontification ()
  (pcase fontifying-mode
    ('treesit
     (treesit-get-parser-create 'c)
     ;; This needs to be non-nil, because reasons
     (unless font-lock-defaults
       (setq font-lock-defaults '(nil t)))
     (setq-local treesit-font-lock-defaults
                 '((c-font-lock-settings-1)))
     (treesit-font-lock-enable)
     (advice-add #'font-lock-default-fontify-region :override #'ignore))
    ('regexp
     (c-mode))))

(defun fontify (beg end)
  (pcase fontifying-mode
    ('treesit (font-lock-fontify-region beg end))
    ('regexp (font-lock-fontify-region beg end nil))))

(defun buffer-middle ()
  (/ (+ (point-min) (point-max)) 2))

(with-temp-buffer
  (message "Fontification method: %s %s" fontifying-mode query-type)
  (setup-fontification)
  (insert-file-contents "xdisp.c")
  (apply #'message
         "Benchmark 1: fontify xdisp.c all at once.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (fontify (point-min) (point-max))))

  (set-text-properties (point-min) (point-max) nil)

  ;; fontify xdisp.c from the middle, since it starts with a comment header of
  ;; 22k chars
  (goto-char (buffer-middle))
  (apply #'message
         "Benchmark 2: fontify part of xdisp.c, 20 batches of 512 chars.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (dotimes (_ 20)
             (fontify (point) (min (+ 512 (point)) (point-max)))
             (forward-char 512))))

  (kill-new (buffer-substring (buffer-middle) (+ (* 512 10) (buffer-middle))))

  (set-text-properties (point-min) (point-max) nil)

  (goto-char (point-min))
  (apply #'message
         "Benchmark 3: fontify all of xdisp.c, 512 chars at a time.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (while (/= (point-max) (point))
             (fontify (point) (min (+ 512 (point)) (point-max)))
             (goto-char (min (+ 512 (point)) (point-max))))))



  (advice-remove #'font-lock-default-fontify-region #'ignore))

  parent reply	other threads:[~2022-05-12 14:16 UTC|newest]

Thread overview: 150+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-09 17:50 Tree-sitter integration on feature/tree-sitter Yoav Marco
2022-05-09 20:51 ` Yuan Fu
     [not found]   ` <87lev9wyll.fsf@gmail.com>
2022-05-10 15:20     ` Yoav Marco
2022-05-10 15:43   ` Yoav Marco
2022-05-10 17:54     ` Yuan Fu
2022-05-10 18:18       ` Yoav Marco
2022-05-10 19:58         ` Stefan Monnier
2022-05-10 23:11           ` Yuan Fu
2022-05-10 23:53             ` Yuan Fu
2022-05-11 11:10         ` Eli Zaretskii
2022-05-11 11:16           ` Yoav Marco
2022-05-11 14:20             ` Eli Zaretskii
2022-05-11 15:40               ` Yoav Marco
2022-05-11 16:27                 ` Eli Zaretskii
2022-05-11 20:14                   ` Yuan Fu
2022-05-11 20:25                     ` Yuan Fu
2022-05-12  5:19                       ` Eli Zaretskii
2022-05-12  6:10                         ` Yuan Fu
2022-05-12  7:12                           ` Eli Zaretskii
2022-05-12 15:18                         ` Stefan Monnier
2022-05-12 15:53                           ` Eli Zaretskii
2022-05-12  5:17                     ` Eli Zaretskii
2022-05-12  6:07                       ` Yuan Fu
2022-05-12 14:16                       ` Yoav Marco [this message]
2022-05-12 16:04                         ` Eli Zaretskii
2022-05-12 16:26                           ` Yoav Marco
2022-05-12 17:18                             ` Eli Zaretskii
2022-05-12 17:22                               ` Yoav Marco
2022-05-13  6:34                                 ` Eli Zaretskii
2022-05-13  8:04                                   ` Theodor Thornhill
2022-05-13  8:36                                     ` Yoav Marco
2022-05-13  9:46                                       ` Theodor Thornhill
2022-05-13 10:37                                     ` Eli Zaretskii
2022-05-13 10:52                                       ` Theodor Thornhill
2022-05-13  8:42                                   ` Yoav Marco
2022-05-13 10:41                                     ` Eli Zaretskii
2022-05-14  0:04                                       ` Yuan Fu
2022-06-16 19:16                                         ` Yuan Fu
2022-06-16 21:57                                           ` yoavm448
2022-06-17  1:10                                             ` Yuan Fu
2022-05-12 15:15                       ` Stefan Monnier
2022-05-15 19:20       ` chad
2022-05-15 19:26         ` Eli Zaretskii
  -- strict thread matches above, loose matches on Subject: below --
2022-06-29 16:51 Abin Simon
2022-06-29 17:43 ` Yoav Marco
2022-06-30 11:21   ` Yoav Marco
2022-06-30 14:29     ` Abin Simon
2022-06-30 14:37       ` Yoav Marco
2022-06-28 16:08 Yoav Marco
2022-06-28 19:35 ` Yoav Marco
2022-06-29 15:35   ` Yuan Fu
2022-05-19  1:35 Kiong-Ge Liau
2022-05-19  1:35 Kiong-Ge Liau
2022-05-20  2:01 ` Yuan Fu
2022-06-16 19:03   ` Yuan Fu
2022-06-17  1:24     ` Po Lu
2022-06-18  0:09       ` Yuan Fu
2022-06-17  2:00     ` Ihor Radchenko
2022-06-17  5:23       ` Eli Zaretskii
2022-06-17 10:40         ` Ihor Radchenko
2022-06-17  6:15     ` Eli Zaretskii
2022-06-17  7:17       ` Yuan Fu
2022-06-17 10:37         ` Eli Zaretskii
2022-06-18  0:14           ` Yuan Fu
2022-06-18  6:22             ` Eli Zaretskii
2022-06-18  8:25               ` Yuan Fu
2022-06-18  8:50                 ` Eli Zaretskii
2022-06-18 20:07                   ` Yuan Fu
2022-06-19  5:39                     ` Eli Zaretskii
2022-06-20  3:00                       ` Yuan Fu
2022-06-20 11:44                         ` Eli Zaretskii
2022-06-20 20:01                           ` Yuan Fu
2022-06-21  2:26                             ` Eli Zaretskii
2022-06-21  4:39                               ` Yuan Fu
2022-06-21 10:18                                 ` Eli Zaretskii
2022-06-22  0:34                                   ` Yuan Fu
2022-06-17 11:06     ` Jostein Kjønigsen
2022-06-18  0:28       ` Yuan Fu
2022-06-18 20:57         ` Jostein Kjønigsen
2022-05-07  8:29 Yuan Fu
2022-05-07  8:44 ` Yuan Fu
2022-05-07  8:47 ` Theodor Thornhill
2022-05-07 17:59   ` Yuan Fu
2022-05-07 18:16     ` Theodor Thornhill
2022-05-07  9:04 ` Eli Zaretskii
2022-05-07  9:34   ` Theodor Thornhill
2022-05-07 18:33     ` Yuan Fu
2022-05-07 19:02       ` Theodor Thornhill
2022-05-07 18:27   ` Yuan Fu
2022-05-07 18:48     ` Eli Zaretskii
2022-05-07 19:00       ` Theodor Thornhill
2022-05-07 19:21         ` Eli Zaretskii
2022-05-07 19:11       ` Yuan Fu
2022-05-07 19:25         ` Eli Zaretskii
2022-05-07 20:00           ` Yuan Fu
2022-05-07 20:12             ` Theodor Thornhill
2022-05-07 21:24               ` Stefan Monnier
2022-05-07 22:02                 ` Theodor Thornhill
2022-05-08  6:18                 ` Eli Zaretskii
2022-05-08 12:05                   ` Dmitry Gutov
2022-05-08 12:16                     ` Stefan Monnier
2022-05-08 13:23                       ` Eli Zaretskii
2022-05-08 20:57                         ` Dmitry Gutov
2022-05-08 13:21                     ` Eli Zaretskii
2022-05-08 20:42                       ` Dmitry Gutov
2022-05-09 11:18                         ` Eli Zaretskii
2022-05-08  6:16               ` Eli Zaretskii
2022-05-08  6:49                 ` Theodor Thornhill
2022-05-08  6:58                   ` Eli Zaretskii
2022-05-08  9:02                     ` Theodor Thornhill
2022-05-08  9:09                       ` Theodor Thornhill
2022-05-08  9:10                       ` Eli Zaretskii
2022-05-08  9:19                         ` Theodor Thornhill
2022-05-08 10:33                           ` Eli Zaretskii
2022-05-08 13:47                             ` Theodor Thornhill
2022-05-08 13:58                               ` Eli Zaretskii
2022-05-08 14:01                               ` Stefan Monnier
2022-05-08 14:25                                 ` Theodor Thornhill
2022-05-08 14:42                                   ` Eli Zaretskii
2022-05-08 19:16                                     ` Theodor Thornhill
2022-05-08 21:14                                       ` Yuan Fu
2022-05-09 11:14                                       ` Eli Zaretskii
2022-05-09 12:20                                         ` Theodor Thornhill
2022-05-09 12:23                                           ` Eli Zaretskii
2022-05-09 21:10                                             ` Yuan Fu
2022-05-09 21:33                                               ` Theodor Thornhill
2022-05-14  0:03                                                 ` Yuan Fu
2022-05-14  5:03                                                   ` Theodor Thornhill
2022-05-14  5:13                                                     ` Yuan Fu
2022-05-17 21:45                                                       ` Theodor Thornhill
2022-05-18 20:52                                                         ` Yuan Fu
2022-05-18 21:07                                                           ` Theodor Thornhill
2022-06-16 19:09                                                             ` Yuan Fu
2022-06-17  6:19                                                               ` Eli Zaretskii
2022-06-17  7:32                                                                 ` Yuan Fu
2022-06-17 10:42                                                                   ` Eli Zaretskii
2022-06-18  0:20                                                                     ` Yuan Fu
2022-06-18  6:23                                                                       ` Eli Zaretskii
2022-06-20 14:20                                                                       ` Daniel Martín
2022-06-20 20:03                                                                         ` Yuan Fu
2022-06-17 18:12                                                                   ` Yoav Marco
2022-06-18  0:35                                                                     ` Yuan Fu
2022-06-18  8:15                                                                       ` Yoav Marco
2022-06-18 20:11                                                                         ` Yuan Fu
2022-05-08 22:42                             ` Stephen Leake
2022-05-14 15:09 ` Daniel Martín
2022-05-14 15:55   ` Yuan Fu
2022-05-14 18:50     ` Daniel Martín
2022-05-14 19:09       ` Eli Zaretskii
2022-06-16 19:10       ` Yuan Fu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sfpekf6t.fsf@gmail.com \
    --to=yoavm448@gmail.com \
    --cc=casouri@gmail.com \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).