> Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 11 May 2022 13:14:33 -0700
>> Cc: Yoav Marco <yoavm448@gmail.com>,
>>  emacs-devel@gnu.org
>>
>> I redid the benchmark, but without his reuse patch, just to see how
>> much time is spent on creating query objects. So fontifying 40 lines
>> for 463 times takes 6.92s (according to Emacs, 7.30s according to the
>> profiler). That counts to 0.0158s per call to font-lock-region, of
>> which 0.0104s is spent on creating the query object. That seems to
>> tell me if we optimize away the query object creation we can make
>> font-locking very very fast?

This is a little confusing: which profiler are we talking about? Is the
difference between Emacs's 6.92s and the profiler's 7.30s because Emacs
is only benchmarking the loop, while the profiler measures the entire
execution? Query compilation doesn't improve startup time, so the
conclusion that only 10ms is spent on query compilation might be wrong.
And it probably is: in my benchmark, query compilation improved
performance by much more than 16/6≈267%: it went from 6.06 to 0.01.

> According to your benchmarks, it is already very fast: 16 msec is a
> negligible time interval.  Of course, 40 is a somewhat arbitrary
> number, but to get a less arbitrary one, we should determine it from
> some concrete scenarios, such as the 512-character chunk JIT font-lock
> uses during redisplay, or the number of lines on a typical window
> that's important when one scrolls with C-v/M-v, etc.

It's easy enough to convert the benchmarks to 512-char chunks rather
than 40 lines. See the table a few paragraphs below.

>> font-lock: 88.28s -> 0.1997285067873303 / loop
>
> So we already have an order-of-magnitude speed-up with tree-sitter: we
> go from 200 msec down to 16 msec.  Also, 200 msec is above the
> threshold of human perception of a response delay, whereas 16 msec is
> way below that threshold.  With such significantly faster font-lock, I
> wouldn't bother caching anything, at least not yet, not unless someone
> comes up with a practical use case where the query-compilation part
> really makes a significant practical difference in terms of absolute
> response times.

> Bottom line: I think the 6-msec speedup (from 16 to 10) in the
> scenario that was used in these benchmarks doesn't justify the
> complexities of caching the queries, given the overall excellent
> performance we get with tree-sitter.  Caching is an optimization, and
> in this case it sounds like doing that now would be a premature
> optimization.

As I said above, I think 16→10 is the wrong conclusion.

>> If we expose "compiled query" we don’t need to cache them either.
>
> Then the Lisp program will have to do that, which is even worse,
> because the problems I described will now have to be solved by Lisp
> application programmers, each time anew.

Will they? They'd just need to compile their queries once, when defining
them or when setting treesit-font-lock-defaults.
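
Roughly what I have in mind, as an untested sketch. It assumes the
compiled query is exposed through something like
`treesit-query-compile' (the name is my assumption, no such function
is exposed right now) and that `treesit-query-capture' accepts the
compiled object. The `my-' names are made up for illustration.

```elisp
;; Sketch only.  `treesit-query-compile' stands in for whatever the
;; exposed primitive ends up being called (an assumption), and the
;; `my-' names are hypothetical.
(defvar my-c-comment-query nil
  "Compiled comment query, or nil if not compiled yet.")

(defun my-c-comment-query ()
  "Return the compiled comment query, compiling it on first use only."
  (or my-c-comment-query
      (setq my-c-comment-query
            (treesit-query-compile
             'c '((comment) @font-lock-comment-face)))))

(defun my-c-capture-comments (beg end)
  "Capture comment nodes between BEG and END using the cached query."
  (treesit-query-capture (treesit-buffer-root-node 'c)
                         (my-c-comment-query) beg end))
```

The mode author pays the compilation cost once per session; every
later fontification call only runs the capture.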

Right now the most convenient way to represent queries is as sexps, but
although treesit accepts queries as lists, major modes are encouraged to
stringify them, since the tree-sitter API works with string queries.
This exact discussion occurred when Theodor asked for feedback on
go-mode.el:

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 2022-05-09 21:10 UTC
> To: Eli Zaretskii
>
> I have some comments below, I haven’t tested the patch yet.
>>
>> +(defvar js-treesit-font-lock-settings-1
>> +  '((javascript
>> +     (
>> +      ((identifier) @font-lock-constant-face
>> +       (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
>
> I would use treesit-expand-query to “expand” the sexp query to string,
> so Emacs don’t need to re-expand it every time treesit-query-capture is
> called. I don’t know how much it speed things up, but hey its free.

Why don't we check how much it speeds things up?

|   |                     | font-lock | TS sexp |     TS | TS query reuse |
|---+---------------------+-----------+---------+--------+----------------|
| 1 | xdisp.c all at once |    12.886 |   0.031 |  0.016 |          0.017 |
| 2 | 20 × 512c           |     0.273 |   0.214 |  0.209 |          0.000 |
| 3 | 512c to end         |       4m+ |  24.177 | 23.474 |          0.037 |
(All times in seconds.)

So the time to stringify is negligible compared to query compilation.
Also, I don't know why font lock took that much time in the last
benchmark.

> or the number of lines on a typical window that's important when one
> scrolls with C-v/M-v, etc.

The following calculation sounds a little silly to me, but here it is anyway.

xdisp.c has 32.3 chars per line on average, so each 512-char
fontification covers about 15.8 lines. My Emacs window fits 50 lines,
so when jumping to an unfontified buffer location I'll need 4
fontification calls. That would take, depending on the engine:

| font-lock | TS sexp |    TS | TS query reuse |
|-----------+---------+-------+----------------|
|     0.054 |   0.042 | 0.041 |          0.000 |
(The 20 × 512c row divided by 5, to represent 4 × 512c; times in seconds.)
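
For anyone who wants to redo the arithmetic with their own window
size, here it is as a small sketch. The helper names are made up for
this mail. The inputs are the figures above: 512-char chunks, 32.3
chars per line, a 50-line window, and the 20 × 512c row.

```elisp
;; Hypothetical helpers; the names are mine.
(defun my-lines-per-chunk (chunk-chars chars-per-line)
  "Average number of lines covered by one chunk of CHUNK-CHARS characters."
  (/ (float chunk-chars) chars-per-line))

(defun my-fontification-calls (window-lines chunk-chars chars-per-line)
  "Number of chunks of CHUNK-CHARS needed to cover WINDOW-LINES lines."
  (ceiling (/ window-lines
              (my-lines-per-chunk chunk-chars chars-per-line))))

(defun my-scale-times (times calls measured-calls)
  "Scale TIMES, measured over MEASURED-CALLS calls, down to CALLS calls."
  (mapcar (lambda (sec) (* sec (/ (float calls) measured-calls)))
          times))

(my-fontification-calls 50 512 32.3)            ; => 4
;; The 20 × 512c row scaled down to 4 calls:
(my-scale-times '(0.273 0.214 0.209 0.0) 4 20)
```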

Improving fontification by 41ms is worth it in my opinion, as long as
it's not complicated. It shouldn't be if we let users compile their
queries before use, though I don't know the downsides of exposing
another type to Lisp. (Currently tree-sitter adds two new types,
treesit-node and treesit-parser.)

 - Yoav