Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)

unofficial mirror of emacs-devel@gnu.org 

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
@ 2020-03-29 18:46 Stefan Monnier
  2020-03-29 19:05 ` Andrea Corallo
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-29 18:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> tree-sitter, like LSP, is something Emacs should embrace.
>   https://lists.gnu.org/archive/html/emacs-devel/2020-01/msg00059.html

Ah, thanks Eli: I guess I skipped over that while catching up.

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
>
>   https://tree-sitter.github.io/tree-sitter/

Yes, adding support for this would be great.  

> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-pps-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.

Some of those features could be provided by LSP as well, but IIUC the
way LSP is designed and usually used makes it somewhat inadequate for
synchronous use, when you want an immediate answer.

tree-sitter is designed exactly for that: it can parse "immediately",
in the same sense as `syntax-ppss`.  So LSP seems inapplicable (in the
near future at least) for things like font-lock, navigation, and
indentation, whereas tree-sitter should work great for those.

[ W.r.t. discussions around LSP's use of JSON: AFAICT, parsing and
  emitting JSON can be done as efficiently as any other format, so
  I don't see the use of JSON as a problem in the protocol.  ]

> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries.

The existing third-party packages should be good starting points to come
up with a design.  But I think an important issue is to figure out how
to make tree-sitter usable for end users: AFAICT the main issue is
how to let end users download and install new grammars.
IIUC grammars are written in JavaScript (or some subset thereof?) and
then somehow compiled to C code.  Having them as C code implies either
that end users need a C compiler, or that we distribute pre-compiled
binaries with all the trouble this entails (all the variations of
OSes, architectures, ABIs, ..., plus issues related to
licensing, security, ...).

Maybe those grammars could be compiled to some other representation (I
don't know if it is made mostly of data-tables or actual code or what)?

        Stefan


* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 18:46 Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
@ 2020-03-29 19:05 ` Andrea Corallo
  2020-03-29 19:18   ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Andrea Corallo @ 2020-03-29 19:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Maybe those grammars could be compiled to some other representation (I
> don't know if it is made mostly of data-tables or actual code or what)?

IMO ideally they should be Lisp and we should leverage the native compiler
for that, but I understand we are not there yet.

  Andrea

-- 
akrl@sdf.org




* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 19:05 ` Andrea Corallo
@ 2020-03-29 19:18   ` Eli Zaretskii
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
  0 siblings, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-29 19:18 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: monnier, emacs-devel

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> Date: Sun, 29 Mar 2020 19:05:57 +0000
> 
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
> > Maybe those grammars could be compiled to some other representation (I
> > don't know if it is made mostly of data-tables or actual code or what)?
> 
> IMO ideally they should be Lisp and we should leverage the native compiler
> for that, but I understand we are not there yet.

FWIW, it should indeed be possible to develop the grammars in Lisp,
but that is not the first goal in bringing such a package to Emacs.
Not even the second one.  Because once such a package can be used with
Emacs, and the results are significantly better than what we have
today, you will see someone come up with a way of doing that in Lisp
in no time.  Making the connection happen, and coming up with a good
design for that, should be the first goal.  IMO, we should identify
the features that can benefit from that (font-lock is just one of
them, maybe not even the most important one), and design the
interfaces and the infrastructure so that it could support them all
(and then some).  But I repeat myself.


* Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:18   ` Eli Zaretskii
@ 2020-03-29 19:29     ` Yuan Fu
  2020-03-30 14:04       ` Eli Zaretskii
  2020-03-30 15:06       ` Stefan Monnier
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
  1 sibling, 2 replies; 109+ messages in thread
From: Yuan Fu @ 2020-03-29 19:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Stefan Monnier, Andrea Corallo

A related question: is there a reliable way to be notified when buffer text changes? AFAICT both tree-sitter and LSP need to know about incremental changes. Both LSP packages (lsp-mode and eglot) add hooks to after-change-functions, but their hooks are not guaranteed to run because of inhibit-modification-hooks. Undo seems to always know the exact change, but it doesn’t seem to have a hook available.

Yuan
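
[ For illustration only, not part of the original message: a minimal
  sketch of how a client can pick up incremental changes through
  `after-change-functions'.  Each function on that hook is called with
  the start and end of the changed text and the length of the text it
  replaced.  The names `my-edits', `my-record-edit' and
  `my-demo-record-edits' are hypothetical and not taken from lsp-mode
  or eglot. ]

```elisp
;; Hypothetical names throughout; this is a sketch, not package code.
(defvar-local my-edits nil
  "Edits seen in this buffer, newest first, as (BEG END OLD-LEN) lists.")

(defun my-record-edit (beg end old-len)
  "Record one change.
BEG and END delimit the new text; OLD-LEN is the length of the text
that was there before the change."
  (push (list beg end old-len) my-edits))

(defun my-demo-record-edits ()
  "Return the edits recorded for one insertion and one deletion."
  (with-temp-buffer
    ;; The final t makes the hook buffer-local.
    (add-hook 'after-change-functions #'my-record-edit nil t)
    (insert "hello world")   ; insertion: 11 new characters at position 1
    (delete-region 1 7)      ; deletion: removes the 6 characters "hello "
    (reverse my-edits)))     ; oldest edit first
```

An incremental parser would translate each (BEG END OLD-LEN) triple
into its own edit description instead of merely storing it.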


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
@ 2020-03-30 14:04       ` Eli Zaretskii
  2020-03-30 15:06       ` Stefan Monnier
  1 sibling, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-30 14:04 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, monnier, akrl

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 29 Mar 2020 15:29:41 -0400
> Cc: Andrea Corallo <akrl@sdf.org>,
>  Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
> 
> A related question: is there a reliable way to be notified when buffer text changes? AFAICT both tree-sitter and LSP need to know about incremental changes.

Why not simply pass to tree-sitter the chunk that jit-lock is about to
fontify?




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
  2020-03-30 14:04       ` Eli Zaretskii
@ 2020-03-30 15:06       ` Stefan Monnier
  2020-03-30 17:14         ` Yuan Fu
  1 sibling, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 15:06 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo

> A related question: is there a reliable way to be notified when buffer text
> changes? AFAICT both tree-sitter and LSP need to know about incremental
> changes. Both LSP packages (lsp-mode and eglot) add hooks to
> after-change-functions, but their hooks are not guaranteed to run because of
> inhibit-modification-hooks.

If they needed to be informed of the change but
`inhibit-modification-hooks` prevented it, it's a bug.
Please report it.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 15:06       ` Stefan Monnier
@ 2020-03-30 17:14         ` Yuan Fu
  2020-03-30 17:54           ` Stefan Monnier
  2020-03-31  2:24           ` Eli Zaretskii
  0 siblings, 2 replies; 109+ messages in thread
From: Yuan Fu @ 2020-03-30 17:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 595 bytes --]


> On Mar 30, 2020, at 11:06 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
> If they needed to be informed of the change but
> `inhibit-modification-hooks` prevented it, it's a bug.
> Please report it.
> 

Do you mean it’s a bug in eglot/lsp-mode, or a bug in inhibit-modification-hooks (or in the code that set it to t)?


> Why not simply pass to tree-sitter the chunk that jit-lock is about to
> fontify?


Incremental parsing seems to be the preferred way to use tree-sitter: maintain a syntax tree on the fly and query it for information later.

Yuan


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:14         ` Yuan Fu
@ 2020-03-30 17:54           ` Stefan Monnier
  2020-03-30 18:43             ` Štěpán Němec
  2020-03-31  2:24           ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 17:54 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo

>> If they needed to be informed of the change but
>> `inhibit-modification-hooks` prevented it, it's a bug.
>> Please report it.
> Do you mean it’s a bug in eglot/lsp-mode, or a bug in
> inhibit-modification-hooks (or in the code that set it to t)?

The fact that they're not informed is the bug.
So it's presumably not the fault of eglot/lsp-mode.
Whose fault it is will depend on the details of the particular situation
where it occurs.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:54           ` Stefan Monnier
@ 2020-03-30 18:43             ` Štěpán Němec
  2020-03-30 18:46               ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Štěpán Němec @ 2020-03-30 18:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel

On Mon, 30 Mar 2020 13:54:53 -0400
Stefan Monnier wrote:

>>> If they needed to be informed of the change but
>>> `inhibit-modification-hooks` prevented it, it's a bug.
>>> Please report it.
>> Do you mean it’s a bug in eglot/lsp-mode, or a bug in
>> inhibit-modification-hooks (or in the code that set it to t)?
>
> The fact that they're not informed is the bug.
> So it's presumably not the fault of eglot/lsp-mode.
> Whose fault it is will depend on the details of the particular situation
> where it occurs.

FWIW, I have described one such situation (unrelated to lsp) recently
here:

https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403

(In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
the buffer changes caused by populating dired buffers are not noticeable
in `after-change-functions'.)

I was wondering if I should report it as a bug, despite the workaround
not being particularly painful in this case (there's `dired-after-readin-hook').

-- 
Štěpán
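
[ For illustration only, not part of the original message: a minimal
  sketch of the effect described above.  A function on
  `after-change-functions' sees an ordinary insertion but is skipped
  for one made while `inhibit-modification-hooks' is bound to t.  The
  name `my-demo-inhibited-change' is hypothetical, and the code does
  not reproduce what `dired-readin' actually inserts. ]

```elisp
;; Hypothetical demo function; a sketch of the symptom, not Dired code.
(defun my-demo-inhibited-change ()
  "Return how many changes an after-change hook gets to see."
  (with-temp-buffer
    (let ((count 0))
      (add-hook 'after-change-functions
                (lambda (&rest _) (setq count (1+ count)))
                nil t)                        ; buffer-local hook
      (insert "seen")                         ; the hook runs for this one
      (let ((inhibit-modification-hooks t))   ; same binding as in the report
        (insert " but not this"))             ; the hook is skipped here
      count)))
```

Two insertions are made, but the hook is expected to count only the
first; the buffer nevertheless ends up containing both pieces of text.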




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:43             ` Štěpán Němec
@ 2020-03-30 18:46               ` Stefan Monnier
  2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:27                 ` Štěpán Němec
  0 siblings, 2 replies; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 18:46 UTC (permalink / raw)
  To: Štěpán Němec
  Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel

> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
> the buffer changes caused by populating dired buffers are not noticeable
> in `after-change-functions'.)
> I was wondering if I should report it as a bug, despite the workaround
> not being particularly painful in this case (there's `dired-after-readin-hook').

I think it deserves a bug report, yes.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:46               ` Stefan Monnier
@ 2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:42                   ` Stefan Monnier
  2020-03-30 19:27                 ` Štěpán Němec
  1 sibling, 2 replies; 109+ messages in thread
From: Yuan Fu @ 2020-03-30 19:02 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec,
	emacs-devel


> On Mar 30, 2020, at 2:46 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>> the buffer changes caused by populating dired buffers are not noticeable
>> in `after-change-functions'.)
>> I was wondering if I should report it as a bug, despite the workaround
>> not being particularly painful in this case (there's `dired-after-readin-hook').
> 
> I think it deserves a bug report, yes.
> 
> 
>        Stefan
> 

Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).

Yuan



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:02                 ` Yuan Fu
@ 2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:21                     ` Yuan Fu
  2020-04-01  0:57                     ` Stephen Leake
  2020-03-30 19:42                   ` Stefan Monnier
  1 sibling, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-30 19:10 UTC (permalink / raw)
  To: Yuan Fu; +Cc: akrl, stepnem, monnier, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 30 Mar 2020 15:02:58 -0400
> Cc: Štěpán Němec <stepnem@gmail.com>,
>  Eli Zaretskii <eliz@gnu.org>,
>  emacs-devel <emacs-devel@gnu.org>,
>  Andrea Corallo <akrl@sdf.org>
> 
> >> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
> >> the buffer changes caused by populating dired buffers are not noticeable
> >> in `after-change-functions'.)
> >> I was wondering if I should report it as a bug, despite the workaround
> >> not being particularly painful in this case (there's `dired-after-readin-hook').
> > 
> > I think it deserves a bug report, yes.
> > 
> > 
> >        Stefan
> > 
> 
> Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).

I agree with Stefan: it's a bug.  All dired-readin needs to do is call
the modification hooks after it's done reading in the directory.  It's
just an optimization that it inhibits the hooks while it runs: read
the comments there and you will see why it is done.

IMO, inhibit-modification-hooks is for when some code makes a
temporary change, or a change that no one is supposed to care about,
like changing faces.  Any other case is a bug.
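
[ For illustration only, not part of the original message: a minimal
  sketch of the pattern suggested above, i.e. inhibit the hooks while
  a batch of insertions runs, then call the change hooks once for the
  whole batch.  `my-bulk-insert' and `my-demo-bulk-insert' are
  hypothetical names; this is not the Dired code, it assumes a pure
  insertion at point, and it does not handle hooks attached to text
  properties or overlays. ]

```elisp
;; Hypothetical helper, not Emacs or Dired code.
(defun my-bulk-insert (strings)
  "Insert STRINGS at point, running the change hooks once for the batch."
  (let ((beg (point)))
    ;; Announce the upcoming change: an insertion at BEG (empty region).
    (run-hook-with-args 'before-change-functions beg beg)
    (let ((inhibit-modification-hooks t))   ; silence the per-insert hooks
      (dolist (s strings)
        (insert s)))
    ;; Report the combined result: new text from BEG to point,
    ;; replacing 0 characters of old text.
    (run-hook-with-args 'after-change-functions beg (point) 0)))

(defun my-demo-bulk-insert ()
  "Return the (BEG END OLD-LEN) calls that an after-change hook saw."
  (with-temp-buffer
    (let ((calls '()))
      (add-hook 'after-change-functions
                (lambda (beg end old-len)
                  (push (list beg end old-len) calls))
                nil t)
      (my-bulk-insert '("a" "bc" "def"))
      calls)))
```

The hook is called a single time with the bounds of all six inserted
characters, so listeners stay informed without paying for one call
per insertion.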




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:10                   ` Eli Zaretskii
@ 2020-03-30 19:21                     ` Yuan Fu
  2020-03-31  3:56                       ` Štěpán Němec
  2020-04-01  0:57                     ` Stephen Leake
  1 sibling, 1 reply; 109+ messages in thread
From: Yuan Fu @ 2020-03-30 19:21 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: akrl, Štěpán Němec, Stefan Monnier,
	emacs-devel



> On Mar 30, 2020, at 3:10 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 30 Mar 2020 15:02:58 -0400
>> Cc: Štěpán Němec <stepnem@gmail.com>,
>> Eli Zaretskii <eliz@gnu.org>,
>> emacs-devel <emacs-devel@gnu.org>,
>> Andrea Corallo <akrl@sdf.org>
>> 
>>>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>>>> the buffer changes caused by populating dired buffers are not noticeable
>>>> in `after-change-functions'.)
>>>> I was wondering if I should report it as a bug, despite the workaround
>>>> not being particularly painful in this case (there's `dired-after-readin-hook').
>>> 
>>> I think it deserves a bug report, yes.
>>> 
>>> 
>>>       Stefan
>>> 
>> 
>> Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).
> 
> I agree with Stefan: it's a bug.  All dired-readin needs to do is call
> the modification hooks after it's done reading in the directory.  It's
> just an optimization that it inhibits the hooks while it runs: read
> the comments there and you will see why it is done.
> 
> IMO, inhibit-modification-hooks is for when some code makes a
> temporary change, or a change that no one is supposed to care about,
> like changing faces.  Any other case is a bug.

I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.

Yuan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:21                     ` Yuan Fu
@ 2020-03-31  3:56                       ` Štěpán Němec
  2020-03-31 13:16                         ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Štěpán Němec @ 2020-03-31  3:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Stefan Monnier, akrl

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

On Mon, 30 Mar 2020 15:21:10 -0400
Yuan Fu wrote:

>> IMO, inhibit-modification-hooks is for when some code makes a
>> temporary change, or a change that no one is supposed to care about,
>> like changing faces.  Any other case is a bug.
>
> I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.

I think the explanation in (info "(elisp) Change Hooks") is quite good,
but the doc string had better clarify the usage as well.

How about the attached patch?

-- 
Štěpán


[-- Attachment #2: 0001-Clarify-inhibit-modification-hooks-intended-usage-in.patch --]
[-- Type: text/x-patch, Size: 1412 bytes --]

From df7e9e1eb9e9ead46c9c8596d7f844e8b7f4d10b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify inhibit-modification-hooks intended usage in its doc
 string

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 src/insdel.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..a9fb25a27d 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+This variable should only be used for modifications that do not result
+in lasting changes to buffer text contents (for example face changes or
+temporary modifications).  If you only need to delay change hooks during
+a series of changes (typically for performance reasons), you can use
+`combine-change-calls' or `combine-after-change-calls' instead.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  3:56                       ` Štěpán Němec
@ 2020-03-31 13:16                         ` Eli Zaretskii
  2020-03-31 13:36                           ` Štěpán Němec
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 13:16 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Date: Tue, 31 Mar 2020 05:56:55 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>,
>  Stefan Monnier <monnier@iro.umontreal.ca>, akrl@sdf.org
> 
> > I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.
> 
> I think the explanation in (info "(elisp) Change Hooks") is quite good,
> but the doc string had better clarify the usage as well.
> 
> How about the attached patch?

Thanks, I think this is too wordy for a doc string.  I think it should
be enough to mention the two variables ("See also ...") and maybe add
a link to the ELisp manual section you mention.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:16                         ` Eli Zaretskii
@ 2020-03-31 13:36                           ` Štěpán Němec
  2020-03-31 14:34                             ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Štěpán Němec @ 2020-03-31 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On Tue, 31 Mar 2020 16:16:20 +0300
Eli Zaretskii wrote:

>> I think the explanation in (info "(elisp) Change Hooks") is quite good,
>> but the doc string had better clarify the usage as well.
>> 
>> How about the attached patch?
>
> Thanks, I think this is too wordy for a doc string.  I think it should
> be enough to mention the two variables ("See also ...") and maybe add
> a link to the ELisp manual section you mention.

In that case, could we add the "should" part (or something similar) to
the manual (in addition to the doc string reference you describe)? It is
true that careful reading of the manual and the relevant doc strings as
they are now could suffice to make an informed decision on when
`inhibit-modification-hooks' is (in)appropriate, but I think having some
kind of explicit heads-up or dissuasion regarding the likely misuse
would be better.

-- 
Štěpán


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:36                           ` Štěpán Němec
@ 2020-03-31 14:34                             ` Eli Zaretskii
  2020-03-31 15:37                               ` Štěpán Němec
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 14:34 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com, emacs-devel@gnu.org, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> Date: Tue, 31 Mar 2020 15:36:21 +0200
> 
> > Thanks, I think this is too wordy for a doc string.  I think it should
> > be enough to mention the two variables ("See also ...") and maybe add
> > a link to the ELisp manual section you mention.
> 
> In that case, could we add the "should" part (or something similar) to
> the manual (in addition to the doc string reference you describe)?

Most probably yes, but could you show the change you had in mind for
the manual?

> It is true that careful reading of the manual and the relevant doc
> strings as they are now could suffice to make an informed decision
> on when `inhibit-modification-hooks' is (in)appropriate, but I think
> having some kind of explicit heads-up or dissuasion regarding the
> likely misuse would be better.

I agree, and the manual is the place to have such discussions and
recommendations.

Thanks.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 14:34                             ` Eli Zaretskii
@ 2020-03-31 15:37                               ` Štěpán Němec
  2020-03-31 15:58                                 ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Štěpán Němec @ 2020-03-31 15:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

[-- Attachment #1: Type: text/plain, Size: 342 bytes --]

On Tue, 31 Mar 2020 17:34:59 +0300
Eli Zaretskii wrote:

>> In that case, could we add the "should" part (or something similar) to
>> the manual (in addition to the doc string reference you describe)?
>
> Most probably yes, but could you show the change you had in mind for
> the manual?

Another attempt attached.

  Štěpán


[-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --]
[-- Type: text/x-patch, Size: 2038 bytes --]

From ccf0390392b08bcc1aa9aff24bb62dd3bb4bbfbd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended
 usage

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 doc/lispref/text.texi | 7 +++++++
 src/insdel.c          | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 3bb055a68d..daba03fadf 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -5776,4 +5776,11 @@ Change Hooks
 may cause recursive calls to the modification hooks, so be sure to
 prepare for that (for example, by binding some variable which tells
 your hook to do nothing).
+
+@strong{Warning:} You should only bind this variable for modifications
+that do not result in lasting changes to buffer text contents (for
+example face changes or temporary modifications).  If you need to
+delay change hooks during a series of changes (typically for
+performance reasons), use @code{combine-change-calls} or
+@code{combine-after-change-calls} instead.
 @end defvar
diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..236346fada 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+To delay change hooks during a series of changes, use
+`combine-change-calls' or `combine-after-change-calls' instead of
+modifying this variable.
+
+See also the info node `(elisp) Change Hooks'.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:37                               ` Štěpán Němec
@ 2020-03-31 15:58                                 ` Eli Zaretskii
  2020-03-31 16:18                                   ` Štěpán Němec
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:58 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, emacs-devel, monnier, akrl

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com,  akrl@sdf.org,  monnier@iro.umontreal.ca,
>   emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 17:37:22 +0200
> 
> Another attempt attached.

Thanks.  I have a couple of minor nits:

> +@strong{Warning:} You should only bind this variable for modifications

I'd prefer to remove the warning, and say "We recommend that..."
rather than "You should only...".

> +To delay change hooks during a series of changes, use
> +`combine-change-calls' or `combine-after-change-calls' instead of
> +modifying this variable.
   ^^^^^^^^^
"binding"

Other than that, LGTM.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:58                                 ` Eli Zaretskii
@ 2020-03-31 16:18                                   ` Štěpán Němec
  2020-03-31 17:38                                     ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Štěpán Němec @ 2020-03-31 16:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 492 bytes --]

On Tue, 31 Mar 2020 18:58:58 +0300
Eli Zaretskii wrote:

>> +@strong{Warning:} You should only bind this variable for modifications
>
> I'd prefer to remove the warning, and say "We recommend that..."
> rather than "You should only...".
>
>> +To delay change hooks during a series of changes, use
>> +`combine-change-calls' or `combine-after-change-calls' instead of
>> +modifying this variable.
>   ^^^^^^^^^
> "binding"

Updated version attached, thank you.

  Štěpán


[-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --]
[-- Type: text/x-patch, Size: 2029 bytes --]

From 8e2a5a8c8381c85d138f34d37931c52c289da2ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended
 usage

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 doc/lispref/text.texi | 7 +++++++
 src/insdel.c          | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 3bb055a68d..0d32c571b7 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -5776,4 +5776,11 @@ Change Hooks
 may cause recursive calls to the modification hooks, so be sure to
 prepare for that (for example, by binding some variable which tells
 your hook to do nothing).
+
+We recommend that you only bind this variable for modifications that
+do not result in lasting changes to buffer text contents (for example
+face changes or temporary modifications).  If you need to delay change
+hooks during a series of changes (typically for performance reasons),
+use @code{combine-change-calls} or @code{combine-after-change-calls}
+instead.
 @end defvar
diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..dfa1cc311c 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+To delay change hooks during a series of changes, use
+`combine-change-calls' or `combine-after-change-calls' instead of
+binding this variable.
+
+See also the info node `(elisp) Change Hooks'.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 16:18                                   ` Štěpán Němec
@ 2020-03-31 17:38                                     ` Eli Zaretskii
  0 siblings, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:38 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  monnier@iro.umontreal.ca,
>   akrl@sdf.org
> Date: Tue, 31 Mar 2020 18:18:57 +0200
> 
> Updated version attached, thank you.

Perfect, thanks.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:21                     ` Yuan Fu
@ 2020-04-01  0:57                     ` Stephen Leake
  1 sibling, 0 replies; 109+ messages in thread
From: Stephen Leake @ 2020-04-01  0:57 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Is it really a bug of dired-mode? Dired-mode probably has a good
>> reason to bind `inhibit-modification-hooks` to t. And if we provide
>> such feature (disabling after-change-functions), we should expect
>> people using it. Maybe there should be a reliable way to be informed
>> of buffer changes (that cannot be silenced).
>
> I agree with Stefan: it's a bug.  All dired-readin needs to do is call
> the modification hooks after it's done reading in the directory.  It's
> just an optimization that it inhibits the hooks while it runs: read
> the comments there and you will see why it is done.
>
> IMO, inhibit-modification-hooks is for when some code makes a
> temporary change, or a change that no one is supposed to care about,
> like changing faces.  Any other case is a bug.

ada-mode occasionally binds wisi-inhibit-parse for a similar reason; it
is writing Ada source, so it is about to make several changes, during
which the buffer will be syntactically incorrect, but it will be correct
when done. The wisi after-change-functions still record changed
regions, but the parser is not called until all the changes are done.

Perhaps tree-sitter and eglot could use a similar approach.
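
The approach could be sketched as below.  This is an illustration only,
not wisi's code: `struct change_log', `begin_batch', `record_change' and
`end_batch' are hypothetical names, the parser is a stand-in that merely
counts its invocations, and the bounding region ignores the way earlier
positions shift as text is inserted or deleted.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for deferred parsing.  */
struct change_log {
  int inhibit_parse;  /* nonzero: record changes but do not parse */
  int dirty;          /* nonzero: changes recorded since last parse */
  size_t beg, end;    /* bounding region of all recorded changes */
  int parse_calls;    /* how often the stand-in parser has run */
};

/* Stand-in for the real parser: it only counts invocations.  */
static void reparse(struct change_log *log) {
  log->parse_calls++;
  log->dirty = 0;
}

/* Start a series of changes during which the text may be invalid.  */
void begin_batch(struct change_log *log) {
  log->inhibit_parse = 1;
}

/* Called for every change, whether or not parsing is inhibited.  */
void record_change(struct change_log *log, size_t beg, size_t end) {
  if (!log->dirty) {
    log->beg = beg;
    log->end = end;
    log->dirty = 1;
  } else {
    if (beg < log->beg) log->beg = beg;
    if (end > log->end) log->end = end;
  }
  if (!log->inhibit_parse)
    reparse(log);
}

/* End the series: parse once if anything changed in the meantime.  */
void end_batch(struct change_log *log) {
  log->inhibit_parse = 0;
  if (log->dirty)
    reparse(log);
}
```

The point of the design is that the change hook is never silenced; only
the expensive consumer is postponed, so no change is lost.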

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:10                   ` Eli Zaretskii
@ 2020-03-30 19:42                   ` Stefan Monnier
  1 sibling, 0 replies; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 19:42 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec,
	emacs-devel

>>> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
>>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>>> the buffer changes caused by populating dired buffers are not noticeable
>>> in `after-change-functions'.)
>>> I was wondering if I should report it as a bug, despite the workaround
>>> not being particularly painful in this case (there's `dired-after-readin-hook').
>> I think it deserves a bug report, yes.
> Is it really a bug of dired-mode?

Just file the bug report and send me the bug number so I can include it
in the commit of the fix I have here ready to be installed.


        Stefan "if you have to wonder if it's a bug, then file it as a bug"




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:46               ` Stefan Monnier
  2020-03-30 19:02                 ` Yuan Fu
@ 2020-03-30 19:27                 ` Štěpán Němec
  1 sibling, 0 replies; 109+ messages in thread
From: Štěpán Němec @ 2020-03-30 19:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Yuan Fu, emacs-devel, Eli Zaretskii, Andrea Corallo

On Mon, 30 Mar 2020 14:46:48 -0400
Stefan Monnier wrote:

> I think it deserves a bug report, yes.

Done (bug#40332), thanks.

-- 
Štěpán



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:14         ` Yuan Fu
  2020-03-30 17:54           ` Stefan Monnier
@ 2020-03-31  2:24           ` Eli Zaretskii
  2020-03-31  3:10             ` Stefan Monnier
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31  2:24 UTC (permalink / raw)
  To: Yuan Fu; +Cc: akrl, monnier, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 30 Mar 2020 13:14:02 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org,
>  Andrea Corallo <akrl@sdf.org>
> 
>  Why not simply pass to tree-sitter the chunk that jit-lock is about to
>  fontify?
> 
> Incremental parsing seems to be the preferred way to use tree-sitter—maintaining a syntax tree on the fly
> and later query for information from it.

I don't see how this contradicts my proposal of passing just the chunk
that we need to fontify.  The function that actually passes the
portion of the buffer to tree-sitter can always extend the chunk in
both directions to make parsing easier, for example to make sure it's
a complete code block.

IOW, our goal is not to build the syntax tree, it's to give
tree-sitter enough information to allow us to fontify the part that's
about to be displayed.  We need to have tree-sitter play by Emacs
rules, not teach Emacs to play by tree-sitter rules.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  2:24           ` Eli Zaretskii
@ 2020-03-31  3:10             ` Stefan Monnier
  2020-03-31 13:14               ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-31  3:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Yuan Fu, akrl, emacs-devel

> IOW, our goal is not to build the syntax tree, it's to give
> tree-sitter enough information to allow us to fontify the part that's
> about to be displayed.  We need to have tree-sitter play by Emacs
> rules, not teach Emacs to play by tree-sitter rules.

IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
keeps the parse tree up-to-date in response to buffer changes.

Its algorithm is tuned so that the time needed to update the tree is
more or less proportional to the size of the change.

So jit-lock/font-lock doesn't need to pass any part of the buffer to
tree-sitter: tree-sitter already has the buffer's content and we can
assume it's already parsed.  What emacs-tree-sitter's proposed
tree-sitter-highlight does is provide a function which takes
a START..END, then finds which part of the existing parse tree covers
that region and "reads the tree" to fontify the corresponding text.
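
To illustrate that kind of lookup, here is a minimal sketch in C.  It is
not emacs-tree-sitter's code: `struct node', `visit_region' and
`count_leaf' are hypothetical names, and the tree is a plain array-based
structure rather than tree-sitter's own node type.  It only shows how
a START..END request can be served by walking an already-built tree and
skipping subtrees that fall outside the region.

```c
#include <stddef.h>

/* Hypothetical parse-tree node covering the half-open byte range
   [start, end).  */
struct node {
  const char *type;
  size_t start, end;
  const struct node *children;
  size_t n_children;
};

typedef void (*leaf_fn)(const struct node *leaf, void *data);

/* Call FN on every leaf that overlaps [start, end).  Subtrees lying
   entirely outside the region are skipped without descending.  */
void visit_region(const struct node *n, size_t start, size_t end,
                  leaf_fn fn, void *data) {
  if (n->end <= start || n->start >= end)
    return;
  if (n->n_children == 0) {
    fn(n, data);
    return;
  }
  for (size_t i = 0; i < n->n_children; i++)
    visit_region(&n->children[i], start, end, fn, data);
}

/* Example callback: count the leaves visited.  A fontifier would
   instead map leaf->type to a face and apply it to the leaf's range.  */
void count_leaf(const struct node *leaf, void *data) {
  (void) leaf;
  ++*(size_t *) data;
}
```

The cost of such a walk depends on the nodes that overlap the region
(plus the path down to them), not on the size of the whole buffer.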

        Stefan

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  3:10             ` Stefan Monnier
@ 2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
                                   ` (2 more replies)
  0 siblings, 3 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 13:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Yuan Fu <casouri@gmail.com>,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Mon, 30 Mar 2020 23:10:57 -0400
> 
> > IOW, our goal is not to build the syntax tree, it's to give
> > tree-sitter enough information to allow us to fontify the part that's
> > about to be displayed.  We need to have tree-sitter play by Emacs
> > rules, not teach Emacs to play by tree-sitter rules.
> 
> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
> keeps the parse tree up-to-date in response to buffer changes.

Why does it need the entire buffer up front?  That sounds like a
potential performance killer.  Fontifying a small part of a buffer
doesn't need its entire text.

In any case, I hope that passing the buffer to tree-sitter doesn't
involve marshalling the entire buffer text via a function call as a
huge string, or some such.  We should instead request that tree-sitter
exposes an API through which we could give it direct access to buffer
text as 2 parts, before and after the gap, like we do with regex
code.  Otherwise this will be a bottleneck in the long run, not unlike
the problem we have with LSP.

> Its algorithm is tuned so that the time needed to update the tree is
> more or less proportional to the size of the change.
> 
> So jit-lock/font-lock doesn't need to pass any part of the buffer to
> tree-sitter: tree-sitter already has the buffer's content and we can
> assume it's already parsed.  What emacs-tree-sitter's proposed
> tree-sitter-highlight does is provide a function which takes
> a START..END, then finds which part of the existing parse tree covers
> that region and "reads the tree" to fontify the corresponding text.

I still don't see why it would need the entire buffer for this class
of applications.  Did anyone try the alternatives, in particular on
very large buffers?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
@ 2020-03-31 14:31                 ` Dmitry Gutov
  2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:11                 ` Stefan Monnier
  2020-03-31 16:13                 ` Alan Third
  2 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-03-31 14:31 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl

On 31.03.2020 16:14, Eli Zaretskii wrote:
> Why does it need the entire buffer up front? that sounds like a
> potential performance killer.  Fontifying a small part of a buffer
> doesn't need its entire text.

Because the end product of parsing the buffer is an AST, and the author 
decided to minimize the odds of problems that come with 
incomplete/broken ASTs.

The previous (first) discussion of TreeSitter has a URL to a
presentation video. You can give it a watch.

Regarding performance, their solution is to make first parsing as fast 
as possible, and updates to an existing AST faster still.

As for the difficulty of sending the whole buffer contents... maybe VS 
Code and Atom somehow make it easier? If so, someone should investigate 
why it has to be slower in Emacs.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 14:31                 ` Dmitry Gutov
@ 2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:45                     ` Dmitry Gutov
  2020-03-31 17:16                     ` Stefan Monnier
  0 siblings, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 17:31:43 +0300
> 
> On 31.03.2020 16:14, Eli Zaretskii wrote:
> > Why does it need the entire buffer up front? that sounds like a
> > potential performance killer.  Fontifying a small part of a buffer
> > doesn't need its entire text.
> 
> Because the end product of parsing the buffer is an AST, and the author 
> decided to minimize the odds of problems that come with 
> incomplete/broken ASTs.

But it definitely can work with parts of the buffer, and we don't need
it to have a complete AST for this particular job.

> The previous (first) discussion of TreeSitter has a URL to a
> presentation video. You can give it a watch.

Thanks, I watched it back in January, when I wrote my message
calling for its integration.

> Regarding performance, their solution is to make first parsing as fast 
> as possible, and updates to an existing AST faster still.

I'm talking about _our_ performance, not theirs.

> As for the difficulty of sending the whole buffer contents... maybe VS 
> Code and Atom somehow make it easier? If so, someone should investigate 
> why it has to be slower in Emacs.

It should be obvious that sending a buffer as a single string is less
efficient than letting tree-sitter access buffer text directly.  We
just need an appropriate API for that (maybe there is one already, I
haven't looked at their sources since January).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:36                   ` Eli Zaretskii
@ 2020-03-31 15:45                     ` Dmitry Gutov
  2020-03-31 17:16                     ` Stefan Monnier
  1 sibling, 0 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-03-31 15:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 31.03.2020 18:36, Eli Zaretskii wrote:
> But it definitely can work with parts of the buffer, and we don't need
> it to have a complete AST for this particular job.

Syntax highlighting can and often does depend on buffer contents after 
the region.

It's one thing to mis-highlight a part of the buffer because the 
contents are incomplete (the user hasn't typed the full expression).

It's another thing to mis-highlight it because the chunk requested by 
jit-lock ended on a particular ambiguous position.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:45                     ` Dmitry Gutov
@ 2020-03-31 17:16                     ` Stefan Monnier
  2020-03-31 17:48                       ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-31 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, Dmitry Gutov

> It should be obvious that sending a buffer as a single string is less
> efficient than letting tree-sitter access buffer text directly.  We
> just need an appropriate API for that (maybe there is one already, I
> didn't take a look at their sources since January).

My benchmarks say that `buffer-string` takes about 1/3 the time of
`parse-partial-sexp`, so letting tree-sitter access our buffer text
directly is unlikely to give more than a 30% speed up.

It doesn't mean it wouldn't be a desirable optimization, but it does
mean that it likely won't make a large difference as to whether it's
"fast enough".


        Stefan




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:16                     ` Stefan Monnier
@ 2020-03-31 17:48                       ` Eli Zaretskii
  2020-03-31 19:35                         ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Dmitry Gutov <dgutov@yandex.ru>,  casouri@gmail.com,  akrl@sdf.org,
>   emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 13:16:33 -0400
> 
> > It should be obvious that sending a buffer as a single string is less
> > efficient than letting tree-sitter access buffer text directly.  We
> > just need an appropriate API for that (maybe there is one already, I
> > didn't take a look at their sources since January).
> 
> My benchmarks say that `buffer-string` takes about 1/3 the time of
> `parse-partial-sexp`, so letting tree-sitter access our buffer text
> directly is unlikely to give more than a 30% speed up.

Sure, but we never call parse-partial-sexp on the entire buffer, do
we?

> It doesn't mean it wouldn't be a desirable optimization, but it does
> mean that it likely won't make a large difference as to whether it's
> "fast enough".

I disagree.  Communicating with a C library by making a string out of
buffer text is extremely inelegant and inefficient.  We shouldn't do
that except when the strings are very short.





^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:48                       ` Eli Zaretskii
@ 2020-03-31 19:35                         ` Stefan Monnier
  2020-04-01  2:23                           ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-31 19:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, dgutov

>> > It should be obvious that sending a buffer as a single string is less
>> > efficient than letting tree-sitter access buffer text directly.  We
>> > just need an appropriate API for that (maybe there is one already, I
>> > didn't take a look at their sources since January).
>> My benchmarks say that `buffer-string` takes about 1/3 the time of
>> `parse-partial-sexp`, so letting tree-sitter access our buffer text
>> directly is unlikely to give more than a 30% speed up.
> Sure, but we never call parse-partial-sexp on the entire buffer, do we?

Not sure how that's relevant.  I only used `parse-partial-sexp` as
a lower bound on the time tree-sitter is likely to take to do its
own parsing.

>> It doesn't mean it wouldn't be a desirable optimization, but it does
>> mean that it likely won't make a large difference as to whether it's
>> "fast enough".
> I disagree.

Your disagreement doesn't seem to be with what I said: I didn't argue
about the elegance or efficiency, only about the fact that the
performance impact is likely to be small enough that it's not going to
affect the viability of the approach.

> Communicating with a C library by making a string out of buffer text
> is extremely inelegant and inefficient.  We shouldn't do that except
> when the strings are very short.

FWIW, elegant/efficient or not, that's the standard way to do
it, AFAICT.  E.g. that's what we do in `secure-hash`, that's what we do
when parsing JSON, ...

You basically always need to en/decode the content (even if it is into
utf-8, we still need to handle the potential raw-bytes), so a copy is
hard to avoid.

Note that for regexp-matching the problem is slightly different because
we don't know beforehand which part of the buffer will be consulted, so
doing a "copy and then regmatch" would be too inefficient (we'd always
need to copy everything til point-max).

        Stefan

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:35                         ` Stefan Monnier
@ 2020-04-01  2:23                           ` Eli Zaretskii
  0 siblings, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01  2:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: dgutov@yandex.ru,  casouri@gmail.com,  akrl@sdf.org,  emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 15:35:41 -0400
> 
> You basically always need to en/decode the content (even if it is into
> utf-8, we still need to handle the potential raw-bytes), so a copy is
> hard to avoid.

It isn't hard in this case, AFAICT.  Tree-sitter has an API where we
can provide a function that will deliver text at a given offset.  We
should use that to access buffer text directly.  We can avoid encoding
the buffer text by converting raw bytes into something like U+FFFD, or
something else that tree-sitter will ignore.
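
A rough sketch of such a reader function, under stated assumptions:
`struct gap_buffer' is a hypothetical stand-in for Emacs's buffer text
(not the real internal structure), and `gap_buffer_read' is only loosely
modeled on the chunk-reading callback that tree-sitter's input interface
accepts; as far as I know the real callback takes additional arguments.
It ignores the raw-byte question entirely and shows only that text on
either side of the gap can be handed out without copying.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gap buffer: text bytes live in data[0 .. gap_start) and
   data[gap_end .. size); the bytes in between are the unused gap.  */
struct gap_buffer {
  const char *data;
  size_t gap_start;
  size_t gap_end;
  size_t size;  /* total allocation, gap included */
};

/* Number of text bytes, gap excluded.  */
size_t gap_buffer_length(const struct gap_buffer *b) {
  return b->size - (b->gap_end - b->gap_start);
}

/* Return a pointer to the longest contiguous run of text starting at
   logical offset BYTE_INDEX and store its length in *BYTES_READ.
   At or past the end of the text, report an empty chunk.  */
const char *gap_buffer_read(void *payload, uint32_t byte_index,
                            uint32_t *bytes_read) {
  const struct gap_buffer *b = payload;

  if (byte_index >= gap_buffer_length(b)) {
    *bytes_read = 0;
    return "";
  }
  if (byte_index < b->gap_start) {
    /* Before the gap: the run ends where the gap begins.  */
    *bytes_read = (uint32_t) (b->gap_start - byte_index);
    return b->data + byte_index;
  }
  /* After the gap: shift the logical offset by the gap's width.  */
  size_t phys = byte_index + (b->gap_end - b->gap_start);
  *bytes_read = (uint32_t) (b->size - phys);
  return b->data + phys;
}
```

A parser that pulls text through a callback of this kind needs at most
two chunks to see the whole buffer, and no intermediate string is ever
allocated.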



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
@ 2020-03-31 15:11                 ` Stefan Monnier
  2020-03-31 15:44                   ` Eli Zaretskii
  2020-03-31 16:13                 ` Alan Third
  2 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-31 15:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel

>> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
>> keeps the parse tree up-to-date in response to buffer changes.
> Why does it need the entire buffer up front?

Because as a general rule you cannot parse a region without looking at
all the preceding text.  That's why when we fontify START..END we need
to begin by computing the `syntax-ppss` at START, which involves passing
the whole text from `point-min` to START through `parse-partial-sexp`.
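
A toy illustration of why the state at START depends on everything
before it.  This is not `parse-partial-sexp`: `struct scan_state' and
`state_at' are hypothetical, and the scanner understands only
parentheses, double-quoted strings and C block comments.

```c
#include <stddef.h>

/* Simplified lexical state at a position.  */
struct scan_state {
  int depth;       /* parenthesis nesting depth */
  int in_string;   /* nonzero inside "..." */
  int in_comment;  /* nonzero inside a C block comment */
};

/* Scan text[0 .. pos) from the very beginning and return the state
   that holds at POS.  */
struct scan_state state_at(const char *text, size_t pos) {
  struct scan_state s = {0, 0, 0};

  for (size_t i = 0; i < pos; i++) {
    char c = text[i];
    if (s.in_comment) {
      if (c == '*' && i + 1 < pos && text[i + 1] == '/') {
        s.in_comment = 0;
        i++;                      /* skip the closing slash */
      }
    } else if (s.in_string) {
      if (c == '\\' && i + 1 < pos)
        i++;                      /* skip the escaped character */
      else if (c == '"')
        s.in_string = 0;
    } else if (c == '/' && i + 1 < pos && text[i + 1] == '*') {
      s.in_comment = 1;
      i++;                        /* skip the opening star */
    } else if (c == '"') {
      s.in_string = 1;
    } else if (c == '(') {
      s.depth++;
    } else if (c == ')' && s.depth > 0) {
      s.depth--;
    }
  }
  return s;
}
```

Starting the scan in the middle of the text instead of at its beginning
can silently give a different answer, for instance when the starting
point happens to lie inside a comment.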

> that sounds like a potential performance killer.

Indeed.  And so does this `syntax-ppss` call we have.
It's OK as long as the parsing is fast enough and you don't use it in
too large buffers.

E.g. I expect that most programming major modes currently exhibit
significant delays when you jump to the end of a multi-GB buffer because
of that `syntax-ppss` call.

> Fontifying a small part of a buffer doesn't need its entire text.

Sadly, it does.  In specific cases you may be able to speed things up,
but that's only applicable to some cases.

I'm sure there could be other approaches that focus on trying to parse as
little of the buffer text as possible (e.g. SMIE follows this kind of
idea), but it's difficult to make them work with a "normal" grammar,
providing a full parse tree and giving a reliable result (and without
it degenerating to parsing the whole buffer anyway in most cases).

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.

These are internal implementation details that can be tweaked later on.
I do expect that the code currently needs to call `buffer-string` or its
moral equivalent.  But if the resources this requires are significant
enough to worry about, then it's great news: it means the parsing
itself is very fast.

> We should instead request that tree-sitter exposes an API through
> which we could give it direct access to buffer text as 2 parts, before
> and after the gap, like we do with regex code.  Otherwise this will be
> a bottleneck in the long run, not unlike the problem we have with LSP.

I'm not sure exactly which problem with LSP you're thinking about, but
I doubt `buffer-string` is a significant component of a performance
problem with LSP: the time to pass that string to the server via a pipe
should dwarf it.

> I still don't see why it would need the entire buffer for this class
> of applications.  Did anyone try the alternatives, in particular on
> very large buffers?

What alternatives?
How large is "very large" here?

        Stefan

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:11                 ` Stefan Monnier
@ 2020-03-31 15:44                   ` Eli Zaretskii
  2020-03-31 17:10                     ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Tue, 31 Mar 2020 11:11:22 -0400
> 
> > I still don't see why it would need the entire buffer for this class
> > of applications.  Did anyone try the alternatives, in particular on
> > very large buffers?
> 
> What alternatives?

Let tree-sitter see just a portion of the buffer, like the outer block
of what will be displayed in the window.  You are saying that this is
impossible, but do tree-sitter developers also say that?

> How large is "very large" here?

xdisp.c comes to mind, obviously.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:44                   ` Eli Zaretskii
@ 2020-03-31 17:10                     ` Stefan Monnier
  2020-03-31 17:19                       ` Jorge Javier Araya Navarro
  2020-03-31 17:46                       ` Eli Zaretskii
  0 siblings, 2 replies; 109+ messages in thread
From: Stefan Monnier @ 2020-03-31 17:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel

>> > I still don't see why it would need the entire buffer for this class
>> > of applications.  Did anyone try the alternatives, in particular on
>> > very large buffers?
>> What alternatives?
> Let tree-sitter see just a portion of the buffer, like the outer block
> of what will be displayed in the window.  You are saying that this is
> impossible,

I think it would definitely be possible if you present "from point-min
to POS".  But "from START to END" is much more difficult, yes.

> but do tree-sitter developers also say that?

You'd have to ask them.  But what I say is based on the knowledge
I gleaned by reading the academic literature that the tree-sitter
authors cite (I did that while working on an article on SMIE ;-)

In any case, your question is really about the design of tree-sitter
rather than the design of the interface between tree-sitter and Emacs.

AFAICT tree-sitter is pretty close to the state of the art in this area,
so I think it's worth trying it out to see how it performs before
considering changing its design.

>> How large is "very large" here?
> xdisp.c comes to mind, obviously.

I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

        Stefan

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:10                     ` Stefan Monnier
@ 2020-03-31 17:19                       ` Jorge Javier Araya Navarro
  2020-03-31 17:46                       ` Eli Zaretskii
  1 sibling, 0 replies; 109+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-31 17:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, casouri, akrl

[-- Attachment #1: Type: text/plain, Size: 1673 bytes --]

>>> How large is "very large" here?
>> xdisp.c comes to mind, obviously.
>
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

Funnily enough, this can be tested with a small C program; sadly,
I don't have the time to write it right now.

On Tue, Mar 31, 2020 at 11:10, Stefan Monnier
(monnier@iro.umontreal.ca) wrote:

> >> > I still don't see why it would need the entire buffer for this class
> >> > of applications.  Did anyone try the alternatives, in particular on
> >> > very large buffers?
> >> What alternatives?
> > Let tree-sitter see just a portion of the buffer, like the outer block
> > of what will be displayed in the window.  You are saying that this is
> > impossible,
>
> I think it would be definitely possible if you present "from point-min
> to POS".  But "from START to END" is much more difficult, yes.
>
> > but do tree-sitter developers also say that?
>
> You'd have to ask them.  But what I say is based on the knowledge
> I gleaned by reading the academic literature that the tree-sitter
> authors cite (I did that while working on an article on SMIE ;-)
>
> In any case, your question is really about the design of tree-sitter
> rather than the design of the interface between tree-sitter and Emacs.
>
> AFAICT tree-sitter is pretty close to the state of the art in this area,
> so I think it's worth trying it out to see how it performs before
> considering changing its design.
>
> >> How large is "very large" here?
> > xdisp.c comes to mind, obviously.
>
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.
>
>
>         Stefan
>
>
>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:10                     ` Stefan Monnier
  2020-03-31 17:19                       ` Jorge Javier Araya Navarro
@ 2020-03-31 17:46                       ` Eli Zaretskii
  2020-03-31 18:42                         ` 조성빈
  2020-03-31 18:47                         ` Dmitry Gutov
  1 sibling, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:46 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Tue, 31 Mar 2020 13:10:27 -0400
> 
> >> How large is "very large" here?
> > xdisp.c comes to mind, obviously.
> 
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

One second of delay before the first window-full is displayed?  This
is like infinity.

And you didn't account for the time to take buffer-string of the
entire buffer (which involves allocating a large chunk of memory),
then encode it in UTF-8 (which needs to allocate another chunk of
memory), and pass that to tree-sitter.  If that's what the current
interface does.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:46                       ` Eli Zaretskii
@ 2020-03-31 18:42                         ` 조성빈
  2020-03-31 19:29                           ` Eli Zaretskii
  2020-03-31 18:47                         ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: 조성빈 @ 2020-03-31 18:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, casouri, akrl, Emacs-devel


> On Apr 1, 2020, at 2:53 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>> 
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
>> Date: Tue, 31 Mar 2020 13:10:27 -0400
>> 
>>>> How large is "very large" here?
>>> xdisp.c comes to mind, obviously.
>> 
>> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.
> 
> One second of delay before the first window-full is displayed?  This
> is like infinity.

Maybe I misunderstood, or maybe it's just because I don't know enough about the internals, but doesn't Emacs just display the raw text until highlighting is finished? The experience wouldn't be one of not seeing the text for a second; it would be more like seeing the text first and having the highlights applied later.

> And you didn't account for the time to take buffer-string of the
> entire buffer (which involves allocating a large chunk of memory),
> then encode it in UTF-8 (which needs to allocate another chunk of
> memory), and pass that to tree-sitter.  If that's what the current
> interface does.
> 



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:42                         ` 조성빈
@ 2020-03-31 19:29                           ` Eli Zaretskii
  0 siblings, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 19:29 UTC (permalink / raw)
  To: 조성빈; +Cc: casouri, Emacs-devel, monnier, akrl

> From: 조성빈 <pcr910303@icloud.com>
> Date: Wed, 1 Apr 2020 03:42:31 +0900
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com,
>  akrl@sdf.org, Emacs-devel@gnu.org
> 
> > One second of delay before the first window-full is displayed?  This
> > is like infinity.
> 
> Maybe I misunderstood, or maybe it’s just because I don’t know enough about the internals, but doesn’t Emacs just display the raw text until highlighting is finished?

I guess you are talking about jit-lock-defer-time and friends.  That's
off by default.  The default behavior is to fontify completely the
chunk that is about to be displayed (actually, we fontify slightly
more than that).
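
For anyone who wants the deferred behavior described in the question, a
minimal sketch of opting in follows.  The value 0.05 is an arbitrary
illustration, not a recommendation, and the setting may only apply to
buffers in which jit-lock-mode is enabled after it is changed.

```elisp
;; nil (the default) means fontification is not deferred.
;; A number is the idle time, in seconds, after which the deferred
;; fontification takes place.  0.05 is an assumed, illustrative value.
(setq jit-lock-defer-time 0.05)
```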



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:46                       ` Eli Zaretskii
  2020-03-31 18:42                         ` 조성빈
@ 2020-03-31 18:47                         ` Dmitry Gutov
  2020-03-31 18:48                           ` Noam Postavsky
  2020-03-31 19:26                           ` Eli Zaretskii
  1 sibling, 2 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-03-31 18:47 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl

On 31.03.2020 20:46, Eli Zaretskii wrote:
> One second of delay before the first window-full is displayed?  This
> is like infinity.

This is what we have now:

(benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))

=> Elapsed time: 1.940401s (0.376140s in 6 GCs)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:47                         ` Dmitry Gutov
@ 2020-03-31 18:48                           ` Noam Postavsky
  2020-03-31 19:02                             ` Dmitry Gutov
  2020-03-31 19:26                           ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Noam Postavsky @ 2020-03-31 18:48 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers

On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> On 31.03.2020 20:46, Eli Zaretskii wrote:
> > One second of delay before the first window-full is displayed?  This
> > is like infinity.
>
> This is what we have now:

Except that s/first window-full/last window-full/



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:48                           ` Noam Postavsky
@ 2020-03-31 19:02                             ` Dmitry Gutov
  0 siblings, 0 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-03-31 19:02 UTC (permalink / raw)
  To: Noam Postavsky
  Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers

On 31.03.2020 21:48, Noam Postavsky wrote:
> On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov <dgutov@yandex.ru> wrote:
>> On 31.03.2020 20:46, Eli Zaretskii wrote:
>>> One second of delay before the first window-full is displayed?  This
>>> is like infinity.
>> This is what we have now:
> Except that s/first window-full/last window-full/

True. And I meant to suggest that, on average, we'd get the same 1 
second delay (if we assume all positions in the file are equally probable).

However, I've just tried the same experiment without goto-char, and got 
essentially the same result as with it: 1.2 s (my previous result was 
with "cold" filesystem cache).

In addition to that, though, I think this call returns before the window 
finishes displaying. So, when point is at eob, there's some extra wait, 
but I'm not sure how to measure it.
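
One possible way to fold that wait into the measurement is to force a
redisplay inside the timed form.  This is only a sketch:
my/time-visit-with-redisplay is a hypothetical helper, and it assumes
that an explicit (redisplay t) performs the same fontification work as
the redisplay that normally follows the command, which may not hold
exactly.

```elisp
(require 'benchmark)

(defun my/time-visit-with-redisplay (file)
  "Visit FILE, go to its end, force a redisplay, and return the timings.
Hypothetical helper.  The result is the `benchmark-run' list
\(ELAPSED GC-COUNT GC-ELAPSED)."
  (benchmark-run 1
    (find-file file)
    (goto-char (point-max))
    ;; A non-nil argument makes redisplay run even if input is pending.
    (redisplay t)))
```

Calling it with "src/xdisp.c" in an interactive session, with no buffer
visiting that file yet, would give a figure comparable to the ones above
plus the display of the last window-full.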

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:47                         ` Dmitry Gutov
  2020-03-31 18:48                           ` Noam Postavsky
@ 2020-03-31 19:26                           ` Eli Zaretskii
  2020-03-31 19:50                             ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 19:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 21:47:17 +0300
> 
> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> 
> => Elapsed time: 1.940401s (0.376140s in 6 GCs)

This doesn't measure the redisplay (which happens after the above
command returns).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:26                           ` Eli Zaretskii
@ 2020-03-31 19:50                             ` Dmitry Gutov
  2020-04-01  2:28                               ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-03-31 19:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 31.03.2020 22:26, Eli Zaretskii wrote:
>> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 31 Mar 2020 21:47:17 +0300
>>
>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
>>
>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> This doesn't measure the redisplay (which happens after the above
> command returns).

Which means that the current state of affairs is even slower.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:50                             ` Dmitry Gutov
@ 2020-04-01  2:28                               ` Eli Zaretskii
  2020-04-01  3:49                                 ` Dmitry Gutov
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01  2:28 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 22:50:43 +0300
> 
> >> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> >>
> >> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> > This doesn't measure the redisplay (which happens after the above
> > command returns).
> 
> Which means that the current state of affairs is even slower.

No, it means that whatever delay we will have with parsing the entire
buffer is _in_addition_ to whatever you measured.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  2:28                               ` Eli Zaretskii
@ 2020-04-01  3:49                                 ` Dmitry Gutov
  2020-04-01  4:14                                   ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01  3:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 05:28, Eli Zaretskii wrote:
>> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 31 Mar 2020 22:50:43 +0300
>>
>>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
>>>>
>>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
>>> This doesn't measure the redisplay (which happens after the above
>>> command returns).
>>
>> Which means that the current state of affairs is even slower.
> 
> No, it means that whatever delay we will have with parsing the entire
> buffer is _in_addition_ to whatever you measured.

Probably not. IIUC, most of this measured 1.2 s delay is CC Mode doing the
preliminary parsing. That phase would be replaced by TreeSitter's full 
buffer parse, which supposedly takes a comparable amount of time.

The redisplay phase will most likely be faster because by then the 
correct AST is available, and computing highlighting based on it is 
supposedly something that TreeSitter does quickly and well.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  3:49                                 ` Dmitry Gutov
@ 2020-04-01  4:14                                   ` Eli Zaretskii
  2020-04-01 13:47                                     ` Dmitry Gutov
  2020-04-01 13:52                                     ` Alan Mackenzie
  0 siblings, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01  4:14 UTC (permalink / raw)
  To: emacs-devel, Dmitry Gutov; +Cc: casouri, monnier, akrl

On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote:
> On 01.04.2020 05:28, Eli Zaretskii wrote:
> >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
> >>   emacs-devel@gnu.org
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >> Date: Tue, 31 Mar 2020 22:50:43 +0300
> >>
> >>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> >>>>
> >>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> >>> This doesn't measure the redisplay (which happens after the above
> >>> command returns).
> >>
> >> Which means that the current state of affairs is even slower.
> > 
> > No, it means that whatever delay we will have with parsing the entire
> > buffer is _in_addition_ to whatever you measured.
> 
> Probably not. IIUC, most of this measured 1.2 s delay is CC Mode doing the
> preliminary parsing.

There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.

In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  4:14                                   ` Eli Zaretskii
@ 2020-04-01 13:47                                     ` Dmitry Gutov
  2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 13:52                                     ` Alan Mackenzie
  1 sibling, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 13:47 UTC (permalink / raw)
  To: Eli Zaretskii, emacs-devel; +Cc: casouri, monnier, akrl

On 01.04.2020 07:14, Eli Zaretskii wrote:

> There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.

   - c-mode                                          772  75%
    - c-common-init                                  766  74%
     - mapc                                          764  74%
      - #<compiled 0x158957d29ef1>                   509  49%
       + c-neutralize-syntax-in-CPP                  276  26%
       + c-after-change-mark-abnormal-strings        204  19%
       + c-parse-quotes-after-change                  18   1%
      - #<compiled 0x158957d29ee5>                   255  24%
       + c-before-change-check-unbalanced-strings    199  19%
       + c-depropertize-CPP                           46   4%
       c-font-lock-init                                1   0%
       c-basic-common-init                             1   0%

You can also compare CC Mode's init with JS Mode's.

If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same 
benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, 
and that stuff can be described as "preliminary parsing".

And there will be more of that during redisplay itself.
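
The mode comparison can be repeated without permanently changing
auto-mode-alist, along these lines.  This is a sketch under stated
assumptions: my/time-visit-in-mode is a hypothetical helper, the file
must not already be visited in a buffer, and the file must not carry a
file-local mode setting (which would take precedence over
auto-mode-alist).

```elisp
(require 'benchmark)

(defun my/time-visit-in-mode (file mode)
  "Visit FILE with MODE forced as its major mode and return the timings.
Hypothetical helper.  MODE is put in front of `auto-mode-alist' for
this one call only; the result is the `benchmark-run' list."
  (let ((auto-mode-alist
         ;; Match exactly this file's name at the end of the path.
         (cons (cons (concat (regexp-quote (file-name-nondirectory file))
                             "\\'")
                     mode)
               auto-mode-alist)))
    (benchmark-run 1 (find-file file))))
```

For example, (my/time-visit-in-mode "src/xdisp.c" 'js-mode) and
(my/time-visit-in-mode "src/xdisp.c" 'c-mode), each run in a fresh
session, would give the two numbers being compared.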

> In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.

Now you know.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:47                                     ` Dmitry Gutov
@ 2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 14:55                                         ` Eli Zaretskii
  2020-04-01 15:16                                         ` Dmitry Gutov
  0 siblings, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:04 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 16:47:02 +0300
> 
> On 01.04.2020 07:14, Eli Zaretskii wrote:
> 
> > There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.
> 
>    - c-mode                                          772  75%
>     - c-common-init                                  766  74%
>      - mapc                                          764  74%
>       - #<compiled 0x158957d29ef1>                   509  49%
>        + c-neutralize-syntax-in-CPP                  276  26%
>        + c-after-change-mark-abnormal-strings        204  19%
>        + c-parse-quotes-after-change                  18   1%
>       - #<compiled 0x158957d29ee5>                   255  24%
>        + c-before-change-check-unbalanced-strings    199  19%
>        + c-depropertize-CPP                           46   4%
>        c-font-lock-init                                1   0%
>        c-basic-common-init                             1   0%

I see a very different picture here: the above takes something like
15%.  Most of the time is spent in functions called by jit-lock.

> If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same 
> benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, 
> and that stuff can be described as "preliminary parsing".

Except that I cannot reproduce these results, so I'm not really sure
what we are looking at.

What I did was start the profiler, then manually call goto-char, then
produce the profiler report.  What did you do to collect the above
profile?

> And there will be more of that during redisplay itself.

Which is not what your benchmark measures.

> > In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.
> 
> Now you know.

Do I?



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:04                                       ` Eli Zaretskii
@ 2020-04-01 14:55                                         ` Eli Zaretskii
  2020-04-01 15:16                                         ` Dmitry Gutov
  1 sibling, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:55 UTC (permalink / raw)
  To: dgutov; +Cc: casouri, emacs-devel, monnier, akrl

> Date: Wed, 01 Apr 2020 17:04:24 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> What I did was start the profiler, then manually call goto-char, then
> produce the profiler report.

That came out confusingly unclear.  What I actually did was start the
profiler, then evaluate the form that visits xdisp.c and goes to
point-max, then call profiler-report.
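
Spelled out as code, that procedure might look like the sketch below.
my/profile-visit is a hypothetical helper; the important detail is that
profiler-report is invoked as a separate command afterwards, so that the
redisplay which follows the visiting command is sampled as well.

```elisp
(require 'profiler)

(defun my/profile-visit (file)
  "Start the CPU profiler, then visit FILE and move to its end.
Hypothetical helper.  Run `profiler-report' as a separate command
afterwards, then `profiler-stop' when done."
  (interactive "fFile to visit: ")
  (profiler-start 'cpu)
  (find-file file)
  (goto-char (point-max)))
```

After M-x my/profile-visit on src/xdisp.c, M-x profiler-report shows the
call tree and M-x profiler-stop ends the session.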



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 14:55                                         ` Eli Zaretskii
@ 2020-04-01 15:16                                         ` Dmitry Gutov
  2020-04-01 15:59                                           ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 01.04.2020 17:04, Eli Zaretskii wrote:
>> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 1 Apr 2020 16:47:02 +0300
>>
>> On 01.04.2020 07:14, Eli Zaretskii wrote:
>>
>>> There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.
>>
>>     - c-mode                                          772  75%
>>      - c-common-init                                  766  74%
>>       - mapc                                          764  74%
>>        - #<compiled 0x158957d29ef1>                   509  49%
>>         + c-neutralize-syntax-in-CPP                  276  26%
>>         + c-after-change-mark-abnormal-strings        204  19%
>>         + c-parse-quotes-after-change                  18   1%
>>        - #<compiled 0x158957d29ee5>                   255  24%
>>         + c-before-change-check-unbalanced-strings    199  19%
>>         + c-depropertize-CPP                           46   4%
>>         c-font-lock-init                                1   0%
>>         c-basic-common-init                             1   0%
> 
> I see a very different picture here: the above takes something like
> 15%.  Most of the time is spent in functions called by jit-lock.

What are your measurements, though? Again, what does this print out?

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

>> If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same
>> benchmark takes ~60ms. So yes, CC Mode does a lot during initialization,
>> and that stuff can be described as "preliminary parsing".
> 
> Except that I cannot reproduce these results, so I'm not really sure
> what we are looking at.
> 
> What I did was start the profiler, then manually call goto-char, then
> produce the profiler report.  What did you do to collect the above
> profile?

No 'goto-char'. As we've established, it only affects the time taken by 
redisplay, and I can't measure that. So I'm not profiling it either, 
otherwise I'd be comparing apples to oranges.

>> And there will be more of that during redisplay itself.
> 
> Which is not what your benchmark measures.

Exactly. Like I said, I can't measure how long redisplay itself takes.

>>> In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.
>>
>> Now you know.
> 
> Do I?

Yes. The numbers can be different, but there is definitely some up-front 
computation there. One that's not present with e.g. js-mode.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:16                                         ` Dmitry Gutov
@ 2020-04-01 15:59                                           ` Eli Zaretskii
  2020-04-01 21:48                                             ` Dmitry Gutov
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01 15:59 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 18:16:04 +0300
> 
> > I see a very different picture here: the above takes something like
> > 15%.  Most of the time is spent in functions called by jit-lock.
> 
> What are your measurements, though?

My full profile is below.  This is from Emacs 27.0.90 compiled with
the -Og optimization and with wide-int (which slows down Emacs by
about 30%).

> Again, what does this print out?
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")))

  Elapsed time: 1.733853s (0.140584s in 6 GCs)

> No 'goto-char'. As we've established, it only affects the time taken by 
> redisplay, and I can't measure that. So I'm not profiling it either, 
> otherwise I'd be comparing apples to oranges.

See the second profile below.

> Yes. The numbers can be different, but there is definitely some up-front 
> computation there. One that's not present with e.g. js-mode.

So you are saying that we should do that up-front computation just
because CC mode currently does it?  That we shouldn't try to eliminate
such preprocessing?  I don't think so.

Here's the profile from visiting xdisp.c and going to end of the
buffer:

- redisplay_internal (C function)                                  65  41%
 - jit-lock-function                                               65  41%
  - jit-lock-fontify-now                                           65  41%
   - jit-lock--run-functions                                       65  41%
    - run-hook-wrapped                                             65  41%
     - #<compiled -0x1ffffffff8adaa88>                             65  41%
      - font-lock-fontify-region                                   65  41%
       - c-font-lock-fontify-region                                65  41%
        - font-lock-default-fontify-region                         50  31%
         - font-lock-fontify-keywords-region                       35  22%
          - c-font-lock-declarations                               34  21%
           - c-find-decl-spots                                     34  21%
            - c-bs-at-toplevel-p                                   32  20%
             - c-brace-stack-at                                    32  20%
              - c-update-brace-stack                               31  19%
               - c-syntactic-re-search-forward                     27  17%
                - c-beginning-of-macro                              6   3%
                   back-to-indentation                              2   1%
                   #<compiled -0x1ffffffff8ae5f98>                  1   0%
              c-forward-sws                                         1   0%
          - c-font-lock-complex-decl-prepare                        1   0%
           - c-parse-state                                          1   0%
            - c-parse-state-1                                       1   0%
             - c-parse-state-get-strategy                           1   0%
              - c-get-fallback-scan-pos                             1   0%
               - beginning-of-defun                                 1   0%
                - beginning-of-defun-raw                            1   0%
                   syntax-ppss                                      1   0%
         - font-lock-fontify-syntactically-region                  15   9%
            syntax-ppss                                            15   9%
        - c-before-context-fl-expand-region                        15   9%
         - mapc                                                    15   9%
          - #<compiled -0x1ffffffff8a66198>                        15   9%
           - c-context-expand-fl-region                            15   9%
            - c-fl-decl-start                                      15   9%
             - c-literal-start                                     14   8%
              - c-semi-pp-to-literal                               14   8%
                 c-parse-ps-state-below                            14   8%
               c-determine-limit                                    1   0%
- command-execute                                                  64  40%
 - call-interactively                                              64  40%
  - funcall-interactively                                          63  40%
   - eval-last-sexp                                                63  40%
    - elisp--eval-last-sexp                                        63  40%
     - eval                                                        63  40%
      - progn                                                      63  40%
       - progn                                                     63  40%
        - find-file                                                63  40%
         - find-file-noselect                                      63  40%
          - find-file-noselect-1                                   63  40%
           - after-find-file                                       63  40%
            - normal-mode                                          61  38%
             - set-auto-mode                                       61  38%
              - set-auto-mode-0                                    61  38%
               - c-mode                                            61  38%
                - c-common-init                                    57  36%
                 - mapc                                            57  36%
                  - #<compiled -0x1ffffffff8a7d680>                 37  23%
                   - c-neutralize-syntax-in-CPP                    20  12%
                    - c-beginning-of-macro                          4   2%
                       c-backward-single-comment                    2   1%
                       back-to-indentation                          1   0%
                      c-no-comment-end-of-macro                     3   1%
                     c-after-change-mark-abnormal-strings          15   9%
                     c-parse-quotes-after-change                    1   0%
                  - #<compiled -0x1ffffffff8a7d6b0>                 20  12%
                   - c-before-change-check-unbalanced-strings      15   9%
                    - c-literal-limits                             15   9%
                     - c-full-pp-to-literal                        15   9%
                        c-parse-ps-state-below                     15   9%
                     c-depropertize-CPP                             4   2%
                - byte-code                                         2   1%
                   require                                          1   0%
                - run-mode-hooks                                    1   0%
                 - hack-local-variables                             1   0%
                  - hack-dir-local-variables                        1   0%
                     dir-locals-read-from-dir                       1   0%
            - run-hooks                                             2   1%
             - vc-refresh-state                                     2   1%
              - vc-backend                                          2   1%
               - vc-registered                                      2   1%
                - mapc                                              2   1%
                 - #<compiled -0x1ffffffff8a67780>                  2   1%
                  - vc-call-backend                                 2   1%
                   - apply                                          2   1%
                    - vc-git-registered                             2   1%
                     - if                                           2   1%
                      - progn                                       2   1%
                       - load                                       1   0%
                          require                                   1   0%
  - byte-code                                                       1   0%
   - read-extended-command                                          1   0%
    - completing-read                                               1   0%
       completing-read-default                                      1   0%
- ...                                                              28  17%
   Automatic GC                                                    27  17%
 - substitute-key-definition-key                                    1   0%
  - substitute-key-definition                                       1   0%
   - map-keymap                                                     1   0%
    - #<compiled -0x1ffffffff8a80eb8>                               1   0%
     - substitute-key-definition-key                                1   0%
      - substitute-key-definition                                   1   0%
       - map-keymap                                                 1   0%
        - #<compiled -0x1ffffffff8a80c48>                           1   0%
         - substitute-key-definition-key                            1   0%
          - substitute-key-definition                               1   0%
           - map-keymap                                             1   0%
            - #<compiled -0x1ffffffff8a80658>                       1   0%
             - substitute-key-definition-key                        1   0%
              - substitute-key-definition                           1   0%
               - map-keymap                                         1   0%
                  #<compiled -0x1ffffffff8a7ce58>                   1   0%

Here's the profile from just visiting xdisp.c:

- command-execute                                                  67  82%
 - call-interactively                                              67  82%
  - funcall-interactively                                          67  82%
   - eval-expression                                               67  82%
    - eval                                                         67  82%
     - progn                                                       67  82%
      - find-file                                                  67  82%
       - find-file-noselect                                        67  82%
        - find-file-noselect-1                                     66  81%
         - after-find-file                                         66  81%
          - normal-mode                                            62  76%
           - set-auto-mode                                         62  76%
            - set-auto-mode-0                                      62  76%
             - c-mode                                              62  76%
              - c-common-init                                      55  67%
               - mapc                                              55  67%
                - #<compiled -0x1ffffffff8aa7940>                  36  44%
                 - c-neutralize-syntax-in-CPP                      21  25%
                  - c-beginning-of-macro                            2   2%
                     c-backward-single-comment                      1   1%
                   c-after-change-mark-abnormal-strings                 14  17%
                - #<compiled -0x1ffffffff8aa7970>                  19  23%
                 - c-before-change-check-unbalanced-strings                 14  17%
                  - c-literal-limits                               14  17%
                   - c-full-pp-to-literal                          14  17%
                      c-parse-ps-state-below                       14  17%
                 - c-depropertize-CPP                               4   4%
                    c-end-of-macro                                  1   1%
              - byte-code                                           6   7%
                 require                                            4   4%
               - substitute-key-definition                          1   1%
                - map-keymap                                        1   1%
                 - #<compiled -0x1ffffffff8aac0b8>                  1   1%
                  - substitute-key-definition-key                   1   1%
                   - substitute-key-definition                      1   1%
                      map-keymap                                    1   1%
          - run-hooks                                               4   4%
           - vc-refresh-state                                       4   4%
            - vc-backend                                            4   4%
             - vc-registered                                        4   4%
              - mapc                                                3   3%
               - #<compiled -0x1ffffffff8ae8e88>                    3   3%
                - vc-call-backend                                   3   3%
                 - apply                                            3   3%
                  - vc-git-registered                               2   2%
                   - if                                             2   2%
                    - progn                                         2   2%
                     - load                                         1   1%
                      - require                                     1   1%
                       - defconst                                   1   1%
                          byte-code                                 1   1%
                     - vc-git-registered                            1   1%
                      - vc-git--out-ok                              1   1%
                       - apply                                      1   1%
                        - vc-git--call                              1   1%
                         - apply                                    1   1%
                          - process-file                            1   1%
                             apply                                  1   1%
                  - vc-git-find-file-hook                           1   1%
                   - vc-state                                       1   1%
                    - vc-state-refresh                              1   1%
                     - vc-call-backend                              1   1%
                      - apply                                       1   1%
                       - vc-git-state                               1   1%
                        - apply                                     1   1%
                         - vc-git--run-command-string                  1   1%
                          - apply                                   1   1%
                           - vc-git--out-ok                         1   1%
                            - apply                                 1   1%
                             - vc-git--call                         1   1%
                              - apply                               1   1%
                               - process-file                       1   1%
                                  apply                             1   1%
                vc-file-getprop                                     1   1%
        - find-buffer-visiting                                      1   1%
         - file-truename                                            1   1%
          - file-truename                                           1   1%
           - file-truename                                          1   1%
            - file-truename                                         1   1%
             - file-truename                                        1   1%
                file-truename                                       1   1%
- ...                                                              14  17%
   Automatic GC                                                    14  17%



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:59                                           ` Eli Zaretskii
@ 2020-04-01 21:48                                             ` Dmitry Gutov
  2020-04-01 22:29                                               ` Stefan Monnier
  2020-04-02 14:23                                               ` Eli Zaretskii
  0 siblings, 2 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 21:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 01.04.2020 18:59, Eli Zaretskii wrote:

>> What are your measurements, though?
> 
> My full profile is below.  This is from Emacs 27.0.90 compiled with
> the -Og optimization and with wide-int (which slows down Emacs by
> about 30%).

Thank you. I also build with '-Og -g3' these days, but probably have a 
faster CPU.

>> Again, what does this print out?
>>
>>     (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
>    Elapsed time: 1.733853s (0.140584s in 6 GCs)

All right. So it takes 1.7s just to open the file, even before full 
syntax highlighting.

>> No 'goto-char'. As we've established, it only affects the time taken by
>> redisplay, and I can't measure that. So I'm not profiling it either,
>> otherwise I'd be comparing apples to oranges.
> 
> See the second profile below.

Comparing both, looks like redisplay (when at eob, at least) takes 
approx. the same amount of time?

>> Yes. The numbers can be different, but there is definitely some up-front
>> computation there. One that's not present with e.g. js-mode.
> 
> So you are saying that we should do that up-front computation just
> because CC mode currently does it?  That we shouldn't try to eliminate
> such preprocessing?  I don't think so.

AFAIU CC Mode could actually eliminate it, but that would require a 
significant rework of its internals.

I'm just pointing out that apparently you didn't even notice an even 
larger delay (1.7s), and were fine with it until now.

I'm not saying that nobody should try to explore how to decrease the 
delay, and what tradeoffs come with that. But for now, I think, we 
should encourage our kind volunteers to just implement integration the 
way TreeSitter's authors expect it. And try, on our side, to provide the 
best tools for it. Then we can see how well it does or doesn't work, and 
what are the biggest annoyances that the users have with it.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:48                                             ` Dmitry Gutov
@ 2020-04-01 22:29                                               ` Stefan Monnier
  2020-04-02 14:23                                               ` Eli Zaretskii
  1 sibling, 0 replies; 109+ messages in thread
From: Stefan Monnier @ 2020-04-01 22:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, emacs-devel

> AFAIU CC Mode could actually eliminate it, but that would require
>  a significant rework of its internals.

My experiments to make CC-mode use syntax-propertize-function suggest
that it wouldn't require too much work, actually.  For an outsider, it's
difficult because it's hard to understand all the invariants/assumptions
in the current design, but if Alan and I were to work together on it, it
would be pretty easy.  So far Alan has been opposed and there are several
good reasons for that:

- it's extra work.
- it will inevitably introduce bugs.
- while it will most likely be faster when opening the file, it will
  likely be slower in other cases (e.g. when modifying the buffer near
  point-min in one window while having point-max displayed in another).
- syntax-propertize was introduced in Emacs-24 so it would require
  either dropping CC-mode's support for earlier Emacsen, or adding some
  compatibility layer (I think this compatibility layer would be
  easy to write but would likely not cover all cases).
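
To illustrate the third point, here is a toy model of lazy, prefix-based
propertization, the idea behind syntax-propertize.  It is only a sketch
in Python, not CC Mode's or Emacs's actual code, and every name in it
(LazyPropertizer, ensure, edit, scanned) is hypothetical.  It assumes
properties are valid for a prefix of the buffer and that an edit makes
everything after it stale.

```python
class LazyPropertizer:
    """Toy model of lazy, prefix-based propertization (hypothetical names).

    Properties are considered valid for text[:done].  Asking for a position
    beyond `done` scans only the missing part; an edit at `pos` makes
    everything from `pos` onward stale.
    """

    def __init__(self, text):
        self.text = text
        self.done = 0      # properties are valid for text[:done]
        self.scanned = 0   # characters scanned so far (a crude cost counter)

    def ensure(self, pos):
        """Make properties valid up to `pos`, scanning only what is missing."""
        pos = min(pos, len(self.text))
        if pos > self.done:
            self.scanned += pos - self.done
            self.done = pos

    def edit(self, pos, new_text):
        """Insert `new_text` at `pos` and invalidate everything after it."""
        self.text = self.text[:pos] + new_text + self.text[pos:]
        self.done = min(self.done, pos)


buf = LazyPropertizer("x" * 100_000)

# Opening the file and showing the first screenful scans very little.
buf.ensure(2_000)
cost_open = buf.scanned

# A window showing the end of the buffer forces a scan of the whole buffer.
buf.ensure(len(buf.text))

# Editing near the start while that window exists means the next redisplay
# of the end of the buffer has to rescan almost everything.
buf.scanned = 0
buf.edit(10, "y")
buf.ensure(len(buf.text))
cost_edit = buf.scanned
```

In this model opening is cheap (cost_open covers only the first
screenful), while an edit near the start with the end displayed costs
nearly a full rescan (cost_edit).  A design that keeps properties valid
everywhere through change hooks pays up front instead.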

> I'm not saying that nobody should try to explore how to decrease the delay,
> and what tradeoffs come with that. But for now, I think, we should encourage
> our kind volunteers to just implement integration the way TreeSitter's
> authors expect it. And try, on our side, to provide the best tools for
> it. Then we can see how well it does or doesn't work, and what are the
> biggest annoyances that the users have with it.

+1


        Stefan




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:48                                             ` Dmitry Gutov
  2020-04-01 22:29                                               ` Stefan Monnier
@ 2020-04-02 14:23                                               ` Eli Zaretskii
  2020-04-02 16:17                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:23 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 00:48:20 +0300
> 
> >> No 'goto-char'. As we've established, it only affects the time taken by
> >> redisplay, and I can't measure that. So I'm not profiling it either,
> >> otherwise I'd be comparing apples to oranges.
> > 
> > See the second profile below.
> 
> Comparing both, looks like redisplay (when at eob, at least) takes 
> approx. the same amount of time?

About 55% taken by redisplay (almost all of it due to fontification),
and the other 45% are the C mode "preprocessing" when the mode is
turned on in a buffer.

> >> Yes. The numbers can be different, but there is definitely some up-front
> >> computation there. One that's not present with e.g. js-mode.
> > 
> > So you are saying that we should do that up-front computation just
> > because CC mode currently does it?  That we shouldn't try to eliminate
> > such preprocessing?  I don't think so.
> 
> AFAIU CC Mode could actually eliminate it, but that would require a 
> significant rework of its internals.

Are we still talking about integrating a completely different parsing
engine into CC Mode?  Then redesign is a must, right?

> I'm just pointing out that apparently you didn't even notice an even 
> larger delay (1.7s), and were fine with it until now.

I didn't "didn't notice", I actually filed several bug reports and
complaints about the various slow aspects of CC mode, because the
slowdown in CC mode over the years annoys me quite a lot.  Some of the
problems were fixed, some weren't (due to limitations of the current
design, I was told).  I'm not at all complacent about this.

> I'm not saying that nobody should try to explore how to decrease the 
> delay, and what tradeoffs come with that. But for now, I think, we 
> should encourage our kind volunteers to just implement integration the 
> way TreeSitter's authors expect it. And try, on our side, to provide the 
> best tools for it. Then we can see how well it does or doesn't work, and 
> what are the biggest annoyances that the users have with it.

I cannot tell the volunteers what to do and where to invest their
resources.  But I can provide feedback on the design ideas, based on
what I know and on my experience, and I can suggest how to design and
implement this to achieve good and scalable performance.  In
particular, I think that it is useful to know what we have tried in
the past and what were the lessons we learned from that.  I hope what
I say is of some help, and I hope we will soon have such an engine
available to Emacs.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:23                                               ` Eli Zaretskii
@ 2020-04-02 16:17                                                 ` Dmitry Gutov
  2020-04-02 18:25                                                   ` Eli Zaretskii
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  0 siblings, 2 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-02 16:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 02.04.2020 17:23, Eli Zaretskii wrote:

>> Comparing both, looks like redisplay (when at eob, at least) takes
>> approx. the same amount of time?
> 
> About 55% taken by redisplay (almost all of it due to fontification),
> and the other 45% are the C mode "preprocessing" when the mode is
> turned on in a buffer.

So, all in all, when xdisp.c is opened at eob, it will be displayed 
after ~2.5 seconds, I guess.

>>> So you are saying that we should do that up-front computation just
>>> because CC mode currently does it?  That we shouldn't try to eliminate
>>> such preprocessing?  I don't think so.
>>
>> AFAIU CC Mode could actually eliminate it, but that would require a
>> significant rework of its internals.
> 
> Are we still talking about integrating a completely different parsing
> engine into CC Mode?  Then redesign is a must, right?

No, that's without TreeSitter.

>> I'm just pointing out that apparently you didn't even notice an even
>> larger delay (1.7s), and were fine with it until now.
> 
> I didn't "didn't notice", I actually filed several bug reports and
> complaints about the various slow aspects of CC mode, because the
> slowdown in CC mode over the years annoys me quite a lot.  Some of the
> problems were fixed, some weren't (due to limitations of the current
> design, I was told).  I'm not at all complacent about this.

Still, compare that with 0.15 sec, which is the current estimate of 
parsing xdisp.c. It could probably be improved still by supporting a 
no-copy buffer-string in modules.

> I cannot tell the volunteers what to do and where to invest their
> resources.  But I can provide feedback on the design ideas, based on
> what I know and on my experience, and I can suggest how to design and
> implement this to achieve good and scalable performance.

We shouldn't, however, create an impression that unless they follow our 
ideas to a T we won't help them realize their own preferred approach 
(e.g. by improving the module API).

 > In
 > particular, I think that it is useful to know what we have tried in
 > the past and what were the lessons we learned from that.  I hope what
 > I say is of some help, and I hope we will soon have such an engine
 > available to Emacs.

I'm fairly confident that implementing deferred/on-demand parsing in 
emacs-tree-sitter can be done later without requiring a major redesign. 
It will require, however, an extra layer of complexity either way.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 16:17                                                 ` Dmitry Gutov
@ 2020-04-02 18:25                                                   ` Eli Zaretskii
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  1 sibling, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-02 18:25 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 19:17:07 +0300
> 
> > I cannot tell the volunteers what to do and where to invest their
> > resources.  But I can provide feedback on the design ideas, based on
> > what I know and on my experience, and I can suggest how to design and
> > implement this to achieve good and scalable performance.
> 
> We shouldn't, however, create an impression that unless they follow our 
> ideas to a T we won't help them realize their own preferred approach 

That's so unfair that I will in the future think twice before offering
any advice.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 16:17                                                 ` Dmitry Gutov
  2020-04-02 18:25                                                   ` Eli Zaretskii
@ 2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  2020-04-03 16:10                                                     ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-03 14:40 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier,
	Andrea Corallo

On Thu, Apr 2, 2020 at 11:17 PM Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> On 02.04.2020 17:23, Eli Zaretskii wrote:
>
> > I cannot tell the volunteers what to do and where to invest their
> > resources.  But I can provide feedback on the design ideas, based on
> > what I know and on my experience, and I can suggest how to design and
> > implement this to achieve good and scalable performance.
>
> We shouldn't, however, create an impression that unless they follow our
> ideas to a T we won't help them realize their own preferred approach
> (e.g. by improving the module API).
>

FWIW, this was not my impression.

--
Tuấn-Anh Nguyễn
Software Engineer



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
@ 2020-04-03 16:10                                                     ` Dmitry Gutov
  0 siblings, 0 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-03 16:10 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn
  Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier,
	Andrea Corallo

On 03.04.2020 17:40, Tuấn-Anh Nguyễn wrote:
> FWIW, this was not my impression.

I'm glad to hear it.

My apologies, then.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  4:14                                   ` Eli Zaretskii
  2020-04-01 13:47                                     ` Dmitry Gutov
@ 2020-04-01 13:52                                     ` Alan Mackenzie
  2020-04-01 14:10                                       ` Eli Zaretskii
  2020-04-01 15:22                                       ` Dmitry Gutov
  1 sibling, 2 replies; 109+ messages in thread
From: Alan Mackenzie @ 2020-04-01 13:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: akrl, casouri, Dmitry Gutov, monnier, emacs-devel

Hello, Eli.

On Wed, Apr 01, 2020 at 07:14:09 +0300, Eli Zaretskii wrote:
> On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote:
> > On 01.04.2020 05:28, Eli Zaretskii wrote:
> > >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
> > >>   emacs-devel@gnu.org
> > >> From: Dmitry Gutov <dgutov@yandex.ru>
> > >> Date: Tue, 31 Mar 2020 22:50:43 +0300

> In general, there's no "preliminary processing" by the major mode's
> fontification facilities except what happens as part of jit-lock, i.e.
> at redisplay time or as side effect of functions that simulate display
> for redisplay purposes.  I'd be very surprised to see a major mode
> which somehow preprocesses the buffer on its own in preparation for
> fontification.  CC Mode certainly doesn't seem to do that.

CC Mode does do this.  It marks syntax-table text properties throughout
the buffer at find-file time, and keeps them valid thereafter in
before/after-change-functions.

This doesn't seem to affect startup performance that badly.  On my
machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
of the first screenful of comments) is taking 0.18s.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:52                                     ` Alan Mackenzie
@ 2020-04-01 14:10                                       ` Eli Zaretskii
  2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:22                                       ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:10 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: casouri, dgutov, emacs-devel, monnier, akrl

> Date: Wed, 1 Apr 2020 13:52:37 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: akrl@sdf.org, casouri@gmail.com, Dmitry Gutov <dgutov@yandex.ru>,
>  monnier@iro.umontreal.ca, emacs-devel@gnu.org
> 
> > In general, there's no "preliminary processing" by the major mode's
> > fontification facilities except what happens as part of jit-lock, i.e.
> > at redisplay time or as side effect of functions that simulate display
> > for redisplay purposes.  I'd be very surprised to see a major mode
> > which somehow preprocesses the buffer on its own in preparation for
> > fontification.  CC Mode certainly doesn't seem to do that.
> 
> CC Mode does do this.  It marks syntax-table text properties throughout
> the buffer at find-file time, and keeps them valid thereafter in
> before/after-change-functions.
> 
> This doesn't seem to affect startup performance that badly.  On my
> machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> of the first screenful of comments) is taking 0.18s.

Like I said, the profile I see is very different, and shows that most
of the time is spent in redisplay-triggered font-lock.

But in any case, it should be trivially obvious that not parsing
the entire buffer will make redisplay faster.  We should try doing
that instead of giving up, even if we think the current fontification
machinery is slow enough to make the parsing delay not so visible.
After all, we want to use these parsers to make CC Mode and friends
faster, so the design and the implementation should use every trick we
have up our sleeve to avoid expensive processing.  Just because using
buffer-substring and parsing the entire buffer up front is easy
doesn't yet mean we should go for it without trying more efficient
algorithms.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:10                                       ` Eli Zaretskii
@ 2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
  2020-04-01 16:03                                           ` Eli Zaretskii
  0 siblings, 2 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:27 UTC (permalink / raw)
  To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 17:10, Eli Zaretskii wrote:
> But in any case, it should be trivially obvious that not parsing
> the entire buffer will make redisplay faster.  We should try doing
> that instead of giving up, even if we think the current fontification
> machinery is slow enough to make the parsing delay not so visible.

I think it's pointless to argue against the current design of TreeSitter 
here, where none of its developers can read it.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:27                                         ` Dmitry Gutov
@ 2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
  2020-04-01 16:03                                           ` Eli Zaretskii
  1 sibling, 0 replies; 109+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-01 15:44 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: casouri, emacs-devel, Stefan Monnier, Alan Mackenzie,
	Eli Zaretskii, akrl


Yup.

On Wed, Apr 1, 2020 at 09:28, Dmitry Gutov (dgutov@yandex.ru)
wrote:

> On 01.04.2020 17:10, Eli Zaretskii wrote:
> > But in any case, it should be trivially obvious that not parsing
> > the entire buffer will make redisplay faster.  We should try doing
> > that instead of giving up, even if we think the current fontification
> > machinery is slow enough to make the parsing delay not so visible.
>
> I think it's pointless to argue against the current design of TreeSitter
> here, where none of its developers can read it.
>
>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
@ 2020-04-01 16:03                                           ` Eli Zaretskii
  2020-04-01 21:21                                             ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-01 16:03 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 18:27:43 +0300
> 
> On 01.04.2020 17:10, Eli Zaretskii wrote:
> > But in any case, it should be trivially obvious that not parsing
> > the entire buffer will make redisplay faster.  We should try doing
> > that instead of giving up, even if we think the current fontification
> > machinery is slow enough to make the parsing delay not so visible.
> 
> I think it's pointless to argue against the current design of TreeSitter 
> here, where none of its developers can read it.

If by TreeSitter you mean the parser (not the Emacs package which
interfaces it), then what I proposed is not against their design,
AFAIU.  They provide an API through which we can let the parser access
the buffer text directly, and they explicitly say that the parser is
tolerant to invalid/incomplete syntax trees.  And I don't see how it
could be any different, since when you start writing code, it takes
quite some time before it becomes syntactically complete and valid.
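
A minimal sketch of the first point, assuming the buffer text is kept in
a gap buffer.  It is written in Python so that it is self-contained; the
read_chunk callback only mimics the general shape of a "give me the text
starting at byte N" interface and is not tree-sitter's actual API.  All
names here (GapBuffer, read_chunk, consume) are hypothetical, and the
sketch says nothing about tolerance to incomplete syntax.

```python
class GapBuffer:
    """Toy gap buffer: text before the gap, unused gap space, text after."""

    def __init__(self, before: bytes, after: bytes, gap_size: int = 16):
        self.data = bytearray(before) + bytearray(gap_size) + bytearray(after)
        self.gap_start = len(before)
        self.gap_end = len(before) + gap_size

    def __len__(self):
        # Logical length of the text, excluding the gap.
        return len(self.data) - (self.gap_end - self.gap_start)

    def read_chunk(self, byte_index: int) -> memoryview:
        """Return a view of contiguous text starting at `byte_index`.

        No bytes are copied: the view points into the buffer's own storage
        and stops at the gap or at the end of the text.  An empty view
        signals end of input.
        """
        view = memoryview(self.data)
        if byte_index < self.gap_start:
            return view[byte_index:self.gap_start]
        # Past the gap: translate the logical index to a physical one.
        physical = byte_index + (self.gap_end - self.gap_start)
        return view[physical:]


def consume(buf: GapBuffer) -> bytes:
    """Stand-in for a parser: pull chunks until the callback returns nothing."""
    out = bytearray()
    pos = 0
    while True:
        chunk = buf.read_chunk(pos)
        if len(chunk) == 0:
            break
        out += chunk
        pos += len(chunk)
    return bytes(out)


buf = GapBuffer(b"int main(void) { ", b"return 0; }")
text = consume(buf)
```

The consumer sees the text as two chunks, one on each side of the gap,
without the buffer ever being flattened into a single string first.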



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 16:03                                           ` Eli Zaretskii
@ 2020-04-01 21:21                                             ` Dmitry Gutov
  2020-04-02 14:09                                               ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 21:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 01.04.2020 19:03, Eli Zaretskii wrote:
>> Cc:akrl@sdf.org,casouri@gmail.com,monnier@iro.umontreal.ca,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov<dgutov@yandex.ru>
>> Date: Wed, 1 Apr 2020 18:27:43 +0300
>>
>> On 01.04.2020 17:10, Eli Zaretskii wrote:
>>> But in any case, it should be trivially obvious that not parsing
>>> the entire buffer will make redisplay faster.  We should try doing
>>> that instead of giving up, even if we think the current fontification
>>> machinery is slow enough to make the parsing delay not so visible.
>> I think it's pointless to argue against the current design of TreeSitter
>> here, where none of its developers can read it.
> If by TreeSitter you mean the parser (not the Emacs package which
> interfaces it), then what I proposed is not against their design,
> AFAIU.  They provide an API through which we can let the parser access
> the buffer text directly, and they explicitly say that the parser is
> tolerant to invalid/incomplete syntax trees.  And I don't see how it
> could be any different, since when you start writing code, it takes
> quite some time before it becomes syntactically complete and valid.

That makes sense, at least in theory. But I'd rather not break the usage 
assumptions of the authors of this library right away. And we'll likely 
want to adopt existing addons which use the result of the parse, which 
likely depend on the same assumptions.

Anyway, here's a (short) discussion on the topic of large files: 
https://github.com/tree-sitter/tree-sitter/issues/222



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:21                                             ` Dmitry Gutov
@ 2020-04-02 14:09                                               ` Eli Zaretskii
  2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 00:21:36 +0300
> 
> > If by TreeSitter you mean the parser (not the Emacs package which
> > interfaces it), then what I proposed is not against their design,
> > AFAIU.  They provide an API through which we can let the parser access
> > the buffer text directly, and they explicitly say that the parser is
> > tolerant to invalid/incomplete syntax trees.  And I don't see how it
> > could be any different, since when you start writing code, it takes
> > quite some time before it becomes syntactically complete and valid.
> 
> That makes sense, at least in theory. But I'd rather not break the usage 
> assumptions of the authors of this library right away.

From what I could glean by reading the documentation, the above is not
necessarily against the assumptions of the tree-sitter developers.  I
saw nothing that would indicate the initial full parse is a must.
That such full parse is unnecessary is what I would expect, because of
the use case that I start writing a source file from scratch.

> And we'll likely want to adopt existing addons which use the result
> of the parse, which likely depend on the same assumptions.

Those other addons must also support the "write from scratch" use
case, right?  Then they should also support passing only part of the
buffer, since it could be that this is all I have in the buffer right
now.

> Anyway, here's a (short) discussion on the topic of large files: 
> https://github.com/tree-sitter/tree-sitter/issues/222

Thanks.  This was long ago, though, so I'm not sure what became of
that (and Stefan's comment didn't yet get any responses to indicate
that this is a solved problem).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:09                                               ` Eli Zaretskii
@ 2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
  2020-04-02 18:27                                                   ` Yuan Fu
  0 siblings, 1 reply; 109+ messages in thread
From: 조성빈 via "Emacs development discussions. @ 2020-04-02 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, acm, casouri, Emacs-devel, monnier, akrl


> On 2 Apr 2020, at 11:10 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>>
>> Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Thu, 2 Apr 2020 00:21:36 +0300
>>
>>> If by TreeSitter you mean the parser (not the Emacs package which
>>> interfaces it), then what I proposed is not against their design,
>>> AFAIU.  They provide an API through which we can let the parser access
>>> the buffer text directly, and they explicitly say that the parser is
>>> tolerant to invalid/incomplete syntax trees.  And I don't see how it
>>> could be any different, since when you start writing code, it takes
>>> quite some time before it becomes syntactically complete and valid.
>>
>> That makes sense, at least in theory. But I'd rather not break the usage
>> assumptions of the authors of this library right away.
>
> From what I could glean by reading the documentation, the above is not
> necessarily against the assumptions of the tree-sitter developers.  I
> saw nothing that would indicate that an initial full parse is a must.
> That such a full parse is unnecessary is what I would expect, because of
> the use case where I start writing a source file from scratch.

The situation of a new user creating a new buffer is very different from
parsing code with only a peephole, because users don’t generally expect
unfinished code to be exactly highlighted, while users do expect finished
code to have exact highlighting.

Maybe it’s just because I got lost through a lot of emails, and Mail.app
doesn't really thread these emails properly, but I can’t understand the
resistance to up-front parsing.

The currently shipping CC Mode parses most of the code up front, and
clearly tree-sitter will be faster than that. AFAIU, parsing code by
looking only through a peephole is super hard, except for some languages
that are designed for peephole processing - and even that makes it only
hard, not super hard.

>> And we'll likely want to adopt existing addons which use the result
>> of the parse, which likely depend on the same assumptions.
>
> Those other addons must also support the "write from scratch" use
> case, right?  Then they should also support passing only part of the
> buffer, since it could be that this is all I have in the buffer right
> now.
>
>> Anyway, here's a (short) discussion on the topic of large files:
>> https://github.com/tree-sitter/tree-sitter/issues/222
>
> Thanks.  This was long ago, though, so I'm not sure what became of
> that (and Stefan's comment didn't yet get any responses to indicate
> that this is a solved problem).




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
@ 2020-04-02 18:27                                                   ` Yuan Fu
  2020-04-02 19:39                                                     ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Yuan Fu @ 2020-04-02 18:27 UTC (permalink / raw)
  To: 조성빈
  Cc: Emacs-devel, Stefan Monnier, Dmitry Gutov, acm, Eli Zaretskii,
	akrl




> On Apr 2, 2020, at 2:03 PM, 조성빈 <pcr910303@icloud.com> wrote:
> 
> Maybe it’s just because I got lost through a lot of emails, and Mail.app
> doesn't really thread these emails properly, but I can’t understand the
> resistance to up-front parsing.
> 

I think we are just discussing if there is any way to not parse the whole buffer up front. (Which I consider unlikely because of the nature of parsing.)

> The currently shipping CC Mode parses most of the code up front, and
> clearly tree-sitter will be faster than that. AFAIU, parsing code by
> looking only through a peephole is super hard, except for some languages
> that are designed for peephole processing - and even that makes it only
> hard, not super hard.


Some modes don’t require up-front parsing. IIRC, an example from an earlier message is javascript-mode.

Yuan


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 18:27                                                   ` Yuan Fu
@ 2020-04-02 19:39                                                     ` Stefan Monnier
  0 siblings, 0 replies; 109+ messages in thread
From: Stefan Monnier @ 2020-04-02 19:39 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Emacs-devel, 조성빈, Dmitry Gutov, acm,
	Eli Zaretskii, akrl

> Some modes don’t require up-front parsing. IIRC, an example from an
> earlier message is javascript-mode.

Yet, in order to decide whether position P in a javascript buffer is
inside a comment or not, you will either have to look at everything
between point-min and P, or think hard about all the various
possibilities to try and see if you can argue that in this particular
case it's not necessary.
E.g. if you see

    foo /* bar */

then you might be able to say that "bar" is within a comment without
looking much further.  But for "foo" you first have to look back because
there might have been an earlier unmatched `/*`.
BTW, for "bar" you still have to look a bit further: it might be that
the previous line was:

    tmp = "hello\

in which case "bar" is not inside a comment but inside a string.
Well, unless there's ... an earlier unmatched `/*`.

Etc...

For the case of Javascript I believe that you can come up with an
algorithm which will reliably give the right answer while almost never
having to go back all the way to `point-min`.  I even believe it's
possible to write a tool that will automatically find that algorithm
given a suitable input grammar.  But for some languages like Elisp,
Python, and OCaml I believe it's simply impossible (for Elisp/Python
it's because of the existence of multiline strings (with no "trailing \"
to indicate their possible presence) and for OCaml it's because of the
nested comments).
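
To make the argument above concrete, here is a minimal sketch (in Python, purely for illustration) of the naive approach: scan from the start of the text up to position P, tracking whether we are in code, in a block comment, or in a string.  The name `state_at` and the simplified lexical rules are assumptions made for this example (only `/* */` comments and double-quoted strings with backslash escapes; no `//` comments, regexp literals, or template strings); this is not how any Emacs mode or tree-sitter actually does it.

```python
def state_at(text: str, pos: int) -> str:
    """Return 'code', 'comment' or 'string' for the character at index pos.

    Hypothetical, simplified JavaScript-like lexer: it always scans from
    the very beginning of the text, which is the point being illustrated.
    """
    state = "code"
    i = 0
    while i < pos:
        two = text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "comment"
                i += 2
                continue
            if text[i] == '"':
                state = "string"
        elif state == "comment":
            if two == "*/":
                state = "code"
                i += 2
                continue
        elif state == "string":
            if text[i] == "\\":
                # Skip the escaped character, including an escaped newline
                # (the "trailing \" line continuation).
                i += 2
                continue
            if text[i] == '"' or text[i] == "\n":
                # Closing quote, or an unescaped newline ending the string.
                state = "code"
        i += 1
    return state


line = "foo /* bar */"
bar = line.index("bar")

print(state_at(line, bar))                        # comment
continued = 'tmp = "hello\\\n'                    # previous line ends in a backslash
print(state_at(continued + line, len(continued) + bar))   # string
opened = "/* unmatched\n"                         # earlier unmatched comment opener
print(state_at(opened + continued + line,
               len(opened) + len(continued) + bar))        # comment
```

The classification of "bar" flips each time more preceding text is taken into account, which is why a scanner that starts at an arbitrary position cannot be trusted without a further argument about the language.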

        Stefan

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:52                                     ` Alan Mackenzie
  2020-04-01 14:10                                       ` Eli Zaretskii
@ 2020-04-01 15:22                                       ` Dmitry Gutov
  2020-04-04 11:06                                         ` Alan Mackenzie
  1 sibling, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:22 UTC (permalink / raw)
  To: Alan Mackenzie, Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 16:52, Alan Mackenzie wrote:
> This doesn't seem to affect starting up performance that badly.  On my
> machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> of the first screenful of comments) is taking 0.18s.

Interesting. How do you measure it exactly? Do you kill the buffer 
between tries?

I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
-g3'", the build is from emacs-27 branch, recent revision.

With 'emacs -Q' it's a little faster, but still

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

prints out

   Elapsed time: 0.968598s (0.144805s in 8 GCs)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:22                                       ` Dmitry Gutov
@ 2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
                                                             ` (2 more replies)
  0 siblings, 3 replies; 109+ messages in thread
From: Alan Mackenzie @ 2020-04-04 11:06 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl

Hello, Dmitry.

On Wed, Apr 01, 2020 at 18:22:00 +0300, Dmitry Gutov wrote:
> On 01.04.2020 16:52, Alan Mackenzie wrote:
> > This doesn't seem to affect starting up performance that badly.  On my
> > machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> > of the first screenful of comments) is taking 0.18s.

> Interesting. How do you measure it exactly? Do you kill the buffer 
> between tries?

Using my macro time-it, I did:

(time-it (find-file "..../src/xdisp.c") (sit-for 0))

.  I think this was without the file yet being in the OS's file cache.
Mind you, I have an nvme SSD.

> I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
> -g3'", the build is from emacs-27 branch, recent revision.

That's a debugging build, isn't it?  That probably explains the
difference.

> With 'emacs -Q' it's a little faster, but still

>    (benchmark 1 '(progn (find-file "src/xdisp.c")))

> prints out

>    Elapsed time: 0.968598s (0.144805s in 8 GCs)

Is that also measuring the time for redisplay?

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
@ 2020-04-04 11:26                                           ` Eli Zaretskii
  2020-04-04 14:14                                             ` Andrea Corallo
  2020-04-04 11:27                                           ` Eli Zaretskii
  2020-04-04 12:01                                           ` Dmitry Gutov
  2 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 11:26 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: casouri, akrl, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 11:06:43 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com,
>   monnier@iro.umontreal.ca, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
> > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
> > -g3'", the build is from emacs-27 branch, recent revision.
> 
> That's a debugging build, isn't it?

No, it's an optimized build, just not with -O2.  -Og is similar to -O1,
so slightly less optimized than -O2.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:26                                           ` Eli Zaretskii
@ 2020-04-04 14:14                                             ` Andrea Corallo
  2020-04-04 14:41                                               ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Andrea Corallo @ 2020-04-04 14:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, casouri, emacs-devel, monnier, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 4 Apr 2020 11:06:43 +0000
>> Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com,
>>   monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> From: Alan Mackenzie <acm@muc.de>
>> 
>> > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
>> > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
>> > -g3'", the build is from emacs-27 branch, recent revision.
>> 
>> That's a debugging build, isn't it?
>
> No, it's an optimized build, just not with -O2.  -Og is similar to -O1,
> so slightly less optimized than -O2.

Be careful: -Og produces considerably slower code than -O2.  For
instance, if I'm not wrong, it completely disables inlining, which is
one of the most rewarding optimizations.

Andrea

-- 
akrl@sdf.org



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 14:14                                             ` Andrea Corallo
@ 2020-04-04 14:41                                               ` Eli Zaretskii
  2020-04-04 15:04                                                 ` Andrea Corallo
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 14:41 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: acm, casouri, emacs-devel, monnier, dgutov

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com,
>         monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 04 Apr 2020 14:14:45 +0000
> 
> Be careful: -Og produces considerably slower code than -O2.  For
> instance, if I'm not wrong, it completely disables inlining, which is
> one of the most rewarding optimizations.

Yes, I know.  But the difference in performance between -Og and -O2
cannot be 8- or 9-fold, it should be somewhere around 50% to 70%.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 14:41                                               ` Eli Zaretskii
@ 2020-04-04 15:04                                                 ` Andrea Corallo
  2020-04-04 15:38                                                   ` Richard Copley
  0 siblings, 1 reply; 109+ messages in thread
From: Andrea Corallo @ 2020-04-04 15:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, dgutov, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com,
>>         monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Sat, 04 Apr 2020 14:14:45 +0000
>> 
>> Be careful: -Og produces considerably slower code than -O2.  For
>> instance, if I'm not wrong, it completely disables inlining, which is
>> one of the most rewarding optimizations.
>
> Yes, I know.  But the difference in performance between -Og and -O2
> cannot be 8- or 9-fold, it should be somewhere around 50% to 70%.

Mmmh I agree with you, one order of magnitude sounds a bit too much, even
if we have a ton of small getter/setters that are usually inlined.

-- 
akrl@sdf.org



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 15:04                                                 ` Andrea Corallo
@ 2020-04-04 15:38                                                   ` Richard Copley
  0 siblings, 0 replies; 109+ messages in thread
From: Richard Copley @ 2020-04-04 15:38 UTC (permalink / raw)
  To: Emacs Development
  Cc: Alan Mackenzie, Eli Zaretskii, Dmitry Gutov, Andrea Corallo

Here, an -Og build takes about 2.5 times as long as an -O2 build to
execute either of the two benchmarks. That's a relative decrease of
60% in elapsed time, for -O2 relative to -Og.

I built Emacs in 4 separate clean worktrees of the master branch
(f71afd600a). The build commands were identical except for the
optimization flag. For each test I (twice) started "emacs -Q" and did
either [1] or [2]:

[1] M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))
[2] M-: (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))

The elapsed time reported was:

without sit-for:
-O0: 1.027754s, 1.031642s
-Og: 1.295515s, 1.277441s
-O1: 0.629743s, 0.629870s
-O2: 0.513139s, 0.511230s

with sit-for:
-O0: 1.079090s, 1.068118s
-Og: 1.347256s, 1.337780s
-O1: 0.661679s, 0.664470s
-O2: 0.533649s, 0.533949s

(My only comment on the fact that -Og appears to be about 20% or 25%
worse than -O0 is that it's not a typo.)
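
As a sanity check on the ratios quoted in this message, the following sketch recomputes them from the "without sit-for" timings listed above.  It is written in Python purely for illustration; the variable and function names are made up for this example, and averaging the two runs is an assumption about how best to summarise them.

```python
from statistics import mean

# Elapsed seconds (two runs each) as reported above, without sit-for.
timings = {
    "-O0": (1.027754, 1.031642),
    "-Og": (1.295515, 1.277441),
    "-O1": (0.629743, 0.629870),
    "-O2": (0.513139, 0.511230),
}

# Average the two runs per optimization level.
avg = {flag: mean(runs) for flag, runs in timings.items()}


def slowdown(flag: str, baseline: str) -> float:
    """How many times longer `flag` takes than `baseline`."""
    return avg[flag] / avg[baseline]


og_vs_o2 = slowdown("-Og", "-O2")            # "about 2.5 times as long"
og_vs_o0 = slowdown("-Og", "-O0")            # "20% or 25% worse than -O0"
o2_saving = 1 - avg["-O2"] / avg["-Og"]      # "relative decrease of 60%"

print(f"-Og / -O2: {og_vs_o2:.2f}x")
print(f"-Og / -O0: {og_vs_o0:.2f}x")
print(f"-O2 saves {o2_saving:.0%} of the -Og elapsed time")
```

Note that "2.5 times as long" and "60% less time" describe the same pair of measurements: 1 - 1/2.5 = 0.6.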



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
@ 2020-04-04 11:27                                           ` Eli Zaretskii
  2020-04-04 12:01                                           ` Dmitry Gutov
  2 siblings, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 11:27 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 11:06:43 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com,
>  monnier@iro.umontreal.ca, akrl@sdf.org
> 
> >    (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
> > prints out
> 
> >    Elapsed time: 0.968598s (0.144805s in 8 GCs)
> 
> Is that also measuring the time for redisplay?

No, redisplay runs after the function exits.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
  2020-04-04 11:27                                           ` Eli Zaretskii
@ 2020-04-04 12:01                                           ` Dmitry Gutov
  2020-04-04 12:36                                             ` Alan Mackenzie
  2 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 12:01 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel

Hi Alan,

On 04.04.2020 14:06, Alan Mackenzie wrote:

>> Interesting. How do you measure it exactly? Do you kill the buffer
>> between tries?
> 
> Using my macro time-it, I did:
> 
> (time-it (find-file "..../src/xdisp.c") (sit-for 0))

It might be valuable if you evaluated exactly the same form I did. And 
made sure that the buffer is not visited in advance. And did that in an 
'emacs -Q' session.

> .  I think this was without the file yet being in the OS's file cache.
> Mind you, I have an nvme SSD.

I do as well. I have a fast laptop, pretty sure it's faster than what 
90% of our users have. My single-threaded performance must be better 
than yours for sure.

>> I have a fast Intel CPU that is barely 2 years old (i9-8950HK),
>> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og
>> -g3'", the build is from emacs-27 branch, recent revision.
> 
> That's a debugging build, isn't it?  That probably explains the
> difference.

Debugging-ish. It hardly explains the 4.5x difference. So we're probably 
measuring different things.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:01                                           ` Dmitry Gutov
@ 2020-04-04 12:36                                             ` Alan Mackenzie
  2020-04-04 12:40                                               ` Dmitry Gutov
  2020-04-04 13:02                                               ` Eli Zaretskii
  0 siblings, 2 replies; 109+ messages in thread
From: Alan Mackenzie @ 2020-04-04 12:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel

Hello, Dmitry.

On Sat, Apr 04, 2020 at 15:01:23 +0300, Dmitry Gutov wrote:
> On 04.04.2020 14:06, Alan Mackenzie wrote:

> >> Interesting. How do you measure it exactly? Do you kill the buffer
> >> between tries?

> > Using my macro time-it, I did:

> > (time-it (find-file "..../src/xdisp.c") (sit-for 0))

> It might be valuable if you evaluated exactly the same form I did. And 
> made sure that the buffer is not visited in advance. And did that in an 
> 'emacs -Q' session.

Fair point:

    M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))

    "Elapsed time: 1.249904s (0.165570s in 7 GCs)"

, in a build with the CFLAGS and gtk toolkit like you said.  That's in
agreement with your timing, given my slightly slower machine.


> > .  I think this was without the file yet being in the OS's file cache.
> > Mind you, I have an nvme SSD.

> I do as well. I have a fast laptop, pretty sure it's faster than what 
> 90% of our users have. My single-threaded performance must be better 
> than yours for sure.

> >> I have a fast Intel CPU that is barely 2 years old (i9-8950HK),
> >> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og
> >> -g3'", the build is from emacs-27 branch, recent revision.

> > That's a debugging build, isn't it?  That probably explains the
> > difference.

> Debugging-ish. It hardly explains the 4.5x difference. So we're probably 
> measuring different things.

I think it does explain the difference.  I repeated my previous timing,
which was 0.18s on an optimised build, and it came out at 1.16s.  That's
a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:36                                             ` Alan Mackenzie
@ 2020-04-04 12:40                                               ` Dmitry Gutov
  2020-04-04 13:02                                               ` Eli Zaretskii
  1 sibling, 0 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 12:40 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl

On 04.04.2020 15:36, Alan Mackenzie wrote:
> I think it does explain the difference.  I repeated my previous timing,
> which was 0.18s on an optimised build, and it came out at 1.16s.  That's
> a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

Hmm. Very good, thank you.

(I am just now in the process of rebuilding Emacs with full optimizations;
will report if the result is still starkly different from yours for some 
reason.)



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:36                                             ` Alan Mackenzie
  2020-04-04 12:40                                               ` Dmitry Gutov
@ 2020-04-04 13:02                                               ` Eli Zaretskii
  2020-04-04 16:09                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 13:02 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 12:36:13 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com,
>   monnier@iro.umontreal.ca, akrl@sdf.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > (time-it (find-file "..../src/xdisp.c") (sit-for 0))
> 
> > It might be valuable if you evaluated exactly the same form I did. And 
> > made sure that the buffer is not visited in advance. And did that in an 
> > 'emacs -Q' session.
> 
> Fair point:
> 
>     M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
>     "Elapsed time: 1.249904s (0.165570s in 7 GCs)"
> 
> , in a build with the CFLAGS and gtk toolkit like you said.  That's in
> agreement with your timing, given my slightly slower machine.

I don't believe these results.  It's nigh impossible for a -O2
optimized program to be 5 times faster than a -Og optimized.  And
benchmark.el doesn't seem to be so different from time-it, modulo the
function call.  Moreover, Alan's method does time redisplay, whereas
Dmitry's method does not.

So there's some other factor at work here that explains the
difference.

> I think it does explain the difference.  I repeated my previous timing,
> which was 0.18s on an optimised build, and it came out at 1.16s.  That's
> a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

It cannot be that slow.  Especially since some I/O is involved, and
you also measure redisplay.  More detailed data would be necessary to
explain the difference.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 13:02                                               ` Eli Zaretskii
@ 2020-04-04 16:09                                                 ` Dmitry Gutov
  2020-04-04 16:38                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 16:09 UTC (permalink / raw)
  To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, akrl, monnier, emacs-devel

On 04.04.2020 16:02, Eli Zaretskii wrote:
> I don't believe these results.  It's nigh impossible for a -O2
> optimized program to be 5 times faster than a -Og optimized.  And
> benchmark.el doesn't seem to be so different from time-it, modulo the
> function call.  Moreover, Alan's method does time redisplay, whereas
> Dmitry's method does not.

Unfortunately I can confirm the difference.

When Emacs is recompiled with the default optimizations,

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

reports ~0.13s when FS cache is warm (compared to ~0.78 with the most 
recent -Og build here).

And

   (benchmark 1 '(progn (find-file "src/xdisp.c")
                        (goto-char (point-max))
                        (sit-for 0)))

reports ~0.29s.

Maybe CC Mode exercises some primitives that are hit especially hard by 
the lack of optimization.

Emacs looks snappier overall (e.g. during startup, loading my custom 
configuration with all its packages), but probably within the bounds of 
50-70% difference.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:09                                                 ` Dmitry Gutov
@ 2020-04-04 16:38                                                   ` Eli Zaretskii
  2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:29                                                     ` Dmitry Gutov
  0 siblings, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 16:38 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 19:09:58 +0300
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> When Emacs is recompiled with the default optimizations,
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
> reports ~0.13s when FS cache is warm (compared to ~0.78 with the most 
> recent -Og build here).
> 
> And
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")
>                         (goto-char (point-max))
>                         (sit-for 0)))
> 
> reports ~0.29s.

Is this with xdisp.c in a Git repository or outside of a Git
repository?



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:38                                                   ` Eli Zaretskii
@ 2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:36                                                       ` Dmitry Gutov
  2020-04-04 17:29                                                     ` Dmitry Gutov
  1 sibling, 2 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 16:45 UTC (permalink / raw)
  To: dgutov, acm; +Cc: casouri, akrl, monnier, emacs-devel

> Date: Sat, 04 Apr 2020 19:38:18 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org,
>  monnier@iro.umontreal.ca, akrl@sdf.org
> 
> Is this with xdisp.c in a Git repository or outside of a Git
> repository?

Also, how many GCs did benchmark report, and how long did they take?
With such short timings and running the test only once, the difference
GC could make might be significant, so if different runs and different
people here have different numbers of GCs, we could be comparing apples
with oranges.



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:45                                                     ` Eli Zaretskii
@ 2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:50                                                         ` Eli Zaretskii
  2020-04-04 18:29                                                         ` Andrea Corallo
  2020-04-04 17:36                                                       ` Dmitry Gutov
  1 sibling, 2 replies; 109+ messages in thread
From: Richard Copley @ 2020-04-04 17:22 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Alan Mackenzie, Andrea Corallo, Emacs Development, Dmitry Gutov

On Sat, 4 Apr 2020 at 17:46, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > Date: Sat, 04 Apr 2020 19:38:18 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org,
> >  monnier@iro.umontreal.ca, akrl@sdf.org
> >
> > Is this with xdisp.c in a Git repository or outside of a Git
> > repository?
>
> Also, how many GCs did benchmark report, and how long did they take?
> With such short timings and running the test only once, the difference
> GC could make might be significant, so if different runs and different
> people here have different numbers of GCs, we could be comparing apples
> with oranges.

For my earlier results, I ran the -Og benchmark in the git
repository (with .git a directory) and the other three in git
worktrees (with .git a regular file). I have repeated my tests for the
-Og case in a git worktree, to match the other three. It didn't make a
significant difference. I haven't tried it outside of git.

Amended results below, including time in GC, for two runs each in
separate instances of "emacs -Q". In all 16 cases there were 8 GCs.

with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
-Og 1.340039s (0.149663s), 1.350613s (0.149954s)
-O2 0.533649s (0.046995s), 0.533949s (0.046714s)
-O1 0.661679s (0.055181s), 0.664470s (0.057050s)
-O0 1.079090s (0.168691s), 1.068118s (0.168451s)

without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
-Og 1.293845s (0.150200s), 1.305310s (0.149520s)
-O2 0.513139s (0.047117s), 0.511230s (0.047143s)
-O1 0.629743s (0.054738s), 0.629870s (0.056522s)
-O0 1.027754s (0.165569s), 1.031642s (0.168891s)
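
To see whether GC alone could account for the gap, one can subtract the reported GC time from the elapsed time.  The sketch below does this for the first "without sit-for" run of each build; it is Python used purely for illustration, with made-up names, and taking only the first run of each pair is an arbitrary simplification.

```python
# (elapsed seconds, seconds spent in GC) for the first run of each build,
# taken from the "without sit-for" table above.
runs = {
    "-Og": (1.293845, 0.150200),
    "-O2": (0.513139, 0.047117),
    "-O1": (0.629743, 0.054738),
    "-O0": (1.027754, 0.165569),
}

# Time spent outside the garbage collector.
non_gc = {flag: elapsed - gc for flag, (elapsed, gc) in runs.items()}

# Ratios of -Og to -O2, with GC excluded and for GC on its own.
non_gc_ratio = non_gc["-Og"] / non_gc["-O2"]
gc_ratio = runs["-Og"][1] / runs["-O2"][1]

for flag, (elapsed, gc) in runs.items():
    print(f"{flag}: {non_gc[flag]:.3f}s outside GC, {gc / elapsed:.0%} of elapsed in GC")
print(f"-Og / -O2 outside GC: {non_gc_ratio:.2f}x")
print(f"-Og / -O2 inside GC:  {gc_ratio:.2f}x")
```

With the GC count identical across all builds, the -Og build remains roughly 2.5 times slower than -O2 even with GC time excluded, so on these numbers GC does not explain the difference.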



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:22                                                       ` Richard Copley
@ 2020-04-04 17:50                                                         ` Eli Zaretskii
  2020-04-04 18:29                                                         ` Andrea Corallo
  1 sibling, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:50 UTC (permalink / raw)
  To: Richard Copley; +Cc: acm, emacs-devel, dgutov, akrl

> From: Richard Copley <rcopley@gmail.com>
> Date: Sat, 4 Apr 2020 18:22:34 +0100
> Cc: Alan Mackenzie <acm@muc.de>, Andrea Corallo <akrl@sdf.org>,
>  Emacs Development <emacs-devel@gnu.org>, Dmitry Gutov <dgutov@yandex.ru>
> 
> For my earlier results, I ran the -Og benchmark in the git
> repository (with .git a directory) and the other three in git
> worktrees (with .git a regular file). I have repeated my tests for the
> -Og case in a git worktree, to match the other three. It didn't make a
> significant difference. I haven't tried it outside of git.
> 
> Amended results below, including time in GC, for two runs each in
> separate instances of "emacs -Q". In all 16 cases there were 8 GCs.
> 
> with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
> -Og 1.340039s (0.149663s), 1.350613s (0.149954s)
> -O2 0.533649s (0.046995s), 0.533949s (0.046714s)
> -O1 0.661679s (0.055181s), 0.664470s (0.057050s)
> -O0 1.079090s (0.168691s), 1.068118s (0.168451s)
> 
> without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
> -Og 1.293845s (0.150200s), 1.305310s (0.149520s)
> -O2 0.513139s (0.047117s), 0.511230s (0.047143s)
> -O1 0.629743s (0.054738s), 0.629870s (0.056522s)
> -O0 1.027754s (0.165569s), 1.031642s (0.168891s)

Thanks.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:50                                                         ` Eli Zaretskii
@ 2020-04-04 18:29                                                         ` Andrea Corallo
  2020-04-04 18:56                                                           ` Richard Copley
  1 sibling, 1 reply; 109+ messages in thread
From: Andrea Corallo @ 2020-04-04 18:29 UTC (permalink / raw)
  To: Richard Copley
  Cc: Alan Mackenzie, Eli Zaretskii, Emacs Development, Dmitry Gutov

Richard Copley <rcopley@gmail.com> writes:

> For my earlier results, I ran the -Og benchmark in the git
> repository (with .git a directory) and the other three in git
> worktrees (with .git a regular file). I have repeated my tests for the
> -Og case in a git worktree, to match the other three. It didn't make a
> significant difference. I haven't tried it outside of git.
>
> Amended results below, including time in GC, for two runs each in
> separate instances of "emacs -Q". In all 16 cases there were 8 GCs.
>
> with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
> -Og 1.340039s (0.149663s), 1.350613s (0.149954s)
> -O2 0.533649s (0.046995s), 0.533949s (0.046714s)
> -O1 0.661679s (0.055181s), 0.664470s (0.057050s)
> -O0 1.079090s (0.168691s), 1.068118s (0.168451s)
>
> without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
> -Og 1.293845s (0.150200s), 1.305310s (0.149520s)
> -O2 0.513139s (0.047117s), 0.511230s (0.047143s)
> -O1 0.629743s (0.054738s), 0.629870s (0.056522s)
> -O0 1.027754s (0.165569s), 1.031642s (0.168891s)

The fact that -Og is slower than -O0 is very sad but also interesting.

Which (I guess) GCC version are you on?

Generally speaking I suspect -Og is not very much tested, especially
performance wise.

  Andrea

--
akrl@sdf.org




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:29                                                         ` Andrea Corallo
@ 2020-04-04 18:56                                                           ` Richard Copley
  2020-04-04 20:36                                                             ` Andrea Corallo
  0 siblings, 1 reply; 109+ messages in thread
From: Richard Copley @ 2020-04-04 18:56 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Emacs Development

On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:

> The fact that -Og is slower than -O0 is very sad but also interesting.

Yeah. Among its other selling points, it should give "a reasonable
level of optimization" [1].

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og

> Which (I guess) GCC version are you on?

GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.


> Generally speaking I suspect -Og is not very much tested, especially
> performance wise.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:56                                                           ` Richard Copley
@ 2020-04-04 20:36                                                             ` Andrea Corallo
  0 siblings, 0 replies; 109+ messages in thread
From: Andrea Corallo @ 2020-04-04 20:36 UTC (permalink / raw)
  To: Richard Copley; +Cc: Emacs Development

Richard Copley <rcopley@gmail.com> writes:

> On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:
>
>> The fact that -Og is slower than -O0 is very sad but also interesting.
>
> Yeah. Among its other selling points, it should give "a reasonable
> level of optimization" [1].

Yep, it does not make much sense to be honest.  Just the fact that you
do not constantly spill and reload every automatic variable on the
stack should give a measurable improvement.  There must be some
macroscopic reason we are missing.

> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og
>
>> Which (I guess) GCC version are you on?
>
> GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.
>
>
>> Generally speaking I suspect -Og is not very much tested, especially
>> performance wise.
>

-- 
akrl@sdf.org




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:22                                                       ` Richard Copley
@ 2020-04-04 17:36                                                       ` Dmitry Gutov
  2020-04-04 17:47                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:36 UTC (permalink / raw)
  To: Eli Zaretskii, acm; +Cc: casouri, akrl, monnier, emacs-devel

On 04.04.2020 19:45, Eli Zaretskii wrote:
> Also, how many GC's and the time they took did benchmark report?

I showed such outputs before.

Now, with an -Og build, here are outputs of several consecutive runs:

Elapsed time: 0.912808s (0.125516s in 7 GCs)
Elapsed time: 0.772653s (0.077285s in 4 GCs)
Elapsed time: 0.769371s (0.076361s in 4 GCs)
Elapsed time: 0.776261s (0.077395s in 4 GCs)

(The first one right after Emacs was started).

> With
> such short timings and running the test only once,

I always run it several times, discarding the first result because the
FS cache is likely cold on that iteration. The buffer is killed between
runs, of course.
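
The procedure described here (drop the first, cold-cache run and
average the rest) can be sketched in C as follows.  The function name
`warm_mean' is hypothetical; applied to the four -Og timings above it
yields roughly 0.773s.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of all runs except the first one, which is treated as a
   warm-up run with a cold file-system cache.  */
double
warm_mean (const double *runs, size_t count)
{
  assert (count >= 2);          /* need at least one warm run */
  double sum = 0.0;
  for (size_t i = 1; i < count; i++)
    sum += runs[i];
  return sum / (double) (count - 1);
}
```

Averaging only the warm runs keeps the one-off 7-GC start-up run from
skewing the comparison between builds.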

> the difference GC
> could make might be significant, so if different runs and different
> people here have different numbers of GC, we could be comparing apples
> with oranges.

In an optimized build, it's always < 0.2s here. And I gave an average 
number. It's not my first time benchmarking either.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:36                                                       ` Dmitry Gutov
@ 2020-04-04 17:47                                                         ` Eli Zaretskii
  2020-04-04 18:02                                                           ` Dmitry Gutov
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:36:03 +0300
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> Now, with an -Og build, here are outputs of several consecutive runs:
> 
> Elapsed time: 0.912808s (0.125516s in 7 GCs)
> Elapsed time: 0.772653s (0.077285s in 4 GCs)
> Elapsed time: 0.769371s (0.076361s in 4 GCs)
> Elapsed time: 0.776261s (0.077395s in 4 GCs)
> [...]
> In an optimized build, it's always < 0.2s here.

So we are looking at -O2 being about 3 to 5 times faster than -Og,
right?  That's a bigger speedup than I'd expect, but still nowhere
near the order of magnitude that Alan's timings seemed to show.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:47                                                         ` Eli Zaretskii
@ 2020-04-04 18:02                                                           ` Dmitry Gutov
  2020-04-04 23:01                                                             ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 18:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:47, Eli Zaretskii wrote:
>> Elapsed time: 0.912808s (0.125516s in 7 GCs)
>> Elapsed time: 0.772653s (0.077285s in 4 GCs)
>> Elapsed time: 0.769371s (0.076361s in 4 GCs)
>> Elapsed time: 0.776261s (0.077395s in 4 GCs)
>> [...]
>> In an optimized build, it's always < 0.2s here.
> So we are looking at -O2 being about 3 to 5 times faster than -Og,
> right?  That's a speedup that is more than I'd expect, but still
> nowhere near an order of magnitude that Alan's timings seemed to show.

0.76 / 0.13 ~= 5.86

Alan's difference is bigger, but not by much:

1.24 / 0.18 ~= 6.88
1.18 (from another email) / 0.18 ~= 6.55

Which probably makes sense given different CPU architectures.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:02                                                           ` Dmitry Gutov
@ 2020-04-04 23:01                                                             ` Stefan Monnier
  2020-04-06 14:25                                                               ` Yuan Fu
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-04-04 23:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, Eli Zaretskii, emacs-devel, casouri, akrl

> 0.76 / 0.13 ~= 5.86
>
> Alan's difference is bigger, but not by much:
>
> 1.24 / 0.18 ~= 6.88
> 1.18 (from another email) / 0.18 ~= 6.55

That does remind me that I've had the impression "lately" that debug
builds are much slower than they used to be.  I suspect (for no reason
other than lack of imagination on my part) this is linked to the changes
from macros to inlinable functions.  When Paul started doing that we
tried to keep some "important" macros as macros (depending on
DEFINE_KEY_OPS_AS_MACROS) to keep the performance impact under control.
Maybe something changed in this respect (maybe we should add a few more
fallback-macros into the set of functions affected by
DEFINE_KEY_OPS_AS_MACROS, or maybe something prevents
DEFINE_KEY_OPS_AS_MACROS from doing its job, or ...)?

        Stefan


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 23:01                                                             ` Stefan Monnier
@ 2020-04-06 14:25                                                               ` Yuan Fu
  2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
  0 siblings, 1 reply; 109+ messages in thread
From: Yuan Fu @ 2020-04-06 14:25 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: acm, Eli Zaretskii, Andrea Corallo, emacs-devel, Dmitry Gutov

It seems the discussion has stalled. May I ask what the conclusion is so far (w.r.t. whole-buffer parsing and how to pass text to tree-sitter)?

Yuan



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-06 14:25                                                               ` Yuan Fu
@ 2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
  0 siblings, 0 replies; 109+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-06 19:55 UTC (permalink / raw)
  To: emacs-devel



On Monday, 6 April 2020 at 08:25, Yuan Fu wrote:

> Seems the discussion has stalled, may I ask what’s the conclusion so far? (w.r.t. whole buffer parse & how to pass text to tree-sitter.) 
>
> Yuan

A whole-buffer parse, plus after-change-functions for incremental parsing, AFAIK. Tweak whatever needs to be tweaked, change whatever needs adjusting, rinse and repeat.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:38                                                   ` Eli Zaretskii
  2020-04-04 16:45                                                     ` Eli Zaretskii
@ 2020-04-04 17:29                                                     ` Dmitry Gutov
  2020-04-04 17:38                                                       ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 19:38, Eli Zaretskii wrote:
> Is this with xdisp.c in a Git repository or outside of a Git
> repository?

Inside, always.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:29                                                     ` Dmitry Gutov
@ 2020-04-04 17:38                                                       ` Eli Zaretskii
  2020-04-04 17:57                                                         ` Dmitry Gutov
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:38 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: acm@muc.de, casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:29:46 +0300
> 
> On 04.04.2020 19:38, Eli Zaretskii wrote:
> > Is this with xdisp.c in a Git repository or outside of a Git
> > repository?
> 
> Inside, always.

In which case invoking Git (and all the machinery that runs a
sub-process) is another factor.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:38                                                       ` Eli Zaretskii
@ 2020-04-04 17:57                                                         ` Dmitry Gutov
  0 siblings, 0 replies; 109+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:38, Eli Zaretskii wrote:
> In which case invoking Git (and all the machinery that runs a
> sub-process) is another factor.

See my older message about using js-mode with xdisp.c in an -Og build. 
It was 0.06s or so.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
  2020-03-31 15:11                 ` Stefan Monnier
@ 2020-03-31 16:13                 ` Alan Third
  2020-03-31 17:55                   ` Eli Zaretskii
  2 siblings, 1 reply; 109+ messages in thread
From: Alan Third @ 2020-03-31 16:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, Stefan Monnier, akrl

On Tue, Mar 31, 2020 at 04:14:16PM +0300, Eli Zaretskii wrote:
> 
> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.  We should instead request that tree-sitter
> exposes an API through which we could give it direct access to buffer
> text as 2 parts, before and after the gap, like we do with regex
> code.  Otherwise this will be a bottleneck in the long run, not unlike
> the problem we have with LSP.

I'm not sure if this is exactly what you're talking about, but it has
an API for letting it access your own data structure:

https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code
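
As a rough illustration of how such a callback could serve text that
lives in two parts around a gap, here is a minimal C sketch.  It is
only modeled on the read callback described on that page: the real
callback also receives a row/column position, which is omitted here,
and `struct gap_buffer' and `gap_buffer_read' are hypothetical names,
not Emacs or tree-sitter definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified gap buffer: the text occupies
   [0, gap_start) and [gap_end, size) inside DATA.  */
struct gap_buffer
{
  const char *data;
  size_t gap_start;
  size_t gap_end;
  size_t size;                  /* bytes in DATA, gap included */
};

/* Return a pointer to the chunk of text starting at logical offset
   BYTE_INDEX and store its length in *BYTES_READ.  No text is
   copied; a length of zero signals the end of the text.  */
const char *
gap_buffer_read (void *payload, uint32_t byte_index, uint32_t *bytes_read)
{
  const struct gap_buffer *buf = payload;
  assert (buf->gap_start <= buf->gap_end && buf->gap_end <= buf->size);

  if (byte_index < buf->gap_start)
    {
      /* Before the gap: hand out everything up to the gap.  */
      *bytes_read = (uint32_t) (buf->gap_start - byte_index);
      return buf->data + byte_index;
    }

  /* After the gap: skip over the gap to find the physical offset.  */
  size_t physical = byte_index + (buf->gap_end - buf->gap_start);
  if (physical < buf->size)
    {
      *bytes_read = (uint32_t) (buf->size - physical);
      return buf->data + physical;
    }

  *bytes_read = 0;
  return "";
}
```

With this shape the parser sees at most two chunks per full scan, and
the buffer text is never marshalled into one contiguous string.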

-- 
Alan Third




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 16:13                 ` Alan Third
@ 2020-03-31 17:55                   ` Eli Zaretskii
  0 siblings, 0 replies; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:55 UTC (permalink / raw)
  To: Alan Third; +Cc: casouri, emacs-devel, monnier, akrl

> Date: Tue, 31 Mar 2020 18:13:15 +0200 (CEST)
> From: Alan Third <alan@idiocy.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com,
> 	akrl@sdf.org, emacs-devel@gnu.org
> 
> I'm not sure if this is exactly what you're talking about, but it has
> an API for letting it access your own data structure:
> 
> https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code

Yes, I've read their docs.  It isn't optimal for us, although it will
do for initial experiments.  But for production I think we need
something more efficient.  One of the problems we need to solve is how
to avoid the costly encoding of buffer text, and still be able to
support the occasional raw bytes we sometimes have in our buffers.
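
For context on the raw bytes mentioned above, here is a minimal C
sketch.  It rests on an assumption that is not stated in this thread:
that the internal multibyte representation stores a raw byte
0x80..0xFF as a two-byte sequence whose first byte is 0xC0 or 0xC1,
neither of which is a valid lead byte in strict UTF-8.  All function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of a raw byte: lead byte 0xC0 or 0xC1 followed by
   one trailing byte in the range 0x80..0xBF.  */
int
is_raw_byte_sequence (const unsigned char *p, size_t remaining)
{
  return remaining >= 2
         && (p[0] == 0xC0 || p[0] == 0xC1)
         && (p[1] & 0xC0) == 0x80;
}

/* Recover the original byte (0x80..0xFF) from such a sequence.  */
unsigned char
decode_raw_byte (const unsigned char *p)
{
  assert (p[0] == 0xC0 || p[0] == 0xC1);
  return (unsigned char) (0x80 | ((p[0] & 0x01) << 6) | (p[1] & 0x3F));
}

/* Count raw-byte sequences in TEXT, i.e. the places where a strict
   UTF-8 consumer would see invalid input.  */
size_t
count_raw_bytes (const unsigned char *text, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (is_raw_byte_sequence (text + i, len - i))
      {
        count++;
        i++;                    /* skip the trailing byte */
      }
  return count;
}
```

If the count is zero, the text could be handed over as plain UTF-8
without re-encoding; otherwise those sequences need special handling.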




* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 19:18   ` Eli Zaretskii
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
@ 2020-03-30  3:35     ` Stefan Monnier
  2020-03-30  6:02       ` Eli Zaretskii
  1 sibling, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30  3:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Andrea Corallo

>> > Maybe those grammars could be compiled to some other representation (I
>> > don't know if it is made mostly of data-tables or actual code or what)?
>> IMO ideally should be lisp and we should leverage the native compiler
>> for that, but I understand we are not there.
> FWIW, it should indeed be possible to develop the grammars in Lisp,
> but that is not the first goal in bringing such a package to Emacs.

I'm not interested in changing the way grammars are *written*.
I'm proposing investigating if the tree-sitter run-time library can be
made to read an OS-and-architecture-neutral representation of
the grammar.


        Stefan





* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
@ 2020-03-30  6:02       ` Eli Zaretskii
  2020-03-30 13:33         ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-30  6:02 UTC (permalink / raw)
  To: emacs-devel, Stefan Monnier; +Cc: Andrea Corallo

On March 30, 2020 6:35:08 AM GMT+03:00, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> > Maybe those grammars could be compiled to some other
> representation (I
> >> > don't know if it is made mostly of data-tables or actual code or
> what)?
> >> IMO ideally should be lisp and we should leverage the native
> compiler
> >> for that, but I understand we are not there.
> > FWIW, it should indeed be possible to develop the grammars in Lisp,
> > but that is not the first goal in bringing such a package to Emacs.
> 
> I'm not interested in changing the way grammars are *written*.
> I'm proposing investigating if the tree-sitter run-time library can be
> made to read an OS-and-architecture-neutral representation of
> the grammar.

What is "OS-and-architecture-neutral representation of the grammar" and how it is different from what tree-sitter uses now?




* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30  6:02       ` Eli Zaretskii
@ 2020-03-30 13:33         ` Stefan Monnier
  2020-03-30 14:09           ` Eli Zaretskii
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 13:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Andrea Corallo, emacs-devel

> What is "OS-and-architecture-neutral representation of the grammar" and how
> it is different from what tree-sitter uses now?

I don't know, that's part of the question (well, I know what I mean by
an "OS-and-architecture-neutral representation", of course,
but I believe you also understand this concept).


        Stefan





* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 13:33         ` Stefan Monnier
@ 2020-03-30 14:09           ` Eli Zaretskii
  2020-03-30 15:03             ` Stefan Monnier
  0 siblings, 1 reply; 109+ messages in thread
From: Eli Zaretskii @ 2020-03-30 14:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  Andrea Corallo <akrl@sdf.org>
> Date: Mon, 30 Mar 2020 09:33:37 -0400
> 
> > What is "OS-and-architecture-neutral representation of the grammar" and how
> > it is different from what tree-sitter uses now?
> 
> I don't know, that's part of the question (well, I know what I mean by
> an "OS-and-architecture-neutral representation", of course,
> but I believe you also understand this concept).

Actually, no, I don't.  It was a serious question, I didn't understand
what grammar representation you had in mind.




* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 14:09           ` Eli Zaretskii
@ 2020-03-30 15:03             ` Stefan Monnier
  2020-04-01  0:39               ` Stephen Leake
  0 siblings, 1 reply; 109+ messages in thread
From: Stefan Monnier @ 2020-03-30 15:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: akrl, emacs-devel

>> > What is "OS-and-architecture-neutral representation of the grammar" and how
>> > it is different from what tree-sitter uses now?
>> 
>> I don't know, that's part of the question (well, I know what I mean by
>> an "OS-and-architecture-neutral representation", of course,
>> but I believe you also understand this concept).
>
> Actually, no, I don't.  It was a serious question, I didn't understand
> what grammar representation you had in mind.

I don't have any in mind.  It just needs to be
OS-and-architecture-neutral (otherwise it requires either distribution
of pre-compiled versions (with the logistical problem of covering all
possible OSes and architectures), or it requires a compiler on the
end-user machine).


        Stefan





* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 15:03             ` Stefan Monnier
@ 2020-04-01  0:39               ` Stephen Leake
  0 siblings, 0 replies; 109+ messages in thread
From: Stephen Leake @ 2020-04-01  0:39 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> > What is "OS-and-architecture-neutral representation of the grammar" and how
>>> > it is different from what tree-sitter uses now?
>>> 
>>> I don't know, that's part of the question (well, I know what I mean by
>>> an "OS-and-architecture-neutral representation", of course,
>>> but I believe you also understand this concept).
>>
>> Actually, no, I don't.  It was a serious question, I didn't understand
>> what grammar representation you had in mind.
>
> I don't have any in mind.  It just needs to be
> OS-and-architecture-neutral (otherwise it requires either distribution
> of pre-compiled versions (with the logistical problem of covering all
> possible OSes and architectures), or it requires a compiler on the
> end-user machine).

At one extreme, the source code for the grammar is
OS-and-architecture-neutral. Tree-sitter compiles the source code to
binary (presumably in a linkable library). There may be some
intermediate representation of the grammar that would be useful in some
way, but I don't see how.

Normally, wisi compiles the grammar source to Ada code, then compiles
that to an executable. wisi also provides a "text_rep" representation of
the LR parse table (almost-readable ASCII text), but that's an
implementation detail: the Ada compiler can't handle very large tables
when represented as compilable Ada source.

semantic compiles a grammar to elisp source, then byte-compiles that.

-- 
-- Stephe


* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
@ 2020-03-30 13:43 Tuấn Anh Nguyễn
  0 siblings, 0 replies; 109+ messages in thread
From: Tuấn Anh Nguyễn @ 2020-03-30 13:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> The existing third party packages should be good starting points to come
> up with a design.  But I think an important issue is to figure out how
> to make tree-sitter usable for the end users: AFAICT the main issue
> being how to let end users download and install new grammars.
> IIUC grammars are written in Javascript (or some subset thereof?) and
> then somehow compiled to C code.  Having them as C code implies either
> the end-user need to have a C compiler or distributing pre-compiled
> binaries with all the trouble this entails (with all the variations of
> OSes, and architectures, and ABIs, ..., plus issues related to
> licensing, security, ...).

In the short term, I think the practical approach would be distributing
pre-compiled binaries for major targets, and providing users with enough
tooling to compile on other targets. Licensing and security are a
different matter.

> Maybe those grammars could be compiled to some other representation (I
> don't know if it is made mostly of data-tables or actual code or what)?

Many of those grammars have parts of the lexer in custom C code.
Therefore, a common representation would require an instruction set. In
principle, that can be done with something like WebAssembly. Tree-sitter
already compiles to WebAssembly (both the runtime and the grammars). Its
playground uses that:
https://tree-sitter.github.io/tree-sitter/playground.

--
Tuấn-Anh Nguyễn
Software Engineer


2020-03-31 17:55                   ` Eli Zaretskii
2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
2020-03-30  6:02       ` Eli Zaretskii
2020-03-30 13:33         ` Stefan Monnier
2020-03-30 14:09           ` Eli Zaretskii
2020-03-30 15:03             ` Stefan Monnier
2020-04-01  0:39               ` Stephen Leake
  -- strict thread matches above, loose matches on Subject: below --
2020-03-30 13:43 Tuấn Anh Nguyễn

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
