Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)


* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
@ 2020-03-29 18:46 Stefan Monnier
  2020-03-29 19:05 ` Andrea Corallo
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-29 18:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> tree-sitter, like LSP, is something Emacs should embrace.
>   https://lists.gnu.org/archive/html/emacs-devel/2020-01/msg00059.html

Ah, thanks Eli: I guess I skipped over that while catching up.

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
>
>   https://tree-sitter.github.io/tree-sitter/

Yes, adding support for this would be great.  

> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-ppss-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.

Some of those features could be provided by LSP as well, but IIUC the
way LSP is designed and usually used makes it somewhat inadequate for
synchronous use, when you want an immediate answer.

tree-sitter is designed exactly for that: it can parse "immediately",
in the same sense as `syntax-ppss`.  So LSP seems inapplicable (in the
near future at least) for things like font-lock, navigation, and
indentation, whereas tree-sitter should work great for those.

[ W.r.t. discussions around LSP's use of JSON: AFAICT, parsing and
  emitting JSON can be done as efficiently as any other format,
  so I don't see the use of JSON as a problem in the protocol.  ]

> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries.

The existing third-party packages should be good starting points for
coming up with a design.  But I think an important issue is to figure
out how to make tree-sitter usable for end users: AFAICT the main
issue is how to let end users download and install new grammars.
IIUC grammars are written in JavaScript (or some subset thereof?) and
then somehow compiled to C code.  Having them as C code implies that
either end users need to have a C compiler, or we distribute
pre-compiled binaries with all the trouble this entails (with all the
variations of OSes, architectures, and ABIs, ..., plus issues related
to licensing, security, ...).

Maybe those grammars could be compiled to some other representation (I
don't know if it is made mostly of data-tables or actual code or what)?

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 18:46 Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
@ 2020-03-29 19:05 ` Andrea Corallo
  2020-03-29 19:18   ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Andrea Corallo @ 2020-03-29 19:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Maybe those grammars could be compiled to some other representation (I
> don't know if it is made mostly of data-tables or actual code or what)?

IMO ideally they should be in Lisp and we should leverage the native
compiler for that, but I understand we are not there.

  Andrea

-- 
akrl@sdf.org




* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 19:05 ` Andrea Corallo
@ 2020-03-29 19:18   ` Eli Zaretskii
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-29 19:18 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: monnier, emacs-devel

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> Date: Sun, 29 Mar 2020 19:05:57 +0000
> 
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> 
> > Maybe those grammars could be compiled to some other representation (I
> > don't know if it is made mostly of data-tables or actual code or what)?
> 
> IMO ideally they should be in Lisp and we should leverage the native
> compiler for that, but I understand we are not there.

FWIW, it should indeed be possible to develop the grammars in Lisp,
but that is not the first goal in bringing such a package to Emacs.
Not even the second one.  Because once such a package can be used with
Emacs, and the results are significantly better than what we have
today, you will see someone come up with a way of doing that in Lisp
in no time.  Making the connection happen, and coming up with a good
design for that, should be the first goal.  IMO, we should identify
the features that can benefit from that (font-lock is just one of
them, maybe not even the most important one), and design the
interfaces and the infrastructure so that it could support them all
(and then some).  But I repeat myself.


* Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:18   ` Eli Zaretskii
@ 2020-03-29 19:29     ` Yuan Fu
  2020-03-30 14:04       ` Eli Zaretskii
  2020-03-30 15:06       ` Stefan Monnier
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
  1 sibling, 2 replies; 139+ messages in thread
From: Yuan Fu @ 2020-03-29 19:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Stefan Monnier, Andrea Corallo

A related question: is there a reliable way to be notified when buffer text changes? AFAICT both tree-sitter and LSP need to know about incremental changes. Both LSP packages (lsp-mode and eglot) add hooks to after-change-functions, but their hooks are not guaranteed to run because of inhibit-modification-hooks. Undo seems to always know the exact change, but it doesn’t seem to have a hook available.

Yuan
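
To make the question concrete, here is a minimal Emacs Lisp sketch of the
gap being described.  It relies only on the standard `after-change-functions'
hook and the `inhibit-modification-hooks' variable; the `my-' names are
hypothetical and are not taken from lsp-mode or eglot.

```elisp
(defvar-local my-change-log nil
  "Hypothetical log of (BEG END OLD-LEN) triples seen in this buffer.")

(defun my-record-change (beg end old-len)
  "Record one change; the arguments are those of `after-change-functions'."
  (push (list beg end old-len) my-change-log))

(defun my-demo-recorded-changes ()
  "Return the changes a buffer-local hook sees in a scratch buffer."
  (with-temp-buffer
    ;; Install the recorder buffer-locally (the final t).
    (add-hook 'after-change-functions #'my-record-change nil t)
    ;; A normal insertion runs the hook.
    (insert "abc")
    ;; With the hooks inhibited, the recorder never hears about this one.
    (let ((inhibit-modification-hooks t))
      (insert "def"))
    my-change-log))
```

Calling `(my-demo-recorded-changes)' should report only the first insertion.
The second one is invisible to the hook, so a parser that tracks edits this
way would end up out of sync with the buffer.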


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
@ 2020-03-30 14:04       ` Eli Zaretskii
  2020-03-30 15:06       ` Stefan Monnier
  1 sibling, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-30 14:04 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, monnier, akrl

> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 29 Mar 2020 15:29:41 -0400
> Cc: Andrea Corallo <akrl@sdf.org>,
>  Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
> 
> A related question: is there a reliable way to be notified when buffer text changes? AFAICT both tree-sitter and LSP need to know about incremental changes.

Why not simply pass to tree-sitter the chunk that jit-lock is about to
fontify?




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
  2020-03-30 14:04       ` Eli Zaretskii
@ 2020-03-30 15:06       ` Stefan Monnier
  2020-03-30 17:14         ` Yuan Fu
  1 sibling, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 15:06 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo

> A related question: is there a reliable way to be notified when buffer text
> changes? AFAICT both tree-sitter and LSP need to know about incremental
> changes. Both LSP packages (lsp-mode and eglot) add hooks to
> after-change-functions, but their hooks are not guaranteed to run because of
> inhibit-modification-hooks.

If they needed to be informed of the change but
`inhibit-modification-hooks` prevented it, it's a bug.
Please report it.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 15:06       ` Stefan Monnier
@ 2020-03-30 17:14         ` Yuan Fu
  2020-03-30 17:54           ` Stefan Monnier
  2020-03-31  2:24           ` Eli Zaretskii
  0 siblings, 2 replies; 139+ messages in thread
From: Yuan Fu @ 2020-03-30 17:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo



> On Mar 30, 2020, at 11:06 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
> If they needed to be informed of the change but
> `inhibit-modification-hooks` prevented it, it's a bug.
> Please report it.
> 

Do you mean it’s a bug in eglot/lsp-mode, or a bug in inhibit-modification-hooks (or in the code that sets it to t)?


> Why not simply pass to tree-sitter the chunk that jit-lock is about to
> fontify?


Incremental parsing seems to be the preferred way to use tree-sitter: maintaining a syntax tree on the fly and later querying it for information.

Yuan



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:14         ` Yuan Fu
@ 2020-03-30 17:54           ` Stefan Monnier
  2020-03-30 18:43             ` Štěpán Němec
  2020-03-31  2:24           ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 17:54 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo

>> If they needed to be informed of the change but
>> `inhibit-modification-hooks` prevented it, it's a bug.
>> Please report it.
> Do you mean it’s a bug in eglot/lsp-mode, or a bug in
> inhibit-modification-hooks (or in the code that sets it to t)?

The fact that they're not informed is the bug.
So it's presumably not the fault of eglot/lsp-mode.
Whose fault it is will depend on the details of the particular situation
where it occurs.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:54           ` Stefan Monnier
@ 2020-03-30 18:43             ` Štěpán Němec
  2020-03-30 18:46               ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Štěpán Němec @ 2020-03-30 18:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel

On Mon, 30 Mar 2020 13:54:53 -0400
Stefan Monnier wrote:

>>> If they needed to be informed of the change but
>>> `inhibit-modification-hooks` prevented it, it's a bug.
>>> Please report it.
>> Do you mean it’s a bug in eglot/lsp-mode, or a bug in
>> inhibit-modification-hooks (or in the code that sets it to t)?
>
> The fact that they're not informed is the bug.
> So it's presumably not the fault of eglot/lsp-mode.
> Whose fault it is will depend on the details of the particular situation
> where it occurs.

FWIW, I have described one such situation (unrelated to lsp) recently
here:

https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403

(In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
the buffer changes caused by populating dired buffers are not noticeable
in `after-change-functions'.)

I was wondering if I should report it as a bug, despite the workaround
not being particularly painful in this case (there's `dired-after-readin-hook').

-- 
Štěpán




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:43             ` Štěpán Němec
@ 2020-03-30 18:46               ` Stefan Monnier
  2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:27                 ` Štěpán Němec
  0 siblings, 2 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 18:46 UTC (permalink / raw)
  To: Štěpán Němec
  Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel

> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
> the buffer changes caused by populating dired buffers are not noticeable
> in `after-change-functions'.)
> I was wondering if I should report it as a bug, despite the workaround
> not being particularly painful in this case (there's `dired-after-readin-hook').

I think it deserves a bug report, yes.


        Stefan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:46               ` Stefan Monnier
@ 2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:42                   ` Stefan Monnier
  2020-03-30 19:27                 ` Štěpán Němec
  1 sibling, 2 replies; 139+ messages in thread
From: Yuan Fu @ 2020-03-30 19:02 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec,
	emacs-devel


> On Mar 30, 2020, at 2:46 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>> the buffer changes caused by populating dired buffers are not noticeable
>> in `after-change-functions'.)
>> I was wondering if I should report it as a bug, despite the workaround
>> not being particularly painful in this case (there's `dired-after-readin-hook').
> 
> I think it deserves a bug report, yes.
> 
> 
>        Stefan
> 

Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).

Yuan



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:02                 ` Yuan Fu
@ 2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:21                     ` Yuan Fu
  2020-04-01  0:57                     ` Stephen Leake
  2020-03-30 19:42                   ` Stefan Monnier
  1 sibling, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-30 19:10 UTC (permalink / raw)
  To: Yuan Fu; +Cc: akrl, stepnem, monnier, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 30 Mar 2020 15:02:58 -0400
> Cc: Štěpán Němec <stepnem@gmail.com>,
>  Eli Zaretskii <eliz@gnu.org>,
>  emacs-devel <emacs-devel@gnu.org>,
>  Andrea Corallo <akrl@sdf.org>
> 
> >> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
> >> the buffer changes caused by populating dired buffers are not noticeable
> >> in `after-change-functions'.)
> >> I was wondering if I should report it as a bug, despite the workaround
> >> not being particularly painful in this case (there's `dired-after-readin-hook').
> > 
> > I think it deserves a bug report, yes.
> > 
> > 
> >        Stefan
> > 
> 
> Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).

I agree with Stefan: it's a bug.  All dired-readin needs to do is call
the modification hooks after it's done reading in the directory.  It's
just an optimization that it inhibits the hooks while it runs: read
the comments there and you will see why it is done.

IMO, inhibit-modification-hooks is for when some code makes a
temporary change, or a change that no one is supposed to care about,
like changing faces.  Any other case is a bug.
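
As a rough illustration of that suggestion (this is not the actual dired
code), a bulk update can keep the hooks quiet while it works and then report
the net change once at the end.  The function name `my-bulk-insert' is
hypothetical, and the sketch covers only `before-change-functions' and
`after-change-functions', not hooks attached to text properties or overlays.

```elisp
(defun my-bulk-insert (strings)
  "Insert each of STRINGS on its own line at point, signalling one change.
The per-insertion hooks are inhibited; listeners are told about the
whole inserted span exactly once."
  (let ((beg (point))
        ;; Silence the automatic hook calls made by each `insert'.
        (inhibit-modification-hooks t))
    ;; Announce that text is about to change at BEG (an empty span,
    ;; since this is a pure insertion).
    (run-hook-with-args 'before-change-functions beg beg)
    (dolist (s strings)
      (insert s "\n"))
    ;; Report the net result: new text from BEG to point replaced
    ;; zero characters of old text.
    (run-hook-with-args 'after-change-functions beg (point) 0)))
```

`run-hook-with-args' runs the hook functions regardless of
`inhibit-modification-hooks', which is what lets the function report its own
summary while the variable is still bound.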




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:10                   ` Eli Zaretskii
@ 2020-03-30 19:21                     ` Yuan Fu
  2020-03-31  3:56                       ` Štěpán Němec
  2020-04-01  0:57                     ` Stephen Leake
  1 sibling, 1 reply; 139+ messages in thread
From: Yuan Fu @ 2020-03-30 19:21 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: akrl, Štěpán Němec, Stefan Monnier,
	emacs-devel



> On Mar 30, 2020, at 3:10 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 30 Mar 2020 15:02:58 -0400
>> Cc: Štěpán Němec <stepnem@gmail.com>,
>> Eli Zaretskii <eliz@gnu.org>,
>> emacs-devel <emacs-devel@gnu.org>,
>> Andrea Corallo <akrl@sdf.org>
>> 
>>>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>>>> the buffer changes caused by populating dired buffers are not noticeable
>>>> in `after-change-functions'.)
>>>> I was wondering if I should report it as a bug, despite the workaround
>>>> not being particularly painful in this case (there's `dired-after-readin-hook').
>>> 
>>> I think it deserves a bug report, yes.
>>> 
>>> 
>>>       Stefan
>>> 
>> 
>> Is it really a bug in dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such a feature (disabling after-change-functions), we should expect people to use it. Maybe there should be a reliable way to be informed of buffer changes (one that cannot be silenced).
> 
> I agree with Stefan: it's a bug.  All dired-readin needs to do is call
> the modification hooks after it's done reading in the directory.  It's
> just an optimization that it inhibits the hooks while it runs: read
> the comments there and you will see why it is done.
> 
> IMO, inhibit-modification-hooks is for when some code makes a
> temporary change, or a change that no one is supposed to care about,
> like changing faces.  Any other case is a bug.

I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.

Yuan





* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:21                     ` Yuan Fu
@ 2020-03-31  3:56                       ` Štěpán Němec
  2020-03-31 13:16                         ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Štěpán Němec @ 2020-03-31  3:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Stefan Monnier, akrl


On Mon, 30 Mar 2020 15:21:10 -0400
Yuan Fu wrote:

>> IMO, inhibit-modification-hooks is for when some code makes a
>> temporary change, or a change that no one is supposed to care about,
>> like changing faces.  Any other case is a bug.
>
> I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.

I think the explanation in (info "(elisp) Change Hooks") is quite good,
but the doc string had better clarify the usage as well.

How about the attached patch?

-- 
Štěpán


[-- Attachment #2: 0001-Clarify-inhibit-modification-hooks-intended-usage-in.patch --]
[-- Type: text/x-patch, Size: 1412 bytes --]

From df7e9e1eb9e9ead46c9c8596d7f844e8b7f4d10b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify inhibit-modification-hooks intended usage in its doc
 string

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 src/insdel.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..a9fb25a27d 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+This variable should only be used for modifications that do not result
+in lasting changes to buffer text contents (for example face changes or
+temporary modifications).  If you only need to delay change hooks during
+a series of changes (typically for performance reasons), you can use
+`combine-change-calls' or `combine-after-change-calls' instead.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  3:56                       ` Štěpán Němec
@ 2020-03-31 13:16                         ` Eli Zaretskii
  2020-03-31 13:36                           ` Štěpán Němec
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 13:16 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Date: Tue, 31 Mar 2020 05:56:55 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>,
>  Stefan Monnier <monnier@iro.umontreal.ca>, akrl@sdf.org
> 
> > I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'.
> 
> I think the explanation in (info "(elisp) Change Hooks") is quite good,
> but the doc string had better clarify the usage as well.
> 
> How about the attached patch?

Thanks, I think this is too wordy for a doc string.  I think it should
be enough to mention the two variables ("See also ...") and maybe add
a link to the ELisp manual section you mention.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:16                         ` Eli Zaretskii
@ 2020-03-31 13:36                           ` Štěpán Němec
  2020-03-31 14:34                             ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Štěpán Němec @ 2020-03-31 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On Tue, 31 Mar 2020 16:16:20 +0300
Eli Zaretskii wrote:

>> I think the explanation in (info "(elisp) Change Hooks") is quite good,
>> but the doc string had better clarify the usage as well.
>> 
>> How about the attached patch?
>
> Thanks, I think this is too wordy for a doc string.  I think it should
> be enough to mention the two variables ("See also ...") and maybe add
> a link to the ELisp manual section you mention.

In that case, could we add the "should" part (or something similar) to
the manual (in addition to the doc string reference you describe)? It is
true that careful reading of the manual and the relevant doc strings as
they are now could suffice to make an informed decision on when
`inhibit-modification-hooks' is (in)appropriate, but I think having some
kind of explicit heads-up or dissuasion regarding the likely misuse
would be better.

-- 
Štěpán


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:36                           ` Štěpán Němec
@ 2020-03-31 14:34                             ` Eli Zaretskii
  2020-03-31 15:37                               ` Štěpán Němec
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 14:34 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com, emacs-devel@gnu.org, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> Date: Tue, 31 Mar 2020 15:36:21 +0200
> 
> > Thanks, I think this is too wordy for a doc string.  I think it should
> > be enough to mention the two variables ("See also ...") and maybe add
> > a link to the ELisp manual section you mention.
> 
> In that case, could we add the "should" part (or something similar) to
> the manual (in addition to the doc string reference you describe)?

Most probably yes, but could you show the change you had in mind for
the manual?

> It is true that careful reading of the manual and the relevant doc
> strings as they are now could suffice to make an informed decision
> on when `inhibit-modification-hooks' is (in)appropriate, but I think
> having some kind of explicit heads-up or dissuasion regarding the
> likely misuse would be better.

I agree, and the manual is the place to have such discussions and
recommendations.

Thanks.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 14:34                             ` Eli Zaretskii
@ 2020-03-31 15:37                               ` Štěpán Němec
  2020-03-31 15:58                                 ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Štěpán Němec @ 2020-03-31 15:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl


On Tue, 31 Mar 2020 17:34:59 +0300
Eli Zaretskii wrote:

>> In that case, could we add the "should" part (or something similar) to
>> the manual (in addition to the doc string reference you describe)?
>
> Most probably yes, but could you show the change you had in mind for
> the manual?

Another attempt attached.

  Štěpán


[-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --]
[-- Type: text/x-patch, Size: 2038 bytes --]

From ccf0390392b08bcc1aa9aff24bb62dd3bb4bbfbd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended
 usage

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 doc/lispref/text.texi | 7 +++++++
 src/insdel.c          | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 3bb055a68d..daba03fadf 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -5776,4 +5776,11 @@ Change Hooks
 may cause recursive calls to the modification hooks, so be sure to
 prepare for that (for example, by binding some variable which tells
 your hook to do nothing).
+
+@strong{Warning:} You should only bind this variable for modifications
+that do not result in lasting changes to buffer text contents (for
+example face changes or temporary modifications).  If you need to
+delay change hooks during a series of changes (typically for
+performance reasons), use @code{combine-change-calls} or
+@code{combine-after-change-calls} instead.
 @end defvar
diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..236346fada 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+To delay change hooks during a series of changes, use
+`combine-change-calls' or `combine-after-change-calls' instead of
+modifying this variable.
+
+See also the info node `(elisp) Change Hooks'.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:37                               ` Štěpán Němec
@ 2020-03-31 15:58                                 ` Eli Zaretskii
  2020-03-31 16:18                                   ` Štěpán Němec
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:58 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, emacs-devel, monnier, akrl

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com,  akrl@sdf.org,  monnier@iro.umontreal.ca,
>   emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 17:37:22 +0200
> 
> Another attempt attached.

Thanks.  I have a couple of minor nits:

> +@strong{Warning:} You should only bind this variable for modifications

I'd prefer to remove the warning, and say "We recommend that..."
rather than "You should only...".

> +To delay change hooks during a series of changes, use
> +`combine-change-calls' or `combine-after-change-calls' instead of
> +modifying this variable.
   ^^^^^^^^^
"binding"

Other than that, LGTM.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:58                                 ` Eli Zaretskii
@ 2020-03-31 16:18                                   ` Štěpán Němec
  2020-03-31 17:38                                     ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Štěpán Němec @ 2020-03-31 16:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel


On Tue, 31 Mar 2020 18:58:58 +0300
Eli Zaretskii wrote:

>> +@strong{Warning:} You should only bind this variable for modifications
>
> I'd prefer to remove the warning, and say "We recommend that..."
> rather than "You should only...".
>
>> +To delay change hooks during a series of changes, use
>> +`combine-change-calls' or `combine-after-change-calls' instead of
>> +modifying this variable.
>   ^^^^^^^^^
> "binding"

Updated version attached, thank you.

  Štěpán


[-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --]
[-- Type: text/x-patch, Size: 2029 bytes --]

From 8e2a5a8c8381c85d138f34d37931c52c289da2ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com>
Date: Tue, 31 Mar 2020 05:38:50 +0200
Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended
 usage

Cf. bug#40332 and the discussion at
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html
---
 doc/lispref/text.texi | 7 +++++++
 src/insdel.c          | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 3bb055a68d..0d32c571b7 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -5776,4 +5776,11 @@ Change Hooks
 may cause recursive calls to the modification hooks, so be sure to
 prepare for that (for example, by binding some variable which tells
 your hook to do nothing).
+
+We recommend that you only bind this variable for modifications that
+do not result in lasting changes to buffer text contents (for example
+face changes or temporary modifications).  If you need to delay change
+hooks during a series of changes (typically for performance reasons),
+use @code{combine-change-calls} or @code{combine-after-change-calls}
+instead.
 @end defvar
diff --git a/src/insdel.c b/src/insdel.c
index 21acf0e61d..dfa1cc311c 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -2397,7 +2397,13 @@ syms_of_insdel (void)
 as well as hooks attached to text properties and overlays.
 Setting this variable non-nil also inhibits file locks and checks
 whether files are locked by another Emacs session, as well as
-handling of the active region per `select-active-regions'.  */);
+handling of the active region per `select-active-regions'.
+
+To delay change hooks during a series of changes, use
+`combine-change-calls' or `combine-after-change-calls' instead of
+binding this variable.
+
+See also the info node `(elisp) Change Hooks'.  */);
   inhibit_modification_hooks = 0;
   DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks");
 
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 16:18                                   ` Štěpán Němec
@ 2020-03-31 17:38                                     ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:38 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel

> From: Štěpán Němec <stepnem@gmail.com>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  monnier@iro.umontreal.ca,
>   akrl@sdf.org
> Date: Tue, 31 Mar 2020 18:18:57 +0200
> 
> Updated version attached, thank you.

Perfect, thanks.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:10                   ` Eli Zaretskii
  2020-03-30 19:21                     ` Yuan Fu
@ 2020-04-01  0:57                     ` Stephen Leake
  1 sibling, 0 replies; 139+ messages in thread
From: Stephen Leake @ 2020-04-01  0:57 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Is it really a bug of dired-mode? Dired-mode probably has a good
>> reason to bind `inhibit-modification-hooks` to t. And if we provide
>> such a feature (disabling after-change-functions), we should expect
>> people to use it. Maybe there should be a reliable way to be informed
>> of buffer changes (that cannot be silenced).
>
> I agree with Stefan: it's a bug.  All dired-readin needs to do is call
> the modification hooks after it's done reading in the directory.  It's
> just an optimization that it inhibits the hooks while it runs: read
> the comments there and you will see why it is done.
>
> IMO, inhibit-modification-hooks is for when some code makes a
> temporary change, or a change that no one is supposed to care about,
> like changing faces.  Any other case is a bug.

ada-mode occasionally binds wisi-inhibit-parse for a similar reason; it
is writing Ada source, so it is about to make several changes, during
which the buffer will be syntactically incorrect, but it will be correct
when done. The wisi after-change-functions still record changed
regions, but the parser is not called until all the changes are done.

Perhaps tree-sitter and eglot could use a similar approach.
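
The deferral pattern described above can be sketched as follows.  This
is a minimal illustration in C, not wisi's actual code: the names
`change_tracker`, `note_change`, `maybe_parse` and `end_batch` are
hypothetical, and the "parser" is only a counter.  The point is that the
change handler keeps recording while parsing is inhibited, so nothing is
lost when the batch ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical change tracker: accumulates one dirty range and defers
   reparsing while INHIBIT_PARSE is set.  */
struct change_tracker
{
  bool dirty;                   /* is there an unparsed change? */
  size_t dirty_beg, dirty_end;  /* union of the changed regions */
  bool inhibit_parse;           /* set during a batch of edits */
  int parse_count;              /* how often the stand-in parser ran */
};

/* Run the stand-in parser unless parsing is inhibited or nothing
   changed.  A real parser would reparse [dirty_beg, dirty_end).  */
static void
maybe_parse (struct change_tracker *t)
{
  if (t->inhibit_parse || !t->dirty)
    return;
  t->parse_count++;
  t->dirty = false;
}

/* Change handler: always record the region, parse only if allowed.
   Simplification: positions are not adjusted for text inserted or
   deleted by earlier changes in the same batch.  */
static void
note_change (struct change_tracker *t, size_t beg, size_t end)
{
  assert (beg <= end);
  if (!t->dirty)
    {
      t->dirty = true;
      t->dirty_beg = beg;
      t->dirty_end = end;
    }
  else
    {
      if (beg < t->dirty_beg)
        t->dirty_beg = beg;
      if (end > t->dirty_end)
        t->dirty_end = end;
    }
  maybe_parse (t);
}

/* End of a batch: lift the inhibition and parse once.  */
static void
end_batch (struct change_tracker *t)
{
  t->inhibit_parse = false;
  maybe_parse (t);
}
```

Unlike binding `inhibit-modification-hooks', this keeps the consumer
informed of every change; only the expensive work is postponed.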

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 19:02                 ` Yuan Fu
  2020-03-30 19:10                   ` Eli Zaretskii
@ 2020-03-30 19:42                   ` Stefan Monnier
  1 sibling, 0 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 19:42 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec,
	emacs-devel

>>> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403
>>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so
>>> the buffer changes caused by populating dired buffers are not noticeable
>>> in `after-change-functions'.)
>>> I was wondering if I should report it as a bug, despite the workaround
>>> not being particularly painful in this case (there's `dired-after-readin-hook').
>> I think it deserves a bug report, yes.
> Is it really a bug of dired-mode?

Just file the bug report and send me the bug number so I can include it
in the commit of the fix I have here ready to be installed.


        Stefan "if you have to wonder if it's a bug, then file it as a bug"




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 18:46               ` Stefan Monnier
  2020-03-30 19:02                 ` Yuan Fu
@ 2020-03-30 19:27                 ` Štěpán Němec
  1 sibling, 0 replies; 139+ messages in thread
From: Štěpán Němec @ 2020-03-30 19:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Yuan Fu, emacs-devel, Eli Zaretskii, Andrea Corallo

On Mon, 30 Mar 2020 14:46:48 -0400
Stefan Monnier wrote:

> I think it deserves a bug report, yes.

Done (bug#40332), thanks.

-- 
Štěpán



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-30 17:14         ` Yuan Fu
  2020-03-30 17:54           ` Stefan Monnier
@ 2020-03-31  2:24           ` Eli Zaretskii
  2020-03-31  3:10             ` Stefan Monnier
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31  2:24 UTC (permalink / raw)
  To: Yuan Fu; +Cc: akrl, monnier, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 30 Mar 2020 13:14:02 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org,
>  Andrea Corallo <akrl@sdf.org>
> 
>  Why not simply pass to tree-sitter the chunk that jit-lock is about to
>  fontify?
> 
> Incremental parsing seems to be the preferred way to use tree-sitter: maintain a syntax tree on the fly
> and query it for information later.

I don't see how this contradicts my proposal of passing just the chunk
that we need to fontify.  The function that actually passes the
portion of the buffer to tree-sitter can always extend the chunk in
both directions to make parsing easier, for example to make sure it
covers a complete code block.

IOW, our goal is not to build the syntax tree, it's to give
tree-sitter enough information to allow us to fontify the part that's
about to be displayed.  We need to have tree-sitter play by Emacs
rules, not teach Emacs to play by tree-sitter rules.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  2:24           ` Eli Zaretskii
@ 2020-03-31  3:10             ` Stefan Monnier
  2020-03-31 13:14               ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-31  3:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Yuan Fu, akrl, emacs-devel

> IOW, our goal is not to build the syntax tree, it's to give
> tree-sitter enough information to allow us to fontify the part that's
> about to be displayed.  We need to have tree-sitter play by Emacs
> rules, not teach Emacs to play by tree-sitter rules.

IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
keeps the parse tree up-to-date in response to buffer changes.

Its algorithm is tuned so that the time needed to update the tree is
more or less proportional to the size of the change.

So jit-lock/font-lock doesn't need to pass any part of the buffer to
tree-sitter: tree-sitter already has the buffer's content and we can
assume it's already parsed.  What emacs-tree-sitter's proposed
tree-sitter-highlight does is provide a function which takes
a START..END, then finds which part of the existing parse tree covers
that region and "reads the tree" to fontify the corresponding text.
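
The "find the part of the tree that covers START..END" step can be
sketched as a pruned tree walk.  This is a minimal illustration in C
under stated assumptions: `struct node`, `visit_range` and `count_leaf`
are hypothetical names, not emacs-tree-sitter's or tree-sitter's API,
and the tree is assumed to exist already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parse-tree node covering the half-open byte range
   [beg, end).  Leaves have no children.  */
struct node
{
  size_t beg, end;
  const struct node *children;
  size_t n_children;
};

typedef void (*leaf_fn) (const struct node *leaf, void *ctx);

/* Call FN on every leaf overlapping [start, end).  Subtrees that lie
   entirely outside the range are skipped, so the cost depends on the
   size of the region rather than on the size of the buffer.  */
static void
visit_range (const struct node *n, size_t start, size_t end,
             leaf_fn fn, void *ctx)
{
  assert (start <= end);
  if (n->end <= start || n->beg >= end)
    return;                     /* disjoint from the region: prune */
  if (n->n_children == 0)
    {
      fn (n, ctx);              /* a fontifier would pick a face here */
      return;
    }
  for (size_t i = 0; i < n->n_children; i++)
    visit_range (&n->children[i], start, end, fn, ctx);
}

/* Example callback: count the leaves that were visited.  */
static void
count_leaf (const struct node *leaf, void *ctx)
{
  (void) leaf;
  ++*(int *) ctx;
}
```

A jit-lock style caller would pass the chunk boundaries as START and END
and apply faces in the callback instead of counting.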

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31  3:10             ` Stefan Monnier
@ 2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
                                   ` (2 more replies)
  0 siblings, 3 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 13:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Yuan Fu <casouri@gmail.com>,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Mon, 30 Mar 2020 23:10:57 -0400
> 
> > IOW, our goal is not to build the syntax tree, it's to give
> > tree-sitter enough information to allow us to fontify the part that's
> > about to be displayed.  We need to have tree-sitter play by Emacs
> > rules, not teach Emacs to play by tree-sitter rules.
> 
> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
> keeps the parse tree up-to-date in response to buffer changes.

Why does it need the entire buffer up front?  That sounds like a
potential performance killer.  Fontifying a small part of a buffer
doesn't need its entire text.

In any case, I hope that passing the buffer to tree-sitter doesn't
involve marshalling the entire buffer text via a function call as a
huge string, or some such.  We should instead request that tree-sitter
exposes an API through which we could give it direct access to buffer
text as 2 parts, before and after the gap, like we do with regex
code.  Otherwise this will be a bottleneck in the long run, not unlike
the problem we have with LSP.

> Its algorithm is tuned so that the time needed to update the tree is
> more or less proportional to the size of the change.
> 
> So jit-lock/font-lock doesn't need to pass any part of the buffer to
> tree-sitter: tree-sitter already has the buffer's content and we can
> assume it's already parsed.  What emacs-tree-sitter's proposed
> tree-sitter-highlight does is provide a function which takes
> a START..END, then finds which part of the existing parse tree covers
> that region and "reads the tree" to fontify the corresponding text.

I still don't see why it would need the entire buffer for this class
of applications.  Did anyone try the alternatives, in particular on
very large buffers?

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
@ 2020-03-31 14:31                 ` Dmitry Gutov
  2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:11                 ` Stefan Monnier
  2020-03-31 16:13                 ` Alan Third
  2 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-03-31 14:31 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl

On 31.03.2020 16:14, Eli Zaretskii wrote:
> Why does it need the entire buffer up front?  That sounds like a
> potential performance killer.  Fontifying a small part of a buffer
> doesn't need its entire text.

Because the end product of parsing the buffer is an AST, and the author 
decided to minimize the odds of problems that come with 
incomplete/broken ASTs.

The previous (first) discussion of TreeSitter has a URL to a
presentation video. You can give it a watch.

Regarding performance, their solution is to make first parsing as fast 
as possible, and updates to an existing AST faster still.

As for the difficulty of sending the whole buffer contents... maybe VS 
Code and Atom somehow make it easier? If so, someone should investigate 
why it has to be slower in Emacs.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 14:31                 ` Dmitry Gutov
@ 2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:45                     ` Dmitry Gutov
  2020-03-31 17:16                     ` Stefan Monnier
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 17:31:43 +0300
> 
> On 31.03.2020 16:14, Eli Zaretskii wrote:
> > Why does it need the entire buffer up front?  That sounds like a
> > potential performance killer.  Fontifying a small part of a buffer
> > doesn't need its entire text.
> 
> Because the end product of parsing the buffer is an AST, and the author 
> decided to minimize the odds of problems that come with 
> incomplete/broken ASTs.

But it definitely can work with parts of the buffer, and we don't need
it to have a complete AST for this particular job.

> The previous (first) discussion of TreeSitter has a URL to a
> presentation video. You can give it a watch.

Thanks, I've watched it back in January, when I wrote my message
calling for its integration.

> Regarding performance, their solution is to make first parsing as fast 
> as possible, and updates to an existing AST faster still.

I'm talking about _our_ performance, not theirs.

> As for the difficulty of sending the whole buffer contents... maybe VS 
> Code and Atom somehow make it easier? If so, someone should investigate 
> why it has to be slower in Emacs.

It should be obvious that sending a buffer as a single string is less
efficient than letting tree-sitter access buffer text directly.  We
just need an appropriate API for that (maybe there is one already, I
didn't take a look at their sources since January).



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:36                   ` Eli Zaretskii
@ 2020-03-31 15:45                     ` Dmitry Gutov
  2020-03-31 17:16                     ` Stefan Monnier
  1 sibling, 0 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-03-31 15:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 31.03.2020 18:36, Eli Zaretskii wrote:
> But it definitely can work with parts of the buffer, and we don't need
> it to have a complete AST for this particular job.

Syntax highlighting can and often does depend on buffer contents after 
the region.

It's one thing to mis-highlight a part of the buffer because the 
contents are incomplete (the user hasn't typed the full expression).

It's another thing to mis-highlight it because the chunk requested by 
jit-lock ended on a particular ambiguous position.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:36                   ` Eli Zaretskii
  2020-03-31 15:45                     ` Dmitry Gutov
@ 2020-03-31 17:16                     ` Stefan Monnier
  2020-03-31 17:48                       ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-31 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, Dmitry Gutov

> It should be obvious that sending a buffer as a single string is less
> efficient than letting tree-sitter access buffer text directly.  We
> just need an appropriate API for that (maybe there is one already, I
> didn't take a look at their sources since January).

My benchmark says that `buffer-string` takes about 1/3 the time of
`parse-partial-sexp`, so letting tree-sitter access our buffer text
directly is unlikely to give more than a 30% speed up.

It doesn't mean it wouldn't be a desirable optimization, but it does
mean that it likely won't make a large difference as to whether it's
"fast enough".


        Stefan




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:16                     ` Stefan Monnier
@ 2020-03-31 17:48                       ` Eli Zaretskii
  2020-03-31 19:35                         ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Dmitry Gutov <dgutov@yandex.ru>,  casouri@gmail.com,  akrl@sdf.org,
>   emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 13:16:33 -0400
> 
> > It should be obvious that sending a buffer as a single string is less
> > efficient than letting tree-sitter access buffer text directly.  We
> > just need an appropriate API for that (maybe there is one already, I
> > didn't take a look at their sources since January).
> 
> My benchmark says that `buffer-string` takes about 1/3 the time of
> `parse-partial-sexp`, so letting tree-sitter access our buffer text
> directly is unlikely to give more than a 30% speed up.

Sure, but we never call parse-partial-sexp on the entire buffer, do
we?

> It doesn't mean it wouldn't be a desirable optimization, but it does
> mean that it likely won't make a large difference as to whether it's
> "fast enough".

I disagree.  Communicating with a C library by making a string out of
buffer text is extremely inelegant and inefficient.  We shouldn't do
that except when the strings are very short.





^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:48                       ` Eli Zaretskii
@ 2020-03-31 19:35                         ` Stefan Monnier
  2020-04-01  2:23                           ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-31 19:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, dgutov

>> > It should be obvious that sending a buffer as a single string is less
>> > efficient than letting tree-sitter access buffer text directly.  We
>> > just need an appropriate API for that (maybe there is one already, I
>> > didn't take a look at their sources since January).
>> My benchmark says that `buffer-string` takes about 1/3 the time of
>> `parse-partial-sexp`, so letting tree-sitter access our buffer text
>> directly is unlikely to give more than a 30% speed up.
> Sure, but we never call parse-partial-sexp on the entire buffer, do we?

Not sure how that's relevant.  I only used `parse-partial-sexp` as
a lower bound on the time tree-sitter is likely to take to do its
own parsing.

>> It doesn't mean it wouldn't be a desirable optimization, but it does
>> mean that it likely won't make a large difference as to whether it's
>> "fast enough".
> I disagree.

Your disagreement doesn't seem to be with what I said: I didn't argue
about the elegance or efficiency, only about the fact that the
performance impact is likely to be small enough that it's not going to
affect the viability of the approach.

> Communicating with a C library by making a string out of buffer text
> is extremely inelegant and inefficient.  We shouldn't do that except
> when the strings are very short.

FWIW, elegant/efficient or not, that's the standard way to do
it, AFAICT.  E.g. that's what we do in `secure-hash`, that's what we do
when parsing JSON, ...

You basically always need to en/decode the content (even if it is into
utf-8, we still need to handle the potential raw-bytes), so a copy is
hard to avoid.

Note that for regexp-matching the problem is slightly different because
we don't know beforehand which part of the buffer will be consulted, so
doing a "copy and then regmatch" would be too inefficient (we'd always
need to copy everything til point-max).

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:35                         ` Stefan Monnier
@ 2020-04-01  2:23                           ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01  2:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: dgutov@yandex.ru,  casouri@gmail.com,  akrl@sdf.org,  emacs-devel@gnu.org
> Date: Tue, 31 Mar 2020 15:35:41 -0400
> 
> You basically always need to en/decode the content (even if it is into
> utf-8, we still need to handle the potential raw-bytes), so a copy is
> hard to avoid.

It isn't hard in this case, AFAICT.  Tree-sitter has an API where we
can provide a function that will deliver text at a given offset.  We
should use that to access buffer text directly.  We can avoid encoding
the buffer text by converting raw bytes into something like U+FFFD, or
something else that tree-sitter will ignore.
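
A callback of that kind over a gap buffer might look like the sketch
below.  It is an illustration under stated assumptions, not Emacs's
buffer code and not tree-sitter's exact interface: `struct gap_buffer`
and `gap_buffer_read` are hypothetical, the callback signature is a
simplified stand-in for the parser's own, and encoding (including the
raw-bytes question) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified gap buffer: the text is the storage range
   [0, gap_start) followed by [gap_end, size); the gap lies between.  */
struct gap_buffer
{
  const char *data;     /* underlying storage, gap included */
  uint32_t gap_start;   /* storage offset where the gap begins */
  uint32_t gap_end;     /* storage offset just past the gap */
  uint32_t size;        /* total storage size, gap included */
};

/* Number of text bytes: storage size minus the gap.  */
static uint32_t
gap_buffer_text_len (const struct gap_buffer *b)
{
  return b->size - (b->gap_end - b->gap_start);
}

/* Read callback: given a byte offset into the text, return a pointer
   to the contiguous run starting there and store its length in
   *BYTES_READ.  Nothing is copied; a zero length means end of text.
   A parser would call this repeatedly with increasing offsets.  */
static const char *
gap_buffer_read (void *payload, uint32_t byte_index, uint32_t *bytes_read)
{
  const struct gap_buffer *b = payload;
  assert (b->gap_start <= b->gap_end && b->gap_end <= b->size);

  if (byte_index < b->gap_start)
    {
      /* Before the gap: hand out everything up to the gap.  */
      *bytes_read = b->gap_start - byte_index;
      return b->data + byte_index;
    }

  if (byte_index >= gap_buffer_text_len (b))
    {
      *bytes_read = 0;          /* past the end of the text */
      return "";
    }

  /* After the gap: translate the text offset to a storage offset.  */
  uint32_t storage = byte_index + (b->gap_end - b->gap_start);
  *bytes_read = b->size - storage;
  return b->data + storage;
}
```

With the gap in the middle, the parser sees the text as at most two
chunks, which matches the "2 parts, before and after the gap" access
already used for the regex code.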



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
@ 2020-03-31 15:11                 ` Stefan Monnier
  2020-03-31 15:44                   ` Eli Zaretskii
  2020-03-31 16:13                 ` Alan Third
  2 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-31 15:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel

>> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
>> keeps the parse tree up-to-date in response to buffer changes.
> Why does it need the entire buffer up front?

Because as a general rule you cannot parse a region without looking at
all the preceding text.  That's why when we fontify START..END we need
to begin by computing the `syntax-ppss` at START, which involves passing
the whole text from `point-min` to START through `parse-partial-sexp`.
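
A toy scanner makes the dependence on preceding text concrete.  The
sketch below is only loosely analogous to `syntax-ppss`: it is written
in C for illustration, tracks just three lexical states of a C-like
language, and the names `lex_state`, `scan` and `state_at` are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal lexical state for a C-like language (illustration only).  */
enum lex_state { IN_CODE, IN_STRING, IN_COMMENT };

/* Scan TEXT[from, to) starting in STATE; return the state at TO.  */
static enum lex_state
scan (const char *text, size_t from, size_t to, enum lex_state state)
{
  assert (from <= to);
  for (size_t i = from; i < to; i++)
    {
      char c = text[i];
      switch (state)
        {
        case IN_CODE:
          if (c == '"')
            state = IN_STRING;
          else if (c == '/' && i + 1 < to && text[i + 1] == '*')
            {
              state = IN_COMMENT;
              i++;              /* consume the '*' as well */
            }
          break;
        case IN_STRING:
          if (c == '\\' && i + 1 < to)
            i++;                /* skip the escaped character */
          else if (c == '"')
            state = IN_CODE;
          break;
        case IN_COMMENT:
          if (c == '*' && i + 1 < to && text[i + 1] == '/')
            {
              state = IN_CODE;
              i++;              /* consume the '/' as well */
            }
          break;
        }
    }
  return state;
}

/* The reliable answer for POS: scan from the start of the text.  */
static enum lex_state
state_at (const char *text, size_t pos)
{
  return scan (text, 0, pos, IN_CODE);
}
```

Starting the scan in the middle with a guessed state can give the wrong
answer, for instance when a double quote appears inside a comment;
resuming from a cached state at an earlier position gives the same
result as a full scan, which is the reason for keeping such a cache.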

> that sounds like a potential performance killer.

Indeed.  And so does this `syntax-ppss` call we have.
It's OK as long as the parsing is fast enough and you don't use it in
too large buffers.

E.g. I expect that most programming major modes currently exhibit
significant delays when you jump to the end of a multi-GB buffer because
of that `syntax-ppss` call.

> Fontifying a small part of a buffer doesn't need its entire text.

Sadly, it does.  In specific cases you may be able to speed things up,
but that's only applicable to some cases.

I'm sure there could be other approaches that focus on trying to parse as
little of the buffer text as possible (e.g. SMIE follows this kind of
idea), but it's difficult to make them work with a "normal" grammar,
providing a full parse tree and giving a reliable result (and without
it degenerating to parsing the whole buffer anyway in most cases).

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.

These are internal implementation details that can be tweaked later on.
I do expect that the code currently needs to call `buffer-string` or its
moral equivalent.  But if the resources this requires are significant
enough to worry about, then that's great news: it means the parsing
itself is very fast.

> We should instead request that tree-sitter exposes an API through
> which we could give it direct access to buffer text as 2 parts, before
> and after the gap, like we do with regex code.  Otherwise this will be
> a bottleneck in the long run, not unlike the problem we have with LSP.

I'm not sure exactly which problem with LSP you're thinking about, but
I doubt `buffer-string` is a significant component of a performance
problem with LSP: the time to pass that string to the server via a pipe
should dwarf it.

> I still don't see why it would need the entire buffer for this class
> of applications.  Did anyone try the alternatives, in particular on
> very large buffers?

What alternatives?
How large is "very large" here?

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:11                 ` Stefan Monnier
@ 2020-03-31 15:44                   ` Eli Zaretskii
  2020-03-31 17:10                     ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 15:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Tue, 31 Mar 2020 11:11:22 -0400
> 
> > I still don't see why it would need the entire buffer for this class
> > of applications.  Did anyone try the alternatives, in particular on
> > very large buffers?
> 
> What alternatives?

Let tree-sitter see just a portion of the buffer, like the outer block
of what will be displayed in the window.  You are saying that this is
impossible, but do tree-sitter developers also say that?

> How large is "very large" here?

xdisp.c comes to mind, obviously.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 15:44                   ` Eli Zaretskii
@ 2020-03-31 17:10                     ` Stefan Monnier
  2020-03-31 17:19                       ` Jorge Javier Araya Navarro
  2020-03-31 17:46                       ` Eli Zaretskii
  0 siblings, 2 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-03-31 17:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel

>> > I still don't see why it would need the entire buffer for this class
>> > of applications.  Did anyone try the alternatives, in particular on
>> > very large buffers?
>> What alternatives?
> Let tree-sitter see just a portion of the buffer, like the outer block
> of what will be displayed in the window.  You are saying that this is
> impossible,

I think it would be definitely possible if you present "from point-min
to POS".  But "from START to END" is much more difficult, yes.

> but do tree-sitter developers also say that?

You'd have to ask them.  But what I say is based on the knowledge
I gleaned by reading the academic literature that the tree-sitter
authors cite (I did that while working on an article on SMIE ;-)

In any case, your question is really about the design of tree-sitter
rather than the design of the interface between tree-sitter and Emacs.

AFAICT tree-sitter is pretty close to the state of the art in this area,
so I think it's worth trying it out to see how it performs before
considering changing its design.

>> How large is "very large" here?
> xdisp.c comes to mind, obviously.

I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:10                     ` Stefan Monnier
@ 2020-03-31 17:19                       ` Jorge Javier Araya Navarro
  2020-03-31 17:46                       ` Eli Zaretskii
  1 sibling, 0 replies; 139+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-31 17:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, casouri, akrl

>>> How large is "very large" here?
>> xdisp.c comes to mind, obviously.
>
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

It's funny because this can be tested with a small C program; sadly I don't
have the time to write it right now.

On Tue, Mar 31, 2020 at 11:10, Stefan Monnier (
monnier@iro.umontreal.ca) wrote:

> >> > I still don't see why it would need the entire buffer for this class
> >> > of applications.  Did anyone try the alternatives, in particular on
> >> > very large buffers?
> >> What alternatives?
> > Let tree-sitter see just a portion of the buffer, like the outer block
> > of what will be displayed in the window.  You are saying that this is
> > impossible,
>
> I think it would be definitely possible if you present "from point-min
> to POS".  But "from START to END" is much more difficult, yes.
>
> > but do tree-sitter developers also say that?
>
> You'd have to ask them.  But what I say is based on the knowledge
> I gleaned by reading the academic literature that the tree-sitter
> authors cite (I did that while working on an article on SMIE ;-)
>
> In any case, your question is really about the design of tree-sitter
> rather than the design of the interface between tree-sitter and Emacs.
>
> AFAICT tree-sitter is pretty close to the state of the art in this area,
> so I think it's worth trying it out to see how it performs before
> considering changing its design.
>
> >> How large is "very large" here?
> > xdisp.c comes to mind, obviously.
>
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.
>
>
>         Stefan
>
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:10                     ` Stefan Monnier
  2020-03-31 17:19                       ` Jorge Javier Araya Navarro
@ 2020-03-31 17:46                       ` Eli Zaretskii
  2020-03-31 18:42                         ` 조성빈
  2020-03-31 18:47                         ` Dmitry Gutov
  1 sibling, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:46 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
> Date: Tue, 31 Mar 2020 13:10:27 -0400
> 
> >> How large is "very large" here?
> > xdisp.c comes to mind, obviously.
> 
> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.

One second of delay before the first window-full is displayed?  This
is like infinity.

And you didn't account for the time to take buffer-string of the
entire buffer (which involves allocating a large chunk of memory),
then encode it in UTF-8 (which needs to allocate another chunk of
memory), and pass that to tree-sitter.  If that's what the current
interface does.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:46                       ` Eli Zaretskii
@ 2020-03-31 18:42                         ` 조성빈
  2020-03-31 19:29                           ` Eli Zaretskii
  2020-03-31 18:47                         ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: 조성빈 @ 2020-03-31 18:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, casouri, akrl, Emacs-devel


> On Apr 1, 2020, at 2:53 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>> 
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: casouri@gmail.com,  emacs-devel@gnu.org,  akrl@sdf.org
>> Date: Tue, 31 Mar 2020 13:10:27 -0400
>> 
>>>> How large is "very large" here?
>>> xdisp.c comes to mind, obviously.
>> 
>> I'd expect tree-sitter to be able to parse xdisp.c in one second or less.
> 
> One second of delay before the first window-full is displayed?  This
> is like infinity.

Maybe I misunderstood, or maybe it’s just because I don’t know enough about the internals, but doesn’t Emacs just display the raw text until highlighting is finished? The experience wouldn’t be one of not seeing the text for a second; it would be more like seeing the text first, with the highlighting applied later.

> And you didn't account for the time to take buffer-string of the
> entire buffer (which involves allocating a large chunk of memory),
> then encode it in UTF-8 (which needs to allocate another chunk of
> memory), and pass that to tree-sitter.  If that's what the current
> interface does.
> 




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:42                         ` 조성빈
@ 2020-03-31 19:29                           ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 19:29 UTC (permalink / raw)
  To: 조성빈; +Cc: casouri, Emacs-devel, monnier, akrl

> From: 조성빈 <pcr910303@icloud.com>
> Date: Wed, 1 Apr 2020 03:42:31 +0900
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com,
>  akrl@sdf.org, Emacs-devel@gnu.org
> 
> > One second of delay before the first window-full is displayed?  This
> > is like infinity.
> 
> Maybe I misunderstood, or maybe it's just because I don't know enough about the internals, but doesn't Emacs just display the raw text until highlighting is finished?

I guess you are talking about jit-lock-defer-time and friends.  That's
off by default.  The default behavior is to fontify completely the
chunk that is about to be displayed (actually, we fontify slightly
more than that).
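
To see the deferred behavior for comparison, something like the
following should do.  The value 0.05 is only an example, not a
recommendation.

```elisp
;; Off by default (nil): each chunk is fontified before it is displayed.
;; A number defers fontification until Emacs has been idle for that many
;; seconds, so the raw text can show up first.  0.05 is an arbitrary
;; example value.
(setq jit-lock-defer-time 0.05)
```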




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:46                       ` Eli Zaretskii
  2020-03-31 18:42                         ` 조성빈
@ 2020-03-31 18:47                         ` Dmitry Gutov
  2020-03-31 18:48                           ` Noam Postavsky
  2020-03-31 19:26                           ` Eli Zaretskii
  1 sibling, 2 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-03-31 18:47 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl

On 31.03.2020 20:46, Eli Zaretskii wrote:
> One second of delay before the first window-full is displayed?  This
> is like infinity.

This is what we have now:

(benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))

=> Elapsed time: 1.940401s (0.376140s in 6 GCs)




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:47                         ` Dmitry Gutov
@ 2020-03-31 18:48                           ` Noam Postavsky
  2020-03-31 19:02                             ` Dmitry Gutov
  2020-03-31 19:26                           ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Noam Postavsky @ 2020-03-31 18:48 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers

On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> On 31.03.2020 20:46, Eli Zaretskii wrote:
> > One second of delay before the first window-full is displayed?  This
> > is like infinity.
>
> This is what we have now:

Except that s/first window-full/last window-full/




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:48                           ` Noam Postavsky
@ 2020-03-31 19:02                             ` Dmitry Gutov
  0 siblings, 0 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-03-31 19:02 UTC (permalink / raw)
  To: Noam Postavsky
  Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers

On 31.03.2020 21:48, Noam Postavsky wrote:
> On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov <dgutov@yandex.ru> wrote:
>> On 31.03.2020 20:46, Eli Zaretskii wrote:
>>> One second of delay before the first window-full is displayed?  This
>>> is like infinity.
>> This is what we have now:
> Except that s/first window-full/last window-full/

True. And I meant to suggest that, on average, we'd get the same 1 
second delay (if we assume all positions in the file are equally probable).

However, I've just tried the same experiment without goto-char, and got 
essentially the same result as with it: 1.2 s (my previous result was 
with "cold" filesystem cache).

In addition to that, though, I think this call returns before the window 
finishes displaying. So, when point is at eob, there's some extra wait, 
but I'm not sure how to measure it.
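
One possible way to include that wait, as a rough sketch only: force a
redisplay inside the timed form.  The helper name
my/time-visit-and-redisplay is made up, and I am assuming that
(redisplay t) is a fair stand-in for the redisplay that normally happens
after the command returns.  The file must not already be visited, and in
batch mode there is no display, so the redisplay part measures nothing.

```elisp
(require 'benchmark)

(defun my/time-visit-and-redisplay (file &optional to-end)
  "Visit FILE, optionally move to its end, and force a redisplay.
Return the `benchmark-run' result (ELAPSED GC-COUNT GC-ELAPSED).
Hypothetical helper, not part of Emacs."
  (benchmark-run 1
    (find-file file)
    (when to-end (goto-char (point-max)))
    ;; Redisplay now, even if input is pending, so that jit-lock
    ;; fontification of the visible chunk is part of the measurement.
    (redisplay t)))

;; (my/time-visit-and-redisplay "src/xdisp.c")    ; first window-full
;; (my/time-visit-and-redisplay "src/xdisp.c" t)  ; last window-full
```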


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 18:47                         ` Dmitry Gutov
  2020-03-31 18:48                           ` Noam Postavsky
@ 2020-03-31 19:26                           ` Eli Zaretskii
  2020-03-31 19:50                             ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 19:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 21:47:17 +0300
> 
> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> 
> => Elapsed time: 1.940401s (0.376140s in 6 GCs)

This doesn't measure the redisplay (which happens after the above
command returns).




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:26                           ` Eli Zaretskii
@ 2020-03-31 19:50                             ` Dmitry Gutov
  2020-04-01  2:28                               ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-03-31 19:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 31.03.2020 22:26, Eli Zaretskii wrote:
>> Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 31 Mar 2020 21:47:17 +0300
>>
>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
>>
>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> This doesn't measure the redisplay (which happens after the above
> command returns).

Which means that the current state of affairs is even slower.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 19:50                             ` Dmitry Gutov
@ 2020-04-01  2:28                               ` Eli Zaretskii
  2020-04-01  3:49                                 ` Dmitry Gutov
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01  2:28 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl

> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 31 Mar 2020 22:50:43 +0300
> 
> >> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> >>
> >> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> > This doesn't measure the redisplay (which happens after the above
> > command returns).
> 
> Which means that the current state of affairs is even slower.

No, it means that whatever delay we will have with parsing the entire
buffer is _in_addition_ to whatever you measured.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  2:28                               ` Eli Zaretskii
@ 2020-04-01  3:49                                 ` Dmitry Gutov
  2020-04-01  4:14                                   ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01  3:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 05:28, Eli Zaretskii wrote:
>> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 31 Mar 2020 22:50:43 +0300
>>
>>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
>>>>
>>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
>>> This doesn't measure the redisplay (which happens after the above
>>> command returns).
>>
>> Which means that the current state of affairs is even slower.
> 
> No, it means that whatever delay we will have with parsing the entire
> buffer is _in_addition_ to whatever you measured.

Probably not. IIUC, most of this measured 1.2 s delay is CC Mode doing the 
preliminary parsing. That phase would be replaced by TreeSitter's full 
buffer parse, which supposedly takes a comparable amount of time.

The redisplay phase will most likely be faster because by then the 
correct AST is available, and computing highlighting based on it is 
supposedly something that TreeSitter does quickly and well.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  3:49                                 ` Dmitry Gutov
@ 2020-04-01  4:14                                   ` Eli Zaretskii
  2020-04-01 13:47                                     ` Dmitry Gutov
  2020-04-01 13:52                                     ` Alan Mackenzie
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01  4:14 UTC (permalink / raw)
  To: emacs-devel, Dmitry Gutov; +Cc: casouri, monnier, akrl

On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote:
> On 01.04.2020 05:28, Eli Zaretskii wrote:
> >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
> >>   emacs-devel@gnu.org
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >> Date: Tue, 31 Mar 2020 22:50:43 +0300
> >>
> >>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max))))
> >>>>
> >>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs)
> >>> This doesn't measure the redisplay (which happens after the above
> >>> command returns).
> >>
> >> Which means that the current state of affairs is even slower.
> > 
> > No, it means that whatever delay we will have with parsing the entire
> > buffer is _in_addition_ to whatever you measured.
> 
> Probably not. IIUC, most of this measured 1.2 s delay is CC Mode doing the
> preliminary parsing.

There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.

In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  4:14                                   ` Eli Zaretskii
@ 2020-04-01 13:47                                     ` Dmitry Gutov
  2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 13:52                                     ` Alan Mackenzie
  1 sibling, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 13:47 UTC (permalink / raw)
  To: Eli Zaretskii, emacs-devel; +Cc: casouri, monnier, akrl

On 01.04.2020 07:14, Eli Zaretskii wrote:

> There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.

   - c-mode                                        772  75%
    - c-common-init                                766  74%
     - mapc                                        764  74%
      - #<compiled 0x158957d29ef1>                 509  49%
       + c-neutralize-syntax-in-CPP                276  26%
       + c-after-change-mark-abnormal-strings      204  19%
       + c-parse-quotes-after-change                18   1%
      - #<compiled 0x158957d29ee5>                 255  24%
       + c-before-change-check-unbalanced-strings  199  19%
       + c-depropertize-CPP                         46   4%
       c-font-lock-init                              1   0%
       c-basic-common-init                           1   0%

You can also compare CC Mode's init with JS Mode's.

If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same 
benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, 
and that stuff can be described as "preliminary parsing".
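
To isolate the mode's own setup cost from find-file, VC and redisplay,
a sketch along these lines might help.  The helper name
my/time-mode-init is made up; it runs the mode function in a temporary
buffer, so the numbers will not match the find-file benchmark exactly.

```elisp
(require 'benchmark)

(defun my/time-mode-init (file mode)
  "Return the seconds taken to enable major MODE on the contents of FILE.
Hypothetical helper: it works in a temporary buffer, so find-file
hooks, VC lookups and redisplay are not part of the measurement."
  (with-temp-buffer
    (insert-file-contents file)
    (car (benchmark-run 1 (funcall mode)))))

;; Paths are relative to the Emacs source tree, as in the thread:
;; (my/time-mode-init "src/xdisp.c" #'c-mode)
;; (my/time-mode-init "src/xdisp.c" #'js-mode)
```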

And there will be more of that during redisplay itself.

> In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.

Now you know.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:47                                     ` Dmitry Gutov
@ 2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 14:55                                         ` Eli Zaretskii
  2020-04-01 15:16                                         ` Dmitry Gutov
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:04 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 16:47:02 +0300
> 
> On 01.04.2020 07:14, Eli Zaretskii wrote:
> 
> > There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.
> 
>    - c-mode                                        772  75%
>     - c-common-init                                766  74%
>      - mapc                                        764  74%
>       - #<compiled 0x158957d29ef1>                 509  49%
>        + c-neutralize-syntax-in-CPP                276  26%
>        + c-after-change-mark-abnormal-strings      204  19%
>        + c-parse-quotes-after-change                18   1%
>       - #<compiled 0x158957d29ee5>                 255  24%
>        + c-before-change-check-unbalanced-strings  199  19%
>        + c-depropertize-CPP                         46   4%
>        c-font-lock-init                              1   0%
>        c-basic-common-init                           1   0%

I see a very different picture here: the above takes something like
15%.  Most of the time is spent in functions called by jit-lock.

> If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same 
> benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, 
> and that stuff can be described as "preliminary parsing".

Except that I cannot reproduce these results, so I'm not really sure
what we are looking at.

What I did was start the profiler, then manually call goto-char, then
produce the profiler report.  What did you do to collect the above
profile?

> And there will be more of that during redisplay itself.

Which is not what your benchmark measures.

> > In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.
> 
> Now you know.

Do I?




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:04                                       ` Eli Zaretskii
@ 2020-04-01 14:55                                         ` Eli Zaretskii
  2020-04-01 15:16                                         ` Dmitry Gutov
  1 sibling, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:55 UTC (permalink / raw)
  To: dgutov; +Cc: casouri, emacs-devel, monnier, akrl

> Date: Wed, 01 Apr 2020 17:04:24 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> What I did was start the profiler, then manually call goto-char, then
> produce the profiler report.

That came out confusingly unclear.  What I actually did was start the
profiler, then evaluate the form that visits xdisp.c and goes to
point-max, then call profiler-report.
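
In Lisp terms, those three steps could be wrapped roughly as follows.
This is a sketch only: my/profile-thunk is a made-up name, and the
explicit (redisplay t) is my assumption for capturing the jit-lock work
that, interactively, happens by itself once the command returns.  It
signals an error if the CPU profiler is already running.

```elisp
(require 'profiler)

(defun my/profile-thunk (thunk)
  "Run THUNK under the CPU profiler, then pop up the profiler report.
Hypothetical helper, not part of Emacs."
  (profiler-start 'cpu)
  (unwind-protect
      (progn
        (funcall thunk)
        ;; Force a redisplay so that fontification done by jit-lock
        ;; is sampled as well.
        (redisplay t)
        (profiler-report))
    ;; Always stop sampling, even if THUNK signals an error.
    (profiler-stop)))

;; Example (path relative to the Emacs source tree, as in the thread):
;; (my/profile-thunk
;;  (lambda ()
;;    (find-file "src/xdisp.c")
;;    (goto-char (point-max))))
```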




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:04                                       ` Eli Zaretskii
  2020-04-01 14:55                                         ` Eli Zaretskii
@ 2020-04-01 15:16                                         ` Dmitry Gutov
  2020-04-01 15:59                                           ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 01.04.2020 17:04, Eli Zaretskii wrote:
>> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 1 Apr 2020 16:47:02 +0300
>>
>> On 01.04.2020 07:14, Eli Zaretskii wrote:
>>
>>> There's no need to guess.  Just profile this use case, and you will clearly see what takes most of this time.
>>
>>    - c-mode                                        772  75%
>>     - c-common-init                                766  74%
>>      - mapc                                        764  74%
>>       - #<compiled 0x158957d29ef1>                 509  49%
>>        + c-neutralize-syntax-in-CPP                276  26%
>>        + c-after-change-mark-abnormal-strings      204  19%
>>        + c-parse-quotes-after-change                18   1%
>>       - #<compiled 0x158957d29ee5>                 255  24%
>>        + c-before-change-check-unbalanced-strings  199  19%
>>        + c-depropertize-CPP                         46   4%
>>        c-font-lock-init                              1   0%
>>        c-basic-common-init                           1   0%
> 
> I see a very different picture here: the above takes something like
> 15%.  Most of the time is spent in functions called by jit-lock.

What are your measurements, though? Again, what does this print out?

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

>> If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same
>> benchmark takes ~60ms. So yes, CC Mode does a lot during initialization,
>> and that stuff can be described as "preliminary parsing".
> 
> Except that I cannot reproduce these results, so I'm not really sure
> what we are looking at.
> 
> What I did was start the profiler, then manually call goto-char, then
> produce the profiler report.  What did you do to collect the above
> profile?

No 'goto-char'. As we've established, it only affects the time taken by 
redisplay, and I can't measure that. So I'm not profiling it either, 
otherwise I'd be comparing apples to oranges.

>> And there will be more of that during redisplay itself.
> 
> Which is not what your benchmark measures.

Exactly. Like I said, I can't measure how long redisplay itself takes.

>>> In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes.  I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification.  CC Mode certainly doesn't seem to do that.
>>
>> Now you know.
> 
> Do I?

Yes. The numbers can be different, but there is definitely some up-front 
computation there. One that's not present with e.g. js-mode.




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:16                                         ` Dmitry Gutov
@ 2020-04-01 15:59                                           ` Eli Zaretskii
  2020-04-01 21:48                                             ` Dmitry Gutov
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 15:59 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 18:16:04 +0300
> 
> > I see a very different picture here: the above takes something like
> > 15%.  Most of the time is spent in functions called by jit-lock.
> 
> What are your measurements, though?

My full profile is below.  This is from Emacs 27.0.90 compiled with
the -Og optimization and with wide-int (which slows down Emacs by
about 30%).

> Again, what does this print out?
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")))

  Elapsed time: 1.733853s (0.140584s in 6 GCs)

> No 'goto-char'. As we've established, it only affects the time taken by 
> redisplay, and I can't measure that. So I'm not profiling it either, 
> otherwise I'd be comparing apples to oranges.

See the second profile below.

> Yes. The numbers can be different, but there is definitely some up-front 
> computation there. One that's not present with e.g. js-mode.

So you are saying that we should do that up-front computation just
because CC mode currently does it?  That we shouldn't try to eliminate
such preprocessing?  I don't think so.

Here's the profile from visiting xdisp.c and going to end of the
buffer:

- redisplay_internal (C function)                                  65  41%
 - jit-lock-function                                               65  41%
  - jit-lock-fontify-now                                           65  41%
   - jit-lock--run-functions                                       65  41%
    - run-hook-wrapped                                             65  41%
     - #<compiled -0x1ffffffff8adaa88>                             65  41%
      - font-lock-fontify-region                                   65  41%
       - c-font-lock-fontify-region                                65  41%
        - font-lock-default-fontify-region                         50  31%
         - font-lock-fontify-keywords-region                       35  22%
          - c-font-lock-declarations                               34  21%
           - c-find-decl-spots                                     34  21%
            - c-bs-at-toplevel-p                                   32  20%
             - c-brace-stack-at                                    32  20%
              - c-update-brace-stack                               31  19%
               - c-syntactic-re-search-forward                     27  17%
                - c-beginning-of-macro                              6   3%
                   back-to-indentation                              2   1%
                   #<compiled -0x1ffffffff8ae5f98>                  1   0%
              c-forward-sws                                         1   0%
          - c-font-lock-complex-decl-prepare                        1   0%
           - c-parse-state                                          1   0%
            - c-parse-state-1                                       1   0%
             - c-parse-state-get-strategy                           1   0%
              - c-get-fallback-scan-pos                             1   0%
               - beginning-of-defun                                 1   0%
                - beginning-of-defun-raw                            1   0%
                   syntax-ppss                                      1   0%
         - font-lock-fontify-syntactically-region                  15   9%
            syntax-ppss                                            15   9%
        - c-before-context-fl-expand-region                        15   9%
         - mapc                                                    15   9%
          - #<compiled -0x1ffffffff8a66198>                        15   9%
           - c-context-expand-fl-region                            15   9%
            - c-fl-decl-start                                      15   9%
             - c-literal-start                                     14   8%
              - c-semi-pp-to-literal                               14   8%
                 c-parse-ps-state-below                            14   8%
               c-determine-limit                                    1   0%
- command-execute                                                  64  40%
 - call-interactively                                              64  40%
  - funcall-interactively                                          63  40%
   - eval-last-sexp                                                63  40%
    - elisp--eval-last-sexp                                        63  40%
     - eval                                                        63  40%
      - progn                                                      63  40%
       - progn                                                     63  40%
        - find-file                                                63  40%
         - find-file-noselect                                      63  40%
          - find-file-noselect-1                                   63  40%
           - after-find-file                                       63  40%
            - normal-mode                                          61  38%
             - set-auto-mode                                       61  38%
              - set-auto-mode-0                                    61  38%
               - c-mode                                            61  38%
                - c-common-init                                    57  36%
                 - mapc                                            57  36%
                  - #<compiled -0x1ffffffff8a7d680>                 37  23%
                   - c-neutralize-syntax-in-CPP                    20  12%
                    - c-beginning-of-macro                          4   2%
                       c-backward-single-comment                    2   1%
                       back-to-indentation                          1   0%
                      c-no-comment-end-of-macro                     3   1%
                     c-after-change-mark-abnormal-strings          15   9%
                     c-parse-quotes-after-change                    1   0%
                  - #<compiled -0x1ffffffff8a7d6b0>                 20  12%
                   - c-before-change-check-unbalanced-strings      15   9%
                    - c-literal-limits                             15   9%
                     - c-full-pp-to-literal                        15   9%
                        c-parse-ps-state-below                     15   9%
                     c-depropertize-CPP                             4   2%
                - byte-code                                         2   1%
                   require                                          1   0%
                - run-mode-hooks                                    1   0%
                 - hack-local-variables                             1   0%
                  - hack-dir-local-variables                        1   0%
                     dir-locals-read-from-dir                       1   0%
            - run-hooks                                             2   1%
             - vc-refresh-state                                     2   1%
              - vc-backend                                          2   1%
               - vc-registered                                      2   1%
                - mapc                                              2   1%
                 - #<compiled -0x1ffffffff8a67780>                  2   1%
                  - vc-call-backend                                 2   1%
                   - apply                                          2   1%
                    - vc-git-registered                             2   1%
                     - if                                           2   1%
                      - progn                                       2   1%
                       - load                                       1   0%
                          require                                   1   0%
  - byte-code                                                       1   0%
   - read-extended-command                                          1   0%
    - completing-read                                               1   0%
       completing-read-default                                      1   0%
- ...                                                              28  17%
   Automatic GC                                                    27  17%
 - substitute-key-definition-key                                    1   0%
  - substitute-key-definition                                       1   0%
   - map-keymap                                                     1   0%
    - #<compiled -0x1ffffffff8a80eb8>                               1   0%
     - substitute-key-definition-key                                1   0%
      - substitute-key-definition                                   1   0%
       - map-keymap                                                 1   0%
        - #<compiled -0x1ffffffff8a80c48>                           1   0%
         - substitute-key-definition-key                            1   0%
          - substitute-key-definition                               1   0%
           - map-keymap                                             1   0%
            - #<compiled -0x1ffffffff8a80658>                       1   0%
             - substitute-key-definition-key                        1   0%
              - substitute-key-definition                           1   0%
               - map-keymap                                         1   0%
                  #<compiled -0x1ffffffff8a7ce58>                   1   0%

Here's the profile from just visiting xdisp.c:

- command-execute                                                  67  82%
 - call-interactively                                              67  82%
  - funcall-interactively                                          67  82%
   - eval-expression                                               67  82%
    - eval                                                         67  82%
     - progn                                                       67  82%
      - find-file                                                  67  82%
       - find-file-noselect                                        67  82%
        - find-file-noselect-1                                     66  81%
         - after-find-file                                         66  81%
          - normal-mode                                            62  76%
           - set-auto-mode                                         62  76%
            - set-auto-mode-0                                      62  76%
             - c-mode                                              62  76%
              - c-common-init                                      55  67%
               - mapc                                              55  67%
                - #<compiled -0x1ffffffff8aa7940>                  36  44%
                 - c-neutralize-syntax-in-CPP                      21  25%
                  - c-beginning-of-macro                            2   2%
                     c-backward-single-comment                      1   1%
                   c-after-change-mark-abnormal-strings                 14  17%
                - #<compiled -0x1ffffffff8aa7970>                  19  23%
                 - c-before-change-check-unbalanced-strings                 14  17%
                  - c-literal-limits                               14  17%
                   - c-full-pp-to-literal                          14  17%
                      c-parse-ps-state-below                       14  17%
                 - c-depropertize-CPP                               4   4%
                    c-end-of-macro                                  1   1%
              - byte-code                                           6   7%
                 require                                            4   4%
               - substitute-key-definition                          1   1%
                - map-keymap                                        1   1%
                 - #<compiled -0x1ffffffff8aac0b8>                  1   1%
                  - substitute-key-definition-key                   1   1%
                   - substitute-key-definition                      1   1%
                      map-keymap                                    1   1%
          - run-hooks                                               4   4%
           - vc-refresh-state                                       4   4%
            - vc-backend                                            4   4%
             - vc-registered                                        4   4%
              - mapc                                                3   3%
               - #<compiled -0x1ffffffff8ae8e88>                    3   3%
                - vc-call-backend                                   3   3%
                 - apply                                            3   3%
                  - vc-git-registered                               2   2%
                   - if                                             2   2%
                    - progn                                         2   2%
                     - load                                         1   1%
                      - require                                     1   1%
                       - defconst                                   1   1%
                          byte-code                                 1   1%
                     - vc-git-registered                            1   1%
                      - vc-git--out-ok                              1   1%
                       - apply                                      1   1%
                        - vc-git--call                              1   1%
                         - apply                                    1   1%
                          - process-file                            1   1%
                             apply                                  1   1%
                  - vc-git-find-file-hook                           1   1%
                   - vc-state                                       1   1%
                    - vc-state-refresh                              1   1%
                     - vc-call-backend                              1   1%
                      - apply                                       1   1%
                       - vc-git-state                               1   1%
                        - apply                                     1   1%
                         - vc-git--run-command-string                  1   1%
                          - apply                                   1   1%
                           - vc-git--out-ok                         1   1%
                            - apply                                 1   1%
                             - vc-git--call                         1   1%
                              - apply                               1   1%
                               - process-file                       1   1%
                                  apply                             1   1%
                vc-file-getprop                                     1   1%
        - find-buffer-visiting                                      1   1%
         - file-truename                                            1   1%
          - file-truename                                           1   1%
           - file-truename                                          1   1%
            - file-truename                                         1   1%
             - file-truename                                        1   1%
                file-truename                                       1   1%
- ...                                                              14  17%
   Automatic GC                                                    14  17%



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:59                                           ` Eli Zaretskii
@ 2020-04-01 21:48                                             ` Dmitry Gutov
  2020-04-01 22:29                                               ` Stefan Monnier
  2020-04-02 14:23                                               ` Eli Zaretskii
  0 siblings, 2 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 21:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 01.04.2020 18:59, Eli Zaretskii wrote:

>> What are your measurements, though?
> 
> My full profile is below.  This is from Emacs 27.0.90 compiled with
> the -Og optimization and with wide-int (which slows down Emacs by
> about 30%).

Thank you. I also build with '-Og -g3' these days, but probably have a 
faster CPU.

>> Again, what does this print out?
>>
>>     (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
>    Elapsed time: 1.733853s (0.140584s in 6 GCs)

All right. So it takes 1.7s just to open the file, even before full 
syntax highlighting.
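
To separate the two costs being discussed (turning on the major mode
versus fontifying), a measurement along the following lines may help.
It is only a sketch: `demo-time-visit' is a made-up helper name, and
`font-lock-ensure' fontifies the whole buffer, which is more than
redisplay does for one window, so the second number is an upper bound
rather than what a user actually waits for.

```elisp
(require 'benchmark)

(defun demo-time-visit (file)
  "Return (OPEN-SECONDS . FONTIFY-SECONDS) for visiting FILE.
Hypothetical helper, for illustration only."
  (let* (buf
         ;; Reading the file plus major-mode setup, without display.
         ;; If FILE is already visited this measures almost nothing.
         (open (benchmark-run 1 (setq buf (find-file-noselect file))))
         ;; Force fontification of the entire buffer; jit-lock would
         ;; normally do this piecemeal at redisplay time.
         (fontify (with-current-buffer buf
                    (benchmark-run 1 (font-lock-ensure)))))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (cons (car open) (car fontify))))

;; Example call, assuming the current directory is an Emacs source tree:
;; (demo-time-visit "src/xdisp.c")
```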

>> No 'goto-char'. As we've established, it only affects the time taken by
>> redisplay, and I can't measure that. So I'm not profiling it either,
>> otherwise I'd be comparing apples to oranges.
> 
> See the second profile below.

Comparing both, looks like redisplay (when at eob, at least) takes 
approx. the same amount of time?

>> Yes. The numbers can be different, but there is definitely some up-front
>> computation there. One that's not present with e.g. js-mode.
> 
> So you are saying that we should do that up-front computation just
> because CC mode currently does it?  That we shouldn't try to eliminate
> such preprocessing?  I don't think so.

AFAIU CC Mode could actually eliminate it, but that would require a 
significant rework of its internals.

I'm just pointing out that apparently you didn't even notice an even 
larger delay (1.7s), and were fine with it until now.

I'm not saying that nobody should try to explore how to decrease the 
delay, and what tradeoffs come with that. But for now, I think, we 
should encourage our kind volunteers to just implement integration the 
way TreeSitter's authors expect it. And try, on our side, to provide the 
best tools for it. Then we can see how well it does or doesn't work, and 
what are the biggest annoyances that the users have with it.

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:48                                             ` Dmitry Gutov
@ 2020-04-01 22:29                                               ` Stefan Monnier
  2020-04-02 14:23                                               ` Eli Zaretskii
  1 sibling, 0 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-04-01 22:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, emacs-devel

> AFAIU CC Mode could actually eliminate it, but that would require
>  a significant rework of its internals.

My experiments to make CC-mode use syntax-propertize-function suggest
that it wouldn't require too much work, actually.  For an outsider, it's
difficult because it's hard to understand all the invariants/assumptions
in the current design, but if Alan and I were to work together on it, it
would be pretty easy.  So far Alan has been opposed and there are several
good reasons for that:

- it's extra work.
- it will inevitably introduce bugs.
- while it will most likely be faster when opening the file, it will
  likely be slower in other cases (e.g. when modifying the buffer near
  point-min in one window while having point-max displayed in another).
- syntax-propertize was introduced in Emacs-24 so it would require
  either dropping CC-mode's support for earlier Emacsen, or adding some
  compatibility layer (I think this compatibility layer would be
  easy to write but would likely not cover all cases).
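
For readers unfamiliar with the mechanism: `syntax-propertize-function'
is called lazily, with a region, the first time something needs syntax
information up to a given position.  A minimal sketch for a made-up toy
syntax (where #<...># delimits a string) might look like this.  The
names `demo-lazy-mode' and `demo-lazy--propertize' are hypothetical, and
this is in no way CC Mode's code.

```elisp
(defun demo-lazy--propertize (start end)
  "Give `#<' and `>#' between START and END string-fence syntax.
Called by `syntax-propertize' only for regions not yet processed."
  (goto-char start)
  (while (re-search-forward "#<\\|>#" end t)
    ;; Mark the first character of the delimiter as a generic
    ;; string fence, so quotes inside #<...># are plain text.
    (put-text-property (match-beginning 0) (1+ (match-beginning 0))
                       'syntax-table (string-to-syntax "|"))))

(define-derived-mode demo-lazy-mode prog-mode "DemoLazy"
  "Toy mode whose syntax-table properties are computed on demand."
  (setq-local syntax-propertize-function #'demo-lazy--propertize))
```

Nothing is scanned when the mode is turned on; a later call such as
(syntax-ppss POS) propertizes only the text up to POS.  That is where
the faster file opening comes from, and also why work near point-min
can invalidate everything after it.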

> I'm not saying that nobody should try to explore how to decrease the delay,
> and what tradeoffs come with that. But for now, I think, we should encourage
> our kind volunteers to just implement integration the way TreeSitter's
> authors expect it. And try, on our side, to provide the best tools for
> it. Then we can see how well it does or doesn't work, and what are the
> biggest annoyances that the users have with it.

+1


        Stefan




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:48                                             ` Dmitry Gutov
  2020-04-01 22:29                                               ` Stefan Monnier
@ 2020-04-02 14:23                                               ` Eli Zaretskii
  2020-04-02 16:17                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:23 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 00:48:20 +0300
> 
> >> No 'goto-char'. As we've established, it only affects the time taken by
> >> redisplay, and I can't measure that. So I'm not profiling it either,
> >> otherwise I'd be comparing apples to oranges.
> > 
> > See the second profile below.
> 
> Comparing both, looks like redisplay (when at eob, at least) takes 
> approx. the same amount of time?

About 55% taken by redisplay (almost all of it due to fontification),
and the other 45% are the C mode "preprocessing" when the mode is
turned on in a buffer.

> >> Yes. The numbers can be different, but there is definitely some up-front
> >> computation there. One that's not present with e.g. js-mode.
> > 
> > So you are saying that we should do that up-front computation just
> > because CC mode currently does it?  That we shouldn't try to eliminate
> > such preprocessing?  I don't think so.
> 
> AFAIU CC Mode could actually eliminate it, but that would require a 
> significant rework of its internals.

Are we still talking about integrating a completely different parsing
engine into CC Mode?  Then redesign is a must, right?

> I'm just pointing out that apparently you didn't even notice an even 
> larger delay (1.7s), and were fine with it until now.

I didn't "didn't notice", I actually filed several bug reports and
complaints about the various slow aspects of CC mode, because the
slowdown in CC mode over the years annoys me quite a lot.  Some of the
problems were fixed, some weren't (due to limitations of the current
design, I was told).  I'm not at all complacent about this.

> I'm not saying that nobody should try to explore how to decrease the 
> delay, and what tradeoffs come with that. But for now, I think, we 
> should encourage our kind volunteers to just implement integration the 
> way TreeSitter's authors expect it. And try, on our side, to provide the 
> best tools for it. Then we can see how well it does or doesn't work, and 
> what are the biggest annoyances that the users have with it.

I cannot tell the volunteers what to do and where to invest their
resources.  But I can provide feedback on the design ideas, based on
what I know and on my experience, and I can suggest how to design and
implement this to achieve good and scalable performance.  In
particular, I think that it is useful to know what we have tried in
the past and what were the lessons we learned from that.  I hope what
I say is of some help, and I hope we will soon have such an engine
available to Emacs.

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:23                                               ` Eli Zaretskii
@ 2020-04-02 16:17                                                 ` Dmitry Gutov
  2020-04-02 18:25                                                   ` Eli Zaretskii
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  0 siblings, 2 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-02 16:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel

On 02.04.2020 17:23, Eli Zaretskii wrote:

>> Comparing both, looks like redisplay (when at eob, at least) takes
>> approx. the same amount of time?
> 
> About 55% taken by redisplay (almost all of it due to fontification),
> and the other 45% are the C mode "preprocessing" when the mode is
> turned on in a buffer.

So, all in all, when xdisp.c is opened at eob, it will be displayed 
after ~2.5 seconds, I guess.

>>> So you are saying that we should do that up-front computation just
>>> because CC mode currently does it?  That we shouldn't try to eliminate
>>> such preprocessing?  I don't think so.
>>
>> AFAIU CC Mode could actually eliminate it, but that would require a
>> significant rework of its internals.
> 
> Are we still talking about integrating a completely different parsing
> engine into CC Mode?  Then redesign is a must, right?

No, that's without TreeSitter.

>> I'm just pointing out that apparently you didn't even notice an even
>> larger delay (1.7s), and were fine with it until now.
> 
> I didn't "didn't notice", I actually filed several bug reports and
> complaints about the various slow aspects of CC mode, because the
> slowdown in CC mode over the years annoys me quite a lot.  Some of the
> problems were fixed, some weren't (due to limitations of the current
> design, I was told).  I'm not at all complacent about this.

Still, compare that with 0.15 sec, which is the current estimate of 
parsing xdisp.c. It could probably be improved still by supporting a 
no-copy buffer-string in modules.

> I cannot tell the volunteers what to do and where to invest their
> resources.  But I can provide feedback on the design ideas, based on
> what I know and on my experience, and I can suggest how to design and
> implement this to achieve good and scalable performance.

We shouldn't, however, create an impression that unless they follow our 
ideas to a T we won't help them realize their own preferred approach 
(e.g. by improving the module API).

> In
> particular, I think that it is useful to know what we have tried in
> the past and what were the lessons we learned from that.  I hope what
> I say is of some help, and I hope we will soon have such an engine
> available to Emacs.

I'm fairly confident that implementing deferred/on-demand parsing in 
emacs-tree-sitter can be done later without requiring a major redesign. 
It will require, however, an extra layer of complexity either way.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 16:17                                                 ` Dmitry Gutov
@ 2020-04-02 18:25                                                   ` Eli Zaretskii
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  1 sibling, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 18:25 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel

> Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  akrl@sdf.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 19:17:07 +0300
> 
> > I cannot tell the volunteers what to do and where to invest their
> > resources.  But I can provide feedback on the design ideas, based on
> > what I know and on my experience, and I can suggest how to design and
> > implement this to achieve good and scalable performance.
> 
> We shouldn't, however, create an impression that unless they follow our 
> ideas to a T we won't help them realize their own preferred approach 

That's so unfair that I will in the future think twice before offering
any advice.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 16:17                                                 ` Dmitry Gutov
  2020-04-02 18:25                                                   ` Eli Zaretskii
@ 2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
  2020-04-03 16:10                                                     ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-03 14:40 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier,
	Andrea Corallo

On Thu, Apr 2, 2020 at 11:17 PM Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> On 02.04.2020 17:23, Eli Zaretskii wrote:
>
> > I cannot tell the volunteers what to do and where to invest their
> > resources.  But I can provide feedback on the design ideas, based on
> > what I know and on my experience, and I can suggest how to design and
> > implement this to achieve good and scalable performance.
>
> We shouldn't, however, create an impression that unless they follow our
> ideas to a T we won't help them realize their own preferred approach
> (e.g. by improving the module API).
>

FWIW, this was not my impression.

--
Tuấn-Anh Nguyễn
Software Engineer



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
@ 2020-04-03 16:10                                                     ` Dmitry Gutov
  0 siblings, 0 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-03 16:10 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn
  Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier,
	Andrea Corallo

On 03.04.2020 17:40, Tuấn-Anh Nguyễn wrote:
> FWIW, this was not my impression.

I'm glad to hear it.

My apologies, then.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  4:14                                   ` Eli Zaretskii
  2020-04-01 13:47                                     ` Dmitry Gutov
@ 2020-04-01 13:52                                     ` Alan Mackenzie
  2020-04-01 14:10                                       ` Eli Zaretskii
  2020-04-01 15:22                                       ` Dmitry Gutov
  1 sibling, 2 replies; 139+ messages in thread
From: Alan Mackenzie @ 2020-04-01 13:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: akrl, casouri, Dmitry Gutov, monnier, emacs-devel

Hello, Eli.

On Wed, Apr 01, 2020 at 07:14:09 +0300, Eli Zaretskii wrote:
> On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote:
> > On 01.04.2020 05:28, Eli Zaretskii wrote:
> > >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org,
> > >>   emacs-devel@gnu.org
> > >> From: Dmitry Gutov <dgutov@yandex.ru>
> > >> Date: Tue, 31 Mar 2020 22:50:43 +0300

> In general, there's no "preliminary processing" by the major mode's
> fontification facilities except what happens as part of jit-lock, i.e.
> at redisplay time or as side effect of functions that simulate display
> for redisplay purposes.  I'd be very surprised to see a major mode
> which somehow preprocesses the buffer on its own in preparation for
> fontification.  CC Mode certainly doesn't seem to do that.

CC Mode does do this.  It marks syntax-table text properties throughout
the buffer at find-file time, and keeps them valid thereafter in
before/after-change-functions.

This doesn't seem to affect startup performance that badly.  On my
machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
of the first screenful of comments) is taking 0.18s.
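
The eager scheme described above can be sketched roughly as follows,
for a made-up toy syntax (where #<...># delimits a string).  All
`demo-eager-...' names are hypothetical.  CC Mode's real code is far
more involved and also uses before-change-functions, which this sketch
leaves out.

```elisp
(defun demo-eager--remark (beg end)
  "Recompute `syntax-table' properties between BEG and END."
  (with-silent-modifications
    (save-excursion
      (save-match-data
        (remove-text-properties beg end '(syntax-table nil))
        (goto-char beg)
        (while (re-search-forward "#<\\|>#" end t)
          (put-text-property (match-beginning 0) (1+ (match-beginning 0))
                             'syntax-table (string-to-syntax "|")))
        ;; Cached parse states from BEG onwards may now be stale.
        (syntax-ppss-flush-cache beg)))))

(defun demo-eager--after-change (beg end _old-len)
  "Keep the properties valid on the lines touched by a change."
  (let ((from (save-excursion (goto-char beg) (line-beginning-position)))
        (to (save-excursion (goto-char end) (line-end-position))))
    (demo-eager--remark from to)))

(define-derived-mode demo-eager-mode prog-mode "DemoEager"
  "Toy mode that marks the whole buffer when it is turned on."
  (setq-local parse-sexp-lookup-properties t)
  ;; Up-front pass over the entire buffer: cost grows with file size.
  (demo-eager--remark (point-min) (point-max))
  (add-hook 'after-change-functions #'demo-eager--after-change nil t))
```

The up-front pass is what shows up at find-file time; afterwards each
edit only pays for the lines it touches.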

-- 
Alan Mackenzie (Nuremberg, Germany).



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:52                                     ` Alan Mackenzie
@ 2020-04-01 14:10                                       ` Eli Zaretskii
  2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:22                                       ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 14:10 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: casouri, dgutov, emacs-devel, monnier, akrl

> Date: Wed, 1 Apr 2020 13:52:37 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: akrl@sdf.org, casouri@gmail.com, Dmitry Gutov <dgutov@yandex.ru>,
>  monnier@iro.umontreal.ca, emacs-devel@gnu.org
> 
> > In general, there's no "preliminary processing" by the major mode's
> > fontification facilities except what happens as part of jit-lock, i.e.
> > at redisplay time or as side effect of functions that simulate display
> > for redisplay purposes.  I'd be very surprised to see a major mode
> > which somehow preprocesses the buffer on its own in preparation for
> > fontification.  CC Mode certainly doesn't seem to do that.
> 
> CC Mode does do this.  It marks syntax-table text properties throughout
> the buffer at find-file time, and keeps them valid thereafter in
> before/after-change-functions.
> 
> This doesn't seem to affect startup performance that badly.  On my
> machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> of the first screenful of comments) is taking 0.18s.

Like I said, the profile I see is very different, and shows that most
of the time is spent in redisplay-triggered font-lock.

But in any case, it should be trivially obvious that avoiding parsing
the entire buffer will make redisplay faster.  We should try doing
that instead of giving up, even if we think the current fontification
machinery is slow enough to make the parsing delay not so visible.
After all, we want to use these parsers to make CC Mode and friends
faster, so the design and the implementation should use every trick we
have up our sleeve to avoid expensive processing.  Just because using
buffer-substring and parsing the entire buffer up front is easy
doesn't yet mean we should go for it without trying more efficient
algorithms.

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 14:10                                       ` Eli Zaretskii
@ 2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
  2020-04-01 16:03                                           ` Eli Zaretskii
  0 siblings, 2 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:27 UTC (permalink / raw)
  To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 17:10, Eli Zaretskii wrote:
> But in any case, it should be trivially obvious that avoiding parsing
> the entire buffer will make redisplay faster.  We should try doing
> that instead of giving up, even if we think the current fontification
> machinery is slow enough to make the parsing delay not so visible.

I think it's pointless to argue against the current design of TreeSitter 
here, where none of its developers can read it.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:27                                         ` Dmitry Gutov
@ 2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
  2020-04-01 16:03                                           ` Eli Zaretskii
  1 sibling, 0 replies; 139+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-01 15:44 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: casouri, emacs-devel, Stefan Monnier, Alan Mackenzie,
	Eli Zaretskii, akrl

Yup.

On Wed, Apr 1, 2020 at 09:28, Dmitry Gutov (dgutov@yandex.ru)
wrote:

> On 01.04.2020 17:10, Eli Zaretskii wrote:
> > But in any case, it should be trivially obvious that avoiding parsing
> > the entire buffer will make redisplay faster.  We should try doing
> > that instead of giving up, even if we think the current fontification
> > machinery is slow enough to make the parsing delay not so visible.
>
> I think it's pointless to argue against the current design of TreeSitter
> here, where none of its developers can read it.
>
>

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:27                                         ` Dmitry Gutov
  2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
@ 2020-04-01 16:03                                           ` Eli Zaretskii
  2020-04-01 21:21                                             ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 16:03 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 1 Apr 2020 18:27:43 +0300
> 
> On 01.04.2020 17:10, Eli Zaretskii wrote:
> > But in any case, it should be trivially obvious that avoiding parsing
> > the entire buffer will make redisplay faster.  We should try doing
> > that instead of giving up, even if we think the current fontification
> > machinery is slow enough to make the parsing delay not so visible.
> 
> I think it's pointless to argue against the current design of TreeSitter 
> here, where none of its developers can read it.

If by TreeSitter you mean the parser (not the Emacs package which
interfaces it), then what I proposed is not against their design,
AFAIU.  They provide an API through which we can let the parser access
the buffer text directly, and they explicitly say that the parser is
tolerant to invalid/incomplete syntax trees.  And I don't see how it
could be any different, since when you start writing code, it takes
quite some time before it becomes syntactically complete and valid.
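
As an illustration of the "let the parser pull the text" idea:
tree-sitter's C API accepts a read callback that is asked for the text
at a given offset and returns a chunk, rather than requiring one
contiguous string.  The Lisp sketch below mimics that pattern under
simplifying assumptions: it uses character positions instead of byte
offsets, `demo-read-chunk' and `demo-consume' are hypothetical names,
and the consumer is only a stand-in for a parser.

```elisp
(defun demo-read-chunk (buffer pos &optional max-chars)
  "Return up to MAX-CHARS characters of BUFFER starting at POS.
Return the empty string when POS is at or past the end of BUFFER."
  (with-current-buffer buffer
    (save-restriction
      (widen)
      (let* ((start (max (point-min) (min pos (point-max))))
             (end (min (point-max) (+ start (or max-chars 4096)))))
        (buffer-substring-no-properties start end)))))

(defun demo-consume (buffer limit &optional chunk-size)
  "Pull chunks from BUFFER until position LIMIT is reached.
Return the number of chunks read.  A real parser would decide for
itself how far it needs to read."
  (let ((pos 1) (n 0) chunk)
    (while (and (< pos limit)
                (not (string= "" (setq chunk (demo-read-chunk
                                              buffer pos chunk-size)))))
      (setq pos (+ pos (length chunk))
            n (1+ n)))
    n))
```

With such an interface the caller never builds a copy of the whole
buffer; how much text gets read is decided by the consumer.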



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 16:03                                           ` Eli Zaretskii
@ 2020-04-01 21:21                                             ` Dmitry Gutov
  2020-04-02 14:09                                               ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 21:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 01.04.2020 19:03, Eli Zaretskii wrote:
>> Cc: akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 1 Apr 2020 18:27:43 +0300
>>
>> On 01.04.2020 17:10, Eli Zaretskii wrote:
>>> But in any case, it should be trivially obvious that avoiding parsing
>>> the entire buffer will make redisplay faster.  We should try doing
>>> that instead of giving up, even if we think the current fontification
>>> machinery is slow enough to make the parsing delay not so visible.
>>
>> I think it's pointless to argue against the current design of TreeSitter
>> here, where none of its developers can read it.
>
> If by TreeSitter you mean the parser (not the Emacs package which
> interfaces it), then what I proposed is not against their design,
> AFAIU.  They provide an API through which we can let the parser access
> the buffer text directly, and they explicitly say that the parser is
> tolerant to invalid/incomplete syntax trees.  And I don't see how it
> could be any different, since when you start writing code, it takes
> quite some time before it becomes syntactically complete and valid.

That makes sense, at least in theory. But I'd rather not break the usage 
assumptions of the authors of this library right away. And we'll likely 
want to adopt existing addons which use the result of the parse, which 
likely depend on the same assumptions.

Anyway, here's a (short) discussion on the topic of large files: 
https://github.com/tree-sitter/tree-sitter/issues/222



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 21:21                                             ` Dmitry Gutov
@ 2020-04-02 14:09                                               ` Eli Zaretskii
  2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 2 Apr 2020 00:21:36 +0300
> 
> > If by TreeSitter you mean the parser (not the Emacs package which
> > interfaces it), then what I proposed is not against their design,
> > AFAIU.  They provide an API through which we can let the parser access
> > the buffer text directly, and they explicitly say that the parser is
> > tolerant to invalid/incomplete syntax trees.  And I don't see how it
> > could be any different, since when you start writing code, it takes
> > quite some time before it becomes syntactically complete and valid.
> 
> That makes sense, at least in theory. But I'd rather not break the usage 
> assumptions of the authors of this library right away.

From what I could glean by reading the documentation, the above is not
necessarily against the assumptions of the tree-sitter developers.  I
saw nothing that would indicate the initial full parse is a must.
That such full parse is unnecessary is what I would expect, because of
the use case that I start writing a source file from scratch.

> And we'll likely want to adopt existing addons which use the result
> of the parse, which likely depend on the same assumptions.

Those other addons must also support the "write from scratch" use
case, right?  Then they should also support passing only part of the
buffer, since it could be that this is all I have in the buffer right
now.

> Anyway, here's a (short) discussion on the topic of large files: 
> https://github.com/tree-sitter/tree-sitter/issues/222

Thanks.  This was long ago, though, so I'm not sure what became of
that (and Stefan's comment didn't yet get any responses to indicate
that this is a solved problem).



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:09                                               ` Eli Zaretskii
@ 2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
  2020-04-02 18:27                                                   ` Yuan Fu
  0 siblings, 1 reply; 139+ messages in thread
From: 조성빈 via "Emacs development discussions. @ 2020-04-02 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, acm, casouri, Emacs-devel, monnier, akrl


> On Apr 2, 2020, at 11:10 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>>
>> Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Thu, 2 Apr 2020 00:21:36 +0300
>>
>>> If by TreeSitter you mean the parser (not the Emacs package which
>>> interfaces it), then what I proposed is not against their design,
>>> AFAIU.  They provide an API through which we can let the parser access
>>> the buffer text directly, and they explicitly say that the parser is
>>> tolerant to invalid/incomplete syntax trees.  And I don't see how it
>>> could be any different, since when you start writing code, it takes
>>> quite some time before it becomes syntactically complete and valid.
>>
>> That makes sense, at least in theory. But I'd rather not break the usage
>> assumptions of the authors of this library right away.
>
> From what I could glean by reading the documentation, the above is not
> necessarily against the assumptions of the tree-sitter developers.  I
> saw nothing that would indicate the initial full parse is a must.
> That such full parse is unnecessary is what I would expect, because of
> the use case that I start writing a source file from scratch.

The situation of a new user creating a new buffer is very different from
parsing code with only a peephole, because users don’t generally expect
unfinished code to be exactly highlighted, while users do expect finished
code to have exact highlighting.

Maybe it’s just because I got lost through a lot of emails, and Mail.app
doesn't really thread these emails properly, but I can’t understand the
resistance to up-front parsing.

The currently shipping CC Mode parses most of the code up front, and
clearly tree-sitter will be faster than that. AFAIU, parsing code by
only looking through a peephole is super hard, except for some languages
that are designed for peephole processing, and that makes it only hard,
not super hard.

>> And we'll likely want to adopt existing addons which use the result
>> of the parse, which likely depend on the same assumptions.
>
> Those other addons must also support the "write from scratch" use
> case, right?  Then they should also support passing only part of the
> buffer, since it could be that this is all I have in the buffer right
> now.
>
>> Anyway, here's a (short) discussion on the topic of large files:
>> https://github.com/tree-sitter/tree-sitter/issues/222
>
> Thanks.  This was long ago, though, so I'm not sure what became of
> that (and Stefan's comment didn't yet get any responses to indicate
> that this is a solved problem).




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
@ 2020-04-02 18:27                                                   ` Yuan Fu
  2020-04-02 19:39                                                     ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Yuan Fu @ 2020-04-02 18:27 UTC (permalink / raw)
  To: 조성빈
  Cc: Emacs-devel, Stefan Monnier, Dmitry Gutov, acm, Eli Zaretskii,
	akrl



> On Apr 2, 2020, at 2:03 PM, 조성빈 <pcr910303@icloud.com> wrote:
> 
> Maybe it’s just because I got lost through a lot of emails, and Mail.app
> doesn't really thread these emails properly, but I can’t understand the
> resistance to up-front parsing.
> 

I think we are just discussing if there is any way to not parse the whole buffer up front. (Which I consider unlikely because of the nature of parsing.)

> The currently shipping CC Mode parses most of the code up front, and
> clearly tree-sitter will be faster than that. AFAIU, parsing code by
> only looking through a peephole is super hard, except for some languages
> that are designed for peephole processing, and that makes it only hard,
> not super hard.


Some modes don’t require up-front parsing. IIRC, an example from an earlier message is javascript-mode.

Yuan

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 18:27                                                   ` Yuan Fu
@ 2020-04-02 19:39                                                     ` Stefan Monnier
  0 siblings, 0 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-04-02 19:39 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Emacs-devel, 조성빈, Dmitry Gutov, acm,
	Eli Zaretskii, akrl

> Some modes don’t require up-front parsing. IIRC, an example from an
> earlier message is javascript-mode.

Yet, in order to decide whether position P in a javascript buffer is
inside a comment or not, you will either have to look at everything
between point-min and P, or think hard about all the various
possibilities to try and see if you can argue that in this particular
case it's not necessary.
E.g. if you see

    foo /* bar */

then you might be able to say that "bar" is within a comment without
looking much further.  But for "foo" you first have to look back because
there might have been an earlier unmatched `/*`.
BTW, for "bar" you still have to look a bit further: it might be that
the previous line was:

    tmp = "hello\

in which case "bar" is not inside a comment but inside a string.
Well, unless there's ... an earlier unmatched `/*`.

Etc...
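
To make the dependence on earlier text concrete, here is a minimal
sketch in Python (not Emacs code).  It assumes a much simplified
JavaScript-like lexer: only `/* ... */` comments and quoted strings with
backslash escapes, no `//` comments, regexps or template literals.  The
name `lexical_state` is made up for this illustration.

```python
def lexical_state(text, pos):
    """Classify the character at index POS as 'code', 'comment' or 'string'.

    Scans from the start of TEXT, because the answer depends on every
    unmatched opener that precedes POS (simplified, assumed rules).
    """
    state = "code"
    quote = ""
    i = 0
    while i < pos:
        two = text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "comment"
                i += 2
                continue
            if text[i] in "\"'":
                state, quote = "string", text[i]
        elif state == "comment":
            if two == "*/":
                state = "code"
                i += 2
                continue
        else:  # inside a string
            if text[i] == "\\":
                i += 2  # skip the escaped char, including an escaped newline
                continue
            if text[i] == quote or text[i] == "\n":
                state = "code"
        i += 1
    return state


line = "foo /* bar */"
continued = 'tmp = "hello\\\n' + line        # previous line ends in a backslash
unmatched = "/* still open\n" + continued    # earlier unmatched comment opener

for text in (line, continued, unmatched):
    print(lexical_state(text, text.index("bar")))
```

The same line yields "comment", then "string", then "comment" again,
depending only on what precedes it, so a purely local look at the text
around P is not enough.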

For the case of Javascript I believe that you can come up with an
algorithm which will reliably give the right answer while almost never
having to go back all the way to `point-min`.  I even believe it's
possible to write a tool that will automatically find that algorithm
given a suitable input grammar.  But for some languages like Elisp,
Python, and OCaml I believe it's simply impossible (for Elisp/Python
it's because of the existence of multiline strings (with no "trailing \"
to indicate their possible presence) and for OCaml it's because of the
nested comments).
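
The nested-comment case can be sketched like this, in Python and under
a simplifying assumption: only `(*` and `*)` are considered (real OCaml
also lexes string literals inside comments).  The name
`ocaml_comment_depth` is hypothetical.

```python
def ocaml_comment_depth(text, pos):
    """Return the comment nesting depth at index POS (0 means code).

    Simplified sketch: tracks only "(*" and "*)", scanning from the start.
    """
    depth = 0
    i = 0
    while i < pos:
        two = text[i:i + 2]
        if two == "(*":
            depth += 1
            i += 2
        elif two == "*)" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return depth


nested = "(* a (* b *) c *) d"
no_opener = nested[2:]  # same text with the outermost "(*" removed

print(ocaml_comment_depth(nested, nested.index("c")))        # inside a comment
print(ocaml_comment_depth(no_opener, no_opener.index("c")))  # plain code
```

Seeing a `*)` just before "c" does not settle whether "c" is still in
a comment; that depends on an opener that can be arbitrarily far back.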

        Stefan

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:52                                     ` Alan Mackenzie
  2020-04-01 14:10                                       ` Eli Zaretskii
@ 2020-04-01 15:22                                       ` Dmitry Gutov
  2020-04-04 11:06                                         ` Alan Mackenzie
  1 sibling, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-01 15:22 UTC (permalink / raw)
  To: Alan Mackenzie, Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl

On 01.04.2020 16:52, Alan Mackenzie wrote:
> This doesn't seem to affect starting up performance that badly.  On my
> machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> of the first screenful of comments) is taking 0.18s.

Interesting. How do you measure it exactly? Do you kill the buffer 
between tries?

I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
-g3'", the build is from emacs-27 branch, recent revision.

With 'emacs -Q' it's a little faster, but still

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

prints out

   Elapsed time: 0.968598s (0.144805s in 8 GCs)



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:22                                       ` Dmitry Gutov
@ 2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
                                                             ` (2 more replies)
  0 siblings, 3 replies; 139+ messages in thread
From: Alan Mackenzie @ 2020-04-04 11:06 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl

Hello, Dmitry.

On Wed, Apr 01, 2020 at 18:22:00 +0300, Dmitry Gutov wrote:
> On 01.04.2020 16:52, Alan Mackenzie wrote:
> > This doesn't seem to affect starting up performance that badly.  On my
> > machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification
> > of the first screenful of comments) is taking 0.18s.

> Interesting. How do you measure it exactly? Do you kill the buffer 
> between tries?

Using my macro time-it, I did:

(time-it (find-file "..../src/xdisp.c") (sit-for 0))

.  I think this was without the file yet being in the OS's file cache.
Mind you, I have an nvme SSD.

> I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
> -g3'", the build is from emacs-27 branch, recent revision.

That's a debugging build, isn't it?  That probably explains the
difference.

> With 'emacs -Q' it's a little faster, but still

>    (benchmark 1 '(progn (find-file "src/xdisp.c")))

> prints out

>    Elapsed time: 0.968598s (0.144805s in 8 GCs)

Is that also measuring the time for redisplay?

-- 
Alan Mackenzie (Nuremberg, Germany).



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
@ 2020-04-04 11:26                                           ` Eli Zaretskii
  2020-04-04 14:14                                             ` Andrea Corallo
  2020-04-04 11:27                                           ` Eli Zaretskii
  2020-04-04 12:01                                           ` Dmitry Gutov
  2 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 11:26 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: casouri, akrl, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 11:06:43 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com,
>   monnier@iro.umontreal.ca, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
> > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
> > -g3'", the build is from emacs-27 branch, recent revision.
> 
> That's a debugging build, isn't it?

No, it's an optimized build, just not with -O2.  -Og is similar to -O1,
so slightly less optimized than -O2.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:26                                           ` Eli Zaretskii
@ 2020-04-04 14:14                                             ` Andrea Corallo
  2020-04-04 14:41                                               ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Andrea Corallo @ 2020-04-04 14:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, casouri, emacs-devel, monnier, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 4 Apr 2020 11:06:43 +0000
>> Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com,
>>   monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> From: Alan Mackenzie <acm@muc.de>
>> 
>> > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), 
>> > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og 
>> > -g3'", the build is from emacs-27 branch, recent revision.
>> 
>> That's a debugging build, isn't it?
>
> No, it's an optimized build, just not with -O2.  -Og is similar to -O1,
> so slightly less optimized than -O2.

Be careful: -Og produces considerably slower code than -O2.  For
instance, if I'm not wrong, it completely disables inlining, which is
one of the most rewarding optimizations.

Andrea

-- 
akrl@sdf.org



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 14:14                                             ` Andrea Corallo
@ 2020-04-04 14:41                                               ` Eli Zaretskii
  2020-04-04 15:04                                                 ` Andrea Corallo
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 14:41 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: acm, casouri, emacs-devel, monnier, dgutov

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com,
>         monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 04 Apr 2020 14:14:45 +0000
> 
> Be careful: -Og produces considerably slower code than -O2.  For
> instance, if I'm not wrong, it completely disables inlining, which is
> one of the most rewarding optimizations.

Yes, I know.  But the difference in performance between -Og and -O2
cannot be 8- or 9-fold, it should be somewhere around 50% to 70%.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 14:41                                               ` Eli Zaretskii
@ 2020-04-04 15:04                                                 ` Andrea Corallo
  2020-04-04 15:38                                                   ` Richard Copley
  0 siblings, 1 reply; 139+ messages in thread
From: Andrea Corallo @ 2020-04-04 15:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, dgutov, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com,
>>         monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Sat, 04 Apr 2020 14:14:45 +0000
>> 
>> Be careful: -Og produces considerably slower code than -O2.  For
>> instance, if I'm not wrong, it completely disables inlining, which is
>> one of the most rewarding optimizations.
>
> Yes, I know.  But the difference in performance between -Og and -O2
> cannot be 8- or 9-fold, it should be somewhere around 50% to 70%.

Mmmh, I agree with you, one order of magnitude sounds a bit too much,
even if we have a ton of small getters/setters that are usually inlined.

-- 
akrl@sdf.org



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 15:04                                                 ` Andrea Corallo
@ 2020-04-04 15:38                                                   ` Richard Copley
  0 siblings, 0 replies; 139+ messages in thread
From: Richard Copley @ 2020-04-04 15:38 UTC (permalink / raw)
  To: Emacs Development
  Cc: Alan Mackenzie, Eli Zaretskii, Dmitry Gutov, Andrea Corallo

Here, an -Og build takes about 2.5 times as long as an -O2 build to
execute either of the two benchmarks. That's a relative decrease of
60% in elapsed time, for -O2 relative to -Og.

I built Emacs in 4 separate clean worktrees of the master branch
(f71afd600a). The build commands were identical except for the
optimization flag. For each test I (twice) started "emacs -Q" and did
either [1] or [2]:

[1] M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))
[2] M-: (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))

The elapsed time reported was:

without sit-for:
-O0: 1.027754s, 1.031642s
-Og: 1.295515s, 1.277441s
-O1: 0.629743s, 0.629870s
-O2: 0.513139s, 0.511230s

with sit-for:
-O0: 1.079090s, 1.068118s
-Og: 1.347256s, 1.337780s
-O1: 0.661679s, 0.664470s
-O2: 0.533649s, 0.533949s

(My only comment on the fact that -Og appears to be about 20% or 25%
worse than -O0 is that it's not a typo.)
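
The arithmetic behind those percentages can be checked with a few lines
of Python.  This is only a sketch that averages the two "without
sit-for" runs listed above; the variable names are made up.

```python
# Elapsed times in seconds, copied from the "without sit-for" runs above.
timings = {
    "-O0": (1.027754, 1.031642),
    "-Og": (1.295515, 1.277441),
    "-O1": (0.629743, 0.629870),
    "-O2": (0.513139, 0.511230),
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


avg = {flag: mean(runs) for flag, runs in timings.items()}

og_over_o2 = avg["-Og"] / avg["-O2"]       # how many times slower -Og is
o2_decrease = 1 - avg["-O2"] / avg["-Og"]  # relative time saved by -O2
og_over_o0 = avg["-Og"] / avg["-O0"]       # -Og compared with -O0

print(f"-Og / -O2: {og_over_o2:.2f}x")
print(f"-O2 vs -Og: {o2_decrease:.0%} less elapsed time")
print(f"-Og / -O0: {og_over_o0:.2f}x")
```

With these inputs the ratios come out at roughly 2.5x, 60% and 1.25x,
matching the figures quoted in the text.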



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
@ 2020-04-04 11:27                                           ` Eli Zaretskii
  2020-04-04 12:01                                           ` Dmitry Gutov
  2 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 11:27 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 11:06:43 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com,
>  monnier@iro.umontreal.ca, akrl@sdf.org
> 
> >    (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
> > prints out
> 
> >    Elapsed time: 0.968598s (0.144805s in 8 GCs)
> 
> Is that also measuring the time for redisplay?

No, redisplay runs after the function exits.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 11:06                                         ` Alan Mackenzie
  2020-04-04 11:26                                           ` Eli Zaretskii
  2020-04-04 11:27                                           ` Eli Zaretskii
@ 2020-04-04 12:01                                           ` Dmitry Gutov
  2020-04-04 12:36                                             ` Alan Mackenzie
  2 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 12:01 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel

Hi Alan,

On 04.04.2020 14:06, Alan Mackenzie wrote:

>> Interesting. How do you measure it exactly? Do you kill the buffer
>> between tries?
> 
> Using my macro time-it, I did:
> 
> (time-it (find-file "..../src/xdisp.c") (sit-for 0))

It might be valuable if you evaluated exactly the same form I did. And 
made sure that the buffer is not visited in advance. And did that in an 
'emacs -Q' session.

> .  I think this was without the file yet being in the OS's file cache.
> Mind you, I have an nvme SSD.

I do as well. I have a fast laptop, pretty sure it's faster than what 
90% of our users have. My single-threaded performance must be better 
than yours for sure.

>> I have a fast Intel CPU that is barely 2 years old (i9-8950HK),
>> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og
>> -g3'", the build is from emacs-27 branch, recent revision.
> 
> That's a debugging build, isn't it?  That probably explains the
> difference.

Debugging-ish. It hardly explains the 4.5x difference. So we're probably 
measuring different things.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:01                                           ` Dmitry Gutov
@ 2020-04-04 12:36                                             ` Alan Mackenzie
  2020-04-04 12:40                                               ` Dmitry Gutov
  2020-04-04 13:02                                               ` Eli Zaretskii
  0 siblings, 2 replies; 139+ messages in thread
From: Alan Mackenzie @ 2020-04-04 12:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel

Hello, Dmitry.

On Sat, Apr 04, 2020 at 15:01:23 +0300, Dmitry Gutov wrote:
> On 04.04.2020 14:06, Alan Mackenzie wrote:

> >> Interesting. How do you measure it exactly? Do you kill the buffer
> >> between tries?

> > Using my macro time-it, I did:

> > (time-it (find-file "..../src/xdisp.c") (sit-for 0))

> It might be valuable if you evaluated exactly the same form I did. And 
> made sure that the buffer is not visited in advance. And did that in an 
> 'emacs -Q' session.

Fair point:

    M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))

    "Elapsed time: 1.249904s (0.165570s in 7 GCs)"

, in a build with the CFLAGS and gtk toolkit like you said.  That's in
agreement with your timing, given my slightly slower machine.


> > .  I think this was without the file yet being in the OS's file cache.
> > Mind you, I have an nvme SSD.

> I do as well. I have a fast laptop, pretty sure it's faster than what 
> 90% of our users have. My single-threaded performance must be better 
> than yours for sure.

> >> I have a fast Intel CPU that is barely 2 years old (i9-8950HK),
> >> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og
> >> -g3'", the build is from emacs-27 branch, recent revision.

> > That's a debugging build, isn't it?  That probably explains the
> > difference.

> Debugging-ish. It hardly explains the 4.5x difference. So we're probably 
> measuring different things.

I think it does explain the difference.  I repeated my previous timing,
which was 0.18s on an optimised build, and it came out at 1.16s.  That's
a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

-- 
Alan Mackenzie (Nuremberg, Germany).



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:36                                             ` Alan Mackenzie
@ 2020-04-04 12:40                                               ` Dmitry Gutov
  2020-04-04 13:02                                               ` Eli Zaretskii
  1 sibling, 0 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 12:40 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl

On 04.04.2020 15:36, Alan Mackenzie wrote:
> I think it does explain the difference.  I repeated my previous timing,
> which was 0.18s on an optimised build, and it came out at 1.16s.  That's
> a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

Hmm. Very good, thank you.

(I am just now in the process of rebuilding Emacs with full
optimizations; will report if the result is still starkly different
from yours for some reason.)



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 12:36                                             ` Alan Mackenzie
  2020-04-04 12:40                                               ` Dmitry Gutov
@ 2020-04-04 13:02                                               ` Eli Zaretskii
  2020-04-04 16:09                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 13:02 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov

> Date: Sat, 4 Apr 2020 12:36:13 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com,
>   monnier@iro.umontreal.ca, akrl@sdf.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > (time-it (find-file "..../src/xdisp.c") (sit-for 0))
> 
> > It might be valuable if you evaluated exactly the same form I did. And 
> > made sure that the buffer is not visited in advance. And did that in an 
> > 'emacs -Q' session.
> 
> Fair point:
> 
>     M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
>     "Elapsed time: 1.249904s (0.165570s in 7 GCs)"
> 
> , in a build with the CFLAGS and gtk toolkit like you said.  That's in
> agreement with your timing, given my slightly slower machine.

I don't believe these results.  It's nigh impossible for a -O2
optimized program to be 5 times faster than a -Og optimized.  And
benchmark.el doesn't seem to be so different from time-it, modulo the
function call.  Moreover, Alan's method does time redisplay, whereas
Dmitry's method does not.

So there's some other factor at work here that explains the
difference.

> I think it does explain the difference.  I repeated my previous timing,
> which was 0.18s on an optimised build, and it came out at 1.16s.  That's
> a factor of 6 different.  CFLAGS='-Og -g3' is a slow build.

It cannot be that slow.  Especially since some I/O is involved, and
you also measure redisplay.  More detailed data would be necessary to
explain the difference.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 13:02                                               ` Eli Zaretskii
@ 2020-04-04 16:09                                                 ` Dmitry Gutov
  2020-04-04 16:38                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 16:09 UTC (permalink / raw)
  To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, akrl, monnier, emacs-devel

On 04.04.2020 16:02, Eli Zaretskii wrote:
> I don't believe these results.  It's nigh impossible for a -O2
> optimized program to be 5 times faster than a -Og optimized.  And
> benchmark.el doesn't seem to be so different from time-it, modulo the
> function call.  Moreover, Alan's method does time redisplay, whereas
> Dmitry's method does not.

Unfortunately I can confirm the difference.

When Emacs is recompiled with the default optimizations,

   (benchmark 1 '(progn (find-file "src/xdisp.c")))

reports ~0.13s when FS cache is warm (compared to ~0.78 with the most 
recent -Og build here).

And

   (benchmark 1 '(progn (find-file "src/xdisp.c")
                        (goto-char (point-max))
                        (sit-for 0)))

reports ~0.29s.

Maybe CC Mode exercises some primitives that are hit especially hard by 
the lack of optimization.

Emacs looks snappier overall (e.g. during startup, loading my custom 
configuration with all its packages), but probably within the bounds of 
50-70% difference.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:09                                                 ` Dmitry Gutov
@ 2020-04-04 16:38                                                   ` Eli Zaretskii
  2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:29                                                     ` Dmitry Gutov
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 16:38 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 19:09:58 +0300
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> When Emacs is recompiled with the default optimizations,
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")))
> 
> reports ~0.13s when FS cache is warm (compared to ~0.78 with the most 
> recent -Og build here).
> 
> And
> 
>    (benchmark 1 '(progn (find-file "src/xdisp.c")
>                         (goto-char (point-max))
>                         (sit-for 0)))
> 
> reports ~0.29s.

Is this with xdisp.c in a Git repository or outside of a Git
repository?



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:38                                                   ` Eli Zaretskii
@ 2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:36                                                       ` Dmitry Gutov
  2020-04-04 17:29                                                     ` Dmitry Gutov
  1 sibling, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 16:45 UTC (permalink / raw)
  To: dgutov, acm; +Cc: casouri, akrl, monnier, emacs-devel

> Date: Sat, 04 Apr 2020 19:38:18 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org,
>  monnier@iro.umontreal.ca, akrl@sdf.org
> 
> Is this with xdisp.c in a Git repository or outside of a Git
> repository?

Also, how many GCs did benchmark report, and how long did they take?  With
such short timings and running the test only once, the difference GC
could make might be significant, so if different runs and different
people here have different numbers of GC, we could be comparing apples
with oranges.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:45                                                     ` Eli Zaretskii
@ 2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:50                                                         ` Eli Zaretskii
  2020-04-04 18:29                                                         ` Andrea Corallo
  2020-04-04 17:36                                                       ` Dmitry Gutov
  1 sibling, 2 replies; 139+ messages in thread
From: Richard Copley @ 2020-04-04 17:22 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Alan Mackenzie, Andrea Corallo, Emacs Development, Dmitry Gutov

On Sat, 4 Apr 2020 at 17:46, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > Date: Sat, 04 Apr 2020 19:38:18 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org,
> >  monnier@iro.umontreal.ca, akrl@sdf.org
> >
> > Is this with xdisp.c in a Git repository or outside of a Git
> > repository?
>
> Also, how many GCs did benchmark report, and how long did they take?  With
> such short timings and running the test only once, the difference GC
> could make might be significant, so if different runs and different
> people here have different numbers of GC, we could be comparing apples
> with oranges.

For my earlier results, I ran the -Og benchmark in the git
repository (with .git a directory) and the other three in git
worktrees (with .git a regular file). I have repeated my tests for the
-Og case in a git worktree, to match the other three. It didn't make a
significant difference. I haven't tried it outside of git.

Amended results below, including time in GC, for two runs each in
separate instances of "emacs -Q". In all 16 cases there were 8 GCs.

with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
-Og 1.340039s (0.149663s), 1.350613s (0.149954s)
-O2 0.533649s (0.046995s), 0.533949s (0.046714s)
-O1 0.661679s (0.055181s), 0.664470s (0.057050s)
-O0 1.079090s (0.168691s), 1.068118s (0.168451s)

without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
-Og 1.293845s (0.150200s), 1.305310s (0.149520s)
-O2 0.513139s (0.047117s), 0.511230s (0.047143s)
-O1 0.629743s (0.054738s), 0.629870s (0.056522s)
-O0 1.027754s (0.165569s), 1.031642s (0.168891s)
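
As a rough check on whether GC explains the gap, the GC time can be
subtracted from the elapsed time before comparing builds.  The sketch
below is Python and uses only the first "without sit-for" run of the
-Og and -O2 builds from the table above; the names are made up.

```python
# (elapsed seconds, seconds spent in GC) for the first run of each build,
# copied from the "without sit-for" table above.
runs = {
    "-Og": (1.293845, 0.150200),
    "-O2": (0.513139, 0.047117),
}

# Time spent outside the garbage collector.
non_gc = {flag: elapsed - gc for flag, (elapsed, gc) in runs.items()}

raw_ratio = runs["-Og"][0] / runs["-O2"][0]
non_gc_ratio = non_gc["-Og"] / non_gc["-O2"]
gc_share = {flag: gc / elapsed for flag, (elapsed, gc) in runs.items()}

print(f"raw -Og / -O2:    {raw_ratio:.2f}x")
print(f"non-GC -Og / -O2: {non_gc_ratio:.2f}x")
for flag, share in gc_share.items():
    print(f"{flag}: {share:.0%} of elapsed time in GC")
```

On these numbers GC accounts for roughly a tenth of the elapsed time in
both builds, and the ratio stays near 2.5x with GC time removed.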



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:22                                                       ` Richard Copley
@ 2020-04-04 17:50                                                         ` Eli Zaretskii
  2020-04-04 18:29                                                         ` Andrea Corallo
  1 sibling, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:50 UTC (permalink / raw)
  To: Richard Copley; +Cc: acm, emacs-devel, dgutov, akrl

> From: Richard Copley <rcopley@gmail.com>
> Date: Sat, 4 Apr 2020 18:22:34 +0100
> Cc: Alan Mackenzie <acm@muc.de>, Andrea Corallo <akrl@sdf.org>,
>  Emacs Development <emacs-devel@gnu.org>, Dmitry Gutov <dgutov@yandex.ru>
> 
> For my earlier results, I ran the -Og benchmark in the git
> repository (with .git a directory) and the other three in git
> worktrees (with .git a regular file). I have repeated my tests for the
> -Og case in a git worktree, to match the other three. It didn't make a
> significant difference. I haven't tried it outside of git.
> 
> Amended results below, including time in GC, for two runs each in
> separate instances of "emacs -Q". In all 16 cases there were 8 GCs.
> 
> with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
> -Og 1.340039s (0.149663s), 1.350613s (0.149954s)
> -O2 0.533649s (0.046995s), 0.533949s (0.046714s)
> -O1 0.661679s (0.055181s), 0.664470s (0.057050s)
> -O0 1.079090s (0.168691s), 1.068118s (0.168451s)
> 
> without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
> -Og 1.293845s (0.150200s), 1.305310s (0.149520s)
> -O2 0.513139s (0.047117s), 0.511230s (0.047143s)
> -O1 0.629743s (0.054738s), 0.629870s (0.056522s)
> -O0 1.027754s (0.165569s), 1.031642s (0.168891s)

Thanks.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:22                                                       ` Richard Copley
  2020-04-04 17:50                                                         ` Eli Zaretskii
@ 2020-04-04 18:29                                                         ` Andrea Corallo
  2020-04-04 18:56                                                           ` Richard Copley
  1 sibling, 1 reply; 139+ messages in thread
From: Andrea Corallo @ 2020-04-04 18:29 UTC (permalink / raw)
  To: Richard Copley
  Cc: Alan Mackenzie, Eli Zaretskii, Emacs Development, Dmitry Gutov

Richard Copley <rcopley@gmail.com> writes:

> For my earlier results, I ran the -Og benchmark in the git
> repository (with .git a directory) and the other three in git
> worktrees (with .git a regular file). I have repeated my tests for the
> -Og case in a git worktree, to match the other three. It didn't make a
> significant difference. I haven't tried it outside of git.
>
> Amended results below, including time in GC, for two runs each in
> separate instances of "emacs -Q". In all 16 cases there were 8 GCs.
>
> with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
> -Og 1.340039s (0.149663s), 1.350613s (0.149954s)
> -O2 0.533649s (0.046995s), 0.533949s (0.046714s)
> -O1 0.661679s (0.055181s), 0.664470s (0.057050s)
> -O0 1.079090s (0.168691s), 1.068118s (0.168451s)
>
> without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
> -Og 1.293845s (0.150200s), 1.305310s (0.149520s)
> -O2 0.513139s (0.047117s), 0.511230s (0.047143s)
> -O1 0.629743s (0.054738s), 0.629870s (0.056522s)
> -O0 1.027754s (0.165569s), 1.031642s (0.168891s)

The fact that -Og is slower than -O0 is very sad but also interesting.

Which (I guess) GCC version are you on?

Generally speaking I suspect -Og is not very much tested, especially
performance wise.

  Andrea

--
akrl@sdf.org



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:29                                                         ` Andrea Corallo
@ 2020-04-04 18:56                                                           ` Richard Copley
  2020-04-04 20:36                                                             ` Andrea Corallo
  0 siblings, 1 reply; 139+ messages in thread
From: Richard Copley @ 2020-04-04 18:56 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Emacs Development

On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:

> The fact that -Og is slower than -O0 is very sad but also interesting.

Yeah. Among its other selling points, it should give "a reasonable
level of optimization" [1].

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og

> Which (I guess) GCC version are you on?

GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.


> Generally speaking I suspect -Og is not very much tested, especially
> performance wise.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:56                                                           ` Richard Copley
@ 2020-04-04 20:36                                                             ` Andrea Corallo
  0 siblings, 0 replies; 139+ messages in thread
From: Andrea Corallo @ 2020-04-04 20:36 UTC (permalink / raw)
  To: Richard Copley; +Cc: Emacs Development

Richard Copley <rcopley@gmail.com> writes:

> On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:
>
>> The fact that -Og is slower than -O0 is very sad but also interesting.
>
> Yeah. Among its other selling points, it should give "a reasonable
> level of optimization" [1].

Yep, it does not make much sense to be honest.  Just the fact that you do
not spill and fill every automatic variable on the stack all the time should
give a measurable improvement.  There must be some macroscopic reason we
are missing.

> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og
>
>> Which (I guess) GCC version are you on?
>
> GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.
>
>
>> Generally speaking I suspect -Og is not very much tested, especially
>> performance wise.
>

-- 
akrl@sdf.org



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:45                                                     ` Eli Zaretskii
  2020-04-04 17:22                                                       ` Richard Copley
@ 2020-04-04 17:36                                                       ` Dmitry Gutov
  2020-04-04 17:47                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:36 UTC (permalink / raw)
  To: Eli Zaretskii, acm; +Cc: casouri, akrl, monnier, emacs-devel

On 04.04.2020 19:45, Eli Zaretskii wrote:
> Also, how many GC's and the time they took did benchmark report?

I showed such outputs before.

Now, with an -Og build, here are outputs of several consecutive runs:

Elapsed time: 0.912808s (0.125516s in 7 GCs)
Elapsed time: 0.772653s (0.077285s in 4 GCs)
Elapsed time: 0.769371s (0.076361s in 4 GCs)
Elapsed time: 0.776261s (0.077395s in 4 GCs)

(The first one right after Emacs was started).

> With
> such short timings and running the test only once,

I always run it several times, discarding the first result because the 
FS cache is likely cold that iteration. The buffer is killed between 
runs, of course.

> the difference GC
> could make might be significant, so if different runs and different
> people here have different numbers of GC, we could be comparing apples
> with oranges.

In an optimized build, it's always < 0.2s here. And I gave an average 
number. It's not my first time benchmarking either.
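
A minimal sketch of the measurement procedure described above (several
runs, the buffer killed between runs, the first result dropped as a
likely cold-cache outlier).  It assumes only the stock `benchmark-run'
macro, which returns a list of elapsed time, number of GCs and time
spent in GC.  The helper name `my/bench-find-file' and its arguments
are hypothetical and do not come from this thread:

```elisp
(require 'benchmark)

(defun my/bench-find-file (file runs)
  "Visit FILE RUNS times, killing its buffer after each visit.
Return the list of `benchmark-run' results, each of the form
\(ELAPSED GC-COUNT GC-ELAPSED), with the first result dropped.
Hypothetical helper, for illustration only."
  (let (results)
    (dotimes (_ runs)
      ;; Time one visit of FILE, including a forced redisplay.
      (let ((res (benchmark-run 1
                   (find-file file)
                   (sit-for 0))))
        ;; Kill the buffer so the next run has to visit the file again.
        (let ((buf (get-file-buffer file)))
          (when buf (kill-buffer buf)))
        (push res results)))
    ;; Restore chronological order, then discard the first run.
    (cdr (nreverse results))))
```

For example, (my/bench-find-file "src/xdisp.c" 4) would return three
results, one per run after the discarded first one.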



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:36                                                       ` Dmitry Gutov
@ 2020-04-04 17:47                                                         ` Eli Zaretskii
  2020-04-04 18:02                                                           ` Dmitry Gutov
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:36:03 +0300
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> 
> Now, with an -Og build, here are outputs of several consecutive runs:
> 
> Elapsed time: 0.912808s (0.125516s in 7 GCs)
> Elapsed time: 0.772653s (0.077285s in 4 GCs)
> Elapsed time: 0.769371s (0.076361s in 4 GCs)
> Elapsed time: 0.776261s (0.077395s in 4 GCs)
> [...]
> In an optimized build, it's always < 0.2s here.

So we are looking at -O2 being about 3 to 5 times faster than -Og,
right?  That's a speedup that is more than I'd expect, but still
nowhere near an order of magnitude that Alan's timings seemed to show.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:47                                                         ` Eli Zaretskii
@ 2020-04-04 18:02                                                           ` Dmitry Gutov
  2020-04-04 23:01                                                             ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 18:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:47, Eli Zaretskii wrote:
>> Elapsed time: 0.912808s (0.125516s in 7 GCs)
>> Elapsed time: 0.772653s (0.077285s in 4 GCs)
>> Elapsed time: 0.769371s (0.076361s in 4 GCs)
>> Elapsed time: 0.776261s (0.077395s in 4 GCs)
>> [...]
>> In an optimized build, it's always < 0.2s here.
> So we are looking at -O2 being about 3 to 5 times faster than -Og,
> right?  That's a speedup that is more than I'd expect, but still
> nowhere near an order of magnitude that Alan's timings seemed to show.

0.76 / 0.13 ~= 5.86

Alan's difference is bigger, but not by much:

1.24 / 0.18 ~= 6.88
1.18 (from another email) / 0.18 ~= 6.55

Which probably makes sense given different CPU architectures.
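
The ratios above can be recomputed with a few lines of Lisp.  This is
only a sketch of the arithmetic: the helper name `my/slowdown' is
hypothetical, and the inputs are the rounded timings quoted in this
message, so the last digit of each ratio may differ slightly from the
figures given above:

```elisp
(defun my/slowdown (slow fast)
  "Return how many times longer SLOW is than FAST (both in seconds)."
  (/ (float slow) fast))

(dolist (pair '((0.76 . 0.13)    ; -Og vs. optimized build, this machine
                (1.24 . 0.18)    ; Alan's timings
                (1.18 . 0.18)))  ; Alan's timings, from another email
  (message "%.2fs / %.2fs = %.2fx"
           (car pair) (cdr pair)
           (my/slowdown (car pair) (cdr pair))))
```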



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 18:02                                                           ` Dmitry Gutov
@ 2020-04-04 23:01                                                             ` Stefan Monnier
  2020-04-06 14:25                                                               ` Yuan Fu
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-04-04 23:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, Eli Zaretskii, emacs-devel, casouri, akrl

> 0.76 / 0.13 ~= 5.86
>
> Alan's difference is bigger, but not by much:
>
> 1.24 / 0.18 ~= 6.88
> 1.18 (from another email) / 0.18 ~= 6.55

That does remind me that I've had the impression "lately" that debug
builds are much slower than they used to be.  I suspect (for no reason
other than lack of imagination on my part) this is linked to the changes
from macros to inlinable functions.  When Paul started doing that we
tried to keep some "important" macros as macros (depending on
DEFINE_KEY_OPS_AS_MACROS) to keep the performance impact under control.
Maybe something changed in this respect (maybe we should add a few more
fallback-macros into the set of functions affected by
DEFINE_KEY_OPS_AS_MACROS, or maybe something prevents
DEFINE_KEY_OPS_AS_MACROS from doing its job, or ...)?

        Stefan

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 23:01                                                             ` Stefan Monnier
@ 2020-04-06 14:25                                                               ` Yuan Fu
  2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
  0 siblings, 1 reply; 139+ messages in thread
From: Yuan Fu @ 2020-04-06 14:25 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: acm, Eli Zaretskii, Andrea Corallo, emacs-devel, Dmitry Gutov

Seems the discussion has stalled, may I ask what’s the conclusion so far? (w.r.t. whole buffer parse & how to pass text to tree-sitter.) 

Yuan


^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-06 14:25                                                               ` Yuan Fu
@ 2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
  0 siblings, 0 replies; 139+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-06 19:55 UTC (permalink / raw)
  To: emacs-devel



On Monday, 6 April 2020 at 08:25, Yuan Fu wrote:

> Seems the discussion has stalled, may I ask what’s the conclusion so far? (w.r.t. whole buffer parse & how to pass text to tree-sitter.) 
>
> Yuan

Whole-buffer pass, and using after-change-functions for incremental parsing, AFAIK. Tweak whatever needs to be tweaked or change what needs an adjustment, rinse and repeat.
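
As a rough sketch of the after-change-functions half of that summary,
the code below records each edit so that an incremental parser could
later be told what changed.  Only the standard hook and its
(BEG END OLD-LEN) calling convention are real; the names
`my/pending-edits', `my/record-edit' and `my/track-edits-mode' are
hypothetical, and no tree-sitter call is made here:

```elisp
(defvar-local my/pending-edits nil
  "Edits recorded since the last reparse, newest first.
Each element is (BEG END OLD-LEN), as passed to `after-change-functions'.")

(defun my/record-edit (beg end old-len)
  "Record one buffer change for a later incremental reparse."
  (push (list beg end old-len) my/pending-edits))

(define-minor-mode my/track-edits-mode
  "Hypothetical minor mode that records edits for an incremental parser."
  :lighter nil
  (if my/track-edits-mode
      ;; The final t makes the hook buffer-local.
      (add-hook 'after-change-functions #'my/record-edit nil t)
    (remove-hook 'after-change-functions #'my/record-edit t)))
```

A real integration would presumably drain `my/pending-edits' before
each reparse and hand the edits to the parser; how reliably the hook
reports every change is the question this sub-thread is about.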



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 16:38                                                   ` Eli Zaretskii
  2020-04-04 16:45                                                     ` Eli Zaretskii
@ 2020-04-04 17:29                                                     ` Dmitry Gutov
  2020-04-04 17:38                                                       ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 19:38, Eli Zaretskii wrote:
> Is this with xdisp.c in a Git repository or outside of a Git
> repository?

Inside, always.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:29                                                     ` Dmitry Gutov
@ 2020-04-04 17:38                                                       ` Eli Zaretskii
  2020-04-04 17:57                                                         ` Dmitry Gutov
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04 17:38 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: acm@muc.de, casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:29:46 +0300
> 
> On 04.04.2020 19:38, Eli Zaretskii wrote:
> > Is this with xdisp.c in a Git repository or outside of a Git
> > repository?
> 
> Inside, always.

In which case invoking Git (and all the machinery that runs a
sub-process) is another factor.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04 17:38                                                       ` Eli Zaretskii
@ 2020-04-04 17:57                                                         ` Dmitry Gutov
  0 siblings, 0 replies; 139+ messages in thread
From: Dmitry Gutov @ 2020-04-04 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:38, Eli Zaretskii wrote:
> In which case invoking Git (and all the machinery that runs a
> sub-process) is another factor.

See my older message about using js-mode with xdisp.c in an -Og build. 
It was 0.06s or so.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 13:14               ` Eli Zaretskii
  2020-03-31 14:31                 ` Dmitry Gutov
  2020-03-31 15:11                 ` Stefan Monnier
@ 2020-03-31 16:13                 ` Alan Third
  2020-03-31 17:55                   ` Eli Zaretskii
  2 siblings, 1 reply; 139+ messages in thread
From: Alan Third @ 2020-03-31 16:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel, Stefan Monnier, akrl

On Tue, Mar 31, 2020 at 04:14:16PM +0300, Eli Zaretskii wrote:
> 
> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.  We should instead request that tree-sitter
> exposes an API through which we could give it direct access to buffer
> text as 2 parts, before and after the gap, like we do with regex
> code.  Otherwise this will be a bottleneck in the long run, not unlike
> the problem we have with LSP.

I'm not sure if this is exactly what you're talking about, but it has
an API for letting it access your own data structure:

https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code

-- 
Alan Third



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 16:13                 ` Alan Third
@ 2020-03-31 17:55                   ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:55 UTC (permalink / raw)
  To: Alan Third; +Cc: casouri, emacs-devel, monnier, akrl

> Date: Tue, 31 Mar 2020 18:13:15 +0200 (CEST)
> From: Alan Third <alan@idiocy.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com,
> 	akrl@sdf.org, emacs-devel@gnu.org
> 
> I'm not sure if this is exactly what you're talking about, but it has
> an API for letting it access your own data structure:
> 
> https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code

Yes, I've read their docs.  It isn't optimal for us, although it will
do for initial experiments.  But for production I think we need
something more efficient.  One of the problems we need to solve is how
to avoid the costly encoding of buffer text, and still be able to
support the occasional raw bytes we sometimes have in our buffers.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-29 19:18   ` Eli Zaretskii
  2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
@ 2020-03-30  3:35     ` Stefan Monnier
  2020-03-30  6:02       ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30  3:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Andrea Corallo

>> > Maybe those grammars could be compiled to some other representation (I
>> > don't know if it is made mostly of data-tables or actual code or what)?
>> IMO ideally should be lisp and we should leverage the native compiler
>> for that, but I understand we are not there.
> FWIW, it should indeed be possible to develop the grammars in Lisp,
> but that is not the first goal in bringing such a package to Emacs.

I'm not interested in changing the way grammars are *written*.
I'm proposing investigating if the tree-sitter run-time library can be
made to read an OS-and-architecture-neutral representation of
the grammar.


        Stefan




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
@ 2020-03-30  6:02       ` Eli Zaretskii
  2020-03-30 13:33         ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-30  6:02 UTC (permalink / raw)
  To: emacs-devel, Stefan Monnier; +Cc: Andrea Corallo

On March 30, 2020 6:35:08 AM GMT+03:00, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> > Maybe those grammars could be compiled to some other representation (I
> >> > don't know if it is made mostly of data-tables or actual code or what)?
> >> IMO ideally should be lisp and we should leverage the native compiler
> >> for that, but I understand we are not there.
> > FWIW, it should indeed be possible to develop the grammars in Lisp,
> > but that is not the first goal in bringing such a package to Emacs.
> 
> I'm not interested in changing the way grammars are *written*.
> I'm proposing investigating if the tree-sitter run-time library can be
> made to read an OS-and-architecture-neutral representation of
> the grammar.

What is "OS-and-architecture-neutral representation of the grammar" and how is it different from what tree-sitter uses now?



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30  6:02       ` Eli Zaretskii
@ 2020-03-30 13:33         ` Stefan Monnier
  2020-03-30 14:09           ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 13:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Andrea Corallo, emacs-devel

> What is "OS-and-architecture-neutral representation of the grammar" and how
> it is different from what tree-sitter uses now?

I don't know, that's part of the question (well, I know what I mean by
an "OS-and-architecture-neutral representation", of course,
but I believe you also understand this concept).


        Stefan




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 13:33         ` Stefan Monnier
@ 2020-03-30 14:09           ` Eli Zaretskii
  2020-03-30 15:03             ` Stefan Monnier
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-30 14:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: akrl, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  Andrea Corallo <akrl@sdf.org>
> Date: Mon, 30 Mar 2020 09:33:37 -0400
> 
> > What is "OS-and-architecture-neutral representation of the grammar" and how
> > it is different from what tree-sitter uses now?
> 
> I don't know, that's part of the question (well, I know what I mean by
> an "OS-and-architecture-neutral representation", of course,
> but I believe you also understand this concept).

Actually, no, I don't.  It was a serious question, I didn't understand
what grammar representation you had in mind.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 14:09           ` Eli Zaretskii
@ 2020-03-30 15:03             ` Stefan Monnier
  2020-04-01  0:39               ` Stephen Leake
  0 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-03-30 15:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: akrl, emacs-devel

>> > What is "OS-and-architecture-neutral representation of the grammar" and how
>> > it is different from what tree-sitter uses now?
>> 
>> I don't know, that's part of the question (well, I know what I mean by
>> an "OS-and-architecture-neutral representation", of course,
>> but I believe you also understand this concept).
>
> Actually, no, I don't.  It was a serious question, I didn't understand
> what grammar representation you had in mind.

I don't have any in mind.  It just needs to be
OS-and-architecture-neutral (otherwise it requires either distribution
of pre-compiled versions (with the logistical problem of covering all
possible OSes and architectures), or it requires a compiler on the
end-user machine).


        Stefan




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3)
  2020-03-30 15:03             ` Stefan Monnier
@ 2020-04-01  0:39               ` Stephen Leake
  0 siblings, 0 replies; 139+ messages in thread
From: Stephen Leake @ 2020-04-01  0:39 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> > What is "OS-and-architecture-neutral representation of the grammar" and how
>>> > it is different from what tree-sitter uses now?
>>> 
>>> I don't know, that's part of the question (well, I know what I mean by
>>> an "OS-and-architecture-neutral representation", of course,
>>> but I believe you also understand this concept).
>>
>> Actually, no, I don't.  It was a serious question, I didn't understand
>> what grammar representation you had in mind.
>
> I don't have any in mind.  It just needs to be
> OS-and-architecture-neutral (otherwise it requires either distribution
> of pre-compiled versions (with the logistical problem of covering all
> possible OSes and architectures), or it requires a compiler on the
> end-user machine).

At one extreme, the source code for the grammar is
OS-and-architecture-neutral. Tree-sitter compiles the source code to
binary (presumably in a linkable library). There may be some
intermediate representation of the grammar that would be useful in some
way, but I don't see how.

Normally, wisi compiles the grammar source to Ada code, then compiles
that to an executable.  wisi also provides a "text_rep" representation of
the LR parse table (almost-readable ASCII text), but that's an
implementation detail; the Ada compiler can't handle very large tables
when represented as compilable Ada source.

semantic compiles a grammar to elisp source, then byte-compiles that.

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
@ 2020-03-31 17:07 Tuấn Anh Nguyễn
  2020-03-31 17:50 ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Tuấn Anh Nguyễn @ 2020-03-31 17:07 UTC (permalink / raw)
  To: emacs-devel

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.  We should instead request that tree-sitter
> exposes an API through which we could give it direct access to buffer
> text as 2 parts, before and after the gap, like we do with regex
> code.  Otherwise this will be a bottleneck in the long run, not unlike
> the problem we have with LSP.

It does support parsing through direct access. Which is why I wanted
dynamic modules to have direct access to buffer text.

>> How large is "very large" here?
>
> xdisp.c comes to mind, obviously.

On my machine, a 3.39 GHz Intel Core i7:

    (0.150791 0 0.0) ; 1 full parse
    (2.142236 5 0.6105190000000107) ; 10 full parses
    (0.015423 0 0.0) ; incremental parsing, after typing 1 character

--
Tuấn-Anh Nguyễn
Software Engineer



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:07 Reliable after-change-functions (via: Using incremental parsing in Emacs) Tuấn Anh Nguyễn
@ 2020-03-31 17:50 ` Eli Zaretskii
  2020-04-01  6:17   ` Tuấn Anh Nguyễn
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-03-31 17:50 UTC (permalink / raw)
  To: Tuấn Anh Nguyễn; +Cc: emacs-devel

> From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> Date: Wed, 1 Apr 2020 00:07:27 +0700
> 
> > xdisp.c comes to mind, obviously.
> 
> On my machine, a 3.39 GHz Intel Core i7:
> 
>     (0.150791 0 0.0) ; 1 full parse

How did you submit xdisp.c to the parser?

In any case, IIUC, the first time a buffer needs to be displayed, we
need to wait for these 150 msec?  That's annoyingly long (and I
suspect in real Emacs usage will be significantly longer, due to
memory allocation, encoding, etc.).



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:50 ` Eli Zaretskii
@ 2020-04-01  6:17   ` Tuấn Anh Nguyễn
  2020-04-01 13:26     ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Tuấn Anh Nguyễn @ 2020-04-01  6:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Wed, Apr 1, 2020 at 12:50 AM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> > Date: Wed, 1 Apr 2020 00:07:27 +0700
> >
> > > xdisp.c comes to mind, obviously.
> >
> > On my machine, a 3.39 GHz Intel Core i7:
> >
> >     (0.150791 0 0.0) ; 1 full parse
>
> How did you submit xdisp.c to the parser?
>

    (with-current-buffer "xdisp.c"
      (let ((language (tree-sitter-require 'c))
            (parser (ts-make-parser)))
        (ts-set-language parser language)
        (garbage-collect)
        (message "%s" (benchmark-run (ts-parse parser #'ts-buffer-input nil)))))


> In any case, IIUC, the first time a buffer needs to be displayed, we
> need to wait for these 150 msec?  That's annoyingly long (and I
> suspect in real Emacs usage will be significantly longer, due to
> memory allocation, encoding, etc.).
>

Real usage with "xdisp.c":

    (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
      (message "%s" (benchmark-run (apply f args))))

    (0.257998 1 0.13326100000000096)

So yes, direct access to buffer's text from dynamic modules would be nice.
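
For readers comparing these figures: each list printed by
`benchmark-run' is (ELAPSED GC-COUNT GC-ELAPSED).  A small sketch,
with the hypothetical helper name `my/net-of-gc', that subtracts the
GC time from the "real usage" result above:

```elisp
(defun my/net-of-gc (result)
  "Return the elapsed time in RESULT minus the time spent in GC.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED), as returned by
`benchmark-run'."
  (pcase-let ((`(,elapsed ,_gc-count ,gc-elapsed) result))
    (- elapsed gc-elapsed)))

;; The "real usage" measurement reported above.
(my/net-of-gc '(0.257998 1 0.13326100000000096))
```

With those numbers the result is roughly 0.125 s outside of GC, i.e.
about half of the 0.258 s total was spent in the single garbage
collection.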

--
Tuấn-Anh Nguyễn
Software Engineer

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01  6:17   ` Tuấn Anh Nguyễn
@ 2020-04-01 13:26     ` Eli Zaretskii
  2020-04-01 15:47       ` Jorge Javier Araya Navarro
  2020-04-01 17:55       ` Tuấn-Anh Nguyễn
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 13:26 UTC (permalink / raw)
  To: Tuấn Anh Nguyễn; +Cc: emacs-devel

> From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> Date: Wed, 1 Apr 2020 13:17:42 +0700
> Cc: emacs-devel@gnu.org
> 
> Real usage with "xdisp.c":
> 
>     (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
>       (message "%s" (benchmark-run (apply f args))))
> 
>     (0.257998 1 0.13326100000000096)

And that is even without encoding the buffer text, IIUC what the
package does.

> So yes, direct access to buffer's text from dynamic modules would be nice.

Did you consider using the API where an application can provide a
function to return text at a given offset?  Such a function could be
relatively easily implemented for Emacs.

Btw, what do you do with the tree returned by the tree-sitter parser?
Store it in some buffer-local variable?  If so, how much memory does
such a tree take, and when, if ever, is that memory released?
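
To make the "function to return text at a given offset" idea concrete,
here is a Lisp-level sketch.  It is an assumption-laden illustration,
not the API of tree-sitter or of any existing package: the name
`my/text-at' and the default chunk size are hypothetical, and it works
on character positions, whereas tree-sitter's own read callback is
given a byte offset.  It also still copies text into Lisp strings, so
it does not by itself avoid the marshalling cost discussed earlier in
this thread:

```elisp
(defun my/text-at (pos &optional chunk-size)
  "Return up to CHUNK-SIZE characters of buffer text starting at POS.
Return the empty string at or past the end of the buffer, which a
consumer can treat as end of input.  CHUNK-SIZE defaults to 4096.
Hypothetical callback, for illustration only."
  (save-restriction
    ;; Ignore narrowing so offsets always refer to the whole buffer.
    (widen)
    (let* ((beg (max (point-min) (min pos (point-max))))
           (end (min (point-max) (+ beg (or chunk-size 4096)))))
      (buffer-substring-no-properties beg end))))
```

A parser driving such a callback would call it repeatedly with
increasing positions until it receives the empty string.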

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:26     ` Eli Zaretskii
@ 2020-04-01 15:47       ` Jorge Javier Araya Navarro
  2020-04-01 16:07         ` Eli Zaretskii
  2020-04-01 17:55       ` Tuấn-Anh Nguyễn
  1 sibling, 1 reply; 139+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-01 15:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Tuấn Anh Nguyễn, emacs-devel

> Did you consider using the API where an application can provide a
> function to return text at a given offset?  Such a function could be
> relatively easily implemented for Emacs.

But why not just allow access to buffers for dynamic modules, otherwise
what would be the point of dynamic modules?

On Wed, 1 Apr 2020 at 07:26, Eli Zaretskii (eliz@gnu.org) wrote:

> > From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> > Date: Wed, 1 Apr 2020 13:17:42 +0700
> > Cc: emacs-devel@gnu.org
> >
> > Real usage with "xdisp.c":
> >
> >     (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
> >       (message "%s" (benchmark-run (apply f args))))
> >
> >     (0.257998 1 0.13326100000000096)
>
> And that is even without encoding the buffer text, IIUC what the
> package does.
>
> > So yes, direct access to buffer's text from dynamic modules would be nice.
>
> Did you consider using the API where an application can provide a
> function to return text at a given offset?  Such a function could be
> relatively easily implemented for Emacs.
>
> Btw, what do you do with the tree returned by the tree-sitter parser?
> store it in some buffer-local variable?  If so, how much memory does
> such a tree take, and when, if ever, is that memory released?
>
>

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 15:47       ` Jorge Javier Araya Navarro
@ 2020-04-01 16:07         ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 16:07 UTC (permalink / raw)
  To: Jorge Javier Araya Navarro; +Cc: ubolonton, emacs-devel

> From: Jorge Javier Araya Navarro <jorge@esavara.cr>
> Date: Wed, 1 Apr 2020 09:47:48 -0600
> Cc: Tuấn Anh Nguyễn <ubolonton@gmail.com>, 
> 	emacs-devel@gnu.org
> 
> > Did you consider using the API where an application can provide a function to return text at a given offset? 
> Such a function could be relatively easily implemented for Emacs.
> 
> But why not just allow access to buffers for dynamic modules, otherwise what would be the point of dynamic
> modules?

These two are orthogonal issues: if we allow such access from modules,
will this particular module use it, and if so, how?



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:26     ` Eli Zaretskii
  2020-04-01 15:47       ` Jorge Javier Araya Navarro
@ 2020-04-01 17:55       ` Tuấn-Anh Nguyễn
  2020-04-01 19:33         ` Eli Zaretskii
  1 sibling, 1 reply; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-01 17:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Wed, Apr 1, 2020 at 8:26 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> > Date: Wed, 1 Apr 2020 13:17:42 +0700
> > Cc: emacs-devel@gnu.org
> >
> > Real usage with "xdisp.c":
> >
> >     (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
> >       (message "%s" (benchmark-run (apply f args))))
> >
> >     (0.257998 1 0.13326100000000096)
>
> And that is even without encoding the buffer text, IIUC what the
> package does.
>
> > So yes, direct access to buffer's text from dynamic modules would be nice.
>
> Did you consider using the API where an application can provide a
> function to return text at a given offset?  Such a function could be
> relatively easily implemented for Emacs.
>

I don't understand what you mean. Below I'll explain how it works
currently.

`ts-parse' uses the Tree-sitter's API that consumes text in chunks:

    TSTree *ts_parser_parse(
      TSParser *self,
      const TSTree *old_tree,
      TSInput input
    );

    typedef struct {
      void *payload;
      const char *(*read)(
        void *payload,
        uint32_t byte_offset,
        TSPoint position,
        uint32_t *bytes_read
      );
      TSInputEncoding encoding;
    } TSInput;

Because dynamic modules don't have direct access to buffer text,
`ts-parse' uses the module function `copy_string_contents', and exposes
this interface:

    (ts-parse PARSER INPUT-FUNCTION OLD-TREE)

Here INPUT-FUNCTION must return a chunk of the buffer text, starting
from the given byte offset, as a Lisp string. `ts-buffer-input' is one
such function.

So:

1. Chunks of the buffer text are copied into Lisp strings, through
   `buffer-substring-no-properties'.
2. These Lisp strings are copied into buffers of null-terminated utf-8
   bytes, through `copy_string_contents'.
3. All these temporary Lisp strings create GC pressure. In the xdisp.c
   example, it was 100ms for GC, in addition to 150ms for parsing.
4. emacs-module-rs has an automatic, blanket workaround for this bug
   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31238. The workaround
   involves pairs of `make_global_ref' and `free_global_ref' calls, on
   all "suspected" `emacs_value's.

#4 can be avoided if emacs-module-rs allows selectively disabling the
blanket workaround. It's band-aid on top of band-aid, but at least it's
workable.

#3 can probably be alleviated by increasing the chunk size.

However, they are consequences of #1 and #2. If dynamic modules have
direct access to the buffer text, none of the above is an issue.

Such direct access can be enabled by something like this:

    char* (*access_buffer_text) (emacs_env *env,
                                 emacs_value buffer,
                                 ptrdiff_t byte_offset,
                                 ptrdiff_t *size_inout);

Of course, such an API would require extensive documentation on how it
must be used, to ensure safety and correctness.

> Btw, what do you do with the tree returned by the tree-sitter parser?
> store it in some buffer-local variable?  If so, how much memory does
> such a tree take, and when, if ever, is that memory released?
>

It's stored in a buffer-local variable. I haven't measured the memory
they take. Memory is released when the tree object is garbage-collected
(it's a `user-ptr').

--
Tuấn-Anh Nguyễn
Software Engineer

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 17:55       ` Tuấn-Anh Nguyễn
@ 2020-04-01 19:33         ` Eli Zaretskii
  2020-04-01 23:38           ` Stephen Leake
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
  0 siblings, 2 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-01 19:33 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn; +Cc: emacs-devel

> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Thu, 2 Apr 2020 00:55:45 +0700
> Cc: emacs-devel@gnu.org
> 
> > Did you consider using the API where an application can provide a
> > function to return text at a given offset?  Such a function could be
> > relatively easily implemented for Emacs.
> >
> 
> I don't understand what you mean. Below I'll explain how it works
> currently.  [...]  If dynamic modules have direct access to the
> buffer text, none of the above is an issue.
> 
> Such direct access can be enabled by something like this:
> 
>     char* (*access_buffer_text) (emacs_env *env,
>                                  emacs_value buffer,
>                                  ptrdiff_t byte_offset,
>                                  ptrdiff_t *size_inout);
> 
> Of course, such an API would require extensive documentation on how it
> must be used, to ensure safety and correctness.

I think you are moving too fast, and keep the current implementation
in sight too much.

What I suggest is to step back and see how such direct access, if it
were available, could be used with tree-sitter.  Let's forget about
modules for a moment and consider tree-sitter linked with Emacs and
capable of calling any C function in core.  How would you use that?

Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
question to answer is what to do with byte sequences that are not
valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
invalid byte sequences in general?
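
As a rough illustration of one possible policy (not something
tree-sitter or Emacs is known to do), the reader could hand the parser
only the longest prefix of a chunk that is strictly valid UTF-8 and
treat the rest specially.  The function name `utf8_valid_prefix' below
is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the length in bytes of the longest
   prefix of S (LEN bytes) that is strictly valid UTF-8.  Overlong
   forms, surrogates, code points above U+10FFFF and truncated
   sequences all end the prefix.  */
static size_t
utf8_valid_prefix (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;          /* number of continuation bytes expected */
      uint32_t cp, min;     /* decoded code point, smallest legal value */
      if (c < 0x80) { i++; continue; }
      else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
      else break;                       /* stray continuation or bad lead */
      if (len - i <= need) break;       /* sequence is truncated */
      size_t k;
      for (k = 1; k <= need; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80) break;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      if (k <= need) break;             /* bad continuation byte */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        break;                          /* overlong, too large, surrogate */
      i += need + 1;
    }
  return i;
}
```

What to do with the bytes after the valid prefix (skip them, replace
them, or pass them through) is exactly the open question above; the
sketch only shows where the boundary would be detected.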

Also, direct access to buffer text generally means we must make sure
GC never runs as long as pointers to buffer text are lying around.
Can any Lisp run between calls to the reader function that the
tree-sitter parser calls to access the buffer text?  If so, we need to
take care of that issue.

Next, I'm still asking whether parsing the whole buffer when it is
first created is necessary.  Can we pass to the parser just a small
chunk (say, 500 bytes) of the buffer around the window-full to be
displayed next?  If this presents problems, what are those problems?

IOW, the issue with exposing access to buffer text to modules is IMO
secondary.  My suggestion is first to figure out how to do this stuff
efficiently from within Emacs itself, as if the module interface were
not part of the equation.  We can add that aspect back later.

And yes, doing this by consing strings is not a good idea, it will
slow things down and cause a lot of GC.  It is best avoided.  Thus my
questions above.

> > Btw, what do you do with the tree returned by the tree-sitter parser?
> > store it in some buffer-local variable?  If so, how much memory does
> > such a tree take, and when, if ever, is that memory released?
> >
> 
> It's stored in a buffer-local variable. I haven't measured the memory
> they take. Memory is released when the tree object is garbage-collected
> (it's a `user-ptr').

So if I have many hundreds of buffers, I could have such a tree in
each one of them indefinitely?  Perhaps that's one more design issue
to consider, given that the parsing is so fast.  Similar to what we do
with image and face caches -- we flush them from time to time, to keep
the memory footprint in check.  So a buffer that was not current more
than some time interval ago could have its tree GCed.
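
A minimal sketch of that kind of flushing, assuming each buffer records
when it was last current.  The struct, the field names and the
`flush_stale_trees' function are all hypothetical; nothing like this
exists in Emacs or emacs-tree-sitter as far as this thread shows:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-buffer parse state.  */
typedef struct {
  void *tree;          /* opaque parse tree, NULL once flushed */
  long last_current;   /* time in seconds when the buffer was last current */
} BufferParseState;

/* Free the trees of all buffers that have not been current for more
   than MAX_IDLE seconds.  FREE_TREE releases one tree.  Returns the
   number of trees flushed.  A flushed buffer would simply be reparsed
   from scratch the next time its tree is needed.  */
static size_t
flush_stale_trees (BufferParseState *states, size_t n,
                   long now, long max_idle,
                   void (*free_tree) (void *))
{
  assert (states != NULL || n == 0);
  size_t flushed = 0;
  for (size_t i = 0; i < n; i++)
    if (states[i].tree != NULL && now - states[i].last_current > max_idle)
      {
        free_tree (states[i].tree);
        states[i].tree = NULL;
        flushed++;
      }
  return flushed;
}
```

Such a function could be run from the same kind of periodic hook that
trims the image and face caches.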

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 19:33         ` Eli Zaretskii
@ 2020-04-01 23:38           ` Stephen Leake
  2020-04-02  0:25             ` Stephen Leake
                               ` (3 more replies)
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
  1 sibling, 4 replies; 139+ messages in thread
From: Stephen Leake @ 2020-04-01 23:38 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Also, direct access to buffer text generally means we must make sure
> GC never runs as long as pointers to buffer text are lying around.
> Can any Lisp run between calls to the reader function that the
> tree-sitter parser calls to access the buffer text?  

If the parser copies the text into an internal buffer, that reader
function should only be called once per call to the parser. Parsers used
to be based around small buffers that would read in a file a chunk at a
time, but that is not necessary on any machine that can run Emacs. Since
Emacs has the entire file in memory, the parser can too.

However, if we are really trying to avoid copying text (which is very
premature optimization), then the reader function will be called many
times during parsing (to fetch each word), and possibly during the
grammar actions (to compute indent or face).

> Next, I'm still asking whether parsing the whole buffer when it is
> first created is necessary.  

To some extent, that depends on the language.

The parser must be able to complete a parse, to generate a complete
syntax tree. I'll assume no error correction for a moment; more below.

In C or C++ body files, "a complete parse" is typically one variable or
function declaration. So if Emacs can reliably find the beginning and
end of those declarations, it could pass just the ones containing the
region of interest to the parser. Tree-sitter (if it supports this at
all, or is modified to) would end up with a forest of small parse trees,
rather than one large one. They might get merged if large chunks of text
are parsed together.

In Ada and Java, and most C++ header files, "a complete parse" is a
file; it contains an Ada package spec or body, or a Java or C++ class,
or a C++ namespace.

There are many "small languages" for which "a complete parse" is similar
to a "statement". Bash shell, for example. They could pass just the
statement, but only if Emacs can reliably find the start and end (not
always easy).

It is also possible to modify the language grammar to allow smaller
pieces of code to be a complete parse; ada-mode does this, making a
single declaration or statement "a complete parse", in order to support
"partial parse". That can easily lead to errors in indent, since the
indent of the start of the text portion is unknown (ada-mode simply
assumes it is correct in the buffer).

Another reason to allow smaller code chunks to be a complete parse is to
allow parsing the code fragments that appear in grammar actions; the
ELPA package wisitoken-grammar-mode uses this for wisitoken grammar
files with Ada actions.

In sum, the short answer is "yes, you must parse the whole file, unless
your language is particularly simple".

Since we need to support the worst case, we should assume the whole file
must be parsed at least once.

> Can we pass to the parser just a small chunk (say, 500 bytes) of the
> buffer around the window-full to be displayed next? If this presents
> problems, what are those problems?

In wisi, the error correction code will fill in the missing text so a
complete parse is possible. Since some of that is guesses, the results
may not be very good. Tree-sitter also has error correction; I'm not
clear how good it is.

> IOW, the issue with exposing access to buffer text to modules is IMO
> secondary.  

yes, because copying text is fast compared to everything else going on.

> My suggestion is first to figure out how to do this stuff efficiently
> from within Emacs itself, as if the module interface were not part of
> the equation. We can add that aspect back later.

There are two times the wisi code that wraps the parser needs access to
the buffer; first to copy the text, second to add text properties
(faces, indent values, navigation markers). There are usually many text
properties output by each parse.

The positions and values of the text properties are computed by
functions that run after the complete syntax tree has been produced. In
wisi, those functions are added directly in the grammar source file
(where they are called "post-parse grammar actions"). In tree-sitter, I
assume they are called from some mode-author-written code that traverses
the syntax tree (wisi provides that internally). Except I see below that
the emacs tree-sitter package stores the syntax tree in the buffer.

One option here is to try to standardize on an elisp representation of a
syntax tree, and have both the wisi and tree-sitter parsers provide
that. Then the grammar actions could be implemented in elisp. I suspect
that would be very slow; elisp is just not good at traversing large
complex data structures. That is not just my bias showing (I _much_
prefer doing as much as possible in Ada); I first wrote the ada-mode
parser and grammar actions in elisp, and then did a complete rewrite in
Ada, gaining significant speed. Although I never considered passing the
syntax tree to elisp as a single object, so maybe that could work well.

There is no universal standard for representing "a syntax tree". In
wisi, the tree is directly produced by the LR shift and reduce
operations, and thus is very close to the grammar expressed in BNF. I
don't know what the tree-sitter parse tree looks like. AdaCore provides
a parser similar in purpose to the wisi parsers
(https://github.com/AdaCore/libadalang), that also does more of what an
Ada compiler does (which could allow even better font-lock and navigation).
To support those additional operations, the syntax tree is quite
different from the ada-mode one.

In general, each parser library, and even each grammar author, will have
different representations for the syntax tree.

So if we want to support different parsers, I think it is best to define
the Emacs "parser API" as "give text to parser; accept text properties
from parser".

LSP (via eglot) provides other things the parser can return; code
completion menus, for example. And for indent and face, it returns
formatted text with markdown. I plan to translate that to text
properties to integrate LSP into wisi. Whether LSP requires a full
initial parse is up to the LSP server author (LSP itself provides both
"here's the full text" and "here's partial text" messages); they have
the same considerations discussed above.

> And yes, doing this by consing strings is not a good idea, it will
> slow things down and cause a lot of GC.  It is best avoided.  Thus my
> questions above.

I'm not sure how "convert syntax tree to elisp" compares to "consing
strings". I would certainly expect it to cause a lot of GC.

>> > Btw, what do you do with the tree returned by the tree-sitter parser?
>> > store it in some buffer-local variable?  If so, how much memory does
>> > such a tree take, and when, if ever, is that memory released?
>> >
>> 
>> It's stored in a buffer-local variable. I haven't measured the memory
>> they take. Memory is released when the tree object is garbage-collected
>> (it's a `user-ptr').

Is it an elisp structure (or accessible from elisp)? Have you written
code that traverses it to provide faces and indentation?

-- 
-- Stephe

PS; I have the beginnings of a migraine while typing this, so some of it
may not make sense. Sigh.

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 23:38           ` Stephen Leake
@ 2020-04-02  0:25             ` Stephen Leake
  2020-04-02  2:46             ` Stefan Monnier
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 139+ messages in thread
From: Stephen Leake @ 2020-04-02  0:25 UTC (permalink / raw)
  To: emacs-devel

I looked at the tree-sitter source on GitHub
(https://github.com/ubolonton/emacs-tree-sitter) and the tree-sitter doc
that points to (https://tree-sitter.github.io/tree-sitter/using-parsers)

Stephen Leake <stephen_leake@stephe-leake.org> writes:

>>> > Btw, what do you do with the tree returned by the tree-sitter parser?
>>> > store it in some buffer-local variable?  If so, how much memory does
>>> > such a tree take, and when, if ever, is that memory released?
>>> >
>>> 
>>> It's stored in a buffer-local variable. I haven't measured the memory
>>> they take. Memory is released when the tree object is garbage-collected
>>> (it's a `user-ptr').
>

> Is it an elisp structure (or accessible from elisp)?

It's a Rust structure; there is an emacs module providing elisp access
to it (things like "find syntax tree node at point", "get parent node",
"get node text").

The syntax tree is a "concrete syntax tree"; it should be quite close to
the wisi syntax tree.

> Have you written code that traverses it to provide faces and
> indentation?

Not in that repository.

-- 
-- Stephe



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 23:38           ` Stephen Leake
  2020-04-02  0:25             ` Stephen Leake
@ 2020-04-02  2:46             ` Stefan Monnier
  2020-04-02  4:36               ` Tuấn-Anh Nguyễn
  2020-04-02 14:44               ` Eli Zaretskii
  2020-04-02  5:21             ` Tuấn-Anh Nguyễn
  2020-04-02 14:36             ` Eli Zaretskii
  3 siblings, 2 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-04-02  2:46 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> In C or C++ body files, "a complete parse" is typically one variable or
> function declaration. So if Emacs can reliably find the beginning and
> end of those declarations,

IIUC, a large part of CC-mode's trouble is exactly the need to find
somewhat reliably a position vaguely like "the beginning of
a declaration".

It's very much a non-trivial problem (and in the general case to
properly handle all possible comments you need to start parsing from
point-min).

>> And yes, doing this by consing strings is not a good idea, it will
>> slow things down and cause a lot of GC.  It is best avoided.  Thus my
>> questions above.
> I'm not sure how "convert syntax tree to elisp" compares to "consing
> strings". I would certainly expect it to cause a lot of GC.

If the GC is the worry, we can use a function which encodes the
buffer using a given coding-system and returns a malloc'd array of bytes.

>>> It's stored in a buffer-local variable. I haven't measured the memory
>>> they take. Memory is released when the tree object is garbage-collected
>>> (it's a `user-ptr').
> Is it an elisp structure (or accessible from elisp)? Have you written
> code that traverses it to provide faces and indentation?

According to https://github.com/tree-sitter/tree-sitter/issues/222 the
parse tree takes around 10 times the size of the source text.  At least
that's for tree-sitter's own parse-tree; not sure how that relates to
emacs-tree-sitter's yet.


        Stefan




* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  2:46             ` Stefan Monnier
@ 2020-04-02  4:36               ` Tuấn-Anh Nguyễn
  2020-04-02 14:44               ` Eli Zaretskii
  1 sibling, 0 replies; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-02  4:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stephen Leake, emacs-devel

> If the GC is the worry, we can use a function which encodes the
> buffer using a given coding-system and returns a malloc'd array of bytes.
>

If we are talking about a function exposed to dynamic modules, then we
will also need to expose another function to free that byte array,
because the dynamic module may use a different allocator. It's probably
better to ask the caller to prepare that array, like what
`copy_string_contents' does.
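
For illustration, the caller-prepares-the-array convention could look
roughly like this.  `copy_text' is a hypothetical stand-in that copies
from a plain C string rather than from a buffer; it only mirrors the
size-query pattern of `copy_string_contents' and is not a proposed
signature:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: copy SRC, including its terminating NUL,
   into the caller-owned BUF.  If BUF is NULL, only store the required
   size in *SIZE_INOUT.  If BUF is too small, store the required size
   and return false.  The callee never allocates, so no matching
   "free" entry point is needed.  */
static bool
copy_text (const char *src, char *buf, ptrdiff_t *size_inout)
{
  assert (src != NULL && size_inout != NULL);
  ptrdiff_t required = (ptrdiff_t) strlen (src) + 1;
  if (buf == NULL)
    {
      *size_inout = required;   /* size query only */
      return true;
    }
  if (*size_inout < required)
    {
      *size_inout = required;   /* tell the caller how much is needed */
      return false;
    }
  memcpy (buf, src, (size_t) required);
  *size_inout = required;
  return true;
}
```

The module first calls with a NULL buffer to learn the size, allocates
with whatever allocator it likes, then calls again to get the bytes.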

> >>> It's stored in a buffer-local variable. I haven't measured the memory
> >>> they take. Memory is released when the tree object is garbage-collected
> >>> (it's a `user-ptr').
> > Is it an elisp structure (or accessible from elisp)? Have you written
> > code that traverses it to provide faces and indentation?
>
> According to https://github.com/tree-sitter/tree-sitter/issues/222 the
> parse tree takes around 10 times the size of the source text.  At least
> that's for tree-sitter's own parse-tree; not sure how that relates to
> emacs-tree-sitter's yet.
>

emacs-tree-sitter adds 16 bytes for reference counting and 8 bytes for
checking concurrent modifications (because nodes are also exposed to
Lisp as objects). That's negligible I think.

--
Tuấn-Anh Nguyễn
Software Engineer



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  2:46             ` Stefan Monnier
  2020-04-02  4:36               ` Tuấn-Anh Nguyễn
@ 2020-04-02 14:44               ` Eli Zaretskii
  2020-04-02 15:19                 ` Stefan Monnier
  1 sibling, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:44 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: stephen_leake, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 01 Apr 2020 22:46:18 -0400
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> If the GC is the worry, we can use a function which encodes the
> buffer using a given coding-system and returns a malloc'd array of bytes.

I think we should try to avoid both copying and encoding the text we
send to the parser.  Both operations are expensive and require memory
allocation.

> According to https://github.com/tree-sitter/tree-sitter/issues/222 the
> parse tree takes around 10 times the size of the source text.

Yes, that's another reason why it might make sense to "forget" trees
of buffers that were not displayed for a long time.  But this is an
optimization that can be added later without any significant changes
in the design.



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:44               ` Eli Zaretskii
@ 2020-04-02 15:19                 ` Stefan Monnier
  0 siblings, 0 replies; 139+ messages in thread
From: Stefan Monnier @ 2020-04-02 15:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel

> I think we should try to avoid both copying and encoding the text we
> send to the parser.  Both operations are expensive and require memory
> allocation.

I think both operations are cheap enough relative to the actual
parsing that it is not indispensable to avoid them: maybe it will be
worth the effort, but maybe not.

In any case, it's a minor implementation detail that can easily be
changed in the future without impacting the rest of the code.

So, I think it falls squarely in the realm of premature optimization.

>> According to https://github.com/tree-sitter/tree-sitter/issues/222 the
>> parse tree takes around 10 times the size of the source text.
> Yes, that's another reason why it might make sense to "forget" trees
> of buffers that were not displayed for a long time.

Agreed, tho I wouldn't word it that way: parse trees are not needed for
redisplay and can be used for things that don't relate to redisplay
(e.g. navigation, indentation, ...).

> But this is an optimization that can be added later without any
> significant changes in the design.

Agreed as well.

        Stefan

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 23:38           ` Stephen Leake
  2020-04-02  0:25             ` Stephen Leake
  2020-04-02  2:46             ` Stefan Monnier
@ 2020-04-02  5:21             ` Tuấn-Anh Nguyễn
  2020-04-02 14:36             ` Eli Zaretskii
  3 siblings, 0 replies; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-02  5:21 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> > My suggestion is first to figure out how to do this stuff efficiently
> > from within Emacs itself, as if the module interface were not part of
> > the equation. We can add that aspect back later.
>
> There are two times the wisi code that wraps the parser needs access to
> the buffer; first to copy the text, second to add text properties
> (faces, indent values, navigation markers). There are usually many text
> properties output by each parse.
>
> The positions and values of the text properties are computed by
> functions that run after the complete syntax tree has been produced. In
> wisi, those functions are added directly in the grammar source file
> (where they are called "post-parse grammar actions"). In tree-sitter, I
> assume they are called from some mode-author-written code that traverses
> the syntax tree (wisi provides that internally). Except I see below that
> the emacs tree-sitter package stores the syntax tree in the buffer.
>

The preferred approach with tree-sitter is querying:
https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries

--
Tuấn-Anh Nguyễn
Software Engineer



* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 23:38           ` Stephen Leake
                               ` (2 preceding siblings ...)
  2020-04-02  5:21             ` Tuấn-Anh Nguyễn
@ 2020-04-02 14:36             ` Eli Zaretskii
  2020-04-03  2:27               ` Stephen Leake
  3 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:36 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Wed, 01 Apr 2020 15:38:26 -0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Also, direct access to buffer text generally means we must make sure
> > GC never runs as long as pointers to buffer text are lying around.
> > Can any Lisp run between calls to the reader function that the
> > tree-sitter parser calls to access the buffer text?  
> 
> If the parser copies the text into an internal buffer, that reader
> function should only be called once per call to the parser.

Such copying is not really scalable, and IMO should be avoided.
During active editing, redisplay runs very frequently, and having to
copy portions of the buffer, let alone all of it, each time, which
necessarily requires memory allocation, consing of Lisp objects, etc.,
will produce significant memory pressure, expensive heap
allocations/deallocations, and a lot of GC.  Recall that on many
modern platforms Emacs doesn't really return memory to the system,
which means we risk increasing the memory footprint, and create
system-wide memory pressure.  It isn't a catastrophe, but we should
try to avoid it if possible.

> Since Emacs has the entire file in memory, the parser can too.

Having the file twice or more in memory is worse than having it only
once.

> However, if we are really trying to avoid copying text (which is very
> premature optimization)

I don't think it's premature.

> In sum, the short answer is "yes, you must parse the whole file, unless
> your language is particularly simple".

Funny, my conclusion from reading your detailed description was
entirely different.

> > IOW, the issue with exposing access to buffer text to modules is IMO
> > secondary.  
> 
> yes, because copying text is fast compared to everything else going on.

That wasn't my motivation when I wrote that.

> In general, each parser library, and even each grammar author, will have
> different representations for the syntax tree.
> 
> So if we want to support different parsers, I think it is best to define
> the Emacs "parser API" as "give text to parser; accept text properties
> from parser".

Yes, something like that.  It's probably enough to accept a list of
regions with syntactic attributes.

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 14:36             ` Eli Zaretskii
@ 2020-04-03  2:27               ` Stephen Leake
  2020-04-03  7:43                 ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stephen Leake @ 2020-04-03  2:27 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Wed, 01 Apr 2020 15:38:26 -0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Also, direct access to buffer text generally means we must make sure
>> > GC never runs as long as pointers to buffer text are lying around.
>> > Can any Lisp run between calls to the reader function that the
>> > tree-sitter parser calls to access the buffer text?  
>> 
>> If the parser copies the text into an internal buffer, that reader
>> function should only be called once per call to the parser.
>
> Such copying is not really scalable, and IMO should be avoided.
> During active editing, redisplay runs very frequently, and having to
> copy portions of the buffer, let alone all of it, each time, which
> necessarily requires memory allocation, consing of Lisp objects, etc.,
> will produce significant memory pressure, expensive heap
> allocations/deallocations, and a lot of GC.  Recall that on many
> modern platforms Emacs doesn't really return memory to the system,
> which means we risk increasing the memory footprint, and create
> system-wide memory pressure.  It isn't a catastrophe, but we should
> try to avoid it if possible.

Ok. I know very little about the internal storage of text in Emacs.
There are at least two strings with a gap at the current edit point; if
we pass a simple pointer to tree-sitter, it will have to handle the gap.
You mention "consing of Lisp objects" above, which says to me that the
text is stored in a more complex structure. How can we provide direct
access of that to tree-sitter?

Avoiding _all_ copying is impossible; the parser must store the contents of
each token in some way. Typically that is done by storing
pointers/indices into the text buffer that contains the entire text.
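
A small sketch of what "storing indices" means here, using a trivial
whitespace tokenizer.  The `TokenSpan' type and both functions are
hypothetical and unrelated to how wisi or tree-sitter lay out their
tokens:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A token is recorded as a span of the source text, not as a copy.  */
typedef struct {
  size_t start;    /* byte index of the first character */
  size_t length;   /* number of bytes in the token */
} TokenSpan;

/* Find the next whitespace-delimited token in TEXT (LEN bytes) at or
   after POS.  Store its span in *OUT and return 1, or return 0 if
   only whitespace remains.  */
static int
next_token (const char *text, size_t len, size_t pos, TokenSpan *out)
{
  assert (text != NULL && out != NULL);
  while (pos < len && isspace ((unsigned char) text[pos]))
    pos++;
  if (pos >= len)
    return 0;
  size_t start = pos;
  while (pos < len && !isspace ((unsigned char) text[pos]))
    pos++;
  out->start = start;
  out->length = pos - start;
  return 1;
}

/* The token's characters are read from the text only when needed.  */
static int
span_equals (const char *text, TokenSpan span, const char *word)
{
  return strlen (word) == span.length
         && memcmp (text + span.start, word, span.length) == 0;
}
```

The spans stay valid only as long as the text they index does not
move, which is why edits have to be reported to the parser.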

>> In sum, the short answer is "yes, you must parse the whole file, unless
>> your language is particularly simple".
>
> Funny, my conclusion from reading your detailed description was
> entirely different.

I need more than that to respond in a helpful way.

>> In general, each parser library, and even each grammar author, will have
>> different representations for the syntax tree.
>> 
>> So if we want to support different parsers, I think it is best to define
>> the Emacs "parser API" as "give text to parser; accept text properties
>> from parser".
>
> Yes, something like that.  It's probably enough to accept a list of
> regions with syntactic attributes.

Ok, good.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03  2:27               ` Stephen Leake
@ 2020-04-03  7:43                 ` Eli Zaretskii
  2020-04-03 17:45                   ` Stephen Leake
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-03  7:43 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Thu, 02 Apr 2020 18:27:59 -0800
> 
> > Such copying is not really scalable, and IMO should be avoided.
> > During active editing, redisplay runs very frequently, and having to
> > copy portions of the buffer, let alone all of it, each time, which
> > necessarily requires memory allocation, consing of Lisp objects, etc.,
> > will produce significant memory pressure, expensive heap
> > allocations/deallocations, and a lot of GC.  Recall that on many
> > modern platforms Emacs doesn't really return memory to the system,
> > which means we risk increasing the memory footprint, and create
> > system-wide memory pressure.  It isn't a catastrophe, but we should
> > try to avoid it if possible.
> 
> Ok. I know very little about the internal storage of text in Emacs.
> There are at least two strings with a gap at the current edit point; if
> we pass a simple pointer to tree-sitter, it will have to handle the gap.

Tree-sitter allows the application to define a "reader" function that
it will then call to access buffer text.  That function should cope
with the gap.
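
A minimal sketch of such a reader, assuming a simplified gap buffer.
The struct and the function names below are hypothetical; they are
neither Emacs's internal buffer layout nor tree-sitter's actual
callback type.  The idea is only that the reader hands back the longest
contiguous run starting at a logical offset, so the caller never sees
the gap:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified gap buffer: text occupies
   [0, gap_start) and [gap_end, capacity) of `data`.  */
struct gap_buffer {
  const char *data;
  size_t gap_start;   /* first byte of the gap */
  size_t gap_end;     /* first byte after the gap */
  size_t capacity;    /* total size of `data`, gap included */
};

/* Return a pointer to the longest contiguous run of text starting at
   logical byte offset `pos` (an offset that ignores the gap), and store
   the run's length in *len.  Returns NULL with *len == 0 at end of text.  */
static const char *
gap_read (const struct gap_buffer *b, size_t pos, size_t *len)
{
  size_t gap_size = b->gap_end - b->gap_start;
  size_t text_len = b->capacity - gap_size;
  if (pos >= text_len)
    {
      *len = 0;
      return NULL;
    }
  if (pos < b->gap_start)
    {
      /* Before the gap: the run stops where the gap begins.  */
      *len = b->gap_start - pos;
      return b->data + pos;
    }
  /* At or after the gap: skip over it; the run extends to the end.  */
  *len = text_len - pos;
  return b->data + pos + gap_size;
}

/* Drive the reader the way a parser might, collecting all text into
   `out` (which must be large enough).  Returns the number of bytes.  */
static size_t
gap_read_all (const struct gap_buffer *b, char *out)
{
  size_t pos = 0, len;
  const char *chunk;
  while ((chunk = gap_read (b, pos, &len)) != NULL)
    {
      memcpy (out + pos, chunk, len);
      pos += len;
    }
  return pos;
}
```

Here gap_read_all copies only to make the result easy to inspect; a
parser would consume each chunk in place, which is the point of
avoiding buffer-substring.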

> You mention "consing of Lisp objects" above, which says to me that the
> text is stored in a more complex structure.

I meant the consing that is necessary to make a buffer-substring that
will be passed to the parser.

> How can we provide direct access of that to tree-sitter?

See above: by writing our function to access buffer text.

> Avoiding _all_ copying is impossible; the parser must store the contents of
> each token in some way. Typically that is done by storing
> pointers/indices into the text buffer that contains the entire text.

I don't think tree-sitter does that, because the text it gets is
ephemeral.  If we pass it a buffer-substring, it's a temporary string
which will be GCed after it's used; if we pass it pointers to buffer
text, those pointers can be invalid after GC, because GC can relocate
buffer text to a different memory region.

They definitely do copy portions of the text they get for internal
processing purposes, but I doubt that they duplicate all of it,
because that would not be scalable to huge buffers.  And in any case,
any copying we do would be _in_addition_ to what tree-sitter does
internally.

> >> In sum, the short answer is "yes, you must parse the whole file, unless
> >> your language is particularly simple".
> >
> > Funny, my conclusion from reading your detailed description was
> > entirely different.
> 
> I need more than that to respond in a helpful way.

Well, you said:

> To some extent, that depends on the language.

and then went on to describe how each language might _not_ need a
full parse in many cases.  Thus the conclusion sounded a bit radical
to me.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03  7:43                 ` Eli Zaretskii
@ 2020-04-03 17:45                   ` Stephen Leake
  2020-04-03 18:31                     ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stephen Leake @ 2020-04-03 17:45 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Thu, 02 Apr 2020 18:27:59 -0800
>> 
>> > Such copying is not really scalable, and IMO should be avoided.
>> > During active editing, redisplay runs very frequently, and having to
>> > copy portions of the buffer, let alone all of it, each time, which
>> > necessarily requires memory allocation, consing of Lisp objects, etc.,
>> > will produce significant memory pressure, expensive heap
>> > allocations/deallocations, and a lot of GC.  Recall that on many
>> > modern platforms Emacs doesn't really return memory to the system,
>> > which means we risk increasing the memory footprint, and create
>> > system-wide memory pressure.  It isn't a catastrophe, but we should
>> > try to avoid it if possible.
>> 
>> Ok. I know very little about the internal storage of text in Emacs.
>> There are at least two strings with a gap at the current edit point; if
>> we pass a simple pointer to tree-sitter, it will have to handle the gap.
>
> Tree-sitter allows the application to define a "reader" function that
> it will then call to access buffer text.  That function should cope
> with the gap.

and also with the encoding, which you did not address. I don't see how
that is different from the C level buffer-substring. Certainly there
should be a module function buffer-substring that is as efficient as possible.

>> You mention "consing of Lisp objects" above, which says to me that the
>> text is stored in a more complex structure.
>
> I meant the consing that is necessary to make a buffer-substring that
> will be passed to the parser.

Since we are calling the parser from C (if it is linked into Emacs, or
in a module), I still don't understand. Does C code have to cons to
create a string? It will have to allocate if the requested range is not
contiguous in the buffer.

>> Avoiding _all_ copying is impossible; the parser must store the contents of
>> each token in some way. Typically that is done by storing
>> pointers/indices into the text buffer that contains the entire text.
>
> I don't think tree-sitter does that, because the text it gets is
> ephemeral.  If we pass it a buffer-substring, it's a temporary string
> which will be GCed after it's used; if we pass it pointers to buffer
> text, those pointers can be invalid after GC, because GC can relocate
> buffer text to a different memory region.

Hmm.
https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code
says:

    Syntax nodes store their position in the source code both in terms
    of raw bytes and row/column coordinates

In the case of passing a pointer to a string (or buffer, etc), those
positions are relative to that original buffer. So the Emacs buffer is
serving as the parse buffer. Ok, that avoids any copying.

If we pass a buffer-substring to the parser, we are then responsible for
mapping positions relative to the substring into positions relative to
the full buffer. wisi delegates that to the parser; it can pass
start-char-pos and start-byte-pos to the parser along with a string.
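
The mapping itself is small.  A sketch, assuming the substring's start
is known in both bytes and characters (the struct and function names
are made up for illustration and are not wisi's):

```c
#include <assert.h>
#include <stddef.h>

/* A position expressed in both units, mirroring the start-byte-pos
   and start-char-pos pair mentioned above.  */
struct text_pos {
  size_t byte_pos;
  size_t char_pos;
};

/* Translate a position relative to a substring into a position
   relative to the whole buffer, given where the substring started.  */
static struct text_pos
to_buffer_pos (struct text_pos substring_start, struct text_pos rel)
{
  struct text_pos result;
  result.byte_pos = substring_start.byte_pos + rel.byte_pos;
  result.char_pos = substring_start.char_pos + rel.char_pos;
  return result;
}
```

Bytes and characters are tracked separately because they diverge as
soon as the text before the substring contains multibyte characters.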


>> >> In sum, the short answer is "yes, you must parse the whole file, unless
>> >> your language is particularly simple".
>> >
>> > Funny, my conclusion from reading your detailed description was
>> > entirely different.
>> 
>> I need more than that to respond in a helpful way.
>
> Well, you said:
>
>> To some extent, that depends on the language.
>
> and then went on to describe how each language might _not_ need a
> full parse in many cases.  Thus the conclusion sounded a bit radical
> to me.

Ok, we are putting different spins on what "particularly simple" means.

A more neutral phrasing would be:

    Some languages require parsing the whole file, some do not.
    
-- 
-- Stephe



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03 17:45                   ` Stephen Leake
@ 2020-04-03 18:31                     ` Eli Zaretskii
  2020-04-04  0:04                       ` Stephen Leake
  0 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-03 18:31 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Apr 2020 09:45:44 -0800
> 
> > Tree-sitter allows the application to define a "reader" function that
> > it will then call to access buffer text.  That function should cope
> > with the gap.
> 
> and also with the encoding, which you did not address.

I mentioned that in another message: I don't think encoding is
necessary in this case.

> I don't see how that is different from the C level
> buffer-substring. Certainly there should be a module function
> buffer-substring that is as efficient as possible.

If modules are allowed direct access to buffer text, then it's indeed
not different.  But the alternative that was discussed was different.
May I suggest that you look at the code of the module which triggered
this?

> >> You mention "consing of Lisp objects" above, which says to me that the
> >> text is stored in a more complex structure.
> >
> > I meant the consing that is necessary to make a buffer-substring that
> > will be passed to the parser.
> 
> Since we are calling the parser from C (if it is linked into Emacs, or
> in a module), I still don't understand. Does C code have to cons to
> create a string?

Of course.  How else do you get a UTF-8 encoded string to pass to the
parser as a copy of buffer text?

> > I don't think tree-sitter does that, because the text it gets is
> > ephemeral.  If we pass it a buffer-substring, it's a temporary string
> > which will be GCed after it's used; if we pass it pointers to buffer
> > text, those pointers can be invalid after GC, because GC can relocate
> > buffer text to a different memory region.
> 
> Hmm.
> https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code
> says:
> 
>     Syntax nodes store their position in the source code both in terms
>     of raw bytes and row/column coordinates

Positions are okay; 'char *' pointers to buffer or string text are
not.
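
To illustrate the difference, here is a sketch in plain C in which a
token is kept as byte offsets and the text is then moved to a new
block.  relocate_text is a hypothetical stand-in for a collector
relocating buffer text, not anything Emacs actually calls:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A token recorded as byte offsets into the text, not as a char *.  */
struct token {
  size_t start;
  size_t end;
};

/* Copy `len` bytes of text into a freshly allocated block and release
   the old one.  Returns NULL (leaving `old` intact) on failure.  */
static char *
relocate_text (char *old, size_t len)
{
  char *fresh = malloc (len);
  if (fresh == NULL)
    return NULL;
  memcpy (fresh, old, len);
  free (old);
  return fresh;
}

/* Compare the token's text against `expected`, resolving the offsets
   against whatever base address the text has right now.  */
static int
token_equals (const char *base, struct token t, const char *expected)
{
  size_t n = t.end - t.start;
  return strlen (expected) == n
         && memcmp (base + t.start, expected, n) == 0;
}
```

The offsets stay meaningful after the move because they are resolved
against the current base each time; a saved 'char *' into the old
block would dangle.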



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-03 18:31                     ` Eli Zaretskii
@ 2020-04-04  0:04                       ` Stephen Leake
  2020-04-04  7:13                         ` Eli Zaretskii
  0 siblings, 1 reply; 139+ messages in thread
From: Stephen Leake @ 2020-04-04  0:04 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> 
>> >> You mention "consing of Lisp objects" above, which says to me that the
>> >> text is stored in a more complex structure.
>> >
>> > I meant the consing that is necessary to make a buffer-substring that
>> > will be passed to the parser.
>> 
>> Since we are calling the parser from C (if it is linked into Emacs, or
>> in a module), I still don't understand. Does C code have to cons to
>> create a string?
>
> Of course.  How else do you get a UTF-8 encoded string to pass to the
> parser as a copy of buffer text?

malloc and memcpy. I guess that's what you mean by "cons"; I was
assuming you meant the actual elisp function.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-04  0:04                       ` Stephen Leake
@ 2020-04-04  7:13                         ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-04  7:13 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Apr 2020 16:04:04 -0800
> 
> >> Since we are calling the parser from C (if it is linked into Emacs, or
> >> in a module), I still don't understand. Does C code have to cons to
> >> create a string?
> >
> > Of course.  How else do you get a UTF-8 encoded string to pass to the
> > parser as a copy of buffer text?
> 
> malloc and memcpy.

How do you know how much memory to allocate?  And memcpy doesn't cut
it, because you forgot the encoding step.

You could, of course, take the low-level encoding code from coding.c
and make your own high-level functions that don't work with Lisp
objects.  But (a) why bother doing that? and (b) I think you will
quickly find out that this is a non-trivial job, since coding.c
"knows", to the lowest level, that it's dealing with Lisp objects
(buffers or strings), so you'd need pretty much to rewrite everything.

It's no accident that the Cygwin port uses the Lisp string machinery
even when it needs to convert strings from UTF-16 (see from_unicode),
even though it basically needs to convert C strings.

> I guess that's what you mean by "cons"; I was assuming you meant the
> actual elisp function.

No, I meant "consing" as in "make a Lisp string", then encode it
(which makes another Lisp string).

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 19:33         ` Eli Zaretskii
  2020-04-01 23:38           ` Stephen Leake
@ 2020-04-02  4:21           ` Tuấn-Anh Nguyễn
  2020-04-02  5:19             ` Jorge Javier Araya Navarro
                               ` (3 more replies)
  1 sibling, 4 replies; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-02  4:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> > Date: Thu, 2 Apr 2020 00:55:45 +0700
> > Cc: emacs-devel@gnu.org
> >
> > > Did you consider using the API where an application can provide a
> > > function to return text at a given offset?  Such a function could be
> > > relatively easily implemented for Emacs.
> > >
> >
> > I don't understand what you mean. Below I'll explain how it works
> > currently.  [...]  If dynamic modules have direct access to the
> > buffer text, none of the above is an issue.
> >
> > Such direct access can be enabled by something like this:
> >
> >     char* (*access_buffer_text) (emacs_env *env,
> >                                  emacs_value buffer,
> >                                  ptrdiff_t byte_offset,
> >                                  ptrdiff_t *size_inout);
> >
> > Of course, such an API would require extensive documentation on how it
> > must be used, to ensure safety and correctness.
>
> I think you are moving too fast, and keep the current implementation
> in sight too much.
>

I'm actually moving too slow here. I have thought about this part quite
a bit, but I'm currently focusing on other things, partially because
this is not a painful bottleneck.

> What I suggest is to step back and see how such direct access, if it
> were available, could be used with tree-sitter.  Let's forget about
> modules for a moment and consider tree-sitter linked with Emacs and
> capable of calling any C function in core.  How would you use that?
>
> Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> question to answer is what to do with byte sequences that are not
> valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> invalid byte sequences in general?
>

I haven't checked yet. It will probably bail out, which is usually the
desired behavior. The tree-sitter's author is likely open to making this
behavior configurable here, though. Alternatively, the direct access
function can offer different behaviors: as-is, bail-out, skip-over, or
null-out (tree-sitter will skip over null bytes, IIRC).

> Also, direct access to buffer text generally means we must make sure
> GC never runs as long as pointers to buffer text are lying around.
> Can any Lisp run between calls to the reader function that the
> tree-sitter parser calls to access the buffer text?  If so, we need to
> take care of that issue.
>

With direct access, no Lisp code will be run between these calls.

> Next, I'm still asking whether parsing the whole buffer when it is
> first created is necessary.  Can we pass to the parser just a small
> chunk (say, 500 bytes) of the buffer around the window-full to be
> displayed next?  If this presents problems, what are those problems?
>

In principle (not in tree-sitter ATM), and in very specific cases, yes.
IMO that's the wrong focus on a premature optimization anyway. As others
noted, even in the pathological case of xdisp.c, the performance is
acceptable. Also keep in mind that syntax highlighting is just one
application. Other use cases usually want a full parse tree.

If we really want to tackle this issue, there are other approaches to
consider, e.g. background parsing, or parsing up until a time limit, and
resume parsing when Emacs is idle. Tree-sitter's API supports the
latter.
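
The shape of "work up to a limit, then resume" can be sketched without
tree-sitter at all.  In the example below the job, its fields, and the
use of a byte budget instead of a clock are all assumptions made to
keep the sketch deterministic; counting newlines merely stands in for
parsing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resumable job: counts newlines as a stand-in for
   parsing.  */
struct scan_job {
  const char *text;
  size_t len;
  size_t pos;       /* where the next slice resumes */
  size_t newlines;  /* result accumulated so far */
};

/* Process at most `budget` bytes, then return.  Returns 1 when the
   whole text has been scanned, 0 if the caller should call again
   later (for example, when the editor is idle).  */
static int
scan_step (struct scan_job *job, size_t budget)
{
  size_t remaining = job->len - job->pos;
  size_t stop = remaining < budget ? job->len : job->pos + budget;
  for (; job->pos < stop; job->pos++)
    if (job->text[job->pos] == '\n')
      job->newlines++;
  return job->pos == job->len;
}
```

All state lives in the job struct, so the caller is free to return to
the command loop between steps and pick the work up again later.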

But again, both thought exercises and my usage so far point to this
being a non-issue.

> IOW, the issue with exposing access to buffer text to modules is IMO
> secondary.  My suggestion is first to figure out how to do this stuff
> efficiently from within Emacs itself, as if the module interface were
> not part of the equation.  We can add that aspect back later.
>

My opinion is that it's better to experiment with this kind of stuff
out-of-core. It can move forward faster that way, allowing more lessons
to be learned. Real lessons, involving real-world use cases, not thought
exercises.

In a somewhat similar vein, writing emacs-tree-sitter highlighted real
issues with dynamic modules, which I'm going to write up sometime.

> And yes, doing this by consing strings is not a good idea, it will
> slow things down and cause a lot of GC.  It is best avoided.  Thus my
> questions above.
>
> > > Btw, what do you do with the tree returned by the tree-sitter parser?
> > > store it in some buffer-local variable?  If so, how much memory does
> > > such a tree take, and when, if ever, is that memory released?
> > >
> >
> > It's stored in a buffer-local variable. I haven't measured the memory
> > they take. Memory is released when the tree object is garbage-collected
> > (it's a `user-ptr').
>
> So if I have many hundreds of buffers, I could have such a tree in
> each one of them indefinitely?  Perhaps that's one more design issue
> to consider, given that the parsing is so fast.  Similar to what we do
> with image and face caches -- we flush them from time to time, to keep
> the memory footprint in check.  So a buffer that was not current more
> than some time interval ago could have its tree GCed.
>

That can work. Alternatively, tree-sitter can add support for "folding"
subtrees, as Stefan suggested.
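
A rough sketch of the time-based flushing, assuming each buffer keeps
its tree next to a "last current" timestamp.  The slot struct, the
function name, and the use of free() as the destructor are all
hypothetical; a real tree would need its own release function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-buffer cache entry.  */
struct tree_slot {
  void *tree;          /* opaque parse tree, NULL if flushed */
  long last_current;   /* time (seconds) the buffer was last current */
};

/* Free every cached tree whose buffer has not been current for more
   than `max_idle` seconds.  Returns the number of trees flushed.  */
static size_t
flush_stale_trees (struct tree_slot *slots, size_t n,
                   long now, long max_idle)
{
  size_t flushed = 0;
  for (size_t i = 0; i < n; i++)
    if (slots[i].tree != NULL && now - slots[i].last_current > max_idle)
      {
        free (slots[i].tree);   /* stand-in for the tree's destructor */
        slots[i].tree = NULL;
        flushed++;
      }
  return flushed;
}
```

A flushed buffer would simply be reparsed from scratch the next time
it becomes current, which is affordable only if the full parse stays
fast.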

--
Tuấn-Anh Nguyễn
Software Engineer



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
@ 2020-04-02  5:19             ` Jorge Javier Araya Navarro
  2020-04-02  9:29               ` Stephen Leake
  2020-04-02 10:37             ` Andrea Corallo
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 139+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-02  5:19 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 5897 bytes --]

>  Also keep in mind that syntax highlighting is just one
> application. Other use cases usually want a full parse tree.

like indentation, or so I think 🤔, but indentation may be one of those use
cases.

On Wed, Apr 1, 2020 at 22:22, Tuấn-Anh Nguyễn <ubolonton@gmail.com>
wrote:

> On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> > > Date: Thu, 2 Apr 2020 00:55:45 +0700
> > > Cc: emacs-devel@gnu.org
> > >
> > > > Did you consider using the API where an application can provide a
> > > > function to return text at a given offset?  Such a function could be
> > > > relatively easily implemented for Emacs.
> > > >
> > >
> > > I don't understand what you mean. Below I'll explain how it works
> > > currently.  [...]  If dynamic modules have direct access to the
> > > buffer text, none of the above is an issue.
> > >
> > > Such direct access can be enabled by something like this:
> > >
> > >     char* (*access_buffer_text) (emacs_env *env,
> > >                                  emacs_value buffer,
> > >                                  ptrdiff_t byte_offset,
> > >                                  ptrdiff_t *size_inout);
> > >
> > > Of course, such an API would require extensive documentation on how it
> > > must be used, to ensure safety and correctness.
> >
> > I think you are moving too fast, and keep the current implementation
> > in sight too much.
> >
>
> I'm actually moving too slow here. I have thought about this part quite
> a bit, but I'm currently focusing on other things, partially because
> this is not a painful bottleneck.
>
> > What I suggest is to step back and see how such direct access, if it
> > were available, could be used with tree-sitter.  Let's forget about
> > modules for a moment and consider tree-sitter linked with Emacs and
> > capable of calling any C function in core.  How would you use that?
> >
> > Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> > question to answer is what to do with byte sequences that are not
> > valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> > invalid byte sequences in general?
> >
>
> I haven't checked yet. It will probably bail out, which is usually the
> desired behavior. The tree-sitter's author is likely open to making this
> behavior configurable here, though. Alternatively, the direct access
> function can offer different behaviors: as-is, bail-out, skip-over, or
> null-out (tree-sitter will skip over null bytes, IIRC).
>
> > Also, direct access to buffer text generally means we must make sure
> > GC never runs as long as pointers to buffer text are lying around.
> > Can any Lisp run between calls to the reader function that the
> > tree-sitter parser calls to access the buffer text?  If so, we need to
> > take care of that issue.
> >
>
> With direct access, no Lisp code will be run between these calls.
>
> > Next, I'm still asking whether parsing the whole buffer when it is
> > first created is necessary.  Can we pass to the parser just a small
> > chunk (say, 500 bytes) of the buffer around the window-full to be
> > displayed next?  If this presents problems, what are those problems?
> >
>
> In principle (not in tree-sitter ATM), and in very specific cases, yes.
> IMO that's the wrong focus on a premature optimization anyway. As others
> noted, even in the pathological case of xdisp.c, the performance is
> acceptable. Also keep in mind that syntax highlighting is just one
> application. Other use cases usually want a full parse tree.
>
> If we really want to tackle this issue, there are other approaches to
> consider, e.g. background parsing, or parsing up until a time limit, and
> resume parsing when Emacs is idle. Tree-sitter's API supports the
> latter.
>
> But again, both thought exercises and my usage so far point to this
> being a non-issue.
>
> > IOW, the issue with exposing access to buffer text to modules is IMO
> > secondary.  My suggestion is first to figure out how to do this stuff
> > efficiently from within Emacs itself, as if the module interface were
> > not part of the equation.  We can add that aspect back later.
> >
>
> My opinion is that it's better to experiment with this kind of stuff
> out-of-core. It can move forward faster that way, allowing more lessons
> to be learned. Real lessons, involving real-world use cases, not thought
> exercises.
>
> In a somewhat similar vein, writing emacs-tree-sitter highlighted real
> issues with dynamic modules, which I'm going to write up sometime.
>
> > And yes, doing this by consing strings is not a good idea, it will
> > slow things down and cause a lot of GC.  It is best avoided.  Thus my
> > questions above.
> >
> > > > Btw, what do you do with the tree returned by the tree-sitter parser?
> > > > store it in some buffer-local variable?  If so, how much memory does
> > > > such a tree take, and when, if ever, is that memory released?
> > > >
> > >
> > > It's stored in a buffer-local variable. I haven't measured the memory
> > > they take. Memory is released when the tree object is garbage-collected
> > > (it's a `user-ptr').
> >
> > So if I have many hundreds of buffers, I could have such a tree in
> > each one of them indefinitely?  Perhaps that's one more design issue
> > to consider, given that the parsing is so fast.  Similar to what we do
> > with image and face caches -- we flush them from time to time, to keep
> > the memory footprint in check.  So a buffer that was not current more
> > than some time interval ago could have its tree GCed.
> >
>
> That can work. Alternatively, tree-sitter can add support for "folding"
> subtrees, as Stefan suggested.
>
> --
> Tuấn-Anh Nguyễn
> Software Engineer
>
>

[-- Attachment #2: Type: text/html, Size: 7465 bytes --]

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  5:19             ` Jorge Javier Araya Navarro
@ 2020-04-02  9:29               ` Stephen Leake
  0 siblings, 0 replies; 139+ messages in thread
From: Stephen Leake @ 2020-04-02  9:29 UTC (permalink / raw)
  To: emacs-devel

Jorge Javier Araya Navarro <jorge@esavara.cr> writes:

>>  Also keep in mind that syntax highlighting is just one
>> application. Other use cases usually want a full parse tree.
>
> like indentation, or so I think 🤔, but indentation may be one of those use
> cases.

To correctly compute indentation for Ada code, you need to parse the
full file initially. After that, indent-region to indent edited code
fits nicely with incremental parse.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
  2020-04-02  5:19             ` Jorge Javier Araya Navarro
@ 2020-04-02 10:37             ` Andrea Corallo
  2020-04-02 11:14               ` Tuấn-Anh Nguyễn
  2020-04-02 13:02             ` Stefan Monnier
  2020-04-02 15:02             ` Eli Zaretskii
  3 siblings, 1 reply; 139+ messages in thread
From: Andrea Corallo @ 2020-04-02 10:37 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel

Tuấn-Anh Nguyễn <ubolonton@gmail.com> writes:

> In principle (not in tree-sitter ATM), and in very specific cases, yes.
> IMO that's the wrong focus on a premature optimization anyway. As others
> noted, even in the pathological case of xdisp.c, the performance is
> acceptable. 

Please do not assume xdisp.c is the worst case scenario, I can testify
it is not :)

Andrea

-- 
akrl@sdf.org



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 10:37             ` Andrea Corallo
@ 2020-04-02 11:14               ` Tuấn-Anh Nguyễn
  0 siblings, 0 replies; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-02 11:14 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Eli Zaretskii, emacs-devel

On Thu, Apr 2, 2020 at 5:37 PM Andrea Corallo <akrl@sdf.org> wrote:
>
> Tuấn-Anh Nguyễn <ubolonton@gmail.com> writes:
>
> > In principle (not in tree-sitter ATM), and in very specific cases, yes.
> > IMO that's the wrong focus on a premature optimization anyway. As others
> > noted, even in the pathological case of xdisp.c, the performance is
> > acceptable.
>
> Please do not assume xdisp.c is the worst case scenario, I can testify
> it is not :)
>

Fair enough.

--
Tuấn-Anh Nguyễn
Software Engineer



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
  2020-04-02  5:19             ` Jorge Javier Araya Navarro
  2020-04-02 10:37             ` Andrea Corallo
@ 2020-04-02 13:02             ` Stefan Monnier
  2020-04-02 15:06               ` Eli Zaretskii
  2020-04-02 15:02             ` Eli Zaretskii
  3 siblings, 1 reply; 139+ messages in thread
From: Stefan Monnier @ 2020-04-02 13:02 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel

> If we really want to tackle this issue, there are other approaches to
> consider, e.g. background parsing, or parsing up until a time limit, and
> resume parsing when Emacs is idle. Tree-sitter's API supports the
> latter.

Emacs is in dire need to exploit multiple cores.  It would be very
natural to run tree-sitter's initial parse asynchronously in a separate
thread.  This requires passing tree-sitter a *copy* of the
buffer's text.


        Stefan




^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 13:02             ` Stefan Monnier
@ 2020-04-02 15:06               ` Eli Zaretskii
  0 siblings, 0 replies; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 15:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ubolonton, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Thu, 02 Apr 2020 09:02:41 -0400
> 
> > If we really want to tackle this issue, there are other approaches to
> > consider, e.g. background parsing, or parsing up until a time limit, and
> > resume parsing when Emacs is idle. Tree-sitter's API supports the
> > latter.
> 
> Emacs is in dire need to exploit multiple cores.

True.

> It would be very natural to run tree-sitter's initial parse
> asynchronously in a separate thread.  This requires passing
> tree-sitter a *copy* of the buffer's text.

This also raises a lot of issues and problems of its own, of which
copying the buffer is the least one.  We don't yet have any example of
such asynchronous processing, so this feature will have to be the
first that does it, and will then have to resolve the issues in
addition to doing its main job.



^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02  4:21           ` Tuấn-Anh Nguyễn
                               ` (2 preceding siblings ...)
  2020-04-02 13:02             ` Stefan Monnier
@ 2020-04-02 15:02             ` Eli Zaretskii
  2020-04-03 14:34               ` Tuấn-Anh Nguyễn
  3 siblings, 1 reply; 139+ messages in thread
From: Eli Zaretskii @ 2020-04-02 15:02 UTC (permalink / raw)
  To: Tuấn-Anh Nguyễn; +Cc: emacs-devel

> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Thu, 2 Apr 2020 11:21:49 +0700
> Cc: emacs-devel@gnu.org
> 
> > Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> > question to answer is what to do with byte sequences that are not
> > valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> > invalid byte sequences in general?
> >
> 
> I haven't checked yet. It will probably bail out, which is usually the
> desired behavior.

"Bail out" meaning that this breaks the parse?  I'd be surprised if
that was what happens in these cases.  But if it does, we will need to
replace such sequences by the likes of U+FFFD in the reader function
we provide.
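
A sketch of what such a replacement step could look like, assuming the
reader copies into a scratch area of its own.  Both function names are
hypothetical, and the check implements strict UTF-8 only; it knows
nothing about how Emacs encodes raw bytes internally:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the valid UTF-8 sequence starting at s (n >= 1 bytes
   available), or 0 if the bytes there are not valid UTF-8.  */
static size_t
utf8_valid_len (const unsigned char *s, size_t n)
{
  unsigned char c = s[0];
  size_t len, i;
  if (c < 0x80) return 1;
  else if (c >= 0xC2 && c <= 0xDF) len = 2;
  else if (c >= 0xE0 && c <= 0xEF) len = 3;
  else if (c >= 0xF0 && c <= 0xF4) len = 4;
  else return 0;   /* stray continuation, overlong C0/C1, or F5+ */
  if (n < len) return 0;
  for (i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80) return 0;
  if (c == 0xE0 && s[1] < 0xA0) return 0;   /* overlong 3-byte form */
  if (c == 0xED && s[1] > 0x9F) return 0;   /* UTF-16 surrogate */
  if (c == 0xF0 && s[1] < 0x90) return 0;   /* overlong 4-byte form */
  if (c == 0xF4 && s[1] > 0x8F) return 0;   /* beyond U+10FFFF */
  return len;
}

/* Copy n bytes from `in` to `out`, replacing each byte that does not
   start a valid sequence with U+FFFD (EF BF BD).  `out` must hold
   3 * n bytes.  Returns the number of bytes written.  */
static size_t
sanitize_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  size_t i = 0, o = 0;
  while (i < n)
    {
      size_t len = utf8_valid_len (in + i, n - i);
      if (len == 0)
        {
          memcpy (out + o, "\xEF\xBF\xBD", 3);
          o += 3;
          i += 1;
        }
      else
        {
          memcpy (out + o, in + i, len);
          o += len;
          i += len;
        }
    }
  return o;
}
```

Note that the replacement changes byte lengths, so byte positions
reported by the parser would no longer match buffer positions without
an extra mapping; that cost would have to be weighed against simply
breaking the parse.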

> With direct access, no Lisp code will be run between these calls.

Then this issue is taken care of.

> > Next, I'm still asking whether parsing the whole buffer when it is
> > first created is necessary.  Can we pass to the parser just a small
> > chunk (say, 500 bytes) of the buffer around the window-full to be
> > displayed next?  If this presents problems, what are those problems?
> >
> 
> In principle (not in tree-sitter ATM), and in very specific cases, yes.
> IMO that's the wrong focus on a premature optimization anyway.

I tried to explain elsewhere why I don't think this is premature.

> As others noted, even in the pathological case of xdisp.c, the
> performance is acceptable.

xdisp.c is not a pathological case for me, I edit it very frequently.
More importantly, this scales poorly.

> Also keep in mind that syntax highlighting is just one
> application. Other use cases usually want a full parse tree.

Other applications have different restrictions and requirements, so
trying to satisfy all of them at once might not be the best way.

> If we really want to tackle this issue, there are other approaches to
> consider, e.g. background parsing, or parsing up until a time limit, and
> resume parsing when Emacs is idle. Tree-sitter's API supports the
> latter.

JIT-lock already supports background fontification (see
jit-lock-stealth-time), so using such parsers from jit-lock gives that
to you at almost no cost.
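
For illustration only, the "parse up until a time limit, resume when idle" approach from the quoted text could look roughly like the Python sketch below. It does not use tree-sitter's actual API or jit-lock; `parse_in_slices` is a hypothetical stand-in in which appending an item to a list plays the role of parsing work:

```python
import time


def parse_in_slices(items, budget_seconds):
    """Process `items` in time-limited slices (hypothetical stand-in for a parser).

    Yields (processed_so_far, finished) after each slice, so the caller
    can return to other work and resume later, e.g. when idle.
    """
    pending = iter(items)
    processed = []
    while True:
        deadline = time.monotonic() + budget_seconds
        for item in pending:
            processed.append(item)  # stand-in for real parsing work
            if time.monotonic() >= deadline:
                break  # budget used up; hand control back to the caller
        else:
            # The iterator ran dry without hitting the deadline: done.
            yield processed, True
            return
        yield processed, False


if __name__ == "__main__":
    for so_far, finished in parse_in_slices(range(10), 0.001):
        print(len(so_far), finished)
```

Each slice processes at least one item, so progress is made even with a zero budget.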

> > IOW, the issue with exposing access to buffer text to modules is IMO
> > secondary.  My suggestion is first to figure out how to do this stuff
> > efficiently from within Emacs itself, as if the module interface were
> > not part of the equation.  We can add that aspect back later.
> >
> 
> My opinion is that it's better to experiment with this kind of stuff
> out-of-core. It can move forward faster that way, allowing more lessons
> to be learned. Real lessons, involving real-world use cases, not thought
> exercises.

I'm talking about trying different design ideas.  It is best to do
that without being limited by what modules can and cannot do.
Building a hacked version of Emacs to test those ideas doesn't
necessarily contradict the desire to collect real-life experience.

IOW, I suggest testing alternative design ideas that are not based on
copying portions of the buffer via Lisp strings.  If those ideas are
workable (and I think they are), they will support a more scalable
implementation that exerts less memory pressure on Emacs and on the
host system.
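
A rough sketch of one such alternative, assuming the parser accepts a read callback that is asked for a chunk of text at a given byte offset (tree-sitter's C API takes a callback of this general kind). The `GapBuffer` class below is a toy model in Python, not Emacs's actual buffer code; it hands out views of its storage on either side of the gap instead of copying text into a new string:

```python
class GapBuffer:
    """Toy gap buffer: storage is [text before gap][gap][text after gap]."""

    def __init__(self, before: bytes, gap_size: int, after: bytes):
        self.store = bytearray(before) + bytearray(gap_size) + bytearray(after)
        self.gap_start = len(before)
        self.gap_end = len(before) + gap_size

    def __len__(self) -> int:
        # Length of the text, excluding the gap.
        return len(self.store) - (self.gap_end - self.gap_start)

    def read(self, byte_offset: int) -> memoryview:
        """Return a zero-copy view of contiguous text starting at byte_offset.

        An empty view signals the end of the text.
        """
        view = memoryview(self.store)
        if byte_offset < self.gap_start:
            return view[byte_offset:self.gap_start]
        # Past the gap: translate the logical offset to a physical one.
        physical = byte_offset + (self.gap_end - self.gap_start)
        return view[physical:]


if __name__ == "__main__":
    buf = GapBuffer(b"hello ", 4, b"world")
    offset = 0
    while len(chunk := buf.read(offset)) > 0:
        print(bytes(chunk))
        offset += len(chunk)
```

The reader returns at most two chunks per pass over the text, and neither requires allocating a copy of the buffer contents.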

HTH


* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-02 15:02             ` Eli Zaretskii
@ 2020-04-03 14:34               ` Tuấn-Anh Nguyễn
  0 siblings, 0 replies; 139+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-03 14:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu, Apr 2, 2020 at 10:02 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> > Date: Thu, 2 Apr 2020 11:21:49 +0700
> > Cc: emacs-devel@gnu.org
> >
> > > Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> > > question to answer is what to do with byte sequences that are not
> > > valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> > > invalid byte sequences in general?
> > >
> >
> > I haven't checked yet. It will probably bail out, which is usually the
> > desired behavior.
>
> "Bail out" meaning that this breaks the parse?  I'd be surprised if
> that was what happens in these cases.  But if it does, we will need to
> replace such sequences by the likes of U+FFFD in the reader function
> we provide.
>

Agreed. I'll try checking its behavior on this.

> > > IOW, the issue with exposing access to buffer text to modules is IMO
> > > secondary.  My suggestion is first to figure out how to do this stuff
> > > efficiently from within Emacs itself, as if the module interface were
> > > not part of the equation.  We can add that aspect back later.
> > >
> >
> > My opinion is that it's better to experiment with this kind of stuff
> > out-of-core. It can move forward faster that way, allowing more lessons
> > to be learned. Real lessons, involving real-world use cases, not thought
> > exercises.
>
> I'm talking about trying different design ideas.  It is best to do
> that without being limited by what modules can and cannot do.
> Building a hacked version of Emacs to test those ideas doesn't
> necessarily contradict the desire to collect real-life experience.
>
> IOW, I suggest testing alternative design ideas that are not based on
> copying portions of the buffer via Lisp strings.  If those ideas are
> workable (and I think they are), they will support a more scalable
> implementation that exerts less memory pressure on Emacs and on the
> host system.
>
> HTH
>

Yeah, I agree that going through Lisp strings for this is sub-optimal.
When I have time to come back to this part, I'll hack up my local Emacs
to allow dynamic modules to access buffer text directly, to test out
the idea.

--
Tuấn-Anh Nguyễn
Software Engineer

P.S. Sorry Gmail messed up my first reply.





Thread overview: 139+ messages
2020-03-29 18:46 Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
2020-03-29 19:05 ` Andrea Corallo
2020-03-29 19:18   ` Eli Zaretskii
2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
2020-03-30 14:04       ` Eli Zaretskii
2020-03-30 15:06       ` Stefan Monnier
2020-03-30 17:14         ` Yuan Fu
2020-03-30 17:54           ` Stefan Monnier
2020-03-30 18:43             ` Štěpán Němec
2020-03-30 18:46               ` Stefan Monnier
2020-03-30 19:02                 ` Yuan Fu
2020-03-30 19:10                   ` Eli Zaretskii
2020-03-30 19:21                     ` Yuan Fu
2020-03-31  3:56                       ` Štěpán Němec
2020-03-31 13:16                         ` Eli Zaretskii
2020-03-31 13:36                           ` Štěpán Němec
2020-03-31 14:34                             ` Eli Zaretskii
2020-03-31 15:37                               ` Štěpán Němec
2020-03-31 15:58                                 ` Eli Zaretskii
2020-03-31 16:18                                   ` Štěpán Němec
2020-03-31 17:38                                     ` Eli Zaretskii
2020-04-01  0:57                     ` Stephen Leake
2020-03-30 19:42                   ` Stefan Monnier
2020-03-30 19:27                 ` Štěpán Němec
2020-03-31  2:24           ` Eli Zaretskii
2020-03-31  3:10             ` Stefan Monnier
2020-03-31 13:14               ` Eli Zaretskii
2020-03-31 14:31                 ` Dmitry Gutov
2020-03-31 15:36                   ` Eli Zaretskii
2020-03-31 15:45                     ` Dmitry Gutov
2020-03-31 17:16                     ` Stefan Monnier
2020-03-31 17:48                       ` Eli Zaretskii
2020-03-31 19:35                         ` Stefan Monnier
2020-04-01  2:23                           ` Eli Zaretskii
2020-03-31 15:11                 ` Stefan Monnier
2020-03-31 15:44                   ` Eli Zaretskii
2020-03-31 17:10                     ` Stefan Monnier
2020-03-31 17:19                       ` Jorge Javier Araya Navarro
2020-03-31 17:46                       ` Eli Zaretskii
2020-03-31 18:42                         ` 조성빈
2020-03-31 19:29                           ` Eli Zaretskii
2020-03-31 18:47                         ` Dmitry Gutov
2020-03-31 18:48                           ` Noam Postavsky
2020-03-31 19:02                             ` Dmitry Gutov
2020-03-31 19:26                           ` Eli Zaretskii
2020-03-31 19:50                             ` Dmitry Gutov
2020-04-01  2:28                               ` Eli Zaretskii
2020-04-01  3:49                                 ` Dmitry Gutov
2020-04-01  4:14                                   ` Eli Zaretskii
2020-04-01 13:47                                     ` Dmitry Gutov
2020-04-01 14:04                                       ` Eli Zaretskii
2020-04-01 14:55                                         ` Eli Zaretskii
2020-04-01 15:16                                         ` Dmitry Gutov
2020-04-01 15:59                                           ` Eli Zaretskii
2020-04-01 21:48                                             ` Dmitry Gutov
2020-04-01 22:29                                               ` Stefan Monnier
2020-04-02 14:23                                               ` Eli Zaretskii
2020-04-02 16:17                                                 ` Dmitry Gutov
2020-04-02 18:25                                                   ` Eli Zaretskii
2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
2020-04-03 16:10                                                     ` Dmitry Gutov
2020-04-01 13:52                                     ` Alan Mackenzie
2020-04-01 14:10                                       ` Eli Zaretskii
2020-04-01 15:27                                         ` Dmitry Gutov
2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
2020-04-01 16:03                                           ` Eli Zaretskii
2020-04-01 21:21                                             ` Dmitry Gutov
2020-04-02 14:09                                               ` Eli Zaretskii
2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
2020-04-02 18:27                                                   ` Yuan Fu
2020-04-02 19:39                                                     ` Stefan Monnier
2020-04-01 15:22                                       ` Dmitry Gutov
2020-04-04 11:06                                         ` Alan Mackenzie
2020-04-04 11:26                                           ` Eli Zaretskii
2020-04-04 14:14                                             ` Andrea Corallo
2020-04-04 14:41                                               ` Eli Zaretskii
2020-04-04 15:04                                                 ` Andrea Corallo
2020-04-04 15:38                                                   ` Richard Copley
2020-04-04 11:27                                           ` Eli Zaretskii
2020-04-04 12:01                                           ` Dmitry Gutov
2020-04-04 12:36                                             ` Alan Mackenzie
2020-04-04 12:40                                               ` Dmitry Gutov
2020-04-04 13:02                                               ` Eli Zaretskii
2020-04-04 16:09                                                 ` Dmitry Gutov
2020-04-04 16:38                                                   ` Eli Zaretskii
2020-04-04 16:45                                                     ` Eli Zaretskii
2020-04-04 17:22                                                       ` Richard Copley
2020-04-04 17:50                                                         ` Eli Zaretskii
2020-04-04 18:29                                                         ` Andrea Corallo
2020-04-04 18:56                                                           ` Richard Copley
2020-04-04 20:36                                                             ` Andrea Corallo
2020-04-04 17:36                                                       ` Dmitry Gutov
2020-04-04 17:47                                                         ` Eli Zaretskii
2020-04-04 18:02                                                           ` Dmitry Gutov
2020-04-04 23:01                                                             ` Stefan Monnier
2020-04-06 14:25                                                               ` Yuan Fu
2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
2020-04-04 17:29                                                     ` Dmitry Gutov
2020-04-04 17:38                                                       ` Eli Zaretskii
2020-04-04 17:57                                                         ` Dmitry Gutov
2020-03-31 16:13                 ` Alan Third
2020-03-31 17:55                   ` Eli Zaretskii
2020-03-30  3:35     ` Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
2020-03-30  6:02       ` Eli Zaretskii
2020-03-30 13:33         ` Stefan Monnier
2020-03-30 14:09           ` Eli Zaretskii
2020-03-30 15:03             ` Stefan Monnier
2020-04-01  0:39               ` Stephen Leake
  -- strict thread matches above, loose matches on Subject: below --
2020-03-31 17:07 Reliable after-change-functions (via: Using incremental parsing in Emacs) Tuấn Anh Nguyễn
2020-03-31 17:50 ` Eli Zaretskii
2020-04-01  6:17   ` Tuấn Anh Nguyễn
2020-04-01 13:26     ` Eli Zaretskii
2020-04-01 15:47       ` Jorge Javier Araya Navarro
2020-04-01 16:07         ` Eli Zaretskii
2020-04-01 17:55       ` Tuấn-Anh Nguyễn
2020-04-01 19:33         ` Eli Zaretskii
2020-04-01 23:38           ` Stephen Leake
2020-04-02  0:25             ` Stephen Leake
2020-04-02  2:46             ` Stefan Monnier
2020-04-02  4:36               ` Tuấn-Anh Nguyễn
2020-04-02 14:44               ` Eli Zaretskii
2020-04-02 15:19                 ` Stefan Monnier
2020-04-02  5:21             ` Tuấn-Anh Nguyễn
2020-04-02 14:36             ` Eli Zaretskii
2020-04-03  2:27               ` Stephen Leake
2020-04-03  7:43                 ` Eli Zaretskii
2020-04-03 17:45                   ` Stephen Leake
2020-04-03 18:31                     ` Eli Zaretskii
2020-04-04  0:04                       ` Stephen Leake
2020-04-04  7:13                         ` Eli Zaretskii
2020-04-02  4:21           ` Tuấn-Anh Nguyễn
2020-04-02  5:19             ` Jorge Javier Araya Navarro
2020-04-02  9:29               ` Stephen Leake
2020-04-02 10:37             ` Andrea Corallo
2020-04-02 11:14               ` Tuấn-Anh Nguyễn
2020-04-02 13:02             ` Stefan Monnier
2020-04-02 15:06               ` Eli Zaretskii
2020-04-02 15:02             ` Eli Zaretskii
2020-04-03 14:34               ` Tuấn-Anh Nguyễn
