emacs-tree-sitter and Emacs

unofficial mirror of emacs-devel@gnu.org 

* emacs-tree-sitter and Emacs
@ 2020-03-30  3:23 Jorge Javier Araya Navarro
  2020-03-30 13:07 ` Eli Zaretskii
  2020-03-30 14:11 ` Stefan Monnier
  0 siblings, 2 replies; 46+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-30  3:23 UTC
  To: emacs-devel

Hi!

I'm the second contributor to emacs-tree-sitter. A fine dude linked me to
the mail thread where emacs-tree-sitter was mentioned, and I went through
it.

First off, we have a ticket open in the issue tracker for making this
project available on GNU ELPA, and Ubolonton seems willing to make that a
reality. However, this package has some special requirements that make it
impossible to submit to GNU ELPA or MELPA: for instance, we need to ship
grammars for some languages that require compiling, and AFAIK there is no
way to build external dependencies in *ELPA.[1]

That said, AFAIK most GNU/Linux distributions ship Emacs with the modules
feature turned off, which I personally find worrisome (and maybe Ubolonton
would agree with me): I would like everybody to enjoy the benefits of this
project without having to recompile Emacs. I'm not sure whether this
happens because Emacs itself ships with the flag turned off by default;
clarification about this is welcome.
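Whether a particular Emacs binary can load modules at all can be checked
from Lisp: in Emacs 25 and later, `module-file-suffix` holds the module
file extension (such as ".so") when module support was compiled in, and
nil otherwise. A minimal sketch; the function name is hypothetical:

```elisp
(defun my-modules-supported-p ()
  "Return t if this Emacs can load dynamic modules, else nil.
`module-file-suffix' is nil when Emacs was built without module
support, and does not exist at all before Emacs 25."
  (and (boundp 'module-file-suffix)
       module-file-suffix
       t))

;; Example: M-: (my-modules-supported-p) RET, or:
(message "Dynamic modules %s"
         (if (my-modules-supported-p) "supported" "NOT supported"))
```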

I have gone through th0rex's pull request[2], and nothing suggests that
hooks are being used to fontify buffers. The only hook we have is
`tree-sitter--after-change', which re-parses and keeps the tree updated;
the plan seems to be to use font-lock. There are some "edge cases" in
fontification, for instance multiple-language files, for which the
project still has to find an answer[3].

Indeed, projects like this one and lsp have their entire development
cycle confined to the comfort of GitHub, but (and I only speak for
myself) it never crossed my mind to send an email here requesting any
kind of help or advice; the time I needed a code review for one of my
branches while the project owner was absent, I actually made a post on
Reddit. Don't get me wrong: I (and I presume this is the case for
Ubolonton too) have no beef with emacs-devel or any of the maintainers;
it's just that emacs-devel wasn't front and center in my mind when I had
to seek help.

th0rex informed us that he won't be able to keep working on his pull
request for some time (I think some months are left to go), but after
that he would present something better than what is currently shown[4].
In any case, he had some issues with fontification, and I would
appreciate it if someone here with more experience in Emacs Lisp and
font-lock could take a look and comment on the pull request. I understand
font-lock is documented, but for me at least the documentation was too
dense to wrap my mind around; I think code examples would come in handy,
though I'm not sure if the suggestion sounds absurd.
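For illustration, a minimal font-lock setup of the kind being asked for:
a major mode that supplies a keyword list through `font-lock-defaults`.
The mode, the variable and the keyword regexp are hypothetical; real
modes usually add syntactic fontification (strings, comments) via the
syntax table, and often several keyword levels:

```elisp
(require 'font-lock)

(defvar my-demo-font-lock-keywords
  ;; Each element is (REGEXP . FACE); \_< and \_> match symbol boundaries.
  '(("\\_<\\(begin\\|end\\|is\\|procedure\\)\\_>" . font-lock-keyword-face))
  "Hypothetical keyword list for `my-demo-mode'.")

(define-derived-mode my-demo-mode prog-mode "MyDemo"
  "Hypothetical major mode showing a minimal font-lock setup."
  ;; KEYWORDS is given as a symbol whose value is the keyword list.
  (setq-local font-lock-defaults '(my-demo-font-lock-keywords)))
```

`font-lock-ensure` fontifies the whole buffer on demand, which is
convenient for checking such a setup from a test or in batch mode.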

Finally, I'm planning to tackle indentation for my next pull request. It
came as a surprise that Emacs has no "central" indentation engine, and
that instead each major mode ships with its own. Maybe someone here could
point me in the right direction? It seems this entry in the Emacs Lisp
reference manual about indentation covers the facilities common to the
indentation engines of all major modes[5].

Hopefully I can eventually join the conversation. I'm looking forward to
the replies.

[1]: https://github.com/ubolonton/emacs-tree-sitter/issues/1
[2]: https://github.com/ubolonton/emacs-tree-sitter/pull/16
[3]: https://github.com/ubolonton/emacs-tree-sitter/issues/33
[4]: https://github.com/ubolonton/emacs-tree-sitter/pull/16#issuecomment-591402521
[5]: https://www.gnu.org/software/emacs/manual/html_node/elisp/Indentation.html

* Re: emacs-tree-sitter and Emacs
  2020-03-30  3:23 emacs-tree-sitter and Emacs Jorge Javier Araya Navarro
@ 2020-03-30 13:07 ` Eli Zaretskii
  2020-03-30 14:00   ` Stefan Monnier
  2020-04-01  0:27   ` Stephen Leake
  2020-03-30 14:11 ` Stefan Monnier
  1 sibling, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-03-30 13:07 UTC
  To: Jorge Javier Araya Navarro; +Cc: emacs-devel

> From: Jorge Javier Araya Navarro <jorge@esavara.cr>
> Date: Sun, 29 Mar 2020 21:23:49 -0600
> 
> I have gone through th0rex's pull request[2] and nothing suggests that hooks are being used in order to do
> the fontification of buffers, the only hook we have is `tree-sitter--after-change', it does re-parsing and keeps
> the tree updated

Can you explain why an after-change hook is at all needed?  Why not
pass to tree-sitter the chunk that jit-lock is going to fontify?
AFAIU, tree-sitter is well suited for parsing incomplete source
chunks.  OTOH, using an after-change hook has its downsides, even if we
disregard the slow-down (which I wouldn't).
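Passing only the chunk jit-lock is about to fontify could look roughly
like the sketch below. It relies on the real `jit-lock-register` entry
point: a registered function is called with START and END and, since
Emacs 25, may return `(jit-lock-bounds BEG . END)` to report the region
it actually covered. The scan for Ada-style `--` comments is a
hypothetical placeholder for a call into a real parser, and the sketch
assumes it is the buffer's only fontification function (it clears
existing faces in the chunk):

```elisp
(require 'jit-lock)

(defun my-fontify-chunk (start end)
  "Fontify the chunk START..END that jit-lock is about to display.
Only this chunk, widened to whole lines, is examined; no
after-change hook is involved.  The scan for `--' comments is a
hypothetical placeholder for a real parser."
  (let ((beg (save-excursion (goto-char start) (line-beginning-position)))
        (fin (save-excursion (goto-char end) (line-end-position))))
    ;; Drop stale faces in the chunk before applying new ones.
    (remove-text-properties beg fin '(face nil))
    (save-excursion
      (goto-char beg)
      (while (search-forward "--" fin t)
        (put-text-property (- (point) 2) (line-end-position)
                           'face 'font-lock-comment-face)
        ;; The rest of this line is comment; resume on the next one.
        (end-of-line)))
    ;; Tell jit-lock which region was really fontified.
    `(jit-lock-bounds ,beg . ,fin)))

;; In a (hypothetical) major mode body:
;;   (jit-lock-register #'my-fontify-chunk)
```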

* Re: emacs-tree-sitter and Emacs
  2020-03-30 13:07 ` Eli Zaretskii
@ 2020-03-30 14:00   ` Stefan Monnier
  2020-04-01  0:08     ` Stephen Leake
  2020-04-01  0:27   ` Stephen Leake
  1 sibling, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2020-03-30 14:00 UTC
  To: Eli Zaretskii; +Cc: Jorge Javier Araya Navarro, emacs-devel

>> I have gone through th0rex's pull request[2] and nothing suggests that
>> hooks are being used in order to do
>> the fontification of buffers, the only hook we have is
>> `tree-sitter--after-change', it does re-parsing and keeps
>> the tree updated
>
> Can you explain why an after-change hook is at all needed?

IIUC `tree-sitter--after-change` is not a function placed on
`after-change-functions` but a hook which tree-sitter runs after its
parse tree has been changed.  The "tree-sitter-highlight" feature uses
this hook to be informed which parts of the buffer need to be
re-fontified after a change (which can be much more than just the part
of the buffer that was changed).
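On the receiving end, such a hook needs little more than to mark the
reported ranges as stale and let jit-lock refontify them at the next
redisplay. A minimal sketch, assuming a hypothetical abnormal hook
`my-parse-tree-changed-functions` whose functions receive a list of
(BEG . END) ranges (this is not emacs-tree-sitter's actual interface);
`font-lock-flush` is the real Emacs 25+ function for declaring a
region's fontification out of date:

```elisp
(require 'font-lock)

(defvar my-parse-tree-changed-functions nil
  "Hypothetical abnormal hook run after the parse tree is updated.
Each function receives a list of (BEG . END) ranges whose syntax
changed; these can extend well beyond the edited text.")

(defun my-refontify-changed-ranges (ranges)
  "Declare the fontification of each range in RANGES out of date.
`font-lock-flush' only marks the text; jit-lock refontifies it when
it is next displayed."
  (dolist (range ranges)
    (font-lock-flush (car range) (cdr range))))

(add-hook 'my-parse-tree-changed-functions #'my-refontify-changed-ranges)

;; The parser side would then call, after updating its tree:
;;   (run-hook-with-args 'my-parse-tree-changed-functions changed-ranges)
```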

        Stefan

* Re: emacs-tree-sitter and Emacs
  2020-03-30  3:23 emacs-tree-sitter and Emacs Jorge Javier Araya Navarro
  2020-03-30 13:07 ` Eli Zaretskii
@ 2020-03-30 14:11 ` Stefan Monnier
  2020-03-30 17:00   ` Jorge Javier Araya Navarro
  1 sibling, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2020-03-30 14:11 UTC
  To: Jorge Javier Araya Navarro; +Cc: emacs-devel

> First off, in the issue tracker we have a ticket open for making this
> project available on GNU ELPA, Ubolonton seems willing to make that
> a reality

That would be neat.

> but this package has some special requirements that would make
> impossible to submit it to GNU ELPA or MELPA, for instance, we need to
> ship with grammars for some languages that require compiling and AFAIK
> there is no way to build external dependencies in *ELPA.[1]

Indeed, the GNU ELPA infrastructure is too weak to support such a thing
right now, but that's a problem we need to fix anyway, so it just means
we should work on it.

> Finally, I'm planning to tackle indentation for my next pull request.
> It come as a surprise that Emacs has no "central" indentation engine
> but that instead each major mode ships with its own indentation
> engine.

It's not so surprising if you think about it: indentation requires
parsing, so "a central indentation engine" requires something like
tree-sitter ;-)

SMIE is the closest there is so far (contrary to tree-sitter it uses
a very simple parsing strategy, which is just barely sufficient for
"typical" indentation cases).

CC-mode has another engine, which is used for several languages.

And finally, there's `wisi` on GNU ELPA, it aims to be very generic and
pretty flexible, but AFAIK it's only used by `ada-mode` so far.

        Stefan

* Re: emacs-tree-sitter and Emacs
  2020-03-30 14:11 ` Stefan Monnier
@ 2020-03-30 17:00   ` Jorge Javier Araya Navarro
  2020-03-30 17:07     ` Dmitry Gutov
  2020-03-30 17:22     ` Stefan Monnier
  0 siblings, 2 replies; 46+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-30 17:00 UTC
  To: Stefan Monnier; +Cc: emacs-devel

I see, so in your opinion having a central engine wouldn't be bad
architecture for Emacs, even though major modes currently have their own
engines.

I wonder: is it possible for another package to "take the wheel" from
major modes when it comes to indentation, or would a better approach be
offering facilities for package authors and maintainers to outsource
their indentation to this central tree-sitter-based indentation engine?

I'm trying to picture what this would look like.

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:00   ` Jorge Javier Araya Navarro
@ 2020-03-30 17:07     ` Dmitry Gutov
  2020-03-30 17:09       ` Jorge Javier Araya Navarro
  2020-03-30 17:22     ` Stefan Monnier
  1 sibling, 1 reply; 46+ messages in thread
From: Dmitry Gutov @ 2020-03-30 17:07 UTC
  To: Jorge Javier Araya Navarro, Stefan Monnier; +Cc: emacs-devel

On 30.03.2020 20:00, Jorge Javier Araya Navarro wrote:
> I wonder: is possible for any other package to "take the wheel" from 
> major modes when it comes to indentation

A minor mode can set its own indent-line-function.

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:07     ` Dmitry Gutov
@ 2020-03-30 17:09       ` Jorge Javier Araya Navarro
  0 siblings, 0 replies; 46+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-30 17:09 UTC
  To: Dmitry Gutov; +Cc: Stefan Monnier, emacs-devel

Ah, so there is a way.

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:00   ` Jorge Javier Araya Navarro
  2020-03-30 17:07     ` Dmitry Gutov
@ 2020-03-30 17:22     ` Stefan Monnier
  2020-03-30 17:34       ` Jorge Javier Araya Navarro
                         ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Stefan Monnier @ 2020-03-30 17:22 UTC
  To: Jorge Javier Araya Navarro; +Cc: emacs-devel

> I see, so in your opinion having a central engine won't be a bad
> architecture design for Emacs despite major modes having their own engine.

On the contrary: moving indentation to tree-sitter would be of great benefit.

> I wonder: is possible for any other package to "take the wheel" from major
> modes when it comes to indentation

I think it's a simple matter of

    (setq-local indent-line-function #'tree-sitter-indent-line)


-- Stefan

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:22     ` Stefan Monnier
@ 2020-03-30 17:34       ` Jorge Javier Araya Navarro
  2020-03-30 17:50       ` Stefan Monnier
  2020-04-01  0:30       ` Stephen Leake
  2 siblings, 0 replies; 46+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-03-30 17:34 UTC
  To: Stefan Monnier; +Cc: emacs-devel

I'm so ready to start a new branch in emacs-tree-sitter, haha.

Thanks for the tips; I will come back if anything comes up.

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:22     ` Stefan Monnier
  2020-03-30 17:34       ` Jorge Javier Araya Navarro
@ 2020-03-30 17:50       ` Stefan Monnier
  2020-04-01  0:30       ` Stephen Leake
  2 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier @ 2020-03-30 17:50 UTC
  To: Jorge Javier Araya Navarro; +Cc: emacs-devel

>     (setq-local indent-line-function #'tree-sitter-indent-line)

And if you want this change to be easily reversible:

    (add-function :override (local 'indent-line-function)
                  #'tree-sitter-indent-line)

so you can undo it later with

    (remove-function (local 'indent-line-function)
                     #'tree-sitter-indent-line)
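The two snippets above fit naturally into a buffer-local minor mode that
installs the override when enabled and removes it when disabled, which
is one way for a package to "take the wheel" without touching the major
mode. A sketch; `my-ts-indent-mode` and `my-ts-indent-line` are
hypothetical, and the stub merely indents every line to column 4 where a
real implementation would consult the parse tree:

```elisp
(require 'nadvice)

(defun my-ts-indent-line ()
  "Hypothetical parser-based indentation function.
Stub: a real version would compute the column from the parse tree."
  (indent-line-to 4))

(define-minor-mode my-ts-indent-mode
  "Delegate indentation in this buffer to `my-ts-indent-line'."
  :lighter " TS-Ind"
  (if my-ts-indent-mode
      (add-function :override (local 'indent-line-function)
                    #'my-ts-indent-line)
    (remove-function (local 'indent-line-function)
                     #'my-ts-indent-line)))
```

Because the override is layered on the buffer-local value, turning the
mode off restores whatever indentation function the major mode had
installed.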


-- Stefan

* Re: emacs-tree-sitter and Emacs
  2020-03-30 14:00   ` Stefan Monnier
@ 2020-04-01  0:08     ` Stephen Leake
  0 siblings, 0 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-01  0:08 UTC
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I have gone through th0rex's pull request[2] and nothing suggests that
>>> hooks are being used in order to do
>>> the fontification of buffers, the only hook we have is
>>> `tree-sitter--after-change', it does re-parsing and keeps
>>> the tree updated
>>
>> Can you explain why an after-change hook is at all needed?
>
> IIUC `tree-sitter--after-change` is not a function placed on
> `after-change-function` but a hook which tree-sitter runs after its
> parse tree has been changed.  The "tree-sitter-highlight" feature uses
> this hook to be informed which parts of the buffer need to be
> re-fontified after a change (which can be much more than just the part
> of the buffer that was changed).

But presumably the parse is triggered by a function on
after-change-functions.

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-03-30 13:07 ` Eli Zaretskii
  2020-03-30 14:00   ` Stefan Monnier
@ 2020-04-01  0:27   ` Stephen Leake
  2020-04-01 13:20     ` Eli Zaretskii
  2020-04-01 13:28     ` Stefan Monnier
  1 sibling, 2 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-01  0:27 UTC
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Jorge Javier Araya Navarro <jorge@esavara.cr>
>> Date: Sun, 29 Mar 2020 21:23:49 -0600
>> 
>> I have gone through th0rex's pull request[2] and nothing suggests
>> that hooks are being used in order to do
>> the fontification of buffers, the only hook we have is
>> `tree-sitter--after-change', it does re-parsing and keeps
>> the tree updated
>
> Can you explain why an after-change hook is at all needed?  Why not
> pass to tree-sitter the chunk that jit-lock is going to fontify?
> AFAIU, tree-sitter is well suited for parsing incomplete source
> chunks.  

It depends on what you mean by "incomplete". If the full buffer is
syntactically correct, and the "incomplete" source is a region chosen by
font-lock, then yes; tree-sitter can efficiently update the parse tree
to match that region.

On the other hand, if the buffer is not syntactically correct, there is
no guarantee that tree-sitter can handle it; it's supposed to be
"robust" to syntax errors. One of the links to an error recovery
algorithm is broken; the other link describes an algorithm that is not
as sophisticated as the one wisi uses (I'm working on writing a paper
about that).

> OTOH, using an after-change hook has its downsides, even if disregard
> slow-down (which I wouldn't).

In wisi (used by ada-mode), the after-change hooks just record what
regions have been changed; font-lock then triggers a parse if the region
being fontified contains or is after a change region. Navigation and
indent also trigger parses.
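A sketch of the "record now, parse later" scheme described here: the
change hook only remembers the lowest changed position, and the
fontification code asks whether a chunk contains or follows a change
before deciding to parse. All `my-` names are hypothetical; this is not
wisi's actual code:

```elisp
(defvar-local my-dirty-start nil
  "Lowest buffer position changed since the last parse, or nil.")

(defun my-record-change (beg _end _old-len)
  "Function for `after-change-functions': remember where edits start.
No parsing happens here, so typing stays cheap."
  (setq my-dirty-start (if my-dirty-start (min my-dirty-start beg) beg)))

(defun my-needs-reparse-p (_start end)
  "Return non-nil if the chunk START..END must be reparsed first.
A change anywhere before END can alter how the chunk is interpreted."
  (and my-dirty-start (<= my-dirty-start end)))

(defun my-parse-done ()
  "Call after a successful parse: nothing is stale any more."
  (setq my-dirty-start nil))

;; In the major mode body:
;;   (add-hook 'after-change-functions #'my-record-change nil t)
```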

By default font-lock runs after every character typed, which is often
too slow in an ada-mode buffer; I always set jit-lock-defer-time to
1.0 seconds. 
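For reference, this deferral is controlled by the single global user
option `jit-lock-defer-time`. Judging from jit-lock.el, the idle timer
it relies on is created when jit-lock is turned on in a buffer, so the
init file is the natural place to set it; the values are the ones
mentioned in this thread:

```elisp
;; nil (the default): fontify during redisplay, without deferral.
;; A number: defer fontification until Emacs has been idle that many
;; seconds.
(setq jit-lock-defer-time 1.0)
;; A shorter delay, e.g.:
;; (setq jit-lock-defer-time 0.25)
```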

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-03-30 17:22     ` Stefan Monnier
  2020-03-30 17:34       ` Jorge Javier Araya Navarro
  2020-03-30 17:50       ` Stefan Monnier
@ 2020-04-01  0:30       ` Stephen Leake
  2 siblings, 0 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-01  0:30 UTC
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I see, so in your opinion having a central engine won't be a bad
>> architecture design for Emacs despite major modes having their own engine.
>
> On the contrary: moving indentation to tree-sitter would be of great
> benefit.

Tree-sitter is just one way to implement a parser. Implementing
indentation as part of a tree-sitter Emacs package would be missing the
chance to use other parsers.

wisi provides indentation, navigation, and font-lock, with the
infrastructure to allow any parser; it should be straightforward to use
tree-sitter as a wisi parse backend.

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-01  0:27   ` Stephen Leake
@ 2020-04-01 13:20     ` Eli Zaretskii
  2020-04-01 19:51       ` Stephen Leake
  2020-04-01 13:28     ` Stefan Monnier
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-01 13:20 UTC
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Tue, 31 Mar 2020 16:27:35 -0800
> 
> > OTOH, using an after-change hook has its downsides, even if disregard
> > slow-down (which I wouldn't).
> 
> In wisi (used by ada-mode), the after-change hooks just record what
> regions have been changed; font-lock then triggers a parse if the region
> being fontified contains or is after a change region. Navigation and
> indent also trigger parses.

Can you tell in more detail why you need to rely on these hooks?  They
shouldn't be necessary, AFAIU.  And they cannot pick up every relevant
change; for example, what happens if some face used for font-lock is
modified?

> By default font-lock runs after every character typed

No, it only runs when redisplay kicks in.  If you type very quickly,
it won't run for every character.  At least AFAIR.

> which is often too slow in an ada-mode buffer; I always set
> jit-lock-defer-time to 1.0 seconds.

That's too long to be pleasant on display, IMO.  A second is a very
long time in this context.

* Re: emacs-tree-sitter and Emacs
  2020-04-01  0:27   ` Stephen Leake
  2020-04-01 13:20     ` Eli Zaretskii
@ 2020-04-01 13:28     ` Stefan Monnier
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier @ 2020-04-01 13:28 UTC
  To: Stephen Leake; +Cc: emacs-devel

> By default font-lock runs after every character typed, which is often
> too slow in an ada-mode buffer; I always set jit-lock-defer-time to
> 1.0 seconds.

"Immediate font-lock" was fast enough when font-lock was introduced
(around 1995, AFAICT), so it would be a pity if we had to give it up.


        Stefan

* Re: emacs-tree-sitter and Emacs
  2020-04-01 13:20     ` Eli Zaretskii
@ 2020-04-01 19:51       ` Stephen Leake
  2020-04-02 14:03         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Stephen Leake @ 2020-04-01 19:51 UTC
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Tue, 31 Mar 2020 16:27:35 -0800
>> 
>> > OTOH, using an after-change hook has its downsides, even if disregard
>> > slow-down (which I wouldn't).
>> 
>> In wisi (used by ada-mode), the after-change hooks just record what
>> regions have been changed; font-lock then triggers a parse if the region
>> being fontified contains or is after a change region. Navigation and
>> indent also trigger parses.
>
> Can you tell in more detail why you need to rely on these hooks?  They
> shouldn't be necessary, AFAIU.  

It is an optimization choice.

In an unmodified buffer that is smaller than 100,000 characters
(default setting of wisi-partial-parse-threshold), the entire buffer is
parsed once; that applies faces to all the Ada identifiers that need
faces (standard font-lock regexp handles the reserved words). Then when
font-lock fontifies a region, no parsing is needed.
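The size-based choice between one whole-buffer parse and parsing only
the requested region might be sketched as follows.
`my-partial-parse-threshold` is a hypothetical stand-in for
`wisi-partial-parse-threshold`, using the 100,000-character default
mentioned above, and none of this is wisi's actual code:

```elisp
(defvar my-partial-parse-threshold 100000
  "Buffers smaller than this many characters get one full parse.")

(defun my-parse-scope (start end)
  "Return (BEG . END), the region to parse before fontifying START..END.
Small buffers are parsed whole once, after which font-lock can reuse
the results for every chunk; large buffers are parsed piecewise,
relying more on error recovery in exchange for speed."
  (if (< (buffer-size) my-partial-parse-threshold)
      (cons (point-min) (point-max))
    (cons start end)))
```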

Indent is similar; the parse sets text properties holding the indent for
each line; indent-region then applies them.
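Caching the parser's indentation results in text properties, so that
indenting a line only reads a number back, might look like this. The
property name `my-indent` and both functions are hypothetical, not
wisi's implementation. Any edit can make the cached columns stale (and
re-indenting moves or deletes the character carrying the property), so a
real implementation would reparse before relying on them:

```elisp
(defun my-store-indent (pos column)
  "Record COLUMN as the computed indentation of the line at POS.
Hypothetical parser-side helper."
  (save-excursion
    (goto-char pos)
    (let ((bol (line-beginning-position)))
      ;; Caching is not a user edit: keep the modified flag and undo clean.
      (with-silent-modifications
        (put-text-property bol (min (1+ bol) (point-max))
                           'my-indent column)))))

(defun my-indent-line-from-cache ()
  "Indent the current line to the column cached by the last parse.
Do nothing if no column is cached for this line."
  (let ((col (get-text-property (line-beginning-position) 'my-indent)))
    (when col
      (indent-line-to col))))
```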

When the user starts editing, and font-lock is requested, we need
to know the changes before the font-lock region, because that can affect
the interpretation of the code in the region. Worst case is adding an
opening ". Adding/deleting "begin" or "end" can change indent
(equivalent to adding/deleting { or } in C).

If the default setting of jit-lock-defer-time (ie nil) is used, then
font-lock runs immediately after each change, and the after-change hooks
are not needed. But as I have mentioned, I always run with
jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
some cases), so the change hooks are needed.

In addition, indent-region is run when the user types return or tab (or
otherwise invokes indent); there can easily be changes outside the line
or region being indented, again requiring change hooks.

The alternative, which does not require after-change hooks, is to always
do a full parse for every call of fontify-region or indent-region. That
is far too slow.

Note that Tree-Sitter requires one full parse of the buffer to generate
the parse tree that is later updated incrementally; in an unmodified
buffer, only that one parse is needed.

wisi can handle parsing only a small part of the file, but it produces
incorrect results more often in that case, since it relies on
error-correction to arrive at correct syntax. That's why partial parse
is only used on very large files, where the parser is _always_ too slow;
in most files, it is only too slow when there is a bad syntax error, and
recovery is slow.

> And they cannot pick up every relevant change; for example, what
> happens if some face used for font-lock is modified?

Yes, that is a flaw. Not likely to occur in everyday use, and wisi
provides wisi-parse-buffer to force a full parse for such situations.

As I mentioned elsewhere, wisi provides wisi-inhibit-parse for use when
an elisp author might be tempted to use inhibit-modification-hooks. 

>> By default font-lock runs after every character typed
>
> No, it only runs when redisplay kicks in.  If you type very quickly,
> it won't run for every character.  At least AFAIR.

What triggers redisplay?

In practice, I and other ada-mode users notice font-lock running after
each character, with the default setting of jit-lock-defer-time. There
is a comment in jit-lock.el indicating that the default value may have
been 0.25 at one point (I did not check the git history); perhaps you
are remembering that behavior?

For example, in Ada the comment-start is "--". No matter how fast I type
the two chars, ada-mode reports a syntax error after the first one.
Syntax errors are detected by a parse and reported via fringe marks, as
in flymake; it blinks after the key is pressed twice, appearing after
the first character is displayed, disappearing after the second is
displayed. (I have just retested this in emacs from master).

I don't think there's anything in ada-mode that forces a redisplay
(except explicitly calling wisi-parse-buffer; that calls
font-lock-ensure). But I'd be happy to investigate further if you are
sure it should not work this way.

The elisp manual section "Forcing redisplay" says "Emacs normally tries
to redisplay the screen whenever it waits for input." After I type the
first character, it is no longer waiting for input, it is processing
that character. I assume here "process that char code" includes running
after-change-functions, which is (small) elisp code. But I guess after
processing that char, before calling redisplay, it checks if there is
more input, which should be true if I type fast enough. Perhaps "process
that char code" is faster than the combination of my fingers and the
keyboard char send rate?

Hmm. M-: (execute-kbd-macro "--") does not show a syntax-error fringe
blink. I'm not sure if that is relevant here.

>> which is often too slow in an ada-mode buffer; I always set
>> jit-lock-defer-time to 1.0 seconds.
>
> That's too long to be pleasant on display, IMO.  A second is a very
> long time in this context.

Other people have made the same complaint. I'm probably biased in
accepting the slow parser behavior (I know how hard it would be to
improve it :). Migrating from an external process to a module might
help. Changing from partial parse to incremental parse might help.

Setting jit-lock-defer-time to 0.25 eliminates the fringe blink when
typing "--". If I watch very closely, I can just barely see the delay
between displaying the last char and the change of color (from black to
red).

I'll run with 0.25 for a while; the parser has gotten better since the
last time I changed that, so maybe that's good enough now.

I mentioned above that the parser is only too slow when there is a bad
syntax error, and recovery is slow. However, that is the typical case
while editing code. 

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-01 19:51       ` Stephen Leake
@ 2020-04-02 14:03         ` Eli Zaretskii
  2020-04-02 14:27           ` Michael Welsh Duggan
  2020-04-03  1:55           ` Stephen Leake
  0 siblings, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 14:03 UTC
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Wed, 01 Apr 2020 11:51:40 -0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Can you tell in more detail why you need to rely on these hooks?  They
> > shouldn't be necessary, AFAIU.  
> 
> It is an optimization choice.
> 
> In an unmodified buffer, that is smaller than 100,000 characters
> (default setting of wisi-partial-parse-threshold), the entire buffer is
> parsed once; that applies faces to all the Ada identifiers that need
> faces (standard font-lock regexp handles the reserved words). Then when
> font-lock fontifies a region, no parsing is needed.

But why do you need that initial full parse in the first place?  Is
parsing parts of the buffer so much harder?

> Indent is similar; the parse sets text properties holding the indent for
> each line; indent-region then applies them.

Indent is a different use case: it happens by user command, and thus
has different time restrictions than redisplay.

> If the default setting of jit-lock-defer-time (ie nil) is used, then
> font-lock runs immediately after each change, and the after-change hooks
> are not needed. But as I have mentioned, I always run with
> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
> some cases), so the change hooks are needed.

AFAIU, tree-sitter and similar parsers are supposed to be much faster,
so the problem with slow parsing, and all the solutions to alleviate
that problem, may not be necessary, if they are the only reason for
using the hooks.

> The alternative to not requiring after-change hooks is to always do a full
> parse, for ever call of fontify-region or indent-region. That is far too
> slow.

Even for indentation, a full parse should not be needed.  You need to
only parse the outermost enclosing function/procedure, right?  That's
rarely the full buffer, except when the buffer is small.
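Finding that outermost enclosing definition can often reuse Emacs'
generic defun motion; the resulting region's text could then be handed
to the parser instead of the whole buffer. A sketch, assuming the major
mode's `beginning-of-defun`/`end-of-defun` behave sensibly (as in Emacs
Lisp, where top-level forms start with "(" in column 0);
`my-enclosing-defun-region` is hypothetical:

```elisp
(defun my-enclosing-defun-region (pos)
  "Return (BEG . END) of the top-level definition around POS.
This is the region a parser would need instead of the whole buffer."
  (save-excursion
    (goto-char pos)
    (end-of-defun)
    (let ((end (point)))
      (beginning-of-defun)
      (cons (point) end))))

;; The text to pass to the parser would then be, e.g.:
;;   (let ((r (my-enclosing-defun-region (point))))
;;     (buffer-substring-no-properties (car r) (cdr r)))
```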

> Note that Tree-Sitter requires one full parse of the buffer to generate
> the parse tree that is later updated incrementally; in an unmodified
> buffer, only that one parse is needed.

Tree-sitter cannot know what the full buffer holds, so nothing
prevents us from passing it just part of the buffer.  After all,
tree-sitter should be able to do a decent job when the part we pass to
it actually _is_ all we have in the buffer, right?

> > And they cannot pick up every relevant change; for example, what
> > happens if some face used for font-lock is modified?
> 
> Yes, that is a flaw. Not likely to occur in everyday use

Redisplay cannot rely on something being "unlikely", because it's
expected to produce correct results in all situations.  Incorrect
display is one of the worst bugs that can happen in an editor.  In a
modified buffer that is not yet syntactically correct we can get away
with slightly incorrect fontifications, but missing face changes will
produce horribly incorrect results even if nothing has changed
syntactically.

That is why I think we should try to avoid using hooks for
fontification as much as we can.  I can understand why fontification
methods that are too slow want to get some help from hooks, but when
we design and implement novel fontification methods using fast
parsers, we should first try doing that without any hooks, because we
already know, from the bitter experience of Emacs 19, that using hooks
is a dead end.  We developed jit-lock in Emacs 21 precisely to avoid
using such hooks, because we realized that those old methods won't
work well enough.

> >> By default font-lock runs after every character typed
> >
> > No, it only runs when redisplay kicks in.  If you type very quickly,
> > it won't run for every character.  At least AFAIR.
> 
> What triggers redisplay?

When Emacs is about to read input, if no input is available, it
performs redisplay.  IOW, Emacs enters redisplay when it's about to
become idle.

> In practice, I and other ada-mode users notice font-lock running after
> each character, with the default setting of jit-lock-defer-time. There
> is a comment in jit-lock.el indicating that the default value may have
> been 0.25 at one point (I did not check the git history); perhaps you
> are remembering that behavior?

The 0.25 value is just a reminder of the default timing of a similar
feature in lazy-lock (RIP), used in Emacs 19.  AFAIR, we never had
jit-lock-defer-time non-nil by default in Emacs, because during
development of Emacs 21 the consensus was that its effect is too
surprising, and because (at least in those days) the default jit-lock
was fast enough for us to be able to leave the deferred fontification
disabled.

> For example, in Ada the comment-start is "--". No matter how fast I type
> the two chars, ada-mode reports a syntax error after the first one.

That means you don't type fast enough, at least relative to your CPU
speed.

> I don't think there's anything in ada-mode that forces a redisplay
> (except explicitly calling wisi-parse-buffer; that calls
> font-lock-ensure). But I'd be happy to investigate further if you are
> sure it should not work this way.

In other similar situations (e.g., in Flyspell mode) we wait for some
non-zero idle time before actually running the code which could react
to slow typing with annoying messages.
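This sketch models waiting for some idle time before reporting. In Emacs this is typically done with `run-with-idle-timer`. The Python below only models the timing, and the function name and delay values are hypothetical. A burst of keystrokes closer together than the delay produces a single check after the last one, so the transient "-" state is never reported.

```python
def idle_check_times(keystrokes, delay):
    """When a check deferred by an idle timer would run.

    keystrokes: sorted times (ms) of key presses.
    delay:      idle time (ms) required before the check runs.
    A check fires DELAY ms after a key press only if no further key
    arrives within that interval."""
    runs = []
    for i, t in enumerate(keystrokes):
        nxt = keystrokes[i + 1] if i + 1 < len(keystrokes) else None
        if nxt is None or nxt - t > delay:
            runs.append(t + delay)
    return runs


# "--" typed 120 ms apart with a 500 ms idle delay: one check, after "--".
print(idle_check_times([0, 120], delay=500))   # [620]
# A long pause after the first "-" lets the check see the partial state.
print(idle_check_times([0, 800], delay=500))   # [500, 1300]
```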

> The elisp manual section "Forcing redisplay" says "Emacs normally tries
> to redisplay the screen whenever it waits for input." After I type the
> first character, it is no longer waiting for input, it is processing
> that character. I assume here "process that char code" includes running
> after-change-functions, which is (small) elisp code. But I guess after
> processing that char, before calling redisplay, it checks if there is
> more input, which should be true if I type fast enough. Perhaps "process
> that char code" is faster than the combination of my fingers and the
> keyboard char send rate?

Yes, most probably.

> Hmm. M-x (execute-kbd-macro "--") does not show a syntax-error fringe
> blink. I'm not sure if that is relevant here.

I think it is, because it injects the characters through the same
input queue as when you type.  It just does that much faster.

> I mentioned above that the parser is only too slow when there is a bad
> syntax error, and recover is slow. However, that is the typical case
> while editing code. 

AFAIU, producing reasonably good results in this case is one of the
explicit design goals of tree-sitter.  So it might be much better in
these situations.  But I have no first-hand experience to tell if
that's indeed so.

* Re: emacs-tree-sitter and Emacs
  2020-04-02 14:03         ` Eli Zaretskii
@ 2020-04-02 14:27           ` Michael Welsh Duggan
  2020-04-02 15:15             ` Eli Zaretskii
  2020-04-02 15:33             ` martin rudalics
  2020-04-03  1:55           ` Stephen Leake
  1 sibling, 2 replies; 46+ messages in thread
From: Michael Welsh Duggan @ 2020-04-02 14:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stephen Leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Wed, 01 Apr 2020 11:51:40 -0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Can you tell in more detail why you need to rely on these hooks?  They
>> > shouldn't be necessary, AFAIU.  
>> 
>> It is an optimization choice.
>> 
>> In an unmodified buffer, that is smaller than 100,000 characters
>> (default setting of wisi-partial-parse-threshold), the entire buffer is
>> parsed once; that applies faces to all the Ada identifiers that need
>> faces (standard font-lock regexp handles the reserved words). Then when
>> font-lock fontifies a region, no parsing is needed.
>
> But why do you need that initial full parse in the first place?  Is
> parsing parts of the buffer so much harder?

I would think that you at least need to parse everything displayed and
everything before what is displayed.  (You need all prior context.  What
if someone opened a comment on line 1 and hasn't closed it, for
example?)  I don't, however, see a reason you couldn't defer sending the
rest until afterward, any more than you would have to if the file were
being typed in, one line at a time.

-- 
Michael Welsh Duggan
(md5i@md5i.com)



* Re: emacs-tree-sitter and Emacs
  2020-04-02 14:27           ` Michael Welsh Duggan
@ 2020-04-02 15:15             ` Eli Zaretskii
  2020-04-02 15:24               ` Michael Welsh Duggan
  2020-04-03  2:06               ` Stephen Leake
  2020-04-02 15:33             ` martin rudalics
  1 sibling, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 15:15 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: stephen_leake, emacs-devel

> From: Michael Welsh Duggan <mwd@md5i.com>
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,  emacs-devel@gnu.org
> Date: Thu, 02 Apr 2020 10:27:23 -0400
> 
> > But why do you need that initial full parse in the first place?  Is
> > parsing parts of the buffer so much harder?
> 
> I would think that you at least need to parse everything displayed and
> everything before what is displayed.  (You need all prior context.  What
> if someone opened a comment on line 1 and hasn't closed it, for
> example?)

Each buffer always knows which part of it remains unchanged.  When
fontification is invoked, it should start from that place.  So there's
no need to parse _all_ prior context.



* Re: emacs-tree-sitter and Emacs
  2020-04-02 15:15             ` Eli Zaretskii
@ 2020-04-02 15:24               ` Michael Welsh Duggan
  2020-04-02 16:10                 ` Eli Zaretskii
  2020-04-03  2:06               ` Stephen Leake
  1 sibling, 1 reply; 46+ messages in thread
From: Michael Welsh Duggan @ 2020-04-02 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Michael Welsh Duggan, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Michael Welsh Duggan <mwd@md5i.com>
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,  emacs-devel@gnu.org
>> Date: Thu, 02 Apr 2020 10:27:23 -0400
>> 
>> > But why do you need that initial full parse in the first place?  Is
>> > parsing parts of the buffer so much harder?
>> 
>> I would think that you at least need to parse everything displayed and
>> everything before what is displayed.  (You need all prior context.  What
>> if someone opened a comment on line 1 and hasn't closed it, for
>> example?)
>
> Each buffer always knows which part of it remains unchanged.  When
> fontification is invoked, it should start from that place.  So there's
> no need to parse _all_ prior context.

Why not?  Won't it miss that hypothetical opening comment delimiter, for
example?  And, if it's because we've already parsed it before, isn't
that what we're talking about?  The initial parse?  Because after that,
of course only changes are sent to the parser.

-- 
Michael Welsh Duggan
(mwd@cert.org)



* Re: emacs-tree-sitter and Emacs
  2020-04-02 14:27           ` Michael Welsh Duggan
  2020-04-02 15:15             ` Eli Zaretskii
@ 2020-04-02 15:33             ` martin rudalics
  1 sibling, 0 replies; 46+ messages in thread
From: martin rudalics @ 2020-04-02 15:33 UTC (permalink / raw)
  To: Michael Welsh Duggan, Eli Zaretskii; +Cc: Stephen Leake, emacs-devel

 > I would think that you at least need to parse everything displayed and
 > everything before what is displayed.  (You need all prior context.  What
 > if someone opened a comment on line 1 and hasn't closed it, for
 > example?

When I open a comment on line 1 of a buffer whose beginning is shown in
one window and whose end is shown in another window, I explicitly do not
expect the code in that other window to show up as commented out.  What
I expect and consider reasonable instead is that any parsing mechanism
considers my code as commented out up to the first open paren in column
0 it finds.  Only if there's no such paren may it assume that the
comment indeed extends to the end of the buffer.

Am I really the only person on this list who would be content with such
simple behavior?  If so, then I'm really getting too old for this kind
of job.
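Martin's heuristic can be sketched in Python, assuming C-style block comments. The function name and the set of "opening delimiters" are hypothetical choices. The idea is similar in spirit to the convention behind Emacs's `open-paren-in-column-0-is-defun-start`. An unterminated comment is taken to stop just before the first opening delimiter in column 0, and only runs to the end of the buffer if there is none.

```python
import re

# A line starting with an opening delimiter in column 0 (hypothetical set).
COLUMN0_OPEN = re.compile(r"^[({\[]", re.MULTILINE)


def comment_extent(text, start, opener="/*", closer="*/"):
    """End position of the block comment beginning at START.

    If the comment is closed, return the position after CLOSER.
    Otherwise apply the heuristic: the comment stops just before the
    first opening delimiter in column 0; failing that, it runs to the
    end of the buffer."""
    close = text.find(closer, start + len(opener))
    if close != -1:
        return close + len(closer)
    m = COLUMN0_OPEN.search(text, start + len(opener))
    return m.start() if m else len(text)


src = "/* oops\nint a;\n{\n  b();\n}\n"
print(repr(src[:comment_extent(src, 0)]))   # '/* oops\nint a;\n'
```

The code after the `{` in column 0 is displayed as ordinary code even though the comment was never closed.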

 > )  I don't, however, see a reason you couldn't defer sending the
 > rest until afterward, any more than you would have to if the file were
 > being typed in, one line at a time.

martin

* Re: emacs-tree-sitter and Emacs
  2020-04-02 15:24               ` Michael Welsh Duggan
@ 2020-04-02 16:10                 ` Eli Zaretskii
  2020-04-02 16:19                   ` Michael Welsh Duggan
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 16:10 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: mwd, stephen_leake, emacs-devel

> From: Michael Welsh Duggan <mwd@cert.org>
> Cc: Michael Welsh Duggan <mwd@md5i.com>, <stephen_leake@stephe-leake.org>,
>         <emacs-devel@gnu.org>
> Date: Thu, 02 Apr 2020 11:24:33 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Each buffer always knows which part of it remains unchanged.  When
> > fontification is invoked, it should start from that place.  So there's
> > no need to parse _all_ prior context.
> 
> Why not?  Won't it miss that hypothetical opening comment delimiter, for
> example?

No, because that was already taken care of when that comment delimiter
was inserted.



* Re: emacs-tree-sitter and Emacs
  2020-04-02 16:10                 ` Eli Zaretskii
@ 2020-04-02 16:19                   ` Michael Welsh Duggan
  2020-04-02 17:18                     ` Yuan Fu
  2020-04-02 18:27                     ` Eli Zaretskii
  0 siblings, 2 replies; 46+ messages in thread
From: Michael Welsh Duggan @ 2020-04-02 16:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mwd, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Michael Welsh Duggan <mwd@cert.org>
>> Cc: Michael Welsh Duggan <mwd@md5i.com>, <stephen_leake@stephe-leake.org>,
>>         <emacs-devel@gnu.org>
>> Date: Thu, 02 Apr 2020 11:24:33 -0400
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Each buffer always knows which part of it remains unchanged.  When
>> > fontification is invoked, it should start from that place.  So there's
>> > no need to parse _all_ prior context.
>> 
>> Why not?  Won't it miss that hypothetical opening comment delimiter, for
>> example?
>
> No, because that was already taken care of when that comment delimiter
> was inserted.

In which case it's covered by the paragraph you omitted.

-- 
Michael Welsh Duggan
(mwd@cert.org)



* Re: emacs-tree-sitter and Emacs
  2020-04-02 16:19                   ` Michael Welsh Duggan
@ 2020-04-02 17:18                     ` Yuan Fu
  2020-04-02 17:39                       ` Stefan Monnier
  2020-04-02 18:29                       ` Eli Zaretskii
  2020-04-02 18:27                     ` Eli Zaretskii
  1 sibling, 2 replies; 46+ messages in thread
From: Yuan Fu @ 2020-04-02 17:18 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: mwd, Eli Zaretskii, stephen_leake, emacs-devel

> I would think that you at least need to parse everything displayed and
> everything before what is displayed.  (You need all prior context.  What
> if someone opened a comment on line 1 and hasn't closed it, for
> example?)  I don't, however, see a reason you couldn't defer sending the
> rest until afterward, any more than you would have to if the file were
> being typed in, one line at a time.

Can we do something similar to the fontified text property? I.e., only parse the first screenful of text when a buffer is opened, and mark the rest as unparsed with a text property. Then, when we need to access the parse tree (for fontification, etc.), send all unparsed text before some point (e.g., the last visible char in the window) to the parser.

Yuan


* Re: emacs-tree-sitter and Emacs
  2020-04-02 17:18                     ` Yuan Fu
@ 2020-04-02 17:39                       ` Stefan Monnier
  2020-04-02 18:17                         ` Yuan Fu
  2020-04-02 18:29                       ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2020-04-02 17:39 UTC (permalink / raw)
  To: Yuan Fu
  Cc: mwd, Eli Zaretskii, stephen_leake, Michael Welsh Duggan,
	emacs-devel

> Can we do something similar to fontified text property? I.e., only parse the
> first screen full of text when a buffer is opened, and mark the rest as
> unparsed with text property. Then when we need to access the parse tree (for
> fontification, etc.), send all unparsed text before some point (e.g., last
> visible char in the window) to the parser.

If you follow the idea that parsing is always done "from point-min",
then you don't need text-properties: you only need to remember the
position up to which parsing has been done.

That's what `syntax-propertize` does, and it keeps the corresponding
info in `syntax-propertize--done` (it doesn't even need to be a marker
since any changes to the text before `syntax-propertize--done` should
cause it to be reset to the beginning of those modifications).
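A minimal Python model of the "parsed up to here" bookkeeping just described, loosely modelled on `syntax-propertize--done`. The class and method names are hypothetical, and positions are 0-based for simplicity. It also reproduces the cost of the two-window case discussed next.

```python
class ParsedUpTo:
    """Lazy "parse from point-min" bookkeeping, modelled loosely on
    `syntax-propertize--done' (names hypothetical; 0-based positions).

    Everything before self.done has been parsed.  Because any change
    before self.done resets it, a plain integer suffices; no marker is
    needed."""

    def __init__(self):
        self.done = 0
        self.chars_parsed = 0   # total work, to show the cost

    def ensure(self, pos):
        # Called before fontifying text up to POS.
        if pos > self.done:
            self.chars_parsed += pos - self.done   # parse [done, pos)
            self.done = pos

    def after_change(self, beg):
        # A change at BEG invalidates everything from BEG onward.
        self.done = min(self.done, beg)


# Two windows on a 100,000-char buffer: one at the start, one at the end.
p = ParsedUpTo()
p.ensure(100_000)      # window at the end: parse the whole buffer once
p.after_change(10)     # edit near the beginning (same length, for simplicity)
p.ensure(2_000)        # first window: cheap
p.ensure(100_000)      # second window: reparse almost everything again
print(p.chars_parsed)  # 199990
```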

As mentioned in some other message, this design becomes inefficient when
you have two windows displaying a large buffer, one near the end,
and the other near the beginning, and you make changes at the beginning
of the buffer.  I haven't seen any performance bug-reports or complaints
about it, so it appears that those circumstances are very rare.

BTW, one of the potential benefits of font-locking done via tree-sitter
is that it generally doesn't care about lines, so it should be possible
to make it work just as efficiently on long lines (that won't solve all
the problems we have with long lines, but still).

        Stefan

* Re: emacs-tree-sitter and Emacs
  2020-04-02 17:39                       ` Stefan Monnier
@ 2020-04-02 18:17                         ` Yuan Fu
  2020-04-02 18:26                           ` Stefan Monnier
  2020-04-03  2:16                           ` Stephen Leake
  0 siblings, 2 replies; 46+ messages in thread
From: Yuan Fu @ 2020-04-02 18:17 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: mwd, Eli Zaretskii, stephen_leake, Michael Welsh Duggan,
	emacs-devel

> As mentioned in some other message, this design becomes inefficient when
> you have two windows displaying a large buffer, one near the end,
> and the other near the beginning, and you make changes at the beginning
> of the buffer.  I haven't seen any performance bug-reports or complaints
> about it, so it appears that those circumstances are very rare.

Or you open a buffer and jump to point-max, which I assume isn’t that rare. OTOH, if the whole buffer has been parsed, this two-window setup shouldn’t be slow when making changes, because incremental parsing is fast and we don’t need to re-parse the whole buffer.

Yuan

* Re: emacs-tree-sitter and Emacs
  2020-04-02 18:17                         ` Yuan Fu
@ 2020-04-02 18:26                           ` Stefan Monnier
  2020-04-03  2:16                           ` Stephen Leake
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier @ 2020-04-02 18:26 UTC (permalink / raw)
  To: Yuan Fu
  Cc: mwd, Eli Zaretskii, stephen_leake, Michael Welsh Duggan,
	emacs-devel

>> As mentioned in some other message, this design becomes inefficient when
>> you have two windows displaying a large buffer, one near the end,
>> and the other near the beginning, and you make changes at the beginning
>> of the buffer.  I haven't seen any performance bug-reports or complaints
>> about it, so it appears that those circumstances are very rare.
> Or you open a buffer and jump to point-max,

I don't think this case is handled inefficiently by the approach taken
by `syntax-propertize`: it will take some time, yes, but it's largely
unavoidable in general (it's the same time that tree-sitter's and
CC-mode's up-front full parse have to pay for).

> OTOH if the whole buffer has been parsed, this two-window setup
> shouldn’t be slow when making changes because incremental parsing is fast,
> and we don’t need to re-parse the whole buffer.

That's supposed to be true with tree-sitter and CC-mode, but with
`syntax-propertize` we reparse everything between the first and the
second window, even though most of it was likely left untouched and
parsed identically to last time.

        Stefan

* Re: emacs-tree-sitter and Emacs
  2020-04-02 16:19                   ` Michael Welsh Duggan
  2020-04-02 17:18                     ` Yuan Fu
@ 2020-04-02 18:27                     ` Eli Zaretskii
  2020-04-02 18:50                       ` Michael Welsh Duggan
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 18:27 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: mwd, stephen_leake, emacs-devel

> From: Michael Welsh Duggan <mwd@cert.org>
> Cc: <mwd@md5i.com>, <stephen_leake@stephe-leake.org>, <emacs-devel@gnu.org>
> Date: Thu, 02 Apr 2020 12:19:44 -0400
> 
> >> > Each buffer always knows which part of it remains unchanged.  When
> >> > fontification is invoked, it should start from that place.  So there's
> >> > no need to parse _all_ prior context.
> >> 
> >> Why not?  Won't it miss that hypothetical opening comment delimiter, for
> >> example?
> >
> > No, because that was already taken care of when that comment delimiter
> > was inserted.
> 
> In which case it's covered by the paragraph you omitted.

I don't think it is, because you were talking about parsing the full
buffer, whereas I'm talking about parsing only what affects the
display in the window.



* Re: emacs-tree-sitter and Emacs
  2020-04-02 17:18                     ` Yuan Fu
  2020-04-02 17:39                       ` Stefan Monnier
@ 2020-04-02 18:29                       ` Eli Zaretskii
  1 sibling, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 18:29 UTC (permalink / raw)
  To: Yuan Fu; +Cc: mwd, stephen_leake, mwd, emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 2 Apr 2020 13:18:51 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  mwd@md5i.com,
>  stephen_leake@stephe-leake.org,
>  emacs-devel@gnu.org
> 
> Can we do something similar to fontified text property? I.e., only parse the first screen full of text when a buffer is opened, and mark the rest as unparsed with text property. Then when we need to access the parse tree (for fontification, etc), send all unparsed text before some point (e.g., last visible char in the window) to the parser.

The current "parser" used by font-lock is used exactly in this fashion
from the display engine, IIUC what you mean.



* Re: emacs-tree-sitter and Emacs
  2020-04-02 18:27                     ` Eli Zaretskii
@ 2020-04-02 18:50                       ` Michael Welsh Duggan
  2020-04-02 19:03                         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Michael Welsh Duggan @ 2020-04-02 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mwd, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Michael Welsh Duggan <mwd@cert.org>
>> Cc: <mwd@md5i.com>, <stephen_leake@stephe-leake.org>, <emacs-devel@gnu.org>
>> Date: Thu, 02 Apr 2020 12:19:44 -0400
>> 
>> >> > Each buffer always knows which part of it remains unchanged.  When
>> >> > fontification is invoked, it should start from that place.  So there's
>> >> > no need to parse _all_ prior context.
>> >> 
>> >> Why not?  Won't it miss that hypothetical opening comment delimiter, for
>> >> example?
>> >
>> > No, because that was already taken care of when that comment delimiter
>> > was inserted.
>> 
>> In which case it's covered by the paragraph you omitted.
>
> I don't think it is, because you were talking about parsing the full
> buffer, whereas I'm talking about parsing only what affects the
> display in the window.

No, not really.  Though I think it's just miscommunication on my end.
Let me sum up what I believe has been said.  I don't claim to be correct
in any of this, but it is what I believe is the current set of facts:

In normal usage, these external parsers are supposed to work like this:

1) Parse the whole file.  (This is a step that I believe you'd like to
   avoid, but I'm talking about how it is intended to work right now.)

2) At this point the parser holds a parse tree and can be asked
   questions about syntactic information about any portion of the
   buffer.

3) Subsequently, any change to the buffer is sent to the parser (just the
   change), which updates its view of the world and its internal parse
   state.

4) After each change, the parser might communicate some of its syntactic
   information back to the program using it, or it might just wait for
   the program to ask more questions.  Of this, I am uncertain.
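Step (3) needs each buffer change translated into the form an incremental parser expects. Tree-sitter's C API takes an edit description (`TSInputEdit`, passed to `ts_tree_edit`) with 0-based byte offsets `start_byte`, `old_end_byte` and `new_end_byte`, plus row/column points. Emacs's `after-change-functions` receive BEG, END and the old length in characters. The Python sketch below (hypothetical function name, UTF-8 assumed, points omitted) shows the conversion. A real implementation would probably avoid keeping a full copy of the old text, for example by recording the byte size of the text about to be replaced in `before-change-functions`.

```python
def input_edit(old_text, new_text, beg, end, old_len):
    """Byte offsets for a tree-sitter-style edit, computed from the three
    arguments Emacs passes to `after-change-functions'.

    beg, end: 1-based character positions of the changed text (new text).
    old_len:  length in characters of the text that was replaced.
    Tree-sitter offsets are 0-based bytes, assumed UTF-8 here.  The
    row/column fields of the real edit struct are omitted."""
    start = beg - 1                               # to 0-based characters
    return {
        "start_byte": len(new_text[:start].encode("utf-8")),
        "old_end_byte": len(old_text[:start + old_len].encode("utf-8")),
        "new_end_byte": len(new_text[:end - 1].encode("utf-8")),
    }


# Inserting "é" (two bytes in UTF-8) at position 2 of "abc":
print(input_edit("abc", "aébc", beg=2, end=3, old_len=0))
# {'start_byte': 1, 'old_end_byte': 1, 'new_end_byte': 3}
```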

If this is correct, I also think we could avoid (1) as an optimization.
In this case we only send the text from (save-restriction (widen)
(point-min)) to (window-end) to the parser as soon as the buffer is
visible.  Then treat scrolling down as a change that adds text to the
buffer (from the parser's point of view).  This may not produce correct
semantic information in all cases, but it is probably a reasonable first
approximation in the event that we want to avoid (1).  If we were to do
this, we would probably want to make this configurable.
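The optimization above can be sketched in Python against a hypothetical parser. Only the messages that would be sent are logged, and all names are illustrative. Text is sent up to the furthest window end seen so far. Scrolling past it is reported as an append, edits in the part already sent are forwarded as ordinary edits, and edits beyond it need no message at all.

```python
class LazyFeeder:
    """Feed a (hypothetical) incremental parser only the text up to the
    furthest window end seen so far.  Scrolling further down is reported
    to the parser as text appended at its end; edits inside the part it
    has seen are forwarded as ordinary edits.  0-based char positions."""

    def __init__(self, text):
        self.text = text
        self.sent = 0    # number of leading chars the parser has seen
        self.log = []    # messages that would go to the parser

    def window_shown(self, window_end):
        window_end = min(window_end, len(self.text))
        if window_end > self.sent:
            self.log.append(("append", self.sent, window_end))
            self.sent = window_end

    def edit(self, start, end, new):
        self.text = self.text[:start] + new + self.text[end:]
        if start >= self.sent:
            return       # parser has not seen this text yet: nothing to do
        old_end = min(end, self.sent)   # clip to what the parser saw
        new_end = start + len(new)
        self.log.append(("edit", start, old_end, new_end))
        self.sent = new_end + (self.sent - old_end)


f = LazyFeeder("a" * 10_000)
f.window_shown(2_000)     # buffer first displayed
f.window_shown(1_500)     # scrolling back: nothing new
f.edit(10, 10, "xyz")     # insertion in the parsed part
f.window_shown(4_003)     # scroll down: more text appended
f.edit(9_000, 9_001, "")  # edit beyond what was sent: ignored
print(f.log)
```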

Now that I have hopefully communicated my understanding, people can
point out where I have misunderstood and/or gotten this completely
wrong.  Which is just fine, and hopefully I will have learned something
in the process.

-- 
Michael Welsh Duggan
(mwd@cert.org)

* Re: emacs-tree-sitter and Emacs
  2020-04-02 18:50                       ` Michael Welsh Duggan
@ 2020-04-02 19:03                         ` Eli Zaretskii
  2020-04-02 19:39                           ` 조성빈
  2020-04-02 19:48                           ` Stefan Monnier
  0 siblings, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-02 19:03 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: mwd, stephen_leake, emacs-devel

> From: Michael Welsh Duggan <mwd@cert.org>
> Cc: <mwd@md5i.com>, <stephen_leake@stephe-leake.org>, <emacs-devel@gnu.org>
> Date: Thu, 02 Apr 2020 14:50:18 -0400
> 
> If this is correct, I also think we could avoid (1) as an optimization.
> In this case we only send the text from (save-restriction (widen)
> (point-min)) to (window-end) to the parser as soon as the buffer is
> visible.  Then treat scrolling down as a change that adds text to the
> buffer (from the parser's point of view).  This may not produce correct
> semantic information in all cases, but it is probably a reasonable first
> approximation in the event that we want to avoid (1).

Yes, with one correction: ideally, it should be unnecessary to start
from point-min (which could be a long way away).  Most languages
should do well enough with starting from the beginning of the
outermost function or class that affects the displayed text.  IOW,
start from window-start, then go back until you find the top-level
syntactic construct; then parse from there.
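A deliberately naive Python sketch of "go back until you find the top-level syntactic construct", assuming C-like code. The function name and the column-0 rule are hypothetical. In Emacs Lisp the usual starting point would be `beginning-of-defun`. The heuristic is easily fooled by strings, comments and macros, which is exactly the difficulty raised in the replies.

```python
def toplevel_start(text, window_start):
    """Naive guess at where the top-level construct around WINDOW_START
    begins: the nearest line at or before it that starts in column 0 with
    something other than whitespace, a comment, or a brace.

    Easily fooled by strings, comments and macros; a sketch only."""
    line_start = text.rfind("\n", 0, window_start) + 1
    while True:
        line_end = text.find("\n", line_start)
        line = text[line_start:] if line_end == -1 else text[line_start:line_end]
        if line and not line[0].isspace() and not line.startswith(("//", "/*", "{", "}")):
            return line_start
        if line_start == 0:
            return 0
        line_start = text.rfind("\n", 0, line_start - 1) + 1


src = "int f(void)\n{\n  return 1;\n}\n\nint g(void)\n{\n  int x = 2;\n  return x;\n}\n"
start = toplevel_start(src, src.index("  return x;"))
print(src[start:src.index("\n", start)])   # int g(void)
```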



* Re: emacs-tree-sitter and Emacs
  2020-04-02 19:03                         ` Eli Zaretskii
@ 2020-04-02 19:39                           ` 조성빈
  2020-04-03  6:37                             ` Eli Zaretskii
  2020-04-02 19:48                           ` Stefan Monnier
  1 sibling, 1 reply; 46+ messages in thread
From: 조성빈 @ 2020-04-02 19:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Michael Welsh Duggan, mwd, stephen_leake, Emacs-devel


> On Apr 3, 2020, at 4:04 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>> 
>> From: Michael Welsh Duggan <mwd@cert.org>
>> Cc: <mwd@md5i.com>, <stephen_leake@stephe-leake.org>, <emacs-devel@gnu.org>
>> Date: Thu, 02 Apr 2020 14:50:18 -0400
>> 
>> If this is correct, I also think we could avoid (1) as an optimization.
>> In this case we only send the text from (save-restriction (widen)
>> (point-min)) to (window-end) to the parser as soon as the buffer is
>> visible.  Then treat scrolling down as a change that adds text to the
>> buffer (from the parser's point of view).  This may not produce correct
>> semantic information in all cases, but it is probably a reasonable first
>> approximation in the event that we want to avoid (1).
> 
> Yes, with one correction: ideally, it should be unnecessary to start
> from point-min (which could be a long way away).  Most languages
> should do well enough with starting from the beginning of the
> outermost function or class that affects the displayed text.

AFAIU, determining that starting point is a non-trivial task, and if Emacs wants to present the user with an exact representation, the text from point-min is still needed; one hypothetical case is a file whose code is all commented out.

Finding the starting point in Lisp would be hard, and possibly slower than just passing everything to tree-sitter.

>  IOW,
> start from window-start, then go back until you find the top-level
> syntactic construct; then parse from there.
> 



* Re: emacs-tree-sitter and Emacs
  2020-04-02 19:03                         ` Eli Zaretskii
  2020-04-02 19:39                           ` 조성빈
@ 2020-04-02 19:48                           ` Stefan Monnier
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier @ 2020-04-02 19:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mwd, stephen_leake, Michael Welsh Duggan, emacs-devel

> outermost function or class that affects the displayed text.
> IOW, start from window-start, then go back until you find the top-level
> syntactic construct; then parse from there.

I'm looking forward to your proof-of-concept implementation of "go back
until you find the top-level syntactic construct".

In the meantime, I'll be happy to wait the 0.2s tree-sitter takes for the
initial full parse.

        Stefan

PS: AFAIK, "find the top-level syntactic construct" is a big part of what
CC-mode works so hard on.  It's *hard* to do if you want it to work
reliably (as users expect nowadays).

* Re: emacs-tree-sitter and Emacs
  2020-04-02 14:03         ` Eli Zaretskii
  2020-04-02 14:27           ` Michael Welsh Duggan
@ 2020-04-03  1:55           ` Stephen Leake
  2020-04-03  4:47             ` Jorge Javier Araya Navarro
  2020-04-03  7:32             ` Eli Zaretskii
  1 sibling, 2 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-03  1:55 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Wed, 01 Apr 2020 11:51:40 -0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Can you tell in more detail why you need to rely on these hooks?  They
>> > shouldn't be necessary, AFAIU.  
>> 
>> It is an optimization choice.
>> 
>> In an unmodified buffer, that is smaller than 100,000 characters
>> (default setting of wisi-partial-parse-threshold), the entire buffer is
>> parsed once; that applies faces to all the Ada identifiers that need
>> faces (standard font-lock regexp handles the reserved words). Then when
>> font-lock fontifies a region, no parsing is needed.
>
> But why do you need that initial full parse in the first place?  Is
> parsing parts of the buffer so much harder?

Because the parser must see a complete top-level grammar statement. In
Ada, that's the whole file; a typical file looks like:

package Nifty is

    type Foo is ...;

    function Function_1 is ...;

end Nifty;

The parser needs to see all of the "package" declaration. Java and C++
header files are similar; a single class or namespace. In C++ and C body
files, there are lots of small declarations, and you could parse each
one of those independently, but _only_ if Emacs can find the start and
end of each, which is hard.

In addition, to properly compute indent, you need the fully nested
context. Computing faces usually doesn't need that, but it might in some
cases.

>> Indent is similar; the parse sets text properties holding the indent for
>> each line; indent-region then applies them.
>
> Indent is a different use case: it happens by user command, and thus
> has different time restrictions than redisplay.

Yes, but it is computed by the same parser, so it is relevant.

>> If the default setting of jit-lock-defer-time (ie nil) is used, then
>> font-lock runs immediately after each change, and the after-change hooks
>> are not needed. But as I have mentioned, I always run with
>> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
>> some cases), so the change hooks are needed.
>
> AFAIU, tree-sitter and similar parsers are supposed to be much faster,
> so the problem with slow parsing, and all the solutions to alleviate
> that problem, may not be necessary, if they are the only reason for
> using the hooks.

The main reason the ada-mode parser is too slow is the error correction.
tree-sitter appears to have less sophisticated error correction, which
will give worse results with code under edit. The ada-mode parser can be
speeded up by specifying parameters that cripple the error correction.

In addition, users will always create huge files (where "huge" means
"bigger than we've seen before"); there are always speed limits. The
reason ada-mode has partial parse is that Eurocontrol has huge files,
that they occasionally edit, and always parsing the whole file, even in
the absence of syntax errors, was too slow.
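The policy described in this thread can be sketched in Python: a full parse below a size threshold and a partial parse above it. The threshold value is the `wisi-partial-parse-threshold` default quoted earlier. The function names are hypothetical. So is `expand`, the caller-supplied step that finds an enclosing, syntactically complete region, which is the hard part.

```python
PARTIAL_PARSE_THRESHOLD = 100_000   # default cited for wisi-partial-parse-threshold


def region_to_parse(buffer_size, begin, end, expand):
    """Which part of the buffer to hand to the parser.

    Below the threshold the whole buffer is parsed once (and the result
    reused); above it only a region around [begin, end) is parsed.
    EXPAND is a hypothetical caller-supplied function returning the
    enclosing syntactically complete region; that is the hard part."""
    if buffer_size < PARTIAL_PARSE_THRESHOLD:
        return (0, buffer_size)
    return expand(begin, end)


def naive_expand(b, e):
    # Hypothetical stand-in: pad the region by 500 characters each side.
    return (max(0, b - 500), e + 500)


print(region_to_parse(50_000, 100, 200, naive_expand))       # (0, 50000)
print(region_to_parse(500_000, 1_000, 2_000, naive_expand))  # (500, 2500)
```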

>> The alternative to not requiring after-change hooks is to always do a full
>> parse, for every call of fontify-region or indent-region. That is far too
>> slow.
>
> Even for indentation, a full parse should not be needed.  You need to
> only parse the outermost enclosing function/procedure, right?  That's
> rarely the full buffer, except when the buffer is small.

As discussed above, that depends on your language; in Ada it is _always_
the full buffer. And finding the start of a function in C and C++ is hard.

>> Note that Tree-Sitter requires one full parse of the buffer to generate
>> the parse tree that is later updated incrementally; in an unmodified
>> buffer, only that one parse is needed.
>
> Tree-sitter cannot know what the full buffer holds, so nothing
> prevents us from passing it just part of the buffer.  After all,
> tree-sitter should be able to do a decent job when the part we pass to
> it actually _is_ all we have in the buffer, right?

Same issues as above.

>> > And they cannot pick up every relevant change; for example, what
>> > happens if some face used for font-lock is modified?
>> 
>> Yes, that is a flaw. Not likely to occur in everyday use
>
> Redisplay cannot rely on something being "unlikely", because it's
> expected to produce correct results in all situations.  

The flaw is not in ada-mode's use of a parser or after-change-functions;
it's a general problem with font-lock.

The face values are applied to the buffer text as text properties
containing the symbol that holds the face to be used; for example
(font-lock-face font-lock-function-name-face). If the contents of that
symbol change, then redisplay must be rerun to apply the correct values.
This does _not_ require a reparse; the parser sets the text property,
and that has not changed.
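The indirection can be shown with a minimal Python sketch. It is not Emacs code: `faces`, `props` and `render` are hypothetical stand-ins for the face table, the text properties and redisplay. The parser records only a face name for each character. The name is resolved when drawing, so redefining the face changes the output without touching the properties or reparsing.

```python
# Face definitions: face name -> attributes (stand-in for the face table).
faces = {"function-name": {"foreground": "blue"}}

# Per-character "text properties" written once by the parser.
# Each entry holds only a face *name*, never the attributes themselves.
text = "foo()"
props = ["function-name"] * 3 + [None, None]

def render(text, props, faces):
    """Resolve each character's face name at display time."""
    return [(ch, faces[name]["foreground"] if name else "default")
            for ch, name in zip(text, props)]

before = render(text, props, faces)

# Redefine the face, as customize-face would; props are left untouched.
faces["function-name"]["foreground"] = "red"
after = render(text, props, faces)
```

The sketch also shows the catch: the new color appears only when something calls `render` again.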

Use case: A c-mode buffer A is currently displayed in a window in a
frame, it is syntactically correct, and all displayed faces are correct.
In another frame, the user uses 'M-x set-variable' to change the value
of font-lock-function-name-face.

To update the display, something has to trigger redisplay of buffer A. I
don't think using M-x set-variable in a different frame does that.

Switching buffers in a frame does cause a redisplay (to update the menu
and mode line); if M-x set-variable is done in the same frame as buffer
A, the change in font-lock-function-name-face should show up as
expected.

A similar use case would be changing from "light mode" to "dark mode".
That could be done by changing the theme using load-theme; that should
force a redisplay (I assume it does; I have not checked). 

Other than the global face variables, ada-mode does not have any
variables that control faces. Some other modes may, for example to set
the level of highlighting to minimum or maximum. In that case, the font-lock
regexps change, and the function that does that presumably sets
fontified to nil in the current buffer, and should also force redisplay.
If ada-mode adds a feature like this, there will be a function to change
it (perhaps a custom variable change function) that also forces a
reparse and redisplay.

> I can understand why fontification methods that are too slow want to
> get some help from hooks, but when we design and implement novel
> fontification methods using fast parsers, we should first try doing
> that without any hooks,

Yes, premature optimization is evil. Using tree-sitter to implement
font-lock should start by always parsing the whole buffer for every call
of fontify-region. If that is fast enough, we're done. If not, we can
consider whether parsing a smaller part of the buffer is possible.

Note that the fact that tree-sitter provides incremental parse is a
strong hint that the answer will be "it's not fast enough".

>> >> By default font-lock runs after every character typed
>> >
>> > No, it only runs when redisplay kicks in.  If you type very quickly,
>> > it won't run for every character.  At least AFAIR.
>> 
>> What triggers redisplay?
>
> When Emacs is about to read input, if no input is available, it
> performs redisplay.  IOW, Emacs enters redisplay when it's about to
> become idle.
>
<snip>
>> The elisp manual section "Forcing redisplay" says "Emacs normally tries
>> to redisplay the screen whenever it waits for input." After I type the
>> first character, it is no longer waiting for input, it is processing
>> that character. I assume here "process that char code" includes running
>> after-change-functions, which is (small) elisp code. But I guess after
>> processing that char, before calling redisplay, it checks if there is
>> more input, which should be true if I type fast enough. Perhaps "process
>> that char code" is faster than the combination of my fingers and the
>> keyboard char send rate?
>
> Yes, most probably.

Ok, so in practice, it is not possible to type fast enough, and
font-lock runs after every character typed. 

> In other similar situations (e.g., in Flyspell mode) we wait for some
> non-zero idle time before actually running the code which could react
> to slow typing with annoying messages.

Since font-lock is running a parser, it also detects syntax errors and
flags them with a fringe mark. I could delay the display of the fringe
mark without delaying font-lock itself. I'll put that on my list.

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-02 15:15             ` Eli Zaretskii
  2020-04-02 15:24               ` Michael Welsh Duggan
@ 2020-04-03  2:06               ` Stephen Leake
  2020-04-03  7:33                 ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Stephen Leake @ 2020-04-03  2:06 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Michael Welsh Duggan <mwd@md5i.com>
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,  emacs-devel@gnu.org
>> Date: Thu, 02 Apr 2020 10:27:23 -0400
>> 
>> > But why do you need that initial full parse in the first place?  Is
>> > parsing parts of the buffer so much harder?
>> 
>> I would think that you at least need to parse everything displayed and
>> everything before what is displayed.  (You need all prior context.  What
>> if someone opened a comment on line 1 and hasn't closed it, for
>> example?)
>
> Each buffer always knows which part of it remains unchanged.  

Only because jit-lock-after-change on after-change-functions updates
that knowledge, via the fontified text property. But that in turn
assumes that the entire buffer has been fontified, which is typically
not the case; I rarely scroll through an entire file.

Unless there is some other mechanism that maintains change locations?
There is buffer-modified-p, but that's a single flag for the whole
buffer. ada-mode maintains change locations itself, but that also
relies on after-change-functions.

> When fontification is invoked, it should start from that place. 

I think you mean "from the point of the first change" which means "from
the first point where fontified is nil". But that's not what font-lock
does; it starts from the current top of window if that's after the first
non-fontified point.
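The behavior described above can be sketched in Python, assuming a per-position boolean list in place of the `fontified` text property. The function is hypothetical, not jit-lock's actual code. Unfontified text before the window start is never examined.

```python
def fontify_start(fontified, window_start, window_end):
    """Return the first position in [window_start, window_end) whose
    'fontified' flag is False, or None if that range is fully fontified.
    Unfontified text before window_start is deliberately ignored."""
    for pos in range(window_start, window_end):
        if not fontified[pos]:
            return pos
    return None

# Positions 0-9 unfontified, 10-19 fontified, 20-29 unfontified.
fontified = [False] * 10 + [True] * 10 + [False] * 10
```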

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-02 18:17                         ` Yuan Fu
  2020-04-02 18:26                           ` Stefan Monnier
@ 2020-04-03  2:16                           ` Stephen Leake
  1 sibling, 0 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-03  2:16 UTC (permalink / raw)
  To: emacs-devel

Yuan Fu <casouri@gmail.com> writes:

>> you have two windows displaying a large buffer, one display near the end,
>> and the other near the beginning, and you make changes at the beginning
>> of the buffer.  I haven't seen any performance bug-reports or complaints
>> about it, so it appears that those circumstances are very rare.
>
> Or you open a buffer and jump to point-max, which I assume isn’t that
> rare. OTOH if the whole buffer has been parsed, this two-window setup
> shouldn’t be slow when making changes because incremental parsing is
> fast, and we don’t need to re-parse the whole buffer.

That assumes that changes in syntax before (or after) a point don't
affect fontification at that point.

That depends on the language being fontified.

As has been mentioned several times, inserting a block comment start
violates that assumption (if the language has those; Ada doesn't), as
does inserting an opening string quote in a language where strings are
not terminated by the end of the line (in Ada they are, but Emacs does
not support that).

There are smaller cases; in ada-mode, a name like Package_1.Function_1 has
different faces for the parts before and after the dot. So the
fontification of "Package_1" depends on the presence of the ".", which comes
after.
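That dependency can be sketched in Python with a hypothetical classifier rather than ada-mode's grammar. The labels "prefix" and "selector" stand in for the two faces. Whether a name is a prefix depends on the character that follows it.

```python
import re

def classify_names(line):
    """Hypothetical classifier: a name immediately followed by '.' is a
    prefix (e.g. a package name); any other name is the final selector.
    Returns a list of (name, kind) pairs."""
    result = []
    for m in re.finditer(r"[A-Za-z_][A-Za-z0-9_]*", line):
        followed_by_dot = line[m.end():m.end() + 1] == "."
        result.append((m.group(), "prefix" if followed_by_dot else "selector"))
    return result
```

Deleting the "." in the sketch's input changes the classification of the name *before* it.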

I assume tree-sitter gets this right; it should parse from the inserted
block comment start to the end of the file, even though the text change
message did not include all of that.
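A tiny scanner for C-style block comments shows why such an edit is non-local. This is a hypothetical Python sketch, not tree-sitter's algorithm. A two-character insertion at the start changes the classification of every later character, so the reparse has to reach the end of the text.

```python
def comment_mask(text):
    """Return a list of booleans: True where the character is inside a
    C-style /* ... */ block comment (delimiters included)."""
    mask = [False] * len(text)
    i, in_comment = 0, False
    while i < len(text):
        if not in_comment and text.startswith("/*", i):
            in_comment = True
            mask[i] = mask[i + 1] = True
            i += 2
        elif in_comment and text.startswith("*/", i):
            mask[i] = mask[i + 1] = True
            in_comment = False
            i += 2
        else:
            mask[i] = in_comment
            i += 1
    return mask

src = "int a;\nint b;\n"
edited = "/*" + src  # the user types "/*" at the very start
```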

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-03  1:55           ` Stephen Leake
@ 2020-04-03  4:47             ` Jorge Javier Araya Navarro
  2020-04-03  7:32             ` Eli Zaretskii
  1 sibling, 0 replies; 46+ messages in thread
From: Jorge Javier Araya Navarro @ 2020-04-03  4:47 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 9178 bytes --]

> Note that the fact that tree-sitter provides incremental parse is a
> strong hint that the answer will be "it's not fast enough".

That's a non sequitur; it can also mean that really huge files can be
worked on just as if they were a couple of hundred lines long (after the
first parse, that is).

On Thu, Apr 2, 2020 at 19:56, Stephen Leake (
stephen_leake@stephe-leake.org) wrote:

> <snip>

[-- Attachment #2: Type: text/html, Size: 10988 bytes --]

* Re: emacs-tree-sitter and Emacs
  2020-04-02 19:39                           ` 조성빈
@ 2020-04-03  6:37                             ` Eli Zaretskii
  2020-04-03 17:27                               ` Stephen Leake
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-03  6:37 UTC (permalink / raw)
  To: 조성빈; +Cc: mwd, stephen_leake, mwd, Emacs-devel

> From: 조성빈 <pcr910303@icloud.com>
> Date: Fri, 3 Apr 2020 04:39:08 +0900
> Cc: Michael Welsh Duggan <mwd@cert.org>, mwd@md5i.com,
>  stephen_leake@stephe-leake.org, Emacs-devel@gnu.org
> 
> > Yes, with one correction: ideally, it should be unnecessary to start
> > from point-min (which could be a long way away).  Most languages
> > should do well enough with starting from the beginning of the
> > outermost function or class that affects the displayed text.
> 
> AFAIU, determining that starting point is a non-trivial task, and if Emacs wants to present the user an exact representation, the text from point-min is still needed: just a hypothetical case would be having a file with all code commented out.

The decision whether there's a need to go to the beginning should be
made by the parser, not by the infrastructure that invokes it.  If the
parser needs access to earlier parts of the buffer, it should do that;
but the fontification infrastructure should not force it to do that.



* Re: emacs-tree-sitter and Emacs
  2020-04-03  1:55           ` Stephen Leake
  2020-04-03  4:47             ` Jorge Javier Araya Navarro
@ 2020-04-03  7:32             ` Eli Zaretskii
  2020-04-03 17:05               ` Stephen Leake
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-03  7:32 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Thu, 02 Apr 2020 17:55:36 -0800
> 
> > But why do you need that initial full parse in the first place?  Is
> > parsing parts of the buffer so much harder?
> 
> Because the parser must see a complete top level grammar statement. In
> Ada, that's the whole file; a typical file looks like:

OK, so this depends on the language, and is not a universal
requirement.  I expect that most languages are not like Ada.

The point I think we should take from this is that the font-lock
infrastructure should not _force_ the parser to always perform a full
parse, it should delegate that to the parser or to a language-aware
layer that calls the parser.
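That division of labor could look like the following Python sketch, where all names are hypothetical. The generic layer only asks a per-language callback how far to widen the requested region. The Ada-like callback always returns the whole buffer. The C-like callback widens to the enclosing top-level brace block and, for brevity, ignores braces inside strings and comments.

```python
from typing import Callable, Dict, Tuple

Region = Tuple[int, int]                    # [beg, end)
Expander = Callable[[str, int, int], Region]

def expand_whole_buffer(text: str, beg: int, end: int) -> Region:
    """Ada-like: the top-level grammar unit is the whole file."""
    return 0, len(text)

def expand_to_top_level(text: str, beg: int, end: int) -> Region:
    """C-like (simplified): widen to the enclosing top-level {...} block."""
    depth = [0] * (len(text) + 1)           # brace depth before text[i]
    d = 0
    for i, ch in enumerate(text):
        depth[i] = d
        if ch == "{":
            d += 1
        elif ch == "}":
            d -= 1
    depth[len(text)] = d
    while beg > 0 and depth[beg] > 0:
        beg -= 1
    while end < len(text) and depth[end] > 0:
        end += 1
    return beg, end

EXPANDERS: Dict[str, Expander] = {
    "ada": expand_whole_buffer,
    "c": expand_to_top_level,
}

def region_to_parse(mode: str, text: str, beg: int, end: int) -> Region:
    """Generic font-lock side: it only asks; the language layer decides."""
    return EXPANDERS[mode](text, beg, end)

text = "int f() { a; }\nint g() { b; }\n"
```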

> Use case: A c-mode buffer A is currently displayed in a window in a
> frame, it is syntactically correct, and all displayed faces are correct.
> In another frame, the user uses 'M-x set-variable' to change the value
> of font-lock-function-name-face.
> 
> To update the display, something has to trigger redisplay of buffer A. I
> don't think using M-x set-variable in a different frame does that.

That's not so.  Redisplay is called every time Emacs is about to
become idle.  It just returns almost immediately if it detects that no
changes happened since last redisplay that require any work.  I guess
this fast return is what you mean by "not triggering redisplay".
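A toy Python model of this uses hypothetical names and ignores timers, windows and everything else real redisplay deals with. Keys already waiting in the queue are all processed before redisplay runs, and redisplay returns at once when nothing has changed. As a result, several keystrokes can share a single redisplay.

```python
from collections import deque

class Editor:
    """Toy command loop: run commands while input is pending, and enter
    redisplay only when the input queue is empty (about to go idle)."""

    def __init__(self):
        self.buffer = []
        self.dirty = False        # any change since the last redisplay?
        self.redisplay_calls = 0  # times redisplay was entered
        self.redraws = 0          # times it actually had work to do

    def redisplay(self):
        self.redisplay_calls += 1
        if not self.dirty:
            return                # fast path: nothing changed
        self.redraws += 1         # fontification would happen here
        self.dirty = False

    def run(self, arrivals):
        """arrivals: list of batches; each batch holds the keys already
        waiting the next time the loop checks for input."""
        for batch in arrivals:
            pending = deque(batch)
            while pending:
                self.buffer.append(pending.popleft())
                self.dirty = True
            self.redisplay()      # no input left: about to become idle

fast = Editor()
fast.run([list("hello")])         # all five keys queued before redisplay
slow = Editor()
slow.run([[c] for c in "hello"])  # each key arrives after a redisplay
slow.run([[]])                    # idle again with nothing new
```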

Regarding the above use case, I'm not sure I understand what exactly
you meant.  First, you cannot use set-variable to modify the value
of font-lock-function-name-face, because it isn't a defcustom.
Second, what exactly did you mean to set it to, to cause the effect
you were talking about?  IOW, can you present a complete recipe,
starting from "emacs -Q", where you make such a change, and then you
need to switch buffers to cause function names to be displayed
differently?  I can tell you that if you replace "M-x set-variable"
with "M-x customize-face" and change some attribute of
font-lock-function-name-face, the effect on another frame is
immediate, which means redisplay takes note of the change and redraws
the other frame.  But I'm not sure this is the same use case you had
in mind.

> >> The elisp manual section "Forcing redisplay" says "Emacs normally tries
> >> to redisplay the screen whenever it waits for input." After I type the
> >> first character, it is no longer waiting for input, it is processing
> >> that character. I assume here "process that char code" includes running
> >> after-change-functions, which is (small) elisp code. But I guess after
> >> processing that char, before calling redisplay, it checks if there is
> >> more input, which should be true if I type fast enough. Perhaps "process
> >> that char code" is faster than the combination of my fingers and the
> >> keyboard char send rate?
> >
> > Yes, most probably.
> 
> Ok, so in practice, it is not possible to type fast enough, and
> font-lock runs after every character typed. 

It's possible.  One situation where this happens is when you have
something expensive running after each command, for example some
expensive post-command-hook.  Another situation is to make your
keyboard's auto-repeat be very fast, and lean on some key.  Yet
another is to run Emacs on an old and not-so-fast machine.

* Re: emacs-tree-sitter and Emacs
  2020-04-03  2:06               ` Stephen Leake
@ 2020-04-03  7:33                 ` Eli Zaretskii
  2020-04-03 17:24                   ` Stephen Leake
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-03  7:33 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> >> I would think that you at least need to parse everything displayed and
> >> everything before what is displayed.  (You need all prior context.  What
> >> if someone opened a comment on line 1 and hasn't closed it, for
> >> example?)
> >
> > Each buffer always knows which part of it remains unchanged.  
> 
> Only because jit-lock-after-change on after-change-functions updates
> that knowledge, via the fontified text property. But that in turn
> assumes that the entire buffer has been fontified, which is typically
> not the case; I rarely scroll thru an entire file.
> 
> Unless there is some other mechanism that maintains change locations?

I meant BEG_UNCHANGED and END_UNCHANGED.



* Re: emacs-tree-sitter and Emacs
  2020-04-03  7:32             ` Eli Zaretskii
@ 2020-04-03 17:05               ` Stephen Leake
  2020-04-03 18:19                 ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Stephen Leake @ 2020-04-03 17:05 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Thu, 02 Apr 2020 17:55:36 -0800
>> 
>> > But why do you need that initial full parse in the first place?  Is
>> > parsing parts of the buffer so much harder?
>> 
>> Because the parser must see a complete top level grammar statement. In
>> Ada, that's the whole file; a typical file looks like:
>
> OK, so this depends on the language, and is not a universal
> requirement.  

Yes.

> the point I think we should take from this is that the font-lock
> infrastructure should not _force_ the parser to always perform a full
> parse, it should delegate that to the parser or to a language-aware
> layer that calls the parser.

Yes.

>> Use case: A c-mode buffer A is currently displayed in a window in a
>> frame, it is syntactically correct, and all displayed faces are correct.
>> In another frame, the user uses 'M-x set-variable' to change the value
>> of font-lock-function-name-face.
>> 
>> To update the display, something has to trigger redisplay of buffer A. I
>> don't think using M-x set-variable in a different frame does that.
>
> That's not so.  Redisplay is called every time Emacs is about to
> become idle.  It just returns almost immediately if it detects that no
> changes happened since last redisplay that require any work.  I guess
> this fast return is what you mean by "not triggering redisplay".

Ok.

> Regarding the above use case, I don't think I understand what exactly
> did you mean.  First, you cannot use set-variable to modify the value
> of font-lock-function-name-face, because it isn't a defcustom.
> Second, what exactly did you mean to set it to, to cause the effect
> you were talking about?  IOW, can you present a complete recipe,
> starting from "emacs -Q", where you make such a change, and then you
> need to switch buffers to cause function names be displayed
> differently?  I can tell you that if you replace "M-x set-variable"
> with "M-x customize-face" and change some attribute of
> font-lock-function-name-face, the effect on another frame is
> immediate, which means redisplay takes note of the change and redraws
> the other frame.  But I'm not sure this is the same use case you had
> in mind.

Ok. In this case, customize-face causes the redisplay. So your original
objection to the ada-mode face design, which was:

> And they cannot pick up every relevant change; for example, what
> happens if some face used for font-lock is modified?

is moot. Unless some elisp program modifies the variable without using
customize-face, but then that program has the responsibility for forcing
redisplay.

If there are other things that can be changed that should force a
redisplay, but currently don't, I would say that's a bug, either in
ada-mode or elsewhere. So far there have been no bugs filed against
ada-mode for this type of issue.

-- 
-- Stephe



* Re: emacs-tree-sitter and Emacs
  2020-04-03  7:33                 ` Eli Zaretskii
@ 2020-04-03 17:24                   ` Stephen Leake
  2020-04-03 18:39                     ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Stephen Leake @ 2020-04-03 17:24 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> >> I would think that you at least need to parse everything displayed and
>> >> everything before what is displayed.  (You need all prior context.  What
>> >> if someone opened a comment on line 1 and hasn't closed it, for
>> >> example?)
>> >
>> > Each buffer always knows which part of it remains unchanged.  
>> 
>> Only because jit-lock-after-change on after-change-functions updates
>> that knowledge, via the fontified text property. But that in turn
>> assumes that the entire buffer has been fontified, which is typically
>> not the case; I rarely scroll thru an entire file.
>> 
>> Unless there is some other mechanism that maintains change locations?
>
> I meant BEG_UNCHANGED and END_UNCHANGED.

Ah; I was unaware of those. It would be good if these values were
accessible from modules and elisp. That might remove the need for
ada-mode's after-change-functions; I'd have to try it to be sure.
For one thing, ada-mode maintains a list of changed regions, while
these two values describe a single region that lumps all the changes
together, including any unchanged text between them. That would
trigger parses that are not actually needed, but that might be a
good trade.

Hmm. Browsing the Emacs C source to see if there is such a function (I
didn't find one, but I could have missed it), I found this in keyboard.c:

      /* If the previous command tried to force a specific window-start,
	 forget about that, in case this command moves point far away
	 from that position.  But also throw away beg_unchanged and
	 end_unchanged information in that case, so that redisplay will
	 update the whole window properly.  */

    BUF_BEG_UNCHANGED (b) = BUF_END_UNCHANGED (b) = 0;

which means BEG_UNCHANGED can indicate changes when there are actually
none. As long as that doesn't happen too often, it's acceptable.
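A toy Python model illustrates that trade-off. The class is hypothetical and models only the idea of two "unchanged" counts, not Emacs's real bookkeeping. It also assumes edits that do not change the buffer size, so positions never shift. Two small edits far apart become one large changed span, while a per-region list keeps them separate. Resetting both counts to 0 makes the whole buffer count as changed.

```python
from dataclasses import dataclass

@dataclass
class UnchangedCounts:
    """Hypothetical model: how many characters at the start and at the
    end of the buffer are known to be unchanged since the last check."""
    size: int
    beg_unchanged: int
    end_unchanged: int

    def record_change(self, beg, end):
        """Note a same-length change of text[beg:end]."""
        self.beg_unchanged = min(self.beg_unchanged, beg)
        self.end_unchanged = min(self.end_unchanged, self.size - end)

    def changed_span(self):
        """The single region that must be treated as changed, or None."""
        beg, end = self.beg_unchanged, self.size - self.end_unchanged
        return (beg, end) if beg < end else None

    def forget(self):
        """Like resetting both counts to 0: everything counts as changed."""
        self.beg_unchanged = self.end_unchanged = 0

counts = UnchangedCounts(size=100, beg_unchanged=100, end_unchanged=100)
regions = []  # ada-mode style: keep each changed region separately
for beg, end in [(10, 12), (90, 92)]:
    counts.record_change(beg, end)
    regions.append((beg, end))
```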

-- 
-- Stephe

* Re: emacs-tree-sitter and Emacs
  2020-04-03  6:37                             ` Eli Zaretskii
@ 2020-04-03 17:27                               ` Stephen Leake
  0 siblings, 0 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-03 17:27 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: 조성빈 <pcr910303@icloud.com>
>> Date: Fri, 3 Apr 2020 04:39:08 +0900
>> Cc: Michael Welsh Duggan <mwd@cert.org>, mwd@md5i.com,
>>  stephen_leake@stephe-leake.org, Emacs-devel@gnu.org
>> 
>> > Yes, with one correction: ideally, it should be unnecessary to start
>> > from point-min (which could be a long way away).  Most languages
>> > should do well enough with starting from the beginning of the
>> > outermost function or class that affects the displayed text.
>> 
>> AFAIU, determining that starting point is a non-trivial task, and if
>> Emacs wants to present the user an exact representation, the text
>> from point-min is still needed: just a hypothetical case would be
>> having a file with all code commented out.
>
> The decision whether there's a need to go to the beginning should be
> made by the parser, not by the infrastructure that invokes it.  If the
> parser needs access to earlier parts of the buffer, it should do that;
> but the fontification infrastructure should not force it to do that.

Yes. wisi provides wisi-parse-expand-region to do just that; it
dispatches on the parser.

-- 
-- Stephe



* Re: emacs-tree-sitter and Emacs
  2020-04-03 17:05               ` Stephen Leake
@ 2020-04-03 18:19                 ` Eli Zaretskii
  2020-04-04  0:00                   ` Stephen Leake
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-03 18:19 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Apr 2020 09:05:34 -0800
> 
> > Regarding the above use case, I don't think I understand what exactly
> > did you mean.  First, you cannot use set-variable to modify the value
> > of font-lock-function-name-face, because it isn't a defcustom.
> > Second, what exactly did you mean to set it to, to cause the effect
> > you were talking about?  IOW, can you present a complete recipe,
> > starting from "emacs -Q", where you make such a change, and then you
> > need to switch buffers to cause function names be displayed
> > differently?  I can tell you that if you replace "M-x set-variable"
> > with "M-x customize-face" and change some attribute of
> > font-lock-function-name-face, the effect on another frame is
> > immediate, which means redisplay takes note of the change and redraws
> > the other frame.  But I'm not sure this is the same use case you had
> > in mind.
> 
> Ok. In this case, customize-face causes the redisplay. So your original
> objection to the ada-mode face design, which was:
> 
> > And they cannot pick up every relevant change; for example, what
> > happens if some face used for font-lock is modified?
> 
> is moot.

My point was about using the modification hooks, not about triggering
redisplay.  That redisplay _is_ triggered by such changes (and by
others) was exactly my point.  Namely, relying on redisplay to redraw
the regions that require it, and as a side effect to refontify those
regions, is better than using modification hooks to decide where to
refontify.  And yes, if something changes that affects the appearance,
but redisplay is not triggered by that change, that _is_ a bug.  By
contrast, not every such change is guaranteed to call the modification
hooks, and thus relying on those hooks will miss some changes, and you
will not be able to claim that this is a bug in those hooks, at least
not in all cases.

> If there are other things that can be changed that should force a
> redisplay, but currently don't, I would say that's a bug, either in
> ada-mode or elsewhere. So far there have been no bugs filed against
> ada-mode for this type of issue.

I wasn't claiming that ada-mode has bugs; I never use that mode.  I
was making a general argument against using modification hooks as the
basis for deciding where to refontify.  This should be delegated to
redisplay, because it's redisplay's job to know which regions need to
be redrawn (including their fontification).


* Re: emacs-tree-sitter and Emacs
  2020-04-03 17:24                   ` Stephen Leake
@ 2020-04-03 18:39                     ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2020-04-03 18:39 UTC
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Apr 2020 09:24:38 -0800
> 
> > I meant BEG_UNCHANGED and END_UNCHANGED.
> 
> Ah; I was unaware of those. It would be good if these values were
> accessible from modules and elisp.

We can discuss this.  It wasn't needed up until now.

> Hmm. Browsing the emacs C source to see if there is such a function (I
> didn't find one, but I could have missed it), I found this in keyboard.c:
> 
>       /* If the previous command tried to force a specific window-start,
> 	 forget about that, in case this command moves point far away
> 	 from that position.  But also throw away beg_unchanged and
> 	 end_unchanged information in that case, so that redisplay will
> 	 update the whole window properly.  */
> 
>     BUF_BEG_UNCHANGED (b) = BUF_END_UNCHANGED (b) = 0;
> 
> which means BEG_UNCHANGED can indicate changes when there are actually
> none.

Yes.  This machinery exists for use by redisplay, and redisplay has
its own methods of finding out what changed and where.  These two
values are optimizations that allow redisplay to examine only part of
the buffer for possible changes; it is okay to disable optimizations
when the unoptimized code produces the correct result.
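A minimal C sketch of why zeroing these counters is safe, assuming (as the comments in Emacs's buffer.h suggest) that BEG_UNCHANGED is the number of characters at the start of the buffer left unchanged since the last completed redisplay, and END_UNCHANGED the number at the end. The struct, field and function names are hypothetical; this is a model of the idea, not Emacs's redisplay code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a buffer's change-tracking state.  Positions
   are 1-based like Emacs buffer positions: text occupies [beg, z).  */
struct change_model
{
  ptrdiff_t beg;            /* first position (like BEG), normally 1 */
  ptrdiff_t z;              /* one past the last position (like Z) */
  ptrdiff_t beg_unchanged;  /* chars known unchanged at the start */
  ptrdiff_t end_unchanged;  /* chars known unchanged at the end */
};

/* Store in [*FROM, *TO) the region that may contain changes and so
   must be examined; everything outside it is known to be unchanged
   and can be skipped.  */
static void
region_to_examine (const struct change_model *m,
                   ptrdiff_t *from, ptrdiff_t *to)
{
  assert (m->beg <= m->z);
  assert (m->beg_unchanged >= 0 && m->end_unchanged >= 0);

  *from = m->beg + m->beg_unchanged;
  *to = m->z - m->end_unchanged;

  /* If the two unchanged stretches overlap, nothing needs examining:
     collapse to an empty region.  */
  if (*from > *to)
    *from = *to;
}

/* Discard the optimization by claiming nothing is known to be
   unchanged, the analogue of
     BUF_BEG_UNCHANGED (b) = BUF_END_UNCHANGED (b) = 0;
   This can make the examined region larger than necessary (it may
   report "changes" where there are none) but never smaller, so the
   result stays correct.  */
static void
forget_unchanged_info (struct change_model *m)
{
  m->beg_unchanged = 0;
  m->end_unchanged = 0;
}
```

For a 100-character buffer (BEG = 1, Z = 101) with 40 characters unchanged at the start and 30 at the end, region_to_examine yields [41, 71); after forget_unchanged_info it yields the whole [1, 101). The larger region is slower to examine but cannot miss a real change.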




* Re: emacs-tree-sitter and Emacs
  2020-04-03 18:19                 ` Eli Zaretskii
@ 2020-04-04  0:00                   ` Stephen Leake
  0 siblings, 0 replies; 46+ messages in thread
From: Stephen Leake @ 2020-04-04  0:00 UTC
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Fri, 03 Apr 2020 09:05:34 -0800
>> 
>> > Regarding the above use case, I don't think I understand what exactly
>> > you meant.  First, you cannot use set-variable to modify the value
>> > of font-lock-function-name-face, because it isn't a defcustom.
>> > Second, what exactly did you mean to set it to, to cause the effect
>> > you were talking about?  IOW, can you present a complete recipe,
>> > starting from "emacs -Q", where you make such a change, and then you
>> > need to switch buffers to cause function names to be displayed
>> > differently?  I can tell you that if you replace "M-x set-variable"
>> > with "M-x customize-face" and change some attribute of
>> > font-lock-function-name-face, the effect on another frame is
>> > immediate, which means redisplay takes note of the change and redraws
>> > the other frame.  But I'm not sure this is the same use case you had
>> > in mind.
>> 
>> Ok. In this case, customize-face causes the redisplay. So your original
>> objection to the ada-mode face design, which was:
>> 
>> > And they cannot pick up every relevant change; for example, what
>> > happens if some face used for font-lock is modified?
>> 
>> is moot.
>
> My point was about using the modification hooks, not about triggering
> redisplay.  That redisplay _is_ triggered by such changes (and by
> others), was exactly my point.  

Ok.

> Namely, relying on redisplay to redraw the regions that require it,
> and as side-effect to refontify those regions, is better than using
> modification hooks to decide where to refontify. 

The after-change-functions decide where to _parse_, which is not the
same as where to refontify. The results of a parse for fontification
are cached in text properties; if all the text properties in the
fontify region are current, no parse is needed. If some are not, then
for Ada all the changed text must be parsed, at least up to the end of
the fontify region.

For other languages, where parsing smaller chunks is possible, the
same mechanism works; the parse must include all of the changed
parsable chunks that overlap the fontify region.
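A minimal C sketch of the caching scheme Stephen describes, under stated assumptions: a boolean array stands in for the text properties that hold cached parse results, the hypothetical note_change plays the role of an after-change function, and a change is assumed to invalidate everything after it (as for a parser that must run left to right over Ada text). All names are hypothetical; this is not wisi's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_LEN 64  /* hypothetical fixed buffer size for the sketch */

/* Hypothetical stand-in for the text properties that cache parse
   results: current[i] is true when position i (0-based here) has
   up-to-date parse results attached.  */
struct parse_cache
{
  bool current[BUF_LEN];
};

/* Called from a (hypothetical) after-change function.  Assumption: a
   change at POS can alter how everything after it parses, so mark the
   cache stale from POS to the end of the buffer.  This decides where
   to parse later; it does not refontify anything.  */
static void
note_change (struct parse_cache *c, size_t pos)
{
  assert (pos <= BUF_LEN);
  for (size_t i = pos; i < BUF_LEN; i++)
    c->current[i] = false;
}

/* Called when the fontify region [START, END) is about to be drawn.
   Returns false if every cached result in the region is current, so
   no parse is needed.  Otherwise stores in [*FROM, *TO) the range to
   parse (from the first stale position up to the end of the fontify
   region), marks that range current, and returns true.  */
static bool
ensure_parsed (struct parse_cache *c, size_t start, size_t end,
               size_t *from, size_t *to)
{
  assert (start <= end && end <= BUF_LEN);

  bool all_current = true;
  for (size_t i = start; i < end; i++)
    if (!c->current[i])
      {
        all_current = false;
        break;
      }
  if (all_current)
    return false;

  /* The parse state at START depends on the text before it, so begin
     at the first stale position anywhere before END.  */
  size_t first = 0;
  while (first < end && c->current[first])
    first++;

  /* A real mode would run the parser over [first, end) here and store
     the results as text properties; the sketch only records that.  */
  for (size_t i = first; i < end; i++)
    c->current[i] = true;

  *from = first;
  *to = end;
  return true;
}
```

With nothing parsed yet, fontifying [0, 20) parses [0, 20); after a change at position 10, fontifying [0, 20) again reparses only [10, 20). For a language that can be parsed in smaller chunks, note_change would only need to invalidate the chunk containing the change.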

-- 
-- Stephe




end of thread, other threads:[~2020-04-04  0:00 UTC | newest]

Thread overview: 46+ messages
2020-03-30  3:23 emacs-tree-sitter and Emacs Jorge Javier Araya Navarro
2020-03-30 13:07 ` Eli Zaretskii
2020-03-30 14:00   ` Stefan Monnier
2020-04-01  0:08     ` Stephen Leake
2020-04-01  0:27   ` Stephen Leake
2020-04-01 13:20     ` Eli Zaretskii
2020-04-01 19:51       ` Stephen Leake
2020-04-02 14:03         ` Eli Zaretskii
2020-04-02 14:27           ` Michael Welsh Duggan
2020-04-02 15:15             ` Eli Zaretskii
2020-04-02 15:24               ` Michael Welsh Duggan
2020-04-02 16:10                 ` Eli Zaretskii
2020-04-02 16:19                   ` Michael Welsh Duggan
2020-04-02 17:18                     ` Yuan Fu
2020-04-02 17:39                       ` Stefan Monnier
2020-04-02 18:17                         ` Yuan Fu
2020-04-02 18:26                           ` Stefan Monnier
2020-04-03  2:16                           ` Stephen Leake
2020-04-02 18:29                       ` Eli Zaretskii
2020-04-02 18:27                     ` Eli Zaretskii
2020-04-02 18:50                       ` Michael Welsh Duggan
2020-04-02 19:03                         ` Eli Zaretskii
2020-04-02 19:39                           ` 조성빈
2020-04-03  6:37                             ` Eli Zaretskii
2020-04-03 17:27                               ` Stephen Leake
2020-04-02 19:48                           ` Stefan Monnier
2020-04-03  2:06               ` Stephen Leake
2020-04-03  7:33                 ` Eli Zaretskii
2020-04-03 17:24                   ` Stephen Leake
2020-04-03 18:39                     ` Eli Zaretskii
2020-04-02 15:33             ` martin rudalics
2020-04-03  1:55           ` Stephen Leake
2020-04-03  4:47             ` Jorge Javier Araya Navarro
2020-04-03  7:32             ` Eli Zaretskii
2020-04-03 17:05               ` Stephen Leake
2020-04-03 18:19                 ` Eli Zaretskii
2020-04-04  0:00                   ` Stephen Leake
2020-04-01 13:28     ` Stefan Monnier
2020-03-30 14:11 ` Stefan Monnier
2020-03-30 17:00   ` Jorge Javier Araya Navarro
2020-03-30 17:07     ` Dmitry Gutov
2020-03-30 17:09       ` Jorge Javier Araya Navarro
2020-03-30 17:22     ` Stefan Monnier
2020-03-30 17:34       ` Jorge Javier Araya Navarro
2020-03-30 17:50       ` Stefan Monnier
2020-04-01  0:30       ` Stephen Leake

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).