Where to place third-party C source code?

unofficial mirror of emacs-devel@gnu.org 

* Where to place third-party C source code?
       [not found] <1504933445.581219.1569619792280.ref@mail.yahoo.com>
@ 2019-09-27 21:29 ` Jorge Araya Navarro
  2019-09-28  6:31   ` Eli Zaretskii
  0 siblings, 1 reply; 76+ messages in thread
From: Jorge Araya Navarro @ 2019-09-27 21:29 UTC (permalink / raw)
  To: Emacs Developers

Hello!

I was wondering whether placing third-party C source code in the Emacs project, for use by a feature I would like to implement, is "against the rules", so to speak.

I ask because I would like to attempt to integrate tree-sitter (a parser generator tool and an incremental parsing library) into Emacs. I'm aware there are a couple of projects that provide Emacs Lisp bindings for it; I'm just wondering what things would look like if it were shipped in vanilla Emacs. According to its documentation[1]:

    [...] you can use the library in a larger project by adding one source file to the project. This source file needs three directories to be in the include path when compiled:

    source file:
    - tree-sitter/lib/src/lib.c 

    include directories: 
    - tree-sitter/lib/src
    - tree-sitter/lib/include
    - tree-sitter/lib/utf8proc

The instructions seem straightforward, but the docs included with the Emacs source code do not mention whether this is "fair game", or where such code should be placed.

[1]: https://tree-sitter.github.io/tree-sitter/using-parsers

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Where to place third-party C source code?
  2019-09-27 21:29 ` Where to place third-party C source code? Jorge Araya Navarro
@ 2019-09-28  6:31   ` Eli Zaretskii
  2019-09-28  7:33     ` Jorge Javier Araya Navarro
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2019-09-28  6:31 UTC (permalink / raw)
  To: Jorge Araya Navarro; +Cc: emacs-devel

> Date: Fri, 27 Sep 2019 21:29:52 +0000 (UTC)
> From: Jorge Araya Navarro <jorgejavieran@yahoo.com.mx>
> 
> I was wondering if placing third-party C source code that is used in any feature I would like to implement in the Emacs project is "against the rules", so to speak.

I don't understand the question.  Any feature supported by Emacs that
needs C-level support has some C code in one of the Emacs C source
files.  There's no "third-party" code, everything is part of Emacs
proper.

>     [...] you can use the library in a larger project by adding one source file to the project. This source file needs three directories to be in the include path when compiled:
> 
>     source file:
>     - tree-sitter/lib/src/lib.c 
> 
>     include directories: 
>     - tree-sitter/lib/src
>     - tree-sitter/lib/include
>     - tree-sitter/lib/utf8proc

I don't see why we would need this method, since tree-sitter is a
library, and Emacs can be linked against that library.  What you quote
is an alternative method, but why would we need such an alternative?

Of course, this is all putting the wagon ahead of the horse: we should
first discuss whether we want to have Emacs be able to link to that
library and provide the related features.  An alternative would be to
have an unbundled module that uses the Emacs module API.


* Re: Where to place third-party C source code?
  2019-09-28  6:31   ` Eli Zaretskii
@ 2019-09-28  7:33     ` Jorge Javier Araya Navarro
  2019-09-28  9:53       ` Eli Zaretskii
  2019-09-28 12:54       ` Stefan Monnier
  0 siblings, 2 replies; 76+ messages in thread
From: Jorge Javier Araya Navarro @ 2019-09-28  7:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Saturday, 28 September 2019, at 00:31, Eli Zaretskii wrote:

>> Date: Fri, 27 Sep 2019 21:29:52 +0000 (UTC)
>> From: Jorge Araya Navarro <jorgejavieran@yahoo.com.mx>
>>
>> I was wondering if placing third-party C source code that is used in any feature I would like to implement in the Emacs project is "against the rules", so to speak.
>
> I don't understand the question.  Any feature supported by Emacs that
> needs C-level support has some C code in one of the Emacs C source
> files.  There's no "third-party" code, everything is part of Emacs
> proper.
>
>>     [...] you can use the library in a larger project by adding one source file to the project. This source file needs three directories to be in the include path when compiled:
>>
>>     source file:
>>     - tree-sitter/lib/src/lib.c
>>
>>     include directories:
>>     - tree-sitter/lib/src
>>     - tree-sitter/lib/include
>>     - tree-sitter/lib/utf8proc
>
> I don't see why we would need this method, since tree-sitter is a
> library, and Emacs can be linked against that library.  What you quote
> is an alternative method, but why would we need such an alternative?

Well, yes, I realized that adding an option to configure.ac would allow the compiler to find the
source code of Tree Sitter (like `--with-tree-sitter=/some/path/tree-sitter' or who knows)

> Of course, this is all putting the wagon ahead of the horse: we should
> first discuss whether we want to have Emacs be able to link to that
> library and provide the related features.  An alternative would be to
> have an unbundled module that uses the Emacs module API.

Ah, yes. There is one project that provides tree-sitter as a dynamic module using the Emacs module
API[1], but my concern is: why should vanilla Emacs require its end users to download a bunch of
packages to make the user experience better when we could literally provide them from the
get-go? IIRC, one pain point of Emacs for (new?) users is how much configuration is needed to get a
better editing experience.

We could leverage projects like tree-sitter to improve the user experience in Emacs out of the box.
Integrating tree-sitter with Emacs and shipping grammars for some of the programming languages that
Emacs already supports (like Python and JavaScript) would improve the experience of editing code
first in those languages, and second in any other language supported by third-party elisp packages,
not to mention what this could mean in terms of the tooling available to package authors.

[1]: https://github.com/ubolonton/emacs-tree-sitter


* Re: Where to place third-party C source code?
  2019-09-28  7:33     ` Jorge Javier Araya Navarro
@ 2019-09-28  9:53       ` Eli Zaretskii
  2019-09-28 12:54       ` Stefan Monnier
  1 sibling, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2019-09-28  9:53 UTC (permalink / raw)
  To: Jorge Javier Araya Navarro; +Cc: emacs-devel

> From: Jorge Javier Araya Navarro <jorgejavieran@yahoo.com.mx>
> Cc: emacs-devel@gnu.org
> Date: Sat, 28 Sep 2019 01:33:40 -0600
> 
> > I don't see why we would need this method, since tree-sitter is a
> > library, and Emacs can be linked against that library.  What you quote
> > is an alternative method, but why would we need such an alternative?
> 
> Well, yes, I realized that adding an option to configure.ac would allow the compiler to find the
> source code of Tree Sitter (like `--with-tree-sitter=/some/path/tree-sitter' or who knows)

Why would building Emacs require access to the tree-sitter's source
code, even if we decide to provide this feature in core Emacs?  Isn't
linking against a library good enough?

> > Of course, this is all putting the wagon ahead of the horse: we should
> > first discuss whether we want to have Emacs be able to link to that
> > library and provide the related features.  An alternative would be to
> > have an unbundled module that uses the Emacs module API.
> 
> Ah, yes. There is one project that provides tree-sitter like a dynamic module using the Emacs module
> API[1], but my concern is: why should vanilla Emacs require their final users to download a bunch of
> packages to make the user experience better when we could, like, literally, provide them from the
> get-go? IIRC one pain-point of Emacs for (new?) users is how much configuration is needed to have a
> better editing experience.

There's no contradiction between better experience and having this
particular feature be implemented as a module.  We've added the module
API precisely for cases like this one, where some feature needed
access to the Emacs internals on the C level.

Making this part of the Emacs core would require writing some C code
in Emacs itself and linking against the library; it still wouldn't
need any access to the sources of tree-sitter.  We could do that, but
we first need to be sure that this is worth our while, i.e. that
having that feature through an external module will somehow be
insufficient, or that most Emacs users will want to have it.  AFAIU,
neither of these reasons has been brought up and justified yet.
Improving the user
experience is not of itself a sufficient reason, because every
additional feature basically does that in some sense.  We need a more
serious reason, one that will justify the additional maintenance
burden that will put on the Emacs project in the long run.


* Re: Where to place third-party C source code?
  2019-09-28  7:33     ` Jorge Javier Araya Navarro
  2019-09-28  9:53       ` Eli Zaretskii
@ 2019-09-28 12:54       ` Stefan Monnier
  2019-12-26 16:52         ` yyoncho
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Monnier @ 2019-09-28 12:54 UTC (permalink / raw)
  To: Jorge Javier Araya Navarro; +Cc: Eli Zaretskii, emacs-devel

> Ah, yes. There is one project that provides tree-sitter like a dynamic
> module using the Emacs module API[1], but my concern is: why should
> vanilla Emacs require their final users to download a bunch of
> packages to make the user experience better when we could, like,
> literally, provide them from the get-go? IIRC one pain-point of Emacs
> for (new?) users is how much configuration is needed to have a better
> editing experience.

I'm vaguely familiar with tree-sitter but haven't spent the time
investigating the situation enough to have a clear idea of what happens
with it.  Could you clarify the points below?

1- Other than the fact that it's not available "out of the box" is there
   some other problem with the existing dynamic module?  I.e. would it
   be OK for Emacs to simply use this existing module as-is, just
   bundling it into the distribution tarball, or would it still not be
   as good as a "native" binding?

2- Say I find an SML grammar for tree-sitter, what extra
   (i.e. Emacs-specific) work do I need to do (as maintainer of
   sml-mode) in order for sml-mode users to benefit from it?  and what
   kind of features would they get from it (I assume: indentation, of
   course, and probably some navigation, anything else)?

3- How does tree-sitter compare with the LSP-route (via eglot-mode
   or lsp-mode)?

        Stefan


* Re: Where to place third-party C source code?
  2019-09-28 12:54       ` Stefan Monnier
@ 2019-12-26 16:52         ` yyoncho
  2020-01-04  3:25           ` Using incremental parsing in Emacs HaiJun Zhang
  0 siblings, 1 reply; 76+ messages in thread
From: yyoncho @ 2019-12-26 16:52 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Jorge Javier Araya Navarro, emacs-devel


Hi Stefan,


> 3- How does tree-sitter compare with the LSP-route (via eglot-mode
>    or lsp-mode)?
>

The LSP protocol is not going to support full-featured highlighting,
only semantic highlighting, because it won't be fast enough.

Related: https://github.com/microsoft/vscode/issues/77140 and
https://github.com/Microsoft/vscode/issues/585

Thanks,
Ivan


* Using incremental parsing in Emacs
@ 2020-01-03 10:05 Eli Zaretskii
  2020-01-03 13:36 ` phillip.lord
                   ` (8 more replies)
  0 siblings, 9 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-03 10:05 UTC (permalink / raw)
  To: emacs-devel

Would someone like to try to figure out how we could use the
incremental parsing technology in Emacs for making our
programming-language support more accurate and efficient?  One package
that implements this technology is tree-sitter:

  https://tree-sitter.github.io/tree-sitter/

AFAIU, these capabilities could be used as an alternative to
regexp- and syntax-pps-based font-lock, better code folding,
completion, refactoring, and other similar features; in general, any
feature which would benefit from having a parse tree for the source
code in a buffer.

To be able to use such libraries, we need to figure out how to
integrate them into the core, what kind of interfaces would be needed
for that, and what kind of infrastructure we would need for basing
Lisp features on those libraries.  Posting practical ideas for design
of all that would be a good first step in this promising direction.
Bonus points for providing code patches that demonstrate the
implementation of these ideas.

TIA


* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
@ 2020-01-03 13:36 ` phillip.lord
  2020-01-03 14:24   ` Eli Zaretskii
  2020-01-03 16:00 ` Dmitry Gutov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 76+ messages in thread
From: phillip.lord @ 2020-01-03 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emacs-devel, emacs-devel

Think it's already being worked on.

https://github.com/ubolonton/emacs-tree-sitter/

https://github.com/karlotness/tree-sitter.el

The former uses Rust for the dynamic module support. Tree-sitter itself
also uses JavaScript and npm to define the language grammars, although
AFAICT, these compile down to C.


On 2020-01-03 10:05, Eli Zaretskii wrote:
> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
> 
>   https://tree-sitter.github.io/tree-sitter/
> 
> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-pps-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.
> 
> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries.  Posting practical ideas for design
> of all that would be a good first step in this promising direction.
> Bonus points for providing code patches that demonstrate the
> implementation of these ideas.
> 
> TIA




* Re: Using incremental parsing in Emacs
  2020-01-03 13:36 ` phillip.lord
@ 2020-01-03 14:24   ` Eli Zaretskii
  2020-01-03 15:43     ` arthur miller
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-03 14:24 UTC (permalink / raw)
  To: phillip.lord; +Cc: emacs-devel

> Date: Fri, 03 Jan 2020 13:36:39 +0000
> From: phillip.lord@russet.org.uk
> Cc: emacs-devel@gnu.org, Emacs-devel
>  <emacs-devel-bounces+phillip.lord=russet.org.uk@gnu.org>
> 
> https://github.com/ubolonton/emacs-tree-sitter/
> 
> https://github.com/karlotness/tree-sitter.el
> 
> The former uses Rust for the dynamic module support.

My gut feeling is that modules are not the best way of bringing this
to Emacs (and doing this via Rust on top of that makes even less sense
to me), which is why I suggested to come up with a design first.

> Tree sitter itself also uses Javascript and npm to define the
> language grammars, although AFAICT, these compile down to C.

They compile to C, and I'm quite sure that it shouldn't be too hard to
allow a language grammar to be written in some other scripting
language.  But I think these are secondary considerations at this
stage.


* RE: Using incremental parsing in Emacs
  2020-01-03 14:24   ` Eli Zaretskii
@ 2020-01-03 15:43     ` arthur miller
  0 siblings, 0 replies; 76+ messages in thread
From: arthur miller @ 2020-01-03 15:43 UTC (permalink / raw)
  To: Eli Zaretskii, phillip.lord@russet.org.uk; +Cc: emacs-devel@gnu.org


Microsoft has been doing this since the 1990s, and they ended up with the "language server protocol" (LSP).  Maybe the Emacs core could provide some kind of support for implementing language servers, or better support for interchanging data with existing ones.

Personally, I dislike the idea of JSON (or XML) as the interchange format; I would prefer a binary protocol for compactness and speed. Otherwise, their idea of LSP is probably quite sound.

Maybe GCC could export its ast as well ....

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Eli Zaretskii <eliz@gnu.org>
Date: 2020-01-03 15:29 (GMT+01:00)
To: phillip.lord@russet.org.uk
Cc: emacs-devel@gnu.org
Subject: Re: Using incremental parsing in Emacs

> Date: Fri, 03 Jan 2020 13:36:39 +0000
> From: phillip.lord@russet.org.uk
> Cc: emacs-devel@gnu.org, Emacs-devel
>  <emacs-devel-bounces+phillip.lord=russet.org.uk@gnu.org>
>
> https://github.com/ubolonton/emacs-tree-sitter/
>
> https://github.com/karlotness/tree-sitter.el
>
> The former uses Rust for the dynamic module support.

My gut feeling is that modules are not the best way of bringing this
to Emacs (and doing this via Rust on top of that makes even less sense
to me), which is why I suggested to come up with a design first.

> Tree sitter itself also uses Javascript and npm to define the
> language grammars, although AFAICT, these compile down to C.

They compile to C, and I'm quite sure that it shouldn't be too hard to
allow a language grammar to be written in some other scripting
language.  But I think these are secondary considerations at this
stage.


* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
  2020-01-03 13:36 ` phillip.lord
@ 2020-01-03 16:00 ` Dmitry Gutov
  2020-01-03 17:09   ` Pankaj Jangid
  2020-01-03 19:39 ` Stephen Leake
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-03 16:00 UTC (permalink / raw)
  To: Eli Zaretskii, emacs-devel; +Cc: Stefan Monnier

On 03.01.2020 12:05, Eli Zaretskii wrote:
> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
> 
>    https://tree-sitter.github.io/tree-sitter/
> 
> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-pps-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.

Quite some time ago we talked with Stefan about supporting certain 
complex language features, and kind of agreed that we could use a new 
way to specify syntax, something to supersede syntax-propertize-function 
(and maybe font-lock).

Tree-Sitter could be an example of how such new grammars could be 
structured, but I think we'd need that implemented in Lisp. Not in a 
foreign library that we import through modules mechanism.




* Re: Using incremental parsing in Emacs
  2020-01-03 16:00 ` Dmitry Gutov
@ 2020-01-03 17:09   ` Pankaj Jangid
  0 siblings, 0 replies; 76+ messages in thread
From: Pankaj Jangid @ 2020-01-03 17:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Stefan Monnier, emacs-devel

> Tree-Sitter could be an example of how such new grammars could be
> structured, but I think we'd need that implemented in Lisp. Not in a
> foreign library that we import through modules mechanism.

If it is a pure-C library then there is no harm in including it. And for
performance reasons, integration must be done at the C level. tree-sitter
dynamically updates the syntax tree on every character-level change in
the input source (which will be a buffer in our case).
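
To make "character-level change" concrete: tree-sitter's C API
describes each edit with byte offsets and row/column points (the
TSInputEdit struct) before the tree is reparsed.  The Python sketch
below only computes such a descriptor for one edit; it does not call
tree-sitter, and the names Point, InputEdit, point_at and
describe_edit are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    row: int     # zero-based line number
    column: int  # zero-based byte offset within that line


@dataclass(frozen=True)
class InputEdit:
    # Field names mirror tree-sitter's TSInputEdit; the class itself is
    # a hypothetical stand-in, not a binding to the library.
    start_byte: int
    old_end_byte: int
    new_end_byte: int
    start_point: Point
    old_end_point: Point
    new_end_point: Point


def point_at(text: bytes, offset: int) -> Point:
    """Return the row/column position of a byte offset in text."""
    row = text.count(b"\n", 0, offset)
    line_start = text.rfind(b"\n", 0, offset) + 1  # 0 when on the first line
    return Point(row, offset - line_start)


def describe_edit(old: bytes, start: int, old_end: int, replacement: bytes):
    """Replace old[start:old_end] with replacement.

    Returns the new text and the edit descriptor an incremental parser
    would need in order to reuse the unchanged parts of its old tree.
    """
    new = old[:start] + replacement + old[old_end:]
    new_end = start + len(replacement)
    edit = InputEdit(
        start_byte=start,
        old_end_byte=old_end,
        new_end_byte=new_end,
        start_point=point_at(old, start),
        old_end_point=point_at(old, old_end),
        new_end_point=point_at(new, new_end),
    )
    return new, edit


source = b"int x = 1;\nint y = 2;\n"
# Typing a single character: insert "0" right after the "1" on line 0.
new_source, edit = describe_edit(source, 9, 9, b"0")
print(new_source)
print(edit)
```

In Emacs the offsets would presumably come from buffer-change hooks,
which is one reason the C-level integration question matters here.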

A Lisp interface must be there to support customization and integration
with other packages. I guess that your intent is the same as Eli's
intent. But he insists that we come up with the design first.


* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
  2020-01-03 13:36 ` phillip.lord
  2020-01-03 16:00 ` Dmitry Gutov
@ 2020-01-03 19:39 ` Stephen Leake
  2020-01-03 20:05   ` Eli Zaretskii
  2020-01-04  3:59 ` HaiJun Zhang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 76+ messages in thread
From: Stephen Leake @ 2020-01-03 19:39 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  

GNU ELPA ada-mode is an existing example; it has a full language parser
(error-correcting generalized LR), that supports some advanced
navigation. It could be extended to do some code completion.

Instead of "incremental parsing" (which updates an existing syntax tree
given source changes) it uses "partial parsing" (parsing only part of a
file) and very robust error handling. It works very well on very large
Ada files (it is in production use by Eurocontrol and others).

Error correction is critical, since buffers are normally not syntactically
correct during editing.

I've tried using the same parser generator on Java and Python; the
results are not as good as for Ada (apparently Ada lends itself to LR
parsing better than those languages). That might be improved by
massaging the grammar, but that risks implementing not-quite-Java,
not-quite-Python.

Others mentioned LSP (https://langserver.org/); that method supports
incremental parsing, since it is centered on sending source edits from
the editor to the language server (after sending the full text once). It
also supports algorithms that require more than one source file, since
all files involved in a project can be loaded into the same language
server instance (the ada-mode parser is strictly one file). That allows
providing completion on parameters for functions declared in other files,
for example.
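
A minimal sketch of what "full text once, then edits" looks like on
the wire, assuming LSP's JSON-RPC base protocol (a Content-Length
header followed by a JSON body) and a server that accepts incremental
document sync.  The URI, file contents and helper names (frame,
did_open, did_change) are invented for the example.

```python
import json


def frame(message: dict) -> bytes:
    """Wrap a JSON-RPC message in the LSP base-protocol header."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def did_open(uri: str, language_id: str, text: str) -> dict:
    """Notification that sends the whole document to the server once."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,
                "version": 1,
                "text": text,
            }
        },
    }


def did_change(uri: str, version: int, start, end, new_text: str) -> dict:
    """Notification that sends only the edited range and its new text.

    start and end are (line, character) pairs, both zero-based.
    """
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [
                {
                    "range": {
                        "start": {"line": start[0], "character": start[1]},
                        "end": {"line": end[0], "character": end[1]},
                    },
                    "text": new_text,
                }
            ],
        },
    }


uri = "file:///tmp/demo.py"  # hypothetical file
opened = did_open(uri, "python", "x = 1\n")
# Replace the "1" on line 0 with "42"; nothing else is retransmitted.
changed = did_change(uri, 2, (0, 4), (0, 5), "42")
wire = frame(changed)
print(wire.decode("utf-8"))
```

The second message stays small no matter how large the file is, which
is what makes this usable for incremental parsing on the server side.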

Many editors are moving to support LSP; that allows them to take
advantage of any parser technology developed independently.

ada-mode has its own protocol between elisp and the external parser,
provided by the GNU ELPA wisi package (the ada-mode parser was started
before LSP). The parser in ada-mode could be used in an LSP language
server.

So I think the short answer to your post is "GNU ELPA eglot", with
possibly some work importing some of that into core to make it more
efficient. eglot is currently listed as "incompat" in *Packages* (in
both emacs 27 and 26); I don't know why. I have not tried eglot; I don't
know how complete it is. There is also
https://github.com/emacs-lsp/lsp-mode.

The syntax used for expressing the grammar is usually fairly tightly
tied to the language and/or the parser generator; trying to generalize
that for all languages supported by Emacs is a huge task, not worth
doing. With LSP, building a grammar for a language is done once for each
language server.

Whether the language server is implemented as an external process, or as
a loadable module, is an implementation detail. ada-mode uses an
external process, mostly because it was started before modules were
stabilized. The communication between the language server and elisp
(whether ada-mode style or LSP) involves sending text, not binary data
(and _not_ pointers into the emacs buffer!). Doing that via the module
interface vs pipes to a process is a wash for speed. Using a process
fully isolates the server code from emacs, eliminating any possible
third-party library version conflicts.

It could be possible to implement an LSP language server in elisp, running
in a separate thread (or even the same thread; it can be used
synchronously). That might be an interesting exercise, and would
eliminate other language dependencies. ada-mode used to support an elisp
parser generated from the same grammar, but that never supported error
correction; implementing very complex algorithms is just easier in a
more advanced language (and certainly faster at run time; critical for
error correction).

-- 
-- Stephe


* Re: Using incremental parsing in Emacs
  2020-01-03 19:39 ` Stephen Leake
@ 2020-01-03 20:05   ` Eli Zaretskii
  2020-01-03 22:21     ` arthur miller
                       ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-03 20:05 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Jan 2020 11:39:50 -0800
> 
> Whether the language server is implemented as an external process, or as
> a loadable module, is an implementation detail.

That detail can be very important, though.  E.g., direct access to
buffer text is not possible from external programs, and likely will
not be possible, at least not conveniently so, from modules.

So I still think we should first consider how the interfaces
supporting the various features should look, and only after that look
around for packages that perhaps are already doing that.  In general,
with all due respect, I don't expect the existing packages to teach us
TRT, because they are doing stuff in Lisp alone, and that is
inherently limited and likely sub-optimal.

But that's just MO; I started this thread to maybe inspire someone to
take a second look at the related features and propose ways of
improving what we do today, both feature-wise and speed-wise, as I see
quite a few complaints about lack of features and slowness in stuff
like font-lock.  If people are happy with what we have, it's fine with
me, even if I disagree with the approaches I see out there.


* RE: Using incremental parsing in Emacs
  2020-01-03 20:05   ` Eli Zaretskii
@ 2020-01-03 22:21     ` arthur miller
  2020-01-04  3:46       ` HaiJun Zhang
  2020-01-04  8:23       ` Eli Zaretskii
  2020-01-03 23:53     ` Stephen Leake
  2020-01-05 22:44     ` Dmitry Gutov
  2 siblings, 2 replies; 76+ messages in thread
From: arthur miller @ 2020-01-03 22:21 UTC (permalink / raw)
  To: Eli Zaretskii, Stephen Leake; +Cc: emacs-devel@gnu.org


When it comes to tree-sitter which you linked to, I didn't understand from their website how they deal with compile time dependencies and various configuration options that are usually passed via configure & co.

Various LSP implementations like ycmd and lsp-mode can use a "compile database" produced by tools like bear, compiledb and similar. As far as I can see, the tree-sitter site only talks about language grammars. Admittedly, I only read their website tonight, after you posted your mail.

Can they understand library dependencies, compile flags and so on?
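
For reference, a "compile database" here means a compile_commands.json
file in the JSON Compilation Database format: a list of entries with
"directory", "file", and either a "command" string or an "arguments"
list.  The sketch below shows how a tool might pull include paths and
macro definitions out of one entry.  It says nothing about whether
tree-sitter consumes such a file; the paths, the flags and the
function name compiler_flags are invented.

```python
import json
import shlex


def compiler_flags(database_text: str, source_file: str):
    """Return (include_dirs, defines) recorded for source_file, or None."""
    for entry in json.loads(database_text):
        if entry["file"] != source_file:
            continue
        # An entry carries either an "arguments" list or a "command" string.
        args = entry.get("arguments") or shlex.split(entry["command"])
        includes, defines = [], []
        remaining = iter(args[1:])  # skip the compiler executable itself
        for arg in remaining:
            if arg == "-I":
                includes.append(next(remaining))  # "-I dir" form
            elif arg.startswith("-I"):
                includes.append(arg[2:])          # "-Idir" form
            elif arg == "-D":
                defines.append(next(remaining))
            elif arg.startswith("-D"):
                defines.append(arg[2:])
        return includes, defines
    return None


# Hypothetical database with a single translation unit.
sample = json.dumps([
    {
        "directory": "/build",
        "command": "cc -Ilib/include -DNDEBUG -I lib/src -c lib/src/lib.c -o lib.o",
        "file": "lib/src/lib.c",
    }
])

flags = compiler_flags(sample, "lib/src/lib.c")
print(flags)
```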

Another thing, regarding "speed-wise": if the Emacs core implemented support for creating language servers, or plugging into them, as well as clients, then maybe it would be possible to use shared memory for passing around those big JSON files that lsp-mode likes to play with. That might also let Emacs reuse existing tools like lsp-mode. I think they are using sockets now, though I am not sure.

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Eli Zaretskii <eliz@gnu.org>
Date: 2020-01-03 21:06 (GMT+01:00)
To: Stephen Leake <stephen_leake@stephe-leake.org>
Cc: emacs-devel@gnu.org
Subject: Re: Using incremental parsing in Emacs

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Jan 2020 11:39:50 -0800
>
> Whether the language server is implemented as an external process, or as
> a loadable module, is an implementation detail.

That detail can be very important, though.  E.g., direct access to
buffer text is not possible from external programs, and likely will
not be possible, at least not conveniently so, from modules.

So I still think we should first consider how the interfaces
supporting the various features should look, and only after that look
around for packages that perhaps are already doing that.  In general,
with all due respect, I don't expect the existing packages to teach us
TRT, because they are doing stuff in Lisp alone, and that is
inherently limited and likely sub-optimal.

But that's just MO; I started this thread to maybe inspire someone to
have a second look on the related features and propose ways of
improving what we do today, both feature-wise and speed-wise, as I see
quite a few complaints about lack of features and slowness in stuff
like font-lock.  If people are happy with what we have, it's fine with
me, even if I disagree with the approaches I see out there.


* Re: Using incremental parsing in Emacs
  2020-01-03 20:05   ` Eli Zaretskii
  2020-01-03 22:21     ` arthur miller
@ 2020-01-03 23:53     ` Stephen Leake
  2020-01-04  8:45       ` Eli Zaretskii
  2020-01-05 22:44     ` Dmitry Gutov
  2 siblings, 1 reply; 76+ messages in thread
From: Stephen Leake @ 2020-01-03 23:53 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> So I still think we should first consider how the interfaces
> supporting the various features should look, and only after that look
> around for packages that perhaps are already doing that.  In general,
> with all due respect, I don't expect the existing packages to teach us
> TRT, because they are doing stuff in Lisp alone, and that is
> inherently limited and likely sub-optimal.

The interface should look like LSP; it aims to support everything an IDE
needs from a "language server" (ie parser), and allows for custom
extensions where it falls short.

LSP language servers are implemented in some compiled language, not
elisp; eglot/lsp-mode is just the elisp side of the protocol. The elisp
sends edits and info requests (ie, "insert/delete this text at this
point", "fontify/format this range") to the server, and handles the
responses.
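
The "handles the responses" half can be sketched the same way.  The
code below assumes the LSP base protocol (Content-Length header, JSON
body) and shows a client splitting a byte stream into messages and
matching a response to the request that caused it.  The function
names, the request id and the empty result are invented;
textDocument/rangeFormatting is used as the "format this range"
request.

```python
import json


def read_messages(data: bytes):
    """Decode every complete message in data.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not fully arrived yet.
    """
    messages = []
    pos = 0
    while True:
        header_end = data.find(b"\r\n\r\n", pos)
        if header_end == -1:
            break  # header not complete yet
        length = None
        for line in data[pos:header_end].split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("missing Content-Length header")
        body_start = header_end + 4
        body_end = body_start + length
        if body_end > len(data):
            break  # body not complete yet
        messages.append(json.loads(data[body_start:body_end].decode("utf-8")))
        pos = body_end
    return messages, data[pos:]


def classify(message: dict, pending: dict):
    """Tell responses, server requests and notifications apart.

    pending maps request ids we sent to the method we asked for.
    """
    if "id" in message and "method" not in message:
        return ("response", pending.pop(message["id"], None))
    if "id" in message:
        return ("request", message["method"])
    return ("notification", message["method"])


# We previously sent request 1; the server answers with no edits.
pending = {1: "textDocument/rangeFormatting"}
body = json.dumps({"jsonrpc": "2.0", "id": 1, "result": []}).encode("utf-8")
# The stream ends with the start of a second, still incomplete message.
stream = b"Content-Length: %d\r\n\r\n" % len(body) + body + b"Content-Len"

messages, leftover = read_messages(stream)
kind = classify(messages[0], pending)
print(kind, leftover)
```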

ada-mode works in a similar way, but LSP is an industry standard, so it
is a better choice.

If Emacs has a mode that conforms to the editor side of the LSP
protocol, it can use _any_ LSP language server; they do not have to be
provided by Emacs.

For example, Debian could provide some language servers as packages (I
assume it does now, but I have not checked), and Emacs could just use
them.

-- 
-- Stephe


* RE: Using incremental parsing in Emacs
  2019-12-26 16:52         ` yyoncho
@ 2020-01-04  3:25           ` HaiJun Zhang
  2020-01-04  5:21             ` Tobias Bading
  2020-01-04 23:48             ` Richard Stallman
  0 siblings, 2 replies; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-04  3:25 UTC (permalink / raw)
  To: Eli Zaretskii, phillip.lord, arthur miller; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

On January 3, 2020 at 11:43 PM +0800, arthur miller <arthur.miller@live.com> wrote:
> Microsoft has been doing this since 1990's and they ended up with "language server protocol".  Maybe Emacs core could implement some kind of support for implementing lsp:s or implement some better support to interchange data with existing lsp:s.
>

It seems that they (the VSCode team) are also trying something like tree-sitter. I am forwarding another discussion to you.
On December 27, 2019 at 12:52 AM +0800, yyoncho <yyoncho@gmail.com> wrote:
> > Hi Stefan,
> >
> > > 3- How does tree-sitter compare with the LSP-route (via eglot-mode
> > >    or lsp-mode)?
> >
> > lsp protocol is not going to support full-featured highlighting but only semantic
> > because it won't be fast enough.
> >
> > Related: https://github.com/microsoft/vscode/issues/77140 and
> > https://github.com/Microsoft/vscode/issues/585
> >
> > Thanks,
> > Ivan

[-- Attachment #2: Type: text/html, Size: 2494 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: Using incremental parsing in Emacs
  2020-01-03 22:21     ` arthur miller
@ 2020-01-04  3:46       ` HaiJun Zhang
  2020-01-04  8:23       ` Eli Zaretskii
  1 sibling, 0 replies; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-04  3:46 UTC (permalink / raw)
  To: Eli Zaretskii, Stephen Leake, arthur miller; +Cc: emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

On January 4, 2020 at 6:30 AM +0800, arthur miller <arthur.miller@live.com> wrote:
> When it comes to tree-sitter which you linked to, I didn't understand from their website how they deal with compile time dependencies and various configuration options that are usually passed via configure & co.
>
> Various lsp implementations like ycmd and lsp-mode can use "compile database" produced by tools like bear, compiledb and similar. As I see on tree-sitter they only speak about language grammars. I have though only read their website tonight after you posted your mail.
>

Yes, grammars only. That is enough for font-lock, indentation and imenu to use. And it is much more powerful than the current regexp- and syntax-pps-based approach.
What you ask for needs much more work to implement. And you need to set up a project before editing a file.


[-- Attachment #2: Type: text/html, Size: 1584 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
                   ` (2 preceding siblings ...)
  2020-01-03 19:39 ` Stephen Leake
@ 2020-01-04  3:59 ` HaiJun Zhang
       [not found] ` <41b3e9a0-2866-4692-a35c-6d9541bc3aaa@Spark>
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-04  3:59 UTC (permalink / raw)
  To: emacs-devel, Eli Zaretskii

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

I see it also supports multi-language documents.

On January 3, 2020 at 6:05 PM +0800, Eli Zaretskii <eliz@gnu.org> wrote:
> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient? One package
> that implements this technology is tree-sitter:
>
> https://tree-sitter.github.io/tree-sitter/
>
> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-pps-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.
>
> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries. Posting practical ideas for design
> of all that would be a good first step in this promising direction.
> Bonus points for providing code patches that demonstrate the
> implementation of these ideas.
>
> TIA
>

[-- Attachment #2: Type: text/html, Size: 1686 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
       [not found] ` <41b3e9a0-2866-4692-a35c-6d9541bc3aaa@Spark>
@ 2020-01-04  4:57   ` HaiJun Zhang
  2020-01-04  8:55     ` Eli Zaretskii
  0 siblings, 1 reply; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-04  4:57 UTC (permalink / raw)
  To: emacs-devel, Eli Zaretskii

[-- Attachment #1: Type: text/plain, Size: 617 bytes --]

For font-lock, I think it may work with tree-sitter like this:
1. After opening a file, parse the whole buffer with tree-sitter. We get a syntax tree from tree-sitter.
2. Get the syntax nodes with ts_node_descendant_for_point_range and fontify the buffer text in the visible region or whole buffer.
3. After each modification of buffer text, make a copy of the syntax tree as the new one. Update the new one with the modification.
4. Get the changed range and changed nodes list by comparing the old and new syntax trees. Then free the old syntax tree.
5. Update the text properties in the changed range.
6. Goto 3
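
The loop above can be sketched roughly as follows. This is only a toy model in Python, not tree-sitter: parse() is a hypothetical stand-in that re-tokenizes the whole text (a real incremental parser would reuse unchanged subtrees), and all function names here are made up for illustration. The part it tries to show is steps 3 to 5: shift the old result to account for the edit, compare it with the new result, and touch only the range that differs.

```python
import re

KEYWORDS = {"if", "else", "return", "while"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def parse(text):
    """Toy stand-in for a parser: return a list of (start, end, kind) tuples."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        word = m.group()
        if word in KEYWORDS:
            kind = "keyword"
        elif word[0].isdigit():
            kind = "number"
        elif word[0].isalpha() or word[0] == "_":
            kind = "identifier"
        else:
            kind = "punctuation"
        tokens.append((m.start(), m.end(), kind))
    return tokens


def shift(tokens, edit_start, old_end, new_end):
    """Step 3: move old tokens to their positions after the edit.

    Tokens overlapping the replaced text are dropped; they must be re-derived.
    """
    delta = new_end - old_end
    shifted = []
    for start, end, kind in tokens:
        if end <= edit_start:
            shifted.append((start, end, kind))
        elif start >= old_end:
            shifted.append((start + delta, end + delta, kind))
    return shifted


def changed_range(old_tokens, new_tokens):
    """Step 4: smallest range covering every token that differs."""
    diff = set(old_tokens) ^ set(new_tokens)
    if not diff:
        return None
    return min(t[0] for t in diff), max(t[1] for t in diff)


def on_edit(old_tokens, new_text, edit_start, old_end, new_end):
    """Steps 3-5: return the new tokens, the changed range, and the
    tokens whose faces would need to be recomputed."""
    shifted = shift(old_tokens, edit_start, old_end, new_end)
    new_tokens = parse(new_text)
    rng = changed_range(shifted, new_tokens)
    dirty = [] if rng is None else [
        t for t in new_tokens if t[0] < rng[1] and t[1] > rng[0]
    ]
    return new_tokens, rng, dirty


# Insert "return " at position 0 of "x = 1".
tokens = parse("x = 1")
tokens, rng, dirty = on_edit(tokens, "return x = 1", 0, 0, 7)
print(rng, dirty)
```

In this model only the new keyword is reported as dirty; the three tokens after it are recognized as unchanged once they are shifted, so their text properties would be left alone.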

[-- Attachment #2: Type: text/html, Size: 1016 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  3:25           ` Using incremental parsing in Emacs HaiJun Zhang
@ 2020-01-04  5:21             ` Tobias Bading
  2020-01-04 23:48             ` Richard Stallman
  1 sibling, 0 replies; 76+ messages in thread
From: Tobias Bading @ 2020-01-04  5:21 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eli Zaretskii, HaiJun Zhang, arthur miller, phillip.lord

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

> On 4. Jan 2020, at 04:26, HaiJun Zhang <netjune@outlook.com> wrote:
> [...]
> On December 27, 2019 at 12:52 AM +0800, yyoncho <yyoncho@gmail.com> wrote:
>> [...]
>> lsp protocol is not going to support full-featured highlighting but only semantic
>> because it won't be fast enough. 
>> 
>> Related: https://github.com/microsoft/vscode/issues/77140 and 
>> https://github.com/Microsoft/vscode/issues/585

I’m not 100% sure because I don’t like too colorful source code and it’s been a while since I did the configuration, but I’m using the awesome
https://github.com/MaskRay/ccls and
https://github.com/MaskRay/emacs-ccls
(in combination with lsp-mode)
on a daily basis and I’m quite sure those are able to fontify member variables and what not.

Mickeysoft decisions never stopped anyone...

Happy hacking,
Tobias

[-- Attachment #2: Type: text/html, Size: 2052 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 22:21     ` arthur miller
  2020-01-04  3:46       ` HaiJun Zhang
@ 2020-01-04  8:23       ` Eli Zaretskii
  1 sibling, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-04  8:23 UTC (permalink / raw)
  To: arthur miller; +Cc: stephen_leake, emacs-devel

> From: arthur miller <arthur.miller@live.com>
> CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> Date: Fri, 3 Jan 2020 22:21:16 +0000
> 
> When it comes to tree-sitter which you linked to, I didn't understand from their website how they deal with
> compile time dependencies and various configuration options that are usually passed via configure & co.

You can ask them a question, but in general, some jobs related to this
don't need to understand that stuff.  E.g., our font-lock doesn't, and
it still does a pretty decent job.

> Another thing is, about "speed-wise", if Emacs core implemented support for creating language servers, or
> plugging in them, as well as clients, then maybe it would be possible to use shared memory for passing round
> those big json files that lsp-mode like to play with. That might also let Emacs reuse existing tools like
> lsp-mode. I think they are using sockets now, I am not sure though.

Using JSON as the middleware adds a non-trivial processing overhead,
so if we can avoid that, we should.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 23:53     ` Stephen Leake
@ 2020-01-04  8:45       ` Eli Zaretskii
  2020-01-04 14:05         ` arthur miller
  2020-01-04 19:26         ` Stephen Leake
  0 siblings, 2 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-04  8:45 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Fri, 03 Jan 2020 15:53:45 -0800
> 
> The interface should look like LSP; it aims to support everything an IDE
> needs from a "language server" (ie parser), and allows for custom
> extensions where it falls short.

Maybe I'm the odd one out, but I don't think I have a clear idea of
what the "LSP interface" entails.  Would you (or someone else) post a
summary, or point to some place where this is described succinctly
enough to not require a long study?

We did learn one important thing from using LSP servers: that
processing the JSON stuff back and forth adds non-trivial overhead and
slows down the application enough to annoy users, even after we did
all we can to speed up the translation.  So I think it makes sense to
take one more look at the issue and see if we can come up with better
interfaces, which will suit Emacs applications better and allow faster
processing.  Using a library that processes stuff locally would then
allow us to implement such interfaces more easily, since we will be
free from the restrictions imposed by the need to communicate with
external processes.

We'll most probably want some combination of LSP-based and local
parsers-based features.  E.g., it's quite possible that LSP servers
could be better for some complex jobs, where speed matters less.

My point is that we shouldn't lock up our minds, not yet anyway.  A
fresh look at these issues, taking the incremental parsing into
account, could benefit us in the long run.

> LSP language servers are implemented in some compiled language, not
> elisp; eglot/lsp-mode is just the elisp side of the protocol. The elisp
> sends edits and info requests (ie, "insert/delete this text at this
> point", "fontify/format this range") to the server, and handles the
> responses.

I'm saying we should look into this and see whether there are better
ways than that.  Suppose such a server had direct access to buffer
text: would that allow a more efficient interface than the above?  We
shouldn't be "dragged" after the LSP ideas just because they are
there, and we shouldn't automatically decide that an interface is OK
just because it can be done in Lisp with no changes on the C level.
Emacs has some unique features and requirements that might affect the
design of the interfaces to such servers and libraries.

We've been there several times with minor features like line numbers
and fill-column indicator.  Why would this issue be an exception?
Especially since we have lately quite a "schism" among our active
developers: those who have the most experience in implementing such
applications aren't familiar well enough with the C internals, and
avoid making changes there.  This could very well be an obstacle to
coming up with good ideas regarding the best interfaces and
implementation options for the related features.

> ada-mode works in a similar way, but LSP is an industry standard, so it
> is a better choice.
> 
> If Emacs has a mode that conforms to the editor side of the LSP
> protocol, it can use _any_ LSP language server; they do not have to be
> provided by Emacs.

We should definitely support LSP.  We already do, albeit in
third-party packages.  We added native JSON support and jsonrpc for
doing this better.  If there's anything else we can do in that
direction, people should speak up.

But my point is that LSP is not necessarily the only game in town we
should support.  For example, font-lock doesn't use LSP, and probably
never will, due to performance issues; should we improve font-lock
using infrastructure that's based on language parsing?  And there are
other features that could benefit, I've mentioned them.  If you are
saying they all should just use LSP, then I don't think I agree.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  4:57   ` HaiJun Zhang
@ 2020-01-04  8:55     ` Eli Zaretskii
  2020-01-04 12:50       ` VanL
  2020-01-04 13:30       ` arthur miller
  0 siblings, 2 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-04  8:55 UTC (permalink / raw)
  To: HaiJun Zhang; +Cc: emacs-devel

> Date: Sat, 4 Jan 2020 12:57:24 +0800
> From: HaiJun Zhang <netjune@outlook.com>
> 
> For font-lock, I think it may work with tree-sitter like this:
> 1. After opening a file, parse the whole buffer with tree-sitter. We get a syntax tree from tree-sitter.
> 2. Get the syntax nodes with ts_node_descendant_for_point_range and fontify the buffer text in the visible
> region or whole buffer.
> 3. After each modification of buffer text, make a copy of the syntax tree as the new one. Update the new one
> with the modification.
> 4. Get the changed range and changed nodes list by comparing the old and new syntax trees. Then free the
> old syntax tree.
> 5. Update the text properties in the changed range.
> 6. Goto 3

I encourage you to study how JIT font lock works in Emacs, and in
particular how it plugs itself into the display engine.  Because using
any incremental parsing technology for font-lock needs a good
understanding of how font-lock is typically used in Emacs, and any
practical suggestions for integration and interfaces must take that
into consideration.

E.g., step 1 is anathema to JIT font-lock: it would produce a long
delay in displaying a file's buffer when the file is first visited.
For example, think about visiting a large and complex source file such
as xdisp.c: even if it takes tens of milliseconds to parse all of it,
as some tree-sitter presentation claims, waiting for that long before
we even start displaying the first window-full would be an annoyance.
And that's even before we consider the time to compute all the face
text properties from the syntax tree, something that will also take
time.
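
To make the contrast concrete, here is a very simplified model of on-demand fontification, written in Python. It is a sketch of the general idea only, not the actual jit-lock code: the class name, the callback, and the chunk size of 500 are all assumptions made for illustration. The point is that work is requested per chunk, just before the text is displayed, and never for text that is not shown.

```python
class LazyFontifier:
    """Fontify fixed-size chunks only when the display is about to show them."""

    def __init__(self, length, fontify_region, chunk=500):
        self.length = length                  # size of the buffer in characters
        self.fontify_region = fontify_region  # callback(start, end), hypothetical
        self.chunk = chunk                    # arbitrary chunk size for this sketch
        self.done = set()                     # indices of chunks already fontified

    def before_display(self, start, end):
        """Called with the range that is about to become visible."""
        if end <= start:
            return
        first = start // self.chunk
        last = (end - 1) // self.chunk
        for i in range(first, last + 1):
            if i in self.done:
                continue
            lo = i * self.chunk
            hi = min(lo + self.chunk, self.length)
            self.fontify_region(lo, hi)
            self.done.add(i)


calls = []
lazy = LazyFontifier(100_000, lambda lo, hi: calls.append((lo, hi)))

lazy.before_display(0, 2000)     # first window-full: only these chunks are done
lazy.before_display(0, 2000)     # redisplay of the same text: no new work
lazy.before_display(2000, 4000)  # scrolling: the next chunks are done on demand
print(len(calls), "chunks fontified out of", 100_000 // 500)
```

A parser normally needs context from the text before the visible region, so whether a parsing library can serve requests shaped like this, without first parsing everything that precedes the window, is exactly the kind of question a practical design would have to answer.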

Thanks.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  8:55     ` Eli Zaretskii
@ 2020-01-04 12:50       ` VanL
  2020-01-04 13:22         ` arthur miller
  2020-01-04 13:30       ` arthur miller
  1 sibling, 1 reply; 76+ messages in thread
From: VanL @ 2020-01-04 12:50 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> E.g., step 1 is anathema to JIT font-lock: it would produce a long
> delay in displaying a file's buffer when the file is first visited.
> For example, think about visiting a large and complex source file such
> as xdisp.c: even if it takes tens of milliseconds to parse all of it,
> as some tree-sitter presentation claims, waiting for that long before
> we even start displaying the first window-full would be an annoyance.
> And that's even before we consider the time to compute all the face
> text properties from the syntax tree, something that will also take
> time.

Is it possible to phase out the C part of Emacs over the present decade
given [1] and can the approaches presented there foreshorten that wait
time?

Footnotes: 
[1]  C is not a low-level language
     https://queue.acm.org/detail.cfm?id=3212479




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 12:50       ` VanL
@ 2020-01-04 13:22         ` arthur miller
  0 siblings, 0 replies; 76+ messages in thread
From: arthur miller @ 2020-01-04 13:22 UTC (permalink / raw)
  To: VanL; +Cc: emacs-devel@gnu.org

VanL <van@scratch.space> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> E.g., step 1 is anathema to JIT font-lock: it would produce a long
>> delay in displaying a file's buffer when the file is first visited.
>> For example, think about visiting a large and complex source file such
>> as xdisp.c: even if it takes tens of milliseconds to parse all of it,
>> as some tree-sitter presentation claims, waiting for that long before
>> we even start displaying the first window-full would be an annoyance.
>> And that's even before we consider the time to compute all the face
>> text properties from the syntax tree, something that will also take
>> time.
>
> Is it possible to phase out the C part of Emacs over the present decade
> given [1] and can the approaches presented there foreshorten that wait
> time?
>
> Footnotes: 
> [1]  C is not a low-level language
>      https://queue.acm.org/detail.cfm?id=3212479

That was a long article. I don't believe C was chosen for being a
low-level language; more likely because at some point it was considered
"high-level" :-) Anyway, I have always wondered if Emacs could be
compiled as C++ code (with g++). It would make lots more code available
to be used in Emacs.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  8:55     ` Eli Zaretskii
  2020-01-04 12:50       ` VanL
@ 2020-01-04 13:30       ` arthur miller
  2020-01-04 13:42         ` Dmitry Gutov
  1 sibling, 1 reply; 76+ messages in thread
From: arthur miller @ 2020-01-04 13:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: HaiJun Zhang, emacs-devel@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 4 Jan 2020 12:57:24 +0800
>> From: HaiJun Zhang <netjune@outlook.com>
>> 
>> For font-lock, I think it may work with tree-sitter like this:
>> 1. After opening a file, parse the whole buffer with tree-sitter. We get a syntax tree from tree-sitter.
>> 2. Get the syntax nodes with ts_node_descendant_for_point_range and fontify the buffer text in the visible
>> region or whole buffer.
>> 3. After each modification of buffer text, make a copy of the syntax tree as the new one. Update the new one
>> with the modification.
>> 4. Get the changed range and changed nodes list by comparing the old and new syntax trees. Then free the
>> old syntax tree.
>> 5. Update the text properties in the changed range.
>> 6. Goto 3
>
> I encourage you to study how JIT font lock works in Emacs, and in
> particular how it plugs itself into the display engine.  Because using
> any incremental parsing technology for font-lock needs a good
> understanding of how font-lock is typically used in Emacs, and any
> practical suggestions for integration and interfaces must take that
> into consideration.
>
> E.g., step 1 is anathema to JIT font-lock: it would produce a long
> delay in displaying a file's buffer when the file is first visited.
> For example, think about visiting a large and complex source file such
> as xdisp.c: even if it takes tens of milliseconds to parse all of it,
> as some tree-sitter presentation claims, waiting for that long before
> we even start displaying the first window-full would be an annoyance.
> And that's even before we consider the time to compute all the face
> text properties from the syntax tree, something that will also take
> time.
>
> Thanks.

Do it in a thread and display the file initially without syntax coloring,
and then gradually display results as the tree-sitter thread works its
way through?

Maybe start with the displayed portion of the file only. Then, since
tree-sitter accepts changes as fine-grained as the character level, send
new lines for syntax coloring as they are scrolled up or down.

Or another strategy could be to have the tree-sitter thread highlight the
visible portion of the file and continue to work on the non-visible
part of the buffer in the background?

Could something like that work?



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 13:30       ` arthur miller
@ 2020-01-04 13:42         ` Dmitry Gutov
  0 siblings, 0 replies; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-04 13:42 UTC (permalink / raw)
  To: arthur miller, Eli Zaretskii; +Cc: HaiJun Zhang, emacs-devel@gnu.org

On 04.01.2020 15:30, arthur miller wrote:

> Do it in a thread and display the file initially without syntax coloring,
> and then gradually display results as the tree-sitter thread works its
> way through?

Syntax info is needed not only for coloring, and often on-demand. So a 
synchronous API will be needed.

> Maybe start with the displayed portion of the file only. Then, since
> tree-sitter accepts changes as fine-grained as the character level, send
> new lines for syntax coloring as they are scrolled up or down.

Like Eli said, see how JIT-lock works. If tree-sitter can support this 
usage, very good.

> Or another strategy could be to have the tree-sitter thread highlight the
> visible portion of the file and continue to work on the non-visible
> part of the buffer in the background?

That seems kinda wasteful.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  8:45       ` Eli Zaretskii
@ 2020-01-04 14:05         ` arthur miller
  2020-01-04 19:26         ` Stephen Leake
  1 sibling, 0 replies; 76+ messages in thread
From: arthur miller @ 2020-01-04 14:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stephen Leake, emacs-devel@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Fri, 03 Jan 2020 15:53:45 -0800
>> 
>> The interface should look like LSP; it aims to support everything an IDE
>> needs from a "language server" (ie parser), and allows for custom
>> extensions where it falls short.
>
> Maybe I'm the odd one out, but I don't think I have a clear idea of
> what the "LSP interface" entails.  Would you (or someone else) post a
> summary, or point to some place where this is described succinctly
> enough to not require a long study?
>
> We did learn one important thing from using LSP servers: that
> processing the JSON stuff back and forth adds non-trivial overhead and
> slows down the application enough to annoy users, even after we did
> all we can to speed up the translation.  So I think it makes sense to
> take one more look at the issue and see if we can come up with better
> interfaces, which will suit Emacs applications better and allow faster
> processing.  Using a library that processes stuff locally would then
> allow us to implement such interfaces more easily, since we will be
> free from the restrictions imposed by the need to communicate with
> external processes.

Personally I dislike the idea that they used json at all, but lots of
tools work now with that json.

Maybe the slowness comes from the overhead of communicating that JSON?
Could it help if Emacs provided a way to put those JSON files into
shared memory, so that both server and client can use the same file
without sending it between processes and keeping two copies of
both the source and the analysis in RAM?  Maybe the server could use the
Emacs buffer directly instead of sending the source file to an external
process.  But yes, generally I also think that JSON is a horrible
interchange format, but I can understand why MS chose it for their
tool.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
                   ` (4 preceding siblings ...)
       [not found] ` <41b3e9a0-2866-4692-a35c-6d9541bc3aaa@Spark>
@ 2020-01-04 14:46 ` arthur miller
  2020-01-05 14:50   ` Alan Third
  2020-01-09 21:56   ` Dmitry Gutov
  2020-01-04 20:26 ` Yuan Fu
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 76+ messages in thread
From: arthur miller @ 2020-01-04 14:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
>
>   https://tree-sitter.github.io/tree-sitter/
>
> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-pps-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.
>
> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries.  Posting practical ideas for design
> of all that would be a good first step in this promising direction.
> Bonus points for providing code patches that demonstrate the
> implementation of these ideas.
>
> TIA

There is a very good presentation of tree-sitter on YT by its author:

https://www.youtube.com/watch?v=Jes3bD6P0To

It gives a much better picture than the one I got from just reading the
website.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  8:45       ` Eli Zaretskii
  2020-01-04 14:05         ` arthur miller
@ 2020-01-04 19:26         ` Stephen Leake
  2020-01-04 19:54           ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: Stephen Leake @ 2020-01-04 19:26 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Fri, 03 Jan 2020 15:53:45 -0800
>> 
>> The interface should look like LSP; it aims to support everything an IDE
>> needs from a "language server" (ie parser), and allows for custom
>> extensions where it falls short.
>
> Maybe I'm the odd one out, but I don't think I have a clear idea of
> what the "LSP interface" entails.  Would you (or someone else) post a
> summary, or point to some place where this is described succinctly
> enough to not require a long study?

The full description is at
https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/
However, that document apparently only describes commands sent to the
server, not the responses sent from the server.

My attempt at a summary, in the form of a description of how LSP is used
in a typical editing session:

User visits a file whose major mode supports LSP. Emacs starts or
connects to a language server for that language (this can be customized
in eglot to be per-project, and in other ways).

Emacs sends the entire file contents to the server. For every edit the
user makes after that, the edit is sent to the server; the message
contains deleted and inserted text. It is up to Emacs how much
insert/delete to include in each message to the server; I assume it is
not every character. Sending that message from after-change-hook would
be a natural choice, but it might be better to cache the information in
order to send fewer messages.

When font-lock is triggered, Emacs sends a request for formatting
a range to the server (LSP command ‘textDocument/rangeFormatting’); the
server sends back new text for that range, with proper indentation and
capitalization. I assume it also supports faces via some markup in the
JSON, but I have not seen that in the docs.

Similarly, when the user requests indentation (via TAB or some other
command), a format request is sent.

When the user starts typing a function call (or otherwise requests
completion), a textDocument/completion request is sent to the server; it
responds with the possible completions of the function name, and then
the parameter list.
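
For readers who have not seen the wire format, here is a rough sketch in Python of what some of these messages look like. The method names and parameter shapes follow my reading of the LSP specification, and the framing is the Content-Length header used by the base protocol; the file URI, the Ada text, and the helper names are made up for illustration. It only builds the client-side messages and does not talk to a real server.

```python
import json


def frame(payload):
    """Wrap a JSON-RPC message in the LSP base-protocol header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def notification(method, params):
    """A message that expects no response."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def request(req_id, method, params):
    """A message whose response carries the same id."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


URI = "file:///tmp/example.adb"  # hypothetical file

# Visiting the file: the whole text is sent once.
did_open = notification("textDocument/didOpen", {
    "textDocument": {
        "uri": URI,
        "languageId": "ada",
        "version": 1,
        "text": "procedure Example is\nbegin\n   null;\nend Example;\n",
    },
})

# An edit: replace "null" on line 2 (0-based) with "return".
did_change = notification("textDocument/didChange", {
    "textDocument": {"uri": URI, "version": 2},
    "contentChanges": [{
        "range": {"start": {"line": 2, "character": 3},
                  "end": {"line": 2, "character": 7}},
        "text": "return",
    }],
})

# An indentation request for a range of lines.
range_formatting = request(1, "textDocument/rangeFormatting", {
    "textDocument": {"uri": URI},
    "range": {"start": {"line": 0, "character": 0},
              "end": {"line": 4, "character": 0}},
    "options": {"tabSize": 3, "insertSpaces": True},
})

# A completion request at a position.
completion = request(2, "textDocument/completion", {
    "textDocument": {"uri": URI},
    "position": {"line": 2, "character": 9},
})

for message in (did_open, did_change, range_formatting, completion):
    print(frame(message).decode("utf-8"))
    print()
```

Each edit is a small message like did_change, which is why the question of how many edits to batch per message matters for speed.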

> We did learn one important thing from using LSP servers: that
> processing the JSON stuff back and forth adds non-trivial overhead and
> slows down the application enough to annoy users, even after we did
> all we can to speed up the translation.  

Ok. I did not follow that in detail. Do we have any speed comparisons
with other editors?

> So I think it makes sense to take one more look at the issue and see
> if we can come up with better interfaces, which will suit Emacs
> applications better and allow faster processing. 

There is always a tradeoff between speed and flexibility. The ada-mode
interface to the external process is highly optimized to do exactly what
ada-mode currently needs, and is very fast. But it is also brittle;
adding new features may require large changes, and causes version
incompatibility. LSP is much more flexible, allowing expansion to new
features easily, and allowing feature negotiation.

Other editors seem to cope well with the json approach, so it should be
possible for Emacs as well.

> Using a library that processes stuff locally would then allow us to
> implement such interfaces more easily, since we will be free from the
> restrictions imposed by the need to communicate with external
> processes.

I gather you are suggesting that the language server could be an Emacs
module (or even an elisp package), with function calls for the various
features. That is certainly possible, but loses the ability to use any
server developed external to the Emacs project.

It might be possible to refactor some servers to work that way
(replacing the json interface with a direct function call interface),
but it would be a lot of work.

> We'll most probably want some combination of LSP-based and local
> parsers-based features.  E.g., it's quite possible that LSP servers
> could be better for some complex jobs, where speed matters less.
>
> My point is that we shouldn't lock up our minds, not yet anyway.  A
> fresh look at these issues, taking the incremental parsing into
> account, could benefit us in the long run.

Ok.

I will work on adding LSP support for ada-mode (reusing eglot and/or
lsp-mode), and see what might be done about the speed issues. I need to
do that anyway to support a customer request.

I can also look at moving the current Ada parser into an Emacs module,
to see if that helps with speed.

>> LSP language servers are implemented in some compiled language, not
>> elisp; eglot/lsp-mode is just the elisp side of the protocol. The elisp
>> sends edits and info requests (ie, "insert/delete this text at this
>> point", "fontify/format this range") to the server, and handles the
>> responses.
>
> I'm saying we should look into this and see whether there are better
> ways than that.  Suppose such a server had direct access to buffer
> text: would that allow a more efficient interface than the above?  

No; lexing the actual text is not where the time is spent.

> We should definitely support LSP.  We already do, albeit in
> third-party packages.  We added native JSON support and jsonrpc for
> doing this better.  If there's anything else we can do in that
> direction, people should speak up.

Ok.

> But my point is that LSP is not necessarily the only game in town we
> should support.  For example, font-lock doesn't use LSP, and probably
> never will, due to performance issues; 

Ada-mode uses the external process to compute faces for identifiers.
That works well, although I do (setq jit-lock-defer-time 1.0) so it only
fontifies when I pause typing; otherwise there can be an annoying delay
after each character.
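
The effect of that setting can be modelled roughly like this. It is a Python sketch of the deferral idea only, not how jit-lock implements it: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and position shifts caused by edits are ignored.

```python
class DeferredFontifier:
    """Collect edited ranges and fontify them only after an idle period."""

    def __init__(self, defer_time, fontify_region):
        self.defer_time = defer_time          # seconds of idleness required
        self.fontify_region = fontify_region  # callback(start, end), hypothetical
        self.pending = None                   # union of ranges not yet fontified
        self.last_change = None               # time of the most recent edit

    def on_change(self, start, end, now):
        """Record an edit; no fontification happens here."""
        if self.pending is None:
            self.pending = (start, end)
        else:
            self.pending = (min(self.pending[0], start),
                            max(self.pending[1], end))
        self.last_change = now

    def on_idle_check(self, now):
        """Fontify the pending range once the user has paused long enough."""
        if self.pending is None or now - self.last_change < self.defer_time:
            return False
        start, end = self.pending
        self.pending = None
        self.fontify_region(start, end)
        return True


calls = []
deferred = DeferredFontifier(1.0, lambda s, e: calls.append((s, e)))

deferred.on_change(10, 11, now=0.0)    # user types a character
deferred.on_change(11, 12, now=0.2)    # and another one shortly after
early = deferred.on_idle_check(now=0.5)  # still typing: nothing is parsed
late = deferred.on_idle_check(now=1.3)   # paused for more than 1.0 s: one parse
print(early, late, calls)
```

Two keystrokes lead to a single request covering both, which is the trade being made: fewer parses in exchange for faces that lag behind the typing.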

However, doing correct font-lock for Ada without a parser is pretty much
impossible (on anything more than language keywords), and there is very
little that can be done to speed up the parsing. Migrating the parser
into a module might help, but only a little. Adding a json interface
would slow it down, of course.

> should we improve font-lock using infrastructure that's based on
> language parsing? 

ada-mode builds on the current font-lock infrastructure; the font-lock
timer triggers a parse on a range, and the parse actions set
font-lock-face text properties.

> And there are other features that could benefit, I've mentioned them.
> If you are saying they all should just use LSP, then I don't think I
> agree.

I'm saying they all could use LSP in principle, but I have not had any
experience actually doing that, so it may not work very well in
practice.

I don't think you are objecting to LSP in principle, but do have a
problem with the speed penalty due to using JSON. Since other editors
are succeeding with that, perhaps there is more Emacs could do here.

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 19:26         ` Stephen Leake
@ 2020-01-04 19:54           ` Eli Zaretskii
  2020-01-05 17:05             ` Stephen Leake
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-04 19:54 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Sat, 04 Jan 2020 11:26:38 -0800
> 
> My attempt at a summary, in the form of a description of how LSP is used
> in a typical editing session:

Thanks.

> > We did learn one important thing from using LSP servers: that
> > processing the JSON stuff back and forth adds non-trivial overhead and
> > slows down the application enough to annoy users, even after we did
> > all we can to speed up the translation.  
> 
> Ok. I did not follow that in detail. Do we have any speed comparisons
> with other editors?

I didn't see any.

> > I'm saying we should look into this and see whether there are better
> > ways than that.  Suppose such a server had direct access to buffer
> > text: would that allow a more efficient interface than the above?  
> 
> No; lexing the actual text is not where the time is spent.

The way we currently communicate with servers is to make a buffer
substring and encode it, which in itself is an overhead.  And then
JSON adds to that.
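
To make the two costs concrete, here is a minimal Python sketch of what
one small edit looks like on the wire. It assumes the LSP base protocol
framing (a Content-Length header followed by a UTF-8 JSON body) and the
textDocument/didChange notification; the helper name frame_did_change,
the file URI and the edit itself are made up for illustration, and this
is not a description of how eglot or lsp-mode are implemented.

```python
import json

def frame_did_change(uri, version, start, end, new_text):
    """Frame one incremental edit as an LSP textDocument/didChange message.

    start and end are zero-based (line, character) pairs.
    """
    body = {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [{
                "range": {
                    "start": {"line": start[0], "character": start[1]},
                    "end": {"line": end[0], "character": end[1]},
                },
                "text": new_text,
            }],
        },
    }
    # Cost 1: serialize to JSON.  Cost 2: encode the result to UTF-8 bytes.
    payload = json.dumps(body).encode("utf-8")
    # The header counts the bytes of the body, not its characters.
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload

# Hypothetical edit: insert one line at the top of an Ada file.
inserted = "with Ada.Text_IO;\n"
message = frame_did_change("file:///tmp/example.adb", 2, (0, 0), (0, 0), inserted)
print(f"{len(message)} bytes on the wire for {len(inserted)} inserted characters")
```

The envelope is several times larger than the inserted text, and it is
built and encoded anew for every change.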

> ada-mode builds on the current font-lock infrastructure; the font-lock
> timer triggers a parse on a range, and the parse actions set
> font-lock-face text properties.

Font-lock by default doesn't use any timers, so what you do in Ada
mode is not exactly how Emacs fontifies buffers.  Or at least it
sounds like that.

> I don't think you are objecting to LSP in principle, but do have a
> problem with the speed penalty due to using JSON.

Using JSON is one thing; talking to an external program is another.

> Since other editors are succeeding with that, perhaps there is more
> Emacs could do here.

Other editors don't need to go through Lisp, so they can do more stuff
faster, and also off-load some of the work to threads.  We need to
find our own ways of being efficient, which might be different from
what other editors do.  It's like with the bidi support: I needed to
write an entirely different implementation of the UBA to make it fit
into the Emacs display engine's design.  No other implementation of
the UBA I know of works like our bidi.c and presents interfaces like
it does.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
                   ` (5 preceding siblings ...)
  2020-01-04 14:46 ` arthur miller
@ 2020-01-04 20:26 ` Yuan Fu
  2020-01-04 20:43 ` Stefan Monnier
  2020-01-06 16:14 ` Anand Tamariya
  8 siblings, 0 replies; 76+ messages in thread
From: Yuan Fu @ 2020-01-04 20:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-ppss-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.

Another possibility is modular editing: syntax-aware word jumps (M-f/b/d), “free” expand-regions by parse tree, etc. I’m not sure about font-locking, but these types of operations need only a parse tree. So I think it suffices to simply provide this parse tree and let packages use it like (syntax-ppss): provide functions that return the syntax object at point (cl-struct?), from which other functions can extract information such as the positions of its beginning and end, its type, the previous/next object, the nesting level, etc.
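
As a rough illustration of such a "syntax object at point" query, here is a minimal Python sketch. The Node class, the node_at function and the hand-built tree are all hypothetical; positions are zero-based with an exclusive end, which differs from Emacs buffer positions, and a real parse tree would come from the parser rather than being written by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    kind: str
    start: int          # inclusive position
    end: int            # exclusive position
    children: List["Node"] = field(default_factory=list)

def node_at(root: Node, pos: int) -> Optional[Tuple[Node, int]]:
    """Return (deepest node containing pos, nesting level), or None."""
    if not (root.start <= pos < root.end):
        return None
    node, level = root, 0
    while True:
        for child in node.children:
            if child.start <= pos < child.end:
                node, level = child, level + 1
                break
        else:
            # No child contains pos, so this node is the deepest one.
            return node, level

# Hand-built tree for the text "(foo (bar))".
tree = Node("list", 0, 11, [
    Node("symbol", 1, 4),
    Node("list", 5, 10, [Node("symbol", 6, 9)]),
])
found = node_at(tree, 7)
print(found)
```

A syntax-aware word jump or expand-region would then need only the start, end and kind fields of the returned node and of its neighbours.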

I’m not sure how refactoring could work (accurately), though. Maybe it’s better to leave refactoring to LSP servers. In general, I think we should leave tasks that require a deeper understanding of the semantics of the language to LSP.

Yuan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
                   ` (6 preceding siblings ...)
  2020-01-04 20:26 ` Yuan Fu
@ 2020-01-04 20:43 ` Stefan Monnier
  2020-01-05 14:19   ` Alan Third
  2020-01-06 16:14 ` Anand Tamariya
  8 siblings, 1 reply; 76+ messages in thread
From: Stefan Monnier @ 2020-01-04 20:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

I'm pretty far behind in my backlog, so can't say much yet, but IMO the
design of some "next generation font-lock / syntax-ppss / indentation /
navigation" infrastructure should start by considering the use of
multiple CPUs.  That's actually one of the benefits of the LSP approach ;-)


        Stefan


Eli Zaretskii [2020-01-03 12:05:02] wrote:

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?  One package
> that implements this technology is tree-sitter:
>
>   https://tree-sitter.github.io/tree-sitter/
>
> AFAIU, these capabilities could be used as an alternative to
> regexp- and syntax-ppss-based font-lock, better code folding,
> completion, refactoring, and other similar features; in general, any
> feature which would benefit from having a parse tree for the source
> code in a buffer.
>
> To be able to use such libraries, we need to figure out how to
> integrate them into the core, what kind of interfaces would be needed
> for that, and what kind of infrastructure we would need for basing
> Lisp features on those libraries.  Posting practical ideas for design
> of all that would be a good first step in this promising direction.
> Bonus points for providing code patches that demonstrate the
> implementation of these ideas.
>
> TIA




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04  3:25           ` Using incremental parsing in Emacs HaiJun Zhang
  2020-01-04  5:21             ` Tobias Bading
@ 2020-01-04 23:48             ` Richard Stallman
  2020-01-05  3:36               ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: Richard Stallman @ 2020-01-04 23:48 UTC (permalink / raw)
  To: HaiJun Zhang; +Cc: eliz, emacs-devel, arthur.miller, phillip.lord

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

What is VSCode, and how does it relate to Emacs development?

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 23:48             ` Richard Stallman
@ 2020-01-05  3:36               ` Eli Zaretskii
  0 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05  3:36 UTC (permalink / raw)
  To: rms; +Cc: netjune, phillip.lord, arthur.miller, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 04 Jan 2020 18:48:48 -0500
> Cc: eliz@gnu.org, emacs-devel@gnu.org, arthur.miller@live.com,
>  phillip.lord@russet.org.uk
> 
> What is VSCode, and how does it relate to Emacs development?

It's another programming editor, developed by Microsoft.

  https://en.wikipedia.org/wiki/Visual_Studio_Code




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 20:43 ` Stefan Monnier
@ 2020-01-05 14:19   ` Alan Third
  2020-01-05 17:07     ` Stephen Leake
  2020-01-05 17:09     ` Stefan Monnier
  0 siblings, 2 replies; 76+ messages in thread
From: Alan Third @ 2020-01-05 14:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

On Sat, Jan 04, 2020 at 03:43:13PM -0500, Stefan Monnier wrote:
> I'm pretty far behind in my backlog, so can't say much yet, but IMO the
> design of some "next generation font-lock / syntax-ppss / indentation /
> navigation" infrastructure should start by considering the use of
> multiple CPUs.  That's actually one of the benefits of the LSP approach ;-)

My understanding is that tree sitter also supports parallel parsing of
a single file.
-- 
Alan Third



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 14:46 ` arthur miller
@ 2020-01-05 14:50   ` Alan Third
  2020-01-05 15:16     ` arthur miller
                       ` (3 more replies)
  2020-01-09 21:56   ` Dmitry Gutov
  1 sibling, 4 replies; 76+ messages in thread
From: Alan Third @ 2020-01-05 14:50 UTC (permalink / raw)
  To: arthur miller; +Cc: Eli Zaretskii, emacs-devel@gnu.org

On Sat, Jan 04, 2020 at 02:46:14PM +0000, arthur miller wrote:
> 
> There is a very good presentation of tree-sitter on YT by its author:
> 
> https://www.youtube.com/watch?v=Jes3bD6P0To
> 
> Looks much better than the picture I got by just reading the
> website:

I watched this video and it looks to me like tree sitter is trying to
solve a fundamentally different problem than LSP servers.

Most of the conversation in this thread seems to make the assumption
that tree sitter and LSP are mutually exclusive, which is clearly not
true.

Tree sitter provides a fast and accurate parse tree to the editor
which it can then use for syntax highlighting, moving by syntax
element, code folding, expand region, etc.

LSP does not provide that and possibly never will:

https://github.com/Microsoft/language-server-protocol/issues/682#issuecomment-486676262

Tree sitter does not provide error checking (although it does some
basic syntax checking simply through being a parser), language aware
refactoring, completion, etc. Things that LSP DOES provide.

One of the things that interested me the most in that presentation was
the discussion of syntax highlighting on very long lines. Perhaps it
couldn’t help Emacs, but it certainly made me think.
-- 
Alan Third

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:50   ` Alan Third
@ 2020-01-05 15:16     ` arthur miller
  2020-01-05 15:29     ` Eli Zaretskii
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: arthur miller @ 2020-01-05 15:16 UTC (permalink / raw)
  To: Alan Third; +Cc: Eli Zaretskii, emacs-devel@gnu.org

Alan Third <alan@idiocy.org> writes:

> On Sat, Jan 04, 2020 at 02:46:14PM +0000, arthur miller wrote:
>> 
>> There is a very good presentation of tree-sitter on YT by its author:
>> 
>> https://www.youtube.com/watch?v=Jes3bD6P0To
>> 
>> Looks much better than the picture I got by just reading the
>> website:
>
> I watched this video and it looks to me like tree sitter is trying to
> solve a fundamentally different problem than LSP servers.
>
> Most of the conversation in this thread seems to make the assumption
> that tree sitter and LSP are mutually exclusive, which is clearly not
> true.
>
> Tree sitter provides a fast and accurate parse tree to the editor
> which it can then use for syntax highlighting, moving by syntax
> element, code folding, expand region, etc.
>
> LSP does not provide that and possibly never will:
>
> https://github.com/Microsoft/language-server-protocol/issues/682#issuecomment-486676262
>
> Tree sitter does not provide error checking (although it does some
> basic syntax checking simply through being a parser), language aware
> refactoring, completion, etc. Things that LSP DOES provide.
>
> One of the things that interested me the most in that presentation was
> the discussion of syntax highlighting on very long lines. Perhaps it
> couldn’t help Emacs, but it certainly made me think.

Yesterday evening, after thinking for a while, I also came to the
conclusion that tree-sitter provides just syntax coloring. However, it
does provide an AST, so the question is whether that AST can be used for
more than just syntax coloring.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:50   ` Alan Third
  2020-01-05 15:16     ` arthur miller
@ 2020-01-05 15:29     ` Eli Zaretskii
  2020-01-05 15:31     ` Eli Zaretskii
  2020-01-05 17:11     ` Stephen Leake
  3 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05 15:29 UTC (permalink / raw)
  To: Alan Third; +Cc: arthur.miller, emacs-devel

> Date: Sun, 5 Jan 2020 14:50:02 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,
> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> Most of the conversation in this thread seems to make the assumption
> that tree sitter and LSP are mutually exclusive, which is clearly not
> true.

Definitely not true.  We should consider both ways complementary, and
provide support for both of them.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:50   ` Alan Third
  2020-01-05 15:16     ` arthur miller
  2020-01-05 15:29     ` Eli Zaretskii
@ 2020-01-05 15:31     ` Eli Zaretskii
  2020-01-05 17:11     ` Stephen Leake
  3 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05 15:31 UTC (permalink / raw)
  To: Alan Third; +Cc: arthur.miller, emacs-devel

> Date: Sun, 5 Jan 2020 14:50:02 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,
> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> One of the things that interested me the most in that presentation was
> the discussion of syntax highlighting on very long lines. Perhaps it
> couldn’t help Emacs, but it certainly made me think.

Btw, all the examples they show of how "traditional" regexp-based
syntax highlighting doesn't work well, do work well in Emacs.  I guess
it isn't a coincidence that Emacs's highlighting was not shown there.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 19:54           ` Eli Zaretskii
@ 2020-01-05 17:05             ` Stephen Leake
  2020-01-05 19:14               ` yyoncho
  0 siblings, 1 reply; 76+ messages in thread
From: Stephen Leake @ 2020-01-05 17:05 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Sat, 04 Jan 2020 11:26:38 -0800
>>
>> > I'm saying we should look into this and see whether there are better
>> > ways than that.  Suppose such a server had direct access to buffer
>> > text: would that allow a more efficient interface than the above?
>>
>> No; lexing the actual text is not where the time is spent.
>
> the way we currently communicate with servers is to make a buffer
> substring and encode it, which in itself is an overhead.  And then
> JSON adds to that.

Right. The encode step is done when communicating with modules as well,
because the internal encoding is not exactly utf-8.

>> ada-mode builds on the current font-lock infrastructure; the font-lock
>> timer triggers a parse on a range, and the parse actions set
>> font-lock-face text properties.
>
> Font-lock by default doesn't use any timers,

It does if you set jit-lock-defer-time, which I do (in my ~/.emacs, not in
ada-mode).

>> I don't think you are objecting to LSP in principle, but do have a
>> problem with the speed penalty due to using JSON.
>
> Using JSON is one thing; talking to an external program is another.
>
>> Since other editors are succeeding with that, perhaps there is more
>> Emacs could do here.
>
> Other editors don't need to go through Lisp, so they can do more stuff
> faster, and also off-load some of the work to threads.  We need to
> find our own ways of being efficient, which might be different from
> what other editors do.

Ok.

I've configured eglot and ada_language_server (from AdaCore); it works,
but doesn't do everything ada-mode needs (yet; AdaCore labels this a
work in progress).

In particular, LSP does not currently support semantic highlighting (ie
font-lock). There is a proposal to add that
(https://github.com/microsoft/language-server-protocol/issues/18); it's
not at all clear when/if that will make it into the standard.

It appears LSP is not actually a "standard", just something Microsoft is
making available; it is totally controlled by Microsoft. I'd be happier
if they turned it over to some standards org (FSF, ISO, IEEE, ...).

So far, I have not noticed a speed problem (even on very large files),
but I haven't really pushed it yet.

ada_language_server does one thing better than current ada-mode; provide
cross reference information. Currently ada-mode uses some files output
by the compiler to get cross-references; processing that information is
noticeably slow, and the information gets out of date, so the user must
request a cache refresh. ada_language_server provides cross references
directly from the parsed sources, with no caching issues. So I'll work
on adding that as an alternative xref backend for ada-mode.

--
-- Stephe

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:19   ` Alan Third
@ 2020-01-05 17:07     ` Stephen Leake
  2020-01-05 19:16       ` Alan Third
  2020-01-05 17:09     ` Stefan Monnier
  1 sibling, 1 reply; 76+ messages in thread
From: Stephen Leake @ 2020-01-05 17:07 UTC (permalink / raw)
  To: emacs-devel

Alan Third <alan@idiocy.org> writes:

> On Sat, Jan 04, 2020 at 03:43:13PM -0500, Stefan Monnier wrote:
>> I'm pretty far behind in my backlog, so can't say much yet, but IMO the
>> design of some "next generation font-lock / syntax-ppss / indentation /
>> navigation" infrastructure should start by considering the use of
>> multiple CPUs.  That's actually one of the benefits of the LSP approach ;-)
>
> My understanding is that tree sitter also supports parallel parsing of
> a single file.

Can you point to some literature on this? I've never heard of parallel
parsing (other than a generalized LR parser, which is _not_ a speed up :).


-- 
-- Stephe



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:19   ` Alan Third
  2020-01-05 17:07     ` Stephen Leake
@ 2020-01-05 17:09     ` Stefan Monnier
  2020-01-05 18:22       ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Monnier @ 2020-01-05 17:09 UTC (permalink / raw)
  To: Alan Third; +Cc: Eli Zaretskii, emacs-devel

>> I'm pretty far behind in my backlog, so can't say much yet, but IMO the
>> design of some "next generation font-lock / syntax-ppss / indentation /
>> navigation" infrastructure should start by considering the use of
>> multiple CPUs.  That's actually one of the benefits of the LSP approach ;-)
>
> My understanding is that tree sitter also supports parallel parsing of
> a single file.

Such parallelism is great, but just to clarify I was more thinking about
having the new system work concurrently with the rest of Elisp.
I.e. have one CPU run Elisp while other CPU(s) do the syntax processing,
whereas a naive use of tree sitter's parallelism would end up with
either Elisp running alone or tree-sitter using several CPUs but never
both at the same time.

        Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 14:50   ` Alan Third
                       ` (2 preceding siblings ...)
  2020-01-05 15:31     ` Eli Zaretskii
@ 2020-01-05 17:11     ` Stephen Leake
  3 siblings, 0 replies; 76+ messages in thread
From: Stephen Leake @ 2020-01-05 17:11 UTC (permalink / raw)
  To: emacs-devel

Alan Third <alan@idiocy.org> writes:

> On Sat, Jan 04, 2020 at 02:46:14PM +0000, arthur miller wrote:
>> 
>> There is a very good presentation of tree-sitter on YT by its author:
>> 
>> https://www.youtube.com/watch?v=Jes3bD6P0To
>> 
>> Looks much better than the picture I got by just reading the
>> website:
>
> I watched this video and it looks to me like tree sitter is trying to
> solve a fundamentally different problem than LSP servers.
>
> Most of the conversation in this thread seems to make the assumption
> that tree sitter and LSP are mutually exclusive, which is clearly not
> true.

Tree sitter could be the parser inside an LSP server.

Which was my first point; LSP servers can take advantage of advanced/new
parsing technology, and they immediately become available to Emacs.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 17:09     ` Stefan Monnier
@ 2020-01-05 18:22       ` Eli Zaretskii
  2020-01-05 19:18         ` Stefan Monnier
  2020-01-05 19:23         ` arthur miller
  0 siblings, 2 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05 18:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: alan, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Sun, 05 Jan 2020 12:09:08 -0500
> 
> I was more thinking about having the new system work concurrently
> with the rest of Elisp.  I.e. have one CPU run Elisp while other
> CPU(s) do the syntax processing,

How do you give the syntax-processing thread access to buffer text, if
it's running asynchronously?



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 17:05             ` Stephen Leake
@ 2020-01-05 19:14               ` yyoncho
  0 siblings, 0 replies; 76+ messages in thread
From: yyoncho @ 2020-01-05 19:14 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On Sun, Jan 5, 2020 at 7:05 PM Stephen Leake <stephen_leake@stephe-leake.org>
wrote:

> It appears LSP is not actually a "standard", just something Microsoft is
> making available; it is totally controlled by Microsoft. I'd be happier
> if they turned it over to some standards org (FSF, ISO, IEEE, ...).
>

The funniest part is that the developers of the server most of the time do
not care much about the standard but only whether it works with vscode. To
give you an example, when receiving the completion results the client is
supposed to calculate the prefix and filter against it. In html file if you
have <foo it will calculate the prefix foo but if you are in xml file and
you want to complete <foo the prefix to match against is <foo. So, if we
want to work against these servers we have to implement the vscode's
bizarre behaviour.
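
A small Python sketch of the client-side filtering described above. The
rule it encodes is taken only from the html/xml example in this message;
the function name completion_prefix and the candidate lists are invented
for illustration, and real servers may well differ in other ways.

```python
import re

def completion_prefix(text_before_point, language):
    """Return the text a client would filter completion candidates against."""
    # Assumed rule: in xml the leading "<" is part of the prefix, in html it is not.
    pattern = r"<?\w*$" if language == "xml" else r"\w*$"
    return re.search(pattern, text_before_point).group(0)

def filter_candidates(candidates, text_before_point, language):
    prefix = completion_prefix(text_before_point, language)
    return [c for c in candidates if c.startswith(prefix)]

# The same typed text needs a different prefix depending on the language.
print(filter_candidates(["foobar", "form"], "<foo", "html"))
print(filter_candidates(["<foobar", "<form"], "<foo", "xml"))
```

A client that applied the html rule to the xml candidates would filter
every one of them out, since none starts with a bare "foo".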

[-- Attachment #2: Type: text/html, Size: 1234 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 17:07     ` Stephen Leake
@ 2020-01-05 19:16       ` Alan Third
  0 siblings, 0 replies; 76+ messages in thread
From: Alan Third @ 2020-01-05 19:16 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

On Sun, Jan 05, 2020 at 09:07:40AM -0800, Stephen Leake wrote:
> Alan Third <alan@idiocy.org> writes:
> 
> > On Sat, Jan 04, 2020 at 03:43:13PM -0500, Stefan Monnier wrote:
> >> I'm pretty far behind in my backlog, so can't say much yet, but IMO the
> >> design of some "next generation font-lock / syntax-ppss / indentation /
> >> navigation" infrastructure should start by considering the use of
> >> multiple CPUs.  That's actually one of the benefits of the LSP approach ;-)
> >
> > My understanding is that tree sitter also supports parallel parsing of
> > a single file.
> 
> Can you point to some literature on this? I've never heard of parallel
> parsing (other than a generalized LR parser, which is _not_ a speed up :).

I think you’re right and I’ve misunderstood a part of the video. It’s
a generalized LR parser, and it won’t be faster, just handles
ambiguous code better.
-- 
Alan Third



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 18:22       ` Eli Zaretskii
@ 2020-01-05 19:18         ` Stefan Monnier
  2020-01-05 19:36           ` Eli Zaretskii
  2020-01-05 19:23         ` arthur miller
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Monnier @ 2020-01-05 19:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: alan, emacs-devel

>> I was more thinking about having the new system work concurrently
>> with the rest of Elisp.  I.e. have one CPU run Elisp while other
>> CPU(s) do the syntax processing,
>
> How do you give the syntax-processing thread access to buffer text, if
> it's running asynchronously?

That's indeed one of the questions that I think we should answer as part
of such a new design.

In the case of LSP this is done by maintaining a copy of our buffer's
content in the LSP server process.
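
A minimal Python sketch of what maintaining such a copy amounts to on
the server side. The class BufferMirror and its flat character offsets
are simplifications made up for this example; the actual protocol
addresses text by line and character and also allows sending the whole
document at once.

```python
class BufferMirror:
    """Server-side copy of one editor buffer, kept current by edit messages."""

    def __init__(self, text=""):
        self.text = text
        self.version = 0

    def apply_edit(self, start, end, new_text):
        """Replace the characters in [start, end) with new_text."""
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("edit range outside the mirrored text")
        self.text = self.text[:start] + new_text + self.text[end:]
        self.version += 1

mirror = BufferMirror("procedure Foo is\n")
mirror.apply_edit(10, 13, "Bar")                  # rename Foo to Bar
end = len(mirror.text)
mirror.apply_edit(end, end, "begin\n")            # pure insertion at the end
print(mirror.version, repr(mirror.text))
```

Every edit the editor makes has to be replayed here in the same order,
which is why the two copies stay equal only as long as no message is
lost or reordered.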


        Stefan




^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: Using incremental parsing in Emacs
  2020-01-05 18:22       ` Eli Zaretskii
  2020-01-05 19:18         ` Stefan Monnier
@ 2020-01-05 19:23         ` arthur miller
  2020-01-05 19:40           ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: arthur miller @ 2020-01-05 19:23 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: alan@idiocy.org, emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]

I think there are two use-cases:

1) batch processing, where the entire file (or its visible portion) is processed by different thread(s), for example on file opening or when some expansion takes place (yasnippet or similar).

This can be done by dividing the text into a number of blocks (lines or similar) and letting each thread match one block at a time against a shared pattern database.

2) interactive use, when the user is actively typing.

Interactive use is easy: just match the last word after certain delimiters are typed. This probably does not need threading, but can be done with multiple threads as well: multiple threads can match one word at a time against a pattern database split into blocks. Case 2 is probably not worth the round-trip time to the bus, but I don't know.

Just as a thought. Would it be possible?

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Eli Zaretskii <eliz@gnu.org>
Date: 2020-01-05 19:22 (GMT+01:00)
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: alan@idiocy.org, emacs-devel@gnu.org
Subject: Re: Using incremental parsing in Emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Sun, 05 Jan 2020 12:09:08 -0500
>
> I was more thinking about having the new system work concurrently
> with the rest of Elisp.  I.e. have one CPU run Elisp while other
> CPU(s) do the syntax processing,

How do you give the syntax-processing thread access to buffer text, if
it's running asynchronously?

[-- Attachment #2: Type: text/html, Size: 2667 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 19:18         ` Stefan Monnier
@ 2020-01-05 19:36           ` Eli Zaretskii
  2020-01-05 20:27             ` Stefan Monnier
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05 19:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: alan, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: alan@idiocy.org,  emacs-devel@gnu.org
> Date: Sun, 05 Jan 2020 14:18:55 -0500
> 
> > How do you give the syntax-processing thread access to buffer text, if
> > it's running asynchronously?
> 
> That's indeed one of the questions that I think we should answer as part
> of such a new design.
> 
> In the case of LSP this is done by maintaining a copy of our buffer's
> content in the LSP server process.

That's exactly the issue: if we need to make a copy just to run the
parser asynchronously, then there's no significant advantage in having
such asynchronous processing inside the Emacs process, we might as
well communicate to an external process and pass it that copy.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 19:23         ` arthur miller
@ 2020-01-05 19:40           ` Eli Zaretskii
  2020-01-05 20:28             ` arthur miller
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-05 19:40 UTC (permalink / raw)
  To: arthur miller; +Cc: alan, monnier, emacs-devel

> From: arthur miller <arthur.miller@live.com>
> CC: "alan@idiocy.org" <alan@idiocy.org>, "emacs-devel@gnu.org"
> 	<emacs-devel@gnu.org>
> Date: Sun, 5 Jan 2020 19:23:16 +0000
> 
> I think there are two use-cases:
> 
> 1) batch processing where entire file (or visible portion) is processed by different thread(s), for example on file
> opening or if some expansion takes place (yasnippet or similar).
> 
> This can be done by dividing text in number of blocks (lines or similar) and letting each thread match block
> at a time against shared pattern database.
> 
> 2) interactive use; when user is typing actively.

Maybe I'm missing something, but I don't see how any of this is
relevant to batch processing.  We never do anything in batch in an
interactive Emacs session, since the user is always there, waiting.
The display engine has many optimizations to eliminate the delays
caused by prolonged processing required to decide what should change
on the glass.  IOW, "interactive" doesn't just mean "typing", it can
mean any other command that changes what's on display, like scrolling.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 19:36           ` Eli Zaretskii
@ 2020-01-05 20:27             ` Stefan Monnier
  2020-01-05 21:12               ` yyoncho
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Monnier @ 2020-01-05 20:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: alan, emacs-devel

> That's exactly the issue: if we need to make a copy just to run the
> parser asynchronously, then there's no significant advantage in having
> such asynchronous processing inside the Emacs process,

There can still be advantages depending on many other details.

Another option is to give them direct access to the buffer, but only
allow read-only access and impose some synchronization between the
threads, e.g.: prepare_before_change could signal the concurrent
threads and wait for them to acknowledge that they can't look at the
buffer positions after START and then re-allow access past START (when we
finish the buffer modification or when we return to the command loop).

Similarly, when buffer relocation takes place, we'd first signal to
concurrent threads and wait for them to acknowledge that they've stopped
accessing the buffer's content, and later re-signal them to let them
know they can access it again.
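
The signal / acknowledge / re-allow handshake can be sketched in Python
as follows. This is only an illustration under simplifying assumptions:
the class BufferGate and its method names are invented, there is a
single modifying thread, and the gate blocks access to the whole buffer
rather than only to positions after START.

```python
import threading
from contextlib import contextmanager

class BufferGate:
    """Lets parser threads read the buffer except while the editor changes it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # parser threads currently inside reading()
        self._changing = False   # True between begin_change() and end_change()

    @contextmanager
    def reading(self):
        with self._cond:
            while self._changing:          # a change is pending or in progress
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()    # the acknowledgement the editor waits for

    def begin_change(self):
        with self._cond:
            self._changing = True          # signal: no new reads may start
            while self._readers > 0:       # wait until current readers have left
                self._cond.wait()

    def end_change(self):
        with self._cond:
            self._changing = False         # re-allow access
            self._cond.notify_all()

buffer = ["hello"]
gate = BufferGate()
seen = []

def parser():
    for _ in range(1000):
        with gate.reading():
            seen.append(buffer[0])         # text cannot change inside this block

worker = threading.Thread(target=parser)
worker.start()
for i in range(100):
    gate.begin_change()
    buffer[0] = f"hello {i}"
    gate.end_change()
worker.join()
print(len(seen), "reads completed; final text:", buffer[0])
```

The cost is visible in begin_change: the thread running Elisp has to
wait for every reader to leave before it may touch the buffer, so
readers must keep their critical sections short.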

        Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: Using incremental parsing in Emacs
  2020-01-05 19:40           ` Eli Zaretskii
@ 2020-01-05 20:28             ` arthur miller
  2020-01-06  3:42               ` Eli Zaretskii
  0 siblings, 1 reply; 76+ messages in thread
From: arthur miller @ 2020-01-05 20:28 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: alan@idiocy.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

When I said batch processing I meant processing a file or a chunk of a buffer (a region) just before it is displayed to the user. The same goes for other "insertions" from macro expansions or similar.

I didn't know what term to use to save typing; sorry if it was unclear what I meant. Scrolling would probably go into the same category too. I didn't mean batch processing as in calling Emacs from shell scripts :-). I hope that clarifies what I meant. If you have a better term I'm glad to use it :-)

By interactive I meant the user typing, but it includes all edits done by the user, whether by keyboard or mouse. The point was that for small edits, a word or two, there is probably more overhead in using threads than in doing them from the current thread.

I also don't think tree-sitter is needed for syntax coloring. Tree-sitter seems to be a very expensive regex engine in that case.

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: Eli Zaretskii <eliz@gnu.org>
Date: 2020-01-05 20:40 (GMT+01:00)
To: arthur miller <arthur.miller@live.com>
Cc: monnier@iro.umontreal.ca, alan@idiocy.org, emacs-devel@gnu.org
Subject: Re: Using incremental parsing in Emacs

> From: arthur miller <arthur.miller@live.com>
> CC: "alan@idiocy.org" <alan@idiocy.org>, "emacs-devel@gnu.org"
>        <emacs-devel@gnu.org>
> Date: Sun, 5 Jan 2020 19:23:16 +0000
>
> I think there are two use-cases:
>
> 1) batch processing where the entire file (or visible portion) is processed by different thread(s), for example on file
> opening or if some expansion takes place (yasnippet or similar).
>
> This can be done by dividing the text into a number of blocks (lines or similar) and letting each thread match one block
> at a time against a shared pattern database.
>
> 2) interactive use; when user is typing actively.

Maybe I'm missing something, but I don't see how any of this is
relevant to batch processing.  We never do anything in batch in an
interactive Emacs session, since the user is always there, waiting.
The display engine has many optimizations to eliminate the delays
caused by prolonged processing required to decide what should change
on the glass.  IOW, "interactive" doesn't just mean "typing", it can
mean any other command that changes what's on display, like scrolling.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 20:27             ` Stefan Monnier
@ 2020-01-05 21:12               ` yyoncho
  2020-01-05 22:10                 ` Stefan Monnier
  0 siblings, 1 reply; 76+ messages in thread
From: yyoncho @ 2020-01-05 21:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, alan, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1445 bytes --]

tree-sitter seems to be designed to handle that case out of the box.

As per the tree-sitter docs:

1. You start parsing via ts_parser_parse (in a separate thread).
2. The document is changed.
3. You call ts_parser_set_cancellation_flag.
4. You call ts_tree_edit with the edits from step 2.
5. You call ts_parser_parse again with the same params, and tree-sitter
is smart enough to reuse the stuff that has already been parsed.


On Sun, Jan 5, 2020 at 10:28 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> > That's exactly the issue: if we need to make a copy just to run the
> > parser asynchronously, then there's no significant advantage in having
> > such asynchronous processing inside the Emacs process,
>
> There can still be advantages depending on many other details.
>
> Another option is to give them direct access to the buffer, but only
> allow read-only access and impose some synchronization between the
> threads, e.g.: prepare_before_change could signal the concurrent
> threads and wait for them to acknowledge that they can't look at the
> buffer positions after START, and then re-allow access past START when
> we finish the buffer modification (or when we return to the command
> loop).
>
> Similarly, when buffer relocation takes place, we'd first signal to
> concurrent threads and wait for them to acknowledge that they've stopped
> accessing the buffer's content, and later re-signal them to let them
> know they can access it again.
>
>
>         Stefan
>
>
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 21:12               ` yyoncho
@ 2020-01-05 22:10                 ` Stefan Monnier
  2020-01-05 23:08                   ` yyoncho
  2020-01-06  3:39                   ` Eli Zaretskii
  0 siblings, 2 replies; 76+ messages in thread
From: Stefan Monnier @ 2020-01-05 22:10 UTC (permalink / raw)
  To: yyoncho; +Cc: Eli Zaretskii, alan, emacs-devel

> tree-sitter seems to be designed to handle that case out of the box.

Or rather, my proposal was based on the "standard" way incremental
parsing works (e.g. syntax-ppss works the same way).

> As per tree-sitter docs:
>
> 1. You start parsing via ts_parser_parse(in a separate thread).
> 2. Document is changed
> 3. Call ts_parser_set_cancellation_flag

The question is how quickly this stops the parsing: any delay here
is very costly because the Elisp execution will have to sit idly waiting
for the parser to stop before it can continue its own execution.

        Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 20:05   ` Eli Zaretskii
  2020-01-03 22:21     ` arthur miller
  2020-01-03 23:53     ` Stephen Leake
@ 2020-01-05 22:44     ` Dmitry Gutov
  2 siblings, 0 replies; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-05 22:44 UTC (permalink / raw)
  To: Eli Zaretskii, Stephen Leake; +Cc: emacs-devel

On 03.01.2020 22:05, Eli Zaretskii wrote:
> But that's just MO; I started this thread to maybe inspire someone to
> have a second look on the related features and propose ways of
> improving what we do today, both feature-wise and speed-wise, as I see
> quite a few complaints about lack of features and slowness in stuff
> like font-lock.

IME font-lock is slow:

- On long lines (which is hard to improve by simply implementing
something in C),
- In certain major modes, in particular the ones in the CC Mode
collection.

I imagine the latter is also caused by the relative complexity of the
syntax in C++ and related languages, but also because, IIUC, CC Mode
doesn't really do incremental parsing.

syntax-propertize-function, which is used by many other modes, kind of
uses that paradigm, with a certain amount of success.

So the main thing I'd really expect from the new approach is "new
features", i.e. ways to express language syntax more easily and more
clearly. Which, in turn, could entice our CC Mode developer to use it,
with a corresponding improvement in perceived performance. And, really,
if TreeSitter's algorithmic complexity is good enough for us, it can
probably be implemented in Lisp. With some critical section or two in C,
maybe.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 22:10                 ` Stefan Monnier
@ 2020-01-05 23:08                   ` yyoncho
  2020-01-06  3:39                   ` Eli Zaretskii
  1 sibling, 0 replies; 76+ messages in thread
From: yyoncho @ 2020-01-05 23:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, alan, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

On Mon, 6 Jan 2020, 00:10 Stefan Monnier, <monnier@iro.umontreal.ca> wrote:

>
> The question is how quickly this stops the parsing: any delay here
> is very costly because the Elisp execution will have to sit idly waiting
> for the parser to stop before it can continue its own execution.
>

Without looking into the implementation, I bet that it checks for
cancellation on each char. This means that we do not need to wait for it
to stop; we just have to ensure that if it reads one more char, that char
comes from the original content, which could be achieved by caching one
char ahead.
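
To illustrate that guess (and only the guess: this is not tree-sitter's
actual code), here is a small sketch in C of a scanner that polls a
cancellation flag before consuming each char. The struct scanner and the
scanner_* functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scanner state; "parsing" here just advances POS.  */
struct scanner
{
  const char *text;
  size_t len;
  size_t pos;          /* next char to consume */
  atomic_bool cancel;  /* set by another thread to stop the scan */
};

void
scanner_init (struct scanner *s, const char *text, size_t len)
{
  s->text = text;
  s->len = len;
  s->pos = 0;
  atomic_init (&s->cancel, false);
}

/* Called from the editing thread when the document changes.  */
void
scanner_cancel (struct scanner *s)
{
  atomic_store (&s->cancel, true);
}

/* Consume chars until the end of the text or until a cancellation is
   observed.  Returns true if the whole text was consumed.  Because the
   flag is polled before every char, at most one more char is read
   after scanner_cancel returns.  */
bool
scanner_run (struct scanner *s)
{
  assert (s->pos <= s->len);
  while (s->pos < s->len)
    {
      if (atomic_load (&s->cancel))
        return false;
      s->pos++;  /* stand-in for real parsing work on text[pos] */
    }
  return true;
}
```

With this shape the editing thread never has to block on the scanner;
the only requirement is that the one char it might still read is valid,
which is what the "cache one char ahead" idea would provide.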


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 22:10                 ` Stefan Monnier
  2020-01-05 23:08                   ` yyoncho
@ 2020-01-06  3:39                   ` Eli Zaretskii
  1 sibling, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-06  3:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: alan, yyoncho, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  alan@idiocy.org,  emacs-devel
>  <emacs-devel@gnu.org>
> Date: Sun, 05 Jan 2020 17:10:08 -0500
> 
> > As per tree-sitter docs:
> >
> > 1. You start parsing via ts_parser_parse(in a separate thread).
> > 2. Document is changed
> > 3. Call ts_parser_set_cancellation_flag
> 
> The question is how quickly this stops the parsing: any delay here
> is very costly because the Elisp execution will have to sit idly waiting
> for the parser to stop before it can continue its own execution.

I think we will have full control of that anyway, because we will
provide a reader function to access buffer text, so we can always
signal EOB to tree-sitter and/or stop it in its tracks.
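
A minimal sketch of that idea in C follows. The callback is only loosely
modelled on the general shape of a "give me text at this offset" reader;
its signature, struct reader_state, and the tiny chunk size are
assumptions made for this illustration, not tree-sitter's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical maximum number of bytes handed out per call.  */
static const size_t chunk_size = 4;

/* Hypothetical payload describing the text the parser may read.  */
struct reader_state
{
  const char *text;
  size_t len;
  atomic_bool stop;  /* set by Emacs to make the parser see "end of buffer" */
};

/* Return a pointer to the text starting at OFFSET and store the number
   of available bytes in *NREAD.  Zero bytes means end of buffer, which
   is also what we report once STOP has been set.  */
const char *
read_chunk (void *payload, size_t offset, size_t *nread)
{
  assert (payload != NULL && nread != NULL);
  struct reader_state *st = payload;
  if (atomic_load (&st->stop) || offset >= st->len)
    {
      *nread = 0;
      return "";
    }
  size_t n = st->len - offset;
  if (n > chunk_size)
    n = chunk_size;
  *nread = n;
  return st->text + offset;
}
```

Since every access to buffer text goes through the callback, setting
the stop flag bounds how much further the parser can read to the chunk
it was last handed.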



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-05 20:28             ` arthur miller
@ 2020-01-06  3:42               ` Eli Zaretskii
  2020-01-06  4:39                 ` HaiJun Zhang
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-06  3:42 UTC (permalink / raw)
  To: arthur miller; +Cc: alan, monnier, emacs-devel

> From: arthur miller <arthur.miller@live.com>
> CC: "monnier@iro.umontreal.ca" <monnier@iro.umontreal.ca>, "alan@idiocy.org"
> 	<alan@idiocy.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> Date: Sun, 5 Jan 2020 20:28:43 +0000
> 
> When I said batch processing I meant processing a file or a chunk of a buffer (region) just before it is
> displayed to the user. The same goes for other "insertions" from macro expansions or similar.

Then we always do "batch processing", because the display engine has
no good idea what exactly changed in the buffer.  So it always
processes some minimal chunk of text for which it can prove to itself
that all the changes were inside that chunk.

> I also don't think tree-sitter is needed for syntax coloring. Tree-sitter seems to be a very expensive regex engine
> in that case.

They claim to be less expensive than regexp-based coloring, especially
with very long lines.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  3:42               ` Eli Zaretskii
@ 2020-01-06  4:39                 ` HaiJun Zhang
  2020-01-06  5:33                   ` Eli Zaretskii
  2020-01-06 13:47                   ` Stefan Monnier
  0 siblings, 2 replies; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-06  4:39 UTC (permalink / raw)
  To: arthur miller, Eli Zaretskii; +Cc: alan, monnier, emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 1321 bytes --]

Could someone explain how jit-lock in Emacs works in the following case?

1. emacs -Q
2. Open the attached file and go to the end of the buffer
3. M-x desktop-save and quit Emacs
4. emacs -Q
5. M-x desktop-read

I see the buffer is fontified correctly. Does it parse the whole buffer?


On January 6, 2020 at 11:42 AM +0800, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: arthur miller <arthur.miller@live.com>
> > CC: "monnier@iro.umontreal.ca" <monnier@iro.umontreal.ca>, "alan@idiocy.org"
> > <alan@idiocy.org>, "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> > Date: Sun, 5 Jan 2020 20:28:43 +0000
> >
> > When I said batch processing I meant processing a file or a chunk of a buffer (region) just before it is
> > displayed to the user. The same goes for other "insertions" from macro expansions or similar.
>
> Then we always do "batch processing", because the display engine has
> no good idea what exactly changed in the buffer. So it always
> processes some minimal chunk of text for which it can prove to itself
> that all the changes were inside that chunk.
>
> > I also don't think tree-sitter is needed for syntax coloring. Tree-sitter seems to be a very expensive regex engine
> > in that case.
>
> They claim to be less expensive than regexp-based coloring, especially
> with very long lines.
>


[-- Attachment #2: a.cpp --]
[-- Type: application/octet-stream, Size: 2170 bytes --]

/*#include <stdio.h>

int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}

int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}

int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}
int main(int argc, char **argv)
{
	printf("hello world\n");

	return 0;
}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  4:39                 ` HaiJun Zhang
@ 2020-01-06  5:33                   ` Eli Zaretskii
  2020-01-06  5:55                     ` HaiJun Zhang
  2020-01-06 16:45                     ` arthur miller
  2020-01-06 13:47                   ` Stefan Monnier
  1 sibling, 2 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-06  5:33 UTC (permalink / raw)
  To: emacs-devel, HaiJun Zhang, arthur miller; +Cc: alan, monnier

On January 6, 2020 6:39:02 AM GMT+02:00, HaiJun Zhang <netjune@outlook.com> wrote:
> Could someone explain how the jit-lock in Emacs works in the following
> case?
> 
> 1. emacs -Q
> 2. open the attachment file and goto end of buffer
> 3. M-x desktop-save and quit emacs
> 4. emacs -Q
> 5. M-x desktop-read
> 
> I see the buffer is fontified correctly. Does it parse the whole
> buffer?
> 

We never parse the whole buffer, only a chunk of it that is slightly larger than what will actually be displayed in a window.

This works by the display engine calling the fontification-functions for the buffer text it is about to display, whenever it finds a chunk of text whose 'fontified' text property is nil.
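
As a rough model of that mechanism, here is a sketch in C. It is not how the display engine or jit-lock is actually implemented: struct text_buf, the per-character flags, the tiny chunk size, and the function names are all made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chunk size, kept tiny so the effect is easy to see.  */
static const size_t fontify_chunk = 4;

/* Hypothetical buffer model: one "fontified" flag per character.  */
struct text_buf
{
  size_t len;
  bool *fontified;       /* all false when the file is first visited */
  size_t fontify_calls;  /* how often the fontifier ran */
};

/* Stand-in for the major mode's fontification function.  */
void
fontify_region (struct text_buf *b, size_t beg, size_t end)
{
  for (size_t i = beg; i < end; i++)
    b->fontified[i] = true;
  b->fontify_calls++;
}

/* The display engine is about to draw [WIN_BEG, WIN_END): fontify only
   the not-yet-fontified text it runs into, one chunk at a time.  A
   chunk may extend a little past WIN_END.  */
void
redisplay_window (struct text_buf *b, size_t win_beg, size_t win_end)
{
  assert (win_beg <= win_end);
  size_t pos = win_beg;
  while (pos < win_end && pos < b->len)
    {
      if (!b->fontified[pos])
        {
          size_t end = pos + fontify_chunk;
          if (end > b->len)
            end = b->len;
          fontify_region (b, pos, end);
          pos = end;
        }
      else
        pos++;
    }
}
```

Displaying the last ten characters of a 100-character buffer leaves the first ninety untouched, and redisplaying the same window a second time calls the fontifier zero times.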




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  5:33                   ` Eli Zaretskii
@ 2020-01-06  5:55                     ` HaiJun Zhang
  2020-01-06  6:11                       ` Eli Zaretskii
  2020-01-06 16:45                     ` arthur miller
  1 sibling, 1 reply; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-06  5:55 UTC (permalink / raw)
  To: emacs-devel, arthur miller, Eli Zaretskii; +Cc: alan, monnier

[-- Attachment #1: Type: text/plain, Size: 2231 bytes --]

In the test cpp file, there is a "/*" at the beginning and no "*/" anywhere in the file, so the entire contents of the file are a comment. If it doesn't parse from the beginning of the file, how can it know that the text is a comment when point is at the end of the buffer?

On January 6, 2020 at 1:33 PM +0800, Eli Zaretskii <eliz@gnu.org> wrote:
> On January 6, 2020 6:39:02 AM GMT+02:00, HaiJun Zhang <netjune@outlook.com> wrote:
> > Could someone explain how the jit-lock in Emacs works in the following
> > case?
> >
> > 1. emacs -Q
> > 2. open the attachment file and goto end of buffer
> > 3. M-x desktop-save and quit emacs
> > 4. emacs -Q
> > 5. M-x desktop-read
> >
> > I see the buffer is fontified correctly. Does it parse the whole
> > buffer?
> >
>
> We never parse the whole buffer, only its chunk that is slightly larger than what would be actually displayed in a window.
>
> This works by the display engine calling the fontification-functions for the buffer text it is about to display, whenever it finds a chunk of text whose 'fontified' text property is nil.
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  5:55                     ` HaiJun Zhang
@ 2020-01-06  6:11                       ` Eli Zaretskii
  0 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-06  6:11 UTC (permalink / raw)
  To: emacs-devel, HaiJun Zhang, arthur miller; +Cc: alan, monnier

On January 6, 2020 7:55:02 AM GMT+02:00, HaiJun Zhang <netjune@outlook.com> wrote:
> In the test cpp file, there is a "/*" at the beginning and no "*/" in
> the file. So all contents of the file are a comment. If it doesn't parse
> from the beginning of the file, how can it know they are a comment when
> point is at the end of the buffer?
> 


That's up to the fontification-functions the display engine calls.  The major mode defines its fontification function, and that function can look before and after the chunk passed to it.  In the case of your file, look in CC Mode to see what it does.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  4:39                 ` HaiJun Zhang
  2020-01-06  5:33                   ` Eli Zaretskii
@ 2020-01-06 13:47                   ` Stefan Monnier
  2020-01-06 16:36                     ` HaiJun Zhang
  2020-01-06 16:48                     ` arthur miller
  1 sibling, 2 replies; 76+ messages in thread
From: Stefan Monnier @ 2020-01-06 13:47 UTC (permalink / raw)
  To: HaiJun Zhang; +Cc: Eli Zaretskii, alan, arthur miller, emacs-devel

> I see the buffer is fontified correctly. Does it parse the whole buffer?

We have different levels of parsing.  At the bottom we have
`syntax-ppss` (whose workhorse, implemented in C, is
`parse-partial-sexp`) which only counts parentheses and looks for
comment and string markers.  In the above case, `syntax-ppss` indeed
parses the whole buffer, but given its limited scope this parsing is
usually fast (it can be slow in some cases, because `parse-partial-sexp`
is supplemented by `syntax-propertize-function` to handle the "unusual"
cases of "strings/comments" (a typical example would be here-documents
in shell scripts) and this is all implemented in Elisp using regexp
searches).

After this parsing is done, font-lock looks at the few lines actually
displayed using its Elisp/regexps rules to apply the actual highlighting.
This may look at more parts of the buffer, tho, depending on the actual
font-lock rules.

        Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-03 10:05 Eli Zaretskii
                   ` (7 preceding siblings ...)
  2020-01-04 20:43 ` Stefan Monnier
@ 2020-01-06 16:14 ` Anand Tamariya
  8 siblings, 0 replies; 76+ messages in thread
From: Anand Tamariya @ 2020-01-06 16:14 UTC (permalink / raw)
  To: emacs-devel

Emacs already has the Wisent and Bovine parsers, which support many
languages. If you want, you can use them as a starting point.

https://www.gnu.org/software/emacs/manual/html_mono/wisent.html
https://www.gnu.org/software/emacs/manual/html_mono/bovine.html





^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06 13:47                   ` Stefan Monnier
@ 2020-01-06 16:36                     ` HaiJun Zhang
  2020-01-06 16:48                     ` arthur miller
  1 sibling, 0 replies; 76+ messages in thread
From: HaiJun Zhang @ 2020-01-06 16:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, alan, arthur miller, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1114 bytes --]

Thanks for your great explanation.
On January 6, 2020 at 9:47 PM +0800, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > I see the buffer is fontified correctly. Does it parse the whole buffer?
>
> We have different levels of parsing. At the bottom we have
> `syntax-ppss` (whose workhorse, implemented in C, is
> `parse-partial-sexp`) which only counts parentheses and looks for
> comment and string markers. In the above case, `syntax-ppss` indeed
> parses the whole buffer, but given its limited scope this parsing is
> usually fast (it can be slow in some cases, because `parse-partial-sexp`
> is supplemented by `syntax-propertize-function` to handle the "unusual"
> cases of "strings/comments" (a typical example would be here-documents
> in shell scripts) and this is all implemented in Elisp using regexp
> searches).
>
> After this parsing is done, font-lock looks at the few lines actually
> displayed using its Elisp/regexps rules to apply the actual highlighting.
> This may look at more parts of the buffer, tho, depending on the actual
> font-lock rules.
>
>
> Stefan
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06  5:33                   ` Eli Zaretskii
  2020-01-06  5:55                     ` HaiJun Zhang
@ 2020-01-06 16:45                     ` arthur miller
  2020-01-07 16:19                       ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: arthur miller @ 2020-01-06 16:45 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: HaiJun Zhang, alan@idiocy.org, monnier@iro.umontreal.ca,
	emacs-devel@gnu.org

Eli Zaretskii <eliz@gnu.org> writes:

> On January 6, 2020 6:39:02 AM GMT+02:00, HaiJun Zhang <netjune@outlook.com> wrote:
>> Could someone explain how the jit-lock in Emacs works in the following
>> case?
>> 
>> 1. emacs -Q
>> 2. open the attachment file and goto end of buffer
>> 3. M-x desktop-save and quit emacs
>> 4. emacs -Q
>> 5. M-x desktop-read
>> 
>> I see the buffer is fontified correctly. Does it parse the whole
>> buffer?
>> 
>
> We never parse the whole buffer, only its chunk that is slightly larger than what would be actually displayed in a window.
>
> This works by the display engine calling the fontification-functions for the
> buffer text it is about to display, whenever it finds a chunk of text whose
> 'fontified' text property is nil.

Thanks for the explanation.

Can I ask another related thing: if I were to break the buffer
into chunks to send to different threads, how can I find an
'edge' (for lack of a better term) of an expression?

Say I happened to make a split in the middle of a comment, or some
expression; is there already something I can use to figure out
how to adjust the split so I break on whole expressions, and not in
the middle?

About tree-sitter, I meant more in terms of RAM: it must cost something
to keep all those AST nodes in RAM. In general it will keep the entire
file as an AST copy in RAM.

But it is certainly more efficient in terms of CPU than regular
expressions, since tree-sitter seems to do only the minimal work needed
when updating the AST, while regular expressions are more of a
brute-force approach.

If you can use tree-sitter for purposes other than just syntax colouring,
then tree-sitter might definitely be a winner.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06 13:47                   ` Stefan Monnier
  2020-01-06 16:36                     ` HaiJun Zhang
@ 2020-01-06 16:48                     ` arthur miller
  1 sibling, 0 replies; 76+ messages in thread
From: arthur miller @ 2020-01-06 16:48 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: HaiJun Zhang, emacs-devel@gnu.org, Eli Zaretskii, alan@idiocy.org

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I see the buffer is fontified correctly. Does it parse the whole buffer?
>
> We have different levels of parsing.  At the bottom we have
> `syntax-ppss` (whose workhorse, implemented in C, is
> `parse-partial-sexp`) which only counts parentheses and looks for
> comment and string markers.  In the above case, `syntax-ppss` indeed
> parses the whole buffer, but given its limited scope this parsing is
> usually fast (it can be slow in some cases, because `parse-partial-sexp`
> is supplemented by `syntax-propertize-function` to handle the "unusual"
> cases of "strings/comments" (a typical example would be here-documents
> in shell scripts) and this is all implemented in Elisp using regexp
> searches).
>
> After this parsing is done, font-lock looks at the few lines actually
> displayed using its Elisp/regexps rules to apply the actual highlighting.
> This may look at more parts of the buffer, tho, depending on the actual
> font-lock rules.
>
>
>         Stefan

Thanks for the overview.
/a



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-06 16:45                     ` arthur miller
@ 2020-01-07 16:19                       ` Eli Zaretskii
  0 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-07 16:19 UTC (permalink / raw)
  To: arthur miller; +Cc: netjune, alan, monnier, emacs-devel

> From: arthur miller <arthur.miller@live.com>
> CC: "emacs-devel@gnu.org" <emacs-devel@gnu.org>, HaiJun Zhang
> 	<netjune@outlook.com>, "alan@idiocy.org" <alan@idiocy.org>,
> 	"monnier@iro.umontreal.ca" <monnier@iro.umontreal.ca>
> Date: Mon, 6 Jan 2020 16:45:51 +0000
> 
> Can I ask another related thing: if I were to break the buffer
> into chunks to send to different threads, how can I find an
> 'edge' (for lack of a better term) of an expression?

You could use the syntax-related features we have.

> Say I happened to make a split in the middle of a comment, or some
> expression; is there already something I can use to figure out
> how to adjust the split so I break on whole expressions, and not in
> the middle?

Look up syntax-ppss.
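
syntax-ppss reports, among other things, the nesting depth and whether
a position is inside a string or comment, so a split point can be moved
until it is at depth zero and outside a comment.  Here is a minimal
sketch of that adjustment in C; adjust_split is a hypothetical helper
written for this message, it handles only brackets and C block
comments, and "depth zero" is a lexical notion, not a guarantee that
the split falls between complete statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the smallest position >= WANT at which TEXT (of length LEN)
   can be split without cutting through a bracketed group or a block
   comment, or LEN if there is no such position.  */
size_t
adjust_split (const char *text, size_t len, size_t want)
{
  assert (want <= len);
  int depth = 0;
  bool in_comment = false;
  size_t i = 0;
  while (i < len)
    {
      /* State here describes the text before position I.  */
      if (i >= want && depth == 0 && !in_comment)
        return i;
      char c = text[i];
      bool has_next = i + 1 < len;
      if (in_comment)
        {
          if (c == '*' && has_next && text[i + 1] == '/')
            {
              in_comment = false;
              i += 2;  /* never split inside the delimiter itself */
              continue;
            }
        }
      else if (c == '/' && has_next && text[i + 1] == '*')
        {
          in_comment = true;
          i += 2;
          continue;
        }
      else if (c == '(' || c == '[' || c == '{')
        depth++;
      else if (c == ')' || c == ']' || c == '}')
        depth--;
      i++;
    }
  return len;
}
```

For "f (a, b); /* note */ g (c);" a requested split at offset 4 (inside
the argument list) moves to offset 8, just after the closing paren, and
a requested split at offset 13 (inside the comment) moves to offset 20.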



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: Using incremental parsing in Emacs
  2020-01-04 14:46 ` arthur miller
  2020-01-05 14:50   ` Alan Third
@ 2020-01-09 21:56   ` Dmitry Gutov
  2020-01-10  7:41     ` Eli Zaretskii
  1 sibling, 1 reply; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-09 21:56 UTC (permalink / raw)
  To: arthur miller, Eli Zaretskii; +Cc: emacs-devel@gnu.org

On 04.01.2020 16:46, arthur miller wrote:

> There is a very good presentation of tree-sitter on YT by its author:
> 
> https://www.youtube.com/watch?v=Jes3bD6P0To
> 
> Looks much better then what I got a picture by just reading on the
> website:

It was a good watch.

Some takeaways from me:

It implements a GLR parser. One that can update the existing AST quickly 
for an arbitrary edit in the middle of a file. (*)

But it parses a new file quickly as well: a 20000-line JS file in 54ms.

To be able to reach that speed, they went the traditional 
compiler-writer route of having a separate (grammar-to-C-code) 
compilation step from a grammar to a parser program (which relies on a 
shared runtime). (**)

Some of it seems to be by necessity. Every run returns a full AST, not 
just an "AST up to this position". I suppose the author didn't want the 
problems that come with unfinished parse trees when code relies on that 
returned value. (***)

The generated parser, in addition to being incremental, is 
error-tolerant, which is a necessity for use in editors.

As a result, they have features like fast semantic syntax highlighting,
as well as code folding that accurately detects where a function body
begins and ends (previously, Atom and other editors used guessing based
on indentation levels, apparently). And an "extend selection" command
based on the AST as well (****)

Tree-Sitter is also used inside GitHub for various features, including 
their Semantic library (which implements code navigation on the web).

In the meantime, our current answer to all of the above is syntax-ppss 
plus local regexp-based parsing around the visible part of the buffer.

To compare:

(*) syntax-ppss is also fully incremental, although the returned value 
is a very simplistic substitute for an AST. But we've been using it for 
a while and have done solid things with it.

(**) Which means that if we try to use Tree-Sitter as-is, our current 
practice of defining the language grammar in Lisp would go out of the 
window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this 
as well: language grammars have to be compiled into a shared library (or 
libraries). We would have lots of grammars supplied by the third party, 
which is kind of good, but we would lose the ease of experimenting with 
them that we have now, or being able to write support for a new 
up-and-coming language very quickly. Which a certain fraction of our 
users enjoys, AFAIK.

(***) Whereas syntax-ppss stops at a requested position, thus saving 
CPU cycles. Similarly, if a new system we transition to someday also 
does this, its absolute performance/throughput would matter less, 
because it would usually only have to parse a screenful of the file at 
a time.
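
As a rough illustration of the "stop at the requested position" approach, here is a minimal Python sketch of a cache that scans only as far as it is asked to, remembers the state it reached, and discards the remembered states after an edit. It is only a model in the spirit of syntax-ppss, not its actual implementation: the class PositionCache and its methods are hypothetical names, and the "parse state" is reduced to a paren depth that ignores strings and comments.

```python
import bisect


class PositionCache:
    """Remembers the paren depth at positions that were already requested."""

    def __init__(self, text: str):
        self.text = text
        self.checkpoints = [(0, 0)]   # sorted (position, depth) pairs
        self.chars_scanned = 0        # bookkeeping, to show how little is scanned

    def depth_at(self, pos: int) -> int:
        """Return the paren depth at pos, scanning only text before pos."""
        pos = max(0, min(pos, len(self.text)))
        # Find the nearest checkpoint at or before pos.
        idx = bisect.bisect_right(self.checkpoints, (pos, float("inf"))) - 1
        start, depth = self.checkpoints[idx]
        for ch in self.text[start:pos]:
            if ch == "(":
                depth += 1
            elif ch == ")" and depth > 0:
                depth -= 1
        self.chars_scanned += pos - start
        if start != pos:
            bisect.insort(self.checkpoints, (pos, depth))
        return depth

    def replace(self, start: int, end: int, replacement: str) -> None:
        """Replace text[start:end] and forget every state computed after start."""
        self.text = self.text[:start] + replacement + self.text[end:]
        # Checkpoints at or before the edit describe unchanged text and stay valid.
        self.checkpoints = [(p, d) for (p, d) in self.checkpoints if p <= start]
```

Asking for the state near the top of a large buffer never touches the text below that position, and repeating a request costs nothing because the answer is already cached.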

(****) We've been managing surprisingly well with syntax-ppss, 
forward-sexp, etc. So code folding works quite well in Emacs already, 
and the easy-kill package in GNU ELPA does the "expand selection" thing 
very successfully as well. But we could use some improvement in having 
some more complex syntax supported or handled more easily, in certain 
languages. Having a "proper AST" available is nothing to sneeze at 
either, and would likely help a lot in indentation code.

My personal takeaway is that we could really benefit from a lispier 
version of this technology, and Someone(tm) should start working on that.


* Re: Using incremental parsing in Emacs
  2020-01-09 21:56   ` Dmitry Gutov
@ 2020-01-10  7:41     ` Eli Zaretskii
  2020-01-11  1:41       ` Dmitry Gutov
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-10  7:41 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: arthur.miller, emacs-devel

> Cc: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 10 Jan 2020 00:56:38 +0300
> 
> (**) Which means that if we try to use Tree-Sitter as-is, our current 
> practice of defining the language grammar in Lisp would go out of the 
> window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this 
> as well: language grammars have to be compiled into a shared library (or 
> libraries). We would have lots of grammars supplied by the third party, 
> which is kind of good, but we would lose the ease of experimenting with 
> them that we have now, or being able to write support for a new 
> up-and-coming language very quickly. Which a certain fraction of our 
> users enjoys, AFAIK.

If we provide infrastructure for using the likes of Tree-Sitter in
core, how long do you think it will take until someone rewrites their
JS generator of parse tables in Lisp?  And we already have machinery
in place for loading external shared objects; it can be extended if
necessary to handle loading parse tables.

Bottom line: this aspect doesn't sound like a problem to me in the
long run.  I was rather surprised that they didn't have ELisp parse
tables out of the box.

> My personal takeaway is that we could really benefit from a lispier 
> version of this technology, and Someone(tm) should start working on that.

Agreed.




* Re: Using incremental parsing in Emacs
  2020-01-10  7:41     ` Eli Zaretskii
@ 2020-01-11  1:41       ` Dmitry Gutov
  2020-01-11  7:53         ` Eli Zaretskii
  0 siblings, 1 reply; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-11  1:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: arthur.miller, emacs-devel

On 10.01.2020 9:41, Eli Zaretskii wrote:
>> Cc: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Fri, 10 Jan 2020 00:56:38 +0300
>>
>> (**) Which means that if we try to use Tree-Sitter as-is, our current
>> practice of defining the language grammar in Lisp would go out of the
>> window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this
>> as well: language grammars have to be compiled into a shared library (or
>> libraries). We would have lots of grammars supplied by the third party,
>> which is kind of good, but we would lose the ease of experimenting with
>> them that we have now, or being able to write support for a new
>> up-and-coming language very quickly. Which a certain fraction of our
>> users enjoys, AFAIK.
> 
> If we provide infrastructure for using the likes of Tree-Sitter in
> core, how long do you think it will take until someone rewrites their
> JS generator of parse tables in Lisp?  And we already have machinery
> in place for loading external shared objects; it can be extended if
> necessary to handle loading parse tables.

It should be easy enough to convert between the JS and Lisp syntax. 
But how do you compile it to a library that Tree-Sitter expects without 
having the user install a C compiler toolchain?

IIRC you objected to features relying on something like this in the 
past.




* Re: Using incremental parsing in Emacs
  2020-01-11  1:41       ` Dmitry Gutov
@ 2020-01-11  7:53         ` Eli Zaretskii
  2020-01-11 12:24           ` Dmitry Gutov
  0 siblings, 1 reply; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-11  7:53 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: arthur.miller, emacs-devel

> Cc: arthur.miller@live.com, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 11 Jan 2020 04:41:53 +0300
> 
> > If we provide infrastructure for using the likes of Tree-Sitter in
> > core, how long do you think it will take until someone rewrites their
> > JS generator of parse tables in Lisp?  And we already have machinery
> > in place for loading external shared objects; it can be extended if
> > necessary to handle loading parse tables.
> 
> It should be easy enough to convert between the JS and Lisp syntax. 
> But how do you compile it to a library that Tree-Sitter expects without 
> having the user install a C compiler toolchain?

Yes, users who want to compile their own parsers, or recompile
existing ones, will have to have a C compiler installed.  Which is a
downside, but not a serious one in this case, IMO, because most users
will use existing parser tables.  I'd expect most if not all of such
tables to come together with the Emacs-adapted Tree-Sitter package, or
be available on ELPA, or even (gasp!) in core.

> IIRC you objected to features relying on something like this in the 
> past.

If users need to make changes in this stuff frequently, then yes, it's
a serious disadvantage to need a compiler.  But it doesn't seem to be
the case here, not up front anyway.


* Re: Using incremental parsing in Emacs
  2020-01-11  7:53         ` Eli Zaretskii
@ 2020-01-11 12:24           ` Dmitry Gutov
  2020-01-11 12:29             ` Eli Zaretskii
  0 siblings, 1 reply; 76+ messages in thread
From: Dmitry Gutov @ 2020-01-11 12:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: arthur.miller, emacs-devel

On 11.01.2020 9:53, Eli Zaretskii wrote:

> Yes, users who want to compile their own parsers, or recompile
> existing ones, will have to have a C compiler installed.  Which is a
> downside, but not a serious one in this case, IMO, because most users
> will use existing parser tables.  I'd expect most if not all of such
> tables to come together with the Emacs-adapted Tree-Sitter package, or
> be available on ELPA, or even (gasp!) in core.

ELPA won't solve the necessity to have this code compiled for different 
platforms.

If we go this route, then Tree-Sitter and some core grammars will have 
to be in the core for sure, I'm just worried about the ease of improving 
or developing new ones.

Our users, compared to those of other editors, are probably the most 
spoiled (in a good way) with regard to development iteration speed.

So, as outlined previously, we might even prefer 10x slower parsing 
speed if it comes with a faster development cycle.


* Re: Using incremental parsing in Emacs
  2020-01-11 12:24           ` Dmitry Gutov
@ 2020-01-11 12:29             ` Eli Zaretskii
  0 siblings, 0 replies; 76+ messages in thread
From: Eli Zaretskii @ 2020-01-11 12:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: arthur.miller, emacs-devel

> Cc: arthur.miller@live.com, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 11 Jan 2020 15:24:37 +0300
> 
> On 11.01.2020 9:53, Eli Zaretskii wrote:
> 
> > Yes, users who want to compile their own parsers, or recompile
> > existing ones, will have to have a C compiler installed.  Which is a
> > downside, but not a serious one in this case, IMO, because most users
> > will use existing parser tables.  I'd expect most if not all of such
> > tables to come together with the Emacs-adapted Tree-Sitter package, or
> > be available on ELPA, or even (gasp!) in core.
> 
> ELPA won't solve the necessity to have this code compiled for different 
> platforms.

It will if distros include them.

> If we go this route, then Tree-Sitter and some core grammars will have 
> to be in the core for sure, I'm just worried about the ease of improving 
> or developing new ones.
> 
> Our users, compared to those of other editors, are probably the most 
> spoiled (in a good way) with regard to development iteration speed.
> 
> So, as outlined previously, we might even prefer 10x slower parsing 
> speed if it comes with a faster development cycle.

These are all valid concerns, but I'd defer dealing with them until we
have the infrastructure for using incremental parsers.  Right now,
using them is just a pipe dream, and I think we need to make it more
practical.


