unofficial mirror of emacs-devel@gnu.org 
* New package emacs-parser-generator
@ 2021-11-27 21:40 Christian Johansson
  2021-11-28  7:01 ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Johansson @ 2021-11-27 21:40 UTC (permalink / raw)
  To: Emacs developers

Hi!

I have started on a parser generator library for Emacs that currently 
can generate canonical LR(k) parsers as stand-alone elisp files. I'm 
using it for the automatically generated PHP 8.0 parser in phps-mode. A 
difference between my library and the built-in Wisent parser generator 
is that it can handle e-identifiers (like %empty), context-sensitive 
precedence (like %prec) and global precedence rules (I'm not sure whether 
Wisent supports that).
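
For readers unfamiliar with precedence declarations, here is a minimal sketch of the conventional Yacc/Bison rule for resolving a shift/reduce conflict with precedence and associativity. It is written in Python for illustration only; the table contents and the function name `resolve_conflict` are hypothetical, and nothing here is taken from emacs-parser-generator or Wisent.

```python
from enum import Enum


class Assoc(Enum):
    LEFT = "left"
    RIGHT = "right"
    NONASSOC = "nonassoc"


# Hypothetical global precedence table: a higher level binds tighter.
PRECEDENCE = {
    "<": (0, Assoc.NONASSOC),
    "+": (1, Assoc.LEFT),
    "*": (2, Assoc.LEFT),
    "^": (3, Assoc.RIGHT),
}


def resolve_conflict(rule_token, lookahead):
    """Decide between shifting the lookahead and reducing the rule.

    rule_token is the terminal that gives the rule its precedence:
    by convention the last terminal of the rule, or the one named
    by a %prec annotation.
    """
    rule_level, _ = PRECEDENCE[rule_token]
    look_level, look_assoc = PRECEDENCE[lookahead]
    if look_level > rule_level:
        return "shift"
    if look_level < rule_level:
        return "reduce"
    # Equal precedence: associativity decides.
    if look_assoc is Assoc.LEFT:
        return "reduce"
    if look_assoc is Assoc.RIGHT:
        return "shift"
    return "error"  # non-associative operators may not be chained
```

With this table, `1 + 2 * 3` shifts `*` after seeing `1 + 2`, while `1 * 2 + 3` reduces `1 * 2` first.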

In the future I plan to implement more parsing algorithms in 
this library, such as LL(k) and LALR(k).

It is located at https://github.com/cjohansson/emacs-parser-generator  
(GitHub is currently down it seems but when it gets up again it should 
be at that location)


-- 
Best Regards
Christian





* Re: New package emacs-parser-generator
  2021-11-27 21:40 New package emacs-parser-generator Christian Johansson
@ 2021-11-28  7:01 ` Eli Zaretskii
  2021-11-28  7:22   ` Christian Johansson
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-11-28  7:01 UTC (permalink / raw)
  To: Christian Johansson; +Cc: emacs-devel

> Date: Sat, 27 Nov 2021 22:40:45 +0100
> From: Christian Johansson <christian@cvj.se>
> 
> I have started on a parser generator library for Emacs that currently 
> can generate canonical LR(k) parsers as stand-alone elisp files. I'm 
> using it for the automatically generated PHP 8.0 parser in phps-mode. A 
> difference between my library and the built-in Wisent parser generator 
> is that it can handle e-identifiers (like %empty), context sensitive 
> precedence (like %prec) and global precedence rules (not sure if Wisent 
> does support that)
> 
> In the future I have planned on implementing more parsing algorithms in 
> this library like LL(k) and LALR(k)

Thanks.  I believe the plan is to use Tree-sitter for the jobs for
which a parser is needed.  But we also want to have an API that could
accommodate other parsers into the same framework.  So I think it
would be good if you cooperate with Yuan Fu <casouri@gmail.com>, who
is working on Tree-sitter integration, so that the API for using a
parser could be a common one.




* Re: New package emacs-parser-generator
  2021-11-28  7:01 ` Eli Zaretskii
@ 2021-11-28  7:22   ` Christian Johansson
  2021-11-28 13:24     ` Stefan Monnier
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Johansson @ 2021-11-28  7:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Hi,

I believe tree-sitter is not suitable for proper parsing (it does not support LR(1), for example).

It is good for fast syntax coloring with approximate / good-enough parsing.

Regards
Christian





* Re: New package emacs-parser-generator
  2021-11-28  7:22   ` Christian Johansson
@ 2021-11-28 13:24     ` Stefan Monnier
  2021-11-28 13:45       ` Christian Johansson
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2021-11-28 13:24 UTC (permalink / raw)
  To: Christian Johansson; +Cc: Eli Zaretskii, emacs-devel

Christian Johansson [2021-11-28 08:22:48] wrote:
> I believe tree-sitter is not suitable for proper parsing (it does not
> support LR(1) for example)

Really?  AFAIK it uses a GLR parser and hence handles LR(1) and more.


        Stefan





* Re: New package emacs-parser-generator
  2021-11-28 13:24     ` Stefan Monnier
@ 2021-11-28 13:45       ` Christian Johansson
  2021-11-28 23:46         ` Daniel Martín
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Johansson @ 2021-11-28 13:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel


Well, the GLR(k) algorithm might find a nondeterministic route in the grammar that an LR(k) parser does not find. If I want to know whether code is syntactically correct, I would test it against the same type of parser the language uses, that is, a deterministic parser.
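
To make the determinism point concrete: an ambiguous grammar such as `E -> E '-' E | NUM` is not LR(k) for any k, so a deterministic generator reports a conflict, while a generalized parser can follow every derivation. The Python sketch below is not a GLR implementation; it only counts the derivations with a simple dynamic-programming recurrence, and the grammar and function name are assumptions chosen for illustration.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count the parse trees of E -> E '-' E | NUM for a token list."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "NUM" else 0
        total = 0
        # Try every '-' as the top-level operator of this span.
        for k in range(i + 1, j - 1):
            if tokens[k] == "-":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For `NUM - NUM - NUM` the count is 2, corresponding to `(1-2)-3` and `1-(2-3)`. A deterministic parser must commit to one of them through precedence or associativity rules, which is why the two kinds of parser can disagree about the same input.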

> An intuitive structure - Tree-sitter’s output is a concrete syntax tree; each node in the tree corresponds directly to a terminal or non-terminal symbol in the grammar. So in order to produce an easy-to-analyze tree, there should be a direct correspondence between the symbols in your grammar and the recognizable constructs in the language. This might seem obvious, but it is very different from the way that context-free grammars are often written in contexts like language specifications or Yacc/Bison parsers.
>
> https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl

This is a big issue because each version of a language grammar would need to be converted into tree-sitter form.



But anyway, I don't see the issue with pluralism in the parser generator space. Why would one exclude the other?



Regards

Christian





* Re: New package emacs-parser-generator
  2021-11-28 13:45       ` Christian Johansson
@ 2021-11-28 23:46         ` Daniel Martín
  2021-11-29 12:30           ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Martín @ 2021-11-28 23:46 UTC (permalink / raw)
  To: Christian Johansson; +Cc: Stefan Monnier, Eli Zaretskii, emacs-devel

Christian Johansson <christian@cvj.se> writes:

> Well, the GLR(k) algorithm might find a nondeterministic route in
> the grammar that an LR(k) parser does not find. If I want to know
> whether code is syntactically correct, I would test it against the
> same type of parser the language uses, that is, a deterministic parser.
>
>> An intuitive structure - Tree-sitter’s output is a concrete syntax
>> tree; each node in the tree corresponds directly to a terminal or
>> non-terminal symbol in the grammar. So in order to produce an
>> easy-to-analyze tree, there should be a direct correspondence between
>> the symbols in your grammar and the recognizable constructs in the
>> language. This might seem obvious, but it is very different from the
>> way that context-free grammars are often written in contexts like
>> language specifications or Yacc/Bison parsers.
>>
>> https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl
>
> This is a big issue because each version of a language grammar would
> need to be converted into tree-sitter form.
>
>
>
> But anyways I don't see the issue with pluralism in the parser
> generator space, why would one exclude the other?

I think the main question is not about this library vs. Tree-sitter.  We
can have both.  But IMO we should spend some time investigating if a
common API is possible and makes sense.  To the untrained eye, both
libraries solve the problem of generating parsers for languages, and the
use cases seem to be more or less the same, so maybe there's an
opportunity to abstract what's common:

- Create a parser from a grammar (the way grammars are defined differs).
- Parse a region of text and generate a syntax tree.
- Query the syntax tree.
- etc.
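
As a rough illustration of what abstracting these common steps could look like, here is a minimal Python sketch. Every name in it (`Node`, `ParserBackend`, `query`, `WordBackend`) is hypothetical and belongs to neither library. Grammar handling is left to each backend's constructor, since the way grammars are defined differs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int
    end: int
    children: list = field(default_factory=list)


class ParserBackend(ABC):
    """What a caller needs, regardless of how the parser was generated."""

    @abstractmethod
    def parse(self, text: str, start: int, end: int) -> Node:
        """Parse text[start:end] and return the root of a syntax tree."""


def query(node, kind):
    """Yield every node of the given kind, depth-first."""
    if node.kind == kind:
        yield node
    for child in node.children:
        yield from query(child, kind)


class WordBackend(ParserBackend):
    """Toy backend: each whitespace-separated run becomes a 'word' node."""

    def parse(self, text, start, end):
        root = Node("document", start, end)
        pos = start
        for word in text[start:end].split():
            pos = text.index(word, pos, end)
            root.children.append(Node("word", pos, pos + len(word)))
            pos += len(word)
        return root
```

A caller would only ever touch `parse` and `query`, for example `list(query(WordBackend().parse("foo bar", 0, 7), "word"))`, so swapping in a differently generated parser would not change the calling code.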

To people much more familiar with this topic, is this an
oversimplification that would lead to the wrong abstraction?

One thing I saw in the in-progress Tree-sitter ELisp API is that it
feels a bit too coupled to Tree-sitter.  I think in the long run it's
better for Emacs to have an abstract API similar to the package you
propose here, where Tree-sitter could be one possible alternative
implementation.




* Re: New package emacs-parser-generator
  2021-11-28 23:46         ` Daniel Martín
@ 2021-11-29 12:30           ` Eli Zaretskii
  2021-11-29 13:09             ` Christian Johansson
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-11-29 12:30 UTC (permalink / raw)
  To: Daniel Martín; +Cc: christian, monnier, emacs-devel

> From: Daniel Martín <mardani29@yahoo.es>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  Eli Zaretskii
>  <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Mon, 29 Nov 2021 00:46:09 +0100
> 
> I think the main question is not about this library vs. Tree-sitter.  We
> can have both.  But IMO we should spend some time investigating if a
> common API is possible and makes sense.  To the untrained eye, both
> libraries solve the problem of generating parsers for languages, and the
> use cases seem to be more or less the same, so maybe there's an
> opportunity to abstract what's common:
> 
> - Create a parser from a grammar (the way grammars are defined differs).
> - Parse a region of text and generate a syntax tree.
> - Query the syntax tree.
> - etc.

I'm not sure this is the correct approach to the issue.  We should
instead ask ourselves: "what information would Emacs want from a
parser for use in features like indentation, syntax highlighting,
refactoring, etc.", and devise the APIs that would be convenient and
would make sense for those Emacs jobs.  The kind of information and
data that a parser can provide should be considered in the light of
those Emacs requirements, and then we have a better chance of coming
up with common APIs.

> One thing I saw in the in-progress Tree-sitter ELisp API is that it
> feels a bit too coupled to Tree-sitter.  I think in the long run it's
> better for Emacs to have an abstract API similar to the package you
> propose here, where Tree-sitter could be one possible alternative
> implementation.

That's true and agreed, but until someone comes up with at least one
more parser and proposes a common API, discussing this will tend to be
academic, I think.




* Re: New package emacs-parser-generator
  2021-11-29 12:30           ` Eli Zaretskii
@ 2021-11-29 13:09             ` Christian Johansson
  2021-11-29 19:22               ` Yuan Fu
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Johansson @ 2021-11-29 13:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, monnier, Daniel Martín

Hi again

My main intention, which I forgot to include in the original email, was to request that my parser-generator library be included in GNU ELPA. Stefan requested that to solve the dependency problem with generating the parser for phps-mode.

My library is pure elisp and I think tree-sitter is C-based, so tree-sitter is probably faster overall, even though an LR(k) parser should be faster than a GLR(k) parser in theory. The lex-analyzer part is especially slow in my library, and generating a parser for a complex language is ridiculously slow. Parsing is very fast, though.

Regards
Christian







* Re: New package emacs-parser-generator
  2021-11-29 13:09             ` Christian Johansson
@ 2021-11-29 19:22               ` Yuan Fu
  2021-12-01  7:52                 ` Christian Johansson
  0 siblings, 1 reply; 16+ messages in thread
From: Yuan Fu @ 2021-11-29 19:22 UTC (permalink / raw)
  To: Christian Johansson
  Cc: Eli Zaretskii, Daniel Martín, monnier, emacs-devel

The tree-sitter integration doesn’t provide anything to define a grammar or generate a parser. It merely uses pre-defined grammars (written by the tree-sitter community) to parse buffers and produce an AST.

To write a tree-sitter grammar definition, you write the grammar in JavaScript and pass it to tree-sitter’s parser generator. The generator spits out a grammar definition encoded in a C source file (in the form of a C struct). To use this grammar definition, we compile it to a library, load it at runtime, and pass the struct to the tree-sitter library; tree-sitter can then parse according to this grammar definition.

The tree-sitter integration provides 1) Lisp wrappers for tree-sitter’s C API; 2) some convenience functions built on the C API; 3) integration with font-lock and indentation. I didn’t add any “common API”; I simply used existing frameworks in Emacs: for font-lock I used font-lock-fontify-region-function, and for indentation I used indent-line-function. If in the future a common API for parsers is desirable, tree-sitter can easily comply.

IOW, this is what tree-sitter integration currently does:

font-lock                                indent
      |                                     |
font-lock-fontify-region-function        indent-line-function
      |                                     |
      |                                     |
tree-sitter-font-lock-fontify-region     tree-sitter-indent-function

This is what could happen if we want a common API:

font-lock                                indent
      |                                     |
font-lock-fontify-region-function        indent-line-function
      |                                     |
      |                                     |
     common  API  --------------------------+
     /         \
    /           \
   /             \
tree-sitter    other parser
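
The second diagram can be sketched as a small dispatch layer. The Python below is an illustration under stated assumptions, not the Emacs design: the names `Backend`, `register_backend`, `fontify_region` and `indent_line` are hypothetical, and the toy backend stands in for any real parser. The interface is described by what font-lock and indentation ask for, not by how a backend parses.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, face)


class Backend:
    """A parser backend, described only by what its consumers need."""

    def __init__(self,
                 faces: Callable[[str, int, int], List[Span]],
                 indent: Callable[[str, int], int]):
        self.faces = faces
        self.indent = indent


BACKENDS: Dict[str, Backend] = {}


def register_backend(mode: str, backend: Backend) -> None:
    BACKENDS[mode] = backend


def fontify_region(mode: str, text: str, start: int, end: int) -> List[Span]:
    # Role of font-lock-fontify-region-function in the diagram.
    return BACKENDS[mode].faces(text, start, end)


def indent_line(mode: str, text: str, line_number: int) -> int:
    # Role of indent-line-function in the diagram.
    return BACKENDS[mode].indent(text, line_number)


# Toy backend: highlights the keyword "if" and indents by brace depth.
def toy_faces(text, start, end):
    return [(m.start() + start, m.end() + start, "keyword")
            for m in re.finditer(r"\bif\b", text[start:end])]


def toy_indent(text, line_number):
    before = "\n".join(text.split("\n")[:line_number])
    depth = before.count("{") - before.count("}")
    return 4 * max(depth, 0)


register_backend("toy-mode", Backend(toy_faces, toy_indent))
```

A tree-sitter backend and an elisp-generated LR(k) backend would each register their own pair of functions, and the font-lock and indentation hooks would not need to know which one is active.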

Yuan



* Re: New package emacs-parser-generator
  2021-11-29 19:22               ` Yuan Fu
@ 2021-12-01  7:52                 ` Christian Johansson
  2021-12-01  8:39                   ` Yuan Fu
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Johansson @ 2021-12-01  7:52 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Daniel Martín, monnier, emacs-devel

Alright, could you give some more details about when a buffer parse is triggered? Is it via threads, is it sometimes incremental, and does it work on a string copy of the buffer or on the buffer contents directly?

Does treesitter expose a (faster) regexp matcher that perhaps can be used by my library?

Regards
Christian





* Re: New package emacs-parser-generator
  2021-12-01  7:52                 ` Christian Johansson
@ 2021-12-01  8:39                   ` Yuan Fu
  2021-12-01  8:51                     ` Christian Johansson
  0 siblings, 1 reply; 16+ messages in thread
From: Yuan Fu @ 2021-12-01  8:39 UTC (permalink / raw)
  To: Christian Johansson
  Cc: Eli Zaretskii, Daniel Martín, Stefan Monnier,
	Emacs developers



> On Nov 30, 2021, at 11:52 PM, Christian Johansson <christian@cvj.se> wrote:
> 
> Alright, could you give some more details about when a buffer parse is triggered, is it via threads, is it sometimes incremental, does it work on a string copy of the buffer or on the buffer contents directly?

Tree-sitter parses incrementally: I modified the primitive insert/delete functions in insdel.c to incrementally parse changed content. There is no need for threads, as incremental parsing is extremely fast. We don’t make copies of the buffer string; instead, we pass the tree-sitter library a function that reads directly from the buffer.
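
The idea of handing the parser a read function instead of a copied string can be modeled in a few lines. The Python sketch below is an assumption-laden illustration, not the code in the Emacs branch: `GapBuffer` is a toy stand-in for an Emacs buffer (which stores its text around a gap), and `consume` stands in for a parser that pulls input through the callback.

```python
class GapBuffer:
    """Toy gap buffer: the text is stored as two byte runs around a gap."""

    def __init__(self, before: bytes, after: bytes):
        self.before = before
        self.after = after

    def __len__(self):
        return len(self.before) + len(self.after)

    def read(self, byte_index: int) -> memoryview:
        """Return a view of the contiguous bytes starting at byte_index.

        An empty view signals end of input. The buffer is never copied
        as a whole; the caller only sees a window onto existing storage.
        """
        if byte_index < len(self.before):
            return memoryview(self.before)[byte_index:]
        offset = byte_index - len(self.before)
        return memoryview(self.after)[offset:]


def consume(read, start=0):
    """Stand-in for a parser: pull chunks through the callback until EOF."""
    chunks = []
    pos = start
    while True:
        chunk = read(pos)
        if len(chunk) == 0:
            break
        chunks.append(bytes(chunk))  # copied here only to show the result
        pos += len(chunk)
    return chunks
```

The consumer never needs to know where the gap is; it simply asks again at the next byte offset until the callback returns nothing.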

> 
> Does treesitter expose a (faster) regexp matcher that perhaps can be used by my library?

Not that I know of. 

IIUC, the tree-sitter integration doesn’t have much to do with the proposed emacs-parser-generator: it just exposes tree-sitter to Emacs and adds some integration with font-lock and indentation that leverages tree-sitter features, whereas emacs-parser-generator seems to be about defining grammars and generating elisp parsers.

Yuan



* Re: New package emacs-parser-generator
  2021-12-01  8:39                   ` Yuan Fu
@ 2021-12-01  8:51                     ` Christian Johansson
  2021-12-01 13:45                       ` Stefan Monnier
  2021-12-01 19:25                       ` Yuan Fu
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Johansson @ 2021-12-01  8:51 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Daniel Martín, Stefan Monnier,
	Emacs developers



> On 1 Dec 2021, at 09:39, Yuan Fu <casouri@gmail.com> wrote:
> 
> 
>> On Nov 30, 2021, at 11:52 PM, Christian Johansson <christian@cvj.se> wrote:
>> 
>> Alright, could you give some more details about when a buffer parse is triggered, is it via threads, is it sometimes incremental, does it work on a string copy of the buffer or on the buffer contents directly?
> 
> Tree-sitter parses incrementally, I modified primitive insert/delete functions in insdel.c to incrementally parse changed content. There is no need for threads as incremental parsing is extremely fast. We don’t make copies of buffer string, instead, we pass tree-sitter library a function that reads directly from the buffer.

OK, where can I read about this function? How does tree-sitter handle incremental parses that are state-dependent? For example, PHP's lex-analyzer works differently in different states of the grammar. Do you signal to tree-sitter a point in the buffer so it can backtrack the parser's states in order to correctly perform an incremental parse on the new content?
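
A small sketch of why lexer state matters for incremental parsing: the same text produces different tokens depending on the state in which lexing resumes. The two states and the token names below are simplified assumptions for illustration, not PHP's actual lexer.

```python
import re

WHITESPACE = re.compile(r"\s+")
WORD = re.compile(r"\?|[^\s?]+")


def lex(text, state="HTML"):
    """Tokenize text starting in the given lexer state.

    Returns (tokens, final_state). In state 'HTML' everything up to
    '<?php' is one inline token; in state 'PHP' the text is split
    into words until '?>' switches back.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if state == "HTML":
            end = text.find("<?php", pos)
            if end == -1:
                tokens.append(("INLINE_HTML", text[pos:]))
                break
            if end > pos:
                tokens.append(("INLINE_HTML", text[pos:end]))
            tokens.append(("OPEN_TAG", "<?php"))
            pos = end + len("<?php")
            state = "PHP"
            continue
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        if text.startswith("?>", pos):
            tokens.append(("CLOSE_TAG", "?>"))
            pos += 2
            state = "HTML"
            continue
        word = WORD.match(text, pos)
        tokens.append(("WORD", word.group()))
        pos = word.end()
    return tokens, state
```

Re-lexing the fragment `echo 1;` gives a single inline token when resumed in state `HTML` and two word tokens when resumed in state `PHP`, so an incremental lexer has to know, or recompute, the state at the point where it restarts.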

> 
>> 
>> Does treesitter expose a (faster) regexp matcher that perhaps can be used by my library?
> 
> Not that I know of. 
> 
> IIUC, tree-sitter integration doesn’t have much to do with proposed emacs-parser-generator, it just exposes tree-sitter to Emacs, and adds some integration to font-lock and indentation that leverages tree-sitter features; whereas emacs-parser-generator seems to be about defining grammar and generating elisp parsers.

Yes, I understand. I'm just curious to see if I could perhaps make use of any of the new features tree-sitter introduces.
 
Regards
Christian



* Re: New package emacs-parser-generator
  2021-12-01  8:51                     ` Christian Johansson
@ 2021-12-01 13:45                       ` Stefan Monnier
  2021-12-01 14:10                         ` Christian Johansson
  2021-12-01 19:25                       ` Yuan Fu
  1 sibling, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2021-12-01 13:45 UTC (permalink / raw)
  To: Christian Johansson
  Cc: Yuan Fu, Eli Zaretskii, Emacs developers, Daniel Martín

> Ok where can I read about this function?

You can try and start with
https://tree-sitter.github.io/tree-sitter/#underlying-research


        Stefan





* Re: New package emacs-parser-generator
  2021-12-01 13:45                       ` Stefan Monnier
@ 2021-12-01 14:10                         ` Christian Johansson
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Johansson @ 2021-12-01 14:10 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Yuan Fu, Daniel Martín, Eli Zaretskii, Emacs developers

No, I mean the function in Emacs that is the interface to tree-sitter that Yuan was talking about.

Regards
Christian





* Re: New package emacs-parser-generator
  2021-12-01  8:51                     ` Christian Johansson
  2021-12-01 13:45                       ` Stefan Monnier
@ 2021-12-01 19:25                       ` Yuan Fu
  2021-12-01 19:44                         ` Christian Johansson
  1 sibling, 1 reply; 16+ messages in thread
From: Yuan Fu @ 2021-12-01 19:25 UTC (permalink / raw)
  To: Christian Johansson
  Cc: Eli Zaretskii, Daniel Martín, Stefan Monnier,
	Emacs developers

>> 
>> Tree-sitter parses incrementally, I modified primitive insert/delete functions in insdel.c to incrementally parse changed content. There is no need for threads as incremental parsing is extremely fast. We don’t make copies of buffer string, instead, we pass tree-sitter library a function that reads directly from the buffer.
> 
> Ok where can I read about this function? How does treesitter handle incremental parses which are state-dependent? For example PHPs lex-analyzer works differently in different states of the grammar, like do you signal to treesitter a point in the buffer so it can backtrack the parsers states in order to correctly perform a incremental parse on the new content?

I assume you mean the read function? It can be found here:

https://github.com/casouri/emacs/blob/106d050ad5d02f673f8a089e1f10c1eacfedd124/src/tree-sitter.c#L372

Tree-sitter only needs to be informed of every change to the buffer; it will read the buffer for itself and update the AST. I have no idea whether it backtracks behind the scenes. Presumably tree-sitter can figure out where to backtrack from, using the change information we give it.
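
What "being informed of a change" might carry can be sketched as a small edit record. The byte field names below are modeled loosely on tree-sitter's edit structure; the helper functions and the range-shifting logic are hypothetical illustrations in Python, not tree-sitter's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    start_byte: int    # where the change begins
    old_end_byte: int  # end of the replaced text, before the change
    new_end_byte: int  # end of the replacement text, after the change


def edit_for_insert(pos: int, text: bytes) -> Edit:
    return Edit(pos, pos, pos + len(text))


def edit_for_delete(start: int, end: int) -> Edit:
    return Edit(start, end, start)


def shift_range(start: int, end: int, edit: Edit):
    """Adjust a node's byte range for an edit.

    Returns (start, end, dirty). Nodes wholly before the edit are
    untouched, nodes wholly after it are shifted, and nodes that
    overlap it are marked dirty, meaning they need to be re-parsed.
    """
    delta = edit.new_end_byte - edit.old_end_byte
    if end <= edit.start_byte:
        return start, end, False
    if start >= edit.old_end_byte:
        return start + delta, end + delta, False
    new_start = min(start, edit.start_byte)
    new_end = max(end, edit.old_end_byte) + delta
    return new_start, new_end, True
```

Under this model only the nodes flagged dirty have to be revisited, which is one plausible reading of how a parser can locate the region to re-parse from the change information alone.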

Yuan



* Re: New package emacs-parser-generator
  2021-12-01 19:25                       ` Yuan Fu
@ 2021-12-01 19:44                         ` Christian Johansson
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Johansson @ 2021-12-01 19:44 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Daniel Martín, Stefan Monnier,
	Emacs developers

OK, thanks for the link. I will investigate and see if I can figure it out.

I read a bit of the research, and it seems to use a state-independent lex-analyzer and perform multiple alternative / nondeterministic error-tolerant parses to make a probabilistic best parse, which is then merged into the previous AST, or something like that.

Regards
Christian





Thread overview: 16+ messages
2021-11-27 21:40 New package emacs-parser-generator Christian Johansson
2021-11-28  7:01 ` Eli Zaretskii
2021-11-28  7:22   ` Christian Johansson
2021-11-28 13:24     ` Stefan Monnier
2021-11-28 13:45       ` Christian Johansson
2021-11-28 23:46         ` Daniel Martín
2021-11-29 12:30           ` Eli Zaretskii
2021-11-29 13:09             ` Christian Johansson
2021-11-29 19:22               ` Yuan Fu
2021-12-01  7:52                 ` Christian Johansson
2021-12-01  8:39                   ` Yuan Fu
2021-12-01  8:51                     ` Christian Johansson
2021-12-01 13:45                       ` Stefan Monnier
2021-12-01 14:10                         ` Christian Johansson
2021-12-01 19:25                       ` Yuan Fu
2021-12-01 19:44                         ` Christian Johansson
