unofficial mirror of emacs-devel@gnu.org 
* Maybe we're taking a wrong approach towards tree-sitter
@ 2021-07-28  1:57 Andrei Kuznetsov
  2021-07-28  3:53 ` [SPAM UNSURE] " Stephen Leake
  2021-07-28 15:09 ` Perry E. Metzger
  0 siblings, 2 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28  1:57 UTC (permalink / raw)
  To: emacs-devel


I could not follow the conversation <<cc-mode fontification feels
random>> particularly well, as it seemed somehow disjoint in a manner I
cannot explain, but it seemed as if consensus has been reached that
Emacs will provide optional functionality integrating yet another
external package, this time tree-sitter.

Unlike features like native JSON, however, I believe tree-sitter is the
first optional package providing notable functionality that would
require a toolchain that depends on LLVM (that of Rust, which
tree-sitter is implemented in), and is therefore inaccessible to people
not running popular systems; e.g., how would one make tree-sitter work
in MS-DOS (Emacs on FreeDOS is a must-have for me, and it would be a
great annoyance if cc-mode, or similar external packages depend on
tree-sitter in the future), or on an Itanium system running GNU/Linux?

I think we should focus on portably reimplementing the relevant
functionality within Emacs, preferably in Lisp.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28  1:57 Maybe we're taking a wrong approach towards tree-sitter Andrei Kuznetsov
@ 2021-07-28  3:53 ` Stephen Leake
  2021-07-28  8:23   ` Manuel Giraud
  2021-07-28 11:43   ` Andrei Kuznetsov
  2021-07-28 15:09 ` Perry E. Metzger
  1 sibling, 2 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-28  3:53 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> it seemed as if consensus has been reached that
> Emacs will provide optional functionality integrating yet another
> external package, this time tree-sitter.
>
> Unlike features like native JSON, however, I believe tree-sitter is the
> first optional package providing notable functionality that would
> require a toolchain that depends on LLVM (that of Rust, which
> tree-sitter is implemented in), and is therefore inaccessible to people
> not running popular systems; 

The tree-sitter runtime, which Emacs would link with, is implemented in
C, partly for this reason. It would be compiled with whatever Emacs is
compiled with, or the system compiler.

Some of the tree-sitter development tools are implemented in Rust; you
only need Rust if you are developing/fixing a grammar for a language.

-- 
-- Stephe




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28  3:53 ` [SPAM UNSURE] " Stephen Leake
@ 2021-07-28  8:23   ` Manuel Giraud
  2021-07-28 11:48     ` Andrei Kuznetsov
  2021-07-28 11:43   ` Andrei Kuznetsov
  1 sibling, 1 reply; 59+ messages in thread
From: Manuel Giraud @ 2021-07-28  8:23 UTC (permalink / raw)
  To: Stephen Leake; +Cc: Andrei Kuznetsov, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

[...]

> The tree-sitter runtime, which Emacs would link with, is implemented in
> C, partly for this reason. It would be compiled with whatever Emacs is
> compiled with, or the system compiler.
>
> Some of the tree-sitter development tools are implemented in Rust; you
> only need Rust if you are developing/fixing a grammar for a language.

Hi,

I too did not follow the tree-sitter discussion closely. But AFAIU,
tree-sitter provides tools to generate a parser (in C) from a grammar.

So, is it the generated parsers (for any language Emacs supports) that
will be versioned in the Emacs tree?
-- 
Manuel Giraud




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28  3:53 ` [SPAM UNSURE] " Stephen Leake
  2021-07-28  8:23   ` Manuel Giraud
@ 2021-07-28 11:43   ` Andrei Kuznetsov
  2021-07-28 11:50     ` Eli Zaretskii
                       ` (3 more replies)
  1 sibling, 4 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 11:43 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

> The tree-sitter runtime, which Emacs would link with, is implemented in
> C, partly for this reason. It would be compiled with whatever Emacs is
> compiled with, or the system compiler.

Interesting.  I was not aware of that.

> Some of the tree-sitter development tools are implemented in Rust; you
> only need Rust if you are developing/fixing a grammar for a language.

If I understand this correctly, it means one would require the Rust
toolchain to support new languages in tree-sitter, or to improve
existing support.  Would that really fit Emacs?  I think many people
might not be comfortable learning such a large language and toolchain to
develop editing tools for Emacs.

Furthermore, is there any concrete reason this could not be done in
Lisp?

Note: Somehow I sent a reply earlier, and not a follow-up.  I apologize
for the duplicate.





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28  8:23   ` Manuel Giraud
@ 2021-07-28 11:48     ` Andrei Kuznetsov
  2021-07-28 13:04       ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 11:48 UTC (permalink / raw)
  To: Manuel Giraud; +Cc: Stephen Leake, emacs-devel

Manuel Giraud <manuel@ledu-giraud.fr> writes:

> I too did not follow the tree-sitter discussion closely. But AFAIU,
> tree-sitter provides tools to generate a parser (in C) from a grammar.

If that is the case, it certainly seems grave! I don't think an Emacs
that requires source modifications for extending vital editing
functionality is a good idea.





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:43   ` Andrei Kuznetsov
@ 2021-07-28 11:50     ` Eli Zaretskii
  2021-07-28 12:06       ` Andrei Kuznetsov
  2021-07-28 12:36     ` Ergus
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 11:50 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Date: Wed, 28 Jul 2021 19:43:03 +0800
> Cc: emacs-devel@gnu.org
> 
> > Some of the tree-sitter development tools are implemented in Rust; you
> > only need Rust if you are developing/fixing a grammar for a language.
> 
> If I understand this correctly, it means one would require the Rust
> toolchain to support new languages in tree-sitter, or to improve
> existing support.  Would that really fit Emacs?  I think many people
> might not be comfortable learning such a large language and toolchain to
> develop editing tools for Emacs.
> 
> Furthermore, is there any concrete reason this could not be done in
> Lisp?

This has been discussed.  Patches to convert the TS grammar files to
Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be
most welcome.

As usual with Free Software, it isn't an issue of what's desirable,
it's an issue with someone stepping forward to do the job of
developing this stuff.

TIA




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:50     ` Eli Zaretskii
@ 2021-07-28 12:06       ` Andrei Kuznetsov
  2021-07-28 13:05         ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 12:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> This has been discussed.  Patches to convert the TS grammar files to
> Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be
> most welcome.

Does "to maintain and develop them in Emacs Lisp" include facilities
providing functionality similar to TS but not compatible with TS grammar
files? If so I think I may have something up my sleeve, though in an
early state.





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:43   ` Andrei Kuznetsov
  2021-07-28 11:50     ` Eli Zaretskii
@ 2021-07-28 12:36     ` Ergus
  2021-07-28 13:07       ` Andrei Kuznetsov
  2021-07-28 15:12     ` Perry E. Metzger
  2021-07-29  4:35     ` Richard Stallman
  3 siblings, 1 reply; 59+ messages in thread
From: Ergus @ 2021-07-28 12:36 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Stephen Leake, emacs-devel

On Wed, Jul 28, 2021 at 07:43:03PM +0800, Andrei Kuznetsov wrote:
>Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> The tree-sitter runtime, which Emacs would link with, is implemented in
>> C, partly for this reason. It would be compiled with whatever Emacs is
>> compiled with, or the system compiler.
>
>Interesting.  I was not aware of that.
>
>> Some of the tree-sitter development tools are implemented in Rust; you
>> only need Rust if you are developing/fixing a grammar for a language.
>
>If I understand this correctly, it means one would require the Rust
>toolchain to support new languages in tree-sitter, or to improve
>existing support.  Would that really fit Emacs?  I think many people
>might not be comfortable learning such a large language and toolchain to
>develop editing tools for Emacs.
>
>Furthermore, is there any concrete reason this could not be done in
>Lisp?
>
I will say:
1) Performance (discussed in the previous thread).
2) Not reinventing the wheel.

Tree-sitter is very well maintained and optimized, with very specialized
algorithms; we lack the manpower to duplicate all that effort, and
implementing it in Lisp won't really be worth the effort and may be
unmaintainable and slow.

Tree-sitter hopefully won't get abandoned in the future because many
editors use it right now (including neovim) and the community is very
dynamic.

Another advantage is that with tree-sitter as a back-end we could
officially (almost for free) support many languages that are currently
unsupported officially and may require a lot of effort to support
in a minimal way (or are currently supported in some inconsistent way, with
incoherent bindings/colors/indentation, e.g. TypeScript, Rust, Julia).

>Note: Somehow I sent a reply earlier, and not a follow-up.  I apologize
>for the duplicate.
>
>




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:48     ` Andrei Kuznetsov
@ 2021-07-28 13:04       ` Eli Zaretskii
  2021-07-28 13:14         ` Andrei Kuznetsov
  2021-07-29 23:12         ` Stephen Leake
  0 siblings, 2 replies; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 13:04 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Date: Wed, 28 Jul 2021 19:48:18 +0800
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
> 
> Manuel Giraud <manuel@ledu-giraud.fr> writes:
> 
> > I too did not follow the tree-sitter discussion closely. But AFAIU,
> > tree-sitter provides tools to generate a parser (in C) from a grammar.
> 
> If that is the case, it certainly seems grave! I don't think an Emacs
> that requires source modifications for extending vital editing
> functionality is a good idea.

TS's code is written in plain C, and doesn't require any regeneration
or source modifications.  Anything else is misunderstanding.




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 12:06       ` Andrei Kuznetsov
@ 2021-07-28 13:05         ` Eli Zaretskii
  2021-07-28 13:16           ` Andrei Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 13:05 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Cc: stephen_leake@stephe-leake.org,  emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 20:06:23 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This has been discussed.  Patches to convert the TS grammar files to
> > Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be
> > most welcome.
> 
> Does "to maintain and develop them in Emacs Lisp" include facilities
> providing functionality similar to TS but not compatible with TS grammar
> files?

We are talking about the grammar files to be used by TS, so they
should be compatible, of course.




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 12:36     ` Ergus
@ 2021-07-28 13:07       ` Andrei Kuznetsov
  2021-07-28 13:16         ` Eli Zaretskii
  2021-07-29 23:25         ` [SPAM UNSURE] " Stephen Leake
  0 siblings, 2 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:07 UTC (permalink / raw)
  To: Ergus; +Cc: Stephen Leake, emacs-devel

Ergus <spacibba@aol.com> writes:

> 1) Performance (discussed in the previous thread):

FWIW I have been experimenting with an incremental GLR parser
generator in Emacs Lisp.  While I have not put in the effort to couple
it with font-lock and such, from anecdotal examination it does not
perform badly with a naive C grammar.  The initial parse does take
several seconds on large files, but afterwards I did not notice a
significant drop in editor responsiveness.
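
The pattern described above (a costly initial pass, then cheap updates)
comes from reusing earlier results after an edit.  The sketch below only
illustrates that reuse idea with a per-line token cache; it is not a GLR
parser and not the implementation mentioned in this message.  The names
`TOKEN`, `LineCache`, and `replace_line` are hypothetical, and a real
incremental parser would reuse subtrees and also handle edits whose
effect reaches beyond the edited line (e.g. an unterminated comment).

```python
import re

# Hypothetical token pattern: identifiers, integers, or any single
# non-space character.  Chosen only to keep the sketch small.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


class LineCache:
    """Tokenize a buffer line by line and re-scan only lines that change."""

    def __init__(self, text):
        self.lines = text.split("\n")
        self.scans = 0  # how many lines have been tokenized so far
        # Initial pass: every line is scanned once (the expensive part).
        self.tokens = [self._scan(line) for line in self.lines]

    def _scan(self, line):
        self.scans += 1
        return TOKEN.findall(line)

    def replace_line(self, index, new_line):
        """Apply a single-line edit; other lines keep their cached tokens."""
        self.lines[index] = new_line
        self.tokens[index] = self._scan(new_line)
```

With a three-line buffer, construction scans three lines, and a later
`replace_line` call scans exactly one more, whatever the buffer size.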

> 2) Not reinvent the wheel.

While tree-sitter may be nice and all, it doesn't seem to offer the
usual extensibility expected from Emacs.





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:04       ` Eli Zaretskii
@ 2021-07-28 13:14         ` Andrei Kuznetsov
  2021-07-28 13:27           ` Eli Zaretskii
  2021-07-29 23:12         ` Stephen Leake
  1 sibling, 1 reply; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen_leake, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> TS's code is written in plain C, and doesn't require any regeneration
> or source modifications.  Anything else is misunderstanding.

I am confused by TS's documentation, but if my understanding is correct,
shouldn't it be a parser generator that generates C code?  In that case,
how would users load new parsers or modify existing ones?  Perhaps
through something similar to the existing native module support?

I might be gravely misunderstanding something (or several things).





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:07       ` Andrei Kuznetsov
@ 2021-07-28 13:16         ` Eli Zaretskii
  2021-07-28 13:27           ` Andrei Kuznetsov
  2021-07-29 23:25         ` [SPAM UNSURE] " Stephen Leake
  1 sibling, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 13:16 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: spacibba, stephen_leake, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Date: Wed, 28 Jul 2021 21:07:40 +0800
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
> 
> While tree-sitter may be nice and all, it doesn't seem to offer the
> usual extensibility expected from Emacs.

Which extensibility did you have in mind that TS doesn't support?




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:05         ` Eli Zaretskii
@ 2021-07-28 13:16           ` Andrei Kuznetsov
  0 siblings, 0 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> We are talking about the grammar files to be used by TS, so they
> should be compatible, of course.

I see.  Thanks for the clarification.





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:14         ` Andrei Kuznetsov
@ 2021-07-28 13:27           ` Eli Zaretskii
  2021-07-28 13:31             ` Andrei Kuznetsov
  2021-07-28 14:24             ` Dmitry Gutov
  0 siblings, 2 replies; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 13:27 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Cc: manuel@ledu-giraud.fr,  stephen_leake@stephe-leake.org,
>   emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 21:14:48 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > TS's code is written in plain C, and doesn't require any regeneration
> > or source modifications.  Anything else is misunderstanding.
> 
> I am confused by TS's documentation, but if my understanding is correct,
> shouldn't it be a parser generator that generates C code?

TS is not a parser generator, it's a parser that accepts the language
grammar from external files.

> In that case, how would users load new parsers or modify existing
> ones?

If you want to modify a TS grammar file, you can (not in C).  But why
would you want to?  The whole point of using TS is NOT to require that
the Emacs development team or Emacs users should know enough about
parsing of the many languages Emacs supports to modify the grammar.
We want another, independent development team to take care of that,
and we want to use the results of their development with minimum
fuss.  Exactly like we do with other libraries developed by other
projects: the image libraries, GnuTLS, HarfBuzz, etc.
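
The division of labour described above (a fixed engine, with the
per-language grammar supplied separately as data) can be pictured with
a toy table-driven recognizer.  This is only a sketch of the idea: it
is not tree-sitter's API or file format, and `run`, `INTEGER`, and
`IDENTIFIER` are hypothetical names invented for the illustration.

```python
def run(table, text):
    """Return True if `text` is accepted by the state machine in `table`."""
    state = table["start"]
    for ch in text:
        # Map the character to a symbol class, then look up the transition.
        symbol = table["classify"](ch)
        state = table["transitions"].get((state, symbol))
        if state is None:
            return False  # no transition defined: input rejected
    return state in table["accepting"]


# "Grammar" data for unsigned integers: one or more digits.
INTEGER = {
    "start": "s0",
    "accepting": {"s1"},
    "classify": lambda ch: "digit" if ch.isdigit() else "other",
    "transitions": {("s0", "digit"): "s1", ("s1", "digit"): "s1"},
}


def _ident_class(ch):
    if ch.isalpha() or ch == "_":
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


# "Grammar" data for identifiers: a letter or underscore, followed by
# letters, digits, or underscores.
IDENTIFIER = {
    "start": "s0",
    "accepting": {"s1"},
    "classify": _ident_class,
    "transitions": {
        ("s0", "alpha"): "s1",
        ("s1", "alpha"): "s1",
        ("s1", "digit"): "s1",
    },
}
```

Supporting another "language" here means writing another table; `run`
itself is never touched, which is the property being argued for.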




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:16         ` Eli Zaretskii
@ 2021-07-28 13:27           ` Andrei Kuznetsov
  2021-07-28 13:32             ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: spacibba, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> While tree-sitter may be nice and all, it doesn't seem to offer the
>> usual extensibility expected from Emacs.

> Which extensibility did you have in mind that TS doesn't support?

Let us assume that a generated TS grammar contains a (C) function akin
to `semantic-lex-unterminated-syntax-detected', and I wish to achieve
similar results to binding
`semantic-lex-unterminated-syntax-end-function' to a function of my
choice.  Would that be possible?





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:27           ` Eli Zaretskii
@ 2021-07-28 13:31             ` Andrei Kuznetsov
  2021-07-28 14:24             ` Dmitry Gutov
  1 sibling, 0 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen_leake, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you want to modify a TS grammar file, you can (not in C).  But why
> would you want to?  The whole point of using TS is NOT to require that
> the Emacs development team or Emacs users should know enough about
> parsing of the many languages Emacs supports to modify the grammar.
> We want another, independent development team to take care of that,
> and we want to use the results of their development with minimum
> fuss.  Exactly like we do with other libraries developed by other
> projects: the image libraries, GnuTLS, HarfBuzz, etc.

Interesting perspective.  Thanks for the clarification





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:27           ` Andrei Kuznetsov
@ 2021-07-28 13:32             ` Eli Zaretskii
  2021-07-28 13:38               ` Andrei Kuznetsov
  0 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 13:32 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: spacibba, stephen_leake, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Cc: spacibba@aol.com,  stephen_leake@stephe-leake.org,  emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 21:27:43 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> While tree-sitter may be nice and all, it doesn't seem to offer the
> >> usual extensibility expected from Emacs.
> 
> > Which extensibility did you have in mind that TS doesn't support?
> 
> Let us assume that a generated TS grammar contains a (C) function akin
> to `semantic-lex-unterminated-syntax-detected', and I wish to achieve
> similar results to binding
> `semantic-lex-unterminated-syntax-end-function' to a function of my
> choice.  Would that be possible?

(TS doesn't generate a grammar, it comes with grammar files prepared
externally.)

If you are talking about affecting how TS does lexical analysis for
some language, then I see no reason why we in the Emacs project would
want to do that.  We don't _want_ to develop parsers if we can use
parsers available out there.  Lexical analysis of a parser is
determined by the language it parses, so you need only to change the
parser when the language changes, or to fix a bug.  Both are part of
the job of the TS developers, so there should be no need for us to get
busy with that.  Exactly like we do with other libraries we use that
aren't developed as part of the Emacs project.




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:32             ` Eli Zaretskii
@ 2021-07-28 13:38               ` Andrei Kuznetsov
  2021-07-28 14:41                 ` Manuel Giraud
  0 siblings, 1 reply; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-28 13:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: spacibba, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you are talking about affecting how TS does lexical analysis for
> some language, then I see no reason why we in the Emacs project would
> want to do that.  We don't _want_ to develop parsers if we can use
> parsers available out there.  Lexical analysis of a parser is
> determined by the language it parses, so you need only to change the
> parser when the language changes, or to fix a bug.  Both are part of
> the job of the TS developers, so there should be no need for us to get
> busy with that.  Exactly like we do with other libraries we use that
> aren't developed as part of the Emacs project.

Okay, thanks for the clarification





* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:27           ` Eli Zaretskii
  2021-07-28 13:31             ` Andrei Kuznetsov
@ 2021-07-28 14:24             ` Dmitry Gutov
  2021-07-28 14:36               ` Dmitry Gutov
                                 ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Dmitry Gutov @ 2021-07-28 14:24 UTC (permalink / raw)
  To: Eli Zaretskii, Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

On 28.07.2021 16:27, Eli Zaretskii wrote:
> The whole point of using TS is NOT to require that
> the Emacs development team or Emacs users should know enough about
> parsing of the many languages Emacs supports to modify the grammar.
> We want another, independent development team to take care of that,

I think we know both, though? There are a number of niche languages that 
only Emacs supports.

Or at least that aren't likely to get good support in TreeSitter.




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 14:24             ` Dmitry Gutov
@ 2021-07-28 14:36               ` Dmitry Gutov
  2021-07-28 14:51               ` Daniele Nicolodi
  2021-07-28 16:10               ` Eli Zaretskii
  2 siblings, 0 replies; 59+ messages in thread
From: Dmitry Gutov @ 2021-07-28 14:36 UTC (permalink / raw)
  To: Eli Zaretskii, Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

Sorry,

On 28.07.2021 17:24, Dmitry Gutov wrote:
> I think we know both, though? There are a number of niche languages that 
              ^ want






* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:38               ` Andrei Kuznetsov
@ 2021-07-28 14:41                 ` Manuel Giraud
  2021-07-28 15:15                   ` Perry E. Metzger
  2021-07-28 16:10                   ` Eli Zaretskii
  0 siblings, 2 replies; 59+ messages in thread
From: Manuel Giraud @ 2021-07-28 14:41 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Eli Zaretskii, stephen_leake, spacibba, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> If you are talking about affecting how TS does lexical analysis for
>> some language, then I see no reason why we in the Emacs project would
>> want to do that.  We don't _want_ to develop parsers if we can use
>> parsers available out there.  Lexical analysis of a parser is
>> determined by the language it parses, so you need only to change the
>> parser when the language changes, or to fix a bug.  Both are part of
>> the job of the TS developers, so there should be no need for us to get
>> busy with that.  Exactly like we do with other libraries we use that
>> aren't developed as part of the Emacs project.
>
> Okay, thanks for the clarification

Yes, thanks for these clarifications and sorry for my
misunderstanding. I still have one question left though: will the
parsers' C code (and the TS C code) land in the Emacs repo, or will TS be
accessible as an external library?
-- 
Manuel Giraud




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 14:24             ` Dmitry Gutov
  2021-07-28 14:36               ` Dmitry Gutov
@ 2021-07-28 14:51               ` Daniele Nicolodi
  2021-07-28 16:10               ` Eli Zaretskii
  2 siblings, 0 replies; 59+ messages in thread
From: Daniele Nicolodi @ 2021-07-28 14:51 UTC (permalink / raw)
  To: emacs-devel

On 28/07/2021 16:24, Dmitry Gutov wrote:
> On 28.07.2021 16:27, Eli Zaretskii wrote:
>> The whole point of using TS is NOT to require that
>> the Emacs development team or Emacs users should know enough about
>> parsing of the many languages Emacs supports to modify the grammar.
>> We want another, independent development team to take care of that,
> 
> I think we know both, though? There are a number of niche languages that 
> only Emacs supports.
> 
> Or at least that aren't likely to get good support in TreeSitter.

I don't see how adding support for TreeSitter can cause any problem for
those. Would you like to elaborate? No one is proposing to disable the other
mechanisms for fontification and syntax analysis in Emacs.

Cheers,
Dan




* Re: Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28  1:57 Maybe we're taking a wrong approach towards tree-sitter Andrei Kuznetsov
  2021-07-28  3:53 ` [SPAM UNSURE] " Stephen Leake
@ 2021-07-28 15:09 ` Perry E. Metzger
  2021-07-29 23:35   ` Stephen Leake
  1 sibling, 1 reply; 59+ messages in thread
From: Perry E. Metzger @ 2021-07-28 15:09 UTC (permalink / raw)
  To: emacs-devel

On 7/27/21 21:57, Andrei Kuznetsov wrote:
> Unlike features like native JSON, however, I believe tree-sitter is the
> first optional package providing notable functionality that would
> require a toolchain that depends on LLVM (that of Rust, which
> tree-sitter is implemented in)

Tree sitter is written in C. It has an available set of Rust bindings. 
It compiles perfectly well with any C compiler.

Perry







* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:43   ` Andrei Kuznetsov
  2021-07-28 11:50     ` Eli Zaretskii
  2021-07-28 12:36     ` Ergus
@ 2021-07-28 15:12     ` Perry E. Metzger
  2021-07-29 23:28       ` Stephen Leake
  2021-07-29  4:35     ` Richard Stallman
  3 siblings, 1 reply; 59+ messages in thread
From: Perry E. Metzger @ 2021-07-28 15:12 UTC (permalink / raw)
  To: emacs-devel

On 7/28/21 07:43, Andrei Kuznetsov wrote:
> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> Some of the tree-sitter development tools are implemented in Rust; you
>> only need Rust if you are developing/fixing a grammar for a language.
> If I understand this correctly, it means one would require the Rust
> toolchain to support new languages in tree-sitter, or to improve
> existing support.

That's not true. Tree Sitter is not written even partially in Rust. It 
does have Rust bindings for people who use Rust.

Perry







* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 14:41                 ` Manuel Giraud
@ 2021-07-28 15:15                   ` Perry E. Metzger
  2021-07-28 16:10                   ` Eli Zaretskii
  1 sibling, 0 replies; 59+ messages in thread
From: Perry E. Metzger @ 2021-07-28 15:15 UTC (permalink / raw)
  To: emacs-devel

On 7/28/21 10:41, Manuel Giraud wrote:
> Yes, thanks for these clarifications and sorry for my
> misunderstanding. I still have one question left though: will the
> parsers C code (and TS C code) land into the emacs repo or will TS be
> accessible as an external library?

I don't think that has been fully decided. It may be necessary to have a 
patched version of Tree Sitter available to integrate properly with the 
rest of the Emacs runtime.

Perry






* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 14:41                 ` Manuel Giraud
  2021-07-28 15:15                   ` Perry E. Metzger
@ 2021-07-28 16:10                   ` Eli Zaretskii
  1 sibling, 0 replies; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 16:10 UTC (permalink / raw)
  To: Manuel Giraud; +Cc: r12451428287, spacibba, stephen_leake, emacs-devel

> From: Manuel Giraud <manuel@ledu-giraud.fr>
> Cc: Eli Zaretskii <eliz@gnu.org>,  spacibba@aol.com,
>   stephen_leake@stephe-leake.org,  emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 16:41:01 +0200
> 
> I still have one question left though: will the parsers C code (and
> TS C code) land into the emacs repo or will TS be accessible as an
> external library?

The latter, unless something very unexpected is discovered about
TS that cannot be fixed by the TS developers.




* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 14:24             ` Dmitry Gutov
  2021-07-28 14:36               ` Dmitry Gutov
  2021-07-28 14:51               ` Daniele Nicolodi
@ 2021-07-28 16:10               ` Eli Zaretskii
  2021-07-28 16:24                 ` Perry E. Metzger
  2 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 16:10 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: r12451428287, stephen_leake, manuel, emacs-devel

> Cc: stephen_leake@stephe-leake.org, manuel@ledu-giraud.fr, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 28 Jul 2021 17:24:35 +0300
> 
> On 28.07.2021 16:27, Eli Zaretskii wrote:
> > The whole point of using TS is NOT to require that
> > the Emacs development team or Emacs users should know enough about
> > parsing of the many languages Emacs supports to modify the grammar.
> > We want another, independent development team to take care of that,
> 
> I think we know both, though? There are a number of niche languages that 
> only Emacs supports.
> 
> Or at least that aren't likely to get good support in TreeSitter.

Either someone motivated will write a TS grammar for them, or they
will continue to be supported by "other means".



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 16:10               ` Eli Zaretskii
@ 2021-07-28 16:24                 ` Perry E. Metzger
  2021-07-28 16:29                   ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Perry E. Metzger @ 2021-07-28 16:24 UTC (permalink / raw)
  To: Eli Zaretskii, emacs-devel

On 7/28/21 12:10, Eli Zaretskii wrote:
>> There are a number of niche languages that only Emacs supports.
>> Or at least that aren't likely to get good support in TreeSitter.
> Either someone motivated will write a TS grammar for them, or they
> will continue to be supported by "other means".
>
It would be nice, of course, if people would contribute new grammars to 
Tree Sitter, as that will benefit everyone using a tool that works with 
Tree Sitter.

Perry





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 16:24                 ` Perry E. Metzger
@ 2021-07-28 16:29                   ` Eli Zaretskii
  0 siblings, 0 replies; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-28 16:29 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: emacs-devel

> Date: Wed, 28 Jul 2021 12:24:54 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
> 
> On 7/28/21 12:10, Eli Zaretskii wrote:
> >> There are a number of niche languages that only Emacs supports.
> >> Or at least that aren't likely to get good support in TreeSitter.
> > Either someone motivated will write a TS grammar for them, or they
> > will continue to be supported by "other means".
> >
> It would be nice, of course, if people would contribute new grammars to 
> Tree Sitter, as that will benefit everyone using a tool that works with 
> Tree Sitter.

Yes, it would be nice.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 11:43   ` Andrei Kuznetsov
                       ` (2 preceding siblings ...)
  2021-07-28 15:12     ` Perry E. Metzger
@ 2021-07-29  4:35     ` Richard Stallman
  3 siblings, 0 replies; 59+ messages in thread
From: Richard Stallman @ 2021-07-29  4:35 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I don't think we should reject Rust code for the GNU system.  There is
no need for such a drastic step.  We already use software that is
built in Rust.

Tree sitter is not going to be a part of Emacs; its use is not limited
to Emacs.  Other programs will work with it too.  So I don't see any
special reason to replace parts of it with Emacs Lisp code.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:04       ` Eli Zaretskii
  2021-07-28 13:14         ` Andrei Kuznetsov
@ 2021-07-29 23:12         ` Stephen Leake
  2021-07-29 23:21           ` Yuan Fu
                             ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-29 23:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Andrei Kuznetsov, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrei Kuznetsov <r12451428287@163.com>
>> Date: Wed, 28 Jul 2021 19:48:18 +0800
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
>> 
>> Manuel Giraud <manuel@ledu-giraud.fr> writes:
>> 
>> > I too did not follow the tree-sitter discussion closely. But AFAIU,
>> > tree-sitter provides tools to generate a parser (in C) from a grammar.
>> 
>> If that is the case, it certainly seems grave! I don't think an Emacs
>> that requires source modifications for extending vital editing
>> functionality is a good idea.
>
> TS's code is written in plain C, and doesn't require any regeneration
> or source modifications.  Anything else is misunderstanding.

That's true for the common TS runtime, which implements the parser and
error recovery, but the code for each language, that builds the LR parse
table and some other data structures, is generated in C from a grammar
file written in javascript, and must be linked into Emacs somehow. In
addition, some languages require an "external scanner", which is more
code in C that is specific to the language.

Ideally, there would be some sort of plugin, so new languages could be
added at run-time; maybe we could add a protocol on top of emacs
modules. I don't know how Yuan is handling this now.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:12         ` Stephen Leake
@ 2021-07-29 23:21           ` Yuan Fu
  2021-07-30 18:38             ` Stephen Leake
  2021-07-30  0:41           ` Andrei Kuznetsov
  2021-07-30  6:05           ` Eli Zaretskii
  2 siblings, 1 reply; 59+ messages in thread
From: Yuan Fu @ 2021-07-29 23:21 UTC (permalink / raw)
  To: Stephen Leake; +Cc: Andrei Kuznetsov, Eli Zaretskii, manuel, emacs-devel



> On Jul 29, 2021, at 7:12 PM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Andrei Kuznetsov <r12451428287@163.com>
>>> Date: Wed, 28 Jul 2021 19:48:18 +0800
>>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
>>> 
>>> Manuel Giraud <manuel@ledu-giraud.fr> writes:
>>> 
>>>> I too did not follow the tree-sitter discussion closely. But AFAIU,
>>>> tree-sitter provides tools to generate a parser (in C) from a grammar.
>>> 
>>> If that is the case, it certainly seems grave! I don't think an Emacs
>>> that requires source modifications for extending vital editing
>>> functionality is a good idea.
>> 
>> TS's code is written in plain C, and doesn't require any regeneration
>> or source modifications.  Anything else is misunderstanding.
> 
> That's true for the common TS runtime, which implements the parser and
> error recovery, but the code for each language, that builds the LR parse
> table and some other data structures, is generated in C from a grammar
> file written in javascript, and must be linked into Emacs somehow.

Languages don’t need to be linked into Emacs. They can be in dynamic modules.
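
As a rough illustration of that idea (in Python rather than C or
elisp, purely for brevity): each generated grammar exports a C function
named tree_sitter_<language> that returns a pointer to its TSLanguage
data, so a host program can open the grammar's shared library at run
time and look that symbol up. The helper names and the library path
argument below are assumptions made for this sketch, not how the Emacs
integration actually does it.

```python
import ctypes
from typing import Optional


def symbol_name(language: str) -> str:
    """Name of the entry point a generated grammar exports."""
    # Hyphens are not valid in C identifiers, so "c-sharp" -> "c_sharp".
    return "tree_sitter_" + language.replace("-", "_")


def load_language(library_path: str, language: str) -> Optional[int]:
    """Open a grammar's shared library and return its TSLanguage pointer.

    library_path is hypothetical; where grammars get installed is a
    packaging decision that this sketch does not try to answer.
    """
    lib = ctypes.CDLL(library_path)        # dlopen() at run time
    entry = getattr(lib, symbol_name(language))
    entry.restype = ctypes.c_void_p        # const TSLanguage *
    entry.argtypes = []
    return entry()
```

Nothing here needs the host to be relinked when a language is added;
only the shared library for that language has to be built and found.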

Yuan




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 13:07       ` Andrei Kuznetsov
  2021-07-28 13:16         ` Eli Zaretskii
@ 2021-07-29 23:25         ` Stephen Leake
  2021-07-30  0:54           ` Andrei Kuznetsov
  1 sibling, 1 reply; 59+ messages in thread
From: Stephen Leake @ 2021-07-29 23:25 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Ergus <spacibba@aol.com> writes:
>
>> 1) Performance (discussed in the previous thread):
>
> FWIW I have been experimenting with an incremental GLR parser
> generator in Emacs Lisp.  

The "generator" and the "runtime" are two separate programs, with
separate functions, used at different times.

The generator takes the javascript language grammar file and translates
it (thru lots of hairy computations) into code that builds a parse table
and other data structures. The tree-sitter generator outputs that code
in C; it might be possible to adapt it to output in elisp (the wisitoken
generator used to output elisp, but I gave that up when I implemented
error recovery in Ada; elisp is way too slow for that).

The "runtime" uses the parse table to parse text at runtime, in response
to user actions on the buffer. To be useful in an interactive editing
context, it must have robust error recovery. What is your error recovery
algorithm?
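
To make the generator/runtime split concrete, here is a toy sketch in
Python. It is not how tree-sitter works internally: tree-sitter builds
LR/GLR tables, while this sketch uses a much simpler predictive (LL(1)
style) table, and it assumes every non-empty alternative starts with a
terminal. The names generate and parse and the grammar itself are made
up for the example.

```python
def generate(grammar, start):
    """'Generator': run once per language, ahead of time."""
    actions = {}
    for nonterminal, alternatives in grammar.items():
        for rhs in alternatives:
            if rhs and rhs[0] in grammar:
                raise ValueError("alternative must start with a terminal")
            # Predict an alternative by its first terminal; the empty
            # alternative becomes the default, stored under key None.
            key = rhs[0] if rhs else None
            if (nonterminal, key) in actions:
                raise ValueError("grammar is ambiguous for this sketch")
            actions[(nonterminal, key)] = rhs
    return {"start": start, "nonterminals": set(grammar), "actions": actions}


def parse(table, tokens):
    """'Runtime': knows nothing about the language, only the table."""
    tokens = list(tokens) + ["$"]          # "$" marks end of input
    stack, pos = [table["start"]], 0
    while stack:
        top = stack.pop()
        if top in table["nonterminals"]:
            rhs = table["actions"].get((top, tokens[pos]))
            if rhs is None:
                rhs = table["actions"].get((top, None))
            if rhs is None:
                return False               # no prediction: syntax error
            stack.extend(reversed(rhs))    # expand the nonterminal
        elif top == tokens[pos]:
            pos += 1                       # terminal matched, consume it
        else:
            return False
    return tokens[pos] == "$"


# Nested lists of atoms, e.g. "( atom ( atom ) )".
GRAMMAR = {"items": [["atom", "items"], ["(", "items", ")", "items"], []]}
TABLE = generate(GRAMMAR, "items")
```

With this split, supporting a new language means running generate on a
new grammar; parse itself stays untouched.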

> While I have not put in the effort to couple it with font-lock and
> such, from anecdotal examination it does not perform badly with a
> naive C grammar. 

Are you talking about the generator or runtime here?

> The initial parse does take several seconds on large files, 

That's the runtime. Actual time for xdisp.c, preferably compared with a
tree-sitter parse run on the same machine, would be helpful.

How long does the generator take?

> but afterwards I did not notice a significant drop in editor
> responsiveness.

This seems to imply that the runtime supports incremental parse, so it
does not reparse the whole buffer each time; is that true?

>> 2) Not reinvent the wheel.
>
> While tree-sitter may be nice and all, it doesn't seem to offer the
> usual extensibility expected from Emacs.

It's all open-source, but it is very complicated and may be beyond many
people's ability to change correctly.

It requires running a C compiler to change it, but so do other parts of
Emacs (for example, the json parser).

So what is it missing?

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 15:12     ` Perry E. Metzger
@ 2021-07-29 23:28       ` Stephen Leake
  2021-07-30  0:19         ` Perry E. Metzger
  0 siblings, 1 reply; 59+ messages in thread
From: Stephen Leake @ 2021-07-29 23:28 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/28/21 07:43, Andrei Kuznetsov wrote:
>> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>>
>>> Some of the tree-sitter development tools are implemented in Rust; you
>>> only need Rust if you are developing/fixing a grammar for a language.
>> If I understand this correctly, it means one would require the Rust
>> toolchain to support new languages in tree-sitter, or to improve
>> existing support.
>
> That's not true. Tree Sitter is not written even partially in Rust. It
> does have Rust bindings for people who use Rust.

https://github.com/tree-sitter/tree-sitter/tree/master/cli/src

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Maybe we're taking a wrong approach towards tree-sitter
  2021-07-28 15:09 ` Perry E. Metzger
@ 2021-07-29 23:35   ` Stephen Leake
  0 siblings, 0 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-29 23:35 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/27/21 21:57, Andrei Kuznetsov wrote:
>> Unlike features like native JSON, however, I believe tree-sitter is the
>> first optional package providing notable functionality that would
>> require a toolchain that depends on LLVM (that of Rust, which
>> tree-sitter is implemented in)
>
> Tree sitter is written in C. 

There are many parts to tree-sitter. The runtime, which uses
language-specific parse tables to parse user files, is written in C.

The command line tools (cli), one of which converts the language grammar
file written in javascript into C code that builds the parse table, are
written in Rust:

https://github.com/tree-sitter/tree-sitter/tree/master/cli/src

> It has an available set of Rust bindings. It compiles perfectly well
> with any C compiler.

Here you are describing the runtime, which is what must be linked with
Emacs for a major-mode to use the tree-sitter parser.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:28       ` Stephen Leake
@ 2021-07-30  0:19         ` Perry E. Metzger
  2021-07-30 18:44           ` [SPAM UNSURE] " Stephen Leake
  0 siblings, 1 reply; 59+ messages in thread
From: Perry E. Metzger @ 2021-07-30  0:19 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

On 7/29/21 19:28, Stephen Leake wrote:
> "Perry E. Metzger" <perry@piermont.com> writes:
>
>> That's not true. Tree Sitter is not written even partially in Rust. It
>> does have Rust bindings for people who use Rust.
> https://github.com/tree-sitter/tree-sitter/tree/master/cli/src
>
That's an optional CLI and is not part of the library runtime.

Perry





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:12         ` Stephen Leake
  2021-07-29 23:21           ` Yuan Fu
@ 2021-07-30  0:41           ` Andrei Kuznetsov
  2021-07-30 12:06             ` Arthur Miller
  2021-07-30 18:42             ` Stephen Leake
  2021-07-30  6:05           ` Eli Zaretskii
  2 siblings, 2 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-30  0:41 UTC (permalink / raw)
  To: Stephen Leake; +Cc: Eli Zaretskii, manuel, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

> That's true for the common TS runtime, which implements the parser and
> error recovery, but the code for each language, that builds the LR parse
> table and some other data structures, is generated in C from a grammar
> file written in javascript, and must be linked into Emacs somehow. In
> addition, some languages require an "external scanner", which is more
> code in C that is specific to the language.

Interesting.  I assume it would be possible to reuse the source grammar
files?




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:25         ` [SPAM UNSURE] " Stephen Leake
@ 2021-07-30  0:54           ` Andrei Kuznetsov
  2021-07-30  3:02             ` Andrei Kuznetsov
  2021-07-30 18:48             ` Stephen Leake
  0 siblings, 2 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-30  0:54 UTC (permalink / raw)
  To: Stephen Leake; +Cc: Ergus, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

> The "generator" and the "runtime" are two separate programs, with
> separate functions, used at different times.
>
> The generator takes the javascript language grammar file and translates
> it (thru lots of hairy computations) into code that builds a parse table
> and other data structures. The tree-sitter generator outputs that code
> in C; it might be possible to adapt it to output in elisp (the wisitoken
> generator used to output elisp, but I gave that up when I implemented
> error recovery in Ada; elisp is way too slow for that).
>
> The "runtime" uses the parse table to parse text at runtime, in response
> to user actions on the buffer. To be useful in an interactive editing
> context, it must have robust error recovery. What is your error recovery
> algorithm?

Currently extremely naive.  After an error occurs, it skips productions
until it can parses without errors, and just continues from there.  I
plan to improve it somewhat in the near future.

> Are you talking about the generator or runtime here?

The runtime.  The parser generator does not seem to be astonishingly
fast, but I don't think most people will have any cause to run it very
often.

> That's the runtime. Actual time for xdisp.c, preferably compared with a
> tree-sitter parse run on the same machine, would be helpful.

I'm currently pre-occupied and unable to work on this, but I will return
with these measurements as soon as reasonably possible.

> How long does the generator take?

I did not measure that, but as most people would be loading compiled
parsers, and not running the generator, I don't think it would matter
too much.  FWIW macroexpansion of the macro `defgrammar' blocks Emacs
for a second or two.

> This seems to imply that the runtime supports incremental parse, so it
> does not reparse the whole buffer each time; is that true?

Indeed.  I've not yet figured out a particularly good way of recording
changes though -- at present it relies on its own versions of
self-insert-command, kill-region, et cetera.
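
For illustration only, one alternative to wrapping each editing command
might be a change hook. The sketch below models, in Python, a buffer
that calls hooks with (beg, end, old_len) after every change, which is
the argument convention of Emacs's after-change-functions, and turns
each call into the start / old end / new end triple that an incremental
parser typically wants. The Buffer class and to_edit are hypothetical,
and positions are 0-based here, unlike Emacs buffer positions.

```python
class Buffer:
    """Toy text buffer that reports every change to registered hooks."""

    def __init__(self, text=""):
        self.text = text
        self.after_change = []     # hooks called as hook(beg, end, old_len)

    def replace(self, beg, end, new_text):
        """Replace text[beg:end] with new_text, then run the hooks."""
        old_len = end - beg
        self.text = self.text[:beg] + new_text + self.text[end:]
        for hook in self.after_change:
            # beg/end delimit the new text; old_len is the replaced length.
            hook(beg, beg + len(new_text), old_len)


def to_edit(beg, end, old_len):
    """Convert hook arguments into an edit record for a parser."""
    return {"start": beg, "old_end": beg + old_len, "new_end": end}


edits = []
buf = Buffer("int x;")
buf.after_change.append(lambda b, e, n: edits.append(to_edit(b, e, n)))
buf.replace(4, 5, "count")         # rename x -> count
```

Insertions and deletions are the special cases old_len == 0 and
beg == end, so every editing command is covered by the same hook.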




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  0:54           ` Andrei Kuznetsov
@ 2021-07-30  3:02             ` Andrei Kuznetsov
  2021-07-30 18:48             ` Stephen Leake
  1 sibling, 0 replies; 59+ messages in thread
From: Andrei Kuznetsov @ 2021-07-30  3:02 UTC (permalink / raw)
  To: Stephen Leake; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:


> until it can parses without errors, and just continues from there.
           ^^^^^^^^^^

I meant to say "parse" instead.

Further, as for "without errors", it skips until it finds the next
synchronizing token, attempts to parse starting from that token, and if
that fails repeats the process until either EOF is reached or it is
successful.
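
In other words, something like the following panic-mode loop. This is
a Python sketch of the idea, not the actual elisp; the toy statement
syntax "let NAME = NUMBER ;" and the choice of "let" as the only
synchronizing token are assumptions made for the example.

```python
SYNC_TOKENS = {"let"}              # tokens at which parsing may resume


class ParseError(Exception):
    pass


def parse_statement(tokens, pos):
    """Parse `let NAME = NUMBER ;` at pos; return the position after it."""
    def expect(index, predicate):
        if index >= len(tokens) or not predicate(tokens[index]):
            raise ParseError(index)

    expect(pos, lambda t: t == "let")
    expect(pos + 1, lambda t: t.isidentifier() and t != "let")
    expect(pos + 2, lambda t: t == "=")
    expect(pos + 3, str.isdigit)
    expect(pos + 4, lambda t: t == ";")
    return pos + 5


def parse_program(tokens):
    """Return (statements, error_positions), recovering after errors."""
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        try:
            end = parse_statement(tokens, pos)
            statements.append(tokens[pos:end])
            pos = end
        except ParseError:
            errors.append(pos)
            # Skip ahead to the next synchronizing token (or EOF); the
            # loop then retries there, and skips again if that fails.
            pos += 1
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
    return statements, errors
```

A broken statement costs only the tokens up to the next "let"; the
statements after it are still parsed.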






^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:12         ` Stephen Leake
  2021-07-29 23:21           ` Yuan Fu
  2021-07-30  0:41           ` Andrei Kuznetsov
@ 2021-07-30  6:05           ` Eli Zaretskii
  2021-07-31 12:12             ` Stephen Leake
  2 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-30  6:05 UTC (permalink / raw)
  To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Andrei Kuznetsov <r12451428287@163.com>,  manuel@ledu-giraud.fr,
>   emacs-devel@gnu.org
> Date: Thu, 29 Jul 2021 16:12:56 -0700
> 
> > TS's code is written in plain C, and doesn't require any regeneration
> > or source modifications.  Anything else is misunderstanding.
> 
> That's true for the common TS runtime, which implements the parser and
> error recovery, but the code for each language, that builds the LR parse
> table and some other data structures, is generated in C from a grammar
> file written in javascript, and must be linked into Emacs somehow.

That "linking" happens when Emacs is linked against the TS library,
right?



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  0:41           ` Andrei Kuznetsov
@ 2021-07-30 12:06             ` Arthur Miller
  2021-07-30 12:52               ` Óscar Fuentes
                                 ` (2 more replies)
  2021-07-30 18:42             ` Stephen Leake
  1 sibling, 3 replies; 59+ messages in thread
From: Arthur Miller @ 2021-07-30 12:06 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Eli Zaretskii, Stephen Leake, manuel, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, that builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in javascript, and must be linked into Emacs somehow. In
>> addition, some languages require an "external scanner", which is more
>> code in C that is specific to the language.
>
> Interesting.  I assume it would be possible to reuse the source grammar
> files?

It probably is, and looking at neovim's GitHub repo, there are some
instructions on how to create a grammar for a new language:

https://github.com/nvim-treesitter/nvim-treesitter

The process could probably be somehow automated from lisp.

I do have a sincere question, though, about this entire tree-sitter
venture: is it really worth the trouble in Emacs's case? As I understand
it, TS is a specialized regex matcher, and looking at some language specs
leaves me with that feeling (for example, the grammar for bash):

https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json

I understand that a specialized regex matcher is more efficient than
the generalized regexp matching that font-lock in Emacs currently relies
on, but is it *that* much more efficient to be worth the extra trouble?
TS seems to keep state (a node) for each character typed, which would
consume a lot of memory in big files. If the syntax tree it keeps
can be reused for something else, then it could be very useful, but
just for syntax highlighting and indentation?
Some years ago, when opening some of the 10k-line files found in the
Emacs src dir, I noticed some slowdown in font lock. But nowadays I
don't experience any hiccups with syntax highlighting or indentation.

Anyway, it is very educational to see TS get merged into Emacs and to
read Eli's tips and guidance about Emacs internals.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 12:06             ` Arthur Miller
@ 2021-07-30 12:52               ` Óscar Fuentes
  2021-07-30 13:30                 ` Arthur Miller
  2021-07-30 13:32               ` Ergus
  2021-08-02 22:13               ` Perry E. Metzger
  2 siblings, 1 reply; 59+ messages in thread
From: Óscar Fuentes @ 2021-07-30 12:52 UTC (permalink / raw)
  To: emacs-devel

Arthur Miller <arthur.miller@live.com> writes:

> I understand that a specialized regex matcher is more efficient than
> the generalized regexp matching that font-lock in Emacs currently relies
> on, but is it *that* much more efficient to be worth the extra trouble?

AFAIU this is not about efficiency, but mainly about correctness (modern
languages are increasingly difficult to analyze) and also about
decreasing the maintenance load. In the process, Emacs gets support for
some new languages too.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 12:52               ` Óscar Fuentes
@ 2021-07-30 13:30                 ` Arthur Miller
  2021-07-30 13:57                   ` Ergus
  2021-07-30 13:59                   ` Eli Zaretskii
  0 siblings, 2 replies; 59+ messages in thread
From: Arthur Miller @ 2021-07-30 13:30 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

Óscar Fuentes <ofv@wanadoo.es> writes:

> Arthur Miller <arthur.miller@live.com> writes:
>
>> I understand that a specialized regex matcher is more efficient than
>> the generalized regexp matching that font-lock in Emacs currently relies
>> on, but is it *that* much more efficient to be worth the extra trouble?
>
> AFAIU this is not about efficiency, but mainly about correctness (modern
> languages are increasingly difficult to analyze)

Ok, I understand, and I can buy that one. The question is whether it is
still worth it just for syntax highlighting and indentation. If I get
some spurious color here or there, or something is sometimes not
colored, do I care?

Can that syntax tree of TS be exposed to lisp and used for some other
purposes, or is it just internal to TS, so that the only output we see
is some colors on the screen?

>                                                       and also about
> decreasing the maintenance load.
Sure, but it is also a limitation. If Emacs relies on TS maintainers
to create new grammars and update existing ones when a language changes,
it means Emacs users will have to wait until changes are fixed
upstream, similar to how gnu/linux distros work regarding
packaging. Of course, a user who wishes to modify a language or
introduce a new one can always rely on the old font-lock or go through
the pain of the TS tooling based on JS and custom tools. A Lisp frontend
to that toolchain could probably be developed, but that is even more work.

>                                  In the process, Emacs gets support for
> some new languages too.

Yes, it is always nice I guess :). Is there really demand for some
language currently provided in TS and not in Emacs?

I don't know, maybe I am overly sceptical of TS; I don't mean it is a
bad package, and I am sure it has its place in other editors, I am just
not sure how it fits in Emacs, where everything is easily configurable
and extensible.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 12:06             ` Arthur Miller
  2021-07-30 12:52               ` Óscar Fuentes
@ 2021-07-30 13:32               ` Ergus
  2021-07-30 15:07                 ` Arthur Miller
  2021-08-02 22:13               ` Perry E. Metzger
  2 siblings, 1 reply; 59+ messages in thread
From: Ergus @ 2021-07-30 13:32 UTC (permalink / raw)
  To: Arthur Miller
  Cc: Andrei Kuznetsov, Eli Zaretskii, Stephen Leake, manuel,
	emacs-devel

On Fri, Jul 30, 2021 at 02:06:00PM +0200, Arthur Miller wrote:
>Andrei Kuznetsov <r12451428287@163.com> writes:
>
>> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>>
>>> That's true for the common TS runtime, which implements the parser and
>>> error recovery, but the code for each language, that builds the LR parse
>>> table and some other data structures, is generated in C from a grammar
>>> file written in javascript, and must be linked into Emacs somehow. In
>>> addition, some languages require an "external scanner", which is more
>>> code in C that is specific to the language.
>>
>> Interesting.  I assume it would be possible to reuse the source grammar
>> files?
>
>It probably is, and looking at neovim's GitHub repo, there are some
>instructions on how to create a grammar for a new language:
>
>https://github.com/nvim-treesitter/nvim-treesitter
>
>The process could probably be somehow automated from lisp.
>
>I do have a sincere question, though, about this entire tree-sitter
>venture: is it really worth the trouble in Emacs's case? As I understand
>it, TS is a specialized regex matcher, and looking at some language specs
>leaves me with that feeling (for example, the grammar for bash):
>
>https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json
>
>I understand that a specialized regex matcher is more efficient than
>the generalized regexp matching that font-lock in Emacs currently relies
>on, but is it *that* much more efficient to be worth the extra trouble?
>TS seems to keep state (a node) for each character typed, which would
>consume a lot of memory in big files. If the syntax tree it keeps
>can be reused for something else, then it could be very useful, but
>just for syntax highlighting and indentation?
>Some years ago, when opening some of the 10k-line files found in the
>Emacs src dir, I noticed some slowdown in font lock. But nowadays I
>don't experience any hiccups with syntax highlighting or indentation.
>
>Anyway, it is very educational to see TS get merged into Emacs and to
>read Eli's tips and guidance about Emacs internals.
>
The TS thing came up because of some issues with c-mode highlighting
reported in that thread: correctness and speed (slowing down things like
scrolling). c-mode does its best, but C++ is evolving, and more complex
analysis comes with a performance penalty and ever more code complexity
in the parser. The same happens with new languages that are widely used.

It will be very difficult to implement a complete/competitive mode like
c-mode for all the new languages that are very popular today (rust,
typescript; even python). So we end up having some "weak" modes with
inconsistencies and different bindings and color themes. Those become
unmaintained after a time because the developers migrate to more
complete editors/IDEs, and new developers just don't come to emacs
because it does not satisfy their needs to start with.

  Probably I am wrong, but 99% of the web developers (React, Nodejs,
Angular) are using VSCode and the rest are on neovim; so we don't even
have people with enough knowledge and motivation to implement those
modes in Emacs one by one.

These languages are more complex to analyze, and we don't have people
to maintain a mode for each of them. Trying to do so would spend too
much developer time reinventing what TS already does (and does right,
efficiently and with a support community).

So maintaining a mode for every language we currently don't support is
not scalable over time. And reimplementing a replacement for TS in Elisp
won't be worth it: it will end up being very slow and repeating all the
errors that TS developers have already solved. TS may be useful not only
for syntax highlighting and indentation but also for code navigation and
some basic syntax checking.

Basically TS is: One "infrastructure" to rule them all.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 13:30                 ` Arthur Miller
@ 2021-07-30 13:57                   ` Ergus
  2021-07-30 14:52                     ` Arthur Miller
  2021-07-30 13:59                   ` Eli Zaretskii
  1 sibling, 1 reply; 59+ messages in thread
From: Ergus @ 2021-07-30 13:57 UTC (permalink / raw)
  To: Arthur Miller; +Cc: Óscar Fuentes, emacs-devel

On Fri, Jul 30, 2021 at 03:30:42PM +0200, Arthur Miller wrote:
>Óscar Fuentes <ofv@wanadoo.es> writes:
>
>> Arthur Miller <arthur.miller@live.com> writes:
>>
>>> I understand that a specialized regex matcher is more efficient than
>>> the generalized regexp matching that font-lock in Emacs currently relies
>>> on, but is it *that* much more efficient to be worth the extra trouble?
>>
>> AFAIU this is not about efficiency, but mainly about correctness (modern
>> languages are increasingly difficult to analyze)
>
>Ok, I understand, and I can buy that one. The question is whether it is
>still worth it just for syntax highlighting and indentation. If I get
>some spurious color here or there, or something is sometimes not
>colored, do I care?
>

Yes, we care. Syntax highlighting is a basic feature for an editor in 2021.

>Can that syntax tree of TS be exposed to lisp and used for some other
>purposes,

That is the idea: use the tree for navigation commands like up-list or
goto-defun, for example. Maybe not the tree directly, but the information
it provides (maybe by calling TS function wrappers or by setting the TS
information as text properties).

>or is it just internal to TS and only output we see is some
>colors on the screen?
>
How we use it is more a design choice. We can access the tree
information with the TS api or we can just put the tree's information as
text properties... imagination is the limit ;)
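
As a rough sketch of what tree-based navigation could look like (in
Python, with a hand-built tree standing in for a real parse; Node,
node_at and enclosing are names invented for this example and are not
TS API calls):

```python
class Node:
    """Syntax-tree node covering the half-open range [start, end)."""

    def __init__(self, kind, start, end):
        self.kind, self.start, self.end = kind, start, end
        self.children, self.parent = [], None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def node_at(node, pos):
    """Smallest node under `node` whose range contains pos."""
    for child in node.children:
        if child.start <= pos < child.end:
            return node_at(child, pos)
    return node


def enclosing(node, kind):
    """Nearest ancestor-or-self of the given kind, or None."""
    while node is not None and node.kind != kind:
        node = node.parent
    return node


# file[0,40) > function[0,30) > block[10,30) > call[12,20)
root = Node("file", 0, 40)
function = root.add(Node("function", 0, 30))
block = function.add(Node("block", 10, 30))
call = block.add(Node("call", 12, 20))
```

A command in the spirit of up-list would then move point to
node_at(root, point).parent.start, and "go to the beginning of the
enclosing function" to enclosing(node_at(root, point), "function").start,
without any language-specific regexps.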

>>                                                       and also about
>> decreasing the maintenance load.
>Sure, but it is also a limitation. If Emacs relies on TS maintainers
>to create new grammars and update existing ones when a language changes,
>it means Emacs users will have to wait until changes are fixed
>upstream, similar to how gnu/linux distros work regarding
>packaging. Of course, a user who wishes to modify a language or
>introduce a new one can always rely on the old font-lock or go through
>the pain of the TS tooling based on JS and custom tools. A Lisp frontend
>to that toolchain could probably be developed, but that is even more work.
>
Honestly, creating a grammar for TS is much simpler than creating a mode
with font-lock, navigation commands, indentation rules and some
flymake support. All the modes with TS will be a bit more consistent in
colors and keybindings (now we have modes where all commands use
different prefixes, or that lack navigation, or that have different
indentation customs, so using them is like learning a different editor
for every language).

>>                                  In the process, Emacs gets support for
>> some new languages too.
>
>Yes, it is always nice I guess :). Is there really demand for some
>language currently provided in TS and not in Emacs?
>

Indeed. As I mentioned before, web developers are using VSCode or neovim
because Angular, React, Nodejs and Python are painful to work with in Emacs
(compared to VSCode or Sublime). Rust support in Emacs is very limited,
so users rely on external packages like rust-mode, elpy or
anaconda, which introduce different bindings and collisions and require some
complex setup for the basics.

>I don't know, maybe I am overly sceptical of TS; I don't mean it is a
>bad package, and I am sure it has its place in other editors, I am just
>not sure how it fits in Emacs where everything is easily configurable
>and extensible.
>
It is just a good trade-off: configurable enough for 99% of the use
cases, unless we expect all users to be advanced Lisp hackers who
customize their font-locking, indentation and navigation functions for
every single prog-mode.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 13:30                 ` Arthur Miller
  2021-07-30 13:57                   ` Ergus
@ 2021-07-30 13:59                   ` Eli Zaretskii
  2021-07-30 15:45                     ` Arthur Miller
  1 sibling, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-30 13:59 UTC (permalink / raw)
  To: Arthur Miller; +Cc: ofv, emacs-devel

> From: Arthur Miller <arthur.miller@live.com>
> Date: Fri, 30 Jul 2021 15:30:42 +0200
> Cc: emacs-devel@gnu.org
> 
> >                                                       and also about
> > decreasing the maintenance load.
> Sure, but it is also a limitation. If Emacs will rely on TS maintainers
> to create new grammars and update existing ones when language changes,
> it means Emacs users will have to wait for changes until they are
> fixed upstream, similar as how gnu/linux distros work regarding
> packaging.

We have the same "problem" with every other library we use: the image
libraries, GnuTLS, HarfBuzz, etc.

Besides, TS is used by quite a few projects, so how long do you think
it will take for serious problems in language support to be fixed?

OTOH, take a look at some places in Emacs that don't have active
maintainers: problems there sometimes take forever to fix.  This is
what happens when a project wants to control everything in its domain,
but lacks manpower for doing so.

It is not reasonable to expect Emacs to have experts on board for
parsing every language on the face of Earth.  It won't work.

> I don't know, I am maybe overly sceptical to TS; I don't mean it is a
> bad package, and I am sure it has its place in other editors, I am just
> not sure how it fits in Emacs where everything is easily configurable
> and extensible.

Not everything.  Again, take the other optional libraries we use as
examples: they cannot be extended inside Emacs.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 13:57                   ` Ergus
@ 2021-07-30 14:52                     ` Arthur Miller
  0 siblings, 0 replies; 59+ messages in thread
From: Arthur Miller @ 2021-07-30 14:52 UTC (permalink / raw)
  To: Ergus; +Cc: Óscar Fuentes, emacs-devel

Ergus <spacibba@aol.com> writes:

> On Fri, Jul 30, 2021 at 03:30:42PM +0200, Arthur Miller wrote:
>>Óscar Fuentes <ofv@wanadoo.es> writes:
>>
>>> Arthur Miller <arthur.miller@live.com> writes:
>>>
>>>> I understand that having a specialized regex matcher is more efficient than
>>>> the generalized regular matcher that current font-locking in Emacs relies
>>>> upon, but is it *that* much more efficient as to be worth the extra trouble?
>>>
>>> AFAIU this is not about efficiency, but mainly about correctness (modern
>>> languages are increasingly difficult to analyze)
>>
>>Ok, I understand, and I can buy that one. The question is whether it is still
>>worth it just for syntax highlighting and indentation. If I get some
>>spurious color here or there, or something is sometimes not colored, do I care?
>>
>
> Yes, we care. Syntax highlighting is a basic editor feature in 2021.
Of course, but I didn't mean that Emacs should go without it; it's not
all or nothing :). What I asked is whether I really care if a file of 10k
source lines has a word here or there not highlighted, which I haven't
noticed with the current implementation either.

>>Can that syntax tree of TS be exposed to lisp and used for some other
>>purposes,
>
> That is the idea: use the tree for navigation commands such as up-list or
> goto-defun, for example. Maybe not the tree directly, but the information
> it provides (perhaps by calling TS function wrappers or by setting the TS
> information as text properties).
Ok, that might be useful.

> Indeed. As I mentioned before, web developers are using VSCode or neovim
> because Angular, React, Nodejs and Python are painful to work with in Emacs
> (compared to VSCode or Sublime). Rust support in Emacs is very limited,
> so users rely on external packages like rust-mode, elpy or
> anaconda, which introduce different bindings and collisions and require some
> complex setup for the basics.
Don't we rely on external packages for lots of things? Almost all of the
external packages you mentioned provide more than just syntax highlighting
and indentation, so we will probably continue to use them for other
reasons even when TS enters Emacs.

>        Unless we expect all the users to be advanced lisp hackers to
> customize their fontlocking, indentation and navigation functions for
> every single prog-mode.

Is it considered advanced lisp hackery to add extra keywords to
font-lock in their init file? I always think of myself as an elisp
noob. Thanks for boosting my ego :-).

Don't get me wrong, I mean nothing bad; I just find the answers a tad
too extreme for my taste. But thanks for the input, it is an interesting read.

I guess I'll be less sceptical and see what TS brings. Anyway, thanks
for all the work to all of you who work on it.





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 13:32               ` Ergus
@ 2021-07-30 15:07                 ` Arthur Miller
  0 siblings, 0 replies; 59+ messages in thread
From: Arthur Miller @ 2021-07-30 15:07 UTC (permalink / raw)
  To: Ergus; +Cc: Andrei Kuznetsov, Eli Zaretskii, Stephen Leake, manuel,
	emacs-devel

Ergus <spacibba@aol.com> writes:
>  Probably I am wrong but 99% of the web developers (React, Nodejs,
> Angular) are using VSCode, the rest are with neovim; so we don't even
> have people with enough knowledge and motivation to implement one of
> those in Emacs one by one.
That might be for other reasons as well, such as the interaction model,
wording and other idiosyncrasies of Emacs discussed in numerous
threads about making Emacs popular, or because a certain company is
backing VSCode. There are other editors, like Adobe's Brackets, which
came before VSCode and are far less popular.

Looking at recent MS business moves (AI, GitHub, Copilot ...), it is now
understandable why they pour resources into a free code editor. I
wondered why when they first released it; now the picture is clearer. I
don't think Emacs, or hardly any other editor, can compete with MS;
nobody else has that many resources.

That is of course not an argument for or against TS, just a thought about
why people prefer a tool. Yes, I agree with you that syntax highlighting
out of the box for a certain library like Node or Vue might help Emacs.
I have nothing against that argument.

>                                                TS may be useful not only
> for syntax highlight and indentation but also for code navigation and
> some basic syntax checking.
Yes, that would be a nice thing if it could be used for more than just
syntax and indentation.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 13:59                   ` Eli Zaretskii
@ 2021-07-30 15:45                     ` Arthur Miller
  0 siblings, 0 replies; 59+ messages in thread
From: Arthur Miller @ 2021-07-30 15:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Arthur Miller <arthur.miller@live.com>
>> Date: Fri, 30 Jul 2021 15:30:42 +0200
>> Cc: emacs-devel@gnu.org
>> 
>> >                                                       and also about
>> > decreasing the maintenance load.
>> Sure, but it is also a limitation. If Emacs will rely on TS maintainers
>> to create new grammars and update existing ones when language changes,
>> it means Emacs users will have to wait for changes until they are
>> fixed upstream, similar as how gnu/linux distros work regarding
>> packaging.
>
> We have the same "problem" with every other library we use: the image
> libraries, GnuTLS, HarfBuzz, etc.
>
> Besides, TS is used by quite a few projects, so how long do you think
> it will take for serious problems in language support to be fixed?

Yes, I understand that. I didn't mean problems so much as general
configurability to personal preferences and extensibility. That is
what people seem to praise on Reddit when it comes to Emacs.

> OTOH, take a look at some places in Emacs that don't have active
> maintainers: problems there sometimes take forever to fix.  This is
> what happens when a project wants to control everything in its domain,
> but lacks manpower for doing so.
>
> It is not reasonable to expect Emacs to have experts on board for
> parsing every language on the face of Earth.  It won't work.
>
>> I don't know, I am maybe overly sceptical to TS; I don't mean it is a
>> bad package, and I am sure it has its place in other editors, I am just
>> not sure how it fits in Emacs where everything is easily configurable
>> and extensible.
>
> Not everything.  Again, take the other optional libraries we use as
> examples: they cannot be extended inside Emacs.


TS is a bit of a special library since it hooks into a part of Emacs that
people often extend, but I do understand it is just a library that
adds some extra value like the others. I guess you are correct about its
popularity among other projects; that might indeed work in Emacs's favor. As
Ergus pointed out, it will bring a lot out of the box to many people, so I
guess it is a win at least in that respect.

Thanks for the answer.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-29 23:21           ` Yuan Fu
@ 2021-07-30 18:38             ` Stephen Leake
  0 siblings, 0 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-30 18:38 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Andrei Kuznetsov, Eli Zaretskii, manuel, emacs-devel

Yuan Fu <casouri@gmail.com> writes:

>> On Jul 29, 2021, at 7:12 PM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, that builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in javascript, and must be linked into Emacs somehow.
>
> Languages don’t need to be linked into Emacs. They can be in dynamic
> modules.

Dynamic modules are linked, at run-time. That's how the code that calls
them knows what addresses to call.

So I think you are saying the tree-sitter runtime will be linked into
Emacs at Emacs compile time, while the languages can be linked in at
run-time. That's good.
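
To make "linked at run-time" concrete, here is a small sketch in Python using the standard `ctypes` module. It has nothing to do with Emacs or tree-sitter internals; it only shows the general mechanism of resolving a symbol's address while the program is already running. It assumes a POSIX system, where passing `None` to `ctypes.CDLL` asks the dynamic loader for the symbols already visible to the process (this does not work on MS-Windows). The wrapper name `c_strlen` is made up for illustration.

```python
import ctypes

# Assumption: POSIX. None means "the symbols already loaded into this
# process", which includes the C library the interpreter is linked with.
libc = ctypes.CDLL(None)

# The address of strlen is looked up by name only now, at run-time.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Call the C library's strlen through the run-time-resolved symbol."""
    return strlen(text)


print(c_strlen(b"tree-sitter"))
```

A language grammar shipped as a shared object would be found the same way in principle: open the file, look up its entry point by name, and call it.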

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  0:41           ` Andrei Kuznetsov
  2021-07-30 12:06             ` Arthur Miller
@ 2021-07-30 18:42             ` Stephen Leake
  1 sibling, 0 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-30 18:42 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Eli Zaretskii, manuel, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, that builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in javascript, and must be linked into Emacs somehow. In
>> addition, some languages require an "external scanner", which is more
>> code in C that is specific to the language.
>
> Interesting.  I assume it would be possible to reuse the source grammar
> files?

If they are licensed as free software, yes, of course.

What sort of reuse do you have in mind?

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  0:19         ` Perry E. Metzger
@ 2021-07-30 18:44           ` Stephen Leake
  0 siblings, 0 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-30 18:44 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/29/21 19:28, Stephen Leake wrote:
>> "Perry E. Metzger" <perry@piermont.com> writes:
>>
>>> That's not true. Tree Sitter is not written even partially in Rust. It
>>> does have Rust bindings for people who use Rust.
>> https://github.com/tree-sitter/tree-sitter/tree/master/cli/src
>>
> That's an optional CLI and is not part of the library runtime.

Yes, and the optional CLI is part of the tree-sitter project, so the
statement "Tree Sitter is not written even partially in Rust" is simply
wrong.

Please be more careful when you say "tree-sitter", but mean
"tree-sitter runtime"; it is not always clear from context.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  0:54           ` Andrei Kuznetsov
  2021-07-30  3:02             ` Andrei Kuznetsov
@ 2021-07-30 18:48             ` Stephen Leake
  1 sibling, 0 replies; 59+ messages in thread
From: Stephen Leake @ 2021-07-30 18:48 UTC (permalink / raw)
  To: Andrei Kuznetsov; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> How long does the generator take?
>
> I did not measure that, but as most people would be loading compiled
> parsers, and not running the generator, I don't think it would matter
> too much.  FWIW macroexpansion of the macro `defgrammar' blocks Emacs
> for a second or 2.

It can matter a lot for large grammars. wisitoken used to take hours to
generate the LR1 parse table for Ada; now it takes a couple minutes.
tree-sitter never finishes that grammar.

A naive LR grammar generator can easily be O (n**3) or worse in the
grammar size; I spent a lot of time optimizing wisitoken so it can
handle Ada reasonably.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30  6:05           ` Eli Zaretskii
@ 2021-07-31 12:12             ` Stephen Leake
  2021-07-31 13:07               ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Stephen Leake @ 2021-07-31 12:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: r12451428287, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: Andrei Kuznetsov <r12451428287@163.com>,  manuel@ledu-giraud.fr,
>>   emacs-devel@gnu.org
>> Date: Thu, 29 Jul 2021 16:12:56 -0700
>> 
>> > TS's code is written in plain C, and doesn't require any regeneration
>> > or source modifications.  Anything else is misunderstanding.
>> 
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, that builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in javascript, and must be linked into Emacs somehow.
>
> That "linking" happens when Emacs is linked against the TS library,
> right?

I don't know what you mean by "the TS library".

I'm guessing you mean the tree-sitter runtime, in which case no, that
does not include any languages.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-31 12:12             ` Stephen Leake
@ 2021-07-31 13:07               ` Eli Zaretskii
  2021-07-31 16:55                 ` Stephen Leake
  0 siblings, 1 reply; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-31 13:07 UTC (permalink / raw)
  To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: r12451428287@163.com,  manuel@ledu-giraud.fr,  emacs-devel@gnu.org
> Date: Sat, 31 Jul 2021 05:12:54 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Stephen Leake <stephen_leake@stephe-leake.org>
> >> Cc: Andrei Kuznetsov <r12451428287@163.com>,  manuel@ledu-giraud.fr,
> >>   emacs-devel@gnu.org
> >> Date: Thu, 29 Jul 2021 16:12:56 -0700
> >> 
> >> > TS's code is written in plain C, and doesn't require any regeneration
> >> > or source modifications.  Anything else is misunderstanding.
> >> 
> >> That's true for the common TS runtime, which implements the parser and
> >> error recovery, but the code for each language, that builds the LR parse
> >> table and some other data structures, is generated in C from a grammar
> >> file written in javascript, and must be linked into Emacs somehow.
> >
> > That "linking" happens when Emacs is linked against the TS library,
> > right?
> 
> I don't know what you mean by "the TS library".

I mean libtree-sitter.a produced by building the library.

> I'm guessing you mean the tree-sitter runtime, in which case no, that
> does not include any languages.

"Include" in what sense?



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-31 13:07               ` Eli Zaretskii
@ 2021-07-31 16:55                 ` Stephen Leake
  2021-07-31 17:12                   ` Eli Zaretskii
  0 siblings, 1 reply; 59+ messages in thread
From: Stephen Leake @ 2021-07-31 16:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: r12451428287, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: r12451428287@163.com,  manuel@ledu-giraud.fr,  emacs-devel@gnu.org
>> Date: Sat, 31 Jul 2021 05:12:54 -0700
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> >> Cc: Andrei Kuznetsov <r12451428287@163.com>,  manuel@ledu-giraud.fr,
>> >>   emacs-devel@gnu.org
>> >> Date: Thu, 29 Jul 2021 16:12:56 -0700
>> >> 
>> >> > TS's code is written in plain C, and doesn't require any regeneration
>> >> > or source modifications.  Anything else is misunderstanding.
>> >> 
>> >> That's true for the common TS runtime, which implements the parser and
>> >> error recovery, but the code for each language, that builds the LR parse
>> >> table and some other data structures, is generated in C from a grammar
>> >> file written in javascript, and must be linked into Emacs somehow.
>> >
>> > That "linking" happens when Emacs is linked against the TS library,
>> > right?
>> 
>> I don't know what you mean by "the TS library".
>
> I mean libtree-sitter.a produced by building the library.
>
>> I'm guessing you mean the tree-sitter runtime, in which case no, that
>> does not include any languages.
>
> "Include" in what sense?

There is no code in libtree-sitter.a that provides a language; all
languages are built separately, by the language developers.

https://github.com/tree-sitter/tree-sitter builds libtree-sitter.a, and
the command line tools to build a language.

https://github.com/tree-sitter/tree-sitter-python builds the object file
providing the python language.

There are many other languages, each with its own repository.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-31 16:55                 ` Stephen Leake
@ 2021-07-31 17:12                   ` Eli Zaretskii
  0 siblings, 0 replies; 59+ messages in thread
From: Eli Zaretskii @ 2021-07-31 17:12 UTC (permalink / raw)
  To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: r12451428287@163.com,  manuel@ledu-giraud.fr,  emacs-devel@gnu.org
> Date: Sat, 31 Jul 2021 09:55:46 -0700
> 
> >> I don't know what you mean by "the TS library".
> >
> > I mean libtree-sitter.a produced by building the library.
> >
> >> I'm guessing you mean the tree-sitter runtime, in which case no, that
> >> does not include any languages.
> >
> > "Include" in what sense?
> 
> There is no code in libtree-sitter.a that provides a language; all
> languages are built separately, by the language developers.
> 
> https://github.com/tree-sitter/tree-sitter builds libtree-sitter.a, and
> the command line tools to build a language.
> 
> https://github.com/tree-sitter/tree-sitter-python builds the object file
> providing the python language.
> 
> There are many other languages, each with its own repository.

We are talking past each other.  But I don't think we should keep
arguing about this, because there's no real disagreement here to argue
about.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
  2021-07-30 12:06             ` Arthur Miller
  2021-07-30 12:52               ` Óscar Fuentes
  2021-07-30 13:32               ` Ergus
@ 2021-08-02 22:13               ` Perry E. Metzger
  2 siblings, 0 replies; 59+ messages in thread
From: Perry E. Metzger @ 2021-08-02 22:13 UTC (permalink / raw)
  To: emacs-devel

On 7/30/21 08:06, Arthur Miller wrote:
> I understand that having a specialized regex matcher is more efficient than
> the generalized regular matcher that current font-locking in Emacs relies
> upon, but is it *that* much more efficient as to be worth the extra trouble?

It is not a question of efficiency. You cannot parse a context-free
grammar using regular expressions. The reason that almost all our
highlight modes produce random garbage throughout is that you cannot
parse a context-free grammar using regular expressions. (For many
languages, correctness isn't just occasionally violated, it's generally
violated.)
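
A small illustration of the point, in Python, with made-up function names. The first function uses a plain regular expression to find the extent of a call to `f`; the second tracks nesting depth explicitly, which is the kind of bookkeeping a parser does and a regular expression (in the formal sense) cannot. This is only a sketch: real font-lock code also consults syntax tables, and some regex engines offer non-regular extensions.

```python
import re


def regex_call_span(text: str):
    """Regex attempt: from 'f(' up to the first ')'. Breaks on nesting."""
    match = re.search(r"f\([^)]*\)", text)
    return match.group(0) if match else None


def parsed_call_span(text: str):
    """Track parenthesis depth to find the ')' that really closes 'f('."""
    start = text.find("f(")
    if start < 0:
        return None
    depth = 0
    for i in range(start + 1, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # unbalanced input


source = "f(g(1), h(2)) + 3"
print(regex_call_span(source))   # stops at the first ')', inside the call
print(parsed_call_span(source))  # covers the whole call to f
```

Adding more cases to the regex only moves the failure to a deeper level of nesting; the depth counter handles any depth.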

Reliable highlighting regardless of code formatting, reliable 
indentation assistance, reliable code folding, and other such features 
require that the editor be able to both parse the program being edited 
_and_ that the editor be able to incrementally re-parse it as it changes 
in minimal time.

Other editors now have such features and make good use of them. Highly 
reliable code folding alone is worth the price of admission IMHO. 
(Currently, the best we can do for code folding is assume that the 
indentation is correct.)
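
As a sketch of folding that does not depend on indentation, the following Python function (a hypothetical name, not taken from any editor) derives fold ranges from matching braces with a stack. It is deliberately naive: it ignores braces inside strings and comments, which is exactly the kind of case a real parser gets right.

```python
def fold_ranges(lines):
    """Return sorted (open_line, close_line) pairs for multi-line brace blocks."""
    stack, ranges = [], []
    for lineno, line in enumerate(lines):
        for ch in line:
            if ch == "{":
                stack.append(lineno)
            elif ch == "}" and stack:
                start = stack.pop()
                if lineno > start:  # single-line blocks are not foldable
                    ranges.append((start, lineno))
    return sorted(ranges)


# Deliberately unindented code: indentation-based folding has nothing to go on.
code = [
    "int f() {",
    "if (x) {",
    "y();",
    "}",
    "}",
]
print(fold_ranges(code))
```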

LSP has been revolutionary in improving the programmer's experience in 
Emacs. Tree Sitter will provide significant additional improvement.

> TS seems to keep state (a node) for each character typed; that will be a
> lot of memory consumed in some big files.

No one will be forced to turn it on. I, however, almost certainly will. 
My productivity is more important to me than my RAM budget. Those that 
don't like it, though, won't have to pay the RAM tax.

Perry





^ permalink raw reply	[flat|nested] 59+ messages in thread
