* Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-28 1:57 UTC
To: emacs-devel

I could not follow the conversation <<cc-mode fontification feels random>> particularly well, as it seemed somehow disjoint in a manner I cannot explain, but it seemed as if consensus has been reached that Emacs will provide optional functionality integrating yet another external package, this time tree-sitter.

Unlike features like native JSON, however, I believe tree-sitter is the first optional package providing notable functionality that would require a toolchain that depends on LLVM (that of Rust, which tree-sitter is implemented in), and is therefore inaccessible to people not running popular systems; i.e., how would one make tree-sitter work on MS-DOS (Emacs on FreeDOS is a must-have for me, and it would be a great annoyance if cc-mode or similar external packages came to depend on tree-sitter in the future), or on an Itanium system running GNU/Linux?

I think we should focus on portably reimplementing the relevant functionality within Emacs, preferably in Lisp.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-28 3:53 UTC
To: Andrei Kuznetsov; +Cc: emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> it seemed as if consensus has been reached that
> Emacs will provide optional functionality integrating yet another
> external package, this time tree-sitter.
>
> Unlike features like native JSON, however, I believe tree-sitter is the
> first optional package providing notable functionality that would
> require a toolchain that depends on LLVM (that of Rust, which
> tree-sitter is implemented in), and is therefore inaccessible to people
> not running popular systems;

The tree-sitter runtime, which Emacs would link with, is implemented in C, partly for this reason. It would be compiled with whatever Emacs is compiled with, or with the system compiler.

Some of the tree-sitter development tools are implemented in Rust; you only need Rust if you are developing/fixing a grammar for a language.

--
-- Stephe

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Manuel Giraud @ 2021-07-28 8:23 UTC
To: Stephen Leake; +Cc: Andrei Kuznetsov, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

[...]
> The tree-sitter runtime, which Emacs would link with, is implemented in
> C, partly for this reason. It would be compiled with whatever Emacs is
> compiled with, or with the system compiler.
>
> Some of the tree-sitter development tools are implemented in Rust; you
> only need Rust if you are developing/fixing a grammar for a language.

Hi,

I too did not follow the tree-sitter discussion closely. But AFAIU, tree-sitter provides tools to generate a parser (in C) from a grammar. So, is it the generated parsers (for any language Emacs supports) that will be versioned into the Emacs tree?

--
Manuel Giraud

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-28 11:48 UTC
To: Manuel Giraud; +Cc: Stephen Leake, emacs-devel

Manuel Giraud <manuel@ledu-giraud.fr> writes:

> I too did not follow the tree-sitter discussion closely. But AFAIU,
> tree-sitter provides tools to generate a parser (in C) from a grammar.

If that is the case, it certainly seems grave! I don't think an Emacs that requires source modifications for extending vital editing functionality is a good idea.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Eli Zaretskii @ 2021-07-28 13:04 UTC
To: Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Date: Wed, 28 Jul 2021 19:48:18 +0800
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
>
> Manuel Giraud <manuel@ledu-giraud.fr> writes:
>
> > I too did not follow the tree-sitter discussion closely. But AFAIU,
> > tree-sitter provides tools to generate a parser (in C) from a grammar.
>
> If that is the case, it certainly seems grave! I don't think an Emacs
> that requires source modifications for extending vital editing
> functionality is a good idea.

TS's code is written in plain C, and doesn't require any regeneration or source modifications. Anything else is misunderstanding.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-28 13:14 UTC
To: Eli Zaretskii; +Cc: stephen_leake, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> TS's code is written in plain C, and doesn't require any regeneration
> or source modifications. Anything else is misunderstanding.

I am confused by TS's documentation, but if my understanding is correct, shouldn't it be a parser generator that generates C code? In that case, how would users load new parsers or modify existing ones? Perhaps through something similar to the existing native module support?

I might be making a grave misunderstanding (or several).

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Eli Zaretskii @ 2021-07-28 13:27 UTC
To: Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

> From: Andrei Kuznetsov <r12451428287@163.com>
> Cc: manuel@ledu-giraud.fr, stephen_leake@stephe-leake.org,
>  emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 21:14:48 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > TS's code is written in plain C, and doesn't require any regeneration
> > or source modifications. Anything else is misunderstanding.
>
> I am confused by TS's documentation, but if my understanding is correct,
> shouldn't it be a parser generator that generates C code?

TS is not a parser generator, it's a parser that accepts the language grammar from external files.

> In that case, how would users load new parsers or modify existing
> ones?

If you want to modify a TS grammar file, you can (not in C). But why would you want to? The whole point of using TS is NOT to require that the Emacs development team or Emacs users should know enough about parsing of the many languages Emacs supports to modify the grammar. We want another, independent development team to take care of that, and we want to use the results of their development with minimum fuss. Exactly like we do with other libraries developed by other projects: the image libraries, GnuTLS, HarfBuzz, etc.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-28 13:31 UTC
To: Eli Zaretskii; +Cc: stephen_leake, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you want to modify a TS grammar file, you can (not in C). But why
> would you want to? The whole point of using TS is NOT to require that
> the Emacs development team or Emacs users should know enough about
> parsing of the many languages Emacs supports to modify the grammar.
> We want another, independent development team to take care of that,
> and we want to use the results of their development with minimum
> fuss. Exactly like we do with other libraries developed by other
> projects: the image libraries, GnuTLS, HarfBuzz, etc.

Interesting perspective. Thanks for the clarification.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Dmitry Gutov @ 2021-07-28 14:24 UTC
To: Eli Zaretskii, Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

On 28.07.2021 16:27, Eli Zaretskii wrote:
> The whole point of using TS is NOT to require that
> the Emacs development team or Emacs users should know enough about
> parsing of the many languages Emacs supports to modify the grammar.
> We want another, independent development team to take care of that,

I think we know both, though? There are a number of niche languages that
only Emacs supports.

Or at least that aren't likely to get good support in TreeSitter.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Dmitry Gutov @ 2021-07-28 14:36 UTC
To: Eli Zaretskii, Andrei Kuznetsov; +Cc: stephen_leake, manuel, emacs-devel

Sorry,

On 28.07.2021 17:24, Dmitry Gutov wrote:
> I think we know both, though? There are a number of niche languages that
             ^ want

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Daniele Nicolodi @ 2021-07-28 14:51 UTC
To: emacs-devel

On 28/07/2021 16:24, Dmitry Gutov wrote:
> On 28.07.2021 16:27, Eli Zaretskii wrote:
>> The whole point of using TS is NOT to require that
>> the Emacs development team or Emacs users should know enough about
>> parsing of the many languages Emacs supports to modify the grammar.
>> We want another, independent development team to take care of that,
>
> I think we know both, though? There are a number of niche languages that
> only Emacs supports.
>
> Or at least that aren't likely to get good support in TreeSitter.

I don't see how adding support for TreeSitter can cause any problem for those. Would you like to elaborate? No one is proposing to disable the other mechanisms for fontification and syntax analysis in Emacs.

Cheers,
Dan

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Eli Zaretskii @ 2021-07-28 16:10 UTC
To: Dmitry Gutov; +Cc: r12451428287, stephen_leake, manuel, emacs-devel

> Cc: stephen_leake@stephe-leake.org, manuel@ledu-giraud.fr, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 28 Jul 2021 17:24:35 +0300
>
> On 28.07.2021 16:27, Eli Zaretskii wrote:
> > The whole point of using TS is NOT to require that
> > the Emacs development team or Emacs users should know enough about
> > parsing of the many languages Emacs supports to modify the grammar.
> > We want another, independent development team to take care of that,
>
> I think we know both, though? There are a number of niche languages that
> only Emacs supports.
>
> Or at least that aren't likely to get good support in TreeSitter.

Either someone motivated will write a TS grammar for them, or they will continue to be supported by "other means".

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Perry E. Metzger @ 2021-07-28 16:24 UTC
To: Eli Zaretskii, emacs-devel

On 7/28/21 12:10, Eli Zaretskii wrote:
>> There are a number of niche languages that only Emacs supports.
>> Or at least that aren't likely to get good support in TreeSitter.
> Either someone motivated will write a TS grammar for them, or they
> will continue to be supported by "other means".
>

It would be nice, of course, if people would contribute new grammars to Tree Sitter, as that will benefit everyone using a tool that works with Tree Sitter.

Perry

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Eli Zaretskii @ 2021-07-28 16:29 UTC
To: Perry E. Metzger; +Cc: emacs-devel

> Date: Wed, 28 Jul 2021 12:24:54 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
>
> On 7/28/21 12:10, Eli Zaretskii wrote:
> >> There are a number of niche languages that only Emacs supports.
> >> Or at least that aren't likely to get good support in TreeSitter.
> > Either someone motivated will write a TS grammar for them, or they
> > will continue to be supported by "other means".
> >
> It would be nice, of course, if people would contribute new grammars to
> Tree Sitter, as that will benefit everyone using a tool that works with
> Tree Sitter.

Yes, it would be nice.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-29 23:12 UTC
To: Eli Zaretskii; +Cc: Andrei Kuznetsov, manuel, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrei Kuznetsov <r12451428287@163.com>
>> Date: Wed, 28 Jul 2021 19:48:18 +0800
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
>>
>> Manuel Giraud <manuel@ledu-giraud.fr> writes:
>>
>> > I too did not follow the tree-sitter discussion closely. But AFAIU,
>> > tree-sitter provides tools to generate a parser (in C) from a grammar.
>>
>> If that is the case, it certainly seems grave! I don't think an Emacs
>> that requires source modifications for extending vital editing
>> functionality is a good idea.
>
> TS's code is written in plain C, and doesn't require any regeneration
> or source modifications. Anything else is misunderstanding.

That's true for the common TS runtime, which implements the parser and error recovery, but the code for each language, which builds the LR parse table and some other data structures, is generated in C from a grammar file written in JavaScript, and must be linked into Emacs somehow. In addition, some languages require an "external scanner", which is more code in C that is specific to the language.

Ideally, there would be some sort of plugin mechanism, so new languages could be added at run time; maybe we could add a protocol on top of Emacs modules. I don't know how Yuan is handling this now.

--
-- Stephe

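Stephen describes a split between a generic runtime and generated tables for each language. A toy example can illustrate that split. The Python sketch below is only an analogy and is not tree-sitter's code. The hand-written `ACTION`, `GOTO` and `RULES` tables stand in for what a parser generator would emit for a two-rule grammar. The `parse` function stands in for the shared runtime, which stays the same when another language's tables are added. Tree-sitter's real generated tables, lexer and error recovery are far more elaborate.

```python
# Language-specific data: what a generator might emit for the toy grammar
#   rule 1:  E -> E "+" "n"
#   rule 2:  E -> "n"
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule -> (left-hand side, right-hand side length)
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


# Generic runtime: the same driver works for any ACTION/GOTO/RULES tables.
def parse(tokens, action=ACTION, goto=GOTO, rules=RULES):
    states = [0]                   # LR state stack
    nodes = []                     # syntax-tree fragments built so far
    tokens = list(tokens) + ["$"]  # "$" marks end of input
    i = 0
    while True:
        act = action.get((states[-1], tokens[i]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[i]!r} at token {i}")
        kind, arg = act
        if kind == "shift":
            states.append(arg)
            nodes.append(tokens[i])
            i += 1
        elif kind == "reduce":
            lhs, length = rules[arg]
            children = nodes[-length:]  # pop the right-hand side...
            del nodes[-length:]
            del states[-length:]
            nodes.append((lhs, children))  # ...and push one tree node for it
            states.append(goto[(states[-1], lhs)])
        else:  # accept
            return nodes[-1]
```

For example, `parse(["n", "+", "n"])` returns `("E", [("E", ["n"]), "+", "n"])`. By contrast, `parse(["n", "n"])` raises `SyntaxError`, because state 2 has no action for a second `n`.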
* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Yuan Fu @ 2021-07-29 23:21 UTC
To: Stephen Leake; +Cc: Andrei Kuznetsov, Eli Zaretskii, manuel, emacs-devel

> On Jul 29, 2021, at 7:12 PM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Andrei Kuznetsov <r12451428287@163.com>
>>> Date: Wed, 28 Jul 2021 19:48:18 +0800
>>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org
>>>
>>> Manuel Giraud <manuel@ledu-giraud.fr> writes:
>>>
>>>> I too did not follow the tree-sitter discussion closely. But AFAIU,
>>>> tree-sitter provides tools to generate a parser (in C) from a grammar.
>>>
>>> If that is the case, it certainly seems grave! I don't think an Emacs
>>> that requires source modifications for extending vital editing
>>> functionality is a good idea.
>>
>> TS's code is written in plain C, and doesn't require any regeneration
>> or source modifications. Anything else is misunderstanding.
>
> That's true for the common TS runtime, which implements the parser and
> error recovery, but the code for each language, which builds the LR parse
> table and some other data structures, is generated in C from a grammar
> file written in JavaScript, and must be linked into Emacs somehow.

Languages don’t need to be linked into Emacs. They can be in dynamic modules.

Yuan

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-30 18:38 UTC
To: Yuan Fu; +Cc: Andrei Kuznetsov, Eli Zaretskii, manuel, emacs-devel

Yuan Fu <casouri@gmail.com> writes:

>> On Jul 29, 2021, at 7:12 PM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, which builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in JavaScript, and must be linked into Emacs somehow.
>
> Languages don’t need to be linked into Emacs. They can be in dynamic
> modules.

Dynamic modules are linked, at run time. That's how the code that calls them knows what addresses to call.

So I think you are saying the tree-sitter runtime will be linked into Emacs at Emacs compile time, while the languages can be linked in at run time. That's good.

--
-- Stephe

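Stephen distinguishes linking at compile time from linking at run time. Python's `ctypes` can illustrate the run-time case, since on POSIX systems it calls `dlopen()`/`dlsym()` under the hood. This is a minimal sketch, and nothing here is Emacs's actual loading code. It assumes a hypothetical packaging convention in which each grammar is built as a shared library named `libtree-sitter-<lang>.so` (`.dylib` on macOS). The `tree_sitter_<lang>` entry point, which returns an opaque `const TSLanguage *`, is the function that a tree-sitter-generated `parser.c` defines.

```python
import ctypes
import os
import sys


def grammar_names(lang):
    """Return (shared-library file name, entry-point symbol) for a grammar.

    The file name follows a hypothetical packaging convention (POSIX only);
    tree_sitter_<lang> is the function a generated parser.c defines.
    """
    suffix = ".dylib" if sys.platform == "darwin" else ".so"
    return f"libtree-sitter-{lang}{suffix}", f"tree_sitter_{lang}"


def load_grammar(lang, search_dir):
    """Link a grammar into the running process and return its language pointer.

    Returns None if the library or its entry point cannot be found.
    """
    libname, symbol = grammar_names(lang)
    try:
        # dlopen(): the library is mapped and linked now, at run time.
        lib = ctypes.CDLL(os.path.join(search_dir, libname))
    except OSError:
        return None
    try:
        # dlsym(): look up the address of the exported entry point.
        entry = getattr(lib, symbol)
    except AttributeError:
        return None
    entry.argtypes = []
    entry.restype = ctypes.c_void_p  # opaque `const TSLanguage *`
    return entry()
```

Suppose a grammar were compiled as `libtree-sitter-c.so` in some directory. In that case, `load_grammar("c", that_directory)` would return the address that a host then passes to the runtime's `ts_parser_set_language`. The host itself needs no recompilation.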
* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-30 0:41 UTC
To: Stephen Leake; +Cc: Eli Zaretskii, manuel, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

> That's true for the common TS runtime, which implements the parser and
> error recovery, but the code for each language, which builds the LR parse
> table and some other data structures, is generated in C from a grammar
> file written in JavaScript, and must be linked into Emacs somehow. In
> addition, some languages require an "external scanner", which is more
> code in C that is specific to the language.

Interesting. I assume it would be possible to reuse the source grammar files?

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Arthur Miller @ 2021-07-30 12:06 UTC
To: Andrei Kuznetsov; +Cc: Eli Zaretskii, Stephen Leake, manuel, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, which builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in JavaScript, and must be linked into Emacs somehow. In
>> addition, some languages require an "external scanner", which is more
>> code in C that is specific to the language.
>
> Interesting. I assume it would be possible to reuse the source grammar
> files?

It probably is, and looking at Neovim's GitHub repo, there are some instructions on how to create a grammar for a new language:

https://github.com/nvim-treesitter/nvim-treesitter

The process could probably be automated from Lisp somehow.

I do have a sincere question about this entire tree-sitter venture, though. Is it really worth the trouble in Emacs's case? As I understand TS, it is a specialized regex matcher, and looking at some language specs leaves me with that feeling (for example the grammar for bash):

https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json

I understand that having a specialized regex matcher is more efficient than the generalized regex matcher that current font-locking in Emacs relies upon, but is it *that* much more efficient to be worth the extra trouble? TS seems to keep state (a node) for each character typed, which would consume a lot of memory in some big files.

If the syntax tree it keeps in order to do what it does can be reused for something else, then it could be very useful, but just for syntax highlighting and indentation?

Some years ago, when opening some of the 10k-line files found in the Emacs src dir, I noticed some slowdown in font lock. But nowadays I don't experience any hiccups with syntax highlighting or indentation.

Anyway, it is very educational to see TS get merged into Emacs and to read Eli's tips and guidance about Emacs internals.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Óscar Fuentes @ 2021-07-30 12:52 UTC
To: emacs-devel

Arthur Miller <arthur.miller@live.com> writes:

> I understand that having a specialized regex matcher is more efficient than
> the generalized regex matcher that current font-locking in Emacs relies
> upon, but is it *that* much more efficient to be worth the extra trouble?

AFAIU this is not about efficiency, but mainly about correctness (modern languages are increasingly more difficult to analyze) and also about decreasing the maintenance load. In the process, Emacs gets support for some new languages too.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Arthur Miller @ 2021-07-30 13:30 UTC
To: Óscar Fuentes; +Cc: emacs-devel

Óscar Fuentes <ofv@wanadoo.es> writes:

> Arthur Miller <arthur.miller@live.com> writes:
>
>> I understand that having a specialized regex matcher is more efficient than
>> the generalized regex matcher that current font-locking in Emacs relies
>> upon, but is it *that* much more efficient to be worth the extra trouble?
>
> AFAIU this is not about efficiency, but mainly about correctness (modern
> languages are increasingly more difficult to analyze)

OK, I understand, and I can buy that one. The question is whether it is still worth it just for syntax highlighting and indentation. If I get some spurious color here or there, or something sometimes not colored, do I care?

Can that syntax tree of TS be exposed to Lisp and used for some other purposes, or is it just internal to TS and the only output we see is some colors on the screen?

> and also about
> decreasing the maintenance load.

Sure, but it is also a limitation. If Emacs will rely on TS maintainers to create new grammars and update existing ones when a language changes, it means Emacs users will have to wait for changes until they are fixed upstream, similar to how GNU/Linux distros work regarding packaging. Of course, a user who wishes to modify or introduce a new language can always rely on the old font-lock or go through the pain of TS tooling based on JS and custom tools. A Lisp frontend to that toolchain can probably be developed, but that is even more work.

> In the process, Emacs gets support for
> some new languages too.

Yes, it is always nice I guess :). Is there really demand for some language currently provided in TS and not in Emacs?

I don't know, I am maybe overly sceptical of TS; I don't mean it is a bad package, and I am sure it has its place in other editors, I am just not sure how it fits in Emacs where everything is easily configurable and extensible.

* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Ergus @ 2021-07-30 13:57 UTC
To: Arthur Miller; +Cc: Óscar Fuentes, emacs-devel

On Fri, Jul 30, 2021 at 03:30:42PM +0200, Arthur Miller wrote:
>Óscar Fuentes <ofv@wanadoo.es> writes:
>
>> Arthur Miller <arthur.miller@live.com> writes:
>>
>>> I understand that having a specialized regex matcher is more efficient than
>>> the generalized regex matcher that current font-locking in Emacs relies
>>> upon, but is it *that* much more efficient to be worth the extra trouble?
>>
>> AFAIU this is not about efficiency, but mainly about correctness (modern
>> languages are increasingly more difficult to analyze)
>
>OK, I understand, and I can buy that one. The question is whether it is
>still worth it just for syntax highlighting and indentation. If I get
>some spurious color here or there, or something sometimes not colored,
>do I care?
>

Yes, we care. Syntax highlighting is a basic feature for an editor in 2021.

>Can that syntax tree of TS be exposed to Lisp and used for some other
>purposes,

This is the idea: use the tree for navigation commands like up-list or goto-defun, for example. Maybe not the tree directly, but the information it provides (maybe calling TS function wrappers or setting the TS information as text properties).

>or is it just internal to TS and the only output we see is some
>colors on the screen?
>

How we use it is more a design choice. We can access the tree information with the TS API, or we can just put the tree's information in text properties... imagination is the limit ;)

>> and also about
>> decreasing the maintenance load.
>Sure, but it is also a limitation. If Emacs will rely on TS maintainers
>to create new grammars and update existing ones when a language changes,
>it means Emacs users will have to wait for changes until they are
>fixed upstream, similar to how GNU/Linux distros work regarding
>packaging. Of course, a user who wishes to modify or introduce a new
>language can always rely on the old font-lock or go through the pain of
>TS tooling based on JS and custom tools. A Lisp frontend to that
>toolchain can probably be developed, but that is even more work.
>

Honestly, creating a grammar for TS is much simpler than creating a mode with font-lock, navigation commands, indentation rules and some flymake support. All the modes using TS will be a bit more consistent in colors and keybindings. (Now we have modes where all commands use different prefixes, or that lack navigation, or that follow different indentation customs, so using them is like learning a different editor for every language.)

>> In the process, Emacs gets support for
>> some new languages too.
>
>Yes, it is always nice I guess :). Is there really demand for some
>language currently provided in TS and not in Emacs?
>

Indeed. As I mentioned before, web developers are using VSCode or Neovim because Angular, React, Node.js and Python are poorly supported (compared to VSCode or Sublime). Rust has very limited support in Emacs, so users rely on external packages like rust-mode, elpy or anaconda that introduce different bindings and collisions, and require complex setups for the basics.

>I don't know, I am maybe overly sceptical of TS; I don't mean it is a
>bad package, and I am sure it has its place in other editors, I am just
>not sure how it fits in Emacs where everything is easily configurable
>and extensible.
>

It is just a good trade-off, configurable enough for 99% of the use cases. Unless we expect all users to be advanced Lisp hackers who customize their font-locking, indentation and navigation functions for every single prog-mode.

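Ergus suggests using the tree for navigation commands such as goto-defun. That idea can be sketched in a few lines. This is a hypothetical Python model, not the tree-sitter or Emacs API. Each node carries a type and a half-open [start, end) character range. The node type names only mimic those of tree-sitter's C grammar, and the tree is built by hand rather than parsed. `beginning_of` walks from the root down to the deepest node containing point. It then returns the start of the nearest enclosing node of the requested type, which is roughly what a tree-based beginning-of-defun would do.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    children: list = field(default_factory=list)


def path_to(root, pos):
    """Return the nodes from root down to the deepest node containing pos."""
    chain = []
    node = root
    while node.start <= pos < node.end:
        chain.append(node)
        for child in node.children:
            if child.start <= pos < child.end:
                node = child
                break
        else:
            break  # no child contains pos: node is the innermost one
    return chain


def beginning_of(node_type, root, pos):
    """Start offset of the innermost node of node_type enclosing pos, or None."""
    for node in reversed(path_to(root, pos)):
        if node.type == node_type:
            return node.start
    return None


# Hand-built tree for SRC; node type names mimic tree-sitter's C grammar.
SRC = "int f(void) { return 1; }"
TREE = Node("translation_unit", 0, 25, [
    Node("function_definition", 0, 25, [
        Node("primitive_type", 0, 3),
        Node("function_declarator", 4, 11),
        Node("compound_statement", 12, 25, [
            Node("return_statement", 14, 23),
        ]),
    ]),
])
```

For instance, with point on the `1` (offset 21), `beginning_of("function_definition", TREE, 21)` returns 0. Likewise, `beginning_of("compound_statement", TREE, 21)` returns 12, the position of the opening brace.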
* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Arthur Miller @ 2021-07-30 14:52 UTC
To: Ergus; +Cc: Óscar Fuentes, emacs-devel

Ergus <spacibba@aol.com> writes:

> On Fri, Jul 30, 2021 at 03:30:42PM +0200, Arthur Miller wrote:
>>Óscar Fuentes <ofv@wanadoo.es> writes:
>>
>>> Arthur Miller <arthur.miller@live.com> writes:
>>>
>>>> I understand that having a specialized regex matcher is more efficient than
>>>> the generalized regex matcher that current font-locking in Emacs relies
>>>> upon, but is it *that* much more efficient to be worth the extra trouble?
>>>
>>> AFAIU this is not about efficiency, but mainly about correctness (modern
>>> languages are increasingly more difficult to analyze)
>>
>>OK, I understand, and I can buy that one. The question is whether it is
>>still worth it just for syntax highlighting and indentation. If I get
>>some spurious color here or there, or something sometimes not colored,
>>do I care?
>>
>
> Yes, we care. Syntax highlighting is a basic feature for an editor in 2021.

Of course, but I didn't mean Emacs should be without one, wtf, it's not all or nothing :). What I said is: do I really care if a file of 10k source lines has a word here or there not highlighted, which I haven't noticed with the current implementation either.

>>Can that syntax tree of TS be exposed to Lisp and used for some other
>>purposes,
>
> This is the idea: use the tree for navigation commands like up-list or
> goto-defun, for example. Maybe not the tree directly, but the information
> it provides (maybe calling TS function wrappers or setting the TS
> information as text properties).

OK, that might be useful.

> Indeed. As I mentioned before, web developers are using VSCode or Neovim
> because Angular, React, Node.js and Python are poorly supported
> (compared to VSCode or Sublime). Rust has very limited support in
> Emacs, so users rely on external packages like rust-mode, elpy or
> anaconda that introduce different bindings and collisions, and require
> complex setups for the basics.

Don't we rely on external packages for lots of things? Almost all of the external packages you mentioned provide more than just syntax highlighting and indentation, so we will probably continue to use them for other reasons even when TS enters Emacs.

> Unless we expect all users to be advanced Lisp hackers who
> customize their font-locking, indentation and navigation functions for
> every single prog-mode.

Is it considered advanced Lisp hackery to add extra keywords to font-lock in one's init file? I always think of myself as an Elisp noob. Thanks for boosting my ego :-).

Don't get me wrong, I mean nothing bad; I just find the answers a tad too extreme for my taste. But thanks for the input, it is an interesting read. I guess I'll be less sceptical and see what TS brings. Anyway, thanks for all the work to all of you who work on it.

* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 13:30 ` Arthur Miller 2021-07-30 13:57 ` Ergus @ 2021-07-30 13:59 ` Eli Zaretskii 2021-07-30 15:45 ` Arthur Miller 1 sibling, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-30 13:59 UTC (permalink / raw) To: Arthur Miller; +Cc: ofv, emacs-devel > From: Arthur Miller <arthur.miller@live.com> > Date: Fri, 30 Jul 2021 15:30:42 +0200 > Cc: emacs-devel@gnu.org > > > and also about > > decreasing the maintenance load. > Sure, but it is also a limitation. If Emacs will rely on TS maintainers > to create new grammars and update existing ones when a language changes, > it means Emacs users will have to wait for changes until they are > fixed upstream, similar to how gnu/linux distros work regarding > packaging. We have the same "problem" with every other library we use: the image libraries, GnuTLS, HarfBuzz, etc. Besides, TS is used by quite a few projects, so how long do you think it will take for serious problems in language support to be fixed? OTOH, take a look at some places in Emacs that don't have active maintainers: problems there sometimes take forever to fix. This is what happens when a project wants to control everything in its domain, but lacks manpower for doing so. It is not reasonable to expect Emacs to have experts on board for parsing every language on the face of Earth. It won't work. > I don't know, I am maybe overly sceptical of TS; I don't mean it is a > bad package, and I am sure it has its place in other editors, I am just > not sure how it fits in Emacs where everything is easily configurable > and extensible. Not everything. Again, take the other optional libraries we use as examples: they cannot be extended inside Emacs.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 13:59 ` Eli Zaretskii @ 2021-07-30 15:45 ` Arthur Miller 0 siblings, 0 replies; 59+ messages in thread From: Arthur Miller @ 2021-07-30 15:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: ofv, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Arthur Miller <arthur.miller@live.com> >> Date: Fri, 30 Jul 2021 15:30:42 +0200 >> Cc: emacs-devel@gnu.org >> >> > and also about >> > decreasing the maintenance load. >> Sure, but it is also a limitation. If Emacs will rely on TS maintainers >> to create new grammars and update existing ones when a language changes, >> it means Emacs users will have to wait for changes until they are >> fixed upstream, similar to how gnu/linux distros work regarding >> packaging. > > We have the same "problem" with every other library we use: the image > libraries, GnuTLS, HarfBuzz, etc. > > Besides, TS is used by quite a few projects, so how long do you think > it will take for serious problems in language support to be fixed? Yes, I understand that; I didn't mean problems so much as general configurability according to personal preferences, and extensibility. That is what people seem to praise on Reddit when it comes to Emacs. > OTOH, take a look at some places in Emacs that don't have active > maintainers: problems there sometimes take forever to fix. This is > what happens when a project wants to control everything in its domain, > but lacks manpower for doing so. > > It is not reasonable to expect Emacs to have experts on board for > parsing every language on the face of Earth. It won't work. > >> I don't know, I am maybe overly sceptical of TS; I don't mean it is a >> bad package, and I am sure it has its place in other editors, I am just >> not sure how it fits in Emacs where everything is easily configurable >> and extensible. > > Not everything. Again, take the other optional libraries we use as > examples: they cannot be extended inside Emacs. 
TS is a somewhat special library, since it hooks into a part of Emacs that people do extend often, but I do understand it is just a library that adds some extra value like the others. I guess you are correct about its popularity among other projects; that might indeed work out for Emacs. As Ergus pointed out, it will bring a lot out of the box to many people, so I guess it is a win at least in that respect. Thanks for the answer.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 12:06 ` Arthur Miller 2021-07-30 12:52 ` Óscar Fuentes @ 2021-07-30 13:32 ` Ergus 2021-07-30 15:07 ` Arthur Miller 2021-08-02 22:13 ` Perry E. Metzger 2 siblings, 1 reply; 59+ messages in thread From: Ergus @ 2021-07-30 13:32 UTC (permalink / raw) To: Arthur Miller Cc: Andrei Kuznetsov, Eli Zaretskii, Stephen Leake, manuel, emacs-devel On Fri, Jul 30, 2021 at 02:06:00PM +0200, Arthur Miller wrote: >Andrei Kuznetsov <r12451428287@163.com> writes: > >> Stephen Leake <stephen_leake@stephe-leake.org> writes: >> >>> That's true for the common TS runtime, which implements the parser and >>> error recovery, but the code for each language, that builds the LR parse >>> table and some other data structures, is generated in C from a grammar >>> file written in javascript, and must be linked into Emacs somehow. In >>> addition, some languages require an "external scanner", which is more >>> code in C that is specific to the language. >> >> Interesting. I assume it would be possible to reuse the source grammar >> files? > >It probably is, and looking at neovim's gh repo, there are some >instructions on how to create a grammar for a new language: > >https://github.com/nvim-treesitter/nvim-treesitter > >The process could probably be somehow automated from lisp. > >I do have a sincere question about this entire tree-sitter >venture, though. Is it really worth the trouble in Emacs's case? As I understand TS it >is a specialized regex matcher, and looking at some language specs leaves >me with that feeling (for example the grammar for bash): > >https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json > >I understand that having a specialized regex matcher is more efficient than >some generalized regular matcher current font-locking in Emacs relies >upon, but is it *that* much more efficient to be worth the extra trouble? 
>TS seems to keep state (a node) for each character typed; that will be a >lot of memory consumed in some big files. If this syntax tree it keeps >to implement what it does can be re-used for something else, then it >could be very useful, but just for syntax highlighting and indentation? >Some years ago, when opening some of the 10k-line files found in the Emacs src dir, I >noticed some slowdown in font lock. But nowadays I don't experience any >hiccups with syntax highlighting or indentation. > >Anyway, it is very educational to see TS get merged into Emacs and to read >Eli's tips and guidance about Emacs internals. > The TS thing came out due to some issues in the c-mode highlighting reported in that thread: correctness and speed (slowing down things like scrolling). c-mode does its best, but C++ is evolving, and more complex analysis comes with a performance penalty and more and more code complexity in the parser. The same happens with many new, widely used languages. It will be very difficult to implement a complete/competitive mode like c-mode for all the new languages that are very popular today (rust, typescript; even python). So we end up having some "weak" modes with inconsistencies and different bindings and color themes. Those become unmaintained after a time because the developers migrate to more complete editors/IDEs, and new developers just don't come to emacs because it does not satisfy their needs to start with. Probably I am wrong but 99% of the web developers (React, Nodejs, Angular) are using VSCode, the rest are with neovim; so we don't even have people with enough knowledge and motivation to implement one of those in Emacs one by one. That is because these languages are more complex to analyze and because we don't have people to maintain a mode for all of them. Trying to do so will spend too much developer time reinventing what TS already does (and does right, efficiently, and with a support community). 
So, maintaining a mode for every language we currently don't support is not scalable over time. And reimplementing a replacement for TS in Elisp won't be worth it and will end up being very slow and repeating all the errors that TS developers have already solved. TS may be useful not only for syntax highlight and indentation but also for code navigation and some basic syntax checking. Basically TS is: One "infrastructure" to rule them all.
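Ergus's suggestion of reusing the parse tree for navigation commands such as up-list or goto-defun can be illustrated with a small C sketch. It assumes a hypothetical, hand-built `Node` struct (type name, byte range, children) rather than tree-sitter's real API, which exposes nodes through accessor functions such as `ts_node_type` and `ts_node_start_byte`. The function descends from the root toward a buffer position and remembers the innermost ancestor of the requested type, which is roughly the lookup a tree-based beginning-of-defun command would need.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified syntax-tree node (NOT tree-sitter's TSNode). */
typedef struct Node {
    const char *type;        /* grammar symbol, e.g. "function_definition" */
    size_t start, end;       /* byte range [start, end) in the buffer */
    struct Node **children;  /* child nodes, ordered by position */
    size_t child_count;
} Node;

/* Return the innermost node of TYPE whose range contains POS, or NULL.
   Walks from ROOT down to the deepest node covering POS, remembering
   the last node of the requested type seen on the way. */
const Node *enclosing_node_of_type(const Node *root, size_t pos,
                                   const char *type)
{
    const Node *best = NULL;
    const Node *cur = root;

    while (cur != NULL && pos >= cur->start && pos < cur->end) {
        if (strcmp(cur->type, type) == 0)
            best = cur;                 /* innermost match so far */

        /* Descend into the child that covers POS, if any. */
        const Node *next = NULL;
        for (size_t i = 0; i < cur->child_count; i++) {
            const Node *c = cur->children[i];
            if (pos >= c->start && pos < c->end) {
                next = c;
                break;
            }
        }
        cur = next;
    }
    return best;
}
```

Because each step only descends into the child covering the position, the walk visits one node per tree level (plus a scan of that node's children) instead of re-scanning buffer text, as a regexp-based search for a defun start would.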
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 13:32 ` Ergus @ 2021-07-30 15:07 ` Arthur Miller 0 siblings, 0 replies; 59+ messages in thread From: Arthur Miller @ 2021-07-30 15:07 UTC (permalink / raw) To: Ergus; +Cc: Andrei Kuznetsov, Eli Zaretskii, Stephen Leake, manuel, emacs-devel Ergus <spacibba@aol.com> writes: > Probably I am wrong but 99% of the web developers (React, Nodejs, > Angular) are using VSCode, the rest are with neovim; so we don't even > have people with enough knowledge and motivation to implement one of > those in Emacs one by one. That might be for other reasons as well, like the interaction model, wording and other idiosyncrasies of Emacs, as discussed in numerous threads about making Emacs popular, or because a certain company is backing VSCode, etc. There are other editors, such as Adobe's Brackets, which came before VSCode and is far less popular than VSCode. Looking at recent MS business moves (AI, Github, copilot ...), it is now understandable why they pour resources into a free code editor. I wondered why when they first released it; now the picture is clearer. I don't think Emacs, or hardly any other editor, can compete with MS; simply, nobody has that many resources. That is of course not an argument for or against TS, just a thought about why people prefer a tool. Yes, I agree with you that syntax highlighting out of the box for a certain library like Node or Vue might help Emacs. I have nothing against that argument. > TS may be useful not only > for syntax highlight and indentation but also for code navigation and > some basic syntax checking. Yes, that would be a nice thing, if it could be used for more than just syntax highlighting and indentation.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 12:06 ` Arthur Miller 2021-07-30 12:52 ` Óscar Fuentes 2021-07-30 13:32 ` Ergus @ 2021-08-02 22:13 ` Perry E. Metzger 2 siblings, 0 replies; 59+ messages in thread From: Perry E. Metzger @ 2021-08-02 22:13 UTC (permalink / raw) To: emacs-devel On 7/30/21 08:06, Arthur Miller wrote: > I understand that having a specialized regex matcher is more efficient than > some generalized regular matcher current font-locking in Emacs relies > upon, but is it *that* much more efficient to be worth the extra trouble? It is not a question of efficiency. You cannot parse a context-free grammar using regular expressions. The reason that almost all our highlight modes produce random garbage throughout is that you cannot parse a context-free grammar using regular expressions. (For many languages, correctness isn't just occasionally violated, it's generally violated.) Reliable highlighting regardless of code formatting, reliable indentation assistance, reliable code folding, and other such features require that the editor be able to both parse the program being edited _and_ that the editor be able to incrementally re-parse it as it changes in minimal time. Other editors now have such features and make good use of them. Highly reliable code folding alone is worth the price of admission IMHO. (Currently, the best we can do for code folding is assume that the indentation is correct.) LSP has been revolutionary in improving the programmer's experience in Emacs. Tree Sitter will provide significant additional improvement. > TS seems to keep state (a node) for each character typed; that will be a > lot of memory consumed in some big files. No one will be forced to turn it on. I, however, almost certainly will. My productivity is more important to me than my RAM budget. Those that don't like it, though, won't have to pay the RAM tax. Perry
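Perry's point about context-free grammars can be made concrete with nested brackets, the textbook example of a language that no single regular expression recognizes. A finite automaton has a fixed number of states, so it can only track nesting up to some fixed depth, while checking arbitrary nesting needs unbounded memory. The minimal C sketch below uses a stack for that. It is an illustration only and deliberately ignores brackets inside strings and comments, which a real grammar would have to handle.

```c
#include <stddef.h>

/* Return the maximum nesting depth of (), [] and {} in SRC, or -1 if
   the brackets are unbalanced or mismatched.  Simplification: brackets
   inside string literals and comments are NOT skipped. */
int bracket_depth(const char *src)
{
    char stack[1024];   /* expected closers; the cap is a sketch artifact */
    int top = 0, max_depth = 0;

    for (const char *p = src; *p != '\0'; p++) {
        char close;
        switch (*p) {
        case '(': close = ')'; break;
        case '[': close = ']'; break;
        case '{': close = '}'; break;
        case ')': case ']': case '}':
            if (top == 0 || stack[top - 1] != *p)
                return -1;          /* closer without a matching opener */
            top--;
            continue;
        default:
            continue;               /* not a bracket */
        }
        if (top == (int)sizeof stack)
            return -1;              /* deeper than this sketch supports */
        stack[top++] = close;       /* remember which closer we expect */
        if (top > max_depth)
            max_depth = top;
    }
    return top == 0 ? max_depth : -1;   /* leftover openers: unbalanced */
}
```

A regular expression can be written for any one fixed depth, but each extra level makes the pattern larger and no single pattern covers every depth; a parser, such as the ones tree-sitter generates, keeps exactly this kind of stack.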
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 0:41 ` Andrei Kuznetsov 2021-07-30 12:06 ` Arthur Miller @ 2021-07-30 18:42 ` Stephen Leake 1 sibling, 0 replies; 59+ messages in thread From: Stephen Leake @ 2021-07-30 18:42 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: Eli Zaretskii, manuel, emacs-devel Andrei Kuznetsov <r12451428287@163.com> writes: > Stephen Leake <stephen_leake@stephe-leake.org> writes: > >> That's true for the common TS runtime, which implements the parser and >> error recovery, but the code for each language, that builds the LR parse >> table and some other data structures, is generated in C from a grammar >> file written in javascript, and must be linked into Emacs somehow. In >> addition, some languages require an "external scanner", which is more >> code in C that is specific to the language. > > Interesting. I assume it would be possible to reuse the source grammar > files? If they are licensed as free software, yes, of course. What sort of reuse do you have in mind? -- -- Stephe
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-29 23:12 ` Stephen Leake 2021-07-29 23:21 ` Yuan Fu 2021-07-30 0:41 ` Andrei Kuznetsov @ 2021-07-30 6:05 ` Eli Zaretskii 2021-07-31 12:12 ` Stephen Leake 2 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-30 6:05 UTC (permalink / raw) To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Cc: Andrei Kuznetsov <r12451428287@163.com>, manuel@ledu-giraud.fr, > emacs-devel@gnu.org > Date: Thu, 29 Jul 2021 16:12:56 -0700 > > > TS's code is written in plain C, and doesn't require any regeneration > > or source modifications. Anything else is misunderstanding. > > That's true for the common TS runtime, which implements the parser and > error recovery, but the code for each language, that builds the LR parse > table and some other data structures, is generated in C from a grammar > file written in javascript, and must be linked into Emacs somehow. That "linking" happens when Emacs is linked against the TS library, right?
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-30 6:05 ` Eli Zaretskii @ 2021-07-31 12:12 ` Stephen Leake 2021-07-31 13:07 ` Eli Zaretskii 0 siblings, 1 reply; 59+ messages in thread From: Stephen Leake @ 2021-07-31 12:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: r12451428287, manuel, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Cc: Andrei Kuznetsov <r12451428287@163.com>, manuel@ledu-giraud.fr, >> emacs-devel@gnu.org >> Date: Thu, 29 Jul 2021 16:12:56 -0700 >> >> > TS's code is written in plain C, and doesn't require any regeneration >> > or source modifications. Anything else is misunderstanding. >> >> That's true for the common TS runtime, which implements the parser and >> error recovery, but the code for each language, that builds the LR parse >> table and some other data structures, is generated in C from a grammar >> file written in javascript, and must be linked into Emacs somehow. > > That "linking" happens when Emacs is linked against the TS library, > right? I don't know what you mean by "the TS library". I'm guessing you mean the tree-sitter runtime, in which case no, that does not include any languages. -- -- Stephe
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-31 12:12 ` Stephen Leake @ 2021-07-31 13:07 ` Eli Zaretskii 2021-07-31 16:55 ` Stephen Leake 0 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-31 13:07 UTC (permalink / raw) To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Cc: r12451428287@163.com, manuel@ledu-giraud.fr, emacs-devel@gnu.org > Date: Sat, 31 Jul 2021 05:12:54 -0700 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> From: Stephen Leake <stephen_leake@stephe-leake.org> > >> Cc: Andrei Kuznetsov <r12451428287@163.com>, manuel@ledu-giraud.fr, > >> emacs-devel@gnu.org > >> Date: Thu, 29 Jul 2021 16:12:56 -0700 > >> > >> > TS's code is written in plain C, and doesn't require any regeneration > >> > or source modifications. Anything else is misunderstanding. > >> > >> That's true for the common TS runtime, which implements the parser and > >> error recovery, but the code for each language, that builds the LR parse > >> table and some other data structures, is generated in C from a grammar > >> file written in javascript, and must be linked into Emacs somehow. > > > > That "linking" happens when Emacs is linked against the TS library, > > right? > > I don't know what you mean by "the TS library". I mean libtree-sitter.a produced by building the library. > I'm guessing you mean the tree-sitter runtime, in which case no, that > does not include any languages. "Include" in what sense?
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-31 13:07 ` Eli Zaretskii @ 2021-07-31 16:55 ` Stephen Leake 2021-07-31 17:12 ` Eli Zaretskii 0 siblings, 1 reply; 59+ messages in thread From: Stephen Leake @ 2021-07-31 16:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: r12451428287, manuel, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Cc: r12451428287@163.com, manuel@ledu-giraud.fr, emacs-devel@gnu.org >> Date: Sat, 31 Jul 2021 05:12:54 -0700 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> >> Cc: Andrei Kuznetsov <r12451428287@163.com>, manuel@ledu-giraud.fr, >> >> emacs-devel@gnu.org >> >> Date: Thu, 29 Jul 2021 16:12:56 -0700 >> >> >> >> > TS's code is written in plain C, and doesn't require any regeneration >> >> > or source modifications. Anything else is misunderstanding. >> >> >> >> That's true for the common TS runtime, which implements the parser and >> >> error recovery, but the code for each language, that builds the LR parse >> >> table and some other data structures, is generated in C from a grammar >> >> file written in javascript, and must be linked into Emacs somehow. >> > >> > That "linking" happens when Emacs is linked against the TS library, >> > right? >> >> I don't know what you mean by "the TS library". > > I mean libtree-sitter.a produced by building the library. > >> I'm guessing you mean the tree-sitter runtime, in which case no, that >> does not include any languages. > > "Include" in what sense? There is no code in libtree-sitter.a that provides a language; all languages are built separately, by the language developers. https://github.com/tree-sitter/tree-sitter builds libtree-sitter.a, and the command line tools to build a language. https://github.com/tree-sitter/tree-sitter-python builds the object file providing the python language. 
There are many other languages, each with its own repository. -- -- Stephe
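To make the split Stephen describes concrete: by tree-sitter's convention, each grammar repository compiles to a separate object that exports one C function named `tree_sitter_<language>` (for example `tree_sitter_python`), returning a `const TSLanguage *`; libtree-sitter.a itself contains no such symbol. A host program therefore has to link or load every grammar on its own. The sketch below only derives the names a loader might look for. The `libtree-sitter-<language>.so` file name is an assumption made for illustration, not something tree-sitter mandates, and the actual loading step (for example via `dlopen`/`dlsym`) is left out.

```c
#include <stdio.h>
#include <string.h>

/* Build the conventional grammar entry-point name, e.g.
   "python" -> "tree_sitter_python", "c-sharp" -> "tree_sitter_c_sharp".
   Returns 0 on success, -1 if BUF is too small. */
int grammar_symbol(const char *lang, char *buf, size_t size)
{
    int n = snprintf(buf, size, "tree_sitter_%s", lang);
    if (n < 0 || (size_t)n >= size)
        return -1;                          /* name did not fit in BUF */
    /* C identifiers cannot contain '-', so map it to '_'. */
    for (char *p = strchr(buf, '-'); p != NULL; p = strchr(p + 1, '-'))
        *p = '_';
    return 0;
}

/* Hypothetical shared-library file name a loader might search for,
   e.g. "libtree-sitter-python.so".  This naming is an assumption. */
int grammar_library(const char *lang, char *buf, size_t size)
{
    int n = snprintf(buf, size, "libtree-sitter-%s.so", lang);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping the language name as the only input means that adding a grammar requires no change to the host program, only a new object or library that follows the naming convention.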
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-31 16:55 ` Stephen Leake @ 2021-07-31 17:12 ` Eli Zaretskii 0 siblings, 0 replies; 59+ messages in thread From: Eli Zaretskii @ 2021-07-31 17:12 UTC (permalink / raw) To: Stephen Leake; +Cc: r12451428287, manuel, emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Cc: r12451428287@163.com, manuel@ledu-giraud.fr, emacs-devel@gnu.org > Date: Sat, 31 Jul 2021 09:55:46 -0700 > > >> I don't know what you mean by "the TS library". > > > > I mean libtree-sitter.a produced by building the library. > > > >> I'm guessing you mean the tree-sitter runtime, in which case no, that > >> does not include any languages. > > > > "Include" in what sense? > > There is no code in libtree-sitter.a that provides a language; all > languages are built separately, by the language developers. > > https://github.com/tree-sitter/tree-sitter builds libtree-sitter.a, and > the command line tools to build a language. > > https://github.com/tree-sitter/tree-sitter-python builds the object file > providing the python language. > > There are many other languages, each with its own repository. We are talking past each other. But I don't think we should keep arguing about this, because there's no real disagreement here to argue about.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 3:53 ` [SPAM UNSURE] " Stephen Leake 2021-07-28 8:23 ` Manuel Giraud @ 2021-07-28 11:43 ` Andrei Kuznetsov 2021-07-28 11:50 ` Eli Zaretskii ` (3 more replies) 1 sibling, 4 replies; 59+ messages in thread From: Andrei Kuznetsov @ 2021-07-28 11:43 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel Stephen Leake <stephen_leake@stephe-leake.org> writes: > The tree-sitter runtime, that Emacs would link with, is implemented in > C, partly for this reason. It would be compiled with whatever Emacs is > compiled with, or the system compiler. Interesting. I was not aware of that. > Some of the tree-sitter development tools are implemented in Rust; you > only need Rust if you are developing/fixing a grammar for a language. If I understand this correctly, it means one would require the Rust toolchain to support new languages in tree-sitter, or to improve existing support. Would that really fit Emacs? I think many people might not be comfortable learning such a large language and toolchain to develop editing tools for Emacs. Furthermore, is there any concrete reason this could not be done in Lisp? Note: Somehow I sent a reply earlier, and not a follow-up. I apologize for the duplicate.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 11:43 ` Andrei Kuznetsov @ 2021-07-28 11:50 ` Eli Zaretskii 2021-07-28 12:06 ` Andrei Kuznetsov 2021-07-28 12:36 ` Ergus ` (2 subsequent siblings) 3 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-28 11:50 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel > From: Andrei Kuznetsov <r12451428287@163.com> > Date: Wed, 28 Jul 2021 19:43:03 +0800 > Cc: emacs-devel@gnu.org > > > Some of the tree-sitter development tools are implemented in Rust; you > > only need Rust if you are developing/fixing a grammar for a language. > > If I understand this correctly, it means one would require the Rust > toolchain to support new languages in tree-sitter, or to improve > existing support. Would that really fit Emacs? I think many people > might not be comfortable learning such a large language and toolchain to > develop editing tools for Emacs. > > Furthermore, is there any concrete reason this could not be done in > Lisp? This has been discussed. Patches to convert the TS grammar files to Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be most welcome. As usual with Free Software, it isn't an issue of what's desirable, it's an issue with someone stepping forward to do the job of developing this stuff. TIA
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 11:50 ` Eli Zaretskii @ 2021-07-28 12:06 ` Andrei Kuznetsov 2021-07-28 13:05 ` Eli Zaretskii 0 siblings, 1 reply; 59+ messages in thread From: Andrei Kuznetsov @ 2021-07-28 12:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > This has been discussed. Patches to convert the TS grammar files to > Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be > most welcome. Does "to maintain and develop them in Emacs Lisp" include facilities providing functionality similar to TS but not compatible with TS grammar files? If so I think I may have something up my sleeve, though in an early state.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 12:06 ` Andrei Kuznetsov @ 2021-07-28 13:05 ` Eli Zaretskii 2021-07-28 13:16 ` Andrei Kuznetsov 0 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-28 13:05 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel > From: Andrei Kuznetsov <r12451428287@163.com> > Cc: stephen_leake@stephe-leake.org, emacs-devel@gnu.org > Date: Wed, 28 Jul 2021 20:06:23 +0800 > > Eli Zaretskii <eliz@gnu.org> writes: > > > This has been discussed. Patches to convert the TS grammar files to > > Emacs Lisp and/or to maintain and develop them in Emacs Lisp will be > > most welcome. > > Does "to maintain and develop them in Emacs Lisp" include facilities > providing functionality similar to TS but not compatible with TS grammar > files? We are talking about the grammar files to be used by TS, so they should be compatible, of course.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 13:05 ` Eli Zaretskii @ 2021-07-28 13:16 ` Andrei Kuznetsov 0 siblings, 0 replies; 59+ messages in thread From: Andrei Kuznetsov @ 2021-07-28 13:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > We are talking about the grammar files to be used by TS, so they > should be compatible, of course. I see. Thanks for the clarification.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 11:43 ` Andrei Kuznetsov 2021-07-28 11:50 ` Eli Zaretskii @ 2021-07-28 12:36 ` Ergus 2021-07-28 13:07 ` Andrei Kuznetsov 2021-07-28 15:12 ` Perry E. Metzger 2021-07-29 4:35 ` Richard Stallman 3 siblings, 1 reply; 59+ messages in thread From: Ergus @ 2021-07-28 12:36 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: Stephen Leake, emacs-devel On Wed, Jul 28, 2021 at 07:43:03PM +0800, Andrei Kuznetsov wrote: >Stephen Leake <stephen_leake@stephe-leake.org> writes: > >> The tree-sitter runtime, that Emacs would link with, is implemented in >> C, partly for this reason. It would be compiled with whatever Emacs is >> compiled with, or the system compiler. > >Interesting. I was not aware of that. > >> Some of the tree-sitter development tools are implemented in Rust; you >> only need Rust if you are developing/fixing a grammar for a language. > >If I understand this correctly, it means one would require the Rust >toolchain to support new languages in tree-sitter, or to improve >existing support. Would that really fit Emacs? I think many people >might not be comfortable learning such a large language and toolchain to >develop editing tools for Emacs. > >Furthermore, is there any concrete reason this could not be done in >Lisp? > I would say: 1) Performance (discussed in the previous thread): 2) Not reinvent the wheel. Tree-sitter is very well maintained, optimized and uses very specialized algorithms; we lack the manpower to duplicate all that effort, and implementing it in lisp won't really be worth the effort and may be unmaintainable and slow. Tree-sitter hopefully won't get abandoned in the future because many editors use it right now (including neovim) and the community is very dynamic.
Another advantage is that with tree-sitter as a back-end we could officially (almost for free) support many languages that are currently not officially supported and may require a lot of effort to support even in a minimal way (or that are currently supported in some inconsistent way, with incoherent bindings/colors/indentation, e.g. Typescript, Rust, Julia) >Note: Somehow I sent a reply earlier, and not a follow-up. I apologize >for the duplicate. > >
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 12:36 ` Ergus @ 2021-07-28 13:07 ` Andrei Kuznetsov 2021-07-28 13:16 ` Eli Zaretskii 2021-07-29 23:25 ` [SPAM UNSURE] " Stephen Leake 0 siblings, 2 replies; 59+ messages in thread From: Andrei Kuznetsov @ 2021-07-28 13:07 UTC (permalink / raw) To: Ergus; +Cc: Stephen Leake, emacs-devel Ergus <spacibba@aol.com> writes: > 1) Performance (discussed in the previous thread): FWIW I have been experimenting with an incremental GLR parser generator in Emacs Lisp. While I have not put in the effort to couple it with font-lock and such, from anecdotal examination it does not perform badly with a naive C grammar. The initial parse does take several seconds on large files, but afterwards I did not notice a significant drop in editor responsiveness. > 2) Not reinvent the wheel. While tree-sitter may be nice and all, it doesn't seem to offer the usual extensibility expected from Emacs.
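The behavior Andrei describes, a slow initial parse followed by cheap updates, is typical of incremental parsers, which reuse the previous tree after each edit. One first step can be sketched in C as adjusting node byte ranges for a single edit: nodes entirely before the edit are untouched, nodes entirely after it shift by the change in length, and nodes that overlap it are flagged for re-parsing. The `Edit` and `Span` types are hypothetical. Tree-sitter's C API describes an edit with a similar `TSInputEdit` record (`start_byte`, `old_end_byte`, `new_end_byte`, plus row/column points) passed to `ts_tree_edit`, but this is a simplified illustration, not its algorithm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-edit record: bytes [start, old_end) of the old text
   were replaced by new text occupying [start, new_end). */
typedef struct {
    size_t start, old_end, new_end;
} Edit;

/* Hypothetical node range from the previous parse. */
typedef struct {
    size_t start, end;   /* byte range [start, end) */
    bool dirty;          /* true if the node must be re-parsed */
} Span;

/* Map one old byte position to its position after the edit. */
static size_t shift_pos(size_t pos, const Edit *e)
{
    if (pos >= e->old_end)
        return pos - e->old_end + e->new_end;  /* after the edit: shift */
    if (pos > e->start)
        return e->new_end;                     /* inside replaced text: clamp */
    return pos;                                /* before the edit: unchanged */
}

/* Update S for edit E.  Returns true if the node does not overlap the
   edit and can be reused (possibly shifted); otherwise marks it dirty. */
bool apply_edit(Span *s, const Edit *e)
{
    bool overlaps = s->start < e->old_end && s->end > e->start;
    s->start = shift_pos(s->start, e);
    s->end = shift_pos(s->end, e);
    if (overlaps)
        s->dirty = true;
    return !overlaps;
}
```

In this simplified model only the flagged spans need to be re-parsed and the rest of the old tree is reused, which is why the per-edit cost can stay small even in large files.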
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 13:07 ` Andrei Kuznetsov @ 2021-07-28 13:16 ` Eli Zaretskii 2021-07-28 13:27 ` Andrei Kuznetsov 0 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-28 13:16 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: spacibba, stephen_leake, emacs-devel > From: Andrei Kuznetsov <r12451428287@163.com> > Date: Wed, 28 Jul 2021 21:07:40 +0800 > Cc: Stephen Leake <stephen_leake@stephe-leake.org>, emacs-devel@gnu.org > > While tree-sitter may be nice and all, it doesn't seem to offer the > usual extensibility expected from Emacs. Which extensibility did you have in mind that TS doesn't support?
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 13:16 ` Eli Zaretskii @ 2021-07-28 13:27 ` Andrei Kuznetsov 2021-07-28 13:32 ` Eli Zaretskii 0 siblings, 1 reply; 59+ messages in thread From: Andrei Kuznetsov @ 2021-07-28 13:27 UTC (permalink / raw) To: Eli Zaretskii; +Cc: spacibba, stephen_leake, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> While tree-sitter may be nice and all, it doesn't seem to offer the >> usual extensibility expected from Emacs. > Which extensibility did you have in mind that TS doesn't support? Let us assume that a generated TS grammar contains a (C) function akin to `semantic-lex-unterminated-syntax-detected', and I wish to achieve similar results to binding `semantic-lex-unterminated-syntax-end-function' to a function of my choice. Would that be possible?
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter 2021-07-28 13:27 ` Andrei Kuznetsov @ 2021-07-28 13:32 ` Eli Zaretskii 2021-07-28 13:38 ` Andrei Kuznetsov 0 siblings, 1 reply; 59+ messages in thread From: Eli Zaretskii @ 2021-07-28 13:32 UTC (permalink / raw) To: Andrei Kuznetsov; +Cc: spacibba, stephen_leake, emacs-devel > From: Andrei Kuznetsov <r12451428287@163.com> > Cc: spacibba@aol.com, stephen_leake@stephe-leake.org, emacs-devel@gnu.org > Date: Wed, 28 Jul 2021 21:27:43 +0800 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> While tree-sitter may be nice and all, it doesn't seem to offer the > >> usual extensibility expected from Emacs. > > > Which extensibility did you have in mind that TS doesn't support? > > Let us assume that a generated TS grammar contains a (C) function akin > to `semantic-lex-unterminated-syntax-detected', and I wish to achieve > similar results to binding > `semantic-lex-unterminated-syntax-end-function' to a function of my > choice. Would that be possible? (TS doesn't generate a grammar, it comes with grammar files prepared externally.) If you are talking about affecting how TS does lexical analysis for some language, then I see no reason why we in the Emacs project would want to do that. We don't _want_ to develop parsers if we can use parsers available out there. Lexical analysis of a parser is determined by the language it parses, so you need only to change the parser when the language changes, or to fix a bug. Both are part of the job of the TS developers, so there should be no need for us to get busy with that. Exactly like we do with other libraries we use that aren't developed as part of the Emacs project.
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-28 13:38 UTC
To: Eli Zaretskii; +Cc: spacibba, stephen_leake, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If you are talking about affecting how TS does lexical analysis for
> some language, then I see no reason why we in the Emacs project would
> want to do that. We don't _want_ to develop parsers if we can use
> parsers available out there. Lexical analysis of a parser is
> determined by the language it parses, so you need only to change the
> parser when the language changes, or to fix a bug. Both are part of
> the job of the TS developers, so there should be no need for us to get
> busy with that. Exactly like we do with other libraries we use that
> aren't developed as part of the Emacs project.

Okay, thanks for the clarification
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Manuel Giraud @ 2021-07-28 14:41 UTC
To: Andrei Kuznetsov; +Cc: Eli Zaretskii, stephen_leake, spacibba, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> If you are talking about affecting how TS does lexical analysis for
>> some language, then I see no reason why we in the Emacs project would
>> want to do that. We don't _want_ to develop parsers if we can use
>> parsers available out there. Lexical analysis of a parser is
>> determined by the language it parses, so you need only to change the
>> parser when the language changes, or to fix a bug. Both are part of
>> the job of the TS developers, so there should be no need for us to get
>> busy with that. Exactly like we do with other libraries we use that
>> aren't developed as part of the Emacs project.
>
> Okay, thanks for the clarification

Yes, thanks for these clarifications and sorry for my
misunderstanding. I still have one question left though: will the
parsers C code (and TS C code) land into the emacs repo or will TS be
accessible as an external library?

--
Manuel Giraud
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Perry E. Metzger @ 2021-07-28 15:15 UTC
To: emacs-devel

On 7/28/21 10:41, Manuel Giraud wrote:
> Yes, thanks for these clarifications and sorry for my
> misunderstanding. I still have one question left though: will the
> parsers C code (and TS C code) land into the emacs repo or will TS be
> accessible as an external library?

I don't think that has been fully decided. It may be necessary to have
a patched version of Tree Sitter available to integrate properly with
the rest of the Emacs runtime.

Perry
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Eli Zaretskii @ 2021-07-28 16:10 UTC
To: Manuel Giraud; +Cc: r12451428287, spacibba, stephen_leake, emacs-devel

> From: Manuel Giraud <manuel@ledu-giraud.fr>
> Cc: Eli Zaretskii <eliz@gnu.org>, spacibba@aol.com,
>  stephen_leake@stephe-leake.org, emacs-devel@gnu.org
> Date: Wed, 28 Jul 2021 16:41:01 +0200
>
> I still have one question left though: will the parsers C code (and
> TS C code) land into the emacs repo or will TS be accessible as an
> external library?

The latter. Unless something very unexpected will be discovered
about TS that could not be fixed by the TS developers.
* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-29 23:25 UTC
To: Andrei Kuznetsov; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Ergus <spacibba@aol.com> writes:
>
>> 1) Performance (discussed in the previous thread):
>
> FWIW I have been experimenting with an incremental GLR parser
> generator in Emacs Lisp.

The "generator" and the "runtime" are two separate programs, with
separate functions, used at different times.

The generator takes the javascript language grammar file and translates
it (thru lots of hairy computations) into code that builds a parse table
and other data structures. The tree-sitter generator outputs that code
in C; it might be possible to adapt it to output in elisp (the wisitoken
generator used to output elisp, but I gave that up when I implemented
error recovery in Ada; elisp is way too slow for that).

The "runtime" uses the parse table to parse text at runtime, in response
to user actions on the buffer. To be useful in an interactive editing
context, it must have robust error recovery. What is your error recovery
algorithm?

> While I have not put in the effort to couple it with font-lock and
> such, from anecdotal examination it does not perform badly with a
> naive C grammar.

Are you talking about the generator or runtime here?

> The initial parse does take several seconds on large files,

That's the runtime. Actual time for xdisp.c, preferably compared with a
tree-sitter parse run on the same machine, would be helpful.

How long does the generator take?

> but afterwards I did not notice a significant drop in editor
> responsiveness.

This seems to imply that the runtime supports incremental parse, so it
does not reparse the whole buffer each time; is that true?

>> 2) Not reinvent the wheel.
>
> While tree-sitter may be nice and all, it doesn't seem to offer the
> usual extensibility expected from Emacs.

It's all open-source, but it is very complicated and may be beyond many
people's ability to change correctly.

It requires running a C compiler to change it, but so do other parts of
Emacs (for example, the json parser).

So what is it missing?

--
-- Stephe
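To make the generator/runtime split concrete, here is a minimal sketch of a table-driven LR runtime in Python. It assumes a toy grammar (1: E -> E + n, 2: E -> n). The ACTION/GOTO tables are written by hand and stand in for what a generator would emit. They are not tree-sitter's or wisitoken's table formats. The runtime also has no error recovery and no incremental reparsing, which are the two properties Stephen says an editor runtime actually needs.

```python
# Hand-written LR tables for an assumed toy grammar (production numbers):
#   1: E -> E + n
#   2: E -> n
# "$" marks end of input.  A parser generator would normally produce these.
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept", None)},
    2: {"+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", 1), "$": ("reduce", 1)},
}
GOTO = {(0, "E"): 1}
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}  # production -> (lhs, length of rhs)


class ParseError(Exception):
    pass


def lr_parse(tokens):
    """Drive the tables over `tokens`; return the reductions applied, in order."""
    stack = [0]                      # stack of parser states
    toks = list(tokens) + ["$"]
    reductions = []
    i = 0
    while True:
        action = ACTION[stack[-1]].get(toks[i])
        if action is None:           # no table entry: syntax error
            raise ParseError(f"unexpected {toks[i]!r} at token {i}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            i += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]      # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                        # accept
            return reductions


print(lr_parse(["n", "+", "n"]))  # [2, 1]
```

All the grammar-specific knowledge lives in the tables. The driver loop is the same for any grammar, which is why the generator can be slow and run rarely while the runtime stays small.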
* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-30 0:54 UTC
To: Stephen Leake; +Cc: Ergus, emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

> The "generator" and the "runtime" are two separate programs, with
> separate functions, used at different times.
>
> The generator takes the javascript language grammar file and translates
> it (thru lots of hairy computations) into code that builds a parse table
> and other data structures. The tree-sitter generator outputs that code
> in C; it might be possible to adapt it to output in elisp (the wisitoken
> generator used to output elisp, but I gave that up when I implemented
> error recovery in Ada; elisp is way too slow for that).
>
> The "runtime" uses the parse table to parse text at runtime, in response
> to user actions on the buffer. To be useful in an interactive editing
> context, it must have robust error recovery. What is your error recovery
> algorithm?

Currently extremely naive. After an error occurs, it skips productions
until it can parses without errors, and just continues from there. I
plan to improve it somewhat in the near future.

> Are you talking about the generator or runtime here?

The runtime. The parser generator does not seem to be astonishingly
fast, but I don't think most people will have any cause to run it very
often.

> That's the runtime. Actual time for xdisp.c, preferably compared with a
> tree-sitter parse run on the same machine, would be helpful.

I'm currently preoccupied and unable to work on this, but I will
return with these measurements as soon as reasonably possible.

> How long does the generator take?

I did not measure that, but as most people would be loading compiled
parsers, and not running the generator, I don't think it would matter
too much. FWIW macroexpansion of the macro `defgrammar' blocks Emacs
for a second or 2.

> This seems to imply that the runtime supports incremental parse, so it
> does not reparse the whole buffer each time; is that true?

Indeed. I've not yet figured out a particularly good way of recording
changes though -- as of present it relies on its own versions of
self-insert-command, kill-region, et cetera.
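Andrei records changes by wrapping self-insert-command, kill-region and similar commands. An alternative is to record each change as a (start, old_end, new_end) triple. This has the same shape as the start_byte, old_end_byte and new_end_byte fields of tree-sitter's `TSInputEdit`, which is passed to `ts_tree_edit`. The triple can be derived from the (beg end old-len) arguments that Emacs passes to every function on `after-change-functions`.

The Python sketch below covers only that bookkeeping, under simplifying assumptions. `Edit`, `edit_from_after_change` and `apply_edit` are hypothetical names. Nodes are flat half-open integer spans rather than a tree. It ignores the fact that tree-sitter works in byte offsets while Emacs reports character positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    start: int      # where the change begins
    old_end: int    # end of the replaced text, in pre-edit positions
    new_end: int    # end of the inserted text, in post-edit positions


def edit_from_after_change(beg, end, old_len):
    """Build an Edit from the (beg end old-len) arguments that Emacs
    passes to each function on `after-change-functions`."""
    return Edit(start=beg, old_end=beg + old_len, new_end=end)


def apply_edit(spans, edit):
    """Split half-open (start, end) node spans from the previous parse into
    spans that survive the edit (shifted if they follow it) and spans that
    must be reparsed.  Spans that merely touch the edit count as dirty,
    since text typed next to a token can change that token."""
    delta = edit.new_end - edit.old_end
    kept, dirty = [], []
    for s, e in spans:
        if e < edit.start:              # wholly before the edit: unchanged
            kept.append((s, e))
        elif s > edit.old_end:          # wholly after: shift by the size change
            kept.append((s + delta, e + delta))
        else:                           # overlaps or touches the edit
            dirty.append((s, e))
    return kept, dirty


# Two characters inserted at position 6 (Emacs reports beg=6, end=8, old-len=0).
edit = edit_from_after_change(6, 8, 0)
print(apply_edit([(1, 4), (5, 8), (9, 12)], edit))
# ([(1, 4), (11, 14)], [(5, 8)])
```

Only the dirty spans need to be lexed and parsed again. Everything else can be reused after shifting.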
* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Andrei Kuznetsov @ 2021-07-30 3:02 UTC
To: Stephen Leake; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> until it can parses without errors, and just continues from there.
           ^^^^^^^^^^

I meant to say "parse" instead.

Further, as for "without errors", it skips until it finds the next
synchronizing token, attempts to parse starting from that token, and if
that fails repeats the process until either EOF is reached or it is
successful.
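The recovery described above is essentially classic panic-mode recovery. Here is a minimal Python sketch, not Andrei's implementation. It assumes a hypothetical one-rule statement grammar (IDENT = NUM ;) with ';' as the only synchronizing token. It also differs slightly from his description: it resumes at the token after the synchronizing token rather than at the token itself.

```python
class ParseError(Exception):
    def __init__(self, pos, expected):
        super().__init__(f"expected {expected} at token {pos}")
        self.pos = pos


def parse_stmt(toks, i):
    """Parse one statement IDENT '=' NUM ';' at toks[i]; return the next index."""
    for check, what in ((str.isalpha, "identifier"),
                        (lambda t: t == "=", "'='"),
                        (str.isdigit, "number"),
                        (lambda t: t == ";", "';'")):
        if i >= len(toks) or not check(toks[i]):
            raise ParseError(i, what)
        i += 1
    return i


def parse_program(toks, sync=";"):
    """Parse statements, recovering from errors in panic mode."""
    stmts, errors = [], []
    i = 0
    while i < len(toks):
        try:
            j = parse_stmt(toks, i)
            stmts.append(toks[i:j])
            i = j
        except ParseError as err:
            errors.append(err)
            # Panic mode: discard tokens up to and including the next
            # synchronizing token, then try again from the token after it.
            k = err.pos
            while k < len(toks) and toks[k] != sync:
                k += 1
            i = k + 1
    return stmts, errors


stmts, errors = parse_program("a = 1 ; b = = 2 ; c = 3 ;".split())
print(stmts)                      # [['a', '=', '1', ';'], ['c', '=', '3', ';']]
print([str(e) for e in errors])   # ['expected number at token 6']
```

The method is cheap and always terminates, because each error moves the cursor strictly forward. However, it throws away everything up to the next synchronizing token. That is one reason interactive parsers such as wisitoken use more elaborate recovery that tries to repair the input instead.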
* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-30 18:48 UTC
To: Andrei Kuznetsov; +Cc: Ergus, emacs-devel

Andrei Kuznetsov <r12451428287@163.com> writes:

> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> How long does the generator take?
>
> I did not measure that, but as most people would be loading compiled
> parsers, and not running the generator, I don't think it would matter
> too much. FWIW macroexpansion of the macro `defgrammar' blocks Emacs
> for a second or 2.

It can matter a lot for large grammars. wisitoken used to take hours to
generate the LR1 parse table for Ada; now it takes a couple minutes.
tree-sitter never finishes that grammar.

A naive LR grammar generator can easily be O (n**3) or worse in the
grammar size; I spent a lot of time optimizing wisitoken so it can
handle Ada reasonably.

--
-- Stephe
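To show where generator time goes, here is a Python sketch of the LR(0) item-set (canonical collection) construction. This step builds the parser states that a table is later filled from. It assumes a toy grammar (E -> E + n | n) and is not wisitoken's or tree-sitter's algorithm. LR(1), LALR(1) and GLR generators also track lookahead sets, which adds more work per state. The number of states and the repeated closure computations grow with grammar size, so a naive implementation can become very slow on a grammar as large as Ada's.

```python
# Assumed toy grammar; production 0 is the augmented start rule S' -> E.
# An item is (production index, dot position).
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]


def closure(grammar, items):
    """Add X -> .rhs for every nonterminal X that appears right after a dot."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for prod, dot in list(result):
            rhs = grammar[prod][1]
            if dot < len(rhs):
                for q, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        changed = True
    return frozenset(result)


def goto(grammar, items, sym):
    """Advance the dot over `sym` wherever possible, then close the result."""
    moved = {(p, d + 1) for p, d in items
             if d < len(grammar[p][1]) and grammar[p][1][d] == sym}
    return closure(grammar, moved) if moved else None


def canonical_collection(grammar):
    """Return (states, transitions) where transitions maps (state, symbol) -> state."""
    symbols = sorted({s for _, rhs in grammar for s in rhs})
    start = closure(grammar, {(0, 0)})
    states, index, transitions = [start], {start: 0}, {}
    worklist = [start]
    while worklist:
        state = worklist.pop()
        for sym in symbols:
            target = goto(grammar, state, sym)
            if target is None:
                continue
            if target not in index:          # a new parser state
                index[target] = len(states)
                states.append(target)
                worklist.append(target)
            transitions[(index[state], sym)] = index[target]
    return states, transitions


states, transitions = canonical_collection(GRAMMAR)
print(len(states), transitions)
```

For this toy grammar the construction finds five states. Every `goto` call recomputes a closure and every new state is compared against all known states, and this is the kind of work that becomes expensive as the grammar grows.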
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Perry E. Metzger @ 2021-07-28 15:12 UTC
To: emacs-devel

On 7/28/21 07:43, Andrei Kuznetsov wrote:
> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> Some of the tree-sitter development tools are implemented in Rust; you
>> only need Rust if you are developing/fixing a grammar for a language.
> If I understand this correctly, it means one would require the Rust
> toolchain to support new languages in tree-sitter, or to improve
> existing support.

That's not true. Tree Sitter is not written even partially in Rust. It
does have Rust bindings for people who use Rust.

Perry
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-29 23:28 UTC
To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/28/21 07:43, Andrei Kuznetsov wrote:
>> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>>
>>> Some of the tree-sitter development tools are implemented in Rust; you
>>> only need Rust if you are developing/fixing a grammar for a language.
>> If I understand this correctly, it means one would require the Rust
>> toolchain to support new languages in tree-sitter, or to improve
>> existing support.
>
> That's not true. Tree Sitter is not written even partially in Rust. It
> does have Rust bindings for people who use Rust.

https://github.com/tree-sitter/tree-sitter/tree/master/cli/src

--
-- Stephe
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Perry E. Metzger @ 2021-07-30 0:19 UTC
To: Stephen Leake; +Cc: emacs-devel

On 7/29/21 19:28, Stephen Leake wrote:
> "Perry E. Metzger" <perry@piermont.com> writes:
>
>> That's not true. Tree Sitter is not written even partially in Rust. It
>> does have Rust bindings for people who use Rust.
> https://github.com/tree-sitter/tree-sitter/tree/master/cli/src
>
That's an optional CLI and is not part of the library runtime.

Perry
* Re: [SPAM UNSURE] Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-30 18:44 UTC
To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/29/21 19:28, Stephen Leake wrote:
>> "Perry E. Metzger" <perry@piermont.com> writes:
>>
>>> That's not true. Tree Sitter is not written even partially in Rust. It
>>> does have Rust bindings for people who use Rust.
>> https://github.com/tree-sitter/tree-sitter/tree/master/cli/src
>>
> That's an optional CLI and is not part of the library runtime.

Yes, and the optional CLI is part of the tree-sitter project, so the
statement "Tree Sitter is not written even partially in Rust" is simply
wrong.

Please be more careful when you say "tree-sitter", but mean
"tree-sitter runtime"; it is not always clear from context.

--
-- Stephe
* Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
From: Richard Stallman @ 2021-07-29 4:35 UTC
To: Andrei Kuznetsov; +Cc: stephen_leake, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I don't think we should reject Rust code for the GNU system.
There is no need for such a drastic step.
We already use software that is built in Rust.

Tree sitter is not going to be a part of Emacs; its use is not limited
to Emacs. Other programs will work with it too. So I don't see any
special reason to replace parts of it with Emacs Lisp code.

--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Perry E. Metzger @ 2021-07-28 15:09 UTC
To: emacs-devel

On 7/27/21 21:57, Andrei Kuznetsov wrote:
> Unlike features like native JSON, however, I believe tree-sitter is the
> first optional package providing notable functionality that would
> require a toolchain that depends on LLVM (that of Rust, which
> tree-sitter is implemented in)

Tree sitter is written in C. It has an available set of Rust bindings.
It compiles perfectly well with any C compiler.

Perry
* Re: Maybe we're taking a wrong approach towards tree-sitter
From: Stephen Leake @ 2021-07-29 23:35 UTC
To: Perry E. Metzger; +Cc: emacs-devel

"Perry E. Metzger" <perry@piermont.com> writes:

> On 7/27/21 21:57, Andrei Kuznetsov wrote:
>> Unlike features like native JSON, however, I believe tree-sitter is the
>> first optional package providing notable functionality that would
>> require a toolchain that depends on LLVM (that of Rust, which
>> tree-sitter is implemented in)
>
> Tree sitter is written in C.

There are many parts to tree-sitter.

The runtime, which uses language-specific parse tables to parse user
files, is written in C.

The command line tools (cli), one of which converts the language grammar
file written in javascript into C code that builds the parse table, are
written in Rust; https://github.com/tree-sitter/tree-sitter/tree/master/cli/src

> It has an available set of Rust bindings. It compiles perfectly well
> with any C compiler.

Here you are describing the runtime, which is what must be linked with
Emacs for a major-mode to use the tree-sitter parser.

--
-- Stephe