unofficial mirror of emacs-devel@gnu.org 
From: Dmitry Gutov <dgutov@yandex.ru>
To: arthur miller <arthur.miller@live.com>, Eli Zaretskii <eliz@gnu.org>
Cc: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
Subject: Re: Using incremental parsing in Emacs
Date: Fri, 10 Jan 2020 00:56:38 +0300
Message-ID: <aecd377e-2132-f3ed-fe89-0837116ace9a@yandex.ru>
In-Reply-To: <VI1P194MB042962990766CCDD18B3CE0796220@VI1P194MB0429.EURP194.PROD.OUTLOOK.COM>

On 04.01.2020 16:46, arthur miller wrote:

> There is a very good presentation of tree-sitter on YT by its author:
> 
> https://www.youtube.com/watch?v=Jes3bD6P0To
> 
> Looks much better than the picture I got by just reading the
> website:

It was a good watch.

Some takeaways from me:

It implements a GLR parser. One that can update the existing AST quickly 
for an arbitrary edit in the middle of a file. (*)

But it parses a new file quickly as well: a 20000-line JS file in 54ms.

To be able to reach that speed, they went the traditional 
compiler-writer route of having a separate compilation step that turns 
a grammar into C code for a parser program (which relies on a 
shared runtime). (**)

Some of it seems to be by necessity. Every run returns a full AST, not 
just an "AST up to this position". I suppose the author didn't want the 
problems that come with unfinished parse trees when code relies on that 
returned value. (***)

The generated parser, in addition to being incremental, is 
error-tolerant, which is a necessity for use in editors.

As a result, they have features like fast semantic syntax highlighting, 
as well as code folding that accurately detects where a function body 
begins and ends (previously, Atom and other editors used guessing based 
on indentation levels, apparently). And an "extend selection" command 
based on the AST as well. (****)

Tree-Sitter is also used inside GitHub for various features, including 
their Semantic library (which implements code navigation on the web).

In the meantime, our current answer to all of the above is syntax-ppss 
plus local regexp-based parsing around the visible part of the buffer.

To compare:

(*) syntax-ppss is also fully incremental, although the returned value 
is a very simplistic substitute for an AST. But we've been using it for 
a while and have done solid things with it.

(**) Which means that if we try to use Tree-Sitter as-is, our current 
practice of defining the language grammar in Lisp would go out of the 
window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this 
as well: language grammars have to be compiled into a shared library (or 
libraries). We would have lots of grammars supplied by third parties, 
which is kind of good, but we would lose the ease of experimenting with 
them that we have now, and the ability to write support for a new 
up-and-coming language very quickly. Which a certain fraction of our 
users enjoys, AFAIK.

(***) Whereas syntax-ppss stops at the requested position, thus saving 
CPU cycles. Similarly, if a new system we transition to someday also 
does this, its absolute performance/throughput would be less important, 
since it would usually only have to parse a screenful of the file at 
a time.
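
To make (*) and (***) concrete, here is a toy sketch in Python. It is not 
how syntax-ppss is implemented; the class name, the checkpoint scheme and 
the (depth, in-string) state are all assumptions made for illustration. 
It only shows the general shape: scan up to the requested position and no 
further, cache the state reached, and after an edit discard only the 
cached states that lie past the edit.

```python
class ScanCache:
    """Toy position-bounded, cached scanner (hypothetical, not Emacs code)."""

    def __init__(self, text):
        self.text = text
        # Each checkpoint is (position, paren depth, inside-string flag)
        # and describes the state just before that position.
        self.checkpoints = [(0, 0, False)]
        self.chars_scanned = 0  # bookkeeping: how much work was done

    def state_at(self, pos):
        """Return (depth, in_string) for the text before index pos."""
        # Resume from the nearest cached checkpoint at or before pos.
        start, depth, in_string = max(
            (c for c in self.checkpoints if c[0] <= pos),
            key=lambda c: c[0],
        )
        for ch in self.text[start:pos]:
            self.chars_scanned += 1
            if ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
        if pos != start:
            self.checkpoints.append((pos, depth, in_string))
        return depth, in_string

    def edit(self, pos, old_len, new_text):
        """Replace old_len characters at pos with new_text."""
        self.text = self.text[:pos] + new_text + self.text[pos + old_len:]
        # States at or before the edit depend only on unchanged text,
        # so only the checkpoints after pos have to go.
        self.checkpoints = [c for c in self.checkpoints if c[0] <= pos]


cache = ScanCache('(a "(" (b c) d)')
print(cache.state_at(5))   # inside the string, one paren deep
print(cache.state_at(9))   # resumes from position 5, not from 0
```

The text past the requested position is never touched, which is the 
saving described in (***). The price is that the state is only a pair of 
values, nowhere near an AST.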

(****) We've been managing surprisingly well with syntax-ppss, 
forward-sexp, etc. So code folding works quite well in Emacs already, 
and the easy-kill package in GNU ELPA does the "expand selection" thing 
very successfully as well. But there is room for improvement: in certain 
languages, more complex syntax could be supported or handled more 
easily. Having a "proper AST" available is nothing to sneeze at 
either, and would likely help a lot in indentation code.
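
For illustration, the "expand selection" idea in (****) can be 
approximated without an AST by balanced-bracket scanning. The Python 
sketch below is not easy-kill's implementation (I have not modelled it on 
that code); the function name and the bracket table are made up. It 
ignores strings and comments, does not check that bracket types match, 
and assumes the current selection is itself balanced. Those are exactly 
the cases where a real parse tree would do better.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def expand_selection(text, start, end):
    """Return (start, end) of the nearest enclosing bracketed span, or None."""
    # Walk left from the selection to find the first unmatched opener.
    depth = 0
    i = start - 1
    while i >= 0:
        ch = text[i]
        if ch in CLOSERS:
            depth += 1          # a nested group we have to skip over
        elif ch in PAIRS:
            if depth == 0:
                break           # this opener encloses the selection
            depth -= 1
        i -= 1
    if i < 0:
        return None             # nothing encloses the selection

    # Walk right from the selection to find the first unmatched closer.
    depth = 0
    j = end
    while j < len(text):
        ch = text[j]
        if ch in PAIRS:
            depth += 1
        elif ch in CLOSERS:
            if depth == 0:
                return i, j + 1
            depth -= 1
        j += 1
    return None                 # unbalanced text


text = "f(a, [b, c], d)"
span = expand_selection(text, 6, 7)      # start with just "b" selected
print(text[span[0]:span[1]])             # the enclosing [...] group
```

Calling it again on the returned span grows the selection one level 
further, until it returns None at the top level.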

My personal takeaway is that we could really benefit from a lispier 
version of this technology, and Someone(tm) should start working on that.


