* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
@ 2020-03-31 17:07 Tuấn Anh Nguyễn
  2020-03-31 17:50 ` Eli Zaretskii
  0 siblings, 1 reply; 142+ messages in thread
From: Tuấn Anh Nguyễn @ 2020-03-31 17:07 UTC (permalink / raw)
  To: emacs-devel

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such. We should instead request that tree-sitter
> exposes an API through which we could give it direct access to buffer
> text as 2 parts, before and after the gap, like we do with regex
> code. Otherwise this will be a bottleneck in the long run, not unlike
> the problem we have with LSP.

It does support parsing through direct access. Which is why I wanted
dynamic modules to have direct access to buffer text.

>> How large is "very large" here?
>
> xdisp.c comes to mind, obviously.

On my machine, a 3.39 GHz Intel Core i7:

    (0.150791 0 0.0)                    ; 1 full parse
    (2.142236 5 0.6105190000000107)     ; 10 full parses
    (0.015423 0 0.0)                    ; incremental parsing, after typing 1 character

--
Tuấn-Anh Nguyễn
Software Engineer

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:07 Reliable after-change-functions (via: Using incremental parsing in Emacs) Tuấn Anh Nguyễn @ 2020-03-31 17:50 ` Eli Zaretskii 2020-04-01 6:17 ` Tuấn Anh Nguyễn 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 17:50 UTC (permalink / raw) To: Tuấn Anh Nguyễn; +Cc: emacs-devel > From: Tuấn Anh Nguyễn <ubolonton@gmail.com> > Date: Wed, 1 Apr 2020 00:07:27 +0700 > > > xdisp.c comes to mind, obviously. > > On my machine, a 3.39 GHz Intel Core i7: > > (0.150791 0 0.0) ; 1 full parse How did you submit xdisp.c to the parser? In any case, IIUC, the first time a buffer needs to be displayed, we need to wait for these 150 msec? That's annoyingly long (and I suspect in real Emacs usage will be significantly longer, due to memory allocation, encoding, etc.). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-03-31 17:50 ` Eli Zaretskii
@ 2020-04-01 6:17 ` Tuấn Anh Nguyễn
  2020-04-01 13:26 ` Eli Zaretskii
  0 siblings, 1 reply; 142+ messages in thread
From: Tuấn Anh Nguyễn @ 2020-04-01 6:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Wed, Apr 1, 2020 at 12:50 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> > Date: Wed, 1 Apr 2020 00:07:27 +0700
> >
> > > xdisp.c comes to mind, obviously.
> >
> > On my machine, a 3.39 GHz Intel Core i7:
> >
> > (0.150791 0 0.0) ; 1 full parse
>
> How did you submit xdisp.c to the parser?
>

    (with-current-buffer "xdisp.c"
      (let ((language (tree-sitter-require 'c))
            (parser (ts-make-parser)))
        (ts-set-language parser language)
        (garbage-collect)
        (message "%s" (benchmark-run (ts-parse parser #'ts-buffer-input nil)))))

> In any case, IIUC, the first time a buffer needs to be displayed, we
> need to wait for these 150 msec? That's annoyingly long (and I
> suspect in real Emacs usage will be significantly longer, due to
> memory allocation, encoding, etc.).
>

Real usage with "xdisp.c":

    (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
      (message "%s" (benchmark-run (apply f args))))

    (0.257998 1 0.13326100000000096)

So yes, direct access to buffer's text from dynamic modules would be nice.

--
Tuấn-Anh Nguyễn
Software Engineer

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 6:17 ` Tuấn Anh Nguyễn @ 2020-04-01 13:26 ` Eli Zaretskii 2020-04-01 15:47 ` Jorge Javier Araya Navarro 2020-04-01 17:55 ` Tuấn-Anh Nguyễn 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 13:26 UTC (permalink / raw) To: Tuấn Anh Nguyễn; +Cc: emacs-devel > From: Tuấn Anh Nguyễn <ubolonton@gmail.com> > Date: Wed, 1 Apr 2020 13:17:42 +0700 > Cc: emacs-devel@gnu.org > > Real usage with "xdisp.c": > > (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark) > (message "%s" (benchmark-run (apply f args)))) > > (0.257998 1 0.13326100000000096) And that is even without encoding the buffer text, IIUC what the package does. > So yes, direct access to buffer's text from dynamic modules would be nice. Did you consider using the API where an application can provide a function to return text at a given offset? Such a function could be relatively easily implemented for Emacs. Btw, what do you do with the tree returned by the tree-sitter parser? store it in some buffer-local variable? If so, how much memory does such a tree take, and when, if ever, is that memory released? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 13:26 ` Eli Zaretskii @ 2020-04-01 15:47 ` Jorge Javier Araya Navarro 2020-04-01 16:07 ` Eli Zaretskii 2020-04-01 17:55 ` Tuấn-Anh Nguyễn 1 sibling, 1 reply; 142+ messages in thread From: Jorge Javier Araya Navarro @ 2020-04-01 15:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Tuấn Anh Nguyễn, emacs-devel > Did you consider using the API where an application can provide a function to return text at a given offset? Such a function could be relatively easily implemented for Emacs. But why not just allow access to buffers for dynamic modules, otherwise what would be the point of dynamic modules? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:47 ` Jorge Javier Araya Navarro @ 2020-04-01 16:07 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 16:07 UTC (permalink / raw) To: Jorge Javier Araya Navarro; +Cc: ubolonton, emacs-devel > From: Jorge Javier Araya Navarro <jorge@esavara.cr> > Date: Wed, 1 Apr 2020 09:47:48 -0600 > Cc: Tuấn Anh Nguyễn <ubolonton@gmail.com>, > emacs-devel@gnu.org > > > Did you consider using the API where an application can provide a function to return text at a given offset? > Such a function could be relatively easily implemented for Emacs. > > But why not just allow access to buffers for dynamic modules, otherwise what would be the point of dynamic > modules? These two are orthogonal issues: if we allow such access from modules, will this particular module use it, and if so, how? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
  2020-04-01 13:26 ` Eli Zaretskii
  2020-04-01 15:47 ` Jorge Javier Araya Navarro
@ 2020-04-01 17:55 ` Tuấn-Anh Nguyễn
  2020-04-01 19:33 ` Eli Zaretskii
  1 sibling, 1 reply; 142+ messages in thread
From: Tuấn-Anh Nguyễn @ 2020-04-01 17:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Wed, Apr 1, 2020 at 8:26 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Tuấn Anh Nguyễn <ubolonton@gmail.com>
> > Date: Wed, 1 Apr 2020 13:17:42 +0700
> > Cc: emacs-devel@gnu.org
> >
> > Real usage with "xdisp.c":
> >
> > (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
> >   (message "%s" (benchmark-run (apply f args))))
> >
> > (0.257998 1 0.13326100000000096)
>
> And that is even without encoding the buffer text, IIUC what the
> package does.
>
> > So yes, direct access to buffer's text from dynamic modules would be nice.
>
> Did you consider using the API where an application can provide a
> function to return text at a given offset? Such a function could be
> relatively easily implemented for Emacs.
>

I don't understand what you mean. Below I'll explain how it works
currently.

`ts-parse' uses the Tree-sitter API that consumes text in chunks:

    TSTree *ts_parser_parse(
      TSParser *self,
      const TSTree *old_tree,
      TSInput input
    );

    typedef struct {
      void *payload;
      const char *(*read)(
        void *payload,
        uint32_t byte_offset,
        TSPoint position,
        uint32_t *bytes_read
      );
      TSInputEncoding encoding;
    } TSInput;

Because dynamic modules don't have direct access to buffer text,
`ts-parse' uses the module function `copy_string_contents', and exposes
this interface:

    (ts-parse PARSER INPUT-FUNCTION OLD-TREE)

Here INPUT-FUNCTION must return a chunk of the buffer text, starting
from the given byte offset, as a Lisp string. `ts-buffer-input' is one
such function.

So:

1. Chunks of the buffer text are copied into Lisp strings, through
   `buffer-substring-no-properties'.
2. These Lisp strings are copied into buffers of null-terminated utf-8
   bytes, through `copy_string_contents'.
3. All these temporary Lisp strings create GC pressure. In the xdisp.c
   example, it was 100ms for GC, in addition to 150ms for parsing.
4. emacs-module-rs has an automatic, blanket workaround for this bug
   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31238. The workaround
   involves pairs of `make_global_ref' and `free_global_ref' calls, on
   all "suspected" `emacs_value's.

#4 can be avoided if emacs-module-rs allows selectively disabling the
blanket workaround. It's a band-aid on top of a band-aid, but at least
it's workable.

#3 can probably be alleviated by increasing the chunk size.

However, both are consequences of #1 and #2. If dynamic modules have
direct access to the buffer text, none of the above is an issue.

Such direct access can be enabled by something like this:

    char* (*access_buffer_text) (emacs_env *env,
                                 emacs_value buffer,
                                 ptrdiff_t byte_offset,
                                 ptrdiff_t *size_inout);

Of course, such an API would require extensive documentation on how it
must be used, to ensure safety and correctness.

> Btw, what do you do with the tree returned by the tree-sitter parser?
> store it in some buffer-local variable? If so, how much memory does
> such a tree take, and when, if ever, is that memory released?
>

It's stored in a buffer-local variable. I haven't measured the memory
they take. Memory is released when the tree object is garbage-collected
(it's a `user-ptr').

--
Tuấn-Anh Nguyễn
Software Engineer

^ permalink raw reply [flat|nested] 142+ messages in thread
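The chunked `read` callback described in the message above fits a gap buffer well: the text before the gap and the text after it can each be handed out as one chunk without copying. What follows is only a minimal sketch under stated assumptions. `struct gap_buffer` and `gap_buffer_read` are hypothetical names invented for illustration, not Emacs internals and not part of Tree-sitter, and the `TSPoint position` argument of the real callback is left out to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified gap buffer: the text occupies [0, gap_start),
   then comes a gap of gap_size unused bytes, then the rest of the text.
   These names are illustrative only.  */
struct gap_buffer {
  const char *data;   /* start of storage, including the gap */
  size_t gap_start;   /* byte offset at which the gap begins */
  size_t gap_size;    /* size of the gap in bytes */
  size_t text_size;   /* number of text bytes, excluding the gap */
};

/* Chunk reader in the style of a TSInput `read' callback: return a
   pointer to the text at BYTE_OFFSET and store the length of the
   contiguous run in *BYTES_READ.  No bytes are copied.  At or past the
   end of the text, *BYTES_READ is set to 0.  */
static const char *
gap_buffer_read (void *payload, uint32_t byte_offset, uint32_t *bytes_read)
{
  const struct gap_buffer *buf = payload;
  assert (buf->gap_start <= buf->text_size);

  if (byte_offset >= buf->text_size)
    {
      *bytes_read = 0;
      return "";
    }
  if (byte_offset < buf->gap_start)
    {
      /* First part: everything from the offset up to the gap.  */
      *bytes_read = (uint32_t) (buf->gap_start - byte_offset);
      return buf->data + byte_offset;
    }
  /* Second part: skip over the gap and return the remainder.  */
  *bytes_read = (uint32_t) (buf->text_size - byte_offset);
  return buf->data + buf->gap_size + byte_offset;
}
```

With this shape a full parse needs at most two calls that return text plus one that signals the end. The returned pointers are only valid while the buffer is neither modified nor relocated, which is the safety concern raised elsewhere in the thread.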
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 17:55 ` Tuấn-Anh Nguyễn @ 2020-04-01 19:33 ` Eli Zaretskii 2020-04-01 23:38 ` Stephen Leake 2020-04-02 4:21 ` Tuấn-Anh Nguyễn 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 19:33 UTC (permalink / raw) To: Tuấn-Anh Nguyễn; +Cc: emacs-devel > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com> > Date: Thu, 2 Apr 2020 00:55:45 +0700 > Cc: emacs-devel@gnu.org > > > Did you consider using the API where an application can provide a > > function to return text at a given offset? Such a function could be > > relatively easily implemented for Emacs. > > > > I don't understand what you mean. Below I'll explain how it works > currently. [...] If dynamic modules have direct access to the > buffer text, none of the above is an issue. > > Such direct access can be enabled by something like this: > > char* (*access_buffer_text) (emacs_env *env, > emacs_value buffer, > ptrdiff_t byte_offset, > ptrdiff_t *size_inout); > > Of course, such an API would require extensive documentation on how it > must be used, to ensure safety and correctness. I think you are moving too fast, and keep the current implementation in sight too much. What I suggest is to step back and see how such direct access, if it were available, could be used with tree-sitter. Let's forget about modules for a moment and consider tree-sitter linked with Emacs and capable of calling any C function in core. How would you use that? Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one question to answer is what to do with byte sequences that are not valid UTF-8. Any suggestions or ideas? How does tree-sitter handle invalid byte sequences in general? Also, direct access to buffer text generally means we must make sure GC never runs as long as pointers to buffer text are lying around. 
Can any Lisp run between calls to the reader function that the tree-sitter parser calls to access the buffer text? If so, we need to take care of that issue. Next, I'm still asking whether parsing the whole buffer when it is first created is necessary. Can we pass to the parser just a small chunk (say, 500 bytes) of the buffer around the window-full to be displayed next? If this presents problems, what are those problems? IOW, the issue with exposing access to buffer text to modules is IMO secondary. My suggestion is first to figure out how to do this stuff efficiently from within Emacs itself, as if the module interface were not part of the equation. We can add that aspect back later. And yes, doing this by consing strings is not a good idea, it will slow things down and cause a lot of GC. It is best avoided. Thus my questions above. > > Btw, what do you do with the tree returned by the tree-sitter parser? > > store it in some buffer-local variable? If so, how much memory does > > such a tree take, and when, if ever, is that memory released? > > > > It's stored in a buffer-local variable. I haven't measured the memory > they take. Memory is released when the tree object is garbage-collected > (it's a `user-ptr'). So if I have many hundreds of buffers, I could have such a tree in each one of them indefinitely? Perhaps that's one more design issue to consider, given that the parsing is so fast. Similar to what we do with image and face caches -- we flush them from time to time, to keep the memory footprint in check. So a buffer that was not current more than some time interval ago could have its tree GCed. ^ permalink raw reply [flat|nested] 142+ messages in thread
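The idea of flushing parse trees of buffers that have not been current for a while, similar to the image and face caches, could look roughly like the sketch below. It is an illustration only: `struct tree_cache_entry`, `flush_stale_trees` and the idle threshold are hypothetical names and parameters chosen here, and the tree is treated as an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* free() can serve as FREE_TREE for malloc'ed trees */
#include <time.h>

/* Hypothetical per-buffer record; in a real integration the tree would
   be whatever handle the parser returns.  */
struct tree_cache_entry {
  void *tree;         /* cached parse tree, or NULL once flushed */
  time_t last_used;   /* last time the owning buffer was current */
};

/* Release every cached tree that has been idle for more than MAX_IDLE
   seconds as of NOW.  FREE_TREE is called on each evicted tree.
   Returns the number of trees released.  */
static size_t
flush_stale_trees (struct tree_cache_entry *entries, size_t count,
                   time_t now, double max_idle,
                   void (*free_tree) (void *))
{
  size_t flushed = 0;
  assert (free_tree != NULL);
  for (size_t i = 0; i < count; i++)
    {
      if (entries[i].tree != NULL
          && difftime (now, entries[i].last_used) > max_idle)
        {
          free_tree (entries[i].tree);
          entries[i].tree = NULL;
          flushed++;
        }
    }
  return flushed;
}
```

A flushed buffer would pay for one full parse the next time its tree is needed, which the timings earlier in the thread put at a fraction of a second for a file the size of xdisp.c.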
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 19:33 ` Eli Zaretskii @ 2020-04-01 23:38 ` Stephen Leake 2020-04-02 0:25 ` Stephen Leake ` (3 more replies) 2020-04-02 4:21 ` Tuấn-Anh Nguyễn 1 sibling, 4 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-01 23:38 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > Also, direct access to buffer text generally means we must make sure > GC never runs as long as pointers to buffer text are lying around. > Can any Lisp run between calls to the reader function that the > tree-sitter parser calls to access the buffer text? If the parser copies the text into an internal buffer, that reader function should only be called once per call to the parser. Parsers used to be based around small buffers that would read in a file a chunk at a time, but that is not necessary on any machine that can run Emacs. Since Emacs has the entire file in memory, the parser can too. However, if we are really trying to avoid copying text (which is very premature optimization), then the reader function will be called many times during parsing (to fetch each word), and possibly during the grammar actions (to compute indent or face). > Next, I'm still asking whether parsing the whole buffer when it is > first created is necessary. To some extent, that depends on the language. The parser must be able to complete a parse, to generate a complete syntax tree. I'll assume no error correction for a moment; more below. In C or C++ body files, "a complete parse" is typically one variable or function declaration. So if Emacs can reliably find the beginning and end of those declarations, it could pass just the ones containing the region of interest to the parser. Tree-sitter (if it supports this at all, or is modified to) would end up with a forest of small parse trees, rather than one large one. They might get merged if large chunks of text are parsed together. 
In Ada and Java, and most C++ header files, "a complete parse" is a file; it contains an Ada package spec or body, or a Java or C++ class, or a C++ namespace. There are many "small languages" for which "a complete parse" is similar to a "statement". Bash shell, for example. They could pass just the statement, but only if Emacs can reliably find the start and end (not always easy). It is also possible to modify the language grammar to allow smaller pieces of code to be a complete parse; ada-mode does this, making a single declaration or statement "a complete parse", in order to support "partial parse". That can easily lead to errors in indent, since the indent of the start of the text portion is unknown (ada-mode simply assumes it is correct in the buffer). Another reason to allow smaller code chunks to be a complete parse is to allow parsing the code fragments that appear in grammar actions; the ELPA package wisitoken-grammar-mode uses this for wisitoken grammar files with Ada actions. In sum, the short answer is "yes, you must parse the whole file, unless your language is particularly simple". Since we need to support the worst case, we should assume the whole file must be parsed at least once. > Can we pass to the parser just a small chunk (say, 500 bytes) of the > buffer around the window-full to be displayed next? If this presents > problems, what are those problems? In wisi, the error correction code will fill in the missing text so a complete parse is possible. Since some of that is guesses, the results may not be very good. Tree-sitter also has error correction; I'm not clear how good it is. > IOW, the issue with exposing access to buffer text to modules is IMO > secondary. yes, because copying text is fast compared to everything else going on. > My suggestion is first to figure out how to do this stuff efficiently > from within Emacs itself, as if the module interface were not part of > the equation. We can add that aspect back later. 
There are two times the wisi code that wraps the parser needs access to the buffer; first to copy the text, second to add text properties (faces, indent values, navigation markers). There are usually many text properties output by each parse. The positions and values of the text properties are computed by functions that run after the complete syntax tree has been produced. In wisi, those functions are added directly in the grammar source file (where they are called "post-parse grammar actions"). In tree-sitter, I assume they are called from some mode-author-written code that traverses the syntax tree (wisi provides that internally). Except I see below that the emacs tree-sitter package stores the syntax tree in the buffer. One option here is to try to standardize on an elisp representation of a syntax tree, and have both the wisi and tree-sitter parsers provide that. Then the grammar actions could be implemented in elisp. I suspect that would be very slow; elisp is just not good at traversing large complex data structures. That is not just my bias showing (I _much_ prefer doing as much as possible in Ada); I first wrote the ada-mode parser and grammar actions in elisp, and then did a complete rewrite in Ada, gaining significant speed. Although I never considered passing the syntax tree to elisp as a single object, so maybe that could work well. There is no universal standard for representing "a syntax tree". In wisi, the tree is directly produced by the LR shift and reduce operations, and thus is very close to the grammar expressed in BNF. I don't know what the tree-sitter parse tree looks like. AdaCore provides a parser similar in purpose to the wisi parsers (https://github.com/AdaCore/libadalang), that also does more of what an Ada compiler does (which could allow even better font-lock and navigation). To support those additional operations, the syntax tree is quite different from the ada-mode one. 
In general, each parser library, and even each grammar author, will have different representations for the syntax tree. So if we want to support different parsers, I think it is best to define the Emacs "parser API" as "give text to parser; accept text properties from parser". LSP (via eglot) provides other things the parser can return; code completion menus, for example. And for indent and face, it returns formatted text with markdown. I plan to translate that to text properties to integrate LSP into wisi. Whether LSP requires a full initial parse is up to the LSP server author (LSP itself provides both "here's the full text" and "here's partial text" messages); they have the same considerations discussed above. > And yes, doing this by consing strings is not a good idea, it will > slow things down and cause a lot of GC. It is best avoided. Thus my > questions above. I'm not sure how "convert syntax tree to elisp" compares to "consing strings". I would certainly expect it to cause a lot of GC. >> > Btw, what do you do with the tree returned by the tree-sitter parser? >> > store it in some buffer-local variable? If so, how much memory does >> > such a tree take, and when, if ever, is that memory released? >> > >> >> It's stored in a buffer-local variable. I haven't measured the memory >> they take. Memory is released when the tree object is garbage-collected >> (it's a `user-ptr'). Is it an elisp structure (or accessible from elisp)? Have you written code that traverses it to provide faces and indentation? -- -- Stephe PS; I have the beginnings of a migraine while typing this, so some of it may not make sense. Sigh. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 23:38 ` Stephen Leake @ 2020-04-02 0:25 ` Stephen Leake 2020-04-02 2:46 ` Stefan Monnier ` (2 subsequent siblings) 3 siblings, 0 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-02 0:25 UTC (permalink / raw) To: emacs-devel I looked at the tree-sitter source in git-hub (https://github.com/ubolonton/emacs-tree-sitter) and the tree-sitter doc that points to (https://tree-sitter.github.io/tree-sitter/using-parsers) Stephen Leake <stephen_leake@stephe-leake.org> writes: >>> > Btw, what do you do with the tree returned by the tree-sitter parser? >>> > store it in some buffer-local variable? If so, how much memory does >>> > such a tree take, and when, if ever, is that memory released? >>> > >>> >>> It's stored in a buffer-local variable. I haven't measured the memory >>> they take. Memory is released when the tree object is garbage-collected >>> (it's a `user-ptr'). > > Is it an elisp structure (or accesible from elisp)? It's a Rust structure; there is an emacs module providing elisp access to it (things like "find syntax tree node at point", "get parent node", "get node text"). The syntax tree is a "concrete syntax tree"; it should be quite close to the wisi syntax tree. > Have you written code that traverses it to provide faces and > indentation? Not in that repository. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 23:38 ` Stephen Leake 2020-04-02 0:25 ` Stephen Leake @ 2020-04-02 2:46 ` Stefan Monnier 2020-04-02 4:36 ` Tuấn-Anh Nguyễn 2020-04-02 14:44 ` Eli Zaretskii 2020-04-02 5:21 ` Tuấn-Anh Nguyễn 2020-04-02 14:36 ` Eli Zaretskii 3 siblings, 2 replies; 142+ messages in thread From: Stefan Monnier @ 2020-04-02 2:46 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > In C or C++ body files, "a complete parse" is typically one variable or > function declaration. So if Emacs can reliably find the beginning and > end of those declarations, IIUC, a large part of CC-mode's trouble is exactly the need to find somewhat reliably a position vaguely like "the beginning of a declaration". It's very much a non-trivial problem (and in the general case to properly handle all possible comments you need to start parsing from point-min). >> And yes, doing this by consing strings is not a good idea, it will >> slow things down and cause a lot of GC. It is best avoided. Thus my >> questions above. > I'm not sure how "convert syntax tree to elisp" compares to "consing > strings". I would certainly expect it to cause a lot of GC. If the GC is the worry, we can use a function which encodes the buffer using a given coding-system and returns a malloc'd array of bytes. >>> It's stored in a buffer-local variable. I haven't measured the memory >>> they take. Memory is released when the tree object is garbage-collected >>> (it's a `user-ptr'). > Is it an elisp structure (or accesible from elisp)? Have you written > code that traverses it to provide faces and indentation? According to https://github.com/tree-sitter/tree-sitter/issues/222 the parse tree takes around 10 times the size of the source text. At least that's for tree-sitter's own parse-tree; not sure how that relates to emacs-tree-sitter's yet. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 2:46 ` Stefan Monnier @ 2020-04-02 4:36 ` Tuấn-Anh Nguyễn 2020-04-02 14:44 ` Eli Zaretskii 1 sibling, 0 replies; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-02 4:36 UTC (permalink / raw) To: Stefan Monnier; +Cc: Stephen Leake, emacs-devel > If the GC is the worry, we can use a function which encodes the > buffer using a given coding-system and returns a malloc'd array of bytes. > If we are talking about a function exposed to dynamic modules, then we will also need to expose another function to free that byte array, because the dynamic module may use a different allocator. It's probably better to ask the caller to prepare that array, like what `copy_string_contents' does. > >>> It's stored in a buffer-local variable. I haven't measured the memory > >>> they take. Memory is released when the tree object is garbage-collected > >>> (it's a `user-ptr'). > > Is it an elisp structure (or accesible from elisp)? Have you written > > code that traverses it to provide faces and indentation? > > According to https://github.com/tree-sitter/tree-sitter/issues/222 the > parse tree takes around 10 times the size of the source text. At least > that's for tree-sitter's own parse-tree; not sure how that relates to > emacs-tree-sitter's yet. > emacs-tree-sitter adds 16 bytes for reference counting and 8 bytes for checking concurrent modifications (because nodes are also exposed to Lisp as objects). That's negligible I think. -- Tuấn-Anh Nguyễn Software Engineer ^ permalink raw reply [flat|nested] 142+ messages in thread
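The allocator concern in the message above is usually sidestepped by letting the caller own the destination array and having the callee report the size it needs. The sketch below illustrates that calling convention only. `struct text_source` and `copy_text_contents` are hypothetical names, the convention is loosely modeled on how `copy_string_contents' is called, and details such as null termination and error signalling are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical text source standing in for a buffer.  */
struct text_source {
  const char *bytes;
  ptrdiff_t size;
};

/* Copy the text of SRC into OUT, which the caller owns and frees with
   its own allocator.  On entry *LEN is the capacity of OUT; on exit it
   is the number of bytes required.  If OUT is NULL, only the required
   size is reported.  Returns false, copying nothing, if the capacity is
   too small.  */
static bool
copy_text_contents (const struct text_source *src, char *out, ptrdiff_t *len)
{
  assert (src != NULL && len != NULL);
  ptrdiff_t capacity = *len;
  *len = src->size;
  if (out == NULL)
    return true;
  if (capacity < src->size)
    return false;
  memcpy (out, src->bytes, (size_t) src->size);
  return true;
}
```

A caller would typically invoke it once with a NULL destination to learn the size, allocate with whatever allocator it prefers, and call again. No memory ever crosses the module boundary in the other direction, so no second "free" entry point is needed.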
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 2:46 ` Stefan Monnier 2020-04-02 4:36 ` Tuấn-Anh Nguyễn @ 2020-04-02 14:44 ` Eli Zaretskii 2020-04-02 15:19 ` Stefan Monnier 2020-04-03 2:49 ` [SPAM UNSURE] " Stephen Leake 1 sibling, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 14:44 UTC (permalink / raw) To: Stefan Monnier; +Cc: stephen_leake, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Wed, 01 Apr 2020 22:46:18 -0400 > Cc: emacs-devel <emacs-devel@gnu.org> > > If the GC is the worry, we can use a function which encodes the > buffer using a given coding-system and returns a malloc'd array of bytes. I think we should try to avoid both copying and encoding the text we send to the parser. Both operations are expensive and require memory allocation. > According to https://github.com/tree-sitter/tree-sitter/issues/222 the > parse tree takes around 10 times the size of the source text. Yes, that's another reason why it might make sense to "forget" trees of buffers that were not displayed for a long time. But this is an optimization that can be added later without any significant changes in the design. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 14:44 ` Eli Zaretskii @ 2020-04-02 15:19 ` Stefan Monnier 2020-04-03 2:49 ` [SPAM UNSURE] " Stephen Leake 1 sibling, 0 replies; 142+ messages in thread From: Stefan Monnier @ 2020-04-02 15:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel > I think we should try to avoid both copying and encoding the text we > send to the parser. Both operations are expensive and require memory > allocation. I think both operations are cheap enough relatively to the actual parsing that it is not indispensable to avoid them: maybe it will be worth the effort, but maybe not. In any case, it's a minor implementation detail that can easily be changed in the future without impacting the rest of the code. So, I think it falls squarely in the realm of premature optimization. >> According to https://github.com/tree-sitter/tree-sitter/issues/222 the >> parse tree takes around 10 times the size of the source text. > Yes, that's another reason why it might make sense to "forget" trees > of buffers that were not displayed for a long time. Agreed, tho I wouldn't word it that way: parse trees are not needed for redisplay and can be used for things that don't relate to redisplay (e.g. navigation, indentation, ...). > But this is an optimization that can be added later without any > significant changes in the design. Agreed as well. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 14:44 ` Eli Zaretskii 2020-04-02 15:19 ` Stefan Monnier @ 2020-04-03 2:49 ` Stephen Leake 2020-04-03 7:47 ` Eli Zaretskii 2020-04-03 8:11 ` Robert Pluim 1 sibling, 2 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-03 2:49 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stefan Monnier <monnier@iro.umontreal.ca> >> Date: Wed, 01 Apr 2020 22:46:18 -0400 >> Cc: emacs-devel <emacs-devel@gnu.org> >> >> If the GC is the worry, we can use a function which encodes the >> buffer using a given coding-system and returns a malloc'd array of bytes. > > I think we should try to avoid both copying and encoding the text we > send to the parser. Both operations are expensive and require memory > allocation. I don't understand what the alternative is. The parser imposes the reasonable requirement that the input text be utf-8 (or possibly some other standard format). Emacs raw buffer text is not utf-8, so we must do some encoding. If we try to pass a plain pointer to a point in the Emacs internal buffer, there is no way to do that encoding. It would be possible to change the lexer in the parser to accept Emacs raw buffer format, but I don't think you are proposing that. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 2:49 ` [SPAM UNSURE] " Stephen Leake @ 2020-04-03 7:47 ` Eli Zaretskii 2020-04-03 18:11 ` Stephen Leake 2020-04-03 8:11 ` Robert Pluim 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 7:47 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Thu, 02 Apr 2020 18:49:07 -0800 > > > I think we should try to avoid both copying and encoding the text we > > send to the parser. Both operations are expensive and require memory > > allocation. > > I don't understand what the alternative is. The parser imposes the > reasonable requirement that the input text be utf-8 (or possibly some > other standard format). Emacs raw buffer text is not utf-8, so we must > do some encoding. Emacs represents buffer text as a superset of UTF-8, with the violations of strict UTF-8 being very rare in buffers that hold program sources. The function we can provide that lets tree-sitter access buffer text can cope with those violations, if it turns out that tree-sitter cannot do that by itself (which frankly, I'd expect it to be able to do). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 7:47 ` Eli Zaretskii @ 2020-04-03 18:11 ` Stephen Leake 2020-04-03 18:46 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stephen Leake @ 2020-04-03 18:11 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Date: Thu, 02 Apr 2020 18:49:07 -0800 >> >> > I think we should try to avoid both copying and encoding the text we >> > send to the parser. Both operations are expensive and require memory >> > allocation. >> >> I don't understand what the alternative is. The parser imposes the >> reasonable requirement that the input text be utf-8 (or possibly some >> other standard format). Emacs raw buffer text is not utf-8, so we must >> do some encoding. > > Emacs represents buffer text as a superset of UTF-8, with the > violations of strict UTF-8 being very rare in buffers that hold > program sources. The function we can provide that lets tree-sitter > access buffer text can cope with those violations, Ok. "cope with those violations" = "do some encoding". We can avoid copying _if_ the encoding does not change character positions, or somehow preserves positions, for example with an auxiliary table of changes due to encoding. Coping with violations in the lexer would make it much easier to avoid changing character positions; it is easy to simply ignore bytes there. wisi makes it easy to implement this in the lexer (because it uses re2c), although currently there is no way to make that language-specific (that would be an enhancement). https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners describes the facility for enhancing the tree-sitter lexer (aka scanner). That is not convenient for handling this issue, so we'd have to request (and or provide) an enhancement. 
We cannot avoid encoding (either in the read function provided to tree-sitter, or in the tree-sitter lexer), but the encoding may be very simple and efficient. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 18:11 ` Stephen Leake @ 2020-04-03 18:46 ` Eli Zaretskii 2020-04-04 0:05 ` Stephen Leake 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 18:46 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Fri, 03 Apr 2020 10:11:05 -0800 > > > Emacs represents buffer text as a superset of UTF-8, with the > > violations of strict UTF-8 being very rare in buffers that hold > > program sources. The function we can provide that lets tree-sitter > > access buffer text can cope with those violations, > > Ok. "cope with those violations" = "do some encoding". If we use "encoding" terminology for this, it will be confusing and will cause misunderstandings. "Conversion" is better, IMO. Some sequences may need to be converted when feeding them to tree-sitter. But I think tree-sitter should be able to cope with this itself. It is unreasonable to expect strict UTF-8 from all applications. Maybe I'm dreaming, but ISTR there is (or was) an issue on their issue tracker about this. > We cannot avoid encoding (either in the read function provided to > tree-sitter, or in the tree-sitter lexer), but the encoding may be very > simple and efficient. Once again, please reserve "encoding" to the likes of encode-coding-region or code_convert_string, to avoid confusion. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 18:46 ` Eli Zaretskii @ 2020-04-04 0:05 ` Stephen Leake 0 siblings, 0 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-04 0:05 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Date: Fri, 03 Apr 2020 10:11:05 -0800 >> >> > Emacs represents buffer text as a superset of UTF-8, with the >> > violations of strict UTF-8 being very rare in buffers that hold >> > program sources. The function we can provide that lets tree-sitter >> > access buffer text can cope with those violations, >> >> Ok. "cope with those violations" = "do some encoding". > > If we use "encoding" terminology for this, it will be confusing and > will cause misunderstandings. Yeah, I realized that after I posted this. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 2:49 ` [SPAM UNSURE] " Stephen Leake 2020-04-03 7:47 ` Eli Zaretskii @ 2020-04-03 8:11 ` Robert Pluim 2020-04-03 11:00 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: Robert Pluim @ 2020-04-03 8:11 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel >>>>> On Thu, 02 Apr 2020 18:49:07 -0800, Stephen Leake <stephen_leake@stephe-leake.org> said: Stephen> Eli Zaretskii <eliz@gnu.org> writes: >>> From: Stefan Monnier <monnier@iro.umontreal.ca> >>> Date: Wed, 01 Apr 2020 22:46:18 -0400 >>> Cc: emacs-devel <emacs-devel@gnu.org> >>> >>> If the GC is the worry, we can use a function which encodes the >>> buffer using a given coding-system and returns a malloc'd array of bytes. >> >> I think we should try to avoid both copying and encoding the text we >> send to the parser. Both operations are expensive and require memory >> allocation. Stephen> I don't understand what the alternative is. The parser imposes the Stephen> reasonable requirement that the input text be utf-8 (or possibly some Stephen> other standard format). Emacs raw buffer text is not utf-8, so we must Stephen> do some encoding. Itʼs pretty close, apart from raw bytes. How much of an imposition would it be in practice to say 'source code must not contain raw bytes'? Stephen> If we try to pass a plain pointer to a point in the Emacs internal Stephen> buffer, there is no way to do that encoding. As pointed out elsewhere, you'd have to take the gap into account, so it would be two pointers and two lengths to describe the entire buffer text. Robert ^ permalink raw reply [flat|nested] 142+ messages in thread
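The "two pointers and two lengths" description can be sketched as follows. The structure and field names are hypothetical stand-ins, not the actual Emacs buffer structures; the sketch only shows how text stored around a gap maps onto two contiguous spans.

```c
#include <stddef.h>

/* Hypothetical view of a gap buffer: the text lives in one allocation
   with an unused gap somewhere inside it. */
struct gap_buffer {
  const char *base;   /* start of the allocation */
  size_t gap_start;   /* byte offset where the gap begins */
  size_t gap_size;    /* bytes occupied by the gap */
  size_t text_bytes;  /* bytes of real text, excluding the gap */
};

struct text_span {
  const char *ptr;
  size_t len;
};

/* Describe the whole text as two contiguous spans: the part before the
   gap and the part after it.  Either span may be empty. */
static void
gap_buffer_spans (const struct gap_buffer *b, struct text_span out[2])
{
  out[0].ptr = b->base;
  out[0].len = b->gap_start;
  out[1].ptr = b->base + b->gap_start + b->gap_size;
  out[1].len = b->text_bytes - b->gap_start;
}
```

The spans stay valid only while nothing moves the gap or relocates the text, so a consumer would have to finish with them before control returns to code that can edit the buffer.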
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 8:11 ` Robert Pluim @ 2020-04-03 11:00 ` Eli Zaretskii 2020-04-03 11:09 ` Robert Pluim 2020-04-03 11:21 ` John Yates 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 11:00 UTC (permalink / raw) To: Robert Pluim; +Cc: stephen_leake, emacs-devel > From: Robert Pluim <rpluim@gmail.com> > Date: Fri, 03 Apr 2020 10:11:07 +0200 > Cc: emacs-devel <emacs-devel@gnu.org> > > Stephen> I don't understand what the alternative is. The parser imposes the > Stephen> reasonable requirement that the input text be utf-8 (or possibly some > Stephen> other standard format). Emacs raw buffer text is not utf-8, so we must > Stephen> do some encoding. > > Itʼs pretty close, apart from raw bytes. Not only raw bytes: also some characters that cannot be unified with Unicode. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 11:00 ` Eli Zaretskii @ 2020-04-03 11:09 ` Robert Pluim 2020-04-03 12:44 ` Eli Zaretskii 2020-04-03 11:21 ` John Yates 1 sibling, 1 reply; 142+ messages in thread From: Robert Pluim @ 2020-04-03 11:09 UTC (permalink / raw) To: Eli Zaretskii; +Cc: stephen_leake, emacs-devel >>>>> On Fri, 03 Apr 2020 14:00:06 +0300, Eli Zaretskii <eliz@gnu.org> said: >> From: Robert Pluim <rpluim@gmail.com> >> Date: Fri, 03 Apr 2020 10:11:07 +0200 >> Cc: emacs-devel <emacs-devel@gnu.org> >> Stephen> I don't understand what the alternative is. The parser imposes the Stephen> reasonable requirement that the input text be utf-8 (or possibly some Stephen> other standard format). Emacs raw buffer text is not utf-8, so we must Stephen> do some encoding. >> >> Itʼs pretty close, apart from raw bytes. Eli> Not only raw bytes: also some characters that cannot be unified with Eli> Unicode. And again: how likely are those characters to be in source code? Robert ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 11:09 ` Robert Pluim @ 2020-04-03 12:44 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 12:44 UTC (permalink / raw) To: Robert Pluim; +Cc: stephen_leake, emacs-devel > From: Robert Pluim <rpluim@gmail.com> > Cc: stephen_leake@stephe-leake.org, emacs-devel@gnu.org > Date: Fri, 03 Apr 2020 13:09:34 +0200 > > Eli> Not only raw bytes: also some characters that cannot be unified with > Eli> Unicode. > > And again: how likely are those characters to be in source code? Probably not too likely, but we need to have a solution for when they do happen. I guess a useful first step would be for someone to find out what does tree-sitter do when it encounters such byte sequences. We can then devise a solution that will produce good results, preferably without the need to ask tree-sitter developers to cater to these situations. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 11:00 ` Eli Zaretskii 2020-04-03 11:09 ` Robert Pluim @ 2020-04-03 11:21 ` John Yates 2020-04-03 12:50 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: John Yates @ 2020-04-03 11:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Robert Pluim, Stephen Leake, Emacs developers On Fri, Apr 3, 2020 at 7:01 AM Eli Zaretskii <eliz@gnu.org> wrote: > Not only raw bytes: also some characters that cannot be unified with > Unicode. How are the semantics of such characters queried? E.g. can such characters participate in case conversion? /john ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 11:21 ` John Yates @ 2020-04-03 12:50 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 12:50 UTC (permalink / raw) To: John Yates; +Cc: rpluim, stephen_leake, emacs-devel > From: John Yates <john@yates-sheets.org> > Date: Fri, 3 Apr 2020 07:21:28 -0400 > Cc: Robert Pluim <rpluim@gmail.com>, Stephen Leake <stephen_leake@stephe-leake.org>, > Emacs developers <emacs-devel@gnu.org> > > On Fri, Apr 3, 2020 at 7:01 AM Eli Zaretskii <eliz@gnu.org> wrote: > > Not only raw bytes: also some characters that cannot be unified with > > Unicode. > > How are the semantics of such characters queried? I don't think I understand well enough what you mean by "semantics" in this context. Please elaborate: what else is in the semantics except the case conversions that you mention below? > E.g. can such characters participate in case conversion? Yes, of course. Emacs converts letter-case by consulting the case tables; see the node "Case Tables" in the ELisp manual. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 23:38 ` Stephen Leake 2020-04-02 0:25 ` Stephen Leake 2020-04-02 2:46 ` Stefan Monnier @ 2020-04-02 5:21 ` Tuấn-Anh Nguyễn 2020-04-02 9:24 ` [SPAM UNSURE] " Stephen Leake 2020-04-02 14:36 ` Eli Zaretskii 3 siblings, 1 reply; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-02 5:21 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > > My suggestion is first to figure out how to do this stuff efficiently > > from within Emacs itself, as if the module interface were not part of > > the equation. We can add that aspect back later. > > There are two times the wisi code that wraps the parser needs access to > the buffer; first to copy the text, second to add text properties > (faces, indent values, navigation markers). There are usually many text > properties output by each parse. > > The positions and values of the text properties are computed by > functions that run after the complete syntax tree has been produced. In > wisi, those functions are added directly in the grammar source file > (where they are called "post-parse grammar actions"). In tree-sitter, I > assume they are called from some mode-author-written code that traverses > the syntax tree (wisi provides that internally). Except I see below that > the emacs tree-sitter package stores the syntax tree in the buffer. > The preferred approach with tree-sitter is querying: https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries -- Tuấn-Anh Nguyễn Software Engineer ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 5:21 ` Tuấn-Anh Nguyễn @ 2020-04-02 9:24 ` Stephen Leake 0 siblings, 0 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-02 9:24 UTC (permalink / raw) To: emacs-devel Tuấn-Anh Nguyễn <ubolonton@gmail.com> writes: >> > My suggestion is first to figure out how to do this stuff efficiently >> > from within Emacs itself, as if the module interface were not part of >> > the equation. We can add that aspect back later. >> >> There are two times the wisi code that wraps the parser needs access to >> the buffer; first to copy the text, second to add text properties >> (faces, indent values, navigation markers). There are usually many text >> properties output by each parse. >> >> The positions and values of the text properties are computed by >> functions that run after the complete syntax tree has been produced. In >> wisi, those functions are added directly in the grammar source file >> (where they are called "post-parse grammar actions"). In tree-sitter, I >> assume they are called from some mode-author-written code that traverses >> the syntax tree (wisi provides that internally). Except I see below that >> the emacs tree-sitter package stores the syntax tree in the buffer. >> > > The preferred approach with tree-sitter is querying: > https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries For access to the syntax tree, yes. There must be code somewhere that computes face, indent and navigation (and code completion, ...). That code will build on top of the syntax tree access; it could be in Rust (in the module) or in elisp (using the module functions). Or in C linked directly into Emacs, as Eli suggests. But I don't think he meant that as an actual implementation approach, just as a design approach. 
The wisi Ada code that computes the text properties accesses the syntax tree more directly, but that's just an implementation detail. I think it makes sense at this point to try to merge wisi and emacs-tree-sitter. There are several approaches: 1. rewrite the wisi grammar actions in elisp, using the emacs-tree-sitter module functions to access the syntax tree. 2. rewrite the wisi grammar actions in Rust, using Rust functions to access the syntax tree 3. rewrite the emacs-tree-sitter module in Ada, using an Ada binding to the Tree-Sitter C API. Then the Emacs module would provide the current wisi Ada code, modified to work with a Tree-Sitter parser. 4. There would also be value in doing an independent design and implementation of code to compute face, indent and navigation using the tree-sitter syntax tree; there might be a better approach than what wisi does. 1 is probably the quickest path to getting something working, but 2 or 3 will probably provide faster execution time. Ideally we'd do all three (or four) and get some good metrics. After doing one of the above, we must still write the calls to the grammar actions for each language of interest. In wisi, this is done by adding grammar actions to the grammar source code; for example, here is the indent action for the Ada 'if then end if' statement: if_statement : IF expression_opt THEN sequence_of_statements_opt END IF SEMICOLON %((wisi-indent-action [nil [(wisi-hanging% ada-indent-broken (* 2 ada-indent-broken)) ada-indent-broken] nil [ada-indent ada-indent] nil nil nil]))% There is one lisp form for each token in the grammar production. IF is not indented by this action; it is indented by the enclosing Ada statement. The conditional expression is indented by wisi-hanging; comments within the expression (assuming it is multi-line) are indented by ada-indent-broken. wisi-hanging takes care of indenting the second line in a long expression. THEN, END IF SEMICOLON are not indented. 
The statements in the true branch are indented by ada-indent. If ada-indent is 3 and ada-indent-broken is 2, this produces: if a or b -- a comment then statement_1; statement_2; -- another comment end if; In the upstream development repository for the wisi package (https://savannah.nongnu.org/projects/ada-mode/), there is a user guide to the grammar actions. I can provide an HTML or Info version on request (or get my act together and do another wisi/ada-mode release). In Tree-Sitter, the calls to the grammar actions are written in code that traverses the syntax tree; this would be a higher level elisp or Rust function, for 1 and 2 above. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 23:38 ` Stephen Leake ` (2 preceding siblings ...) 2020-04-02 5:21 ` Tuấn-Anh Nguyễn @ 2020-04-02 14:36 ` Eli Zaretskii 2020-04-03 2:27 ` Stephen Leake 3 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 14:36 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Wed, 01 Apr 2020 15:38:26 -0800 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Also, direct access to buffer text generally means we must make sure > > GC never runs as long as pointers to buffer text are lying around. > > Can any Lisp run between calls to the reader function that the > > tree-sitter parser calls to access the buffer text? > > If the parser copies the text into an internal buffer, that reader > function should only be called once per call to the parser. Such copying is not really scalable, and IMO should be avoided. During active editing, redisplay runs very frequently, and having to copy portions of the buffer, let alone all of it, each time, which necessarily requires memory allocation, consing of Lisp objects, etc., will produce significant memory pressure, expensive heap allocations/deallocations, and a lot of GC. Recall that on many modern platforms Emacs doesn't really return memory to the system, which means we risk increasing the memory footprint, and create system-wide memory pressure. It isn't a catastrophe, but we should try to avoid it if possible. > Since Emacs has the entire file in memory, the parser can too. Having the file twice or more in memory is worse than having it only once. > However, if we are really trying to avoid copying text (which is very > premature optimization) I don't think it's premature. > In sum, the short answer is "yes, you must parse the whole file, unless > your language is particularly simple". 
Funny, my conclusion from reading your detailed description was entirely different. > > IOW, the issue with exposing access to buffer text to modules is IMO > > secondary. > > yes, because copying text is fast compared to everything else going on. That wasn't my motivation when I wrote that. > In general, each parser library, and even each grammar author, will have > different representations for the syntax tree. > > So if we want to support different parsers, I think it is best to define > the Emacs "parser API" as "give text to parser; accept text properties > from parser". Yes, something like that. It's probably enough to accept a list of regions with syntactic attributes. ^ permalink raw reply [flat|nested] 142+ messages in thread
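One way to picture "a list of regions with syntactic attributes" is the sketch below. Every name in it (the attribute enum, the region struct, the sink callback) is hypothetical; no such interface exists in Emacs or in tree-sitter as far as this thread shows. The parser side produces regions, and the editor side receives only those that overlap the range it cares about.

```c
#include <stddef.h>

/* Hypothetical syntactic attributes a parser might report. */
enum syntax_attr { ATTR_COMMENT, ATTR_STRING, ATTR_KEYWORD };

/* A half-open byte range [start, end) carrying one attribute. */
struct syntax_region {
  size_t start;
  size_t end;
  enum syntax_attr attr;
};

/* Callback through which the editor side accepts one region. */
typedef void (*region_sink) (const struct syntax_region *r, void *closure);

/* Deliver every region overlapping [from, to) to SINK and return how
   many were delivered. */
static size_t
deliver_regions (const struct syntax_region *regions, size_t count,
                 size_t from, size_t to, region_sink sink, void *closure)
{
  size_t delivered = 0;
  for (size_t i = 0; i < count; i++)
    if (regions[i].start < to && regions[i].end > from)
      {
        sink (&regions[i], closure);
        delivered++;
      }
  return delivered;
}

/* Example sink: count the regions it is handed. */
static void
count_region (const struct syntax_region *r, void *closure)
{
  size_t *count = closure;
  (void) r;
  (*count)++;
}
```

Keeping the exchange this narrow means the syntax tree representation stays private to each parser, which is the point made above about supporting different parser libraries.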
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 14:36 ` Eli Zaretskii @ 2020-04-03 2:27 ` Stephen Leake 2020-04-03 7:43 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stephen Leake @ 2020-04-03 2:27 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Date: Wed, 01 Apr 2020 15:38:26 -0800 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > Also, direct access to buffer text generally means we must make sure >> > GC never runs as long as pointers to buffer text are lying around. >> > Can any Lisp run between calls to the reader function that the >> > tree-sitter parser calls to access the buffer text? >> >> If the parser copies the text into an internal buffer, that reader >> function should only be called once per call to the parser. > > Such copying is not really scalable, and IMO should be avoided. > During active editing, redisplay runs very frequently, and having to > copy portions of the buffer, let alone all of it, each time, which > necessarily requires memory allocation, consing of Lisp objects, etc., > will produce significant memory pressure, expensive heap > allocations/deallocations, and a lot of GC. Recall that on many > modern platforms Emacs doesn't really return memory to the system, > which means we risk increasing the memory footprint, and create > system-wide memory pressure. It isn't a catastrophe, but we should > try to avoid it if possible. Ok. I know very little about the internal storage of text in Emacs. There are at least two strings with a gap at the current edit point; if we pass a simple pointer to tree-sitter, it will have to handle the gap. You mention "consing of Lisp objects" above, which says to me that the text is stored in a more complex structure. How can we provide direct access to that for tree-sitter? 
Avoiding _all_ copying is impossible; the parser must store the contents of each token in some way. Typically that is done by storing pointers/indices into the text buffer that contains the entire text. >> In sum, the short answer is "yes, you must parse the whole file, unless >> your language is particularly simple". > > Funny, my conclusion from reading your detailed description was > entirely different. I need more than that to respond in a helpful way. >> In general, each parser library, and even each grammar author, will have >> different representations for the syntax tree. >> >> So if we want to support different parsers, I think it is best to define >> the Emacs "parser API" as "give text to parser; accept text properties >> from parser". > > Yes, something like that. It's probably enough to accept a list of > regions with syntactic attributes. Ok, good. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 2:27 ` Stephen Leake @ 2020-04-03 7:43 ` Eli Zaretskii 2020-04-03 17:45 ` Stephen Leake 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 7:43 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Thu, 02 Apr 2020 18:27:59 -0800 > > > Such copying is not really scalable, and IMO should be avoided. > > During active editing, redisplay runs very frequently, and having to > > copy portions of the buffer, let alone all of it, each time, which > > necessarily requires memory allocation, consing of Lisp objects, etc., > > will produce significant memory pressure, expensive heap > > allocations/deallocations, and a lot of GC. Recall that on many > > modern platforms Emacs doesn't really return memory to the system, > > which means we risk increasing the memory footprint, and create > > system-wide memory pressure. It isn't a catastrophe, but we should > > try to avoid it if possible. > > Ok. I know very little about the internal storage of text in Emacs. > There is at least two strings with a gap at the current edit point; if > we pass a simple pointer to tree-sitter, it will have to handle the gap. Tree-sitter allows the application to define a "reader" function that it will then call to access buffer text. That function should cope with the gap. > You mention "consing of Lisp objects" above, which says to me that the > text is stored in a more complex structure. I meant the consing that is necessary to make a buffer-substring that will be passed to the parser. > How can we provide direct access of that to tree-sitter? See above: by writing our function to access buffer text. > Avoid _all_ copying is impossible; the parser must store the contents of > each token in some way. Typically that is done by storing > pointers/indices into the text buffer that contains the entire text. 
I don't think tree-sitter does that, because the text it gets is ephemeral. If we pass it a buffer-substring, it's a temporary string which will be GCed after it's used; if we pass it pointers to buffer text, those pointers can be invalid after GC, because GC can relocate buffer text to a different memory region. They definitely do copy portions of the text they get for internal processing purposes, but I doubt that they duplicate all of it, because that would not be scalable to huge buffers. And in any case, any copying we do would be _in_addition_ to what tree-sitter does internally. > >> In sum, the short answer is "yes, you must parse the whole file, unless > >> your language is particularly simple". > > > > Funny, my conclusion from reading your detailed description was > > entirely different. > > I need more than that to respond in a helpful way. Well, you said: > To some extent, that depends on the language. and then went on to describing how each language might _not_ need a full parse in many cases. Thus the conclusion sounded a bit radical to me. ^ permalink raw reply [flat|nested] 142+ messages in thread
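A reader function that copes with the gap might look roughly like the sketch below. Its parameter list mirrors the shape of the read callback in tree-sitter's TSInput (payload, byte index, point, out-parameter for the chunk length), but the point type and the gap-buffer structure here are local stand-ins so the code compiles without the library; they are assumptions, not the real Emacs or tree-sitter definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tree-sitter's TSPoint. */
typedef struct { uint32_t row; uint32_t column; } point_t;

/* Hypothetical gap-buffer description; not the real Emacs structures. */
struct gap_text {
  const char *base;     /* start of the allocation */
  uint32_t gap_start;   /* byte offset where the gap begins */
  uint32_t gap_size;    /* bytes occupied by the gap */
  uint32_t text_bytes;  /* bytes of real text, excluding the gap */
};

/* Return a pointer to the longest contiguous run of text starting at
   BYTE_INDEX (an offset that ignores the gap) and store its length in
   *BYTES_READ.  A length of zero signals end of text. */
static const char *
read_gap_text (void *payload, uint32_t byte_index, point_t position,
               uint32_t *bytes_read)
{
  const struct gap_text *t = payload;
  (void) position;

  if (byte_index >= t->text_bytes)
    {
      *bytes_read = 0;
      return "";
    }
  if (byte_index < t->gap_start)
    {
      /* Before the gap: hand out everything up to the gap. */
      *bytes_read = t->gap_start - byte_index;
      return t->base + byte_index;
    }
  /* After the gap: skip over it and hand out the rest. */
  *bytes_read = t->text_bytes - byte_index;
  return t->base + t->gap_size + byte_index;
}
```

The returned pointers refer directly to buffer text, so they are only usable as long as no GC or edit relocates that text, which is the constraint discussed above.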
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 7:43 ` Eli Zaretskii @ 2020-04-03 17:45 ` Stephen Leake 2020-04-03 18:31 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stephen Leake @ 2020-04-03 17:45 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> Date: Thu, 02 Apr 2020 18:27:59 -0800 >> >> > Such copying is not really scalable, and IMO should be avoided. >> > During active editing, redisplay runs very frequently, and having to >> > copy portions of the buffer, let alone all of it, each time, which >> > necessarily requires memory allocation, consing of Lisp objects, etc., >> > will produce significant memory pressure, expensive heap >> > allocations/deallocations, and a lot of GC. Recall that on many >> > modern platforms Emacs doesn't really return memory to the system, >> > which means we risk increasing the memory footprint, and create >> > system-wide memory pressure. It isn't a catastrophe, but we should >> > try to avoid it if possible. >> >> Ok. I know very little about the internal storage of text in Emacs. >> There is at least two strings with a gap at the current edit point; if >> we pass a simple pointer to tree-sitter, it will have to handle the gap. > > Tree-sitter allows the application to define a "reader" function that > it will then call to access buffer text. That function should cope > with the gap. and also with the encoding, which you did not address. I don't see how that is different from the C level buffer-substring. Certainly there should be a module function buffer-substring that is as efficient as possible. >> You mention "consing of Lisp objects" above, which says to me that the >> text is stored in a more complex structure. > > I meant the consing that is necessary to make a buffer-substring that > will be passed to the parser. 
Since we are calling the parser from C (if it is linked into Emacs, or in a module), I still don't understand. Does C code have to cons to create a string? It will have to allocate if the requested range is not contiguous in the buffer. >> Avoiding _all_ copying is impossible; the parser must store the contents of >> each token in some way. Typically that is done by storing >> pointers/indices into the text buffer that contains the entire text. > > I don't think tree-sitter does that, because the text it gets is > ephemeral. If we pass it a buffer-substring, it's a temporary string > which will be GCed after it's used; if we pass it pointers to buffer > text, those pointers can be invalid after GC, because GC can relocate > buffer text to a different memory region. Hmm. https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code says: Syntax nodes store their position in the source code both in terms of raw bytes and row/column coordinates In the case of passing a pointer to a string (or buffer, etc), those positions are relative to that original buffer. So the Emacs buffer is serving as the parse buffer. Ok, that avoids any copying. If we pass a buffer-substring to the parser, we are then responsible for mapping positions relative to the substring into positions relative to the full buffer. wisi delegates that to the parser; it can pass start-char-pos and start-byte-pos to the parser along with a string. >> >> In sum, the short answer is "yes, you must parse the whole file, unless >> >> your language is particularly simple". >> > >> > Funny, my conclusion from reading your detailed description was >> > entirely different. >> >> I need more than that to respond in a helpful way. > > Well, you said: > >> To some extent, that depends on the language. > > and then went on to describing how each language might _not_ need a > full parse in many cases. Thus the conclusion sounded a bit radical > to me. 
Ok, we are putting different spins on what "particularly simple" means. A more neutral phrasing would be: Some languages require parsing the whole file, some do not. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
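The substring-to-buffer position mapping mentioned above amounts to adding the substring's starting offsets. A minimal sketch, with hypothetical names modeled on the start-char-pos and start-byte-pos values that wisi is said to pass along with the string:

```c
#include <stddef.h>

/* Where the substring handed to the parser began in the full buffer.
   Byte and character positions differ once multibyte text is involved,
   so both are tracked. */
struct substring_origin {
  size_t start_byte_pos;
  size_t start_char_pos;
};

struct buffer_pos {
  size_t byte_pos;
  size_t char_pos;
};

/* Convert a position relative to the substring into a position
   relative to the full buffer. */
static struct buffer_pos
to_buffer_pos (struct substring_origin origin, size_t rel_byte,
               size_t rel_char)
{
  struct buffer_pos p = { origin.start_byte_pos + rel_byte,
                          origin.start_char_pos + rel_char };
  return p;
}
```

If the whole buffer is parsed in place, the origin is simply zero for both fields and the mapping disappears.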
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 17:45 ` Stephen Leake @ 2020-04-03 18:31 ` Eli Zaretskii 2020-04-04 0:04 ` Stephen Leake 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-03 18:31 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Fri, 03 Apr 2020 09:45:44 -0800 > > > Tree-sitter allows the application to define a "reader" function that > > it will then call to access buffer text. That function should cope > > with the gap. > > and also with the encoding, which you did not address. I mentioned that in another message: I don't think encoding is necessary in this case. > I don't see how that is different from the C level > buffer-substring. Certainly there should be a module function > buffer-substring that is as efficient as possible. If modules are allowed direct access to buffer text, then it's indeed not different. But the alternative that was discussed was different. May I suggest that you look at the code of the module which triggered this? > >> You mention "consing of Lisp objects" above, which says to me that the > >> text is stored in a more complex structure. > > > > I meant the consing that is necessary to make a buffer-substring that > > will be passed to the parser. > > Since we are calling the parser from C (if it is linked into Emacs, or > in a module), I still don't understand. Does C code have to cons to > create a string? Of course. How else do you get a UTF-8 encoded string to pass to the parser as a copy of buffer text? > > I don't think tree-sitter does that, because the text it gets is > > ephemeral. If we pass it a buffer-substring, it's a temporary string > > which will be GCed after it's used; if we pass it pointers to buffer > > text, those pointers can be invalid after GC, because GC can relocate > > buffer text to a different memory region. > > Hmm. 
> https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code > says: > > Syntax nodes store their position in the source code both in terms > of raw bytes and row/column coordinates Positions are okay; 'char *' pointers to buffer or string text are not. ^ permalink raw reply [flat|nested] 142+ messages in thread
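[Editorial sketch] The "reader" function that copes with the gap, as discussed above, might look roughly like the following. This is only a sketch under stated assumptions: `gap_text` and `gap_text_read` are hypothetical names, not Emacs or tree-sitter API, and tree-sitter's real read callback also receives a row/column position argument that is left out here. The sketch hands out pointers into the text without copying, so it assumes GC cannot relocate buffer text while the parser is running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of buffer text as two contiguous parts,
   one before the gap and one after it.  */
typedef struct {
  const char *before_gap;  /* text before the gap */
  size_t before_len;       /* its length in bytes */
  const char *after_gap;   /* text after the gap */
  size_t after_len;        /* its length in bytes */
} gap_text;

/* Return a pointer to the text at BYTE_INDEX (counted as if there
   were no gap) and store in *BYTES_READ how many contiguous bytes
   are available from there.  At or past the end of the text,
   *BYTES_READ is set to 0.  */
const char *
gap_text_read (void *payload, uint32_t byte_index, uint32_t *bytes_read)
{
  const gap_text *text = payload;
  assert (text != NULL && bytes_read != NULL);

  if (byte_index < text->before_len)
    {
      /* Still in the part before the gap: stop at the gap.  */
      *bytes_read = (uint32_t) (text->before_len - byte_index);
      return text->before_gap + byte_index;
    }

  size_t offset = byte_index - text->before_len;
  if (offset < text->after_len)
    {
      /* In the part after the gap.  */
      *bytes_read = (uint32_t) (text->after_len - offset);
      return text->after_gap + offset;
    }

  *bytes_read = 0;  /* end of text */
  return "";
}
```

The parser would simply call the reader again with a larger offset once it has consumed the chunk it was given, so the gap never has to be closed and no string has to be consed.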
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 18:31 ` Eli Zaretskii @ 2020-04-04 0:04 ` Stephen Leake 2020-04-04 7:13 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stephen Leake @ 2020-04-04 0:04 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Stephen Leake <stephen_leake@stephe-leake.org> >> >> >> You mention "consing of Lisp objects" above, which says to me that the >> >> text is stored in a more complex structure. >> > >> > I meant the consing that is necessary to make a buffer-substring that >> > will be passed to the parser. >> >> Since we are calling the parser from C (if it is linked into Emacs, or >> in a module), I still don't understand. Does C code have to cons to >> create a string? > > Of course. How else do you get a UTF-8 encoded string to pass to the > parser as a copy of buffer text? malloc and memcpy. I guess that's what you mean by "cons"; I was assuming you meant the actual elisp function. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 0:04 ` Stephen Leake @ 2020-04-04 7:13 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 7:13 UTC (permalink / raw) To: Stephen Leake; +Cc: emacs-devel > From: Stephen Leake <stephen_leake@stephe-leake.org> > Date: Fri, 03 Apr 2020 16:04:04 -0800 > > >> Since we are calling the parser from C (if it is linked into Emacs, or > >> in a module), I still don't understand. Does C code have to cons to > >> create a string? > > > > Of course. How else do you get a UTF-8 encoded string to pass to the > > parser as a copy of buffer text? > > malloc and memcpy. How do you know how much memory to allocate? And memcpy doesn't cut it, because you forgot the encoding step. You could, of course, take the low-level encoding code from coding.c and make your own high-level functions that don't work with Lisp objects. But (a) why bother doing that? and (b) I think you will quickly find out that this is a non-trivial job, since coding.c "knows", to the lowest level, that it's dealing with Lisp objects (buffers or strings), so you'd pretty much need to rewrite everything. It's no accident that the Cygwin port uses the Lisp string machinery even when it needs to convert strings from UTF-16 (see from_unicode), even though it basically needs to convert C strings. > I guess that's what you mean by "cons"; I was assuming you meant the > actual elisp function. No, I meant "consing" as in "make a Lisp string", then encode it (which makes another Lisp string). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 19:33 ` Eli Zaretskii 2020-04-01 23:38 ` Stephen Leake @ 2020-04-02 4:21 ` Tuấn-Anh Nguyễn 2020-04-02 5:19 ` Jorge Javier Araya Navarro ` (3 more replies) 1 sibling, 4 replies; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-02 4:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii <eliz@gnu.org> wrote: > > > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com> > > Date: Thu, 2 Apr 2020 00:55:45 +0700 > > Cc: emacs-devel@gnu.org > > > > > Did you consider using the API where an application can provide a > > > function to return text at a given offset? Such a function could be > > > relatively easily implemented for Emacs. > > > > > > > I don't understand what you mean. Below I'll explain how it works > > currently. [...] If dynamic modules have direct access to the > > buffer text, none of the above is an issue. > > > > Such direct access can be enabled by something like this: > > > > char* (*access_buffer_text) (emacs_env *env, > > emacs_value buffer, > > ptrdiff_t byte_offset, > > ptrdiff_t *size_inout); > > > > Of course, such an API would require extensive documentation on how it > > must be used, to ensure safety and correctness. > > I think you are moving too fast, and keep the current implementation > in sight too much. > I'm actually moving too slow here. I have thought about this part quite a bit, but I'm currently focusing on other things, partially because this is not painful bottleneck. > What I suggest is to step back and see how such direct access, if it > were available, could be used with tree-sitter. Let's forget about > modules for a moment and consider tree-sitter linked with Emacs and > capable of calling any C function in core. How would you use that? > > Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one > question to answer is what to do with byte sequences that are not > valid UTF-8. 
Any suggestions or ideas? How does tree-sitter handle > invalid byte sequences in general? > I haven't checked yet. It will probably bail out, which is usually the desired behavior. The tree-sitter's author is likely open to making this behavior configurable here, though. Alternatively, the direct access function can offer different behaviors: as-is, bail-out, skip-over, or null-out (tree-sitter will skip over null bytes, IIRC). > Also, direct access to buffer text generally means we must make sure > GC never runs as long as pointers to buffer text are lying around. > Can any Lisp run between calls to the reader function that the > tree-sitter parser calls to access the buffer text? If so, we need to > take care of that issue. > With direct access, no Lisp code will be run between these calls. > Next, I'm still asking whether parsing the whole buffer when it is > first created is necessary. Can we pass to the parser just a small > chunk (say, 500 bytes) of the buffer around the window-full to be > displayed next? If this presents problems, what are those problems? > In principle (not in tree-sitter ATM), and in very specific cases, yes. IMO that's the wrong focus on a premature optimization anyway. As others noted, even in the pathological case of xdisp.c, the performance is acceptable. Also keep in mind that syntax highlighting is just one application. Other use cases usually want a full parse tree. If we really want to tackle this issue, there are other approaches to consider, e.g. background parsing, or parsing up until a time limit, and resume parsing when Emacs is idle. Tree-sitter's API supports the latter. But again, both thought exercises and my usage so far point to this being a non-issue. > IOW, the issue with exposing access to buffer text to modules is IMO > secondary. My suggestion is first to figure out how to do this stuff > efficiently from within Emacs itself, as if the module interface were > not part of the equation. 
We can add that aspect back later. > My opinion is that it's better to experiment with this kind of stuff out-of-core. It can move forward faster that way, allowing more lessons to be learned. Real lessons, involving real-world use cases, not thought exercises. In a somewhat similar vein, writing emacs-tree-sitter highlighted real issues with dynamic modules, which I'm going to write up sometime. > And yes, doing this by consing strings is not a good idea, it will > slow things down and cause a lot of GC. It is best avoided. Thus my > questions above. > > > > Btw, what do you do with the tree returned by the tree-sitter parser? > > > store it in some buffer-local variable? If so, how much memory does > > > such a tree take, and when, if ever, is that memory released? > > > > > > > It's stored in a buffer-local variable. I haven't measured the memory > > they take. Memory is released when the tree object is garbage-collected > > (it's a `user-ptr'). > > So if I have many hundreds of buffers, I could have such a tree in > each one of them indefinitely? Perhaps that's one more design issue > to consider, given that the parsing is so fast. Similar to what we do > with image and face caches -- we flush them from time to time, to keep > the memory footprint in check. So a buffer that was not current more > than some time interval ago could have its tree GCed. > That can work. Alternatively, tree-sitter can add support for "folding" subtrees, as Stefan suggested. -- Tuấn-Anh Nguyễn Software Engineer ^ permalink raw reply [flat|nested] 142+ messages in thread
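[Editorial sketch] The idea above of flushing parse trees for buffers that have not been current for a while, much like the image and face caches, could be sketched as follows. Everything here is an assumption made for illustration: `buffer_tree_slot`, `flush_stale_trees`, and the `release` callback are hypothetical names, and a real implementation would hang the timestamp off the buffer object rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-buffer record of a cached parse tree.  */
typedef struct {
  void *tree;           /* opaque parse tree, NULL once flushed */
  time_t last_current;  /* when the buffer was last current */
} buffer_tree_slot;

/* Release every cached tree whose buffer has not been current for
   more than MAX_IDLE_SECONDS.  RELEASE, if non-NULL, frees one tree.
   Return the number of trees flushed.  */
size_t
flush_stale_trees (buffer_tree_slot *slots, size_t count, time_t now,
                   double max_idle_seconds, void (*release) (void *tree))
{
  assert (slots != NULL || count == 0);
  size_t flushed = 0;

  for (size_t i = 0; i < count; i++)
    if (slots[i].tree != NULL
        && difftime (now, slots[i].last_current) > max_idle_seconds)
      {
        if (release != NULL)
          release (slots[i].tree);
        slots[i].tree = NULL;
        flushed++;
      }

  return flushed;
}
```

Because a full reparse was measured in this thread at a fraction of a second, dropping a stale tree and rebuilding it on the next visit trades a short delay for a bounded memory footprint.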
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 4:21 ` Tuấn-Anh Nguyễn @ 2020-04-02 5:19 ` Jorge Javier Araya Navarro 2020-04-02 9:29 ` Stephen Leake 2020-04-02 10:37 ` Andrea Corallo ` (2 subsequent siblings) 3 siblings, 1 reply; 142+ messages in thread From: Jorge Javier Araya Navarro @ 2020-04-02 5:19 UTC (permalink / raw) To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel > Also keep in mind that syntax highlighting is just one application. Other use cases usually want a full parse tree. Like indentation, or so I think 🤔; indentation may be one of those use cases. On Wed, Apr 1, 2020 at 22:22, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote: > On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii <eliz@gnu.org> wrote: > > > > > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com> > > > Date: Thu, 2 Apr 2020 00:55:45 +0700 > > > Cc: emacs-devel@gnu.org > > > > > > > Did you consider using the API where an application can provide a > > > > function to return text at a given offset? Such a function could be > > > > relatively easily implemented for Emacs. > > > > > > > > > > I don't understand what you mean. Below I'll explain how it works > > > currently. [...] If dynamic modules have direct access to the > > > buffer text, none of the above is an issue. > > > > > > Such direct access can be enabled by something like this: > > > > > > char* (*access_buffer_text) (emacs_env *env, > > > emacs_value buffer, > > > ptrdiff_t byte_offset, > > > ptrdiff_t *size_inout); > > > > > > Of course, such an API would require extensive documentation on how it > > > must be used, to ensure safety and correctness. > > > > I think you are moving too fast, and keep the current implementation > > in sight too much. > > > > I'm actually moving too slow here. 
I have thought about this part quite > a bit, but I'm currently focusing on other things, partially because > this is not painful bottleneck. > > > What I suggest is to step back and see how such direct access, if it > > were available, could be used with tree-sitter. Let's forget about > > modules for a moment and consider tree-sitter linked with Emacs and > > capable of calling any C function in core. How would you use that? > > > > Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one > > question to answer is what to do with byte sequences that are not > > valid UTF-8. Any suggestions or ideas? How does tree-sitter handle > > invalid byte sequences in general? > > > > I haven't checked yet. It will probably bail out, which is usually the > desired behavior. The tree-sitter's author is likely open to making this > behavior configurable here, though. Alternatively, the direct access > function can offer different behaviors: as-is, bail-out, skip-over, or > null-out (tree-sitter will skip over null bytes, IIRC). > > > Also, direct access to buffer text generally means we must make sure > > GC never runs as long as pointers to buffer text are lying around. > > Can any Lisp run between calls to the reader function that the > > tree-sitter parser calls to access the buffer text? If so, we need to > > take care of that issue. > > > > With direct access, no Lisp code will be run between these calls. > > > Next, I'm still asking whether parsing the whole buffer when it is > > first created is necessary. Can we pass to the parser just a small > > chunk (say, 500 bytes) of the buffer around the window-full to be > > displayed next? If this presents problems, what are those problems? > > > > In principle (not in tree-sitter ATM), and in very specific cases, yes. > IMO that's the wrong focus on a premature optimization anyway. As others > noted, even in the pathological case of xdisp.c, the performance is > acceptable. 
Also keep in mind that syntax highlighting is just one > application. Other use cases usually want a full parse tree. > > If we really want to tackle this issue, there are other approaches to > consider, e.g. background parsing, or parsing up until a time limit, and > resume parsing when Emacs is idle. Tree-sitter's API supports the > latter. > > But again, both thought exercises and my usage so far point to this > being a non-issue. > > > IOW, the issue with exposing access to buffer text to modules is IMO > > secondary. My suggestion is first to figure out how to do this stuff > > efficiently from within Emacs itself, as if the module interface were > > not part of the equation. We can add that aspect back later. > > > > My opinion is that it's better to experiment with this kind of stuff > out-of-core. It can move forward faster that way, allowing more lessons > to be learned. Real lessons, involving real-world use cases, not thought > exercises. > > In a somewhat similar vein, writing emacs-tree-sitter highlighted real > issues with dynamic modules, which I'm going to write up sometime. > > > And yes, doing this by consing strings is not a good idea, it will > > slow things down and cause a lot of GC. It is best avoided. Thus my > > questions above. > > > > > > Btw, what do you do with the tree returned by the tree-sitter parser? > > > > store it in some buffer-local variable? If so, how much memory does > > > > such a tree take, and when, if ever, is that memory released? > > > > > > > > > > It's stored in a buffer-local variable. I haven't measured the memory > > > they take. Memory is released when the tree object is garbage-collected > > > (it's a `user-ptr'). > > > > So if I have many hundreds of buffers, I could have such a tree in > > each one of them indefinitely? Perhaps that's one more design issue > > to consider, given that the parsing is so fast. 
Similar to what we do > > with image and face caches -- we flush them from time to time, to keep > > the memory footprint in check. So a buffer that was not current more > > than some time interval ago could have its tree GCed. > > > > That can work. Alternatively, tree-sitter can add support for "folding" > subtrees, as Stefan suggested. > > -- > Tuấn-Anh Nguyễn > Software Engineer > ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 5:19 ` Jorge Javier Araya Navarro @ 2020-04-02 9:29 ` Stephen Leake 0 siblings, 0 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-02 9:29 UTC (permalink / raw) To: emacs-devel Jorge Javier Araya Navarro <jorge@esavara.cr> writes: >> Also keep in mind that syntax highlighting is just one > application. Other use cases usually want a full parse tree. > > like indentation, or so I think 🤔, but indentation may be one of those use > cases. To correctly compute indentation for Ada code, you need to parse the full file initially. After that, indent-region to indent edited code fits nicely with incremental parse. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 4:21 ` Tuấn-Anh Nguyễn 2020-04-02 5:19 ` Jorge Javier Araya Navarro @ 2020-04-02 10:37 ` Andrea Corallo 2020-04-02 11:14 ` Tuấn-Anh Nguyễn 2020-04-02 13:02 ` Stefan Monnier 2020-04-02 15:02 ` Eli Zaretskii 3 siblings, 1 reply; 142+ messages in thread From: Andrea Corallo @ 2020-04-02 10:37 UTC (permalink / raw) To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel Tuấn-Anh Nguyễn <ubolonton@gmail.com> writes: > In principle (not in tree-sitter ATM), and in very specific cases, yes. > IMO that's the wrong focus on a premature optimization anyway. As others > noted, even in the pathological case of xdisp.c, the performance is > acceptable. Please do not assume xdisp.c is the worst case scenario, I can testify it is not :) Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 10:37 ` Andrea Corallo @ 2020-04-02 11:14 ` Tuấn-Anh Nguyễn 0 siblings, 0 replies; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-02 11:14 UTC (permalink / raw) To: Andrea Corallo; +Cc: Eli Zaretskii, emacs-devel On Thu, Apr 2, 2020 at 5:37 PM Andrea Corallo <akrl@sdf.org> wrote: > > Tuấn-Anh Nguyễn <ubolonton@gmail.com> writes: > > > In principle (not in tree-sitter ATM), and in very specific cases, yes. > > IMO that's the wrong focus on a premature optimization anyway. As others > > noted, even in the pathological case of xdisp.c, the performance is > > acceptable. > > Please do not assume xdisp.c is the worst case scenario, I can testify > it is not :) > Fair enough. -- Tuấn-Anh Nguyễn Software Engineer ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 4:21 ` Tuấn-Anh Nguyễn 2020-04-02 5:19 ` Jorge Javier Araya Navarro 2020-04-02 10:37 ` Andrea Corallo @ 2020-04-02 13:02 ` Stefan Monnier 2020-04-02 15:06 ` Eli Zaretskii 2020-04-02 15:02 ` Eli Zaretskii 3 siblings, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-04-02 13:02 UTC (permalink / raw) To: Tuấn-Anh Nguyễn; +Cc: Eli Zaretskii, emacs-devel > If we really want to tackle this issue, there are other approaches to > consider, e.g. background parsing, or parsing up until a time limit, and > resume parsing when Emacs is idle. Tree-sitter's API supports the > latter. Emacs is in dire need of exploiting multiple cores. It would be very natural to run tree-sitter's initial parse asynchronously in a separate thread. This requires passing tree-sitter a *copy* of the buffer's text. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 13:02 ` Stefan Monnier @ 2020-04-02 15:06 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 15:06 UTC (permalink / raw) To: Stefan Monnier; +Cc: ubolonton, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org > Date: Thu, 02 Apr 2020 09:02:41 -0400 > > > If we really want to tackle this issue, there are other approaches to > > consider, e.g. background parsing, or parsing up until a time limit, and > > resume parsing when Emacs is idle. Tree-sitter's API supports the > > latter. > > Emacs is in dire need of exploiting multiple cores. True. > It would be very natural to run tree-sitter's initial parse > asynchronously in a separate thread. This requires passing > tree-sitter a *copy* of the buffer's text. This also raises a lot of issues and problems of its own, of which copying the buffer is the least. We don't yet have any example of such asynchronous processing, so this feature will have to be the first that does it, and will then have to resolve the issues in addition to doing its main job. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 4:21 ` Tuấn-Anh Nguyễn ` (2 preceding siblings ...) 2020-04-02 13:02 ` Stefan Monnier @ 2020-04-02 15:02 ` Eli Zaretskii 2020-04-03 14:34 ` Tuấn-Anh Nguyễn 3 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 15:02 UTC (permalink / raw) To: Tuấn-Anh Nguyễn; +Cc: emacs-devel > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com> > Date: Thu, 2 Apr 2020 11:21:49 +0700 > Cc: emacs-devel@gnu.org > > > Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one > > question to answer is what to do with byte sequences that are not > > valid UTF-8. Any suggestions or ideas? How does tree-sitter handle > > invalid byte sequences in general? > > > > I haven't checked yet. It will probably bail out, which is usually the > desired behavior. "Bail out" meaning that this breaks the parse? I'd be surprised if that was what happens in these cases. But if it does, we will need to replace such sequences by the likes of U+FFFD in the reader function we provide. > With direct access, no Lisp code will be run between these calls. Then this issue is taken care of. > > Next, I'm still asking whether parsing the whole buffer when it is > > first created is necessary. Can we pass to the parser just a small > > chunk (say, 500 bytes) of the buffer around the window-full to be > > displayed next? If this presents problems, what are those problems? > > > > In principle (not in tree-sitter ATM), and in very specific cases, yes. > IMO that's the wrong focus on a premature optimization anyway. I tried to explain elsewhere why I don't think this is premature. > As others noted, even in the pathological case of xdisp.c, the > performance is acceptable. xdisp.c is not a pathological case for me, I edit it very frequently. More importantly, this scales poorly. > Also keep in mind that syntax highlighting is just one > application. Other use cases usually want a full parse tree. 
Other applications have different restrictions and requirements, so trying to satisfy all of them at once might not be the best way. > If we really want to tackle this issue, there are other approaches to > consider, e.g. background parsing, or parsing up until a time limit, and > resume parsing when Emacs is idle. Tree-sitter's API supports the > latter. JIT-lock already supports background fontification (see jit-lock-stealth-time), so using such parsers from jit-lock gives that to you at almost no cost. > > IOW, the issue with exposing access to buffer text to modules is IMO > > secondary. My suggestion is first to figure out how to do this stuff > > efficiently from within Emacs itself, as if the module interface were > > not part of the equation. We can add that aspect back later. > > > > My opinion is that it's better to experiment with this kind of stuff > out-of-core. It can move forward faster that way, allowing more lessons > to be learned. Real lessons, involving real-world use cases, not thought > exercises. I'm talking about trying different design ideas. It is best to do that without being limited by what modules can and cannot do. Building a hacked version of Emacs to test those ideas doesn't necessarily contradict the desire to collect real-life experience. IOW, I suggest to test alternative design ideas that are not based on copying portions of the buffer via Lisp strings. If those ideas are workable (and I think they are), they will support a more scalable implementation that exerts less memory pressure on Emacs and on the host system. HTH ^ permalink raw reply [flat|nested] 142+ messages in thread
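[Editorial sketch] Replacing byte sequences that are not valid UTF-8 with U+FFFD in the reader function, as suggested above, could be done along these lines. This is a sketch and not Emacs code: `utf8_valid_length` and `utf8_sanitize` are hypothetical helpers, and the validation follows standard UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF), so Emacs's internal encodings of raw bytes and of characters beyond Unicode would be treated as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the valid UTF-8 sequence starting at S, with
   N bytes available, or 0 if the bytes there are not valid UTF-8.  */
size_t
utf8_valid_length (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;

  unsigned char c = s[0];
  size_t len;
  unsigned long cp, min;

  if (c < 0x80)
    return 1;
  else if (c >= 0xC2 && c <= 0xDF)
    { len = 2; cp = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; cp = c & 0x0F; min = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4)
    { len = 4; cp = c & 0x07; min = 0x10000; }
  else
    return 0;  /* stray continuation byte or invalid lead byte */

  if (n < len)
    return 0;  /* truncated sequence */

  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and out-of-range values.  */
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return len;
}

/* Copy N bytes from SRC to DST, replacing each invalid byte with
   U+FFFD (EF BF BD).  DST must have room for 3 * N bytes.  Return
   the number of bytes written.  */
size_t
utf8_sanitize (const unsigned char *src, size_t n, unsigned char *dst)
{
  static const unsigned char replacement[3] = { 0xEF, 0xBF, 0xBD };
  size_t in = 0, out = 0;
  assert (src != NULL && dst != NULL);

  while (in < n)
    {
      size_t len = utf8_valid_length (src + in, n - in);
      if (len > 0)
        {
          memcpy (dst + out, src + in, len);
          in += len;
          out += len;
        }
      else
        {
          memcpy (dst + out, replacement, sizeof replacement);
          in += 1;
          out += sizeof replacement;
        }
    }
  return out;
}
```

One caveat with this approach: substitution changes byte offsets, so positions reported by the parser would have to be mapped back to buffer positions whenever a chunk contained a replaced byte.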
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 15:02 ` Eli Zaretskii @ 2020-04-03 14:34 ` Tuấn-Anh Nguyễn 0 siblings, 0 replies; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-03 14:34 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On Thu, Apr 2, 2020 at 10:02 PM Eli Zaretskii <eliz@gnu.org> wrote: > > > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com> > > Date: Thu, 2 Apr 2020 11:21:49 +0700 > > Cc: emacs-devel@gnu.org > > > > > Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one > > > question to answer is what to do with byte sequences that are not > > > valid UTF-8. Any suggestions or ideas? How does tree-sitter handle > > > invalid byte sequences in general? > > > > > > > I haven't checked yet. It will probably bail out, which is usually the > > desired behavior. > > "Bail out" meaning that this breaks the parse? I'd be surprised if > that was what happens in these cases. But if it does, we will need to > replace such sequences by the likes of U+FFFD in the reader function > we provide. > Agreed. I'll try checking its behavior on this. > > > IOW, the issue with exposing access to buffer text to modules is IMO > > > secondary. My suggestion is first to figure out how to do this stuff > > > efficiently from within Emacs itself, as if the module interface were > > > not part of the equation. We can add that aspect back later. > > > > > > > My opinion is that it's better to experiment with this kind of stuff > > out-of-core. It can move forward faster that way, allowing more lessons > > to be learned. Real lessons, involving real-world use cases, not thought > > exercises. > > I'm talking about trying different design ideas. It is best to do > that without being limited by what modules can and cannot do. > Building a hacked version of Emacs to test those ideas doesn't > necessarily contradict the desire to collect real-life experience. 
> > IOW, I suggest to test alternative design ideas that are not based on > copying portions of the buffer via Lisp strings. If those ideas are > workable (and I think they are), they will support a more scalable > implementation that exerts less memory pressure on Emacs and on the > host system. > > HTH > Yeah, I agree that going through Lisp strings for this is sub-optimal. When I have time to come back to this part, I'll hack up my local Emacs to allow dynamic modules to access buffer texts directly, to test out the idea. -- Tuấn-Anh Nguyễn Software Engineer P.S. Sorry Gmail messed up my first reply. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) @ 2020-03-29 18:46 Stefan Monnier 2020-03-29 19:05 ` Andrea Corallo 0 siblings, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-29 18:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel >> tree-sitter, like LSP, is something Emacs should embrace. > https://lists.gnu.org/archive/html/emacs-devel/2020-01/msg00059.html Ah, thanks Eli: I guess I skipped over that while catching up. > Would someone like to try to figure out how we could use the > incremental parsing technology in Emacs for making our > programming-language support more accurate and efficient? One package > that implements this technology is tree-sitter: > > https://tree-sitter.github.io/tree-sitter/ Yes, adding support for this would be great. > AFAIU, these capabilities could be used as an alternative to > regexp- and syntax-pps-based font-lock, better code folding, > completion, refactoring, and other similar features; in general, any > feature which would benefit from having a parse tree for the source > code in a buffer. Some of those features could be provided by LSP as well, but IIUC the way LSP is designed and usually used makes it somewhat inadequate for synchronous use, when you want an immediate answer. tree-sitter is designed exactly for that: it can parse "immediately", in the same sense as `syntax-ppss`, so LSP seems inapplicable (in the near future at least) for things like font-lock and navigation, and indentation, whereas tree-sitter should work great for that. [ W.r.t disucssions around LSP's use of JSON: AFAICT, parsing and emitting json can be done as efficiently as any other format, AFAICT, so I don't see the use of JSON as a problem in the protocol. 
] > To be able to use such libraries, we need to figure out how to > integrate them into the core, what kind of interfaces would be needed > for that, and what kind of infrastructure we would need for basing > Lisp features on those libraries. The existing third party packages should be good starting points to come up with a design. But I think an important issue is to figure out how to make tree-sitter usable for the end users: AFAICT the main issue is how to let end users download and install new grammars. IIUC grammars are written in Javascript (or some subset thereof?) and then somehow compiled to C code. Having them as C code implies either that end users need to have a C compiler or that we distribute pre-compiled binaries, with all the trouble this entails (with all the variations of OSes, and architectures, and ABIs, ..., plus issues related to licensing, security, ...). Maybe those grammars could be compiled to some other representation (I don't know if it is made mostly of data-tables or actual code or what)? Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) 2020-03-29 18:46 Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier @ 2020-03-29 19:05 ` Andrea Corallo 2020-03-29 19:18 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Andrea Corallo @ 2020-03-29 19:05 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > Maybe those grammars could be compiled to some other representation (I > don't know if it is made mostly of data-tables or actual code or what)? IMO ideally should be lisp and we should leverage the native compiler for that, but I understand we are not there. Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) 2020-03-29 19:05 ` Andrea Corallo @ 2020-03-29 19:18 ` Eli Zaretskii 2020-03-29 19:29 ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-29 19:18 UTC (permalink / raw) To: Andrea Corallo; +Cc: monnier, emacs-devel > From: Andrea Corallo <akrl@sdf.org> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org > Date: Sun, 29 Mar 2020 19:05:57 +0000 > > Stefan Monnier <monnier@iro.umontreal.ca> writes: > > > Maybe those grammars could be compiled to some other representation (I > > don't know if it is made mostly of data-tables or actual code or what)? > > IMO ideally should be lisp and we should leverage the native compiler > for that, but I understand we are not there. FWIW, it should indeed be possible to develop the grammars in Lisp, but that is not the first goal in bringing such a package to Emacs. Not even the second one. Because once such a package can be used with Emacs, and the results are significantly better than what we have today, you will see someone come up with a way of doing that in Lisp in no time. Making the connection happen, and coming up with a good design for that, should be the first goal. IMO, we should identify the features that can benefit from that (font-lock is just one of them, maybe not even the most important one), and design the interfaces and the infrastructure so that it could support them all (and then some). But I repeat myself. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-29 19:18 ` Eli Zaretskii @ 2020-03-29 19:29 ` Yuan Fu 2020-03-30 14:04 ` Eli Zaretskii 2020-03-30 15:06 ` Stefan Monnier 0 siblings, 2 replies; 142+ messages in thread From: Yuan Fu @ 2020-03-29 19:29 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, Stefan Monnier, Andrea Corallo A related question: is there a reliable way to be notified when buffer text changes? Because AFAICT both tree-sitter and LSP need to know incremental changes. Both LSP packages (lsp-mode and eglot) add hooks to after-change-functions. But their hooks are not guaranteed to run because of inhibit-modification-hooks. Undo seems to always know the exact change, but it doesn’t seem to have a hook available. Yuan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-29 19:29 ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu @ 2020-03-30 14:04 ` Eli Zaretskii 2020-03-30 15:06 ` Stefan Monnier 1 sibling, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-30 14:04 UTC (permalink / raw) To: Yuan Fu; +Cc: emacs-devel, monnier, akrl > From: Yuan Fu <casouri@gmail.com> > Date: Sun, 29 Mar 2020 15:29:41 -0400 > Cc: Andrea Corallo <akrl@sdf.org>, > Stefan Monnier <monnier@iro.umontreal.ca>, > emacs-devel@gnu.org > > A related question: is there a reliable way to be notified when buffer text changes? Because AFAICT both tree-sitter and LSP needs to know incremental changes. Why not simply pass to tree-sitter the chunk that jit-lock is about to fontify? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-29 19:29 ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu 2020-03-30 14:04 ` Eli Zaretskii @ 2020-03-30 15:06 ` Stefan Monnier 2020-03-30 17:14 ` Yuan Fu 1 sibling, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-30 15:06 UTC (permalink / raw) To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo > A related question: is there a reliable way to be notified when buffer text > changes? AFAICT both tree-sitter and LSP need to know about incremental > changes. Both LSP packages (lsp-mode and eglot) add hooks to > after-change-functions, but those hooks are not guaranteed to run because of > inhibit-modification-hooks. If they needed to be informed of the change but `inhibit-modification-hooks` prevented it, it's a bug. Please report it. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 15:06 ` Stefan Monnier @ 2020-03-30 17:14 ` Yuan Fu 2020-03-30 17:54 ` Stefan Monnier 2020-03-31 2:24 ` Eli Zaretskii 0 siblings, 2 replies; 142+ messages in thread From: Yuan Fu @ 2020-03-30 17:14 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo > On Mar 30, 2020, at 11:06 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > If they needed to be informed of the change but > `inhibit-modification-hooks` prevented it, it's a bug. > Please report it. > Do you mean it’s a bug in eglot/lsp-mode, or a bug in inhibit-modification-hooks (or in the code that sets it to t)? > Why not simply pass to tree-sitter the chunk that jit-lock is about to > fontify? Incremental parsing seems to be the preferred way to use tree-sitter: maintaining a syntax tree on the fly and later querying it for information. Yuan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 17:14 ` Yuan Fu @ 2020-03-30 17:54 ` Stefan Monnier 2020-03-30 18:43 ` Štěpán Němec 2020-03-31 2:24 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-30 17:54 UTC (permalink / raw) To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Andrea Corallo >> If they needed to be informed of the change but >> `inhibit-modification-hooks` prevented it, it's a bug. >> Please report it. > Do you mean it’s a bug in eglot/lsp-mode or it’s a bug in > inhibit-modification-hooks (or the code who set it to t)? The fact that they're not informed is the bug. So it's presumably not the fault of eglot/lsp-mode. Whose fault it is will depend on the details of the particular situation where it occurs. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 17:54 ` Stefan Monnier @ 2020-03-30 18:43 ` Štěpán Němec 2020-03-30 18:46 ` Stefan Monnier 0 siblings, 1 reply; 142+ messages in thread From: Štěpán Němec @ 2020-03-30 18:43 UTC (permalink / raw) To: Stefan Monnier; +Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel On Mon, 30 Mar 2020 13:54:53 -0400 Stefan Monnier wrote: >>> If they needed to be informed of the change but >>> `inhibit-modification-hooks` prevented it, it's a bug. >>> Please report it. >> Do you mean it’s a bug in eglot/lsp-mode or it’s a bug in >> inhibit-modification-hooks (or the code who set it to t)? > > The fact that they're not informed is the bug. > So it's presumably not the fault of eglot/lsp-mode. > Whose fault it is will depend on the details of the particular situation > where it occurs. FWIW, I have described one such situation (unrelated to lsp) recently here: https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403 (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so the buffer changes caused by populating dired buffers are not noticeable in `after-change-functions'.) I was wondering if I should report it as a bug, despite the workaround not being particularly painful in this case (there's `dired-after-readin-hook'). -- Štěpán ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 18:43 ` Štěpán Němec @ 2020-03-30 18:46 ` Stefan Monnier 2020-03-30 19:02 ` Yuan Fu 2020-03-30 19:27 ` Štěpán Němec 0 siblings, 2 replies; 142+ messages in thread From: Stefan Monnier @ 2020-03-30 18:46 UTC (permalink / raw) To: Štěpán Němec Cc: Yuan Fu, Andrea Corallo, Eli Zaretskii, emacs-devel > https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403 > (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so > the buffer changes caused by populating dired buffers are not noticeable > in `after-change-functions'.) > I was wondering if I should report it as a bug, despite the workaround > not being particularly painful in this case (there's `dired-after-readin-hook'). I think it deserves a bug report, yes. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 18:46 ` Stefan Monnier @ 2020-03-30 19:02 ` Yuan Fu 2020-03-30 19:10 ` Eli Zaretskii 2020-03-30 19:42 ` Stefan Monnier 2020-03-30 19:27 ` Štěpán Němec 1 sibling, 2 replies; 142+ messages in thread From: Yuan Fu @ 2020-03-30 19:02 UTC (permalink / raw) To: Stefan Monnier Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec, emacs-devel > On Mar 30, 2020, at 2:46 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > >> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403 >> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so >> the buffer changes caused by populating dired buffers are not noticeable >> in `after-change-functions'.) >> I was wondering if I should report it as a bug, despite the workaround >> not being particularly painful in this case (there's `dired-after-readin-hook'). > > I think it deserves a bug report, yes. > > > Stefan > Is it really a bug of dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such feature (disabling after-change-functions), we should expect people using it. Maybe there should be a reliable way to be informed of buffer changes (that cannot be silenced). Yuan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 19:02 ` Yuan Fu @ 2020-03-30 19:10 ` Eli Zaretskii 2020-03-30 19:21 ` Yuan Fu 2020-04-01 0:57 ` Stephen Leake 2020-03-30 19:42 ` Stefan Monnier 1 sibling, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-30 19:10 UTC (permalink / raw) To: Yuan Fu; +Cc: akrl, stepnem, monnier, emacs-devel > From: Yuan Fu <casouri@gmail.com> > Date: Mon, 30 Mar 2020 15:02:58 -0400 > Cc: Štěpán Němec <stepnem@gmail.com>, > Eli Zaretskii <eliz@gnu.org>, > emacs-devel <emacs-devel@gnu.org>, > Andrea Corallo <akrl@sdf.org> > > >> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so > >> the buffer changes caused by populating dired buffers are not noticeable > >> in `after-change-functions'.) > >> I was wondering if I should report it as a bug, despite the workaround > >> not being particularly painful in this case (there's `dired-after-readin-hook'). > > > > I think it deserves a bug report, yes. > > > > > > Stefan > > > > Is it really a bug of dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such feature (disabling after-change-functions), we should expect people using it. Maybe there should be a reliable way to be informed of buffer changes (that cannot be silenced). I agree with Stefan: it's a bug. All dired-readin needs to do is call the modification hooks after it's done reading in the directory. It's just an optimization that it inhibits the hooks while it runs: read the comments there and you will see why it is done. IMO, inhibit-modification-hooks is for when some code makes a temporary change, or a change that no one is supposed to care about, like changing faces. Any other case is a bug. ^ permalink raw reply [flat|nested] 142+ messages in thread
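The pattern described in this message (inhibit the hooks while a bulk operation runs, then report the result once it is done) can be sketched as follows. This is an illustration under stated assumptions, not the actual fix for `dired-readin`: `my/bulk-insert` is a hypothetical helper, and it assumes the operation only inserts new text at point.

```elisp
;; Hypothetical helper; not taken from dired.el.
(defun my/bulk-insert (strings)
  "Insert STRINGS at point quietly, then report them as one change.
Change hooks are inhibited while the individual insertions happen,
but `before-change-functions' and `after-change-functions' are each
run once for the region as a whole, so listeners still learn about
the new text."
  (let ((beg (point)))
    ;; Announce the upcoming insertion at BEG (an empty region).
    (run-hook-with-args 'before-change-functions beg beg)
    ;; Do the work without per-insertion hook calls.
    (let ((inhibit-modification-hooks t))
      (dolist (s strings)
        (insert s)))
    ;; Report one combined insertion: BEG..point replaced 0 old chars.
    (run-hook-with-args 'after-change-functions beg (point) 0)))
```

The design choice here is that the optimization stays (listeners are called once instead of once per insertion) while the change itself is no longer hidden from `after-change-functions`.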
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 19:10 ` Eli Zaretskii @ 2020-03-30 19:21 ` Yuan Fu 2020-03-31 3:56 ` Štěpán Němec 2020-04-01 0:57 ` Stephen Leake 1 sibling, 1 reply; 142+ messages in thread From: Yuan Fu @ 2020-03-30 19:21 UTC (permalink / raw) To: Eli Zaretskii Cc: akrl, Štěpán Němec, Stefan Monnier, emacs-devel > On Mar 30, 2020, at 3:10 PM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Mon, 30 Mar 2020 15:02:58 -0400 >> Cc: Štěpán Němec <stepnem@gmail.com>, >> Eli Zaretskii <eliz@gnu.org>, >> emacs-devel <emacs-devel@gnu.org>, >> Andrea Corallo <akrl@sdf.org> >> >>>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so >>>> the buffer changes caused by populating dired buffers are not noticeable >>>> in `after-change-functions'.) >>>> I was wondering if I should report it as a bug, despite the workaround >>>> not being particularly painful in this case (there's `dired-after-readin-hook'). >>> >>> I think it deserves a bug report, yes. >>> >>> >>> Stefan >>> >> >> Is it really a bug of dired-mode? Dired-mode probably has a good reason to bind `inhibit-modification-hooks` to t. And if we provide such feature (disabling after-change-functions), we should expect people using it. Maybe there should be a reliable way to be informed of buffer changes (that cannot be silenced). > > I agree with Stefan: it's a bug. All dired-readin needs to do is call > the modification hooks after it's done reading in the directory. It's > just an optimization that it inhibits the hooks while it runs: read > the comments there and you will see why it is done. > > IMO, inhibit-modification-hooks is for when some code makes a > temporary change, or a change that no one is supposed to care about, > like changing faces. Any other case is a bug. I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'. 
Yuan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 19:21 ` Yuan Fu @ 2020-03-31 3:56 ` Štěpán Němec 2020-03-31 13:16 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Štěpán Němec @ 2020-03-31 3:56 UTC (permalink / raw) To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, Stefan Monnier, akrl [-- Attachment #1: Type: text/plain, Size: 561 bytes --] On Mon, 30 Mar 2020 15:21:10 -0400 Yuan Fu wrote: >> IMO, inhibit-modification-hooks is for when some code makes a >> temporary change, or a change that no one is supposed to care about, >> like changing faces. Any other case is a bug. > > I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'. I think the explanation in (info "(elisp) Change Hooks") is quite good, but the doc string had better clarify the usage as well. How about the attached patch? -- Štěpán [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Clarify-inhibit-modification-hooks-intended-usage-in.patch --] [-- Type: text/x-patch, Size: 1412 bytes --] From df7e9e1eb9e9ead46c9c8596d7f844e8b7f4d10b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com> Date: Tue, 31 Mar 2020 05:38:50 +0200 Subject: [PATCH] Clarify inhibit-modification-hooks intended usage in its doc string Cf. bug#40332 and the discussion at https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html --- src/insdel.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/insdel.c b/src/insdel.c index 21acf0e61d..a9fb25a27d 100644 --- a/src/insdel.c +++ b/src/insdel.c @@ -2397,7 +2397,13 @@ syms_of_insdel (void) as well as hooks attached to text properties and overlays. Setting this variable non-nil also inhibits file locks and checks whether files are locked by another Emacs session, as well as -handling of the active region per `select-active-regions'. 
*/); +handling of the active region per `select-active-regions'. + +This variable should only be used for modifications that do not result +in lasting changes to buffer text contents (for example face changes or +temporary modifications). If you only need to delay change hooks during +a series of changes (typically for performance reasons), you can use +`combine-change-calls' or `combine-after-change-calls' instead. */); inhibit_modification_hooks = 0; DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks"); -- 2.26.0 ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 3:56 ` Štěpán Němec @ 2020-03-31 13:16 ` Eli Zaretskii 2020-03-31 13:36 ` Štěpán Němec 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 13:16 UTC (permalink / raw) To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel > From: Štěpán Němec <stepnem@gmail.com> > Date: Tue, 31 Mar 2020 05:56:55 +0200 > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>, > Stefan Monnier <monnier@iro.umontreal.ca>, akrl@sdf.org > > > I see. Then I suggest mentioning it (when you should use the variable) in the documentation of `inhibit-modification-hooks'. > > I think the explanation in (info "(elisp) Change Hooks") is quite good, > but the doc string had better clarify the usage as well. > > How about the attached patch? Thanks, I think this is too wordy for a doc string. I think it should be enough to mention the two variables ("See also ...") and maybe add a link to the ELisp manual section you mention. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 13:16 ` Eli Zaretskii @ 2020-03-31 13:36 ` Štěpán Němec 2020-03-31 14:34 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Štěpán Němec @ 2020-03-31 13:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel On Tue, 31 Mar 2020 16:16:20 +0300 Eli Zaretskii wrote: >> I think the explanation in (info "(elisp) Change Hooks") is quite good, >> but the doc string had better clarify the usage as well. >> >> How about the attached patch? > > Thanks, I think this is too wordy for a doc string. I think it should > be enough to mention the two variables ("See also ...") and maybe add > a link to the ELisp manual section you mention. In that case, could we add the "should" part (or something similar) to the manual (in addition to the doc string reference you describe)? It is true that careful reading of the manual and the relevant doc strings as they are now could suffice to make an informed decision on when `inhibit-modification-hooks' is (in)appropriate, but I think having some kind of explicit heads-up or dissuasion regarding the likely misuse would be better. -- Štěpán ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 13:36 ` Štěpán Němec @ 2020-03-31 14:34 ` Eli Zaretskii 2020-03-31 15:37 ` Štěpán Němec 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 14:34 UTC (permalink / raw) To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel > From: Štěpán Němec <stepnem@gmail.com> > Cc: casouri@gmail.com, emacs-devel@gnu.org, monnier@iro.umontreal.ca, > akrl@sdf.org > Date: Tue, 31 Mar 2020 15:36:21 +0200 > > > Thanks, I think this is too wordy for a doc string. I think it should > > be enough to mention the two variables ("See also ...") and maybe add > > a link to the ELisp manual section you mention. > > In that case, could we add the "should" part (or something similar) to > the manual (in addition to the doc string reference you describe)? Most probably yes, but could you show the change you had in mind for the manual? > It is true that careful reading of the manual and the relevant doc > strings as they are now could suffice to make an informed decision > on when `inhibit-modification-hooks' is (in)appropriate, but I think > having some kind of explicit heads-up or dissuasion regarding the > likely misuse would be better. I agree, and the manual is the place to have such discussions and recommendations. Thanks. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 14:34 ` Eli Zaretskii @ 2020-03-31 15:37 ` Štěpán Němec 2020-03-31 15:58 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Štěpán Němec @ 2020-03-31 15:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl [-- Attachment #1: Type: text/plain, Size: 342 bytes --] On Tue, 31 Mar 2020 17:34:59 +0300 Eli Zaretskii wrote: >> In that case, could we add the "should" part (or something similar) to >> the manual (in addition to the doc string reference you describe)? > > Most probably yes, but could you show the change you had in mind for > the manual? Another attempt attached. Štěpán [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --] [-- Type: text/x-patch, Size: 2038 bytes --] From ccf0390392b08bcc1aa9aff24bb62dd3bb4bbfbd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com> Date: Tue, 31 Mar 2020 05:38:50 +0200 Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended usage Cf. bug#40332 and the discussion at https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html --- doc/lispref/text.texi | 7 +++++++ src/insdel.c | 8 +++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 3bb055a68d..daba03fadf 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi @@ -5776,4 +5776,11 @@ Change Hooks may cause recursive calls to the modification hooks, so be sure to prepare for that (for example, by binding some variable which tells your hook to do nothing). + +@strong{Warning:} You should only bind this variable for modifications +that do not result in lasting changes to buffer text contents (for +example face changes or temporary modifications). 
If you need to +delay change hooks during a series of changes (typically for +performance reasons), use @code{combine-change-calls} or +@code{combine-after-change-calls} instead. @end defvar diff --git a/src/insdel.c b/src/insdel.c index 21acf0e61d..236346fada 100644 --- a/src/insdel.c +++ b/src/insdel.c @@ -2397,7 +2397,13 @@ syms_of_insdel (void) as well as hooks attached to text properties and overlays. Setting this variable non-nil also inhibits file locks and checks whether files are locked by another Emacs session, as well as -handling of the active region per `select-active-regions'. */); +handling of the active region per `select-active-regions'. + +To delay change hooks during a series of changes, use +`combine-change-calls' or `combine-after-change-calls' instead of +modifying this variable. + +See also the info node `(elisp) Change Hooks'. */); inhibit_modification_hooks = 0; DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks"); -- 2.26.0 ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:37 ` Štěpán Němec @ 2020-03-31 15:58 ` Eli Zaretskii 2020-03-31 16:18 ` Štěpán Němec 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 15:58 UTC (permalink / raw) To: Štěpán Němec; +Cc: casouri, emacs-devel, monnier, akrl > From: Štěpán Němec <stepnem@gmail.com> > Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca, > emacs-devel@gnu.org > Date: Tue, 31 Mar 2020 17:37:22 +0200 > > Another attempt attached. Thanks. I have a couple of minor nits: > +@strong{Warning:} You should only bind this variable for modifications I'd prefer to remove the warning, and say "We recommend that..." rather than "You should only...". > +To delay change hooks during a series of changes, use > +`combine-change-calls' or `combine-after-change-calls' instead of > +modifying this variable. ^^^^^^^^^ "binding" Other than that, LGTM. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:58 ` Eli Zaretskii @ 2020-03-31 16:18 ` Štěpán Němec 2020-03-31 17:38 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Štěpán Němec @ 2020-03-31 16:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel [-- Attachment #1: Type: text/plain, Size: 492 bytes --] On Tue, 31 Mar 2020 18:58:58 +0300 Eli Zaretskii wrote: >> +@strong{Warning:} You should only bind this variable for modifications > > I'd prefer to remove the warning, and say "We recommend that..." > rather than "You should only...". > >> +To delay change hooks during a series of changes, use >> +`combine-change-calls' or `combine-after-change-calls' instead of >> +modifying this variable. > ^^^^^^^^^ > "binding" Updated version attached, thank you. Štěpán [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Clarify-documentation-on-inhibit-modification-hooks-.patch --] [-- Type: text/x-patch, Size: 2029 bytes --] From 8e2a5a8c8381c85d138f34d37931c52c289da2ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=A0t=C4=9Bp=C3=A1n=20N=C4=9Bmec?= <stepnem@gmail.com> Date: Tue, 31 Mar 2020 05:38:50 +0200 Subject: [PATCH] Clarify documentation on inhibit-modification-hooks intended usage Cf. bug#40332 and the discussion at https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00921.html --- doc/lispref/text.texi | 7 +++++++ src/insdel.c | 8 +++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 3bb055a68d..0d32c571b7 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi @@ -5776,4 +5776,11 @@ Change Hooks may cause recursive calls to the modification hooks, so be sure to prepare for that (for example, by binding some variable which tells your hook to do nothing). 
+ +We recommend that you only bind this variable for modifications that +do not result in lasting changes to buffer text contents (for example +face changes or temporary modifications). If you need to delay change +hooks during a series of changes (typically for performance reasons), +use @code{combine-change-calls} or @code{combine-after-change-calls} +instead. @end defvar diff --git a/src/insdel.c b/src/insdel.c index 21acf0e61d..dfa1cc311c 100644 --- a/src/insdel.c +++ b/src/insdel.c @@ -2397,7 +2397,13 @@ syms_of_insdel (void) as well as hooks attached to text properties and overlays. Setting this variable non-nil also inhibits file locks and checks whether files are locked by another Emacs session, as well as -handling of the active region per `select-active-regions'. */); +handling of the active region per `select-active-regions'. + +To delay change hooks during a series of changes, use +`combine-change-calls' or `combine-after-change-calls' instead of +binding this variable. + +See also the info node `(elisp) Change Hooks'. */); inhibit_modification_hooks = 0; DEFSYM (Qinhibit_modification_hooks, "inhibit-modification-hooks"); -- 2.26.0 ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 16:18 ` Štěpán Němec @ 2020-03-31 17:38 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 17:38 UTC (permalink / raw) To: Štěpán Němec; +Cc: casouri, akrl, monnier, emacs-devel > From: Štěpán Němec <stepnem@gmail.com> > Cc: casouri@gmail.com, emacs-devel@gnu.org, monnier@iro.umontreal.ca, > akrl@sdf.org > Date: Tue, 31 Mar 2020 18:18:57 +0200 > > Updated version attached, thank you. Perfect, thanks. ^ permalink raw reply [flat|nested] 142+ messages in thread
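The documentation patch accepted above points readers to `combine-change-calls` as the alternative to binding `inhibit-modification-hooks`. A minimal sketch of that usage follows. `my/upcase-lines` is a hypothetical function written only for this illustration, and it assumes an Emacs recent enough to provide `combine-change-calls` (27.1 or later); the macro requires that the body modify nothing outside BEG..END, which holds here because upcasing changes text in place.

```elisp
;; Hypothetical example function; `combine-change-calls' is the real macro.
(defun my/upcase-lines (beg end)
  "Upcase each line between BEG and END as one combined change.
The per-line edits are wrapped in `combine-change-calls', so change
hooks are run for the region as a whole rather than once per line."
  (combine-change-calls beg end
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        ;; Upcasing never changes the text length, so END stays valid.
        (upcase-region (point) (min end (line-end-position)))
        (forward-line 1)))))
```

Unlike a bare `inhibit-modification-hooks` binding, functions on `after-change-functions` are still told that the region changed, only in a single call.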
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 19:10 ` Eli Zaretskii 2020-03-30 19:21 ` Yuan Fu @ 2020-04-01 0:57 ` Stephen Leake 1 sibling, 0 replies; 142+ messages in thread From: Stephen Leake @ 2020-04-01 0:57 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> Is it really a bug of dired-mode? Dired-mode probably has a good >> reason to bind `inhibit-modification-hooks` to t. And if we provide >> such feature (disabling after-change-functions), we should expect >> people using it. Maybe there should be a reliable way to be informed >> of buffer changes (that cannot be silenced). > > I agree with Stefan: it's a bug. All dired-readin needs to do is call > the modification hooks after it's done reading in the directory. It's > just an optimization that it inhibits the hooks while it runs: read > the comments there and you will see why it is done. > > IMO, inhibit-modification-hooks is for when some code makes a > temporary change, or a change that no one is supposed to care about, > like changing faces. Any other case is a bug. ada-mode occasionally binds wisi-inhibit-parse for a similar reason; it is writing Ada source, so it is about to make several changes, during which the buffer will be syntactically incorrect, but it will be correct when done. The wisi after-change-functions still record changed regions, but the parser is not called until all the changes are done. Perhaps tree-sitter and eglot could use a similar approach. -- -- Stephe ^ permalink raw reply [flat|nested] 142+ messages in thread
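The approach described for wisi (keep recording changed regions, but postpone the parse until a batch of edits is finished) might look roughly like the sketch below. None of these names come from wisi, ada-mode, tree-sitter or eglot: `my/inhibit-parse`, `my/dirty-region`, `my/parse-count`, `my/reparse`, `my/note-change` and `my/flush-changes` are all hypothetical, and `my/reparse` is a stand-in that only counts calls. Merging regions with min/max is deliberately crude; a real implementation would need to track positions more carefully across deletions.

```elisp
;; All names below are hypothetical and exist only for this sketch.
(defvar my/inhibit-parse nil
  "When non-nil, changes are recorded but no parse is triggered.")

(defvar-local my/dirty-region nil
  "Cons (BEG . END) covering all changes not yet parsed, or nil.")

(defvar-local my/parse-count 0
  "Number of times `my/reparse' has run in this buffer.")

(defun my/reparse (_beg _end)
  "Stand-in for a real incremental parse of the given region."
  (setq my/parse-count (1+ my/parse-count)))

(defun my/flush-changes ()
  "Parse the accumulated dirty region, if any, and clear it."
  (when my/dirty-region
    (my/reparse (car my/dirty-region) (cdr my/dirty-region))
    (setq my/dirty-region nil)))

(defun my/note-change (beg end _old-len)
  "Record a change; meant for `after-change-functions'.
The change is always recorded.  The parser is only invoked when
`my/inhibit-parse' is nil."
  (setq my/dirty-region
        (if my/dirty-region
            (cons (min beg (car my/dirty-region))
                  (max end (cdr my/dirty-region)))
          (cons beg end)))
  (unless my/inhibit-parse
    (my/flush-changes)))
```

Code that is about to make several edits leaving the buffer temporarily malformed would bind `my/inhibit-parse` to t around them and call `my/flush-changes` afterwards. The important difference from `inhibit-modification-hooks` is that no change goes unrecorded.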
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 19:02 ` Yuan Fu 2020-03-30 19:10 ` Eli Zaretskii @ 2020-03-30 19:42 ` Stefan Monnier 1 sibling, 0 replies; 142+ messages in thread From: Stefan Monnier @ 2020-03-30 19:42 UTC (permalink / raw) To: Yuan Fu Cc: Eli Zaretskii, Andrea Corallo, Štěpán Němec, emacs-devel >>> https://gitlab.com/stepnem/stripes-el/-/issues/1#note_309176403 >>> (In short, `dired-readin' binds `inhibit-modification-hooks' to t, so >>> the buffer changes caused by populating dired buffers are not noticeable >>> in `after-change-functions'.) >>> I was wondering if I should report it as a bug, despite the workaround >>> not being particularly painful in this case (there's `dired-after-readin-hook'). >> I think it deserves a bug report, yes. > Is it really a bug of dired-mode? Just file the bug report and send me the bug number so I can include it in the commit of the fix I have here ready to be installed. Stefan "if you have to wonder if it's a bug, then file it as a bug" ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 18:46 ` Stefan Monnier 2020-03-30 19:02 ` Yuan Fu @ 2020-03-30 19:27 ` Štěpán Němec 1 sibling, 0 replies; 142+ messages in thread From: Štěpán Němec @ 2020-03-30 19:27 UTC (permalink / raw) To: Stefan Monnier; +Cc: Yuan Fu, emacs-devel, Eli Zaretskii, Andrea Corallo On Mon, 30 Mar 2020 14:46:48 -0400 Stefan Monnier wrote: > I think it deserves a bug report, yes. Done (bug#40332), thanks. -- Štěpán ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-30 17:14 ` Yuan Fu 2020-03-30 17:54 ` Stefan Monnier @ 2020-03-31 2:24 ` Eli Zaretskii 2020-03-31 3:10 ` Stefan Monnier 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 2:24 UTC (permalink / raw) To: Yuan Fu; +Cc: akrl, monnier, emacs-devel > From: Yuan Fu <casouri@gmail.com> > Date: Mon, 30 Mar 2020 13:14:02 -0400 > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, > Andrea Corallo <akrl@sdf.org> > > Why not simply pass to tree-sitter the chunk that jit-lock is about to > fontify? > > Incremental parsing seems to be the preferred way to use tree-sitter: maintaining a syntax tree on the fly > and later querying it for information. I don't see how this contradicts my proposal of passing just the chunk that we need to fontify. The function that actually passes the portion of the buffer to tree-sitter can always extend the chunk in both directions to make it easier, for example by making sure it's a complete code block or something. IOW, our goal is not to build the syntax tree, it's to give tree-sitter enough information to allow us to fontify the part that's about to be displayed. We need to have tree-sitter play by Emacs rules, not teach Emacs to play by tree-sitter rules. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 2:24 ` Eli Zaretskii @ 2020-03-31 3:10 ` Stefan Monnier 2020-03-31 13:14 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-31 3:10 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Yuan Fu, akrl, emacs-devel > IOW, our goal is not to build the syntax tree, it's to give > tree-sitter enough information to allow us to fontify the part that's > about to be displayed. We need to have tree-sitter play by Emacs > rules, not teach Emacs to play by tree-sitter rules. IIUC, tree-sitter starts by parsing the whole buffer anyway, and then keeps the parse tree up-to-date in response to buffer changes. Its algorithm is tuned so that the time needed to update the tree is more or less proportional to the size of the change. So jit-lock/font-lock doesn't need to pass any part of the buffer to tree-sitter: tree-sitter already has the buffer's content and we can assume it's already parsed. What emacs-tree-sitter's proposed tree-sitter-highlight does is provide a function which takes a START..END, then finds which part of the existing parse tree covers that region and "reads the tree" to fontify the corresponding text. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 3:10 ` Stefan Monnier @ 2020-03-31 13:14 ` Eli Zaretskii 2020-03-31 14:31 ` Dmitry Gutov ` (2 more replies) 0 siblings, 3 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 13:14 UTC (permalink / raw) To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Yuan Fu <casouri@gmail.com>, emacs-devel@gnu.org, akrl@sdf.org > Date: Mon, 30 Mar 2020 23:10:57 -0400 > > > IOW, our goal is not to build the syntax tree, it's to give > > tree-sitter enough information to allow us to fontify the part that's > > about to be displayed. We need to have tree-sitter play by Emacs > > rules, not teach Emacs to play by tree-sitter rules. > > IIUC, tree-sitter starts by parsing the whole buffer anyway, and then > keeps the parse tree up-to-date in response to buffer changes. Why does it need the entire buffer up front? That sounds like a potential performance killer. Fontifying a small part of a buffer doesn't need its entire text. In any case, I hope that passing the buffer to tree-sitter doesn't involve marshalling the entire buffer text via a function call as a huge string, or some such. We should instead request that tree-sitter expose an API through which we could give it direct access to buffer text as 2 parts, before and after the gap, like we do with regex code. Otherwise this will be a bottleneck in the long run, not unlike the problem we have with LSP. > Its algorithm is tuned so that the time needed to update the tree is > more or less proportional to the size of the change. > > So jit-lock/font-lock doesn't need to pass any part of the buffer to > tree-sitter: tree-sitter already has the buffer's content and we can > assume it's already parsed. 
What emacs-tree-sitter's proposed > tree-sitter-highlight does is provide a function which takes > a START..END, then finds which part of the existing parse tree cover > that region and "reads the tree" to fontify the corresponding text. I still don't see why it would need the entire buffer for this class of applications. Did anyone try the alternatives, in particular on very large buffers? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 13:14 ` Eli Zaretskii @ 2020-03-31 14:31 ` Dmitry Gutov 2020-03-31 15:36 ` Eli Zaretskii 2020-03-31 15:11 ` Stefan Monnier 2020-03-31 16:13 ` Alan Third 2 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-03-31 14:31 UTC (permalink / raw) To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl On 31.03.2020 16:14, Eli Zaretskii wrote: > Why does it need the entire buffer up front? that sounds like a > potential performance killer. Fontifying a small part of a buffer > doesn't need its entire text. Because the end product of parsing the buffer is an AST, and the author decided to minimize the odds of problems that come with incomplete/broken ASTs. The previous (first) discussion of TreeSitter has a URL to a presentation video. You can give it a watch. Regarding performance, their solution is to make the first parse as fast as possible, and updates to an existing AST faster still. As for the difficulty of sending the whole buffer contents... maybe VS Code and Atom somehow make it easier? If so, someone should investigate why it has to be slower in Emacs. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 14:31 ` Dmitry Gutov @ 2020-03-31 15:36 ` Eli Zaretskii 2020-03-31 15:45 ` Dmitry Gutov 2020-03-31 17:16 ` Stefan Monnier 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 15:36 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl > Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Tue, 31 Mar 2020 17:31:43 +0300 > > On 31.03.2020 16:14, Eli Zaretskii wrote: > > Why does it need the entire buffer up front? that sounds like a > > potential performance killer. Fontifying a small part of a buffer > > doesn't need its entire text. > > Because the end product of parsing the buffer is an AST, and the author > decided to minimize the odds of problems that come with > incomplete/broken ASTs. But it definitely can work with parts of the buffer, and we don't need it to have a complete AST for this particular job. > The previous (first) discussion of TreeSitter has an URL to a > presentation video. You can give it a watch. Thanks, I've watched it back in January, when I wrote my message calling for its integration. > Regarding performance, their solution is to make first parsing as fast > as possible, and updates to an existing AST faster still. I'm talking about _our_ performance, not theirs. > As for the difficulty of sending the whole buffer contents... maybe VS > Code and Atom somehow make it easier? If so, someone should investigate > why it has to be slower in Emacs. It should be obvious that sending a buffer as a single string is less efficient than letting tree-sitter access buffer text directly. We just need an appropriate API for that (maybe there is one already, I didn't take a look at their sources since January). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:36 ` Eli Zaretskii @ 2020-03-31 15:45 ` Dmitry Gutov 2020-03-31 17:16 ` Stefan Monnier 1 sibling, 0 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-03-31 15:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl On 31.03.2020 18:36, Eli Zaretskii wrote: > But it definitely can work with parts of the buffer, and we don't need > it to have a complete AST for this particular job. Syntax highlighting can and often does depend on buffer contents after the region. It's one thing to mis-highlight a part of the buffer because the contents are incomplete (the user hasn't typed the full expression). It's another thing to mis-highlight it because the chunk requested by jit-lock ended on a particular ambiguous position. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:36 ` Eli Zaretskii 2020-03-31 15:45 ` Dmitry Gutov @ 2020-03-31 17:16 ` Stefan Monnier 2020-03-31 17:48 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-31 17:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, Dmitry Gutov > It should be obvious that sending a buffer as a single string is less > efficient than letting tree-sitter access buffer text directly. We > just need an appropriate API for that (maybe there is one already, I > didn't take a look at their sources since January). My benchmarks say that `buffer-string` takes about 1/3 the time of `parse-partial-sexp`, so letting tree-sitter access our buffer text directly is unlikely to give more than a 30% speed up. It doesn't mean it wouldn't be a desirable optimization, but it does mean that it likely won't make a large difference as to whether it's "fast enough". Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:16 ` Stefan Monnier @ 2020-03-31 17:48 ` Eli Zaretskii 2020-03-31 19:35 ` Stefan Monnier 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 17:48 UTC (permalink / raw) To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Dmitry Gutov <dgutov@yandex.ru>, casouri@gmail.com, akrl@sdf.org, > emacs-devel@gnu.org > Date: Tue, 31 Mar 2020 13:16:33 -0400 > > > It should be obvious that sending a buffer as a single string is less > > efficient than letting tree-sitter access buffer text directly. We > > just need an appropriate API for that (maybe there is one already, I > > didn't take a look at their sources since January). > > My benchmark say that `buffer-string` takes about 1/3 the time of > `parse-partial-sexp`, so letting tree-sitter access our buffer text > directly is unlikely to give more than a 30% speed up. Sure, but we never call parse-partial-sexp on the entire buffer, do we? > It doesn't mean it wouldn't be a desirable optimization, but it does > mean that it likely won't make a large difference as to whether it's > "fast enough". I disagree. Communicating with a C library by making a string out of buffer text is extremely inelegant and inefficient. We shouldn't do that except when the strings are very short. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:48 ` Eli Zaretskii @ 2020-03-31 19:35 ` Stefan Monnier 2020-04-01 2:23 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-31 19:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel, dgutov >> > It should be obvious that sending a buffer as a single string is less >> > efficient than letting tree-sitter access buffer text directly. We >> > just need an appropriate API for that (maybe there is one already, I >> > didn't take a look at their sources since January). >> My benchmark say that `buffer-string` takes about 1/3 the time of >> `parse-partial-sexp`, so letting tree-sitter access our buffer text >> directly is unlikely to give more than a 30% speed up. > Sure, but we never call parse-partial-sexp on the entire buffer, do we? Not sure how that's relevant. I only used `parse-partial-sexp` as a lower bound on the time tree-sitter is likely to take to do its own parsing. >> It doesn't mean it wouldn't be a desirable optimization, but it does >> mean that it likely won't make a large difference as to whether it's >> "fast enough". > I disagree. Your disagreement doesn't seem to be with what I said: I didn't argue about the elegance or efficiency, only about the fact that the performance impact is likely to be small enough that it's not going to affect the viability of the approach. > Communicating with a C library by making a string out of buffer text > is extremely inelegant and inefficient. We shouldn't do that except > when the strings are very short. FWIW, elegant/efficient or not, that's the standard way to do it, AFAICT. E.g. that's what we do in `secure-hash`, that's what we do when parsing JSON, ... You basically always need to en/decode the content (even if it is into utf-8, we still need to handle the potential raw-bytes), so a copy is hard to avoid. 
Note that for regexp-matching the problem is slightly different because we don't know beforehand which part of the buffer will be consulted, so doing a "copy and then regmatch" would be too inefficient (we'd always need to copy everything til point-max). Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 19:35 ` Stefan Monnier @ 2020-04-01 2:23 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 2:23 UTC (permalink / raw) To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel, dgutov > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: dgutov@yandex.ru, casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org > Date: Tue, 31 Mar 2020 15:35:41 -0400 > > You basically always need to en/decode the content (even if it is into > utf-8, we still need to handle the potential raw-bytes), so a copy is > hard to avoid. It isn't hard in this case, AFAICT. Tree-sitter has an API where we can provide a function that will deliver text at a given offset. We should use that to access buffer text directly. We can avoid encoding the buffer text by converting raw bytes into something like U+FFFD, or something else that tree-sitter will ignore. ^ permalink raw reply [flat|nested] 142+ messages in thread
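The callback-style access described in the message above can be sketched in C. This is only an illustration under stated assumptions: `struct gap_buffer` and `gap_buffer_read` are hypothetical names and a much-simplified stand-in for Emacs's real buffer representation, and the code does not use tree-sitter's headers. The idea is that a parser asks for "the text at byte OFFSET" and receives a pointer to the contiguous run before or after the gap, so nothing is copied into a string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an Emacs buffer: the text lives
   in one array with a gap (unused space) at the editing position.  */
struct gap_buffer {
  const char *data;   /* start of the storage, gap included */
  size_t gap_start;   /* number of text bytes before the gap */
  size_t gap_size;    /* size of the gap in bytes */
  size_t text_size;   /* total text bytes, gap excluded */
};

/* Return a pointer to the contiguous run of text that starts at byte
   OFFSET and store its length in *LEN.  *LEN is 0 at the end of the
   text.  No bytes are copied: the caller reads the storage directly.  */
const char *
gap_buffer_read (const struct gap_buffer *buf, size_t offset, size_t *len)
{
  assert (offset <= buf->text_size);
  if (offset < buf->gap_start)
    {
      /* Part 1: from OFFSET up to the gap.  */
      *len = buf->gap_start - offset;
      return buf->data + offset;
    }
  /* Part 2: after the gap, so skip over it in the storage.  */
  *len = buf->text_size - offset;
  return buf->data + buf->gap_size + offset;
}
```

A consumer would call `gap_buffer_read` repeatedly, advancing OFFSET by the returned length, until the length is 0; at most two calls cover the whole text. Handling of raw bytes (for instance presenting them as U+FFFD, as suggested above) is deliberately left out of this sketch.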
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 13:14 ` Eli Zaretskii 2020-03-31 14:31 ` Dmitry Gutov @ 2020-03-31 15:11 ` Stefan Monnier 2020-03-31 15:44 ` Eli Zaretskii 2020-03-31 16:13 ` Alan Third 2 siblings, 1 reply; 142+ messages in thread From: Stefan Monnier @ 2020-03-31 15:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel >> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then >> keeps the parse tree up-to-date in response to buffer changes. > Why does it need the entire buffer up front? Because as a general rule you cannot parse a region without looking at all the preceding text. That's why when we fontify START..END we need to begin by computing the `syntax-ppss` at START, which involves passing the whole text from `point-min` to START through `parse-partial-sexp`. > that sounds like a potential performance killer. Indeed. And so does this `syntax-ppss` call we have. It's OK as long as the parsing is fast enough and you don't use it in too large buffers. E.g. I expect that most programming major modes currently exhibit significant delays when you jump to the end of a multi-GB buffer because of that `syntax-ppss` call. > Fontifying a small part of a buffer doesn't need its entire text. Sadly, it does. In specific cases you may be able to speed things up, but that's only applicable to some cases. I'm sure there could be other approaches that focus on trying to parse as little of the buffer text as possible (e.g. SMIE follows this kind of idea), but it's difficult to make them work with a "normal" grammar, providing a full parse tree and giving a reliable result (and without it degenerating to parsing the whole buffer anyway in most cases). > In any case, I hope that passing the buffer to tree-sitter doesn't > involve marshalling the entire buffer text via a function call as a > huge string, or some such. These are internal implementation details that can be tweaked later on. 
I do expect that the code currently needs to call `buffer-string` or its moral equivalent. But if the resources this requires are significant enough to worry about, then that's great news: it means the parsing itself is very fast. > We should instead request that tree-sitter exposes an API through > which we could give it direct access to buffer text as 2 parts, before > and after the gap, like we do with regex code. Otherwise this will be > a bottleneck in the long run, not unlike the problem we have with LSP. I'm not sure exactly which problem with LSP you're thinking about, but I doubt `buffer-string` is a significant component of a performance problem with LSP: the time to pass that string to the server via a pipe should dwarf it. > I still don't see why it would need the entire buffer for this class > of applications. Did anyone try the alternatives, in particular on > very large buffers? What alternatives? How large is "very large" here? Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
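The point made in the message above, that a region cannot be parsed reliably without scanning the preceding text, can be illustrated with a toy C scanner. This is a sketch only: `scan_state_at` is a hypothetical helper that tracks nothing but C block comments and double-quoted strings, which is far less than `parse-partial-sexp` or a real grammar tracks, but it is enough to show that the state at a position depends on where scanning started.

```c
#include <stddef.h>

enum scan_state { IN_CODE, IN_COMMENT, IN_STRING };

/* Scan TEXT from byte FROM (inclusive) to byte TO (exclusive), starting
   in STATE, and return the lexical state reached at TO.  Only block
   comments and double-quoted strings with backslash escapes are
   recognized.  */
enum scan_state
scan_state_at (const char *text, size_t from, size_t to,
               enum scan_state state)
{
  size_t i = from;
  while (i < to)
    {
      char c = text[i];
      switch (state)
        {
        case IN_CODE:
          if (c == '/' && i + 1 < to && text[i + 1] == '*')
            {
              state = IN_COMMENT;
              i += 2;
              continue;
            }
          if (c == '"')
            state = IN_STRING;
          break;
        case IN_COMMENT:
          if (c == '*' && i + 1 < to && text[i + 1] == '/')
            {
              state = IN_CODE;
              i += 2;
              continue;
            }
          break;
        case IN_STRING:
          if (c == '\\' && i + 1 < to)
            {
              /* Skip the escaped character.  */
              i += 2;
              continue;
            }
          if (c == '"')
            state = IN_CODE;
          break;
        }
      i++;
    }
  return state;
}
```

With the text `/* say "hi" */ x = 1;`, scanning from the start up to the `i` of `hi` ends inside a comment, while a scan that starts at the opening quote and assumes it is in code concludes that it is inside a string. A fontifier that only sees START..END has to guess that initial state, which is the ambiguity the thread is discussing.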
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:11 ` Stefan Monnier @ 2020-03-31 15:44 ` Eli Zaretskii 2020-03-31 17:10 ` Stefan Monnier 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 15:44 UTC (permalink / raw) To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: casouri@gmail.com, emacs-devel@gnu.org, akrl@sdf.org > Date: Tue, 31 Mar 2020 11:11:22 -0400 > > > I still don't see why it would need the entire buffer for this class > > of applications. Did anyone try the alternatives, in particular on > > very large buffers? > > What alternatives? Let tree-sitter see just a portion of the buffer, like the outer block of what will be displayed in the window. You are saying that this is impossible, but do tree-sitter developers also say that? > How large is "very large" here? xdisp.c comes to mind, obviously. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 15:44 ` Eli Zaretskii @ 2020-03-31 17:10 ` Stefan Monnier 2020-03-31 17:19 ` Jorge Javier Araya Navarro 2020-03-31 17:46 ` Eli Zaretskii 0 siblings, 2 replies; 142+ messages in thread From: Stefan Monnier @ 2020-03-31 17:10 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, emacs-devel >> > I still don't see why it would need the entire buffer for this class >> > of applications. Did anyone try the alternatives, in particular on >> > very large buffers? >> What alternatives? > Let tree-sitter see just a portion of the buffer, like the outer block > of what will be displayed in the window. You are saying that this is > impossible, I think it would be definitely possible if you present "from point-min to POS". But "from START to END" is much more difficult, yes. > but do tree-sitter developers also say that? You'd have to ask them. But what I say is based on the knowledge I gleaned by reading the academic literature that the tree-sitter authors cite (I did that while working on an article on SMIE ;-) In any case, your question is really about the design of tree-sitter rather than the design of the interface between tree-sitter and Emacs. AFAICT tree-sitter is pretty close to the state of the art in this area, so I think it's worth trying it out to see how it performs before considering changing its design. >> How large is "very large" here? > xdisp.c comes to mind, obviously. I'd expect tree-sitter to be able to parse xdisp.c in one second or less. Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:10 ` Stefan Monnier @ 2020-03-31 17:19 ` Jorge Javier Araya Navarro 2020-03-31 17:46 ` Eli Zaretskii 1 sibling, 0 replies; 142+ messages in thread From: Jorge Javier Araya Navarro @ 2020-03-31 17:19 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, casouri, akrl >>> How large is "very large" here? >> xdisp.c comes to mind, obviously. > > I'd expect tree-sitter to be able to parse xdisp.c in one second or less. It's funny because this can be tested with a small C program; sadly, I don't have the time to write it right now. On Tue, Mar 31, 2020 at 11:10, Stefan Monnier (monnier@iro.umontreal.ca) wrote: > >> > I still don't see why it would need the entire buffer for this class > >> > of applications. Did anyone try the alternatives, in particular on > >> > very large buffers? > >> What alternatives? > > Let tree-sitter see just a portion of the buffer, like the outer block > > of what will be displayed in the window. You are saying that this is > > impossible, > > I think it would be definitely possible if you present "from point-min > to POS". But "from START to END" is much more difficult, yes. > > > but do tree-sitter developers also say that? > > You'd have to ask them. But what I say is based on the knowledge > I gleaned by reading the academic literature that the tree-sitter > authors cite (I did that while working on an article on SMIE ;-) > > In any case, your question is really about the design of tree-sitter > rather than the design of the interface between tree-sitter and Emacs. > > AFAICT tree-sitter is pretty close to the state of the art in this area, > so I think it's worth trying it out to see how it performs before > considering changing its design. > > >> How large is "very large" here? > > xdisp.c comes to mind, obviously. 
> > I'd expect tree-sitter to be able to parse xdisp.c in one second or less. > > > Stefan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:10 ` Stefan Monnier 2020-03-31 17:19 ` Jorge Javier Araya Navarro @ 2020-03-31 17:46 ` Eli Zaretskii 2020-03-31 18:42 ` 조성빈 2020-03-31 18:47 ` Dmitry Gutov 1 sibling, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 17:46 UTC (permalink / raw) To: Stefan Monnier; +Cc: casouri, akrl, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: casouri@gmail.com, emacs-devel@gnu.org, akrl@sdf.org > Date: Tue, 31 Mar 2020 13:10:27 -0400 > > >> How large is "very large" here? > > xdisp.c comes to mind, obviously. > > I'd expect tree-sitter to be able to parse xdisp.c in one second or less. One second of delay before the first window-full is displayed? This is like infinity. And you didn't account for the time to take buffer-string of the entire buffer (which involves allocating a large chunk of memory), then encode it in UTF-8 (which needs to allocate another chunk of memory), and pass that to tree-sitter. If that's what the current interface does. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:46 ` Eli Zaretskii @ 2020-03-31 18:42 ` 조성빈 2020-03-31 19:29 ` Eli Zaretskii 2020-03-31 18:47 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: 조성빈 @ 2020-03-31 18:42 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Stefan Monnier, casouri, akrl, Emacs-devel > On 2020-04-01, at 2:53 AM, Eli Zaretskii <eliz@gnu.org> wrote: > > >> >> From: Stefan Monnier <monnier@iro.umontreal.ca> >> Cc: casouri@gmail.com, emacs-devel@gnu.org, akrl@sdf.org >> Date: Tue, 31 Mar 2020 13:10:27 -0400 >> >>>> How large is "very large" here? >>> xdisp.c comes to mind, obviously. >> >> I'd expect tree-sitter to be able to parse xdisp.c in one second or less. > > One second of delay before the first window-full is displayed? This > is like infinity. Maybe I misunderstood, or maybe it’s just because I don’t know enough about the internals, but doesn’t Emacs just display the raw text until highlighting is finished? It wouldn’t be an experience of not seeing the text for a second; it would be more a matter of seeing the text first, with the highlights applied later. > And you didn't account for the time to take buffer-string of the > entire buffer (which involves allocating a large chunk of memory), > then encode it in UTF-8 (which needs to allocate another chunk of > memory), and pass that to tree-sitter. If that's what the current > interface does. > ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 18:42 ` 조성빈 @ 2020-03-31 19:29 ` Eli Zaretskii 0 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 19:29 UTC (permalink / raw) To: 조성빈; +Cc: casouri, Emacs-devel, monnier, akrl > From: 조성빈 <pcr910303@icloud.com> > Date: Wed, 1 Apr 2020 03:42:31 +0900 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com, > akrl@sdf.org, Emacs-devel@gnu.org > > > One second of delay before the first window-full is displayed? This > > is like infinity. > > Maybe I misunderstood, or maybe it’s just b.c. I don’t know enough internals, but doesn’t Emacs just display the raw text until highlighting is finished? I guess you are talking about jit-lock-defer-time and friends. That's off by default. The default behavior is to fontify completely the chunk that is about to be displayed (actually, we fontify slightly more than that). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 17:46 ` Eli Zaretskii 2020-03-31 18:42 ` 조성빈 @ 2020-03-31 18:47 ` Dmitry Gutov 2020-03-31 18:48 ` Noam Postavsky 2020-03-31 19:26 ` Eli Zaretskii 1 sibling, 2 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-03-31 18:47 UTC (permalink / raw) To: Eli Zaretskii, Stefan Monnier; +Cc: casouri, emacs-devel, akrl On 31.03.2020 20:46, Eli Zaretskii wrote: > One second of delay before the first window-full is displayed? This > is like infinity. This is what we have now: (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)))) => Elapsed time: 1.940401s (0.376140s in 6 GCs) ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 18:47 ` Dmitry Gutov @ 2020-03-31 18:48 ` Noam Postavsky 2020-03-31 19:02 ` Dmitry Gutov 2020-03-31 19:26 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: Noam Postavsky @ 2020-03-31 18:48 UTC (permalink / raw) To: Dmitry Gutov Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 31.03.2020 20:46, Eli Zaretskii wrote: > > One second of delay before the first window-full is displayed? This > > is like infinity. > > This is what we have now: Except that s/first window-full/last window-full/ ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 18:48 ` Noam Postavsky @ 2020-03-31 19:02 ` Dmitry Gutov 0 siblings, 0 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-03-31 19:02 UTC (permalink / raw) To: Noam Postavsky Cc: Eli Zaretskii, akrl, Yuan Fu, Stefan Monnier, Emacs developers On 31.03.2020 21:48, Noam Postavsky wrote: > On Tue, 31 Mar 2020 at 14:47, Dmitry Gutov<dgutov@yandex.ru> wrote: >> On 31.03.2020 20:46, Eli Zaretskii wrote: >>> One second of delay before the first window-full is displayed? This >>> is like infinity. >> This is what we have now: > Except that s/first window-full/last window-full/ True. And I meant to suggest that, on average, we'd get the same 1 second delay (if we assume all positions in the file are equally probable). However, I've just tried the same experiment without goto-char, and got essentially the same result as with it: 1.2 s (my previous result was with "cold" filesystem cache). In addition to that, though, I think this call returns before the window finishes displaying. So, when point is at eob, there's some extra wait, but I'm not sure how to measure it. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 18:47 ` Dmitry Gutov 2020-03-31 18:48 ` Noam Postavsky @ 2020-03-31 19:26 ` Eli Zaretskii 2020-03-31 19:50 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-03-31 19:26 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl > Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Tue, 31 Mar 2020 21:47:17 +0300 > > (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)))) > > => Elapsed time: 1.940401s (0.376140s in 6 GCs) This doesn't measure the redisplay (which happens after the above command returns). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 19:26 ` Eli Zaretskii @ 2020-03-31 19:50 ` Dmitry Gutov 2020-04-01 2:28 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-03-31 19:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl On 31.03.2020 22:26, Eli Zaretskii wrote: >> Cc:casouri@gmail.com,akrl@sdf.org,emacs-devel@gnu.org >> From: Dmitry Gutov<dgutov@yandex.ru> >> Date: Tue, 31 Mar 2020 21:47:17 +0300 >> >> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)))) >> >> => Elapsed time: 1.940401s (0.376140s in 6 GCs) > This doesn't measure the redisplay (which happens after the above > command returns). Which means that the current state of affairs is even slower. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-03-31 19:50 ` Dmitry Gutov @ 2020-04-01 2:28 ` Eli Zaretskii 2020-04-01 3:49 ` Dmitry Gutov 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 2:28 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, emacs-devel, monnier, akrl > Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org, > emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Tue, 31 Mar 2020 22:50:43 +0300 > > >> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)))) > >> > >> => Elapsed time: 1.940401s (0.376140s in 6 GCs) > > This doesn't measure the redisplay (which happens after the above > > command returns). > > Which means that the current state of affairs is even slower. No, it means that whatever delay we will have with parsing the entire buffer is _in_addition_ to whatever you measured. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 2:28 ` Eli Zaretskii @ 2020-04-01 3:49 ` Dmitry Gutov 2020-04-01 4:14 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 3:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl On 01.04.2020 05:28, Eli Zaretskii wrote: >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org, >> emacs-devel@gnu.org >> From: Dmitry Gutov <dgutov@yandex.ru> >> Date: Tue, 31 Mar 2020 22:50:43 +0300 >> >>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)))) >>>> >>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs) >>> This doesn't measure the redisplay (which happens after the above >>> command returns). >> >> Which means that the current state of affairs is even slower. > > No, it means that whatever delay we will have with parsing the entire > buffer is _in_addition_ to whatever you measured. Probably not. IIUC, most of this measured 1.2 s delay is CC Mode doing the preliminary parsing. That phase would be replaced by TreeSitter's full buffer parse, which supposedly takes a comparable amount of time. The redisplay phase will most likely be faster because by then the correct AST is available, and computing highlighting based on it is supposedly something that TreeSitter does quickly and well. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 3:49 ` Dmitry Gutov @ 2020-04-01 4:14 ` Eli Zaretskii 2020-04-01 13:47 ` Dmitry Gutov 2020-04-01 13:52 ` Alan Mackenzie 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 4:14 UTC (permalink / raw) To: emacs-devel, Dmitry Gutov; +Cc: casouri, monnier, akrl On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote: > On 01.04.2020 05:28, Eli Zaretskii wrote: > >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org, > >> emacs-devel@gnu.org > >> From: Dmitry Gutov <dgutov@yandex.ru> > >> Date: Tue, 31 Mar 2020 22:50:43 +0300 > >> > >>>> (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char > (point-max)))) > >>>> > >>>> => Elapsed time: 1.940401s (0.376140s in 6 GCs) > >>> This doesn't measure the redisplay (which happens after the above > >>> command returns). > >> > >> Which means that the current state of affairs is even slower. > > > > No, it means that whatever delay we will have with parsing the > entire > > buffer is _in_addition_ to whatever you measured. > > Probably not. IIUC, most of this 1.2 measured delay is CC Mode doing > the > preliminary parsing. There's no need to guess. Just profile this use case, and you will clearly see what takes most of this time. In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes. I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification. CC Mode certainly doesn't seem to do that. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 4:14 ` Eli Zaretskii @ 2020-04-01 13:47 ` Dmitry Gutov 2020-04-01 14:04 ` Eli Zaretskii 2020-04-01 13:52 ` Alan Mackenzie 1 sibling, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 13:47 UTC (permalink / raw) To: Eli Zaretskii, emacs-devel; +Cc: casouri, monnier, akrl On 01.04.2020 07:14, Eli Zaretskii wrote: > There's no need to guess. Just profile this use case, and you will clearly see what takes most of this time.
- c-mode                                          772  75%
 - c-common-init                                  766  74%
  - mapc                                          764  74%
   - #<compiled 0x158957d29ef1>                   509  49%
    + c-neutralize-syntax-in-CPP                  276  26%
    + c-after-change-mark-abnormal-strings        204  19%
    + c-parse-quotes-after-change                  18   1%
   - #<compiled 0x158957d29ee5>                   255  24%
    + c-before-change-check-unbalanced-strings    199  19%
    + c-depropertize-CPP                           46   4%
    c-font-lock-init                                1   0%
    c-basic-common-init                             1   0%
You can also compare CC Mode's init with JS Mode's. If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, and that stuff can be described as "preliminary parsing". And there will be more of that during redisplay itself. > In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes. I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification. CC Mode certainly doesn't seem to do that. Now you know. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 13:47 ` Dmitry Gutov @ 2020-04-01 14:04 ` Eli Zaretskii 2020-04-01 14:55 ` Eli Zaretskii 2020-04-01 15:16 ` Dmitry Gutov 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 14:04 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel > Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Wed, 1 Apr 2020 16:47:02 +0300 > > On 01.04.2020 07:14, Eli Zaretskii wrote: > > > There's no need to guess. Just profile this use case, and you will clearly see what takes most of this time. > > - c-mode 772 75% > - c-common-init 766 74% > - mapc 764 74% > - #<compiled 0x158957d29ef1> 509 49% > + c-neutralize-syntax-in-CPP 276 26% > + c-after-change-mark-abnormal-strings 204 19% > + c-parse-quotes-after-change 18 1% > - #<compiled 0x158957d29ee5> 255 24% > + c-before-change-check-unbalanced-strings 199 19% > + c-depropertize-CPP 46 4% > c-font-lock-init 1 0% > c-basic-common-init 1 0% I see a very different picture here: the above takes something like 15%. Most of the time is spent in functions called by jit-lock. > If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same > benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, > and that stuff can be described as "preliminary parsing". Except that I cannot reproduce these results, so I'm not really sure what we are looking at. What I did was start the profiler, then manually call got-char, then produce the profiler report. What did you do to collect the above profile? > And there will be more of that during redisplay itself. Which is not what your benchmark measures. > > In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes. 
I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification. CC Mode certainly doesn't seem to do that. > > Now you know. Do I? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 14:04 ` Eli Zaretskii @ 2020-04-01 14:55 ` Eli Zaretskii 2020-04-01 15:16 ` Dmitry Gutov 1 sibling, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 14:55 UTC (permalink / raw) To: dgutov; +Cc: casouri, emacs-devel, monnier, akrl > Date: Wed, 01 Apr 2020 17:04:24 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca, > emacs-devel@gnu.org > > What I did was start the profiler, then manually call got-char, then > produce the profiler report. That came out confusingly unclear. What I actually did was start the profiler, then evaluate the form that visits xdisp.c and goes to point-max, then call profiler-report. ^ permalink raw reply [flat|nested] 142+ messages in thread
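The profiling procedure just described could be sketched as a single function. This is an assumption-laden illustration: `my/profile-visit` is a hypothetical name, and the explicit `redisplay` call is added here so that fontification triggered by redisplay is sampled before the report is produced (when the steps are run by hand, redisplay happens between commands anyway).

```elisp
;; Sketch: profile visiting a file and moving to the end of its buffer.
;; `my/profile-visit' and its FILE argument are hypothetical names.
(require 'profiler)

(defun my/profile-visit (file)
  "Profile visiting FILE and jumping to its end, then show the report."
  (profiler-start 'cpu)
  (unwind-protect
      (progn
        (find-file file)
        (goto-char (point-max))
        ;; Force a redisplay so jit-lock fontification is sampled too.
        (redisplay t))
    ;; Produce the report while the profiler is still running, then stop it.
    (profiler-report)
    (profiler-stop)))

;; Example use from the top of an Emacs source tree:
;; (my/profile-visit "src/xdisp.c")
```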
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 14:04 ` Eli Zaretskii 2020-04-01 14:55 ` Eli Zaretskii @ 2020-04-01 15:16 ` Dmitry Gutov 2020-04-01 15:59 ` Eli Zaretskii 1 sibling, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 15:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel On 01.04.2020 17:04, Eli Zaretskii wrote: >> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, akrl@sdf.org >> From: Dmitry Gutov <dgutov@yandex.ru> >> Date: Wed, 1 Apr 2020 16:47:02 +0300 >> >> On 01.04.2020 07:14, Eli Zaretskii wrote: >> >>> There's no need to guess. Just profile this use case, and you will clearly see what takes most of this time. >> >> - c-mode 772 75% >> - c-common-init 766 74% >> - mapc 764 74% >> - #<compiled 0x158957d29ef1> 509 49% >> + c-neutralize-syntax-in-CPP 276 26% >> + c-after-change-mark-abnormal-strings 204 19% >> + c-parse-quotes-after-change 18 1% >> - #<compiled 0x158957d29ee5> 255 24% >> + c-before-change-check-unbalanced-strings 199 19% >> + c-depropertize-CPP 46 4% >> c-font-lock-init 1 0% >> c-basic-common-init 1 0% > > I see a very different picture here: the above takes something like > 15%. Most of the time is spent in functions called by jit-lock. What are your measurements, though? Again, what does this print out? (benchmark 1 '(progn (find-file "src/xdisp.c"))) >> If I just (push '("\\.c\\'" . js-mode) auto-mode-alist), the same >> benchmark takes ~60ms. So yes, CC Mode does a lot during initialization, >> and that stuff can be described as "preliminary parsing". > > Except that I cannot reproduce these results, so I'm not really sure > what we are looking at. > > What I did was start the profiler, then manually call got-char, then > produce the profiler report. What did you do to collect the above > profile? No 'goto-char'. As we've established, it only affects the time taken by redisplay, and I can't measure that. 
So I'm not profiling it either, otherwise I'd be comparing apples to oranges. >> And there will be more of that during redisplay itself. > > Which is not what your benchmark measures. Exactly. Like I said, I can't measure how long redisplay itself takes. >>> In general, there's no "preliminary processing" by the major mode's fontification facilities except what happens as part of jit-lock, i.e. at redisplay time or as side effect of functions that simulate display for redisplay purposes. I'd be very surprised to see a major mode which somehow preprocesses the buffer on its own in preparation for fontification. CC Mode certainly doesn't seem to do that. >> >> Now you know. > > Do I? Yes. The numbers can be different, but there is definitely some up-front computation there. One that's not present with e.g. js-mode. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:16 ` Dmitry Gutov @ 2020-04-01 15:59 ` Eli Zaretskii 2020-04-01 21:48 ` Dmitry Gutov 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 15:59 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel > Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca, > akrl@sdf.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Wed, 1 Apr 2020 18:16:04 +0300 > > > I see a very different picture here: the above takes something like > > 15%. Most of the time is spent in functions called by jit-lock. > > What are your measurements, though? My full profile is below. This is from Emacs 27.0.90 compiled with the -Og optimization and with wide-int (which slows down Emacs by about 30%). > Again, what does this print out? > > (benchmark 1 '(progn (find-file "src/xdisp.c"))) Elapsed time: 1.733853s (0.140584s in 6 GCs) > No 'goto-char'. As we've established, it only affects the time taken by > redisplay, and I can't measure that. So I'm not profiling it either, > otherwise I'd be comparing apples to oranges. See the second profile below. > Yes. The numbers can be different, but there is definitely some up-front > computation there. One that's not present with e.g. js-mode. So you are saying that we should do that up-front computation just because CC mode currently does it? That we shouldn't try to eliminate such preprocessing? I don't think so. 
Here's the profile from visiting xdisp.c and going to end of the buffer: - redisplay_internal (C function) 65 41% - jit-lock-function 65 41% - jit-lock-fontify-now 65 41% - jit-lock--run-functions 65 41% - run-hook-wrapped 65 41% - #<compiled -0x1ffffffff8adaa88> 65 41% - font-lock-fontify-region 65 41% - c-font-lock-fontify-region 65 41% - font-lock-default-fontify-region 50 31% - font-lock-fontify-keywords-region 35 22% - c-font-lock-declarations 34 21% - c-find-decl-spots 34 21% - c-bs-at-toplevel-p 32 20% - c-brace-stack-at 32 20% - c-update-brace-stack 31 19% - c-syntactic-re-search-forward 27 17% - c-beginning-of-macro 6 3% back-to-indentation 2 1% #<compiled -0x1ffffffff8ae5f98> 1 0% c-forward-sws 1 0% - c-font-lock-complex-decl-prepare 1 0% - c-parse-state 1 0% - c-parse-state-1 1 0% - c-parse-state-get-strategy 1 0% - c-get-fallback-scan-pos 1 0% - beginning-of-defun 1 0% - beginning-of-defun-raw 1 0% syntax-ppss 1 0% - font-lock-fontify-syntactically-region 15 9% syntax-ppss 15 9% - c-before-context-fl-expand-region 15 9% - mapc 15 9% - #<compiled -0x1ffffffff8a66198> 15 9% - c-context-expand-fl-region 15 9% - c-fl-decl-start 15 9% - c-literal-start 14 8% - c-semi-pp-to-literal 14 8% c-parse-ps-state-below 14 8% c-determine-limit 1 0% - command-execute 64 40% - call-interactively 64 40% - funcall-interactively 63 40% - eval-last-sexp 63 40% - elisp--eval-last-sexp 63 40% - eval 63 40% - progn 63 40% - progn 63 40% - find-file 63 40% - find-file-noselect 63 40% - find-file-noselect-1 63 40% - after-find-file 63 40% - normal-mode 61 38% - set-auto-mode 61 38% - set-auto-mode-0 61 38% - c-mode 61 38% - c-common-init 57 36% - mapc 57 36% - #<compiled -0x1ffffffff8a7d680> 37 23% - c-neutralize-syntax-in-CPP 20 12% - c-beginning-of-macro 4 2% c-backward-single-comment 2 1% back-to-indentation 1 0% c-no-comment-end-of-macro 3 1% c-after-change-mark-abnormal-strings 15 9% c-parse-quotes-after-change 1 0% - #<compiled -0x1ffffffff8a7d6b0> 20 12% - 
c-before-change-check-unbalanced-strings 15 9% - c-literal-limits 15 9% - c-full-pp-to-literal 15 9% c-parse-ps-state-below 15 9% c-depropertize-CPP 4 2% - byte-code 2 1% require 1 0% - run-mode-hooks 1 0% - hack-local-variables 1 0% - hack-dir-local-variables 1 0% dir-locals-read-from-dir 1 0% - run-hooks 2 1% - vc-refresh-state 2 1% - vc-backend 2 1% - vc-registered 2 1% - mapc 2 1% - #<compiled -0x1ffffffff8a67780> 2 1% - vc-call-backend 2 1% - apply 2 1% - vc-git-registered 2 1% - if 2 1% - progn 2 1% - load 1 0% require 1 0% - byte-code 1 0% - read-extended-command 1 0% - completing-read 1 0% completing-read-default 1 0% - ... 28 17% Automatic GC 27 17% - substitute-key-definition-key 1 0% - substitute-key-definition 1 0% - map-keymap 1 0% - #<compiled -0x1ffffffff8a80eb8> 1 0% - substitute-key-definition-key 1 0% - substitute-key-definition 1 0% - map-keymap 1 0% - #<compiled -0x1ffffffff8a80c48> 1 0% - substitute-key-definition-key 1 0% - substitute-key-definition 1 0% - map-keymap 1 0% - #<compiled -0x1ffffffff8a80658> 1 0% - substitute-key-definition-key 1 0% - substitute-key-definition 1 0% - map-keymap 1 0% #<compiled -0x1ffffffff8a7ce58> 1 0% Here's the profile from just visiting xdisp.c: - command-execute 67 82% - call-interactively 67 82% - funcall-interactively 67 82% - eval-expression 67 82% - eval 67 82% - progn 67 82% - find-file 67 82% - find-file-noselect 67 82% - find-file-noselect-1 66 81% - after-find-file 66 81% - normal-mode 62 76% - set-auto-mode 62 76% - set-auto-mode-0 62 76% - c-mode 62 76% - c-common-init 55 67% - mapc 55 67% - #<compiled -0x1ffffffff8aa7940> 36 44% - c-neutralize-syntax-in-CPP 21 25% - c-beginning-of-macro 2 2% c-backward-single-comment 1 1% c-after-change-mark-abnormal-strings 14 17% - #<compiled -0x1ffffffff8aa7970> 19 23% - c-before-change-check-unbalanced-strings 14 17% - c-literal-limits 14 17% - c-full-pp-to-literal 14 17% c-parse-ps-state-below 14 17% - c-depropertize-CPP 4 4% c-end-of-macro 1 1% - byte-code 6 
7% require 4 4% - substitute-key-definition 1 1% - map-keymap 1 1% - #<compiled -0x1ffffffff8aac0b8> 1 1% - substitute-key-definition-key 1 1% - substitute-key-definition 1 1% map-keymap 1 1% - run-hooks 4 4% - vc-refresh-state 4 4% - vc-backend 4 4% - vc-registered 4 4% - mapc 3 3% - #<compiled -0x1ffffffff8ae8e88> 3 3% - vc-call-backend 3 3% - apply 3 3% - vc-git-registered 2 2% - if 2 2% - progn 2 2% - load 1 1% - require 1 1% - defconst 1 1% byte-code 1 1% - vc-git-registered 1 1% - vc-git--out-ok 1 1% - apply 1 1% - vc-git--call 1 1% - apply 1 1% - process-file 1 1% apply 1 1% - vc-git-find-file-hook 1 1% - vc-state 1 1% - vc-state-refresh 1 1% - vc-call-backend 1 1% - apply 1 1% - vc-git-state 1 1% - apply 1 1% - vc-git--run-command-string 1 1% - apply 1 1% - vc-git--out-ok 1 1% - apply 1 1% - vc-git--call 1 1% - apply 1 1% - process-file 1 1% apply 1 1% vc-file-getprop 1 1% - find-buffer-visiting 1 1% - file-truename 1 1% - file-truename 1 1% - file-truename 1 1% - file-truename 1 1% - file-truename 1 1% file-truename 1 1% - ... 14 17% Automatic GC 14 17% ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:59 ` Eli Zaretskii @ 2020-04-01 21:48 ` Dmitry Gutov 2020-04-01 22:29 ` Stefan Monnier 2020-04-02 14:23 ` Eli Zaretskii 0 siblings, 2 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 21:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel On 01.04.2020 18:59, Eli Zaretskii wrote: >> What are your measurements, though? > > My full profile is below. This is from Emacs 27.0.90 compiled with > the -Og optimization and with wide-int (which slows down Emacs by > about 30%). Thank you. I also build with '-Og -g3' these days, but probably have a faster CPU. >> Again, what does this print out? >> >> (benchmark 1 '(progn (find-file "src/xdisp.c"))) > > Elapsed time: 1.733853s (0.140584s in 6 GCs) All right. So it takes 1.7s just to open the file, even before full syntax highlighting. >> No 'goto-char'. As we've established, it only affects the time taken by >> redisplay, and I can't measure that. So I'm not profiling it either, >> otherwise I'd be comparing apples to oranges. > > See the second profile below. Comparing both, looks like redisplay (when at eob, at least) takes approx. the same amount of time? >> Yes. The numbers can be different, but there is definitely some up-front >> computation there. One that's not present with e.g. js-mode. > > So you are saying that we should do that up-front computation just > because CC mode currently does it? That we shouldn't try to eliminate > such preprocessing? I don't think so. AFAIU CC Mode could actually eliminate it, but that would require a significant rework of its internals. I'm just pointing out that apparently you didn't even notice an even larger delay (1.7s), and were fine with it until now. I'm not saying that nobody should try to explore how to decrease the delay, and what tradeoffs come with that. 
But for now, I think, we should encourage our kind volunteers to just implement integration the way TreeSitter's authors expect it. And try, on our side, to provide the best tools for it. Then we can see how well it does or doesn't work, and what are the biggest annoyances that the users have with it. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 21:48 ` Dmitry Gutov @ 2020-04-01 22:29 ` Stefan Monnier 2020-04-02 14:23 ` Eli Zaretskii 1 sibling, 0 replies; 142+ messages in thread From: Stefan Monnier @ 2020-04-01 22:29 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, emacs-devel

> AFAIU CC Mode could actually eliminate it, but that would require
> a significant rework of its internals.

My experiments to make CC-mode use syntax-propertize-function suggest that it wouldn't require too much work, actually. For an outsider, it's difficult because it's hard to understand all the invariants/assumptions in the current design, but if Alan and I were to work together on it, it would be pretty easy.

So far Alan has been opposed and there are several good reasons for that:
- it's extra work.
- it will inevitably introduce bugs.
- while it will most likely be faster when opening the file, it will likely be slower in other cases (e.g. when modifying the buffer near point-min in one window while having point-max displayed in another).
- syntax-propertize was introduced in Emacs-24 so it would require either dropping CC-mode's support for earlier Emacsen, or adding some compatibility layer (I think this compatibility layer would be easy to write but would likely not cover all cases).

> I'm not saying that nobody should try to explore how to decrease the delay,
> and what tradeoffs come with that. But for now, I think, we should encourage
> our kind volunteers to just implement integration the way TreeSitter's
> authors expect it. And try, on our side, to provide the best tools for
> it. Then we can see how well it does or doesn't work, and what are the
> biggest annoyances that the users have with it.

+1

Stefan

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 21:48 ` Dmitry Gutov 2020-04-01 22:29 ` Stefan Monnier @ 2020-04-02 14:23 ` Eli Zaretskii 2020-04-02 16:17 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 14:23 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel > Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca, > akrl@sdf.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Thu, 2 Apr 2020 00:48:20 +0300 > > >> No 'goto-char'. As we've established, it only affects the time taken by > >> redisplay, and I can't measure that. So I'm not profiling it either, > >> otherwise I'd be comparing apples to oranges. > > > > See the second profile below. > > Comparing both, looks like redisplay (when at eob, at least) takes > approx. the same amount of time? About 55% taken by redisplay (almost all of it due to fontification), and the other 45% are the C mode "preprocessing" when the mode is turned on in a buffer. > >> Yes. The numbers can be different, but there is definitely some up-front > >> computation there. One that's not present with e.g. js-mode. > > > > So you are saying that we should do that up-front computation just > > because CC mode currently does it? That we shouldn't try to eliminate > > such preprocessing? I don't think so. > > AFAIU CC Mode could actually eliminate it, but that would require a > significant rework of its internals. Are we still talking about integrating a completely different parsing engine into CC Mode? Then redesign is a must, right? > I'm just pointing out that apparently you didn't even notice an even > larger delay (1.7s), and were fine with it until now. I didn't "didn't notice", I actually filed several bug reports and complaints about the various slow aspects of CC mode, because the slowdown in CC mode over the years annoys me quite a lot. 
Some of the problems were fixed, some weren't (due to limitations of the current design, I was told). I'm not at all complacent about this. > I'm not saying that nobody should try to explore how to decrease the > delay, and what tradeoffs come with that. But for now, I think, we > should encourage our kind volunteers to just implement integration the > way TreeSitter's authors expect it. And try, on our side, to provide the > best tools for it. Then we can see how well it does or doesn't work, and > what are the biggest annoyances that the users have with it. I cannot tell the volunteers what to do and where to invest their resources. But I can provide feedback on the design ideas, based on what I know and on my experience, and I can suggest how to design and implement this to achieve good and scalable performance. In particular, I think that it is useful to know what we have tried in the past and what were the lessons we learned from that. I hope what I say is of some help, and I hope we will soon have such engine available to Emacs. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 14:23 ` Eli Zaretskii @ 2020-04-02 16:17 ` Dmitry Gutov 2020-04-02 18:25 ` Eli Zaretskii 2020-04-03 14:40 ` Tuấn-Anh Nguyễn 0 siblings, 2 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-04-02 16:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: casouri, akrl, monnier, emacs-devel On 02.04.2020 17:23, Eli Zaretskii wrote: >> Comparing both, looks like redisplay (when at eob, at least) takes >> approx. the same amount of time? > > About 55% taken by redisplay (almost all of it due to fontification), > and the other 45% are the C mode "preprocessing" when the mode is > turned on in a buffer. So, all in all, when xdisp.c is opened at eob, it will be displayed after ~2.5 seconds, I guess. >>> So you are saying that we should do that up-front computation just >>> because CC mode currently does it? That we shouldn't try to eliminate >>> such preprocessing? I don't think so. >> >> AFAIU CC Mode could actually eliminate it, but that would require a >> significant rework of its internals. > > Are we still talking about integrating a completely different parsing > engine into CC Mode? Then redesign is a must, right? No, that's without TreeSitter. >> I'm just pointing out that apparently you didn't even notice an even >> larger delay (1.7s), and were fine with it until now. > > I didn't "didn't notice", I actually filed several bug reports and > complaints about the various slow aspects of CC mode, because the > slowdown in CC mode over the years annoys me quite a lot. Some of the > problems were fixed, some weren't (due to limitations of the current > design, I was told). I'm not at all complacent about this. Still, compare that with 0.15 sec, which is the current estimate of parsing xdisp.c. It could probably be improved still by supporting a no-copy buffer-string in modules. > I cannot tell the volunteers what to do and where to invest their > resources. 
But I can provide feedback on the design ideas, based on > what I know and on my experience, and I can suggest how to design and > implement this to achieve good and scalable performance. We shouldn't, however, create an impression that unless they follow our ideas to a T we won't help them realize their own preferred approach (e.g. by improving the module API). > In > particular, I think that it is useful to know what we have tried in > the past and what were the lessons we learned from that. I hope what > I say is of some help, and I hope we will soon have such engine > available to Emacs. I'm fairly confident that implementing deferred/on-demand parsing in emacs-tree-sitter can be done later without requiring a major redesign. It will require, however, an extra layer of complexity either way. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 16:17 ` Dmitry Gutov @ 2020-04-02 18:25 ` Eli Zaretskii 2020-04-03 14:40 ` Tuấn-Anh Nguyễn 1 sibling, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 18:25 UTC (permalink / raw) To: Dmitry Gutov; +Cc: casouri, akrl, monnier, emacs-devel > Cc: emacs-devel@gnu.org, casouri@gmail.com, monnier@iro.umontreal.ca, > akrl@sdf.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Thu, 2 Apr 2020 19:17:07 +0300 > > > I cannot tell the volunteers what to do and where to invest their > > resources. But I can provide feedback on the design ideas, based on > > what I know and on my experience, and I can suggest how to design and > > implement this to achieve good and scalable performance. > > We shouldn't, however, create an impression that unless they follow our > ideas to a T we won't help them realize their own preferred approach That's so unfair that I will in the future think twice before offering any advice. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 16:17 ` Dmitry Gutov 2020-04-02 18:25 ` Eli Zaretskii @ 2020-04-03 14:40 ` Tuấn-Anh Nguyễn 2020-04-03 16:10 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Tuấn-Anh Nguyễn @ 2020-04-03 14:40 UTC (permalink / raw) To: Dmitry Gutov Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier, Andrea Corallo On Thu, Apr 2, 2020 at 11:17 PM Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 02.04.2020 17:23, Eli Zaretskii wrote: > > > I cannot tell the volunteers what to do and where to invest their > > resources. But I can provide feedback on the design ideas, based on > > what I know and on my experience, and I can suggest how to design and > > implement this to achieve good and scalable performance. > > We shouldn't, however, create an impression that unless they follow our > ideas to a T we won't help them realize their own preferred approach > (e.g. by improving the module API). > FWIW, this was not my impression. -- Tuấn-Anh Nguyễn Software Engineer ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-03 14:40 ` Tuấn-Anh Nguyễn @ 2020-04-03 16:10 ` Dmitry Gutov 0 siblings, 0 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-04-03 16:10 UTC (permalink / raw) To: Tuấn-Anh Nguyễn Cc: Eli Zaretskii, emacs-devel, casouri, Stefan Monnier, Andrea Corallo On 03.04.2020 17:40, Tuấn-Anh Nguyễn wrote: > FWIW, this was not my impression. I'm glad to hear it. My apologies, then. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 4:14 ` Eli Zaretskii 2020-04-01 13:47 ` Dmitry Gutov @ 2020-04-01 13:52 ` Alan Mackenzie 2020-04-01 14:10 ` Eli Zaretskii 2020-04-01 15:22 ` Dmitry Gutov 1 sibling, 2 replies; 142+ messages in thread From: Alan Mackenzie @ 2020-04-01 13:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: akrl, casouri, Dmitry Gutov, monnier, emacs-devel Hello, Eli. On Wed, Apr 01, 2020 at 07:14:09 +0300, Eli Zaretskii wrote: > On April 1, 2020 6:49:45 AM GMT+03:00, Dmitry Gutov <dgutov@yandex.ru> wrote: > > On 01.04.2020 05:28, Eli Zaretskii wrote: > > >> Cc: monnier@iro.umontreal.ca, casouri@gmail.com, akrl@sdf.org, > > >> emacs-devel@gnu.org > > >> From: Dmitry Gutov <dgutov@yandex.ru> > > >> Date: Tue, 31 Mar 2020 22:50:43 +0300 > In general, there's no "preliminary processing" by the major mode's > fontification facilities except what happens as part of jit-lock, i.e. > at redisplay time or as side effect of functions that simulate display > for redisplay purposes. I'd be very surprised to see a major mode > which somehow preprocesses the buffer on its own in preparation for > fontification. CC Mode certainly doesn't seem to do that. CC Mode does do this. It marks syntax-table text properties throughout the buffer at find-file time, and keeps them valid thereafter in before/after-change-functions. This doesn't seem to affect starting up performance that badly. On my machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification of the first screenful of comments) is taking 0.18s. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 142+ messages in thread
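As a rough illustration of the mechanism mentioned above, keeping derived state valid from change hooks, here is a minimal sketch that records each buffer change from `after-change-functions` as a (START OLD-END NEW-END) triple, the kind of information an incremental parser would need. It is not CC Mode's code; all `my/...` names are hypothetical.

```elisp
;; Sketch: record buffer edits from `after-change-functions'.
;; All `my/...' names are hypothetical; this is not CC Mode's code.
(defvar-local my/pending-edits nil
  "Edits not yet processed, newest first, as (START OLD-END NEW-END).")

(defun my/record-edit (beg end old-len)
  "Record the change that replaced OLD-LEN chars at BEG with text up to END."
  ;; Before the change the affected text spanned BEG..BEG+OLD-LEN;
  ;; after it, the replacement text spans BEG..END.
  (push (list beg (+ beg old-len) end) my/pending-edits))

(defun my/track-edits ()
  "Start recording edits in the current buffer."
  (add-hook 'after-change-functions #'my/record-edit nil t))

;; Example:
;; (with-temp-buffer
;;   (my/track-edits)
;;   (insert "int x;")
;;   (goto-char 5)
;;   (delete-char 1)
;;   (reverse my/pending-edits))
;; => ((1 1 7) (5 6 5))
```

The hook is installed buffer-locally (the final `t` argument to `add-hook`), so only the buffer being tracked pays for the bookkeeping.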
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 13:52 ` Alan Mackenzie @ 2020-04-01 14:10 ` Eli Zaretskii 2020-04-01 15:27 ` Dmitry Gutov 2020-04-01 15:22 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 14:10 UTC (permalink / raw) To: Alan Mackenzie; +Cc: casouri, dgutov, emacs-devel, monnier, akrl > Date: Wed, 1 Apr 2020 13:52:37 +0000 > From: Alan Mackenzie <acm@muc.de> > Cc: akrl@sdf.org, casouri@gmail.com, Dmitry Gutov <dgutov@yandex.ru>, > monnier@iro.umontreal.ca, emacs-devel@gnu.org > > > In general, there's no "preliminary processing" by the major mode's > > fontification facilities except what happens as part of jit-lock, i.e. > > at redisplay time or as side effect of functions that simulate display > > for redisplay purposes. I'd be very surprised to see a major mode > > which somehow preprocesses the buffer on its own in preparation for > > fontification. CC Mode certainly doesn't seem to do that. > > CC Mode does do this. It marks syntax-table text properties throughout > the buffer at find-file time, and keeps them valid thereafter in > before/after-change-functions. > > This doesn't seem to affect starting up performance that badly. On my > machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification > of the first screenful of comments) is taking 0.18s. Like I said, the profile I see is very different, and shows that most of the time is spent in redisplay-triggered font-lock. But in any case, it should be trivially obvious that avoiding to parse the entire buffer will make redisplay faster. We should try doing that instead of giving up, even if we think the current fontification machinery is slow enough to make the parsing delay not so visible. After all, we want to use these parsers to make CC Mode and friends faster, so the design and the implementation should use every trick we have up our sleeve to avoid expensive processing. 
Just because using buffer-substring and parsing the entire buffer up front is easy doesn't yet mean we should go for it without trying more efficient algorithms. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 14:10 ` Eli Zaretskii @ 2020-04-01 15:27 ` Dmitry Gutov 2020-04-01 15:44 ` Jorge Javier Araya Navarro 2020-04-01 16:03 ` Eli Zaretskii 0 siblings, 2 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 15:27 UTC (permalink / raw) To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, emacs-devel, monnier, akrl On 01.04.2020 17:10, Eli Zaretskii wrote: > But in any case, it should be trivially obvious that avoiding to parse > the entire buffer will make redisplay faster. We should try doing > that instead of giving up, even if we think the current fontification > machinery is slow enough to make the parsing delay not so visible. I think it's pointless to argue against the current design of TreeSitter here, where none of its developers can read it. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:27 ` Dmitry Gutov @ 2020-04-01 15:44 ` Jorge Javier Araya Navarro 2020-04-01 16:03 ` Eli Zaretskii 1 sibling, 0 replies; 142+ messages in thread From: Jorge Javier Araya Navarro @ 2020-04-01 15:44 UTC (permalink / raw) To: Dmitry Gutov Cc: casouri, emacs-devel, Stefan Monnier, Alan Mackenzie, Eli Zaretskii, akrl Yup. On Wed, Apr 1, 2020 at 09:28, Dmitry Gutov (dgutov@yandex.ru) wrote: > On 01.04.2020 17:10, Eli Zaretskii wrote: > > But in any case, it should be trivially obvious that avoiding to parse > > the entire buffer will make redisplay faster. We should try doing > > that instead of giving up, even if we think the current fontification > > machinery is slow enough to make the parsing delay not so visible. > > I think it's pointless to argue against the current design of TreeSitter > here, where none of its developers can read it. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:27 ` Dmitry Gutov 2020-04-01 15:44 ` Jorge Javier Araya Navarro @ 2020-04-01 16:03 ` Eli Zaretskii 2020-04-01 21:21 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-01 16:03 UTC (permalink / raw) To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl > Cc: akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca, > emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Wed, 1 Apr 2020 18:27:43 +0300 > > On 01.04.2020 17:10, Eli Zaretskii wrote: > > But in any case, it should be trivially obvious that avoiding to parse > > the entire buffer will make redisplay faster. We should try doing > > that instead of giving up, even if we think the current fontification > > machinery is slow enough to make the parsing delay not so visible. > > I think it's pointless to argue against the current design of TreeSitter > here, where none of its developers can read it. If by TreeSitter you mean the parser (not the Emacs package which interfaces it), then what I proposed is not against their design, AFAIU. They provide an API through which we can let the parser access the buffer text directly, and they explicitly say that the parser is tolerant to invalid/incomplete syntax trees. And I don't see how it could be any different, since when you start writing code, it takes quite some time before it becomes syntactically complete and valid. ^ permalink raw reply [flat|nested] 142+ messages in thread
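To make the idea of "direct access to buffer text" a little more concrete: Emacs stores buffer text in two contiguous spans separated by a gap, and a C-level reader could hand those spans to a parser without copying. The sketch below only illustrates that layout from Lisp, using the built-in `gap-position`; it copies the text, which real direct access would avoid, and `my/buffer-text-parts` is a hypothetical name.

```elisp
;; Sketch: view the current buffer's text as the two spans around the gap.
;; `my/buffer-text-parts' is a hypothetical name.  It copies the text,
;; which a real C-level reader would avoid; it only shows the layout.
(defun my/buffer-text-parts ()
  "Return (BEFORE . AFTER): the accessible text before and after the gap."
  ;; Clamp the gap position to the accessible region, in case of narrowing.
  (let ((gap (max (point-min) (min (gap-position) (point-max)))))
    (cons (buffer-substring-no-properties (point-min) gap)
          (buffer-substring-no-properties gap (point-max)))))

;; Example: the two parts always concatenate to the full text.
;; (with-temp-buffer
;;   (insert "hello world")
;;   (goto-char 6)
;;   (insert "X")
;;   (let ((parts (my/buffer-text-parts)))
;;     (concat (car parts) (cdr parts))))
```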
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 16:03 ` Eli Zaretskii @ 2020-04-01 21:21 ` Dmitry Gutov 2020-04-02 14:09 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 21:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl On 01.04.2020 19:03, Eli Zaretskii wrote: >> Cc:akrl@sdf.org,casouri@gmail.com,monnier@iro.umontreal.ca, >> emacs-devel@gnu.org >> From: Dmitry Gutov<dgutov@yandex.ru> >> Date: Wed, 1 Apr 2020 18:27:43 +0300 >> >> On 01.04.2020 17:10, Eli Zaretskii wrote: >>> But in any case, it should be trivially obvious that avoiding to parse >>> the entire buffer will make redisplay faster. We should try doing >>> that instead of giving up, even if we think the current fontification >>> machinery is slow enough to make the parsing delay not so visible. >> I think it's pointless to argue against the current design of TreeSitter >> here, where none of its developers can read it. > If by TreeSitter you mean the parser (not the Emacs package which > interfaces it), then what I proposed is not against their design, > AFAIU. They provide an API through which we can let the parser access > the buffer text directly, and they explicitly say that the parser is > tolerant to invalid/incomplete syntax trees. And I don't see how it > could be any different, since when you start writing code, it takes > quite some time before it becomes syntactically complete and valid. That makes sense, at least in theory. But I'd rather not break the usage assumptions of the authors of this library right away. And we'll likely want to adopt existing addons which use the result of the parse, which likely depend on the same assumptions. Anyway, here's a (short) discussion on the topic of large files: https://github.com/tree-sitter/tree-sitter/issues/222 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 21:21 ` Dmitry Gutov @ 2020-04-02 14:09 ` Eli Zaretskii 2020-04-02 18:03 ` 조성빈 via "Emacs development discussions. 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-02 14:09 UTC (permalink / raw) To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl > Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca, > emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Thu, 2 Apr 2020 00:21:36 +0300 > > > If by TreeSitter you mean the parser (not the Emacs package which > > interfaces it), then what I proposed is not against their design, > > AFAIU. They provide an API through which we can let the parser access > > the buffer text directly, and they explicitly say that the parser is > > tolerant to invalid/incomplete syntax trees. And I don't see how it > > could be any different, since when you start writing code, it takes > > quite some time before it becomes syntactically complete and valid. > > That makes sense, at least in theory. But I'd rather not break the usage > assumptions of the authors of this library right away. From what I could glean by reading the documentation, the above is not necessarily against the assumptions of the tree-sitter developers. I saw nothing that would indicate the initial full parse is a must. That such full parse is unnecessary is what I would expect, because of the use case that I start writing a source file from scratch. > And we'll likely want to adopt existing addons which use the result > of the parse, which likely depend on the same assumptions. Those other addons must also support the "write from scratch" use case, right? Then they should also support passing only part of the buffer, since it could be that this is all I have in the buffer right now. 
> Anyway, here's a (short) discussion on the topic of large files: > https://github.com/tree-sitter/tree-sitter/issues/222 Thanks. This was long ago, though, so I'm not sure what became of that (and Stefan's comment didn't yet get any responses to indicate that this is a solved problem). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 14:09 ` Eli Zaretskii @ 2020-04-02 18:03 ` 조성빈 via "Emacs development discussions. 2020-04-02 18:27 ` Yuan Fu 0 siblings, 1 reply; 142+ messages in thread From: 조성빈 via "Emacs development discussions. @ 2020-04-02 18:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Dmitry Gutov, acm, casouri, Emacs-devel, monnier, akrl > On Apr 2, 2020, at 11:10 PM, Eli Zaretskii <eliz@gnu.org> wrote: > >> >> Cc: acm@muc.de, akrl@sdf.org, casouri@gmail.com, monnier@iro.umontreal.ca, >> emacs-devel@gnu.org >> From: Dmitry Gutov <dgutov@yandex.ru> >> Date: Thu, 2 Apr 2020 00:21:36 +0300 >> >>> If by TreeSitter you mean the parser (not the Emacs package which >>> interfaces it), then what I proposed is not against their design, >>> AFAIU. They provide an API through which we can let the parser access >>> the buffer text directly, and they explicitly say that the parser is >>> tolerant to invalid/incomplete syntax trees. And I don't see how it >>> could be any different, since when you start writing code, it takes >>> quite some time before it becomes syntactically complete and valid. >> >> That makes sense, at least in theory. But I'd rather not break the usage >> assumptions of the authors of this library right away. > > From what I could glean by reading the documentation, the above is not > necessarily against the assumptions of the tree-sitter developers. I > saw nothing that would indicate the initial full parse is a must. > That such full parse is unnecessary is what I would expect, because of > the use case that I start writing a source file from scratch. The situation of a new user creating a new buffer is very different from parsing code with only a peephole, because users don’t generally expect unfinished code to be exactly highlighted, while users do expect finished code to have exact highlighting. 
Maybe it’s just because I got lost through a lot of emails, and Mail.app doesn't really thread these emails properly, but I can’t understand the resistance to up-front parsing. The current shipping CC-Mode is parsing most of the code up front, and clearly tree sitter will be faster than that. AFAIU parsing code by looking only through a peephole is super hard except for some languages that are designed for peephole processing - and that makes it only hard, not super hard. >> And we'll likely want to adopt existing addons which use the result >> of the parse, which likely depend on the same assumptions. > > Those other addons must also support the "write from scratch" use > case, right? Then they should also support passing only part of the > buffer, since it could be that this is all I have in the buffer right > now. > >> Anyway, here's a (short) discussion on the topic of large files: >> https://github.com/tree-sitter/tree-sitter/issues/222 > > Thanks. This was long ago, though, so I'm not sure what became of > that (and Stefan's comment didn't yet get any responses to indicate > that this is a solved problem). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 18:03 ` 조성빈 via "Emacs development discussions. @ 2020-04-02 18:27 ` Yuan Fu 2020-04-02 19:39 ` Stefan Monnier 0 siblings, 1 reply; 142+ messages in thread From: Yuan Fu @ 2020-04-02 18:27 UTC (permalink / raw) To: 조성빈 Cc: Emacs-devel, Stefan Monnier, Dmitry Gutov, acm, Eli Zaretskii, akrl > On Apr 2, 2020, at 2:03 PM, 조성빈 <pcr910303@icloud.com> wrote: > > Maybe it’s just because I got lost through a lot of emails, and Mail.app > doesn't really thread these emails properly, but I can’t understand the > resistance to up-front parsing. > I think we are just discussing if there is any way to not parse the whole buffer up front. (Which I consider unlikely because of the nature of parsing.) > The current shipping CC-Mode is parsing most of the code up front, and > clearly tree sitter will be faster than that. AFAIU parsing code by > looking only through a peephole is super hard except for some languages > that are designed for peephole processing - and that makes it only hard, > not super hard. Some modes don’t require up-front parsing. IIRC, an example from an earlier message is javascript-mode. Yuan ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-02 18:27 ` Yuan Fu @ 2020-04-02 19:39 ` Stefan Monnier 0 siblings, 0 replies; 142+ messages in thread From: Stefan Monnier @ 2020-04-02 19:39 UTC (permalink / raw) To: Yuan Fu Cc: Emacs-devel, 조성빈, Dmitry Gutov, acm, Eli Zaretskii, akrl

> Some modes don’t require up-front parsing. IIRC, an example from an
> earlier message is javascript-mode.

Yet, in order to decide whether position P in a javascript buffer is inside a comment or not, you will either have to look at everything between point-min and P, or think hard about all the various possibilities to try and see if you can argue that in this particular case it's not necessary.

E.g. if you see

    foo /* bar */

then you might be able to say that "bar" is within a comment without looking much further. But for "foo" you first have to look back because there might have been an earlier unmatched `/*`.

BTW, for "bar" you still have to look a bit further: it might be that the previous line was:

    tmp = "hello\

in which case "bar" is not inside a comment but inside a string. Well, unless there's ... an earlier unmatched `/*`. Etc...

For the case of Javascript I believe that you can come up with an algorithm which will reliably give the right answer while almost never having to go back all the way to `point-min`. I even believe it's possible to write a tool that will automatically find that algorithm given a suitable input grammar.

But for some languages like Elisp, Python, and OCaml I believe it's simply impossible (for Elisp/Python it's because of the existence of multiline strings (with no "trailing \" to indicate their possible presence) and for OCaml it's because of the nested comments).

        Stefan

^ permalink raw reply [flat|nested] 142+ messages in thread
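The point about needing context before position P can be illustrated with a small sketch. It assumes a heavily simplified JavaScript-like lexer: only `/* ... */` block comments and double-quoted strings with backslash escapes (including a backslash before a newline) are modelled, and `state_at` is a hypothetical helper, not anything from Emacs or tree-sitter. The same line is classified differently depending on what precedes it.

```python
def state_at(text: str, pos: int) -> str:
    """Return 'code', 'comment' or 'string' for the character at text[pos],
    determined by scanning from the very start of the text.

    Simplified rules (an assumption of this sketch): block comments do not
    nest, and strings end at an unescaped double quote or unescaped newline.
    """
    state = "code"
    i = 0
    while i < pos:
        two = text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "comment"
                i += 2
                continue
            if text[i] == '"':
                state = "string"
        elif state == "comment":
            if two == "*/":
                state = "code"
                i += 2
                continue
        else:  # state == "string"
            if text[i] == "\\":
                i += 2  # skip the escaped character, e.g. an escaped newline
                continue
            if text[i] in '"\n':
                state = "code"
        i += 1
    return state


line = "foo /* bar */"

# Seen in isolation, "foo" is code and "bar" is inside a comment.
alone = line
# Preceded by an unmatched "/*", "foo" is inside a comment as well.
after_open_comment = "/* unmatched\n" + line
# Preceded by a string continued with a trailing backslash,
# both "foo" and "bar" are inside a string.
after_open_string = 'tmp = "hello\\\n' + line
```

For example, `state_at(alone, alone.index("bar"))` gives `'comment'`, while `state_at(after_open_string, after_open_string.index("bar"))` gives `'string'`, even though the text around "bar" is identical in both cases.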
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 13:52 ` Alan Mackenzie 2020-04-01 14:10 ` Eli Zaretskii @ 2020-04-01 15:22 ` Dmitry Gutov 2020-04-04 11:06 ` Alan Mackenzie 1 sibling, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-01 15:22 UTC (permalink / raw) To: Alan Mackenzie, Eli Zaretskii; +Cc: casouri, emacs-devel, monnier, akrl On 01.04.2020 16:52, Alan Mackenzie wrote: > This doesn't seem to affect starting up performance that badly. On my > machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification > of the first screenful of comments) is taking 0.18s. Interesting. How do you measure it exactly? Do you kill the buffer between tries? I have a fast Intel CPU that is barely 2 years old (i9-8950HK), system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og -g3'", the build is from emacs-27 branch, recent revision. With 'emacs -Q' it's a little faster, but still (benchmark 1 '(progn (find-file "src/xdisp.c"))) prints out Elapsed time: 0.968598s (0.144805s in 8 GCs) ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-01 15:22 ` Dmitry Gutov @ 2020-04-04 11:06 ` Alan Mackenzie 2020-04-04 11:26 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 142+ messages in thread From: Alan Mackenzie @ 2020-04-04 11:06 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl Hello, Dmitry. On Wed, Apr 01, 2020 at 18:22:00 +0300, Dmitry Gutov wrote: > On 01.04.2020 16:52, Alan Mackenzie wrote: > > This doesn't seem to affect starting up performance that badly. On my > > machine (a 3 yo AMD Ryzen) visiting xdisp.c (including the fontification > > of the first screenful of comments) is taking 0.18s. > Interesting. How do you measure it exactly? Do you kill the buffer > between tries? Using my macro time-it, I did: (time-it (find-file "..../src/xdisp.c") (sit-for 0)) . I think this was without the file yet being in the OS's file cache. Mind you, I have an nvme SSD. > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og > -g3'", the build is from emacs-27 branch, recent revision. That's a debugging build, isn't it? That probably explains the difference. > With 'emacs -Q' it's a little faster, but still > (benchmark 1 '(progn (find-file "src/xdisp.c"))) > prints out > Elapsed time: 0.968598s (0.144805s in 8 GCs) Is that also measuring the time for redisplay? -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 11:06 ` Alan Mackenzie @ 2020-04-04 11:26 ` Eli Zaretskii 2020-04-04 14:14 ` Andrea Corallo 2020-04-04 11:27 ` Eli Zaretskii 2020-04-04 12:01 ` Dmitry Gutov 2 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 11:26 UTC (permalink / raw) To: Alan Mackenzie; +Cc: casouri, akrl, emacs-devel, monnier, dgutov > Date: Sat, 4 Apr 2020 11:06:43 +0000 > Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com, > monnier@iro.umontreal.ca, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), > > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og > > -g3'", the build is from emacs-27 branch, recent revision. > > That's a debugging build, isn't it? No, it's an optimized build, just not with -O2. -Og is similar to -O1, so slightly less optimized than -O2. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 11:26 ` Eli Zaretskii @ 2020-04-04 14:14 ` Andrea Corallo 2020-04-04 14:41 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Andrea Corallo @ 2020-04-04 14:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Alan Mackenzie, casouri, emacs-devel, monnier, dgutov Eli Zaretskii <eliz@gnu.org> writes: >> Date: Sat, 4 Apr 2020 11:06:43 +0000 >> Cc: Eli Zaretskii <eliz@gnu.org>, akrl@sdf.org, casouri@gmail.com, >> monnier@iro.umontreal.ca, emacs-devel@gnu.org >> From: Alan Mackenzie <acm@muc.de> >> >> > I have a fast Intel CPU that is barely 2 years old (i9-8950HK), >> > system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og >> > -g3'", the build is from emacs-27 branch, recent revision. >> >> That's a debugging build, isn't it? > > No, it's an optimized build, just not with -O2. -Og is similar to -O1, > so slightly less optimized than -O2. Be careful that -Og produces considerably slower code than -O2. For instance, if I'm not wrong, it completely disables inlining, which is one of the most rewarding optimizations. Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 14:14 ` Andrea Corallo @ 2020-04-04 14:41 ` Eli Zaretskii 2020-04-04 15:04 ` Andrea Corallo 0 siblings, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 14:41 UTC (permalink / raw) To: Andrea Corallo; +Cc: acm, casouri, emacs-devel, monnier, dgutov > From: Andrea Corallo <akrl@sdf.org> > Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com, > monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Sat, 04 Apr 2020 14:14:45 +0000 > > Be careful that -Og produce considerably slower code than -O2. For > instance if I'm not wrong it disable completely inlining that is one of > the most rewarding optimizations. Yes, I know. But the difference in performance between -Og and -O2 cannot be 8- or 9-fold, it should be somewhere around 50% to 70%. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 14:41 ` Eli Zaretskii @ 2020-04-04 15:04 ` Andrea Corallo 2020-04-04 15:38 ` Richard Copley 0 siblings, 1 reply; 142+ messages in thread From: Andrea Corallo @ 2020-04-04 15:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, casouri, dgutov, monnier, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Andrea Corallo <akrl@sdf.org> >> Cc: Alan Mackenzie <acm@muc.de>, dgutov@yandex.ru, casouri@gmail.com, >> monnier@iro.umontreal.ca, emacs-devel@gnu.org >> Date: Sat, 04 Apr 2020 14:14:45 +0000 >> >> Be careful that -Og produces considerably slower code than -O2. For >> instance, if I'm not wrong, it completely disables inlining, which is one of >> the most rewarding optimizations. > > Yes, I know. But the difference in performance between -Og and -O2 > cannot be 8- or 9-fold, it should be somewhere around 50% to 70%. Mmmh I agree with you, one order of magnitude sounds like a bit too much, even if we have a ton of small getters/setters that are usually inlined. -- akrl@sdf.org ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 15:04 ` Andrea Corallo @ 2020-04-04 15:38 ` Richard Copley 0 siblings, 0 replies; 142+ messages in thread From: Richard Copley @ 2020-04-04 15:38 UTC (permalink / raw) To: Emacs Development Cc: Alan Mackenzie, Eli Zaretskii, Dmitry Gutov, Andrea Corallo

Here, an -Og build takes about 2.5 times as long as an -O2 build to execute either of the two benchmarks. That's a relative decrease of 60% in elapsed time, for -O2 relative to -Og.

I built Emacs in 4 separate clean worktrees of the master branch (f71afd600a). The build commands were identical except for the optimization flag. For each test I (twice) started "emacs -Q" and did either [1] or [2]:

[1] M-: (benchmark 1 '(progn (find-file "src/xdisp.c")))
[2] M-: (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))

The elapsed time reported was:

without sit-for:
-O0: 1.027754s, 1.031642s
-Og: 1.295515s, 1.277441s
-O1: 0.629743s, 0.629870s
-O2: 0.513139s, 0.511230s

with sit-for:
-O0: 1.079090s, 1.068118s
-Og: 1.347256s, 1.337780s
-O1: 0.661679s, 0.664470s
-O2: 0.533649s, 0.533949s

(My only comment on the fact that -Og appears to be about 20% or 25% worse than -O0 is that it's not a typo.)

^ permalink raw reply [flat|nested] 142+ messages in thread
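As a quick check of the arithmetic in the message above, the ratios can be recomputed from the reported numbers. The sketch below uses only the first "without sit-for" run for each flag; the helper names `slowdown` and `relative_decrease` are made up for this illustration.

```python
# First "without sit-for" run per optimization flag, in seconds,
# copied from the message above.
timings = {
    "-O0": 1.027754,
    "-Og": 1.295515,
    "-O1": 0.629743,
    "-O2": 0.513139,
}


def slowdown(flag: str, baseline: str = "-O2") -> float:
    """How many times longer `flag` takes than `baseline`."""
    return timings[flag] / timings[baseline]


def relative_decrease(slow: str, fast: str) -> float:
    """Fraction of elapsed time saved when going from `slow` to `fast`."""
    return 1 - timings[fast] / timings[slow]


og_vs_o2 = slowdown("-Og")                   # about 2.5
saving = relative_decrease("-Og", "-O2")     # about 0.60
og_vs_o0 = slowdown("-Og", baseline="-O0")   # about 1.26
```

On these numbers -Og takes roughly 2.5 times as long as -O2, which is the same statement as a 60% reduction in elapsed time for -O2, and -Og is about a quarter slower than -O0.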
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 11:06 ` Alan Mackenzie 2020-04-04 11:26 ` Eli Zaretskii @ 2020-04-04 11:27 ` Eli Zaretskii 2020-04-04 12:01 ` Dmitry Gutov 2 siblings, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 11:27 UTC (permalink / raw) To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov > Date: Sat, 4 Apr 2020 11:06:43 +0000 > From: Alan Mackenzie <acm@muc.de> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com, > monnier@iro.umontreal.ca, akrl@sdf.org > > > (benchmark 1 '(progn (find-file "src/xdisp.c"))) > > > prints out > > > Elapsed time: 0.968598s (0.144805s in 8 GCs) > > Is that also measuring the time for redisplay? No, redisplay runs after the function exits. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 11:06 ` Alan Mackenzie 2020-04-04 11:26 ` Eli Zaretskii 2020-04-04 11:27 ` Eli Zaretskii @ 2020-04-04 12:01 ` Dmitry Gutov 2020-04-04 12:36 ` Alan Mackenzie 2 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-04 12:01 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel Hi Alan, On 04.04.2020 14:06, Alan Mackenzie wrote: >> Interesting. How do you measure it exactly? Do you kill the buffer >> between tries? > > Using my macro time-it, I did: > > (time-it (find-file "..../src/xdisp.c") (sit-for 0)) It might be valuable if you evaluated exactly the same form I did. And made sure that the buffer is not visited in advance. And did that in an 'emacs -Q' session. > . I think this was without the file yet being in the OS's file cache. > Mind you, I have an nvme SSD. I do as well. I have a fast laptop, pretty sure it's faster than what 90% of our users have. My single-threaded performance must be better than yours for sure. >> I have a fast Intel CPU that is barely 2 years old (i9-8950HK), >> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og >> -g3'", the build is from emacs-27 branch, recent revision. > > That's a debugging build, isn't it? That probably explains the > difference. Debugging-ish. It hardly explains the 4.5x difference. So we're probably measuring different things. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 12:01 ` Dmitry Gutov @ 2020-04-04 12:36 ` Alan Mackenzie 2020-04-04 12:40 ` Dmitry Gutov 2020-04-04 13:02 ` Eli Zaretskii 0 siblings, 2 replies; 142+ messages in thread From: Alan Mackenzie @ 2020-04-04 12:36 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, akrl, casouri, monnier, emacs-devel Hello, Dmitry. On Sat, Apr 04, 2020 at 15:01:23 +0300, Dmitry Gutov wrote: > On 04.04.2020 14:06, Alan Mackenzie wrote: > >> Interesting. How do you measure it exactly? Do you kill the buffer > >> between tries? > > Using my macro time-it, I did: > > (time-it (find-file "..../src/xdisp.c") (sit-for 0)) > It might be valuable if you evaluated exactly the same form I did. And > made sure that the buffer is not visited in advance. And did that in an > 'emacs -Q' session. Fair point: M-: (benchmark 1 '(progn (find-file "src/xdisp.c"))) "Elapsed time: 1.249904s (0.165570s in 7 GCs)" , in a build with the CFLAGS and gtk toolkit like you said. That's in agreement with your timing, given my slightly slower machine. > > . I think this was without the file yet being in the OS's file cache. > > Mind you, I have an nvme SSD. > I do as well. I have a fast laptop, pretty sure it's faster than what > 90% of our users have. My single-threaded performance must be better > than yours for sure. > >> I have a fast Intel CPU that is barely 2 years old (i9-8950HK), > >> system-configuration-options is "--with-x-toolkit=gtk3 'CFLAGS=-Og > >> -g3'", the build is from emacs-27 branch, recent revision. > > That's a debugging build, isn't it? That probably explains the > > difference. > Debugging-ish. It hardly explains the 4.5x difference. So we're probably > measuring different things. I think it does explain the difference. I repeated my previous timing, which was 0.18s on an optimised build, and it came out at 1.16s. That's a factor of 6 difference. CFLAGS='-Og -g3' is a slow build. 
-- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 12:36 ` Alan Mackenzie @ 2020-04-04 12:40 ` Dmitry Gutov 2020-04-04 13:02 ` Eli Zaretskii 1 sibling, 0 replies; 142+ messages in thread From: Dmitry Gutov @ 2020-04-04 12:40 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel, casouri, monnier, akrl On 04.04.2020 15:36, Alan Mackenzie wrote: > I think it does explain the difference. I repeated my previous timing, > which was 0.18s on an optimised build, and it came out at 1.16s. That's > a factor of 6 different. CFLAGS='-Og -g3' is a slow build. Hmm. Very good, thank you. (I am just now in the process of rebuilding Emacs with full optimizations; will report if the result is still starkly different from yours for some reason.) ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 12:36 ` Alan Mackenzie 2020-04-04 12:40 ` Dmitry Gutov @ 2020-04-04 13:02 ` Eli Zaretskii 2020-04-04 16:09 ` Dmitry Gutov 1 sibling, 1 reply; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 13:02 UTC (permalink / raw) To: Alan Mackenzie; +Cc: akrl, casouri, emacs-devel, monnier, dgutov > Date: Sat, 4 Apr 2020 12:36:13 +0000 > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org, casouri@gmail.com, > monnier@iro.umontreal.ca, akrl@sdf.org > From: Alan Mackenzie <acm@muc.de> > > > > (time-it (find-file "..../src/xdisp.c") (sit-for 0)) > > > It might be valuable if you evaluated exactly the same form I did. And > > made sure that the buffer is not visited in advance. And did that in an > > 'emacs -Q' session. > > Fair point: > > M-: (benchmark 1 '(progn (find-file "src/xdisp.c"))) > > "Elapsed time: 1.249904s (0.165570s in 7 GCs)" > > , in a build with the CFLAGS and gtk toolkit like you said. That's in > agreement with your timing, given my slightly slower machine. I don't believe these results. It's nigh impossible for a -O2 optimized program to be 5 times faster than a -Og optimized. And benchmark.el doesn't seem to be so different from time-it, modulo the function call. Moreover, Alan's method does time redisplay, whereas Dmitry's method does not. So there's some other factor at work here that explains the difference. > I think it does explain the difference. I repeated my previous timing, > which was 0.18s on an optimised build, and it came out at 1.16s. That's > a factor of 6 different. CFLAGS='-Og -g3' is a slow build. It cannot be that slow. Especially since some I/O is involved, and you also measure redisplay. More detailed data would be necessary to explain the difference. 
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 13:02 ` Eli Zaretskii @ 2020-04-04 16:09 ` Dmitry Gutov 2020-04-04 16:38 ` Eli Zaretskii 0 siblings, 1 reply; 142+ messages in thread From: Dmitry Gutov @ 2020-04-04 16:09 UTC (permalink / raw) To: Eli Zaretskii, Alan Mackenzie; +Cc: casouri, akrl, monnier, emacs-devel On 04.04.2020 16:02, Eli Zaretskii wrote: > I don't believe these results. It's night impossible for a -O2 > optimized program to be 5 times faster than a -Og optimized. And > benchmark.el doesn't seem to be so different from time-it, modulo the > function call. Moreover, Alan's method does time redisplay, whereas > Dmitry's method does not. Unfortunately I can confirm the difference. When Emacs is recompiled with the default optimizations, (benchmark 1 '(progn (find-file "src/xdisp.c"))) reports ~0.13s when FS cache is warm (compared to ~0.78 with the most recent -Og build here). And (benchmark 1 '(progn (find-file "src/xdisp.c") (goto-char (point-max)) (sit-for 0))) reports ~0.29s. Maybe CC Mode exercises some primitives that are hit especially hard by the lack of optimization. Emacs looks snappier overall (e.g. during startup, loading my custom configuration with all its packages), but probably within the bounds of 50-70% difference. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 16:09 ` Dmitry Gutov @ 2020-04-04 16:38 ` Eli Zaretskii 2020-04-04 16:45 ` Eli Zaretskii 2020-04-04 17:29 ` Dmitry Gutov 0 siblings, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 16:38 UTC (permalink / raw) To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Sat, 4 Apr 2020 19:09:58 +0300 > Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca, > emacs-devel@gnu.org > > When Emacs is recompiled with the default optimizations, > > (benchmark 1 '(progn (find-file "src/xdisp.c"))) > > reports ~0.13s when FS cache is warm (compared to ~0.78 with the most > recent -Og build here). > > And > > (benchmark 1 '(progn (find-file "src/xdisp.c") > (goto-char (point-max)) > (sit-for 0))) > > reports ~0.29s. Is this with xdisp.c in a Git repository or outside of a Git repository? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 16:38 ` Eli Zaretskii @ 2020-04-04 16:45 ` Eli Zaretskii 2020-04-04 17:22 ` Richard Copley 2020-04-04 17:36 ` Dmitry Gutov 2020-04-04 17:29 ` Dmitry Gutov 1 sibling, 2 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 16:45 UTC (permalink / raw) To: dgutov, acm; +Cc: casouri, akrl, monnier, emacs-devel > Date: Sat, 04 Apr 2020 19:38:18 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org, > monnier@iro.umontreal.ca, akrl@sdf.org > > Is this with xdisp.c in a Git repository or outside of a Git > repository? Also, how many GC's and the time they took did benchmark report? With such short timings and running the test only once, the difference GC could make might be significant, so if different runs and different people here have different numbers of GC, we could be comparing apples with oranges. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 16:45 ` Eli Zaretskii @ 2020-04-04 17:22 ` Richard Copley 2020-04-04 17:50 ` Eli Zaretskii 2020-04-04 18:29 ` Andrea Corallo 2020-04-04 17:36 ` Dmitry Gutov 1 sibling, 2 replies; 142+ messages in thread From: Richard Copley @ 2020-04-04 17:22 UTC (permalink / raw) To: Eli Zaretskii Cc: Alan Mackenzie, Andrea Corallo, Emacs Development, Dmitry Gutov On Sat, 4 Apr 2020 at 17:46, Eli Zaretskii <eliz@gnu.org> wrote: > > > Date: Sat, 04 Apr 2020 19:38:18 +0300 > > From: Eli Zaretskii <eliz@gnu.org> > > Cc: acm@muc.de, casouri@gmail.com, emacs-devel@gnu.org, > > monnier@iro.umontreal.ca, akrl@sdf.org > > > > Is this with xdisp.c in a Git repository or outside of a Git > > repository? > > Also, how many GC's and the time they took did benchmark report? With > such short timings and running the test only once, the difference GC > could make might be significant, so if different runs and different > people here have different numbers of GC, we could be comparing apples > with oranges. For my earlier results, I ran the -Og benchmark in the git repository (with .git a directory) and the other three in git worktrees (with .git a regular file). I have repeated my tests for the -Og case in a git worktree, to match the other three. It didn't make a significant difference. I haven't tried it outside of git. Amended results below, including time in GC, for two runs each in separate instances of "emacs -Q". In all 16 cases there were 8 GCs. 

with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
-Og 1.340039s (0.149663s), 1.350613s (0.149954s)
-O2 0.533649s (0.046995s), 0.533949s (0.046714s)
-O1 0.661679s (0.055181s), 0.664470s (0.057050s)
-O0 1.079090s (0.168691s), 1.068118s (0.168451s)

without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
-Og 1.293845s (0.150200s), 1.305310s (0.149520s)
-O2 0.513139s (0.047117s), 0.511230s (0.047143s)
-O1 0.629743s (0.054738s), 0.629870s (0.056522s)
-O0 1.027754s (0.165569s), 1.031642s (0.168891s)

^ permalink raw reply [flat|nested] 142+ messages in thread
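To see whether garbage collection accounts for the gap between builds, the GC time can be subtracted from the elapsed time. The sketch below does this for the first "without sit-for" run of each flag from the results above; `non_gc` and `gc_share` are hypothetical helper names used only for this illustration.

```python
# (elapsed seconds, seconds spent in GC) for the first "without sit-for"
# run per flag, copied from the amended results above.
runs = {
    "-Og": (1.293845, 0.150200),
    "-O2": (0.513139, 0.047117),
    "-O1": (0.629743, 0.054738),
    "-O0": (1.027754, 0.165569),
}


def non_gc(flag: str) -> float:
    """Elapsed time with the reported GC time subtracted."""
    elapsed, gc = runs[flag]
    return elapsed - gc


def gc_share(flag: str) -> float:
    """Fraction of the elapsed time that was spent in GC."""
    elapsed, gc = runs[flag]
    return gc / elapsed


# -Og versus -O2 once GC time is excluded from both.
og_vs_o2_without_gc = non_gc("-Og") / non_gc("-O2")
```

On these figures GC is well under a fifth of each run, and the -Og to -O2 ratio stays near 2.5 with GC time removed, so GC alone does not appear to explain the difference between the builds.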
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) 2020-04-04 17:22 ` Richard Copley @ 2020-04-04 17:50 ` Eli Zaretskii 2020-04-04 18:29 ` Andrea Corallo 1 sibling, 0 replies; 142+ messages in thread From: Eli Zaretskii @ 2020-04-04 17:50 UTC (permalink / raw) To: Richard Copley; +Cc: acm, emacs-devel, dgutov, akrl > From: Richard Copley <rcopley@gmail.com> > Date: Sat, 4 Apr 2020 18:22:34 +0100 > Cc: Alan Mackenzie <acm@muc.de>, Andrea Corallo <akrl@sdf.org>, > Emacs Development <emacs-devel@gnu.org>, Dmitry Gutov <dgutov@yandex.ru> > > For my earlier results, I ran the -Og benchmark was in the git > repository (with .git a directory) and the other three in git > worktrees (with .git a regular file). I have repeated my tests for the > -Og case in a git worktree, to match the other three. It didn't make a > significant difference. I haven't tried it outside of git. > > Amended results below, including time in GC, for two runs each in > separate instances of "emacs -Q". In all 16 cases there were 8 GCs. > > with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0))) > -Og 1.340039s (0.149663s), 1.350613s (0.149954s) > -O2 0.533649s (0.046995s), 0.533949s (0.046714s) > -O1 0.661679s (0.055181s), 0.664470s (0.057050s) > -O0 1.079090s (0.168691s), 1.068118s (0.168451s) > > without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c"))) > -Og 1.293845s (0.150200s), 1.305310s (0.149520s) > -O2 0.513139s (0.047117s), 0.511230s (0.047143s) > -O1 0.629743s (0.054738s), 0.629870s (0.056522s) > -O0 1.027754s (0.165569s), 1.031642s (0.168891s) Thanks. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Andrea Corallo @ 2020-04-04 18:29 UTC
To: Richard Copley
Cc: Alan Mackenzie, Eli Zaretskii, Emacs Development, Dmitry Gutov

Richard Copley <rcopley@gmail.com> writes:

> For my earlier results, I ran the -Og benchmark in the git
> repository (with .git a directory) and the other three in git
> worktrees (with .git a regular file). I have repeated my tests for the
> -Og case in a git worktree, to match the other three. It didn't make a
> significant difference. I haven't tried it outside of git.
>
> Amended results below, including time in GC, for two runs each in
> separate instances of "emacs -Q". In all 16 cases there were 8 GCs.
>
> with sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c") (sit-for 0)))
> -Og 1.340039s (0.149663s), 1.350613s (0.149954s)
> -O2 0.533649s (0.046995s), 0.533949s (0.046714s)
> -O1 0.661679s (0.055181s), 0.664470s (0.057050s)
> -O0 1.079090s (0.168691s), 1.068118s (0.168451s)
>
> without sit-for, (benchmark 1 '(progn (find-file "src/xdisp.c")))
> -Og 1.293845s (0.150200s), 1.305310s (0.149520s)
> -O2 0.513139s (0.047117s), 0.511230s (0.047143s)
> -O1 0.629743s (0.054738s), 0.629870s (0.056522s)
> -O0 1.027754s (0.165569s), 1.031642s (0.168891s)

The fact that -Og is slower than -O0 is very sad but also interesting.

Which (I guess) GCC version are you on?

Generally speaking I suspect -Og is not very much tested, especially
performance wise.

  Andrea

--
akrl@sdf.org
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Richard Copley @ 2020-04-04 18:56 UTC
To: Andrea Corallo; +Cc: Emacs Development

On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:
> The fact that -Og is slower than -O0 is very sad but also interesting.

Yeah. Among its other selling points, it should give "a reasonable
level of optimization" [1].

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og

> Which (I guess) GCC version are you on?

GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.

> Generally speaking I suspect -Og is not very much tested, especially
> performance wise.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Andrea Corallo @ 2020-04-04 20:36 UTC
To: Richard Copley; +Cc: Emacs Development

Richard Copley <rcopley@gmail.com> writes:

> On Sat, 4 Apr 2020 at 19:29, Andrea Corallo <akrl@sdf.org> wrote:
>
>> The fact that -Og is slower than -O0 is very sad but also interesting.
>
> Yeah. Among its other selling points, it should give "a reasonable
> level of optimization" [1].

Yep, it does not make much sense to be honest. Just the fact that you do
not spill and fill every automatic variable on the stack all the time
should give a measurable improvement. There must be some macroscopic
reason we are missing.

> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Og
>
>> Which (I guess) GCC version are you on?
>
> GCC 9.3.0, for/on 64-bit Windows, built by MSYS2.
>
>> Generally speaking I suspect -Og is not very much tested, especially
>> performance wise.

--
akrl@sdf.org
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Dmitry Gutov @ 2020-04-04 17:36 UTC
To: Eli Zaretskii, acm; +Cc: casouri, akrl, monnier, emacs-devel

On 04.04.2020 19:45, Eli Zaretskii wrote:
> Also, how many GC's and the time they took did benchmark report?

I showed such outputs before.

Now, with an -Og build, here are outputs of several consecutive runs:

Elapsed time: 0.912808s (0.125516s in 7 GCs)
Elapsed time: 0.772653s (0.077285s in 4 GCs)
Elapsed time: 0.769371s (0.076361s in 4 GCs)
Elapsed time: 0.776261s (0.077395s in 4 GCs)

(The first one was taken right after Emacs was started.)

> With
> such short timings and running the test only once,

I always run it several times, discarding the first result because the
FS cache is likely cold on that iteration. The buffer is killed between
runs, of course.

> the difference GC
> could make might be significant, so if different runs and different
> people here have different numbers of GC, we could be comparing apples
> with oranges.

In an optimized build, it's always < 0.2s here. And I gave an average
number.

It's not my first time benchmarking either.
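The methodology described in this message (several consecutive runs, first one discarded as a cold-cache outlier, the rest averaged) can be sketched as a small helper. This is only an illustration: the function name `warm_mean` is hypothetical and is not part of Emacs or of the `benchmark` package.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the elapsed times after discarding the first sample,
   which is assumed to be the cold-cache run.
   Requires at least two samples. */
double warm_mean(const double *elapsed, size_t n)
{
    assert(n >= 2);
    double sum = 0.0;
    for (size_t i = 1; i < n; i++)   /* start at 1: skip the first run */
        sum += elapsed[i];
    return sum / (double)(n - 1);
}
```

Fed the four -Og timings listed above (0.912808, 0.772653, 0.769371, 0.776261), the helper averages only the last three, which gives roughly 0.773s.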
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Eli Zaretskii @ 2020-04-04 17:47 UTC
To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:36:03 +0300
> Cc: casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
>
> Now, with an -Og build, here are outputs of several consecutive runs:
>
> Elapsed time: 0.912808s (0.125516s in 7 GCs)
> Elapsed time: 0.772653s (0.077285s in 4 GCs)
> Elapsed time: 0.769371s (0.076361s in 4 GCs)
> Elapsed time: 0.776261s (0.077395s in 4 GCs)
> [...]
> In an optimized build, it's always < 0.2s here.

So we are looking at -O2 being about 3 to 5 times faster than -Og,
right? That's a speedup that is more than I'd expect, but still
nowhere near an order of magnitude that Alan's timings seemed to show.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Dmitry Gutov @ 2020-04-04 18:02 UTC
To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:47, Eli Zaretskii wrote:
>> Elapsed time: 0.912808s (0.125516s in 7 GCs)
>> Elapsed time: 0.772653s (0.077285s in 4 GCs)
>> Elapsed time: 0.769371s (0.076361s in 4 GCs)
>> Elapsed time: 0.776261s (0.077395s in 4 GCs)
>> [...]
>> In an optimized build, it's always < 0.2s here.
> So we are looking at -O2 being about 3 to 5 times faster than -Og,
> right? That's a speedup that is more than I'd expect, but still
> nowhere near an order of magnitude that Alan's timings seemed to show.

0.76 / 0.13 ~= 5.86

Alan's difference is bigger, but not by much:

1.24 / 0.18 ~= 6.88
1.18 (from another email) / 0.18 ~= 6.55

Which probably makes sense given different CPU architectures.
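For anyone who wants to redo the arithmetic in this message, the comparison is simply the slow build's elapsed time divided by the fast build's. The helper below is a minimal sketch; the name `speedup` is made up for illustration.

```c
#include <assert.h>

/* How many times faster the fast build is: slow time / fast time. */
double speedup(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);   /* a zero baseline has no meaningful ratio */
    return slow_seconds / fast_seconds;
}
```

With the rounded inputs quoted above, `speedup(0.76, 0.13)` is about 5.85, `speedup(1.24, 0.18)` about 6.89 and `speedup(1.18, 0.18)` about 6.56, so the two sets of measurements differ by roughly one unit of speedup, as the message says.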
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Stefan Monnier @ 2020-04-04 23:01 UTC
To: Dmitry Gutov; +Cc: acm, Eli Zaretskii, emacs-devel, casouri, akrl

> 0.76 / 0.13 ~= 5.86
>
> Alan's difference is bigger, but not by much:
>
> 1.24 / 0.18 ~= 6.88
> 1.18 (from another email) / 0.18 ~= 6.55

That does remind me that I've had the impression "lately" that debug
builds are much slower than they used to be. I suspect (for no reason
other than lack of imagination on my part) this is linked to the
changes from macros to inlinable functions.

When Paul started doing that we tried to keep some "important" macros as
macros (depending on DEFINE_KEY_OPS_AS_MACROS) to keep the performance
impact under control. Maybe something changed in this respect (maybe we
should add a few more fallback-macros into the set of functions affected
by DEFINE_KEY_OPS_AS_MACROS, or maybe something prevents
DEFINE_KEY_OPS_AS_MACROS from doing its job, or ...)?

        Stefan
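To make the macro-versus-inline-function trade-off concrete, here is a rough sketch of the general pattern. All names in it (`USE_MACRO_FALLBACK`, `TAG_BITS`, `impl_tag_of`, `tag_of_fn`, `tag_of`) are hypothetical and only loosely modeled on the idea behind DEFINE_KEY_OPS_AS_MACROS; they are not the actual definitions in the Emacs sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch: when set, the hot operation is a macro, so even
   an unoptimized build expands it in place instead of calling a function. */
#ifndef USE_MACRO_FALLBACK
# define USE_MACRO_FALLBACK 1
#endif

#define TAG_BITS 3

/* Macro form of the operation: extract the low TAG_BITS bits.
   The argument appears once, so it is evaluated once. */
#define impl_tag_of(word) ((unsigned)((word) & ((1u << TAG_BITS) - 1)))

/* Function form: type-checked and easy to step through in a debugger,
   but only inlined if the compiler decides to (usually not at -O0). */
static inline unsigned tag_of_fn(uintptr_t word)
{
    return impl_tag_of(word);
}

/* Callers always write tag_of(); the switch picks the implementation. */
#if USE_MACRO_FALLBACK
# define tag_of(word) impl_tag_of(word)
#else
# define tag_of(word) tag_of_fn(word)
#endif
```

Both forms compute the same value. The difference is that, without optimization, the function form typically costs a real call per use, which adds up quickly for operations that run on nearly every object access.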
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Yuan Fu @ 2020-04-06 14:25 UTC
To: Stefan Monnier
Cc: acm, Eli Zaretskii, Andrea Corallo, emacs-devel, Dmitry Gutov

Seems the discussion has stalled, may I ask what’s the conclusion so
far? (w.r.t. whole buffer parse & how to pass text to tree-sitter.)

Yuan
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Jorge Javier Araya Navarro @ 2020-04-06 19:55 UTC
To: emacs-devel

On Monday, 6 April 2020 at 08:25, Yuan Fu wrote:

> Seems the discussion has stalled, may I ask what’s the conclusion so
> far? (w.r.t. whole buffer parse & how to pass text to tree-sitter.)
>
> Yuan

Whole buffer pass and using after-change-functions for incremental
parsing, AFAIK. Tweak whatever needs to be tweaked or change what needs
an adjustment, rinse and repeat.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Dmitry Gutov @ 2020-04-04 17:29 UTC
To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 19:38, Eli Zaretskii wrote:
> Is this with xdisp.c in a Git repository or outside of a Git
> repository?

Inside, always.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Eli Zaretskii @ 2020-04-04 17:38 UTC
To: Dmitry Gutov; +Cc: acm, casouri, emacs-devel, monnier, akrl

> Cc: acm@muc.de, casouri@gmail.com, akrl@sdf.org, monnier@iro.umontreal.ca,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 4 Apr 2020 20:29:46 +0300
>
> On 04.04.2020 19:38, Eli Zaretskii wrote:
> > Is this with xdisp.c in a Git repository or outside of a Git
> > repository?
>
> Inside, always.

In which case invoking Git (and all the machinery that runs a
sub-process) is another factor.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Dmitry Gutov @ 2020-04-04 17:57 UTC
To: Eli Zaretskii; +Cc: acm, casouri, emacs-devel, monnier, akrl

On 04.04.2020 20:38, Eli Zaretskii wrote:
> In which case invoking Git (and all the machinery that runs a
> sub-process) is another factor.

See my older message about using js-mode with xdisp.c in an -Og build.
It was 0.06s or so.
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Alan Third @ 2020-03-31 16:13 UTC
To: Eli Zaretskii; +Cc: casouri, emacs-devel, Stefan Monnier, akrl

On Tue, Mar 31, 2020 at 04:14:16PM +0300, Eli Zaretskii wrote:
>
> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such. We should instead request that tree-sitter
> exposes an API through which we could give it direct access to buffer
> text as 2 parts, before and after the gap, like we do with regex
> code. Otherwise this will be a bottleneck in the long run, not unlike
> the problem we have with LSP.

I'm not sure if this is exactly what you're talking about, but it has
an API for letting it access your own data structure:

https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code

--
Alan Third
* Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
From: Eli Zaretskii @ 2020-03-31 17:55 UTC
To: Alan Third; +Cc: casouri, emacs-devel, monnier, akrl

> Date: Tue, 31 Mar 2020 18:13:15 +0200 (CEST)
> From: Alan Third <alan@idiocy.org>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, casouri@gmail.com,
>  akrl@sdf.org, emacs-devel@gnu.org
>
> I'm not sure if this is exactly what you're talking about, but it has
> an API for letting it access your own data structure:
>
> https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code

Yes, I've read their docs. It isn't optimal for us, although it will
do for initial experiments. But for production I think we need
something more efficient.

One of the problems we need to solve is how to avoid the costly
encoding of buffer text, and still be able to support the occasional
raw bytes we sometimes have in our buffers.
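As an illustration of the "2 parts, before and after the gap" idea, here is a minimal sketch of a chunked read callback over a gap buffer. The callback shape (payload pointer, byte offset, position, out-parameter for the chunk length) is modeled on the read function described on the tree-sitter page linked above, but everything here is a self-contained assumption: `Point`, `GapText` and `gap_read` are hypothetical names, nothing is taken from the Emacs sources, and the sketch does not touch the encoding problem raised in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the parser library's row/column type (an assumption). */
typedef struct { uint32_t row, column; } Point;

/* Hypothetical view of buffer text stored as two contiguous parts. */
typedef struct {
    const char *before_gap;  uint32_t before_len;
    const char *after_gap;   uint32_t after_len;
} GapText;

/* Return a pointer to the text at byte_index and store the number of
   contiguous bytes available there in *bytes_read.  Zero bytes means
   end of text.  Nothing is copied: the caller reads buffer memory
   directly, one part at a time. */
const char *gap_read(void *payload, uint32_t byte_index,
                     Point position, uint32_t *bytes_read)
{
    const GapText *t = payload;
    (void)position;   /* unused: lookup is by byte offset only */

    if (byte_index < t->before_len) {        /* inside the first part */
        *bytes_read = t->before_len - byte_index;
        return t->before_gap + byte_index;
    }
    uint32_t off = byte_index - t->before_len;
    if (off < t->after_len) {                /* inside the second part */
        *bytes_read = t->after_len - off;
        return t->after_gap + off;
    }
    *bytes_read = 0;                         /* past the end */
    return "";
}
```

A parser that consumes text through a callback of this kind would never see the gap: it asks for the bytes at an offset, gets back at most the rest of the current part, and asks again for the next offset. Whether that is efficient enough, and how the bytes should be presented to a parser expecting a particular encoding, is the open question in the message above.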