unofficial mirror of emacs-devel@gnu.org 
From: Stephen Leake <stephen_leake@stephe-leake.org>
To: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Fri, 03 Apr 2020 09:45:44 -0800
Message-ID: <868sjcfoon.fsf@stephe-leake.org>
In-Reply-To: <83zhbtvwsm.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 03 Apr 2020 10:43:53 +0300")

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Thu, 02 Apr 2020 18:27:59 -0800
>> 
>> > Such copying is not really scalable, and IMO should be avoided.
>> > During active editing, redisplay runs very frequently, and having to
>> > copy portions of the buffer, let alone all of it, each time, which
>> > necessarily requires memory allocation, consing of Lisp objects, etc.,
>> > will produce significant memory pressure, expensive heap
>> > allocations/deallocations, and a lot of GC.  Recall that on many
>> > modern platforms Emacs doesn't really return memory to the system,
>> > which means we risk increasing the memory footprint, and create
>> > system-wide memory pressure.  It isn't a catastrophe, but we should
>> > try to avoid it if possible.
>> 
>> Ok. I know very little about the internal storage of text in Emacs.
>> There are at least two strings with a gap at the current edit point; if
>> we pass a simple pointer to tree-sitter, it will have to handle the gap.
>
> Tree-sitter allows the application to define a "reader" function that
> it will then call to access buffer text.  That function should cope
> with the gap.

and also with the encoding, which you did not address. I don't see how
that is different from the C level buffer-substring. Certainly there
should be a module function buffer-substring that is as efficient as possible.
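
A minimal sketch of such a reader, to make the gap handling concrete.
Everything here is an assumption for illustration: the gap_buffer
struct and its field names are hypothetical, not Emacs's real buffer
layout, and the callback only resembles the shape of tree-sitter's
read function (which also takes a row/column position). It ignores
encoding entirely and just hands back raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gap buffer: the text occupies [0, gap_start) and
   [gap_start + gap_size, total_size) within `data`. */
typedef struct {
  const char *data;
  size_t gap_start;   /* storage offset where the gap begins */
  size_t gap_size;    /* number of unused bytes in the gap */
  size_t total_size;  /* size of the storage, gap included */
} gap_buffer;

/* Number of text bytes, excluding the gap. */
static size_t gap_buffer_text_len(const gap_buffer *b)
{
  assert(b->gap_start + b->gap_size <= b->total_size);
  return b->total_size - b->gap_size;
}

/* Return a pointer to the text at logical offset BYTE_INDEX and store
   in *BYTES_READ how many contiguous bytes are available there.  A
   returned chunk never spans the gap, so nothing is copied; the caller
   simply asks again for the text after the gap. */
static const char *gap_buffer_read(void *payload, uint32_t byte_index,
                                   uint32_t *bytes_read)
{
  const gap_buffer *b = payload;
  size_t len = gap_buffer_text_len(b);

  if (byte_index >= len) {
    /* Past the end of the text: report zero bytes. */
    *bytes_read = 0;
    return "";
  }
  if (byte_index < b->gap_start) {
    /* Before the gap: the chunk runs up to the start of the gap. */
    *bytes_read = (uint32_t)(b->gap_start - byte_index);
    return b->data + byte_index;
  }
  /* After the gap: skip over it and run to the end of the text. */
  *bytes_read = (uint32_t)(len - byte_index);
  return b->data + b->gap_size + byte_index;
}
```

A caller would invoke gap_buffer_read repeatedly, advancing byte_index
by *bytes_read each time, until it reports zero bytes. The returned
pointers are only valid until the buffer is next modified or its text
is relocated.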

>> You mention "consing of Lisp objects" above, which says to me that the
>> text is stored in a more complex structure.
>
> I meant the consing that is necessary to make a buffer-substring that
> will be passed to the parser.

Since we are calling the parser from C (if it is linked into Emacs, or
in a module), I still don't understand. Does C code have to cons to
create a string? It will have to allocate if the requested range is not
contiguous in the buffer.

>> Avoiding _all_ copying is impossible; the parser must store the contents of
>> each token in some way. Typically that is done by storing
>> pointers/indices into the text buffer that contains the entire text.
>
> I don't think tree-sitter does that, because the text it gets is
> ephemeral.  If we pass it a buffer-substring, it's a temporary string
> which will be GCed after it's used; if we pass it pointers to buffer
> text, those pointers can be invalid after GC, because GC can relocate
> buffer text to a different memory region.

Hmm.
https://tree-sitter.github.io/tree-sitter/using-parsers#providing-the-code
says:

    Syntax nodes store their position in the source code both in terms
    of raw bytes and row/column coordinates

In the case of passing a pointer to a string (or buffer, etc), those
positions are relative to that original buffer. So the Emacs buffer is
serving as the parse buffer. Ok, that avoids any copying.

If we pass a buffer-substring to the parser, we are then responsible for
mapping positions relative to the substring into positions relative to
the full buffer. wisi delegates that to the parser; it can pass
start-char-pos and start-byte-pos to the parser along with a string.
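
The mapping itself is just an offset addition, sketched below. The
type and function names are hypothetical, and this is not wisi's
actual interface; it only assumes that positions reported by the
parser are zero-based offsets into the substring, and that the
substring's starting byte and character positions in the full buffer
are known.

```c
#include <assert.h>
#include <stddef.h>

/* Where the substring starts in the full buffer (hypothetical type). */
typedef struct {
  size_t start_byte;  /* byte position of the substring's first byte */
  size_t start_char;  /* character position of its first character */
} substring_origin;

/* A position expressed in both bytes and characters. */
typedef struct {
  size_t byte;
  size_t chr;
} text_pos;

/* Translate a position relative to the substring into a position
   relative to the full buffer.  Bytes and characters are tracked
   separately because they diverge once the text contains multibyte
   characters. */
static text_pos to_buffer_pos(substring_origin origin, text_pos rel)
{
  text_pos result = { origin.start_byte + rel.byte,
                      origin.start_char + rel.chr };
  return result;
}
```

Handing the origin to the parser, as wisi does, moves this addition to
the parser's side so that every position it reports is already
relative to the full buffer.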


>> >> In sum, the short answer is "yes, you must parse the whole file, unless
>> >> your language is particularly simple".
>> >
>> > Funny, my conclusion from reading your detailed description was
>> > entirely different.
>> 
>> I need more than that to respond in a helpful way.
>
> Well, you said:
>
>> To some extent, that depends on the language.
>
> and then went on to describing how each language might _not_ need a
> full parse in many cases.  Thus the conclusion sounded a bit radical
> to me.

Ok, we are putting different spins on what "particularly simple" means.

A more neutral phrasing would be:

    Some languages require parsing the whole file, some do not.
    
-- 
-- Stephe



