all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Tuấn-Anh Nguyễn" <ubolonton@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 2 Apr 2020 11:21:49 +0700	[thread overview]
Message-ID: <CAOGL6j6_2w=ktr4tdSv1gfGwB9X3aqjwW7qAC2Rao6r6LtJqjA@mail.gmail.com> (raw)
In-Reply-To: <831rp7ypam.fsf@gnu.org>

On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> > Date: Thu, 2 Apr 2020 00:55:45 +0700
> > Cc: emacs-devel@gnu.org
> >
> > > Did you consider using the API where an application can provide a
> > > function to return text at a given offset?  Such a function could be
> > > relatively easily implemented for Emacs.
> > >
> >
> > I don't understand what you mean. Below I'll explain how it works
> > currently.  [...]  If dynamic modules have direct access to the
> > buffer text, none of the above is an issue.
> >
> > Such direct access can be enabled by something like this:
> >
> >     char* (*access_buffer_text) (emacs_env *env,
> >                                  emacs_value buffer,
> >                                  ptrdiff_t byte_offset,
> >                                  ptrdiff_t *size_inout);
> >
> > Of course, such an API would require extensive documentation on how it
> > must be used, to ensure safety and correctness.
>
> I think you are moving too fast, and keep the current implementation
> in sight too much.
>

I'm actually moving too slow here. I have thought about this part quite
a bit, but I'm currently focusing on other things, partially because
this is not painful bottleneck.

> What I suggest is to step back and see how such direct access, if it
> were available, could be used with tree-sitter.  Let's forget about
> modules for a moment and consider tree-sitter linked with Emacs and
> capable of calling any C function in core.  How would you use that?
>
> Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> question to answer is what to do with byte sequences that are not
> valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> invalid byte sequences in general?
>

I haven't checked yet. It will probably bail out, which is usually the
desired behavior. The tree-sitter's author is likely open to making this
behavior configurable here, though. Alternatively, the direct access
function can offer different behaviors: as-is, bail-out, skip-over, or
null-out (tree-sitter will skip over null bytes, IIRC).

> Also, direct access to buffer text generally means we must make sure
> GC never runs as long as pointers to buffer text are lying around.
> Can any Lisp run between calls to the reader function that the
> tree-sitter parser calls to access the buffer text?  If so, we need to
> take care of that issue.
>

With direct access, no Lisp code will be run between these calls.

> Next, I'm still asking whether parsing the whole buffer when it is
> first created is necessary.  Can we pass to the parser just a small
> chunk (say, 500 bytes) of the buffer around the window-full to be
> displayed next?  If this presents problems, what are those problems?
>

In principle (not in tree-sitter ATM), and in very specific cases, yes.
IMO that's the wrong focus on a premature optimization anyway. As others
noted, even in the pathological case of xdisp.c, the performance is
acceptable. Also keep in mind that syntax highlighting is just one
application. Other use cases usually want a full parse tree.

If we really want to tackle this issue, there are other approaches to
consider, e.g. background parsing, or parsing up until a time limit, and
resume parsing when Emacs is idle. Tree-sitter's API supports the
latter.

But again, both thought exercises and my usage so far point to this
being a non-issue.

> IOW, the issue with exposing access to buffer text to modules is IMO
> secondary.  My suggestion is first to figure out how to do this stuff
> efficiently from within Emacs itself, as if the module interface were
> not part of the equation.  We can add that aspect back later.
>

My opinion is that it's better to experiment with this kind of stuff
out-of-core. It can move forward faster that way, allowing more lessons
to be learned. Real lessons, involving real-world use cases, not thought
exercises.

In a somewhat similar vein, writing emacs-tree-sitter highlighted real
issues with dynamic modules, which I'm going to write up sometime.

> And yes, doing this by consing strings is not a good idea, it will
> slow things down and cause a lot of GC.  It is best avoided.  Thus my
> questions above.
>
> > > Btw, what do you do with the tree returned by the tree-sitter parser?
> > > store it in some buffer-local variable?  If so, how much memory does
> > > such a tree take, and when, if ever, is that memory released?
> > >
> >
> > It's stored in a buffer-local variable. I haven't measured the memory
> > they take. Memory is released when the tree object is garbage-collected
> > (it's a `user-ptr').
>
> So if I have many hundreds of buffers, I could have such a tree in
> each one of them indefinitely?  Perhaps that's one more design issue
> to consider, given that the parsing is so fast.  Similar to what we do
> with image and face caches -- we flush them from time to time, to keep
> the memory footprint in check.  So a buffer that was not current more
> than some time interval ago could have its tree GCed.
>

That can work. Alternatively, tree-sitter can add support for "folding"
subtrees, as Stefan suggested.

--
Tuấn-Anh Nguyễn
Software Engineer



  parent reply	other threads:[~2020-04-02  4:21 UTC|newest]

Thread overview: 142+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-31 17:07 Reliable after-change-functions (via: Using incremental parsing in Emacs) Tuấn Anh Nguyễn
2020-03-31 17:50 ` Eli Zaretskii
2020-04-01  6:17   ` Tuấn Anh Nguyễn
2020-04-01 13:26     ` Eli Zaretskii
2020-04-01 15:47       ` Jorge Javier Araya Navarro
2020-04-01 16:07         ` Eli Zaretskii
2020-04-01 17:55       ` Tuấn-Anh Nguyễn
2020-04-01 19:33         ` Eli Zaretskii
2020-04-01 23:38           ` Stephen Leake
2020-04-02  0:25             ` Stephen Leake
2020-04-02  2:46             ` Stefan Monnier
2020-04-02  4:36               ` Tuấn-Anh Nguyễn
2020-04-02 14:44               ` Eli Zaretskii
2020-04-02 15:19                 ` Stefan Monnier
2020-04-03  2:49                 ` [SPAM UNSURE] " Stephen Leake
2020-04-03  7:47                   ` Eli Zaretskii
2020-04-03 18:11                     ` Stephen Leake
2020-04-03 18:46                       ` Eli Zaretskii
2020-04-04  0:05                         ` Stephen Leake
2020-04-03  8:11                   ` Robert Pluim
2020-04-03 11:00                     ` Eli Zaretskii
2020-04-03 11:09                       ` Robert Pluim
2020-04-03 12:44                         ` Eli Zaretskii
2020-04-03 11:21                       ` John Yates
2020-04-03 12:50                         ` Eli Zaretskii
2020-04-02  5:21             ` Tuấn-Anh Nguyễn
2020-04-02  9:24               ` [SPAM UNSURE] " Stephen Leake
2020-04-02 14:36             ` Eli Zaretskii
2020-04-03  2:27               ` Stephen Leake
2020-04-03  7:43                 ` Eli Zaretskii
2020-04-03 17:45                   ` Stephen Leake
2020-04-03 18:31                     ` Eli Zaretskii
2020-04-04  0:04                       ` Stephen Leake
2020-04-04  7:13                         ` Eli Zaretskii
2020-04-02  4:21           ` Tuấn-Anh Nguyễn [this message]
2020-04-02  5:19             ` Jorge Javier Araya Navarro
2020-04-02  9:29               ` Stephen Leake
2020-04-02 10:37             ` Andrea Corallo
2020-04-02 11:14               ` Tuấn-Anh Nguyễn
2020-04-02 13:02             ` Stefan Monnier
2020-04-02 15:06               ` Eli Zaretskii
2020-04-02 15:02             ` Eli Zaretskii
2020-04-03 14:34               ` Tuấn-Anh Nguyễn
  -- strict thread matches above, loose matches on Subject: below --
2020-03-29 18:46 Using incremental parsing in Emacs (via: emacs rendering comparisson between emacs23 and emacs26.3) Stefan Monnier
2020-03-29 19:05 ` Andrea Corallo
2020-03-29 19:18   ` Eli Zaretskii
2020-03-29 19:29     ` Reliable after-change-functions (via: Using incremental parsing in Emacs) Yuan Fu
2020-03-30 14:04       ` Eli Zaretskii
2020-03-30 15:06       ` Stefan Monnier
2020-03-30 17:14         ` Yuan Fu
2020-03-30 17:54           ` Stefan Monnier
2020-03-30 18:43             ` Štěpán Němec
2020-03-30 18:46               ` Stefan Monnier
2020-03-30 19:02                 ` Yuan Fu
2020-03-30 19:10                   ` Eli Zaretskii
2020-03-30 19:21                     ` Yuan Fu
2020-03-31  3:56                       ` Štěpán Němec
2020-03-31 13:16                         ` Eli Zaretskii
2020-03-31 13:36                           ` Štěpán Němec
2020-03-31 14:34                             ` Eli Zaretskii
2020-03-31 15:37                               ` Štěpán Němec
2020-03-31 15:58                                 ` Eli Zaretskii
2020-03-31 16:18                                   ` Štěpán Němec
2020-03-31 17:38                                     ` Eli Zaretskii
2020-04-01  0:57                     ` Stephen Leake
2020-03-30 19:42                   ` Stefan Monnier
2020-03-30 19:27                 ` Štěpán Němec
2020-03-31  2:24           ` Eli Zaretskii
2020-03-31  3:10             ` Stefan Monnier
2020-03-31 13:14               ` Eli Zaretskii
2020-03-31 14:31                 ` Dmitry Gutov
2020-03-31 15:36                   ` Eli Zaretskii
2020-03-31 15:45                     ` Dmitry Gutov
2020-03-31 17:16                     ` Stefan Monnier
2020-03-31 17:48                       ` Eli Zaretskii
2020-03-31 19:35                         ` Stefan Monnier
2020-04-01  2:23                           ` Eli Zaretskii
2020-03-31 15:11                 ` Stefan Monnier
2020-03-31 15:44                   ` Eli Zaretskii
2020-03-31 17:10                     ` Stefan Monnier
2020-03-31 17:19                       ` Jorge Javier Araya Navarro
2020-03-31 17:46                       ` Eli Zaretskii
2020-03-31 18:42                         ` 조성빈
2020-03-31 19:29                           ` Eli Zaretskii
2020-03-31 18:47                         ` Dmitry Gutov
2020-03-31 18:48                           ` Noam Postavsky
2020-03-31 19:02                             ` Dmitry Gutov
2020-03-31 19:26                           ` Eli Zaretskii
2020-03-31 19:50                             ` Dmitry Gutov
2020-04-01  2:28                               ` Eli Zaretskii
2020-04-01  3:49                                 ` Dmitry Gutov
2020-04-01  4:14                                   ` Eli Zaretskii
2020-04-01 13:47                                     ` Dmitry Gutov
2020-04-01 14:04                                       ` Eli Zaretskii
2020-04-01 14:55                                         ` Eli Zaretskii
2020-04-01 15:16                                         ` Dmitry Gutov
2020-04-01 15:59                                           ` Eli Zaretskii
2020-04-01 21:48                                             ` Dmitry Gutov
2020-04-01 22:29                                               ` Stefan Monnier
2020-04-02 14:23                                               ` Eli Zaretskii
2020-04-02 16:17                                                 ` Dmitry Gutov
2020-04-02 18:25                                                   ` Eli Zaretskii
2020-04-03 14:40                                                   ` Tuấn-Anh Nguyễn
2020-04-03 16:10                                                     ` Dmitry Gutov
2020-04-01 13:52                                     ` Alan Mackenzie
2020-04-01 14:10                                       ` Eli Zaretskii
2020-04-01 15:27                                         ` Dmitry Gutov
2020-04-01 15:44                                           ` Jorge Javier Araya Navarro
2020-04-01 16:03                                           ` Eli Zaretskii
2020-04-01 21:21                                             ` Dmitry Gutov
2020-04-02 14:09                                               ` Eli Zaretskii
2020-04-02 18:03                                                 ` 조성빈 via "Emacs development discussions.
2020-04-02 18:27                                                   ` Yuan Fu
2020-04-02 19:39                                                     ` Stefan Monnier
2020-04-01 15:22                                       ` Dmitry Gutov
2020-04-04 11:06                                         ` Alan Mackenzie
2020-04-04 11:26                                           ` Eli Zaretskii
2020-04-04 14:14                                             ` Andrea Corallo
2020-04-04 14:41                                               ` Eli Zaretskii
2020-04-04 15:04                                                 ` Andrea Corallo
2020-04-04 15:38                                                   ` Richard Copley
2020-04-04 11:27                                           ` Eli Zaretskii
2020-04-04 12:01                                           ` Dmitry Gutov
2020-04-04 12:36                                             ` Alan Mackenzie
2020-04-04 12:40                                               ` Dmitry Gutov
2020-04-04 13:02                                               ` Eli Zaretskii
2020-04-04 16:09                                                 ` Dmitry Gutov
2020-04-04 16:38                                                   ` Eli Zaretskii
2020-04-04 16:45                                                     ` Eli Zaretskii
2020-04-04 17:22                                                       ` Richard Copley
2020-04-04 17:50                                                         ` Eli Zaretskii
2020-04-04 18:29                                                         ` Andrea Corallo
2020-04-04 18:56                                                           ` Richard Copley
2020-04-04 20:36                                                             ` Andrea Corallo
2020-04-04 17:36                                                       ` Dmitry Gutov
2020-04-04 17:47                                                         ` Eli Zaretskii
2020-04-04 18:02                                                           ` Dmitry Gutov
2020-04-04 23:01                                                             ` Stefan Monnier
2020-04-06 14:25                                                               ` Yuan Fu
2020-04-06 19:55                                                                 ` Jorge Javier Araya Navarro
2020-04-04 17:29                                                     ` Dmitry Gutov
2020-04-04 17:38                                                       ` Eli Zaretskii
2020-04-04 17:57                                                         ` Dmitry Gutov
2020-03-31 16:13                 ` Alan Third
2020-03-31 17:55                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOGL6j6_2w=ktr4tdSv1gfGwB9X3aqjwW7qAC2Rao6r6LtJqjA@mail.gmail.com' \
    --to=ubolonton@gmail.com \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.