From: Stephen Leake <stephen_leake@stephe-leake.org>
To: emacs-devel <emacs-devel@gnu.org>
Subject: Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Fri, 03 Apr 2020 10:11:05 -0800
Message-ID: <86zhbse8xy.fsf@stephe-leake.org>
In-Reply-To: <83y2rdvwm1.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 03 Apr 2020 10:47:50 +0300")
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Date: Thu, 02 Apr 2020 18:49:07 -0800
>>
>> > I think we should try to avoid both copying and encoding the text we
>> > send to the parser. Both operations are expensive and require memory
>> > allocation.
>>
>> I don't understand what the alternative is. The parser imposes the
>> reasonable requirement that the input text be utf-8 (or possibly some
>> other standard format). Emacs raw buffer text is not utf-8, so we must
>> do some encoding.
>
> Emacs represents buffer text as a superset of UTF-8, with the
> violations of strict UTF-8 being very rare in buffers that hold
> program sources. The function we can provide that lets tree-sitter
> access buffer text can cope with those violations,
Ok. "cope with those violations" = "do some encoding".
We can avoid copying _if_ the encoding does not change character
positions, or somehow preserves positions, for example with an auxiliary
table of changes due to encoding.
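
A minimal sketch of the auxiliary-table idea, in Python for brevity (a real
implementation would presumably be C inside Emacs). It assumes each byte that
is not strict UTF-8 is replaced by U+FFFD, which changes the length, and
records where the offsets shift. The names encode_with_table and
to_source_offset are hypothetical, not part of Emacs, wisi, or tree-sitter.

```python
from bisect import bisect_right

REPLACEMENT = "\ufffd".encode("utf-8")  # 3 bytes in strict UTF-8


def encode_with_table(raw: bytes):
    """Return (strict_utf8, table).

    table is a sorted list of (output_offset, source_offset) pairs, one
    entry at the start and one just after every replaced byte.
    """
    out = bytearray()
    table = [(0, 0)]
    src = 0
    # surrogateescape maps each undecodable byte to U+DC80..U+DCFF,
    # so invalid bytes can be recognized one at a time.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        if 0xDC80 <= ord(ch) <= 0xDCFF:
            out += REPLACEMENT          # 1 source byte becomes 3 output bytes
            src += 1
            table.append((len(out), src))
        else:
            encoded = ch.encode("utf-8")
            out += encoded              # valid text is passed through unchanged
            src += len(encoded)
    return bytes(out), table


def to_source_offset(out_offset: int, table) -> int:
    """Map a byte offset in the encoded text back to the source text.

    Only meaningful for offsets on character boundaries, which is where
    a parser reports token positions.
    """
    i = bisect_right(table, (out_offset, float("inf"))) - 1
    out_base, src_base = table[i]
    return src_base + (out_offset - out_base)
```

The table only grows with the number of violations, so for typical program
sources it stays nearly empty and the lookup is cheap.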
Coping with violations in the lexer would make it much easier to avoid
changing character positions; it is easy to simply ignore bytes there.
wisi makes it easy to implement this in the lexer (because it uses
re2c), although currently there is no way to make that language-specific
(that would be an enhancement).
https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
describes the facility for enhancing the tree-sitter lexer (aka
scanner). That is not convenient for handling this issue, so we'd have
to request (and/or provide) an enhancement.
We cannot avoid encoding (either in the read function provided to
tree-sitter, or in the tree-sitter lexer), but the encoding may be very
simple and efficient.
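
As an illustration of how simple such an encoding could be, here is a sketch
(Python for brevity, hypothetical function name) that replaces every byte that
is not strict UTF-8 with a single ASCII filler byte. Because the substitution
is one byte for one byte, offsets are unchanged and no auxiliary table is
needed. Whether '?' is an acceptable filler for a given grammar is an
assumption; this is not what Emacs or tree-sitter currently do.

```python
def mask_invalid_bytes(raw: bytes, filler: bytes = b"?") -> bytes:
    """Return strict UTF-8 of the same length as raw.

    Each byte that cannot be decoded as strict UTF-8 is replaced by
    `filler`, which must be exactly one byte so positions are preserved.
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    out = bytearray()
    # surrogateescape turns each undecodable byte into U+DC80..U+DCFF.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        if 0xDC80 <= ord(ch) <= 0xDCFF:
            out += filler               # one invalid byte -> one filler byte
        else:
            out += ch.encode("utf-8")   # valid characters pass through
    return bytes(out)
```

A read function handed to the parser could apply the same substitution chunk
by chunk, provided chunk boundaries do not split a multibyte character.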
--
-- Stephe