From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org, casouri@gmail.com
Subject: Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
Date: Wed, 23 Nov 2022 17:25:50 +0200
Message-ID: <8335a9zo8x.fsf@gnu.org>
In-Reply-To: <jwvilj5919x.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Wed, 23 Nov 2022 09:57:38 -0500)
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org, casouri@gmail.com
> Date: Wed, 23 Nov 2022 09:57:38 -0500
>
> >> But my question was not so much pointing out a problem but trying to
> >> understand why we chose the more complex code.
> > Because we need to compare with byte positions,
>
> Ah, because we wrote "(in bytes)" in the docstring of
> `treesit-max-buffer-size`. That's a rather unusual choice. All other
> places where we use(d) a limit on the buffer size, it's always been based
> on the number of chars.
No, not because we wrote "in bytes", but because treesit.c consistently uses
byte-counts to make similar tests (with a single exception that I fixed
yesterday), and keeps track of byte positions in its data structures. I
assumed Yuan Fu did that for a reason, and I see at least a hint in the
signature of this function, through which tree-sitter reads buffer text:
  static const char*
  treesit_read_buffer (void *parser, uint32_t byte_index,
                       TSPoint position, uint32_t *bytes_read)
which uses byte_index and bytes_read, each of which is an unsigned 32-bit
value.  And since our hard limit is 4G _bytes_, it didn't seem consistent
to me to test smaller limits against character counts rather than byte counts.
> I doubt it would make a significant difference here either (e.g. not
> only the "10 times" memory use of the tree-sitter tree is obviously
> a rough approximation, but I doubt it's related to the number of bytes
> more than to the number of chars or even the number of lexemes).
If someone looks in the tree-sitter source code and tells us that we can
compare with character counts instead, I'll be the first to agree.