* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Stefan Monnier @ 2022-11-23 1:51 UTC
  To: emacs-devel; +Cc: Yuan Fu

> -      (when (> (position-bytes (1- (point-max))) treesit-max-buffer-size)
> +      (when (> (position-bytes (max (point-min) (1- (point-max))))
> +               treesit-max-buffer-size)

I'd expect `treesit-max-buffer-size` to be compared to `buffer-size`
rather than to buffer positions.
Is it really that important to count bytes rather than characters?


        Stefan
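The empty-buffer problem that the patch guards against can be shown with a toy model. In an empty buffer `(point-max)` is 1, so `(1- (point-max))` is 0. That position is out of range, so `position-bytes` returns nil, and comparing nil with `>` signals a wrong-type-argument error in Lisp. The C sketch below is a hypothetical model, not Emacs's real buffer code. `toy_buffer`, `toy_position_bytes`, `original_expr` and `patched_expr` are illustrative names, -1 stands in for nil, and narrowing is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an Emacs buffer (hypothetical; not Emacs's real data
   structures).  No narrowing is modeled: character positions run from
   point_min == 1 to point_max == number_of_chars + 1, so an empty
   buffer has point_min == point_max == 1.  byte_pos[p] holds the byte
   position of character position p; index 0 is unused.  */
typedef struct
{
  long point_min;
  long point_max;
  const long *byte_pos;
} toy_buffer;

/* Stand-in for `position-bytes': return -1, playing the role of nil,
   when POS is outside [point_min, point_max].  */
static long
toy_position_bytes (const toy_buffer *b, long pos)
{
  assert (b != NULL && b->point_min <= b->point_max);
  if (pos < b->point_min || pos > b->point_max)
    return -1;
  return b->byte_pos[pos];
}

/* Original: (position-bytes (1- (point-max))).
   For an empty buffer this asks for position 0, which is out of range.  */
static long
original_expr (const toy_buffer *b)
{
  return toy_position_bytes (b, b->point_max - 1);
}

/* Patched: (position-bytes (max (point-min) (1- (point-max)))).
   Clamping keeps the position in range even when the buffer is empty.  */
static long
patched_expr (const toy_buffer *b)
{
  long pos = b->point_max - 1;
  if (pos < b->point_min)
    pos = b->point_min;
  return toy_position_bytes (b, pos);
}
```

In this model, an empty buffer `{ 1, 1, map }` makes `original_expr` return the nil stand-in (-1). `patched_expr` returns byte position 1. For a non-empty buffer the two expressions agree.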
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Yuan Fu @ 2022-11-23 2:18 UTC
  To: Stefan Monnier; +Cc: emacs-devel, Eli Zaretskii

> On Nov 22, 2022, at 5:51 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> -      (when (> (position-bytes (1- (point-max))) treesit-max-buffer-size)
>> +      (when (> (position-bytes (max (point-min) (1- (point-max))))
>> +               treesit-max-buffer-size)
>
> I'd expect `treesit-max-buffer-size` to be compared to `buffer-size`
> rather than to buffer positions.
> Is it really that important to count bytes rather than characters?

I’m not sure, I’ll leave Eli to explain his intention :-)

Yuan
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Eli Zaretskii @ 2022-11-23 12:27 UTC
  To: Stefan Monnier; +Cc: emacs-devel, casouri

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Yuan Fu <casouri@gmail.com>
> Date: Tue, 22 Nov 2022 20:51:40 -0500
>
> > -      (when (> (position-bytes (1- (point-max))) treesit-max-buffer-size)
> > +      (when (> (position-bytes (max (point-min) (1- (point-max))))
> > +               treesit-max-buffer-size)
>
> I'd expect `treesit-max-buffer-size` to be compared to `buffer-size`
> rather than to buffer positions.

Please tell more: what problems do you see with the above, and why?  It is
not easy to guess what's on your mind.
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Stefan Monnier @ 2022-11-23 12:40 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

>> > -      (when (> (position-bytes (1- (point-max))) treesit-max-buffer-size)
>> > +      (when (> (position-bytes (max (point-min) (1- (point-max))))
>> > +               treesit-max-buffer-size)
>>
>> I'd expect `treesit-max-buffer-size` to be compared to `buffer-size`
>> rather than to buffer positions.
>
> Please tell more: what problems do you see with the above, and why?  It is
> not easy to guess what's on your mind.

I see 4 very minor problems:
- the code is more complex than the obvious
      (> (buffer-size) treesit-max-buffer-size)
- as a result of that complexity, we see that its original version had
  a bug :-)
- it uses `position-bytes` which is an unusual function (because it
  exposes details of the internal representation).

But my question was not so much pointing out a problem but trying to
understand why we chose the more complex code.


        Stefan
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Eli Zaretskii @ 2022-11-23 13:38 UTC
  To: Stefan Monnier; +Cc: emacs-devel, casouri

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  casouri@gmail.com
> Date: Wed, 23 Nov 2022 07:40:31 -0500
>
> But my question was not so much pointing out a problem but trying to
> understand why we chose the more complex code.

Because we need to compare with byte positions, and I'm not aware that
we have anything like a buffer-size-bytes function.  (Do we?)  And I
didn't feel like adding one, for this simple use case, in a function
that is unlikely to be called frequently and/or in some inner loop.
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Stefan Monnier @ 2022-11-23 14:57 UTC
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

>> But my question was not so much pointing out a problem but trying to
>> understand why we chose the more complex code.
> Because we need to compare with byte positions,

Ah, because we wrote "(in bytes)" in the docstring of
`treesit-max-buffer-size`.  That's a rather unusual choice.  All other
places where we use(d) a limit on the buffer size it's always been based
on the number of chars.

I doubt it would make a significant difference here either (e.g. not
only the "10 times" memory use of the tree-sitter tree is obviously
a rough approximation, but I doubt it's related to the number of bytes
more than to the number of chars or even the number of lexemes).


        Stefan
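The byte-versus-character question only matters for non-ASCII text, where one character occupies several bytes. The C sketch below applies the same numeric limit once as a byte count and once as a character count. It assumes plain, well-formed UTF-8. Emacs's internal multibyte representation is UTF-8-based but not identical to it. The function names are hypothetical and do not come from treesit.c.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters in well-formed UTF-8 text by counting every byte
   that is not a continuation byte (bit pattern 10xxxxxx).  */
static size_t
utf8_char_count (const char *s, size_t nbytes)
{
  size_t chars = 0;
  assert (s != NULL || nbytes == 0);
  for (size_t i = 0; i < nbytes; i++)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}

/* Hypothetical guards: the same numeric LIMIT read as a byte count or
   as a character count.  Each returns nonzero if the text is too big.  */
static int
too_large_by_bytes (size_t nbytes, size_t limit)
{
  return nbytes > limit;
}

static int
too_large_by_chars (const char *s, size_t nbytes, size_t limit)
{
  return utf8_char_count (s, nbytes) > limit;
}
```

Take the string `"h\xc3\xa9llo"` ("héllo"). It is 6 bytes but 5 characters. With a limit of 5, the byte-based guard rejects it and the character-based guard accepts it. For ASCII-only text the two guards always agree, so the choice of unit only shifts the effective limit for multibyte text.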
* Re: master c69858b3f0: ; * lisp/treesit.el (treesit-ready-p): Guard against empty buffers.
From: Eli Zaretskii @ 2022-11-23 15:25 UTC
  To: Stefan Monnier; +Cc: emacs-devel, casouri

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org,  casouri@gmail.com
> Date: Wed, 23 Nov 2022 09:57:38 -0500
>
> >> But my question was not so much pointing out a problem but trying to
> >> understand why we chose the more complex code.
> > Because we need to compare with byte positions,
>
> Ah, because we wrote "(in bytes)" in the docstring of
> `treesit-max-buffer-size`.  That's a rather unusual choice.  All other
> places where we use(d) a limit on the buffer size it's always been based
> on the number of chars.

No, not because we wrote "in bytes", but because treesit.c consistently
uses byte-counts to make similar tests (with a single exception that I
fixed yesterday), and keeps track of byte positions in its data
structures.  I assumed Yuan Fu did that for a reason, and I see at
least a hint in the signature of this function, through which
tree-sitter reads buffer text:

  static const char*
  treesit_read_buffer (void *parser, uint32_t byte_index,
                       TSPoint position, uint32_t *bytes_read)

which uses byte_index and bytes_read, each of which is an unsigned
32-bit value.  And since our hard limit is 4G _bytes_, it didn't seem to
me consistent to test smaller limits against character counts, not byte
counts.

> I doubt it would make a significant difference here either (e.g. not
> only the "10 times" memory use of the tree-sitter tree is obviously
> a rough approximation, but I doubt it's related to the number of bytes
> more than to the number of chars or even the number of lexemes).

If someone looks in the tree-sitter source code and tells us that we
can compare with character counts instead, I'll be the first to agree.
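The consistency argument above can be sketched in C. If the reader callback receives offsets as `uint32_t`, the hard ceiling is `UINT32_MAX` bytes. A user limit stated in bytes can then be checked in the same unit as that ceiling. The sketch is hypothetical. `byte_offset_to_u32` and `size_within_limits` are illustrative names, not functions in treesit.c. It assumes the buffer size is already known as a 64-bit byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a 64-bit byte offset to the uint32_t width used by the
   byte_index parameter quoted above; report failure instead of
   silently truncating.  */
static int
byte_offset_to_u32 (uint64_t byte_offset, uint32_t *out)
{
  assert (out != NULL);
  if (byte_offset > UINT32_MAX)
    return 0;
  *out = (uint32_t) byte_offset;
  return 1;
}

/* Hypothetical size check: the configurable limit and the hard 32-bit
   limit are both measured in bytes, so both tests use one unit.  */
static int
size_within_limits (uint64_t buffer_bytes, uint64_t limit_bytes)
{
  uint32_t ignored;
  return buffer_bytes <= limit_bytes
         && byte_offset_to_u32 (buffer_bytes, &ignored);
}
```

A buffer larger than `UINT32_MAX` bytes fails the check even if the configurable limit is set higher. A character-based limit could not express that ceiling directly without knowing how many bytes each character takes.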
Code repositories for project(s) associated with this public inbox: https://git.savannah.gnu.org/cgit/emacs.git