* Some issues with the tree-sitter branch
@ 2022-10-16 13:32 Eli Zaretskii
2022-10-16 14:01 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-16 13:32 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
I noticed several minor issues with the branch while reading the code:
. Several places assign EMACS_INT values to uint32_t variables with
an explicit range check (and error signal in case of overflow).
. Several functions produce Lisp_Object results by reference, and
callers pass to them pointers to Lisp_Object variables. Our style
prefers returning a Lisp_Object value through the return value,
like this:
Lisp_Object some_var = some_func (...);
When a function produces a single value, I think the above is
preferable.
. There's a call to malloc in Ftreesit_parser_set_included_ranges
which doesn't check the return value of malloc, and doesn't signal
memory-full error when malloc fails (that function should perhaps
use SAFE_ALLOCA).
In addition, the style of treesit.c (indentation etc.) is not exactly
ours (but this can be fixed later).
Thank you for your work on this important feature.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-16 13:32 Some issues with the tree-sitter branch Eli Zaretskii
@ 2022-10-16 14:01 ` Eli Zaretskii
2022-10-16 15:10 ` Eli Zaretskii
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
2 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-16 14:01 UTC (permalink / raw)
To: casouri; +Cc: emacs-devel
> Date: Sun, 16 Oct 2022 16:32:10 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> CC: emacs-devel@gnu.org
>
> I noticed several minor issues with the branch while reading the code:
Oh, and one more: the test/src/treesit-tests.el test suite assumes
without checking that certain language definitions are installed. It
would be nice to skip the tests which need languages that aren't
installed.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-16 13:32 Some issues with the tree-sitter branch Eli Zaretskii
2022-10-16 14:01 ` Eli Zaretskii
@ 2022-10-16 15:10 ` Eli Zaretskii
2022-10-16 19:36 ` tree-sitter: Paths used for loading of language definitions Jostein Kjønigsen
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
2 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-16 15:10 UTC (permalink / raw)
To: casouri; +Cc: emacs-devel
> Date: Sun, 16 Oct 2022 16:32:10 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> CC: emacs-devel@gnu.org
>
> In addition, the style of treesit.c (indentation etc.) is not exactly
> ours (but this can be fixed later).
I fixed some of those now.
^ permalink raw reply [flat|nested] 23+ messages in thread
* tree-sitter: Paths used for loading of language definitions
2022-10-16 15:10 ` Eli Zaretskii
@ 2022-10-16 19:36 ` Jostein Kjønigsen
2022-10-16 20:12 ` Daniel Martín
2022-10-17 4:27 ` Yuan Fu
0 siblings, 2 replies; 23+ messages in thread
From: Jostein Kjønigsen @ 2022-10-16 19:36 UTC (permalink / raw)
To: casouri; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]
Hey there, Yuan Fu.
First of all: Thank you for the tremendous effort you've put in to get a
tree-sitter implementation mainlined into Emacs. For some of the
major-modes I've been working on, tree-sitter has been a godsend and if
it can enable some of them to become mainlined into Emacs... I mean
nothing could be better!
That said, I've noticed something when building the feature/tree-sitter
branch, and also while testing other things tree-sitter related... If a
language-definition SO is missing... treesit.el in Emacs only reports
looking for files and folders typically found within $HOME/.emacs.d/
(that is user-owned files).
Based on that, I'm assuming those are the only locations probed. Is that
assumption correct?
If so, I think that is going to become a problem quite quickly if/when
Linux distro starts providing Emacs-builds with tree-sitter enabled.
Before that can be done, there needs to be support put in place to load
these from a system/distro/package-manager owned location, or else it's
going to be near-impossible for distros to ship a working Emacs with
tree-sitter.
Should we consider adding another system-wide location for such files?
/usr/lib/emacs? /usr/share/emacs?
What do you guys think?
--
Kind regards
*Jostein Kjønigsen*
jostein@kjonigsen.net 🍵 jostein@gmail.com
https://jostein.kjønigsen.no <https://jostein.kjønigsen.no>
[-- Attachment #2: Type: text/html, Size: 1977 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-16 19:36 ` tree-sitter: Paths used for loading of language definitions Jostein Kjønigsen
@ 2022-10-16 20:12 ` Daniel Martín
2022-10-17 4:27 ` Yuan Fu
1 sibling, 0 replies; 23+ messages in thread
From: Daniel Martín @ 2022-10-16 20:12 UTC (permalink / raw)
To: Jostein Kjønigsen; +Cc: casouri, jostein, emacs-devel
Jostein Kjønigsen <jostein@secure.kjonigsen.net> writes:
> Hey there, Yuan Fu.
>
> First of all: Thank you for the tremendous effort you've put in to get
> a tree-sitter implementation mainlined into Emacs. For some of the
> major-modes I've been working on, tree-sitter has been a godsend and
> if it can enable some of them to become mainlined into Emacs... I mean
> nothing could be better!
>
> That said, I've noticed something when building the
> feature/tree-sitter branch, and also while testing other things
> tree-sitter related... If a language-definition SO is
> missing... treesit.el in Emacs only reports looking for files and
> folders typically found within $HOME/.emacs.d/ (that is user-owned
> files).
>
> Based on that, I'm assuming those are the only locations probed. Is
> that assumption correct?
I think the variable treesit-extra-load-path is supposed to contain
those extra locations.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-16 19:36 ` tree-sitter: Paths used for loading of language definitions Jostein Kjønigsen
2022-10-16 20:12 ` Daniel Martín
@ 2022-10-17 4:27 ` Yuan Fu
2022-10-17 7:14 ` Jostein Kjønigsen
1 sibling, 1 reply; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 4:27 UTC (permalink / raw)
To: jostein; +Cc: emacs-devel
> On Oct 16, 2022, at 12:36 PM, Jostein Kjønigsen <jostein@secure.kjonigsen.net> wrote:
>
> Hey there, Yuan Fu.
>
> First of all: Thank you for the tremendous effort you've put in to get a tree-sitter implementation mainlined into Emacs. For some of the major-modes I've been working on, tree-sitter has been a godsend and if it can enable some of them to become mainlined into Emacs... I mean nothing could be better!
Thank you!
>
> That said, I've noticed something when building the feature/tree-sitter branch, and also while testing other things tree-sitter related... If a language-definition SO is missing... treesit.el in Emacs only reports looking for files and folders typically found within $HOME/.emacs.d/ (that is user-owned files).
>
> Based on that, I'm assuming those are the only locations probed. Is that assumption correct?
It shouldn’t be. Emacs should first look in treesit-extra-load-path, then ~/.emacs.d/tree-sitter, then standard system library locations. The exact locations depend on dlopen. Could you share the error message reported when loading a non-exist language? It should print all the locations it tries (as you observed).
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 4:27 ` Yuan Fu
@ 2022-10-17 7:14 ` Jostein Kjønigsen
2022-10-17 7:29 ` Yuan Fu
2022-10-17 8:02 ` Eli Zaretskii
0 siblings, 2 replies; 23+ messages in thread
From: Jostein Kjønigsen @ 2022-10-17 7:14 UTC (permalink / raw)
To: Yuan Fu, jostein; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]
On 17.10.2022 06:27, Yuan Fu wrote:
>
>> That said, I've noticed something when building the feature/tree-sitter branch, and also while testing other things tree-sitter related... If a language-definition SO is missing... treesit.el in Emacs only reports looking for files and folders typically found within $HOME/.emacs.d/ (that is user-owned files).
>>
>> Based on that, I'm assuming those are the only locations probed. Is that assumption correct?
> It shouldn’t be. Emacs should first look in treesit-extra-load-path, then ~/.emacs.d/tree-sitter, then standard system library locations. The exact locations depend on dlopen. Could you share the error message reported when loading a non-exist language? It should print all the locations it tries (as you observed).
>
> Yuan
textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition:
"javascript",
("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot
open shared object file: No such file or directory"
"/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot
open shared object file: No such file or directory"
"libtree-sitter-javascript: cannot open shared object file: No such file
or directory" "libtree-sitter-javascript.so: cannot open shared object
file: No such file or directory")
Does that impliy it checks in the standard system library localtion? If
so, I guess there's no problem.
In that case, sorry for the "noise", but I just wanted to make sure this
was packagable and thought one check too much is better than one check
too little :)
--
Vennlig hilsen
*Jostein Kjønigsen*
jostein@kjonigsen.net 🍵 jostein@gmail.com
https://jostein.kjønigsen.no <https://jostein.kjønigsen.no>
[-- Attachment #2: Type: text/html, Size: 2547 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 7:14 ` Jostein Kjønigsen
@ 2022-10-17 7:29 ` Yuan Fu
2022-10-17 8:02 ` Eli Zaretskii
1 sibling, 0 replies; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 7:29 UTC (permalink / raw)
To: jostein; +Cc: emacs-devel
> On Oct 17, 2022, at 12:14 AM, Jostein Kjønigsen <jostein@secure.kjonigsen.net> wrote:
>
>
>
> On 17.10.2022 06:27, Yuan Fu wrote:
>>
>>> That said, I've noticed something when building the feature/tree-sitter branch, and also while testing other things tree-sitter related... If a language-definition SO is missing... treesit.el in Emacs only reports looking for files and folders typically found within $HOME/.emacs.d/ (that is user-owned files).
>>>
>>> Based on that, I'm assuming those are the only locations probed. Is that assumption correct?
>>>
>> It shouldn’t be. Emacs should first look in treesit-extra-load-path, then ~/.emacs.d/tree-sitter, then standard system library locations. The exact locations depend on dlopen. Could you share the error message reported when loading a non-exist language? It should print all the locations it tries (as you observed).
>>
>> Yuan
>>
> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript", ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot open shared object file: No such file or directory" "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot open shared object file: No such file or directory" "libtree-sitter-javascript: cannot open shared object file: No such file or directory" "libtree-sitter-javascript.so: cannot open shared object file: No such file or directory")
Ah, I guess your system doesn’t explicitly say what path has been checked. The last two messages in the list is where dlopen searches the system paths.
>
> Does that impliy it checks in the standard system library localtion? If so, I guess there's no problem.
>
> In that case, sorry for the "noise", but I just wanted to make sure this was packagable and thought one check too much is better than one check too little :)
Yes, false positives are always better than false negative ;-)
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 7:14 ` Jostein Kjønigsen
2022-10-17 7:29 ` Yuan Fu
@ 2022-10-17 8:02 ` Eli Zaretskii
2022-10-17 9:02 ` Yuan Fu
1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-17 8:02 UTC (permalink / raw)
To: jostein; +Cc: casouri, jostein, emacs-devel
> Date: Mon, 17 Oct 2022 09:14:58 +0200
> Cc: emacs-devel@gnu.org
> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
>
> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript",
> ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot open shared object file: No such file or
> directory" "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot open shared object file: No
> such file or directory" "libtree-sitter-javascript: cannot open shared object file: No such file or directory"
> "libtree-sitter-javascript.so: cannot open shared object file: No such file or directory")
The error data should be improved to provide a more informative error
message. It would be best to display just the name of the language
definition library, without leading directories.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 8:02 ` Eli Zaretskii
@ 2022-10-17 9:02 ` Yuan Fu
2022-10-17 9:08 ` Eli Zaretskii
0 siblings, 1 reply; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 9:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: jostein, emacs-devel
> On Oct 17, 2022, at 1:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> Date: Mon, 17 Oct 2022 09:14:58 +0200
>> Cc: emacs-devel@gnu.org
>> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
>>
>> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript",
>> ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot open shared object file: No such file or
>> directory" "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot open shared object file: No
>> such file or directory" "libtree-sitter-javascript: cannot open shared object file: No such file or directory"
>> "libtree-sitter-javascript.so: cannot open shared object file: No such file or directory")
>
> The error data should be improved to provide a more informative error
> message. It would be best to display just the name of the language
> definition library, without leading directories.
You mean something like
Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”
?
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 9:02 ` Yuan Fu
@ 2022-10-17 9:08 ` Eli Zaretskii
2022-10-17 11:56 ` Stephen Leake
2022-10-17 21:15 ` Yuan Fu
0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-17 9:08 UTC (permalink / raw)
To: Yuan Fu; +Cc: jostein, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 17 Oct 2022 02:02:43 -0700
> Cc: jostein@kjonigsen.net,
> emacs-devel@gnu.org
>
> > On Oct 17, 2022, at 1:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> >> Date: Mon, 17 Oct 2022 09:14:58 +0200
> >> Cc: emacs-devel@gnu.org
> >> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
> >>
> >> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript",
> >> ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot open shared object file: No such file or
> >> directory" "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot open shared object file: No
> >> such file or directory" "libtree-sitter-javascript: cannot open shared object file: No such file or directory"
> >> "libtree-sitter-javascript.so: cannot open shared object file: No such file or directory")
> >
> > The error data should be improved to provide a more informative error
> > message. It would be best to display just the name of the language
> > definition library, without leading directories.
>
> You mean something like
>
> Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”
>
> ?
Better yet
Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”: No such file or directory
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 9:08 ` Eli Zaretskii
@ 2022-10-17 11:56 ` Stephen Leake
2022-10-17 13:39 ` Eli Zaretskii
2022-10-17 21:15 ` Yuan Fu
1 sibling, 1 reply; 23+ messages in thread
From: Stephen Leake @ 2022-10-17 11:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, jostein, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 17 Oct 2022 02:02:43 -0700
>> Cc: jostein@kjonigsen.net,
>> emacs-devel@gnu.org
>>
>> > On Oct 17, 2022, at 1:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> >
>> >> Date: Mon, 17 Oct 2022 09:14:58 +0200
>> >> Cc: emacs-devel@gnu.org
>> >> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
>> >>
>> >> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript",
>> >> ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript:
>> >> cannot open shared object file: No such file or
>> >> directory"
>> >> "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so:
>> >> cannot open shared object file: No
>> >> such file or directory" "libtree-sitter-javascript: cannot open
>> >> shared object file: No such file or directory"
>> >> "libtree-sitter-javascript.so: cannot open shared object file: No
>> >> such file or directory")
>> >
>> > The error data should be improved to provide a more informative error
>> > message. It would be best to display just the name of the language
>> > definition library, without leading directories.
>>
>> You mean something like
>>
>> Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”
>>
>> ?
>
> Better yet
>
> Error: Cannot load language definition: "javascript”,
> “libtree-sitter-javascript.so”: No such file or directory
I would find it helpful if the error message told me what variable
contains the search path, so I can append the correct directory to it.
--
-- Stephe
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 11:56 ` Stephen Leake
@ 2022-10-17 13:39 ` Eli Zaretskii
0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-17 13:39 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, jostein, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Yuan Fu <casouri@gmail.com>, jostein@kjonigsen.net, emacs-devel@gnu.org
> Date: Mon, 17 Oct 2022 04:56:38 -0700
>
> > Error: Cannot load language definition: "javascript”,
> > “libtree-sitter-javascript.so”: No such file or directory
>
> I would find it helpful if the error message told me what variable
> contains the search path, so I can append the correct directory to it.
We don't say that when we fail to load other kinds of files. For
example:
(load "nonexistent")
=> error: (file-missing "Cannot open load file" "No such file or directory" "nonexistent")
So I see no reason to mention the variable in this case. There's no
limit to relevant information to supply for an error, but we cannot
possibly give all of that when we signal an error. The failure to
load a file should prompt the user to look for the information how
these files are looked up, and that should lead to the variable,
among other things.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: tree-sitter: Paths used for loading of language definitions
2022-10-17 9:08 ` Eli Zaretskii
2022-10-17 11:56 ` Stephen Leake
@ 2022-10-17 21:15 ` Yuan Fu
1 sibling, 0 replies; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 21:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: jostein, emacs-devel
> On Oct 17, 2022, at 2:08 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 17 Oct 2022 02:02:43 -0700
>> Cc: jostein@kjonigsen.net,
>> emacs-devel@gnu.org
>>
>>> On Oct 17, 2022, at 1:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>>
>>>> Date: Mon, 17 Oct 2022 09:14:58 +0200
>>>> Cc: emacs-devel@gnu.org
>>>> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
>>>>
>>>> textmodes/mhtml-mode.el:29:2: Error: Cannot load language definition: "javascript",
>>>> ("/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript: cannot open shared object file: No such file or
>>>> directory" "/home/jostein/.emacs.d/tree-sitter/libtree-sitter-javascript.so: cannot open shared object file: No
>>>> such file or directory" "libtree-sitter-javascript: cannot open shared object file: No such file or directory"
>>>> "libtree-sitter-javascript.so: cannot open shared object file: No such file or directory")
>>>
>>> The error data should be improved to provide a more informative error
>>> message. It would be best to display just the name of the language
>>> definition library, without leading directories.
>>
>> You mean something like
>>
>> Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”
>>
>> ?
>
> Better yet
>
> Error: Cannot load language definition: "javascript”, “libtree-sitter-javascript.so”: No such file or directory
Done.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-16 13:32 Some issues with the tree-sitter branch Eli Zaretskii
2022-10-16 14:01 ` Eli Zaretskii
2022-10-16 15:10 ` Eli Zaretskii
@ 2022-10-17 4:53 ` Yuan Fu
2022-10-17 5:37 ` Po Lu
` (2 more replies)
2 siblings, 3 replies; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 4:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Oct 16, 2022, at 6:32 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
> I noticed several minor issues with the branch while reading the code:
>
> . Several places assign EMACS_INT values to uint32_t variables with
> an explicit range check (and error signal in case of overflow).
I think you meant the lack of a range check? Like mentioned here:
+ /* FIXME: We should signal an error below if START_BYTE
+ etc. overflow the 32-bit unsigned data type. */
I added buffer size check at parser creation time, and used casts to uint32_t liberally, assuming the values never overflows and, so we don’t need to handle the error at a million places. But I should have added checks in ts_after_chang and other places where buffer size could change. I’ll add checks in ts_after_change and other places, and if the argumetns overflows uint32, it will set a flag (say, buffer_too_large) in the parser object, and next time any lisp function tries to use that parser, an buffer-too-large error will be signaled. WDYT?
> . Several functions produce Lisp_Object results by reference, and
> callers pass to them pointers to Lisp_Object variables. Our style
> prefers returning a Lisp_Object value through the return value,
> like this:
>
> Lisp_Object some_var = some_func (...);
>
> When a function produces a single value, I think the above is
> preferable.
Got it.
> . There's a call to malloc in Ftreesit_parser_set_included_ranges
> which doesn't check the return value of malloc, and doesn't signal
> memory-full error when malloc fails (that function should perhaps
> use SAFE_ALLOCA).
I’ll fix that.
>
> In addition, the style of treesit.c (indentation etc.) is not exactly
> ours (but this can be fixed later).
>
> Thank you for your work on this important feature.
Thank you!
I see that you fixed them, I’ll keep those in mind in the future. That’s a lot of lines you need to change, sorry about that :-(
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
@ 2022-10-17 5:37 ` Po Lu
2022-10-17 5:48 ` Yuan Fu
2022-10-17 6:48 ` Eli Zaretskii
2022-10-18 0:14 ` Yuan Fu
2 siblings, 1 reply; 23+ messages in thread
From: Po Lu @ 2022-10-17 5:37 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
> I think you meant the lack of a range check? Like mentioned here:
>
> + /* FIXME: We should signal an error below if START_BYTE
> + etc. overflow the 32-bit unsigned data type. */
>
> I added buffer size check at parser creation time, and used casts to
> uint32_t liberally, assuming the values never overflows and, so we
> don’t need to handle the error at a million places. But I should have
> added checks in ts_after_chang and other places where buffer size
> could change. I’ll add checks in ts_after_change and other places, and
> if the argumetns overflows uint32, it will set a flag (say,
> buffer_too_large) in the parser object, and next time any lisp
> function tries to use that parser, an buffer-too-large error will be
> signaled. WDYT?
Isn't there a way to make tree-sitter parse buffers larger than
uint32_t? AFAIK, our buffers can be EMACS_INT large (especially with
long log files), and tree-sitter should be fast enough to parse them
incrementally?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 5:37 ` Po Lu
@ 2022-10-17 5:48 ` Yuan Fu
0 siblings, 0 replies; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 5:48 UTC (permalink / raw)
To: Po Lu; +Cc: Eli Zaretskii, emacs-devel
> On Oct 16, 2022, at 10:37 PM, Po Lu <luangruo@yahoo.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> I think you meant the lack of a range check? Like mentioned here:
>>
>> + /* FIXME: We should signal an error below if START_BYTE
>> + etc. overflow the 32-bit unsigned data type. */
>>
>> I added buffer size check at parser creation time, and used casts to
>> uint32_t liberally, assuming the values never overflows and, so we
>> don’t need to handle the error at a million places. But I should have
>> added checks in ts_after_chang and other places where buffer size
>> could change. I’ll add checks in ts_after_change and other places, and
>> if the argumetns overflows uint32, it will set a flag (say,
>> buffer_too_large) in the parser object, and next time any lisp
>> function tries to use that parser, an buffer-too-large error will be
>> signaled. WDYT?
>
> Isn't there a way to make tree-sitter parse buffers larger than
> uint32_t? AFAIK, our buffers can be EMACS_INT large (especially with
> long log files), and tree-sitter should be fast enough to parse them
> incrementally?
I don’t think we want to get even near that size, since tree-sitter uses ~10x the size of the buffer to store the parse tree. That’s probably why tree-sitter’s author didn’t bother to use uint64.
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
2022-10-17 5:37 ` Po Lu
@ 2022-10-17 6:48 ` Eli Zaretskii
2022-10-17 9:12 ` Yuan Fu
2022-10-18 0:14 ` Yuan Fu
2 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-17 6:48 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 16 Oct 2022 21:53:06 -0700
> Cc: emacs-devel@gnu.org
>
> > . Several places assign EMACS_INT values to uint32_t variables with
> > an explicit range check (and error signal in case of overflow).
>
> I think you meant the lack of a range check? Like mentioned here:
>
> + /* FIXME: We should signal an error below if START_BYTE
> + etc. overflow the 32-bit unsigned data type. */
Yes, I added FIXME comments to make it easier to find those places.
> I added buffer size check at parser creation time, and used casts to uint32_t liberally, assuming the values never overflows and, so we don’t need to handle the error at a million places. But I should have added checks in ts_after_chang and other places where buffer size could change. I’ll add checks in ts_after_change and other places, and if the argumetns overflows uint32, it will set a flag (say, buffer_too_large) in the parser object, and next time any lisp function tries to use that parser, an buffer-too-large error will be signaled. WDYT?
That sounds fine to me, but I think we also should do something when
some value that can be larger than UINT_MAX is passed to tree-sitter
functions, because doing so might cause tree-sitter do something for a
completely unrelated portion of the buffer. At the very least add
eassert there, so that at least a build with --enable-checking will
detect such cases.
> I see that you fixed them, I’ll keep those in mind in the future. That’s a lot of lines you need to change, sorry about that :-(
That was a low-hanging fruit that was easy to pluck ;-)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 6:48 ` Eli Zaretskii
@ 2022-10-17 9:12 ` Yuan Fu
2022-10-17 10:14 ` Eli Zaretskii
0 siblings, 1 reply; 23+ messages in thread
From: Yuan Fu @ 2022-10-17 9:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>
>> I added buffer size check at parser creation time, and used casts to uint32_t liberally, assuming the values never overflows and, so we don’t need to handle the error at a million places. But I should have added checks in ts_after_chang and other places where buffer size could change. I’ll add checks in ts_after_change and other places, and if the argumetns overflows uint32, it will set a flag (say, buffer_too_large) in the parser object, and next time any lisp function tries to use that parser, an buffer-too-large error will be signaled. WDYT?
>
> That sounds fine to me, but I think we also should do something when
> some value that can be larger than UINT_MAX is passed to tree-sitter
> functions, because doing so might cause tree-sitter do something for a
> completely unrelated portion of the buffer. At the very least add
> eassert there, so that at least a build with --enable-checking will
> detect such cases.
IIUC you don’t want to hide and delay the error, right? That makes sense. I can add assertions, those assertions shouldn’t interfere with normal usage.
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 9:12 ` Yuan Fu
@ 2022-10-17 10:14 ` Eli Zaretskii
2022-10-18 0:15 ` Yuan Fu
0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-17 10:14 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 17 Oct 2022 02:12:13 -0700
> Cc: emacs-devel@gnu.org
>
> > That sounds fine to me, but I think we also should do something when
> > some value that can be larger than UINT_MAX is passed to tree-sitter
> > functions, because doing so might cause tree-sitter do something for a
> > completely unrelated portion of the buffer. At the very least add
> > eassert there, so that at least a build with --enable-checking will
> > detect such cases.
>
> IIUC you don’t want to hide and delay the error, right? That makes sense. I can add assertions, those assertions shouldn’t interfere with normal usage.
Yes, please.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 10:14 ` Eli Zaretskii
@ 2022-10-18 0:15 ` Yuan Fu
0 siblings, 0 replies; 23+ messages in thread
From: Yuan Fu @ 2022-10-18 0:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> On Oct 17, 2022, at 3:14 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 17 Oct 2022 02:12:13 -0700
>> Cc: emacs-devel@gnu.org
>>
>>> That sounds fine to me, but I think we also should do something when
>>> some value that can be larger than UINT_MAX is passed to tree-sitter
>>> functions, because doing so might cause tree-sitter do something for a
>>> completely unrelated portion of the buffer. At the very least add
>>> eassert there, so that at least a build with --enable-checking will
>>> detect such cases.
>>
>> IIUC you don’t want to hide and delay the error, right? That makes sense. I can add assertions, those assertions shouldn’t interfere with normal usage.
>
> Yes, please.
Now done.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
2022-10-17 5:37 ` Po Lu
2022-10-17 6:48 ` Eli Zaretskii
@ 2022-10-18 0:14 ` Yuan Fu
2022-10-18 16:11 ` Eli Zaretskii
2 siblings, 1 reply; 23+ messages in thread
From: Yuan Fu @ 2022-10-18 0:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>
>> . There's a call to malloc in Ftreesit_parser_set_included_ranges
>> which doesn't check the return value of malloc, and doesn't signal
>> memory-full error when malloc fails (that function should perhaps
>> use SAFE_ALLOCA).
>
> I’ll fix that.
Done. IIUC I just need to change malloc to xmalloc, right?
Yuan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Some issues with the tree-sitter branch
2022-10-18 0:14 ` Yuan Fu
@ 2022-10-18 16:11 ` Eli Zaretskii
0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-10-18 16:11 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 17 Oct 2022 17:14:17 -0700
> Cc: emacs-devel@gnu.org
>
>
> >
> >> . There's a call to malloc in Ftreesit_parser_set_included_ranges
> >> which doesn't check the return value of malloc, and doesn't signal
> >> memory-full error when malloc fails (that function should perhaps
> >> use SAFE_ALLOCA).
> >
> > I’ll fix that.
>
> Done. IIUC I just need to change malloc to xmalloc, right?
Yes, that's one way.
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-10-18 16:11 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-16 13:32 Some issues with the tree-sitter branch Eli Zaretskii
2022-10-16 14:01 ` Eli Zaretskii
2022-10-16 15:10 ` Eli Zaretskii
2022-10-16 19:36 ` tree-sitter: Paths used for loading of language definitions Jostein Kjønigsen
2022-10-16 20:12 ` Daniel Martín
2022-10-17 4:27 ` Yuan Fu
2022-10-17 7:14 ` Jostein Kjønigsen
2022-10-17 7:29 ` Yuan Fu
2022-10-17 8:02 ` Eli Zaretskii
2022-10-17 9:02 ` Yuan Fu
2022-10-17 9:08 ` Eli Zaretskii
2022-10-17 11:56 ` Stephen Leake
2022-10-17 13:39 ` Eli Zaretskii
2022-10-17 21:15 ` Yuan Fu
2022-10-17 4:53 ` Some issues with the tree-sitter branch Yuan Fu
2022-10-17 5:37 ` Po Lu
2022-10-17 5:48 ` Yuan Fu
2022-10-17 6:48 ` Eli Zaretskii
2022-10-17 9:12 ` Yuan Fu
2022-10-17 10:14 ` Eli Zaretskii
2022-10-18 0:15 ` Yuan Fu
2022-10-18 0:14 ` Yuan Fu
2022-10-18 16:11 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).