* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Dominik Honnef @ 2023-10-21 20:36 UTC
To: 66674

Tree-sitter's CLI and the publicly hosted playground both produce
different parse trees than treesit in Emacs. Specifically, the
assignment of nodes to named fields differs.

Given the following C source:

    void main() {
      int x = // foo
        1+
        // comment
        2;
    }

treesit-explore-mode displays the following tree:

    (translation_unit
     (function_definition type: (primitive_type)
      declarator:
       (function_declarator declarator: (identifier)
        parameters: (parameter_list ( )))
      body:
       (compound_statement {
        (declaration type: (primitive_type)
         declarator:
          (init_declarator declarator: (identifier) = value: (comment)
           (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
         ;)
        })))

Note how in the init_declarator node, the 'value' field is a comment
node, and similarly for the 'right' field in the binary_expression node.

Running 'tree-sitter parse file.c', on the other hand, produces the
following tree:

    (translation_unit [0, 0] - [6, 0]
      (function_definition [0, 0] - [5, 1]
        type: (primitive_type [0, 0] - [0, 4])
        declarator: (function_declarator [0, 5] - [0, 11]
          declarator: (identifier [0, 5] - [0, 9])
          parameters: (parameter_list [0, 9] - [0, 11]))
        body: (compound_statement [0, 12] - [5, 1]
          (declaration [1, 2] - [4, 6]
            type: (primitive_type [1, 2] - [1, 5])
            declarator: (init_declarator [1, 6] - [4, 5]
              declarator: (identifier [1, 6] - [1, 7])
              (comment [1, 10] - [1, 16])
              value: (binary_expression [2, 4] - [4, 5]
                left: (number_literal [2, 4] - [2, 5])
                (comment [3, 4] - [3, 14])
                right: (number_literal [4, 4] - [4, 5])))))))

Here, the two comment nodes appear as children without a field name.
IMHO the second tree is the more useful one, as the named fields contain
the semantically important subtrees (e.g. a binary expression is made up
of a left and a right subtree, not a left subtree, a right comment, and
then some subtree without a field name).

Emacs's tree makes writing queries less convenient: instead of being
able to refer to well-defined names, one has to rely on child indices to
account for comments.


Further mismatch arises from repeated fields and separators.

Consider the following Go source:

    package pkg

    var a, b, c = 1, 2, 3

treesit-explore-mode displays the following tree:

    (source_file
     (package_clause package (package_identifier))
     \n
     (var_declaration var
      (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
       (expression_list (int_literal) , (int_literal) , (int_literal))))
     \n)

Here, the var_spec node has two fields named 'name' even though the
source specifies three names. Furthermore, the second 'name', as well as
'value', are set to the ',' separator between identifiers. Two of the
three identifiers carry no field name.

'tree-sitter parse file.go', on the other hand, produces this more
accurate tree:

    (source_file [0, 0] - [2, 21]
      (package_clause [0, 0] - [0, 11]
        (package_identifier [0, 8] - [0, 11]))
      (var_declaration [2, 0] - [2, 21]
        (var_spec [2, 4] - [2, 21]
          name: (identifier [2, 4] - [2, 5])
          name: (identifier [2, 7] - [2, 8])
          name: (identifier [2, 10] - [2, 11])
          value: (expression_list [2, 14] - [2, 21]
            (int_literal [2, 14] - [2, 15])
            (int_literal [2, 17] - [2, 18])
            (int_literal [2, 20] - [2, 21])))))

This reproduces with 29.1 as well as 30.0.50.
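To make the complaint about queries concrete, here is a minimal Python sketch. It is a toy model, not the tree-sitter or treesit API: the tuple layout and the helper child_by_field are hypothetical, and the two child lists are transcribed by hand from the binary_expression subtrees shown in the report.

```python
from typing import List, Optional, Tuple

# A child is (field name or None, node type). This tuple layout is a
# hypothetical stand-in for a real tree-sitter node, used only for illustration.
Child = Tuple[Optional[str], str]

# Children of binary_expression as printed by 'tree-sitter parse'.
CLI_CHILDREN: List[Child] = [
    ("left", "number_literal"),
    (None, "comment"),
    ("right", "number_literal"),
]

# Children of binary_expression as displayed by treesit-explore-mode.
EXPLORER_CHILDREN: List[Child] = [
    ("left", "number_literal"),
    ("operator", "+"),
    ("right", "comment"),
    (None, "number_literal"),
]


def child_by_field(children: List[Child], field: str) -> Optional[str]:
    """Return the type of the first child labelled with `field`, if any."""
    for name, node_type in children:
        if name == field:
            return node_type
    return None


if __name__ == "__main__":
    # Looking up 'right' finds the operand in one tree and the comment in the other.
    print(child_by_field(CLI_CHILDREN, "right"))
    print(child_by_field(EXPLORER_CHILDREN, "right"))
```

With the labels from the CLI tree, a lookup by field name reaches the right-hand operand directly. With the labels shown by the explorer, the same lookup lands on the comment, which is why a query would have to fall back on child positions.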
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Eli Zaretskii @ 2023-10-25 13:03 UTC
To: Dominik Honnef, Yuan Fu; +Cc: 66674

Yuan, any comments or suggestions?
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Eli Zaretskii @ 2023-11-19 10:08 UTC
To: casouri; +Cc: 66674, dominik

Ping! Yuan, any comments?
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Eli Zaretskii @ 2023-11-25 10:03 UTC
To: casouri; +Cc: 66674, dominik

Ping! Ping! Yuan, please chime in.
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Yuan Fu @ 2023-12-10 10:07 UTC
To: Eli Zaretskii; +Cc: 66674, dominik

On 11/25/23 2:03 AM, Eli Zaretskii wrote:
> Ping! Ping! Yuan, please chime in.

Sorry sorry sorry, another missed report. I think this is a bug in
treesit-explore-mode, I'll work on fixing it!

Yuan
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Dominik Honnef @ 2023-12-10 14:28 UTC
To: Yuan Fu, Eli Zaretskii; +Cc: 66674

Yuan Fu <casouri@gmail.com> writes:

> Sorry sorry sorry, another missed report. I think this is a bug in
> treesit-explore-mode, I'll work on fixing it!
>
> Yuan

I don't think that's the case, at least not exclusively. I used
treesit-explore-mode to debug patterns that matched in the playground
but not in Emacs. The matching behavior seemed pretty in line with what
treesit-explore-mode reported.
* bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
From: Yuan Fu @ 2023-12-11 1:02 UTC
To: Dominik Honnef, Eli Zaretskii; +Cc: 66674-done

On 12/10/23 6:28 AM, Dominik Honnef wrote:
> I don't think that's the case, at least not exclusively. I used
> treesit-explore-mode to debug patterns that matched in the playground
> but not in Emacs. The matching behavior seemed pretty in line with what
> treesit-explore-mode reported.

I do find that treesit-node-field-name is returning wrong field names.
That's why, in the first example, you see the "value" field name given
to the comment node rather than to the binary_expression behind it. In
the actual parse tree, "value" belongs to binary_expression. With the
fix I just pushed to emacs-29, the explorer parse tree for the first
example becomes

    (translation_unit
     (function_definition type: (primitive_type)
      declarator:
       (function_declarator declarator: (identifier)
        parameters: (parameter_list ( )))
      body:
       (compound_statement {
        (declaration type: (primitive_type)
         declarator:
          (init_declarator declarator: (identifier) = (comment)
           value:
            (binary_expression left: (number_literal) operator: + operator: (comment) right: (number_literal)))
         ;)
        })))

which should match the playground. If you can find the pattern that
matches in the playground but doesn't in Emacs, please do post it and I
can see if there's anything wrong.

Yuan
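For anyone who wants to check the explorer output against 'tree-sitter parse' by eye, a small Python sketch like the following may help. It only strips the "[row, col] - [row, col]" ranges that the CLI prints and collapses whitespace. The helper names strip_ranges and squeeze are made up for this example, and the explorer additionally shows anonymous tokens such as '{', '=' and ';', so the two texts will still not be identical after normalization.

```python
import re

# Matches the " [row, col] - [row, col]" suffix that 'tree-sitter parse'
# prints after each node type, including the whitespace in front of it.
_RANGE = re.compile(r"\s*\[\d+, \d+\] - \[\d+, \d+\]")


def strip_ranges(tree_text: str) -> str:
    """Remove position ranges from CLI output, leaving node types and fields."""
    return _RANGE.sub("", tree_text)


def squeeze(tree_text: str) -> str:
    """Collapse all runs of whitespace so line breaks do not affect comparison."""
    return " ".join(tree_text.split())


if __name__ == "__main__":
    cli = """value: (binary_expression [2, 4] - [4, 5]
      left: (number_literal [2, 4] - [2, 5])
      (comment [3, 4] - [3, 14])
      right: (number_literal [4, 4] - [4, 5]))"""
    print(squeeze(strip_ranges(cli)))
```

After this step the field labels in the CLI output ('value', 'left', 'right') can be compared directly with the labels in the explorer tree above.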
end of thread, newest message 2023-12-11 1:02 UTC

Thread overview: 7+ messages
2023-10-21 20:36 bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields Dominik Honnef
2023-10-25 13:03 ` Eli Zaretskii
2023-11-19 10:08   ` Eli Zaretskii
2023-11-25 10:03     ` Eli Zaretskii
2023-12-10 10:07       ` Yuan Fu
2023-12-10 14:28         ` Dominik Honnef
2023-12-11  1:02           ` Yuan Fu