From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sun, 19 Nov 2023 12:08:08 +0200
Message-ID: <835y1ykqd3.fsf@gnu.org>
References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org>
In-Reply-To: <835y2ukg6p.fsf@gnu.org> (message from Eli Zaretskii on Wed, 25 Oct 2023 16:03:10 +0300)
To: casouri@gmail.com
Cc: 66674@debbugs.gnu.org, dominik@honnef.co

Ping!  Yuan, any comments?

> Cc: 66674@debbugs.gnu.org
> Date: Wed, 25 Oct 2023 16:03:10 +0300
> From: Eli Zaretskii
>
> > From: Dominik Honnef
> > Date: Sat, 21 Oct 2023 22:36:30 +0200
> >
> > Using tree-sitter's CLI as well as the publicly hosted playground
> > produce different parse trees than treesit in Emacs. Specifically, the
> > assignment of nodes to named fields differs.
> >
> > Given the following C source:
> >
> >   void main() {
> >     int x = // foo
> >       1+
> >       // comment
> >       2;
> >   }
> >
> > treesit-explore-mode displays the following tree:
> >
> >   (translation_unit
> >    (function_definition type: (primitive_type)
> >     declarator:
> >      (function_declarator declarator: (identifier)
> >       parameters: (parameter_list ( )))
> >     body:
> >      (compound_statement {
> >       (declaration type: (primitive_type)
> >        declarator:
> >         (init_declarator declarator: (identifier) = value: (comment)
> >          (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
> >        ;)
> >       })))
> >
> > Note how in the init_declarator node, the 'value' field is a comment
> > node, and similarly for the 'right' field in the binary_expression node.
> >
> > Running 'tree-sitter parse file.c', on the other hand, produces the
> > following tree:
> >
> >   (translation_unit [0, 0] - [6, 0]
> >     (function_definition [0, 0] - [5, 1]
> >       type: (primitive_type [0, 0] - [0, 4])
> >       declarator: (function_declarator [0, 5] - [0, 11]
> >         declarator: (identifier [0, 5] - [0, 9])
> >         parameters: (parameter_list [0, 9] - [0, 11]))
> >       body: (compound_statement [0, 12] - [5, 1]
> >         (declaration [1, 2] - [4, 6]
> >           type: (primitive_type [1, 2] - [1, 5])
> >           declarator: (init_declarator [1, 6] - [4, 5]
> >             declarator: (identifier [1, 6] - [1, 7])
> >             (comment [1, 10] - [1, 16])
> >             value: (binary_expression [2, 4] - [4, 5]
> >               left: (number_literal [2, 4] - [2, 5])
> >               (comment [3, 4] - [3, 14])
> >               right: (number_literal [4, 4] - [4, 5])))))))
> >
> > Here, the two comment nodes appear as unnamed nodes. IMHO the second
> > tree is a more useful one, as the named fields contain the semantically
> > important subtrees (e.g. a binary expression is made up of a left and
> > right subtree, not a left subtree, a right comment, and then some
> > unnamed subtree.)
> >
> > Emacs's tree makes writing queries less convenient, as instead of being
> > able to refer to well-defined names, one has to rely on child indices to
> > account for comments.
> >
> >
> > Further mismatch arises from repeated fields and separators.
> >
> > Consider the following Go source:
> >
> >   package pkg
> >
> >   var a, b, c = 1, 2, 3
> >
> > treesit-explore-mode displays the following tree:
> >
> >   (source_file
> >    (package_clause package (package_identifier))
> >    \n
> >    (var_declaration var
> >     (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
> >      (expression_list (int_literal) , (int_literal) , (int_literal))))
> >    \n)
> >
> > Here, the var_spec node has two fields named 'name' even though the
> > source specifies three names. Furthermore, the second 'name', as well as
> > 'value', are set to the ',' separator between identifiers.
> > Two of the three identifiers aren't named.
> >
> > 'tree-sitter parse file.go', on the other hand, produces this more
> > accurate tree:
> >
> >   (source_file [0, 0] - [2, 21]
> >     (package_clause [0, 0] - [0, 11]
> >       (package_identifier [0, 8] - [0, 11]))
> >     (var_declaration [2, 0] - [2, 21]
> >       (var_spec [2, 4] - [2, 21]
> >         name: (identifier [2, 4] - [2, 5])
> >         name: (identifier [2, 7] - [2, 8])
> >         name: (identifier [2, 10] - [2, 11])
> >         value: (expression_list [2, 14] - [2, 21]
> >           (int_literal [2, 14] - [2, 15])
> >           (int_literal [2, 17] - [2, 18])
> >           (int_literal [2, 20] - [2, 21])))))
> >
> > This reproduces with 29.1 as well as 30.0.50.
>
> Yuan, any comments or suggestions?