From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sat, 25 Nov 2023 12:03:27 +0200
Message-ID: <83ttpacfps.fsf@gnu.org>
References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org>
In-Reply-To: <835y1ykqd3.fsf@gnu.org> (message from Eli Zaretskii on Sun, 19 Nov 2023 12:08:08 +0200)
To: casouri@gmail.com
Cc: 66674@debbugs.gnu.org, dominik@honnef.co

Ping! Ping!  Yuan, please chime in.

> Cc: 66674@debbugs.gnu.org, dominik@honnef.co
> Date: Sun, 19 Nov 2023 12:08:08 +0200
> From: Eli Zaretskii
>
> Ping!  Yuan, any comments?
>
> > Cc: 66674@debbugs.gnu.org
> > Date: Wed, 25 Oct 2023 16:03:10 +0300
> > From: Eli Zaretskii
> >
> > > From: Dominik Honnef
> > > Date: Sat, 21 Oct 2023 22:36:30 +0200
> > >
> > > Using tree-sitter's CLI as well as the publicly hosted playground
> > > produce different parse trees than treesit in Emacs. Specifically, the
> > > assignment of nodes to named fields differs.
> > >
> > > Given the following C source:
> > >
> > > void main() {
> > >   int x = // foo
> > >     1+
> > >     // comment
> > >     2;
> > > }
> > >
> > > treesit-explore-mode displays the following tree:
> > >
> > > (translation_unit
> > >  (function_definition type: (primitive_type)
> > >   declarator:
> > >    (function_declarator declarator: (identifier)
> > >     parameters: (parameter_list ( )))
> > >   body:
> > >    (compound_statement {
> > >     (declaration type: (primitive_type)
> > >      declarator:
> > >       (init_declarator declarator: (identifier) = value: (comment)
> > >        (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
> > >      ;)
> > >     })))
> > >
> > > Note how in the init_declarator node, the 'value' field is a comment
> > > node, and similarly for the 'right' field in the binary_expression node.
> > >
> > > Running 'tree-sitter parse file.c', on the other hand, produces the
> > > following tree:
> > >
> > > (translation_unit [0, 0] - [6, 0]
> > >   (function_definition [0, 0] - [5, 1]
> > >     type: (primitive_type [0, 0] - [0, 4])
> > >     declarator: (function_declarator [0, 5] - [0, 11]
> > >       declarator: (identifier [0, 5] - [0, 9])
> > >       parameters: (parameter_list [0, 9] - [0, 11]))
> > >     body: (compound_statement [0, 12] - [5, 1]
> > >       (declaration [1, 2] - [4, 6]
> > >         type: (primitive_type [1, 2] - [1, 5])
> > >         declarator: (init_declarator [1, 6] - [4, 5]
> > >           declarator: (identifier [1, 6] - [1, 7])
> > >           (comment [1, 10] - [1, 16])
> > >           value: (binary_expression [2, 4] - [4, 5]
> > >             left: (number_literal [2, 4] - [2, 5])
> > >             (comment [3, 4] - [3, 14])
> > >             right: (number_literal [4, 4] - [4, 5])))))))
> > >
> > > Here, the two comment nodes appear as unnamed nodes. IMHO the second
> > > tree is a more useful one, as the named fields contain the semantically
> > > important subtrees (e.g. a binary expression is made up of a left and
> > > right subtree, not a left subtree, a right comment, and then some
> > > unnamed subtree.)
> > >
> > > Emacs's tree makes writing queries less convenient, as instead of being
> > > able to refer to well-defined names, one has to rely on child indices to
> > > account for comments.
> > >
> > >
> > > Further mismatch arises from repeated fields and separators.
> > >
> > > Consider the following Go source:
> > >
> > > package pkg
> > >
> > > var a, b, c = 1, 2, 3
> > >
> > > treesit-explore-mode displays the following tree:
> > >
> > > (source_file
> > >  (package_clause package (package_identifier))
> > >  \n
> > >  (var_declaration var
> > >   (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
> > >    (expression_list (int_literal) , (int_literal) , (int_literal))))
> > >  \n)
> > >
> > > Here, the var_spec node has two fields named 'name' even though the
> > > source specifies three names. Furthermore, the second 'name', as well as
> > > 'value' are set to the ',' separator between identifiers. Two of the three
> > > identifiers aren't named.
> > >
> > > 'tree-sitter parse file.go', on the other hand, produces this more
> > > accurate tree:
> > >
> > > (source_file [0, 0] - [2, 21]
> > >   (package_clause [0, 0] - [0, 11]
> > >     (package_identifier [0, 8] - [0, 11]))
> > >   (var_declaration [2, 0] - [2, 21]
> > >     (var_spec [2, 4] - [2, 21]
> > >       name: (identifier [2, 4] - [2, 5])
> > >       name: (identifier [2, 7] - [2, 8])
> > >       name: (identifier [2, 10] - [2, 11])
> > >       value: (expression_list [2, 14] - [2, 21]
> > >         (int_literal [2, 14] - [2, 15])
> > >         (int_literal [2, 17] - [2, 18])
> > >         (int_literal [2, 20] - [2, 21])))))
> > >
> > > This reproduces with 29.1 as well as 30.0.50.
> >
> > Yuan, any comments or suggestions?
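
For readers following the thread, the point in the quoted report about field lookups versus child indices can be illustrated with a toy model. The Python sketch below hard-codes the children of the binary_expression node exactly as the two listings in the report print them (the 'tree-sitter parse' listing does not show the '+' token, so it is absent there). It does not call Emacs's treesit functions or the tree-sitter library; `Child`, `child_by_field` and the two lists are hypothetical names invented for this illustration.

```python
from typing import List, NamedTuple, Optional


class Child(NamedTuple):
    """One child of a node: its field name (if any) and its node type."""
    field: Optional[str]
    node_type: str


# Children of binary_expression as shown by treesit-explore-mode in the report.
EMACS_BINARY_EXPRESSION = [
    Child("left", "number_literal"),
    Child("operator", "+"),
    Child("right", "comment"),
    Child(None, "number_literal"),
]

# Children of the same node as printed by 'tree-sitter parse' in the report.
CLI_BINARY_EXPRESSION = [
    Child("left", "number_literal"),
    Child(None, "comment"),
    Child("right", "number_literal"),
]


def child_by_field(children: List[Child], field: str) -> Optional[str]:
    """Return the node type of the first child carrying FIELD, or None."""
    for child in children:
        if child.field == field:
            return child.node_type
    return None


if __name__ == "__main__":
    # Field lookup on the tree as Emacs reports it lands on the comment.
    print(child_by_field(EMACS_BINARY_EXPRESSION, "right"))
    # The same lookup on the CLI's tree finds the actual operand.
    print(child_by_field(CLI_BINARY_EXPRESSION, "right"))
    # Index-based workaround of the kind the report describes.
    print(EMACS_BINARY_EXPRESSION[-1].node_type)
```

With the tree as Emacs displays it, a lookup by the name 'right' yields the comment, and the real right operand is reachable only by position, which is the inconvenience the report describes.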