From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Wed, 25 Oct 2023 16:03:10 +0300
Message-ID: <835y2ukg6p.fsf@gnu.org>
References: <87edhnzp9t.fsf@honnef.co>
In-Reply-To: <87edhnzp9t.fsf@honnef.co> (message from Dominik Honnef on Sat, 21 Oct 2023 22:36:30 +0200)
To: Dominik Honnef, Yuan Fu
Cc: 66674@debbugs.gnu.org

> From: Dominik Honnef
> Date: Sat, 21 Oct 2023 22:36:30 +0200
>
> Using tree-sitter's CLI as well as the publicly hosted playground
> produce different parse trees than treesit in Emacs. Specifically, the
> assignment of nodes to named fields differs.
>
> Given the following C source:
>
>   void main() {
>     int x = // foo
>       1+
>       // comment
>       2;
>   }
>
> treesit-explore-mode displays the following tree:
>
>   (translation_unit
>    (function_definition type: (primitive_type)
>     declarator:
>      (function_declarator declarator: (identifier)
>       parameters: (parameter_list ( )))
>     body:
>      (compound_statement {
>       (declaration type: (primitive_type)
>        declarator:
>         (init_declarator declarator: (identifier) = value: (comment)
>          (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
>        ;)
>       })))
>
> Note how in the init_declarator node, the 'value' field is a comment
> node, and similarly for the 'right' field in the binary_expression node.
>
> Running 'tree-sitter parse file.c', on the other hand, produces the
> following tree:
>
>   (translation_unit [0, 0] - [6, 0]
>     (function_definition [0, 0] - [5, 1]
>       type: (primitive_type [0, 0] - [0, 4])
>       declarator: (function_declarator [0, 5] - [0, 11]
>         declarator: (identifier [0, 5] - [0, 9])
>         parameters: (parameter_list [0, 9] - [0, 11]))
>       body: (compound_statement [0, 12] - [5, 1]
>         (declaration [1, 2] - [4, 6]
>           type: (primitive_type [1, 2] - [1, 5])
>           declarator: (init_declarator [1, 6] - [4, 5]
>             declarator: (identifier [1, 6] - [1, 7])
>             (comment [1, 10] - [1, 16])
>             value: (binary_expression [2, 4] - [4, 5]
>               left: (number_literal [2, 4] - [2, 5])
>               (comment [3, 4] - [3, 14])
>               right: (number_literal [4, 4] - [4, 5])))))))
>
> Here, the two comment nodes appear as unnamed nodes.
> IMHO the second tree is a more useful one, as the named fields contain
> the semantically important subtrees (e.g. a binary expression is made
> up of a left and right subtree, not a left subtree, a right comment,
> and then some unnamed subtree.)
>
> Emacs's tree makes writing queries less convenient, as instead of being
> able to refer to well-defined names, one has to rely on child indices to
> account for comments.
>
>
> Further mismatch arises from repeated fields and separators.
>
> Consider the following Go source:
>
>   package pkg
>
>   var a, b, c = 1, 2, 3
>
> treesit-explore-mode displays the following tree:
>
>   (source_file
>    (package_clause package (package_identifier))
>    \n
>    (var_declaration var
>     (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
>      (expression_list (int_literal) , (int_literal) , (int_literal))))
>    \n)
>
> Here, the var_spec node has two fields named 'name' even though the
> source specifies three names. Furthermore, The second 'name', as well as
> 'value' are set to the ',' separator between identifiers. Two of the three
> identifiers aren't named.
>
> 'tree-sitter parse file.go', on the other hand, produces this more
> accurate tree:
>
>   (source_file [0, 0] - [2, 21]
>     (package_clause [0, 0] - [0, 11]
>       (package_identifier [0, 8] - [0, 11]))
>     (var_declaration [2, 0] - [2, 21]
>       (var_spec [2, 4] - [2, 21]
>         name: (identifier [2, 4] - [2, 5])
>         name: (identifier [2, 7] - [2, 8])
>         name: (identifier [2, 10] - [2, 11])
>         value: (expression_list [2, 14] - [2, 21]
>           (int_literal [2, 14] - [2, 15])
>           (int_literal [2, 17] - [2, 18])
>           (int_literal [2, 20] - [2, 21])))))
>
> This reproduces with 29.1 as well as 30.0.50.

Yuan, any comments or suggestions?
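
For reference, the practical effect of the mismatch on field lookups can be
modeled with a small Python sketch. This is not Emacs or tree-sitter code: it
only represents the children of the Go var_spec node as (field, node type)
pairs, transcribed by hand from the two trees quoted above. The unlabeled ","
and "=" entries in the upstream list are an assumption, since 'tree-sitter
parse' prints only named nodes, and the helper name nodes_with_field is
hypothetical.

```python
# Children of the Go var_spec node as (field, node type) pairs.
# EMACS_VAR_SPEC is transcribed from the treesit-explore-mode output quoted above.
EMACS_VAR_SPEC = [
    ("name", "identifier"),
    ("name", ","),
    (None, "identifier"),
    ("value", ","),
    (None, "identifier"),
    (None, "="),
    (None, "expression_list"),
]

# UPSTREAM_VAR_SPEC follows the 'tree-sitter parse' output. The CLI prints only
# named nodes, so the unlabeled "," and "=" entries here are an assumption.
UPSTREAM_VAR_SPEC = [
    ("name", "identifier"),
    (None, ","),
    ("name", "identifier"),
    (None, ","),
    ("name", "identifier"),
    (None, "="),
    ("value", "expression_list"),
]


def nodes_with_field(children, field):
    """Return the node types of all children labeled with the given field."""
    return [node_type for label, node_type in children if label == field]


if __name__ == "__main__":
    trees = (("Emacs", EMACS_VAR_SPEC), ("upstream", UPSTREAM_VAR_SPEC))
    for title, children in trees:
        print(title, "name: ", nodes_with_field(children, "name"))
        print(title, "value:", nodes_with_field(children, "value"))
```

With the upstream labels, looking up 'name' yields the three identifiers and
'value' yields the expression list. With the labels Emacs reports, the same
lookups return separators, so a query would have to fall back on child
indices.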