From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sun, 10 Dec 2023 02:07:35 -0800
Message-ID: <5ad5f956-7533-451b-9815-1710713ee334@gmail.com>
References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org>
In-Reply-To: <83ttpacfps.fsf@gnu.org>
To: Eli Zaretskii
Cc: 66674@debbugs.gnu.org, dominik@honnef.co

On 11/25/23 2:03 AM, Eli Zaretskii wrote:
> Ping! Ping! Yuan, please chime in.
>
>> Cc: 66674@debbugs.gnu.org, dominik@honnef.co
>> Date: Sun, 19 Nov 2023 12:08:08 +0200
>> From: Eli Zaretskii
>>
>> Ping! Yuan, any comments?
>>
>>> Cc: 66674@debbugs.gnu.org
>>> Date: Wed, 25 Oct 2023 16:03:10 +0300
>>> From: Eli Zaretskii
>>>
>>>> From: Dominik Honnef
>>>> Date: Sat, 21 Oct 2023 22:36:30 +0200
>>>>
>>>> Using tree-sitter's CLI as well as the publicly hosted playground
>>>> produces different parse trees than treesit in Emacs. Specifically, the
>>>> assignment of nodes to named fields differs.
>>>>
>>>> Given the following C source:
>>>>
>>>> void main() {
>>>>   int x = // foo
>>>>     1+
>>>>     // comment
>>>>     2;
>>>> }
>>>>
>>>> treesit-explore-mode displays the following tree:
>>>>
>>>> (translation_unit
>>>>  (function_definition type: (primitive_type)
>>>>   declarator:
>>>>    (function_declarator declarator: (identifier)
>>>>     parameters: (parameter_list ( )))
>>>>   body:
>>>>    (compound_statement {
>>>>     (declaration type: (primitive_type)
>>>>      declarator:
>>>>       (init_declarator declarator: (identifier) = value: (comment)
>>>>        (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
>>>>      ;)
>>>>     })))
>>>>
>>>> Note how in the init_declarator node, the 'value' field is a comment
>>>> node, and similarly for the 'right' field in the binary_expression node.
>>>>
>>>> Running 'tree-sitter parse file.c', on the other hand, produces the
>>>> following tree:
>>>>
>>>> (translation_unit [0, 0] - [6, 0]
>>>>   (function_definition [0, 0] - [5, 1]
>>>>     type: (primitive_type [0, 0] - [0, 4])
>>>>     declarator: (function_declarator [0, 5] - [0, 11]
>>>>       declarator: (identifier [0, 5] - [0, 9])
>>>>       parameters: (parameter_list [0, 9] - [0, 11]))
>>>>     body: (compound_statement [0, 12] - [5, 1]
>>>>       (declaration [1, 2] - [4, 6]
>>>>         type: (primitive_type [1, 2] - [1, 5])
>>>>         declarator: (init_declarator [1, 6] - [4, 5]
>>>>           declarator: (identifier [1, 6] - [1, 7])
>>>>           (comment [1, 10] - [1, 16])
>>>>           value: (binary_expression [2, 4] - [4, 5]
>>>>             left: (number_literal [2, 4] - [2, 5])
>>>>             (comment [3, 4] - [3, 14])
>>>>             right: (number_literal [4, 4] - [4, 5])))))))
>>>>
>>>> Here, the two comment nodes appear as unnamed nodes. IMHO the second
>>>> tree is a more useful one, as the named fields contain the semantically
>>>> important subtrees (e.g. a binary expression is made up of a left and
>>>> right subtree, not a left subtree, a right comment, and then some
>>>> unnamed subtree.)
>>>>
>>>> Emacs's tree makes writing queries less convenient, as instead of being
>>>> able to refer to well-defined names, one has to rely on child indices to
>>>> account for comments.
>>>>
>>>>
>>>> Further mismatch arises from repeated fields and separators.
>>>>
>>>> Consider the following Go source:
>>>>
>>>> package pkg
>>>>
>>>> var a, b, c = 1, 2, 3
>>>>
>>>> treesit-explore-mode displays the following tree:
>>>>
>>>> (source_file
>>>>  (package_clause package (package_identifier))
>>>>  \n
>>>>  (var_declaration var
>>>>   (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
>>>>    (expression_list (int_literal) , (int_literal) , (int_literal))))
>>>>  \n)
>>>>
>>>> Here, the var_spec node has two fields named 'name' even though the
>>>> source specifies three names.
>>>> Furthermore, the second 'name', as well as 'value', are set to the ','
>>>> separator between identifiers. Two of the three identifiers aren't named.
>>>>
>>>> 'tree-sitter parse file.go', on the other hand, produces this more
>>>> accurate tree:
>>>>
>>>> (source_file [0, 0] - [2, 21]
>>>>   (package_clause [0, 0] - [0, 11]
>>>>     (package_identifier [0, 8] - [0, 11]))
>>>>   (var_declaration [2, 0] - [2, 21]
>>>>     (var_spec [2, 4] - [2, 21]
>>>>       name: (identifier [2, 4] - [2, 5])
>>>>       name: (identifier [2, 7] - [2, 8])
>>>>       name: (identifier [2, 10] - [2, 11])
>>>>       value: (expression_list [2, 14] - [2, 21]
>>>>         (int_literal [2, 14] - [2, 15])
>>>>         (int_literal [2, 17] - [2, 18])
>>>>         (int_literal [2, 20] - [2, 21])))))
>>>>
>>>> This reproduces with 29.1 as well as 30.0.50.
>>> Yuan, any comments or suggestions?

Sorry sorry sorry, another missed report. I think this is a bug in
treesit-explore-mode, and I'll work on fixing it!

Yuan
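
To make the reporter's point about field names versus child indices concrete, here is a minimal sketch in plain Python. It does not call tree-sitter or treesit and says nothing about where the bug lives. The `Child` tuple and both helper functions are hypothetical names made up for this illustration, and the two child lists are simply transcribed from the binary_expression trees quoted in the report.

```python
from typing import List, NamedTuple, Optional


class Child(NamedTuple):
    """One child of a node: its field name (if any) and its node type."""
    field: Optional[str]
    kind: str


# Children of binary_expression as shown by 'tree-sitter parse' in the report.
# The CLI output quoted there does not list the "+" token at all.
CLI_CHILDREN = [
    Child("left", "number_literal"),
    Child(None, "comment"),
    Child("right", "number_literal"),
]

# The same node as treesit-explore-mode displays it in the report.
EMACS_CHILDREN = [
    Child("left", "number_literal"),
    Child("operator", "+"),
    Child("right", "comment"),
    Child(None, "number_literal"),
]


def kinds_for_field(children: List[Child], name: str) -> List[str]:
    """Return the node types of every child labelled with field `name`."""
    return [c.kind for c in children if c.field == name]


def operand_kinds(children: List[Child]) -> List[str]:
    """Positional workaround: ignore field names, skip comments and "+"."""
    return [c.kind for c in children if c.kind not in ("comment", "+")]


print(kinds_for_field(CLI_CHILDREN, "right"))    # ['number_literal']
print(kinds_for_field(EMACS_CHILDREN, "right"))  # ['comment']
print(operand_kinds(EMACS_CHILDREN))             # ['number_literal', 'number_literal']
```

With the CLI labelling, asking for 'right' yields the operand directly. With the labelling shown by treesit-explore-mode, the same lookup yields the comment, so a caller has to fall back on filtering children by position and type.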