From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sun, 10 Dec 2023 17:02:48 -0800
References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org> <5ad5f956-7533-451b-9815-1710713ee334@gmail.com> <87r0jujfmx.fsf@honnef.co>
In-Reply-To: <87r0jujfmx.fsf@honnef.co>
To: Dominik Honnef, Eli Zaretskii
Cc: 66674-done@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Xref: news.gmane.io gmane.emacs.bugs:275954

On 12/10/23 6:28 AM, Dominik Honnef wrote:
> Yuan Fu writes:
>
>> On 11/25/23 2:03 AM, Eli Zaretskii wrote:
>>> Ping! Ping! Yuan, please chime in.
>>>
>>>> Cc: 66674@debbugs.gnu.org, dominik@honnef.co
>>>> Date: Sun, 19 Nov 2023 12:08:08 +0200
>>>> From: Eli Zaretskii
>>>>
>>>> Ping! Yuan, any comments?
>>>>
>>>>> Cc: 66674@debbugs.gnu.org
>>>>> Date: Wed, 25 Oct 2023 16:03:10 +0300
>>>>> From: Eli Zaretskii
>>>>>
>>>>>> From: Dominik Honnef
>>>>>> Date: Sat, 21 Oct 2023 22:36:30 +0200
>>>>>>
>>>>>> Using tree-sitter's CLI as well as the publicly hosted playground
>>>>>> produce different parse trees than treesit in Emacs. Specifically, the
>>>>>> assignment of nodes to named fields differs.
>>>>>>
>>>>>> Given the following C source:
>>>>>>
>>>>>> void main() {
>>>>>>   int x = // foo
>>>>>>     1+
>>>>>>     // comment
>>>>>>     2;
>>>>>> }
>>>>>>
>>>>>> treesit-explore-mode displays the following tree:
>>>>>>
>>>>>> (translation_unit
>>>>>>  (function_definition type: (primitive_type)
>>>>>>   declarator:
>>>>>>    (function_declarator declarator: (identifier)
>>>>>>     parameters: (parameter_list ( )))
>>>>>>   body:
>>>>>>    (compound_statement {
>>>>>>     (declaration type: (primitive_type)
>>>>>>      declarator:
>>>>>>       (init_declarator declarator: (identifier) = value: (comment)
>>>>>>        (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
>>>>>>      ;)
>>>>>>     })))
>>>>>>
>>>>>> Note how in the init_declarator node, the 'value' field is a comment
>>>>>> node, and similarly for the 'right' field in the binary_expression node.
>>>>>>
>>>>>> Running 'tree-sitter parse file.c', on the other hand, produces the
>>>>>> following tree:
>>>>>>
>>>>>> (translation_unit [0, 0] - [6, 0]
>>>>>>   (function_definition [0, 0] - [5, 1]
>>>>>>     type: (primitive_type [0, 0] - [0, 4])
>>>>>>     declarator: (function_declarator [0, 5] - [0, 11]
>>>>>>       declarator: (identifier [0, 5] - [0, 9])
>>>>>>       parameters: (parameter_list [0, 9] - [0, 11]))
>>>>>>     body: (compound_statement [0, 12] - [5, 1]
>>>>>>       (declaration [1, 2] - [4, 6]
>>>>>>         type: (primitive_type [1, 2] - [1, 5])
>>>>>>         declarator: (init_declarator [1, 6] - [4, 5]
>>>>>>           declarator: (identifier [1, 6] - [1, 7])
>>>>>>           (comment [1, 10] - [1, 16])
>>>>>>           value: (binary_expression [2, 4] - [4, 5]
>>>>>>             left: (number_literal [2, 4] - [2, 5])
>>>>>>             (comment [3, 4] - [3, 14])
>>>>>>             right: (number_literal [4, 4] - [4, 5])))))))
>>>>>>
>>>>>> Here, the two comment nodes appear as unnamed nodes. IMHO the second
>>>>>> tree is a more useful one, as the named fields contain the semantically
>>>>>> important subtrees (e.g. a binary expression is made up of a left and
>>>>>> right subtree, not a left subtree, a right comment, and then some
>>>>>> unnamed subtree.)
>>>>>>
>>>>>> Emacs's tree makes writing queries less convenient, as instead of being
>>>>>> able to refer to well-defined names, one has to rely on child indices to
>>>>>> account for comments.
>>>>>>
>>>>>>
>>>>>> Further mismatch arises from repeated fields and separators.
>>>>>>
>>>>>> Consider the following Go source:
>>>>>>
>>>>>> package pkg
>>>>>>
>>>>>> var a, b, c = 1, 2, 3
>>>>>>
>>>>>> treesit-explore-mode displays the following tree:
>>>>>>
>>>>>> (source_file
>>>>>>  (package_clause package (package_identifier))
>>>>>>  \n
>>>>>>  (var_declaration var
>>>>>>   (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
>>>>>>    (expression_list (int_literal) , (int_literal) , (int_literal))))
>>>>>>  \n)
>>>>>>
>>>>>> Here, the var_spec node has two fields named 'name' even though the
>>>>>> source specifies three names. Furthermore, The second 'name', as well as
>>>>>> 'value' are set to the ',' separator between identifiers. Two of the three
>>>>>> identifiers aren't named.
>>>>>>
>>>>>> 'tree-sitter parse file.go', on the other hand, produces this more
>>>>>> accurate tree:
>>>>>>
>>>>>> (source_file [0, 0] - [2, 21]
>>>>>>   (package_clause [0, 0] - [0, 11]
>>>>>>     (package_identifier [0, 8] - [0, 11]))
>>>>>>   (var_declaration [2, 0] - [2, 21]
>>>>>>     (var_spec [2, 4] - [2, 21]
>>>>>>       name: (identifier [2, 4] - [2, 5])
>>>>>>       name: (identifier [2, 7] - [2, 8])
>>>>>>       name: (identifier [2, 10] - [2, 11])
>>>>>>       value: (expression_list [2, 14] - [2, 21]
>>>>>>         (int_literal [2, 14] - [2, 15])
>>>>>>         (int_literal [2, 17] - [2, 18])
>>>>>>         (int_literal [2, 20] - [2, 21])))))
>>>>>>
>>>>>> This reproduces with 29.1 as well as 30.0.50.
>>>>> Yuan, any comments or suggestions?
>> Sorry sorry sorry, another missed report. I think this is a bug in
>> treesit-explore-mode, I'll work on fixing it!
>>
>> Yuan
> I don't think that's the case, at least not exclusively. I used
> treesit-explore-mode to debug patterns that matched in the playground
> but not in Emacs. The matching behavior seemed pretty in line with what
> treesit-explore-mode reported.

I do find that treesit-node-field-name is returning the wrong field
names. That's why, in the first example, you see the "value" field name
given to the comment node rather than to the binary_expression after
it. In the actual parse tree, "value" belongs to binary_expression.

With the fix I just pushed to emacs-29, the explorer parse tree for the
first example becomes

(translation_unit
 (function_definition type: (primitive_type)
  declarator:
   (function_declarator declarator: (identifier)
    parameters: (parameter_list ( )))
  body:
   (compound_statement {
    (declaration type: (primitive_type)
     declarator:
      (init_declarator declarator: (identifier) = (comment)
       value: (binary_expression left: (number_literal) operator: + operator: (comment) right: (number_literal)))
     ;)
    })))

which should match the playground. If you can find a pattern that
matches in the playground but doesn't in Emacs, please do post it and I
can see if there's anything wrong.

Yuan