From: Dominik Honnef
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sun, 10 Dec 2023 15:28:38 +0100
Message-ID: <87r0jujfmx.fsf@honnef.co>
References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org> <5ad5f956-7533-451b-9815-1710713ee334@gmail.com>
In-Reply-To: <5ad5f956-7533-451b-9815-1710713ee334@gmail.com>
To: Yuan Fu, Eli Zaretskii
Cc: 66674@debbugs.gnu.org

Yuan Fu writes:

> On 11/25/23 2:03 AM, Eli Zaretskii wrote:
>> Ping! Ping! Yuan, please chime in.
>>
>>> Cc: 66674@debbugs.gnu.org, dominik@honnef.co
>>> Date: Sun, 19 Nov 2023 12:08:08 +0200
>>> From: Eli Zaretskii
>>>
>>> Ping! Yuan, any comments?
>>>
>>>> Cc: 66674@debbugs.gnu.org
>>>> Date: Wed, 25 Oct 2023 16:03:10 +0300
>>>> From: Eli Zaretskii
>>>>
>>>>> From: Dominik Honnef
>>>>> Date: Sat, 21 Oct 2023 22:36:30 +0200
>>>>>
>>>>> tree-sitter's CLI as well as the publicly hosted playground
>>>>> produce different parse trees than treesit in Emacs. Specifically, the
>>>>> assignment of nodes to named fields differs.
>>>>>
>>>>> Given the following C source:
>>>>>
>>>>> void main() {
>>>>>   int x = // foo
>>>>>     1+
>>>>>     // comment
>>>>>     2;
>>>>> }
>>>>>
>>>>> treesit-explore-mode displays the following tree:
>>>>>
>>>>> (translation_unit
>>>>>  (function_definition type: (primitive_type)
>>>>>   declarator:
>>>>>    (function_declarator declarator: (identifier)
>>>>>     parameters: (parameter_list ( )))
>>>>>   body:
>>>>>    (compound_statement {
>>>>>     (declaration type: (primitive_type)
>>>>>      declarator:
>>>>>       (init_declarator declarator: (identifier) = value: (comment)
>>>>>        (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
>>>>>      ;)
>>>>>     })))
>>>>>
>>>>> Note how in the init_declarator node, the 'value' field is a comment
>>>>> node, and similarly for the 'right' field in the binary_expression node.
>>>>>
>>>>> Running 'tree-sitter parse file.c', on the other hand, produces the
>>>>> following tree:
>>>>>
>>>>> (translation_unit [0, 0] - [6, 0]
>>>>>   (function_definition [0, 0] - [5, 1]
>>>>>     type: (primitive_type [0, 0] - [0, 4])
>>>>>     declarator: (function_declarator [0, 5] - [0, 11]
>>>>>       declarator: (identifier [0, 5] - [0, 9])
>>>>>       parameters: (parameter_list [0, 9] - [0, 11]))
>>>>>     body: (compound_statement [0, 12] - [5, 1]
>>>>>       (declaration [1, 2] - [4, 6]
>>>>>         type: (primitive_type [1, 2] - [1, 5])
>>>>>         declarator: (init_declarator [1, 6] - [4, 5]
>>>>>           declarator: (identifier [1, 6] - [1, 7])
>>>>>           (comment [1, 10] - [1, 16])
>>>>>           value: (binary_expression [2, 4] - [4, 5]
>>>>>             left: (number_literal [2, 4] - [2, 5])
>>>>>             (comment [3, 4] - [3, 14])
>>>>>             right: (number_literal [4, 4] - [4, 5])))))))
>>>>>
>>>>> Here, the two comment nodes appear as unnamed nodes. IMHO the second
>>>>> tree is a more useful one, as the named fields contain the semantically
>>>>> important subtrees (e.g. a binary expression is made up of a left and
>>>>> right subtree, not a left subtree, a right comment, and then some
>>>>> unnamed subtree.)
>>>>>
>>>>> Emacs's tree makes writing queries less convenient, as instead of being
>>>>> able to refer to well-defined names, one has to rely on child indices to
>>>>> account for comments.
>>>>>
>>>>>
>>>>> Further mismatch arises from repeated fields and separators.
>>>>>
>>>>> Consider the following Go source:
>>>>>
>>>>> package pkg
>>>>>
>>>>> var a, b, c = 1, 2, 3
>>>>>
>>>>> treesit-explore-mode displays the following tree:
>>>>>
>>>>> (source_file
>>>>>  (package_clause package (package_identifier))
>>>>>  \n
>>>>>  (var_declaration var
>>>>>   (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
>>>>>    (expression_list (int_literal) , (int_literal) , (int_literal))))
>>>>>  \n)
>>>>>
>>>>> Here, the var_spec node has two fields named 'name' even though the
>>>>> source specifies three names. Furthermore, the second 'name', as well as
>>>>> 'value', are set to the ',' separator between identifiers. Two of the three
>>>>> identifiers aren't named.
>>>>>
>>>>> 'tree-sitter parse file.go', on the other hand, produces this more
>>>>> accurate tree:
>>>>>
>>>>> (source_file [0, 0] - [2, 21]
>>>>>   (package_clause [0, 0] - [0, 11]
>>>>>     (package_identifier [0, 8] - [0, 11]))
>>>>>   (var_declaration [2, 0] - [2, 21]
>>>>>     (var_spec [2, 4] - [2, 21]
>>>>>       name: (identifier [2, 4] - [2, 5])
>>>>>       name: (identifier [2, 7] - [2, 8])
>>>>>       name: (identifier [2, 10] - [2, 11])
>>>>>       value: (expression_list [2, 14] - [2, 21]
>>>>>         (int_literal [2, 14] - [2, 15])
>>>>>         (int_literal [2, 17] - [2, 18])
>>>>>         (int_literal [2, 20] - [2, 21])))))
>>>>>
>>>>> This reproduces with 29.1 as well as 30.0.50.
>>>> Yuan, any comments or suggestions?
>
> Sorry sorry sorry, another missed report. I think this is a bug in
> treesit-explore-mode, I'll work on fixing it!
>
> Yuan

I don't think that's the case, at least not exclusively. I used
treesit-explore-mode to debug patterns that matched in the playground
but not in Emacs. The matching behavior seemed pretty in line with what
treesit-explore-mode reported.
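
For anyone who wants to check this without going through
treesit-explore-mode, the following is a minimal sketch, not a
definitive test. It assumes a tree-sitter C grammar is installed, and
the bug66674-* function names are hypothetical helpers made up for
illustration. It prints the field name that treesit-node-field-name
reports for each child of the init_declarator node, next to the node
types captured by a field-based query, so both can be compared against
the 'tree-sitter parse' output quoted above.

```elisp
;; Diagnostic sketch.  The bug66674-* names are hypothetical helpers,
;; not part of Emacs.  Assumes a tree-sitter C grammar is installed.
(require 'treesit)

(defun bug66674-child-fields (node)
  "Return (FIELD-NAME . NODE-TYPE) for every direct child of NODE.
FIELD-NAME is nil for children that Emacs reports as having no field."
  (mapcar (lambda (child)
            (cons (treesit-node-field-name child)
                  (treesit-node-type child)))
          (treesit-node-children node)))

(defun bug66674-report ()
  "Parse the C example from the report and print fields and captures.
Return nil without doing anything when no C grammar is available."
  (when (and (fboundp 'treesit-language-available-p)
             (treesit-language-available-p 'c))
    (with-temp-buffer
      (insert "void main() {\n  int x = // foo\n    1+\n    // comment\n    2;\n}\n")
      (let* ((root (treesit-parser-root-node (treesit-parser-create 'c)))
             ;; First node whose type is exactly init_declarator.
             (decl (treesit-search-subtree root "\\`init_declarator\\'"))
             ;; Capture whatever node the query engine considers to be
             ;; in the `value' field of an init_declarator.
             (captures (treesit-query-capture
                        root '((init_declarator value: (_) @value)))))
        (message "fields:   %S" (bug66674-child-fields decl))
        (message "captures: %S"
                 (mapcar (lambda (c) (treesit-node-type (cdr c)))
                         captures))
        t))))

(bug66674-report)
```

If the two lines of output disagree with each other, that would point
at the node API (and therefore at treesit-explore-mode) rather than at
query matching; if they agree with each other but not with the CLI, the
problem sits lower down.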