From: Dominik Honnef
Newsgroups: gmane.emacs.bugs
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sat, 21 Oct 2023 22:36:30 +0200
Message-ID: <87edhnzp9t.fsf@honnef.co>
To: 66674@debbugs.gnu.org

Both tree-sitter's CLI and the publicly hosted playground produce different parse trees than treesit in Emacs. Specifically, the assignment of nodes to named fields differs.
Given the following C source:

    void main() {
      int x = // foo
        1+
        // comment
        2;
    }

treesit-explore-mode displays the following tree:

    (translation_unit
     (function_definition type: (primitive_type)
      declarator:
       (function_declarator declarator: (identifier)
        parameters: (parameter_list ( )))
      body:
       (compound_statement {
        (declaration type: (primitive_type)
         declarator:
          (init_declarator declarator: (identifier) =
           value: (comment)
           (binary_expression left: (number_literal) operator: +
            right: (comment) (number_literal)))
         ;)
        })))

Note how in the init_declarator node, the 'value' field is a comment node, and similarly for the 'right' field in the binary_expression node.

Running 'tree-sitter parse file.c', on the other hand, produces the following tree:

    (translation_unit [0, 0] - [6, 0]
      (function_definition [0, 0] - [5, 1]
        type: (primitive_type [0, 0] - [0, 4])
        declarator: (function_declarator [0, 5] - [0, 11]
          declarator: (identifier [0, 5] - [0, 9])
          parameters: (parameter_list [0, 9] - [0, 11]))
        body: (compound_statement [0, 12] - [5, 1]
          (declaration [1, 2] - [4, 6]
            type: (primitive_type [1, 2] - [1, 5])
            declarator: (init_declarator [1, 6] - [4, 5]
              declarator: (identifier [1, 6] - [1, 7])
              (comment [1, 10] - [1, 16])
              value: (binary_expression [2, 4] - [4, 5]
                left: (number_literal [2, 4] - [2, 5])
                (comment [3, 4] - [3, 14])
                right: (number_literal [4, 4] - [4, 5])))))))

Here, the two comment nodes appear as children that are not assigned to any field.

IMHO the second tree is the more useful one, as the named fields contain the semantically important subtrees (e.g. a binary expression is made up of a left and a right subtree, not a left subtree, a right comment, and then some subtree without a field). Emacs's tree makes writing queries less convenient: instead of referring to well-defined field names, one has to rely on child indices to account for comments.

Further mismatches arise from repeated fields and separators.
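For the C example, the practical effect on field lookups can be modelled with a minimal Go sketch. It encodes only the two binary_expression shapes shown above as plain data; the Node type and the childByField and lastNonComment helpers are hypothetical and are not part of treesit or the tree-sitter API. The placement of the anonymous "+" token, which the CLI output hides, is an assumption based on source order.

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a parse-tree node, used only
// to model the two binary_expression shapes from the C example. It is not
// the treesit or tree-sitter API. Field is the field name the parent assigns
// to this child, or "" if the child has no field.
type Node struct {
	Type     string
	Field    string
	Children []*Node
}

// leaf builds a childless node with an optional field name.
func leaf(typ, field string) *Node {
	return &Node{Type: typ, Field: field}
}

// childByField returns the first child assigned to field, or nil.
func childByField(n *Node, field string) *Node {
	for _, c := range n.Children {
		if c.Field == field {
			return c
		}
	}
	return nil
}

// lastNonComment is a position-based workaround: it ignores fields and
// returns the last child whose type is not "comment", or nil.
func lastNonComment(n *Node) *Node {
	for i := len(n.Children) - 1; i >= 0; i-- {
		if n.Children[i].Type != "comment" {
			return n.Children[i]
		}
	}
	return nil
}

// binaryAsShownByTreesit mirrors treesit-explore-mode: 'right' is on the comment.
func binaryAsShownByTreesit() *Node {
	return &Node{Type: "binary_expression", Children: []*Node{
		leaf("number_literal", "left"),
		leaf("+", "operator"),
		leaf("comment", "right"),
		leaf("number_literal", ""),
	}}
}

// binaryAsShownByCLI mirrors 'tree-sitter parse': the comment has no field.
// The anonymous "+" is hidden in the CLI output; it is placed here by source order.
func binaryAsShownByCLI() *Node {
	return &Node{Type: "binary_expression", Children: []*Node{
		leaf("number_literal", "left"),
		leaf("+", "operator"),
		leaf("comment", ""),
		leaf("number_literal", "right"),
	}}
}

func main() {
	shapes := []struct {
		label string
		tree  *Node
	}{
		{"treesit", binaryAsShownByTreesit()},
		{"tree-sitter CLI", binaryAsShownByCLI()},
	}
	for _, s := range shapes {
		fmt.Printf("%s: right field = %s, last non-comment child = %s\n",
			s.label, childByField(s.tree, "right").Type, lastNonComment(s.tree).Type)
	}
}
```

In the first shape a lookup of 'right' lands on the comment, so a caller has to fall back to something like lastNonComment; in the second shape the field lookup alone is enough.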
Consider the following Go source:

    package pkg

    var a, b, c = 1, 2, 3

treesit-explore-mode displays the following tree:

    (source_file
     (package_clause package (package_identifier))
     \n
     (var_declaration var
      (var_spec name: (identifier) name: , (identifier)
       value: , (identifier) =
       (expression_list (int_literal) , (int_literal) , (int_literal))))
     \n)

Here, the var_spec node has only two children in the 'name' field even though the source declares three names. Furthermore, the second 'name' field, as well as the 'value' field, is attached to a ',' separator between identifiers, and two of the three identifiers aren't assigned to any field.

'tree-sitter parse file.go', on the other hand, produces this more accurate tree:

    (source_file [0, 0] - [2, 21]
      (package_clause [0, 0] - [0, 11]
        (package_identifier [0, 8] - [0, 11]))
      (var_declaration [2, 0] - [2, 21]
        (var_spec [2, 4] - [2, 21]
          name: (identifier [2, 4] - [2, 5])
          name: (identifier [2, 7] - [2, 8])
          name: (identifier [2, 10] - [2, 11])
          value: (expression_list [2, 14] - [2, 21]
            (int_literal [2, 14] - [2, 15])
            (int_literal [2, 17] - [2, 18])
            (int_literal [2, 20] - [2, 21])))))

This reproduces with both Emacs 29.1 and 30.0.50.