From: Dominik Honnef <dominik@honnef.co>
To: 66674@debbugs.gnu.org
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sat, 21 Oct 2023 22:36:30 +0200
Message-ID: <87edhnzp9t.fsf@honnef.co>

tree-sitter's CLI and the publicly hosted playground produce different
parse trees than treesit in Emacs. Specifically, the assignment of
nodes to named fields differs.

Given the following C source:

    void main() {
      int x = // foo
        1+
        // comment
        2;
    }

treesit-explore-mode displays the following tree:

    (translation_unit
     (function_definition type: (primitive_type)
      declarator: 
       (function_declarator declarator: (identifier)
        parameters: (parameter_list ( )))
      body: 
       (compound_statement {
        (declaration type: (primitive_type)
         declarator: 
          (init_declarator declarator: (identifier) = value: (comment)
           (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
         ;)
        })))

Note how in the init_declarator node, the 'value' field is a comment
node, and similarly for the 'right' field in the binary_expression node.

Running 'tree-sitter parse file.c', on the other hand, produces the
following tree:

    (translation_unit [0, 0] - [6, 0]
      (function_definition [0, 0] - [5, 1]
        type: (primitive_type [0, 0] - [0, 4])
        declarator: (function_declarator [0, 5] - [0, 11]
          declarator: (identifier [0, 5] - [0, 9])
          parameters: (parameter_list [0, 9] - [0, 11]))
        body: (compound_statement [0, 12] - [5, 1]
          (declaration [1, 2] - [4, 6]
            type: (primitive_type [1, 2] - [1, 5])
            declarator: (init_declarator [1, 6] - [4, 5]
              declarator: (identifier [1, 6] - [1, 7])
              (comment [1, 10] - [1, 16])
              value: (binary_expression [2, 4] - [4, 5]
                left: (number_literal [2, 4] - [2, 5])
                (comment [3, 4] - [3, 14])
                right: (number_literal [4, 4] - [4, 5])))))))

Here, the two comment nodes appear as children without a field name.
IMHO the second tree is the more useful one, as the named fields contain
the semantically important subtrees (e.g. a binary expression is made up
of a left and a right subtree, not a left subtree, a right comment, and
then some unnamed subtree).

Emacs's tree makes writing queries less convenient, as instead of being
able to refer to well-defined names, one has to rely on child indices to
account for comments.
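
As a rough illustration of that inconvenience, here is a small Python
sketch. It does not call tree-sitter or Emacs at all: the two lists are
simply the children of the binary_expression node, transcribed from the
two trees above as (field, type) pairs, and child_by_field is a
hypothetical helper that mimics a lookup by field name.

```python
# Children of the binary_expression node as (field, type) pairs,
# transcribed from the two trees above. None means "no field name".
# The CLI output does not print anonymous nodes such as "+", so the
# CLI list leaves them out as well.
EMACS_CHILDREN = [
    ("left", "number_literal"),
    ("operator", "+"),
    ("right", "comment"),
    (None, "number_literal"),
]
CLI_CHILDREN = [
    ("left", "number_literal"),
    (None, "comment"),
    ("right", "number_literal"),
]


def child_by_field(children, field):
    """Return the type of the first child labelled FIELD, or None."""
    for name, node_type in children:
        if name == field:
            return node_type
    return None


print(child_by_field(CLI_CHILDREN, "right"))    # number_literal
print(child_by_field(EMACS_CHILDREN, "right"))  # comment
# With the Emacs labelling, the actual right operand can only be
# reached by position, here as the last child.
print(EMACS_CHILDREN[-1][1])                    # number_literal
```

With the CLI's labelling, asking for 'right' is enough. With the Emacs
labelling, the same lookup returns the comment, and the operand has to
be found by index.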


A further mismatch arises from repeated fields and separators.

Consider the following Go source:

    package pkg
    
    var a, b, c = 1, 2, 3

treesit-explore-mode displays the following tree:

    (source_file
     (package_clause package (package_identifier))
     \n
     (var_declaration var
      (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
       (expression_list (int_literal) , (int_literal) , (int_literal))))
     \n)

Here, the var_spec node has two fields named 'name' even though the
source specifies three names. Furthermore, the second 'name' and the
'value' field are both assigned to a ',' separator between identifiers.
Two of the three identifiers have no field name.

'tree-sitter parse file.go', on the other hand, produces this more
accurate tree:

    (source_file [0, 0] - [2, 21]
      (package_clause [0, 0] - [0, 11]
        (package_identifier [0, 8] - [0, 11]))
      (var_declaration [2, 0] - [2, 21]
        (var_spec [2, 4] - [2, 21]
          name: (identifier [2, 4] - [2, 5])
          name: (identifier [2, 7] - [2, 8])
          name: (identifier [2, 10] - [2, 11])
          value: (expression_list [2, 14] - [2, 21]
            (int_literal [2, 14] - [2, 15])
            (int_literal [2, 17] - [2, 18])
            (int_literal [2, 20] - [2, 21])))))
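
The effect on repeated fields can be sketched the same way. Again, this
is only a Python model of the two var_spec nodes shown above, not a call
into tree-sitter or Emacs, and children_by_field is a hypothetical
helper that collects every child carrying a given field name.

```python
# Children of the var_spec node as (field, type) pairs, transcribed
# from the two trees above. None means "no field name". The CLI output
# does not print anonymous nodes ("," and "="), so they are left out.
EMACS_VAR_SPEC = [
    ("name", "identifier"),
    ("name", ","),
    (None, "identifier"),
    ("value", ","),
    (None, "identifier"),
    (None, "="),
    (None, "expression_list"),
]
CLI_VAR_SPEC = [
    ("name", "identifier"),
    ("name", "identifier"),
    ("name", "identifier"),
    ("value", "expression_list"),
]


def children_by_field(children, field):
    """Return the types of all children labelled FIELD, in order."""
    return [node_type for name, node_type in children if name == field]


print(children_by_field(CLI_VAR_SPEC, "name"))     # three identifiers
print(children_by_field(EMACS_VAR_SPEC, "name"))   # ['identifier', ',']
print(children_by_field(CLI_VAR_SPEC, "value"))    # ['expression_list']
print(children_by_field(EMACS_VAR_SPEC, "value"))  # [',']
```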

This reproduces with 29.1 as well as 30.0.50.