From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 01:05:49 -0800
Message-ID: <8A0520AE-7C8C-43D2-BE93-E80D5CC8856C@gmail.com>
References: <9FCDA5B7-D216-45B1-8051-35B05633BEFB@gmail.com> <83sfeukwsb.fsf@gnu.org> <574817C4-3FD8-43EA-B53C-B2BCB60A6D0A@gmail.com> <87a610wyod.fsf@masteringemacs.org> <875ybnwm2r.fsf@masteringemacs.org>
To: Mickey Petersen
Cc: Po Lu, Eli Zaretskii, 60237@debbugs.gnu.org

> On Feb 27, 2023, at 12:22 AM, Mickey Petersen wrote:
>
> Yuan Fu writes:
>
>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen wrote:
>>>
>>> Yuan Fu writes:
>>>
>>>>> GC has historically never called xmalloc, so the profiler will likely
>>>>> crash upon growing the mark stack as well. I guess another important
>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>
>>>>
>>>>> As you see, when we call ts_tree_delete, it calls ts_subtree_release,
>>>>> which in turn calls malloc (redirected into our xmalloc). Is this
>>>>> expected? Can you look in the tree-sitter sources and verify that
>>>>> this is OK?
>>>>
>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>> pointer to the complete data. This way multiple trees share common
>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>> tree and get a new tree, these two trees will share the unchanged part
>>>> of the tree.
>>>
>>> Would that mean we could possibly preserve node instances -- either
>>> the real TS ones, or an Emacs-created facsimile -- between
>>> incremental parsing? That would be useful for refactoring.
>>
>> What kind of exact interface (function) do you want? The
>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>> delete the old tree and give it to you, or allow you to access
>> information of an outdated node.
>
> OK, so let me explain:
>
> Touching the buffer for any reason invalidates the whole tree; that's
> not good. It's not good, because a lot of the information may still be
> useful and viable. Outdating the node is not a bad idea as it avoids a
> lot of 'traps' around accidental modifications that can corrupt things
> without the developer's knowledge.
>
> I'd like to be able to access all the information possible; perhaps
> behind a flag variable like `treesit-allow-outdated-node-access'. What
> I'm really mostly interested in is:
>
> - How well the node references handle changes in byte positions in TS.

They don’t handle position changes. If the buffer content changes, we need to reparse, and once we reparse the buffer, a new tree is born. While it is true that the new tree shares some nodes with the old tree, tree-sitter does not expose any function or information that tells you which node in the new tree is “the same” as which node in the old tree, nor does it tell you whether a node in the old tree still “exists” in the new tree.

Now, there does exist a function in tree-sitter’s API that allows you to “edit” a node with position changes.
But a) I’m not sure how that function handles the case where the node is deleted by the change, and b) it is not very useful, because once you reparse the buffer, the new tree is completely independent of the old tree (ignoring implementation details that are not exposed).

>
> - Does changing something at X shift (like a `point-marker`) everything
> below it? Does an outdated node correctly reference its new location
> and state, such as changes to children or its position in the tree?

Like I said above, any buffer change creates a new tree with no relation to the old tree, so there is no shifting. And there really isn’t a “new location”: we don’t know whether the old node is still in the new tree. Mind you, even if the node is completely outside the changed region, it can still disappear from the new tree because of a change in its surrounding context. For example, take the following C code:

    /*
    int c = 1;

If I insert a closing comment delimiter, the buffer becomes

    /*
    int c = 1;
    */

Even though “int c = 1;” is not in the changed range, and its position did not move, all of its nodes (int, c, =, etc.) are no longer in the new tree, because the whole thing has become a comment.

I made any access to outdated nodes an error because there really isn’t any good reason to use them; at least I didn’t think of any at the time. Making them error out should also help people catch bugs.

>
> Right now, Combobulate can make a proxy node, which essentially
> captures the basics of a live node and stores it in a defstruct. That
> way I can at least retain the start/end, type, text, etc. of a node
> and still do light refactoring without contorting myself to do things
> in a particular order, which is not always possible (like delaying
> editing to the very end.)
IIUC, you want to make some very minor whitespace edits to the buffer that don’t really change the parse tree, so you don’t want the nodes to be invalidated for no good reason? Not erroring on outdated nodes is easy: as you said, we can add a treesit-inhibit-error-outdated variable. But it’s not so easy to automatically update outdated nodes’ positions (with the aforementioned tree-sitter function). However, if you are making those changes, you must know how to adjust your nodes’ positions, right? So maybe it isn’t a must-have for your purpose.

Yuan