From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date: Mon, 27 Feb 2023 14:37:22 -0800
Message-ID: <4FF7BE3E-A966-4419-AA22-BAF57D1EF792@gmail.com>
To: Mickey Petersen
Cc: Po Lu, Eli Zaretskii, 60237@debbugs.gnu.org
In-Reply-To: <871qmbw55n.fsf@masteringemacs.org>

> On Feb 27, 2023, at 6:29 AM, Mickey Petersen wrote:
>
>
> Yuan Fu writes:
>
>>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen wrote:
>>>
>>>
>>> Yuan Fu writes:
>>>
>>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen wrote:
>>>>>
>>>>>
>>>>> Yuan Fu writes:
>>>>>
>>>>>>> GC has historically never called xmalloc, so the profiler will likely
>>>>>>> crash upon growing the mark stack as well. I guess another important
>>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>>
>>>>>>
>>>>>>> As you see, when we call ts_tree_delete, it calls ts_subtree_release,
>>>>>>> which in turn calls malloc (redirected into our xmalloc). Is this
>>>>>>> expected? Can you look in the tree-sitter sources and verify that
>>>>>>> this is OK?
>>>>>>
>>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>>> pointer to the complete data. This way multiple trees share common
>>>>>> subtrees/nodes.
>>>>>> E.g., when incrementally parsing, you pass in an old
>>>>>> tree and get a new tree, and these two trees will share the unchanged
>>>>>> part of the tree.
>>>>>
>>>>> Would that mean we could possibly preserve node instances -- either
>>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>>> incremental parsing? That would be useful for refactoring.
>>>>
>>>> What kind of exact interface (function) do you want? The
>>>> treesit-node-outdated error is solely Emacs’s product; tree-sitter
>>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>>> delete the old tree and give it to you, or allow you to access
>>>> information of an outdated node.
>>>
>>> OK, so let me explain:
>>>
>>> Touching the buffer for any reason invalidates the whole tree; that's
>>> not good. It's not good, because a lot of the information may still be
>>> useful and viable. Outdating the node is not a bad idea as it avoids a
>>> lot of 'traps' around accidental modifications that can corrupt things
>>> without the developer's knowledge.
>>>
>>> I'd like to be able to access all the information possible; perhaps
>>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>>> I'm really mostly interested in is:
>>>
>>> - How well the node references handle changes in byte positions in TS.
>>
>> They don’t handle position changes. If the buffer content changes, we
>> need to reparse. Once we reparse the buffer, a new tree is
>> born. While it is true that the new tree shares some nodes with the old
>> tree, tree-sitter does not expose any function or information that
>> tells you which node in the new tree is “the same” as which node in
>> the old tree; nor does it tell you whether a node in the old tree
>> still “exists” in the new tree.
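To make the sharing described above concrete, here is a minimal C sketch under stated assumptions. Each node carries a reference count, a new tree reuses an unchanged subtree by taking another reference to it, and releasing a tree frees only the nodes whose count drops to zero. The release walk uses a heap-allocated work stack rather than recursion, which is one plausible way a delete path ends up calling malloc. The type and function names (node, node_new, node_retain, node_release, live_nodes) are hypothetical. This is not tree-sitter's actual Subtree layout or its ts_subtree_release code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; NOT tree-sitter's real Subtree layout. */
typedef struct node {
    unsigned ref_count;      /* how many trees/parents hold this node */
    unsigned child_count;
    struct node **children;  /* one owned reference per child */
} node;

int live_nodes = 0;  /* lets a caller observe how many nodes are allocated */

/* Create a node with count 1; it takes over the caller's child references. */
node *node_new(node **children, unsigned child_count) {
    node *n = malloc(sizeof *n);
    if (!n) abort();
    n->ref_count = 1;
    n->child_count = child_count;
    n->children = NULL;
    if (child_count > 0) {
        n->children = malloc(child_count * sizeof *n->children);
        if (!n->children) abort();
        for (unsigned i = 0; i < child_count; i++)
            n->children[i] = children[i];
    }
    live_nodes++;
    return n;
}

/* Share an existing subtree with another tree by taking a reference. */
node *node_retain(node *n) {
    n->ref_count++;
    return n;
}

/* Drop one reference to root and free every node whose count reaches 0.
   The walk uses an explicit heap-allocated stack instead of recursion,
   so releasing a tree itself calls malloc/realloc. */
void node_release(node *root) {
    size_t cap = 16, len = 0;
    node **stack = malloc(cap * sizeof *stack);
    if (!stack) abort();
    stack[len++] = root;
    while (len > 0) {
        node *n = stack[--len];
        assert(n->ref_count > 0);
        if (--n->ref_count > 0)
            continue;  /* still shared with another tree: keep it */
        for (unsigned i = 0; i < n->child_count; i++) {
            if (len == cap) {
                node **grown = realloc(stack, 2 * cap * sizeof *stack);
                if (!grown) abort();
                stack = grown;
                cap *= 2;
            }
            stack[len++] = n->children[i];
        }
        free(n->children);
        free(n);
        live_nodes--;
    }
    free(stack);
}
```

Consider an old root with children a and b, and a new root with children c and a retained b. Releasing the old root frees the old root and a. Node b survives until the new root is released as well.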
>>
>> Now, there does exist a function (in tree-sitter’s API) that allows
>> you to “edit” a node with position changes. But a) I’m not sure how
>> it handles the case where the node is deleted by the change, and b)
>> it is not very useful, because once you reparse the buffer, the new
>> tree is completely independent from the old tree (ignoring
>> implementation details that are not exposed).
>>
>>>
>>> - Does changing something at X shift (like a `point-marker`) everything
>>> below it? Does an outdated node correctly reference its new location
>>> and state, such as changes to children or its position in the tree?
>>
>> Like I said above, any buffer change will create a new tree with no
>> relation to the old tree, so there is no shifting.
>>
>> And there really isn’t a “new location”: we don’t know if the old node
>> is still in the new tree. Mind you, even if the node is completely
>> outside of the changed region, it can still disappear from the new
>> tree because of a change in its surrounding context. For example, take
>> the following C code:
>>
>> /*
>> int c = 1;
>>
>> If I insert a closing comment delimiter, the buffer becomes
>>
>> /*
>> int c = 1;
>> */
>>
>> Even though int c = 1; is not in the changed range, nor did its
>> position move, all those nodes (int, c, =, etc.) are no longer in the
>> new tree, because the whole thing becomes a comment.
>>
>> I made accessing an outdated node signal an error because there really
>> isn’t any good reason to use one; at least, I didn’t think of any at
>> the time. Making it an error should help people catch mistakes.
>>
>>>
>>> Right now, Combobulate can make a proxy node, which essentially
>>> captures the basics of a live node and stores it in a defstruct. That
>>> way I can at least retain the start/end, type, text, etc.
>>> of a node and still do light refactoring without contorting myself to
>>> do things in a particular order, which is not always possible (like
>>> delaying editing to the very end.)
>>
>> IIUC, you want to make some very minor whitespace edits to the buffer
>> that don’t really change the parse tree, so you don’t want the
>> nodes to be invalidated for no good reason? Not erroring on outdated
>> nodes is easy. As you said, we can add a
>> treesit-inhibit-error-outdated variable. But it’s not so easy to
>> automatically update outdated nodes’ positions (with the aforementioned
>> tree-sitter function). However, if you are making those changes, you
>> must know how to adjust your nodes’ positions, right? So maybe it isn’t
>> a must-have for your purpose.
>
> It's a good point, but it's also easy to create a scenario where you
> at least want to keep the position and esp. the type and text (for
> reporting information to the user, or similar.)

I should be clearer. I meant that treesit-inhibit-error-outdated is
reasonable and easy to implement, so if you want, we can add it. OTOH,
auto-updating outdated nodes with position information is nontrivial,
and might not be a must-have for your purpose.

>
> My main interest is now refactoring and how to best do it. If TS can
> do some of it, then all the better. I realise it was never meant to,
> but if we can continue accessing the information contained in a node
> even if it is outdated, then that could be useful, however niche.

I guess “refactoring” includes not only whitespace changes but also
some structural changes like slurping (or whatever it’s called), right?
If you want to do structural changes, tree-sitter probably can’t help
you much, as you observed. Maybe it’s better to “export” the
tree-sitter tree to your own tree and do transformations with it?
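The proxy-node and “export” ideas can be sketched in C as follows. The sketch assumes byte offsets and one replacement edit at a time. A detached copy of a node's start, end, and type stays usable after the buffer changes, and whoever makes the edit shifts the copy's positions, roughly as a marker would. The names proxy_node, text_edit, shift_pos, proxy_make, and proxy_adjust are hypothetical. The shifting rule is modeled loosely on the position-editing function mentioned above (tree-sitter's ts_node_edit), not copied from it. The sketch also cannot notice that a node vanished because its context changed, as in the comment example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical detached copy of a node's basic facts (not a treesit API). */
typedef struct {
    size_t start, end;  /* byte offsets; end is exclusive */
    char type[32];      /* node type name, e.g. "declaration" */
    bool deleted;       /* set once an edit swallowed the whole node */
} proxy_node;

/* One buffer edit: bytes [start, old_end) were replaced by [start, new_end). */
typedef struct {
    size_t start, old_end, new_end;
} text_edit;

/* Map one old position to its position after the edit. */
size_t shift_pos(size_t pos, const text_edit *e) {
    if (pos >= e->old_end)              /* at or after the replaced range */
        return e->new_end + (pos - e->old_end);
    if (pos > e->start)                 /* strictly inside the replaced range */
        return e->new_end;
    return pos;                         /* before the edit: unchanged */
}

/* Build a proxy from values read off a live node. */
proxy_node proxy_make(size_t start, size_t end, const char *type) {
    proxy_node p = { start, end, "", false };
    strncpy(p.type, type, sizeof p.type - 1);  /* p.type stays NUL-terminated */
    return p;
}

/* Update a proxy after an edit the caller made itself. A proxy lying
   entirely inside a non-empty replaced range is flagged as deleted.
   Nothing here can tell whether a reparse would still produce the node. */
void proxy_adjust(proxy_node *p, const text_edit *e) {
    assert(e->old_end >= e->start && e->new_end >= e->start);
    if (p->deleted)
        return;
    if (e->old_end > e->start && p->start >= e->start && p->end <= e->old_end) {
        p->deleted = true;
        return;
    }
    p->start = shift_pos(p->start, e);
    p->end = shift_pos(p->end, e);
}
```

Take a declaration spanning [0, 10). Inserting two bytes before it moves it to [2, 12). A later edit that replaces the whole [0, 12) range marks the proxy as deleted rather than guessing a new location.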
Maybe that’s already what you do now.

> Currently I use overlays and point markers, but they are not
> infallible.

Yuan