From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: Update on tree-sitter structure navigation
Date: Wed, 6 Sep 2023 15:47:42 +0300
References: <5E7F2A94-4377-45C0-8541-7F59F3B54BA1@gmail.com> <8a5b3b3e-f091-3f38-09d4-c4e26bec97f9@yandex.ru> <87o7igc80a.fsf@dfreeman.email>
In-Reply-To: <87o7igc80a.fsf@dfreeman.email>
To: Danny Freeman
Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith

On 06/09/2023 05:51, Danny Freeman wrote:
>
> Dmitry Gutov writes:
>
>> Hi Yuan,
>>
>> On 02/09/2023 08:01, Yuan Fu wrote:
>>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version
>>> number, so every time the author changes the grammar, our queries break, and loading the mode only
>>> produces a giant error.
>>
>> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser
>> repositories and the ref of the latest known good revision, for the current version of the major
>> mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly
>> to how auto-mode-alist is appended to).
>
> clojure-ts-mode keeps a URL for the parser, but doesn't do anything
> about the git revision. It easily could but I don't feel the need (yet)
> since I am also a maintainer of the clojure grammar and know when we're
> about to break grammar consumers.

Sure, that's easy enough to do when the package is only in ELPA: upgrade
the grammar, upgrade the package, all in lockstep. That changes if nixos
or other distros start distributing it as well: then you'll need to care
about a recent clojure-ts-mode being loaded with old versions of the
grammar.

> It's not quite that simple though. Some distributions (nixos for
> example) are already providing pre-compiled grammars. That is how I
> discovered a couple recent bugs in js-ts-mode, because the grammars
> distributed with nixos 23.05 no longer worked on Emacs 30 after a patch
> was applied that was supposed to be backwards compatible (a real pain to
> verify in my experience).

A helpful find. ;)

> With the way Emacs can load a grammar provided by the user's
> distribution, keeping information about the version of the grammar in
> the major mode doesn't help all that much. Even if we did it we have no
> idea what version might have been built in the user's
> .emacs.d/tree-sitter folder. That would require something like putting a
> version number in the file name, or maybe applying a patch to the
> grammar's C source that allowed us to get a version, SHA, something at
> runtime.

Well, it would at least allow the user to rebuild the grammar to the
version best known to work.

Also, perhaps if the mode tracks the changes in the hash over time, it
could see whether the grammar needs to be rebuilt.

Finally, treesit-install-language-grammar could track which revision was
last compiled.

So there is *something* we could do for the users who upgrade their
grammars from Git.

Grammars distributed from distros are more of a problem, because it's
not always a good idea to abort with "wrong version". But perhaps we
could do that and recommend installing from Git in such cases anyway?

Another problem is that grammars don't have good versioning, and even if
they did, we'd have to sometimes update the "upper bound" (we'd need
coarse ranges, right?
rather than one fixed version requirement) more frequently than Emacs is
released. Less of a problem for modes in ELPA, though.

> I'm not so sure we can have a great way to do this without a change to
> the tree-sitter libraries. I would love to see some kind of increasing
> version number generated in the grammar's C source that we could then
> access. It could be used to make decisions about what queries to use, or
> to warn the user they need to use a different grammar (maybe offering to
> install a compatible version).

Yes, that would be an improvement, maybe worth bringing up on the issue
tracker.

> Tree-sitter grammar changes are almost always breaking changes. Adding
> nodes can break things, re-naming them and removing them definitely can.
> I'm not sure any grammar consumer has a great way to deal with this
> without always compiling the exact grammar they need and only ever using
> it.

That's my conclusion as well for the time being.
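
For illustration, here is a minimal sketch of the "URL plus last known
good ref" idea discussed above, using only what Emacs 29 already ships:
treesit-language-source-alist accepts an optional revision after the
URL, and treesit-install-language-grammar builds from that recipe. The
mode name foo-ts-mode, the language symbol foo, the URL, the revision
string and the (identifier) node type are all hypothetical placeholders,
not a tested pin for any real grammar. The probe at the end is only one
possible stand-in for a real version check: it tries to compile a query
eagerly and treats a query error as "this grammar does not match".

```elisp
;; Hypothetical recipe: language symbol, repository URL, known-good ref.
;; All three values are placeholders for illustration only.
(defconst foo-ts-mode--grammar-recipe
  '(foo "https://example.org/tree-sitter-foo" "v0.1.0")
  "Grammar source and last known good revision for `foo-ts-mode'.")

;; Register the recipe, much like modes append to `auto-mode-alist'.
(with-eval-after-load 'treesit
  (add-to-list 'treesit-language-source-alist
               foo-ts-mode--grammar-recipe))

(defun foo-ts-mode--grammar-usable-p ()
  "Return non-nil if the loaded `foo' grammar accepts our query.
The query below is a placeholder; a real mode would probe with
the queries it actually uses."
  (and (treesit-language-available-p 'foo)
       (condition-case nil
           ;; A non-nil third argument compiles the query eagerly, so
           ;; a node-name mismatch signals `treesit-query-error' here.
           (progn (treesit-query-compile 'foo '((identifier) @id) t)
                  t)
         (treesit-query-error nil))))

;; To rebuild at the pinned revision:
;;   M-x treesit-install-language-grammar RET foo RET
```

Note that treesit-language-abi-version reports the tree-sitter ABI
version of a loaded grammar, not the grammar's own revision, so it does
not answer the "which version was built" question raised above. A probe
like this also cannot tell a distro-provided grammar from one built
from Git; it only says whether the queries compile.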