From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Fri, 13 May 2022 09:34:28 +0300
Message-ID: <83ee0yndor.fsf@gnu.org>
In-Reply-To: <87mtfmk4oi.fsf@gmail.com> (message from Yoav Marco on Thu, 12 May 2022 20:22:30 +0300)
References: <87y1zabmbt.fsf@gmail.com>
 <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com>
 <875ymdwf76.fsf@gmail.com>
 <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com>
 <8735hhw75p.fsf@gmail.com> <83czgks4ss.fsf@gnu.org>
 <87wnesuw63.fsf@gmail.com> <83pmkkqhft.fsf@gnu.org>
 <87tu9wukbt.fsf@gmail.com> <83ee10qbk7.fsf@gnu.org>
 <8F6A43D1-D1EA-4602-A245-627DB7960FC2@gmail.com>
 <838rr7qqhw.fsf@gnu.org> <87sfpekf6t.fsf@gmail.com>
 <838rr6pwjt.fsf@gnu.org> <87pmkik7x6.fsf@gmail.com>
 <83wneqoej5.fsf@gnu.org> <87mtfmk4oi.fsf@gmail.com>
To: Yoav Marco
Cc: casouri@gmail.com, emacs-devel@gnu.org

> From: Yoav Marco
> Cc: casouri@gmail.com, emacs-devel@gnu.org
> Date: Thu, 12 May 2022 20:22:30 +0300
>
> > But maybe we should make this discussion more concrete. Can you show
> > the queries and explain how they are produced from the font-lock rules
> > (or whatever else they are produced from)? How many different queries
> > do we expect to have in a garden-variety major mode for a PL, and what
> > do they depend on?
>
> So first of all, query is kind of an aggregate term, since one query
> string/sexp can contain many "query patterns".
> I expect most major modes to have one big query string/sexp, and maybe
> a handful more that are optional to users. treesit allows you to set as
> many query strings/sexps as you want for syntax highlighting. Outside of
> that, queries are also how packages like evil-textobj-tree-sitter work,
> with the backend of the elisp-tree-sitter which uses a dynamic module.
>
> Queries are specific to the parse tree and therefore to the parser. Most
> parsers have a queries/highlights.scm file in their repo, and
> tree-sitter-langs contains a bunch of these:
>
> > https://github.com/emacs-tree-sitter/tree-sitter-langs/#readme
> >
> > Highlighting query patterns for a language are in the file
> > queries//highlights.scm. Most of them are intentionally
> > different from those from upstream repositories, which are more geared
> > towards GitHub’s use cases. We try to be more consistent with Emacs’s
> > existing conventions. (For some languages, this is WIP, so their
> > patterns may look similar to upstream’s.)
>
> The query I used in the benchmarks is tree-sitter-langs's
> queries/c/highlights.scm, which is a rather big file. One thing to check
> that I only thought of now is how long it takes with treesit having to
> compile and run multiple queries.

Is it true that there's just one query for each PL mode, and it is
fixed (doesn't change) and doesn't depend on the buffer contents in
any way? If that is true, the major mode could compile the query
whenever it is initialized, and then reuse it in every buffer that is
under that major mode.

If the above conclusion is not correct, then please tell me what the
differences are between the queries of different buffers, and how
they depend on the buffer contents.

> > . the time it takes to visit xdisp.c and display the first window-full
> > . visit xdisp.c, then immediately go to its end
> > . C-v in xdisp.c (repeat many times to see how much a single C-v
> >   takes)
>
> Okay, we can try that.
> What's the proper way to trigger a "natural fontification" as would
> occur in the GUI without opening an interactive session?

There isn't any (IIUC what you are asking). Fontification is a
feature of interactive sessions, and is basically meaningless without
normal redisplay.

> I'd rather use the groundwork that's actually used by users,
> and not get stuff like the JIT chunk size wrong. In general I'm not too
> familiar with that part of Emacs; the benchmarks up to now used
> with-temp-buffer, would that suffice for these new benchmarks?

Using with-temp-buffer could cause problems, because not everything is
set up as it would be when actually visiting the file. Why is
with-temp-buffer necessary for the benchmarks?

But if it turns out that a query doesn't depend on the buffer
contents, I think this is a moot point, and the major mode could
compile the query just once, when it's first loaded.

Thanks.
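
A minimal sketch of the compile-once idea discussed above, assuming the
query really is fixed per language and independent of buffer contents.
It is written in Python purely for illustration: compiled_query, Buffer
and C_HIGHLIGHTS are hypothetical names, and the "compilation" is a
trivial stand-in, not the treesit or tree-sitter API.

```python
from functools import lru_cache

# Counts how often the (supposedly expensive) compile step actually runs.
compile_count = 0


@lru_cache(maxsize=None)
def compiled_query(language: str, source: str) -> tuple:
    """Hypothetical stand-in for compiling a highlight query.

    The result is cached per (language, source), so it is computed once
    and then shared by every buffer that uses the same mode.
    """
    global compile_count
    compile_count += 1
    # Stand-in "compilation": keep the non-empty pattern lines.
    return tuple(line.strip() for line in source.splitlines() if line.strip())


class Buffer:
    """Hypothetical buffer that looks up its mode's query on creation."""

    def __init__(self, name: str, language: str, query_source: str) -> None:
        self.name = name
        self.query = compiled_query(language, query_source)


# Two illustrative patterns in tree-sitter query syntax.
C_HIGHLIGHTS = "(comment) @comment\n(string_literal) @string\n"

buffers = [Buffer(name, "c", C_HIGHLIGHTS)
           for name in ("xdisp.c", "buffer.c", "alloc.c")]

print(compile_count)
```

If the query did depend on buffer contents, the cache key would have to
include whatever it depends on, and the sharing shown here would no
longer apply.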