From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Fri, 13 May 2022 09:34:28 +0300
Message-ID: <83ee0yndor.fsf@gnu.org>
References: <87y1zabmbt.fsf@gmail.com>
 <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com>
 <875ymdwf76.fsf@gmail.com>
 <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com>
 <8735hhw75p.fsf@gmail.com> <83czgks4ss.fsf@gnu.org>
 <87wnesuw63.fsf@gmail.com> <83pmkkqhft.fsf@gnu.org>
 <87tu9wukbt.fsf@gmail.com> <83ee10qbk7.fsf@gnu.org>
 <8F6A43D1-D1EA-4602-A245-627DB7960FC2@gmail.com> <838rr7qqhw.fsf@gnu.org>
 <87sfpekf6t.fsf@gmail.com> <838rr6pwjt.fsf@gnu.org>
 <87pmkik7x6.fsf@gmail.com> <83wneqoej5.fsf@gnu.org> <87mtfmk4oi.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="32254"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: casouri@gmail.com, emacs-devel@gnu.org
To: Yoav Marco <yoavm448@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri May 13 09:26:12 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1npPgi-0008Fo-No
	for ged-emacs-devel@m.gmane-mx.org; Fri, 13 May 2022 09:26:12 +0200
Original-Received: from localhost ([::1]:54266 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1npPgh-0005aU-7S
	for ged-emacs-devel@m.gmane-mx.org; Fri, 13 May 2022 03:26:11 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:43694)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1npOsZ-0001vd-NB
 for emacs-devel@gnu.org; Fri, 13 May 2022 02:34:25 -0400
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:55268)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1npOsZ-0001Fp-D0; Fri, 13 May 2022 02:34:23 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=SHcaInB56fHmekXDMigj4/58ZfATC5yJGcTPjrym884=; b=mFTL5BxQe1BXla3jlLnL
 DOES3jn+8F1SihtDyuFRtrJCcTJ3mT4OpIQQgsKZs09mL5xPiMS8dM7juzKp4GPdba1lL3/0MwVzo
 k0ZFUwfKsdwHtqc01IsnCgI6im9vryfuTUQR27LMZrOZDP87gbQ+11O+c4ot9/6tDVjfmmY++o1hJ
 m2nn1G0pb22WpdMHlrSjwQ9xpkgTkWKOTmsfe5+8Rk7aYRSH4ehNuZZ7yRWDCYZ5xjjDdJsG5xa+Z
 YO11TpK7yoOnrkUl4ji8madLS/Amj0XXywksp+HrDNAW9/ONMWU6rgck4GS58YKulyoRVbcygZOUg
 MaImkJ8iJGjAqw==;
Original-Received: from [87.69.77.57] (port=1442 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1npOsY-0001jL-Gp; Fri, 13 May 2022 02:34:23 -0400
In-Reply-To: <87mtfmk4oi.fsf@gmail.com> (message from Yoav Marco on Thu, 12
 May 2022 20:22:30 +0300)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:289713
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/289713>

> From: Yoav Marco <yoavm448@gmail.com>
> Cc: casouri@gmail.com, emacs-devel@gnu.org
> Date: Thu, 12 May 2022 20:22:30 +0300
> 
> > But maybe we should make this discussion more concrete.  Can you show
> > the queries and explain how they are produced from the font-lock rules
> > (or whatever else they are produced from)?  How many different queries
> > do we expect to have in a garden-variety major mode for a PL, and what
> > do they depend on?
> 
> So first of all, query is kind of an aggregate term, since one query
> string/sexp can contain many "query patterns". I expect most major modes
> to have one big query string/sexp, and maybe a handful more that are
> optional to users. treesit allows you to set as many query strings/sexps
> as you want for syntax highlighting. Outside of that, queries are also
> how packages like evil-textobj-tree-sitter work, with the backend of the
> elisp-tree-sitter which uses a dynamic module.
> 
> Queries are specific to the parse tree and therefore to the parser. Most
> parsers have a queries/highlights.scm file in their repo, and
> tree-sitter-langs contains a bunch of these:
> 
> > https://github.com/emacs-tree-sitter/tree-sitter-langs/#readme
> >
> > Highlighting query patterns for a language are in the file
> > queries/<lang>/highlights.scm. Most of them are intentionally
> > different from those from upstream repositories, which are more geared
> > towards GitHub’s use cases. We try to be more consistent with Emacs’s
> > existing conventions. (For some languages, this is WIP, so their
> > patterns may look similar to upstream’s.)
> 
> The query I used in the benchmarks is tree-sitter-langs's
> queries/c/highlights.scm, which is a rather big file. One thing to check
> that I only thought of now is how long it takes with treesit having to
> compile and run multiple queries.

Is it true that there's just one query for each PL mode, and it is
fixed (doesn't change) and doesn't depend on the buffer contents in
any way?  If that is true, the major mode could compile the query
whenever it is initialized, and then reuse it in every buffer that is
under that major mode.
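
To make the compile-once idea concrete, here is a minimal sketch in
Python (chosen only as neutral notation).  It assumes the query text
is fixed per language and never depends on buffer contents; if that
assumption fails, the cache key below is wrong.  `compile_query',
`query_for_mode' and `C_HIGHLIGHTS' are hypothetical names for
illustration, not functions or variables of treesit or tree-sitter.

```python
from functools import lru_cache

COMPILE_CALLS = 0  # counts how often the (expensive) compile step runs


def compile_query(language: str, source: str) -> tuple:
    """Hypothetical stand-in for an expensive query compilation step."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    # Pretend each non-empty line of the query text is one query pattern.
    patterns = tuple(
        line.strip() for line in source.splitlines() if line.strip()
    )
    return (language, patterns)


@lru_cache(maxsize=None)
def query_for_mode(language: str, source: str) -> tuple:
    """Compile once per (language, query text); later calls hit the cache.

    The cache key contains no buffer state, which is only valid if the
    query really is independent of buffer contents.
    """
    return compile_query(language, source)


# Illustrative two-pattern query text, not the real highlights.scm.
C_HIGHLIGHTS = "(comment) @comment\n(string_literal) @string\n"

# Three buffers in the same mode all receive the same compiled object.
buffers = ["a.c", "b.c", "c.c"]
compiled = {name: query_for_mode("c", C_HIGHLIGHTS) for name in buffers}
```

With this arrangement the compile cost is paid once per mode rather
than once per buffer or once per fontification call.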

If the above conclusion is not correct, then please explain how the
queries of different buffers differ, and how they depend on the
buffer contents.

> >   . the time it takes to visit xdisp.c and display the first window-full
> >   . visit xdisp.c, then immediately go to its end
> >   . C-v in xdisp.c (repeat many times to see how much a single C-v
> >     takes)
> 
> Okay, we can try that. What's the proper way to trigger a "natural
> fontification" as would occur in the GUI without opening an interactive
> session?

There isn't any (IIUC what you are asking).  Fontification is a
feature of interactive sessions, and is basically meaningless without
normal redisplay.

> I'd rather use the groundwork that's actually used by users,
> and not get stuff like the JIT chunk size wrong. In general I'm not too
> familiar with that part of Emacs; the benchmarks up to now used
> with-temp-buffer, would that suffice for these new benchmarks?

Using with-temp-buffer could cause problems, because not everything is
set up as it would be when actually visiting the file.  Why is
with-temp-buffer necessary for the benchmarks?

But if it turns out that a query doesn't depend on the buffer
contents, I think this is a moot point, and the major mode could
compile the query just once when it's first loaded.

Thanks.