From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: How to add pseudo vector types
Date: Thu, 22 Jul 2021 20:00:43 +0300
Message-ID: <83k0li2shw.fsf@gnu.org>
References: <83h7gw6pyj.fsf@gnu.org>
 <45EBF16A-C953-42C7-97D1-3A2BFEF7DD01@gmail.com> <83y2a764oy.fsf@gnu.org>
 <EC57595C-64CE-4920-8CB3-9DD3F4CFAF38@gmail.com> <83v95b60fn.fsf@gnu.org>
 <00DD5BFE-D14E-449A-9319-E7B725DEBFB3@gmail.com> <83r1fz5xr9.fsf@gnu.org>
 <1AAB1BCC-362B-4249-B785-4E0530E15C60@gmail.com> <83czri67h0.fsf@gnu.org>
 <46BBFF88-76C3-4818-8805-5437409BEA93@gmail.com> <83wnpq46uk.fsf@gnu.org>
 <533BD53B-4E85-4E9E-B46A-346A5BBAD0F5@gmail.com>
 <258CB68D-1CC1-42C8-BDCD-2A8A8099B783@gmail.com>
 <1a776770-50b7-93cd-6591-c9a5b3a56eb8@gmail.com>
 <jwv4kcseu43.fsf-monnier+emacs@gnu.org>
 <DC82148D-FFEB-4B3E-B258-AC58D7124E16@gmail.com>
 <F95D0E9B-0B21-450A-B91D-87E9E05873CE@gmail.com> <8335s64v10.fsf@gnu.org>
 <5380C92B-6C15-4490-A1E0-1C3132DBB16A@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="32375"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: cpitclaudel@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: Yuan Fu <casouri@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Jul 22 19:01:54 2021
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1m6c54-0008ED-DS
	for ged-emacs-devel@m.gmane-mx.org; Thu, 22 Jul 2021 19:01:54 +0200
Original-Received: from localhost ([::1]:60366 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1m6c53-0002Tu-Bk
	for ged-emacs-devel@m.gmane-mx.org; Thu, 22 Jul 2021 13:01:53 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:49788)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1m6c4F-0001AU-DK
 for emacs-devel@gnu.org; Thu, 22 Jul 2021 13:01:03 -0400
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:58466)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <eliz@gnu.org>)
 id 1m6c4E-0000Ev-NU; Thu, 22 Jul 2021 13:01:02 -0400
Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:4535
 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1m6c4E-0007qH-AI; Thu, 22 Jul 2021 13:01:02 -0400
In-Reply-To: <5380C92B-6C15-4490-A1E0-1C3132DBB16A@gmail.com> (message from
 Yuan Fu on Thu, 22 Jul 2021 09:47:45 -0400)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:271469
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/271469>

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 22 Jul 2021 09:47:45 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  Clément Pit-Claudel <cpitclaudel@gmail.com>,
>  emacs-devel@gnu.org
> 
> Yes, I meant to discuss this. The problem with respecting narrowing is that a user can narrow and widen arbitrarily, and Emacs needs to translate each such change into insertions and deletions of buffer text for tree-sitter. Plus, if tree-sitter respects narrowing, it could happen that a user narrows the buffer and the font-locking changes and is no longer correct. Maybe that’s not what the user wants. Also, if someone narrows and widens often, say narrowing to a function for better focus, tree-sitter needs to constantly re-parse most of the buffer. These are not significant disadvantages, but what do we get from respecting narrowing that justifies the code complexity and these small annoyances?

But that's how the current font-lock and indentation work: they never
look beyond the narrowing limits.  So why should the TS-based features
behave differently?

As for temporary narrowing: if we record the changes, but don't send
them to TS until we actually need re-parsing, then we could eliminate
the temporary narrowing when we report the changes to TS, leaving only
the narrowing that exists at the time of the re-parse.  At least for
fontifications, that time is redisplay time, and users do expect to
see the text fontified according to the current narrowing.
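
To make the idea concrete, here is a minimal sketch, not code from
the patch.  All the names are hypothetical, and the counters merely
stand in for the actual calls into TS.  Edits are queued, narrowing
is only remembered, and both are reported when a re-parse is about to
happen:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping; none of these names exist in the patch.  */
enum { MAX_PENDING = 64 };

struct pending_edit
{
  ptrdiff_t start, old_end, new_end;	/* byte positions */
};

struct ts_sync_state
{
  struct pending_edit edits[MAX_PENDING];
  int n_edits;
  ptrdiff_t visible_beg, visible_end;	/* current narrowing */
  ptrdiff_t sent_beg, sent_end;		/* range last reported to the parser */
  int edits_sent;			/* stands in for reporting an edit */
  int ranges_sent;			/* stands in for reporting a range */
};

/* Called when the buffer text changes: only remember the edit.  */
static void
record_edit (struct ts_sync_state *s, ptrdiff_t start,
	     ptrdiff_t old_end, ptrdiff_t new_end)
{
  assert (s->n_edits < MAX_PENDING);
  s->edits[s->n_edits++] = (struct pending_edit) { start, old_end, new_end };
}

/* Called on narrow/widen: overwrite the bounds, queue nothing.  */
static void
note_narrowing (struct ts_sync_state *s, ptrdiff_t beg, ptrdiff_t end)
{
  s->visible_beg = beg;
  s->visible_end = end;
}

/* Called right before a re-parse: report the queued edits, then the
   narrowing, but only if it differs from what the parser last saw.  */
static void
flush_to_parser (struct ts_sync_state *s)
{
  for (int i = 0; i < s->n_edits; i++)
    s->edits_sent++;		/* a real version would pass edits[i] to TS */
  s->n_edits = 0;

  if (s->sent_beg != s->visible_beg || s->sent_end != s->visible_end)
    {
      s->sent_beg = s->visible_beg;
      s->sent_end = s->visible_end;
      s->ranges_sent++;
    }
}
```

With this arrangement, a narrow followed by a widen between two
re-parses costs nothing, because only the bounds in effect at flush
time are compared with what the parser already has.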

> >>   *bytes_read = (uint32_t) len;
> > 
> > Is using uint32_t the restriction of tree-sitter?  Doesn't it support
> > reading more than 2 gigabytes?
> 
> I’m not sure why it asks for uint32 specifically, but that’s what its API asks for. I don’t think you are supposed to use tree-sitter on gigabyte-sized files, because the author mentioned that tree-sitter uses over 10x as much memory as the size of the source file [1]. On files larger than a couple of megabytes, I think we had better turn off tree-sitter. Normally those files are not regular source files anyway, and we don’t need a parse tree for a log.

I don't necessarily agree with the "not regular source files" part.
For example, JSON files can be quite large.  And there are also log
files, which are even larger -- has no one adapted TS to fontify
those yet?
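
Whatever we decide about large files, the cast quoted above should
not silently truncate a length that doesn't fit in uint32_t.  A
minimal sketch of a guard follows; the helper name is hypothetical,
and it assumes the reader callback can hand back the text in chunks
and be called again for the rest, which is how I understand the TS
read interface to work:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a chunk length to what fits in the
   uint32_t that the parser's read callback reports.  */
static uint32_t
clamp_chunk_len (ptrdiff_t len)
{
  assert (len >= 0);
  return len > UINT32_MAX ? UINT32_MAX : (uint32_t) len;
}
```

The callback would then do '*bytes_read = clamp_chunk_len (len);'
instead of the bare cast.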

More generally: is the problem real?  If you make a file that is 1000
copies of xdisp.c, and then submit it to TS, do you really get 10GB of
memory consumption?  This is something that is good to know up front,
so we'd know what to expect down the road.

> That leads to another point. I suspect the memory limit will come before the speed limit, i.e., as the file size increases, the memory consumption will become unacceptable before the speed does. So it is possible that we will want to disable tree-sitter outright for larger files, and then we won’t need to do much to improve the responsiveness of tree-sitter on large files. We might also want to delete the parse tree if a buffer has been idle for a while. Of course, that’s just my speculation; we’ll see once we can measure the performance.

See above: IMO, we should benchmark both the CPU and memory
performance of TS for such large files, before we decide on the course
of action.

> >> +DEFUN ("tree-sitter-node-type",
> >> +       Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> >> +       doc: /* Return the NODE's type as a symbol.  */)
> >> +  (Lisp_Object node)
> >> +{
> >> +  CHECK_TS_NODE (node);
> >> +  TSNode ts_node = XTS_NODE (node)->node;
> >>   const char *type = ts_node_type (ts_node);
> >> +  return intern_c_string (type);
> > 
> > Why do we need to intern the string each time?  Can't we store the
> > interned symbol there, instead of a C string, in the first place?
> 
> I’m not sure what you mean by “store the interned symbol there”. Where would I store the interned symbol?

In the struct that ts_node_type accesses, instead of the 'char *'
string you store there now.
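
If that struct turns out not to be ours to change, a table kept on
our side would serve the same purpose.  Here is a minimal sketch
under that assumption, with stand-ins for everything Emacs-specific:
'sym_t' plays the role of Lisp_Object, 'fake_intern' the role of
intern_c_string, and the numeric type id and the name table are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef const char *sym_t;	/* stand-in for Lisp_Object */

enum { N_TYPES = 3 };

/* Hypothetical per-language table of node type names.  */
static const char *const type_names[N_TYPES]
  = { "identifier", "function_definition", "comment" };

static int intern_calls;	/* counts how often we really intern */

/* Stand-in for intern_c_string.  */
static sym_t
fake_intern (const char *name)
{
  intern_calls++;
  return name;
}

/* Cache of interned symbols, filled lazily; NULL means "not yet".  */
static sym_t type_symbols[N_TYPES];

/* Return the symbol for TYPE_ID, interning it only on first use.  */
static sym_t
node_type_symbol (uint16_t type_id)
{
  assert (type_id < N_TYPES);
  if (!type_symbols[type_id])
    type_symbols[type_id] = fake_intern (type_names[type_id]);
  return type_symbols[type_id];
}
```

A real version would also have to make sure the cached symbols are
protected from GC.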

> (BTW, If you see something wrong, that’s probably because I don’t know the right way to do it, and grepping only got me that far.)

Do what?  Feel free to ask questions when you aren't sure how to
accomplish something on the C level.