From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: How to add pseudo vector types
Date: Thu, 22 Jul 2021 20:00:43 +0300
Message-ID: <83k0li2shw.fsf@gnu.org>
To: Yuan Fu
Cc: cpitclaudel@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-Reply-To: <5380C92B-6C15-4490-A1E0-1C3132DBB16A@gmail.com> (message from Yuan Fu on Thu, 22 Jul 2021 09:47:45 -0400)

> From: Yuan Fu
> Date: Thu, 22 Jul 2021 09:47:45 -0400
> Cc: Stefan Monnier,
>  Clément Pit-Claudel,
>  emacs-devel@gnu.org
>
> Yes, I meant to discuss this. The problem with respecting narrowing is that a user can narrow and widen freely, and every time the user narrows or widens the buffer, Emacs needs to translate that into insertions and deletions of the buffer text for tree-sitter. Plus, if tree-sitter respects narrowing, a user could narrow the buffer and find that the font-locking changes and is no longer correct. Maybe that’s not what the user wants. Also, if someone narrows and widens often (say, narrowing to a function for better focus), tree-sitter needs to re-parse most of the buffer over and over. These are not significant disadvantages, but what do we gain from respecting narrowing that justifies the code complexity and these small annoyances?

But that's how the current font-lock and indentation work: they never look beyond the narrowing limits. So why should the TS-based features behave differently?
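Yuan's point that every narrow or widen has to be turned into insertions and deletions for tree-sitter can be made concrete. The following is a minimal sketch, not code from the patch: `struct region`, `struct region_edit` and `narrowing_edits` are hypothetical names, and the edit triple only mirrors the start/old-end/new-end byte offsets that tree-sitter's `TSInputEdit` carries (the real struct also has row/column points). It computes the edits that move the parser's view from the region it last saw to the current narrowing, assuming the buffer text itself did not change in between.

```c
#include <assert.h>

/* Hypothetical: a half-open range [beg, end) of byte positions in the
   whole (widened) buffer.  */
struct region { long beg, end; };

/* Hypothetical edit, in coordinates of the text the parser currently
   holds: bytes [start, old_end) become bytes [start, new_end).  */
struct region_edit { long start, old_end, new_end; };

/* Compute the edits that turn the parser's view of REPORTED into WANT,
   assuming only the narrowing changed, not the text.  Growing edits
   come first, so every offset stays inside the text the parser holds
   at that moment.  Writes at most 2 edits to OUT; returns the count.  */
static int
narrowing_edits (struct region reported, struct region want,
                 struct region_edit out[2])
{
  assert (reported.beg <= reported.end && want.beg <= want.end);
  long beg = reported.beg, end = reported.end;  /* what the parser holds */
  int n = 0;

  /* Grow at the front: insert the text between WANT.beg and BEG.  */
  if (want.beg < beg)
    {
      out[n++] = (struct region_edit) { 0, 0, beg - want.beg };
      beg = want.beg;
    }
  /* Grow at the back: append the text between END and WANT.end.  */
  if (want.end > end)
    {
      out[n++] = (struct region_edit) { end - beg, end - beg, want.end - beg };
      end = want.end;
    }
  /* Shrink at the back: delete the text after WANT.end.  */
  if (want.end < end)
    out[n++] = (struct region_edit) { want.end - beg, end - beg, want.end - beg };
  /* Shrink at the front: delete the text before WANT.beg.  */
  if (want.beg > beg)
    out[n++] = (struct region_edit) { 0, want.beg - beg, 0 };
  return n;
}
```

Computed once per re-parse from the region last reported and the current narrowing, a narrow-then-widen sequence yields no edits at all, whereas reporting each narrowing as it happens would send four (two deletions, then two insertions).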
As for temporary narrowing: if we record the changes but don't send them to TS until we actually need to re-parse, then we could eliminate the temporary narrowing when we report the changes to TS, leaving only the narrowing that is in effect at the time of the re-parse. At least for fontification, that time is redisplay time, and users do expect to see the text fontified according to the current narrowing.

> >> *bytes_read = (uint32_t) len;
> >
> > Is using uint32_t a restriction imposed by tree-sitter? Doesn't it support
> > reading more than 2 gigabytes?
>
> I’m not sure why it asks for uint32 specifically, but that’s what its API asks for. I don’t think you are supposed to use tree-sitter on files gigabytes in size, because the author mentioned that tree-sitter uses over 10x as much memory as the size of the source file [1]. On files larger than a couple of megabytes, I think we had better turn off tree-sitter. Normally those files are not regular source files anyway, and we don’t need a parse tree for a log.

I don't necessarily agree with the "not regular source files" part. For example, JSON files can be quite large. And there are also log files, which are even larger -- has no one adapted TS to fontifying those yet?

More generally: is the problem real? If you make a file that is 1000 copies of xdisp.c and then submit it to TS, do you really get 10GB of memory consumption? This is something that is good to know up front, so we'd know what to expect down the road.

> That leads to another point. I suspect the memory limit will come before the speed limit, i.e., as the file size increases, memory consumption will become unacceptable before speed does. So we may want to disable tree-sitter outright for larger files, in which case we don’t need to do much to improve tree-sitter’s responsiveness on large files. And we might want to delete the parse tree if a buffer has been idle for a while.
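For reference on the size limits discussed above: tree-sitter's `TSInput` read callback takes a `uint32_t` byte offset and reports each chunk's length through a `uint32_t *bytes_read`, so the largest representable offset is `UINT32_MAX` (4294967295, just under 4 GiB); a signed 32-bit type would stop at 2 GiB. The sketch below combines that hard limit with a size threshold and the rough "10x" memory multiplier quoted in the thread. The helper names and any threshold passed to them are hypothetical, and the multiplier is the unverified figure from the thread, not a measurement.

```c
#include <stdbool.h>
#include <stdint.h>

/* A buffer of SIZE bytes has an end offset of SIZE, which must fit in
   the uint32_t offsets tree-sitter's input interface uses.  */
static bool
size_addressable_by_uint32 (uint64_t size)
{
  return size <= UINT32_MAX;
}

/* Rough parse-tree memory estimate: SIZE times a caller-supplied
   FACTOR (e.g. the "over 10x" figure, pending real benchmarks).  */
static uint64_t
estimated_parse_memory (uint64_t size, unsigned factor)
{
  return size * factor;
}

/* Hypothetical policy: enable tree-sitter only when the buffer is
   addressable at all and no larger than MAX_SIZE bytes.  */
static bool
ts_enable_for_size (uint64_t size, uint64_t max_size)
{
  return size_addressable_by_uint32 (size) && size <= max_size;
}
```

Keeping the threshold a parameter rather than a constant leaves room to set it from the CPU and memory benchmarks the thread calls for.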
> Of course, that’s just my superstition; we’ll see once we can measure the performance.

See above: IMO, we should benchmark both the CPU and memory performance of TS for such large files before we decide on a course of action.

> >> +DEFUN ("tree-sitter-node-type",
> >> +       Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> >> +       doc: /* Return the NODE's type as a symbol.  */)
> >> +  (Lisp_Object node)
> >> +{
> >> +  CHECK_TS_NODE (node);
> >> +  TSNode ts_node = XTS_NODE (node)->node;
> >> +  const char *type = ts_node_type(ts_node);
> >> +  return intern_c_string (type);
> >
> > Why do we need to intern the string each time?  Can't we store the
> > interned symbol there, instead of a C string, in the first place?
>
> I’m not sure what you mean by “store the interned symbol there”. Where do I store the interned symbol?

In the struct that ts_node_type accesses, instead of the 'char *' string you store there now.

> (BTW, if you see something wrong, that’s probably because I don’t know the right way to do it, and grepping only got me so far.)

Do what?  Feel free to ask questions when you aren't sure how to accomplish something on the C level.
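The strings `ts_node_type` returns appear to live in tree-sitter's per-language tables rather than in an Emacs-owned struct, so one way to get the effect of storing the symbol there, sketched here as an assumption and not as what the patch does, is a lazily filled Emacs-side table indexed by the node's numeric type id (tree-sitter exposes one via `ts_node_symbol`). Each distinct type name is then interned only once. To keep the example self-contained, `toy_intern` and the integer `sym_t` are hypothetical stand-ins for `intern_c_string` and `Lisp_Object`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: a "symbol" is an index into a table of
   unique names.  In Emacs these roles belong to Lisp_Object and
   intern_c_string.  */
typedef int sym_t;
#define NO_SYM (-1)

#define MAX_INTERNED 64
static const char *interned_names[MAX_INTERNED];
static int n_interned;
static int intern_calls;   /* counts trips through the slow path */

/* Toy interning: linear search for NAME, adding it when absent.  */
static sym_t
toy_intern (const char *name)
{
  intern_calls++;
  for (int i = 0; i < n_interned; i++)
    if (strcmp (interned_names[i], name) == 0)
      return i;
  if (n_interned == MAX_INTERNED)
    return NO_SYM;
  interned_names[n_interned] = name;
  return n_interned++;
}

/* Cache of interned type symbols for one language, indexed by the
   node's numeric type id and filled lazily on first use.  */
#define MAX_TYPES 256
static sym_t type_cache[MAX_TYPES];
static int cache_ready;

static sym_t
node_type_symbol (uint16_t type_id, const char *const *type_names)
{
  if (!cache_ready)
    {
      for (int i = 0; i < MAX_TYPES; i++)
        type_cache[i] = NO_SYM;
      cache_ready = 1;
    }
  if (type_id >= MAX_TYPES)   /* id outside the table: intern every time */
    return toy_intern (type_names[type_id]);
  if (type_cache[type_id] == NO_SYM)
    type_cache[type_id] = toy_intern (type_names[type_id]);
  return type_cache[type_id];
}
```

In real Emacs code the cache would hold Lisp_Object values and would also have to keep them reachable for the garbage collector.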