From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: How to add pseudo vector types Date: Sat, 17 Jul 2021 09:56:13 +0300 Message-ID: <83mtql4eeq.fsf@gnu.org> References: <83h7gw6pyj.fsf@gnu.org> <45EBF16A-C953-42C7-97D1-3A2BFEF7DD01@gmail.com> <83y2a764oy.fsf@gnu.org> <83v95b60fn.fsf@gnu.org> <00DD5BFE-D14E-449A-9319-E7B725DEBFB3@gmail.com> <83r1fz5xr9.fsf@gnu.org> <1AAB1BCC-362B-4249-B785-4E0530E15C60@gmail.com> <83czri67h0.fsf@gnu.org> <46BBFF88-76C3-4818-8805-5437409BEA93@gmail.com> <83wnpq46uk.fsf@gnu.org> <533BD53B-4E85-4E9E-B46A-346A5BBAD0F5@gmail.com> <258CB68D-1CC1-42C8-BDCD-2A8A8099B783@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="23117"; mail-complaints-to="usenet@ciao.gmane.io" Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org To: Yuan Fu Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sat Jul 17 08:57:00 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1m4eFw-0005oZ-1v for ged-emacs-devel@m.gmane-mx.org; Sat, 17 Jul 2021 08:57:00 +0200 Original-Received: from localhost ([::1]:40914 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m4eFu-0006TF-LR for ged-emacs-devel@m.gmane-mx.org; Sat, 17 Jul 2021 02:56:58 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:40922) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m4eFQ-0005nv-1Q for emacs-devel@gnu.org; Sat, 17 Jul 2021 02:56:28 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:34310) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m4eFO-00034x-Mn; Sat, 17 Jul 2021 02:56:26 -0400 Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:2591 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m4eFO-0006u3-2W; Sat, 17 Jul 2021 02:56:26 -0400 In-Reply-To: <258CB68D-1CC1-42C8-BDCD-2A8A8099B783@gmail.com> (message from Yuan Fu on Fri, 16 Jul 2021 22:05:01 -0400) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:271328 Archived-At: > From: Yuan Fu > Date: Fri, 16 Jul 2021 22:05:01 -0400 > Cc: Stefan Monnier , > emacs-devel@gnu.org > > Please have a look at the second patch that applies on top of the first one. This time I added after-change hooks, so if you create a parser for a buffer and edit that buffer, the parser is kept updated lazily. Instead of using the hook machinery, it is better to augment the insdel.c functions to directly update the information you need to keep. If you do it directly in the insdel.c functions that modify the buffer text, you can be much more accurate in updating the information about the changes, because each insdel.c function performs a well-defined operation of the buffer text. By contrast, buffer-change hooks are higher-level functionality, meant for Lisp programs, so they don't necessarily make it easy to reverse-engineer the specific changes. All you have there is some higher-level information about which part of the buffer changed. Moreover, the hooks are sometimes called more times than they should be, to be on the safe side. As a trivial (but not insignificant!) optimization, primitive insdel.c functions always know both the character and the byte positions in the buffer they change, so this code you needed in your hooks: > + ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int); > + ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int); could be avoided. Converting character to byte positions can sometimes significantly slow down the code, for example if there are a lot of markers in the buffer. So I urge you to record the change information directly in the primitive functions of insdel.c, not in the hooks. > In summary, the parser parses the whole buffer on the first time when the user asks for the parse tree. In after-change-hook, no parsing is done, but we do update the trees with position changes. On the next time when the user asks for the parse tree, the whole buffer is re-parsed incrementally. (I didn’t read the paper, but I assume it knows where are the bits to re-parse because we updated the tree with position changes.) Why do you update the entire parser list for every modification? This comment: > +void > +ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int) > +{ > + /* Iterate through each parser in 'tree-sitter-parser-list' and > + record the byte position. There could be better ways to record > + it than storing the same position in every parser, but this is > + the most fool-proof way, and I expect a buffer to have only one > + parser most of the time anyway. */ already says that there are better ways: record the change info just once in one place. Then propagate that to the entire list, if you need, when you actually need to call TS. It is possible that at the call time you will know which parser needs to be called, and will be able to update only that parser, not the entire list. > + // FIXME: Add some boundary checks? > + /* I believe we can get away with only setting current-buffer > + and not actually switching to it, like what we did in > + 'make_gap_1'. */ > + struct buffer *old_buffer = current_buffer; > + current_buffer = (struct buffer *) buffer; This looks unnecessary: we have BUF_BYTE_ADDRESS, which accepts the buffer as its argument, and the corresponding buf_next_char_len. IOW, why did you need to switch to the buffer? > + /* Read one character. */ > + char *beg; > + int len; > + if (byte_pos >= Z_BYTE) > + { > + beg = ""; > + len = 0; > + } Is getting an empty string what TS wants when it attempts to read beyond EOB? Also, why do you test Z_BYTE and not ZV_BYTE (actually, BUF_ZV_BYTE)? Emacs in general behaves as if text beyond point-max didn't exist, why should code supported by the TS parser behave differently? > + beg = (char *) BYTE_POS_ADDR (byte_pos); > + len = next_char_len(byte_pos); This is sub-optimal: next_char_len also calls BYTE_POS_ADDR. Why not use BYTES_BY_CHAR_HEAD instead? Thanks for working on this.