From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: How to add pseudo vector types
Date: Sat, 17 Jul 2021 09:56:13 +0300
Message-ID: <83mtql4eeq.fsf@gnu.org>
References: <83h7gw6pyj.fsf@gnu.org>
 <45EBF16A-C953-42C7-97D1-3A2BFEF7DD01@gmail.com> <83y2a764oy.fsf@gnu.org>
 <EC57595C-64CE-4920-8CB3-9DD3F4CFAF38@gmail.com> <83v95b60fn.fsf@gnu.org>
 <00DD5BFE-D14E-449A-9319-E7B725DEBFB3@gmail.com> <83r1fz5xr9.fsf@gnu.org>
 <1AAB1BCC-362B-4249-B785-4E0530E15C60@gmail.com> <83czri67h0.fsf@gnu.org>
 <46BBFF88-76C3-4818-8805-5437409BEA93@gmail.com> <83wnpq46uk.fsf@gnu.org>
 <533BD53B-4E85-4E9E-B46A-346A5BBAD0F5@gmail.com>
 <258CB68D-1CC1-42C8-BDCD-2A8A8099B783@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="23117"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: Yuan Fu <casouri@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sat Jul 17 08:57:00 2021
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1m4eFw-0005oZ-1v
	for ged-emacs-devel@m.gmane-mx.org; Sat, 17 Jul 2021 08:57:00 +0200
Original-Received: from localhost ([::1]:40914 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1m4eFu-0006TF-LR
	for ged-emacs-devel@m.gmane-mx.org; Sat, 17 Jul 2021 02:56:58 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:40922)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>) id 1m4eFQ-0005nv-1Q
 for emacs-devel@gnu.org; Sat, 17 Jul 2021 02:56:28 -0400
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:34310)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <eliz@gnu.org>)
 id 1m4eFO-00034x-Mn; Sat, 17 Jul 2021 02:56:26 -0400
Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:2591
 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1m4eFO-0006u3-2W; Sat, 17 Jul 2021 02:56:26 -0400
In-Reply-To: <258CB68D-1CC1-42C8-BDCD-2A8A8099B783@gmail.com> (message from
 Yuan Fu on Fri, 16 Jul 2021 22:05:01 -0400)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:271328
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/271328>

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 16 Jul 2021 22:05:01 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  emacs-devel@gnu.org
> 
> Please have a look at the second patch that applies on top of the first one. This time I added after-change hooks, so if you create a parser for a buffer and edit that buffer, the parser is kept updated lazily.

Instead of using the hook machinery, it is better to augment the
insdel.c functions to directly update the information you need to
keep.  If you do it directly in the insdel.c functions that modify the
buffer text, you can be much more accurate in updating the information
about the changes, because each insdel.c function performs a
well-defined operation of the buffer text.  By contrast, buffer-change
hooks are higher-level functionality, meant for Lisp programs, so they
don't necessarily make it easy to reverse-engineer the specific
changes.  All you have there is some higher-level information about
which part of the buffer changed.  Moreover, the hooks are sometimes
called more times than they should be, to be on the safe side.

As a trivial (but not insignificant!) optimization, primitive insdel.c
functions always know both the character and the byte positions in the
buffer they change, so this code you needed in your hooks:

> +  ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int);
> +  ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int);

could be avoided.  Converting character to byte positions can
sometimes significantly slow down the code, for example if there are a
lot of markers in the buffer.

So I urge you to record the change information directly in the
primitive functions of insdel.c, not in the hooks.

> In summary, the parser parses the whole buffer on the first time when the user asks for the parse tree. In after-change-hook, no parsing is done, but we do update the trees with position changes. On the next time when the user asks for the parse tree, the whole buffer is re-parsed incrementally. (I didn’t read the paper, but I assume it knows where are the bits to re-parse because we updated the tree with position changes.)

Why do you update the entire parser list for every modification?  This
comment:

> +void
> +ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int)
> +{
> +  /* Iterate through each parser in 'tree-sitter-parser-list' and
> +     record the byte position.  There could be better ways to record
> +     it than storing the same position in every parser, but this is
> +     the most fool-proof way, and I expect a buffer to have only one
> +     parser most of the time anyway. */

already says that there are better ways: record the change info just
once in one place.  Then propagate that to the entire list, if you
need, when you actually need to call TS.  It is possible that at the
call time you will know which parser needs to be called, and will be
able to update only that parser, not the entire list.

> +  // FIXME: Add some boundary checks?
> +  /* I believe we can get away with only setting current-buffer
> +     and not actually switching to it, like what we did in
> +     'make_gap_1'.  */
> +  struct buffer *old_buffer = current_buffer;
> +  current_buffer = (struct buffer *) buffer;

This looks unnecessary: we have BUF_BYTE_ADDRESS, which accepts the
buffer as its argument, and the corresponding buf_next_char_len.  IOW,
why did you need to switch to the buffer?

> +  /* Read one character.  */
> +  char *beg;
> +  int len;
> +  if (byte_pos >= Z_BYTE)
> +    {
> +      beg = "";
> +      len = 0;
> +    }

Is getting an empty string what TS wants when it attempts to read
beyond EOB?

Also, why do you test Z_BYTE and not ZV_BYTE (actually, BUF_ZV_BYTE)?
Emacs in general behaves as if text beyond point-max didn't exist, why
should code supported by the TS parser behave differently?

> +      beg = (char *) BYTE_POS_ADDR (byte_pos);
> +      len = next_char_len(byte_pos);

This is sub-optimal: next_char_len also calls BYTE_POS_ADDR.  Why not
use BYTES_BY_CHAR_HEAD instead?

Thanks for working on this.