From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: How to add pseudo vector types
Date: Sat, 17 Jul 2021 10:16:07 +0300
Message-ID: <83lf654dhk.fsf@gnu.org>
In-Reply-To: <1a776770-50b7-93cd-6591-c9a5b3a56eb8@gmail.com> (message from Clément Pit-Claudel on Fri, 16 Jul 2021 22:23:26 -0400)
To: Clément Pit-Claudel
Cc: emacs-devel@gnu.org

> From: Clément Pit-Claudel
> Date: Fri, 16 Jul 2021 22:23:26 -0400
>
> On 7/16/21 10:05 PM, Yuan Fu wrote:
> > My conclusion is that after-change-hook is pretty insignificant, and the initial parse is a bit slow (on large files).
>
> I have no idea if it makes sense, but: does the initial parse need to be synchronous, or could you instead run the parsing in one thread, and the rest of Emacs in another? (I'm talking about concurrent execution, not cooperative threading).

You cannot have a thread freely accessing buffer text when the Lisp machine is allowed to run concurrently with this, because the Lisp machine can change the buffer text.

> In most cases there should be very limited contention, if at all: in large buffers most of Emacs' activity will be focused on the (relatively few) characters around the gap, and most of the parser's activity will be reading from the buffer at other positions.

When Emacs moves or enlarges/shrinks the gap, that affects the entire buffer text after the gap, regardless of where the gap is.
So it will affect the TS reader if it reads stuff after the gap.

> You do need to be careful to not read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.

What would be the purpose of calling the parser if we know in advance it will fail when it gets to the "garbage" caused by async access to the buffer text?

And besides, current Emacs primitives that access buffer text don't necessarily do that atomically, since the assumption built into their design is that no one should access that text at the same time. So you could have windows where the buffer text is in an inconsistent state, like if the gap was moved, but the variables which tell where the gap is were not yet updated, or windows where a multibyte character was not yet completely written to or deleted from the buffer, resulting in invalid multibyte sequences and inconsistent values of EOB. So I don't see how this could be done without some inter-locking.

And what do you want the code which requested parsing to do while the parse thread runs? The requesting code is in the main thread, so if it just waits, you don't gain anything.

> In fact, depending on how robust tree-sitter is, you might even be able to do the concurrency-control optimistically (parse everything up to close to the gap, check that the gap hasn't moved into the region that you read, and then resume reading or rollback).

I don't understand what you suggest here. For starters, the gap could move (assuming you are still talking about a separate thread that does the parsing), and what do we do then?

> Alternatively, maybe you could even do a full parse with minimal concurrency control: you'd make sure that the Emacs thread records not just changes to the buffer text but also movements of the gap, and then you could use that list of changes for the next parse?

I don't understand what recording the gap could solve.
The stuff in the gap is generally garbage, and can easily include invalid multibyte sequences. I don't think it's a good idea to pass that to TS.

Also, recording the gap changes in the main thread and accessing that information from a concurrent thread again opens a window for races, and requires synchronization.

Bottom line, I think what you are suggesting is premature optimization: we don't yet know that we will need this. If the TS performance information is reliable, it should be fast enough for our purposes; we just need to come up with an optimal way of calling it so that we don't impose unnecessary delays.
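
To illustrate the point above about the gap, here is a toy model in Python. It is only a sketch, not how Emacs's C code is written: the GapBuffer class, its method names, and the "?" filler bytes are all invented for the illustration. It shows that enlarging the gap shifts every byte stored after it, so a reader that cached a raw offset earlier ends up looking at a different character, even though the logical text did not change.

```python
class GapBuffer:
    """Toy gap buffer (hypothetical, for illustration only).

    Raw storage layout: [text before gap][gap][text after gap].
    """

    def __init__(self, text: bytes, gap_size: int = 4):
        # Start with the gap at the very end of the text.
        self.data = bytearray(text) + bytearray(b"?" * gap_size)
        self.gap_start = len(text)
        self.gap_end = len(self.data)

    def move_gap(self, pos: int) -> None:
        """Move the gap so that it begins at text position pos."""
        gap_size = self.gap_end - self.gap_start
        if pos < self.gap_start:
            # Text in [pos, gap_start) slides up to sit after the gap.
            chunk = self.data[pos:self.gap_start]
            self.data[pos + gap_size:self.gap_end] = chunk
        elif pos > self.gap_start:
            # Text just after the gap slides down to sit before it.
            chunk = self.data[self.gap_end:pos + gap_size]
            self.data[self.gap_start:pos] = chunk
        self.gap_start = pos
        self.gap_end = pos + gap_size
        # Mark the gap as garbage (an assumption of this toy model;
        # a real gap simply holds whatever stale bytes were there).
        self.data[self.gap_start:self.gap_end] = b"?" * gap_size

    def grow_gap(self, extra: int) -> None:
        """Enlarge the gap; every byte after the gap shifts by extra."""
        self.data[self.gap_end:self.gap_end] = b"?" * extra
        self.gap_end += extra

    def text(self) -> bytes:
        """Return the logical text, skipping the gap."""
        return bytes(self.data[:self.gap_start] + self.data[self.gap_end:])


def demo():
    buf = GapBuffer(b"hello world", gap_size=4)
    buf.move_gap(5)
    # A hypothetical reader caches the raw offset of the final "d".
    stale = buf.gap_end + 5
    seen_before = bytes(buf.data[stale:stale + 1])
    # The "main thread" enlarges the gap, far away from that offset.
    buf.grow_gap(2)
    seen_after = bytes(buf.data[stale:stale + 1])
    return seen_before, seen_after, buf.text()
```

Calling demo() returns what the reader saw at its cached offset before and after the gap grew, plus the logical text. The two reads differ although the text is the same, which is why a concurrent reader would need some form of synchronization with whoever changes the gap.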