From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Wed, 01 Apr 2020 22:33:05 +0300
Message-ID: <831rp7ypam.fsf@gnu.org>
References: <83369o1khx.fsf@gnu.org> <83imijz68s.fsf@gnu.org>
In-Reply-To: (message from Tuấn-Anh Nguyễn on Thu, 2 Apr 2020 00:55:45 +0700)
To: Tuấn-Anh Nguyễn
Cc: emacs-devel@gnu.org

> From: Tuấn-Anh Nguyễn
> Date: Thu, 2 Apr 2020 00:55:45 +0700
> Cc: emacs-devel@gnu.org
>
> > Did you consider using the API where an application can provide a
> > function to return text at a given offset? Such a function could be
> > relatively easily implemented for Emacs.
> >
>
> I don't understand what you mean. Below I'll explain how it works
> currently. [...] If dynamic modules have direct access to the
> buffer text, none of the above is an issue.
>
> Such direct access can be enabled by something like this:
>
>     char* (*access_buffer_text) (emacs_env *env,
>                                  emacs_value buffer,
>                                  ptrdiff_t byte_offset,
>                                  ptrdiff_t *size_inout);
>
> Of course, such an API would require extensive documentation on how it
> must be used, to ensure safety and correctness.

I think you are moving too fast and keeping the current implementation
too much in sight. What I suggest is to step back and see how such
direct access, if it were available, could be used with tree-sitter.
Let's forget about modules for a moment and consider tree-sitter
linked with Emacs and capable of calling any C function in core. How
would you use that?

Buffer text is not exactly UTF-8; it's a superset of UTF-8. So one
question to answer is what to do with byte sequences that are not
valid UTF-8. Any suggestions or ideas? How does tree-sitter handle
invalid byte sequences in general?

Also, direct access to buffer text generally means we must make sure
GC never runs as long as pointers to buffer text are lying around.
Can any Lisp run between calls to the reader function that the
tree-sitter parser calls to access the buffer text? If so, we need to
take care of that issue.

Next, I'm still asking whether parsing the whole buffer when it is
first created is necessary. Can we pass to the parser just a small
chunk (say, 500 bytes) of the buffer around the windowful to be
displayed next? If this presents problems, what are those problems?

IOW, the issue with exposing access to buffer text to modules is IMO
secondary. My suggestion is first to figure out how to do this stuff
efficiently from within Emacs itself, as if the module interface were
not part of the equation. We can add that aspect back later.

And yes, doing this by consing strings is not a good idea: it will
slow things down and cause a lot of GC. It is best avoided. Thus my
questions above.

> > Btw, what do you do with the tree returned by the tree-sitter parser?
> > store it in some buffer-local variable? If so, how much memory does
> > such a tree take, and when, if ever, is that memory released?
> >
>
> It's stored in a buffer-local variable. I haven't measured the memory
> they take. Memory is released when the tree object is garbage-collected
> (it's a `user-ptr').

So if I have many hundreds of buffers, I could have such a tree in
each one of them indefinitely? Perhaps that's one more design issue
to consider, given that the parsing is so fast. This is similar to
what we do with image and face caches: we flush them from time to
time, to keep the memory footprint in check. So a buffer that has not
been current for longer than some time interval could have its tree
GCed.