From: Tuấn-Anh Nguyễn
Newsgroups: gmane.emacs.devel
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 2 Apr 2020 11:21:49 +0700
References: <83369o1khx.fsf@gnu.org> <83imijz68s.fsf@gnu.org> <831rp7ypam.fsf@gnu.org>
In-Reply-To: <831rp7ypam.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

On Thu, Apr 2, 2020 at 2:33 AM Eli Zaretskii wrote:
>
> > From: Tuấn-Anh Nguyễn
> > Date: Thu, 2 Apr 2020 00:55:45 +0700
> > Cc: emacs-devel@gnu.org
> >
> > > Did you consider using the API where an application can provide a
> > > function to return text at a given offset? Such a function could be
> > > relatively easily implemented for Emacs.
> > >
> >
> > I don't understand what you mean. Below I'll explain how it works
> > currently. [...] If dynamic modules have direct access to the
> > buffer text, none of the above is an issue.
> >
> > Such direct access can be enabled by something like this:
> >
> >     char* (*access_buffer_text) (emacs_env *env,
> >                                  emacs_value buffer,
> >                                  ptrdiff_t byte_offset,
> >                                  ptrdiff_t *size_inout);
> >
> > Of course, such an API would require extensive documentation on how it
> > must be used, to ensure safety and correctness.
>
> I think you are moving too fast, and keep the current implementation
> in sight too much.
>

I'm actually moving too slowly here. I have thought about this part
quite a bit, but I'm currently focusing on other things, partly because
this is not a painful bottleneck.

> What I suggest is to step back and see how such direct access, if it
> were available, could be used with tree-sitter. Let's forget about
> modules for a moment and consider tree-sitter linked with Emacs and
> capable of calling any C function in core. How would you use that?
>
> Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one
> question to answer is what to do with byte sequences that are not
> valid UTF-8. Any suggestions or ideas? How does tree-sitter handle
> invalid byte sequences in general?
>

I haven't checked yet.
It will probably bail out, which is usually the desired behavior.
Tree-sitter's author is likely open to making this behavior
configurable, though. Alternatively, the direct access function can
offer different behaviors: as-is, bail-out, skip-over, or null-out
(tree-sitter will skip over null bytes, IIRC).

> Also, direct access to buffer text generally means we must make sure
> GC never runs as long as pointers to buffer text are lying around.
> Can any Lisp run between calls to the reader function that the
> tree-sitter parser calls to access the buffer text? If so, we need to
> take care of that issue.
>

With direct access, no Lisp code will be run between these calls.

> Next, I'm still asking whether parsing the whole buffer when it is
> first created is necessary. Can we pass to the parser just a small
> chunk (say, 500 bytes) of the buffer around the window-full to be
> displayed next? If this presents problems, what are those problems?
>

In principle (not in tree-sitter ATM), and in very specific cases, yes.
IMO that's a misplaced focus on a premature optimization anyway. As
others noted, even in the pathological case of xdisp.c, the performance
is acceptable. Also keep in mind that syntax highlighting is just one
application. Other use cases usually want a full parse tree.

If we really want to tackle this issue, there are other approaches to
consider, e.g. background parsing, or parsing up until a time limit and
resuming when Emacs is idle. Tree-sitter's API supports the latter. But
again, both thought exercises and my usage so far point to this being a
non-issue.

> IOW, the issue with exposing access to buffer text to modules is IMO
> secondary. My suggestion is first to figure out how to do this stuff
> efficiently from within Emacs itself, as if the module interface were
> not part of the equation. We can add that aspect back later.
>

My opinion is that it's better to experiment with this kind of stuff
out-of-core.
It can move forward faster that way, allowing more lessons to be
learned. Real lessons, involving real-world use cases, not thought
exercises. In a somewhat similar vein, writing emacs-tree-sitter
highlighted real issues with dynamic modules, which I'm going to write
up sometime.

> And yes, doing this by consing strings is not a good idea, it will
> slow things down and cause a lot of GC. It is best avoided. Thus my
> questions above.
>
> > > Btw, what do you do with the tree returned by the tree-sitter parser?
> > > store it in some buffer-local variable? If so, how much memory does
> > > such a tree take, and when, if ever, is that memory released?
> > >
> >
> > It's stored in a buffer-local variable. I haven't measured the memory
> > they take. Memory is released when the tree object is garbage-collected
> > (it's a `user-ptr').
>
> So if I have many hundreds of buffers, I could have such a tree in
> each one of them indefinitely? Perhaps that's one more design issue
> to consider, given that the parsing is so fast. Similar to what we do
> with image and face caches -- we flush them from time to time, to keep
> the memory footprint in check. So a buffer that was not current more
> than some time interval ago could have its tree GCed.
>

That can work. Alternatively, tree-sitter can add support for "folding"
subtrees, as Stefan suggested.

-- 
Tuấn-Anh Nguyễn
Software Engineer
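
For illustration only, the four behaviors mentioned above for invalid
byte sequences (as-is, bail-out, skip-over, null-out) could look roughly
like the sketch below. It is not code from Emacs, emacs-tree-sitter, or
tree-sitter: the names `invalid_policy`, `utf8_seq_len`, and
`sanitize_chunk` are hypothetical, and it assumes the chunk is copied
into a scratch buffer and checked against strict UTF-8, which a real
direct-access function might avoid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names for the four behaviors discussed above. */
typedef enum {
    POLICY_AS_IS,      /* pass invalid bytes through unchanged */
    POLICY_BAIL_OUT,   /* stop and report failure */
    POLICY_SKIP_OVER,  /* drop invalid bytes */
    POLICY_NULL_OUT    /* replace each invalid byte with 0 */
} invalid_policy;

/* Length (1..4) of the strictly valid UTF-8 sequence starting at p,
   or 0 if the bytes are invalid or truncated.  Requires avail >= 1. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char b = p[0], lo = 0x80, hi = 0xBF;
    size_t len;

    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) {
        len = 2;
    } else if (b >= 0xE0 && b <= 0xEF) {
        len = 3;
        if (b == 0xE0) lo = 0xA0;   /* reject overlong forms */
        if (b == 0xED) hi = 0x9F;   /* reject surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        len = 4;
        if (b == 0xF0) lo = 0x90;   /* reject overlong forms */
        if (b == 0xF4) hi = 0x8F;   /* reject values above U+10FFFF */
    } else {
        return 0;
    }
    if (avail < len) return 0;
    if (p[1] < lo || p[1] > hi) return 0;
    for (size_t i = 2; i < len; i++)
        if ((p[i] & 0xC0) != 0x80) return 0;
    return len;
}

/* Copy n bytes from src to dst (dst must hold at least n bytes),
   applying the policy to every byte that does not start a valid
   sequence.  Returns the number of bytes written, or -1 when the
   policy is POLICY_BAIL_OUT and an invalid byte is found.  A sequence
   cut off at the end of the chunk counts as invalid here, so a real
   reader would have to split chunks at character boundaries. */
static ptrdiff_t sanitize_chunk(const unsigned char *src, size_t n,
                                unsigned char *dst, invalid_policy policy)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < n) {
        size_t len = utf8_seq_len(src + in, n - in);
        if (len > 0) {
            for (size_t i = 0; i < len; i++)
                dst[out++] = src[in++];
            continue;
        }
        switch (policy) {
        case POLICY_AS_IS:     dst[out++] = src[in++]; break;
        case POLICY_BAIL_OUT:  return -1;
        case POLICY_SKIP_OVER: in++; break;
        case POLICY_NULL_OUT:  dst[out++] = 0; in++; break;
        }
    }
    return (ptrdiff_t) out;
}
```

Note that skip-over changes byte offsets, so positions reported by the
parser would no longer map directly onto buffer positions; null-out and
as-is keep the lengths equal.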