From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs) Date: Thu, 02 Apr 2020 18:02:06 +0300 Message-ID: <83mu7ux769.fsf@gnu.org> References: <83369o1khx.fsf@gnu.org> <83imijz68s.fsf@gnu.org> <831rp7ypam.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="71815"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org To: =?utf-8?Q?Tu=E1=BA=A5n-Anh_Nguy=E1=BB=85n?= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Apr 02 17:04:56 2020 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jK1Oq-000IZj-0L for ged-emacs-devel@m.gmane-mx.org; Thu, 02 Apr 2020 17:04:56 +0200 Original-Received: from localhost ([::1]:42728 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jK1Oo-0003tY-Ug for ged-emacs-devel@m.gmane-mx.org; Thu, 02 Apr 2020 11:04:54 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:43382) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jK1MP-0000tF-1t for emacs-devel@gnu.org; Thu, 02 Apr 2020 11:02:30 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:58675) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1jK1MO-0002Q6-T8; Thu, 02 Apr 2020 11:02:24 -0400 Original-Received: from [176.228.60.248] (port=4421 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1jK1MM-0003vo-9p; Thu, 02 Apr 2020 11:02:22 -0400 In-Reply-To: (message from =?utf-8?Q?Tu=E1=BA=A5n-Anh_Nguy=E1=BB=85n?= on Thu, 2 Apr 2020 11:21:49 +0700) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:246263 Archived-At: > From: Tuấn-Anh Nguyễn > Date: Thu, 2 Apr 2020 11:21:49 +0700 > Cc: emacs-devel@gnu.org > > > Buffer text is not exactly UTF-8, it's a superset of UTF-8. So one > > question to answer is what to do with byte sequences that are not > > valid UTF-8. Any suggestions or ideas? How does tree-sitter handle > > invalid byte sequences in general? > > > > I haven't checked yet. It will probably bail out, which is usually the > desired behavior. "Bail out" meaning that this breaks the parse? I'd be surprised if that was what happens in these cases. But if it does, we will need to replace such sequences by the likes of U+FFFD in the reader function we provide. > With direct access, no Lisp code will be run between these calls. Then this issue is taken care of. > > Next, I'm still asking whether parsing the whole buffer when it is > > first created is necessary. Can we pass to the parser just a small > > chunk (say, 500 bytes) of the buffer around the window-full to be > > displayed next? If this presents problems, what are those problems? > > > > In principle (not in tree-sitter ATM), and in very specific cases, yes. > IMO that's the wrong focus on a premature optimization anyway. I tried to explain elsewhere why I don't think this is premature. > As others noted, even in the pathological case of xdisp.c, the > performance is acceptable. xdisp.c is not a pathological case for me, I edit it very frequently. More importantly, this scales poorly. > Also keep in mind that syntax highlighting is just one > application. Other use cases usually want a full parse tree. Other applications have different restrictions and requirements, so trying to satisfy all of them at once might not be the best way. > If we really want to tackle this issue, there are other approaches to > consider, e.g. background parsing, or parsing up until a time limit, and > resume parsing when Emacs is idle. Tree-sitter's API supports the > latter. JIT-lock already supports background fontification (see jit-lock-stealth-time), so using such parsers from jit-lock gives that to you at almost no cost. > > IOW, the issue with exposing access to buffer text to modules is IMO > > secondary. My suggestion is first to figure out how to do this stuff > > efficiently from within Emacs itself, as if the module interface were > > not part of the equation. We can add that aspect back later. > > > > My opinion is that it's better to experiment with this kind of stuff > out-of-core. It can move forward faster that way, allowing more lessons > to be learned. Real lessons, involving real-world use cases, not thought > exercises. I'm talking about trying different design ideas. It is best to do that without being limited by what modules can and cannot do. Building a hacked version of Emacs to test those ideas doesn't necessarily contradict the desire to collect real-life experience. IOW, I suggest to test alternative design ideas that are not based on copying portions of the buffer via Lisp strings. If those ideas are workable (and I think they are), they will support a more scalable implementation that exerts less memory pressure on Emacs and on the host system. HTH