From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 02 Apr 2020 17:36:12 +0300
Message-ID: <83r1x6x8df.fsf@gnu.org>
In-Reply-To: <86wo6yhj4d.fsf@stephe-leake.org>
To: Stephen Leake
Cc: emacs-devel@gnu.org

> From: Stephen Leake
> Date: Wed, 01 Apr 2020 15:38:26 -0800
>
> Eli Zaretskii writes:
>
> > Also, direct access to buffer text generally means we must make sure
> > GC never runs as long as pointers to buffer text are lying around.
> > Can any Lisp run between calls to the reader function that the
> > tree-sitter parser calls to access the buffer text?
>
> If the parser copies the text into an internal buffer, that reader
> function should only be called once per call to the parser.

Such copying is not really scalable, and IMO should be avoided. During active editing, redisplay runs very frequently. Having to copy portions of the buffer, let alone all of it, each time necessarily requires memory allocation, consing of Lisp objects, etc., and will produce significant memory pressure, expensive heap allocations/deallocations, and a lot of GC. Recall that on many modern platforms Emacs doesn't really return memory to the system, which means we risk increasing the memory footprint and creating system-wide memory pressure. It isn't a catastrophe, but we should try to avoid it if possible.

> Since Emacs has the entire file in memory, the parser can too.

Having the file twice or more in memory is worse than having it only once.

> However, if we are really trying to avoid copying text (which is very
> premature optimization)

I don't think it's premature.

> In sum, the short answer is "yes, you must parse the whole file, unless
> your language is particularly simple".

Funny, my conclusion from reading your detailed description was entirely different.
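To make the point about direct access versus copying concrete, here is a minimal Python sketch, assuming a toy gap buffer loosely modeled on how Emacs keeps buffer text (text before the gap, then the gap, then text after it). The `GapBuffer` class and its `read` and `grow_gap` methods are hypothetical names for illustration only. `read(byte_index)` returns a zero-copy view of one contiguous chunk, similar in shape to the read callback tree-sitter calls repeatedly to fetch its input. A Python `memoryview` stands in for a raw pointer into buffer text: while a view is alive, the `bytearray` refuses to be resized, which mirrors the invariant discussed above, namely that nothing which might move buffer text (GC in the message; resizing the gap in this toy) may run while the parser still holds pointers. In C, nothing would enforce that invariant for us.

```python
class GapBuffer:
    """Hypothetical toy gap buffer: logical text = storage[:gap_start] + storage[gap_end:]."""

    def __init__(self, before: bytes, after: bytes, gap_size: int = 4) -> None:
        self.storage = bytearray(before) + bytearray(gap_size) + bytearray(after)
        self.gap_start = len(before)
        self.gap_end = self.gap_start + gap_size

    def __len__(self) -> int:
        # Logical text length excludes the gap.
        return len(self.storage) - (self.gap_end - self.gap_start)

    def read(self, byte_index: int) -> memoryview:
        """Return a zero-copy view of the contiguous chunk starting at byte_index.

        Like a tree-sitter-style read callback, each call yields one chunk that
        ends at the gap or at the end of the text; an empty chunk means EOF.
        """
        if byte_index >= len(self):
            return memoryview(b"")
        view = memoryview(self.storage)
        if byte_index < self.gap_start:
            return view[byte_index:self.gap_start]
        # Past the gap: translate the logical index into a physical offset.
        return view[byte_index + (self.gap_end - self.gap_start):]

    def grow_gap(self, extra: int) -> None:
        """Enlarge the gap. This resizes (and may move) the storage, so it
        raises BufferError while any view returned by read() is still alive."""
        self.storage[self.gap_end:self.gap_end] = bytearray(extra)
        self.gap_end += extra
```

With the gap in the middle, a parser pulling the whole text calls `read` twice and receives two views into the existing storage; the reader itself never copies, so any copying is left to the caller. The price is that the caller must release its views before the buffer can be reorganized.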

> > IOW, the issue with exposing access to buffer text to modules is IMO
> > secondary.
>
> yes, because copying text is fast compared to everything else going on.

That wasn't my motivation when I wrote that.

> In general, each parser library, and even each grammar author, will have
> different representations for the syntax tree.
>
> So if we want to support different parsers, I think it is best to define
> the Emacs "parser API" as "give text to parser; accept text properties
> from parser".

Yes, something like that. It's probably enough to accept a list of regions with syntactic attributes.
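The parser-agnostic interface sketched in this exchange (text goes in, a list of regions with syntactic attributes comes out) could look roughly like the following. This is a minimal Python sketch under stated assumptions: `Region`, `Parser`, `ToyRegexParser`, `apply_regions`, and `highlight` are hypothetical names, and the regex tokenizer merely stands in for a real parser such as tree-sitter. The point it illustrates is that the editor side depends only on the region list, never on any parser's own syntax-tree representation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Region:
    """One syntactic region: half-open character range [start, end) plus an attribute."""
    start: int
    end: int
    attr: str


class Parser(Protocol):
    """Hypothetical parser-agnostic contract: give text, get regions back."""

    def parse(self, text: str) -> List[Region]:
        ...


class ToyRegexParser:
    """Hypothetical stand-in for a real parser; recognizes only a few token kinds."""

    _TOKENS = re.compile(
        r'(?P<keyword>\b(?:if|else|return)\b)'
        r'|(?P<string>"[^"]*")'
        r'|(?P<number>\b\d+\b)'
    )

    def parse(self, text: str) -> List[Region]:
        # m.lastgroup is the name of the alternative that matched.
        return [Region(m.start(), m.end(), m.lastgroup)
                for m in self._TOKENS.finditer(text)]


def apply_regions(text_len: int, regions: List[Region]) -> List[Optional[str]]:
    """Editor side: record each region's attribute on its characters."""
    props: List[Optional[str]] = [None] * text_len
    for r in regions:
        for i in range(r.start, r.end):
            props[i] = r.attr
    return props


def highlight(parser: Parser, text: str) -> List[Optional[str]]:
    # The editor never inspects the parser's internal tree, only the regions.
    return apply_regions(len(text), parser.parse(text))
```

In Emacs, the `apply_regions` step would correspond roughly to calling `put-text-property` once per region. Swapping in a different parser changes only the object that implements `parse`; the consumer of the regions stays the same.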