From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Tue, 31 Mar 2020 11:11:22 -0400
References: <83o8sf3r7i.fsf@gnu.org> <2E218879-0F24-4A20-B210-263C8D0BEEA4@gmail.com> <838sjh2red.fsf@gnu.org> <83369o3bvb.fsf@gnu.org>
To: Eli Zaretskii
Cc: casouri@gmail.com, akrl@sdf.org, emacs-devel@gnu.org
In-Reply-To: <83369o3bvb.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 31 Mar 2020 16:14:16 +0300")
Xref: news.gmane.io gmane.emacs.devel:246107

>> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
>> keeps the parse tree up-to-date in response to buffer changes.
> Why does it need the entire buffer up front?

Because as a general rule you cannot parse a region without looking at
all the preceding text.
That's why when we fontify START..END we need to begin by computing the
`syntax-ppss` at START, which involves passing the whole text from
`point-min` to START through `parse-partial-sexp`.

> that sounds like a potential performance killer.

Indeed. And so does this `syntax-ppss` call we have. It's OK as long as
the parsing is fast enough and you don't use it in buffers that are too
large. E.g. I expect that most programming major modes currently exhibit
significant delays when you jump to the end of a multi-GB buffer because
of that `syntax-ppss` call.

> Fontifying a small part of a buffer doesn't need its entire text.

Sadly, it does. In specific cases you may be able to speed things up,
but that doesn't apply in general. I'm sure there could be other
approaches that focus on trying to parse as little of the buffer text as
possible (e.g. SMIE follows this kind of idea), but it's difficult to
make them work with a "normal" grammar, providing a full parse tree and
giving a reliable result (and without it degenerating to parsing the
whole buffer anyway in most cases).

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.

These are internal implementation details that can be tweaked later on.
I do expect that the code currently needs to call `buffer-string` or its
moral equivalent. But if the resources this requires are significant
enough to worry about, then it's great news: it means the parsing
itself is very fast.

> We should instead request that tree-sitter exposes an API through
> which we could give it direct access to buffer text as 2 parts, before
> and after the gap, like we do with regex code. Otherwise this will be
> a bottleneck in the long run, not unlike the problem we have with LSP.
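The quoted idea of handing the parser the text before and after the gap as separate pieces, instead of one big string, can be sketched as a pull-style read callback. This is a toy Python model under stated assumptions: `GapBuffer`, `read_chunk` and `pull_text` are hypothetical names, not Emacs internals, and the byte layout only loosely mirrors Emacs's gap buffer. For what it's worth, tree-sitter's C API already accepts input through a `read` callback in `TSInput` that returns one chunk per call; the sketch only imitates that shape and does not call tree-sitter.

```python
class GapBuffer:
    """Toy gap buffer (hypothetical): text before the gap, unused gap bytes, text after."""

    def __init__(self, text: str, gap_pos: int, gap_size: int = 16) -> None:
        data = text.encode("utf-8")
        assert 0 <= gap_pos <= len(data), "gap_pos is a byte offset into the text"
        self.storage = bytearray(data[:gap_pos]) + bytearray(gap_size) + bytearray(data[gap_pos:])
        self.gap_start = gap_pos
        self.gap_end = gap_pos + gap_size

    def size(self) -> int:
        """Number of text bytes, excluding the gap."""
        return len(self.storage) - (self.gap_end - self.gap_start)

    def read_chunk(self, byte_index: int) -> memoryview:
        """Return a zero-copy view of the contiguous text starting at byte_index.

        The view stops at the gap, so a parser reading sequentially sees the
        part before the gap and the part after it as two chunks.
        An empty view signals end of text.
        """
        view = memoryview(self.storage)
        if byte_index < self.gap_start:
            return view[byte_index:self.gap_start]
        # Positions at or after the gap are shifted by the gap size in storage.
        return view[byte_index + (self.gap_end - self.gap_start):]


def pull_text(read) -> list:
    """Stand-in for a parser that consumes its input only through `read`."""
    chunks, pos = [], 0
    while True:
        chunk = read(pos)
        if len(chunk) == 0:
            return chunks
        chunks.append(chunk)
        pos += len(chunk)


buf = GapBuffer("(defun foo () 42)", gap_pos=7)
print([bytes(c) for c in pull_text(buf.read_chunk)])
# [b'(defun ', b'foo () 42)']
```

Each call hands out at most one contiguous run, so the consumer sees two chunks when the gap sits inside the text and no concatenated copy of the buffer is ever built. A real integration would also have to keep the gap from moving (and the text from changing) while the parser still holds such views.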
I'm not sure exactly which problem with LSP you're thinking about, but
I doubt `buffer-string` is a significant component of a performance
problem with LSP: the time to pass that string to the server via a pipe
should dwarf it.

> I still don't see why it would need the entire buffer for this class
> of applications. Did anyone try the alternatives, in particular on
> very large buffers?

What alternatives? How large is "very large" here?


        Stefan
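The point that a region cannot be parsed without its preceding text can be illustrated with a toy stand-in for `parse-partial-sexp`. Everything here is hypothetical and deliberately simplified (Python, only parentheses and double-quoted strings, no escapes or comments): `scan` folds a state over the text, and `PrefixStateCache` keeps checkpoint states so later queries rescan only a short stretch. That is roughly the idea behind caching `syntax-ppss` results, not its actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    depth: int = 0            # paren nesting depth
    in_string: bool = False   # inside a "..." string?


def scan(text: str, start: int, end: int, state: State = State()) -> State:
    """Advance `state` over text[start:end] (toy analogue of parse-partial-sexp)."""
    depth, in_string = state.depth, state.in_string
    for ch in text[start:end]:
        if in_string:
            in_string = ch != '"'  # only a closing quote ends the string
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return State(depth, in_string)


class PrefixStateCache:
    """Checkpoint states every `interval` chars (hypothetical helper).

    The first query near the end still scans from position 0; later queries
    restart from the nearest checkpoint at or before `pos`. Checkpoints are
    never invalidated here; a real cache must drop those past an edit.
    """

    def __init__(self, text: str, interval: int = 1000) -> None:
        self.text = text
        self.interval = interval
        self.checkpoints = [State()]  # checkpoints[i] = state at i * interval

    def state_at(self, pos: int) -> State:
        while len(self.checkpoints) * self.interval <= pos:
            i = len(self.checkpoints)
            self.checkpoints.append(
                scan(self.text, (i - 1) * self.interval, i * self.interval, self.checkpoints[-1]))
        k = pos // self.interval
        return scan(self.text, k * self.interval, pos, self.checkpoints[k])


text = '(setq s "(not code)") (f)'
start = text.index("(not")            # region START..END lies inside a string
end = start + len("(not code)")
print(scan(text, start, end))                        # State(depth=0, in_string=False)
print(scan(text, start, end, scan(text, 0, start)))  # State(depth=1, in_string=True)
```

Scanning START..END on its own makes the string contents look like balanced code; only the state carried in from `point-min` reveals that the whole region is inside a string at depth 1.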