From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: How to add pseudo vector types
Date: Tue, 20 Jul 2021 09:25:11 -0700
Message-ID: <86mtqh54wo.fsf@stephe-leake.org>
In-Reply-To: <83r1fz5xr9.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 15 Jul 2021 19:48:26 +0300")
To: Eli Zaretskii
Cc: Yuan Fu, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Eli Zaretskii writes:

>> From: Yuan Fu
>> Date: Thu, 15 Jul 2021 12:19:31 -0400
>> Cc: monnier@iro.umontreal.ca,
>>  emacs-devel@gnu.org
>>
>> > Why do you need to do this when a buffer is updated? why not use
>> > display as the trigger? Large portions of a buffer will never be
>> > displayed, and some buffers will not be displayed at all. Why waste
>> > cycles on them? Redisplay is perfectly equipped to tell you when some
>> > chunk of buffer text is going to be redrawn, and it already knows to
>> > do nothing if the buffer haven't changed.
>>
>> Tree-sitter expects you to tell it every single change to the parsed text.
>
> That cannot be true, because the parsed text could be in a state where
> parsing it will fail.

You can relax this to "when a parse is requested, tree-sitter must be
given the net changes to the text". You can combine several changes
into one, if that saves time or something. But tree-sitter does have
to deal with incorrect syntax.

> When you are in the middle of writing the code, this is what will
> happen many times, even if you pass the whole buffer to the parser.

Yes.

> And since tree-sitter _must_ be able to deal with this problem, it
> also must be able to receive incomplete parts of the buffer text, and
> do the best it can with it.

That does not follow. I took that approach with ada-mode, and the
results are not good. Mostly this is because Ada requires always
parsing from BOB, so parsing only part of the buffer is bound to give
bad results. Knowing the changes from a previous complete parse allows
the parser to do a much better job.

>> Say you have a buffer with some content and scrolled through it, so
>> tree-sitter has parsed the whole buffer. Then some elisp edited some
>> text outside the visible portion. Redisplay doesn’t happen, we don’t
>> tell this edit to tree-sitter. Then I scroll to the place that has
>> been edited. What now?
>
> Now you call tree-sitter passing it the part of the buffer that needs
> to be parsed (e.g., the chunk that is about to be displayed). If
> tree-sitter needs to look back, it will.

No, you pass tree-sitter the net list of changes since the last parse
was requested. Changes outside the visible region can easily affect
the visible region; consider inserting a comment or block start or
end.

>> I’ve lost the change information, and tree-sitter’s tree is out-dated.
>
> No information is lost because the updated buffer text is available.

That is useful only if the previous buffer text is also available, so
you can diff it. It is more efficient to keep a list of changes.
Although if that list grows too large, it can be better to simply
start over, and parse the whole buffer again.

> In addition, Emacs records (for redisplay purposes) two places in each
> buffer related to changes: the minimum buffer position before which no
> changes were done since last redisplay, and the maximum buffer
> position beyond which there were no changes. This can also be used to
> pass only a small part of the buffer to the parser, because the rest
> didn't change.

Again, the input to tree-sitter is a list of changes, not a block of
text containing changes. That is because of the way incremental
parsing works.

The list of changes to the buffer text is used to edit the parse
tree, deleting nodes that represent deleted or modified text, and
lexing the new text to create new nodes. Then the parser is run on the
edited tree, _not_ on the buffer text. The parser adds new nodes as
appropriate to arrive at a complete parse tree.

There's no point in trying to tell the parser how much to parse; any
non-edited portion of the original text will be represented in the
edited tree by one or a small number of nodes; the parser then
consumes those quickly.

>> What we can do is to only parse the portion from BOB to the visible
>> portion. So we won’t parse the whole buffer unless you scroll to the
>> bottom.

You can stop parsing at the end of a complete grammar production; in
languages that require parsing from BOB, that is always EOB.

The parser cannot stop at an arbitrary point in the text; that would
leave an incomplete tree.

The point of incremental parsing is that parsing unchanged text is
very fast, because it is represented by a small number of nodes in the
edited tree.

> My primary worry is the fact that you want to use buffer-change hooks
> (and will soon enough want to use post-command-hook as well). They
> slow down editing, sometimes tremendously, so I'd very much prefer not
> to use those hooks for fontification/parsing. The original font-lock
> mechanism in Emacs 19 used these hooks; we switched to jit-lock and
> its redisplay-triggered fontifications because the original design had
> problems which couldn't be solved reliably and with reasonable
> performance. I hope we will not make the mistake of going back to
> that sub-optimal design.

Ah. That could be a problem; incremental parsing fundamentally
requires a list of changes.

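To make "a list of changes" concrete, here is a minimal sketch in
Python. It is a toy model only, not the tree-sitter or wisi API: the
names `Edit`, `ChangeLog`, `record` and `take`, and the `max_edits`
threshold, are all invented for illustration. The recorder stores
positions only, hands the accumulated edits over when a parse is
requested, and signals "reparse from scratch" if the list grew too
large in the meantime.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """One change: old text in [start, old_end) became new text in [start, new_end)."""
    start: int
    old_end: int
    new_end: int


@dataclass
class ChangeLog:
    """Hypothetical recorder that a buffer-change hook could append to."""
    max_edits: int = 1000  # assumed threshold; beyond it a full reparse is used
    edits: List[Edit] = field(default_factory=list)
    overflowed: bool = False

    def record(self, start: int, deleted_len: int, inserted_len: int) -> None:
        # Deliberately cheap: positions only, never the text itself.
        if self.overflowed:
            return
        if len(self.edits) >= self.max_edits:
            # Too many changes to be worth replaying; drop them all.
            self.edits.clear()
            self.overflowed = True
            return
        self.edits.append(Edit(start, start + deleted_len, start + inserted_len))

    def take(self) -> Optional[List[Edit]]:
        """Called when a parse is requested.

        Returns the edits in the order they happened, or None to mean
        "start over and parse the whole buffer again". Resets the log.
        """
        result = None if self.overflowed else self.edits
        self.edits = []
        self.overflowed = False
        return result


# Usage: two edits happen, then something (indent, fontify) requests a parse.
log = ChangeLog()
log.record(10, 0, 5)   # 5 characters inserted at position 10
log.record(40, 2, 0)   # 2 characters deleted at position 40
pending = log.take()   # what the parser would be given
```

Nothing here parses anything; the point is only that the per-change
work is a single list append, and that the decision between replaying
edits and starting over is made once, when the parse is requested.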
If the parser is in an Emacs module, so it has direct access to the
buffer, then the hooks only need to record the buffer positions of the
insertions and deletions, not the new text. That should be very fast.

Then the parse is only requested when the results are needed for
something, like indent or fontify.

That is how wisi works, except the parser is currently in an external
process, so the buffer change hooks also have to store the new text,
which can be large. Which is a good reason to improve wisi to support
the parser in a module.

In addition, the code that computes the requested information
(fontification or indentation) takes region bounds as input, and only
computes the information for that region (using the full parse tree);
that is much faster than always computing all information for the
entire buffer.

eglot, on the other hand, sends the change information to the LSP
server immediately (or after a small delay), and then tries to do
something with the response, rather than waiting until some event
triggers a need for information from the server.

I'm guessing that font-lock ran the actual fontification functions
from the buffer-change hooks; that would be slow.

-- 
-- Stephe