From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Tue, 20 Jul 2021 17:04:35 -0700
Message-ID: <865yx45y7g.fsf@stephe-leake.org>
In-Reply-To: (Perry E. Metzger's message of "Tue, 20 Jul 2021 10:53:23 -0400")
To: "Perry E. Metzger"
Cc: emacs-devel@gnu.org

"Perry E. Metzger" writes:

> On 7/19/21 19:49, Stephen Leake wrote:
>> "Perry E. Metzger" writes:
>>
>>> Apologies for not having been present for, er, the extensive previous
>>> discussion on Tree Sitter. I discovered it looking at the archives.
>> Ok.
>>
>>> I still believe that it would be a great thing to integrate into the
>>> base of Emacs. The algorithms it employs are excellent, it's extremely
>>> fast, and it handles the issues of real editors (like dealing with
>>> partial code fragments and incrementally changing the parse on every
>>> keystroke) very efficiently.
>> +1.
>>
>> I'm working on adding incremental parse to wisi (and have been for
>> almost a year now ...). I believe wisi has stronger error recovery than
>> tree-sitter, which allows it to support indentation.
>
> Tree sitter can reparse an entire file in a few milliseconds. This is
> almost impossible to achieve in elisp I suspect.

Yes.

wisi.el is an elisp interface to an external process that is implemented
in Ada. At some point, it will also support an internal module, which
avoids having to send the buffer text to the external process, so larger
files will be faster to parse.

When I finish implementing incremental parse, it should be as fast as
tree-sitter.

> Because of this, it can reparse on every keystroke, which is quite an
> astonishing thing.

Some reports in the tree-sitter issues describe reparsing taking longer
on large files. So some parts of the algorithm are proportional to the
buffer length, while most of it is proportional to the length of the
changes.

Consider: if the parse tree stores the absolute buffer position of each
token, then when you insert 5 chars at the beginning of the buffer, the
buffer position of every node in the tree must be shifted by 5 chars.
That process is linear in the length of the buffer (although it can be
very fast).

Alternatively, you can store only the length of each token (as
tree-sitter does); then, when processing queries, you have to add up the
lengths of all the preceding tokens to report the buffer position of the
information you are computing. That is also linear in the length of the
buffer.

We'll have to see how fast wisi is; I'm making good progress in my
testing, but there are still a few non-incremental algorithms to convert.

-- 
-- Stephe
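The two position-tracking schemes described in the message can be illustrated with a toy sketch. It is hypothetical throughout: a flat list of tokens stands in for a real parse tree, whitespace before a token is held in an assumed per-token `padding` field, and `AbsToken`, `RelToken`, and the helper functions are invented for illustration. None of this is wisi's or tree-sitter's actual data structure or API.

```python
"""Toy comparison of absolute vs. length-only token positions.

Hypothetical flat token list, not a real parse tree and not the data
structures used by wisi or tree-sitter. Positions are 0-based offsets.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class AbsToken:
    start: int  # absolute buffer position of the token text
    size: int   # length of the token text


@dataclass
class RelToken:
    padding: int  # whitespace chars between the previous token and this one
    size: int     # length of the token text


def insert_whitespace_abs(tokens: List[AbsToken], pos: int, n: int) -> int:
    """Absolute scheme: insert n whitespace chars at buffer position pos.

    Every token starting at or after pos must be shifted, so an edit at
    the start of the buffer touches every token (linear in buffer size).
    Returns the number of tokens updated.
    """
    touched = 0
    for tok in tokens:
        if tok.start >= pos:
            tok.start += n
            touched += 1
    return touched


def insert_whitespace_rel(tokens: List[RelToken], index: int, n: int) -> int:
    """Length-only scheme: insert n whitespace chars just before token index.

    Only that token's padding changes. Returns the number of tokens updated.
    """
    tokens[index].padding += n
    return 1


def start_rel(tokens: List[RelToken], index: int) -> int:
    """Length-only scheme: recover the absolute start of token index.

    Sums the lengths of all preceding tokens, so a query near the end of
    the buffer is linear in buffer size.
    """
    pos = 0
    for tok in tokens[:index]:
        pos += tok.padding + tok.size
    return pos + tokens[index].padding


if __name__ == "__main__":
    # "int x = 1;" tokenized as: int, x, =, 1, ;
    abs_toks = [AbsToken(0, 3), AbsToken(4, 1), AbsToken(6, 1),
                AbsToken(8, 1), AbsToken(9, 1)]
    rel_toks = [RelToken(0, 3), RelToken(1, 1), RelToken(1, 1),
                RelToken(1, 1), RelToken(0, 1)]

    # Insert 5 chars at the beginning of the buffer.
    print(insert_whitespace_abs(abs_toks, 0, 5))  # 5: every token shifted
    print(insert_whitespace_rel(rel_toks, 0, 5))  # 1: one padding field changed
    print(start_rel(rel_toks, 4))                 # 14: found by summing lengths
```

With a flat list the trade-off is visible directly: the absolute scheme pays on every edit, while the length-only scheme makes the edit cheap and pays instead when a buffer position is queried.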