From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: "Perry E. Metzger" Newsgroups: gmane.emacs.devel Subject: Re: Tree Sitter (was Re: cc-mode fontification feels random) Date: Wed, 21 Jul 2021 10:43:37 -0400 Message-ID: References: <83o8cge4lg.fsf@gnu.org> <62e438b5-d27f-1d3c-69c6-11fe29a76d74@dancol.org> <83fsxsdxhu.fsf@gnu.org> <179f22a44d8.2816.cc5b3318d7e9908e2c46732289705cb0@dancol.org> <179f38c0370.2816.cc5b3318d7e9908e2c46732289705cb0@dancol.org> <236e62c2-be9b-b26d-8cd0-4b5a1a86e19a@dancol.org> <86mtqsoh3f.fsf@stephe-leake.org> <286d815e-d1a1-07ca-6696-a7f51923ab4e@piermont.com> <86wnpl6f0y.fsf@stephe-leake.org> <865yx45y7g.fsf@stephe-leake.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="2667"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.0 Cc: emacs-devel@gnu.org To: Stefan Monnier , Stephen Leake Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Jul 21 16:44:31 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1m6DSY-0000SN-WD for ged-emacs-devel@m.gmane-mx.org; Wed, 21 Jul 2021 16:44:31 +0200 Original-Received: from localhost ([::1]:56996 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m6DSX-0007jD-W9 for ged-emacs-devel@m.gmane-mx.org; Wed, 21 Jul 2021 10:44:30 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:49938) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m6DRl-0006QS-Tt for emacs-devel@gnu.org; Wed, 21 Jul 2021 10:43:41 -0400 Original-Received: from hacklheber.piermont.com ([2001:470:30:84:e276:63ff:fe62:3400]:43100) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m6DRj-00082a-9D for emacs-devel@gnu.org; Wed, 21 Jul 2021 10:43:41 -0400 Original-Received: from snark.cb.piermont.com (localhost [127.0.0.1]) by hacklheber.piermont.com (Postfix) with UTF8SMTP id 4506A21B; Wed, 21 Jul 2021 10:43:38 -0400 (EDT) Original-Received: from [10.160.2.107] (jabberwock.cb.piermont.com [10.160.2.107]) by snark.cb.piermont.com (Postfix) with UTF8SMTP id 0355B2DE7BE; Wed, 21 Jul 2021 10:43:38 -0400 (EDT) Content-Language: en-US In-Reply-To: Received-SPF: pass client-ip=2001:470:30:84:e276:63ff:fe62:3400; envelope-from=perry@piermont.com; helo=hacklheber.piermont.com X-Spam_score_int: -5 X-Spam_score: -0.6 X-Spam_bar: / X-Spam_report: (-0.6 / 5.0 requ) BAYES_00=-1.9, NICE_REPLY_A=-0.117, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URI_DOTEDU=1.432 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:271413 Archived-At: On 7/20/21 21:28, Stefan Monnier wrote: >> Alternately, you can only store the length of each token (as tree-sitter >> does); then when processing queries, you have to add up all the lengths >> of the preceding tokens in order to report the buffer position of the >> information you are computing. That is also linear in the length of the >> buffer. > You can probably get better than linear performance in "the usual case" > by storing the total length of the subtree at each node of the AST. > It's still theoretically linear in the worst case, of course. > Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work. There are good links at the end of the https://tree-sitter.github.io/tree-sitter/ web page, but I thought I'd link to some a couple of them directly: Practical Algorithms for Incremental Software Development Environments https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf Incremental Analysis of Real Programming Languages https://www.semanticscholar.org/paper/Incremental-analysis-of-real-programming-languages-Wagner-Graham/ca69018c29cc415820ed207d7e1d391e2da1656f?p2df There's also this paper on error recovery for LR parsers, but for some reason it won't load for me right now. http://www.dtic.mil/dtic/tr/fulltext/u2/a043470.pdf Perry