From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Wed, 21 Jul 2021 09:21:38 -0700
To: "Perry E. Metzger", Stefan Monnier, Stephen Leake
Cc: emacs-devel@gnu.org

On 7/21/21 7:43 AM, Perry E. Metzger wrote:
> On 7/20/21 21:28, Stefan Monnier wrote:
>>> Alternately, you can only store the length of each token (as
>>> tree-sitter does); then when processing queries, you have to add up
>>> all the lengths of the preceding tokens in order to report the buffer
>>> position of the information you are computing. That is also linear in
>>> the length of the buffer.
>> You can probably get better than linear performance in "the usual case"
>> by storing the total length of the subtree at each node of the AST.
>> It's still theoretically linear in the worst case, of course.
>>
> Thought I would note that there's a substantial literature now on
> incremental parsing, especially the sort that is needed for editor
> tools. One doesn't need to reinvent the algorithms, they're out there
> waiting to be used. The Tree Sitter project is based on previous
> published work.

There is indeed a big literature!
I wish there were a bigger literature on *composable* incremental
parsers, though. IMHO, what we need is an incremental GLR system (yes,
GLR has bad worst-case complexity, but that's not a practical concern)
that spits out a parse *forest*, which we then pare down to a parse tree
with ad-hoc syntactic consistency rules. A system like this would
naturally support multi-language modes and the incorporation of
out-of-band semantic information.
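
To make the "pare down a forest with consistency rules" step concrete, here is a minimal sketch in Python. It is not a GLR parser and not taken from tree-sitter or any existing system: the forest is built by hand for the classic C ambiguity `a * b;`, and the names `Alternative`, `Rule`, `prune`, and the two rule functions are hypothetical, chosen only for illustration. The out-of-band semantic information is assumed to be a set of identifiers known to name types.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Alternative:
    """One candidate reading of a span of source text (hypothetical type)."""
    kind: str                 # "declaration" or "expression"
    tokens: Tuple[str, ...]   # the tokens this reading covers
    head: str                 # identifier whose meaning decides the reading


# A consistency rule looks at one alternative plus out-of-band information
# (assumed here: the set of names known to be types) and returns True to
# keep the alternative or False to discard it.
Rule = Callable[[Alternative, FrozenSet[str]], bool]


def declaration_needs_type(alt: Alternative, type_names: FrozenSet[str]) -> bool:
    """A declaration is consistent only if its head names a type."""
    return alt.kind != "declaration" or alt.head in type_names


def expression_needs_value(alt: Alternative, type_names: FrozenSet[str]) -> bool:
    """An expression is consistent only if its head does not name a type."""
    return alt.kind != "expression" or alt.head not in type_names


def prune(forest: List[Alternative], rules: List[Rule],
          type_names: FrozenSet[str]) -> List[Alternative]:
    """Keep only the alternatives that every rule accepts."""
    return [alt for alt in forest
            if all(rule(alt, type_names) for rule in rules)]


# Hand-built "forest" for the C statement `a * b;`. A real GLR parser would
# produce these alternatives itself; they are written out here by assumption.
TOKENS = ("a", "*", "b", ";")
FOREST = [
    Alternative("declaration", TOKENS, head="a"),  # b is a pointer to type a
    Alternative("expression", TOKENS, head="a"),   # a multiplied by b
]
RULES = [declaration_needs_type, expression_needs_value]

if __name__ == "__main__":
    for types in (frozenset({"a"}), frozenset()):
        kinds = [alt.kind for alt in prune(FOREST, RULES, types)]
        print(sorted(types), "->", kinds)
```

When `a` is in the set of known type names, only the declaration reading survives; when it is not, only the expression reading does. The point of keeping rules as plain predicates over alternatives is that a language mode, or an external tool supplying semantic information, could add rules without touching the parser.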