From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Sun, 25 Jul 2021 21:24:26 -0700
Message-ID: <86bl6p7ldx.fsf@stephe-leake.org>
In-Reply-To: <07c0a285-af96-a5cf-e008-e6eeffeb9d69@dancol.org> (Daniel Colascione's message of "Sat, 24 Jul 2021 17:41:22 -0700")
To: Daniel Colascione
Cc: emacs-devel@gnu.org, Stefan Monnier, "Perry E. Metzger"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (windows-nt)
List-Id: "Emacs development discussions."

Daniel Colascione writes:

> On 7/24/21 1:05 PM, Stephen Leake wrote:
>
>> Daniel Colascione writes:
>>
>>> On 7/21/21 12:15 PM, Perry E. Metzger wrote:
>>>> On 7/21/21 12:21, Daniel Colascione wrote:
>>>>> On 7/21/21 7:43 AM, Perry E. Metzger wrote:
>>>>>> Thought I would note that there's a substantial literature now on
>>>>>> incremental parsing, especially the sort that is needed for editor
>>>>>> tools. One doesn't need to reinvent the algorithms, they're out
>>>>>> there waiting to be used. The Tree Sitter project is based on
>>>>>> previous published work.
>>>>> There is indeed a big literature! I wish there were a bigger
>>>>> literature on *composable* incremental parsers though. IMHO, what
>>>>> we need is an incremental GLR system (yes, GLR is bad worst-case,
>>>>> but it's not a practical concern) that spits out a parse *forest*
>>>>> which we then pare down to a parse tree with ad-hoc syntactic
>>>>> consistency rules. Something like this naturally supports
>>>>> multi-language modes and incorporation of out-of-band semantic
>>>>> information.
>>>>>
>>>> Tree sitter handles GLR.
>>>>
>>> Cool. How does it prune the parse forest?
>> wisi also uses GLR. It prunes trees during parse when the parse stacks
>> contained in the trees are identical; it uses error recover cost and
>> length to decide which tree to delete, or picks one at random. It's an
>> error if more than one tree is alive at the end of parse. That's because
>> programming languages must be unambiguous. It would be possible to adapt
>> the wisi parser to use some other pruning strategy.
>
>
> Programs *as a whole*, properly understood by a compiler or execution
> environment, must be unambiguous. That's true. But when we're editing,
> we're dealing with program fragments, sometimes damaged by user
> modifications, and have to do our best given fragmentary information.

Right. That's why wisi has robust error recovery.

> All I'm suggesting is that it'd be useful to use language-specific
> semantic rules to disambiguate parse trees:

So far, wisi is only used for Ada; I did not need any disambiguation
rules that seemed language-specific. That may change when/if other
languages use wisi.

> for example, if in location L1, symbol T can be a type or a name, and
> in location L2, symbol T is definitely a type, then we should regard
> symbol T as a type in location L1 too.

That might be possible, but it adds a layer of semantic analysis that
could be slow.

-- 
-- Stephe
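
For concreteness, the rule quoted above (a symbol that is definitely a
type at one location should be treated as a type where it is still
ambiguous) could be sketched roughly as follows. This is a hypothetical
illustration in Python, not wisi or Tree Sitter code: the names
`Occurrence` and `disambiguate`, and the idea that the parser hands back
a set of candidate roles per symbol use, are assumptions made up for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One use of a symbol, with the roles the parser still allows for it."""
    symbol: str
    location: str
    candidates: frozenset  # e.g. frozenset({"type", "name"})


def disambiguate(occurrences):
    """Propagate roles that are certain somewhere to ambiguous uses elsewhere."""
    # First pass: record symbols whose role is unambiguous at some location.
    known = {}
    for occ in occurrences:
        if len(occ.candidates) == 1:
            (role,) = occ.candidates
            known.setdefault(occ.symbol, role)

    # Second pass: narrow an ambiguous use when the known role is a candidate.
    resolved = []
    for occ in occurrences:
        role = known.get(occ.symbol)
        if role is not None and role in occ.candidates:
            resolved.append(Occurrence(occ.symbol, occ.location, frozenset({role})))
        else:
            resolved.append(occ)  # no evidence, or conflicting evidence
    return resolved


occurrences = [
    Occurrence("T", "L1", frozenset({"type", "name"})),  # ambiguous use
    Occurrence("T", "L2", frozenset({"type"})),          # definitely a type
    Occurrence("U", "L3", frozenset({"type", "name"})),  # nothing known about U
]
result = disambiguate(occurrences)
```

The two passes here are linear in the number of symbol uses; in a real
parser the expensive part would presumably be producing the candidate
sets in the first place, which this sketch simply takes as given.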