From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Daniel Colascione Newsgroups: gmane.emacs.devel Subject: Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random) Date: Sat, 24 Jul 2021 17:41:22 -0700 Message-ID: <07c0a285-af96-a5cf-e008-e6eeffeb9d69@dancol.org> References: <62e438b5-d27f-1d3c-69c6-11fe29a76d74@dancol.org> <83fsxsdxhu.fsf@gnu.org> <179f22a44d8.2816.cc5b3318d7e9908e2c46732289705cb0@dancol.org> <179f38c0370.2816.cc5b3318d7e9908e2c46732289705cb0@dancol.org> <236e62c2-be9b-b26d-8cd0-4b5a1a86e19a@dancol.org> <86mtqsoh3f.fsf@stephe-leake.org> <286d815e-d1a1-07ca-6696-a7f51923ab4e@piermont.com> <86wnpl6f0y.fsf@stephe-leake.org> <865yx45y7g.fsf@stephe-leake.org> <0c575ca7-d287-4699-02bd-65822c11bf5d@piermont.com> <2e5ead63-624e-57bf-feaa-996f078fc782@dancol.org> <86im0z8olu.fsf@stephe-leake.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="17697"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 Cc: emacs-devel@gnu.org, Stefan Monnier , "Perry E. Metzger" To: Stephen Leake Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Jul 25 02:43:38 2021 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1m7SEy-0004FF-Av for ged-emacs-devel@m.gmane-mx.org; Sun, 25 Jul 2021 02:43:36 +0200 Original-Received: from localhost ([::1]:48702 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m7SEw-0000wd-DB for ged-emacs-devel@m.gmane-mx.org; Sat, 24 Jul 2021 20:43:34 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:47396) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m7SCz-0000DL-5U for emacs-devel@gnu.org; Sat, 24 Jul 2021 20:41:33 -0400 Original-Received: from dancol.org ([2600:3c01:e000:3d8::1]:56950) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m7SCx-0000IS-6o for emacs-devel@gnu.org; Sat, 24 Jul 2021 20:41:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dancol.org; s=x; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:MIME-Version:Date: Message-ID:From:References:Cc:To:Subject:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=m3rkp3Q5tHeicT8EYnkhdtT9YPT7bi4ijB4/Y7zk/mA=; b=mKeqenY7E2X0RLY7GijSnFtcZa eInkg/eUQ1GuhcbknatrUD5wbtdxqeJt5FCH9XE8BL47SmEu8QvVPnOHfD+bGGpXJPu+yZVfmFIeB xfGhe2MEDuzscZYRMtLNIsIIQUwBe9nvMeSd5cD4GxWG3wClJ5KyERDcCTZvlf961vPivzxgezIYh tobdTwC+PF8bF6cBBR1Vbuy7IuAUCtr9sWXlHehY1gkIaL11v3oJTdHd2G2gciYnzMUL6VWE37Hqe pUJBb0Jaud3VgqygdnpdLITYXDK9FxHtrIAoXVeDNxl5nB+cIPdwACFy41DUu8sCMuTIH7ghE1DuG IG4HznTA==; Original-Received: from [2604:4080:1321:8020:f874:d744:59c6:1880] (port=59178) by dancol.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.89) (envelope-from ) id 1m7SCr-0002hZ-0q; Sat, 24 Jul 2021 17:41:25 -0700 In-Reply-To: <86im0z8olu.fsf@stephe-leake.org> Content-Language: en-US Received-SPF: pass client-ip=2600:3c01:e000:3d8::1; envelope-from=dancol@dancol.org; helo=dancol.org X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, NICE_REPLY_A=-0.058, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:271580 Archived-At: On 7/24/21 1:05 PM, Stephen Leake wrote: > Daniel Colascione writes: > >> On 7/21/21 12:15 PM, Perry E. Metzger wrote: >>> On 7/21/21 12:21, Daniel Colascione wrote: >>>> On 7/21/21 7:43 AM, Perry E. Metzger wrote: >>>>> Thought I would note that there's a substantial literature now on >>>>> incremental parsing, especially the sort that is needed for editor >>>>> tools. One doesn't need to reinvent the algorithms, they're out >>>>> there waiting to be used. The Tree Sitter project is based on >>>>> previous published work. >>>> There is indeed a big literature! I wish there were a bigger >>>> literature on *composable* incremental parsers though. IMHO, what >>>> we need is an incremental GLR system (yes, GLR is bad worst-case, >>>> but it's not a practical concern) that spits out a parse *forest* >>>> which we then pare down to a parse tree with ad-hoc syntactic >>>> consistency rules. Something like this naturally supports >>>> multi-language modes and incorporation of out-of-band semantic >>>> information. >>>> >>> Tree sitter handles GLR. >>> >> Cool. How does it prune the parse forest? > wisi also uses GLR. It prunes trees during parse when the parse stacks > contained in the trees are identical; it uses error recover cost and > length to decide which tree to delete, or picks one at random. It's an > error if more than one tree is alive at the end of parse. That's because > programming languages must be unambiguous. It would be possible to adapt > the wisi parser to use some other pruning strategy. Programs *as a whole*, properly understood by a compiler or execution environment, must be unambiguous. That's true. But when we're editing, we're dealing with program fragments, sometimes damaged by user modifications, and have to do our best given fragmentary information. All I'm suggesting is that it'd be useful to use language-specific semantic rules to disambiguate parse trees: for example, if in location L1, symbol T can be a type or a name, and in location L2, symbol T is definitely a type, then we should regard symbol T as a type in location L1 too. Iterate until we reach a fixed point, and only *then* apply the more general disambiguation strategies you've mentioned.