From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stephen Leake Newsgroups: gmane.emacs.devel Subject: Re: Tokenizing Date: Mon, 22 Sep 2014 09:02:28 -0500 Message-ID: <85vbofkb2j.fsf@stephe-leake.org> References: <85ha01dm5u.fsf@stephe-leake.org> <541F4AD0.3060109@dancol.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: ger.gmane.org 1411394582 5124 80.91.229.3 (22 Sep 2014 14:03:02 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 22 Sep 2014 14:03:02 +0000 (UTC) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Sep 22 16:02:54 2014 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XW4Cb-0004gI-13 for ged-emacs-devel@m.gmane.org; Mon, 22 Sep 2014 16:02:53 +0200 Original-Received: from localhost ([::1]:46802 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XW4Ca-0003yz-Lh for ged-emacs-devel@m.gmane.org; Mon, 22 Sep 2014 10:02:52 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37268) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XW4CR-0003vP-Ow for emacs-devel@gnu.org; Mon, 22 Sep 2014 10:02:50 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XW4CL-0007jW-9E for emacs-devel@gnu.org; Mon, 22 Sep 2014 10:02:43 -0400 Original-Received: from dnvrco-outbound-snat.email.rr.com ([107.14.73.226]:58635 helo=dnvrco-oedge-vip.email.rr.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XW4CL-0007j2-48 for emacs-devel@gnu.org; Mon, 22 Sep 2014 10:02:37 -0400 Original-Received: from [70.94.38.149] ([70.94.38.149:50499] helo=TAKVER) by dnvrco-oedge01 (envelope-from ) (ecelerity 3.5.0.35861 r(Momo-dev:tip)) with ESMTP id 16/28-18326-6FB20245; Mon, 22 Sep 2014 14:02:30 +0000 In-Reply-To: <541F4AD0.3060109@dancol.org> (Daniel Colascione's message of "Sun, 21 Sep 2014 15:01:52 -0700") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (windows-nt) X-RR-Connecting-IP: 107.14.64.118:25 X-Authority-Analysis: v=2.1 cv=btPCBSqi c=1 sm=1 tr=0 a=AppmJ/7ZOOFWL/q6u6u93g==:117 a=AppmJ/7ZOOFWL/q6u6u93g==:17 a=ayC55rCoAAAA:8 a=9XSUBuVRJI8A:10 a=o_R75loqY_IA:10 a=9i_RQKNPAAAA:8 a=KxCsr730AAAA:8 a=rPc5LbpOMhw_17ECiEAA:9 a=lEV88E3HW2IA:10 X-Cloudmark-Score: 0 X-detected-operating-system: by eggs.gnu.org: BaiduSpider X-Received-From: 107.14.73.226 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:174645 Archived-At: Daniel Colascione writes: > Wagner's thesis contains a description of a few alternative incremental > GLR algorithms that look very promising. Thanks for the pointer; I really need better error handling in the Ada mode parser. It currently defaults to a _really_ dumb indentation algorithm (same as previous line) when the parse fails. > We can start parsing at some region we quickly determine is semantically > equivalent to the start of the buffer using something like > font-lock-beginning-of-syntax-function. For Ada, that's only start of buffer; the standard convention is only one compilation-unit (the single grammar start symbol) per file. > Here, we can reliably partially parse the buffer by parsing > forward (remember, GLR parsing is an online algorithm) until the > parsing limit is after the visible portion of the buffer; we can then > use a "shortest-to-completion" production (which we can pre-compute > from the grammar) to insert virtual tokens that complete the language. I had the same idea; I was hoping it was new and I could make money :). Is that in Wagner's thesis (I didn't see it on a quick scan), or written up somewhere else? > One really nice > property of GLR parsing (and LR parsers generally) is that we can > trivially pause and restart them, so we can parse in the background with > very little impact on user activity, using `while-no-input' to back out > to the event loop when we detect we have something more important to > do. That's what I need! Although a really fast GLR parser implemented in compiled Ada that accesses the buffer via an Emacs FFI would be better. > Error recovery is a challenge --- we want to do something useful on > buffers that are temporarily invalided by user edits. One of the > biggest challenges is dealing with mismatched brace constructs, since > one missing "}" in C-mode can cause the entire rest of the buffer to no > longer conform to the grammar. I'm hoping to use the same "shortest-to-completion" algorithm for this. It may distort the rest of the buffer, but it will be better than what it does now. > Bridge parsing is particularly interesting here: the basic idea is that > we perform one pass over the buffer using a simplified grammar that > looks only at nesting constructs, insert virtual braces to make > everything balance, and then apply the usual parsing algorithm (with > ad-hoc error recovery) to the result. I haven't tried it, but it > apparently works pretty well. That could work for Ada as well. > I like what SMIE does, but I don't think it's powerful enough to parse > C++. I wrote a SMIE parser for Ada, but it ended up always parsing the whole buffer. So I switched to LALR. Then to generalized so I would not have to mangle the grammar in the Ada standard as much. > GLR parsers also leave us with a parse forest instead of a parse tree, > and we really need a parse *tree* if we're going to apply fontification > rules specified in terms of AST nodes. We need to prune the parse > forest down to a parse tree before using it for fontification. In Ada mode, I just assume the results of parallel parses that end in the same state are equivalent for the purposes of Ada mode (fontification and navigation). So I just throw away all but one result. I haven't really investigated whether this is true. > Consider the most vexing parse: > > type variable(function()) One reason I like Ada; the equivalent declarations are not ambiguous. -- -- Stephe