From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 14:55:39 -0400
Message-ID: <85lifjfn10.fsf@member.fsf.org>
References: <85pq4wgrho.fsf@member.fsf.org>
To: Stefan Monnier
Cc: Stephen Leake, emacs-devel
In-Reply-To: (Stefan Monnier's message of "Sat, 06 Oct 2012 08:37:40 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)

Stefan Monnier writes:

> Your problem is one I also bumped into for the Modula-2 and Pascal modes
> (can't remember how I "solved" them nor to what extent the "solution"
> works).

Ok.

>> When all the tokens are properly disambiguated, SMIE can traverse
>> correctly from package "begin" to "package". But we can't do that while
>> disambiguating "begin"; that's circular.
>
> Actually, in some cases, it can be made to work: to disambiguate "begin"
> setup a loop that calls smie-backward-sexp repeatedly (starting from the
> position just before the "begin", of course) checking after each call
> whether the result lets us decide which of the two begins we're
> dealing with.

In the Ada use case I presented (package ... begin), that ends up going
all the way back to package at the beginning of the buffer, encountering
more "begin" tokens on the way; it's much more efficient to start there
in the first place.

>> It might also make sense to incorporate the refined keyword cache
>> mechanism into smie.
>
> Right, if we want to make the stack visible, then we also need to
> implement the cache.

Ok. Although different languages may want to cache different things, so
I'm not sure that would really be common code.
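
A minimal sketch of the "forward parse plus refined keyword cache" idea
discussed above, written in Python as a language-neutral model. It is
not SMIE's API and not the ada-indent code. The toy rules are
assumptions: every "package", "procedure" or "function" opens a
construct that a later "end" closes, and a "begin" is refined according
to the construct on top of the stack. The names `RefinedTokenCache`,
`OPENERS`, `refine` and `invalidate_from` are hypothetical. Each cache
entry keeps a snapshot of the stack, so a re-parse can resume after the
last valid entry rather than at the start of the buffer.

```python
from dataclasses import dataclass, field

# Hypothetical toy rules: keywords that open a construct closed by "end".
OPENERS = {"package": "package", "procedure": "subprogram", "function": "subprogram"}


@dataclass
class RefinedTokenCache:
    """Forward parse that caches (refined token, stack snapshot) per token."""

    entries: list = field(default_factory=list)  # one entry per parsed token
    steps: int = 0  # number of tokens actually parsed, to show cache reuse

    def invalidate_from(self, index):
        """Drop cached results at and after INDEX, e.g. after an edit there."""
        del self.entries[index:]

    def refine(self, tokens):
        """Return the refined tokens, resuming after the last cached entry.

        The caller must call invalidate_from() first if TOKENS changed.
        """
        start = len(self.entries)
        # Restore the stack as it was after the last still-valid token.
        stack = list(self.entries[-1][1]) if self.entries else []
        for tok in tokens[start:]:
            self.steps += 1
            refined = tok
            if tok in OPENERS:
                stack.append(OPENERS[tok])
            elif tok == "begin":
                # The enclosing construct decides which "begin" this is.
                refined = "begin-" + (stack[-1] if stack else "unknown")
            elif tok == "end" and stack:
                stack.pop()
            self.entries.append((refined, tuple(stack)))
        return [entry[0] for entry in self.entries]


tokens = "package body P is procedure Q is begin null ; end ; begin null ; end ;".split()
cache = RefinedTokenCache()
print(cache.refine(tokens))

# Simulate an edit at token 13: only the tokens from there on are re-parsed.
tokens[13] = "Init"
cache.invalidate_from(13)
print(cache.refine(tokens), cache.steps)
```

What goes into each entry is the language-specific part; here it is only
the stack, but it could equally hold whatever the indentation rules
need.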
> Note that such a "forward full-parse with cache" approach has several
> downsides:
> - potential performance impact on long buffers.

I have not timed this on truly large code yet; I'll ask for examples.

The cache could be improved by storing the stack with each token as
well; then when you need to do a full parse forward, you can start at
the previous valid token, not at the start of the buffer.

> - risk of the cache going out of sync.

Yes. I've already got an interactive `ada-indent-invalidate-cache' to
handle that, but the user will have to be aware. So far, the cache has
not gotten out of sync, but I haven't used this to write real code yet.

> - parse errors far away in an unrelated (earlier) part of the buffer
>   can prevent proper local indentation. Parse errors can occur for lots
>   of reasons (e.g. temporarily incorrect code, incomplete parser, or
>   use of "exotic" language extension, use of a preprocessor, ...).

Yes; those are all reasons to stick with local parsing whenever
possible.

Hmm. "begin" occurs a lot in Ada code (every function body), so I guess
I can't really claim I'm not relying on forward full-parse much, since I
need it for every "begin".

> That doesn't mean it's not workable, but those downsides had better come
> with some significant upside. One significant upside is that by only
> parsing forward we can use any other parsing technology, such as LALR,
> GLR, PEG, ...

A much more significant upside: it supports the language I need (Ada)!

Switching to another parsing technology just for the forward full-parse
means supporting a second grammar; that's a lot of work. It might be
simpler to switch to only forward full-parse (which I think is what you
are suggesting).

If I switch to only using forward full-parse, I'd have to do things
differently in the indentation rules as well. The key information they
need is the location of the statement start for every indentation point.
So the forward parse could store the relevant statement start (and maybe
other stuff) with every token.

I could look at semantic, and see where that gets me. That would have all
the downsides listed here, without the benefit of being able to use local
parsing when it works. I've started that several times, and given up
because I had working examples of indentation engines that use smie, or I
figured out how to get smie to do what I need. But I guess it will be
worth giving it a more serious try. I should not stick with smie just
because I haven't learned semantic (how many times have I said that to
someone else? - sigh).

One upside of the semantic approach would be avoiding writing all the
token refining code; most is very simple, but some is pretty hairy, and
it can only get worse as I implement more of the Ada language.

> I.e. I think it's an interesting direction but it would be another package.

Hmm. The actual change to smie I'm asking for is very small (if we leave
out the cache); perhaps you could put that in, with a large "use at your
own risk" comment?

Otherwise, I can reimplement smie-next-sexp in ada-indent; there have
been times when I thought that would be a good idea anyway (it's more
complicated than I need). Then I would only be using smie for the grammar
generating code.

But I'll give semantic a serious try first; "today is a good day to learn
something new" :).

--
-- Stephe