From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Stephen Leake <stephen_leake@member.fsf.org>
Newsgroups: gmane.emacs.devel
Subject: Re: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 14:55:39 -0400
Message-ID: <85lifjfn10.fsf@member.fsf.org>
References: <85pq4wgrho.fsf@member.fsf.org>
	<jwvipan23ok.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1349550049 19835 80.91.229.3 (6 Oct 2012 19:00:49 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 6 Oct 2012 19:00:49 +0000 (UTC)
Cc: Stephen Leake <stephen_leake@member.fsf.org>,
	emacs-devel <emacs-devel@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Oct 06 21:00:55 2012
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1TKZcL-0005Fh-S2
	for ged-emacs-devel@m.gmane.org; Sat, 06 Oct 2012 21:00:54 +0200
Original-Received: from localhost ([::1]:48919 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1TKZcG-0007MF-2L
	for ged-emacs-devel@m.gmane.org; Sat, 06 Oct 2012 15:00:48 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:44187)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TKZcD-0007M1-22
	for emacs-devel@gnu.org; Sat, 06 Oct 2012 15:00:46 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TKZcB-0007dW-Hj
	for emacs-devel@gnu.org; Sat, 06 Oct 2012 15:00:44 -0400
Original-Received: from qmta07.westchester.pa.mail.comcast.net ([76.96.62.64]:43237)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TKZcB-0007dA-Cb
	for emacs-devel@gnu.org; Sat, 06 Oct 2012 15:00:43 -0400
Original-Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta07.westchester.pa.mail.comcast.net with comcast
	id 7uma1k0011uE5Es57v0nj2; Sat, 06 Oct 2012 19:00:47 +0000
Original-Received: from TAKVER ([69.140.67.196])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id 7uvz1k00A4E4Fsd3cuvzZR; Sat, 06 Oct 2012 18:55:59 +0000
In-Reply-To: <jwvipan23ok.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message
	of "Sat, 06 Oct 2012 08:37:40 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 76.96.62.64
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:154136
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/154136>

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Your problem is one I also bumped into for the Modula-2 and Pascal modes
> (can't remember how I "solved" them nor to what extent the "solution"
> works).

Ok.

>> When all the tokens are properly disambiguated, SMIE can traverse
>> correctly from package "begin" to "package". But we can't do that while
>> disambiguating "begin"; that's circular.
>
> Actually, in some cases, it can be made to work: to disambiguate "begin"
> setup a loop that calls smie-backward-sexp repeatedly (starting from the
> position just before the "begin", of course) checking after each call
> whether the result lets us decide which of the two begins we're
> dealing with.

In the Ada use case I presented ("package ... begin"), that loop ends up
going all the way back to "package" at the beginning of the buffer,
encountering more "begin" tokens on the way; it's much more efficient to
start there in the first place.
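Below is a rough illustration of that backward loop. It is written in Python rather than Emacs Lisp and runs over a toy token list rather than SMIE's real API. The names `backward_sexp` and `classify_begin` are hypothetical, and so are the opener set and the two "begin" classes. The step function treats "begin" as neutral and jumps from each "end" back to its matching opener. As a result, nested bodies are skipped without having to classify their own "begin"s first, which avoids the circularity.

```python
# Toy model only: tokens are plain strings, and every opener below is
# assumed to be closed by exactly one "end".
OPENERS = {"package", "procedure", "declare"}


def backward_sexp(tokens, pos):
    """Step back over one sexp ending just before index pos.

    Returns (new_pos, skipped_block). A plain token is a one-token step.
    An "end" jumps back to its matching opener, so everything nested
    inside it, including inner "begin" tokens, is skipped unexamined.
    """
    pos -= 1
    if tokens[pos] != "end":
        return pos, False
    depth = 0
    while pos >= 0:
        tok = tokens[pos]
        if tok == "end":
            depth += 1
        elif tok in OPENERS:
            depth -= 1
            if depth == 0:
                return pos, True
        pos -= 1
    raise ValueError("unbalanced 'end'")


def classify_begin(tokens, i):
    """Decide which kind of "begin" tokens[i] is by looping backward."""
    assert tokens[i] == "begin"
    pos = i
    while pos > 0:
        pos, skipped = backward_sexp(tokens, pos)
        if skipped:
            continue  # a whole nested unit; it cannot decide for us
        if tokens[pos] == "package":
            return "begin-package"
        if tokens[pos] in ("procedure", "declare"):
            return "begin-body"
    return "begin-unknown"


src = "package body P is procedure Q is begin null ; end Q ; begin null ; end P ;"
tokens = src.split()
print(classify_begin(tokens, 7))   # begin-body (owned by procedure Q)
print(classify_begin(tokens, 13))  # begin-package (owned by package body P)
```

For the second "begin", the loop first skips all of procedure Q and then keeps stepping until it reaches "package". In a package body holding many subprograms, it walks back over every one of them, which is the cost described above.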

>> It might also make sense to incorporate the refined keyword cache
>> mechanism into smie.
>
> Right, if we want to make the stack visible, then we also need to
> implement the cache.

Ok. However, different languages may want to cache different things, so
I'm not sure that would really be common code.

> Note that such a "forward full-parse with cache" approach has several
> downsides:
> - potential performance impact on long buffers.

I haven't timed this on truly large files yet; I'll ask for examples.

The cache could be improved by storing the parser stack with each token
as well; then, when you need to do a full forward parse, you can start
at the previous valid token rather than at the start of the buffer.
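Here is a minimal sketch of that improved cache. It is in Python, over the same kind of hypothetical toy token list, and assumes every opener is closed by one "end". `ParseCache` and its methods are illustrative names only. The `invalidate` method merely stands in for whatever a command like `ada-indent-invalidate-cache' actually does. Each token stores a snapshot of the parser stack, so a forward parse resumes from the last cached token. During a forward parse the stack top says directly which unit owns a "begin".

```python
OPENERS = {"package", "procedure", "declare"}


class ParseCache:
    """Hypothetical forward-parse cache: one parser-stack snapshot per token.

    stacks[k] is the stack after tokens[k], so a forward parse can resume
    from the last cached token instead of the start of the buffer.
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.stacks = []   # tuple snapshot of the stack after each token
        self.classes = []  # refined class of each token
        self.parsed = 0    # tokens actually parsed, to show the savings

    def _ensure(self, k):
        """Parse forward until token k is cached, resuming from the last entry."""
        stack = list(self.stacks[-1]) if self.stacks else []
        for j in range(len(self.stacks), k + 1):
            tok = self.tokens[j]
            cls = tok
            if tok in OPENERS:
                stack.append(tok)
            elif tok == "begin":
                # The innermost open unit owns this "begin".
                cls = "begin-package" if stack and stack[-1] == "package" else "begin-body"
            elif tok == "end" and stack:
                stack.pop()
            self.stacks.append(tuple(stack))
            self.classes.append(cls)
            self.parsed += 1

    def refine(self, k):
        self._ensure(k)
        return self.classes[k]

    def invalidate(self, k, tokens):
        """After an edit at token k: take the new token list, drop entries from k on."""
        self.tokens = tokens
        del self.stacks[k:]
        del self.classes[k:]


tokens = "package body P is procedure Q is begin null ; end Q ; begin null ; end P ;".split()
cache = ParseCache(tokens)
print(cache.refine(13), cache.parsed)  # begin-package 14
cache.invalidate(10, tokens)           # pretend token 10 was edited
print(cache.refine(13), cache.parsed)  # begin-package 18 (only 4 re-parsed)
```

The trade-off is memory: one stack snapshot per token. In exchange, an edit near the end of a long buffer only forces a re-parse from the edit point.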

> - risk of the cache going out of sync.

Yes. I've already got an interactive command `ada-indent-invalidate-cache'
to handle that, but the user will have to know when to run it. So far, the
cache has not gotten out of sync, but I haven't used this to write real
code yet.

> - parse errors far away in an unrelated (earlier) part of the buffer
>   can prevent proper local indentation.  Parse errors can occur for lots
>   of reasons (e.g. temporarily incorrect code, incomplete parser, or
>   use of "exotic" language extension, use of a preprocessor, ...).

Yes; those are all reasons to stick with local parsing whenever
possible.

Hmm. "begin" occurs a lot in Ada code (every function body), so I can't
really claim that I rely on the forward full-parse only rarely; I need it
for every "begin".

> That doesn't mean it's not workable, but those downsides had better come
> with some significant upside.  One significant upside is that by only
> parsing forward we can use any other parsing technology, such as LALR,
> GLR, PEG, ...

A much more significant upside: it supports the language I need (Ada)!

Switching to another parsing technology just for the forward full-parse
means supporting a second grammar; that's a lot of work. It might be
simpler to switch to only forward full-parse (which I think is what you
are suggesting).

If I switch to only using forward full-parse, I'd have to do things
differently in the indentation rules as well. The key information they
need is the location of the statement start for every indentation
point. So the forward parse could store the relevant statement start
(and maybe other stuff) with every token.
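Below is a minimal sketch of storing the statement start with every token in one forward pass. It is in Python, over a hypothetical toy token list. It assumes that ";", "is" and "begin" each end a statement; real Ada statement boundaries are more involved. `statement_starts` is an illustrative name, not part of smie or ada-indent.

```python
# Toy rule (an assumption): ";", "is" and "begin" each end a statement,
# so the next token starts a new one.
STATEMENT_ENDERS = {";", "is", "begin"}


def statement_starts(tokens):
    """Return starts, where starts[k] is the index of the first token of
    the statement containing tokens[k], computed in one forward pass."""
    starts = []
    current = 0
    for k, tok in enumerate(tokens):
        starts.append(current)
        if tok in STATEMENT_ENDERS:
            current = k + 1  # the following token begins a new statement
    return starts


tokens = "procedure Q is begin X := F ( A , B ) ; end Q ;".split()
starts = statement_starts(tokens)
# An indentation rule for, say, "B" can look up its statement start directly.
print(tokens[starts[tokens.index("B")]])  # X
```

With this table, an indentation rule for any token finds its statement start with a single lookup instead of a backward scan.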

I could look at semantic and see where that gets me. That would have
all the downsides listed here, without the benefit of being able to use
local parsing when it works. I've started on that several times and given
up, because I had working examples of indentation engines that use smie,
or I figured out how to get smie to do what I need. But I guess it is
worth giving it a more serious try. I should not stick with smie just
because I haven't learned semantic (how many times have I said that to
someone else? - sigh).

One upside of the semantic approach would be avoiding writing all the
token refining code; most is very simple, but some is pretty hairy, and
it can only get worse as I implement more of the Ada language.

> I.e. I think it's an interesting direction but it would be another package.

Hmm. The actual change to smie I'm asking for is very small (if we leave
out the cache); perhaps you could put that in, with a large "use at your own
risk" comment?

Otherwise, I can reimplement smie-next-sexp in ada-indent; there
have been times when I thought that would be a good idea anyway (it's
more complicated than I need). Then I would only be using smie for the
grammar generating code.

But I'll give semantic a serious try first; "today is a good day to
learn something new" :). 

-- 
-- Stephe