Re: access to parser stack in SMIE

From: Stephen Leake <stephen_leake@member.fsf.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Stephen Leake <stephen_leake@member.fsf.org>,
	emacs-devel <emacs-devel@gnu.org>
Subject: Re: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 14:55:39 -0400
Message-ID: <85lifjfn10.fsf@member.fsf.org>
In-Reply-To: <jwvipan23ok.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Sat, 06 Oct 2012 08:37:40 -0400")

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Your problem is one I also bumped into for the Modula-2 and Pascal modes
> (can't remember how I "solved" them nor to what extent the "solution"
> works).

Ok.

>> When all the tokens are properly disambiguated, SMIE can traverse
>> correctly from package "begin" to "package". But we can't do that while
>> disambiguating "begin"; that's circular.
>
> Actually, in some cases, it can be made to work: to disambiguate "begin"
> set up a loop that calls smie-backward-sexp repeatedly (starting from the
> position just before the "begin", of course) checking after each call
> whether the result lets us decide which of the two begins we're
> dealing with.

In the Ada use case I presented (package ... begin), that ends up going
all the way back to package at the beginning of the buffer, encountering
more "begin" tokens on the way; it's much more efficient to start there
in the first place.

>> It might also make sense to incorporate the refined keyword cache
>> mechanism into smie.
>
> Right, if we want to make the stack visible, then we also need to
> implement the cache.

Ok. Although different languages may want to cache different things, so
I'm not sure that would really be common code.

> Note that such a "forward full-parse with cache" approach has several
> downsides:
> - potential performance impact on long buffers.

I have not timed this on truly large code yet; I'll ask for examples.

The cache could be improved, by storing the stack with each token as
well; then when you need to do a full parse forward, you can start at
the previous valid token, not at the start of the buffer.
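
A minimal sketch of that idea, in Python rather than Emacs Lisp so it
stays self-contained. Everything in it is hypothetical: the `StackCache`
class, the toy token set, and the push/pop rules are stand-ins for
illustration, not SMIE or ada-indent code. It keeps one stack snapshot
per token, so a later query resumes from the last valid snapshot
instead of the start of the buffer.

```python
# Toy openers only; a real Ada grammar is far richer than this.
OPENERS = {"package", "procedure", "begin"}


class StackCache:
    """Caches the parser stack as it was just before each token."""

    def __init__(self):
        # snapshots[i] is the stack just before tokens[i].
        self.snapshots = []

    def invalidate_from(self, index):
        """Call when tokens[index] (or anything after it) has changed.

        The stack *before* tokens[index] is still valid, so keep it.
        """
        del self.snapshots[index + 1:]

    def stack_before(self, tokens, target):
        """Return (stack before tokens[target], tokens parsed in this call)."""
        if target < len(self.snapshots):
            return list(self.snapshots[target]), 0  # pure cache hit

        if self.snapshots:
            # Resume at the last token that still has a valid snapshot.
            start = len(self.snapshots) - 1
            stack = list(self.snapshots[start])
        else:
            start, stack = 0, []
            self.snapshots.append(())  # empty stack before token 0

        steps = 0
        for i in range(start, target):
            tok = tokens[i]
            if tok in OPENERS:
                stack.append(tok)
            elif tok == "end" and stack:
                stack.pop()
            self.snapshots.append(tuple(stack))  # stack before token i + 1
            steps += 1
        return stack, steps
```

With tokens `["package", "begin", "end", "begin", "x", "end"]`, the first
query for index 5 parses five tokens; after `invalidate_from(3)` the same
query only re-parses the two tokens from index 3 onward.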

> - risk of the cache going out of sync.

Yes. I've already got an interactive `ada-indent-invalidate-cache' to
handle that, but the user will have to be aware that the cache can go
stale. So far, the cache has not gotten out of sync, but I haven't used
this to write real code yet.

> - parse errors far away in an unrelated (earlier) part of the buffer
>   can prevent proper local indentation.  Parse errors can occur for lots
>   of reasons (e.g. temporarily incorrect code, incomplete parser, or
>   use of "exotic" language extension, use of a preprocessor, ...).

Yes; those are all reasons to stick with local parsing whenever
possible.

Hmm. "begin" occurs a lot in Ada code (every function body), so I
guess I can't really claim I'm not relying on forward full-parse much,
since I need it for every "begin".

> That doesn't mean it's not workable, but those downsides had better come
> with some significant upside.  One significant upside is that by only
> parsing forward we can use any other parsing technology, such as LALR,
> GLR, PEG, ...

A much more significant upside: it supports the language I need (Ada)!

Switching to another parsing technology just for the forward full-parse
means supporting a second grammar; that's a lot of work. It might be
simpler to switch to only forward full-parse (which I think is what you
are suggesting).

If I switch to only using forward full-parse, I'd have to do things
differently in the indentation rules as well. The key information they
need is the location of the statement start for every indentation
point. So the forward parse could store the relevant statement start
(and maybe other stuff) with every token.
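
As a rough illustration only (Python, with a hypothetical
`statement_starts` helper and the simplifying assumption that every
statement is flat and ends at ";", which real Ada does not satisfy), a
single forward pass can tag each token with the index of the first token
of its statement:

```python
def statement_starts(tokens):
    """For each token, return the index of the first token of its statement.

    Assumes flat statements terminated by ";" (no nesting); this is a toy
    model, not an Ada parser.
    """
    starts = []
    current = None  # index where the statement in progress began
    for i, tok in enumerate(tokens):
        if current is None:
            current = i  # first token after a ";" opens a new statement
        starts.append(current)
        if tok == ";":
            current = None
    return starts
```

An indentation rule could then look up the statement start for the token
at point directly, instead of scanning backward to find it.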

I could look at semantic, and see where that gets me. That would have
all the downsides listed here, without the benefit of being able to use
local parsing when it works. I've started that several times, and given
up because I had working examples of indentation engines that use smie,
or I figured out how to get smie to do what I need. But I guess it will
be worth giving it a more serious try. I should not stick with smie just
because I haven't learned semantic (how many times have I said that to
someone else? - sigh).

One upside of the semantic approach would be avoiding writing all the
token refining code; most is very simple, but some is pretty hairy, and
it can only get worse as I implement more of the Ada language.

> I.e. I think it's an interesting direction but it would be another package.

Hmm. The actual change to smie I'm asking for is very small (if we leave
out the cache); perhaps you could put that in, with a large "use at your own
risk" comment?

Otherwise, I can reimplement smie-next-sexp in ada-indent; there
have been times when I thought that would be a good idea anyway (it's
more complicated than I need). Then I would only be using smie for the
grammar-generating code.

But I'll give semantic a serious try first; "today is a good day to
learn something new" :). 

-- 
-- Stephe
