Re: access to parser stack in SMIE


From: Stephen Leake <stephen_leake@member.fsf.org>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Cc: Stephen Leake <stephen_leake@member.fsf.org>,
	emacs-devel <emacs-devel@gnu.org>
Subject: Re: access to parser stack in SMIE
Date: Sun, 07 Oct 2012 19:18:07 -0400
Message-ID: <85wqz1dg7k.fsf@member.fsf.org>
In-Reply-To: <jwv8vbi2k38.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Sun, 07 Oct 2012 15:04:56 -0400")

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>>> set up a loop that calls smie-backward-sexp repeatedly (starting from the
>>> position just before the "begin", of course) checking after each call
>>> whether the result lets us decide which of the two begins we're
>>> dealing with.
>> In the Ada use case I presented (package ... begin), that ends up going
>> all the way back to package at the beginning of the buffer, encountering
>> more "begin" tokens on the way; it's much more efficient to start there
>> in the first place.
>
> In your example, we might indeed end up scanning the buffer 3 times
> (once for the normal scan, once to disambiguate package's `begin', and
> one more time (in various chunks) to disambiguate the `begin's of the
> nested functions).

Yes. Which is why the cache is critical. And improving the cache by
storing the stack at each keyword would be even better.
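
To make "storing the stack at each keyword" concrete, here is a minimal
sketch. Everything in it (the `my-stack-cache' table and both functions)
is hypothetical and not part of SMIE; the stack is modeled as a plain
list of token strings, and a real cache would also have to be
invalidated whenever the buffer changes before a cached position.

```elisp
;; Hypothetical sketch, not SMIE code: remember a snapshot of the
;; parser stack at each keyword position, so a later forward parse
;; can resume from the nearest snapshot instead of the buffer start.
(defvar my-stack-cache (make-hash-table :test 'eql)
  "Hypothetical cache mapping a buffer position to a stack snapshot.")

(defun my-stack-cache-put (pos stack)
  "Store a copy of STACK (a list of token strings) for position POS."
  (puthash pos (copy-sequence stack) my-stack-cache))

(defun my-stack-cache-nearest (pos)
  "Return (CACHED-POS . STACK) for the largest cached position <= POS.
Return nil if nothing is cached at or before POS."
  (let ((best nil))
    (maphash (lambda (p s)
               (when (and (<= p pos)
                          (or (null best) (> p (car best))))
                 (setq best (cons p s))))
             my-stack-cache)
    best))
```

A forward parse asked about position 30 would then call
(my-stack-cache-nearest 30) and restart from the returned position with
the returned stack.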

> But I wonder now: can a "begin" that comes right after a "function
> ... end;" be a begin-open?  

package Package_1 is 

    function Function_1 return Integer
    is begin
       return 1;
    end;

begin

That isn't "begin-open". But how can you be sure that's the case you
have? You can't just stop at the "end"; consider:

package Package_1 is 

    function Function_1 return Integer
    is begin
       begin
         ...
       end;

       begin -- begin-open
         ...
       end;

       return 1;
    end;

begin -- begin-body

If you scan backward, and get to "is" (refined to "is-subprogram_body"),
you can stop; the flavor of "is" gives you the flavor of "begin". But
that means crossing an un-refined "begin", possibly many. That's the
problem. You could recurse; if you hit a "begin", start another scan
backwards, and eventually you'll hit "is" and unwind. Maybe that would
work. But it's messier and less general than the forward parse
mechanism, and the other tokens that need forward parse would need their
own variations. Too much ad-hoc code.
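
For comparison, the forward approach can be modeled in a few lines. This
is only a sketch under simplifying assumptions: the input is a flat list
of token strings in which every token except "begin" has already been
refined, only two kinds of "is" are handled, and the names
`my-body-openers' and `my-refine-begins' are made up for illustration.
It is not how SMIE or ada-indent is implemented.

```elisp
;; Hypothetical sketch: refine each "begin" during one forward pass by
;; keeping a stack of the currently open constructs.
(defconst my-body-openers '("is-package" "is-subprogram_body")
  "Hypothetical list of tokens whose next \"begin\" starts a body.")

(defun my-refine-begins (tokens)
  "Return the refined name of each \"begin\" in TOKENS, in order.
TOKENS is a list of token strings; unknown tokens are ignored."
  (let ((stack nil)
        (refined nil))
    (dolist (tok tokens)
      (cond
       ;; A body opener waits on the stack for its "begin".
       ((member tok my-body-openers)
        (push tok stack))
       ((equal tok "begin")
        (if (member (car stack) my-body-openers)
            ;; The opener is consumed; the body is now the open construct.
            (progn (pop stack)
                   (push "begin-body" stack)
                   (push "begin-body" refined))
          (push "begin-open" stack)
          (push "begin-open" refined)))
       ;; "end" closes whatever construct is innermost.
       ((equal tok "end")
        (pop stack))))
    (nreverse refined)))
```

Fed the tokens of the second example above, each "begin" is classified
when it is reached, with no backward scan across un-refined tokens.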

>> Switching to another parsing technology just for the forward full-parse
>> means supporting a second grammar; that's a lot of work. It might be
>> simpler to switch to only forward full-parse (which I think is what you
>> are suggesting).
>
> Yes, if you end up doing a full forward parse in most cases anyway,
> there's little point doing extra work to support backward parsing.

At the moment, it's the other way around; there are only a few cases
that need full forward parse, and using smie for that works well enough.
So it's the LALR parser that's extra work.

>> If I switch to only using forward full-parse, I'd have to do things
>> differently in the indentation rules as well. The key information they
>> need is the location of the statement start for every indentation
>> point. So the forward parse could store the relevant statement start
>> (and maybe other stuff) with every token.
>
> Indeed.
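
A rough sketch of what "store the statement start with every token"
could look like. The token representation, the list of tokens that end
a statement, and the function name are all assumptions made for
illustration; nesting is ignored, and in a real mode the result would
more likely be stored in the buffer (for example as a text property)
than returned as a list.

```elisp
;; Hypothetical sketch: one forward pass that tags every token with
;; the position where its statement starts.
(defun my-annotate-statement-starts (tokens)
  "Return a list of (TOKEN POS START) for TOKENS.
TOKENS is a list of (TOKEN . POS) in buffer order.  A new statement is
assumed to start at the first token after \";\", \"is\" or \"begin\"."
  (let ((start nil)
        (result nil))
    (dolist (tok tokens)
      ;; The first token after a boundary opens a new statement.
      (unless start
        (setq start (cdr tok)))
      (push (list (car tok) (cdr tok) start) result)
      ;; A boundary token closes the current statement.
      (when (member (car tok) '(";" "is" "begin"))
        (setq start nil)))
    (nreverse result)))
```

An indentation rule could then look up the third element for the token
at point instead of scanning backward for the statement start.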

I've gotten a trivial semantic grammar implemented, but it's not
returning any tags. It's frustrating (as were my first few days with
SMIE, come to think of it :).

>> Hmm.  The actual change to smie I'm asking for is very small (if we
>> leave out the cache); perhaps you could put that in, with a large "use
>> at your own risk" comment?
>
> Indeed, just exposing the stack is not that bad, and lets you solve your
> problem.  But it's kind of ugly.  Maybe I could instead provide
> a function that lets you query the particular part of the stack that
> you're interested in (that would make it easier to adapt to a new
> format of the stack, for example).
>
> Could you describe which part of the stack you need to know?

The top few tokens, which might be most of the stack. Here's the code
that examines the stack:

(defconst ada-indent-pre-begin-tokens
  '("declare-open"
    "declare-label"
    "is-subprogram_body"
    "is-package"
    "is-task_body"
    "is-entry_body"))

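	;; Parsing from beginning of buffer; examine stack
	(let ((stack smie--levels)
	      stack-token
	      (token nil))
	  (while (null token)
	    ;; Map the next stack entry back to its token via the grammar.
	    (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
	    (cond
	     ;; Skip the ";" of each intervening declaration.
	     ((equal stack-token ";") nil)
	     ;; An enclosing "is-*" or "declare-*" makes this a body "begin".
	     ((member stack-token ada-indent-pre-begin-tokens)
	      (setq token "begin-body"))
	     ;; Any other token means a plain block "begin".
	     (t
	      (setq token "begin-open"))))

	  ;; Return the refined token to whichever catch is waiting for it.
	  (if (>= (point) ada-indent-refine-forward-to)
	      (throw 'ada-indent-refine-all-quit token)
	    (throw 'local-quit token)))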

In this case:

    function Function_1 return Integer
    is 
       Local_1 : integer;
       Local_2 : integer;
    begin

we see ";" from the local variables, then "is-subprogram_body", and we
stop.

For a package-level "begin", we see a ";" for each intervening
declaration, then "is-package", and stop. 

I guess you could provide a function that scans the stack, skipping ";",
and returning the next token. That would work in this particular case,
but I'm not sure about the other cases I need.
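
Such a query function might look like the sketch below. It is
hypothetical, not an existing SMIE function: the stack is modeled as a
list of token strings (innermost first) rather than the levels that
smie--levels holds, and `my-pre-begin-tokens' just repeats the list
above so the example is self-contained.

```elisp
;; Hypothetical sketch of a stack query that hides the stack format.
(defconst my-pre-begin-tokens
  '("declare-open" "declare-label" "is-subprogram_body"
    "is-package" "is-task_body" "is-entry_body")
  "Copy of `ada-indent-pre-begin-tokens', repeated for this example.")

(defun my-stack-next-token (stack &optional skip)
  "Return the first token in STACK that is not in SKIP, or nil.
STACK is a list of token strings, innermost first.  SKIP defaults to
a list containing only \";\"."
  (let ((skip (or skip '(";"))))
    (while (and stack (member (car stack) skip))
      (setq stack (cdr stack)))
    (car stack)))

(defun my-classify-begin (stack)
  "Return \"begin-body\" or \"begin-open\" for a \"begin\" seen with STACK."
  (if (member (my-stack-next-token stack) my-pre-begin-tokens)
      "begin-body"
    "begin-open"))
```

With the Function_1 example above, the stack at "begin" would read
(";" ";" "is-subprogram_body" ...), and my-classify-begin returns
"begin-body".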

-- 
-- Stephe
