access to parser stack in SMIE

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* access to parser stack in SMIE
@ 2012-10-06  4:21 Stephen Leake
  2012-10-06 12:37 ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2012-10-06  4:21 UTC (permalink / raw)
  To: emacs-devel

I hit a major snag in the new Emacs Ada mode indentation engine, that I
resolved by allowing access to the stack SMIE uses in smie-next-sexp.
See the patch below; it was made from smie.el in Emacs 24.2.1. It
replaces the local variable `levels' (which contains the stack) with a
global variable `smie--levels', so ada-indent-forward-token can see the
stack contents.

The problem I had is that it is not possible to refine the Ada "begin"
keyword using only local information. "begin" is used in two ways: as
the _start_ of a block, and as the _divider_ between declarations and
statements in a block:

function F1 is
  <declarations>
begin -- divider (refined to "begin-body")
  <statements>
  begin -- block start (refined to "begin-open")
    <statements>
  end;
end;

These two uses must be different tokens in SMIE; "begin-open" is an
opener, while "begin-body" is neither opener nor closer.

The only way to figure out which role "begin" is playing is to parse
from the start of the compilation unit. Consider a package body:

package body Pack_1 is

  <declarations>

  function F1 is
    <declarations>
  begin -- divider
    <statements>
    begin -- block start
      <statements>
    end;
    <statements>
    begin -- block start
      <statements>
    end;
  end;

  begin -- divider
    <statements>
  end;

Here I've deliberately got the indentation wrong at the end, to
emphasize the ambiguity.

If we just look back or forward a few keywords from each "begin", we
can't tell which role it is playing. In particular, the package level
"begin" just looks like it follows a bunch of statements/declarations.
We must go all the way back to "package" to sort it out.

When all the tokens are properly disambiguated, SMIE can traverse
correctly from package "begin" to "package". But we can't do that while
disambiguating "begin"; that's circular.

The solution I found is to deliberately call `smie-forward-sexp'
starting at the beginning of the buffer. Then in
ada-indent-forward-token, when a "begin" is encountered, examine the
parser stack. We only have to look at a few tokens on the stack to
determine whether we've got a "divider" use or a "start" use.

Of course, parsing forward from the start of the buffer for each "begin"
is painfully slow (I tried that), so we also need a cache. I store the
refined keyword as a text property on each keyword;
ada-indent-forward-token and ada-indent-backward-token check for that
text property before calling the refine function.

The cache is invalidated when editing occurs before the keyword that has
cached information; this is handled by storing the maximum valid cache
position in `ada-indent-cache-max'; it is moved forward when a keyword
is refined, and backward by `ada-indent-after-change' when the buffer is
changed.

I use the cache for all keywords, since the mechanism is simple, and it
does provide some speed-up (some of the ada-indent-refine-* functions
are quite lengthy).

The full code for my current version is at
http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0
That includes the patched version of smie.el. At the moment, only the
function ada-indent-refine-begin uses the "parse from the beginning"
approach, but there are other keywords that need it; I plan to factor
that out soon.

I doubt that we really want to make the parser stack directly accessible
to smie clients; a copy would be safer. I implemented it this way
because it was the smallest change that let me test the idea. On the
other hand, there may be times when we'd like to add a token to the
stack, to handle broken source code, for example.

Also, the name of the global variable could be better; perhaps
`smie-parser-stack'.

It might also make sense to incorporate the refined keyword cache
mechanism into smie.

-- 
-- Stephe

--- smie.el
+++ smie.el
@@ -672,6 +672,9 @@ it should move backward to the beginning
   ;; has to be careful to distinguish those different cases.
   (eq (smie-op-left toklevels) (smie-op-right toklevels)))

+(defvar smie--levels nil
+  "Parser token stack lambda-bound in smie-next-sexp; `next-token' can examine stack to help refine.")
+
 (defun smie-next-sexp (next-token next-sexp op-forw op-back halfsexp)
   "Skip over one sexp.
 NEXT-TOKEN is a function of no argument that moves forward by one
@@ -691,7 +694,7 @@ Possible return values:
   (nil POS TOKEN): we skipped over a paren-like pair.
   nil: we skipped over an identifier, matched parentheses, ..."
   (catch 'return
-    (let ((levels
+    (let ((smie--levels
            (if (stringp halfsexp)
                (prog1 (list (cdr (assoc halfsexp smie-grammar)))
                  (setq halfsexp nil)))))
@@ -718,32 +721,32 @@ Possible return values:
               ;; A token like a paren-close.
               (assert (numberp     ; Otherwise, why mention it in smie-grammar.
                        (funcall op-forw toklevels)))
-              (push toklevels levels))
+              (push toklevels smie--levels))
              (t
-              (while (and levels (< (funcall op-back toklevels)
-                                    (funcall op-forw (car levels))))
-                (setq levels (cdr levels)))
+              (while (and smie--levels (< (funcall op-back toklevels)
+                                    (funcall op-forw (car smie--levels))))
+                (setq smie--levels (cdr smie--levels)))
               (cond
-               ((null levels)
+               ((null smie--levels)
                 (if (and halfsexp (numberp (funcall op-forw toklevels)))
-                    (push toklevels levels)
+                    (push toklevels smie--levels)
                   (throw 'return
                          (prog1 (list (or (car toklevels) t) (point) token)
                            (goto-char pos)))))
                (t
-                (let ((lastlevels levels))
-                  (if (and levels (= (funcall op-back toklevels)
-                                     (funcall op-forw (car levels))))
-                      (setq levels (cdr levels)))
+                (let ((lastlevels smie--levels))
+                  (if (and smie--levels (= (funcall op-back toklevels)
+                                     (funcall op-forw (car smie--levels))))
+                      (setq smie--levels (cdr smie--levels)))
                   ;; We may have found a match for the previously pending
                   ;; operator.  Is this the end?
                   (cond
                    ;; Keep looking as long as we haven't matched the
                    ;; topmost operator.
-                   (levels
+                   (smie--levels
                     (cond
                      ((numberp (funcall op-forw toklevels))
-                      (push toklevels levels))
+                      (push toklevels smie--levels))
                      ;; FIXME: For some languages, we can express the grammar
                      ;; OK, but next-sexp doesn't stop where we'd want it to.
                      ;; E.g. in SML, we'd want to stop right in front of
@@ -754,8 +757,8 @@ Possible return values:
                      ;;
                      ;; ((and (functionp (cadr (funcall op-forw toklevels)))
                      ;;       (funcall (cadr (funcall op-forw toklevels))
-                     ;;                levels))
-                     ;;  (setq levels nil))
+                     ;;                smie--levels))
+                     ;;  (setq smie--levels nil))
                      ))
                    ;; We matched the topmost operator.  If the new operator
                    ;; is the last in the corresponding BNF rule, we're done.
@@ -766,7 +769,7 @@ Possible return values:
                    ;; and is not associative, it's one of the inner operators
                    ;; (like the "in" in "let .. in .. end"), so keep looking.
                    ((not (smie--associative-p toklevels))
-                    (push toklevels levels))
+                    (push toklevels smie--levels))
                    ;; The new operator is associative.  Two cases:
                    ;; - it's really just an associative operator (like + or ;)
                    ;;   in which case we should have stopped right before.
@@ -778,8 +781,8 @@ Possible return values:
                    ;; - it's an associative operator within a larger construct
                    ;;   (e.g. an "elsif"), so we should just ignore it and keep
                    ;;   looking for the closing element.
-                   (t (setq levels lastlevels))))))))
-            levels)
+                   (t (setq smie--levels lastlevels))))))))
+            smie--levels)
         (setq halfsexp nil)))))

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-06  4:21 access to parser stack in SMIE Stephen Leake
@ 2012-10-06 12:37 ` Stefan Monnier
  2012-10-06 18:55   ` Stephen Leake
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-10-06 12:37 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

Your problem is one I also bumped into for the Modula-2 and Pascal modes
(can't remember how I "solved" them nor to what extent the "solution"
works).

> When all the tokens are properly disambiguated, SMIE can traverse
> correctly from package "begin" to "package". But we can't do that while
> disambiguating "begin"; that's circular.

Actually, in some cases, it can be made to work: to disambiguate "begin"
setup a loop that calls smie-backward-sexp repeatedly (starting from the
position just before the "begin", of course) checking after each call
whether the result lets us decide which of the two begins we're
dealing with.

Depending on the particular language's grammar this may not work because
"checking whether the result lets us decide" may not be possible.
But the recursion does terminate at least.

> The solution I found is to deliberately call `smie-forward-sexp'
> starting at the beginning of the buffer.

Right: the only way the parsing-stack can be used reliably is if we
always parse from the beginning.

> It might also make sense to incorporate the refined keyword cache
> mechanism into smie.

Right, if we want to make the stack visible, then we also need to
implement the cache.

Note that such a "forward full-parse with cache" approach has several
downsides:
- potential performance impact on long buffers.
- risk of the cache going out of sync.
- parse errors far away in an unrelated (earlier) part of the buffer
  can prevent proper local indentation.  Parse errors can occur for lots
  of reasons (e.g. temporarily incorrect code, incomplete parser, or
  use of "exotic" language extension, use of a preprocessor, ...).

That doesn't mean it's not workable, but those downsides had better come
with some significant upside.  One significant upside is that by only
parsing forward we can use any other parsing technology, such as LALR,
GLR, PEG, ...

I.e. I think it's an interesting direction but it would be another package.

        Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-06 12:37 ` Stefan Monnier
@ 2012-10-06 18:55   ` Stephen Leake
  2012-10-07 19:04     ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2012-10-06 18:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stephen Leake, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Your problem is one I also bumped into for the Modula-2 and Pascal modes
> (can't remember how I "solved" them nor to what extent the "solution"
> works).

Ok.

>> When all the tokens are properly disambiguated, SMIE can traverse
>> correctly from package "begin" to "package". But we can't do that while
>> disambiguating "begin"; that's circular.
>
> Actually, in some cases, it can be made to work: to disambiguate "begin"
> setup a loop that calls smie-backward-sexp repeatedly (starting from the
> position just before the "begin", of course) checking after each call
> whether the result lets us decide which of the two begins we're
> dealing with.

In the Ada use case I presented (package ... begin), that ends up going
all the way back to package at the beginning of the buffer, encountering
more "begin" tokens on the way; it's much more efficient to start there
in the first place.

>> It might also make sense to incorporate the refined keyword cache
>> mechanism into smie.
>
> Right, if we want to make the stack visible, then we also need to
> implement the cache.

Ok. Although different languages may want to cache different things, so
I'm not sure that would really be common code.

> Note that such a "forward full-parse with cache" approach has several
> downsides:
> - potential performance impact on long buffers.

I have not timed this on truly large code yet; I'll ask for examples.

The cache could be improved, by storing the stack with each token as
well; then when you need to do a full parse forward, you can start at
the previous valid token, not at the start of the buffer.

> - risk of the cache going out of sync.

Yes. I've already got an interactive `ada-indent-invalidate-cache' to
handle that, but the user will have to be aware. So far, the cache has
not gotten out of sync, but I haven't used this to write real code yet.

> - parse errors far away in an unrelated (earlier) part of the buffer
>   can prevent proper local indentation.  Parse errors can occur for lots
>   of reasons (e.g. temporarily incorrect code, incomplete parser, or
>   use of "exotic" language extension, use of a preprocessor, ...).

Yes; those are all reasons to stick with local parsing whenever
possible.

Hmm. "begin" occurs a lot in Ada code (every function body), so I
guess I can't really claim I'm not relying on forward full-parse much,
since I need it for every "begin".

> That doesn't mean it's not workable, but those downsides had better come
> with some significant upside.  One significant upside is that by only
> parsing forward we can use any other parsing technology, such as LALR,
> GLR, PEG, ...

A much more significant upside: it supports the language I need (Ada)!

Switching to another parsing technology just for the forward full-parse
means supporting a second grammar; that's a lot of work. It might be
simpler to switch to only forward full-parse (which I think is what you
are suggesting).

If I switch to only using forward full-parse, I'd have to do things
differently in the indentation rules as well. The key information they
need need is the location of the statement start for every indentation
point. So the forward parse could store the relevant statement start
(and maybe other stuff) with every token.

I could look at semantic, and see where that gets me. That would have
all the downsides listed here, without the benefit of being able to use
local parsing when it works. I've started that several times, and given
up because I had working examples of indentation engines that use smie,
or I figured out how to get smie to do what I need. But I guess it will
be worth giving it a more serious try. I should not stick with smie just
because I haven't learned semantic (how many times have I said that to
someone else? - sigh).

One upside of the semantic approach would be avoiding writing all the
token refining code; most is very simple, but some is pretty hairy, and
it can only get worse as I implement more of the Ada language.

> I.e. I think it's an interesting direction but it would be another package.

Hmm. The actual change to smie I'm asking for is very small (if we leave
out the cache); perhaps you could put that in, with a large "use at your own
risk" comment?

Otherwise, I can reimplement smie-next-sexp in ada-indent; there
have been times when I thought that would be a good idea anyway (it's
more complicated than I need). Then I would only be using smie for the
grammar generating code.

But I'll give semantic a serious try first; "today is a good day to
learn something new" :). 

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-06 18:55   ` Stephen Leake
@ 2012-10-07 19:04     ` Stefan Monnier
  2012-10-07 23:18       ` Stephen Leake
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-10-07 19:04 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>> setup a loop that calls smie-backward-sexp repeatedly (starting from the
>> position just before the "begin", of course) checking after each call
>> whether the result lets us decide which of the two begins we're
>> dealing with.
> In the Ada use case I presented (package ... begin), that ends up going
> all the way back to package at the beginning of the buffer, encountering
> more "begin" tokens on the way; it's much more efficient to start there
> in the first place.

In your example, we might indeed end up scanning the buffer 3 times
(once for the normal scan, once to disambiguate package's `begin', and
one more time (in various chunks) to disambiguate the `begin's of the
nested functions).

But I wonder now: can a "begin" that comes right after a "function
... end;" be a begin-open?  I don't think so (but my Ada is rusty and
outdated).  So you might really not need to scan that far back.

>> Right, if we want to make the stack visible, then we also need to
>> implement the cache.
> Ok. Although different languages may want to cache different things, so
> I'm not sure that would really be common code.

We'd make smie.el cache the whole stack state at various buffer
positions (just like we do for the syntax-ppss cache).

>> - risk of the cache going out of sync.
> Yes. I've already got an interactive `ada-indent-invalidate-cache' to
> handle that, but the user will have to be aware. So far, the cache has
> not gotten out of sync, but I haven't used this to write real
> code yet.

I don't think there is that much to worry about in general.

> Switching to another parsing technology just for the forward full-parse
> means supporting a second grammar; that's a lot of work. It might be
> simpler to switch to only forward full-parse (which I think is what you
> are suggesting).

Yes, if you end up doing a full forward parse in most cases anyway,
there's little point doing extra work to support backward parsing.

> If I switch to only using forward full-parse, I'd have to do things
> differently in the indentation rules as well. The key information they
> need need is the location of the statement start for every indentation
> point. So the forward parse could store the relevant statement start
> (and maybe other stuff) with every token.

Indeed.

> Hmm.  The actual change to smie I'm asking for is very small (if we
> leave out the cache); perhaps you could put that in, with a large "use
> at your own risk" comment?

Indeed, just exposing the stack is not that bad, and lets you solve your
problem.  But it's kind of ugly.  Maybe I could instead provide
a function that lets you query the particular part of the stack that
you're interested in (that would make it easier to adapt to a new
format of the stack, for example).

Could you describe which part of the stack you need to know?

        Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-07 19:04     ` Stefan Monnier
@ 2012-10-07 23:18       ` Stephen Leake
  2012-10-08  1:00         ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2012-10-07 23:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Stephen Leake, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>>> setup a loop that calls smie-backward-sexp repeatedly (starting from the
>>> position just before the "begin", of course) checking after each call
>>> whether the result lets us decide which of the two begins we're
>>> dealing with.
>> In the Ada use case I presented (package ... begin), that ends up going
>> all the way back to package at the beginning of the buffer, encountering
>> more "begin" tokens on the way; it's much more efficient to start there
>> in the first place.
>
> In your example, we might indeed end up scanning the buffer 3 times
> (once for the normal scan, once to disambiguate package's `begin', and
> one more time (in various chunks) to disambiguate the `begin's of the
> nested functions).

Yes. Which is why the cache is critical. And improving the cache by
storing the stack at each keyword would be even better.

> But I wonder now: can a "begin" that comes right after a "function
> ... end;" be a begin-open?  

package Package_1 is 

    function Function_1 return Integer
    is begin
       return 1;
    end;

begin

That isn't "begin-open". But how can you be sure that's the case you
have? You can't just stop a the "end"; consider:

package Package_1 is 

    function Function_1 return Integer
    is begin
       begin
         ...
       end;

       begin -- begin-open
         ...
       end;

       return 1;
    end;

begin -- begin-body

If you scan backward, and get to "is" (refined to "is-subprogram_body"),
you can stop; the flavor of "is" gives you the flavor of "begin". But
that means crossing an un-refined "begin", possibly many. That's the
problem. You could recurse; if you hit a "begin", start another scan
backwards, and eventually you'll hit "is" and unwind. Maybe that would
work. But it's messier and less general than the forward parse
mechanism, and the other tokens that need forward parse would need their
own variations. Too much ad-hoc code.

>> Switching to another parsing technology just for the forward full-parse
>> means supporting a second grammar; that's a lot of work. It might be
>> simpler to switch to only forward full-parse (which I think is what you
>> are suggesting).
>
> Yes, if you end up doing a full forward parse in most cases anyway,
> there's little point doing extra work to support backward parsing.

At the moment, it's the other way around; there are only a few cases
that need full forward parse, and using smie for that works well enough.
So it's the LALR parser that's extra work.

>> If I switch to only using forward full-parse, I'd have to do things
>> differently in the indentation rules as well. The key information they
>> need need is the location of the statement start for every indentation
>> point. So the forward parse could store the relevant statement start
>> (and maybe other stuff) with every token.
>
> Indeed.

I've gotten a trivial semantic grammar implemented, but it's not
returning any tags. It's frustrating (as were my first few days with
SMIE, come to think of it :).

>> Hmm.  The actual change to smie I'm asking for is very small (if we
>> leave out the cache); perhaps you could put that in, with a large "use
>> at your own risk" comment?
>
> Indeed, just exposing the stack is not that bad, and lets you solve your
> problem.  But it's kind of ugly.  Maybe I could instead provide
> a function that lets you query the particular part of the stack that
> you're interested in (that would make it easier to adapt to a new
> format of the stack, for example).
>
> Could you describe which part of the stack you need to know?

The top few tokens, which might be most of the stack. Here's the code
that examines the stack:

(defconst ada-indent-pre-begin-tokens
  '("declare-open"
    "declare-label"
    "is-subprogram_body"
    "is-package"
    "is-task_body"
    "is-entry_body"))

	;; Parsing from beginning of buffer; examine stack
	(let ((stack smie--levels)
	      stack-token
	      (token nil))
	  (while (null token)
	    (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
	    (cond
	     ((equal stack-token ";") nil)
	     ((member stack-token ada-indent-pre-begin-tokens)
	      (setq token "begin-body"))
	     (t
	      (setq token "begin-open"))))

	  (if (>= (point) ada-indent-refine-forward-to)
	      (throw 'ada-indent-refine-all-quit token)
	    (throw 'local-quit token)))

In this case:

    function Function_1 return Integer
    is 
       Local_1 : integer;
       Local_2 : integer;
    begin

we see ";" from the local variables, then "is-subprogram_body", and we
stop.

For a package-level "begin", we see a ";" for each intervening
declaration, then "is-package", and stop. 

I guess you could provide a function that scans the stack, skipping ";",
and returning the next token. That would work in this particular case,
but I'm not sure about the other cases I need.

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-07 23:18       ` Stephen Leake
@ 2012-10-08  1:00         ` Stefan Monnier
  2012-10-08  1:28           ` Stephen Leake
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-10-08  1:00 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> Yes. Which is why the cache is critical. And improving the cache by
> storing the stack at each keyword would be even better.

That can lead to a very large cache, which can then become slow to
manage.  In syntax-ppss we keep the intermediate state of the parse
every ~20KB.  So that if we have to re-parse at most ~20KB's worth
of text.

>> But I wonder now: can a "begin" that comes right after a "function
>> ... end;" be a begin-open?  

> package Package_1 is 

>     function Function_1 return Integer
>     is begin
>        return 1;
>     end;

> begin

> That isn't "begin-open". But how can you be sure that's the case you
> have? You can't just stop a the "end"; consider:

No, but if you have

   package Package_1 is 
       function Function_1 return Integer
       is begin return 1; end;
       function Function_2 return Integer
       is begin return 2; end;
   begin

you can stop when you see the first "function".  I'd expect that within
a package there are usually several functions, so parsing backward
a single function doesn't sound bad compared to parsing forward the
whole file.

> work. But it's messier and less general than the forward parse
> mechanism, and the other tokens that need forward parse would need their
> own variations.

I get the impression that a

   (let ((begin-flavor nil))
     (while (let ((s (smie-backward-sexp 'halfsexp)))
              (null (setq begin-flavor (ada--begin-flavor s))))))

wouldn't be that hard to write and that ada--begin-flavor would look
similar to your current function that looks at the stack items.
But of course, here, details matter, so we can't know without actually
trying it out.

> Too much ad-hoc code.

You might be right.

> 	    (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))

Hmm... indeed, by relying on the identity of the cons cells in the
stack, you can recover the actual token, even if 2 tokens have the same
left&right precedence.
It should be the case in all real-life situations, but nothing
guarantees that it's the case.
Hmm...

> I guess you could provide a function that scans the stack, skipping ";",
> and returning the next token. That would work in this particular case,
> but I'm not sure about the other cases I need.

I have to think it over some more,

        Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-08  1:00         ` Stefan Monnier
@ 2012-10-08  1:28           ` Stephen Leake
  2012-10-08 22:58             ` Stephen Leake
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2012-10-08  1:28 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Yes. Which is why the cache is critical. And improving the cache by
>> storing the stack at each keyword would be even better.
>
> That can lead to a very large cache, which can then become slow to
> manage.  In syntax-ppss we keep the intermediate state of the parse
> every ~20KB.  So that if we have to re-parse at most ~20KB's worth
> of text.

That makes sense.

>> But it's messier and less general than the forward parse
>> mechanism, and the other tokens that need forward parse would need their
>> own variations.
>
> I get the impression that a
>
>    (let ((begin-flavor nil))
>      (while (let ((s (smie-backward-sexp 'halfsexp)))
>               (null (setq begin-flavor (ada--begin-flavor s))))))
>
> wouldn't be that hard to write and that ada--begin-flavor would look
> similar to your current function that looks at the stack items.
> But of course, here, details matter, so we can't know without actually
> trying it out.

Yes. I should look at this again; it might be better than what I have now.

>> 	    (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
>
> Hmm... indeed, by relying on the identity of the cons cells in the
> stack, you can recover the actual token, even if 2 tokens have the same
> left&right precedence.
> It should be the case in all real-life situations, but nothing
> guarantees that it's the case.
> Hmm...

Ah, I never even considered that it might not be unique. That makes this
technique less satisfactory; to be safe, we'd have to add the actual
token to the stack.

I did wonder why the stack didn't have the actual token; I guess you
didn't need it for anything yet.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-08  1:28           ` Stephen Leake
@ 2012-10-08 22:58             ` Stephen Leake
  2012-10-09  2:26               ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2012-10-08 22:58 UTC (permalink / raw)
  To: emacs-devel

Stephen Leake <stephen_leake@member.fsf.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> I get the impression that a
>>
>>    (let ((begin-flavor nil))
>>      (while (let ((s (smie-backward-sexp 'halfsexp)))
>>               (null (setq begin-flavor (ada--begin-flavor s))))))
>>
>> wouldn't be that hard to write and that ada--begin-flavor would look
>> similar to your current function that looks at the stack items.
>> But of course, here, details matter, so we can't know without actually
>> trying it out.
>
> Yes. I should look at this again; it might be better than what I have
> now.

It turns out you were right. Scanning backwards is better, because there
is one additional test you can apply; as you cross each Ada statement or
declaration, you can check which it was (statement or declaration), and
that tells you whether you have begin-open or begin-body.

So you typically only have to go back one statement, not all the way to
the top. I can construct a worse case scenario that requires scanning to
the top, but it's pretty pathological code.

That still involves encountering unrefined "begin", but you just recurse
on that, no problem.

I've implemented that, and removed my local copy of smie.el. The code is
quite straight-forward, not much more hairy than the code it replaces.
And this technique should work for the other places I thought I needed
full parse-forward. I hereby withdraw my request for smie--levels.

Thanks for pushing me on this :).

I've kept the cache; it's simple, and it should speed things up quite a
bit. I'll try to measure how much when I'm all done.

I'll still play with semantic; there are front-end tools like code
completion and function call template that rely on it. But it's less
urgent.

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-08 22:58             ` Stephen Leake
@ 2012-10-09  2:26               ` Stefan Monnier
  2012-10-09  3:29                 ` Stephen Leake
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-10-09  2:26 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> So you typically only have to go back one statement, not all the way to
> the top. I can construct a worse case scenario that requires scanning to
> the top, but it's pretty pathological code.

Great.  That was also my experience with other languages so far, but I'm
glad it worked for you too.

> I've kept the cache; it's simple, and it should speed things up quite a
> bit.  I'll try to measure how much when I'm all done.

I'd be interested to see your results.

> I'll still play with semantic; there are front-end tools like code
> completion and function call template that rely on it. But it's less
> urgent.

Semantic support would be really neat, yes.
[ Hopefully, at some point SMIE and Semantic will get closer, to the
  point where a given language mode doesn't need to give 2 different
  specifications of the language.  ]


        Stefan



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-09  2:26               ` Stefan Monnier
@ 2012-10-09  3:29                 ` Stephen Leake
  2012-10-09  4:20                   ` Stefan Monnier
  2012-10-09 11:23                   ` Stephen Leake
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Leake @ 2012-10-09  3:29 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I'll still play with semantic; there are front-end tools like code
>> completion and function call template that rely on it. But it's less
>> urgent.
>
> Semantic support would be really neat, yes.
> [ Hopefully, at some point SMIE and Semantic will get closer, to the
>   point where a given language mode doesn't need to give 2 different
>   specifications of the language.  ]

You could write a translator from the bison BNF format to the current
SMIE BNF format, that should not be too hard. 

But I suspect that the subset of the language represented in the BNF
will be different for the two; the tools that semantic supports (_not_
indentation) require different things. Detailed lists of parameter names
(for function template/code completion), for one, which SMIE does not
need. I'll know more after I finish this current Ada mode project.

Neither set of tools wants a _complete_ BNF.

Automating a BNF filter to extract just the right subset for each tool
would be a really neat feat (and probably pass the Turing test :).

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-09  3:29                 ` Stephen Leake
@ 2012-10-09  4:20                   ` Stefan Monnier
  2012-10-09 11:23                   ` Stephen Leake
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2012-10-09  4:20 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel

> Neither set of tools wants a _complete_ BNF.

And because they use a different underlying parser algorithm, even if
they described exactly the same grammar, they'd want different BNFs.

> Automating a BNF filter to extract just the right subset for each tool
> would be a really neat feat (and probably pass the Turing test :).

I think that merging the two will require some way to give
a higher-level description of the grammar which gives enough hints about
what goes to SMIE and what goes to Semantic (and what is shared).

        Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: access to parser stack in SMIE
  2012-10-09  3:29                 ` Stephen Leake
  2012-10-09  4:20                   ` Stefan Monnier
@ 2012-10-09 11:23                   ` Stephen Leake
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Leake @ 2012-10-09 11:23 UTC (permalink / raw)
  To: emacs-devel

Stephen Leake <stephen_leake@member.fsf.org> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>>> I'll still play with semantic; there are front-end tools like code
>>> completion and function call template that rely on it. But it's less
>>> urgent.
>>
>> Semantic support would be really neat, yes.
>> [ Hopefully, at some point SMIE and Semantic will get closer, to the
>>   point where a given language mode doesn't need to give 2 different
>>   specifications of the language.  ]
>
> You could write a translator from the bison BNF format to the current
> SMIE BNF format, that should not be too hard. 

I take this back; the SMIE grammar needs refined tokens, which the BNF
does not. Most of the work in building a SMIE indentation engine is
figuring out what tokens to refine, and how to refine them.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-10-09 11:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-06  4:21 access to parser stack in SMIE Stephen Leake
2012-10-06 12:37 ` Stefan Monnier
2012-10-06 18:55   ` Stephen Leake
2012-10-07 19:04     ` Stefan Monnier
2012-10-07 23:18       ` Stephen Leake
2012-10-08  1:00         ` Stefan Monnier
2012-10-08  1:28           ` Stephen Leake
2012-10-08 22:58             ` Stephen Leake
2012-10-09  2:26               ` Stefan Monnier
2012-10-09  3:29                 ` Stephen Leake
2012-10-09  4:20                   ` Stefan Monnier
2012-10-09 11:23                   ` Stephen Leake

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).