From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Stephen Leake <stephen_leake@member.fsf.org>
Newsgroups: gmane.emacs.devel
Subject: Re: access to parser stack in SMIE
Date: Sun, 07 Oct 2012 19:18:07 -0400
Message-ID: <85wqz1dg7k.fsf@member.fsf.org>
References: <85pq4wgrho.fsf@member.fsf.org>
	<jwvipan23ok.fsf-monnier+emacs@gnu.org>
	<85lifjfn10.fsf@member.fsf.org>
	<jwv8vbi2k38.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1349652199 25104 80.91.229.3 (7 Oct 2012 23:23:19 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 7 Oct 2012 23:23:19 +0000 (UTC)
Cc: Stephen Leake <stephen_leake@member.fsf.org>,
	emacs-devel <emacs-devel@gnu.org>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Oct 08 01:23:24 2012
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1TL0Bv-0000c1-Qt
	for ged-emacs-devel@m.gmane.org; Mon, 08 Oct 2012 01:23:24 +0200
Original-Received: from localhost ([::1]:45253 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1TL0Bp-0003rb-Ol
	for ged-emacs-devel@m.gmane.org; Sun, 07 Oct 2012 19:23:17 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:51291)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TL0Bm-0003rV-PH
	for emacs-devel@gnu.org; Sun, 07 Oct 2012 19:23:15 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TL0Bl-0003PZ-A9
	for emacs-devel@gnu.org; Sun, 07 Oct 2012 19:23:14 -0400
Original-Received: from qmta05.westchester.pa.mail.comcast.net ([76.96.62.48]:50694)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@member.fsf.org>) id 1TL0Bl-0003PT-5Y
	for emacs-devel@gnu.org; Sun, 07 Oct 2012 19:23:13 -0400
Original-Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta05.westchester.pa.mail.comcast.net with comcast
	id 8MZs1k0021uE5Es55PPHss; Sun, 07 Oct 2012 23:23:17 +0000
Original-Received: from TAKVER ([69.140.67.196])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id 8PJU1k00x4E4Fsd3cPJUCV; Sun, 07 Oct 2012 23:18:29 +0000
In-Reply-To: <jwv8vbi2k38.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message
	of "Sun, 07 Oct 2012 15:04:56 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 76.96.62.48
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:154210
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/154210>

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>>> setup a loop that calls smie-backward-sexp repeatedly (starting from the
>>> position just before the "begin", of course) checking after each call
>>> whether the result lets us decide which of the two begins we're
>>> dealing with.
>> In the Ada use case I presented (package ... begin), that ends up going
>> all the way back to package at the beginning of the buffer, encountering
>> more "begin" tokens on the way; it's much more efficient to start there
>> in the first place.
>
> In your example, we might indeed end up scanning the buffer 3 times
> (once for the normal scan, once to disambiguate package's `begin', and
> one more time (in various chunks) to disambiguate the `begin's of the
> nested functions).

Yes. Which is why the cache is critical. And improving the cache by
storing the stack at each keyword would be even better.
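
One possible shape for such a cache, as a sketch only: snapshot the stack at each keyword position during the forward parse and, when refining a token later, resume from the nearest snapshot at or before point. All names below are hypothetical (none exist in smie.el), and invalidating snapshots after buffer edits is left out:

```elisp
;; Sketch only; none of these names exist in smie.el.
(defvar my-parse-cache nil
  "List of (POSITION . STACK) snapshots taken during a forward parse.")
(make-variable-buffer-local 'my-parse-cache)

(defun my-parse-cache-store (pos stack)
  "Record a copy of STACK as the parser state at buffer position POS."
  (push (cons pos (copy-sequence stack)) my-parse-cache))

(defun my-parse-cache-lookup (pos)
  "Return the (POSITION . STACK) snapshot nearest at or before POS, or nil."
  (let ((best nil))
    (dolist (entry my-parse-cache)
      ;; Keep the entry with the largest position that does not exceed POS.
      (when (and (<= (car entry) pos)
                 (or (null best) (> (car entry) (car best))))
        (setq best entry)))
    best))
```

A linear search over a list is the simplest thing that could work; a real cache would probably want text properties or a sorted structure.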

> But I wonder now: can a "begin" that comes right after a "function
> ... end;" be a begin-open?  

package Package_1 is 

    function Function_1 return Integer
    is begin
       return 1;
    end;

begin

That isn't "begin-open". But how can you be sure that's the case you
have? You can't just stop at the "end"; consider:

package Package_1 is 

    function Function_1 return Integer
    is begin
       begin
         ...
       end;

       begin -- begin-open
         ...
       end;

       return 1;
    end;

begin -- begin-body
    
If you scan backward, and get to "is" (refined to "is-subprogram_body"),
you can stop; the flavor of "is" gives you the flavor of "begin". But
that means crossing an un-refined "begin", possibly many. That's the
problem. You could recurse; if you hit a "begin", start another scan
backwards, and eventually you'll hit "is" and unwind. Maybe that would
work. But it's messier and less general than the forward parse
mechanism, and the other tokens that need forward parse would need their
own variations. Too much ad-hoc code.

>> Switching to another parsing technology just for the forward full-parse
>> means supporting a second grammar; that's a lot of work. It might be
>> simpler to switch to only forward full-parse (which I think is what you
>> are suggesting).
>
> Yes, if you end up doing a full forward parse in most cases anyway,
> there's little point doing extra work to support backward parsing.

At the moment, it's the other way around; there are only a few cases
that need full forward parse, and using smie for that works well enough.
So it's the LALR parser that's extra work.

>> If I switch to only using forward full-parse, I'd have to do things
>> differently in the indentation rules as well. The key information they
>> need is the location of the statement start for every indentation
>> point. So the forward parse could store the relevant statement start
>> (and maybe other stuff) with every token.
>
> Indeed.

I've gotten a trivial semantic grammar implemented, but it's not
returning any tags. It's frustrating (as were my first few days with
SMIE, come to think of it :).

>> Hmm.  The actual change to smie I'm asking for is very small (if we
>> leave out the cache); perhaps you could put that in, with a large "use
>> at your own risk" comment?
>
> Indeed, just exposing the stack is not that bad, and lets you solve your
> problem.  But it's kind of ugly.  Maybe I could instead provide
> a function that lets you query the particular part of the stack that
> you're interested in (that would make it easier to adapt to a new
> format of the stack, for example).
>
> Could you describe which part of the stack you need to know?

The top few tokens, which might be most of the stack. Here's the code
that examines the stack:

(defconst ada-indent-pre-begin-tokens
  '("declare-open"
    "declare-label"
    "is-subprogram_body"
    "is-package"
    "is-task_body"
    "is-entry_body"))

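	;; Parsing from beginning of buffer; examine stack.
	;; Each stack entry is mapped back to its token via `rassoc' on
	;; the grammar table.
	(let ((stack smie--levels)
	      stack-token
	      (token nil))
	  (while (null token)
	    (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
	    (cond
	     ;; Skip statement terminators from intervening declarations.
	     ((equal stack-token ";") nil)
	     ;; A token from the pre-begin list means this "begin" starts a body.
	     ((member stack-token ada-indent-pre-begin-tokens)
	      (setq token "begin-body"))
	     ;; Anything else means a plain block "begin".
	     (t
	      (setq token "begin-open"))))

	  (if (>= (point) ada-indent-refine-forward-to)
	      (throw 'ada-indent-refine-all-quit token)
	    (throw 'local-quit token)))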

In this case:

    function Function_1 return Integer
    is 
       Local_1 : integer;
       Local_2 : integer;
    begin

we see ";" from the local variables, then "is-subprogram_body", and we
stop.

For a package-level "begin", we see a ";" for each intervening
declaration, then "is-package", and stop. 

I guess you could provide a function that scans the stack, skipping ";",
and returning the next token. That would work in this particular case,
but I'm not sure about the other cases I need.
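
A rough sketch of such a function, assuming the stack is a list of level entries (top first) that `rassoc` can map back to tokens through the grammar table, as in the loop above. The name `my-stack-token-skipping` is hypothetical, not part of smie.el, and the numeric levels in the example are made up for illustration:

```elisp
;; Sketch only: `my-stack-token-skipping' is a hypothetical name.
;; STACK is a list of level entries, top first.  GRAMMAR is an alist of
;; (TOKEN . LEVELS), so `rassoc' maps a stack entry back to its token.
(defun my-stack-token-skipping (stack grammar &optional skip)
  "Return the first token on STACK that is not in SKIP.
SKIP defaults to (\";\").  Stack entries with no matching token in
GRAMMAR are skipped too.  Return nil if STACK runs out first."
  (let ((skip (or skip '(";")))
        (result nil))
    (while (and stack (null result))
      (let ((tok (car (rassoc (pop stack) grammar))))
        (unless (member tok skip)
          (setq result tok))))
    result))

;; Example with made-up levels: two ";" entries, then the opener.
(let ((grammar '((";" 10 11) ("is-subprogram_body" 20 21)))
      (stack '((10 11) (10 11) (20 21))))
  (my-stack-token-skipping stack grammar))
;; => "is-subprogram_body"
```

The caller would then test the result against ada-indent-pre-begin-tokens to choose between "begin-body" and "begin-open". Unlike the loop above, this returns nil on an empty stack instead of defaulting to "begin-open", so the caller decides what that case means.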

-- 
-- Stephe