From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: access to parser stack in SMIE
Date: Sun, 07 Oct 2012 19:18:07 -0400
Message-ID: <85wqz1dg7k.fsf@member.fsf.org>
References: <85pq4wgrho.fsf@member.fsf.org> <85lifjfn10.fsf@member.fsf.org>
To: Stefan Monnier
Cc: Stephen Leake, emacs-devel
In-Reply-To: (Stefan Monnier's message of "Sun, 07 Oct 2012 15:04:56 -0400")

Stefan Monnier writes:

>>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>>> setup a loop that calls smie-backward-sexp repeatedly (starting from the
>>> position just before the "begin", of course) checking after each call
>>> whether the result lets us decide which of the two begins we're
>>> dealing with.
>> In the Ada use case I presented (package ... begin), that ends up going
>> all the way back to package at the beginning of the buffer, encountering
>> more "begin" tokens on the way; it's much more efficient to start there
>> in the first place.
>
> In your example, we might indeed end up scanning the buffer 3 times
> (once for the normal scan, once to disambiguate package's `begin', and
> one more time (in various chunks) to disambiguate the `begin's of the
> nested functions).

Yes. Which is why the cache is critical. And improving the cache by
storing the stack at each keyword would be even better.

> But I wonder now: can a "begin" that comes right after a "function
> ... end;" be a begin-open?

    package Package_1 is
       function Function_1 return Integer is
       begin
          return 1;
       end;
    begin

That isn't "begin-open". But how can you be sure that's the case you
have?
You can't just stop at the "end"; consider:

    package Package_1 is
       function Function_1 return Integer is
       begin
          begin
             ...
          end;
          begin -- begin-open
             ...
          end;
          return 1;
       end;
    begin -- begin-body

If you scan backward and get to "is" (refined to "is-subprogram_body"),
you can stop; the flavor of "is" gives you the flavor of "begin". But
that means crossing an un-refined "begin", possibly many. That's the
problem.

You could recurse; if you hit a "begin", start another scan backwards,
and eventually you'll hit "is" and unwind. Maybe that would work. But
it's messier and less general than the forward parse mechanism, and the
other tokens that need forward parse would need their own variations.
Too much ad-hoc code.

>> Switching to another parsing technology just for the forward full-parse
>> means supporting a second grammar; that's a lot of work. It might be
>> simpler to switch to only forward full-parse (which I think is what you
>> are suggesting).
>
> Yes, if you end up doing a full forward parse in most cases anyway,
> there's little point doing extra work to support backward parsing.

At the moment, it's the other way around; there are only a few cases
that need full forward parse, and using smie for that works well
enough. So it's the LALR parser that's extra work.

>> If I switch to only using forward full-parse, I'd have to do things
>> differently in the indentation rules as well. The key information they
>> need is the location of the statement start for every indentation
>> point. So the forward parse could store the relevant statement start
>> (and maybe other stuff) with every token.
>
> Indeed.

I've gotten a trivial semantic grammar implemented, but it's not
returning any tags. It's frustrating (as were my first few days with
SMIE, come to think of it :).

>> Hmm. The actual change to smie I'm asking for is very small (if we
>> leave out the cache); perhaps you could put that in, with a large "use
>> at your own risk" comment?
>
> Indeed, just exposing the stack is not that bad, and lets you solve your
> problem. But it's kind of ugly. Maybe I could instead provide
> a function that lets you query the particular part of the stack that
> you're interested in (that would make it easier to adapt to a new
> format of the stack, for example).
>
> Could you describe which part of the stack you need to know?

The top few tokens, which might be most of the stack. Here's the code
that examines the stack:

    (defconst ada-indent-pre-begin-tokens
      '("declare-open"
        "declare-label"
        "is-subprogram_body"
        "is-package"
        "is-task_body"
        "is-entry_body"))

    ;; Parsing from beginning of buffer; examine stack
    (let ((stack smie--levels)
          stack-token
          (token nil))
      (while (null token)
        (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
        (cond
         ((equal stack-token ";") nil)
         ((member stack-token ada-indent-pre-begin-tokens)
          (setq token "begin-body"))
         (t (setq token "begin-open"))))
      (if (>= (point) ada-indent-refine-forward-to)
          (throw 'ada-indent-refine-all-quit token)
        (throw 'local-quit token)))

In this case:

    function Function_1 return Integer is
       Local_1 : integer;
       Local_2 : integer;
    begin

we see ";" from the local variables, then "is-subprogram_body", and we
stop.

For a package-level "begin", we see a ";" for each intervening
declaration, then "is-package", and stop.

I guess you could provide a function that scans the stack, skipping
";", and returning the next token. That would work in this particular
case, but I'm not sure about the other cases I need.

--
-- Stephe
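The stack query suggested above (scan the stack, skip ";", return the next token) could look roughly like the following minimal sketch. It is not SMIE code: `my-smie-stack-next-token`, `my-ada-classify-begin`, `my-demo-grammar` and `my-demo-pre-begin-tokens` are hypothetical names, and the levels in the demo grammar are made up. It assumes the stack has the shape the stack-examining loop walks, a list of (LEFT-LEVEL RIGHT-LEVEL) entries that `rassoc` maps back to (TOKEN LEFT-LEVEL RIGHT-LEVEL) grammar entries, so, like that loop, it assumes no two tokens share the same pair of levels.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of SMIE.  STACK is assumed to be a list
;; of (LEFT-LEVEL RIGHT-LEVEL) entries, innermost first; GRAMMAR is an
;; alist of (TOKEN LEFT-LEVEL RIGHT-LEVEL) entries.
(defun my-smie-stack-next-token (stack grammar &optional skip)
  "Return the first token on STACK whose name is not in SKIP.
SKIP defaults to (\";\").  Entries that map to no token in GRAMMAR
are ignored.  Return nil if no entry qualifies."
  (let ((skip (or skip '(";"))))
    (cl-some (lambda (levels)
               ;; `rassoc' matches LEVELS against the cdr of each entry,
               ;; i.e. (LEFT-LEVEL RIGHT-LEVEL); `car' extracts TOKEN.
               (let ((token (car (rassoc levels grammar))))
                 (and token (not (member token skip)) token)))
             stack)))

;; Made-up grammar entries for the demo; real SMIE levels differ.
(defconst my-demo-grammar
  '(("is-subprogram_body" 10 20)
    (";" 30 30)
    ("begin-body" 40 50)))

(defconst my-demo-pre-begin-tokens
  '("declare-open" "declare-label" "is-subprogram_body"
    "is-package" "is-task_body" "is-entry_body"))

(defun my-ada-classify-begin (stack)
  "Return \"begin-body\" if the first non-\";\" token on STACK is a
pre-begin token, else \"begin-open\"."
  (if (member (my-smie-stack-next-token stack my-demo-grammar)
              my-demo-pre-begin-tokens)
      "begin-body"
    "begin-open"))

;; Stack after "is Local_1 : integer; Local_2 : integer;": two ";"
;; entries on top of "is-subprogram_body".
(my-ada-classify-begin '((30 30) (30 30) (10 20)))  ; => "begin-body"
```

Returning a token name rather than raw levels is what would let the stack format change later without breaking callers, which is the point of a query function over exposing the stack; other refinements could pass a different SKIP list.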